pmxbot IRC Log Viewer

[00:28:02] <zamnuts> need some help: uploading a ~900mb file via GridFS (node driver), the node process is maxed out on the core, i see 3636 of 3703 chunks in the fs.chunks, but am not seeing the file yet in fs.file. what gives?

[00:28:15] <zamnuts> s/via/to/

[01:12:54] <zamnuts> well for the record, this was happening when running node in debug mode, running normally it still takes some time to process (a few extra seconds, vs 15 minutes w/ still no end in sight)

[02:31:59] <hfp> GothAlice: I figured it out, thanks.

[02:52:00] <nobody18188181> since mongodb isn't acid compliant; how reliable is mongodb for working with financial transactions, and direct customer data in the event of a catastrophic crash?

[02:53:22] <joannac> nobody18188181: w:1 (or higher), journalling, replica sets, two phase commit

[02:53:56] <nobody18188181> what is w:1?

[02:54:03] <joannac> write concern

[02:54:26] <nobody18188181> i see; with journaling is that similar to the query log for mysql?

[02:56:45] <nobody18188181> lets say I have the perfect mongodb cluster built across the world; someone does a $1M transaction and everything goes wrong at the same time (meteors perfectly hit datacenters, etc) what is the worst that could go wrong with the data assuming a complete cluster rebuild of mongodb?

[03:05:47] <Boomtime> nobody18188181: you only need to plan for disaster scenarios in which civilization survives, can you restate the question with this outcome?

[03:07:32] <nobody18188181> i'm thinking about using mongodb instead of mysql/galera for an ecommerce platform; what i've found is that this is "risky". I found a few articles of bitcoin issues with mongodb, so; I figured I'd ask here in the IRC to see what the thoughts are.

[03:08:28] <Boomtime> every database is risky if you don't deploy with any redundancy

[03:08:39] <Boomtime> mongodb uses replica-sets for redundancy

[03:09:04] <Boomtime> write-concern is the method at the application level of gaining assurance of the resilency of your writes

[03:09:21] <nobody18188181> yea, but what i'm worried about is mid transaction a crash that results in money being lost b/c it wasn't saved properly

[03:09:46] <Boomtime> that applies to every database too

[03:09:49] <nobody18188181> i would ofc deploy with replica sets, etc

[03:09:58] <Boomtime> you need to perform a two-stage commit

[03:10:35] <Boomtime> http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/

[03:12:28] <Boomtime> the specific (simple) example in that tutorial is a banking transaction

[03:17:36] <nobody18188181> it seems client side though?

[03:19:27] <Boomtime> yes, mongodb does not have multi-operation transactions - only single operations are atomic

[03:34:08] <scjoe1> If you use a write concern of majority and j:1 which means that those servers have also performed a commit to disk, you would need a big meteor to loose data. This assumes your majority is across data centers, disc arrays and so forth. I would trust MongoDB just as much as Sql Server if set up correctly.

[03:35:10] <scjoe1> This is on top of using the pattern of the two phase commit shown by Boomtime

[04:10:44] <JakeTheAfroPedob> hi there guys anyone here?

[04:11:07] <JakeTheAfroPedob> need a little help regarding mongodb

[04:11:43] <Boomtime> hi

[04:11:53] <JakeTheAfroPedob> is there any website other than mongodb itself where i can get a quick tutorial on it?i have been trying to transfer the contents of a .txt file into mongo db but failed to do so

[04:11:55] <Boomtime> ask away, somebody might be able to help you

[04:12:59] <Boomtime> we'll need to know a lot more than that - are you familiar with JSON?

[04:13:03] <JakeTheAfroPedob> i ran a python code .py but i cant fidn the command to access the file

[04:13:16] <JakeTheAfroPedob> yes am using JSON file in the text file

[04:13:56] <JakeTheAfroPedob> i have tried one or two codes playing around with it so that the contents of the .txt file in JSON format can be copied into mongo db

[04:14:01] <Boomtime> accessing a file on your hard-disk is not a mongodb problem - you need to figure out how to read a file yourself, you can then import the contents into mongodb using the pymongo driver

[04:14:06] <JakeTheAfroPedob> but i cant seem to access it on my mongo db when i log in

[04:14:47] <Boomtime> you can also use mongoimport if you like

[04:14:57] <JakeTheAfroPedob> i have tried using db.collection.find

[04:15:16] <Boomtime> have you imported the data?

[04:15:20] <JakeTheAfroPedob> i saw the filename in the db command line

[04:15:21] <JakeTheAfroPedob> yes

[04:15:25] <JakeTheAfroPedob> but i cant access it

[04:15:31] <JakeTheAfroPedob> i cant find the command line for it

[04:15:33] <Boomtime> it sounds like you think that mongodb will just magically know about your json file somehow

[04:15:53] <Boomtime> that isn't how databases work, like, no database in the world works like that

[04:15:58] <JakeTheAfroPedob> one min

[04:16:11] <Boomtime> you need to import your data using the tools provided, "mongoimport" is one such

[04:16:35] <Boomtime> in python, you can use the pymongo driver to insert data into the database

[04:16:57] <JakeTheAfroPedob> yes i ran a py code doing that exactly

[04:17:18] <Boomtime> "a py code".. do you understand what that code did?

[04:17:19] <JakeTheAfroPedob> from pymongo import MongoClient ....etc

[04:17:32] <joannac> maybe pastebin the python code?

[04:17:41] <JakeTheAfroPedob> ok one min

[04:18:57] <JakeTheAfroPedob> http://pastebin.com/3g3QVbSa

[04:19:27] <JakeTheAfroPedob> i dont actually what contents does

[04:19:30] <JakeTheAfroPedob> am new in mongodb

[04:19:51] <JakeTheAfroPedob> *understand what

[04:20:03] <joannac> um

[04:20:09] <Boomtime> ok, so from your code it doesn't really matter what datamined.txt contains because you have just set it as a field value

[04:20:43] <JakeTheAfroPedob> @Boomtime that is the .txt file where the json stuffs are

[04:20:53] <JakeTheAfroPedob> i wanna import those into mongodb

[04:21:04] <Boomtime> sure, but "json" doesn't matter, you read it as text so it will appear as a big text blob

[04:21:15] <JakeTheAfroPedob> yeah true.

[04:21:16] <Boomtime> how big is that file? in KB

[04:21:37] <JakeTheAfroPedob> 25kb

[04:21:44] <JakeTheAfroPedob> approx

[04:22:08] <Boomtime> ok, so that should be fine, i don't know python but insert often returns the result of the op in other languages

[04:22:28] <Boomtime> you should check the return value from insert

[04:23:18] <JakeTheAfroPedob> well what i inted to do now is to check whether the contents in the .txt filewas actually copied into mongodb. Prob is i cant find the right code for it

[04:23:20] <Boomtime> do you know how to use the mongo shell?

[04:23:50] <JakeTheAfroPedob> am using that. you are reffering to command lines yeah?

[04:23:57] <Boomtime> yes

[04:24:04] <Boomtime> db.files.find()

[04:24:06] <JakeTheAfroPedob> yes. using that atm.

[04:24:22] <Boomtime> oops, "test_database_2"

[04:24:24] <Boomtime> use test_database_2

[04:24:29] <Boomtime> ok:

[04:24:34] <Boomtime> > use test_database_2

[04:24:39] <Boomtime> > db.files.find()

[04:25:18] <Boomtime> or as a single command:

[04:25:23] <Boomtime> db.getSiblingDB("test_database_2").files.find()

[04:55:16] <JakeTheAfroPedob> @Boomtime sorry for the delay

[04:55:21] <JakeTheAfroPedob> @Boomtime i got it already

[04:55:28] <JakeTheAfroPedob> @Boomtime thanks a bunch

[04:55:46] <JakeTheAfroPedob> for the next ques i have is how do i sort them into proper queries?

[04:56:22] <Boomtime> wut?

[04:56:31] <Boomtime> sort what?

[04:56:48] <Boomtime> you have a single document at the moment, step one would be to insert more documents

[05:00:18] <JakeTheAfroPedob> well what i meant was how do i sort the content in the document so i can query them later?

[05:01:39] <JakeTheAfroPedob> and is there a place i can get tutorials on mongodb? like getting queries out later on and etc

[05:05:07] <JakeTheAfroPedob> @Boomtime

[05:08:28] <joannac> you separate your single document into multiple documents

[05:12:34] <JakeTheAfroPedob> @joannac i have an example. For example the text that i have are json files(.txt files) retrieved from the net. To put an example a search result from Google is the json file

[05:13:21] <JakeTheAfroPedob> in that single json file there are multiple google search result of a single query. Is it possible if i seprate them into their respective url's titles and so forth?

[05:20:41] <joannac> yes

[05:26:57] <JakeTheAfroPedob> @jonnac how can i do it?

[05:27:25] <JakeTheAfroPedob> as in how can i do it after transferring them into mongodb since mongodb reads them as a collection of data

[05:27:26] <joannac> I don't understand the question. How did you get the text file in the first place?

[05:27:38] <joannac> You can't after importing them into mongodb

[05:27:43] <JakeTheAfroPedob> i retrieved it from the net using API using python as the prog language

[05:27:48] <joannac> you have to do it pre-importing them from mongodb

[05:28:01] <joannac> pre-importing them to mongodb*

[05:28:32] <JakeTheAfroPedob> so it has to be done in the same py code that i will be using to transfer the .txt into mongodb?

[05:28:35] <joannac> figure out what the documents should look like, and then fix your text file and/or import procedure as necessary

[05:28:38] <joannac> yes

[07:08:03] <zw> Hi. I'm new to mongodb. I'm looking for a way to get all collections, but "show collections" on mongo shell is nog giving me anything ?

[07:19:59] <joannac> zw that's the way to get all collections for the active database. is that what you want?

[07:21:08] <zw> joannac: yes, nevermind, I was looking at the wrong db

[08:08:30] <crocket> Ohai

[08:08:46] <crocket> How do I store and retrieve files in GridFS with mongoose?

[08:16:23] <crocket> Yo???

[09:20:23] <rskvazh> hi mongodb lovers!

[09:21:42] <lqez> welcome :)

[09:21:58] <Zelest> whalecum

[09:23:09] <rskvazh> i have a problem with updates to sharded big collection, could you advice me?

[09:27:54] <lqez> feel free to describe your problem-

[09:28:18] <lqez> someone may help you, or maybe not :)

[09:32:27] <crocket> mongol

[09:34:19] <rskvazh> i have collection sharded on {owner: "hashed"}, and many updates with q: {_id: a, owner: b}. There is owner field because of "shard hinting". But sometimes targeted mongod use incorrect index (very high nscanned) - owner, instead of _id for a minute or two. I try plancachefilter for a day - and seems all ok. But plancachefilter not persistent... How I can fix this problem? I know about https://jira.mongodb.org/browse/SERVER-1599. There is a bug in mongod?

[11:05:56] <zw> Hi

[11:06:20] <zw> Is there a way to calculate how long a binary restore of 500gb will take ?

[11:08:16] <Moogly2012> I couldn't personally say, depends on numerous factors though, hardware being used, and method used

[11:09:06] <zw> Moogly2012: VMware / mongodump

[11:10:22] <Moogly2012> still down to the hardware, an SSD vs. a HDD, and disk speeds

[12:14:10] <kas84> hi guys

[12:14:30] <kas84> can I not use Object ID in mongo objects?

[12:14:36] <kas84> or use a custom one

[12:14:50] <cheeser> what?

[12:22:51] <kas84> I want to save space, so I want to use a custom objectId or not use it at all

[12:22:54] <kas84> if possible

[12:23:43] <kali> kas84: ObjectId is just a type. _id does not have to be an ObjectId, you can use about anything you want

[12:24:23] <kali> kas84: you can also use ObjectId in other places (if you want id on subdocuments, for instance)

[12:24:42] <kas84> does it need to be unique per db or per collection?

[12:24:56] <kali> per collection

[12:25:24] <kas84> awesome!

[14:17:57] <rapsli> is it possible to fetch a record and add the previous and the next doc to that document?

[14:21:10] <Ravenheart> how can i force mongo to absolutely ignore everything to do with datetime

[14:21:39] <Ravenheart> its getting ridiculous the way it modifies dates

[14:21:44] <Ravenheart> i want the pure raw date

[14:21:46] <kali> Ravenheart: what ?

[14:21:50] <Ravenheart> no timezone applied no nothing

[14:21:59] <Ravenheart> mongo is converting dates willy nilly

[14:22:06] <rapsli> Ravenheart just store a timestamp

[14:22:07] <Ravenheart> adding time zones removing

[14:22:08] <kali> Ravenheart: date are stored as utc timestamp, no TZ

[14:22:15] <Ravenheart> i don't want to just store timestamps

[14:22:24] <kali> Ravenheart: it's your driver and language combination that set a TZ

[14:22:37] <Ravenheart> they've been disabled

[14:23:41] <kali> rapsli: you can't do this directly. you need a client to do it from outside

[14:24:10] <rapsli> :S not even with aggregation?

[14:24:58] <cheeser> rapsli: nope

[14:25:35] <rapsli> that's too bad

[14:28:15] <rapsli> but that's probably the reason for not getting any answers on stackoverflow ;)

[14:28:20] <rapsli> http://stackoverflow.com/questions/26861070/mongodb-aggregation-add-next-document-a-meteor-app

[14:53:29] <kas84> guys, is there any doc for knowing the size of an data type in the bson type? like doubles

[14:53:47] <amcgregor> kas84: http://bsonspec.org/spec.html

[14:54:07] <amcgregor> Waitaminute… I have the wrong nick!

[14:54:33] <GothAlice> That's better!

[14:54:39] <Gian2014> hello

[14:55:13] <Gian2014> GothAlice: do you recommend any ORM for Python for MongoDB? :)

[14:55:41] <GothAlice> Gian2014: I use MongoEngine quite heavily, even sending the occasional patch back upstream. :)

[14:56:07] <Gian2014> thanks

[14:59:22] <Gian2014> successfully installed, woop

[15:06:44] <wsmoak> can someone help with json document syntax? I’ve been staring at it too long. https://gist.github.com/wsmoak/31b79efedbbdfe2cdd61

[15:08:38] <GothAlice> As it stands, what you have is a Ruby syntax error. Sadly I don't Ruby, so #ruby might be a better place to ask.

[15:08:41] <GothAlice> wsmoak: ^

[15:11:23] <wsmoak> GothAlice: noted. if you squint at it, does the general idea of that inner array for “data” look like it should work ?

[15:12:19] <GothAlice> In an abstract syntax sense, yeah, nesting those structures *should* work.

[15:13:05] <wsmoak> thanks

[15:31:41] <nawadanp> GothAlice, Hi ! FYI, the gentoo bug (segfault on mongo shell and mongod) have been resolved : https://bugs.gentoo.org/show_bug.cgi?id=526114

[15:37:59] <rskvazh> Hi! Could you help me? https://groups.google.com/forum/#!topic/mongodb-user/NwKHZhGqjqc

[15:39:56] <wsmoak> (ftr that code was absolutely fine. upgraded editor stopped doing “save when focus is lost” :/ )

[15:41:47] <GothAlice> nawadanp: Thanks for the heads up!

[15:42:27] <GothAlice> nawadanp: Aaaaaah, GRSEC annotations.

[15:44:32] <GothAlice> wsmoak: I tend to not rely on save-on-defocus due to the fact that my development environments auto-reload when files are modified, and not all modifications I wish to save immediately. (Or will allow the reload mechanism to even survive…)

[15:45:06] <GothAlice> I.e. accidentally saving an unintentional syntax error nukes dev from orbit.

[17:45:25] <wsmoak> now that I’ve gotten that inner array added… I’m having trouble appending to it. :/

[17:45:33] <wsmoak> trying to apply http://docs.mongodb.org/v2.2/applications/update/#add-an-element-to-an-array

[17:46:01] <wsmoak> but it’s not quite the same … https://gist.github.com/wsmoak/27e92ea2e5b5bd313155

[17:46:36] <wsmoak> I need to match on the weights.date to get into the right element of weights, and then append to data

[18:00:53] <wsmoak> http://docs.mongodb.org/manual/reference/operator/update/positional/#update-documents-in-an-array looks promising

[18:07:03] <wsmoak> \o/

[18:35:03] <ehershey> oops

[19:41:26] <J-Gonzalez> I'm having an interesting issue with aggregation framework that I can't seem to figure out.

[19:46:59] <J-Gonzalez> https://gist.github.com/J-Gonzalez/675bddacbf6f546e2f5c

[19:47:38] <J-Gonzalez> I'm aggregating orders per month and can add the totals and averages easily

[19:47:49] <J-Gonzalez> however, I'm now trying to add the number of items purchased

[19:48:01] <J-Gonzalez> which is an array within the order object

[19:48:36] <J-Gonzalez> If I unwind based on the array, wouldn't I get double (or more) the amount for order totals etc?

[19:48:53] <J-Gonzalez> and is there a way I can do a sum on the arrays length?

[19:49:36] <J-Gonzalez> Note if you see gist: I can use either the number of items in the user_ticket_ref array (so user_ticket_ref.length) or the qty inside the ticket_items array - both values should be the same

[19:49:45] <J-Gonzalez> just I don't really know how I can aggregate this correctly

[19:57:06] <J-Gonzalez> hmm - unwinding DOESN'T give me double for totals, but it does give me slightly different numbers

[20:19:17] <J-Gonzalez> when I don't expect it to

[20:19:35] <girb> is it must to have arbiter in replication ?

[20:20:06] <girb> I need only 1 primary and 1 always secondary .. please help new to mongodb

[20:27:10] <krion> girb yes

[20:27:23] <krion> put the arbiter anywhere else

[20:39:35] <rskvazh> wow! mongodb 2.8.0.rc0 released! WiredTiger interesting thing!

[20:41:10] <Derick> :-)

[20:42:41] <ndb> I have 2 kinds of doc schema for the same collection, the registers are 1 million, I search for one by key like action: CONFERENCE, even for small amount 1k registers find the time is pretty low (4s)

[20:43:07] <ndb> is there some way to makes things fast? Like indexes by 'action' key?

[20:43:20] <ndb> or a collection for each kind of schema

[20:46:28] <wsmoak> what are the two different documents? GothAlice advised me to put different “things” in different collections. (no idea how much faster that would be though)

[20:47:20] <ndb> i have action: EVENTS and a group of events are summarized on this last document

[20:47:48] <ndb> like a resume denormalization, normally we just get this last "summary" and use it

[20:48:05] <ndb> even than we have tons of events, and it hm makes things a little slow on a find

[20:54:08] <rskvazh> Derick, hello! in mongo-php driver how I can do MongoUpdateBatch with upsert option? add(['q' => ..., 'u' => ..., 'upsert' => true])?

[20:55:29] <rskvazh> sorry, find in manual. RTFM :( http://php.net/manual/en/mongowritebatch.add.php

[20:57:01] <Derick> rskvazh: glad you found it :-)

[20:57:22] <rskvazh> Derick thank you for your work!

[21:01:04] <bazineta> kali Background flush on the GP SSD is indeed a thing of wonder and beauty

[21:18:05] <kali> bazineta: great. good to know

[21:19:30] <bazineta> kali same workload on 1K PIOPS, 1.5 to 2 seconds flush. On 500GB SSD, 1.5k to 3k burst, 400ms flush. With flush at the 60 second default, the volume is always in burst credit surplus. Half the price per month.

[21:23:53] <kali> bazineta: that's still weird. maybe we should expect a refactoring of the pricing grid in the weeks to come

[21:24:10] <kali> bazineta: but for now, well...

[21:24:36] <bazineta> kali I assume it's a technology refresh level thing, much like the c1 to c3 pricing went -- encourgaing you to move.

[21:24:51] <kali> yeah, probably

[21:26:30] <kali> bazineta: thanks for the followup

[21:39:54] <speaker1234> I'm putting a history of activity as part of a record. The sequence is changing state of the record and adding a history entry. Is there any way I can copy original records state into the history line in a find and modify command?

[21:40:52] <GothAlice> speaker1234: No; pivot your actions a bit. Instead of wanting to move the old into the history when updating state, add the updated state to the history each time you change it. (That way the "current" value need not be moved.)

[21:43:17] <speaker1234> in other words, think of the history asa record of what changed you to that state (i.e. trigger), not what state you were in before the trigger

[21:43:33] <quuxman> The way I would accomplish this on the first pass, is to have a history attribute of the record. The record is updated, you take that updated state (minus the history attribute) and prepend or append it to the history attr

[21:43:36] <GothAlice> I.e. {state: 'active', history: [{state: 'active', when: utcdate(…)}]} when changing state to "inactive" becomes: {state: 'inactive', history: [{state: 'active', when: utcdate(…)}, {state: 'inactive', when: utcdate(…)}]}

[21:43:39] <quuxman> so everything is copied with each state change

[21:45:10] <GothAlice> Accomplished via: db.somecollection.update({_id: ObjectId('…'), {state: 'inactive', $push: {history: {state: 'inactive', when: utcdate(…)}}})

[21:45:22] <GothAlice> Er, something like that p-code, at least. ^_^;

[21:46:09] <speaker1234> that's right if I have the_ID field, I don't need to do a find. I can just modify

[21:46:20] <GothAlice> Yup!

[21:46:44] <GothAlice> (And changes, including appending to the history, become single atomic operations.)

[21:46:59] <speaker1234> and since the ID field is held by only one worker be, I don't have to worry about race conditions

[21:47:09] <GothAlice> :)

[21:48:03] <GothAlice> (There can't be race conditions if state transitions are fixed. I.e. to go to "inactive" it must first be "active" — you can include that in your query to make it into a update-if-not-modified, and the operations are atomic…)

[21:48:12] <quuxman> even if multiple workers were updating the history, simply adding a timestamp to when the change was made would eliminate conflict. Things may get inserted out of orde, but your app layer could resort it

[21:48:48] <speaker1234> also passing over the _ID, state, and the history field from the worker be to the work manager process is not going to be that big a burden since the history is only going to be about 5 to 7 entries long

[21:49:26] <GothAlice> Uhm… don't pass the full history.

[21:49:33] <GothAlice> For the love of all that is sacred, don't do that. ;)

[21:49:50] <speaker1234> believe me, nothing is sacred :-)

[21:50:01] <GothAlice> Does your worker actually need access to the history?

[21:50:06] <quuxman> yeah why not pass the whole history? if it's only 7 items long?

[21:50:23] <GothAlice> (My worker bees literally only get the _id of the job.)

[21:50:26] <speaker1234> what I started to say was that the worker be only generates the state change and what triggered the state change (i.e. the history entry)

[21:50:28] <GothAlice> *get passed

[21:51:39] <speaker1234> so when I pass information back to the hive, I can do a find an update to accomplish the same thing about adding to the history

[21:51:49] <speaker1234> as long as I pass the current state in addition to the new state

[21:52:18] <GothAlice> _id, old state (for race condition prevention), and new state, yes. Nothing else needs to be passed around in order to update state and add to the history.

[21:52:35] <speaker1234> I also need to pass the reason

[21:53:25] <speaker1234> so the document argument would look something like:

[21:53:30] <GothAlice> That can be a requirement. All of my states have very well-defined purposes and state transitions are explicit (you can't just hop around the states), so "reason" is inherent to the state in my case. (i.e. "invoiced" means someone actually invoiced)

[21:55:05] <speaker1234> { '$set': {'Delivery_state': data['New_state']}, '$push': {'History': 'something clever'}}

[21:55:48] <speaker1234> unfortunately for me, multiple reasons can cause the same state change. This is not a problem because the recovery process is the same for all of the failures

[21:57:47] <wblu> Will an Ubuntu package be uploaded to Mongo's apt repo for 2.8.0-rc0?

[22:02:35] <GothAlice> speaker1234: I hope the object you push to History is an embedded document, not a string, but yeah, that's the approach.

[22:03:28] <GothAlice> speaker1234: As a note, because MongoDB doesn't enforce consistent schemas between documents in a collection, it can't optimize away the names of the fields. This means that the fields take up room in the document the same way as the data… long field names will add up substantially over time. How many records did you say you have?

[22:04:11] <speaker1234> about 60 to 120 mil

[22:04:19] <speaker1234> 2 mil per day

[22:05:38] <GothAlice> 120000000 * ( len("Delivery_state") + 6 ) = ~240 MB (base 1000, not MiB base 1024) of storage just for the key "Delivery_state".

[22:05:58] <GothAlice> 2000000 * ( len("Delivery_state") + 6 ) = ~40MB per day growth just to store the name of that field.

[22:07:00] <GothAlice> Assuming two dozen fields total the storage of just keys would be 5.76 GB, with a growth of ~1 GB/day… just to store keys!

[22:07:00] <speaker1234> okay so if I went to single character field identifiers, it would save a lot right?

[22:07:20] <speaker1234> I'm only half serious :-)

[22:07:30] <GothAlice> It would. I find this to be the biggest reason to use an ODM.

[22:07:34] <GothAlice> I'm completely serious.

[22:07:41] <GothAlice> (And it really *does* add up that way!)

[22:07:48] <rskvazh> GothAlice, +

[22:08:56] <speaker1234> that's going to have to wait for rev two. I lost way too much in the past couple of weeks to Windows 8 hosing Me over moments

[22:09:02] <GothAlice> speaker1234: https://gist.github.com/amcgregor/1ca13e5a74b2ac318017 here is an example of an aggregate query (and matching map/reduce) against some of my data. Notice the single-character keys everywhere.

[22:10:33] <GothAlice> Indeed; single-character-izing everything wasn't my first step. ;) Re-naming fields is a bit of a PITA, though, esp. with large datasets, so it's something to consider up-front. (Most ODMs allow you to change the in-database name used for the field, i.e. name = StringField(db_field='n')

[22:10:56] <J-Gonzalez> This aggregate is killing me: https://gist.github.com/J-Gonzalez/675bddacbf6f546e2f5c - can't figure out how to sum values within an array while still keeping 'parent' grouped values accurate

[22:12:12] <speaker1234> I will cut the keys down to two or three letter acronyms look into and ODM later

[22:12:45] <speaker1234> fortunately, the data does expire and I always have the option of restarting with the new instance in the old one can expire eventually

[22:13:24] <speaker1234> this data is like beer, you don't buy it, you only rent it

[22:13:30] <GothAlice> Nice! Not something most datasets can benefit from. :) (If you weren't already, I'd recommend using an expiring index to let MongoDB clean up your data for you.)

[22:14:00] <speaker1234> that's a rev 1.5. I've got to prove to customers that this works and then they will fund changes

[22:14:41] <GothAlice> speaker1234: http://docs.mongodb.org/manual/tutorial/expire-data/ It's a one-liner in most environments. ;)

[22:14:59] <GothAlice> I use this approach: http://docs.mongodb.org/manual/tutorial/expire-data/#expire-documents-at-a-certain-clock-time

[22:15:15] <speaker1234> does it come with its own rimshot? <ba-dump>

[22:15:47] <GothAlice> Alas, rimshot sold separately.

[22:16:50] <speaker1234> ok, back in the code base of death.

[22:17:19] <GothAlice> J-Gonzalez: You want $size.

[22:17:59] <GothAlice> I.e. totalTicketsSold: {$sum: {$size: '$payment'}}

[22:18:19] <GothAlice> Er, actually, scrap the $sum on that.

[22:18:52] <GothAlice> Or don't. Ugh. Reading is hard after a day filled with meetings. Yeah, you're grouping on time. You'll want the $sum there.

[22:19:53] <GothAlice> Also $ticket_items, not $payment.

[22:25:24] <J-Gonzalez> Thanks - GothAlice - I'll take a look at trying to get $size to work

[22:25:29] <J-Gonzalez> hadn't even thought of using $size

[22:25:53] <ndb> j

[22:25:54] <ndb> ::

[22:25:55] <ndb> zzc

[22:41:18] <GothAlice> Now that I'm home… I'm going to see how much overhead would be added by using full field names on my Stupidly Large™ dataset…

[22:58:05] <GothAlice> As it is, field name storage currently takes 168 bytes per asset on average. ~200 million assets w/ an average compressed asset size of ~125MB… I've got some big things in there. Using full field names would add about 200 bytes per asset on average—each field is still only one word. That'd be overhead to the tune of 73.6 GB or ~0.3% total storage vs. current 33.6 GB or ~0.1%.

[22:58:15] <GothAlice> So for my dataset this optimization is minor. ^_^;

[23:03:01] <GothAlice> speaker1234: ^

[23:05:26] <speaker1234> yes??

[23:06:18] <speaker1234> looks like the overhead really isn't that big an issue as long as we keep the names "reasonable"

[23:06:48] <speaker1234> by the way, I highly recommend frozen strawberries blended with a quarter to a half cup of red wine and a little bit of cinnamon or lemon

[23:06:52] <GothAlice> I'm storing large BLOBs in GridFS, which is rather atypical pattern of use. (I really should only be measuring against the metadata collection data storage…)

[23:07:11] <GothAlice> But it sounds delicious!

[23:07:30] <speaker1234> any alcoholic beverage or just red wine?

[23:08:03] <GothAlice> Red wine and _certain_ other alcohols. I have a really nasty (and almost immediate) adverse reaction to something in it. Sulphates, maybe?

[23:08:59] <speaker1234> it's hard to tell. Sometimes it is the tannins

[23:09:15] <speaker1234> I'm not supposed to have alcohol at all because my triglycerides are "easily stimulated"

[23:09:32] <speaker1234> I'm floating between 750 and 850

[23:09:59] <GothAlice> Well, in theory nobody should consume alcohol. It *is* a poison, kills more than murder, and racks up long-term medical costs second only to tobacco. ;)

[23:11:04] <speaker1234> if I can find something that tasted like red wine (i.e. good red wine) without the alcohol, then I would gladly leave the alcohol out of my life. But consider that I consume at most 2 cups of wine a month, I got low risk to the other issues

[23:11:40] <speaker1234> also, if I drink my usual tiny amounts of alcohol, I don't drive for at least three hours.

[23:12:21] <speaker1234> The level I drink wouldn't impair most people but my body is "special"

[23:12:32] <GothAlice> Yeah, that's not bad at all. I've inherited terrible internal organs, so I have a fair number of digestion-related issues. (Lactose intolerance, potato starch intolerance, and I'm the only member of my family who still has a gall bladder… so anything greasy/oily/fatty I have to be very careful about. Any of these and I spend nearly 24h feeling like I'm dying.)

[23:13:20] <speaker1234> good Lord. We are too genetic peas in a pod.

[23:13:25] <GothAlice> XD

[23:14:24] <GothAlice> Also fibromyalgia… I wouldn't give my worst enemy my best week, in terms of feeling like a productive member of the living.

[23:15:52] <speaker1234> choice of girlfriend

[23:16:26] <speaker1234> maybe I should say choice of potentially ex-girlfriend

[23:16:31] <GothAlice> XD

[23:18:33] <speaker1234> I prefer pain is inevitable, suffering is optional

[23:18:47] <GothAlice> Ooh… I'll add that one when I make the change.

[23:19:06] <speaker1234> there's a lot of pain in the world but we create a lot more suffering

[23:19:33] <speaker1234> by the way, that saying about pain being inevitable is a Buddhist concept

[23:19:37] <GothAlice> Added. :) https://gist.github.com/amcgregor/9f5c0e7b30b7fc042d81

[23:51:07] <GothAlice> Oh, ORDER BY RAND(), my old nemesis.

[23:52:51] <Boomtime> you should propose it

[23:52:57] <Boomtime> btw, why do you want it?

[23:53:07] <GothAlice> High-accuracy storage of numbers between 0 and 1.

[23:53:10] <GothAlice> (Specifically.)

[23:53:30] <Boomtime> you doing astronomy?

[23:57:31] <Boomtime> so this scares me -> "store sufficiently random numbers ... to pull records randomly"

[23:57:42] <Boomtime> how do you use a stored random number to pull a random record?

[23:58:18] <Boomtime> interestingly, this method is actually written in to the v2 edition of the o'reilley book on mongodb

[23:58:30] <Boomtime> at it is catastrophically biased

Log file Viewer

Help | Karma | Search:

#mongodb logs for Wednesday the 12th of November, 2014