[04:12:42] <topwobble> Can someone help me understand results from the mongodb profiler? I can't figure out what query is taking > 1000 ms. I used profiler but don't understand results: https://gist.github.com/objectiveSee/8725e3bf8eb0647b626d
[04:19:18] <topwobble> leave a comment in the gist if I sign off
[12:50:52] <User7> I want to know good use cases for MongoDB
[12:55:50] <starefossen> User7: here they are: http://www.slideshare.net/mongodb/nosql-now-2012-mongodb-use-cases
[13:28:35] <riker> What is the best way to augment the results of an existing collection with a computed value? I store documents with an answers field that is an array. I want to return the documents augmented with a new field 'ansavg' that has the average of the answers array.
[13:28:48] <riker> I don't need to average across documents.
[13:51:51] <riker> because ideally I may perform a different computation each time. For example, averaging answers 1, 2 and 5 instead of all of them.
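One way to get what riker describes is a pipeline stage that computes the average server-side. A minimal pymongo sketch, assuming a modern server ($avg over an array expression needs MongoDB 3.2+, $addFields needs 3.4+); the database and collection names are made up:

```python
# Sketch: return each document with an extra "ansavg" field computed
# from its "answers" array. "mydb" / "surveys" are hypothetical names.
from pymongo import MongoClient

db = MongoClient()["mydb"]

pipeline = [
    {"$addFields": {"ansavg": {"$avg": "$answers"}}},
]
for doc in db["surveys"].aggregate(pipeline):
    print(doc["_id"], doc.get("ansavg"))

# For a different computation each time, e.g. only answers 1, 2 and 5
# (zero-based indexes 0, 1, 4), build the expression per request:
subset_avg = {"$avg": [{"$arrayElemAt": ["$answers", i]} for i in (0, 1, 4)]}
for doc in db["surveys"].aggregate([{"$addFields": {"ansavg": subset_avg}}]):
    print(doc["_id"], doc.get("ansavg"))
```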
[16:35:18] <topwobble> Can someone help me understand results from the mongodb profiler? I can't figure out what query is taking > 1000 ms. I used profiler but don't understand results: https://gist.github.com/objectiveSee/8725e3bf8eb0647b626d
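The profiler question goes unanswered in the log, but slow operations can be pulled straight out of the profile collection. A sketch with pymongo, assuming profiling is already enabled and a hypothetical database name:

```python
# Sketch: list profiled operations slower than 1000 ms, newest first.
# system.profile holds one document per profiled operation; "millis"
# is the execution time, "ns" the namespace, "ts" the timestamp.
from pymongo import DESCENDING, MongoClient

db = MongoClient()["mydb"]  # "mydb" is a made-up name

for op in db["system.profile"].find({"millis": {"$gt": 1000}}).sort("ts", DESCENDING):
    print(op["ts"], op.get("op"), op.get("ns"), op["millis"])
```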
[20:27:10] <kurtis> Hey guys, with the aggregation framework -- is there a possibility of "filtering" out data at the last stage of the pipeline? I'm hitting a size limit. I only want data greater than a certain calculated value so I figured I'd check before abandoning the agg. framework
[20:30:19] <kurtis> Derick: As far as I can tell, 'limit' simply dumps all data after X limit. Should I run a $sort before-hand? (I haven't looked at sort's docs, yet)
[20:30:45] <Derick> "than a certain calculated value" -- how do you calculate it?
[20:31:19] <kurtis> Derick: In my $group, I have a 'count: { '$sum': 1 }' field
[20:31:31] <Derick> right, so you can do a $match on that again
[20:31:35] <kurtis> I'd like to basically dump anything where dump is less than 5
[20:32:01] <kurtis> whoops, where 'count' is less than 5 :) but good idea! I'll check out match. Thanks
[20:34:25] <kurtis> Derick: that worked perfectly. Thank you so much!
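The pipeline kurtis ends up with looks roughly like this sketch; the collection name and group key are made up:

```python
# Sketch: $group, then a second $match to filter on the computed count.
from pymongo import MongoClient

db = MongoClient()["mydb"]

pipeline = [
    {"$group": {"_id": "$category", "count": {"$sum": 1}}},
    {"$match": {"count": {"$gte": 5}}},  # drop groups with count < 5
]
results = list(db["events"].aggregate(pipeline))
```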
[20:44:23] <bartzy> I store images there (no need for gridfs since they're small images)
[20:44:31] <bartzy> I delete images after 2-24 hours.
[20:44:48] <bartzy> There are hundreds of thousands of images each day
[20:45:14] <bartzy> I don't care that mongo will use a lot of space.
[20:45:59] <bartzy> Like, if the actual data size is 4GB, I don't care if mongo uses 40GB or even 100GB. But I do care that those 100GB stay consistent, that it won't be 200GB after a while. Because I have a disk size limit and I don't want to maintain mongo all the time (compacting)
[20:46:10] <bartzy> Is it possible (with powerof2 or any other way) ?
[20:46:25] <cheeser> not sure. i've never really looked in to the powerof2 stuff.
[20:46:49] <kali> i think powerof2 would ensure a faster convergence to the asymptotic limit
[20:57:42] <bartzy> Thanks a lot everyone! I hope this will help
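The allocation behavior bartzy is after is what the powerOfTwo strategy provides: record sizes quantized to powers of two, so freed slots get reused and disk use converges instead of fragmenting indefinitely. It can be switched on per collection with the collMod command (it became the default in MongoDB 2.6 and is moot under WiredTiger). A pymongo sketch with made-up names:

```python
# Sketch: switch an existing collection to powerOfTwo record allocation.
from pymongo import MongoClient

db = MongoClient()["mydb"]
db.command("collMod", "images", usePowerOf2Sizes=True)
```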
[21:45:50] <kaTOR> what's the best way to check for duplicate data? I'm currently thinking about doing a for loop to go through all the emails (unique ID?) and, if one is found, not adding the new data
[21:47:06] <starefossen> kaTOR: you can add a unique index to enforce that field, keeping in mind that a unique index guarantees the field is always unique across the collection
[21:48:54] <kaTOR> thanks for all the advice guys, will look into adding a unique index on email I think
[21:50:07] <cheeser> if you already have dupes, you'll have to sanitize your data or simply drop the dupes.
[21:50:16] <cheeser> creating the index will drop them for you if you want.
[21:50:41] <kaTOR> yeah, it's only test data so no trouble dropping it
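A pymongo sketch of starefossen's and cheeser's advice; only the "email" field comes from the discussion, the rest of the names are made up. (The legacy dropDups index option cheeser alludes to, which silently deleted duplicates during index creation, was removed in MongoDB 3.0, so on modern servers the data has to be deduplicated first.)

```python
# Sketch: enforce unique emails with a unique index, then let the
# server reject duplicates instead of looping over all emails.
from pymongo import ASCENDING, MongoClient
from pymongo.errors import DuplicateKeyError

users = MongoClient()["mydb"]["users"]
users.create_index([("email", ASCENDING)], unique=True)

try:
    users.insert_one({"email": "a@example.com", "name": "A"})
except DuplicateKeyError:
    pass  # that email already exists; skip the new data
```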
[21:51:49] <Novimundus> Is this the right place to ask about pymongo as well?
[21:54:00] <Novimundus> I'm trying to do a query on Facebook interactions, and I want to restrict it to gte and lt two datetimes. Should it look something like this: col.find({'created_at': {'$lt': today, '$gte': seven_days_ago}}) ?
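Roughly, yes. A runnable version of the query Novimundus sketches, with hypothetical connection details:

```python
# Sketch: documents whose created_at falls in [7 days ago, today).
from datetime import datetime, timedelta
from pymongo import MongoClient

col = MongoClient()["mydb"]["interactions"]  # made-up names

today = datetime.utcnow()
seven_days_ago = today - timedelta(days=7)

cursor = col.find({"created_at": {"$gte": seven_days_ago, "$lt": today}})
```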
[22:58:38] <topwobble> I need to add a new required property on a collection. Is there a way to automatically set a default value for any doc in that collection that already existed before the property was required (i.e. where it is undefined)?
[23:02:02] <joannac> topwobble: $exists and the upsert flag
[23:02:37] <topwobble> joannac: can you explain more, sorry ;)
[23:04:44] <topwobble> joannac: is that a query I have to run?
[23:04:49] <joannac> to update more than one document
[23:05:34] <topwobble> we have 1.15 million documents that don't have the 'archived' key. I want to add that key to every document with a value of YES
[23:05:54] <joannac> Yes. Run a query using the $exists operator to pull out all documents without the field, update it with whatever, and set multi: true to update more than one document
[23:07:23] <topwobble> joannac: wow, so this is one query that will modify 1.15 million docs?
[23:10:27] <topwobble> assuming this script takes 1 hour: will our game server be unable to modify a document for 1 hour, or just while the individual document it is trying to modify is being updated (which would be fine)?
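A pymongo sketch of joannac's suggestion; the collection name is made up, and topwobble's YES is rendered as a boolean. On the MMAPv1 engine of that era, a long multi-update periodically yields its write lock, so concurrent writes interleave with the backfill rather than waiting for the whole run:

```python
# Sketch: backfill "archived" on every document that lacks it.
# update_many on pymongo 3.x; older drivers use update(..., multi=True).
from pymongo import MongoClient

games = MongoClient()["mydb"]["games"]  # hypothetical names

result = games.update_many(
    {"archived": {"$exists": False}},  # only docs missing the field
    {"$set": {"archived": True}},
)
print(result.modified_count)  # should approach 1.15 million here
```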
[23:38:32] <rekibnikufesin> I thought I saw an option to specify which shards in your cluster to shard your collection across. Can't find it in the docs- anyone know of such magic?
[23:38:52] <joannac> rekibnikufesin: tag aware sharding?
[23:40:58] <rekibnikufesin> joannac: /facepalm That was it! Thanks!
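For reference, tag-aware sharding (renamed "zones" in MongoDB 3.4) is what pins shard-key ranges to particular shards. A pymongo sketch against a mongos on a 3.4+ cluster; shard, zone, namespace, and key names are all made up:

```python
# Sketch: pin the shard-key range ["BE", "DE") of mydb.users to the
# shards tagged with zone "eu".
from pymongo import MongoClient

admin = MongoClient("mongos-host", 27017).admin

admin.command("addShardToZone", "shardA", zone="eu")
admin.command(
    "updateZoneKeyRange",
    "mydb.users",
    min={"country": "BE"},
    max={"country": "DE"},
    zone="eu",
)
```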