PMXBOT Log file Viewer

#mongodb logs for Sunday the 1st of December, 2013

[04:12:42] <topwobble> Can someone help me understand results from the mongodb profiler? I can't figure out what query is taking > 1000 ms. I used profiler but don't understand results: https://gist.github.com/objectiveSee/8725e3bf8eb0647b626d
[04:19:18] <topwobble> leave a comment in the gist if I sign off
[12:49:21] <User7> hello
[12:49:57] <starefossen> User7: hi
[12:50:52] <User7> I want to know good use cases for mongodb
[12:55:50] <starefossen> User7: here they are: http://www.slideshare.net/mongodb/nosql-now-2012-mongodb-use-cases
[13:28:35] <riker> What is the best way to augment the results of an existing collection with a computed value? I store documents with an answers field that is an array. I want to return the documents augmented with a new field 'ansavg' that has the average of the answers array.
[13:28:48] <riker> I don't need to average across documents.
[13:36:19] <joannac> aggregation framework?
[13:36:45] <joannac> or do you mean permanently?
[13:51:08] <riker> @joannac: aggregation
[13:51:51] <riker> because ideally I may perform a different computation each time. For example, average answers 1, 2, and 5 instead of all of them.
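
A minimal pymongo sketch of the aggregation joannac is pointing at, assuming a collection named 'surveys' whose documents carry a numeric 'answers' array (both names are illustrative): $unwind the array, then $group back per document with an $avg accumulator to produce 'ansavg'.

    from pymongo import MongoClient

    coll = MongoClient()["test"]["surveys"]  # hypothetical db/collection names

    pipeline = [
        {"$unwind": "$answers"},                                      # one document per answer
        {"$group": {"_id": "$_id", "ansavg": {"$avg": "$answers"}}},  # average per original doc
    ]
    for doc in coll.aggregate(pipeline):
        print(doc["_id"], doc["ansavg"])

Averaging only a subset of the answers (e.g. answers 1, 2 and 5) would need an extra filtering stage between the $unwind and the $group.
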
[16:35:18] <topwobble> Can someone help me understand results from the mongodb profiler? I can't figure out what query is taking > 1000 ms. I used profiler but don't understand results: https://gist.github.com/objectiveSee/8725e3bf8eb0647b626d
[20:27:10] <kurtis> Hey guys, with the aggregation framework -- is there a possibility of "filtering" out data at the last stage of the pipeline? I'm hitting a size limit. I only want data greater than a certain calculated value so I figured I'd check before abandoning the agg. framework
[20:28:31] <Derick> you can do a $limit
[20:30:19] <kurtis> Derick: As far as I can tell, 'limit' simply dumps all data after X limit. Should I run a $sort before-hand? (I haven't looked at sort's docs, yet)
[20:30:45] <Derick> "than a certain calculated value" -- how do you calculate it?
[20:31:19] <kurtis> Derick: In my $group, I have a 'count: { '$sum': 1 }' field
[20:31:31] <Derick> right, so you can do a $match on that again
[20:31:35] <kurtis> I'd like to basically dump anything where dump is less than 5
[20:32:01] <kurtis> whoops, where 'count' is less than 5 :) but good idea! I'll check out match. Thanks
[20:34:25] <kurtis> Derick: that worked perfectly. Thank you so much!
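
A sketch of the $group-then-$match shape Derick describes, in pymongo form; the grouping key, threshold, and collection name are placeholders. Because $match runs on the output of the previous stage, it filters on the computed 'count' rather than on the raw documents.

    from pymongo import MongoClient

    coll = MongoClient()["test"]["events"]  # hypothetical names

    pipeline = [
        {"$group": {"_id": "$user_id", "count": {"$sum": 1}}},
        {"$match": {"count": {"$gte": 5}}},  # keep only groups seen at least 5 times
    ]
    results = list(coll.aggregate(pipeline))
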
[20:41:29] <bartzy> Hello
[20:41:57] <bartzy> I have a collection with a lot of deletions, all documents are from 10KB to 500KB.
[20:42:20] <bartzy> The storage size is just getting bigger and bigger, mongo is not using the free space it has from deleting documents...
[20:42:22] <bartzy> Why is that ?
[20:42:50] <cheeser> i don't think mongo (automatically) cleans up that fragmentation.
[20:42:55] <bartzy> If it was using the space from deleted documents, I could avoid using compact.
[20:43:05] <Derick> it does do that, if it can
[20:43:08] <bartzy> Yeah, I don't care about cleaning up. But it should write to free space.
[20:43:15] <bartzy> When can't it?
[20:43:16] <Derick> but fragmentation might cause issues for that
[20:43:19] <bartzy> :|
[20:43:24] <bartzy> So I must run fragmentation...
[20:43:31] <bartzy> sorry, compaction*
[20:43:32] <kali> you may want to try powerof2
[20:43:39] <Derick> yes, that can help indeed
[20:43:41] <kali> it helps with fragmentation in some cases
[20:43:51] <Derick> (but it also uses a bit more space in general)
[20:44:07] <bartzy> OK
[20:44:12] <bartzy> Let me explain a bit better
[20:44:23] <bartzy> I store images there (no need for gridfs since they're small images)
[20:44:31] <bartzy> I delete images after 2-24 hours.
[20:44:48] <bartzy> There are hundreds of thousands of images each day
[20:45:14] <bartzy> I don't care that mongo will use a lot of space.
[20:45:59] <bartzy> Like, if actual data size is 4GB, I don't care if mongo will use 40GB or even 100GB. But I do care that those 100GB stay consistent. That it won't be 200GB after a while. Because I have a disk size limit and I don't want to maintain mongo all the time (compacting)
[20:46:10] <bartzy> Is it possible (with powerof2 or any other way) ?
[20:46:25] <cheeser> not sure. i've never really looked into the powerof2 stuff.
[20:46:49] <kali> i think powerof2 would ensure a faster convergence to the asymptotic limit
[20:47:12] <kali> should/would, i'm obviously not 100% sure :)
[20:47:56] <bartzy> OK. And if I have a live collection with 20K documents, 2.3GB in size, 35GB on disk.
[20:48:17] <bartzy> All of these documents will be deleted in the next day. Will running powerof2 now work well?
[20:48:26] <bartzy> or should I run powerof2, then compact? :|
[20:48:39] <bartzy> I mean the collmod command.
[20:50:42] <kali> just do collmod
[20:51:01] <kali> and pay attention to your collection vitals for the next 24 hours, of course
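
For reference, the collMod kali mentions can be issued from a driver as well as the shell; a sketch in pymongo, assuming the images live in a collection called 'images' (this flag applies to the 2013-era MMAPv1 storage engine and is ignored or rejected by newer engines):

    from pymongo import MongoClient

    db = MongoClient()["mydb"]  # hypothetical database name

    # Switch the (assumed) images collection to power-of-two record allocation.
    db.command("collMod", "images", usePowerOf2Sizes=True)
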
[20:53:04] <bartzy> kali: Thanks.
[20:53:14] <bartzy> kali: Last question - By vitals you mean only size ?
[20:53:18] <bartzy> storage size, file size ?
[20:53:38] <kali> yeah
[20:55:07] <bartzy> kali: OK :) Just did collmod
[20:55:24] <bartzy> really last question - Can I query the state of powerof2? i.e. whether it's enabled or not
[20:55:46] <bartzy> ah I see in the docs now
[20:55:47] <bartzy> thanks
[20:56:40] <kali> http://docs.mongodb.org/manual/reference/command/collStats/#collStats.userFlags
[20:57:35] <bartzy> yeah just read that :)
[20:57:42] <bartzy> Thanks a lot everyone! I hope this will help
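
The userFlags field from collStats that the docs link describes can be read the same way; a sketch with the same assumed names (on MMAPv1, userFlags is 1 when usePowerOf2Sizes is enabled and 0 otherwise):

    from pymongo import MongoClient

    db = MongoClient()["mydb"]  # hypothetical database name

    stats = db.command("collStats", "images")
    print(stats.get("userFlags"))  # 1 => usePowerOf2Sizes is on (MMAPv1)
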
[21:45:50] <kaTOR> what's the best way to check for duplicate data? I'm currently thinking about doing a for loop to go through all the emails (unique ID?) and, if found, not adding the new data
[21:47:06] <starefossen> kaTOR: you can add an index to enforce a unique field, keeping in mind that a unique field in a collection is always unique
[21:47:15] <redsand> or use the unique key as _id
[21:47:26] <redsand> never a collision
[21:47:44] <cheeser> an upsert might help, too.
[21:48:13] <starefossen> kaTOR: if you want to find all duplicate values in an existing collection you can use the aggregation framework
[21:48:13] <redsand> word
[21:48:54] <kaTOR> thanks for all the advice guys, will look into adding a unique field based on email I think
[21:50:07] <cheeser> if you already have dupes, you'll have to sanitize your data or simply drop the dupes.
[21:50:16] <cheeser> creating the index will drop them for you if you want.
[21:50:41] <kaTOR> yeah, it's only test data so no trouble dropping data
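
A sketch of the two approaches discussed above, assuming a 'users' collection keyed on an 'email' field (both names are placeholders): the aggregation finds values that already occur more than once, and the unique index prevents new duplicates. (The index-time dropping of duplicates cheeser alludes to existed in 2013-era MongoDB but was removed in 3.0, so on current servers existing duplicates have to be cleaned up first.)

    from pymongo import ASCENDING, MongoClient

    coll = MongoClient()["test"]["users"]  # hypothetical names

    # Emails that already appear more than once.
    dupes = coll.aggregate([
        {"$group": {"_id": "$email", "count": {"$sum": 1}}},
        {"$match": {"count": {"$gt": 1}}},
    ])
    for d in dupes:
        print(d["_id"], d["count"])

    # Enforce uniqueness going forward (fails if duplicates still exist).
    coll.create_index([("email", ASCENDING)], unique=True)
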
[21:51:49] <Novimundus> Is this the right place to ask about pymongo as well?
[21:54:00] <Novimundus> I'm trying to do a query on facebook interactions, and I want to restrict it to gte and lt two datetimes. Should it look something like this: col.find({'created_at': {'$lt': today, '$gte': 7 days ago}}) ?
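
Something along those lines should work; the missing piece is building the two datetime values, e.g. with datetime.timedelta. A sketch using the field name from the question (the db/collection names are assumptions):

    from datetime import datetime, timedelta

    from pymongo import MongoClient

    col = MongoClient()["test"]["interactions"]  # hypothetical names

    today = datetime.utcnow()
    seven_days_ago = today - timedelta(days=7)

    # Interactions created in the last seven days, up to (but not including) now.
    cursor = col.find({"created_at": {"$gte": seven_days_ago, "$lt": today}})
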
[22:58:38] <topwobble> I need to add a new required property on a collection. Is there a way to automatically set the default value for any doc in that collection that already existed before it was required (i.e. is undefined)?
[23:02:02] <joannac> topwobble: $exists and the upsert flag
[23:02:37] <topwobble> joannac: can you explain more, sorry ;)
[23:04:43] <joannac> sorry, not upsert, multi
[23:04:44] <topwobble> joannac: is that a query I have to run?
[23:04:49] <joannac> to update more than one document
[23:05:34] <topwobble> we have 1.15 million documents that don't have the 'archived' key. I want to add that key to every document with a value of YES
[23:05:54] <joannac> Yes. run a query using the $exists operator to pull out all documents without the field, update it with whatever, set multi :true to update more than one document
[23:07:23] <topwobble> joannac: wow, so this is one query that will modify 1.15 million docs?
[23:07:38] <joannac> Yes. It'll run for a while.
[23:07:58] <joannac> You should test it on a subset to make sure it does what you want
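
A sketch of the update joannac describes, using pymongo's update_many (which is implicitly multi-document); the collection name and the use of True for the 'YES' value are assumptions:

    from pymongo import MongoClient

    coll = MongoClient()["game"]["players"]  # hypothetical names

    # Backfill 'archived' on every document that doesn't have the field yet.
    result = coll.update_many(
        {"archived": {"$exists": False}},
        {"$set": {"archived": True}},  # assuming "YES" maps to a boolean
    )
    print(result.modified_count)
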
[23:08:07] <topwobble> joannac: wow. ok. well this is production data. so it may be getting modified from another place
[23:08:31] <topwobble> concerned about what happens if another source tries to modify the doc while this query is running
[23:08:36] <topwobble> (because that will happen)
[23:09:00] <joannac> it'll have to wait
[23:09:25] <topwobble> "it" being the second source (ie. our game server)? So Mongo has some document lock?
[23:09:40] <topwobble> does it lock the entire query set, or just per document as it's being modified?
[23:09:41] <joannac> mongodb has a database lock
[23:10:27] <topwobble> assuming this script takes 1 hour : will our game server be unable to modify a document for 1 hour, or just while the individual document that it is trying to modify is being updated (which would be fine)
[23:10:30] <topwobble> ?
[23:11:03] <joannac> topwobble: http://docs.mongodb.org/manual/faq/concurrency/#does-a-read-or-write-operation-ever-yield-the-lock
[23:38:32] <rekibnikufesin> I thought I saw an option to specify which shards in your cluster to shard your collection across. Can't find it in the docs - anyone know of such magic?
[23:38:52] <joannac> rekibnikufesin: tag aware sharding?
[23:40:58] <rekibnikufesin> joannac: /facepalm That was it! Thanks!
[23:41:45] <joannac> np :)