[00:22:07] <jayjo> Boomtime: Thanks for the response, sorry I had walked away
[00:24:22] <jayjo> so I'll use the one-to-many relationship with document references, but if I go this way I'll be keeping objects connected through the objectid, should I create a separate id variable to be used in my url and keep the reference through objectid?
[00:25:10] <jayjo> Sorry - dumb question. I've figured it out just reading it again
[02:29:42] <huleo> this surely will be simple - I'm taking my first steps with aggregation framework
[02:30:15] <huleo> using $geoNear operator as first element of the pipeline
[02:30:41] <huleo> which is cool - it takes "query" argument, to actually use it when querying for documents to process
[02:31:19] <huleo> but sometimes I'm not querying by geo-location...how will I "find" documents by basic, simple query, to pass further to the pipeline?
[03:03:34] <sgo11> hi, I just want to log all user queries. I did "db.setProfilingLevel(2)". When I do "db.system.profile.find().pretty()", it shows there are many queries against "test.system.indexes" and "test.system.profile" (I don't know what triggers those queries). after a few minutes, 400 queries against "indexes" and "profile" ns compared to only 7 queries against my actual collection. how can I disable logging "indexes" and "profile" queries?
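As far as I know, the profiler in that era cannot be told to skip namespaces; the usual workaround is to filter the noise out when reading `system.profile`. A minimal sketch, assuming the database is named `test`:

```javascript
// Filter profiler output to hide the system collections' own traffic.
// The filter is an ordinary query document:
var profileFilter = { ns: { $nin: ["test.system.indexes", "test.system.profile"] } };

// In the mongo shell you would then run:
// db.system.profile.find(profileFilter).pretty()
```

The profiler still records those operations; this only hides them when you inspect the log.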
[03:03:42] <huleo> $project - will I actually be able to /exclude/ specific fields, instead of including them one-by-one?
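For huleo's `$project` question: exclusion-style projection of arbitrary fields was only added to the aggregation pipeline in MongoDB 3.4; on the 2.6-era servers discussed here, only `_id` could be excluded. A sketch with hypothetical field names:

```javascript
// Exclusion-style $project stage (arbitrary-field exclusion needs
// MongoDB 3.4+; older servers can only exclude _id this way).
var projectStage = { $project: { internalNotes: 0, _id: 0 } };

// db.items.aggregate([ projectStage ])
```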
[03:16:04] <TheAncientGoat> Being lazy here, but does anyone have an estimate of the perf/efficiency benefits of using the aggregation pipeline over just manually grouping fetched data in node for a dataset of a couple k docs?

[03:28:21] <huleo> TheAncientGoat: all I can speak about is my case - and here aggregation saves me plenty of tinkering to get distance of each result (2dsphere) from the point I'm querying against
[03:28:24] <Boomtime> TheAncientGoat: that has a lot of variables
[03:28:56] <Boomtime> huleo: that sounds like you are benefitting from the geo API not strictly the aggregation part
[03:30:21] <huleo> Boomtime: do you know any other way to get calculated distance in the results?
[03:30:41] <huleo> (aside from calculating it myself)
[03:31:20] <TheAncientGoat> My case is basically grouping transactions by day, a lot less LOC through aggregation, but not sure what the perf considerations are regarding it
[03:33:09] <Jonno_FTW> is it wise to have very large documents? for example, my data looks like: {site_number: {date:[{"2010-01-01":[{time:"10:00",readings:[{detector:1,count:50}]}]}]}}, where there are ~2m individual readings, spread across different sites, dates and times
[03:35:04] <Boomtime> huleo: I assume you mean you specifically use "distanceField" in your pipeline, although that is aggregation specific, it is the geo engine that supplies the value and does the calculation - for you, this does mean that aggregation has significant convenience over doing the calcs client-side
[03:35:50] <huleo> Boomtime: exactly, no way to get distanceField with "regular" find() AFAIK
[03:36:03] <Boomtime> clearly however, the calculation of distance is performed in a regular query too, since $geoNear does precisely this - you just don't have access to the value
[03:36:09] <huleo> (other than calculating it manually all over again)
[03:40:21] <Boomtime> TheAncientGoat: there are still lots of variables, and it depends also on what you actually desire to "optimize" - do you want to save network bandwidth or server resources?
[03:44:27] <TheAncientGoat> I have a lot of concurrent, low bandwidth queries for other data. Not sure if I'll be choking at bandwidth or CPU first, but the connections to the clients could be pretty slow, so both are a factor. Aggregation will save bandwidth, I can see that, I just don't know at what cost
[03:46:09] <Boomtime> you're not going to get a definitive answer, it depends on CPU, memory, disk IO (current load of previous three, and how much spare), how many documents are involved, how frequently this is run...
[03:46:32] <Boomtime> the pipeline you describe sounds pretty straight forward
[03:47:57] <Boomtime> if you match, sort and group on fields which are indexed, then the pipeline can take advantage of that - something the remote client cannot - but the difference might not matter, depending on the numerous variables I mentioned before
[03:48:28] <Boomtime> aggregation is probably the right thing to do, but you should test it
[03:49:36] <Boomtime> btw, it's not just bandwidth, it's also latency - if you have more than a few hundred documents, the cursor has to go back to the server to get more - if you group these, you send less and subsequently get fewer round-trips to the server
[03:50:32] <TheAncientGoat> That's good insight, thanks
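TheAncientGoat's grouping-by-day case can be sketched as a two-stage pipeline. Field names (`createdAt`, `amount`) are assumptions; the `$year`/`$month`/`$dayOfMonth` operators work on the 2.6-era servers discussed here:

```javascript
// Group transaction documents by calendar day, server-side.
var groupByDay = [
  { $group: {
      _id: { y: { $year: "$createdAt" },
             m: { $month: "$createdAt" },
             d: { $dayOfMonth: "$createdAt" } },
      total: { $sum: "$amount" },  // per-day sum
      count: { $sum: 1 }           // per-day document count
  } },
  { $sort: { "_id.y": 1, "_id.m": 1, "_id.d": 1 } }
];

// db.transactions.aggregate(groupByDay)
```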
[12:35:16] <mnms_> I'm trying to sum a field across all documents in a collection. It takes a lot of time. So I added a single-field index on this field, but that didn't help either
[12:35:55] <mnms_> The collection has 90mln documents.
[12:36:11] <kali> yeah, nothing will make it significantly better. your best option is to denormalize
[12:37:00] <mnms_> kali: what does denormalize mean here? Because to me it couldn't be more denormalized..
[12:39:40] <kali> store and maintain this sum separately
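kali's suggestion ("store and maintain this sum separately") can be done with an atomic `$inc` on a small summary document, updated on every write to the big collection. A sketch; all names are hypothetical:

```javascript
// Keep a running total in one summary document and $inc it atomically
// whenever a source document is inserted.
var newReading = { value: 42 };
var incUpdate = { $inc: { total: newReading.value, count: 1 } };

// In the shell, on each write:
// db.readings.insert(newReading)
// db.summaries.update({ _id: "readings-total" }, incUpdate, { upsert: true })
```

Reading the sum then becomes a single small document fetch instead of a 90M-document scan.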
[12:42:32] <mnms_> kali: can I check somehow if query use specific index without executing query ?
[12:43:59] <mnms_> kali: because waiting until it's over is sometimes annoying
[12:45:11] <kali> explain option should work on the aggregation pipeline if you're running 2.6
[12:46:50] <mnms_> I'm running 2.6 but I'd have to wait for the query, which takes very long
[12:47:37] <mnms_> and my question is: is there any way to check whether a query will use an index without waiting until it's over
[12:48:29] <kali> the explain option is all there is
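The explain option kali mentions returns the planned index usage instead of running the whole job, so you don't have to wait for the full scan. A sketch, with a hypothetical field name:

```javascript
// Ask the 2.6+ server for the plan of an aggregation without waiting
// for the results:
var pipeline = [ { $group: { _id: null, total: { $sum: "$amount" } } } ];

// db.collection.aggregate(pipeline, { explain: true })
//
// For a plain query, cursor.explain() similarly reports which index
// (if any) the planner would use.
```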
[13:23:16] <zhiyajun> hi there, what's a covered query?
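A covered query is one the server can answer entirely from an index, without fetching any documents: every field in the filter and in the projection is part of the index, and `_id` is excluded. A sketch with illustrative names:

```javascript
// Index covering both the filter field and the projected field:
// db.users.ensureIndex({ email: 1, status: 1 })

var filter = { email: "a@example.com" };
var projection = { _id: 0, status: 1 }; // _id must be excluded for coverage

// db.users.find(filter, projection)  // satisfied from the index alone
```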
[14:37:21] <sjose_> StephenLynx, Please have a look over here 'http://justpaste.it/iz7v', here 'trainings' is an embedded document. so can I update trainings with "_id" : ObjectId("54bcaf4eba09015284646f6d") with all new values in an easier way....?
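For sjose_'s question, the usual tool is the positional `$` operator: filter on the embedded element's `_id`, and `$` refers back to the matched array element. The collection and field names below are assumptions:

```javascript
// Update one element of an embedded "trainings" array by its _id.
var posUpdate = { $set: { "trainings.$.name": "New name",
                          "trainings.$.hours": 16 } }; // fields are assumptions

// db.employees.update(
//   { "trainings._id": ObjectId("54bcaf4eba09015284646f6d") },
//   posUpdate
// )
```

To replace the whole element rather than individual fields, `$set` can target `"trainings.$"` with a full replacement subdocument.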
[16:25:54] <Raffaele> Hello. I've noticed there seems to be some mismatch between engine_v8-3.25.cpp vs engine_v8.cpp up to the point that the former doesn't even compile because of silly errors like string and set missing std::
[16:26:09] <Raffaele> I wonder if this sounds completely new to anybody
[17:07:54] <toter> Hi everybody... I'm checking one collection against another to find missing ['company'] values. The first collection has 345 documents and the second one, 20285. This operation takes 63 seconds to complete... Is there anything I can do to improve the execution time? Code: http://hastebin.com/ifayexesuv
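Without seeing the pasted code, a common cause of that runtime is issuing one query per document. A hedged alternative: pull the distinct `company` values from each collection once, then diff them client-side (collection names are placeholders):

```javascript
// In the shell you would fetch the two value sets once:
// var a = db.first.distinct("company");
// var b = db.second.distinct("company");
var a = ["acme", "globex"], b = ["acme"]; // illustrative data

// Companies present in the first collection but missing from the second:
var missing = a.filter(function (c) { return b.indexOf(c) === -1; });
// missing → ["globex"]
```

Two `distinct` calls plus an in-memory diff avoids the per-document round-trips.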
[19:49:16] <sekyms> I have a question regarding Schema as it relates to this: http://docs.mongodb.org/ecosystem/use-cases/product-catalog/#schema
[19:49:40] <sekyms> that I just answered by looking more carefully.
[19:52:05] <sekyms> is there a good place to find mongodb contractors
[20:09:41] <morenoh149> sekyms: I'm a contractor :)
[20:09:51] <morenoh149> how much can you afford ;P
[20:10:42] <sekyms> well if you are going to play that game so can I :-p
[20:10:52] <sekyms> I can't pay you now but I can offer you pizza
[22:35:14] <FunnyLookinHat> Is it fairly trivial to move a 2/1 ( member/arbiter ) set to a 3/0 ( primary with two secondary members ) ?
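FunnyLookinHat's migration is generally straightforward: remove the arbiter from the set and add a new data-bearing member, which then performs an initial sync. A hedged sketch; hostnames are placeholders:

```javascript
// On the primary:
// rs.remove("arbiter.example.net:27017")     // drop the arbiter
// rs.add("secondary2.example.net:27017")     // add the new data-bearing member
//
// The new member does an initial sync before it can vote or serve reads.
var newMember = { host: "secondary2.example.net:27017", priority: 1 };
```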
[23:10:13] <mortal1> howdy folks, say have a collection foo. this collection has a document with an id of 1, and a object of bar that I'd like to remove
[23:10:55] <mortal1> so db.foo.remove({_id:"1"},{"bar":1});?
[23:17:40] <morenoh149> mortal1: should the id be unique?
[23:17:48] <morenoh149> and you only want to remove the bar?
[23:34:25] <liamkeily> I want to append to an array in mongo, but want to make sure it's not overwritten by another process. What's the recommended way to do this?
[23:35:16] <Boomtime> liamkeily: $push or $addToSet
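Expanding Boomtime's answer: both operators append atomically on the server, so a concurrent writer can't clobber the array between a read and a write. `$push` always appends; `$addToSet` appends only if the value isn't already present. Field names here are hypothetical:

```javascript
// Atomic server-side appends; no read-modify-write race:
var pushUpdate     = { $push:     { tags: "new-tag" } }; // always appends
var addToSetUpdate = { $addToSet: { tags: "new-tag" } }; // appends if absent

// db.items.update({ _id: someId }, pushUpdate)
```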