[04:06:33] <Boomtime> ok, so that's fine, but since you group on it, i'm trying to understand how many docs are likely to actually group on that field
[04:06:41] <dimaj> actually, i also have a question around that... when I do 'new ISODate("$date")', i either blow up or get a huge negative number for the year... not sure why
[04:07:16] <dimaj> so, this db is for test results... if i have 100 tests within a single run - 100 docs
[04:07:38] <Boomtime> 'new ISODate' would be evaluated on the client, because that's a javascript language object
[04:07:50] <Boomtime> you need it to be evaluated on the server
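A minimal sketch of the difference Boomtime describes (collection and field names are assumptions): on the client, "$date" is just a string handed to the Date constructor, while a server-side date expression such as $dateToString (MongoDB 3.0+) receives each document's actual field value.

    // broken form (client-side): ISODate() receives the literal string "$date", so it errors
    //   db.results.aggregate([{ $project: { day: new ISODate("$date") } }])
    // server-side form: the aggregation expression receives each document's date value
    db.results.aggregate([
      { $project: { day: { $dateToString: { format: "%Y-%m-%d", date: "$date" } } } }
    ])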
[04:31:39] <dimaj> found my answer... http://stackoverflow.com/questions/21967233/sorting-aggregation-addtoset-result
[04:42:42] <Boomtime> @dimaj: rather than $addToSet then $unwind like the suggestion, in your case, you can $group once (with the array field as part of it) to cull out duplicates, then $sort to get the right order, then $group again to build the array
[04:43:19] <Boomtime> on the second $group you will know there are no duplicates, so $push will work cleanly
[04:43:35] <Boomtime> should be cheaper than $addToSet followed by $unwind :p
[04:44:05] <Boomtime> but you can always test it yourself - the result is the same either way
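A sketch of that group/sort/group shape, using the field names from dimaj's later example (name, date) as assumptions:

    db.results.aggregate([
      // first $group: one document per (name, date) pair, which drops duplicate dates
      { $group: { _id: { name: "$name", date: "$date" } } },
      // sort so the dates enter the array in order
      { $sort: { "_id.name": 1, "_id.date": 1 } },
      // second $group: rebuild one document per name; $push is safe because duplicates are gone
      { $group: { _id: "$_id.name", dates: { $push: "$_id.date" } } }
    ])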
[04:44:12] <dimaj> well... i'm thinking that if it's a rather expensive procedure, i might just leave it the way $addToSet gives it to me
[05:28:18] <dimaj> @Boomtime: one more question... is it possible to inject a static object into an array of each resulting document?
[05:28:44] <swaps> meaning, what do you want to do here?
[05:30:33] <dimaj> so, after i do my aggregate pipeline, I end up with a document that looks like this: '{name: 'some name', runs: [ {date: '2016-03-29', threads: ['thread1', 'thread2']}]}'
[05:30:56] <dimaj> I would like to inject a $literal 'All Runs' as the first element of the '$runs' array
[05:36:54] <Boomtime> @dimaj: the easy way would be to use $setUnion but i think that won't assure order again (cos sets)
[05:37:31] <Boomtime> can you carry that extra information in a separate field?
[05:38:53] <dimaj> no.. i want to have a new "date" that would be interpreted by my web app as "give me results for all dates"
[05:41:02] <Boomtime> wait.. your example looks pretty simple
[05:41:30] <Boomtime> you just want an extra field in the object? - so add in a literal prior to the $group that forms that array
[05:42:31] <Boomtime> field:{$literal:<value>} in the $project btw
[05:43:23] <dimaj> this is my new pipeline: http://pastebin.com/k0PpS1SU
[05:47:44] <Boomtime> ok, i don't think i understand your original question, but whatever works for you
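For reference, a minimal sketch of the $literal-in-$project approach Boomtime mentions (stage placement and field names are assumptions):

    db.results.aggregate([
      // $literal injects a constant, evaluated on the server, so it isn't mistaken for a field path
      { $project: { date: 1, threads: 1, label: { $literal: "All Runs" } } }
      // ...later stages can $push or reference "$label" like any other field...
    ])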
[06:05:59] <YoutubeAPI> Can someone explain why this doesn't save when I use my `id` and `reply` variables? https://gist.github.com/InternetExplorer7/3de11fd6bba592077e61f3ab43d1a476
[06:09:49] <dimaj> @YoutubeAPI: if I were to make a guess, it's because the default behavior of _id is an autogenerated ObjectId.. you might want to modify your model to rename your '_id' to something else like 'id'
[06:10:50] <dimaj> also, my schema looks something like this: new Schema({// my types //}, {collection: 'myCollection'});
[06:11:54] <YoutubeAPI> @dimaj: I knew that _id generated unique IDs per document, are we not allowed to edit the _id field in a document?
[06:12:55] <YoutubeAPI> Oh, you know what I think it is?
[06:13:18] <YoutubeAPI> I think there might possibly be a limit in length.
[06:13:24] <Boomtime> _id is immutable, but it can be set to whatever you like on first insert
[06:14:10] <Boomtime> if it is not present, the driver will invent one for you; if you beat up the driver and force an insert to the server of a document without an _id, the server will invent one for you
[06:14:22] <YoutubeAPI> Is there a limit on how long your _id can be?
[06:14:41] <Boomtime> it is a field like any other, but the index on that field has the usual limit of 1024 bytes
[06:15:42] <Boomtime> it also cannot be an array - a subdocument is also considered bad form, but not expressly denied
[06:16:05] <YoutubeAPI> Right, I'm converting it from an array to a string before insert.
[06:16:24] <YoutubeAPI> Hmm, then how did 1120 length String fit into another field?
[06:16:45] <Boomtime> it is not the field length that matters, as i said, it is the index on that field that matters
[06:17:15] <Boomtime> you can store whatever you like in any field you like, but if you try to index a field the index will only permit values of 1024 bytes to be indexed
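A small sketch of the _id rules Boomtime describes (the collection name and values are hypothetical):

    // a custom _id is fine as long as it is supplied on the first insert
    db.videos.insert({ _id: "comment:abc123", body: "some reply text" })
    // it is immutable afterwards: attempting to change it is rejected with a write error
    db.videos.update({ _id: "comment:abc123" }, { $set: { _id: "something-else" } })
    // and, like any indexed field, the _id index only accepts keys up to 1024 bytes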
[06:17:49] <YoutubeAPI> Right, and that makes sense. Thanks.
[09:39:30] <Ange7> is it possible to measure the time of one query in the mongo shell?
[09:52:51] <Ange7> i'm trying to optimize one aggregation which takes 150 sec with the PHP driver. so i tried with another driver ... but i see that it's very slow in the mongo shell too, so i don't know how to resolve the performance problem
[09:53:38] <Derick> can you share the explain output, and the aggregation pipeline?
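Two common ways to time a query from the shell, as a sketch (collection and filter are hypothetical):

    // 1) ask the server how long execution took
    db.orders.find({ status: "A" }).explain("executionStats").executionStats.executionTimeMillis

    // 2) time the full round trip, which also works for an aggregation
    var t0 = new Date()
    db.orders.aggregate([ { $match: { status: "A" } } ]).toArray()
    print("elapsed ms: " + (new Date() - t0))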
[13:11:16] <gcfhvjbkn> trying to figure out why my shard won't drain; i run removeShard, it is in the "ongoing" stage now, but the number of chunks doesn't decrease; moreover, whenever i run "db.collection.getShardDistribution()" it shows the same doc count on the target shard
[13:11:36] <gcfhvjbkn> is it wrong to think that getShardDistribution can be used to track draining progress?
[13:11:50] <gcfhvjbkn> either that, or my draining has stalled
[13:12:04] <gcfhvjbkn> none of my collections are primary on the target shard btw
[14:15:16] <bjpenn> we had an issue yesterday where mongodb cpu spiked to near 100% usage on all CPUs
[14:15:28] <bjpenn> anyone know how this could happen?
[14:15:40] <bjpenn> i thought usually it would be IOPs taking a hit
[14:58:45] <Ange7> is it possible to do : findAndModify(['key' => $key], ['count' => '$count' + $otherVarCount]) ?
[15:17:26] <Derick> JustMozzy: what is the pipeline supposed to do?
[15:17:54] <JustMozzy> Derick: it is supposed to give me the count of all unique users and an array of all unique userids
[15:18:46] <Derick> easiest would be to just add (After line 20): , 'users' => [ '$addToSet' => '$users' ]
[15:18:59] <Derick> but, doing an unwind, and then an add to set is a bit silly
[15:20:42] <JustMozzy> Derick: just tested to see what would happen if the unwind was removed. it would result in a count=1. I think I got my head wrapped around the unwind functionality now
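A shell sketch of the shape Derick describes, assuming each document carries a users array (collection and field names are assumptions, since the actual pastebin pipeline isn't shown here):

    db.events.aggregate([
      { $unwind: "$users" },                                      // one document per user entry
      { $group: { _id: null, users: { $addToSet: "$users" } } },  // collect the unique user ids
      { $project: { _id: 0, users: 1, count: { $size: "$users" } } } // count them
    ])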
[15:23:00] <bjpenn> anyone know what can cause mongodb to go to 100% cpu?
[15:27:09] <Ange7> is it possible to do : findAndModify(['key' => $key], ['count' => '$count' + $otherVarCount]) ?
[15:27:53] <JustMozzy> bjpenn: mapreduce on a large (20gb+) dataset?
[15:30:06] <bjpenn> JustMozzy: wouldn't that just increase iops?
[15:31:06] <JustMozzy> bjpenn: good question. am no expert ;) but we once had the problem that we fired too many aggregates at the server at once and it went close to 100%
[15:33:15] <Doyle> bjpenn, if your CPU is under provisioned, the compression in WT could cause that. Indexing. An archive gzip dump.
[15:35:46] <Doyle> Here's a MongoDB 3.0 question. Sometimes, during periods of high load, I see the mongodb process using swap space and then getting killed by the OOM killer. I know MMAP shouldn't use swap, but it does. How much swap space should be provisioned on a MongoDB server? Considering it shouldn't use any, I've been giving them 1GB of swap for system use, but it seems that's not enough, or there's a bug.
[15:39:14] <bjpenn> Doyle, JustMozzy: doing db.currentOp() I found a bunch of queries taking a really long time
[15:39:27] <bjpenn> do you think that's a symptom of the high CPU?
[16:17:34] <GothAlice> Doyle: MMAP will use swap; its entire point is to let the kernel handle paging memory on and off disk by using "memory mapped files". If you have more data than will fit in RAM, that process of loading/unloading is "swapping".
[16:24:40] <GothAlice> invapid2: .skip(100).limit(100) roughly. However, please note that skipping requires scanning the index to find where to continue from, making it slower to skip further at roughly O(log n).
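For example, page two of a 100-per-page listing might look like the sketch below; the skipped entries still have to be walked, which is why deep skips get slower (collection name is hypothetical):

    db.items.find().sort({ _id: 1 }).skip(100).limit(100)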
[16:52:33] <Doyle> In situations where your dataset is 1TB+, you can't fit it in ram. The indexes likely won't fit in ram. Would you still limit the swap to 8GB, or give it crazy swap? 256GB swap?
[16:53:04] <GothAlice> Doyle: Neither. Both are incorrect solutions, where sharding is the correct solution.
[16:54:21] <Doyle> So in sharding, you'd want to limit the storage capacity of each shard, to the amount of ram you've got? Ideally, that is. Not give each shard a TB of storage...
[16:55:21] <GothAlice> For very large datasets, it's the indexes that really matter.
[17:09:10] <MacWinner> GothAlice, have you moved to WT?
[17:09:30] <GothAlice> Aye, with the release of 3.2.
[17:11:01] <MacWinner> cool, is there any performance downside to going to WT? the articles I read seem to indicate there are no downsides to WT, but it's confusing to me because of the compression/decompression overhead
[17:11:17] <MacWinner> i guess that overhead is minimal compared to other gains?
[17:12:58] <GothAlice> The compression algorithm used by default is "stupid fast".
[17:12:59] <Doyle> MacWinner, CPU became a bottleneck for me with 4 cores on a test system.
[17:13:21] <Doyle> Instead of disk, which was a first, so not bad.
[17:13:23] <GothAlice> Doyle: But under what workload?
[17:13:34] <Doyle> GothAlice, a stupid workload :P
[17:13:34] <GothAlice> Such stats are meaningless without context.
[17:14:08] <Doyle> Very heavy reads with an index being created in the background
[17:14:32] <Doyle> It wasn't a bad thing, the performance overall was hugely improved with WT over MMAP
[17:14:43] <Doyle> With MMAP disk was always the bottleneck
[17:15:08] <Doyle> And CPU was just hanging out, being a bruh, not doing much
[17:15:37] <Doyle> As far as I've seen, it'd be a strange day when someone didn't benefit from WT
[17:16:18] <Doyle> If your server was running a pentium M... maybe
[17:18:29] <GothAlice> I'm just glad for the restricted choice of compression algo. While I use lzma for my deep archive material (… I'm always adding features to MongoDB before they're added to core …) offering it or something like bzip2 would only result in support tickets about CPU use. XP
[17:18:43] <Doyle> MacWinner, you'll want to match your cores to the average number of active operations under WT for best performance. I noticed that with an 8 thread CPU, performance was best when the active ops didn't exceed 8. The AR/AW columns of mongostat
[17:20:48] <Doyle> LOL, don't give the masses a space shuttle instrument panel when they can barely drive a Fiat?
[17:23:02] <GothAlice> I noticed someone requesting bzip2 on JIRA a while back.
[17:23:37] <GothAlice> That just blew my mind; bzip2 is _terrible_ for speed (gzip wins) and compression ratio vs. speed (lzma thrashes bzip2).
[17:25:12] <Doyle> That's good to know. Noted. One feature I'd like is a delayed drop mechanism.
[17:27:05] <MacWinner> Doyle, I'll keep an eye out on mongostat too.. so right now my workload is primarily writing activity log data to a mongodb.. highly repetitive data.. only recent log data is in the working set.. I see in my test environment that it gets reduced by about 80% on disk. I have another gridfs collection about 175 gigs.. after going to WT, it became about 150 gigs.. but the collection is full of PDFs and PNGs which aren't compressing well..
[17:27:24] <MacWinner> GothAlice, alrighty.. if WT has your stamp on it, i'm going to do it this weekend
[17:27:26] <GothAlice> Another approach, depending on dataset size, would be to have an "expires" field and add a TTL index with the time to live explicitly defined. The data won't likely all get cleaned up the same instant, of course, so the solution would depend on needs.
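A sketch of that TTL approach (collection name and dates are hypothetical):

    // expire each document at the time stored in its own "expires" field
    db.archive.createIndex({ expires: 1 }, { expireAfterSeconds: 0 })
    db.archive.insert({ payload: "some payload", expires: new Date("2016-06-01") })
    // the TTL monitor (which runs roughly once a minute) removes documents once "expires" has passed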
[17:28:36] <GothAlice> MacWinner: Snappy compression is all about speed. If you want improved ratios at the cost of CPU, switch to zlib.
[17:28:53] <GothAlice> But as a note, WiredTiger uses page compression, not record compression.
[17:30:31] <GothAlice> https://docs.mongodb.org/manual/reference/configuration-options/#storage.wiredTiger.collectionConfig.blockCompressor < with a neat trick involving switching compression algos prior to creating specific collections in order to pin the algorithm used per-collection.
[17:30:56] <GothAlice> I.e. you can construct your GridFS collections without compression, then enable compression before constructing the remainder.
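The same per-collection pinning can also be requested explicitly at creation time via createCollection's storageEngine option, roughly as in this sketch (fs.chunks is the default GridFS chunks collection; activity_log is hypothetical):

    // GridFS chunks here hold already-compressed binaries, so skip block compression for them
    db.createCollection("fs.chunks", {
      storageEngine: { wiredTiger: { configString: "block_compressor=none" } }
    })
    // heavier compression for a highly repetitive collection
    db.createCollection("activity_log", {
      storageEngine: { wiredTiger: { configString: "block_compressor=zlib" } }
    })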
[17:55:04] <StephenLynx> I store the file and a compressed version.
[17:55:19] <StephenLynx> then use the compressed one if the client can read it.
[18:19:39] <GothAlice> StephenLynx: In my case it's a bit more convoluted; material older than 6 months that hasn't been accessed within the last 30 days is expensively LZMA'd, replacing the uncompressed version. I do my own full text indexing (also pre-dating MongoDB support for it ;) so searches aren't affected by the "deep archiving".
[18:20:24] <GothAlice> Turns out that even already compressed video is still further compressible (if you're willing to dedicate enough CPU to it ;) due to the repeated packet headers.
[18:24:42] <StephenLynx> yeah, I can afford not to bother that much because content is ephemeral in my scenario.
[18:24:53] <StephenLynx> so there is never too much data and data is never too old.
[18:29:46] <GothAlice> Hehe. Very different than "I haven't manually deleted anything since 2001". XP
[19:04:17] <kgee> I have a (fairly complex) mongodb query that is returning “InternalError: too much recursion” upon first execution. After a second execution, however, the query returns results. db.version() reports 2.6.11. Can someone explain the recursive behaviour? I don't see how my query requires recursion in the first place. http://pastebin.com/j9PNivBT
[20:02:04] <basldex> hi. if I use db.cloneCollection to copy a collection from a host to my current instance, is there any danger of duplicated object ids? (quite important data so I want to be sure)
[20:02:42] <cheeser> no, there's no (practical) risk.
[20:03:10] <cheeser> there's a machine id component to ObjectID as well as time
[20:03:26] <basldex> that's great! thank you very much
[20:10:57] <GothAlice> cheeser: Though it's important to note that with every client driver I can think of, the IDs are generated application-side, not mongod-side, identifying the application worker and not primary.
[20:24:53] <uuanton> hi y'all, anyone know why starting a secondary requires "recover create file /data/local.1 2047MB"? it takes forever
[20:26:11] <GothAlice> uuanton: In order to maintain internal state replicas maintain a "local" database containing such information as the oplog, oplog counters, plus a copy of the replication configuration and current status of all known nodes. If the initial 2GB allocation is too much for you, you can try enabling --smallFiles.
[20:27:14] <uuanton> thanks GothAlice, it creates 10 of these files, a total of 2GB * 10
[20:31:48] <GothAlice> What size oplog are you using?
[20:37:59] <uuanton> i ran db.getReplicationInfo() and got "logSizeMB" : 20502.3212890625
[20:39:30] <mexiwithacan> Can anyone advise me whether it's possible to update and $set a field to a dynamic calculation? I'm trying to do this: https://bpaste.net/show/97792aca060a
[20:44:37] <GothAlice> … that is a massive oplog, uuanton.
[20:46:35] <GothAlice> For one of my at-work datasets, that oplog would preserve 20 years of operations.
[21:22:36] <uuanton> GothAlice I haven't set up anything special, it was the default behavior
[21:28:07] <uuanton> @GothAlice any way to avoid it? smallFiles is not the best option for me because i have 2 databases that are pretty large
[21:32:18] <GothAlice> The deciding factor for oplog size is the duration of _time_ you want to be able to easily recover from vs. operation size. I.e. if you're writing a gigabyte of oplog a day, and you need 48 hours (two days) recovery time, then you'll need a two gigabyte oplog plus a little room for growth.
[21:33:22] <GothAlice> (I.e. as long as a secondary that has become disconnected reconnects to my hypothetical dataset within 48 hours it can "catch up" using the oplog, beyond that period of time it would need to perform a full sync.)
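A shell sketch for checking how much time the current oplog actually covers, which is the number that matters for the sizing rule above:

    var info = db.getReplicationInfo()   // run on a replica-set member
    print("oplog size (MB): " + info.logSizeMB)
    print("window (hours):  " + info.timeDiffHours)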
[21:38:45] <uuanton> makes sense. why is the default size so big?
[21:42:39] <kgee> I’m trying to figure out why my mongodb 2.6.11 console is running out of stack space (InternalError: too much recursion). I’ve boiled it down to two queries, one that works and one that doesn’t. I still don’t understand the root cause though. http://pastebin.com/snGUXLw5
[21:44:10] <kgee> I think the combination of or(and(a,b),and(c,d)) could be causing the issue
[21:47:07] <kgee> If I take the regex out, the same problem occurs. It’s not a large collection (<600 records) so I’m really at a loss
[22:16:28] <mexiwithacan> kgee I ran into a similar problem earlier today with an $and operation.
[22:17:05] <kgee> mexiwithacan: so the query is failing on our production server (v2.6.11) but working on our staging server (v2.6.6)
[22:17:40] <kgee> mexiwithacan: the strange thing is that if I run the query twice, the second time works? Something is very strange
[22:19:16] <mexiwithacan> Searching for "too much recursion" on Google makes it seem like a JavaScript-related issue.
[22:19:30] <kgee> mexiwithacan: well mongodb seems to use a javascript shell
[22:20:11] <kgee> so the trouble must come when the json query gets parsed on the mongodb server. but that’s not transparent to me from my client application
[22:21:30] <kgee> I’ve taken out as much complexity as I can while reproducing the problem. It still happens with boolean comparisons: or(and(a:true,b:true),and(c:true,d:true))
[22:23:03] <kgee> in fact I mean that quite literally: db.getCollection('messagingSummary').find({$or: [{$and: [{a: true}, {b: true}]}, {$and: [{c: true}, {d: true}]}]})
[22:23:15] <kgee> regardless of the schema, that is crashing for me
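For what it's worth, the same predicate can be written without the explicit $and, since conditions listed together in one document are already ANDed; a sketch that may sidestep whatever the shell is tripping over:

    db.getCollection('messagingSummary').find({
      $or: [
        { a: true, b: true },
        { c: true, d: true }
      ]
    })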