[00:01:26] <pikapp> Hello, I have a document that contains an array of objects that contains an array of other objects. I need to loop through these inner-inner arrays and remove a field from each object in the inner-inner arrays. I have successfully looped through using db.collection.find() but how do I $unset them?
[00:02:11] <pikapp> Ultimately, I can access the objects I need using nested loops inside db.collection.find(), but I can't build a query that I can use in db.collection.update()
[00:03:00] <pikapp> Is there a way to perform an update inside the db.collection.find() function without using a query?
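(A minimal sketch of one way to do what pikapp describes, assuming hypothetical field names: "outer" for the array of objects, "inner" for the nested array, and "obsolete" for the field to remove. It iterates with find().forEach(), strips the field in the shell, then writes the rebuilt array back with one update per document.)

    // iterate documents, remove "obsolete" from every object in each nested array,
    // then persist the rebuilt outer array
    db.collection.find({}).forEach(function (doc) {
        doc.outer.forEach(function (entry) {
            entry.inner.forEach(function (item) {
                delete item.obsolete;
            });
        });
        db.collection.update({ _id: doc._id }, { $set: { outer: doc.outer } });
    });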
[00:18:20] <Freman> so, in mongo, I can call db.log_outputs.find({task_id: "550f3b73cd3b101400377d18"}, {_id : 1, date : 1, duration: 1, pid: 1, code: 1, server: 1, failed: 1}).sort({date: -1})
[00:18:45] <Freman> but mongoish can't call LogModelInstance.find({task_id: this.task._id}, '_id date duration pid code server failed', {sort: {date: -1 }, limit: limit}, function(err, docs))
[02:16:15] <Axy> Hey all, I have an object structure like {"_id":"uuid","source":{"link":"http://","desc":"description","content":{"p1":"a","p2":"b"}}}
[02:58:14] <joannac> mantovani: oh, you didn't read the docs
[02:58:15] <joannac> For JSON output formats, mongoexport includes only the specified field(s) and the _id field, and if the specified field is a field within a sub-document, mongoexport includes the sub-document with all its fields, not just the specified field within the document.
[03:04:18] <mantovani> mongodb is very nice for processing a high volume of unique transactions (like querying or updating something)
[03:04:37] <mantovani> but for bulk throughput it is very, very bad
[03:04:55] <mantovani> and it doesn't support pipes; I had to add an extra driver just to do the export/import
[03:05:59] <mantovani> with any other database you can use a named pipe (mkfifo) and ssh or any socket to import/export between different machines without using extra space
[03:26:06] <joannac> mantovani: np. glad we finally got there :)
[03:32:41] <tehgeekmeister> it seems like mongo is spending a whole ton of time in write locks (almost all writes are to the same database, but over many collections). I'm queuing up a fair number (mean is like 50) of writes. mongo is the bottleneck so far in this system, so I need to figure out what to do to optimize it. where should I look?
[03:33:17] <tehgeekmeister> (we're only at 15-60MB/s of write, far from the capacity of the disk most of the time. averaging like 15-20% disk utilization, as measured by iostat.)
[03:54:52] <mantovani> tehgeekmeister: you would love merge update
[06:04:03] <aps> I'm looking for an automated way to backup a prod mongo sharded cluster of data size ~600GB. Is there a script available for this or what is the preferred method to do this?
[06:19:15] <prabampm> Guys, is there any way to restrict MongoDB's RAM usage? I have googled a lot and tried cgroups as well to limit the process memory, but nothing seems to be working. The application becomes dead slow after a few days and we sometimes get a "cannot allocate memory" error. As an interim solution, we are clearing the cache every hour. Any ideas on how to go forward?
[06:26:42] <Boomtime> mongodb uses (by default) memory mapped files - almost all of the memory used for these remains available to other processes to take
[06:27:06] <Boomtime> when this out-of-memory condition occurs, can you capture the output from ps -aux?
[06:44:10] <prabampm> Boomtime: thanks, I'll do that
[07:21:51] <Jonno_FTW> i need help with a mapreduce. I have a collection of documents with readings:[{count:1234,id:12},...], and I want to know the proportion of all readings with a count > 2000; readings has a different length in each document
[07:26:14] <Boomtime> you probably can't get a "proportion" in a single query, but you can get a count of the number of documents with that criteria easily enough
[07:26:48] <Boomtime> sorry, a count of the number of array entries in total with that criteria
[07:27:33] <Boomtime> i would suggest a sequence like this: $match, $unwind, $match, $group
[07:27:54] <Boomtime> 1. $match filter to those documents containing at least one array entry with count > 2000
[07:28:05] <Boomtime> 2. $unwind the readings array
[07:28:21] <Boomtime> 3. $match the same criteria again, to filter out those unwound which don't match
[07:28:29] <Boomtime> 4. $group all, note the count
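(A minimal sketch of the four-stage pipeline described above, using the "readings"/"count" field names from the question; to turn the count into a proportion you would still need the total number of entries from a separate $group or query.)

    db.collection.aggregate([
        { $match:  { "readings.count": { $gt: 2000 } } },  // 1. only docs with at least one qualifying entry
        { $unwind: "$readings" },                           // 2. one document per array entry
        { $match:  { "readings.count": { $gt: 2000 } } },  // 3. drop unwound entries that don't qualify
        { $group:  { _id: null, matching: { $sum: 1 } } }  // 4. count what's left
    ]);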
[07:53:57] <pschichtel> Hi! Is there a mongodb equivalent to MySQL's "INSERT INTO table (id, number) VALUES (1, 10) ON DUPLICATE KEY UPDATE number=number+10"?
[07:58:07] <Derick> pschichtel: if you want auto-generated unique IDs for documents, just rely on the default _id generation
[08:03:06] <pschichtel> Derick: no I don't. My collection holds objects with a unique key and some counters. I want to either insert a new object if the key is not found or increment one of the counters by one
[08:18:21] <Boomtime> be aware that an update matched against a unique indexed field with upsert:true has a most curious possibility (due to concurrency) of generating a duplicate-key exception - be prepared to re-try the operation in that event
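(A minimal sketch of the upsert-with-$inc pattern being discussed, assuming a hypothetical collection "counters" with a unique index on "key"; $inc both increments an existing counter and seeds it when the upsert inserts.)

    db.counters.update(
        { key: "some-unique-key" },   // matched against the unique indexed field
        { $inc: { counter: 10 } },    // adds 10 if the doc exists, creates it with counter: 10 otherwise
        { upsert: true }
    );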
[08:18:50] <burhan> are writes guaranteed/immediate in mongodb?
[08:20:19] <Boomtime> you want to define both of those terms - for write durability guarantees, you should read up on WriteConcern - the word 'immediate' is difficult to even define in a network environment, so what do you mean by that?
[08:20:55] <Boomtime> if you have received a success reply to a write that you issued, then the write is definitely visible to all clients
[08:21:13] <Boomtime> however, it is entirely possible that clients see the write result before you receive the reply confirming it
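(For illustration, a minimal sketch of attaching an explicit write concern to a single insert in the shell; the collection and field names are hypothetical.)

    db.events.insert(
        { type: "scheduled-write", at: new Date() },
        { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }  // wait for majority ack and journaling
    );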
[08:22:08] <burhan> in my application, there are scheduled writes (either an update or an insert) for documents, and I want to make sure that these writes are immediately available to all clients. Right now there is only one node, but eventually there will be many read-only nodes. My concern is how to prevent the read-only nodes from having stale data?
[08:24:58] <Boomtime> reading from secondaries means you read stale data - this is a fact, and it is physically impossible to avoid since there is an undefinable delay between primary and secondary - if you want always-absolutely-guaranteed-up-to-date data then you read from the primary only - which is the default mode for this reason
[08:25:33] <Boomtime> under most circumstances, the secondaries are milliseconds-at-most behind the primary
[08:25:53] <Boomtime> but if those milliseconds mean something to you, then you must not read from them
[08:26:17] <Boomtime> if you want to scale your reads, you should look at sharding
[08:26:31] <Boomtime> replica-sets are for redundancy and durability
[08:27:21] <Boomtime> this says it better than i ever could: http://askasya.com/post/canreplicashelpscaling
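(A minimal sketch of choosing the read preference explicitly per query in the shell; "primary" is the default behaviour being described above, and the collection/field names are hypothetical.)

    db.orders.find({ status: "open" }).readPref("primary");            // always-up-to-date reads
    db.orders.find({ status: "open" }).readPref("secondaryPreferred"); // tolerates slightly stale data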
[08:41:17] <pschichtel> Derick: I got my stuff running now, thank you.
[08:57:51] <burhan> okay, that one was easy - the really complicated one that I am trying to figure out is how to do a query like "if any of the sub-documents in a collection match a condition"
[08:59:00] <burhan> I have a document describing customers, and a list of other documents describing purchases. So I need to find all customers that have at least one purchase where the product code is 543.
[09:05:22] <Derick> two queries: one to find all the customer "ids", and one to find the extra customer info
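(A minimal sketch of that two-query approach, assuming hypothetical collections "purchases" and "customers" where each purchase stores a customer_id and a product_code; with purchases embedded in the customer document, a single find on "purchases.product_code" would do instead.)

    // 1. collect the ids of customers with at least one purchase of product 543
    var ids = db.purchases.distinct("customer_id", { product_code: 543 });
    // 2. fetch the customer documents themselves
    db.customers.find({ _id: { $in: ids } });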
[09:06:05] <burhan> is that recommended? I thought it was better to have a nested document rather than multiple documents.
[09:07:25] <Derick> how many purchases are you going to get? that's an array that's going to grow and grow (hopefully)
[09:07:50] <Derick> perhaps you might hit the 16mb limit? and, if you're using the mmap storage engine you end up moving the document around a lot too
[09:08:13] <Derick> nested documents like this are mostly ok if it's not data that's going to change a lot - I suppose
[09:08:35] <burhan> change means "added on" or "updated" - or it doesn't matter?
[09:09:57] <Derick> added on perhaps more than updated - updating nested documents in more than two levels is not easy to do
[09:51:35] <bilkulbekar> I have a mongo sharded cluster, with two shards, three replicas in each, is it a good thing to have mongos running on each of it?
[10:22:57] <burhan> is there a preferred GUI for mongodb?
[10:24:27] <_rgn> robomongo is pretty good i guess
[11:07:57] <Axy> Hello channel. I have my documents in this structure: http://pastebin.com/Te1KqJWS -- I want to be able to pull the URLs of the last "image" - how do I pick the last object in the array via find?
[11:08:01] <bilkulbekar> I have a mongo sharded cluster, with two shards, three replicas in each, is it a good thing to have mongos running on each of it?
[11:08:06] <Axy> I tried slice without success, maybe someone can direct me
[11:09:32] <Axy> I can do .find({},{"images.url":1}) and only return URLs, but that is not for the "last image" only
[11:09:39] <Axy> getting the url of the last image is what I need
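(A minimal sketch using a $slice projection to return only the last element of the array; the field name "images" follows the paste above, and the url can then be read off the single returned element.)

    // returns each document with only the last entry of its images array
    db.collection.find({}, { images: { $slice: -1 } });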
[11:16:27] <BlackPanx> is there a way to speed up the mongodump process?
[12:18:18] <Derick> first: always back up all of your data before upgrading MongoDB.
[12:18:41] <Axy> Derick, I did mongodump - that's enough?
[12:18:50] <Derick> secondly, you should be able to : stop mongodb; put new binaries in place; start mongodb — providing you have a single node, and no other complicated setups
[12:19:08] <Derick> Axy: you should always verify your backups - but yes, that should be enough
[12:19:17] <Axy> no my setup was pretty straightforward, I followed a tutorial (I'm very new with all this)
[12:22:54] <chombium> i thought so. in the upgrade docs i've read that i need to upgrade first to 2.2, then 2.4, 2.6 and then to 3.0. i started with this http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/ but I cannot find the packages for the older releases. is there some archive?
[12:23:54] <Derick> chombium: but I doubt there are ubuntu packages for these old releases
[12:37:08] <chombium> Derick: Thanks for the link. it seems like a good starting point. unfortunately there are no packages, and I couldn't find a PPA either. so if I manually upgrade it to 2.2 and so on, the data should be preserved without any problems, right?
[12:37:48] <d-snp> chombium: I'd use docker images to do the upgrade
[12:38:06] <d-snp> find or build docker images of each of the mongo versions
[12:38:21] <d-snp> and then just mount your database (after backup ofc) in them one after another
[12:38:32] <d-snp> shouldn't take you more than a few hours
[12:39:20] <d-snp> you need a more recent version of ubuntu to run docker though
[12:40:56] <chombium> d-snp I don't have docker. it's a root server. from the last test i did this works: apt-get install mongodb-10gen=2.2.0 it seems that there are packages after all
[13:01:20] <Axy> d-snp, I followed http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/ but it's still the old version of mongod when I start the service with sudo
[13:12:15] <Axy> Derick, d-snp I followed the tutorials again to update my mongod from 2.6.6 to latest, I'm stuck here http://docs.mongodb.org/manual/tutorial/upgrade-revision/#upgrade-replace-binaries at step 3
[13:12:27] <Axy> how do I "replace the existing mongodb binaries with the downloaded binaries"
[13:13:11] <Derick> Axy: how did you install the original binaries?
[13:31:00] <Axy> Anyway, thank you for the directions so far d-snp and Derick :]
[14:03:31] <watmm> Hey, would anyone have some suggestions as to how to avoid an uncaught MongoConnectionException with message Failed to connect to: localhost:27017: Connection refused when pm.max_children is reached?
[14:42:20] <iszak> Is there a limit to the $in function? e.g. 3.5 million limit?
[14:44:35] <Derick> iszak: 3.5 million? sounds like you're hitting the maximum document size of 16MB there
[14:44:59] <iszak> Derick: it's a $match $in aggregate, so it's to filter out results, I wouldn't assume so?
[14:45:18] <Derick> the query is a document that is sent to the server
[14:47:32] <Derick> but it also wouldn't matter, as it would apply to the uncompressed document
[14:47:57] <iszak> so if I have all this data in mongodb how do I get access to it with a 16mb limit?
[14:48:12] <Derick> well, each *document* can be 16mb
[14:48:34] <Derick> so it's not a problem for the returned result
[14:48:40] <Derick> it's a problem for the query you send to the server
[14:49:04] <Derick> 3.5 million elements in $in is going to be roughly 16mb of data - that has to be sent, as a document (with a 16mb limit), to the server
[14:50:09] <iszak> maybe I misunderstood the 16mb limit, is there some reading on this?
[14:56:03] <Derick> It is expected that WT becomes the default in 3.2
[14:56:51] <Derick> most of the MMAP engine limitations are OS limitations though... not much we can do about it. But, you can scale to multiple shards to get around that.
[15:58:25] <tehgeekmeister> in an aggregation, how do I say "make this field on the result be set this value, if it exists on the input document, or some default value otherwise"?
[16:02:25] <Derick> tehgeekmeister: you do that in your application after you've retrieved it
[16:03:21] <tehgeekmeister> hmm, for our use case that is impractical. i can do one aggregation pipeline pass, and a second collscan to reformat the output documents.
[16:03:38] <tehgeekmeister> I can't modify the consumers at this point
[16:03:51] <Derick> you can, but you should do it in your app / API end point
[16:04:00] <Derick> do the consumers talk directly to your DB?
[16:04:19] <tehgeekmeister> yes, this is part of a data analytics pipeline
[16:04:29] <tehgeekmeister> there are many many analytics flows that consume these collections
[16:04:33] <tehgeekmeister> hence, changing them all is a pain
[16:04:43] <Derick> aggregation pipeline it is then
[16:06:08] <tehgeekmeister> so, if i understand correctly, you're saying that a single pass of the aggregation pipeline will be difficult to use to accomplish the full goal?
[16:06:15] <tehgeekmeister> that'd save me a lot of time to know
[16:06:21] <tehgeekmeister> so i just want to make sure I understand
[16:06:50] <Derick> i think you would need two pipeline stages
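(A minimal sketch of the kind of two-stage pipeline being suggested, using $ifNull in a $project stage to substitute a default when the field is missing or null; "someField", "default-value", and the grouping key are hypothetical stand-ins.)

    db.collection.aggregate([
        { $project: { someField: { $ifNull: ["$someField", "default-value"] } } },  // default when absent/null
        { $group:   { _id: "$someField", n: { $sum: 1 } } }
    ]);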
[16:09:26] <tehgeekmeister> so the gotcha might be in this detail, we select out like four attributes, and group on those. then, the project would be merging those into a single string, that becomes the new _id
[16:09:35] <tehgeekmeister> yeah, overloaded terminology here
[16:27:24] <Siamaster> what I don't like about contactrequest being in its own collection is
[16:27:36] <Siamaster> I will need to have 2 additional fields
[16:27:53] <Siamaster> another _Id and a sender and a receiver
[16:28:31] <Siamaster> and then always query on sender / receiver when I need to get, for example, all the pending requests received by a user
[16:29:16] <Siamaster> when having an embedded collection I can have a boolean indicating if the request was sent/received
[16:29:47] <Siamaster> but then I have to remember to always update two fields when updating it
[16:31:03] <Siamaster> hmm no, I think I'll go with a separate collection; it feels really weird having duplicated message/date and stuff
[18:25:03] <Torkable> is there a way to check for "updatedExisting" on a findAndModify?
[18:25:31] <Torkable> I want to know if the upsert did an update or insert
[18:25:58] <Torkable> on update it seems to return a nice status, but findAndModify returns the doc
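(A minimal sketch of checking updatedExisting by issuing the findAndModify command directly, which returns lastErrorObject alongside the document; the collection and field names are hypothetical.)

    var res = db.runCommand({
        findAndModify: "mycoll",
        query:  { _id: "some-key" },
        update: { $inc: { n: 1 } },
        upsert: true,
        new:    true
    });
    // res.lastErrorObject.updatedExisting is true for an update, false when the upsert inserted
    printjson(res.lastErrorObject);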
[18:45:08] <hahuang65> joannac: mind if I ask you a quick question regarding CS-21221
[18:47:37] <hahuang65> joannac: :) I posted it on the ticket, nm :D
[19:20:51] <pokEarl> Anyone with some Java/(MongoJack) experience that minds helping me out? https://www.reddit.com/r/learnprogramming/comments/3bnz0y/javamongojack_having_trouble_making_a_pojo_from/ Having trouble going from MongoDB and back to Java :(
[20:01:01] <StephenLynx> anyone here having issues installing mongo from http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/ on trisquel?
[20:01:10] <StephenLynx> does it work on trisquel? it uses apt-get too.
[20:08:52] <rdw> morning. is there a way to make Morphia return JSON API format?
[23:42:28] <Freman> PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 80 bytes) in /Users/shannon/Development/PHP/Eventing/Eventing.php on line 139
[23:45:54] <Freman> the only difference between the php in pastebin and php in foo.php is <?php and the actual connect url (shrub.logging.net is not the real servers name)
[23:45:59] <Boomtime> and this happens with documents that are close to the bson limit ?
[23:46:28] <Freman> possibly, there are definitely documents there that are at that limit
[23:46:37] <Boomtime> that is probably important - but i notice the amount of memory it has "allocated" seems to be incredibly low
[23:47:09] <Freman> yeh it jumps all over the place, and it doesn't die on the same records
[23:48:08] <Boomtime> allowed to use only ~130MB though? given that streaming just one large document in will take at least 20MB, and the cursor will start receiving the next one before you even get to it...
[23:49:45] <Boomtime> not too surprising, the document sizes still mean there are tens of megabytes involved
[23:50:24] <Boomtime> the fact is you're manipulating very large documents in a very small space - why is the available memory so low?
[23:50:52] <Boomtime> legitimate use of the memory would easily put you over that amount
[23:51:29] <Freman> because it's my dev env (production it's set to 512 but I don't like to push that because there's 200 php processes per machine in production)
[23:52:50] <Freman> so you're saying the library is streaming more data in the background than I'm consuming and its garbage collection is cleaning up?
[23:53:57] <Freman> my understanding leads me to believe there should be no more than 2 or 3 documents loaded at once, which 128 meg should easily accommodate
[23:54:16] <Boomtime> what i'm saying is that if you know that the library is going to need ~40MB just to be able to provide the minimal ability to iterate a cursor... having only 130MB available sounds like pushing your luck
[23:54:49] <Boomtime> you're expecting the PHP garbage collector to be awesome at its job; turns out it might not be
[23:57:21] <Freman> (the increase of line numbers is a counter variable and an ini_set(), nothing more)
[23:57:36] <Boomtime> can you keep going as a test? just to see what it takes to stabilise
[23:58:20] <Boomtime> if there is no upper limit then you should be able to construct a simple dataset, and iterate loop, that shows the problem - then you raise a driver bug
[23:59:00] <Boomtime> if what you're showing is simply an inefficiency of memory re-use in the zend engine, then it's not really anything we can do about it
[23:59:27] <Freman> http://pastebin.com/JjJm4AKM are the documents it slows down on