PMXBOT Log file Viewer


#mongodb logs for Friday the 10th of August, 2012

[02:59:25] <cedrichurst> when doing a mapReduce and reducing to an existing collection with the same keys, is it possible to preserve the fields outside of value
[02:59:48] <cedrichurst> for example, if I have db.collectionA with {_id: 1, foo: 'bar'}
[03:00:15] <cedrichurst> and i do a db.collectionB.mapReduce(m, r, {out: {reduce: "collectionA"}})
[03:00:47] <cedrichurst> collectionA always ends up with {_id: 1, values: {buzz: 'blah'}}
[03:01:01] <cedrichurst> with the original foo field omitted
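A minimal sketch of the behaviour cedrichurst describes, with hypothetical map/reduce functions: with {out: {reduce: ...}} the existing document is re-reduced and then rewritten as {_id, value}, so fields like foo that the reduce step doesn't produce are lost.

    // collectionA already holds {_id: 1, foo: 'bar'}
    var m = function () { emit(this.targetId, {buzz: this.buzz}); };   // hypothetical map
    var r = function (key, values) { return values[0]; };              // hypothetical reduce
    db.collectionB.mapReduce(m, r, {out: {reduce: "collectionA"}});
    db.collectionA.findOne();   // {_id: 1, value: {buzz: 'blah'}} -- foo is gone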
[07:49:57] <trupheenix> i have a document which looks like this {"a":"1", "b":{"c":2,"d":3}}. how would i use findAndModify to append values for "b"?
[07:50:24] <[AD]Turbo> hola
[08:05:39] <ron> trupheenix: well, the find part should be easy enough, right?
[08:10:14] <trupheenix> ron, yes
[08:10:25] <trupheenix> but how would i add to "b"?
[08:10:39] <ron> $push
[08:11:26] <trupheenix> ron but "b" is not a list. it is a document itself?
[08:11:37] <trupheenix> "b" is not an array
[08:11:41] <ron> oh right, my bad.
[08:12:11] <ron> you just want to add values to b?
[08:14:33] <ron> trupheenix: ?
[08:15:25] <ron> you can probably do something along the lines of {$set : {"b.newfield" : "value"}}
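A minimal sketch of ron's suggestion, using the field names from trupheenix's example (collection name hypothetical):

    db.coll.findAndModify({
        query:  {a: "1"},
        update: {$set: {"b.e": 4}},   // add a new key inside the embedded document "b"
        new:    true                  // return the modified document
    });
    // => {"a": "1", "b": {"c": 2, "d": 3, "e": 4}}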
[08:37:14] <trupheenix> ron along the lines?
[08:37:26] <ron> yup
[08:42:05] <exi> hi
[08:42:19] <Zelest> o/
[08:43:22] <exi> does anybody know whether the mongodb balancer rebalances my shard if i add a maxSize field in the config db some time after adding the shard?
[08:43:43] <exi> i am trying it right now but the balancer seems to ignore the newly set maxsize
[08:46:32] <exi> even after restarting mongos and the config server, there is no rebalancing taking place according to the new maxSize :(
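For reference, maxSize is stored (in megabytes) on the shard's document in the config database; a hedged sketch of setting it after the fact, run against a mongos (the shard name is hypothetical):

    use config
    db.shards.find()                                                 // note the _id of the shard
    db.shards.update({_id: "shard0000"}, {$set: {maxSize: 20480}})   // cap at ~20 GB

As far as I know, maxSize only stops the balancer from placing additional chunks on a shard once its mapped data exceeds the limit; it does not actively move existing chunks off, which may be why no rebalancing is observed.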
[08:51:57] <ron> trupheenix: anything?
[09:11:54] <trupheenix> ron, i still didn't understand how i can append to the document of 'b'
[09:12:18] <ron> trupheenix: did you try the example I gave you?
[09:13:19] <trupheenix> ron no
[09:13:26] <NodeX> lol
[09:13:48] <ron> NodeX: you find it funny, I just don't know how to react.
[09:13:50] <trupheenix> ron u said along the lines of. so i didn't assume it to be definite.
[09:13:59] <NodeX> I have just joined the chan
[09:14:05] <NodeX> well logged into my bnc
[09:14:16] <ron> I see.
[09:14:19] <NodeX> what's da problem?
[09:14:53] <ron> no problem.
[09:16:23] <trupheenix> NodeX supposing i have a document like {"a":"1", "b":{"c":2,"d":3}}, how would i update "b" ?
[09:16:33] <NodeX> which part of "b"?
[09:16:53] <trupheenix> NodeX, i want to add to the document in "b"
[09:17:17] <NodeX> db.foo.update({some_criteria},{$set : { "b.c":"This is C's new value"}});
[09:17:22] <NodeX> or
[09:17:32] <ron> grr.
[09:17:35] <NodeX> db.foo.update({some_criteria},{$set : { "b.e":"This is A new element to B"}});
[09:17:46] <NodeX> which I am sure ron would've pointed out
[09:18:18] <ron> no, no, instead of just trying it, he chose to ignore it. which makes sense.
[09:18:28] <NodeX> yeh, that makes sense lOLOLOL
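For reference, applied to the document trupheenix posted, NodeX's two updates would roughly do this (collection name and criteria are hypothetical):

    // before: {"a": "1", "b": {"c": 2, "d": 3}}
    db.foo.update({a: "1"}, {$set: {"b.c": "This is C's new value"}});   // overwrite an existing key of b
    db.foo.update({a: "1"}, {$set: {"b.e": "a new element in B"}});      // add a brand new key to b
    // after:  {"a": "1", "b": {"c": "This is C's new value", "d": 3, "e": "a new element in B"}}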
[09:19:15] <NodeX> I notice alot more people from india are taking up the mongo quest
[09:19:28] <ron> 'a lot'.
[09:19:39] <NodeX> (india the country not the race)
[09:19:58] <ron> NodeX: http://hyperboleandahalf.blogspot.com/2010/04/alot-is-better-than-you-at-everything.html
[09:22:01] <NodeX> lulz
[09:22:06] <NodeX> I dunt c4r3
[09:22:22] <ron> damnit, stop being 12.
[09:22:36] <NodeX> I never knew how to spell a-lot
[09:22:48] <NodeX> and I could care less but I wont ;)
[09:25:28] <trupheenix> NodeX, ok thanks! it works :)
[09:26:08] <NodeX> I know !!
[09:28:36] <NodeX> ron taught me everything I know
[09:28:48] <NodeX> so you can send him a beer/cookie
[09:28:56] <ron> lies.
[09:35:10] <remonvv> Hi all, anyone know what this is : Fri Aug 10 11:29:38 [Balancer] moveChunk result: { errmsg: "can't find index in storeCurrentLocs", ok: 0.0 }
[09:35:10] <remonvv> Fri Aug 10 11:29:38 [Balancer] balancer move failed: { errmsg: "can't find index in storeCurrentLocs", ok: 0.0 }
[09:36:33] <NodeX> I would hazard a guess it's an error message :P
[09:36:55] <ron> remonvv: you're hiring!
[09:37:00] <NodeX> </end-sarcasm>
[09:37:15] <remonvv> I am! And yeah that's the default, non-editable text of LinkedIn
[09:37:26] <ron> remonvv: hire me!
[09:37:26] <remonvv> NodeX don't make me internet punch you son
[09:37:28] <remonvv> :s
[09:37:43] <ron> NodeX: dude, he's like huge.. you'd better watch out.
[09:37:44] <ron> :D
[09:38:00] <remonvv> Apparently it's some error regarding it not being able to find the shard key index on a node
[09:38:05] <remonvv> How does that even happen
[09:38:24] <ron> magic.
[09:38:46] <ron> dude, you're like the cto.. you should answer that yourself.
[09:38:50] <remonvv> NodeX, don't worry, I'm a runner, not a puncher
[09:39:01] <remonvv> And I wear a lot of pink
[09:39:04] <remonvv> And I'm a pacifist
[09:39:17] <remonvv> CTO is code for "I delegate everything"
[09:39:23] <NodeX> haha
[09:39:40] <ron> remonvv: you wear a lot of pink? o_O
[09:41:28] <trupheenix> NodeX is there a limit on the size of the embedded document in a document?
[09:42:13] <ron> yes, NodeX, is there?
[09:42:50] <ron> remonvv: do you want to hire me? I'm like.. really good.
[09:44:15] <remonvv> trupheenix, total document size is 16mb, there's no limit to an embedded document size other than that it has to fit in the parent document (which, again, can at most be 16mb)
[09:44:31] <remonvv> If you run into situations with 16mb embedded documents your schema is probably broken though
[09:44:42] <remonvv> Okay apparently my issue is a stale mongos lock.
[09:44:50] <remonvv> How on earth does that not automatically recover.
[09:45:24] <trupheenix> remonvv, yes schema is definitely broken then!
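The 16mb cap applies to the whole BSON document; a quick way to check how close a document is, sketched in the shell (collection and _id are hypothetical):

    var doc = db.coll.findOne({_id: 1});
    Object.bsonsize(doc)    // size in bytes; must stay under 16 * 1024 * 1024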
[11:45:20] <jhsto> can someone look at this and say what gives: git://gist.github.com/3313738.git
[11:45:48] <jhsto> heres the url link https://gist.github.com/3313738
[11:46:33] <jhsto> i installed some new version of node today, and i wonder if thats what messed it up
[11:59:22] <amitprakash> Hi.. while I can easily search for a value in array in a collection, how can I search for value from a list in array in collection .. i.e.{listfield: $in: valarray} instead of {listfield: val}
[12:04:18] <amitprakash> nm
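For the record, the query amitprakash was after just needs the $in wrapped in its own document; a minimal sketch with hypothetical names:

    // matches documents whose listfield array contains ANY value from valarray
    var valarray = ["a", "b", "c"];
    db.coll.find({listfield: {$in: valarray}});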
[12:09:22] <PDani> hi
[12:11:49] <PDani> I'm working on a feature in mongodb (https://jira.mongodb.org/browse/SERVER-6748), and I'd like to store a property for every database, describing a hint to the memory manager based on typical read-access behaviour on mmapped files (random or sequential). I'm thinking of a db.system.config namespace which could contain a { _id: "madvise", value: "RANDOM" } document.
[12:12:31] <PDani> is there any better solution for persisting property of databases?
[12:13:55] <PDani> I basically would like to open every MongoDataFile with a madvise on MMF if this property exists
[12:18:23] <remonvv> you can't store it in a collection, that's a chicken and the egg issue.
[12:19:08] <PDani> remonvv: yes, but madvise can happen later, after opening all existing files
[12:20:40] <PDani> another solution would be to make a madvise decision on-the-fly based on the pagefaults/accesses in a given database
[12:20:51] <PDani> in this case i shouldn't persist this information at all
[12:20:57] <remonvv> I'm sure but it's questionable to store database configuration in a manner that requires an initialized database.
[12:22:01] <remonvv> To be honest I think you should await 10gen feedback before doing anything really. Have the discussion before doing the work.
[12:22:23] <remonvv> It's pretty unlikely they'll accept an outside pull for something that crucial.
[12:23:11] <remonvv> By the way, in your issue : I was able to query with ~3MB/sec net, while I had only ~4MB/sec on the EBS storage
[12:23:18] <remonvv> That sounds like it got worse rather than better. Wrong numbers?
[12:23:47] <PDani> remonvv, it got better
[12:24:30] <PDani> remonvv, with readahead, I was able to query with 1.5megs/sec and there was a huge overhead on ebs read, which read with 40mb/sec in the meantime
[12:24:41] <PDani> and 100% utilization
[12:24:45] <PDani> on ebs
[12:25:31] <PDani> in the MADV_RANDOM case, there was only a few percent utilization on EBS, and better throughput on client-side
[12:28:34] <Derick> A chick-fil-a chicken?
[13:07:58] <NodeX> ./update.sh
[13:08:00] <NodeX> oops
[14:01:05] <ron> Hmm, are Jongo's developers in the house?
[14:22:08] <remonvv> PDani, I'm still not sure if that's an optimization. I'd rather have MongoDB swap more into memory rather than exactly what it needs.
[14:25:52] <PDani> remonvv: it definitely is an optimization in my case (fully random read with a lot more data than memory). a) usually storage bandwidth costs money, so useless reads are expensive; b) readahead uses up storage bandwidth, other reads must wait. It's a special need of course, and maybe not the typical usage scenario, but in my case, it's a big gain :)
[14:30:05] <cedrichurst> anyone here have experience with mongo-hadoop?
[14:31:09] <PDani> bye
[14:31:19] <ron> gladly no, hopefully never, unfortunately sometime in the future.
[14:32:02] <cedrichurst> why do you say that?
[14:35:08] <ron> hadoop is evil. most people just fail to admit it yet.
[14:35:41] <cedrichurst> is there a better way to do mapreduce against mongo?
[14:35:51] <cedrichurst> the native mapreduce functions leave a lot to be desired
[14:36:15] <cedrichurst> at least that's what i've gathered from the last 2.5 days trying to get it to work the way i need it to
[14:36:21] <ron> haha
[14:36:24] <kali> cedrichurst: massive batch processing is not mongodb sweet spot
[14:36:37] <cedrichurst> understood
[14:36:37] <ron> well, there's little room for comparison between mongo's m/r and hadoop's.
[14:37:02] <cedrichurst> but i'm hunting for some solution to my problem
[14:46:27] <cedrichurst> i also tried group but apparently that can't handle more than 20k keys
[14:47:52] <kali> no, group is no good
[14:48:29] <kali> cedrichurst: mongodb m/r is the only thing that will work on big datasets
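A minimal sketch of the kind of native map/reduce kali means, here counting documents per key (field and collection names hypothetical); unlike group, it isn't subject to the 20k unique-key limit cedrichurst hit:

    db.collectionB.mapReduce(
        function () { emit(this.category, 1); },                 // map
        function (key, values) { return Array.sum(values); },    // reduce
        {out: {inline: 1}}                                       // or write to a collection instead
    );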
[14:48:46] <kali> cedrichurst: i'm constantly moving data between mongodb and hadoop too
[14:48:57] <cedrichurst> are you using mongo-hadoop?
[14:50:16] <kali> nope, import/export or ad-hoc insertion
[14:50:25] <cedrichurst> yeah, that's what i was hoping to avoid
[14:51:03] <cedrichurst> my issue with mongo-hadoop is that the 1.0.0 release in maven central is lacking the MongoLoader class
[14:51:07] <cedrichurst> it only has MongoStorage
[14:51:17] <cedrichurst> and my attempts to build trunk manually have been unsuccessful
[14:51:29] <cedrichurst> all sorts of sbt issues
[14:51:45] <cedrichurst> so i was wondering if there was a SNAPSHOT jar being built somewhere out on a jenkins box, etc
[14:51:47] <kali> sbt ? it's not maven ?
[14:51:55] <cedrichurst> nope, sbt
[14:51:57] <cedrichurst> :)
[14:52:13] <cedrichurst> which is fun because i'm now three levels removed from the actual problem i'm trying to solve
[14:52:23] <cedrichurst> mongo -> hadoop/pig -> sbt
[14:52:57] <cedrichurst> it's like the movie inception where i'm stuck in a dream within a dream
[14:53:48] <cedrichurst> does mongo maintain some sort of ci box somewhere?
[14:54:58] <Derick> yes, but it's not public afaik
[14:56:05] <cedrichurst> does anyone here have access to it and, if so, can they check to see if mongo-hadoop is built there
[14:56:18] <cedrichurst> and, if so, can they find out if there's a snapshot repository the build pushes to?
[15:02:07] <apetrescu> What's the proper way to increase mongo-java-driver's connection pool size? Is it enough to just do "mongo.getMongoOptions().connectionsPerHost = 100" or whatever? How does that work; does it check that variable when grabbing a connection to decide whether to add a new connection to the pool or block for one to be released instead?
[15:24:41] <cedrichurst> apetrescu: i believe that's the proper way
[15:31:54] <apetrescu> cedrichurst: so then it doesn't immediately create the new connections?
[15:36:49] <apetrescu> It just *allows* the connection pool to grow to the larger size?
[15:36:56] <apetrescu> (Just confirming that my mental model is correct)
[16:48:22] <RLa> if you had blog posts with comments and votes how would you add a vote to a comment?
[16:48:50] <RLa> adding a comment would be { $push : { comments : { "..." } } }
[16:50:26] <ron> depends on how you want to save the votes for a comment :)
[16:51:51] <RLa> as array
[16:51:58] <RLa> this is just for example
[16:55:45] <RLa> so i have docs like { title: "A blog post", comments: [ { text: "A comment ...", votes: [ { value: 5 } ] } ] }
[16:56:59] <RLa> also trying to figure out how it maps to mongoose
[17:01:14] <RLa> ron, in my case $push: { 'comments.0.votes': { value: 5 } } would work?
[17:02:00] <ron> RLa: $push implies a list, whereas votes imply a counter.
[17:02:34] <RLa> oh, its not a counter
[17:02:54] <RLa> just a list of objects containing vote value
[17:03:04] <RLa> i couldn't come up with better example
[17:03:19] <ron> well, try it and see :)
[17:07:00] <souza> Hello guys, i'm having problems accessing an object inside an array. this is my code that reads the document: http://pastie.org/4450738 | my print shows a blank line after the "TEST" word, but i have that field in mongodb and some data inside it, ideas?
[17:08:03] <RLa> ron, just noticed i actually use sub-docs for comments so i can refer a comment using its _id
[17:08:23] <RLa> that makes things way way easier
[17:08:27] <ron> heh
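With comments stored as subdocuments that carry their own _id, adding a vote can use the positional operator; a hedged sketch (variable names hypothetical):

    db.posts.update(
        {_id: postId, "comments._id": commentId},     // match the post and the comment within it
        {$push: {"comments.$.votes": {value: 5}}}     // push onto that comment's votes array
    );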
[17:16:46] <souza> anyone had worked with C?
[17:17:39] <RLa> i use c++ daily
[17:17:57] <RLa> make sure you have debugger installed
[17:21:12] <RLa> souza, you should update paste with example doc
[17:21:14] <kali> souza: distribution is misspelled.
[17:22:37] <souza> kali: sorry, but in mongo it's misspelled too!
[17:22:39] <souza> :S
[17:24:17] <souza> RLa: sorry, but this is the code that i'm using, I didn't find any example at the mongodb site or git or documentation, it's too out of date!
[17:25:09] <kali> souza: show us the docs you're trying to match...
[17:28:09] <souza> kali: this is my full code: http://pastie.org/4450849 | i can get the document, i know because i can print it with the bson_print() method. my problem is getting the values that are inside this BIG object that contains some arrays and other objects. My problem right now is: how can i iterate over an array, getting the values from the objects inside
[17:39:30] <souza> i've posted this question in stackoverflow > http://stackoverflow.com/questions/11906620/iterate-array-mongodb-c
[18:53:56] <flok> what would be faster: querying a separate table which keeps key reference counts, or really counting how often a record points to some record? with millions of records. and with an index of course.
[18:54:28] <flok> because i was thinking an extra table with counts would cause at least 1 extra disk-seek
[19:05:54] <RLa> flok, table?
[19:07:04] <flok> RLa: collection. i'm assuming here that it is best to create a collection per data-type. where type is the kind of data i'm putting in. e.g. documents or addresses or measurements
[19:07:41] <RLa> hm
[19:08:04] <RLa> hm, i haven't worked on app with more than one collection so far
[19:08:11] <RLa> i'm mongo beginner tho
[19:08:17] <RLa> i use subdocs a lot
[19:08:34] <RLa> like comment is a subdoc of post
[19:08:40] <RLa> not separate collection
[19:09:23] <RLa> of course, it might not work for you
[19:11:20] <flok> yeah. let me describe: i've got a couple of million users that upload a couple of million images. now to prevent too much diskspace usage, i want only unique images to be stored in the database. so each user (a user can have 1 file) gets a
[19:11:29] <flok> hash assigned, the hash of an image
[19:11:43] <flok> then each image is stored with its hash in a separate collection
[19:12:00] <flok> then, if the number of users of an image = 0, then i delete the image
[19:12:47] <flok> so the question now is: what would be faster; add an index to the hash in the user-hash-collection and then count the number of references to an image
[19:13:09] <flok> or add a count-field to my image-hash pair and update that each time a user is removed or added
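A sketch of the counter variant flok describes, with hypothetical collection and field names; the count is adjusted atomically whenever a user gains or loses the image:

    // a user starts referencing the image
    db.images.update({_id: hash}, {$inc: {refCount: 1}}, true);   // upsert on first reference
    // a user stops referencing it
    db.images.update({_id: hash}, {$inc: {refCount: -1}});
    db.images.remove({_id: hash, refCount: {$lte: 0}});           // drop it once unreferenced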
[19:14:15] <RLa> hm, does image have to be deleted right away?
[19:15:02] <RLa> one could periodically run query that removes unused images
[19:15:52] <RLa> once a week or day, depending on how often images are added/removed
[19:16:12] <flok> that is an option yes. but does mongo have support for complex queries like that?
[19:19:52] <RLa> a simple solution would be to pull all image ids, then all used image ids then remove all those by id that are not in the list of used ids
[19:20:53] <RLa> maybe it could be done with map/reduce too
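A minimal sketch of the periodic cleanup RLa suggests, assuming a users collection with an imageHash field and an images collection keyed by hash (all names hypothetical):

    // run from a cron job, e.g. once a day
    var used = db.users.distinct("imageHash");    // every hash still referenced by a user
    db.images.remove({_id: {$nin: used}});        // delete images nobody points to

One caveat: with millions of distinct hashes the distinct result can hit the 16mb BSON limit, in which case iterating a cursor or a map/reduce job (as RLa mentions) is the fallback.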
[19:24:56] <dpp> Fri Aug 10 14:22:58 [conn62] info DFM::findAll(): extent 0:b6c000 was empty, skipping ahead. ns:rc.offers
[19:25:23] <dpp> is an empty extent a good thing? bad thing?
[19:26:20] <flok> that would be a check of millions of images against millions of users. that would explode :-) map-reduce, hmm. got to look into that.
[19:29:21] <Guest53808> anyone happen to know if 'big data university' is still offering aws coupons for its hadoop course?
[19:45:54] <RLa> flok, if you run it once a week, it does not matter
[19:46:25] <RLa> it probably runs in a few seconds if your database fits into memory
[19:46:44] <RLa> hm, interesting
[19:47:05] <RLa> i found it strange that mongo generates id's on client side
[19:47:13] <RLa> but it actually makes a lot of sense
[19:47:33] <RLa> how else one would do async inserts
[19:47:51] <vsmatck> Mongo will make _id if you don't include one.
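What vsmatck describes, sketched in the shell: the client (shell or driver) generates the ObjectId before the insert is sent, so the caller knows the _id without waiting for a round trip (collection and fields hypothetical):

    var id = new ObjectId();                  // generated client-side
    db.coll.insert({_id: id, name: "x"});     // the insert already knows its own _id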
[19:49:04] <rguillebert> hi
[19:49:26] <rguillebert> is it safe to use mongodb on an unjournaled filesystem if mongo's journaling is enabled ?
[19:52:25] <flok> other question: does mongodb store and update indexes on disk?
[19:55:50] <RLa> aren't they same as rest of mongo data, memory-mapped files?
[19:59:17] <flok> don't know?
[20:32:03] <kzinti> I have a collection with a large number of duplicate entries and when I ran a validate on it, it came back false saying "ns corrupt, requires dbchk" does anybody have any idea what's going on? this started happening after a compact