[02:59:25] <cedrichurst> when doing a mapReduce and reducing into an existing collection with the same keys, is it possible to preserve the fields outside of value?
[02:59:48] <cedrichurst> for example, if I have db.collectionA with {_id: 1, foo: 'bar'}
[03:00:15] <cedrichurst> and i do a db.collectionB.mapReduce(m, r, {out: {reduce: "collectionA"}})
[03:00:47] <cedrichurst> collectionA always ends up with {_id: 1, values: {buzz: 'blah'}}
[03:01:01] <cedrichurst> with the original foo field omitted
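A minimal sketch of the behaviour cedrichurst describes, assuming hypothetical map/reduce functions m and r; with {out: {reduce: "collectionA"}} the existing document is matched only on _id and rewritten with the reduced value, so other fields on it are lost:

    // the pre-existing document
    db.collectionA.insert({ _id: 1, foo: 'bar' })

    // hypothetical map/reduce pair run over collectionB
    var m = function () { emit(1, { buzz: this.buzz }); };
    var r = function (key, values) { return values[0]; };

    db.collectionB.mapReduce(m, r, { out: { reduce: "collectionA" } })

    // afterwards the document in collectionA holds only _id and the reduced
    // value; the original foo field is not carried over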
[07:49:57] <trupheenix> i have a document which looks like this {"a":"1", "b":{"c":2,"d":3}}. how would i use findAndModify to append values for "b"?
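A minimal sketch of one way to do this with findAndModify, assuming a hypothetical collection name "docs" and a hypothetical new sub-field "b.e"; $set with dot notation adds a field inside the embedded document without replacing it:

    db.docs.findAndModify({
        query:  { a: "1" },
        update: { $set: { "b.e": 4 } },   // appends e:4 inside the embedded document b
        new:    true                      // return the modified document
    })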
[08:43:22] <exi> does anybody know whether the mongodb balancer rebalances my shard if i add a maxSize field in the config db some time after adding the shard?
[08:43:43] <exi> i am trying it right now but the balancer seems to ignore the newly set maxsize
[08:46:32] <exi> even after restarting mongos and the config server, there is no rebalancing taking place according to the new maxSize :(
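For reference, a minimal sketch of adding maxSize (in MB) to an already-added shard by editing the config database directly; the shard name is hypothetical, and whether the balancer honours the value retroactively is exactly what is being asked here:

    use config
    db.shards.update({ _id: "shard0000" }, { $set: { maxSize: 10240 } })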
[09:35:10] <remonvv> Hi all, anyone know what this is : Fri Aug 10 11:29:38 [Balancer] moveChunk result: { errmsg: "can't find index in storeCurrentLocs", ok: 0.0 }
[09:35:10] <remonvv> Fri Aug 10 11:29:38 [Balancer] balancer move failed: { errmsg: "can't find index in storeCurrentLocs", ok: 0.0 }
[09:36:33] <NodeX> I would hazard a guess it's an error message :P
[09:42:50] <ron> remonvv: do you want to hire me? I'm like.. really good.
[09:44:15] <remonvv> trupheenix, total document size is 16mb, there's no limit to an embedded document's size other than that it has to fit in the parent document (which, again, can be at most 16mb)
[09:44:31] <remonvv> If you run into situations with 16mb embedded documents your schema is probably broken though
[09:44:42] <remonvv> Okay apparently my issue is a stale mongos lock.
[09:44:50] <remonvv> How on earth does that not automatically recover.
[09:45:24] <trupheenix> remonvv, yes schema is definitely broken then!
[11:45:20] <jhsto> can someone look at this and say what gives: git://gist.github.com/3313738.git
[11:45:48] <jhsto> here's the url link https://gist.github.com/3313738
[11:46:33] <jhsto> i installed some new version of node today, and i wonder if that's what messed it up
[11:59:22] <amitprakash> Hi.. while I can easily search for a value in an array in a collection, how can I search for any value from a list in an array in a collection .. i.e. {listfield: {$in: valarray}} instead of {listfield: val}
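A minimal sketch, assuming a hypothetical collection "items"; $in matches a document when the array field contains at least one of the listed values:

    var valarray = ["red", "blue"];
    db.items.find({ listfield: { $in: valarray } })
    // matches documents whose listfield array holds "red", "blue", or both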
[12:11:49] <PDani> I'm working on a feature in mongodb (https://jira.mongodb.org/browse/SERVER-6748), and I'd like to store for every Database a property, which describes a hint for memory manager based on typical read access behaviour on mmapped files (random or sequential). I'm thinking of a db.system.config namespace which could contain a { _id: "madvise", value: "RANDOM" } document.
[12:12:31] <PDani> is there any better solution for persisting property of databases?
[12:13:55] <PDani> I basically would like to open every MongoDataFile with a madvise on MMF if this property exists
[12:18:23] <remonvv> you can't store it in a collection, that's a chicken and the egg issue.
[12:19:08] <PDani> remonvv: yes, but madvise can happen later, after opening all existing files
[12:20:40] <PDani> another solution would be to make a madvise decision on-the-fly based on the pagefaults/accesses in a given database
[12:20:51] <PDani> in this case i shouldn't persist this information at all
[12:20:57] <remonvv> I'm sure but it's questionable to store database configuration in a manner that requires an initialized database.
[12:22:01] <remonvv> To be honest I think you should await 10gen feedback before doing anything really. Have the discussion before doing the work.
[12:22:23] <remonvv> It's pretty unlikely they'll accept an outside pull for something that crucial.
[12:23:11] <remonvv> By the way, in your issue : I was able to query with ~3MB/sec net, while I had only ~4MB/sec on the EBS storage
[12:23:18] <remonvv> That sounds like it got worse rather than better. Wrong numbers?
[12:24:30] <PDani> remonvv, with readahead, I was able to query with 1.5megs/sec and there was a huge overhead on ebs read, which read with 40mb/sec in the meantime
[14:01:05] <ron> Hmm, are Jongo's developers in the house?
[14:22:08] <remonvv> PDani, I'm still not sure if that's an optimization. I'd rather have MongoDB swap more into memory rather than exactly what it needs.
[14:25:52] <PDani> remonvv: it definitely is an optimization in my case (fully random reads with a lot more data than memory). a) usually storage bandwidth costs money, so useless reads are expensive; b) readahead uses up storage bandwidth, so other reads must wait. It's a special need of course, and maybe not the typical usage scenario, but in my case, it's a big gain :)
[14:30:05] <cedrichurst> anyone here have experience with mongo-hadoop?
[14:52:57] <cedrichurst> it's like the movie inception where i'm stuck in a dream within a dream
[14:53:48] <cedrichurst> does mongo maintain some sort of ci box somewhere?
[14:54:58] <Derick> yes, but it's not public afaik
[14:56:05] <cedrichurst> does anyone here have access to it and, if so, can they check to see if mongo-hadoop is built there
[14:56:18] <cedrichurst> and, if so, can they find out if there's a snapshot repository the build pushes to?
[15:02:07] <apetrescu> What's the proper way to increase mongo-java-driver's connection pool size? Is it enough to just do "mongo.getMongoOptions().connectionsPerHost = 100" or whatever? How does that work; does it check that variable when grabbing a connection to decide whether to add a new connection to the pool or block for one to be released instead?
[15:24:41] <cedrichurst> apetrescu: i believe that's the proper way
[15:31:54] <apetrescu> cedrichurst: so then it doesn't immediately create the new connections?
[15:36:49] <apetrescu> It just *allows* the connection pool to grow to the larger size?
[15:36:56] <apetrescu> (Just confirming that my mental model is correct)
[16:48:22] <RLa> if you had blog posts with comments and votes how would you add a vote to a comment?
[16:48:50] <RLa> adding a comment would be { $push : { comments : { "..." } } }
[16:50:26] <ron> depends on how you want to save the votes for a comment :)
[17:07:00] <souza> Hello guys, i'm having problems accessing an object inside an array, this is my code that reads the document: http://pastie.org/4450738 | my print shows a blank line after the "TEST" word, but i have that field in mongodb and some data inside it, ideas?
[17:08:03] <RLa> ron, just noticed i actually use sub-docs for comments so i can refer a comment using its _id
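A minimal sketch of adding a vote with the positional operator, assuming comments are embedded sub-documents that each carry an _id and a votes counter (the field names are hypothetical):

    db.posts.update(
        { _id: postId, "comments._id": commentId },
        { $inc: { "comments.$.votes": 1 } }
    )
    // $ refers to the first array element matched by the query,
    // so only that comment's vote count is incremented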
[17:24:17] <souza> RLa: sorry, but this is the code that i'm using, I didn't find any example on the mongodb site or git or in the documentation, it's too out of date!
[17:25:09] <kali> souza: show us the docs you're trying to match...
[17:28:09] <souza> kali: this is my full code: http://pastie.org/4450849 | i can get the document, i know it because i can print it using the bson_print() method; my problem is getting the values inside this BIG object that contains some arrays and other objects. My problem right now is: how can i iterate over an array and get the values from the objects inside it?
[17:39:30] <souza> i've posted this question in stackoverflow > http://stackoverflow.com/questions/11906620/iterate-array-mongodb-c
[18:53:56] <flok> what would be faster: querying a separate table which keeps key reference counts, or really counting how often a record points to some record? with millions of records. and with an index of course.
[18:54:28] <flok> because i was thinking an extra table with counts would cause at least 1 extra disk-seek
[19:07:04] <flok> RLa: collection. i'm assuming here that it is best to create a collection per data-type. where type is the kind of data i'm putting in. e.g. documents or addresses or measurements
[19:09:23] <RLa> of course, it might not work for you
[19:11:20] <flok> yeah. let me describe: i've got a couple of million users that upload a couple of million images. now to prevent too much diskspace usage, i want only unique images to be stored in the database. so each user (a user can have 1 file) gets a
[19:11:29] <flok> hash assigned, the hash of an image
[19:11:43] <flok> then each image is stored with its hash in a separate collection
[19:12:00] <flok> then, if the number of users of an image = 0, then i delete the image
[19:12:47] <flok> so the question now is: what would be faster: add an index to the hash in the user-hash-collection and then count the number of references to an image
[19:13:09] <flok> or add a count-field to my image-hash pair and update that each time a user is removed or added
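A minimal sketch of the count-field variant flok describes, assuming a hypothetical images collection keyed by hash with a refs counter:

    // user gains the image: upsert and bump the counter
    db.images.update({ _id: hash }, { $inc: { refs: 1 } }, true)

    // user loses the image: decrement, then drop it once nobody references it
    db.images.update({ _id: hash }, { $inc: { refs: -1 } })
    db.images.remove({ _id: hash, refs: { $lte: 0 } })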
[19:14:15] <RLa> hm, does image have to be deleted right away?
[19:15:02] <RLa> one could periodically run query that removes unused images
[19:15:52] <RLa> once a week or day depending how often images are added/removed
[19:16:12] <flok> that is an option yes. but does mongo have support for complex queries like that?
[19:19:52] <RLa> a simple solution would be to pull all image ids, then all used image ids, then remove by id all those that are not in the list of used ids
[19:20:53] <RLa> maybe it could be done with map/reduce too
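A minimal sketch of the periodic cleanup RLa suggests, assuming hypothetical users and images collections where users.hash references images._id; for millions of hashes the $nin list gets large, which is where map/reduce or batching would come in:

    var used = db.users.distinct("hash");          // every hash still referenced
    db.images.remove({ _id: { $nin: used } });     // drop images nobody points at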
[19:24:56] <dpp> Fri Aug 10 14:22:58 [conn62] info DFM::findAll(): extent 0:b6c000 was empty, skipping ahead. ns:rc.offers
[19:25:23] <dpp> is an empty extent a good thing? bad thing?
[19:26:20] <flok> that would be a check of millions of images against millions of users. that would explode :-) map-reduce, hmm. got to look into that.
[19:29:21] <Guest53808> anyone happen to know if 'big data university' is still offering aws coupons for its hadoop course?
[19:45:54] <RLa> flok, if you run it once a week, it does not matter
[19:46:25] <RLa> it probably runs in a few seconds if your database fits into memory
[20:32:03] <kzinti> I have a collection with a large number of duplicate entries, and when I ran a validate on it it came back false saying "ns corrupt, requires dbchk". does anybody have any idea what's going on? this started happening after a compact