[00:06:27] <joannac> Not possible to process it before going to paper?
[00:06:40] <joannac> If you ever have connection specific problems, it'll be a lot harder to debug
[02:03:11] <dgaffney> Hello. Has Anyone Really Been Far Even as do MapReduce? I am totally adrift in getting my sea legs with my first MR script and can't figure out where things went south.
[02:03:41] <dgaffney> https://gist.github.com/DGaffney/4ef629b5a3a01abc1c57 < this fellow
[02:22:50] <joannac> dgaffney: Have you gone through http://docs.mongodb.org/manual/tutorial/troubleshoot-map-function/ and http://docs.mongodb.org/manual/tutorial/troubleshoot-reduce-function/
[02:23:09] <dgaffney> as best as I could with this example...
[02:30:50] <joannac> Well, the map part should be okay
[02:30:54] <dgaffney> What seems to be happening is that the documents in the final collection are identical to the output expected from the map
[02:31:24] <dgaffney> which leads me to believe that there's something goofy going on in the reduce *but* I ran the reduce with an example and got the output I was expecting...
[02:31:30] <dgaffney> and that's seeming to look really grand...
[02:31:42] <dgaffney> but it doesn't actually get stored in the collection.
[02:32:16] <joannac> Wait, you get the right output, but it's not getting written?
[02:32:48] <dgaffney> when I instantiate some vars in the repl and run my reduce function with those vars the output of the reduce function looks correct.
[02:33:18] <dgaffney> When I run the map reduce process proper, however, the result looks exactly like the documents that should be generated by the map function *not* the reduce function
[02:33:45] <dgaffney> it's almost as if reduce were effectively behaving as return {key: key, value: value} and doing nothing else..
[02:34:10] <dgaffney> All of this tickles that gut feeling that there's just some silly element missing that I am not aware of...
[02:34:20] <dgaffney> that said, the other_networks object is *massive*
[02:34:25] <dgaffney> maybe that should be mentioned
[02:35:03] <dgaffney> (although it's looking like it stays within the document size limit, obviously, as the documents do get written out to the collection)
[02:37:12] <joannac> I don't understand how that could be, given your map returns a document, and your reduce returns a scores array
[02:37:49] <dgaffney> L59 is the massive object I'm referencing here - it's a set of 683 hashes, each containing anywhere from 0 to ~2000 object ids and a couple of strings.
[02:38:02] <dgaffney> Maybe I need to shift my thinking here entirely?
[02:39:00] <dgaffney> The goal is relatively simple (maybe there's a better way for it?) - given two ids and two associated sets of arrays, generate a score that is the intersection length of those two arrays divided by the union length.
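A hedged aside on the symptom described above: MongoDB skips the reduce step entirely for any key that map emits only once, so if every emitted key is unique the output collection looks exactly like the map output. A minimal shell sketch (collection and field names are hypothetical) of a map/reduce pair whose value shapes match, plus the intersection-over-union score from the description:

    // reduce only runs for keys emitted more than once, and its return value
    // must have the same shape as the values emitted by map
    var mapFn = function() {
        emit(this.pair_id, { ids: this.ids });    // hypothetical fields
    };
    var reduceFn = function(key, values) {
        var merged = {};
        values.forEach(function(v) {
            v.ids.forEach(function(id) { merged[id] = true; });
        });
        return { ids: Object.keys(merged) };      // same shape as map's value
    };
    db.pairs.mapReduce(mapFn, reduceFn, { out: "pair_ids" });   // hypothetical collections

    // intersection length / union length of two id arrays (treated as sets)
    function similarity(a, b) {
        var inA = {}, inter = 0;
        a.forEach(function(x) { inA[x] = true; });
        b.forEach(function(x) { if (inA[x]) { inter++; delete inA[x]; } });
        var unionLen = a.length + b.length - inter;
        return unionLen === 0 ? 0 : inter / unionLen;
    }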
[04:37:41] <nathanielc> Anyone familiar with the internals of mongo? I am trying to figure out how splits work. I have found the split command and the applyOps queries it uses to create new chunk docs in the config.chunks collection, but I haven't found where the data is actually split.
[04:37:57] <nathanielc> What is the association between chunks and the actual data on disk?
[04:39:27] <nathanielc> So the data exists on the mongod instance, and when a mongos instance needs to find the data it just looks up which shard has it?
[04:39:46] <joannac> Correct, in the config server
[04:39:51] <nathanielc> So the data itself isn't split until a move chunk operation later
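For illustration, a hedged sketch of what a chunk actually is: just a metadata document in config.chunks describing a shard-key range and its owning shard; the documents themselves stay in place on the shard until a moveChunk migrates them (field values below are hypothetical):

    use config
    db.chunks.findOne()
    // {
    //   "_id"     : "test.foo-x_MinKey",
    //   "ns"      : "test.foo",
    //   "min"     : { "x" : { "$minKey" : 1 } },
    //   "max"     : { "x" : 50 },
    //   "shard"   : "shard0000",
    //   "lastmod" : Timestamp(2, 0)
    // }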
[04:42:06] <nathanielc> Thanks, can you clarify one more thing for me? The distinction between "chunk version" and "shard version" in the context of the chunk metadata. Also, a little background: I am working on this jira ticket https://jira.mongodb.org/browse/SERVER-924
[04:44:46] <nathanielc> Basically I am linking two collections and then removing the metadata for one of the collections, so they can share it. But since the original versions of the two sets of chunks are different, it's giving me some trouble
[05:53:36] <chaotic_good> hi I really need some help on a problem I cant crack
[05:54:03] <chaotic_good> I take all the contents of the /data/mongo dir on production and move them to 3 staging nodes
[05:54:28] <chaotic_good> I want staging to run as a different replica set
[05:54:36] <chaotic_good> so I delete the local.* files
[05:54:59] <chaotic_good> and it seems it should all work right?
[05:55:08] <chaotic_good> but I had some bizarre problems last week
[05:55:16] <chaotic_good> one time mongo even deleted all the data
[05:55:38] <chaotic_good> what is the normal process for seeding a dev environment from a prod cluster? 3 nodes?
[05:57:04] <chaotic_good> http://docs.mongodb.org/manual/faq/developers/#how-do-you-copy-all-objects-from-one-collection-to-another I am reading here as well
[06:02:42] <mongo-rooky> I want to take a 3-node mongodb cluster with just 1 replica set, and make a second basic 3-node replica set, but with a different replica set name
[06:02:58] <mongo-rooky> I am having trouble basically restoring when I rsync the files
[06:03:20] <mongo-rooky> If I delete the local* files
[06:03:26] <mongo-rooky> then I can run rs.initiate()
[06:03:39] <mongo-rooky> and I can get my new replicaset, and then add the second and third nodes
[06:03:46] <mongo-rooky> this is all on the new cluster
[06:04:12] <mongo-rooky> the production data is about 195G
[06:04:19] <mongo-rooky> so it's slow going each time I fail
[06:04:40] <mongo-rooky> and if I only move the files to node 1 and let it sync, mongodb stops
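For reference, a hedged sketch of the re-seeding procedure being attempted (replica set name and hostnames are hypothetical): copy the dbpath to each staging node, remove the local.* files, restart each mongod with the new --replSet name, then from the shell on node 1:

    rs.initiate({
        _id: "stagingRS",
        members: [ { _id: 0, host: "staging1:27017" } ]
    });
    rs.add("staging2:27017");
    rs.add("staging3:27017");
    rs.status();   // watch members 2 and 3 come up and sync from node 1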
[08:51:36] <cym3try> hi guys, i have hit the connection limit on a mongo instance (1000 something) although i have not specified the maxconns conf. From the docs I read that by default mongo does not limit the connections. So how is this even possible?
[09:36:37] <spicewiesel> I had to duplicate a mongodb environment. Both environments are identical, except for IPs. So, I copied the data directory to the 3 mongod nodes and started them. The replicaset itself seems to be fine now, they indexed the data and voted for primary and secondaries. What is the next step? I cannot see the databases in mongos. Do I have to reset mongos and the configsrvs to fix that? Would be great if anyone could help me with that
[09:37:27] <joannac> did you duplicate the mongoses?
[09:37:56] <spicewiesel> nope, I only copied the mongod data directories to the mongod instances
[09:38:55] <spicewiesel> we have a test environment
[09:39:15] <spicewiesel> there's data, and this data should now be copied to the other environment, in total
[09:41:04] <spicewiesel> the devs finished the test environment, they set up everything and loaded the data. This data should now be copied to another environment to be used there. So, what I need is a way to duplicate the whole environment
[09:51:41] <spicewiesel> joannac: it seems to be fine now, maybe... :)
[09:52:30] <spicewiesel> replicaset was up, I then deleted the config servers' local db and restarted them. Then I started mongos and added a shard. Now I can see the shard and the databases, and I am able to log in with user credentials that exist only on the source environment.
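A hedged sketch of that mongos step, for reference (the shard replica set name and host are hypothetical):

    sh.addShard("rs0/mongod1.example.com:27017")   // re-register the copied replica set as a shard
    sh.status()                                    // verify the shard and its databases are visible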
[10:05:16] <cym3try> hi guys, i have hit the connection limit on a mongo instance (1000 something) although i have not specified the maxconns conf. From the docs I read that by default mongo does not limit the connections. So how is this even possible?
[10:05:59] <cym3try> i noticed that the shell limit (ulimit -n) is 1024, but the process limit (cat /proc/<mongod-pid>/limits) is actually 12k
[10:20:41] <cym3try> ok... so i found out that the problem is that RHEL and CentOS override the process ulimit
[10:20:49] <cym3try> and since mongo connections are spawned as forks..
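A quick way to see what limit the running mongod actually ended up with, for what it's worth (the numbers below are illustrative):

    db.serverStatus().connections
    // { "current" : 980, "available" : 20, "totalCreated" : NumberLong(123456) }
    // "available" reflects the effective limit (ulimit and maxConns), not the documented default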
[13:55:49] <tiller> I'm trying to use mongodb's replication, but I've added a new server to my primary's replica set and its state is "Down", with: "lastHeartbeatMessage" : "still initializing"
[13:56:29] <tiller> Following the tutorial, I don't have to do anything in particular on the mongod I want to add to the set, do I? (except set replSet to the same name as the primary's)
[16:09:24] <cheeser> try not to speak on behalf of others
[16:09:27] <tiller> "If you are running as part of a replica set, you should always restore from a backup or restart the mongod instance with an empty dbpath and allow MongoDB to perform an initial sync to restore the data."
[16:09:35] <tiller> The option of the empty dbpath may be better?
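If going the empty-dbpath route, a small sketch of watching the new member come up from the primary's shell:

    // after restarting the new member with an empty dbpath, it should move
    // through STARTUP2 / RECOVERING before reaching SECONDARY
    rs.status().members.forEach(function(m) { print(m.name + " : " + m.stateStr); });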
[16:51:57] <intellix> can I do something like: db.collection.update({ "field": "column" }, { $set: { fieldName: "test" } });
[16:52:12] <intellix> basically, have a variable called fieldName, as I want to do something like "something." + variable
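A minimal sketch of that in the shell, assuming a hypothetical variable fieldName: build the $set document with a computed key rather than a literal one:

    var fieldName = "test";                        // hypothetical
    var setDoc = {};
    setDoc["something." + fieldName] = "value";    // e.g. { "something.test": "value" }
    db.collection.update({ field: "column" }, { $set: setDoc });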
[17:03:23] <tkeith> Is searching by string prefix nearly as fast as searching by full string?
[17:03:33] <tkeith> assuming I have an index on the field
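For what it's worth, a left-anchored regex can use a regular index, so a prefix search is close to an exact-match lookup in cost; a minimal sketch with hypothetical collection and field names:

    db.items.ensureIndex({ name: 1 });
    db.items.find({ name: "exact-value" });    // point lookup on the index
    db.items.find({ name: /^prefix/ });        // bounded index range scan
    db.items.find({ name: /prefix/ });         // unanchored: scans the whole index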
[17:09:20] <micahf> is it possible to version data in mongo with git
[17:36:31] <ranman> stashspot you may have to connect to only the admin DB to auth
[17:37:32] <stashspot> @ranman - I've tried and it doesn't work. i also tried not adding a db to the conn string, i tried /test and i tried a db i know exists
[17:43:21] <stashspot> I'm chatting with the node room
[17:43:23] <ranman> seems unlikely that the node driver would not support that. There is a JIRA ticket from July 2013 saying this was fixed in all drivers
[17:43:28] <stashspot> let me do some more deep diving
[17:56:57] <stashspot> ladies and gents, there is a .connect option called *drum roll*: uri_decode_auth
[17:57:18] <stashspot> if i uriencode the user and pass, boom - at signs are non-issue
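A hedged sketch of what that looks like with the node driver, based on the option named above (credentials are hypothetical):

    var MongoClient = require('mongodb').MongoClient;
    var user = encodeURIComponent("user@example.com");
    var pass = encodeURIComponent("p@ssw0rd");
    MongoClient.connect(
        "mongodb://" + user + ":" + pass + "@localhost:27017/test",
        { uri_decode_auth: true },               // driver decodes the percent-encoded credentials
        function(err, db) { /* ... */ }
    );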
[18:21:12] <stashspot> any ideas on why it would take about 4 mins for db.connect to return?
[19:53:00] <platzhirsch> I have 90k documents which I retrieve over an association. I need to get all values of one attribute. This seems to take too long. I have already added an index for this value. Can I still improve this?
[20:00:54] <kali> platzhirsch: can you show us an example?
[20:01:10] <platzhirsch> sure, let's see if I can get the raw query
[20:07:52] <platzhirsch> kali: Seems Moped cannot translate my query to a raw query string. How should I write up the example? Basically the query has a selector on the identifier that distinguishes the documents, and I only select one field, the attribute I need
[20:09:04] <kali> go to the mongodb shell, and reproduce the queries
[20:14:34] <platzhirsch> kali: alright, I assume this is the query: db.metadata_records.find({"snapshot_id": ObjectId("5288e6099207ff2094000008")}, {"score": 1})
[20:15:59] <kali> ok. what index have you created ?
[20:17:27] <platzhirsch> kali: Since snapshot is a nested document of repository I have created: snapshots.metadata_records, and metadata_records.score
[20:19:02] <platzhirsch> ah and "snapshot_id" for metadata_records
[20:19:13] <kali> i don't know anything about "repository", but your metadata_record query will probably get a boost with a composite index: { snapshot_id:1, score: 1}
[20:19:28] <platzhirsch> ah ok, that's a nice idea
[20:19:59] <kali> that is... if you can get moped to pass "_id : false" on the projection field
[20:20:26] <kali> the query has to be: db.metadata_records.find({"snapshot_id": ObjectId("5288e6099207ff2094000008")}, {"score": 1, _id: false})
[20:24:02] <platzhirsch> kali: Why the _id: false? so the default field is omitted?
[20:25:04] <kali> yes. you want a covered query: http://docs.mongodb.org/manual/tutorial/create-indexes-to-support-queries/#create-indexes-that-support-covered-queries
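Putting the last few lines together, a sketch of the covered query being suggested (using the ObjectId from the query above):

    db.metadata_records.ensureIndex({ snapshot_id: 1, score: 1 });
    db.metadata_records
      .find({ snapshot_id: ObjectId("5288e6099207ff2094000008") }, { score: 1, _id: false })
      .explain();
    // "indexOnly" : true in the explain output confirms the query is covered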
[20:32:14] <platzhirsch> kali: great. I guess Mongoid (Ruby) does that already when I specify .only(:score), but I try to assert this. It seems to be quicker already :)
[20:32:42] <platzhirsch> I guess an alternative would be to store the result in snapshot directly? 90k floating point numbers... but I am not so sure whether this is appropriate
[20:38:53] <kali> i'm not sure what we are talking about. i need to see some sample document to understand what we're dealing with :)
[20:40:42] <platzhirsch> I am afraid they are all a bit too big, I think it's okay ^^
[20:49:25] <platzhirsch> darn, _id seems not to be set to false in the query. Wouldn't an alternative be to include the _id in the index?
[20:51:30] <platzhirsch> but that's more out of interest - whether that is worse for performance; I can fix the query somehow ^^
[21:33:17] <Sawbones> Another question: say in an SQL db you have two tables with a many-to-many relationship (intermediate table). I know you could set it up the same way in Mongo, but is there a better way to do it since I'm not as restricted?
[21:33:46] <tg2> You have to architect it differently
[21:42:30] <tg2> You just need to keep the cache updated when its foreign objects are updated
[21:43:14] <Sawbones> With the direction I plan to go I don't really like that since one of those collections will have a few other many-to-many relationships
[21:43:22] <tg2> Look into Riak, it allows "foreign" keys, like pointers that are automatically dereferenced when you load it
[21:43:48] <tg2> Yeah in that case keep an array of object id's from the other collection
[21:44:12] <tg2> Lookup by object id is very fast, like an SQL primary key
[21:44:58] <tg2> Not sure why mongo doesn't support pointers
[21:45:31] <lazypower> Did the MMS service change their requirements for reporting, so that it's now an Auth Key and a Secret Key combo?
[22:06:11] <cheeser> well, they kinda all mean different things.
[22:06:28] <cheeser> but mongo does support FKs/references in the form of DBRef
[22:06:47] <cheeser> they're just not enforced like a FK in pgsql would be
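A hedged sketch of both approaches mentioned above, with hypothetical collections "students" and "courses": an embedded array of ObjectIds for the many-to-many, and a single unenforced DBRef:

    // many-to-many via an array of ObjectIds on one side
    var courseIds = db.courses.find({}, { _id: 1 }).limit(3).toArray()
                      .map(function(c) { return c._id; });
    db.students.insert({ name: "Ada", course_ids: courseIds });
    db.students.find({ course_ids: courseIds[0] });   // everyone enrolled in a given course

    // or an unenforced reference via DBRef
    db.students.insert({ name: "Bob", course: new DBRef("courses", courseIds[0]) });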
[22:56:15] <platzhirsch> If my query looks like this: db.metadata_records.find({"snapshot_id": ObjectId("5288e6099207ff2094000008")}, {"score": 1, "_id": 0}) What's the index to support this? db.metadata_records.ensureIndex({ snapshot_id: 1, score: 1 }) ?