PMXBOT Log file Viewer

#mongodb logs for Wednesday the 19th of August, 2015

[06:43:26] <amitprakash> Hi, why is an UnorderedBulkOp() crashing on errors (duplicate _id) instead of trying the other records?
[07:19:32] <Boomtime> amitprakash: do you use "crashing" to mean you failed to catch the exception that the driver raised? or something else?
[07:19:37] <quincylarson> Question: I am running MongoDB in production on a digital ocean instance. I was trying to connect to it with mongohub and it connected, but it stopped responding to production traffic
[07:20:14] <quincylarson> Is there a safe way to connect remotely without distracting MongoDB from responding to traffic?
[07:21:26] <Boomtime> quincylarson: it sounds like mongohub might have gone a bit rogue, a coincidence in timing, or some other thing occurred
[07:21:29] <amitprakash> Boomtime, I mean it does not seem to attempt any other record in the bulkOp
[07:21:49] <Boomtime> mongodb can't possibly judge the intent of one client versus another
[07:23:42] <Boomtime> amitprakash: with unorderedbulkoperation there isn't any real definition of what is the 'next' operation so i'm not sure how you are seeing what you claim
[07:23:44] <Boomtime> http://docs.mongodb.org/v3.0/reference/method/db.collection.initializeUnorderedBulkOp/#error-handling
[07:23:59] <Boomtime> can you tell me more about what you are attempting?
[07:24:07] <Boomtime> driver, perhaps a code snippet
[07:24:10] <Boomtime> etc.
[07:24:37] <Boomtime> wait, does your unorderedbulkoperation contain the same _id twice?
[07:24:41] <amitprakash> Boomtime, I basically move 5000 records from server A to server B
[07:24:45] <amitprakash> Boomtime, yes
[07:24:58] <Boomtime> how can that possibly work?
[07:25:14] <Boomtime> the operation, the entire bulk object, is fundamentally flawed
[07:25:27] <Boomtime> there is no point in doing any of it
[07:25:46] <amitprakash> Boomtime, the errors occur when a document on serverA duplicates an existing document on serverB, i.e. doc(serverA)._id == somedoc(serverB)._id
[07:26:11] <amitprakash> Boomtime, there are no duplicate _ids in the operation itself
[07:26:13] <Boomtime> how does that lead to the same _id in a single bulk op?
[07:26:26] <amitprakash> sorry, i misunderstood the question
[07:26:28] <Boomtime> ok, that was my question
[07:26:36] <Boomtime> we're all good.. moving on
[07:26:49] <Boomtime> driver? code snippet to some pastie site please!
[07:27:01] <amitprakash> Boomtime, the way I am verifying that the operation is crashing is by looking at the record count on server B, which has not increased at all since the crashes started
[07:27:18] <Boomtime> uh-huh.. maybe do some better debugging
[07:27:25] <Boomtime> that sounds like you are guessing
[07:28:03] <amitprakash> not really.. I also dumped all the _ids in the bulkOp and I know for sure some of those should have been moved
[07:28:05] <amitprakash> yet none did
[07:28:29] <Boomtime> driver, code snippet please
[07:28:35] <amitprakash> doing so, sec :D
[07:28:36] <Boomtime> also a copy of the error you got
[07:30:36] <amitprakash> code @ http://paste.debian.net/297677/ , error @ http://pastebin.com/RFfyV3GB
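
(Aside: a hedged sketch of the pattern under discussion, with hypothetical names - in the 2.6+ mongo shell, an unordered bulk op still attempts the remaining operations when some fail, and execute() reports the failures by throwing a BulkWriteError.)

    var bulk = db.dest.initializeUnorderedBulkOp();
    docs.forEach(function(d) { bulk.insert(d); });   // docs: documents copied from server A
    try {
        printjson(bulk.execute());                   // BulkWriteResult on full success
    } catch (e) {
        printjson(e);                                // BulkWriteError; writeErrors lists the duplicate-_id failures
    }
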
[07:35:01] <quincylarson> Thanks, Boomtime - so you don't think this will happen if I try to connect with mongohub again?
[07:43:32] <Boomtime> quincylarson: how big are your documents?
[07:43:42] <Boomtime> oops, sorry
[07:43:50] <Boomtime> amitprakash: how big are your documents?
[07:44:50] <Boomtime> also, are you doing everything in the mongo shell?
[07:47:01] <amitprakash> Boomtime, yes, 10-16kb
[08:08:58] <Boomtime> amitprakash: not sure what is going on there, i'd have to do some tests
[09:05:39] <amitprakash> Hi, how is this statement supposed to work? results=bulkop.execute(function(err,result) { console.log(JSON.stringify(result));});
[09:05:52] <amitprakash> Since bulkop.execute only takes writeConcern as a parameter
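
(Aside: in the mongo shell, execute() is synchronous and takes only an optional write-concern document; the callback form quoted above is Node.js driver syntax. A minimal shell-side sketch:)

    var result = bulkop.execute({w: 1});   // optional write concern document
    printjson(result);                     // BulkWriteResult summary
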
[09:14:19] <nalum> hello all, is it better to use $or or $in on a single field?
[09:15:04] <amitprakash> nalum, use $in over $or
[09:15:21] <amitprakash> nalum, http://docs.mongodb.org/manual/reference/operator/query/or/#or-versus-in
[09:15:56] <nalum> amitprakash: thanks, I was looking for that page :)
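
(Aside: a minimal illustration of the linked recommendation, with a hypothetical collection - on a single field, $in expresses the same match as $or and the query planner handles it better.)

    db.inventory.find({qty: {$in: [5, 15]}})           // preferred
    db.inventory.find({$or: [{qty: 5}, {qty: 15}]})    // equivalent match, slower to plan
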
[12:04:21] <velikan> hello everyone!
[12:04:51] <velikan> how to db.collection.group a datetime field by date?
[12:09:53] <aps> I think I've screwed up big time and would highly appreciate some help here.
[12:09:53] <aps> I added auth to my mongo replica set a few days back and created a user with roles: [ { role: "root", db: "admin" } ]. Now I'm unable to rs.add() a new member to the replica set - reason: "not authorized on local to execute command". The new member has the same keyfile.
[12:09:54] <aps> What do I do now?
[12:13:45] <cheeser> did you db.auth() after connecting?
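
(Aside: cheeser's suggestion as a hedged sketch, with hypothetical credentials - authenticate against the admin database on the primary before reconfiguring the set.)

    db.getSiblingDB("admin").auth("root-user", "secret")
    rs.add("newmember.example.net:27017")
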
[12:58:50] <barbagianni> Does anybody see why this returns an error:
[12:58:53] <barbagianni> db.eval('function x() {}x()')
[12:59:06] <barbagianni> While this does not:
[12:59:09] <barbagianni> db.eval('1;function x( {}x()')
[12:59:20] <barbagianni> sorry...
[12:59:25] <barbagianni> db.eval('1;function x() {}x()')
[12:59:30] <barbagianni> this does not
[13:01:45] <StephenLynx> that's one screwed up js code to start with.
[13:03:42] <StephenLynx> is the first code valid anywhere else?
[13:04:28] <StephenLynx> it seems to be declaring a function called X and then it calls X?
[13:04:44] <StephenLynx> why aren't you using semicolons?
[13:04:58] <StephenLynx> function x (){} x();
[13:05:48] <StephenLynx> barbagianni
[13:06:03] <barbagianni> yes, semicolons after named functions are optional
[13:06:32] <barbagianni> The second one i posted has messed up syntax. My bad.
[13:06:33] <StephenLynx> I didn't say to put after the function declaration
[13:06:50] <StephenLynx> I said to put after the call to the already declared function
[13:06:54] <StephenLynx> x();
[13:06:54] <deathanchor> soo... aggregating too much I guess: request heap use exceeded 10% of physical RAM
[13:07:34] <barbagianni> Does not help, but I think I'm on to something
[13:07:51] <StephenLynx> what error does it throw?
[13:08:11] <StephenLynx> function x(){} x(); is valid on io.js
[13:08:15] <deathanchor> I can't bump up the memory any more; any way around this other than adding more filters to limit how much data it is aggregating?
[13:08:16] <barbagianni> Error: { "errmsg" : "exception: SyntaxError: Unexpected identifier", "code" : 16722, "ok" : 0 }
[13:09:24] <StephenLynx> WARNING: db.eval is deprecated
[13:09:46] <barbagianni> As far as I can see mongo assumes I'm providing a single function to execute, if the code starts with the keyword 'function'
[13:10:59] <barbagianni> That would explain the behaviour.
[13:11:19] <StephenLynx> even on standard js eval is messed up.
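
(Aside: a compact restatement of barbagianni's finding - this is the behavior inferred from the discussion above, not documented semantics.)

    db.eval('function x() {} x()')      // taken as a single function to invoke -> SyntaxError
    db.eval('1; function x() {} x()')   // leading statement makes it a script -> runs
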
[13:12:10] <deathanchor> anyone know if I can aggregate to a new collection on a secondary member?
[13:12:25] <StephenLynx> $out?
[13:12:37] <barbagianni> Thanks for the help, StephenLynx
[13:18:09] <deathanchor> even to a secondary though?
[13:39:08] <deathanchor> eh, just added a { $limit : 40000000 } to the start of my aggregation since it's only for some statistical general info
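
(Aside: a hedged sketch of the workaround, with a hypothetical pipeline - a leading $limit caps how much data enters the aggregation, and on 2.6+ allowDiskUse lets memory-hungry stages spill to disk.)

    db.events.aggregate(
        [{$limit: 40000000}, {$group: {_id: "$type", n: {$sum: 1}}}],
        {allowDiskUse: true}
    )
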
[13:59:10] <bogn> Hi all, I'm about to move a 164 GB database. Is mongodump & mongorestore suitable for that, or should I just copy the db folder? It's also a move from 3.0.1 to 3.0.5 - that shouldn't be an issue, right?
[14:01:56] <deathanchor> bogn: if you stop mongod, then you are safe to do an rsync of the dbpath
[14:02:43] <deathanchor> if it is local network I wouldn't bother with compression and just use rsync -avP
[14:03:01] <bogn> it's not
[14:03:10] <bogn> two different cloud services
[14:04:06] <deathanchor> well depending on your throughput I still wouldn't compress, I have done an rsync -avP over wan and it was still faster than using -azvP
[14:04:51] <deathanchor> straight up rsync of the dbpath will be faster than a mongodump,rsync,mongorestore.
[14:05:01] <deathanchor> it won't even need to rebuild any indexes
[14:05:07] <bogn> OK, thank you
[14:05:09] <deathanchor> which mongodump and restore does
[14:05:20] <bogn> that saved time
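
(Aside: the move deathanchor describes, as a hypothetical command sequence - paths and hosts are assumptions.)

    # on the source host, after stopping mongod:
    rsync -avP /var/lib/mongodb/ user@desthost:/var/lib/mongodb/
    # then start mongod 3.0.5 on the destination pointing at the copied dbpath
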
[14:06:57] <bartzy> Hi
[14:07:44] <bartzy> Anyone knows how WiredTiger performs (in general) with large documents (>500KB, <4MB), compared to mmapv1?
[14:08:00] <bartzy> I understand that because of the in-memory cache it doesn’t work as well as mmap?
[14:08:04] <bartzy> or is that fixed by now?
[14:10:43] <Kosch> replication question: do I need to rs.initiate(..) on every additional member, before adding it to the rs?
[14:12:49] <Derick> no
[14:12:53] <Derick> you must *not* do that
[14:12:54] <kali> Kosch: nope. initiate() must only be run on the member that has the initial data
[14:12:57] <bartzy> Kosch: According to this tutorial, no : http://docs.mongodb.org/manual/tutorial/deploy-replica-set/
[14:14:00] <bartzy> if we’re on replication… I had a replica set which I turned into a single server (don’t ask…). Now I want to create a replica set again. What do I need to do on the single server (which was part of a replica set before) ?
[14:14:11] <bartzy> Should I drop any oplog collections or something?
[14:14:32] <Kosch> bartzy: yes, I wasn't sure about the preparation of the additional members.
[14:15:23] <Kosch> kali: so additional members stay in an empty config state until they are added.
[14:17:35] <kali> Kosch: you should not have to connect manually to the additional members.
[14:19:14] <deathanchor> new replSet gist: stand up mongod on all machines, connect to the primary (with data), rs.initiate(), then rs.add() the other members, done!
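
(Aside: the gist as hypothetical shell commands, run from a mongo shell on the member that holds the initial data.)

    rs.initiate()                                   // on the data-bearing member
    rs.add("storage-02.example.net:27017")          // repeat per additional member
    rs.addArb("arbiter-01.example.net:27017")       // arbiters are added with rs.addArb()
    rs.status()                                     // confirm member states
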
[14:19:16] <bartzy> Derick: Sorry to bother you, but perhaps you know or can point me in the right direction. I had a replica set on 2.4 (2 data nodes, one arbiter), and I upgraded to 2.6. After around a day (with no issues), the mongod process crashed after the following log: warning: DR102 too much data written uncommitted 314.978MB
[14:19:42] <bartzy> 500ms after that, the log stated: Assertion: 10345:passes >= maxPasses in NamespaceDetails::cappedAlloc: ns: local.oplog.rs, len: 148332, maxPasses: 5000, _maxDocsInCapped: 2147483647, nrecords: 347811, datasize: 10035458332
[14:20:10] <bartzy> and then what seems to be a stack trace (and the process crashed)
[14:20:40] <Derick> bartzy: hmm, are you on the latest 2.6? I don't really understand the server errors most of the time
[14:21:30] <deathanchor> bartzy, is this a former set that you are setting up again?
[14:22:02] <kali> i think you need to ditch the "local" directory
[14:22:04] <kali> lemme check
[14:22:15] <Derick> MongoDB still shouldn't crash though
[14:22:45] <deathanchor> it doesn't crash, it stops because of assertion errors (safety valves really)
[14:23:43] <kali> bartzy: https://docs.mongodb.org/manual/tutorial/reconfigure-replica-set-with-unavailable-members/#reconfigure-by-breaking-the-mirror
[14:24:18] <bartzy> Derick, deathanchor : I’m on 2.6.11. It’s not the former set - That’s what caused the set to become “former”.
[14:24:22] <kali> bartzy: that's the spirit. ditch "local"
[14:24:39] <bartzy> It happened over and over - It promoted the secondary, then after a few minutes it happened on the secondary..
[14:25:01] <bartzy> Until I got this one just before the stack trace: SEVERE: Fatal DBException in logOp(): 10345 passes >= maxPasses in NamespaceDetails::cappedAlloc: ns: local.oplog.rs, len: 148332, maxPasses: 5000, _maxDocsInCapped: 2147483647, nrecords: 347811, datasize: 10035458332
[14:25:32] <bartzy> kali: Thanks. Will do. I hope I won’t encounter that crash again. I fear that perhaps it was due to data corruption - is that possible..?
[14:25:46] <bartzy> The entire setup worked flawlessly for 2 years on 2.4 :|
[14:26:24] <kali> bartzy: well, there is something wrong somewhere. all bets are off
[14:26:28] <Kosch> deathanchor: kali: maybe it's related to my setup. I've two separate machines, each running mongod on localhost:27017. To get them to see each other I use a tunnel. I can imagine that confuses the replication setup, when multiple hosts all have "localhost:27017" as their name...
[14:26:45] <bartzy> kali: I didn’t find anything on that DR-102 warning.
[14:27:17] <deathanchor> Kosch: you can't do that
[14:27:55] <kali> Kosch: that would definitely confuse mongodb. replication needs meaningful hostnames for the whole setup, including client apps
[14:27:59] <deathanchor> Kosch: if you want to use localhost then you would need to set up tunnels on various ports so that the tunnels point to the right machines.
[14:28:20] <bartzy> kali, deathanchor: My app saves images as binary data in regular collections. The images are 50-300KB, not huge. Perhaps 2.6 exhibits different journal performance characteristics than 2.4, which caused some weird “uncommitted” problem?
[14:28:34] <deathanchor> Kosch: kali is also correct, your clients will never know who to talk to
[14:29:29] <bartzy> There’s another thing to note in the upgrade - I almost never got slow queries for this images collection (>100ms). Now I get a few updates that are 100ms-800ms almost every few minutes.
[14:29:48] <Kosch> deathanchor: in my case the client runs on every host, so they are talking only to db on localhost.
[14:29:52] <bartzy> that’s what caused me to think that maybe something has changed internally in 2.6 regarding performance of larger documents …
[14:31:29] <Kosch> deathanchor: but yes, I have to find another way. root cause of this situation is just the lack of ssl support in the official non-enterprise packages. I wanted to avoid rebuilding them on my own :)
[14:37:16] <bartzy> kali: About breaking the mirror - I have to shutdown the single server in order to move the local database?
[14:37:21] <bartzy> kali: Can’t I just drop it..?
[14:37:27] <bartzy> kali: I don’t want to interrupt the service
[14:37:39] <bartzy> oops. Sorry for the triple mention.
[14:39:38] <kali> bartzy: i'm not sure. if it were mine, i would follow the procedure to the letter :)
[14:41:38] <bartzy> :D
[14:41:40] <bartzy> will do
[14:42:03] <bartzy> I wonder if the 2nd option isn’t better suited for my situtation? https://docs.mongodb.org/manual/tutorial/reconfigure-replica-set-with-unavailable-members/#reconfigure-by-turning-off-replication
[14:42:19] <bartzy> I mean, that’s exactly what I did. But now there’s this part: “When possible, re-deploy a replica set to provide redundancy and to protect your deployment from operational interruption.”
[14:42:49] <bartzy> But how do I “re-deploy” when the single server WAS a replica set member…
[14:52:46] <bartzy> kali: If I stop the single server now and start it with the replSet option, will it still be “primary”? Meaning it will accept writes and reads?
[14:52:55] <bartzy> kali: Or will it become secondary and not want to promote
[14:53:32] <kali> bartzy: it will be operational when the rs.initiate() command is issued, after the start
[14:53:54] <kali> step 5 on the "breaking the mirror" procedure
[14:54:01] <bartzy> kali: so before rs.initiate() in what state is it going to be? And after, it is going to be primary?
[14:54:32] <kali> yes. after step 5, it will become primary
[14:54:44] <bartzy> kali: I think “breaking the mirror” does not apply directly to me - since I also need to set the --replSet option on the single server - it is not enabled now on the working server.
[14:55:07] <bartzy> so I need to stop the server, move the local database, AND start it with --replSet, correct?
[14:55:37] <kali> yes. and run rs.initiate() on it ASAP so it will elect itself
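
(Aside: summarizing the plan as a hedged sketch - stop mongod, move the old "local" database files out of the dbpath, restart with --replSet, then immediately in the mongo shell:)

    rs.initiate()   // the lone member elects itself primary and accepts writes again
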
[14:56:03] <bartzy> ok, and then if clients have only that one server configured - they will connect without issue? Even if they have no “replSet” configured in their connect options?
[14:56:22] <bartzy> or do I need to time a deploy for the clients so they will have replSet in their connect options as well..
[14:56:37] <kali> bartzy: the client will be fine
[14:57:01] <bartzy> OK. So only after the replica set is whole again, I will deploy the replSet option to the clients.
[14:57:23] <bartzy> kali: Last questions - one of the members is an arbiter - The procedure is the same for that as well ? (Moving the entire data directory)
[14:57:44] <kali> bartzy: yes.
[14:57:48] <bartzy> Thanks!
[14:58:00] <kali> good luck :)
[14:58:11] <bartzy> I’ll report back from the combat zone ;p
[14:58:32] <kali> it feels more like "apollo 13" to me, but whatever
[14:59:42] <bartzy> lol
[15:01:07] <bartzy> processManagement.fork: false - this is set on my configuration file. That’s fine? :|
[15:01:46] <bartzy> oh, nevermind.
[15:01:54] <bartzy> The init.d script takes care of backgrounding
[15:07:15] <bartzy> kali: Really last question - I want to run repairDatabase to see if perhaps the performance issues (not related to the replication config) are somehow related to fragmentation…
[15:07:31] <bartzy> it won’t be necessary on the newly synced member, right?
[15:07:54] <bartzy> Since it writes everything from scratch (syncs from the current working node)
[15:09:30] <kali> bartzy: no need to repair after a resync.
[15:10:09] <bartzy> kali: and the current one will need the repair right? And I can do that by promoting the new one to primary, and then repairing the current (which will be secondary)
[15:10:47] <kali> bartzy: yeah, or alternatively, promote the new node, then scratch the old one to trigger a full resync
[15:13:08] <bartzy> scratch = ?
[15:13:17] <bartzy> kali: And why do that instead of repairDatabase? any benefits?
[15:13:40] <bartzy> kali: because if I do that, will I need to change the clients again to point only to the new node..? or will the set still work..?
[15:14:01] <kali> it may be faster or slower, depending on your disk and network setup...
[15:14:28] <bartzy> OK. But it shouldn't provide a different result, right?
[15:14:58] <und1sk0> got a nasty segfault, can't find anything about it in JIRA..
[15:14:58] <kali> once you have the replica set up and running, you can have the clients configured to the replica set, including all its nodes. when one node is not available, the clients will try another one
[15:15:35] <kali> bartzy: and yeah. repair VS resync, the outcome should be identical
[15:15:48] <bartzy> kali: OK. One more question - when specifying a node with rs.add(), is it safe to use the hostname without the domain (for example, storage-02:27017 instead of storage-02.internal.prod:27017) ?
[15:16:19] <bartzy> of course the resolver configuration sets the search domain to internal.prod on all servers…
[15:16:36] <kali> bartzy: you just need to make sure everybody involved in the setup (mongodb nodes and clients) will be able to resolve these names correctly.
[15:16:50] <bartzy> OK.
[15:16:58] <und1sk0> bartzy: i use short hostnames. i think as long as your config, shard and replica set members can all resolve those names you're good
[15:17:42] <und1sk0> anyone ever see this error?
[15:17:43] <und1sk0> ReferenceError: _$jscmd is not defined#012 at _funcs9 (_funcs9:2:17) near 'if (_$jscmd("api/models/concerns/se' (line 2)#012
[15:18:09] <kali> und1sk0: you may get more help by posting to JIRA or the developer mailing list
[15:18:28] <und1sk0> kali: server project?
[15:18:44] <kali> actually, now you've shown it... it looks like client-side stuff
[15:18:57] <kali> maybe Derick can help
[15:18:58] <und1sk0> well, then the DB segfaults
[15:19:03] <kali> ha :)
[15:19:38] <und1sk0> (apologies for the paste)
[15:19:39] <und1sk0> 2015-08-18T15:03:53+00:00 rs0db2 mongod.27018[66032]: [ID 702911 local7.crit] [conn32213] Invalid access at address: 0#012
[15:19:42] <und1sk0> 2015-08-18T15:03:53+00:00 rs0db2 mongod.27018[66032]: [ID 702911 local7.crit] [conn32213] Got signal: 11 (Segmentation Fault).
[15:19:50] <und1sk0> [... stacktrace stuff ...]
[15:20:51] <kali> und1sk0: jira server, then, i guess
[15:21:12] <und1sk0> kali: thanks... i'll keep an eye on irc in case anyone has any insights
[15:21:46] <bartzy> kali: It’s syncing :D I added the arbiter first and then the second node. And now the arbiter is _id:1 and the 2nd node has _id:2. Bothers my OCD :o
[15:21:57] <bartzy> can I change that without too much hassle?
[15:22:41] <kali> bartzy: i would not dare
[15:23:01] <kali> but i don't have ocd
[15:23:10] <bartzy> Now checking that the heartbeat values are the same will be a pain
[15:23:49] <Derick> bartzy: don't change them :)
[15:23:55] <Derick> und1sk0: yes, you want the SERVER project in Jira
[15:24:07] <bartzy> Should I even check that the replication is working by comparing the heartbeat values? Or is there some better way..?
[15:24:23] <Derick> insert, and see if it replicates?
[15:24:24] <bartzy> and more “precise”
[15:24:33] <Derick> or, rs.status() will tell you whether the RS is all okay
[15:24:40] <bartzy> where
[15:24:46] <Derick> on the mongo shell
[15:25:04] <bartzy> What does it say if it’s lagging..?
[15:27:12] <Derick> nothing specific
[15:27:24] <bartzy> only the heartbeats will be different, right..?
[15:28:04] <Derick> they are likely always going to be different
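
(Aside: a quick way to verify replication along the lines Derick suggests; database and collection names are hypothetical.)

    db.getSiblingDB("test").repltest.insert({ts: new Date()})   // on the primary
    rs.status()                        // each member's stateStr/optime should look healthy
    rs.printSlaveReplicationInfo()     // reports each secondary's lag behind the primary
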
[15:28:07] <bartzy> kali: about the “scratch” method instead of repairDatabase - how would I do that ?
[15:28:18] <Derick> can you do a paste? I don't have a replica set handy to check
[15:28:29] <Derick> (paste in a pastebin, please)
[15:28:41] <und1sk0> Derick: cool, reported - https://jira.mongodb.org/browse/SERVER-20027
[15:30:46] <kali> bartzy: stop the server, rm dbpath, start it again
[15:30:57] <bartzy> kali: It will automatically resync?
[15:31:04] <kali> yes
[15:31:13] <bartzy> Cool. Thanks so much for all the help!!
[15:31:44] <kali> bartzy: well, not "rm dbpath" but "rm dbpath/*" actually. you need the directory to exist :)
[15:32:01] <bartzy> :)
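
(Aside: kali's scratch-and-resync as a hypothetical command sequence; the dbpath and init script are assumptions.)

    sudo service mongod stop
    rm -rf /var/lib/mongodb/*     # empty the dbpath but keep the directory itself
    sudo service mongod start     # the member performs an initial sync from the set
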
[15:39:53] <bartzy> kali: Is that something to be worried about? DR101 latency warning on journal file open
[15:40:08] <bartzy> Saw it in the log of the new replica member, while syncing.
[15:41:58] <bartzy> Perhaps that’s fine during resyncing since the drive is saturated?
[15:42:03] <kali> bartzy: i don't think you should worry too much about anything as long as the replication is making progress
[15:42:06] <kali> yeah
[15:42:14] <bartzy> OK
[15:42:21] <kali> your oplog is long enough anyway ?
[15:42:40] <bartzy> I don’t follow
[15:42:49] <bartzy> I know what oplog means/does, but how can I check ?
[15:43:06] <kali> something like db.printReplicationInfo() i think
[15:43:25] <bartzy> 6021MB
[15:43:34] <kali> ha ! but you've just re-created the replica set
[15:43:44] <kali> so you don't really know how much time the oplog can hold
[15:44:01] <bartzy> oh, you mean if the oplog will roll-over and the replication still didn’t finish ?
[15:44:07] <kali> yeah
[15:44:11] <bartzy> Yeah that won’t happen
[15:44:16] <bartzy> It replicates extremely fast
[15:44:21] <kali> ok
[15:44:22] <bartzy> it is pushing 700Mbps
[15:44:30] <bartzy> now it’s done
[15:46:32] <context> how can i force the MONGODB-CR auth mechanism with 3.0?
[15:47:48] <context> ahh think i found it
[15:49:55] <context> setParameter: authenticationMechanisms: MONGODB-CR still didn't work, even after wiping the old users and recreating them
[15:49:58] <context> grrrr !@#
[16:16:30] <punnie> anyone know of a tool to continuously tail the profile log into a file?
[16:17:25] <punnie> or more specifically, how to create tailable logs in the latest ruby mongo driver
[16:20:44] <retran> Wazzup
[18:32:21] <amcsi_work> does mongo not run mapReduce() if the found row count is exactly 1?
[18:36:17] <amcsi_work> how do I force mongo to call the reduce function at least once?
[18:38:13] <amcsi_work> plz help
[18:38:24] <kali> you can't
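
(Aside: background on kali's answer, per documented mapReduce behavior - reduce is never called for a key with only one emitted value, so emit values must already have the shape reduce returns; names are hypothetical.)

    var mapFn = function() { emit(this.user, {count: 1}); };   // emit reduce-shaped values
    var reduceFn = function(key, values) {
        var total = 0;
        values.forEach(function(v) { total += v.count; });
        return {count: total};                                 // same shape as emitted
    };
    db.events.mapReduce(mapFn, reduceFn, {out: {inline: 1}});
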
[19:01:05] <jpfarias> hey guys, is there a way to speed up $all queries on a field that is a list of strings, assuming the field is already indexed?
[19:26:37] <deathanchor> can I $sum several fields?
[19:28:33] <deathanchor> nevermind found $add
[20:53:21] <bogn> Hi all, when moving the data directory of a standalone node from one host to another, do I need to copy the journal and _tmp directories as well? It's also a move from 3.0.1 to 3.0.5.
[20:55:29] <bogn> Forget about _tmp, it's empty. But the journal is 1.7 GB in three files.
[21:27:00] <domo> does findAndModify block reads while finding?
[21:40:58] <jpfarias> for a collection with 110 million documents and 216GB of data, what would be a good number of shards, assuming each shard has 16GB of memory?
[21:41:45] <jpfarias> the total index size is ~20GB
[21:53:28] <morenoh149> jpfarias: well you definitely want the index in memory so at least two shards
[21:54:10] <jpfarias> morenoh149: I have 6 shards now… still doing a query on one of the indexed fields takes forever
[21:54:38] <jpfarias> it is an $all query on an indexed field, that field is a list of strings, much like a “tags” field
[21:54:54] <jpfarias> in most blog post examples
[21:59:25] <morenoh149> jpfarias: what's your shard key
[21:59:43] <jpfarias> morenoh149: it is a hashed key, a string
[22:00:13] <jpfarias> morenoh149: supposed to be well balanced among the shards
[22:00:17] <jpfarias> :-)
[22:01:34] <morenoh149> jpfarias: don't see any documentation for $all query
[22:02:10] <jpfarias> my search looks like this:
[22:02:19] <morenoh149> http://docs.mongodb.org/manual/reference/operator/query/all/#op._S_all ?
[22:02:44] <jpfarias> db.transactions.find({'tags': {'$all': ['john', 'smith']}})
[22:02:47] <jpfarias> yes
[22:02:51] <jpfarias> that operator
[22:04:25] <morenoh149> oh it's sloow http://docs.mongodb.org/manual/reference/operator/query/all/#performance
[22:07:21] <jpfarias> yeah I noticed that
[22:07:30] <jpfarias> I just didn't expect it to be *that* slow
[22:07:46] <jpfarias> even tho the collection has ~ 110 million documents
[22:08:16] <jpfarias> most queries on that field have at most 1000 documents
[22:08:35] <jpfarias> is there a way to see the counts of the index per indexed keyword?
[22:08:59] <jpfarias> like how many documents for the keyword “john”
[22:09:10] <jpfarias> for all the indexed keywords
[22:09:41] <jpfarias> other than manually looping thru the whole database
[22:09:42] <jpfarias> lol
[22:24:14] <morenoh149> jpfarias: you got a multikey index on the array field?
[22:24:18] <morenoh149> http://docs.mongodb.org/manual/core/index-multikey/
[22:24:57] <jpfarias> morenoh149: yes
[22:25:21] <jpfarias> the index was created with: db.transactions.ensureIndex({'tags': 1})
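
(Aside: jpfarias's setup restated as a hedged sketch - indexing an array field yields a multikey index, explain() shows whether the $all query uses it, and the unanswered per-tag-count question could be approximated with an aggregation, though that scans documents rather than reading an index statistic.)

    db.transactions.ensureIndex({tags: 1})                             // multikey index
    db.transactions.find({tags: {$all: ["john", "smith"]}}).explain()  // check index usage
    db.transactions.aggregate([                                        // rough per-tag counts
        {$unwind: "$tags"},
        {$group: {_id: "$tags", n: {$sum: 1}}},
        {$sort: {n: -1}}
    ])
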
[23:11:55] <energizer> Hello, new to mongodb, I hope you can help me. I have a python dictionary of Member objects, which are themselves dictionaries. I'd like to go through the dictionary and at each Member object, insert it into mongodb if its 'id' is not in the db. If its 'id' is in the db, I'd like to update the fields that are present in the dictionary. This seems like it should be easy to do, but I can't seem to figure it out.
[23:28:26] <daidoji> energizer: http://docs.mongodb.org/manual/reference/method/db.collection.update/
[23:28:36] <daidoji> set the query parameter "upsert": True
[23:32:31] <energizer> daidoji: Thanks for responding. What concerns me about that is: for Members whose 'id' values are present in the db, I don't want to erase any fields that are not present in the dictionary.
[23:32:42] <slackorama> energizer: use '$set' if you only want to update specific fields (rather than a new document).
[23:33:59] <slackorama> energizer: db.products.update({'_id': 100}, {'$set': {'xxx': 'bar'}}, {'upsert': true})
[23:37:41] <energizer> slackorama: thanks, i'll try that out
[23:39:44] <slackorama> energizer: that's from the console but porting it to python should be straightforward.
[23:39:46] <daidoji> energizer: yeah what slackorama said
[23:40:08] <daidoji> slackorama: I think that query can be used verbatim in python (as long as your variable is called 'db')
[23:40:14] <daidoji> connection variable I mean
[23:47:32] <slackorama> daidoji: yeah, I just wanted to be clear. And true -> True :)
[23:48:35] <daidoji> right
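
(Aside: slackorama's one-liner extended into the loop energizer describes, in shell syntax; the members variable and field layout are hypothetical, and as noted above the same documents port to pymongo with true/True swapped.)

    members.forEach(function(m) {
        db.members.update(
            {_id: m.id},        // match on the Member's id
            {$set: m.fields},   // update only the fields present in the dictionary
            {upsert: true}      // insert when the id is not in the db yet
        );
    });
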