#mongodb logs for Wednesday the 9th of July, 2014

[04:21:00] <Guest57580> Is there any way to create/use a 2dsphere index on coordinate points outside of [-180, 180]?
[04:52:51] <Guest57580> Any ideas why a query ( db.col.find({loc: {$near: [0, 0], $maxDistance: 50}, status: true}); ) scans all documents when it returns 0 results, given the following index: {loc: '2d', status: 1}?
[07:14:24] <gh2> Hi guys. I already have a collection and I want one of its fields to be unique, so I guess I have to create a unique index. I would run db.collection.ensureIndex( { "product-url": 1 }, { unique: true } ), where product-url is the field I want to be unique. Is that all? Does it create the index for an already existing db? And is it a one-time command, or do I have to run it periodically?
[07:17:09] <Aartsie> gh2: Hi! You only have to run that command once; MongoDB builds the index over the existing documents and then keeps it up to date whenever you insert or change a document
[07:19:49] <gh2> nice! thank you :)
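[A minimal sketch of the command discussed above, in the mongo shell; the collection name is the generic one from the question, and the hyphenated key has to be quoted:

    // Builds the index over the documents already in the collection and
    // enforces uniqueness from then on; run it once per collection.
    db.collection.ensureIndex({ "product-url": 1 }, { unique: true })
    // Note: if existing documents already contain duplicate values for the
    // field, the index build fails until the duplicates are cleaned up.]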
[11:23:35] <rasputnik> we've got a single shard. do we still need a balancer to be running?
[11:26:24] <kali> rasputnik: the balancer will trigger chunk splitting in your sharded collections, if there are some
[11:27:25] <kali> rasputnik: so if you have sharded a collection, you need the balancer to chunk them (so they can be migrated when another shard is added)
[11:27:45] <kali> rasputnik: in the other case, it will just idle harmlessly
[11:29:23] <rasputnik> kali: ah so even within a shard it'll still split up chunks?
[11:57:45] <adamcom> technically splitting is not done by the balancer
[11:58:03] <adamcom> on a single shard, the balancer will spin up, see there is nothing to do, go back to sleep
[11:58:23] <adamcom> splits are done by the mongos when it sees enough traffic hit a particular chunk
[11:58:46] <adamcom> if you have not run sh.shardCollection() on a collection, then it won't split either
[11:59:31] <adamcom> or you can start your mongos with the noAutoSplit option too if you want to be doubly sure
[11:59:57] <adamcom> once you do eventually add another shard though you will want to turn these things on
[12:08:55] <rasputnik> thanks, makes sense.
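[For reference, a rough sketch of the pieces adamcom mentions; database, collection, and key names here are placeholders, not from the log:

    // Only collections that have been sharded get split into chunks
    sh.enableSharding("mydb")
    sh.shardCollection("mydb.mycoll", { userId: 1 })
    // Splitting is driven by mongos; it can be disabled at startup:
    //   mongos --configdb cfg1,cfg2,cfg3 --noAutoSplit
    // The balancer itself can be toggled from the shell:
    sh.stopBalancer()
    sh.startBalancer()]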
[12:09:49] <rasputnik> any way to drill into what queries are causing lock contention? hitting 80% fairly regularly, i'm assuming that it's writes of some kind but not sure how to identify which are the worst offenders
[12:25:48] <rspijker> rasputnik: turn on profiling?
[12:28:28] <rasputnik> rspijker: aye, suppose so. i'll need that enabled on every replica set primary presumably?
[12:29:57] <rspijker> rasputnik: profiling is a mongod thing, yes
[12:30:36] <rspijker> and depending on what you are after, just the primaries might suffice
[12:32:04] <rasputnik> ta, i'll start with that quickly. heard there's a bit of overhead in profiling but at this point performance is ridiculously bad anyway.
[12:33:17] <ta> rasputnik, ok
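[A minimal sketch of the profiling approach rspijker suggests, run against each primary of interest; the 100 ms threshold is an arbitrary example:

    // Level 1 records operations slower than the given threshold (in ms)
    db.setProfilingLevel(1, 100)
    // Captured operations land in the system.profile collection
    db.system.profile.find().sort({ millis: -1 }).limit(10)
    // Turn it back off when done, since profiling adds some overhead
    db.setProfilingLevel(0)]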
[13:14:38] <adamcom> the lock percentage is the write lock percentage, so it's definitely writes, you just need to narrow it down - classic potential cause would be updates that are moving the document (and hence causing IO contention too, so look at IOStat also, and page faults)
[13:19:48] <rasputnik> adamcom: thanks, it's disappeared for now but i'll profile if it resurfaces (which i'm sure it will)
[13:21:56] <adamcom> rasputnik: if you don't have it already, get the cluster into MMS and get munin-node stats enabled for IO/CPU, then at least you'll have historic lock vs faults vs IO to look at without having to catch it in real time - doesn't help with the profiler, but will tell you the frequency, duration, corresponding spikes elsewhere
[13:28:16] <rasputnik> adamcom: yeah, it's in MMS; we also have fluentd catching all the logs, and we're about to put some statsd-based instrumentation on too :)
[14:13:39] <edwin_amsler> Hello room! Has anyone here ever seen a kernel leak in Mac OS X 10.8 because of Mongo?
[14:14:53] <edwin_amsler> I'm guessing it's because of the memory mapped files, but I haven't built a test app for that yet.
[14:16:54] <djlee> $addToSet doesn't support specifying a subset of fields to compare when adding an object to an array. So, what would be the best method to remove duplicates based on a specific field? I have a users collection, which contains a tokens array, which contains objects for each token for the user. I only want one user token per device, so i need "users.tokens.device" to be unique
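[One way to get that behaviour without $addToSet, sketched here as an assumption rather than anything answered in the channel: pull any existing token for the device before pushing the new one, so each device appears at most once.

    // Hypothetical document shape: { _id: ..., tokens: [ { device: "...", token: "..." } ] }
    // userId, the device string, and newToken are placeholders.
    db.users.update({ _id: userId }, { $pull: { tokens: { device: "ios-1234" } } })
    db.users.update({ _id: userId }, { $push: { tokens: { device: "ios-1234", token: newToken } } })]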
[14:20:26] <dandv> I have a query that's been taking MINUTES to run, and hasn't finished yet. Pretty sure something obvious is wrong, but I can't tell what: http://pastebin.com/t8qq7z5d
[14:21:30] <zippyzoo> Can anyone point me in the direction of how to handle a large "join" of data via two collections. One collection has ~40 million documents and the other around 5 million documents. I attempted to use map/reduce, however that takes a significant amount of time and time is critical here. Denormalizing the 40 million document collection isn't an option as the data is volatile and would require several updates across many collections.
[14:23:06] <dandv> zippyzoo: Have you looked into aggregation? It's faster than mapReduce for a number of operations.
[14:25:13] <zippyzoo> dandv, I looked into aggregation as well, however i didn't see anything about merging documents.
[14:38:48] <saml> how can I get all database and collection names?
[14:38:54] <saml> in mongo shell now
[14:39:15] <zippyzoo> dandv: to add more context; i have to filter on the 40 million document collection and the 5 million document collection to get the results
[14:39:25] <saml> i keep doing show dbs; use <db>; show collections;
[14:44:48] <dandv> saml: show collections in one DB
[14:45:39] <saml> yah i wanted to show collections for each db
[14:45:46] <saml> in a loop
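[That loop can be written directly in the shell; a small sketch:

    // Prints every collection in every database on the connected server
    db.getMongo().getDBNames().forEach(function(dbName) {
        db.getSiblingDB(dbName).getCollectionNames().forEach(function(collName) {
            print(dbName + "." + collName);
        });
    });]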
[14:57:42] <rasputnik> my lock woes are back. there's a remove op that's been sitting there for about an hour holding a lock; if I killOp it, it seems to reappear?
[14:59:05] <rasputnik> the client: field shows the mongos that sent the query, but I can't see how to track down the originating app that's sending it in
[15:35:10] <adamcom> if it's holding the lock there are probably other removes queued up behind it, so when you kill one, the next one in the queue takes its turn
[15:36:11] <rspijker> rasputnik: if it's truly sharded it might be a removal due to a migration?
[15:58:19] <rasputnik> rspijker: that's it though, it's not truly sharded, there's just one shard (long story).
[16:23:49] <Chaos_Zero> how can I update everything in an array, for example increment every value in array by 1?
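[There is no single update operator for that in the shell of this era, so one workaround, with hypothetical collection and field names, is to rewrite the array client-side:

    // Increments every element of the "counts" array in each matching document
    db.items.find({ counts: { $exists: true } }).forEach(function(doc) {
        var bumped = doc.counts.map(function(v) { return v + 1; });
        db.items.update({ _id: doc._id }, { $set: { counts: bumped } });
    });]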
[16:50:37] <circ-user-iz6yn> nick jbrinkman
[16:52:53] <jbrinkman> leave
[16:54:31] <stefandxm> lol
[17:15:02] <betty> If you have a 3 member replica set, can one of them be an arbiter?
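[Yes, a replica set member can be an arbiter; for illustration (host names are placeholders), it is just a member flagged arbiterOnly in the configuration:

    rs.initiate({
        _id: "rs0",
        members: [
            { _id: 0, host: "node1:27017" },
            { _id: 1, host: "node2:27017" },
            { _id: 2, host: "node3:27017", arbiterOnly: true }  // votes, holds no data
        ]
    })
    // or, on an existing set:
    rs.addArb("node3:27017")]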
[17:36:07] <cipher__> I just built the newest c++ driver, using scons --prefix=$HOME/mongo-client-install --dbg=on --opt=on install-mongoclient on ubuntu 14.04. I installed all the boost libraries I could too, though I'm receiving compiler errors: http://pastie.org/private/oodqqsxxvbbfbcakf6jmdq
[17:36:52] <cipher__> linker error*
[17:48:28] <cipher__> Nevermind
[18:58:39] <Hunner> Hi. How can I get the `mongo` client command to read the port/bind_ip from the config file so I don't have to pass --port all the time?
[18:58:52] <Hunner> I don't see a --config option or something similar
[19:01:48] <Hunner> I mean, if I'm running the client from the machine running the daemon, it should "just work" imho
[19:03:37] <Hunner> /etc/mongorc.js doesn't seem to have this ability either
[19:24:20] <rasputnik> just discovered we're running with smallfiles enabled on our replica sets. about 10 DBs, using around 400Gb total. worth fixing?
[19:25:15] <rasputnik> *400Gb per replica
[19:26:49] <cheeser> not sure you can change that after the fact, can you?
[19:26:59] <cheeser> you'd have to dump/restore, iirc
[19:28:38] <rasputnik> cheeser: it's a replica set, so maybe i can change config on a 2ndary and resync
[19:29:34] <cheeser> possibly, yeah.
[19:37:11] <rasputnik> one other question then - if i have a second block device free in each replica, am i better off to use it for the oplog or the journal?
[19:41:16] <rasputnik> i'm going to say journal
[19:49:37] <dandv> Can anyone spare a second and look at a query that's been taking MINUTES to run? Pretty sure something obvious is wrong, but I can't tell what: http://pastebin.com/t8qq7z5d. The collection only has <400k records.
[20:51:52] <saml> hey, I have docs {"url": full url} how can I get distinct domains ?
[20:52:01] <saml> db.pages.mapReduce(..., ..( ?
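[One possible shape for that map-reduce; the regex-based host extraction is an assumption about what the url field looks like:

    db.pages.mapReduce(
        function() {
            // emit the host part of the URL as the key
            var m = this.url.match(/^https?:\/\/([^\/]+)/);
            if (m) { emit(m[1], 1); }
        },
        function(key, values) { return Array.sum(values); },
        { out: { inline: 1 } }
    )
    // each result key is a distinct domain, with a count of matching pages]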
[21:15:35] <tscanausa> future note: sometimes mongo is slow because mongos does not have enough bandwidth
[22:16:09] <rgarcia_> has anyone ever seen a bunch of SocketException ... [SEND_ERROR] errors in EC2/VPC before?
[22:16:37] <rgarcia_> our production primary is acting up today with a number of these errors
[22:32:11] <dandv> Can anyone spare a second and look at a query that's been taking MINUTES to run? Pretty sure something obvious is wrong, but I can't tell what: http://pastebin.com/t8qq7z5d. The collection only has <400k records.
[22:38:58] <dawik> have you compared it to find() and iterating the cursor with hasNext()/.next()?
[22:39:03] <dawik> that's how i got semi-decent results..
[22:39:31] <kali> dandv: an index on { pubDate:-1, title:1 } will probably help... but you may want to check out your selector too... this $or and $in on empty array will not help
[22:41:24] <dandv> dawik: interesting, let me try. kali: I do in fact have such a compound index, "key": { "pubDate": -1, "title": 1 }
[22:42:18] <kali> dandv: show us an explain() on your slow query then
[22:43:16] <kali> dawik: that's weird.
[22:44:43] <dawik> kali: hmm? i only compared it to the toArray() method
[22:46:05] <kali> dawik: ha. that's understandable. toArray() will fetch the full dataset and wait till it has it all, while iterator methods will process it as a stream
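[To make the difference concrete, a small sketch; the query and collection names are placeholders:

    // toArray() materializes the entire result set before returning anything
    var all = db.content.find(query).toArray();

    // iterating the cursor processes documents as they stream in
    var cur = db.content.find(query);
    while (cur.hasNext()) {
        var doc = cur.next();
        // handle doc here
    }]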
[22:46:26] <kali> argentina.
[22:46:36] <kali> ok.
[22:47:05] <dandv> explain() still running...
[22:48:31] <kali> dandv: are you trying to achieve something useful with this complex $or of $in's? because unless i'm way drunker than i think i am, it's basically a noop, it does not exclude anything
[22:49:01] <kali> s/drunker/more drunked/ wow.
[22:49:28] <dandv> kali: it's called programmatically that way sometimes (which I'd better fix)...but normally there should be some values passed to the $in's
[22:49:50] <dandv> nYields is in the hundreds, is that normal?
[22:50:21] <kali> yeah
[22:52:41] <dandv> here's the explain: http://pastebin.com/wWaWc6bd
[22:54:08] <kali> dandv: ok. so you're basically pulling the whole table in this case. is this what you intend to do?
[22:54:45] <kali> dandv: because most of the time, what you need is 1/ a count 2/ the first 30 lines (or so)
[22:55:01] <kali> dandv: and that makes a hell of a difference from a database point of view
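[A sketch of the pattern kali describes, with the page size of 30 picked arbitrarily and the selector left as a placeholder:

    // 1/ the count
    db.content.find(selector).count()
    // 2/ just the first page, served from the { pubDate: -1, title: 1 } index
    db.content.find(selector).sort({ pubDate: -1, title: 1 }).limit(30)]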
[22:55:30] <dandv> so it scans everything, but why is that so incredibly slow? db.content.find({nonexistent: 5}).sort({pubDate: -1, title: 1}).limit(100).explain() is much faster, http://pastebin.com/FuCMsK5h
[22:56:25] <dandv> kali: do I really pull the whole table if I have .limit(100).explain() at the end of both queries?
[22:56:45] <dandv> (both pastebin'ed explains above have .limit(100))
[22:57:19] <kali> well now there is a criteria, so the index no longer works
[22:57:36] <kali> well it works, but in a less efficient way
[22:58:07] <kali> this time, the right one would be { nonexistent: 1, pubDate: -1, title: 1 }
[22:59:25] <kali> ha, no, you're saying this one was actually faster
[22:59:28] <kali> sorry
[23:00:51] <kali> dandv: you may want to try without the degenerate selection criteria. it makes no sense, so i'm starting to wonder if the optimizer is getting confused
[23:02:06] <kali> and I hope somebody in a more compatible TZ will jump in to help now, i'm off for tonight.
[23:41:21] <dandv> thanks kali :(