PMXBOT Log file Viewer


#mongodb logs for Thursday the 30th of June, 2016

[00:00:09] <bjpenn> how do we tell if the query is using an index?
[00:00:15] <bjpenn> i found A LOT of queries taking over millions of ms
[00:04:10] <bjpenn> im not sure if we index the rows because they get updated pretty often
[00:04:23] <bjpenn> everytime it gets updated, the index needs to be recreated?
[00:04:32] <UberDuper> The schema changes?
[00:04:41] <bjpenn> not the schema
[00:04:45] <bjpenn> just new data gets added in
[00:04:47] <bjpenn> very often
[00:04:57] <bjpenn> data gets added in, removed, edited, etc
[00:05:05] <UberDuper> If you index a field, every insert/update gets updated in the index realtime
[00:05:17] <bjpenn> i guess it doesnt matter, because indexing fields will just make sure all those fields get stored on memory
[00:05:42] <bjpenn_> got disc'd
[00:05:46] <UberDuper> If you index a field, every insert/update gets updated in the index realtime
[00:05:50] <bjpenn_> last thing i said was "not schema"
[00:05:57] <bjpenn_> ahh ok
[00:06:06] <UberDuper> You don't have to rebuild indexes
[00:06:18] <bjpenn_> i guess its telling the db to store all data that goes into this indexed field, into ram or some type of faster storage
[00:06:30] <bjpenn_> cool
[00:06:38] <bjpenn_> let me see if these queries are using indexes then
[00:06:52] <UberDuper> What version of mongo?
[00:06:59] <kurushiyama> UberDuper 'IGNORECASE = 1; /^(writeback|lockping).*ms$/{...}' ?
[00:07:08] <UberDuper> :D
[00:07:29] <bjpenn_> 2.6.9
[00:07:50] <UberDuper> Each log entry should say IXSCAN or COLLSCAN
[00:07:55] <kurushiyama> UberDuper Not sure about the sort part, though.
[00:08:08] <UberDuper> IXSCAN means it used an index. And then it'll list the indexes used.
[00:08:17] <UberDuper> COLLSCAN means no index.
[00:09:03] <UberDuper> If you have slow queries that say IXSCAN, check the nscanned value. If it's a large number, your index (or query) is no good.
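
    For reference, a minimal sketch of checking this from the shell instead of the logs (collection and field names here are hypothetical); on a 2.6 server the legacy explain() output exposes the same information the log lines do:

        // Hypothetical collection and field:
        db.orders.find({ userId: 12345 }).explain()
        // Legacy (2.6) explain fields to look at:
        //   "cursor"   : "BtreeCursor userId_1"  -> an index was used (IXSCAN in the logs)
        //   "cursor"   : "BasicCursor"           -> collection scan (COLLSCAN in the logs)
        //   "nscanned" : index entries examined; ideally close to "n", the number of docs returned
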
[00:09:57] <kurushiyama> Though the latter does not necessarily mean that the queried fields were not in an index. Order matters, for example. an index with {foo:1,bar:1} and a query of {bar:"a",foo:15} should result in a collscan.
[00:10:16] <UberDuper> I thought that didn't happen anymore.
[00:10:29] <kurushiyama> UberDuper I _think_ not in 3.x
[00:10:50] <kurushiyama> UberDuper But it was covered in m102 for 2.6
[00:11:07] <UberDuper> Maybe a 3.x thing yeah.
[00:11:46] <kurushiyama> But as a rule of thumb, "order matters" still has its worth. ;)
[00:11:56] <UberDuper> Indeed
[00:12:57] <kurushiyama> As does "only one index per query" ;)
[00:13:08] <UberDuper> Well...
[00:13:38] <kurushiyama> I had more pain with intersections than with that rule.
[00:13:46] <UberDuper> That may be ideal but generally not practical.
[00:13:51] <UberDuper> From what I've witnessed.
[00:17:10] <kurushiyama> Well, you trade RAM for performance. One of them is easy to deal with.
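
    A small sketch of the "order matters" rule of thumb with a hypothetical compound index: the index can serve queries on its prefix fields, but not on a trailing field alone:

        // Hypothetical compound index (ensureIndex on older shells):
        db.things.createIndex({ foo: 1, bar: 1 })

        // Can use the index (foo is a prefix):
        db.things.find({ foo: "a" })
        db.things.find({ foo: "a", bar: "b" })   // in recent versions the field order in the
                                                 // query document itself does not matter
        // Cannot use it efficiently (bar alone is not a prefix):
        db.things.find({ bar: "b" })
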
[00:20:23] <UberDuper> Time to go. Take care.
[00:20:32] <UberDuper> bjpenn_: Good luck.
[00:32:47] <bjpenn_> UberDuper: thanks!
[00:33:41] <bjpenn_> UberDuper: 2334096 nscanned value
[00:33:44] <bjpenn_> thats pretty high right?
[00:33:51] <bjpenn_> or is that a normal range
[01:18:09] <UberDuper> bjpenn_: That's very high
[01:18:28] <UberDuper> In a perfect world it'll be equal to the number of docs returned.
[02:27:49] <UberDuper> bjpenn: Dunno if you got my last message. But that's a very high nscanned.
[07:53:57] <nohitall> if I dont specify a database with mongodump do I get everything?
[07:55:47] <joannac> try it and see!
[07:56:58] <nohitall> well ok yea, but somehow I fail at importing
[07:57:53] <nohitall> mongorestore or mongoimport
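
    A minimal command-line sketch (paths and database name hypothetical): without --db, mongodump dumps all databases, and mongorestore (not mongoimport, which is for JSON/CSV/TSV files) reads the dump back:

        # dump everything (no --db given) into ./dump
        mongodump --host localhost --port 27017 --out ./dump

        # restore a single database from that dump
        mongorestore --host localhost --port 27017 --db mydb ./dump/mydb
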
[08:47:48] <chujozaur> Hi
[08:48:09] <chujozaur> Is there anybody willing to help me with mongo replication
[08:48:11] <chujozaur> ?
[08:49:03] <chujozaur> I made two hosts join replicaSets, and load avg is ~300/400 there
[08:49:48] <chujozaur> I see many iowaits and many faults on newly spawned secondaries
[08:50:02] <chujozaur> but I have no idea how to mitigate them
[09:36:16] <MahaDev> Hi, please help me on this query - db.getCollection('users').find({"_id" : ObjectId("57502418391ebf041c94657b")}, {"timezone" : {$exists: true}}) What is wrong with this ? I am getting "Unsupported projection option: timezone: { $exists: true }" error
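
    The error above comes from putting $exists in the projection (second argument); $exists is a query operator and belongs in the filter. A minimal sketch of the two things probably intended:

        // Match only if the field exists, return the whole document:
        db.getCollection('users').find({ "_id": ObjectId("57502418391ebf041c94657b"),
                                         "timezone": { $exists: true } })

        // Or keep the filter as it was and simply project the field:
        db.getCollection('users').find({ "_id": ObjectId("57502418391ebf041c94657b") },
                                       { "timezone": 1 })
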
[09:38:35] <sumi> hello
[09:42:34] <MahaDev> Hello sumi
[10:41:14] <Mmike> Hi, lads. I have 3 node replicaset, and two nodes went down. The routers in the racks where those two nodes reside are being replaced
[10:41:22] <Mmike> now the remaining one is SECONDARY and I can't really use it
[10:41:36] <Mmike> so I am going to reconfigure replicaset
[10:41:54] <Mmike> when I do so the remaining node will become PRIMARY but with only one member
[10:42:33] <Mmike> when they fix routers, what's going to happen with my replicaset? I need to manually kill them and re-add them, right?
[10:43:26] <kurushiyama> Mmike Wow, wait a sec
[10:44:13] <pamp> Hey
[10:44:48] <kurushiyama> Mmike Is this happening NOW or is it planned for the future?
[10:45:10] <Mmike> this is happening now
[10:45:10] <pamp> Its normal a query in a indexed field (hashed index) make a collscan in the db?
[10:45:14] <Mmike> I have only one working unit
[10:45:25] <Mmike> and two remaining ones will be working in like 4-5 hours, but I need my mongo now
[10:45:42] <kurushiyama> pamp Wait a sec please, Mmike has a more pressing matter.
[10:45:57] <kurushiyama> Mmike Give me a sec
[10:46:02] <Mmike> kurushiyama: sure thing
[10:48:04] <kurushiyama> Mmike Well, the problem with reconfig is that you'd have to do an initial sync (most likely) when the maintenance is done.
[10:48:41] <Mmike> yup, that is not the issue
[10:48:59] <Mmike> but when network is reestablished, will the two nodes that were cut off form their own replicaset?
[10:49:14] <Mmike> because they will have quorum
[10:49:42] <kurushiyama> Mmike Yes.
[10:50:09] <kurushiyama> Mmike From their POV, it could well be just a network partition that happened.
[10:50:44] <kurushiyama> Mmike You can not shut them down or sth?
[10:50:46] <Mmike> kurushiyama: ok, so I need to make sure that my apps do NOT try to connect to those two, but just to the one that's in fresh replicaset
[10:51:12] <kurushiyama> Mmike What "fresh" replica set? You are talking of the remaining node?
[10:51:26] <Mmike> yes - so, I have nodes: A, B and C
[10:51:30] <Mmike> A and B lost network
[10:51:41] <Mmike> C is reconfigured to become PRIMARY in the new 1-node replicaset
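
    A sketch of one way that reconfiguration can be done (member index hypothetical, depends on the actual config); with the majority unreachable, rs.reconfig() on the surviving node needs the force option:

        // On node C (currently SECONDARY, majority unreachable):
        cfg = rs.conf()
        cfg.members = [ cfg.members[2] ]        // keep only node C
        rs.reconfig(cfg, { force: true })       // forced reconfig is required without a majority
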
[10:51:44] <kurushiyama> Mmike Was the remaining node primary as when the network was killed?
[10:53:09] <kurushiyama> Mmike BTW, i'd beat my ops team quite a bit.
[10:54:26] <Mmike> kurushiyama: don't know, could be. But as it lost quorum it shifted to secondary
[10:55:00] <Mmike> well
[10:55:09] <Mmike> network gear dies sometimes
[10:55:23] <kurushiyama> Mmike Ok, that might lead to rollbacks. Be aware of that.
[10:55:31] <Mmike> I know
[10:56:03] <kurushiyama> Mmike Well, better to be told when you know that than when you don't ;)
[10:56:06] <Mmike> The situation i need to be sure to avoid is to have clients writing to the old replicaset, when networking is fixed
[10:56:17] <Mmike> kurushiyama: so true :)
[10:56:19] <kurushiyama> Mmike Shoot the nodes in the head?
[10:56:52] <kurushiyama> Mmike Tell the Ops team to unplug them, if necessary.
[10:57:27] <kurushiyama> Mmike or at least one of them
[10:57:34] <Mmike> I firewalled them on the client machines
[10:57:46] <Mmike> so clients can connect only to the remaining, running node
[10:57:50] <Mmike> which is now primary
[10:58:03] <kurushiyama> Mmike Well, the remaining node still has the same IP, I guess ;)
[10:58:17] <kurushiyama> Mmike And/or same hostname
[10:58:41] <Mmike> so when networking goes back on, i'll connect to each of the ones that were unreachable, shut down mongo, delete datadir. Then I will fire up mongo, and do rs.add() on the always-working primary
[10:58:43] <Mmike> that should do it
[10:58:50] <kurushiyama> Mmike NO
[10:59:03] <kurushiyama> You do NOT delete the datadir
[10:59:07] <kurushiyama> The rollbacks would be lost
[10:59:26] <Mmike> SMART
[10:59:30] <Mmike> I'll backup the datadir
[10:59:44] <Mmike> but hey
[10:59:52] <Mmike> I did have write_concern set to 3
[11:00:04] <Mmike> so I should have all the data on all 3 servers
[11:00:06] <Mmike> before the split
[11:01:03] <kurushiyama> Mmike Ok. Side note: With that WC, you eliminate failover capabilities. Majority should do well. But in this case, no rollback
[11:01:19] <Mmike> yes, and that's what I need
[11:01:30] <Mmike> I had serious issues with rollbacks not being noticed properly
[11:01:43] <kurushiyama> Mmike Well, I'd make sure and firewall off the current primary as well.
[11:02:03] <kurushiyama> Mmike use readConcern, then.
[11:02:15] <Mmike> like, have PRIMARY disconnect before it replicated data - remaining ones voted a new primary, and when the old primary joined it was like 'ooo, we have a primary, my oplog is no longer valid, let's take stuff from the new primary'
[11:02:33] <kurushiyama> Mmike And if you need data on 3 nodes while retaining failover, 5 is more suitable to your needs.
[11:03:41] <kurushiyama> Mmike Assuming you have firewalled the 2 nodes basically off everything, so that they are isolated, then you should be good.
[11:04:04] <kurushiyama> Mmike With a write concern of majority, rollbacks should not happen anyway.
[11:04:25] <kurushiyama> Mmike Well, except 2 nodes fail simultaneously ;)
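
    For reference, a sketch of what a majority write concern looks like on an insert (collection and document hypothetical):

        db.events.insert(
            { msg: "example" },
            { writeConcern: { w: "majority", wtimeout: 5000 } }
        )
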
[11:04:51] <kurushiyama> Mmike Lessons learned: put one of those 2 members somewhere else.
[11:05:01] <Mmike> they are in separate racks
[11:05:07] <Mmike> 3 servers, 3 racks
[11:05:12] <Mmike> I had percona cluster there too
[11:05:28] <Mmike> but it's a bit easier to deal with percona (or I'm just more experienced with that)
[11:05:48] <kurushiyama> Well, me too.
[11:06:03] <kurushiyama> Mmike As per consistency, you are aware of readConcern?
[11:06:45] <kurushiyama> Mmike And I was more referring to "as little of shared infrastructure as possible". One router killing two nodes is a bad sign.
[11:08:42] <Mmike> kurushiyama: no, no - two switches died :)
[11:08:51] <Mmike> each rack has a switch
[11:09:03] <Mmike> and those are connected to the main super-switch something
[11:09:10] <Mmike> so if you lose a switch, you lose a rack
[11:09:19] <Mmike> so you plan stuff so that all the HA stuff is never in the same rack
[11:09:27] <Mmike> like, no point in having 3 mongod nodes in the same rack
[11:10:04] <kurushiyama> Mmike Well... I know that. Was not really sure whether you did ;)
[11:10:40] <kurushiyama> Mmike Seriously, have a look at readConcern. And I would contemplate increasing the replication factor.
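
    A sketch of a majority read (3.2+ only, and majority read concern has to be enabled on the server; collection and filter hypothetical), using the command form, which works regardless of shell helpers:

        db.runCommand({
            find: "events",
            filter: { msg: "example" },
            readConcern: { level: "majority" }
        })
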
[11:13:58] <Mmike> kurushiyama: will do - want to chat about that more, just need to finish stuff here first :)
[11:14:02] <Mmike> kurushiyama++ for all the help
[11:14:14] <kurushiyama> Mmike Thanks, but no kudos here ;)
[11:14:53] <Mmike> i see :)
[11:18:09] <kurushiyama> Mmike But I take sweets ;)
[11:19:38] <enelar> Guys, i am getting oom while querying a lot of data through php->mongofill. As far as i understand, upon the ->find command i should get nothing more than an empty iterator (ie a cursor). What is the memory being allocated for?
[11:20:19] <enelar> Is it a bug in mongofill (library name), or does mongodb actually fetch data into the client on ->find?
[11:21:26] <kurushiyama> enelar Sounds like the OS the mongod is running on is configured wrongly.
[11:21:55] <kurushiyama> enelar Wait, too early in my day: your query gets an OOM or the mongod?
[11:26:03] <enelar> kurushiyama, i am not quite sure whats happening. but i am getting OOM inside the client (which is PHP right now). This happens because the database is returning _BSON_ data, which php is trying to store in memory
[11:27:06] <enelar> And my question is: is that expected? Or is it just a badly coded library? (I am using https://github.com/mongofill/mongofill)
[11:28:20] <enelar> Let me put it another way. Does collection->find() work similarly to an `sql select`?
[11:29:42] <thapakazi> Hi there, does db.setProfilingLevel(2) require a restart, and is there any better way to get insight into the types of queries mongo is getting/processing over a span of time like 6 months or less? Thanks
[11:31:17] <kurushiyama> thapakazi What is your current profiling level?
[11:32:51] <thapakazi> "current profiling level?" kurushiyama it was 0 and I am trying 1 and 2, but nothing's happening :(
[11:33:31] <kurushiyama> enelar Well, knowing poo about PHP and your data, I need a little more info. A find can potentially return huge datasets. Say you have a DB of 100k docs, and each is 10k in size – do the math. Now imagine you try to store that in an array.
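
    The shell-level version of that point, as a sketch (collection and filter hypothetical): iterate the cursor, which fetches batches lazily, instead of materializing the whole result set into an array:

        // Streams through results batch by batch:
        db.bigcoll.find({ size: { $gt: 0 } }).forEach(function (doc) {
            // process one document at a time
            printjson(doc._id);
        });

        // Pulls the entire result set into memory at once -- this is what causes an OOM:
        var all = db.bigcoll.find({ size: { $gt: 0 } }).toArray();
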
[11:36:10] <kurushiyama> thapakazi I am not sure. I guess I restarted the mongod (since I usually change that stuff during maintenance). No problem if you have a replset: do it on the secondaries, restart them (one by one!), have the primary step down, set the level there, restart.
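
    For what it's worth, a sketch of driving the profiler from the shell; it is a per-mongod, per-database runtime setting, and the slowms threshold here is hypothetical:

        // Profile only operations slower than 100 ms on the current database:
        db.setProfilingLevel(1, 100)

        // Check the current setting:
        db.getProfilingStatus()

        // Inspect what was captured:
        db.system.profile.find().sort({ ts: -1 }).limit(5).pretty()
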
[11:37:53] <pamp> Hey, is it normal with WiredTiger to have such high values of locks --> http://dpaste.com/08D97RV
[11:38:22] <kurushiyama> pamp That has little to do with WT.
[11:38:47] <kurushiyama> pamp Looks like a collscan
[11:39:55] <kurushiyama> pamp You made sure that the according query uses indices?
[11:41:33] <pamp> yes
[11:41:40] <pamp> I have all fields indexed
[11:41:46] <pamp> with hashed index
[11:42:51] <pamp> I thought that with WiredTiger We dont have global locks
[11:45:33] <kurushiyama> pamp Not global ones. But document level ones. And having all fields indexed does not say much. Say you have separate indices on `foo` and `bar`, but your query is {foo:"a",bar:"b"}: only the foo index will be used, and then the bar condition needs to be checked in the docs found via the foo index.
[11:46:13] <pamp> hmm
[11:46:15] <pamp> thanks
[11:46:28] <pamp> I will analyse that
[11:47:28] <kurushiyama> pamp While not absolutely true, a good rule of thumb is "Only one index will be used per query."
[11:48:13] <kurushiyama> pamp As for the "not absolutely true" part: https://docs.mongodb.com/manual/core/index-intersection/
[11:50:15] <kurushiyama> pamp The problem with intersections is that you can not blindly rely on them – you have to investigate.
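
    A sketch of the fix implied above (field names from the example, collection hypothetical): replace reliance on two single-field indexes with one compound index matching the query shape:

        // Instead of relying on intersection of two single-field indexes:
        db.items.createIndex({ foo: 1 })
        db.items.createIndex({ bar: 1 })

        // ...a compound index covers the whole predicate with a single index:
        db.items.createIndex({ foo: 1, bar: 1 })
        db.items.find({ foo: "a", bar: "b" })   // uses the compound index
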
[11:50:50] <pamp> another question... is it good to insert 1000 docs in 120 millis? my docs have an average size of 3384 bytes
[11:51:01] <pamp> about the indexes thanks for your inputs
[11:51:22] <pamp> I'll investigate
[11:51:51] <kurushiyama> pamp You might want to rethink having the indices hashed. Usually, that only makes sense if you want to use a monotonically increasing value as a shard key.
[11:52:30] <kurushiyama> Doing the math on your other question: that is 120µs/doc
[11:52:41] <enelar> I answered my question https://github.com/mongofill/mongofill/blob/c2a16b7cb6c8fa94aa4c286acd8b4a90c3a843f5/src/MongoCursor.php#L255
[11:52:44] <kurushiyama> For 3kb
[11:52:57] <enelar> Looks like everything is designed well. Will dig through my code
[11:53:02] <pamp> Its a good value?
[11:53:37] <kurushiyama> enelar My assumption is that there is sth like a toArray.
[11:53:57] <kurushiyama> pamp Well, in my book it is. Are you doing this with bulk ops?
[11:54:23] <enelar> kurushiyama, yes, but it fetches 100 documents at a time by default, which is ok
[11:54:37] <kurushiyama> enelar Stop
[11:54:41] <kurushiyama> enelar Batch size has nothing to do with that.
[11:54:43] <enelar> i could handle it now
[11:54:45] <pamp> yes bulk ordered insert
[11:54:46] <enelar> hm?
[11:54:54] <kurushiyama> enelar Say you have 1M docs matching.
[11:55:20] <kurushiyama> enelar If you store the content of the cursor in an array, those will be loaded.
[11:55:30] <enelar> ofc, i do not doing this
[11:55:49] <kurushiyama> enelar Yes, in batches of 100, but still, you end up with an array of 1M entries.
[11:55:54] <kurushiyama> enelar Well, just saying.
[11:56:00] <enelar> batch size means "how many documents, at most, the cursor stores inside its internal structure"
[11:56:08] <kurushiyama> pamp Does it need to be ordered?
[11:56:24] <enelar> but my script fails right at $res = collection->find();
[11:56:26] <kurushiyama> enelar I am _totally_ aware of what batch size means ;)
[11:56:30] <thapakazi> anything I missed, my chat logs are truncated :( kurushiyama ? regarding the query profiling ?
[11:57:04] <enelar> kurushiyama, i am glad we are synchronized. thanks for assistance. i will handle my foolishness)
[11:57:07] <kurushiyama> enelar That is what I said: I do not know how PHP works, but this looks to me like the whole result set is stored into $res
[11:57:30] <enelar> nah, it should work like reference in c++
[11:57:37] <kurushiyama> thapakazi Go to the logs, I wrote an answer
[11:57:46] <kurushiyama> thapakazi http://irclogger.com/.mongodb
[11:57:46] <enelar> or var a = ... in javascript. whatever
[11:57:51] <pamp> its not a requirement
[11:57:51] <thapakazi> hehe, so rude
[11:58:10] <enelar> s/reference/shared_ptr
[11:58:10] <thapakazi> thanks kurushiyama
[11:58:13] <kurushiyama> thapakazi Sorry, was by no means meant as rude.
[11:58:22] <pamp> but i made some tests, and I didnt see too much difference between ordered vs unordered
[11:58:52] <kurushiyama> pamp It will when you scale. And if you are just doing inserts, there is no benefit in ordered.
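
    A sketch of an unordered bulk insert with the 2.6+ Bulk API (collection and documents hypothetical); unordered lets the server continue past individual failures and parallelize more freely:

        var bulk = db.items.initializeUnorderedBulkOp();
        for (var i = 0; i < 1000; i++) {
            bulk.insert({ seq: i, payload: "..." });
        }
        bulk.execute();
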
[11:59:54] <thapakazi> no worries,I am starting with secondary, ok I will try restart it
[12:00:38] <kurushiyama> thapakazi Depending on your load, be prepared to get _huge_ log files.
[12:01:08] <kurushiyama> enelar Well, if you say so, I have to take your word for it.
[12:01:55] <thapakazi> anything built in for rotating _huge_ log files, or should I use logrotate from the system?
[12:01:58] <kurushiyama> enelar But actually the circumstances seem to back me.
[12:02:25] <kurushiyama> thapakazi use logrotate, but do not forget to SIGUSR1 mongod first.
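
    A sketch of what that looks like in practice (paths and the rotated filename are hypothetical); SIGUSR1, or the logRotate command, makes mongod start a new log file so the old one can be compressed:

        # Ask mongod to rotate its log file:
        kill -SIGUSR1 $(pidof mongod)

        # Equivalent from the shell:
        #   db.adminCommand({ logRotate: 1 })

        # Then compress the rotated file, e.g.:
        bzip2 /var/log/mongodb/mongod.log.2016-06-30T12-00-00
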
[12:02:36] <thapakazi> thanks, I shall
[12:03:07] <kurushiyama> thapakazi And you definitely want to use bzip ;)
[12:03:57] <thapakazi> hehe, thanks again
[12:04:14] <kurushiyama> thapakazi Np.
[12:08:36] <thapakazi> hey, is there no alternative to a restart? with a restart I have to warm up the huge dataset again :(
[12:09:20] <thapakazi> I wish sth like config reload in nginx existed
[12:20:10] <thapakazi> lucky me, my team has mms installed in place :)
[12:28:36] <goku147> if i change the mongoose schema and i add a lot of new fields, how do i make the previously entered documents also have the same fields?
[12:35:12] <kurushiyama> goku147 That process is called "data migration" and you have to do it.
[12:35:47] <kurushiyama> goku147 Neither mongoose nor MongoDB can guess the data for said fields, right?
[13:33:08] <goku147> @kurushiyama YEs
[13:33:22] <kurushiyama> goku147 ?
[13:33:47] <goku147> kurushiyama: Yes, it doesn't allow me to!
[13:34:21] <kurushiyama> goku147 But I guess those new fields need to have a value of some sort?
[13:34:47] <goku147> yes
[13:34:59] <goku147> is it possible?
[13:35:34] <goku147> it would be wonderful if i can do it in the mongoose schema itself, because i have loads of data!
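
    A sketch of that migration from the shell (collection, field name, and default value are all hypothetical); mongoose defaults only apply to documents that pass through mongoose, so existing documents need an explicit backfill:

        // Backfill a new field on all documents that don't have it yet:
        db.users.update(
            { newField: { $exists: false } },
            { $set: { newField: "some-default" } },
            { multi: true }
        )
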
[13:39:29] <scruz> what should {$eq: [field, ‘4’]} return if: (1) field is an empty string, (2) if field is null?
[13:41:05] <scruz> my pipeline seems borked and is returning results inconsistent with direct querying
[13:51:10] <scruz> here’s part of my $project stage: https://bpaste.net/show/a0b7f3973bd6
[13:54:13] <scruz> when i query directly on the database, there are no records with status set to 4, but i am getting results in the ‘flagged and verified’ bucket
[13:54:23] <scruz> s/4/‘4’
[13:55:39] <scruz> simply put, the projection is supposed to assign each record into a bucket depending on the values of $status and $value.
[13:55:51] <scruz> but i’m getting records assigned to wrong buckets
[14:10:29] <logix812> during a findAndModify for a single document is $isolated implied?
[14:11:02] <logix812> aka: if 2 clients both issue a findAndModify, the first one there wins and the other waits?
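
    A sketch of that pattern (collection and fields hypothetical); findAndModify reads and updates a single document atomically, so two concurrent callers cannot both claim the same document:

        db.jobs.findAndModify({
            query:  { state: "queued" },
            sort:   { created: 1 },
            update: { $set: { state: "running", owner: "worker-1" } },
            new:    true          // return the modified document
        })
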
[14:12:37] <scruz> sorry, i cleaned it up a bit: https://bpaste.net/show/4c5ff55a8d8a
[14:13:22] <saml> what's that
[14:14:27] <scruz> i don’t have any records with status set to 4, but i keep finding records in the “flagged and verified” bucket whenever i run the aggregation containing the last paste
[14:14:54] <scruz> so i’m wondering what’s wrong with my projection
[14:16:42] <saml> db.docs.find({status:4, value:2})
[14:17:05] <scruz> saml: it returns 0
[14:17:14] <saml> good
[14:17:19] <scruz> no, that’s not a query, it’s part of an aggregate pipeline
[14:17:39] <saml> i haven't seen bucket, if, then, ...
[14:17:44] <scruz> i then $group by bucket and $sum, and that’s where the issue is
[14:17:46] <saml> that's normal mongodb aggregation?
[14:17:54] <scruz> agg framework
[14:18:27] <saml> db.collection.aggregate() ?
[14:18:30] <scruz> yes
[14:19:32] <saml> is that the full aggregation?
[14:19:56] <trudko> hi everyone, I am playing with mongo and I've found query like this http://pastie.org/10895500 which seems extremelly complicated syntax wise
[14:19:58] <scruz> no, just the part of the project stage. specifically, the part that assigns the value of bucket
[14:20:25] <saml> do you ahve the full? i just wanted to try it on my own
[14:20:26] <scruz> trudko: that’s the aggregation framework
[14:20:54] <tantamount> How can I recover from "Fatal assertion 28579 NoSuchKey Unable to find metadata for table" on start?
[14:21:32] <trudko> scruz: s ok and so ?
[14:21:36] <trudko> sorry I am not familiar with it
[14:21:52] <saml> it's just syntax
[14:21:58] <scruz> trudko: what do you want to do?
[14:22:06] <saml> it's json.. so you can generate it at runtime as well
[14:22:48] <trudko> scruz: nothing in particular I am just curious if queries like this are the exception, because my first (uninformed) feeling is that this looks like complicated syntax, at least compared to what I am used to seeing in sql
[14:23:06] <scruz> trudko: again, that’s not a query
[14:23:45] <saml> queries are usually simpler since there's no join
[14:24:34] <scruz> saml: https://bpaste.net/show/447fdd11b0e1
[14:24:51] <enelar> kurushiyama, actually mongofill storing everything in memory... https://github.com/mongofill/mongofill/blob/master/src/MongoCursor.php#L512
[14:25:08] <kurushiyama> enelar Ouch.
[14:25:19] <enelar> yep. cost me 3 hours
[14:25:47] <trudko> saml: so its is something like join in sql?
[14:25:50] <scruz> saml: https://bpaste.net/show/e8255f4204c5
[14:26:48] <scruz> that previous one had a typo
[14:27:28] <saml> scruz, what are you expecting as result?
[14:27:58] <scruz> i don’t expect to find any documents in the ‘flagged and verified’ bucket
[14:28:15] <scruz> i keep finding that the count for that is nonzero
[14:29:08] <scruz> db.coll.find({value: 2, status: 4}).count() <- returns 0
[14:30:14] <scruz> kurushiyama: re: my last paste. do you see anything wrong with the pipeline?
[14:30:14] <saml> db.coll.find({value: "2", status: "4"})
[14:31:10] <scruz> saml: you’re right, but that isn’t the issue
[14:32:37] <kurushiyama> scruz Not off the top of my head. But sample docs and a sample expected output are missing.
[14:38:06] <scruz> https://bpaste.net/show/3babb7c3c4b6
[14:38:11] <tantamount> How can I recover from "Fatal assertion 28579 NoSuchKey Unable to find metadata for table" on start?
[14:39:23] <scruz> kurushiyama: added a new paste
[14:41:19] <saml> maybe there's a bug in nesting $cond like that
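
    A reduced sketch of the kind of nested $cond bucketing being discussed (field and bucket names taken from the pastes, values hypothetical). Note that $eq in the aggregation framework is type-sensitive: a numeric 4 and the string '4' compare as not equal, and a missing, null, or empty-string field also compares as not equal to '4', which is worth ruling out whenever a bucket count disagrees with a direct find():

        db.coll.aggregate([
            { $project: {
                bucket: {
                    $cond: [ { $and: [ { $eq: [ "$status", "4" ] },
                                       { $eq: [ "$value",  "2" ] } ] },
                             "flagged and verified",
                             { $cond: [ { $eq: [ "$status", "4" ] },
                                        "flagged",
                                        "other" ] } ]
                }
            }},
            { $group: { _id: "$bucket", count: { $sum: 1 } } }
        ])
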
[14:52:11] <kurushiyama> tantamount A log excerpt and a bit more detailed explanation could help.
[14:59:30] <tantamount> I don't know what more I could tell you
[15:00:27] <tantamount> I've tried running --repair and it rebuilds the indexes but when I run without repair it drops them all with a series of messages like, "dropping unused ident: admin/index-3-5150148562311891939" and then proceeds to claim that it can't find that index with a message like, "Fatal assertion 28579 NoSuchKey Unable to find metadata for table:admin/index-3-8494706625464327628"
[15:34:47] <caliculk> Hey, just curious: for ubuntu, the config file located at /etc/mongodb.conf is not in yaml, but the documentation on configuration options says that you can use the previously mentioned file as a configuration file. So does that mean that YAML is optional or required?
[15:35:17] <caliculk> Or is the /etc/mongodb.conf only used for the daemon instances?
[15:37:28] <saml> caliculk, use yaml
[15:37:58] <saml> you can specify configuration file to be used when you start mongod
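
    For reference, a minimal sketch of a YAML config file (paths and values hypothetical); packages typically install it as /etc/mongod.conf and the service starts mongod with --config pointing at it:

        # /etc/mongod.conf (YAML -- indentation matters)
        storage:
          dbPath: /var/lib/mongodb
        systemLog:
          destination: file
          path: /var/log/mongodb/mongod.log
          logAppend: true
        net:
          port: 27017
          bindIp: 127.0.0.1
        security:
          authorization: enabled
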
[15:48:51] <UberDuper> I gave up on znc forever ago. I just irssi in screen.
[15:49:28] <UberDuper> irssi has a pretty minimal znc like functionality too but it's not pleasant to configure iirc.
[15:50:26] <GothAlice> Excepting this channel just now, and the occasional net/split, I've used a bouncer for several years without issue. (I.e. it's more helpful than not.) Screen, on the other hand, would segregate my data too far away from where I consume it, in a protocol nothing speaks. (I've seen Minecraft hosts trying to automate Screen… it's silly.)
[15:51:12] <GothAlice> UberDuper: Notably, I have four devices, and two servers, and push all listening to the bouncer.
[15:52:41] <UberDuper> I dunno what you mean by automating screen. But it's always just an ssh away. /shrug
[15:53:57] <GothAlice> UberDuper: My systems speak IRC, not "SSH with control codes to control the screen wrapper, with manual parsing of some visual representation presented by a full client". ;^P
[15:54:45] <UberDuper> Now I understand even less what you mean. :)
[15:56:11] <GothAlice> UberDuper: Screen presents a terminal window of a certain size. It requires a terminal emulator (i.e. Terminal.app, Putty, etc.) to hold a representation of the 2D grid of characters that is the screen, which Screen issues commands to update. (I.e. "this part updated, here's the new text".) Irssi is an IRC client, with a visual representation of IRC meant for humans.
[15:57:28] <GothAlice> Software systems (such as my logger, with data going back to 2001) aren't terminal emulators; the complexity of "teaching" a software system to interpret a grid of characters as having structure and meaning is… silly. They speak the IRC communications protocol directly, instead. Infinitely simpler and easier.
[15:59:28] <GothAlice> However, on the MongoDB front, I've been making… extensive use of index prefixing. (I.e. indexing {foo, bar} allows use when just searching {foo}.) I keep meaning to investigate index use improvements such as index merging; how far can MongoDB go these days in making use of independent indexes together? My concrete example is: {company, reference} in one index, {published, retracted} for range querying dates in the second. Merge-able
[15:59:29] <GothAlice> when searching company=, published<, retracted>?
[16:26:00] <sSs> hi there
[16:26:36] <sSs> a developer asked me for a mongodb server... What should I be aware of?
[16:30:00] <UberDuper> It's pretty straight forward to get up and running for dev/testing. You'll want to understand replica sets when you go into production.
[16:30:31] <UberDuper> IMO enable authentication from the very beginning.
[16:32:29] <sSs> should I start with 3 servers (primary, secondary + arbiter)?
[16:33:10] <UberDuper> For a production deployment, 3 is best. I'd recommend primary, secondary, secondary.
[16:34:30] <UberDuper> In general, I wouldn't use an arbiter.
[16:35:18] <GothAlice> sSs: Arbiters are primarily useful for interesting edge cases, such as having geographically distributed secondaries, to help control the network partition effect if backbone connections go down.
[16:35:31] <UberDuper> ^^
[16:36:46] <GothAlice> But you need at least three nodes (primary, secondary, arbiter, or primary and two secondaries) in order to maintain "high availability", i.e. access to your data if one of the nodes goes away. In the arbiter case, your data is guaranteed to switch to read-only in the event of a single failure, in the two secondary case, a new primary will be elected and things can continue as-is. I.e. it preserves write capability in the event of a
[16:36:46] <GothAlice> single outage.
[16:37:25] <GothAlice> Rather, in the event of the failure of a single data node. The arbiter going away wouldn't… really do much, of course.
[16:38:38] <GothAlice> sSs: Also, can't stress the "don't actually make it net accessible until you secure it" point that UberDuper made enough. The defaults assume a secure LAN or VLAN.
[16:40:30] <UberDuper> Just don't ever run mongo without auth.
[16:40:40] <GothAlice> You can, if you take other steps to secure it.
[16:41:11] <sSs> is there a guide to run mongo with auth?
[16:41:31] <UberDuper> Yes. The mongo docs site is quite good.
[16:41:34] <GothAlice> https://docs.mongodb.com/manual/core/authentication/ is the manual for it.
[16:41:49] <sSs> thank you very much
[16:41:49] <GothAlice> https://docs.mongodb.com/v3.2/tutorial/enable-authentication/ being one of the tutorials.
[16:42:11] <sSs> i asked the developer about space needed, he doesn't know how much
[16:42:28] <sSs> what is a good size to start?
[16:42:28] <UberDuper> Pretty standard
[16:42:48] <sSs> I will be using CentOS for the mongodb
[16:42:50] <GothAlice> MongoDB allocates files on disk using stepped power of two sizes, starting relatively small, eventually reaching a maximum size per on-disk slice.
[16:43:11] <GothAlice> If your client's database is small, you may wish to enable the smallFiles option to bump the starting allocation size down a bit.
[16:43:52] <sSs> the developer doesn't know the size...
[16:44:21] <GothAlice> Does he not have current or test data? Expectations around # records, and examples of the complexity of those records?
[16:44:25] <sSs> do you think a: 1core + 512Mb + 20Gb disk for each is a nice start?
[16:44:33] <UberDuper> It's not difficult to migrate to larger instances if you have to.
[16:44:49] <GothAlice> You can pretty easily do napkin calculations using http://bsonspec.org/spec.html to figure out the rough lower and upper bounds.
[16:45:17] <GothAlice> sSs: You may be interested in my FAQ: https://gist.github.com/amcgregor/4fb7052ce3166e2612ab
[16:45:27] <UberDuper> I'd recommend 2 cores and you're going to need more ram than that. If you're using WT, mongo wont even start with less than ~1GB of ram.
[16:45:30] <GothAlice> It goes into detail on memory allocation concerns.
[16:45:46] <sSs> thanks for the help
[16:46:18] <UberDuper> I'd start with a 5:1 disk to memory ratio.
[16:46:27] <UberDuper> And adjust as needed.
[16:46:33] <GothAlice> Aye, that's a good starting ratio.
[16:47:20] <GothAlice> sSs: One last point, if you want to ease some of the maintenance of your MongoDB cluster, you may wish to investigate https://www.mongodb.com/cloud
[16:47:26] <GothAlice> (Formerly the MongoDB Management Service, MMS.)
[16:47:51] <GothAlice> It provides some great analytics, too, to help you estimate growth, watch load, etc.
[16:47:59] <GothAlice> (In addition to making updates a total breeze.)
[16:48:24] <sSs> 5:1? 5GB HDD per 1Gb RAM ?
[16:48:33] <UberDuper> sSs: Yup.
[16:48:50] <GothAlice> sSs: Aye; see the "how much memory should I allocate" section of the FAQ for details on what the RAM is used for.
[16:48:59] <GothAlice> (And why having a lot of it relative to disk is good.)
[16:49:52] <UberDuper> You can get away with much less ram depending on your dataset/indexes/workload/etc.
[16:50:10] <GothAlice> But… that's a fine tuning process for the future, if your client doesn't currently have any statistics available.
[16:50:14] <UberDuper> But if you don't know what your dataset/indexes/workload is going to look like.. 5:1 is a reasonable starting point.
[16:50:32] <GothAlice> ^ +1
[16:56:32] <UberDuper> I have replsets at 15:1 pushing 90k qps. And some at 2:1 that struggle with 10k qps.
[16:57:30] <GothAlice> My at-home dataset (~35TiB or so) has a… 2240:1 ratio. It's… not fast. XD
[16:57:39] <UberDuper> haha
[16:57:43] <sSs> lol
[16:58:43] <GothAlice> (But fast isn't the point, utterly massive bulk GridFS metadata filesystem storage is.)
[16:58:49] <sSs> so, if 2 cores per VM, 10Gb of space for each with 2Gb of RAM to start messing with mongo
[16:59:21] <GothAlice> Aye; that's a good starting point. At work we have some production apps that don't need any more than that.
[16:59:56] <sSs> thanks for the info guys
[17:06:52] <GothAlice> It never hurts to help, sSs. :)
[17:10:15] <sSs> thanks
[18:00:07] <caliculk> So a few questions then about YAML config for replication: where can I specify the replica set? I see how to create a replica set name, but I don't see where to specify the IP addresses of the mongo instances?
[18:02:14] <GothAlice> caliculk: Replication state is stored in the "local" database within the DB engine, not configured via YAML.
[18:02:45] <GothAlice> (Maintained using the rs.* REPL commands, such as rs.init() and rs.reconfig())
[18:03:01] <GothAlice> https://docs.mongodb.com/manual/reference/method/js-replication/
[18:04:04] <GothAlice> https://docs.mongodb.com/manual/replication/ for overall documentation on replication.
[18:12:34] <caliculk> GothAlice alright, I will take a look there. Thanks.
[18:12:57] <GothAlice> It never hurts to help. :)
[18:13:35] <Doyle> Hey. Is there a way to prevent new DBs from being created on specific shards?
[18:13:57] <cheeser> Doyle: you can set permissions to just a set of users
[18:15:01] <caliculk> So GothAlice does this even need to be set then? https://docs.mongodb.com/manual/reference/configuration-options/#replication.replSetName
[18:15:53] <Doyle> hmm. Not sure if we're on the same page. Say you have a sharded cluster with one shard (rs) that shouldn't get any new db's landing on it. That's the scenario
[18:16:15] <GothAlice> caliculk: Yes; that informs the engine which replica set configuration to look for in the local DB. It must be set.
[18:17:33] <caliculk> Alright, and then, last question hopefully for a bit. Is it recommended (and if not, what downsides are there) to feed data into slaves, which then replicate into a master? Say for instance, I have a bunch of individual databases that I want feeding all their information into one larger one. Would there be any performance impact?
[18:19:29] <cheeser> writes only go to a primary (mongodb doesn't have master/slave anymore)
[18:19:32] <GothAlice> caliculk: First up, there's a terminology issue. MongoDB does not use master/slave replication (hasn't since… 2.4? 2.2?) but instead uses primary / secondary replication. There are some subtle differences, mostly relating to high availability and (I think) how the data is pushed around. Are you actually referring to 2.2/2.4 master/slave, or modern primary/secondary?
[18:19:38] <GothAlice> Heh.
[18:21:42] <GothAlice> https://gist.github.com/amcgregor/df42f41f752f581eb116307175f3b301?ts=4 < phew! Document is a MutableMapping suitable for passing straight through to PyMongo. Phase 1 of deprecating MongoEngine is pretty much done. This relates to my earlier unanswered question about index merging: can a query on company=, (published unset or <), (retracted unset or >), make use of both _availability and _reference (or _template) indexes?
[18:22:05] <GothAlice> (Since now is proving to be a good time to review index usage and such.)
[18:23:44] <caliculk> So my idea is that we will be having three mongodb instances for HA, which will be directly connected to a webapp. While those synchronize, I also want a fourth mongodb instance that will be separate and won't have a web application, but will be more or less "backing up" all the data from the three to the fourth mongodb instance
[18:24:38] <GothAlice> caliculk: A reasonable arrangement, as long as the web apps are actually on nodes other than the mongod ones; DB services should always be run in isolation. At work we have two replicas just for backups, that will never become primary.
[18:25:22] <GothAlice> Using: https://docs.mongodb.com/manual/tutorial/configure-secondary-only-replica-set-member/ and one of them delayed by 24h using https://docs.mongodb.com/manual/core/replica-set-delayed-member/
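
    A sketch of the member settings those two tutorials describe (member index and delay value hypothetical); hidden requires priority 0:

        cfg = rs.conf()
        cfg.members[3].priority   = 0       // never becomes primary
        cfg.members[3].hidden     = true    // invisible to clients
        cfg.members[3].slaveDelay = 86400   // apply the oplog 24h behind the primary
        rs.reconfig(cfg)
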
[18:39:58] <UberDuper> Doyle: I'm not sure of a way to prevent mongo from picking a shard to set primary on.
[18:40:51] <UberDuper> You might be able to set maxSize for each shard you *don't* want the primary on to less than the current disk allocation.
[18:41:22] <UberDuper> But I don't know if that would have the desired effect. It would also prevent chunk moves to those shards until you revert maxSize.
[18:42:01] <GothAlice> … or might even cause chunk moves in order to get the dataset size below maxSize.
[18:42:02] <UberDuper> Maybe use tags.
[18:42:22] <UberDuper> The balancer will never move a chunk off a shard to get it below maxSize.
[18:43:26] <UberDuper> It's one of the problems with the balancer.
[18:43:39] <GothAlice> Good to know. (maxSize isn't an option I've used, yet.)
[18:44:32] <Doyle> Getting some deep knowledge here
[18:45:35] <Doyle> With tag aware sharding, do db's still land anywhere?
[18:46:28] <Doyle> Sharding is at the collection level, but is there no flag to set in create db to land the db on a particular tag?
[18:47:41] <UberDuper> doh re: tags. I thought they worked a little different than that.
[18:47:58] <UberDuper> You may be stuck just having to movePrimary() post-creation.
[18:50:36] <UberDuper> It's an interesting question. Lemme ping one of our dbas.
[18:50:43] <Doyle> Thanks UberDuper
[18:50:52] <Doyle> It would be useful
[18:51:07] <Doyle> Rather than movePrimary and reload all the mongos's
[18:51:56] <UberDuper> A flushRouterConfig() should work fine when doing a movePrimary on an unsharded collection.
[18:52:18] <UberDuper> Rather than restarting the mongos.
[18:52:47] <Doyle> Yea, that's what I meant to say, but couldn't think of it.
[18:54:08] <UberDuper> Nope. We just movePrimary here.
[18:56:14] <Doyle> Oh well. I was hoping to prod a fancy unknown command out of someone here. It's as I understood then. Tag awareness can assure chunks are migrated only to specifically tagged shards, but the DB has to be manually moved. Thanks UberDuper
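
    For reference, a sketch of that post-creation move (database and shard names hypothetical), run against a mongos:

        // Move the primary shard for a database whose collections are unsharded:
        db.adminCommand({ movePrimary: "mydb", to: "shardB" })

        // Then have each mongos reload its cached metadata:
        db.adminCommand({ flushRouterConfig: 1 })
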
[18:58:07] <UberDuper> I'm very surprised there isn't an option for it during the create.
[18:58:12] <Doyle> Tiny problem to deal with, but it would be nice to have a create db command that would allow the specification of a tag to adhere to when selecting a shard to create the db at
[18:59:36] <Doyle> I think that's the only item on my wishlist for Mongodb atm. 3.0 and 3.2 took care of the other, mostly locking related wishlist items.
[19:05:33] <UberDuper> Wonder if the mongos would be smart enough to recognize if you created the DB at the shard.
[19:05:55] <UberDuper> My faith in mongos leads me to believe it wouldn't.
[19:05:59] <Doyle> Na, the mongos won't see it until you flush
[19:06:10] <Doyle> Looked at that myself :P
[19:08:10] <n1colas> Hello
[19:08:23] <Doyle> yo
[22:13:05] <_ohm> is there a limit on how many connections and how many times/s a mongodb can be queried? When I have one VM flooding the DB everything is fine, when I have 2 I get NoneType errors and both crash
[22:15:37] <GothAlice> _ohm: https://gist.github.com/amcgregor/4fb7052ce3166e2612ab#should-i-adjust-ulimit-resource-limits-on-mongodb-servers
[22:15:53] <_ohm> GothAlice, thanks
[22:16:24] <GothAlice> :)
[22:21:48] <_ohm> :\
[22:32:54] <GothAlice> _ohm: Is there anything in the logs to indicate issues?
[22:33:27] <_ohm> GothAlice, still doing the ulimits, debian is a little wonky
[22:33:54] <_ohm> recommended ulimit for threads was 64000, mine was 400
[22:34:01] <_ohm> well, 'threads'
[22:34:11] <GothAlice> That will certainly have an impact! :o
[22:34:38] <_ohm> we'll see, the ulimit command isn't permanent, so i have to edit config files manually
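
    A sketch of making those limits persistent on Debian-style systems (the filename and the service user name are hypothetical; the 64000 value is the one mentioned above):

        # /etc/security/limits.d/mongod.conf
        mongodb  soft  nofile  64000
        mongodb  hard  nofile  64000
        mongodb  soft  nproc   64000
        mongodb  hard  nproc   64000
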
[22:42:55] <_ohm> didn't work :\
[22:43:06] <_ohm> increased maximum processes to 64000 rather than 400
[22:43:35] <_ohm> same result
[22:44:53] <_ohm> there's nothing in the logs that indicates anything
[22:45:13] <_ohm> just simply connections accepted and end connection
[22:48:12] <_ohm> would it be better if i had a mongodb cluster rather than just one vm for it?
[22:55:55] <cheeser> _ohm: https://www.mongodb.com/cloud
[22:55:56] <cheeser> :D
[22:56:27] <_ohm> cheeser, thanks i'll check it out
[22:56:50] <_ohm> what
[22:56:51] <_ohm> no
[22:58:26] <_ohm> i'm doing a personal project right now
[23:03:13] <cheeser> :D
[23:03:19] <cheeser> well, when you need it. ;)