#mongodb logs for Tuesday the 19th of January, 2016

[00:15:19] <gottreu> I have a mongodb instance on a VPS. Can I safely hand out mongodb accounts to strangers on the internet without fear?
[00:25:42] <StephenLynx> if you set authentication properly, yes
[04:58:41] <jr3> can someone provide some clarity on "the dataset should fit in memory"? Does this mean that if I have a db that's 16 gigs I should have 32 gigs to bring it all into memory?
[05:04:44] <Boomtime> @jr3: where does it say "the dataset should fit in memory"?
[05:04:55] <Boomtime> perhaps you mean "the working set should fit in memory"?
[05:04:59] <jr3> er
[05:05:01] <jr3> yes
[05:05:04] <Boomtime> okie
[05:05:29] <jr3> how do I determine what my working set is?
[05:05:32] <Boomtime> working set is a bit trickier to work out - there is some documentation on how to go about that but it gets a bit waffly
[05:05:52] <Boomtime> you can start with how broadly you hit indexes, and how many of them you hit
[05:06:12] <Boomtime> you really want the pertinent parts of indexes to fit into memory together
[05:06:34] <Boomtime> db.stats() and the like are your friend
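A minimal sketch of pulling those numbers from a script, assuming pymongo and a database called "mydb" (the names are illustrative, not from the discussion above):

    # Rough working-set reconnaissance: overall data size plus per-collection index sizes.
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["mydb"]

    stats = db.command("dbStats")
    print("data size (bytes):", stats["dataSize"])
    print("total index size (bytes):", stats["indexSize"])

    # The indexes (or parts of indexes) you hit frequently should fit in RAM together.
    for name in db.list_collection_names():
        print(name, db.command("collStats", name).get("indexSizes", {}))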
[05:47:31] <Waheedi> i currently have two kvm hosts, each with two guests running mongod (replica set). the two hosts are in different physical locations, and i have applications that mainly do reads on the db in both locations. what's the best way to make each application read only from the nodes in its own physical location? I can take care of the writes problem :)
[05:47:47] <Waheedi> would nearest read preference be my best option?
[05:48:16] <Waheedi> db version v2.4.14
[05:50:27] <Waheedi> "nearest" can be done at the initiation of connection or it needs to be added with each query as an option?
[05:50:41] <Waheedi> of the* connection
[05:52:55] <Waheedi> hopefully i don't have to keep my connection up for the coming 24 hours :P
[05:53:05] <Waheedi> I'm using hotspot ;)
[06:42:49] <Boomtime> @Waheedi: read-pref can be specified in the connection-string, or it can be overridden per operation
[06:43:07] <Waheedi> thanks Boomtime
[06:43:13] <Boomtime> 'nearest' is probably what you want, but you should be aware of what it means;
[06:43:29] <Boomtime> it is a measure of how low the ping is
[06:43:38] <Waheedi> which is my favorite :)
[06:44:01] <Boomtime> there is also a threshold value, within which all servers will be considered equal - this value is up to the driver, but i think they default to 15ms
[06:44:35] <Boomtime> anyway, whatever the value is, if all servers measure ping within that threshold range then they are all equal according to 'nearest'
[06:44:40] <Waheedi> yeah i just wanted to make sure that ping is a major factor
[06:45:15] <Waheedi> I'm having 80ms on some far away nodes
[06:45:15] <Boomtime> https://docs.mongodb.org/manual/reference/read-preference/#minimize-latency
[06:45:32] <Waheedi> so 15ms would definitely default to my closest secondaries or primary!
[06:46:07] <Boomtime> ok, so example time; if your app measures ping to server 'A' at 10ms, and server 'B' at 30ms, then nearest will pretty much always go to server 'A'
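To make that concrete, a sketch of putting 'nearest' and the latency window into the connection string, assuming pymongo; the host names are made up:

    from pymongo import MongoClient

    # 'nearest' routes reads to any member whose ping is within localThresholdMS
    # of the fastest member; writes still go to the primary.
    client = MongoClient(
        "mongodb://host-a.example,host-b.example/?replicaSet=rs0"
        "&readPreference=nearest&localThresholdMS=15"
    )
    doc = client.mydb.mycoll.find_one({})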
[06:46:47] <Waheedi> and do you know every how often it updates its ping values?
[06:47:05] <Boomtime> but if server 'B' improves a little, and server 'A' gets a bit burdened and slows time slightly, making pings come out at 13ms for server 'A' and 25ms for server 'B', then suddenly they are 'equals'
[06:47:19] <Boomtime> slows time? i meant, slows down
[06:47:41] <Boomtime> update rates are driver specific as well... but i think there is a guide in the spec
[06:48:03] <Waheedi> thank you so much Boomtime
[06:48:06] <Waheedi> that really helped
[06:48:11] <Boomtime> https://github.com/mongodb/specifications/blob/master/source/server-selection/server-selection.rst
[06:51:12] <Boomtime> https://github.com/mongodb/specifications/blob/master/source/server-selection/server-selection.rst#heartbeatfrequencyms
[06:51:37] <Waheedi> yes
[06:51:49] <Boomtime> that last link clarifies the setting used by drivers to determine frequency - doesn't actually show a default though :/
[06:51:53] <Waheedi> https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst#heartbeatfrequencyms
[06:52:12] <Boomtime> ah, 10 or 60 seconds
[06:52:13] <Waheedi> defaults to 10 seconds
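For what it's worth, current drivers also let you tune that polling interval; a sketch assuming pymongo, which accepts heartbeatFrequencyMS as a keyword argument:

    from pymongo import MongoClient

    client = MongoClient(
        "mongodb://host-a.example,host-b.example/?replicaSet=rs0&readPreference=nearest",
        heartbeatFrequencyMS=5000,  # re-measure member round-trip times every 5 seconds
    )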
[06:52:25] <Waheedi> lol
[06:53:22] <Waheedi> hmm
[06:55:48] <Waheedi> it's very moody, this heartbeatFrequencyMS
[06:59:00] <Waheedi> ok the question then, would nearest work on the primary's ping results? or the driver's?
[06:59:10] <Waheedi> I'm getting lost
[07:07:07] <Boomtime> @Waheedi: the driver makes the decision based on what it can connect to
[07:07:38] <Waheedi> yeah that makes more sense
[07:08:01] <Boomtime> all writes go to the primary regardless, so that value doesn't matter; a query must go to a functional and accessible member first and foremost, and then to the most appropriate one if there are multiple candidates
[08:48:26] <grepwood> hello everyone
[08:48:42] <grepwood> I've got a restore job that's been stuck on 69% since forever
[08:48:49] <grepwood> and another that's on 54%
[08:49:07] <grepwood> all of those jobs seem to start off fast, then stop and pretend to keep on going, but not really
[08:49:19] <grepwood> can anyone please tell me what could be the problem?
[09:24:15] <m3t4lukas> grepwood: did you monitor the resources?
[09:26:39] <grepwood> m3t4lukas, no, how do I do that?
[09:30:32] <m3t4lukas> grepwood: what OS do you use?
[09:32:45] <grepwood> ubuntu
[09:48:01] <m3t4lukas> grepwood: then please use the ubuntu system monitoring tools. you could use top to check whether the process is working and to see ram usage, you could view the logs of mongod, and you could use df to check your disk space
[09:48:25] <grepwood> m3t4lukas, please tell me it's not a graphical program
[09:49:00] <m3t4lukas> there are also mongostat and mongotop for diagnosing and monitoring purposes
[09:49:12] <m3t4lukas> grepwood: of course not
[09:50:31] <m3t4lukas> grepwood: here are also some resources for your monitoring stuff: https://docs.mongodb.org/manual/administration/monitoring/
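Alongside top/df/mongostat, you can also poll the server from a script to see whether a restore is still writing; a sketch assuming pymongo and a database named "restoredb":

    # If dataSize stops growing between samples, the restore has probably stalled.
    import time
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["restoredb"]

    previous = 0
    for _ in range(10):
        size = db.command("dbStats")["dataSize"]
        print("dataSize:", size, "delta since last sample:", size - previous)
        previous = size
        time.sleep(30)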
[10:20:12] <ramnes> why does MongoDB use a [long, lat] couple for geo points, while it seems to me that everyone else in the world uses [lat, long]?
[10:20:52] <Derick> geojson uses long, lat
[10:20:56] <Derick> it's a standard that we use
[10:21:01] <Derick> it follows: x, y
[10:21:19] <ramnes> s/MongoDB/geojson/, then
[10:21:20] <ramnes> :p
[10:21:29] <Derick> ask the geojson spec board :)
[10:21:39] <ramnes> feels really weird
[10:23:09] <Derick> http://geojson.org/geojson-spec.html#positions
[10:23:32] <Derick> it's btw not weird if you look at so many other things with geo
[10:23:53] <Derick> bounding box coordinates, can be any of : n, s, e, w; n, e, s, w; ... and reversed versions
[10:24:14] <Derick> and IMO, having it follow x, y, z (like in proper charts) also makes sense to me
[10:26:33] <Derick> http://www.macwright.org/lonlat/ is a nice FAQ on it
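A quick illustration of the [longitude, latitude] order in practice, as a sketch assuming pymongo and a hypothetical "places" collection:

    from pymongo import MongoClient, GEOSPHERE

    db = MongoClient("mongodb://localhost:27017")["mydb"]
    db.places.create_index([("loc", GEOSPHERE)])

    # GeoJSON points are [longitude, latitude], i.e. x before y.
    db.places.insert_one({"name": "Central Park",
                          "loc": {"type": "Point", "coordinates": [-73.97, 40.78]}})

    nearby = db.places.find({"loc": {"$near": {
        "$geometry": {"type": "Point", "coordinates": [-73.98, 40.76]},
        "$maxDistance": 5000}}})  # metres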
[13:49:34] <jamiejamie> Hey, I've got a pretty simple database to make, but I'm really unsure how to structure it. It's a database that'll store some information about stuff in a game, for example.. "cards, abilities, heroes". I'm considering just putting these all in an "items" collection, since I want to be able to search all from a single box and just get the closest match, is that really bad practice, just having one big collection?
[13:49:37] <jamiejamie> Or should I separate into cards, heroes, abilities collections, and then have an "items" collection which just has records that points to the ID of its respective item
[13:49:57] <jamiejamie> Almost like an index collection I guess, which would just purely be used for autocomplete
[14:03:26] <m3t4lukas> jamiejamie: only put polymorphic stuff into the same collection, at least as far as best practice is concerned. You can put into one collection whatever you think needs to be together
[14:04:26] <jamiejamie> I mean it could work in both cases I guess m3t4lukas, they're all technically "items", but I feel like smaller collections by item "type" is still the way to go, but maybe that's SQL talking to me
[14:04:47] <jamiejamie> I guess I came here for reassurance that splitting into smaller collections like that is OK
[14:05:19] <deathanchor> splitting now ensures you don't have to split it later :D
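To illustrate the split being discussed, a sketch assuming pymongo; the "cards"/"heroes" collections and the thin "items" lookup collection used only for autocomplete are hypothetical names:

    from pymongo import ASCENDING, MongoClient

    db = MongoClient("mongodb://localhost:27017")["game"]

    # Typed collections hold the real documents.
    card_id = db.cards.insert_one({"name": "Fireball", "cost": 4}).inserted_id
    hero_id = db.heroes.insert_one({"name": "Jaina", "health": 30}).inserted_id

    # A small lookup collection drives the single search box.
    db.items.insert_many([
        {"name": "Fireball", "type": "card", "ref_id": card_id},
        {"name": "Jaina", "type": "hero", "ref_id": hero_id},
    ])
    db.items.create_index([("name", ASCENDING)])

    # Anchored prefix regexes can use the index, which suits autocomplete.
    matches = list(db.items.find({"name": {"$regex": "^Fire"}}))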
[14:16:16] <dretnx> .
[14:17:07] <dretnx> I want to use gridfs from a language other than node.js. Is it possible to upload chunk by chunk to a gridfs file as it is in nodejs?
[14:17:33] <cheeser> "chunk by chunk?"
[14:18:02] <dretnx> not all data in one call
[14:18:38] <dretnx> I mean the way stream works in node
[14:18:44] <cheeser> does the node.js api support that? i think the java api just takes an InputStream
[14:28:26] <StephenLynx> dretnx, yes. that implementation is on mongo side.
[14:28:56] <StephenLynx> the node driver just implements the spec.
[14:32:59] <StephenLynx> the important question is whether your driver implements a high-level behavior on top of the spec or if you will have to implement the stream abstraction yourself.
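As an example of that streaming behaviour outside node, a sketch assuming pymongo's GridFSBucket; the file name and chunk source are made up:

    from pymongo import MongoClient
    from gridfs import GridFSBucket

    db = MongoClient("mongodb://localhost:27017")["mydb"]
    bucket = GridFSBucket(db)

    # Upload chunk by chunk instead of handing over all the data in one call.
    grid_in = bucket.open_upload_stream("big-file.bin")
    with open("big-file.bin", "rb") as source:
        while True:
            chunk = source.read(255 * 1024)  # default GridFS chunk size
            if not chunk:
                break
            grid_in.write(chunk)
    grid_in.close()  # finalizes the files document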
[14:55:00] <m3t4lukas> jamiejamie: I'd definitely use different collections. Even if it's just for the sake of aggregation performance
[15:45:37] <Svedrin> is there an operator or stage in the aggregation pipeline that does the equivalent of "for( var varname in this.data ){ emit(varname, this.data[varname]) } "?
[15:46:10] <Svedrin> so, kinda like $unwind, just for objects instead of arrays
[15:52:38] <cheeser> a $flatten? not really, no.
[15:55:40] <GothAlice> Svedrin: Well, $project can sorta do that. It lets you re-name fields, including un-nesting them if desired. The top level is always a document, though, not a bare value, unlike map/reduce which may use bare values. I think. Haven't used map/reduce since aggregates were introduced. :P
[15:56:42] <cheeser> but with a $project, you'll have to list each field explicitly.
[15:56:47] <Svedrin> GothAlice, I didn't find a way to use $project without having to explicitly specify all the field names... which I can't really do because data can have kinda-arbitrary keys
[15:57:10] <cheeser> i should write a $flatten. see just how hard it is to write new operators.
[15:58:05] <Svedrin> the documentation states »you don't want to store {key: value}, store [{key: "key", value: "value"}] instead«
[15:58:13] <Svedrin> but I don't really understand the difference
[15:58:45] <Svedrin> what is it that makes a list of {key: k, value: v} dicts easier to process than a {k: v} dict itself? :/
[15:59:29] <GothAlice> I have a slightly higher level example of that mandate from the documentation.
[15:59:52] <GothAlice> https://gist.github.com/amcgregor/aaf7e0f3266c68af20cd
[16:00:39] <GothAlice> When you have "arbitrary" fields, if you ever need to search the contents of those fields, or search for the presence of one of those fields, you either have to manually construct indexes for each one (which is kinda nuts, and very expensive), or, for the latter, use $exists, which can't use an index.
[16:00:42] <GothAlice> Highly sub-optimal.
[16:01:25] <GothAlice> The "good" version stores the name and value as sub-document fields in an array/list. It's then trivial to index both the field names and values, while still maintaining the ability to add and remove them, and update them individually using $elemMatch updates.
[16:01:28] <GothAlice> Highly efficient.
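A compressed version of that pattern, as a sketch assuming pymongo; the field names "k" and "v" are illustrative:

    from pymongo import ASCENDING, MongoClient

    db = MongoClient("mongodb://localhost:27017")["mydb"]

    # Instead of {"colour": "red", "size": "large"} with arbitrary top-level keys:
    db.products.insert_one({
        "name": "widget",
        "attrs": [{"k": "colour", "v": "red"}, {"k": "size", "v": "large"}],
    })

    # One compound index covers every attribute name and value.
    db.products.create_index([("attrs.k", ASCENDING), ("attrs.v", ASCENDING)])

    # Presence and value checks both use the index; no $exists needed.
    db.products.find_one({"attrs": {"$elemMatch": {"k": "colour", "v": "red"}}})

    # Update a single attribute in place via the positional operator.
    db.products.update_one({"attrs.k": "size"}, {"$set": {"attrs.$.v": "medium"}})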
[16:01:58] <Svedrin> makes sense, thank you :)
[16:02:01] <GothAlice> This is a tough lesson to learn once you have data stored the wrong way.
[16:02:37] <cheeser> haha. yeah. migration would be a bitch.
[16:02:48] <GothAlice> That Company example is real. >_<
[16:02:52] <GothAlice> cheeser: It was.
[16:03:27] <cheeser> aggregation with $out? or did this predate that stage?
[16:03:38] <GothAlice> Pre-dated by a bit. :P
[16:06:12] <GothAlice> cheeser: https://jira.mongodb.org/browse/SERVER-15815 is now more than a year old. *prods this to get fixed before adding new things*
[16:06:33] <GothAlice> Since that ticket is solely responsible for marrow.task not being released.
[16:06:57] <StephenLynx> oh yeah, that expiration of tailable cursors is ass.
[16:07:23] <cheeser> that needs to be a jira tag
[16:07:27] <GothAlice> I find it mildly amusing that it flies in the face of how queries are documented to work. Tailable cursors are special snowflakes.
[16:07:28] <StephenLynx> it also boned me when I had to add some sort of IPC between the daemon and the terminal
[16:07:41] <StephenLynx> I ended up using unix sockets
[16:07:55] <GothAlice> (They shouldn't be special snowflakes.)
[16:08:02] <StephenLynx> which caused me a different plethora of issues, but at least it worked.
[16:08:39] <cheeser> well, TCs were added later to support replication. not entirely surprising they'd be a slightly different code path.
[16:08:51] <GothAlice> cheeser: marrow.task being a pure-MongoDB alternative to Celery or other distributed RPC / task workers, with support for things Celery can only dream of, like deferred multi-host generator chains.
[16:08:52] <cheeser> still, the inconsistency is a bit of a pita, i'm sure.
[16:09:13] <cheeser> i was just thinking of writing a messaging/queuing POC on mongodb actually.
[16:09:45] <cheeser> python. pffft. :D
[16:09:55] <GothAlice> https://gist.github.com/amcgregor/4207375 < m.task has basically existed for 4 years, and this has been a problem the entire time.
[16:09:55] <StephenLynx> GothAlice, why not using TCP for multi-host messaging?
[16:10:18] <GothAlice> StephenLynx: 1.9 million DRPC calls per second with two consumers (workers) and four producers on one host.
[16:10:25] <GothAlice> StephenLynx: Because performance isn't the bottleneck. :P
[16:10:41] <StephenLynx> ok, but mongo just can't do that due to that expiration on tailable cursors.
[16:10:48] <GothAlice> (And being query-able has all sorts of benefits.)
[16:10:54] <GothAlice> Not quite.
[16:10:57] <StephenLynx> that I can agree though.
[16:11:09] <GothAlice> A task worker/runner never "stops", really, so timeouts are a non-issue.
[16:11:17] <GothAlice> At least for fire-and-forget tasks.
[16:11:37] <GothAlice> It's the producer waiting on the result of one of those tasks that gets taken out back and shot by the timeout behaviour.
[16:12:32] <GothAlice> This mostly matters for the aforementioned distributed generators, where multiple workers are (basically) waiting on each-other.
[16:14:34] <GothAlice> Fun fact: from the project those presentation slides came from, we calculated out that one million simultaneously active games would require 8GB of capped collection to avoid roll-over. I've never heard of people using capped collections that large before. XP
[16:15:34] <StephenLynx> kek
[16:31:30] <tantamount> Is it possible to sort a collection in a document during an aggregation? That is, I want to sort a field rather than the entire collection of documents
[16:32:03] <GothAlice> tantamount: Hmm, $unwind, $sort, $group would do it.
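Spelled out, that pipeline looks something like this; a sketch assuming pymongo and a hypothetical "scores" array of {"value": ...} subdocuments:

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["mydb"]

    pipeline = [
        {"$unwind": "$scores"},                     # one document per array element
        {"$sort": {"_id": 1, "scores.value": -1}},  # order the elements within each _id
        {"$group": {"_id": "$_id", "scores": {"$push": "$scores"}}},  # reassemble the array
    ]
    results = list(db.games.aggregate(pipeline))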
[16:33:05] <tantamount> Thanks GothAlice :)
[16:34:04] <GothAlice> cheeser: Wagh; $sample was added, yay, but doesn't accept a PRNG seed so it's impossible to have deterministic behaviour. :(
[16:36:26] <GothAlice> Consider: "featured products today" with a seed that changes at midnight.
[16:37:28] <GothAlice> (That'd let you, with say, the current Julian calendar date as the seed, easily go to any point in time to see what the featured products were on that day.)
[16:39:12] <cheeser> GothAlice: file a Jira (tee hee) but that's not quite what $sample was intended for
[16:39:55] <GothAlice> Random sampling of documents implies an RNG or PRNG, docs mention a PRNG, it needs to be seeded. Not allowing that seed to be passed through is unintentionally restrictive on use of the stage.
[16:39:55] <cheeser> it's intended to get a representative sample of your documents for schema evaluation (both in Compass and the BI Connector)
[16:41:21] <GothAlice> https://jira.mongodb.org/browse/SERVER-22068 is close.
[16:41:49] <GothAlice> https://jira.mongodb.org/browse/SERVER-22069 okay, ouch, this rabbit hole just keeps going. XD
[16:42:35] <cheeser> 3.4 planning is going on now so now would be the time to file/vote
[16:42:48] <cheeser> *might* be a 3.2.x fix depending...
[16:43:30] <GothAlice> "We've done quite a bit of testing to confirm the WiredTiger PRNG doesn't cycle and returns uniform results across the space…" — that's worrying. Uniform distribution is also un-random. 1,1,1,1,1,1,1,1,1 is just as likely a random sequence as any other, despite apparent bias. :P
[16:45:03] <cheeser> http://dilbert.com/strip/2001-10-25
[16:45:15] <GothAlice> Hehe.
[16:45:39] <GothAlice> Entropy correlation testing is its own thing. The key is that the next generated value shouldn't be predictable, not that it isn't the same as the last few generated. ;)
[16:57:10] <GothAlice> cheeser: https://jira.mongodb.org/browse/SERVER-22225
[16:58:11] <cheeser> +1
[17:01:34] <GothAlice> Hehe. Fun fact of the day: non-repeating was the fatal flaw in the original Enigma machine.
[17:02:23] <GothAlice> (I.e. A could never be encoded as A in the cipher text.)
[17:02:32] <GothAlice> ;)
[17:02:58] <GothAlice> That combined with known plaintext, of course.
[17:04:07] <Waheedi> in dbclient.h file i can't find the ReadPreferenceSetting Struct
[17:04:30] <Waheedi> I'm just wondering
[17:04:50] <GothAlice> mongo-cxx-driver or the server?
[17:05:03] <Waheedi> mongodb-dev package
[17:05:23] <Waheedi> mongo-cxx-driver
[17:05:58] <GothAlice> dbclient_rs.h
[17:06:04] <Waheedi> its there
[17:06:38] <Waheedi> its not there too
[17:06:38] <GothAlice> https://github.com/mongodb/mongo-cxx-driver/blob/legacy/src/mongo/client/dbclient.h#L59
[17:06:46] <GothAlice> And it's in dbclient.h by virtue of being directly included.
[17:07:18] <Waheedi> GothAlice: yeah I know its included in dbclient but in dbclient_rs i can't find the struct too
[17:07:31] <GothAlice> https://github.com/mongodb/mongo-cxx-driver/blob/legacy/src/mongo/client/dbclient_rs.h#L363
[17:07:44] <GothAlice> C is weird. They abstractly define it at the top of the file, and fill it out at the very bottom.
[17:08:50] <StephenLynx> forward-declaration
[17:09:14] <Waheedi> yeah GothAlice in that version i can see it but not on the version I'm running, mongodb-dev 1:2.0.4-1ubuntu2.1
[17:09:33] <Waheedi> maybe i need to consider upgrading but that would really require more resources than i can afford at the time being :|
[17:10:34] <GothAlice> Considering 2.4 isn't even supported, 2.0 is… nuts.
[17:11:14] <Derick> I had somebody ask about 1.6 the other week...
[17:12:19] <GothAlice> I think I've still got 1.2 buried somewhere in my exocortex codebase...
[17:13:16] <GothAlice> Yup, it's still happily chugging away in a VM. XD
[17:17:31] <StephenLynx> kek
[17:17:47] <StephenLynx> I got a dude running 2.4 on a server for lynxchan
[17:17:56] <StephenLynx> it would work, until midnight
[17:18:14] <StephenLynx> because there was a query that required an operator that didn't exist back then
[17:18:44] <StephenLynx> it crashed a couple of times before I noticed the pattern and asked for his db version
[17:19:19] <StephenLynx> they never read the doc
[17:24:15] <Waheedi> i feel old now :S
[17:25:07] <Waheedi> thank you :) I'm still 30 years old though
[17:25:28] <StephenLynx> ek
[17:25:29] <StephenLynx> kek
[17:47:06] <NoOutlet> Hey GothAlice.
[17:47:16] <GothAlice> Hey-o.
[17:47:34] <NoOutlet> I don't know if you remember me. We spoke awhile ago.
[17:48:02] <NoOutlet> :)
[17:48:45] <NoOutlet> Well, I got an email from the MongoDB Advocacy Hub about nominating a MongoDB Master. Do you know if you're nominated yet?
[17:49:03] <GothAlice> Uh, no, I have no suspicion of being nominated for that.
[17:49:04] <Waheedi> this is the version https://github.com/mongodb/mongo/blob/ae1ecd9c786911f9f1f0242f0f7d702b3e5dfeba/src/mongo/client/dbclient_rs.h
[17:49:05] <Waheedi> lol
[17:49:13] <Waheedi> its like 101
[17:50:56] <Waheedi> its really huge this mongo :)
[17:51:17] <GothAlice> A database engine is no simple affair.
[17:51:30] <Waheedi> surely but not that huge too :P
[17:51:31] <NoOutlet> The Advocacy Hub advertises it like "Nominate yourself for the Masters Program." I don't consider myself a master, but I do consider you one.
[17:52:23] <NoOutlet> But if you're not interested in the Advocacy Hub and all that stuff, no worries.
[17:53:03] <GothAlice> NoOutlet: There are too few hours in the day; I haven't investigated it, but I will do so in the near future.
[17:53:31] <GothAlice> Waheedi: You have me curious. I'm now collecting the source for MongoDB, Postgresql, and MariaDB, and will now run sloccount across them for comparison. :)
[17:53:55] <Waheedi> lol :)
[17:54:17] <Waheedi> well thats the way to go
[17:55:40] <NoOutlet> How about SQLite3 too?
[17:55:52] <GothAlice> Waheedi: Postgres core server: 745,541 SLoC. MariaDB (MySQL fork): 2,024,839 SLoC. MongoDB 1,959,337 SLoC. SQLite won't even compare. :P
[17:56:10] <Waheedi> that was fast!!
[17:56:15] <GothAlice> Now, these are "source lines of code", thus excluding comments and blank lines, and de-duplicating source files by hash.
[17:56:50] <GothAlice> MongoDB code is reasonably well commented and structured; can't speak to the other codebases.
[17:57:21] <Waheedi> postgres seems light!! ha
[17:57:34] <GothAlice> Well, that was just the "core server"; none of the fancy additions like JSON support and such.
[17:59:04] <Waheedi> thank you for that GothAlice, its always good to know
[17:59:55] <Waheedi> the weather is a bit crazy in Sacramento today, the trees are about to flee :)
[18:02:17] <NoOutlet> Well I have to go. I'm at work and probably chatting on IRC is not okay. Hopefully I'll have some more free time soon though and I've got a MongoDB project that I'm working on so I'll see you all later.
[18:24:26] <GothAlice> Also, any way to challenge the MongoDB U course certifications? :D
[18:24:54] <StephenLynx> what even is this advocacy hub?
[18:25:09] <GothAlice> https://mongodb.influitive.com
[18:25:55] <StephenLynx> >requires an account
[18:25:58] <GothAlice> The paying for reviews "challenge" is a bit dubious.
[18:26:01] <StephenLynx> >twitter login doesn't work
[18:26:07] <GothAlice> Huh, that worked for me.
[18:26:08] <StephenLynx> can you give me a tl,dr?
[18:26:55] <StephenLynx> I had to middle click the button, then it still asked for a password
[18:26:57] <StephenLynx> ffs
[18:26:58] <GothAlice> http://s.webcore.io/2t3V3C2Z1o3g < basically a gamified education and outreach portal. Complete challenges, rank up. Challenges range from "go over here and read this important bit of documentation" to "write a blog post".
[18:27:05] <StephenLynx> meh
[19:04:08] <zivester> hmm.. i don't think a mongodump/mongorestore can restore to a new database with the --archive option, is that true?
[19:57:24] <Waheedi> at which version did ReadPreferenceSetting get introduced to dbclient? I need to use the nearest read preference and I'm not sure which version closest to 1:2.0.4-1ubuntu2.1 has ReadPreferenceSetting enabled
[20:00:21] <GothAlice> Waheedi: Sharding is required for readPreference support, i.e. the old replication mechanism used "slave_ok" instead. That'd make it 2.2.
[20:01:32] <GothAlice> Ref: https://docs.mongodb.org/v2.2/applications/replication/#replica-set-read-preference
[20:02:02] <GothAlice> Rather, the new replication mechanism, not sharding itself. (Replication vs. master/slave.)
[20:03:12] <MacWinner> GothAlice, have you upgraded to 3.2 with wiredtiger?
[20:03:37] <GothAlice> MacWinner: I am. My stress test issues have been resolved for that version, and I'm using WT with compression in production to great effect now.
[20:04:10] <MacWinner> awesome! what's your compression ratio looking like?
[20:05:21] <GothAlice> That'll be very dataset-dependent. On one production set, which is mostly natural language text, I'm getting a ~60-70% reduction. On my pre-aggregate data which already squeezes every possible byte out through key compression, the results are far less impressive. (10% or so.)
[20:05:32] <MacWinner> also, do you recommend first upgrading to WT on my 3.0.8 replica set, then upgrading to 3.2? or upgrading to 3.2+WT on a rolling basis
[20:06:05] <GothAlice> I always recommend upgrading, then testing, one major version upgrade at a time. There are things like index and authentication migrations that need to be performed, and safest is to do so one step at a time in a controlled manner.
[20:06:19] <GothAlice> I wouldn't switch to the WT engine until 3.2, however.
[20:06:50] <GothAlice> Mixed-version clusters (across major versions) are problematic for the same differing-schema reason.
[20:07:36] <MacWinner> i'm trying to find a good guide from someone who has done this. i feel like i have a pretty vanilla setup with a 3-node replica set on 3.0.8.. not using WT
[20:08:30] <GothAlice> https://docs.mongodb.org/manual/release-notes/3.2-upgrade/ is a complete outline of the process.
[20:08:41] <GothAlice> Each major manual version includes step-by-step instructions for upgrading.
[20:39:53] <MacWinner> should this upgrade plan from 3.0.8 to 3.2.1 work for a replica set: 1) shut down secondary node, 2) upgrade binaries, 3) convert config to YAML (including WT), 4) delete data files, 5) start secondary node..
[20:40:13] <MacWinner> I'm thinking this will just go through the standard resync process for a secondary node.. but it will use WT as the storage engine
[20:40:30] <MacWinner> after all nodes are upgraded in this fashion, then I can upgrade the replication protocol version
[20:48:52] <Doyle> Hey. Backup question. In a sharded cluster, if you have to recover from a config server backup, is the cluster able to re-discover any migrations/splits, etc that happened after the backup took place?
[20:52:05] <Doyle> V3.0 https://docs.mongodb.org/v3.0/tutorial/backup-sharded-cluster-metadata/
[20:52:09] <Boomtime> @Doyle: migrations and splits are stored in the config servers; if you restore a config server to a point before the data was moved then it's gone - just like if you restore your harddrive to a point before you copied some data to it, how can it 'rediscover' what you copied to it?
[20:52:12] <GothAlice> Doyle: So… you're asking if a snapshot can predict the future?
[20:52:32] <cheeser> my crystal ball says ... "maybe?"
[20:52:46] <GothAlice> My 8-Ball came back with "Looks dubious."
[20:52:54] <Doyle> Well, if you have a busy sharded cluster, what's the point of a config server backup?
[20:53:10] <Boomtime> by itself?
[20:53:17] <cheeser> GothAlice: a little early in the day to be doing that kind of stuff, no? early in the week for that matter!
[20:53:52] <Doyle> I see the big picture. To be used in conjunction with other backups
[20:53:54] <GothAlice> cheeser: Not that kind of 8-ball. :P
[20:54:05] <Doyle> To be used in conjunction with an entire cluster backup
[20:54:25] <GothAlice> Exactly, Doyle. A backup of a config server is somewhat useless without the data it refers to.
[20:54:46] <GothAlice> Apologies if any of the above came across as snark. ;P
[20:54:48] <cheeser> GothAlice: oh! :D
[20:55:21] <Doyle> Thanks guys
[20:56:56] <Doyle> if you have to restart a config server, it's just: stop balancer, wait for migrations to finish, do restart, start balancer, right?
[21:03:14] <Boomtime> @Doyle: sure, that's probably the most polite way of doing it
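For reference, that procedure corresponds to sh.stopBalancer()/sh.startBalancer() in the shell. A sketch of the same thing from a script, assuming pymongo, a mongos on localhost, and the pre-3.4 convention of a "stopped" flag in config.settings:

    import time
    from pymongo import MongoClient

    config = MongoClient("mongodb://localhost:27017")["config"]  # connect to a mongos

    # Pause the balancer.
    config.settings.update_one({"_id": "balancer"},
                               {"$set": {"stopped": True}}, upsert=True)

    # Wait until the balancer lock is released, i.e. no migration is in flight.
    while config.locks.find_one({"_id": "balancer", "state": {"$gt": 0}}):
        time.sleep(5)

    # ... restart the config server here, then re-enable the balancer ...
    config.settings.update_one({"_id": "balancer"}, {"$set": {"stopped": False}})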
[21:04:30] <Doyle> ty Boomtime
[21:29:30] <cornfeedhobo> hello. i am still tackling an issue that is driving me nuts.
[21:30:11] <cornfeedhobo> I have a cluster that i am playing with replication on, and while testing fail-over scenarios, I have found that the instructions on https://docs.mongodb.org/manual/tutorial/configure-linux-iptables-firewall/ don't seem to work
[21:31:13] <cornfeedhobo> i can get the cluster happy and all, but if i take a node down, give it a new ip, start it back up, and update the /etc/hosts entry for the node that was cycled, then everything just hangs saying it is waiting for a config
[21:32:20] <Boomtime> @cornfeedhobo: you modify the /etc/hosts entry on the node that changed? nowhere else?
[21:32:40] <cornfeedhobo> Boomtime: no, on the master. sorry, should have been more clear
[21:33:02] <Boomtime> ok, how many members do you have? just two?
[21:33:16] <cornfeedhobo> 4
[21:33:33] <cornfeedhobo> (including the master)
[21:33:47] <magicantler> Could I use sharding to act as a sort of load balancer for users? Do a hashed shard on a consistent field where the values are 1, 2, or 3, and give each user a value?
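Setting aside whether it's a good idea (see the discussion below), a hashed shard key itself is declared like this; a sketch assuming pymongo, a mongos on localhost, and a hypothetical "game.users" namespace:

    from pymongo import MongoClient

    mongos = MongoClient("mongodb://localhost:27017")
    mongos.admin.command("enableSharding", "game")
    mongos.admin.command("shardCollection", "game.users",
                         key={"user_id": "hashed"})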
[21:34:01] <Boomtime> so out of 4 members, how many of them know about the new name you put in /etc/hosts?
[21:34:31] <Boomtime> also, please don't use the term 'master' that is old terminology and actually refers to a different mode
[21:34:42] <Boomtime> primary/secondary instead
[21:36:17] <cornfeedhobo> Boomtime: good point re: master. will use "primary". currently, i have only tested by updating the host entry on the primary. the reason was that in testing, i found that after bringing the cycled node back up, everything works/connects properly if i flush iptables for ~30s before restoring them to how they were.
[21:37:33] <cornfeedhobo> that is where i have become lost. everything works, but i just have to flush iptables for 30s for the cycled node to receive a config/resync
[21:37:47] <Boomtime> the replica-set config uses names? or ips?
[21:38:44] <cornfeedhobo> names
[21:39:43] <magicantler> or is there any way to tell which server the primary is on for a given hash, or when the primary moves?
[21:39:45] <Boomtime> when you say 30 seconds, how are you measuring that? what event tells you the new host is working?
[21:40:08] <Waheedi> oh finally http://api.mongodb.org/cplusplus/2.0.4/dbclient__rs_8h.html
[21:40:49] <magicantler> i'd like to map the client app to the proper mongod instance, before having mongos reroute
[21:40:54] <cornfeedhobo> Boomtime: i am hopping in the mongo shell and checking rs.status() until i see it stop complaining about being unable to retrieve a config
[21:41:54] <Boomtime> @magicantler: if you have a mongos at all, then the app connects to those only
[21:42:05] <magicantler> Boomtime: I have three.
[21:42:28] <magicantler> Boomtime: one in front of each api, but i want to intelligently map client calls to correct API, to avoid a network hop
[21:42:43] <magicantler> Boomtime: Based on the chunks or shard-hash
[21:43:01] <Waheedi> so read preference settings were usable starting from v2.4.0+
[21:43:02] <Boomtime> @cornfeedhobo: ok, that should be much faster than 30 seconds - check the primary server log (where you change the hosts) and see the connections it is trying to make
[21:43:20] <Boomtime> it should print both the name and the IP it resolved
[21:43:49] <Boomtime> if the IP is wrong then you know your hosts isn't being picked up real quick, if the IP is right then maybe you have some other problem
[21:44:06] <Waheedi> I'm on 2.0.4 which sounds a bit too old. what would be the right way to go about upgrading my code to work with 2.4.0?
[21:44:37] <Boomtime> @magicantler: you just want a specific application to connect to a specific mongos?
[21:45:00] <Waheedi> or 2.4.x
[21:45:15] <magicantler> Yes, I'd like the client app to hit the api which lies on the same mongod sharded instance as the given primary for that specific user's hash
[21:46:12] <cornfeedhobo> Boomtime: good idea! i will try that now. brb
[21:46:13] <magicantler> Boomtime: basically, i want to try to intelligently map to the correct ip address that its primary exists on at runtime, through the app, and i want to update it if the chunks balance out
[21:47:35] <Waheedi> I'm not sure what the changes are, and for me to learn the api changes would take a few days :/
[21:48:07] <Waheedi> between 2.0.4 2.4.x, but I will need to that anyway
[21:48:13] <Boomtime> @magicantler: connecting to a specific mongos is trivial; it's just a hostname, put the appropriate hostname of the mongos to connect to in that application connection-string
[21:48:15] <Waheedi> do*
[21:48:51] <magicantler> Boomtime: I know, but i'd like to have the client connect to the "best" mongos, i.e. the one that lies on the same server as the primary for that given call
[21:48:55] <Boomtime> @magicantler: but your other words don't make a lot of sense - what do you mean about 'given primary'? in a sharded cluster you do not connect to a primary
[21:49:26] <magicantler> Right, so I have 3 servers, with 3 shards (rotating primary, secondary, and arbiter between them)
[21:49:51] <magicantler> Boomtime: and i'm hashing user_id values as the shard key, but i'd like that user's future calls to consistently go to the proper mongos server
[21:49:59] <magicantler> as each server also has an api and mongos (beefy servers)
[21:50:21] <Derick> is each server running two mongod instances?
[21:50:58] <Waheedi> magicantler: you want to set read preference settings to nearest for your apps/clients
[21:51:20] <magicantler> Each server has 3 mongod instances + a mongos
[21:51:23] <Waheedi> which is pain in the *** for me to do as I'm on 2.0.4 version :)
[21:51:36] <magicantler> Waheedi: They are in same datacenter
[21:51:44] <Derick> magicantler: don't do that, they will fight for memory
[21:51:54] <magicantler> Derick: That's what I keep hearing, but they are massive servers.
[21:52:05] <Derick> magicantler: okay, feel free to ignore the advice.
[21:52:13] <magicantler> Derick: Is there no way around it?
[21:52:33] <Boomtime> @magicantler: what version?
[21:52:39] <magicantler> Boomtime: 3.2
[21:52:39] <Boomtime> (server version)
[21:52:55] <magicantler> just running cent OS
[21:52:57] <Boomtime> ok, you want to set the cacheSizeGB wiredtiger option
[21:53:15] <Boomtime> make sure all instances combined don't go over about half your RAM, or a bit less
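That option lives in the mongod config file; a sketch of the relevant YAML fragment, with an arbitrary example value (size it so that all instances on the box together stay under roughly half the machine's RAM):

    # Cap WiredTiger's cache per mongod instance (example value only).
    storage:
      engine: wiredTiger
      wiredTiger:
        engineConfig:
          cacheSizeGB: 8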
[21:53:45] <Boomtime> a mongos is a bit trickier to deal with - you're better off with those directly on your app servers if you can
[21:54:13] <magicantler> Boomtime: The app server is on these as well. Trying to reduce latency
[21:54:36] <Boomtime> sigh
[22:06:56] <magicantler> :[! Well is there any way to get the routing table that mongos uses? Then i could hand that back to the client and send calls preemptively
[22:07:32] <Boomtime> i think you have a misunderstanding about what mongos does
[22:08:14] <Boomtime> mongos may (and will frequently) talk to more than one shard in the course of serving a single query
[22:08:47] <Boomtime> the 'routing table' you are asking for would be the contents of the config server - a complicated set of metadata about how the database is distributed across shards
[22:10:24] <magicantler> wow actually yes, I didn't grasp it right. i thought it could just use the config meta-data to know where to go
[22:10:59] <magicantler> mongos that is.
[22:11:34] <Boomtime> https://docs.mongodb.org/manual/core/sharding-introduction/#data-partitioning
[22:12:13] <Boomtime> data is partitioned across shards, it is the mongos that locates all the relevant data for a query and makes it look like it came from a single place
[22:12:41] <Boomtime> so mongos is a 'router' but not the simple network router you are thinking of
[22:13:30] <magicantler> right, totally with you on that, but we're using documents and queries that will only map to specific users anyway, and thus won't span more than one shard.
[22:13:38] <magicantler> i have weird requirements for this project.
[22:14:02] <magicantler> it won't span more than one shard, b/c it's a single document per user.
[22:15:09] <Boomtime> is a user document only accessible from a single location?
[22:15:35] <magicantler> yes
[22:16:14] <magicantler> i really feel like we're making a hacky load balancer and backup router (in case a primary goes down) via sharding.
[22:17:04] <Derick> sharding is not meant for failover, that's what replication is for. You're messing that all up with your two data nodes per server
[22:17:16] <Boomtime> if you only write a document in a single location, and only ever read it from that same location.. what is the sharding for?
[22:17:17] <magicantler> well each shard is a replica set.
[22:17:33] <magicantler> the sharding is to expand parallel reads + writes. distribute out users
[22:18:29] <Boomtime> but you're overlaying that with your own rules about which user can operate on which server
[22:18:49] <Boomtime> you've already made up your mind which server to use before you even involve the database
[22:23:09] <magicantler> true, and solid point. so i guess i could just use 3 replica sets, split across the 3 servers instead, but then i'd have to manage 3 mongodbclient connections on my own
[22:24:23] <Derick> one data node -> one physical server
[22:24:37] <Derick> don't try to reinvent things yourself and go against how MongoDB is supposed to be used.
[22:24:44] <magicantler> Yeah... boss man said no to that..
[22:24:51] <Derick> then drop the sharding
[22:25:29] <magicantler> ok i'll drop sharding, and hash on my own, to the correct replica set. does that sound better?
[22:25:41] <Derick> how would hashing on your own help?
[22:25:59] <magicantler> i still need to split up users between servers as he wants faster reads and writes.
[22:26:05] <Derick> in a replicaset, all data is on each of the nodes
[22:26:17] <Derick> no, if you want to split up users (why?) you need sharding
[22:26:17] <magicantler> i was going to do split up replica sets
[22:26:23] <Derick> split them up?
[22:26:32] <magicantler> yeah into thirds. wouldn't that be faster?
[22:26:37] <Derick> how? one data node -> one physical server. It's a really simple rule
[22:26:57] <Derick> I also think you might be over complicating things for no reason
[22:27:06] <Waheedi> Derick: i disagree on that one physical theory
[22:27:13] <Derick> Waheedi: ok, whatever
[22:27:25] <Waheedi> lol
[22:27:33] <Waheedi> i really do :)
[23:05:01] <cornfeedhobo> Boomtime: okay. I have tried what you recommended. when i change ips on the "failing" node, the logs output a notice about failure to contact it. once I update the /etc/hosts entry for the failing node on the primary, the log entry stops appearing, but the failing node stays in a perpetual state of "DOWN"
[23:22:15] <jiffe> [repl writer worker 15] Invariant failure n < 10000 src/mongo/db/storage/mmap_v1/btree/btree_logic.cpp 2075
[23:23:36] <jiffe> that ring a bell with anyone? two of the three members in a replica set are dying with this and won't start up again
[23:56:41] <regreddit> i have found what i think is a bug, but i'm not sure where or how bad it is, and it's a blocker for me - can i describe it here, to see where i should file it? I can also easily replicate it and have a test case, if someone is willing to test it
[23:57:51] <regreddit> i'll paste the replication test case in a gist