[01:43:25] <xmad> I need to upgrade a 400GB database from 3.0 to 3.2, ideally with no downtime. I looked into doing it with https://github.com/compose/transporter which supports oplog tailing, but it has some issues and I can't spare much time to fix them. What are my options?
[02:18:40] <cheeser> xmad: are you using a replica set?
[08:32:22] <jokke> isn't it possible to calculate averages of nested values?
[10:06:08] <jokke> i made it work with this workaround: https://p.jreinert.com/ed2nA/javascript
[10:06:47] <kurushiyama> jokke: Sorry, just got up (long night). LGTM, wouldn't have done it any differently.
[10:07:48] <jokke> kurushiyama: so $group can only work with flat documents?
[10:08:01] <jokke> or rather only produce flat documents
[10:08:34] <kurushiyama> jokke: Off the top of my head, with only one coffee yet, I'd say yes.
[10:09:36] <jokke> i have a question about $out. I want to feed the result of the aggregation into a collection. Problem: $out overwrites the collection if it exists. If I have different retention times in my collections, that's a problem
[10:09:54] <kurushiyama> jokke: you could probably try "value.r" and "values.i" as keys in the group stage, thinking of it.
[10:15:26] <jokke> but i'd have to make sure there'd be no duplicates
[10:16:52] <kurushiyama> I'd probably have a collection for each granularity level and use $out for each level within the maximum retention time of the original data.
[10:19:45] <kurushiyama> Let's say you want a granularity of, say, 5 minutes for the last day, 1h for up to a week, 1d for up to a month, and 1w until max retention time.
[10:22:24] <kurushiyama> Ok, with this granularity and your max retention of the original data, you _could_ display everything up to 3 months with an "on-the-fly" aggregation
[10:27:01] <kurushiyama> which – you are right with that – would cause quite some overhead (albeit not as much as one might guess, since we'd do an average on <MinGranularity> at least), but access to the refined data would be blazing fast, then
[10:31:47] <kurushiyama> So, the system ran for 3 months, everyone lived happily, and now we want to have the data surpassing the max retention time aggregated at the granularity chosen for +3 months, the result merged with the corresponding collection, and the original data deleted.
[10:32:48] <kurushiyama> Put this way, it is pretty obvious what to do.
[10:37:24] <kurushiyama> Get all documents older than now - 3 months and build the average, say, for a calendar month. For each of the documents returned from the aggregation, find the corresponding one in the +3m collection if it exists, insert otherwise.
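A rough sketch of that outdating step in the shell; the collection and field names (metrics, metrics_3m, ts, value) are made up for illustration:

```javascript
// Everything older than three months, averaged per calendar month.
var cutoff = new Date();
cutoff.setMonth(cutoff.getMonth() - 3);

db.metrics.aggregate([
  { $match: { ts: { $lt: cutoff } } },
  { $group: {
      _id: { year: { $year: "$ts" }, month: { $month: "$ts" } },
      avg: { $avg: "$value" }
  } }
]).forEach(function (doc) {
  // Merge: update the monthly bucket if it exists, insert it otherwise.
  db.metrics_3m.update({ _id: doc._id }, doc, { upsert: true });
});
```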
[10:37:26] <jokke> i can still only think of not using $out but bulk inserting the results of the aggregation
[10:37:48] <kurushiyama> You don't use $out here, since it is not for merging.
[10:40:14] <kurushiyama> Just make sure you execute the bulk insert into the +3m collection _before_ you execute the bulk delete on the original data ;)
[10:42:01] <kurushiyama> Because here is the trick I used once: Simply add the processed document '_id's to a result field for the aggregation you use for outdating the data, and use this result set to add deletes to a bulk delete.
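A sketch of that _id trick with the insert-before-delete ordering, using the same made-up names as above:

```javascript
// Same outdating aggregation, but also collecting the raw _ids.
var cutoff = new Date();
cutoff.setMonth(cutoff.getMonth() - 3);

var results = db.metrics.aggregate([
  { $match: { ts: { $lt: cutoff } } },
  { $group: {
      _id: { year: { $year: "$ts" }, month: { $month: "$ts" } },
      avg: { $avg: "$value" },
      ids: { $push: "$_id" }   // remember which raw documents went in
  } }
]).toArray();

var bulk = db.metrics.initializeUnorderedBulkOp();
results.forEach(function (doc) {
  // Merge into the +3m collection first ...
  db.metrics_3m.update({ _id: doc._id },
                       { _id: doc._id, avg: doc.avg },
                       { upsert: true });
  // ... then queue the originals for deletion.
  bulk.find({ _id: { $in: doc.ids } }).remove();
});
bulk.execute();  // only now are the raw documents actually removed
```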
[10:42:10] <moneybeard> Hmm wonder if I can speak now?
[10:45:57] <kurushiyama> moneybeard: Ok, I would never backup from a primary, for starters
[10:46:13] <kali> moneybeard: it will freeze the vm, so you will probably trigger a failover
[10:46:31] <moneybeard> Yeah that is exactly what happened but not sure if it failed over completely.
[10:46:40] <kurushiyama> moneybeard: kali is right, it will block, which is the reason why I'd do it from a secondary
[10:46:51] <moneybeard> Then node01 and 02 complained about needing a reload as disk.service was out of sync
[10:47:45] <moneybeard> kurushiyama: So do I need to do the reload to get things back in a good state or will they sync up on their own in time?
[10:48:20] <jokke> kurushiyama: how can i check whether my collection is sharded evenly?
[10:48:58] <kurushiyama> jokke: sh.status(true) or db.collection.getShardDistribution(). As *cough* documented ;)
[10:49:05] <moneybeard> kurushiyama: The issue is I needed a snapshot of the OS because I was upgrading some packages, so it had to be the primary. In that case I was thinking my best bet is to shut the node 1 primary down, let a new master take over, then do the snapshot on 1
[10:49:24] <jokke> kurushiyama: *cough* i knew that *cough*
[10:51:02] <moneybeard> ^ I was patching system packages unrelated to mongo
[10:54:41] <kurushiyama> moneybeard: Doesn't matter. Do not interrupt a service without need. Since you can do the update on secondaries first, fire them up again and have the primary step down prior to updating, there is no need.
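The shell side of that rolling procedure is small; a sketch, assuming the patched secondaries are already back in the set:

```javascript
// Run against the current primary once all secondaries are healthy again:
rs.status();      // confirm every member is back in SECONDARY/PRIMARY state
rs.stepDown(60);  // primary steps down for 60s; an up-to-date secondary takes over
// Now patch/snapshot the former primary like the others and restart it;
// it rejoins the set as a secondary.
```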
[10:56:23] <moneybeard> From: https://docs.mongodb.org/manual/administration/production-notes/ It is possible to clone a virtual machine running MongoDB. You might use this function to spin up a new virtual host to add as a member of a replica set. If you clone a VM with journaling enabled, the clone snapshot will be valid. If not using journaling, first stop mongod, then clone the VM, and finally, restart mongod.
[10:56:36] <Industrial> I now have a shell script that does `echo "rs.initiate(CONFIGHERE)" | mongo`
[10:56:57] <Industrial> I can only run this once (and i did it wrong) and now every time it says the replicaset already exists
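One common way out of a botched rs.initiate() is to correct the existing config with a reconfig rather than initiating again; a hedged sketch, with a made-up hostname:

```javascript
// Load the current replica set config, fix it, and reconfigure.
var cfg = rs.conf();
cfg.members[0].host = "db01.example.com:27017";  // hypothetical correction
rs.reconfig(cfg);                 // normally sufficient from the primary
// rs.reconfig(cfg, { force: true })  // force only if no primary is reachable
```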
[10:58:27] <kurushiyama> moneybeard: iirc, a clone is a _COW_ (copy-on-write) snapshot of the original data, with a dependency on the original image.
[10:58:29] <moneybeard> kurushiyama: NVM I found my answer: Clone> An exact copy of a VM at a specific moment in time, although this is usually performed on a powered-off VM (a clone of a running VM is called a snapshot)
[10:58:43] <kurushiyama> moneybeard: Though I do not use VMs for MongoDB, anyway.
[10:59:21] <moneybeard> kurushiyama: Ok true that. Hey thanks for your advice and expertise.
[11:00:07] <kurushiyama> moneybeard: You are welcome!
[11:19:27] <jokke> do you guys know if i can get the total size of an index for a collection with the ruby mongodb driver?
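In the shell this number comes from collStats, and the Ruby driver can issue the same command; a sketch in shell syntax, with a placeholder collection name:

```javascript
// Total size of all indexes on a collection, two equivalent ways:
db.mycoll.totalIndexSize();                              // convenience helper
db.runCommand({ collStats: "mycoll" }).totalIndexSize;   // raw command output
```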
[11:34:27] <kurushiyama> FuzzySockets: + your brain, ofc. ;)
[11:34:44] <kurushiyama> FuzzySockets: Here is the problem _I_ have with Mongoose.
[11:35:42] <kurushiyama> FuzzySockets: It forces an SQL approach to data modelling: define entities and their properties plus the relations between the entities, model them, then bang your head against the wall to get the data you need to implement your use cases.
[11:36:23] <kurushiyama> FuzzySockets: Basically, it makes you treat MongoDB like an SQL-ish object store. Which could not be further from the truth, imho
[11:36:24] <FuzzySockets> kurushiyama: But using it doesn't tie you down to only using it
[11:36:53] <FuzzySockets> kurushiyama: I could see how that'd be a problem if you don't understand designing for a document store
[11:37:06] <kurushiyama> FuzzySockets: I fail to see any advantages. It is slower, it is bloated, and it abstracts away MongoDB's real strengths
[11:38:45] <kurushiyama> FuzzySockets: The slower and bloated part is not from my own experience, but StephenLynx, who is usually rather active on the channel, says so, and I take his word for it.
[11:41:12] <kurushiyama> FuzzySockets: Imho, the proper approach to data modelling in MongoDB is to define your use cases, order them by how common they are, derive the questions you'd have to the data from them and create an optimized data model, with more common use cases having precedence over rarer ones when it comes to optimization.
[11:44:29] <kurushiyama> FuzzySockets: Not that this could not be achieved with Mongoose, but then you end up with an unnecessary abstraction layer.
[11:47:47] <FuzzySockets> kurushiyama: Speed of development is a higher concern for me right now. I already know mongoose syntax, and it'd be a fairly simple interface to replace if speed ever became an issue down the line.
[11:47:57] <FuzzySockets> But I'll definitely keep it in mind
[11:49:01] <kurushiyama> FuzzySockets: That is amassing technical debt. Just the other day there was some guy here who used Mongoose, and now that his application is gaining real momentum, it fails to scale. I can understand your concern, just do not blame it on MongoDB ;)
[11:49:58] <kurushiyama> And the interface is not the problem: the data models produced by Mongoose are. And beware of populate(); avoid it at all costs.
[11:49:59] <FuzzySockets> kurushiyama: I'm blaming it on my lack of knowledge on the native driver as well as already having 3-4 models written in mongoose. Besides, OOP principle 101, code to an interface, not an implementation.
[11:51:32] <kurushiyama> FuzzySockets: Well, you have been warned ;) Just be prepared that you'll need an ETL specialist later down the road.
[11:51:55] <FuzzySockets> kurushiyama: my application is actually fairly flat, and not incredibly write intensive
[11:54:22] <kurushiyama> FuzzySockets: You should be able to judge that better than I do. And I make a good part of my living from data-remodelling and (query) optimization, so I probably should not argue too much ;)
[11:55:22] <FuzzySockets> kurushiyama: Can't imagine you'd have much of a stance anyways considering you don't know what my application does ;)
[11:56:12] <kurushiyama> FuzzySockets: Well, I do not exactly starve. Go figure ;P
[12:01:46] <jokke> any idea what this might be about? https://p.jreinert.com/AMzlw/
[12:25:16] <kurushiyama> scruz: Alternatively, if you want the count for all possible values of status, including _explicit_ null: db.checknull.aggregate({$group:{_id:"$status",count:{$sum:1}}})
[12:27:14] <scruz> thanks! i think i’m going to use a composite ID then
[12:27:36] <kurushiyama> scruz: As you wish. There is a caveat, though. Gimme a sec
[12:31:40] <scruz> i think that’s okay for my use case.
[12:31:49] <kurushiyama> scruz: Nope, it was included in the null group, since non-existent fields by definition evaluate to null
[12:32:17] <scruz> we treat null as missing anyway.
[12:32:40] <scruz> and i want to get a count specifically for missing data apart from the other explicit statuses.
[12:33:49] <kurushiyama> scruz: Then you are actually using the convention to your benefit.
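For completeness, a small sketch of how explicit nulls can be told apart from missing fields, should that ever be needed (assuming the checknull collection from above; note that querying `{ status: null }` matches both cases):

```javascript
// Count documents where the field is absent vs. explicitly null:
db.checknull.count({ status: { $exists: false } });  // field missing entirely
db.checknull.count({ status: { $type: 10 } });       // explicit null (BSON type 10)
```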
[14:38:59] <jokke> kurushiyama: i see some weird stuff with some of my schemas when stress testing the cluster. it seems that some values are lost: db.flat.count() => 180000, db.panel_aggregate.aggregate([{$unwind: '$values'}, ..., { $group: { _id: '1', count: { $sum: 1 } } }]) => { _id: 1, count: 162340 }
[14:39:07] <jokke> any ideas what might be happening?
[14:45:17] <kurushiyama> jokke: a) please make sure you have used the proper writeConcern. b) use pastebin if you paste code. c) In order to help you, I need what I request. I do not do this out of curiosity.
[14:46:22] <jokke> to a) i haven't thought about that. would you mind elaborating that a bit?
[14:48:24] <jokke> i didn't specify any write concern
[14:50:10] <kurushiyama> jokke: MongoDB does not just "lose arbitrary data". There can be _very_ special circumstances in which that happens, and _only_ if your writeConcern does not match your durability needs: https://docs.mongodb.org/manual/reference/write-concern/ Every single time somebody claims that MongoDB "loses data" it can be tracked down to a user's fault. As a matter of fact, the control over durability in MongoDB is much higher than
[14:50:10] <kurushiyama> with any other DBMS I have encountered, including Cassandra
[14:51:43] <kurushiyama> So, in order to debug what has happened, please do a .count() on both collections first
[16:47:30] <kurushiyama> jokke: Sort of. If you have a replica set, you might want to make sure that the write made it to a majority of members. Reason: the member with the most recent data will be elected primary in a failover situation. With a wc:1, if a failover happens while the data has not made it to a secondary yet, a so-called rollback will happen when the member rejoins.
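A minimal sketch of what that looks like in practice: an insert that waits for majority acknowledgement (collection and fields are placeholders):

```javascript
// The insert only succeeds once a majority of replica set members
// have acknowledged the write, so a failover cannot roll it back.
db.events.insert(
  { ts: new Date(), value: 42 },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
);
```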
[17:24:57] <saml> kurushiyama, given unstructured json objects, how would you insert them to mongo using mgo?
[17:28:07] <saml> never mind. i found https://godoc.org/gopkg.in/mgo.v2/bson#Getter
[17:45:23] <brucebag> anyone using 'hint' in their queries?
[17:45:49] <brucebag> seems that mongo is picking the wrong index for many of our queries
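For reference, hint() chains onto the query cursor; a small sketch with a hypothetical index on { status: 1 }:

```javascript
// Force the optimizer to use a specific index:
db.orders.find({ status: "open", total: { $gt: 100 } }).hint({ status: 1 });
// Compare with what the optimizer picks on its own:
db.orders.find({ status: "open", total: { $gt: 100 } }).explain();
```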
[17:48:21] <ramnes> I have a really weird behavior
[17:48:38] <ramnes> where the same connection command works differently on two computers
[18:55:00] <kurushiyama> And of course not about the impact of those buildflags on the dependencies. And an undefined buildflag does not necessarily mean "no impact"... ...iirc... *blush*
[19:12:17] <ramnes> that's a bit cumbersome since ldd only lists dynamically linked libraries
[19:12:39] <Yuri4_> Hi, guys! I need to practice transferring databases between servers. Anybody know where I can find a simple database that I can practice on?
[19:14:17] <Yuri4_> ramnes, yeah, tried that one already. That is just a single .json file; all you need to do is press "import" and you are all done. Looking for a slightly more complex database now.
[19:18:22] <Yuri4_> I have a freelance order to transfer databases. Probably will need to do that with a remote connection to the client's machine. Just wanna make sure that all goes smoothly, I don't make stupid mistakes, and I don't waste the client's time.
[19:19:01] <kurushiyama> Yuri4_: Define "transfer" a bit more precisely, please.
[19:20:16] <Yuri4_> I haven't had any experience with MongoDB, so I'm a bit nervous.
[19:20:32] <kurushiyama> Yuri4_: Is there preexisting data on the target server? Do you need to merge it? Is the target server a production server? Is any downtime planned/acceptable? Are we talking of a sharded environment, a replica set or standalones?
[19:21:41] <Yuri4_> kurushiyama, standalone server, downtime is not acceptable. New server is a fresh install.
[19:22:09] <kurushiyama> Yuri4_: As a word from one freelancer to another: accepting a contract you do not have any clue about is... not exactly a quality mark. You _live_ off your reputation. If you wreck that, it's going to be a major problem.
[19:23:42] <kurushiyama> Yuri4_: Ok, now lets see how we prevent that.
[19:24:23] <Yuri4_> kurushiyama, I appreciate your input!
[19:25:08] <kurushiyama> Yuri4_: First: why is downtime not acceptable if the target instance is empty? I assume the source is production data then?
[19:25:31] <Yuri4_> kurushiyama, sorry my mistake.
[19:25:39] <Yuri4_> Original server shouldn't be affected.
[19:25:49] <Yuri4_> The target server is already down
[19:26:05] <Yuri4_> As soon as target server is setup, original server will be shut down
[19:26:41] <kurushiyama> Wait. What? Are we talking of a database migration from one server to the other, then?
[19:27:29] <kurushiyama> Ok, here is the problem: As long as the original server runs, it will get data, correct?
[19:28:06] <Yuri4_> kurushiyama, not sure what you are asking
[19:28:19] <Yuri4_> Exporting the database from the original server should not affect uptime
[19:29:36] <kurushiyama> Yuri4_: The problem is that the data will change after you did an export.
[19:30:10] <kurushiyama> Yuri4_: So exporting/dumping/snapshotting is _not_ an option.
[19:31:17] <kurushiyama> Yuri4_: The only way to do a data migration without downtime (or, to be precise, with only minimal downtime) is to build a temporary replica set, consisting of the old server, the new server and an arbiter.
[19:32:22] <Yuri4_> kurushiyama, thanks for the input. Gonna take a break then set up a test lab. Cheers!
[19:42:10] <kurushiyama> Yuri4_: At least you should bother to read the release notes. It is not going to work the way I suggested. So either you need a downtime, to make sure the data does not get changed while or after you take the dump/export, or you need to downgrade the target machine to 2.6 first
[19:43:25] <Yuri4_> kurushiyama, thank you. I'm very ashamed. I deserved this. Going to get my head straight now. Gracias.
[19:43:50] <kurushiyama> Yuri4_: We can talk you through this.
[19:44:34] <Yuri4_> kurushiyama, i'd rather not waste anyone's time and be very thorough. You are too kind.
[19:44:56] <kurushiyama> Yuri4_: So take it as a lesson learned. ;)
[19:45:25] <Yuri4_> kurushiyama, yes I will. You are the best!
[19:46:13] <kurushiyama> Yuri4_: Ok, the idea is to sync to the new server on 2.6. As soon as this is synced, you shut down the original server and upgrade it to 3.0.x
[19:47:43] <kurushiyama> Yuri4_: removing the contents of dbpath before the upgrade. Then you wait until the data is synced back to the original server, which is now on the target version.
[19:49:16] <Yuri4_> Oh, by the way, what tool would you recommend for managing the database? I currently use MongoChef
[19:49:29] <kurushiyama> Yuri4_: Then you can simply update the new server to 3.0, wait until the data from the original server has synced, and have it step down. Then you can restart the new server as a standalone and you are done. With minimal downtime, although this process is going to take a while.
[19:49:36] <kurushiyama> Yuri4_: Yes. It is called the shell.
[19:54:19] <kurushiyama> Yuri4_: So here is what has to happen _before_ the downtime: you need to set up the arbiter and the new server in 2.6, start them with the replset option as documented, but do _not_ initialize the replica set yet.
[19:55:41] <Yuri4_> kurushiyama, that's cool. I've done similar jobs with MySQL before.
[19:57:09] <kurushiyama> Yuri4_: Then, downtime happens. The mongodb connection URI of the application needs to be changed so that it connects to the replica set we are going to create during the downtime. In parallel, you restart the original server with the same replset option you started the new servers with and initiate the replica set. Then, you add the new server and the arbiter to the replica set. At this point, the application can be restarted. Proceed as described above.
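A rough sketch of the shell side of that downtime window, with placeholder hostnames; all three mongod processes are assumed to already be running with the same --replSet name:

```javascript
// On the original server, once it has been restarted with --replSet:
rs.initiate();                          // it becomes the primary
rs.add("newserver.example.com:27017");  // new server starts its initial sync
rs.addArb("arbiter.example.com:27017"); // arbiter keeps the vote count odd
rs.status();                            // watch the new member reach SECONDARY
```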
[19:58:26] <FuzzySockets> that's a lot of downtime ... sorry downtime !
[19:59:30] <Yuri4_> Alright, I got more than I need. Cheers guys! Hope to see you again.
[19:59:36] <kurushiyama> FuzzySockets: is it? Doing an rs.initiate() and two rs.add()s takes seconds.
[20:00:53] <kurushiyama> FuzzySockets: if you have a better idea to minimize downtime for a migration from 2.4 to 3.0, I'd be happy to hear it!
[20:01:27] <FuzzySockets> kurushiyama: I was being facetious... you keep pinging some guy in here named downtime
[20:06:14] <kurushiyama> downtime: I am so sorry. I did not know.... But you have to admit that this is not exactly an ideal nick in a channel like this... ;)
[20:08:06] <downtime> kurushiyama: I have no issue with pings :)
[20:16:03] <kurushiyama> Sounds like me. On a good day ;)
[20:52:41] <tinylobsta> i'm using mongoose, and for some reason when i try to call methods such as findOne, findById, etc, on a model wrapper, nothing happens
[20:52:44] <tinylobsta> it doesn't even return an error
[20:53:53] <StephenLynx> you could try implementing your stuff directly with the driver, though.
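A minimal native-driver sketch of what StephenLynx suggests, in the Node.js driver's callback style of that era; the URL, database and collection names are placeholders:

```javascript
// Query a collection directly with the native driver, no ODM in between.
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/mydb', function (err, db) {
  if (err) throw err;
  db.collection('users').findOne({ name: 'alice' }, function (err, doc) {
    if (err) throw err;
    console.log(doc);  // the matching document, or null
    db.close();
  });
});
```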
[20:54:21] <kurushiyama> deathanchor: a replica set reconfig should do the trick as well.
[20:54:45] <tinylobsta> i guess i could just wrap the methods i'm using from mongoose, huh
[20:55:57] <kurushiyama> StephenLynx: Do you have your results for Mongoose performance somewhere? I took over the "job" of discouraging the use of Mongoose today, thought it might be helpful...
[20:56:32] <deathanchor> kurushiyama: this made me doubt it: http://dba.stackexchange.com/questions/61023/can-arbiteronly-replica-in-mongodb-become-secondary-and-what-it-means
[20:56:37] <tinylobsta> okay so i shouldn't use mongoose, i agree
[20:56:42] <tinylobsta> but has anybody experienced this before?
[20:56:45] <StephenLynx> I just read about someone who tested it somewhere
[20:56:51] <StephenLynx> never had the thing installed
[20:57:12] <deathanchor> kurushiyama: the gist of the page is that you see this error with a reconfig: http://dba.stackexchange.com/questions/61023/can-arbiteronly-replica-in-mongodb-become-secondary-and-what-it-means
[20:57:15] <tinylobsta> i tried to move my models into a separate module and i'm requiring the module containing the models
[20:57:17] <tinylobsta> which is when everything broke
[20:57:21] <deathanchor> crap.. "errmsg" : "exception: arbiterOnly may not change for members"
[20:58:02] <StephenLynx> performance is just adding insult to injury, though
[20:58:36] <kurushiyama> deathanchor: As per the dba question: Neil, in his one of a kind style, pretty much summed it up.
[21:01:17] <kurushiyama> deathanchor: You might need to force the reconfig. However, if you feel uncomfortable with it, simply remove the arbiter and re-add it. Should take seconds.
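The remove/re-add route is just two shell calls, run on the primary (hostname is a placeholder):

```javascript
// Drop the arbiter from the set, then add it back as an arbiter:
rs.remove("arbiter.example.com:27017");
rs.addArb("arbiter.example.com:27017");
```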
[22:04:26] <oky> anyone using a mongo instance for analytics? (and have timestamped events in it?)