[01:15:00] <brotatochip> I’m trying to recover a replica set with as little downtime as possible without incurring the 2+ hour downtime for a full resync
[01:15:14] <brotatochip> i’m currently not replicating
[01:34:41] <cheeser> i don't think such a thing exists.
[01:35:12] <brotatochip> ok cheeser let me try to explain what I’d like to accomplish and maybe you can tell me if it’s possible
[01:36:23] <brotatochip> i’m not sure what the root cause was, but somehow my secondary fell days behind my primary and then the primary entered a FATAL state, causing the secondary to be elected
[01:36:48] <brotatochip> i was not able to return the primary to a healthy replicating state, so I disabled replication to get the app back online
[01:39:16] <cheeser> also, i'd consider signing up for monitoring so you get alerts if your secondaries are falling too far behind
[01:39:40] <brotatochip> Oh I’m scripting out my own
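Scripting your own lag monitor, as brotatochip mentions, usually comes down to comparing optime timestamps across the members reported by rs.status(). A minimal sketch, assuming a rs.status()-shaped document — the member names, timestamps, and threshold here are made up for illustration:

```javascript
// Compute per-secondary replication lag (seconds) from a
// rs.status()-style document. Field names (members, stateStr,
// optimeDate) follow the rs.status() output shape.
function replicationLag(status) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  if (!primary) return null; // no primary at all: alert immediately
  return status.members
    .filter(m => m.stateStr === "SECONDARY")
    .map(m => ({
      name: m.name,
      // Date - Date yields milliseconds
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

// Illustrative sample resembling rs.status() output
const sample = {
  members: [
    { name: "db1:27017", stateStr: "PRIMARY",   optimeDate: new Date("2016-01-15T10:00:00Z") },
    { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2016-01-15T09:59:30Z") },
  ],
};
console.log(replicationLag(sample)); // db2 lags by 30 seconds
```

A cron job could feed this the output of `rs.status()` and alert when lagSeconds crosses a threshold well inside the oplog window.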
[01:51:03] <brotatochip> Ok looks like I found the root cause cheeser, networking issue, somehow the primary lost connectivity to the secondary which caused it to step down
[01:51:59] <brotatochip> it lost contact with the arbiter too
[01:53:59] <cheeser> colocate the arbiter with the primary
[01:54:10] <cheeser> you should really have two secondaries, though.
[01:54:51] <brotatochip> Two secondaries meaning one of them can step up to primary in case there is a problem with the primary, right?
[01:55:33] <brotatochip> Currently my secondary is only set up to be a geographically redundant failover
[02:03:25] <cheeser> right but with another secondary, you probably wouldn't be in this situation.
[02:03:38] <cheeser> resyncing wouldn't mean downtime
[02:23:33] <brotatochip> cheeser: so I’ve done a full resync using another server with this morning’s backup. if I stop mongodb in the secondary, point it and the arbiter back at the original primary and then restart the original primary with replication enabled, will mongo fill in the gap in the data from this morning to tomorrow (when I’ll be performing the maintenance)?
[02:23:47] <brotatochip> it would be great if I could do that without having to worry about data loss
[02:24:48] <cheeser> the secondary will attempt to apply anything in the primary's oplog that has not been applied to the secondary.
[02:25:12] <cheeser> if the last sync time is farther back than what's in the oplog, it'll have to do a full resync.
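The rule cheeser states boils down to one comparison: the secondary can do an incremental sync only if its last applied operation is still inside the primary's oplog window (which `rs.printReplicationInfo()` reports in the shell). A sketch of that decision — the timestamps below are illustrative, not from a real cluster:

```javascript
// Decide whether a stale secondary can catch up via the oplog
// or needs a full resync. Timestamps are seconds since epoch.
function canCatchUp(secondaryLastApplied, primaryOplogFirst) {
  // Catch-up is possible only if the secondary's last applied op
  // has not yet rolled out of the primary's capped oplog.
  return secondaryLastApplied >= primaryOplogFirst;
}

const oplogFirst = 1452800000;                    // oldest entry still in the oplog
console.log(canCatchUp(1452850000, oplogFirst));  // true  -> incremental sync
console.log(canCatchUp(1452700000, oplogFirst));  // false -> full resync needed
```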
[02:25:15] <brotatochip> right, the question is will it be in the oplog
[02:25:51] <brotatochip> does mongodb write to the oplog even when not a member of a replica set?
[02:25:55] <cheeser> you can also do this: https://docs.mongodb.org/manual/tutorial/resync-replica-set-member/#sync-by-copying-data-files-from-another-member
[02:26:06] <cheeser> the oplog is only present in replica sets.
[02:26:18] <brotatochip> exactly, so it currently does not exist on the primary
[02:26:32] <brotatochip> as it is not currently a replica set member
[02:26:38] <cheeser> a primary, by definition, is a member of a replica set.
[02:27:24] <brotatochip> correct, so I guess I mean to say the intended primary
[02:27:58] <cheeser> if you init that mongod as a primary, you can then add secondaries but you'll have to do a full resync at that point
[02:29:36] <brotatochip> it’s been initiated already
[02:29:49] <brotatochip> i restarted it with replSet commented out in the config
[02:31:38] <brotatochip> so it’s got an oplog but that oplog hasn’t been written to, so those changes won’t be in it
[02:35:56] <brotatochip> Oh, no, duh, I need to copy the DB I want replicated obviously as well
[02:39:58] <brotatochip> cheeser: it sounds like I need to remove the oplog from my intended primary as it is no longer valid and would probably cause an issue
[02:40:59] <cheeser> no, it'll add to that oplog as you write to the db
[02:42:43] <brotatochip> but there are going to be over 3 days of writes to the DB that are not in the oplog
[02:43:35] <brotatochip> the problem with that is again, i would need to stop any writes to the DB while I copied the files over to the secondary, would I not?
[02:43:50] <brotatochip> the intended primary isn’t going to start writing to an oplog until it’s a replica set member
[02:44:21] <brotatochip> i don’t see any other option than a full resync
[02:45:05] <cheeser> if you've run rs.initiate() on that mongod, that makes it a replica set
[02:45:14] <cheeser> and start with --replSet of course
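To make the sequence concrete: replication is switched back on in the config (brotatochip had commented `replSet` out) and then initiated once from the shell. A hedged sketch of the legacy ini-style config — the set name and oplog size are placeholders, not values from the channel:

```ini
# /etc/mongod.conf -- legacy ini-style settings (values illustrative)
replSet = rs0        # restore/uncomment this, then restart mongod
oplogSize = 10240    # MB; a generous oplog lets secondaries lag safely
```

After the restart, running `rs.initiate()` once in the mongo shell makes the node a single-member replica set. Only writes from that point on land in the oplog, which is exactly why the writes made while running standalone cannot be replayed to a secondary.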
[10:10:02] <Zelest> Derick, is a 5 node replicaset a good idea? (with 2 arbitary nodes)
[10:10:31] <Zelest> atm we have 3 nodes and I'm somewhat confused when "no candidate servers" happens or not
[10:15:25] <Derick> 5 nodes, with 2 arbiters makes little sense
[10:16:12] <Derick> "no candidate servers" should only happen if a. you're writing, and b. two nodes are down or c. the latency to the servers is too high
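Derick's case (b) is just majority arithmetic: a replica set can elect (or keep) a primary only while a strict majority of all voting members is reachable, and arbiters count as votes but hold no data. A small sketch of that rule:

```javascript
// A primary is electable only while a strict majority of ALL voting
// members (reachable or not) can talk to each other.
function canElectPrimary(totalVotingMembers, reachableMembers) {
  return reachableMembers > totalVotingMembers / 2;
}

console.log(canElectPrimary(3, 2)); // true:  2 of 3 is a majority
console.log(canElectPrimary(3, 1)); // false: "no candidate servers"
console.log(canElectPrimary(5, 3)); // true:  5 members tolerate 2 failures
```

This is why 5 voting members (whatever the mix) survive two simultaneous failures, while 3 members survive only one — and why 2 arbiters out of the 5 add fault tolerance for elections but not for the data itself.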
[10:17:42] <Zelest> well, if b or c happens now and two nodes are down, "no candidate servers" happens.. with 5 nodes, that won't happen?
[10:18:47] <Zelest> in this case, i plan on having 3 nodes in 3 different cities/countries
[10:18:50] <Zelest> hence my worry for downtime :)
[10:20:41] <Derick> Zelest: if you're using PHP, make a log file
[12:05:31] <m3t4lukas> sources: http://grepcode.com/file/repo1.maven.org/maven2/org.mongodb.morphia/morphia/1.0.1/org/mongodb/morphia/aggregation/Group.java?av=f and http://grepcode.com/file/repo1.maven.org/maven2/org.mongodb.morphia/morphia/1.0.1/org/mongodb/morphia/aggregation/Accumulator.java?av=f
[12:05:47] <m3t4lukas> the accumulator automatically adds a dollar sign
[12:06:52] <m3t4lukas> Derick, cheeser: never mind. I just saw that the accumulator has two constructors
[13:41:52] <Zelest> Derick, if I plan on using gridfs.. and wish to lets say md5 every file before storing it in order to find duplicate files, what is the best approach for this?
[13:42:07] <Zelest> Derick, e.g, can I have multiple "files" point to the same data?
[13:47:08] <m3t4lukas> Zelest, not if you use the default GridFS driver. I would not recommend changing that, though. What happens if you save to one of these previously similar files?
[13:48:22] <Zelest> My goal is to prevent duplicate data (and storage space) ..
[13:48:34] <Derick> that should be a layer above gridfs then
[13:56:02] <m3t4lukas> cheeser: I guess it makes sense to not make it that way. From a software engineering perspective
[15:27:36] <ut2k3> Hi Guys, we want to create daily collections that are sharded. Before we were only using "insert" and the new collection was automatically created. What are the best practices to create directly sharded collections?
[15:29:07] <cheeser> you'll have to explicitly create those collections, or at least make the explicit call to shard them.
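In other words, a nightly job pre-creates tomorrow's collection and shards it before the first insert, instead of relying on implicit creation. A sketch — the database name, collection prefix, and shard key are illustrative, not from the channel:

```javascript
// Build the day-stamped collection name, e.g. "events_20160115".
function dailyCollectionName(base, date) {
  const d = date.toISOString().slice(0, 10).replace(/-/g, "");
  return `${base}_${d}`;
}

// The nightly job would then run something like this in the shell:
//   sh.enableSharding("mydb")                          // once per database
//   db.createCollection("events_20160115")
//   sh.shardCollection("mydb.events_20160115", { userId: "hashed" })

console.log(dailyCollectionName("events", new Date("2016-01-15T00:00:00Z")));
// -> "events_20160115"
```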
[16:00:59] <basiclaser> do you know how to make a schema/document in mongoDB which accepts 0 or more instances of a bunch of different key:values?
[16:01:00] <basiclaser> My feature is this: A user can create a ‘result’. It can contain 0 or more instances of 8 different datatypes (text,link,image,video,sound,file,map,entity).
[16:01:00] <basiclaser> So one result could be [text,image,video,text,text,link] and another could be [image, image]
[16:01:00] <basiclaser> but they are all the same type of document, they are a ‘result’
[16:01:00] <basiclaser> in frontend i think i know how to program this, but im not sure how to define the server router schema to allow for such flexible objects
[16:05:14] <cheeser> hard to say for sure without knowing more of your intended model
[16:05:35] <basiclaser> what sort of information would you need to give a more precise opinion?
[16:06:36] <basiclaser> that’s pretty much as defined as the project is at this point. a webapp constructs a collection of [text,image,video,text,text,link] and sends to server
[16:10:46] <basiclaser> cheeser: is there a formal term for this type of problem so i can read about it further ?
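The pattern basiclaser describes — one "result" holding an ordered, variable-length mix of typed items — is commonly modeled as an array of subdocuments discriminated by a type field (often discussed under "polymorphic schemas"). A hedged sketch; all field names are illustrative:

```javascript
// One "result" document with 0+ items of 8 possible types,
// discriminated by a "type" field per item.
const result = {
  owner: "user123",
  items: [
    { type: "text",  body: "hello" },
    { type: "image", url: "http://example.com/a.png" },
    { type: "text",  body: "world" },
  ],
};

// Validation then happens per item type, not per document shape:
const allowed = new Set(["text", "link", "image", "video",
                         "sound", "file", "map", "entity"]);
function isValidResult(doc) {
  return Array.isArray(doc.items) && doc.items.every(i => allowed.has(i.type));
}

console.log(isValidResult(result)); // true
```

The array preserves the frontend's ordering ([text, image, video, ...]) for free, and a query like `{ "items.type": "image" }` finds every result containing at least one image.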
[20:25:04] <brotatochip> hey cheeser, so I just swapped my testing primary’s storage volume with a more recent backup of my production (intended) primary, restarted it, and it came up as secondary while my intended secondary was elected as primary - even though my testing primary has a priority of 10. I restarted the (intended) secondary, which then came back up as secondary while the testing primary stepped up to again become primary
[20:25:33] <brotatochip> Is there any way for me to know if the secondary has the incremental changes from between yesterday and today?
[20:26:38] <eiabea> hey guys, i am trying to query an nested object by _id. is that even possible?
[20:26:59] <cheeser> eiabea: nested documents get an _id unless you assign one.
[20:27:11] <cheeser> brotatochip: you can check the logs for something about rollbacks.
[20:27:35] <brotatochip> on the intended secondary or the primary?
[20:28:09] <eiabea> cheeser, thank you, i am doing something like this to create the nested object: Event.findOneAndUpdate({_id: out._id}, {$set: {athene: athene}}, {new: true}, eventUpdated)
[20:29:00] <eiabea> the nested object (athene) gets an _id, but i can't find it with: Event.find({"athene._id": atheneId}, callback)
[20:34:45] <brotatochip> ok, so how certain can I be that all of the data from between yesterday and today made it to the secondary cheeser?
[20:35:07] <brotatochip> this is super important, the client will be pretty pissed if there is data loss
[20:42:03] <eiabea> can only the top level object be queried by _id?
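For eiabea's case: dot notation does reach into embedded documents, so `{"athene._id": atheneId}` is the right shape — the usual culprit is a type mismatch, e.g. comparing a string `atheneId` against a stored ObjectId (in Mongoose the value must be cast to the stored BSON type). A minimal matcher over plain objects to illustrate how the dot path resolves; this is a sketch, not the server's implementation:

```javascript
// Resolve a dot path like "athene._id" against a document and
// compare the leaf value. Real MongoDB matching also handles
// arrays and BSON type ordering; this only shows the path walk.
function matchesDotPath(doc, path, value) {
  const resolved = path
    .split(".")
    .reduce((obj, key) => (obj == null ? undefined : obj[key]), doc);
  return resolved === value;
}

const event = { _id: "e1", athene: { _id: "a1", score: 7 } };
console.log(matchesDotPath(event, "athene._id", "a1")); // true
```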
[20:42:17] <brotatochip> The next thing to try is to reinit today’s backup of the intended primary with yesterday’s data on the intended secondary, but if I can be certain there is no data loss, I’m not going to bother
[20:44:43] <brotatochip> cheeser do you think that would be worth doing?
[20:52:56] <Muimi> hey guys: I'm on win 7. I should get the WIndows 64-bit legacy?
[20:53:08] <Muimi> or win 64-bit 2008 r2+ (not that, right)?
[21:01:05] <brotatochip> boo, ok, nearly 2 hour downtime makes me sad
[21:01:24] <brotatochip> thanks for the help cheeser
[21:02:00] <brotatochip> ok, so going forward, having a second secondary in the same location as the primary is probably the best long term solution
[21:14:16] <m3t4lukas> brotatochip: as far as I remember your case is covered by the basic dba course on university.mongodb.com
[21:15:26] <m3t4lukas> brotatochip: as far as I understood your problem you don't wanna lose the data the primary was ahead by. The course actually teaches manual disaster recovery in that sitch
[21:16:12] <brotatochip> m3t4lukas: I’m not sure that’s true as my primary isn’t currently a primary, as it is currenly not a replica set member
[21:16:29] <brotatochip> which means the data that I want to get into the secondary is not in the oplog
[21:16:39] <brotatochip> oh I agree, I’m checking it out
[21:16:44] <brotatochip> could definitely not hurt
[21:16:49] <m3t4lukas> brotatochip: so you mastered the first step of manual disaster recovery
[21:17:09] <brotatochip> i still don’t think there’s any way to avoid the full resync unless I can try a re init with a seeded secondary
[21:17:47] <m3t4lukas> brotatochip: don't pin me down to it but I think there is a directory for data that has not fully been synced. I think that's what is used during manual disaster recovery
[21:18:38] <m3t4lukas> otherwise you can export the collections and use the import/export tools and meld to do it.
[21:19:18] <brotatochip> from what I’ve read a mongodump is not only not possible for this but it would also take probably 15+ hours
[21:19:29] <brotatochip> er, the mongorestore would
[21:20:01] <m3t4lukas> in that case first step would be mongoexporting each collection on the current primary and on the crashed one you took out, running those files through meld and mongoimport the diff back into your current replicaset
[21:27:57] <cheeser> no. that'd be a terribly fragile cluster.
[21:28:43] <m3t4lukas> brotatochip: if you did not bring the crashed one back to the cluster the first thing you should do is spin up another secondary, so you have a cluster of three
[21:29:08] <brotatochip> i did not do that, what I did was restart the former primary with replication disabled
[21:29:14] <brotatochip> in order to bring the site back online
[21:29:48] <m3t4lukas> that was right, but you should really set up a new instance of mongo quickly (!!!) and add it to the rs
[21:30:11] <m3t4lukas> because as it is right now the cluster has no means of disaster recovery
[21:30:48] <m3t4lukas> what you should also do is find out why the secondary lagged behind in the first place and eliminate the cause
[21:32:19] <cheeser> we went over all this yesterday :)
[21:34:04] <m3t4lukas> cheeser: ah, okay, I did not read it :P Now he's heard it from two sides. brotatochip: if he told you yesterday you should have had another mongo added to the rs right now. If money is involved what you do is kinda dangerous
[21:41:49] <brotatochip> yeah the cause of the secondary falling behind is mostly obvious as the primary and secondary are on separate coasts of the US connected via an ipsec VPN tunnel
[21:50:25] <m3t4lukas> if I have a server, let's say in Munich and one in Stockholm (both in data centers) I could transfer gigabytes in minutes. It actually is something like 20 megs per second. I'm just surprised that the US has that bad of an infrastructure
[21:50:47] <freeone3000> m3t4lukas: No. It's much faster. AWS internal transfer speeds are around 200MB/s internally.
[21:51:59] <brotatochip> Primary is in North Virginia and secondary is in North California
[21:52:00] <m3t4lukas> why do all people use AWS when you can have openstack?
[21:52:32] <freeone3000> m3t4lukas: I don't use AWS for the API. I use AWS because I can get servers instantly. OpenStack provides the API (and not as good of one, at that), not the servers.
[21:52:36] <brotatochip> because AWS makes most things really easy
[21:53:32] <brotatochip> freeone3000 how would I keep the mongodb traffic internal to AWS between separate regions?
[21:53:35] <m3t4lukas> freeone3000: with the openstack web interface you can spin up vm's in seconds. clone them etc. It's king combined with puppet or chef :P
[21:54:02] <freeone3000> m3t4lukas: Sure. Now just give me a *server* provider for it.
[21:54:29] <freeone3000> brotatochip: VPC-to-VPC bridge is the AWS solution. A simpler solution is to hit it by its IP. It's a routing path, it's not magical. (Well, it is magical, but of the really simple variety.)
[21:54:47] <freeone3000> m3t4lukas: Right. For colocation, I have to actually buy a server, and physically put it somewhere. I can't get, say, 20 of them, in Seoul, in 20 minutes. That's what we use AWS for.
[21:54:50] <brotatochip> freeone3000 what you’re referring to is a peering connection
[21:54:55] <brotatochip> that does not expand outside of a region
[21:55:20] <brotatochip> and using the public IP would go outside of the AWS network, would it not?
[21:55:39] <brotatochip> On top of that my VPN is comprised of ec2 instances
[21:55:49] <brotatochip> that are connected by their public IP addresses
[21:55:49] <freeone3000> As opposed to what, rutabagas?
[21:55:54] <m3t4lukas> even if I plan on 30% capacity for eventualities and quick vm spinups, I'm cheaper off using openstack
[21:56:18] <freeone3000> m3t4lukas: Sure. But it'd still be days, or weeks, to get a server.
[21:56:33] <freeone3000> m3t4lukas: You have to plan. Our idea is that we plan less, so that we can save money by doing more things less efficiently.
[21:56:49] <m3t4lukas> freeone3000: yeah, about four days
[21:57:30] <freeone3000> brotatochip: Well, if they're connected by their public IPs just do that. Shoot. And yes, it will go "outside" AWS, but it takes the AWS-to-AWS routing path because that's the most efficient.
[21:57:49] <brotatochip> it’s already doing that, however, the issue again is encryption
[21:59:27] <m3t4lukas> brotatochip: what OS do you use? I know the process of automating building from sources can be automated pretty easily on RedHat systems.
[22:00:03] <freeone3000> brotatochip: Okay. So if your solution to that is going to be a VPN tunnel, you designate a left, a right, and then have it route. Again, it'll take the proper path.
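The left/right designation freeone3000 mentions is ipsec.conf-style configuration (Openswan/strongSwan). A hedged fragment of a site-to-site tunnel between the two VPN instances — every address and subnet below is a placeholder, not taken from the channel:

```ini
# /etc/ipsec.conf -- site-to-site tunnel sketch (addresses illustrative)
conn aws-east-west
    left=203.0.113.10        # VPN instance, us-east-1 (public IP)
    leftsubnet=10.0.0.0/16   # east VPC CIDR
    right=198.51.100.20      # VPN instance, us-west-1 (public IP)
    rightsubnet=10.1.0.0/16  # west VPC CIDR
    auto=start
```

With routes for each side's VPC CIDR pointed at the local VPN instance, replica set members can then address each other by private IP and the mongod traffic stays encrypted inside the tunnel.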
[22:00:42] <freeone3000> It's not fedora because it doesn't have the fedora core enhancements and follows the release schedule. It's centos.
[22:01:01] <brotatochip> Uh, what? It’s based on Fedora. Look at their own docs.
[22:02:03] <m3t4lukas> whoa, why fedora? fedora is desktop... Do you have X11 running on those machines? That might cause instabilities. Especially concerning ram usage over time.
[22:02:16] <brotatochip> Of course I don’t have X11 on these machines
[22:02:34] <brotatochip> Fedora is GNU/Linux which means you can do whatever you like to it
[22:02:46] <m3t4lukas> brotatochip: just asking. I've seen a lot :P
[22:02:49] <brotatochip> Including creating forks of it that don’t have a GUI
[22:03:22] <m3t4lukas> brotatochip: yeah, I know, but it would be less work to fork centos
[22:03:33] <brotatochip> but yeah, I could totally use FPM or YUM to automate the building of mongodb, however, I went with what I felt was the safer option at the time
[22:04:07] <brotatochip> Yeah but CentOS is so far behind package version wise, it’s really hard to develop software on it and maintain your dependencies
[22:04:22] <brotatochip> Which is a big reason why so many people use Ubuntu
[22:04:40] <brotatochip> I love CentOS personally - my Dev/OPS stack runs on it
[22:05:40] <m3t4lukas> that's why there are repos. There also are mongo repos ;)
[22:05:57] <brotatochip> Another reason I’m using Amazon Linux is that before I spun up any servers at my current job I was a complete AWS noob, and AWS supposedly supports it
[22:06:08] <brotatochip> Yeah I’m using the repos to install and maintain MongoDB
[22:06:48] <m3t4lukas> if you think CentOS is far behind package version wise you should try debian :D Try to install non omnibus gitlab on debian :D :D :D
[22:09:17] <m3t4lukas> never tried gentoo :P I came from debian over to ubuntu (because I was annoyed by outdated stuff), once tried RHEL and stuck with it because of the freedesktop system.
[22:09:54] <brotatochip> you should build gentoo for fun m3t4lukas, roll the kernel and everything hehe
[22:10:23] <m3t4lukas> brotatochip: sounds like fun, but I imagine it to be time consuming without generating income :P
[22:10:55] <brotatochip> it’s a learning experience, with gentoo you have to do everything
[22:11:25] <m3t4lukas> Doyle: yeah, some people still do that
[22:11:25] <freeone3000> Doyle: Debian with misspelled config files
[22:11:28] <brotatochip> arch is good, gentoo is like arch’s bigger meaner older brother
[22:11:55] <Doyle> I've had serious issues running Mongod on Ubuntu. No issues under centos.
[22:12:12] <brotatochip> a lot of people run Ubuntu in prod, I’m not sure if they are running MongoDB on it but it’s a popular server OS
[22:12:26] <m3t4lukas> Doyle: +1 getting mongodb up and running on CentOS is just super fast
[22:12:47] <m3t4lukas> freeone3000: that was a good one :P
[22:13:35] <m3t4lukas> brotatochip: I like arch for testing how far they are on wayland. I'm really waiting for wayland to be shipped with the regular distros
[22:14:59] <brotatochip> I haven’t even tried wayland out. Is mir still a thing?
[22:15:41] <brotatochip> also how is wayland m3t4lukas ?
[22:16:07] <m3t4lukas> I hate that ubuntu started mir. It's like an act of active community splitting. They should stay in bed with amazon and leave the community alone :D
[22:16:51] <m3t4lukas> and maybe fix the spelling of their config files :D
[22:17:12] <cheeser> i run mongod on ubuntu. 0 problems.
[22:17:47] <m3t4lukas> brotatochip: wayland is fast fast fast, just fast
[22:20:11] <m3t4lukas> brotatochip: still have to do it. Actually, after over a year nvidia still has not managed to utilize dual-GPU notebooks' GPUs in an efficient manner
[22:20:58] <Muimi> Where is the security.authorization config?
[22:24:21] <m3t4lukas> don't whine :o You'll see mongo in linux is waaay more fun
[22:24:52] <m3t4lukas> okay, actually the last time I looked at a windows machine is years ago :P
[22:25:20] <m3t4lukas> so I don't know that it is more fun. Only thing I know is that Windows is a pain in the butt
[22:25:38] <brotatochip> have a good night m3t4lukas
[22:25:42] <Muimi> Well, I got like 2 gigs of data left on my pc....
[22:26:02] <m3t4lukas> brotatochip: thanks, I wish you luck and success with your sitch
[22:26:07] <Muimi> i'll probably be doing almost everything on my server, itself, but I wish I had a local testing environment for offline stuff. I might uninstall a bunch of stuff and reinstall it in a vb.
[22:26:25] <m3t4lukas> Muimi: that is an issue for many reasons
[22:26:46] <brotatochip> thanks m3t4lukas, also thanks for your help and suggestions
[22:27:03] <m3t4lukas> Muimi: maybe you can free up some gigs if you use explorer search for "log" and delete all log files on your machine. Might even make it faster
[23:27:54] <brotatochip> hey cheeser, in this doc https://docs.mongodb.org/manual/core/backups/#backup-with-file-copies in the section under “Backup by Copying …” where it mentions snapshots and journaling, the journal it is referring to is the filesystem journal, correct?
[23:29:28] <brotatochip> actually i’m starting to think it’s referring to the mongod journal
[23:29:48] <brotatochip> yeah nvm, it’s gotta be mongod’s journal
[23:34:45] <ThisIsDog> I'm trying to find the intersection of two queries. I tried using $and, but its not working as I expected. Does anyone see what I'm doing wrong? https://dpaste.de/TyqA
[23:39:12] <joannac> ThisIsDog: why do you even need $and for that?
[23:39:37] <ThisIsDog> I don't feel I can make any assumptions about the structure of my query
[23:40:34] <joannac> what does cls.query.find({"isActive": True, "name": "Dog"}).all() return?
[23:42:48] <ThisIsDog> The query from my example, but I'm afraid of a case where a complex query is sent and it explicitly defines isActive False. I'm worried I can't just set isActive without changing its meaning.
[23:44:17] <ThisIsDog> I'd want to make sure I return nothing, instead of assuming what they wanted.
[23:46:11] <joannac> what does cls.query.find({$and: [{"isActive": True}, {"name": "Dog"}]}).all() return ?
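joannac's point is that for distinct fields the two spellings select the same documents; $and only becomes necessary when one field needs multiple conditions. It also addresses ThisIsDog's worry: AND-ing `{isActive: true}` onto a caller's query that says `{isActive: false}` yields nothing, whereas merging keys would silently overwrite the caller's condition. A sketch of the equivalence over plain objects (collection contents illustrative):

```javascript
// These two filters are equivalent for distinct top-level fields:
//   { isActive: true, name: "Dog" }
//   { $and: [ { isActive: true }, { name: "Dog" } ] }
// $and is required when the SAME field appears in two clauses, e.g.
//   { $and: [ { price: { $gt: 1 } }, { price: { $lt: 5 } } ] }
// (also expressible as { price: { $gt: 1, $lt: 5 } }).
function matchesAll(doc, clauses) {
  // Simple equality-only matcher standing in for server-side $and.
  return clauses.every(c =>
    Object.entries(c).every(([k, v]) => doc[k] === v));
}

const docs = [
  { isActive: true,  name: "Dog" },
  { isActive: false, name: "Dog" },
];
const hits = docs.filter(d =>
  matchesAll(d, [{ isActive: true }, { name: "Dog" }]));
console.log(hits.length); // 1
```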