[14:19:35] <GothAlice> Their question. Distressingly common. Also, easily answered with a search: https://stackoverflow.com/a/55386123/211827
[14:25:23] <GothAlice> kinlo: Define "using a large chunk of memory". RSS or VSZ?
[14:26:27] <GothAlice> Because if you're measuring VSZ, that might very well include the memory-mapped files being written to, which may seem extreme or absurd, and does not actually represent real memory usage.
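A quick way to compare the two figures is sketched below, using the third-party psutil package; the match on the process name "mongod" is an assumption, and RSS is the resident (real) memory while VMS/VSZ also counts memory-mapped files, so it can look huge without meaning trouble.

```python
# Rough sketch: print RSS vs VSZ for every mongod process (assumes psutil is installed).
import psutil

for proc in psutil.process_iter(['name', 'memory_info']):
    if proc.info['name'] == 'mongod':
        mem = proc.info['memory_info']
        # rss = resident set size (real RAM); vms = virtual size, including mapped files
        print(f"pid={proc.pid} rss={mem.rss / 2**20:.0f} MiB vsz={mem.vms / 2**20:.0f} MiB")
```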
[14:26:32] <kinlo> GothAlice: we see our mongodump oom'ing the mongo server
[14:26:50] <kinlo> and I'm well aware of the way memory-mapped files work :)
[14:27:02] <GothAlice> That's unusual. Has literally never happened to me, and I deal with some extreme amounts of data.
[14:28:00] <kinlo> we're talking about a 1 TB dataset here, so the backup takes several hours. I was under the impression the oplog is stored in RAM, is that correct?
[14:29:25] <GothAlice> It's stored in a capped collection, that is, a fixed-size ring buffer. Which needs to be of sufficient allocated size to handle a buildup of changes during the period of backup, depending on flag usage. Similar to needing it to be large enough to accommodate the time it takes a new replica to spin up, or initial replication will never succeed.
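A sketch of checking that window from a driver rather than the shell's rs.printReplicationInfo(), assuming pymongo and a replica-set member on localhost:27017; the oplog's time span must comfortably exceed the duration of the backup (or of an initial sync).

```python
# Report the oplog's allocated size and the wall-clock window it currently covers.
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')
oplog = client.local['oplog.rs']

stats = client.local.command('collStats', 'oplog.rs')           # capped-collection stats
first = oplog.find().sort('$natural', 1).limit(1).next()        # oldest oplog entry
last = oplog.find().sort('$natural', -1).limit(1).next()        # newest oplog entry
window_hours = (last['ts'].time - first['ts'].time) / 3600      # bson.Timestamp.time is seconds

print(f"oplog max size: {stats['maxSize'] / 2**30:.1f} GiB")
print(f"oplog window:   {window_hours:.1f} hours")
```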
[14:30:22] <kinlo> so basically, what you're saying is that it's the mongodb server itself that stores the oplog?
[14:31:02] <GothAlice> Nope; I can't say that. You haven't provided an MCVE illustrating what you're actually trying to do. I can only speak to generic problems, and that's a typical one.
[14:34:08] <kinlo> would it be beneficial to run mongodump from a remote server to reduce memory strain on the server, or does that make no sense?
[14:35:36] <GothAlice> I do not run any tools locally on the actual mongod nodes themselves. Dump will already have an impact on overall performance while underway, no need to make it worse by forcing pages to be swapped out (loss of cache) on the storage nodes.
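Driving the dump from a separate backup host might look something like the following; the host name, replica set name, and output path are placeholders, not values from this conversation.

```python
# Sketch: run mongodump from a dedicated backup machine, reading from a secondary.
import subprocess

subprocess.run([
    'mongodump',
    '--host', 'rs0/db1.example.com:27017',   # hypothetical replica set / seed host
    '--readPreference', 'secondary',         # keep the read load off the primary
    '--oplog',                               # capture writes made while the dump runs
    '--gzip',
    '--out', '/backups/mongodump-latest',
], check=True)
```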
[14:38:05] <GothAlice> Additionally, mongodump can be a poor choice for backups. Instead, run a non-voting, non-electable secondary. Turn it on and let it synchronize as your backup. Better still, you can run a delayed replica, allowing recovery from user error, not just catastrophic failure. Ref: https://docs.mongodb.com/manual/core/replica-set-elections/ / https://docs.mongodb.com/manual/tutorial/expand-replica-set/ / https://docs.mongodb.com/manual/core/re
[14:38:15] <GothAlice> Dang, apologies on that last link, should have been: https://docs.mongodb.com/manual/core/replica-set-delayed-member/index.html
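Adding such a member boils down to a replica set reconfig with priority 0, votes 0, hidden, and a delay. A sketch via pymongo, run against the current primary; the host names are placeholders, and note the delay field is slaveDelay on MongoDB before 5.0 and secondaryDelaySecs from 5.0 onward.

```python
# Sketch: append a hidden, non-voting, 48h-delayed backup member to the replica set config.
from pymongo import MongoClient

client = MongoClient('mongodb://primary.example.com:27017')

config = client.admin.command('replSetGetConfig')['config']
config['version'] += 1
config['members'].append({
    '_id': max(m['_id'] for m in config['members']) + 1,
    'host': 'backup.example.com:27017',
    'priority': 0,            # never electable as primary
    'votes': 0,               # does not participate in elections
    'hidden': True,           # invisible to clients and drivers
    'slaveDelay': 48 * 3600,  # apply the oplog 48 hours behind the primary (pre-5.0 field name)
})
client.admin.command('replSetReconfig', config)
```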
[14:40:18] <kinlo> thanks, I am aware that mongodump is not the ideal solution, but the "recommended" solutions all seem to be paid options. I will propose the extra secondary though
[14:42:53] <GothAlice> For my work dataset, we run two in-office replicas. One "live", the other 48h delayed. Live lets us recover from catastrophic failure in a few minutes if needed (e.g. by directing the app's domain at the office's IP), or roll back any user-submitted alteration to the data that was not actually desired.
[14:43:58] <GothAlice> http://f.cl.ly/items/281M07023u3L3n1R3K1o/Screen+Shot+2017-08-11+at+23.32.11.png ← from a relational project at work. http://f.cl.ly/items/1Q1a3d063z0W130V3Q1R/Screen+Shot+2017-08-11+at+23.31.45.png ← from my project.
[14:44:48] <GothAlice> If I were using Postgres, the RPO would be one minute.
[14:45:12] <GothAlice> RTO would be two hours, though. ;^P
[14:47:41] <kinlo> I prefer postgres as well, but sometimes we cannot choose :)
[14:48:14] <GothAlice> What I was suggesting: it's a trade-off.
[14:49:51] <GothAlice> I mean, theoretically we could end up losing absolutely no data whatsoever, and achieve an actual RPO of zero. No data loss, due to that in-house live replica. The one hour thing is just guaranteed filesystem snapshots at the hosting level. ;)
[14:50:22] <GothAlice> (Another backup approach, and one, given the right filesystem, with no impact at all.)
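The snapshot approach usually amounts to briefly quiescing writes, snapshotting the volume, and unlocking. A sketch under assumptions: pymongo is available, the journal lives on the same volume as the data, and the ZFS dataset name is hypothetical; with a snapshotting filesystem the pause is typically well under a second.

```python
# Sketch: consistent filesystem-snapshot backup of a mongod data volume.
import subprocess
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')

client.admin.command('fsync', lock=True)    # flush to disk and block further writes
try:
    subprocess.run(['zfs', 'snapshot', 'tank/mongodb@backup'], check=True)
finally:
    client.admin.command('fsyncUnlock')     # resume writes even if the snapshot fails
```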