[04:01:00] <NotBobDole> hi there. I'm new to mongo. Seems fairly simple though once you understand how it works.
[04:02:03] <NotBobDole> I'm trying to figure out server provisioning for shards that will be handling a collection with 450 million+ documents. Luckily, during normal usage the application only does inserts to that collection.
[04:02:43] <NotBobDole> The collection at that size is about 200GB.
[04:04:29] <NotBobDole> With strictly inserts to a collection, how would I figure out the working set / how to break up this data?
[04:05:44] <NotBobDole> I know a shard key that should give better performance, but I'm not sure how much RAM is needed, whether I need high-end SSD IOPS provisioning, etc. Can't seem to google what I need.
[04:35:11] <NotBobDole> Essentially: There are 40,000,000 some-odd inserts during peak time within a two hour period. So about 5.5k inserts a second.
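(Back-of-envelope check of those figures, as a quick shell sketch; the numbers come straight from the chat:)
```
# peak rate: 40M inserts over a 2-hour (7200s) window
echo $((40000000 / 7200))                          # ~5555 inserts/s
# average document size: 200GB over 450M documents
echo $((200 * 1024 * 1024 * 1024 / 450000000))     # ~477 bytes/doc
```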
[04:37:18] <NotBobDole> And I'm not entirely sure how that relates to the working set for MongoDB 3.0 on MMAPv1. I'll see what the white papers have to help me figure this out.
[04:39:38] <NotBobDole> Mainly wondering about RAM and storage type needs. From my understanding the working set is the data you're actively accessing, but I'm not sure how that translates to an insert-heavy workload. Mongo loses performance once the dataset outgrows RAM, and even more once the working set does.
[04:41:33] <morenoh149> NotBobDole: right, that's more for accesses right?
[04:41:45] <morenoh149> an insert may be anywhere in the dataset
[04:42:04] <morenoh149> indexes however are the 'working set' and are used for finding stuff
[04:42:29] <morenoh149> so keep enough ram for your indexes
[04:42:43] <morenoh149> your indexes will depend on the shape of the data and your access patterns
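(A quick way to see those numbers in the mongo shell; the database and collection names here are placeholders:)
```
// per-collection and per-database index sizes, in bytes
use mydb
db.events.totalIndexSize()        // total index bytes for one collection
db.events.stats().indexSizes      // per-index breakdown
db.stats().indexSize              // index bytes across the whole database
```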
[04:43:40] <NotBobDole> mkay. So as long as I cover index size, I'll be good?
[04:44:32] <morenoh149> usually. That's what I always hear
[04:50:41] <NotBobDole> morenoh149, as far as autosharding goes, from what I've seen the rebalancing has caused issues during periods of heavy writes.
[04:51:22] <morenoh149> indeed. be wary if your access patterns are bursty
[04:51:29] <NotBobDole> The journal file is mongo's way of limiting data loss in the event of a crash. From my understanding, mongo flushes data files to disk every 60 seconds. However, with the journal enabled, writes are committed to the journal on disk every 100ms
[04:51:37] <morenoh149> if you can anticipate load, try to pre-scale out
[04:51:59] <morenoh149> sometimes you can't, so you have to keep extra resources around, but that costs more money
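(One concrete way to pre-scale a write-heavy sharded collection is to pre-split chunks ahead of the peak and keep the balancer quiet; a hypothetical mongo shell sketch, with made-up names and split points:)
```
// pre-split a collection sharded on { uid: 1 } so inserts land on existing chunks
sh.stopBalancer()                                    // no chunk migrations during peak writes
for (var i = 1; i <= 100; i++) {
    sh.splitAt("mydb.events", { uid: i * 10000 })    // carve chunk boundaries up front
}
sh.startBalancer()                                   // let chunks spread out off-peak
```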
[04:52:22] <NotBobDole> I have a good idea of the peak load, so I'm trying to work out what is actually needed, and what is new
[04:52:51] <NotBobDole> I'm looking into using sharding tags to have servers for specific collections
[04:53:18] <NotBobDole> Which brought up the question: what do I need for the use case of just doing inserts?
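(The tag-aware sharding NotBobDole mentions looks roughly like this in the mongo shell; shard, tag, and namespace names are made up:)
```
// pin a collection's whole key range to a tagged group of shards
sh.addShardTag("shard0000", "inserts")
sh.addShardTag("shard0001", "inserts")
sh.addTagRange("mydb.events", { uid: MinKey }, { uid: MaxKey }, "inserts")
```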
[04:55:09] <morenoh149> you can put a 300GB disk on an ec2 instance
[04:55:16] <NotBobDole> Yup. Again, though, how much ram ;)
[04:55:29] <NotBobDole> That is exactly what I'm doing, but I'm working out how to do the sharding first
[04:56:00] <NotBobDole> I was just waiting on my laptop to build out the mongos cluster so I can figure out what to do from there
[04:56:18] <NotBobDole> So I figured I'd ask questions regarding what I'm doing.
[04:57:15] <morenoh149> ram depends on your indexes, which depend on your data
[04:57:45] <morenoh149> to do an estimate load a sample on your laptop (or the whole thing if you can manage) and build the indexes you'll need for your operations
[04:58:07] <morenoh149> you can then project the size of your working set on the full dataset
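(That estimate could be done along these lines in the mongo shell; indexes don't grow perfectly linearly, so pad the result. The names and the index shown are guesses, the 450M count is from the chat:)
```
// load a sample, build the production indexes, then extrapolate
use mydb
db.sample.createIndex({ uid: 1, ts: 1 })                  // whatever indexes prod will need
var perDoc = db.sample.totalIndexSize() / db.sample.count()
perDoc * 450000000 / (1024 * 1024 * 1024)                 // projected index size in GB
```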
[04:58:29] <NotBobDole> I've got the full dataset in AWS ;)
[04:58:34] <morenoh149> and add some extra ram for flexibility
[04:59:02] <NotBobDole> But when I tried to use it I got application errors on my one server and nothing worked. I only saw queued reads and writes, with almost no actual writing or use of the database
[04:59:38] <NotBobDole> I created a new database and ran the load test again, and the application worked, so I just assumed it was timing out due to the collection indexes and such being too large for my one big mongos server
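(The queueing NotBobDole describes can be watched live with mongostat; the host is a placeholder:)
```
# qr|qw = queued readers/writers, ar|aw = active; poll every 5 seconds
mongostat --host mongos.example.com:27017 5
```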
[05:01:15] <morenoh149> even if you can't fit the working set into ram things should still work
[05:01:16] <NotBobDole> You been using mongo for a long while, morenoh149?
[05:01:24] <morenoh149> it'll just be dramatically slower
[05:01:32] <NotBobDole> Hm. The Java driver was reporting timeout errors on almost every response
[05:01:38] <morenoh149> nope. But I did the certification course.
[05:02:03] <morenoh149> I've never actually used it for anything major but I'd like to be ready when I do.
[05:02:56] <NotBobDole> I'm in 202 right now and really far behind since this is taking precedence over the class. But when I get time I continue with the lessons on it.
[05:03:16] <morenoh149> oh then you probably know more than me
[05:03:27] <NotBobDole> Haah. I dunno about that. I've been learning as I go
[05:03:40] <NotBobDole> I am doing 101 and 202 in my free time right now
[05:04:07] <NotBobDole> I'll probably fail since I'm a week and a half behind, but ah well. I can just retake it later when I actually have time to do the work in the timeframes specified.
[05:05:04] <morenoh149> could you get explain data for your queries?
[05:07:12] <NotBobDole> These queries are just about purely inserts. I'll have to spin up my testing server and view the logs to do that. But essentially: the data inserted is uid, timestamp, choice, previous choice, how long since the last click, and a few other items. On average 5 inserts per user, 40 times, across 200k expected concurrent users.
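(From that description, each insert is a small flat document; a hypothetical shape, with field names guessed from the chat:)
```
// one insert as described above; ~5 of these per user per round
db.events.insert({
    uid: "user-123",
    ts: new Date(),
    choice: "B",
    prevChoice: "A",
    msSinceLastClick: 4200
})
```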
[05:07:57] <NotBobDole> I don't know how to cut the dataset, sadly.
[05:08:09] <NotBobDole> I'm not part of the developers, just a devops guy
[13:47:03] <morf> can somebody tell me if it's possible to create a dump like this: `mongodump --db=dbname --archive=dbname` and restore it to a different database with a single command? (mongorestore simply restores the original database no matter what --db argument I pass)
[13:47:45] <morf> also sorry for asking again, but it looks like somebody is alive now :)
[13:48:11] <StephenLynx> you can specify the db on restore.
[13:48:40] <morf> well that doesn't work for me at all
[13:48:46] <morf> it simply restores the original db
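(For the record: with `--archive`, some tool versions don't remap namespaces via `--db`, which matches what morf sees. Two possible workarounds, using morf's db name plus a made-up target name:)
```
# option 1 (mongorestore 3.4+): rename namespaces while restoring the archive
mongorestore --archive=dbname --nsFrom='dbname.*' --nsTo='otherdb.*'

# option 2: dump to a directory instead, then restore under a new db name
mongodump --db=dbname --out=dump/
mongorestore --db=otherdb dump/dbname
```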