PMXBOT Log file Viewer


#mongodb logs for Saturday the 23rd of January, 2016

[03:56:07] <NotBobDole> Anyone alive tonight?
[03:59:57] <morenoh149> 👾
[04:01:00] <NotBobDole> hi there. I'm new to mongo. Seems fairly simple though once you understand how it works.
[04:02:03] <NotBobDole> I'm trying to figure out the server provisioning for shards that are going to be dealing with a collection that has 450 million+ documents. Luckily, during usage, it only makes inserts to the collection that I've mentioned.
[04:02:43] <NotBobDole> The collection at that size is about 200GB.
[04:04:29] <NotBobDole> With strictly inserts to a collection, how would I figure out the working set / how to break up this data?
[04:05:44] <NotBobDole> I have a shard key in mind that should give better performance, but I'm not sure how to work out how much RAM is needed, whether I need to provision high-end SSD IOPS, etc. Can't seem to google what I need.
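For an insert-heavy collection like the one described above, the shell side of sharding looks roughly like this. This is a minimal sketch; the names "appdb", "events", the mongos host, and the hashed key on "uid" are assumptions, not from the conversation.

```
# shard an insert-heavy collection from the mongo shell via a mongos
mongo --host mongos.example.com --eval '
  sh.enableSharding("appdb");
  // a hashed shard key spreads monotonically increasing inserts
  // across shards instead of concentrating them on one hot chunk
  sh.shardCollection("appdb.events", { uid: "hashed" });
'
```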
[04:10:18] <Waheedi> join #c++
[04:10:22] <Waheedi> sorry
[04:10:43] <Waheedi> actually I'm not :P
[04:11:03] <NotBobDole> There we go
[04:12:56] <NotBobDole> If anyone has a suggestion, I'm all ears.
[04:13:08] <NotBobDole> Or a link or just better google queries.
[04:31:52] <morenoh149> NotBobDole: sooo
[04:31:58] <morenoh149> you got 200GB
[04:32:08] <morenoh149> you could put that on one box conceivably
[04:32:34] <morenoh149> what drives sharding is 1) how much $ do you have 2) how much throughput do you need
[04:33:30] <morenoh149> NotBobDole: https://www.mongodb.com/white-papers
[04:35:11] <NotBobDole> Essentially: there are 40,000,000 some-odd inserts during peak time within a two-hour period. So about 5.5k inserts a second.
[04:37:18] <NotBobDole> And I'm not entirely sure how that will relate to the working set for mongo3 on MMAP. I'll see what the white papers have to help me figure this out.
[04:39:38] <NotBobDole> Just mainly wondering about RAM and storage type needs. From my understanding your working set is the data you're accessing, but I'm not sure how that translates to a workload of mostly inserts. Mongo loses performance once the dataset no longer fits in RAM, and even more when the working set doesn't.
[04:41:33] <morenoh149> NotBobDole: right, that's more for accesses right?
[04:41:45] <morenoh149> an insert may be anywhere in the dataset
[04:42:04] <morenoh149> indexes however are the 'working set' and are used for finding stuff
[04:42:29] <morenoh149> so keep enough ram for your indexes
[04:42:43] <morenoh149> your indexes will depend on the shape of the data and your access patterns
[04:43:40] <NotBobDole> mkay. So as long as I cover index size, I'll be good?
[04:44:32] <morenoh149> usually. That's what I always hear
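The "RAM for indexes" rule can be checked directly from the shell, since MongoDB reports index sizes per collection. A sketch, assuming a db "appdb" and collection "events":

```
# per-index sizes in bytes
mongo appdb --eval 'printjson(db.events.stats().indexSizes)'
# all indexes combined, in bytes
mongo appdb --eval 'print(db.events.totalIndexSize())'
```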
[04:44:43] <NotBobDole> Thanks morenoh149 :)
[04:44:52] <morenoh149> http://s3.amazonaws.com/info-mongodb-com/MongoDB_Architecture_Guide.pdf
[04:44:56] <NotBobDole> I've been searching the internet a lot trying to figure out how this works.
[04:45:08] <morenoh149> looks like you can also autoshard if performance dips below some metric
[04:45:20] <morenoh149> search for ' ram'
[04:45:42] <morenoh149> https://s3.amazonaws.com/info-mongodb-com/MongoDB-Performance-Best-Practices.pdf
[04:46:31] <NotBobDole> Haha. I've been doing more limited searches that are probably a bit too specific. Keep finding information from pre-mongo2.
[04:48:58] <NotBobDole> So.. put the data files on SSDs and the journal file on a conventional HDD? cool.
[04:50:34] <morenoh149> what's the journal file
[04:50:41] <NotBobDole> morenoh149, as far as autosharding, from what I've seen, the rebalancing has caused issues during times of high writing.
[04:51:22] <morenoh149> indeed. be wary if your access patterns are bursty
[04:51:29] <NotBobDole> The journal file is mongo's way of limiting data loss in the event of a crash. From my understanding, mongo flushes writes to the data files every 60 seconds. However, you can enable the journal and get writes out to disk every 100ms
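For reference, with MMAPv1 on MongoDB 3.x journaling is on by default for 64-bit builds and the commit interval is tunable. A sketch; the path is an example and 100 ms is already the default:

```
# journaled mongod with an explicit 100 ms commit interval;
# the journal/ subdirectory under --dbpath can be symlinked
# to a separate disk, per the SSD/HDD split discussed above
mongod --dbpath /data/db --storageEngine mmapv1 \
       --journal --journalCommitInterval 100
```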
[04:51:37] <morenoh149> if you can anticipate load, try to scale out in advance
[04:51:59] <morenoh149> sometimes you can't, so you have to keep extra resources around, but that costs more money
[04:52:22] <NotBobDole> I have a good idea of the peak load, so I'm trying to work out what is actually needed, and what is new
[04:52:51] <NotBobDole> I'm looking into using sharding tags to have servers for specific collections
[04:53:18] <NotBobDole> Which brought up the question: what do I need for the use case of just.. doing inserts?
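The sharding-tags idea mentioned above looks roughly like this in the mongo shell. The shard name, tag, and namespace are all assumptions:

```
# pin the whole key range of the insert-heavy collection
# to shards carrying the "inserts" tag
mongo --host mongos.example.com --eval '
  sh.addShardTag("shard0000", "inserts");
  sh.addTagRange("appdb.events", { uid: MinKey }, { uid: MaxKey }, "inserts");
'
```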
[04:53:38] <morenoh149> how many inserts/second?
[04:53:55] <morenoh149> there's an upper bound on how many can be handled by one node
[04:54:23] <NotBobDole> Yeah but 5.5k isn't much
[04:54:31] <NotBobDole> I'm just worried about the dataset size
[04:54:48] <morenoh149> so then just one node (shard, or no sharding) and maybe some replicas
[04:54:56] <morenoh149> disk is cheap
[04:55:09] <morenoh149> you can put a 300GB disk on an ec2 instance
[04:55:16] <NotBobDole> Yup. Again, though, how much ram ;)
[04:55:29] <NotBobDole> That is exactly what I'm doing, but I'm playing with figuring out how to do sharding first
[04:56:00] <NotBobDole> Just was waiting for laptop to build out the mongos cluster so I can figure out what to do from there
[04:56:18] <NotBobDole> So I figured I'd ask questions regarding what I'm doing.
[04:57:15] <morenoh149> ram depends on your indexes, which depend on your data
[04:57:45] <morenoh149> to get an estimate, load a sample on your laptop (or the whole thing if you can manage it) and build the indexes you'll need for your operations
[04:58:07] <morenoh149> you can then project the size of your working set on the full dataset
[04:58:29] <NotBobDole> I've got full dataset in AWS ;)
[04:58:34] <morenoh149> and add some extra ram for flexibility
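The projection step can be mechanical. A sketch, assuming the sample sits in "appdb.events" and using the 450M production document count mentioned earlier:

```
mongo appdb --eval '
  var sampleDocs = db.events.count();          // documents in the sample
  var sampleIdx  = db.events.totalIndexSize(); // index bytes for the sample
  var fullDocs   = 450e6;                      // production estimate from above
  var projected  = sampleIdx * (fullDocs / sampleDocs);
  print("projected index size: " +
        (projected / 1024 / 1024 / 1024).toFixed(1) + " GB");
'
```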
[04:59:02] <NotBobDole> But when I was trying to use it I had application errors on my one server that caused nothing to work. I only saw queued reads and writes, with almost no actual writing or usage of the database
[04:59:38] <NotBobDole> I created a new database and ran the load test again, and the application worked, so I just assumed it was timing out due to the collection indexes and such being too large for my one big mongos server
[05:01:15] <morenoh149> even if you can't fit the working set into ram things should still work
[05:01:16] <NotBobDole> You been using mongo for a long while, morenoh149 ?
[05:01:24] <morenoh149> it'll just be dramatically slower
[05:01:32] <NotBobDole> Hm. The javadriver was reporting timeout errors on almost every response
[05:01:38] <morenoh149> nope. But I did the certification course.
[05:02:03] <morenoh149> I've never actually used it for anything major but I'd like to be ready when I do.
[05:02:12] <NotBobDole> Ahhh
[05:02:19] <NotBobDole> What course did you do?
[05:02:50] <morenoh149> https://university.mongodb.com/courses/M101JS/about
[05:02:55] <morenoh149> there's a java one too
[05:02:56] <NotBobDole> I'm in 202 right now and really far behind, since this is taking precedence over the class. But when I get time I continue with the lessons on it.
[05:03:16] <morenoh149> oh then you probably know more than me
[05:03:27] <NotBobDole> Haah. I dunno about that. I've been learning as I go
[05:03:40] <NotBobDole> I am doing 101 and 202 in free time right now
[05:04:07] <NotBobDole> I'll probably fail since I'm a week and a half behind, but ah well. I can just retake it later when I actually have time to do the work in the timeframes specified.
[05:05:04] <morenoh149> could you get explain data for your queries?
[05:05:16] <morenoh149> or cut the dataset by 90%
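Explain data comes straight from the shell; the query shape here is hypothetical:

```
mongo appdb --eval '
  printjson(db.events.find({ uid: "user-123" }).explain("executionStats"));
'
```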
[05:07:12] <NotBobDole> These queries are just about purely inserts. I'll have to spin up my testing server and view the logs to do that. But essentially: the data inserted is uid, timestamp, choice, previous choice, how long since last click, and a few other items. On average 5 inserts per user 40 times across 200k expected users at once.
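A guess at the document shape being described; every field name and value below is hypothetical:

```
mongo appdb --eval '
  db.events.insert({
    uid: "user-123",          // user id
    ts: new Date(),           // timestamp
    choice: "B",
    prevChoice: "A",
    msSinceLastClick: 842     // how long since last click
  });
'
```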
[05:07:57] <NotBobDole> I don't know how to cut the dataset, sadly.
[05:08:09] <NotBobDole> I'm not part of the developers, just a devops guy
[05:10:51] <kenalex> hello ladies and gentlemen
[05:11:04] <morenoh149> NotBobDole: http://stackoverflow.com/a/15065490/630752
[05:11:47] <morenoh149> what's the dataset format? csv?
[05:17:29] <NotBobDole> I'm not exactly sure. I would assume our programmers have it in json.
[05:17:51] <NotBobDole> Everything else we have is in json, so that is my assumption.
[05:27:01] <NotBobDole> morenoh149, I gotta head to sleep, but you gave me ideas as far as what to look at, thanks
[05:27:26] <NotBobDole> kenalex, was just trying to figure out server provisioning stuff right now.
[11:18:30] <morf> morning
[11:18:55] <morf> can i mongodump a database and mongorestore it to a different database?
[11:19:32] <morf> i'm passing the --db param to mongorestore, but it creates the collections in the original db again
[11:33:31] <morf> well i did it, but somehow it doesn't make any sense...
[11:34:44] <morf> so different question: when i do mongodump --db=dbname --archive=dbname ... can i restore the dump to a different database?
[13:40:43] <synthmeat> wow. mongo and friends binaries are a gig and a half
[13:41:46] <StephenLynx> hm
[13:41:47] <StephenLynx> you sure?
[13:41:57] <StephenLynx> I never got over 200mb on the repository
[13:42:33] <synthmeat> StephenLynx: yup, just did a build of 3.2.1 on several machines
[13:43:14] <StephenLynx> welp
[13:43:17] <synthmeat> https://gist.github.com/synthmeat/7def21a32223015a52fe
[13:47:03] <morf> can somebody tell me if it's possible to create a dump like this: `mongodump --db=dbname --archive=dbname` and restore it to a different database with a single command? (mongorestore simply restores the original database no matter what --db argument i pass)
[13:47:45] <morf> also sorry for asking again, but it looks like somebody is alive now :)
[13:48:11] <StephenLynx> you can specify the db on restore.
[13:48:40] <morf> well that doesn't work for me at all
[13:48:46] <morf> it simply restores the original db
[13:48:57] <StephenLynx> you are using it wrong.
[13:49:03] <morf> lol
[13:49:04] <StephenLynx> check your version and the docs.
[13:49:18] <morf> not helping
[13:49:22] <StephenLynx> I just used it yesterday and it placed the data in the db I asked for.
[13:49:34] <morf> with archive?
[13:49:49] <StephenLynx> no, I didn't use that option.
[13:51:11] <morf> well i can do it with normal dump too
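This matches mongorestore's documented behavior: with a plain directory dump, --db on restore does remap the target database, but with --archive the namespaces stored in the archive win and --db acts only as a filter. Later mongorestore releases (3.4+) added --nsFrom/--nsTo for remapping archives. A sketch with example names:

```
# plain dump: restore into a different database
mongodump --db=olddb --out=dump/
mongorestore --db=newdb dump/olddb

# archive dump: remap the namespace on restore (mongorestore 3.4+)
mongodump --db=olddb --archive=olddb.archive
mongorestore --archive=olddb.archive --nsFrom='olddb.*' --nsTo='newdb.*'
```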
[20:31:34] <dman777_alter> how do I make a bounded array with empty values? I tried http://dpaste.com/35JTW0N but I get "pets" : [ ], "__v" : 0
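Without the dpaste it's unclear what was tried, and the "__v" field suggests Mongoose, where a schema default would be the usual answer. One way to pre-fill a fixed-length array at insert time from the shell (names and length hypothetical):

```
mongo appdb --eval '
  var pets = [];
  for (var i = 0; i < 5; i++) pets.push(null); // five empty slots
  db.owners.insert({ pets: pets });
'
```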
[20:47:27] <StephenLynx> what is the minimum number of physical servers I need to use sharding?
[20:47:50] <StephenLynx> mongos can run alongside a mongod on one server, and then there is the second mongod, so two.
[20:48:00] <StephenLynx> what else is required?
[20:49:08] <StephenLynx> can the config server run on the second server already running mongod?
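No one answered in the log. For reference, a minimal single-machine test topology for MongoDB 3.2 looks roughly like this; ports, paths, and the replica-set name are examples, and production wants each piece on separate hardware with full replica sets:

```
# one config server (a single-member replica set), one shard, one mongos
mongod --configsvr --replSet cfg --dbpath /data/cfg --port 27019 \
       --fork --logpath /data/cfg.log
mongo --port 27019 --eval 'rs.initiate({_id: "cfg", configsvr: true,
  members: [{_id: 0, host: "localhost:27019"}]})'

mongod --shardsvr --dbpath /data/sh0 --port 27018 \
       --fork --logpath /data/sh0.log

mongos --configdb cfg/localhost:27019 --port 27017 \
       --fork --logpath /data/mongos.log
mongo --port 27017 --eval 'sh.addShard("localhost:27018")'
```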