[04:31:30] <BeaverP2> I wonder what ratio between SSD and RAM is used for different database types. It is a read-heavy workload we are looking at and I just need a rough estimate
[05:17:28] <BeaverP2> thanks but it's not what I am looking for
[05:17:46] <kba> then I don't know what you're asking
[05:17:50] <BeaverP2> using mmap and mapping the db files to memory is custom
[05:18:23] <BeaverP2> I need to know what are good or best ratios for SSD vs RAM for different kind of data sets
[05:18:55] <BeaverP2> it is like in the old days for Oracle, where it was said that having 1TB RAM for a 10TB relational database is usually a very good ratio
[05:19:13] <joannac> BeaverP2: how much data do you need to regularly access?
[05:19:13] <BeaverP2> but since HDDs are slow I wonder what is best for SSDs
[05:19:46] <kba> RAM is always faster than disk, so the more RAM the better. So at what point does adding more RAM become irrelevant? When you can fit your entire working set in memory.
[05:19:48] <BeaverP2> it is comparable to shop-related data like order processing and stuff, along with modern social networks like messaging and notifications
[05:20:04] <BeaverP2> the entire working set will be huge
[05:21:38] <BeaverP2> but at peta scale, about 90% of it is archive
[05:21:47] <BeaverP2> so those are sitting on consumer SSDs
[05:21:58] <kba> but you're asking an odd question, then. What's the best ratio of disk to memory? No disk at all, only memory. What's the second best? Almost no disk, only memory.
[05:21:58] <BeaverP2> so it is basically just 100TB of live data
[05:22:24] <BeaverP2> i know kba i want to know what works best for certain companies
[05:22:26] <kba> But surely, after having all the indices in memory, the remaining RAM becomes less important
[05:24:10] <BeaverP2> internally, SSDs use as far as I know 512MB for buffering in consumer drives and several GB in enterprise systems
[05:24:22] <BeaverP2> so there is a cost effective point for some data sets
[05:24:34] <BeaverP2> maybe it's even 1:20, I do not know
[05:25:32] <kba> I'd guess it heavily depends on the specific data. I can imagine that for many sites, 1% of the data is what's being accessed 90% of the time.
[05:25:52] <kba> Like a news site. After stuff goes off the front page, it's rarely retrieved afterwards.
[05:26:20] <kba> In that case, being able to just cache the active 1% would yield great performance enhancements
[05:26:58] <kba> If all your data is accessed completely randomly, there doesn't seem to be any exponential benefit to adding anything to memory other than the indices.
[05:27:23] <kba> If you can have 5% in memory, you'll only be able to serve 5% of the requests faster if you're doing your caching right.
[05:28:53] <kba> So do some analysis on how often what data is being accessed, then make a cost/benefit analysis and find the sweet spot.
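kba's cost/benefit suggestion can be sketched as a quick simulation. This is a hedged illustration, not a measurement: the Pareto-shaped workload, the 10,000-document corpus, and the cache fractions are all assumptions standing in for real access logs.

```python
import random
from collections import Counter

def hit_rate(accesses, cache_fraction, total_items):
    """Fraction of accesses served by a cache holding the hottest items."""
    cache_size = max(1, int(total_items * cache_fraction))
    hot = {item for item, _ in Counter(accesses).most_common(cache_size)}
    return sum(1 for a in accesses if a in hot) / len(accesses)

# Simulated skewed (Pareto-like) access pattern over 10,000 documents:
# a tiny fraction of documents receives the bulk of the traffic.
random.seed(1)
accesses = [int(random.paretovariate(1.2)) % 10_000 for _ in range(100_000)]

for frac in (0.01, 0.05, 0.20):
    rate = hit_rate(accesses, frac, 10_000)
    print(f"cache {frac:.0%} of the data -> serve {rate:.0%} of requests from RAM")
```

On a skewed workload like this, caching 1% of the data already serves the large majority of requests, which is exactly the "news site" effect kba describes; on a uniform workload the hit rate would track the cache fraction almost linearly.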
[05:59:30] <BeaverP2> kba those are the right thoughts indeed
[05:59:54] <BeaverP2> the problem is I cannot predict it after all
[06:00:18] <BeaverP2> I have designed a simple mechanism where I use Linux to count live pages of the file storage areas
[06:01:25] <BeaverP2> ok I will extend that and try to do this by sampling, as an adaptive solution
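Counting which pages of a file are "live" in RAM can be done on Linux with mincore(2). A minimal sketch, with assumptions: glibc (`libc.so.6`), write access to the file (the mapping is opened read-write so ctypes can borrow its buffer), and CPython's ctypes passing arrays as pointers.

```python
import ctypes
import mmap
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)  # assumption: glibc on Linux

def resident_ratio(path):
    """Fraction of a file's pages currently resident in RAM, via mincore(2)."""
    size = os.path.getsize(path)
    if size == 0:
        return 0.0
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), size)            # page-aligned mapping
        npages = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE
        vec = (ctypes.c_ubyte * npages)()           # one residency byte per page
        buf = (ctypes.c_ubyte * size).from_buffer(mm)
        try:
            if libc.mincore(buf, ctypes.c_size_t(size), vec) != 0:
                e = ctypes.get_errno()
                raise OSError(e, os.strerror(e))
        finally:
            del buf      # release the buffer export so the mapping can close
            mm.close()
        # bit 0 of each vec entry = "page resident in memory"
        return sum(b & 1 for b in vec) / npages
```

Sampling `resident_ratio` over the storage files periodically would give the adaptive picture BeaverP2 describes without touching every page on every pass.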
[07:02:42] <mbuf> when using https://docs.opsmanager.mongodb.com/current/reference/api/hosts/#create-a-new-host to create three new hosts, is there a way to specify which host is primary and which two are secondary?
[07:07:11] <joannac> mbuf: Why do you need to specify one to be primary?
[07:41:50] <mbuf> joannac, the first mongod instance that I add to the host becomes the primary, by default?
[07:42:07] <mbuf> joannac, they will automatically elect a primary?
[07:43:03] <mbuf> joannac, right now, the first mongod instance added to host becomes primary, and all subsequent ones I specify explicitly as "replicaStateName": "SECONDARY", "typeName": "REPLICA_SECONDARY"
[07:43:12] <mbuf> joannac, is this setting not required?
[07:43:53] <mbuf> joannac, I am using Ansible to configure the mongod instances, and after the election I know the primary and the other members that are part of the replica set
[07:44:18] <mbuf> joannac, I am trying to add the primary and secondary mongod instances to the host for MMS monitoring
[07:44:36] <mbuf> joannac, does that give you context as to what I am trying to accomplish?
[07:44:45] <joannac> okay, so adding them to monitoring has zero impact on what they actually are
[07:45:04] <joannac> so to answer your question, typeName is superfluous
[07:46:54] <mbuf> joannac, if I don't specify the replicaStateName and replicaSetName, then they are simply added to the listing
[07:47:08] <mbuf> joannac, here, replicaSetName is "staging"
[07:48:46] <joannac> I think Cloud Manager will figure all of that out
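In other words, if monitoring discovers member roles itself, a minimal Create Host payload for the endpoint linked above can omit typeName and replicaStateName entirely; hostname and port below are illustrative placeholders, not values from the conversation.

```json
{
  "hostname": "staging-rs-0.example.com",
  "port": 27017
}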
[08:13:54] <jamiel> Morning all, the documentation states that sharding can be enabled on collections smaller than 256GB (http://docs.mongodb.org/manual/reference/limits/#Sharding-Existing-Collection-Data-Size) ... we have a collection which is currently at 245GB, we have thought about sharding in the future but considering this limit, am I right in thinking if we plan on ever
[08:13:55] <jamiel> doing it, we have to do it now, or it will no longer be possible without restructuring our collections?
[08:23:46] <joannac> how far away are you from planning to shard?
[08:26:23] <jamiel> the shard key might only be 1-2 bytes (NumberInt) , or 2-4 bytes (2 x NumberInt)
[08:27:31] <jamiel> We were hoping to start sharding in a few weeks (currently our staging environment is sharded for testing), but our data is growing quicker than anticipated
[08:53:53] <jamiel> shard key size 1-4 bytes , average document size: 2403 bytes (as taken from coll.stats() )
[08:54:36] <jamiel> it's not clear from that table what the formula is
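The formula behind that limits table appears to be (an assumption reconstructed from the table's rows, not a quote from the docs): maxSplits = 16 MB / average shard-key value size, and max collection size at initial sharding = maxSplits x (chunkSize / 2).

```python
def max_initial_shard_size(avg_key_bytes, chunk_mb=64):
    """Rough max collection size (bytes) when sharding an existing collection.

    Assumption: maxSplits = 16 MB / average shard-key value size, and each
    initial chunk can be filled to half the configured chunk size.
    """
    max_splits = 16 * 1024 * 1024 // avg_key_bytes
    return max_splits * (chunk_mb * 1024 * 1024 // 2)

# 512-byte shard keys with 64 MB chunks -> 1 TiB, matching the docs table
print(max_initial_shard_size(512) / 2**40)   # 1.0 (TiB)
# A 1-4 byte NumberInt key like jamiel's leaves enormous headroom
print(max_initial_shard_size(4) / 2**40)     # 128.0 (TiB)
```

Under that reading, a 245GB collection with a 4-byte shard key is nowhere near the limit; the 256GB figure is the conservative bound for large shard-key values.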
[09:38:03] <synthmeat> am i correct in understanding that i'm to avoid mutable arrays at any cost, even if they're just manual/db refs?
[10:50:21] <pdekker> Hi, I have a question. What is a smart setup for data/log directories? Put them in the home directory and run mongod as a local user? Or put them under /storage for example (owned by root) and run mongod as root?
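A common middle ground for pdekker's question is neither the home directory nor root: a dedicated non-root service user owning a purpose-made directory. A minimal mongod.conf sketch under that assumption (paths and user name are illustrative):

```yaml
# assumption: a dedicated 'mongodb' user and group own /storage/mongodb,
# and mongod is started as that user, never as root
storage:
  dbPath: /storage/mongodb/data
systemLog:
  destination: file
  path: /storage/mongodb/log/mongod.log
  logAppend: true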
[17:06:10] <blizzow> I upgraded to mongo3 on July 16th. Check out the CPU and Load graphs of my replica set members. http://i.imgur.com/QEbib3C.png. I cannot for the life of me figure out what's been going on. Mongostat says it's allocated huge amounts of memory, but the OS isn't reporting much memory usage, or stuff being cached. I'm totally unsure of where to look for what's causing this load. I turned on profiling and it seems like multiple queries
[17:17:33] <danijoo> pamp, i think you can simply alter the name in the user collection
[18:01:45] <blizzow> deathanchor: We were thinking that, so we truncated a lot of our data so indexes would fit (or get close to fitting) in RAM. It helps with IOwait, but the load averages and overall cpu seem much much higher.
[18:02:46] <blizzow> Mongo storage is on a dedicated disk and swap/logging is on a different disk (shared).
[19:32:44] <blizzow> If I'm writing into a replica set and one of the members is down, will writing with a writeconcern=safe still work?
[19:52:30] <pdekker> deathanchor: I am using both auth and a keyfile for authentication, could this be the cause of the problem?
[20:07:49] <blizzow> Rather, if one of my replica set members is in STARTUP2. One of my devs is trying to run a mongoimport to a mongos server and the import just seems to hang. I have one replica set member that's syncing from the ground up and that's the only weirdness I can think of.
[20:21:00] <jr3> querying a document that's 3mb is taking 1.2 seconds locally
[20:25:59] <jr3> cheeser: we're tracking down some bottlenecks and using the blocked npm package it's showing that a mongoose call is blocking our node process for 1.2 seconds
[20:26:19] <jr3> which doesn't seem to be a good thing
[20:51:31] <deathanchor> what does your node process do?