PMXBOT Log file Viewer

#mongodb logs for Monday the 27th of July, 2015

[04:31:29] <BeaverP2> hi there
[04:31:30] <BeaverP2> I wonder what ratio of SSD to RAM is used for different database types. It's a read-heavy workload we're looking at and I just need a rough estimate
[05:16:23] <kba> BeaverP2: http://docs.mongodb.org/manual/faq/fundamentals/#does-mongodb-require-a-lot-of-ram
[05:17:28] <BeaverP2> thanks but its not what I am looking for
[05:17:46] <kba> then I don't know what you're asking
[05:17:50] <BeaverP2> using mmap and mapping the db files to memory is custom
[05:18:23] <BeaverP2> I need to know what are good or best ratios for SSD vs RAM for different kind of data sets
[05:18:55] <BeaverP2> it's like in the old days with Oracle, where it was said that having 1TB of RAM for a 10TB relational database is usually a very good ratio
[05:19:13] <joannac> BeaverP2: how much data do you need to regularly access?
[05:19:13] <BeaverP2> but since HDDs are slow I wonder what's best for SSDs
[05:19:46] <kba> RAM is always faster than disk, so the more RAM the better. So at what point does adding more RAM become irrelevant? When you can fit your entire working set in.
[05:19:48] <BeaverP2> it's comparable to shop-related data like order processing, along with modern social network features like messaging and notifications
[05:20:04] <BeaverP2> the entire working set will be huge
[05:20:12] <BeaverP2> we're talking petabyte scale
[05:20:18] <BeaverP2> at least we have to develop for it
[05:20:29] <BeaverP2> so we'd have hundreds of servers if everything works well
[05:20:49] <BeaverP2> but we won't use mongodb to do the scaling; it's a local storage option
[05:21:04] <BeaverP2> like having 100 mongodbs working in parallel
[05:21:13] <BeaverP2> independent from each other
[05:21:38] <BeaverP2> but of that petabyte scale, about 90% is archive
[05:21:47] <BeaverP2> so that part sits on consumer ssds
[05:21:58] <kba> but you're asking an odd question, then. What's the best ratio of disk to memory? No disk at all, only memory. What's the second best? Almost no disk, only memory.
[05:21:58] <BeaverP2> so it is basically just 100TB of live data
[05:22:24] <BeaverP2> i know kba i want to know what works best for certain companies
[05:22:26] <kba> But surely, after having all the indices in memory, the remaining RAM becomes less important
[05:22:37] <BeaverP2> correct
[05:22:58] <BeaverP2> usually one says that for relational data sets you have a 1:3 ratio for indexes
[05:23:26] <kba> Okay, I don't know of any best practice metric, sorry
[05:23:31] <BeaverP2> ok
[05:24:10] <BeaverP2> internally, SSDs use, as far as I know, 512MB for buffering in consumer drives and several GB in enterprise systems
[05:24:22] <BeaverP2> so there is a cost effective point for some data sets
[05:24:34] <BeaverP2> maybe its even 1:20 I do not know
[05:25:32] <kba> I'd guess it heavily depends on the specific data. I can imagine that for many sites, 1% of the data is what's being accessed 90% of the time.
[05:25:52] <kba> Like a news site. After stuff goes off the front page, it's rarely retrieved afterwards.
[05:26:20] <kba> In that case, being able to just cache the active 1% would yield great performance enhancements
[05:26:58] <kba> If all your data is accessed completely at random, there doesn't seem to be any exponential benefit to adding anything to memory other than the indices.
[05:27:23] <kba> If you can have 5% in memory, you'll only be able to serve 5% of the requests faster if you're doing your caching right.
[05:28:53] <kba> So do some analysis on how often what data is being accessed, then make a cost/benefit analysis and find the sweet spot.
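
For reference, the "all the indices in memory" baseline kba mentions can be checked from the mongo shell; a minimal sketch, with the database name as a placeholder:

    // report sizes in MB; "indexSize" is roughly the RAM needed to keep all
    // indexes resident, "dataSize" is the pool the working set is drawn from
    use mydb
    db.stats(1024 * 1024)
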
[05:59:30] <BeaverP2> kba, those are the right thoughts indeed
[05:59:54] <BeaverP2> the problem is I cannot predict it after all
[06:00:18] <BeaverP2> I have designed a simple mechanism where I use linux to count live pages of the file storage areas
[06:01:25] <BeaverP2> ok, I will extend that and try to do this by sampling and make it an adaptive solution
[06:01:30] <BeaverP2> thanks for your input
[06:01:33] <BeaverP2> cheers
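
BeaverP2's idea of counting live pages can be sketched on Linux by reading the Rss figures in /proc/<pid>/smaps for the memory-mapped data files; a rough sketch only, assuming MMAPv1 data files under /data/db (the path and process name are assumptions):

    # print resident ("live") kilobytes per mapped data file
    pid=$(pidof mongod)
    awk '/\/data\/db\// {file=$NF} /^Rss:/ && file {print file, $2, "kB resident"; file=""}' "/proc/$pid/smaps"
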
[07:02:42] <mbuf> when using https://docs.opsmanager.mongodb.com/current/reference/api/hosts/#create-a-new-host to create three new hosts, is there a way to specify which host is primary and which two are secondary?
[07:07:11] <joannac> mbuf: Why do you need to specify one to be primary?
[07:41:50] <mbuf> joannac, the first mongod instance that I add to the host becomes the primary, by default?
[07:42:07] <mbuf> joannac, they will automatically elect a primary?
[07:43:03] <mbuf> joannac, right now, the first mongod instance added to host becomes primary, and all subsequent ones I specify explicitly as "replicaStateName": "SECONDARY", "typeName": "REPLICA_SECONDARY"
[07:43:12] <mbuf> joannac, is this setting not required?
[07:43:53] <mbuf> joannac, I am using Ansible to configure the mongod instances, and after the election I know the primary and the other members that are part of the replica set
[07:44:18] <mbuf> joannac, I am trying to add the primary and secondary mongod instances to the host for MMS monitoring
[07:44:36] <mbuf> joannac, does that give you context as to what I am trying to accomplish?
[07:44:45] <joannac> okay, so adding them to monitoring has zero impact on what they actually are
[07:45:04] <joannac> so to answer your question, typeName is superfluous
[07:46:54] <mbuf> joannac, if I don't specify the replicaStateName and replicaSetName, then they are simply added to the listing
[07:47:08] <mbuf> joannac, here, replicaSetName is "staging"
[07:47:28] <mbuf> joannac, is that fine?
[07:48:25] <joannac> Try it and see
[07:48:46] <joannac> I think Cloud Manager will figure all of that out
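
For reference, a minimal sketch of the "create a new host" call mbuf is using; the Ops Manager URL, group ID, credentials, and hostname are placeholders, and per joannac the replicaStateName/typeName fields can simply be omitted:

    curl --user "USER@EXAMPLE.COM:API-KEY" --digest \
         --header "Content-Type: application/json" \
         --request POST "https://opsmanager.example.com/api/public/v1.0/groups/GROUP-ID/hosts" \
         --data '{"hostname": "mongod0.example.com", "port": 27017}'
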
[08:13:54] <jamiel> Morning all, the documentation states that sharding can be enabled on collections smaller than 256GB (http://docs.mongodb.org/manual/reference/limits/#Sharding-Existing-Collection-Data-Size) ... we have a collection which is currently at 245GB; we have thought about sharding in the future, but considering this limit, am I right in thinking that if we plan on ever
[08:13:55] <jamiel> doing it, we have to do it now, or it will no longer be possible without restructuring our collections?
[08:16:28] <mbuf> joannac, okay; thanks!
[08:22:02] <joannac> jamiel: default shard key size?
[08:23:46] <joannac> how far away are you from planning to shard?
[08:26:23] <jamiel> the shard key might only be 1-2 bytes (NumberInt) , or 2-4 bytes (2 x NumberInt)
[08:27:31] <jamiel> We were hoping to start sharding in a few weeks (currently our staging environment is sharded for testing), but our data is growing quicker than anticipated
[08:53:53] <jamiel> shard key size 1-4 bytes , average document size: 2403 bytes (as taken from coll.stats() )
[08:54:36] <jamiel> it's not clear from that table what the formula is
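
As far as I can tell, the formula behind the table on that limits page is the following (treat it as an estimate, and it only applies to the initial sharding operation):

    maxSplits = 16777216 bytes / <average shard key value size in bytes>
    max collection size (MB) = maxSplits * (chunk size in MB / 2)

    e.g. with a 4-byte shard key and the default 64 MB chunk size:
    maxSplits = 16777216 / 4 = 4194304
    max size  = 4194304 * 32 MB = 134217728 MB, i.e. roughly 128 TB
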
[09:38:03] <synthmeat> am i correct in understanding that i'm to avoid mutable arrays at any cost, even if they're just manual/db refs?
[10:50:21] <pdekker> Hi, I have a question. What is a smart setup for data/log directories? Put them in the home directory and run mongod as a local user? Or put them under /storage for example (owned by root) and run mongod as root?
[10:51:00] <Zelest> First alternative.
[10:51:05] <Zelest> No doubt about it.
[12:24:31] <pdekker> Zelest: thanks for your advice!
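
A rough sketch of the first alternative Zelest recommends (a dedicated non-root user owning the data and log directories; the user name and paths are assumptions):

    sudo useradd --system -m -d /home/mongod mongod
    sudo -u mongod mkdir -p /home/mongod/data /home/mongod/log
    sudo -u mongod mongod --dbpath /home/mongod/data \
        --logpath /home/mongod/log/mongod.log --fork
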
[13:08:07] <gcfhvjbkn> https://github.com/mongodb/mongo-tools/blob/r3.1.6/mongodump/options.go#L28
[13:08:11] <gcfhvjbkn> what is this option for?
[13:08:16] <gcfhvjbkn> apparently it is undocumented
[13:08:56] <gcfhvjbkn> i've got a dump that i did selecting this option, what can i do with it now?
[13:09:00] <gcfhvjbkn> it's not in gz format
[13:09:30] <gcfhvjbkn> also gzipping it once more makes it go down from 2.6G to 300M
[13:10:02] <gcfhvjbkn> (ie both choosing the option and setting Gzip flag to true)
[13:21:01] <deathanchor> gcfhvjbkn: too bleeding edge for me
[13:21:17] <deathanchor> those options don't exist on the version I use
[13:36:04] <lemonxah> good day
[13:36:09] <lemonxah> how are you all doing
[13:36:26] <lemonxah> just a question about enabling a replica set via /etc/init.d/mongodb
[13:36:52] <lemonxah> how do I make the service that starts with Debian use --replSet rs0
[14:08:42] <deathanchor> lemonxah: I use a config file, or you can just modify the init file where it starts it
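
For the config-file route deathanchor mentions, the setting would look roughly like this; the exact file name and format depend on the package and MongoDB version:

    # old ini-style /etc/mongodb.conf used by the Debian init script
    replSet = rs0

    # or, in the newer YAML config format
    replication:
      replSetName: rs0
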
[14:22:35] <gcfhvjbkn> i'm trying to use mongo-tools as a library
[14:22:38] <gcfhvjbkn> all is well
[14:22:52] <gcfhvjbkn> but it's a great pity i can't get rid of this
[14:22:53] <gcfhvjbkn> https://github.com/mongodb/mongo-tools/blob/master/mongodump/mongodump.go#L321
[14:22:59] <gcfhvjbkn> and this
[14:22:59] <gcfhvjbkn> https://github.com/mongodb/mongo-tools/blob/master/mongodump/mongodump.go#L317
[14:25:07] <gcfhvjbkn> would be great if you could pass a flag to MongoDump.Dump disabling this behaviour
[14:39:51] <_ari> hm
[14:40:02] <_ari> is there a way i can error check my query
[14:40:19] <_ari> for example if they pass in a start/end date that does not exist in the collection
[15:00:08] <Tug> What is the relation between MMS and the Ops Manager? Is the former a SaaS implementation of the latter?
[15:43:10] <pdekker> Another question, I am trying to create a replica set based on an existing database
[15:43:20] <pdekker> I restarted mongod with the replSet option
[15:46:34] <pdekker> When I try to login to the mongo shell and do rs.initiate(), I get:
[15:46:37] <pdekker> not authorized on admin to execute command { replSetInitiate: undefined }
[15:46:58] <pdekker> Even if I try to log in with a user or admin
[15:47:05] <pdekker> What could be the problem?
[16:09:39] <deathanchor> pdekker: already on a machine with replSet started?
[16:09:43] <deathanchor> rs.status()?
[16:11:31] <pdekker> rs.status() gives the same error
[16:11:50] <pdekker> deathanchor: the replset option has been given, but it has not been initiated yet
[16:22:12] <deathanchor> are you using --auth?
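
For what it's worth, with --auth (or a keyfile) enabled, rs.initiate() has to be run by a user holding a role such as clusterAdmin or root, or via the localhost exception before any user exists; a minimal sketch with placeholder credentials:

    // in the mongo shell, connected directly to the member
    use admin
    db.auth("admin", "secret")   // user must hold e.g. the root or clusterAdmin role
    rs.initiate()
    rs.status()
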
[16:56:03] <pamp> Is it possible to rename a user?
[17:06:10] <blizzow> I upgraded to mongo3 on July 16th. Check out the CPU and Load graphs of my replica set members. http://i.imgur.com/QEbib3C.png. I cannot for the life of me figure out what's been going on. Mongostat says it's allocated huge amounts of memory, but the OS isn't reporting much memory usage, or stuff being cached. I'm totally unsure of where to look for what's causing this load. I turned on profiling and it seems like multiple queries
[17:17:33] <danijoo> pamp, I think you can simply alter the name in the user collection
[17:56:52] <deathanchor> blizzow: disk wait?
[18:01:45] <blizzow> deathanchor: We were thinking that, so we truncated a lot of our data so the indexes would fit (or get close to fitting) in RAM. It helps with IO wait, but the load averages and overall CPU seem much, much higher.
[18:02:46] <blizzow> Mongo storage is on a dedicated disk and swap/logging is on a different disk (shared).
[19:32:44] <blizzow> If I'm writing into a replica set and one of the members is down, will writing with a writeconcern=safe still work?
[19:52:30] <pdekker> deathanchor: I am using both auth and a keyfile for authentication, could this be the cause of the problem?
[20:07:49] <blizzow> Rather: one of my replica set members is in STARTUP2. One of my devs is trying to run a mongoimport against a mongos server and the import just seems to hang. I have one replica set member that's syncing from scratch and that's the only weirdness I can think of.
[20:21:00] <jr3> querying a document that's 3mb is taking 1.2 seconds locally
[20:21:04] <jr3> does that sound right?
[20:24:23] <cheeser> doesn't sound egregiously abnormal, no.
[20:25:59] <jr3> cheeser: we're tracking down some bottlenecks and using the blocked npm package it's showing that a mongoose call is blocking our node process for 1.2 seconds
[20:26:19] <jr3> which doesn't seem to be a good thing
[20:51:31] <deathanchor> what does your node process do?
[20:51:34] <deathanchor> write to the db?
[20:57:08] <blizzow> What is the default balancing window on a sharded cluster? Or what is the default balancer schedule?
[21:17:27] <s2013> if i set something like user = db.users.findOne({email: "some@email.com"})
[21:17:38] <s2013> how can i update/insert a field for that user
[21:18:13] <s2013> is it update and set?
[21:29:28] <danijoo> s2013, db.users.update({email: "some@email.com"}, {$set: {name: "users new name"}})
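
If the "insert" half of the question matters (creating the document when no matching user exists), the same call takes an upsert option; a small sketch:

    db.users.update(
        {email: "some@email.com"},
        {$set: {name: "users new name"}},
        {upsert: true}
    )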