PMXBOT Log file Viewer

#mongodb logs for Monday the 25th of January, 2016

[10:05:20] <fish_> hi
[10:05:31] <fish_> what options do I have if I realize my shard key's cardinality is too low? stop writes, dump the db, stop it and remove the data, then import again? or can I somehow reshard it live?
[10:10:48] <m3t4lukas> fish_: cardinality comes from your data itself. You'd need to choose another shard key when resharding. Unfortunately you cannot do that live as far as I know.
[10:11:49] <m3t4lukas> fish_: if having no downtime is very important, you should consider developing a resharding plan that uses the oplog.
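
For context, the offline path under discussion looks roughly like this in the mongo shell and tools; database, collection, and key names are illustrative, not from the log:

    use app
    // 1. stop writes, then take a BSON dump of the collection:
    //      mongodump --db app --collection events --out /backup
    // 2. drop the collection and shard it on the new key
    //    (shardCollection creates the collection and its shard-key
    //    index when the collection is empty):
    db.events.drop()
    sh.shardCollection("app.events", { userId: 1, ts: 1 })
    // 3. restore the data and let the balancer spread the chunks:
    //      mongorestore --db app --collection events /backup/app/events.bson
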
[10:47:18] <lipiec> hi, I am experiencing the following error after upgrading my sharded cluster to 3.2.1
[10:47:21] <lipiec> https://jira.mongodb.org/browse/SERVER-22247
[10:47:33] <lipiec> this bug is resolved in 3.2.2
[10:47:40] <lipiec> how can I prevent this error now?
[10:47:51] <LastCoder> hello
[10:48:04] <LastCoder> I have a noobie question. Not a noobie coder just noob to mongo
[10:48:31] <LastCoder> I want to keep a running total on an object that has multiple keys
[10:48:58] <LastCoder> I'm used to having to check for a key to exist and initializing it
[10:49:29] <LastCoder> using php mongo, how would I keep a running total on an object that would look like this in php:
[10:49:40] <lipiec> is it safe to downgrade only mongos to 3.0.7
[10:49:48] <LastCoder> $obj[$key1][$key2][$key3][$key4]++
[10:49:56] <lipiec> or 3.2.0?
[10:50:04] <fish_> m3t4lukas: you mean writing/using some tool that replicates data + oplog? I've looked into hydra for that but ran into several small glitches
[10:50:06] <LastCoder> but to do it using the thread safe increment way
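
A sketch of the atomic increment LastCoder is after, in mongo shell syntax (the PHP driver takes the same update document); collection, selector, and key names are illustrative:

    // $inc is atomic per document and creates any missing nested path,
    // so there is no need to check for the key and initialize it first:
    db.totals.update(
        { _id: "game42" },                       // illustrative selector
        { $inc: { "key1.key2.key3.key4": 1 } },  // dotted path instead of nested []
        { upsert: true }                         // create the document on first hit
    )
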
[10:59:49] <fish_> in general, what could be possible downsides of using a hashed shard key based on _id? to me that seems like an option which should always work
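
fish_'s question goes unanswered in the log; the trade-off usually cited is that a hashed _id key spreads writes evenly across shards, but range queries on _id become scatter-gather, since adjacent values no longer live together. Creating one (namespace illustrative):

    sh.shardCollection("app.events", { _id: "hashed" })
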
[11:15:44] <LastCoder> how about the command to create a user? I tried the one from the docs but I can't auth from home, only connect
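
A likely culprit here is authenticating against a different database than the one the user was created in. A sketch of db.createUser() with illustrative names:

    use myapp
    db.createUser({
        user: "appUser",
        pwd: "secret",
        roles: [ { role: "readWrite", db: "myapp" } ]
    })
    // then connect with the matching auth database:
    //   mongo --host db.example.com -u appUser -p secret --authenticationDatabase myapp
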
[12:29:46] <LastCoder> how do I do a multikey group sum?
[12:30:27] <LastCoder> '_id' => '$_id.GameId,$_id.ObjectId'
[12:57:16] <LastCoder> well I figured it out
[12:57:18] <LastCoder> *Struts*
[12:57:20] <LastCoder> mongo is fun
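
LastCoder never posts his fix; the usual shape of a multi-key group sum is a document-valued _id in $group, rather than the comma-joined string tried at 12:30. Field names are from his snippet, the rest is illustrative:

    db.results.aggregate([
        { $group: {
            _id: { gameId: "$_id.GameId", objectId: "$_id.ObjectId" },
            total: { $sum: "$count" }   // "count" is an illustrative field
        } }
    ])
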
[13:44:37] <sulaiman> Hello. Is it a good idea to use mongodb (or any similar nosql tech) over an RDBMS if I will be generating lots of reports for analytics? I will have a lot of aggregations and groupings to perform
[13:45:38] <sulaiman> I have very simple data to store, no more than 4 values per entry.
[13:47:03] <StephenLynx> as in, performing joins to mold the data to be output?
[13:51:56] <sulaiman> not joins. For example, let's say I am storing a person's name and the restaurant they visited. I will need to find the count of unique people that visited a restaurant in a given week/month/year, the number of visitors for all restaurants, etc. I will only be storing the following data: 1) time 2) person's name 3) restaurant name
[13:54:44] <StephenLynx> well
[13:54:54] <StephenLynx> it's common to pre-aggregate data.
[13:55:06] <StephenLynx> if you regularly need to obtain information that is slow to aggregate all at once.
[13:55:36] <StephenLynx> let's say you have event X and you wish to know how many X events occurred in a day.
[13:56:01] <StephenLynx> instead of always aggregating from raw event information, you pre-aggregate a counter every time the event occurs.
[13:56:15] <StephenLynx> so when you want to look for a day, you just pick the pre-aggregated data for that day
[13:58:00] <StephenLynx> instead of looking for the whole counting of events, you just look for a specific day.
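
The pattern StephenLynx describes, as a sketch; collection and field names are illustrative:

    // on every occurrence of event X, bump a per-day counter
    // (one document per day):
    db.dailyCounts.update(
        { _id: "2016-01-25" },
        { $inc: { xCount: 1 } },
        { upsert: true }
    )
    // "how many X on a given day?" is then a single _id lookup
    // instead of an aggregation over the raw events:
    db.dailyCounts.findOne({ _id: "2016-01-25" })
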
[15:44:31] <watmm> if I do a dump of a 2.4 db, restore, and upgrade to 2.6, 3.0, then 3.2, can I manually import the additional differences that were made to the 2.4 db into the 3.2, or do I have to do the whole thing in one pass?
[16:17:03] <cheeser> watmm: https://docs.mongodb.org/manual/release-notes/2.6-upgrade/
[16:18:50] <watmm> Yeah, wondering why an import rather than a dump/restore is unreliable
[16:19:06] <StephenLynx> import doesn't use bson
[16:19:15] <cheeser> right. so you lose type fidelity.
[16:19:17] <StephenLynx> dump/restore doesn't convert data
[16:19:22] <StephenLynx> it's 1:1
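
The distinction being drawn: mongodump/mongorestore move BSON verbatim, while mongoexport/mongoimport go through JSON, which the MongoDB docs warn does not reliably preserve all BSON types. Hostnames illustrative:

    mongodump    --host old.example.com --out /backup
    mongorestore --host new.example.com /backup
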
[17:27:47] <MacWinner> i'm planning my mongo 3.0.8 to 3.2.1 upgrade.. i think I have most of the steps down where I will basically shut down the mongo secondary, upgrade the binaries, update the config to use wiredtiger, delete the data directory, and restart mongo.. then it should do a full sync with the primary.. i have one remaining question. if I delete the data directory, how does the secondary mongo know that it's part of the cluster? my /etc/mongod.conf only has a
[17:27:48] <MacWinner> cluster name configured. do I need to re-add the secondary node from the primary? or should I remove the node from the primary first?
[17:28:32] <MANCHUCK> MacWinner: the primary will try and contact the secondary
[17:29:39] <MANCHUCK> all you really need to do is launch mongod on the secondary with the --replSet <name> set and when the primary contacts it, it will go into STARTUP
[17:57:45] <MacWinner> MANCHUCK, thanks!
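
For MacWinner's plan, the relevant mongod.conf pieces look like this (YAML config format; values illustrative). As long as replSetName is unchanged and the member is still listed in the replica set config, the wiped secondary will initial-sync on restart without being re-added:

    # /etc/mongod.conf
    storage:
      dbPath: /var/lib/mongo
      engine: wiredTiger      # switch the engine before the resync
    replication:
      replSetName: rs0        # must match the existing set name
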
[18:16:05] <magicantler> if a server reboots is there an easy way to automate the re-deployment of shards/mongod instances?
[18:16:25] <cheeser> redeployment?
[18:20:13] <m3t4lukas> magicantler: just create them as a service in your OS. What OS do you use?
[18:20:26] <magicantler> m3t4lukas: centOS
[18:20:44] <magicantler> m3t4lukas: and i created separated bash scripts for each shard instance
[18:21:30] <cheeser> all these shards are on the same machine?
[18:22:54] <m3t4lukas> magicantler: I hope this is not your production setup. If you want to test you should use virtualization.
[18:23:28] <magicantler> m3t4lukas: it's a test cluster
[18:23:36] <magicantler> m3t4lukas: virtualization?
[18:24:15] <magicantler> m3t4lukas: please point me to what you mean. as this will reflect the production setting. 3 shards spanned across 3 servers
[18:24:24] <m3t4lukas> magicantler: it's best to use one VM per mongod. When testing, those VMs can be on the same host. Multiple mongods fight for resources. You might argue that 3 is okay, but not more
[18:24:51] <magicantler> m3t4lukas: ok, well right now i was going to run 3 mongod instances on each, and limit the cacheram
[18:25:05] <m3t4lukas> magicantler: for 3 shards you'd need at least 10 servers. Otherwise sharding does not make sense
[18:25:20] <magicantler> m3t4lukas: each server holds a primary, secondary, and arbiter.
[18:25:40] <magicantler> and the machines are massive.
[18:25:46] <m3t4lukas> magicantler: in production you should only use sharding once you got enough data
[18:25:59] <cheeser> i don't know that I agree with *that*
[18:26:02] <m3t4lukas> you can always introduce sharding to a non sharded database
[18:26:13] <magicantler> gotcha, they want me to use sharding from the start however..
[18:26:20] <cheeser> if you know you're going to accumulate a lot of data, sharding early can prevent long delays in balancing.
[18:26:36] <cheeser> that said, you have to be *really* sure of your shard key because you can't change it.
[18:26:36] <m3t4lukas> cheeser's right
[18:26:52] <magicantler> regardless, this paradigm should in theory still allow for more parallelized writes and reads
[18:27:05] <m3t4lukas> but you should really use 9 machines or more
[18:27:16] <magicantler> yeah i agree. unfortunately i don't have that as an option
[18:28:36] <m3t4lukas> that's unfortunate. If you really only have three machines then I'd shard later, based on what I know about mongo and fault tolerance. Maybe cheeser has another opinion on that. I'd trust cheeser more than me :D
[18:30:07] <m3t4lukas> the way I see it: sharding this early on is nice to have. But fault tolerance from distributing replica sets is kind of necessary
[18:30:24] <m3t4lukas> especially in production
[18:31:34] <cheeser> there's little point to running more than one data bearing node on the same machine.
[18:32:32] <m3t4lukas> just an example: one of your shards hits OS config limits on open files, maximum inodes or whatever. The behavior becomes unpredictable and there goes your data. And the chance of that happening is much higher when you run three instances on one machine
[18:33:45] <cheeser> not to mention memory and disk space contention
[18:33:51] <cheeser> CPU scheduling, etc.
[18:34:23] <magicantler> well the replica set for each shard will have its primary on a machine of its own
[18:34:34] <m3t4lukas> shared disk bandwidth
[18:34:47] <magicantler> separated disk drives
[18:34:59] <m3t4lukas> magicantler: how many machines do you have?
[18:35:02] <magicantler> 3
[18:35:27] <m3t4lukas> so there will be no separated primaries
[18:35:31] <magicantler> yeah
[18:35:38] <magicantler> each machine has a primary, secondary, and arbiter
[18:35:45] <magicantler> all of which belong to a different replica set
[18:35:46] <MacWinner> anyone here experience any major pain when upgrading to wiredtiger? i have a pretty simple replica set with 200G of data. just want to avoid any obvious gotchas
[18:35:48] <magicantler> they rotate.
[18:36:10] <m3t4lukas> MacWinner: just dump and restore and you should be fine
[18:36:19] <magicantler> so for replica set 1: primary on server 1, secondary on server 2, and arbiter on server 3
[18:37:36] <m3t4lukas> magicantler: that I understood. Still, it's an unhappy setup. I'd either scratch sharding or replication. And I personally would scratch sharding until the application generates enough revenue to cover the needed servers
[18:37:59] <magicantler> m3t4lukas: i thought sharding basically required replication
[18:38:07] <m3t4lukas> in most cases you can say: the higher the revenue generated, the more data and the more connections
[18:38:23] <m3t4lukas> magicantler: nope, it doesn't
[18:38:41] <magicantler> ok but then it lacks redundancy
[18:38:47] <m3t4lukas> magicantler: it's only ever taught this way round because it's much more secure and it makes sense
[18:38:52] <magicantler> ok
[18:39:37] <m3t4lukas> I'd really really really not run multiple instances on one server in production. Just be smart and save yourself some headache
[18:40:57] <magicantler> i really have no choice.
[18:40:59] <magicantler> junior dev
[18:41:06] <magicantler> :/
[18:43:26] <magicantler> m3t4lukas: with that said, advice on startup on server reboot?
[18:43:42] <magicantler> and would virtualization matter?
[18:44:01] <m3t4lukas> virtualization would definitely matter
[18:44:18] <m3t4lukas> magicantler: just like you set up any other service on CentOS
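
Concretely, that means registering each mongod as a system service so it starts on boot. A sketch, assuming the official mongodb-org packages (which register a mongod service) on CentOS 7; service and config names for the extra instances are illustrative:

    # single instance:
    sudo systemctl enable mongod
    sudo systemctl start mongod
    # (CentOS 6 / SysV init: chkconfig mongod on)
    # for several mongods per box, create one service per config
    # file, e.g. a mongod-shard1 service whose start command points
    # at /etc/mongod-shard1.conf, then enable each of them
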
[18:46:07] <magicantler> m3t4lukas: so is this where docker would come into play?
[18:47:49] <m3t4lukas> on virtualization? I don't know whether docker really provides the right kind of separation. As far as I know, Docker isn't "real" virtualization
[18:48:17] <magicantler> alright. know of any tools to help with this?
[18:50:14] <m3t4lukas> magicantler: maybe you should start out with the basics of linux administration before setting up mongodb :/ There's a lot to get wrong on the OS side concerning file systems and kernel setup. Don't you have admins in your company?
[18:50:55] <StephenLynx> docker is ass though
[18:51:38] <magicantler> m3t4lukas: unfortunately, we're looking to get a devops member right now. but in the meantime they want me to get this rolling
[18:52:13] <m3t4lukas> magicantler: I know it's not your fault. But if this database will be vital to the smooth everyday running of the company you are working for, they should consider having at least two experienced people working on this setup.
[18:52:47] <m3t4lukas> oftentimes those with the money are poor on methodology, I know that
[19:03:33] <magicantler> m3t4lukas: well shit.
[19:05:16] <magicantler> m3t4lukas: i'm just going to have to suck it up and learn. tips on virtualization?
[19:09:24] <m3t4lukas> magicantler: I use OpenStack with VirtualBox or KVM as a provider. But that's just personal preference.
[19:09:43] <m3t4lukas> You can use VMware as well. Docker is great for anything else :D
[23:39:27] <MacWinner> is there an easy way to see if any of my indexes are not being used?
[23:39:43] <MacWinner> assuming i don't have access to the code that is querying the db
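
Not answered in the log: MongoDB 3.2 (the version MacWinner is moving to) added the $indexStats aggregation stage for exactly this; its counters reset whenever mongod restarts. Collection name illustrative:

    db.mycollection.aggregate([ { $indexStats: {} } ])
    // indexes whose accesses.ops stays at 0 over a representative
    // period of traffic are candidates for removal
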