[10:05:31] <fish_> what options do I have if I realize my shard key's cardinality is too low? stop writes, dump the db, stop it and remove the data, then import again? or can I somehow reshard it live?
[10:10:48] <m3t4lukas> fish_: cardinality comes from your data itself. You'd need to choose another shard key when resharding. Unfortunately you cannot do that live as far as I know.
[10:11:49] <m3t4lukas> fish_: if having no downtime is very important you should consider developing some resharding plan that uses the oplog.
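A rough sketch of the oplog-tailing piece such a plan would need, in the mongo shell; the namespace and resume timestamp are hypothetical:

    use local
    var lastTs = Timestamp(1453280000, 0)  // hypothetical resume point
    var cursor = db.oplog.rs.find({ ns: "mydb.mycoll", ts: { $gt: lastTs } })
                            .addOption(DBQuery.Option.tailable)
                            .addOption(DBQuery.Option.awaitData);
    // simplified loop: read each replicated op and re-apply it to the
    // newly sharded target cluster
    while (cursor.hasNext()) {
        var op = cursor.next();  // op.op is "i"/"u"/"d" for insert/update/delete
        // apply op to the target cluster here
    }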
[10:47:18] <lipiec> hi, I am experiencing the following error after upgrading my sharded cluster to 3.2.1
[10:50:04] <fish_> m3t4lukas: you mean writing/using some tool that replicates data + oplog? I've looked into hydra for that but ran into several small glitches
[10:50:06] <LastCoder> but to do it using the thread safe increment way
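For reference, the usual thread-safe increment in mongo is an atomic $inc; a findAndModify sketch with hypothetical collection and field names:

    // atomically increment and read back the new value
    var doc = db.counters.findAndModify({
        query: { _id: "pageviews" },
        update: { $inc: { n: 1 } },
        new: true,     // return the post-increment document
        upsert: true   // create the counter if it doesn't exist yet
    });
    printjson(doc);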
[10:59:49] <fish_> in general, what could be possible downsides of using a hashed shard key based on _id? to me that seems like an option which should always work
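The setup itself is a one-liner (database and collection names hypothetical); the main trade-off is that range queries on _id become scatter-gather across all shards, since hashing destroys ordering:

    sh.enableSharding("mydb")
    sh.shardCollection("mydb.mycoll", { _id: "hashed" })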
[11:15:44] <LastCoder> how about the command to create a user? I tried the one from the docs but I can't auth from home, only connect
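For reference, the 3.x createUser shape; the user, password, and roles here are hypothetical. A common gotcha is authenticating against a different database than the one the user was created in:

    use mydb
    db.createUser({
        user: "appuser",
        pwd: "s3cret",
        roles: [ { role: "readWrite", db: "mydb" } ]
    })
    // then connect with:
    //   mongo --host <server> -u appuser -p s3cret --authenticationDatabase mydb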
[12:29:46] <LastCoder> how do I do a multikey group sum?
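A multi-key group sum is done by making $group's _id a compound document; collection and field names here are hypothetical:

    db.sales.aggregate([
        { $group: {
            _id: { region: "$region", product: "$product" },  // compound group key
            total: { $sum: "$amount" }
        } }
    ])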
[13:44:37] <sulaiman> Hello. Is it a good idea to use mongodb (or any similar nosql tech) over an RDBMS if I will be generating lots of reports for analytics? I will have a lot of aggregations and groupings to perform
[13:45:38] <sulaiman> The data I'm storing is very simple, no more than 4 values per entry.
[13:47:03] <StephenLynx> as in, performing joins to mold the data to be output?
[13:51:56] <sulaiman> no joins. e.g. let's say I am storing a person's name and the restaurant they visited. I will need to find the count of unique people that visited a restaurant in a given week/month/year, the number of visitors across all restaurants, etc. I will only be storing the following data: 1) time 2) person's name 3) restaurant name
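For data shaped like that, the weekly unique-visitor count could be a single aggregation; collection and field names are hypothetical, and the $match window comes from whatever week/month is being reported on:

    db.visits.aggregate([
        { $match: { time: { $gte: ISODate("2016-01-01"), $lt: ISODate("2016-01-08") } } },
        { $group: { _id: "$restaurant", people: { $addToSet: "$person" } } },
        { $project: { uniqueVisitors: { $size: "$people" } } }
    ])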
[13:54:54] <StephenLynx> it's common to pre-aggregate data.
[13:55:06] <StephenLynx> if you regularly need to obtain information that is slow to aggregate all at once.
[13:55:36] <StephenLynx> let's say you have event X and you wish to know how many times X occurred in a day.
[13:56:01] <StephenLynx> instead of always aggregating from the raw event data, you pre-aggregate a counter every time the event occurs.
[13:56:15] <StephenLynx> so when you want to look at a day, you just pick the pre-aggregated data for that day
[13:58:00] <StephenLynx> instead of counting over the whole event history, you just look up that specific day.
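A minimal sketch of that pattern in the mongo shell; collection and field names are hypothetical:

    // on each event, upsert and bump the day's counter
    db.dailyCounts.update(
        { day: "2016-01-20", event: "X" },
        { $inc: { count: 1 } },
        { upsert: true }
    )
    // reading a day is then a single indexed lookup instead of a scan
    db.dailyCounts.findOne({ day: "2016-01-20", event: "X" })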
[15:44:31] <watmm> if I do a dump of a 2.4 db, restore and upgrade to 2.6, 3.0, then 3.2, can I manually import the changes that were made to the 2.4 db in the meantime, or do I have to do the whole thing in one pass?
[17:27:47] <MacWinner> i'm planning my mongo 3.0.8 to 3.2.1 upgrade.. i think I have most of the steps down: basically shut down the mongo secondary, upgrade the binaries, update the config to use wiredTiger, delete the data directory, and restart mongo.. then it should do a full sync with the primary.. i have one remaining question: if I delete the data directory, how does the secondary mongo know that it's part of the cluster? my /etc/mongod.conf only has a
[17:27:48] <MacWinner> cluster name configured. do I need to re-add the secondary node from the primary? or should I remove the node from the primary first?
[17:28:32] <MANCHUCK> MacWinner: the primary will try and contact the secondary
[17:29:39] <MANCHUCK> all you really need to do is launch mongod on the secondary with --replSet <name> set, and when the primary contacts it, it will go into STARTUP
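A sketch of the relevant /etc/mongod.conf pieces for that flow, assuming the YAML config format; the path and set name are hypothetical:

    storage:
      dbPath: /var/lib/mongo
      engine: wiredTiger        # switch the engine before the initial sync
    replication:
      replSetName: rs0          # must match the existing replica set's name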
[18:24:15] <magicantler> m3t4lukas: please point me to what you mean. as this will reflect the production setting. 3 shards spanned across 3 servers
[18:24:24] <m3t4lukas> magicantler: it's best to use one VM per mongod. When testing, those VMs can be on the same host. Multiple mongods fight for resources. You might argue that 3 is okay, but not more
[18:24:51] <magicantler> m3t4lukas: ok, well right now i was going to run 3 mongod instances on each, and limit the cacheram
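For reference, the WiredTiger cache cap is a per-instance config setting; the 4 GB figure here is a hypothetical value:

    storage:
      wiredTiger:
        engineConfig:
          cacheSizeGB: 4   # default is roughly half of RAM, so cap it when co-locating mongods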
[18:25:05] <m3t4lukas> magicantler: for 3 shards you'd need at least 10 servers. Otherwise sharding does not make sense
[18:25:20] <magicantler> m3t4lukas: each server holds a primary, secondary, and arbiter.
[18:25:40] <magicantler> and the machines are massive.
[18:25:46] <m3t4lukas> magicantler: in production you should only use sharding once you got enough data
[18:25:59] <cheeser> i don't know that I agree with *that*
[18:26:02] <m3t4lukas> you can always introduce sharding to a non-sharded database
[18:26:13] <magicantler> gotcha, they want me to use sharding from the start however..
[18:26:20] <cheeser> if you know you're going to accumulate a lot of data, sharding early can prevent long delays in balancing.
[18:26:36] <cheeser> that said, you have to be *really* sure of your shard key because you can't change it.
[18:26:52] <magicantler> regardless, this paradigm should in theory still allow for more parallelized writes and reads
[18:27:05] <m3t4lukas> but you should really use 9 machines or more
[18:27:16] <magicantler> yeah i agree. unfortunately i don't have that as an option
[18:28:36] <m3t4lukas> that's unfortunate. If you really only have three machines then I'd shard later, based on what I know about mongo and fault tolerance. Maybe cheeser has another opinion on that. I'd trust cheeser more than me :D
[18:30:07] <m3t4lukas> the way I see it: sharding this early on is nice to have. But fault tolerance by distributing replica sets is kinda necessary
[18:31:34] <cheeser> there's little point to running more than one data bearing node on the same machine.
[18:32:32] <m3t4lukas> just an example: one of your shards hits OS limits on open files, maximum inodes or whatever. The behavior becomes unpredictable and away goes your data. And the chance of that happening goes way up when you run three instances on one machine
[18:33:45] <cheeser> not to mention memory and disk space contention
[18:35:38] <magicantler> each machine has a primary, secondary, and arbiter
[18:35:45] <magicantler> each of which belongs to a different replica set
[18:35:46] <MacWinner> anyone here experience any major pain when upgrading to WiredTiger? i have a pretty simple replica set with 200G of data. just want to avoid any obvious gotchas
[18:36:10] <m3t4lukas> MacWinner: just dump and restore and you should be fine
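The round trip would look roughly like this, with hypothetical host names and paths:

    mongodump --host oldnode:27017 --out /backup/dump
    # ...switch the node's storage engine to wiredTiger, then:
    mongorestore --host newnode:27017 /backup/dump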
[18:36:19] <magicantler> so for replica set 1: primary on server 1, secondary on server 2, and arbiter on server 3
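That layout, expressed as a hypothetical rs.initiate() for the first of the three sets; hostnames are placeholders:

    rs.initiate({
        _id: "rs1",
        members: [
            { _id: 0, host: "server1:27017" },                      // primary
            { _id: 1, host: "server2:27017" },                      // secondary
            { _id: 2, host: "server3:27017", arbiterOnly: true }    // arbiter, no data
        ]
    })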
[18:37:36] <m3t4lukas> magicantler: that I understood. Still, it's an unhappy setup. I'd either scratch sharding or replication, and I personally would scratch sharding until the application generates enough revenue to cover the needed servers
[18:37:59] <magicantler> m3t4lukas: i thought sharding basically required replication
[18:38:07] <m3t4lukas> in most cases you can say: the more revenue generated, the more data and the more connections
[18:38:23] <m3t4lukas> magicantler: nope, it doesn't
[18:38:41] <magicantler> ok but then it lacks redundancy
[18:38:47] <m3t4lukas> magicantler: it's usually taught that way round because it's much safer and it makes sense
[18:43:26] <magicantler> m3t4lukas: with that said, any advice on making this start automatically on server reboot?
[18:43:42] <magicantler> and would virtualization matter?
[18:44:01] <m3t4lukas> virtualization would definitely matter
[18:44:18] <m3t4lukas> magicantler: just like you set up any other service on CentOS
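On CentOS 7 that would be the usual systemd steps, assuming the MongoDB package installed a mongod unit:

    sudo systemctl enable mongod   # start on boot
    sudo systemctl start mongod
    sudo systemctl status mongod   # verify it came up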
[18:46:07] <magicantler> m3t4lukas: so is this where docker would come into play?
[18:47:49] <m3t4lukas> on virtualization? I don't know whether docker really provides the right kind of separation. As far as I know, Docker isn't "real" virtualization
[18:48:17] <magicantler> alright. know of any tools to help with this?
[18:50:14] <m3t4lukas> magicantler: maybe you should start with the basics of linux administration before setting up mongodb :/ There's a lot that can go wrong on the OS side concerning file systems and kernel setup. Don't you have admins in your company?
[18:51:38] <magicantler> m3t4lukas: unfortunately, we're looking to get a devops member right now. but in the meantime they want me to get this rolling
[18:52:13] <m3t4lukas> magicantler: I know it's not your fault. But if this database will be vital to the smooth everyday running of the company you work for, they should consider having at least two experienced people working on this setup.
[18:52:47] <m3t4lukas> oftentimes those with the money are weak on methodology, I know that