[00:54:00] <jstream> i tried using mongodump and mongorestore to move an application and its database to another, soon-to-be production server. except surprisingly, the application on the production server is behaving as though there are multiple copies of the same obj stored in the db.
[00:54:41] <jstream> I put on a bandaid by checking to see if an obj had already been read before adding it, and that fixed the problem
[00:55:00] <jstream> but, i never encountered this problem on the original server
[00:56:10] <jstream> curiously, when I use mongo on the production server, i cant find any duplicate copies of objects -- at least by searching for _id's
[00:57:23] <jstream> I wonder if this might be happening because when i moved the database, i did NOT use the point-in-time option
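Since _id carries a unique index, duplicate _id values can never show up; a hedged pymongo sketch of the kind of check that can surface application-level duplicates instead, grouping on a hypothetical natural key ("mydb", "items" and "external_id" are all assumptions, not names from the conversation):

    from pymongo import MongoClient

    coll = MongoClient("mongodb://localhost:27017")["mydb"]["items"]

    # group on the application-level key and report any key that maps to
    # more than one document (each with its own distinct _id)
    pipeline = [
        {"$group": {"_id": "$external_id", "count": {"$sum": 1},
                    "ids": {"$push": "$_id"}}},
        {"$match": {"count": {"$gt": 1}}},
    ]
    for dup in coll.aggregate(pipeline):
        print(dup["_id"], dup["count"], dup["ids"])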
[07:29:19] <anubhaw> if you don't want to follow the link then question is this. "Does performing a partial update on a MongoDb document in WiredTiger provide any advantage over a full document update?"
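For reference, a minimal pymongo sketch of the two update styles the question contrasts (database, collection and field names are hypothetical). WiredTiger generally rewrites the document on disk either way, so the usual win from a partial update is smaller messages on the wire and smaller oplog entries, not an in-place write:

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["users"]

    # partial update: only the changed field is sent (and logged in the oplog)
    coll.update_one({"_id": 1}, {"$set": {"status": "active"}})

    # full-document replacement: the whole document is sent and stored again
    coll.replace_one({"_id": 1}, {"_id": 1, "name": "x", "status": "active"})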
[08:32:31] <catphish> hi, i have just added a node to my replica set, but it seems to be writing to disk a lot more than any other nodes in the cluster, my primary is writing about 20 operations per second, the new secondary is writing about 500, is there any obvious reason for the disparity?
[08:32:56] <ranman> if you just added it is it potentially grabbing the data?
[08:33:08] <ranman> or did you restore it from a backup?
[10:00:43] <catphish> ah, i've not seen mongostat before
[10:03:27] <catphish> Derick: sorry, i guess you missed my original question, i have a primary and a new secondary, the primary is (afaik) handling all the clients, and is doing about 20 disk writes per second, the secondary (which is new, but has progressed to secondary state), should be doing nothing but replicating the primary, but is running hundreds of writes per second
[10:06:16] <Derick> what sort of updates are you doing?
[10:07:20] <catphish> the workload is mostly updates, i'll have to look at the queries
[10:09:03] <catphish> here's mongostat http://paste.codebasehq.com/pastes/a8edlt6ia11irbjlnp and iostat http://paste.codebasehq.com/pastes/7j2bcz22u1i0ivrtdy
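Alongside mongostat, a minimal pymongo sketch (host names are assumptions) for comparing raw write counters on the two members by reading opcounters and opcountersRepl from serverStatus on each one:

    from pymongo import MongoClient

    # hypothetical host names for the primary and the new secondary
    for host in ("primary.example:27017", "new-secondary.example:27017"):
        status = MongoClient(host, directConnection=True).admin.command("serverStatus")
        # opcounters = client-issued ops, opcountersRepl = ops applied via replication
        print(host, status["opcounters"], status.get("opcountersRepl"))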
[13:08:47] <kees_> are there any settings to tune this behaviour? i can't really reconfigure the replicaset every time a host goes down, and i can't have my php app waiting 300ms to connect to a server that's not in the seed list
[13:09:35] <kees_> i know discovery and all can have advantages, but if it is killing my performance i'd rather have a less flexible but more performant setup
[13:10:25] <cheeser> arguably, your cluster is the problem if you have nodes dying. fixing that will fix your perf problems.
[13:11:58] <kees_> it was a planned reboot this time, but i guess i'll just have to reconfigure the cluster next time before i reboot
[13:13:04] <cheeser> didn't the perf go back up when the nodes came online?
[13:13:55] <cheeser> there was at one point an SDAM daemon for drivers like the PHP driver that couldn't do the SDAM work in a thread. did that one escape the lab?
[13:16:37] <kees_> i guess i'll just need to rework my mongo architecture to include some sort of mongodb proxy so my (short lived) php apps can connect to localhost and dont have to check the entire cluster for almost every request
[13:17:40] <cheeser> you should consult the mongodb-users list or wait for Derick to notice.
[13:20:19] <Derick> kees_: for those apps, don't connect to the mongod as a replicaset then
[13:21:18] <kees_> hm, good point, i thought it was mandatory if you used a replicaset :)
[13:21:52] <Derick> no, just "mongodb://localhost:27017" should do
[13:22:07] <Derick> unless the cdriver does something clever
[13:22:20] <cheeser> you run the risk of connecting to a secondary that way
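A minimal pymongo illustration of the two connection styles being discussed (other drivers accept the same URI format); the host names and the replica set name rs0 are assumptions:

    from pymongo import MongoClient

    # replica-set aware: the driver discovers and monitors every member
    rs_client = MongoClient("mongodb://node1:27017,node2:27017/?replicaSet=rs0")

    # direct connection to a single mongod: no topology discovery, but if that
    # node is (or becomes) a secondary, writes will fail and reads need a
    # suitable read preference
    local_client = MongoClient("mongodb://localhost:27017/?directConnection=true")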
[16:34:14] <StephenLynx> then you should check how to insert the file in gridfs.
[16:34:20] <StephenLynx> that is, if you wish to store files there.
[16:34:42] <StephenLynx> after that you have to learn how to serve them
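A minimal pymongo/gridfs sketch of both steps, storing a file and reading it back to serve it; the database and file names are hypothetical:

    import gridfs
    from pymongo import MongoClient

    db = MongoClient()["mydb"]
    fs = gridfs.GridFS(db)

    # store: the file is split into chunks in fs.chunks with metadata in fs.files
    with open("report.pdf", "rb") as f:
        file_id = fs.put(f, filename="report.pdf")

    # serve: fetch the file by its id and stream/read the bytes back out
    data = fs.get(file_id).read()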
[18:41:56] <AlmightyOatmeal> i'm using Python to cache ElasticSearch data in MongoDB but i was wondering if there was a bulk operation that wouldn't throw an exception if a duplicate key exists but either silently ignore it (and continue processing the remainder of the batch) or update the existing object if it already exists?
[18:43:15] <StephenLynx> did you try bulkwrite? I don't remember if it would stop everything when hitting a duplicate unique index.
[18:43:53] <GothAlice> AlmightyOatmeal: If you want to insert, or update if existing, you want the "upsert" option.
[18:44:31] <GothAlice> If you're using bulk operations, https://docs.mongodb.com/manual/reference/method/Bulk.find.upsert/ or your driver's equivalent may be useful. Otherwise it's just an option to the .update() call.
[19:01:12] <AlmightyOatmeal> GothAlice: also, i like the nick :)
[19:23:33] <catphish> so, i don't understand memory mapped files very well, but it seems to me that mongo maps as much as possible, which rather upsets my free memory monitoring mechanism, is there some way to find out how much memory is mapped (but actually available to other processes)?
[19:25:20] <GothAlice> Your monitoring should be measuring RSS, not VSZ.
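A minimal sketch of that distinction using psutil (psutil and the pid are assumptions, not anything mentioned above):

    import psutil

    MONGOD_PID = 12345                        # hypothetical mongod pid
    mem = psutil.Process(MONGOD_PID).memory_info()
    # monitor rss (resident memory); vms also counts mapped files that the
    # kernel can reclaim for other processes
    print("rss:", mem.rss, "vms:", mem.vms)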
[19:25:51] <txm> Hi. AES encryption at rest, https://docs.mongodb.com/manual/core/security-encryption-at-rest/, can this be done with the open source version or enterprise only? I note “Available for the WiredTiger Storage Engine only.”
[19:26:37] <GothAlice> txm: Right above the warning mentioning it's available for WiredTiger is another warning: Available in MongoDB Enterprise only.
[19:36:34] <AlmightyOatmeal> StephenLynx: i didn't see your response earlier, my apologies. afaik all bulk/batch operations will fail if a duplicate key is found. there is a method that would let me do update calls instead of inserts, but that would mean iterating over the list of data i want to send to build a new array of operations, which introduces additional overhead :(
[19:43:12] <StephenLynx> yeah, then you'd like to look at upsert, like alice said
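A minimal pymongo sketch of the suggested approach: one bulk_write of UpdateOne operations that upsert on the key which would otherwise raise the duplicate-key error (collection and field names are hypothetical):

    from pymongo import MongoClient, UpdateOne

    coll = MongoClient()["cache"]["es_docs"]   # hypothetical names

    docs = [{"doc_id": "a", "body": "..."}, {"doc_id": "b", "body": "..."}]
    # upsert on the unique key: insert if missing, update in place if present
    ops = [UpdateOne({"doc_id": d["doc_id"]}, {"$set": d}, upsert=True) for d in docs]
    coll.bulk_write(ops, ordered=False)        # ordered=False keeps going past any error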
[20:04:43] <Doyle> Doesn't this: "Use CNAMEs to identify your config servers to the cluster so that you can rename and renumber your config servers without downtime." conflict with: "If you start a mongos instance with a string that does not exactly match the string used by the other mongos instances in the cluster, the mongos instance returns a Config Database String Error error and refuses to start."?
[20:04:46] <Doyle> as seen here: https://docs.mongodb.com/v3.0/core/sharded-cluster-config-servers/, https://docs.mongodb.com/v3.0/tutorial/deploy-shard-cluster/
[20:11:59] <kuku1g> Is it possible to have a single node in two replica sets? for example, i want node1 to be in rs0 and rs1
[20:13:16] <kuku1g> @cheeser I have 5 nodes and want to split up my data evenly but have data only replicated once. How do I achieve it then? I wanted to have 5 replica sets but this won't work then
[20:13:46] <cheeser> as in, the data should only be on one node?
[20:14:05] <kuku1g> No, sorry, I meant one replication of the data. So data is available twice
[20:17:19] <kuku1g> @cheeser When I create a replica set with 3 nodes and one with 2 nodes, data is only split across two shards. I want to benchmark Cassandra vs MongoDB for my use case. I think the setup is not equal.
[20:17:36] <kuku1g> In my Cassandra ring, data is evenly distributed over the 5 nodes.
[20:18:17] <Doyle> Nevermind. I found the line I remembered reading. "These records make it possible to change the IP address or rename config servers without changing the connection string and without having to restart the entire cluster."
[20:25:40] <catphish> GothAlice: thanks, i was actually mistaken, my memory monitoring works fine, and only looks at rss :)
[20:27:45] <kuku1g> @cheeser or do i misunderstand the concept of a sharded collection? can I have one replica set of 5 nodes and one sharded collection that will distribute all data across the nodes evenly?
[20:28:21] <kuku1g> it doesnt work that way right? one replica set will cause all nodes in the RS to have the same data?
[20:33:06] <cheeser> replica sets and sharding are two different things, kuku1g
[20:35:45] <kuku1g> does that mean that I can have a collection sharded on my 5 nodes without needing the replica sets?
[20:36:28] <kuku1g> as in "the replica sets are just there to maintain availability in case of hardware failure"?
[20:40:49] <cheeser> a shard typically consists of a replica set.
[20:41:04] <kuku1g> I just want to benchmark. And benchmark the actual system as well as possible.
[20:41:14] <cheeser> so a sharded cluster with 5 shards will be at least 15 machines: 5 replica sets of 3 members each
[20:41:26] <kuku1g> If a replica set gives faster reads because the data lives on more machines, then it's not equivalent to my cassandra setup!
[20:41:42] <cheeser> you don't *have* to have a 3 member replSet but for production that's the lowest recommendation
[20:42:23] <kuku1g> I see. that's for production though. I want to benchmark the system performance for now
[20:43:22] <kuku1g> I think the best solution I can come up with is a sharded collection without a replica set for mongodb so data gets evenly distributed on the 5 nodes. as for cassandra, i would also use no replication.
[20:44:56] <kuku1g> do you have a rule of thumb on how much a replica set increases the read performance for mongodb?
[20:45:17] <kuku1g> like in "it does affect it a lot" or "not so much"
[20:48:00] <kuku1g> I know that it will not affect write performance because only the primary member will be used to upsert data
[20:58:57] <kuku1g> @cheeser the same approach has been used by Datastax for a YCSB benchmark here: http://www.datastax.com/wp-content/themes/datastax-2014-08/files/NoSQL_Benchmarks_EndPoint.pdf
[20:59:18] <kuku1g> So it looks pretty reasonable. Thanks for your help.
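A minimal pymongo sketch of the setup kuku1g settles on: a collection sharded across the nodes with a hashed key so the data spreads evenly (the mongos address, database, collection and shard key are all assumptions):

    from pymongo import MongoClient

    mongos = MongoClient("mongodb://mongos.example:27017")   # hypothetical mongos
    mongos.admin.command("enableSharding", "bench")
    # a hashed shard key distributes documents evenly across the shards
    mongos.admin.command("shardCollection", "bench.events", key={"_id": "hashed"})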
[21:41:36] <Robbilie> i am running into a mongoerror the past week and i cant really identify the source of it
[21:42:01] <Robbilie> MongoError: operation exceeded time limit, code 50 in particular
[21:42:30] <Robbilie> would be great if anyone could advise me on finding the source of this error :)
[21:43:13] <Robbilie> running an oplog enabled 3.2 instance accessed via a nodejs application
[21:47:22] <crazyphil> if I have a sharded 2.6 setup, and I need to turn auth on (mongodb-cr specifically), do I need to set every mongod instance for --auth, or only my config servers, or everything?
[22:16:34] <Robbilie> this channel is so active yet so inactive
[22:20:37] <joannac> It's for questions, so if no one has questions about MongoDB then yes, it is quiet
[22:27:27] <Robbilie> joannac: me and the other guy asked questions though :D
[22:29:07] <joannac> Robbilie: operation exceeded time limit -> probably maxTimeMS
[22:29:53] <Robbilie> and i set it to 14 days and it still happened
[22:30:16] <joannac> crazyphil: need to turn on auth for everything. https://docs.mongodb.com/v3.2/tutorial/enforce-keyfile-access-control-in-existing-sharded-cluster/
[22:38:22] <joannac> open a ticket in https://groups.google.com/forum/#!forum/mongodb-user
[22:38:33] <joannac> I don't know and don't have time to dig further right now
[22:39:24] <joannac> but I haven't seen that message for anything other than maxTimeMS, so it's weird that you're getting it without setting maxTimeMS
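A minimal pymongo illustration of where this error usually originates, a per-operation maxTimeMS that the server enforces (collection name and the 100 ms limit are hypothetical; the node.js driver exposes an equivalent option):

    from pymongo import MongoClient
    from pymongo.errors import ExecutionTimeout

    coll = MongoClient()["app"]["events"]
    try:
        # the server aborts this query once it has run for 100 ms
        list(coll.find({}).max_time_ms(100))
    except ExecutionTimeout:
        print("operation exceeded time limit")   # surfaces as code 50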
[22:48:53] <Robbilie> joannac: well, looking at the mongos server source, it's mentioned alongside some deadline/replication code, so maybe it's also related to that…
[23:01:20] <Robbilie> i posted a topic in the forum but i cant find it there wtf…