[00:26:53] <Doyle> What's the best method to backup a single sharded collection?
[00:27:15] <Doyle> Do you have to complete the entire backup sharded cluster with database dumps procedure?
[00:28:35] <Doyle> That document assumes a full backup, and so involves the config server. If you're only doing a single collection, would that be necessary?
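(For a single collection, a plain mongodump scoped to that namespace is usually enough; a minimal sketch, with hypothetical host, database, and collection names — note this does not give a cluster-consistent snapshot the way the full procedure does:)
```
# dump only one collection through a mongos; names are placeholders
mongodump --host mongos.example.net:27017 --db mydb --collection mycoll --out /backups/mycoll
```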
[01:14:13] <bros> I have 4 collections related to billing in my production database. Are they negatively affecting anything by being in the same database and not a separate one?
[06:21:33] <Maziz> Hi all. Anybody know if https://github.com/mongodb/mongo-java-driver/blob/master/driver-core/src/main/com/mongodb/connection/WriteCommandResultHelper.java#L98 is returning what type of Bson? Array? Number?
[07:55:37] <webjay> Should I index an array of ObjectId's?
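(For what it's worth, an index on an array field becomes a multikey index, with one index entry per array element; a minimal sketch with hypothetical collection and field names:)
```
// indexing an array of ObjectIds yields a multikey index:
// one index entry per element of the array
db.posts.createIndex({ authorIds: 1 })
```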
[08:29:59] <Bodenhaltung> Is it possible to use a regex query with $match in an aggregation? It doesn't seem so.
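(For reference, $match accepts regular-expression queries the same way find() does; a minimal sketch, collection and fields hypothetical:)
```
// $match supports $regex / regex literals just like a find() filter
db.articles.aggregate([
  { $match: { title: { $regex: /^mongo/i } } },
  { $group: { _id: "$author", count: { $sum: 1 } } }
])
```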
[08:41:39] <jokke> kurushiyama: thanks for the insight. can you explain what exactly document migrations and padding mean, and what impact they had on the database schema?
[08:44:02] <jokke> kurushiyama: so to draw a conclusion: performance-wise it's not necessary to pre-allocate any embedded documents anymore, since documents can grow without causing performance issues?
[08:44:55] <jokke> kurushiyama: the only thing stopping people using documents per event for time series based data is the resulting huge index
[08:45:20] <kurushiyama> jokke: Not quite. Overembedding still will kill you due to its mechanics, not the least the CRUD problem. For time series data, not using a doc per event is a good way to get you into trouble.
[08:45:23] <jokke> so it still makes sense to group time series data into aggregation documents.
[09:22:39] <saira_123> hi professionals, please guide me on how MongoDB latency and throughput relate to document size? i have made this chart but am not able to find theory to support it. http://www.tiikoni.com/tis/view/?id=b0bc2fe i also captured mongostat during those times but it did not show high values of % dirty / % used for the wiredTiger storage engine
[10:19:48] <kurushiyama> saira_123: Latency for what? How was this measured? What was the write concern? This chart, by itself, says nothing.
[10:29:37] <kurushiyama> saira_123: And we are talking of microseconds. To put it into perspective: A memcpy alone on 256 bytes may take several microseconds.
[10:57:09] <Ange7> Hey all. I'm using Ubuntu Server LTS 14.04 with mongodb 3.2.5 and the pecl package mongo 1.6.13 (MongoDB database driver for PHP). I try batchInsert() with 50,000 documents and the option continueOnError = TRUE, but I get an exception: PHP Fatal error: Uncaught exception 'MongoDuplicateKeyException' E11000 duplicate key error collection. Does someone know why? Thank you
[11:13:26] <saira_123> kurushiyama bundle of thanks for the reply, yes the latency is in µs
[11:13:42] <saira_123> kurushiyama and the write concern was 0, only a single node, no replica
[11:14:51] <kurushiyama> saira_123: "A microsecond is an SI unit of time equal to one millionth (0.000001, 10^-6, or 1/1,000,000) of a second. Its symbol is μs." The latter is often written as us, no?
[11:20:09] <kurushiyama> saira_123: So we are basically talking of less than 1/100,000th of a second in your worst case. I'd guess the variance you see is directly related to the allocations necessary and the fuzziness of the metering.
[11:29:34] <kurushiyama> saira_123: Pardon me, but "Premature optimization is the root of all evil." (Donald Knuth)
[11:30:27] <saira_123> kurushiyama here is the iostat, please help me find the SEEK TIME? https://bpaste.net/show/77fcb29163af
[11:30:42] <kurushiyama> saira_123: Wrong data models and wrong or missing indices are likely to cause much bigger problems than latencies of 900μs
[11:30:58] <kurushiyama> saira_123: Seek time is a physical limitation of the hard drives
[11:31:26] <kurushiyama> saira_123: iostat is far too removed from the hardware to be able to measure it.
[11:31:31] <saira_123> yes so how do i measure seek time at run time during Insert?
[11:31:41] <kurushiyama> saira_123: You _can not_.
[11:32:21] <saira_123> i used http://www.linuxinsight.com/how_fast_is_your_disk.html to measure seek time but it gives almost the same result every time
[11:32:56] <kurushiyama> saira_123: Again, you overdo it a bit. Far more important questions (and I am talking about 5-8 orders of magnitude) are correct data modelling, indexing and query optimization.
[11:34:46] <saira_123> kurushiyama :) how could a simple insert be related to data modeling and optimization? i am not building any complex indexes, just the default index
[11:35:25] <saira_123> kurushiyama i am measuring MongoDB performance using a benchmarking tool called YCSB (Yahoo Cloud Serving Benchmark)
[11:35:25] <kurushiyama> saira_123: Translated: a badly modelled, badly indexed collection queried with a wrongly crafted query may take seconds or (depending on the data size) minutes to complete, whereas a properly modelled, well-indexed collection queried with an optimized query can complete in a high two- to low three-digit number of milliseconds.
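(The usual way to see which side of that divide a query falls on is explain(); a minimal sketch, names hypothetical:)
```
// "COLLSCAN" in the winning plan means no index was used; "IXSCAN" means one was
db.users.createIndex({ email: 1 })
db.users.find({ email: "a@example.com" }).explain("executionStats")
```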
[11:36:10] <kurushiyama> saira_123: Well, first of all you would need to identify WHAT you want to benchmark. Inserts alone are a little useless, no?
[11:36:33] <saira_123> kurushiyama yes after i understand insert then i will move to read
[11:40:29] <saira_123> ok but how will i measure seek time there vs throughput vs latency?
[11:41:22] <kurushiyama> Unless you have a few 100k to spare to set up a lab for that kind of stuff, it is very hard to measure seek times correctly.
[11:42:57] <saira_123> kurushiyama hey thanks for this blog, i am sure this will help me a lot although there are few technical details in it
[11:43:39] <kurushiyama> saira_123: Well, if you read closely, they are enough for _practical_ purposes.
[11:43:59] <saira_123> kurushiyama thanks i will try to reproduce this
[11:45:29] <kurushiyama> ashevchuk: The problem is the following: to measure the seek time you'd actually have to measure the time it takes for the heads to move to the correct position and for the platters to spin to the correct position.
[11:51:06] <saira_123> kurushiyama bundle of thanks for your help dear you saved my day
[11:54:20] <kurushiyama> saira_123: You are welcome!
[12:00:03] <owg1> So far: http://i.imgur.com/Bc8MwIu.png
[12:00:24] <owg1> Was it you kurushiyama that predicted xfs would win?
[12:02:05] <kurushiyama> owg1: I predicted that, yes. Compared to zfs. I am a bit surprised by ext4, I have to admit. Can you disclose the test details? Most importantly, was atime turned off?
[12:03:02] <kurushiyama> owg1: Sample size etc would be very interesting, too. Plus the data, ofc.
[12:03:48] <owg1> kurushiyama: I'm still working on my methodology, so it is very much a WIP. I'm using all the default options for mongodb, just changing the filesystem.
[12:04:06] <kurushiyama> owg1: And did you switch the FS on sdc and sdd?
[12:41:54] <Derick> for more info about the new driver: http://mongodb.github.io/mongo-php-library/
[12:45:21] <Ange7> Hey all. I'm using Ubuntu Server LTS 14.04 with mongodb 3.2.5 and the pecl package mongo 1.6.13 (MongoDB database driver for PHP). I try batchInsert() with 50,000 documents and the option continueOnError = TRUE, but I get an exception: PHP Fatal error: Uncaught exception 'MongoDuplicateKeyException' E11000 duplicate key error collection. Does someone know why? Thank you
[12:46:23] <Derick> Ange7: you are setting the same _id for all documents...
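(That failure mode is easy to reproduce in the shell; a minimal sketch:)
```
// the second insert fails with E11000, because _id must be unique
db.test.insert({ _id: 1, name: "first" })
db.test.insert({ _id: 1, name: "second" })  // E11000 duplicate key error
```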
[15:01:15] <jokke> kurushiyama: i just ran my own little db inserter and i can see immediately that the index will be too big for 3 months' worth of data
[15:02:06] <jokke> especially since it needs to be able to scale for more sensors
[15:02:20] <kurushiyama> jokke: You will need to scale anyway.
[15:02:33] <kurushiyama> jokke: There is _no_ way around that
[15:02:48] <jokke> sure but will the index be sharded as well?
[15:03:14] <kurushiyama> Indices live on the respective shard
[15:03:52] <kurushiyama> Operation flow: App => Mongos => shards holding the data requested.
[15:04:48] <jokke> ok so basically i can take the index size, find a divisor that gives me an index that will fit into ram, and then that divisor gives me the number of shards i need
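(That back-of-the-envelope calculation can be scripted from the collection stats; a minimal sketch, assuming a hypothetical 16 GB per-shard RAM budget for indexes:)
```
// estimate the shard count needed for the index to fit in RAM; numbers are assumptions
var stats = db.events.stats()
var ramPerShard = 16 * 1024 * 1024 * 1024  // assumed 16 GB index budget per shard
var shards = Math.ceil(stats.totalIndexSize / ramPerShard)
print("index bytes: " + stats.totalIndexSize + ", shards needed: " + shards)
```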
[16:07:03] <cpama> just wondering how practical it is for retrieving data. most of the time, i will be using it just to populate drop down lists on a web page
[16:51:18] <oblio> so this morning two of my replica sets managed to do this: assertion 13436 not master or secondary; cannot currently read from this replSet member ns:config.settings query:{}
[16:53:17] <kurushiyama> oblio: Sounds like an infrastructure problem. DNS or network partitioning would be my first bet.
[16:53:48] <kurushiyama> oblio: can you pastebin their rs.status()?
[16:56:59] <oblio> kurushiyama: thanks, you gave me a hunch.. checking it out
[17:05:43] <DKFermi> hi folks - I have a flask/mongo application and want to use ObjectId - the code is in python2.7 and when I do pip install bson, it fails; however, https://api.mongodb.org/python/current/installation.html states that i shouldn't try to install the bson package as pymongo comes with it - can someone please clarify this for me?
[17:07:34] <kurushiyama> dddh_: Pfff. It uses steam. Muscle-powered, that's what I call legacy...
[17:09:23] <dddh_> kurushiyama: asked for more servers at work and received 8-year-old ones
[17:09:43] <kurushiyama> dddh_: As a doorstop, I assume ;)
[17:10:20] <kurushiyama> dddh_: footrest would be a different application, depending on the form factor.
[17:13:54] <kurushiyama> dddh_: Getting serious again: I would not trust the harddrives any bit.
[17:14:58] <dddh_> kurushiyama: raid10 and replication
[17:16:03] <kurushiyama> dddh_: Well, at least sth. Make sure you get notified on failure. Chances are the drives will fail close to each other. Hope you have some spare.
[17:16:56] <oblio> kurushiyama: there is something funky going on w/ dns, but i went in and manually populated the hosts files for all the instances
[17:27:50] <kurushiyama> oblio: PMing you, that requires a more in depth discussion.
[18:57:47] <surge_> My config file options are specified in the old “key = value” format, so I’m wondering if I can still keep my config as such for enabling journaling in 3.2 for WiredTiger.
[18:58:21] <surge_> According to the docs, I have to write it in YAML as storage: journal: enabled: true. Can I still use the old syntax somehow?
[19:00:53] <surge_> Or am I finally being forced off this syntax? =P
[19:18:47] <cheeser> afaik, the old syntax is still supported.
[19:18:59] <cheeser> but the yaml format is much nicer to work with.
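(For reference, the YAML form from the docs nests like this; the old `key = value` journaling option maps onto it:)
```
# storage options in the 3.x YAML config format
storage:
  journal:
    enabled: true
```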
[19:33:27] <MattB> each salary is a different person
[19:33:32] <kurushiyama> Use a bulk operation instead
[19:33:57] <MattB> ok, help on that please - sorry, this was easy in SQL, but so many things are easier here LOL
[19:34:51] <kurushiyama> MattB: Actually, it would not be too easy with SQL because of the different salaries, if I am not mistaken here. SQL has been a while, and T-SQL... Read it somewhere ;)
[19:35:16] <kurushiyama> Ok, let me write a quick example.
[19:43:02] <kurushiyama> but in that case, I'd rather do db.collection.update({_id:csvrecord.ObjectId},{$set:{salary:csvrecord.NewSalary}}); db.collection…
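(A bulk version of that per-record update, as suggested above, might look like this; a minimal sketch, with csvRecords as a hypothetical array of parsed CSV rows:)
```
// queue one update per CSV record, then send them as a single bulk operation
var bulk = db.collection.initializeUnorderedBulkOp()
csvRecords.forEach(function (rec) {
  bulk.find({ _id: rec.ObjectId }).updateOne({ $set: { salary: rec.NewSalary } })
})
bulk.execute()
```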
[21:14:38] <kurushiyama> CustosLimen: Listen, then complain
[21:15:22] <kurushiyama> CustosLimen: The thing is that on db.yourcoll.remove({}), the key ranges for each shard should not change
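(The distinction being: remove({}) deletes the documents but keeps the collection, its indexes, and the chunk/key-range metadata, while drop() discards all of that; a minimal sketch:)
```
// empties the collection but preserves the existing chunk ranges
db.yourcoll.remove({})
// drop() would also throw away the sharding metadata and any pre-splits
// db.yourcoll.drop()
```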
[21:15:29] <CustosLimen> kurushiyama, so if I use one config server, then balancing about 2000 chunks amongst 3 shards takes about 10 minutes - whereas with 3 config servers, balancing the same takes > 1 hour
[21:18:23] <CustosLimen> kurushiyama, I created collection, stopped balancer, ran manual splits (to pre-split before loading data) and then I started balancer
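(That workflow in shell terms, with a hypothetical namespace and shard key:)
```
// stop the balancer, pre-split on chosen shard-key values, then restart it
sh.stopBalancer()
sh.splitAt("mydb.mycoll", { userId: 1000 })
sh.splitAt("mydb.mycoll", { userId: 2000 })
sh.startBalancer()
```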
[21:18:37] <kurushiyama> CustosLimen: Now you got me.
[21:18:41] <CustosLimen> kurushiyama, my collection is currently empty - this is before importing data
[21:18:54] <CustosLimen> kurushiyama, the reason why the time concerns me is because I do this many times as part of testing
[21:20:14] <CustosLimen> kurushiyama, it has no problem, except that it just takes longer than I would like
[21:20:23] <kurushiyama> CustosLimen: Well, I once had a problem similar to this. It was time-related, as obvious as this might sound.
[21:20:23] <CustosLimen> but I think I will just use remove({}) this time
[21:20:28] <CustosLimen> I'm sure it cannot be this slow
[21:20:48] <CustosLimen> kurushiyama, you mean time was out of sync ?
[21:21:09] <kurushiyama> CustosLimen: Pretty painful for balancer locks.
[21:22:13] <CustosLimen> kurushiyama, one config server is using different NTP - but its in different city also
[21:22:18] <CustosLimen> and it uses the ntp in its city
[21:22:19] <kurushiyama> CustosLimen: Interestingly enough, the replica sets sort of worked, despite the fact they were getting ops from the future.
[21:22:32] <CustosLimen> both sites have ntp from GPS though
[21:22:54] <CustosLimen> they are at least within 1 second of each other - but better than this I cannot really check
[21:23:59] <kurushiyama> CustosLimen: Well, I'd double check that they are not more than 1 sec apart, and I'd probably set them to UTC anyway. It is to be checked whether various TZs, especially for the mongos, may cause... ...interesting side effects.
[21:24:31] <CustosLimen> kurushiyama, they are in same timezone
[21:24:59] <kurushiyama> If the TZ matches, and they are within a second, that should work.
[21:28:44] <kurushiyama> Ok, from the start. There can be only one chunk migration at any given time, which should be rather fast here, since mongos basically instructs a transfer of no data before updating the metadata.
[21:31:19] <kurushiyama> Hm. 3600 s / 2k chunks would be like 1.8 secs/chunk; 600 s / 2k would be 0.3 secs/chunk. Even with some give and take, that would make a 3-config-server setup roughly half as fast as a single-config-server setup.
[21:31:51] <kurushiyama> Per config server, that is.
[21:33:12] <kurushiyama> CustosLimen: Do you have multiple mongos? And if yes, do they have the same order of config servers?
[21:37:15] <kurushiyama> CustosLimen: Hm, you could try a bulk remove. But I'd keep an eye on the config server setup. Something "feels" wrong there. I feel it in my bones...
[22:00:50] <CustosLimen> can I somehow prevent some mongos from doing balancing ?
[22:02:43] <kurushiyama> CustosLimen: iirc, disabling the balancer is per mongos... Wait a sec, there was something.
[22:04:06] <CustosLimen> kurushiyama, sh.stopBalancer(); disables balancing for whole sharded cluster
[22:04:17] <CustosLimen> kurushiyama, it basically affects some value in config db
[22:04:24] <kurushiyama> CustosLimen: Yeah, was "any" balancer, had "every" in mind
[22:05:26] <CustosLimen> kurushiyama, I have some balancers running on servers that have the application on them - and they are just there cos the c++ client does not have failover/ha support
[22:05:47] <CustosLimen> and I would prefer they dont do balancing
[22:06:06] <kurushiyama> CustosLimen: Well, you can always set a balancer window.
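(Setting a balancer window is done in the config database via a mongos; a minimal sketch restricting balancing to a hypothetical night window:)
```
// connect via mongos; restrict balancing to 23:00-06:00 server time
use config
db.settings.update(
  { _id: "balancer" },
  { $set: { activeWindow: { start: "23:00", stop: "06:00" } } },
  { upsert: true }
)
```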
[22:13:50] <kurushiyama> CustosLimen: Curious. Didn't know that the C++ drivers do not support failover. Highly unusual. Which one are you using?