PMXBOT Log file Viewer


#mongodb logs for Thursday the 21st of April, 2016

[00:26:53] <Doyle> What's the best method to backup a single sharded collection?
[00:27:15] <Doyle> Do you have to go through the entire "Back Up a Sharded Cluster with Database Dumps" procedure?
[00:28:35] <Doyle> That document assumes a full backup, and so involves the config server. If you're only doing a single collection, would that be necessary?
[01:14:13] <bros> I have 4 collections related to billing in my production database. Are they negatively affecting anything by being in the same database and not a separate one?
[06:21:33] <Maziz> Hi all. Does anybody know what type of BSON value https://github.com/mongodb/mongo-java-driver/blob/master/driver-core/src/main/com/mongodb/connection/WriteCommandResultHelper.java#L98 returns? An array? A number?
[07:55:37] <webjay> Should I index an array of ObjectId's?
[08:29:59] <Bodenhaltung> Is it possible to use a regex query with $match in an aggregation? It doesn't seem so.
[08:41:39] <jokke> kurushiyama: thanks for the insight. can you explain what exactly document migrations and padding mean and what impact they have on the database schema?
[08:42:05] <kurushiyama> jokke: http://blog.mahlberg.io/blog/2015/11/05/data-modelling-for-mongodb/
[08:43:00] <jokke> kurushiyama: i see
[08:44:02] <jokke> kurushiyama: so to draw a conclusion: performance wise it's not necessary to pre-allocate any embedded documents anymore since documents can grow without causing performance issues?
[08:44:55] <jokke> kurushiyama: the only thing stopping people using documents per event for time series based data is the resulting huge index
[08:45:20] <kurushiyama> jokke: Not quite. Overembedding will still kill you due to its mechanics, not least the CRUD problem. For time series data, not using a doc per event is a good way to get into trouble.
[08:45:23] <jokke> so it still makes sense to group time series data into aggregation documents.
[08:45:29] <kurushiyama> NEVER
[08:45:34] <jokke> ?
[08:45:47] <kurushiyama> ALWAYS document a point in time.
[08:45:59] <kurushiyama> If you need an aggregation, use the aggregation framework
[08:46:29] <kurushiyama> jokke: Do you have time in like 45 minutes? I have to finish something first
[08:46:34] <jokke> kurushiyama: that's really a very bad option in our case since we would get something like 3600 documents per second that way
[08:46:40] <jokke> kurushiyama: sure
[08:46:58] <kurushiyama> jokke: Better than embedding 3600 docs / s
[08:47:03] <kurushiyama> jokke: later
[08:47:09] <jokke> ok later
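As a rough illustration of kurushiyama's advice (one document per point in time, with aggregation done at query time via the aggregation framework), here is a minimal mongo shell sketch; the collection and field names (readings, sensorId, ts, value) are assumptions, not from the discussion:

    // one document per measurement (assumed schema)
    db.readings.insert({ sensorId: 42, ts: ISODate("2016-04-21T08:45:00Z"), value: 3.7 })

    // roll up per sensor and hour at query time instead of pre-aggregating inside documents
    db.readings.aggregate([
      { $match: { ts: { $gte: ISODate("2016-04-21T00:00:00Z"), $lt: ISODate("2016-04-22T00:00:00Z") } } },
      { $group: { _id: { sensor: "$sensorId", hour: { $hour: "$ts" } },
                  avg: { $avg: "$value" }, count: { $sum: 1 } } }
    ])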
[09:22:39] <saira_123> hi professionals, please guide me: how are MongoDB latency and throughput related to document size? I have made this chart but am not able to find theory to support it. http://www.tiikoni.com/tis/view/?id=b0bc2fe I also captured mongostat during those runs, but it did not show high values of % dirty / % used for the WiredTiger storage engine
[10:06:36] <kurushiyama> jokke: Back
[10:19:48] <kurushiyama> saira_123: Latency for what? How was this measured? What was the write concern? This chart, by itself, says nothing.
[10:29:37] <kurushiyama> saira_123: And we are talking of microseconds. To put it into perspective: A memcpy alone on 256 bytes may take several microseconds.
[10:57:09] <Ange7> Hey all. I'm using Ubuntu Server LTS 14.04 with mongodb 3.2.5 and the pecl package mongo 1.6.13 (the MongoDB database driver for PHP). I try batchInsert() with 50,000 documents and the option continueOnError = TRUE, but I get an exception: PHP Fatal error: Uncaught exception 'MongoDuplicateKeyException': E11000 duplicate key error collection. Does someone know why? Thank you
[11:00:34] <Derick> Ange7: share the script
[11:13:26] <saira_123> kurushiyama bundle of thanks for the reply, yes the latency is in us
[11:13:42] <saira_123> kurushiyama and the write concern was 0, only a single node, no replica
[11:14:51] <kurushiyama> saira_123: "A microsecond is an SI unit of time equal to one millionth (0.000001, or 10⁻⁶, or 1/1,000,000) of a second. Its symbol is μs." The latter is often written as us, no?
[11:15:24] <Derick> i'd use µs myself
[11:15:48] <kurushiyama> Derick: me too, but what else would that be supposed to mean?
[11:15:58] <Derick> a millionth of a second
[11:16:00] <Derick> nothing else
[11:20:09] <kurushiyama> saira_123: So we are basically talking of less than 1/100,000th of a second in your worst case. I'd guess the variance you see is directly related to the allocations necessary and the fuzziness of the metering.
[11:20:38] <saira_123> yes kurushiyama
[11:20:49] <saira_123> kurushiyama yes I am talking about microseconds
[11:21:32] <saira_123> kurushiyama I can totally understand why latency will increase as throughput increases
[11:22:19] <saira_123> kurushiyama but I do not have any theoretical knowledge to relate record size (individual document size) to latency/throughput
[11:22:56] <saira_123> kurushiyama what is interesting is that at the time throughput is highest, the disk is still not 100 percent utilized
[11:23:19] <saira_123> kurushiyama I do not understand why MongoDB is not able to fully use the disk's writes per second
[11:23:41] <kurushiyama> saira_123: What?
[11:24:06] <saira_123> kurushiyama can you point me to some literature / a document which correlates disk performance with MongoDB performance?
[11:26:42] <kurushiyama> saira_123: Use SSDs. Period. Seek times of HDDs will kill you.
[11:27:31] <kurushiyama> saira_123: https://blog.serverdensity.com/mongodb-performance-ssds-vs-spindle-sas-drives/
[11:27:45] <saira_123> kurushiyama Please correct me: isn't seek time only about reading from disk?
[11:27:52] <kurushiyama> saira_123: No
[11:28:02] <saira_123> kurushiyama I do not have an SSD, I only have SATA; how do I measure seek time?
[11:28:11] <kurushiyama> saira_123: Documents need to be placed in a defined position in the data files.
[11:28:25] <kurushiyama> saira_123: And documents are _never_ fragmented.
[11:28:36] <saira_123> kurushiyama I am using the iostat command to measure disk performance but it does not include seek time?
[11:28:49] <kurushiyama> saira_123: How could it?
[11:29:34] <kurushiyama> saira_123: Pardon me, but "Premature optimization is the root of all evil." (Donald Knuth)
[11:30:27] <saira_123> kurushiyama here is the iostat output, please help me find the SEEK TIME? https://bpaste.net/show/77fcb29163af
[11:30:42] <kurushiyama> saira_123: Wrong data models and wrong or missing indices are likely to cause much bigger problems than latencies of 900μs
[11:30:58] <kurushiyama> saira_123: Seek time is a physical limitation of hard drives
[11:31:26] <kurushiyama> saira_123: iostat is far too far removed from the hardware to be able to measure it.
[11:31:31] <saira_123> yes so how do I measure seek time at run time during inserts?
[11:31:41] <kurushiyama> saira_123: You _can not_.
[11:32:21] <saira_123> I used http://www.linuxinsight.com/how_fast_is_your_disk.html to measure seek time but it gives almost the same result every time
[11:32:56] <kurushiyama> saira_123: Again, you overdo it a bit. By far more important questions (and I am talking of 5-8 orders of magnitude) are correct data modelling, indexing and query optimization.
[11:34:46] <saira_123> kurushiyama :) how could a simple insert be related to data modeling and optimization? I am not building any complex indexes, just the default index
[11:35:25] <saira_123> kurushiyama I am measuring MongoDB performance using a benchmarking tool called YCSB (Yahoo Cloud Serving Benchmark)
[11:35:25] <kurushiyama> saira_123: Translated: a badly modelled, badly indexed collection queried with a wrongly crafted query may take seconds or (depending on the data size) minutes to complete, whereas a properly modelled, well-indexed collection queried with an optimized query can finish in a high two-digit to low three-digit number of milliseconds.
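A hedged illustration of the gap kurushiyama describes, reduced to indexing alone; the collection and field names are invented, and the shell's explain() is used to show the difference:

    // without an index on "salary" this is a full collection scan (COLLSCAN)
    db.people.find({ salary: { $gt: 100000 } }).explain("executionStats")

    // with an index the same query becomes an IXSCAN touching only the matching range
    db.people.createIndex({ salary: 1 })
    db.people.find({ salary: { $gt: 100000 } }).explain("executionStats")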
[11:36:10] <kurushiyama> saira_123: Well, first of all you would need to identify WHAT you want to benchmark. Inserts alone are a little useless, no?
[11:36:33] <saira_123> kurushiyama yes after i understand insert then i will move to read
[11:36:44] <kurushiyama> saira_123: See.
[11:36:55] <saira_123> kurushiyama i want to benchmark throughput vs seek time
[11:36:59] <saira_123> kurushiyama i want to benchmark throughput vs seek time vs latency
[11:37:04] <kurushiyama> How much money do you have?
[11:37:26] <saira_123> kurushiyama i am a student :) so relying on test platforms
[11:37:27] <kurushiyama> Hint: you'll need _a lot_.
[11:37:41] <saira_123> kurushiyama :D
[11:38:08] <kurushiyama> saira_123: And your test setup is flawed, imho
[11:38:29] <kurushiyama> saira_123: Seek time is irrelevant for performance _measuring_.
[11:38:38] <saira_123> my test setup is single node with 24 GB RAM
[11:38:59] <kurushiyama> saira_123: It is important for the reasons _why_ HDDs are worse than SSDs.
[11:39:11] <kurushiyama> saira_123: I am not talking of your HW
[11:39:22] <kurushiyama> saira_123: I am talking of what you measure and why.
[11:39:36] <saira_123> so should I run the same measurements on an SSD?
[11:39:49] <kurushiyama> Aye.
[11:40:29] <saira_123> ok but how will I measure seek time vs throughput vs latency there?
[11:41:22] <kurushiyama> Unless you have a few 100k to spare to set up a lab for that kind of stuff, it is very hard to measure seek times correctly.
[11:42:57] <saira_123> kurushiyama hey thanks for this blog, I am sure this will help me a lot although there are few technical details in it
[11:43:39] <kurushiyama> saira_123: Well, if you read closely, they are enough for _practical_ purposes.
[11:43:59] <saira_123> kurushiyama thanks i will try to reproduce this
[11:45:29] <kurushiyama> ashevchuk: The problem is the following: to measure the seek time you'd actually have to measure the time it takes for the heads to move to the correct position and the platters to spin to the correct position.
[11:45:43] <kurushiyama> ashevchuk: Sorry, wrong addressee
[11:45:47] <kurushiyama> saira_123: ^
[11:51:06] <saira_123> kurushiyama bundle of thanks for your help dear you saved my day
[11:54:20] <kurushiyama> saira_123: You are welcome!
[12:00:03] <owg1> So far: http://i.imgur.com/Bc8MwIu.png
[12:00:24] <owg1> Was it you kurushiyama that predicted xfs would win?
[12:02:05] <kurushiyama> owg1: I predicted that, yes. Compared to zfs. I am a bit surprised by ext4, I have to admit. Can you disclose the test details? Most importantly, was atime turned off?
[12:03:02] <kurushiyama> owg1: Sample size etc would be very interesting, too. Plus the data, ofc.
[12:03:48] <owg1> kurushiyama: I'm still working on my methodology, so it is very much a WIP. I'm using all the default options for mongodb, just changing the filesystem.
[12:04:06] <kurushiyama> owg1: And did you switch the FS on sdc and sdd?
[12:04:28] <owg1> https://docs.google.com/a/clock.co.uk/spreadsheets/d/1hb3gfmbXXenkHIHhf3TUKelWAR12f1pMyG-Wy-gKXkc/edit?usp=sharing
[12:04:32] <kurushiyama> owg1: To exclude the potential hardware factor
[12:04:44] <owg1> You can view my progress here
[12:05:04] <owg1> One of the big caveats is this is running in a virtual linode instance
[12:05:19] <kurushiyama> owg1: The first thing I'd do is to use ext4 on sdc and xfs on sdd.
[12:05:59] <kurushiyama> owg1: Requested permissions
[12:07:58] <owg1> My boss created the sheet, so he has to give permissions
[12:08:24] <kurushiyama> Darn, he'll probably report me...
[12:08:41] <kurushiyama> owg1: Please let him know that I am ... oh, he accepted me..
[12:09:53] <kurushiyama> owg1: We should continue there.
[12:29:59] <cpama> hello there. i have a question ... outlined here: http://pastebin.com/M1ExXhis
[12:30:19] <cpama> trying to query some data from mongodb and control what comes back / how i format it
[12:30:24] <Ange7> Derick: http://pastebin.com/uj9rt94m
[12:31:02] <StephenLynx> cpama that's projection that you want.
[12:31:11] <StephenLynx> I don't know how to use it with the driver you are using, though.
[12:36:02] <cpama> StephenLynx, ok.
[12:36:04] <cpama> i will google that
[12:36:08] <cpama> i'm using the php driver..
[12:36:45] <Derick> cpama: you can use a projection as second argument to find().
[12:37:00] <Derick> cpama: are you using the new driver (mongodb)? or old legacy one?
[12:37:04] <cpama> i just found this: http://stackoverflow.com/questions/15996394/query-with-projection-in-mongodb-php-syntax
[12:37:11] <cpama> Derick, checking... one sec...
[12:37:13] <Derick> the old one it seems
[12:37:20] <Derick> if this is a new project, please use the new one
[12:38:00] <cpama> Derick, I thought I could see it from phpinfo but I guess not
[12:38:05] <cpama> how can i check?
[12:38:11] <Derick> you can see it in phpinfo
[12:38:31] <Derick> what's the extension name and version number?
[12:38:33] <cpama> Version 1.6.13
[12:38:37] <Derick> so the old one
[12:38:46] <cpama> i see.
[12:39:02] <Derick> cpama: are you just starting this project?
[12:39:07] <cpama> um.
[12:39:11] <cpama> kinda.
[12:39:15] <cpama> here's the thing
[12:39:18] <cpama> i found this example:
[12:39:54] <cpama> hm. of course, I can't find it now.
[12:40:09] <cpama> but basically, I found some sample code that allows me to use mongodb with the CodeIgniter framework
[12:40:30] <Derick> not sure whether that supports the new driver yet. Ask them?
[12:40:55] <cpama> well, it doesn't support any version of mongodb natively per se... but someone wrote a driver.
[12:41:18] <Derick> I'd reach out and ask what their plans are for supporting the new driver.
[12:41:26] <Derick> In any case, docs for your find are at http://docs.php.net/manual/en/mongocollection.find.php
[12:41:31] <cpama> ok.
[12:41:35] <cpama> thanks Derick!
[12:41:54] <Derick> for more info about the new driver: http://mongodb.github.io/mongo-php-library/
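For reference, the projection-as-second-argument idea Derick mentions looks like this in the mongo shell (the legacy PHP driver's MongoCollection::find() accepts an equivalent second array); the collection and field names here are illustrative:

    // return only the fields needed for display, and suppress _id
    db.users.find(
        { status: "active" },            // query
        { name: 1, email: 1, _id: 0 }    // projection
    )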
[12:45:21] <Ange7> Hey all. I'm using Ubuntu Server LTS 14.04 with mongodb 3.2.5 and pecl package mongo 1.6.13 MongoDB database driver (for PHP). I try batchInsert() with 50.000 and options continueOnError = TRUE but i have exception : PHP Fatal error: Uncaught exception 'MongoDuplicateKeyException' E11000 duplicate key error collection. Someone know why ? Thank you
[12:46:23] <Derick> Ange7: you are setting the same _id for all documents...
[12:47:52] <Ange7> i checked and no.
[12:48:08] <Ange7> All documents in my $toInsert have a different _id
[12:48:22] <Derick> $doc['_id'] = md5($string);
[12:48:29] <Derick> the $string isn't changed in the loop
[12:48:33] <Derick> http://pastebin.com/uj9rt94m
[12:48:36] <Derick> is what you pasted
[12:59:28] <Ange7> Derick: ok. found. thx
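A minimal shell reproduction of the bug Derick spotted: when the computed _id never changes inside the loop, every document after the first collides on the unique _id index, and an unordered/continueOnError insert still reports E11000 for the duplicates (collection name invented):

    var docs = [];
    for (var i = 0; i < 3; i++) {
        // bug: the key is derived from a value that never changes in the loop,
        // so every document ends up with the same _id
        docs.push({ _id: "constant-key", n: i });
    }
    // ordered:false is the shell analogue of continueOnError: the first document
    // is inserted, the remaining ones fail with E11000 duplicate key errors
    db.demo.insert(docs, { ordered: false })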
[14:10:09] <Bodenhaltung> :D
[15:01:15] <jokke> kurushiyama: I just ran my own little db inserter and I can see immediately that the index will be too big for 3 months' worth of data
[15:02:06] <jokke> especially since it needs to be able to scale for more sensors
[15:02:20] <kurushiyama> jokke: You will need to scale, anyway.
[15:02:33] <kurushiyama> jokke: There is _no_ way around that
[15:02:48] <jokke> sure but will the index be sharded as well?
[15:03:08] <kurushiyama> Yes
[15:03:13] <jokke> ah i see
[15:03:14] <kurushiyama> Indices live on the respective shard
[15:03:52] <kurushiyama> Operation flow: App => Mongos => shards holding the data requested.
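A small sketch of what that means in practice; the database, collection, and shard key here are assumptions. The index is declared once through a mongos, and every shard builds and keeps its own copy for the chunks it owns:

    // run against a mongos
    use metrics
    sh.enableSharding("metrics")
    db.events.createIndex({ sensorId: 1, ts: 1 })                 // present on every shard
    sh.shardCollection("metrics.events", { sensorId: 1, ts: 1 })  // shard key must be indexed

    // App => mongos => only the shards owning the matching key range
    db.events.find({ sensorId: 42, ts: { $gte: ISODate("2016-04-01") } })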
[15:04:48] <jokke> ok so basically I can take the index size, find a divisor that gives me an index that will fit into RAM, and that divisor gives me the number of shards I need
[15:05:26] <jokke> somewhat correct?
[15:05:33] <kurushiyama> jokke: If it was that easy, I'd starve
[15:05:43] <jokke> of course...
[15:05:45] <jokke> :)
[16:02:30] <cpama> i have a design question about creating reference data / tables in mongo db
[16:02:58] <cpama> i was thinking of creating one collection called "reference" and then sticking things in it like "countries", "rooms" , "building"
[16:03:20] <cpama> each document in the collection would be a json object, with nested info
[16:04:28] <cpama> something like this:
[16:06:20] <cpama> http://pastebin.com/aFWiZ3Lq
[16:07:03] <cpama> just wondering how practical it is for retrieving data. most of the time, i will be using it just to populate drop down lists on a web page
[16:07:22] <cpama> argh!! lunch time <afk>
[16:51:18] <oblio> so this morning two of my replica sets managed to do this: assertion 13436 not master or secondary; cannot currently read from this replSet member ns:config.settings query:{}#012
[16:53:17] <kurushiyama> oblio: Sounds like an infrastructure problem. DNS or network partitioning would be my first bet.
[16:53:48] <kurushiyama> oblio: can you pastebin their rs.status()?
[16:56:59] <oblio> kurushiyama: thanks, you gave me a hunch.. checking it out
[16:57:04] <oblio> may be firewall blocking dns
[17:05:38] <dddh_> http://devopsreactions.tumblr.com/post/143105267795/starting-an-old-legacy-server
[17:05:43] <DKFermi> hi folks - I have a flask/mongo application and want to use ObjectId - the code is in python2.7 and when I do pip install bson, it fails; however, https://api.mongodb.org/python/current/installation.html states that i shouldn't try to install the bson package as pymongo comes with it - can someone please clarify this for me?
[17:07:34] <kurushiyama> dddh_: Pfff. It uses steam. Muscle-powered, that's what I call legacy...
[17:09:23] <dddh_> kurushiyama: asked for more servers at work and got 8-year-old ones
[17:09:43] <kurushiyama> dddh_: As a doorstop, I assume ;)
[17:10:20] <kurushiyama> dddh_: a footrest would be a different application, depending on the form factor.
[17:13:54] <kurushiyama> dddh_: Getting serious again: I would not trust the hard drives one bit.
[17:14:58] <dddh_> kurushiyama: raid10 and replication
[17:16:03] <kurushiyama> dddh_: Well, at least something. Make sure you get notified on failure. Chances are the drives will fail close to each other. Hope you have some spares.
[17:16:32] <oblio> kurushiyama: http://pastebin.com/wvavvKb8
[16:56:59] <oblio> kurushiyama: there is something funky going on w/ dns but I went in and manually populated the hosts files for all the instances
[17:17:05] <oblio> and still no-go
[17:17:14] <oblio> i presume whatever the host identifies itself with as its hostname needs to resolve
[17:17:19] <kurushiyama> Wait
[17:17:20] <oblio> *presumed
[17:17:35] <kurushiyama> oblio: I think it is more serious than that.
[17:17:39] <oblio> heh
[17:18:26] <kurushiyama> oblio: You did add that node to your replica set via rs.add(...), I assume?
[17:18:57] <oblio> so i have ue1db1, ue1db2, ue1db3.. ue1db3 is reporting up but it has a low priority and db1 and db2 are in error
[17:19:07] <oblio> kurushiyama: when it was initially set up? I believe so
[17:19:50] <oblio> at this moment, no, I have not tried re-adding it with rs.add
[17:20:18] <kurushiyama> oblio: Can you pastebin the output of one of the "pair" of machines, again?
[17:20:59] <oblio> kurushiyama: the log output?
[17:21:05] <kurushiyama> rs.status()
[17:21:09] <kurushiyama> oblio: ^
[17:22:35] <oblio> kurushiyama: http://pastebin.com/zKNpYPUh
[17:23:01] <oblio> ue1db3 has the same status
[17:23:07] <kurushiyama> oblio: careful now.
[17:23:17] <kurushiyama> oblio: Do you have prod data on these machines?
[17:23:20] <oblio> yep
[17:24:08] <kurushiyama> oblio: My suggestion: create a backup by shutting down all instances and making a copy of the datafiles. Now.
[17:25:03] <oblio> and recreating the cluster w/ one of the sets?
[17:25:54] <kurushiyama> oblio: First, we should find out why the heck the replica set config just vanished.
[17:26:17] <kurushiyama> oblio: What version do you use?
[17:26:30] <oblio> 3.0.6
[17:27:50] <kurushiyama> oblio: PMing you, that requires a more in depth discussion.
[18:57:47] <surge_> My config file options are specified in the old “key = value” format, so I’m wondering if I can still keep my config as such for enabling journaling in 3.2 for WiredTiger.
[18:58:21] <surge_> According to the docs, I have to write it in YAML as storage: journal: enabled: true. Can I still use the old syntax somehow?
[19:00:53] <surge_> Or am I finally being forced off this syntax? =P
[19:18:47] <cheeser> afaik, the old syntax is still supported.
[19:18:59] <cheeser> but the yaml format is much nicer to work with.
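For comparison, the two styles being discussed; whether a given 3.2 build still reads the old ini-style file is exactly what surge_ is asking, so take the first form as legacy syntax rather than a recommendation:

    # legacy "key = value" style mongod.conf
    journal = true

    # YAML style (2.6+), equivalent setting
    storage:
      journal:
        enabled: true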
[19:27:12] <MattB> hello
[19:27:15] <MattB> anybody around?
[19:27:31] <MattB> Need somehelp on a Update
[19:28:12] <kurushiyama> Here
[19:28:26] <MattB> So I am used to T-SQL
[19:28:29] <kurushiyama> OS, Version, version of current Mongo, version of target Mongo.
[19:28:48] <kurushiyama> MattB: Ah, you are talking of a .update()
[19:28:52] <MattB> yea
[19:28:58] <MattB> ok so here is the deal
[19:29:19] <MattB> I am used to T-SQL where I could write a bunch of UPDATE, SET, WHEREs
[19:29:22] <MattB> and then run them
[19:29:27] <MattB> I am struggling to do that here
[19:29:34] <MattB> I got a huge collection with 382 fields
[19:29:53] <MattB> I only need to update like 750 records but only 1 field (Salary)
[19:30:09] <kurushiyama> Ok
[19:30:23] <MattB> I want to say basically in T-SQL SET Salary = XXX where ObjectID = XXX
[19:30:23] <kurushiyama> What are the conditions?
[19:30:31] <kurushiyama> Uhm
[19:30:41] <kurushiyama> Ok.
[19:30:47] <MattB> I got their exact object ids
[19:30:55] <kurushiyama> you have multiple fields with ObjectIds?
[19:31:06] <MattB> no just one field with an object id
[19:31:09] <MattB> so it is unique
[19:31:12] <kurushiyama> So _id
[19:31:20] <MattB> yes
[19:31:40] <MattB> However I got a huge CSV list in excel with all 750 records
[19:31:47] <kurushiyama> db.collection.update({_id:yourObjectId},{$set:{salary:X}})
[19:31:47] <MattB> it has their object id's and their new salaries
[19:31:55] <MattB> ok
[19:32:05] <kurushiyama> BUT
[19:32:11] <MattB> can I do like 750 those seperated by commas?
[19:32:17] <kurushiyama> MattB: No
[19:32:21] <MattB> :(
[19:32:31] <kurushiyama> Well, you could use a $in
[19:32:41] <kurushiyama> but only if those get the same salary
[19:32:50] <MattB> can you show me structure with like 3 and I can expand it out to 750
[19:33:06] <kurushiyama> MattB: DO they get the same salary?
[19:33:16] <MattB> no
[19:33:22] <kurushiyama> Then this won't work
[19:33:27] <MattB> each salary is a different person
[19:33:32] <kurushiyama> Use a bulk operation instead
[19:33:57] <MattB> ok, help on that; sorry, this was easy in SQL but so many things are easier here LOL
[19:34:51] <kurushiyama> MattB: Actually, it would not be too easy with SQL because of the different salaries, if I am not mistaken here. SQL has been a while, and T-SQL... Read it somewhere ;)
[19:35:16] <kurushiyama> Ok, let me write a quick example.
[19:37:33] <MattB> thanks so much
[19:37:41] <kurushiyama> MattB: Oh, what language do you use?
[19:38:01] <MattB> I am doing it right from MongoChef
[19:38:06] <MattB> so terminal
[19:38:17] <MattB> JS would work
[19:38:46] <kurushiyama> Uhm. not sure how to get the CSV data there. I am putting some shell-ish code for you to understand.
[19:39:33] <MattB> ok thanks
[19:39:59] <kurushiyama> MattB: http://pastebin.com/hdd1JAPS Does this make sense for you?
[19:41:00] <MattB> yes I think
[19:41:05] <MattB> so I can take bulk.find({_id:csvrecord.ObjectId}).update({"$set":{salary:csvrecord.salary}})
[19:41:17] <MattB> and do as many of those as I want separated by commas, correct?
[19:41:45] <kurushiyama> If you do it manually, yes.
[19:41:53] <kurushiyama> semicolon that would be.
[19:42:05] <MattB> yes
[19:42:10] <MattB> money thanks so much
[19:43:02] <kurushiyama> but in that case, I'd rather do db.collection.update({_id:csvrecord.ObjectId},{$set:{salary:csvrecord.NewSalary}}); db.collection…
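The pastebins are gone, but judging from the line MattB quotes back (bulk.find(...).update({$set: ...})), the suggestion was the shell's Bulk API; a minimal sketch with invented _id values and salaries:

    // one { id, salary } pair per CSV line (values here are made up)
    var rows = [
        { id: ObjectId("572222222222222222222201"), salary: 50000 },
        { id: ObjectId("572222222222222222222202"), salary: 61000 }
    ];

    var bulk = db.employees.initializeUnorderedBulkOp();
    rows.forEach(function (r) {
        // matching on _id, so updateOne is enough
        bulk.find({ _id: r.id }).updateOne({ $set: { salary: NumberInt(r.salary) } });
    });
    bulk.execute();   // ships all the queued updates in batches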
[19:44:05] <kurushiyama> MattB: You are welcome.
[19:48:02] <MattB> http://s31.postimg.org/s003pdruj/help.png
[19:48:10] <MattB> kurushiyama
[19:48:18] <MattB> that is where i am at I got a few errors
[19:48:26] <MattB> I know I forgot to remove semicolon at end
[19:48:39] <MattB> but what else am i doing wrong
[19:49:50] <MattB> http://s31.postimg.org/lpndm5wsb/help2.png
[19:50:07] <MattB> kurushiyama above are the errors
[19:51:12] <kurushiyama> MattB: you do not need the foreach loop, that was pseudo.
[19:51:28] <MattB> ?
[19:51:51] <kurushiyama> I wanted to explain by this that you need to iterate over the csv lines
[19:51:51] <MattB> sorry man I am not a pro at this
[19:52:03] <kurushiyama> remove the foreach loop
[19:52:33] <kurushiyama> wait a sec
[19:52:53] <kurushiyama> lines 3 and 7
[19:53:22] <kurushiyama> And there is a ")" missing on each line, just before the update
[20:07:49] <MattB> kursh
[20:07:58] <MattB> can you give me another sample please :)
[20:09:11] <MattB> I would sure appreciate it
[20:10:09] <MattB> http://pastebin.com/pteKRaR2
[20:10:16] <MattB> I got it like this but still getting errors
[20:12:28] <MattB> hey kurushiyama you still around
[20:14:05] <kurushiyama> MattB: http://pastebin.com/rWe0jWdh
[20:14:44] <MattB> error at lines 2 and 3, position 90: ".
[20:14:49] <MattB> thats it
[20:15:11] <kurushiyama> MattB: I can not really debug _your_ stuff. You need to read a bit on that ;)
[20:15:20] <MattB> ok
[20:15:29] <MattB> thanks
[20:16:53] <kurushiyama> http://pastebin.com/jtxAEixA
[20:17:01] <kurushiyama> MattB: ^
[20:17:09] <MattB> i found it sorry
[20:17:12] <MattB> Thanks so much man
[20:18:03] <MattB> damn, it set it to a double and not an int
[20:18:20] <kurushiyama> MattB: You are welcome. It is just that I am preparing dinner at 10pm ;)
[20:18:41] <MattB> yeah it is setting it as a float and not an int
[20:18:45] <MattB> that's it, I promise
[20:18:56] <MattB> NumberInt(0)
[20:18:57] <MattB> right?
[20:19:37] <kurushiyama> Could well be, but I am not sure it matters ;)
[20:20:14] <kurushiyama> For all practical purposes of querying 1.00 == 1
[20:20:27] <kurushiyama> Though not === 1, ofc.
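A quick shell illustration of that point (collection name invented): query equality compares numeric values across BSON types, so a double 1.0 and a NumberInt(1) both match { salary: 1 }, even though $type can still tell them apart:

    db.t.insert({ _id: 1, salary: 1 })             // shell numbers are stored as doubles
    db.t.insert({ _id: 2, salary: NumberInt(1) })  // stored as a 32-bit int

    db.t.count({ salary: 1 })             // 2 - numeric equality ignores the BSON type
    db.t.count({ salary: { $type: 16 } }) // 1 - only the NumberInt document (type 16 = int32)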
[20:22:26] <surge_> cheeser: definitely is, but I was being lazy. Converted it already :)
[20:40:04] <saml> if I want to sum up a number field, do I use aggregate?
[20:40:46] <saml> db.videos.aggregate({$group:{_id:1, bytes: {$sum: '$bytes'}}}) this works but I want to display it in GB by doing some division
[20:40:54] <saml> during aggregate
[20:41:57] <saml> ah, there's $divide
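Putting the two pieces together, a sketch of summing the field and converting to GB in one pipeline (the videos/bytes names are taken from the question above):

    db.videos.aggregate([
        { $group: { _id: null, totalBytes: { $sum: "$bytes" } } },
        { $project: { _id: 0, totalGB: { $divide: ["$totalBytes", 1024 * 1024 * 1024] } } }
    ])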
[21:04:38] <CustosLimen> hi
[21:05:19] <CustosLimen> so I'm doing some testing
[21:05:28] <CustosLimen> and pre-splitting a database into chunks takes a lot of time
[21:05:37] <CustosLimen> and when I drop the database that is all gone
[21:08:15] <kurushiyama> CustosLimen: Well, dropped is dropped.
[21:08:30] <CustosLimen> kurushiyama, I want the data gone - but nothing else
[21:08:37] <CustosLimen> and remove({}); is so slow ;(
[21:08:53] <CustosLimen> it's ok - I'm going to move to a single config server
[21:08:55] <kurushiyama> CustosLimen: You cannot have your cake and eat it too.
[21:09:09] <kurushiyama> CustosLimen: That's most likely not the point.
[21:09:22] <CustosLimen> kurushiyama, it's a lot faster
[21:09:42] <CustosLimen> and I'm just testing
[21:10:40] <CustosLimen> ugh no
[21:10:46] <CustosLimen> this is stupid ;(
[21:10:47] <kurushiyama> CustosLimen: Well, most likely there is something wrong with your config servers, then. Can you try to disable the balancer?
[21:11:00] <CustosLimen> kurushiyama, the balancer is slow
[21:11:06] <kurushiyama> Hence
[21:11:22] <CustosLimen> ok - but if I stop it then it won't be slow, it will just do nothing
[21:11:28] <kurushiyama> But actually the metadata is only for shard _ranges_
[21:11:47] <kurushiyama> s/shard/shard key/
[21:12:15] <CustosLimen> kurushiyama, its disabled
[21:12:36] <kurushiyama> Depending on your data size when you delete, and enough chunks get emptied... ...just an idea.
[21:12:53] <CustosLimen> ehrm, wait I'm not following
[21:13:04] <CustosLimen> the only thing that is slower than normal is balancer
[21:13:05] <kurushiyama> CustosLimen: Thinking with my fingers.
[21:13:19] <CustosLimen> remove({}); being slower than drop is mainly the issue
[21:13:28] <CustosLimen> moving to a single config server makes the balancer quicker
[21:13:28] <kurushiyama> CustosLimen: Do you have 3 standalone config servers or a replica set of configs?
[21:13:41] <CustosLimen> kurushiyama, mongodb 3.0 - so 3 config servers
[21:13:51] <kurushiyama> CustosLimen: Hard to believe.
[21:14:14] <kurushiyama> CustosLimen: The thing is that on remove, the key ranges should not change.
[21:14:28] <CustosLimen> kurushiyama, the remove has nothing to do with the slowness of the balancer
[21:14:34] <CustosLimen> kurushiyama, the balancer is slow at moving chunks
[21:14:37] <CustosLimen> and they are empty
[21:14:38] <kurushiyama> CustosLimen: Listen, then complain
[21:15:22] <kurushiyama> CustosLimen: The thing is that on db.yourcoll.remove({}), the key ranges for each shard should not change
[21:15:29] <CustosLimen> kurushiyama, so if I use one config server, then balancing about 2000 chunks amongst 3 shards takes about 10 minutes - whereas with 3 config servers balancing the same takes > 1 hour
[21:15:32] <CustosLimen> kurushiyama, I know
[21:17:13] <kurushiyama> CustosLimen: So, you are talking of chunk splits taking long?
[21:17:21] <kurushiyama> presplitting, that is?
[21:18:23] <CustosLimen> kurushiyama, I created collection, stopped balancer, ran manual splits (to pre-split before loading data) and then I started balancer
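For context, the pre-split step described above presumably looks something like this (database, collection, and shard-key values are invented); every split and every later chunk move is a metadata operation coordinated through the config servers, which is why a slow config setup stretches the whole procedure out:

    // with the balancer stopped, pre-create chunk boundaries on the empty collection
    sh.stopBalancer()
    for (var k = 1000; k < 100000; k += 1000) {
        sh.splitAt("test.events", { userId: k });   // one split per boundary value
    }
    sh.startBalancer()   // let the balancer spread the (empty) chunks across the shards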
[21:18:37] <kurushiyama> CustosLimen: Now you got me.
[21:18:41] <CustosLimen> kurushiyama, my collection is currently empty - this is before importing data
[21:18:54] <CustosLimen> kurushiyama, the reason why the time concerns me is because I do this many times as part of testing
[21:20:14] <CustosLimen> kurushiyama, it has no problem, except that it just takes longer than I would like it to
[21:20:23] <kurushiyama> CustosLimen: Well, I once had a problem similar to this. It was time-related, as obvious as this might sound.
[21:20:23] <CustosLimen> but I think I will just use remove({}) this time
[21:20:28] <CustosLimen> I'm sure it cannot be this slow
[21:20:48] <CustosLimen> kurushiyama, you mean time was out of sync ?
[21:20:56] <kurushiyama> CustosLimen: Aye
[21:21:09] <kurushiyama> CustosLimen: Pretty painful for balancer locks.
[21:22:13] <CustosLimen> kurushiyama, one config server is using different NTP - but its in different city also
[21:22:18] <CustosLimen> and it uses the ntp in its city
[21:22:19] <kurushiyama> CustosLimen: Interestingly enough, the replica sets sort of worked, despite the fact they were getting ops from the future.
[21:22:32] <CustosLimen> both sites have ntp from GPS though
[21:22:54] <CustosLimen> they are at least within 1 second of each other - but I cannot really check any better than that
[21:23:59] <kurushiyama> CustosLimen: Well, I'd double check that they are not more than 1 sec apart, and I'd probably set them to UTC, anyway. It is to be checked whether various TZs, especially for the mongos, may cause... ...interesting side effects.
[21:24:31] <CustosLimen> kurushiyama, they are in same timezone
[21:24:36] <CustosLimen> just different cities
[21:24:37] <kurushiyama> Hmmm
[21:24:59] <kurushiyama> If the TZ matches, and they are within a second, that should work.
[21:28:44] <kurushiyama> Ok, from the start. There can be only one chunk migration at any given time, and each should be rather fast, since mongos basically instructs a data transfer of nothing (the chunks are empty) before updating the metadata.
[21:31:19] <kurushiyama> Hm. 3600 s / 2k chunks would be like 1.8 secs/chunk; 600 s / 2k would be 0.3 secs/chunk. Even with some give and take, that would make a 3-config setup roughly half as fast as a single-config-server setup.
[21:31:51] <kurushiyama> Per config server, that is.
[21:33:12] <kurushiyama> CustosLimen: Do you have multiple mongos? And if yes, do they have the same order of config servers?
[21:33:47] <CustosLimen> kurushiyama, yes
[21:33:52] <CustosLimen> kurushiyama, to both questions
[21:34:54] <kurushiyama> Curious
[21:37:15] <kurushiyama> CustosLimen: Hm, you could try a bulk remove. But I'd keep an eye on the config server setup. Something "feels" wrong there. I feel it in my bones...
[22:00:50] <CustosLimen> can I somehow prevent some mongos from doing balancing ?
[22:02:43] <kurushiyama> CustosLimen: iirc, disabling the balancer is per mongos... Wait a sec, there was something.
[22:04:06] <CustosLimen> kurushiyama, sh.stopBalancer(); disables balancing for whole sharded cluster
[22:04:17] <CustosLimen> kurushiyama, it basically affects some value in config db
[22:04:24] <kurushiyama> CustosLimen: Yeah, was "any" balancer, had "every" in mind
[22:05:26] <CustosLimen> kurushiyama, I have some balancers (mongos instances) running on servers that host the application - and they are just there because the C++ client does not have failover/HA support
[22:05:47] <CustosLimen> and I would prefer they dont do balancing
[22:06:06] <kurushiyama> CustosLimen: Well, you can always set a balancer window.
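The balancer window kurushiyama mentions lives in the config database and is set through any mongos; a sketch with an assumed nightly window:

    // connect through a mongos
    use config
    db.settings.update(
        { _id: "balancer" },
        { $set: { activeWindow: { start: "23:00", stop: "06:00" } } },
        { upsert: true }
    )

    // sh.getBalancerState() / sh.setBalancerState(false) remain cluster-wide switches,
    // matching what CustosLimen observed about sh.stopBalancer()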
[22:13:50] <kurushiyama> CustosLimen: Curious. Didn't know that the C++ drivers do not support failover. Highly unusual. Which one are you using?
[22:15:05] <kurushiyama> Anyway, got to sleep.