#mongodb logs for Tuesday the 17th of June, 2014

[00:00:06] <joannac> sharks make everything better
[00:00:14] <joannac> like adding an 'X' to the end of names
[00:00:25] <joannac> we should be mongoDBX
[00:01:06] <joannac> cheeser: 2.6.2 is out?
[00:01:46] <cipher__> same error, joannac
[00:03:01] <joannac> cipher__: ...pastebin?
[00:03:17] <cipher__> I started mongos (numactl --interleave=all mongos --fork --nohttpinterface --logpath /var/log/mongodb_router.log --logappend --configdb 192.168.100.81:27019,192.168.100.82:27019,192.168.100.83:27019), it ran fine, and i ran mongo localhost:27017/admin
[00:07:52] <joannac> cipher__: and then?
[00:08:38] <joannac> oh
[00:08:39] <cipher__> and then i ran db.runCommand( {addShard : "rs_a/192.168.100.81:27018" } );
[00:08:47] <joannac> you're not running against the admin database
[00:08:50] <cipher__> ok: 0, not a command
[00:08:54] <cipher__> i am
[00:08:56] <cipher__> use admin
[00:09:11] <cipher__> "mongo localhost/admin"
[00:10:10] <cipher__> Well, I tried again with use admin, it failed
[00:10:47] <cipher__> "errmsg" : "no such cmd: addShard",
[00:10:54] <joannac> works for me
[00:11:20] <joannac> http://pastebin.com/mQMS8tep
[00:11:41] <joannac> well, i don't have anything on that IP, but I don't get the error you get
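(For reference, the sequence being attempted here is the standard one; a minimal sketch assembled from the commands in this conversation, which must be run against the mongos rather than a plain mongod:)

    mongo localhost:27017/admin
    db.runCommand({ addShard: "rs_a/192.168.100.81:27018" })
    // expected on success: { "shardAdded" : "rs_a", "ok" : 1 }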
[00:12:01] <cipher__> joannac: how can i check that mongos actually connected to the config servers?
[00:12:20] <joannac> umm
[00:12:28] <joannac> look for connections in the config server logs?
[00:12:34] <joannac> or connections in the mongos logs
[00:14:28] <cipher__> scoped connection to 192.168.100.81:27019, 192.168.100.83:27019, 192.168.100.83:27019 not being returned to the pool?
[00:14:52] <joannac> why are 2 of those identical?
[00:14:55] <cipher__> actually, i think that was from trying to start mongos when it was already running
[00:15:04] <cipher__> typo
[00:15:11] <cipher__> ssh won't let me copy
[00:16:37] <cipher__> http://pastebin.com/AW8QGqTW
[00:20:12] <joannac> um, that shows your mongos is not running
[00:20:25] <cipher__> :|
[00:20:28] <joannac> ERROR: listen(): bind() failed errno:98 Address already in use for socket: 0.0.0.0:27017
[00:20:44] <cipher__> oh yeah, i tried to restart it while it was already running?
[00:21:12] <joannac> doesn't look like it
[00:21:17] <cipher__> okay
[00:21:20] <joannac> what do you actually have running on 27017
[00:21:54] <cipher__> http://pastebin.com/EjkfUS3j
[00:22:41] <joannac> that doesn't tell me anything other than there's something listening on that port
[00:23:04] <joannac> open a mongo shell, db.serverGetCmdLineOpts()
[00:25:32] <cipher__> It won't run on my query router anyway
[00:25:59] <cipher__> "query router", the one with mongos installed and not functioning
[00:26:19] <cipher__> 2014-06-16T18:23:52.503-0600 TypeError: Property 'serverGetCmdLineOpts' of object admin is not a function
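(The TypeError above is just a mistyped helper name: the shell helper is db.serverCmdLineOpts(), which wraps the getCmdLineOpts command:)

    db.serverCmdLineOpts()
    // equivalently:
    db.adminCommand({ getCmdLineOpts: 1 })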
[00:29:42] <cipher__> joannac: I ran lsof -i :27017, and have three instances of mongo on it
[00:30:24] <cipher__> basically localhost:randomport -> localhost:27017
[00:35:04] <cipher__> "g WriteBackListener for: 192.168.100.83:27019 serverID: 000000000000000000000000"
[00:35:26] <cheeser> joannac: i think i saw that email today
[00:35:28] <cipher__> dammit, it says they're contacted successfully
[00:35:30] <cheeser> about 2.6.2
[00:36:12] <cipher__> i restarted the server, i saw no such connection errors
[00:39:11] <cipher__> oh, it recognizes it now
[01:10:48] <dafriskymonkey> hi everyone
[01:11:08] <dafriskymonkey> hello
[01:11:10] <dafriskymonkey> !
[01:11:51] <joannac> hi
[01:12:30] <dafriskymonkey> i dont know if im in the right place, but i have a little problem with windows azure and mongodb
[01:12:37] <dafriskymonkey> can someone help me ?
[01:12:52] <joannac> is it slowness?
[01:12:58] <joannac> long disk flushes?
[01:12:59] <dafriskymonkey> no
[01:13:04] <dafriskymonkey> ok
[01:13:16] <dafriskymonkey> i downloaded the preview project at github
[01:13:25] <dafriskymonkey> the one using windows sdk 1.7
[01:13:38] <dafriskymonkey> i want to be able to run it using sdk 2.3
[01:14:03] <dafriskymonkey> my problem is with the clouddrive class
[01:14:15] <dafriskymonkey> which is obsolete in sdk 2.3
[01:14:41] <dafriskymonkey> http://stackoverflow.com/questions/24253860/windows-azure-and-mongodb
[01:15:12] <dafriskymonkey> i dont know if someone can help me to solve this
[01:16:16] <joannac> not i, unfortunately
[01:16:27] <dafriskymonkey> nop thanks
[07:45:15] <gh2> Guys, is the following the right way to find and update a document? I am using pymongo
[07:45:16] <gh2> http://pastebin.com/raw.php?i=DaZFUCGN
[10:10:29] <shoshy> hello, i'm using mongodb 2.4 on ec2. i ran "db.groups.update({},{$set:{loc: { type: [Number], index: '2dsphere’}},{multi:true})" on the server. since then, when i run db.groups.find() i get "SyntaxError: Unexpected identifier,Error: 16722 SyntaxError: Unexpected identifier", and if i try to use mongohub on that location it crashes.
[10:12:34] <Nodex> 2dsphere expects an array, does it not?
[10:12:41] <Nodex> array / object
[10:13:25] <shoshy> i just wanted to add an empty GEOJSON property to the existing schema items
[10:13:37] <shoshy> and the answer is yes, it does..
[10:14:20] <shoshy> it's just that i'm new to mongo and its GEOJSON properties... how can i add an empty coordinate field that is GEOJSON compliant
[10:14:50] <shoshy> {loc: { type: [43.32,56.777], index: '2dsphere’}} for example?
[10:15:04] <Nodex> if your document doesn't have one then it won't be indexed
[10:15:27] <Nodex> not every key has to have a value
[10:16:01] <shoshy> ok, and if i wanted to add value... a point ? like i wrote?
[10:16:24] <Nodex> sure
[10:17:51] <shoshy> great... thank you, so i'll write a node.js script to help me then... tnx !
[10:18:07] <Nodex> { loc: { type: "Point", coordinates: [ 40, 5 ] } }
[10:18:11] <Nodex> according to the docs
[10:18:16] <shoshy> ahhhh
[10:18:20] <Nodex> http://docs.mongodb.org/manual/core/2dsphere/
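(Putting the pieces of this exchange together, the 2.4-era shell version would look like the sketch below; the coordinates are just the example pair from the docs. Worth noting too: the closing quote in shoshy's '2dsphere’ above is a typographic smart quote, which on its own is enough to make the shell throw a SyntaxError.)

    // give documents a valid GeoJSON point -- coordinates are [ longitude, latitude ]
    db.groups.update({}, { $set: { loc: { type: "Point", coordinates: [ 40, 5 ] } } }, { multi: true })
    // then index the field
    db.groups.ensureIndex({ loc: "2dsphere" })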
[10:18:35] <Nodex> I haven't used mongo for geo stuff for a while, it's changed a lot since 2.2
[10:19:22] <shoshy> right , but the schema should be loc: { type: [Number], index: '2dsphere’}
[10:19:29] <shoshy> from what i've read...
[10:19:29] <Nodex> iirc it's long,lat too
[10:20:01] <Nodex> I think "type":"Point" tells it that it's 2dsphere
[10:20:14] <Nodex> as it really can't be anything but 2d with only 2 dimensions
[10:20:29] <Nodex> dimensions -> axis
[10:21:42] <shoshy> So wait... i'm confused... when i define schemas, for example ...... { ... name:String, loc: {type:"Point", coordinates: [Number]},... } not { ... name:String, loc: {type:"Point", coordinates: Array}... } ?
[10:22:19] <Nodex> I assume so, I don't use Mongoose so I couldn't tell you the inner workings of it
[10:22:44] <Nodex> http://docs.mongodb.org/manual/core/2dsphere/#point <---- that's what the docs say for it.... type:..., coordinates...
[10:22:58] <shoshy> oops sorry i meant loc: { type: [Number], index: '2dsphere’}, wrong paste
[10:23:14] <Nodex> there is no such key in the docs as "index"
[10:23:36] <Nodex> and it would be very stupid to have "type" as an array / point
[10:23:47] <Nodex> stupid / ambiguous
[10:24:14] <shoshy> https://github.com/LearnBoost/mongoose/wiki/3.6-Release-Notes#geojson-support-mongodb--24
[10:24:24] <shoshy> new Schema({ loc: { type: [Number], index: '2dsphere'}})
[10:24:33] <shoshy> that's where i got it from... :/
[10:24:49] <Nodex> 2.4 !=2.6
[10:25:02] <shoshy> right...
[10:25:27] <shoshy> my mongo is 2.4 right now... i need to migrate, just wanted to do the check on a property..
[10:25:27] <Nodex> I don't know why Mongoose is confusing users like this, it seems backward to me
[10:26:18] <Nodex> var geojsonLine = { type: 'LineString', coordinates: [[180.0, 11.0], [180.0, '9.00']] }...
[10:26:41] <shoshy> Right... i saw that
[10:26:49] <Nodex> you could just as easily do ... var geoJsonPoint = {type:"Point",coordinates:[1.2,3.4]}
[10:27:12] <shoshy> Yes, but i need to define a schema , with types...
[10:27:23] <Nodex> I couldn't comment on that
[10:27:23] <shoshy> so {type: String, coordinates: Array} ?
[10:27:34] <Nodex> never used it sorry
[10:27:36] <shoshy> or {type: String, coordinates: [Number]}
[10:27:37] <shoshy> ok...
[10:27:41] <shoshy> thank you very much...
[10:27:46] <shoshy> i appreciate the help
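(For what it's worth, the Mongoose pattern that later became standard for GeoJSON points is sketched below, untested here; the nested { type: { type: String } } exists precisely because Mongoose reserves the key 'type' for a field's own schema type:)

    var mongoose = require('mongoose');

    var groupSchema = new mongoose.Schema({
        name: String,
        loc: {
            type: { type: String },   // will hold "Point"
            coordinates: [Number]     // [ longitude, latitude ]
        }
    });
    groupSchema.index({ loc: '2dsphere' });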
[10:28:10] <Nodex> no probs. Forced schema for a loose schema database also seems backwards but what can you do lol
[10:28:23] <shoshy> that's how mongoose works
[10:28:31] <shoshy> which is the official mongodb driver
[10:28:37] <Nodex> no it's not
[10:28:43] <Nodex> the official driver is Mongodb Native
[10:29:30] <Nodex> https://github.com/mongodb/node-mongodb-native <--- official
[10:29:36] <shoshy> you're right... they do mention on the mongodb site that: "Mongoose is the officially supported ODM for Node.js. It has a thriving open source community and includes advanced schema-based features such as async validation, casting, object life-cycle management, pseudo-joins, and rich query builder support."
[10:30:19] <shoshy> sorry, i'm new to this, i just started using it 2 days ago... and using an ODM , which seems very up to date, seemed the logical thing to do
[10:30:56] <shoshy> so i dont know what they mean by "officially supported ODM for node.js" if it's not..
[10:31:20] <shoshy> as for mongodb native they write the same "The MongoDB Node.js driver is the officially supported node.js driver for MongoDB"
[10:31:37] <Nodex> it has pros and cons, it's recommended to learn mongodb types and functionality first before having it all stripped away from you in an abstraction layer
[10:34:07] <shoshy> ok... thanks! you got me thinking more about this
[10:34:28] <shoshy> it made things work faster for me (ODM)
[10:34:34] <shoshy> coding wise..
[10:35:24] <Nodex> yeh but it's up for debate if that's good or bad in the long term. Anyway, you have a starting point so good luck :)
[10:41:24] <shoshy> Thank you very much!
[10:41:38] <shoshy> you're probably right... i should switch to the native driver
[10:45:56] <romaric> Hi everybody, we are experiencing an issue on our sharded system. We are sending a batch insert to our mongos, which splits the data correctly between our two shards, but it seems that the two write operations are done sequentially. Does anyone have any idea? You can see the very verbose mongos log here: http://pastebin.com/H0K05kwq
[10:57:55] <remonvv> \o
[13:15:08] <eagen> romaric Based on http://docs.mongodb.org/manual/faq/concurrency/ your inserts to each shard should be able to happen at the same time. Interesting.
[13:26:32] <eagen> romaric According to http://docs.mongodb.org/manual/core/bulk-inserts they recommend inserting to multiple mongos instances to parallelize imports. This blog has some ideas as well: http://cfc.kizzx2.com/index.php/slow-batch-insert-with-mongodb-sharding-and-how-i-debugged-it/
[13:41:11] <remonvv> eagen : Not sure who wrote that but multiple mongos instances won't do all that much for insert performance.
[13:59:48] <romaric> thank you eagen, but it is the job of the mongos to route the docs to their shards, where the writes happen at the same time, isn't it ?
[14:01:30] <remonvv> romaric: What is it you're trying to do exactly? Bulk insert and you don't see full utilization of all the shards?
[14:07:03] <romaric> hi remonvv, I'm doing bulk inserts and I do not see any improvement in the write speed. What is weird is that we can see the two shards each filling up with approximately half of the list of documents (20k batch insert, ~10k on each shard at the end), but it seems that the two background 10k batch inserts, sent by the mongos, are done sequentially
[14:07:13] <eagen> romaric I totally agree. Mongos _should_ handle it but apparently it doesn't.
[14:07:19] <romaric> according to these logs: http://pastebin.com/H0K05kwq
[14:07:31] <romaric> l.03: mongos is going to insert approximately half of the original 20k-doc batch on shard 1
[14:07:37] <romaric> l.07: mongos has to wait for writebacks on shard1
[14:07:42] <romaric> l.10: mongos is going to insert the other half of the original 20k-doc batch on shard2
[14:08:17] <romaric> before saying that mongo does not do something I want to try & I hope that the issue is me instead of mongo ;)
[14:08:29] <saml> i ensured index so many
[14:08:36] <saml> now mongodb so slow
[14:08:40] <saml> so i deleted and gave up
[14:09:08] <remonvv> romaric: What is your shard key? As in which type and what values do you have in your data?
[14:09:53] <romaric> look at this sh.status(true)
[14:09:54] <romaric> http://pastebin.com/bfcyxwzZ
[14:09:56] <remonvv> saml: Do you have a question?
[14:10:12] <saml> yes may i have a question?
[14:10:31] <remonvv> romaric: ObjectId is not a very good shard key
[14:10:36] <saml> given a list of queries, how can i automatically set index?
[14:10:44] <Nodex> hitn()
[14:10:46] <Nodex> hint()
[14:11:01] <remonvv> romaric: Is the index hashed?
[14:11:21] <romaric> remonvv: this is not ObjectId, this is a binary id
[14:11:42] <remonvv> romaric: Okay, how big and does it increase over time?
[14:12:23] <remonvv> romaric: It sounds like you have hotspots in your chunk distribution (meaning; an above average amount of documents go to the same shard)
[14:12:59] <remonvv> romaric: Either way, unless your mongos is cpu or I/O bound there's no real value in having more of them. Not sure why it's suggested in the documentation.
[14:13:45] <romaric> hummm that's what everyone says, but we did a 20k bulk insert & it resulted in two shards containing approximately 10k docs each
[14:14:09] <cheeser> bad shard key?
[14:15:26] <rspijker> I think it’s just batch insert that doesn’t play nicely with sharding
[14:16:24] <romaric> the shard key is made of 14 bytes with randomness inside, and that results in two 10k shards
[14:16:54] <remonvv> romaric: The end result is not that important. If you have a bad shard key it might do the 10k on 1 first and the other half on 2 sequentially. And there are various more subtle problems that can occur.
[14:17:11] <romaric> ok
[14:17:19] <remonvv> romaric: If it's completely random then the above does not apply.
[14:17:27] <romaric> do you have any simple example of a shardkey that has this behaviour ?
[14:17:34] <remonvv> sure
[14:19:32] <remonvv> Say you have 2 shards with 1 chunk each and your shard key is a date. It will fill up one shard with the earlier half of the date values and the other shard with the later ones (assuming the chunk split happened exactly in the middle of the date range).
[14:20:56] <remonvv> Basically you want your shard key values to be uniformly distributed across the value range and "random"
[14:21:28] <remonvv> "random" as in your insert order should not be ht order of hash key values
[14:22:04] <rspijker> romaric: didn’t you tell me yesterday that you had a time component in the ids but you shufled them somehow so that they weren’t at the end?
[14:22:05] <romaric> I'm not sure to understand this "random"
[14:22:22] <rspijker> because, looking at your sh.status, the end of everything seems to be exactly the same...
[14:22:29] <romaric> yes it is what I said
[14:22:34] <remonvv> is it what you did? :)
[14:22:39] <romaric> no
[14:22:50] <romaric> the end is 00000000
[14:23:04] <rspijker> well… they are all “AAT72”xxxxx”AAAAAAAAAAA”
[14:23:12] <romaric> yes
[14:23:14] <rspijker> if the xxx is the time bit, it will be increasing
[14:23:23] <rspijker> so, regardless of where you put it in the id
[14:23:24] <romaric> the xxx is the random part
[14:23:30] <rspijker> you will still have monotonically increasing ids
[14:23:32] <remonvv> romaric: Think of it like this; say you have two shards with one chunk each; the first chunk takes shard key values 0-9 and the other 10-19, if you insert 0,1,2,3,4,5,6 they'll all end up in shard 1 and shard 2 will be idle
[14:23:33] <rspijker> define random...
[14:23:39] <romaric> AAT72 is the time
[14:23:42] <remonvv> I'm not sure if I can ELI5 it more than that.
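(One concrete way out of exactly this in 2.4, for the record: a hashed shard key, which makes insertion order irrelevant to chunk targeting. Database and collection names below are placeholders:)

    sh.enableSharding("mydb")
    sh.shardCollection("mydb.docs", { _id: "hashed" })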
[14:24:33] <romaric> remonvv : ok, what happens if I bulk insert {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19} ?
[14:24:50] <rspijker> so, the xxxx part is actually random? Or, more importantly, if I take two documents out of your sequence will the xxx of the first always be smaller than that of the second?
[14:24:50] <saml> Nodex, what's hint() ?
[14:24:56] <saml> error: { "$err" : "bad hint", "code" : 10113 }
[14:25:06] <saml> db.docs.find(...).sort(...).hint()
[14:25:08] <romaric> rspijker : no
[14:25:11] <rspijker> it’s a cursor method saml
[14:25:19] <saml> maybe i'm using old version
[14:25:56] <saml> oh i see
[14:25:58] <saml> whoa
[14:26:02] <remonvv> saml : find().hint()
[14:26:10] <saml> how can i drop a specific index?
[14:26:12] <saml> not all indexes
[14:26:27] <saml> dropIndex
[14:26:36] <romaric> imagine a nanotime: 1234567891012131: we do that : 123456 131 7891012 and we split on the 1 of the 131
[14:26:37] <saml> i think now i'm mongodb dba
[14:27:09] <rspijker> romaric: ok… so you shuffle the time…
[14:27:13] <rspijker> do you do it exactly like that?
[14:27:14] <romaric> yep
[14:27:21] <romaric> yep
[14:27:39] <rspijker> well… that’s still fairly close to monotonically increasing, isn't it
[14:27:42] <remonvv> romaric : Define "we split on 131". Right now you're not helping sharding at all.
[14:27:52] <romaric> yes rspijker I agree
[14:27:56] <rspijker> remonvv: he puts the 131 in the middle
[14:27:56] <remonvv> ...
[14:28:04] <rspijker> but that will be monotonically increasing
[14:28:13] <remonvv> rspijker: Ya
[14:28:23] <rspijker> so, that’s bad mmkay
[14:28:27] <saml> yah .hint() doesn't help
[14:28:37] <rspijker> like remonvv just told you and I told you yesterday
[14:28:45] <saml> how do I figure out what index I should set?
[14:29:11] <romaric> sorry but I'm not sure
[14:29:16] <remonvv> I'm trying to think of a way to explain what's going wrong.
[14:29:24] <Nodex> saml : that's up to you
[14:29:26] <romaric> ok thank you
[14:29:26] <saml> for db.docs.find({date:{$exists:1}, count:{$exists:1}, status:'published'}).sort({date:-1, count:-1})
[14:29:37] <saml> i'm not using 2.6 so no sparse index
[14:29:45] <Nodex> then that won't use an index
[14:29:46] <rspijker> romaric: it’s like I said yesterday. All of the writes will always go into your max chunk
[14:30:06] <saml> i could change the query to: db.docs.find({date:{$gt:ISODate('1900-01-01')}, count:{$gt:0}, status:'published'}).sort({date:-1, count:-1})
[14:30:20] <remonvv> romaric : Say you have 20 people in front of you. 10 girls and 10 guys. On your left you have a pink car for the girls and on the right you have a blue car for the guys. Your goal is to get all 20 people in their car as quickly as possible.
[14:30:29] <saml> db.docs.ensureIndex({date:-1,count:-1,status:1}) after this, that query is fast
[14:30:34] <remonvv> What you're doing is put the 10 girls in line first and then the 10 guys.
[14:30:45] <Nodex> you could try a compound on date:-1, count:-1, status:1
[14:30:46] <saml> but i just blindly ensure index by using all fields appearing in the query and sort
[14:30:57] <saml> is there a more systematic way of approaching index set up?
[14:31:10] <Nodex> not sure how confidently the planner will pick it without being on 2.6
[14:31:13] <saml> yah that's what i did. but that's just a hunch
[14:31:33] <saml> planner picks that index (date:-1,count:-1,status:1)
[14:31:44] <remonvv> romaric : Make sense? You want to shuffle the guys and the girls so they can get in their cars at the same time rather than the guys having to wait for all the girls to get in their cars.
[14:31:53] <romaric> ok remonvv, I really understand this, let me explain why i still think the shardkey is good
[14:31:57] <saml> so do i set one index per query?
[14:31:57] <remonvv> I get the feeling I'm overdoing it on the analogies here..
[14:32:02] <romaric> lol
[14:32:04] <remonvv> romaric : Shoot..
[14:32:10] <saml> if i have 50 different kinds of queries, set up 50 compound indexes?
[14:32:13] <Nodex> saml: is it using the index?
[14:32:20] <Nodex> and yes, index per query
[14:32:35] <Nodex> obviously some compound indexes cover others so you can be smart about it
[14:32:40] <rspijker> saml: there is dex, which gives you index suggestions based on your queries
[14:32:42] <saml> "indexOnly" : false,
[14:32:55] <romaric> imagine your shardkey are small number with 4 digits
[14:32:58] <saml> "nscannedObjects" : 9175, "nscanned" : 13707,
[14:33:04] <romaric> 1111 1112 1113 1114
[14:33:05] <saml> .count() is 9175
[14:33:11] <rspijker> saml: if you don't feel confident you can figure it out yourself, you can use it to get you started
[14:33:13] <Nodex> then it didn't use it
[14:33:16] <romaric> you split your chunk to split on the second digit
[14:33:19] <romaric> here
[14:33:26] <romaric> they all go in the first shard
[14:33:41] <romaric> but if you move the last digit at the second offset, your splitting is good
[14:33:45] <romaric> isn't it ?
[14:33:55] <romaric> http://pastebin.com/bfcyxwzZ
[14:33:59] <saml> even with .hint(..), .explain() says it's not used
[14:34:04] <romaric> this is what it says
[14:34:33] <saml> https://github.com/mongolab/dex rspijker thanks
[14:34:41] <Nodex> you might have to put in a sort for status
[14:34:51] <Nodex> it's redundant but it might use the index then
[14:35:26] <saml> my goal is to get indexOnly: true ?
[14:36:04] <saml> https://gist.github.com/saml/2300b948b9e8f6e12acc
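(A common rule of thumb for this kind of query, a variation on Nodex's suggestion: equality-matched fields first in the compound index, then the sort fields. A sketch against saml's query:)

    db.docs.ensureIndex({ status: 1, date: -1, count: -1 })
    db.docs.find({ status: 'published', date: { $gt: ISODate('1900-01-01') }, count: { $gt: 0 } })
           .sort({ date: -1, count: -1 })
           .explain()   // check which index the plan uses and how nscanned compares to n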
[14:36:36] <rspijker> romaric: I gave it some more thought and if you are doing it exactly like you’re saying you are, then your keys won't be monotonically increasing
[14:36:39] <romaric> remonvv : do you think I'm crazy ;) ?
[14:37:32] <remonvv> romaric : I don't think my assessment on your mental health will move this forward
[14:37:39] <romaric> lol
[14:39:01] <rspijker> ok, your shard key is good
[14:39:03] <remonvv> romaric : If you pick any random 2 documents from your batch, what are the odds of one going to shard 1 and the other going to shard 2
[14:40:05] <romaric> the odds of doc 1 going to shard 1 is 1 out of 2 and the same for doc 2
[14:40:23] <remonvv> rspijker: Not necessarily, depends on how his batches are ordered. He's basically doing the equivalent of turning YYYY:MM:DD hh:mm:ss into hh:YYYY:MM:DD mm:ss if I understand correctly.
[14:40:49] <remonvv> I might not have understood correctly.
[14:40:58] <rspijker> the gist of it, yes
[14:40:59] <romaric> By the way, when we do a batch insert of 20k we observe 10k each side
[14:41:10] <rspijker> but it looks like he’s taking the last 3 digits of nanotime
[14:41:15] <rspijker> so that should be fairly random
[14:41:18] <remonvv> Ah, yeah.
[14:41:27] <rspijker> whereas hours can be fairly bad
[14:41:52] <romaric> it's more turning YYYY:MM:DD hh:mm:ss into YYYY:MM:DD ss:hh:mm
[14:42:35] <remonvv> romaric: That would not be a good solution.
[14:43:02] <rspijker> why not?
[14:43:08] <rspijker> well, seconds is still too granular
[14:43:23] <rspijker> but if we assume nanoseconds, conceptually, why not?
[14:43:24] <romaric> lol that's why its nano ;)
[14:44:58] <remonvv> Hm, let me think. Maybe I took a wrong turn in my head as well.
[14:45:11] <remonvv> How many digits of the nanotime are you putting in front? 3?
[14:45:18] <romaric> yes
[14:46:14] <remonvv> The last 3?
[14:46:19] <romaric> 16 digits -> 7 first + 3 last + 6 remaining
[14:46:40] <remonvv> rspijker: You're right
[14:46:50] <rspijker> always nice to hear
[14:46:56] <rspijker> romaric: can you try running this: https://github.com/comerford/mongodb-scripts/blob/master/AllChunkInfo.js
[14:46:58] <romaric> ^^
[14:47:06] <rspijker> it will give you an idea on how well split stuff actually is
[14:47:06] <romaric> yes
[14:47:27] <rspijker> because, if you presplit the chunks, it might just be split incorrectly
[14:47:28] <remonvv> Is the batch inserted in the same order as you're generating the _id values?
[14:48:07] <remonvv> He said he had 10k each. As far as correct splitting goes it seems distributed.
[14:48:20] <rspijker> yeah, but he has way more than 2 chunks
[14:48:34] <rspijker> I’d like to know if it’s actually split properly or not
[14:48:35] <remonvv> Did you see concurrent inserts of roughly equal throughput on both shards during your insert batch?
[14:48:40] <rspijker> although it probably should be
[14:48:58] <rspijker> My gut feel still tells me that mongos just doesn’t handle bulk inserts concurrently
[14:50:12] <romaric> remonvv : I do not understand what you mean
[14:50:16] <remonvv> rspijker: I don't think there's any splitting issue that would result in what he's seeing but it's always good to have a look.
[14:50:30] <remonvv> romaric : If you do mongostat --discover during your insert what do you see?
[14:50:38] <remonvv> romaric : Pastie it
[14:51:56] <romaric> ok, wait for it
[14:52:02] <rspijker> what version are you on romaric ?
[14:52:58] <rspijker> because https://jira.mongodb.org/browse/SERVER-10723 seems to be only resolved in 2.6
[14:54:07] <romaric> 2.4
[14:54:56] <remonvv> lol
[14:54:57] <remonvv> well
[14:55:19] <remonvv> Someone didn't search for "slow bulk insert"
[14:59:32] <romaric> http://pastebin.com/A9T9k12P mongostat --discover
[14:59:37] <romaric> remonvv
[15:00:45] <romaric> the next step in our project is to upgrade to 2.6, maybe it's gonna be the actual step ;)
[15:01:55] <rspijker> 2.6 upgrade will likely resolve it for you
[15:02:04] <rspijker> until then, just loop through the inserts in your app
[15:02:12] <rspijker> see how that performs in sharded vs single mongod
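(Roughly what that 2.4 workaround would look like from the shell; docs and coll stand in for the array being inserted and the target collection, and the shuffle matters because, per the discussion above, a sorted batch is the worst case:)

    // insert one by one in randomized order instead of one sorted batch
    docs.sort(function () { return Math.random() - 0.5; });   // quick-and-dirty shuffle; biased, but fine for a test
    docs.forEach(function (d) { db.coll.insert(d); });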
[15:03:21] <remonvv> Hm, this mongostat dump looks exactly like a sequential insert. Unless 2.6 does black magic I don't think that will resolve this.
[15:04:52] <rspijker> well…. if the mongos is currently translating bulk inserts into separate inserts, very badly, some simple logic could vastly improve that
[15:05:02] <remonvv> Yeah but it isn't.
[15:05:12] <remonvv> Separate inserts would be slow but still distributed
[15:05:23] <rspijker> this is true…
[15:05:31] <remonvv> This is exactly what the equivalent of putting all the girls in the car first would look like.
[15:05:44] <rspijker> (I haven’t actually checked the mongostat output, I was quite pleased with myself after finding that issue)
[15:06:14] <remonvv> romaric : Are you sure your test is valid? As in you're not inserting in the same order as your key?
[15:06:16] <rspijker> basic rule of thumb, when using people and car analogies, always use clowns...
[15:07:12] <remonvv> but I'd need two types of clowns..
[15:08:29] <rspijker> with and without a hat
[15:09:01] <remonvv> And the ones with the hats would go in the car with the higher roof?
[15:09:10] <remonvv> I feel that would have overcomplicated the analogy
[15:09:21] <rspijker> true, but… clowns
[15:09:35] <remonvv> Yeah I'm with you...just saying...clowns add complexity.
[15:09:59] <rspijker> there’s always a price to pay when clowns are involved
[15:10:00] <cheeser> but they make up for it!
[15:10:08] <rspijker> worth it though
[15:12:14] <q85> Questions for you ladies and gents. Why does the primary drop connections when I apply a new config to hide a secondary? Does anyone know of an elegant way to take secondaries down without interrupting the primary?
[15:18:03] <romaric> remonvv
[15:18:58] <romaric> I sort the list of documents before inserting them because without doing that, the mongos ends up doing a small batch insert on shard 1, then on shard 2, then on shard 1, etc...
[15:19:38] <romaric> but by sorting the batch insert i end up sorting the girls & the boys
[15:21:10] <remonvv> romaric : If you sort it you're basically creating the worst case scenario
[15:23:26] <remonvv> romaric : Shuffle your batch and do single inserts. See what happens.
[15:23:53] <romaric> we benchmarked this
[15:24:02] <romaric> and single inserts are 10 times slower than bulk insert
[15:27:24] <remonvv> romaric : 10 times slower than a couple of thousand docs per second?
[15:27:56] <romaric> single insert vs batch insert
[15:28:02] <romaric> batch insert is 10 times faster
[15:28:11] <romaric> single inserts are 10 times slower ;)
[15:28:24] <romaric> on non sharded system
[15:28:36] <remonvv> Well yes, but have you tried single inserts on a sharded system?
[15:28:50] <romaric> no
[15:29:12] <romaric> do you really think that could be faster than a bulk operation ?
[15:29:30] <rspijker> due to the issue there apparently is in 2.4, yes...
[15:29:51] <rspijker> probably won’t be faster than a bulk insert on a single shard though (!)
[15:30:02] <romaric> should not be
[15:30:21] <romaric> well we will probably upgrade for 2.6 and see
[15:31:18] <remonvv> I suspect that will solve exactly nothing in your specific test but keep us posted ;)
[15:33:40] <romaric> I'll keep you posted! Do you have any idea other than upgrading, remonvv? Let's say the single inserts are faster than the bulk operation on a sharded system (already weird to write this), what does it mean ?
[15:36:26] <remonvv> romaric : It would mean that the single inserts can be applied in parallel whereas your bulk inserts can do so to a lesser extent. Bulk inserts are not a magic bullet. They're a bit faster...most of the time...
[15:53:02] <joshua> wawaweewa whats going on with MMS
[15:53:38] <joshua> cluster view changed
[16:34:36] <jamis> is it possible to do an upsert and $push on an array that is deeper than 1 field? e.g: db.collection.update(query, { $set: {extracted: {hello: {world}}}, $push: {extracted: {array_field: {val:1}}} }, upsert=True)
[16:35:51] <jamis> in this case, array_field is inside the extracted dict. I need to create or update the extracted dict and push values onto extracted.array_field.
[16:36:35] <joshua> Have you tried using dot notation?
[16:37:19] <jamis> joshua I tried doing {$push: {'extracted.array_field': {val:1} }} using pymongo and didn't have luck
[16:39:07] <jamis> @joshua error when trying dot notation: "pymongo.errors.OperationFailure: Cannot update 'extracted' and 'extracted.array_field' at the same time"
[16:39:08] <joshua> Hmm. I thought it worked on the shell anyway. Never tried with pymongo (Or if I have, I don't remember what I ended up doing)
[16:43:16] <joshua> That error sounds like it might work, just not both at once so maybe you have to do them one by one.
[16:45:39] <jamis> So do the upsert with the $set first and then do another update with dot notation on the nested extracted.array_field?
[16:49:14] <blahsphemer> I am able to connect to my mongodb using hostname as localhost, but if I specify my own IP address, I can't
[16:49:21] <blahsphemer> Where do I change this?
[16:50:20] <joshua> jamis: I'm probably not the best person to answer, but thats what I was thinking. You can iterate through them and perform two actions
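(Concretely, the split that joshua is suggesting would look something like this; query and the values are placeholders from jamis's example:)

    // step 1: upsert, creating the embedded document with dot notation
    db.collection.update(query, { $set: { "extracted.hello": { world: 1 } } }, { upsert: true })
    // step 2: a second update pushes onto the nested array
    db.collection.update(query, { $push: { "extracted.array_field": { val: 1 } } })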
[16:51:09] <joshua> blahsphemer: There is a --host and --port flag. By default it uses localhost anyway
[16:52:56] <blahsphemer> joshua, I found bind_ip in /etc/mongod.conf,
[16:53:05] <Kaiju> If your object looks like this
[16:53:05] <blahsphemer> Can I not change that and restart mongo?
[16:53:05] <Kaiju> {level1 : {level2: {level3: []}}};
[16:53:05] <Kaiju> and I wanted to push to the level 3 array
[16:53:07] <Kaiju> collection.update({level1: {$exists: true}}, {$push: {'level1.level2.level3': 'value'}});
[16:53:09] <Kaiju> $push implies a $set
[16:53:24] <blahsphemer> joannac, Works! Ty
[17:18:59] <gsd> i'm trying to use the { fullResult: true } option for collection.update
[17:19:09] <gsd> but it isn't returning the full document - just additional meta info
[17:19:27] <gsd> is there something else I need to do?
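(gsd's question goes unanswered in the log. The fullResult option in the Node.js driver returns the server's detailed write result, not the document; returning the modified document is what findAndModify is for. A shell sketch with placeholder fields:)

    db.collection.findAndModify({
        query:  { _id: 1 },                   // placeholder selector
        update: { $set: { status: "done" } },
        new:    true                          // return the post-update document
    })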
[17:43:01] <cofeineSunshine> hello
[17:43:12] <cofeineSunshine> trying to migrate from 2.4 to 2.6
[17:43:15] <cofeineSunshine> using replica set
[17:43:47] <cofeineSunshine> getting this error
[17:43:48] <cofeineSunshine> ERROR: error creating index when cloning spec: { name: "query_1_$snapshot_1", key: { query: 1.0, $snapshot: 1.0 }, ns: "bidgeon_prod_db.task_actions", background: true } error: CannotCreateIndex bad index key pattern { query: 1.0, $snapshot: 1.0 }: Index key contains an illegal field name: field name starts with '$'.
[17:44:00] <cofeineSunshine> there is no such index on the primary replica set instance
[17:44:12] <cofeineSunshine> but it gets this error
[17:44:36] <cofeineSunshine> removed this index from the primary
[17:44:54] <cofeineSunshine> but the replica hits this error and restarts from the beginning
[17:50:58] <Kaiju> cofeineSunshine: Followed this yet? http://docs.mongodb.org/manual/release-notes/2.6-upgrade/
[17:52:34] <cofeineSunshine> Kaiju: yes, been following it. did the db.upgradeCheckAllDBs() check. It told me about those $snapshot indexes
[17:52:37] <cofeineSunshine> removed
[17:52:59] <Kaiju> replica only or shards as well?
[17:53:04] <Kaiju> config servers?
[17:53:07] <cofeineSunshine> yes
[17:53:10] <cofeineSunshine> there is config server
[17:53:18] <cofeineSunshine> i dont get where they come from
[17:53:25] <cofeineSunshine> updated config server
[17:53:27] <cofeineSunshine> updated mongos
[17:53:44] <cofeineSunshine> now there are 2 left, the PRIMARY and SECONDARY of the RS
[17:53:49] <cofeineSunshine> no sharding at all
[17:54:01] <prateekp> can the same model have many models of itself?
[17:54:06] <prateekp> i have a model named folder
[17:54:13] <prateekp> how can i have "has_many" folders as its option
[17:55:03] <cofeineSunshine> prateekp: list of ids, or DBRefs
[17:55:51] <Kaiju> cofeineSunshine: I messed up my migration pretty bad when I did it. Got stuck in a race condition that could not be resolved. I ended up doing a mongodump backup, wiping everything, starting up new 2.6's all the way around and reimporting the data.
[17:56:16] <Kaiju> 1 shard server, 3 shards and 2 replicas
[17:56:19] <cofeineSunshine> ....
[17:56:20] <Kaiju> 3 config
[17:56:25] <cofeineSunshine> 85GB database
[17:56:41] <Kaiju> yeah I'm about 200gb myself
[17:57:01] <Kaiju> was quicker than messing with it for hours on end
[18:10:03] <cofeineSunshine> but where do those { name: "query_1_$snapshot_1", key: { query: 1.0, $snapshot: 1.0 }... indexes come from?
[18:10:19] <cofeineSunshine> is it related to sharding?
[18:14:17] <prateekp> is there an easy way of keeping hierarchical data?
[18:15:20] <cozby> hmm so I kinda terminated my mongo master (have a replica config setup)
[18:15:30] <cozby> I have one other instance in the replica set
[18:15:45] <cozby> I'm trying to force the other replica to become primary
[18:15:52] <cozby> but I can't do that without being on the primary
[18:15:53] <cozby> ..
[18:18:38] <cozby> how do you get around this?
[18:18:51] <cozby> I keep getting this error: replSetReconfig command must be sent to the current replica set primary.
[18:22:19] <cozby> ah NVM
[18:22:26] <cozby> just found out about force:true
[18:30:19] <cozby> crap - that didn't work
[18:30:32] <cozby> now I'm getting exception: need most members up to reconfigure, not ok
[18:30:45] <cozby> do I have to start from scratch or something?
[18:30:58] <cozby> (this is a good exercise in data recovery)
[18:31:20] <joshua> How many members do you have?
[18:31:37] <joshua> If you only have 2, then it won't fail over properly
[18:32:30] <cozby> I only have 1 now
[18:32:39] <cozby> one defunct and the secondary
[18:32:45] <cozby> I'm trying to promote as primary
[18:33:13] <cozby> I can't add any members to this either because, well, I'm not primary
[18:33:15] <cozby> so... :S
[18:33:46] <cozby> joshua: any ideas?
[18:43:12] <jekle> prateekp: http://docs.mongodb.org/manual/applications/data-models-tree-structures/
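(From that docs page, the simplest option is the parent-references pattern; a minimal sketch for prateekp's folders:)

    db.folders.insert({ _id: "root",     parent: null })
    db.folders.insert({ _id: "projects", parent: "root" })
    db.folders.find({ parent: "root" })     // a folder's children
    db.folders.ensureIndex({ parent: 1 })   // keep child lookups indexed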
[18:43:26] <joshua> cozby: if you stop both, then start one up that has your data, then start the second one with the data removed it should sync back up
[18:43:59] <joshua> But you really should add a 3rd in, even if its just an arbiter that doesn't hold data
[18:44:52] <joshua> And if all else fails you can start a node up as a single db without replication enabled and you can get your data OR do a dump from the data directory with the server stopped
[19:26:58] <Zelest> ts33kr, fix your shit please. :(
[19:30:49] <kali> cozby: you must reconfigure with the force option on the remaining secondary, and discard the broken replica
[19:31:11] <kali> cozby: and then, make sure you setup an arbiter. a replica set with two nodes is a disaster waiting to happen
[19:31:38] <kali> cozby: well, yours has actually happened
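(The procedure kali is describing, as it would run on the surviving secondary; the host names are placeholders:)

    cfg = rs.conf()
    // keep only the surviving member, dropping the dead primary from the config
    cfg.members = cfg.members.filter(function (m) { return m.host === "survivor.example.com:27017"; })
    rs.reconfig(cfg, { force: true })
    // afterwards, add an arbiter so a two-data-node set can still elect a primary
    rs.addArb("arbiter.example.com:27017")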
[19:32:08] <joshua> Its not a disaster its a learning experience :)
[19:33:42] <kali> it's not exclusive :)
[20:09:36] <michaelchum> I would like to aggregate for documents whose value for a key is not one thing OR not another, such as [ { $match: { $or [{ "mykey": { $ne: "myvalue1" } }, { "mykey": { $ne: "myvalue2" } } ] } } ]
[20:10:22] <michaelchum> But it gives me some errors: "Command failed"
[20:11:05] <michaelchum> Any ideas how I can do NOT MATCH something or something with aggregation?
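(Two separate problems there: $or is missing its colon, which is why the command fails outright, and an $or of two $ne clauses matches every document anyway, since any value differs from at least one of the two. "Neither value" is $nin:)

    db.collection.aggregate([
        { $match: { mykey: { $nin: [ "myvalue1", "myvalue2" ] } } }
    ])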
[21:56:43] <russql> i guess what i'm asking for is not possible
[22:07:00] <djlee> Hi all, anyone here using mongohq ?
[22:08:09] <rafaelhbarros> no, everybody here uses mysql
[22:08:13] <rafaelhbarros> the best database
[22:08:13] <rafaelhbarros> ever
[22:08:23] <rafaelhbarros> if you can't use mysql, go for sqlite3
[22:08:39] <rafaelhbarros> djlee: if you need anything, shoot, people are very active here
[22:08:59] <djlee> rafaelhbarros: im going to assume that was your attempt at sarcasm, but can i just point out i said mongohq not mongodb, im not that stupid :P
[22:09:24] <rafaelhbarros> djlee: yes, I'm the stupid one with lysdexia
[22:09:35] <rafaelhbarros> djlee: I have one free account there
[22:09:50] <rafaelhbarros> djlee: it works fine, I don't do any heavy lifting
[22:09:58] <rafaelhbarros> I mostly do prototyping
[22:10:00] <djlee> i have a specific issue with mongohq where it's taking a LONG time to open a connection, and i mean it's gone from near instant on a local mongo install, to up to 1 second on mongohq
[22:10:01] <rafaelhbarros> with hq
[22:10:31] <djlee> im assuming ive done something wrong somewhere, but im not sure where.
[22:10:44] <rafaelhbarros> djlee: can you put some pastebin?
[22:10:51] <rafaelhbarros> which language?
[22:11:32] <djlee> rafaelhbarros: its PHP, abstracted by a few libraries (so theres a 3rd party lib that manages creating the connection, which uses the php mongo driver)
[22:12:19] <rafaelhbarros> ah, I don't know php
[22:14:19] <djlee> rafaelhbarros: no worries, can you tell me (im new to replicasets): every time i create a connection, should i be connecting to the replicaset uri? or should my application do some caching? Not sure if it's because im using a replicaset uri that it's got slow, or if it's network related
[22:14:50] <rafaelhbarros> my python code uses pymongo
[22:15:30] <rafaelhbarros> and it's MongoClient("mongo-server-01", replicaSet='foo')
[22:15:47] <rafaelhbarros> so, I basically put one of the replicas, it figures out which one is which
[22:16:16] <rafaelhbarros> pardon me, it's MongoReplicaSetClient
[22:16:54] <rafaelhbarros> MongoReplicaSetClient(db_uri, replicaSet="replName", slave_okay=True)
[22:17:41] <djlee> cheers rafaelhbarros, only difference i can see is i provide multiple members of the replicaset (as advised by mongohq), so maybe i'll just try the one member and see what happens, just to start ruling stuff out
[22:18:10] <rafaelhbarros> djlee: alright, let's see where it takes you
[22:19:09] <russql> is there a way to unwind multiple fields?
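(Chaining stages is the way: each $unwind handles one array field, and two of them produce the cartesian product of the two arrays. Field names below are placeholders:)

    db.collection.aggregate([
        { $unwind: "$tags" },
        { $unwind: "$scores" }
    ])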
[22:22:44] <edxt34wefc> Hello guys
[22:23:01] <edxt34wefc> Can anyone help me with a question about databases (generally)?
[22:23:33] <edxt34wefc> In case my website gets hacked, I'd like my data to be encrypted, but how can I do that?
[22:24:36] <cofeineSunshine> then you should also protect your data from your webapp
[22:25:07] <cofeineSunshine> or build the application in such a way that data is decrypted on the user's machine with a provided password
[22:25:38] <cofeineSunshine> so, how can I clean the oplog capped collection on my primary replica set instance?
[22:25:44] <cofeineSunshine> i've migrated to 2.6
[22:26:00] <cofeineSunshine> all looks good, but i have an operation that can't be replicated
[22:26:29] <cofeineSunshine> edxt34wefc: ask here. Not your personal army in private msgs
[22:28:30] <cofeineSunshine> now I have 2 secondary servers killing themselves trying to apply that operation. And on those servers the oplog collections are getting bigger
[22:29:08] <cofeineSunshine> As I understand it, SECONDARY instances read from the PRIMARY's local.oplog.rs collection and apply those operations to themselves, yes?
[22:37:17] <joannac> cofeineSunshine: yes
[22:37:30] <joannac> I'm confused. why can't your secondaries apply the operation?
[22:39:28] <djlee> hey rafaelhbarros, this is going to sound silly, but i got the replica set name out of mongohq by digging through a bunch of debug info in mongohq, which sounds like a lot of hassle for information you actually need to work with it. By any chance should i be using a more generic replica set name such as the host name or database name? At the moment i have it set to "set-<random string>" which i found in mongohq somewhere after lots of digging
[22:46:01] <rafaelhbarros> djlee: one second let me read that
[22:46:38] <cofeineSunshine> joannac: huh, I have a situation here. I was migrating my replica set from 2.4 to 2.6.2. There were some indexes with field names starting with '$'
[22:47:47] <cofeineSunshine> it was in STARTUP2 mode several times, and restarted from scratch when it encountered the faulty index (cleaned the db and started from the beginning). At some point, on the primary instance, I deleted that shitty index.
[22:48:12] <rafaelhbarros> djlee: well, with mongohq I'm not sure how you name your replicaset, you should have that in a setting somewhere
[22:48:22] <cofeineSunshine> Now i have a situation where 2 SECONDARY instances are trying to apply the removal operation for that faulty index
[22:48:24] <rafaelhbarros> djlee: and yes, it's a hassle to have to dig through logs for something you need
[22:49:25] <djlee> rafaelhbarros: yeah i was just wondering whether i had the set name wrong or was doing something silly. I may just drop tech support a note, i'm almost certain its something im misunderstanding
[22:49:57] <rafaelhbarros> djlee: also, in regards to naming: same thing as any variable ever written: MexicanFoodReplicaSet
[22:53:18] <cofeineSunshine> joannac: Now it runs forever*, trying to apply that operation, and fails
[22:53:36] <cofeineSunshine> when I try to stepDown the primary, the other server crashes
[22:53:46] <cofeineSunshine> failover doesn't work
[22:54:24] <cofeineSunshine> joannac: how can I clean oplog?
[23:01:31] <cofeineSunshine> how come the local.oplog.rs collection on a SECONDARY instance is growing?
[23:03:17] <cofeineSunshine> replication oplog stream went back in time
[23:03:20] <cofeineSunshine> WTF?
[23:03:27] <cofeineSunshine> how can I fix that?
[23:05:46] <prateekp> how to query a field with query as "abc/def"
[23:05:52] <prateekp> it is giving nil
[23:06:07] <prateekp> while the database should not give nil
[23:06:31] <joannac> cofeineSunshine: might be better off trying to initial sync it from scratch
[23:07:04] <joannac> prateekp: field == "abc/def" that exact string?
[23:07:18] <joannac> or field = "abc" OR field = "def" ?
[23:07:55] <prateekp> actually i have a document which has attribute type
[23:08:08] <prateekp> and that attribute has value "abc/def"
[23:08:31] <prateekp> now when i query Collection.where(type:"abc/def")
[23:08:32] <prateekp> i am getting nil
[23:08:45] <prateekp> @joannac ^
[23:10:05] <cofeineSunshine> joannac: I'll take your advice
[23:12:59] <prateekp> joannac : any suggestion
[23:13:00] <prateekp> @joannac : any suggestion
[23:15:02] <cofeineSunshine> joannac: but rebuilding the replica set from the PRIMARY won't clean the primary's oplog.rs collection?
[23:15:24] <cofeineSunshine> those actions will be applied, because they are in that oplog collection
[23:15:43] <cofeineSunshine> the shitty part is, to know that now I have to wait 3hrs....
[23:15:51] <cofeineSunshine> >90GB database
[23:51:41] <joannac> dammit, he's gone
[23:51:46] <joannac> it works for me btw http://pastebin.com/c6j7A9Ex
[23:52:00] <joannac> cofeineSunshine: no, it won't
[23:52:13] <joannac> but if you resync you'll be past the part of the oplog with the drop indexes