PMXBOT Log file Viewer

#mongodb logs for Monday the 16th of June, 2014

[00:10:37] <serin38> Currently having an issue restoring a db from mongohq to my local system.
[00:10:52] <serin38> Getting the error "assertion: 15936 Creating collection dbName.photos failed. Errmsg: BadValue size has to be a number"
[00:11:10] <serin38> Does this stand out to anyone?
[00:17:08] <baconmania> are db.coll.find({prop: 'val'}) and db.coll.find({prop: { $in: ['val'] }}) exactly the same, performance-wise?
[00:30:44] <Jadenn> i'm having issues trying to figure out sorting in mongo... will it not do alphabetical sort with sort()?
[00:31:03] <Jadenn> because J-Y-C and R-D-N does not look in order to me
[00:39:09] <joannac> Jadenn: more details?
[00:40:12] <Jadenn> i've applied sort to a query, and the results are mixed up
[00:41:41] <joannac> yes, i got that
[00:41:50] <joannac> do you have a reproducible small test case?
[00:42:03] <Jadenn> http://puu.sh/9vnL4.png
[00:42:28] <joannac> umm, what's the query?
[00:42:30] <Jadenn> the top three are admins, and get listed first, but should still be in order presuming the query was in order
[00:43:03] <Jadenn> well THAT'S embarrassing
[00:43:16] <joannac> let me guess, display name != what you sort on?
[00:43:20] <Jadenn> i forgot we don't store online users in mongo anymore
[00:43:34] <Jadenn> forget i said anything :P
[00:43:43] <joannac> already forgotten ;)
[00:43:48] <Jadenn> its out of order because thats the order they joined the server
[00:43:49] <Jadenn> derp derp
[03:00:40] <baconmania> so if I have a collection that has an index on a Number property `val`, for example, then if I do collection.find({ }).sort({ val: 1}).limit(1), what would be the performance characteristics?
[03:01:02] <baconmania> like would mongod sort the entire collection then return the top result?
[03:01:37] <baconmania> or would it be able to return the first result without having to do an O(n) operation
[03:52:37] <joannac> baconmania: if you have an index on val, it should be just one index bucket, plus 1 actual document
[03:52:45] <joannac> you can check with .explain()
[03:53:54] <baconmania> Yep, used explain() and you're indeed correct
[03:53:54] <baconmania> thanks :)
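
For reference, a minimal shell sketch of the check joannac suggests; the collection name `coll` is illustrative and the index creation is shown only to make the example self-contained:

    // Verify that sort({val: 1}).limit(1) walks the index instead of sorting the collection.
    db.coll.ensureIndex({ val: 1 })
    var plan = db.coll.find({}).sort({ val: 1 }).limit(1).explain()
    // With the index in place, the 2.6-style output shows a BtreeCursor on val_1, a small
    // nscanned, and scanAndOrder: false, i.e. no O(n) in-memory sort.
    printjson({ cursor: plan.cursor, n: plan.n, nscanned: plan.nscanned, scanAndOrder: plan.scanAndOrder })
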
[05:36:54] <MacWinner> what's the difference between 0 and NumberLong(0) ? do you need to use find() differently for them?
[05:37:20] <MacWinner> i have a script that searches for slidecount: 0.. it's missing the documents where slidecount = NumberLong(0)
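
A quick shell experiment (hypothetical collection name) relevant to MacWinner's question; numeric BSON types (int, long, double) are compared by value in queries, so the two can be checked side by side:

    db.numtest.drop()
    db.numtest.insert({ slidecount: 0 })
    db.numtest.insert({ slidecount: NumberLong(0) })
    db.numtest.find({ slidecount: 0 }).count()              // numeric comparison is by value, so both documents should match
    db.numtest.find({ slidecount: { $type: 18 } }).count()  // BSON type 18 (64-bit integer) isolates the NumberLong document
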
[05:53:42] <greybrd> hi guys... I'm getting this exception "the limit must be specified as a number" when aggregating using the java driver. any pointers? there is no discussion or anything about it on the Internet. please provide some inputs. thanks in advance.
[06:11:18] <joannac> specify the limit as a number?
[06:12:20] <joannac> my guess is you've done something like {"$limit": "5"} (i.e passed a string)
[06:12:36] <joannac> greybrd: ^^
[06:14:10] <greybrd> joannac: gotcha.. yeah. yes. changed to new Integer(5). I guess simple 5 would do. thanks a lot buddy.
[06:14:14] <greybrd> :-D
[06:15:28] <greybrd> it's good practice to first try aggregation/map-reduce on the command line before writing code in java.. :-D lesson learnt.
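
The shell-first check greybrd describes might look like this (collection and pipeline are illustrative); the exception above comes from passing the limit as a string rather than a number:

    db.events.aggregate([
        { $match: { type: "click" } },
        { $limit: 5 }    // a numeric literal; { $limit: "5" } raises "the limit must be specified as a number"
    ])
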
[07:56:40] <Viesti> hmm
[08:47:03] <Bish> is there some advanced way of using stored procedures? i would like to get the return value of something.
[09:00:33] <Bish> wow, i already found out, mongodb is epic
[09:02:35] <Nodex> there are no stored functions or procedures
[09:05:58] <Bish> i don't care about the terminology, everybody knows what i mean
[09:11:44] <kali> oO
[09:13:22] <Bish> is there a better way to pass arguments to a "stored js function" than using MongoDB and Scope?
[09:13:34] <Bish> MongoCode*
[09:17:46] <Bish> ...execute("function(a,b){return test(a,b);}", Array(1,2)); well, this seems to work
[09:19:02] <Bish> is there a more elegant way, because this seems to be a workaround
[09:20:54] <kali> javascript is a workaround to start with
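
For context, a rough sketch of the "stored function" pattern Bish is describing, done from the shell; the function name and arguments are made up, and db.eval needs server-side JavaScript enabled and takes a lock, so it really is the workaround kali calls it:

    db.system.js.save({ _id: "test", value: function(a, b) { return a + b; } })
    db.loadServerScripts()                                // makes test() callable from this shell session
    db.eval(function(a, b) { return test(a, b); }, 1, 2)  // arguments are passed positionally, no Scope object needed
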
[09:37:24] <spacepluk> hi, is it possible to perform fuzzy searches on collections using something like levenshtein distance?
[09:40:58] <kali> spacepluk: it's not built-in, and it's quite hard to implement efficiently on top of mongodb index scheme
[09:42:44] <spacepluk> kali: any idea where I can look? I'd be happy with any fuzzy search solution even if it's not very efficient.
[09:57:37] <rspijker> spacepluk: sounds like something you would need map-reduce for
[09:58:48] <spacepluk> rspijker: do you mean this http://docs.mongodb.org/manual/core/map-reduce/ ?
[10:01:02] <rspijker> spacepluk: yeh
[10:01:46] <spacepluk> it might work... I assume this is very slow, right?
[10:02:11] <rspijker> spacepluk: you are assuming correctly
[10:02:59] <spacepluk> I'll give it a try, hopefully it'll be good enough
[10:04:40] <rspijker> Ideally you would want something like the aggregation framework, but I highly doubt the operations supported in that will be sufficient to calculate levenshtein distances
[10:05:49] <spacepluk> the goal is some sort of fuzzy search, I'm using levenshtein on my prototype without a db but anything similar will do
[10:06:29] <rspijker> well, aggregation only has very basic expression support
[10:06:44] <rspijker> so you won’t be able to do anything fancy, doubt you can even get string length, for instance
[10:07:32] <rspijker> map-reduce gives you javascript, so you should be able to do whatever in there
[10:07:42] <rspijker> just… very slowly :)
[10:10:25] <spacepluk> thanks that's very helpful :)
[10:11:21] <rspijker> np, good luck :)
[10:11:29] <spacepluk> do you know of any database that has better support for this kind of operations?
[10:14:36] <rspijker> not me personally, some here might though
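
To make the map-reduce suggestion concrete, a very rough sketch; the collection name, field name, query term, and distance cut-off are all assumptions, and as rspijker says it scans every document in JavaScript, so it is slow by design:

    db.words.mapReduce(
        function() {   // map: compute a Levenshtein distance per document, server-side
            function lev(a, b) {
                var d = [];
                for (var i = 0; i <= a.length; i++) d[i] = [i];
                for (var j = 0; j <= b.length; j++) d[0][j] = j;
                for (var i = 1; i <= a.length; i++)
                    for (var j = 1; j <= b.length; j++)
                        d[i][j] = Math.min(d[i-1][j] + 1, d[i][j-1] + 1,
                                           d[i-1][j-1] + (a[i-1] === b[j-1] ? 0 : 1));
                return d[a.length][b.length];
            }
            var dist = lev(this.name, target);
            if (dist <= maxDist) emit(this._id, dist);
        },
        function(key, values) { return values[0]; },     // keys are unique _ids, so reduce just passes through
        {
            out: { inline: 1 },
            scope: { target: "mongodb", maxDist: 2 }     // query term and cut-off, injected into the JS scope
        }
    )
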
[10:36:43] <_jns> anyone know how to disable auto creation of databases on "use nonExistingDBName"?
[11:22:14] <MrMambo007> hi all
[11:23:54] <MrMambo007> Are you there Derick?
[11:28:06] <joannac> _jns: um, it doesn't create it until you actually write to it
[11:29:10] <joannac> MrMambo007: i suggest you ask your question so it can be answered, instead of playing pingpong with Derick
[11:32:28] <MrMambo007> I found this blog entry describing how to hold on to the connection for continuous use instead of connecting over and over again. This is what I did... but then I see a lot of other examples where this db.close() function is called... is closing connections something you would do routinely on a continuously running server handling inserts and finds sporadically based on requests from clients and other events, or is it just for situations where
[11:32:39] <MrMambo007> http://blog.mongolab.com/2013/11/deep-dive-into-connection-pooling/
[11:35:41] <MrMambo007> joannac: Point taken... I pinged him since he was of so much help last Friday. But I'll try to avoid pinging people further down the line.
[11:53:07] <_jns> @joannac: okay... so it's only in the "show dbs" list but not actually created?
[12:07:52] <joannac> dammit, he's gone :(
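
What joannac told _jns can be checked in a few shell lines (database and collection names are illustrative): `use` only switches context, and the database only shows up once something is written to it:

    use scratchdb                 // switches context only; nothing is created yet
    show dbs                      // scratchdb is not listed
    db.things.insert({ x: 1 })    // the first write creates the database (and the collection)
    show dbs                      // now scratchdb appears
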
[12:13:00] <joannac> MrMambo007: it's not the pinging, it's pinging without asking a question :)
[12:13:10] <joannac> also, connection pools are a good thing
[12:13:36] <MrMambo007> joannac: Thanks
[12:13:39] <joannac> I would say if you're sure you're done with the connection, then close it. No point keeping them open
[12:16:27] <MrMambo007> It is basically just one server instance connecting to a mongodb. Data will come in at random points in time; there are probably no natural "disconnected" phases. Maybe it is a better approach to time out the connection if it goes, say, 10 minutes with no activity, and then reconnect the next time someone makes a request or submits data to the server.
[12:53:43] <Viesti> I have a script that imports data with the new Bulk API by doing upserts
[12:54:30] <Viesti> funny that running it in parallel seems to give better update/s readings than just running one instance
[12:54:47] <Viesti> just don't understand why
[13:26:50] <michaelchum> I would like to store the results of a db.collection.find({key:value}) to a new collection, is there any simple way to do this?
[13:30:42] <rspijker> michaelchum: easiest is probably the aggregation framework
[13:30:47] <rspijker> $match and $out
[13:37:00] <michaelchum> rspijker: Got it thank you so much!
[13:37:10] <rspijker> np
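
rspijker's suggestion spelled out, with illustrative collection and field names; $match filters like find() and $out writes the result to a new collection:

    db.source.aggregate([
        { $match: { key: "value" } },
        { $out: "filtered_copy" }    // writes matching documents to filtered_copy, replacing it if it exists
    ])
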
[14:26:47] <insanidade> The following query does not work as expected. It only evaluates the first statement: Transaction.objects( db.Q(startTime__gte=stime) and db.Q(endTime__lte=etime) )
[14:26:55] <insanidade> I need it to evaluate A AND B
[14:27:03] <insanidade> But it seems to evaluate only A
[14:27:09] <insanidade> (the first statement)
[14:35:11] <cheeser> is the first statement false?
[14:42:14] <insanidade> cheeser: both are true. But looks like only the first statement is evaluated.
[14:42:27] <insanidade> cheeser: it absolutely forgets about the second statement.
[14:44:38] <rspijker> what kind of weird query is that anyway?
[14:44:45] <rspijker> in terms of syntax, that is
[14:46:29] <insanidade> rspijker: I need all transactions whose start time is greater than or equal to stime and whose end time is less than or equal to etime.
[14:46:49] <saml> mongodb has no transaction
[14:46:59] <rspijker> insanidade: I get that part, but the syntax is not mongodb syntax…
[14:47:01] <saml> so you ahve two times, stime and etime
[14:47:13] <insanidade> saml: that 'transaction' is in my app context.
[14:47:24] <rspijker> are you using some wrapper around mongodb?
[14:47:24] <saml> if stime < etime always?
[14:47:25] <saml> is
[14:47:42] <saml> can stime and etime be equal?
[14:47:57] <insanidade> rspijker: I'm using mongoengine. their channel is too quiet - looks like everyone is dead.
[14:48:15] <rspijker> what happens if you run the query on mongodb directly?
[14:48:39] <insanidade> saml: technically they could, but in most scenarios stime is lower than etime
[14:48:42] <rspijker> I don’t know anything about mongoengine, but the default mongodb query is an and… So if it goes wrong in the translation...
[14:49:04] <saml> and since mongodb doesn't have transactions, there can be cases where a collection has a doc A: {startTime: stime} B: {endTime: etime} where ordering of stime and etime isn't guaranteed
[14:49:29] <saml> especially if multiple clients (with different localtime, not carefully synced) write to collection
[14:49:42] <rspijker> saml: I think you are confusing the term transaction with the simple fact that he has a collection called Transaction…
[14:49:45] <saml> i don't think you implement application level transaction that way
[14:50:12] <saml> based on some timestamp, i mean. two timestamps startTime and endTime
[14:50:24] <saml> maybe try to implement transaction with one timestamp field
[14:50:50] <insanidade> saml: ok. forget the word 'transaction'. let's use the word 'dogs'. I have dogs that are 10 years old and dogs that are 2 years old. what I need is to retrieve all dogs that are older than 3 and younger than 9.
[14:51:12] <saml> insanidade, yup. that's doing query against single field
[14:51:18] <saml> but you're doing against two fields
[14:51:27] <insanidade> saml: on the same document.
[14:51:37] <insanidade> saml: wouldn't that work?
[14:51:43] <saml> then you have a dog, not dogs
[14:52:45] <saml> dogs.find({age:{$lt: A, $gt: B}}) where A > B
[14:52:54] <rspijker> it should be absolutely possible
[14:53:18] <rspijker> in fact, it is
[14:53:38] <rspijker> I suspect this layer you have in between… can you just run the query on mongodb directly insanidade and see if you get results?
[14:53:57] <saml> but i assume you're doing var marker = dogs.find({_id: marker_id}); dogs.find({age:{$gt: marker.startTime, $lt: marker.endTime}})
[14:54:08] <saml> it's not gonna work. i had the same bug last week :P
[14:54:19] <saml> but not sure if you're doing the same thing
[14:54:28] <insanidade> rspijker, saml yes, I'm trying to do it directly on mongodb. please, take a look at what I'm trying to say: http://pastebin.com/bVf8wv1n
[14:55:02] <saml> the buggy code was first finding "marker dog" and do date range query based on values of the marker
[14:55:13] <rspijker> insanidade: so, you only expect the last result?
[14:55:43] <insanidade> rspijker: based on that example, yes. let me give you another one.
[14:57:34] <rspijker> just try this in the mongodb shell, assuming your collection is named Q: db.Q.find({$and:[ {“startTime”:{$gte:10}} , {“endTime”:{$lte:60}} ]})
[14:57:47] <rspijker> the $and should be superfluous, but just in case
[14:58:03] <rspijker> you might have to sort out my IRC quotes btw...
[14:58:34] <rspijker> db.Q.find({$and:[ {"startTime":{$gte:10}} , {"endTime":{$lte:60}} ]})
[14:58:41] <rspijker> that should work though
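
rspijker's query again with plain quotes, in both forms; they are equivalent because separate top-level conditions on different fields are already combined with AND:

    db.Q.find({ "startTime": { $gte: 10 }, "endTime": { $lte: 60 } })
    db.Q.find({ $and: [ { "startTime": { $gte: 10 } }, { "endTime": { $lte: 60 } } ] })
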
[14:59:02] <insanidade> take a look at this other sample output: http://pastebin.com/g3zkt7dB
[14:59:50] <McSorley> Hi all, I am parsing a CSV file of aggregations and I want to insert/update documents in Mongo. I was hoping to batch the writes as there are upwards of 1 million documents being created/updated. What is the most efficient way of doing this?
[15:01:33] <romaric> Hi everybody! We just sharded our biggest collection; we expected to double our insert rate but we do not see any improvement. Does anyone have an idea of what is wrong? :)
[15:01:34] <rspijker> McSorley: probably bulk insert
[15:01:52] <rspijker> McSorley: that is, create arrays of documents and pass those to insert
[15:02:08] <rspijker> romaric: what are you sharding on?
[15:02:32] <rspijker> if it’s on _id (and you use the default) all writes will always go to the latest chunk, which will always be on a single node, so you have no write scaling
[15:02:46] <insanidade> I don't understand... the following query returns nothing: db.transactions.find({startTime:{$gt:1 } })
[15:03:06] <rspijker> insanidade: and db.transactions.find() ?
[15:03:15] <insanidade> there are lots of documents whre startTime is greater than 1
[15:03:15] <cheeser> is startTime an int or a Date?
[15:03:16] <rspijker> (are you actually on the correct db?)
[15:03:18] <saml> insanidade, you want to get all docs that took 10 seconds or less ?
[15:03:40] <rspijker> if you don't use the shell very often.. by default it connects to the test db
[15:03:45] <romaric> rspijker, we are sharding on the field _id, which is our own binary id. We observed that data is well spread between our two shards but the insert rate is the same
[15:03:47] <rspijker> so you have to tell it first, to use a different db
[15:04:02] <rspijker> romaric: is this _id field monotonically increasing?
[15:04:12] <insanidade> cheeser, rspijker, saml : I'm in the correct db. By issuing that query, I'd like to search for all transactions whose startTime field is greater than '1'. that field is a float.
[15:04:20] <McSorley> rspijker, I am trying to emulate "ON DUPLICATE KEY" functionality.. how can I choose what fields to update from the insert?
[15:04:28] <rspijker> the data might be very well spread out, that does not mean that NEW documents are also spread out
[15:04:31] <saml> find({startTime: {$gt:1}})
[15:04:48] <saml> find({startTime: {$gt:1}}).sort({startTime:-1})
[15:04:55] <cheeser> you're storing time as a float?
[15:05:02] <rspijker> insanidade: again, does db.transactions.find() return anything?
[15:05:08] <cheeser> pastebin the output of a findOne() on that collection
[15:05:09] <insanidade> rspijker: it does.
[15:05:24] <rspijker> then what cheeser says, paste a document as an example
[15:05:37] <saml> McSorley, have you tried mongoimport ? it might take csv
[15:06:09] <Nodex> prolly stored as a string if it's not returning anything
[15:06:10] <insanidade> rspijker, cheeser, saml : just found the issue. of course, my mistake. it's not 'transactions'. it's 'transaction'
[15:06:39] <insanidade> now that raw query in mongodb shell returns the values correctly. but the issue remains in my mongoengine layer.
[15:06:42] <rspijker> McSorley: http://docs.mongodb.org/manual/reference/method/db.collection.update/
[15:07:03] <McSorley> saml, I did, that can upsert. I need to do further operations on the data
[15:07:31] <romaric> rspijker, we presplit our chunks on 2 ranges of our shardkey, so we can see data going on the two shards
[15:07:35] <rspijker> insanidade: ok, so it is in that layer. Then I have no idea
[15:08:08] <saml> i massage data. output json. then run mongoimport. but it's okay to write directly
[15:08:10] <rspijker> romaric: that doesn’t really answer my question…
[15:08:11] <McSorley> rspijker, Eh thanks for that. The issue is how do you pass multiple documents to update at once with that interface "db.collection.update(query, update, options)"
[15:08:21] <saml> McSorley, you can't
[15:08:22] <romaric> ok
[15:08:30] <romaric> the shardkey is monotically increasing yes
[15:08:35] <insanidade> rspijker, saml, cheeser : I'm storing that value as seconds in a float field. I could fix that later as this is just a small prototype.
[15:08:42] <saml> unless "update" doc is constant
[15:08:45] <romaric> it's a salted nanotime
[15:09:03] <McSorley> saml, no update doc is not constant
[15:09:14] <saml> yah, i do multiple updates
[15:09:38] <rspijker> romaric: then all your writes always go into your last chunk...
[15:09:45] <McSorley> saml, how? mongoimport?
[15:10:00] <rspijker> romaric: because it will be the range : someKey -> max
[15:10:11] <McSorley> http://api.mongodb.org/python/2.7rc1/examples/bulk.html the *new* bulk operations prove to be very slow.
[15:10:17] <rspijker> romaric and every key will be larger than the one before, so all go to that chunk
[15:10:25] <rspijker> hence, the exact same write performance
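
For reference, one common way around the last-chunk hotspot rspijker describes is a hashed shard key, which spreads monotonically increasing values across chunks; the namespace below is illustrative, and an existing collection's shard key cannot simply be swapped, so this is only a sketch of the alternative:

    sh.shardCollection("mydb.events", { _id: "hashed" })   // hashed key: insertion order no longer maps to one "last" chunk
    sh.status()                                            // chunks of the hashed range end up spread across shards
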
[15:10:47] <saml> McSorley, in some cases. other cases, for _id in ids: coll.update({_id: _id}, {$set: docs[_id]}, {upsert:false})
[15:10:53] <rspijker> McSorley: can you give us an example of what you are actually trying to do?
[15:11:10] <saml> i didn't know about bulkwriter
[15:12:09] <rspijker> McSorley: There is http://docs.mongodb.org/manual/reference/method/js-bulk/ nowadays, but I’ve never used it
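
A sketch of the unordered bulk-upsert pattern from the js-bulk page rspijker links, with illustrative collection and key names and an in-memory `docs` array standing in for the parsed CSV rows:

    var bulk = db.aggregates.initializeUnorderedBulkOp();
    docs.forEach(function(doc) {
        // update the existing aggregation if the key already exists, insert it otherwise
        bulk.find({ aggKey: doc.aggKey }).upsert().updateOne({ $set: doc });
    });
    printjson(bulk.execute({ w: 1 }));   // reports nUpserted / nMatched / nModified
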
[15:12:47] <saml> McSorley, what are you really concerned about multiple .update()? performance?
[15:12:58] <romaric> humm sorry, it's not monotonically increasing, we move the last 3 chars (nanos) into the middle of the long, which results in numbers that look random. This is correct because we observe our data going into our two chunks at the same time (mongostat displaying inserts for the two chunks). The balancer is disabled
[15:13:32] <saml> some use mongo-connector or custom oplog tailing for production data migration, bulk import.. etc
[15:13:33] <McSorley> rspijker, Basically I have many documents (many as in 1,000,000+) being read from a file. I want to insert them, though in some cases they will need to be updated as they already exist
[15:13:49] <saml> like, you write to one db, and tail oplog to propagate the change to other db or collection
[15:13:50] <McSorley> saml, speed
[15:14:30] <saml> for my setup, it takes about 2 seconds to update 1000~ documents
[15:14:54] <saml> i'm not too concerned :P
[15:16:08] <rspijker> saml: well… his 10M would then take 20,000 seconds, which is significant (about 6 hours)
[15:16:15] <McSorley> saml, my issue is I need to "upsert" 1.2 million documents in a matter of seconds :/
[15:16:31] <saml> whoa
[15:16:44] <saml> but why?
[15:17:01] <McSorley> saml, Oh dont even get me started why :-)
[15:17:06] <saml> you get 1.2 million docs as batch from somewhere else?
[15:17:10] <cheeser> everything is a matter of seconds. it's just a question of scale. ;)
[15:17:17] <saml> and need to update your collection with the docs right away?
[15:17:52] <McSorley> saml, yup
[15:17:58] <romaric> rspijker, humm sorry, it's not monotonically increasing, we move the last 3 chars (nanos) into the middle of the long, which results in numbers that look random. This is correct because we observe our data going into our two chunks at the same time (mongostat displaying inserts for the two chunks). The balancer is disabled
[15:18:24] <saml> how big is 1.2 million batch? 10GB?
[15:18:41] <saml> i think it can be written in seconds if size is small
[15:18:43] <rspijker> romaric: hmmmm, okay… what is the performance you are seeing? inserts/second
[15:19:01] <saml> all those small benchmarks insert millions in seconds
[15:19:38] <rspijker> the difficulty won't be the insert. It will be to determine for each document whether it should be inserted or updated
[15:20:25] <saml> ther is {upsert:true}
[15:20:49] <McSorley> saml, actually I may have over exaggerated a little. Total file size: 25Mb, 698356 documents in total
[15:21:23] <romaric> rspijker, 5k-10k/s
[15:21:27] <McSorley> saml, i forgot I had aggregated the documents in memory already
[15:21:42] <saml> should be fast enough
[15:22:15] <saml> did you try loop and update?
[15:22:20] <McSorley> saml, I need to batch the write
[15:22:43] <saml> i could not find a way for a single .update() call that updates multiple docs
[15:23:09] <McSorley> 600,000 individual writes takes a lot longer than 6 x 100,000 writes.
[15:23:17] <saml> but for my case, bottle neck wasn't .update() but actually collecting docs to update
[15:23:41] <rafaelhbarros> when should I start worrying about the size of a collection?
[15:23:47] <rafaelhbarros> 100m records?
[15:24:18] <saml> McSorley, i haven't tried this http://docs.mongodb.org/manual/reference/method/Bulk/ you said this is slow?
[15:24:29] <McSorley> saml, yes.
[15:24:42] <saml> mongodb is slow as fuck. delete and give up
[15:25:12] <saml> maybe try different writeconcern
[15:25:29] <rspijker> romaric: sure there isn’t another bottleneck? Network between mongos and mongod, or you app and mongos?
[15:25:32] <rafaelhbarros> McSorley: what are you doing? 600k writes on a collection with all fields indexed?
[15:28:09] <McSorley> rafaelhbarros, No. I have a CSV file with "aggregated" data. I basically need to write that data to mongo, but also if the same aggregation appears I need to be able to update the existing aggregation
[15:28:56] <rafaelhbarros> McSorley: does your destination collection have a bunch of indexes?
[15:29:02] <rafaelhbarros> every write has to update indexes
[15:29:12] <saml> you can provide the csv file? just curious
[15:29:19] <McSorley> rafaelhbarros, no indexes as of yet bar the default _id idx
[15:29:21] <romaric> for sure the app is a bottleneck for the insert speed, but the same app should be faster on a sharded mongod than on a non-sharded one. The team is looking for those bottlenecks but I'm looking for something we could have forgotten
[15:29:26] <rafaelhbarros> McSorley: alright, seems ok
[15:29:44] <rafaelhbarros> McSorley: do you time just the insertion of the index on the db, or the time to load the csv file as well?
[15:29:56] <rafaelhbarros> McSorley: which language are you using? can you send us some profile of the ops?
[15:30:19] <rafaelhbarros> I'm sorry, I just got here and I'm trying to gather the data
[15:30:24] <McSorley> rafaelhbarros, i separate these.. i dont care about the file processing, this takes approx 2 secs
[15:30:52] <rafaelhbarros> ok, can you thread? are you doing fire-and-forget writes?
[15:31:03] <rspijker> romaric: well, if the app is the bottleneck, then why would you expect it to be faster on a sharded system? :s
[15:32:12] <McSorley> rafaelhbarros, saml, rspijker: To give some context, I can insert 600,000 documents in 15 secs with a w=1 and 9.4 secs with a w=0. I do this by chunking a list into 6 separate chunks and inserting 100,000 documents at a time
[15:32:40] <McSorley> rafaelhbarros, python if it matters
[15:32:49] <saml> McSorley, using Bulk ?
[15:32:51] <romaric> That's not what I mean, the time measured is only the inserts, that should be faster on sharded system
[15:33:09] <McSorley> saml, using plain coll.insert([...])
[15:33:10] <saml> and Bulk is slower than simple for loop ?
[15:33:13] <rspijker> romaric: “time measured” this is new?
[15:33:17] <romaric> but it acts as if it wasn't threaded
[15:33:17] <saml> oh okay
[15:33:38] <romaric> this is the elapsed time between the insert command start and its end
[15:34:24] <rspijker> “the insert command”, is this a single insert? Or your internal insert command which inserts a whole bunch of docs, or what?
[15:34:50] <romaric> this is batch inserts
[15:35:17] <rspijker> ok, so you call insert with an array of documents?
[15:35:28] <romaric> yes
[15:36:13] <rspijker> and you're sure the network is not the bottleneck?
[15:36:22] <saml> McSorley, if you separated input into chunks, maybe try multiprocessing so that chunks are written in parallel :P
[15:37:08] <romaric> I'm asking the network team
[15:37:29] <romaric> maybe we have some test to do on it to be sure
[15:38:18] <rspijker> romaric: were you running into hardware limitation before the sharding? So was your single shard maxing out on IO or CPU or RAM during high load?
[15:38:24] <romaric> Assuming the network is not the bottleneck, do you have any other idea of what could be the issue ?
[15:39:22] <rspijker> not really…
[15:39:47] <romaric> k
[15:40:27] <romaric> I do not think that IO/CPU/RAM was maxing out 'cause the app is doing stuff between batch inserts
[15:41:45] <rspijker> that’s why I asked: during load. If that wasn’t the case, then the bottleneck was somewhere else…
[15:41:53] <rspijker> anyway, I’ve got to go. Good luck :)
[15:41:57] <romaric> thx u
[15:41:58] <romaric> !
[16:59:13] <Kaiju> Hello, I'm looking for a best approach suggestion for reporting with mongo. I have a large number of user documents in a single collection. At the end of each day I want to run a report to show logins, transaction and such. I can stream all the documents out of the database and parse the results in code, but this is pretty inefficient. Is there a better alternative?
[18:09:21] <MacWinner> hi, what's the general process for upgrading a mongo 2-node replica set?
[18:09:55] <MacWinner> i've installed via yum repos.. just yum update on the secondary, restart, then yum update on primary and restart?
[18:14:02] <Hexxeh> If I have a ListField and an array, and want to find all documents where all values in the listfield are also found in the array, how do I specify that filter?
[18:26:04] <saml> Hexxeh, $all
[18:26:33] <saml> .find({sexes:{$all: ['male','female','not-really-male']}})
[18:26:51] <cheeser> aka fauxmale
[18:27:01] <Hexxeh> hmm, I'm using MongoEngine, so I'll have to figure out how that maps across
[18:27:02] <cheeser> you're welcome. feel free to use that one.
[18:27:08] <MacWinner> wow... upgrading mongo was a charm
[18:30:19] <Hexxeh> saml: hmm, doesn't appear to work: required_capabilities__all=req['capabilities']
[18:30:56] <Hexxeh> returns no objects, even though there's definitely a document that matches when that condition isn't applied, and the document has a listfield value identical to the provided array
[18:31:13] <MacWinner> the mongo 2.6.2 release notes don't seem to have been posted yet
[18:31:22] <saml> Hexxeh, give me example documents?
[18:31:22] <Hexxeh> it doesn't even match documents where the field is empty, which it should
[18:31:32] <saml> and expected result
[18:33:01] <Hexxeh> saml: http://cl.ly/W68r Job.objects.filter(required_capabilities__all=req['capabilities']), req['capabilities'] contains 'foo', I expect that document to be returned
[18:33:51] <saml> that probably means you're querying for all documents that has 'capabilities'
[18:34:08] <saml> i mean, docs like {required_capabilities: ['capabilities']}
[18:34:46] <saml> Hexxeh, you want all docs that has required_capabilities property/field ?
[18:34:52] <saml> then it's $exists
[18:35:15] <Hexxeh> saml: I want to do it the other way around. Documents represent tasks that workers can do, but tasks require certain capabilities. I'm making a request with a given list of capabilities and I want to retrieve tasks that can be completed given those.
[18:35:17] <saml> i think $exists does a full scan, though
[18:35:39] <elm> how can I do an $elemMatch on a plain array item [1,2,3] rather than an array of records?
[18:35:42] <saml> ah. try $in?
[18:36:01] <saml> docs.mongodb.org/manual/reference/operator/query/in/
[18:36:17] <saml> elm, {arr: 1} ?
[18:36:42] <saml> that'll return {arr:[1,2]}, {arr:[2,1,3]}, ...
[18:36:49] <Hexxeh> saml: Looks like that just finds any match, I need to match all capabilities in the document
[18:36:55] <joshua> $in won't match all though right, just any one in the list
[18:37:03] <saml> then $all
[18:37:12] <saml> i thought you wanted "any" match
[18:38:10] <elm> I just wanna have 1 in mycolumn where mycolumn is [1,2,3]
[18:38:16] <elm> i.e. a reverse $in
[18:38:27] <saml> elm, {mycolumn:1}
[18:39:01] <elm> but mycolumn is an array rather than a field an it should not match 1 but [1,2,3] or [1,7,8]
[18:39:31] <elm> but mycolumn is an array rather than a field and it should match 1 but [1,2,3] or [1,7,8]
[18:40:00] <Hexxeh> saml: _all also does not find it
[18:40:30] <elm> saml: what is all ? there is no $all operator and all would indicate a column identifier
[18:40:33] <Hexxeh> I'm not sure a query modifier actually exists for what I'm trying to do
[18:40:57] <saml> elm, {mycolumn:1}
[18:41:10] <saml> that means mycolumn array contains element 1
[18:41:51] <elm> ah; ! I had a typo; thanks a lot!
[18:44:21] <Hexxeh> saml: it seems what I need is the inverse of the 'all' filter
[18:44:27] <Hexxeh> but i don't think such a thing exists
[18:44:52] <saml> not?
[18:45:11] <Hexxeh> as in, all items in the listfield must also exist in the provided array
[18:45:21] <Hexxeh> as opposed to all items in the array must also exist in the listfield
[18:50:11] <saml> so you're finding subsets
[18:50:56] <Hexxeh> I guess
[18:51:13] <saml> i bet it's hard
[18:51:17] <saml> unless you do full scan
[18:51:46] <saml> give me all documents tagged with any subset of ['a', 'b', 'c']
[18:51:56] <saml> Hexxeh, how big is provided array?
[18:52:03] <saml> how many elements are there?
[18:52:14] <Hexxeh> probably 10 items maximum, usually 2-3
[18:52:41] <saml> yah i can't think of such query
[18:53:49] <Hexxeh> i guess i have to retrieve the list without the capabilities filter then find ones with matching capabilities in Python
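
For reference, the full-scan route saml mentions can also be pushed to the server with $where (field name and capability list are illustrative); it runs JavaScript per document, so it is no cheaper than filtering in Python, just closer to the data:

    db.jobs.find({
        $where: function() {
            var caps = ["foo", "bar"];                  // the capabilities this worker offers, inlined into the filter
            var required = this.required_capabilities || [];
            // keep the document only if every required capability is in the provided list
            return required.every(function(c) { return caps.indexOf(c) !== -1; });
        }
    })
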
[19:03:10] <elm> one last question: what to do if I do not need { $size: 2 } but something like { $size : { $gt: 2 } }
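
One workaround sometimes used for a "size greater than N" test like elm's (not something discussed above) is to check whether a given array index exists, since $size does not accept range operators:

    db.coll.find({ "mycolumn.2": { $exists: true } })   // index 2 exists, i.e. the array has more than 2 elements
    // for frequent queries of this kind, keeping an explicit count field next to the array scales better
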
[19:15:13] <Kaiju> It would be so cool if mongo supported a complex set of subqueries like sql does. You could sub-select on document elements and output some new derived element.
[19:15:35] <cheeser> like the aggregation framework?
[19:16:11] <Kaiju> aggregation framework?
[19:16:37] <Kaiju> well this might be the answer for a question I had
[19:18:09] <Kaiju> I have 5MM user documents and add events to the documents. At the end of the day I want to get metrics on usage. Pulling and parsing all the docs seems horribly inefficient.
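
A hedged sketch of what cheeser is pointing at; the document shape (an embedded `events` array with `date` and `type` fields) is an assumption about Kaiju's data, not something stated above:

    db.users.aggregate([
        { $unwind: "$events" },
        { $match: { "events.date": { $gte: ISODate("2014-06-16"), $lt: ISODate("2014-06-17") } } },
        { $group: { _id: "$events.type", count: { $sum: 1 } } }   // e.g. logins vs. transactions for the day
    ])
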
[19:31:22] <halfamind1> Hi. We upgraded from v2.4.10 to 2.6.1 recently, and have been seeing large memory spikes that cause mongod to be terminated. Has anyone run into this?
[20:16:12] <gswallow> Hi,
[20:16:22] <gswallow> I am an MMS customer — or my company is.
[20:16:30] <gswallow> I have two factor auth set up and I switched phones.
[20:16:42] <gswallow> Is there a way I can set up two factor on my new phone?
[20:25:49] <joshua> gswallow: are you still logged into your session anywhere?
[20:28:01] <joshua> I guess you still need your old authentication to change it even if you are logged in, so nevermind
[20:29:09] <joshua> Now I am thinking I should add the recovery code to my own account
[20:46:14] <saml> i have documents with {score:223} property
[20:46:23] <saml> how can I get distribution of those?
[20:46:51] <saml> given score x, I want to say x is at 12% of all scores
[21:02:24] <daidoji> saml: MapReduce? Agg pipeline?
[21:02:47] <saml> is there math?
[21:02:53] <saml> that calculates percentile
[21:03:31] <saml> or different db?
[21:07:48] <thevdude> Hey guys, is there an easy way to remove all documents EXCEPT a specific _id?
[21:08:23] <thevdude> oh, found it on the docs with $not
[21:09:21] <thevdude> basically db.coll.remove({_id: {$not: "id_to_keep"}}) should work, will test it with a find to make sure it doesn't have the one I want to keep
[21:10:16] <thevdude> err, $ne
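
thevdude's corrected command, dry-run first with find() as described; the collection name and the _id value are placeholders:

    db.coll.find({ _id: { $ne: "id_to_keep" } }).count()   // dry run: confirm the document to keep is excluded
    db.coll.remove({ _id: { $ne: "id_to_keep" } })         // then remove everything else
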
[21:21:09] <saml> coll.find({score:{$lt:X}}).count()/coll.count() * 100 is roughly the percentile right?
[21:22:58] <og01_> isnt it (100 / coll.count()) * coll.find({score:{$lt:X}}).count()
[21:23:02] <og01_> saml: ?
[21:23:23] <og01_> or am i failing to math
[21:25:05] <saml> og01_, same thing roughly, minus floating point math
[21:25:30] <saml> (a/b)*c = ac/b = (c/b)a
[21:26:12] <saml> but actual definition of percentile is 100(i-0.5)/n
[21:26:29] <saml> where i is 1-based index
[21:27:05] <saml> so i is coll.find({score:{$lt:X}}).count() + 1
[21:27:16] <saml> n = coll.count()
[21:27:57] <saml> coll.find({score:{$lt:X}}).count()/coll.count() is good enough, i think
[21:28:25] <saml> (coll.find({score:{$lt:X}}).count() + 1 - 0.5)/coll.count()
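
saml's estimate wrapped as a small shell helper; the collection and field names are illustrative, and since it issues two counts per call it is only a rough figure:

    var coll = db.scores;                                        // illustrative collection
    function scorePercentile(x) {
        var n = coll.count();
        var i = coll.find({ score: { $lt: x } }).count() + 1;    // 1-based rank of x
        return 100 * (i - 0.5) / n;                              // saml's 100(i - 0.5)/n
    }
    scorePercentile(223)
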
[21:31:09] <og01_> saml: your math foo is stronger than mine, I'll look into the 100(i-0.5)/n calc though, looks interesting
[21:31:27] <cozby> Hi, I just added a new member to my replica set to observe how initialSync works and I _think_ it worked, but the data size on my new replica instance is less than on all the other members
[21:31:39] <cozby> its 15GB while all my other replica set members are 23GB
[21:31:48] <saml> og01_, it's the same thing i think. not sure where -0.5 came from
[21:31:59] <cozby> any idea why or a way I can test?
[21:33:57] <cozby> would the optime be a good indicator of sync status?
[21:34:12] <og01_> saml: oh i see percentile $ne percent
[22:08:57] <joannac> cozby: yes, optime
[22:09:26] <joannac> cozby: if you had any activity (deletes/updates/writes) your original instances probably have some unused space in them
[22:09:32] <halfamind1> Hi folks. How can I install v2.4.10 via yum? I get this message:
[22:09:32] <halfamind1> `Package mongo-10gen is obsoleted by mongodb-org, trying to install mongodb-org-2.6.2-1.x86_64 instead`
[22:09:42] <joannac> cozby: whereas your new initial sync'd set is nice and compact
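
One way to see what joannac describes, run against each member (the output fields, not exact numbers, are the point here):

    db.stats()                        // dataSize is the live data; storageSize/fileSize include space freed by deletes and updates
    rs.printSlaveReplicationInfo()    // per-member optime lag, as a quick sync-status check from the primary
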
[22:13:24] <joannac> halfamind1: can you specify the exact version?
[22:14:01] <halfamind1> joannac: That output was the result of my specifying the exact version. Full command was: `yum -d0 -e0 -y --nogpgcheck install mongo-10gen-2.4.10-mongodb_1`
[22:18:23] <joannac> try yum install mongo-10gen-2.4.10 and pin the package ?
[22:18:27] <joannac> http://docs.mongodb.org/v2.4/tutorial/install-mongodb-on-red-hat-centos-or-fedora-linux/#manage-installed-versions
[22:23:57] <cipher__> When I try to start my router, I receive the following error: "child process failed, exited with error number 1"
[22:26:03] <cipher__> log file: http://pastebin.com/pqFxXrY1
[22:28:39] <cipher__> oh, it says i need to run it with --upgrade
[22:34:26] <cipher__> where do i add this flag?
[23:04:30] <joannac> cipher__: ...when you start it?
[23:06:37] <cipher__> joannac: when i start my query router
[23:07:05] <joannac> yes
[23:07:16] <cipher__> joannac: it seems the query router is running 2.6.2 and the shards/config servers are 2.6.1
[23:07:42] <joannac> okay?
[23:08:03] <joannac> did you just upgrade or something?
[23:08:35] <cipher__> joannac: someone broke the router server, so i've just reinstalled the OS and such for it
[23:09:29] <joannac> That doesn't explain why you need to run --upgrade
[23:11:17] <cipher__> joannac: http://pastebin.com/1bsVucdM
[23:13:29] <cipher__> joannac: the last line suggests i run with --upgrade
[23:14:30] <joannac> yes
[23:14:51] <joannac> that looks like you've upgraded your binaries to 2.6 without running the metadata upgrade on the config servers yet
[23:15:00] <joannac> which is what worries me
[23:15:17] <cipher__> joannac: As far as I know the config servers have never been upgraded
[23:15:42] <joannac> hmmm
[23:16:22] <cipher__> actually, they have have been updated from 2.4.8
[23:16:59] <cipher__> may have*
[23:17:23] <joannac> i can guarantee they were upgraded from a 2.4 branch
[23:17:29] <cipher__> okay
[23:17:35] <joannac> since the metadata version is 4
[23:17:49] <cipher__> yeah, so i need to go complete the upgrade process then?
[23:17:58] <joannac> http://docs.mongodb.org/master/release-notes/2.6-upgrade/#upgrade-sharded-clusters
[23:18:17] <cipher__> thank you joannac, do you suspect it should run properly then?
[23:18:20] <joannac> although you've already upgraded all your processes and I presume been trying to run with them
[23:18:34] <joannac> So I'm not sure what state your cluster is in
[23:18:54] <cipher__> I could just reinstall the entire cluster
[23:19:22] <joannac> the issue is not the binaries, it's the data
[23:19:28] <joannac> specifically config server data
[23:19:37] <cipher__> okay
[23:27:37] <cipher__> joannac: on the query router, i disabled the balancer (verified the state), and ran mongos --configdb <config servers> --upgrade. "Error: balancer must be stopped..."
[23:29:02] <cipher__> oh oops
[23:37:59] <cipher__> joannac: thanks, fixed it
[23:46:47] <cipher__> So mongos started without a hitch. yet when i attempt to run db.runCommand( {addshard: "rs_a/server1:27018" }); i'm thrown an unknown command error
[23:59:13] <joannac> addShard
[23:59:16] <joannac> camel case
[23:59:33] <cheeser> i read that as addShark and got all excited about 2.6.2
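
The corrected command joannac is pointing at, using the shard string from cipher__'s message:

    sh.addShard("rs_a/server1:27018")
    // or, as a raw command against the admin database:
    db.adminCommand({ addShard: "rs_a/server1:27018" })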