[03:00:40] <baconmania> so if I have a collection that has an index on a Number property `val`, for example, then if I do collection.find({ }).sort({ val: 1}).limit(1), what would be the performance characteristics?
[03:01:02] <baconmania> like would mongod sort the entire collection then return the top result?
[03:01:37] <baconmania> or would it be able to return the first result without having to do an O(n) operation
[03:52:37] <joannac> baconmania: if you have an index on val, it should be just one index bucket, plus 1 actual document
[03:52:45] <joannac> you can check with .explain()
[03:53:54] <baconmania> Yep, used explain() and you're indeed correct
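A minimal shell sketch of the check joannac suggests, with an illustrative collection name; the explain() fields shown are the 2.6-era ones:

    // index on val (illustrative collection name)
    db.collection.ensureIndex({ val: 1 })

    // the sort is satisfied by walking the index in order, so limit(1)
    // touches one index entry plus one document instead of sorting everything
    db.collection.find({}).sort({ val: 1 }).limit(1).explain()
    // expect something like: cursor: "BtreeCursor val_1", n: 1,
    // nscanned: 1, nscannedObjects: 1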
[05:36:54] <MacWinner> what's the difference between 0 and NumberLong(0) ? do you need to use find() differently for them?
[05:37:20] <MacWinner> i have a script that searches for slidecount: 0.. it's missing the documents where slidecount = NumberLong(0)
[05:53:42] <greybrd> hi guys... I'm getting this exception "the limit must be specified as a number" when aggregating using the Java driver. any pointers? there doesn't seem to be any discussion of this anywhere online. please provide some inputs. thanks in advance.
[06:11:18] <joannac> specify the limit as a number?
[06:12:20] <joannac> my guess is you've done something like {"$limit": "5"} (i.e passed a string)
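A shell illustration of joannac's guess (collection name hypothetical); the Java-driver equivalent is building the $limit stage with an int rather than a String:

    // fine: $limit gets a number
    db.coll.aggregate([ { $match: {} }, { $limit: 5 } ])

    // triggers "the limit must be specified as a number": $limit gets a string
    db.coll.aggregate([ { $match: {} }, { $limit: "5" } ])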
[10:01:46] <spacepluk> it might work... I assume this is very slow, right?
[10:02:11] <rspijker> spacepluk: you are assuming correctly
[10:02:59] <spacepluk> I'll give it a try, hopefully it'll be good enough
[10:04:40] <rspijker> Ideally you would want something like the aggregation framework, but I highly doubt the operations supported in that will be sufficient to calculate levenshtein distances
[10:05:49] <spacepluk> the goal is some sort of fuzzy search, I'm using levenshtein on my prototype without a db but anything similar will do
[10:06:29] <rspijker> well, aggregation only has very basic expression support
[10:06:44] <rspijker> so you won’t be able to do anything fancy, doubt you can even get string length, for instance
[10:07:32] <rspijker> map-reduce gives you javascript, so you should be able to do whatever in there
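A rough map-reduce sketch along the lines rspijker describes, assuming a hypothetical `words` collection with a `name` field; the Levenshtein function is plain JavaScript defined inside the map function, not something MongoDB provides:

    db.words.mapReduce(
        function () {
            // standard dynamic-programming Levenshtein distance
            function levenshtein(a, b) {
                var m = [];
                for (var i = 0; i <= b.length; i++) m[i] = [i];
                for (var j = 0; j <= a.length; j++) m[0][j] = j;
                for (i = 1; i <= b.length; i++)
                    for (j = 1; j <= a.length; j++)
                        m[i][j] = Math.min(m[i-1][j-1] + (b[i-1] === a[j-1] ? 0 : 1),
                                           m[i][j-1] + 1,
                                           m[i-1][j] + 1);
                return m[b.length][a.length];
            }
            emit(this._id, { name: this.name, dist: levenshtein(this.name, target) });
        },
        // _id keys are unique, so reduce is effectively never invoked
        function (key, values) { return values[0]; },
        { out: { inline: 1 }, scope: { target: "exmaple" } }  // "exmaple" = hypothetical search term
    )

As the channel notes, this runs JavaScript over every document, so it is slow and really only suited to small collections or prototyping.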
[11:28:06] <joannac> _jns: um, it doesn't create it until you actually write to it
[11:29:10] <joannac> MrMambo007: i suggest you ask your question so it can be answered, instead of playing pingpong with Derick
[11:32:28] <MrMambo007> I found this blog entry: describing how to hold on to the connection for continuous use instead of connecting over and over again. This is what I did... but then I see a lot of other examples where you see this db.close() function called... is closing connections something you would do routinely on a continuously running server handling inserts and finds sporadically based on requests from clients and other events, or is it just for situations where
[11:35:41] <MrMambo007> joannac: Point taken... I pinged him since he was of so much help last Friday. But I'll try to avoid pinging people further down the line.
[11:53:07] <_jns> @joannac: okay... so it's only in the "show dbs" list but not actually created?
[12:13:39] <joannac> I would say if you're sure you're done with the connection, then close it. No point keeping them open
[12:16:27] <MrMambo007> It is basically just one server instance connecting to a mongodb. Data will come in at random points in time, so there are no natural "disconnected" phases. Maybe it is a better approach to time out the connection if it goes, say, 10 minutes with no activity and then reconnect the next time someone makes a request or submits data to the server.
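A sketch of the hold-on-to-the-connection pattern MrMambo007 describes, written in Node.js terms with the 2014-era driver callback API (his actual driver isn't stated, so treat the names and URL as illustrative); the same idea applies to other drivers:

    var MongoClient = require('mongodb').MongoClient;
    var db = null;

    // connect once at startup and keep the handle for the life of the process
    MongoClient.connect('mongodb://localhost:27017/mydb', function (err, database) {
        if (err) throw err;
        db = database;
    });

    function handleRequest(doc, callback) {
        // no connect/close per request; the driver's connection pool manages sockets
        db.collection('events').insert(doc, callback);
    }

    // db.close() would only be called on deliberate shutdown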
[12:53:43] <Viesti> I have a script that imports data with the new Bulk API by doing upserts
[12:54:30] <Viesti> funny that running it in parallel seems to give better updates/s readings than just running one instance
[13:26:50] <michaelchum> I would like to store the results of a db.collection.find({key:value}) to a new collection, is there any simple way to do this?
[13:30:42] <rspijker> michaelchum: easiest is probably the aggregation framework
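A minimal sketch of what rspijker suggests, assuming a source collection `source`, a filter on `key`, and a target collection `results` ($out is available from 2.6):

    db.source.aggregate([
        { $match: { key: "value" } },   // same predicate as the find()
        { $out: "results" }             // writes the matching documents to a new collection
    ])

An alternative that also works on older versions is db.source.find({ key: "value" }).forEach(function (d) { db.results.insert(d); }).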
[14:26:47] <insanidade> The following query does not work as expected. It only evaluates the first statement: Transaction.objects( db.Q(startTime__gte=stime) and db.Q(endTime__lte=etime) )
[14:26:55] <insanidade> I need it to evaluate A AND B
[14:27:03] <insanidade> But it seems to evaluate only A
[14:46:29] <insanidade> rspijker: I need all transactions whose start time is greater than or equal to stime and whose end time is less than or equal to etime.
[14:47:57] <insanidade> rspijker: I'm using mongoengine. their channel is too quiet - looks like everyone is dead.
[14:48:15] <rspijker> what happens if you run the query on mongodb directly?
[14:48:39] <insanidade> saml: technically they could, but in most of the scenarios, stime is lower than etime
[14:48:42] <rspijker> I don’t know anything about mongoengine, but the default mongodb query is an and… So if it goes wrong in the translation...
[14:49:04] <saml> and since mongodb doesn't have transactions, there can be cases where a collection has a doc A: {startTime: stime} B: {endTime: etime} where ordering of stime and etime isn't guaranteed
[14:49:29] <saml> especially if multiple clients (with different localtime, not carefully synced) write to collection
[14:49:42] <rspijker> saml: I think you are confusing the term transaction with the simple fact that he has a collection called Transaction…
[14:49:45] <saml> i don't think you implement application level transaction that way
[14:50:12] <saml> based on some timestamp, i mean. two timestamps startTime and endTime
[14:50:24] <saml> maybe try to implement transaction with one timestamp field
[14:50:50] <insanidade> saml: ok. forget the word 'transaction'. let's use the word 'dogs'. I have dogs that are 10 years old and dogs that are 2 years old. what I need is to retrieve all dogs that are older than 3 and younger than 9.
[14:51:12] <saml> insanidade, yup. that's doing query against single field
[14:51:18] <saml> but you're doing against two fields
[14:51:27] <insanidade> saml: on the same document.
[14:53:38] <rspijker> I suspect this layer you have in between… can you just run the query on mongodb directly insanidade and see if you get results?
[14:53:57] <saml> but i assume you're doing var marker = dogs.find({_id: marker_id}); dogs.find({age:{$gt: marker.startTime, $lt: marker.endTime}})
[14:54:08] <saml> it's not gonna work. i had the same bug last week :P
[14:54:19] <saml> but not sure if you're doing the same thing
[14:54:28] <insanidade> rspijker, saml yes, I'm trying to do it directly on mongodb. please, take a look at what I'm trying to say: http://pastebin.com/bVf8wv1n
[14:55:02] <saml> the buggy code was first finding "marker dog" and do date range query based on values of the marker
[14:55:13] <rspijker> insanidade: so, you only expect the last result?
[14:55:43] <insanidade> rspijker: based on that example, yes. let me give you another one.
[14:57:34] <rspijker> just try this in the mongodb shell, assuming your collection is named Q: db.Q.find({$and:[ {“startTime”:{$gte:10}} , {“endTime”:{$lte:60}} ]})
[14:57:47] <rspijker> the $and should be superfluous, but just in case
[14:58:03] <rspijker> you might have to sort out my IRC quotes btw...
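For reference, rspijker's query with the IRC smart quotes replaced by plain ones:

    db.Q.find({ $and: [ { startTime: { $gte: 10 } }, { endTime: { $lte: 60 } } ] })

    // equivalent without the (redundant) $and
    db.Q.find({ startTime: { $gte: 10 }, endTime: { $lte: 60 } })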
[14:59:02] <insanidade> take a look at this other sample output: http://pastebin.com/g3zkt7dB
[14:59:50] <McSorley> Hi all, I am parsing a CSV file of aggregations and I want to insert/update documents in Mongo. I was hoping to batch the writes as there are upwards of 1 million documents being created/updated. What is the most efficient way of doing this?
[15:01:33] <romaric> Hi everybody ! We just sharded our biggest collection; we expected to double our insert rate but we don't see any improvement. Does anyone have an idea of what is wrong with us ? :)
[15:01:52] <rspijker> McSorley: that is, create arrays of documents and pass those to insert
[15:02:08] <rspijker> romaric: what are you sharding on?
[15:02:32] <rspijker> if it’s on _id (and you use the default) all writes will always go to the latest chunk, which will always be on a single node, so you have no write scaling
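A sketch of the usual way around that hotspot, assuming a hypothetical mydb.events namespace (hashed shard keys exist from 2.4 on):

    sh.enableSharding("mydb")

    // hashing _id spreads inserts across chunks even though _id itself is monotonic
    sh.shardCollection("mydb.events", { _id: "hashed" })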
[15:02:46] <insanidade> I don't understand... the following query returns nothing: db.transactions.find({startTime:{$gt:1 } })
[15:03:06] <rspijker> insanidade: and db.transactions.find() ?
[15:03:15] <insanidade> there are lots of documents where startTime is greater than 1
[15:03:15] <cheeser> is startTime an int or a Date?
[15:03:16] <rspijker> (are you actually on the correct db?)
[15:03:18] <saml> insanidade, you want to get all docs that took 10 seconds or less ?
[15:03:40] <rspijker> if you don't use the shell very often.. by default it connects to the test db
[15:03:45] <romaric> rspijker, we are sharding on the field _id, which is our own custom binary id. We observed that the data is well spread between our two shards, but the insert rate is the same
[15:03:47] <rspijker> so you have to tell it first, to use a different db
[15:04:02] <rspijker> romaric: is this _id field monotonically increasing?
[15:04:12] <insanidade> cheeser, rspijker, saml : I'm in the correct db. By issuing that query, I'd like to search for all transactions whose startTime field is greater than '1'. that field is a float.
[15:04:20] <McSorley> rspijker, I am trying to emulate "ON DUPLICATE KEY" functionality.. how can I choose what fields to update from the insert?
[15:04:28] <rspijker> the data might be very well spread out, that does not mean that NEW documents are also spread out
[15:07:03] <McSorley> saml, I did, that can upsert. I need to do further operations on the data
[15:07:31] <romaric> rspijker, we presplit our chunks on 2 ranges of our shardkey, so we can see data going on the two shards
[15:07:35] <rspijker> insanidade: ok, so it is in that layer. Then I have no idea
[15:08:08] <saml> i massage data. output json. then run mongoimport. but it's okay to write directly
[15:08:10] <rspijker> romaric: that doesn’t really answer my question…
[15:08:11] <McSorley> rspijker, Eh thanks for that. The issue is how do you pass multiple documents to update at once with that interface "db.collection.update(query, update, options)"
[15:08:30] <romaric> the shardkey is monotonically increasing yes
[15:08:35] <insanidade> rspijker, saml, cheeser : I'm storing that value as seconds in a float field. I could fix that later as this is just a small prototype.
[15:12:09] <rspijker> McSorley: There is http://docs.mongodb.org/manual/reference/method/js-bulk/ nowadays, but I’ve never used it
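A rough sketch of that js-bulk interface for the insert-or-update case McSorley describes; the collection name, the `aggKey` match field, the `rows` variable, and the update fields are all hypothetical:

    var bulk = db.aggregations.initializeUnorderedBulkOp();

    rows.forEach(function (row) {             // rows parsed from the CSV
        bulk.find({ aggKey: row.aggKey })     // match on the "duplicate key"
            .upsert()
            .updateOne({
                $set: { updatedAt: row.ts },
                $inc: { count: row.count }    // fold the new values into the existing aggregation
            });
    });

    bulk.execute({ w: 1 });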
[15:12:47] <saml> McSorley, what are you really concerned about multiple .update()? performance?
[15:12:58] <romaric> humm sorry, it's not monotonically increasing, we move the last 3 chars (nanos) into the middle of the long, which results in numbers that look fairly random. This is correct because we observe our data going into our two chunks at the same time (mongostat displaying inserts for the two chunks). The load balancer is disabled
[15:13:32] <saml> some use mongo-connector or custom oplog tailing for production data migration, bulk import.. etc
[15:13:33] <McSorley> rspijker, Basically I have many documents (many as in 1,000,000+) being read from a file. I want to insert them, though in some cases they will need to be updated as they already exist
[15:13:49] <saml> like, you write to one db, and tail oplog to propagate the change to other db or collection
[15:18:24] <saml> how big is 1.2 million batch? 10GB?
[15:18:41] <saml> i think it can be written in seconds if size is small
[15:18:43] <rspijker> romaric: hmmmm, okay… what is the performance you are seeing? inserts/second
[15:19:01] <saml> all those small benchmarks insert millions in seconds
[15:19:38] <rspijker> the difficulty won't be the insert. It will be to determine for each document whether it should be inserted or updated
[15:24:42] <saml> mongodb is slow as fuck. delete and give up
[15:25:12] <saml> maybe try different writeconcern
[15:25:29] <rspijker> romaric: sure there isn't another bottleneck? Network between mongos and mongod, or your app and mongos?
[15:25:32] <rafaelhbarros> McSorley: what are you doing? 600k writes on a collection with all fields indexed?
[15:28:09] <McSorley> rafaelhbarros, No. I have a CSV file with "aggregated" data. I basically need to write that data to mongo, but also if the same aggregation appears I need to be able to update the existing aggregation
[15:28:56] <rafaelhbarros> McSorley: does your destination collection have a bunch of indexes?
[15:29:02] <rafaelhbarros> every write has to update indexes
[15:29:12] <saml> can you provide the csv file? just curious
[15:29:19] <McSorley> rafaelhbarros, no indexes as of yet bar the default _id idx
[15:29:21] <romaric> for sure the app is a bottleneck for the insert speed, but the same app should be faster against a sharded setup than against a non-sharded mongod. The team is looking into those bottlenecks but I'm looking for something we could have forgotten
[15:29:26] <rafaelhbarros> McSorley: alright, seems ok
[15:29:44] <rafaelhbarros> McSorley: do you time just the insertion of the index on the db, or the time to load the csv file as well?
[15:29:56] <rafaelhbarros> McSorley: which language are you using? can you send us some profile of the ops?
[15:30:19] <rafaelhbarros> I'm sorry, I just got here and I'm trying to gather the data
[15:30:24] <McSorley> rafaelhbarros, i separate these.. i dont care about the file processing, this takes approx 2 secs
[15:30:52] <rafaelhbarros> ok, can you thread? are you doing fire-and-forget writes?
[15:31:03] <rspijker> romaric: well, if the app is the bottleneck, then why would you expect it to be faster on a sharded system? :s
[15:32:12] <McSorley> rafaelhbarros, saml, rspijker: To give some context, I can insert 600,000 documents in 15 secs with a w=1 and 9.4 secs with a w=0. I do this by chunking a list into 6 separate chunks and inserting 100,000 documents at a time
[15:32:40] <McSorley> rafaelhbarros, python if it matters
[15:37:29] <romaric> maybe we have some tests to do on it to be sure
[15:38:18] <rspijker> romaric: were you running into hardware limitations before the sharding? So was your single shard maxing out on IO or CPU or RAM during high load?
[15:38:24] <romaric> Assuming the network is not the bottleneck, do you have any other idea of what could be the issue ?
[16:59:13] <Kaiju> Hello, I'm looking for a best-approach suggestion for reporting with mongo. I have a large number of user documents in a single collection. At the end of each day I want to run a report to show logins, transactions and such. I can stream all the documents out of the database and parse the results in code, but this is pretty inefficient. Is there a better alternative?
[18:09:21] <MacWinner> hi, what's the general process for upgrading a mongo 2-node replica set?
[18:09:55] <MacWinner> i've installed via yum repos.. just yum update on the secondary, restart, then yum update on primary and restart?
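The order MacWinner describes (secondary first, then primary) is the usual rolling upgrade; a shell sketch of the step-down part, with the caveat that with only two voting members there is no primary while either member is down, so writes pause during each restart:

    // on the current primary, once the upgraded secondary has caught up
    rs.status()        // check member states and optimes
    rs.stepDown(60)    // relinquish primary before restarting this node for the upgrade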
[18:14:02] <Hexxeh> If I have a ListField and an array, and want to find all documents where all values in the listfield are also found in the array, how do I specify that filter?
[18:27:01] <Hexxeh> hmm, I'm using MongoEngine, so I'll have to figure out how that maps across
[18:27:02] <cheeser> you're welcome. feel free to use that one.
[18:27:08] <MacWinner> wow... upgrading mongo was a charm
[18:30:19] <Hexxeh> saml: hmm, doesn't appear to work: required_capabilities__all=req['capabilities']
[18:30:56] <Hexxeh> returns no objects, even though there's definitely a document that matches when that condition isn't applied, and the document has an listfield value identical to the provided array
[18:31:13] <MacWinner> the mongo 2.6.2 release notes don't seem to have been posted yet
[18:31:22] <saml> Hexxeh, give me example documents?
[18:31:22] <Hexxeh> it doesn't even match documents where the field is empty, which it should
[18:33:01] <Hexxeh> saml: http://cl.ly/W68r Job.objects.filter(required_capabilities__all=req['capabilities']), req['capabilities'] contains 'foo', I expect that document to be returned
[18:33:51] <saml> that probably means you're querying for all documents that have 'capabilities'
[18:34:08] <saml> i mean, docs like {required_capabilities: ['capabilities']}
[18:34:46] <saml> Hexxeh, you want all docs that have a required_capabilities property/field ?
[18:35:15] <Hexxeh> saml: I want to do it the other way around. Documents represent tasks that workers can do, but tasks require certain capabilities. I'm making a request with a given list of capabilities and I want to retrieve tasks that can be completed given those.
[18:35:17] <saml> i think $exists does a full scan, though
[18:35:39] <elm> how can I do an $elemMatch on a plain array item [1,2,3] rather than an array of records?
[18:53:49] <Hexxeh> i guess i have to retrieve the list without the capabilities filter then find ones with matching capabilities in Python
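What Hexxeh wants is the reverse of $all: every element of the document's array must be in the supplied list, not the other way round. One way that subset-style match is commonly written in the shell is $not/$elemMatch/$nin, shown here with hypothetical names; whether MongoEngine can express it directly is a separate question:

    var offered = ["foo", "bar"];   // capabilities the worker has

    // jobs whose required_capabilities contains no element outside `offered`
    // (this also matches jobs with an empty or missing required_capabilities array)
    db.jobs.find({
        required_capabilities: { $not: { $elemMatch: { $nin: offered } } }
    })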
[19:03:10] <elm> one last question: what to do if I do not need { $size: 2 } but something like { $size : { $gt: 2 } }
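On elm's two questions: $elemMatch takes bare operators for a plain scalar array, and the query-side $size only accepts an exact count, so "more than 2" needs a workaround such as a maintained count field, a positional $exists check, or the aggregation $size expression (2.6+). A sketch with a hypothetical `items` array:

    // $elemMatch on a scalar array: any element between 2 and 3 inclusive
    db.coll.find({ items: { $elemMatch: { $gte: 2, $lte: 3 } } })

    // "more than 2 elements": the element at index 2 must exist
    db.coll.find({ "items.2": { $exists: true } })

    // or compute the length with the aggregation $size expression
    db.coll.aggregate([
        { $project: { items: 1, n: { $size: "$items" } } },
        { $match: { n: { $gt: 2 } } }
    ])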
[19:15:13] <Kaiju> It would be so cool if mongo supported a complex set of subqueries like sql does. You could sub-select on document elements and output some new derived element.
[19:15:35] <cheeser> like the aggregation framework?
[19:16:37] <Kaiju> well this might be the answer for a question I had
[19:18:09] <Kaiju> I have 5MM user documents and add events to the documents. At the end of the day I want to get metrics on usage. Pulling and parsing all the docs seems horribly inefficient.
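A rough aggregation sketch of the kind cheeser is pointing at, assuming the events are embedded in each user document as an `events` array with `type` and `ts` fields (all hypothetical):

    var dayStart = ISODate("2014-06-16T00:00:00Z");
    var dayEnd   = ISODate("2014-06-17T00:00:00Z");

    db.users.aggregate([
        { $match: { "events.ts": { $gte: dayStart, $lt: dayEnd } } },  // skip users with no events that day
        { $unwind: "$events" },
        { $match: { "events.ts": { $gte: dayStart, $lt: dayEnd } } },
        { $group: { _id: "$events.type", count: { $sum: 1 } } }        // e.g. logins, transactions
    ])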
[19:31:22] <halfamind1> Hi. We upgraded from v2.4.10 to 2.6.1 recently, and have been seeing large memory spikes that cause mongod to be terminated. Has anyone run into this?
[21:07:48] <thevdude> Hey guys, is there an easy way to remove all documents EXCEPT a specific _id?
[21:08:23] <thevdude> oh, found it on the docs with $not
[21:09:21] <thevdude> basically db.coll.remove({_id: {$not: "id_to_keep"}}) should work, will test it with a find to make sure it doesn't have the one I want to keep
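One caveat on that: in the shell, $not expects an operator expression or a regex rather than a bare value, so the query above would normally be rejected; the usual form of "everything except this _id" is $ne:

    // dry run first, as thevdude suggests
    db.coll.find({ _id: { $ne: "id_to_keep" } }).count()

    db.coll.remove({ _id: { $ne: "id_to_keep" } })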
[21:31:09] <og01_> saml: your math foo is stronger than mine, I'll look into the 100(i-0.5)/n calc though, looks interesting
[21:31:27] <cozby> Hi, I just added a new member to my replica set to test/observe how initial sync works, and I _think_ it worked, but the data size on my new replica instance is less than on all the other replica set members
[21:31:39] <cozby> it's 15GB while all my other replica set members are 23GB
[21:31:48] <saml> og01_, it's the same thing i think. not sure where -0.5 came from
[21:31:59] <cozby> any idea why or a way I can test?
[21:33:57] <cozby> would the optime be a good indicator of sync status?
[21:34:12] <og01_> saml: oh i see percentile $ne percent
[22:09:26] <joannac> cozby: if you had any activity (deletes/updates/writes) your original instances probably have some unused space in them
[22:09:32] <halfamind1> Hi folks. How can I install v2.4.10 via yum? I get this message:
[22:09:32] <halfamind1> `Package mongo-10gen is obsoleted by mongodb-org, trying to install mongodb-org-2.6.2-1.x86_64 instead`
[22:09:42] <joannac> cozby: whereas your new initial sync'd set is nice and compact
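A quick way to sanity-check what joannac describes is to compare data size against allocated storage on an old member and on the freshly synced one, and to check replication lag directly rather than eyeballing file sizes (collection name illustrative):

    db.stats()               // dataSize should be similar across members;
                             // storageSize/fileSize on older members includes reusable free space
    db.mycollection.stats()  // per-collection view

    rs.printSlaveReplicationInfo()   // shows how far each secondary is behind the primary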
[22:13:24] <joannac> halfamind1: can you specify the exact version?
[22:14:01] <halfamind1> joannac: That output was the result of my specifying the exact version. Full command was: `yum -d0 -e0 -y --nogpgcheck install mongo-10gen-2.4.10-mongodb_1`
[22:18:23] <joannac> try yum install mongo-10gen-2.4.10 and pin the package ?
[23:27:37] <cipher__> joannac: on the query router, i disabled the balancer (verified the state), and ran mongos --configdb <config servers> --upgrade. "Error: balancer must be stopped..."
[23:46:47] <cipher__> So mongos started without a hitch. yet when i attempt to run db.runCommand( {addshard: "rs_a/server1:27018" }); i'm thrown an unknown command error
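For what it's worth, addShard is an admin command that has to go through a mongos; running it against a mongod rather than a mongos is one common way to get exactly that unknown-command response, and it also needs to be issued against the admin database. The usual shell forms, reusing cipher__'s replica set spec:

    // connected to a mongos
    sh.addShard("rs_a/server1:27018")

    // or the explicit command form against the admin database
    db.adminCommand({ addShard: "rs_a/server1:27018" })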