[00:41:15] <crudson> oh i saw that as an array, not a hash. you should probably avoid using variable values as hash keys as they become hard to query against and manage. Also, is there a reason visits is a hash rather than an array? It's much easier to have [ {id:'134300236111rcbmbmvv', country_name:'Europe'}, {...} ]
[00:43:39] <crudson> krz: but the document structure point is just my opinion. In my experience using { key: '12345', value: 'abcde' } is much preferable to { '12345': 'abcde' }
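A minimal sketch of the two shapes being compared, with field names taken from the examples above (the exact documents in the gist may differ):

    // hash keyed on a generated id -- awkward to query or index
    { _id: 1, visits: { '134300236111rcbmbmvv': { country_name: 'Europe' } } }

    // array of subdocuments -- each visit carries its own id field
    { _id: 1, visits: [ { id: '134300236111rcbmbmvv', country_name: 'Europe' } ] }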
[00:44:20] <krz> crudson: isn't that how i have it?
[00:44:27] <krz> are you referring to the visits structure?
[00:45:27] <crudson> krz: no, your visits are a hash keyed on some 20 character id
[00:47:06] <crudson> krz: personally I'd address the document structure to facilitate querying, but others may have other suggestions. What was the ruby question?
[00:47:57] <krz> crudson yea so based on this structure. how am i able to extract only the visits with "minute" greater than 12
[00:48:29] <crudson> krz: well unwind is for arrays so that is out
[00:50:12] <crudson> krz: that's the only native operation in aggregation to split a single document's attribute values
[00:51:13] <krz> title of slide is "Pre-Aggregation"
[01:00:26] <krz> crudson any idea? second, https://gist.github.com/3173689 is returning []
[01:10:53] <crudson> yes, but their keys are static except for some related to hours and minutes, which they are handling as a special case. You are asking to get a list of document fragments (i.e. particular {key,value}s), and on brief examination your document format is not going to make this easy with aggregation. You could map reduce, emitting id=>document if minute>12 and using a null reducer.
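A rough sketch of the map-reduce approach crudson describes, assuming visits is still a hash keyed on the visit id and each value has a numeric minute field (the collection name uw is borrowed from the later shell example):

    var map = function() {
      for (var k in this.visits) {
        if (this.visits[k].minute > 12) emit(k, this.visits[k]);
      }
    };
    // with unique visit ids each key is emitted once, so reduce is effectively never called
    var reduce = function(key, values) { return values[0]; };
    db.uw.mapReduce(map, reduce, { out: { inline: 1 } });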
[01:12:18] <krz> crudson: the structure of hourly and minute follows the same structure as visits
[01:13:32] <krz> except that visits is one level deeper
[01:14:11] <crudson> krz: use map reduce then like they are doing rather than aggregate
[01:14:25] <krz> can't. not suitable for "real-time stats"
[01:15:43] <krz> such reasons are also highlighted at this webinar: http://www.10gen.com/events/new-aggregation-framework
[01:19:29] <crudson> krz: I have no further advice other than to rethink your document design if you want to use the query type you suggest, or you just filter the nested documents on the client side.
[01:20:11] <krz> is there a reason why aggregate's unwind does not work specifically with hashes?
[01:21:09] <crudson> krz: it's an array operation by design
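For contrast, a minimal illustration of what $unwind expects: it only splits array fields, so it would apply to visits stored as an array but not to the hash form (collection name borrowed from crudson's shell example):

    db.runCommand({
      aggregate: 'uw',
      pipeline: [ { $unwind: '$visits' } ]   // emits one document per element of the visits array
    })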
[01:21:59] <krz> crudson: lets take a step back. whats wrong with this then https://gist.github.com/3173689
[01:29:15] <crudson> krz: I loaded your document into a collection and ran that command fine. check your db and collection names
[01:32:32] <crudson> krz: albeit not with a mix of hash syntaxes, probably best to stick with one
[01:40:13] <krz> crudson: you mean you tried it with an array structure?
[01:40:55] <crudson> I ran the following and got a single document in 'result' db.command(:aggregate => 'uw', :pipeline => [{'$match' => {:_id => '20120723/foobar/song/custom-cred'}}])
[01:57:05] <krz> crudson: ill consider changing the structure to an array instead. just to understand this further, can you show me more or less how the visits should look in array form?
[02:32:14] <crudson> krz: note whether to use your own _id or not is entirely up to you, providing they are unique
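A sketch of how the document might look with visits as an array, reusing ids and field names that appear elsewhere in this conversation (the minute and country_name values are made up):

    {
      _id: '20120723/foobar/song/custom-cred',
      visits: [
        { token_id: '13432515303rwiaczcx',  country_name: 'Europe', minute: 14 },
        { token_id: '134300236111rcbmbmvv', country_name: 'Asia',   minute: 3 }
      ]
    }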
[03:09:11] <krz> crudson: i mean, whats the mongo ruby method to aggregate based on that structure
[03:42:53] <clu3> Suppose users.tags = [ {id: 1, name: 'tag1'}, {id: 2, name: 'tag2'}, ... ] how do i remove the tag with id=2 from the users.tags array? Any help really appreciated
[03:46:44] <clu3> it looks like it's not possible to do that?
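For reference, a sketch of how $pull handles this kind of removal, assuming the collection is called users (the matching query is illustrative):

    db.users.update(
      { _id: someUserId },              // whichever user document holds the tags array
      { $pull: { tags: { id: 2 } } }    // removes every tags element whose id is 2
    )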
[03:57:06] <krz> crudson: seems to work now with https://gist.github.com/3173629
[03:57:32] <krz> i am now seeing something in results
[03:57:47] <krz> any idea how i can return only the results with minute greater than 12?
[04:25:45] <krz> how does one use $gt in an aggregator method?
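$gt would go inside a $match stage, the same way it does in a normal find. A sketch assuming visits has been converted to an array, reusing the collection name from crudson's earlier example:

    db.runCommand({
      aggregate: 'uw',
      pipeline: [
        { $unwind: '$visits' },
        { $match: { 'visits.minute': { $gt: 12 } } }
      ]
    })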
[04:50:44] <krz> crudson: i put the issue here in more detail with code for a better view: http://stackoverflow.com/questions/11641358/how-do-i-execute-this-query-using-the-aggregation-framework-in-mongodb have a look at it when you have the time
[05:08:38] <crudson> krz: I wrote a number of times how to do that with your data model. check previous pastes and chat.
[05:09:35] <krz> crudson: we only spoke about document structure. nothing specific into filtering the results
[05:15:38] <crudson> krz: I pasted this link 3hrs ago http://pastebin.com/tzdyVFeS
[07:17:36] <vak> mongostat shows "locked %" 2.71e+03 on a 4-core server, how come? http://pastebin.com/h7zJBF3j
[08:08:30] <kali> dstorrs: if you're still struggling with your query, there are a few quite simple document schema alterations you can make to make the indexing work better: for instance, maintain a "has_pages" boolean (<=> np>0) and a "pending" boolean (<=> action != "FINISHED")
[08:09:14] <kali> dstorrs: then, make two separate queries: one for "relocking" and one for new locks to discard the $or
[08:09:57] <kali> dstorrs: and finally you need the lock_until criteria at the end of the index definition (the "range" parameter is only index-compatible in last position)
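A sketch of the index and query shape kali is describing: equality fields first, the range on lock_until last (the collection name and exact filter are assumptions):

    var now = new Date();
    db.jobs.ensureIndex({ has_pages: 1, pending: 1, lock_until: 1 });
    // one of the two split-out queries: equality on the booleans, range on lock_until only
    db.jobs.find({ has_pages: true, pending: true, lock_until: { $lt: now } });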
[08:17:08] <dstorrs> if there are no indices, the workers will all grab at the top, then each will have to walk past all the locked docs to find an unlocked one.
[08:19:02] <kali> dstorrs: yes. and i don't know when the write lock is actually taken: just at the time of the actual update, or for the scanning too
[08:21:23] <kali> dstorrs: but i'm happy to see i'm not the only crazy person to use mongodb for concurrency state :)
[08:21:38] <dstorrs> what's your specific application?
[08:24:02] <kali> dstorrs: like... everything. mongo-resque is the most important in terms of transaction, but we also have code for ad-hoc queues, tensions for multidoc updates, mutual exclusion like you're doing
[08:24:52] <kali> dstorrs: we used redis before, and loved it, but the lack of built-in failover was an operational nightmare
[08:25:22] <dstorrs> did you roll your own job queue?
[08:26:23] <kali> dstorrs: yes. same principle as yours, with locked_until and locked_by
[08:26:49] <kali> dstorrs: with one document per lock :)
[08:26:56] <dstorrs> what sort of throughput do you get?
[08:57:21] <kali> the index should do what you want
[08:57:49] <dstorrs> meaning I need to set an owner_host on every document, even if it doesn't need it?
[08:58:12] <vak> guys, vote, add watches, comments, whatever, but please, let's push this shameful issue-1240 http://bit.ly/PiPkXW. we don't want a 3rd-year celebration, or do we?
[08:58:25] <kali> dstorrs: i think owner_host: null matches documents without owner_host
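For what it's worth, a quick illustration of that behaviour: querying on null matches both an explicit null and a missing field (collection name assumed):

    db.jobs.find({ owner_host: null })   // matches owner_host: null and documents with no owner_host at all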
[08:59:31] <dstorrs> vak: you know that database-level locking is coming in 2.2 (rc's for which are out now) and that collection-level locking is scheduled for 2.4, right?
[10:16:21] <scrr> hm. can i show the config in the mongo console somehow?
[10:23:01] <scrr> got it. db.adminCommand("getCmdLineOpts")
[10:32:47] <kaikaikai> does anyone have experience running a mongodb update on each page request? i'm building a sort of custom analytics and i have two choices: use timestamps and inserts and then organize all the data later... or use updates and keep an embedded document
[10:33:30] <kaikaikai> there is about 400 concurrent users on average, i know i'll need to just test this but if anyone has insight it will help me choose what to try first
[10:35:18] <Mortah> looking at our sampler its not causing performance issues for us... even for 400 users at 1kb each this only adds up to 400kb so it stays in memory no problem :)
[10:35:39] <kaikaikai> ok i see, how do you check? my update would be running every single page request
[10:35:57] <kaikaikai> even though sometimes addToSet wouldn't let any new data write
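A rough sketch of the update-per-request shape kaikaikai describes (the collection, field names and ids are all made up for illustration):

    db.analytics.update(
      { _id: pageId },                                          // one document per tracked page
      { $inc: { views: 1 }, $addToSet: { visitors: userId } },  // count the hit, record the visitor once
      true                                                      // upsert: create the document on the first hit
    )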
[11:01:09] <thewanderer1> hi. I'd like to have an application that depends on particular array ordering in MongoDB. if I save [1,2,3], is the order in which they are stored and retrieved preserved?
[11:02:04] <ankakusu> @Derick what do you work on at mongodb?
[11:18:26] <ankakusu> I'm gonna put the nodes, ways and relations into mongodb.
[11:19:01] <ankakusu> in element "way" we are using point references.
[11:19:31] <thewanderer1> guys, I'm trying to find() all documents which have "pirate" as the first element of the "arr" array. it needs to be the first element, not any further. any suggestions?
[11:20:03] <thewanderer1> I've tried: {arr[0]: 'pirate'}, {arr.0: 'pirate'}, neither works
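The dotted array-index form does work, but the key has to be quoted since arr.0 isn't a bare identifier. A minimal sketch (collection name assumed):

    db.coll.find({ 'arr.0': 'pirate' })   // matches only when 'pirate' is the first element of arr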
[14:01:18] <Tobsn> http://dl.dropbox.com/u/1656816/Screenshots/ua~o.png - just FYI
[14:01:27] <Bartzy> kchodorow_: While you're here, I have a weird index scenario
[14:01:39] <remonvv> Hm, beginning to realize why the CouchDB community isn't winning any prizes : Guy tweets "Yup: this is why I choose CouchDB. Macworld Developers dish on iCloud's challenges" + link to blog about some iCloud issues. Retweeted by @CouchDB.
[14:02:01] <remonvv> Nice chain of reasoning "Apple didn't do iCloud very well so now I like completely unrelated technology CouchDB"
[14:02:42] <Tobsn> well couchdb isnt so interesting, what is interesting is membase
[14:02:52] <Bartzy> kchodorow_: I already asked it here but didn't get a complete answer - I have an index on {uid: 1, _id: -1}, and doing db.shares.find({uid: {$in: [list of 10-5000 strings here]}}).sort({_id: -1}).limit(50) results in using that index, but it also shows scanAndOrder: true. Why is it not using the _id key for sorting?
[14:07:33] <kali> Bartzy: the index will only be able to sort if you're scanning a contiguous slice of the index. in your case, the $in spreads the results all over the place, so the index is useless for sorting
[14:08:06] <kali> Bartzy: there are quite good presentations by 10gen staff on the index internals, it might be a good time to have a look at one :)
[14:08:42] <Bartzy> kali: I will look at them, thanks. Reading MongoDB in Action and the chapter about indexes was no help about this :)
[14:08:58] <Bartzy> kali: Anyway to make an index work for the sorting? Or query differently ?
[14:09:32] <Bartzy> I just need a list of the documents with those uids, ordered by descending time (_id, timestamp, natural)... and limit to the last 50
[14:10:00] <Mortah> make a calculated field and index on that?
[14:12:29] <Bartzy> kali: How can I structure it differently ?
[14:13:00] <Bartzy> kali: And if I keep this structure - there is no point for _id in that index, if scanAndOrder is true, right? So I can just remove it and add a {uid:1} index ?
[14:13:21] <kchodorow_> i'm not sure if it works in 2.0, but {_id:1, uid:1} should know how to use the index in 2.2
[14:14:02] <Bartzy> kchodorow_: No need for _id: -1? Also, how would it know to use it? I thought sorting could only be done on the key after the last one used in the index?
[14:15:53] <kchodorow_> on _id:-1: the query optimizer should be smart enough for that not to matter
[14:15:54] <kali> kchodorow_: with the sort key in first ? skipping the docs based on the index key bit ? as far as i know this is new
[14:16:20] <kchodorow_> on the reverse order: yeah, that's probably just in 2.2. but it is a new feature
[14:16:38] <Bartzy> kchodorow_: Cool! :p When is 2.2 going to be stable?
[14:16:40] <kchodorow_> sort key can go first so it'll traverse the index in that order
[14:17:28] <Bartzy> And the performance of just getting the documents according to uid, without ordering, will be the same in {_id:1, uid:1} as in {uid:1, _id:1} ?
[14:18:02] <Bartzy> kchodorow_: Any rough estimations? Weeks or months? :)
[14:18:34] <kchodorow_> looks like it doesn't automatically use the index if you don't include the sort
[14:22:53] <Bartzy> but that also could be lots and lots of keys
[14:23:20] <Bartzy> in my scenario I'm searching for friends photos of a specific user. So all those UIDs are his/her friend UIDs
[14:23:29] <Bartzy> so some of their friends may not have uploaded a photo for a very long time.
[14:24:03] <kali> Bartzy: well, i feel your pain. social graphs are the worst
[14:24:12] <Bartzy> That will still be faster than getting all the photos of the friends, then sorting in "disk" ? I have enough RAM to hold the dataset
[14:25:40] <kali> Bartzy: another option is to completely denormalize and store a "friends pictures" history of ids in each user. every time a picture is posted, you'll have to propagate the new picture ids to all the friends
[14:26:25] <Bartzy> 25 photos are posted every second
[14:26:41] <Bartzy> so that's a bit of an issue (didn't test though)
[14:29:22] <kali> Bartzy: don't expect a silver bullet, you won't find it
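A rough sketch of the fan-out-on-write approach kali describes above, assuming each user document keeps a friends array of uids and a friend_pics array of photo ids (all names here are illustrative):

    // when a photo is posted, push its id onto every friend's timeline
    db.users.update(
      { friends: posterUid },                 // all users who have the poster as a friend
      { $push: { friend_pics: newPhotoId } },
      false,                                  // no upsert
      true                                    // multi: update every matching friend
    )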
[14:33:43] <BurtyB> is there any easy way to change a shards _id? or would it be something like stopping the balancer, then in config updating shards._id and updating chunks.shard to the new name and restarting config servers?
[14:34:30] <Derick> BurtyB: and now to be more helpful, nscannedObjects is just documents, while nscanned also includes index key searches
[14:38:55] <BurtyB> lol -l that confused me Derick until I realised you had a missed tab complete ;)
[14:39:15] <solars> hey, shouldn't this query use only indexes? https://gist.github.com/2798607b5ede61192887 or whats wrong with that?
[14:40:01] <kali> solars: you need an index on hotel_id and timestamp in this order
[14:40:19] <Bartzy> I created a PHP script with that query - I measured the time for $cursor->next to come back (with microtime(true)), and it was 64ms. Then I did $cursor->reset(); print_r($cursor->explain()) and got 121 in millis.
[14:40:28] <Bartzy> How come explain measured twice as much as the actual query time ?
[14:40:31] <solars> kali, what if I reverse the arguments?
[14:43:34] <solars> does this mean the ordering part is always last? what happens if I have hotel_id and fubar_id which I use to select?
[14:43:41] <Mortah> Bartzy... it looks like its actually getting all the docs you need via the index and then doing the sort in memory... is that really so bad? sorting 5k objects shouldn't take too long... plus you could scale with more secondaries for reads :)
[14:43:44] <kali> solars: yes, the ordering must be at the end
[14:44:00] <solars> kali, also with more than 1 selection parts?
[14:44:44] <solars> e.g. I have: index([ [ :timestamp, Mongo::ASCENDING ], [ :hotel_id, Mongo::ASCENDING], ['rateplans.rateplan_id', Mongo::ASCENDING] ], background: true)
[14:44:45] <kali> solars: if you want to do no scan at all, yes
[14:44:50] <solars> (sorry for the mongoid syntax)
[14:45:10] <solars> so if I put the timestamp at the end, for ordering, does it also work if I only filter by hotel_id?
[14:45:24] <solars> I thought it only works if I have hotelid, rateplans id
[14:45:32] <kali> solars: yes, i've seen your index. once again, take one hour, watch one of the presentations, you won't regret it if you want to understand what you're doing
[14:45:49] <solars> sure, I will, just want to understand the problem
[14:46:01] <solars> do you have a link to these presentations?
[14:47:05] <kali> solars: http://www.slideshare.net/mongodb/mongodb-sharding-internals that one for instance
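A sketch of the two-field case kali described earlier: with equality on hotel_id leading the index, the sort on timestamp can walk the index in order (collection name and variable are assumptions):

    db.rates.ensureIndex({ hotel_id: 1, timestamp: 1 })
    db.rates.find({ hotel_id: someHotelId }).sort({ timestamp: 1 })   // equality first, sort field last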
[16:10:38] <kchodorow_> i might or might not get a chance to merge in your pull request before then, mostly working on 2.2 testing
[16:11:28] <diegok> kchodorow_: well, my changes are fine, but what I really need now is the ability to retry and query on secondaries mostly
[16:12:17] <diegok> kchodorow_: I think I'll have some time to go around this issue this weekend...
[16:12:53] <diegok> kchodorow_: did you see where I should be starting?
[16:14:14] <kchodorow_> okay... so basically the driver gets a seed list of servers to connect to. it should call ismaster on them and populate any missing hosts (ones not in the seed list)
[16:14:23] <kchodorow_> then call ismaster on every host it knows about
[16:14:43] <kchodorow_> ismaster returns either ismaster=>true or secondary=>true (or both of those fields false)
[16:15:01] <kchodorow_> so the driver should keep track of which ones are secondary=>true
[16:15:57] <kchodorow_> the driver should also time how long it takes to call ismaster on the various servers
[16:16:05] <kchodorow_> as it should prefer reading from "closer" servers
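For reference, roughly what the ismaster command returns when run from the shell (fields abbreviated and hostnames made up; the exact output varies by version):

    db.runCommand({ ismaster: 1 })
    // { 'ismaster': true,        // false on a secondary
    //   'secondary': false,      // true on a secondary
    //   'hosts': [ 'a.example.com:27017', 'b.example.com:27017', 'c.example.com:27017' ],
    //   'me': 'a.example.com:27017',
    //   'ok': 1 }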
[17:48:03] <chubz> kali: thanks a bunch, sorry im super new to this :b
[17:48:26] <chubz> is there a way i can log that port number somehow in a file?
[19:04:58] <Lawnchair_> In a sharded database's config.locks collection, what does "doing a balance round" mean? Is the balancer actually balancing the shards when it has this lock?
[19:16:23] <krz> anyone use rails and the mongo db driver? how can i run the same command https://gist.github.com/3177972 using a rails model?
[19:46:45] <tystr> we've started getting these errors in our webserver logs: MongoCursorException: couldn't get response header
[19:48:43] <Habitual> I am in need of some guidance for your package. the "mongo guy" (our client) is having one or more of "network latency, network limits, disk io speed, io wait" issues. I have been beating up sysstat tools all day. I have added counts for mysqld and mongod processes in zabbix. I really need to identify the cause. Thank you.
[19:48:52] <jiffe98> anyone used the mongodb.py nagios script?
[19:49:12] <jiffe98> nagios is reporting the status as (null) but if I run the script with the proper arguments it checks ok
[20:40:45] <krz> anyone know what is wrong with this: https://gist.github.com/3177972
[20:41:36] <krz> it doesn't seem to work if the :visits_count part is in the $group
[20:45:33] <gheegh> Mongo and Ruby Question: Is it normal in my logs to see my clients connecting and disconnecting almost on a per request basis?
[20:45:56] <sigmonsays_> Is there a findAndModify that will update more than one record?
[21:59:19] <krz> i have the following structure https://gist.github.com/3161542. how do I find the visit with token_id "13432515303rwiaczcx" and update its minute to 1234?
[22:09:57] <krz> anyone can help me with this: http://stackoverflow.com/questions/11659334/how-do-i-update-an-item-in-an-array-in-this-document-structure
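Assuming visits is stored as an array of subdocuments as discussed earlier, the positional operator handles this kind of in-place update (collection name assumed):

    db.uw.update(
      { 'visits.token_id': '13432515303rwiaczcx' },   // find the document containing that visit
      { $set: { 'visits.$.minute': 1234 } }           // $ refers to the matched array element
    )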