PMXBOT Log file Viewer


#mongodb logs for Wednesday the 25th of July, 2012

[00:20:44] <krz> anyone familiar with the aggregation framework?
[00:29:35] <crudson> krz: what is your question?
[00:31:21] <krz> crudson: this is my document structure: https://gist.github.com/3161542
[00:31:49] <krz> using the aggregation framework. how do i return all visits with minute greater than 12 ?
[00:33:48] <crudson> krz: as they are within an array of a single document, you need to $unwind first
[00:34:17] <krz> crudson: you familiar with the mongodb ruby driver?
[00:34:23] <crudson> krz: yes
[00:35:16] <krz> crudson: i tried https://gist.github.com/3173629
[00:35:24] <krz> it returns []
[00:35:59] <krz> you know what I'm doing wrong?
[00:36:35] <crudson> krz: looking
[00:41:15] <crudson> oh i saw that as an array, not a hash. you should probably avoid using variable values as hash keys as they become hard to query against and manage. Also is there a reason visits is a hash rather than an array? It's much easier to have [ {id:'134300236111rcbmbmvv', country_name:'Europe'}, {} ]
[00:41:30] <crudson> it's indexable too that way
[00:41:57] <crudson> krz: p.s. 'Europe' isn't a country ;)
[00:43:17] <krz> yea shouldn't be europe.
[00:43:39] <crudson> krz: but the document structure advice is just my opinion. In my experience using { key: '12345', value: 'abcde' } is much preferable to { '12345':'abcde'}
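A sketch of the two shapes under discussion; the collection name 'uw' is taken from later in the log, and the field values are illustrative:

    // Hash keyed on a variable id: every visit lives under a different key,
    // so there is no single path to query or index.
    { _id: 'a', visits: { '134300236111rcbmbmvv': { country_name: 'Europe', minute: 14 } } }

    // Array of subdocuments with static keys: one index covers every visit.
    { _id: 'a', visits: [ { id: '134300236111rcbmbmvv', country_name: 'Europe', minute: 14 } ] }

    db.uw.ensureIndex({ 'visits.minute': 1 })
    db.uw.find({ 'visits.minute': { $gt: 12 } })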
[00:44:20] <krz> crudson: isn't that how i have it?
[00:44:27] <krz> are you referring to the visits structure?
[00:45:27] <crudson> krz: no, your visits are a hash keyed on some 20 character id
[00:47:06] <crudson> krz: personally I'd address the document structure to facilitate querying, but others may have other suggestions. What was the ruby question?
[00:47:57] <krz> crudson yea so based on this structure. how am i able to extract only the visits with "minute" greater than 12
[00:48:29] <crudson> krz: well unwind is for arrays so that is out
[00:50:12] <crudson> krz: that's the only native operation in aggregation to split a single document's attribute values
[00:50:16] <crudson> afaik
[00:50:35] <krz> crudson: btw, http://www.10gen.com/presentations/webinar/real-time-analytics-with-mongodb slide 19
[00:50:42] <krz> is using hashes too right?
[00:50:47] <krz> for hourly and minute
[00:51:13] <krz> title of slide is "Pre-Aggregation"
[01:00:26] <krz> crudson any idea? second, https://gist.github.com/3173689 is returning []
[01:10:53] <crudson> yes, but their keys are static except for some related to hours and minutes, which they are handling as a special case. You are asking to get a list of document fragments (i.e. particular {key,value}s), and on brief examination your document format is not going to make this easy with aggregation. You could map reduce, emitting id=>document if minute>12 and have a null reducer.
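A minimal sketch of the map-reduce route crudson describes, assuming the hash-keyed visits structure from the gist, a numeric minute field, and the 'uw' collection named later in the log:

    var map = function() {
        for (var id in this.visits) {
            if (this.visits[id].minute > 12) {
                emit(id, this.visits[id]);   // id => matching visit fragment
            }
        }
    };
    // each emitted key is unique here, so the reducer is effectively a pass-through
    var reduce = function(key, values) { return values[0]; };
    db.uw.mapReduce(map, reduce, { out: { inline: 1 } });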
[01:12:18] <krz> crudson: the structure of hourly and minute follow the same structure of visits
[01:13:32] <krz> except that the visits is one level deeper
[01:14:11] <crudson> krz: use map reduce then like they are doing rather than aggregate
[01:14:25] <krz> can't. not suitable for "real-time stats"
[01:14:27] <krz> too slow
[01:15:43] <krz> such reasons are also highlighted at this webinar: http://www.10gen.com/events/new-aggregation-framework
[01:19:29] <crudson> krz: I have no further advice other than to rethink your document design if you want to use the query type you suggest, or you just filter the nested documents on the client side.
[01:20:11] <krz> is there a reason why aggregate's unwind does not work specifically with hashes?
[01:21:09] <crudson> krz: it's an array operation by design
[01:21:59] <krz> crudson: lets take it a step back. whats wrong with this then https://gist.github.com/3173689
[01:22:07] <krz> why am i getting [] ?
[01:29:15] <crudson> krz: I loaded your document into a collection and ran that command fine. check your db and collection names
[01:32:32] <crudson> krz: albeit without mixing hash syntaxes; probably best to stick with one
[01:40:13] <krz> crudson: you mean you tried it with an array structure?
[01:40:55] <crudson> I ran the following and got a single document in 'result' db.command(:aggregate => 'uw', :pipeline => [{'$match' => {:_id => '20120723/foobar/song/custom-cred'}}])
[01:57:05] <krz> crudson: ill consider changing the structure to an array instead. just to understand this further, can you show me more or less how the visits should look in array form?
[02:00:31] <crudson> sure
[02:04:41] <krz> crudson: much appreciated
[02:05:29] <crudson> krz: something like this. http://pastebin.com/zf3BcqpT note that I let mongodb assign its own _id.
[02:10:05] <crudson> krz: then execute: db.uw.aggregate({$unwind:'$visits'}, {$project:{visit:'$visits'}}, {$match:{'visit.minute':{$gte:12}}})
[02:10:49] <crudson> krz: or $gt:12 for non-inclusive
[02:13:49] <crudson> krz: this will give the result you are looking for: http://pastebin.com/tzdyVFeS
[02:30:38] <krz> crudson: that looks great
[02:30:56] <krz> crudson: how would i perform the aggregation on this to return visits with minute greater than 12?
[02:31:33] <crudson> krz: I just pasted that
[02:32:14] <crudson> krz: note whether to use your own _id or not is entirely up to you, providing they are unique
[03:09:11] <krz> crudson: i mean, whats the mongo ruby method to aggregate based on that structure
[03:42:53] <clu3> Suppose users.tags = [ [id : 1, name : tag1], [ id: 2, name :tag2] , ...] how do i remove the tag with id=2 out of users.tags array? Any help really appreciated
[03:46:44] <clu3> it looks like it's not possible to do that?
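It is possible; $pull removes all matching elements from an array. A sketch assuming a 'users' collection whose 'tags' field is an array of {id, name} subdocuments:

    db.users.update(
        { 'tags.id': 2 },                 // only users that have the tag
        { $pull: { tags: { id: 2 } } },   // remove every tag with id 2
        false,                            // upsert: no
        true                              // multi: fix all matching users
    )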
[03:57:06] <krz> crudson: seems to work now with https://gist.github.com/3173629
[03:57:32] <krz> i am now seeing something in results
[03:57:47] <krz> any idea how i can return only the results with minute greater than 12?
[04:25:45] <krz> how does one use $gt in an aggregator method?
[04:50:44] <krz> crudson: i put the issue here in more detail with code for a better view http://stackoverflow.com/questions/11641358/how-do-i-execute-this-query-using-the-aggregation-framework-in-mongodb have a look at it when you have the time
[05:08:38] <crudson> krz: I wrote a number of times how to do that with your data model. check previous pastes and chat.
[05:09:35] <krz> crudson: we only spoke about document structure. nothing specific into filtering the results
[05:15:38] <crudson> krz: I pasted this link 3hrs ago http://pastebin.com/tzdyVFeS
[07:17:36] <vak> mongostat shows "locked %" 2.71e+03 in 4-core server, how come? http://pastebin.com/h7zJBF3j
[07:51:32] <[AD]Turbo> hi there
[08:08:30] <kali> dstorrs: if you're still struggling with your query, there are a few quite simple document schema alterations you can make to make the indexing work better: for instance, maintain a "has_pages" boolean (<=> np>0), a "pending" boolean (<=> action != "FINISHED")
[08:09:14] <kali> dstorrs: then, make two separate queries: one for "relocking" and one for new locks to discard the $or
[08:09:57] <kali> dstorrs: and finally you need the lock_until criteria at the end of the index definition (the "range" parameter is only index-compatible in last position)
[08:10:15] <dstorrs> range?
[08:10:26] <dstorrs> it's a straight integer less-than comparison
[08:10:36] <dstorrs> I thought range meant "between two values"?
[08:10:43] <kali> dstorrs: half range if you like, same problem
[08:11:42] <kali> dstorrs: but even with that, i think you're not out of the woods because indexing and $elemMatch are sometimes weird
[08:11:57] <dstorrs> yeah. :<
[08:12:05] <dstorrs> actually, I'm not sure the index is our big issue
[08:12:23] <dstorrs> it seems to be the actual write lock issue on updating
[08:12:27] <kali> it might be worth splitting pages to a separate collection
[08:12:37] <dstorrs> how do you mean?
[08:12:53] <kali> each page in its own document
[08:14:04] <dstorrs> I looked at that, actually. horizontal striping (all pages in array) seemed to outperform vertical (each in a separate doc)
[08:14:24] <dstorrs> although the code has changed enough since that it might be worth re-running the test.
[08:14:50] <kali> splitting pages and the alteration i suggested will make indexing work, i'm sure of that.
[08:15:17] <kali> and i'm quite sure you won't get efficient indexing with the array and that kind of criteria
[08:15:39] <dstorrs> does indexing actually matter? it doesn't matter what order jobs get grabbed in.
[08:15:49] <kali> dstorrs: it may speed up updating, so...
[08:15:55] <dstorrs> will the indexing affect the speed of the update call?
[08:16:34] <kali> maybe :)
[08:16:34] <dstorrs> hmm, I suppose that makes sense, actually.
[08:16:37] <kali> probably.
[08:16:57] <kali> you don't want your server to burn cpu scanning thousands of documents anyway
[08:17:02] <kali> think of the polar bears
[08:17:08] <dstorrs> if there are no indices, the workers will all grab at the top, then each will have to walk past all the locked docs to find an unlocked one.
[08:19:02] <kali> dstorrs: yes. and i don't know when the write lock is actually taken: just at the time of the actual update, or for the scanning too
[08:21:23] <kali> dstorrs: but i'm happy to see i'm not the only crazy person to use mongodb for concurrency state :)
[08:21:33] <dstorrs> heh :>
[08:21:38] <dstorrs> what's your specific application?
[08:24:02] <kali> dstorrs: like... everything. mongo-resque is the most important in terms of transaction, but we also have code for ad-hoc queues, tensions for multidoc updates, mutual exclusion like you're doing
[08:24:52] <kali> dstorrs: we used redis before, and loved it, but the lack of built-in failover was an operational nightmare
[08:25:22] <dstorrs> did you roll your own job queue?
[08:26:23] <kali> dstorrs: yes. same principle as yours, with locked_until and locked_by
[08:26:49] <kali> dstorrs: with one document per lock :)
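A rough sketch of the lock pattern both are describing: one document per lock, claimed atomically with findAndModify. The collection, lock name, and worker id are assumptions:

    var now = new Date().getTime();
    db.locks.findAndModify({
        query:  { _id: 'harvest-queue', locked_until: { $lt: now } },  // free or expired
        update: { $set: { locked_until: now + 60000,                   // hold for 60s
                          locked_by: 'worker-1' } },
        new: true    // a null result means another worker holds the lock
    })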
[08:26:56] <dstorrs> what sort of throughput do you get?
[08:27:02] <dstorrs> (writes / sec)
[08:27:15] <kali> no idea :)
[08:27:17] <kali> enough :)
[08:27:31] <dstorrs> ???
[08:27:34] <dstorrs> no idea??
[08:28:07] <kali> i can try to give you a figure, but we never saturated it
[08:28:19] <kali> (is that a word ?)
[08:28:24] <dstorrs> yes.
[08:28:29] <dstorrs> and the right one for this use.
[08:28:42] <dstorrs> so, are you writing 100s or 1,000s /sec ?
[08:29:22] <kali> lemme try to find you some figure
[08:29:45] <dstorrs> you don't have to, I was just curious.
[08:30:17] <dstorrs> basically, I know that I was able to harvest 10M vids in 1 hr 11 mins when I initially launched.
[08:30:39] <dstorrs> working back from that, it's O(2000 wr/sec)
[08:31:00] <dstorrs> I was curious if that's high, low, or average, that's all.
[08:31:20] <dstorrs> (performance has since dropped due to code changes, sharding of DB, and various other changes)
[08:31:47] <kali> 2000 wr/sec for the database or for a given queue ?
[08:32:59] <dstorrs> more or less the same thing, since we're only doing one queue at a time due to the business rules.
[08:33:47] <kali> ok, the situation is more complex here
[08:36:25] <kali> dstorrs: we're very far from these figures actually. a few dozen lock requests every second right now
[08:37:28] <dstorrs> that sounds glorious. :/
[08:37:39] <dstorrs> wrong emoticon.
[08:37:42] <kali> :)
[08:37:57] <dstorrs> I wish I didn't have to deal with this massive load. it blows.
[08:38:09] <dstorrs> I'd much rather be doing useful work on customer-facing features.
[08:38:36] <dstorrs> (not that this isn't useful, but it's a pretty thankless job and it never seems to end. it gets disheartening after a while)
[08:39:07] <kali> i'll have a look later in the day, when some heavy duty jobs are running.
[08:39:27] <kali> one more thing i need to instrument and graph
[08:43:46] <dstorrs> heh
[08:43:55] <dstorrs> well, ultimately, it's what makes the business go round
[08:44:10] <dstorrs> and I'm the co-founder, so making things spin faster is very much in my interest
[08:49:57] <dstorrs> well, it's late here and I'm for bed.
[08:50:16] <dstorrs> to sum up your recommendations, you're saying change the schema to look like so :
[08:51:58] <dstorrs> { _id : ObjectId(), has_pages : true | false, pending : true | false, locked_until : $epoch, (optionally: owner : $val | owner_host : $val) }
[08:52:48] <kali> dstorrs: yep
[08:53:02] <dstorrs> what would the ensureIndex command be?
[08:53:23] <dstorrs> given that owner / owner_host may or may not be in the document
[08:57:03] <kali> all the fields you require equality on in the selector, plus locked_until at the end
[08:57:03] <kali> so has_pages, pending, owner, owner_host, locked_until in that order
[08:57:03] <dstorrs> how will that deal with the case where owner_host does not exist?
[08:57:11] <kali> owner_host: null
[08:57:21] <kali> the index should do what you want
[08:57:49] <dstorrs> meaning I need to set an owner_host on every document, even if it doesn't need it?
[08:58:12] <vak> guys, vote, add watches, comments, whatever, but, please, let's push this shameful issue-1240 http://bit.ly/PiPkXW we don't want a 3rd year celebration, or do we?
[08:58:25] <kali> dstorrs: i think owner_host: null matches documents without owner_host
[08:58:47] <kali> dstorrs: yes
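kali's suggestion spelled out: equality fields first, the range field (locked_until) last. A sketch against a hypothetical 'jobs' collection:

    db.jobs.ensureIndex({ has_pages: 1, pending: 1, owner: 1, owner_host: 1, locked_until: 1 })
    // documents lacking owner_host are still matched by querying owner_host: null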
[08:59:31] <dstorrs> vak: you know that database-level locking is coming in 2.2 (rc's for which are out now) and that collection-level locking is scheduled for 2.4, right?
[09:00:23] <vak> dstorrs: yes. and?
[09:00:26] <dstorrs> vak: http://www.10gen.com/presentations/concurrency-internals-mongodb-2-2
[09:00:39] <dstorrs> so, you're getting your wish. with no need to do anything.
[09:00:56] <dstorrs> 2.4 will be out later this year
[09:01:24] <dstorrs> and collection-level is clearly not going to happen in 2.2
[09:01:47] <vak> what should I say..
[09:03:17] <dstorrs> kali: thanks for the tips
[09:03:21] <dstorrs> g'night all
[09:15:43] <scrr> hello
[09:16:06] <scrr> question: is it possible to supply two .hint(..)s to mongoDB for one query?
[09:16:28] <scrr> or is it possible to tell mongo "only use btree or fail query immediately"
[09:16:32] <scrr> ?
[09:16:41] <scrr> (sorry for strange chars, im keymap challenged)
[09:16:47] <Mortah> you can turn off table scans
[09:16:56] <scrr> Mortah: how?
[09:17:56] <scrr> ah found it. thank you!
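What scrr most likely found is the notablescan parameter, which makes a query that cannot use an index fail instead of falling back to a full scan:

    // at startup: mongod --notablescan
    // or on a running server:
    db.adminCommand({ setParameter: 1, notablescan: true })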
[09:17:58] <Mortah> :)
[10:16:21] <scrr> hm. can i show the config in the mongo console somehow?
[10:23:01] <scrr> got it. db.adminCommand("getCmdLineOpts")
[10:32:47] <kaikaikai> does anyone have experience running a mongodb update on each page request? i'm building a sort of custom analytics and i have two choices: use timestamps and inserts and then organize all the data later... or use updates and keep an embedded document
[10:33:30] <kaikaikai> there are about 400 concurrent users on average, i know i'll need to just test this but if anyone has insight it will help me choose what to try first
[10:33:44] <Mortah> we do something similar
[10:33:52] <Mortah> to track whether users have used a feature or not
[10:33:58] <Mortah> its an embedded document
[10:34:06] <Mortah> well, its not
[10:34:07] <Mortah> :D
[10:34:26] <Mortah> its: {user_id: x, used_feature_x: false, used_feature_y: true}
[10:34:27] <Mortah> etc
[10:35:18] <Mortah> looking at our sampler its not causing performance issues for us... even for 400 users at 1kb each this only adds up to 400kb so it stays in memory no problem :)
[10:35:39] <kaikaikai> ok i see, how do you check? my update would be running on every single page request
[10:35:57] <kaikaikai> even though sometimes addToSet wouldn't write any new data
[10:36:09] <kaikaikai> yours is the same?
[10:36:24] <kaikaikai> wow awesome, 400kb is nothing
[10:36:39] <kaikaikai> thanks, that already gives me a better idea of what to expect
[10:36:53] <Mortah> we do a get... check if the feature has already been marked as used, if so - do nothing, if not - do a $set
[10:37:16] <Mortah> its a cheap thing, its all fetched directly by ID so no expensive queries
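A sketch of the read-then-$set flow Mortah describes; collection and field names are assumptions:

    var userId = 42;
    var doc = db.feature_usage.findOne({ user_id: userId });  // cheap fetch by id
    if (!doc || !doc.used_feature_x) {
        db.feature_usage.update({ user_id: userId },
                                { $set: { used_feature_x: true } },
                                true);                         // upsert if missing
    }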
[10:37:36] <kaikaikai> yeah, same for me, just id
[10:52:57] <ankakusu> Hi!
[10:53:00] <ankakusu> I'm new to mongodb
[10:53:13] <ankakusu> and I want to transfer openstreetmap data into mongodb
[10:53:47] <Derick> ankakusu: I've done that.
[10:53:54] <ankakusu> how?
[10:53:57] <Derick> and spoken about it, and wrote about it
[10:54:03] <Derick> one sec, will get you the links
[10:54:26] <ankakusu> Ok. I'm waiting.
[10:55:10] <Derick> ankakusu: http://derickrethans.nl/indexing-free-tags.html (links to the talk too)
[10:55:20] <Derick> ankakusu: with what purpose are you doing this?
[10:55:33] <ankakusu> for my research.
[10:55:47] <ankakusu> academic research.
[10:56:30] <Derick> are you going to write/publish on that?
[10:57:23] <Derick> I'm a big OSM fan and I work with the MongoDB people... so really interested :-)
[10:57:37] <ankakusu> :)
[10:57:43] <ankakusu> Well not yet.
[10:57:46] <ankakusu> I've just started.
[10:57:57] <Derick> hehe
[10:58:10] <Derick> well, please keep me in the loop and/or don't hesitate to ask me questions
[10:58:12] <ankakusu> so I'm just gathering information for now
[10:58:39] <ankakusu> ok. sure. I'll continue working on maps.
[10:59:44] <Derick> also, feel free to drop me a mail at "nick" @10gen.com
[11:00:44] <ankakusu> ok. I'm saving it.
[11:00:54] <ankakusu> *your email.
[11:00:55] <Derick> (my nick is Derick btw ;-) )
[11:01:09] <thewanderer1> hi. I'd like to have an application that depends on particular array ordering in MongoDB. if I save [1,2,3], is the order in which they are stored and retrieved preserved?
[11:02:04] <ankakusu> @Derick what do you work on at mongodb?
[11:02:16] <ankakusu> are you working with maps?
[11:03:05] <Derick> ankakusu: No, I work on the PHP driver
[11:03:14] <Derick> Maps and OSM is a hobby
[11:03:30] <thewanderer1> and, if ordering is preserved, how can I query only for documents which have an array whose 0-th element is "green"?
[11:04:35] <ankakusu> ok. Maps are really charming though!
[11:04:37] <ankakusu> :)
[11:05:15] <Derick> yes :-)
[11:17:11] <ankakusu> @Derick, I have a specific question:
[11:17:31] <ankakusu> I'm following the tutorial at mongodb:
[11:17:34] <ankakusu> http://www.mongodb.org/display/DOCS/Java+Tutorial
[11:18:26] <ankakusu> I'm gonna put the nodes, ways and relations into mongodb.
[11:19:01] <ankakusu> in element "way" we are using point references.
[11:19:31] <thewanderer1> guys, I'm trying to find() all documents which have "pirate" as the first element of the "arr" array. it needs to be the first element, not any further. any suggestions?
[11:20:03] <thewanderer1> I've tried: {arr[0]: 'pirate'}, {arr.0: 'pirate'}, neither works
[11:20:16] <Derick> "arr.0" ?
[11:20:19] <ankakusu> such as: <way id='153665622' timestamp='2012-03-06T20:03:37Z' uid='609753' user='Chad Lawlis' visible='true' version='1' changeset='10893949'> <nd ref='1663030941' /> </way>
[11:20:32] <Zelest> Derick!
[11:20:38] <Zelest> haven't seen you in ages. :-)
[11:20:41] <thewanderer1> Derick: that was it, thanks!
[11:20:47] <Derick> Zelest: busy busy :-)
[11:20:54] <thewanderer1> now I only need to find out if this works in PHP, too :)
[11:20:58] <Zelest> Derick, Mhm, your twitter sort of gave that away. :-P
[11:21:01] <Derick> thewanderer1: it will
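The dot-index form Derick pointed at, as a shell query (collection name is illustrative); drivers, PHP included, take the same key string:

    // matches only documents whose arr array holds 'pirate' at position 0
    db.things.find({ 'arr.0': 'pirate' })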
[11:21:40] <Zelest> Derick, https://github.com/Zelest/tinyphp/blob/master/core/tinymongo.php
[11:22:03] <Zelest> Derick, wrote a little "wrapper" class for models in mongo. :-)
[11:22:45] <Derick> ankakusu: are you going to finish your question? :-)
[11:24:02] <ankakusu> well, while I was writing, I realized that ref in <nd ref='1663030941' /> is just an attribute
[11:24:26] <ankakusu> I was thinking about it Derick :)
[11:24:50] <Derick> ankakusu: in my trials, I embedded the nodes' positions in ways
[11:24:51] <ankakusu> let me ask another question:
[11:25:19] <ankakusu> embed?
[11:25:57] <ankakusu> can you write me a simple example?
[11:26:53] <Derick> ankakusu: it's all in the article/presentation slides really...
[11:27:37] <Derick> slide 22 on http://derickrethans.nl/talks/osm-mongouk12.pdf
[11:28:18] <ankakusu> ok. Sorry. Let me read more carefully.
[11:28:39] <Derick> and I only store nodes that have tags separately as well
[11:50:48] <Bartzy> Why can't I do this in the shell:
[11:50:56] <Bartzy> db.shares.count(); sleep(10); db.shares.count() ?
[11:51:54] <Mortah> print(...)
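What Mortah means: run as a one-liner, the shell only echoes the last expression, so wrap each count in print(); note that the shell's sleep() takes milliseconds:

    print(db.shares.count()); sleep(10000); print(db.shares.count());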
[12:03:38] <lqez> I'm copying my mongodb into a new mongodb instance with the 'copyDatabase' command. (about 700M rows, 1.3TB)
[12:03:58] <lqez> Copying the data only took 2h, but
[12:04:30] <lqez> I've been waiting a further 6h on '"msg" : "index: (2/3) btree bottom up 197232658/715742197 27%"'
[12:04:53] <lqez> Do I have to wait for it?
[12:05:12] <lqez> Or, can I run the copyDatabase command without creating the index in the foreground?
[12:05:26] <lqez> I want to run copyDatabase command with background index creation.
[12:17:29] <Bartzy> Mortah: What does print do that db.shares.count() doesn't ?
[12:23:02] <NodeX> Bartzy : I think you'll have to run a function and setTimeout().. in the function to call the second count
[13:53:51] <Bartzy> Any way to speed up mongorestore?
[13:53:59] <Bartzy> For a 27GB .bson file, it takes an hour :o
[13:54:34] <ron> sure. use a smaller .bson file.
[13:55:09] <NodeX> lmfao
[13:55:14] <NodeX> troll
[13:55:48] <Bartzy> That's good.
[13:56:09] <Bartzy> So for big datasets it's usually better to use LVM snapshots or something like that ?
[13:56:21] <Bartzy> And just copy back the raw files when restoring ?
[13:57:16] <Mortah> we do that
[13:57:24] <Mortah> inside EC2 to S3
[13:57:29] <Mortah> takes about 2 hours to restore ~300gb
[13:58:43] <kchodorow_> but yes, saving the raw files is going to be faster than dump/restoring
[14:00:01] <ron> NodeX: you love me, and you know it.
[14:01:04] <Bartzy> kchodorow_: Thanks.
[14:01:18] <Tobsn> http://dl.dropbox.com/u/1656816/Screenshots/ua~o.png - just FYI
[14:01:27] <Bartzy> kchodorow_: While you're here, I have a weird index scenario
[14:01:39] <remonvv> Hm, beginning to realize why the CouchDB community isn't winning any prizes : Guy tweets "Yup: this is why I choose CouchDB. Macworld Developers dish on iCloud's challenges" + link to blog about some iCloud issues. Retweeted by @CouchDB.
[14:02:01] <remonvv> Nice chain of reasoning "Apple didn't do iCloud very well so now I like completely unrelated technology CouchDB"
[14:02:42] <Tobsn> well couchdb isnt so interesting, what is interesting is membase
[14:02:52] <Bartzy> kchodorow_: I already asked it here but didn't get a complete answer - I have an index on {uid: 1, _id: -1}, and doing: db.shares.find({uid: {$in: [list of 10-5000 strings here]}}).sort({_id: -1}).limit(50) results in using that index, but it also shows scanAndOrder: true. Why is it not using the _id key for sorting ?
[14:03:11] <NodeX> apple are retarded - fact
[14:03:36] <Bartzy> NodeX: I think you've helped me with that problem ;)
[14:03:55] <NodeX> ;)
[14:04:38] <Bartzy> If anyone knows, I'm here :p
[14:07:33] <kali> Bartzy: an index will only be able to sort if you're scanning a contiguous slice of the index. in your case, the $in spreads the results all over the place, so the index is useless for sorting
[14:08:06] <kali> Bartzy: there are quite good presentations by 10gen staff on the index internals, it might be a good time to have a look at one :)
[14:08:42] <Bartzy> kali: I will look at them, thanks. Reading MongoDB in Action and the chapter about indexes was no help about this :)
[14:08:58] <Bartzy> kali: Any way to make an index work for the sorting? Or query differently ?
[14:09:32] <Bartzy> I just need a list of the documents with those uids, ordered by descending time (_id, timestamp, natural)... and limit to the last 50
[14:10:00] <Mortah> make a calculated field and index on that?
[14:10:10] <Mortah> oh no that won't work
[14:10:46] <kali> Bartzy: i'm afraid no, not with this structure
[14:10:57] <Mortah> how many UIDs do you have in total?
[14:11:40] <Mortah> I wonder if you could flip the index... but I believe the total number of UIDs you have would affect whether that would be useful
[14:12:11] <Bartzy> Mortah: millions
[14:12:29] <Bartzy> kali: How can I structure it differently ?
[14:13:00] <Bartzy> kali: And if I keep this structure - there is no point for _id in that index, if scanAndOrder is true, right? So I can just remove it and add a {uid:1} index ?
[14:13:21] <kchodorow_> i'm not sure if it works in 2.0, but {_id:1, uid:1} should know how to use the index in 2.2
[14:14:02] <Bartzy> kchodorow_: No need for _id: -1 ? Also, how would it know to use it? I thought sorting can only be done on the key after the last one used in the index?
[14:15:53] <kchodorow_> on _id:-1: the query optimizer should be smart enough for that not to matter
[14:15:54] <kali> kchodorow_: with the sort key first ? skipping the docs based on the index key bit ? as far as i know this is new
[14:16:20] <kchodorow_> on the reverse order: yeah, that's probably just in 2.2. but it is a new feature
[14:16:38] <Bartzy> kchodorow_: Cool! :p When 2.2 is going to be stable ?
[14:16:40] <kchodorow_> sort key can go first so it'll traverse the index in that order
[14:16:50] <kchodorow_> we're working on that :)
[14:16:52] <kali> kchodorow_: nice :)
[14:17:28] <Bartzy> And the performance of just getting the documents according to uid, without ordering, will be the same in {_id:1, uid:1} as in {uid:1, _id:1} ?
[14:18:02] <Bartzy> kchodorow_: Any rough estimations? Weeks or months? :)
[14:18:34] <kchodorow_> looks like it doesn't automatically use the index if you don't include the sort
[14:18:47] <Bartzy> but if I hint it, it will ?
[14:18:50] <kchodorow_> yes
[14:18:55] <kchodorow_> i'm guessing a month?
[14:19:05] <Bartzy> That really throws me off in my understanding of how indexes work :)
[14:19:24] <Bartzy> how can it first sort and then get the documents with the specified uids ?
[14:20:22] <kchodorow_> well, it's like a sorted list like: [id1, uid1], [id2, uid2], ... [id50000, uid50000]
[14:20:37] <kchodorow_> so it goes through them in id order, looking for the right uid fields
[14:21:09] <kali> Bartzy: so you scan the index instead of making random access in your whole collection
[14:21:11] <kchodorow_> maybe i need to use real numbers for that example...
[14:21:12] <Bartzy> so that means it will scan through the whole index ?
[14:21:37] <kali> Bartzy: it will scan until it has found enough docs
[14:21:53] <kchodorow_> it shouldn't have to, it'll sort the uids and then go from min->max uid value
[14:21:54] <Mortah> Bartzy, how many documents do you actually end up scanning btw?
[14:22:04] <kchodorow_> oh wait, ignore that
[14:22:24] <Bartzy> kchodorow_: I think what kali says makes sense
[14:22:45] <kchodorow_> yeah, kali's correct
[14:22:53] <Bartzy> but that also could be lots and lots of keys
[14:23:20] <Bartzy> in my scenario I'm searching for friends photos of a specific user. So all those UIDs are his/her friend UIDs
[14:23:29] <Bartzy> so some of their friends may not have uploaded a photo for a very long time.
[14:24:03] <kali> Bartzy: well, i feel your pain. social graphs are the worst
[14:24:12] <Bartzy> That will still be faster than getting all the photos of the friends, then sorting in "disk" ? I have enough RAM to hold the dataset
[14:24:26] <Bartzy> Mortah: sec.
[14:25:40] <kali> Bartzy: another option is to completely denormalize and store a "friends pictures" history of ids in each user. every time a picture is posted, you'll have to propagate the new picture ids to all the friends
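A sketch of the fan-out-on-write kali suggests; friendIds, friend_pics, and the photo id are hypothetical names:

    var newPhotoId = ObjectId();
    var friendIds = [101, 102, 103];   // the uploader's friends
    // on each new photo, push its id into every friend's history
    db.users.update(
        { _id: { $in: friendIds } },
        { $push: { friend_pics: newPhotoId } },
        false, true                    // multi-update
    )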
[14:26:25] <Bartzy> 25 photos are posted every second
[14:26:41] <Bartzy> so that's a bit of an issue (didn't test though)
[14:29:22] <kali> Bartzy: don't expect a silver bullet, you won't find it
[14:31:52] <Bartzy> Yep
[14:32:40] <Bartzy> Mortah: nscanned: 1911
[14:32:43] <Bartzy> nscannedObjects : 1393
[14:32:46] <Bartzy> What's the difference? :)
[14:33:43] <BurtyB> is there any easy way to change a shards _id? or would it be something like stopping the balancer, then in config updating shards._id and updating chunks.shard to the new name and restarting config servers?
[14:33:55] <Derick> BurtyB: 518
[14:34:30] <Derick> BurtyB: and now to be more helpful, the ns*objects is just documents, the nscanned also includes index key searches
[14:38:55] <BurtyB> lol - that confused me Derick until I realised you had a mis-tab-complete ;)
[14:39:15] <solars> hey, shouldn't this query use only indexes? https://gist.github.com/2798607b5ede61192887 or whats wrong with that?
[14:40:01] <kali> solars: you need an index on hotel_id and timestamp in this order
[14:40:19] <Bartzy> I created a PHP script with that query - I measured the time for $cursor->next to come back (with microtime(true)), and it was 64ms. Then I did $cursor->reset(); print_r($cursor->explain()) and got 121 in millis.
[14:40:28] <Bartzy> How come explain measured twice as much as the actual query time ?
[14:40:31] <solars> kali, what if I reverse the arguments?
[14:40:36] <solars> same result
[14:41:13] <kali> solars: no. the index must have the selecting part as a prefix of the sorting part for it to work
[14:41:31] <kali> solars: there are quite good presentations by 10gen staff on the index internals, it might be a good time to have a look at one :)
[14:42:01] <solars> hm I don't understand
[14:42:01] <Bartzy> kali: heh :)
[14:42:08] <Bartzy> kali: Any idea about my measuring question? :)
[14:42:30] <solars> kali, can you tell me how the index has to look like then for it work?
[14:42:50] <kali> solars: {hotel_id: 1, timestamp:1}
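That is, the selecting field as a prefix and the sort field last, so matching rows form a contiguous slice of the index. A sketch with an assumed collection name:

    db.rates.ensureIndex({ hotel_id: 1, timestamp: 1 })
    db.rates.find({ hotel_id: 123 }).sort({ timestamp: 1 })   // no scanAndOrder needed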
[14:43:34] <solars> does this mean the ordering part is always last? what happens if I have hotel_id and fubar_id which I use to select?
[14:43:41] <Mortah> Bartzy... it looks like its actually getting all the docs you need via the index and then doing the sort in memory... is that really so bad? sorting 5k objects shouldn't take too long... plus you could scale with more secondaries for reads :)
[14:43:44] <kali> solars: yes, the ordering must be at the end
[14:44:00] <solars> kali, also with more than 1 selection parts?
[14:44:44] <solars> e.g. I have: index([ [ :timestamp, Mongo::ASCENDING ], [ :hotel_id, Mongo::ASCENDING], ['rateplans.rateplan_id', Mongo::ASCENDING] ], background: true)
[14:44:45] <kali> solars: if you want to do no scan at all, yes
[14:44:50] <solars> (sorry for the mongoid syntax)
[14:45:10] <solars> so if I put the timestamp at the end, for ordering, does it also work if I only filter by hotel_id?
[14:45:24] <solars> I thought it only works if I have hotelid, rateplans id
[14:45:32] <kali> solars: yes, i've seen your index. once again, take one hour, watch one of the presentation, you won't regret it if you want to understand what you're doing
[14:45:49] <solars> sure, I will, just want to understand the problem
[14:46:01] <solars> do you have a link to these presentations?
[14:47:05] <kali> solars: http://www.slideshare.net/mongodb/mongodb-sharding-internals that one for instance
[14:47:13] <solars> thanks a lot
[14:47:32] <Bartzy> Mortah: yeahh...
[14:47:38] <kali> solars: mmm this is just the slides
[14:48:20] <solars> yeah
[14:50:23] <kali> solars: http://www.mongodb.org/display/DOCS/Indexes there is actualy one here
[14:51:38] <solars> thanks I'll have a look
[14:52:33] <kali> kchodorow_: by the way, one month ? nice :)
[14:53:36] <kchodorow_> kali: yeah, getting closer :)
[16:07:17] <diegok> kchodorow_: any news on the perl driver side?
[16:08:47] <kchodorow_> diegok: situation improving, we hired someone to work on the driver full-time :)
[16:09:06] <diegok> kchodorow_: oh!, that's great!!!
[16:09:27] <diegok> kchodorow_: who is that one?, so I can disturb him instead of you :D
[16:10:03] <kchodorow_> he hasn't started yet, he's supposed to in a week or two
[16:10:06] <diegok> kchodorow_: I'll be glad to help him/her :)
[16:10:13] <kchodorow_> i'll hook you guys up :)
[16:10:38] <kchodorow_> i might or might not get a chance to merge in your pull request before then, mostly working on 2.2 testing
[16:11:28] <diegok> kchodorow_: well, my changes are fine, but what I really need now is the ability to retry and query on secondaries mostly
[16:12:17] <diegok> kchodorow_: I think I'll have some time to go around this issue this weekend...
[16:12:53] <diegok> kchodorow_: did you see where I should be starting?
[16:14:14] <kchodorow_> okay... so basically the driver gets a seed list of servers to connect to. it should call ismaster on them and populate any missing hosts (ones not in the seed list)
[16:14:23] <kchodorow_> then call ismaster on every host it knows about
[16:14:43] <kchodorow_> ismaster returns either ismaster=>true or secondary=>true (or both of those fields false)
[16:15:01] <kchodorow_> so the driver should keep track of which ones are secondary=>true
[16:15:57] <kchodorow_> the driver should also time how long it takes to call ismaster on the various servers
[16:16:05] <kchodorow_> as it should prefer reading from "closer" servers
[16:17:45] <kchodorow_> diegok: pm'ed you
[16:18:05] <diegok> ok, so, it calls ismaster on startup and keeps this ordered by "closer" secondaries. When should it refresh this list?
[16:19:06] <diegok> how do I populate this list further?
[16:19:26] <kchodorow_> ismaster returns a list of hosts
[16:19:44] <kchodorow_> let me check on how often to refresh, any driver ppl here know off the top of their head?
[16:21:48] <kchodorow_> every 5 seconds, it looks like
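The handshake kchodorow_ outlines, replayed as shell commands; a driver does the same over the wire protocol:

    var res = db.adminCommand({ ismaster: 1 });
    res.hosts;      // replica set host list: merge with the seed list
    res.ismaster;   // true on the primary
    res.secondary;  // true on a secondary
    // time this call per host, prefer the "closest" ones, re-run every ~5 seconds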
[17:37:03] <chubz> when i run a command like "db.isMaster()" in mongo, how can i limit the output to just a certain field?
[17:37:19] <chubz> in my case i just want to see the port number of primary
[17:39:20] <kali> chubz: db.isMaster().primary.split(':')[1] ?
[17:48:03] <chubz> kali: thanks a bunch, sorry im super new to this :b
[17:48:26] <chubz> is there a way i can log that port number somehow in a file?
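One way is from the system shell rather than inside mongo; a sketch, with the log file name as an assumption:

    # append the primary's port to a file on each run
    mongo --quiet --eval "print(db.isMaster().primary.split(':')[1])" >> primary_port.log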
[19:04:58] <Lawnchair_> In a sharded database's config.locks collection, what does "doing a balance round" mean? Is the balancer actually balancing the shards when it has this lock?
[19:16:23] <krz> anyone use rails and the mongo db driver? how can i run the same command https://gist.github.com/3177972 using a rails model?
[19:21:26] <krz> crodas: any idea?
[19:45:59] <krz> http://stackoverflow.com/questions/11657286/how-do-i-group-by-id anyone?
[19:46:45] <tystr> we've started getting these errors in our webserver logs: MongoCursorException: couldn't get response header
[19:48:43] <Habitual> I am in need of some guidance for your package; the "mongo guy" (our client) is having one or more of "network latency, network limits, disk io speed, io wait" issues. I have been beating up sysstat tools all day. I have added counts for mysqld and mongod processes in zabbix. I really need to identify the cause. Thank you.
[19:48:52] <jiffe98> anyone used the mongodb.py nagios script?
[19:49:12] <jiffe98> nagios is reporting the status as (null) but if I run the script with the proper arguments it checks ok
[20:40:45] <krz> anyone know what is wrong with this: https://gist.github.com/3177972
[20:41:36] <krz> it doesn't seem to work if the :visits_count part is in the $group
[20:45:33] <gheegh> Mongo and Ruby Question: Is it normal in my logs to see my clients connecting and disconnecting almost on a per request basis?
[20:45:56] <sigmonsays_> Is there a findAndModify that will update more than one record?
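findAndModify operates on a single document; for many records the usual route is an update with multi set, though that does not hand back the modified documents. A sketch with illustrative names:

    db.records.update(
        { status: 'pending' },          // selector
        { $set: { status: 'done' } },
        false, true                     // upsert=false, multi=true
    )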
[21:59:19] <krz> i have the following structure https://gist.github.com/3161542. how do I find the visit with token_id "13432515303rwiaczcx" and update its minute to 1234?
[22:09:57] <krz> anyone can help me with this: http://stackoverflow.com/questions/11659334/how-do-i-update-an-item-in-an-array-in-this-document-structure
[22:14:15] <Tobsn> update( {visits.token_id:'yourid' }, { $set:{visits.0.minute:123} } )
[22:14:17] <Tobsn> krz
[22:14:20] <Tobsn> or something like that
[22:14:45] <Tobsn> update( { visits.token_id: 'yourid' }, { $set:{ visits.0.minute:123 } } );
[22:14:49] <Tobsn> try that out
[22:14:51] <Tobsn> not sure about the 0
[22:14:58] <Tobsn> see what a find would return
[22:15:29] <krz> Tobsn: but how do i know which document to look in?
[22:15:59] <krz> there are 2 other documents. each has its own _id
[22:16:13] <Tobsn> find( { visits.token_id: 'yourid' } ); there are two if you do that?
[22:16:20] <Tobsn> do a findOne
[22:16:22] <Tobsn> ;)
[22:16:33] <Tobsn> then you need a second indicator which one you want to update
[22:16:37] <krz> oh so first findOne by _id
[22:16:51] <krz> and then filter the visits array
[22:16:57] <Tobsn> ?
[22:17:12] <Tobsn> oh right it would return the whole object with the find
[22:17:25] <Tobsn> yeah no idea how to update a specific array element by its content
[22:18:27] <krz> hm i think i have an idea
[22:18:32] <krz> ill try something out after lunch
[22:18:33] <Tobsn> ah found your problem
[22:18:34] <Tobsn> http://stackoverflow.com/questions/4669178/how-to-update-multiple-array-elements-in-mongodb
[22:21:12] <Tobsn> update( { visits.token_id: 'yourid' }, { $set: { visits.$.minute: 123 } } );
[22:21:14] <Tobsn> see if that works
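The same update with the quoting the shell actually needs; the positional $ rewrites the first array element matched by the query (token id and value taken from krz's question above, collection name from earlier in the log):

    db.uw.update(
        { 'visits.token_id': '13432515303rwiaczcx' },
        { $set: { 'visits.$.minute': 1234 } }
    )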
[22:36:45] <iwoj> how do I create a one to many relationship in mongoDB?
[22:38:13] <Tobsn> parent_id in child object
[22:38:19] <Tobsn> as dbref
[22:38:20] <Tobsn> i guess
[22:38:25] <Tobsn> its mainly code logic
[22:40:42] <iwoj> i see. so there's no implicit way to have relational data in mongo.
[22:40:56] <iwoj> You have to use IDs and lookups in code?
[22:44:47] <iwoj> It looks like you can do something like: db.orders.find({"items.sku":12345},{_id:1})
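A sketch of the reference pattern Tobsn describes (each child stores its parent's id) next to the embedded form iwoj found; collection names are illustrative:

    db.posts.insert({ _id: 1, title: 'hello' })
    db.comments.insert({ post_id: 1, text: 'first!' })   // child points at parent
    db.comments.find({ post_id: 1 })                     // all children of one parent
    // embedded one-to-many keeps children inside the parent document, queried as in
    // db.orders.find({ 'items.sku': 12345 }, { _id: 1 })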
[23:10:39] <dstorrs> are there any gotchas about indexes on a sharded collection that might cause inserts to silently fail?
[23:11:02] <dstorrs> we added an index recently and suddenly many of our inserts are not hitting the DB
[23:13:13] <Tobsn> iwoj, yeah its a non relational database
[23:17:46] <iwoj> but with the code example above it looks like documents can exist in more than one collection.
[23:23:00] <krz> Tobsn: with update( { visits.token_id: 'yourid' }, { $set: { visits.$.minute: 123 } } ); whats visits.$.minute ?
[23:23:27] <Tobsn> relates to the current object
[23:23:31] <Tobsn> like i said try it
[23:23:33] <Tobsn> maybe it works
[23:23:39] <Tobsn> set up a test collection
[23:23:45] <Tobsn> put in that one object and try around
[23:24:31] <krz> it works
[23:34:45] <krz> Tobsn: you use ruby?
[23:35:08] <Tobsn> nope