[00:20:10] <mmlac> Is there a way to manipulate children of a model without hitting the database? Or more specific, how can I set a class ivar for all children and when they are lazily loaded call a hook (after_initialize?) on it?
[03:02:36] <jstout24> do i sacrifice speed if some of my indexed fields are nested in subdocuments rather than at the top level of the document?
[04:48:47] <dstorrs> daveluke: what are you trying to achieve?
[04:48:53] <Guest63591> you pass in a method and expect it to be used as callback?
[04:48:54] <mids> dstorrs: if this is only a one-time operation on such a small set, I'd just iterate over all of them, roll a die and delete in 90% of the cases
[04:49:17] <daveluke> update object A, take updated object A and put it in object B
[04:49:44] <dstorrs> mids: is there a way to do that in the shell? I figured it out a week or so ago but since then for the life of me I can't reconstruct what I did.
[04:53:01] <dstorrs> I see a lot of edge cases here -- can users be in multiple lobbies? can they be in zero? what happens if their connection drops mid-session?
[04:55:10] <daveluke> the lobby info i want in the player, because i am identifying each player by their socket's given id
[04:55:41] <daveluke> so when i get a socket, and i want to broadcast only to players in his lobby, i can just take the ids of the players in the player's lobby!
[04:56:34] <daveluke> i'm wondering though if i need a player collection at all… if i can just do a find where lobby's players has an id of X
[04:57:19] <dstorrs> I would be more inclined to manage this as two separate collections of referenced docs. but that may be a matter of taste
[04:57:47] <dstorrs> what's the maximum number of users per lobby?
[04:58:00] <daveluke> it will be 4.. currently it doesn't really matter
[04:58:11] <daveluke> i am making left 4 dead basically but 2d!
[04:59:29] <dstorrs> well, if it's 4, I don't think it matters which way you do it. If it "doesn't really matter", then I'd say you'd better not use embedded docs -- if your site got popular, your docs would get very large, and very hard to work with / slow over the wire / possibly blow past the 16M doc size limit
[05:00:42] <daveluke> so completely removing the players collection is not a bad idea
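(A minimal sketch of the two-referenced-collections approach dstorrs describes; the collection and field names here are made up, not from the discussion:)

    // players reference their lobby by _id instead of being embedded in it
    var lobbyId = ObjectId()
    db.lobbies.insert({ _id: lobbyId, name: "lobby-1" })
    db.players.insert({ socketId: "abc123", lobbyId: lobbyId })

    // broadcast targets: everyone in the same lobby as a given socket
    var me = db.players.findOne({ socketId: "abc123" })
    var teammates = db.players.find({ lobbyId: me.lobbyId }).toArray()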
[08:17:04] <nopz> I'm using pymongo and i've got an OperationFailure: cursor id '6312762830493842190' not valid at server. I'm already using timeout=False on this request, I don't know what to do next to solve this problem.
[09:18:03] <jenner> guys, what might be the problem if I'm getting "[conn8] assertion 13436 not master or secondary, can't read ns:config.settings query:{ $query: {} }" on a former replica set primary which I restarted? looks like it lost the replica set config?
[09:28:54] <Lujeni> jenner, check with rs.status()
[09:29:30] <jenner> Lujeni: I can't, there's no rs now apparently :(
[09:30:42] <Lujeni> jenner, can you pastebin your replica set log ?
[09:47:19] <dstorrs> hey all. I've got a collection of hourly stats; there's about 30,000 users, each of whom has one stat every hour. I want to generate a "daily stats" collection with the latest stat from each day for each user.
[09:47:54] <dstorrs> this is for 90, 30, and 14 day periods.
[09:48:34] <dstorrs> "the latest stat for each day" might not be the same harvest for each user -- someone could have been missed. is there an easy way to do this?
[09:58:41] <NodeX> group by users to gather a list then re-iterate the list, else map/reduce
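(A rough map/reduce sketch of the rollup dstorrs is asking about: keep the latest hourly stat per user per day. Collection and field names are guesses, not from the channel:)

    var cutoff = new Date(Date.now() - 90 * 24 * 3600 * 1000)   // 90-day window
    db.hourly_stats.mapReduce(
        function () {
            var day = new Date(this.ts); day.setHours(0, 0, 0, 0);
            emit({ user: this.user_id, day: day }, { ts: this.ts, stat: this.stat });
        },
        function (key, values) {
            var latest = values[0];
            values.forEach(function (v) { if (v.ts > latest.ts) latest = v; });
            return latest;
        },
        { query: { ts: { $gte: cutoff } }, out: "daily_stats" }
    )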
[10:01:03] <JamesHarrison> Okay; is there any way I can get mongodb to sort by a field and be aware of a limit prior to filtering by an index, to avoid a huge table scan for a small result set?
[10:03:44] <JamesHarrison> Okay, good and clear :D
[10:04:27] <remonvv> db.col.find()._addSpecial("$maxScan", <max docs to scan regardless>)
[10:04:35] <dstorrs> NodeX: ok, thanks. I'm already halfway through that, but it seems like a fair amount of code for something that is clearly a common use case. I was hoping there was a rollup I had missed.
[10:04:45] <remonvv> it's almost certainly useless though ;)
[10:04:48] <JamesHarrison> Will that bear in mind the sort criterion regardless though remonvv?
[10:05:13] <JamesHarrison> ie "given a sorted list of most recent docs, find me 20, search at most 200"?
[10:05:34] <remonvv> From a MongoDB perspective it'll simply stop scanning after X
[10:05:45] <remonvv> If it's scanning based on an index it could already be sorted
[10:06:00] <remonvv> I think scanAndOrder : true/false in your explain() will notify you of the difference
[10:06:02] <dstorrs> JamesHarrison: see also http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields#RetrievingaSubsetofFields-CoveredIndexes
[10:06:35] <remonvv> if it scans and then sorts it will not meet your criteria, if it does a sorted b-tree walk it (accidentally) will.
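(A quick way to check the scanAndOrder flag remonvv mentions, using the field names that come up later in this discussion; treat it as an illustrative sketch:)

    db.coll.find({ hidden_from_users: false })
           .sort({ created_at: -1 })
           .limit(20)
           .explain()
    // scanAndOrder: true  -> documents were fetched and then sorted in memory
    // scanAndOrder: false -> the index was already walked in sorted order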
[10:06:52] <remonvv> I wouldn't build your functionality based on $maxScan really.
[10:06:56] <remonvv> Every query can be made fast enough.
[10:07:34] <dstorrs> even db.my_ginormous_collection.count() ? :>
[10:07:38] <JamesHarrison> remonvv: currently I'm scanAndSorting, just over 16,000 entries which is a tad suboptimal, and the indexes won't reduce that since it's a sort
[10:07:53] <remonvv> dstorrs, with notable design flaws in MongoDB excluded ;)
[10:07:55] <JamesHarrison> er, well, since it's limited by number and based on a sort
[10:08:19] <Derick> JamesHarrison: you can add the sorting field to the index as last part of your compound key and it should be able to use it
[10:08:28] <JamesHarrison> Derick: Already done, no joy.
[10:08:29] <Lujeni> Hello - FindOne and sort are compatible ?
[10:10:05] <dstorrs> Derick: IIUC, if I have this index: { foo : 1, bar : 1 } then the query db.coll.find({ foo : 'bleh' }) can use the index but db.coll.find({ bar : 'bleh' }) cannot. is that right?
[10:10:43] <remonvv> a compound index on A, B will overlap an index on A but not one on B. If you query on B it cannot use an index because you don't have any index that starts with B.
[10:10:59] <remonvv> if you change your index to {bar : 1, foo :1} both queries will hit the index
[10:11:10] <remonvv> but that might not be faster if "bar" has very low cardinality
[10:11:11] <dstorrs> thanks. I always thought that was the case in both RDBMSes and non-relational DBs, but then I used MySQL....
[10:11:23] <Derick> remonvv: no, "foo": something won't use it then :P
[10:12:45] <Derick> Lujeni: just do find().sort().limit(1)
[10:12:54] <dstorrs> if I have index { foo : 1, bar : 1 } is there a diff between these queries?: db.coll.find({foo:'a', bar : 'b' }) db.coll.find({bar :'b', foo : 'a' })
[10:13:27] <dstorrs> ok, good. order of params not relevant then.. good to know. thanks
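(A small illustration of the prefix rule being discussed, using the placeholder names from the question:)

    db.coll.ensureIndex({ foo: 1, bar: 1 })

    db.coll.find({ foo: 'a' })                  // can use the index (prefix)
    db.coll.find({ foo: 'a', bar: 'b' })        // can use the index
    db.coll.find({ bar: 'b', foo: 'a' })        // same plan; argument order doesn't matter
    db.coll.find({ bar: 'b' })                  // cannot use this index (no prefix on bar)
    db.coll.find({ foo: 'a' }).sort({ bar: 1 }) // the index can also satisfy this sort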
[10:13:28] <remonvv> I'd like to point out at this juncture that I actually agree with Guest63591 here. Sorting and returning the top document isn't that uncommon a use case. findOne() should allow sort().
[10:13:48] <Lujeni> Derick, yeah i use this solution
[10:13:48] <remonvv> dstorrs, order of index fields is relevant (and their direction!), order in which you use them within a query isn't.
[10:14:12] <Derick> it's just the shell that doesn't do it
[10:14:35] <remonvv> Oh? I thought the issue was that findOne() returns a document rather than a cursor and as such you can't invoke .sort() on it.
[10:14:48] <remonvv> (so the sort clause should be an optional parameter of findOne() )
[10:14:59] <Derick> findOne isn't a server command... it's just a query wrapper thingy
[10:15:35] <remonvv> Oh I know but that doesn't really change the argument that much. All drivers and the shell support findOne so a consistent and usable function signature would be nice.
[10:15:38] <Derick> it really doesn't do more than just find().limit(1) :-)
[10:15:54] <remonvv> right, so it should allow a find().sort(S).limit(1) as well ;)
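(In the shell, the "findOne with a sort" workaround Derick gives looks roughly like this; field names are placeholders:)

    // equivalent of a sorted findOne: first document of a sorted, limited cursor
    var doc = db.coll.find({ foo: 'a' })
                     .sort({ created_at: -1 })
                     .limit(1)
                     .toArray()[0]    // undefined if nothing matched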
[10:19:25] <Derick> also, you use $nin, which can't really use an index
[10:19:43] <Derick> if it's a fixed set, use $in with the rest of the possible values
[10:19:44] <remonvv> well it can but it doesn't help anyone ;)
[10:20:55] <remonvv> also, a compound index that starts with a boolean will only really reduce candidate sets if it'll match only a relatively small part of the data. "hidden_from_users" sounds like an exception rather than the rule.
[10:21:30] <remonvv> I'm going to be bold and say this index : {created_at : 1} is going to give you the best results
[10:21:41] <remonvv> 1 or -1, doesn't matter for single key
[10:21:43] <Derick> yes, a very low cardinality isn't good, but not too much of a prob if the compound key works (which it doesn't now)
[10:21:50] <JamesHarrison> $in would be being called with around 10,000 values
[10:22:06] <Derick> JamesHarrison: I suggest you add an extra field with that flag then
[10:22:07] <JamesHarrison> $nin is only up to about 50 or so at most
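(A hedged sketch of Derick's $nin-to-$in inversion, which only works when the full set of possible values is known; the values here are invented:)

    var all = ['draft', 'pending', 'published', 'archived']
    var excluded = ['draft', 'pending']
    var allowed = all.filter(function (v) { return excluded.indexOf(v) === -1 })

    // $in can walk an index on the field; $nin generally cannot make good use of one
    db.coll.find({ status: { $in: allowed } }).sort({ created_at: -1 }).limit(20)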
[11:47:48] <remonvv> The sort only index was meant to scan down a sorted list but the scanning (in that case) isn't helped by any index. In other words it will scan linearly until it hits N items in your limit(N)
[11:47:59] <remonvv> There should be a faster option
[11:49:31] <scruz> what is the recommended way of relating two documents?
[11:51:22] <remonvv> Embed if you can, but you usually shouldn't in which case you have to store the _id value of the referred document in the referring document.
[11:52:55] <scruz> so, basically no different from relations in RDBMSes
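(A minimal manual-reference sketch of what remonvv describes: store the referred document's _id and resolve it with a second query, since there are no server-side joins. Names are illustrative:)

    var author = db.authors.findOne({ name: "scruz" })
    db.posts.insert({ title: "hello", author_id: author._id })

    var post = db.posts.findOne({ title: "hello" })
    var postAuthor = db.authors.findOne({ _id: post.author_id })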
[13:19:31] <locojay> hi anyone using mongo-hadoop on osx? streaming works for me with dumbo... but with mongo-hadoop it chunks the collections and then fails (no error available): https://gist.github.com/2969688
[14:20:02] <dnnsmanace> would anyone be so kind to advise on a data structure?
[14:21:29] <Guest63591> would anyone ask a question before expecting an answer to an unanswered question?
[14:25:12] <Chel> question: i have a problem, my mongo client cant connect for a long time. this is a log: https://gist.github.com/7c4b7bc841ca837fb4c4
[14:26:08] <dnnsmanace> for the pair i tried setting { type: String, unique: true, sparse: true } but it keeps seeing null as a dup
[14:27:14] <dnnsmanace> there must be a general structure for databases when you pair entries on the go to ensure you don't pair to any entry except those two
[14:37:50] <Chel> and it works if i use mongo --nodb key
[14:38:36] <Chel> and in /var/log/mongodb/mongodb.log nothing strange
[14:38:36] <Guest63591> i wonder why 75% of the people here are not capable of asking a reasonable question, asking in complete sentences, or providing reasonable information?
[14:42:30] <Guest63591> but #mongodb is completely below
[14:42:30] <FerchoArg> haha you're right. Is there any way to do this within a query? : Suppose I have a collection of places, which has City Name, State, Street name. I want to make a search with a regex. The regex should be compared to the three fields, City Name, State and Street, but documents in which City matches the regex should be listed first, or better weighted in some way
[14:43:18] <Chel> Guest63591: what is not correct in my question ? i have a problem, i've described it with logs and software versions. What else ?
[15:36:19] <ron> zak_: depends on how you installed mongodb and your os.
[15:36:56] <ron> NodeX: not really. I'm just a bit fed up of people who say 'yes' when they mean to say 'I don't need to check it since I *know* the answer'.
[15:37:22] <NodeX> I was referring to the PHP comment!
[15:37:34] <zak_> when i issue this command, the prompt shows mongodb running
[15:37:39] <ron> NodeX: well, that was just in reference to me being sarcastic ;)
[16:18:08] <Pilate> Is it possible to define a function that is callable from within map/reduce without having to define it within the map/reduce functions?
[16:25:48] <sir> has anyone run into an issue with gridfs (using python driver) where you attempt to get a file using the objectid, and it says there is no file in the collection with that objectid?
[16:25:56] <sir> even when you can query and see the file is in fact there
[16:29:46] <kali> Pilate: have you tried putting your function in the "scope" hash ?
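(A rough sketch of kali's suggestion: values passed in the "scope" option, including functions, become globals inside map and reduce. Collection and field names are made up:)

    db.events.mapReduce(
        function () { emit(normalize(this.name), 1); },         // map uses the scoped helper
        function (key, values) { return Array.sum(values); },   // reduce
        {
            out: { inline: 1 },
            scope: { normalize: function (s) { return s.toLowerCase().trim(); } }
        }
    )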
[16:38:27] <pgentoo> hey guys, i'm looking at using mongo for keeping a large mysql table archived off, but still searchable. I see that craigslist does a similar thing, and am just curious how this works. I have a lot of records per user in my system, so i can't just do a document per user, so curious what best practice is around this. For reference, think of something like bit.ly, where there are a lot of
[16:38:28] <pgentoo> users, but many many clicks that need to be kept track of per user. (i'm not working for bit.ly, but a similar type of service)
[16:42:35] <dnnsmanace> hey, i am having some trouble pairing documents uniquely, and because of my query i get two different documents paired to one, whereas i am looking to have only unique pairs
[16:42:50] <dnnsmanace> any schema modeling advice for this?
[16:43:44] <addisonj> i am seeing really interesting performance doing large numbers of upserts. I have ~100,000 records. I am processing 1,000 records at a time; the first batch took only .8 seconds, the 54th batch is up to ~39 seconds… is this just due to index building?
[16:44:00] <dnnsmanace> basically i want a user paired to another user, but when my query happens sometimes the script finds an unpaired user, but by the time it updates, another instance of the script has already paired that user, so i am getting double pairs to one person
[16:51:08] <dnnsmanace> but what sometimes happens is, because first i do findOne, and then i update
[16:51:41] <dnnsmanace> findOne finds a null paired_user, but by the time it updates, another instance of a pair attempt has already paired that user, so he is actually no longer null by the time it updates
[16:51:59] <kali> dnnsmanace: ok. have a look at findAndModify
[16:52:00] <dnnsmanace> so i end up having two different users thinking they are in a pair with a third
[16:52:36] <dnnsmanace> i have been looking at it just now
[16:52:53] <dnnsmanace> is there such a thing as findAndUpdate or something?
[16:53:35] <dnnsmanace> because findAndModify, as far as i understand, just overwrites the whole document
[16:53:40] <kali> dnnsmanace: findAndModify will return you the old or the new document (as you prefer) so you can check the update worked, and do something else if need be
[16:54:00] <kali> dnnsmanace: nope, it is the same as update(), you can use modifiers
[16:54:38] <dnnsmanace> so if i leave the email field out it will not overwrite it?
[16:55:32] <kali> dnnsmanace: just use { $set: { paired_email : ... } } as the second argument
[16:55:41] <kali> dnnsmanace: it will not alter other fields
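(A hedged sketch of the findAndModify pairing kali suggests: the match and the $set happen as one atomic operation, so two script instances can't both claim the same user. Field names are guesses:)

    var myEmail = "me@example.com"
    var partner = db.users.findAndModify({
        query:  { paired_email: null, email: { $ne: myEmail } },
        update: { $set: { paired_email: myEmail } },
        new:    true          // return the document after the update
    })
    // partner is the claimed document, or null if nobody unpaired was available
    // (some old shell versions report "no matching object" instead of returning null)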
[16:55:41] <addisonj> kali: to go in more depth, we are trying to import a few million documents daily from a partner and joining that to our own data. so our own data persists but each day the updated data from our partner is joined with an upsert
[16:56:36] <kali> addisonj: so the documents basically... constantly grow ?
[16:56:37] <dnnsmanace> kali: i really appreciate your help, i will try this
[16:59:03] <addisonj> kali: uhh, well mostly we are just replacing the meta-data with updated meta-data, typically, this data stays the same from day to day
[16:59:26] <dijonyummy> in pymongo how do i check if the result of a find() is null or empty? ie not found. == null doesn't work
[17:00:53] <kali> addisonj: ok, this is better... another thing to check is the %lock (run mongostat) and the page faults. if your collection and index outgrow your RAM... well, you need more RAM :)
[17:11:40] <addisonj> hrm I am thinking there just isn't a good way to do this with upserts… just going to be slower than bulk inserts or mongoimporter, I think we may try a map-reduce or something to join the two data-sets together
[17:14:33] <addisonj> so you are saying the reason I am seeing degraded performance is just page faults huh… hrm… I will try on a big server then
[17:14:48] <kali> check out the stats() for your collection
[17:16:15] <addisonj> avgObjSize is 530, total size is 10116620, i am assuming that is bytes?
[17:17:43] <kali> sounds safe... 10M is nothing
[17:17:50] <kali> unless you're running on an Apple II
[17:23:37] <mmlac> When I load an object that has_many OtherObjects, what callbacks are called when on the OtherObject? Is there a way to inject a variable into the "stub" and evaluate the "stub" before the model gets generated for real?
[17:26:41] <addisonj> yeah, even with that small of a dataset so far, the time to insert 1,000 documents after inserting 14,000 balloons to 15 seconds
[17:29:05] <jenner> guys, how can I make sure a certain node is elected as primary when I call rs.stepDown()? will a higher priority of the desired node help?
[17:31:01] <kali> jenner: yes. it will also steal primary status when you update your configuration if it is not primary
[17:31:47] <jenner> kali: ah, so I basically don't even need to step down?
[17:31:48] <pgentoo> anyone have any comments on my earlier question (about the structure of storing say 1 billion click records tying back to say 10K total system users)? I realize it's too big to store all of a user's clicks as a single document for that user, so just curious how to attack the problem (new to mongodb...)
[17:32:56] <kali> jenner: the current primary will actually step down when it sees that somebody else has higher priority
[17:33:38] <jenner> kali: thanks, that's really good to know
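(The priority change kali and jenner are talking about is done by editing the replica set config and reconfiguring; which member index to touch depends on your set:)

    cfg = rs.conf()
    cfg.members[1].priority = 2   // index of the member you want to become primary
    rs.reconfig(cfg)
    // the current primary steps down once it sees a higher-priority member is eligible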
[17:53:51] <sir> can anyone think of why a file would exist in gridfs, i can find it using mongo queries, but gridfs can't see it?
[19:00:39] <pgentoo> I have a single mysql table that i'd like to try moving into mongodb for testing. Currently the mysql data is on the filesystem in the form of a mysqldump file (insert statements for each record). is there some "easy" way of inserting this data into mongo?
[19:03:04] <pgentoo> I saw some tools (mongovue?) but they require the data to still be in mysql... would take ages to reimport this stuff... :(
[19:09:21] <Azra-el> i have a list field of embedded documents inside my collection and I would like to know if it's possible to query by one of the fields in those embedded documents and, as a result, get that embedded document to read more info from it. for example: http://pastie.org/4133935 I would like to find the embedded document with internal_id of 5 and get the "name" of that embedded document
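(Without the pastie, here is a guess at the shape being described, an array of subdocuments with an internal_id, and how to match on it; the positional projection form needs a server version that supports it:)

    // match any document whose array contains an element with internal_id 5
    db.coll.findOne({ "items.internal_id": 5 })

    // project only the matching array element (so you can read its "name")
    db.coll.find({ "items.internal_id": 5 }, { "items.$": 1 })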
[19:16:29] <mids> pgentoo: possibly you can change the file into a tsv format with some sed magic and then import it using mongoimport ?
[19:26:23] <dstorrs> hey all. I have a collection with a boat-ton of statistics (video metadata from youtube). I've got an index on { harvested_epoch : -1, publisher : 1 }. Will that index be used in the sort clause of a map_reduce that says:
[19:26:49] <dstorrs> "all videos by 'bob' more recent than X, sort ascending" ?
[19:27:35] <dstorrs> IIUC it will be used by the query, it's just the sort that I'm not sure about -- does sort ever use indices, or is it always in RAM?
[19:34:02] <pgentoo> mids, thanks for the info on that option. I'll look into it
[19:34:41] <mids> probably csv is even closer; but you get the point
[19:35:51] <pgentoo> are there any documents out there on schema design for mongo? I have hundreds of millions of mysql records (single table) that store attributes about a click through my system. I'm curious how to best design the storage of this in mongo. Multiple collections, one per user? Just one huge collection? etc... Could easily be 1 billion click documents in a couple years...
[19:36:13] <pgentoo> would like to be able to shard it out across servers to be able to scale that way in the future, etc...
[19:36:35] <dstorrs> read up on embedding and linking too
[19:36:46] <mids> also http://www.10gen.com/presentations
[19:37:04] <pgentoo> ok, do you have any hints or suggestions for the design that i can keep in mind?
[19:37:57] <dstorrs> in general, be careful about embedded docs. They sound great in theory and sometimes they are the best solution, but they complicate the code a lot and introduce potential issues (e.g. "what if this grows beyond the 16M doc limit?" "how do I shard this efficiently?" etc)
[19:38:29] <dstorrs> you NEED a profiler when doing this stuff. It's not optional.
[19:40:25] <pgentoo> i have clients that will be doing millions of clicks/day, so embedding those isn't possible. All my click "document" (current row in mysql) has a bunch of denormalized data already. bunch of bool values, some text strings, dates, etc. Everything i need to be able to generate my reports later.
[19:40:42] <pgentoo> in that case, i'm assuming that each row would just be its own document, with no embedding or linking...
[19:42:23] <pgentoo> if i had a single collection of "clicks" and each click had say a "username" property or something similar that would distribute my writes/reads well, i could shard on that right? It's not sharding by collection right, but by some value of the documents?
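(Right: sharding is enabled per collection, on a key taken from the documents. A hedged sketch with placeholder db/collection names follows; a compound key such as { username, ts } also keeps one very busy user's clicks from piling into a single unsplittable chunk:)

    sh.enableSharding("clicktracker")
    db.clicks.ensureIndex({ username: 1, ts: 1 })
    sh.shardCollection("clicktracker.clicks", { username: 1, ts: 1 })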
[19:42:28] <Azra-el> i have a list field of embedded documents inside my collection and I would like to know if it's possible to query by one of the fields in those embedded documents and, as a result, get that embedded document to read more info from it. for example: http://pastie.org/4133935 I would like to find the embedded document with internal_id of 5 and get the "name" of that embedded document
[19:43:43] <spillere> I want to add information to an item inside a list, like this: {'name':name, 'photo':[{id:1, name:daniel},{id:2, name:pedro}]}, how do I add information to the element with id:1?
[19:44:00] <spillere> so it would be {'name':name, 'photo':[{id:1, name:daniel, info:NEW},{id:2, name:pedro}]},
[19:47:44] <spillere> i'm trying an update like this: db.dataz.update({'username': 'dansku', "photos": [ {"filename":"5gzv38erpm1wxnfu4i0jq9ltk2b6yc.jpg"}]}, {"$push":{"name":"lalala"}})
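(A sketch of what that update could look like with the positional operator, using the structure from the first message: match the array element by its id and $set the new field on it:)

    db.dataz.update(
        { username: 'dansku', 'photo.id': 1 },
        { $set: { 'photo.$.info': 'NEW' } }
    )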
[20:27:44] <sir> i'm getting a lot of "replSet syncTail: 13629 can't have undefined in a query expression" from slaves trying to replicate. i think someone tried copying bad data
[20:27:59] <sir> how do i remove all of those from the oplog on the primary so they secondaries stop trying to replicate them and failing?
[21:18:02] <dijonyummy> how do i specify a not-equal in a find() for a property, via pymongo or mongodb in general? find({"date" : not mydate}), like that, but that's wrong... or is it?
[21:34:06] <dijonyummy> it's not possible to do that right? do i have to get the results, then further filter based on the lastModified being not equal?
[21:40:19] <dijonyummy> nevermind, this fixed it in pymongo (non-shell)... {"lastModified" : {"$ne" : os.path.getmtime(file_path)}}
[21:56:29] <pgentoo> how chatty are mongo write patterns? Curious about backing mongodb on ssd...
[22:35:42] <mmlac> Is there any documentation on the lifecycle of mongoid?
[22:44:23] <ranman> mmlac: same semantic versioning as linux kernel
[23:22:29] <iamjarvo> would mongo be good for storing a document like an excel sheet?
[23:23:28] <iamjarvo> i feel like i could grab the whole document instead of doing multiple queries on a relational database
[23:47:57] <acidjazz> so my collection is an obj w/ an array inside that has tags
[23:48:00] <acidjazz> and i want results that match those tags
[23:48:12] <acidjazz> iamjarvo: check out mongogridfs
[23:50:11] <dstorrs> iamjarvo: what are you trying to achieve? by "like a Excel doc" do you mean an actual .xls file? or something with the data from an XLS ?
[23:50:31] <iamjarvo> basically i want to port my documents online