#mongodb logs for Friday the 22nd of June, 2012

[00:20:10] <mmlac> Is there a way to manipulate children of a model without hitting the database? Or more specifically, how can I set a class ivar for all children and when they are lazily loaded call a hook (after_initialize?) on it?
[03:02:36] <jstout24> do i sacrifice speed if i have some indexed stuff not in the root top-level document?
[03:03:34] <jstout24> ie) "foo": ObjectId('….') vs "foo": { $ref: ObjectId('…') }
[03:56:52] <daveluke> can i update a document where i just specify what needs to be changed, as opposed to inputting the entire revised document?
[03:58:19] <daveluke> yay, never mind i used $set
[04:26:49] <dstorrs> I've got a collection of 30,000 docs. I'd like to remove 90% of them, randomly selected. What's the best way to do that?
[04:46:35] <carsten> hi creatures
[04:47:28] <daveluke> can the update function return the updated doc?
[04:47:51] <dstorrs> daveluke: no. you need to do the update(), then a find()
[04:48:06] <dstorrs> or use findAndModify, but that is about two orders of magnitude slower
[04:48:07] <Guest63591> findAndModify() is your friend
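A minimal shell sketch of the findAndModify pattern Guest63591 is pointing at -- one round trip that applies the change and hands back the resulting document (collection and field names here are made up for illustration):

    db.players.findAndModify({
        query:  { _id: playerId },          // which document to change
        update: { $inc: { points: 1 } },    // normal update modifiers work here
        new:    true                        // true = return the post-update document
    })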
[04:48:27] <daveluke> and is there a callback for updates? i've been using a third parameter as a function, but the docs say that's an upsert boolean
[04:48:47] <dstorrs> daveluke: what are you trying to achieve?
[04:48:53] <Guest63591> you pass in a method and expect it to be used as callback?
[04:48:54] <mids> dstorrs: if this is only a one time operation on such a small set; I'd just iterate over all of them, throw a dice and delete in 90% of the cases
[04:49:17] <daveluke> update object A, take updated object A and put it in object B
[04:49:44] <dstorrs> mids: is there a way to do that in the shell? I figured it out a week or so ago but since then for the life of me I can't reconstruct what I did.
[04:50:13] <Guest63591> do what on the shell?
[04:50:20] <dstorrs> daveluke: that's not a goal, that's a pseudocode description. What is the purpose of your actual code?
[04:50:50] <dstorrs> Guest63591: nuke a randomly selected 90% of the entries in a collection (30k items)
[04:50:56] <daveluke> i have players and lobbies.. lobbies have an array of players.. i want to put the lobby in the player object too though
[04:51:07] <daveluke> which i guess i should ask if thats the way i should do it
[04:51:27] <daveluke> the point is that then i can later just get the player and not need to query the lobby
[04:51:43] <Guest63591> http://www.mongodb.org/display/DOCS/Schema+Design
[04:53:01] <dstorrs> I see a lot of edge cases here -- can users be in multiple lobbies? can they be in zero? what happens if their connection drops mid-session?
[04:53:21] <daveluke> users can be in one lobby
[04:53:22] <dstorrs> (presumably that's why you want to persist the lobby info, but what about other state...?)
[04:53:25] <daveluke> zero, yes
[04:53:57] <daveluke> if their connection drops, i have a disconnect event that deletes the player from the lobby
[04:55:10] <daveluke> the lobby info i want in the player, because i am identifying each player by their socket's given id
[04:55:41] <daveluke> so when i get a socket, and i want to broadcast only to players in his lobby, i can just take the ids of the players in the player's lobby!
[04:56:33] <dstorrs> hm.
[04:56:34] <daveluke> i'm wondering though if i need a player collection at all… if i can just do a find where lobby's players has an id of X
[04:57:19] <dstorrs> I would be more inclined to manage this as two separate collections of referenced docs. but that may be a matter of taste
[04:57:47] <dstorrs> what's the maximum number of users per lobby?
[04:58:00] <daveluke> it will be 4.. currently it doesn't really matter
[04:58:11] <daveluke> i am making left 4 dead basically but 2d!
[04:59:29] <dstorrs> well, if it's 4, I don't think it matters which way you do it. If it "doesn't really matter", then I'd say you'd better not use embedded docs -- if your site got popular, your docs would get very large, and very hard to work with / slow over the wire / possibly blow the 16MB doc size limit
[05:00:42] <daveluke> so completely removing the players collection is not a bad idea
[05:01:02] <daveluke> hmm
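A rough sketch of the referenced, two-collection layout dstorrs leans toward (the field names are assumptions, not from the channel): each player keeps a pointer to its lobby, so lobbies stay small and "everyone in my lobby" is a single indexed query.

    db.players.insert({ _id: "socket-abc123", name: "daniel", lobby_id: lobbyId })
    db.lobbies.insert({ _id: lobbyId, name: "lobby 1" })

    // everyone in the same lobby as a given socket id
    var me = db.players.findOne({ _id: "socket-abc123" })
    db.players.find({ lobby_id: me.lobby_id })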
[05:03:07] <mids> dstorrs: db.x.find().forEach(function(o) { if (Math.random() > 0.5) { db.x.remove(o) }})
[05:03:39] <dstorrs> oh, nice. I didn't think removing from the collection you were iterating on was legit...?
[05:04:03] <mids> it seems to work :)
[05:05:45] <daveluke> if i have an array as a field.. can i query for all docs where that array has an object with field X?
[05:06:11] <daveluke> so players live in lobbies.. i want to find the lobby where player X lives
[05:07:45] <dstorrs> daveluke: yes, you can. db.coll.find({ 'players.name' : 'player_name' })
[05:08:21] <daveluke> wow cool
[05:09:09] <dstorrs> you can even use the $push, $pushAll, and $pull modifiers on update() to change that array in a performant way.
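For the embedded-array variant, a hedged sketch of that find plus the $push/$pull updates just mentioned (names are illustrative):

    // lobby document with an embedded players array
    db.lobbies.insert({ _id: lobbyId, players: [ { name: "daniel" } ] })

    // which lobby is a given player in?
    db.lobbies.find({ "players.name": "pedro" })

    // add / remove a player without rewriting the whole document
    db.lobbies.update({ _id: lobbyId }, { $push: { players: { name: "pedro" } } })
    db.lobbies.update({ _id: lobbyId }, { $pull: { players: { name: "pedro" } } })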
[05:10:45] <mids> dstorrs: btw you might wanna db.eval that
[05:11:02] <dstorrs> mids: ok. why?
[05:11:44] <mids> so those documents arent all transmitted to your client
[05:12:07] <dstorrs> ah. yes, of course. thanks.
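Roughly what mids is suggesting -- run the loop server-side with db.eval so the 30k documents never cross the wire (an untested sketch; the 0.9 reflects dstorrs' "remove 90%"):

    db.eval(function () {
        db.x.find().forEach(function (o) {
            if (Math.random() < 0.9) {      // delete ~90% of the docs at random
                db.x.remove({ _id: o._id });
            }
        });
    })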
[05:15:21] <mids> dstorrs: alternative => db.x.remove({ $where : function() { return Math.random() > 0.5 }})
[05:17:45] <dstorrs> oh, shiny.
[05:17:59] <dstorrs> working with the system and everything. Thanks, mids
[05:18:20] <mids> supreme: db.x.find({ $where : "Math.random() > 0.5" })
[05:18:29] <mids> s/find/remove/ :)
[05:18:48] <dstorrs> heh, I was wondering.
[05:19:02] <dstorrs> Why is that one better than the previous? less processing?
[05:19:55] <mids> less typing
[05:20:35] <dstorrs> ah.
[07:24:46] <dstorrs> anyone still about?
[07:35:41] <[AD]Turbo> hi there
[08:16:01] <nopz> Hi
[08:17:04] <nopz> I'm using pymongo and i've got an OperationFailure: cursor id '6312762830493842190' not valid at server. I'm already using timeout=False on this request, I don't know what to do next to solve this problem.
[08:37:46] <multiHYP> hi
[09:18:03] <jenner> guys, what might be the problem if I'm getting "[conn8] assertion 13436 not master or secondary, can't read ns:config.settings query:{ $query: {} }" on a former replica set primary which I restarted? looks like it lost the replica set config?
[09:28:54] <Lujeni> jenner, check with rs.status()
[09:29:30] <jenner> Lujeni: I can't, there's no rs now apparently :(
[09:30:42] <Lujeni> jenner, can you pastebin your replica set log ?
[09:47:19] <dstorrs> hey all. I've got a collection of hourly stats; there's about 30,000 users, each of whom has one stat every hour. I want to generate a "daily stats" collection with the latest stat from each day for each user.
[09:47:54] <dstorrs> this is for 90, 30, and 14 day periods.
[09:48:34] <dstorrs> "the latest stat for each day" might not be the same harvest for each user -- someone could have been missed. is there an easy way to do this?
[09:58:41] <NodeX> group by users to gather a list then re-iterate the list, else map/reduce
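One way the map/reduce route could look for the rollup -- emit one key per user per day and keep the latest stat in the reducer. The field names (user_id, ts, stat) and the 90-day cutoff are assumptions, not from the channel:

    var cutoff = new Date(new Date().getTime() - 90 * 24 * 3600 * 1000);
    db.hourly_stats.mapReduce(
        function () {
            var day = new Date(this.ts);
            day.setHours(0, 0, 0, 0);
            emit({ user: this.user_id, day: day }, { ts: this.ts, stat: this.stat });
        },
        function (key, values) {
            // keep whichever value carries the latest timestamp for that user/day
            var latest = values[0];
            values.forEach(function (v) { if (v.ts > latest.ts) latest = v; });
            return latest;
        },
        { query: { ts: { $gte: cutoff } }, out: { replace: "daily_stats" } }
    )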
[10:01:03] <JamesHarrison> Okay; is there any way I can get mongodb to sort by a field and be aware of a limit prior to filtering by an index to avoid a huge table scan for a small return set?
[10:03:28] <Guest63591> no
[10:03:28] <remonvv> Yes
[10:03:44] <JamesHarrison> Okay, good and clear :D
[10:04:27] <remonvv> db.col.find()._addSpecial("$maxScan", <max docs to scan regardless>)
[10:04:35] <dstorrs> NodeX: ok, thanks. I'm already halfway through that, but it seems like a fair amount of code for something that is clearly a common use case. I was hoping there was a rollup I had missed.
[10:04:45] <remonvv> it's almost certainly useless though ;)
[10:04:48] <JamesHarrison> Will that bear in mind the sort criterion regardless though remonvv?
[10:05:13] <JamesHarrison> ie "given a sorted list of most recent docs, find me 20, search at most 200"?
[10:05:21] <remonvv> Depends on a few things.
[10:05:34] <remonvv> From a MongoDB perspective it'll simply stop scanning after X
[10:05:45] <remonvv> If it's scanning based on an index it could already be sorted
[10:06:00] <remonvv> I think scanAndOrder : true/false in your explain() will notify you of the difference
[10:06:02] <dstorrs> JamesHarrison: see also http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields#RetrievingaSubsetofFields-CoveredIndexes
[10:06:35] <remonvv> if it scans and then sorts it will not meet your criteria, if it does a sorted b-tree walk it (accidentally) will.
[10:06:52] <remonvv> I wouldn't build your functionality based on $maxScan really.
[10:06:56] <remonvv> Every query can be made fast enough.
[10:07:34] <dstorrs> even db.my_ginormous_collection.count() ? :>
[10:07:38] <JamesHarrison> remonvv: currently I'm hitting scanAndOrder on just over 16,000 entries, which is a tad suboptimal, and the indexes won't reduce that since it's a sort
[10:07:53] <remonvv> dstorrs, with notable design flaws in MongoDB excluded ;)
[10:07:55] <JamesHarrison> er, well, since it's limited by number and based on a sort
[10:08:19] <Derick> JamesHarrison: you can add the sorting field to the index as last part of your compound key and it should be able to use it
[10:08:28] <JamesHarrison> Derick: Already done, no joy.
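For reference, Derick's suggestion in shell form -- filter field(s) first, the sort field last and matching the sort direction; the thing to look for in explain() is scanAndOrder: false and a small nscanned (names here are placeholders):

    db.coll.ensureIndex({ some_filter: 1, created_at: -1 })

    db.coll.find({ some_filter: "x" })
           .sort({ created_at: -1 })
           .limit(20)
           .explain()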
[10:08:29] <Lujeni> Hello - FindOne and sort are compatible ?
[10:08:48] <dstorrs> Lujeni: yes
[10:08:50] <Derick> Lujeni: have you tried it?
[10:09:00] <Lujeni> Derick, y
[10:09:33] <Derick> JamesHarrison: care to show your query and which index you made for it?
[10:09:33] <remonvv> JamesHarrison, perhaps post the index it's using now and youre query in a pastie so we can have a look.
[10:09:42] <Lujeni> db.foo.findOne().sort({'_id':1}) , Fri Jun 22 12:04:38 TypeError: db.tag6545542.findOne().sort is not a function (shell):1
[10:09:47] <remonvv> Derick is soooo unoriginal with his questions.
[10:09:55] <Derick> remonvv: I was first, copy-cat :P
[10:09:55] <JamesHarrison> Derick/remonvv: Sure, hang on a tick
[10:10:00] <Guest63591> findOne() and sort() is pointless
[10:10:02] <remonvv> Derick, details.
[10:10:05] <dstorrs> Derick: IIUC, if I have this index: { foo : 1, bar : 1 } then the query db.coll.find({ foo : 'bleh' }) can use the index but db.coll.find({ bar : 'bleh' }) cannot. is that right?
[10:10:19] <remonvv> dstorrs, yes it's right
[10:10:43] <remonvv> a compound index A, B will overlap an index A but not B. If you query on B it cannot use an index because you don't have any index that starts with B.
[10:10:59] <remonvv> if you change your index to {bar : 1, foo :1} both queries will hit the index
[10:11:10] <remonvv> but that might not be faster if "bar" has very low cardinality
[10:11:11] <dstorrs> thanks. I always thought that was the case in both RDBMS and NoRelDBs, but then I used MySQL....
[10:11:23] <Derick> remonvv: no, "foo": something won't use it then :P
[10:11:44] <remonvv> Isn't that what I said?
[10:11:55] <Derick> 11:08 <remonvv> if you change your index to {bar : 1, foo :1} both queries will hit the index
[10:11:59] <remonvv> Ohhh
[10:12:18] <remonvv> sorry, i thought you had a {foo, bar} query and a {bar} query
[10:12:18] <Lujeni> Guest63591, so why is it compatible? how can i find the first or last document with findOne and an ASCENDING / DESCENDING sort?
[10:12:24] <Guest63591> it is NOT compatible
[10:12:28] <remonvv> So yeah, two separate indexes then ;)
[10:12:32] <Guest63591> findOne() returns *one* document
[10:12:35] <Guest63591> which can not be sorted
[10:12:45] <Derick> Lujeni: just do find().sort().limit(1)
[10:12:54] <dstorrs> if I have index { foo : 1, bar : 1 } is there a diff between these queries?: db.coll.find({foo:'a', bar : 'b' }) db.coll.find({bar :'b', foo : 'a' })
[10:13:01] <Derick> dstorrs: no
[10:13:27] <dstorrs> ok, good. order of params not relevant then.. good to know. thanks
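Concretely, the prefix rule being described (illustrative collection name):

    db.coll.ensureIndex({ foo: 1, bar: 1 })

    db.coll.find({ foo: "bleh" })            // can use the index (leading field)
    db.coll.find({ foo: "a", bar: "b" })     // can use the index, in either argument order
    db.coll.find({ bar: "bleh" })            // cannot -- no index starts with bar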
[10:13:28] <remonvv> I'd like to point out at this juncture that I actually agree with Guest63591 here. Sorting and returning the top document isn't that uncommon a use case. findOne() should allow sort().
[10:13:48] <Lujeni> Derick, yeah i use this solution
[10:13:48] <remonvv> dstorrs, order of index fields is relevant (and their direction!), order in which you use them within a query isn't.
[10:14:02] <remonvv> Sorry, agree with Lujeni
[10:14:02] <Derick> remonvv: have you seen the output of "db.poi.findOne" (without parenthesis)
[10:14:06] <remonvv> I need coffee.
[10:14:06] <dstorrs> right. gotcha
[10:14:12] <Derick> it's just the shell that doesn't do it
[10:14:35] <remonvv> Oh? I thought the issue was that findOne() returns a document rather than a cursor and as such you can't invoke .sort() on it.
[10:14:48] <remonvv> (so the sort clause should be an optional parameter of findOne() )
[10:14:59] <Derick> findOne isn't a server command... it's just a query wrapper thingy
[10:15:35] <remonvv> Oh I know but that doesn't really change the argument that much. All drivers and the shell support findOne so a consistent and usable function signature would be nice.
[10:15:38] <Derick> it really doesn't do more than just find().limit(1) :-)
[10:15:54] <remonvv> right, so it should allow a find().sort(S).limit(1) as well ;)
[10:16:00] <remonvv> findOne(query, sort)
[10:16:00] <Lujeni> Derick, remonvv "All arguments to find() are also valid arguments for find_one()" in pymongo documentation
[10:16:22] <Derick> remonvv: sticking arguments to findOne() is a tad ugly... as I can dream up a lot more that we might have to add
[10:16:35] <remonvv> Lujeni, not familiar with pymongo but that might be an API design flaw if it implies you can sort a findOne atm.
[10:16:48] <Derick> Lujeni: not every driver is the same though. So I guess Python does support it, but the PHP one does not (and can't)
[10:17:17] <remonvv> Derick, I agree but the use case for sort is a LOT more common than, say, any addSpecial clause.
[10:17:30] <remonvv> But I agree that it's not a big enough issue to really put any effort in ;)
[10:18:00] <JamesHarrison> http://pastie.org/private/mjw8vwcfks5masbyzmpg
[10:18:02] <JamesHarrison> there we go.
[10:18:30] <remonvv> so it hits 0 indexes?
[10:18:35] <remonvv> Well, room for improvement then ;)
[10:18:37] <remonvv> let's have a look
[10:18:51] <Derick> "created_at" : 1
[10:18:53] <Derick> should be -1
[10:19:12] <Derick> (in the index)
[10:19:25] <Derick> also, you use $nin, which can't really use an index
[10:19:43] <Derick> if it's a fixed set, use $in with the rest of the possible values
[10:19:44] <remonvv> well it can but it doesn't help anyone ;)
[10:20:55] <remonvv> also, a compound index that starts with a boolean will only really reduce candidate sets if it'll match only a relatively small part of the data. "hidden_from_users" sounds like an exception rather than the rule.
[10:21:30] <remonvv> I'm going to be bold and say this index : {created_at : 1} is going to give you the best results
[10:21:41] <remonvv> 1 or -1, doesn't matter for single key
[10:21:43] <Derick> yes, a very low cardinality isn't good, but not too much of a prob if the compound key works (which it doesn't now)
[10:21:50] <JamesHarrison> $in would be being called with around 10,000 values
[10:22:06] <Derick> JamesHarrison: I suggest you add an extra field with that flag then
[10:22:07] <JamesHarrison> $nin is only up to about 50 or so at most
[10:22:13] <Derick> hm
[10:22:35] <remonvv> yeah I think you need to explore schema changes
[10:22:51] <remonvv> Because having an index on an array field with a ton of possible values is going to result in a big index.
[10:23:13] <Derick> instead of having tags in an array, perhaps: { 'tagname': 1, 'tagname2' : 0 } in some form?
[10:23:35] <Derick> then again, elemMatch with an index isn't great either
[10:23:37] <remonvv> Hm, I would go for a boolean that you update based on the contents of your tags array on every write
[10:23:58] <Derick> don't know the rest of the schema, so hard to say really
[10:24:09] <JamesHarrison> I was thinking about storing tag names denormalised as a string
[10:24:12] <remonvv> but then you're still stuck with a double boolean compound index which has a grand total of 4 different paths
[10:24:15] <JamesHarrison> and using a regexp
[10:24:25] <remonvv> That certainly wont help ;)
[10:24:30] <JamesHarrison> figured not :p
[10:24:38] <Derick> regexp can only use a key if it's anchored to the start of the string (with ^)
[10:24:48] <JamesHarrison> yeah, which kinda doesn't help us at all
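As a quick illustration of Derick's point, with a made-up field name:

    db.coll.find({ tag_name: /^mongo/ })     // anchored: can walk an index range
    db.coll.find({ tag_name: /mongo/ })      // unanchored: has to scan everything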
[10:24:55] <remonvv> you need to avoid the $nin and replace it with a prebaked flag
[10:25:11] <JamesHarrison> trouble is it can't be prebaked
[10:25:17] <JamesHarrison> the $nin values are user specific
[10:25:29] <remonvv> In that case you probably need a 2 step query.
[10:25:43] <remonvv> tag -> doc collection and this query
[10:26:32] <remonvv> But a quick step back. How many % of the total dataset is skipped by the hidden_from_users flag?
[10:26:33] <JamesHarrison> but how does that help in terms of reducing the collection size for $nin?
[10:26:36] <remonvv> how common is true versus false
[10:26:52] <JamesHarrison> remonvv: tiny, we have maybe 60 out of 16,000
[10:27:07] <remonvv> so adding that to the index is completely useless?
[10:27:46] <Derick> well, if that's the only query he's running the matching only has to be done on those 60
[10:27:54] <Derick> as opposed to the current 15259
[10:28:02] <Derick> right now, no index is used
[10:28:24] <Derick> JamesHarrison: try an index on just hidden_from_users, it should make *this* case better
[10:28:38] <remonvv> Isn't the situation reversed? he's querying for false
[10:28:42] <Derick> or even hidden_from_users, created: -1
[10:28:49] <JamesHarrison> er, yeah, I'm asking for false, not true
[10:28:55] <Derick> remonvv: how does it matter? the cardinality of the index is still 2
[10:29:00] <JamesHarrison> so that really doesn't help in terms of solution space
[10:29:08] <Derick> ?
[10:29:14] <JamesHarrison> $nin still gets to check 16,000 - 60
[10:29:37] <JamesHarrison> the hidden_from_users field is fairly irrelevant to this unless I'm much mistaken (which I may be)
[10:29:45] <Derick> uh, the index should get that down to 60... or am I mistaken here?
[10:30:19] <JamesHarrison> other way around I'm afraid
[10:30:34] <JamesHarrison> we're asking for "what's not hidden from the users, and..." and only a small fraction is hidden
[10:30:54] <remonvv> The candidate set after that first index field is 15700+ i think
[10:30:58] <remonvv> right
[10:31:04] <remonvv> in that case it shouldn't be in the index
[10:31:29] <remonvv> index fields must eliminate a significant portion of the candidate set in the most common usecases to be effective
[10:31:59] <Derick> yes, of course
[10:32:21] <JamesHarrison> Yeah, okay, so we drop that from the index
[10:32:22] <remonvv> Have you tried my index on sort field only approach?
[10:32:25] <remonvv> It'll serve as a reference
[10:32:51] <remonvv> Because that's "worst case" in that it'll stop scanning only if it found X elements on the sorted set
[10:32:59] <JamesHarrison> remonvv: it's building now
[10:33:06] <remonvv> building? for 16k?
[10:33:13] <remonvv> that usually takes a second ;)
[10:33:20] <remonvv> sorry, afk for a bit, keep me posted
[10:33:30] <JamesHarrison> probably done by now, yeah, just a fair bit going on in currentOP so just giving it a second. ;)
[10:34:29] <JamesHarrison> so that's now using the created_at index, but still scans the entire dataset
[10:34:47] <Derick> yeah, I think it will be tricky to use an index for this query :S
[10:35:09] <JamesHarrison> http://pastie.org/private/s2rsk0sffpknnmsh1s1hdg
[10:35:17] <JamesHarrison> I don't get it, though
[10:35:38] <JamesHarrison> mongodb at no point, given it can sort by the index, needs to go scanning the whole table
[10:35:51] <Derick> yes it does
[10:35:58] <Derick> it would only stop once it finds 21 items
[10:36:00] <Derick> but it finds 0
[10:36:47] <JamesHarrison> ... which is bizarre now I've just noticed this
[10:37:20] <JamesHarrison> I am recreating by-hand my ORM's query in javascript here because my ORM doesn't give me a js query output, hang on
[10:37:36] <JamesHarrison> let me do this in ORM-land and I can ask for an explain from there
[10:40:15] <Derick> got to go, my flight is ready for boarding
[10:40:19] <Derick> JamesHarrison: good luck!
[10:41:23] <JamesHarrison> Huh, okay, and with the tag_ids_1_created_at_-1 and created_at_-1 indexes, it's gone for created_at_-1 and is doing well
[11:47:09] <remonvv> Wait what?
[11:47:48] <remonvv> The sort only index was meant to scan down a sorted list but the scanning (in that case) isn't helped by any index. In other words it will scan linearly until it hits N items in your limit(N)
[11:47:59] <remonvv> There should be a faster option
[11:49:31] <scruz> what is the recommended way of relating two documents?
[11:51:01] <remonvv> Depends on your use case.
[11:51:22] <remonvv> Embed if you can, but you usually shouldn't in which case you have to store the _id value of the referred document in the referring document.
[11:52:55] <scruz> so, basically no different from relations in RDBMSes
[11:53:15] <scruz> remonvv: thanks
[11:53:53] <remonvv> It is a little.
[11:54:06] <remonvv> MongoDB doesn't manage the relationships (relationships aren't a concept within MongoDB).
[11:54:16] <remonvv> And embedding comes with benefits and drawbacks that you need to take into account
[11:54:25] <scruz> like?
[11:54:33] <remonvv> You usually want to have a more denormalized schema compared to RDBMSs
[11:54:47] <remonvv> That's widely documented. Would take a bit to type out ;)
[11:55:09] <remonvv> Google "MongoDB embedded versus references" or something and you should hit a few resources.
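The referencing approach remonvv describes, in its simplest form (collection names are illustrative): the referring document stores the _id of the referred one, and the application follows the reference with a second query.

    var author = { _id: new ObjectId(), name: "scruz" };
    db.authors.insert(author);
    db.posts.insert({ title: "hello", author_id: author._id });

    // two-step read -- no joins, the client follows the reference
    var post = db.posts.findOne({ title: "hello" });
    var postAuthor = db.authors.findOne({ _id: post.author_id });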
[12:24:41] <smoochict> hi
[12:25:43] <SLNP> hey
[12:26:32] <smoochict> Hi SLNP
[12:33:28] <smoochict> hi
[12:45:46] <remonvv> hi!
[12:45:52] <smoochict> hi
[13:02:05] <remonvv> Aren't we a friendly bunch.
[13:02:14] <smoochict> ineed
[13:02:16] <smoochict> indeed
[13:08:07] <NodeX> wait, what
[13:08:11] <smoochict> i dunno
[13:08:18] <NodeX> who goes there
[13:19:31] <locojay> hi anyone using mongo-hadoop on osx? streaming works for me with dumbo... but with mongo-hadoop it chunks the collections but then fails (no error available) : https://gist.github.com/2969688
[14:20:02] <dnnsmanace> would anyone be so kind to advise on a data structure?
[14:21:29] <Guest63591> would anyone ask a question before expecting an answer to an unanswered question?
[14:21:41] <dnnsmanace> :)
[14:21:50] <dnnsmanace> what my app is doing is pairing two users in the database
[14:22:09] <dnnsmanace> and that pair is unique
[14:22:19] <dnnsmanace> i have a unique email field
[14:22:28] <dnnsmanace> and a field "pair_email"
[14:23:04] <dnnsmanace> i am having trouble making unique pairs because sometimes the script ends up pairing two people to the same person
[14:23:33] <smoochict> are you using a view?
[14:23:40] <dnnsmanace> even though it searches for an unpaired person, by the time it updates the entry that person is already paired
[14:24:17] <dnnsmanace> what do you mean by view?
[14:25:12] <Chel> question: i have a problem, my mongo client cant connect for a long time. this is a log: https://gist.github.com/7c4b7bc841ca837fb4c4
[14:26:08] <dnnsmanace> for the pair i tried setting { type: String, unique: true, sparse: true } but it keeps seeing null as a dup
[14:27:14] <dnnsmanace> there must be a general structure for databases when you pair entries on the go to ensure you don't pair to any entry except those two
[14:27:32] <smoochict> oh, sorry.
[14:27:32] <FerchoArg> Hi, is there a way to, server side, weight a field in a search?
[14:27:39] <smoochict> when i said view i thought i was in the #couchdb channel
[14:27:40] <smoochict> lol
[14:28:07] <dnnsmanace> oh, havent tried couchdb :)
[14:28:22] <smoochict> lolol'
[14:28:45] <dnnsmanace> this seemingly trivial problem has got me stumped
[14:29:54] <Chel> any suggetsion for my problem ?
[14:30:07] <Chel> how can i check why it is not connected to shell ?
[14:34:12] <dnnsmanace> i guess what i am asking is how to go about making a unique link
[14:34:20] <dnnsmanace> between two entries
[14:34:59] <dnnsmanace> or the logic behind it so the app doesnt link two users to a third by accident
[14:35:53] <Chel> anyone can help me !! ??
[14:36:57] <Guest63591> why is what not connected?
[14:37:30] <Chel> mongo client
[14:37:40] <Chel> to a localhost
[14:37:50] <Chel> and it works if i use mongo --nodb key
[14:38:36] <Chel> and in /var/log/mongodb/mongodb.log nothing strange
[14:38:36] <Guest63591> i wonder why 75% of the people here are not capable asking a reasonable question or ask in complete sentences or providing reasonable information?
[14:38:55] <dnnsmanace> guest: you must be bored
[14:38:58] <Chel> Guest63591: what additional infromation do you need ?
[14:39:00] <dnnsmanace> help or dont help
[14:39:51] <Chel> OS linux fedora 16, mongodb db version v2.0.2, pdfile version 4.5
[14:39:53] <Chel> Fri Jun 22 18:37:04 git version: nogitversion
[14:40:16] <Chel> MongoDB shell version: 2.0.2
[14:40:49] <Guest63591> this is a biotope for dumb asses?
[14:41:15] <Chel> what a hell, is this a channel about mongodb ?
[14:41:45] <Guest63591> i think this is a place where idiots help other idiots?
[14:42:19] <Guest63591> #mysql and #php is already one of the worst channel on IRC
[14:42:20] <Chel> offtopic
[14:42:30] <Guest63591> but #mongodb is completely below
[14:42:30] <FerchoArg> haha you're right. Is there any way to do this within a query? : Suppose I have a collection of places, which has City Name, State, Street name. I want to make a search of a regex. The regex should be compared to the three fields, City Name, State and Street, but documents in which City matches the regex should be listed first, or better weighted in some way
[14:43:18] <Chel> Guest63591: what is not correct in my question ? i have a problem, i've described it with logs and software versions. What else ?
[14:44:08] <smoochict> bye
[15:03:44] <Guest63591> this community is horrible - numpties united
[15:05:42] <ron> we're horrible people! just horrible, horrible people!
[15:21:54] <zak_> I am unable to connect to the local database using the mongo command
[15:22:57] <ron> is mongodb running? :)
[15:24:46] <zak_> yeah
[15:25:03] <ron> default port?
[15:25:16] <ron> what error do you get?
[15:25:18] <zak_> 113
[15:25:30] <ron> huh?
[15:26:03] <zak_> it couldn't connect to the default server
[15:26:37] <ron> I somehow doubt that's the wording.
[15:27:58] <zak_> Error: couldn't connect to server 127.0.0.1 (anon):1137
[15:28:00] <zak_> exception: connect failed
[15:28:08] <multiHYP> hi back again
[15:28:54] <ron> zak_: okay, and how did you check mongo is running?
[15:30:04] <zak_> I've started the service
[15:30:23] <ron> do you see the process in the process list?
[15:30:57] <zak_> yeah.
[15:31:13] <ron> did you change any of the default settings?
[15:31:27] <NodeX> looks like the port was changed ?
[15:31:39] <ron> tried that already, NodeX.
[15:31:57] <zak_> certainly not
[15:32:10] <NodeX> is it listening ?
[15:32:18] <NodeX> netstat -pan | grep mong
[15:32:21] <NodeX> netstat -pan | grep mongo
[15:33:25] <zak_> NodeX:nothing
[15:33:51] <NodeX> check your logs then because it means Mongo isn't started
[15:34:27] <ron> but he swears he sees the process running. you wouldn't be calling him a liar now... would you?
[15:34:45] <NodeX> if it's not in netstat then it's not running
[15:34:52] <NodeX> even a socket only server would appear
[15:35:08] <ron> but he swears!
[15:35:11] <NodeX> haha
[15:35:22] <NodeX> Oh you were being sarcastic!
[15:35:32] <NodeX> I don't know half the time when you answer things ron !
[15:35:34] <zak_> sudo service mongodb start
[15:35:35] <NodeX> :P
[15:35:55] <zak_> won't it do?
[15:35:59] <ron> never! PHP is the greatest thing since bread came sliced!
[15:36:16] <NodeX> Oh it's troll-o'clock I see :P
[15:36:19] <ron> zak_: depends on how you installed mongodb and your os.
[15:36:56] <ron> NodeX: not really. I'm just a bit fed up of people who say 'yes' when they mean to say 'I don't need to check it since I *know* the answer'.
[15:37:22] <NodeX> I was referring to the PHP comment!
[15:37:34] <zak_> when i issue this command,the prompt shows mongodb running
[15:37:39] <ron> NodeX: well, that was just in reference to me being sarcastic ;)
[15:37:49] <NodeX> haha
[15:38:18] <ron> speaking of being an ass, I don't recall seeing MACYet here for a while...
[15:38:36] <NodeX> he got banned
[15:38:39] <ron> nowai!
[15:38:41] <NodeX> yer
[15:38:44] <ron> nowai!!!
[15:38:48] <NodeX> notice the ops too ^^
[15:38:57] <ron> yeah, noticed that already.
[15:39:20] <ron> but he was filled with shitload of sunshine, flowers, rainbows and flying unicorns!
[15:39:20] <NodeX> yeh I think 10gen got fed up with him being obnoxious to people asking for help
[15:39:56] <NodeX> it painted a general bad cloud on the chan and is not good for them as a company (my opinion) and obviously theirs, so he got banned
[15:39:59] <ron> zak_: please check the logs. the prompt is telling you that it has started, but it could have crashed (so to speak).
[15:40:14] <NodeX> tail /var/logs/mongodb.log
[15:40:45] <NodeX> it's been a peaceful month or so :D
[15:41:19] <ron> now if only they do some work on morphia or declare it dead.
[15:41:25] <NodeX> carsten (who is on perma ignore) has done a good job of picking up the being a bad kid slack though
[15:42:01] <ron> oh my, I've missed a lot.. and I idle here all the time. oh well.
[15:42:16] <NodeX> it's quite lively here recently
[15:42:23] <NodeX> been alot of uptake of mongo
[15:42:35] <ron> well, I did notice the ops being more involved, that's true.
[15:42:35] <NodeX> 10gen marketing must be doing a good job
[15:43:52] <ron> well, since zak_ is gone, I guess it'd time to go back to idling.
[15:44:00] <NodeX> lol7
[15:44:41] <NodeX> idling is the internet's favourite past time, second only to the more pre-troll lurking
[15:45:15] <NodeX> when I'm bored I like to burst into a chan and start a PHP vs the world debate or mac vs PC
[15:45:18] <NodeX> then leave
[15:45:27] <NodeX> (before i get booted)
[15:50:06] <NodeX> yay
[15:50:38] <NodeX> good thing it wasn't an Apple MAC.
[15:50:59] <NodeX> else it would've tried to kill me and realised it couldn't interoperate with anything and given up
[15:54:07] <spillere> i have a db like this {name: name, pics: [ {name: name}, {name: name} ]}
[15:54:22] <spillere> how do i add info to it, so it will be like
[15:54:43] <spillere> {name: name, pics: [ {name: name, data: data}, {name: name, data: data} ]}
[16:02:16] <NodeX> add what ?
[16:18:08] <Pilate> Is it possible to define a function that is callable from within map/reduce without having to define it within the map/reduce functions?
[16:25:48] <sir> has anyone run into an issue with gridfs (using python driver) where you attempt to get a file using the objectid, and it says there is no file in the collection with that objectid?
[16:25:56] <sir> even when you can query and see the file is in fact there
[16:29:46] <kali> Pilate: have you tried putting your function in the "scope" hash ?
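If I recall kali's suggestion correctly, it looks something like this -- the helper rides along in the scope option of mapReduce and is then callable from the map and reduce functions (untested sketch, names made up):

    db.events.mapReduce(
        function () { emit(normalize(this.category), 1); },        // normalize() comes from scope
        function (key, values) { return Array.sum(values); },
        {
            out:   { inline: 1 },
            scope: { normalize: function (s) { return s.toLowerCase(); } }
        }
    )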
[16:38:27] <pgentoo> hey guys, i'm looking at using mongo for keeping a large mysql table archived off, but still searchable. I see that craigslist does a similar thing, and am just curious how this works. I have a lot of records per user in my system, so i can't just do a document per user, so curious what best practice is around this. For reference, think of something like bit.ly, where there are a lot of
[16:38:28] <pgentoo> users, but many many clicks that need to be kept track of per user. (i'm not working for bit.ly, but similar type service)
[16:42:35] <dnnsmanace> hey, i am having some trouble making pairing documents uniquely, and because of my query i get two different documents paired to one, whereas i am looking to have only unique pairs
[16:42:50] <dnnsmanace> any schema modeling advice for this?
[16:43:44] <addisonj> i am having really interesting performance doing large numbers of upserts. I have ~100,000 records. I am processing 1,000 records at a time, the first batch took only .8 seconds, the 54th batch is up to ~39 seconds…is this just due to index building?
[16:44:00] <dnnsmanace> basically i want a user paired to another user, but when my query happens sometimes the script finds an unpaired user, but by the time it updates, another instance of the script has already paired that user, so i am getting double pairs to one person
[16:44:53] <dnnsmanace> does that make any sense
[16:46:30] <kali> dnnsmanace: mmmm... not really. show us some examples, and what you want to do
[16:46:33] <kali> dnnsmanace: :)
[16:47:21] <dnnsmanace> basically, the point of the app is to pair random users
[16:47:30] <kali> addisonj: it might be indexing. you can also check you mongod log for preallocation messages
[16:48:21] <dnnsmanace> kali: the unique id is "email", and there is a field for "paired_email"
[16:48:40] <kali> addisonj: are your upsert making the updated docs grow bigger ? that could be a factor too
[16:49:00] <dnnsmanace> the app attempts to pair a user whose "paired_email" is null to another whose "paired_email" is null
[16:49:28] <kali> dnnsmanace: so you pull two docs with null paired_users ?
[16:49:42] <kali> dnnsmanace: and update and save them both ?
[16:49:58] <dnnsmanace> kali: well, the pair has an input of user to pair
[16:50:16] <dnnsmanace> and it pairs that specific user with a pulled null paired_user
[16:50:31] <dnnsmanace> and updates the pulled user that it is paired with the one in the function
[16:50:32] <dnnsmanace> make sense?
[16:50:43] <kali> mmm yeah, i think so
[16:50:47] <kali> so what's wrong ?
[16:51:08] <dnnsmanace> but what sometimes happens is, because first i do findOne, and then i update
[16:51:41] <dnnsmanace> findOne finds a null paired_user, but by the time it updates another instance of a pair attempt already paired that user so he is actually no longer null by the time it updates
[16:51:59] <kali> dnnsmanace: ok. have a look at findAndModify
[16:52:00] <dnnsmanace> so i end up having two different users thinking they are in a pair with a third
[16:52:36] <dnnsmanace> i have been looking at it just now
[16:52:39] <dnnsmanace> and i have a question
[16:52:53] <dnnsmanace> is there such a thing as findAndUpdate or something?
[16:53:35] <dnnsmanace> because findAndModify, as far as i understand, just overwrites the whole document
[16:53:40] <kali> dnnsmanace: findAndModify will return you the old or the new document (as you prefer) so you can check the update worked, and do something else if need be
[16:54:00] <kali> dnnsmanace: nope, it is the same as update(), you can use modifiers
[16:54:38] <dnnsmanace> so if i leave the email field out it will not overwrite it?
[16:55:32] <kali> dnnsmanace: just use { $set: { paired_email : ... } } as the second argument
[16:55:41] <kali> dnnsmanace: it will not alter other fields
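Putting kali's advice together, the pairing could then be done atomically along these lines (the collection name and the userEmail variable are assumptions):

    // atomically claim one still-unpaired user and mark them as paired with userEmail
    var partner = db.users.findAndModify({
        query:  { paired_email: null, email: { $ne: userEmail } },
        update: { $set: { paired_email: userEmail } },
        new:    true
    });
    if (partner) {
        // record the other half of the pair; no other instance can claim `partner` now
        db.users.update({ email: userEmail }, { $set: { paired_email: partner.email } });
    }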
[16:55:41] <addisonj> kali: to go in more depth, we are trying to import a few million documents daily from a partner and joining that to our own data. so our own data persists but each day the updated data from our partner is joined with an upsert
[16:56:36] <kali> addisonj: so the documents basically... constantly grow ?
[16:56:37] <dnnsmanace> kali: i really appreciate your help, i will try this
[16:59:03] <addisonj> kali: uhh, well mostly we are just replacing the meta-data with updated meta-data, typically, this data stays the same from day to day
[16:59:26] <dijonyummy> in pymongo how do i check if the result of a find() is null or empty? ie not found. == null doesnt work
[17:00:50] <sir> foo = bar.find()
[17:00:53] <kali> addisonj: ok, this is better... another thing to check is the %lock (run mongostat) and the page faults. if your collection and index outgrow your RAM... well, you need more RAM :)
[17:00:54] <sir> if type(foo) is dict
[17:01:12] <sir> well, find_one() will work as adict
[17:01:28] <sir> you can do a .count() != 0
[17:01:33] <addisonj> kali: lock percentage during the import is pegged at 100 basically all the time
[17:02:15] <kali> addisonj: yep, that is not a big surprise... what about the faults ?
[17:02:44] <addisonj> is faults in mongostat?
[17:03:49] <kali> addisonj: yes
[17:04:46] <charliekilo> System Information: Model: MacBook Air (13-inch Mid 2011) • CPU: Intel Core i7-2677M (4 Cores) @ 1.80 GHz • L2: 262.14 KB • L3: 4.19 MB • Memory: 4.00 GB • Uptime: 2 Days • Disk Space: Total: 249.82 GB; Free: 94.77 GB • Graphics: Intel HD Graphics 3000 • Screen Resolution: 2560 x 1440 • Load: 46% • OS: Mac OS X 10.7.4 (Build 11E53) (64-bit kernel)
[17:05:15] <addisonj> kali: running on OSX, so apparently i get no faults? running 2.0.6
[17:05:36] <kali> addisonj: ha. faults are only measurable in linux
[17:06:03] <addisonj> one of the pains of deving on osx
[17:06:06] <addisonj> :(
[17:06:17] <kali> addisonj: i use vmware :)
[17:06:50] <addisonj> i will just try it out on a server
[17:07:16] <kali> addisonj: for this kind of load, you will want the whole table and its index in RAM
[17:07:39] <kali> addisonj: check out the stats() from the collection and compare that to the available RAM in the sytem
[17:08:03] <kali> addisonj: on your mac, mongodb has to struggle for RAM with a kiloton of other stuff
[17:08:14] <dnnsmanace> kali: one more, how do i use $ne with more than one value? {email {$ne : [value1, value2]}}?
[17:08:54] <dnnsmanace> { email : { $ne : [value1, value2]}}
[17:08:55] <kali> dnnsmanace: you mean $nin ? :)
[17:09:47] <dnnsmanace> do i? i want it to NOT match those emails
[17:10:01] <dnnsmanace> $ne is not equals?
[17:10:02] <kali> dnnsmanace: yes $nin = not in
[17:10:32] <dnnsmanace> i see
[17:10:33] <dnnsmanace> thank you
[17:10:34] <dnnsmanace> :)
[17:11:40] <addisonj> hrm I am thinking there just isn't a good way to do this with upserts… just going to be slower than bulk inserts or mongoimport, I think we may try a map-reduce or something to join the two data-sets together
[17:11:53] <addisonj> after using mongoimport
[17:13:03] <kali> addisonj: if you can fit it in RAM, upsert will probably be more efficient than a m/r
[17:13:20] <kali> addisonj: and easier too
[17:14:33] <addisonj> so you are saying the reason I am seeing degraded performance is just page faults huh… hrm… I will try on a big server then
[17:14:48] <kali> check out the stats() for your collection
[17:16:15] <addisonj> avgObjSize is 530, total size is 10116620, i am assuming that is bytes?
[17:17:43] <kali> sounds safe... 10M this is nothing
[17:17:50] <kali> unless you're runnin on an apple 2
[17:23:37] <mmlac> When I load an object that has_many OtherObjects, what callbacks are called when on the OtherObject? Is there a way to inject a variable into the "stub" and evaluate the "stub" before the model gets generated for real?
[17:23:42] <mmlac> (hits the database*)
[17:26:41] <addisonj> yeah, even with that small of a dataset so far, the time to insert 1,000 documents after inserting 14,000 balloons to 15 seconds
[17:29:05] <jenner> guys, how can I make sure a certain node is elected as primary when I call rs.stepDown()? will a higher priority of the desired node help?
[17:31:01] <kali> jenner: yes. it will also steal primary status when you update your cofniguration if it is not primary
[17:31:47] <jenner> kali: ah, so I basically don't even need to step down?
[17:31:48] <pgentoo> anyone have any comments on my earlier question (about structure of storing say 1 billion click records tying back to say 10K total system users)? I realize its too big to store all a user's clicks as a single document for that user, so just curious on how to attack the problem (new to mongodb...)
[17:32:56] <kali> jenner: the current primary will actually step down when it sees somebody else has priority
[17:33:38] <jenner> kali: thanks, that's really good to know
[17:53:51] <sir> can anyone think of why a file would exist in gridfs, i can find it using mongo queries, but gridfs can't see it?
[19:00:39] <pgentoo> I have a single mysql table that i'd like to try moving into mongodb for testing. Currently the mysql data is on the filesystem in the form of a mysqldump file (insert statements for each record). is there some "easy" way of inserting this data into mongo?
[19:03:04] <pgentoo> I saw some tools (mongovue?) but requires the data to still be in mysql... would take ages to reimport this stuff..> :(
[19:05:12] <Azra-el> hello everybody
[19:09:21] <Azra-el> i have a list field of embedded documents inside my collection and I would like to know if it's possible to query by one of the fields in those embedded documents and, as a result, get that embedded document back to read more info from it. for example : http://pastie.org/4133935 I would like to find the embedded document with internal_id of 5 and get the "name" of that embedded document
[19:16:29] <mids> pgentoo: possibly you can change the file into a tsv format with some sed magic and then import it using mongoimport ?
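For reference, once the dump is massaged into CSV/TSV, the import step mids has in mind is roughly (database, collection and file names are placeholders):

    mongoimport --db mydb --collection clicks --type csv --headerline --file clicks.csv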
[19:26:23] <dstorrs> hey all. I have a collection with a boat-ton of statistics (video metadata from youtube). I've got an index on { harvested_epoch : -1, publisher : 1 }. Will that index be used in the sort clause of a map_reduce that says:
[19:26:49] <dstorrs> "all videos by 'bob' more recent than X, sort ascending" ?
[19:27:35] <dstorrs> IIUC it will be used by the query, it's just the sort that I'm not sure about -- does sort ever use indices, or is it always in RAM?
[19:34:02] <pgentoo> mids, thanks for info on that option. I'll look into it
[19:34:41] <mids> probably csv is even closer; but you get the point
[19:35:51] <pgentoo> are there any documents out there on schema design for mongo? I have hundreds of millions of mysql records (single table) that store attributes about a click through my system. I'm curious how to best design the storage of this in mongo. Multiple collections, one per user? Just one huge collection? etc... Could easily be 1 billion click documents in a couple years...
[19:36:13] <pgentoo> would like to be able to shard it out across servers to be able to scale that way in the future, etc...
[19:36:24] <dstorrs> pgentoo: http://www.mongodb.org/display/DOCS/Schema+Design
[19:36:35] <dstorrs> read up on embedding and linking too
[19:36:46] <mids> also http://www.10gen.com/presentations
[19:37:04] <pgentoo> ok, do you have any hints on suggestions for the design that i can keep in mind?
[19:37:57] <dstorrs> in general, be careful about embedded docs. They sound great in theory and sometimes they are the best solution, but they complicate the code a lot and introduce potential issues (e.g. "what if this grows beyond the 16M doc limit?" "how do I shard this efficiently?" etc)
[19:38:29] <dstorrs> you NEED a profiler when doing this stuff. It's not optional.
[19:40:25] <pgentoo> i have clients that will be doing millions of clicks/day, so embedding those isn't possible. All my click "document" (current row in mysql) has a bunch of denormalized data already. bunch of bool values, some text strings, dates, etc. Everything i need to be able to generate my reports later.
[19:40:42] <pgentoo> in that case, i'm assuming that each row would just be its own document, with no embedding or linking...
[19:41:03] <pgentoo> on the right path here?
[19:42:23] <pgentoo> if i had a single collection of "clicks" and each click had say a "username" property or something similar that would distribute my writes/reads well, i could shard on that right? It's not sharding by collection, right, but by some value of the documents?
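Roughly what that looks like, assuming a database and collection name -- sharding is enabled per collection and keyed on one or more document fields, not on whole collections:

    sh.enableSharding("clicktracker")
    sh.shardCollection("clicktracker.clicks", { username: 1 })
    // a compound key such as { username: 1, ts: 1 } often spreads writes better
    // when a single user can account for millions of documents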
[19:42:28] <Azra-el> i have a list field of embedded documents inside my collection and I would like to know if it's possible to query by one of the fields in those embedded documents and, as a result, get that embedded document back to read more info from it. for example : http://pastie.org/4133935 I would like to find the embedded document with internal_id of 5 and get the "name" of that embedded document
[19:43:43] <spillere> I wanna add a information which is in a list inside like this {'name':name, 'photo':[{id:1, name:daniel},{id:2, name:pedro}]}, how do I add information after id:1?
[19:44:00] <spillere> so it would be {'name':name, 'photo':[{id:1, name:daniel, info:NEW},{id:2, name:pedro}]},
[19:47:44] <spillere> i'm trying an insert like this db.dataz.update({'username': 'dansku', "photos": [ {"filename":"5gzv38erpm1wxnfu4i0jq9ltk2b6yc.jpg"}]}, {"$push":{"name":"lalala"}})
[19:57:03] <spillere> ?
[20:03:01] <mids> spillere_out: db.dataz.update({},{$set:{'photo.0.info':'NEW'}})
[20:03:20] <mids> or not
[20:03:47] <spillere> well, i have a user, this user have many pictures
[20:04:02] <spillere> pictures [{name:lala}, {name:lala2}]
[20:04:17] <spillere> i need to add another item on the first picture, for example
[20:07:47] <spillere> you gotthe idea?
[20:27:44] <sir> i'm getting a lot of "replSet syncTail: 13629 can't have undefined in a query expression" from slaves trying to replicate. i think someone tried copying bad data
[20:27:59] <sir> how do i remove all of those from the oplog on the primary so they secondaries stop trying to replicate them and failing?
[20:44:39] <spillere> when I try on the console
[20:44:39] <spillere> db.dataz.update({'photos':{$elemMatch:{'filename':'5gzv38erpm1wxnfu4i0jq9ltk2b6yc.jpg'}}, {$set:{'photos.$.filename_tb240':'lala'}}})
[20:44:48] <spillere> i get an SyntaxError: invalid property id (shell):1
[20:44:51] <spillere> why?
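The braces in that attempt put the $set clause inside the query document; with the positional operator it would look more like this (untested, reusing spillere's own field names):

    db.dataz.update(
        { "photos.filename": "5gzv38erpm1wxnfu4i0jq9ltk2b6yc.jpg" },
        { $set: { "photos.$.filename_tb240": "lala" } }
    )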
[20:56:21] <php14> is it bad practice to use the mongodb assigned _id as say a username in a credential set for something?
[21:06:05] <spillere> db.dataz.update({'photos.filename':'4zbg5yqa80jol7em6nfwt2dvpuc9si.jpg'},{$push:{"filename_tb240" : "lala"}})
[21:06:28] <sir> $push
[21:06:33] <sir> not $pus
[21:18:02] <dijonyummy> how do i specify a not-equal in a find() for a property via pymongo, or mongodb in general? find({"date" : not mydate}), something like that, but that's wrong... or is it?
[21:25:25] <dijonyummy> this doesnt work: "lastModified" : { $ne : os.path.getmtime(file_path) } } )
[21:34:06] <dijonyummy> its not possible to do that right? do i have to get the results, then further filter based on the lastModified being not equal?
[21:40:19] <dijonyummy> never mind, this fixed it in pymongo non-shell... "lastModified" : {"$ne" : os.path.getmtime(file_path) }} )
[21:56:29] <pgentoo> how chatty are mongo write patterns? Curious about backing mongodb on ssd...
[22:35:42] <mmlac> Is there any documentation on the lifecycle of mongoid?
[22:44:23] <ranman> mmlac: same semantic versioning as linux kernel
[23:22:29] <iamjarvo> would mongo be good for storing a document like an excel sheet
[23:23:28] <iamjarvo> i feel like i could grab the whole document instead of doing multiple queries on a relational database
[23:47:57] <acidjazz> so my collection is an obj w/ an array inside that has tags
[23:48:00] <acidjazz> and i want results that match those tags
[23:48:12] <acidjazz> iamjarvo: check out mongogridfs
[23:50:11] <dstorrs> iamjarvo: what are you trying to achieve? by "like a Excel doc" do you mean an actual .xls file? or something with the data from an XLS ?
[23:50:31] <iamjarvo> basically i want to port my documents online
[23:50:40] <iamjarvo> online version of excel
[23:50:55] <dstorrs> acidjazz: there's something in the indexing docs on exactly your problem, but I can't remember the keywords, sorry.
[23:51:24] <iamjarvo> im going to have all these data points and im going to show them in the format of an excel sheet
[23:51:57] <iamjarvo> basically my monthly budget
[23:52:00] <acidjazz> $in
[23:52:02] <dstorrs> in general, IIRC, you want an index on your tags array and then db.coll.find on it
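A rough sketch of that -- a multikey index on the tags array, then $in to match any of several tags (names are illustrative):

    db.things.ensureIndex({ tags: 1 })                       // multikey: one index entry per array element
    db.things.find({ tags: { $in: ["mongodb", "nosql"] } })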
[23:52:35] <dstorrs> iamjarvo: why not just upload to Google Docs?
[23:53:13] <iamjarvo> dstorrs: just want to reinvent the wheel
[23:53:47] <dstorrs> ok, fair enough
[23:54:18] <dstorrs> then yes, just throw your data into JSON docs and call it good.
[23:54:58] <dstorrs> each row in the XLS becomes a document, each column is a key in the doc, each cell is a value
[23:56:02] <dstorrs> (obviously by "each row / column / cell" I actually mean "each of those that you're actually *using*" :>)
[23:58:00] <iamjarvo> dstorrs: thanks. see any advantages of nosql vs a relational?
[23:58:52] <dstorrs> learning?
[23:59:28] <iamjarvo> true that. that was the original intention
[23:59:29] <iamjarvo> lol
[23:59:44] <iamjarvo> thank you