#mongodb logs for Thursday the 21st of June, 2012

[00:08:07] <matthewvermaak> any ideas why this happens? http://pastie.org/4123535
[00:11:37] <matthewvermaak> wow, if you switch the order of those conditions it works ...
[00:12:38] <matthewvermaak> at least 1.9 has order in its hashes i guess
[00:24:24] <matthewvermaak> ah, nvm, that's effectively being treated as an or condition across the embedded documents, which matches, and then the positional is set to the last matching criterion
[00:24:29] <matthewvermaak> so, that won't work for me
[00:26:00] <matthewvermaak> is there an "update if current" strategy for embedded documents?
[01:37:54] <kenny> any tips on why db.auth("","") will fail right away, but db.auth(user,password) and show collections hangs forever?
[01:44:56] <nemothekid> Evening.
[01:51:06] <nemothekid> Evening?
[01:59:02] <lorfds> has anyone here used thriftdb?
[01:59:09] <lorfds> curious what ppls thoughts are on it
[01:59:25] <lorfds> pluses and minuses vs mongo
[01:59:26] <lorfds> etc
[04:13:12] <dstorrs> I have a document that looks like this: { _id : 'foo', pages : [ {}, {}, ...] } What is the query I need to find the length of the 'pages' array?
[04:15:05] <kobold> dstorrs: you can't; fetch the document and count it, or store the length in a field and use that instead
[04:15:17] <dstorrs> kobold: ok, thanks
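A minimal sketch of kobold's suggestion in the mongo shell; the collection name, page shape, and pageCount field are made up for illustration. The counter is bumped in the same update that pushes a page, so it cannot drift:

    // push a page and maintain the stored length atomically in one update
    db.docs.update(
        { _id: 'foo' },
        { $push: { pages: { title: 'new page' } }, $inc: { pageCount: 1 } }
    )
    // read the length without pulling the whole pages array over the wire
    db.docs.findOne({ _id: 'foo' }, { pageCount: 1 })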
[07:36:08] <[AD]Turbo> hi there
[08:05:44] <NewBie> halo
[08:06:26] <Guest30365> is mongo db suitable for apps like pandora radio?
[08:06:51] <NodeX> what is pandora radio?
[08:07:03] <Guest30365> pandora.com
[08:07:16] <Guest30365> in other words music streaming application
[08:07:20] <carsten_> mongodb is suitable for anything and everything when you know what you are doing
[08:07:27] <carsten_> mongodb is not a streaming service
[08:07:45] <NodeX> I don't see what it has to do with a datastore?
[08:07:50] <Guest30365> yes I know
[08:08:28] <NodeX> do you mean to build an app like it?
[08:08:36] <Guest30365> yes
[08:08:54] <NodeX> It depends on its functions I guess but I don't see why not
[08:09:30] <Guest30365> I've decided to use the following stack: flask, mongodb, backbonejs.
[08:10:02] <Guest30365> and thinking about turning my 2 PCs into web app servers
[08:10:17] <Guest30365> mongodb seems very interesting
[08:10:34] <Guest30365> rumors said, scaling is big advantage
[08:10:55] <carsten_> rumors said: you should have database background building database applications
[08:11:30] <NodeX> :)
[08:17:32] <Guest30365> is MongoDB free to use for commercial apps?
[08:18:05] <sirpengi> mongodb is free to use (period)
[08:18:10] <carsten_> Amen
[08:18:45] <Guest30365> will I pay?
[08:18:56] <sirpengi> Guest30365: they make their money from consulting/support services
[08:19:08] <sirpengi> so if you don't need consulting or a support plan, then you don't have to pay
[08:19:10] <NodeX> mongodb is the awesome and it's what google is to the little people of Somewhere
[08:19:40] <Guest30365> sirpengi NodeX thats cool ;)
[08:19:41] <NodeX> there is a rumour you have to donate a beer to some guy called Nodex though
[08:20:17] <carsten_> there is a rumor that NodeX is Caspar of #MongoDB
[08:20:20] <Guest30365> ok :)
[08:28:17] <NodeX> http://www.infoworld.com/d/open-source-software/red-hat-releases-nosql-database-enterprise-java-196020 <---- interesting, I wonder what it's like
[08:30:49] <mkopras> hey guys, I've got a question about a particular query http://pastie.org/private/vx6yybunxkxmzoleau1vw can someone look?
[08:31:06] <remonvv> Ask the question sir.
[08:31:33] <mkopras> its in the link with example
[08:31:42] <mkopras> and schema
[08:32:49] <sirpengi> you can't match single subdocuments
[08:32:52] <sirpengi> you get the whole document
[08:36:26] <mids> mkopras: db.bbb.find({ steps : { $elemMatch : { name: "aaa", date : 1340020000 } } })
[08:36:54] <mkopras> mids: hell yeah, thank you!
[08:37:32] <NodeX> you can dot notate it as well
[08:37:40] <NodeX> Red Hat JBoss Data Grid 6 <-- not really a very good name
[08:39:40] <mids> NodeX: not sure how dot notation would work in this example; but you can also do db.bbb.find({ steps : {name: "aaa", date : 1340020000 } })
[08:40:25] <NodeX> steps.name:aaa, steps.date:134...
[08:41:21] <mids> NodeX: that has a different meaning and gives an unwanted result in mkopras' case
[08:41:36] <NodeX> it does exactly the same as your elemMatch
[08:41:57] <NodeX> it gets the result where steps.name = aaa and steps.date=134...
[08:42:05] <remonvv> It's the same.
[08:42:20] <remonvv> The result is the same; there's a conceptual difference obviously.
[08:42:24] <NodeX> I am not sure which is more index efficient
[08:42:41] <NodeX> one may require a hint
[08:42:57] <mkopras> thx guys, I was not aware how to search over collections properly, these examples will make it easier now :)
[08:43:14] <remonvv> Sincerely doubt there's a difference. An $elemMatch unpacks to the equivalent of the dot notation route.
[08:43:55] <mids> https://gist.github.com/730f278a479708e2795d
[08:44:19] <mids> NodeX: maybe I dont understand you properly, but I see a difference
[08:44:45] <NodeX> did he state he wanted 1 document only ?
[08:44:55] <NodeX> I must've missed that part
[08:44:57] <mids> yes
[08:45:02] <mids> ok :)
[08:45:04] <NodeX> findOne() ;)
[08:45:08] <mids> lol
[08:45:30] <NodeX> but how does your way guarantee the right document ?
[08:45:53] <remonvv> mids is right. Pretty sure I've made this mistake about 10 times in here by now.
[08:45:54] <NodeX> if there are 2 I mean ... your way only returns one of two possible matches which would not allow for a sort
[08:46:02] <remonvv> Doesn't matter how many he needs, it'll still return different results.
[08:46:31] <mkopras> yea, it will be one, but the wrong one, with the findOne() example
[08:46:44] <NodeX> but the query is not correct because 2 documents match that criteria so two should be returned
[08:46:55] <mids> my understanding is for the query to return the document with both attributes matching in the same subdocument
[08:46:57] <NodeX> if one is wanted then a sort must be applied with a limit
[08:47:12] <NodeX> both do match dont they ?
[08:47:20] <mkopras> mids: is right, exacly
[08:47:38] <NodeX> My mistake - I read the source document wrong
[08:47:41] <remonvv> No, there's a semantic difference between two fields having to match for a single array element and either field having to match for any but not necessarily the same element in the array.
[08:47:52] <NodeX> I thought bbb & the timestamp were the same on BOTH documents
[08:47:56] <mids> anyway, the client is happy. next! :D
[08:48:00] <mkopras> ;D
[08:48:12] <remonvv> With all that said though, who would read the dot notation version and not assume the $elemMatch case?
[08:48:29] <NodeX> it wasn't the notation part of it that I had a problem with
[08:48:41] <mids> http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29
[08:48:46] <NodeX> I (in my sleepy state) read bbb had the same timestamp on BOTH documents
[08:48:51] <mids> at least that *tries* to explain the difference
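A sketch of the difference being debated, with made-up data; only the first document has both values in the same array element:

    db.bbb.insert({ _id: 1, steps: [ { name: "aaa", date: 1340020000 } ] })
    db.bbb.insert({ _id: 2, steps: [ { name: "aaa", date: 999 },
                                     { name: "bbb", date: 1340020000 } ] })

    // dot notation: each condition may be satisfied by a DIFFERENT array
    // element, so both documents match
    db.bbb.find({ "steps.name": "aaa", "steps.date": 1340020000 })

    // $elemMatch: both conditions must hold within a single element,
    // so only _id: 1 matches
    db.bbb.find({ steps: { $elemMatch: { name: "aaa", date: 1340020000 } } })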
[08:48:54] <remonvv> Right, but that's where I have a problem. I think it's a bit of a query syntax design booboo
[08:49:03] <mids> hehe
[08:49:18] <mids> hey, they need to still be able to sell consultancy right
[08:50:04] <remonvv> I'm trying to think of a use case that is caught by the dot notation method.
[08:50:13] <NodeX> either way ... with both documents having the same values for the fields then either works
[08:50:21] <remonvv> True.
[08:50:33] <NodeX> which is what I -read- / misread
[08:50:38] <remonvv> But $elemMatch is still the preferred route.
[08:50:57] <remonvv> Ah, no I was just wrong. Ironically enough I probably suggested the exact opposite at some point as well.
[08:51:15] <remonvv> It's 11am here, way too early.
[08:51:22] <remonvv> For thought, that is.
[08:51:24] <NodeX> 09:48 here
[08:51:34] <NodeX> didnt sleep till 6am
[08:51:46] <NodeX> kentucky fried brain today
[08:52:41] <NodeX> http://news.cnet.com/8301-1009_3-57457470-83/code-crackers-break-923-bit-encryption-record/
[08:54:36] <remonvv> 148 days for a single match.
[08:55:10] <NodeX> it beats the old record hands down though which means encryption needs to get better else Moore's law will nullify SSL very very soon
[08:55:25] <remonvv> You have to wonder what comes first. Household quantum computers that make the research useless or conventional computers that start cracking this stuff in any sort of practical speed.
[08:55:50] <NodeX> a mic would be best
[08:55:52] <NodeX> mix *
[08:56:01] <NodeX> good software and good hardware
[08:56:36] <remonvv> Well, effort to bump up encryption bits to something higher is endlessly easier than speeding up brute force decryption. I somehow doubt this'll ever really get relevant in day to day life.
[08:57:19] <NodeX> let's hope they make a brain chip to wirelessly beam keys to the user!!
[08:58:48] <remonvv> quantum entanglement!
[08:58:58] <remonvv> also...XML
[09:00:39] <NodeX> lmao
[09:04:00] <JamesHarrison> Hi - relatively new to MongoDB, working on a site which makes fairly heavy use of $nin to filter out content - a query like {$orderby: {created_at: -1}, $query: {tag_ids: {$nin: ['some-tag']}, hidden_from_users: {$in: [None, False]}}} appears to completely not use an index set up for created_at-1, tag_ids 1, hidden_from_users 1. What am I doing wrong?
[09:04:59] <NodeX> did you hint() it ?
[09:05:33] <JamesHarrison> Tried to, but it's not making the query performance any better, which makes me think I may be missing something more fundamental with my query structure.
[09:05:51] <JamesHarrison> (nscanned still much, much higher than n in explain)
[09:11:12] <NodeX> I can't see anything that jumps out
[09:11:37] <NodeX> is the index definitely built and not still building in the background ?
[09:12:01] <JamesHarrison> Not seeing it in currentOp and it appears to be built; I didn't do it as a background thing to start with
[09:14:09] <NodeX> try to drop the index and rebuild it .. maybe that will kickstart it
[09:14:44] <JamesHarrison> That query type is currently showing at about 150ms avg, some (with many nin tags) up to 1.5 seconds, which is a bit high for an 8-core dual xeon box with 16 gigs of ram, SSDs and only 13,000 objects (38MB) total to look through
[09:14:45] <remonvv> It wont, $nin wont hit an index
[09:14:55] <JamesHarrison> and there's the fundamental thing I'm missing
[09:14:56] <JamesHarrison> okay
[09:15:19] <remonvv> In that case, that is.
[09:15:35] <remonvv> What's your index?
[09:15:41] <NodeX> I didn't know $nin doesn't use indexes .. is this intentional or going to be fixed?
[09:16:15] <remonvv> Well I should rephrase.
[09:16:50] <remonvv> It can use an index but a) it almost always results in very little performance gain and b) only hits it for relatively simple indexes.
[09:17:00] <carsten_> no, it can not be fixed when you know how indexes work :)
[09:17:14] <remonvv> It can be fixed to use the index, not to make it fast.
[09:17:20] <NodeX> I use it in a single query like this ... settings : {$nin : ["Admin"]}
[09:17:31] <remonvv> That works, it's just ineffective ;)
[09:17:35] <NodeX> I hope it works ok for that (not that I have noticed it not using it)
[09:17:41] <NodeX> (performance wise)
[09:17:47] <JamesHarrison> http://pastie.org/private/sdrapb9k0helolvav7omga are the current sets of indexes on that
[09:18:02] <JamesHarrison> (there's at least one I've just not gotten round to deleting yet)
[09:18:03] <remonvv> Unless almost all of your records actually match the $nin you wont notice much of a difference
[09:18:23] <NodeX> dang
[09:18:25] <remonvv> JamesHarrison, that's a mess ;)
[09:18:34] <JamesHarrison> remonvv: I know, I know... :)
[09:18:51] <NodeX> holy cow that's a lot of indexes
[09:19:10] <remonvv> A separate index for each field usually means a lack of understanding about indexes ;)
[09:19:33] <NodeX> or querying the data in a lot of different ways
[09:19:35] <JamesHarrison> and most of those indexes work great, tbh, heavy read, little write on this collection I should mention
[09:19:50] <JamesHarrison> we're doing a very large number of different queries against this collection, yeah.
[09:19:50] <remonvv> Yeah but quite a few are overlapping
[09:20:00] <remonvv> an A,B,C index covers an A and an A,B index
[09:20:17] <JamesHarrison> the first multi index needs deleting yeah
[09:20:26] <JamesHarrison> and yes, some could be combined
[09:20:29] <remonvv> Anyway, which index does it say it hit for your problem query, if any? pastie explain()
[09:20:34] <JamesHarrison> width/height for sure
[09:20:54] <NodeX> JamesHarrison : secondly ... 1. The sort column must be the last column used in the index. <-- from the docs
[09:21:08] <NodeX> in your index it's the first
[09:21:13] <JamesHarrison> NodeX: oh, that's one I missed... evidently.
[09:21:22] <NodeX> swap it round just in case
[09:21:43] <NodeX> I dont think it will help but you never know!
[09:22:08] <remonvv> Well, since it can only use 1 index for the entire query (sort included) it just might
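A sketch of the reshuffle being suggested, using the fields from JamesHarrison's query; the collection name is a guess, and whether it helps here is exactly what is being tested:

    // equality fields first, the sort field last, per the docs NodeX quotes
    db.images.ensureIndex({ tag_ids: 1, hidden_from_users: 1, created_at: -1 })

    // then check what the query actually hits
    db.images.find({ tag_ids: { $nin: ['some-tag'] },
                     hidden_from_users: { $in: [null, false] } })
             .sort({ created_at: -1 })
             .explain()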
[09:22:19] <JamesHarrison> http://pastie.org/private/cb4jfk4lf9olmjxwy5d8sq
[09:22:23] <JamesHarrison> yeah, almost certainly
[09:24:41] <remonvv> JSON formatter is your friend ;)
[09:25:02] <remonvv> So yeah, it hits the expected index.
[09:29:52] <JamesHarrison> okay, set it up in the shell, more readable: http://pastie.org/private/kuyslzwv2ae0utetg1ltvg
[09:30:07] <JamesHarrison> that's after adding an index "tag_ids_1_hidden_from_users_1_created_at_-1"
[09:35:15] <JamesHarrison> Okay, so nin is inefficient in terms of index performance when >50% of the data doesn't match any of the $nin query
[09:35:40] <remonvv> the index you added wont hit it
[09:35:50] <remonvv> you need to add tag_ids in the middle of the compound
[09:36:33] <JamesHarrison> like this? http://pastie.org/private/caraikkshw45dml1qnmfpa
[09:37:32] <remonvv> Yes, that should do it.
[09:37:45] <JamesHarrison> well, it doesn't pick up on it, and if I hint to explicitly specify that query, http://pastie.org/private/u82xqwmedtpax4r85nt9na
[09:37:49] <remonvv> Can you drop the index it's hitting now and see if it hits the new one?
[09:37:56] <JamesHarrison> performance is even worse
[09:37:57] <remonvv> Or hint for the new one.
[09:38:15] <remonvv> $nin and performance are usually mutually exclusive
[09:38:29] <remonvv> If the $nin is high up in the selection criteria that is
[09:38:32] <JamesHarrison> 119221 being many times larger than the DB size rather implies failure
[09:38:38] <JamesHarrison> right
[09:38:48] <JamesHarrison> I'm still confused as to why it's failing -this hard- with a limit() in there
[09:38:48] <remonvv> 119221?
[09:38:57] <JamesHarrison> nscanned, yeah, in that last one
[09:39:39] <remonvv> limit is not that relevant to this query.
[09:39:43] <remonvv> scanAndOrder: true
[09:40:05] <JamesHarrison> This should surely at least be: sorting by created_at, then for the resultant datasets, getting at least 100 with the first query filter, paring down with the second filter (both with indexes) and doing that again till it hits 100
[09:41:09] <remonvv> Well if it would do that (it doesn't) it would have to sort the entire dataset before doing anything else.
[09:41:16] <remonvv> That alone is bye bye performance ;)
[09:41:28] <remonvv> It always reduces the data set before doing any sort of ordering.
[09:41:39] <JamesHarrison> Ah.
[09:41:44] <JamesHarrison> Therein lies the issue, then
[09:41:49] <remonvv> How many documents are hit by the first clause?
[09:41:50] <JamesHarrison> (largely, at least)
[09:41:54] <remonvv> Yes, that's what I'm saying ;)
[09:41:57] <JamesHarrison> nearly all of them
[09:42:06] <remonvv> Thought so.
[09:42:25] <remonvv> Okay, so what is happening is this :
[09:42:28] <JamesHarrison> and the latter is also hitting a small percentage (2-5% maybe, varies per user)
[09:43:40] <remonvv> first index field doesn't help much at all and thus the candidate set is huge, the $nin that follows is slow (regardless of index) because it has to scan a huge candidate set. Once that has all happened MongoDB has to sort the full set in memory (scanAndOrder: true) and then return the first 100.
[09:44:04] <remonvv> I'd be very surprised if removing the limit results in much worse performance.
[09:44:23] <remonvv> Indexes need high cardinality fields to shine
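The failure mode remonvv describes is visible in explain() output shaped roughly like this (illustrative only; the nscanned figure echoes the one quoted above, not a fresh run):

    {
        "cursor" : "BtreeCursor ...",   // an index is hit, but...
        "n" : 100,                      // documents actually returned
        "nscanned" : 119221,            // ...the candidate set stays huge
        "scanAndOrder" : true           // and the full set is sorted in memory
    }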
[09:45:17] <JamesHarrison> Yeah. Okay, so for the problem we have here (filtering images tagged with a set of tags (there are 10,000+ tags and growing), filtering hidden images, and sorting) what's the right approach in MongoDB terms?
[09:45:43] <JamesHarrison> Doing a quick query to find the image IDs we want to exclude, then excluding them and sorting?
[09:46:01] <JamesHarrison> (though I guess $in is going to suck pretty hard performance-wise too...)
[09:46:25] <NodeX> $in is fine
[09:47:04] <remonvv> Well, $in is faster technically and conceptually is usually used to select a small subset from a larger set.
[09:47:41] <remonvv> Why are you $nin-ing if you need the ones that DO match the tags, by the way?
[09:48:06] <remonvv> And the hidden_from_users criteria doesn't require an $in
[09:48:15] <NodeX> heh
[09:48:36] <remonvv> if it's null it doesn't exist and will fail an equality test, so simply {hidden_from_users:{$ne: true}} will do
[09:48:57] <remonvv> which makes it a boolean, which (usually) makes it irrelevant for indexing
[09:49:06] <remonvv> it already is a boolean i guess.
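remonvv's simplification as a sketch, against the same hypothetical collection; a missing, null, or false field all fail an equality test against true, so one $ne covers every case:

    db.images.find({ tag_ids: { $nin: ['some-tag'] },
                     hidden_from_users: { $ne: true } })
             .sort({ created_at: -1 })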
[10:00:44] <remonvv> I missed pretty much everything after the boolean comment, fyi
[10:00:59] <NodeX> [10:46:20] <remonvv> it already is a boolean i guess.
[10:00:59] <NodeX> [10:46:29] * ifesdjeen (~ifesdjeen@109.144.18.169) Quit (Remote host closed the connection)
[10:01:03] <NodeX> [10:57:19] * remonvv kicks the internet
[10:01:05] <NodeX> lol
[10:01:17] <NodeX> a few more joins if you want to see them too :P
[10:09:58] <remonvv> really?
[10:10:04] <remonvv> JamesHarrison, respond!
[10:10:12] <NodeX> he went on holidays!
[10:10:24] <remonvv> Inform me about how I fixed all your problems and changed the way you view life.
[11:22:48] <JamesHarrison> remonvv: sorry, real life intervened - good stuff, I'll have a play, thanks to you and NodeX
[11:52:57] <ahmeij> Hi! I'm new here, and have a question that is probably an education issue, but I can't find an answer. I have a mongo replica set, 3 equal rackspace cloud servers, with a significant load, where 2 servers run with a load of 20 (the master and 1 slave) and the 2nd slave is idling at load 0. However mongostats seems to indicate the second slave is processing just as many queries as the first slave
[11:54:16] <ahmeij> I dont think this is typical behaviour, but maybe it points to a common configuration mistake?
[11:57:13] <carsten_> driver issue where the driver can not balance the load properly across the slaves and master?
[11:58:27] <ahmeij> all connections come from the default mongo driver for ruby, I have tried to investigate that a bit, however mongostats indicates all queries are split over the 2 slaves and all updates go to the master (as expected)
[12:10:34] <remonvv> ahmeij, are all members up to date? one possible explanation is that your idle member is behind, which results in faster queries (less data to query on). That's a bit far-fetched though.
[12:11:49] <ahmeij> I'll check
[12:23:00] <mklappstuhl> Hey
[12:23:03] <mklappstuhl> I need some ideas :)
[12:25:45] <mklappstuhl> I want users to be able to set up their own "content structures" where they enter a name and add a set of properties... then they should be able to save this "form" and "instantiate" it later and it should be saved with actual data
[12:25:54] <mklappstuhl> Can anybody
[12:26:19] <mklappstuhl> It's kind of similar to what you can do in some CMS where you can define custom content elements
[12:26:50] <mklappstuhl> I'm pretty sure there is a name for that, too
[12:27:03] <mklappstuhl> Anyone have an idea how to implement something like this with mongodb?
[12:39:16] <npx> So what's up with the V8 port? Is it going to be possible for Node.js to share a libv8 instance with MongoDB?
[12:47:55] <remonvv> It's in progress afaik. Why would you want it to share the js context?
[12:49:09] <npx> it would remove the need for a client library and just let me write Mongo queries directly
[12:49:40] <npx> which I can sorta do, it's probably a bad idea for them to share a context for a dozen reasons, it was just the first thing that came to my mind
[12:50:18] <remonvv> That, and I don't think your premise is correct. You'll need a library of some sort to convert your queries to the MongoDB line protocol.
[13:02:41] <lazymanc> hi, I've just installed mongodb from the 10-gen repo on ubuntu 12.04 (VM) and it's now sat there idling at 15-20% cpu - is that normal?
[13:02:53] <lazymanc> this is without me accessing it
[13:07:34] <lazymanc> VM is allocated 512mb total - could this be the reason?
[13:13:16] <remonvv> No it's not normal and I'm not sure why limited memory would cause cpu load.
[13:14:04] <Gargoyle> lazymanc: What is idling, the vm or the host?
[13:15:15] <lazymanc> remonvv: was wondering if it was swapping to virtual disk because of limited memory (completely new to this so just guessing)
[13:15:43] <lazymanc> Gargoyle: I mean mongodb wasn't doing anything, it was just started but nothing is accessing it as yet
[13:16:49] <Gargoyle> lazymanc: I have a 3 node replica set running on virtual hardware idling at 4% (When I say idling, there could be 1 or 2 users + mms agent)
[13:17:08] <lazymanc> top is showing mongod at 15-20% usage on the ubuntu guest
[13:17:37] <lazymanc> i'll take a look at the log and see if that gives any ideas
[13:20:47] <Gargoyle> What would be "a lot" of connections for a mongod server replicating to 2 more nodes, and running 2 web apps (in testing, so only a handful of users / apache threads)
[13:21:06] <Gargoyle> Better question: What would be "normal" ?
[13:21:24] <Gargoyle> Currently seeing 450+
[13:25:19] <Gargoyle> Another one just to leave hanging round the channel - Has anyone got an ubuntu 12.04 init.d script for the mms-agent ?
[13:36:30] <remonvv> Gargoyle, primarily depends on the driver
[13:36:59] <Gargoyle> remonvv: PHP driver
[13:37:08] <remonvv> Most drivers follow a rather odd pattern of establishing new connections until they run out of allowed connections from a single host.
[13:38:05] <remonvv> Gargoyle, it's not a worrying amount of connections but you can control the maximum amount of connections per client if you feel it should use less.
[13:45:12] <Gargoyle> Well, if it's not a worrying amount, then I'll let it be - at this moment in time, I really have no idea what a sensible value would be!
[13:48:20] <wereHamster> what is the difference between { $pull : { field : {field2: value} } } and { $pull : { .field.field2': value } } ?
[13:57:44] <BlenderHead_625> Hi guys. I'm trying to $pull all elements from an array where the first element of each nested array has a given value. I tried {'$pull': {'bookmarks.0': ...}} but that removed the first element of the array. I want to remove the entire nested array from its parent array if its first element matches the criteria. Any pointers?
[14:01:55] <BlenderHead_625> What I want is {'$pull': {'bookmarks': {0: 'match'}}} but it won't let me do that.
[14:04:11] <wereHamster> how about '0':'match' ?
[14:05:14] <BlenderHead_625> tried it, didn't work
[14:08:41] <BlenderHead_625> there must be a way to match array elements like it's possible to with {}
[14:43:05] <remonvv> Why not just $unset the array if it matches your query?
[14:43:15] <remonvv> or set it to []
[15:22:45] <JoeyJoeJo> I'm setting up a new MongoDB and as of yet I don't have much data, but I plan on having more and more as time goes on. Can I set up one shard initially and add more as needed?
[15:23:37] <Derick> yes
[15:24:30] <JoeyJoeJo> Is that a good plan?
[15:24:44] <remonvv> yes
[15:24:55] <remonvv> specifically it's a good plan to start with mongos based setups from the start
[15:25:00] <remonvv> shards can be added as needed
[15:25:10] <remonvv> (there's no real reason to have a 1 shard setup)
[15:25:28] <JoeyJoeJo> Sweet, thanks
[15:25:54] <remonvv> No problem.
[15:27:38] <JoeyJoeJo> Is it a good idea to run my app server, mongos and configdb all on one server?
[15:30:44] <remonvv> Well, it's definitely a good idea to have your mongos local to your app server in almost all cases.
[15:31:03] <JoeyJoeJo> But I should put configdb on its own server?
[15:31:15] <remonvv> Nah, that's even more flexible.
[15:31:24] <remonvv> Config db's require very little in terms of resources
[15:31:31] <remonvv> It's just very important they're highly available.
[15:32:44] <JamesHarrison> sort of
[15:32:58] <JamesHarrison> doh, wrong channel, my bad
[15:33:43] <JoeyJoeJo> Ok, so one last question. I'm planning on getting 4 Hi-mem large EC2 instances. One will be my app server, mongos and configdb. The other three will be my first shard. Does that pass the sanity check?
[15:34:21] <remonvv> Well, what kind of read:write ratio do you expect?
[15:35:57] <JoeyJoeJo> I don't have an exact answer to that yet, but I'd say much more reading than writing. Data will come in large bursts every few days or so, but I assume clients will be looking at the data at any time
[15:36:00] <remonvv> I'm assuming with 3 machines for 1 shard you mean a repset
[15:36:28] <JoeyJoeJo> Is it not true that all shards are repsets?
[15:36:36] <remonvv> Ah right. So assuming those 3 machines form a replica set and your queries are all slaveOk=true then that should be fine.
[15:36:38] <remonvv> No.
[15:41:00] <remonvv> sharding and replica sets are two separate things
[15:41:08] <remonvv> a shard can be a replica set
[15:41:35] <remonvv> It's easier for you to read up on it than for me to explain it in a few sentences ;)
[15:42:10] <remonvv> Sharding is basically a horizontal scaling strategy whereas replica sets are mostly a durability/availability feature. With that said a replica set does scale up reads.
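A minimal sketch of that layout from the mongos shell; hostnames and set names are placeholders:

    // register the first shard -- here a 3-node replica set, though a
    // shard does not have to be one
    sh.addShard("rs0/node1.example.com:27017,node2.example.com:27017,node3.example.com:27017")

    // scaling out later is just more of the same
    sh.addShard("rs1/node4.example.com:27017")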
[16:06:24] <rguillebert> hi
[16:08:29] <rguillebert> I would like to unittest my python webapp that uses mongodb, do you know if there's a mock version of mongodb (or an in-memory version of mongodb) ?
[16:08:54] <remonvv> Unfortunately, there isn't.
[16:09:19] <remonvv> I'm not aware of any python mongodb mock tools and MongoDB cannot run in memory only mode atm
[16:11:14] <rguillebert> erf, that would be a nice thing to have, for sql you can use sqlite (especially if you're using an ORM)
[16:22:47] <remonvv> I agree ;)
[16:29:35] <drockna> I'm trying to start a mongod in an openvz read only container. What folders do I need to make writable for mongod to work?
[16:33:19] <pulse00> hi all. i'm trying to dig into document oriented databases, coming from a rdbms background. when the domain model is a relation between book and author (many-to-many), how would this be stored in a document oriented db? have a document "author", which then contains an array of books?
[16:51:55] <kali> pulse00: an author collection, a book collection, most likely. then you can deal with the relation on either side or both sides.
[16:52:33] <kali> pulse00: the important thing with doc-oriented DB is not the static description of your model, but rather the way you will need to read it and access it
[16:53:41] <pulse00> kali: yeah, that's what i was asking myself. when i want to show a paginated list of all books sorted by release date, having the book as an "inner" document of the author looks odd from a rdbms perspective.
[16:55:19] <kali> pulse00: usually what you embed is not the "strong" entities of your model
[16:55:41] <pulse00> kali: i see, something like a comment in a blog post i guess
[16:55:51] <kali> pulse00: that's more the idea
[16:55:57] <pulse00> alright.
[16:56:09] <pulse00> so do relations exist? e.g. between books and authors?
[16:56:27] <kali> pulse00: it's up to you to materialize and resolve them
[16:56:38] <kali> pulse00: an array of author id in the book docs for instance
[16:56:49] <pulse00> i see
[16:57:11] <pulse00> and then i would issue 2 queries? first get the author, then the books by id?
[16:57:44] <kali> i was thinking the opposite, but yes
[16:59:28] <kali> if you embed author ids in books, you can also use an index on the author array. that way you don't have to denormalize and maintain the relation both ways
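A sketch of kali's layout; collection and field names are illustrative:

    var authorId = ObjectId()
    db.authors.insert({ _id: authorId, name: "Some Author" })
    db.books.insert({ title: "Some Book", released: new Date(),
                      author_ids: [ authorId ] })

    // the multikey index materializes the relation on the book side only
    db.books.ensureIndex({ author_ids: 1 })

    // all books by an author, newest first -- no author-side list to maintain
    db.books.find({ author_ids: authorId }).sort({ released: -1 })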
[19:06:06] <MrJones98> hello - i have an issue understanding some semantics but may be specific to the pymongo driver
[19:06:13] <MrJones98> is this a reasonable place to ask?
[19:24:28] <mids> MrJones98: yes
[19:47:33] <Zelest> I thought of having a collection with embedded docs (one to many) where the content basically is "one domain/site and multiple pages" .. however, I wish to be able to select a single "page" within a document as well as sorting them to request the "oldest" page within that document.. is this possible?
[19:48:35] <Zelest> like { _id: 'domain.tld', pages: [ { uri: '/', date: ... }, { uri: '/about', date: ... }, { uri: '/contact', date: ... } ] }
[19:50:44] <dstorrs> Zelest: it's probably possible, but I'm not clear on the goal. what are you trying to achieve?
[19:51:52] <Zelest> I'm writing a crawler and I plan to use mongodb to store the data on domains and pages within these domains.. and I wish to keep that in the same document to avoid excessive lookups.
[19:53:01] <Zelest> not sure on the most optimal solution for this.. and I use find_and_modify to fetch the pages to crawl (in order to make the crawlers concurrent)
[19:53:16] <dstorrs> ok. I would recommend profiling with embedded vs separate collections before assuming this will be optimal, but it sounds plausible at least.
[19:53:38] <dstorrs> find_and_modify is DEADLY slow
[19:53:58] <Zelest> really? :o
[19:54:05] <dstorrs> like, a couple orders of magnitudes slower than update()
[19:54:16] <Zelest> well
[19:54:35] <Zelest> if I have 10 threads fetching "the oldest urls" .. i want to get the 10 oldest, right? :P
[19:54:39] <dstorrs> better solution is to throw out an optimistic update() where you set some attr on the item (e.g. "owner : my_id") then do a find()
[19:54:54] <Zelest> oh
[19:55:07] <Zelest> true that
[19:56:06] <dstorrs> I know because I'm currently building a web crawler that uses that strategy. :>
[19:56:17] <Zelest> oh :-D
[19:56:24] <Zelest> what lang? if I might ask :-)
[19:56:32] <dstorrs> Perl. crawling YouTube
[19:56:40] <Zelest> aah, Ruby here
[19:56:47] <Zelest> plan to crawl OpenNIC (www.opennicproject.org)
[19:56:57] <dstorrs> "Ruby: all the cool kids are doing it!" :>
[19:57:03] <Zelest> Haha
[19:57:13] <Zelest> I guess that's Rails these days :P
[19:57:20] <dstorrs> yeah, mostly.
[19:57:41] <Zelest> but yeah, ruby is lovely.. sadly I'm stuck with php at work so ruby is a nice change. :-)
[19:57:52] <Zelest> as for perl, it's a write-only language for me. :-P
[19:58:00] <dstorrs> I *strongly* recommend profiling before you go too far down the embedded or referenced path.
[19:58:27] <dstorrs> yeah, a lot of people who aren't very good at it say that. ;>
[19:58:37] <Zelest> hehe
[19:58:43] <Zelest> I sure suck at it, so no arguments there ;-)
[19:59:10] <dstorrs> FYI, if you find it write-only, I'd be willing to bet you haven't looked at it in ~5 years.
[19:59:30] <dstorrs> Moose / Mouse / Moo => declarative OO programming in Perl
[19:59:37] <Zelest> oh
[19:59:50] <dstorrs> method modifiers, all kinds of tasty goodness.
[19:59:59] <Zelest> sounds interesting indeed..
[20:00:29] <dstorrs> anyway, I see two paths you could take.
[20:01:08] <therealkoopa> If I have a list of embedded documents, is there a way to run a query to find all of the embedded documents up to a certain object id? For instance, [1, 2, 3, 4, 5]. Grab me everything up to objectid 3903 which is the 4th
[20:01:35] <dstorrs> First would be a "vertical" stack of pages, like so: { site: 'foo.com', page : 'http://...', owner : 'none', lock_expires: 0 }
[20:02:07] <dstorrs> oops, add an attr -- "added : $EPOCH" to that.
[20:03:14] <dstorrs> then, when you want to query it, you do: db.coll.update({ site : 'foo.com', owner : 'none' }, { $set : { owner : my_id(), lock_expires : time() + 10 } })
[20:03:33] <dstorrs> and db.coll.find({ site : 'foo.com', owner : my_id() })
[20:03:53] <dstorrs> add a sort onto whichever of those is most relevant.
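dstorrs's two calls assembled into one runnable sketch; my_id() and time() are stand-ins for a worker id and a Unix timestamp, and the lock_expires check is an assumption implied by his document shape rather than something he spelled out:

    var me = "worker-1";                                // stands in for my_id()
    var now = Math.floor(new Date().getTime() / 1000);  // stands in for time()

    // optimistically claim unowned (or expired) pages for this worker
    db.coll.update(
        { site: 'foo.com',
          $or: [ { owner: 'none' }, { lock_expires: { $lt: now } } ] },
        { $set: { owner: me, lock_expires: now + 10 } },
        false, true   // upsert = false, multi = true
    )

    // then read back whatever this worker managed to grab
    db.coll.find({ site: 'foo.com', owner: me }).sort({ added: 1 })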
[20:04:28] <Zelest> that's almost how I use it now.. but with code (http code) set to 0 if "in progress" ..
[20:04:35] <Zelest> but using find_and_modify that is
[20:04:54] <dstorrs> the second way is to have generally the same structure, but "stripe" them horizontally, like so { site : 'foo.com', pages : [ { owner : 'none' ...} ]}
[20:05:23] <dstorrs> add new pages using pushAll.
[20:06:06] <dstorrs> In profiling, I found that up to ~50 pages, I was getting better performance with embedded docs than a straight collection, but above that it flipped.
[20:06:54] <dstorrs> using embedding complicates your code a lot, though.
[20:07:33] <dstorrs> also, watch out for the "16M per doc" limit if your max number of pages is very high, or if the page objects are large
[20:07:54] <Zelest> ah, i will use postgresql for the actual content of the page
[20:08:01] <Zelest> mongodb is only for metadata
[20:08:41] <dstorrs> sure, but if that metadata document is large-ish -- say, 1k -- and you have 1,000 pages per site, that means the entire site document is ~1M.
[20:08:53] <Zelest> true
[20:09:18] <dstorrs> You can't retrieve just one page from the array, you need to pull the entire pages array over the wire. => slow
[20:10:15] <dstorrs> even if 99.9% of your sites will have only a dozen pages, you need to account for the one Googlebomber with 50,000 pages on his site. That blows the doc limit.
[20:11:39] <Zelest> ah, then that settles it.. seeing opennic has plenty of wikileaks clones..
[20:11:55] <Zelest> not to mention "mistakes" like spider-traps and such
[20:11:59] <dstorrs> :>
[20:12:25] <Zelest> i'm still surprised you say find_and_modify is so horribly slow.. :o
[20:12:50] <Zelest> i use it on my mini-crawler on httpdb.org .. and it can easily reach up to 40-50 sites a sec
[20:12:56] <dstorrs> oh and, before you think of it -- don't even think about using one collection per site. http://www.mongodb.org/display/DOCS/Using+a+Large+Number+of+Collections
[20:13:23] <Zelest> wow, haha, nah, that's too extreme
[20:13:42] <Zelest> opennic only has around 10 mil pages
[20:13:46] <dstorrs> I was getting about 100 jobs a second with FaM, vs something like 5000 with update(). but don't take my word for it -- do your own profiling.
[20:14:17] <Zelest> wow
[20:14:51] <dstorrs> it's possible it was an error on my part -- poor / missing index, inefficient query, etc.
[20:21:25] <Zelest> regarding indexes.. if I want to find a document where site: is "foo.com" and the path: is "/robots.txt" .. i do want to use an index to find /robots.txt, but I don't want to create a full index on path:, only the entries that are /robots.txt.. is that doable?
[20:21:42] <Zelest> or will I have to store that as a separate field?
[20:48:15] <kenny> I'm looking to create trigrams on an object. Would it make more sense to store them directly on each object, or in a separate collection with a list of matching objects? (there'll be a lot of overlap)
[20:53:09] <kenny> (the problem I'm trying to solve is fast regex searching across around 25G of data)
[21:06:26] <therealkoopa> Any idea why a console.log(change) gives me an object that has a property called name that's set, but if I do change.name it's undefined?
[21:06:33] <therealkoopa> console.log(change.name), it's defined.
[21:06:41] <therealkoopa> undefined, rather.
[21:13:48] <dstorrs> kenny: it sounds like you might be doing full text searching. Yes?
[22:14:19] <jonwage> what all can cause MongoConnectionException: Operation now in progress?
[22:14:34] <jonwage> I'm trying to figure out what causes that in our test suite. I watch db.serverStatus() and we're not using all the connections
[22:14:40] <jonwage> it just hangs for some reason(sometimes)
[22:14:44] <jonwage> not all builds fail with the error
[22:14:45] <jonwage> some succeed
[22:15:50] <jonwage> possible that I issue some background operation or something that then blocks later tests?
[22:18:21] <ahmeij> Hi, can anyone please help by explaining where i need to start looking to solve the following?: query sr_production.profiles ntoreturn:1 nscanned:144381 scanAndOrder:1 nreturned:1 reslen:1340 6861ms
[22:19:48] <ahmeij> I've been looking all over but I cant seem to find one responsible query (although it should be somewhere I guess :) )
[22:20:01] <jY> lack of index
[22:20:14] <ahmeij> This is a table scan without an index?
[22:20:20] <jY> yep
[22:20:24] <ahmeij> Cool, thnks!
[22:20:34] <ahmeij> That should narrow my search field
[22:20:51] <jY> looks like on your profiles collections
[22:20:56] <jY> might be a count() or something
[22:21:59] <ahmeij> Is there a way to force the server to log the actual query?
[22:34:48] <jY> did you find that via the query profiler?
[22:39:54] <dstorrs> ahmeij: note that, as jY implied, count() is always slow. That's an infrastructure problem -- Mongo uses uncounted BTrees for its indices, so a count is always a table scan
[22:41:07] <ahmeij> I tried the query profiler just now
[22:41:27] <ahmeij> However all reads go to slaves which wont allow me to check the db.system.profile collection
[22:41:43] <ahmeij> and they are not replicated to the master it seems
[22:42:27] <ahmeij> dstorrs: I think I found my culprit, a relatively new query via mongoid, however I can't determine the exact query format yet, as to build the best index
[22:43:08] <dstorrs> sounds like a good start, at least.
[22:43:14] <ahmeij> a count of the entire collection takes less than 100ms
[22:43:27] <ahmeij> so slow indeed, but not the cause
[22:43:39] <ahmeij> I'm investigating this query further indeed
[22:55:52] <kenny> dstorrs: I'm not sure if full text search is what I'm trying to do. Googling now. I'm trying to do fast full regex searching on at least one text field of around 500 bytes.
[22:56:54] <dstorrs> I don't know if it fits your use case, but you may want to at least consider using something like Solr that's purpose-built for text searching
[22:57:39] <dstorrs> you can probably make it work with Mongo, but it may end up being cheaper and easier with something that's dedicated to the job
[22:58:19] <kenny> thank you. I'll check that out and see if it'll do what I need.
[22:58:41] <dstorrs> no worries. There's other things like it too. That's just the one that comes to mind first.
[22:58:45] <rossdm> look into ElasticSearch too, it indexes json docs
[22:59:25] <dstorrs> yeah, good thought.
[22:59:48] <kenny> will do. Anything that'll help reduce the time of my current 40min queries :)
[23:06:36] <ahmeij> jY / dstorrs: I found this query: { "ts" : ISODate("2012-06-21T23:00:16.906Z"), "op" : "query", "ns" : "sr_production.profiles", "query" : { "$query" : { "identities.service" : "service2", "identities.uid" : "24h24kh8" }, "$orderby" : { "_id" : 1 } }, "ntoreturn" : 1, "nscanned" : 145262, "scanAndOrder" : true, "nreturned" : 1, "responseLength" : 6037, "millis" : 8465, "client" : "xxx", "user" : "" }
[23:06:55] <retworda> So I'm changing _id in one of my collections. Have been running this, but it's taking forever. Have I made a mistake? The docs I'm inserting aren't being picked up by the find, are they? db.coll.find().forEach(function(d) { // remove d, edit it, reinsert it })
[23:07:39] <ahmeij> But it does have an index: "v" : 1, "key" : { "identities.service" : 1, "identities.uid" : 1 },
[23:12:48] <ahmeij> 200k records, query with an index, resulting in either 1 or 0 records takes 8+ seconds :( if anyone has a clue (see above) please help
[23:13:24] <jY> ahmeij: run explain on it
[23:13:58] <ahmeij> I'm so new to mongo performance investigations that i keep forgetting the basic stuff, thnx
[23:17:01] <ahmeij> jY: Good info, it tells me the index is not used (as expected)
[23:17:20] <jY> yep
[23:18:05] <ahmeij> jY: now the bigger issue: why :) I'll re-read the multi key index docs
[23:18:20] <jY> try figuring it out first without the orderby/sort
[23:18:33] <ahmeij> I did that, no change
[23:18:58] <jY> try just making an index on identities.service
[23:21:15] <ahmeij> jY: w00t! it seems the time went from 6000ms+ to 100ms+
[23:21:57] <jY> seems it didn't like your compound index then
[23:22:04] <ahmeij> Indeed
[23:22:31] <jY> if identities.uid is more unique.. you might want to index that instead
[23:22:31] <ahmeij> Throughput is now many times what it was before
[23:22:41] <jY> and put that in the query first
[23:22:44] <ahmeij> I did that indeed :)
[23:22:45] <ahmeij> ow
[23:22:55] <ahmeij> do I need to order the query correctly ?
[23:23:09] <jY> not sure.. but i think it will use whatever it finds first
[23:23:12] <jY> but don't hold me to that :)
[23:23:25] <ahmeij> i'll use explain :) thanks for teaching me where to look
[23:23:34] <jY> np
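jY's suggestion as a sketch -- a plain single-field index on whichever path is more selective; the collection name comes from the sr_production.profiles namespace quoted above:

    db.profiles.ensureIndex({ "identities.uid": 1 })

    db.profiles.find({ "identities.uid": "24h24kh8",
                       "identities.service": "service2" })
               .sort({ _id: 1 })
               .explain()   // expect a BtreeCursor here, not a BasicCursor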
[23:23:37] <retworda> Pls ... http://pastie.org/4129175 Is there any chance the function is interfering with the set of documents being iterated over? (the inserts making the set endless?)
[23:24:03] <ahmeij> brilliant, the application server load just spiked because it can finally move at a decent speed :)
[23:24:07] <dstorrs> ahmeij: This may or may not be useful to you. It was to me: http://kylebanker.com/blog/2010/09/21/the-joy-of-mongodb-indexes/
[23:24:33] <ahmeij> Thanks, I'll read it
[23:25:16] <ahmeij> Looks very useful
[23:28:34] <Zelest> dstorrs, if I run an update.. and a find afterwards.. is it ever possible that the update won't apply until after the find has been done?
[23:28:43] <Zelest> dstorrs, as in, does the update lock?
[23:29:13] <jY> update will lock since it is a write
[23:29:28] <Zelest> ah, as i thought.. just wanted to doublecheck it. thanks :-)
[23:29:42] <dstorrs> Zelest: also, you can use 'safe' and 'w : 2' for extra safety
[23:29:50] <Zelest> mhm
[23:31:47] <ahmeij> Zelest: In my experience, certainly with slaves but even without, if you have async writes (default?) the read of a just-written record can return the previous version; we switched to safe mode for all writes because of that
[23:32:17] <Zelest> ah, i see.. thanks
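In the shell, dstorrs's 'safe' plus 'w : 2' amounts to following the write with getLastError; drivers expose the same thing as a write option (a sketch, names illustrative):

    db.coll.update({ _id: 1 }, { $set: { status: "done" } })
    // block until the write is acknowledged by two members
    // (primary plus one), or give up after 5 seconds
    db.runCommand({ getlasterror: 1, w: 2, wtimeout: 5000 })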
[23:32:44] <Zelest> hmms.. can i use both $set and $unset in the same query?
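Both modifiers can sit in one update document; a quick sketch with illustrative names:

    db.coll.update(
        { _id: 1 },
        { $set: { newField: "value" }, $unset: { oldField: 1 } }
    )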
[23:32:49] <ahmeij> btw: I'm still learning a lot, however just reporting what I experienced
[23:33:23] <dstorrs> ahmeij: no worries. We all are. I think there are only two or three people in the channel with really significant experience.
[23:33:30] <dstorrs> And they work for 10gen. :>
[23:35:32] <ahmeij> does that mean that if you get a high enough experience level through use you will automatically be hired by 10gen ? :P
[23:35:55] <bjori> on that note... we are hiring :)
[23:36:00] <dstorrs> "automatically" is a strong word, but there's likely a good chance
[23:36:25] <ahmeij> I just learned that you already employ the 3 gurus here :)
[23:36:28] <dstorrs> I don't know of a non-legacy tech company that ISN'T desperate for good people right now
[23:37:10] <ahmeij> True, however I find it strange that a company like, for example, github would need 50+ developers
[23:37:56] <dstorrs> A successful company always has more projects they would like to do than hands to do them.
[23:38:21] <dstorrs> One of the biggest things that made google successful was that they got started right after the last tech boom. There was a huge glut of available talent in the Valley,
[23:38:32] <dstorrs> and they hired them all, even if they didn't have jobs.
[23:38:38] <dstorrs> *jobs for them.
[23:38:50] <ahmeij> I think to be successful in the long run it is important to keep focusing on the core business, --> Google is a nice exception though
[23:38:54] <dstorrs> That way, they weren't available to the competition and Google could expand fast
[23:40:02] <dstorrs> If you're a business with a future, your "core business" generally has a lot of room in it. I run a startup, and I could easily find work for 50 good devs / sysadmins / designers / etc
[23:40:57] <dstorrs> Sadly, we are not currently hiring, due to resource limits
[23:41:31] <ahmeij> I run one too, and while we could have a few more, I'd think 5 would be sufficient :) however once we have 5 our goals would probably expand faster etc
[23:42:27] <ahmeij> Same issue, we need a bit more cash to grow the software a bit faster to be more attractive to VC's, and that completes the chicken and egg problem
[23:42:45] <dstorrs> yep
[23:42:54] <ahmeij> but we are doing quite ok, just not as fast as we'd want to at times :)
[23:43:31] <ahmeij> anyway, since it is close to 2am. and thanks to you guys my production issue is resolved, it is time to get some sleep
[23:44:10] <ahmeij> I'll be back, I hope to learn way way more since we are fully running on mongo now :)
[23:45:12] <ahmeij> thanks again dstorrs and jY
[23:48:15] <dstorrs> ahmeij: np
[23:51:03] <mmlac> Hello everyone. I am struggling with loading my custom Mongoid::MyMongoidModule with rails standard autoloading, so I was wondering what is the best practice to extend Mongoid? Right now I create lib/mongoid/mymongoidmodel.rb
[23:51:58] <mmlac> and in that, standard module Mongoid module MyMongoidModule end end. Normal autoload fails (Expected to define MyMongoidModule at load_missing_constant)