[00:08:07] <matthewvermaak> any ideas why this happens? http://pastie.org/4123535
[00:11:37] <matthewvermaak> wow, if you switch the order of those conditions it works ...
[00:12:38] <matthewvermaak> at least 1.9 has order in its hashes i guess
[00:24:24] <matthewvermaak> ah, nvm, that's effectively being treated as an OR condition across the embedded documents, which matches, and then the positional is set to the last matching criterion
[00:24:29] <matthewvermaak> so, that won't work for me
[00:26:00] <matthewvermaak> is there an "update if current" strategy for embedded documents?
[01:37:54] <kenny> any tips on why db.auth("","") will fail right away, but db.auth(user,password) and show collections hangs forever?
[04:13:12] <dstorrs> I have a document that looks like this: { _id : 'foo', pages : [ {}, {}, ...] } What is the query I need to find the length of the 'pages' array?
[04:15:05] <kobold> dstorrs: you can't; fetch the document and count it, or store the length in a field and use that instead
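For reference, a minimal sketch of kobold's "store the length in a field" suggestion, with hypothetical collection and field names (docs, pageCount, newPage):

    // keep a counter in sync whenever a page is pushed
    db.docs.update({ _id: 'foo' }, { $push: { pages: newPage }, $inc: { pageCount: 1 } })
    // the length is then a cheap single-field read
    db.docs.findOne({ _id: 'foo' }, { pageCount: 1 })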
[08:28:17] <NodeX> http://www.infoworld.com/d/open-source-software/red-hat-releases-nosql-database-enterprise-java-196020 <---- interesting, I wonder what it's like
[08:30:49] <mkopras> hey guys, I've got a question about a particular query http://pastie.org/private/vx6yybunxkxmzoleau1vw can someone take a look?
[08:37:40] <NodeX> Red Hat JBoss Data Grid 6 <-- not really a very good name
[08:39:40] <mids> NodeX: not sure how dot notation would work in this example; but you can also do db.bbb.find({ steps : {name: "aaa", date : 1340020000 } })
[08:47:38] <NodeX> My mistake - I read the source document wrong
[08:47:41] <remonvv> No, there's a semantic difference between two fields having to match for a single array element and either field having to match for any but not necessarily the same element in the array.
[08:47:52] <NodeX> I thought bbb & the timestamp were the same on BOTH documents
[08:47:56] <mids> anyway, the client is happy. next! :D
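For reference, the three query shapes being contrasted here, using the names from mids' example (the $elemMatch form is my addition):

    // exact-subdocument match: the array element must equal this whole object, nothing more
    db.bbb.find({ steps: { name: "aaa", date: 1340020000 } })
    // both conditions must hold on the SAME array element
    db.bbb.find({ steps: { $elemMatch: { name: "aaa", date: 1340020000 } } })
    // dot notation: each condition may be satisfied by a DIFFERENT element
    db.bbb.find({ "steps.name": "aaa", "steps.date": 1340020000 })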
[08:55:10] <NodeX> it beats the old record hands down though, which means encryption needs to get better or else Moore's law will nullify SSL very very soon
[08:55:25] <remonvv> You have to wonder what comes first. Household quantum computers that make the research useless or conventional computers that start cracking this stuff in any sort of practical speed.
[08:56:01] <NodeX> good software and good hardware
[08:56:36] <remonvv> Well, effort to bump up encryption bits to something higher is endlessly easier than speeding up brute force decryption. I somehow doubt this'll ever really get relevant in day to day life.
[08:57:19] <NodeX> let's hope they make a brain chip to wirelessly beam keys to the user!!
[09:04:00] <JamesHarrison> Hi - relatively new to MongoDB, working on a site which makes fairly heavy use of $nin to filter out content - a query like {$orderby: {created_at: -1}, $query: {tag_ids: {$nin: ['some-tag']}, hidden_from_users: {$in: [None, False]}}} appears to completely not use an index set up for created_at-1, tag_ids 1, hidden_from_users 1. What am I doing wrong?
[09:05:33] <JamesHarrison> Tried to, but it's not making the query performance any better, which makes me think I may be missing something more fundamental with my query structure.
[09:05:51] <JamesHarrison> (nscanned still much, much higher than n in explain)
[09:11:12] <NodeX> I can't see anything that jumps out
[09:11:37] <NodeX> is the index definitely built and not still building in the background?
[09:12:01] <JamesHarrison> Not seeing it in currentOp and it appears to be built; I didn't do it as a background thing to start with
[09:14:09] <NodeX> try to drop the index and rebuild it .. maybe that will kickstart it
[09:14:44] <JamesHarrison> That query type is currently showing at about 150ms avg, some (with many nin tags) up to 1.5 seconds, which is a bit high for an 8-core dual xeon box with 16 gigs of ram, SSDs and only 13,000 objects (38MB) total to look through
[09:14:45] <remonvv> It won't; $nin won't hit an index
[09:14:55] <JamesHarrison> and there's the fundamental thing I'm missing
[09:16:50] <remonvv> It can use an index but a) it almost always results in very little performance gain and b) only hits it for relatively simple indexes.
[09:17:00] <carsten_> no, it can not be fixed when you know how indexes work :)
[09:17:14] <remonvv> It can be fixed to use the index, not to make it fast.
[09:17:20] <NodeX> I use it in a single query like this ... settings : {$nin : ["Admin"]}
[09:17:31] <remonvv> That works, it's just ineffective ;)
[09:17:35] <NodeX> I hope it works ok for that (not that I have noticed it not using it)
[09:37:45] <JamesHarrison> well, it doesn't pick up on it, and if I hint to explicitly specify that query, http://pastie.org/private/u82xqwmedtpax4r85nt9na
[09:37:49] <remonvv> Can you drop the index it's hitting now and see if it hits the new one?
[09:37:56] <JamesHarrison> performance is even worse
[09:40:05] <JamesHarrison> This should surely at least be: sorting by created_at, then, for the resulting data set, getting at least 100 with the first query filter, paring down with the second filter (both with indexes), and doing that again till it hits 100
[09:41:09] <remonvv> Well if it would do that (it doesn't) it would have to sort the entire dataset before doing anything else.
[09:41:16] <remonvv> That alone is bye bye performance ;)
[09:41:28] <remonvv> It always reduces the data set before doing any sort of ordering.
[09:42:25] <remonvv> Okay, so what is happening is this :
[09:42:28] <JamesHarrison> and the latter is also hitting a small percentage (2-5% maybe, varies per user)
[09:43:40] <remonvv> first index field doesn't help much at all and thus the candidate set is huge, the $nin that follows is slow (regardless of index) because it has to scan a huge candidate set. Once that has all happened MongoDB has to sort the full set in memory (scanAndOrder: true) and then return the first 100.
[09:44:04] <remonvv> I'd be very surprised if removing the limit results in much worse performance.
[09:44:23] <remonvv> Indexes need high cardinality fields to shine
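A rough shell version of the query under discussion, to check remonvv's diagnosis in the explain() output (the collection name is a guess, and the Python None/False become null/false in the shell):

    db.images.find({ tag_ids: { $nin: ['some-tag'] },
                     hidden_from_users: { $in: [null, false] } })
             .sort({ created_at: -1 }).limit(100).explain()
    // compare nscanned to n, and look for scanAndOrder: true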
[09:45:17] <JamesHarrison> Yeah. Okay, so for the problem we have here (filtering images tagged with a set of tags (there are 10,000+ tags and growing), filtering hidden images, and sorting) what's the right approach in MongoDB terms?
[09:45:43] <JamesHarrison> Doing a quick query to find the image IDs we want to exclude, then excluding them and sorting?
[09:46:01] <JamesHarrison> (though I guess $in is going to suck pretty hard performance-wise too...)
[10:10:24] <remonvv> Inform me about how I fixed all your problems and changed the way you view life.
[11:22:48] <JamesHarrison> remonvv: sorry, real life intervened - good stuff, I'll have a play, thanks to you and NodeX
[11:52:57] <ahmeij> Hi! I'm new here, and have a question that is probably an education issue, but I can't find an answer. I have a mongo replica set, 3 equal rackspace cloudservers, with a significant load, where 2 servers run with a load of 20 (the master and 1 slave) and the 2nd slave is idling at load 0. However, mongostat seems to indicate the second slave is processing just as many queries as the first slave
[11:54:16] <ahmeij> I dont think this is typical behaviour, but maybe it points to a common configuration mistake?
[11:57:13] <carsten_> driver issue where the driver can not balance the load properly across the slaves and master?
[11:58:27] <ahmeij> all connections come from the default mongo driver for ruby; I have tried to investigate that a bit, however mongostat indicates all queries are split over the 2 slaves and all updates go to the master (as expected)
[12:10:34] <remonvv> ahmeij, are all members up to date? One possible explanation is that your idle member is behind, which results in faster queries (less data to query on). That's a bit far-fetched though.
[12:25:45] <mklappstuhl> I want users to be able to set up their own "content structures" where they enter a name and add a set of properties... then they should be able to save this "form" and "instantiate" it later and it should be saved with actual data
[12:26:19] <mklappstuhl> It's kind of similar to what you can do in some CMS where you can define custom content elements
[12:26:50] <mklappstuhl> I'm pretty sure there is a name for that, too
[12:27:03] <mklappstuhl> Anyone have an idea how to implement something like this with mongodb?
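One possible shape for this, sketched with made-up names (a "structures" collection for the user-defined definitions and an "entries" collection for the instantiated data):

    db.structures.insert({ name: "Recipe",
                           fields: [ { key: "title",    type: "string" },
                                     { key: "servings", type: "number" } ] })
    db.entries.insert({ structure: "Recipe",
                        data: { title: "Soup", servings: 4 } })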
[12:39:16] <npx> So what's up with the V8 port? Is it going to be possible for Node.js to share a libv8 instance with MongoDB?
[12:47:55] <remonvv> It's in progress afaik. Why would you want it to share the js context?
[12:49:09] <npx> it would remove the need for a client library and just let me write Mongo queries directly
[12:49:40] <npx> which I can sorta do, it's probably a bad idea for them to share a context for a dozen reasons, it was just the first thing that came to my mind
[12:50:18] <remonvv> That, and I don't think your premise is correct. You'll need a library of some sort to convert your queries to the MongoDB line protocol.
[13:02:41] <lazymanc> hi, I've just installed mongodb from the 10-gen repo on ubuntu 12.04 (VM) and it's now sat there idling at 15-20% cpu - is that normal?
[13:02:53] <lazymanc> this is without me accessing it
[13:07:34] <lazymanc> VM is allocated 512mb total - could this be the reason?
[13:13:16] <remonvv> No it's not normal and I'm not sure why limited memory would cause cpu load.
[13:14:04] <Gargoyle> lazymanc: What is idling, the vm or the host?
[13:15:15] <lazymanc> remonvv: was wondering if it was swapping to virtual disk because of limited memory (completely new to this so just guessing)
[13:15:43] <lazymanc> Gargoyle: I mean mongodb wasn't doing anything, it was just started but nothing is accessing it as yet
[13:16:49] <Gargoyle> lazymanc: I have a 3 node replica set running on virtual hardware idling at 4% (when I say idling, there could be 1 or 2 users + mms agent)
[13:17:08] <lazymanc> top is showing mongod at 15-20% usage on the ubuntu guest
[13:17:37] <lazymanc> i'll take a look at the log and see if that gives any ideas
[13:20:47] <Gargoyle> What would be "a lot" of connections for a mongod server replicating to 2 more nodes, and running 2 web apps (in testing, so only a handful of users / apache threads)
[13:21:06] <Gargoyle> Better question: what would be "normal"?
[13:37:08] <remonvv> Most drivers follow a rather odd pattern of establishing new connections until they run out of allowed connections from a single host.
[13:38:05] <remonvv> Gargoyle, it's not a worrying amount of connections but you can control the maximum amount of connections per client if you feel it should use less.
[13:45:12] <Gargoyle> Well, if it's not a worrying amount, then I'll let it be - at this moment in time, I really have no idea what a sensible value would be!
[13:48:20] <wereHamster> what is the difference between { $pull : { field : {field2: value} } } and { $pull : { 'field.field2': value } } ?
[13:57:44] <BlenderHead_625> Hi guys. I'm trying to $pull all elements from an array where the first element of each sub-array has a given value. I tried {'$pull': {'bookmarks.0': ...}} but that removed the first element of the array. I want to remove the entire sub-array from its parent array if its first element matches the criteria. Any pointers?
[14:01:55] <BlenderHead_625> What I want is {'$pull': {'bookmarks': {0: 'match'}}} but it won't let me do that.
[15:22:45] <JoeyJoeJo> I'm setting up a new MongoDB and as of yet I don't have much data, but I plan on having more and more as time goes on. Can I set up one shard initially and add more as needed?
[15:32:58] <JamesHarrison> doh, wrong channel, my bad
[15:33:43] <JoeyJoeJo> Ok, so one last question. I'm planning on getting 4 Hi-mem large EC2 instances. One will be my app server, mongos and configdb. The other three will be my first shard. Does that pass the sanity check?
[15:34:21] <remonvv> Well, what kind of read:write ratio do you expect?
[15:35:57] <JoeyJoeJo> I don't have an exact answer to that yet, but I'd say much more reading than writing. Data will come in large bursts every few days or so, but I assume clients will be looking at the data at any time
[15:36:00] <remonvv> I'm assuming with 3 machines for 1 shard you mean a repset
[15:36:28] <JoeyJoeJo> Is it not true that all shards are repsets?
[15:36:36] <remonvv> Ah right. So assuming those 3 machines form a replica set and your queries are all slaveOk=true then that should be fine.
[15:41:35] <remonvv> It's easier for you to read up on it than for me to explain it in a few sentences ;)
[15:42:10] <remonvv> Sharding is basically a horizontal scaling strategy whereas replica sets are mostly a durability/availability feature. With that said, a replica set does scale up reads.
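Roughly, the "add more shards as needed" flow looks like this, assuming a shell with the sh.* helpers and made-up hostnames, database and shard key:

    sh.addShard("rs0/host1:27017,host2:27017,host3:27017")   // the initial replica-set shard
    sh.enableSharding("mydb")
    sh.shardCollection("mydb.events", { _id: 1 })
    // later, when the data outgrows one shard:
    sh.addShard("rs1/host4:27017,host5:27017,host6:27017")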
[16:08:29] <rguillebert> I would like to unittest my python webapp that uses mongodb, do you know if there's a mock version of mongodb (or an in-memory version of mongodb) ?
[16:29:35] <drockna> I'm trying to start a mongod in an openvz read-only container. What folders do I need to make writable for mongod to work?
[16:33:19] <pulse00> hi all. i'm trying to dig into document oriented databases, coming from a rdbms background. when the domain model is a relation between book and author (many-to-many), how would this be stored in a document oriented db? have a document "author", which then contains an array of books?
[16:51:55] <kali> pulse00: an author collection, a book collection, most likely. then you can deal with the relation on either side, or both sides.
[16:52:33] <kali> pulse00: the important thing with doc-oriented DB is not the static description of your model, but rather the way you will need to read it and access it
[16:53:41] <pulse00> kali: yeah, that's what i was asking myself. when i want to show a paginated list of all books sorted by release date, having the book as an "inner" document of the author looks odd from an rdbms perspective.
[16:55:19] <kali> pulse00: usually what you embed is not the "strong" entities of your model
[16:55:41] <pulse00> kali: i see, something like a comment in a blog post i guess
[16:57:11] <pulse00> and then i would issue 2 queries? first get the author, then the books by id?
[16:57:44] <kali> i was thinking the opposite, but yes
[16:59:28] <kali> if you embed author ids in books, you can also use an index on the author array. that way you don't have to denormalize and maintain the relation both ways
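A sketch of kali's suggestion, keeping the relation on the book side and indexing it (authorA/authorB stand in for real author _ids; ensureIndex is the older name for createIndex):

    db.books.insert({ title: "Some Book", released: ISODate("2012-06-01"),
                      author_ids: [ authorA, authorB ] })
    db.books.ensureIndex({ author_ids: 1 })
    // paginated list of one author's books, newest first
    db.books.find({ author_ids: authorA }).sort({ released: -1 }).skip(0).limit(20)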
[19:06:06] <MrJones98> hello - i have an issue understanding some semantics but may be specific to the pymongo driver
[19:06:13] <MrJones98> is this a reasonable place to ask?
[19:47:33] <Zelest> I thought of having a collection with embedded docs (one to many) where the content basically is "one domain/site and multiple pages" .. however, I wish to be able to select a single "page" within a document as well as sort them to request the "oldest" page within that document.. is this possible?
[19:50:44] <dstorrs> Zelest: it's probably possible, but I'm not clear on the goal. what are you trying to achieve?
[19:51:52] <Zelest> I'm writing a crawler and I plan to use mongodb to store the data on domains and pages within these domains.. and I wish to keep that in the same document to avoid excessive lookups.
[19:53:01] <Zelest> not sure on the most optimal solution for this.. and I use find_and_modify to fetch the pages to crawl (in order to make the crawlers concurrent)
[19:53:16] <dstorrs> ok. I would recommend profiling with embedded vs separate collections before assuming this will be optimal, but it sounds plausible at least.
[19:53:38] <dstorrs> find_and_modify is DEADLY slow
[19:54:35] <Zelest> if I have 10 threads fetching "the oldest urls" .. I want to get the 10 oldest, right? :P
[19:54:39] <dstorrs> better solution is to throw out an optimistic update() where you set some attr on the item (e.g. "owner : my_id") then do a find()
[20:00:29] <dstorrs> anyway, I see two paths you could take.
[20:01:08] <therealkoopa> If I have a list of embedded documents, is there a way to run a query to find all of the embedded documents up to a certain object id? For instance, [1, 2, 3, 4, 5]. Grab me everything up to objectid 3903 which is the 4th
[20:01:35] <dstorrs> First would be a "vertical" stack of pages, like so: { site: 'foo.com', page : 'http://...', owner : 'none', lock_expires: 0 }
[20:02:07] <dstorrs> oops, add an attr -- "added : $EPOCH" to that.
[20:03:14] <dstorrs> then, when you want to query it, you do: db.coll.update({ site : 'foo.com', owner : 'none' }, { $set : { owner : my_id(), lock_expires : time() + 10 } })
[20:03:33] <dstorrs> and db.coll.find({ site : 'foo.com', owner : my_id() })
[20:03:53] <dstorrs> add a sort onto whichever of those is most relevant.
[20:04:28] <Zelest> that's almost how I use it now.. but with code (http code) set to 0 if "in progress" ..
[20:04:35] <Zelest> but using find_and_modify that is
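A sketch of how the lock_expires field above might be used to reclaim stalled work, reusing dstorrs' placeholder my_id() and time() helpers:

    db.coll.update({ site: 'foo.com',
                     $or: [ { owner: 'none' }, { lock_expires: { $lt: time() } } ] },
                   { $set: { owner: my_id(), lock_expires: time() + 10 } })
    db.coll.find({ site: 'foo.com', owner: my_id() }).sort({ added: 1 })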
[20:04:54] <dstorrs> the second way is to have generally the same structure, but "stripe" them horizontally, like so { site : 'foo.com', pages : [ { owner : 'none' ...} ]}
[20:06:06] <dstorrs> In profiling, I found that up to ~50 pages, I was getting better performance with embedded docs than a straight collection, but above that it flipped.
[20:06:54] <dstorrs> using embedding complicates your code a lot, though.
[20:07:33] <dstorrs> also, watch out for the "16M per doc" limit if your max number of pages is very high, or if the page objects are large
[20:07:54] <Zelest> ah, i will use postgresql for the actual content of the page
[20:08:41] <dstorrs> sure, but if that metadata document is large-ish -- say, 1k -- and you have 1,000 pages per site, that means the entire site document is ~1M.
[20:09:18] <dstorrs> You can't retrieve just one page from the array, you need to pull the entire pages array over the wire. => slow
[20:10:15] <dstorrs> even if 99.9% of your sites will have only a dozen pages, you need to account for the one Googlebomber with 50,000 pages on his site. That blows the doc limit.
[20:11:39] <Zelest> ah, then that settles it.. seeing opennic has plenty of wikileaks clones..
[20:11:55] <Zelest> not to mention "mistakes" like spider-traps and such
[20:12:25] <Zelest> i'm still surprised you say find_and_modify is so horribly slow.. :o
[20:12:50] <Zelest> i use it on my mini-crawler on httpdb.org .. and it can easily reach up to 40-50 sites a sec
[20:12:56] <dstorrs> oh and, before you think of it -- don't even think about using one collection per site. http://www.mongodb.org/display/DOCS/Using+a+Large+Number+of+Collections
[20:13:23] <Zelest> wow, haha, nah, that's too extreme
[20:13:42] <Zelest> opennic only has around 10 mil pages
[20:13:46] <dstorrs> I was getting about 100 jobs a second with FaM, vs something like 5000 with update(). but don't take my word for it -- do your own profiling.
[20:14:51] <dstorrs> it's possible it was an error on my part -- poor / missing index, inefficient query, etc.
[20:21:25] <Zelest> regarding indexes.. if I want to find a document where site: is "foo.com" and the path: is "/robots.txt" .. I do want to use an index to find /robots.txt, but I don't want to create a full index on the path:, only index the entries which are /robots.txt.. is that doable?
[20:21:42] <Zelest> or will I have to store that as a separate field?
[20:48:15] <kenny> I'm looking to create trigrams on an object. Would it make more sense to store them directly on each object, or in a separate collection with a list of matching objects? (there'll be a lot of overlap)
[20:53:09] <kenny> (the problem I'm trying to solve is fast regex searching across around 25G of data)
[21:06:26] <therealkoopa> Any idea why a console.log(change) gives me an object that has a property called name that's set, but if I do change.name it's undefined?
[22:15:50] <jonwage> possible that I issue some background operation or something that then blocks later tests?
[22:18:21] <ahmeij> Hi, can anyone please help by explaining where i need to start looking to solve the following?: query sr_production.profiles ntoreturn:1 nscanned:144381 scanAndOrder:1 nreturned:1 reslen:1340 6861ms
[22:19:48] <ahmeij> I've been looking all over but I cant seem to find one responsible query (although it should be somewhere I guess :) )
[22:21:59] <ahmeij> Is there a way to force the server to log the actual query?
[22:34:48] <jY> did you find that via the query profiler?
[22:39:54] <dstorrs> ahmeij: note that, as jY implied, count() is always slow. That's an infrastructure problem -- Mongo uses uncounted BTrees for its indices, so a count is always a table scan
[22:41:07] <ahmeij> I tried the query profiler just now
[22:41:27] <ahmeij> However all reads go to slaves, which won't allow me to check the db.system.profile collection
[22:41:43] <ahmeij> and they are not replicated to the master it seems
[22:42:27] <ahmeij> dstorrs: I think I found my culprit, a relative new query via mongoid, however I cant determine the exact query format yet, as to build the best index
[22:43:08] <dstorrs> sounds like a good start, at least.
[22:43:14] <ahmeij> a count of the entire collection takes less than 100ms
[22:43:27] <ahmeij> so slow indeed, but not the cause
[22:43:39] <ahmeij> I'm investigating this query further indeed
[22:55:52] <kenny> dstorrs: I'm not sure if full text search is what I'm trying to do. Googling now. I'm trying to do fast full regex searching on at least one text field of around 500 bytes.
[22:56:54] <dstorrs> I don't know if it fits your use case, but you may want to at least consider using something like Solr that's purpose-built for text searching
[22:57:39] <dstorrs> you can probably make it work with Mongo, but it may end up being cheaper and easier with something that's dedicated to the job
[22:58:19] <kenny> thank you. I'll check that out and see if it'll do what I need.
[22:58:41] <dstorrs> no worries. There's other things like it too. That's just the one that comes to mind first.
[22:58:45] <rossdm> look into ElasticSearch too, it indexes json docs
[23:06:55] <retworda> So I'm changing _id in one of my collections. Have been running this, but it's taking forever. Have I made a mistake? The docs I'm inserting aren't being picked up by the find, are they? db.coll.find().forEach(function(d) { // remove d, edit it, reinsert it })
[23:07:39] <ahmeij> But it does have an index: "v" : 1, "key" : { "identities.service" : 1, "identities.uid" : 1 },
[23:12:48] <ahmeij> 200k records, query with an index, resulting in either 1 or 0 records takes 8+ seconds :( if anyone has a clue (see above) please help
[23:23:37] <retworda> Pls ... http://pastie.org/4129175 Is there any chance the function is interfering with the set of documents being iterated over? (the inserts making the set endless?)
[23:24:03] <ahmeij> brilliant, the application server load just spiked because it can finally move at a decent speed :)
[23:24:07] <dstorrs> ahmeij: This may or may not be useful to you. It was to me: http://kylebanker.com/blog/2010/09/21/the-joy-of-mongodb-indexes/
[23:28:34] <Zelest> dstorrs, if I run an update.. and a find afterwards.. is it ever possible that the update won't apply until after the find has been done?
[23:28:43] <Zelest> dstorrs, as in, does the update lock?
[23:29:13] <jY> update will lock since it is a write
[23:29:28] <Zelest> ah, as i thought.. just wanted to doublecheck it. thanks :-)
[23:29:42] <dstorrs> Zelest: also, you can use 'safe' and 'w : 2' for extra safety
[23:31:47] <ahmeij> Zelest: In my experience, certainly with slaves but even without, if you have async writes (the default?) the read of a just-written record can return the previous version; we switched to safe mode for all writes because of that
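In the shell, the "safe" / w:2 write dstorrs mentions boils down to checking getLastError after the write (the collection and document here are made up; w:2 waits for one secondary to acknowledge):

    db.items.insert({ _id: 1, status: "done" })
    db.runCommand({ getLastError: 1, w: 2, wtimeout: 5000 })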
[23:35:32] <ahmeij> does that mean that if you get a high enough experience level through use you will automatically be hired by 10gen ? :P
[23:35:55] <bjori> on that note... we are hiring :)
[23:36:00] <dstorrs> "automatically" is a strong word, but there's likely a good chance
[23:36:25] <ahmeij> I just learned that you already employ the 3 gurus here :)
[23:36:28] <dstorrs> I don't know of a non-legacy tech company that ISN'T desperate for good people right now
[23:37:10] <ahmeij> True, though I find it strange that a company like, for example, github would need 50+ developers
[23:37:56] <dstorrs> A successful company always has more projects they would like to do than hands to do them.
[23:38:21] <dstorrs> One of the biggest things that made google successful was that they got started right after the last tech boom. There was a huge glut of available talent in the Valley,
[23:38:32] <dstorrs> and they hired them all, even if they didn't have jobs.
[23:38:50] <ahmeij> I think to be successful in the long run it is important to keep focussing on the core business, --> Google is a nice exception though
[23:38:54] <dstorrs> That way, they weren't available to the competition and Google could expand fast
[23:40:02] <dstorrs> If you're a business with a future, your "core business" generally has a lot of room in it. I run a startup, and I could easily find work for 50 good devs / sysadmins / designers / etc
[23:40:57] <dstorrs> Sadly, we are not currently hiring, due to resource limits
[23:41:31] <ahmeij> I run one too, and while we could have a few more, I'd think 5 would be sufficient :) however once we have 5 our goals would probably expand faster etc
[23:42:27] <ahmeij> Same issue, we need a bit more cash to grow the software a bit faster to be more attractive to VCs, and that completes the chicken and egg problem
[23:51:03] <mmlac> Hello everyone. I am struggling with loading my custom Mongoid::MyMongoidModule with rails standard autoloading, so I was wondering what is the best practice to extend Mongoid? Right now I create lib/mongoid/mymongoidmodel.rb
[23:51:58] <mmlac> and in that, standard module Mongoid module MyMongoidModule end end. Normal autoload fails (Expected to define MyMongoidModule at load_missing_constant)