[00:23:29] <quuxman> I'm struggling with getting my queries to use indexes at the moment
[00:24:44] <quuxman> I'm also having issues with a recent upgrade from 2.4 to 2.6, and failing to build indexes at all. There's the failIndexKeyTooLong server parameter, but they (MongoHQ / Compose.io) don't expose it. Is there a way to set it from the Mongo client?
[00:27:30] <quuxman> now I just have to figure out how to make the mongo client connect. For some reason using the exact same connection string I'm using in pymongo fails
[00:29:51] <quuxman> can the mongo client connect to a replica set? My connection string has multiple servers
[00:36:02] <quuxman> er, sorry "exception: HostAndPort: bad config string"
[00:39:47] <quuxman> AH ok, I think I've run into this before: for some ridiculous reason the mongo shell client doesn't accept standard connection strings at all
[00:40:03] <quuxman> I imagine I can use runCommand with pymongo
[00:40:58] <quuxman> should've done that to start with
[00:43:03] <quuxman> I managed to authenticate with the mongo shell, and now I see: "errmsg" : "setParameter may only be run against the admin database.", but I think I can figure that out ;-)
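For reference, a minimal pymongo sketch of the fix quuxman is converging on: run setParameter against the admin database. The connection string and credentials are placeholders, and hosted providers (MongoHQ / Compose.io) may not allow setParameter at all.

```python
from pymongo import MongoClient
from bson.son import SON

# Placeholder URI; replace with the real replica-set connection string.
client = MongoClient("mongodb://user:password@host1:27017,host2:27017/admin")

# setParameter may only be run against the admin database.
result = client.admin.command(
    SON([("setParameter", 1), ("failIndexKeyTooLong", False)])
)
print(result)
```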
[00:55:58] <Terabyte> hey, I have a document that exists, and I want to add to a subportion of it (in this example, I want to add another sub element to the array of subValues belonging to itemid 4 belonging to myId 1. Still trying to get relational world out of my head, what's the "doc" way of doing this? https://gist.github.com/danielburrell/8249094d5008c9f62fe1
[00:57:58] <Terabyte> A dumb way is to select the document I want, turn it into a POJO, add the item to my POJO, and write the document back (overwriting/deleting the original). But I'm fairly sure update() exists for a reason...
[00:59:12] <Terabyte> thanks i'll take a look at that
[01:16:07] <quuxman> When you create an index, the keys are specified in a dict, so I assume there is no order enforced to the index?
[01:16:52] <quuxman> (like in MySQL (at least when I used it), where an index on (foo, bar) serves different queries than an index on (bar, foo))
[01:28:12] <lpghatguy> Is there a better way to force a case-insensitive unique index than storing a duplicate normalized case version with its own unique index?
[01:28:23] <lpghatguy> err, that's probably more of a Mongoose-specific question
[01:32:35] <GothAlice> quuxman: No, BSON dictionaries are ordered. Unfortunately not all languages have ordered dictionaries natively, so some drivers, like pymongo, provide custom datatypes (e.g. bson.SON) to handle it.
[01:33:54] <GothAlice> quuxman: This means that an index on {company: 1, date: -1} will only optimize queries on (company) and (company, date). Luckily the query optimizer usually figures out how to re-arrange incoming queries (if possible), but sometimes you need to give MongoDB a hint. Thus: http://docs.mongodb.org/manual/reference/operator/meta/hint/
[01:35:48] <GothAlice> (There are some nifty optimizations around having frequently accessed data accessible entirely within index data. In an $explain you can see if you're getting this optimization when indexOnly is true.)
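A small pymongo sketch of what GothAlice describes: index key order is preserved by passing a list of (field, direction) pairs, hint() forces a specific index, and a projection limited to indexed fields can produce a covered query (indexOnly: true in a 2.6-era explain). Collection and field names here are illustrative.

```python
from pymongo import MongoClient, ASCENDING, DESCENDING

db = MongoClient().test  # illustrative database

# Key order is preserved: this index serves (company) and (company, date) queries.
db.events.create_index([("company", ASCENDING), ("date", DESCENDING)])

cursor = (
    db.events
    .find({"company": "acme"}, {"company": 1, "date": 1, "_id": 0})  # covered projection
    .sort("date", DESCENDING)
    .hint([("company", ASCENDING), ("date", DESCENDING)])  # only needed if the planner guesses wrong
)
print(cursor.explain())  # in 2.6-era output, look for indexOnly: true
```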
[01:39:08] <hicker> Hi everyone, can anyone tell me why I get { __v: 0, _id: 5452dc48d687bad849d70816 } after executing create()? http://pastie.org/9686889
[01:43:12] <GothAlice> hicker: __v looks like a version number, and drivers may return the inserted document's ID upon insert. Unfortunately the documentation for mongoose is bad and should feel bad for not specifying the return value more explicitly than just "<Promise>".
[01:46:44] <hicker> Oh, hmm. I was expecting there to be a customerName node in the document after executing Customer.create({ customerName: 'John' }, ... Do I need to execute something after create()?
[02:44:23] <Terabyte> I wanted to generate a unique id for a sub component of my document (what used to be autoincrement in relational world), it doesn't have to be a number, but it does have to be a scalar value. Right now I'm looking at the output, and I'm seeing a details JSON object with timestamps etc. is there any way to get a single value?
[02:44:51] <Terabyte> I'm using new ObjectId() to generate the ID at the moment (org.bson)
[02:45:20] <rkgarcia> Terabyte, that's right, use ObjectId to generate new IDs
[02:45:51] <Terabyte> right, but i'd like my ids to be a number, id:1, id:2..3..4. not id{metadata:12387123, timestamp:129724, version:923928} etc
[02:46:27] <GothAlice> Terabyte: How do you plan on keeping the numbers sequential? Are duplicates allowed?
[02:46:29] <rkgarcia> Terabyte, then you need to use a counter collection
[02:46:57] <Terabyte> GothAlice duplicates are not allowed,
[02:47:00] <GothAlice> Terabyte: Counters like that add an extra layer of locking (to be safe), and are such a problem Twitter created an entire service (Snowflake), separately scalable from the database, just to handle it.
[02:47:21] <GothAlice> (And are a big reason why MongoDB uses a better structure.)
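For completeness, a minimal sketch (modern pymongo, hypothetical "counters" collection) of the counter-collection pattern rkgarcia mentions; the single document it increments atomically is exactly the hot spot GothAlice is warning about.

```python
from pymongo import MongoClient, ReturnDocument

db = MongoClient().test  # illustrative database

def next_sequence(name):
    # Atomically increment one counter document per sequence name.
    doc = db.counters.find_one_and_update(
        {"_id": name},
        {"$inc": {"seq": 1}},
        upsert=True,
        return_document=ReturnDocument.AFTER,
    )
    return doc["seq"]

print(next_sequence("item_id"))  # 1, 2, 3, ...
```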
[02:47:54] <GothAlice> https://github.com/bravecollective/forums/blob/develop/brave/forums/component/thread/model.py#L100-L110 < Even I generate ObjectIds for subdocuments; replies to a thread on a forum, in this case. The IDs are used frequently to look up specific comments, edit them, etc.
[02:49:31] <Terabyte> if people use that metadata when making changes then that's ok
[02:49:51] <GothAlice> https://github.com/bravecollective/forums/blob/develop/brave/forums/component/thread/model.py#L135-L161 < This does some argument list mangling, but yes, it boils down to finding the thread containing the reply (comment) with the appropriate ID, then updating a la "comments.$.foo"
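Roughly what that linked code boils down to, as a pymongo sketch (collection and field names are made up for illustration): match the parent document by the subdocument's id, then use the positional $ operator to update just that array element.

```python
from bson import ObjectId
from pymongo import MongoClient

db = MongoClient().forum  # illustrative database

comment_id = ObjectId()  # in practice, the id sent back from the client

db.threads.update_one(
    {"comments.id": comment_id},                       # find the thread containing this comment
    {"$set": {"comments.$.message": "edited text"}},   # $ = the matched array element
)
```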
[02:51:57] <GothAlice> Terabyte: http://cl.ly/image/1U3z3h1q0r2T < When I emit HTML, I include important details in a way that is easily accessible from JavaScript. :)
[02:52:23] <GothAlice> (Thus most of my JavaScript is completely generic throughout my front-end.)
[02:55:48] <raar> Hey guys, I'm getting a bunch of errors in my web app like "71: Failed to connect to: mongo-example:27017: Remote server has closed the connection in /var/www/example.php on line X".
[02:55:58] <raar> this only happens sometimes, most connections are fine
[02:56:26] <raar> I'm not seeing many slow queries, the disk i/o should be sufficient, I don't really know what the cause could be. I'm using mongodb 2.6.5
[02:56:36] <raar> my ulimit (hard and soft limits) is 64000
[02:56:50] <raar> I get between 0 and 500 update queries per second.. (and between 0 and 20 inserts)
[02:57:15] <raar> I normally don't get more than 35 simultaneous connections (have 51168 available)
[02:57:35] <raar> I don't really know how to troubleshoot it.. any ideas/suggestions?
[02:58:23] <raar> (also, it seems worse after restarting mongod.. more application errors, but I only _think_ this is the case [don't want to restart mongo again to see if I get more problems in production])
[02:58:43] <raar> I have 500 provisioned iops (aws volume) on an ssd (io1)
[03:58:46] <Terabyte> GothAlice see I'm having trouble now deserializing...
[03:59:05] <Terabyte> web browser sends the ID of the document, but there are too many fields for the constructor of ObjectId
[03:59:36] <Terabyte> so the JSON parser is having issues. At first I thought it was a naming convention issue, but then I looked at the constructor, and there are only 3 values vs. the ID, which contains 7
[04:39:06] <joannac> Ceseuron: I suggest you ask your actual question instead of asking meta-questions
[04:39:30] <Ceseuron> I was trying to avoid being rude and just filling up the channel. Sorry.
[04:39:48] <Ceseuron> Basically it goes like this...we're draining replicasets to migrate to new physical MongoDB servers
[04:40:09] <Ceseuron> The process is getting hung up on NUMEROUS duplicate key index errors
[04:40:40] <Ceseuron> And I basically have to sit here tailing the MongoDB log. When the duplicate error comes up, I have to do a manual removal of the duplicate document
[04:41:03] <Ceseuron> I'd like to know if there's a way I can scan the collection prior to shard drain for duplicate key indexes.
[04:41:34] <joannac> you are draining a replica set with indexes
[04:41:44] <joannac> you insert all of the data into a new server with the same indexes
[04:41:52] <joannac> and you get duplicate key exceptions?
[04:42:00] <joannac> where are your duplicates coming from?
[04:42:26] <Ceseuron> errMsg: E11000 duplicate key error index
[04:43:16] <Ceseuron> The duplicates are the result of an incident long in the past that never got corrected.
[04:43:37] <joannac> why don't you change your code to do upserts instead?
[04:44:29] <Ceseuron> I am not a developer. This is not my design. I'm the engineer responsible for draining these replicasets so the VMs that were running them can be removed and the entire collection consolidated down to physical systems.
[04:45:40] <Ceseuron> It all started out innocently enough, from what I understand. MongoDB on a few VMs to test feasibility before production use.
[04:45:54] <Ceseuron> That turned into eighty virtual machines.
[04:49:38] <Ceseuron> I run a couple of SSH sessions to the replica set's primary member. On one, I tail the mongodb log in realtime (-f) and grep for E11000
[04:50:13] <Ceseuron> When one comes up, I use the mongo shell on the other to do a db.members.app.remove({blablablabla})
[04:50:22] <Ceseuron> And the drain continues until the next error comes up
[04:54:49] <Ceseuron> Hmm...So when we had 40 shards going on 80 VMs, before I was actually working here, they added 4 shards running on some beefy servers with a hefty amount of storage and RAM.
[04:55:00] <Ceseuron> Then when I started, they handed the project to me.
[04:55:21] <Ceseuron> I really hate to say it, sir, but you probably know more than I do about Mongo at this point.
[04:56:24] <Ceseuron> I know enough to know that the duplicates have to be removed, otherwise Mongo simply fills up the disk space on the shard being drained, endlessly retrying the move of a duplicate document.
[04:57:05] <Ceseuron> I believe the consensus from the data team and the devs was "Mongo will simply move data to remaining shards until only the physical shards are left as destinations".
[04:57:32] <Ceseuron> If there was a way to define "who wins", you'll be the first person I've talked to that knows it.
[04:59:34] <joannac> Would you be just as happy removing it from A or B?
[05:00:01] <Ceseuron> I would be just as happy to make it so that the shard drain doesn't have me sitting here manually removing individual documents that could possibly number into the thousands.
[05:02:08] <Ceseuron> I would prefer, if possible, to get a list of documents that have duplicate keys in the collection.
[05:02:17] <Ceseuron> And simply remove those en masse.
[05:03:00] <Ceseuron> this is a production system. It is literally in production now with active users on the application that depends on it. I will probably get shot or at least fired if I cause a service interrupt.
[05:03:30] <joannac> how would you propose to do that? How would you even tell if there was more than one document with the same set of keys?
[05:04:39] <Ceseuron> the "ownerid" should be unique for every document. Aggregate all documents in the collection and list out any that hit more than once for the same key.
[05:05:16] <Ceseuron> In short, if the ownerid field is the same for more than one document, that's a duplicate and should be removed.
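A hedged sketch of that scan in modern pymongo (the collection name comes from the discussion; the database name and mongos address are placeholders, and the scan touches every document, so run it against a mongos off-peak): group by ownerid and keep only keys that occur more than once.

```python
from pymongo import MongoClient

coll = MongoClient("mongodb://mongos-host:27017").mydb["members.app"]  # placeholders

pipeline = [
    {"$group": {"_id": "$ownerid",
                "count": {"$sum": 1},
                "doc_ids": {"$push": "$_id"}}},
    {"$match": {"count": {"$gt": 1}}},
]

# allowDiskUse is available on 2.6+ and helps when the grouping exceeds 100 MB of RAM.
for dup in coll.aggregate(pipeline, allowDiskUse=True):
    print(dup["_id"], dup["doc_ids"])
```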
[05:14:06] <Ceseuron> I don't believe we talk to the shards directly.
[05:14:20] <Ceseuron> There are mongos routers that handle all that.
[05:15:46] <joannac> then I don't know how you got dupes
[05:16:46] <Ceseuron> I'm not entirely sure either. All I know is that there was an incident in days long since past where they ran into a problem with two primaries in the same replicaset.
[05:17:46] <Ceseuron> That's happening once this is done
[05:17:55] <joannac> Okay, so, even if there were 2 nodes in the same replica set, you shouldn't get dupes
[05:18:44] <Ceseuron> I'm not sure what the configuration was before. Now it's a primary, secondary, and an arbiter for what I assume is a quorum role.
[05:19:04] <Ceseuron> It was never supposed to grow in VM land
[05:19:46] <Ceseuron> When I got the project, there were 40 shards on 80 VMs. Before that, I believe it was more like 70 or 80 shards.
[05:19:59] <Ceseuron> And an entire ESXi cluster driving it.
[05:20:44] <Ceseuron> Had this been my project from the get go, it would have never made it into VMs for the primary and secondary. I would have demanded physical assets be used.
[05:26:30] <joannac> I think I have a similar script to do this kind of thing
[05:27:36] <joannac> I don't have the resources to modify it for your environment though
[05:27:51] <joannac> Maybe post on the Google Group or Stack Overflow and see if someone has the resourcing for it?
[05:28:43] <joannac> This is the kind of thing you would get with a support contract with MongoDB :)
[05:54:57] <mgeorge> i think i'll just use a separate db
[05:55:33] <mgeorge> building out a session.class.inc in php so it stores all session data in mongodb
[05:55:41] <mgeorge> web servers will be load balanced behind an F5
[06:06:31] <quuxman> If I have a query like so: {foo:'toggle1', bar:'toggle2', baz:'thing'}, sort=[('updated',-1)], and I have an index of {baz:1, updated:-1}, why can't this query use my index?
[06:06:53] <quuxman> I don't want to add foo and bar into my index, because they only have a handful of values
[06:07:25] <quuxman> Should I create a {baz:1, foo:1, bar:1, updated:-1} index?
[06:08:04] <joannac> you want an index on {updated: -1, baz:1}
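A pymongo sketch of the two candidate indexes being discussed, plus an explain() check to see which one the planner actually picks (collection name is illustrative; in the shell, explain(true) gives the more verbose output joannac asks for further down):

```python
from pymongo import MongoClient, ASCENDING, DESCENDING

db = MongoClient().mydb  # illustrative database

# quuxman's candidate: equality fields first, then the sort field.
db.items.create_index([("baz", ASCENDING), ("foo", ASCENDING),
                       ("bar", ASCENDING), ("updated", DESCENDING)])
# joannac's suggestion: lead with the sort field.
db.items.create_index([("updated", DESCENDING), ("baz", ASCENDING)])

cursor = (
    db.items
    .find({"foo": "toggle1", "bar": "toggle2", "baz": "thing"})
    .sort("updated", DESCENDING)
)
print(cursor.explain())  # check which index is chosen and how many documents are scanned
```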
[06:08:59] <quuxman> but if I take out the other clauses, it uses my existing index
[06:16:11] <quuxman> kk, I'll connect with mongo client. It's easier for me to use my DB code because I'm used to it
[06:16:34] <joannac> quuxman: there's no sort there
[06:17:11] <joannac> also I want explain(true), not explain()
[06:18:26] <quuxman> joannac: the sort is added by that function
[06:19:37] <quuxman> oh goddamn, it's too late. stupid mistake
[06:20:33] <quuxman> alright, onto the next entry in the slow query log :-/
[06:24:25] <quuxman> OK, enough with the specific examples. I need to understand the theory of how Mongo uses indexes with $and and $or
[06:27:36] <quuxman> say I have a clause like: {$or: [{foo: 'blah'}, {bar: 'blahblah'}, {small_set: 2, baz: {$in: [ID1, ID2, ..., ID10]}}]}
[06:28:29] <quuxman> and I have 3 indexes that start with foo, bar, and baz. Should this be able to use these 3 indexes? (my test says no)
[06:29:30] <quuxman> I would expect the planner to do three separate queries for each clause in the $or, stick them together, and then sort (on updated, which wouldn't use an index)
[06:31:47] <quuxman> hm, that's actually what it's doing, based on what I can understand from the explain, it's just scanning far more rows than I'd expect
[06:31:57] <quuxman> I should probably create a foo_bar_baz index?
[06:34:31] <quuxman> wow, using .explain(true) with the shell gives me way more information than with pymongo.
[06:36:24] <quuxman> would creating a foo_bar_baz index not help at all, because it's an $or clause?
[06:36:37] <quuxman> I'm fairly certain that's what I'd want to do if it was $and
[06:43:36] <quuxman> I'm reading about index intersection which is straightforward to understand, but I don't know whether it applies to $or
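A sketch of how to test that, with illustrative collection and values: each clause of an $or is planned independently, so it wants one index per clause (foo, bar, and the small_set/baz pair); a single compound foo_bar_baz index would not serve the bar or baz clauses on its own, and index intersection doesn't change that for $or.

```python
from pymongo import MongoClient, ASCENDING
from bson import ObjectId

db = MongoClient().mydb  # illustrative database

# One index per $or clause; each clause is planned on its own.
db.items.create_index([("foo", ASCENDING)])
db.items.create_index([("bar", ASCENDING)])
db.items.create_index([("small_set", ASCENDING), ("baz", ASCENDING)])

ids = [ObjectId() for _ in range(10)]  # stand-ins for ID1..ID10
query = {"$or": [
    {"foo": "blah"},
    {"bar": "blahblah"},
    {"small_set": 2, "baz": {"$in": ids}},
]}
print(db.items.find(query).explain())  # each clause should show its own index scan
```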
[13:54:28] <safsoft> the collection contains about 2 million documents, and the indexes are in place correctly. We executed a query with .limit(10000); it returns ~4800 documents, then the search freezes, and after a long time it returns the last part of the result
[13:55:42] <safsoft> If we try .count() on this query it freezes completely. Is there a way to accelerate at least the count?
[13:56:22] <safsoft> Please note that we execute the query directly on the console in a .js file
[13:58:35] <Salyangoz> are there any db/collection naming conventions in mongodb
[13:58:50] <Salyangoz> e.g. I use pep-8 for naming conventions in python
[14:01:40] <hicker_> Mongoose/Express question here: Why is a blank document created after executing Customer.create({ customerName: 'John' })? http://pastie.org/9686889
[14:08:33] <Terabyte> I have a document I'm persisting to mongodb, the document contains subsections which I might want to query for, and update. Do these subsections count as documents in their own right?
[14:10:36] <Terabyte> When modelling I had 2 classes: a parent object modelling the person, and a list of child objects. I gave the child objects an _id value, and told Mongo via jongo's @Id / @ObjectId annotations that these needed their IDs to be generated, but for some reason mongodb only generates the parent Id...
[14:17:06] <tscanausa> safsoft: if you are using the correct indexes then there is nothing more one can do, other than improving hardware performance.
[14:17:30] <Terabyte> sorry, I still don't understand how one can identify a subdocument without an id, even if the parentid is known that doesn't allow you to tell the server "i updated 'that' one"
[14:19:24] <cheeser> safsoft: you've run an explain to make sure you're using the index you think you are?
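For safsoft, a minimal pymongo version of that check (query and collection names are placeholders); the interesting fields in a 2.6-era explain are cursor (BtreeCursor vs BasicCursor), n, nscanned and millis:

```python
from pymongo import MongoClient

coll = MongoClient().mydb.big_collection  # placeholders
query = {"status": "active"}              # stand-in for the real query

plan = coll.find(query).limit(10000).explain()
print(plan.get("cursor"))     # "BtreeCursor <index>" means an index is being used
print(plan.get("n"), plan.get("nscanned"), plan.get("millis"))
```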
[14:19:54] <tscanausa> Terabyte: I like to think of documents in mongo as a JSON hash that has a pointer to it (aka the id), so: how would I normally update a sub-element in any JSON hash?
[14:20:01] <cheeser> Terabyte: well, subdocuments would need *some* kind of unique identifier. a mongodb/driver managed _id just isn't it in this case.
[14:23:41] <cheeser> Terabyte: unless there's some preexisting unique property
[14:25:59] <Terabyte> yeah the only way that would be possible is if on serialisation/deserialisation the elements knew their own position in the array.
[14:26:30] <Terabyte> even that's a pain, because if the user sorts locally then everything's broken
[14:27:06] <Terabyte> see, originally I was calling new ObjectId() for every subdocument, but then there were issues serializing because ObjectId() doesn't play nicely with jackson.
[14:27:15] <cheeser> positionally dependent updates are usually a code smell
[14:28:01] <Terabyte> agreed, my relational model had an autoincrement (sub-documents had their own table), so this was avoided that way.
[14:28:31] <tscanausa> sub documents could actually be a different collection
[14:28:59] <Terabyte> yeah, i thought that by doing that I was missing out on the doc way of doing things
[14:29:45] <tscanausa> its all about the trade offs.
[14:32:12] <smik> How to query 'man.property[0].size' > 300 ?
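Assuming man is the collection and property is an array of subdocuments, dot notation with a numeric index targets a specific array element, so a pymongo sketch of smik's query would be:

```python
from pymongo import MongoClient

man = MongoClient().mydb.man  # assuming 'man' is the collection

# "property.0.size" addresses the size field of the first element of the property array.
for doc in man.find({"property.0.size": {"$gt": 300}}):
    print(doc["_id"])
```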
[14:36:36] <tscanausa> can or used to rearrange arrays
[14:41:04] <Terabyte> cheeser so I've decided that on create I'm going to generate a UUID (using randomUUID), store that as a string, shouldn't be a problem for serialization that way. any thoughts? (performance or uniqueness i guess would be the only things)
[14:41:15] <Terabyte> (generate a UUID in java that is)
[14:44:55] <cheeser> ewww. creating an ObjectId is probably faster. ;)
[15:14:38] <Terabyte> hey, say I have {docid: 1, array: [{arrayid: 1, subArray: [{}, {}]}]}, how do I push onto the subArray belonging to arrayid 1 of docid 1?
[15:16:22] <Terabyte> found the $push feature but not sure how to describe an 'xpath-esque' {$push:array(arrayId=1).subarray{#}}
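A hedged pymongo sketch of that update (collection name is made up): match the outer array element in the query, then let the positional $ operator point $push at its subArray.

```python
from pymongo import MongoClient

db = MongoClient().mydb  # illustrative database

db.docs.update_one(
    {"docid": 1, "array.arrayid": 1},                      # select doc 1 and the element with arrayid 1
    {"$push": {"array.$.subArray": {"name": "new sub"}}},  # $ = the matched element of "array"
)
```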
[19:14:08] <nicolas_leonidas> is there a way to write a query that would return true false values based on whether or not a key exists in each document?
[19:31:21] <skot> nicolas_leonidas: see the aggregation framework: http://docs.mongodb.org/manual/core/aggregation-introduction/
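One way to do that with the aggregation framework skot points to, sketched in pymongo (collection and field names are hypothetical; note this treats an explicit null the same as a missing key):

```python
from pymongo import MongoClient

coll = MongoClient().mydb.things  # hypothetical collection

pipeline = [
    {"$project": {
        # True if "myKey" is present and non-null, False otherwise.
        "has_key": {"$ne": [{"$ifNull": ["$myKey", None]}, None]},
    }},
]
for doc in coll.aggregate(pipeline):
    print(doc["_id"], doc["has_key"])
```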
[19:32:50] <rekataletateta> I know, but I need more help
[20:15:53] <defk0n> does anyone here have experience with pymongo? I'm trying to create an ObjectId() but when I try to insert it into the db it says OverflowError: BSON can only handle up to 8-byte ints
[20:50:34] <nicolas_leonidas> hi, I'm using $cond and I wanna know if an array is empty or not in its if expression; here's what I have http://paste.debian.net/129641/
[20:56:26] <nicolas_leonidas> nm I had an extra bracket
[20:56:39] <nicolas_leonidas> typos cause 99% of my problems; I have fat fingers
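For the record, a sketch of the empty-array test nicolas_leonidas was after, using $cond with $size (field name is hypothetical; $size errors if the field is missing or not an array):

```python
from pymongo import MongoClient

coll = MongoClient().mydb.things  # hypothetical collection

pipeline = [
    {"$project": {
        "is_empty": {"$cond": [{"$eq": [{"$size": "$myArray"}, 0]}, True, False]},
    }},
]
for doc in coll.aggregate(pipeline):
    print(doc["_id"], doc["is_empty"])
```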
[21:48:00] <flyingkiwi> hey guys! today we added a second shard to our cluster. everything seems good. but that's the point - i really don't know
[21:48:53] <flyingkiwi> NIC bandwidth seems to be low, like 3 Mbps. is there a way to get an overview of what still has to be moved and what the progress is?
[21:50:35] <flyingkiwi> the documentation states "it will take some time", but that's fairly relative to me :)
[21:53:11] <joannac> flyingkiwi: check sh.status() and see if chunks are moving
[22:01:10] <IPhoton> hello, I posted a question yesterday but it didn't get answered. I would like to know if it's better to use mongodb locally for development/practice, or to use a remote service like mongohq. From what I heard, it takes a lot of space to install mongodb, about 250 MB. Is that 250 MB per project or just for the installation?
[22:01:26] <IPhoton> If it is just a one time install, I think it's not such a big deal, right?
[22:04:32] <flyingkiwi> joannac, in terms of ops/s or...?
[22:04:47] <flyingkiwi> joannac, the servers are not overloaded in any case
[22:06:02] <flyingkiwi> don't get me wrong. I do not expect you to say "oh, this will take exactly 499 hours and 32 seconds". I just want to figure out a method to get an overview of the balancing status
[22:06:25] <joannac> see how long it takes to move a chunk, and extrapolate
[22:07:17] <flyingkiwi> and that sh.status() thingy looks quite okay. Can I get more details about what's going on with some special property, like {beMegaVerboseAndTellMeHowLongItWillTake: true}?
[22:07:30] <IPhoton> well, the installation on my drive is about that size, joannac
[22:08:02] <joannac> well yeah, but that doesn't actually include database files and stuff, IPhoton
[22:08:03] <IPhoton> I have installed it on Windows and Linux, and the data/db folders I created are that size
[22:08:27] <joannac> flyingkiwi: nope, check the logs
[22:10:24] <IPhoton> so for learning you feel it's best to install locally, joannac?
[22:10:39] <IPhoton> I am trying to follow some of the MEAN stack tutorials
[22:12:33] <joannac> flyingkiwi: connect to a mongos in the mongo shell, use config; db.changelog.find({}, {what:1, time:1})
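The same check joannac describes, sketched with pymongo against a mongos (host is a placeholder): the config database's changelog collection records each chunk move, so you can watch the recent moveChunk entries and extrapolate.

```python
from pymongo import MongoClient, DESCENDING

client = MongoClient("mongodb://mongos-host:27017")  # placeholder mongos address
changelog = client.config.changelog

recent_moves = (
    changelog
    .find({"what": {"$regex": "^moveChunk"}}, {"what": 1, "time": 1, "ns": 1})
    .sort("time", DESCENDING)
    .limit(20)
)
for entry in recent_moves:
    print(entry["time"], entry["what"], entry.get("ns"))
```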
[22:13:59] <storyleforge> Hey there. I am trying to lay out the architecture for a scalable realtime analytics platform. Mongo seems like a great choice in terms of functionality. I'm looking at using heroku for this, which offers mongodb addons that can be kind of expensive in terms of storage space. I realize that this is a scaled solution and that's what I'm paying for vs a lot of space on a single instance. i'm thinking about moving older data into
[22:14:12] <storyleforge> be missing out on speed from a sql database, or just ease of use
[22:15:22] <joannac> your question cut out at "..moving older data into "