[06:07:21] <dandv> I have ensured indexes on the fields ct and companyIdsDetected but db.content.find({ct: 'article', companyIdsDetected: {$exists: true}}).explain(); is still way slow (6+ seconds)
[06:08:23] <dandv> I have a composite index on the two as well. What's going on?
[06:12:58] <joannac> also, I think you'd get better results from indexing on {ct:1, companyIdsDetected:1}
[06:13:21] <joannac> also, explain(true) and hint the index you want it to pick
[06:15:36] <dandv> I believe I already have that index? Line 93
[06:16:32] <dandv> explain(true) is at http://pastebin.com/GETyNRAG
[06:16:55] <dandv> Is there some more user-friendly interpreter of this explain output? Can't do much with it as it is.
[06:17:21] <joannac> nope, you have the fields in the opposite order
[06:17:30] <joannac> explain(true) while hinting the index?
[06:17:48] <joannac> it's not even considering that index when evaluating query plans...
[06:18:37] <dandv> does the order of the fields in an index matter? I created that index like this: db.content.ensureIndex({ct: 1, companyIdsDetected: 1});
[06:42:59] <dandv> Do we have a mongo index usage bug then? Maybe for $exists?
[06:44:11] <joannac> how many results do you get just for db.content.find({ct: 'article'})
[06:45:49] <dandv> we apparently do have that bug. db.content.find({ct: 'article', companyIdsDetected: {$in: ['4wti8J4enxGWwL68u']}}).hint({ct: 1, companyIdsDetected: 1}).explain(true); is instant
[06:47:50] <dandv> "This fix was made to the new query framework introduced in 2.6." and I' on 2.4
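A minimal sketch of the index order and hint() workaround discussed above, using the collection and field names from the conversation; on 2.4 the planner may not choose this index for $exists on its own:

```js
// the compound index must list ct first, then companyIdsDetected
db.content.ensureIndex({ ct: 1, companyIdsDetected: 1 });

// force the planner to use the index and print the verbose plan
db.content.find({ ct: 'article', companyIdsDetected: { $exists: true } })
          .hint({ ct: 1, companyIdsDetected: 1 })
          .explain(true);
```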
[06:57:26] <nonrecursive> Is there an alternative to querying nested documents using the dot notation? I'm trying to build queries programmatically and the dot notation seems to require string formatting (which seems strange).
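One common answer to this, sketched with hypothetical collection and field names: build the dot-notation key as a string at runtime rather than hand-writing it.

```js
// build the "address.city" key from path segments instead of formatting it by hand
var path = ['address', 'city'];
var query = {};
query[path.join('.')] = 'Berlin';   // { "address.city": "Berlin" }
db.people.find(query);
```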
[07:57:04] <joannac> I didn't think there was a $distinct aggregation operator?
[08:00:53] <torgeir> Can a db.collection.update() with { multi: true } not update _several_ fields of all documents using $set in one query on mongodb 2.4.8?
[08:01:40] <torgeir> does it have to be one field, in the $set: { here: 1 }, when there's a multi: true?
[08:02:20] <nfroidure_> i found the answer here http://stackoverflow.com/questions/18501064/mongodb-aggregation-counting-distinct-fields
[08:02:35] <nfroidure_> the distinct command threw me off
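The counting-distinct-values pattern from the linked Stack Overflow answer, sketched with a hypothetical collection and field name:

```js
// group once to collapse duplicates, then count the remaining groups
db.collection.aggregate([
  { $group: { _id: "$fieldName" } },
  { $group: { _id: null, distinctCount: { $sum: 1 } } }
]);
```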
[08:04:21] <salty-horse> hey. 2.6 has a mergeChunks command to merge/remove empty chunks. If I have 2.4, is there a way to remove empty chunks?
[09:04:29] <ManicQin> Hello everybody , I'm getting "exception: Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in." even when I pass allowDiskUse:true.
[09:17:45] <rspijker> torgeir: you can just put multiple fields in the $set. $set:{field1:”x”, field2:”y”} will work fine. Also with multi:true. multi does not indicate updating multiple fields though, it indicates updating multiple documents. The first argument of an update is the query part. If you don’t specify multi:true only a single document matching the query will be updated. If you specify multi:true, all documents matching the query will be updated
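A sketch of what rspijker describes, with hypothetical collection, query, and field names: several fields in one $set, applied to every matching document.

```js
db.collection.update(
  { status: "active" },               // query: which documents to touch
  { $set: { here: 1, there: 2 } },    // multiple fields in a single $set
  { multi: true }                     // update all matches, not just the first
);
// equivalent older positional form: db.collection.update(query, update, false, true)
```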
[09:31:30] <ManicQin> rspijker: it's a collection of ~600,000 documents, and the fields that I $push are basically all there is. I made heavier queries (I think)
[09:33:15] <rspijker> maybe it’s the wrapper function somehow...
[09:34:05] <rspijker> could you try it like this: http://pastebin.com/vVWvAXvx
[09:34:18] <ManicQin> rspijker: I'm running it in shell ... (robomongo to be exact)
[09:34:46] <rspijker> hmmmm… not sure robomongo has updated their shell to be fully 2.6 compliant
[09:35:04] <rspijker> and I think passing the allowDiskUse like that to the helper was introduced in 2.6
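The 2.6 shell form rspijker is referring to, where allowDiskUse goes in the options document; the pipeline here is a hypothetical stand-in:

```js
db.content.aggregate(
  [ { $group: { _id: "$someField", docs: { $push: "$$ROOT" } } } ],
  { allowDiskUse: true }   // second argument to aggregate() in the 2.6 shell
);
```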
[11:15:58] <Left_Turn> you're right kees_ .. got it on 1st link
[11:18:26] <rspijker> there are multiple packages out there Left_Turn. The “old” (<= 2.4) ones are marked with 10gen-… and the new ones (2.6+) are mongodb.org
[12:20:22] <Left_Turn> anyone familiar with deployd ? i cant get the dashboard up.. I doubt this is relevant to mongodb
[12:22:44] <bensons> hi there, we are using a sharded cluster and want to backup the data. i know we can stop balancer and dump each individual shard, but whats bad about dumping directly via mongos? i havent found anything about that...
[12:36:40] <adamcom> no real difference to dumping each one, except that it will take longer - mongodump will walk the _id index to do its dump, single threaded, so it will only hit one shard at a time, effectively dumping them serially
[12:37:43] <adamcom> generally far quicker to use something else (filesystem snapshot, or similar) for backups, unless you have quite a small data set
[12:38:06] <adamcom> or, of course, MMS Backup - but you have to pay for that
[12:39:03] <bensons> adamcom: yes snapshots would be an idea but for that i would need to shutdown the instance in order to be consistent(?) and to do so i will need to change the write concern inside my app and so on and so forth... so mongos is the smallest hassle
[12:39:22] <bensons> ok it might not be 100% consistent, but thats ok for us..
[12:40:59] <adamcom> ignoring the cluster wide consistency point (only really possible with MMS backup or if you stop writes) - at the shard/replica set level: as long as your snapshot is point-in-time, and includes the journal, there are no special steps needed
[12:42:06] <adamcom> it's basically the equivalent of restarting after a crash (journal will be replayed for consistency)
[12:42:32] <bensons> adamcom: and of the available linux filesystems/volume managers, the only one able to do snapshots is lvm, and that's, at least from my experience, not 100% reliable
[12:43:01] <bensons> zfs would be neat but for that i would need to migrate to slowlaris or fbsd :)
[12:43:19] <bensons> and placing mongo db on a backend fc storage is out of scope for us..
[12:44:49] <adamcom> you can take extra steps to make things less susceptible to errors - do the snapshot on a secondary - shut down or fsyncLock that secondary (one in each shard at the same time) and then snapshot that - to be even more paranoid about it you could combine it with xfs_freeze or similar - it's really about how far you want to go to guarantee a good snapshot
[12:45:19] <adamcom> I've seen the LVM snapshot (and EBS, and NetApp etc.) work as long as they included the journal
[12:46:29] <adamcom> the real issue is getting cross-cluster consistency because you need a backup from multiple shards, and you need a corresponding backup of the cluster meta data from the config servers
[12:47:58] <adamcom> and, of course, the key to any backup solution is to regularly do restores and make sure they work, even when not needed - periodically re-seeding a secondary from backup snapshots within the oplog window would be my recommendation if using snapshots
[12:48:41] <adamcom> re-seed, make sure it catches up, compare data with other nodes for consistency
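A sketch of the secondary-snapshot procedure adamcom outlines; the snapshot step itself is left as a comment since it depends on the volume manager in use:

```js
// on a secondary in each shard: flush to disk and block writes
db.fsyncLock();

// ...take the filesystem/LVM/EBS snapshot of the dbpath volume here,
// optionally wrapped in xfs_freeze for extra safety...

// release the lock once the snapshot has been taken
db.fsyncUnlock();
```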
[12:55:10] <bensons> ok thanks adamcom, think we will stick to mongos backup for the beginning.. should be enough for us, but thanks a lot for your thoughts :)
[12:58:41] <adamcom> oh, one last thing - whether you are using mongodump directly or via mongos (assuming you have secondaries - it defaults to secondary reads), you are likely to dump orphaned documents with the mongodump approach
[12:59:07] <adamcom> so, you may want to look into http://docs.mongodb.org/manual/reference/command/cleanupOrphaned/
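The loop from the linked cleanupOrphaned page, run directly against the primary of each shard (not through mongos); the namespace is hypothetical:

```js
var nextKey = {};
while (nextKey != null) {
  var result = db.adminCommand({
    cleanupOrphaned: "mydb.mycoll",
    startingFromKey: nextKey
  });
  printjson(result);
  nextKey = result.stoppedAtKey;   // absent once all ranges are cleaned
}
```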
[13:08:19] <bensons> adamcom: hm i dont get it 100%. when i dump + restore via mongos (assume sharding is already enabled with the downside of pretty slow restore speed), how can i get orphaned documents or to be more precise, how should orphaned documents end up on my shards, as the balancer during restore already balanced them?
[13:09:06] <bensons> i know this is far away from being optimal especially because of the slow restore speed, but the backup should be something like a last resort for us anyway...
[13:13:00] <rspijker> as long as you can guarantee that your dump process completes before the oplog window runs out
[13:13:25] <rspijker> that was an issue for us, so we went with FS snapshots
[13:14:58] <bensons> rspijker: so you did a restore while data was still inserted?
[13:15:34] <bensons> because as mentioned, for us a restore would just take place in case everything is already somewhat lost and the app is stopped + no data will be modified during the restore
[13:17:57] <rspijker> that will lock your DB during the length of the dump :/
[13:18:16] <bensons> mongodump via mongos will lock my db?
[13:20:13] <rspijker> let me check into it… I know we decided against it for some reason, now I’m no longer 100% sure...
[13:21:06] <bensons> rspijker: would be nice :) i did not (yet) experience any db locks and we do have dbs > 50gb that take quite a while to get backed up...
[13:24:18] <rspijker> bensons: my mistake, it’s not locking. Which makes sense, since it’s basically just a query
[13:25:03] <bensons> rspijker: yeah :) it is not consistent or lets say - far away from being consistent but at least better than nothing..
[13:25:24] <rspijker> you can get fairly close with --oplog
[13:25:51] <rspijker> the problem is, syncing them across shards and making sure they are consistent amongst each other
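A sketch of the --oplog/--oplogReplay pair rspijker mentions, run per shard; host names and paths are hypothetical, and --oplog only works against a replica-set mongod, not through mongos:

```
# dump, capturing oplog entries written while the dump runs
mongodump --host shard1-secondary:27018 --oplog --out /backups/shard1

# restore and replay those entries for a consistent point in time
mongorestore --oplogReplay /backups/shard1
```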
[13:48:31] <salty-horse> can a 2.4 mongos connect to a 2.6 database?
[13:51:34] <saml> it's better to keep versions the same. i got segfault
[13:54:10] <rspijker> I recall there being a warning in the upgrade docs that said not to upgrade any of the mongod instances before all of the mongos instances were upgraded
[13:54:25] <rspijker> so I’d guess there could be some issues in connecting a 2.6 mongos to a <2.6 mongod
[14:09:34] <ep1032> I have a c# windows service that is connecting to my mongo instance. I have a log file with thousands of instances of the error message: Unable to connect in the specified timeframe of '00:00:00'.
[14:09:47] <ep1032> I found the exact driver code that is throwing the exception
[14:10:04] <ep1032> if you ctrl+f that page for the string "specified timeframe"
[14:10:25] <ep1032> but have no idea why my service will run fine for a few hours, and then suddenly just start throwing that message every following time
[15:14:39] <Nostalgeek> Hello. I just apt-get'ed mongodb on Ubuntu 14.04. Authentication seems to be turned off by default allowing for admin login without password. How do I secure my MongoDB by disabling this? I already created a new user with db.addUser but I want to disable / set an admin password?
[15:15:08] <cheeser> i think localhost might always be allowed in some cases.
[15:15:33] <Nostalgeek> cheeser, Humm good hint. Yeah, I'm logging in locally.
[15:28:37] <adamcom> once you add a user in later versions the localhost exception is disabled
[15:29:08] <adamcom> so, always add a userAdmin first, or you can end up with no permissions to add more
[15:29:44] <adamcom> and, you can explicitly disable the localhost exception, if you don't mind potentially locking yourself out and having to restart the process if you do :)
[15:30:30] <adamcom> this tutorial walks it through well: http://docs.mongodb.org/manual/tutorial/enable-authentication/
[15:32:14] <adamcom> cheeser: just make sure you've added and are using the MongoDB packages rather than the Ubuntu ones or you will be on 2.4 forever (or until you do a distro upgrade from 14.04)
[15:32:43] <cheeser> i *do* need to upgrade actually
[15:32:51] <cheeser> i'll wait until i'm back from my trip, though.
[15:37:20] <Nostalgeek> cheeser, adamcom: I did create a user but was still able to login without authentication. Could it be that by default auth=false in mongodb.conf (at least under Ubuntu 14.04 MongoDB 2.4). I changed auth=yes in mongodb.conf and restarted Mongo and it seems to be working now.
[15:37:46] <Nostalgeek> i mean auth=true of course
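A sketch of the 2.4-era sequence Nostalgeek ended up with: create an admin user first (db.addUser was replaced by db.createUser in 2.6), then turn auth on in mongodb.conf and restart; the user name and password are placeholders:

```js
// in the mongo shell, on the admin database
use admin
db.addUser({ user: "siteAdmin", pwd: "changeMe", roles: ["userAdminAnyDatabase"] });

// then in /etc/mongodb.conf:
//   auth = true
// and restart mongod
```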
[16:42:31] <uehtesham90> hello, i wanted to know what is the best way to migrate a mongodb database from one server to another?? i have looked at two options: 1) using mongodump and mongorestore 2) copying the database files from the dbpath directory e.g. from /data/db and pasting them into the dbpath on the new server
[17:19:39] <netQt> @uehtesham90 it's better to use mongodump and then use mongorestore
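A sketch of the dump-and-restore migration netQt suggests; hosts and paths are hypothetical:

```
# dump everything from the old server
mongodump --host old-server:27017 --out /tmp/dump

# restore it into the new one
mongorestore --host new-server:27017 /tmp/dump
```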
[17:30:15] <zhodge> evaluating options for a log store, and mongo has the plus of storing documents, which would be very convenient for my app (node.js)
[17:30:53] <zhodge> but I’m not savvy enough with my database technical knowledge to determine whether a lot of the Internet opinions discouraging mongo are worth heeding or not
[17:31:18] <cheeser> most are either old or they have an axe to grind
[17:31:59] <zhodge> it’d be nice to be able to use SQL to query against a few log tables, but setting those up seems unnecessary compared to just tossing a few objects at a collection
[17:32:10] <zhodge> cheeser: that’s what it can feel like yeah
[17:32:59] <zhodge> I’m already running a postgres server and I thought about using that for logging but that seemed like a terrible idea since the burden of serving app data and handling log writes would be shared by one server
[17:35:03] <cheeser> well, writes will go to one mongod, too, unless you shard.
[17:36:19] <zhodge> though it would avoid burdening the primary db
[17:40:57] <rspijker> zhodge: what kind of volume are we talking about here?
[17:58:41] <mango_> Hi, I'm starting MongoDB training next week M102 and M202
[18:15:15] <zhodge> more seriously, I’m not too sure as of yet, but my goal is to have a reasonable logging solution for a relatively small site in terms of traffic
[18:17:37] <Riobe> obiwahn, Maybe? I tried a fix last night to grub stuff that has helped some people that have my motherboard. .
[18:18:00] <rspijker> zhodge: if the volume isn’t huge, which it doesn’t sound like it is, it’s not going to matter that much… Just pick what you are most familiar and comfortable with.
[18:18:02] <Riobe> obiwahn, But I won't know if it worked till it doesn't break for a while. So fingers crossed. Thanks for remembering and asking. :D That's awesome.
[18:18:17] <rspijker> Unless you have an ulterior motive, like wanting to learn mongo :)
[18:19:15] <zhodge> rspijker: always looking to learn new things and mongo isn’t exempt from being an eligible candidate ;)
[18:20:10] <rspijker> mango_: M202 has some exercises with sharded clusters, they can require a bit of free space. I did it with a 20GB ubuntu VM and that was usually fine. Had to remove some of the older weeks a few times, though
[18:20:55] <rspijker> zhodge: then mongo is as good a choice as any for this :)
[18:22:34] <mango_> rspijker: thanks for the info, ok, I'll see if I increase the space just to avoid cleaning space.
[18:34:11] <mango_> cheeser: ok thanks, I'm starting from scratch now anyway with a larger disk drive
[18:34:35] <zhodge> rspijker: which forgoes the benefit of “schemaless” right? haha
[18:35:06] <rspijker> the benefit of schemaless is that different documents can have different fields in the same collection
[18:35:29] <zhodge> why yes that seems a lot more reasonable
[18:35:58] <zhodge> that’s a familiar pain point coming from relational considering that eventually one fat table with a lot of nullable fields gets a bit much
[18:36:17] <rspijker> still important to think about how you organize your data though
[18:36:17] <rspijker> standard example is a blog with comments on each post
[18:36:31] <rspijker> in sql you’d have a post and comment table and they’d be joined by ids
[18:36:38] <cheeser> you wouldn't embed comments on a post in either relational or mongo.
[18:37:20] <mango_> rspijker: collection = table? in relation world?
[18:37:31] <cheeser> document size constraints. documents move when they grow. etc.
[18:39:06] <rspijker> 16MB… that’s a LOT of comments. As far as the moving goes, it really depends on your access patterns whether that’s worse than having to do multiple queries to get some info
[18:39:27] <rspijker> mango_: yeah, close enough at least
[18:40:23] <mango_> rspijker: ok, curious to see that in practice.
[18:40:40] <cheeser> rspijker: that 16M includes the post and all its metadata, the fieldnames for each and then all the comment documents with all their metadata and fieldnames.
[18:41:19] <cheeser> you'll hit it sooner than you think. i've worked on a large CMS and it's not that uncommon to have hundreds of comments on a popular blog site.
[18:46:44] <rspijker> We used to keep a history on some entities in our DB, not even a log of the full document, but only who changed it, and when did they change it. We hit the limit with that. So I realize that there are limits and that they can be hit quicker than you might think. The comment example is not that bad though…
[18:49:19] <rspijker> If we set aside 1MB for the post, which is a lot for plain text even including metadata, you can still have 1000 comments of 15000 characters each, which should be plenty for any comment
[18:50:15] <cheeser> well, feel free to go that route. it's suboptimal at the least.
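A sketch of the two layouts being debated: comments embedded in the post document versus comments in their own collection linked by id (all names are hypothetical):

```js
// embedded: one document per post with comments inside it; simple reads,
// but the whole document must stay under the 16MB limit and moves as it grows
db.posts.insert({
  title: "Hello world",
  body: "...",
  comments: [ { author: "alice", text: "Nice post" } ]
});

// referenced: comments live in their own collection, linked by postId;
// no per-post size concern, but reading a post plus its comments takes two queries
var post = db.posts.findOne({ title: "Hello world" });
db.comments.insert({ postId: post._id, author: "alice", text: "Nice post" });
db.comments.find({ postId: post._id });
```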
[19:09:05] <netQt> hi guys, i have a data loss problem when using replica sets. at some point i have to make my primary step down, and it takes some time to elect a new primary, and that's where i lose data