[00:29:29] <Guest26758> What's the best way to authenticate a replica set?
[00:35:40] <vaq> Hello, I am running a replica shard setup with 3 servers. I understand that if two of the servers are down, the last one will stay secondary and not accept any writes. How do I circumvent this so that it will elect itself master? Are there any potential problems in doing this, if it's possible?
[00:36:28] <retran> shards don't contain your full dataset, by definition
[00:36:45] <vaq> I am aware, that's why I run with replicas.
[00:51:03] <vaq> and then I enabled sharding for the database
[00:51:05] <johann_> hello humans i would like to know how to know the length of an array stored in a collection
[00:51:16] <vaq> I run a router and configsvr on each node retran
[00:52:31] <retran> you're conflating the idea of shard and replica set
[00:53:03] <retran> anyway, secondaries won't self-elect if you configure them that way
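For the scenario vaq describes, the usual workaround is a forced reconfiguration on the one surviving member; a hedged mongo-shell sketch (the hostname "host3:27017" is a placeholder), noting that this deliberately gives up the safety the majority requirement provides:

    // On the surviving member, drop the unreachable members from the config
    // and force the reconfiguration so this node can become primary.
    // WARNING: if the "down" members come back with writes of their own,
    // those writes can be rolled back.
    cfg = rs.conf()
    cfg.members = cfg.members.filter(function (m) { return m.host == "host3:27017" })
    rs.reconfig(cfg, { force: true })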
[00:54:02] <retran> johann_: the length of an array in a single record in the collection?
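In case it helps, two common mongo-shell ways of dealing with array length (collection and field names here are made up):

    // Match documents whose "tags" array has exactly 3 elements:
    db.things.find({ tags: { $size: 3 } })

    // Or fetch a document and check the array length client-side in the shell:
    var doc = db.things.findOne()
    print(doc.tags.length)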
[01:48:33] <bmw> Hi, Can anyone help me with authentication over a replica set?
[02:23:52] <MasterGberry> Trying to optimize some mongodb queries if possible. A description of what I have and am trying to optimize is here: http://stackoverflow.com/questions/17584055/trying-to-optimize-i-o-for-mongodb
[08:04:33] <Infin1ty> double_p, i mean, even db.serverStatus()
[08:04:49] <Infin1ty> double_p, it's now resyncing, running db.serverStatus() simply hangs
[08:05:01] <double_p> oh, internals should work. maybe just lags on CPU overload?
[08:05:22] <Infin1ty> double_p, i have no idea, it's weird, i can't even ssh to the box, first time it happens to me
[08:05:41] <Infin1ty> double_p, i do receive metrics from the server itself, i see the mongod and mongos there are working, disk space goes up on the data directory
[08:05:50] <Infin1ty> double_p, i can db.serverStatus() on the mongos that runs there
[08:07:05] <double_p> best guess, machine overloaded for now. indexing can be intense
[08:09:32] <Infin1ty> double_p, yes, but that doesn't explain why I can't even log in to the box remotely
[08:09:48] <Infin1ty> double_p, this is a very strong server, 2 x E5-2670, 96GB RAM , SSDs
[08:29:13] <Infin1ty> double_p, i reset the server now; looks like a buggy kernel patch, reverting to the previous minor version seems to fix it
[08:29:23] <Infin1ty> double_p, even a simple command was hanging
[08:30:13] <untaken> is there anything I can do to speed up the sorting of a query? If I have a regex search with no ^ or $, the query takes 0.211389s, but as soon as I sort it by a different field, it takes 1.572594s. Is there anything I can do to speed this up? I have tried indexing, but it doesn't seem to make any difference :(
[08:33:51] <untaken> Nodex: so if the field in the regex is A, and the field I want to order by is B, would it be db.users.ensureIndex( { A: 1, B: 1 } )
[08:35:49] <untaken> because if so, that doesn't work in my case
[08:36:14] <untaken> If I do a search ordering by A, it's fast! but if I order by B or C, it's dog slow
[08:40:15] <untaken> Nodex: I have just done a normal sort, I am actually using the perl module ->sort( A => 1)
[08:40:34] <untaken> Nodex: btw, I have also tried db.users.ensureIndex( { A: 1, B: -1 } )
[08:44:01] <untaken> ^ was this: I am actually using the perl module ->sort( B => 1)
[08:44:22] <untaken> I guess sorting is the issue for me; it seems mongodb can't sort this quickly ;/
[08:44:45] <untaken> I got quite a few records, but the sorting takes way too long
[08:44:47] <Nodex> you're going to need to pastebin an explain()
[08:56:43] <Nodex> I think the root cause of your problem is the lack of a prefix on the regex, so it won't use any index. I am not sure if you can hint which index to use for sorting or not, hence the need for an explain()
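A sketch of what that looks like in the shell, using the field names A and B from the discussion (the regex /foo/ is just a placeholder):

    db.users.ensureIndex({ A: 1, B: 1 })      // filter field + sort field

    // Unanchored regex: no index range scan is possible, so the sort may be done
    // in memory -- explain() will show "scanAndOrder" : true in that case.
    db.users.find({ A: /foo/ }).sort({ B: 1 }).explain()

    // Prefix-anchored regex: allows a range scan on A, which usually helps.
    db.users.find({ A: /^foo/ }).sort({ B: 1 }).explain()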
[09:01:59] <traceroute> hey, is there a way to get a slave/shard to register back to the routing server? My shards will be behind a firewall and not directly accessible from the internet.
[09:02:04] <untaken> Nodex: ok thanks for that. I'll keep fiddling, and if I am still struggling will provide this information.
[10:55:53] <untaken> is there a way to make .find({some query}).count() quicker?
[10:56:28] <ron> which version of mongo do you use?
[11:20:02] <untaken> Nodex: yeah, I don't think it's making much difference
[11:20:46] <Nodex> internal counts are fast because they use internal counters
[11:21:00] <Nodex> queried counts don't have that luxury
[11:23:58] <sikor_sxe> hello, i have a doc with a start & end Date field, now i'd like to ensure that the end Date is not before the start one. is there a mechanism in mongodb to do that (a special index or something) or do i have to care for that in the application logic?
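MongoDB has no built-in cross-field constraint for this, so the check normally lives in application logic (later server versions added collection validators, but the application-side check is the portable answer). A minimal shell sketch; the collection name "events" and helper are hypothetical:

    function insertIfValid(doc) {
        if (doc.end < doc.start) {
            throw new Error("end date must not be before start date");
        }
        return db.events.insert(doc);
    }

    insertIfValid({ start: ISODate("2013-07-10"), end: ISODate("2013-07-12") })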
[11:41:26] <untaken> Nodex: what is annoying is that if I were to implement paging for some webpage, the first 50 results are quick, but to find the total number of results I need to do the ->count as I described above, which slows it all down. The count would be to confirm the total entries for the query. I know it doesn't have to run for every page, but it's annoyingly slow for the first page
[11:43:07] <ron> you can always use an external indexing system.
[11:44:03] <untaken> so two queries could go two different places?
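The count itself still has to examine every matching document, so the common workaround (short of an external index) is to compute it once and cache it rather than per page view; a shell sketch reusing the A/B field names from earlier:

    // The page itself is cheap -- it stops after 50 documents:
    db.users.find({ A: /foo/ }).sort({ B: 1 }).skip(0).limit(50)

    // The total is the expensive part; run it once (or periodically) and cache it:
    var total = db.users.find({ A: /foo/ }).count()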
[13:22:07] <nyov> ah yes, i remember reading that before ;) I suppose it got lost on me
[13:23:16] <traceroute> hey I was in earlier. I want to have mongo shards which have to sit behind a router (and are not directly accessible via the internet). From what I've read, my server on the 'cloud' has to register each shard. Is there a way to do it backwards so the shards register with the 'master' first?
[13:30:17] <mjburgess> does any one know how i can query for stats using the c driver?
[13:37:13] <nyov> is there a possibility to get an oplog without replication? or some possibility to emulate it?
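One way to get an oplog without a second node is to run mongod as a single-member replica set; a hedged sketch (set name and dbpath are placeholders):

    // Start mongod with a replica set name even though there is only one node:
    //   mongod --replSet rs0 --dbpath /data/db
    rs.initiate()                          // run once in the shell

    // The oplog is then an ordinary capped collection in the "local" database:
    var local = db.getSiblingDB("local")
    local.oplog.rs.find().sort({ $natural: -1 }).limit(5)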
[13:50:06] <dash_> hello, how do I select every Nth record from a huge database? (maybe Map Reduce)
[13:50:46] <mjburgess> ( the answer for future reference is to use runCommand() with the corresponding run_command in mongo.h )
[13:55:06] <Nodex> dash_ : that's an appside problem really
[13:55:30] <Nodex> I would construct something in my app and use skip and limit for it
[13:56:05] <ron> depends how big the skip is though, no?
[13:58:37] <dash_> Nodex: yep, that should work. I will create a counter on every item and then run a query similar to db.inventory.find( { qty: { $mod: [ 4, 0 ] } } ). I am wondering about the speed of this … what do you think?
[14:05:07] <dash_> 1000 documents every one 50 Bytes, so it would be 50 KB
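A sketch of the counter-plus-$mod idea, where "seq" is a hypothetical monotonically increasing counter stored on each document:

    db.inventory.find({ seq: { $mod: [4, 0] } })   // every 4th document

    // $mod cannot use an index, so this is a collection scan -- negligible for
    // ~1000 x 50-byte documents, but worth re-checking at larger sizes.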
[14:38:05] <EricL> Is there any way to reclaim space from Mongo that doesn't require many hours of down time waiting for a global write lock to be released?
[14:38:46] <EricL> Had a process go rogue and create a lot of records. Cleaned it up, now Mongo won't give back the space.
[14:40:29] <jmar777> EricL: are you running a replica set?
[14:41:17] <Nodex> you can compact but it's not advised on a live database
[14:42:32] <EricL> jmar777: Yes. I was hoping to not have to take down each node at a time to run the repair.
[14:42:42] <EricL> Especially because we are talking about hundreds of gigs.
[14:43:05] <EricL> I also don't have that much available space on the drives.
[14:45:40] <jmar777> EricL: hrmm... not sure what the best approach is to stop slave reads temporarily, but if you start with that you can run compact on the slaves, and then promote one of them to primary while you run compact on the (former) primary node
[14:46:05] <EricL> jmar777: Even if I do that, I still don't have enough drive space according to the Mongo docs.
[14:46:07] <jmar777> EricL: ... and then re-promote it to primary, if that makes sense
[14:47:48] <EricL> From the docs: "Important: compact only removes fragmentation from MongoDB data files and does not return any disk space to the operating system."
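For reference, a sketch of the two options being discussed (collection name is a placeholder); the rolling-resync route is the one that actually gives disk space back to the OS:

    // Per-collection defragmentation; blocks the database on the member it runs
    // on and, as quoted above, does not return space to the OS:
    db.runCommand({ compact: "mycollection" })

    // Space-reclaiming alternative, one member at a time: stop mongod, remove
    // its data files, restart it, and let it perform an initial sync from the
    // other members (which writes fresh, compact files). Do the secondaries
    // first, then step the primary down and repeat there:
    rs.stepDown()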
[14:54:35] <EricL> I've had this conversation with a bunch of the 10gen folks a few times. You nearly always get better bang for your buck with instance storage than PIOPS.
[14:54:46] <mrapple_> picking a shard key is quite nerve-racking
[14:55:09] <EricL> mrapple_: In the worst case scenario, you can always hash _id since it's monotonic.
[14:55:24] <EricL> You'll still get solid distribution.
[14:55:45] <EricL> mrapple_: It depends on your queries.
[14:55:48] <jmar777> EricL: ya, but snapshots and situations like your current one are a much bigger pain
[14:56:06] <mrapple_> let's say i have a collection and the two most common queries on that collection use separate keys
[14:56:12] <mrapple_> is the shard key a combination of those keys?
[14:56:28] <EricL> jmar777: I've been using Mongo since 0.4. This is the first time I've run into this situation.
[14:56:41] <jmar777> EricL: with provisioned IOPs, EBS is a pretty decent solution now unless you're doing a lot of long sequential reads
[14:57:18] <jmar777> EricL: EBS can actually beat ephemeral for random reads... but more or less sucks for sequential
[14:57:26] <EricL> jmar777: PIOPS isn't bad, just expensive and not usually worth the extra money.
[14:57:41] <EricL> It was roughly equal for our random read tests and worse for sequential.
[14:58:58] <jmar777> EricL: that sounds right. anyway... back to the problem at hand, I don't really see any good options other than to export/import to a larger fs
[14:59:16] <kizzx2> mrapple_: you should design it so that shards have a uniform distribution
[14:59:32] <jmar777> EricL: although, granted, we're getting to the limit of my Mongo Ops knowledge, so I wouldn't take that as gospel
[14:59:48] <kizzx2> the bad extreme example is using date as a shard key
[15:00:19] <jmar777> EricL: FWIW, I'd recommend EBS for the new volume if it's performant enough for you... makes various operational scenarios MUCH easier to deal with
[15:00:24] <kizzx2> (unless date is uniformly accessed in your application)
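A sketch of the hashed-_id fallback EricL mentions; "mydb.mycoll" is a placeholder, and hashed shard keys need a 2.4+ server:

    sh.enableSharding("mydb")
    sh.shardCollection("mydb.mycoll", { _id: "hashed" })   // uniform write distribution,
                                                           // but range queries on _id
                                                           // become scatter-gather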
[15:00:47] <EricL> jmar777: It's not even close to performant enough.
[15:13:57] <buzzedword> morning guys. had a question re: performance of mongo through the PHP driver, seems pretty straightforward but i'm doing due dil
[15:14:52] <buzzedword> which one of these two options performs better/scales better? 1) performing a multiselect query 2) selecting multiple single items, joining through PHP
[15:15:47] <buzzedword> i currently have a very small sample set to test for, and this use case is going to grow substantially, so i'm not exactly seeing much of a difference with either side. at the same time, there's not a lot of load thrown against it either
[15:16:26] <Derick> buzzedword: depends on how much memory you want to put to it more than speed
[15:16:49] <Derick> and how complex your query is...
[15:17:23] <buzzedword> Derick: it's fairly straightforward. nothing complex at all, literally selecting by one column, which is indexed already
[15:17:30] <buzzedword> so a very straightforward select
[15:18:14] <buzzedword> when you say how much memory i want to put to it, i'm assuming you're talking about joining via PHP? or am i mistaken here
[15:22:45] <buzzedword> we want this to be preferably as low memory as possible whilst performing as quickly as possible. i'd take low memory consumption over speed if it came down to it though
[15:23:53] <Derick> then do the complex query on mongodb... but you really ought to benchmark this
[15:27:57] <buzzedword> Derick: I am benchmarking-- they're currently breaking even with performance. my sample set isn't large enough
[15:28:46] <buzzedword> it's going to get massive, and i don't have access to that kind of scaffolding environment yet-- still being set up. Due dil :)
[15:29:33] <buzzedword> just wanted to get an idea where the wind blows on this particular issue
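Assuming "multiselect" here means a single $in query, the difference is mostly round trips rather than server work; a shell illustration (collection name and the id1/id2/id3 values are placeholders):

    // One round trip; the server walks the index once per key:
    db.items.find({ _id: { $in: [id1, id2, id3] } })

    // versus N round trips joined in PHP -- the extra cost is mainly
    // network latency multiplied by N:
    db.items.findOne({ _id: id1 })
    db.items.findOne({ _id: id2 })
    db.items.findOne({ _id: id3 })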
[15:39:41] <Nodex> It's the only map service I built the tool for
[15:40:38] <Nodex> turns out MongoDB is an awesome configuration store, I store the whole config of every part of the app inside mongodb
[15:41:08] <Nodex> builds all the queries for SOLR, maps mongodb to solr, there is nothing it can't do (that I've found) so far, and this is one complex app
[15:47:17] <drag> Hi. I have large DOM objects that I want to store in mongodb. Can I use buffers to write to mongodb or do I have to wait for the entire DOM to be converted to XML in memory before writing it out?
[15:48:04] <Nodex> you can't write in parts if that's what you're asking
[15:48:16] <Nodex> writing yields a lock so nothing else can touch it
[15:53:31] <drag> That's what I thought, Nodex. Thanks.
[15:53:47] <drag> So there's no more elegant way to do this than just having lots of memory?
[15:54:25] <kizzx2> hey guys, mongostat is reporting 200% locked % and iostat is reporting 99% util, but mongotop shows nothing
[15:54:36] <kizzx2> any pointers how to diagnose this?
[15:54:59] <drag> We are taking UIMA CAS objects and converting them to XMLs using an output stream. However, instead of writing it to a file, I want to write it to a mongodb, Nodex.
[15:56:16] <Nodex> I don't have a clue what a UIMA CAS object is sorry
[15:56:51] <drag> Nodex, just think of it as either really large XML documents or lots of medium size XML documents.
[15:57:51] <darkpassenger> is there such a thing as $not-exist if I want elements that don't contain a certain field?
[15:57:54] <drag> The issue is we are going to be processing lots of documents and I don't want to have to give the box absurd amounts of memory just to hold the objects before they are written out to the db.
[15:58:14] <Nodex> darkpassenger there is $not or $ne
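For completeness, the operator usually used for "field is absent" is $exists; a shell illustration with collection/field names borrowed from the surrounding discussion:

    // Documents that do not contain the field at all:
    db.users.find({ archive: { $exists: false } })

    // Note: { archive: { $ne: "x" } } also matches documents missing the field,
    // but it additionally matches documents where archive holds any other value.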
[18:30:00] <idlemichael> does every mongodb document have an inherent created_at field?
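There is no automatic created_at field, but a default ObjectId _id embeds its creation time; a shell sketch (collection name is a placeholder):

    var doc = db.mycoll.findOne()
    doc._id.getTimestamp()      // creation time embedded in the default ObjectId,
                                // one-second resolution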
[18:31:44] <darkPassenger> how come that this (http://pastebin.com/K9Pbf0un) returns me this : { _id: 51ddb907ae3267d6154a3e64, archive: 'true', nom: 'Girard', prenom: 'Maxime' }
[19:18:02] <revoohc> Is anyone here using chef to deploy MongoDB?
[19:20:18] <drag> Just in case someone here knows. Regarding Spring Data for mongo db. I have an output stream containing xml data that I want to write to MongoDB. The API specifies that I pass the store method an input stream and a file name. 1) What is the purpose of that file name? Isn't MongoDB supposed to take care of all that for me? 2) Does the stream have to contain all the data that will be written, or can it write as data comes?
[19:26:53] <awestwell> if you use aggregate on a collection like the following, do the skip and limit happen after the matches are processed or before? { aggregate: "sample", pipeline: [ { $match: { value: "test" } }, { $match: { groups: { $in: [ "GroupA", "GroupD" ] } } }, { $skip: 39 }, { $limit: 40 } ] }
[19:38:14] <crudson> awestwell: it's in order you specify, but note http://docs.mongodb.org/manual/core/aggregation/#optimizing-performance
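Restating crudson's point as a shell call: stages run in the order written, so both $match stages filter before $skip and $limit apply:

    db.sample.aggregate([
        { $match: { value: "test" } },                            // filter first
        { $match: { groups: { $in: ["GroupA", "GroupD"] } } },    // then filter again
        { $skip: 39 },                                            // drop the first 39 matches
        { $limit: 40 }                                            // keep the next 40
    ])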
[19:53:03] <darkPassenger> what is wrong with that code ? http://pastebin.com/vLiLzKBF it keeps returning document who have field "archive" set to 1 ...
[19:53:12] <darkPassenger> its coffeescript using mongodb-native
[19:53:42] <darkPassenger> can i specify an ID and a condition in a find ?
[19:54:01] <darkPassenger> like get this ID if somefield has specified_value ?
[20:04:56] <starfly> mrapple: if you're going to use replica sets between the US and EU, then you have to either shard or not shard the whole configuration
[20:07:57] <darkPassenger> what is wrong with that code ? http://pastebin.com/vLiLzKBF it keeps returning document who have field "archive" set to 1 ...
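Without seeing the pastebin it is hard to be sure, but the document shown earlier stores archive as the string 'true', and query values are type-sensitive; a shell illustration (collection name is a guess):

    db.users.find({ archive: "true" })   // matches the string "true" only
    db.users.find({ archive: true })     // matches the boolean true only
    db.users.find({ archive: 1 })        // matches numeric 1, not "1" or true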
[20:26:40] <leifw> in pymongo, how can I get a SON object back from a cursor? it looks like if I do "for o in coll.find()" I get python dicts in o, but I need them to be ordered
[20:29:27] <crudson> leifw: how do you want them to be ordered?
[20:29:50] <leifw> I just want to know the field ordering of the documents as they are stored
[20:30:22] <leifw> truth be told I'm trying to read the oplog, and in there, field order matters (for commands)
[20:30:57] <crudson> leifw: and enumerating them doesn't match, say, looking at the same document in mongo shell?
[20:34:43] <leifw> the ordering is the ordering that the python dict's hashtable gives
[20:34:51] <leifw> which is not the ordering on disk
[20:35:43] <leifw> there is a SON python class that basically is an ordered dictionary, for this exact reason, but I don't know how to get the cursor to give me one
[20:35:55] <leifw> considering slinking back, defeated, to the C++ driver
[20:51:01] <crudson> leifw: oplog documents use attributes ts, h, op, ns, o[, o2] - if you can't force pymongo to use ordered dict, just read attributes in a known order?
[20:51:50] <leifw> it's the embedded document in the o field
[20:53:19] <leifw> for commands, the object that's logged is the same object that gets sent over the wire for the command, where the first field is always the name of the command
[20:53:30] <leifw> when I look at it in python, I can't tell which field is the command
[22:30:24] <polynomial2> can one node be part of multiple replica sets?
[22:32:27] <polynomial2> it seems that if you have 6 nodes and 2 replica sets, and one of the replica sets fails, you lose half your shards. but if you were able to make many replica sets with different combinations of servers, three server failures would not result in a 50% loss of shards
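For reference: a single mongod process belongs to exactly one replica set, so getting more sets out of the same machines means running more than one mongod per machine; a sketch with made-up ports and paths:

    //   mongod --replSet rsA --port 27018 --dbpath /data/rsA
    //   mongod --replSet rsB --port 27019 --dbpath /data/rsB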
[22:37:45] <idlemichael> i'm doing this in pymongo (finds records from today) on a collection that's 100GB: c.find({ 'created_at' : { '$gte': d } } ).sort([('created_at', pymongo.DESCENDING)]))
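If that query is slow on 100GB, the usual fix is an index on created_at so both the range filter and the descending sort can use it; a shell sketch (collection name is a placeholder, and "d" mirrors the variable in the pymongo call):

    var d = new Date(new Date().setHours(0, 0, 0, 0))   // midnight today
    db.mycoll.ensureIndex({ created_at: -1 })
    db.mycoll.find({ created_at: { $gte: d } }).sort({ created_at: -1 }).explain()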
[22:45:47] <darkpassenger> i would like to modify a mongodb document via javascript http://pastebin.com/gLje7nwi can anyone help me ?
[22:48:33] <tyl0r> I'm using LVM snapshots every night to populate a standalone development machine with our production data. I'm starting to get "Invalid BSONObj size" errors now when querying the data on the standalone machine. The machine isn't replaying the contents of the journal/ folder of the LVM snapshot when I restart mongo (each night) on that development node. I end up having to run repairDatabase(). Is there a cleaner way of doing this? I'm running 2.2.1
[22:50:10] <retran> you're trying to do something weird
[22:50:29] <retran> are you not doing mongo shutdown before the LVM snapshot?
[22:50:53] <tyl0r> No. I'm not shutting down (or locking) the production node before taking the LVM snapshot
[22:51:25] <retran> then your other mongod is starting from exactly that (unclean) state
[22:51:27] <tyl0r> Taking a snapshot is quick. I guess I could lock or stop the node before snapshotting
[22:54:43] <retran> simplest solution is to just not copy the data files while they're 'hot'
[22:55:11] <tyl0r> Is it a matter of the journal files not working on the standalone node? From what I gather the journal files should work "hot" for a restore
[22:56:25] <tyl0r> I think what you're saying makes sense though; I'll make sure to grab a "clean" snapshot instead. i.e., stop mongo on secondary, create lvm snapshot, start mongo on secondary, do the copy, delete the snapshot
[22:57:57] <retran> i wouldn't spend time being very curious about 'why' it doesn't work
[22:58:04] <retran> just know it's a weird thing to do
[22:59:08] <retran> there's no reason you should expect copying the data files you snatched from a running mongodb to work
[22:59:22] <retran> rather, you should be shocked every time they happen to work
[23:00:06] <tyl0r> Well, no. I wouldn't expect them to. I figured mongo would handle the inconsistency via the journaling
[23:00:37] <retran> well, since you're curious, consider what may be in memory, not yet represented on disk
[23:01:10] <retran> the order it writes whatever information to disk
[23:01:35] <tyl0r> I misinterpreted this from the docs.mongodb.org site: The database must be in a consistent or recoverable state when the snapshot takes place. This means that all writes accepted by the database need to be fully written to disk: either to the journal or to data files.
[23:02:06] <retran> yeah, and it's possible it hasn't been fully written to either one yet
[23:02:28] <tyl0r> I'm convinced. I see the light. :)
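A lighter-weight version of the "lock or stop the node" idea mentioned above, run on the member being snapshotted:

    db.fsyncLock()     // flush pending writes to disk and block new ones
    // ... take the LVM snapshot here ...
    db.fsyncUnlock()   // resume writes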
[23:04:12] <retran> in the mysql world, they have a weird utility that can hot-copy data from a running mysqld
[23:04:28] <retran> it has logic to check for a bunch of things
[23:04:40] <retran> it would be a similar ordeal to do the same in mongo
[23:06:17] <retran> i believe whoever wrote such a utility would have to be incredibly familiar with how/when data is committed to disk, and know what to check for in order to know which items to ignore due to being inconsistent
[23:06:45] <retran> and then, such details would be prone to change with versions of the db