PMXBOT Log file Viewer


#mongodb logs for Thursday the 11th of July, 2013

[00:29:29] <Guest26758> What's the best way to authenticate a replica set?
[00:35:40] <vaq> Hello, I am running a replica shard setup with 3 servers. I understand that if two of the servers are down the last will continue to stay secondary and not accept any writes. How do I circumvent this so that it will elect itself master? Are there any potential problems in doing this, if possible?
[00:36:28] <retran> shards dont contain your full dataset, by definition
[00:36:45] <vaq> I am aware, that's why I run with a replica.
[00:37:02] <retran> all replica are shards?
[00:39:02] <bmw> Hello
[00:40:16] <vaq> retran: to my understanding when doing replica set you have a copy of the db on each of the members
[00:42:02] <vaq> failed with error 10009: "ReplicaSetMonitor no master found for set: set0"
[00:42:17] <vaq> 00:31:23 [rsMgr] replSet can't see a majority, will not try to elect self
[00:42:25] <vaq> why will it not elect itself if it's alone?
[00:43:26] <vaq> how can I create high availability with a replica shard that will result in a working environment if there is only one db server left?
[00:46:11] <retran> to my understanding replication set is distinct from sharding
[00:47:09] <retran> my guess is that you've configured it to prevent itself from becoming a primary in an election
[00:47:46] <vaq> I didn't do anything to prevent that
[00:47:56] <retran> you probably did
[00:47:57] <vaq> I setup normal replica shard with 3 servers like in the doc
[00:48:00] <vaq> then I shutdown two servers
[00:48:04] <retran> what's a replica shard
[00:48:05] <vaq> and last stays secondary
[00:49:09] <retran> a shard can be a replica set, but replica shard makes no sense
[00:49:32] <retran> i'm pretty sure you configured your secondaries to prevent self election
[00:50:00] <retran> when i say "you" don't take it personally, but the machines are configured that way by someone
[00:50:24] <vaq> retran: --shardsvr --replSet set0
[00:50:43] <vaq> retran: db.runCommand({"replSetInitiate" : {"_id" : "set0", "members" : [{"_id" : 1, "host" : "10.10.0.103:27018"}, {"_id" : 2, "host" : "10.10.0.104:27018"}, {"_id" : 3, "host" : "10.10.0.105:27018"}]}})
[00:50:55] <vaq> db.runCommand( { addshard : "set0/10.10.0.103:27018,10.10.0.104:27018,10.10.0.105:27018", name : "shard0" } );
[00:51:03] <vaq> and then I enabled sharding for the database
[00:51:05] <johann_> hello humans i would like to know how to know the length of an array stored in a collection
[00:51:16] <vaq> I run a router and configsvr on each node retran
[00:52:31] <retran> you're conflating the idea of shard and replica set
[00:53:03] <retran> anyway, secondaries wont self-elect if you configure them that way
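The election behavior vaq ran into is MongoDB's majority rule rather than a misconfiguration: a member only stands for election if it can see a strict majority of the set's voting members, itself included. A minimal sketch of that rule in Python (the function name is ours, not a driver API):

```python
def can_elect(visible_members: int, total_voting_members: int) -> bool:
    """A replica-set member only stands for election when it can see a
    strict majority of the set's voting members (itself included)."""
    return visible_members > total_voting_members // 2

# With a 3-member set and 2 members down, the survivor sees only itself:
print(can_elect(1, 3))  # False -> stays secondary, refuses writes
print(can_elect(2, 3))  # True  -> a majority partition can elect a primary
```

With 2 of 3 servers down the survivor sees only 1 of 3 and stays secondary by design; the usual remedies are more members, or an arbiter placed so that at least one surviving partition can still reach a majority.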
[00:54:02] <retran> johann, the length of array of single record in collection?
[01:48:33] <bmw> Hi, Can anyone help me with authentication over a replica set?
[02:23:52] <MasterGberry> Trying to optimize some mongodb queries if possible. A description of what I have and am trying to optimize is here: http://stackoverflow.com/questions/17584055/trying-to-optimize-i-o-for-mongodb
[02:46:42] <dalekurt> exit
[05:58:03] <jgiorgi> what makes a bson document invalid? my document is a hash, a timestamp (float) and a large string (~150KB)
[06:09:53] <nyov> jgiorgi: did you encode it? bson.BSON.encode(document) (python)
[06:10:21] <jgiorgi> nyov, not directly, i called insert on a mongodb collection
[06:11:07] <jgiorgi> think i found it, that string is not what it should be
[06:11:08] <nyov> afaik needs to be an actual bson first (meaning binary encoded)
[06:11:21] <nyov> ok
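For anyone else chasing "invalid BSON" errors like jgiorgi's: the usual culprits are non-string keys, values of types the driver can't encode, or (as comes up later in the log) the 16MB document cap. A rough appside sanity check sketched in Python; this only mimics, and is not, the driver's real validation:

```python
def looks_insertable(doc):
    """Crude pre-flight check: keys must be strings and values must be
    of types a BSON encoder can handle (a deliberately short list here)."""
    ok_types = (str, bytes, int, float, bool, type(None), list, dict)
    return all(isinstance(k, str) and isinstance(v, ok_types)
               for k, v in doc.items())

# jgiorgi's shape -- hash, float timestamp, ~150KB string -- is fine:
print(looks_insertable({"hash": "ab12", "ts": 1373500000.0, "payload": "x" * 150_000}))  # True
print(looks_insertable({"bad": object()}))                                               # False
```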
[06:20:09] <sag> Installing : mongo-10gen-2.4.5-mongodb_1.i686 1/2
[06:20:09] <sag> Error in PREIN scriptlet in rpm package mongo-10gen-server-2.4.5-mongodb_1.i686
[06:20:09] <sag> error: %pre(mongo-10gen-server-2.4.5-mongodb_1.i686) scriptlet failed, exit status 1
[06:20:09] <sag> error: install: %pre scriptlet failed (2), skipping mongo-10gen-server-2.4.5-mongodb_1
[06:20:40] <sag> on CentOS release 6.2 i686
[06:21:04] <sag> its fresh vanilla centos 6.2 instance..
[06:21:25] <sag> can some body tell me where to look
[06:22:44] <sag> from /var/log/messages i can tell mongo-10gen-server-2.4.5-mongodb_1.i686: 100 is the problem
[06:24:44] <sag> :(
[06:29:24] <sag> when i run in verbose mode
[06:29:35] <sag> i mean yum in verbose mode, it gives,
[06:29:36] <sag> Installing : mongo-10gen-2.4.5-mongodb_1.i686 1/2
[06:29:36] <sag> Error in PREIN scriptlet in rpm package mongo-10gen-server-2.4.5-mongodb_1.i686
[06:29:36] <sag> error: %pre(mongo-10gen-server-2.4.5-mongodb_1.i686) scriptlet failed, exit status 1
[06:29:36] <sag> error: install: %pre scriptlet failed (2), skipping mongo-10gen-server-2.4.5-mongodb_1
[06:29:37] <sag> Warning: scriptlet or other non-fatal errors occurred during transaction.
[06:29:37] <sag> What is this? mongo-10gen-server-2.4.5-mongodb_1.i686
[06:29:38] <sag> VerifyTransaction time: 0.815
[06:29:38] <sag> Transaction time: 23.083
[07:01:48] <sag> again, a little more digging..
[07:01:50] <sag> ioctl(1, TIOCGWINSZ, 0xbff0fabb) = -1 ENOTTY (Inappropriate ioctl for device)
[07:01:50] <sag> gettimeofday({1373525717, 523335}, NULL) = 0
[07:01:50] <sag> write(1, "\nFailed:\n mongo-10gen-server.i6"..., 91) = 91
[07:01:50] <sag> | 00000 0a 46 61 69 6c 65 64 3a 0a 20 20 6d 6f 6e 67 6f .Failed: . mongo |
[07:01:51] <sag> | 00010 2d 31 30 67 65 6e 2d 73 65 72 76 65 72 2e 69 36 -10gen-s erver.i6 |
[07:01:51] <sag> | 00020 38 36 20 30 3a 32 2e 34 2e 35 2d 6d 6f 6e 67 6f 86 0:2.4 .5-mongo |
[07:01:52] <sag> | 00030 64 62 5f 31 20 20 20 20 20 20 20 20 20 20 20 20 db_1 |
[07:01:52] <sag> | 00040 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 |
[07:01:53] <sag> | 00050 20 20 20 20 20 20 20 20 20 0a 0a .. |
[07:01:53] <sag> rt_sigaction(SIGQUIT, {0x77f380, [], 0}, {SIG_DFL, [], 0}, 8) = 0
[07:02:28] <sag> this is the strace o/p for installing mongo-10gen-server only..
[07:03:37] <sag> as when i separately install the mongo-10gen package, it gets installed successfully..but mongo-10gen-server failed!!
[07:37:54] <[AD]Turbo> hola
[08:03:50] <Infin1ty> when resyncing a replicaset member from scratch, during the index build, it's not possible to query anything in that instance?
[08:04:25] <double_p> nope.. check rs.status()
[08:04:33] <Infin1ty> double_p, i mean, even db.serverStatus()
[08:04:49] <Infin1ty> double_p, it's now resyncing, running db.serverStatus() simply hangs
[08:05:01] <double_p> oh, internals should work. maybe just lags on CPU overload?
[08:05:22] <Infin1ty> double_p, i have no idea, it's weird, i can't even ssh to the box, first time it happens to me
[08:05:41] <Infin1ty> double_p, i do receive metrics from the server itself, i see the mongod and mongos there are working, disk space goes up on the data directory
[08:05:50] <Infin1ty> double_p, i can db.serverStatus() on the mongos that runs there
[08:07:05] <double_p> best guess, machine overloaded for now. indexing can be intense
[08:09:32] <Infin1ty> double_p, yes, but that doesn't explain why you can't even log in remotely
[08:09:48] <Infin1ty> double_p, this is a very strong server, 2 x E5-2670, 96GB RAM , SSDs
[08:09:58] <Infin1ty> and it runs only mongo
[08:10:07] <double_p> sorry, crystal ball is on vacation ;)
[08:11:15] <sag> Infin1ty, its weird, SSDs and 96GB RAM and still overloaded..
[08:11:38] <sag> how big is the cluster, in TB?
[08:12:20] <Infin1ty> sag, i have done this procedure many many many times, i never had this until i upgraded to 2.2.5
[08:12:28] <Infin1ty> sag, been using 2.0.8 for a while
[08:12:37] <sag> oops..
[08:12:47] <Infin1ty> sag, the cluster in total is around 13TB, this specific member should be around 1.2T at the end
[08:13:28] <sag> infin1ty, if i'm not wrong, versions prior to 2.2 use a global lock!!
[08:13:50] <Infin1ty> sag, what does that have to do with it?
[08:14:06] <Infin1ty> sag, i said, in 2.0.8 everything worked without any issues, in 2.2.5 i see this problem now
[08:14:45] <Infin1ty> well nevermind
[08:15:41] <sag> Infin1ty, even im facing issues in installing 2.2.5
[08:15:54] <sag> nope 2.4.5
[08:16:00] <sag> im using 2.2
[08:16:59] <sag> with GB's of data
[08:19:29] <double_p> 1.2T with 96G ram? mhmmmm... check if "oom-killer" didnt blast you off
[08:20:32] <Infin1ty> double_p, it didn't, my indexes fit well into it, but nevermind :)
[08:23:47] <double_p> :)
[08:25:13] <Nodex> if it's rebuilding an index your data will be locked
[08:25:23] <Nodex> specifically for that collection
[08:27:47] <Infin1ty> anyhow, it seems to be a kernel issue in redhat
[08:28:26] <double_p> hum?
[08:29:13] <Infin1ty> double_p, i reset the server now, seems like a buggy kernel patch; reverting a minor version back seems to fix it
[08:29:23] <Infin1ty> double_p, even a simple command was hanging
[08:30:13] <untaken> is there anything I can do to speed up the sorting of a query? If I have a regex search with no ^ or $, the query takes 0.211389s, but as soon as I sort it by a different field, it takes 1.572594s. Is there anything I can do to speed this up? I have tried indexing, but it doesn't do anything :(
[08:32:27] <Nodex> you'll need a compound index
[08:33:51] <untaken> Nodex: so if the field in the regex is A, and the field I want to order by is B, would it be db.users.ensureIndex( { A: 1, B: 1 } )
[08:35:49] <untaken> because if so, that doesn't work in my case
[08:36:14] <untaken> If I do a search ordering by A, it's fast! but if I order by B or C, it's dog slow
[08:37:34] <Nodex> sorting asc or desc?
[08:40:15] <untaken> Nodex: I have just done a normal sort, I am actually using the perl module ->sort( A => 1)
[08:40:34] <untaken> Nodex: btw, I have also tried db.users.ensureIndex( { A: 1, B: -1 } )
[08:44:01] <untaken> ^ was this: I am actually using the perl module ->sort( B => 1)
[08:44:22] <untaken> I guess the sorting is an issue for me, something mongodb can't sort quickly ;/
[08:44:45] <untaken> I got quite a few records, but the sorting takes way too long
[08:44:47] <Nodex> you're going to need to pastebin an explain()
[08:56:43] <Nodex> I think the root cause of your problem is the lack of a prefix on the regex, and thus it won't use any index. I am not sure if you can hint the index to use for sorting or not, hence the need for an explain()
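Nodex's point about the missing prefix, sketched: B-tree index keys live in sorted order, so a ^-anchored regex becomes a bounded range scan, while an unanchored regex has no usable bounds and must examine every key. A toy illustration over hypothetical keys:

```python
import bisect

# Hypothetical index keys, kept sorted as a B-tree index would store them.
index_keys = sorted(["alice", "alfred", "bob", "carol", "albert", "dave"])

def prefix_range_scan(keys, prefix):
    """A ^prefix regex can be rewritten as a bounded range scan over the
    sorted keys: [prefix, prefix + '\xff'). Only that slice is touched."""
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_left(keys, prefix + "\xff")
    return keys[lo:hi]

print(prefix_range_scan(index_keys, "al"))  # only the 'al*' slice is read
# An unanchored regex (no ^) has no bounds: every key must be examined.
```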
[09:01:59] <traceroute> hey, is there a way to get a slave/shard to register back to the routing server? My shards will be behind a firewall and not directly accessible from the internet.
[09:02:04] <untaken> Nodex: ok thanks for that. I'll keep fiddling, and if I am still struggling will provide this information.
[10:55:53] <untaken> is there a way to make .find({some query}).count() quicker?
[10:56:28] <ron> which version of mongo do you use?
[10:56:55] <untaken> ron: 2.2.1 I think
[10:57:10] <ron> then upgrade :) that would make it quicker.
[10:57:14] <untaken> thanks
[10:57:14] <ron> I think.
[10:57:19] <untaken> haha :)
[10:58:05] <ron> untaken: http://docs.mongodb.org/manual/release-notes/2.4-overview/ check the "Improved Performance" section
[10:58:11] <ron> second point.
[10:58:24] <untaken> thanks, I'll take a look
[11:00:14] <untaken> ahh, it's 2.4.3 sorry
[11:00:30] <untaken> so, more than likely have those performance tweaks there already
[11:00:42] <ron> yeah. possibly an indexing issue.
[11:01:19] <untaken> no no no...
[11:01:19] <untaken> tell a lie
[11:01:21] <untaken> the client I am using is 2.4.
[11:01:29] <untaken> using db.runCommand("buildInfo") says 2.2
[11:01:32] <ron> who cares about the client?
[11:02:06] <untaken> exactly
[11:02:12] <untaken> thanks ron
[11:02:18] <ron> no problem
[11:16:41] <Nodex> untaken : count() is terribly slow, I don't think it will ever be as fast as people want it
[11:18:08] <Senor> To develop one tcp server ,which design is better ? multithread and multiprocess
[11:18:52] <Nodex> ask in #tcp
[11:20:02] <untaken> Nodex: yea, I don't think it's making much difference
[11:20:46] <Nodex> internal counts are fast because they use internal counters
[11:21:00] <Nodex> queried counts don't have that luxury
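The distinction Nodex draws, sketched appside: a whole-collection count can be served from a maintained counter (effectively free), while a count over a query has no such counter and must examine candidate documents:

```python
# Simulated collection: a maintained total vs. a filtered count.
docs = [{"x": i % 2} for i in range(10_000)]

total = len(docs)                                # "internal counter": O(1)
filtered = sum(1 for d in docs if d["x"] == 1)   # queried count: a scan

print(total, filtered)  # 10000 5000
```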
[11:23:58] <sikor_sxe> hello, i have a doc with a start & end Date field, now i'd like to ensure that the end Date is not before the start one. is there a mechanism in mongodb to do that (a special index or something) or do i have to handle that in the application logic?
[11:26:56] <Nodex> app logic
[11:41:26] <untaken> Nodex: what is annoying is that if I was going to implement paging for some webpage, the first 50 is quick, but in order to find the total results, I need to do the ->count as I described above, which then slows it all down. The count would be to confirm the total entries for the query. I know this wouldn't have to run for each page, but it is annoyingly slow for the first page
[11:43:07] <ron> you can always use an external indexing system.
[11:44:03] <untaken> so two queries could go two different places?
[11:44:21] <ron> sure
[11:44:35] <untaken> hmm, food for thought
[13:07:32] <nyov> would setting up a replicaset with mixed 32bit and 64bit machines work? or would there be issues with datatypes (64/32bit int's?)
[13:08:11] <Zelest> 32bit should be avoided completely imho.
[13:09:23] <nyov> for what reasons?
[13:09:58] <algernon> nyov: http://blog.mongodb.org/post/137788967/32-bit-limitations
[13:22:07] <nyov> ah yes, i remember reading that before ;) I suppose it got lost on me
[13:23:16] <traceroute> hey I was in earlier. I want to have mongo shards which have to sit behind a router (and are not directly accessible via the internet). From what I've read, my server on the 'cloud' has to register each shard. Is there a way to do it backwards so the shards register with the 'master' first?
[13:30:17] <mjburgess> does any one know how i can query for stats using the c driver?
[13:30:27] <mjburgess> *collection stats
[13:37:13] <nyov> is there a possibility to get an oplog without replication? or some possibility to emulate it?
[13:50:06] <dash_> hello, how to select from huge database every Nth record? (maybe Map Reduce)
[13:50:46] <mjburgess> ( the answer for future reference is to use runCommand() with the corresponding run_command in mongo.h )
[13:55:06] <Nodex> dash_ : that's an appside problem really
[13:55:30] <Nodex> I would construct something in my app and use skip and limit for it
[13:56:05] <ron> depends how big the skip is though, no?
[13:58:37] <dash_> Nodex: yep it should work, I will create a counter on every item and then run a query similar to db.inventory.find( { qty: { $mod: [ 4, 0 ] } } ) I am wondering about the speed of this … what do you think?
[14:00:26] <Nodex> up to you, depends if you want every 4th document of the ENTIRE collection or not
[14:02:54] <dash_> in worst case I would like to select every 1000th document from 1 000 000 documents
[14:03:21] <dash_> resulting in 1000 documents
[14:05:07] <dash_> 1000 documents at 50 bytes each, so it would be 50 KB
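dash_'s plan, sketched appside: stamp every document with an incrementing counter, then select every Nth one with the equivalent of `{counter: {$mod: [N, 0]}}` (with an index on the counter field, the server still has to test each candidate, but the result-set math checks out):

```python
# Simulated collection with an incrementing counter field.
docs = [{"_id": i, "counter": i} for i in range(1_000_000)]

# Equivalent of db.inventory.find({counter: {$mod: [N, 0]}}):
N = 1000
every_nth = [d for d in docs if d["counter"] % N == 0]

print(len(every_nth))  # 1000 documents, matching dash_'s estimate
```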
[14:38:05] <EricL> Is there any way to reclaim space from Mongo that doesn't require many hours of down time waiting for a global write lock to be released?
[14:38:46] <EricL> Had a process go rogue and created a lot of records. Cleaned it up, now Mongo won't give back the space.
[14:40:29] <jmar777> EricL: are you running a replica set?
[14:41:17] <Nodex> you can compact but it's not advised on a live database
[14:42:32] <EricL> jmar777: Yes. I was hoping to not have to take down each node at a time to run the repair.
[14:42:42] <EricL> Especially because we are talking about hundreds of gigs.
[14:43:05] <EricL> I also don't have that much available space on the drives.
[14:43:42] <kizzx2> hey guys
[14:43:49] <EricL> In other words, I don't have 2x available of the data on disk size of Mongo's files.
[14:43:56] <kizzx2> iotop reports that mongodb is using 50MB/s disk usage, but mongotop shows all 0ms
[14:44:11] <jmar777> EricL: do you actively read against the slaves?
[14:44:12] <kizzx2> any suggestions how can i debug this?
[14:44:18] <EricL> jmar777: Yes.
[14:45:40] <jmar777> EricL: hrmm... not sure what the best approach is to stop slave reads temporarily, but if you start with that you can run compact on the slaves, and then promote one of them to primary while you run compact on the (former) primary node
[14:46:05] <EricL> jmar777: Even if I do that, I still don't have enough drive space according to the Mongo docs.
[14:46:07] <jmar777> EricL: ... and then re-promote it to primary, if that makes sense
[14:46:18] <jmar777> EricL: to run the compact?
[14:46:43] <EricL> Compact doesn't give space back to the OS.
[14:46:56] <Nodex> yes it does
[14:47:11] <EricL> Nodex: Not according to this: http://docs.mongodb.org/manual/faq/storage/
[14:47:17] <EricL> Are the Mongo docs incorrect?
[14:47:48] <EricL> Line: Important compact only removes fragmentation from MongoDB data files and does not return any disk space to the operating system.
[14:48:09] <Nodex> I stand corrected
[14:48:14] <jmar777> EricL: ya.. .just saw that actually
[14:48:29] <EricL> So pretty much Mongo f'd me.
[14:48:41] <jmar777> EricL: looks like repairDatabase() might do it... reading up now
[14:48:43] <EricL> I have to add a new replica to the set and drop the shitty one?
[14:48:59] <EricL> I don't think I have enough disk space to run repairDatabase thanks to Mongo.
[14:49:12] <Nodex> no your app f'd you
[14:49:30] <EricL> Right, but Mongo doesn't have sufficient tools to let me recover.
[14:49:50] <EricL> This is definitely going into my next Mongo talk.
[14:49:53] <Nodex> it does, you just dont have sufficient disk space to let them run
[14:50:45] <Nodex> repairDatabase() is what I thought compact was, seems I was mistaken on the name
[14:52:44] <jmar777> The problem may be avoidable, but Mongo would definitely benefit from an in-place repair
[14:53:20] <jmar777> or, alternatively, make compact work the way it sounds like it should (which is already in-place)
[14:53:48] <jmar777> EricL: what's your storage look like - EC2 + EBS by any chance?
[14:54:02] <EricL> EC2, not EBS.
[14:54:35] <EricL> I've had this conversation with a bunch of the 10gen folks a few times. You nearly always get better bang for your buck with instance storage than PIOPS.
[14:54:46] <mrapple_> picking a shard key is quite nerve racking
[14:55:09] <EricL> mrapple_: In the worst case scenario, you can always hash _id since it's monotonic.
[14:55:24] <EricL> You'll still get solid distribution.
[14:55:32] <mrapple_> but poor querying perhaps?
[14:55:45] <EricL> mrapple_: It depends on your queries.
[14:55:48] <jmar777> EricL: ya, but snapshots and situations like your current one are a much bigger pain
[14:56:06] <mrapple_> let's say i have a collection and the two most common queries on that collection use separate keys
[14:56:12] <mrapple_> is the shard key a combination of those keys?
[14:56:28] <EricL> jmar777: I've been using Mongo since 0.4. This is the first time I've run into this situation.
[14:56:41] <jmar777> EricL: with provisioned IOPs, EBS is a pretty decent solution now unless you're doing a lot of long sequential reads
[14:57:18] <jmar777> EricL: EBS can actually beat ephemeral for random reads... but more or less sucks for sequential
[14:57:26] <EricL> jmar777: PIOPS isn't bad, just expensive and not usually worth the extra money.
[14:57:41] <EricL> It was roughly equal for our random read tests and worse for sequential.
[14:58:58] <jmar777> EricL: that sounds right. anyway... back to the problem at hand, I don't really see any good options other than to export/import to a larger fs
[14:59:16] <kizzx2> mrapple_: you should design it so that shards have a uniform distribution
[14:59:32] <jmar777> EricL: although, granted, we're getting to the limit of my Mongo Ops knowledge, so I wouldn't take that as gospel
[14:59:48] <kizzx2> the bad extreme example is using date as a shard key
[15:00:19] <EricL> jmar777: Ok, thanks anyway.
[15:00:19] <jmar777> EricL: FWIW, I'd recommend EBS for the new volume if it's performant enough for you... makes various operational scenarios MUCH easier to deal with
[15:00:24] <kizzx2> (unless date is uniformly accessed in your application)
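EricL's fallback of hashing a monotonic `_id`, sketched: hashing scatters consecutive inserts across shards instead of piling them onto one hot chunk. md5 here merely stands in for whatever hash function the cluster actually uses:

```python
import hashlib

def shard_for(doc_id: str, n_shards: int) -> int:
    """Route a document to a shard by hashing its (monotonic) id."""
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % n_shards

counts = [0, 0, 0]
for i in range(30_000):              # monotonically increasing ids
    counts[shard_for(str(i), 3)] += 1

print(counts)  # roughly 10_000 per shard: uniform, no hot shard
```

The trade-off mrapple_ suspects is real: range queries on `_id` now fan out to every shard, so a hashed key pays off mainly when queries target single documents or scatter-gather is tolerable.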
[15:00:47] <EricL> jmar777: It's not even close to performant enough.
[15:00:55] <jmar777> EricL: :\
[15:13:57] <buzzedword> morning guys. had a question re: performance of mongo through the PHP driver, seems pretty straightforward but i'm doing due dil
[15:14:52] <buzzedword> which one of these two options performs better/scales better? 1) performing a multiselect query 2) selecting multiple single items, joining through PHP
[15:15:47] <buzzedword> i currently have a very small sample set to test for, and this use case is going to grow substantially, so i'm not exactly seeing much of a difference with either side. at the same time, there's not a lot of load thrown against it either
[15:16:26] <Derick> buzzedword: depends on how much memory you want to put to it more than speed
[15:16:49] <Derick> and how complex your query is...
[15:17:23] <buzzedword> Derick: it's fairly straightforward. nothing complex at all, literally selecting by one field, and on top of that it's indexed already
[15:17:30] <buzzedword> so a very straightforward select
[15:18:14] <buzzedword> when you say how much memory i want to put to it, i'm assuming you're talking about joining via PHP? or am i mistaken here
[15:21:16] <Derick> you're correct
[15:22:45] <buzzedword> we want this to be preferably as low memory as possible whilst performing as quickly as possible. i'd take low memory consumption over speed if it came down to it though
[15:23:53] <Derick> then do the complex query on mongodb... but you really ought to benchmark this
[15:27:57] <buzzedword> Derick: I am benchmarking-- they're currently breaking even with performance. my sample set isn't large enough
[15:28:46] <buzzedword> it's going to get massive, and i don't have access to that kind of scaffolding environment yet-- still being set up. Due dil :)
[15:29:33] <buzzedword> just wanted to get an idea where the wind blows on this particular issue
[15:33:26] <Nodex> zerick = Derick 's evil twin ?? :P
[15:33:42] <Derick> hehe
[15:33:49] <Zelest> Zelest + Derick = zerick
[15:33:58] <Derick> okay, creepy
[15:34:03] <Zelest> Haha
[15:34:08] <zerick> Maybe
[15:34:09] <Nodex> Zelest = creepy
[15:34:13] <Zelest> :D
[15:34:16] <Zelest> Zelest
[15:34:17] <Zelest> Molest
[15:34:18] <Zelest> Incest
[15:34:20] <Zelest> go figure..
[15:34:21] <Zelest> ;)
[15:34:37] <Nodex> Zelest : http://www.youtube.com/watch?v=g-O3wl4VvFc&feature=youtu.be
[15:34:49] <Nodex> watch in 1080p if you can
[15:36:22] <Zelest> "if you can" ?!
[15:36:42] <Zelest> wow, looks awesome
[15:37:02] <Zelest> all mongo and php?
[15:39:00] <Nodex> mongo, php, solr
[15:39:07] <Derick> minus points fo using GG
[15:39:08] <Nodex> and some nodex javascript magic
[15:39:15] <Nodex> GG ?
[15:39:25] <Derick> googlemaps
[15:39:41] <Nodex> It's the only map service I built the tool for
[15:40:38] <Nodex> turns out MongoDB is an awesome configuration store, I store the whole config of every part of the app inside mongodb
[15:41:08] <Nodex> builds all the queries for SOLR, maps mongodb to solr, there is nothing it can't do (that I've found) so far, and this is one complex app
[15:41:55] <Nodex> 30k lines of PHP lmao
[15:42:23] <rendar> what is sorl?
[15:42:49] <Nodex> a search daemon
[15:42:54] <Derick> solr
[15:42:58] <Nodex> !google Solr
[15:42:59] <pmxbot> http://lucene.apache.org/solr/ - Apache Lucene - Apache Solr
[15:43:35] <Nodex> latest versions make good use of Json also so now it's really really lightweight
[15:43:50] <rendar> oh i see
[15:47:17] <drag> Hi. I have large DOM objects that I want to store to a mongodb. Can I use buffers to write to mongodb or do I have to wait for the entire DOM to be convereted to XML in memory before writing it out?
[15:48:04] <Nodex> you can't write in parts if that's what you're asking
[15:48:16] <Nodex> writing yields a lock so nothing else can touch it
[15:53:31] <drag> That's what I thought Nodex. Thanks.
[15:53:47] <drag> So there's no more elegant way to do this than just having lots of memory?
[15:54:07] <Nodex> how big is your DOM?
[15:54:12] <Nodex> no homo
[15:54:25] <kizzx2> hey guys, mongostat is reporting 200% locked % and iostat is reporting 99% util, but mongotop shows nothing
[15:54:36] <kizzx2> any pointers how to diagnose this?
[15:54:59] <drag> We are taking UIMA CAS objects and converting them to XMLs using an output stream. However, instead of writing it to a file, I want to write it to a mongodb, Nodex.
[15:56:16] <Nodex> I don't have a clue what a UIMA CAS object is sorry
[15:56:51] <drag> Nodex, just think of it as either really large XML documents or lots of medium size XML documents.
[15:57:51] <darkpassenger> is there such a thing as $not-exist if I want elements that don't contain a certain field ?
[15:57:54] <drag> The issue is we are going to be processing lots of documents and I don't want to have to give the box absurd amounts of memory just to hold the objects before they are written out to the db.
[15:58:14] <Nodex> darkpassenger there is $not or $ne
[15:58:34] <Derick> darkpassenger: sure, just use: field: { $exists: false }
[15:58:44] <Nodex> or that
[15:58:55] <Nodex> drag: define large document
[15:59:07] <darkpassenger> thanks !
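Derick's `{field: {$exists: false}}` answer, simulated appside: it matches documents where the key is absent entirely, regardless of what value present keys hold (which is why it differs from `$ne`):

```python
# Two documents: one carries the 'archived' field, one does not.
docs = [{"name": "a", "archived": True}, {"name": "b"}]

# Equivalent of find({archived: {$exists: false}}): key absence, not value.
missing = [d for d in docs if "archived" not in d]

print([d["name"] for d in missing])  # ['b']
```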
[15:59:12] <Nodex> there is a 16mb cap per doc currently anyway
[16:00:52] <drag> Nodex, we will have some that are going to be bigger than 16mb.
[16:02:19] <Nodex> then you will need to break them up because they wont be inserted if they're larger
[16:06:03] <drag> ok, thanks Nodex.
[17:48:16] <idlemichael> does anyone use vi key bindings, like "set -o vi" inside their mongo shell?
[17:48:34] <idlemichael> i've been trying to add it to my various ~/.*rc files, but no luck
[18:29:06] <maxime_> heya all
[18:30:00] <idlemichael> does every mongodb document have an inherent created_at field?
[18:31:44] <darkPassenger> how come that this (http://pastebin.com/K9Pbf0un) returns me this : { _id: 51ddb907ae3267d6154a3e64, archive: 'true', nom: 'Girard', prenom: 'Maxime' }
[18:35:44] <darkPassenger> yep..
[18:39:55] <darkPassenger> is there anything particular about retrieving a particular item based on ID and a second criterion
[18:54:46] <primitive> how can I add a field to the results of a $group operation without grouping by that field?
[19:13:10] <drag> Does this channel support Spring Data for MongoDB?
[19:13:51] <darkPassenger> man
[19:13:54] <darkPassenger> i wouldnt know...
[19:18:02] <revoohc> Is anyone here using chef to deploy MongoDB?
[19:20:18] <drag> Just in case someone here knows. Regarding Spring Data for mongo db. I have an output stream containing xml data that I want to write to MongoDB. The API specifies that I pass the store method an input stream and a file name. 1) What is the purpose of that file name? Isn't MongoDB supposed to take care of all that for me? 2) Does the stream have to contain all the data that will be written or can it write as data comes?
[19:25:45] <awestwell> Hey all, small question
[19:26:53] <awestwell> if you use aggregate on a collection as follows, does the skip and limit happen after the matches are processed or before? { aggregate: "sample", pipeline: [ { $match: { value: "test" } }, { $match: { groups: { $in: [ "GroupA", "GroupD" ] } } }, { $skip: 39 }, { $limit: 40 } ] }
[19:38:14] <crudson> awestwell: it's in the order you specify, but note http://docs.mongodb.org/manual/core/aggregation/#optimizing-performance
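crudson's answer, simulated appside: pipeline stages apply in the order listed, so `$skip`/`$limit` only ever see documents that survived both `$match` stages. A sketch of awestwell's exact pipeline over hypothetical documents:

```python
# 100 documents that all satisfy both $match stages.
docs = [{"value": "test", "groups": ["GroupA"], "n": i} for i in range(100)]

matched = [d for d in docs if d["value"] == "test"]          # $match #1
matched = [d for d in matched
           if set(d["groups"]) & {"GroupA", "GroupD"}]       # $match #2 ($in)
page = matched[39:39 + 40]                                   # $skip 39, $limit 40

print(len(page), page[0]["n"])  # 40 documents, starting at the 40th match
```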
[19:53:03] <darkPassenger> what is wrong with this code ? http://pastebin.com/vLiLzKBF it keeps returning documents that have the field "archive" set to 1 ...
[19:53:12] <darkPassenger> its coffeescript using mongodb-native
[19:53:42] <darkPassenger> can i specify an ID and a condition in a find ?
[19:54:01] <darkPassenger> like get this ID if somefield has specified_value ?
[19:54:33] <darkPassenger> help
[19:54:35] <darkPassenger> lol
[19:54:45] <ron> lol
[19:55:02] <mrapple_> so, i have servers in the US and EU
[19:55:17] <mrapple_> i want to have a sharded setup in the US, but no sharding in EU
[19:55:25] <mrapple_> how's this setup work?
[19:55:27] <ron> is it because EU sucks?
[19:55:36] <mrapple_> we don't have enough SSDs/servers there
[19:55:45] <mrapple_> I just want to replicate all the data there and ignore sharding
[19:56:02] <mrapple_> aka I want instant reads, but writes taking some time to get to the US are fine
[19:56:34] <darkPassenger> man
[19:56:57] <mrapple_> thoughts?
[20:00:49] <darkPassenger> nah sorry bro
[20:04:56] <starfly> mrapple: if you're going to use replica sets between the US and EU, then you have to either shard or not shard the whole configuration
[20:07:57] <darkPassenger> what is wrong with this code ? http://pastebin.com/vLiLzKBF it keeps returning documents that have the field "archive" set to 1 ...
[20:09:39] <darkPassenger> man
[20:26:40] <leifw> in pymongo, how can I get a SON object back from a cursor? it looks like if I do "for o in coll.find()" I get python dicts in o, but I need them to be ordered
[20:29:27] <crudson> leifw: how do you want them to be ordered?
[20:29:50] <leifw> I just want to know the field ordering of the documents as they are stored
[20:30:22] <leifw> truth be told I'm trying to read the oplog, and in there, field order matters (for commands)
[20:30:57] <crudson> leifw: and enumerating them doesn't match, say, looking at the same document in mongo shell?
[20:34:26] <leifw> nope
[20:34:43] <leifw> the ordering is the ordering that the python dict's hashtable gives
[20:34:51] <leifw> which is not the ordering on disk
[20:35:43] <leifw> there is a SON python class that basically is an ordered dictionary, for this exact reason, but I don't know how to get the cursor to give me one
[20:35:55] <leifw> considering slinking back, defeated, to the C++ driver
[20:51:01] <crudson> leifw: oplog documents use attributes ts, h, op, ns, o[, o2] - if you can't force pymongo to use ordered dict, just read attributes in a known order?
[20:51:50] <leifw> it's the embedded document in the o field
[20:53:19] <leifw> for commands, the object that's logged is the same object that gets sent over the wire for the command, where the first field is always the name of the command
[20:53:30] <leifw> when I look at it in python, I can't tell which field is the command
[20:53:35] <leifw> so I can't use it
[20:54:15] <leifw> meh, it's fine I'll do it in C++
[20:54:37] <crudson> leifw: that sounds pretty fundamental! Surely there is a way to tell the driver to use an ordered hash.
[20:55:47] <leifw> that's what I thought, but I can't find anything
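What leifw loses with plain dicts, sketched: for oplog command entries the first key of the `o` subdocument names the command, so the decoder must preserve insertion order (the SON class he mentions is an ordered mapping for exactly this reason; the stdlib OrderedDict stands in for it here):

```python
from collections import OrderedDict

# A hypothetical decoded oplog 'o' field for a command entry: by convention
# the FIRST key is the command name, so key order carries meaning.
op = OrderedDict([("drop", "mycollection"), ("someOption", 1)])

command_name = next(iter(op))
print(command_name)  # 'drop' -- unrecoverable if the decoder scrambles keys
```

On the Python 2 of this era plain dicts did not preserve insertion order, which is why reading the oplog through a dict-returning cursor loses the command name.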
[20:58:23] <tystr_> at what point should I become concerned about the global lock %
[20:59:36] <tystr_> ours is hovering between 6% and 8%
[20:59:51] <tystr_> but we haven't noticed performance issues yet
[21:30:00] <saml> hey how can I update a field of all documents?
[21:30:18] <saml> db.coll.update({}, {$set: {field: {}}}) doesn't update any doc
[21:32:44] <saml> db.Images.update({isDirectory: false}, {$set: {metadata: {raw: {}}}})
[21:35:40] <leifw> need a third option to update
[21:35:57] <leifw> db.coll.update({}, {$set: {field:{}}}, {multi:true})
[21:36:08] <leifw> otherwise it just updates the first doc that matches that query
[21:37:11] <saml> oh thanks leifw
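leifw's fix for saml, simulated appside: without the multi option, an update touches only the first matching document; with `multi:true` it touches them all:

```python
def update(coll, query, new_fields, multi=False):
    """Mimics collection.update(): apply $set-style fields to matches,
    stopping after the first match unless multi is set."""
    modified = 0
    for doc in coll:
        if all(doc.get(k) == v for k, v in query.items()):
            doc.update(new_fields)
            modified += 1
            if not multi:
                break
    return modified

docs = [{"isDirectory": False} for _ in range(3)]
one = update(docs, {"isDirectory": False}, {"metadata": {"raw": {}}})
docs = [{"isDirectory": False} for _ in range(3)]
all_of_them = update(docs, {"isDirectory": False}, {"metadata": {"raw": {}}}, multi=True)

print(one, all_of_them)  # 1 3
```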
[21:47:03] <mrapple> starfly: but i don't want to have some data in the EU and some data in the US
[21:47:24] <mrapple> i want to have a copy of data in both places, but have the US copy sharded
[21:47:29] <mrapple> unless there's a better way to set this up
[21:48:16] <retran> shards are distinct from relica sets
[21:48:41] <retran> your shard can be distributed into a replica set
[21:48:52] <retran> different concepts
[21:49:39] <mrapple> so what i'm thinking of is not possible then
[21:49:49] <mrapple> what do you advise, to have speedy reads in the EU?
[21:49:58] <mrapple> but not shard data there
[21:52:26] <retran> yes it's possible
[21:52:53] <retran> you create a shard, store the shard in a replica set
[21:53:12] <retran> have a node in EU
[21:53:17] <retran> of the replica set
[21:54:17] <mrapple> if i have three shards in the US, i need three replicas in EU then? one for each shard?
[21:54:25] <mrapple> i guess that makes sense
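[Editor's note] the topology retran describes — each US shard backed by a replica set with one member in the EU — can be illustrated with a replica set config document (host names are made up). Giving the EU member priority 0 means a transatlantic election can never make it primary, while EU clients can still read from it with a secondary read preference:

```python
# Illustrative config for ONE shard's replica set; repeat per shard.
# In the mongo shell this document is passed to rs.initiate()
# (or rs.reconfig()) on one of the members.
rs_config = {
    "_id": "shard0",
    "members": [
        {"_id": 0, "host": "us-a.example.com:27017", "priority": 1},
        {"_id": 1, "host": "us-b.example.com:27017", "priority": 1},
        # priority 0: can serve reads in the EU, can never become primary
        {"_id": 2, "host": "eu-a.example.com:27017", "priority": 0},
    ],
}
eu = [m for m in rs_config["members"] if m["host"].startswith("eu-")]
print(len(eu), eu[0]["priority"])  # 1 0
```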
[22:30:24] <polynomial2> can one node be part of multiple replica sets?
[22:32:27] <polynomial2> it seems that if you have 6 nodes and 2 replica sets. if one of the replica sets fails you lose half your shards. but if you were able to make many replica sets with different combinations of servers, three servers failing would not result in 50% loss of shards
[22:37:45] <idlemichael> i'm doing this in pymongo (finds records from today) on a collection that's 100GB: c.find({ 'created_at' : { '$gte': d } } ).sort([('created_at', pymongo.DESCENDING)])
[22:38:02] <idlemichael> it's on an m2.xlarge
[22:38:05] <idlemichael> taking forever
[22:38:07] <idlemichael> is that standard?
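[Editor's note] on 100 GB that `$gte` filter plus sort will scan the collection unless `created_at` is indexed — `c.ensure_index([('created_at', pymongo.DESCENDING)])` in the 2013-era pymongo API (`create_index` in current versions) lets both the range match and the sort use the index. What the index buys can be sketched with a sorted key list and binary search:

```python
import bisect

# Toy model of a B-tree index on created_at: a sorted list of keys.
# A $gte query becomes an O(log n) seek plus a scan of only the
# matching tail, instead of a full collection scan.
timestamps = list(range(0, 1_000_000, 10))  # pretend created_at values
d = 999_000                                 # "today"

lo = bisect.bisect_left(timestamps, d)      # O(log n) seek
recent = timestamps[lo:]                    # only the matching range
print(len(recent))  # 100 keys touched, not 100,000
```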
[22:45:47] <darkpassenger> i would like to modify a mongodb document via javascript http://pastebin.com/gLje7nwi can anyone help me ?
[22:48:33] <tyl0r> I'm using LVM snapshots every night to populate a standalone development machine with our production data. I'm starting to get "Invalid BSONObj size" errors now when querying the data on the standalone machine. The machine isn't replaying the contents of the journal/ folder of the LVM snapshot when I restart mongo (each night) on that development node. I end up having to run repairDatabase(). Is there a cleaner way of doing this? I'm running 2.2.1
[22:50:10] <retran> you're trying to do something weird
[22:50:29] <retran> are you not doing mongo shutdown before the LVM snapshot?
[22:50:53] <tyl0r> No. I'm not shutting down (or locking) the production node before taking the LVM
[22:51:25] <retran> then your other mongodb should start exactly as that
[22:51:27] <tyl0r> Taking a snapshot is quick. I guess I could lock or stop the node before snapshotting
[22:51:42] <retran> oops
[22:51:44] <retran> i read you wrong
[22:51:57] <retran> yeah you need to shut down mongodb first for what you describe to work reliably
[22:52:05] <tyl0r> hmmm...
[22:52:40] <retran> if you do it while it's on, you're trying to re-implement replication without fully re-implementing it
[22:52:54] <retran> that is strange to do on any db system
[22:53:22] <tyl0r> yeah, I was hoping I could copy the journal/ folder over to the new machine, and have mongodb replay that folder
[22:53:39] <retran> you would have to write a fancy routine/utility for that
[22:53:49] <retran> that 'knows' the inner workings of mongo
[22:54:01] <retran> that doesn't sound trivial
[22:54:43] <retran> simplest solution is just don't copy the data files while they're 'hot'
[22:55:11] <tyl0r> Is it a matter of the journal files not working on the standalone node? From what I gather the journal files should work "hot" for a restore
[22:56:25] <tyl0r> I think what you're saying makes sense though; I'll make sure to grab a "clean" snapshot instead. i.e., stop mongo on secondary, create lvm snapshot, start mongo on secondary, do the copy, delete the snapshot
[22:57:57] <retran> i wouldn't spend time being very curious about 'why' it doesn't work
[22:58:04] <retran> just know it's a weird thing to do
[22:59:08] <retran> there's no reason you should expect copying the data files you snatched from a running mongodb to work
[22:59:22] <retran> rather, you should be shocked every time they happen to work
[23:00:06] <tyl0r> Well, no. I wouldn't expect them to. I figured mongo would handle the inconsistency via the journaling
[23:00:37] <retran> well, since you're curious, consider what may be in memory, not yet represented on disk
[23:01:10] <retran> the order it writes whatever information to disk
[23:01:35] <tyl0r> I misinterpreted this from the docs.mongodb.org site: The database must be in a consistent or recoverable state when the snapshot takes place. This means that all writes accepted by the database need to be fully written to disk: either to the journal or to data files.
[23:02:06] <retran> yeah and it's possible it's not to either
[23:02:13] <retran> if you snatch it randomly
[23:02:21] <tyl0r> yeah, gotcha
[23:02:28] <tyl0r> I'm convinced. I see the light. :)
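[Editor's note] the clean-snapshot routine tyl0r settles on (stop mongod, snapshot, restart, copy) works; the MongoDB backup documentation also allows a hot snapshot if you flush-and-lock first with the admin command `{fsync: 1, lock: true}`, released afterwards via the shell helper `db.fsyncUnlock()` (or, on 3.2+ servers, the `fsyncUnlock` command). A sketch of the ordered steps as data — the lvcreate invocation is a made-up example:

```python
# Consistent hot snapshot via MongoDB's fsync lock, as an ordered plan.
# The admin-command document is real; host-side commands are illustrative.
FSYNC_LOCK = {"fsync": 1, "lock": True}  # flush all writes, block new ones

def snapshot_plan(lvm_cmd):
    """Return the steps, in order, for a consistent LVM snapshot."""
    return [
        ("mongo admin command", FSYNC_LOCK),
        ("host shell", lvm_cmd),
        ("mongo shell helper / admin command", "fsyncUnlock"),
    ]

plan = snapshot_plan("lvcreate --snapshot --size 1G --name mdb-snap /dev/vg0/mongodb")
print(len(plan))  # 3
```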
[23:04:12] <retran> in the mysql world, they have a weird utility that can hot copy data from a running mysqld
[23:04:28] <retran> it has logic to check for a bunch of things
[23:04:40] <retran> it would be a similar ordeal to do the same in mongo
[23:06:17] <retran> i believe whoever wrote such a utility would have to be incredibly familiar with how/when data is committed to disk, and know what to check for in order to know which items to ignore due to being inconsistent
[23:06:45] <retran> and then, such details would be prone to change with versions of the db