[02:15:04] <ketema> mrpro: it's just a bit.ly shortened link; kjkh are my initials
[02:53:29] <jwilliams> we get the following error: Assertion: 13282: Couldn't load a valid config after 3 attempts. The user group suggests that upgrading to 1.8 would fix the problem (this seems to be a 1.6 bug), but we use 2.0.1.
[02:53:56] <jwilliams> Any other possible root cause?
[05:00:17] <meson10> How does connection pooling work? Is it suggested to create a new connection per request, or should it be cached at module level and reused?
[05:11:15] <andoriyu> is there a way to make a bulk update?
[05:11:15] <andoriyu> (not just update all documents with one function, but update each document separately with different values)
[05:12:57] <meson10> andoriyu: No. You can look at the aggregation framework, or write server side JS code.
[05:23:41] <andoriyu> meson10, nah, I will just update each document separately
[05:28:53] <whatasunnyday> Hi! I was reading the mongo manual and I have a simple question. If I make an index that is unique and I insert a document that has that field repeated, will it not be inserted into the collection? Will it return an error?
[05:29:18] <whatasunnyday> Or will it appear stored in the collection but not accessible unless I remove the index?
[05:34:27] <whatasunnyday> Actually, I understand now.
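For the record, the answer whatasunnyday presumably found: with a unique index, inserting a document that duplicates an existing value fails with a duplicate key error (E11000), and the document is not stored at all. A toy pure-Python model of that behavior (a sketch, not the server's implementation; the field name "email" is made up):

```python
# Sketch (not pymongo): simulate a collection with a unique index on "email".
# A duplicate insert raises, mirroring Mongo's duplicate key error (E11000).

class DuplicateKeyError(Exception):
    pass

class UniqueIndexedCollection:
    def __init__(self, unique_field):
        self.unique_field = unique_field
        self._index = set()      # values already seen for the unique field
        self._docs = []

    def insert(self, doc):
        key = doc[self.unique_field]
        if key in self._index:
            # Mongo rejects the insert outright; nothing is stored.
            raise DuplicateKeyError(f"E11000 duplicate key: {key!r}")
        self._index.add(key)
        self._docs.append(doc)

coll = UniqueIndexedCollection("email")
coll.insert({"email": "a@example.com"})
try:
    coll.insert({"email": "a@example.com"})
except DuplicateKeyError as e:
    print("rejected:", e)
print("stored docs:", len(coll._docs))  # the duplicate was never stored
```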
[06:11:51] <whatasunnyday> i have a quick question on sharding
[06:12:14] <whatasunnyday> what is the "right" cardinality when sharding?
[06:12:54] <whatasunnyday> http://www.mongodb.org/display/DOCS/Choosing+a+Shard+Key i was thinking that time would be an example of bad sharding but it doesn't seem to be the case. if you shard by one attribute and that attribute grows beyond 64 MB, does it mean it doesn't create another shard?
[06:19:59] <michaeltwofish> I've just upgraded a server from MongoDB 1.8.3 to 2.2.0. Starting the shell shows the correct version, but db.serverStatus() still shows 1.8.3. Is this cause for concern? Is something screwed up?
[06:49:48] <iksik> hmm, still testing and learning about replSets... yesterday, when i was removing master nodes one by one, a 'primary' node was selected after less than 1 sec(?) today i'm looking at the replSet status table and hmm http://i.imm.io/Gw65.png - no primary node o.O
[07:03:12] <iksik> oskie: one more question... how many resources can be used by an arbiter instance? cpu/mem/bandwidth?
[07:03:43] <iksik> in target arch i could set it up on machine used by postgresql server i think
[07:08:16] <iksik> heh, 4th node is up, and primary was selected ;-D
[07:15:00] <ezakimak> question: it says in the manual under Dot Notation vs Subobjects that key order must be the same, and that ... "This can make subobject matching unwieldy in languages whose default document representation is unordered.". However, in the manual page for updating in the "Field (re)order" section it says: "There is no guarantee that the field order will be consistent, or the same, after an update"
[07:15:31] <ezakimak> this seems to be an inconsistency...
[07:16:34] <ezakimak> how can it expect the client to test w/the "correct" key order if the server won't guarantee that it will preserve the same order it was originally written with?
[08:21:22] <arussel> that was the question I was going to ask :-)
[08:35:20] <Lujeni> Hello - someone can explain what does this error mean pls ? http://sebsauvage.net/paste/?fd747367994a0cd3#xwA7ebeaAIPPg2cOmCeAz4b2lAgBH1jt9y6LbARX1Mg=
[08:38:38] <kali> Lujeni: it just means you have an index that has empty physical pages. it happens all the time in collections with a heavy write/update/delete rate
[08:38:49] <kali> Lujeni: i literally have millions of them in my logs
[10:46:01] <WormDrink> Hi, I'm using mongodb 1.8 - with the following config (http://pastebin.com/SasVcUHK) - I then stop all replica set members except the ones with id 3 (status afterward can be seen here: http://pastebin.com/ys4pKnWJ) - then at this point in time I get an error whenever I try to connect to the router: uncaught exception: error { "$err" : "socket exception", "code" : 11002 }
[10:46:11] <WormDrink> how do I go about fixing this ?
[10:47:36] <WormDrink> basically I want to change the locations of the shards
[10:48:10] <WormDrink> I see there is some info regarding this in echo 'db.shards.find()' | mongo localhost:27219/config (localhost:27219 is the config db server)
[10:51:35] <WormDrink> Ok I will just update the db.shard - I think this is appropriate
[11:54:57] <WormDrink> is there any risk of losing data when creating a replica set?
[13:29:02] <imbusy> I've set up two load balanced apache servers as workers (16 threads per process, ~25 processes) to run mod_wsgi + mongoengine + a replica set with two instances. I'm stress testing the setup and the logs of the primary mongodb instance say that there are ~530 connections open. I'm calling connection.end_request() every time the rendering of the page is done. After a while, if no requests to the servers are made, the database stops responding, but
[13:29:02] <imbusy> there still remain ~530 open connections. When I restart the apache instances, the connections drop and everything is fine again.
[13:29:11] <imbusy> Question is: is the connection timing out somewhere? why does the database stop responding after not making any requests for a while?
[13:31:35] <imbusy> it only takes about five minutes for the database to stop responding
[13:57:49] <ekristen> I'm new to mongodb -- I'm working on setting up high availability
[14:00:23] <ekristen> I'm planning on two replica sets; my understanding is that for data to be sync'd between the two replica sets, each collection has to be sharded. is that correct?
[14:05:52] <NodeX> 3 shards = the data is split into 3 machines
[14:06:03] <NodeX> those machines are replicated 3 times giving you nine nodes
[14:06:28] <NodeX> this means that each machine (roughly) holds 1/3rd of your data
[14:07:46] <NodeX> you have to decide if you need to or want to shard and also whether high availability in your case is for your entire database or if you're okay with parts of it being available
[14:08:54] <ekristen> NodeX: ok, so let me make sure I understand, the shard is the horizontal in that link, IE between the replica sets?
[14:18:23] <NodeX> if you have 3 replicas of your data and the master goes down then an election is held and one of the replicas takes the master's place
[14:18:38] <NodeX> and so on - but you NEED an odd number of voting members in the replica set for this to happen
[14:19:28] <NodeX> if you shard the data and replicate the shards then the same principle applies
[14:20:11] <NodeX> if you need either 100% or 0% then 9 replicasets is better than 3 shards + 3 Replicas for each shard
[14:23:00] <NodeX> you don't gain any read scaling with RS though iirc
[14:24:05] <NodeX> I can't remember if you can force a read from a RS or not, prolly best to ask someone who uses RS a lot
[14:26:12] <ElGrotto> nodex: I think you can, but I'm a noobie who only heard about the existence of nosql yesterday lol.. referencing http://docs.mongodb.org/manual/core/replication/ though (in the Consistency section)
[14:28:42] <imbusy> i'm using pymongo+apache mod_wsgi. I open a connection and it stays open. I call end_request() when I finish processing. If the connection stays idle for about 3 minutes, I can not send any more requests through it, the database simply does not respond. Does anyone have a clue what is going on?
[14:32:38] <ekristen> do you need a mongo config server if you aren't using shards?
[14:33:04] <ElGrotto> I have what's prob a noob question tho; is it possible to have more than one server accepting writes in a multiple server setup? Assume I can ensure in software that they don't step on each other's toes (say, software instance 1 will only write to rows where some column A=1, instance 2 where A=2, etc).. I'm happy with reads being "eventually consistent".
[14:34:02] <kali> ElGrotto: you more or less described the behaviour of a sharded mongodb setup
[14:35:13] <ElGrotto> ah see I think I may have joined just a minute too late in the conversation above :) hehe
[14:41:34] <_m> imbusy: By default Mongo kills a cursor after 10 minutes of inactivity.
[14:42:05] <_m> imbusy: You can pass timeout=False to find() to disable cursor timeouts entirely
[14:42:27] <imbusy> I can't make any new find() calls
[14:42:39] <ekristen> if you have 3 shards and you shard a collection, that means that collection is chunked across all three shards?
[14:43:52] <imbusy> i'm playing with connectTimeoutMS setting it to 30000 right now, but it's not helping - the connections stay open way past the 30s limit
[14:44:35] <imbusy> 5 minutes in, still 614 connections open (~800 were open during peak time) and the database is not responding to any more calls
[14:49:20] <imbusy> i've restarted apache and there's still >500 connections still open
[14:53:44] <imbusy> i don't think i completely understand how connection pooling works across threads. where is the pool stored?
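To meson10's and imbusy's pooling questions: in pymongo the pool lives inside the Connection object itself, so the usual pattern is one client per process, cached at module level and shared by all threads, rather than a new connection per request. A toy, hypothetical sketch of the checkout/checkin mechanics (not pymongo's actual implementation):

```python
import queue

class ConnectionPool:
    """Toy illustration of driver-side pooling (not pymongo's real code):
    a fixed set of connections is shared; each operation checks one out
    and checks it back in, so N threads reuse max_size sockets instead
    of opening one per request."""
    def __init__(self, max_size, connect):
        self._pool = queue.Queue()
        for _ in range(max_size):
            self._pool.put(connect())

    def checkout(self):
        # blocks if every connection is currently in use
        return self._pool.get()

    def checkin(self, conn):
        self._pool.put(conn)

# `connect` is a stand-in for opening a real socket to mongod
pool = ConnectionPool(max_size=2, connect=lambda: object())
conn = pool.checkout()
# ... run queries on conn ...
pool.checkin(conn)
```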
[15:14:20] <ElGrotto> kali: ty, and sorry for the delayed response, I was re-reading about sharding.. my understanding is that for that to work, A must be a primary key (it isn't; I want all A=1's to go to one server regardless of their _id).. and I want to choose which server.. let me give a more concrete example.. if A is the user's location, and 1 is France say, and 2 is USA, I would want all A=1 writes to go to the server located in France. I also do want to read
[15:14:20] <ElGrotto> from any server, but I'm happy with the A != 1 servers being a little behind. My understanding is that sharding works by splitting the primary key into chunks and distributing those chunks, which is not the same :S
[15:17:39] <ElGrotto> .. also the french servers would connect to the french instance of mongodb and only write to rows where A=1. Maybe I'm just getting myself confused here :S
[15:21:21] <kali> ElGrotto: ok, several things here. i think you can do the chunking yourself (you need to check that)
[15:22:27] <kali> ElGrotto: and there are some new features around rack, tagging and datacenter awareness that I only know from the changelog (they're useless to me at the current time), so maybe you can use those to get what you're looking for
[15:33:16] <aster1sk> Greetings all, attempting to use the lovely new aggregation framework to sum some page view counts. Only problem is I won't have a list of the page ids to group. Here's an example:
[15:34:14] <aster1sk> a : { p : { 1234 : { v : 5 }, 4321 : { v : 16 } } }
[15:34:56] <ekristen> at what point is it smart to shard? I know thats a very subjective question, but I am trying to get an understanding when it is overkill
[15:35:31] <ElGrotto> from the manual, shard when you anticipate more data than your server can handle (in ram or on disk)..
[15:35:44] <ElGrotto> but shard before you reach the limits not after :)
[15:36:12] <ekristen> ElGrotto: so, if I anticipate 1mil documents in a collection with a total size of 4gb and I only have 4gb of ram, I should shard?
[15:36:38] <ElGrotto> uh no I think it's referring to working set.. I'll point ya at the page.. sec
[15:36:55] <NodeX> your indexes should fit into ram
[15:36:59] <NodeX> and if possible your working set
[15:40:29] <zastern> I had an unclean mongodb shutdown - I'm trying to get it started with mongod --repair, but I'm getting things like exception: "dbpath (/data/db/) does not exist". I'm pretty sure that's not where my database is stored . . . but there's nothing in the man page to tell me how to specify a config, location, etc. Mongo 1.8.x
[15:40:47] <NodeX> zastern : check your mongodb.conf
[15:41:08] <aster1sk> The Aggregate Framework documentation is unclear whether it is possible to sum subdocuments of unknown id's
[15:41:17] <zastern> but i have no way of specifying that, because the man page is blank...
[15:41:22] <ElGrotto> ekristen: http://docs.mongodb.org/manual/core/sharding/ section "Indications" -- sorry the last page wasn't quite the right one :S
[15:41:27] <zastern> the man page just says "dont call mongod directly"
[15:43:19] <zastern> but i have no idea how to tell mongo to use that config when i start it from the command line, because the man pages are basically empty
[15:44:38] <zastern> logs now say to remove the lock file and run repair. which is weird, because this page - http://www.mongodb.org/display/DOCS/Durability+and+Repair - says DON'T remove the lock file
[15:59:15] <aster1sk> Using the aggregation framework, is it possible to (I'm using PHP) match: _id : array(1,2,3,4) etc.? Or must I append an $and?
[16:01:49] <syskk> ahh it seems the order is indeed important
[16:02:25] <ekristen> can I enable sharding but only have 1 shard to start and then add additional shards as I need them, or is it easier to leave sharding disabled and enable it when the time comes to use it?
[16:02:26] <aster1sk> I'd like to $group and $sum where a document key matches multiple values.
[16:05:40] <aster1sk> aggregate : { pages : { 1234 : { views : 1 }, 4321 : { views : 5 } } } <--- Is it even possible to sum views in this scenario?
[16:06:20] <aster1sk> Using the aggregate framework that is.
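With the document shaped as above, the 2.2 aggregation pipeline has no way to iterate unknown keys like 1234 and 4321, which is why aster1sk keeps coming up empty; application-side, though, the sum is trivial, and restructuring pages as an array of subdocuments would let $unwind/$group do it server-side. A quick Python illustration of the application-side fallback (using aster1sk's example values):

```python
# aster1sk's document: page ids are dynamic keys, so the aggregation
# framework can't $group over them; in the application the sum is one line.
doc = {"pages": {"1234": {"views": 1}, "4321": {"views": 5}}}

total = sum(page["views"] for page in doc["pages"].values())
print(total)  # 6
```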
[16:18:13] <syskk> I have this object: { "_id" : ObjectId("506c5c39733b7e89e13a9222"), "name" : "Test", "passports" : [ { "a" : "b", "c" : "d" } ] }
[16:18:23] <syskk> db.users.find({passports:{a:"b", c:"d"}}); returns the object
[16:18:31] <syskk> but db.users.find({passports:{c:"d",a:"b"}}); doesn't
[16:18:44] <syskk> is that expected behavior? field order is important?
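What syskk is seeing matches the documented behavior: matching an embedded document as a whole value compares the entire subdocument, key order included, whereas dot notation ({"passports.a": "b", "passports.c": "d"}) is order-independent. A rough pure-Python model of the order-sensitive comparison (a sketch, not the server's code; relies on Python dicts preserving insertion order):

```python
def subdoc_matches(stored, query):
    """Exact subdocument match compares the full value, so key order
    matters: {'a':'b','c':'d'} != {'c':'d','a':'b'} to the server."""
    return list(stored.items()) == list(query.items())

passports = [{"a": "b", "c": "d"}]
print(any(subdoc_matches(p, {"a": "b", "c": "d"}) for p in passports))  # True
print(any(subdoc_matches(p, {"c": "d", "a": "b"}) for p in passports))  # False
```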
[16:20:30] <imbusy> why does the database stop responding if I keep the connection open for a few minutes? I'm using pymongo
[16:23:25] <ekristen> do I need to do anything special with my rails application to work with mongodb that has implemented shards?
[16:23:40] <aster1sk> The driver should handle that sir.
[16:43:02] <kali> ag4ve: seriously no. the whole point of the document model is to free you from the schema.
[16:43:22] <kali> ag4ve: anything you want to implement as control has to go application side
[16:43:26] <ag4ve> what i want to do is grab generally standard json data and shove it into a store. however, in order to be able to write better queries, i'd like some type of report that a different schema was used
[16:44:25] <ekristen> how much is the config server used in a shard setup, can it be small server or does it need to be as large as or the same size as the nodes being used in a shard cluster?
[16:44:58] <kali> ekristen: it can be quite small, but you want three of them
[16:45:03] <ag4ve> oh, so the schema that mongoose uses is purely sugar and has nothing to do with mongo... i know i read that, i guess i just didn't get it
[16:45:21] <kali> ekristen: you can piggy back the config server on a data node
[16:46:30] <imbusy> it's fucking windows azure, i knew it
[16:47:05] <kali> ag4ve: yep, the closest thing mongodb has to a schema is the list of index and their properties
[16:47:26] <ekristen> kali: I am planning the infrastructure for a large application (50k users in first year, 200k second year or more) -- can I stand up one replica set as a single shard, with a single config server, so that later when I need to expand I just need to add additional shards, or do I need to leave sharding disabled until I am ready to have more than one shard?
[16:48:01] <kali> ekristen: you need 3 config server from day one (for reliability)
[16:48:33] <Derick> but yes, you can start out with one shard
[16:48:44] <ekristen> kali: so I could have 3 config servers, 1 shard (consisting of 3 nodes in 1 replica set) to start, then expanding later would be easier yes?
[16:48:46] <kali> erkules|away: as for the sharding, it is relatively easy to add it beforehand, or to start with a single shard
[16:51:43] <kali> ekristen: make it clear with your devs that you plan to do that though. and think about what your sharding key will be. sharding breaks a few features that are easy to avoid (e.g. the group() operator)
[16:51:46] <aster1sk> I need to query multiple values on the same key find( { i : [ 1,2,3,4,5 ] } ) -- amidoingitright?
[16:52:00] <aster1sk> Doesn't seem to work, I think I may be missing an $and
[16:52:18] <kali> aster1sk: maybe what you want is $in
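kali's suggestion in shell form would be find({i: {$in: [1,2,3,4,5]}}), rather than matching against a literal array. The semantics can be sketched in plain Python (a rough analogue, not the server's matcher; documents are made up):

```python
def matches_in(doc, field, values):
    """Rough analogue of {field: {'$in': values}}: the doc matches if its
    value for `field` equals any of the listed values."""
    return doc.get(field) in values

docs = [{"i": 1}, {"i": 3}, {"i": 7}]
hits = [d for d in docs if matches_in(d, "i", [1, 2, 3, 4, 5])]
print(hits)  # [{'i': 1}, {'i': 3}]
```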
[16:55:43] <ElGrotto> I'm always doing alt+f, s .. while editing wiki pages :-S
[16:56:13] <ElGrotto> it's save, just not the save i want in my web browser :P
[16:56:23] <aster1sk> local GNU screen + ssh irssi screen... gets a little wacky with the shortcuts.
[16:57:05] <aster1sk> Didn't want to flood other channel with /join -window #mongodb (you guys deserve the utmost attention) therefore I'm using pidgin for this window... too many things.
[16:58:58] <Derick> gnu screen with irssi is awesome :P
[16:59:01] <ElGrotto> ok I've been reading about shards for a looong while now and the one thing I've discovered for certain.. it gives me a headache :S
[16:59:53] <aster1sk> Derick took a while to get used to keybindings + dvorak.
[17:00:19] <kali> ElGrotto: if i may... don't overthink it right now. your job right now is to get the right features for your 50k. going from 50k to 200k is next year's problem
[17:00:41] <ElGrotto> uh no that's ekristen's problem lol
[17:00:46] <aster1sk> Luckily for you mongo is flexible enough to scale / migrate with you.
[17:01:05] <kali> ha, sorry. too many conversations in parallel
[17:01:18] <ElGrotto> I'm trying to figure out whether it's possible to have a database spread over geographic locations..
[17:02:03] <kali> well, maybe the same comment applies to you :)
[17:02:11] <ekristen> kali: I shouldn't be taking next years problem into consideration now?
[17:02:19] <ElGrotto> remembering I hadn't ever heard about mongo/nosql until yesterday, my mind has gone racing over the possibilities..
[17:03:06] <aster1sk> I've spent nearly 1.5 months just reading the docs / practising... still feel like a total n00b
[17:03:10] <ElGrotto> wondering whether it could achieve stuff I'd found impossible with sql, because of its lack of tables
[17:03:14] <aster1sk> And I'm the one implementing the entire stats stack here.
[17:03:48] <kali> ekristen: there is a compromise to be found. in my current job, we worried about scalability much too early, and implementing anything became complex much too early
[17:04:46] <aster1sk> Our CTO really wants read preference so M/R is out of the question (can't run command on master) so I'm hustlin trying to figure out aggregation framework for as many queries as I can.
[17:04:48] <kali> depends what they do. but 50k is not that much
[17:05:35] <kali> aster1sk: m/r in mongodb is a bit of a last resort system, imho
[17:05:39] <aster1sk> Oh totally, however the framework wasn't available during initial R&D - so I had to change a lot of stuff along the way (esp 2.1 release).
[17:05:43] <ekristen> Derick: I am trying to figure that out right now
[17:07:28] <ElGrotto> that sounds like hacking a solution.. making "improvements" as you find out you're reaching a limit
[17:07:30] <aster1sk> Funny, when experimenting with our data model I cooked up a few harebrained solutions to avoid having to use that silly positional operator... turns out now I'm stuck in a bit of a quandary.
[17:09:03] <kali> ElGrotto: it was not a comment to be taken literally. i'm just commenting on a real life experience and advising against premature optimization and overdesign
[17:09:08] <aster1sk> Came back to bite me now though: I can't $sum a $group because there's no fixed key to $group on... so I'll either have to array_sum() it in the model or keep digging... I'm running out of steam.
[17:09:33] <Derick> aster1sk: or you change your data schema
[17:09:52] <aster1sk> I simply cannot think of a better way to represent this data.
[17:10:09] <aster1sk> This one's nearly two months in the making... perhaps I have tunnel vision.
[17:10:44] <Derick> it evolves depending on how you query it too...
[17:15:12] <ElGrotto> leads to a bit of a question tho.. the integer equivalent in mongo (I've not seen anything on datatypes if they even exist), is it 32 or 64 bit?
[17:15:51] <aster1sk> Derick would you know how to $group $sum this : aggregate : { pages : { 1234 : { views : 1 }, 4321 : { views : 3 } } } ?
[17:15:57] <Derick> both are supported as two types ElGrotto
[17:16:27] <Derick> aster1sk: you shouldn't have non-descriptive keys like "1234" and "4321"...
[17:21:12] <Derick> aster1sk: yeah, keep one for the view separately? Might be better as it would avoid moving things around on disk if the document keeps growing
[17:21:16] <aster1sk> That's logical - I will test in shell.
[17:21:48] <aster1sk> Yeah, I know what you're getting at though.
[17:22:05] <Derick> http://www.mongodb.org/display/DOCS/Updating#Updating-The%24positionaloperator has a similar example actually :)
[17:22:12] <aster1sk> I want to be as forthcoming as possible with the data model (it's far more complicated than that) but boss would get mad if I share the damn thing.
[17:22:49] <aster1sk> I also have subdocuments under pages describing unique views... it gets pretty scary.
[17:23:05] <Derick> i can only help you with things you share :-)
[17:23:21] <aster1sk> I know, let me see what I can do.
[17:27:41] <ekristen> kali: I don't need config server or mongos if I am not using shards correct?
[17:28:22] <FraFraFra> morning guys! I'm looking for a good strategy for finding the intersection between two huge collections. Huge = ~2M docs. Collection 1 contains all ids, Collection 2 is missing, let's say, 200k ids from Collection 1. I would like to find these 200k
[17:36:34] <kali> FraFraFra: you'll have to do that out of mongo. i would export the ids from both collections with mongoexport, then sort and join them in shell
[17:36:41] <crudson> FraFraFra: if ids are unique within each collection, perform two map reduces, the second of which uses {:out => {:reduce => 'collection'}}. Map key will be the id, and value will be a count.
[17:37:42] <crudson> FraFraFra: providing the second is a subset of the first.
[17:44:42] <FraFraFra> crudson: the ids are the same. I'm using collection number 2 as just a temporary collection where I stored the complete list of ids that came from Collection 1, minus the missing ids
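The export-then-join approach kali describes boils down to a set difference once the ids are out of Mongo, e.g. in Python (illustrative values, not FraFraFra's data):

```python
# ids exported from the two collections (toy data standing in for the
# ~2M and ~1.8M real id lists); the missing ids are a set difference
all_ids = {1, 2, 3, 4, 5, 6}   # from Collection 1
present = {1, 2, 4, 6}         # from Collection 2
missing = all_ids - present
print(sorted(missing))  # [3, 5]
```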
[19:05:52] <jmpf> I keep getting this - http://pastie.org/private/yuc0bphpqk3342qudhjuaw on 2.2.0 - but both the server and the one backing it up have tons of space - any other gotchas to be aware of?
[19:53:28] <airportyh> Hello all, is it possible in the aggregation framework to group by values in an array field?
[20:00:17] <crudson> airportyh: yeah, use $unwind first to flatten the array
[20:00:34] <airportyh> crudson: thanks, let me try
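crudson's $unwind-then-$group recipe can be modeled in plain Python: unwinding yields one record per array element, then the group stage counts per value. A sketch with made-up documents (the field name "tags" is an assumption):

```python
from collections import Counter

docs = [
    {"tags": ["a", "b"]},
    {"tags": ["b", "c"]},
]
# $unwind: one record per array element; $group with $sum: count per tag
counts = Counter(tag for d in docs for tag in d["tags"])
print(counts["b"])  # 2
```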
[20:01:56] <ezakimak> question: it says in the manual under Dot Notation vs Subobjects that key order must be the same, and that ... "This can make subobject matching unwieldy in languages whose default document representation is unordered.". However, in the manual page for updating in the "Field (re)order" section it says: "There is no guarantee that the field order will be consistent, or the same, after an update"
[20:01:57] <bhosie> i have an admin user that i log in using this: use admin -> db.auth({'user':'pass'}) after setting up a replica set, i can no longer log in. are auth and the keyfile required by replication incompatible?
[20:02:00] <ezakimak> how can it expect the client to test w/the "correct" key order if the server won't guarantee that it will preserve the same order it was originally written with?
[20:09:25] <airportyh> Can you group by a composite key?
[20:09:43] <airportyh> again, with aggregation framework
[20:10:08] <airportyh> so like _id: ['$key_one', '$key_two']
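In the aggregation framework a composite group key is typically written as a subdocument, e.g. _id: {a: '$key_one', b: '$key_two'}, rather than an array. The grouping it performs can be sketched in plain Python using tuples as the composite key (illustrative data, assumed field names):

```python
from collections import defaultdict

docs = [
    {"key_one": "x", "key_two": 1, "n": 2},
    {"key_one": "x", "key_two": 1, "n": 3},
    {"key_one": "y", "key_two": 2, "n": 5},
]
# group on (key_one, key_two) and $sum the n field
totals = defaultdict(int)
for d in docs:
    totals[(d["key_one"], d["key_two"])] += d["n"]
print(totals[("x", 1)])  # 5
```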
[20:15:30] <bhosie> oh crap. nevermind me. it's been a long day and i see my obvious syntax error
[20:35:42] <airportyh> Hello all, I am writing a batch job that uses the aggregation framework to summarize some data, but after getting the response from aggregate, what is a good way to delete all the data I have processed so far?
[20:36:05] <airportyh> Without having to worry about race conditions with other processes
[20:49:09] <leehambley> should the server delete the database and index files when I drop the database ?
[20:49:40] <leehambley> I'm on a MBA, and the 20GB of files that mongo creates when my tests run are sort of out of control, I expected deleting the database would take care of those