#mongodb logs for Wednesday the 3rd of October, 2012

[02:05:14] <ketema> looking for help and discussion on a question I posted: http://kjkh.me/SBsIQd Thanks!
[02:06:17] <mrpro> whats with the link?
[02:10:49] <ketema> mrpro: sorry are links bad etiquette ?
[02:14:33] <mrpro> whats kjkh.me
[02:15:04] <ketema> mrpro: its just a bit.ly shortened link kjkh are my initials
[02:53:29] <jwilliams> we get the following error: Assertion: 13282:Couldn't load a valid config for after 3 attempts. Checking the user group shows that upgrading to 1.8 would fix the problem (this seems to be a 1.6 bug), but we use 2.0.1.
[02:53:56] <jwilliams> Any other possible root cause?
[05:00:17] <meson10> How does connection pooling work? Is it suggested to create a new connection per request, or should it be cached at module level and reused?
[05:11:15] <andoriyu> is there a way to make a bulk update?
[05:11:15] <andoriyu> (not just update all documents with one function, but update each document separately with different values)
[05:12:57] <meson10> andoriyu: No. You can look at the aggregation framework, or write server side JS code.
[05:23:41] <andoriyu> meson10, nah, I will just update each document separately
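
A minimal shell sketch of what andoriyu settles on here: in this era there is no bulk-update API, so each document gets its own update() call. The collection, filter, and per-document computation are made up for illustration.

    // one update() per document; the new value differs for each one
    db.items.find({ needsUpdate: true }).forEach(function (doc) {
        var newValue = doc.value * 2;  // hypothetical per-document computation
        db.items.update({ _id: doc._id }, { $set: { value: newValue } });
    });
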
[05:28:53] <whatasunnyday> Hi! I was reading the mongo manual and I have a simple question. If I make an index that is unique and I insert a document that has that field repeated, will it not be inserted into the collection? Will it return an error?
[05:29:18] <whatasunnyday> Or will it appear stored in the collection but not accessible unless I remove the index?
[05:34:27] <whatasunnyday> Actually, I understand now.
[05:34:30] <whatasunnyday> No problems.
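
For the record, the behaviour whatasunnyday worked out: an insert that violates a unique index is rejected and nothing is stored; the error surfaces via getLastError (or any safe/acknowledged write). A quick illustration with an assumed collection:

    db.accounts.ensureIndex({ email: 1 }, { unique: true });
    db.accounts.insert({ email: "a@example.com" });  // stored
    db.accounts.insert({ email: "a@example.com" });  // rejected, not stored
    db.getLastError();  // reports an E11000 duplicate key error
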
[06:11:45] <whatasunnyday> hey
[06:11:51] <whatasunnyday> i have a quick question on sharding
[06:12:14] <whatasunnyday> what is the "right" cardinality when sharding?
[06:12:54] <whatasunnyday> http://www.mongodb.org/display/DOCS/Choosing+a+Shard+Key i was thinking that time would be an example of bad sharding but it doesn't seem to be the case. if you shard by one attribute and that attribute grows beyond 64 MB, does it mean it doesn't create another shard?
[06:19:59] <michaeltwofish> I've just upgraded a server to use MongoDB 2.2.0 from 1.8.3. Starting the shell shows the correct version but db.serverStatus() still shows 1.8.3. Is this cause for concern? Is something screwed up?
[06:25:46] <iksik> morning
[06:49:48] <iksik> hmm, still testing and learning about replSets... yesterday, when i was removing master nodes one by one, a 'primary' node was selected after less than 1 sec(?) today i'm looking at the replset status table and hmm http://i.imm.io/Gw65.png - no primary node o.O
[06:50:40] <iksik> any ideas why?
[06:50:57] <oskie> iksik, do you have an arbiter?
[06:51:17] <iksik> oskie: no
[06:51:20] <oskie> or uneven number of members?
[06:51:29] <oskie> odd :)
[06:51:45] <iksik> as you can see... there are 6 nodes
[06:52:15] <iksik> but, yesterday 3 of them were turned off
[06:52:28] <oskie> maybe there isn't consensus to elect a node as primary. the log should probably tell
[06:53:00] <iksik> ok, i'll try to look at the logs ;-)
[06:53:42] <oskie> i'm guessing you'll need 4 nodes in a 6 node replica set in order to get consensus
[06:54:09] <oskie> so you'd always want an odd number of nodes (counting arbiter) in replica sets
[06:54:48] <iksik> hum
[06:55:15] <iksik> so how can i deal with it when a random number of nodes goes down? (for example, an OS crash)
[06:56:19] <oskie> for instance, a 5 node replica set would survive intact as long as 3 or more nodes are up
[06:56:58] <oskie> personally i don't see why you'd want to have anything other than 3 node replica sets
[06:57:45] <oskie> with either (3 data nodes) or (2 data+1 arbiter)
[06:58:01] <iksik> my target replset will contain only 2 nodes
[06:58:23] <iksik> this current setup is just for testing and learning purpose
[07:00:53] <oskie> an even number of nodes is the reason you are not getting a PRIMARY elected
[07:01:28] <iksik> hm, currently 3 nodes are up and running ;-D
[07:01:48] <oskie> the total number of nodes is important, not the number of up and running
[07:01:53] <iksik> oh
[07:01:58] <iksik> got it
[07:03:12] <iksik> oskie: one more question... how many resources can be used by an arbiter instance? cpu/mem/bandwidth?
[07:03:43] <iksik> in target arch i could set it up on machine used by postgresql server i think
[07:08:16] <iksik> heh, 4th node is up, and primary was selected ;-D
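
The rule oskie is describing: a primary needs votes from a strict majority of all configured members, reachable or not, which is why 3 of 6 was not enough and 4 of 6 was. With an even number of data nodes, one hedged fix is to add an arbiter so the voter count becomes odd (host name is a placeholder):

    // 6 members -> majority is 4; an arbiter makes it 7 members (majority
    // still 4) without storing any data
    rs.addArb("arbiter.example.com:27017");
    rs.status();  // shows which member, if any, is PRIMARY
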
[07:15:00] <ezakimak> question: it says in the manual under Dot Notation vs Subobjects that key order must be the same, and that ... "This can make subobject matching unwieldy in languages whose default document representation is unordered.". However, in the manual page for updating in the "Field (re)order" section it says: "There is no guarantee that the field order will be consistent, or the same, after an update"
[07:15:31] <ezakimak> this seems to be an inconsistency...
[07:16:34] <ezakimak> how can it expect the client to test w/the "correct" key order if the server won't guarantee that it will preserve the same order it was originally written with?
[07:34:47] <[AD]Turbo> hi there
[08:09:49] <arussel> I have a field with a set in it. What is the query to check if an element is part of it ?
[08:10:04] <arussel> something like contains or ismember
[08:11:03] <MongoDBIdiot> $in
[08:12:27] <MongoDBIdiot> mongodb has nothing like a "set"
[08:12:56] <arussel> I thought $in was the opposite, like "a" $in ("a","b","c")
[08:13:10] <arussel> I'm looking for ("a","b","c") contains "a"
[08:15:02] <MongoDBIdiot> please?
[08:15:05] <NodeX> $in : [1,2,3]
[08:16:05] <NodeX> arussel : if you have an array then you can just do foo:"bar" and it will match ["bar","baz","joe"]
[08:17:45] <arussel> nice, thanks
[08:18:27] <NodeX> if you want exact matches you need to pass it the array
[08:18:47] <NodeX> for example if you have [1,2] and you want only docs with 1,2 then it would be foo:[1,2]
[08:18:57] <NodeX> that's 1 and 2 not 1 or 2
[08:21:22] <arussel> that was the question I was going to ask :-)
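
Shell versions of the three matching behaviours NodeX just described, with assumed names:

    db.docs.find({ foo: "bar" });               // matches if any element of the foo array equals "bar"
    db.docs.find({ foo: { $in: [1, 2, 3] } });  // matches if foo, or any element of it, is 1, 2 or 3
    db.docs.find({ foo: [1, 2] });              // exact match: foo must be the whole array [1, 2], in order
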
[08:35:20] <Lujeni> Hello - can someone explain what this error means, please? http://sebsauvage.net/paste/?fd747367994a0cd3#xwA7ebeaAIPPg2cOmCeAz4b2lAgBH1jt9y6LbARX1Mg=
[08:37:58] <kali> Lujeni: don't worry about it.
[08:38:38] <kali> Lujeni: it just means you have an index that has empty physical pages. it happens all the time in collection with heavy write/update/delete rate
[08:38:49] <kali> Lujeni: i literally have millions of them in my logs
[08:38:58] <Lujeni> kali, ok thx :)
[10:46:01] <WormDrink> Hi, I'm using mongodb 1.8 - with the following config (http://pastebin.com/SasVcUHK) - I then stop all replica set members except the ones with id 3 (status afterward can be seen here: http://pastebin.com/ys4pKnWJ) - then at this point in time I get an error whenever I try to connect to the router: uncaught exception: error { "$err" : "socket exception", "code" : 11002 }
[10:46:11] <WormDrink> how do I go about fixing this ?
[10:47:36] <WormDrink> basically I want to change the locations of the shards
[10:48:10] <WormDrink> I see there is some info regarding this in echo 'db.shards.find()' | mongo localhost:27219/config (localhost:27219 is the config db server)
[10:51:35] <WormDrink> Ok I will just update the db.shard - I think this is appropriate
[11:54:57] <WormDrink> is there any risk of losing data when creating a replica set
[13:29:02] <imbusy> I've set up two load balanced apache servers as workers (16 threads per process, ~25 processes) to run mod_wsgi + mongoengine + a replica set with two instances. I'm stress testing the setup and the logs of the primary mongodb instance say that there are ~530 connections open. I'm calling connection.end_request() every time the rendering of the page is done. After a while, if no requests to the servers are made, the database stops responding, but there still remain ~530 open connections. When I restart the apache instances, the connections drop and everything is fine again.
[13:29:11] <imbusy> Question is: is the connection timing out somewhere? why does the database stop responding after not making any requests for a while?
[13:31:35] <imbusy> it only takes about five minutes for the database to stop responding
[13:31:41] <imbusy> maybe even less
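
A small way to watch what imbusy is measuring, from the mongo shell rather than the logs; serverStatus() exposes the connection counters:

    var c = db.serverStatus().connections;
    print("current: " + c.current + ", available: " + c.available);
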
[13:47:07] <syskk> any non web based MongoDB GUI that works on OS X? which one do you recommend?
[13:48:01] <syskk> has anyone used MongoHub?
[13:57:27] <ekristen> hello
[13:57:49] <ekristen> I'm new to mongodb -- I'm working on setting up high availability
[14:00:23] <ekristen> I'm planning on two replicasets; my understanding is that for data to be sync'd between the two replicasets I need to have each collection shard'd - is that correct?
[14:02:49] <NodeX> sharding is not replicasets
[14:03:08] <NodeX> a replica set is a copy of another replicaset's data
[14:03:25] <NodeX> you can have a shard with a replicaset (copy) of itsself
[14:03:45] <NodeX> sharding is when you shard / fragment your data across multiple machines
[14:04:44] <ekristen> NodeX: I'm looking at http://www.mongodb.org/display/DOCS/Simple+Initial+Sharding+Architecture
[14:05:21] <ekristen> correct me if I am wrong, but the nodes in each vertical group replicate each other
[14:05:38] <ekristen> and horizontally they are shard'd?
[14:05:52] <NodeX> 3 shards = the data is split into 3 machines
[14:06:03] <NodeX> those machines are replicated 3 times, giving you nine nodes
[14:06:28] <NodeX> this means that each machine (roughly) holds 1/3rd of your data
[14:07:46] <NodeX> you have to decide if you need to or want to shard and also whether high availability in your case is for your entire database or if you're okay with parts of it being available
[14:08:54] <ekristen> NodeX: ok, so let me make sure I understand, the shard is the horizontal in that link, IE between the replica sets?
[14:09:41] <NodeX> no, the shard is the shard
[14:09:58] <NodeX> a shard is a part of something - a segment if you wish
[14:10:21] <ekristen> oh
[14:10:22] <ekristen> I see
[14:10:28] <NodeX> lets say you have users Alex, Bertie, Charlie
[14:10:32] <ekristen> light bulb moment
[14:10:41] <NodeX> Alex might live on one shard, Bertie on another, Charlie on another
[14:10:44] <ekristen> http://www.mongodb.org/display/DOCS/Sharding+Introduction <-- architecture overview
[14:11:51] <NodeX> if you can live with parts of your data being offline then shard + replicaset is the highest availability
[14:12:14] <ekristen> no, I can't
[14:12:16] <NodeX> because it ensures at least X percent is online (ruling out power failure in 3 data centers)
[14:12:38] <NodeX> so you need All of your app or none of your app online ?
[14:12:51] <NodeX> I mean, Either all of it online or none of it online
[14:13:32] <ekristen> NodeX: can I have a replicate set with multiple nodes, spread across different data centers to retain 100% uptime?
[14:13:55] <NodeX> you can replicate as much as you like
[14:14:03] <NodeX> it's just another copy of the data
[14:14:44] <NodeX> analogous to master->slave replication in Mysql (for the most part)
[14:14:51] <ekristen> if I shard an entire database across multiple replica sets doesn't that mean id have 100% up time?
[14:15:04] <NodeX> you just said you dont want sharding
[14:15:17] <NodeX> sharding splits the data into chunks
[14:15:21] <NodeX> you said all or nothing
[14:15:55] <ekristen> NodeX: correct, I'm just trying to make sure I understand how it works properly
[14:16:04] <NodeX> sharding != replication
[14:16:40] <NodeX> the more shards / replicas you have then the higher the availability
[14:16:56] <NodeX> if however you need 100% of your data then sharding is not for you
[14:17:15] <NodeX> sharding will allow the app to continue if one server fails
[14:17:50] <ekristen> NodeX: if one node in a replica set fails it goes down?
[14:17:55] <NodeX> replication will allow the WHOLE app to continue
[14:18:02] <ekristen> oh ok
[14:18:23] <NodeX> if you have 3 replicas of your data and the master goes down then an election is held and one of the replicas takes the master's place
[14:18:38] <NodeX> and so on - but you NEED an odd number of members in the replica set for this to happen
[14:19:28] <NodeX> if you shard the data and replicate the shards then the same principle applies
[14:20:11] <NodeX> if you need either 100% or 0% then a 9-member replica set is better than 3 shards + 3 replicas for each shard
[14:20:30] <ekristen> NodeX: ok, thanks!
[14:20:33] <ekristen> that helps a lot
[14:20:53] <NodeX> some people can live with their app having 75% if data in worst case scenarios
[14:21:02] <NodeX> or 50% or w/e - at least the app still runs
[14:21:23] <NodeX> imo, if you're unlucky enough to have 3 or more RS fail then you might as well call it a day!
[14:21:35] <ekristen> NodeX: sure, makes sense, if there are certain collections that are only needed for lets say administrative use vs regular use
[14:22:01] <NodeX> you do pay a price for Replicasets (RS) though, they'll need to be the same size boxes as the master
[14:22:11] <NodeX> as they may need to take over all the queries
[14:22:42] <ekristen> yeah
[14:23:00] <NodeX> you don't gain any read scaling with RS though iirc
[14:24:05] <NodeX> I can't remember if you can force a read from a RS or not, prolly best to ask someone who does RS a lot
[14:26:12] <ElGrotto> nodex: I think you can, but I'm a noobie who only heard about the existence of nosql yesterday lol.. referencing http://docs.mongodb.org/manual/core/replication/ tho (in the Consistency section)
[14:26:43] <NodeX> cool, good to know
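
A minimal sketch of the failover setup NodeX described above: three members each started with --replSet rs0, initiated once from a shell connected to one of them (host names are placeholders):

    rs.initiate({
        _id: "rs0",
        members: [
            { _id: 0, host: "db1.example.com:27017" },
            { _id: 1, host: "db2.example.com:27017" },
            { _id: 2, host: "db3.example.com:27017" }
        ]
    });
    // if the primary dies, the two survivors still hold a majority
    // and elect a new primary
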
[14:28:42] <imbusy> i'm using pymongo+apache mod_wsgi. I open a connection and it stays open. I call end_request() when I finish processing. If the connection stays idle for about 3 minutes, I can not send any more requests through it, the database simply does not respond. Does anyone have a clue what is going on?
[14:32:38] <ekristen> do you need a mongo config server if you aren't using shards?
[14:32:48] <kali> nope
[14:33:04] <ElGrotto> I have what's prob a noob question tho; is it possible to have more than one server accepting writes in a multiple server setup? Assume I can ensure in software that they don't step on each other's toes (say, software instance 1 will only write to rows where some column A=1, instance 2 where A=2, etc).. I'm happy with reads being "eventually consistent".
[14:34:02] <kali> ElGrotto: you more or less described the behaviour of a sharded mongodb setup
[14:35:13] <ElGrotto> ah see I think I may have joined just a minute too late in the conversation above :) hehe
[14:41:34] <_m> imbusy: By default Mongo kills a cursor after 10 minutes of inactivity.
[14:42:05] <_m> imbusy: You can pass timeout=False to find() to disable cursor timeouts entirely
[14:42:14] <_m> Reference: http://api.mongodb.org/python/current/faq.html#how-do-i-change-the-timeout-value-for-cursors
[14:42:21] <imbusy> it's not the cursor
[14:42:27] <imbusy> I can't make any new find() calls
[14:42:39] <ekristen> if you have 3 shards and you shard a collection, that means that collection is chunked across all three shards?
[14:43:52] <imbusy> i'm playing with connectTimeoutMS setting it to 30000 right now, but it's not helping - the connections stay open way past the 30s limit
[14:44:35] <imbusy> 5 minutes in, still 614 connections open (~800 were open during peak time) and the database is not responding to any more calls
[14:49:20] <imbusy> i've restarted apache and there's still >500 connections still open
[14:53:44] <imbusy> i don't think i completely understand how connection pooling works across threads. where is the pool stored?
[14:57:21] <imbusy> calling end_request doesn't help
[15:14:20] <ElGrotto> kali: ty, and sorry for the delayed response, I was re-reading about sharding.. my understanding is that for that to work, A must be a primary key (it isn't - I want all A=1's to go to one server regardless of their _id).. and I want to choose which server.. let me give a more concrete example.. if A is the user's location, and 1 is France say, and 2 is USA, I would want all A=1 writes to go to the server located in France. I also do want to read from any server, but I'm happy with the A != 1 servers being a little behind. My understanding is that sharding works by splitting the primary key into chunks and distributing those chunks, which is not the same :S
[15:17:39] <ElGrotto> .. also the french servers would connect to the french instance of mongodb and only write to rows where A=1. Maybe I'm just getting myself confused here :S
[15:21:21] <kali> ElGrotto: ok, several things here. i think you can do the chunking yourself (you need to check that)
[15:22:27] <kali> ElGrotto: and there are some new features that I only know from the changelog because they're useless to me at the current time around rack, tagging and datacenter awareness, so maybe you can use these to get what you're looking for
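
The 2.2 feature kali is pointing at is tag-aware sharding: shards carry tags, and shard-key ranges are pinned to tags. A hedged sketch of ElGrotto's France/USA split, assuming A is (part of) the shard key and the cluster has shards named shardFR and shardUS; run against a mongos:

    sh.addShardTag("shardFR", "FR");
    sh.addShardTag("shardUS", "US");
    sh.addTagRange("mydb.users", { A: 1 }, { A: 2 }, "FR");  // all A=1 documents stay on shardFR
    sh.addTagRange("mydb.users", { A: 2 }, { A: 3 }, "US");  // all A=2 documents stay on shardUS
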
[15:33:16] <aster1sk> Greetings all, attempting to use the lovely new aggregate framework to sum some page view counts. Only problem is I wont have a list of the page id's to group. Here's an example :
[15:34:14] <aster1sk> a : { p : { 1234 : { v : 5 }, 4321 : { v : 16 } } }
[15:34:37] <aster1sk> so I need to sum a.p.$ID.v
[15:34:56] <ekristen> at what point is it smart to shard? I know thats a very subjective question, but I am trying to get an understanding when it is overkill
[15:34:59] <ekristen> or when it is not
[15:35:31] <ElGrotto> from the manual, shard when you anticipate more data than your server can handle (in ram or on disk)..
[15:35:44] <ElGrotto> but shard before you reach the limits not after :)
[15:36:12] <ekristen> ElGrotto: so, if I anticipate 1mil documents in a collection with a total size of 4gb and I only have 4gb of ram, I should shard?
[15:36:38] <ElGrotto> uh no I think it's referring to working set.. I'll point ya at the page.. sec
[15:36:55] <NodeX> your indexes should fit into ram
[15:36:59] <NodeX> and if possible your working set
[15:38:33] <ElGrotto> ekristen: http://docs.mongodb.org/manual/faq/sharding/
[15:40:29] <zastern> I had an unclean mongodb shutdown - I'm trying to get it started with mongod repair, but I'm getting things like exception: "dbpath (/data/db/) does not exist". I'm pretty sure that's not where my database is stored . . . but there's nothing in the man page to tell me how to specify a config, location, etc. Mongo 1.8.x
[15:40:47] <NodeX> zastern : check your mongodb.conf
[15:41:06] <zastern> NodeX: yep, it's /var/lib/mongodb
[15:41:08] <aster1sk> The Aggregate Framework documentation is unclear on whether it is possible to sum subdocuments with unknown ids
[15:41:17] <zastern> but i have no way of specifying that, because the man page is blank...
[15:41:22] <ElGrotto> ekristen: http://docs.mongodb.org/manual/core/sharding/ section "Indications" -- sorry the last page wasn't quite the right one :S
[15:41:27] <zastern> the man page just says "dont call mongod directly"
[15:41:41] <NodeX> do you have a startup script?
[15:42:06] <zastern> NodeX: yes, old-style init script
[15:42:18] <zastern> its ubuntu 10.04 + mongo 1.8 from 10gen repo
[15:42:37] <NodeX> I think the default is /etc/mongodb/mongodb.conf
[15:42:48] <NodeX> nano `locate mongodb.conf`
[15:42:55] <zastern> NodeX: i already found my config
[15:42:58] <zastern> and looked at it
[15:43:19] <zastern> but i have no idea how to tell mongo to use that config when i start it from the command line, because the man pages are basically empty
[15:43:42] <NodeX> check the init file
[15:44:02] <aster1sk> `which mongod` -h
[15:44:14] <zastern> alright -f works.
[15:44:16] <zastern> making progress
[15:44:38] <zastern> logs now say remove the lock file and run repair. which is weird, because this page - http://www.mongodb.org/display/DOCS/Durability+and+Repair - says DON'T remove the lock file
[15:44:57] <NodeX> dont remove it while it's running
[15:44:57] <NodeX> lol
[15:45:06] <NodeX> if it's an old lock file then you need to remove it
[15:45:17] <zastern> NodeX: yes but the docs arent talking about when its running. theyre talking about when it wont start due to unclean shutdown
[15:45:27] <zastern> theyre saying DONT remove the lock file, just run repair
[15:46:20] <NodeX> I always remove it personally
[15:46:30] <zastern> mm already did. its just a testing server
[15:46:36] <zastern> repair is running now
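
Roughly the sequence zastern ends up running, as a hedged sketch; the paths come from this conversation and will differ per install, and the page he cites prefers journal-based recovery where available, so treat manual lock-file removal as a last resort:

    # with mongod confirmed stopped:
    rm /var/lib/mongodb/mongod.lock
    mongod -f /etc/mongodb.conf --repair
    # then start normally, e.g. via the init script:
    /etc/init.d/mongodb start
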
[15:55:11] <ekristen> do you have to define collections to shard, or if you just define sharding for a database will all collections shard?
[15:55:56] <syskk> im trying to match a document with a query selector
[15:56:15] <syskk> db.users.find({passports:{a:"b",c:"d"}});
[15:56:33] <syskk> this works fine when i have a user document whose passports array contains an identical object
[15:56:46] <syskk> however, it doesn't seem to work for more complex objects
[15:56:52] <syskk> is the key order important?
[15:57:01] <syskk> how can i debug this?
[15:59:15] <aster1sk> Using the aggregate framework is it possible to (I'm using PHP) match : _id : array(1,2,3,4) etc? OR must I append and $and?
[16:01:49] <syskk> ahh it seems the order is indeed important
[16:02:25] <ekristen> can I enable sharding but only have 1 shard to start and then add additional shards as I need them, or is it easier to just enable sharding and scale when the time comes to use it?
[16:02:26] <aster1sk> I'd like to $group and $sum where a document key matches multiple values.
[16:05:40] <aster1sk> aggregate : { pages : { 1234 : { views : 1 }, 4321 : { views : 5 } } } <--- Is it even possible to sum views in this scenario?
[16:06:20] <aster1sk> Using the aggregate framework that is.
[16:18:13] <syskk> I have this object: { "_id" : ObjectId("506c5c39733b7e89e13a9222"), "name" : "Test", "passports" : [ { "a" : "b", "c" : "d" } ] }
[16:18:23] <syskk> db.users.find({passports:{a:"b", c:"d"}}); returns the object
[16:18:31] <syskk> but db.users.find({passports:{c:"d",a:"b"}}); doesn't
[16:18:44] <syskk> is that expected behavior? field order is important?
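
Yes: matching on a whole subdocument is an exact, order-sensitive comparison. Two order-insensitive alternatives for syskk's document:

    // order-sensitive: only matches elements stored exactly as { a: "b", c: "d" }
    db.users.find({ passports: { a: "b", c: "d" } });

    // dot notation: order no longer matters, but the two conditions may be
    // satisfied by different elements of the passports array
    db.users.find({ "passports.a": "b", "passports.c": "d" });

    // $elemMatch: order-insensitive, and both conditions on the same element
    db.users.find({ passports: { $elemMatch: { a: "b", c: "d" } } });
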
[16:20:30] <imbusy> why does the database stop responding if I keep the connection open for a few minutes? I'm using pymongo
[16:23:25] <ekristen> do I need to do anything special with my rails application to work with mongodb that has implemented shards?
[16:23:40] <aster1sk> The driver should handle that sir.
[16:23:49] <ekristen> aster1sk: ok
[16:24:04] <ekristen> aster1sk: can I enable shards but only have 1 shard to start with
[16:24:13] <ekristen> then went I need to scale add additional shards later?
[16:24:23] <ekristen> or is it better to wait until I need shards and then enable it?
[16:24:56] <aster1sk> Logically it's best to plan for sharding before it's too late.
[16:25:13] <ekristen> aster1sk: ok
[16:25:13] <ekristen> thanks
[16:25:15] <aster1sk> But I'm no expert.
[16:25:25] <ekristen> aster1sk: understood
[16:25:39] <ekristen> anyone in here an shard expert? ;-)
[16:26:01] <aster1sk> Anyone here from Toronto going to next meetup?
[16:26:16] <aster1sk> If so when / where, I'd love to attend.
[16:26:41] <ekristen> me too -- I'm not from toronto but I am there frequently
[16:27:43] <meghan> http://www.meetup.com/Toronto-MongoDB-User-Group/
[16:28:30] <aster1sk> Just signed up.
[16:28:47] <aster1sk> Thanks meghan
[16:29:09] <meghan> np
[16:34:17] <imbusy> i ran this script: http://pastebin.com/sd5G2Nb9
[16:34:25] <imbusy> the second attempt never finished
[16:34:35] <imbusy> sorry
[16:35:00] <imbusy> http://pastebin.com/knJdk5Az
[16:35:37] <imbusy> could that be an issue with mongo or is my cloud provider killing connections that don't send anything over for a few minutes?
[16:39:45] <ag4ve> can i input data and have mongo report on the records that deviate from a schema?
[16:40:38] <kali> ag4ve: nope
[16:41:41] <ag4ve> suggestions?
[16:43:02] <kali> ag4ve: seriously no. the whole point of the document model is to free you from the schema.
[16:43:22] <kali> ag4ve: anything you want to implement as control has to go application side
[16:43:26] <ag4ve> what i want to do is grab generally standard json data and shove it into a store. however, in order to be able to write better queries, i'd like some type of report that a different schema was used
[16:44:25] <ekristen> how much is the config server used in a shard setup? can it be a small server or does it need to be as large as the nodes being used in the shard cluster?
[16:44:58] <kali> ekristen: it can be quite small, but you want three of them
[16:45:03] <ag4ve> oh, so the schema that mongoose uses is purely sugar and has nothing to do with mongo... i know i read that, i guess i just didn't get it
[16:45:21] <kali> ekristen: you can piggy back the config server on a data node
[16:46:30] <imbusy> it's fucking windows azure, i knew it
[16:47:05] <kali> ag4ve: yep, the closest thing mongodb has to a schema is the list of index and their properties
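
Since the control has to live application-side, the closest server-side help for ag4ve is querying for deviations after the fact with $exists and $type; collection and field names here are assumptions:

    db.events.find({ timestamp: { $exists: false } });  // documents missing a required field
    db.events.find({ count: { $type: 2 } });            // documents where count is a string (BSON type 2)
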
[16:47:26] <ekristen> kali: I am planning the infrastructure for a large application (50k users in the first year, 200k or more in the second year) -- can I stand up one replica set as a single shard, with a single config server, so that later when I need to expand I just need to add additional shards, or do I need to leave sharding disabled until I am ready to have more than one shard?
[16:48:01] <kali> ekristen: you need 3 config servers from day one (for reliability)
[16:48:33] <Derick> but yes, you can start out with one shard
[16:48:44] <ekristen> kali: so I could have 3 config servers, 1 shard (consisting of 3 nodes in 1 replica set) to start, then expanding later would be easier yes?
[16:48:46] <kali> erkules|away: as for the sharding, it is relatively easy to add it beforehand, or to start with a single shard
[16:49:12] <kali> ekristen: yes
[16:49:28] <kali> erkules|away: sorry, was speaking to ekristen
[16:50:24] <ekristen> kali: sweet, thanks
[16:51:43] <kali> ekristen: make it clear with your devs that you plan to do that, though. and think about what your sharding key will be. sharding breaks a few features that are easy to avoid ( group() operators )
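
A sketch of the start-with-one-shard plan being endorsed here, using the shell helpers against a mongos; database, collection, key, and host names are placeholders:

    sh.addShard("rs0/db1.example.com:27017");          // the single replica set becomes shard one
    sh.enableSharding("myapp");                        // sharding is opt-in per database...
    sh.shardCollection("myapp.users", { userId: 1 });  // ...and per collection, with a shard key
    // adding sh.addShard("rs1/db4.example.com:27017") later makes the
    // balancer spread chunks onto the new shard automatically
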
[16:51:46] <aster1sk> I need to query multiple values on the same key find( { i : [ 1,2,3,4,5 ] } ) -- amidoingitright?
[16:52:00] <aster1sk> Doesn't seem to work, I think I may be missing an $and
[16:52:18] <kali> aster1sk: maybe what you want is $in
[16:52:24] <ekristen> kali: ok thanks
[16:52:36] <aster1sk> I'll give it a shot kali
[16:52:53] <aster1sk> I forgot to mention this is in aggregate() not find()
[16:53:02] <aster1sk> Same thing?
[16:53:09] <kali> aster1sk: i think so
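
Spelled out, what kali is confirming: $in behaves the same inside an aggregation $match stage as it does in find(). Collection and field names are assumptions:

    db.stats.aggregate([
        { $match: { i: { $in: [1, 2, 3, 4, 5] } } },
        { $group: { _id: "$i", total: { $sum: "$views" } } }
    ]);
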
[16:54:24] <aster1sk> :w
[16:54:31] <kali> wrong window
[16:55:03] <ElGrotto> eek, vi!
[16:55:06] <aster1sk> Luckily it wasn't my password.
[16:55:16] <kali> aster1sk: been there, done that
[16:55:43] <ElGrotto> I'm always doing alt+f, s .. while editing wiki pages :-S
[16:56:13] <ElGrotto> it's save, just not the save i want in my web browser :P
[16:56:23] <aster1sk> local GNU screen + ssh irssi screen... gets a little wacky with the shortcuts.
[16:57:05] <aster1sk> Didn't want to flood other channel with /join -window #mongodb (you guys deserve the utmost attention) therefore I'm using pidgin for this window... too many things.
[16:58:58] <Derick> gnu screen with irssi is awesome :P
[16:59:01] <ElGrotto> ok I've been reading about shards for a looong while now and the one thing I've discovered for certain.. it gives me a headache :S
[16:59:53] <aster1sk> Derick took a while to get used to keybindings + dvorak.
[17:00:09] <Derick> hah, lol, yes
[17:00:19] <kali> ElGrotto: if i may... don't overthink it right now. your job right now is to get the right features for your 50k. going from 50k to 200k is next year's problem
[17:00:41] <ElGrotto> uh no that's ekristen's problem lol
[17:00:46] <aster1sk> Luckily for you mongo is flexible enough to scale / migrate with you.
[17:01:05] <kali> ha, sorry. too many conversations in parallel
[17:01:18] <ElGrotto> I'm trying to figure out whether it's possible to have a database spread over geographic locations..
[17:01:29] <kali> right !
[17:01:41] <ElGrotto> but I think I need to learn to walk before cycling :)
[17:01:43] <kali> interesting too :)
[17:02:03] <kali> well, maybe the same comment applies to you :)
[17:02:11] <ekristen> kali: I shouldn't be taking next years problem into consideration now?
[17:02:19] <ElGrotto> remembering I hadn't ever heard about mongo/nosql until yesterday, my mind has gone racing over the possibilities..
[17:03:06] <aster1sk> I've spent nearly 1.5 months just reading the docs / practising... still feel total n00b
[17:03:10] <ElGrotto> wondering whether it could achieve stuff I'd found impossible with sql, because of its lack of tables
[17:03:14] <aster1sk> And I'm the one implementing the entire stats stack here.
[17:03:48] <kali> ekristen: there is a compromise to find. in my current job, we worried about scalability much too early, and implementing anything became complex much too early
[17:03:49] <ElGrotto> well, sorry s/tables/schemata/
[17:04:10] <kali> ekristen: so there is a sweet spot to find. starting with mongodb is a better start than we had though :)
[17:04:29] <ekristen> kali: I guess I am trying to figure out if 50k need sharding or not ;-)
[17:04:39] <Derick> I doubt it
[17:04:42] <Derick> 50k is nothing
[17:04:46] <aster1sk> Our CTO really wants read preference so M/R is out of the question (can't run command on master) so I'm hustlin trying to figure out aggregation framework for as many queries as I can.
[17:04:48] <kali> depends what they do. but 50k is not that much
[17:05:04] <Derick> how large are your documents?
[17:05:04] <kali> aster1sk: you CTO is right
[17:05:35] <kali> aster1sk: m/r in mongodb is a bit of a last resort system, imho
[17:05:39] <aster1sk> Oh totally, however the framework wasn't available during initial R&D - so I had to change a lot of stuff along the way (esp 2.1 release).
[17:05:43] <ekristen> Derick: I am trying to figure that out right now
[17:05:49] <aster1sk> I feel the same way kali.
[17:06:33] <ElGrotto> I do disagree with one point though, your *design* has to be scaleable even if your current *implementation* isn't scaled.
[17:06:58] <kali> ElGrotto: your design will have to change as the featureset will evolve
[17:07:06] <kali> ElGrotto: so yes and no :)
[17:07:28] <ElGrotto> that sounds like hacking a solution.. making "improvements" as you find out you're reaching a limit
[17:07:30] <aster1sk> Funny, when experimenting with our data model I cooked up a few harebrained solutions to deal with not having to use that silly positional operator... turns out now I'm stuck in a bit of a quandary.
[17:07:53] <aster1sk> Upserts don't like $
[17:09:03] <kali> ElGrotto: it was not a comment to be taken literally. i'm just commenting on a real life experience and advising against premature optimization and overdesign
[17:09:08] <aster1sk> Came back to bite me now though, I can't $sum a $group because there's no finite key of what to $group... so I'll either have to array_sum() it in the model or keep digging... I'm running out of steam.
[17:09:33] <Derick> aster1sk: or you change your data schema
[17:09:52] <aster1sk> I simply cannot think of a better way to represent this data.
[17:10:09] <aster1sk> This one's nearly two months in the making... perhaps I have tunnel vision.
[17:10:44] <Derick> it evolves depending on how you query it too...
[17:15:12] <ElGrotto> leads to a bit of a question tho.. the integer equivalent in mongo (I've not seen anything on datatypes if they even exist), is it 32 or 64 bit?
[17:15:51] <aster1sk> Derick would you know how to $group $sum this : aggregate : { pages : { 1234 : { views : 1 }, 4321 : { views : 3 } } } ?
[17:15:57] <Derick> both are supported as two types ElGrotto
[17:16:27] <Derick> aster1sk: you shouldn't have non-descriptive keys like "1234" and "4321"...
[17:16:39] <aster1sk> Bollocks, thought so.
[17:16:44] <Derick> split it into: { pages : { id: 1234, views : 1 } ...
[17:17:02] <aster1sk> But then how would one upsert those?
[17:17:44] <aster1sk> Wait.... that was a stupid question.
[17:17:46] <Derick> probably by using the $ positional operator? I would have to investigate
[17:18:16] <imbusy> is there a way to keep a socket alive through the pymongo driver?
[17:18:19] <aster1sk> Well instead of upserting on the fly, I'd have to find it first then insert / update -- which means two queries.
[17:18:28] <aster1sk> (Which I was trying to avoid)
[17:18:33] <Derick> aster1sk: hmm, you shouldn't
[17:19:05] <Derick> $ can't be used with upsert
[17:19:06] <Derick> meh
[17:19:20] <aster1sk> I mentioned that above, you see my problem now :(
[17:19:58] <Derick> you could do a non-upsert with an $inc though, not?
[17:20:36] <aster1sk> I'm thinking it might be time to separate this schema into different collections as much as I don't want to.
[17:20:43] <Derick> db.foo.update( { page.id : 1234 }, { $inc: 'page.$.views' } );
[17:20:46] <Derick> or something?
[17:21:12] <Derick> aster1sk: yeah, keep one for the view separately? Might be better as it would avoid moving things around on disk if the document keeps growing
[17:21:16] <aster1sk> That's logical - I will test in shell.
[17:21:34] <Derick> $inc syntax was wrong
[17:21:43] <Derick> $inc: { 'pages.$.views' : 1 }
[17:21:48] <aster1sk> Yeah, I know what you're getting at though.
[17:22:05] <Derick> http://www.mongodb.org/display/DOCS/Updating#Updating-The%24positionaloperator has a similar example actually :)
[17:22:12] <aster1sk> I want to be as forthcoming as possible with the data model (it's far more complicated than that) but boss would get mad if I share the damn thing.
[17:22:49] <aster1sk> I also have subdocuments under pages describing unique views... it gets pretty scary.
[17:23:05] <Derick> i can only help you with things you share :-)
[17:23:21] <aster1sk> I know, let me see what I can do.
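
Putting Derick's two suggestions together: with pages stored as an array of { id, views } subdocuments instead of numeric keys, both the increment and the sum become straightforward. A sketch with assumed names following the fragment aster1sk shared:

    // increment one page's counter; the element must already exist,
    // since the $ positional operator cannot be combined with an upsert
    db.stats.update({ "pages.id": 1234 }, { $inc: { "pages.$.views": 1 } });

    // sum views per page id across all documents
    db.stats.aggregate([
        { $unwind: "$pages" },
        { $group: { _id: "$pages.id", views: { $sum: "$pages.views" } } }
    ]);
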
[17:27:41] <ekristen> kali: I don't need config server or mongos if I am not using shards correct?
[17:28:22] <FraFraFra> morning guys! I'm looking for a good strategy for finding the intersection between two huge collections. Huge = ~2M docs. Collection 1 contains all ids, Collection 2 contains, let's say, 200k ids missing from Collection 1. I would like to find these 200k
[17:29:39] <Derick> ekristen: correct
[17:29:50] <ekristen> Derick: thanks
[17:36:34] <kali> FraFraFra: you'll have to do that out of mongo. i would export the ids from both collection with mongoexport, then sort and join them in shell
[17:36:41] <crudson> FraFraFra: if ids are unique within each collection, perform two map reduces, the second of which uses {:out => {:reduce => 'collection'}}. Map key will be the id, and value will be a count.
[17:37:17] <kali> crudson: ho, nice one
[17:37:42] <crudson> FraFraFra: providing the second is a subset of the first.
[17:44:42] <FraFraFra> crudson: ids are the same, I'm using collection number 2 just as a temporary collection where I stored the complete list of ids that came from Collection 1, minus the missing ids
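
A shell-syntax sketch of crudson's two-pass map/reduce (his snippet was Ruby): ids seen in both collections end up in the output collection with value 2, ids seen in only one with value 1:

    var map = function () { emit(this._id, 1); };
    var red = function (key, values) { return Array.sum(values); };

    db.coll1.mapReduce(map, red, { out: { replace: "overlap" } });
    db.coll2.mapReduce(map, red, { out: { reduce: "overlap" } });  // re-reduces into the existing output

    db.overlap.find({ value: 2 });  // ids present in both collections
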
[18:21:17] <WormDrink> hi
[18:21:19] <WormDrink> how can I do isMaster via shell ?
[18:21:23] <WormDrink> nvm
[19:05:52] <jmpf> I keep getting this - http://pastie.org/private/yuc0bphpqk3342qudhjuaw on 2.2.0 - but both the server and the one backing it up have tons of space - any other gotchas to be aware of?
[19:53:28] <airportyh> Hello all, is it possible in the aggregation framework to group by values in an array field?
[20:00:17] <crudson> airportyh: yeah, use $unwind first to flatten the array
[20:00:34] <airportyh> crudson: thanks, let me try
[20:01:56] <ezakimak> question: it says in the manual under Dot Notation vs Subobjects that key order must be the same, and that ... "This can make subobject matching unwieldy in languages whose default document representation is unordered.". However, in the manual page for updating in the "Field (re)order" section it says: "There is no guarantee that the field order will be consistent, or the same, after an update"
[20:01:57] <bhosie> i have an admin user that i log in using this: use admin -> db.auth({'user':'pass'}) after setting up a replica set, i can no longer log in. are auth and the keyfile required by replication incompatible?
[20:02:00] <ezakimak> how can it expect the client to test w/the "correct" key order if the server won't guarantee that it will preserve the same order it was originally written with?
[20:03:59] <airportyh> crudson: that worked, thanks!
[20:04:13] <crudson> airportyh: great :)
[20:09:25] <airportyh> Can you group by a composite key?
[20:09:43] <airportyh> again, with aggregation framework
[20:10:08] <airportyh> so like _id: ['$key_one', '$key_two']
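
Yes, and the usual spelling is a document rather than an array, since the group _id can itself be a subdocument (field names from airportyh's example):

    db.coll.aggregate([
        { $group: {
            _id: { one: "$key_one", two: "$key_two" },
            count: { $sum: 1 }
        } }
    ]);
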
[20:15:30] <bhosie> oh crap. nevermind me. it's been a long day and i see my obvious syntax error
[20:35:42] <airportyh> Hello all, I am writing a batch job that uses the aggregation framework to summarize some data, but after getting the response from aggregate, what is a good way to delete all the data I have processed so far?
[20:36:05] <airportyh> Without having to worry about race conditions with other processes
[20:46:09] <airportyh> http://stackoverflow.com/questions/12716599/mongodb-batch-jobs-and-deleting-data
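
One common pattern for airportyh's race: pin a cutoff before aggregating, then delete only what falls at or before the cutoff, so documents inserted mid-job are untouched. Collection and field names are assumptions:

    var cutoff = new Date();
    var res = db.events.aggregate([
        { $match: { ts: { $lte: cutoff } } },
        { $group: { _id: "$page", views: { $sum: 1 } } }
    ]);
    // persist res.result somewhere durable, then:
    db.events.remove({ ts: { $lte: cutoff } });
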
[20:49:09] <leehambley> should the server delete the database and index files when I drop the database?
[20:49:40] <leehambley> I'm on a MBA, and the 20 GB of files that mongo creates when my tests run are sort of out of control. I expected deleting the database would take care of those