[00:11:50] <multi_io> is there a way to atomically create a document ONLY IF it doesn't exist yet (as determined by a query for it returning nothing)?
[00:12:17] <multi_io> ...but if the query does return something, don't modify the document?
[00:15:24] <richardraseley> I have a question related to sharding in MongoDB. So, I understand that sharding is done at the collection level, does that mean that any fields within any documents within that collection could be on any shard in the MongoDB infrastructure?
[00:16:19] <richardraseley> I didn't know if the document was the most granular unit that could be sharded or if fields within the same document could live in different shards?
[00:17:06] <multi_io> richardraseley: the document is the most granular unit.
[00:17:56] <richardraseley> So, the name of the document is the primary attribute that determines the placement within the partition?
[00:18:51] <richardraseley> Or is there a specific field within the document that is keyed off from?
[00:21:07] <richardraseley> multi_io: So, to clarify - a document can never be spread across multiple chunks
[00:21:25] <richardraseley> And therefore can never exist in multiple shards?
[00:31:26] <richardraseley> so the mongos process acts as a transaction coordinator / proxy for user requests - does that service have to query the config service for each query to determine where the chunks exist in the shards, or do they maintain knowledge of the chunk / shard configuration?
[00:31:54] <multi_io> richardraseley: you choose the field. look for "sharding key" in the docs.
[00:32:11] <richardraseley> multi_io: I see - thank you.
[00:38:31] <therealkasey> anyone have a pointer to startupStatus code meanings? I'm trying to debug a replica set where the nodes don't seem to see each other.
[00:57:09] <jstout24> is it good or bad practice to have multiple databases per application?
[01:24:56] <therealkasey> hmm, i figured out my replication issue, but i'm not quite sure what to do about it. i have a replicaset config that has gone out to all nodes with an incorrect hostname (the host that originally was the primary). the primary demoted itself to secondary, so i can't reconfigure the set from there. so i've got a set with no primary and a bad config on all nodes.
[01:27:52] <therealkasey> thinking about dropping my oplog and replset collections
[01:51:11] <kotedo> To all the MongoDB gurus out there: Can I have something like a many to many relationship in MongoDB?
[02:06:55] <kotedo> so, maybe I should have a second database for millions of user and groups in a SQL fashion and the rest lives in MongoDB?
[02:07:15] <therealkasey> i would have an indexed field for groups in each user, where groups is a list of simple strings, then an indexed name attribute in the group
[02:07:41] <alnewkirk> there is no redundant data, and that design is very normal
[02:08:00] <webjoe> i'm assuming groups has more metadata than a string
[02:08:15] <webjoe> i mean if you want to jam it all into a nested array, don't let me stop you.
[02:08:20] <kotedo> it will have more meta data, yes
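The layout therealkasey describes (a list of group names on each user, plus a separate group document carrying the metadata) can be sketched with plain objects. The users, groups, and helper functions below are made-up illustrations; the helpers only stand in for indexed queries such as db.users.find({ groups: 'devs' }) and are not driver calls.

```javascript
// Made-up sample data: each user carries a list of group names,
// and each group document holds its own metadata under a unique name.
const sampleUsers = [
  { name: "ann", groups: ["admins", "devs"] },
  { name: "bob", groups: ["devs"] },
];
const sampleGroups = [
  { name: "admins", description: "site administrators" },
  { name: "devs", description: "developers" },
];

// Stand-in for db.users.find({ groups: groupName }) on an indexed array field.
function usersInGroup(groupName) {
  return sampleUsers
    .filter((u) => u.groups.includes(groupName))
    .map((u) => u.name);
}

// Stand-in for db.groups.find({ name: { $in: user.groups } }).
function groupsOfUser(userName) {
  const user = sampleUsers.find((u) => u.name === userName);
  return sampleGroups.filter((g) => user.groups.includes(g.name));
}

console.log(usersInGroup("devs")); // [ 'ann', 'bob' ]
console.log(groupsOfUser("ann").map((g) => g.description));
```

Both directions of the relationship are answered without a join table: one lookup on the users' groups field, one on the groups' name field.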
[02:33:51] <Kane`> a new document will be made, the next day, to get logged to
[02:34:29] <dstorrs> as to the indices, try something like this: { node : -1, client : 1 } => "has [node] logged [client]?"
[02:35:03] <dstorrs> you want to put the higher-selectivity ones first, so node before timestamp before client
[02:35:40] <dstorrs> well, depending on what "10 or more timestamps" really means
[02:36:19] <Kane`> sorry, does that mean i should run my queries like: db.log.find({node: n, client: c}) as opposed to db.log.find({client: c, node: n}) ?
[02:36:52] <Kane`> dstorrs, "10 or more" is more of an average. some clients may only be logged by each node once. in that case, there'll only be one timestamp
[02:36:56] <Kane`> for the most part though, there will be many
[02:37:01] <dstorrs> the order you put them in the 'find' doesn't matter, but the order in the index matters a lot.
[02:37:29] <dstorrs> You want to build your index with the most selective item (i.e., the one with the fewest elements) on the left
[02:38:13] <dstorrs> check out this article on indexes: http://kylebanker.com/blog/2010/09/21/the-joy-of-mongodb-indexes/
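The point that field order in find() does not matter while field order in the index does can be illustrated with a deliberately simplified model. The function below is hypothetical and only considers equality matches on leading index fields; it ignores sorts, range conditions, and everything else a real query optimizer weighs.

```javascript
// Simplified model: a compound index helps with the longest run of its
// leading fields that the query constrains. Query field order is irrelevant
// because the query fields are treated as a set.
function usableIndexPrefix(indexFields, queryFields) {
  const constrained = new Set(queryFields);
  const prefix = [];
  for (const field of indexFields) {
    if (!constrained.has(field)) break; // the prefix ends at the first gap
    prefix.push(field);
  }
  return prefix;
}

const nodeClientIndex = ["node", "client"]; // models { node: -1, client: 1 }

console.log(usableIndexPrefix(nodeClientIndex, ["node", "client"])); // [ 'node', 'client' ]
console.log(usableIndexPrefix(nodeClientIndex, ["client", "node"])); // [ 'node', 'client' ]
console.log(usableIndexPrefix(nodeClientIndex, ["client"]));         // []
```

Under this model a query on client alone gets no help from an index that leads with node, which is why the choice of the leftmost field matters.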
[02:51:39] <dstorrs> what exactly is the difference between $pull and $pullAll in update()? they seem to do the same thing.
[02:53:41] <dstorrs> oh wait, I see. $pullAll takes an array and removes everything that matches any element of it.
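The difference dstorrs arrives at can be modeled on a plain array. These two functions are illustrative stand-ins, not the server's implementation, and they only cover the scalar-equality case: $pull with a single value versus $pullAll with a list of values.

```javascript
// Models { $pull: { tags: value } }: drop every element equal to one value.
function pullModel(arr, value) {
  return arr.filter((item) => item !== value);
}

// Models { $pullAll: { tags: values } }: drop every element equal to any listed value.
function pullAllModel(arr, values) {
  return arr.filter((item) => !values.includes(item));
}

const sampleTags = ["a", "b", "c", "b", "d"];

console.log(pullModel(sampleTags, "b"));           // [ 'a', 'c', 'd' ]
console.log(pullAllModel(sampleTags, ["b", "d"])); // [ 'a', 'c' ]
```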
[05:42:21] <Kage`> Anyone know of any PHP+MongoDB forums systems?
[06:28:55] <sagarchalise> Hi, this may be the wrong question to ask, but are there any resources on mongodb database samples for people who have only worked with RDBMSs?
[06:39:01] <ukd1> sagarchalise, what kind of samples
[06:41:13] <sagarchalise> ukd1: I am having a hard time designing a mongo schema. I am still thinking the mysql way, in tables and relations.
[06:41:40] <sagarchalise> ukd1: thinking about a mapping sample of mysql database to mongo db design just for reference
[06:45:00] <ukd1> sagarchalise, there are examples - I'll google for you
[06:45:16] <ukd1> - this might be handy : http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart
[06:45:55] <sagarchalise> ukd1: I am looking into those but they talk about queries.
[08:48:58] <spillere> i'm doing z = db.users.find(), I want to get all data from the db, when I print z, it gives <pymongo.cursor.Cursor object at 0xb0d120ec>
[08:49:10] <spillere> how can I display all data properly?
[09:18:07] <multi_io> i.e., is there a way to atomically create a document ONLY IF it doesn't exist yet (as determined by a query for it returning nothing)?
[09:23:41] <spillere> where can I get a mongodb sticker?
[09:25:50] <multi_io> NodeX: problem with "craft a sneaky upsert that would not update anything" is that, if the document already exists, I don't know its contents. So I can't craft that upsert...
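One way around not knowing the existing contents is an upsert whose update part only applies when a document is inserted. On servers that provide the $setOnInsert operator (an assumption about the version in use), the shell call would look like db.things.update({ key: "a" }, { $setOnInsert: { value: 99 } }, { upsert: true }). The function below is a hypothetical in-memory model of that behavior, with made-up field names; the atomicity itself comes from the server, not from this sketch.

```javascript
// In-memory model of an upsert that uses only $setOnInsert:
// if a document matches the query it is left untouched,
// otherwise a new one is built from the query fields plus the insert-only fields.
function upsertSetOnInsert(docs, query, fieldsOnInsert) {
  const matches = (doc) => Object.keys(query).every((k) => doc[k] === query[k]);
  const existing = docs.find(matches);
  if (existing) {
    return { inserted: false, doc: existing }; // existing document is not modified
  }
  const created = Object.assign({}, query, fieldsOnInsert);
  docs.push(created);
  return { inserted: true, doc: created };
}

const things = [{ key: "a", value: 1 }];

const firstAttempt = upsertSetOnInsert(things, { key: "a" }, { value: 99 });
const secondAttempt = upsertSetOnInsert(things, { key: "b" }, { value: 2 });

console.log(firstAttempt.inserted, firstAttempt.doc.value); // false 1
console.log(secondAttempt.inserted, things.length);         // true 2
```

The caller never needs to know what the existing document contains, because the update is a no-op whenever the query matches.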
[09:52:20] <millun> i am rewriting a part of code to work with BasicDBObjects as advised. it is going ok but i don't know how to issue "ensureIndexes" command
[09:53:06] <carsten> it is our business to guess framework and implementation language?
[18:04:13] <mitsuhiko> does the startup order for mongos and mongo controllers matter? is mongodb supposed to start up properly if the services go up in random order?
[18:04:26] <mitsuhiko> i noticed mongos being stuck if we reboot the whole cluster at once for testing purposes
[18:48:35] <adamt> If anybody sees the author of the Haskell bindings, do let him know that there's a problem with the dependencies in the cabal files for both version 1.2 and 1.3 of the bindings.
[20:29:56] <dstorrs> I tried this > db.temp.find({ 'pages.owner' : 'worker_9'}, { 'pages.$':1}) but it returns { "_id" : "carol", "pages" : [ { }, { } ] }, when what I actually want is { num:1, owner:'worker_8'}
[20:30:19] <dstorrs> (less the typo on the worker num)
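If the server in use does not return the matching array element for a 'pages.$' projection, one fallback is to fetch the whole document and pick the element client-side. This is a minimal sketch under that assumption; the sample document is made up, modeled on the shape quoted above, and firstPageOwnedBy is a hypothetical helper.

```javascript
// Made-up document shaped like the one in the question.
const carolDoc = {
  _id: "carol",
  pages: [
    { num: 1, owner: "worker_9" },
    { num: 2, owner: "worker_3" },
  ],
};

// Return the first embedded page owned by `owner`, or null if there is none.
function firstPageOwnedBy(doc, owner) {
  const page = doc.pages.find((p) => p.owner === owner);
  return page === undefined ? null : page;
}

console.log(firstPageOwnedBy(carolDoc, "worker_9")); // { num: 1, owner: 'worker_9' }
console.log(firstPageOwnedBy(carolDoc, "worker_1")); // null
```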
[20:38:17] <mitsuhiko> multi_io: why would you use a number?
[20:38:24] <mitsuhiko> multi_io: objectid solves a completely different problem
[20:38:29] <mitsuhiko> multi_io: how would you allocate numbers?
[20:38:42] <mitsuhiko> no, objectid is not efficient, that's not what it tries to do
[20:39:38] <multi_io> mitsuhiko: allocate using a separate "counter" or maybe using an optimistic strategy
[20:40:02] <dstorrs> multi_io: that's going to break HARD on a cluster
[20:40:03] <multi_io> the mongodb docs at one point say that you should use natural id if applicable, iirc.
[20:40:04] <mitsuhiko> multi_io: that's the reason why numbers are not used in distributed systems
[20:40:09] <mitsuhiko> because you would need a central counting server
[20:40:24] <mitsuhiko> multi_io: i recommend against object ids though. use uuids
[20:41:20] <multi_io> dstorrs: the point is, we need that unique integer anyway (it's required by the use case), so it's a "natural" key, not an artificial one
[20:41:43] <dstorrs> You asked, we answered. Your call.
[20:42:00] <multi_io> I was thinking of making the counter values globally unique using only information that's available locally on the shard
[20:46:06] <multi_io> the customer requires that the numbers be roughly incrementing values, starting from 1. They are supposed to be exposed to end users :P
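The separate "counter" multi_io mentions is commonly built as one document per sequence that gets incremented atomically, for example with findAndModify({ query: { _id: "tickets" }, update: { $inc: { seq: 1 } }, new: true, upsert: true }). The sketch below models that with an in-memory Map; the collection and sequence names are made up. It is also exactly the central counter mitsuhiko warns about: every allocation goes through one document.

```javascript
// In-memory stand-in for a "counters" collection keyed by _id.
const counterDocs = new Map();

// Models an atomic increment-and-return on the counter document,
// creating it on first use (the upsert part).
function nextSequence(name) {
  const current = counterDocs.get(name) || { _id: name, seq: 0 };
  const updated = { _id: name, seq: current.seq + 1 };
  counterDocs.set(name, updated);
  return updated.seq;
}

console.log(nextSequence("tickets"), nextSequence("tickets"), nextSequence("invoices")); // 1 2 1
```

The values start at 1 and increase by one per call, which matches the customer requirement quoted above, at the cost of a single write hot spot.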
[20:55:31] <mitsuhiko> multi_io: mongodb is great for my particular usecase, don't get me wrong, but it's very unreliable still and it needs someone to sit next to keeping it running
[20:55:57] <mitsuhiko> and if you run it with less than three controller servers i just don't trust it yet
[20:56:15] <multi_io> that's kind of a minority opinion I guess :P I can't imagine that it would be so hard to implement app-specific sharding yourself in Postgres
[20:56:50] <mitsuhiko> multi_io: do you have to shard?
[20:59:42] <richardraseley> Just trying to clarify some points with regard to MongoDB sharding and replication. So the smallest component that will be sharded is the document. It is sharded according to the shard key which is selected and must be unique to all documents. multiple documents with contiguous shard keys will exist in a chunk and chunks are divided across shards as needed? Is that correct?
[21:00:12] <multi_io> mitsuhiko: so you wouldn't use mongodb because it's too unreliable, not because it doesn't provide any more features than, say, Postgres.
[21:04:42] <tilleps> how to calculate how much space mongoconfig servers take?
[21:05:54] <multi_io> richardraseley: the shard key doesn't have to be unique, I don't think.
[21:08:39] <richardraseley> multi_io: Hmm, I thought that was the case - but assuming you're right, can you speak to the rest of my assumptions?
[21:10:11] <richardraseley> multi_io: Also, can you comment on whether or not the mongo routing service has to talk to the mongo config service for every query, or if it maintains a copy of the shard configuration locally as well?
[21:13:01] <richardraseley> multi_io: Just confirming that you are correct with regard to shard key uniqueness (per http://docs.mongodb.org/manual/core/sharding/#sharding-shard-key)
[21:13:13] <jstout24> i'm trying to think of the best schema design for events, passing an object like `db.events.insert({ name: 'impression', parameters: { page: 'some_page_id', layout: 'some_layout_id', ……., visitor: 'some_visitor_id' } });`
[21:13:26] <jstout24> i was thinking about turning the parameters into a key / value pair and do a multi-index on that
[21:13:33] <jstout24> but i'm not sure how to query upon given parameters
[21:14:47] <multi_io> richardraseley: I guess it keeps it in memory, but I don't know, sorry. I haven't done sharding in production :)
[21:16:38] <richardraseley> Can anyone else confirm my assumptions in the comment above? Documents being the most granular unit that can be sharded, being gathered in ranges in "chunks" and spread across shards as needed?
[21:16:44] <richardraseley> multi_io: that's ok - thanks.
[21:33:01] <jstout24> if i have a document with field "data" which has key / value pairs… how can i search all documents where `(data.k = 'foo_k' & data.v = 'foo_v') AND (data.k = 'bar_d' & data.v = 'bar_v')`
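Assuming "data" is an array of { k, v } subdocuments, one way to express that condition is one $elemMatch per pair, combined with $and, so that k and v must match within the same array element. The query object below uses those real operators; the matcher function and the sample events are made-up stand-ins that evaluate the query in memory instead of on a server.

```javascript
// Query: both key/value pairs must be present, each within a single element.
const pairQuery = {
  $and: [
    { data: { $elemMatch: { k: "foo_k", v: "foo_v" } } },
    { data: { $elemMatch: { k: "bar_d", v: "bar_v" } } },
  ],
};

// Minimal in-memory evaluation of this specific query shape only.
function matchesPairQuery(doc, query) {
  return query.$and.every((clause) => {
    const wanted = clause.data.$elemMatch;
    return doc.data.some((pair) => pair.k === wanted.k && pair.v === wanted.v);
  });
}

const sampleEvents = [
  { _id: 1, data: [{ k: "foo_k", v: "foo_v" }, { k: "bar_d", v: "bar_v" }] },
  { _id: 2, data: [{ k: "foo_k", v: "bar_v" }, { k: "bar_d", v: "foo_v" }] },
  { _id: 3, data: [{ k: "foo_k", v: "foo_v" }] },
];

const matchedIds = sampleEvents
  .filter((doc) => matchesPairQuery(doc, pairQuery))
  .map((doc) => doc._id);

console.log(matchedIds); // [ 1 ]
```

Document 2 shows why the per-element match matters: it contains every key and every value, but paired the wrong way round, so it must not match.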
[21:59:48] <mitsuhiko> multi_io: mongodb without replication is unreliable. mongodb with replication is a very complex setup compared to postgres
[22:00:15] <mitsuhiko> on top of that you don't have transaction safety or the expressiveness of sql
[22:01:24] <mitsuhiko> richardraseley: yes, that's how it works
[22:01:47] <richardraseley> mitsuhiko: Thank you for your response.
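The picture richardraseley describes (whole documents grouped into chunks by contiguous shard key ranges, with chunks assigned to shards) can be sketched as a range lookup. The chunk boundaries and shard names below are made up, and the sketch assumes a numeric shard key with ranges that include the lower bound and exclude the upper bound.

```javascript
// Made-up chunk table: each chunk owns a half-open range [min, max) of shard key values.
const chunkTable = [
  { min: -Infinity, max: 100, shard: "shardA" },
  { min: 100, max: 200, shard: "shardB" },
  { min: 200, max: Infinity, shard: "shardA" },
];

// A document lives in exactly one chunk, hence on exactly one shard.
function shardForKey(shardKeyValue) {
  const chunk = chunkTable.find(
    (c) => shardKeyValue >= c.min && shardKeyValue < c.max
  );
  return chunk.shard;
}

console.log(shardForKey(42), shardForKey(100), shardForKey(250)); // shardA shardB shardA
```

Several documents may share the same shard key value in this model; they simply land in the same chunk.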
[22:02:23] <mitsuhiko> richardraseley: "Also, can you comment on whether or not the mongo routing service has to talk to the mongo config service for every query" <- talks to the config service yes
[22:02:27] <mitsuhiko> which is why you want three of them
[22:02:41] <richardraseley> So it talks to it with every query?
[22:03:07] <richardraseley> Client Request -> Router -> Config to determine placement -> to appropriate shard -> back to client?
[22:03:40] <richardraseley> Also, with regard to write scalability - is it safe to say that write performance on a single document is limited to the maximum performance of the master node in the shard that owns the chunk in which it lives?
[22:04:25] <mitsuhiko> richardraseley: actually, it does not need to talk to the config all the time
[22:04:35] <mitsuhiko> mongos will cache it for some time
[22:04:46] <mitsuhiko> for as long as the shard setup does not change i think
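The caching behavior mitsuhiko describes can be illustrated with a toy model. Both classes below are hypothetical and are not how mongos is implemented; in particular, markStale() assumes that some signal tells the router its cached copy is out of date after the shard setup changes.

```javascript
// Toy config service: holds the authoritative chunk table and counts reads.
class ToyConfigService {
  constructor(chunkTable) {
    this.chunkTable = chunkTable;
    this.reads = 0;
  }
  readChunkTable() {
    this.reads += 1;
    return this.chunkTable.map((c) => ({ ...c })); // hand out a copy
  }
  moveChunk(index, newShard) {
    this.chunkTable[index].shard = newShard;
  }
}

// Toy router: caches the chunk table and re-reads it only when told it is stale.
class ToyRouter {
  constructor(configService) {
    this.configService = configService;
    this.cachedTable = null;
  }
  markStale() {
    this.cachedTable = null;
  }
  route(key) {
    if (this.cachedTable === null) {
      this.cachedTable = this.configService.readChunkTable();
    }
    const chunk = this.cachedTable.find((c) => key >= c.min && key < c.max);
    return chunk.shard;
  }
}

const toyConfig = new ToyConfigService([
  { min: -Infinity, max: 100, shard: "shardA" },
  { min: 100, max: Infinity, shard: "shardB" },
]);
const toyRouter = new ToyRouter(toyConfig);

toyRouter.route(5);
toyRouter.route(150);
console.log(toyConfig.reads); // 1: the second lookup was served from the cache

toyConfig.moveChunk(0, "shardB"); // the shard setup changes
toyRouter.markStale();            // assumed staleness signal
console.log(toyRouter.route(5), toyConfig.reads); // shardB 2
```

In this model the request path is client, router, shard for most queries, with the config service consulted only on the first lookup and after a change.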
[23:54:39] <dstorrs> ok, this has to be an ID10T error on my part but I don't see it. count_pages = function(v) { print(v.pages.length) }; count_pages.apply({pages : [] }) => 'TypeError: v has no properties (shell):1'
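The error comes from how Function.prototype.apply treats its arguments: the first argument becomes `this`, and the second is the array of call arguments. In count_pages.apply({pages : [] }) the object is therefore bound to `this` and v is undefined. A minimal sketch of the failure and two working forms, with the function returning the length instead of printing it so the result can be checked:

```javascript
const countPages = function (v) {
  return v.pages.length;
};

// Reproduces the problem: the object is passed as `this`, so `v` is undefined.
let failedWithTypeError = false;
try {
  countPages.apply({ pages: [] });
} catch (err) {
  failedWithTypeError = err instanceof TypeError;
}

// apply takes (thisArg, [args]); call takes (thisArg, arg1, arg2, ...).
const viaApply = countPages.apply(null, [{ pages: [] }]);
const viaCall = countPages.call(null, { pages: ["p1", "p2"] });

console.log(failedWithTypeError, viaApply, viaCall); // true 0 2
```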