[02:51:41] <macBigRig> need help with indexing and syntax, see http://pastie.org/4607825
[02:57:05] <crudson> macBigRig: use dot notation to index nested attributes, but note that you will get entire documents back, not just the bit of the array that matches, unless you do some extra work (depends what you really want to do). The second question is simple enough with .aggregate()
[02:59:38] <macBigRig> crudson, looking for syntax examples for indexing. As for selecting, what kind of extra work is needed to get that output? Again, any example is greatly appreciated.
[03:05:45] <crudson> macBigRig: e.g. ensureIndex({'PS.RRB':1})
[03:12:28] <crudson> macBigRig: this will give you an array of items that fits your second Q: db.c.aggregate({$unwind:'$PS'}, {$project:{CR:1,'PS.ETB':1,'PS.DRB.HIC':1,'PS.PTB':1}}, {$unwind:'$PS.PTB'})
[03:13:01] <crudson> filter out the unrequired 'PS.PTB.PCC' and PLC as desired
[03:20:50] <crudson> macBigRig: specifically for your output: http://pastie.org/4607928
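crudson's `ensureIndex({'PS.RRB':1})` uses dot notation to index a nested field, and his pipeline relies on `$unwind` turning each element of an array into its own output document. A minimal pure-JavaScript sketch of that `$unwind` behavior (the `PS`/`ETB`/`CR` field names come from macBigRig's pastie, so the document shape here is assumed):

```javascript
// Pure-JS model of what MongoDB's $unwind does to one document:
// each element of the named array field becomes its own output document,
// with all other fields copied through unchanged.
function unwind(doc, field) {
  return (doc[field] || []).map(function (elem) {
    var copy = {};
    for (var k in doc) copy[k] = doc[k];
    copy[field] = elem;  // replace the array with a single element
    return copy;
  });
}

var doc = { CR: 1, PS: [{ ETB: 'a' }, { ETB: 'b' }] };
var out = unwind(doc, 'PS');
// out contains two documents, one per PS element, each still carrying CR
```

Running the real pipeline server-side with `.aggregate({$unwind:'$PS'}, …)` produces the same fan-out, one result document per `PS` element.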
[03:46:59] <hadees> how terrible is it to use a string as the _id ? I'm creating a versioning system for my collection and so I want the history table to have the _id: <objectid>:<version>
[04:02:20] <crudson> hadees: Not terrible at all. "If your document has a natural primary key that is immutable we recommend you use that in _id instead of the automatically generated ids." source: http://www.mongodb.org/display/DOCS/Object+IDs
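hadees' `<objectid>:<version>` scheme is just string composition, so building and splitting the composite `_id` is straightforward. A sketch (the 24-hex value below is only an example id):

```javascript
// Sketch of a composite string _id of the form "<objectid>:<version>"
// for a history/versioning collection, as hadees describes.
function historyId(objectIdHex, version) {
  return objectIdHex + ':' + version;
}

// Split the composite key back into its parts.
function parseHistoryId(id) {
  var i = id.lastIndexOf(':');
  return { objectId: id.slice(0, i), version: Number(id.slice(i + 1)) };
}

var id = historyId('507f1f77bcf86cd799439011', 3);
// => "507f1f77bcf86cd799439011:3"
var parts = parseHistoryId(id);
```

Since `_id` is always indexed, range queries like `_id >= "<hex>:"` can then pull all versions of one document in order.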
[04:16:48] <tyl0r> Is it worth adding an arbiter to a 3 node replica set? Does that mean if 2 servers go down (for whatever unlikely reason), the 3rd will be elected to be the Master for writes?
[07:40:52] <jsmonkey123> If I have a collection called Foo, and that collection stores multiple documents that have a property called boo holding an array of strings, and I want to remove one of the strings - is the proper method to use update? Or is it remove, specifying which property/entity to remove?
[08:54:12] <makka> I'm having trouble debugging a mapreduce - is this the right syntax for launching a mapreduce from the shell with a subset of documents in the collection (those with foo == bar) : "db.coll.mapReduce(map, reduce, {out:"test"}, {query: { "foo" : "bar" }})"
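For reference: the shell's `mapReduce` takes a single options document, so `out` and `query` belong in the same object, e.g. `db.coll.mapReduce(map, reduce, { out: "test", query: { foo: "bar" } })`. Below is a tiny pure-JS model of what that call does (the real API passes documents as `this` and provides a global `emit`; this sketch passes them explicitly):

```javascript
// Toy model of mapReduce with a query filter: select matching docs,
// map each to (key, value) emissions, group by key, then reduce.
function miniMapReduce(docs, map, reduce, query) {
  var emitted = {};
  docs.filter(function (d) {
    return Object.keys(query).every(function (k) { return d[k] === query[k]; });
  }).forEach(function (d) {
    map(d, function emit(key, value) {
      (emitted[key] = emitted[key] || []).push(value);
    });
  });
  var out = {};
  for (var k in emitted) out[k] = reduce(k, emitted[k]);
  return out;
}

var docs = [{ foo: 'bar', n: 1 }, { foo: 'bar', n: 2 }, { foo: 'baz', n: 5 }];
var result = miniMapReduce(
  docs,
  function (d, emit) { emit(d.foo, d.n); },
  function (key, vals) { return vals.reduce(function (a, b) { return a + b; }, 0); },
  { foo: 'bar' }
);
// only the foo == 'bar' docs were processed: result => { bar: 3 }
```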
[09:07:05] <rantav> Hi, I need help with performance problems, anyone on board?
[09:07:20] <rantav> I just posted my question to the forum https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/NClTihy_PPk
[09:07:33] <rantav> But I figured that might be easier to do this over IRC...
[09:18:45] <NodeX> and again. Change your index to reflect RANGE at the end
[09:19:16] <rantav> NodeX yeah sorry, I'll use pastebin next time
[09:19:36] <rantav> anyway, I will change the index such that the range is last
[09:19:45] <rantav> however, that doesn't explain everything
[09:19:58] <Gargoyle> Eg. Wed Aug 29 11:17:17 [conn35034] warning: virtual size (82655MB) - mapped size (78108MB) is large (4265MB). could indicate a memory leak
[09:20:10] <rantav> btw I have another query, in which the index does have ranges last, and it's still awfully slow
[09:20:25] <rantav> I'll put it in pastebin and post it here in a minute
[09:23:20] <NodeX> you changed the index on 1.8 million documents already ? .... I would imagine the index is still building
[09:28:03] <rantav> Here's the pastebin with one of my slow queries http://pastebin.com/PjxtQMG8
[09:28:13] <rantav> NodeX, I have not changed the index yet
[09:28:37] <rantav> and in this pastebin you can see that the ranges are last on the index (It's a bit different query)
[10:34:40] <rantav> b/c the UI updates in realtime as users do things on the site
[10:35:03] <NodeX> my approach to realtime (or near as) is every minute process X documents from a history collection (this stores every interaction with my app)
[10:35:40] <NodeX> I group by session id / user id and process say 5k docs at a time which keeps my queries and processing time nice and low
[10:36:19] <NodeX> the caveat is that you need to know what you want to report on before you start
[10:36:26] <rantav> nodex your approach may help performance of inserts or updates
[10:36:33] <rantav> but how does it help perf when querying?
[10:39:38] <rantav> when I limit() the results are SO MUCH BETTER
[10:39:45] <rantav> why didn't I think of that?!!!
[10:41:13] <NodeX> rantav : my method simply grabs the data, if I wanted to query for a range I query on the already processed data
[10:42:02] <rantav> nodex, so, in my model, the Session collection is "the processed data"
[10:42:05] <NodeX> for example I have a report that's got 1 days worth of reads/writes or w/e inside it and a date on it, this is one document and inside it lay nested objects with grouped/reduced actions from a user
[10:42:35] <NodeX> no, the session collection is your live data, you grab 5k docs from it and process them into a smaller collection, then the next minute you grab the next 5k
[10:43:13] <rantav> I guess it depends on your point of view
[10:43:34] <NodeX> In one of my apps I need users as a whole so I save the net result of what all users did ... i.e. 50 people searched for Blah.... yours needs individual users
[10:44:00] <rantav> what I do is, a background job processes the events from user activity, each event is a single user action
[10:44:04] <NodeX> it doesn't depend on any POV it's the correct model. You NEED a live collection where the data is dumped
[10:44:23] <rantav> and that background process calculates session data from the stream of user activity
[10:44:31] <NodeX> it's analogous to message queues, they grab some data from a job server, process it and continue
[10:45:01] <NodeX> it's the most efficient way of doing it for mine and many apps that require analytics
[12:14:22] <NodeX> vak , easier to ask your question than wait for people to ask you
[12:14:35] <vak> Are there any standard commands to re-arrange documents in a collection according to some "master" index? My collection doesn't fit in RAM, so I'd like to minimize seek operations, at least for such a master index.
[12:20:03] <NodeX> delete your other indexes... LRU and MRU take over what is and isn't in RAM
[12:25:49] <vak> NodeX: why delete the others? I need them too... I would just like to physically store data according to one index as much as possible. Kind of defragmentation, if you like
[13:05:22] <Bartzy> ^ This would return only the pids that the specific user_id really likes.
[13:05:35] <Bartzy> it seems like pid_uid index would solve both queries
[13:06:05] <Bartzy> but I have a problem - the 2nd query will be much more efficient with uid_pid, since Mongo will only have to look up the uid in the index, then all the PIDs that that user liked are under the same btree "branch"
[13:06:11] <Bartzy> Am I correct in my assumption?
[13:06:25] <Bartzy> And... if I am - is it worth having both a "pid" index and a "uid_pid" index, just for that?
[13:11:26] <[AD]Turbo> when logging with syslog, what is the facility that mongod uses? some of the local1/2/3/4/5/6/7 ?
[13:19:56] <NodeX> [14:04:49] <Bartzy> it seems like pid_uid index would solve both queries
[13:20:38] <NodeX> "If you have a compound index on multiple fields, you can use it to query on the beginning subset of fields. So if you have an index on "
[13:21:16] <Bartzy> <Bartzy> but I have a problem - the 2nd query will be much more efficient with uid_pid, since Mongo will only have to look up the uid in the index, then all the PIDs that that user liked are under the same btree "branch"
[13:21:19] <NodeX> "you can use it query on "..... "a or a and b or a and b and c"
[13:21:35] <Bartzy> Meaning both indexes could be picked up by the optimizer, and could be used
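The doc passage NodeX quotes is the compound-index "prefix" rule: an index on fields [a, b, c] can serve equality queries on a, on a+b, or on a+b+c, but not on b or c alone. A small sketch of that rule (index and query given as plain field-name arrays, a simplification of the real planner):

```javascript
// Prefix rule for compound indexes: the queried fields must cover a
// leading run of the index's fields for the index to serve the query.
function indexServes(indexFields, queryFields) {
  var wanted = {};
  queryFields.forEach(function (f) { wanted[f] = true; });
  if (queryFields.length > indexFields.length) return false;
  return indexFields.slice(0, queryFields.length)
                    .every(function (f) { return wanted[f]; });
}

indexServes(['uid', 'pid'], ['uid']);          // true: uid is a prefix
indexServes(['uid', 'pid'], ['uid', 'pid']);   // true: whole index
indexServes(['pid', 'uid'], ['uid']);          // false: uid is not a prefix
```

So for Bartzy's case, a uid_pid index covers both the uid-only query and the uid+pid query, while a pid_uid index only helps queries that include pid first.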
[14:00:24] <NodeX> as with most things noSql it's down to your app and how it best suits you
[14:00:31] <ansolas> something else, I found the following in the skip() doc: Range based paging provides better use of indexes but does not allow you to easily jump to a specific page.
[14:00:39] <ansolas> so what's the proper way to paginate?
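The range-based paging the skip() doc mentions means: remember the last `_id` (or sort key) of the previous page and ask for values greater than it, roughly `db.c.find({_id: {$gt: lastId}}).sort({_id: 1}).limit(n)`. That walks the index instead of counting past skipped rows, at the cost of not being able to jump straight to page N. A sketch, simulated over a sorted in-memory array:

```javascript
// Range-based paging: fetch the next pageSize docs whose _id is greater
// than the last _id seen on the previous page (null means "first page").
function nextPage(docs, lastId, pageSize) {
  return docs
    .filter(function (d) { return lastId === null || d._id > lastId; })
    .slice(0, pageSize);
}

var docs = [{ _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 4 }, { _id: 5 }];
var page1 = nextPage(docs, null, 2);          // _ids 1, 2
var page2 = nextPage(docs, page1[1]._id, 2);  // _ids 3, 4
```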
[14:30:22] <NodeX> "Our model is made possible due to our resilient infrastructure and scalability thanks to products like MongoDB from our friends at 10gen"
[14:49:58] <ramsey> Derick: I was just having some bit of frustration over DBRef not appearing to work in find() anymore, like my notes from 3 months ago suggest. I thought I recalled seeing examples in the docs, but I can no longer find them.
[14:57:57] <NodeX> emocakes : since I decided to never use an RDBMS again I have not encountered one problem that I cannot get around
[14:58:06] <emocakes> id like a couple of replicas of myself
[14:58:06] <Derick> the db ref contains a field called $id, but it still points to the normal _id field
[14:58:15] <emocakes> dress them up, play out some weird fantasies with them
[14:58:27] <emocakes> stuff all my ex's said was creepy
[14:58:31] <ramsey> Derick: right, but the work around with "catalog.$id" works correctly
[14:58:33] <NodeX> I have built many many apps for many different purposes and not run into a single problem that could not be overcome in a day or two
[15:45:04] <vak> info about the 2-day delay is just my date comparison from these two links: http://downloads-distro.mongodb.org/repo/ubuntu-upstart/dists/dist/10gen/binary-amd64/ and http://dl.mongodb.org/dl/linux/x86_64 Do you see any error in my conclusion?
[15:50:28] <vak> What is faster -- insert a record with _id field or without?
[15:51:20] <bjori> vak: if you don't provide an _id the driver will generate it for you, and if the driver is broken, the server will
[15:52:02] <bjori> if you have data to use as an id that will be 0.0000000001% faster :)
[15:54:52] <vsmatck> If the user doesn't provide _id it's better to let the server generate it rather than the driver IMO. Less stuff to send over the network.
[15:55:38] <algernon> the advantage of the driver generating it is that if you later want to use the _id for other stuff, you'll already have it at hand, without an extra network round trip
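algernon's point is that a driver can mint the ObjectId locally before sending the insert, so the application already knows the `_id` with no extra round trip. A simplified sketch of such client-side generation (real drivers combine a timestamp with machine id, process id, and a counter; this stand-in uses timestamp plus random bytes):

```javascript
// Simplified client-side ObjectId-like generator: 4 timestamp bytes
// (seconds since epoch) followed by 8 random bytes, as 24 hex chars.
// NOT the real driver algorithm - just an illustration of the idea.
function makeObjectIdHex() {
  var ts = Math.floor(Date.now() / 1000).toString(16);
  while (ts.length < 8) ts = '0' + ts;        // pad timestamp to 8 hex chars
  var rest = '';
  while (rest.length < 16) {                  // 16 more hex chars
    rest += Math.floor(Math.random() * 16).toString(16);
  }
  return ts + rest;
}

var id = makeObjectIdHex();
// 24 lowercase hex characters, known to the app before the insert is sent
```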
[15:56:01] <ramsey> Derick: here's something weird that I just noticed... it appears that the MongoDBRef class in PHP is creating structures in the DB that look like this: { "$ns" : "collection", "$id" : "1234..." } rather than what the docs say, which is this: { "$ref" : "collection", "$id" : "1234..." }
[15:56:10] <ramsey> Derick: this is version 1.2.12 of the driver
[16:13:14] <ramsey> Derick: I see what's going on now. I'm using MongoHQ, and it appears that they are displaying through the interface $ns in place of $ref (in a DBRef object), but when I use the shell to access the DB directly, I see $ref for the same exact object, so it's not actually stored in the DB as $ns
[16:16:18] <ramsey> Derick: you can see what I mean here: http://www.evernote.com/shard/s4/sh/4ca017cb-249b-4646-bc4b-7983db6184c9/539198e667a53cc286601eabdb3b17de
[17:06:18] <ribo> 2.2 is stable today, just curious if people have had any migration stories
[17:10:23] <mog_> is there a way to obtain this behaviour: db.col.update({x: 1}, {$inc: {a: 1}}, {x: 1, a: 1, b: 0, c: 0}) where the third argument object would be inserted in the upsert request if it wasn't found in the first place?
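What mog_ wants - increment on match, insert with extra defaults otherwise - is covered in later MongoDB releases by the `$setOnInsert` update operator. A pure-JS sketch of the desired semantics over an in-memory array (the function name and argument shape are made up for illustration):

```javascript
// Sketch of "upsert with defaults": if a doc matches the query, apply the
// increments; otherwise insert defaults + query fields + increments.
function upsertWithDefaults(coll, query, incs, defaults) {
  var doc = coll.find(function (d) {
    return Object.keys(query).every(function (k) { return d[k] === query[k]; });
  });
  if (!doc) {
    doc = {};
    Object.keys(defaults).forEach(function (k) { doc[k] = defaults[k]; });
    Object.keys(query).forEach(function (k) { doc[k] = query[k]; });
    coll.push(doc);
  }
  Object.keys(incs).forEach(function (k) { doc[k] = (doc[k] || 0) + incs[k]; });
  return doc;
}

var coll = [];
upsertWithDefaults(coll, { x: 1 }, { a: 1 }, { b: 0, c: 0 });
// inserted: { b: 0, c: 0, x: 1, a: 1 }
upsertWithDefaults(coll, { x: 1 }, { a: 1 }, { b: 0, c: 0 });
// second call matched the existing doc, so a is now 2 and b, c are untouched
```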
[17:11:22] <ranman> derick: what distro are you on?
[17:15:27] <Derick> ranman: I'm happy to test package if you'd like btw
[17:16:02] <ranman> thanks, I'll hit you up in a little bit
[17:27:10] <sssilver> Hello everyone. I have a BinData field in my collection. Is it possible to use a BinData value as a condition when selecting from it? Thanks
[17:30:33] <IAD> sssilver: try to create a new md5 field for binary data in documents. i'm not sure about using binary data as filter value
[17:38:11] <IAD> BurtyB: 2.2.0 runs faster or you have some errors after the upgrade?
[17:50:52] <BurtyB> IAD, looks OK - numbers in the output are what I'd expect :)
[18:55:24] <coopsh> same behavior for 2.2.0 and 2.0.7
[18:55:45] <coopsh> is this "normal" and expected? is there a way to speed it up?
[19:01:40] <Honeyman> Hello. A weird question: there is no conditional sorting of the output in any form yet, is there? I was trying to perform something like db.messages.find({}, {}).sort({user: {'$eq': 'JohnDoe'}, time: 1}) but it seems there's no hope for it...
[19:02:27] <Honeyman> That is, in this case, kind of - 'give me all messages ordered by time, but if there are messages by JohnDoe, they go first'.
[19:03:32] <Honeyman> Is there any legal working way to do that?..
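The server-side `sort()` document can't express Honeyman's conditional ordering, but once the result set is in hand it's a plain two-key comparator: "is this the special user?" first, timestamp second. A client-side sketch:

```javascript
// "JohnDoe's messages first, everything ordered by time within each group."
// Done client-side with a comparator, since sort({field: direction}) can't
// express a per-document condition.
function sortUserFirst(messages, user) {
  return messages.slice().sort(function (a, b) {
    var aHit = a.user === user ? 0 : 1;  // 0 sorts before 1
    var bHit = b.user === user ? 0 : 1;
    return aHit - bHit || a.time - b.time;
  });
}

var msgs = [
  { user: 'Alice',   time: 1 },
  { user: 'JohnDoe', time: 3 },
  { user: 'Alice',   time: 2 }
];
var ordered = sortUserFirst(msgs, 'JohnDoe');
// JohnDoe's message comes first, then Alice's two messages by time
```

This only works when the result set is small enough to pull to the client; for large collections, two queries (user's messages, then the rest) approximate a UNION.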
[19:07:51] <sssilver> Is this the right place to ask questions about pymongo?
[19:08:34] <sssilver> I'm trying to understand why for _id it returns a nested structure like this: { "$oid" : "<id>" }, and how can I make it flat, getting rid of $oid
[19:10:47] <Honeyman> sssilver: hasn't the result already been preprocessed by something like json_util?
[19:11:38] <sssilver> Honeyman: you mean it's the result of preprocessing by json_util?
[19:14:48] <Honeyman> Reformulating my question about conditional sorting. Well, since in fact I need only the ONE result, then... $or does NOT guarantee the order of execution, right? Isn't there anything that guarantees?
[19:19:17] <Honeyman> Well, what I probably need is some kind of SQL UNION equivalent...
[19:19:48] <Honeyman> (that would work in findAndModify)
[20:03:13] <vak> just wanted to say thanks to 10gen. After upgrading one part of my calculation takes 30 min instead of 2 hours 30 min. Well, 5x speed up is a nice thing :)
[21:43:05] <Gargoyle> Is it possible to "tag" a query in code to get it to put something in output.log?
[21:54:13] <leandroa> hi.. I've been using a .js script on 2.0.4, and it worked fine.. now (since 2.0.7, and even 2.2) I noticed that when I do ObjectId('….').toString(), it returns the entire string, instead of the 24 digits id as string.. is there a better way to do that?
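In newer shells `ObjectId('…').toString()` returns the whole `ObjectId("...")` wrapper; the shell's `ObjectId('…').str` property (or `valueOf()`) yields the bare 24-hex string instead. If all a script has is the wrapped string form, stripping the wrapper is straightforward (a sketch, not part of any driver API):

```javascript
// Strip the ObjectId("...") wrapper down to the bare 24-hex-char id.
// Returns the input unchanged if it isn't in the wrapped form.
function bareId(s) {
  var m = /^ObjectId\("([0-9a-f]{24})"\)$/.exec(s);
  return m ? m[1] : s;
}

bareId('ObjectId("507f1f77bcf86cd799439011")');
// => "507f1f77bcf86cd799439011"
```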
[22:06:44] <thomasthomas> HI ALL! I am trying to .find({transactions:undefined}).count() and failing. Any explanation?
[22:07:06] <thomasthomas> Well, it's not failing, it's returning 0 and I know that's wrong...
[22:09:46] <Gargoyle> thomasthomas: is $exists what you are looking for?
[22:09:47] <leandroa> thomasthomas: try using .find({transactions: {$exists: true}})
[22:10:09] <leandroa> i mean, false.. you're looking for nonexistent ones
[22:12:41] <thomasthomas> leandroa & Gargoyle: thanks but false is returning 0 and true is returning 25327 (all), I've seen transaction equal to undefined when I'm querying.
[22:13:07] <leandroa> thomasthomas: undefined as a string?
[22:27:52] <kzoo> ok, pyes and mongodb - please play nice. back for another kick at the can
[22:44:05] <thomasthomas> I have a root level field that is equal to an array sometimes and null others. How can I select all orders when this field "transactions" is null?
[22:49:05] <thomasthomas> I finally figured it out! .find({"transactions.0":{$exists:false}}).count();
[22:49:45] <thomasthomas> db.orders.find({"transactions[0]":{$exists:true}}).count(); feels much more natural ...
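The trick thomasthomas found works because `{"transactions.0": {$exists: false}}` matches every document that has no first array element - whether the field is null, absent, or an empty array (note the query language uses dot notation, not bracket syntax). A sketch of the matching behavior:

```javascript
// Model of {"transactions.0": {$exists: false}}: true when the field has
// no element at index 0, i.e. it's null, missing, or an empty array.
function firstElementMissing(doc) {
  return !(Array.isArray(doc.transactions) && doc.transactions.length > 0);
}

var docs = [
  { _id: 1, transactions: [{ amt: 5 }] },  // has a first element: excluded
  { _id: 2, transactions: null },          // null: matched
  { _id: 3 },                              // missing: matched
  { _id: 4, transactions: [] }             // empty array: matched
];
var matches = docs.filter(firstElementMissing).map(function (d) { return d._id; });
// => [2, 3, 4]
```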