[00:07:38] <radicality> Hi. Does anyone here have any experience/tutorials using Amazon's Elastic MapReduce with MongoDB? I can't find even the simplest example :S
[00:22:25] <acidjazz> [Tue Jun 05 00:20:48 2012] [notice] child pid 23477 exit signal Segmentation fault (11)
[01:31:32] <sebastiandeutsch> Hi, I'm running a MongoDB instance with the following stats: mapped 20.1g, vsize 40.9g, index 2.4g. Queries have been getting slow lately and I'm seeing server timeouts etc. Are there any parameters I can tweak besides the default config?
[01:33:07] <sebastiandeutsch> the number of faults is not very high, but the locks look quite high
[01:33:53] <dstorrs> sebastiandeutsch: are you doing a lot of map reduce?
[01:34:21] <sebastiandeutsch> dstorrs: no, but we need to identify docs by string (and we have 8 million of them)
[01:35:00] <dstorrs> and you're sure it's a DB issue and not a memory or disk I/O issue ?
[01:35:12] <dstorrs> I know you said locks were high, just wanted to make the suggestion
[01:36:20] <sebastiandeutsch> dstorrs: currently mongo uses 31% of ram
[01:37:25] <sebastiandeutsch> dstorrs: and any suggestions welcome ;-)
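A minimal diagnostic sketch for a situation like sebastiandeutsch's, using only standard shell commands of the era; the collection and field names below are hypothetical stand-ins:

```javascript
// Check lock and memory counters from the mongo shell:
db.serverStatus().globalLock   // lock ratio and queue lengths
db.serverStatus().mem          // mapped vs. resident memory

// Verify the string lookups actually use an index; in 2012-era
// MongoDB, "cursor" : "BasicCursor" in the output means a full scan.
db.docs.find({ name: "some string" }).explain()

// If it is scanning, index the lookup field:
db.docs.ensureIndex({ name: 1 })
```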
[04:38:46] <freezey> any recommended mount settings for physical hardware? SSDs
[04:39:03] <OsamaBinLaden> http://news.ycombinator.com/item?id=3982142 - and there are lots more
[04:48:29] <OsamaBinLaden> anyway, i'll stick to mongo :P
[04:53:32] <dstorrs> OsamaBinLaden is right. I've been seeing more hater posts around lately
[06:01:57] <hdm> quick question, trying to add a generic MR script for counting unique fields, this seems to blow up: emit(eval("this." + fname), { c : 1 }); in the map routine, fname is defined further up in the same script
[06:02:11] <hdm> do i need to use a function factory instead?
[06:02:50] <hdm> can a MR map function access vars in the scope of the caller basically
[06:03:01] <hdm> looks like no, but figured i should check
[06:03:08] <dstorrs> hdm: I think you want the 'scope' variable
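A sketch of what dstorrs is pointing at: mapReduce accepts a 'scope' option that injects caller variables into the map/reduce functions, so the eval() trick isn't needed. The collection name 'events' is made up here:

```javascript
var fname = "user";   // the field to count uniques of

db.events.mapReduce(
    function () { emit(this[fname], { c: 1 }); },   // map
    function (key, vals) {                          // reduce
        var total = 0;
        vals.forEach(function (v) { total += v.c; });
        return { c: total };
    },
    {
        out: { inline: 1 },
        scope: { fname: fname }   // makes fname visible inside map/reduce
    }
);
```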
[06:52:15] <philnate> and you would do primary key lookups through regex?
[07:08:43] <henrykim> dstorrs: Is the MD5 algorithm guaranteed to generate perfectly unique keys?
[07:09:44] <henrykim> dstorrs: for example, I currently have 100 billion URLs. If I hash them all to MD5 values, will they all be different?
[07:11:04] <henrykim> dstorrs: MD5 uses a 128-bit value to hold a hash. I'm sure that's enough space for my data, but it's still just an algorithm, and I have no idea how it behaves at this scale.
[07:16:26] <philnate> as it's only a hash algorithm there may be collisions, so you may end up with two or more URLs having the same hash. That can happen with every hash algorithm, so you have to live with it.
[07:16:57] <philnate> Although, as mentioned, it's quite unlikely to happen, you may still encounter two URLs hashing to the same value
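For scale, a back-of-the-envelope birthday bound on henrykim's numbers (an approximation, not an exact figure):

```javascript
// P(any collision) ~= n^2 / (2 * 2^128) for n random 128-bit hashes
var n = 1e11;                     // 100 billion URLs
var space = Math.pow(2, 128);     // size of the MD5 output space
var p = (n * n) / (2 * space);
print(p);                         // ~1.5e-17, negligible in practice
```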
[07:21:23] <philnate> henrykim: what's your system setup and what are your query paths?
[07:21:54] <philnate> when the performance degradation started, what was happening in your system?
[07:23:42] <henrykim> philnate: I set up 3 shards and 5 routers.
[07:24:12] <philnate> ok how is the memory/drive utilisation?
[07:24:26] <philnate> have you looked into mongostat to see some stats about your system
[07:24:31] <henrykim> each server has 8G of memory, and its disk capacity is over 600G.
[07:38:03] <philnate> so it's a simple _id = URL lookup, with no $in (URLs) or prefix lookups?
[07:39:01] <philnate> I may have missed it, but can you say what happened when the performance degradation started? When/what were you monitoring when you encountered this problem?
[07:39:29] <philnate> Did you just monitor the normal system doing daily business, or was it while importing the data, or something else?
[07:41:04] <philnate> btw, from what I saw of your mongostat output, it looks like only two queries were sent to the server
[07:42:45] <henrykim> from the mongostat updates/queries data, I drew a performance graph. I can see that total throughput drops to about half of the maximum after 1 hour.
[07:43:34] <henrykim> this system stores all documents from our services. we need to store them permanently.
[07:43:50] <henrykim> but we can restore them if we lose them for any reason.
[07:43:52] <philnate> maybe, but you have to consider the other values as well; if no queries enter your system, you'll see fewer queries executed
[07:44:41] <henrykim> yes, sure. currently I am testing the performance of mongodb. in a real service we'll need to maintain several more indexes.
[07:45:08] <philnate> I would like to help, but I'm missing some performance output
[07:45:57] <philnate> so how did you generate the load for your system?
[07:51:53] <philnate> basically, without looking at your stats: with random inserts you'll have two problems as soon as you hit the memory limit, and throughput will decrease to some extent.
[07:52:11] <philnate> For random inserts I'd guess this drop will be quite a bit bigger than for sequential ones
[07:53:16] <henrykim> philnate: here is my log http://pastie.org/4030245
[10:32:36] <ub|k> i was using a list of DBRefs in each document, but the fact that e.g. mongokit has problems with it makes me think that maybe i am doing something terribly wrong
[10:35:56] <NodeX> "speakers.speaker":"bob" ... where your talks collection looks something like this ... {speakers : [{speaker:"bob",date:"blah"},{speaker:"john",date:"blah"}...]}
[11:49:07] <ub|k> NodeX: still about my earlier question... the problem with your approach is that i will end up having several copies of the same speaker lying around, if (s)he's got more than 1 talk
[11:49:23] <ub|k> which then may be undesirable in case i want to update speaker information etc
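One common way around the duplication ub|k is worried about, sketched with made-up collection names: keep a single document per speaker and reference it by _id from each talk, so updating a speaker touches exactly one place.

```javascript
db.speakers.insert({ _id: "bob", name: "Bob", bio: "..." });
db.talks.insert({ title: "Intro to Mongo", speaker_ids: ["bob"] });

// all talks by bob (matches values inside the array):
db.talks.find({ speaker_ids: "bob" });

// update speaker info in exactly one place:
db.speakers.update({ _id: "bob" }, { $set: { bio: "updated" } });
```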
[12:20:45] <ledy> after playing with db.largecollection.find( { $or : [ { identifierX : "sha1_hash123" } , { identifierY : "md5_hash456" } , { identifierZ : "another_hash789" } ] } ) i checked the indexes that mongodb prepared automatically.
[12:21:53] <ledy> in addition to the "single" ensureIndex for the identifiers, there is an identifierX_1_identifierY_1_identifierZ_1 index now, and performance is down :-(
[12:22:40] <ledy> any ideas how to explain to mongodb that it could be enough to use the indexes for the single IDs because i do not have $and, but $or queries?
[12:41:09] <kali> ledy: afaik, you can force mongodb to use a given index, or force it to perform a table scan, but you cannot forbid the use of one given index
[12:41:58] <kali> ledy: have you considered performing three queries instead of one and merging the results in application land?
[12:52:45] <ledy> kali: strange, after removing the triple-index, it has not been created again. now it is using the single indexes.
[12:53:32] <ledy> "MongoDB can only use one index at a time" => this leads to the question:
[12:53:45] <ledy> Which query do you suggest I use with the find? { $or : [ { identifierX : "sha1_hash123" } , { identifierY : "md5_hash456" } , { identifierZ : "another_hash789" } ] } OR, better, 3 * db.find, one per single identifierX/Y/Z, merging the results on my own?
[12:55:17] <kali> ledy: i don't remember how $or are treated by the optimizer... but if you perform three simple query, at least, i'm sure you'll get the right index for each query :)
[13:10:06] <ledy> i'd prefer to let mongodb do the job with one query including all three identifiers in the $or... i hope $or is the "lazy operator" for this statement, so mongodb can stop on its own after the first match when i use findOne...
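Both options from this discussion, sketched with the field names above: explain() shows what the optimizer actually picks for the $or, and kali's three-query fallback guarantees each lookup hits its own single-field index.

```javascript
// Option 1: one $or query; inspect the plan it chooses.
db.largecollection.find({ $or : [
    { identifierX : "sha1_hash123" },
    { identifierY : "md5_hash456" },
    { identifierZ : "another_hash789" }
] }).explain();

// Option 2: three indexed single-field lookups, merged by the caller.
var a = db.largecollection.findOne({ identifierX : "sha1_hash123" });
var b = db.largecollection.findOne({ identifierY : "md5_hash456" });
var c = db.largecollection.findOne({ identifierZ : "another_hash789" });
```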
[13:40:21] <leecher> Hey guys, I'm trying to use find to locate all documents that do not have the field "deleted_at" .. can someone give me a tip on how to do that?
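leecher's question maps directly to the $exists operator; a minimal sketch (the collection name is hypothetical):

```javascript
// match documents where the field is absent:
db.items.find({ deleted_at: { $exists: false } });
```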
[18:27:36] <NodeX> "You can now "tag" replica set members to indicate their location. You can use these tags to design custom write rules across data centers, racks, specific servers, or any other architecture choice."
[18:28:36] <linsys> Yes I know how mongodb works...
[18:29:01] <linsys> That is for devs to code that they want to make sure a write goes to two racks or 3 specific servers etc..
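A sketch of the tagging feature being quoted, in the 2012-era replica set config syntax (the tag values and mode name are made up):

```javascript
var cfg = rs.conf();
cfg.members[0].tags = { dc: "east" };
cfg.members[1].tags = { dc: "west" };
cfg.settings = {
    getLastErrorModes: {
        // "multiDC" is satisfied once a write reaches 2 distinct dc values
        multiDC: { dc: 2 }
    }
};
rs.reconfig(cfg);
// drivers can then request w: "multiDC" when writing
```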
[18:29:17] <NodeX> Did I say that you didn't know how mongo works?
[18:29:45] <linsys> No, just stating that since u pasted a link. Anyway any other question
[18:30:11] <NodeX> it was not for you.. you obviously know how mongo works, so it can't be for you ;)
[18:35:11] <souza> guys, i have a non-technical question about what's more recommended when using mongodb: i can use three schemas and make relations among them, or create only one schema with arrays inside it. which is more recommended? Thanks
[18:36:53] <NodeX> souza : it depends on your data model
[18:37:53] <souza> NodeX: i don't understand, i think that's the same thing, just shown in different ways.
[18:38:15] <NodeX> if your data model prefers relational then use relational
[18:38:26] <NodeX> if you query one thing less than another then use embedded
[18:38:43] <linsys> souza: It depends very much on your data model... If the single json document will only grow to a certain size, you might want to embed. If you want to create a chat service and user Joe is going to have 100s or even 1000s of embedded conversations in a single json document, it's better to create a new collection called chat and reference Joe's conversations from that collection, with each conversation in its own document
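The two shapes linsys describes, sketched with made-up data: embedding is fine while the array stays small and bounded, while a separate collection handles unbounded growth.

```javascript
// embedded: conversations live inside the user document
db.users.insert({ _id: "joe", conversations: [{ with: "ann", msgs: 3 }] });

// referenced: one document per conversation, safe to grow forever
db.chats.insert({ user: "joe", with: "ann", msgs: ["hi", "hello"] });
db.chats.find({ user: "joe" });   // all of Joe's conversations
```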
[18:41:54] <souza> linsys: in my case i'm using this only for tests. i want to log in to a given system, and all my actions generate a log, so i'll have a "user" object and another "log" object
[18:42:40] <souza> i think for this case i can use only one schema, right?
[18:43:58] <linsys> Not sure I understand exactly.. it sounds like the actions could grow and grow. If this is a test you can make it as simple as you want, but if the actions are going to grow in an unlimited manner once you go to prod, you might want to break them out into their own collection
[18:49:45] <souza> linsys: Ok, thanks. now i can think about it and reach a conclusion.
[18:56:42] <spicyWith> is anyone here using cloudformation to deploy mongo on ubuntu and ec2?
[19:03:38] <linsys> spicyWith: I am actually using Fabric to do ec2 deployments; working on that right now
[19:04:46] <spicyWith> linsys: ah cool - used to use fabric as well. just discovered cloudformation which seems very powerful to describe infrastructures - having some trouble attaching EBS volumes to ubuntu though
[19:08:05] <nicholasdipiazza> why is mongodb so much better for documents with plenty of internal structure versus a small fixed size?
[19:09:28] <durre_> if I have two "entities" (Domain & Position) and I want to find all positions for a certain domain, should I "link" to Domain with its id, or should Position contain the domain name to let me do the query? what's "the mongo way"?
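One common answer to durre_'s question, sketched with hypothetical names: store the domain's _id on each position and index it, rather than duplicating the domain name.

```javascript
db.domains.insert({ _id: "example.com", owner: "acme" });
db.positions.insert({ title: "dev", domain_id: "example.com" });
db.positions.ensureIndex({ domain_id: 1 });

// all positions for a certain domain:
db.positions.find({ domain_id: "example.com" });
```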
[20:03:54] <fjay> anyone seen anything like this...
[20:27:50] <dstorrs> db.COLL.find(NNN).limit(MMM) means "create a cursor on collection COLL that is restricted to at most MMM rows and then will return 'undef'"
[20:28:04] <dstorrs> (or whatever your driver's equivalent of undef is)
[20:28:45] <dstorrs> I don't know why you keep trying to apply .length() to it. I can't even tell what you think it's going to do, which is why I'm having trouble helping.
[20:29:15] <dstorrs> step back and lay out for me exactly what you're trying to achieve. I'll help if I can.
[20:29:23] <ayakushev> ok, i'll get the results from it
[20:30:25] <dstorrs> ayakushev: I'll say it one more time, then I give up. Please step back and lay out for me exactly what you're trying to achieve. I'll help if I can.
[20:31:16] <ayakushev> i'm trying to get the first 900 results from a collection
[20:31:53] <dstorrs> ok. db.profile.find().limit(900) returns a cursor which does that.
[20:32:19] <dstorrs> you can then repeatedly call "doc = cursor.next()"
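Putting those two lines together as a runnable shell loop; counting as it goes is also a quick way to see how many documents the cursor really returns:

```javascript
var cursor = db.profile.find().limit(900);
var count = 0;
while (cursor.hasNext()) {
    var doc = cursor.next();   // process doc here
    count++;
}
print(count);   // how many documents actually came back
```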
[20:35:49] <dstorrs> you and fjay have mentioned the number '804' twice now. The first time was when he said you were doing this: db.profile.find().limit(900).length()
[20:36:01] <fjay> yeah.. and if i do this.. in ruby...
[20:43:19] <dstorrs> sorry guys, I got nothin'. All I can say is try to narrow it down to that one record and then throw it in JIRA so it can be fixed in the next release.
[20:43:29] <dstorrs> please do be sure to post the ticket, though.
[21:31:36] <nicholasdipiazza> If I have doc = {"inner":[{"innerId":1, "name":"nick"}, {"innerId":2, "name":"fred"}, ..., {"innerId":100, "name":"jeff"}]}, how can I update everyone's name to "Todd"?
[21:33:10] <skot> you cannot in a single operation
[21:33:50] <skot> see the docs here: http://www.mongodb.org/display/DOCS/Updating
[21:34:54] <nicholasdipiazza> ok. so I would have to write some sort of loop?
[21:36:12] <nicholasdipiazza> i am having trouble knowing what operations i can perform on the document variables themselves
[21:36:54] <dstorrs> nicholasdipiazza: a document is just a JSON struct.
[21:37:21] <nicholasdipiazza> d = {'myId':'value'}
[21:37:41] <nicholasdipiazza> that's a json struct where d.myId = value
[21:38:32] <dstorrs> nicholasdipiazza: as to the updating...you could write a loop, you could map/reduce, you could do it in app code, you could replace all of those records with a single one -- it all depends on what you're trying to achieve
[21:38:52] <nicholasdipiazza> let's say i have d = {"myid":"thisismyid", "values":["1", "2", "3"]}
[21:39:10] <nicholasdipiazza> how can i update that document (without using db.save or update) from the mongo console?
[21:39:56] <dstorrs> erm...why are db.save and db.update verboten? If you're trying to change data from the console, that's kinda how you do it
[21:40:35] <dstorrs> also, step back farther. what does this collection represent? why are you using embedded docs? why the update?
[21:40:37] <nicholasdipiazza> i'm used to treating a JSON object in Javascript... where if I loaded an object with an array of strings, i could loop through and update those strings with a for loop
[21:41:12] <nicholasdipiazza> This is all based on an interview question
[21:41:32] <dstorrs> you can do the exact same thing in mongo. The diff is that if you want the data to persist, you need to save it back to disk
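A sketch of the loop-and-save approach dstorrs describes, applied to the document from nicholasdipiazza's question (2012-era shell, which had no single operator for updating every array element; the collection name is made up):

```javascript
var doc = db.coll.findOne({ /* match the document to change */ });
doc.inner.forEach(function (item) { item.name = "Todd"; });
db.coll.save(doc);   // persist the modified document back to disk
```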
[21:42:58] <nicholasdipiazza> yeah and it left me scratching my head
[21:43:35] <dstorrs> nicholasdipiazza: I'll save you some time -- if someone else wants to help you with this, more power to both of you. But I'm not comfortable doing someone else's homework, so I'm going to bow out.
[21:43:46] <dstorrs> I hope you find the answer and get the job, though.
[21:44:46] <nicholasdipiazza> oh it's not like that
[21:44:51] <nicholasdipiazza> the interview is over i'm just wondering
[21:45:08] <dstorrs> suddenly new information appears. :>
[21:45:11] <nicholasdipiazza> so feel free to look at it if you want. http://pastebin.com/5nnqagPB
[21:46:04] <dstorrs> and you have to do this in the shell, or can you use app code?
[21:46:34] <nicholasdipiazza> had to write it on a whiteboard. i was like... nope, no idea lol
[21:47:35] <dstorrs> nicholasdipiazza: good for you for finding the answer afterwards. I'd start looking here: http://docs.mongodb.org/manual/reference/javascript/
[21:48:15] <dstorrs> you may also want to look at this: http://docs.mongodb.org/manual/reference/javascript/#cursor-methods
[21:48:38] <nicholasdipiazza> cool thanks. anyone else want a head scratcher give that guy a look
[21:49:21] <dstorrs> at each step ask yourself "if I were doing this in a JS script, how would I do it?" then back that up until you've turned the mongo environment into essentially the same script
[21:49:22] <nicholasdipiazza> why are these api docs not easier to get online? i was looking for something like this with no luck
[21:49:51] <nicholasdipiazza> oh it's because i wasn't looking in the javascript reference
[21:56:28] <dstorrs> nicholasdipiazza: if you're at all interested in Mongo, I STRONGLY suggest setting aside 3-4 hours and reading all of docs under http://docs.mongodb.org/manual/
[21:57:03] <modcure> let's say i fire up mongodb and run a query (get documents from a collection). mongodb would need to fetch this from virtual memory (on disk, but faster than non-virtual memory). would this cause a page fault, since it's the first time the data is being accessed?
[21:57:03] <nicholasdipiazza> I did. i feel like they get you part of the way with simple scenarios
[21:57:11] <dstorrs> ramsey: you know, I've never seen any of Derick kchodoro_ or tychoish actually active on this channel. have you?
[21:58:07] <dstorrs> or hey, I just realized: kchodorow must be Kristina Chodorow -- shiny! we are in the presence of greatness.
[21:59:00] <eph3meral> so I'm basically absolutely new to mongo querying, I've worked a bit with Mongoid and I did the basics of this like, last year... plenty familiar with SQL and JavaScript themselves... so anyway...
[22:00:13] <dstorrs> eph3meral: step back for a second and tell us what you're trying to achieve at a high level
[22:00:25] <dstorrs> (I really need to make a macro for that sentence)
[22:00:30] <eph3meral> dstorrs, uhh... I'm trying to add a field to this document
[22:00:35] <modcure> dstorrs, when mongodb loads, it maps the data files into virtual memory (on disk, but faster than a regular block on disk). then i run a query. mongodb would need to fetch the data from virtual memory into memory? would this be considered a page fault?
[22:00:37] <eph3meral> the document already exists
[22:00:55] <eph3meral> dstorrs, i know what you mean, but there is not much higher of a level than this
[22:01:04] <eph3meral> i want to add a field to an already existing document
[22:04:07] <kchodoro_> eph3meral: that looks correct, what's happening?
[22:04:34] <dstorrs> kchodoro_: I write in Perl and have been using your MongoDB driver. It is great -- thank you.
[22:04:50] <kchodoro_> you're welcome! glad it's working for you
[22:04:53] <modcure> dstorrs, since mongodb maps the files to virtual memory on disk, does that mean I need double the space for my data? one copy on disk and one on virtual disk?
[22:05:38] <eph3meral> kchodoro_, essentially "nothing" i get no response from the mongo shell at all, i hit enter, it goes back to the prompt, and when I do a subsequent db.dashboards.find() the data is not there
[22:05:46] <eph3meral> the data is not updated, nor is it added
[22:05:59] <dstorrs> modcure: dude, you're in a channel with kchodoro_ answering questions! don't ask me, ask her. :>
[22:06:28] <modcure> new to mongodb, I don't know who is who :)
[22:06:41] <dstorrs> (did you notice how nicely I avoided saying "I don't know?" :>)
[22:06:49] <kchodoro_> dstorrs: i appreciate the enthusiasm, but feel free to help out :)
[22:08:43] <modcure> kchodoro_, when mongodb loads, it maps the data files into virtual memory (on disk, but faster than a regular block on disk). then i run a query. mongodb would need to fetch the data from virtual memory into memory? would this be considered a page fault?
[22:11:05] <modcure> kchodoro_, since mongodb maps the files to virtual memory on disk, does that mean I need double the space for my data? one copy on disk and one on virtual disk?
[22:11:20] <dstorrs> kchodoro_: are all ObjectIDs with the same hash considered == ? And are they considered === ?
[22:13:02] <dstorrs> ObjectID("4fce7c96b29a7b141c000001") <=== what is the "4f..." properly called?
[22:14:19] <kchodoro_> dstorrs: i guess the value of the object id
[22:14:26] <kchodoro_> generally just the object id
[22:15:28] <kchodoro_> dstorrs: looks like they're not even ==
[22:15:39] <kchodoro_> which seems silly, since the db matches them
[22:15:51] <kchodoro_> that would probably be considered a bug in the shell
[22:16:42] <dstorrs> post a JIRA on that? do you want to, or should I?
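What kchodoro_ observed, reproduced in the shell: two ObjectIds built from the same hex string are distinct JS objects, so == and === compare identity rather than value; comparing the .str hex strings works.

```javascript
var a = ObjectId("4fce7c96b29a7b141c000001");
var b = ObjectId("4fce7c96b29a7b141c000001");
a == b;           // false: object identity, not value comparison
a === b;          // false
a.str === b.str;  // true: compare the underlying hex strings instead
```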
[22:20:47] <modcure> kchodoro_, since mongodb maps the files to virtual memory on disk, does that mean I need double the space for my data? one copy on disk and one on virtual disk?
[22:22:08] <kchodoro_> modcure: nope, just once... although journaling can mess with that, it can map everything a second time, i think
[22:22:57] <modcure> kchodoro_, i'm lost. the data is on disk... mongodb maps the data files into virtual memory (this is on disk too). wouldn't this be double the space?
[22:26:35] <kchodoro_> modcure: virtual memory isn't disk, it's an abstraction the OS uses
[22:27:18] <modcure> kchodoro_, i need to read up on virtual memory
[22:42:15] <timebomb> heya, say i have a company with many locations. should these be embedded via embedded documents? and will i then still be able to make use of the spatial indexing?
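A sketch for timebomb's question, assuming the multi-location 2d-index support MongoDB had at the time (all names made up):

```javascript
db.companies.insert({
    name: "acme",
    locations: [
        { city: "NYC",    pos: [ -73.99, 40.73 ] },
        { city: "Berlin", pos: [  13.40, 52.52 ] }
    ]
});
db.companies.ensureIndex({ "locations.pos": "2d" });
db.companies.find({ "locations.pos": { $near: [ -73.9, 40.7 ] } });
```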
[23:05:21] <dstorrs> kchodoro_: (back. had to go interview someone) Bug report is here: https://jira.mongodb.org/browse/SERVER-6010
[23:16:23] <dstorrs> kchodorow: fjay is doing something like this: db.profiles.find().limit(900) and getting only ~800 records back from a >2000 record sharded collection. Is there any reason you can think of for this, or is this a "file a bug" thing?
[23:18:04] <dstorrs> fjay: you might start queuing up that bug report on jira (see channel title).