#mongodb logs for Thursday the 20th of June, 2013

[00:01:57] <Astral303> brycelane, depends on your use case… one downside i can think of is that if you store everything in one collection and index on a common field, you will have a single large index. if you never wanted to use that index across multiple types of objects, then you will be unnecessarily working with a larger-than-necessary index. in contrast, if you had a collection per type of object, then you'd have an index only as large as needed for a
[00:01:57] <Astral303> given type, even for a common property.
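
For illustration, a minimal sketch of the two layouts being described; the collection and field names are hypothetical, and ensureIndex is the shell helper of that era:

    // single collection for every type: one index spans all documents
    db.items.ensureIndex({ accountId: 1 })
    // collection per type: each index only covers documents of that type
    db.posts.ensureIndex({ accountId: 1 })
    db.comments.ensureIndex({ accountId: 1 })
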
[00:03:46] <brycelane> I'm considering the case of an ORM, which I suspect typically follows the method you describe. I noticed that there is a relatively small limit on the number of indexes possible, which could get congested in a "one big collection" style.
[00:03:49] <brycelane> Thanks for your reply.
[00:07:20] <Astral303> brycelane, the biggest reason for putting things into one collection would be that you can run aggregations on that one collection
[00:07:38] <Astral303> so if you have similar objects that you'd like to aggregate across, it might make sense
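
A rough sketch of the kind of cross-type aggregation that becomes possible when similar objects share one collection (field names are hypothetical):

    // count documents and sum a shared field, grouped by type, in a single pass
    db.items.aggregate([
      { $group: { _id: "$type", count: { $sum: 1 }, totalSize: { $sum: "$size" } } }
    ])
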
[00:08:53] <brycelane> Oh, I see. That does throw a wrench in my thinking.
[04:19:28] <thesheff17> anyone compiled mongodb with scons and distcc? I have done this in the past but for some reason I can't get it working this time around
[14:15:24] <manjush> hi
[14:16:12] <manjush> I can't figure out what a multi-version concurrency protocol means.
[14:16:43] <manjush> can anyone help me out with this
[14:43:51] <justHannes> hi everybody, i am somewhat new to mongodb (using the doctrine ODM). i have a setup where i have a relation from posts to provider (1:M) and i want to select posts based on information stored in the provider (my setup looks like http://pastebin.com/0b8sgmDR)
[14:44:22] <justHannes> what i would like to do is something like ` db.Post.find({ "publicationStatus": "1", "provider.publicationStatus": "1" } ) ` on post
[14:45:18] <monbro> is it common to reference embedded documents?
[14:48:56] <astro73|roam> justHannes: I haven't found any cross-document JOINs on Mongo, period.
[14:49:43] <astro73|roam> The aggregation framework _might_ (big might) have something usable. Otherwise, you'd have to do it in your application
[14:50:07] <astro73|roam> monbro: I would argue that if you don't have embedded documents, you're not doing it right
[14:50:37] <justHannes> astro73|roam thanks for the info, i feared as much ... well, doing it at the application level would be a bummer for pagination ... i will probably handle it via provider update events
[14:52:29] <monbro> astro73|roam: yes, that makes sense :-) I am struggling because doctrine is not handling dbref to embedded documents correctly… so I was confused
[14:53:01] <astro73|roam> oh, i thought you meant subdocuments, not referenced documents
[15:38:05] <monbro> a document size of more than 1 MB does not speak for a good schema?!
[15:49:07] <astro73|roam> monbro: it depends on what's in it
[15:49:19] <monbro> many embedded documents
[15:49:21] <astro73|roam> if you just have that much data, then maybe it's fine
[15:49:25] <monbro> up to 1000
[15:57:18] <monbro> astro73|roam: but wouldn't that create too big a load in the end? I do not need all the embedded documents every time
[15:57:49] <astro73|roam> that's what projecting is for
[15:58:50] <astro73|roam> find() has another argument for selecting which bits you want returned
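
A minimal example of that second argument, the projection; the field names here are hypothetical:

    // inclusion projection: return only title and createdAt (plus _id)
    db.posts.find({ authorId: 42 }, { title: 1, createdAt: 1 })
    // or exclude just the large embedded array instead
    db.posts.find({ authorId: 42 }, { comments: 0 })
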
[15:59:44] <monbro> astro73|roam: ah right, I forgot about that. so it does not matter then that the document inside mongodb is big, e.g. up to 5 MB?
[16:00:26] <astro73|roam> well, it'll still take time to process, but i don't know mongo well enough to say if it'll be significant
[16:00:41] <astro73|roam> and it depends on your application
[16:01:26] <monbro> astro73|roam: okay but thank you anyway, was a good help!
[16:01:27] <astro73|roam> at multi-MB documents, you probably need some combination of embedded and referenced
[16:02:04] <monbro> yeah I got some references already, so I try to find the best balance actually
[16:03:26] <starfly> monbro: up to 16MB/document
[16:04:07] <monbro> starfly: right, but you can change the size as well, up to 2GB. I am quite unsure what document size is good for mongodb itself to process "properly"
[16:04:17] <astro73|roam> starfly: that's the hard upper limit, not necessarily a practical one
[16:04:29] <starfly> astro73|roam: of course
[16:04:56] <monbro> fyi http://docs.mongodb.org/manual/reference/configuration-options/#nssize
[16:14:24] <starfly> monbro: there are many factors to consider, how many documents in your larger collections, how much physical memory will you have to accommodate large working sets, what kind of disk I/O throughput you'll have (MB/second), etc. Although you can technically modify nssize, I think most people would agree that would be ill-advised
[16:17:09] <monbro> starfly: thank you very much, yeah I just read something on stackoverflow saying that the default maximum of 16MB per document is not bad at all and a good orientation, I guess
[16:18:02] <monbro> starfly: so in other words, with better I/O throughput and more physical memory, bigger documents would be processed faster?
[16:20:01] <starfly> monbro: sounds good, I think the best overall advice is to think carefully about what you'll need to store, model some options (embedding vs. linking), consider your data access patterns, etc. The truth is whatever you design will very likely be changed in the future
[16:20:24] <astro73|roam> monbro: if only computing were that simple
[16:21:53] <starfly> monbro: bigger isn't necessarily better, you have to weigh the traditional (from the SQL world) issue of normalization vs. denormalization when considering embedding a lot vs. linking collections in code
[16:24:16] <starfly> monbro: model and design as much as possible up front to save refactoring later, but again, no matter which way you go, you'll likely evolve it into something else later.
[16:24:36] <monbro> starfly: mh thank you a lot, that makes sense and will keep me / us thinking and concepting some more time :-)
[16:25:06] <starfly> monbro: good deal and I agree with astro73|roam, wish it was simple!
[16:25:25] <monbro> :-=
[16:25:27] <monbro> :-)
[16:42:44] <starfly> monbro: one last thing, try to use small key names to minimize doc footprint, those are included and can quickly add up; changing them later is a pain
[16:44:05] <monbro> starfly: ah you mean "name" instead of "customerName" ?
[16:44:28] <monbro> starfly: or better "n" than "name"
[16:48:09] <codenapper> Hi, could someone have a look at the explain() of my slow running query (100ms as opposed to ~10ms for similar queries)? Indexes are set and are being used, but it's still so much slower that I wonder what I'm doing wrong.. http://pastebin.com/MRQujMK1
[16:51:23] <starfly> monbro: I mean use name vs. customerName; you want the key names to be meaningful, but not over the top given the impact on document footprint
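
For illustration, the same document with verbose and with shortened keys (the keys are made up); because key strings are stored in every document, the savings scale with document count:

    // verbose keys, repeated in every single document
    { customerName: "Ada", customerEmail: "ada@example.com" }
    // shorter but still meaningful keys
    { name: "Ada", email: "ada@example.com" }
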
[16:52:06] <monbro> starfly: hehe alright, yeah I agree
[16:57:20] <cTIDE> how do you address a replica set in which the secondary is consistently falling out of sync with the master?
[16:57:50] <cTIDE> we basically end up rebuilding our secondary from scratch every 2 weeks because it falls off
[16:58:45] <starfly> cTIDE: getting behind because of inadequately-performing hardware, write-storms, network saturation--any ideas?
[16:59:26] <cTIDE> presumably due to write-storms
[16:59:31] <cTIDE> our primary is very write heavy
[16:59:52] <cTIDE> it acts as a queue for one of our applications
[16:59:55] <cTIDE> so, lots of writes + deletes
[17:00:30] <starfly> cTIDE: sounds like your secondary is probably not hardware-sized like what's in use for the primary? You may need to throw hardware at the problem
[17:01:02] <cTIDE> ok, yeah, our master is an m2.xlarge and the slave is an m1.large
[17:01:28] <starfly> cTIDE: then it sounds like you're getting a predictable outcome… :)
[17:01:48] <cTIDE> well, our slave is really only in use as backup
[17:01:55] <cTIDE> so we figured it'd be safe enough to have it be smaller hardware
[17:02:00] <cTIDE> but i guess that's not the case :)
[17:02:30] <starfly> cTIDE: well, it's fine as long as you're up for the cost of the time to rebuild the secondary every 2 weeks :)
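
Two shell helpers from that era that are useful for watching this kind of lag, assuming shell access to the replica set members:

    // on the primary: oplog size and the time window it currently covers
    db.printReplicationInfo()
    // on any member: how far each secondary is behind the primary
    rs.printSlaveReplicationInfo()
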
[17:03:03] <cTIDE> ok, since we're going to rebuild the box
[17:03:08] <cTIDE> can you have a secondary of a different version?
[17:03:13] <cTIDE> our primary is 2.2.2 i believe
[17:03:27] <cTIDE> and if we're going to build a bigger box, might as well go 2.4 if that works
[17:03:42] <starfly> cTIDE: within limits, some oplog versions are compatible, some aren't
[17:08:47] <starfly> cTIDE: should be fine to pursue that strategy: http://docs.mongodb.org/manual/release-notes/2.4-upgrade/
[17:17:51] <cTIDE> cool, looks pretty straightforward
[17:17:55] <cTIDE> thanks starfly
[17:39:29] <starfly> cTIDE: NP, good luck
[18:37:13] <Kiba> hi
[18:37:24] <Kiba> hello
[18:37:39] <Kiba> Thu Jun 20 14:36:03.482 JavaScript execution failed: listDatabases failed:{ "errmsg" : "need to login", "ok" : 0 } at src/mongo/shell/mongo.js:L46
[18:37:41] <Kiba> what does this mean?
[18:39:58] <starfly> Kiba: looks like an auth (authentication) issue
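
A minimal sketch of what authenticating in the shell looks like; the user, password, and database here are hypothetical, and listing all databases generally requires a user defined on the admin database:

    use admin
    db.auth("myUser", "myPassword")
    show dbs
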
[18:40:20] <Kiba> I logged in with my username and password. Why could it be an auth issue?
[18:41:01] <starfly> Kiba: can you perform other operations, or were you just trying to run "show dbs"?
[18:42:02] <Kiba> show dbs yup
[18:42:06] <Kiba> not sure about other operations
[18:42:42] <starfly> Kiba: then presumably your username and password aren't correct
[18:43:23] <Kiba> starfly: huh? my password and username let me connect
[18:44:19] <starfly> Kiba: so you just have issues with "show dbs" -- can you run find() on collections, etc.?
[18:44:54] <Kiba> I don't know what collections I could run it on
[18:45:45] <starfly> Kiba: think this is one of those "too hard to help without hands-on" issues, suggest you see your MongoDB DBA or admin
[18:46:44] <Kiba> oh well. I won't lose much of anything if I delete and restart again
[18:55:53] <tjmehta> Hi, anyone familiar with the aggregation API?
[19:08:10] <spicewiesel> hi all
[19:32:44] <kali> hi spicewiesel
[19:33:02] <spicewiesel> hi :)
[19:33:32] <spicewiesel> I could use some help sizing my mongodb instances, but I could not find any calculation examples, could you help me with that?
[19:37:13] <kali> yeah maybe, i don't know... ask the real question :)
[19:38:20] <spicewiesel> Okay :)
[19:39:18] <spicewiesel> first the simple questions: Do I have to keep the index in RAM and do I have to keep all data in RAM?
[19:39:51] <spicewiesel> as far as I know, I have to keep the index in RAM, but I do not have to keep all the data (with performance limitations then, of course)
[19:39:52] <spicewiesel> am I right?
[19:39:55] <kali> spicewiesel: index performance is absolutely terrible when not in RAM
[19:41:18] <kali> for the rest, it depends on the request rate and the expected performance of course, but DBs with index in RAM and documents on disk are suitable for many apps
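
A quick way to check how big the indexes actually are, assuming shell access (collection name hypothetical):

    // total size of all indexes on one collection, in bytes
    db.mycollection.totalIndexSize()
    // per-database summary including dataSize and indexSize
    db.stats()
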
[19:45:45] <spicewiesel> okay, fine. That's what I learned while reading over the last few days.
[19:45:50] <spicewiesel> thank you. :)
[19:46:00] <kali> you're welcome
[19:46:02] <kali> that was easy
[20:15:53] <jcalvinowens> Hello all. Is it possible to conditionally project a variable in an aggregation query? As in, {"$project": {"var": {"$cond": [CONDITION, True, False]}}}?
[20:16:15] <jcalvinowens> That always yields a literal "True", instead of including the variable
[20:16:41] <jcalvinowens> (This is in PyMongo)
[20:17:23] <jcalvinowens> Just using "$eq" without "$cond" does the same
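
That result is expected: the branches of $cond are expressions, so a bare True projects a literal. To project the field's own value, reference it with a "$" path. A sketch in shell syntax with hypothetical field names (the same pipeline structure works from PyMongo):

    db.coll.aggregate([
      { $project: {
          var: { $cond: [ { $eq: [ "$status", "A" ] }, "$var", "$fallbackField" ] }
      } }
    ])
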
[20:43:25] <tjmehta> Hi, anyone familiar with the aggregation API?
[21:00:38] <kali> tjmehta: yeah. ask your *real* question
[21:01:45] <tjmehta> kali: okay, writing up some example code
[21:09:09] <tjmehta> so here are some example docs: http://codeshare.io/mX5Fz
[21:09:36] <tjmehta> the question is, how would i set up an aggregation query for projects owned by "a0" sorted by votes
[21:10:40] <tjmehta> kali , ^^
[21:12:48] <kali> you can't
[21:12:55] <kali> this is a join
[21:13:00] <kali> mongodb does not do join
[21:35:32] <tjmehta> correct
[21:35:34] <tjmehta> kali
[21:35:41] <tjmehta> how would i do it with two queries
[21:35:54] <tjmehta> ill add an example of what i am thinking to that same url
[21:35:56] <tjmehta> hold on
[21:43:34] <tjmehta> kali , updated : http://codeshare.io/mX5Fz
[21:43:43] <tjmehta> added help comments in some js code
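
The codeshare paste isn't reproduced here, so the collection and field names below are assumptions; the usual two-step replacement for a join is to fetch the owner's project ids first, then aggregate within that set:

    // 1) ids of projects owned by user "a0"
    var ids = db.projects.find({ owner: "a0" }, { _id: 1 }).map(function (p) { return p._id; });
    // 2) tally votes for just those projects, most-voted first
    db.votes.aggregate([
      { $match: { projectId: { $in: ids } } },
      { $group: { _id: "$projectId", votes: { $sum: 1 } } },
      { $sort: { votes: -1 } }
    ])
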
[23:35:45] <jaCen915> I know it's possible to find documents based off an array of values but is it possible to do the same with an update? for example db.products.update({ _id: [1,2,3,4,5]}, {$set: {blah:blah}})
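
For reference, the usual way to target several _id values in one update is $in in the query plus the multi flag so every match is updated; a sketch with placeholder values:

    db.products.update(
      { _id: { $in: [1, 2, 3, 4, 5] } },
      { $set: { blah: "blah" } },
      { multi: true }
    )
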
[23:49:46] <jude0_> does anyone know if you can query getCollectionNames from mongo's rest api?
[23:58:48] <bjori> does the rest api support commands?