PMXBOT Log file Viewer


#mongodb logs for Friday the 1st of June, 2012

[18:29:35] <jaraco> I've added pmxbot to the channel. Logs at http://chat-logs.dcpython.org . pmxbot is also set up to relay twitter messages to the channel. If that turns out to be undesirable or too noisy, don't hesitate to ping me and I'll turn it off.
[18:39:57] <wereHamster> jaraco: then you need to mention the bot in the topic, otherwise it's against the freenode TOS
[18:40:08] <wereHamster> or when people join the channel
[18:40:26] <jaraco> The bot will send private messages to people when they join the channel.
[18:41:26] <wereHamster> ask the channel ops...
[18:41:28] <rossdm> sounds potentially annoying
[18:41:29] <jaraco> It may be that it notifies when one connects to the server. No, it's got to be the channel.
[18:43:13] <wereHamster> yes. annoying.
[18:45:37] <tystr> indeed ;)
[18:46:20] <jaraco> That's the first news - subsequent entries will only happen when there are new tweets.
[18:46:28] <jaraco> Still want it gone?
[19:05:07] <GRiD> question for core devs: are the error codes in core server/tools assigned in any particular way? i see they are tracked in docs/errors.md
[19:13:27] <kchodorow> jaraco: yes, want it gone
[19:13:32] <infinitiguy> anyone use mongo in AWS?
[19:13:41] <infinitiguy> and if so - do you use ec2-consistent-snapshot
[19:13:43] <kchodorow> GRiD: just ascending order
[19:15:42] <GRiD> kchodorow, ok so if i were to add a new one, pick the next highest? fyi i'm looking at submitting a patch for SERVER-5912
[19:18:07] <jaraco> RSS feed disabled.
[19:18:14] <kchodorow> GRiD: if you set it to 0, scons will automatically insert the next # on compile
[19:18:30] <GRiD> kchodorow, oh, cool.
[19:18:38] <kchodorow> jaraco: thank you!
[19:19:07] <jaraco> kchodorow: welcome
[19:47:20] <infinitiguy> any mac users have a particular mongo viewer they like?
[19:47:43] <cfitzhugh> mongohub is what I've used
[19:48:12] <infinitiguy> cool - I'll give it a go
[20:27:45] <infinitiguy> if I require auth on my mongoDB but I'm creating a new database - how do I create the initial user if auth is required?
[20:29:16] <infinitiguy> hrm weird - looks like it just worked for me - maybe i was typing wrong
[20:59:17] <westoque> So I have a field wherein I store immutable JSON data, I was thinking of storing it as flat .json files, it would make it faster for fetching in terms of the http server in contrast to mongodb.. Would you guys still recommend mongodb for my use case? What are the advantages if I do? Thanks
[21:00:16] <westoque> So I have a field wherein I store immutable JSON data, I was thinking of storing it as flat .json files, it would make it faster for fetching in terms of the http server in contrast to mongodb.. Would you guys still recommend mongodb for my use case? What are the advantages if I do? Thanks
[21:08:47] <heph_> hello
[21:09:38] <heph_> is there anyone here that can answer a question of mine
[21:11:49] <kchodoro_> heph_: maybe, what's your question?
[21:18:27] <SkramX> i have a map reduce that goes through all objects and extracts the event's location and counts all of them and puts the _id and count (as value) into a new collection. Would I use a finalizer to drop all records where value is less than 5?
[21:59:29] <SkramX> anyone around?
[22:00:43] <zirpubolci> .
[22:01:00] <SkramX> i have a map reduce that goes through all objects and extracts the event's location and counts all of them and puts the _id and count (as value) into a new collection. Would I use a finalizer to drop all records where value is less than 5?
[22:02:24] <zirpubolci> i haven't used mongodb MR yet. but i think that might be right. or just make the reduce use a query for > 5.
[22:03:20] <SkramX> a reduce doesn't take a query
[22:03:53] <SkramX> and if all the emits are done on one server.. i think the emits are what take so long so modifying the reduce doesn't really help
[22:04:11] <SkramX> im running on an m2.2xlarge ec2 server with 70gb of data..
[22:04:33] <SkramX> i wish funds were unlimited so i could just keep upgrading the server
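On SkramX's finalize question: MongoDB calls a finalize function once per key and stores whatever value it returns, so it can reshape a count but cannot drop the key from the output. The threshold therefore has to be applied to the output collection afterwards, e.g. in the shell `db.location_counts.find({ value: { $gte: 5 } })` or `db.location_counts.remove({ value: { $lt: 5 } })`. The sketch below imitates those semantics in plain JavaScript so it runs without a server; the `mapReduce` helper, the `location` field and the `location_counts` collection name are hypothetical stand-ins for SkramX's setup.

```javascript
// Hypothetical in-memory imitation of map-reduce semantics (not the server's
// implementation). In the real shell, emit is a global and map takes no
// arguments; it is passed in here only so the sketch runs in Node.
function mapReduce(docs, map, reduce, finalize) {
  const grouped = new Map();
  const emit = (key, value) => {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the current document

  const out = [];
  for (const [key, values] of grouped) {
    // Like the server, reduce is skipped when a key has a single emitted value.
    let value = values.length === 1 ? values[0] : reduce(key, values);
    if (finalize) value = finalize(key, value);
    // Every key yields an output document; finalize can only change its value.
    out.push({ _id: key, value });
  }
  return out;
}

// Hypothetical event documents with a `location` field.
const events = [
  ...Array.from({ length: 6 }, () => ({ location: "DC" })),
  ...Array.from({ length: 2 }, () => ({ location: "NYC" })),
  ...Array.from({ length: 5 }, () => ({ location: "Boston" })),
];

const mapFn = function (emit) { emit(this.location, 1); };
const reduceFn = (key, values) => values.reduce((a, b) => a + b, 0);

// A finalize that "tries" to drop small counts only nulls their value;
// all three keys are still in the output.
const attempted = mapReduce(events, mapFn, reduceFn, (key, value) => (value < 5 ? null : value));
console.log(attempted.length); // 3

// Filtering the output afterwards is the in-memory analogue of
// db.location_counts.find({ value: { $gte: 5 } }).
const counts = mapReduce(events, mapFn, reduceFn);
const frequent = counts.filter((doc) => doc.value >= 5);
console.log(frequent);
```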
[23:05:14] <tystr> how would I query and return only a single document in an array of embedded documents?
[23:05:52] <dstorrs> what are you trying to achieve?
[23:06:18] <tystr> I want to retrieve an embedded document
[23:06:55] <dstorrs> do you want a random document, just to verify something is there? do you want to match a set of criteria? if the latter, what? etc.
[23:07:43] <tystr> oh, I'm basically doing this:
[23:07:44] <tystr> db.collection.find({ "embedded_docs._id": ObjectId("4fc84597ab3c453633000002") });
[23:07:56] <tystr> to find the embedded doc by its unique id
[23:08:18] <tystr> but I'm getting all the docs in embedded_docs
[23:08:36] <tystr> rather, db.collection.find({ "embedded_docs._id": ObjectId("4fc84597ab3c453633000002") }, {embedded_docs:1});
[23:09:36] <dstorrs> have you read these? http://www.mongodb.org/display/DOCS/Dot+Notation+(Reaching+into+Objects) and http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-ValueinanEmbeddedObject
[23:10:00] <tystr> yeah, but I am a noob w/ mongodb :)
[23:10:16] <dstorrs> hold on, let me verify something.
[23:10:33] <tystr> and we're porting our app from mysql to mongodb, so I'm still trying to wrap my brain around the whole schemaless design thinking
[23:11:36] <tystr> I've embedded these docs b/c usually when showing content to the user I want to pull everything
[23:12:12] <tystr> but when it comes to adding/editing single embedded documents (in the backend, for example) I'm not really sure how to do it, or if I'm going about it the proper way
[23:13:48] <dstorrs> this is where my "what are you trying to achieve" question comes into play. What are these embedded docs and why embed them?
[23:14:07] <dstorrs> The more I talk to people on channel, the more it seems that embedding is often a bad design choice
[23:14:16] <tystr> hmm
[23:14:16] <dstorrs> But, here's a quick test I just ran:
[23:14:28] <dstorrs> > db.video_raw_stats.save({ vt : "not_real", foo : { bar : 1 } })
[23:14:28] <dstorrs> > db.video_raw_stats.save({ vt : "not_real2", foo : { bar : 3 } })
[23:14:42] <dstorrs> > db.video_raw_stats.find({ 'foo.bar' : 1 })
[23:14:42] <dstorrs> { "_id" : ObjectId("4fc94bc2586d3356a8e56e4c"), "vt" : "not_real", "foo" : { "bar" : 1 } }
[23:15:08] <tystr> hmm
[23:15:13] <tystr> that's what I'd expect
[23:15:18] <dstorrs> does that answer your immediate question about how to query one doc?
[23:15:55] <dstorrs> Also: { "_id" : ObjectId("4fc94bc2586d3356a8e56e4c") }
[23:15:56] <dstorrs> > db.video_raw_stats.find({ 'foo.bar' : 1 }, { bar : 1})
[23:16:14] <dstorrs> oops. reverse the order on those two lines.
[23:16:43] <tystr> I'd expect
[23:16:43] <tystr> db.collection.find({ "embedded_docs._id": ObjectId("4fc84597ab3c453633000002") });
[23:16:56] <tystr> to only return the item inside embedded_docs with that id
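Why tystr sees the whole array: a find() filter selects top-level documents, so matching on `embedded_docs._id` returns the entire parent, and a `{ embedded_docs: 1 }` projection keeps the entire array. From MongoDB 2.2 the server can trim the array itself with the positional projection, e.g. `db.collection.find({ "embedded_docs._id": id }, { "embedded_docs.$": 1 })`, or with an `$elemMatch` projection; on older servers the usual fallback is to pick the element in application code. The sketch below shows that client-side fallback; `findEmbedded` is a hypothetical helper, and ObjectIds are modelled as hex strings so it runs without a driver.

```javascript
// Hypothetical helper: return only the embedded document whose _id matches,
// mirroring what the { "embedded_docs.$": 1 } projection does server-side.
// ObjectIds are plain hex strings here; with a real driver, compare them with
// the ObjectId's equals() method instead of ===.
function findEmbedded(doc, arrayField, id) {
  const items = doc[arrayField] || [];
  return items.find((item) => item._id === id) || null;
}

// Shaped like tystr's data: a parent document with an array of embedded docs.
const parent = {
  _id: "4fc84597ab3c453633000001",
  embedded_docs: [
    { _id: "4fc84597ab3c453633000002", title: "first" },
    { _id: "4fc84597ab3c453633000003", title: "second" },
  ],
};

console.log(findEmbedded(parent, "embedded_docs", "4fc84597ab3c453633000002"));
// { _id: '4fc84597ab3c453633000002', title: 'first' }
```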
[23:17:56] <tystr> hmm
[23:18:52] <dstorrs> again I ask, "why are you embedding docs? what do they represent?"
[23:19:19] <dstorrs> the danger with embedded docs is that they are harder to update, and that unless they have a clearly defined limit, you will eventually blow the 16MB doc size limit
[23:19:47] <dstorrs> (e.g., if you are embedding all the comments / votes / emails a user has made)
[23:20:48] <dstorrs> embedded docs are good for things like "here are all the line items that are part of this invoice"
[23:20:49] <tystr> this is basically a cms type of thing, so I know the document size will not be an issue
[23:21:20] <dstorrs> "cms type of thing" means what?
[23:21:31] <dstorrs> what are the primary docs and what are the embedded docs?
[23:31:35] <tystr> hmm
[23:34:09] <tystr> querying for an embedded document by its id doesn't seem to work
[23:34:29] <tystr> but querying for a simple key in array does
[23:37:54] <tystr> oh hmm
[23:41:18] <tystr> hmm using the object notation doesn't return anything at all
[23:42:49] <SkramX> tystr
[23:42:54] <SkramX> paste the document
[23:42:58] <SkramX> and then say what you want to search for
[23:44:57] <tystr> https://gist.github.com/f3936ce45841f5f2d2d1
[23:45:32] <tystr> how would you find a specific document from the creatives array?
[23:47:19] <callen> hokay
[23:47:24] <callen> I have a list of embedded documents
[23:47:34] <Bilge> I want to be the Mongod
[23:47:35] <callen> they do not currently have a means of being uniquely identified.
[23:47:54] <callen> should I just add an objectid field to them? if so, how do I enforce uniqueness across the list of embedded documents?
[23:48:15] <dstorrs> callen: paste a sample, please
[23:48:28] <callen> you really don't want me to do that.
[23:48:35] <callen> it's a MongoEngine managed collection.
[23:48:37] <tystr> heh
[23:48:39] <callen> but if you insist.
[23:48:51] <dstorrs> ok, if not paste, then clarify what your objects represent
[23:48:59] <dstorrs> what are the primary documents and what are the embedded ones?
[23:49:31] <dstorrs> also, is your collection sharded?
[23:49:37] <callen> { "top_level_doc": blahblah, "list_of_things": [{"innerdoc":data, "_id_field_here?":id}, ...]}
[23:49:53] <dstorrs> what are the primary documents and what are the embedded ones?
[23:50:00] <callen> the unique ids for the embedded documents only have to be unique to that top level document
[23:50:04] <callen> not across the collection.
[23:50:05] <callen> thankfully
[23:50:11] <callen> so sharding is irrelevant.
[23:50:18] <callen> I'm trying to identify a "best practice" here.
[23:50:20] <dstorrs> ok, that helps.
[23:50:27] <callen> I'm not stupid enough to attempt that.
[23:50:35] <dstorrs> what is the data model you're building here?
[23:50:41] <callen> built*
[23:50:58] <callen> it's in production, but I realized I couldn't keep keying across multiple fields of data in the list of embedded documents to identify them
[23:51:06] <callen> so I want to add an objectid or uuid field or something
[23:51:15] <callen> it's a user at the top level
[23:51:19] <callen> then a list of "days"
[23:51:24] <callen> within each day, there's a list of meals
[23:51:30] <callen> I need to uniquely identify the meals.
[23:51:37] <callen> I was trying to hide you from the horror, but you insisted.
[23:51:42] <dstorrs> :>
[23:51:45] <callen> the days are uniquely identified on the basis of date
[23:51:50] <callen> which is not an actual date but a string
[23:51:52] <callen> (Don't ask)
[23:52:03] <callen> but the meals are not right now
[23:52:17] <callen> it used to be that they could be one of three types of meals, but that's no longer the case, arbitrary meals are expected now
[23:52:18] <dstorrs> I would probably use epoch because it makes range searches easy and fast. but that's me.
[23:52:20] <callen> so I need unique identifiers.
[23:52:27] <callen> dstorrs: normally I would too, but god hates me
[23:52:30] <callen> so moving right along
[23:52:33] <dstorrs> heh
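dstorrs's epoch suggestion, sketched under the assumption that the day strings look like `02/23/2012` (the format in callen's sample) and are meant as UTC dates. Numbers sort chronologically, so a range query such as `{ "days.day": { $gte: start, $lt: end } }` behaves, whereas `MM/DD/YYYY` strings compare month first and go wrong across years. `dayToEpoch` and the `days.day` field name are hypothetical.

```javascript
// Hypothetical helper: turn a "MM/DD/YYYY" day string into epoch milliseconds
// at UTC midnight, so days can be compared and range-queried numerically.
// Note: Date.UTC rolls impossible dates over (02/30 becomes March 1st).
function dayToEpoch(day) {
  const match = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(day);
  if (!match) throw new Error(`unexpected day format: ${day}`);
  const [, mm, dd, yyyy] = match;
  return Date.UTC(Number(yyyy), Number(mm) - 1, Number(dd));
}

// String comparison puts 12/31/2011 after 02/23/2012 (month compared first)...
console.log("12/31/2011" > "02/23/2012"); // true
// ...while the epoch values keep chronological order.
console.log(dayToEpoch("12/31/2011") < dayToEpoch("02/23/2012")); // true
```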
[23:52:47] <callen> what's the "usual" way to uniquely identify an embedded document and enforce uniqueness within the root document?
[23:54:46] <dstorrs> db.coll.save({ user : "bob", meals : { "2012-jan-18" : { ... }, "2012-jan-19" : { ... } } })
[23:54:56] <callen> no
[23:55:01] <callen> nice try though
[23:55:24] <dstorrs> why not?
[23:55:46] <dstorrs> what is your definition of "uniqueness" then?
[23:55:49] <callen> {user: "bob", days: [{02/23/2012, meals:[{breakfast, unique_id?..},]},]}
[23:56:08] <callen> list of meals within the days which are a list within the user doc
[23:56:19] <dstorrs> and you have no control over the data model...?
[23:56:25] <callen> the meals don't have something like a date to uniquely identify them, because they're already---yes of course I do.
[23:56:29] <callen> awaek;hnsrlkthlkmsrht
[23:56:29] <callen> okay
[23:56:30] <callen> stop it.
[23:56:34] <callen> stop talking about the data model.
[23:56:36] <callen> it's simpler than that
[23:56:51] <callen> How do you uniquely identify, using something unrelated to the data itself, an embedded document?
[23:56:55] <callen> am I using an objectid or not?
[23:57:39] <dstorrs> first, lose the 'tude. I'm trying to help you, and to do that I need to understand the problem. Often questions like this relate to "the problem goes away if you change the data model", which is why I'm asking.
[23:57:44] <callen> I'm sorry
[23:57:58] <callen> and I can't change the data model because we can no longer assume the user has a unique type of meal
[23:58:07] <callen> they could have two breakfasts
[23:58:09] <callen> or elevenses
[23:58:11] <callen> or whatever.
[23:58:33] <callen> if this was MySQL, I would've known the answer to this ages ago..."just use a guid or pk"
[23:58:40] <callen> so is the answer to use an objectid or a guid?
[23:59:30] <callen> arbitrary list of embedded meal documents, need to uniquely identify them, nothing innate to the data model to do so...leaving me...needing to add something to uniquely identify them
[23:59:35] <callen> the obvious choice is objectid
[23:59:46] <callen> or generating my own guids and hoping for no collisions / checking for collisions.
[23:59:52] <dstorrs> neither one will help you unless there is a 'unique' index on that field.
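A minimal sketch of giving each embedded meal its own identifier, assuming the user/days/meals layout callen describes: generate a random UUID per meal (`crypto.randomUUID()`, available in Node 14.17+; in the mongo shell `new ObjectId()` serves the same purpose) and check for a clash inside the parent document before saving. One caveat on the unique-index point: a MongoDB unique index constrains values across separate documents, not between elements of one document's array, so two meals inside the same user document could still share an id; that within-document check has to live in application code. `addMeal` and the `day`/`name` fields are hypothetical.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical helper: append a meal to one day of a user document, giving it
// a fresh id and refusing an id that already exists anywhere in that document.
// (A unique index would not catch this: it compares values across documents.)
function addMeal(user, day, meal) {
  const target = user.days.find((d) => d.day === day);
  if (!target) throw new Error(`no such day: ${day}`);

  // Every meal id already used in this user document, across all days.
  const existing = new Set(user.days.flatMap((d) => d.meals.map((m) => m._id)));
  const withId = { ...meal, _id: meal._id ?? randomUUID() };
  if (existing.has(withId._id)) {
    throw new Error(`duplicate meal id in this user document: ${withId._id}`);
  }
  target.meals.push(withId);
  return withId._id;
}

const user = {
  user: "bob",
  days: [{ day: "02/23/2012", meals: [] }],
};

addMeal(user, "02/23/2012", { name: "breakfast" });
addMeal(user, "02/23/2012", { name: "breakfast" }); // two breakfasts are fine
addMeal(user, "02/23/2012", { name: "elevenses" });
console.log(user.days[0].meals.length); // 3
```

Random UUIDs make collisions vanishingly unlikely, so the explicit check mainly guards against ids supplied by the caller, for example when a client re-submits an existing meal.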