PMXBOT Log file Viewer


#mongodb logs for Tuesday the 7th of August, 2012

[01:13:10] <mappum> I have documents that are able to be "liked" by users, and I want to get a few fields from them. How can I check if a certain user has liked it without making a separate query or serving the whole "like" array?
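A sketch of one way to do what mappum is asking about, assuming a hypothetical `items` collection whose documents carry a `likes` array of {uid, name} subdocuments: put the user match in the query itself and project only the fields needed, so the presence or absence of a result answers "has this user liked it?" without returning the array.

    // hypothetical collection and field names
    var doc = db.items.findOne(
        { _id: itemId, "likes.uid": userId },   // matches only if this user has liked it
        { title: 1, likeCount: 1 }              // return just the fields we need
    );
    // doc === null  ->  this user has not liked the item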
[05:33:01] <macarthy> hi there, newbie question
[05:34:29] <macarthy> I have a report type model which has 1-2 million items containing a bunch of JSON data, and I'll have 100k or so reports
[05:34:55] <macarthy> I need to query within a report and also across reports
[05:35:30] <macarthy> Each report will be way over the data limit for a document
[05:35:51] <macarthy> what is the best way to store the report items ? linked ?
[05:36:19] <ron> for me, that's a bit too vague.
[05:36:38] <macarthy> OK, what else you need to know ?
[05:37:27] <ron> more about the data structure, query types.
[05:39:31] <macarthy> Each item will have an array of hashes, some deeply nested; generally queries will access the array and the first level of the hashes
[05:40:23] <ron> macarthy: okay, I'm going to milk it out of you :)
[05:40:31] <macarthy> milk away
[05:40:35] <ron> why will an item have an array of hashes?
[05:40:55] <ron> do you ever need to return a single item as a whole?
[05:41:01] <macarthy> because there are an undefined set of sources
[05:41:27] <ron> but by defining them in a single item, you're forcing a source on them :)
[05:41:28] <macarthy> each source and the data it provides is an array item
[05:41:50] <ron> yes, but why are they in the same item?
[05:41:57] <macarthy> true, I could break it down further
[05:42:15] <macarthy> ok now my report has 3 - 4.5 million items
[05:42:41] <macarthy> where number of sources = 3
[05:42:44] <ron> you're breaking them down live? :)
[05:43:26] <macarthy> really what I'm asking is if my report links to those 3.5 million items
[05:43:43] <macarthy> will i be able to query them server side ?
[05:43:53] <ron> why not?
[05:47:09] <macarthy> so my report should contain DBrefs to the items correct?
[05:49:03] <macarthy> Guess I'm thinking about this http://docs.mongodb.org/manual/use-cases/storing-comments/?highlight=embed
[05:49:47] <ron> okay, I'd really like to help but I'm afraid that any answer I'll give you will be unprofessional. I don't know what you mean by 'report' - you don't describe your application. but let's say this, assuming you have many data entries, each entry as its own document, and you want to collate those entries to a single document when you read from the database, don't hold an additional document with reference to those entries but rather query the database for those
[05:49:47] <ron> entries.
[05:49:52] <ron> I really hope that makes sense to you.
[05:50:41] <ron> you should store the data so that reading it would be easy. assuming you come from the RDBMS world, you need to break free from some... hmm.. assumptions.
[05:50:41] <macarthy> ok
[05:51:03] <ron> keep as few references between documents as possible.
[05:51:16] <ron> avoid keeping bi-directional links between documents.
[05:52:05] <ron> depending on what you do more, storing or querying (or which is more important) - base your data model. aim for the more important one to be easier.
[05:52:38] <macarthy> and mongo can store 1.5 million * 100K * n where n is between say 4 - 10 with reasonable performance?
[05:52:58] <macarthy> thanks ron:
[05:53:24] <ron> one of the great things about mongo is that it can scale out, so if one server isn't enough, you can add another one.
[05:53:46] <ron> not saying it's a flyby operation, you need to take it into account, but it's still relatively easy to do.
[05:53:59] <macarthy> this data is easily shardable
[05:54:04] <ron> there you go.
[05:54:32] <macarthy> write once, read often
[05:55:18] <ron> then you should definitely optimise for making your queries easier/more efficient rather than for storing your data.
[05:55:20] <macarthy> and I'll enforce some domain level set of attributes
[05:55:25] <ron> even if it means writing your data more than once.
[05:55:31] <macarthy> ok
[05:55:55] <macarthy> one last question
[05:56:04] <ron> again, it's difficult to give concrete suggestions without going into details, but I think that should give you the basics.
[05:57:06] <macarthy> let's say I am creating said report, the report has a date, author, title, should I create a report object and ref it in the items or … ?
[05:57:48] <macarthy> eg item { reportid : q1 , data }
[05:58:18] <macarthy> report {id: q1 , title : "xxx"}
[05:58:52] <ron> that'd work.
[05:58:57] <macarthy> ok thanks for the help
[05:59:05] <ron> np
[05:59:10] <macarthy> experiment time :-)
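A sketch of the layout macarthy and ron converge on, with hypothetical collection names: the report document holds the metadata, each item carries a reportid reference, and an index on that reference supports both per-report and cross-report queries.

    // hypothetical names and shapes
    db.reports.insert({ _id: "q1", title: "xxx", author: "macarthy", date: new Date() });
    db.items.insert({ reportid: "q1", data: { source: "a", values: [1, 2, 3] } });

    db.items.ensureIndex({ reportid: 1 });

    db.items.find({ reportid: "q1", "data.source": "a" });   // query within one report
    db.items.find({ "data.source": "a" });                   // query across reports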
[06:33:17] <avdeveloper> newbie question, what's a good Rails gem for caching MongoDB requests?
[06:50:53] <wereHamster> why would you want to cache mongodb requests?
[07:39:30] <[AD]Turbo> hi there
[08:25:08] <augustl> where is the "mongo" command documented?
[08:25:32] <augustl> more specifically, I want to know what I can do in the .js file. I'd like to access some environment variables, for example.
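augustl's question goes unanswered here; a commonly suggested workaround (worth verifying against your shell version) is to pass values in with --eval, since variables defined there are visible to the script file that follows:

    // hypothetical script and variable names, invoked from a Unix shell:
    //   mongo mydb --eval "var reportDate='$REPORT_DATE'" myscript.js
    // inside myscript.js, reportDate is then a normal global
    print("running for " + reportDate);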
[08:34:12] <Bartzy> Hi
[08:34:28] <Bartzy> for(i=10000000000;i<=10000100000;i++) { db.shares.update({_id: ObjectId("4ff9c2f3733a582655000000")}, {$push: {likes: {uid: i, name: "Some Guy"} }}) }
[08:34:31] <Bartzy> I'm doing that to test something
[08:34:47] <Bartzy> getting 100,000 elements into the 'likes' array in a specific document.
[08:35:03] <Bartzy> it does 300-500 updates per sec according to mongostat
[08:35:06] <Bartzy> Isn't that extremely slow ?
[08:35:39] <ron> that depends.
[08:35:43] <Bartzy> and it's going down, now it's 180.
[08:35:51] <Bartzy> depends on what ?
[08:36:28] <ron> do you run it on a raspberry-pi?
[08:36:44] <Bartzy> not really
[08:36:48] <Bartzy> Dual Xeon E5-2620
[08:36:50] <Bartzy> 64GB RAM
[08:36:55] <Bartzy> 4 x 256GB Samsung SSDs
[08:36:57] <Bartzy> on RAID10
[08:37:04] <Bartzy> :|
[08:37:11] <Bartzy> and it's not doing anything else but that insert.
[08:37:18] <Bartzy> I mean really nothing. It's not used :)
[08:38:01] <ron> well, to me it sounds low, but I'm no expert.
[08:38:12] <ron> algernon?
[08:38:26] <Bartzy> It keeps going down, now it's 120
[08:38:56] <ron> well, live updates aren't that interesting ;)
[08:39:24] <Bartzy> Just wanted to let you know, maybe it's related to the fact that the document constantly needs to get moved ?
[08:39:49] <Bartzy> it finished now - but I'm afraid in real scenarios it will be that slow too ?
[08:40:19] <ron> oh, now I see what you did there!
[08:40:27] <ron> we do it differently.
[08:40:29] <ron> iirc.
[08:40:37] <Bartzy> ?
[08:40:46] <ron> each like is a different document.
[08:41:16] <Bartzy> and there is a ref in the document being liked, or just in the like document ?
[08:41:42] <Bartzy> And why do you do it as a different document, and what does it have to do with mongo doing that at 120 updates per sec ?
[08:42:22] <Derick> Bartzy: we did explain yesterday that this was not going to be the best design I think?
[08:42:52] <ron> there's a 'ref' sortof, we do it in a different document since it's easier to control, and I have no idea regarding the performance.
[08:43:01] <ron> but Derick can mock you now until you get it right.
[08:43:14] <Bartzy> Derick: But I still don't understand why is it so slow? After our talk yesterday, I decided to start checking my data.
[08:43:22] <Derick> it ends up being moved all the time
[08:43:41] <Bartzy> OK, so that makes sense why it kept slowing down as it got bigger
[08:43:42] <Derick> if you run db.shares.stats(), what does it show about padding factor?
[08:43:50] <Bartzy> now? It is finished. Just a sec
[08:44:13] <Bartzy> "paddingFactor" : 1.0099999999999918,
[08:44:20] <Derick> so very low
[08:44:30] <Derick> the collection still holds all the data, right?
[08:45:17] <Bartzy> Derick: The collection has 100 million documents (real data) with no likes/comments/views at all
[08:45:27] <Bartzy> and 1 document that I chose randomly and added 100,000 likes into it
[08:45:32] <Derick> ah, right
[08:45:39] <Bartzy> sorry for not pointing that out.
[08:46:50] <Bartzy> so I did
[08:47:05] <Bartzy> big = db.shares.findOne({_id: ObjectId("4ff9c2f3733a582655000000")});
[08:47:15] <Bartzy> Object.bsonsize(big) - 5189277
[08:47:21] <Bartzy> I guess that's in bytes - 5MB ?
[08:47:25] <Derick> yup
[08:47:44] <Bartzy> The compression sucks, because the 'name' field is exactly the same on all elements :P
[08:48:03] <Bartzy> Or does BSON compress ?
[08:48:11] <NodeX> and here is your trade off between UI and performance I pointed out yesterday ;)
[08:48:30] <Bartzy> NodeX: Yeah, sorry for bailing out yesterday, I'm now in the part of realizing my mistake :D
[08:48:39] <Derick> no compression
[08:48:51] <Bartzy> OK. so if that's the case I really can't do that in one document.
[08:49:03] <Bartzy> So what I'm now thinking of is kinda what NodeX suggested (I think):
[08:49:07] <NodeX> LOLOL
[08:49:16] <NodeX> I hate being right -most- of the time :P
[08:49:20] <Bartzy> ;)
[08:49:35] <Derick> Bartzy: it's always good to test things yourself though
[08:49:59] <Bartzy> Do you want me to provide a quick spec of the data before I ask more annoying questions? :)
[08:50:01] <Derick> Bartzy: maybe just to know that NodeX doesn't (often) make stuff up ;-)
[08:50:21] <NodeX> often :P
[08:50:28] <Bartzy> I read the entire MongoDB in Action book and thought it would make me smarter! It did, but just a bit :)
[08:50:43] <Bartzy> I guess you can't beat experience.
[08:50:45] <NodeX> Bartzy : I really suggest that you split into 1-3 collections
[08:51:33] <Bartzy> So I have this: Pinterest like UI - each photo can be liked, viewed, commented on (each of those actions is recorded with the uid and name of the user who did it, and comments carry some more data of course)
[08:52:17] <SisterArrow> Good morning yall!
[08:52:37] <NodeX> I would save the count of likes/views/comments with each photo, and probably save the bare minimum of meta information
[08:52:49] <Bartzy> 50 photos ("pins") per page, when you scroll down you get 50 more. Each of those photo boxes has a like/comment/view counters, and the last 3 comments. When you click on a photo, you can see all the comments, likes, views in some other UI.
[08:52:59] <NodeX> (that way you can display a page then users can drill down / paginate should they need to)
[08:53:27] <SisterArrow> I'm doing a mongorestore from a dump.. I'm at a specific db.collection now and when doing a db.collection.count() in the shell (on another machine that is being restored to) it's reported that there are 82618776 objects.
[08:53:42] <Bartzy> Almost all (!!) of the photos will have no likes/comments/views, but popular ones can have hundreds of thousands of "views", thousands of likes... And there is a "Popular" tab so it shows 50 of those heavy photos each time :)
[08:53:55] <SisterArrow> The mongorestore however says that there are 308168834839.. :o
[08:54:05] <SisterArrow> The mongorestore comes from the same machine as I query..
[08:54:13] <SisterArrow> Mongodump, sorry.
[08:54:13] <Bartzy> NodeX: OK I'm done explaining what are the needs, now to my (yours? :)) solution :
[08:54:15] <NodeX> The "count" of likes/views/comments will take care of the "heavy" page Barzty
[08:54:49] <Bartzy> photos collection with metadata like _id, URL, image size... and - likes/comments/views counters, AND an array of the last 3 comments.
[08:54:55] <NodeX> storing the last 50 likes/comments/views as embedded / nested data inside the photo document will take care of the initial view
[08:55:06] <Bartzy> why last 50 ?
[08:55:10] <ron> why not?
[08:55:10] <NodeX> storing the rest in a separate collection will take care of drill downs and pagination
[08:55:30] <Bartzy> why not just storing the likes and view counters, and the last 3 comments?
[08:55:31] <NodeX> how ever many you like, you use 50 for other things so why not 50...
[08:55:50] <NodeX> let's be honest at 10 comments per page how many people go past page 5
[08:56:01] <ron> agreed.
[08:56:05] <Bartzy> But now I have the issue of how to determine if the current user liked/viewed/commented on a photo (this is specified in the UI).
[08:56:10] <NodeX> if you can save that "expensive" count() for the pagination I suggest you do
[08:56:24] <Bartzy> count() on the client side, that is ?
[08:56:43] <NodeX> whether the user liked it will be expensive
[08:56:44] <ron> we just don't use mongo to count.
[08:56:53] <NodeX> I suggest a caching layer
[08:57:26] <Bartzy> When it is in one collection (bad, like it is now), in order to know if the user liked/commented/viewed on each of those 50 photos displayed each time - We get the entire document into PHP, and in PHP we check if their UID is in likes/comments/views fields.
[08:57:26] <NodeX> it's probably less expensive to store with the user what he/she has liked/commented on (just the ID's)
[08:58:11] <Bartzy> NodeX: So a "users" collection with likes/comments/views of his, with just the _id of the photos in those arrays ?
[08:58:23] <NodeX> because you can initialise that data into the PHP session and you already have the ID of the photo so a quick "in_array()" and you know if they have or not
[08:58:36] <NodeX> correct
[08:58:48] <NodeX> that's probably the least expensive / most efficient
[08:59:01] <ron> well, I'm out.
[08:59:02] <ron> :D
[08:59:08] <NodeX> every time they like something add it to the DB and their session
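A sketch of the shape NodeX is describing, with hypothetical collection and field names: counters and the latest few comments live on the photo document, the full like/comment/view records live in a separate collection, and the ids of photos a user has liked live on the user document so the "did this user like it?" check runs against data the app already has loaded.

    // hypothetical shapes
    // photos:   { _id, url, likeCount, viewCount, commentCount, lastComments: [ ... ] }
    // activity: { _id, photoId, type, uid, name, ts, text }
    // users:    { _id, likedPhotoIds: [ ... ] }

    // recording a like
    db.activity.insert({ photoId: pid, type: "like", uid: uid, name: name, ts: new Date() });
    db.photos.update({ _id: pid }, { $inc: { likeCount: 1 } });
    db.users.update({ _id: uid }, { $addToSet: { likedPhotoIds: pid } });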
[08:59:11] <Bartzy> and in the (very) unlikely scenario where a user viewed 10,000 photos (in theory) - that in_array() on the "views" array of the user - will not be a problem ?
[08:59:26] <Bartzy> NodeX: Session/memcached/APC - doesn't matter I guess ?
[08:59:55] <NodeX> I would persistently (with a timeout) cache it in redis personally - easier to manage
[09:00:18] <Bartzy> why use redis and not just mongo ?
[09:00:21] <NodeX> you can save the number of likes a user has (as a count) and if it's >N goto your cache
[09:00:23] <Bartzy> it's a simple array in the user document
[09:00:36] <NodeX> if you let me finish my suggestion you will find out ;)
[09:00:44] <Bartzy> NodeX: OK :)
[09:01:09] <NodeX> having never done in_array() on 10k elements I could not speak of the performance
[09:01:18] <NodeX> best to benchmark it
[09:03:50] <Bartzy> NodeX: OK - you said: you can save the number of likes a user has (as a count) and if it's >N goto your cache
[09:03:52] <Bartzy> What did you mean ?
[09:04:12] <Bartzy> I save the number of likes a user has in some cache - and what is N exactly? :)
[09:04:20] <NodeX> if the number of likes is greater than "N" (some large number), check in your cache
[09:04:30] <Bartzy> why ?
[09:04:38] <Bartzy> Why not always check in cache ?
[09:05:29] <NodeX> in_array on 10k members takes 6.9 ms
[09:05:39] <NodeX> so I think you'll be fine ;)
[09:06:38] <NodeX> and 9.0599060058594E-6 on 100k members
[09:06:47] <NodeX> (microtime(true))
[09:08:14] <Derick> how can it be 6.9 ms on 10k and 9 µs on 100k?
[09:09:07] <NodeX> microseconds sorry
[09:09:25] <NodeX> 6.9141387939453E-6 <--- 10k
[09:09:34] <ron> Derick: PHP is _that_ good.
[09:11:02] <NodeX> Bartzy : when your data gets too big you really want an edge graph like neo4j
[09:14:36] <Bartzy> NodeX: I can also switch and make the array keys as the values
[09:14:38] <Bartzy> then use isset
[09:14:52] <Bartzy> and use PHP's hashmap/index/whatever it is called
[09:15:08] <Bartzy> NodeX: But I still didn't understand why to cache like that in a non-mongo tool ?
[09:15:23] <NodeX> you wont need to
[09:16:17] <Bartzy> Why is that ? Why did you mention it ?
[09:16:33] <NodeX> isset is 3 times faster it seems ;)
[09:16:44] <Bartzy> Yeah cause PHP indexes the array keys
[09:16:51] <NodeX> err, I mentioned it
[09:17:00] <Bartzy> But building the array by using array_keys would be slower..
[09:17:01] <NodeX> because I didn't know the performance
[09:17:11] <Bartzy> I thought of just getting the user metadata (what he liked/viewed/commented on) when he first arrives at the page - save it in memcached/session/apc
[09:17:32] <Bartzy> and then for each photo, do in_array or isset (whatever) on that likes/comments/views array in PHP
[09:18:30] <Bartzy> NodeX: Also, why separate the comments to a different collection too - and not just keep them in the same document, if we assume that there would not be thousands of comments ?
[09:18:44] <NodeX> eh?
[09:19:10] <Bartzy> We said that it would be best to have a photos collection and a metadata collection, that has the comments/likes/views.
[09:19:16] <NodeX> If you recall, yesterday I suggested ONE collection for comments/views/likes
[09:19:19] <Bartzy> Why separate the comments from the photos documents ?
[09:19:22] <Bartzy> ^ ...
[09:20:03] <NodeX> I said store an acceptable number nested and the rest in a separate collection
[09:20:13] <Bartzy> what is acceptable ?
[09:20:17] <NodeX> how should I know
[09:20:22] <NodeX> it's your app and your data
[09:20:44] <Bartzy> Acceptable is some magic Mongo number of how much would be best to store nested - or is it only by my UI ?
[09:21:07] <NodeX> how many comments do you show on page 1?
[09:41:55] <Ushu> hi !
[09:51:52] <Bartzy> NodeX: 50
[09:52:04] <Bartzy> NodeX: Page 1 = page with all the photos ?
[09:52:08] <Bartzy> On the photo viewer - 50 comments
[09:52:17] <Bartzy> on the page with all the photos (i.e. pinterest.com) - 2-3 comments.
[09:55:20] <Bartzy> I don't get why to put the last likes/views/comments on the photos collection, and then the rest on another collection?
[09:55:36] <Bartzy> Why not only put 2 last comments on the photos collection and then everything else (including those comments) in the metadata collection.
[10:16:25] <NodeX> then 50 is probably the acceptable number
[10:16:50] <NodeX> and you're asking me questions that you should be asking the mirror
[12:00:22] <Ushu_> hello, can anyone help me ? i have this error _a != -1 when i want to insert
[12:01:02] <NodeX> pastebin the insert query
[12:07:09] <Ushu_> NodeX : I'm using django + mongodbengine, I don't have the request yet
[12:09:51] <Bartzy> NodeX: Hi again :) You think 2-3 hours of lightning consulting from 10gen will help me out with the schema design decisions ?
[12:10:14] <NodeX> yep
[12:10:30] <Bartzy> OK, thanks.
[12:23:47] <Bartzy> to represent a time and date - for example when a comment was created, or a post was updated - Should I use Timestamp or ISODate ?
[12:23:52] <Bartzy> and why? :)
[12:28:01] <kali> Bartzy: you want datetime (which is mapped to ISODate in shell). Timestamp is not really a timestamp
[12:28:46] <NodeX> the only reason I can think of using ISODate is to visually aid when viewing objects, unless there is a performance advantage I am unaware of
[12:29:15] <Derick> the ttl collection can only use isodate types too
[12:29:31] <Derick> but otherwise, just an int will work as well for timestamps!
[12:29:43] <NodeX> 2038 might pose a problem
[12:29:53] <Derick> nope, mongo does 64bit ints
[12:30:10] <NodeX> *palm slap*
[12:30:46] <Derick> which gives it a range from 292 billion years in the past, to 292 billion years in the future
[12:30:54] <Derick> the isodate stores millisecs though
[12:31:01] <Derick> in a 64bit int
[12:31:03] <Derick> iirc
[12:31:15] <kali> yes
[12:31:23] <kali> millisec from/to epoch
[12:31:37] <ron> like java's long.
[12:31:50] <Derick> so only 292 million years in the past, to 292 million in the future then :-)
[12:32:01] <NodeX> either way we'll all be dead!
[12:32:12] <kali> we'll worry about this later.
[12:32:29] <Bartzy> So kali said datetime, Derick and NodeX say timestamps are fine too ?
[12:32:30] <kali> the next guy problem
[12:33:12] <ron> we recently changed all our dates to isodates, but who knows.
[12:33:29] <ron> they were millis from epoch before.
[12:33:43] <Bartzy> Does this really matter ?
[12:33:46] <ron> definitely easier to view, not sure that it's easier to control.
[12:33:50] <NodeX> it doesnt matter lol
[12:33:54] <NodeX> ^ +1
[12:33:59] <kali> Bartzy: no. timestamps do not make sense. datetime is good, and an integer from epoch makes sense too
[12:34:05] <ron> now our QA whines less.
[12:34:08] <ron> so it's a win for me.
[12:34:36] <Bartzy> kali: Why doesn't timestamp make sense ?
[12:34:54] <kali> Bartzy: because it's meant as an internal type to manage replication and unique ids
[12:34:56] <ron> I actually get bugs because (quote): "it looks different in RockMongo"
[12:34:59] <Bartzy> timestamp is an integer from epoch
[12:35:05] <kali> Bartzy: no.
[12:35:12] <ron> I really hate that admin console.
[12:35:14] <kali> Bartzy: bson timestamp is not
[12:35:24] <Bartzy> ah.
[12:35:25] <Bartzy> ok
[12:38:06] <Derick> Bartzy: no, "timestamp" is the wrong type, but you can use an int to store unix timestamps
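For reference, both options look like this in the shell, assuming a hypothetical comments collection; new Date() is stored as the BSON date type (shown as ISODate), while a plain integer just holds seconds or milliseconds since the epoch.

    db.comments.insert({ text: "hi", createdAt: new Date() });                 // BSON date / ISODate; required if you ever want a TTL index
    db.comments.find({ createdAt: { $gte: new Date(2012, 7, 1) } });           // range query

    db.comments.insert({ text: "hi", createdAtTs: NumberLong(1344297600) });   // plain unix timestamp also works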
[12:59:17] <Bartzy> I have another question in the likes saga
[12:59:30] <Bartzy> if I have an array in a document. I want to always have a maximum of 50 elements in it
[13:00:08] <Bartzy> i.e. when a new element needs to get in, get it in (first if possible, if not, then last), and remove the last one only if it's the 51st element. If we didn't reach 50, don't do it
[13:00:19] <Bartzy> Can I do it without first fetching the count of this array ? :|
[13:04:15] <feklee> I want to store settings in a MongoDB database. Every setting has a key and a value. What's best design? (A) `settings` collection with one document containing all the settings, or (B) `settings` collection with one document per setting (`_id` = key, `value` = value)?
[13:04:49] <feklee> Where do I learn about best practises for MongoDB database design?
[13:06:11] <kali> feklee: http://www.mongodb.org/display/DOCS/Schema+Design you can start from there
[13:07:04] <feklee> kali: Thanks! Any suggestion concerning my particular problem (which should be common)?
[13:07:48] <kali> feklee: it's likely to be a small collection with an even smaller number of writes, so anything should work, really :)
[13:10:33] <feklee> kali: OK. So, I assume that everyone does that differently.
[13:11:34] <kali> feklee: i have a collection like that, with hierarchical content. I have one document per key at the first level of the tree...
[13:14:27] <Bartzy> Anyone ?
[13:15:15] <feklee> kali: Well, in my case (settings), documents would really just consist of a single value. Therefore my idea to use: `_id`, `key`
[13:15:26] <feklee> s/`key`/`value`
[13:15:59] <feklee> If there are any conventions, I'd like to know about them.
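A sketch of the two shapes feklee describes, with hypothetical keys; at this size either works, (B) just makes changing a single setting a trivial upsert.

    // (A) one document holding all settings
    db.settings.insert({ _id: "app", theme: "dark", pageSize: 50 });

    // (B) one document per setting
    db.settings.insert({ _id: "theme", value: "dark" });
    db.settings.update({ _id: "pageSize" }, { $set: { value: 50 } }, true);   // third argument = upsert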
[13:17:49] <Bartzy> Anyone has an answer of the fixed size $push question ?
[13:38:53] <NodeX> Bartzy : no, you have to do it app side, you cannot $push and $chop
[13:39:00] <NodeX> $pop *
[13:39:08] <Bartzy> NodeX: But how can I do that atomically
[13:39:15] <Bartzy> app-side.
[13:39:47] <NodeX> keep a count of the array size and if > 50 add a $pop statement into your update
[13:40:07] <Bartzy> But I can't pop and push
[13:40:09] <Bartzy> so...
[13:40:25] <NodeX> eh?
[13:40:35] <Bartzy> get array size
[13:40:39] <NodeX> you can push and pop in one but you need the size
[13:40:53] <Bartzy> you cannot $push and $chop
[13:40:55] <Bartzy> you said that.
[13:40:57] <NodeX> you can
[13:41:01] <NodeX> you mistake my meaning
[13:41:25] <NodeX> what I meant was you cannot apply logic to your $pop inside mongo
[13:41:26] <Bartzy> can you explain the queries that would be needed? if I have 'likes' array field and a likes_c counter integer field.
[13:41:42] <NodeX> I just told you what you need
[13:41:50] <NodeX> any more advice I'm sending you an invoice LOL
[13:42:04] <Bartzy> https://jira.mongodb.org/browse/SERVER-1050
[13:42:08] <NodeX> you might aswel send me your app and have me write it :P
[13:42:28] <Bartzy> NodeX: See that case ^
[13:42:43] <NodeX> oops, forgot about that
[13:42:57] <Bartzy> Yeah. So how can I do that? :|
[13:43:24] <Bartzy> only with some kind of state field, and findandmodify? :|
[13:43:52] <NodeX> I would do it once a day or something
[13:44:01] <NodeX> or in a msg queue
[13:44:03] <Bartzy> for all documents that were liked ?
[13:44:08] <NodeX> yep
[13:44:14] <Bartzy> that sucks :\
[13:44:23] <Bartzy> https://jira.mongodb.org/browse/SERVER-991
[13:44:24] <NodeX> 25 elements in an array wont cause anything bad to happen
[13:44:26] <Bartzy> ;[
[13:45:00] <NodeX> 25 extra elements *
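A sketch of the app-side workaround NodeX outlines, using Bartzy's hypothetical field names (likes, likes_c); the push and the trim are two separate updates, so it is not atomic, which is exactly the gap SERVER-991 tracks (later server versions added $push with $each and $slice for capped arrays).

    // push the new like and bump the counter in one update
    db.photos.update({ _id: pid }, { $push: { likes: like }, $inc: { likes_c: 1 } });

    // if the counter says we are over 50, trim the oldest element in a second update
    var c = db.photos.findOne({ _id: pid }, { likes_c: 1 }).likes_c;
    if (c > 50) {
        db.photos.update({ _id: pid }, { $pop: { likes: -1 }, $inc: { likes_c: -1 } });
    }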
[13:47:05] <NodeX> the trouble you will keep stumbling on is you're trying to use a k/v store / database for something that's designed to be handled by an edge graphing database
[13:47:18] <NodeX> (this part of the functionality at least)
[13:47:42] <Bartzy> NodeX: No one is using an edge graph for these stuff.
[13:47:48] <Bartzy> We're not building facebook.
[13:47:53] <Bartzy> You can't like the comment. Or like the like...
[13:48:08] <Bartzy> It's as simple as a blog platform
[13:48:24] <Bartzy> Posts (photos) has "thumbs up". Has comments. Has views.
[13:48:51] <NodeX> ok, good luck ;)
[13:49:00] <Bartzy> heh
[13:49:10] <Bartzy> so you're saying a blog site needs an edge graphing database? :)
[13:49:21] <NodeX> nope
[13:49:37] <Bartzy> And the feature I described earlier is different than a blog site, how ?
[13:49:38] <NodeX> your photo (client) has a relationship to it's meta (likes/comments/users etc)
[13:50:34] <Bartzy> same as a blog post
[13:51:02] <NodeX> except you're asking your app to handle massive amounts of data
[13:51:16] <NodeX> try doing that for a blog post in an RDBMS without a million cores
[13:51:33] <NodeX> so yes in large data sense a blog should be using edge graphs for relationships
[13:53:42] <Bartzy> NodeX: Foursquare are not using edge graphs
[13:53:54] <Bartzy> And many many other giants (Twitter? Facebook)
[13:54:22] <ron> actually, twitter does use a graph database.
[13:54:26] <ron> a crappy one, but they do.
[13:55:20] <NodeX> then go ask them for advice
[13:55:24] <Bartzy> their own ?
[13:55:32] <NodeX> every piece of advice you're offered, you pick a hole in it
[13:55:52] <Bartzy> NodeX: I'm really not trying to antagonize.
[13:56:30] <Bartzy> NodeX: And that's not true. I take your (and others) advice very seriously ;)
[13:56:54] <NodeX> if you fully believe that facebook and foursquare tell the truth on their infrastructure you're mistaken
[13:57:13] <NodeX> they do not divulge anything that can hurt them financially or give any competitors an edge
[13:57:48] <NodeX> and facebook (as an example) have huge data centers so can afford to be liberal with data flows
[14:03:51] <Bartzy> NodeX: in your metadata collection solution - each metadata document (like/comment/view) has a _id ?
[14:04:00] <Bartzy> it must - but it's redundant, I think... ?
[14:06:22] <ron> why redundant? you need a unique identifier, otherwise you can't address it. that identifier is the _id. that's it.
[14:06:36] <NodeX> ^
[14:06:44] <NodeX> ron is on fire today
[14:06:58] <ron> I _am_ burning up.
[14:07:04] <ron> actually, no fever this time.
[14:08:28] <NodeX> hahah
[14:08:49] <Bartzy> I don't want to address it
[14:08:55] <Bartzy> I want to address it only as a reference
[14:09:09] <NodeX> it will by default have an _id
[14:09:21] <NodeX> what happens when you want to delete it?
[14:09:27] <Bartzy> Yeah - I know. But it is redundant...
[14:09:27] <Bartzy> hm
[14:09:29] <Bartzy> OK :D
[14:09:41] <NodeX> or query it for lawful purposes or something ;)
[14:09:52] <ron> do you _not_ have a unique identifier for each entry?
[14:10:02] <Bartzy> That is actually another great reason for separating the comments from the photos document
[14:10:07] <Bartzy> the deletion
[14:10:08] <NodeX> I personally use a "Guessing" method Ron
[14:10:16] <NodeX> php is good at guessing the right one :P
[14:10:41] <Bartzy> Ah, actually you can also use an id in the embedded document... But still referencing seems much more flexible
[14:10:45] <NodeX> Bartzy : you will find all sorts of good reasons when you develop your app - things you never thought of in the beginning
[14:11:31] <NodeX> As I said yesterday - I started with an app that nested the images in a gallery, even at 10 per gallery it was a nightmare
[14:11:42] <NodeX> I was really limited to what I could do with my data
[14:12:26] <Bartzy> NodeX: We were now thinking that no one will ever want to know who is the 51st person who liked their photo. Then we just need the last 50 likes and a total likes counter (and save the likes of each user in their own user document, to avoid more than 1 like per user per photo). But we have that $push and $pull issue which is a bitch...
[14:13:19] <NodeX> the user may not want to know it but you -may- want to know it in your app where you can graph trends and things
[14:13:32] <NodeX> if you have the capacity I recommend storing data like that
[14:13:37] <Bartzy> hm
[14:14:00] <NodeX> even if it's not indexed - you can pull it out and map/reduce it for those kinds of things
[14:14:24] <Bartzy> pull it out - to another DB ?
[14:15:16] <Bartzy> but what graphing can be done with that data. Averages can be done with counts. Averages per user can be done with the user document.
[14:15:44] <Bartzy> But yeah, I get your point
[14:15:48] <NodeX> you might want to know what your users do
[14:15:53] <Bartzy> yes
[14:15:58] <NodeX> i/e from first contact right thru every click
[14:16:18] <NodeX> for me personally I have a history collection that tracks all that but some people like the raw data
[14:16:37] <Bartzy> history collection for various actions? :)
[14:16:49] <NodeX> all actions
[14:16:51] <Bartzy> click there, like that, view this?
[14:17:02] <NodeX> yeh, what, who, when, how etc
[14:17:12] <Bartzy> isn't that an extremely big collection ?
[14:17:12] <NodeX> referer, that kinda crap
[14:17:33] <NodeX> yes but I parse it daily and archive what I parse then drop the data
[14:17:38] <Bartzy> Ah
[14:17:39] <Bartzy> cool
[14:17:42] <Bartzy> How do you parse it ?
[14:17:49] <Bartzy> just one by one ?
[14:18:36] <NodeX> varying methods, the main one is grouping by session_id so I can have an array of what a user did from first contact to leaving the site
[14:18:53] <Bartzy> cool
[14:19:30] <NodeX> I had the luxury of building it into my CMS though
[14:19:36] <NodeX> so it's all automatic
[14:20:12] <Bartzy> I have another question about my data. the photos collection I was talking about only contains photos that has been "shared" by the user that created those photos. There is another "results" collection which just has all the photos, regardless if they are shared or not. When a photo is created, it is inserted into 'results'. When it is shared, it is inserted (some of the data at least), into 'photos'.
[14:20:51] <Bartzy> I've done the results/photos separation, so it would be easier to get photos that are available for showing, without indexing on a "shared:1, shared:0" key...
[14:21:07] <Bartzy> Is that a correct assumption? :|
[14:21:22] <NodeX> it's twice the index size
[14:21:30] <NodeX> or near enough twice
[14:21:39] <Bartzy> what is ? Splitting them or keeping them together ?
[14:21:52] <NodeX> you have an _id for each photo and result rather than one _id and another small field
[14:22:00] <Bartzy> yes
[14:22:05] <NodeX> splitting them is twice the index size
[14:22:11] <NodeX> and twice the overhead
[14:22:18] <NodeX> and wont make good use of caches inside mongo
[14:22:24] <Bartzy> But on results I do nothing - It's there for archive - I can actually delete it later, or just have it on some slower disks ....
[14:22:32] <Bartzy> I never query 'results'
[14:22:45] <Bartzy> Only maybe on a slave, for our internal statistics
[14:22:57] <NodeX> for example, if you query the results then goto a result and query the "photos" mongo -could- viably visit the cache rather than query again
[14:23:18] <NodeX> but you still have an index on _id
[14:23:35] <NodeX> that index -may- make other indexes spill to disk and have a performance hit
[14:24:09] <NodeX> I'd save yourself the headache and have one collection tbh
[14:24:18] <Bartzy> Yes, but what if I never query 'results', all my queries are for shared photos only. So if I have only one collection, all my indexes need to have a 'shared' with them
[14:24:36] <NodeX> you still have an index automatically on _id no matter what
[14:24:49] <Bartzy> the results collection is 600 million documents, increasing by 2 million documents per day. Shared is 18% of that.
[14:25:09] <Bartzy> Now if I want to get all the photos a user wishes to show to the world:
[14:25:14] <NodeX> still a lot of data that you don't need
[14:25:28] <Bartzy> db.photos.find({uid:UID_OF_USER})
[14:25:33] <Bartzy> and index on uid
[14:25:40] <Bartzy> if I have only one collection:
[14:25:59] <Bartzy> db.results.find({uid:UID_OF_USER, shared:1}), and a compound index on {uid:1, shared:1}. Right?
[14:26:19] <NodeX> you can have the same index on photo's though
[14:26:27] <NodeX> and save the "_id" index
[14:26:34] <Bartzy> save ?
[14:26:43] <NodeX> yes - save = dont need
[14:26:58] <NodeX> as in save yourself money
[14:27:20] <Bartzy> But the shared key will be in any index.
[14:27:29] <Bartzy> and useless results documents will reside in memory
[14:27:40] <Bartzy> instead of valuable shared-results (photos) documents.
[14:27:44] <Bartzy> I think? :|
[14:27:57] <NodeX> the most used documents will be in memory as per the LRU
[14:28:45] <Bartzy> so adding 'shared' field in any index I have is better than having that _id index on results ?
[14:29:52] <NodeX> it's better than having a results collection period
[14:30:29] <Bartzy> And what if I can delete those unshared results ?
[14:30:47] <NodeX> at some point they have an index on _id
[14:31:16] <Bartzy> so you say if I can delete them - just delete them from that one collection, not their own
[14:32:15] <Ushu_> NodeX : i have a log for my error _a != - 1 ... here http://pastebin.com/FNnkemig
[14:32:16] <NodeX> In my opinion yes, if nothing else than to save a headache and indexes
[14:32:27] <Bartzy> NodeX: Thanks.
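A sketch of the single-collection version NodeX is arguing for, assuming a hypothetical shared flag on each document; one compound index covers the query pattern Bartzy described.

    db.results.ensureIndex({ uid: 1, shared: 1 });
    db.results.find({ uid: someUid, shared: 1 });   // all photos a user has chosen to share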
[14:33:10] <NodeX> Ushu_ : the query please not the log lol
[14:39:59] <Ushu_> I think the problem is not in the query because it runs on another server
[14:40:31] <NodeX> ok, good luck
[14:41:19] <Ushu_> thanks a lot
[14:44:36] <Bartzy> downloads-distro.mongodb.org - bad throughput :
[14:44:37] <Bartzy> :[
[15:11:33] <Bartzy> NodeX: OMG. for 600 million rows the _id index only is 20GB
[15:11:56] <NodeX> only
[15:12:22] <NodeX> what if that 20gb pushes an important index out of memory .. when it doesn't need to ?
[15:13:18] <kali> this is ~ 33bytes per record, right ?
[15:13:27] <kali> not so bad
[15:14:02] <Bartzy> only - was not "oh that is small". It was "the index alone is 20GB!" :p
[15:14:11] <Bartzy> how is that not bad
[15:14:23] <Bartzy> Object ID is 8 bytes ?
[15:15:21] <Bartzy> NodeX: But I wonder how much the extra "shared" index in all my indexes will take .
[15:15:22] <kali> 12
[15:15:41] <Bartzy> kali: So an index for that key is x3 than the value itself ?
[15:16:34] <NodeX> you seem hell bent on doing your method so you should do it
[15:17:18] <NodeX> then tomorrow when it sinks in why it's not as efficient as the other way we're here to help again ;)
[15:40:59] <kzinti> When I type show log in a shell I get a type error stating that res.names has no properties. Does this mean that I just don't have any logs?
[16:36:45] <doxavore> Is it possible to create circumstances where, just after inserting a document, a query for values (non-_id) matching that document returns no results?
[16:37:03] <TkTech> Sure, constantly.
[16:37:17] <TkTech> You can't even be sure your insert succeeded without safe=True.
[16:37:21] <doxavore> Well, that's no fun. :)
[16:37:25] <doxavore> Oh, I am running safe=True
[16:38:01] <TkTech> Then from the same connection (assuming no error is returned) and you aren't sharding or reading from a replica secondary, it shouldn't be.
[16:39:38] <doxavore> Hmmm - it's using a connection pool, but the query is definitely firing after the create, with no sharding and reading off of master.
[16:41:58] <TkTech> Did you open up the shell and make sure your insert actually works?
[16:42:09] <TkTech> It's more likely you've made a mistake than anything else.
[16:47:00] <doxavore> TkTech: It's basically utilizing Mongoid's find_or_create_by. Out of about 400,000 cases, 400 of them suffer from an issue where they've basically been created twice.
[16:49:24] <doxavore> Mongoid translates it to 2 calls to MongoDB: find(with my options) and if that returns null, create(with my options). I can't consistently get it to break, but for a month now it's been hovering at about 0.1% of cases.
[16:49:42] <TkTech> doxavore: https://github.com/mongoid/mongoid/blob/master/lib/mongoid/finders.rb#L143
[16:49:56] <TkTech> doxavore: Yeah, it's non-atomic.
[16:50:06] <TkTech> doxavore: You will encounter race conditions.
[16:51:50] <doxavore> I will only ever have one thread executing a find_or at a time with given arguments, so I think that removes the chances of a race condition on the Mongoid side.
[16:53:07] <doxavore> What I don't know is if 1 connection inserts (with safe=true), gets an OK, and then potentially another connection queries, will Mongoid tell me it doesn't exist yet?
[16:53:30] <doxavore> Sorry, I mean will Mongo tell me it doesn't exist
[16:56:04] <digitalfiz> Hey guys I had a question about proper design with mongodb
[16:56:33] <digitalfiz> I come from a long mysql background so I am struggling with this new way of thinking, enjoying it but struggling
[16:57:09] <TkTech> doxavore: Nope. Safe just means the issuer waits until success and minimal replication occurs (w=3).
[16:58:09] <digitalfiz> I have 2 pieces of data I want to store: users and devices. The devices are connected to the users so my thought was in mongo I should make a users collection and have a user with a list in it of devices attached to that user, but I'm not sure if I should keep them separated like in a users collection and a devices collection
[16:58:28] <digitalfiz> what makes more sense for mongo
[16:58:45] <TkTech> Why would you keep a list?
[16:58:59] <TkTech> Just put the user identifier on the device and query on that.
[16:59:02] <digitalfiz> im not sure my terminology is correct
[16:59:10] <digitalfiz> yeah thats what i was thinking
[16:59:17] <digitalfiz> a users collection and a devices collection
[16:59:26] <digitalfiz> and in the devices collection have a user_id
[16:59:51] <digitalfiz> but that seemed very relational to me so i wasnt sure if it was better to combine them somehow
[17:03:48] <wereHamster> digitalfiz: google 'mongodb schema design'
[17:05:15] <digitalfiz> thanks wereHamster ill do that
[17:06:21] <digitalfiz> here is the 2 examples im struggling with http://pastie.org/4406883
[17:06:41] <NodeX> what gets queried more?
[17:07:03] <NodeX> how large are the nested devices likely to get
[17:07:28] <digitalfiz> shouldn't be more than maybe 20 devices per person, if that
[17:07:45] <NodeX> do the devices change?
[17:07:52] <doxavore> TkTech: A findAndModify can upsert, so I think I should be able to get that working instead of Mongoid's less-than-exemplary implementation. Thanks.
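A sketch of the atomic alternative doxavore lands on, with hypothetical collection and field names; findAndModify with upsert makes "find or create" a single server-side operation, and a unique index on the lookup key guards against duplicates if two upserts still race.

    db.things.ensureIndex({ externalId: 1 }, { unique: true });   // hypothetical natural key

    db.things.findAndModify({
        query:  { externalId: someId },
        update: { $set: { updatedAt: new Date() } },
        upsert: true,   // insert if nothing matches
        new:    true    // return the resulting document
    });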
[17:07:55] <digitalfiz> yes and often
[17:08:01] <digitalfiz> more often then the users information
[17:08:32] <NodeX> how many devices in total
[17:08:35] <NodeX> how many users
[17:08:49] <TkTech> digitalfiz: My original statement stands for your use case.
[17:08:52] <digitalfiz> I don't have those numbers yet, it's a new database with nothing so far
[17:09:23] <digitalfiz> TkTech: yeah thats what im thinking more and more as i talk it out on irc
[17:09:47] <digitalfiz> the devices may also need to be queried without user information
[17:10:01] <Bilge> I want to be the Mongod
[17:11:16] <NodeX> You'll need 2 collections then
[17:13:04] <digitalfiz> thanks for the help guys, I'm just super new to mongo so I'm second guessing my decisions; I was told to forget what I know about mysql, so if it seems similar I try to think how it could be done differently
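A sketch of the two-collection layout TkTech and NodeX suggest, with hypothetical field names; each device carries the owning user's _id, so "devices for a user" and "device on its own" both stay simple queries.

    // users:   { _id, name, ... }
    // devices: { _id, user_id, type, last_seen, ... }

    db.devices.ensureIndex({ user_id: 1 });

    db.devices.find({ user_id: someUserId });    // all devices for one user
    db.devices.findOne({ _id: someDeviceId });   // a device without touching the user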
[17:24:29] <|RicharD|> hello
[17:24:37] <|RicharD|> how can I see all
[17:24:53] <|RicharD|> "field" of a document ?
[17:25:33] <ron> umm, what now?
[17:30:22] <|RicharD|> ?
[17:31:16] <ron> try rephrasing your question.
[17:31:28] <|RicharD|> sorry i have
[17:31:35] <|RicharD|> a mongo database
[17:31:57] <|RicharD|> I see all "tables" like users, profiles etc...
[17:32:09] <ron> so far so good.
[17:32:13] <|RicharD|> but i don't know how i can see the content (like in phpmyadmin)
[17:32:19] <|RicharD|> i'm really new to mongodb
[17:32:25] <|RicharD|> so sorry if i compare to mysql
[17:32:28] <ron> by the way, "tables" are called "collections" in mongodb.
[17:32:38] <ron> don't worry, you're not the first.
[17:32:49] <ron> how do you see your collections? do you use the shell?
[17:33:01] <|RicharD|> i can use
[17:33:07] <|RicharD|> shell but now i'm using mongohub
[17:33:48] <ron> I'm not familiar with mongohub, so I'm not sure I can help you with it :-/
[17:33:57] <|RicharD|> and
[17:33:59] <|RicharD|> with shell ?:)
[17:34:04] <|RicharD|> it's oki also with shell
[17:34:07] <ron> okay
[17:34:29] <ron> if you want to just see documents in a collection, simply do db.collectionname.find()
[17:34:44] <ron> so if you have a collection 'users' - db.users.find()
[17:34:57] <ron> that will give you 10 documents.
[17:35:35] <|RicharD|> how
[17:35:40] <|RicharD|> can I specify the db when connecting ?
[17:35:57] <ron> use 'dbname'
[17:36:08] <|RicharD|> good
[17:36:38] <|RicharD|> it gives me nothing
[17:36:44] <|RicharD|> :
[17:36:44] <|RicharD|> :S
[17:36:48] <|RicharD|> > db.accounts.find()
[17:36:48] <ron> sec.
[17:37:23] <Almindor> hello
[17:37:30] <ron> |RicharD|: http://www.mongodb.org/display/DOCS/Tutorial
[17:37:45] <|RicharD|> oki thx a lot
[17:38:25] <Almindor> I know this is not mongoose channel but I couldn't find anything closer. Does anyone know what mongoose.remove(conditions, callback) calls on the callback? (the args) they don't say at http://mongoosejs.com/docs/finding-documents.html
[17:38:47] <Almindor> sorry, meant "Model.remove"
[17:39:06] <Venom_X> Almindor: #mongoosejs
[17:39:41] <Almindor> aha
[17:39:42] <Almindor> thanks
[17:39:44] <Venom_X> and I'm pretty sure passes err
[17:57:19] <jgornick> hey guys, in 2.2, how do you query for db references? before you were able to do something like field.$id, but that doesn't seem to exist anymore.
[20:41:02] <autolycus> I have a query like yItem= { "e" : 187, "options.id" : { '$exists': true } }; which works fine but now I want to add another condition which is that children.option.id has to be true too…how can I do that
[20:41:03] <autolycus> thanks
[20:41:34] <autolycus> so I wanna do something like { "e" : 187, "options.id" : { '$exists': true } or { "e" : 187, "children.options.id" : { '$exists': true }
[20:41:45] <autolycus> but not sure what the correct syntax for mongo is
[20:43:42] <autolycus> anyone?
[20:43:48] <ikiini> yo
[20:44:29] <ikiini> whats up @autolycus
[20:44:49] <autolycus> so I wanna do something like { "e" : 187, "options.id" : { '$exists': true } or { "e" : 187, "children.options.id" : { '$exists': true }
[20:44:56] <autolycus> how can I do that
[20:46:45] <ikiini> ok you do <your_collection>.find({'e':187,'option.id':{$exists:true}})
[20:47:34] <autolycus> that will work..I need or condition
[20:47:47] <ikiini> oh
[20:47:51] <autolycus> i need option.id or children.option.id
[20:47:51] <ikiini> ok
[20:48:28] <ikiini> you give or a list to chose from
[20:49:56] <autolycus> how?
[20:50:01] <ikiini> so <your_colleciton>.find( {"e" : 187, $or : [ {"options.id" : { '$exists': true }},"children.options.id" : { '$exists': true }]})
[20:50:04] <autolycus> first day today with mongo
[20:51:26] <ikiini> does that work?
[20:52:12] <autolycus> gives me this Tue Aug 7 13:51:24 SyntaxError: missing ] after element list (shell):1
[20:54:24] <autolycus> any idea?
[20:56:40] <payal> shards = nodes in MongoDB?
[20:57:50] <jY> payal: no
[20:58:22] <jY> a shard member *should* be comprised of a replicaset
[20:58:31] <jY> but it doesn't have to be.. it can be a since server
[20:58:33] <payal> jY: I was asked a question that went something like "On how many nodes are you running MongoDB"
[20:59:19] <payal> what is meant by "nodes" here?
[20:59:30] <jY> probably servers
[20:59:44] <payal> oh, okay
[20:59:48] <payal> thanks :)
[21:03:14] <autolycus> can someone help me with this
[21:03:14] <autolycus> db.products.find({events" : 18775, $or : [{"custom_options.id" : { '$exists': true }},{"children.custom_options.id" :{ '$exists': true }}]});
[21:03:18] <autolycus> its not working
[21:03:24] <autolycus> any help will be appreciated
[21:06:46] <ikiini> events is not fully quoted
[21:09:08] <ikiini> @autolycus. you can start by making sure your quotes are consistent. you are mixing double and single quotes.
[21:09:19] <ikiini> and your events is not fully quoted
[21:09:38] <ikiini> the events key is not fully quoted
[21:10:48] <ikiini> other than that it looks ok
[21:17:03] <autolycus> hmm not sure why it doesn't give me correct results
[21:17:17] <autolycus> when I just do custom_options_id it has 64
[21:17:32] <autolycus> but when I combine it with children.custom.option.id it has 0
[21:17:44] <autolycus> it's an or condition, at least the result should be 64
[21:17:57] <autolycus> for ikiini
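For reference, the query this thread circles around, with the quoting fixed and each $or branch wrapped in its own document (field names taken from autolycus's own pastes, so they are only as correct as those):

    db.products.find({
        "events": 18775,
        "$or": [
            { "custom_options.id":          { "$exists": true } },
            { "children.custom_options.id": { "$exists": true } }
        ]
    });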
[21:21:40] <dstorrs> hi all. I have a sharded DB where each shard is the same size (800G). Shard 1 is 60% full, shard 2 is 42% full. I am pretty sure that the extra space on shard 1 is deleted docs from an earlier schema change. Is it likely that, after compaction, shard 1 will be the same size as shard 2
[21:21:45] <dstorrs> ?
[21:32:37] <jY> dstorrs: not 100% sure but compact just defrags.. i don't think it will un-allocate any of the files already created for use
[21:34:10] <dstorrs> I was thinking I would set up a new replicant, let the data copy over to that, and it SHOULD only copy the undeleted stuff. then I either take the primary out and run a repair (? I forget the command) on it, which will actually compact it, or I save it off somewhere and migrate it back from the replicant
[21:40:59] <dstorrs> jY: I'm really just focused on the "if I got rid of the deleted stuff, is it plausible that the two would end up the same size? what information is needed to assess that?"
[21:41:22] <dstorrs> oh, there are about a dozen unsharded collections on shard 1, totalling maybe a gig between them
[21:42:29] <jY> dstorrs: mongo doesn't recover space like that
[21:42:50] <jY> it will use what it has pre-allocated though
[21:45:24] <dstorrs> jY: db.runCommand({compact:'collectionname'}) will not shrink the datafiles. db.repairDatabase() will.
[21:45:31] <dstorrs> http://www.mongodb.org/display/DOCS/Excessive+Disk+Space#ExcessiveDiskSpace-RecoveringDeletedSpace
[21:46:45] <dstorrs> I'm just looking for a sanity check on whether there is some reason that one shard (the original one, from before the DB became sharded) would always be heavier.
[23:22:10] <acidjazz> so im live now w/ a 3 server replica set
[23:22:19] <acidjazz> and i need to setup an automated back up plan
[23:22:25] <acidjazz> any suggestions?
[23:28:05] <linsys> acidjazz: LVM snapshots
[23:28:12] <acidjazz> of the directory?
[23:28:23] <acidjazz> that involves turning on journaling
[23:28:33] <acidjazz> i think i might just rock this https://github.com/micahwedemeyer/automongobackup
[23:28:35] <linsys> yes it does.
[23:28:46] <acidjazz> on my 2nd slave
[23:29:12] <linsys> acidjazz: yea that is a full backup each time
[23:29:24] <acidjazz> i see what you're saying
[23:29:27] <acidjazz> vs a snapshot
[23:29:28] <acidjazz> a diff
[23:29:46] <linsys> acidjazz: You could always stop mongodb on the 3rd node and snapshot the file system real quick then start it up again
[23:30:04] <linsys> I still do that and I have journaling enabled
[23:30:05] <acidjazz> why not just mongodump then? cuz its full each time?
[23:30:35] <acidjazz> can i enable journaling on just one node of the replica set?
[23:30:44] <linsys> yes
[23:30:48] <acidjazz> ok cool
[23:30:57] <acidjazz> ill start w/ this script to do full backups daily/weekly
[23:31:03] <acidjazz> then ill do this next
[23:31:19] <linsys> yea I mean there is nothing wrong with that script but if/when you have gigs and gigs of data it becomes kind of a pain in the a**
[23:32:11] <acidjazz> yea i dont yet
[23:32:16] <acidjazz> but i will soon
[23:32:30] <acidjazz> even when i have gigs i will want at least a weekly full backup to put somewhere offline
[23:32:48] <acidjazz> i just dont have the time to research lvm snapshots just yet
[23:34:02] <linsys> I'll write a blog about it :)
[23:34:11] <linsys> How to backup mongodb with LVM :)
[23:34:30] <acidjazz> id love that
[23:34:46] <acidjazz> ill def spread it
[23:35:08] <acidjazz> if you wouldnt mind acidjazz@gmail.com if you get it done and im idle in here
[23:35:15] <acidjazz> and ill def send it along
[23:35:29] <linsys> I'm sure 10gen will retweet it for me
[23:35:35] <linsys> my site is http://www.briancarpio.com
[23:35:43] <linsys> I write about mongodb when I have time
[23:36:19] <acidjazz> i can target a more general aws heavy crowd
[23:36:33] <linsys> Ok kool..
[23:36:36] <acidjazz> yea i think ive seen someone refer to you in here
[23:39:36] <acidjazz> how do I check my journaling status?
[23:54:06] <icicled> I'm a little lost on how mongo handles dates
[23:54:46] <icicled> in the mongo shell when I run: db.mycoll.find({ reading_date: { $gte: new Date(2012,1) } }).count() I get the correct count
[23:55:39] <icicled> however, when I do: db.mycoll.find({ reading_date: { '$gte': { '$date' : 1328007600000} } }).count() I get 0
[23:55:43] <acidjazz> how do i check if journaling is enabled
[23:55:50] <acidjazz> google and mongo docs are 0 help
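acidjazz's question goes unanswered here; one way to check from the shell (output details vary by version) is that db.serverStatus() only includes a "dur" section when journaling is on, and getCmdLineOpts shows whether the server was started with --journal.

    db.serverStatus().dur                      // undefined when journaling is off
    db.adminCommand({ getCmdLineOpts: 1 })     // shows the startup options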
[23:56:41] <icicled> I'm trying to build up a query to pass to mongodump (dump docs w/reading_date > than 31st Jan 2012)
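The difference icicled is hitting: {$date: ...} is extended JSON meant for tools like mongodump's --query flag, not a shell query operator, so in the shell it has to be a real Date object. A sketch using the same collection and field names, hedged since exact tool behaviour varies by version:

    // in the shell, build an actual Date
    db.mycoll.find({ reading_date: { $gte: new Date(1328007600000) } }).count();

    // for mongodump, pass the extended-JSON form to --query instead, e.g.
    //   mongodump -d mydb -c mycoll -q '{ "reading_date": { "$gte": { "$date": 1328007600000 } } }'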