[01:13:10] <mappum> I have documents that are able to be "liked" by users, and I want to get a few fields from them. How can I check if a certain user has liked it without making a separate query or serving the whole "like" array?
[05:36:38] <macarthy> OK, what else you need to know ?
[05:37:27] <ron> more about the data structure, query types.
[05:39:31] <macarthy> Each item will have an array of hashes, some deeply nested, generally queries will access the array and the first level of the hashes
[05:40:23] <ron> macarthy: okay, I'm going to milk it out of you :)
[05:47:09] <macarthy> so my report should contain DBrefs to the items correct?
[05:49:03] <macarthy> Guess I'm thinking about this http://docs.mongodb.org/manual/use-cases/storing-comments/?highlight=embed
[05:49:47] <ron> okay, I'd really like to help but I'm afraid that any answer I give will be unprofessional. I don't know what you mean by 'report' - you don't describe your application. but let's say this: assuming you have many data entries, each entry is its own document, and you want to collate those entries into a single document when you read from the database, don't hold an additional document with references to those entries, but rather query the database for them
[05:49:52] <ron> I really hope that makes sense to you.
[05:50:41] <ron> you should store the data so that reading it would be easy. assuming you come from the RDBMS world, you need to break free from some... hmm.. assumptions.
[05:51:03] <ron> keep as few references between documents as possible.
[05:51:16] <ron> avoid keeping bi-directional links between documents.
[05:52:05] <ron> depending on what you do more, storing or querying (or which is more important) - base your data model. aim for the more important one to be easier.
[05:52:38] <macarthy> and mongo can store 1.5 million * 100K * n where n is between say 4 - 10 with reasonable performance?
[05:56:04] <ron> again, it's difficult to give concrete suggestions without going into details, but I think that should give you the basics.
[05:57:06] <macarthy> let's say I am creating said report, the report has a date, author, title, should I create a report object and ref it in the items or … ?
[08:50:28] <Bartzy> I read the entire MongoDB in Action book and thought it would make me smarter! It did, but just a bit :)
[08:50:43] <Bartzy> I guess you can't beat experience.
[08:50:45] <NodeX> Bartzy : I really suggest that you split into 1-3 collections
[08:51:33] <Bartzy> So I have this: Pinterest like UI - each photo can be liked, viewed, commented on (each of those actions is recorded with the uid and name of the user who did it, and with comment some more data of course)
[08:52:37] <NodeX> I would save the count of likes/views/comments with each photo, and probably save the bare minimum of meta information
[08:52:49] <Bartzy> 50 photos ("pins") per page, when you scroll down you get 50 more. Each of those photo boxes has a like/comment/view counters, and the last 3 comments. When you click on a photo, you can see all the comments, likes, views in some other UI.
[08:52:59] <NodeX> (that way you can display a page then users can drill down / paginate should they need to)
[08:53:27] <SisterArrow> I'm doing a mongorestore from a dump.. I'm at a specific db.collection now and when doing a db.collection.count() in the shell (on another machine that is being restored to) it's reported that there are 82618776 objects.
[08:53:42] <Bartzy> Almost all (!!) of the photos will have no likes/comments/views, but popular ones can have hundreds of thousands of "views", thousands of likes... And there is a "Popular" tab so it shows 50 of those heavy photos each time :)
[08:53:55] <SisterArrow> The mongorestore however says that there is 308168834839.. :o
[08:54:05] <SisterArrow> The mongorestore comes from the same machine as I query..
[08:54:13] <Bartzy> NodeX: OK I'm done explaining what are the needs, now to my (yours? :)) solution :
[08:54:15] <NodeX> The "count" of likes/views/comments will take care of the "heavy" page Bartzy
[08:54:49] <Bartzy> photos collection with metadata like _id, URL, image size... and - likes/comments/views counters, AND an array of the last 3 comments.
[08:57:26] <Bartzy> When it is in one collection (bad, like it is now), in order to know if the user liked/commented/viewed on each of those 50 photos displayed each time - We get the entire document into PHP, and in PHP we check if their UID is in likes/comments/views fields.
[08:57:26] <NodeX> it's probably less expensive to store with the user what he/she has liked/commented on (just the ID's)
[08:58:11] <Bartzy> NodeX: So a "users" collection with likes/comments/views of his, with just the _id of the photos in those arrays ?
[08:58:23] <NodeX> because you can initialise that data into the PHP session and you already have the ID of the photo so a quick "in_array()" and you know if they have or not
[08:59:08] <NodeX> every time they like something add it to the DB and their session
[08:59:11] <Bartzy> and in the (very) unlikely scenario where a user viewed 10,000 photos (in theory) - that in_array() on the "views" array of the user - will not be a problem ?
[08:59:26] <Bartzy> NodeX: Session/memcached/APC - doesn't matter I guess ?
[08:59:55] <NodeX> I would persistently (with a timeout) cache it in redis personally - easier to manage
[09:00:18] <Bartzy> why use redis and not just mongo ?
[09:00:21] <NodeX> you can save the number of likes a user has (as a count) and if it's >N goto your cache
[09:00:23] <Bartzy> it's a simple array in the user document
[09:00:36] <NodeX> if you let me finish my suggestion you will find out ;)
[09:17:00] <Bartzy> But building the array by using array_keys would be slower..
[09:17:01] <NodeX> mentioned * because I didn't know the performance
[09:17:11] <Bartzy> I thought of just getting the user metadata (what he liked/viewed/commented on) when he first arrives at the page - save it in memcached/session/apc
[09:17:32] <Bartzy> and then for each photo, do in_array or isset (whatever) on that likes/comments/views array in PHP
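The cached-membership pattern being described here can be sketched in a few lines. This is an in-memory Python sketch; the user-document shape and field names are assumptions, and in practice the set would live in the session, memcached, or redis rather than be rebuilt per request:

```python
# Assumed user document shape: the user's liked photo ids are fetched once,
# cached, and each photo on the rendered page is checked with a fast
# membership test (the PHP in_array()/isset() equivalent).
user_doc = {"_id": 42, "likes": ["p1", "p7", "p9"]}

liked = set(user_doc["likes"])  # cache this, don't rebuild it per request

page_photos = ["p1", "p2", "p7"]
flags = [photo_id in liked for photo_id in page_photos]
# flags == [True, False, True]
```

Using a set (or PHP's isset() on a keyed array) keeps each check O(1), so even a user with thousands of likes costs one lookup per displayed photo.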
[09:18:30] <Bartzy> NodeX: Also, why separate the comments to a different collection too - and not just keep them in the same document, if we assume that there would not be thousands of comments ?
[09:52:04] <Bartzy> NodeX: Page 1 = page with all the photos ?
[09:52:08] <Bartzy> On the photo viewer - 50 comments
[09:52:17] <Bartzy> on the page with all the photos (i.e. pinterest.com) - 2-3 comments.
[09:55:20] <Bartzy> I don't get why to put the last likes/views/comments on the photos collection, and then the rest on another collection?
[09:55:36] <Bartzy> Why not only put 2 last comments on the photos collection and then everything else (including those comments) in the metadata collection.
[10:16:25] <NodeX> then 50 is probably the acceptable number
[10:16:50] <NodeX> and you're asking me questions that you should be asking the mirror
[12:00:22] <Ushu_> hello, can anyone help me ? i have this error _a != -1 when i want to insert
[12:28:01] <kali> Bartzy: you want datetime (which is mapped to ISODate in shell). Timestamp is not really a timestamp
[12:28:46] <NodeX> the only reason I can think of using ISODate is to visually aid when viewing objects, unless there is a performance advantage I am unaware of
[12:29:15] <Derick> the ttl collection can only use isodate types too
[12:29:31] <Derick> but otherwise, just an int will work as well for timestamps!
[12:38:06] <Derick> Bartzy: no, "timestamp" is a wrong type, but you can use an int to store unix timestamps
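As Derick and kali say, a plain int works fine for unix timestamps, while drivers map native datetime values to the BSON date type (rendered as ISODate in the shell), which is what TTL indexes require. A small Python illustration of the two representations in a document:

```python
import time
from datetime import datetime, timezone

ts_int = int(time.time())             # unix timestamp stored as a plain int
ts_date = datetime.now(timezone.utc)  # stored by drivers as a BSON date
                                      # (shown as ISODate in the shell)

doc = {"created_int": ts_int, "created_date": ts_date}
```

If you ever want TTL expiry or date-aware queries on the field, store the datetime form; otherwise the int is perfectly serviceable.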
[12:59:17] <Bartzy> I have another question in the likes saga
[12:59:30] <Bartzy> if I have an array in a document. I want to always have a maximum of 50 elements in it
[13:00:08] <Bartzy> i.e. when a new element needs to get in, get it in (first if possible, if not, then last), and remove the last one only if it's the 51st element. If we didn't reach 50, don't do it
[13:00:19] <Bartzy> Can I do it without first fetching the count of this array ? :|
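This capped-array requirement maps directly onto MongoDB's `$push` with `$each` and `$slice` (added in MongoDB 2.4), which appends and trims to a fixed length in one atomic update, with no prior count query. A sketch of the update document plus an in-memory simulation of what the server does (field names are assumptions):

```python
def capped_push_update(new_like, limit=50):
    """Build a $push update that appends new_like and keeps only the most
    recent `limit` elements of the (assumed) last_likes array."""
    return {"$push": {"last_likes": {"$each": [new_like], "$slice": -limit}}}

def apply_capped_push(array, new_like, limit=50):
    """In-memory equivalent of applying that update server-side."""
    array = array + [new_like]
    return array[-limit:]  # negative $slice keeps the tail of the array

likes = [f"user{i}" for i in range(50)]   # array already at the 50 cap
likes = apply_capped_push(likes, "user50")
# len(likes) is still 50; the oldest element ("user0") was dropped
```

The update document can be sent as-is with any driver; a total-likes counter would be bumped in the same update with `$inc`.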
[13:04:15] <feklee> I want to store settings in a MongoDB database. Every setting has a key and a value. What's best design? (A) `settings` collection with one document containing all the settings, or (B) `settings` collection with one document per setting (`_id` = key, `value` = value)?
[13:04:49] <feklee> Where do I learn about best practises for MongoDB database design?
[13:06:11] <kali> feklee: http://www.mongodb.org/display/DOCS/Schema+Design you can start from there
[13:07:04] <feklee> kali: Thanks! Any suggestion concerning my particular problem (which should be common)?
[13:07:48] <kali> feklee: it's likely to be a small collection with an even smaller number of writes, so anything should work, really :)
[13:10:33] <feklee> kali: OK. So, I assume that everyone does that differently.
[13:11:34] <kali> feklee: i have a collection like that, with hierarchical content. I have one document per key at the first level of the tree...
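Option (B) from feklee's question - one document per setting, with the key as `_id` - keeps each setting independently readable and atomically updatable. An in-memory Python sketch of that layout (the setting names are made up for illustration):

```python
# Option (B): one document per setting, key as _id.
settings = [
    {"_id": "site.title", "value": "My App"},
    {"_id": "likes.page_size", "value": 50},
]

def get_setting(docs, key, default=None):
    # in-memory analogue of db.settings.find_one({"_id": key})
    return next((d["value"] for d in docs if d["_id"] == key), default)

title = get_setting(settings, "site.title")
```

As kali notes, at this scale either design works; (B) mainly avoids rewriting one big document for every settings change.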
[13:47:05] <NodeX> the trouble you will keep stumbling on is you're trying to use a k/v store / database for something that's designed to be handled by an edge graphing database
[13:47:18] <NodeX> (this part of the functionality at least)
[13:47:42] <Bartzy> NodeX: No one is using an edge graph for these stuff.
[14:10:08] <NodeX> I personally use a "guessing" method Ron
[14:10:16] <NodeX> php is good at guessing the right one :P
[14:10:41] <Bartzy> Ah, actually you can also use an id in the embedded document... But still referencing seems much more flexible
[14:10:45] <NodeX> Bartzy : you will find all sorts of good reasons when you develop your app - things you never thought of in the beginning
[14:11:31] <NodeX> As I said yesterday - I started with an app that nested the images in a gallery, even at 10 per gallery it was a nightmare
[14:11:42] <NodeX> I was really limited to what I could do with my data
[14:12:26] <Bartzy> NodeX: We were now thinking that no one will ever want to know who is the 51st person who liked their photo. Then we just need the last 50 likes and a total likes counter (and save the likes of each user in their own user document, to avoid more than 1 like per user per photo). But we have that $push and $pull issue which is a bitch...
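The "one like per user per photo" part can be enforced with `$addToSet` on the user document, which is a no-op when the photo id is already present. A Python sketch of the two update documents involved, plus an in-memory simulation of `$addToSet` semantics (field names are assumptions):

```python
# On the user document: $addToSet ignores duplicates, so a given user can
# only ever like a photo once.
user_update = {"$addToSet": {"likes": "photo123"}}

# On the photo document: bump the total counter.
photo_update = {"$inc": {"like_count": 1}}

def add_to_set(array, value):
    """In-memory analogue of $addToSet: append only if absent."""
    return array if value in array else array + [value]

likes = add_to_set(["photo1"], "photo123")
likes = add_to_set(likes, "photo123")  # second like is a no-op
```

In practice you would check the update result (how many documents were modified) before issuing the `$inc`, so repeated likes don't inflate the counter.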
[14:13:19] <NodeX> the user may not want to know it but you -may- want to know it in your app where you can graph trends and things
[14:13:32] <NodeX> if you have the capacity I recommend storing data like that
[14:15:16] <Bartzy> but what graphing can be done with that data. Averages can be done with counts. Averages per user can be done with the user document.
[14:18:36] <NodeX> varying methods, the main one is grouping by session_id so I can have an array of what a user did from first contact to leaving the site
[14:20:12] <Bartzy> I have another question about my data. the photos collection I was talking about only contains photos that have been "shared" by the user that created those photos. There is another "results" collection which just has all the photos, regardless of whether they are shared or not. When a photo is created, it is inserted into 'results'. When it is shared, it is inserted (some of the data at least), into 'photos'.
[14:20:51] <Bartzy> I've done the results/photos separation, so it would be easier to get photos that are available for showing, without indexing on a "shared" key (shared:1 / shared:0)...
[14:21:07] <Bartzy> Is that a correct assumption? :|
[14:22:45] <Bartzy> Only maybe on a slave, for our internal statistics
[14:22:57] <NodeX> for example, if you query the results then go to a result and query the "photos" mongo -could- viably visit the cache rather than query again
[14:23:18] <NodeX> but you still have an index on _id
[14:23:35] <NodeX> that index -may- make other indexes spill to disk and have a performance hit
[14:24:09] <NodeX> I'd save yourself the headache and have one collection tbh
[14:24:18] <Bartzy> Yes, but what if I never query 'results', all my queries are for shared photos only. So if I have only one collection, all my indexes need to have a 'shared' with them
[14:24:36] <NodeX> you still have an index automatically on _id no matter what
[14:24:49] <Bartzy> the results collection is 600 million documents, increasing 2 million documents per day. Shared is 18% of that.
[14:25:09] <Bartzy> Now if I want to get all the photos a user wishes to show to the world:
[14:25:14] <NodeX> still alot of data that you dont need
[15:17:18] <NodeX> then tomorrow when it sinks in why it's not as efficient as the other way we're here to help again ;)
[15:40:59] <kzinti> When I type show log in a shell I get a type error stating that res.names has no properties. Does this mean that I just don't have any logs?
[16:36:45] <doxavore> Is it possible to create circumstances where, just after inserting a document, a query for values (non-_id) matching that document returns no results?
[16:38:01] <TkTech> If it's from the same connection (assuming no error is returned) and you aren't sharding or reading from a replica secondary, it shouldn't be possible.
[16:39:38] <doxavore> Hmmm - it's using a connection pool, but the query is definitely firing after the create, with no sharding and reading off of master.
[16:41:58] <TkTech> Did you open up the shell and make sure your insert actually works?
[16:42:09] <TkTech> It's more likely you've made a mistake than anything else.
[16:47:00] <doxavore> TkTech: It's basically utilizing Mongoid's find_or_create_by. Out of about 400,000 cases, 400 of them suffer from an issue where they've basically been created twice.
[16:49:24] <doxavore> Mongoid translates it to 2 calls to MongoDB: find(with my options) and if that returns null, create(with my options). I can't consistently get it to break, but for a month now it's been hovering at about 0.1% of cases.
[16:50:06] <TkTech> doxavore: You will encounter race conditions.
[16:51:50] <doxavore> I will only ever have one thread executing a find_or at a time with given arguments, so I think that removes the chances of a race condition on the Mongoid side.
[16:53:07] <doxavore> What I don't know is if 1 connection inserts (with safe=true), gets an OK, and then potentially another connection queries, will Mongoid tell me it doesn't exist yet?
[16:53:30] <doxavore> Sorry, I mean will Mongo tell me it doesn't exist
[16:56:04] <digitalfiz> Hey guys I had a question about proper design with mongodb
[16:56:33] <digitalfiz> I come from a long mysql background so I am struggling with this new way of thinking, enjoying it but struggling
[16:57:09] <TkTech> doxavore: Nope. Safe just means the issuer waits until success and, if requested, replication (e.g. w=3).
[16:58:09] <digitalfiz> I have 2 pieces of data I want to store: users and devices. The devices are connected to the users so my thought was in mongo I should make a users collection and have a user document with a list in it of devices attached to that user but I'm not sure if I should keep them separated like in a users collection and a devices collection
[16:58:28] <digitalfiz> what makes more sense for mongo
[17:07:52] <doxavore> TkTech: A findAndModify can upsert, so I think I should be able to get that working instead of Mongoid's less-than-exemplary implementation. Thanks.
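The fix doxavore lands on - replacing find-then-insert with a single findAndModify upsert - makes check-and-create atomic on the server, closing the 0.1% race. A hedged sketch (names and fields are assumptions; the pymongo call is shown only as a comment since no server is involved here):

```python
# The filter identifies the logical record; $setOnInsert only applies when
# the upsert actually inserts, so an existing document is left untouched.
filter_doc = {"external_id": "abc123"}
update_doc = {"$setOnInsert": {"external_id": "abc123", "source": "import"}}

# With pymongo this would be roughly:
# doc = collection.find_one_and_update(
#     filter_doc, update_doc, upsert=True,
#     return_document=pymongo.ReturnDocument.AFTER)

def find_or_create(store, filter_doc, on_insert):
    """In-memory analogue of an atomic upsert: match or insert, never both."""
    for doc in store:
        if all(doc.get(k) == v for k, v in filter_doc.items()):
            return doc
    doc = dict(filter_doc, **on_insert)
    store.append(doc)
    return doc

store = []
find_or_create(store, filter_doc, {"source": "import"})
find_or_create(store, filter_doc, {"source": "import"})
# store still holds exactly one document - no duplicate creation
```

A unique index on the filter field gives a second line of defense: even a racing insert would then fail instead of creating a duplicate.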
[17:13:04] <digitalfiz> thanks for the help guys I'm just super new to mongo so I'm second guessing my decisions as I was told to forget what I know about mysql so if it seems similar I try to think how it could be done differently
[17:38:25] <Almindor> I know this is not mongoose channel but I couldn't find anything closer. Does anyone know what mongoose.remove(conditions, callback) calls on the callback? (the args) they don't say at http://mongoosejs.com/docs/finding-documents.html
[17:39:44] <Venom_X> and I'm pretty sure it passes err
[17:57:19] <jgornick> hey guys, in 2.2, how do you query for db references? before you were able to do something like field.$id, but that doesn't seem to exist anymore.
[20:41:02] <autolycus> I have a query like yItem= { "e" : 187, "options.id" : { '$exists': true } }; which works fine but now I want to add another condition which is that children.option.id has to be true too…how can I do that
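For autolycus's question: the second condition can simply be added to the same query document, since top-level fields in a MongoDB query combine with an implicit AND (field names copied from the question; shown here as the Python/dict form of the query document):

```python
# Original query plus the extra $exists condition on the nested field.
query = {
    "e": 187,
    "options.id": {"$exists": True},
    "children.option.id": {"$exists": True},  # the added condition
}
```

Dot notation reaches into the nested `children.option` structure the same way `options.id` already does.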
[21:21:40] <dstorrs> hi all. I have a sharded DB where each shard is the same size (800G). Shard 1 is 60% full, shard 2 is 42% full. I am pretty sure that the extra space on shard 1 is deleted docs from an earlier schema change. Is it likely that, after compaction, shard 1 will be the same size as shard 2?
[21:32:37] <jY> dstorrs: not 100% sure but compact just defrags.. i don't think it will un-allocate any of the files already created for use
[21:34:10] <dstorrs> I was thinking I would set up a new replica, let the data copy over to that, and it SHOULD only copy the undeleted stuff. then I either take the primary out and run a repair (? I forget the command) on it, which will actually compact it, or I save it off somewhere and migrate it back from the replica
[21:40:59] <dstorrs> jY: I'm really just focused on the "if I got rid of the deleted stuff, is it plausible that the two would end up the same size? what information is needed to assess that?"
[21:41:22] <dstorrs> oh, there are about a dozen unsharded collections on shard 1, totaling maybe a gig between them
[21:42:29] <jY> dstorrs: mongo doesn't recover space like that
[21:42:50] <jY> it will use what it has pre-allocated though
[21:45:24] <dstorrs> jY: db.runCommand({compact:'collectionname'}) will not shrink the datafiles. db.repairDatabase() will.
[21:46:45] <dstorrs> I'm just looking for a sanity check on whether there is some reason that one shard (the original one, from before the DB became sharded) would always be heavier.
[23:22:10] <acidjazz> so I'm live now w/ a 3-server replica set
[23:22:19] <acidjazz> and I need to set up an automated backup plan