[01:19:15] <darkpassenger> I can't put body data into MongoDB without creating weird JSON monsters instead of replacing the field values that already exist
[06:50:38] <PaiSho> The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth
[07:06:11] <PaiSho> i should have put quotation marks around that
[08:45:08] <retran> i have a mongodb server running that has 20GB in the data directory and 3GB in journal
[08:45:35] <retran> is there a way to know how much of that is indexing data?
[08:50:40] <ninepointsix> @retran - if you run db.collection.stats() you can find the size of an index for a given collection
[08:51:21] <retran> the size in data... will that equate reasonably 1:1 to size on disk?
[08:51:30] <retran> or is stats reporting size on disk
[08:51:58] <retran> i'm trying to evaluate it so i can project future system needs
[08:57:26] <ninepointsix> @retran - I wasn't 100% sure myself, but here's a stack overflow that seems to have a rough equation you can use to figure out how big indexes will be http://stackoverflow.com/questions/8607637/are-there-any-tools-to-estimate-index-size-in-mongodb
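A minimal shell sketch of the stats() approach ninepointsix describes (the collection name "movies" is hypothetical; sizes are reported in bytes unless you pass a scale argument):

    // per-collection breakdown; the indexSizes field lists each index
    db.movies.stats()
    // all indexes on the collection combined
    db.movies.totalIndexSize()
    // database-wide view: dataSize vs indexSize vs storageSize
    db.stats()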
[08:57:55] <retran> hey cool thanks for looking into it a bit :)
[09:03:52] <retran> heh... i think i had just 2GB of 'data' before i indexed it for my finds(). 10 times bigger so far
[09:09:51] <tadasZ> Good morning everyone ;) recently I have been reading a lot about mongodb, and one thing that stuck in my head is the 16mb limit for a document. But is it really such a big problem? I've read a lot of examples about blogs and comments, and a lot of them mentioned that 16mb limit, but how many comments does a post have to have to reach 16mb? Don't you have to be facebook to _maybe_ reach it, or maybe commenters post entire…
[09:10:28] <remonvv> tadasZ, it should never be a problem.
[09:10:49] <remonvv> tadasZ, in fact I cannot think of a single valid use case for having 16mb documents or anything near that size.
[09:10:59] <remonvv> It's extremely rare to have ones that top 10kb
[09:11:33] <remonvv> Most big content blocks should not be stored in MongoDB like that. Either use a more specialized content distribution solution or use GridFS (which is a MongoDB feature that circumvents that limit).
[09:11:55] <remonvv> Basically, if you have a schema where hitting that 16mb limit is a problem, your schema is not okay ;)
[09:12:04] <tadasZ> that's what I think too, but every article I read goes 16mb this, 16mb that, and I started thinking maybe my math is really bad
[09:12:28] <retran> you could simply change document structure and link a collection instead
[09:13:21] <retran> if you feel you could reach that limit
[09:13:34] <tadasZ> yes, but for comments and text info like that it really should not be a problem yes?
[09:14:14] <retran> some webdevs always have a hard time doing anything except trivial selects
[09:15:10] <tadasZ> yes I understand, it's yes and no
[09:15:10] <remonvv> tadasZ, that's because most articles/blogs that mention it as a downside are theorycrafting. Nobody wants 16Mb blocks of content in a database.
[09:15:39] <remonvv> Right, a) probably not, and b) if comments get above 16mb you need to adjust your schema anyway because clearly you have a lot of traffic
[09:16:24] <retran> if you make a blog webapp that you want the possibility of over 16mb of comments in a post, don't have the comments nested in one object
[09:16:29] <remonvv> For example, depending on the r/w ratio you can store a page of comments per document, or a bucket of comments (say, 100). This will dramatically improve read and pagination performance.
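A sketch of the comment-bucket idea remonvv describes, in mongo shell (collection and field names hypothetical; postId and pageNum are variables). Each document holds up to 100 comments for one post, and the upsert starts a fresh bucket once the current one is full:

    // append to the newest non-full bucket; upsert creates one if none matches
    db.commentBuckets.update(
      { postId: postId, count: { $lt: 100 } },
      { $push: { comments: { author: "bob", text: "nice post" } },
        $inc: { count: 1 } },
      { upsert: true }
    )
    // read one "page" of comments with a single document fetch
    db.commentBuckets.find({ postId: postId }).sort({ _id: 1 }).skip(pageNum).limit(1)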
[09:16:55] <retran> that's not to say they can't be treated as same object in the abstract
[09:17:15] <remonvv> I think you should conclude that that certainly should not be a limiting factor when choosing to use MongoDB over something else.
[09:17:40] <remonvv> Now, why it is a technical limitation is a more interesting discussion of course ;)
[09:17:43] <retran> in example of blogpage+comments, you could link them to your page and retrieve in 1 query
[09:17:59] <retran> but even that seems not too important... since you'd usually only display a few comments at once
[09:18:15] <retran> and you'd be making separate calls later to DB if a user wanted to see more
[09:19:26] <remonvv> FYI; the reason it's limited is not so much a technical limitation as it is a design choice. The maximum size of a BSON document is 16mb (was 4mb) to ensure a single document cannot take up an excessive amount of RAM/disk/bandwidth. Since MongoDB has to regularly move around growing documents on disk this would become a serious performance issue.
[09:19:47] <retran> i'd probably design it with two collections :| the comment collection linked to the blog entry collection
[09:20:03] <retran> mongodb seems absurdly fast to me
[09:20:14] <remonvv> There might be some addressing/index size reasoning behind it too (only needing 24 bits to store size)
[09:20:48] <tadasZ> with comments i think i'd do it with 2 collections also, but i'm thinking about trying to make a simple invitation system for a first try - that would be an events collection with embedded invites
[09:20:49] <remonvv> retran, wouldn't go as far as absurdly ;) It hits a few sweet spots at the expense of durability/consistency.
[09:20:53] <retran> i have the whole IMDB actor/film/tv episode database imported into a mongodb deployed on a DigitalOcean 1GB server, and it's blazingly fast
[09:21:02] <retran> i would call it absurdly fast :|
[09:23:46] <PaiSho> i understand that the default is called "fs" with my driver
[09:23:52] <remonvv> retran, it's fast compared to slower things ;) If you do unsafe writes you can get amazing throughputs but it has durability issues.
[09:25:01] <PaiSho> also, the closest thing i've found to storing a mongodb document is to store a string of data
[09:25:02] <remonvv> PaiSho, GridFS is nothing more than splitting up large files into chunks and saving them into a dedicated collection (fs.chunks iirc). The metadata of the file goes into another collection for fast querying/listing.
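A shell sketch of where GridFS puts things, assuming the default bucket name "fs" that PaiSho mentions:

    // file metadata (filename, length, chunkSize, md5, uploadDate) lives here
    db.fs.files.findOne()
    // the binary payload is split into fixed-size chunks, ordered by n
    db.fs.chunks.find({ files_id: db.fs.files.findOne()._id }, { data: 0 }).sort({ n: 1 })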
[09:25:31] <tadasZ> well thank you guys for help! I was really confused up until now ;)
[09:25:47] <remonvv> PaiSho, hm? Documents are stored as BSON (a binary JSON-ish protocol, www.bsonspec.org)
[09:27:32] <remonvv> PaiSho, why would you want to do that?
[09:27:35] <retran> see above link, that's mongo searching entire IMDB actor db (from a cheapo VPS)
[09:28:17] <PaiSho> remonvv: it's an array of strings relevant to a user id
[09:28:24] <retran> it took 8 hours to create all the indexes for it on a 2 core bare metal machine, then moved data dir to a VPS
[09:28:25] <remonvv> PaiSho, there's no reason to use MongoDB to store huge, non-queryable data. That's what (distributed) filesystems are for. Just store the metadata.
[09:28:51] <PaiSho> the document name refers to the user id
[09:28:57] <remonvv> PaiSho, right, so store a document with userId, searchable fields if any and a reference (URL/FS id) and store the data itself on some sort of CDN or S3
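A sketch of the metadata-only document remonvv suggests (collection and field names hypothetical; the blob itself would live on a CDN, S3 or GridFS):

    db.userFiles.insert({
      userId: 42,                                       // searchable field
      contentType: "text/plain",
      url: "https://cdn.example.com/u/42/quotes.txt"    // reference to the real data
    })
    db.userFiles.ensureIndex({ userId: 1 })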
[09:28:57] <retran> result takes "execseconds":0.030270099639893
[09:29:37] <remonvv> PaiSho, MongoDB is not a good fit for serving large chunks of static data. But if you're pressed to use MongoDB then GridFS is for you.
[09:29:50] <PaiSho> but i'd like to have each string associated with a boolean value
[09:30:57] <remonvv> PaiSho, then you'd have to process the large chunk of strings, create a queryable bit of metadata with the boolean->string associations and query those.
[09:31:45] <remonvv> PaiSho, to be honest it sounds like you need something other than MongoDB. You're not hitting its sweet spot ;) Care to elaborate on why you'd need 16mb+ of data for a single user each time? Just curious
[10:11:50] <PaiSho> remonvv2: i think i've figured it out
[10:12:10] <PaiSho> ill create a gridfs file for each user
[10:12:25] <PaiSho> and that file will have lines of the user's quotes
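For contrast, a hypothetical alternative to one GridFS file per user: one document per quote keeps each string individually queryable and gives the boolean PaiSho asked about a natural home:

    db.quotes.insert({ userId: 42, text: "an example quote", flagged: false })
    db.quotes.ensureIndex({ userId: 1, flagged: 1 })
    // fetch only the flagged quotes for a user, no client-side parsing needed
    db.quotes.find({ userId: 42, flagged: true })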
[10:59:36] <moogway> 'morning (or 'afternoon) people, would anybody be kind enough to tell me whether a new index is created if I index a field in collection 'b' that references an objectid from collection 'a' (where the objectids are already indexed)?
[11:03:17] <moogway> to explain with an example, let's say I am storing posts in a collection db.posts, and each post document contains a reference to the creator of the post (aka 'user'), which is a document (with an indexed objectid) in collection 'users'
[11:04:23] <moogway> now, to search for all the posts by a user, do I have to create an index on the user field in db.posts, or will the index from db.users suffice?
[11:05:50] <moogway> i could embed the posts in the db.users documents itself, but am asking this to satisfy my curiosity
[11:05:54] <moogway> any help would be appreciated
[11:12:03] <Nodex> references are more of a common way of doing things than something actually supported in the engine itself. Some ORM/ODMs will do the work for you when referencing, and in those cases they do a SECOND lookup on the referenced collection, so ANY indexes MUST be on that second (referenced) collection
[11:14:04] <moogway> okay, so if i do something like db.posts.create_index('userid') will it create an index with the same content as the index created by db.users.create_index('userid')?
[11:17:06] <Nodex> references are not supported, they are merely a standard if you like... a common way to express something, they have ZERO functionality inside mongodb
[11:18:46] <moogway> hmm, i think i need to read a bit more to be able to understand your point better... i am confused about how indexes work
[11:19:57] <Nodex> creating a reference to something DOES NOT automagically create an index for that
[11:20:04] <Nodex> it doesn't get any simpler than that
[11:23:39] <Nodex> as a pointer, you really should avoid references where possible
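A shell sketch of Nodex's point, using moogway's posts/users example: the index that speeds up a posts-by-user query lives on db.posts, and creating the reference never creates it for you (the "alice" lookup is just illustrative):

    // _id in db.users is indexed automatically, but that index is never
    // consulted for a query against db.posts
    db.posts.ensureIndex({ userid: 1 })
    var uid = db.users.findOne({ name: "alice" })._id
    db.posts.find({ userid: uid })   // served by the posts-side index alone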
[11:26:12] <moogway> okay, which is better in mongodb? to create one collection with multiple indexes, or to create collections with one index each?
[11:44:43] <remonvv> moogway, better as in faster?
[11:45:09] <moogway> not just faster, but also in terms of RAM usage
[11:45:38] <remonvv> moogway, multiple indexes on a single collection should generally be more RAM efficient and about as fast. More importantly it's cleaner.
[13:14:40] <baconmania> Hey guys. If I have a blogPost collection, which contains embedded comments so the documents look like: { postTitle: 'etc', timestamp: 'etc', comments: [ {commentId: 123, commentText: 'etc'}, ... ] } then is there any way for me to do a find by comment ID and get back a single comment object?
[13:15:58] <Derick> not unless you use the aggregation framework to unwind on comments
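A sketch of the $unwind approach Derick mentions, using the schema from baconmania's message:

    db.blogPost.aggregate([
      { $match: { "comments.commentId": 123 } },    // cheap pre-filter on the post
      { $unwind: "$comments" },                     // one document per comment
      { $match: { "comments.commentId": 123 } },    // keep only the wanted comment
      { $project: { _id: 0, comment: "$comments" } }
    ])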
[13:26:20] <Nodex> haha, first = one matched or the first one matched
[13:26:40] <Nodex> this is cool, I can now change a load of code \o/
[13:26:49] <starfly> mrapple: with three shards in the US, create a replica set for each shard. It is highly recommended to have three members in a replica set for proper primary elections in failover scenarios, but one member could be an arbiter in the US and the other a replica in the EU. So, at a minimum, each shard would have two data-bearing replica set members and one arbiter: a total of 3 x 3 = 9 replica set members across 3 replica sets
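A sketch of the per-shard topology starfly describes, in shell commands (hostnames hypothetical); each shard is a replica set with two data-bearing members and one US arbiter:

    // run once per shard, connected to a member of that shard's set
    rs.initiate({ _id: "shardA", members: [
      { _id: 0, host: "us-a1.example.com:27017" },                        // US data member
      { _id: 1, host: "eu-a1.example.com:27017" },                        // EU data member
      { _id: 2, host: "us-arb1.example.com:27017", arbiterOnly: true }    // US arbiter
    ]})
    // then register the set as a shard from a mongos
    sh.addShard("shardA/us-a1.example.com:27017")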
[13:27:30] <Derick> Nodex: doesn't work for me though
[13:29:35] <Derick> i think it's because of two nested arrays
[13:30:15] <Nodex> ah, perhaps it's limited to 1 level deep
[13:38:25] <Derick> however, just doing it in the app might be faster :-)
[13:38:35] <baconmania> I'm surprised there isn't already a quick module that wraps this query and exposes an easy way to query over embedded docs, but yeah, I'm glad it can be done
[13:38:58] <Derick> it's often a sign of a badly designed schema that you need this...
[13:42:42] <baconmania> @Derick, the thing is, if I were to separate the comments out into their own collection, then I'd have to manually maintain integrity constraints like cascading deletes and such
[13:44:24] <hasse_> Derick: I can't get it to work when I try to query embedded documents. Are there any special rules for that?
[13:44:36] <Nodex> what happens when comments make your document greater than 16mb?
[13:46:32] <baconmania> Nodex: isn't that a problem with embedded documents in general?
[13:50:57] <Nodex> or 2 instead of 1 depending on how you do it
[13:51:52] <double_p> Derick: because in certain actions a lot of writes will happen... within 10 minutes, generating 4-5GB of WAN traffic. i would like to stretch the 10 minutes to avoid hitting isp bandwidth limits
[13:52:16] <double_p> the sync is to a DR site, so it won't matter if it takes 30 minutes instead of 10 (hopefully, hehe)
[13:52:22] <Derick> double_p: can you slow down the writes to the primary?
[13:52:48] <Derick> perhaps with a queuing system like rabbitmq?
[13:53:22] <double_p> no, the biz-owner wants that part to be as fast as possible
[13:53:50] <double_p> if there's no typical workflow for this, i'll likely traffic-throttle via the firewall.
[13:53:59] <Derick> i guess you'd have to rate limit the network interfaces with QOS and things like that
[13:54:04] <baconmania> Nodex: in general when you have normalized collections in mongo, let's say A and B which have a many-to-many relationship, do you maintain references from A to B AND from B to A? Or do you just do one or the other?
[13:54:35] <double_p> i was thinking of something like a chunky "redo log" transfer instead of moving every single write. but well then, altq it is
[13:54:43] <Nodex> baconmania : I don't tend to normalize much of anything but my top priority is performance which is not the same for everyone else
[13:55:22] <Nodex> I tend to keep semi persistent data embedded (things that don't change much - name/email etc)
[13:55:51] <double_p> one of these days i'd like a #mongodb-ops or so %-)
[13:55:56] <Nodex> I prefer a headache on the update/delete rather than load on the read because reads outweigh writes 90/10 in my apps
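A sketch of the trade-off Nodex describes (fields hypothetical): embed the rarely-changing user data so reads need no second lookup, and pay for it on update:

    db.posts.insert({
      title: "hello world",
      author: { _id: uid, name: "Alice", email: "alice@example.com" },   // denormalized copy
      body: "..."
    })
    // the update/delete "headache": propagate a profile change to every copy
    db.posts.update(
      { "author._id": uid },
      { $set: { "author.name": "Alicia" } },
      { multi: true }
    )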
[13:56:34] <Derick> double_p: if you have like a dev blog, I'm sure people would want to read about it - perhaps we can even aggregate it on the mongodb blog
[14:04:42] <baconmania> Nodex: so you're saying you don't normalize usually, but when you move out comments into a separate collection after the first page, isn't that normalization?
[14:05:54] <Nodex> if you say so then I guess it is
[14:06:08] <Nodex> not really my idea of normalizing but each to their own
[14:13:44] <bee_keeper> hi, i have a celery task which connects to mongodb to store a record. The task is run multiple times, thus opening and closing a new connection each time - is this a problem?
[14:22:11] <trajik210> bee_keeper: Not if your network / servers are okay with it. It's pretty much up to you whether you want your app to open a new connection to MongoDB each time or whether you want to reuse a connection pool.
[14:25:03] <michael_____> does cascade in general mean 'remove'?
[14:54:46] <michael_____> mongodb on the same server or not?
[15:08:46] <Kzim> hi, is there a clean way to fix the FQDN/non-FQDN mongoS configuration issue without stopping every mongo? thx
[15:23:48] <patrickgamer> Hey guys. I've got an instance of mongo that gets kicked off by Play (Scala framework). I want to clean up the data but I can't access it from the cli
[15:24:41] <patrickgamer> i'm following http://stackoverflow.com/questions/12831939/couldnt-connect-to-server-127-0-0-127017 but not really getting anywhere.
[17:11:50] <cmendes0101|> Does mongo $near use the haversine formula?
[17:22:09] <byungsin> has anyone used tokuMX before? how was it?
[17:37:18] <leifw> byungsin: I'm a tokumx developer if you have any questions for me (but I'll stay out of the way otherwise)
[20:05:27] <polynomial2> in a sharded setup, are there three total processes running on each machine? mongod for the db, mongod as a config server, and mongos as the query router?
[21:14:44] <markgm> Within a doctrine query is it possible to query by a reference within an embedded document?
[21:17:07] <markgm> like ->field('subevent.form')->references($form), where subevent is an embedded document and form is a referenced document
[22:56:25] <remonvv> Well, we just had a rather catastrophic MongoDB failure. Anyone else ever had issues with balancing getting stuck after a series of addShard commands?
[23:21:49] <mrapple> so I understand I should run one mongos server per app server, but, how many config servers should I run?
[23:25:51] <mrapple> i can answer my own question, guess i should run exactly three
[23:37:21] <remonvv> mrapple, 3. No more, usually no less.