[00:52:23] <BigOrangeSU> hi all, is anyone familiar with Mongoid? I was hoping to understand how it supports updates to embedded objects within an array. Does it override the whole array element?
[00:54:15] <Boomtime> unless it has a way of figuring out the differences it pretty much has to - meanwhile, if you added a new item in the middle of the array then there is no other option anyway
[00:58:22] <GothAlice> BigOrangeSU Boomtime : MongoEngine recursively wraps arrays and sub-documents in proxies that record modifications. Certain ODMs do, in fact, implement proper delta .save() operations.
[00:59:01] <GothAlice> As for Mongoid, I do not believe it does from what I remember of the last time I went spelunking through the code.
[00:59:44] <Boomtime> I do not dispute the ability to do this, but I see you've arrived at the very same conclusion I initially stated, thanks
[01:02:01] <GothAlice> Boomtime: Elaboration FTW. Inserting values into the middle of an array _can_ be "delta"d, but depending on parallelism of writes to one record can result in the inserted element not matching the expected position in the final record. (Sometimes this is an acceptable risk.)
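A rough illustration of the difference being discussed, in mongo shell terms; the collection, field, and variable names below are invented for the sketch:

```javascript
// 1) An ODM without change tracking rewrites the whole embedded array:
db.posts.update(
    { _id: postId },
    { $set: { comments: entireCommentsArray } }     // every element is resent
);

// 2) A delta-style save touches only what changed:
db.posts.update(
    { _id: postId, "comments.author": "alice" },
    { $set: { "comments.$.text": "edited text" } }  // positional update of one element
);

// 3) Even a mid-array insert can be sent as a delta (MongoDB 2.6+ $position),
//    with the concurrency caveat GothAlice notes above:
db.posts.update(
    { _id: postId },
    { $push: { comments: { $each: [newComment], $position: 2 } } }
);
```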
[01:02:44] <zamnuts> question: if i have usePowerOf2Sizes enabled, and a chunk size of 1mb within GridFS, will a 1kb file consume 1mb or the next highest power of 2, i.e. 1kb? (disregard the n/_id/chunkSize meta overhead)
[01:07:28] <Boomtime> GridFS does not pad out the file chunk documents, it only caps their size
[01:07:59] <Boomtime> regardless of the chunksize you use, the last chunk of a file will be the size of the remainder, likely less than the chunksize
[01:08:26] <Boomtime> at that point, the other options kick in, if you use powerOfTwo then that is applied
[01:08:28] <zamnuts> Boomtime, so a 1kb file with a chunk size of 1mb will only consume 1kb on disk, and the extra 1023kb will be free and usable, right?
[01:09:03] <Boomtime> the numbers may not be precisely that, but loosely, yes
[01:09:24] <zamnuts> Boomtime, that is understandable, i'm simplifying to get the idea...
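A quick way to verify Boomtime's point from the mongo shell, assuming a small file has already been stored in the default "fs" bucket (the filename here is made up):

```javascript
// Hypothetical file; store one first, e.g. with `mongofiles put tiny.bin`.
var f = db.fs.files.findOne({ filename: "tiny.bin" });
printjson({ length: f.length, chunkSize: f.chunkSize });  // chunkSize is a cap, not an allocation

// Each chunk document only holds the bytes it actually needs; the last (or only)
// chunk is just the remainder, so a 1kb file stays ~1kb even with a 1mb chunkSize.
db.fs.chunks.find({ files_id: f._id }).forEach(function (c) {
    print("chunk " + c.n + ": ~" + Object.bsonsize(c) + " bytes (BSON size of the chunk doc)");
});
```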
[01:09:40] <zamnuts> Boomtime, won't that increase fragmentation then?
[01:09:42] <GothAlice> Boomtime: Can you confirm the default "padding factor" on newly inserted documents for me?
[01:10:51] <Boomtime> GothAlice: I assume you mean "new collections", since new documents use the padding factor of the collection - also, it only applies when not using powerOfTwo
[01:11:08] <GothAlice> Boomtime: I was asking that one for me. :3
[01:11:14] <Boomtime> i have no idea what the seed value is, it must be low though
[01:12:11] <Boomtime> zamnuts: very technically, yes, files of different sizes will cause a little fragmentation to occur
[01:12:22] <GothAlice> zamnuts: Fragmentation will mostly depend on the rate at which you delete things. And yes, unless you compact (defragment) it occasionally, a hole left behind after deleting a final chunk will only ever fit a final chunk of that size or less in the future. (As long as MongoDB's on-disk storage rules remain otherwise the same operating under powerOfTwo.) *digs more docs.
[01:14:43] <zamnuts> Boomtime, GothAlice, thanks so much - that answers that question; aside: will a higher chunk size in gridfs increase my read/write throughput?
[01:15:13] <GothAlice> A ha. I grok. The default powerOfTwo strategy would likely decrease document fragmentation a little, but waste a little bit of space in the process.
[01:15:48] <zamnuts> GothAlice, that is correct, that effect is prominent in https://jira.mongodb.org/browse/SERVER-13331
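For reference, the MMAPv1-era knobs being discussed can be inspected and toggled from the shell (2.6-style syntax; the GridFS chunks collection is used here just as an example):

```javascript
// Switch a collection's allocation strategy to power-of-2 sizes:
db.runCommand({ collMod: "fs.chunks", usePowerOf2Sizes: true });

// Check the adaptive padding factor a collection is currently using
// (1 means no extra padding is being added to new documents):
db.fs.chunks.stats().paddingFactor;
```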
[01:17:32] <GothAlice> zamnuts: To a point. It's something worth experimenting with and benchmarking: larger chunk size will reduce the ratio of overhead to data on-the-wire, but the wire protocol already uses getMore() operations and other tricks to chop your data up in ways you can't overcome. There'll be a break-even point in performance. Not sure if MongoDB does jumbo frames on IPv4 or not, but IPv6 mandates support for it.
[01:18:57] <GothAlice> (Optimization without measurement is by definition premature. :)
[01:20:59] <zamnuts> GothAlice, that makes pretty good sense; i did run a test, increased chunk size from 255kb (default) to just less than 1mb, and saw no performance increase during writes, but it improved 2x for reads, granted this was a local loopback test
[01:21:50] <GothAlice> Loopback cheats: it's a shared memory circular buffer—doesn't even touch the network interface card.
[01:23:00] <GothAlice> Even if it's on the same network, always test over the full stack, and it's best to match the test DB's performance as closely as possible to production's, so you'll know how things will behave when deployed. :)
[01:23:07] <zamnuts> with that in mind, given what you mentioned about IPv4/jumbo frames, i might not even get that since at that point mongo won't be the bottleneck~ ok, i'll have to do more testing, thanks for the sanity check tho
[01:24:45] <GothAlice> IPv6 is nice if you can get it; it's often easier to get it internally. Less overhead, larger packets possible out-of-the-box.
[01:25:35] <zamnuts> GothAlice, i'll have to look into that, the mongodb cluster is in a private vrack, so it is a very feasible change
[01:25:53] <zamnuts> (the prod and preprod ones that is...)
[01:40:18] <adoming> Hey all, I'm a noob at mongo, I wanted to get some feedback on this schema. My use case: I am creating a collection of documents in a one-to-many relationship. I want to create a collection of unique URLs to view each document, then each time a link is viewed I want to store analytics info about the view of the link. My question is, do I break up the sub-objects of Links and Analytics like in an RDBMS, or keep it as is?
[01:59:26] <ggoodman> is there a mechanism by which I can migrate entire collections from one remote machine to another while skipping the dump/restore process?
[02:02:26] <Boomtime> ggoodman: maybe copydb will be helpful http://docs.mongodb.org/manual/reference/command/copydb/
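Roughly, copydb is run against the admin database on the destination and pulls from the source; the hostname and database names below are placeholders:

```javascript
// Run on the *destination* mongod; "source.example.com" is a placeholder.
db.adminCommand({
    copydb: 1,
    fromhost: "source.example.com:27017",
    fromdb: "mydb",
    todb: "mydb"
});
```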
[02:03:41] <zamnuts> ggoodman, mongoexport -d db -c coll | ssh mongouser@host "mongoimport -d db -c coll"; # will pipe the export of localhost to the remote mongod, that is if the problem is simply you not wanting to transfer files
[02:04:13] <ggoodman> zamnuts: helping me on every front!
[02:04:29] <Boomtime> note that mongoexport/mongoimport may not preserve all type information
[02:04:39] <zamnuts> ggoodman, make sure you test it first
[02:05:15] <zamnuts> Boomtime, perhaps mongodump/mongorestore then :)
[02:09:37] <Boomtime> mongodump can write to stdout (only really good for piping) but this tactic doesn't achieve the goal of "skipping the dump/restore process", it merely makes that process avoid using the local disk (which may be the desired result)
[02:18:50] <ggoodman> can I do mongodump directly to the db dir on the target system to avoid the restore step?
[02:30:09] <zamnuts> cheeser, that seems much easier, only problem is it doesn't do a snapshot, compared to mongodump --oplog ? or am i missing something?
[03:15:57] <ggoodman> Can anyone hint to me what tool would make sense to migrate all records matching a query from one remote host to the local instance?
[03:19:13] <ggoodman> cheeser: will that clobber the existing database? What I'd like to do is only update those documents that match the query from the source database server
[03:20:14] <cheeser> you can't do updates that way, no.
[03:20:41] <cheeser> you can do mongoexport/mongoimport but you have to be careful of types in your docs.
[03:20:53] <cheeser> json has far fewer types than bson
[03:22:32] <zamnuts> ggoodman, if you can do this in real time, an application tied to the oplog might be better...
[03:22:54] <ggoodman> zamnuts: that is a bit above my understanding
[03:23:07] <ggoodman> Neither database is connected to production instances atm
[03:23:37] <ggoodman> perhaps I will need to do it by code?
[03:25:39] <ggoodman> annoying, this lack of a simple mechanism to move a subset of documents from one instance to another (overwriting any at the target)
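One way to do exactly that from the mongo shell, sketched with placeholder host, database, collection, and query names: open a second connection to the source, iterate the matching documents, and save() them locally so documents with the same _id are overwritten:

```javascript
// All names here are placeholders; adjust host, db, collection and query.
var remote = new Mongo("source.example.com:27017").getDB("mydb");
var local  = db.getSiblingDB("mydb");

remote.getCollection("mycoll").find({ status: "active" }).forEach(function (doc) {
    // save() inserts when the _id is new and replaces the whole document
    // when it already exists, i.e. it clobbers only the matching subset.
    local.getCollection("mycoll").save(doc);
});
```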
[03:57:49] <tomhardy> hi guys.... i've been looking at indexeddb to write client side applications and need a backend database to store to, for which mongodb looks like a good fit. Conceptually I'm finding it quite difficult to understand coming from a standard sql background (i've worked on many enormous sql systems). If you have a users table and you constantly need to join, say, the user's name to other tables... how would you replicate that functionality in mongodb?
[03:59:45] <zamnuts> there's an optional 4th in there for options, which i'm using for {upsert:true,new:true}
[04:00:17] <ggoodman> so should I stick a sort in there or is that optional... and yes I'm reading :D
[04:00:34] <zamnuts> tomhardy, yes, denormalize the data, there are no joins in mongodb, instead you nest documents
[04:01:00] <ggoodman> zamnuts: just felt like it was missing the updated document
[04:01:05] <zamnuts> ggoodman, sort is required, but if you don't really care, just sort on something indexed, e.g. _id
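The question being answered isn't captured in the log, but it reads like a driver-level findAndModify with positional arguments; the mongo shell equivalent of the pattern being described (a sort plus {upsert: true, new: true}) looks roughly like this, with invented collection and field names:

```javascript
// Sketch only; "jobs" and "state" are made-up names.
db.jobs.findAndModify({
    query:  { state: "pending" },
    sort:   { _id: 1 },                     // deterministic pick; _id is always indexed
    update: { $set: { state: "claimed" } },
    new:    true,                           // return the modified document, not the original
    upsert: true                            // insert if nothing matched
});
```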
[04:01:22] <tomhardy> zamnuts: ok.. so when you say update a username, you then need to find everywhere that name is stored and update it?
[04:02:42] <zamnuts> tomhardy, yes unfortunately, there are no foreign keys, so there is no other way... you could store a user reference as an ObjectId, and have a separate collection that maps ObjectId to username<string>, but you still cannot join and it will require 2 queries
[04:03:10] <Boomtime> tomhardy: how often do you change username compared to reading the current value?
[04:03:23] <tomhardy> i understand there are no joins.. but i mean you can implement a join by collecting up all the users values and then firing a separate query to collect the data
[04:04:25] <tomhardy> Boomtime: yeah rarely in that case, but really i'm trying to get conceptually how you would do something like that
[04:05:20] <Boomtime> most of the time you find you are better off optimizing the read condition - i.e. embed the data - i ask the question to make you think about it
[04:05:44] <tomhardy> my real example is actually: Users(id, name), 300 of them; WorkItems(id, user_id, title), 3000 per user; TimeItems(work_item_id, starttime, endtime), 30 per day per user
[04:06:17] <zamnuts> user_id becomes the Users.name typically
[04:06:23] <Boomtime> store username in workitem _as well_
[04:06:41] <Boomtime> you can still have a users table as the canonical source
[04:07:15] <Boomtime> but your reads can have a vastly reduced workload by storing the needed data inline with your common result documents
[04:07:45] <Boomtime> yes, updating a username now becomes a considerable task, which is why i ask: how often does a username change?
[04:09:44] <zamnuts> considerable yes, but not to be confused with scary, this is quite simple: workitems.update({username:'oldusername'},{$set:{username:'newusername'}},{multi:true});
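A sketch of the shape being suggested, with invented field names: keep users as the canonical source, but duplicate the username into each work item so common reads never need a second query; the rename then touches every affected work item, as zamnuts shows above:

```javascript
// Canonical user document:
var userId = ObjectId();
db.users.insert({ _id: userId, name: "tomhardy" });

// Work item carries the username inline (denormalized) plus the reference:
db.workitems.insert({
    user_id:  userId,            // canonical reference
    username: "tomhardy",        // duplicated for cheap reads
    title:    "Migrate reports",
    timeitems: [                 // embedding is one option; a separate collection is another
        { start: ISODate("2014-12-01T09:00:00Z"), end: ISODate("2014-12-01T10:30:00Z") }
    ]
});

// Renaming a user is then a multi-update across work items:
db.workitems.update({ username: "tomhardy" },
                    { $set: { username: "thardy" } },
                    { multi: true });
```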
[04:10:02] <georgeblazer> hello there, I'm having a weird issue after I migrated our DB to another piece of hardware
[04:20:56] <zamnuts> tomhardy, i'm hesitant to say "yes" ... do note that you will have a collection write-lock, and if it IS slow, you will incur yields
[04:21:28] <joannac> georgeblazer: you are on a terribly old version
[04:22:20] <georgeblazer> joannac: sure, but I doubt it's the culprit
[04:22:24] <georgeblazer> unless you think it'd help
[04:22:37] <zamnuts> tomhardy, the cumulative time your reads take with embedded data will be much lower than the time it would take to perform a "join" equivalent, so i wouldn't worry about the speed of the update
[04:22:48] <georgeblazer> i have a slave where everything works, but on the master it doesn't
[04:23:08] <zamnuts> hopefully that last statement isn't too negligent
[04:23:15] <georgeblazer> REALLY wanted to see if I can _fix_ the master w/o cutting over to the slave and/or losing the data post-upgrade
[04:23:32] <joannac> georgeblazer: the slave doesn't replicate writes?
[04:23:48] <tomhardy> zamnuts: yeah, i'm finding it very hard to work out an appropriate way of approaching the problem coming from an sql background
[04:24:04] <georgeblazer> joannac: yeah, it looks like replication is currently broken :(
[04:24:05] <joannac> georgeblazer: your version is old enough that I don't know if my knowledge applies to your version
[04:24:21] <georgeblazer> joannac, so you think an upgrade is in order?
[04:26:13] <zamnuts> question: how does everyone deploy their mongos instances? on the same node as the driver? on dedicated vm/hardware? what about a cluster config: list the mongos instances in the driver, or put them behind 1 or several LBs?
[04:28:15] <georgeblazer> joannac: just upgraded the DB, but still in the same boat
[04:28:59] <zamnuts> georgeblazer, why are you hesitant with switching over to a secondary, then rebuilding the old primary?
[04:29:15] <georgeblazer> zamnuts: because there were some writes to the primary
[04:29:23] <georgeblazer> i'd love not to lose it if possible
[04:29:58] <zamnuts> georgeblazer, can you replay the oplog on the secondary from where the broken primary left off?
[04:39:25] <georgeblazer> shut down the old master, and copy /var/db?
[04:40:37] <zamnuts> if you can read it, then you can move it... perhaps not by normal migration channels however. might need to do a db.col.find({...});
[04:41:45] <zamnuts> depends on 2 things: if the size of the db is "small" then replicating from scratch will be fine; if you cannot reliably copy a good db from /var/db, then you have no choice but to replicate from scratch
[04:42:06] <georgeblazer> another thing is, is there any way to find the most recent entries if I don't have a created_at field?
[04:43:07] <zamnuts> the recommended approach is: mongodump > mongorestore > replicate from point-in-time to play catchup; the point of mongodump/mongorestore is so you don't have to replicate EVERYTHING since you'll be starting from a large subset
[04:43:35] <georgeblazer> so I'd restore the old master on both new master and slave, right?
[04:43:44] <georgeblazer> and then how do I start replicating at the right position?
[04:44:01] <georgeblazer> also, can't i just copy /var/db instead of doing mongodump/mongorestore?
[04:44:24] <zamnuts> you don't have to specify the right position, mongod will figure that out... if there were any interim writes between the mongodump and the repl setup, it'll get those (granted your oplog cap is large enough for your write volume)
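If it helps judge whether the oplog cap really is large enough for that catch-up, the shell has helpers that report the oplog size, its time window, and replication lag; they apply to replica set members and, as far as I recall, to a legacy master as well:

```javascript
// How much oplog history is available, and the first/last event times:
db.printReplicationInfo();

// How far behind each secondary/slave currently is:
db.printSlaveReplicationInfo();
```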
[04:45:20] <zamnuts> yes, you can copy from /var/db, but you have to stop the mongod process that is using those files.
[04:45:34] <zamnuts> if you can't stop the process, your only course of action is a mongodump
[04:45:50] <zamnuts> if you're low on disk, you could pipe mongodump via stdout over ssh to a new node
[04:46:14] <zamnuts> or perform the mongodump from the remote host by connecting to your target over the wire
[04:48:11] <zamnuts> georgeblazer, "so I'd restore the old master on both new master and slave, right?" that is correct (sounds like a 3-piece replset), but i thought your "old master" was corrupt?
[04:48:55] <zamnuts> georgeblazer, are you using a replset or a legacy master/slave configuration?
[05:02:00] <georgeblazer> my old old master is fine, but it doesn't have any space left on disk
[05:02:33] <georgeblazer> it's on EBS, so I can shut down mongo and copy the DB to another instance
[05:04:11] <zamnuts> georgeblazer, going to be honest w/ you, i've never worked with master/slave config, only the current replset
[05:05:00] <zamnuts> georgeblazer, you got enough free space to work on? i.e. more than a few hundred mb?
[05:05:01] <georgeblazer> zamnuts: well, one way or another, it sounds like I can copy /var/db from the shutdown DB to both master and slave, and hopefully the slave will start slaving
[05:05:21] <georgeblazer> zamnuts: not on the main EBS volume, I'd attach another volume
[06:00:37] <zamnuts> the first 4 bytes of an ObjectId are seconds since the unix epoch, and the last 3 bytes are a counter, so generally... _id can be used to sort by "creation" date
[06:01:01] <zamnuts> see http://docs.mongodb.org/manual/reference/object-id/
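Concretely, that means both of these work without any extra timestamp field (collection name invented):

```javascript
// Newest-first by creation time, using only the default _id index:
db.mycollection.find().sort({ _id: -1 }).limit(5);

// The embedded timestamp can also be pulled out of any ObjectId directly:
ObjectId("507f1f77bcf86cd799439011").getTimestamp();   // -> the ISODate encoded in the first 4 bytes
```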
[06:21:10] <georgeblazer> is there any way to find an object whose BSON representation is a negative number?
[06:23:14] <Boomtime> BSON is a data encoding system, what do you mean by "is a negative number" in this context?
[06:23:44] <Boomtime> documents, arrays, integers, dates, etc, all have BSON representations
[09:32:27] <h3m5k2> I want to run a repairDatabase and understand that I need enough free disk to "hold both the old and new database files + 2GB". My question is: what is "database files" referring to in stats()? is it storageSize or totalSize or dataSize? This is crucial info as my storageSize is 30GB+ while dataSize is only ~1.5GB and I have about 10GB of free space.
[09:33:42] <kali> h3m5k2: storageSize is the current space occupied. so it's "old database files"
[09:34:17] <kali> h3m5k2: the new database will be at least "totalSize" (total is actually data+indexes)
[09:34:57] <h3m5k2> kali: ok, so the free disk space minimum is totalSize + 2gb
[09:35:39] <kali> h3m5k2: yeah. but to be on the safe side, i would count 2*totalSize + 2GB
[09:36:13] <h3m5k2> kali: ok. great thanks a lot for clarifying that!
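The numbers kali refers to can be read straight from the shell (scaled here to GB): dataSize plus indexSize is roughly the "totalSize"-style figure to budget for, while storageSize is what the current files already occupy:

```javascript
var s = db.stats(1024 * 1024 * 1024);   // scale results to GB
printjson({
    dataSize:    s.dataSize,     // raw data
    indexSize:   s.indexSize,    // indexes
    storageSize: s.storageSize,  // space allocated by the current ("old") files
    fileSize:    s.fileSize      // total size of the data files on disk
});
```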
[12:29:02] <alexi5> I am new to mongodb and I am wondering why mongodb is used instead of an ORM on an RDBMS?
[12:30:00] <vincent-> alexi5: I think that question is widely answered over the Internet. You just need to Google it.
[12:31:04] <nofxx_> alexi5, ORM is to RDBMS as ODM is to mongo
[12:31:18] <nofxx_> you're probably going to choose an ODM to work faster...
[12:32:28] <alexi5> i found that mongodb is often preferred due to the impedance mismatch in converting objects to relational structures; objects are similar to documents, so there is less mismatch between the two
[12:33:05] <Derick> IMO, an ODM is now just a simple small layer checking against data types - not an ORM that needs to construct queries, pull in things from different tables etc...
[12:33:13] <nofxx_> alexi5, I heard the same of RDBMSes... it's really a matter of YOUR data
[12:35:04] <alexi5> I am thinking of using mongodb for a project I have to catalog network nodes on our network at work. catalog as in store info about configuration, location, its components and also pictures that engineers take of the equipment
[12:41:31] <alexi5> other than blogs and news sites, are there any other use cases for mongodb ?
[14:56:54] <alexi5> if i have an application that inserts a value into an array in a document and also increments a field in that document, how does mongodb handle this without causing race conditions, with multiple connections attempting to do the same thing?
[15:28:26] <GothAlice> alexi5: It doesn't. :) It does, however, give you a method by which you can implement your own locking semantics, if desired. Using update-if-not-modified and/or upserts.
[15:29:34] <GothAlice> alexi5: Basically, individual update operations are atomic (i.e. $set, $push, etc.) and each of these is applied, one at a time. Using update-if-not-modified you query for an expected value and only perform some update if that expected value is found. If another process snuck in and changed it since the last time the first process loaded it, the first process will fail its update in a predictable way. (nModified=0)
[15:30:45] <GothAlice> Upserts are "insert this if not found, update it otherwise" updates. This lets you avoid needing to be so worried about inserting vs. updating records that may or may not exist.
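A minimal sketch of the two patterns GothAlice describes, with invented collection and field names (2.6-era shell, where update() returns a WriteResult):

```javascript
// Update-if-not-modified: only apply the change if the document still looks
// the way we last read it; otherwise nModified is 0 and we re-read and retry.
var res = db.stats_docs.update(
    { _id: "page-42", version: 7 },                            // expected state
    { $push: { samples: 3 }, $inc: { total: 3, version: 1 } }  // one atomic document update
);
if (res.nModified === 0) {
    // someone else changed the document first; reload and try again
}

// Upsert: insert the document if it doesn't exist yet, update it otherwise.
db.stats_docs.update(
    { _id: "page-42" },
    { $inc: { total: 1 } },
    { upsert: true }
);
```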
[15:46:43] <agend> what i want to achieve is to insert multiple docs into mongo - but if a document with the same _id already exists I want it to be replaced - how can I do it?
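One common way to get that behaviour from the shell is save(), which inserts when the _id is new and replaces the existing document otherwise; a sketch with an invented collection name and a placeholder array:

```javascript
// docsToLoad is a placeholder for the array of documents to insert or replace.
docsToLoad.forEach(function (doc) {
    db.mycollection.save(doc);   // insert if _id is new, full replace if it exists
});
```

(The 2.6 Bulk API can batch the same thing as upsert/replaceOne operations to cut down on round trips.)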
[18:52:16] <bttf> new to mongo ... is it possible for a node app to have something like a local instance of mongodb that is runnable, rather than installing it systemwide on the machine?
[18:52:27] <bttf> for development/testing purposes
[18:55:27] <syllogismos> "note" : "thisIsAnEstimate", "pagesInMemory" : 79799, "computationTimeMicros" : 18343, "overSeconds" : 770 these are the workingSet stats of our mongodb
[20:42:20] <Thinh> Hi guys, if I have a collection with the structure: {id: ObjectID, courses: [CourseObjectID1, CourseObjectID2, ... ]}
[20:42:39] <Thinh> How can I select all objects from that collection that have, say, CourseObjectID2 in the courses field?
[20:45:38] <Thinh> nevermind, I need $elemMatch :)
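Worth noting: for a single value, plain equality already matches against array elements, so $elemMatch isn't strictly required here; it matters when several conditions must hold for the same element (collection names invented):

```javascript
// Plain equality matches any array element equal to the value:
db.enrollments.find({ courses: CourseObjectID2 });

// $elemMatch is for multiple conditions on one and the same element:
db.scores.find({ results: { $elemMatch: { $gte: 80, $lt: 85 } } });
```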
[20:54:14] <blizzow> I dropped a lot of my collections (50 or so) and re-created them with sharding enabled. I enabled sharding while the collections were empty and then started to insert data. Almost 24 hours later, the chunks have still not distributed amongst my shards. Does anyone know what might cause the sharding to be so slow? On a side note, I have one collection that I cannot drop, because the "metadata is being changed, but I don't see the c
[21:32:53] <olivierrr> question: for a 'kik' type chat app, is it best to have a document per message or per session
[21:33:27] <olivierrr> session = chat between 2 or more members
[21:41:15] <jonasliljestrand> findAndModify is atomic right? :S
[21:48:41] <jonasliljestrand> If i perform findAndModify { multi: false }, can this document be read by any other connection at the same time? And when the write lock is released, will the other connections get that document if it still matches their criteria?
[22:00:13] <jonasliljestrand> also, is it possible to lock a document in all databases for a time
[22:11:21] <dlewis> is there a general purpose Mongo -> SQL importer? I plan on generating a database from the SQL import, so it doesn't have to be too pretty
[22:12:34] <dlewis> this is for analytics purposes
[22:57:00] <blizzow> From the mongo shell I want to run db.my_collection.find().sort({ _id: -1 }).limit(1) and only return the ObjectID instead of all the fields in the document. How do I limit the output to not show any of the other fields in the document?
[23:03:21] <modulus^> blizzow: you cannot parse the JSON output?
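What blizzow is likely after is a projection, passed as the second argument to find(), which limits the returned fields:

```javascript
// Return only _id of the most recently created document:
db.my_collection.find({}, { _id: 1 }).sort({ _id: -1 }).limit(1);
```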
[23:11:15] <dpg2> how do I query for something like { "object": { "id" : "19198dc5-5040-4034-9edb-d0e9cb3207e2" } } ? is it just { "object.id": "19198dc5-5040-4034-9edb-d0e9cb3207e2" } ?
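Yes, dot notation is the way to reach into an embedded document (collection name invented):

```javascript
db.mycollection.find({ "object.id": "19198dc5-5040-4034-9edb-d0e9cb3207e2" });
```

(Querying with the full subdocument { object: { id: ... } } also works, but only if the embedded document matches exactly, field for field.)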