pmxbot IRC Log Viewer

[04:17:28] <mgeorge> 346 people and this chan is dead for hours

[04:17:40] <GothAlice> Dead… but listening. Muhahahahaha.

[04:22:24] <mgeorge> o.O

[04:22:30] <mgeorge> im fairly new to mongodb and learning quite a bit

[04:22:47] <mgeorge> still trying to figure out the conecepts of large scale architectural deployments

[04:23:06] <mgeorge> http://www.severalnines.com/sites/default/files/shard_repset.png

[04:23:31] <mgeorge> on this image, correct me if im wrong, in shard1, rs0 on primary, rs0 on the two secondaries, those are all 3 seperate servers?

[04:24:22] <GothAlice> mgeorge: Each rounded rectangle on that chart can, in theory, be a physically distinct box.

[04:25:09] <GothAlice> I combine my config servers and typical primaries.

[04:25:17] <mgeorge> and ideally in a large HA enviroment, physically distinct boxes would be prefered

[04:25:38] <mgeorge> im currently doing a startup and evaluating mongodb as our primary database for our cloud based application

[04:26:40] <GothAlice> mgeorge: Certainly. Though even more exotic setups are possible, with datacenter physicality and distance being used by clients to direct queries to physically close servers. Lagged replication for "one hour delayed" backups, hidden replicas for offsite backups, etc., etc.

[04:27:14] <mgeorge> interesting, the more i learn about mongodb the more i like it

[04:27:20] <GothAlice> mgeorge: Personally, at home, I have 24TiB+ of data in MongoDB. At work, far less, but I've had great success on multiple deployments, including "web-scale" Facebook games.

[04:27:32] <mgeorge> now lets say i have a large scale deployment of mongodb, a dc in nyc, dallas and sf

[04:27:52] <mgeorge> can i make the primary for specific databases located in sf, vice versa

[04:29:34] <GothAlice> You can have multiple clusters with different sharding/replication setups to handle that. Each replica set can only have one primary, and it's for all databases on that set. Note you can adjust what gets sent there by wrapping it all in sharding. :)

[04:30:57] <mgeorge> so the primary for the rs is tied to the shard, not the db?

[04:31:34] <GothAlice> Correct.

[04:32:04] <GothAlice> However MongoDB gives you some pretty good control over how records get allocated across a sharded cluster, though.

[04:32:25] <mgeorge> so the application i'm currently developing does a shit ton of writes

[04:32:56] <GothAlice> Replication won't help with that, but sharding can. It'll distribute those writes down the chain to one of the shards.

[04:32:59] <mgeorge> and i want to be able to direct traffic on the west cost to the SF datacenter, but having that data go there just to jump on a tls circuit to nyc if the primary is there would increase latency significantly

[04:33:22] <GothAlice> (Thus with three shards and an unrealistically optimistic view of every record being inserted round-robin, you can approach 3x the performance.)

[04:35:03] <GothAlice> You could, in fact, shard your data by locality. Have a field on each document (since sharding is configured per-collection) that defines the data center to use. Not a typical approach; generally you'd have one "cluster" (complete copy of that chart) for each local dataset, with offsite replication for HA and backup.

[04:35:31] <mgeorge> i see

[04:35:41] <mgeorge> so 3 main dc's plus a DR/Backup dc

[04:38:06] <GothAlice> Or three DCs with a hidden secondary (backup-only) from every other DC in each. (A has a secondary from B and C, etc.)

[04:38:51] <mgeorge> okay so now im starting to understanding the architecture :) now i gotta figure out how to minimize cost

[04:39:44] <mgeorge> having a feature where specifying which server is a master for a sharded replica set would be huge in cost savings and minimize complexity

[04:40:03] <GothAlice> https://gist.github.com/amcgregor/c33da0d76350f7018875 is a nice little script which automates the process of setting up that exact chart you gave (with authentication enabled) on one host for testing purposes. (Adjust to taste to try out different arrangements; test durability by kill -9'ing processes. :)

[04:40:04] <mgeorge> on a per db basis that is

[04:40:18] <mgeorge> anywho i gotta get to bed, i'll sleep on it ^_^

[04:40:25] <GothAlice> mgeorge: Minimzing complexity on the wrong side.

[04:40:26] <GothAlice> ;)

[10:16:32] <djlee> Hi all, easy one which has me confused. I'm trying to remove a field from all documents using unset, that field is an array. Heres my query, but it's not working, any pointers? db.xxxxx.update({'cleared_by':{'$exists':true}}, {'$unset':{'cleared_by':''}});

[10:28:33] <kali> djlee: i use {'$unset':{'cleared_by':true}} but i'm not sure that's a problem. are you aware "update" updates only one document by default ? it has a "multi" option

[10:32:58] <djlee> kali: thanks, yeah i just realised i needed to set the multi option. Got too used to working with an ORM that i forgot the raw db commands :/

[11:25:08] <Lope> how can I prevent race conditions in my code? Lets say I've got 2 worker programs, that look in the same DB to find jobs. So initially there will be a document where job will have available:1, and then worker A will find it, and set available:0 and start working on that job. but it's possible that workerB could do the same thing at the same time?

[11:25:29] <Lope> these two worker programs will run on separate machines.

[11:27:31] <kali> Lope: use findAndModify to $set: { lock: { locked_by: <worker id>, locked_until: <now + 1 minute> } }, while looking for jobs without a lock

[11:27:59] <kali> Lope: delete the job when they are done

[11:28:19] <kali> Lope: and check on a regular basis for expire locks (for worker failures)

[11:28:58] <kali> Lope: adjust my 1 minute placeholder to be something unambiguously longer than any job running time

[11:29:18] <Lope> kali: is it not possible for 2 workers to read the same unlocked job before either one's write request has locked the job?

[11:29:24] <Lope> so they both think they're locking it?

[11:29:47] <kali> findAndModify is atomic

[11:30:27] <Lope> does that mean that 2 clients can't use findandmodify at the same time with the same query?

[11:31:05] <kali> yes, it means one of them has to be done entirealy (both the find and the modify part) before the other can start

[11:31:18] <Lope> oh great

[11:31:28] <blaubarschbube> hi. some time ago i asked here regarding hardware specs for an additional shard. i got told that it is a good idea to use similar hardware configuration. anyway, is there anywhere documented if the load balancer is able to handle different specs of multiple shards?

[11:31:28] <kali> Lope: believe me, it does work :)

[11:31:36] <Lope> so you can have a findandmodify run at the same time as a find, but not at the same time as another findandmodify? :)

[11:31:58] <Lope> well that makes life easier :) thanks

[11:32:26] <kali> Lope: findandmodify will actually pick the write lock, and the write lock is exclusive, so even another find() can not run during the findandmodify

[11:34:40] <Lope> kali: great :)

[11:35:21] <kali> blaubarschbube: you would need to disable the autobalancing and do the balancing yourself

[12:52:54] <deathknight> Hey guys. I'm looking for a bunch of sample data to use as various collections for some software I'm working on that streams mongo -> relational

[12:53:01] <deathknight> where can I find a bunch of sample data?

[12:55:20] <niczak> Does anyone have any thoughts no how shutting down mongod could *not* release the RAM that it had previously been using?

[12:55:26] <niczak> *no = on

[13:00:01] <niczak> It seems like that shouldn't even be possible.

[13:05:30] <idar> I need to instert a document containing a sequence from an auto increment collection in one operation from java. Something like this db.col.insert({seq: getNextSequence(seqname)}).

[13:05:52] <idar> Is this possible?

[13:13:30] <bcrockett> Greetings, I am trying to locate either the SRPM's for the RPM's in your repo (baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/), or the .spec file they are built from. They don't seem to have a SRPM directory at the x86_64 (or i686) level.

[13:31:34] <drag0nius> what happens if i run update, but nothing is returned?

[13:58:54] <GothAlice> niczak: It can, in theory happen, with zombie processes. I haven't checked, but since the process can't be freed I doubt any of its allocated or bound memory pages get freed either.

[13:59:07] <GothAlice> niczak: But in five years I've never seen mongod zombie.

[14:00:10] <niczak> Yeah, I rebooted the server and memory usage dropped by 50%.

[14:00:16] <niczak> It was wild.

[14:00:28] <niczak> So all the mongo data was still in RAM after shutting down the process.

[14:00:50] <niczak> Old instance though, 2.4.9

[14:00:51] <GothAlice> niczak: Or, you're misreading free memory, which is more likely.

[14:01:05] <niczak> Nope, I can read charts and graphs.

[14:01:06] <GothAlice> Subtract caches from used before interpreting "free memory".

[14:01:09] <niczak> Nice vote of confidence though.

[14:01:17] <niczak> Well whatever it was, it was mongo.

[14:01:33] <GothAlice> Kernel-level caches often show up as used, even though they're the lowest priority page possible and get swapped out the moment the RAM is needed.

[14:01:53] <GothAlice> (Common enough problem for me to doubt anyone that says "RAM isn't freed when this process ends".)

[14:02:01] <niczak> Fair enough

[14:08:40] <GothAlice> Heh; also since MongoDB memory maps some rather large on-disk files, eventually nearly all of RAM becomes cache, which can be alarming. I'm a happy camper if my DB servers have less than 32MB of truly free RAM.

[14:09:43] <GothAlice> (My work primary has 80MB of free RAM. That's oddly high for it. ;)

[14:39:34] <dupondje> I'm trying to compile mongo. But get the following error: scons: *** [/home/dupondje/ebdev/mongodb/bin/bsondump] Source `src/mongo-tools/bsondump' not found, needed by target `/home/dupondje/ebdev/mongodb/bin/bsondump'.

[14:44:35] <GothAlice> dupondje: I use build automation tools to compile MongoDB for me. Much more reliable and reproducable. Here's Gentoo's ebuild script for MongoDB 2.6.5: https://gist.github.com/amcgregor/61ab964079c24b39ae1b

[14:46:17] <GothAlice> Unfortunately this means I've never had to debug a build failure, let alone that specific failure, but there might be something in there you can identify as differing from your process.

[15:02:47] <drag0nius> are there any special characters in keys/values in mongodb?

[15:29:40] <drag0nius> how would i query 'give me all documents where every player in players array has level 0'?

[15:30:42] <mike_edmr> drag0nius: migrate to postgres and use SQL

[15:30:52] <mike_edmr> jk idk

[15:35:24] <tscanausa> drag0nius: db.find("level":{"$elemMatch":"0"})

[15:36:46] <drag0nius> ok maybe let's give a live example

[15:36:59] <drag0nius> collection is composed of those documents: http://wklej.org/hash/e6315edbf36/

[15:37:41] <drag0nius> i want to find documents where there isn't any player having leaver_status != 0

[15:38:16] <drag0nius> 'players' field is array of documents having 'leaver_status' number field

[15:39:10] <drag0nius> in the linked document one of players has leaver_status == 3, so i don't want this document returned

[15:48:06] <drag0nius> ohh

[15:48:18] <drag0nius> i actually don't have any document without leavers

[16:29:41] <niix> Would mongo be a bad choice for a blog api? Where I want to have a relationship between the author and their posts?

[16:33:35] <mike_edmr> not a terrible choice, though with a blog where the pages are relatively static i'd think you'd be doing whole-page caching

[16:33:45] <mike_edmr> unless you're doing the comments on site

[16:34:16] <mike_edmr> so not a lot of strain falls on a db in that case

[16:35:30] <cheeser> speaking from experience, storing the data in mongo and caching the assembled pages in, say, couchbase works very well.

[16:35:54] <mike_edmr> if you expect a huge amount of comments, something reddit-like, how the data gets stored matters a lot moer

[16:35:57] <mike_edmr> *more

[16:37:13] <mike_edmr> if you have user profiles where you can see their comment history, that too should be taken into account

[16:37:40] <mike_edmr> the planned feature set should inform the architecture

[16:38:04] <mike_edmr> itd be easy to take the wrong approach with mongo and get bit by it later

[16:38:46] <mgeorge> is there a way to fork the mongos service and run it as user mongod ?

[17:15:52] <kali> what ?

[17:15:57] <kali> ha !

[17:54:00] <wsmoak> are there full api docs for ruby ? like this: http://api.mongodb.org/perl/0.42/MongoDB/Cursor.html

[17:54:18] <wsmoak> new-ish to the whole stack, not sure if there’s a command-line incantation or what…

[17:55:17] <kali> wsmoak: http://api.mongodb.org/ruby/current/ ?

[17:57:31] <GothAlice> mgeorge: https://gist.github.com/amcgregor/8ccb06ebc15c7fb4ddd4 — most platforms offer "initialization scripts", this is the Gentoo one for MongoDB. Note the preculiarirty of sysfs and --user/--chuid arguments to start-stop-daemon.

[18:07:00] <wsmoak> kali: okay. I had seen that, was looking for mor like “full api docs”. all the types, all the methods, what each method returns… for example, that doesn’t talk about “Cursor” at all

[18:10:06] <wsmoak> Looks like https://github.com/mongodb/mongo-ruby-driver/wiki/Tutorial has more info

[18:13:51] <kali> wsmoak: http://api.mongodb.org/ruby/current/Mongo/Cursor.html ?

[18:15:36] <wsmoak> kali: perfect, but where? oooohhh, the tabs up in the top right.

[18:16:20] <kali> wsmoak: that is just... how ruby api docs usually work :)

[18:16:21] <wsmoak> ftr, the confusion was that when I scrolled down to “API Reference Documentation” and clicked “here”, it took me back to the list of versions, and then to the same exact page.

[18:17:11] <wsmoak> great! just not familiar with stuff yet. :)

[18:36:14] <will> Hello. I have a collection that uses dateparts as keys, like this:

[18:36:15] <will> https://gist.github.com/anonymous/e6fc3606315228dab446

[18:36:17] <will> I need to update these documents. I'd like to use $inc for the 'c' value, but I'm not sure how to filter these without using a projection. Any suggestions?

[18:38:00] <daidoji> will: aggregation framework with multiple "$match"s?

[18:38:40] <will> can I aggregate and update in the same query?

[18:38:59] <daidoji> hmm, I'm not sure. I've only output agg results into another collection

[18:39:15] <daidoji> odds are you'll probably have to use the agg framework to get all the keys, and then update based on that query set

[18:39:27] <daidoji> err resultset

[18:40:05] <daidoji> oh wait, what are you trying to do?

[18:40:07] <GothAlice> will: It looks like that's a pre-aggregated dataset, but it's waaaay too broad.

[18:40:40] <will> I know. It's a terrible data set. Unfortunately, we're way too deep to change it.

[18:40:45] <daidoji> cause you can call db.whatever.update({}, {$inc: {"2014.10.6.1.c": 1}})

[18:40:51] <daidoji> but I agree

[18:41:15] <will> I'll give that a try daidoji

[18:41:40] <GothAlice> My pre-aggregated data looks like https://gist.github.com/amcgregor/1ca13e5a74b2ac318017#file-sample-py — note that it stores the "hour" of that record's aggregation as a concrete date; it's easier to work with. (Still only one record per hour, if I wanted a breakdown from there I might have a mapping of minutes in that hour, but not deeply nested like yours.)

[18:45:16] <GothAlice> Lots of deep $inc's on each additional click, though.

[18:46:03] <daidoji> yeah...

[18:46:23] <will> does the depth slow things down, or does the lack of filter slow things down?

[18:46:28] <will> or both?

[18:46:45] <GothAlice> Depth can make the data essentially unqueryable.

[18:47:32] <GothAlice> Extreme depth also ruins the ability to index. (I.e. you might want to index on the days covered by that record… but they're keys…)

[18:47:38] <will> daidoji's update works for me. I don't know what it'll look like at scale.

[18:48:58] <will> yes, this data's shape has plagued me since I started. Unfortunately, this is now my burden to bear. Wish me luck.

[18:48:59] <GothAlice> {"2014.10.6.1": {"$exists": true}} — those can be hideously slow and require collection scans.

[18:49:39] <hicker> i'm supposed to create an API for a design that's 7 subdocuments deep in some areas. does this sound like a reasonable number?

[18:50:44] <GothAlice> hicker: Not if there are lists of sub-documents involved. It's also less reasonable considering the keys count against the data storage requirements of the document. (Why all of my "real life" examples all use single-character keys!)

[18:51:36] <GothAlice> hicker: ($elemMatch can only operate on one list in a document for an example limitation.)

[18:54:30] <hicker> lists are pretty minimal in this design. do you know of any good resources to help me work with complex schemas using mongoose?

[18:55:14] <GothAlice> The fact that mongoose already provides concrete schemas is about the best starting point you can have.

[18:56:34] <hicker> ok, sounds like i just need to practice

[18:58:47] <GothAlice> hicker: https://gist.github.com/amcgregor/0e7a55bc29c7050ee41d#file-task-model-py-L136-L157 is an example of one of my "complex" document models (in Python's MongoEngine syntax). Except for certiain very explicit use-cases, I don't nest deeply at all. (Invoices have a list of line-items, packages have a list of discounts, each only contains *one* list of sub-documents to avoid querying issues, etc.)

[19:01:03] <hicker> examples are good, thank you. i think the hard part for me is determining whether or relationships exist between the models.

[19:01:26] <hicker> i created a mind map of all the collections, but i'm not sure if that's the best way to visualize it

[19:01:53] <GothAlice> hicker: With MongoDB it really comes down more to how you plan on *using* the data. Modelling up-front is good, but you need to plan for usage patterns. (Thus ideas like pre-aggregation of statistics.)

[19:02:32] <GothAlice> http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework

[19:02:53] <GothAlice> It's somewhat specific to analytics, but it includes data/index sizes and query speeds to compare the structures.

[19:03:17] <hicker> interesting, maybe it would be good to start with UI wireframes

[19:03:35] <GothAlice> For my own development process I like to start with user stories.

[19:03:36] <hicker> thank you for the article!

[19:04:16] <hicker> that sounds better... much more definition with user stories

[19:04:34] <MrDHat__> I've given dbAdmin privileges to a user for a db but I am not able to do inserts or anything for that matter with the user.

[19:04:36] <GothAlice> While user stories tend to avoid UI-specific details. (The behaviour is what matters, not the method.)

[19:04:56] <PirosB3> heeeey JamesHarrison

[19:04:57] <PirosB3> oops!

[19:04:59] <PirosB3> jax

[19:05:05] <PirosB3> how’s it going

[19:05:30] <MrDHat__> I followed these steps: http://pastie.org/9683951

[19:05:36] <MrDHat__> What am I doing wrong here?

[19:07:08] <GothAlice> MrDHat__: AFIK permissions don't include other permissions within themselves. They grant access to specific collections of functionality. dbAdmin != read. http://docs.mongodb.org/manual/reference/built-in-roles/#dbAdmin

[19:07:29] <GothAlice> MrDHat__: Thus if you want an admin who can also read and write, you need dbAdmin *and* readWrite.

[19:08:23] <MrDHat__> GothAlice: Oh cannot believe I missed the docs for this :/

[19:08:23] <MrDHat__> Thanks a lot :)

[19:08:30] <GothAlice> No worries. :)

[19:09:05] <GothAlice> MrDHat__: You can also create your own custom roles: http://docs.mongodb.org/manual/reference/security/#role-management-methods

[19:09:43] <MrDHat__> GothAlice: I don't think I need them at this stage, but thanks! :)

[19:10:28] <GothAlice> MrDHat__: It'd let you set up a "myAdmin" role which *does* get read/write access automatically, if you so chose. :)

[19:11:30] <MrDHat__> GothAlice: I think I'll use dbOwner for now as the user is infact the db owner.

[19:17:18] <unholycrab> how much space does mongodb need to operate?

[19:17:27] <unholycrab> im not talking about a repair operation

[19:17:41] <unholycrab> t queries, writes, drops

[19:18:43] <GothAlice> unholycrab: The rule I follow is to have a minimum of 2GB of additional free space beyond the on-disk stripe size of the MongoDB data files.

[19:21:07] <GothAlice> unholycrab: But a problem will arise any time MongoDB attempts to allocate a new stripe and can't.

[19:28:32] <unholycrab> define "stripe" GothAlice

[19:28:59] <GothAlice> unholycrab: The db.0, db.1, db.2, … files created to store the data. Generally in subfolders of /var/lib/mongodb

[19:29:09] <unholycrab> i got you

[19:30:28] <unholycrab> so if all my stripes are 2G, i should definitely have 4G available

[19:30:46] <unholycrab> i could consider that the a hard limit

[19:31:32] <GothAlice> One would hope any monitoring you have would alert you to low space before MongoDB needs to allocate a second 2G stripe. ;)

[19:31:44] <unholycrab> of course

[19:32:47] <naquad> hi

[19:32:56] <naquad> is there something like "Little MongoDB Book" but up to date?

[19:35:40] <unholycrab> GothAlice: does number of collections factor into your figure?

[19:35:47] <unholycrab> if you have 10 collections that grow together, for example

[19:37:21] <GothAlice> unholycrab: Number of collections allowable relates to the size of your namespace, which defaults to 16MB. (And covers a lot of collections at that size.) This size is tuneable: http://docs.mongodb.org/manual/reference/limits/#namespaces

[19:37:33] <GothAlice> However multiple collections are stored within the database-level stripes.

[19:37:53] <unholycrab> awesome. your answers are perfect GothAlice

[20:24:05] <drag0nius> is there some atomic insert if not exist?

[20:24:24] <cheeser> upsert does that, afaik

[20:24:44] <cheeser> or this http://docs.mongodb.org/manual/reference/method/db.collection.findAndModify/

[20:27:03] <drag0nius> upsert modifies document if it exists

[20:29:35] <cheeser> or inserts if it doesn't exist

[20:54:55] <GothAlice> drag0nius: IDs are a unique index, so if you try to insert a record with a duplicate _id, it will fail. You could also unique index any combination of other fields you wish.

[20:55:21] <GothAlice> Sounds like "atomic insert" is covered by the unique index feature.

[21:37:39] <iapain> We are on 2.6.5 and running shard cluster with 5 shards + 1 configserver (just to test) and we found that mongoimport is giving us 2K/sec on sharded database while running it on single mongod (standalone) we get 25K/sec. Is mongoimport is that slow on shard or we are doing something wrong?

[21:43:46] <GothAlice> iapain: Hmm, that does sound odd. Are the shards running on separate hosts? (I.e. are they being bottlenecked on a single network connection, hard disk, etc.)

[21:44:45] <GothAlice> iapain: Also, what's the result of db.currentOp(true) during the import run on both setups? (This will give information about things like time spent in locks, etc.)

[21:44:46] <iapain> GothAlice: Yes they all are on seprate machines with SSD drive over 1Gbps network.

[21:46:22] <GothAlice> And finally, is the data being balanced correctly and reasonably? (I.e. are the records being inserted across all nodes roughly evenly, and is your input data grouped by locality during insert? I.e. insert all records that would end up on node A at once, then all that would end up on B, etc.) Interleaving of your input data can cause the shard query balancer to wig out a bit.

[21:46:56] <GothAlice> (Typically not an issue for normal usage patterns; bulk inserts are hard.)

[21:47:58] <iapain> GothAlice: sh.status() gives well balanced nodes. we have a shard_key on timestamp (unix) {ts: hashed}

[21:48:25] <GothAlice> Hashed is the magic, so yeah, you'd get roughly even distribution with that.

[21:48:46] <GothAlice> Unfortunately that makes it difficult to have locality-aware bulk inserts.

[21:50:00] <iapain> GothAlice: I don’t see any locks during db.currentOp(true)

[21:50:17] <iapain> Can 1 configserver can be bottleneck?

[21:50:29] <iapain> we are running it on one of the shard server

[21:50:58] <GothAlice> Shouldn't be. It's not HA, but… For diagnostic purposes, have you tried isolating the configserver on its own boxen?

[21:51:21] <iapain> No we haven’t tried that

[21:51:40] <GothAlice> (You'd need to run that during the insert operations, and grab the query stats for one of the inserts.)

[21:52:29] <GothAlice> ("that" being db.currentOp(true))

[21:53:54] <iapain> GothAlice: There are so many sync operation in that list, its hard to examine. Let me try to look into it.

[21:57:41] <GothAlice> My own configservers run on a secondary of a replica set, one per set, and I have three replica sets configured for sharding. Haven't measured throughput performance in a while, though. (Like, three years.)

[22:01:06] <iapain> GothAlice: We recently converted our RS to sharded but we are really stuck with slow import, it’s really curcial for us. I guess we are doing something wrong then.

[22:01:49] <GothAlice> iapain: Hmm; could you try something for me? Have a UNIX box and your dataset handy? :)

[22:02:11] <iapain> GothAlice: Sure :)

[22:03:25] <GothAlice> iapain: Grab a copy of https://gist.github.com/amcgregor/c33da0d76350f7018875 and update it with some more appropriate paths; switch the port back to the default as well. Use the admin user provided (default password xyzzy also in that top-of-file config) to replicate your bulk insert test, remembering to configure the sharding index. How well does this setup perform on a single box?

[22:04:13] <GothAlice> (It'll be < than your single-mongod test, but it'll be interesting to see how it differs from your own sharded setup.)

[22:06:07] <GothAlice> iapain: Your own setup may need to tune http://docs.mongodb.org/manual/reference/program/mongos/#cmdoption--chunkSize to optimize for bulk inserts at the potential expense of less even distribution of documents.

[22:08:16] <iapain> GothAlice: Interestingly we I do mongoimport on a non-sharded database via mongos then we get slow insert rate as well, so it must not be the chunkSize or shardKey.

[22:08:28] <iapain> when* we*

[22:09:00] <GothAlice> chunkSize is a mongos option which would have an impact on performance. The default is usually good, but sometimes it's not.

[22:12:13] <GothAlice> This handy script, though, will allow you to rapidly try out different tuning options. Stop the cluster, nuke the datafiles, and start it back up with new options. :)

[22:12:50] <iapain> GothAlice: Hmm.. I thought that mongos doesn’t chunks data when database is not shard enabled.

[22:13:34] <iapain> GothAlice: Yeah right, I am on it. Thanks for sharing it. :)

[22:13:53] <GothAlice> Oh, of course, you're right. I'm silly and didn't notice the heading that option was under in the docs. XD

[23:01:21] <jedichu> Hi! Why does mongo tell me 'not running with --replSet' even though replSet is set in mongodb.conf ?

[23:01:50] <GothAlice> jedichu: Are you telling mongo to use that configuration file?

[23:04:00] <jedichu> Good point. Guess I just assumed the ubuntu package would put everything in the standard place.

[23:04:17] <jedichu> Is there a standard config file that it will slurp in on startup?

[23:05:10] <GothAlice> jedichu: No. You must tell mongod and mongos which configfiles to use. See: http://docs.mongodb.org/manual/reference/configuration-options/

[23:05:40] <jedichu> ok thx

[23:05:52] <GothAlice> No worries.

[23:06:08] <jedichu> I kind of want to enquire what kind of logic was behind the decision to do it that way ... but no.

[23:09:38] <kakashiA1> hey guys, I am new in mongodb and worked with mysql before

[23:10:06] <kakashiA1> so I have to relearn some stuff, like that there are no relations

[23:10:10] <GothAlice> jedichu: That way distribution-specific initialization scripts can specify distribution-specific default configurations.

[23:10:34] <GothAlice> kakashiA1: Indeed, MongoDB makes you think about your data differently.

[23:11:06] <GothAlice> kakashiA1: http://docs.mongodb.org/manual/reference/sql-comparison/ may be a useful resource for learning, coming from an SQL background.

[23:11:25] <kakashiA1> what I want to know is how to model a "relation" for example a user (id, name, email) who can have many notes (name of the note, content, date etc)

[23:11:36] <kakashiA1> I know how to model one of them

[23:11:39] <kakashiA1> even with mongoose

[23:11:54] <kakashiA1> but not how to create relation between both

[23:12:36] <GothAlice> kakashiA1: Well, what you wrote isn't bad. You simply have to manage the relationship yourself. (Some libraries like MongoEngine simulate reverse-delete-rules and such, but that's not in MongoDB itself.)

[23:13:11] <GothAlice> {_id: ObjectId(), name: "Bob Dole", email: "penguin@whitehouse.gov"}

[23:13:46] <GothAlice> Then a note would look like: {_id: ObjectId(), user: ObjectId("…"), name: …}

[23:14:44] <kakashiA1> hmm... need a simple example, let me see what I can find :)

[23:14:57] <GothAlice> The "user" field, in this arrangement, would be the ObjectId of a user. You can query on that. db.notes.find({user: ObjectId("…")})

[23:15:29] <GothAlice> kakashiA1: Are you familiar with the structure of typical forums? Like phpBB?

[23:15:43] <GothAlice> (With categories of forums with threads and replies to those threads?)

[23:15:55] <kakashiA1> no

[23:16:07] <GothAlice> Hmm.

[23:16:31] <kakashiA1> but its a very simple idea: user <1--------n> notes

[23:17:02] <kakashiA1> GothAlice: forget about that, there must be many example on google :)

[23:17:50] <GothAlice> kakashiA1: I just explained exactly how to do that in terms of how to structure your MongoDB documents.

[23:19:01] <kakashiA1> okay thanks :)

[23:32:44] <scrandaddy> Hey guys. I'm hoping to get a little advice. I'm new to the noSQL thing, but so far so good. I'm building pixel tracking system and am interesting in using mongo as the datastore for what are essentially logs. Is this an appropriate use case for mongo?

[23:33:45] <GothAlice> scrandaddy: Logging is an excellent use of MongoDB; I use it for that task frequently. http://docs.mongodb.org/ecosystem/use-cases/storing-log-data/ is a good write-up about using MongoDB this way.

[23:34:32] <scrandaddy> great thanks i'll give that a read. does mongo serve well as an ending place for my data?

[23:35:33] <scrandaddy> if it's a great solution for writing the logs, is it a good solution for reading / parsing the logs?

[23:35:37] <GothAlice> If your log data is an event log, with useful bits of statistical data, you can also do analytics, for example, ocean bouy data: http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework.

[23:35:53] <GothAlice> scrandaddy: I currently have 24TiB+ of data in MongoDB.

[23:36:16] <scrandaddy> oh wow

[23:37:33] <scrandaddy> i don't want to get too ahead of myself, but if this set up needed to scale, how hard of a time would i have with concurrency?

[23:37:58] <GothAlice> (Including click tracking and pre-aggregation of statistics: https://gist.github.com/amcgregor/1ca13e5a74b2ac318017)

[23:38:13] <scrandaddy> before when i was considering using mysql to store the data i was also looking at redis as a "write cache" that would log the data from each request and do batch updates to the real db

[23:39:14] <GothAlice> You can indeed split it up that way; using capped collections as a queue you can rapidly slurp in data, and have multiple "workers" in the back doing interesting things with it.

[23:39:25] <GothAlice> All with MongoDB. :)

[23:40:44] <GothAlice> (I insert each request, response, session data, cookies, etc. for each request to the application server, with certain redaction rules, into a capped collection that holds the "last 30 days" of activity.)

[23:41:20] <GothAlice> (Part of the reason I have so much data, I guess. ;^)

[23:41:25] <scrandaddy> oh that's interesting

[23:41:39] <scrandaddy> so you're effectively using mongo as a "write cache" for itself?

[23:41:58] <scrandaddy> sorry i don't know what else to call it

[23:42:04] <GothAlice> But! This arrangement lets me see what the user is seeing as they see it during live support, as well as replay all of the initial conditions that lead up to a bug by replaying those requests in a development environment with debugging turned on.

[23:42:08] <GothAlice> Indeed I am. :)

[23:42:37] <scrandaddy> that's basically just a special section of the database that you've designated to have a much lower save priority?

[23:42:45] <scrandaddy> sorry i'm missing a lot of vocabulary :)

[23:44:00] <GothAlice> scrandaddy: Yes; in MongoDB you can have a "writeConcern" that lets you specify if you reeeeally care to know that what you just sent across the wire was written to disk. On logging data, generally you want as little delay as possible.

[23:44:38] <GothAlice> So on my request/response/state data logging, I say, "I don't even care if MongoDB even understood what I said, let's keep going". Total time added to each HTTP request was about 6ms on a local network. ^_^

[23:46:42] <GothAlice> Ah, here they are; some screenshots of an ancient version of what I wrote to do this: http://twitpic.com/12qr1a http://twitpic.com/12vmdg http://twitpic.com/12qex3 You can see how I needed to query the data (as a time series graph of latency).

[23:50:15] <iapain> GothAlice: I managed to nuke mongodb with your script but unfortunately it also gives me 1.8K/sec (we got 2K/s almost identical).

[23:51:17] <GothAlice> Hmm; iapain: I'd open up a ticket on Jira to ask the developers about your use-case. Bulk-inserts only ever happened at the beginning of my app's lifecycle.

[23:51:52] <GothAlice> iapain: What happens if you insert without the sharding index? :3

[23:52:07] <GothAlice> (… and then add it later and have it rebalance, if that's a viable strategy?)

[23:52:23] <iapain> GothAlice: via mongos? we get the same slow inserts

[23:52:52] <iapain> But if we do it straight on one mongod instance then we get awesome fast inserts

[23:53:38] <GothAlice> iapain: Found this for you, which may be topical: http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/

[23:53:55] <GothAlice> (Which relates to what I described earlier as data locality.)

[23:54:44] <iapain> GothAlice: Read that but our data is in order of 100 GB/day which makes it inase.

[23:55:28] <iapain> Thanks for your help, I guess there is something fancy which I dont understand or it’s a bug.

[23:55:38] <iapain> I will try official mongodb support as well.

[23:55:40] <GothAlice> Definitely open a ticket. :)

[23:56:06] <iapain> GothAlice: Sure, Thanks for your help :)

[23:56:12] <GothAlice> (My personal DB adds few dozen GB per day, but it's sparodic, not bulk.)

[23:57:35] <iapain> GothAlice: Some other developer suggested me to write non-bulk insert and see if it helps. So I guess you’re immune if that’s the case.

Log file Viewer

Help | Karma | Search:

#mongodb logs for Wednesday the 29th of October, 2014