[02:10:49] <appledash> CPU seems fine, network isn't in the equation because this is a local server, and as for I/O... I'm not sure
[02:11:24] <appledash> dstat says I'm writing about 4000k/sec. Would that max out a standard desktop HDD? (I think the k is actually a capital K, as in kilobytes, not kilobits.)
[03:47:28] <CipherChen> what's the internal command for db.getReplicationInfo
[03:49:58] <CipherChen> i need to get the info by pymongo
[03:50:16] <Noxywoxy> Quick Q: Is anyone able to recommend a consultancy service? I was looking here and I'm not really sure how to gauge quality. Hoping a personal recommendation would help.
[03:50:19] <Boomtime> that summary is made locally from several queries
[03:50:34] <Boomtime> type "db.getReplicationInfo" at a shell prompt, it will show you the javascript
[03:52:22] <CipherChen> thks, i just forgot the js source code
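(For reference, a minimal sketch of the shell trick Boomtime describes: naming a helper without parentheses prints its JavaScript source, which you can then translate into the underlying queries for pymongo.)

```javascript
// in the mongo shell:
db.getReplicationInfo()   // runs the helper; returns the oplog summary document
db.getReplicationInfo     // without (), prints the helper's JavaScript source
```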
[07:18:07] <n^izzo> I need to count the number of documents that reference another document
[07:18:21] <n^izzo> this is what I have at the moment db.histories.aggregate([{ $match : { archived : false } }, { $group:{ _id: "$_zone", amt: { $sum: 1 } } }])
[07:18:58] <n^izzo> I want to group by the value of _zone and count that
[07:22:07] <joannac> n^izzo: okay. and that doesn't work?
[10:11:32] <CipherChen> hi all, does your mongodb cluster size exceed 12 nodes? is there a good solution for this?
[10:33:19] <Gargoyle> CipherChen: What needs solving?
[10:35:24] <CipherChen> when my cluster grows large enough that i need more than 12 slaves for querying
[10:37:16] <CipherChen> when i need a lot of nodes for reading, for example 100 nodes
[10:41:20] <CipherChen> we cannot do that now; for that we deploy multiple data centers, and the data in each center is totally the same
[11:19:56] <CipherChen> I'm running 2.4. If I want to upgrade to 2.8, when it says "To upgrade an existing MongoDB deployment to 2.8, you must be running 2.6.", does it mean that the WHOLE rs has to be on 2.6 first, or just the node being upgraded?
[12:08:45] <joannac> CipherChen: why are you upgrading to 2.8? testing?
[12:20:22] <markotitel> What needs to be done in order for mongod to log queries to a file? I have enabled "profile = 2" and "slowms = 30" in the config file.
[13:08:18] <joannac> markotitel: you should see it in the mongod log as well as in the profile collection
[13:13:38] <markotitel> joannac, it seems there is still no slow log. The DB is not used at all for now, so possibly no slow queries. Should I see read queries also?
[13:19:57] <joannac> markotitel: yes, for queries > 30ms
[13:20:58] <joannac> but all queries, even those <30ms, will go into the system.profile collection
[13:24:43] <markotitel> joannac, so mongo will not write all queries to the file?
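(A minimal sketch of checking the profiler output from the shell, assuming "profile = 2" is active on the database in question; level 2 records all operations in system.profile, while only ops slower than slowms reach the mongod log.)

```javascript
// on the database being profiled:
db.system.profile.find().sort({ ts: -1 }).limit(5)  // most recent profiled operations
db.system.profile.find({ millis: { $gt: 30 } })     // only ops over the 30ms slowms threshold
```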
[15:51:51] <Lingo__> hi. I have a collection with documents such as { id: 1, version: 1 }, { id: 1, version:2 }, { id: 2, version: 1 }. is it possible to construct a query that gets me the document with greatest version for each id?
[15:53:08] <Lingo__> so my query would return [{ id: 1, version: 2 }, { id: 2, version: 1 }]...
[15:55:20] <amk> Can someone give me a hint where to look for causes of multiple lingering connections (ex. 810 open connections on a low low low traffic app db)? Is a the event of a connection not being closed always caused by the client?
[15:56:49] <amk> Lingo__, look at http://docs.mongodb.org/manual/reference/operator/query/gt/
[15:57:31] <Lingo__> amk: I need the max value, regardless of its actual value.
[15:57:59] <Lingo__> version could just as easily be 1234234, and will be different for every id
[15:58:28] <Lingo__> I basically want select * from collection where id = X having version = max(version)
[15:58:36] <Lingo__> or something along those lines
[16:03:39] <amk> Lingo__, in other words 'find documents with distinct ID, limit to 1, sort by VERSION' ?
[16:04:45] <Lingo__> amk: yes. 99% sure it's not possible with find(), thinking it's not possible with aggregate either, but not sure.
[16:09:57] <amk> Lingo__, how about this http://stackoverflow.com/questions/14697332/mongodb-query-for-all-documents-with-unique-field
[16:13:20] <drorh> Hi. Is having a unique hashed index supported?
[16:20:46] <Lingo___> amk: trying that approach gives me "exception: the group aggregate field 'id' must be defined as an expression inside an object" and I have an id property defined.
[16:22:13] <amk> Lingo___, this is a bit over my head now; but sounds like you're throwing it an _id where you should be throwing an {something}
[16:22:24] <amk> I'm afraid I can't help you out more
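(A minimal aggregation sketch for Lingo__'s "latest version per id" question, assuming the field really is named id rather than _id. The error above is a $group rule: every output field other than _id must be wrapped in an accumulator expression such as $first.)

```javascript
db.collection.aggregate([
  { $sort: { id: 1, version: -1 } },      // highest version first within each id
  { $group: {
      _id: "$id",                         // the group key must be an expression like "$id"
      version: { $first: "$version" }     // accumulator: take the top version per id
  } }
])
```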
[17:25:58] <OrNix> hi. tell me please, what is the best way to make a replica set with 2 nodes? On which node should the arbiter be started: on the master or on the slave?
[17:35:25] <GothAlice> OrNix: Your question is nonsensical. An arbiter is its own thing, separate from a primary or secondary or any other data-storing process.
[17:36:03] <OrNix> GothAlice, i understand it. But is it possible to make rs with 2 nodes?
[17:36:09] <GothAlice> OrNix: http://docs.mongodb.org/manual/tutorial/deploy-replica-set/ — Give this a good read, in addition to http://docs.mongodb.org/manual/core/replication-introduction/
[17:36:41] <GothAlice> OrNix: Yes, it certainly is possible. It's not high-availability unless you have three or more nodes, though, since if one of the two goes offline the other has no way of knowing if its buddy went offline or it did.
[17:36:54] <GothAlice> (Thus; in a "two data storage replicas" setup, add an arbiter.)
[17:38:06] <GothAlice> All the arbiter does is voice an opinion in primary elections.
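(A sketch of the "two data replicas plus an arbiter" setup; the hostname is a placeholder, and the command is run from the current primary.)

```javascript
rs.addArb("arbiter.example.com:27017")  // adds a voting, non-data-bearing member
```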
[17:40:00] <OrNix> i understand the arbiter's role. So if i need a 2-node rs, is it better to disable failover and select the PRIMARY by hand?
[17:40:37] <GothAlice> OrNix: The difficulty is that in that arrangement, if the SECONDARY goes down, the primary will demote itself to a *read-only* secondary.
[17:40:55] <GothAlice> I.e. even if the non-critical part of the replica set goes away, the critical part fails.
[17:41:11] <GothAlice> So, no. It's never better to try to "manually" manage that.
[17:42:50] <GothAlice> (Using priority levels, i.e. a priority zero secondary, you can ensure a certain host will remain primary at all costs, or will remain primary if no other option is available. You do get control over how the cluster arranges itself, but it's best to use these provided tools to do that, rather than any form of manual intervention.)
[17:44:32] <GothAlice> A la: http://docs.mongodb.org/manual/tutorial/configure-secondary-only-replica-set-member/
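(A minimal shell sketch of the priority approach from that tutorial; the member index is hypothetical.)

```javascript
cfg = rs.conf()
cfg.members[1].priority = 0   // this member will never stand for election as primary
rs.reconfig(cfg)
```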
[19:24:33] <GothAlice> RaceCondition: db.example.find({$and: [{$or: [{A conditions…}, {B conditions…}]}, {$or: [{D conditions}, {E conditions}]}]}) — not wrapping in $and would mean attempting to have two values for the same key ($or) at the top level. Ref: http://docs.mongodb.org/manual/reference/operator/query/and/ http://docs.mongodb.org/manual/reference/operator/query/or/
[19:25:06] <RaceCondition> GothAlice: yes, already got it
[19:50:34] <cheeser> there's at least 1 or 2 game companies i know of that use mongo for that.
[19:50:43] <GothAlice> aliasc: As a note, since MongoDB documents are inherently dynamically typed, use within a statically typed language comes at a cost in terms of the amount of boilerplate code you will need to write. (This is especially true in Java which rather amplifies the effect.)
[19:50:45] <GothAlice> Having some form of RESTful service written in a dynamic scripting language which your game speaks to (service-oriented-architecture) may speed development, reduce bugs, and increase agility. :)
[19:51:05] <GothAlice> (And also allow for independent scaling of the services.)
[19:51:17] <aliasc> this is the reason we want mongodb
[19:52:10] <GothAlice> aliasc: Optimization without measurement is by definition premature. What're your expected pain points (things that may hit performance walls and require independent scaling)?
[19:53:07] <GothAlice> aliasc: I've used MongoDB with several gaming clients, mostly Facebook games, though. https://gist.github.com/amcgregor/4207375 is my section of a presentation on the distributed RPC system I built for one of them. (Link to the full Python+MongoDB+Facebook gaming presentation in the comments.)
[19:54:08] <aliasc> we still don't know if there will be impacts on performance.
[19:54:32] <aliasc> but if mongodb is fast and clients link directly then there should be no problems.
[19:54:43] <aliasc> of course we have server client software to manage the process
[19:54:47] <harttho> We're seeing load spike on our mongo shards (all three replica sets) every 15 minutes
[19:54:51] <GothAlice> aliasc: Wait; you'd have gaming clients connecting to mongod directly?
[19:54:57] <harttho> Are there any cron sort of things that would result in that
[19:56:52] <aliasc> i dont plan to use mongodb for coordinates and stuff just to store user data
[19:57:04] <aliasc> but for curiosity we tried that and it seems to work well
[19:58:08] <aliasc> however if we plan to support co-op we need to store object data from the game in the database. for example if we want users to save their co-op progress together
[20:33:55] <tim_t> can anyone advise the best approach to implementing a mail system in mongo? right now i have a collection holding mail documents with a to: field for single users… a giant inbox as it were. i get the feeling this is not the best use of mongo but i cannot put a fine point on why
[20:35:44] <tim_t> well, when a user gets their email i simply query using the user as a filter. each document has a single user reference on it
[20:38:40] <shoerain> any thoughts on extensions like https://github.com/gabrielelana/mongodb-shell-extensions? it adds jsonpath, lodash, and moment to the mongorc.js
[20:38:56] <tim_t> no, its more like… say you have a letterbox at a house and a bunch of people are living there. each person has a letter addressed only to them, but it is still held in the one letterbox. my collection is the letterbox, my documents are the letters and just like when you bring the letters in you look at who it is addressed to and give it to them.
[20:39:40] <GothAlice> tim_t: Since e-mail data is typically read-only (excluding flags on the message; those are entirely different) you could attempt to replicate the top-level MIME encoding of the message (headers vs. body, each header as a field, etc.) and index the ones you care to search on. What you have is referred to as a multi-tenant collection.
[20:40:16] <GothAlice> Generally this isn't an issue, esp. if the data is indexed. With a substantial enough user base, the cardinality (uniqueness) of any given user's ID in the pool of all messages will allow for highly efficient querying. (The index will be used to great effect.)
[20:40:16] <tim_t> is it the best use of mongo though?
[20:41:23] <GothAlice> tim_t: What's the disadvantage of the current arrangement?
[20:41:36] <GothAlice> (All things are a trade-off.)
[20:42:14] <tim_t> nothing that i can tell. i am just curious if others had advice on a potentially better solution for issues i am not yet seeing
[20:43:43] <tim_t> like create an inbox upon account creation so each user gets their own
[20:44:10] <GothAlice> Maildir is better in that it's simpler, relies on existing atomic filesystem operations, and can exploit existing filesystem compression and encryption techniques without needing to even be aware of them. It's harder to network and scale. Using a collection-per-user will run into namespace count limits in MongoDB, but isolates each user's data.
[20:44:11] <tim_t> i guess having wide-open choices has me second-guessing the wisdom of the choices i am making?
[20:44:22] <GothAlice> In terms of simplifying querying, what you've got is best.
[20:45:14] <GothAlice> Index on [user_id, mailbox_path, -received] — you can now answer "what messages are in mailbox X for user Y, ordered with newest on top" very efficiently.
[20:46:09] <GothAlice> (Where mailbox_path is something like "INBOX", "Sent", or "Projects.Marrow", etc.)
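(A sketch of that index and the query it serves, assuming a collection named "mail" and the 2.4-era ensureIndex helper.)

```javascript
db.mail.ensureIndex({ user_id: 1, mailbox_path: 1, received: -1 })

// "what messages are in INBOX for user Y, newest first" — served entirely by the index
db.mail.find({ user_id: userId, mailbox_path: "INBOX" }).sort({ received: -1 })
```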
[20:46:17] <tim_t> okay thanks. isolation is not really an issue in my case so it looks like this is a good approach
[20:46:45] <GothAlice> (I'd recommend following IMAP conventions on mailbox naming for interoperability and consistency reasons.)
[20:48:18] <tim_t> great. i'll implement as advised. thanks!
[21:08:35] <harttho> :GothAlice Yes, we do have TTL indexes
[21:09:40] <GothAlice> That could very well be your issue. Compare the timestamps triggering cleanup vs. the periods of excessive load; you may have to begin staggering your TTL runs. (I.e. shift the dates around a bit. TTL indexes are minute-accurate.)
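(A TTL index sketch for context, with hypothetical names: documents expire roughly expireAfterSeconds after their createdAt value, and the background TTL monitor fires about once a minute, which is why deletions can land in synchronized bursts.)

```javascript
db.sessions.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })
```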
[21:11:08] <hicker> Hi Everyone :-) I'm getting "exception: Cannot apply $addToSet modifier to non-array" when executing Practice.findByIdAndUpdate(req.params.PracticeId, { $addToSet: { users: req.params.UserId } } ... but users is an array, so I'm not sure why it's throwing an exception.
[21:11:57] <harttho> :GothAlice Will look into it, though the number of documents is small
[21:36:02] <harttho> What causes messages like these to pop up?
[21:37:45] <GothAlice> $cmd running dbstats took an insane amount of time.
[21:38:28] <GothAlice> Looks to have been stuck in a read lock for nearly a second.
[21:40:06] <GothAlice> harttho: The system.profile collection will contain additional information; it includes client info, authenticated user, and the raw query, but you may need to enable profiling first.
[21:41:00] <harttho> Yeah, I'm confused on why the dbstats would take so long
[21:41:11] <GothAlice> They were stuck waiting for another operation to finish. Possibly indexing.
[21:41:24] <GothAlice> (Indexing blocks other administrative $cmds from operating against the collection being indexed.)
[21:42:05] <GothAlice> You'd need to go spelunking through your mongod logs to find out what it was, though.
[21:44:40] <harttho> Any way to see current admin operations only (so as to not have to sift through the rest)?
[21:45:19] <GothAlice> The profiler data is a collection like any other; formulate a query and run it. ;) (You can filter the logged queries down to only the ones you care about.)
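(For example, a sketch of narrowing the profiler collection to command-type operations such as dbstats:)

```javascript
db.system.profile.find({ op: "command" }).sort({ ts: -1 })  // commands only, newest first
```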
[21:56:51] <GothAlice> You must be missing an index on ts. Nothing in normal operation should really query listDatabases… and that read lock is nuts. There is certainly something up with your current configuration.
[21:57:07] <GothAlice> SSDs or platter disks on those hosts?
[22:03:56] <GothAlice> harttho: Could you run "mongotop/mongostat" and see what your read/write latencies are, and if they're coordinated with that bulk load operation? If so, to prevent hosing your database each time, you may have to slow that import down. If not… I'm at a loss. I'd spin up a staging environment for more vigorous testing that won't impact production.
[22:03:58] <harttho> well, once the load goes down we'll try profiling it
[22:05:19] <GothAlice> Enabling more than slow query profiling will have a deleterious effect on performance, FYI. mongotop/mongostat require no changes to the profiling configuration; you're looking for excessive lock percentages.
[22:07:54] <harttho> config server hitting 2K+ queries, too
[22:08:14] <GothAlice> That'd be sharding doing its thing.
[22:09:09] <harttho> 750 'get mores' on 'local' db pretty consistently
[22:09:57] <GothAlice> Could you pastebin/gist your db.serverStatus() output?
[22:11:47] <GothAlice> http://cl.ly/image/3n312V3Z1B3N — also sometimes amusing. Apparently my DB host has been up for so long that I've exceeded 48 bits of integer accuracy in counting how long locks have been held. XD
[22:22:45] <GothAlice> All of these are clear indications of impending doom.
[22:23:17] <harttho> Any way to clear the stats to see what is happening now vs the previous 300ish days?
[22:23:24] <harttho> Thanks for all of the assistance btw
[22:23:26] <GothAlice> Warning level assertions should appear in the mongod log. Page faults indicate a clear need to continue to scale up (add more RAM) or horizontally (further subdivide into additional shards).
[22:23:50] <GothAlice> 'Cause it's causing insane lag to wait for the kernel to pull in pages off-disk, swapping out others in the process.
[22:24:28] <GothAlice> Your average *disk flush* is taking nearly a full second.
[22:26:58] <GothAlice> Eek, you're also running 2.4.9.
[22:30:01] <GothAlice> http://cl.ly/2c2g2I2m1T10 are the issues resolved (37 of them) between 2.4.9 and 2.4.12, many of which are _extremely_ important fixes. (Data corruption, reliability, filesystem, security, and performance.)
[22:31:36] <GothAlice> harttho: ^ I have to run, but I'll be back in a bit.
[22:48:45] <styles> Dell C class servers? those cloud ones?
[22:48:48] <GothAlice> Because my filesystem represents an infinite number of nodes stored in an infinite number of directories infinitely deep.
[22:48:58] <GothAlice> No, these are ancient beasts. AMD Opterons, even.
[22:49:11] <styles> I want to mimic that exact setup
[22:49:40] <styles> I have like 5 processes I consistently have to run on my local machine, that's just a daunting task & my makeshift 6TB raid here (local comp) is running out
[22:49:44] <GothAlice> I priced it out; most cloud storage providers would cost me ~$2K/terabyte/year. (compose.io would cost me half a million a month)
[22:51:35] <styles> So I'm going to abuse the crap out of it
[22:51:47] <styles> I want to ramp up and have my home lab up by then
[22:52:20] <GothAlice> Bell keeps coming by to try to sell me "Bell Fibe", and I keep having to teach them that they're not selling me fiber, they're selling me ADSL last-mile to the concentrator on my apartment complex roof, *then* it's fiber. They don't offer unlimited unless you bundle, and I can only tell them I don't own a home phone or television…
[22:53:19] <GothAlice> … they try to sell me the bundle anyway, as the "solution" to my "unlimited" problem. Then I tell them I have offsite backup of 25 TiB of data, and they back away from the door.
[22:57:31] <GothAlice> (userAdmin only applies to an admin user of one database; userAdminAnyDatabase would apply to an admin in the 'admin' database.)
[22:57:34] <ianp> I want to buy the cryptome DVDs too
[22:57:38] <styles> GothAlice, I've been looking at... uh
[22:59:05] <GothAlice> http://www.freshpatents.com/Framework-for-the-dynamic-generation-of-a-search-engine-sitemap-xml-file-dt20081023ptan20080263005.php for one.
[22:59:43] <l2trace99> GothAlice: is that true for 2.6.5 ?
[23:00:00] <GothAlice> l2trace99: Should be true for most if not all of the 2.6.x series.
[23:00:35] <GothAlice> l2trace99: I hope you mean the userAdmin/userAdminAnyDatabase thing, not the patent thing. ¬_¬
[23:01:40] <GothAlice> https://gist.github.com/amcgregor/c33da0d76350f7018875#file-cluster-sh-L76-L79 is the "add_user" chunk of my own staging provisioning script.
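(A minimal 2.6-style sketch of the userAdminAnyDatabase distinction discussed above; the credentials are placeholders.)

```javascript
db.getSiblingDB("admin").createUser({
  user: "siteUserAdmin",                                    // placeholder credentials
  pwd: "changeme",
  roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]  // granted in the 'admin' db
})
```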
[23:06:44] <GothAlice> harttho: Zero for most datasets.
[23:07:20] <GothAlice> My at home dataset is highly "at rest", and waaaaay larger than any amount of RAM I could throw at it, so expect some when you have terabytes of data.
[23:07:23] <styles> Kinda off topic, but since we know Google has been running document stores for ages now, I wonder what they have running that people don't know about :P
[23:08:04] <l2trace99> GothAlice: thanks I have some pain ahead of me
[23:08:15] <styles> GothAlice, do you work for Google?!
[23:11:11] <GothAlice> "Did I guess this, or was I actually told this at some point?"
[23:12:05] <GothAlice> l2trace99: Feel free to adapt my script; all it really needs is some "ssh" commands wrapped around everything to execute these commands remotely.
[23:12:55] <GothAlice> l2trace99: Or, even more highly recommended, provision using MMS. https://mms.mongodb.com/
[23:23:46] <GothAlice> (My favourite example from my automation is this: git diff --name-only @{1}.. | xargs qfile -Cq | xargs --no-run-if-empty equery files | grep init.d/ | sort -u — that's the pipeline that generates a list of services to reload/restart prior to doing so after a git pull. Yes, I automate using git hooks and the… #truepowerofunix.)
[23:44:17] <harttho> So i'd need to set it on the admin db?
[23:44:32] <harttho> (For the above slow queries I pasted)
[23:44:59] <Boomtime> harttho: if you are happy to read log files instead of the profiling collection: http://docs.mongodb.org/manual/reference/parameters/#param.logLevel
[23:45:19] <Boomtime> you can cause all ops to be logged if you need to
[23:51:26] <GothAlice> In raw UTF-8 these turn into the byte sequences (in hex): C3 B9, or CB 8B 75.
[23:51:38] <bmillham> I can handle little things like that. I have a web page that's currently written in php with mysql. I'm considering switching to Django/MongoDB.
[23:51:56] <GothAlice> bmillham: "Slice" of the first byte will give you an invalid character. Slice of the first two will either give you the ù or only the `.
[23:52:05] <bmillham> But I have to figure out a workaround for this issue, because it's important.
[23:52:36] <bmillham> I played with that and saw that behaviour.
[23:52:36] <GothAlice> bmillham: Avoid Django. Django is what happens when a cluster bomb goes off in code: disjoint parts and shrapnel everywhere. (I recommend any form of µframework like WebCore, or Flask. Full disclosure: WebCore is mine.)
[23:52:49] <GothAlice> (Django, along with Redis and Celery, killed my last project.)
[23:53:28] <GothAlice> bmillham: If that particular query matters, extract the first literal character, strip accents (ensure ASCII), and store that separately.
[23:53:52] <bmillham> I'll have a look at that GothAlice
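(A sketch of GothAlice's suggestion in JavaScript; String.prototype.normalize is ES6, so treat this as illustrative rather than something the 2.x mongo shell necessarily supports.)

```javascript
// derive an accent-stripped index letter to store alongside the name
function firstLetterAscii(s) {
  return s.charAt(0)
    .normalize("NFD")                 // decompose "ù" into "u" + combining grave accent
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toUpperCase();
}
// e.g. store { artist: "ùTwo", artist_letter: "U" } and index artist_letter
```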
[23:53:53] <styles> I stopped using frameworks a while back, full go net/http now
[23:54:09] <GothAlice> bmillham: https://github.com/marrow/WebCore/tree/master/examples and http://web-core.org
[23:54:26] <styles> GothAlice, dex doesn't seem to show agg functions, is there anything else I can do?
[23:54:28] <bmillham> I always have problems with frameworks. It seems like I spend more time trying to get the framework to do what I want...
[23:54:40] <GothAlice> styles: I find OO abstractions of request/response handy, and mapping URLs to controller methods handy, and that's about all WebCore does by default. ;)
[23:54:52] <GothAlice> Oh, and using templates and JSON serialization.
[23:57:33] <GothAlice> bmillham: https://github.com/amcgregor/TSA-Timeline is an example "light-weight" project that still has some minimal structure to it.
[23:58:01] <bmillham> If you want to see what I'm trying to replicate, go to my site, authenticate so you can browse the site. Select Browse: By Artist and you will see a list of letters at the top of the page
[23:58:43] <GothAlice> bmillham: http://cl.ly/image/010d1J1h392r Damn your captcha.
[23:59:06] <bmillham> Sorry, I hate it also, but it was for preventing spam.