[00:48:58] <Doyle> Hey. Is there a number of DBs that you can have on an RS that would cause perpetual 100% disk util from replication, even if nothing is being written to the DBs?
[01:03:32] <GothAlice> That's also an unusually high iowait percentage, to my eyes. Your CPU is spending literally 22% of its time waiting on the EBS volumes to talk to it.
[01:06:49] <GothAlice> That and the cross-zone cascading failure in EBS controllers that made volumes inaccessible and un-controllable for 22 days straight, but…
[01:09:58] <Doyle> The anti-hype is crazy surrounding this movie
[01:11:03] <GothAlice> The biggest reason for that is it's a movie with an agenda / purpose. Being un-funny (as seen so far) didn't help. XD
[01:13:36] <Doyle> I'm looking forward to the social media explosion when it releases. Should be interesting.
[01:14:53] <GothAlice> I'm looking forward to the "everything wrong with" summary. Points over 9000, maybe? And don't get me started on the poor scene remakes, seeming lack of pacing, and lack of understanding suspense—such as not seeing Slimer's first attack in the original. That highschool film class was totally worth it. ;^)
[01:18:40] <GothAlice> It's… also a confusing sequel / remake hybrid. From the trailers, it's hard to tell what they were going for, in that regard. But blah. XP
[01:19:46] <Doyle> Yea, it presents very strangely. Someone mentioned that most kids growing up today will think that this movie is what Ghostbusters is, and that's what saddens me the most
[01:20:26] <GothAlice> Indeed. Potentially ruined for a generation. :'( The original still holds up, too. (Made even more comical due to the SFX differences with today.)
[01:22:33] <GothAlice> Doyle: Have you increased the namespace allocation, and/or are using the WiredTiger storage engine?
[01:22:45] <GothAlice> Upon consideration (I haven't slept) you may be running into https://docs.mongodb.com/manual/reference/limits/#Number-of-Namespaces with the sheer number of collections you have.
[01:26:19] <GothAlice> So yeah, 21,248 is dangerously close to 24,000, and the number is actually based on namespaces, which include indexes, not just collections. I have absolutely no idea what happens if you exceed that threshold.
[01:28:07] <GothAlice> Your collection count may actually be (at least partially) responsible for the load/io issues.
[01:28:35] <Doyle> I have a nagios check that counts collections. I increased the interval from the default 5m to 20
[01:32:20] <Doyle> How do I check the namespace size?
[01:33:52] <GothAlice> https://docs.mongodb.com/manual/reference/limits/#Size-of-Namespace-File < default is 16MB. nsSize configuration option. As for checking actual utilization, I do not know.
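For reference, the namespace file size is fixed at startup and only applies to MMAPv1 (WiredTiger has no .ns files). A hedged sketch of the relevant mongod.conf stanza; the 128 here is illustrative, not a recommendation:

```yaml
# mongod.conf — MMAPv1 only; affects newly created databases
storage:
  mmapv1:
    nsSize: 128   # MB per .ns file; default 16, maximum 2047
```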
[02:00:28] <GothAlice> You can double-check me on this by looking in your data directory (I'm using WiredTiger, so won't have the corresponding .ns files).
[10:14:00] <robscow> via the shell, db.sc.getIndexes() returns an empty list, yet when querying via my Perl client, it's telling me what the indexes are after I create them - nearly feels like they're not being committed
[11:21:07] <kurushiyama> robscow You sure you are in the correct database when emitting db.sc.getIndexes() ?
[11:33:53] <kurushiyama> robscow We all were at one point in time. I could only interpret the symptom because that is a mistake I make myself – to this day...
[11:34:36] <kurushiyama> robscow Like :xa in _vi_ ;)
[13:24:57] <GothAlice> Zelest: As a very, very important (nay, critical) note on that, you can only update one array's element at a time. If you have > 1 array, you're… gonna have a bad time.
[13:25:27] <GothAlice> (> 1 array either as field siblings, or nested, either is bad for being able to address specific array elements.)
[13:25:41] <cheeser> probably the biggest wart on the mongodb data model
[13:28:07] <GothAlice> In this situation, when you use $elemMatch (or other query operator) to match a value you want to update in the query portion, such as {"a.v": 2}, the $ pointer points at the "a" array element. There's only one $ operator… so you can't also search "b" for a value to update at the same time.
[13:29:33] <GothAlice> There are a few proposals in the ticket system about ways to improve that, but… so far all of them I've seen are worse than the problem. :(
[13:29:54] <Zelest> That's no issue for me atm though :) But thanks for the heads up :)
[13:30:21] <Zelest> But, what's the "best practice" approach on replacing {v: 1} for example?
[13:30:58] <GothAlice> https://jira.mongodb.org/browse/SERVER-6866 — accessing past the $ operator
[13:32:12] <Zelest> Is there any mongodb command to make my boss extend my deadline as well? :P
[13:32:14] <GothAlice> Searching for the value to update and using the $ operator in the update itself to point your change at it. You can also do things like reference elements by index, too, but that's… less good if there's a possibility the order of the array elements may change.
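A pure-Python sketch of what that search-then-`$` update does — not pymongo, just a simulation of the server's positional-operator semantics, with hypothetical field names. The filter `{"a.v": 2}` binds `$` to the index of the first matching element of `a`, and the `$set` then rewrites only that element:

```python
def apply_positional_set(doc, array_field, match_field, match_value, new_value):
    # Simulates filter {"a.v": 2} + update {"$set": {"a.$.v": 3}}:
    # "$" resolves to the FIRST matching array element only.
    for element in doc[array_field]:
        if element.get(match_field) == match_value:
            element[match_field] = new_value
            break  # later matches are left untouched
    return doc

doc = {"a": [{"v": 1}, {"v": 2}, {"v": 2}]}
apply_positional_set(doc, "a", "v", 2, 3)
# doc is now {"a": [{"v": 1}, {"v": 3}, {"v": 2}]}
```

Note how the third element still holds `{"v": 2}` — exactly the "only one `$`" limitation discussed above.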
[13:32:35] <GothAlice> Zelest: Alas, applying the Scotty Factor is beyond the responsibility of MongoDB. ;P
[13:35:37] <GothAlice> "Starfleet captains are like children. They want everything right now and they want it their way. But the secret is to give them only what they need, not what they want."
[13:35:49] <GothAlice> Engineering life lesson, right there.
[13:38:51] <GothAlice> So yeah, on any estimate, double and add half. That'll give you headroom to handle unplanned issues, and if you deliver on time, you're golden. If you deliver early (since it shouldn't actually take the full estimated time), you've worked magic. If you give accurate times and are frequently late due to outside issues, you develop a very different reputation, all because you forgot a little multiplication in your time estimates. ;)
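The arithmetic, for the record — assuming "add half" means half of the original estimate:

```python
def scotty_estimate(honest_hours):
    # "Double and add half": quote 2x + 0.5x = 2.5x the honest estimate.
    return honest_hours * 2 + honest_hours / 2

scotty_estimate(4)   # a 4-hour task is quoted as 10.0 hours
```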
[13:55:53] <jayjo> In my mongo shell, can I have a collection with a - in it?
[13:56:19] <GothAlice> Only if you're careful, and only if you access it as an array subscript instead of attribute of "db". I.e. db["foo-bar"]
[13:56:27] <GothAlice> Doing so makes it difficult to access, so is not recommended.
[13:56:48] <cheeser> most drivers won't even notice but the shell is a bit sensitive.
[13:57:19] <GothAlice> Well, in most languages foo-bar means "foo mathematical minus bar", not "a symbol whose name is 'foo-bar'".
[13:57:33] <jayjo> OK - I should probably rename it then, I have named my dbs and collections with '-' instead of '_'
[13:57:39] <GothAlice> For example, in Python, you wouldn't be able to "easily" access it as an attribute, either.
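A minimal sketch of why: Python, like the shell's JavaScript, parses `foo-bar` as subtraction, so a hyphenated name is only reachable via subscript. (Toy stand-in class, not the real pymongo Database.)

```python
class DB:
    # Toy stand-in for a pymongo-style database object.
    def __getattr__(self, name):
        return "collection:" + name
    __getitem__ = __getattr__  # subscript access works identically

db = DB()
db["foo-bar"]   # fine: "collection:foo-bar"
# db.foo-bar    # would parse as (db.foo) - bar — a NameError, not a lookup
```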
[13:59:59] <GothAlice> As a style thing, I avoid multi-word database, collection, and field names. A thesaurus is handy to find synonyms that are shorter.
[14:00:24] <GothAlice> (Especially if some names are reserved words in your language of choice, i.e. "from" being reserved in Python, so for e-mail records, I use "author" instead for that field. It's also better representative of what the field actually means, which is a nice bonus.)
[14:02:14] <GothAlice> Also, in cases where you have prefixes on fields, i.e. name_first, name_last, that's an indicator that a sub-document might actually be more useful. I.e. {name: {first: "…", last: "…"}}
[14:03:34] <GothAlice> That'll let you search first name ({"name.first": "Bob"}), last name ({"name.last": "Dole"}), or both at once ({"name.first": "Bob", "name.last": "Dole"}). Note the exact-subdocument form, {name: {first: "Bob", last: "Dole"}}, only matches when the embedded document matches exactly — extra fields or different field order and it won't — so dot notation is usually the safer query.
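A sketch of the dot-notation matching being described, simulated in pure Python (MongoDB does this server-side; this only shows the equality semantics):

```python
def matches(doc, query):
    # Minimal equality-only simulation of dot-notation queries.
    for path, expected in query.items():
        value = doc
        for part in path.split("."):
            if not isinstance(value, dict) or part not in value:
                return False
            value = value[part]
        if value != expected:
            return False
    return True

doc = {"name": {"first": "Bob", "last": "Dole"}}
matches(doc, {"name.first": "Bob"})                       # True
matches(doc, {"name.first": "Bob", "name.last": "Dole"})  # True
matches(doc, {"name.last": "Smith"})                      # False
```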
[14:07:07] <Zelest> Can't I use $set when I update a subdocument?
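That question goes unanswered in the log, so for the record: yes — $set with dot notation updates a single subdocument field without replacing the whole embedded document. A pure-Python simulation of the semantics (field names hypothetical):

```python
def apply_set(doc, update):
    # Simulates {"$set": {"name.first": "Robert"}}: only the named leaf changes.
    for path, value in update["$set"].items():
        *parents, leaf = path.split(".")
        target = doc
        for part in parents:
            target = target.setdefault(part, {})
        target[leaf] = value
    return doc

doc = {"name": {"first": "Bob", "last": "Dole"}}
apply_set(doc, {"$set": {"name.first": "Robert"}})
# doc is now {"name": {"first": "Robert", "last": "Dole"}}
```

By contrast, `{"$set": {"name": {...}}}` would replace the entire `name` subdocument.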
[14:34:41] <GothAlice> gitgitgit: Minor note, it'd be nice if you kept your nick non-offensive. As you are using an authenticated account (GitGud) renaming shenanigans are simply annoying and add to the noise, not signal. (I'm sure there's a hilarious reason in another channel for the renames, but still.)
[14:35:09] <gitgitgit> haha ye sorry about that GothAlice i was just joking about in #freenode
[14:38:20] <GothAlice> Aaaand I just realized a consequence of a design decision in my pymongo helper library. Arbitrary type storage. (Any document or sub-document with a _cls import or entry_point plugin reference will load that class and pass the document fields as keyword arguments.) Huh. I meant it to handle Document sub-classes, but really, it'll work with any callable, after thinking about it.
[15:29:28] <saml> to backup big data, i think you can use replication
[15:30:43] <saml> but you can pay mongodb.com and use their cloud service
[15:32:30] <deathanchor> |RicharD|: for that large, a dump would take a long time. You could do an fsynclock on the db and copy the data files and then unlock it when done.
[17:48:45] <cheeser> if you think that costs too much, imagine the costs of catastrophic data loss. ;)
[17:49:05] <GothAlice> And the ongoing time of DBA maintenance.
[17:49:31] <GothAlice> (The big thing for me using the cloud manager is in maintenance automation. My word, it saves time and stress during upgrade roll-outs.)
[17:50:57] <cheeser> yeah. upgrading is awesome actually.
[18:20:09] <GothAlice> Admittedly, the initial backup took three months on my pitiful home connection, which was silly, but. (And for business use, it's $50/machine/year.)
[18:20:11] <|RicharD|> a provider that costs only $5/month for 35TB
[18:20:26] <GothAlice> They… pretty much invented the highest density storage system possible.
[18:20:48] <GothAlice> And if you don't trust them, you can totally build their infrastructure yourself: https://www.backblaze.com/blog/storage-pod-4-5-tweaking-a-proven-design/
[18:20:56] <GothAlice> (Their hardware setups are open source.)
[18:21:30] <|RicharD|> nono it's not that I don't trust them as a company
[18:21:35] <|RicharD|> it's only that it's so cheap.....
[18:21:36] <GothAlice> The backup system is bzip2'd deep storage, so recovery is a "browse the files, pick files to restore, they'll e-mail you or ship you HDDs" situation.
[18:21:43] <|RicharD|> and usually hardware costs money
[18:22:01] <GothAlice> And they offer full encryption, too. I.e. they don't know the files they have for you or what they contain.
[18:22:14] <saml> |RicharD|, why do you backup? do you want easy restoration of mongodb data in case something happens?
[18:27:00] <saml> https://docs.mongodb.com/manual/core/replica-set-hidden-member/ can you use this with dokku ?
[18:27:49] <GothAlice> Because I'm using the personal backup client, I'm forced to mount ZFS snapshots on my Mac to have the "drive" (that is, the mongo data directory on my set of RAID arrays) appear for inclusion in the backup set on my laptop. A little more pain in setup is something I'm fully willing to accept for such inexpensive service; haven't checked out their newer B2 storage yet, though.
[18:28:08] <GothAlice> Of course, the ZFS snapshot approach is not usable in a managed hosted environment, either.
[18:39:32] <GothAlice> Specifically, I'm not wanting to pull in every related record, but only the latest. $lookup + $unwind + $group/first… isn't going to cut it, with the amount of data that'd get pulled in during the $lookup.
[18:40:15] <GothAlice> (Also I need to preserve records with no related data, so the $unwind is out. It eats records with empty arrays.)
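For reference, the pipeline shape being ruled out, written as a pymongo-style list of stage documents (collection and field names hypothetical). Worth noting: since MongoDB 3.2, $unwind accepts a preserveNullAndEmptyArrays option that keeps records with no related data — though that addresses only the empty-array complaint, not the $lookup data-volume one:

```python
# Hypothetical names throughout: "events" collection, "record" foreign key,
# "created" timestamp. Latest-related-record-per-parent via $lookup.
pipeline = [
    {"$lookup": {"from": "events", "localField": "_id",
                 "foreignField": "record", "as": "events"}},
    # preserveNullAndEmptyArrays keeps parents whose "events" array is empty
    {"$unwind": {"path": "$events", "preserveNullAndEmptyArrays": True}},
    {"$sort": {"events.created": -1}},
    {"$group": {"_id": "$_id", "latest": {"$first": "$events"}}},
]
```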
[18:49:11] <|RicharD|> Don't hate me, but I'm using MongoDB only to store some JSON responses... but maybe next time I'll just use postgresql+jsonb
[18:52:40] <deathanchor> |RicharD|: I store lots of data (location, user info, events, etc.) I just like that the values aren't constrained by a size limit other than 16MB for total doc size.
[18:53:14] <GothAlice> |RicharD|: My at-home dataset (that 35 TiB one) is a metadata filesystem based on GridFS storing every bit of digital information I've touched since September 2001, excluding anything automatically purged due to lack of access. At work we run an employment offer (job) distribution platform, applicant tracking software, and event-based analytics suite, plus reporting and a bunch of other stuff.
[18:57:46] <StephenLynx> |RicharD|, I developed lynxhub.com using mongo not only to store the data, but also files.
[18:58:09] <StephenLynx> because gridfs allows me to easily distribute files instead of having to manage them directly on disk.
[18:59:23] <StephenLynx> plus not having to worry about mandatory schemas makes it much easier to develop and update it.
[18:59:38] <StephenLynx> I don't have to run a script to update a table, I just update the data when I have to.
[19:00:29] <StephenLynx> also, by using mongo I don't have to write 2 languages at the same time.
[19:02:15] <GothAlice> StephenLynx: I'm beginning to grapple with the idea of writing MongoDB server-side functions and map/reduce in Python, transpiling Python code into JS. Muhahahaha.
[19:03:08] <StephenLynx> jesus christ, how horrifying
[19:03:35] <cheeser> GothAlice: some day i'll write my js in kotlin. ;)
[19:07:34] <GothAlice> StephenLynx: It ought to be transparent to the developer. I.e. that "just use a Python function for map/reduce" thing would let you use the same code in isolated testing as well as db-side. Optional, of course: https://github.com/marrow/mongo/blob/develop/setup.py#L96 ;)
[19:07:52] <GothAlice> "Mad Science" means never stopping to ask "what's the worst thing that could happen?"
[19:08:28] <cheeser> the worst part of that plan is doing map reduce in mongo
[19:08:33] <StephenLynx> and bad design means never stopping to ask "wtf"
[19:08:57] <GothAlice> Indeed, aggregates are strongly preferred. Some things simply require code, though.
[19:10:27] <GothAlice> StephenLynx: It's been several years since I've really cared about language distinctions. Code is code, they're all just different spelling and grammar for the same ideas. Thinking this way gives a lot of freedom. (The transpiled code is not substantially worse, as code, than hand-written, and the logic is identical, thus, what's the difference?)
[19:11:01] <StephenLynx> it's opaque, confusing, complex.
[19:11:11] <StephenLynx> and with larger margin for error.
[19:11:21] <GothAlice> Considering Marrow Mongo is already beginning to develop a number of simulations of MongoDB behaviour (such as document matching against a query spec, projection, etc.), having code able to be run in both environments is advantageous, too.
[19:11:56] <GothAlice> https://camo.githubusercontent.com/b4777a69407ab7b6b25b9f0f500c6a023fb750e6/687474703a2f2f7777772e7472616e7363727970742e6f72672f696c6c757374726174696f6e732f636c6173735f636f6d706172652e706e67 < a demo from one of the transpiler packages.
[19:12:06] <GothAlice> (Showing source and resulting JS.)
[19:12:31] <StephenLynx> imo your project suffers from feature creep.
[19:12:43] <GothAlice> The translation process is 100% deterministic… hardly opaque or even very complex. Get source AST, run through it generating template-driven chunks of code in another language. (Basically.)
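A toy illustration of that AST walk — a couple of node types only, nothing like a real transpiler such as Transcrypt:

```python
import ast

def emit(node):
    # Translate a tiny expression subset of Python AST nodes into JS source.
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return emit(node.left) + " + " + emit(node.right)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise NotImplementedError(type(node).__name__)

def transpile(source):
    # Single function with a single return: parse, walk the AST, emit JS.
    func = ast.parse(source).body[0]
    args = ", ".join(a.arg for a in func.args.args)
    body = emit(func.body[0].value)
    return "function %s(%s) { return %s; }" % (func.name, args, body)

transpile("def inc(x):\n    return x + 1")
# 'function inc(x) { return x + 1; }'
```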
[19:13:33] <StephenLynx> when you started it, did you document which were the project's goals?
[19:13:34] <GothAlice> That goes for _any_ transpiler.
[19:13:46] <StephenLynx> _any_ transpiler is cancer.
[19:14:01] <StephenLynx> and what are these goals?
[19:20:34] <GothAlice> StephenLynx: A non-middleware replacement for MongoEngine, to support and enhance native pymongo usage, with documentation on specific differences in design documented on the wiki. May I PM?
[19:21:45] <StephenLynx> what does mongoengine do?
[20:06:07] <deathanchor> true when it involves work on my part, but sometimes I provide an answer and if that is confusing, then get clarity :)
[20:06:50] <deathanchor> NYC Tourist: Where is Times Square? I just point in the direction. :D
[20:07:11] <deathanchor> no time for clarification :D
[20:07:40] <deathanchor> so I'm writing a webapp project on my own, using flask with mongoengine
[20:08:11] <deathanchor> I was surprised that the flask and mongoengine tutorials weren't combined in any way
[20:08:21] <GothAlice> deathanchor: Ouch. You may wish to be aware of a number of outstanding issues: https://github.com/marrow/contentment/issues/12 < meta-ticket
[20:09:53] <deathanchor> it was a minor side project
[20:10:33] <deathanchor> what should I use instead?
[20:10:45] <GothAlice> … it doesn't support the $ operator during projection. (Like, basic stuff is missing or broken these days.) I'd recommend using Pymongo instead.
[20:11:34] <deathanchor> no other ORM toolkits available for mongo?
[20:11:42] <GothAlice> Consider what an ORM does for you.
[20:11:53] <GothAlice> I.e. what are the top five things you think it does for you?
[20:12:29] <deathanchor> my project only does simple CRUD work.
[20:12:49] <GothAlice> That's not something it does, that's a broad category of things. If you mean "have an object representing a document where I can access fields as attributes", you can tell pymongo to use ordered attribute access dictionaries or any other desired container for documents, preserving nice attribute access that is the primary benefit of using an active record-based schema system.
[20:13:27] <GothAlice> If you mean "have a schema for my data", MongoDB itself does that for you these days, and will do a better (and way more powerful) job of it: https://docs.mongodb.com/manual/core/document-validation/
[20:13:58] <GothAlice> So for simple CRUD, pymongo's Collection.insert_one, insert_many, find_one, find, etc., etc. are absolutely perfect.
[20:14:10] <deathanchor> not using it for schema, it was just a simple way to reduce my code down to a few lines instead of writing my own data handlers.
[20:15:04] <deathanchor> get this object (doc from db), and update it, done.
[20:15:57] <GothAlice> That… is a very simple operation, easily accomplished with no "extraneous" code with the bare driver. http://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find_one_and_update
[20:16:55] <GothAlice> Not sure how an ODM would simplify that. Instead of referring to a collection, you'd refer to the class, …
[20:17:03] <deathanchor> most of my code for work I use only pymongo, I was toying with mongoengine for my little side project.
[20:17:22] <GothAlice> The "simplification" is mostly an illusion.
[20:18:05] <GothAlice> Now, there are some very specific things that ORM/ODMs do that might be useful. Django-style query or update arguments, for example. (I.e. .update(set__field=value) instead of needing dictionaries)
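That kwargs-to-update-document translation is small enough to sketch in a few lines — a simplification in the spirit of what's being described, with a hypothetical (partial) operator table:

```python
OPS = {"set": "$set", "inc": "$inc", "unset": "$unset", "push": "$push"}

def build_update(**kwargs):
    # Translate Django-style kwargs, e.g. set__field=value,
    # into a MongoDB update document: {"$set": {"field": value}}.
    # Remaining double underscores become dots for subdocument paths.
    update = {}
    for key, value in kwargs.items():
        op, _, field = key.partition("__")
        update.setdefault(OPS[op], {})[field.replace("__", ".")] = value
    return update

build_update(set__name="Bob", inc__stats__views=1)
# {'$set': {'name': 'Bob'}, '$inc': {'stats.views': 1}}
```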
[20:18:09] <deathanchor> yeah, so is mongoengine going to die off, or does it just have lots of known problems?
[20:18:54] <GothAlice> It's basically dying. Has been since 0.10 broke basically everything.
[20:19:14] <deathanchor> ho hum. back to pymongo solo I go :)
[20:19:22] <GothAlice> https://github.com/marrow/mongo/blob/develop/marrow/mongo/query/djangoish.py#L174-L196 < the "django-style filters" thing is this bit of code, BTW. 22 lines, including whitespace and docstring. ;)
[20:19:42] <GothAlice> Admittedly with a fairly large configuration: https://github.com/marrow/mongo/blob/develop/marrow/mongo/query/djangoish.py#L77-L127
[20:20:29] <deathanchor> eh, I'll write my own models for flask outside of mongoengine.
[20:21:31] <GothAlice> deathanchor: Pure schema, with optional data translation and validation, is the reason I wrote: https://github.com/marrow/schema#readme — I evaluated and dissected 194 other libraries that touch on schemas to build this generic implementation (where all 194 evaluated are supersets).
[20:21:35] <django_> whats the purpose of the: "_id" : ObjectId("577d66041d5bf3cd5f4b706b"),
[20:21:41] <GothAlice> There are some tricks to the metaclass that most seem to forget.
[20:22:12] <deathanchor> django_: think of it as a unique field (or primary key)
[20:22:44] <django_> deathanchor, data in different IDs are not related?
[20:22:53] <GothAlice> django_: ObjectIds are structured compact objects containing four separate fields. A simple string that is the hex value of it is explicitly _not_ the same as the real binary object that the example you gave would produce. By default if _id is not set on a record when you insert it, one will be generated for you. It's the default primary key.
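Those four fields, in the classic ObjectId layout of this era, are a 4-byte Unix timestamp, 3-byte machine identifier, 2-byte process id, and 3-byte counter. The creation time can be recovered from the hex string with the stdlib alone, using the _id from django_'s example:

```python
from datetime import datetime, timezone

oid = "577d66041d5bf3cd5f4b706b"
seconds = int(oid[:8], 16)   # first 4 bytes: creation timestamp
created = datetime.fromtimestamp(seconds, tz=timezone.utc)
# seconds == 1467835908, i.e. 2016-07-06 — matching this very chat log
```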
[20:27:58] <GothAlice> deathanchor: Oh, and if you like SQLAlchemy-style query building, marrow.mongo uses "query-aware fields" in a similar way to SQLA: https://github.com/marrow/mongo/blob/develop/marrow/mongo/query/__init__.py#L199 — m.mongo Document instances act seamlessly as dictionaries for easy passing to pymongo, for example.
[20:29:31] <GothAlice> https://gist.github.com/amcgregor/6ddbda735e6ded267d31 for a comparison. (m.mongo isn't production ready or complete, yet, but input would be greatly appreciated! I'm trying to avoid the mistakes of MongoEngine this time around: https://github.com/marrow/mongo/wiki/Design-Considerations)
[20:38:56] <GothAlice> Biggest thing yet to tackle: handling translated documents. There's all sorts of complication involved in having an object with individually accessible fields where some of the fields are actually stored in a multilingual array.
[20:43:04] <deathanchor> I'll give it a whirl with my little side project, summer is tough with all the BBQ and beach going :D
[20:48:10] <|RicharD|> where you live deathanchor ?
[21:45:38] <GothAlice> LOL, sorry, need to be more careful when grabbing my laptop to pack away. Gotta not grab the OTP token glued into the USB socket.
[22:29:55] <Doyle> GothAlice, that $5/mo plan with backblaze, is there an acceptable use policy?
[22:31:14] <GothAlice> Doyle: There's a terms of service, aye. It's per-machine licensed, attached drives. https://www.backblaze.com/company/terms.html
[22:32:19] <GothAlice> The attached drives thing seems to be a backup client-enforced limitation, not a policy-based one in the terms, or I wouldn't be doing my ZFS shenanigans. ;)
[22:33:57] <GothAlice> Point-in-time snapshot, remote mount of the snapshot on my laptop with suitable FUSE options to make it appear as a physical drive instead of network attached storage.
[22:34:55] <Doyle> ahh, gotcha. What's your storage appliance?
[22:36:28] <GothAlice> https://github.com/osxfuse/osxfuse/wiki/Mount-options#fsid is one of the options, see also fsname, fssubtype, and fstypename (with related .plist configuration within the fuse.fs bundle).
[22:37:03] <GothAlice> Three iSCSI Drobo 8-something-i's, paired with one of three 1U Dell rack servers each, replica set between those three nodes.
[22:40:14] <GothAlice> And shovelling drives is the big reason I went with Drobo arrays; initially more expensive for the enclosures, but boy howdy the zero downtime upgrades and replacements are a joy. SSD hot cache on each, too, of course. (That was a pain… Sandforce controller use isn't always advertised, and their chipsets are utterly borked by not honouring flush requests.)
[22:41:32] <GothAlice> Not to mention variable IO performance on Sandforce chips due to transparent compression use. They're terrible. XP
[22:42:47] <Doyle> Those are some good features for these little things
[22:43:22] <GothAlice> The way they're implemented internally is also really smooth. Drobos ext format the individual drives and use them for logical stripe storage, similar to how MongoDB itself does bulk disk allocations.
[22:43:51] <GothAlice> … and the filesystem/partition table empty pattern de-duplication rocks.
[22:44:01] <Doyle> They're improving on what I saw coming out of synology a few years back
[22:44:14] <Doyle> I think they termed it synology raid, allowing for mixed drive sizes, etc
[22:44:27] <GothAlice> Aye, the marketing term / trademark is "BeyondRAID" these days.
[22:45:28] <Doyle> Do you drobo DR to an offsite drobo?
[22:47:56] <GothAlice> (Mostly because many include call-home functionality and firewall punching. There exists no universe where I'll enable something that potentially dangerous on one of my networks. Similarly: Skype is disallowed on my networks. ;)