[02:00:40] <gbell_> Noobie question. Just thinking it has all the capabilities of an RDB so, once abstracted, they would both look the same to the application.
[02:00:57] <gbell_> From reading the docs, it looks like that "once abstracted" part is the hairy one - even if using an ORM.
[02:01:26] <GothAlice> MongoDB does not support joins on standard find queries. (And only supports LEFT OUTER joins, via $lookup, in aggregate queries.) All "relational" behaviour is purely application-simulated.
[02:01:47] <GothAlice> Simulated through the use of additional queries to load "related" data.
[02:02:11] <GothAlice> Thus, typically, treating MongoDB as a relational DB is both the naive approach, and sub-optimal in nearly every case.
[02:02:35] <GothAlice> (No matter how good your ODM is at faking it.)
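For reference, the aggregate-only join is the `$lookup` stage. Below is a minimal sketch that assumes `books` documents hold a bare `publisher` ObjectId and that a `publishers` collection is keyed by `_id`. Both names are illustrative. The pipeline is plain data that could be passed to PyMongo's `collection.aggregate()`. The pure-Python `simulate_lookup` helper is hypothetical and only mimics the left-outer behaviour, so the example runs without a server. Short strings stand in for ObjectIds.

```python
from typing import Any

# Aggregation pipeline using $lookup; with PyMongo it would run as
# db.books.aggregate(LOOKUP_PIPELINE). Collection names are assumptions.
LOOKUP_PIPELINE: list[dict[str, Any]] = [
    {
        "$lookup": {
            "from": "publishers",       # collection to "join" against
            "localField": "publisher",  # field on each book
            "foreignField": "_id",      # field on each publisher
            "as": "publisher_docs",     # output array field
        }
    }
]


def simulate_lookup(books: list[dict], publishers: list[dict]) -> list[dict]:
    """Mimic $lookup locally: every book is kept (left outer), and
    publisher_docs holds the matching publishers, or [] if none match."""
    out = []
    for book in books:
        matches = [p for p in publishers if p["_id"] == book.get("publisher")]
        out.append({**book, "publisher_docs": matches})
    return out


books = [{"_id": 1, "publisher": "p1"}, {"_id": 2, "publisher": "missing"}]
publishers = [{"_id": "p1", "name": "Wrox"}]
joined = simulate_lookup(books, publishers)
print([len(b["publisher_docs"]) for b in joined])  # [1, 0]
```

The second book has no matching publisher. It is still returned, but with an empty array. That is the LEFT OUTER behaviour.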
[02:03:58] <gbell_> Interesting. So if I have a catalog of books, and I want a field to point to a publisher document (which then has contact info), what does Mongo say?
[02:05:55] <GothAlice> Consider: when looking at the details of the book, you most often require the title of the publisher, but not the full information. You might choose to embed a copy of the publisher's name along with the "reference" (which is just an ObjectId) to save on needing a secondary query just to load the most common information. Then, if the user actually does request details on the publisher (say, by tapping a magnifying glass button to open a
[02:05:56] <GothAlice> pop-over) you'd make the additional query then.
[02:06:19] <GothAlice> I.e. book = {isbn: "...", publisher: {_id: ObjectId(…), name: "Wrox"}}
[02:06:42] <GothAlice> (Publisher's names change rarely, but when they do, you can easily update all of the "cached" references, too.)
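Here is a minimal sketch of that "cached reference" pattern. Plain Python dicts stand in for a live collection, and short strings stand in for ObjectIds. `make_book` and `rename_publisher` are hypothetical helpers. Against a real server, the rename would be a single PyMongo `update_many` call with a `$set` on `publisher.name`, as shown in the docstring.

```python
def make_book(isbn: str, title: str, publisher_id: str, publisher_name: str) -> dict:
    """Book document embedding a copy of the publisher's name next to its id."""
    return {
        "isbn": isbn,
        "title": title,
        "publisher": {"_id": publisher_id, "name": publisher_name},
    }


def rename_publisher(books: list[dict], publisher_id: str, new_name: str) -> int:
    """Refresh every cached copy of a publisher's name; returns how many matched.

    With PyMongo the equivalent would be:
        db.books.update_many({"publisher._id": publisher_id},
                             {"$set": {"publisher.name": new_name}})
    """
    matched = 0
    for book in books:
        if book["publisher"]["_id"] == publisher_id:
            book["publisher"]["name"] = new_name
            matched += 1
    return matched


books = [
    make_book("isbn-foo", "Foo", "wrox", "Wrox"),
    make_book("isbn-bar", "Bar", "other", "Other Press"),
]
# Listing titles with publisher names needs no second lookup:
print([(b["title"], b["publisher"]["name"]) for b in books])
print(rename_publisher(books, "wrox", "Wrox Press"))  # 1
```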
[02:07:24] <gbell_> Reading, thanks. But one of the Mongo docs' own examples stored a publisher id in the book...
[02:08:07] <GothAlice> Indeed. It's not verboten, it's just sub-optimal in a majority of cases. Improving data locality, i.e. moving data closer to where it's consumed as the example above does, is one kind of optimization.
[02:08:34] <GothAlice> Where SQL concentrates on purity of structure, i.e. finding new and ever more complex ways to apply the spreadsheet paradigm to your data, MongoDB concentrates on structuring your data to optimize how it actually gets used.
[02:09:11] <gbell_> "The spreadsheet paradigm". Awesome :) Hmm. Lots more reading and thinking.
[02:09:24] <GothAlice> An example I often use is that of forums. It's a model most people are immediately familiar with: forums have threads have replies.
[02:10:03] <gbell_> If I want the data where I use it, then I want the publisher data included with my book in the document. But then I'm repeating data (e.g. publisher name, etc.)
[02:10:09] <GothAlice> In my forums, since all the replies for a thread would need to be cleaned up when deleting the thread, and when viewing a single reply, I'll need all of the information about the thread anyway, I embed replies within their thread. One document to load, ability to append new replies, slice specific pages of replies out, etc.
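Here is a rough sketch of the thread-with-embedded-replies layout, again simulated with dicts. The field names (`title`, `replies`, `author`, `body`) are assumptions, and all four helpers are hypothetical. Two of the helpers build the documents MongoDB would take: a `$push` update to append a reply, and a `$slice` projection (`[skip, limit]`) to fetch one page of replies from the single thread document. The other two apply those documents locally, so the example runs without a server.

```python
def append_reply_update(author: str, body: str) -> dict:
    """Update document appending a reply, e.g. for
    db.threads.update_one({"_id": thread_id}, append_reply_update(...))."""
    return {"$push": {"replies": {"author": author, "body": body}}}


def page_projection(page: int, per_page: int) -> dict:
    """Projection returning one page of replies (pages are 1-based), e.g. for
    db.threads.find_one({"_id": thread_id}, page_projection(2, 20))."""
    return {"replies": {"$slice": [(page - 1) * per_page, per_page]}}


def apply_locally(thread: dict, update: dict) -> None:
    """Apply a $push update to an in-memory thread (only $push is handled)."""
    for field, value in update["$push"].items():
        thread.setdefault(field, []).append(value)


def slice_locally(thread: dict, projection: dict) -> list:
    """Mimic the $slice projection on an in-memory thread."""
    skip, limit = projection["replies"]["$slice"]
    return thread["replies"][skip : skip + limit]


thread = {"_id": 1, "title": "Hello", "replies": []}
for i in range(5):
    apply_locally(thread, append_reply_update(f"user{i}", f"reply {i}"))

page2 = slice_locally(thread, page_projection(page=2, per_page=2))
print([r["body"] for r in page2])  # ['reply 2', 'reply 3']
```

Deleting the thread document removes its replies along with it. This is the clean-up property mentioned above.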
[02:10:35] <gbell_> Interesting. No violation of DRY there.
[02:10:43] <GothAlice> "Repeating data" isn't a bad thing if it facilitates more efficient use of the data. Thus in my first book example, including the publisher name saves an extra query for each book looked up.
[02:11:00] <GothAlice> That's potentially a serious optimization if you're listing 10,000 books.
[02:11:27] <gbell_> As it would be for an RDBMS, right? More joins = slower.
[02:11:29] <GothAlice> (Without it, and without a cache, you'd be making 10,001 queries to display those 10,000 books with their publisher's name.)
[02:12:11] <GothAlice> So, you get to pick where your repetition is, and where you really care about DRY. DRY, to me, applies to code. Not data.
[02:13:31] <gbell_> If I have publishers in a different collection, wouldn't that be 20,000 queries for 10,000 books?
[02:13:35] <GothAlice> If each book looks like this: {isbn: "...", publisher: ObjectId(…), title: "Foo"}
[02:13:49] <GothAlice> And you find 10,000 books: db.books.find()
[02:14:23] <GothAlice> When looping over those books, you'd need to load up the publisher for each: for book in db.books.find(): publisher = db.publishers.find_one({'_id': book['publisher']})
[02:14:39] <GothAlice> That makes for 10,000 queries to look up publishers, and 1 query to look up all of the books.
[02:15:04] <gbell_> RIGHT. Wow. Do you work with Mongo daily?
[02:15:05] <GothAlice> Or… you just include the name of the publisher with the reference as my first example shows. Then it's 1 query to list all books with publisher names.
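You can check the query arithmetic above with a tiny in-memory stand-in for a collection that counts how many queries it serves. `CountingCollection` is a hypothetical helper, not a driver class. Its `find` and `find_one` only loosely mirror PyMongo's method names, and they support equality filters on top-level fields only.

```python
from typing import Optional


class CountingCollection:
    """Minimal fake collection that counts every query it answers."""

    def __init__(self, docs: list[dict]):
        self.docs = docs
        self.queries = 0

    def find(self, flt: Optional[dict] = None) -> list[dict]:
        self.queries += 1
        flt = flt or {}
        return [d for d in self.docs if all(d.get(k) == v for k, v in flt.items())]

    def find_one(self, flt: dict) -> Optional[dict]:
        matches = self.find(flt)  # counted once, inside find()
        return matches[0] if matches else None


N = 10_000
publishers = CountingCollection([{"_id": i, "name": f"Publisher {i}"} for i in range(10)])

# Reference-only books: one query for the books, plus one per book for its publisher.
ref_books = CountingCollection([{"_id": n, "publisher": n % 10} for n in range(N)])
for book in ref_books.find():
    publisher = publishers.find_one({"_id": book["publisher"]})
reference_total = ref_books.queries + publishers.queries

# Books embedding {_id, name}: the single books query already carries the names.
emb_books = CountingCollection(
    [{"_id": n, "publisher": {"_id": n % 10, "name": f"Publisher {n % 10}"}} for n in range(N)]
)
names = [book["publisher"]["name"] for book in emb_books.find()]
embedded_total = emb_books.queries

print(reference_total, embedded_total)  # 10001 1
```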
[02:15:36] <gbell_> That explains it :) I don't seem to work with anything daily, and these days I feel like I do more reading than coding :)
[02:16:24] <GothAlice> http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework < this explores the consequences of different ways of storing the same data, as another example.
[02:16:32] <GothAlice> Trade-offs in query performance and database size.
[02:16:51] <gbell_> Will read, thanks. So 1 query vs. 10,001. Amazing.
[02:17:09] <GothAlice> You can see why treating MongoDB like a relational database can quickly go wrong.
[02:17:53] <gbell_> Won't be doing that. Though, here's another doc that diagrams using references/relationships: https://docs.mongodb.com/manual/core/data-modeling-introduction/
[02:18:36] <GothAlice> That documentation page immediately links to a description of the pros and cons of using a pseudo-relational approach.
[02:18:48] <gbell_> Yep. Like I said, reading :) And reading.
[02:19:09] <gbell_> So, before reading everything I need to, it seems with Mongo I have the choice, depending on my requirements for db size or speed. Whereas with an RDBMS I don't.
[02:19:59] <GothAlice> And I wasn't joking about SQL == spreadsheets. It can take quite a bit of effort to un-learn the bad habits such limiting data structures encourage. ;P Entity Attribute Value (EAV) on SQL is an example of this. MongoDB flat-out replaces EAV.
[02:20:25] <GothAlice> (A document being an entity, and BSON naturally being attribute/value storage.)
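To make the EAV comparison concrete, this sketch shows the same entities two ways. First they appear as EAV rows, the `(entity, attribute, value)` triples a SQL EAV table would hold. Then they are folded into one document each. `eav_to_documents` is a hypothetical helper, and the sample attributes are invented for illustration.

```python
from collections import defaultdict

# EAV: one row per (entity, attribute, value), as an SQL EAV table stores it.
eav_rows = [
    ("book:1", "title", "Foo"),
    ("book:1", "pages", 320),
    ("book:2", "title", "Bar"),
    ("book:2", "edition", 2),
]


def eav_to_documents(rows):
    """Fold EAV triples into one document per entity: the document is the
    entity, and its keys/values are the attribute/value pairs."""
    docs = defaultdict(dict)
    for entity, attribute, value in rows:
        docs[entity][attribute] = value
    return [{"_id": entity, **attrs} for entity, attrs in docs.items()]


for doc in eav_to_documents(eav_rows):
    print(doc)
```

The two resulting documents carry different attribute sets. In a document store that needs no extra table machinery.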
[02:21:04] <gbell_> Yeah, the tables mindset is ingrained.
[02:21:20] <gbell_> Thanks so much for the time, explanations, and reading references.
[02:23:10] <GothAlice> Oh, that's another thing, gbell_. Before you run off thinking you need an ODM (Object Document Mapper, like an ORM for MongoDB), MongoDB document validation utterly wrecks standard schemas in terms of what you can do. https://docs.mongodb.com/manual/core/document-validation/
[02:23:44] <GothAlice> Anything you can do in a .find() query, you can do in a validation document. (Conditional fields, type checking, bounds checking, …)
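As a sketch of that point, here is a validator written with the same query operators a `find()` filter uses. It covers type checks, a bounds check and a conditional field. The collection and field names follow the book example and are assumptions. `create_books_collection` is a hypothetical helper that expects a PyMongo `Database`, whose `create_collection` passes extra keyword arguments such as `validator` through as collection options. No server is assumed, so only the validator document itself is exercised here.

```python
from typing import Any

# Query-operator validator: the same syntax as a find() filter.
BOOK_VALIDATOR: dict[str, Any] = {
    "isbn": {"$type": "string"},
    "title": {"$type": "string", "$ne": ""},
    "publisher._id": {"$type": "objectId"},
    "publisher.name": {"$type": "string"},
    # Conditional field: pages may be absent, but if present must be an int >= 1.
    "$or": [
        {"pages": {"$exists": False}},
        {"pages": {"$type": "int", "$gte": 1}},
    ],
}


def create_books_collection(db: Any) -> Any:
    """Create 'books' with the validator; `db` is assumed to be a pymongo Database."""
    return db.create_collection(
        "books",
        validator=BOOK_VALIDATOR,
        validationAction="error",  # reject invalid inserts/updates outright
    )
```

Because the validator lives in the database, it also applies to writes that do not go through this application's code.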
[02:24:18] <GothAlice> More and more my own projects are just using the plain MongoDB driver.
[02:24:24] <gbell_> Alright, word salad there. It seems a lot of OOP architecture is intent on being able to replace the database backend with anything. Thus the DAL, I think? So with Mongo delivering JSON-like objects, does the DAL for it turn into just about a pass-thru?
[02:25:19] <GothAlice> Depends on the language. Pymongo, for example, keeps the data in an ordered dictionary. MongoEngine on top of it then wraps it in a full OOP model with declarative schema. It's excessive.
[02:25:49] <GothAlice> Most ODMs like MongoEngine wrap every single thing the driver provides. The performance cost can be an order of magnitude or more.
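On the DAL question: with the bare driver, the data access layer can be little more than a pass-through, because the driver already hands back plain dictionaries. Here is a minimal sketch that assumes any object with a PyMongo-style `find_one` method. `BookRepository` and `FakeCollection` are hypothetical names. The dict-backed fake stands in for a real collection so the example runs on its own.

```python
from typing import Optional, Protocol


class CollectionLike(Protocol):
    """The slice of a PyMongo Collection this repository relies on."""

    def find_one(self, filter: dict) -> Optional[dict]: ...


class BookRepository:
    """Thin DAL: builds queries, returns driver documents untouched."""

    def __init__(self, books: CollectionLike):
        self.books = books

    def by_isbn(self, isbn: str) -> Optional[dict]:
        return self.books.find_one({"isbn": isbn})


class FakeCollection:
    """In-memory stand-in so the sketch runs without a server."""

    def __init__(self, docs: list[dict]):
        self.docs = docs

    def find_one(self, filter: dict) -> Optional[dict]:
        for doc in self.docs:
            if all(doc.get(k) == v for k, v in filter.items()):
                return doc
        return None


repo = BookRepository(FakeCollection([{"isbn": "isbn-foo", "title": "Foo"}]))
print(repo.by_isbn("isbn-foo"))  # {'isbn': 'isbn-foo', 'title': 'Foo'}
```

Swapping the backend means passing a different object into the repository. No mapping layer sits between the documents and the caller.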
[02:26:27] <gbell_> by "wrecks" above, you mean "smashes", like a good thing?
[02:26:44] <gbell_> That was stupid, I replaced one slang word with another.
[02:26:46] <GothAlice> Aye, document validation is infinitely more powerful than any OOP declarative schema I've seen so far.
[02:27:11] <GothAlice> To the point that I'm writing my own declarative schema system that generates validation documents instead of rolling its own validation.
[02:27:44] <GothAlice> (And does no wrapping of the bare driver at all.)
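A declarative schema can emit a validation document instead of validating in Python. Here is that idea sketched with a dataclass. It is an illustration under stated assumptions, not the schema system mentioned above. `to_validator` is hypothetical and handles only a few scalar types. It targets MongoDB's `$jsonSchema` validator form, which uses `bsonType` names, and it treats fields without defaults as required.

```python
from dataclasses import MISSING, dataclass, fields
from typing import get_type_hints

# Python type -> BSON type alias used by $jsonSchema's bsonType.
BSON_TYPES = {str: "string", int: "int", float: "double", bool: "bool"}


def to_validator(cls) -> dict:
    """Build a {"$jsonSchema": ...} validator from a dataclass's annotations."""
    hints = get_type_hints(cls)
    properties, required = {}, []
    for f in fields(cls):
        properties[f.name] = {"bsonType": BSON_TYPES[hints[f.name]]}
        if f.default is MISSING and f.default_factory is MISSING:
            required.append(f.name)
    schema = {"bsonType": "object", "properties": properties}
    if required:  # only emit "required" when something is actually required
        schema["required"] = required
    return {"$jsonSchema": schema}


@dataclass
class Book:
    isbn: str
    title: str
    pages: int = 0


print(to_validator(Book))
```

The resulting dictionary is what you would pass as `validator=` when creating the collection. The database does the checking, and the class never wraps driver results.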
[02:29:51] <gbell_> So it doesn't sound like I'd be too naive to use Mongo directly with a NodeJS application...
[02:31:00] <GothAlice> Should work great. In the Node ecosystem, avoid Mongoose. It's one example of an ODM that violates a number of expectations. (People often end up accidentally storing ObjectIds in the database as 24-character hex strings, at an overhead of 17 bytes per instance, and one user managed to create a collection named "[object Object]" which was… unfortunate.)
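You can work out the ObjectId-as-string overhead from the BSON specification. An ObjectId value is 12 raw bytes. A string value is a 4-byte little-endian length, the UTF-8 bytes, and a trailing NUL. The 24 hex characters alone account for 12 extra bytes, and the string framing adds 5 more. The sketch encodes both forms by hand with `struct`, so no BSON library needs to be installed. The helper names and the sample id are hypothetical. The per-element type byte and key name are identical in both cases, so they are left out.

```python
import struct

OBJECT_ID_HEX = "5f1d7f3e9b1e8a3c4d2b1a09"  # arbitrary 24-hex-digit example id


def objectid_value(hex_id: str) -> bytes:
    """BSON ObjectId value (element type 0x07): the 12 raw bytes."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    return raw


def string_value(text: str) -> bytes:
    """BSON string value (element type 0x02): int32 length (bytes incl. NUL),
    the UTF-8 bytes, then a NUL terminator."""
    data = text.encode("utf-8") + b"\x00"
    return struct.pack("<i", len(data)) + data


as_oid = len(objectid_value(OBJECT_ID_HEX))
as_str = len(string_value(OBJECT_ID_HEX))
print(as_oid, as_str, as_str - as_oid)  # 12 29 17
```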
[02:32:01] <GothAlice> I can basically recommend to get comfortable with the basics (and bare driver) before considering adding more layers on top. ;)
[02:34:36] <gbell_> Great. That's always the dilemma - skip the low-level stuff or make sure you've got it down? Most of us use compilers without understanding assembly language :)
[02:35:14] <GothAlice> The raw driver is hardly analogous to machine code. It's not like you're writing to the socket or anything; it's already a framework, and a darn good one.
[02:36:50] <gbell_> Yep, not really low level, you're right.
[20:16:13] <StephenLynx> anyone here familiar with GridFSBucketWriteStream?
[20:16:30] <StephenLynx> I am trying to write a buffer to it, but I never get a callback executed