#mongodb logs for Thursday the 27th of June, 2019

[12:22:40] <wtiger> hi all, I have a mongodb collection with data, now I want to update the schema (using mongoose)
[12:23:08] <wtiger> how do I go about it?
[12:58:37] <GothAlice> wtiger: There is no schema; issue an update operation applying the change you want. E.g. if you want to remove a field, use $unset, etc.
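
A minimal sketch of the kind of update GothAlice describes, in the mongo shell; the collection name "profiles" and field name "legacyField" are hypothetical:

    // Drop a no-longer-wanted field from every document in the collection.
    db.profiles.updateMany(
        {},                                // match all documents
        { $unset: { legacyField: "" } }    // remove the field wherever it exists
    )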
[13:01:47] <wtiger> GothAlice: thanks, what about the mongoose schema?
[13:02:01] <wtiger> just update it too after updating the db?
[13:06:55] <GothAlice> wtiger: The application-side data models will definitely need updating; however, MongoDB itself couldn’t care less what extra tools you are using to massage your data. Changing your app-side model won’t change anything in the DB, it’ll need explicit update instructions.
[13:07:37] <GothAlice> The reverse is more generally comprehensible; if you update your DB-side data, it only makes sense that your application model would need to match it.
[13:07:39] <wtiger> GothAlice: gotcha, thanks a ton!! happy mongoing
[13:08:31] <GothAlice> wtiger: I also have the extremely strong urge to warn you about Mongoose. It… has a history of damaging calm.
[13:08:47] <wtiger> lol
[13:09:05] <GothAlice> That is, encouraging certain mistakes, e.g. somehow ending up with a collection literally named “[object Object]” (a fun student challenge to figure out how to delete or rename that)…
[13:09:26] <wtiger> what do you recommend instead?
[13:09:28] <GothAlice> … or it has an extreme tendency to store ObjectId values as hex-encoded text strings, not ObjectIds, which breaks filtering and sorting.
[13:09:32] <GothAlice> Literally anything else.
[13:09:36] <wtiger> lmao
[13:09:41] <wtiger> sqlalchemy? :P
[13:09:55] <wtiger> sequelize?
[13:10:10] <GothAlice> Literally anything else.
[13:11:51] <GothAlice> wtiger: For example, here’s a user being bitten by Mongoose who will swear up and down he doesn’t have a problem, despite asking about a problem. https://stackoverflow.com/a/55386123/211827
[13:11:54] <GothAlice> Thanks, Mongoose.
[13:13:42] <GothAlice> (See also the links to possible duplicates I added as comments on the question. This is an extremely common problem with that library.)
[13:17:33] <wtiger> makes sense!!
[13:17:56] <wtiger> thanks for the heads up
[13:19:16] <jokke> hi there o/
[13:19:25] <wtiger> o/
[13:19:30] <wtiger> how are you?
[13:20:52] <jokke> we're running a TSDB on mongo (WiredTiger storage backend), and since it's been unmaintained for a while with bad monitoring, yada yada, we just realized that the indexes on individual shards have grown to 25 GB. The shards only have 8 GB of RAM
[13:21:17] <jokke> is this reported size already the compressed index size? AFAIR WiredTiger compresses the index.
[13:22:34] <jokke> db.stats() on mongos reports an index size of 39079936000
[13:23:38] <jokke> db.stats() on the first shard reports 11847487488
[13:24:10] <jokke> and on the second shard 27232432128
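
For reference, the numbers jokke quotes come from fields like these; db.stats() reports sizes in bytes unless given a scale factor:

    // Total index size for the current database, in bytes.
    db.stats().indexSize

    // The same figure scaled to GiB in one step.
    db.stats(1024 * 1024 * 1024).indexSize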
[13:24:30] <jokke> which kinda looks like a hot shard to me
[13:24:45] <jokke> although the shard key shouldn't be monotonically growing
[13:26:41] <jokke> ah but it might be...
[13:27:04] <jokke> ObjectIds are monotonically growing?
[13:27:22] <jokke> this is our shard key: { "_id.data_logger_id" : 1, "_sharding_hash" : 1 }
[13:27:35] <jokke> data_logger_id is an ObjectId
[13:28:02] <jokke> so i guess it will cause the whole index to grow monotonically...
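
A sketch of how the imbalance could be confirmed from a mongos, plus the usual remedy for monotonic keys; the namespace "tsdb.readings" is hypothetical, and note that (at least as of this log) an existing collection's shard key cannot be changed in place:

    // Per-shard document counts and data sizes for one collection.
    db.readings.getShardDistribution()

    // For a new collection, a hashed shard key sidesteps monotonic growth.
    sh.shardCollection("tsdb.readings", { "_id.data_logger_id": "hashed" })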
[13:29:55] <jokke> uhhhhh wat
[13:30:00] <jokke> the balancer isn't running?
[13:30:59] <GothAlice> jokke: https://docs.mongodb.com/manual/reference/bson-types/#objectid — yes and no.
[13:31:48] <jokke> yeah but still
[13:32:20] <GothAlice> Formerly, an ObjectId was a multi-component packed struct containing: timestamp, hardware ID (hash of hostname), process ID, and a per-process counter whose initial value was randomized on each startup. This gave a guarantee of no collisions until generating 16.777 million records per second from a single process on a single host.
[13:32:56] <GothAlice> Today it is a multi-component packed struct that is essentially useless by comparison: timestamp, 5 random bytes generated once per process startup, and a counter.
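
Either way the leading bytes are a timestamp, which is what makes an ObjectId-prefixed shard key grow monotonically; easy to see in the mongo shell:

    var id = ObjectId()
    id.getTimestamp()   // ISODate recovered from the first 4 bytes
    id.str              // 24 hex chars: 8 (timestamp) + 10 (random) + 6 (counter)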
[13:33:24] <jokke> i see
[13:34:35] <jokke> this is very bad news...
[13:34:49] <jokke> it would take forever to migrate the db to a new shard key
[13:35:13] <jokke> i just hope that the balancer can mend things a bit
[13:36:13] <jokke> given that there aren't too many jumbo chunks...
[13:38:01] <jokke> god damn, i spent so much time researching shard keys and even added the _sharding_hash field to maximize query isolation and cardinality
[13:38:44] <jokke> and now, several years later i realize that i f'ed it up by having the key start with an objectId
[13:40:30] <GothAlice> It does happen. Heck, I had just shy of 10TiB of data in MySQL before I migrated Exocortex to MongoDB, 10 years ago. MySQL BLOB storage as a filesystem was the greatest single mistake I’ve made in my computing career. ;P
[13:40:36] <jokke> GothAlice: do you know whether the reported index size is the compressed or the uncompressed size?
[13:40:36] <GothAlice> (Today, that dataset is ~40TiB…)
[13:40:58] <jokke> haha :)
[13:41:11] <GothAlice> I do not. I would assume / hope indexes would be uncompressed; they’re packed B-tree structures, and speed is critical. Can’t be wasting time unpacking memory pages.
[13:41:26] <jokke> hmm yeah
[13:42:08] <jokke> hmm shard key: { "_id.data_logger_id" : 1, "_sharding_hash" : 1 }
[13:42:10] <jokke> oops
[13:42:15] <jokke> https://docs.mongodb.com/manual/core/wiredtiger/#compression
[13:42:17] <jokke> ^
[13:43:25] <wtiger> GothAlice: please excuse my noob/broad question - how would you rate mongodb as compared to postgresql in terms of performance etc
[13:43:29] <GothAlice> Heh, “prefix compression” is a silly name. That’s not compression per se; it’s just elision of repeated common prefixes. (A happy side-effect of tree-based storage for these things.)
[13:43:37] <wtiger> you seem like you have a lot of experience with mongo
[13:44:13] <GothAlice> wtiger: Almost 8 years ago I benchmarked 1.9 million distributed RPC calls per second mediated over MongoDB, requiring three inserts per RPC call, i.e. 5.7 million inserts per second.
[13:45:31] <wtiger> and?
[13:46:27] <GothAlice> Today, I have event analytics collections (hit tracking, intent to apply to jobs, actually applying, being hired, …) containing 1,626,284 aggregate reporting periods—that’s hours of data, not individual event tallies.
[13:46:42] <GothAlice> (Yes, that’s 185 years worth of analytics.)
[13:46:45] <jokke> wtiger: i'd say it totally depends on what you want from your db
[13:47:04] <wtiger> GothAlice: mm
[13:47:33] <jokke> wtiger: if you just want to dump documents into a db really really fast, mongodb is your choice i'd say
[13:47:45] <jokke> but postgres is crazy fast in terms of relational data
[13:48:59] <GothAlice> Postgres also lets you get away with some interesting things; it’s an extremely powerful tool.
[13:49:31] <GothAlice> For example, at work we ran Postgres storage servers that had no permanent storage (physical, network, whatever) at all, beyond occasional backups to S3.
[13:50:13] <GothAlice> Recovery point objective: one minute of data loss, maximum. A new node spins up, sees no cluster, restores from the latest snapshot, replays the archived WAL up to the point of failure, and then spins up friends for itself.
[13:50:24] <GothAlice> If a node spins up and sees an existing cluster, it just replicates from that.
[13:50:31] <GothAlice> WAL-e FTW.
[13:50:37] <jokke> :)
[13:52:40] <jokke> what bothers me about mongo is that the whole javascript webdev crowd sees it as a general-purpose db - which imho it totally isn't - and uses it just because it has a javascript query language.
[13:53:22] <wtiger> woah
[14:02:36] <GothAlice> Then there was that web application and database server that had no moving parts. Boot from DVD-RAM, entire DVD loaded into RAM…
[14:03:18] <GothAlice> jokke: It is a general purpose DB. It also replaces/combines at least six services, eliminating a huge amount of infrastructure management burden.
[14:03:42] <jokke> relations in mongodb are a huge pain
[14:04:08] <GothAlice> E.g. on one project I joined two weeks into needs analysis, they had picked: Postgres (general data storage), Redis (task queues), Membase (caches), ZeroMQ (fast queues), RabbitMQ (queues with persistence), …
[14:04:31] <GothAlice> jokke: Failure before leaving the gate: thinking it’s relational, and trying to use it like a relational database.
[14:05:01] <jokke> yeah but most ODMs do exactly this
[14:05:10] <GothAlice> On that example project, a week after I joined, all of that had been replaced with MongoDB. (That’s where the 1.9 million dRPC test I did came from: to prove it’d work.)
[14:05:47] <GothAlice> https://gist.github.com/amcgregor/4207375
[14:13:24] <jokke> GothAlice: where would i find the logs for balancer runs?
[14:13:45] <jokke> i have plenty of aborted migrations here and would like to know why
[14:14:03] <jokke> in mongos logs?
[14:14:36] <jokke> or the logs for the config server
[14:15:36] <jokke> hm nah, nothing there
[14:16:59] <GothAlice> jokke: Search all of your logs for messages involving “moveChunk migrate”.
[14:17:59] <jokke> nothing...
[14:18:02] <GothAlice> For some additional info, see: https://stackoverflow.com/questions/40494330/how-i-can-debug-mongodb-slow-chunk-migration and https://dba.stackexchange.com/a/81580 (which includes a modern update)
[14:18:16] <jokke> could it be on the shards themselves?
[14:18:37] <GothAlice> Ultimate question: is there a balancer? sh.isBalancerRunning()
[14:18:55] <GothAlice> Yes, shards would generally instigate transfers amongst themselves, from what I recall.
[14:19:45] <GothAlice> https://docs.mongodb.com/manual/core/sharding-balancer-administration/#chunk-migration-procedure — balancer triggers chunk movement, the source shard performs the moveChunk migration itself.
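
The quick health checks from a mongos, per the docs GothAlice links:

    sh.getBalancerState()    // true if the balancer is enabled at all
    sh.isBalancerRunning()   // true if a balancing round is in progress right now
    sh.status()              // summary, including recent migration failures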
[14:20:03] <jokke> ah yeah, it's on the shards
[14:21:38] <GothAlice> And on the “most ODMs do exactly this” point: only the ones that either don’t get it, or that are crutches for users migrating from relational ORMs. E.g. Django. Don’t get me started on brain-damaged DAOs.
[14:21:55] <jokke> yeah
[14:22:00] <jokke> mongoid too
[14:24:18] <jokke> lol :D
[14:26:30] <GothAlice> (And for people who think “relational is great!”, I have this as a wonderful demonstration of how relational is not great, esp. for modelling things like role-based access control. Which you’d think would be a great use.)
[14:26:45] <GothAlice> http://s.webcore.io/1ae9e5220d30/2010-07-25--query-from-hell.png — this is an actual SQL query, not just a UML representation of things.
[14:27:19] <jokke> :)
[14:27:20] <GothAlice> Or, on the “holy sh*t, did nobody actually stop, think, and plan before doing?” page, relational hell: http://s.webcore.io/2Q121s2S2d1E/model.pdf
[14:27:25] <jokke> nothing is great for everything
[14:28:38] <GothAlice> I have found MongoDB to be great at everything I have ever tried to do with it. From 40TiB bulk filesystem storage via GridFS (for reals), to general data storage, to minimal relational information (LEFT JOIN is a-OK!), to distributed realtime push queues, to caching with auto-expiry via TTL indexes, …
[14:28:54] <GothAlice> The only thing I punted to another tool was graph storage. Neo4J is good people, for that.
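
As one concrete instance of the list above, the TTL-based caching GothAlice mentions is a single index definition; the collection name "cache", field name "createdAt", and 600-second lifetime are illustrative:

    // Documents are deleted automatically once createdAt is older than 600 seconds.
    db.cache.createIndex(
        { createdAt: 1 },
        { expireAfterSeconds: 600 }
    )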
[15:45:08] <synthmeat> GothAlice: i haven't given it enough thought to discuss it, but i think i'm kinda finding nested dictionaries a very, very good replacement for arrays. basically, now i'm trying to effectively only have docs as "arrays" in my data model.
[15:46:05] <GothAlice> That sounds like a shot to the foot in slow motion.
[15:46:15] <synthmeat> works very well for sparse data
[15:46:35] <GothAlice> I.e. volunteering to write the next “angry at MongoDB because I didn’t read the docs or understand NoSQL limitations” blog post.
[15:46:44] <synthmeat> i.e. cryptocurrency price history. doc is `coins.{BTC,ETH,etc.}` so i can do $exists on it
[15:46:56] <GothAlice> Highly dynamic keys (key:value storage, here) are extremely detrimental. I.e. you can’t index them.
[15:47:39] <synthmeat> i don't need to though
[15:47:41] <GothAlice> Simple example, translated text. {name: {en: "Example", fr: "exemple"}} — now you can’t search for names, good job.
[15:48:16] <GothAlice> {local: [{language: "en", name: "Example"}, {language: "fr", name: "exemple"}]} — now you CAN index on local.name.
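
A sketch of that second shape in use, assuming a hypothetical "docs" collection holding documents like GothAlice's example:

    db.docs.createIndex({ "local.name": 1 })

    // Matches whichever array element carries the name.
    db.docs.find({ "local.name": "Example" })

    // $elemMatch pins language and name to the same array element.
    db.docs.find({ local: { $elemMatch: { language: "en", name: "Example" } } })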
[15:49:01] <synthmeat> ok, if i don't need to index, what's the downside?
[15:49:16] <synthmeat> what's more space efficient, arbitrary sized arrays or this?
[15:49:34] <GothAlice> https://github.com/marrow/mongo/blob/develop/marrow/mongo/core/trait/localized.py?ts=4 — noting that a Mapping field (i.e. dictionary) is actually implemented as an array of sub-documents like the example given.
[15:50:01] <GothAlice> Arrays, actually, are more compact than dictionaries, which store every single key with every single instance of those keys, in every single document in the collection.
[15:50:32] <GothAlice> However, that’s an astoundingly narrow-focused premature optimization, to be worrying about that particular detail.
[15:51:02] <synthmeat> well, it isn't premature, i came to it because array version was too big
[15:52:06] <GothAlice> (I.e. that’s a thing to worry about when you start approaching millions of array elements. Specifically, the upper limit of 2 million nulls by virtue of the 16MB document size limit. 2 million.)
[15:53:07] <GothAlice> (Otherwise, that 16MB document size limit represents a novel containing more than 200 novel-sized chapters. In terms of user-contributed text content, no user of mine has ever come close to approaching that.)
[15:55:45] <synthmeat> ok, so, this reduced collection size by, uh, 3x iirc
[15:57:04] <synthmeat> but it's fine, easy to migrate when i need to
[15:57:09] <synthmeat> i'll keep it in mind, thanks
[16:06:08] <GothAlice> One of my favourite articles touching on a number of MongoDB storage concerns: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html
[16:07:50] <synthmeat> yeah, i read that one. this works for me 100%
[16:17:04] <ZackW> Hello! I'm a bit new to mongoDB, and I'm attempting to use the $group aggregator, but I'm not sure I understand how to use it. I'm attempting to count all documents WHERE field='thing', grouped by each unique value of another field
[16:17:42] <ZackW> At present, I'm using $match and $count to get the total count of documents WHERE field='thing'
[16:20:41] <ZackW> To prevent this becoming an X/Y question, I'm a system administrator for some folks who run a minecraft server, and I'm attempting to get the number of a certain kind of block each player has broken
[16:22:06] <ZackW> the important parts of the document are going to be {Player: "some_long_uuid",Event: "break", Target: "minecraft:some_block"}
[16:56:31] <ZackW> Nevermind! I think I've figured it out after fiddling enough with Compass
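
For the record, a sketch of the pipeline ZackW describes, using the document shape he gives above; the collection name "events" is hypothetical:

    db.events.aggregate([
        // Only "break" events for the block of interest.
        { $match: { Event: "break", Target: "minecraft:some_block" } },
        // One output document per player, with a running tally.
        { $group: { _id: "$Player", broken: { $sum: 1 } } }
    ])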
[21:04:12] <Rojola> hi
[21:04:21] <Rojola> may I please ask for help?
[21:04:56] <Rojola> I am using a software named "wekan"
[21:05:04] <Rojola> this "wekan" board uses MongoDB
[21:05:17] <Rojola> now, for some reason, the board slowed down *massively* over the past week
[21:05:29] <Rojola> in the #wekan channel, nobody has responded for hours
[21:05:35] <Rojola> so I considered asking here.
[21:05:44] <Rojola> Could this be a problem with MongoDB?
[21:05:54] <Rojola> how would I go about bugfixing MongoDB being slow?
[21:31:10] <GothAlice> Rojola: Step 1: profile and prove there is a problem with MongoDB, and not your application or your application's use of it.
[21:31:58] <GothAlice> Right, never mind, he left. Effort of asking a question, no patience waiting for an answer. :/