#mongodb logs for Wednesday the 1st of May, 2019

[05:40:55] <bsima> let's say you have a bunch of data in mongo already - how do you determine what the schema/data types are?
[10:20:59] <GothAlice> bsima: Before any data gets inserted, I have formal structures. https://mongo.webcore.io/#declarative-document-modeling
[10:21:47] <GothAlice> If I somehow had existing data without those formal structures, I'd first determine the formal structure. There are some tools, like Studio 3T, which can "analyze" an existing collection, identifying all of the property names used and statistical information over which types are most often represented/used.
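A sketch of what that kind of collection analysis looks like done server-side, assuming MongoDB 3.4.4+ for $objectToArray; "things" is a placeholder collection name:

```javascript
// Inventory every top-level field name and the BSON types actually stored
// under it, with a count of how often each (name, type) pair occurs.
db.things.aggregate([
  { $project: { kv: { $objectToArray: "$$ROOT" } } }, // doc -> [{k, v}, ...]
  { $unwind: "$kv" },                                 // one entry per field
  { $group: {                                         // count (name, type) pairs
      _id: { field: "$kv.k", type: { $type: "$kv.v" } },
      count: { $sum: 1 }
  } },
  { $sort: { count: -1 } }
])
```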
[10:23:14] <Popzi> GothAlice thanks for your reply yesterday btw :-) I managed to get something similar, albeit much more basic than yours. Mongo is quite hard to grasp at first in node xd
[10:23:55] <GothAlice> Popzi: Sorry to say, starting with such a language doesn't benefit one much. JS does not make understanding easy.
[10:24:44] <GothAlice> Ref: https://www.ecma-international.org/ecma-262/6.0/#sec-abstract-relational-comparison and the following two sections which explain why you should never use ==.
[10:25:15] <GothAlice> Most languages aren't this damaged.
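For context, a few of the coercions those spec sections define (any JS REPL); none of them hold under ===, which is why === is the safe default:

```javascript
console.log(0 == "");           // true:  "" coerces to the number 0
console.log(0 == "0");          // true:  "0" coerces to the number 0
console.log("" == "0");         // false: two strings, so no coercion at all
console.log(null == undefined); // true,  yet null == 0 is false
console.log(NaN == NaN);        // false: NaN equals nothing, itself included
console.log([] == false);       // true:  [] -> "" -> 0, false -> 0
```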
[10:25:46] <Popzi> Mmm, isn't mongo used most commonly with js/node though?
[10:25:56] <GothAlice> It's used with everything, regardless.
[10:26:00] <GothAlice> It doesn't play favourites.
[10:27:01] <GothAlice> Interesting example problems that commonly crop up: use of Mongoose. Almost any use of Mongoose. It can end up creating collections for you named, literally, "[object Object]" (assisting people trying to delete these collections is fun!), and has a terrible tendency to store ObjectId values as 24-character hex textual strings, instead of BSON type 7, binary ObjectId, which can ruin comparisons and range querying completely.
[10:27:33] <GothAlice> (Not to mention storage efficiency.)
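A sketch of the mismatch being described, in the mongo shell; the collection name and hex value are invented for illustration:

```javascript
// A 24-char hex string _id and a real ObjectId _id are different BSON types,
// so both inserts succeed -- two "identical" ids now coexist.
db.docs.insertOne({ _id: "5cc9a6cf9e8f1a2b3c4d5e6f" });
db.docs.insertOne({ _id: ObjectId("5cc9a6cf9e8f1a2b3c4d5e6f") });

db.docs.find({ _id: ObjectId("5cc9a6cf9e8f1a2b3c4d5e6f") }).count(); // 1, not 2

// Range querying by creation time only sees the type-7 values; string-typed
// ids fall outside the ObjectId type bracket entirely.
db.docs.find({ _id: { $gte: ObjectId("5cc9a6cf0000000000000000") } });
```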
[10:33:38] <GothAlice> Of course, there's always this classic gem for some pointers at what to look out for: https://www.destroyallsoftware.com/talks/wat
[10:34:01] <GothAlice> Array(16).join([]-{}) + " Batman!"
[10:35:59] <Popzi> You are very knowledgeable GothAlice =) I'll be sure to take a look when I get home, ty
[10:37:27] <GothAlice> (The result of the above in a REPL shell or browser dev tools prompt is: 'NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN Batman!', because, of all the addition and subtraction that doesn't make sense, [] - {}, at least, actually is Not a Number.)
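Broken down step by step (any JS REPL):

```javascript
[] - {}                  // NaN: [] -> "" -> 0, {} -> "[object Object]" -> NaN
Array(16)                // sixteen empty slots
Array(16).join([] - {})  // the NaN separator is stringified: 15 joined "NaN"s
Array(16).join([] - {}) + " Batman!"
// -> 'NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN Batman!'
```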
[11:01:07] <synthmeat> alice's back around. awesome.
[11:05:22] <GothAlice> synthmeat: Only because I literally forgot to close this tab, and that I have yet to discover a reason for the ObjectId hardware identification bytes change.
[11:06:01] <synthmeat> :(
[11:06:02] <GothAlice> I've dug up lots of "update the documentation" or "bring the client driver in line with" JIRA tickets, but none describing the "original" decision to not use the richer interpretation and generation scheme.
[11:06:58] <GothAlice> Specifically, "random bytes" is egregious when before we had <host ID> and <pid on that host> which could be used in statistics to aggregate by originating host.
[11:08:07] <GothAlice> Sure, it originally used MD5. That wasn't the best choice. For FIPS compliance, that got replaced with an FNV hash. I can deal with that. Throwing out the baby with the bathwater and going with pure random? Sad. :P
[11:08:57] <synthmeat> maybe it's exactly so it has more entropy / less inadvertent info leakage. i never knew you could get that data like that.
[11:09:52] <synthmeat> have i been leaking info by objectids all this time?
[11:10:04] <GothAlice> And creation time.
[11:10:08] <synthmeat> that i knew
[11:10:34] <GothAlice> The hardware ID is also heavily hashed with a majority of the hash result thrown away. Highly pseudo-anonymous.
[11:11:33] <GothAlice> Random gives a statistical probability for collision. Admittedly, with 40 bits of randomness, pretty highly unlikely. With a new opportunity for that type of collision on every application process start (when calculating that hardware identifier segment). This means, two simultaneous process starts might (esp. with bad RNG seeding) get the same HWID. Whereupon the only thing preventing a collision off the bat is the random IV of the counter.
[11:11:39] <synthmeat> you could maybe guess sharding strategy like that?
[11:13:14] <GothAlice> With the hardware+PID approach, you would literally have to generate 16.777 million records in a single second on a single process on a single machine before possibly encountering a collision. I really do like guarantees about things like that.
[11:14:27] <GothAlice> synthmeat: Nope, not unless you're using broken client drivers in your server-side app. ObjectIds are constructed by application-side drivers, if _id keys are missing, not the server (unless literally not supplied an _id key as a last resort). The HWID is meant to be the application HWID, not the DB node.
[11:14:55] <GothAlice> Or, well, was.
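For readers of the log, a sketch of the byte layout under discussion (plain JS); the hex value is invented, and the offsets come from the ObjectId spec:

```javascript
const hex = "5cc9a6cf1a2b3c4d5e6f7a8b"; // made-up example ObjectId

// Both old and new schemes: first 4 bytes are seconds since the epoch.
const created = new Date(parseInt(hex.slice(0, 8), 16) * 1000);

// Legacy layout, the one being mourned above:
const machine = hex.slice(8, 14);  // 3-byte hash of the host id (MD5, later FNV for FIPS)
const pid     = hex.slice(14, 18); // 2-byte originating process id
const counter = hex.slice(18, 24); // 3-byte randomly-seeded counter:
                                   // 2^24 = 16,777,216 ids/second/process before wrap

// The current spec collapses bytes 4-8 into a per-process random value and
// keeps a 3-byte counter, so aggregating by originating host/pid as described
// above is no longer possible.
```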
[17:15:54] <bsima> thanks GothAlice
[18:34:50] <GothAlice> In defining a “number” type of field, allowing minimum and maximums to be defined is a fairly obvious feature. Can anyone think of a good reason to include an optional “must be evenly divisible by” limit? (Requiring n % setting == 0 to be valid.)
[18:40:14] <GothAlice> Hmm. That could incidentally permit limitations on precision. setting=0.01 would ultimately result in remainders only if defining sub-0.01 (e.g. 0.001) quantities. Hmmmmm.
[19:01:28] <GothAlice> 0.27 % 0.01 == 1.214306433183765e-17 != 0. Thanks, IEEE Double Precision Floating Point, you’re helping!
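One float-safe way to implement that check is to compare the remainder against a tolerance from both sides; a sketch with hypothetical names, not the validator itself:

```javascript
// True when value is an integer multiple of step, within a tolerance that
// absorbs IEEE-754 remainder noise.
function isMultipleOf(value, step, eps = 1e-9) {
  const r = Math.abs(value % step);
  // The noise can land just above zero (0.27 % 0.01 ~ 1.2e-17) or just below
  // the step itself (0.03 % 0.01 ~ 0.00999...), so test both ends.
  return r < eps || Math.abs(r - step) < eps;
}

isMultipleOf(0.27, 0.01);  // true, even though 0.27 % 0.01 !== 0
isMultipleOf(0.275, 0.01); // false: sub-0.01 precision is rejected
```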
[21:04:07] <Sasazuka> for getting the maximum of a value, I should still stick to group / aggregate instead of sort / limit (pre-4.0)?
[21:06:50] <GothAlice> Sasazuka: Is that recommendation documented somewhere? (Link?) sort/limit isn’t egregious, given the sort operation will be smart enough to limit its tracking to the one. Aggregate is, however, direct, explicit, and clear in intent.
[21:07:13] <GothAlice> (Of sort/limit/skip, skip is the “dangerous” one.)
[21:07:18] <Sasazuka> https://docs.mongodb.com/manual/core/aggregation-pipeline-optimization/#agg-sort-limit-coalescence -- says it was a change for 4.0
[21:10:24] <GothAlice> Sasazuka: That’s for $sort + $limit stages in an aggregate. db.collection.find_one(…) sort/limit should maintain the hint association, though I’m having difficulty digging up mention of that in the indexing docs on my iPad.
[21:12:15] <GothAlice> (Given in a plain find there is no way to re-order the “stages”, it’s a single invocation.)
[21:15:12] <GothAlice> And the “clear in intent” bit would be $group + $max aggregation, resolving to a single returned document containing that maximum.
[21:15:30] <Sasazuka> GothAlice: thanks, I'll use that then
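The two approaches side by side in the mongo shell; collection and field names are placeholders:

```javascript
// Explicit and clear in intent: a single document carrying the maximum.
db.items.aggregate([
  { $group: { _id: null, max: { $max: "$price" } } }
])

// The sort/limit equivalent: the server only has to track the current top
// document, and an index on { price: -1 } makes either approach cheap.
db.items.find({}, { price: 1 }).sort({ price: -1 }).limit(1)
```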
[21:16:28] <worstadmin> I keep creating a user using db.createUser with auth enabled, and the user is just gone when I do a db.getUsers(). I'm getting a success message, but the user just isn't there after I create it
[21:34:50] <worstadmin> it's being removed somehow even though this is happening on the primary