PMXBOT Log file Viewer


#mongodb logs for Friday the 26th of August, 2016

[00:43:26] <hackel> I'm using the $lookup aggregation with a one-to-one relationship. Is there a way to make it return just the matching document, instead of a single-item array? Or a projection I can apply to do this?
[01:11:29] <GothAlice> hackel: $lookup followed by $unwind on the target field.
[01:12:11] <GothAlice> Critical note when using $unwind this way: it'll drop records that have no result in the $lookup (i.e. empty fields, or fields for which no foreign document matches.)
[01:12:45] <hackel> GothAlice: Thanks, I finally found the $arrayElemAt operator that seems to be doing what I want. Any disadvantage over $unwind?
[01:15:28] <GothAlice> hackel: Nothing comes to mind; I haven't tested any performance difference, and $unwind does have that rather large caveat, so $arrayElemAt is probably best. (I need to re-review the docs after each release… TIL $arrayElemAt is a thing! ;)
[01:17:11] <hackel> Heh, yeah things certainly change quickly. First time I've had to use $lookup, actually.
[01:18:40] <GothAlice> That actually solves a problem for some of my aggregate reports, so thank you, hackel! :)
[01:19:02] <GothAlice> (No more foo.0 references in my report column definitions, yay!)
[01:19:13] <hackel> Haha, yes that's what I was hoping to avoid as well.
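A sketch of the pipeline shape discussed above (the "orders"/"customers" collection and field names are illustrative, not from the log): $lookup always produces an array, and $arrayElemAt pulls out the single matched document without $unwind's caveat of dropping unmatched rows.

```python
# Hypothetical one-to-one join: each order references one customer.
pipeline = [
    {"$lookup": {
        "from": "customers",         # foreign collection
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",            # result is always an array
    }},
    # Replace the one-element array with its first element. Unlike
    # $unwind, this keeps documents whose lookup array came back empty
    # (the field just becomes missing/null instead of the doc vanishing).
    {"$addFields": {"customer": {"$arrayElemAt": ["$customer", 0]}}},
]

# With pymongo this would run as: db.orders.aggregate(pipeline)
print(pipeline[1]["$addFields"]["customer"])
```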
[07:00:49] <sumi> hello
[12:26:24] <iamacomputer> greetings all
[12:26:59] <iamacomputer> is there a way of building the mongo client without building the entire db?
[12:56:55] <Fenaer> I'm new to MongoDB, and I'm struggling to think about designing schemas in a non-relational way. Anyone got some time to PM me and help me through some simple questions?
[13:07:08] <Derick> Fenaer: better ask in channel, so others can learn from it too
[13:07:24] <Fenaer> Derick: Alright
[13:07:39] <Fenaer> I'm making a small app, that has players and rooms
[13:07:54] <Fenaer> Players can move between rooms, and rooms can hold many players
[13:09:14] <Fenaer> Is it better to have a collection of room documents that hold a list of players, or a collection of player objects that holds the name of the room (although that feels a bit too much like a foreign key)
[13:09:21] <Fenaer> Or is there some other way that I'm not seeing?
[13:09:32] <StephenLynx> I'd go with the latter.
[13:09:45] <StephenLynx> it offers more flexibility with your players.
[13:10:22] <StephenLynx> and foreign keys are not inherently wrong in mongo. they are just easy to misuse.
[13:10:38] <Fenaer> I got the impression that they weren't performant?
[13:11:25] <StephenLynx> if not misused, that impact is not relevant.
[13:11:38] <Fenaer> What would count as misuse?
[13:11:48] <StephenLynx> let's say that this foreign key leads you to perform one more find using an indexed field.
[13:11:50] <StephenLynx> that's ok.
[13:12:08] <StephenLynx> now
[13:12:18] <StephenLynx> let's say you have to perform a bunch of operations because of this
[13:12:36] <StephenLynx> and everything in your model is a foreign key
[13:12:48] <StephenLynx> now that I would categorize as a misuse.
[13:13:24] <Fenaer> At that point I'd just use an RDBMS
[13:13:31] <Fenaer> Doesn't make sense to use Mongo for that
[13:13:36] <StephenLynx> exactly.
[13:13:59] <StephenLynx> mongo doesn't streamline your work.
[13:14:12] <StephenLynx> you have to think about what you should do.
[13:14:35] <iamacomputer> Why not have your rooms contain a list of player ids
[13:14:49] <StephenLynx> that's valid too.
[13:15:05] <StephenLynx> if his rooms hold more information than just a name.
[13:15:37] <StephenLynx> if they are just a name, they might as well live only as a name on the player document.
[13:16:06] <Fenaer> I need rooms to have multiple bits of information
[13:16:19] <Fenaer> Including links to other rooms
[13:16:24] <StephenLynx> yeah, I'd have documents for both players and rooms and have rooms with a list of players.
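The shape StephenLynx lands on can be sketched as two document types (all field names and ids below are illustrative): rooms carry the richer data, including links to other rooms and the list of players present, while each player keeps a lean back-reference to its room.

```python
# Hypothetical document shapes for the rooms/players design.
room = {
    "_id": "lobby",
    "name": "The Lobby",
    "exits": ["hallway", "garden"],   # links to other room _ids
    "players": ["p1", "p2"],          # _ids of players currently here
}

player = {
    "_id": "p1",      # the external unique id reused as the primary _id
    "name": "Fenaer",
    "room": "lobby",  # back-reference; a foreign key, used sparingly
}

# Moving a player is then a few indexed updates, e.g. with pymongo:
#   db.rooms.update_one({"_id": "lobby"}, {"$pull": {"players": "p1"}})
#   db.rooms.update_one({"_id": "garden"}, {"$push": {"players": "p1"}})
#   db.players.update_one({"_id": "p1"}, {"$set": {"room": "garden"}})
print(player["room"], room["players"])
```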
[13:17:04] <iamacomputer> Also, while secondary, it will help isolate your code of where you modify player data
[13:17:24] <iamacomputer> so all the mutations of the players will be (hopefully) in one place
[13:17:37] <iamacomputer> and you can later go through them and double check you're not messing something up
[13:17:50] <iamacomputer> anyhows
[13:18:19] <Fenaer> I have an external ID that references the player. Is it better to use this as the primary ID of the document, or as another field somewhere else
[13:18:26] <iamacomputer> I'll repost the question about the mongodb client, now, once more in 6 hours, not trying to spam tho
[13:18:46] <StephenLynx> external id that references the player?
[13:18:46] <iamacomputer> is there a way of building the mongo client without building the entire db? ?? ?? <- still building client ;-)
[13:19:06] <StephenLynx> iamacomputer, you REALLY have to compile it from source code?
[13:19:13] <iamacomputer> yes
[13:19:15] <StephenLynx> welp.
[13:19:19] <StephenLynx> dunno then.
[13:19:19] <iamacomputer> but I don't want to compile the whole thing
[13:19:20] <Derick> why?
[13:19:21] <iamacomputer> takes hours
[13:19:40] <StephenLynx> are you using -j X? :v
[13:19:43] <iamacomputer> also, when I'm deploying I was hoping to have the machine it's deployed to compile the client
[13:20:02] <iamacomputer> so I don't have to worry about architectures/linux versions/etc
[13:20:11] <StephenLynx> which distro?
[13:20:16] <iamacomputer> but if it takes 6 hours to deploy a computer in this fashion, it is not an option
[13:20:17] <iamacomputer> all distros
[13:20:37] <iamacomputer> I pick the computer via which is cheapest on the cloud at that time from multiple providers
[13:20:51] <StephenLynx> I am 100% sure it provides centOS 7
[13:21:03] <StephenLynx> I suggest picking it and using mongo's repository for yum.
[13:21:14] <iamacomputer> I'm also using osx
[13:21:16] <iamacomputer> and windows
[13:21:27] <StephenLynx> :::::::::::::^^^^^^^^^^^^^^^)
[13:21:31] <iamacomputer> anyhow, must run
[13:21:34] <Derick> you definitely shouldn't need to compile it for that
[13:21:41] <Derick> we have binaries
[13:21:49] <StephenLynx> Lasciate ogne speranza, voi ch'intrate ("Abandon all hope, ye who enter")
[13:40:05] <Fenaer> StephenLynx: As in an ID that is unique to the player
[13:40:13] <Fenaer> So I guess I should just set _id to that then
[13:40:22] <Fenaer> I think I answered my own question
[13:40:29] <StephenLynx> yeah, I usually reference stuff by their _id
[13:40:41] <StephenLynx> unless I'd rather have some readable reference that happens to be a unique field.
[14:15:07] <Rumbles> hi
[14:17:50] <Rumbles> I've got a few mongo nodes set up and I am logging to a file, then collecting those files with rsyslog, using logrotate to rotate the files daily. I have noticed though that my log files are followed by rsyslog, then when the log files rotate it stops following the file. Can anyone tell me if I have made a mistake in my config? https://paste.fedoraproject.org/414359/14722204/
[14:18:30] <Rumbles> I read the docs and a few other sites but I'm not really getting an idea of what I'm doing wrong
[14:19:06] <Rumbles> I'm guessing that logrotate renames the file, the kill makes mongo create a new log file, but then rsyslog doesn't see the new log file...
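One common arrangement that avoids the renamed-file problem Rumbles describes (a sketch, assuming mongod 3.0+ with `systemLog.logRotate: reopen` set in mongod.conf, so SIGUSR1 makes mongod reopen the same log path instead of renaming it; the paths and pidfile location below are illustrative):

```conf
# /etc/logrotate.d/mongod
/var/log/mongodb/mongod.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        # Ask mongod to reopen its log at the original path, so
        # rsyslog's imfile keeps following the same filename.
        /bin/kill -USR1 $(cat /var/run/mongodb/mongod.pid) 2>/dev/null || true
    endscript
}
```

With the default `rename` behavior instead, mongod moves the log aside on SIGUSR1 and rsyslog's file monitor is left watching a path that no longer receives writes, which matches the symptom described.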
[16:25:39] <cjhowe> hi! how do i submit the documents for this spec? https://github.com/mongodb/specifications/blob/master/source/server_write_commands.rst
[16:25:46] <cjhowe> is it using runCommand from the old wire protocol?
[16:26:04] <cjhowe> we tried using that but didn't get an n value, is that because it's stored in the op_reply?
[17:23:19] <richardsmd> if one wanted to statefully progress through a very large collection in pieces, would
[17:23:19] <richardsmd> find({_id: {$gt: my_last_id}}).sort({$natural: 1}) be a sensible method?
[17:23:33] <StephenLynx> no.
[17:23:45] <StephenLynx> i'd sort by _id instead.
[17:25:52] <GothAlice> richardsmd: When performing long iteration, it's important to have a sort key and track the "last seen value" for that sort in order to continue from where you left off in the event the query times out while processing.
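The resume pattern GothAlice describes can be sketched as follows. The in-memory `find`-like step below is only a stand-in so the logic is self-contained; with pymongo the real query per batch would be `db.coll.find({"_id": {"$gt": last_id}}).sort("_id", 1).limit(n)`.

```python
def resumable_scan(collection, batch_size=2):
    """Yield every document, restarting each batch from the last seen _id."""
    last_id = float("-inf")          # sentinel below any real _id
    while True:
        batch = sorted(
            (d for d in collection if d["_id"] > last_id),
            key=lambda d: d["_id"],
        )[:batch_size]
        if not batch:
            return
        for doc in batch:
            yield doc
        last_id = batch[-1]["_id"]   # resume point if the cursor dies

docs = [{"_id": i} for i in (3, 1, 5, 2, 4)]
seen = [d["_id"] for d in resumable_scan(docs)]
print(seen)                          # → [1, 2, 3, 4, 5]
```

Because each batch is an independent query keyed on the last seen `_id`, a cursor timeout mid-scan costs nothing: the next batch simply picks up from `last_id`.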
[17:26:57] <richardsmd> for which _id is a good indicator (since it starts with a timestamp of sorts) and won't involve large amounts of time to query/sort for?
[17:27:56] <GothAlice> There is basically zero chance of a collision, making that the "stateful" progress indicator of choice.
[17:28:50] <richardsmd> i see that part of _id is a random component so it would appear _id presents a *small* chance of missing values (only those inserted the exact same second). is that correct or am i misinterpreting?
[17:29:06] <StephenLynx> no, that's wrong.
[17:29:08] <GothAlice> Additionally, since they're incrementing with time as you mention, newly added records are naturally added to the end.
[17:29:17] <StephenLynx> _id is generated sequentially by drivers.
[17:29:32] <GothAlice> richardsmd: No, you'd need to generate more than 16 million IDs from a single app worker in a single second for there to be a collision.
[17:29:39] <GothAlice> https://docs.mongodb.com/manual/reference/method/ObjectId/
[17:29:54] <StephenLynx> unless your driver is crap or you are setting them manually, they won't be created out of order.
[17:29:54] <GothAlice> Timestamp, machine, process, counter per process with random initialization.
[17:30:36] <StephenLynx> yeah, it's not exactly sequential
[17:30:36] <richardsmd> GothAlice: I'm reading that exact page. my concern is that the last 3 bytes are "random" but, per StephenLynx, the random is sequential (good for sorting) as opposed to truly random (meaning if I stop iteration at any given ID I have no guarantee that other ids won't sort *before* it IFF they were created in the same second)
[17:30:43] <StephenLynx> but it's pretty safe to expect them to be ordered.
[17:30:57] <GothAlice> Meaning, to have the same timestamp value between two, they'd need to be generated in the same second. But, since machine and process ID are in there, that's limited to two in one second per machine and process… but with a 3-byte (24-bit) counter… 16.777 million records per second per process per host is the collision limit.
[17:31:10] <GothAlice> The last three bytes are _not_ random, unless you have a bad driver.
[17:31:25] <GothAlice> It's _initialized_ with a random value when the process starts, but is an incremental counter.
[17:31:32] <richardsmd> ahh, the counter *starts* with a random value and is then incremental
[17:32:32] <GothAlice> I really appreciate the deterministic behaviour vs. things like UUIDs, which only have statistical collision limits. (I.e. generate one per second until the heat death of the universe gives you a 50% chance of a collision. Eew. ;)
[17:32:56] <GothAlice> (Not to mention a UUID as a 16-byte structure, ObjectIds are 12-byte.)
[17:33:18] <richardsmd> i think it strikes an excellent balance between auto incrementing (many problems) and UUID
[17:33:35] <GothAlice> Also… you can range query them to search for records by creation time range.
[17:34:05] <richardsmd> which is basically just querying the first bytes of the timestamp though, right?
[17:34:17] <GothAlice> now = datetime.utcnow; {_id: {$gt: now - timedelta(weeks=2), $lte: now}}
[17:34:23] <GothAlice> :)
[17:34:53] <GothAlice> Er… now = datetime.utcnow; {_id: {$gt: ObjectId.from_timestamp(now - timedelta(weeks=2)), $lte: ObjectId.from_timestamp(now)}}
[17:34:55] <GothAlice> That's better.
[17:35:23] <GothAlice> richardsmd: It's a binary field, compared by value. "Faked" IDs for the purpose of range querying have zeros in the non-timestamp fields.
[17:35:52] <GothAlice> (Pro tip: never save faked IDs like that into the database. Safe to query with, not safe for other uses because of those zeros.)
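GothAlice's corrected snippet in Python with pymongo would use `bson.ObjectId.from_datetime(...)`, which zero-fills everything after the timestamp. A stdlib-only sketch of the same 12-byte layout (4-byte big-endian UNIX timestamp + 8 zeroed bytes) makes the structure explicit:

```python
import datetime
import struct

def fake_object_id(dt):
    """Hex string for a query-only ObjectId: 4-byte big-endian UNIX
    timestamp followed by 8 zero bytes. As noted above, never store
    ids built this way; the zeroed machine/process/counter bytes are
    only safe for range comparisons."""
    ts = int(dt.replace(tzinfo=datetime.timezone.utc).timestamp())
    return (struct.pack(">I", ts) + b"\x00" * 8).hex()

now = datetime.datetime(2016, 8, 26)           # treated as UTC
two_weeks_ago = now - datetime.timedelta(weeks=2)

# Documents created in the last two weeks, with no creation-time field:
query = {"_id": {"$gt": fake_object_id(two_weeks_ago),
                 "$lte": fake_object_id(now)}}
print(len(fake_object_id(now)))                # 24 hex chars, like a real ObjectId
```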
[17:36:47] <richardsmd> yeah ok, i can see where something might try and use the machine ID/process ID and be unhappy with 0s
[17:38:57] <GothAlice> richardsmd: For most use cases, using ObjectId _id values (the default as inserted records have one inserted by the driver if missing) also eliminates the need for a discrete "creation time" field. :)
[17:40:07] <GothAlice> Also lets other operations have interesting side-effects. For example, using "throwaway" inserts (unconfirmed ones; risky, but useful for non-essential data and very, very fast!) you still get back the ID of the inserted record as a result of the insert. Since the driver is providing it, not the server.