[10:47:07] <Derick> madprops: that doesn't show a crash at all
[10:48:06] <Derick> it just shows the new startup and recovery
[18:02:31] <Determinist> hey guys. I have a collection with pretty regular writes and a TTL index to expire the documents after a certain amount of time. I have a change stream to track the collection's expiring documents running on a nodejs application (a micro-service deployment in kubernetes with 3 instances). My problem is that I cannot scale said change stream so that change events would get load-balanced between all 3 application instances. I could
[18:02:31] <Determinist> create a change stream per instance and spread the load by using a mod-N filter on each instance's stream, but this seems a bit hacky.
[18:03:02] <Determinist> what's the right way to approach this? the docs do not seem to indicate how to scale and load balance change streams.
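For context, a minimal sketch of the kind of setup being described here, using the MongoDB Node.js driver; the URI, database, collection, and field names are placeholders, not taken from the conversation:

```js
const { MongoClient } = require('mongodb');

async function main() {
  const client = new MongoClient('mongodb://localhost:27017'); // placeholder URI
  await client.connect();
  const coll = client.db('app').collection('sessions'); // placeholder names

  // TTL index: the server deletes a document once its `expiresAt` date has
  // passed (the TTL monitor runs in the background, roughly once a minute).
  await coll.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 });

  // Change stream filtered to deletions, which is how the TTL removals show up.
  const stream = coll.watch([{ $match: { operationType: 'delete' } }]);
  for await (const event of stream) {
    // A delete event only carries the documentKey (_id), not the full document.
    console.log('expired:', event.documentKey._id);
  }
}

main().catch(console.error);
```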
[18:03:43] <Derick> I'm not sure whether there is a way to do that.
[18:04:30] <Determinist> maybe I'm being naive to assume this, but surely the 10gen guys assumed people would want to track change events in a load-balanced kind of way.
[18:05:37] <Determinist> I could feed a queue with change events and load-balance from there, but there's still a bottleneck in the process consuming the change stream and feeding the queue.
[18:06:40] <Derick> considering that change streams are serialised, that doesn't really matter
[18:06:53] <Determinist> I don't understand what you mean
[18:07:14] <Derick> the information coming into a change stream isn't something you can parallelise
[18:07:20] <Derick> so why would parallelising the reading make sense?
[18:09:00] <Determinist> why wouldn't I be able to parallelize? if i'm only interested in a single kind of event (expiring documents) and those documents are not related to one another in any way, I don't see a reason why this wouldn't be possible besides the bottleneck related to the change stream consumer being a single process
[18:09:58] <Derick> the bottleneck would be processing these events, no? (I'm guessing cache expiration?)
[18:11:01] <Determinist> processing those events can be load-balanced between queue consumers. the main problem is that change streams can only be consumed by a single process. the business process related to what those documents are is irrelevant for the matter at hand, IMHO
[18:12:49] <Determinist> i'm not talking about scaling such a process in general in a generic fashion. I'm specifically asking about a scenario that involves a single collection with a TTL index and a way to be able to capture those expiring documents using change streams. those changes can be fed into a queue to be consumed by multiple consumers and I absolutely do not care about the order in which said documents expire.
[18:13:12] <Derick> right, but reading correctly (with retries, etc.) can only be done from a single process, so you're going to have to add the queue to round-robin yourself
[18:13:53] <Determinist> sure, that's not a problem, it's the change stream single-process-consumer that's causing the problem
[18:14:22] <Determinist> and having mod N on the change events as the filter stage for the change stream seems like a hack
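A sketch of the mod-N hack being described, again with the Node.js driver; it assumes a numeric `_id` (an ObjectId would need a different partitioning expression) and a hypothetical `INSTANCE_INDEX` environment variable per instance:

```js
const N = 3;                                   // number of app instances
const me = Number(process.env.INSTANCE_INDEX); // hypothetical: 0, 1 or 2

// Each instance opens its own change stream and only receives the delete
// events whose _id falls in its partition.
const stream = coll.watch([
  { $match: { operationType: 'delete' } },
  { $match: { $expr: { $eq: [{ $mod: ['$documentKey._id', N] }, me] } } },
]);
```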
[18:14:35] <Derick> yes, but that is necessary to guarantee ordering, and that no events are missed. That's how this is designed and supposed to operate. Although you might not care about order, other people might.
[18:17:43] <Derick> I think your best bet would be to throw a tiny queuing mechanism in front of it
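One way to read that suggestion, sketched with RabbitMQ via amqplib (broker URL, queue name, and the handler are placeholders): a single process owns the change stream and publishes events to a work queue, and the broker load-balances them across the instances.

```js
const amqp = require('amqplib');

// Single owner of the change stream: publishes each expiration event to a
// work queue so any number of workers can share the processing.
async function bridge(coll) {
  const conn = await amqp.connect('amqp://localhost'); // placeholder broker URL
  const ch = await conn.createChannel();
  await ch.assertQueue('expirations', { durable: true });

  const stream = coll.watch([{ $match: { operationType: 'delete' } }]);
  for await (const event of stream) {
    ch.sendToQueue('expirations',
      Buffer.from(JSON.stringify(event.documentKey)),
      { persistent: true });
  }
}

// Each application instance runs a worker; the broker round-robins messages.
async function worker() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('expirations', { durable: true });
  await ch.prefetch(1);
  await ch.consume('expirations', async (msg) => {
    await handleExpiration(JSON.parse(msg.content.toString())); // hypothetical handler
    ch.ack(msg);
  });
}
```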
[18:18:01] <Determinist> might be nice if there were a way to relax the ordering requirement as an option when creating a change stream, thus allowing multiple consumers to "subscribe" to events
[18:19:15] <Determinist> oh, yes, you're right. i'm currently using RMQ with a sharded queue per TTL and dead-letter exchanges that feed another "expirations" queue (sharded as well) which in turn gets consumed
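For reference, a rough sketch of the RMQ-only pattern being described (per-message-TTL queues dead-lettering into an "expirations" queue); queue names and the TTL value are placeholders, and the sharding details are omitted. `ch` is an amqplib channel as in the previous sketch:

```js
// Messages sit in a TTL'd queue; when they expire they are dead-lettered
// into the "expirations" queue that the workers consume.
async function setupExpirationTopology(ch) {
  await ch.assertExchange('expired-dlx', 'direct', { durable: true });

  await ch.assertQueue('ttl-30s', {        // one queue per TTL bucket
    durable: true,
    arguments: {
      'x-message-ttl': 30 * 1000,          // placeholder 30-second TTL
      'x-dead-letter-exchange': 'expired-dlx',
      'x-dead-letter-routing-key': 'expired',
    },
  });

  await ch.assertQueue('expirations', { durable: true });
  await ch.bindQueue('expirations', 'expired-dlx', 'expired');
}
```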
[18:19:41] <Determinist> I was just hoping to be able to remove RMQ from this solution and simplify it a bit
[18:20:44] <Determinist> of course, as I said, I was hoping for a quick win by utilizing an already existing piece of infrastructure and not having to maintain another.
[18:21:54] <Determinist> tbh, besides internal-replication scenarios, i'm not quite sure why 10gen opted to include change streams with such limited scaling scenarios. what was the killer feature they imagined for this from the end-user's perspective?
[18:23:12] <Derick> we haven't been called 10gen for like 5 years btw :)
[18:23:29] <Determinist> @Derick: also, one advantage a database with scalable change streams has over a queue like RMQ is the ability to remove documents before they expire without having to design the consumers to be idempotent
[18:23:51] <Determinist> oh, sorry, good to know, thanks :)
[18:23:54] <Derick> as to end-user requirements, I don't know why we added change streams
[18:24:03] <Derick> I think it was originally to stop people from just tailing the oplog
[18:25:43] <Derick> or rather, probably... I wasn't part of that discussion.
[18:28:24] <Determinist> how are change streams modelled in the server? I'm assuming since these are resumable, the state is handled on the server with the client only knowing where it stopped. if said state is kept on the server, it is probably modelled as some kind of queue, right? if so, what's preventing the server from spreading the events round-robin and keeping score on what was sent last?
[18:29:08] <Determinist> i'm referring to your point re: current architecture preventing this
[18:29:25] <Derick> I'm not quite sure, but the resumability isn't simple from what I remember. And, as this is still based on the oplog, we do need to "ping" the change stream every, say, 10 seconds to be able to keep track of the resume token
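For what it's worth, on the client side resumability comes down to persisting the resume token (the `_id` of each change event) and handing it back via `resumeAfter`; a sketch where the token store and handler are placeholders:

```js
// Resume a change stream across restarts using the last-seen resume token.
async function watchWithResume(coll) {
  const token = await loadToken();                // hypothetical durable store
  const options = token ? { resumeAfter: token } : {};
  const stream = coll.watch([{ $match: { operationType: 'delete' } }], options);

  for await (const event of stream) {
    await handleExpiration(event.documentKey);    // hypothetical handler
    await saveToken(event._id);                   // event._id is the resume token
  }
}
```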
[18:30:02] <Derick> to be honest, I doubt people have thought of your usecase, so perhaps it's a good idea to file a Jira ticket at https://jira.mongodb.org
[18:30:36] <Determinist> I just might, thank you :)