[11:19:47] <matt_keys> the debian repo for 4.2 seems hosed
[16:52:50] <josh-cain> Hey folks! had a question about cursors + delete operations. Am I in the right place to ask that?
[17:00:34] <josh-cain> OK, I'm going to assume that I'm in the right place - feel free to redirect me otherwise! So here's my question: I need to split up a large number of delete operations into batches. Right now, I'm 1) opening a cursor based on my deletion criteria, 2) iterating over that cursor, and using the bulk API to issue delete statements by id in batches. I'm wondering if holding on to that cursor adds extra overhead, since
[17:00:34] <josh-cain> I'm deleting documents from the cursor's logical result set.
[18:12:12] <GothAlice> josh-cain: Most languages prevent you from doing things like that for good reasons. Usually crops up as an exception such as "<object> changed during iteration", where <object> would be "list", "dict", etc, and if actually intentional, often requires a small amount of indirection to provide consistent, reproducible, and guaranteed behavior. Such as iterating a copy of the object you are about to mutate.
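A minimal Python sketch of the "changed during iteration" error mentioned above, and of the copy-based indirection that avoids it. The data and the function names (`purge_unsafe`, `purge_safe`) are made up for illustration; this shows plain in-process containers, not MongoDB cursors.

```python
def purge_unsafe(docs):
    """Delete flagged entries while iterating the same dict (raises)."""
    for doc_id in docs:
        if docs[doc_id]:
            del docs[doc_id]


def purge_safe(docs):
    """Delete flagged entries while iterating a copy of the keys."""
    for doc_id in list(docs):  # list() takes a snapshot of the keys
        if docs[doc_id]:
            del docs[doc_id]


# Hypothetical data: document id -> "should be deleted" flag.
docs = {1: True, 2: False, 3: True, 4: False}

try:
    purge_unsafe(dict(docs))  # work on a throwaway copy
    message = None
except RuntimeError as exc:
    message = str(exc)

purge_safe(docs)

print(message)  # dictionary changed size during iteration
print(docs)     # {2: False, 4: False}
```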
[18:12:14] <GothAlice> Noting that sessions (https://docs.mongodb.com/manual/reference/method/db.collection.find/#sessions) essentially permit operating over a frozen snapshot of the data, which might be immune to alteration along the way.
[18:12:56] <GothAlice> Might be. It would require testing. But still, modification of something being iterated is bad mojo.
[18:15:20] <GothAlice> josh-cain: An alternative pattern would be to iterate first, collecting the modifications that need to be made, and apply them only after iteration finishes, i.e. "queue up your changes". That way those changes can't impact the initial record iteration.
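One way the "queue up your changes" pattern might look for the batched-delete case, as a sketch only. It assumes `collection` behaves like a pymongo `Collection` (using its `find(filter, projection)` and `delete_many(filter)` methods); the helper name `delete_in_batches` and the default batch size are hypothetical. Note the trade-off: every matching `_id` is held in memory before any delete is issued.

```python
def delete_in_batches(collection, criteria, batch_size=1000):
    """Collect matching _ids first, then delete them in batches.

    `collection` is assumed to be a pymongo-style Collection.
    Returns the total number of documents reported as deleted.
    """
    # Phase 1: drain the cursor completely, keeping only the ids.
    ids = [doc["_id"] for doc in collection.find(criteria, {"_id": 1})]

    # Phase 2: the cursor is exhausted, so the deletes below cannot
    # interfere with iteration.
    deleted = 0
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        result = collection.delete_many({"_id": {"$in": batch}})
        deleted += result.deleted_count
    return deleted
```

With a real connection this would be called as something like `delete_in_batches(db.events, {"stale": True}, batch_size=500)`, where `db.events` and the `stale` field are placeholder names.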
[19:22:59] <josh-cain> GothAlice: Thanks for the response! Yeah... definitely doesn't feel 'right'. What actually happens when I delete an item referenced by the cursor?
[19:23:55] <josh-cain> for instance, I know that if I delete an item that has fields in an index, the index must also get updated and there is overhead involved with that. I can't really find any docs/guidance on the internals of what happens to my cursor when something is deleted...
[19:25:05] <GothAlice> That's an excellent, and technically very detailed question. Highly depends on the approach / algorithm they use for "continuations". Does next(cursor) fetch the next entry based on the last entry then yield the new one, updating a tracking variable, or does next(cursor) hand back an already fetched record, then load the next one eagerly? Turns out: both.
[19:26:59] <GothAlice> find results are fetched in batches, which are buffered by the client driver (e.g. pymongo). During in-process iteration, results are yielded from the internal buffered result set, then eventually a getMore is issued when nearing or exhausting the buffer. Does the getMore continue by offset? (If so, deleting might change the offsets in a naive approach…) Does it continue by the next indexed value? By ID? I'm not actually sure;
[19:26:59] <GothAlice> that'd require some code spelunking.
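To make the offset-versus-key question concrete, here is a toy in-memory model in Python. It is not how MongoDB implements getMore (the discussion above leaves that open); it only illustrates why a naive offset-based continuation would skip records when earlier ones are deleted between batches, while continuing from the last key seen would not. All function names are hypothetical.

```python
def fetch_by_offset(store, offset, batch_size):
    """Next batch = slice of the *current* sorted keys starting at offset."""
    keys = sorted(store)
    return keys[offset:offset + batch_size]


def fetch_after_key(store, last_key, batch_size):
    """Next batch = first keys strictly greater than the last key seen."""
    keys = [k for k in sorted(store) if last_key is None or k > last_key]
    return keys[:batch_size]


def drain_and_delete(store, batch_size, by_offset):
    """Iterate in batches, deleting each batch before fetching the next."""
    seen, offset, last_key = [], 0, None
    while True:
        if by_offset:
            batch = fetch_by_offset(store, offset, batch_size)
        else:
            batch = fetch_after_key(store, last_key, batch_size)
        if not batch:
            return seen
        seen.extend(batch)
        offset += len(batch)
        last_key = batch[-1]
        for key in batch:
            store.remove(key)  # the delete shifts every later offset


skipped = drain_and_delete(set(range(1, 11)), 3, by_offset=True)
complete = drain_and_delete(set(range(1, 11)), 3, by_offset=False)

print(skipped)   # [1, 2, 3, 7, 8, 9]
print(complete)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```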