[09:37:34] <ultrav1olet> Derick: the other day I asked you a question but you had left earlier so I'm asking it again
[09:37:53] <ultrav1olet> Actually I'd love to talk about one problem we faced with mongo 3.2.5 (all 3.x releases are affected): after we switched to the WiredTiger engine (with and without compression), indexing one of our fields started to take over 2.5 hours instead of the 2 minutes it took with mmapv1
[09:38:02] <ultrav1olet> at the same time all reads and writes were severely affected - effectively dead, as no query completed during the indexing
[12:08:42] <alexi5> i am really impressed with wired tiger compression and performance
[12:17:40] <alexi5> is there a way to let mongodb sync data to disk much more regularly. eg change sync interval from 60 seconds to 30 seconds with WiredTiger engine ?
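For later readers: the flush interval alexi5 is asking about is exposed as `storage.syncPeriodSecs` in the mongod config file (also the `--syncdelay` command-line flag), default 60 seconds. A hedged sketch; verify against the docs for your server version:

```yaml
# mongod.conf sketch — syncPeriodSecs controls how often mongod flushes
# data to disk (default 60s). Behavior details vary by storage engine
# and version, so treat this as a starting point, not a guarantee.
storage:
  engine: wiredTiger
  syncPeriodSecs: 30
```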
[12:41:27] <alexi5> StephenLynx: are specific type of questions allowed in this channel ?
[12:43:30] <StephenLynx> anything related to any part of mongo or its official drivers are welcome.
[12:43:53] <StephenLynx> if the topic is outside of the scope, it isn't exactly disallowed, we will just let you know that probably no one here knows it.
[13:10:22] <contrapumpkin> is there a way to either query the mongodb indices directly or to give mongo hints about how I want it to handle my queries? judging by the result of .explain() it's opting to do a full scan in a case where it seems like it could answer the query in a fairly trivial manner. If I could query the index directly I could get what I wanted.
[13:22:46] <StephenLynx> I forgot the syntax, but its possible.
[13:29:27] <contrapumpkin> StephenLynx: thanks! any pointers? I googled "direct index access" and a few other things I could think of, but was unable to find anything that seemed to do what I wanted
[13:29:42] <contrapumpkin> I see the $hint thing now
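For later readers, a minimal sketch of the hint mechanism contrapumpkin found: `cursor.hint()` takes either an index key pattern or an index name (as listed by `getIndexes()`). The collection and field names below are made up for illustration:

```javascript
// The query document and the index key pattern we want to force.
// hint() only works if an index matching this key pattern actually exists.
const query = { status: "open" };
const indexKeyPattern = { status: 1, created: -1 };

// In the mongo shell this would be:
//   db.events.find(query).hint(indexKeyPattern)
// or, by index name:
//   db.events.find(query).hint("status_1_created_-1")
```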
[14:23:21] <Diplomat> Hey guys, would it be possible to do something like this with MongoDB: SELECT SUM(amount), client FROM table WHERE date >= 'date-here' AND date <= 'date-here' GROUP BY client
[14:25:39] <Diplomat> It seems its possible with aggregate method
[14:42:33] <StephenLynx> just keep in mind that sometimes you'd rather pre-aggregate data instead of running frequent complex queries.
[14:43:20] <StephenLynx> naturally, if you don't have to perform said query too often or it doesn't impact performance, you might as well not pre-aggregate, for the sake of simplicity.
[14:47:47] <Diplomat> Unfortunately I have too much data for PHP to handle.. which is why I thought about it
[14:48:02] <Diplomat> I wouldn't want to increase PHP memory usage to 512M because of one action only
[14:49:01] <Diplomat> Basically I'd run that query only a few times every month
[15:02:51] <Diplomat> It's really confusing to build those queries actually :P
[15:03:06] <Diplomat> I was able to group and sum everything right, but I can't manage to get a specific timeframe
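For later readers, a sketch of the SQL from earlier (`SELECT SUM(amount), client ... WHERE date BETWEEN ... GROUP BY client`) as an aggregation pipeline: `$match` handles the timeframe, `$group` with `$sum` handles the grouping. Field and collection names are assumptions taken from the SQL:

```javascript
// Date range for the $match stage (WHERE date >= from AND date <= to).
const from = new Date("2016-07-01");
const to = new Date("2016-07-31");

const pipeline = [
  // WHERE clause: restrict to the timeframe first, so $group sees less data
  { $match: { date: { $gte: from, $lte: to } } },
  // GROUP BY client, SUM(amount): _id is the grouping key
  { $group: { _id: "$client", total: { $sum: "$amount" } } }
];
// In the shell: db.table.aggregate(pipeline)
```

Putting `$match` first also lets the stage use an index on `date`, which a later `$match` could not.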
[15:04:23] <contrapumpkin> StephenLynx: getIndexes tells me the indexes, but doesn't let me query them directly. Perhaps that isn't exposed?
[15:11:28] <contrapumpkin> I'm looking at this answer: http://stackoverflow.com/a/13585423
[15:11:52] <contrapumpkin> does the 1024 byte restriction apply to only the first index or both of them?
[15:42:41] <Diplomat> Hey guys, any ideas what's wrong with my query? http://paste.ofcode.org/uA3fTbzsZfAmyeGPx5FKUe . I'd like to get the amount of earnings, grouped by ownerid, from date 1 to date 2
[16:30:17] <contrapumpkin> does mongo use an existing index to help itself make a new index? I'm creating a new index using a partialFilterExpression and it appears to be scanning the entire collection, despite the filter expression being fully satisfiable using the existing index
[17:14:11] <valleyman234> I've got a Voters collection & a Tags collection. These collections are related via a VotersTags collection. What's the best way for me to query for "all Voters with Tag A?" The only way that comes to mind is to retrieve an array of Voter IDs from the VotersTags collection, then query Voters with this array of IDs and $in. I'm worried about the scale of this operation, however, as the number of IDs in this array could number in the millions.
[17:14:49] <valleyman234> Also, the Voters collection is "sacred," and can not be edited. I can not put the tags on the Voters directly.
[17:30:49] <valleyman234> Perhaps $lookup has some potential, here?
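For later readers, a sketch of the `$lookup` idea (available from MongoDB 3.2): join from the VotersTags link collection to Voters without touching the "sacred" Voters documents. All field names here are guesses, since the actual schema wasn't shown:

```javascript
// Run against the VotersTags link collection:
const pipeline = [
  // Keep only the link rows for the tag we care about
  { $match: { tagId: "A" } },
  // Left outer join: pull in the matching Voter document(s)
  { $lookup: {
      from: "Voters",        // collection to join against
      localField: "voterId", // field in VotersTags
      foreignField: "_id",   // field in Voters
      as: "voter"            // output array field on each result
  } },
  // One output document per matched voter instead of an embedded array
  { $unwind: "$voter" }
];
// In the shell: db.VotersTags.aggregate(pipeline)
```

This keeps the join server-side, avoiding the huge client-side `$in` array, though `$lookup` over millions of link rows still has a real cost.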
[17:40:01] <thebigredgeek> And recomputed with every write
[17:40:17] <thebigredgeek> This is sometimes more efficient... as it allows you to spread out the cost over N requests rather
[17:41:07] <thebigredgeek> than concentrating it on every read. It also ensures that you aren't doing redundant work. Your DB only changes the aggregate value when it needs to be changed... when new data comes in. You aren't making the assumption that it might be different on every read
[17:41:12] <thebigredgeek> valleyman234 make sense?
[17:42:24] <valleyman234> I understand the application of aggregates if we're concerned with creating reports, say, but I'm not sure how aggregating allows us to query for "all Voters with Tag A"
[17:43:02] <valleyman234> This operation is performed on-demand by the client. It's not an operation that will be exactly repeated often, so memoization doesn't seem useful
[17:50:39] <thebigredgeek> @cheeser um, no not so much haha. That's pretty standard with microservices... Not necessarily 1 DB per collection... but you don't run all of your data on the same database haha
[17:50:57] <thebigredgeek> like... voters + voter tags + other shit around voting
[17:51:04] <thebigredgeek> Anyway, not really up for an argument
[21:20:34] <DarkCthulhu> In a replicaset, what happens to writes between the time when a master goes down and the new master is elected? The writes are being sent to the old defunct primary and nothing is being written to its oplog. So..?
[21:21:35] <AvianFlu> DarkCthulhu, discarded but saved to a log for your manual analysis and re-entry
[21:21:44] <AvianFlu> but likely lost in the short term
[21:22:09] <AvianFlu> unless the primary is just totally dead in which case your client likely just errors
[21:22:17] <DarkCthulhu> I see.. saved to a log by which component though?
[21:22:17] <AvianFlu> and maybe retries or maybe doesn't, depending on the client
[21:22:28] <AvianFlu> there's docs on this, I haven't seen them in a while, hang on
[21:22:32] <DarkCthulhu> Ah.. so it's upon the client to ensure that it got written.
[21:22:59] <cheeser> writes *during* an election won't happen because there is no primary
[21:23:11] <DarkCthulhu> The client has to wait for confirmation in general right? It would be considered a successful write only if a majority of the oplogs contain that entry? Is that correct?
[21:23:25] <cheeser> DarkCthulhu: that varies depending on the WriteConcern
[21:24:01] <DarkCthulhu> cheeser, When would anyone use anything other than majority?
[21:24:56] <DarkCthulhu> I guess one could get away with just writing to a specific number of instances. Hmm.. not sure what happens then in case of network partitions and such, when the overall state may become inconsistent.
[21:24:56] <AvianFlu> this is the doc page I was thinking of https://docs.mongodb.com/manual/core/replica-set-rollbacks/#replica-set-rollbacks
[21:25:17] <cheeser> majority introduces some throughput penalties because you have to wait for the secondaries to acknowledge
[21:26:00] <cheeser> if you have a flaky network, 'majority' might be useful. but if your replset is on a stable network 'acknowledged' is usually sufficient.
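For later readers, a sketch of the write concerns cheeser is contrasting. The exact option spelling varies a bit by driver; collection and field names are made up:

```javascript
// "acknowledged": the primary alone confirms the write. Fast, but as
// discussed above, a write can be rolled back if the primary fails
// before any secondary replicates it.
const acknowledged = { w: 1 };

// "majority": a majority of the replica set must confirm. Slower, but
// the write survives a primary failover. wtimeout bounds the wait (ms).
const majority = { w: "majority", wtimeout: 5000 };

// In the shell:
//   db.orders.insert({ amount: 10 }, { writeConcern: majority })
```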
[21:26:44] <copumpkin> another index creation question: if I'm going to be creating multiple indexes that require a full table scan, can I batch up the index creation process somehow?
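On copumpkin's batching question: since 3.2 the shell's `db.collection.createIndexes()` accepts an array of key patterns, so several indexes can be requested in one command rather than one `createIndex` call each. Whether the server shares a single collection scan across them depends on the version, so this is a sketch, not a guarantee; names below are made up:

```javascript
// Several key patterns submitted together in one createIndexes command.
const keyPatterns = [
  { userId: 1 },
  { createdAt: -1 },
  { userId: 1, createdAt: -1 }
];
// In the shell: db.events.createIndexes(keyPatterns)
```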
[21:27:44] <DarkCthulhu> cheeser, "acknowledged" here means that the master has written the event to its oplog?
[21:28:11] <copumpkin> hah, next time I start over I'll create the indices before adding any data then :)
[21:28:13] <cheeser> it means the *primary* has received it.
[21:29:43] <DarkCthulhu> cheeser, In the situation described by the link that AvianFlu mentioned above, if a master received a write, and then dropped out before the secondaries could replicate it, the master would roll back those writes when it rejoined. So, in such a situation, wouldn't acknowledged lead to loss in data?
[21:32:38] <cheeser> DarkCthulhu: in that situation, yes, possibly you'd lose data.
[21:33:17] <DarkCthulhu> cheeser, Hmm.. so, why is using "w:1" as writeconcern even an option? It sounds dangerous to me if it loses data during failover.
[21:33:30] <copumpkin> (I think the topic is out of date, since 3.2.8 is out)
[21:34:25] <DarkCthulhu> Hmm.. I guess if failovers were rare enough, it would be a much smaller penalty to pick up the fallen primary's logs and replay them using a different system.
[21:35:07] <cheeser> failovers are rare enough. usually replication stays within a data center where things are stable.
[21:35:20] <cheeser> if you're replicating to a wider cluster, you'll likely want a different WC
[21:36:05] <DarkCthulhu> Alright! Thanks cheeser and AvianFlu :) That made a lot of things clearer