#mongodb logs for Monday the 8th of August, 2016

[00:12:00] <Trinity> managed to get it working
[07:06:59] <vista> hello, I have a bit of a problem with mongodb - sometimes it eats too much RAM
[07:07:03] <vista> too much RAM, meaning all of it
[07:07:14] <vista> and the kernel has no choice but to kill it
[07:54:04] <truthadjustr> hi, what is the correct one to use for java: mongodb-driver* or mongo-java-driver?
[09:21:34] <ultrav1olet> The official mongo debs do not contain an /etc/init.d/mongo start script
[09:21:47] <ultrav1olet> where can I find one?
[09:33:46] <Derick> ultrav1olet: Mine contains a "/etc/init.d/mongod" file.
[09:34:24] <Derick> ultrav1olet: which .deb file do you have exactly?
[09:36:05] <ultrav1olet> Derick: I found it as well, sorry :-)
[09:36:17] <Derick> ok :)
[09:36:21] <ultrav1olet> I'm just used to /etc/init.d/mongodb
[09:36:44] <Derick> i think it changed between 2.6 and 3.0 (or 2.4 and 2.6)
[09:36:46] <ultrav1olet> Seems like you created a proper upstart script
[09:36:55] <ultrav1olet> Which is nice
[09:37:34] <ultrav1olet> Derick: the other day I asked you a question but you had left earlier so I'm asking it again
[09:37:53] <ultrav1olet> Actually I'd love to talk about one problem we faced with mongo 3.2.5 (all 3.x.x releases are affected): after we switched to the WiredTiger engine (with compression, and without compression as well), indexing of one of our fields started to take over 2.5 hours instead of the 2 minutes it took under mmapv1
[09:38:02] <ultrav1olet> at the same time all reads and writes were severely affected - effectively dead, as no query completed during indexing
[09:38:12] <ultrav1olet> Is this a known bug?
[09:38:13] <Derick> hmm, I wouldn't know what that is.
[09:38:18] <ultrav1olet> I don't know how to file this bug because our data and workload are confidential
[09:39:03] <ultrav1olet> We are now running our DB on top of SSD disks and we disabled indexing altogether - so everything is OK
[09:39:07] <Derick> hm, I don't know why either :-/
[09:39:29] <Derick> but disabling indexing makes your queries lots slower, surely?
[09:54:23] <ultrav1olet> Not really
[09:54:39] <ultrav1olet> I just wonder why Mongo more or less dies when it starts indexing data
[09:55:11] <Derick> yeah, it certainly shouldn't
[09:55:25] <Derick> I mean, it will have some impact
[11:14:33] <ultrav1olet> Derick: I've got a feeling that WiredTiger is optimized for SSD disks and nothing else.
[11:14:43] <ultrav1olet> Is it true?
[11:32:39] <Derick> ultrav1olet: it's been optimised for it, but that doesn't mean it shouldn't work with spinning disks
[11:33:13] <ultrav1olet> Derick: should we file a bug report in regard to our use case?
[11:33:18] <Derick> yes
[11:33:25] <ultrav1olet> We'll do, thanks
[12:08:42] <alexi5> i am really impressed with wired tiger compression and performance
[12:17:40] <alexi5> is there a way to make mongodb sync data to disk more frequently, e.g. change the sync interval from 60 seconds to 30 seconds with the WiredTiger engine?
[12:36:20] <alexi5> anyone here ?
[12:36:55] <StephenLynx> no one but us trees
[12:37:42] <alexi5> hmmm
[12:37:51] <alexi5> ok
[12:38:11] <alexi5> is there a way to make mongodb sync data to disk more frequently, e.g. change the sync interval from 60 seconds to 30 seconds with the WiredTiger engine?
[12:41:27] <alexi5> StephenLynx: are specific types of questions allowed in this channel?
[12:43:02] <StephenLynx> of course.
[12:43:30] <StephenLynx> anything related to any part of mongo or its official drivers are welcome.
[12:43:53] <StephenLynx> if the topic is outside of the scope, it isn't exactly disallowed, we will just let you know that probably no one here knows it.
[12:44:09] <StephenLynx> is welcome*
[12:44:44] <StephenLynx> I'd answer you, but I know dicks about advanced management features of mongo.
[12:45:46] <alexi5> ok
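
alexi5's question went unanswered. For the record, the interval being asked about is controlled by mongod's syncdelay server parameter (default 60 seconds), which can be changed at runtime; a minimal sketch in the mongo shell, with no claim that lowering it is wise for a given workload:

    // Lower the flush-to-disk interval from the default 60s to 30s.
    // syncdelay can also be set at startup: mongod --syncdelay 30
    db.adminCommand({ setParameter: 1, syncdelay: 30 })
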
[13:10:22] <contrapumpkin> is there a way to either query the mongodb indices directly or to give mongo hints about how I want it to handle my queries? judging by the result of .explain() it's opting to do a full scan in a case where it seems like it could answer the query in a fairly trivial manner. If I could query the index directly I could get what I wanted.
[13:22:31] <StephenLynx> yes.
[13:22:40] <StephenLynx> you can do both
[13:22:46] <StephenLynx> I forgot the syntax, but it's possible.
[13:29:27] <contrapumpkin> StephenLynx: thanks! any pointers? I googled "direct index access" and a few other things I could think of, but was unable to find anything that seemed to do what I wanted
[13:29:42] <contrapumpkin> I see the $hint thing now
[13:30:59] <StephenLynx> db.collection.indexes() maybe?
[13:31:13] <StephenLynx> getIndexes*
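
For reference, a short sketch of both suggestions in the mongo shell; the collection and field names here are made up for illustration:

    // List the indexes defined on a collection.
    db.events.getIndexes()

    // Force the planner to use a particular index for a query.
    db.events.find({ status: "open" }).hint({ status: 1 })

    // A "covered" query: if the projection touches only indexed
    // fields (and excludes _id), the server can answer from the
    // index alone -- the closest thing to querying an index directly.
    db.events.find({ status: "open" }, { status: 1, _id: 0 })
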
[14:23:21] <Diplomat> Hey guys, would it be possible to do something like this with MongoDB: SELECT SUM(amount), client FROM table WHERE date >= 'date-here' AND date <= 'date-here' GROUP BY client
[14:25:39] <Diplomat> It seems it's possible with the aggregate method
[14:42:11] <StephenLynx> yes. it is.
[14:42:33] <StephenLynx> just keep in mind that sometimes you'd rather pre-aggregate data instead of frequently running complex queries.
[14:43:20] <StephenLynx> naturally, if you don't have to perform said query too often or it doesn't impact performance, you might as well not pre-aggregate, for the sake of simplicity.
[14:47:47] <Diplomat> Unfortunately I have too much data for PHP to handle.. which is why I thought about it
[14:48:02] <Diplomat> I wouldn't want to increase PHP memory usage to 512M because of one action only
[14:49:01] <Diplomat> Basically I'd run that query only a few times every month
[15:02:51] <Diplomat> It's really confusing to build those queries actually :P
[15:03:06] <Diplomat> I was able to group and sum everything right, but I can't manage to get a specific timeframe
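
A sketch of the aggregation Diplomat describes, assuming documents shaped like { client, amount, date } with date stored as a real Date (field names are guesses from the SQL above); a common reason the timeframe part fails is comparing ISODate values against dates stored as strings:

    db.table.aggregate([
      // WHERE date >= 'date-here' AND date <= 'date-here'
      { $match: { date: { $gte: ISODate("2016-07-01"), $lte: ISODate("2016-07-31") } } },
      // GROUP BY client, with SUM(amount)
      { $group: { _id: "$client", total: { $sum: "$amount" } } }
    ])
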
[15:04:23] <contrapumpkin> StephenLynx: getIndexes tells me the indexes, but doesn't let me query them directly. Perhaps that isn't exposed?
[15:04:53] <StephenLynx> ah
[15:05:01] <StephenLynx> then you will have to use hint.
[15:07:17] <contrapumpkin> okay, cool
[15:07:22] <contrapumpkin> thanks!
[15:11:28] <contrapumpkin> I'm looking at this answer: http://stackoverflow.com/a/13585423
[15:11:52] <contrapumpkin> does the 1024 byte restriction apply to only the first index or both of them?
[15:42:41] <Diplomat> Hey guys, any ideas what's wrong with my query? http://paste.ofcode.org/uA3fTbzsZfAmyeGPx5FKUe . I'd like to get the amount of earnings, grouped by ownerid, from date 1 to date 2
[16:30:17] <contrapumpkin> does mongo use an existing index to help itself make a new index? I'm creating a new index using a partialFilterExpression and it appears to be scanning the entire collection, despite the filter expression being fully satisfiable using the existing index
[16:30:30] <Derick> no, I don't think it does
[16:30:46] <contrapumpkin> ah :/
[16:31:03] <contrapumpkin> oh well, I'll just wait a day for my index to create then :)
[16:32:16] <StephenLynx> kek
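
For context, the kind of index being discussed, with hypothetical names. A partial index only contains documents matching the filter expression, but building it still scans the whole collection:

    db.voters.createIndex(
      { lastName: 1 },
      // Only documents matching this expression enter the index; the
      // build itself does not consult any existing index.
      { partialFilterExpression: { active: true } }
    )
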
[17:14:11] <valleyman234> I've got a Voters collection & a Tags collection. These collections are related via a VotersTags collection. What's the best way for me to query for "all Voters with Tag A"? The only way that comes to mind is to retrieve an array of Voter IDs from the VotersTags collection, then query Voters with this array of IDs and $in. I'm worried about the scale of this operation, however, as the number of IDs in this array could number in the millions.
[17:14:49] <valleyman234> Also, the Voters collection is "sacred," and cannot be edited. I cannot put the tags on the Voters directly.
[17:30:49] <valleyman234> Perhaps $lookup has some potential, here?
[17:33:12] <cheeser> sounds like it
[17:33:30] <StephenLynx> or you could pre-aggregate it.
[17:34:00] <cheeser> would still need to answer the original question, though.
[17:34:20] <StephenLynx> hm
[17:34:20] <cheeser> but definitely pre-agg would probably be best
[17:34:21] <StephenLynx> nvm
[17:34:27] <cheeser> updated nightly, say
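
A sketch of the $lookup approach, assuming VotersTags documents hold { voterId, tagId } pairs; the field names are guesses:

    db.VotersTags.aggregate([
      { $match: { tagId: tagAId } },    // all join rows for Tag A
      { $lookup: {
          from: "Voters",               // collection to join against
          localField: "voterId",
          foreignField: "_id",
          as: "voter"                   // matched Voter docs land here
      } },
      { $unwind: "$voter" }             // one output document per voter
    ])
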
[17:37:50] <valleyman234> I'm not familiar with pre-aggregation. Do you have a good article on this?
[17:39:28] <thebigredgeek> pre-aggregate just means that you compute an aggregate on write and memoize the value into a document
[17:39:42] <thebigredgeek> Like... you reduce the new value for every write and compound it into a document somewhere
[17:39:52] <thebigredgeek> This way you don't have to aggregate the value when you need to pull it
[17:39:55] <thebigredgeek> It's already computed
[17:40:01] <thebigredgeek> And recomputed with every write
[17:40:17] <thebigredgeek> This is sometimes more efficient... as it allows you to spread out the cost over N requests rather
[17:41:07] <thebigredgeek> than concentrating it on every read. It also ensures that you aren't doing redundant work. Your DB only changes the aggregate value when it needs to be changed... when new data comes in. You aren't making the assumption that it might be different on every read
[17:41:12] <thebigredgeek> valleyman234 make sense?
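
As a concrete sketch of what thebigredgeek means, using a hypothetical tagVoters summary collection maintained alongside each write:

    // On write (tagging a voter): fold the voter's id into a
    // precomputed membership document for the tag, instead of
    // re-joining through VotersTags at read time.
    db.tagVoters.updateOne(
      { _id: tagId },
      { $addToSet: { voterIds: voterId } },
      { upsert: true }
    )

    // On read: fetch the precomputed membership in one query.
    db.tagVoters.findOne({ _id: tagId })
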
[17:42:24] <valleyman234> I understand the application of aggregates if we're concerned with creating reports, say, but I'm not sure how aggregating allows us to query for "all Voters with Tag A"
[17:43:02] <valleyman234> This operation is performed on-demand by the client. It's not an operation that will be exactly repeated often, so memoization doesn't seem useful
[17:43:09] <valleyman234> Unless I'm missing something
[17:43:09] <thebigredgeek> Right... so every time you add a tag to a voter you simply add a foreign key to tag a which is the voter's _id
[17:43:27] <valleyman234> Yes, that's the current role of the VotersTags collection
[17:43:37] <valleyman234> It relates entries in the Tags collection to entires in the Voters collection
[17:43:38] <thebigredgeek> hmm
[17:43:44] <thebigredgeek> might be best to not structure your data that way
[17:43:45] <valleyman234> s/entires/entries
[17:43:47] <thebigredgeek> That won't be performant
[17:43:52] <thebigredgeek> Mongo is best with denormalization
[17:43:57] <thebigredgeek> Rather than normalized collections
[17:44:01] <valleyman234> Yes, that's my fear. Unfortunately, the requirements of the project require that we do not touch the Voters collection
[17:44:08] <thebigredgeek> Uhh weird
[17:44:09] <thebigredgeek> Haha
[17:44:14] <valleyman234> Otherwise I'd just put the tag on the Voter
[17:44:17] <thebigredgeek> Can you touch the tags collection?
[17:44:20] <valleyman234> Yes
[17:44:27] <valleyman234> and VotersTags of course
[17:44:31] <thebigredgeek> So why not put an array of voter ids on the tag
[17:44:38] <thebigredgeek> that would be more performant
[17:44:41] <valleyman234> This array could number in the millions
[17:44:45] <valleyman234> Or more
[17:44:48] <thebigredgeek> Stick the voter foreign keys onto that
[17:44:51] <thebigredgeek> That's fine.
[17:44:55] <thebigredgeek> You can use query projection
[17:44:59] <thebigredgeek> To only grab the values you care about
[17:45:06] <thebigredgeek> Rather than pulling the entire tag
[17:45:13] <valleyman234> And Mongo's document size limit?
[17:45:18] <thebigredgeek> This is actually best practice IMO
[17:45:42] <thebigredgeek> In this case do you really imagine MILLIONS?
[17:45:45] <valleyman234> Actually, that still doesn't solve the problem. What I want, in the end, is an array of Voter documents.
[17:45:48] <valleyman234> Yes, absolutely millions.
[17:46:03] <valleyman234> Even if I put an array of IDs on the tags, I still have to take that array of IDs and query Voters.
[17:46:09] <thebigredgeek> Yes
[17:46:12] <thebigredgeek> But that is 2 requests
[17:46:13] <thebigredgeek> Rather than 3
[17:46:15] <thebigredgeek> :)
[17:46:20] <thebigredgeek> Which is 50% less work
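
The two requests thebigredgeek is counting, sketched with hypothetical field names; note that the 16 MB document limit is the real ceiling on how large the embedded array can grow:

    // 1. Grab only the voter-id array from the tag, via projection.
    var tag = db.Tags.findOne({ name: "A" }, { voterIds: 1 });

    // 2. Fetch the voter documents themselves.
    db.Voters.find({ _id: { $in: tag.voterIds } })
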
[17:46:30] <valleyman234> Yeah, I'm mostly worried about the size of these arrays
[17:46:39] <thebigredgeek> Honestly
[17:46:42] <valleyman234> Performance is actually not much of a concern. It's okay if it takes a while.
[17:46:43] <thebigredgeek> If you are dealing with millions
[17:46:48] <thebigredgeek> You shouldn't be running a single DB server
[17:46:49] <thebigredgeek> Like
[17:46:53] <thebigredgeek> You should be working with microservices
[17:47:05] <thebigredgeek> And using something like gRPC for inter service communication
[17:47:13] <thebigredgeek> You should have a discrete mongo instance of voters
[17:47:17] <thebigredgeek> a single mongo instance for tags
[17:47:18] <thebigredgeek> etc
[17:47:50] <thebigredgeek> depending on the expected throughput anyway
[17:47:51] <thebigredgeek> :p
[17:47:59] <thebigredgeek> kk
[17:48:04] <cheeser> whut?
[17:48:12] <valleyman234> That sounds like a solution to scaling in the future, but I'm not sure that's right for now
[17:48:18] <valleyman234> for now, all I'm worried about is making this query work
[17:48:20] <cheeser> a different server instance for each collection?
[17:48:26] <valleyman234> despite the fact that it involves a huge array of Voter IDs
[17:48:33] <valleyman234> where huge == millions
[17:49:21] <thebigredgeek> @cheeser yup.. assuming that millions = crazy throughput
[17:49:28] <thebigredgeek> Otherwise your Db will be a huge bottleneck
[17:49:32] <thebigredgeek> At least that is what I have seen
[17:49:38] <valleyman234> We're not concerned with throughput at this stage
[17:49:52] <cheeser> thebigredgeek: um. no. that's terrible advice.
[17:50:39] <thebigredgeek> @cheeser um, no not so much haha. That's pretty standard with microservices... Not necessarily 1 DB per collection... but you don't run all of your data on the same database haha
[17:50:57] <thebigredgeek> like... voters + voter tags + other shit around voting
[17:51:04] <thebigredgeek> Anyway, not really up for an argument
[17:51:05] <thebigredgeek> :p
[17:51:44] <thebigredgeek> cheeser you usually run 1 database for N services?
[17:53:03] <cheeser> alright, then.
[18:01:52] <valleyman234> Got disconnected :\ Hope I didn't miss any wizard solutions
[21:03:22] <copumpkin> hmm, creating an index blocks all read queries?
[21:03:35] <copumpkin> I'm currently creating an index and all my other queries show waitingForLock: true
[21:03:53] <StephenLynx> are they on the same collection?
[21:03:58] <copumpkin> yeah
[21:07:44] <cheeser> the index creation will yield occasionally but yes it will block unless you tell it to build in the background.
[21:15:06] <copumpkin> ah
[21:15:17] <copumpkin> cheeser: and I can't demote an existing build to the background, right? :)
[21:15:22] <copumpkin> it seems to be charging along
[21:15:54] <Derick> it's a starting option only
[21:16:00] <copumpkin> oh well
[21:16:07] <copumpkin> I'll twiddle my thumbs for the next few hours
[21:16:11] <copumpkin> :)
[21:16:52] <cheeser> coffee run!
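
The option cheeser and Derick are referring to is passed when the build starts; a sketch with an illustrative index spec:

    // Foreground (the default): faster, but blocks other operations
    // on the collection while it runs, yielding only occasionally.
    db.logs.createIndex({ ts: 1 })

    // Background: yields to other reads and writes, but must be
    // requested up front -- a running foreground build can't be
    // demoted to the background.
    db.logs.createIndex({ ts: 1 }, { background: true })
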
[21:20:34] <DarkCthulhu> In a replicaset, what happens to writes between the time when a master goes down and the new master is elected? The writes are being sent to the old defunct primary and nothing is being written to its oplog. So..?
[21:21:35] <AvianFlu> DarkCthulhu, discarded but saved to a log for your manual analysis and re-entry
[21:21:38] <AvianFlu> I believe is how that works
[21:21:44] <AvianFlu> but likely lost in the short term
[21:22:09] <AvianFlu> unless the primary is just totally dead in which case your client likely just errors
[21:22:17] <DarkCthulhu> I see.. saved to a log by which component though?
[21:22:17] <AvianFlu> and maybe retries or maybe doesn't, depending on the client
[21:22:28] <AvianFlu> there's docs on this, I haven't seen them in a while, hang on
[21:22:32] <DarkCthulhu> Ah.. so it's upon the client to ensure that it got written.
[21:22:59] <cheeser> writes *during* an election won't happen because there is no primary
[21:23:11] <DarkCthulhu> The client has to wait for confirmation in general right? It would be considered a successful write only if a majority of the oplogs contain that entry? Is that correct?
[21:23:25] <cheeser> DarkCthulhu: that varies depending on the WriteConcern
[21:24:01] <DarkCthulhu> cheeser, When would anyone use anything other than majority?
[21:24:36] <cheeser> acknowledged is the default
[21:24:56] <DarkCthulhu> I guess one could get away with just writing to a specific number of instances. Hmm.. not sure what happens then in case of network partitions and such, when the overall state may become inconsistent.
[21:24:56] <AvianFlu> this is the doc page I was thinking of https://docs.mongodb.com/manual/core/replica-set-rollbacks/#replica-set-rollbacks
[21:25:10] <DarkCthulhu> AvianFlu, reading..
[21:25:17] <cheeser> majority introduces some throughput penalties because you have to wait for the secondaries to acknowledge
[21:26:00] <cheeser> if you have a flaky network, 'majority' might be useful. but if your replset is on a stable network 'acknowledged' is usually sufficient.
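
The trade-off in shell terms; the collection and document here are illustrative:

    // w: 1 (acknowledged, the default): the primary has the write,
    // but it can be rolled back if the primary fails before any
    // secondary replicates it.
    db.orders.insertOne({ item: "a" }, { writeConcern: { w: 1 } })

    // w: "majority": acknowledged only once a majority of the replica
    // set has the write; slower, but it survives a failover.
    db.orders.insertOne(
      { item: "a" },
      { writeConcern: { w: "majority", wtimeout: 5000 } }
    )
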
[21:26:44] <copumpkin> another index creation question: if I'm going to be creating multiple indexes that require a full table scan, can I batch up the index creation process somehow?
[21:27:29] <cheeser> no
[21:27:44] <DarkCthulhu> cheeser, "acknowledged" here means that the master has written the event to its oplog?
[21:28:11] <copumpkin> hah, next time I start over I'll create the indices before adding any data then :)
[21:28:13] <cheeser> it means the *primary* has received it.
[21:29:43] <DarkCthulhu> cheeser, In the situation described by the link that AvianFlu mentioned above, if a master received a write, and then dropped out before the secondaries could replicate it, the master would roll back those writes when it rejoined. So, in such a situation, wouldn't acknowledged lead to loss of data?
[21:31:58] <cheeser> s/master/primary/
[21:32:20] <cheeser> master/slave is both old terminology and different *tech*nology than a replica set
[21:32:24] <DarkCthulhu> Yes, primary, sorry.
[21:32:38] <cheeser> DarkCthulhu: in that situation, yes, possibly you'd lose data.
[21:33:17] <DarkCthulhu> cheeser, Hmm.. so, why is using "w:1" as writeconcern even an option? It sounds dangerous to me if it loses data during failover.
[21:33:30] <copumpkin> (I think topic is out of date, since 3.2.8 is out)
[21:34:25] <DarkCthulhu> Hmm.. I guess if failovers were rare enough, it would be a much smaller penalty to pick up the fallen primary's logs and replay them using a different system.
[21:35:07] <cheeser> failovers are rare enough. usually replication stays in the data center, where things are stable.
[21:35:20] <cheeser> if you're replicating to a wider cluster, you'll likely want a different WC
[21:36:05] <DarkCthulhu> Alright! Thanks cheeser and AvianFlu :) That made a lot of things clearer
[21:36:25] <cheeser> np