PMXBOT Log file Viewer


#mongodb logs for Wednesday the 24th of February, 2016

[00:00:24] <cerjam> oh good god holy shit 2.4 vs 3.2
[00:00:25] <StephenLynx> it could also help if your bottleneck is RAM
[00:00:25] <cerjam> youre not kidding
[00:00:26] <cerjam> it is old
[00:00:36] <GothAlice> cerjam: :P So yeah, yours won't work under OpenVZ.
[00:00:40] <cerjam> in other words
[00:00:41] <cerjam> get to upgrading?
[00:00:42] <cerjam> lmfao
[00:02:37] <GothAlice> cerjam: Also note the last comments on the JIRA ticket I linked. You may be running an old RHEL-derived CentOS, that inherited the overall bug.
[00:03:03] <bros> StephenLynx: any negative effects of sharding early?
[00:03:09] <bros> I'd rather just get it over with.
[00:03:14] <StephenLynx> I don't know.
[00:03:29] <StephenLynx> I never handled large datasets, big production servers or user sharding.
[00:03:39] <StephenLynx> used sharding*
[00:03:43] <bros> GothAlice: any input?
[00:03:53] <GothAlice> Sharding is the approach MongoDB uses to ensure high availability and data redundancy.
[00:04:01] <GothAlice> Like mirroring RAID.
[00:04:12] <bros> Gotcha. At what level should a MongoDB user start sharding?
[00:04:17] <cheeser> i wouldn't put redundancy in the sharding column.
[00:04:23] <GothAlice> Without it, there's only one copy of your data. You might periodically backup, but if it dies, there's downtime while you restore that backup.
[00:04:29] <StephenLynx> isnt that replica sets, though?
[00:04:35] <GothAlice> Ah, right, sorry.
[00:04:56] <bros> StephenLynx: what defines large data sets and big production servers?
[00:04:58] <cerjam> centos 6.7 host
[00:05:11] <GothAlice> You mix them, in a RAID 10-like arrangement for both data isolation between shards, and redundancy amongst the sharded replicas.
[00:05:25] <StephenLynx> my database is less than 100 mb :v
[00:05:29] <StephenLynx> because it has files on it
[00:05:37] <StephenLynx> alice got a 24tb one
[00:05:45] <StephenLynx> mine is minuscule
[00:05:51] <cheeser> 100mb is tiny
[00:05:54] <GothAlice> StephenLynx: Uh… it's been eating drives lately, it's up to 38 or so. ¬_¬
[00:05:59] <StephenLynx> kek
[00:06:18] <bros> https://gist.github.com/brandonros/8ed6b18b6d8a76c45e4a Where does that sit?
[00:07:53] <GothAlice> A couple of gigabytes?
[00:08:30] <bros> GothAlice: What size server would you put for that kind of DB (in terms of CPU cores/memory), and what kind of replication and/or sharding would you do?
[00:08:54] <cerjam> GothAlice: i'd like to thank you very much for your help
[00:09:13] <GothAlice> bros: https://gist.github.com/amcgregor/4fb7052ce3166e2612ab#how-much-ram-should-i-allocate-to-a-mongodb-server
[00:09:18] <cerjam> i've been banging my head against the wall for days saved me ass a bit here lol
[00:09:50] <GothAlice> cerjam: When in doubt, examine the logs carefully. :)
[00:09:59] <cerjam> carefully
[00:10:00] <cerjam> lmfao
[00:10:13] <bros> GothAlice: how can I see how many page faults?
[00:11:36] <GothAlice> ps -o min_flt,maj_flt 1
[00:11:42] <GothAlice> Where 1 is the process ID you are interested in.
[00:12:02] <bros> MINFL MAJFL
[00:12:03] <bros> 1470839829 49850
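For reference, the same check pointed at mongod directly rather than a hard-coded PID (a small sketch; assumes pidof is available):

    # minor (min_flt) and major (maj_flt) page faults for the running mongod
    ps -o min_flt,maj_flt -p $(pidof mongod)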
[00:12:24] <GothAlice> Major faults require disk access. Minor faults are page allocations, which may require page defragmentation if you have transparent huge page support enabled, see the wiki link I just linked. :)
[00:12:41] <GothAlice> How long has that been running?
[00:12:47] <bros> 44 days
[00:12:49] <GothAlice> :o
[00:13:34] <bros> What should I do?
[00:18:25] <bros> I take it those are high?
[00:18:52] <GothAlice> I have an uptime of 352 days, MIN=264464523 MAJ=4490680 or ~8.69 MIN/sec and 0.14 MAJ/sec for one city's web presence, vs. your 386/0.01. This would indicate to me that you have a very heavy query load vs. dataset size compared to the city. ;) You would need several nodes, but small ones, preferably on physically distinct hardware.
[00:19:32] <bros> do 4 small shards?
[00:19:51] <bros> Could it be that my indices blow?
[00:19:52] <GothAlice> No, what types of queries do you run, and is lag/latency an issue on a majority of your queries? Additionally, does your data have some natural method of grouping like-documents within collections?
[00:20:36] <bros> GothAlice: It's a SaaS platform. Few aggregations based on past sales data, mainly just storing data into subdocuments
[00:20:57] <bros> The longest query I think is 850ms but it's generally over a lot of data.
[00:21:14] <bros> I didn't understand the last bit about grouping
[00:21:38] <GothAlice> And yes, indexes are certainly a factor in query performance and load, but your dataset fitting in RAM somewhat offsets that. Certainly look into it, though. I highly recommend http://mongolab.org/dex/ to analyze your real queries to see how they might be improved.
[00:22:29] <GothAlice> Can't type fast enough tonight. XP
[00:22:36] <bros> What's the first thing I should do? Move it onto a dedicated server? Yes. What do I do about replication/sharding?
[00:22:40] <bros> I might just have you beat at 140wpm. :P
[00:25:41] <GothAlice> Optimization without measurement is by definition premature. Yes, however, DBs should be isolated from other services.
[00:25:57] <GothAlice> https://docs.mongodb.org/manual/administration/optimization/
[00:26:31] <GothAlice> Dex uses the profiler results to automatically catalog the most impactful queries to suggest indexes automatically.
[00:28:41] <DrPeeper> hello?
[00:28:46] <DrPeeper> is it me you're looking for
[00:28:48] <cheeser> whoa. i saw Dex and had a flashback to my days living in Denver
[00:28:49] <DrPeeper> oops wrong window :)
[00:29:25] <GothAlice> bros: Choosing sharding has an impact on other optimizations, such as the use of "covered queries" (mentioned in that documentation). If your data is write-heavy (insert/update), sharding with a "sharding key" chosen to distribute those writes amongst nodes is an effective strategy. If it's find-heavy and a little replication lag is acceptable, one can reduce network traffic on the primary by querying secondaries in a replica set.
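For the find-heavy case GothAlice describes, a hedged mongo-shell sketch of preferring secondaries for reads (collection and query are illustrative; stale reads must be acceptable for this to make sense):

    db.getMongo().setReadPref("secondaryPreferred")
    db.orders.find({ status: "open" })   // may now be served by a secondary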
[00:30:15] <DrPeeper> ugh. I am forced to chuckle when I read "sharding"
[00:30:31] <bros> GothAlice: so run dex for a day?
[00:31:26] <GothAlice> bros: Well, in ordinary operation I believe it watches and periodically emits what it thinks might be helpful indexes. It's a continuous process of refinement. Run as long as you need, during a period of time of typical usage, until you have indexes optimizing the queries noted.
[00:31:36] <bros> I can't find my mongodb.log?
[00:33:27] <GothAlice> cheeser: I liked Denver. I liked Boulder and Crested Butte better, though. Better biking. ;P
[00:35:25] <GothAlice> bros: Hmm. An important note: beware that creating indexes without specifying background creation will halt all other operations. So… be careful of that. (And background indexes can take a very, very long time to build, depending on load.)
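A minimal illustration of that warning (collection and field names are made up):

    // foreground build: blocks other operations on the database until it finishes
    db.orders.createIndex({ account_id: 1 })
    // background build: much slower under load, but does not lock everything out
    db.orders.createIndex({ account_id: 1 }, { background: true })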
[00:35:49] <bros> GothAlice: I'm running slow queries and dex isn't picking anything up. Yes, I set profiling to 1.
[00:38:14] <GothAlice> That was a bit quick. What's a typical mongostat result?
[00:43:15] <GothAlice> Also, what did you set "slowMs" to? It needs to be below the threshold for the queries you're finding slow.
[00:55:07] <bros> GothAlice: It says it is 100ms
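For reference, a sketch of how the profiler level and slowMs threshold are set and checked in the mongo shell (100 ms is the default bros is seeing; lowering it captures faster queries as well):

    db.setProfilingLevel(1, 50)        // profile operations slower than 50 ms
    db.getProfilingStatus()            // confirm { was: 1, slowms: 50 }
    db.system.profile.find().sort({ ts: -1 }).limit(5)   // most recent profiled ops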
[00:56:12] <bros> GothAlice: mongostat is super low. like, <2 queries a second
[00:56:28] <GothAlice> Yeah, I'd give it some time to populate.
[00:57:41] <bros> GothAlice: https://gist.github.com/brandonros/aee2716691e8d6b24573 I made these indexes a while ago. See anything that looks strange?
[00:57:57] <bros> Is it common to have more indexes than that?
[00:58:10] <kalx> GothAlice: Sorry for the delay, got pulled into something else. Appreciate the help, don't want to take up too much of your time though so maybe you can just peek at the explain and see if anything stands out. (I'm fairly new to mongo but not DBs in general)
[00:58:22] <kalx> GothAlice: https://gist.github.com/anonymous/ae93002f1b5d340a614b
[00:59:19] <kalx> GothAlice: This was re the 100% cpu case, all user cpu time.
[01:00:04] <Doyle> Hey. I'm trying start up 3.2 with this config, but the service fails to start without any useful logs. http://pastebin.com/JeByEev8
[01:01:17] <cheeser> do those paths exist and are the user writable?
[01:01:38] <Doyle> they do exist
[01:02:02] <Doyle> they're owned by mongod
[01:02:17] <cheeser> how are you starting it?
[01:02:19] <Doyle> the pid directory was created, and is owned by mongod
[01:02:34] <Doyle> systemctl start mongod and service mongod start both fail
[01:03:10] <cheeser> run: mongod --config <path to file>
[01:05:09] <Doyle> oh, better output. Expected boolean switch but found string: /mongodb/indexes for option: storage.wiredTiger.engineConfig.directoryForIndexes
[01:05:32] <kalx> GothAlice: Essentially, an index allows access to the records we're targeting already pre-sorted. However, in order to find a record with the proper "tags", a scan must occur. These are just bitfield comparisons, so they should still be pretty inexpensive.
[01:06:07] <GothAlice> Hmm, I've never actually investigated how bitfields interact with indexes.
[01:07:13] <Doyle> Hmm... Directory for indexes...
[01:07:25] <bros> GothAlice: db.orders.stats(): totalIndexSize: 108054016 Is this bad? It seems bad.
[01:07:41] <Doyle> can be true or false. Seems I have to symlink that dir rather than specify it
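For reference, the option Doyle hit is a boolean: when true, WiredTiger puts index files in an "index" subdirectory under dbPath, so a custom location has to be handled with a mount or symlink. A hedged config fragment (paths are illustrative):

    storage:
      dbPath: /mongodb/data
      wiredTiger:
        engineConfig:
          directoryForIndexes: true   # index files land under /mongodb/data/index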
[01:07:41] <GothAlice> 103MB of indexes isn't bad.
[01:07:55] <Doyle> I wish I had 103MB of indexes :P
[01:08:08] <Doyle> breaching 100G some places
[01:08:48] <bros> GothAlice: so it isn't transparent huge pages, it isn't slow queries, it isn't indexes not in RAM
[01:08:54] <bros> mongostats is quiet...
[01:08:58] <cerjam> 100G?! and i thoght my project was a pain
[01:09:13] <GothAlice> cerjam: My indexes have no hope of fitting in RAM. T_T
[01:09:15] <Doyle> bros, what does iostat -xmt 1 indicate as disk utilization ?
[01:10:11] <cerjam> ive just been sitting in here reading the conversation
[01:10:15] <cerjam> and feeling like a complete idiot
[01:10:21] <cerjam> its so far over my head lol
[01:10:23] <bros> Doyle: https://gist.github.com/brandonros/74a016f39e5b4351de86
[01:10:27] <uuanton> anyone use munin on mms to track disk space ?
[01:10:43] <cerjam> anyway, i appreciate the help GothAlice ima hit it i hope to not see yall again
[01:10:43] <bros> Doyle: GothAlice and I are trying to figure out why I have 308 page faults a second or something.
[01:10:47] <kalx> GothAlice: I tried an array approach too -> array of the tagids, and then the nasty looking: "tags: { $not: {$elemMatch: {$nin: some_input_tag_array}}}. Same result. I've also tried eliminating any fs by using a ramdisk, same result (and presumably the entire collection is in ram anyhow)
[01:10:49] <uuanton> i use nagios but it looks like overkill
[01:11:20] <GothAlice> Woah, what? $not: $elemMatch $nin…
[01:11:28] <GothAlice> 'wot'
[01:12:07] <kalx> yeah too many negatives
[01:12:12] <bros> Doyle: trying to figure out what kind of hardware I should roll out. I've got my app servers and database intertwined at the moment
[01:12:20] <GothAlice> Also, no need for $elemMatch.
[01:12:24] <kalx> but only way to do such a thing without a $where and custom js
[01:12:50] <kalx> That case -> that needs to check if the query input array contains all the tags that the tags field array does
[01:13:07] <kalx> So $all in the other direction
[01:14:01] <bros> GothAlice: do you think the page faults are not from mongo but possibly node?
[01:14:20] <GothAlice> bros: Did you target the mongod process?
[01:14:27] <bros> Ah, good point. I did.
[01:14:56] <Doyle> bros, faults. Hmm. If your dataset doesn't fit in ram, that's normal
[01:15:30] <Doyle> bros, ideally the working set fits in ram, indexes and data. Next best is indexes fitting in ram. at this point faults will be normal
[01:16:23] <bros> Doyle: I've got 8GB memory and 4GB swap. Mongo is eating up 12%, and my entire db is about 2-3GB.
[01:17:32] <Doyle> On a dedicated server that should all be in memory, no problem
[01:18:44] <bros> I've got it mixed in with 20 node processes at the moment
[01:18:50] <bros> and redis, and nginx, and a wordpress blog
[01:18:53] <Doyle> free -m?
[01:19:01] <bros> -/+ buffers/cache: 3304 4681
[01:19:20] <GothAlice> I always recommend replication for availability reasons.
[01:19:27] <GothAlice> I'd recommend that as a step 1.
[01:20:00] <Doyle> I think it can go out to 50 secondaries now... it's crazy
[01:20:08] <cheeser> yep
[01:20:13] <bros> What's a good starting number?
[01:20:15] <GothAlice> So the same recommendation from earlier: three nodes, it's okay if they're VMs, on physically discrete underlying hardware (to isolate host failures to impacting only one node).
[01:20:28] <bros> GothAlice: I use cloud hosting. Not a problem?
[01:20:28] <Doyle> bros, if you want to be cheap, one primary, one secondary, and an arbiter
[01:20:36] <bros> Doyle: any sharding or no?
[01:20:46] <Doyle> if your app is read heavy, add more secondaries
[01:21:02] <GothAlice> bros: No sharding until you can get a clean measurement.
[01:21:03] <bros> How many CPUs per node here?
[01:21:06] <bros> 1, 2, 4, 8?
[01:21:08] <Doyle> For a few G, doubt you'd need it
[01:21:11] <GothAlice> Optimization without measurement is by definition premature.
[01:21:20] <bros> GothAlice: what else can I measure?
[01:22:02] <Doyle> iops, throughput, network utilization, cpu utilization, memory
[01:22:50] <GothAlice> Well, see how query performance is affected when the DB is in isolation.
[01:23:08] <bros> Apparently query performance is a non-issue now.
[01:23:14] <GothAlice> See if utilization is still an issue, and clearly identify the cause. The solution depends on the actual problem.
[01:23:14] <bros> I have no queries above 100ms
[01:23:30] <Doyle> queries should be pretty fast for your data volume
[01:23:37] <Doyle> aggregations are what hurt
[01:23:46] <GothAlice> Still: isolation. There's no real way of knowing if any of the ~30 other things on there were impacting it.
[01:23:55] <Doyle> +1
[01:24:00] <bros> So start with just one server. How many cores?
[01:24:04] <bros> Cloud hosted, mind you
[01:24:34] <Doyle> I'd start with 2 for that data size
[01:24:49] <GothAlice> The arbiter can be run on the same node as your applications.
[01:24:57] <GothAlice> Its only job is to watch and vote occasionally. ;P
[01:25:08] <bros> Doyle: would you still roll out replication for this data size?
[01:25:19] <Doyle> Exactly. You can run multiple arbiters on a single node also, just on diff ports to save cash
[01:25:35] <Doyle> bros, if your app has to be alive near 100% of the time, yes
[01:25:45] <Doyle> replication for availability
[01:26:02] <Doyle> if the primary dies, the secondary elects and your app can still write
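A rough sketch of the cheap layout Doyle and GothAlice describe (one primary, one secondary, and an arbiter living on the app box; hostnames and the set name are illustrative):

    // run once, connected to the first data-bearing node
    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "db1.example.com:27017" },
        { _id: 1, host: "db2.example.com:27017" },
        { _id: 2, host: "app1.example.com:27018", arbiterOnly: true }
      ]
    })
    // or add the arbiter to an existing set:
    // rs.addArb("app1.example.com:27018")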
[01:26:16] <bros> I'm going to hold off on replication. How many boxes do I need to spin up?
[01:27:48] <Doyle> 1. Get mongo running (tune your ulimits, disable THP, adjust TCP keepalive, adjust zone reclaim, and set RA to 32), do a mongodump, then import to your new host
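A hedged sketch of what a few of those knobs look like on a typical Linux host; exact values and how to persist them belong in the production-tuning docs Doyle mentions below, and the device name here is only an example:

    # disable transparent huge pages until reboot (RHEL 6 uses redhat_transparent_hugepage)
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag
    sysctl -w net.ipv4.tcp_keepalive_time=120     # shorter TCP keepalive
    sysctl -w vm.zone_reclaim_mode=0              # disable NUMA zone reclaim
    blockdev --setra 32 /dev/sdb                  # readahead of 32 x 512-byte sectors
    ulimit -n 64000                               # open-file limit for the mongod user
    # then move the data over
    mongodump --out /backup/dump && mongorestore --host newhost /backup/dump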
[01:28:13] <kalx> @GothAlice: Anyhow, thanks for taking a peek. I will investigate a little bit more but I may just throw this out in favor of a different solution. Mongo doesn't seem to be benefiting me much for this use-case. Haven't been able to identify anything I'm doing wrong (but still.. there must be, I can't imagine Mongo's performance being that bad...)
[01:28:41] <Doyle> bros, use XFS for your data volume
[01:29:17] <Doyle> if you want to go l33t, cut a separate disk for your journal (& indexes if using WT)
[01:29:53] <bros> Doyle: any guide/link for the ulimits?
[01:33:12] <Doyle> the docs have a large page for production tuning
[01:33:51] <bros> Doyle: Ok. How many boxes?
[01:34:13] <Doyle> If you're not doing replication, just 1
[01:34:30] <Doyle> you can turn up replication later if necessary
[01:37:01] <bros> Thank you so much for all of your help. I can't thank you enough.
[01:39:28] <Doyle> Have fun bros
[01:39:38] <bros> Is it the end of the world if it is ext4?
[01:39:48] <Doyle> ext4 has mutex issues
[01:39:58] <Doyle> concurrency with writes
[01:45:09] <Doyle> I'm about to begin playing with WT... So psyched. I'm really hoping to see the kind of perf benefits it's being hailed for
[02:01:55] <bros> WiredTiger?
[02:02:31] <bros> MINFL MAJFL
[02:02:32] <bros> 605930 12270
[02:02:39] <bros> Sorry, my fault.
[02:03:00] <bros> GothAlice: I already have that many page faults, and growing.
[02:03:36] <GothAlice> Initially it will. It needs to load everything back off disk.
[02:03:49] <bros> What rate should hard faults grow at?
[02:04:05] <GothAlice> They'll grow until everything is loaded, generally.
[02:05:10] <GothAlice> Then shrink down to normal levels as required by inserts/updates/deletes.
[02:05:46] <bros> Ok. So seeing 1-2 every few seconds isn't a problem? Everything seems loaded.
[02:07:07] <GothAlice> Yeah, that's fine. Writes need to hit disk, right?
[02:09:12] <cerjam> hey i just realized something
[02:09:35] <cerjam> whats the best phpmyadmin/adminer -ish software for handling mongodb?
[02:09:56] <cerjam> adminer's support of it is pretty awful
[02:10:14] <bros> cerjam: robomongo but it isn't web based
[02:11:29] <cerjam> ill take a gander at it but web based is just so much more convenient
[02:12:03] <GothAlice> Generally you don't want your MongoDB exposed to the web.
[02:12:33] <GothAlice> PMA-alikes make for juicy targets, too.
[02:12:51] <cerjam> true
[02:13:56] <cerjam> its to the point im just about to write my own
[02:13:59] <cerjam> if i had the damned time
[02:15:20] <GothAlice> At work we have been. It's… harder than it seems.
[02:15:44] <GothAlice> Web-based editing of aggregate pipelines is rough.
[02:39:46] <nyexpress> hi
[02:40:44] <nyexpress> I have records collected from every minute, stored with a unix timestamp field (not mongodate), how can I retrieve the average value from values in the record on an hourly basis?
[02:52:59] <GothAlice> nyexpress: Aggregate query, $group stage with an _id: {$mod: ["$dateFieldName", {$literal: 60*60}]}
[02:53:46] <nyexpress> thanks so much
[02:53:47] <GothAlice> Wait, in the record?
[02:54:33] <GothAlice> And there actually needs more math on that to work. Uno momento.
[02:54:34] <nyexpress> {"date":linux_timestamp, "cpuUsage": 10} <-- I have these records for each minute of every day
[02:54:43] <GothAlice> Eek.
[02:55:11] <nyexpress> plus an identifier, the server IP
[02:56:26] <GothAlice> {$sub: ["$dateFieldName", {$mod: ["$dateFieldName", {$literal: 60*60}]}]}
[02:57:41] <nyexpress> $sub is subtract?
[02:57:43] <GothAlice> $subtract, rather. That should do; the _id field in each per-hour average document in the aggregate result will be "snapped" to the hour it belongs to.
[02:58:13] <nyexpress> should it be $avg instead of $sub to get the average value for the 60 values in the hour?
[02:58:56] <GothAlice> $group will average all of the documents that end up having the same _id, i.e. hour logged, as a unix timestamp.
[02:59:38] <nyexpress> gotcha
[02:59:42] <nyexpress> thank you SO much
[02:59:57] <GothAlice> So, the _id should be that $sub thing, and yes, you'll need to tell it which fields to average, and where to save them as part of the $group project.
[03:00:05] <GothAlice> https://docs.mongodb.org/manual/reference/operator/aggregation/group/#pipe._S_group
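Putting those pieces together, a sketch of the full pipeline for nyexpress's documents (the collection name is made up, and grouping by the server IP as well is optional):

    db.metrics.aggregate([
      { $group: {
          // snap each unix timestamp down to the start of its hour
          _id: {
            hour: { $subtract: ["$date", { $mod: ["$date", 3600] }] },
            ip: "$ip"
          },
          avgCpuUsage: { $avg: "$cpuUsage" }
      } },
      { $sort: { "_id.hour": 1 } }
    ])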
[03:00:10] <GothAlice> However!
[03:01:25] <GothAlice> There are more efficient ways to store this information.
[03:01:40] <nyexpress> how's that
[03:02:24] <GothAlice> Right now, you want an hourly sample size, but have individual per-minute events. This means MongoDB needs to perform a lot of work each time you want to pull a given time range, scaled to a large multiple of the number of hours covered.
[03:03:20] <nyexpress> yeah I was planning on reducing the set even further if the time range is larger than a month, year, etc
[03:03:38] <GothAlice> Using pre-aggregation, you can save a "view" that automatically updates its averages as it goes. So, you might keep a capped collection of limited size around storing the last 24 hours of individual events, but when .insert()'ing the log entry, also .update() a collection for the view for the scaled out timeframe with upsert=True.
[03:04:14] <GothAlice> https://gist.github.com/amcgregor/1ca13e5a74b2ac318017#file-eek-py-L30 < like this in Python
[03:05:03] <GothAlice> https://gist.github.com/amcgregor/2032211 < this is what my per-minute logging records can grow to. :3
[03:05:38] <nyexpress> woah
[03:05:51] <nyexpress> I see what you're saying though, to keep a running total
[03:05:56] <GothAlice> Though due to WiredTiger compression, key shortening isn't needed any more and it's much more readable. XD
[03:06:11] <nyexpress> :D
[03:07:00] <nyexpress> tricky stuff but interesting, thank you so much
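A rough mongo-shell sketch of that pre-aggregation idea, the running total nyexpress describes (the linked gists show GothAlice's actual Python; collection and field names here are made up):

    var now = Math.floor(Date.now() / 1000);
    var hour = now - (now % 3600);
    // raw per-minute event
    db.metrics.insert({ date: now, ip: "10.0.0.5", cpuUsage: 12 });
    // hourly rollup maintained at write time; average = totalCpuUsage / samples
    db.metrics_hourly.update(
      { _id: { hour: hour, ip: "10.0.0.5" } },
      { $inc: { totalCpuUsage: 12, samples: 1 } },
      { upsert: true }
    );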
[03:07:29] <GothAlice> There was a great writeup here: http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework
[03:07:35] <GothAlice> But they seem to be having intermittent hosting issues.
[03:07:50] <GothAlice> And their robots.txt rules forbid Archive.org preservation.
[03:08:03] <Doyle> 502!!!! BOOOO
[03:08:12] <nyexpress> bookmarked tho
[03:08:40] <GothAlice> Once I can snag a copy if it ever corrects itself, I'll make my own writeup.
[03:08:41] <Doyle> http://webcache.googleusercontent.com/search?q=cache:BTrp0K_TZcEJ:devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework+&cd=1&hl=en&ct=clnk&gl=ca&client=ubuntu
[03:08:42] <GothAlice> ¬_¬
[03:09:14] <GothAlice> Oh, Google, ignoring robots.txt. XD All the pretty charts are missing, though. T_T
[03:09:27] <Doyle> lol
[03:09:38] <Doyle> be sure to let me know if you do that writeup
[03:09:44] <GothAlice> Now with 100% more MMAPv1 vs. WT.
[03:09:54] <Doyle> I think WT has it in the bag
[03:10:20] <GothAlice> https://medium.com/@amcgregor < it'll go up here when done. :P
[03:10:24] <Doyle> The ability to abstract your index to a fast SSD or a piops volume is valuable
[03:10:46] <Doyle> Bookmarked
[03:25:07] <Doyle> GothAlice, in your mongo faq, you mention that the thread attempting the read gets suspended during page fault. At what point are faults excessive? 1000s per second?
[03:25:54] <GothAlice> Depends on the hardware; when you start getting backlog it's bad. I.e. time spent in io-wait.
[03:26:14] <Doyle> queued r/w?
[03:26:30] <GothAlice> Aye.
[03:26:33] <GothAlice> I think.
[03:26:33] <Doyle> I've seen some of the older systems I worked with at 200% iowait :P
[03:26:46] <Doyle> I built better ones. Now they're 24% on a bad day
[03:27:13] <Doyle> Do you work much in AWS?
[03:27:28] <GothAlice> I switched off their services after they died for a month.
[03:28:02] <GothAlice> Frozen EBS volumes, cross-zone failure (despite SLA with guarantees against cascade failures), and snapshots locked restoring.
[03:28:06] <Doyle> Their regions are just collections of acquired DC's. It's not surprising they have some issues at times
[03:28:21] <Doyle> gross
[03:28:51] <Doyle> In cloud environments, what's your approach for getting good performance from your instances? striped LVM?
[03:28:57] <GothAlice> Quite the nightmare. I had to restore data at the time by reverse engineering the InnoDB on-disk format. ¬_¬ Took the 36 hours prior to wisdom tooth extraction. The reverse engineering without sleep was the more painful of the two. XP
[03:29:18] <Doyle> lol
[03:29:39] <GothAlice> But yeah, striped mirrored.
[03:29:43] <GothAlice> Gotta have my redundancy.
[03:29:54] <nils_> GothAlice: Is that the thing where you have to build a struct from the frm file?
[03:30:14] <GothAlice> Yeah, except when you say directory-per-db with InnoDB, it lies.
[03:30:31] <GothAlice> It stores certain critical bits of information in a shared pool, a directory up.
[03:30:51] <GothAlice> Which were in a locked EBS volume (vs. the client home folders, which were accessible),
[03:31:14] <nils_> referring to the data dictionary I suppose
[03:31:54] <GothAlice> Much pattern hex editor goodness. :P
[03:34:45] <GothAlice> On the performance front, for MongoDB I also shard and replicate, with each node on distinct hardware (mostly… I have a complex VM setup with three physical servers… but each primary is on one box) with iSCSI RAID-ish. (Drobo 8-something-i.) One RAID per VM host. Durable to one server failure, or a lot of drives failing. XP
[03:35:46] <Doyle> Nice. What's your storage hardware? NetApp? Nimble?
[03:36:10] <GothAlice> Dell 1Us running Gentoo. :P
[03:36:24] <Doyle> Love it
[03:36:26] <GothAlice> With MongoDB GridFS for "storage".
[03:36:48] <Doyle> What kind of IOPS do you get from it?
[03:36:48] <nils_> probably saves a bundle ;)
[03:37:35] <GothAlice> nils_: Paid for itself in ~7 months at initial purchase vs. cloud hosting this stupid amount of data. Compose.io would cost me just over half a million dollars a month for this dataset. :/
[03:38:02] <nils_> yeah I'm very much a sceptic when it comes to cloud hosting
[03:38:11] <Doyle> EBS billing makes me cry
[03:38:33] <nils_> especially if you're already dealing with devs that want to throw hardware at any problem, if they can do it at the click of a button...
[03:38:42] <GothAlice> Doyle: I haven't actually measured. It's a very write-heavy load.
[03:39:47] <nils_> although I remember quite a few cases where I could have probably spent a fortune on a proprietary solution instead of haggling over components for something homegrown
[03:39:47] <Doyle> optimize everything!!! Start with the code before looking to scale hardware. See if your aggregates can do a larger reduction earlier in the pipeline...
[03:40:29] <Doyle> What I love is that a Macbook will always out-perform a cloud instance, and that everyone asks about it
[03:41:22] <nils_> directly attached storage and no durable I/O will do that ;)
[03:41:39] <GothAlice> Doyle: There's never been more than 15 second replication lag on that cluster, and that's only like, one operation lagging that far intermittently. (Never did track that down either…) It's almost purely inserts, with some chatter over a few capped collections (also only ever inserts). Very unusual load. ;P
[03:42:09] <Doyle> Cool. How many inserts/s does mongostat indicate?
[03:42:30] <GothAlice> 12-28 or so.
[03:42:44] <GothAlice> I've only got one thing streaming on my network at the moment, and IRC chatter is light. :P
[03:43:42] <GothAlice> When I Duckduckgo things, though, there are larger bursts of traffic. Wikipedia browsing is deadly, sometimes rising to a few thousand/second.
[03:43:59] <Doyle> nice
[03:44:10] <GothAlice> It's fed by transparent SOCKS proxy. ;P
[03:44:21] <Doyle> Very cool
[03:53:09] <Doyle> GridFS is sweet. I hadn't looked at it before
[03:54:21] <cheeser> i'm building a Java FileSystemProvider to expose that as a FileSystem in the JVM.
[03:54:34] <cheeser> it's almost certainly a terrible idea but it's been fun to play with
[03:54:58] <GothAlice> cheeser: ^_^ HTTP via Nginx and a custom Python FUSE thingy are what I use.
[03:55:04] <GothAlice> Mine is terrible as the FUSE filesystem is infinite.
[03:56:02] <cheeser> GothAlice: funny. my next step was to expose it all via REST for reading and writing :D
[03:56:13] <GothAlice> (All content is tagged, with each sub-directory containing all possible valid tags as sub-directories, "files" matching those tags also contained therein. Recursively forever.
[03:56:22] <GothAlice> )
[04:01:20] <GothAlice> The reason for this: finding something turns into a z-shell short-hand tab completion. :3 "open mo/da/ag/tu/Manu<tab>" → "open mongodb/database/aggregate/tutorials/MongoDB Tutorials - MongoDB\ Manual.url"
[04:31:38] <cerjam> well thats unsettling
[04:33:11] <cerjam> made up a new openvz container, using mongodb3.2 and it quietly crashed again
[04:49:39] <arussel> is there an easy query for: any of the fields are undefined ?
[05:18:09] <kalx> arussel: Not sure there's a convenient way to check that besides being a bit more explicit for each field, checkout $exists and $or.
[05:18:21] <kalx> (but I'm not claiming to be a mongo expert either)
[05:48:29] <arussel> I know how to check for a field, I was just hoping not to recursively have to test for each field
[05:49:09] <kalx> it wouldn't really be recursive, you just mean avoiding a long $exists:false OR $exists:false OR blabla chain I assume?
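A sketch of the chain kalx means, spelled out for a known set of fields (field names are only examples, which is exactly arussel's problem below):

    db.things.find({
      $or: [
        { name:  { $exists: false } },
        { email: { $exists: false } },
        { phone: { $exists: false } }
      ]
    })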
[05:50:45] <arussel> oops, forgot to say that I don't know the documents' fields, so I have to use the shell and iterate over every field
[05:56:14] <arussel> {$type:6} seems to still work on my mongo version, what are you supposed to use instead considering it is deprecated ?
[08:13:35] <Sagar> hello i am trying to run this software https://github.com/mrvautin/adminMongo . as per config i have updated app.json and set the ip to my server's ip but still the adminMongo is accessible via localhost only. please help
[08:16:25] <partycoder> Sagar,
[08:16:36] <partycoder> probably your mongo server is not listening on all network interfaces.
[08:17:00] <partycoder> what is the ip address you use for the listener in the mongo configuration
[08:20:14] <Sagar> i have commented out #bind
[08:20:18] <Sagar> so it's listening on all ports
[08:20:28] <Sagar> and ips i mean all ips of the server on the mongodb port
[08:32:56] <partycoder> if it's listening on the loopback interface (localhost or 127.0.0.1)
[08:33:13] <partycoder> you will probably get that behavior
[08:33:44] <partycoder> you might want to either listen on the interface you need to serve or
[08:33:52] <partycoder> just listen on all interfaces (0.0.0.0)
[08:34:02] <partycoder> Sagar.
[08:34:14] <Sagar> i will try that
[08:34:15] <Sagar> thanks :)
[08:51:45] <kurushiyama> commenting out the "bind_ip" line in the config file triggers the default behavior of listening on all interfaces.
[08:53:26] <partycoder> he is gone
[09:05:50] <kurushiyama> Well, it was advice for both of you – makes life easier.
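For reference, the mongod side of this looks like the following in the YAML config (a sketch; 0.0.0.0 exposes the server on every interface, so pair it with auth and a firewall):

    net:
      port: 27017
      bindIp: 0.0.0.0          # listen on all interfaces
      # bindIp: 192.168.1.10   # or only the interface you need to serve
    # older .conf format: bind_ip = 0.0.0.0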
[10:13:48] <dunk> Hi. I was wondering - I know that you can use a Postgres Foreign Data Wrapper to pull mongo data into postgres, but can you do the reverse? (without writing a bunch of custom application code - I mean like a true foreign data wrapper)
[10:18:12] <Derick> I don't know actually...
[10:21:45] <dunk> I'd rather not write the syncing in application code. It'll be fragile, and likely wrong.
[10:38:19] <kurushiyama> dunk: Use a message driven model
[10:38:46] <kurushiyama> dunk: message comes in and gets persisted in both dbs, or route fails
[10:39:16] <dunk> kurushiyama: sounds like I'd be solving some nasty distributed systems problems myself right there
[10:40:06] <kurushiyama> dunk: There is a plethora of tools for that. Camel, FUSE and whatnot.
[10:43:57] <dunk> kireevco: will take a look
[11:19:08] <Sepho> Hi guys
[11:20:59] <Sepho> I'm creating a Schema for events. Each event is only visible to the user who created it. I'm thinking about creating something like a "createdBy" field, which stores the current user _id... Is this the best approach? In that case, what value type should I use for "createdBy"? A Number? Thanks in advance!
[11:37:32] <kurushiyama> Sepho: Imho, your approach is kind of wrong.
[11:38:28] <kurushiyama> Sepho: You have your use cases. From these use cases questions are derived. Based on these questions, you should model your data.
[11:39:07] <kurushiyama> Sepho: And the aim is to get your questions answered in the most efficient way.
[13:36:54] <DrPeeper> today is a new day
[13:51:04] <Sagar> hello, we are running mongodb on our dedicated server, queries are getting slow, can anyone help please?
[13:52:09] <cheeser> run explain on your queries. make sure you're indexed correctly.
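A minimal example of that check in the mongo shell (collection, query, and index are illustrative; the "executionStats" verbosity needs a 3.0+ shell, older ones just use .explain()):

    db.orders.find({ customer_id: 1234, status: "open" }).explain("executionStats")
    // a "COLLSCAN" stage, or totalDocsExamined far above nReturned, points at a missing index
    db.orders.createIndex({ customer_id: 1, status: 1 })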
[13:53:09] <Sagar> isn't there any tuner for the configuration file, we just saw that our mongodb.log was like 224MB
[13:53:19] <Sagar> also how to run queries and make correct indexing?
[14:00:41] <StephenLynx> http://bfy.tw/4QTw
[14:30:22] <x4w3> Hello to everybody, please, could someone help me with this query? <db.devices.update({$set: {disableAccelsens:true}}, {multi:true})> it doesn't change the whole collection, i want to add a new column to the devices collection...tk in adv
[14:31:07] <cheeser> first part of an update() is the query
[14:32:00] <x4w3> cheeser: without multi?
[14:32:18] <cheeser> db.collection.update({query}, {update}, {options})
[14:32:51] <x4w3> then i need to use a query, .....
[14:33:00] <x4w3> to match all cases
[14:33:12] <cheeser> you could use {} for the query
[14:33:21] <x4w3> ummmm, ok, tk :P cheeser
[14:34:03] <x4w3> it run now db.devices.update({},{$set: {disableAccelsens:true}}, {multi:true}) --> tk cheeser
[14:34:25] <cheeser> np
[15:12:22] <Ben_1> is there an insertMany if the entry doesn't exist, else updateMany?
[15:30:40] <Ben_1> is there an insertMany if the entry doesn't exist, else updateMany?
[15:32:02] <saml> https://docs.mongodb.org/manual/reference/method/db.collection.updateMany/
[15:32:09] <saml> use upsert
[15:32:15] <saml> set upsert option
[15:32:35] <saml> no that doesn't make sense. never mind
[15:33:02] <saml> https://docs.mongodb.org/manual/reference/method/db.collection.insertMany/
[15:33:17] <saml> what are you trying to do?
[15:56:34] <Ben_1> sami: I want to update entries but if there are no entries to update I want to insert it.
[15:57:47] <Ben_1> *saml:
[15:57:56] <Ben_1> so it looks like upsert would do that
[15:59:48] <saml> upsert would create one document max
[15:59:53] <saml> which makes sense
[16:00:35] <saml> what are your entries? trying to see when it's reasonable to update entries.. but if nothing matches just create one
[16:02:59] <StephenLynx> Ben_1, use bulkwrite
[16:03:05] <StephenLynx> and on each operation use an updateOne
[16:04:02] <Ben_1> StephenLynx: thx
[16:04:02] <Ben_1> saml: sensor values
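A sketch of the pattern StephenLynx suggests for that "update if present, otherwise insert" case, in mongo-shell syntax (the drivers expose an equivalent bulkWrite; sensor names are made up):

    db.readings.bulkWrite([
      { updateOne: {
          filter: { sensor: "temp-1" },
          update: { $set: { value: 21.4, updatedAt: new Date() } },
          upsert: true
      } },
      { updateOne: {
          filter: { sensor: "temp-2" },
          update: { $set: { value: 19.8, updatedAt: new Date() } },
          upsert: true
      } }
    ], { ordered: false })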
[16:18:33] <JustMozzy> hello everyone. I have a script that aggregates documents with duplicate values. when I loop over the cursor, I do a remove but the documents are not deleted. no error messages either. anyone an idea?
[16:18:48] <StephenLynx> yes.
[16:18:51] <StephenLynx> your script is wrong;
[16:21:12] <JustMozzy> StephenLynx: ah, thanks.
[16:22:20] <StephenLynx> v:
[16:22:46] <JustMozzy> here is the script. http://paste.ubuntu.com/15188323/
[16:23:51] <StephenLynx> hm
[16:24:01] <StephenLynx> you are not using it asynchronously
[16:25:03] <StephenLynx> i am not used to using aggregation like that
[16:25:32] <StephenLynx> on that print
[16:25:35] <StephenLynx> what it says there?
[16:26:10] <StephenLynx> your usage is weird, but I can't see anything actually wrong on the tools I am used to.
[16:27:50] <StephenLynx> you could use an unwind and a second group on that aggregation to get all the ids to be removed at once, though
[16:31:51] <StephenLynx> http://pastebin.com/4XqMRGMZ
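For reference, a hedged sketch of the general shape of that approach (group on the duplicate key, collect the _ids, keep only groups with more than one member; the grouping key and collection name are placeholders):

    db.contents.aggregate([
      { $group: { _id: "$dupKey", uniqueIds: { $addToSet: "$_id" }, count: { $sum: 1 } } },
      { $match: { count: { $gt: 1 } } }
    ], { allowDiskUse: true }).forEach(function (doc) {
      doc.uniqueIds.shift();                                // keep one document per group
      db.contents.remove({ _id: { $in: doc.uniqueIds } });  // remove the rest
    });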
[16:32:58] <JustMozzy> the print gives me the correct ID value. When I use the remove statement in the shell it actually removes the document
[16:33:29] <JustMozzy> oh sweet. I'll try your version
[16:34:28] <Ben_1> If I insert something to mongoDB with the async driver and maybe there is a duplicate _id, will it stop after the error or just skip that duplicated entry?
[16:38:29] <Ben_1> if it just skip that duplicate maybe there should be a list of throwables in the callback because there could be several keys duplicated
[16:38:33] <cheeser> write a quick test. insert the same document 10 times. see what happens.
[16:44:03] <JustMozzy> StephenLynx: Your version also left the documents untouched. Nothing was printed to the console though
[16:45:48] <Ben_1> cheeser: it seems that I got only one error and the whole operation is canceled
[16:46:01] <StephenLynx> let me check the unwind
[16:46:35] <StephenLynx> ah
[16:46:48] <StephenLynx> use '$uniqueIds'
[16:46:52] <StephenLynx> on the unwind
[16:47:08] <JustMozzy> yeah, I fixed that :) it gave me an error.
[16:47:18] <JustMozzy> also push => $push
[16:48:00] <StephenLynx> welp i dunno
[16:48:01] <JustMozzy> when I need the allowDiskUse, what comes first, the callback or the option? I could not find it in the docs
[16:48:48] <StephenLynx> probably the option
[16:48:53] <StephenLynx> usually callbacks come last
[16:49:47] <JustMozzy> hmm... I'll try using the cursor, just in case
[17:20:08] <Ben_1> cheeser: it seems that I got only one error and the whole operation is canceled, am I right? would it not be better to insert all entries and track all errors?
[17:32:30] <cheeser> Ben_1: are you doing an unordered bulk write?
[17:33:04] <Ben_1> not sure, but think so insertMany(listofDocs, response)
[17:34:58] <Ben_1> cheeser: unordered means it doesn't matter which entry is inserted after another, I am right?
[17:36:13] <cheeser> right
[17:36:22] <cheeser> this is the async api for the java driver?
[17:40:23] <Ben_1> cheeser: yes
[17:40:47] <cheeser> pass an InsertOptions reference to configure the unordered write.
[17:59:00] <Ben_1> cheeser: thanks, thought it is unordered per default
[17:59:17] <cheeser> nope
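For comparison, the same knob in mongo-shell syntax (the 3.2 shell's insertMany; the Java async driver's options object is the equivalent): an ordered insert stops at the first duplicate-key error, an unordered one attempts every document and reports all the errors at the end.

    db.sensors.insertMany(listOfDocs, { ordered: false })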
[18:02:57] <JustMozzy> its driving me nuts
[18:04:37] <JustMozzy> this script http://pastebin.ca/3380969 I am getting a lot of results, but the remove simply does not remove. WriteResult always has nRemoved: 0, but when I type the remove manually in shell, it does remove
[18:07:44] <Ben_1> JustMozzy: seems the values of "todelete" does not exist in your db
[18:08:22] <JustMozzy> Ben_1: but they clearly do, when I type db.contents.remove({_id: ObjectId("sameidasincursor")}); it does remove the document
[18:08:34] <Ben_1> btw, you are using print("Removing " + todelete.valueOf()); but in var res you don't use valueOf, why?
[18:08:34] <StephenLynx> maybe is the type
[18:08:42] <JustMozzy> oh nooooooo!
[18:08:47] <JustMozzy> oh god this is so embarassing
[18:08:50] <Ben_1> hehe
[18:08:56] <JustMozzy> it is db.contents not db.content -.-
[18:09:11] <Ben_1> ah I see
[18:09:23] <Ben_1> db.contents.aggregate and db.content.remove
[18:09:40] <StephenLynx> :v
[18:09:41] <StephenLynx> v:
[18:09:50] <StephenLynx> this is why you keep references to collecitons
[18:09:58] <StephenLynx> instead of fetching the collection on every operation
[18:09:58] <JustMozzy> excuse me while I go and bath in shame
[18:10:02] <StephenLynx> kek
[18:10:14] <cheeser> bathe :D
[18:10:17] <JustMozzy> xD
[18:10:28] <JustMozzy> go on, twist the knife a bit more :D
[18:10:33] <Ben_1> dive :D
[18:10:56] <JustMozzy> well, a lesson learned for life :D
[18:13:37] <JustMozzy> but how come mongo does not complain, that the collection does not exist?
[18:13:57] <JustMozzy> (pardon me being a noob on this one)
[18:14:10] <cheeser> welcome to the schemaless world
[18:15:07] <JustMozzy> a weird world to me :D
[18:20:18] <StephenLynx> what I do is to fetch a pointer to the collection on boot
[18:20:22] <StephenLynx> and reuse this pointer.
[18:49:37] <Doyle> Could a high number of faults cause cursor not found errors?
[20:07:07] <TechIsCool> Hey everyone, got a question about latency in MongoDB. Inserts are taking between 10-30s each right now on one of my hosts. I have no clue how mongodb works other than a few basic commands. Is there a way to figure out what's happening?
[20:07:57] <cheeser> do you have a bunch of indexes?
[20:09:22] <TechIsCool> I have 3 collections and one is system.index
[20:09:34] <TechIsCool> in the db I am having issues with
[20:10:08] <cheeser> do you have a bunch of indexes?
[20:10:50] <TechIsCool> 3 indexes
[20:11:04] <Doyle> TechIsCool, iostat -xmt 1 %iowait, disk utilization
[20:11:09] <cheeser> are all collection inserts slow or just the one collection?
[20:11:20] <TechIsCool> 1 collection
[20:11:42] <cheeser> and the size?
[20:12:52] <TechIsCool> 68290464
[20:13:07] <TechIsCool> is that in bytes?
[20:13:10] <cheeser> that's the number of documents?
[20:13:12] <cheeser> ah
[20:13:27] <TechIsCool> woops
[20:13:35] <TechIsCool> 199820 count 267808576 size
[20:13:38] <TechIsCool> wrong collection
[20:13:51] <cheeser> that's not terribly large
[20:13:54] <TechIsCool> storage size is 335900672
[20:14:04] <TechIsCool> we dropped the collection earlier today
[20:14:12] <kurushiyama> Let us come back to "one of my hosts". Maybe you should describe your setup a bit.
[20:14:17] <TechIsCool> it was at 49million
[20:14:36] <TechIsCool> I have 3 mongo servers
[20:14:50] <TechIsCool> I am a sysadmin not a dev so I inherited this
[20:15:31] <kurushiyama> TechIsCool: But those are 3 standalone instances, right?
[20:15:38] <TechIsCool> there are no shards
[20:15:47] <TechIsCool> replication
[20:16:34] <Doyle> If you dropped a collection that resided on the server being slow, it may be taking chunks, but that shouldn't impede operations
[20:17:01] <kurushiyama> Huh? In a replica set, writes only go to the primary. So when writes are done while server A is primary, everything is fine, but when writes go to server B, they are slow?
[20:17:23] <TechIsCool> Interesting
[20:17:28] <TechIsCool> though yeah, had not thought of that
[20:17:31] <TechIsCool> looking
[20:18:06] <Doyle> Ah, it's a RS. Nevermind my last comment
[20:18:35] <kurushiyama> Doyle: I was assuming the same ;)
[20:18:39] <Doyle> hehe
[20:22:23] <TechIsCool> So it could be possible that the Primary failed over to another replica and then all servers are writing to the wrong host. Is that what you guys are saying?
[20:22:37] <kurushiyama> Nope
[20:22:44] <kurushiyama> Drivers are replica set aware
[20:23:10] <TechIsCool> So all 3 servers are defined in the Application using Mongo
[20:23:24] <kurushiyama> TechIsCool: As a replica set?
[20:23:38] <Doyle> When you say there's latency, how did you observe this latency?
[20:23:48] <TechIsCool> NewRelic
[20:23:58] <TechIsCool> The App is .Net
[20:24:07] <Doyle> The servers are part of the same RS?
[20:24:12] <Doyle> or stand-alone?
[20:24:18] <TechIsCool> The are all in the same RS
[20:24:23] <Doyle> rs.status() will indicate
[20:24:27] <Doyle> ok
[20:24:59] <TechIsCool> server="mongodb://db12,db13,db14"
[20:25:03] <TechIsCool> is what the devs passed to me
[20:25:06] <kurushiyama> .Net – I am far gone and out. *And having a Jesus and Mary Chain Song in his head now*
[20:25:07] <Doyle> rs.printReplicationStatus() rs.printSlaveReplicationStatus()
[20:25:25] <kurushiyama> TechIsCool: on the shell.
[20:26:54] <TechIsCool> those commands don't exist
[20:27:00] <TechIsCool> unless I am doing something wrong
[20:27:02] <kurushiyama> TechIsCool: Whut?
[20:27:13] <TechIsCool> I am in the mongo shell right
[20:27:13] <kurushiyama> TechIsCool: the "mongo" shell, of course
[20:27:26] <kurushiyama> TechIsCool: no "rs.status()"?
[20:27:35] <TechIsCool> I have rs.status()
[20:27:41] <TechIsCool> health is 1 for all 3
[20:27:47] <TechIsCool> state is 2 for both secondaries
[20:28:02] <kurushiyama> Can you paste the anonymized output to pastebin or sth?
[20:28:15] <TechIsCool> sure
[20:29:09] <kurushiyama> Doyle: Do you know about the .Net driver? Does it autodicover replsets?
[20:29:44] <TechIsCool> kurushiyama: http://pastebin.com/CNhYhdic
[20:30:56] <kurushiyama> lgtm
[20:32:28] <dypsilon> Hie everyone, I'm thinking about using full text search capabilities of mongodb and have a question: how does it handle rich text content as in text with html tags?
[20:33:17] <kurushiyama> TechIsCool: just to make sure (I honestly dont know about the .Net driver), but the URI string should read "mongodb://db12,db13,db14/?replicaSet=set01"
[20:35:18] <TechIsCool> Looking
[20:35:23] <kurushiyama> TechIsCool: Just a wild guess, but I think this is what happens: The URI string as provided by you does not inform the driver that he is talking to a replica set. So he tries to write to the hosts in order of their appearance. db12 refuses to take the write, as it is not primary. So the next one is tried.
[20:35:49] <TechIsCool> So I need to define the replica set?
[20:36:07] <kurushiyama> TechIsCool: Nah, just the URI string needs to be changed.
[20:36:16] <kurushiyama> TechIsCool: You already have a replset
[20:36:29] <kurushiyama> TechIsCool: But again, this is a wild guess.
[20:36:42] <TechIsCool> alright looking at the devs web config
[20:37:18] <Doyle> TechIsCool, your election Time is GMT: Wed, 24 Feb 2016 20:20:14 GMT
[20:37:39] <Doyle> so I'm guessing 12 was primary, then failed over
[20:37:52] <Doyle> Your uri starting with 12 might have worked normally until then
[20:38:08] <kurushiyama> Doyle: nah, thats optime
[20:38:16] <TechIsCool> kurushiyama: ReplicaSet is defined in the URL
[20:38:38] <TechIsCool> a dev failed to copy correctly
[20:38:40] <Doyle> ah, grabbed the wrong string. GMT: Fri, 19 Feb 2016 11:12:23 GMT
[20:38:57] <TechIsCool> yah and this issue has only been about 8 hours so far
[20:39:00] <Doyle> Nice
[20:40:08] <kurushiyama> Ok, just for testing, I'd have the primary step down to make sure it is not a hardware problem. Next, check your SCM. "We haven't changed anything!" is one of the most common lies in IT industry.
[20:41:11] <TechIsCool> App has not been touched in atleast a month
[20:41:14] <TechIsCool> confirmed with jenkins
[20:41:24] <TechIsCool> how would I fail over to a specific mongod
[20:41:47] <TechIsCool> https://docs.mongodb.org/manual/tutorial/force-member-to-be-primary/
[20:41:49] <TechIsCool> that right
[20:41:57] <bros> GothAlice: I'm back with a day's worth of data.
[20:42:24] <TechIsCool> so I would run rs.stepDown(60)
[20:42:38] <TechIsCool> right on primary and one of the secondaries would take over
[20:42:44] <kurushiyama> TechIsCool: connect to the primary and issue rs.stepDown(). We just want to make sure the primary changes – we do not care too much to which of the other nodes for now.
[20:42:52] <TechIsCool> ok
[20:44:00] <kurushiyama> TechIsCool: And dont forget the SCM. I cant count the hours I ran after problems caused by "just a minor change in some unimportant part of the code".
[20:44:02] <TechIsCool> db14 is now primary
[20:44:23] <TechIsCool> They can't deploy in our envi without jekins and it has not run
[20:44:39] <kurushiyama> Ok, do the operations still take as long?
[20:44:55] <TechIsCool> waiting for the dashboard to update
[20:45:44] <bros> I can't get dex to return anything other than the same 1 result.
[20:46:38] <TechIsCool> kurushiyama: its falling
[20:49:50] <TechIsCool> Dropped for a second but back to the same issue
[20:51:22] <kurushiyama> TechIsCool: Well. I assume you checked the usual suspects (load, mem shortage, IOWait)?
[20:51:36] <TechIsCool> GetEnumerator is whats hanging
[20:51:44] <TechIsCool> that's different than insert, right
[20:52:12] <TechIsCool> High load average
[20:52:19] <TechIsCool> 5 13 7
[20:52:55] <TechIsCool> it has 8 cpu's assigned
[20:53:14] <kurushiyama> I dont know poo about .Net... Ahhh... Assigned CPUs... Shared Storage?
[20:53:34] <kurushiyama> Paste the third line of "top", please
[20:53:40] <TechIsCool> :) understood
[20:54:23] <TechIsCool> Cpu(s): 38.8%us, 3.1%sy, 0.0%ni, 57.8%id, 0.3%wa, 0.0%hi, 0.1%si, 0.0%st
[20:55:14] <kurushiyama> Dinner time. bb in 30
[20:55:21] <TechIsCool> alright
[20:58:50] <TechIsCool> so I am running Mongotop and I see writes are not the issue its all reads
[21:00:55] <kurushiyama> TechIsCool: Which version are you running, and which storage engine?
[21:01:07] <TechIsCool> 2.6.4
[21:01:42] <kurushiyama> So MMAPv1
[21:02:12] <TechIsCool> Yes
[21:02:16] <TechIsCool> I think so
[21:02:21] <kurushiyama> 2.6.4 is a bit of archeology...
[21:03:04] <TechIsCool> Haha I know I would like to be up on 3.x
[21:03:05] <TechIsCool> something
[21:05:05] <kurushiyama> Ok... do you use MMS/CloudManager?
[21:05:21] <TechIsCool> No
[21:05:23] <TechIsCool> I wish
[21:05:29] <TechIsCool> something so I could see into it
[21:05:37] <TechIsCool> basically what I have is the cli
[21:05:55] <kurushiyama> Well. I will polish the crystal ball, then...
[21:07:29] <TechIsCool> So on mongotop
[21:07:32] <TechIsCool> I get total 493113ms read 493113ms write 0ms
[21:07:46] <TechIsCool> other collections are in the 100ms range
[21:07:50] <kurushiyama> hm.
[21:07:57] <TechIsCool> so 493 seconds to read
[21:08:00] <TechIsCool> which is just crazy
[21:08:23] <kurushiyama> Ok. Since we are doing this with one-and-a-half hands tied behind our back, I can only guess.
[21:08:56] <Doyle> TechIsCool, Are the qr|qw columns of mongostat filling up?
[21:09:26] <kurushiyama> Looking into my crystal ball, murmuring some gibberish, I see collection level locking might be the problem.
[21:09:41] <Doyle> XFS?
[21:10:03] <TechIsCool> RHEL 6
[21:10:07] <TechIsCool> wtih LVM
[21:10:08] <kurushiyama> Doyle: also keep in mind we are talking of shared storage, if I get that right?
[21:10:15] <TechIsCool> this is a vm
[21:10:19] <TechIsCool> yup shared storage
[21:10:21] <kurushiyama> TechIsCool: which file system?
[21:10:35] <kurushiyama> Doyle: maybe atime?
[21:10:41] <TechIsCool> ext4
[21:10:48] <Doyle> ew
[21:10:56] <kurushiyama> TechIsCool: own partition for mongodb's data?
[21:11:09] <kurushiyama> Doyle: nice spot!
[21:11:14] <Doyle> ty
[21:11:34] <TechIsCool> it shared
[21:11:40] <TechIsCool> it does not have its own vgroot
[21:11:52] <Doyle> Yea, I've seen the mutex/concurrency issues with ext4 and the mongodb files before
[21:12:11] <kurushiyama> TechIsCool: we have multiple problems, here.
[21:12:34] <TechIsCool> http://pastebin.com/vgNCkVA4
[21:12:36] <Doyle> Between 3.2's file breakout, document level locking, and xfs, the perf is miles ahead of 2.6 on ext4
[21:12:44] <kurushiyama> TechIsCool: can you check wether the partition mongodb writes on has "noatime" in fstab?
[21:14:44] <TechIsCool> So it says "/dev/mapper/vgroot-root / ext4 defaults 1 1"
[21:14:50] <TechIsCool> so no, there is no
[21:15:02] <TechIsCool> noatime in the fstab
[21:15:12] <kurushiyama> Doyle: very. nice. spot.
[21:15:59] <kurushiyama> TechIsCool: ok, we have multiple problems here. a) Shared storage. More often than not, this is not ideal.
[21:16:19] <zivester> is it possible to use mongodump on two db's and to mongorestore all collections into a single DB ?
[21:16:42] <Doyle> You can go nuts and throw a Pure Storage all flash array at a set of mongodb servers, but that's a $100K+ game
[21:16:47] <kurushiyama> TechIsCool: b) Maybe not the optimal fs. I agree with Doyle that xfs is the FS to use
[21:16:49] <TechIsCool> Haha
[21:17:09] <TechIsCool> My boss just wants to drop the db
[21:17:12] <TechIsCool> and let it rebuild
[21:17:21] <TechIsCool> which is not solving the problem grr
[21:17:22] <kurushiyama> TechIsCool: wrong approach
[21:17:26] <TechIsCool> I know
[21:17:28] <Doyle> Booo
[21:17:49] <Doyle> Restart the system first
[21:17:52] <kurushiyama> c) atime. Basically what happens, every time the data files are _accessed_ the FS metadata is updated.
[21:17:59] <Doyle> When in doubt, turn it off and on again :P
[21:18:10] <kurushiyama> Doyle: I prefer an axe.
[21:18:33] <Doyle> +1
[21:18:40] <kurushiyama> TechIsCool: Here is what I suggest
[21:19:10] <TechIsCool> So boss does not want to reboot due to indexes being rebuilt
[21:19:23] <TechIsCool> which I don't think happens
[21:19:46] <TechIsCool> but I am up for suggestions beofre he does things
[21:19:51] <Doyle> Indexes won't be rebuilt... Shared storage... Susceptible to noisy neighbour issues.
[21:20:17] <kurushiyama> TechIsCool: attach additional storage to one of the machines, use XFS, set noatime in fstab, remove the contents of dbpath, mount the new FS to dbpath, update mongod to 3.0.9 and start mongod
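A hedged example of what the fstab entry for such a dedicated data volume might look like (device and mount point are illustrative):

    /dev/sdc1  /var/lib/mongo  xfs  defaults,noatime  0  0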
[21:20:40] <TechIsCool> Yah
[21:20:46] <TechIsCool> alright so basically upgrade it
[21:21:06] <kurushiyama> TechIsCool: Not only
[21:21:21] <kurushiyama> TechIsCool: You make changes to the underlying FS
[21:21:47] <kurushiyama> TechIsCool: let the devs approve https://docs.mongodb.org/ecosystem/drivers/driver-compatibility-reference/#c-net-driver-compatibility first
[21:22:21] <kurushiyama> TechIsCool: when the instance you changed has synced, repeat the process with the next secondary.
[21:23:21] <kurushiyama> TechIsCool: Last but not least: have the primary step down, repeat
[21:23:39] <TechIsCool> So basically a rolling deploy
[21:23:52] <kurushiyama> TechIsCool: The changes in the FS are crucial.
[21:24:02] <TechIsCool> Right since its a much faster FS
[21:24:03] <kurushiyama> TechIsCool: But in general, yes
[21:24:30] <kurushiyama> TechIsCool: well, you need to change a few things anyway, so you can use the optimal FS, too
[21:24:52] <TechIsCool> So is him dropping the database going to cause harm today
[21:25:13] <kurushiyama> Except for losing the data for no reason?
[21:25:26] <TechIsCool> meh we dropped the data once today already
[21:25:37] <TechIsCool> its flushed every day out of the db
[21:25:57] <TechIsCool> well its supposed to be anyways
[21:26:22] <TechIsCool> So basically every other time this happens we drop the collection and everything is resolved
[21:26:29] <kurushiyama> TechIsCool: basically, it doesn't hurt then. If the database is small, it wont hurt either to keep it.
[21:26:43] <TechIsCool> well that database is reported as 62gb
[21:26:45] <TechIsCool> right now
[21:27:08] <kurushiyama> TechIsCool: I wouldnt, since it is not necessary and the syncing of 62Gb should not take long.
[21:27:24] <kurushiyama> TechIsCool: but you can do.
[21:27:30] <TechIsCool> here is the question though
[21:27:47] <TechIsCool> since we dropped the 29million records why has mongo not cleaned up the db size
[21:27:51] <TechIsCool> or will it not
[21:27:57] <TechIsCool> since ist just using the files as is
[21:28:15] <cheeser> mmapv1, especially, will not release disk space back to the OS
[21:28:45] <TechIsCool> alright
[21:28:55] <TechIsCool> so it is playing in its own sand box
[21:28:58] <TechIsCool> which is fine
[21:29:31] <kurushiyama> TechIsCool: you might want to disable WT's compression on the data files and/or journal during the update.
[21:33:02] <kurushiyama> TechIsCool: Ah, you need to shutdown the instances manually before doing the yum update, since mongod is not shut down preupdate or preuninstall.
[21:34:31] <TechIsCool> http://pastebin.com/87NaNmXy
[21:34:41] <TechIsCool> Yah I wish I could do an upgrade today
[21:34:46] <TechIsCool> boss is pushing back
[21:35:32] <TechIsCool> I totally want to upgrade to the newer version of mongo it just works better
[21:38:03] <kurushiyama> Bosses. Sometimes I ask myself why they pay people good money for their technical expertise when in the end they seem to know most things, if not everything better.
[21:38:58] <dypsilon> Hi everyone, I'm thinking about using full text search capabilities of mongodb and have a question: how does it handle rich text content as in text with html tags?
[21:39:11] <kurushiyama> dypsilon: Have you tried it?
[21:39:39] <dypsilon> kurushiyama, nope, not yet.
[21:42:23] <Doyle> TechIsCool, see the notes. https://docs.mongodb.org/v3.0/administration/production-notes/
[21:43:07] <Doyle> ulimits, THP, TCP Keepalive, zone reclaim, and the noatime. xfs
[21:43:40] <kurushiyama> dypsilon: http://pastebin.com/wGESK9w8
[21:43:46] <Doyle> Disabling THP always gives me trouble. Why not make it a simple thing
[21:43:58] <TechIsCool> So my boss just wiped out the databse and now its at 200-300ms
[21:44:03] <TechIsCool> and holding fine
[21:44:05] <TechIsCool> grr
[21:44:14] <TechIsCool> I hate being called out
[21:44:41] <Doyle> LOL
[21:46:50] <kurushiyama> Doyle: I am working on a PR for the RPMs to include THP, ulimits and zone reclaim
[21:55:13] <kurushiyama> TechIsCool: nvm, you tried. In Germany, there is the notion of "Beratungsresistenz" which roughly translates to resistance against advice
[21:56:14] <cheeser> germans have the best words
[22:02:49] <Doyle> That'd be great, kurushiyama. It's annoying to script it all for each different OS (if you change os's)
[22:05:26] <zivester> is there anyway to ungzip/unarchive a --gzip --archive mongodump ?
[22:07:31] <bros> Doyle: any experience with dex?
[22:17:41] <dypsilon> kurushiy_, thank you for this simple test, looks good, but I have all kinds of html in there with classnames and so on. I think, I just test how it looks tomorrow.
[22:21:34] <kurushiyama> dypsilon: What this test was supposed to show you is that html tags wont be ignored
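A minimal way to reproduce that check in the shell (collection name and content are made up):

    db.pages.createIndex({ body: "text" })
    db.pages.insert({ body: "<p class='intro'>MongoDB full text search</p>" })
    db.pages.find({ $text: { $search: "class" } })   // likely matches: markup and attribute tokens get indexed too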
[22:22:45] <dypsilon> kurushiyama, oh, just missed the last line, almost midnight here... well that's the answer I need. Seems like I'll have to trim them before saving to the database.
[22:23:08] <kurushiyama> dypsilon: Well, not necessarily.
[22:23:39] <dypsilon> kurushiyama, the problem is: I have all kinds of html with classnames and they will surely mess up the results.
[22:23:53] <kurushiyama> dypsilon: most dedicated search engines have the option to strip html before indexing
[22:24:20] <dypsilon> kurushiyama, yeah, I'm indexing the pages myself.
[22:24:50] <kurushiyama> dypsilon: Well, I'd use elaticsearch instead of reinventing the wheel
[22:25:29] <dypsilon> kurushiyama, oh, that's what you mean. Not sure I want to add all the admin overhead that comes with elasticsearch.
[22:26:14] <kurushiyama> dypsilon: Well, that is obviously your call. However, I do not find the overhead to be very small.
[22:27:09] <dypsilon> kurushiyama, you mean "elastic is faster" or "elastic is easy to maintain"?
[22:27:56] <dypsilon> well, either way I'll try and go with mongodb, if html stripping is my only problem, everything is fine
[22:28:01] <dypsilon> no need to overengineer
[22:28:10] <kurushiyama> dypsilon: Agreed.
[22:29:25] <bros> Can you do indexes on subdocuments?
[22:33:45] <kurushiyama> bros: you sure can. As a whole or for single fields. http://pastebin.com/A4BiJtfC
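A small sketch of the two options (document shape is invented):

    // index a single field inside a subdocument
    db.orders.createIndex({ "shipping.address.zip": 1 })
    // index the whole subdocument; equality matches are exact and field-order sensitive
    db.orders.createIndex({ "shipping.address": 1 })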
[22:34:53] <bros> kurushiyama: If I create indexes for field "status" and field "account_id", should I also create one for { status: 1, account_id: 1 } (not subdocument related)?
[22:36:09] <kurushiyama> bros: That depends on your documents and your use cases.
[22:52:57] <bros> Is it bad to make indexes preemptively?
[22:54:56] <Derick> not particularly - as long as you know why and which queries you're running
[22:56:37] <StephenLynx> if you know you will need those indexes, no.
[23:26:53] <jiffe> if I flatten a maching with mongoc running (in a 3 mongoc node setup), it should resync from one of the other nodes right?
[23:26:58] <zeioth_> If someone could take a look I would be greateful
[23:26:59] <zeioth_> http://stackoverflow.com/questions/35615176/c-sharp-mongodb-driver-2-0-retrieving-filter-results-as-custom-model
[23:27:12] <jiffe> a machine
[23:29:56] <joannac> jiffe: what does "flatten" mean?
[23:30:28] <jiffe> joannac: resetup the raid disks and build the machine from scratch again
[23:30:45] <joannac> what version?
[23:30:56] <jiffe> 3.0.8
[23:31:00] <joannac> no
[23:31:11] <joannac> config servers are not a replica set in 3.0.x
[23:31:43] <jiffe> hmm
[23:31:56] <joannac> https://docs.mongodb.org/v3.0/tutorial/replace-config-server/
[23:33:44] <jiffe> ah ok, so it may not resync itself but I can just rsync the data over from one of the other 3
[23:33:46] <jiffe> other 2
[23:35:32] <joannac> yes
[23:46:08] <bros> I just dropped all of my indexes. How can I have dex tell me which I need to build?
[23:46:15] <bros> Every time I run it, it only returns 1, then exits.