[00:02:37] <GothAlice> cerjam: Also note the last comments on the JIRA ticket I linked. You may be running an old RHEL-derived CentOS, that inherited the overall bug.
[00:03:03] <bros> StephenLynx: any negative effects of sharding early?
[00:03:09] <bros> I'd rather just get it over with.
[00:04:12] <bros> Gotcha. At what level should a MongoDB user start sharding?
[00:04:17] <cheeser> i wouldn't put redundancy in the sharding column.
[00:04:23] <GothAlice> Without it, there's only one copy of your data. You might periodically backup, but if it dies, there's downtime while you restore that backup.
[00:04:29] <StephenLynx> isnt that replica sets, though?
[00:05:11] <GothAlice> You mix them, in a RAID 10-like arrangement for both data isolation between shards, and redundancy amongst the sharded replicas.
[00:05:25] <StephenLynx> my database is less than 100 mb :v
[00:05:29] <StephenLynx> because it has files on it
[00:08:30] <bros> GothAlice: What size server would you put for that kind of DB (in terms of CPU cores/memory), and what kind of replication and/or sharding would you do?
[00:08:54] <cerjam> GothAlice: i'd like to thank you very much for your help
[00:12:24] <GothAlice> Major faults require disk access. Minor faults are page allocations, which may require page defragmentation if you have transparent huge page support enabled, see the wiki link I just linked. :)
[00:12:41] <GothAlice> How long has that been running?
[00:18:52] <GothAlice> I have an uptime of 352 days, MIN=264464523 MAJ=4490680 or ~8.69 MIN/sec and 0.14 MAJ/sec for one city's web presence, vs. your 386/0.01. This would indicate to me that you have a very heavy query load vs. dataset size compared to the city. ;) You would need several nodes, but small ones, preferably on physically distinct hardware.
[00:19:51] <bros> Could it be that my indices blow?
[00:19:52] <GothAlice> No, what types of queries do you run, and is lag/latency an issue on a majority of your queries? Additionally, does your data have some natural method of grouping like-documents within collections?
[00:20:36] <bros> GothAlice: It's a SaaS platform. Few aggregations based on past sales data, mainly just storing data into subdocuments
[00:20:57] <bros> The longest query I think is 850ms but it's generally over a lot of data.
[00:21:14] <bros> I didn't understand the last bit about grouping
[00:21:38] <GothAlice> And yes, indexes are certainly a factor in query performance and load, but your dataset fitting in RAM somewhat offsets that. Certainly look into it, though. I highly recommend http://mongolab.org/dex/ to analyze your real queries to see how they might be improved.
[00:22:29] <GothAlice> Can't type fast enough tonight. XP
[00:22:36] <bros> What's the first thing I should do? Move it onto a dedicated server? Yes. What do I do about replication/sharding?
[00:22:40] <bros> I might just have you beat at 140wpm. :P
[00:25:41] <GothAlice> Optimization without measurement is by definition premature. Yes, however, DBs should be isolated from other services.
[00:29:25] <GothAlice> bros: Choosing sharding has an impact on other optimizations, such as the use of "covered queries" (mentioned in that documentation). If your data is write-heavy (insert/update), sharding with a "sharding key" chosen to distribute those writes amongst nodes is an effective strategy. If it's find-heavy and a little replication lag is acceptable, one can reduce network traffic on the primary by querying secondaries in a replica set.
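A minimal sketch of those two approaches, assuming a hypothetical database "mydb", collection "events", and field "userId":
    // write-heavy: distribute inserts across shards with a hashed shard key
    sh.enableSharding("mydb")
    sh.shardCollection("mydb.events", { userId: "hashed" })
    // find-heavy, and some replication lag is acceptable: read from secondaries
    db.getMongo().setReadPref("secondaryPreferred")
    db.events.find({ userId: 42 })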
[00:30:15] <DrPeeper> ugh. I am forced to chuckle when I read "sharding"
[00:30:31] <bros> GothAlice: so run dex for a day?
[00:31:26] <GothAlice> bros: Well, in ordinary operation I believe it watches and periodically emits what it thinks might be helpful indexes. It's a continuous process of refinement. Run as long as you need, during a period of time of typical usage, until you have indexes optimizing the queries noted.
[00:33:27] <GothAlice> cheeser: I liked Denver. I liked Boulder and Crested Butte better, though. Better biking. ;P
[00:35:25] <GothAlice> bros: Hmm. An important note: beware that creating indexes without specifying background creation will halt all other operations. So… be careful of that. (And background indexes can take a very, very long time to build, depending on load.)
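Roughly what that looks like in the shell, with a hypothetical field name:
    db.orders.createIndex({ account_id: 1 })                       // foreground build: blocks other operations
    db.orders.createIndex({ account_id: 1 }, { background: true }) // background build: non-blocking, but slower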
[00:35:49] <bros> GothAlice: I'm running slow queries and dex isn't picking anything up. Yes, I set profiling to 1.
[00:38:14] <GothAlice> That was a bit quick. What's a typical mongostat result?
[00:43:15] <GothAlice> Also, what did you set "slowMs" to? It needs to be below the threshold for the queries you're finding slow.
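One way to set that threshold and inspect what the profiler captured (the 100 ms value is just an example):
    db.setProfilingLevel(1, 100)                          // profile anything slower than 100 ms
    db.system.profile.find().sort({ ts: -1 }).limit(5)    // look at the most recent profiled operations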
[00:56:12] <bros> GothAlice: mongostat is super low. like, <2 queries a second
[00:56:28] <GothAlice> Yeah, I'd give it some time to populate.
[00:57:41] <bros> GothAlice: https://gist.github.com/brandonros/aee2716691e8d6b24573 I made these indexes a while ago. See anything that looks strange?
[00:57:57] <bros> Is it common to have more indexes than that?
[00:58:10] <kalx> GothAlice: Sorry for the delay, got pulled into something else. Appreciate the help, don't want to take up too much of your time though so maybe you can just peek at the explain and see if anything stands out. (I'm fairly new to mongo but not DBs in general)
[01:02:19] <Doyle> the pid directory was created, and is owned by mongod
[01:02:34] <Doyle> systemctl start mongod and service mongod start both fail
[01:03:10] <cheeser> run: mongod --config <path to file>
[01:05:09] <Doyle> oh, better output. Expected boolean switch but found string: /mongodb/indexes for option: storage.wiredTiger.engineConfig.directoryForIndexes
[01:05:32] <kalx> GothAlice: Essentially, an index allows access to the records we're targeting already pre-sorted. However, in order to find a record with the proper "tags", a scan must occur. These are just bitfield comparisons, so they should still be pretty inexpensive.
[01:06:07] <GothAlice> Hmm, I've never actually investigated how bitfields interact with indexes.
[01:07:13] <Doyle> Hmm... Directory for indexes...
[01:07:25] <bros> GothAlice: db.orders.stats(): totalIndexSize: 108054016 Is this bad? It seems bad.
[01:07:41] <Doyle> can be true or false. Seems I have to symlink that dir rather than specify it
[01:07:41] <GothAlice> 103MB of indexes isn't bad.
[01:07:55] <Doyle> I wish I had 103MB of indexes :P
[01:10:27] <uuanton> anyone use munin on mms to track disk space ?
[01:10:43] <cerjam> anyway, i appreciate the help GothAlice ima hit it i hope to not see yall again
[01:10:43] <bros> Doyle: GothAlice and I are trying to figure out why I have 308 page faults a second or something.
[01:10:47] <kalx> GothAlice: I tried an array approach too -> array of the tagids, and then the nasty looking: "tags: { $not: {$elemMatch: {$nin: some_input_tag_array}}}. Same result. I've also tried eliminating any fs by using a ramdisk, same result (and presumably the entire collection is in ram anyhow)
[01:10:49] <uuanton> i use nagios but it looks like overkill
[01:14:56] <Doyle> bros, faults. Hmm. If your dataset doesn't fit in ram, that's normal
[01:15:30] <Doyle> bros, ideally the working set fits in ram, indexes and data. Next best is indexes fitting in ram. at this point faults will be normal
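A few shell calls that give rough numbers to compare against available RAM (the "orders" collection comes from later in the log; the exact fields reported vary by version and platform):
    db.stats()                                 // dataSize and indexSize for the whole database
    db.orders.totalIndexSize()                 // index footprint of one collection
    db.serverStatus().extra_info.page_faults   // cumulative hard page faults (Linux)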
[01:16:23] <bros> Doyle: I've got 8GB memory and 4GB swap. Mongo is eating up 12%, and my entire db is about 2-3GB.
[01:17:32] <Doyle> On a dedicated server that should all be in memory, no problem
[01:18:44] <bros> I've got it mixed in with 20 node processes at the moment
[01:18:50] <bros> and redis, and nginx, and a wordpress blog
[01:20:15] <GothAlice> So the same recommendation from earlier: three nodes, it's okay if they're VMs, on physically discrete underlying hardware (to isolate host failures to impacting only one node).
[01:20:28] <bros> GothAlice: I use cloud hosting. Not a problem?
[01:20:28] <Doyle> bros, if you want to be cheap, one primary, one secondary, and an arbiter
[01:26:02] <Doyle> if the primary dies, the secondary elects and your app can still write
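A sketch of that cheap primary/secondary/arbiter layout, with hypothetical hostnames:
    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "db1:27017" },
        { _id: 1, host: "db2:27017" },
        { _id: 2, host: "db3:27017", arbiterOnly: true }   // arbiter votes in elections, holds no data
      ]
    })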
[01:26:16] <bros> I'm going to hold off on replication. How many boxes do I need to spin up?
[01:27:48] <Doyle> 1. Get mongo running (tune your ulimits, disable THP, adjust TCP keepalive, adjust zone reclaim, and set RA to 32), do a mongodump, then import to your new host
[01:28:13] <kalx> @GothAlice: Anyhow, thanks for taking a peek. I will investigate a little bit more but I may just throw this out in favor of a different solution. Mongo doesn't seem to be benefiting me much for this use-case. Haven't been able to identify anything I'm doing wrong (but still.. there must be, I can't imagine Mongo's performance being that bad...)
[01:28:41] <Doyle> bros, use XFS for your data volume
[01:29:17] <Doyle> if you want to go l33t, cut a separate disk for your journal (& indexes if using WT)
[01:29:53] <bros> Doyle: any guide/link for the ulimits?
[01:33:12] <Doyle> the docs have a large page for production tuning
[02:40:44] <nyexpress> I have records collected every minute, stored with a unix timestamp field (not mongodate). How can I retrieve the average of the values in those records on an hourly basis?
[02:52:59] <GothAlice> nyexpress: Aggregate query, $group stage with an _id: {$mod: ["$dateFieldName", {$literal: 60*60}]}
[02:57:43] <GothAlice> $subtract, rather. That should do; the _id field in each per-hour average document in the aggregate result will be "snapped" to the hour it belongs to.
[02:58:13] <nyexpress> should it be $avg instead of $sub to get the average value for the 60 values in the hour?
[02:58:56] <GothAlice> $group will average all of the documents that end up having the same _id, i.e. hour logged, as a unix timestamp.
[02:59:57] <GothAlice> So, the _id should be that $sub thing, and yes, you'll need to tell it which fields to average, and where to save them as part of the $group project.
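Putting those pieces together, one possible shape of the pipeline (field names "ts" and "value" are hypothetical):
    db.samples.aggregate([
      { $group: {
          _id: { $subtract: ["$ts", { $mod: ["$ts", 60 * 60] }] },  // snap the unix timestamp to its hour
          avgValue: { $avg: "$value" }                              // average all samples in that hour
      } },
      { $sort: { _id: 1 } }
    ])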
[03:02:24] <GothAlice> Right now, you want an hourly sample size, but have individual per-minute events. This means MongoDB needs to perform a lot of work each time you want to pull a given time range, scaled to a large multiple of the number of hours covered.
[03:03:20] <nyexpress> yeah I was planning on reducing the set even further if the time range is larger than a month, year, etc
[03:03:38] <GothAlice> Using pre-aggregation, you can save a "view" that automatically updates its averages as it goes. So, you might keep a capped collection of limited size around storing the last 24 hours of individual events, but when .insert()'ing the log entry, also .update() a collection for the view for the scaled out timeframe with upsert=True.
[03:04:14] <GothAlice> https://gist.github.com/amcgregor/1ca13e5a74b2ac318017#file-eek-py-L30 < like this in Python
[03:05:03] <GothAlice> https://gist.github.com/amcgregor/2032211 < this is what my per-minute logging records can grow to. :3
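A shell-level sketch of that pre-aggregation pattern (collection and field names hypothetical; keeping a running sum and count is just one way to maintain an updatable average):
    var ts = 1456300000;                          // the event's unix timestamp
    var hour = ts - (ts % 3600);
    db.events.insert({ ts: ts, value: 42 });      // raw per-minute event (could live in a capped collection)
    db.hourly.update(
      { _id: hour },
      { $inc: { total: 42, count: 1 } },          // running sum + count; average = total / count
      { upsert: true }
    );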
[03:25:07] <Doyle> GothAlice, in your mongo faq, you mention that the thread attempting the read gets suspended during page fault. At what point are faults excessive? 1000s per second?
[03:25:54] <GothAlice> Depends on the hardware; when you start getting backlog it's bad. I.e. time spent in io-wait.
[03:28:51] <Doyle> In cloud environments, what's your approach for getting good performance from your instances? striped LVM?
[03:28:57] <GothAlice> Quite the nightmare. I had to restore data at the time by reverse engineering the InnoDB on-disk format. ¬_¬ Took the 36 hours prior to wisdom tooth extraction. The reverse engineering without sleep was the more painful of the two. XP
[03:29:54] <nils_> GothAlice: Is that the thing where you have to build a struct from the frm file?
[03:30:14] <GothAlice> Yeah, except when you say directory-per-db with InnoDB, it lies.
[03:30:31] <GothAlice> It stores certain critical bits of information in a shared pool, a directory up.
[03:30:51] <GothAlice> Which were in a locked EBS volume (vs. the client home folders, which were accessible).
[03:31:14] <nils_> referring to the data dictionary I suppose
[03:31:54] <GothAlice> Much pattern hex editor goodness. :P
[03:34:45] <GothAlice> On the performance front, for MongoDB I also shard and replicate, with each node on distinct hardware (mostly… I have a complex VM setup with three physical servers… but each primary is on one box) with iSCSI RAID-ish. (Drobo 8-something-i.) One RAID per VM host. Durable to one server failure, or a lot of drives failing. XP
[03:35:46] <Doyle> Nice. What's your storage hardware? NetApp? Nimble?
[03:37:35] <GothAlice> nils_: Paid for itself in ~7 months at initial purchase vs. cloud hosting this stupid amount of data. Compose.io would cost me just over half a million dollars a month for this dataset. :/
[03:38:02] <nils_> yeah I'm very much a sceptic when it comes to cloud hosting
[03:38:33] <nils_> especially if you're already dealing with devs that want to throw hardware at any problem, if they can do it at the click of a button...
[03:38:42] <GothAlice> Doyle: I haven't actually measured. It's a very write-heavy load.
[03:39:47] <nils_> although I remember quite a few cases where I could have probably spent a fortune on a proprietary solution instead of haggling over components for something homegrown
[03:39:47] <Doyle> optimize everything!!! Start with the code before looking to scale hardware. See if your aggregates can do a larger reduction earlier in the pipeline...
[03:40:29] <Doyle> What I love is that a Macbook will always out-perform a cloud instance, and that everyone asks about it
[03:41:22] <nils_> directly attached storage and no durable I/O will do that ;)
[03:41:39] <GothAlice> Doyle: There's never been more than 15 second replication lag on that cluster, and that's only like, one operation lagging that far intermittently. (Never did track that down either…) It's almost purely inserts, with some chatter over a few capped collections (also only ever inserts). Very unusual load. ;P
[03:42:09] <Doyle> Cool. How many inserts/s does mongostat indicate?
[03:42:44] <GothAlice> I've only got one thing streaming on my network at the moment, and IRC chatter is light. :P
[03:43:42] <GothAlice> When I Duckduckgo things, though, there are larger bursts of traffic. Wikipedia browsing is deadly, sometimes rising to a few thousand/second.
[03:53:09] <Doyle> GridFS is sweet. I hadn't looked at it before
[03:54:21] <cheeser> i'm building a Java FileSystemProvider to expose that as a FileSystem in the JVM.
[03:54:34] <cheeser> it's almost certainly a terrible idea but it's been fun to play with
[03:54:58] <GothAlice> cheeser: ^_^ HTTP via Nginx and a custom Python FUSE thingy are what I use.
[03:55:04] <GothAlice> Mine is terrible as the FUSE filesystem is infinite.
[03:56:02] <cheeser> GothAlice: funny. my next step was to expose it all via REST for reading and writing :D
[03:56:13] <GothAlice> (All content is tagged, with each sub-directory containing all possible valid tags as sub-directories, "files" matching those tags also contained therein. Recursively forever.)
[04:33:11] <cerjam> made up a new openvz container, using mongodb3.2 and it quietly crashed again
[04:49:39] <arussel> is there an easy query for: any of the fields are undefined ?
[05:18:09] <kalx> arussel: Not sure there's a convenient way to check that besides being a bit more explicit for each field; check out $exists and $or.
[05:18:21] <kalx> (but I'm not claiming to be a mongo expert either)
[05:48:29] <arussel> I know how to check for a field, I was just hoping not to recursively have to test for each field
[05:49:09] <kalx> it wouldn't really be recursive, you just mean avoiding a long $exists:false OR $exists:false OR blabla chain I assume?
[05:50:45] <arussel> oops, forgot to say that I don't know the documents' fields, so I have to use the shell and iterate over every field
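For reference, the explicit $or/$exists form kalx describes, plus one possible shell-side loop for when the field names aren't known up front (collection and field names hypothetical):
    db.things.find({ $or: [
      { a: { $exists: false } },
      { b: { $exists: false } },
      { c: { $exists: false } }
    ] })
    // compare each document against a reference list of expected keys
    var expected = ["a", "b", "c"];
    db.things.find().forEach(function (doc) {
      expected.forEach(function (k) {
        if (!doc.hasOwnProperty(k)) print(doc._id + " is missing " + k);
      });
    });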
[05:56:14] <arussel> {$type:6} seems to still work on my mongo version, what are you supposed to use instead considering it is deprecated ?
[08:13:35] <Sagar> hello i am trying to run this software https://github.com/mrvautin/adminMongo . as per config i have updated app.json and set the ip to my server's ip but still the adminMongo is accessible via localhost only. please help
[09:05:50] <kurushiyama> Well, was an advice for both of you – makes life easier.
[10:13:48] <dunk> Hi. I was wondering - I know that you can use a Postgres Foreign Data Wrapper to pull mongo data into postgres, but can you do the reverse? (without writing a bunch of custom application code - I mean like a true foreign data wrapper)
[11:20:59] <Sepho> I'm creating a Schema for events. Each event is only visible to the user who created it. I'm thinking about creating something like a "createdBy" field, which stores the current user's _id... Is this the best approach? In that case, what value type should I use for "createdBy"? A Number? Thanks in advance!
[11:37:32] <kurushiyama> Sepho: Imho, your approach is kind of wrong.
[11:38:28] <kurushiyama> Sepho: You have your use cases. From these use cases questions are derived. Based on these questions, you should model your data.
[11:39:07] <kurushiyama> Sepho: And the aim is to get your questions answered in the most efficient way.
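One common way to model that, assuming users have ObjectId _ids (collection and variable names hypothetical):
    db.events.insert({ title: "demo", createdBy: ObjectId("56cdbabe1234567890abcdef") })
    db.events.createIndex({ createdBy: 1 })        // supports "events visible to this user"
    db.events.find({ createdBy: currentUserId })   // currentUserId: the logged-in user's ObjectId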
[14:30:22] <x4w3> Hello to everybody, please, could someone help me with this query? <db.devices.update({$set: {disableAccelsens:true}}, {multi:true})> it doesn't change the whole collection, and I want to add a new column to the devices collection... thanks in advance
[14:31:07] <cheeser> first part of an update() is the query
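In other words, with the query first (an empty {} to match every document), the call would look something like:
    db.devices.update({}, { $set: { disableAccelsens: true } }, { multi: true })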
[16:18:33] <JustMozzy> hello everyone. I have a script that aggregates documents with duplicate values. when I loop over the cursor, I do a remove but the documents are not deleted. no error messages either. anyone an idea?
[16:32:58] <JustMozzy> the print gives me the correct ID value. When I use the remove statement in the shell it actually removes the document
[16:33:29] <JustMozzy> oh sweet. I'll try your version
[16:34:28] <Ben_1> If I insert something to mongoDB with the async driver and maybe there is a duplicate _id, will it stop after the error or just skip that duplicated entry?
[16:38:29] <Ben_1> if it just skips that duplicate, maybe there should be a list of throwables in the callback, because there could be several duplicated keys
[16:38:33] <cheeser> write a quick test. insert the same document 10 times. see what happens.
[16:44:03] <JustMozzy> StephenLynx: Your version also left the documents untouched. Nothing was printed to the console though
[16:45:48] <Ben_1> cheeser: it seems that I got only one error and the whole operation is canceled
[16:48:53] <StephenLynx> usually callbacks come last
[16:49:47] <JustMozzy> hmm... I'll try using the cursor, just in case
[17:20:08] <Ben_1> cheeser: it seems that I got only one error and the whole operation is canceled, am I right? Would it not be better to insert all entries and track all errors?
[17:32:30] <cheeser> Ben_1: are you doing an unordered bulk write?
[17:33:04] <Ben_1> not sure, but I think so: insertMany(listofDocs, response)
[17:34:58] <Ben_1> cheeser: unordered means it doesn't matter which entry is inserted after another, am I right?
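A quick shell illustration of the difference, using a throwaway collection:
    // ordered (the default) stops at the first duplicate-key error
    db.test.insertMany([{ _id: 1 }, { _id: 1 }, { _id: 2 }], { ordered: true })   // _id: 2 is never inserted
    db.test.drop()
    // unordered keeps going and reports every error at the end
    db.test.insertMany([{ _id: 1 }, { _id: 1 }, { _id: 2 }], { ordered: false })  // _id: 2 is still inserted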
[18:04:37] <JustMozzy> this script http://pastebin.ca/3380969 I am getting a lot of results, but the remove simply does not remove. WriteResult always has nRemoved: 0, but when I type the remove manually in the shell, it does remove
[18:07:44] <Ben_1> JustMozzy: seems the value of "todelete" does not exist in your db
[18:08:22] <JustMozzy> Ben_1: but they clearly do, when I type the db.contents.remove(Object("sameidasincursor"); it does remove the document
[18:08:34] <Ben_1> btw, you are using print("Removing " + todelete.valueOf()); but in var res you don't use valueOf, why?
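A hedged guess at the shape of the fix, since remove() expects a query document rather than a bare id (the "todelete" variable name is taken from the script):
    db.contents.find({ _id: todelete }).count()              // sanity-check that the value actually matches
    var res = db.contents.remove({ _id: todelete }, { justOne: true })
    printjson(res)                                           // res.nRemoved should now be 1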
[18:49:37] <Doyle> Could a high number of faults cause cursor not found errors?
[20:07:07] <TechIsCool> Hey everyone, got a question about latency in MongoDB. Inserts are taking between 10-30s each right now on one of my hosts. I have no clue how MongoDB works other than a few basic commands. Is there a way to figure out what's happening?
[20:07:57] <cheeser> do you have a bunch of indexes?
[20:09:22] <TechIsCool> I have 3 collections and one is system.index
[20:09:34] <TechIsCool> in the db I am having issues with
[20:10:08] <cheeser> do you have a bunch of indexes?
[20:16:34] <Doyle> If you dropped a collection that resided on the server being slow, it may be taking chunks, but that shouldn't impede operations
[20:17:01] <kurushiyama> Huh? In a replication, writes only go to the primary. So when writes are done when server A is primary, everything is fine, but when writes go to server B, they are slow?
[20:22:23] <TechIsCool> So it could be possible that the Primary failed over to another replica and then all servers are writing to the wrong host. Is that what you guys are saying?
[20:32:28] <dypsilon> Hie everyone, I'm thinking about using full text search capabilities of mongodb and have a question: how does it handle rich text content as in text with html tags?
[20:33:17] <kurushiyama> TechIsCool: just to make sure (I honestly dont know about the .Net driver), but the URI string should read "mongodb://db12,db13,db14/?replicaSet=set01"
[20:35:23] <kurushiyama> TechIsCool: Just a wild guess, but I think this is what happens: The URI string as provided by you does not inform the driver that he is talking to a replica set. So he tries to write to the hosts in order of their appearance. db12 refuses to take the write, as it is not primary. So the next one is tried.
[20:35:49] <TechIsCool> So I need to define the replica set?
[20:36:07] <kurushiyama> TechIsCool: Nah, just the URI string needs to be changed.
[20:36:16] <kurushiyama> TechIsCool: You already have a replset
[20:36:29] <kurushiyama> TechIsCool: But again, this is a wild guess.
[20:36:42] <TechIsCool> alright looking at the devs web config
[20:37:18] <Doyle> TechIsCool, your election Time is GMT: Wed, 24 Feb 2016 20:20:14 GMT
[20:37:39] <Doyle> so I'm guessing 12 was primary, then failed over
[20:37:52] <Doyle> Your uri starting with 12 might have worked normally until then
[20:40:08] <kurushiyama> Ok, just for testing, I'd have the primary step down to make sure it is not a hardware problem. Next, check your SCM. "We haven't changed anything!" is one of the most common lies in IT industry.
[20:41:11] <TechIsCool> App has not been touched in at least a month
[20:41:57] <bros> GothAlice: I'm back with a day's worth of data.
[20:42:24] <TechIsCool> so I would run rs.stepDown(60)
[20:42:38] <TechIsCool> right on primary and one of the secondaries would take over
[20:42:44] <kurushiyama> TechIsCool: connect to the primary and issue rs.stepDown(). We just want to make sure the primary changes – we do not care too much to which of the other nodes for now.
[20:44:00] <kurushiyama> TechIsCool: And dont forget the SCM. I cant count the hours I ran after problems caused by "just a minor change in some unimportant part of the code".
[21:18:40] <kurushiyama> TechIsCool: Here is what I suggest
[21:19:10] <TechIsCool> So boss does not want to reboot due to indexes being rebuilt
[21:19:23] <TechIsCool> which I don't think happens
[21:19:46] <TechIsCool> but I am up for suggestions beofre he does things
[21:19:51] <Doyle> Indexes won't be rebuilt... Shared storage... Susceptible to noisy neighbour issues.
[21:20:17] <kurushiyama> TechIsCool: attach additional storage to one of the machines, use XFS, set noatime in fstab, remove the contents of dbpath, mount the new FS to dbpath, update mongod to 3.0.9 and start mongod
[21:21:21] <kurushiyama> TechIsCool: You make changes to the underlying FS
[21:21:47] <kurushiyama> TechIsCool: let the devs approve https://docs.mongodb.org/ecosystem/drivers/driver-compatibility-reference/#c-net-driver-compatibility first
[21:22:21] <kurushiyama> TechIsCool: when the instance you changed has synced, repeat the process with the next secondary.
[21:23:21] <kurushiyama> TechIsCool: Last but not least: have the primary step down, repeat
[21:23:39] <TechIsCool> So basically a rolling deploy
[21:23:52] <kurushiyama> TechIsCool: The changes in the FS are crucial.
[21:24:02] <TechIsCool> Right since its a much faster FS
[21:24:03] <kurushiyama> TechIsCool: But in general, yes
[21:24:30] <kurushiyama> TechIsCool: well, you need to change a few things anyway, so you can use the optimal FS, too
[21:24:52] <TechIsCool> So is him dropping the database going to cause harm today
[21:25:13] <kurushiyama> Except for losing the data for no reason?
[21:25:26] <TechIsCool> meh we dropped the data once today already
[21:25:37] <TechIsCool> its flushed every day out of the db
[21:25:57] <TechIsCool> well its supposed to be anyways
[21:26:22] <TechIsCool> So basically every other time this happens we drop the collection and everything is resolved
[21:26:29] <kurushiyama> TechIsCool: basically, it doesn't hurt then. If the database is small, it wont hurt either to keep it.
[21:26:43] <TechIsCool> well that database is reported as 62gb
[21:29:31] <kurushiyama> TechIsCool: you might want to disable WT's compression on the data files and/or journal during the update.
[21:33:02] <kurushiyama> TechIsCool: Ah, you need to shutdown the instances manually before doing the yum update, since mongod is not shut down preupdate or preuninstall.
[21:35:32] <TechIsCool> I totally want to upgrade to the newer version of mongo it just works better
[21:38:03] <kurushiyama> Bosses. Sometimes I ask myself why they pay people good money for their technical expertise when in the end they seem to know most things, if not everything better.
[21:38:58] <dypsilon> Hi everyone, I'm thinking about using full text search capabilities of mongodb and have a question: how does it handle rich text content as in text with html tags?
[21:39:11] <kurushiyama> dypsilon: Have you tried it?
[21:46:50] <kurushiyama> Doyle: I am working on a PR for the RPMs to include THP, ulimits and zone reclaim
[21:55:13] <kurushiyama> TechIsCool: nvm, you tried. In Germany, there is the notion of "Beratungsresistenz" which roughly translates to resistance against advice
[22:17:41] <dypsilon> kurushiy_, thank you for this simple test, looks good, but I have all kinds of html in there with classnames and so on. I think, I just test how it looks tomorrow.
[22:21:34] <kurushiyama> dypsilon: What this test was supposed to show you is that html tags wont be ignored
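A rough reconstruction of that kind of test (collection and field names hypothetical); the point is that markup is tokenized and indexed like any other text:
    db.pages.createIndex({ body: "text" })
    db.pages.insert({ body: "<div class='intro'>hello world</div>" })
    db.pages.find({ $text: { $search: "intro" } })   // matches: the class name was indexed as a term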
[22:22:45] <dypsilon> kurushiyama, oh, just missed the last line, almost midnight here... well that's the answer I need. Seems like I'll have to trim them before saving to the database.
[22:23:08] <kurushiyama> dypsilon: Well, not necessarily.
[22:23:39] <dypsilon> kurushiyama, the problem is: I have all kinds of html with classnames and they will surely mess up the results.
[22:23:53] <kurushiyama> dypsilon: most dedicated search engines have the option to strip html before indexing
[22:24:20] <dypsilon> kurushiyama, yeah, I'm indexing the pages myself.
[22:24:50] <kurushiyama> dypsilon: Well, I'd use elaticsearch instead of reinventing the wheel
[22:25:29] <dypsilon> kurushiyama, oh, that's what you mean. Not sure I want to add all the admin overhead that comes with elasticsearch.
[22:26:14] <kurushiyama> dypsilon: Well, that is obviously your call. However, I do not find the overhead to be very big.
[22:27:09] <dypsilon> kurushiyama, you mean "elastic is faster" or "elastic is easy to maintain"?
[22:27:56] <dypsilon> well, either way I'll try and go with mongodb, if html stripping is my only problem, everything is fine
[22:29:25] <bros> Can you do indexes on subdocuments?
[22:33:45] <kurushiyama> bros: you sure can. As a whole or for single fields. http://pastebin.com/A4BiJtfC
[22:34:53] <bros> kurushiyama: If I create indexes for field "status" and field "account_id", should I also create one for { status: 1, account_id: 1 } (not subdocument related)?
[22:36:09] <kurushiyama> bros: That depends on your documents and your use cases.
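For illustration, using the fields from the question plus a hypothetical "meta" subdocument:
    db.orders.createIndex({ "meta.source": 1 })          // index a single field inside a subdocument
    db.orders.createIndex({ meta: 1 })                   // index the subdocument as a whole (exact-match queries on it)
    db.orders.createIndex({ status: 1, account_id: 1 })  // compound index; its prefix also serves queries on status alone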
[22:52:57] <bros> Is it bad to make indexes preemptively?
[22:54:56] <Derick> not particularly - as long as you know why and which queries you're running
[22:56:37] <StephenLynx> if you know you will need those indexes, no.
[23:26:53] <jiffe> if I flatten a machine with mongoc running (in a 3 mongoc node setup), it should resync from one of the other nodes right?
[23:26:58] <zeioth_> If someone could take a look I would be grateful