[00:09:03] <UberDuper> If you have slow queries that say IXSCAN, check the nscanned value. If it's a large number, your index (or query) is no good.
[00:09:57] <kurushiyama> Though the latter does not necessarily mean that the queried fields were not in an index. Order matters, for example: an index with {foo:1,bar:1} and a query of {bar:"a",foo:15} should result in a collscan.
[00:10:16] <UberDuper> I thought that didn't happen anymore.
[00:10:29] <kurushiyama> UberDuper I _think_ not in 3.x
[00:10:50] <kurushiyama> UberDuper But it was covered in m102 for 2.6
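For anyone reading along, a minimal shell sketch of checking that scanned-vs-returned ratio; the collection, fields and values are made up, and on 3.x the counters are named totalKeysExamined / totalDocsExamined rather than the old nscanned:

```javascript
// Hypothetical collection and fields, for illustration only.
db.orders.createIndex({ customer: 1, created: -1 })

var stats = db.orders
  .find({ customer: "acme", created: { $gt: ISODate("2016-01-01") } })
  .explain("executionStats")

// IXSCAN in the winning plan is only half the story; compare how much was
// examined against how much was returned. A large gap means the index (or the
// query) is not selective enough.
printjson({
  keysExamined: stats.executionStats.totalKeysExamined,
  docsExamined: stats.executionStats.totalDocsExamined,  // "nscanned"/"nscannedObjects" pre-3.0
  returned:     stats.executionStats.nReturned
})
```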
[08:49:03] <chujozaur> I made two hosts join replicaSets, and load avg is ~300/400 there
[08:49:48] <chujozaur> I see many iowaits and many faults on newly spawned secondaries
[08:50:02] <chujozaur> but I have no idea how to mitigate them
[09:36:16] <MahaDev> Hi, please help me on this query - db.getCollection('users').find({"_id" : ObjectId("57502418391ebf041c94657b")}, {"timezone" : {$exists: true}}) What is wrong with this ? I am getting "Unsupported projection option: timezone: { $exists: true }" error
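For reference, that error is because $exists is a query operator, not a projection operator: the existence test belongs in the filter document, and the projection just lists fields to include. Roughly what was probably intended:

```javascript
// Filter on existence in the query document; project the field separately.
db.getCollection('users').find(
  { _id: ObjectId("57502418391ebf041c94657b"), timezone: { $exists: true } },
  { timezone: 1 }
)
```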
[10:41:14] <Mmike> Hi, lads. I have 3 node replicaset, and two nodes went down. The routers in the racks where those two nodes reside are being replaced
[10:41:22] <Mmike> now the remaining one is SECONDARY and I can't really use it
[10:41:36] <Mmike> so I am going to reconfigure replicaset
[10:41:54] <Mmike> when I do so the remaining node will become PRIMARY but with only one member
[10:42:33] <Mmike> when they fix the routers, what's going to happen with my replicaset? I need to manually kill them and re-add them, right?
[10:50:09] <kurushiyama> Mmike From their POV, it could well be just a network partition that happened.
[10:50:44] <kurushiyama> Mmike Can you not shut them down or something?
[10:50:46] <Mmike> kurushiyama: ok, so I need to make sure that my apps do NOT try to connect to those two, but just to the one that's in fresh replicaset
[10:51:12] <kurushiyama> Mmike What "fresh" replica set? You are talking of the remaining node?
[10:51:26] <Mmike> yes - so, I have nodes: A, B and C
[10:58:03] <kurushiyama> Mmike Well, the remaining node still has the same IP, I guess ;)
[10:58:17] <kurushiyama> Mmike And/or same hostname
[10:58:41] <Mmike> so when networking goes back on, i'll connect to each of the ones that were unreachable, shut down mongo, delete datadir. Then I will fire up mongo, and do rs.add() on the always-working primary
[11:01:03] <kurushiyama> Mmike Ok. Side note: With that WC, you eliminate failover capabilities. Majority should do well. But in this case, no rollback
[11:01:30] <Mmike> I had serious issues with rollbacks not being noticed properly
[11:01:43] <kurushiyama> Mmike Well, I'd make sure and firewall off the current primary as well.
[11:02:03] <kurushiyama> Mmike use readConcern, then.
[11:02:15] <Mmike> like, have PRIMARY disconnect before it replicated data - the remaining ones voted a new primary, and when the old primary rejoined it was like 'ooo, we have a primary, my oplog no longer counts, let's take stuff from the new primary'
[11:02:33] <kurushiyama> Mmike And if you need data on 3 nodes while retaining failover, 5 is more suitable to your needs.
[11:03:41] <kurushiyama> Mmike Assuming you have firewalled the 2 nodes basically off everything, so that they are isolated, then you should be good.
[11:04:04] <kurushiyama> Mmike With a write concern of majority, rollbacks should not happen anyway.
[11:06:03] <kurushiyama> Mmike As per consistency, you are aware of readConcern?
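To make the write-concern/read-concern point concrete, a rough sketch (collection name invented; readConcern "majority" needs MongoDB 3.2+ with WiredTiger):

```javascript
// Only acknowledge the write once a majority of the set has it,
// so it cannot be rolled back after a failover.
db.events.insert(
  { type: "audit", at: new Date() },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)

// Only read majority-committed data (MongoDB 3.2+, WiredTiger).
db.runCommand({
  find: "events",
  filter: { type: "audit" },
  readConcern: { level: "majority" }
})
```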
[11:06:45] <kurushiyama> Mmike And I was more referring to "as little of shared infrastructure as possible". One router killing two nodes is a bad sign.
[11:08:42] <Mmike> kurushiyama: no, no - two switches died :)
[11:18:09] <kurushiyama> Mmike But I take sweets ;)
[11:19:38] <enelar> Guys, i am getting oom while querying a lot of data through php->mongofill. As far as i understand, upon a ->find command i should get nothing more than an empty iterator (i.e. a cursor). What is the memory being allocated for?
[11:20:19] <enelar> Is it a bug in mongofill (the library name), or does mongodb actually fetch data into the client on ->find?
[11:21:26] <kurushiyama> enelar Sounds like a wrongly configured OS mongod is running on.
[11:21:55] <kurushiyama> enelar Wait, too early in my day: your query gets an OOM or the mongod?
[11:26:03] <enelar> kurushiyama, i am not quite sure what's happening, but i am getting OOM inside the client (which is PHP right now). This happens because the database is returning _BSON_ data, which php is trying to store in memory
[11:27:06] <enelar> And my question is: is that expected? Or is it just a badly coded library? (I am using https://github.com/mongofill/mongofill)
[11:28:20] <enelar> Let me put it another way: does collection->find() work similarly to an `sql select`?
[11:29:42] <thapakazi> Hi there, does db.setProfilingLevel(2) require a restart, and is there any better way to get insights on the types of query mongo is getting/processing over a span of, say, 6 months or less? Thanks
[11:31:17] <kurushiyama> thapakazi What is your current profiling level?
[11:32:51] <thapakazi> "current profiling level ? " kurushiyama it was 0 and I am trying 1 and 2, but nothings happeing :(
[11:33:31] <kurushiyama> enelar Well, knowing poo about PHP and your data, I need a little more info. A find can potentially return huge datasets. Say you have a DB of 100k docs, and each is 10k in size – do the math. Now imagine you try to store that in an array.
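For what it's worth, find() itself only creates a cursor; the memory blow-up usually comes from the client code (or the library) pulling the whole result into an array. The driver-agnostic fix is to walk the cursor batch by batch, sketched here in shell syntax with invented names:

```javascript
// Don't materialize the whole result set: 100k docs at ~10KB each is ~1GB
// if you toArray() them. Iterate the cursor instead.
var cursor = db.bigcoll.find({}, { payload: 1 }).batchSize(1000)

var count = 0
while (cursor.hasNext()) {
  var doc = cursor.next()   // only the current batch is held client-side
  count += 1                // ...process doc here...
}
print(count + " documents processed")
```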
[11:36:10] <kurushiyama> thapakazi I am not sure. I guess I restarted the mongod (since I usually change that stuff during maintenance). No problem if you have a replset: do it on the secondaries, restart them (one by one!), have the primary step down, set the level there, restart.
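For the record, the profiler can also be toggled at runtime per database from the shell, without touching the config file; whether you also persist it in the config is a separate question. Note that system.profile is a small capped collection, so it will not hold six months of history on its own — something like MMS or log analysis is a better fit for that. A minimal sketch:

```javascript
// Level 2 profiles every operation on the current database; level 1 only
// operations slower than the slowms threshold (here 100 ms).
db.setProfilingLevel(2)            // or: db.setProfilingLevel(1, 100)

db.getProfilingStatus()            // e.g. { "was" : 2, "slowms" : 100 }

// Profiled operations are written to the capped db.system.profile collection.
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty()
```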
[11:37:53] <pamp> Hey, is it normal with WiredTiger to have such high values of locks --> http://dpaste.com/08D97RV
[11:38:22] <kurushiyama> pamp That has little to do with WT.
[11:38:47] <kurushiyama> pamp Looks like a collscan
[11:39:55] <kurushiyama> pamp You made sure that the query in question uses indices?
[11:42:51] <pamp> I thought that with WiredTiger we don't have global locks
[11:45:33] <kurushiyama> pamp Not global ones. But document level ones. And having all fields indexed does not say much. Say you have separate indices on `foo` and `bar`, but your query is {foo:"a",bar:"b"}, only the foo index will be used and then the bar condition needs to be searched in the docs found in the foo index.
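A small sketch of that difference, with invented names — two single-field indexes versus one compound index covering both predicates:

```javascript
// Two separate indexes: the planner typically picks one (say foo_1) and has to
// fetch documents to check the bar condition.
db.things.createIndex({ foo: 1 })
db.things.createIndex({ bar: 1 })
db.things.find({ foo: "a", bar: "b" })
  .explain("executionStats").executionStats.totalDocsExamined

// One compound index: both conditions are answered from the index keys,
// so far fewer documents are examined.
db.things.createIndex({ foo: 1, bar: 1 })
db.things.find({ foo: "a", bar: "b" })
  .explain("executionStats").executionStats.totalDocsExamined
```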
[11:51:51] <kurushiyama> pamp You might want to think about having the indices hashed. Usually, this only makes sense if you want to use a monotonically increasing value as a shard key.
[11:52:30] <kurushiyama> Doing the math on your other question: that is 120µs/doc
[11:52:41] <enelar> I answered my question https://github.com/mongofill/mongofill/blob/c2a16b7cb6c8fa94aa4c286acd8b4a90c3a843f5/src/MongoCursor.php#L255
[12:08:36] <thapakazi> hey, is there no alternative to a restart? with a restart I have to warm up a huge dataset again :(
[12:09:20] <thapakazi> I wish something existed like nginx's config reload
[12:20:10] <thapakazi> lucky me, my team has MMS installed in place :)
[12:28:36] <goku147> if i change the mongoose schema and i add a lot of new fields, how do i make the previously entered documents also have the same fields?
[12:35:12] <kurushiyama> goku147 That process is called "data migration" and you have to do it.
[12:35:47] <kurushiyama> goku147 Neither mongoose nor MongoDB can guess the data for said fields, right?
[13:35:34] <goku147> it would be wonderful if i could do it in the mongoose schema itself, because i have loads of data!
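For the archive: that kind of backfill is normally a one-off script against the collection rather than anything mongoose does for you. Field names and defaults below are invented; on a large collection you'd want the filter to only touch documents that are actually missing the new fields:

```javascript
// Backfill documents created before the schema gained the new fields.
// multi: true updates every matching document, not just the first.
db.users.update(
  { newsletter: { $exists: false } },
  { $set: { newsletter: false, signupSource: "unknown" } },
  { multi: true }
)
```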
[13:39:29] <scruz> what should {$eq: [field, '4']} return if: (1) field is an empty string, (2) if field is null?
[13:41:05] <scruz> my pipeline seems borked and is returning results inconsistent with direct querying
[13:51:10] <scruz> here’s part of my $project stage: https://bpaste.net/show/a0b7f3973bd6
[13:54:13] <scruz> when i query directly on the database, there are no records with status set to 4, but i am getting results in the ‘flagged and verified’ bucket
[14:14:27] <scruz> i don’t have any records with status set to 4, but i keep finding records in the “flagged and verified” bucket whenever i run the aggregation containing the last paste
[14:14:54] <scruz> so i’m wondering what’s wrong with my projection
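One thing worth checking with that kind of $project/$cond: the aggregation $eq expression is type-sensitive, so comparing against the string "4" never matches a numeric 4, and both an empty string and null compare as not-equal to "4". A small sketch with an invented collection:

```javascript
// $eq in aggregation expressions does no string/number coercion.
db.eqdemo.drop()
db.eqdemo.insert([
  { _id: 1, status: "4" },    // string
  { _id: 2, status: 4 },      // number
  { _id: 3, status: "" },     // empty string
  { _id: 4, status: null }    // explicit null
])

db.eqdemo.aggregate([
  { $project: { isFour: { $eq: ["$status", "4"] } } }
])
// isFour is true only for _id 1; _id 2, 3 and 4 all evaluate to false.
```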
[14:19:56] <trudko> hi everyone, I am playing with mongo and I've found a query like this http://pastie.org/10895500 which seems extremely complicated syntax-wise
[14:19:58] <scruz> no, just the part of the project stage. specifically, the part that assigns the value of bucket
[14:20:25] <saml> do you have the full thing? i just wanted to try it on my own
[14:20:26] <scruz> trudko: that’s the aggregation framework
[14:20:54] <tantamount> How can I recover from "Fatal assertion 28579 NoSuchKey Unable to find metadata for table" on start?
[14:21:58] <scruz> trudko: what do you want to do?
[14:22:06] <saml> it's json.. so you can generate it at runtime as well
[14:22:48] <trudko> scruz: nothing in particular, I am just curious if queries like this are the exception, because my first (uninformed) feeling is that this looks like complicated syntax, at least compared to what I am used to seeing in sql
[14:23:06] <scruz> trudko: again, that’s not a query
[14:23:45] <saml> queries are usually simpler since there's no join
[14:41:19] <saml> maybe there's a bug in nesting $cond like that
[14:52:11] <kurushiyama> tantamount A log excerpt and a bit more detailed explanation could help.
[14:59:30] <tantamount> I don't know what more I could tell you
[15:00:27] <tantamount> I've tried running --repair and it rebuilds the indexes but when I run without repair it drops them all with a series of messages like, "dropping unused ident: admin/index-3-5150148562311891939" and then proceeds to claim that it can't find that index with a message like, "Fatal assertion 28579 NoSuchKey Unable to find metadata for table:admin/index-3-8494706625464327628"
[15:34:47] <caliculk> Hey, just curious: on Ubuntu the config file at /etc/mongodb.conf is not in YAML, but the documentation on configuration options says you can use that file as a configuration file. So does that mean that YAML is optional or required?
[15:35:17] <caliculk> Or is the /etc/mongodb.conf only used for the daemon instances?
[15:37:58] <saml> you can specify configuration file to be used when you start mongod
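For context: since 2.6 the documented format for the config file is YAML (the pre-2.6 ini-style format is still accepted for backwards compatibility), and the path is simply whatever the init script or your --config/-f flag points at. A minimal, hypothetical example:

```yaml
# /etc/mongod.conf -- hypothetical minimal YAML config
storage:
  dbPath: /var/lib/mongodb
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
net:
  port: 27017
  bindIp: 127.0.0.1
```

Started with something like `mongod --config /etc/mongod.conf`, or via the distribution's init script pointing at whichever file the package set up.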
[15:48:51] <UberDuper> I gave up on znc forever ago. I just irssi in screen.
[15:49:28] <UberDuper> irssi has a pretty minimal znc like functionality too but it's not pleasant to configure iirc.
[15:50:26] <GothAlice> Excepting this channel just now, and the occasional netsplit, I've used a bouncer for several years without issue. (I.e. it's more helpful than not.) Screen, on the other hand, would segregate my data too far away from where I consume it, in a protocol nothing speaks. (I've seen Minecraft hosts trying to automate Screen… it's silly.)
[15:51:12] <GothAlice> UberDuper: Notably, I have four devices, and two servers, and push all listening to the bouncer.
[15:52:41] <UberDuper> I dunno what you mean by automating screen. But it's always just an ssh away. /shrug
[15:53:57] <GothAlice> UberDuper: My systems speak IRC, not "SSH with control codes to control the screen wrapper, with manual parsing of some visual representation presented by a full client". ;^P
[15:54:45] <UberDuper> Now I understand even less what you mean. :)
[15:56:11] <GothAlice> UberDuper: Screen presents a terminal window of a certain size. It requires a terminal emulator (i.e. Terminal.app, Putty, etc.) to hold a representation of the 2D grid of characters that is the screen, which Screen issues commands to update. (I.e. "this part updated, here's the new text".) Irssi is an IRC client, with a visual representation of IRC meant for humans.
[15:57:28] <GothAlice> Software systems (such as my logger, with data going back to 2001) aren't terminal emulators; the complexity of "teaching" a software system to interpret a grid of characters as having structure and meaning is… silly. They speak the IRC communications protocol directly, instead. Infinitely simpler and easier.
[15:59:28] <GothAlice> However, on the MongoDB front, I've been making… extensive use of index prefixing. (I.e. indexing {foo, bar} allows use when just searching {foo}.) I keep meaning to investigate index use improvements such as index merging; how far can MongoDB go these days in making use of independent indexes together? My concrete example is: {company, reference} in one index, {published, retracted} for range querying dates in the second. Merge-able when searching company=, published<, retracted>?
[16:26:36] <sSs> a developer asked me for a mongodb server... What should I be aware of?
[16:30:00] <UberDuper> It's pretty straightforward to get up and running for dev/testing. You'll want to understand replica sets when you go into production.
[16:30:31] <UberDuper> IMO enable authentication from the very beginning.
[16:32:29] <sSs> should I start with 3 servers (primary, secondary + arbiter)?
[16:33:10] <UberDuper> For a production deployment, 3 is best. I'd recommend primary, secondary, secondary.
[16:34:30] <UberDuper> In general, I wouldn't use an arbiter.
[16:35:18] <GothAlice> sSs: Arbiters are primarily useful for interesting edge cases, such as having geographically distributed secondaries, to help control the network partition effect if backbone connections go down.
[16:36:46] <GothAlice> But you need at least three nodes (primary, secondary, arbiter, or primary and two secondaries) in order to maintain "high availability", i.e. access to your data if one of the nodes goes away. In the arbiter case, your data is guaranteed to switch to read-only in the event of a single failure, in the two secondary case, a new primary will be elected and things can continue as-is. I.e. it preserves write capability in the event of a failure.
[16:37:25] <GothAlice> Rather, in the event of the failure of a single data node. The arbiter going away wouldn't… really do much, of course.
[16:38:38] <GothAlice> sSs: Also, can't stress the "don't actually make it net accessible until you secure it" point that UberDuper made enough. The defaults assume a secure LAN or VLAN.
[16:40:30] <UberDuper> Just don't ever run mongo without auth.
[16:40:40] <GothAlice> You can, if you take other steps to secure it.
[16:41:11] <sSs> is there a guide to run mongo with auth?
[16:41:31] <UberDuper> Yes. The mongo docs site is quite good.
[16:41:34] <GothAlice> https://docs.mongodb.com/manual/core/authentication/ is the manual for it.
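The short version of that page, as a sketch (user name and password are placeholders): create an administrative user first, then enable authorization and reconnect with credentials.

```javascript
// 1. While auth is still off (or via the localhost exception), create an admin user.
use admin
db.createUser({
  user: "siteAdmin",
  pwd: "change-me",     // placeholder
  roles: [ { role: "userAdminAnyDatabase", db: "admin" }, "readWriteAnyDatabase" ]
})

// 2. Restart mongod with authorization on (security.authorization: enabled in the
//    config file, or --auth on the command line), then authenticate:
db.auth("siteAdmin", "change-me")
```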
[16:42:48] <sSs> I will be using CentOS for the mongodb
[16:42:50] <GothAlice> MongoDB allocates files on disk using stepped power of two sizes, starting relatively small, eventually reaching a maximum size per on-disk slice.
[16:43:11] <GothAlice> If your client's database is small, you may wish to enable the smallFiles option to bump the starting allocation size down a bit.
[16:43:52] <sSs> the developer doesn't know the size...
[16:44:21] <GothAlice> Does he not have current or test data? Expectations around # records, and examples of the complexity of those records?
[16:44:25] <sSs> do you think a: 1core + 512Mb + 20Gb disk for each is a nice start?
[16:44:33] <UberDuper> It's not difficult to migrate to larger instances if you have to.
[16:44:49] <GothAlice> You can pretty easily do napkin calculations using http://bsonspec.org/spec.html to figure out the rough lower and upper bounds.
[16:45:17] <GothAlice> sSs: You may be interested in my FAQ: https://gist.github.com/amcgregor/4fb7052ce3166e2612ab
[16:45:27] <UberDuper> I'd recommend 2 cores and you're going to need more RAM than that. If you're using WT, mongo won't even start with less than ~1GB of RAM.
[16:45:30] <GothAlice> It goes into detail on memory allocation concerns.
[16:46:33] <GothAlice> Aye, that's a good starting ratio.
[16:47:20] <GothAlice> sSs: One last point, if you want to ease some of the maintenance of your MongoDB cluster, you may wish to investigate https://www.mongodb.com/cloud
[16:47:26] <GothAlice> (Formerly the MongoDB Management Service, MMS.)
[16:47:51] <GothAlice> It provides some great analytics, too, to help you estimate growth, watch load, etc.
[16:47:59] <GothAlice> (In addition to making updates a total breeze.)
[18:00:07] <caliculk> So a few questions then about the YAML config for replication: where can I specify the replica set? I see how to set a replica set name, but I don't see where to specify the IP addresses of the mongo instances?
[18:02:14] <GothAlice> caliculk: Replication state is stored in the "local" database within the DB engine, not configured via YAML.
[18:02:45] <GothAlice> (Maintained using the rs.* shell helpers, such as rs.initiate() and rs.reconfig().)
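To tie that back to the question: the member addresses go into the configuration document you pass to rs.initiate() (or add later with rs.add()), not into the YAML file; the YAML file only carries replication.replSetName. Hostnames below are placeholders:

```javascript
// Run once, on one member, after all three mongod processes are up and were
// started with the same replSetName.
rs.initiate({
  _id: "rs0",    // must match replication.replSetName in the config file
  members: [
    { _id: 0, host: "mongo-a.example.com:27017" },
    { _id: 1, host: "mongo-b.example.com:27017" },
    { _id: 2, host: "mongo-c.example.com:27017" }
  ]
})

rs.status()      // verify members come up as PRIMARY / SECONDARY
```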
[18:13:35] <Doyle> Hey. Is there a way to prevent new DBs from being created on specif shards?
[18:13:57] <cheeser> Doyle: you can set permissions to just a set of users
[18:15:01] <caliculk> So GothAlice does this even need to be set then? https://docs.mongodb.com/manual/reference/configuration-options/#replication.replSetName
[18:15:53] <Doyle> hmm. Not sure if we're on the same page. Say you have a sharded cluster with one shard (rs) that shouldn't get any new db's landing on it. That's the scenario
[18:16:15] <GothAlice> caliculk: Yes; that informs the engine which replica set configuration to look for in the local DB. It must be set.
[18:17:33] <caliculk> Alright, and then, last question hopefully for a bit. Is it recommended (and if not, what downsides are there) to feed data into slaves, which the slaves then replicate into a master? Say for instance, I have a bunch of individual databases that I want to feed all their information into one larger one. Would there be any performance impact?
[18:19:29] <cheeser> writes only go to a primary (mongodb doesn't have master/slave anymore)
[18:19:32] <GothAlice> caliculk: First up, there's a terminology issue. MongoDB does not use master/slave replication (hasn't since… 2.4? 2.2?) but instead uses primary / secondary replication. There are some subtle differences, mostly relating to high availability and (I think) how the data is pushed around. Are you actually referring to 2.2/2.4 master/slave, or modern primary/secondary?
[18:21:42] <GothAlice> https://gist.github.com/amcgregor/df42f41f752f581eb116307175f3b301?ts=4 < phew! Document is a MutableMapping suitable for passing straight through to PyMongo. Phase 1 of deprecating MongoEngine is pretty much done. This relates to my earlier unanswered question about index merging: can a query on company=, (published unset or <), (retracted unset or >), make use of both _availability and _reference (or _template) indexes?
[18:22:05] <GothAlice> (Since now is proving to be a good time to review index usage and such.)
[18:23:44] <caliculk> So my idea is that we will be having three mongodb instances for HA, which will be directly connected to a webapp. While those synchronize, I also want a fourth mongodb instance that will be separate and won't have a web application, but will be more or less "backing up" all the data from the three to the fourth mongodb instance
[18:24:38] <GothAlice> caliculk: A reasonable arrangement, as long as the web apps are actually on nodes other than the mongod ones; DB services should always be run in isolation. At work we have two replicas just for backups, that will never become primary.
[18:25:22] <GothAlice> Using: https://docs.mongodb.com/manual/tutorial/configure-secondary-only-replica-set-member/ and one of them delayed by 24h using https://docs.mongodb.com/manual/core/replica-set-delayed-member/
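Roughly what those two pages boil down to, sketched against a hypothetical member index: priority 0 so it can never become primary, hidden so clients never read from it, and slaveDelay to keep it lagging behind as an "oops" buffer:

```javascript
// Turn an existing member into a hidden, non-electable, delayed backup node.
cfg = rs.conf()
cfg.members[3].priority   = 0                 // never eligible for election
cfg.members[3].hidden     = true              // invisible to drivers/clients
cfg.members[3].slaveDelay = 24 * 60 * 60      // stay 24h behind the primary (seconds)
rs.reconfig(cfg)
```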
[18:39:58] <UberDuper> Doyle: I'm not sure of a way to prevent mongo from picking a shard to set primary on.
[18:40:51] <UberDuper> You might be able to set maxSize for each shard you *don't* want the primary on to less than the current disk allocation.
[18:41:22] <UberDuper> But I don't know if that would have the desired effect. It would also prevent chunk moves to those shards until you revert maxSize.
[18:42:01] <GothAlice> … or might even cause chunk moves in order to get the dataset size below maxSize.
[18:51:07] <Doyle> Rather than movePrimary and reload all the mongos's
[18:51:56] <UberDuper> A flushRouterConfig() should work fine when doing a movePrimary on an unsharded collection.
[18:52:18] <UberDuper> Rather than restarting the mongos.
[18:52:47] <Doyle> Yea, that's what I meant to say, but couldn't think of it.
[18:54:08] <UberDuper> Nope. We just movePrimary here.
[18:56:14] <Doyle> Oh well. I was hoping to prod a fancy unknown command out of someone here. It's as I understood then. Tag awareness can ensure chunks are migrated only to specifically tagged shards, but the DB has to be manually moved. Thanks UberDuper
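For the archive, the manual dance being described, with invented database and shard names — movePrimary against a mongos, then flushRouterConfig on every mongos so none of them keeps routing unsharded collections to the old shard:

```javascript
// Against a mongos: move "mydb"'s primary shard (and its unsharded collections).
db.adminCommand({ movePrimary: "mydb", to: "shard0001" })

// Then, on *each* mongos, drop the cached routing table so it re-reads the
// config servers instead of using the stale mapping.
db.adminCommand({ flushRouterConfig: 1 })
```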
[18:58:07] <UberDuper> I'm very surprised there isn't an option for it during the create.
[18:58:12] <Doyle> Tiny problem to deal with, but it would be nice to have a create db command that would allow the specification of a tag to adhere to when selecting a shard to create the db at
[18:59:36] <Doyle> I think that's the only item on my wishlist for Mongodb atm. 3.0 and 3.2 took care of the other, mostly locking related wishlist items.
[19:05:33] <UberDuper> Wonder if the mongos would be smart enough to recognize if you created the DB at the shard.
[19:05:55] <UberDuper> My faith in mongos leads me to believe it wouldn't.
[19:05:59] <Doyle> Na, the mongos won't see it until you flush
[22:13:05] <_ohm> is there a limit on how many connections and how many times/s a mongodb can be queried? When I have one VM flooding the DB everything is fine, when I have 2 I get NoneType errors and both crash