[10:58:22] <cxz> i can't get migrations to work: ming.schema.Invalid: version:Missing field
[12:05:01] <fleetfox> there is no way to update _id?
[12:07:19] <fleetfox> no type constraints and immutability is such a nice combination
[12:09:53] <fleetfox> and it does type coercion for indices WTF
[12:13:40] <StephenLynx> for one, I don't touch the _id field for anything.
[12:13:58] <StephenLynx> I declare a different unique index and use it.
[12:14:39] <StephenLynx> not only because it has a bunch of special rules, but the name _id is very counterintuitive if you are storing your own value instead of the generated one.
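A minimal sketch of that pattern, assuming the mongo shell; the collection and field names ("users", "externalId") are illustrative:

    // Leave _id alone (it keeps the generated ObjectId) and enforce
    // uniqueness on your own key instead.
    db.users.createIndex({ externalId: 1 }, { unique: true });
    // Application queries then target the custom key:
    db.users.find({ externalId: 12345 });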
[12:16:55] <fleetfox> That's a stupid argument, what if I have an existing external sequence? Surrogate keys are stupid
[12:21:01] <cheeser> but, no, you can change an _id field
[12:33:09] <StephenLynx> mongo does many things well, the _id field is not one of these, IMO.
[12:34:26] <StephenLynx> from the very moment I realized it had special rules for projection I just said "yeah, nah, I am not dealing with your bullshit"
[12:34:59] <fleetfox> yes, i'll just tell my 0.5mil loc codebase to ignore the id..
[12:42:21] <StephenLynx> oh, cmon, no one likes to adopt crappy code. and his case seems to be an inexperienced programmer adopting hyped tech and using it wrong.
[12:42:32] <StephenLynx> and he inherits the mess.
[12:43:13] <cheeser> sure. it sucks. he just lost me with his webscale comment.
[12:43:38] <StephenLynx> well, most people who bought the hype used that.
[12:43:57] <cheeser> "webscale" or not is irrelevant to his actual problem
[13:41:37] <benjick> Hello. I bring stupid friday questions. Just got started with Mongo MMS and I want to create a new database and user to use in my application. I have this now; http://i.imgur.com/M2L6e6m.png is this user isolated to only access the database "testdb"?
[13:42:39] <StephenLynx> yes. afaik in mongo you need to whitelist users to databases.
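A minimal sketch of a user whitelisted to a single database, assuming the mongo shell; the user name and password are illustrative:

    use testdb
    db.createUser({
      user: "appUser",
      pwd: "s3cret",
      roles: [ { role: "readWrite", db: "testdb" } ]   // access to testdb only
    });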
[13:57:47] <DragonPunch> does that limit sound right?
[13:57:50] <DragonPunch> or is it supposed to be outside
[13:57:56] <derfdref2> I've tried updateUser but though it allows specifying a different mechanism and doesn't throw an error, the user object in the system users collection doesn't change
[13:58:15] <StephenLynx> what runtime environment are you using?
[13:58:19] <StephenLynx> with io.js that would have to be an array
[13:58:35] <StephenLynx> and you would have to put the operators inside their own objects
[14:01:28] <DragonPunch> StephenLynx: how would i use the limit in aggregation
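A minimal sketch of $limit as an aggregation stage, assuming the Node.js/io.js MongoDB driver mentioned above; the URI, collection and field names are illustrative:

    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://localhost:27017/app', function (err, db) {
      if (err) throw err;
      // The pipeline is an array, and each operator sits in its own object.
      db.collection('scores').aggregate([
        { $match: { game: 'chess' } },
        { $sort: { score: -1 } },
        { $limit: 10 }               // $limit is a pipeline stage, not a query option
      ], function (err, docs) {
        if (err) throw err;
        console.log(docs);
        db.close();
      });
    });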
[15:22:22] <paperziggurat> what does mongo return if a document is not found? null or undefined?
[15:22:51] <Derick> it returns an empty result set
[15:27:30] <paperziggurat> derick, how would i, in javascript, check to see if an empty set is returned? create an empty set json object and then compare the results to that?
[15:35:35] <Derick> how do you do the query? do you use an ODM?
[15:37:25] <paperziggurat> i realized i can use the count function and just check to see if 1 document exists
[15:37:45] <StephenLynx> or check if the array has a length.
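A minimal sketch of both checks (findOne returning null versus an empty array from find), assuming the Node.js MongoDB driver; the URI, collection and query are illustrative:

    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://localhost:27017/app', function (err, db) {
      if (err) throw err;
      var users = db.collection('users');

      // findOne yields null when nothing matches.
      users.findOne({ name: 'bob' }, function (err, doc) {
        if (err) throw err;
        if (doc === null) console.log('not found');

        // find() always yields an array through toArray(); check its length.
        users.find({ name: 'bob' }).toArray(function (err, docs) {
          if (err) throw err;
          if (docs.length === 0) console.log('empty result set');
          db.close();
        });
      });
    });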
[15:58:52] <jr3> is caching a mongoose model possible with something like redis
[15:59:25] <StephenLynx> I am not sure, but I am pretty sure it's not a very practical idea.
[16:02:00] <GothAlice> jr3: I use MongoDB as a cache. If your queries are slow enough to warrant caching, they may also warrant refactoring (either the code, or denormalizing the data model) to better accommodate the queries you are performing.
[16:02:15] <GothAlice> To assist with those, though, we'd need to know more about your specific performance issue.
[17:08:33] <pamp> GothAlice, Yes, it's fresh data, for now I only create the indexes and make some queries..
[17:11:48] <GothAlice> Yeah; for performance monitoring, page faults per second are what you should be looking at. Memory (properly configured without THP enabled) is divided up into 4KiB pages, and when data is first loaded from disk, every single page from the files on disk that are being read or written to will "page in" once, data being loaded from the disk as part of the page fault.
[17:12:11] <GothAlice> With 15GB of data, that's a lot of pages, eh?
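For scale: 15 GB divided into 4 KiB pages is on the order of four million pages, each of which has to fault in at least once when the data is first touched.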
[17:13:10] <boutell> Hi. I am scaling a mongo application in which I need “read after write” consistency (if I’ve written it, another request should immediately be able to read it). On a single node this is no problem, but in a cluster it’s a little mysterious how it’s supposed to work. I’ve heard it suggested that the “majority” write concern will achieve this, but I don’t see what good that does if not all of the nodes have
[17:13:11] <boutell> the information. But I’ve also read that by default, “read” requests go to the primary, which would seem to defeat the entire purpose of having a cluster. What’s the right way to achieve this? Thanks.
[17:15:22] <saml> let's say you have 1000 concurrent clients. and 1000 mongod. so basically each client gets its own mongod to read from. and you expect a write to propagate at once?
[17:15:50] <saml> so if client1 read updated doc, you'd expect client100 also has same doc
[17:16:38] <GothAlice> saml: Why in the name of hojek would you have one mongod per client? There is no need for that.
[17:16:55] <GothAlice> saml: And even if you did have a replica set of that size, all clients would direct their writes at the single primary.
[17:17:20] <saml> and primary acknowledges after all slaves replicate the change?
[17:17:57] <GothAlice> saml: Thus, each client, when issuing a query that must be consistent, would direct that must-be-safe query at the primary, which is always consistent. (Atomic operations FTW.) This is done using "read preference".
[17:18:35] <saml> so that's what boutell is asking. what's the purpose of a replica in that case other than backup?
[17:18:59] <GothAlice> saml: As for writers, "write concern" gives the server the idea of what level of consistency you want in your cluster prior to the operation you are attempting (insert, update, etc.) returning. I.e. "wait until the majority of replicas acknowledge they have this update saved to disk".
[17:19:36] <GothAlice> You can direct queries you don't mind a minor historical view of at the secondaries.
[17:19:49] <boutell> OK. So “majority write concern” is intended as a certain level of guarantee that the data won’t die. It’s not intended to provide read-after-write consistency at all.
[17:19:58] <GothAlice> For example, someone types up a job description in our main application at work and the "career site" that only reads from secondaries eventually gets the new job details.
[17:20:28] <GothAlice> The secondary that "career site" is reading from is in the same datacenter as the application server for that site, but _not_ the same datacenter as the main application. Data locality.
[17:20:42] <saml> boutell, what kind of app are you writing that needs read-after-write consistency?
[17:21:14] <GothAlice> boutell: No, it's not, but if you read from the primary after writing (which must always go to the primary), you have read-after-write consistency.
[17:21:35] <GothAlice> You _also_ have read-after-write consistency if you mandate all replicas confirm they have the data before the insert/update returns.
[17:21:48] <GothAlice> (In the latter case, you can safely read from a secondary and know you will have the data just written.)
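A minimal sketch of the two read-after-write options described above, assuming the Node.js MongoDB driver and a three-member replica set; the URI, collection and field names are illustrative:

    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://host1,host2,host3/app?replicaSet=rs0', function (err, db) {
      if (err) throw err;
      var sessions = db.collection('sessions');
      var doc = { sessionId: 'abc123', user: 'bob' };

      // Option 1: write with majority concern, then send the must-be-consistent
      // read to the primary via read preference.
      sessions.insert(doc, { w: 'majority' }, function (err) {
        if (err) throw err;
        sessions.find({ sessionId: 'abc123' })
          .setReadPreference('primary')    // the primary is always consistent
          .toArray(function (err, docs) {
            if (err) throw err;
            console.log(docs);
            db.close();
          });
      });

      // Option 2 (alternative): require every member to acknowledge the write,
      // e.g. { w: 3 } on a three-member set; after that, a read from a
      // secondary will also see the document.
    });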
[17:21:55] <boutell> OK. So if I use the “majority” write concern, and the default read behavior, then what I’m getting is in practice not unlike an old school mysql failover setup. (Which is a useful case, to be sure.)
[17:23:54] <GothAlice> But with various combinations of read preference and write concern you can produce a very wide range of behaviours.
[17:24:17] <GothAlice> (Datacenter-awareness is an _awesome_ feature.)
[17:25:07] <boutell> OK. So that’s a useful case. But if my use case is going beyond what a single node can deliver in terms of activity level, and I want to pretend nothing has changed in an application that stores things like sessions in mongodb, that’s unrealistic. To get any benefit there, I will have to start distinguishing between queries that can safely be a little stale, and queries that can’t. For those that can’t, I have to
[17:25:08] <boutell> use the correct readPreference.
[17:25:23] <boutell> or I can set the default the other way round, so that I specifically opt out of reading from the primary when I know it’s reasonable to do so.
[17:25:42] <GothAlice> Yes. To scale in the way you describe, you want sharding.
[17:26:37] <GothAlice> When you have way too much data to fit safely in RAM on one host you can split the data amongst several hosts using sharding. Where replication is mirroring RAID, sharding is striped RAID. (Combine them, and you have RAID 10.)
[17:26:47] <GothAlice> (Redundancy, and improved performance.)
[17:27:08] <boutell> sharding makes sense when I can identify a “pivot” to break up the data, right? If I’m implementing an email service, then sharding is awesomely easy
[17:28:20] <GothAlice> Like with replication, there are a variety of techniques usable for sharding, based on how you formulate your "sharding index".
[17:28:52] <GothAlice> For example, you could group frequently-queried-together documents onto the same shard, eliminating the need to perform multi-shard merges for some queries.
[17:28:59] <GothAlice> (I.e. the user's session data with that user's account data.)
[17:30:07] <GothAlice> Like with my job site example, I could physically locate the data on servers closer to the app; you can use shards to segregate (rather than replicate) data geographically.
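A minimal sketch of that kind of shard key, assuming the mongo shell; the database, collection and key names are illustrative:

    sh.enableSharding("app");
    // An index on the shard key is required before sharding a non-empty collection.
    db.getSiblingDB("app").sessions.createIndex({ userId: 1 });
    // Sharding sessions on userId keeps all of a user's session documents in the
    // same chunk, so queries that include userId are routed to a single shard
    // instead of being scattered to every shard and merged.
    sh.shardCollection("app.sessions", { userId: 1 });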
[17:37:23] <boutell> it’s interesting that there is no “maximum” write concern. It seems like that’s achievable. It would stink on ice if you had a large replica set, but with a relatively small one there’s a use case for “slow writes, fast reliable reads from any node”.
[17:40:30] <GothAlice> I assign all queries scores on a few things: what's the risk to the app/business if this command fails and can it silently fail? What's the risk if someone sees data that is old? The former gives me the write concern, the latter the read preference.
[17:43:22] <GothAlice> A user-facing career site can be safely a little bit delayed in noticing new or modified job data. User session data? Less so.
[17:47:23] <benjick> Hi I added a user in MMS but it doesn't allow me to choose an instance, is the user added to all instances?
[17:48:24] <GothAlice> benjick: All instances within the same group, I believe, yes.
[17:48:39] <benjick> Ah, I forgot about the group I added
[18:16:47] <jbea> is there a way to order the *keys* in query results?
[18:17:05] <jbea> not the documents themselves, the keys
[18:18:02] <jbea> { field1: 0, field2: 0, field3: 0 } is sometimes { field2: 0, field3: 0, field1: 0 }, i want it to be consistent.
[18:20:20] <deathanchor> jbea: in the shell or via a driver or script?
[18:20:21] <GothAlice> jbea: Not all languages provide ordered mappings. Python standard "dict" objects are hash-order, not "abstract key order" (since keys can also be basically any hashable type, including integers.)
[18:20:36] <GothAlice> jbea: BSON, however, does preserve order. What client driver are you using?
[20:49:25] <ChALkeR> Ah, i missed that part: > This pattern is applicable to all mongod instances running as standalone instances or as part of a replica set.
[21:01:43] <saml> how can i designate certain node as primary? if it goes down, something else can be master.. but then when it comes back up, i want it to be master
[21:01:53] <saml> cause.. legacy clients only trying to connect to certain node
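One common way to do this is with replica set member priorities; a minimal sketch, assuming the mongo shell and that the preferred node is members[0] of a three-member set:

    cfg = rs.conf();
    cfg.members[0].priority = 2;   // preferred primary
    cfg.members[1].priority = 1;
    cfg.members[2].priority = 1;
    rs.reconfig(cfg);
    // If the priority-2 member goes down, another member is elected; once it
    // comes back and catches up, it calls an election and regains the primary role.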
[21:20:34] <shlant1> hi all. I am having the most frustrating time trying to figure out why my new replica set is connecting to my old replica set…. I have NO idea how it is connecting to my old replica set master…. any suggestions?
[21:21:33] <shlant1> this is the start script: https://github.com/MrMMorris/dockers/blob/master/mongodb/start_instance.sh
[21:21:59] <shlant1> I can 100% confirm that MONGO_MASTER is pointing to my NEW instance
[23:50:22] <StephenLynx> GothAlice when using gridfs, would caching a file in RAM provide a considerable benefit? You mentioned you use mongo as a cache, so I believe that gridFS would already store files in RAM if they are being accessed regularly?
[23:51:35] <GothAlice> In my home array I have much more data than fits in RAM, in GridFS. Frequently accessed pages are prioritized for longer preservation than less frequently accessed pages, so the frequently accessed chunks (plus the indexes) tend to stay resident.