#mongodb logs for Friday the 8th of May, 2015

[00:00:20] <Glitches> ah, make the _id null, got it
[10:58:10] <cxz> anyone use ming?
[10:58:22] <cxz> i can't get migrations to work: ming.schema.Invalid: version:Missing field
[12:05:01] <fleetfox> there is no way to update _id?
[12:07:19] <fleetfox> no type constraints and immutability is such a nice combination
[12:09:53] <fleetfox> and it does type coercion for indices WTF
[12:13:40] <StephenLynx> for one, I don't touch the _id field for anything.
[12:13:58] <StephenLynx> I declare a different unique index and use it.
[12:14:39] <StephenLynx> not only because it has a bunch of special rules, but the name _id is very counter-intuitive if you are storing your own value instead of the generated one.
[12:16:55] <fleetfox> That's a stupid argument, what if i have an existing external sequence? Surrogate keys are stupid
[12:21:01] <cheeser> but, no, you can change an _id field
[12:21:08] <fleetfox> how?
[12:21:33] <cheeser> er, *can't*
[12:21:38] <cheeser> sorry. just sat down. :)
[12:22:43] <fleetfox> does this thing have transactions now? Can i delete and reinsert in atomic way?
[12:23:11] <cheeser> no
[12:23:25] <cheeser> transactions aren't coming in the near future
[12:23:52] <StephenLynx> what do you mean by external sequence?
[12:27:13] <fleetfox> There are scenarios where an external service generates identifiers.
[12:27:18] <StephenLynx> ok
[12:27:21] <StephenLynx> so
[12:27:34] <StephenLynx> whats wrong with storing them on a field that is not _id?
[12:27:50] <fleetfox> i could do that but then i have a surrogate id?
[12:27:58] <StephenLynx> that is what I do.
[12:28:23] <fleetfox> it's antipattern, but who cares eh
[12:28:50] <StephenLynx> afaik you could insert declaring the _id.
[12:29:11] <StephenLynx> 80% sure.
[12:30:39] <cheeser> you can put that sequence value in _id, sure.
[12:30:50] <cheeser> you just have to do it before you save your document
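A minimal mongo shell sketch of both points above (the orders collection and values are hypothetical): you can supply your own _id at insert time, but "changing" it afterwards means removing and re-inserting the document, and those two steps are not atomic.

    // Supplying an external identifier as _id at insert time:
    db.orders.insert({ _id: "EXT-000123", status: "new" })

    // "Changing" an _id afterwards: remove and re-insert (not atomic):
    var doc = db.orders.findOne({ _id: "EXT-000123" })
    db.orders.remove({ _id: "EXT-000123" })
    doc._id = "EXT-000124"
    db.orders.insert(doc)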
[12:31:04] <fleetfox> i am doing that, the issue is i have historically broken data, and now i'm fucked
[12:31:58] <StephenLynx> hm
[12:32:29] <StephenLynx> or
[12:32:38] <StephenLynx> you could just ignore the _id field.
[12:32:43] <StephenLynx> just sayin'
[12:33:09] <StephenLynx> mongo does many things well, the _id field is not one of these, IMO.
[12:34:26] <StephenLynx> from the very moment I realized it had special rules for projection I just said "yeah, nah, I am not dealing with your bullshit"
[12:34:59] <fleetfox> yes, i'll just tell my 0.5mil loc codebase to ignore the id..
[12:35:31] <StephenLynx> :v
[12:36:03] <StephenLynx> this is what happens when you don't read the manual before coding.
[12:36:21] <fleetfox> this is when you inherit technology based on webscale
[12:36:33] <StephenLynx> muh webscale
[12:36:51] <StephenLynx> yeah, I can imagine mongo saw a lot of misguided adoption based on hype.
[12:37:03] <StephenLynx> I feel for you;
[12:41:17] <cheeser> whining--
[12:41:17] <cheeser> :)
[12:42:21] <StephenLynx> oh, cmon, no one likes to adopt crappy code. and his case seems to be an inexperienced programmer adopting hyped tech and using it wrong.
[12:42:32] <StephenLynx> and he inherits the mess.
[12:43:13] <cheeser> sure. it sucks. he just lost me with his webscale comment.
[12:43:38] <StephenLynx> well, most people who bought the hype used that.
[12:43:57] <cheeser> "webscale" or not is irrelevant to his actual problem
[12:44:07] <StephenLynx> yeah, I know.
[13:41:37] <benjick> Hello. I bring stupid friday questions. Just got started with Mongo MMS and I want to create a new database and user to use in my application. I have this now; http://i.imgur.com/M2L6e6m.png is this user isolated to only access the database "testdb"?
[13:42:39] <StephenLynx> yes. afaik in mongo you need to whitelist users to databases.
[13:43:00] <benjick> Thank you. That's good
[13:43:08] <StephenLynx> so unless it implies it can work on any db or specifies the other dbs, it will not work on anything else.
[13:43:26] <benjick> I see
[13:43:41] <benjick> I just want to create a database and plop that info into my app and not really think about it too much
[13:44:12] <StephenLynx> yeah, check the privileges to make sure, but I am pretty sure you don't have to worry with what you got now.
[13:49:12] <benjick> StephenLynx: Thank you
[13:52:36] <DragonPunch> how do you get by last date
[13:52:39] <DragonPunch> or most recent date
[13:52:58] <StephenLynx> sort?
[13:54:53] <DragonPunch> oh yeah i forgot
[13:54:59] <cheeser> magic! :D
[13:55:45] <StephenLynx> I also suggest using aggregate, unless you are dealing with too many results.
[13:55:53] <StephenLynx> otherwise you have to use separate operations for that.
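For reference, the plain sort approach in the mongo shell might look like this (collection and field names are placeholders):

    // Newest document first, keep only one:
    db.messages.find().sort({ date: -1 }).limit(1)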
[13:56:39] <seion> Suggestions for open source/free mongodb management tools?
[13:57:04] <derfdref2> Possible to create a user with a specific (older) auth mechanism on 3.0?
[13:57:10] <derfdref2> Or change an existing user?
[13:57:28] <DragonPunch> StephenLynx: i am using aggregate check it out
[13:57:43] <DragonPunch> {$match: {"People": uid}}, {$unwind: "$Chats"}, {$sort: {"Chats.Date": -1}, $limit: 1}
[13:57:47] <DragonPunch> does that limit sound right?
[13:57:50] <DragonPunch> or is it supposed to be outside
[13:57:56] <derfdref2> I've tried updateUser but though it allows specifying a different mechanism and doesn't throw an error, the user object in the system users collection doesn't change
[13:58:15] <StephenLynx> what runtime environment are you using?
[13:58:19] <StephenLynx> with io.js that would have to be an array
[13:58:35] <StephenLynx> and you would have to put the operators inside their own objects
[14:01:28] <DragonPunch> StephenLynx: how would i use the limit in aggregation
[14:02:08] <StephenLynx> [stuff,{$limit:x}]
[14:02:11] <StephenLynx> in io.js
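With the io.js/Node.js driver, the pipeline above would be an array in which $limit is its own stage (field names are taken from the snippet; the collection handle is assumed):

    collection.aggregate([
      { $match: { People: uid } },
      { $unwind: "$Chats" },
      { $sort: { "Chats.Date": -1 } },
      { $limit: 1 }
    ], function (err, results) {
      // results is an array of documents
    });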
[14:02:46] <DragonPunch> StephenLynx: i am filtering a subdoc i want only 1 in return for each docs subdoc
[14:03:03] <StephenLynx> then you need to use a group
[14:03:24] <DragonPunch> yeah but if i $group and push
[14:03:28] <DragonPunch> it will push ALL of the subdocs
[14:03:30] <StephenLynx> no, you don't push
[14:03:36] <StephenLynx> IMO there's an alternative
[14:03:45] <StephenLynx> you could unwind and just set it.
[14:03:55] <StephenLynx> unwind, sort, group so you get only the last one
[14:04:04] <DragonPunch> what command is it?
[14:04:20] <StephenLynx> read the docs on aggregation operators. brb lunch
[14:04:25] <DragonPunch> ok
[14:04:27] <DragonPunch> um
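A sketch of the unwind/sort/group approach StephenLynx describes, keeping only the most recent Chats entry per matched document ($first picks the first entry after the sort; field names are from the snippet above):

    collection.aggregate([
      { $match: { People: uid } },
      { $unwind: "$Chats" },
      { $sort: { "Chats.Date": -1 } },
      { $group: { _id: "$_id", latestChat: { $first: "$Chats" } } }
    ], function (err, results) {
      // one document per original _id, each carrying its latest chat
    });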
[15:22:22] <paperziggurat> what does mongo return if a document is not found? null or undefined?
[15:22:51] <Derick> it returns an empty result set
[15:27:30] <paperziggurat> derick, how would i, in javascript, check to see if an empty set is returned? create an empty set json object and then compare the results to that?
[15:35:35] <Derick> how do you do the query? do you use an ODM?
[15:37:05] <paperziggurat> meteor, javascript
[15:37:25] <paperziggurat> i realized i can use the count function and just check to see if 1 document exists
[15:37:45] <StephenLynx> or check if the array has a length.
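As a sketch with the plain Node.js driver (Meteor's API differs slightly): find() yields an array you can test by length, and findOne() yields null when nothing matches.

    collection.find({ user: "nobody" }).toArray(function (err, docs) {
      if (!docs.length) {
        // nothing found
      }
    });

    collection.findOne({ user: "nobody" }, function (err, doc) {
      if (doc === null) {
        // nothing found
      }
    });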
[15:58:52] <jr3> is caching a moongoose model possible with something like redis
[15:59:25] <StephenLynx> I am not sure, but I am pretty sure is not a very practical idea.
[16:02:00] <GothAlice> jr3: I use MongoDB as a cache. If your queries are slow enough to warrant caching, they may also warrant refactoring (either the code, or denormalizing the data model) to better accommodate the queries you are performing.
[16:02:15] <GothAlice> To assist with those, though, we'd need to know more about your specific performance issue.
[16:33:27] <cheeser> https://plus.google.com/u/0/+JustinLee1/posts/GYypwmzS75j
[16:33:36] <cheeser> morphia 1.0.0-rc0 for you java users
[16:55:32] <pamp> Hi, why do I have 2803052 page faults, when I have 60 GB of RAM and my entire working set is only about 15 GB
[16:55:34] <pamp> ?
[16:56:27] <cheeser> that's a lot of faults
[16:59:28] <GothAlice> I'd expect 3.75 million 4KiB page faults to load a 15GB dataset on cold start, or when initially importing the dataset.
[16:59:38] <GothAlice> So you're actually a bit ahead of that curve for that dataset size. :)
[16:59:46] <GothAlice> The number shouldn't increase much (or at all) after that point, though.
[17:00:22] <GothAlice> Well, that or your full dataset isn't yet in RAM. So, start worrying if you exceed 4 million page faults or so.
[17:00:48] <GothAlice> Page faults / second are the metric.
[17:01:14] <GothAlice> pamp/cheeser: ^
[17:08:33] <pamp> GothAlice, Yes, it's fresh data, for now I only created the indexes and made some queries..
[17:11:48] <GothAlice> Yeah; for performance monitoring, page faults per second are what you should be looking at. Memory (properly configured without THP enabled) is divided up into 4KiB pages, and when data is first loaded from disk, every single page from the files on disk that are being read or written to will "page in" once, data being loaded from the disk as part of the page fault.
[17:12:11] <GothAlice> With 15GB of data, that's a lot of pages, eh?
[17:13:10] <boutell> Hi. I am scaling a mongo application in which I need “read after write” consistency (if I’ve written it, another request should immediately be able to read it). On a single node this is no problem, but in a cluster it’s a little mysterious how it’s supposed to work. I’ve heard it suggested that the “majority” write concern will achieve this, but I don’t see what good that does if not all of the nodes have
[17:13:11] <boutell> the information. But I’ve also read that by default, “read” requests go to the primary, which would seem to defeat the entire purpose of having a cluster. What’s the right way to achieve this? Thanks.
[17:13:18] <saml> 15000000/4
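(For the arithmetic: assuming 4 KB pages, 15 GB ≈ 15,000,000 KB, and 15,000,000 / 4 = 3,750,000 pages, hence the roughly 3.75 million faults GothAlice expects for loading the full dataset.)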
[17:15:22] <saml> let's say you have 1000 concurrent clients. and 1000 mongod. so basically each client gets own mongod to read from. and you expect a write to propagate at once?
[17:15:50] <saml> so if client1 read updated doc, you'd expect client100 also has same doc
[17:16:33] <saml> i doubt mongodb ensures such. isn't mongodb eventual consistency
[17:16:38] <GothAlice> saml: Why in the name of hojek would you have one mongod per client? There is no need for that.
[17:16:55] <GothAlice> saml: And even if you did have a replica set of that size, all clients would direct their writes at the single primary.
[17:17:20] <saml> and primary acknowledges after all slaves replicate the change?
[17:17:57] <GothAlice> saml: Thus, each client, when issuing a query that must be consistent, would direct that must-be-safe query at the primary, which is always consistent. (Atomic operations FTW.) This is done using "read preference".
[17:18:14] <saml> oh i see
[17:18:35] <saml> so that's what boutell is asking. what's purpose of replica in that case other than backup?
[17:18:59] <GothAlice> saml: As for writers, "write concern" gives the server the idea of what level of consistency you want in your cluster prior to the operation you are attempting (insert, update, etc.) returning. I.e. "wait until the majority of replicas acknowledge they have this update saved to disk".
[17:19:36] <GothAlice> You can direct queries you don't mind a minor historical view of at the secondaries.
[17:19:49] <boutell> OK. So “majority write concern” is intended as a certain level of guarantee that the data won’t die. It’s not intended to provide read-after-write consistency at all.
[17:19:58] <GothAlice> For example, someone types up a job description in our main application at work and the "career site" that only reads from secondaries eventually gets the new job details.
[17:20:28] <GothAlice> The secondary that "career site" is reading from is in the same datacenter as the application server for that site, but _not_ the same datacenter as the main application. Data locality.
[17:20:36] <GothAlice> (One use of replicas.)
[17:20:39] <GothAlice> (That isn't backup.)
[17:20:42] <saml> boutell, what kind of app are you writing that needs read-after-write consistency?
[17:21:14] <GothAlice> boutell: No, it's not, but if you read from the primary after writing (which must always go to the primary), you have read-after-write consistency.
[17:21:35] <GothAlice> You _also_ have read-after-write consistency if you mandate all replicas confirm they have the data before the insert/update returns.
[17:21:48] <GothAlice> (In the latter case, you can safely read from a secondary and know you will have the data just written.)
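A mongo shell sketch of the two knobs being discussed (collection names and values are hypothetical): the write concern controls how many members must acknowledge a write, and the read preference controls which member answers a query.

    // Wait for a majority of replica set members to acknowledge the write:
    db.sessions.insert(
      { _id: "abc123", user: "boutell" },
      { writeConcern: { w: "majority" } }
    )

    // Read-after-write: direct the must-be-consistent read at the primary:
    db.sessions.find({ _id: "abc123" }).readPref("primary")

    // Reads that may be slightly stale can go to a secondary:
    db.sessions.find({ user: "boutell" }).readPref("secondaryPreferred")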
[17:21:55] <boutell> OK. So if I use the “majority” write concern, and the default read behavior, then what I’m getting is in practice not unlike an old school mysql failover setup. (Which is a useful case, to be sure.)
[17:23:36] <GothAlice> Correct.
[17:23:54] <GothAlice> But with various combinations of read preference and write concern you can produce a very wide range of behaviours.
[17:24:17] <GothAlice> (Datacenter-awareness is an _awesome_ feature.)
[17:25:07] <boutell> OK. So that’s a useful case. But if my use case is going beyond what a single node can deliver in terms of activity level, and I want to pretend nothing has changed in an application that stores things like sessions in mongodb, that’s unrealistic. To get any benefit there, I will have to start distinguishing between queries that can safely be a little stale, and queries that can’t. For those that can’t, I have to
[17:25:08] <boutell> use the correct readPreference.
[17:25:23] <boutell> or I can set the default the other way round, so that I specifically opt out of reading from the primary when I know it’s reasonable to do so.
[17:25:42] <GothAlice> Yes. To scale in the way you describe, you want sharding.
[17:26:37] <GothAlice> When you have way too much data to fit safely in RAM on one host you can split the data amongst several hosts using sharding. Where replication is mirroring RAID, sharding is striped RAID. (Combine them, and you have RAID 10.)
[17:26:47] <GothAlice> (Redundancy, and improved performance.)
[17:27:08] <boutell> sharding makes sense when I can identify a “pivot” to break up the data, right? If I’m implementing an email service, then sharding is awesomely easy
[17:28:20] <GothAlice> Like with replication, there are a variety of techniques usable for sharding, based on how you formulate your "sharding index".
[17:28:52] <GothAlice> For example, you could group frequently-queried-together documents onto the same shard, eliminating the need to perform multi-shard merges for some queries.
[17:28:59] <GothAlice> (I.e. the user's session data with that user's account data.)
[17:30:07] <GothAlice> Like with my job site example, I could physically locate the data on servers closer to the app; you can use shards to segregate (rather than replicate) data geographically.
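A sketch of what enabling that looks like in the mongo shell (database, collection, and shard key are hypothetical); picking something like userId as the shard key keeps a user's documents together on one shard:

    sh.enableSharding("app")
    sh.shardCollection("app.sessions", { userId: 1 })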
[17:32:24] <boutell> thanks GothAlice.
[17:32:31] <GothAlice> No worries.
[17:37:23] <boutell> it’s interesting that there is no “maximum” write concern. It seems like that’s achievable. It would stink on ice if you had a large replica set, but with a relatively small one there’s a use case for “slow writes, fast reliable reads from any node”.
[17:40:30] <GothAlice> I assign all queries scores on a few things: what's the risk to the app/business if this command fails and can it silently fail? What's the risk if someone sees data that is old? The former gives me the write concern, the latter the read preference.
[17:43:22] <GothAlice> A user-facing career site can be safely a little bit delayed in noticing new or modified job data. User session data? Less so.
[17:47:23] <benjick> Hi I added a user in MMS but it doesn't allow me to choose an instance, is the user added to all instances?
[17:48:24] <GothAlice> benjick: All instances within the same group, I believe, yes.
[17:48:39] <benjick> Ah, I forgot about the group I added
[17:48:42] <benjick> Makes sense, thanks
[17:48:43] <GothAlice> :)
[18:15:52] <jbea> hey
[18:16:47] <jbea> is there a way to order the *keys* in query results?
[18:17:05] <jbea> not the documents themselves, the keys
[18:18:02] <jbea> { field1: 0, field2: 0, field3: 0 } is sometimes { field2: 0, field3: 0, field1: 0 }, i want it to be consistent.
[18:20:20] <deathanchor> jbea: in the shell or via a driver or script?
[18:20:21] <GothAlice> jbea: Not all languages provide ordered mappings. Python standard "dict" objects are hash-order, not "abstract key order" (since keys can also be basically any hashable type, including integers.)
[18:20:36] <GothAlice> jbea: BSON, however, does preserve order. What client driver are you using?
[18:32:45] <jbea> deathanchor: shell
[18:32:56] <jbea> using Robomongo
[18:45:25] <deathanchor> yeah I don't think printjson has any options for sorting keys. python does though.
[18:46:36] <GothAlice> Python's JSON encoder can sort alphabetically; Python also offers an OrderedDict to preserve the original BSON order.
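If staying in the shell, one workaround (a sketch; the collection name is hypothetical) is to rebuild the document with its top-level keys in sorted order before printing:

    var doc = db.things.findOne()
    var sorted = {}
    Object.keys(doc).sort().forEach(function (key) {
      sorted[key] = doc[key]
    })
    printjson(sorted)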
[18:48:10] <jbea> ok thanks
[19:34:49] <T-Sourcemaker> is it possible in mongodb to have multiple unique keys as in Oracle or MySQL?
[19:35:12] <StephenLynx> yes.
[19:35:17] <T-Sourcemaker> or should I create a checksum for them and store them in the id field
[19:35:42] <StephenLynx> you can have multiple unique indexes completely independent from each other.
[19:36:56] <T-Sourcemaker> but using a checksum is also possible or is this a bad idea?
[19:37:29] <StephenLynx> is just unnecessary
[19:37:40] <StephenLynx> you can also make compound unique indexes afaik
[19:38:02] <StephenLynx> so you can make only the combination unique.
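Sketches of both in the mongo shell (collection and field names are hypothetical):

    // Two independent unique indexes:
    db.users.createIndex({ email: 1 }, { unique: true })
    db.users.createIndex({ username: 1 }, { unique: true })

    // Compound unique index: only the combination must be unique:
    db.pages.createIndex({ accountId: 1, slug: 1 }, { unique: true })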
[19:39:14] <T-Sourcemaker> StephenLynx: sounds great... is there a doc about this?
[19:39:45] <StephenLynx> http://docs.mongodb.org/manual/core/indexes/
[19:42:18] <T-Sourcemaker> thank you
[20:02:39] <Progz> Hello, I am looking for a good tutorial to install a sharded mongo cluster
[20:16:16] <autrilla> Where could I read an introduction to how noSQL schemaless databases work?
[20:17:17] <StephenLynx> is more about how it doesn't work.
[20:17:22] <StephenLynx> it doesn't have a schema.
[20:17:24] <StephenLynx> simple.
[20:17:35] <morenoh149> autrilla: youtube nosql
[20:17:40] <autrilla> It still has to store data so it's easily searchable.
[20:18:01] <StephenLynx> yeah, but each one does it differently.
[20:18:15] <autrilla> Maybe that is my question!
[20:18:16] <StephenLynx> a document database and a graph database are different, but both are no sql
[20:18:28] <StephenLynx> so you need to learn about the database at hand.
[20:18:31] <morenoh149> Progz: http://docs.mongodb.org/manual/core/sharding/
[20:18:45] <StephenLynx> nosql is meaningless.
[20:18:53] <Progz> morenoh149: thanks
[20:18:53] <autrilla> ok, mongodb
[20:19:11] <Illusioneer> Is there a way to find a key in a document if you don't know its position?
[20:21:11] <ChALkeR> Is there still no ETA for debian stable packages?
[20:21:54] <ChALkeR> I installed oldstable packages, they seem fine. Are there any known problems?
[20:22:15] <ChALkeR> I mean the 3.0 mongodb-org packages.
[20:22:30] <StephenLynx> Illusioneer what do you mean by that?
[20:23:34] <Illusioneer> StephenLynx: If I wanted to find {"user":"Bob"} somewhere inside the document but I didn't know where the key "user" was
[20:24:35] <ChALkeR> Illusioneer: Are you using some sort of a gui and can't visually locate the property in a big object?
[20:24:54] <StephenLynx> you just ask for the value of "user"
[20:24:58] <StephenLynx> document.user
[20:25:29] <StephenLynx> unless you used an array outside of it, which is quite wrong.
[20:25:35] <Illusioneer> ChALkeR: that's one reason
[20:26:16] <deathanchor> wait... what's a gui?
[20:26:18] <ChALkeR> Btw, does replication use port 27017 (by default)?
[20:26:45] <deathanchor> ChALkeR: it uses whatever port you specify on the member host value
[20:26:57] <ChALkeR> deathanchor: > by default
[20:27:51] <ChALkeR> I mean does s2s communication go through the usual port?
[20:28:42] <ChALkeR> For example, shards use port 28018 and config servers use port 28019
[20:28:47] <deathanchor> ChALkeR: yes def 27017 http://docs.mongodb.org/manual/reference/method/rs.add/
[20:29:18] <ChALkeR> deathanchor: thanks
[20:29:36] <deathanchor> but if you are using rs.reconfig() you need to have the port defined.
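For example, in the mongo shell (hostnames are hypothetical):

    rs.add("mongo2.example.net")        // no port given, defaults to 27017
    rs.add("mongo2.example.net:27017")  // equivalent, with the port spelled out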
[20:40:20] <ChALkeR> deathanchor: I was asking to know what port i should open on the firewall.
[20:40:34] <ChALkeR> I am not going to reconfigure default ports.
[20:40:44] <deathanchor> ah
[20:41:26] <ChALkeR> I asked in case if there is server-to-server communication between replicas on some other port than the default one.
[20:44:28] <deathanchor> http://docs.mongodb.org/manual/tutorial/configure-linux-iptables-firewall/
[20:45:01] <deathanchor> The internets!
[20:48:35] <ChALkeR> deathanchor: Saw that one.
[20:49:25] <ChALkeR> Ah, i missed that part: > This pattern is applicable to all mongod instances running as standalone instances or as part of a replica set.
[20:49:27] <ChALkeR> =)
[20:49:28] <ChALkeR> Sorry.
[20:51:42] <deathanchor> yeap and more rules to consider if you shard
[20:57:55] <ChALkeR> Not yet.
[21:01:43] <saml> how can i designate certain node as primary? if it goes down, something else can be master.. but then when it comes back up, i want it to be master
[21:01:53] <saml> cause.. legacy clients only trying to connect to certain node
[21:01:58] <saml> and sanity
[21:07:32] <saml> ideally, host names will be mongo-master.rs01.webscale.info mongo-slave01.rs01.webscale.info mongo-slave02.rs01.webscale.info ... etc
[21:07:47] <saml> so, by looking at hostname, i know what it is about
[21:09:18] <saml> and if mongo-master.rs01 goes down for power outage or something... dns gets reconfigured
[21:09:21] <saml> whoa
[21:09:29] <saml> someone must have done something like this
[21:10:40] <saml> or i'm trying to solve non existing problem
[21:14:23] <redondos> saml: http://docs.mongodb.org/manual/tutorial/adjust-replica-set-member-priority/
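The linked tutorial boils down to raising that member's priority so it reclaims PRIMARY whenever it is healthy; a shell sketch (which member index to bump is an assumption):

    cfg = rs.conf()
    cfg.members[0].priority = 2   // the member you want to stay primary
    rs.reconfig(cfg)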
[21:20:34] <shlant1> hi all. I am having the most frustrating time trying to figure out why my new replica set is connecting to my old replica set…. I have NO idea how it is connecting to my old replica set master…. any suggestions?
[21:21:33] <shlant1> this is the start script: https://github.com/MrMMorris/dockers/blob/master/mongodb/start_instance.sh
[21:21:59] <shlant1> I can 100% confirm that MONGO_MASTER is pointing to my NEW instance
[21:27:02] <shlant1> logs: https://gist.github.com/MrMMorris/0ed201df5cf3407e08b8
[21:39:19] <saml> redondos, i want dynamic dns where mongo-master.rs01 is bound to master node of rs01 always
[21:39:32] <saml> maybe AWS provides dynamic dns
[23:50:22] <StephenLynx> GothAlice when using gridfs, would caching a file on RAM provide a considerable benefit? You mentioned you use mongo as a cache, so I believe that gridFS would already store files on RAM if they are being accessed regularly?
[23:50:42] <GothAlice> You are correct.
[23:50:46] <StephenLynx> got it.
[23:51:35] <GothAlice> In my home array I have much more data than fits in RAM, in GridFS. Frequently accessed pages are prioritized for longer preservation than less frequently accessed pages, thus frequently accessed chunks + the indexes.
[23:52:14] <StephenLynx> hm.
[23:52:22] <StephenLynx> it seems like it will work wonders for this site.
[23:52:39] <StephenLynx> it is an imageboard that will have to support over 46 requests per second.
[23:53:02] <StephenLynx> it is currently made in PHP and builds its pages once content is changed, to decrease load.
[23:53:20] <StephenLynx> I am going to use io.js and mongo and will store these generated pages on gridFS.
[23:53:29] <StephenLynx> as well as the images.
[23:53:40] <StephenLynx> and will stream them instead of loading on RAM.
[23:53:48] <StephenLynx> before writing to the http response.
[23:54:09] <StephenLynx> from what I understood, the bottleneck is being disk usage.
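A rough io.js sketch of that streaming idea, assuming a MongoDB driver version that provides GridFSBucket (the db handle, filename, and HTTP response object are assumptions):

    var mongodb = require('mongodb');

    function serveFile(db, filename, res) {
      var bucket = new mongodb.GridFSBucket(db);
      bucket.openDownloadStreamByName(filename)
        .on('error', function () {
          res.statusCode = 404;
          res.end();
        })
        .pipe(res);   // stream GridFS chunks straight into the HTTP response
    }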
[23:54:20] <GothAlice> Just be sure to monitor page faults per second.
[23:57:30] <StephenLynx> if I eliminate those warnings I should be alright?
[23:57:39] <StephenLynx> or they happen in overloads?