#mongodb logs for Friday the 20th of May, 2016

[00:33:43] <GothAlice> "when a change is made, i rebuild the entire collection for that account" (which may be 10,000 to 40,000 records) — oh my gods, someone I had attempted to assist from elsewhere is unable to grasp the badness. :'(
[00:49:20] <zsoc> If I want update to update/upsert the entire object... do I just give the entire obj as the update parameter?
[00:53:13] <GothAlice> zsoc: If you want to replace an object with a different version, then yes in the update case. The upsert case is complicated by virtue of the fact that you provide it update operations ($set and friends) as well as a "default document" to use if none can be found to update.
[00:53:54] <GothAlice> And by object, I only assume we're referring to "document".
[00:54:08] <zsoc> A json compatible obj that will end up being a document on insertion, sure lol
[00:54:33] <GothAlice> Yeah, MongoDB isn't a JSON store. (Somewhat important to keep in mind. ;)
[00:54:47] <zsoc> It's a rather simplified use case: either the condition/query matches (just checking the _id) or it doesn't. If it exists, replace whatever is there with whatever I've got, keeping the _id the same; if it doesn't, upsert the new one.
[00:54:56] <GothAlice> To be JSON-compatible requires encoding. See: https://docs.mongodb.com/manual/reference/mongodb-extended-json/
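To illustrate the point about JSON compatibility: BSON documents carry types (ObjectId, datetime) that plain JSON cannot express, so round-tripping needs an encoding such as MongoDB Extended JSON. A small sketch using bson.json_util, which ships with PyMongo; the document itself is made up:

```python
from datetime import datetime
from bson import ObjectId, json_util

# A document with BSON types plain JSON has no representation for.
doc = {"_id": ObjectId(), "when": datetime.utcnow()}

# Extended JSON encoding makes it JSON-safe.
print(json_util.dumps(doc))
# e.g. {"_id": {"$oid": "..."}, "when": {"$date": 1463702400000}}
```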
[00:55:11] <zsoc> wait, so to use .update() I need to be giving it an actual document?
[00:55:17] <GothAlice> No, but you can.
[00:55:23] <zsoc> that... no that doesn't seem right. Otherwise why would upsert exist lol.
[00:55:34] <GothAlice> Ref: https://docs.mongodb.com/manual/tutorial/modify-documents/#replace-the-document
[00:55:58] <GothAlice> Upsert = update if exists, otherwise create. Update = modify existing document in-place, or replace the entire document.
[00:56:02] <zsoc> Got it
[00:56:08] <zsoc> thank you
[00:56:11] <GothAlice> :)
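A minimal PyMongo sketch of the distinction GothAlice draws above; the "things" collection and its fields are hypothetical:

```python
from pymongo import MongoClient

things = MongoClient().test.things

# Update: modify the existing document in place ($set and friends).
things.update_one({"_id": 1}, {"$set": {"status": "active"}})

# Replace: swap the whole document for a new one (the _id is preserved).
things.replace_one({"_id": 1}, {"status": "active", "count": 0})

# Upsert: update if a match exists, otherwise insert a document built
# from the query plus the update operations.
things.update_one({"_id": 2}, {"$set": {"status": "new"}}, upsert=True)
```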
[00:57:11] <zsoc> How does the document._doc magic work? Like _doc is a magic getter somehow for document fields?
[00:57:19] <GothAlice> What's _doc?
[00:57:33] <GothAlice> Additionally, what language and database access layer are you using?
[00:57:44] <zsoc> When I have a document, I can document._id, but that isn't where it actually is. It's in document._doc._id
[00:57:45] <GothAlice> The only field MongoDB adds automatically to documents is _id.
[00:57:47] <zsoc> Node
[00:57:51] <GothAlice> Mongoose?
[00:57:54] <zsoc> uh
[00:58:03] <zsoc> Well that depends, is _doc a mongoose thing then? Lol
[00:58:12] <zsoc> Otherwise no I won't admit to it xD
[00:58:14] <GothAlice> Yeah, you're using Mongoose. It's terrible. ;)
[00:58:33] <GothAlice> Also entirely unnecessary, and the cause of the majority of third-party issues people come here with. ;P
[00:58:45] <zsoc> eh, i mean it hasn't caused too much of a headache
[00:59:05] <GothAlice> For "schemas" try out https://docs.mongodb.com/manual/core/document-validation/ < this will enforce them regardless of what is connecting and how, for example, from the mongo shell.
[00:59:23] <GothAlice> Document validation is also infinitely more powerful than simple type schemas; you can do anything you can do with a .find() in one.
[00:59:38] <GothAlice> (For example, conditional field validation, bounds checking, etc., etc.)
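A sketch of document validation as described above, assuming MongoDB 3.2+: the validator is an ordinary .find()-style predicate, so type checks, bounds checks, and conditional rules all work. The "people" collection and its fields are hypothetical:

```python
from pymongo import MongoClient

db = MongoClient().test

# The validator is enforced server-side, regardless of what connects
# (driver, ODM, or the mongo shell).
db.create_collection("people", validator={
    "name": {"$type": "string"},      # simple type schema
    "age": {"$gte": 0, "$lte": 150},  # bounds checking
    "$or": [                          # conditional field validation
        {"role": "admin"},
        {"email": {"$exists": True}},
    ],
})
```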
[00:59:42] <zsoc> absolutely the only reason i use it is that i'd like to have modules exporting models in /models, then require them at the top of my routes and just have access via Model.<whatever>. I'd be completely fine without mongoose and would just do that natively if there's a reasonable way.
[01:00:24] <zsoc> I mean.. can I create like a Validation object? Or it doesn't work that way?
[01:00:30] <zsoc> I'll read the doc
[01:00:38] <GothAlice> In my own web framework the "db" object is exposed through a context object passed to each controller end-point. (Typically as part of the class constructor to avoid littering references absolutely _everywhere_.) This makes access uniform, and doesn't require "import dependencies" (include/require/import/whatever ;) littered throughout the code, or the use of auto-loaders.
[01:01:21] <zsoc> I feel like only importing the 'models' i require makes each route a little cleaner, but realistically it's only for readability, not optimization.
[01:01:23] <GothAlice> Thus in an endpoint I can: self._ctx.db.foo.find(…)
[01:01:50] <GothAlice> (I work in Python, you might be able to tell. ;)
[01:01:53] <zsoc> That feels very global
[01:01:56] <zsoc> yes i can tell hehe
[01:02:00] <GothAlice> Ah, but it's not "global".
[01:02:30] <GothAlice> The context is initially created at the "application" scope (and you can have multiple "applications" per process), then a subclass is instantiated for each request. These are request-local, not global.
[01:02:34] <GothAlice> Multi-tenancy FTW.
[01:02:39] <zsoc> ah
[01:02:52] <GothAlice> :)
[01:03:49] <zsoc> I mean I can just use middleware to stuff the db object into the routes... that feels a little forced tho if the route doesn't require it. I guess I can do it on a per-route basis, just a fetchDB(db) sort of thing
[01:04:00] <zsoc> it would get rid of all that nasty importing
[01:06:16] <GothAlice> Ah, yeah, the context attributes are generally lazily evaluated, but the DB is set up once on startup, so it can't really be lazily handled. (Unlike, say, user session data in the context, which is entirely lazily loaded on-demand.) However, it's a singular context object being passed around. A touch less dirty than trying to inject potentially multiple things for the application overall.
[01:06:54] <zsoc> that seems sane
[01:26:26] <GothAlice> zsoc: https://github.com/marrow/WebCore/blob/develop/example/hello.py ;) (Though the simplest hello world doesn't really need a class at all. https://github.com/marrow/WebCore/blob/develop/example/basic.py ;^)
[01:26:53] <zsoc> That is the most Python block of code I've ever laid eyes on
[01:27:05] <GothAlice> XD
[01:27:43] <GothAlice> from web.core import Application; Application("Hi.").serve('wsgiref') ← the simplest possible example.
[01:27:44] <zsoc> but fascinating, thanks
[01:27:47] <GothAlice> ;^P
[01:28:01] <GothAlice> Yeah, no worries; just wanted to show how the context is passed.
[01:28:39] <GothAlice> (As an argument to bare functions, or as part of class construction for classes.)
[01:30:39] <GothAlice> I'm struggling with an internal debate over handling of multiple database connections, though. :/ My initial idea was to have an attribute proxy be context.db, with an attribute per DB connection and one marked as the "default" that other unknown attributes get routed to. Thus context.db.foo is the "foo" collection on the default, a la context.db.default.foo, but you can still context.db.other.foo when needed.
[01:30:45] <GothAlice> The debate is about the potential for confusion, there.
[01:31:27] <GothAlice> (But various extensions pretty much require the concept of a default connection.)
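A rough sketch of the attribute-proxy idea being debated; the class and connection names here are hypothetical, not WebCore's actual API:

```python
class DatabaseProxy:
    def __init__(self, connections, default="default"):
        self._connections = connections  # name -> database handle
        self._default = default

    def __getattr__(self, name):
        # A known connection name wins; any other attribute is routed
        # to the default connection as a collection lookup.
        if name in self._connections:
            return self._connections[name]
        return getattr(self._connections[self._default], name)

# db = DatabaseProxy({"default": default_db, "other": other_db})
# db.foo        -> "foo" collection on the default connection
# db.other.foo  -> "foo" collection on the "other" connection
```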
[07:20:59] <nofxx> Nobody on #mongoid, so let me ask: by default, find(foo) will raise NotFound if foo isn't in the DB. To change that, there's 'raise_not_found = false'. So there should be a find!(), as per Ruby practice, to force errors back where I want them. Am I missing something?
[07:21:25] <nofxx> There's #find but there's no #find!
[07:51:39] <kurushiyama> GothAlice: Regarding Mongoose, have a look at SO. I have the feeling that the MongoDB tag and the Mongoose tag could be merged for all practical purposes.
[10:25:29] <Industrial> Hi.
[10:25:54] <Industrial> I have set up a GeoJSON database with lat/lon and Dutch postal codes, 7.5 million docs
[10:26:29] <Industrial> I'm using the query `.findOne({ loc: { $near: { $geometry: { type: 'Point', coordinates: [ +longitude, +latitude ] }, $maxDistance: 40, $minDistance: 0, } } })`
[10:26:54] <Industrial> the 40 here is arbitrary, and my current problem is that it doesn't return results for a lot of the lat/lon pairs that I am feeding it
[10:27:07] <Industrial> So my question is:
[10:27:30] <Industrial> Can I up the maxDistance, so that it returns more results, but have the actual distance of each found result in the output documents
[10:27:44] <Industrial> so that I can sort by distance, and only take the closest one
[10:38:22] <kurushiyama> Industrial: I think so. Lemme check
[10:40:11] <kurushiyama> Industrial: As far as I can see, it is sorted by default.
[10:40:50] <kurushiyama> Industrial: I'd guess that the order of $maxDistance and $minDistance defines the sort order.
[10:43:04] <kurushiyama> Industrial: So I'd try `.find({ loc: { $near: { $geometry: { type: 'Point', coordinates: [ +longitude, +latitude ] }, $minDistance:0, $maxDistance:40}}}).limit(1)`
[10:43:27] <Industrial> right
[10:43:36] <Industrial> Then it's the quality of my database that is the issue :-)
[10:43:54] <kurushiyama> Industrial: btw, you are aware of the fact that distances are measured in _meters_?
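For reference, $near results are always returned nearest-first, regardless of the order in which $minDistance and $maxDistance are written. To also get the computed distance into the output, as asked above, the aggregation $geoNear stage can supply it. A PyMongo sketch, assuming a hypothetical postcodes collection with a 2dsphere index on loc:

```python
from pymongo import MongoClient

db = MongoClient().geo
longitude, latitude = 4.89, 52.37  # example coordinates

# $geoNear must be the first pipeline stage; distances are in meters
# for GeoJSON points, matching the caveat above.
nearest = list(db.postcodes.aggregate([
    {"$geoNear": {
        "near": {"type": "Point", "coordinates": [longitude, latitude]},
        "distanceField": "dist",  # computed distance lands in this field
        "maxDistance": 500,
        "spherical": True,
    }},
    {"$limit": 1},                # keep only the closest match
]))
```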
[10:44:01] <Industrial> I've checked the Google API and it's EXPENSIVE. They are talking about 50 cents per 1000 requests, but I have 2 million on a Sunday :P
[10:44:52] <Industrial> I'm doing this data enrichment to get stuff synced into Elasticsearch eventually, so I can display it in Kibana and hopefully we can pull KPIs from it :-)
[10:45:25] <kurushiyama> Industrial: If your ROI for 1000 requests is not even 50 cents, I would be more concerned of my business model.
[10:46:37] <Industrial> kurushiyama: I have no idea about that :-) I'm only an external engineer/consultant that got pulled into the team
[10:46:40] <quellhorst> kurushiyama: not really a helpful response
[10:46:46] <Industrial> thats ok:D
[10:47:59] <kurushiyama> quellhorst: Really? We started talking about money. Personally, I'd say that 50 cents per 1k requests is extremely cheap: compared to running your own server/cluster plus dev costs, you could run millions of requests and still come out cheaper.
[10:49:03] <kurushiyama> Industrial: I got it right that you use MongoDB as a "cache" for results you get from Google, then?
[10:50:12] <quellhorst> kurushiyama: maybe i just joined mid stream
[10:50:54] <kurushiyama> quellhorst: I could argue that it was not a useful comment, then ;P
[10:51:01] <quellhorst> lol
[10:51:18] <quellhorst> but depending on the request $.50 could be very high
[10:55:43] <Industrial> kurushiyama: nah I'm not using google right now, I'm using my own postal code implementation with mongodb :-)
[10:56:13] <Industrial> So it just costs one docker container :D
[10:56:39] <Industrial> We've decided to go for a more detailed data set instead, and use the .find().limit(1) query you suggested
[10:56:39] <kurushiyama> quellhorst: Hence my concern. I agree that this can be very high. But so is implementing your own solution. And with the Google API, you do not need to worry about data storage and such, let alone development costs. Say you only use a medium-sized single server. Let's say $100/month? That alone would be worth 200k requests, and you'd still have implementation and possibly even administration costs... ;)
[10:57:10] <kurushiyama> Industrial: Well, if you have the data you need, it should be fine.
[10:57:14] <Industrial> yeah
[10:58:09] <kurushiyama> Industrial: If you could give some sample data, maybe we can optimize for your use cases.
[11:03:40] <quellhorst> kurushiyama: have you seen compose.io before?
[11:04:39] <kurushiyama> quellhorst: I am a DBA. Go figure ;)
[11:04:48] <quellhorst> they are around $31/month :)
[11:04:54] <kurushiyama> For 1GB
[11:05:01] <quellhorst> right
[11:05:10] <kurushiyama> Scale yourself to death.
[11:06:06] <kurushiyama> At 40GB, it is cheaper to run a replset with 800GB SSDs at Softlayer – who aren't exactly known to be cheap.
[11:06:54] <quellhorst> i found softlayer to be better than owning your own dedicated hardware in a datacenter
[11:07:08] <quellhorst> and their prices can be negotiated
[11:07:25] <quellhorst> at least they were before ibm purchased them
[11:08:37] <kurushiyama> I love them. They aren't exactly cheap, but more than worth their money. Actually, the SSDs they build into their servers are the best I have _ever_ seen. And they go to great lengths to find the best of breed for just about everything.
[11:09:14] <quellhorst> yeah, my only complaint with them is bandwidth pricing
[11:09:25] <quellhorst> but i can handle that elsewhere
[11:10:04] <kurushiyama> Well, that does not apply to my customers, really. A lot of aggregations are done, and the result sets tend to be rather small.
[11:11:53] <kurushiyama> Does compose.io offer MMS/CM access? Without metrics, it is not worth considering, even for the most fanciful use cases.
[11:14:19] <quellhorst> they have some web gui to show you metrics. I have only used them for dev / staging so far.
[11:15:20] <quellhorst> larger stuff i have setup with ansible
[11:15:48] <kurushiyama> Well, I use Docker and/or Puppet, but it is about the same approach.
[11:19:17] <kurushiyama> quellhorst: compose.io's rabbitMQ prices are quite good, on the other hand.
[11:24:00] <quellhorst> kurushiyama: haven't used em for that yet
[13:37:03] <m0rpho> hi guys, can I modify an ISODate() and only change the hour, while leaving everything else as it is?
[13:37:20] <m0rpho> and all that within an aggregation pipeline
[13:37:29] <mick27> folks, running mongodb on ec2: how much are you guys provisioning for IOPS on the volumes? I have a lot of writes, but I don't know what to give the data and journal volumes.
[13:43:44] <kurushiyama> mick27: Without MMS/CM stats on IOPS, that is hard to say.
[14:03:28] <mick27> kurushiyama: can I get some stats without mms ? can't pay for that right now
[14:04:23] <cheeser> mongostat
[14:04:36] <kurushiyama> mick27: It is free for 30 days.
[14:05:52] <mick27> oh great, I missed that
[14:07:34] <kurushiyama> mick27: From my point of view, unless you have very dynamic scaling needs, running MongoDB on AWS is quite tricky and comparatively costly.
[14:12:12] <mick27> kurushiyama: not my choice, the whole infra is on aws
[14:44:02] <kurushiyama> mick27: Ouch.
[17:36:46] <devloko> silviolucenajuni: look at that, man!
[18:56:42] <ThisIsDog> I know mongo currently treats cursor timeouts as a use-the-default-or-nothing setting: http://api.mongodb.com/python/current/faq.html#how-do-i-change-the-timeout-value-for-cursors
[18:56:47] <ThisIsDog> Are there any plans to change this?
[19:05:58] <tsturzl> Can a 2dsphere index hold a poly?
[19:08:10] <ThisIsDog> Or is the documentation just wrong? https://jira.mongodb.org/browse/SERVER-8188
[19:29:57] <kurushiyama> ThisIsDog: Good question, actually
[19:37:30] <kurushiyama> ThisIsDog: Hm, I'd guess it is a documentation thingy. In mgo (which is pretty up to date with this kind of stuff), the cursor timeout is configurable: https://godoc.org/labix.org/v2/mgo#Session.SetCursorTimeout So it seems to boil down to the driver maintainer. GothAlice could know more, iirc.
[19:40:18] <ThisIsDog> kurushiyama: Cool, that could make things a lot easier for my use case. We were about to submit a support request to try to get more information on it.
[19:42:23] <kurushiyama> ThisIsDog: Well, I do not see the use case for a configurable timeout (you could always have an app-side timeout closing the cursor). Turning it off for very long operations was always sufficient for me.
[19:44:03] <kurushiyama> ThisIsDog: Nevertheless, requesting clarification in the docs surely makes sense.
[19:44:05] <ThisIsDog> I'm iterating at least 150,000 records at once, and I don't want to assume I won't get an exception and leave the cursor hanging.
[19:45:07] <kurushiyama> ThisIsDog: If you could get an exception while iterating, you should either close the cursor or skip to the next item in the result set.
[19:45:23] <kurushiyama> ThisIsDog: Which language do you use?
[19:45:27] <ThisIsDog> Python
[19:45:34] <kurushiyama> ThisIsDog: Right.
[19:46:52] <kurushiyama> ThisIsDog: So you could wrap your stuff in a try/except block.
[19:47:32] <kurushiyama> ThisIsDog: Albeit I have to admit that my Python is a bit... well, "rusty" would be quite an understatement.
[19:49:04] <ThisIsDog> I can put in a try catch, but that doesn't help if any one of those elements takes longer than 10 minutes to process.
[19:50:02] <ThisIsDog> Well if I handle it manually it would work
[19:50:38] <ThisIsDog> I really don't like handling it manually when just changing the value would mean I don't have to worry about absolutely anything going wrong with my loop
[19:51:59] <ThisIsDog> I also then have to worry about future developers understanding this particular loop is a special case and how to handle it
[19:52:34] <kurushiyama> ThisIsDog: The wonder of comments... ;P
[19:53:28] <kurushiyama> Basically, you'd turn off the timeout, and catch other exceptions, closing the cursor before exiting. That was the idea...
[19:53:34] <kurushiyama> ThisIsDog: ^
[19:55:10] <ThisIsDog> I get it, but wouldn't it be better if I could just set the value as an argument?
[19:55:43] <ThisIsDog> instead of having to leave those comments, wrap everything in another block, etc
[19:55:53] <ThisIsDog> it just isn't ideal
[19:56:46] <kurushiyama> ThisIsDog: Well, the problem here is that you'd have the same problem again: you might run into your timeout before processing is finished. So I'd rather turn off the timeout altogether and just make sure the cursor is closed when the scope is left.
[19:58:32] <kurushiyama> Which, admittedly, is easy to say from a Go perspective: "defer cursor.Close()" => AutoClose on scope exit, no matter what (well, except SIGKILL, iirc)
[20:03:08] <ThisIsDog> kurushiyama: That is fair. I'm trying to think through the possible edge cases. I don't really like wrapping large portions of code in a blanket try catch, but I think as long as I log the exceptions properly so I can follow up on them it would be fine.
[20:05:32] <kurushiyama> ThisIsDog: Well, t/c's should always be as narrow as possible. And I'd be as narrow as possible especially in this case. There might be exceptions which call for a simple skip to the next element of the result set, where others call for closing the cursor and returning unsuccessful.
[20:07:52] <ThisIsDog> kurushiyama: Is there a good way to kill a hanging cursor? I think I would feel more comfortable working through it and minimizing the t/c's if I have that.
[20:08:22] <kurushiyama> ThisIsDog: Define "hanging".
[20:08:58] <ThisIsDog> I accidentally didn't put a line in the try catch, got an exception before going through all the elements and was unable to call close
[20:09:08] <ThisIsDog> kurushiyama:
[20:10:28] <kurushiyama> Well, you should test your code against a test env – restart the server if you have to ;)
[20:11:30] <kurushiyama> ThisIsDog: Other than that: https://jira.mongodb.org/browse/SERVER-3090
[20:14:15] <kurushiyama> ThisIsDog: For the "except" block: simply call the cursor's close method.
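A sketch of the pattern being described, assuming PyMongo; the collection and the per-document work are hypothetical:

```python
from pymongo import MongoClient

def process(doc):
    """Hypothetical per-document work that may be slow or may raise."""

records = MongoClient().test.records

# Disable the server-side idle timeout for the long loop, and
# guarantee the cursor is closed even if processing raises.
cursor = records.find({}, no_cursor_timeout=True)
try:
    for doc in cursor:
        process(doc)
finally:
    cursor.close()  # nothing left hanging server-side, whatever happens
```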
[20:16:17] <ThisIsDog> kurushiyama: Right, but sometimes real data finds edge cases you didn't discover in the test environment. I think I'm comfortable enough now to just not use the timeout, like you suggested.
[20:17:30] <kurushiyama> But by any means file that request – inconsistent docs are horrible!
[20:22:14] <ThisIsDog> kurushiyama: I forwarded it to supervisor to follow up on. Thanks for your help =)
[20:22:32] <kurushiyama> ThisIsDog: You are welcome!
[20:45:39] <the_f0ster> does adding a replica set member on another host use the default port of 27017 on that host for access? I have 27017 open on my secondary nodes.. but when I add them to my replica set on the primary, the config isn't being received
[20:47:32] <kurushiyama> the_f0ster: can the hostnames be resolved?
[20:49:07] <the_f0ster> kurushiyama: yeah.. but I only have ssh and mongo ports open on the secondary hosts.. for instance I can ssh to the secondary hosts from primary, but "ping" would not work
[20:49:12] <the_f0ster> so not sure what mongo is doing internally
[20:50:02] <kurushiyama> but you can ssh to, lets say "ssh secondariesHostNameYouUsedInRsAdd" from the primary and vice versa?
[20:50:39] <kurushiyama> the_f0ster: ^
[20:52:34] <the_f0ster> kurushiyama: yeah
[20:53:24] <kurushiyama> the_f0ster: Well, you'd need to check that from and to every node. That should be good enough.
[20:54:02] <the_f0ster> i only have 3 rs members.. and they can all talk to each other over 22 and 27017
[20:54:34] <the_f0ster> but when I add a secondary from primary by setting config.. i get "stateStr" : "STARTUP", "configVersion" : -2
[20:56:26] <kurushiyama> the_f0ster: so you did an rs.initiate, and then you added the members?
[21:24:13] <uuanton> the new MMS monitor update changed things
[21:24:36] <uuanton> I used to add an existing mongodb primary and it would add the whole replica set
[21:24:53] <uuanton> now it only sees it as a single host, without secondaries
[21:27:00] <the_f0ster> kurushiyama: yes
[21:27:27] <the_f0ster> i only have tcp open on 22, 28017, and 27017-27030 on all of the replica set nodes