#mongodb logs for Thursday the 23rd of June, 2016

[06:50:20] <sumi> hello
[08:40:00] <kurushiyama> Hello sumi !
[08:40:26] <sumi> hi kurushiyama
[08:40:46] <sumi> nande kurushi? ("why kurushi?")
[08:41:24] <kurushiyama> My Japanese is virtually non-existent ;)
[08:42:09] <kurushiyama> sumi Yours is obviously better than mine, as you are the first person who abbreviated the name correctly ;)
[08:42:23] <sumi> hahaha
[08:42:30] <sumi> my wife is japanese
[08:42:57] <kurushiyama> Ah. That's what I call an edge!
[08:43:07] <sumi> :D you're right
[08:44:34] <kurushiyama> sumi Can I be of assistance?
[08:44:50] <sumi> everything is all right :D
[08:57:32] <n3ssi3> Hey there :) Can someone help me import a large mongodump?
[09:29:47] <kurushiyama> n3ssi3 It would be helpful to describe the problem a bit.
[09:41:39] <n3ssi3> kurushiyama: I'm trying to import a dump that's about 17 GB... It always stops with the error "insertion error: EOF", so I googled and found this: https://jira.mongodb.org/browse/TOOLS-939
[09:42:29] <kurushiyama> n3ssi3 Have you tried to reduce the batch size, then?
[09:43:23] <n3ssi3> I tried many different batchSize values, but it doesn't work
[09:44:24] <n3ssi3> lowest being 10
[09:45:14] <n3ssi3> I also tried downgrading mongodb since the ticket says it was introduced in the latest version :S
[09:53:19] <kurushiyama> n3ssi3 Are your documents close to 16MB?
[09:53:49] <kurushiyama> n3ssi3 If yes, reduce further, to 6 for example.
[09:53:57] <n3ssi3> kurushiyama: I now even tried batchSize 1, and it dies :(
[09:54:07] <n3ssi3> -rw-r--r--@ 1 nessie staff 631M Jun 21 18:29 Dropbox/BlackriverData/mongodump/amzontoolkitdump/amazontoolkit/sales_metaid_merchant_fba_BU2.bson
[09:54:19] <n3ssi3> the file that dies: 631M
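
For reference, the retry loop being discussed can be scripted. A minimal sketch in Python, assuming mongorestore is on the PATH and using a hypothetical "dump/" directory (the --batchSize flag is the one mentioned in TOOLS-939):

    import subprocess

    # Retry the restore with progressively smaller insert batches.
    # "dump/" is a hypothetical path -- substitute the real dump directory.
    for batch_size in (1000, 100, 10, 1):
        result = subprocess.run(
            ["mongorestore", "--batchSize", str(batch_size), "dump/"])
        if result.returncode == 0:
            print("restore succeeded with batchSize", batch_size)
            break
    else:
        print("restore failed at every batch size; the dump may be corrupted")
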
[09:55:55] <kurushiyama> n3ssi3 That is... ...strange.
[09:57:08] <kurushiyama> n3ssi3 Is this the restore of a backup or a data migration?
[09:57:48] <n3ssi3> restore of a backup... I am a frontend dev, and Backend just sent me their dump...
[09:58:07] <n3ssi3> the problem is they're in the US and asleep right now :p
[09:58:50] <kurushiyama> n3ssi3 I guess they did not verify their dump.
[09:59:13] <kurushiyama> n3ssi3 Actually, I assume a corrupted dump file.
[09:59:49] <kurushiyama> n3ssi3 which would be... unfortunate, to say the least.
[10:00:00] <kurushiyama> n3ssi3 Let me quickly check sth.
[10:00:31] <kurushiyama> n3ssi3 with which version was the dump _made_?
[10:00:39] <n3ssi3> I have time ;) I need to do performance tests, so without the data I'm just going to sit here and do nothing
[10:01:27] <n3ssi3> hmm, I wouldn't know... or can I check that in the dump?
[10:04:18] <kurushiyama> n3ssi3 Not sure about that. But let's say you try to restore with 2.6.X and the dump was made with 3.0.X (mongorestore and mongodump version), there might be problems.
[10:05:25] <n3ssi3> If anything, they use the older version...
[10:05:42] <n3ssi3> kurushiyama: would it help if I downgrade to a 2.x version?
[10:06:06] <n3ssi3> I only use mongo for this one project, so I can uninstall and install/ delete everything
[10:06:06] <kurushiyama> n3ssi3 Well, you have to find out with which version the dump was made.
[10:06:36] <n3ssi3> kurushiyama: thanks :) I will check with US once they wake up
[10:06:37] <kurushiyama> n3ssi3 Then, you should install the same major.minor.X
[10:06:54] <kurushiyama> n3ssi3 Both server and tools.
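
A quick way to check the version parity kurushiyama describes, assuming the binaries are on the PATH. This is a sketch, not a definitive check; --version output formats are assumed to contain a major.minor.patch string:

    import re
    import subprocess

    def mongo_version(binary):
        """Return (major, minor) as reported by a MongoDB binary's --version."""
        out = subprocess.run([binary, "--version"],
                             capture_output=True, text=True).stdout
        major, minor = re.search(r"(\d+)\.(\d+)\.\d+", out).groups()
        return major, minor

    # The dump should be restored with the same major.minor it was made with,
    # for both the server and the tools.
    print("mongod:      ", mongo_version("mongod"))
    print("mongorestore:", mongo_version("mongorestore"))
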
[10:07:04] <n3ssi3> will do ;)
[10:07:18] <n3ssi3> thank you very much for the help
[10:07:34] <kurushiyama> n3ssi3 On the other hand: If you just need testdata: https://github.com/rueckstiess/mtools/wiki/mgenerate
[10:09:22] <n3ssi3> I'll try to use the same version as the US guys; if that doesn't work, I'll try this ;)
[14:14:35] <OSInet> jmikola: ping
[14:16:50] <jmikola> pong
[14:17:17] <jmikola> OSInet: ^
[14:18:14] <OSInet> jmikola: hello. I'm working on MongoDB for Drupal 8. IIRC when we met at the Paris MUG you told me the next ODM would be using the low-level driver and advised me to probably do the same for Drupal, but I can't see a version using it on the ODM repo. Am I missing something?
[14:20:20] <jmikola> howdy! development hasn't started on ODM 2.0. speaking with the other devs since, they will likely use https://github.com/mongodb/mongo-php-library (or a similar library) to abstract some basic commands, like index creation
[14:21:08] <jmikola> you may want to do the same. while the library does contain Database and Collection classes, many of the command wrappers are stand-alone classes that you can use directly if you so choose (e.g. https://github.com/mongodb/mongo-php-library/blob/master/src/Operation/CreateIndexes.php)
[14:21:30] <jmikola> that would at least save you the trouble of handling different MongoDB server APIs for the same operations
[14:21:45] <OSInet> thanks for the suggestion
[14:22:36] <jmikola> however, if you were writing some MongoDB glue to simply do write and read operations, then it'd be easy to stick to the extension. the nuances with commands and GridFS are really the things you'd want the library for
[15:39:26] <direwolf> can I use $slice along with $addToSet?
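
For the record: MongoDB only accepts $slice as a modifier of $push (together with $each); $addToSet does not take it. A pymongo sketch of the supported form, with hypothetical database, collection, and field names:

    from pymongo import MongoClient

    coll = MongoClient().test.scores  # hypothetical database/collection

    # $slice must appear under $push with $each; $addToSet rejects it,
    # so a capped "most recent N" array uses this form instead:
    coll.update_one(
        {"_id": 1},
        {"$push": {"recent": {"$each": [42], "$slice": -10}}},
    )
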
[15:57:18] <adrian_lc> hi, I'm launching a new replication slave and I don't know why "REPL [ReplicationExecutor] Error in heartbeat request to core:27017; HostUnreachable: HostUnreachable" keeps getting spammed in the log
[15:57:32] <adrian_lc> that node "core" is not added to the replica set conf or status
[15:57:45] <adrian_lc> but it's still trying to heartbeat
[15:58:12] <adrian_lc> how can I reset what seems to be a residual connection or something
[16:01:42] <adrian_lc> db.runCommand({connPoolStats: 1 }) does show a core:27017 under hosts
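
The two checks adrian_lc describes can be run from pymongo as well; a sketch, with the connection string being illustrative:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")

    # Confirm the stale host really is gone from the replica set config...
    config = client.admin.command("replSetGetConfig")
    print([m["host"] for m in config["config"]["members"]])

    # ...and see whether it still lingers in the server's connection pools.
    pools = client.admin.command("connPoolStats")
    print(list(pools.get("hosts", {})))
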
[16:18:35] <jayjo> I've asked a similar question before. If I have a file with many JSON documents separated by newlines, and one of the lines is somehow causing problems with mongoimport, what's the best way to catch the bad line but continue with the import?
[16:21:01] <jayjo> Is mongoimport not really intended for this? When I tried to do it with pymongo, it was so painfully slow to write the documents one at a time. And even writing the documents in batches was not working quickly either
[16:21:23] <cheeser> i don't think mongoimport supports that.
[16:22:26] <jayjo> If you had a similar task, again about 15GB, would you write a script to do this data upload? At the rate I had last time with pymongo, the 15GB upload would've taken over 24 hours... not exactly sure, but it was less than 1GB an hour
[16:22:49] <jayjo> Which doesn't seem totally reasonable
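
A sketch of the script approach: batching inserts with insert_many (much faster than one-at-a-time writes) while catching the occasional unparseable line. The file, database, and collection names are hypothetical:

    import json
    from pymongo import MongoClient

    coll = MongoClient().mydb.mycoll   # hypothetical names
    BATCH_SIZE = 1000
    batch = []

    with open("data.json") as fh:      # newline-delimited JSON
        for lineno, line in enumerate(fh, 1):
            try:
                batch.append(json.loads(line))
            except ValueError:
                print("skipping unparseable line", lineno)
                continue
            if len(batch) >= BATCH_SIZE:
                coll.insert_many(batch, ordered=False)
                batch = []
    if batch:
        coll.insert_many(batch, ordered=False)
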
[16:59:47] <sinkmanu> hi
[17:29:03] <dino82> Uhh how many times does the BTree Bottom Up index build need to run? This is the third time it's shown [rsSync] Index: (2/3) BTree Bottom Up
[18:00:27] <orev> hi, I've been following this guide to install an app (which requires mongodb): https://help.ubnt.com/hc/en-us/articles/205146080-UniFi-Install-controller-software-to-CentOS One thing it says is that mongo requires 35GB of disk space, but I can't find any other mention of that kind of requirement. Is that requirement correct? Maybe it changed with newer mongo releases?
[18:07:49] <StephenLynx> it doesn't require 35gb of disk, though.
[18:08:01] <StephenLynx> I got less than 25gb on my server and way less on my vms.
[18:09:27] <StephenLynx> https://docs.mongodb.com/manual/administration/production-notes/#hardware-considerations orev
[19:09:18] <orev> ok, thanks. I'm not sure what that web page is talking about then
[19:36:35] <kurushiyama> orev Get used to that. A lot of "specialists" with "in-depth knowledge" write a lot of bullpoo about MongoDB.
[19:46:31] <orev> yeah, that's everywhere on IT blogs. they're just guidelines, which is why I was trying to verify.
[19:46:47] <orev> I got it to work on a small vm, so I also didn't want to just assume it was ok
[19:56:01] <jayjo> I wrote a script that is super slow with pymongo to upload documents. Anyone have an idea why it's so slow? https://bpaste.net/show/cf39a7515682
[20:04:05] <GothAlice> Hmm, is there an adulterated mime type for MongoDB-flavour JSON? application/json+mongo?
[20:07:24] <GothAlice> jayjo: Wrap that whole thing in a function, slap https://github.com/rkern/line_profiler somewhere (pip install line_profiler), and add the @profile decorator to your function. Save the whole thing somewhere (example.py) and run: kernprof -l example.py, then to look at the results run python -m line_profiler example.py.lprof
[20:07:46] <GothAlice> jayjo: There's "all sorts of concerning" in that code of yours, but that'll be the fastest way to identify if it's JSON loading, or actually saving that's taking so long.
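
The workflow GothAlice describes, as a runnable skeleton; the function body is a stand-in for jayjo's actual loading code:

    # example.py -- run with:  kernprof -l example.py
    # inspect results with:    python -m line_profiler example.py.lprof

    import json

    @profile  # injected into builtins by kernprof; no import needed
    def load_documents(path):
        docs = []
        with open(path) as fh:
            for line in fh:
                docs.append(json.loads(line))
        return docs

    if __name__ == "__main__":
        load_documents("data.json")
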
[20:10:25] <StephenLynx> GothAlice, wouldn't that be bson?
[20:10:39] <StephenLynx> or mongo does something BEYOND bson?
[20:10:51] <GothAlice> StephenLynx: No, I'm speaking of https://docs.mongodb.com/manual/reference/mongodb-extended-json/
[20:11:03] <StephenLynx> ah
[20:11:43] <StephenLynx> that looks like valid json, though.
[20:11:54] <StephenLynx> or am I missing something?
[20:12:43] <GothAlice> It's JSON with implied meaning through use of $-prefixed keys. That's a "format" way beyond the general one JSON itself specifies. Typically one would use + and a custom format designator to mention the format in the mimetype you are delivering to clients. This preserves "yeah, this is JSON" matching, while also informing as to the structure layered on top.
[20:13:12] <StephenLynx> hm
[20:13:28] <GothAlice> (Or if there's an official one.)
[20:13:43] <StephenLynx> I really don't think there would be a standard for this.
[20:16:18] <GothAlice> StephenLynx: Consider, a client requests a resource with Accept: application/json+mongo, gets back {"_id": {"$oid": "…"}}. To JSON, the whole {"$oid": "…"} bit is meaningless, but given the +mongo context, it's an ObjectId instance. The same resource is requested with Accept: application/json, gets back {"_id": "…"} where the hex-encoded ID is returned as a string.
[20:16:59] <GothAlice> (Since the requesting client can't handle fancy MongoDB encoding, we "do the right thing" and return a value suitable for use as an _id in those cases.)
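
Concretely, the negotiation being described might look like this. A sketch using pymongo's bson.json_util, with "application/json+mongo" as the unofficial label GothAlice suggests:

    import json
    from bson import ObjectId, json_util

    def render(doc, accept):
        # Extended JSON keeps type information via $-prefixed keys...
        if "application/json+mongo" in accept:
            return json_util.dumps(doc)
        # ...plain JSON clients get an opaque string value instead.
        return json.dumps({key: str(value) if isinstance(value, ObjectId) else value
                           for key, value in doc.items()})

    doc = {"_id": ObjectId()}
    print(render(doc, "application/json+mongo"))  # {"_id": {"$oid": "..."}}
    print(render(doc, "application/json"))        # {"_id": "..."}
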
[20:17:09] <StephenLynx> well, a client can go fuck himself.
[20:17:12] <GothAlice> :P
[20:17:51] <StephenLynx> he will have to do a manual check in application code to see if the content is valid for his purpose.
[20:18:06] <StephenLynx> since there isn't a standard for mongo json.
[20:18:07] <GothAlice> Well, actually, this particular example is an "opaque value" to clients.
[20:18:29] <GothAlice> Uhh, MongoDB Extended JSON is pretty documented…
[20:18:40] <StephenLynx> not a standard.
[20:18:41] <GothAlice> I.e. how to interpret {"$oid": "…"} is unambiguous.
[20:18:46] <StephenLynx> now,
[20:18:54] <StephenLynx> there is a bigger issue here, though.
[20:18:59] <GothAlice> I'd be curious as to your definition of "a standard".
[20:19:03] <StephenLynx> why would one expose data directly like that?
[20:19:20] <GothAlice> You… seem to have read more into things than there actually is.
[20:19:29] <StephenLynx> ISO, RFC (or w/e its called)
[20:19:32] <StephenLynx> that's a standard.
[20:19:40] <GothAlice> No, those are standards bodies.
[20:19:42] <GothAlice> ¬_¬
[20:19:47] <StephenLynx> that define standard.
[20:19:56] <StephenLynx> if they don't define it as a standard, is not a standard.
[20:20:08] <GothAlice> (And to be specific, RFC is the _process_ used by the IETF.)
[20:20:54] <GothAlice> So, the document structuring standards I use at work aren't "standards"? Your definition is unduly specific to public, open, broadly accepted standards, but does nothing to invalidate use of the word for smaller systems.
[20:21:07] <GothAlice> How curious.
[20:21:07] <StephenLynx> no, they are not standards.
[20:21:11] <StephenLynx> they are conventions.
[20:22:20] <StephenLynx> no matter how well-documented and consistent it is, unless it is declared a standard by a body, it isn't a standard.
[20:22:52] <GothAlice> Standard: a level of quality or attainment; quality, guideline, principle, or, oddly, flag or banner. Sorry, the three dictionaries I have on hand disagree with you. ;P
[20:23:10] <StephenLynx> kek
[20:23:36] <StephenLynx> you are arguing over semantics here.
[20:24:20] <GothAlice> (That's the noun form. The adjective form is still suitably broad: 1. the standard way of doing, normal, usual, typical, customary, conventional, or established. 2. the standard work on the subject: definitive, established, recognized, accepted, authoritative. Doesn't say it has to be a worldwide standard.) Regardless, it's not like I'm exposing MongoDB's built-in REST service, here.
[20:25:11] <GothAlice> I've just been discussing announcement of serialization format / encoding.
[20:25:30] <GothAlice> Announcement / negotiation.
[20:26:50] <StephenLynx> then it would be a standard.
[20:27:07] <StephenLynx> an unnecessary one, though.
[20:27:22] <StephenLynx> since you could just add a field and let whoever is reading know what it is.
[20:27:30] <GothAlice> Yeah, I have no idea what you're talking about at this point.
[20:27:35] <GothAlice> Sorry.
[20:27:48] <StephenLynx> {"mongojson":true,all other stuff here}
[20:28:05] <GothAlice> … so, why exactly would I replicate the functionality already provided by HTTP content type negotiation?
[20:28:06] <StephenLynx> or {"mongojson":true,data:stuff goes here}
[20:28:13] <GothAlice> Really; that's not a wheel that needs to be reinvented.
[20:28:24] <StephenLynx> eh? what negotiation?
[20:28:32] <StephenLynx> its just a header.
[20:28:37] <GothAlice> You may have missed the "client requests via Accept header" bit.
[20:29:06] <GothAlice> And given the response has an associated Content-Type header…
[20:29:21] <StephenLynx> you are talking about sending either extended json or regular json based on that header?
[20:31:15] <GothAlice> https://jira.mongodb.org/browse/DOCS-8162
[20:31:24] <GothAlice> StephenLynx: That's exactly what that header is for.
[20:31:30] <GothAlice> Content type negotiation.
[20:32:11] <GothAlice> (Well, Accept is. Content-Type in the return being a "from what you requested, this is what we gave you" response.)
[20:33:09] <GothAlice> StephenLynx: http://pretty-rfc.herokuapp.com/RFC2616#header.accept
[20:33:18] <StephenLynx> why can't you expect the client to always understand extended json?
[20:34:02] <StephenLynx> it's not like the difference between compressed and uncompressed text.
[20:34:06] <GothAlice> Because I have no control over the client, there are many, and each has different capabilities. The same reason I can't expect all browsers to have localstorage enabled: clients differ.
[20:34:24] <StephenLynx> its just json.
[20:34:33] <StephenLynx> if it understands vanilla json, it understands extended json.
[20:34:53] <StephenLynx> it won't magically implement types that work the same way mongo's do just because it's extended.
[20:35:14] <StephenLynx> it will always have to, one way or another, work around its own types.
[20:37:43] <StephenLynx> what can happen and what is the alternative?
[20:38:15] <StephenLynx> assuming the client can clearly differentiate between vanilla and extended.
[20:38:30] <GothAlice> The alternative is to, as I've been describing, hand back content to clients in a format they explicitly request. Which they can. Via Accept.
[20:39:47] <GothAlice> My only question has revolved around the "correct" label to use to mark MongoDB extended formatting, but I'll use "+mongo" for now pending the result of that ticket.
[20:45:58] <StephenLynx> and what other format can you send than a string?
[20:46:20] <StephenLynx> what I'm trying to tell you is that it's just an implementation detail.
[20:46:28] <StephenLynx> clients can understand both.
[20:46:33] <StephenLynx> because its just json
[20:46:34] <GothAlice> YAML, Bencode, BSON itself, Msgpack, …
[20:46:48] <GothAlice> I… have a lot I support, so that list can go on for a while.
[20:47:28] <StephenLynx> I am talking about picking between json and extended json.
[20:47:38] <StephenLynx> not between json and something completely unrelated to json.
[20:47:46] <GothAlice> YAML having native support for extended types, there's no particular need to inform the mimetype that an ObjectId is an ObjectId by virtue of convention, since it's explicit in the data. "This is an ObjectId, use class bson.ObjectId to instantiate this."
[20:47:58] <GothAlice> JSON, not so much.
[20:48:02] <StephenLynx> wait, wait
[20:48:14] <StephenLynx> you are expecting to work with undefined models?
[20:48:28] <GothAlice> Please define "undefined model".
[20:48:35] <StephenLynx> the client should know beforehand what exactly each field is.
[20:48:42] <StephenLynx> what type to use, what its purpose is.
[20:48:54] <StephenLynx> you don't have to say it's an ObjectId because it should know beforehand.
[20:49:03] <GothAlice> Should it?
[20:49:50] <StephenLynx> unless you are working on some kind of db thing or middleware.
[20:49:55] <StephenLynx> not exactly a client.
[20:50:50] <GothAlice> I'm writing a universal REST framework on top of my web framework. My Marrow Mongo package contains the adapter glue needed to REST-ify collections and documents, generically. It has no clue what the final use for these things will be.
[20:51:20] <GothAlice> Sure, there are "schemas" in a loose sense. It doesn't care, it _can't_ care. It must be generic.
[20:51:21] <StephenLynx> holy jesus
[20:51:26] <StephenLynx> nevermind it.
[20:51:29] <StephenLynx> I don't want to know.
[20:52:02] <StephenLynx> I clenched my butthole so hard it achieved critical mass on "REST framework on top of my web framework".
[20:52:13] <GothAlice> https://github.com/marrow/mongo/blob/develop/web/db/mongo/collection.py#L30 < just getting started extracting bits from work and cleaning them up, of course. There be dragons here.
[20:52:28] <StephenLynx> no, it IS a dragon in itself.
[20:52:48] <GothAlice> My web framework is a few hundred lines of code. The REST framework on top of it? https://github.com/marrow/web.dispatch.resource/blob/develop/web/dispatch/resource/helper.py#L11-L24 < that's the developer side of it.
[20:53:10] <GothAlice> The actual REST dispatcher being less than 100 lines of code: https://github.com/marrow/web.dispatch.resource/blob/develop/web/dispatch/resource/dispatch.py#L18-L108
[20:53:28] <GothAlice> I'm not working on Django or anything.
[20:53:36] <StephenLynx> kek
[20:53:55] <StephenLynx> I don't see a difference, though.
[20:54:07] <StephenLynx> it is a web framework abstracting a high-level surface.
[20:54:28] <GothAlice> Have a benchmark just on template generation performance: https://github.com/marrow/cinje/wiki/Benchmarks#python-34 < hint: I'm 17,000x more performant.
[20:54:41] <GothAlice> And… that's just templates.
[20:55:04] <StephenLynx> you are talking about python. no one uses python for performance and certainly the maintainers don't give a hoot about it.
[20:55:09] <GothAlice> Hahahahahahaha.
[20:55:11] <StephenLynx> more performant than what?
[20:55:27] <GothAlice> Okay, never mind then. This has been unproductive long enough. ;P
[20:55:30] <StephenLynx> than engines?
[20:55:41] <StephenLynx> other high-level abstractions of high-level surfaces?