PMXBOT Log file Viewer


#mongodb logs for Tuesday the 16th of February, 2016

[00:57:48] <Freman> I don't believe that you really have to worry about "sql injection" with mongo. as long as you're not evaling things...
[01:04:47] <cheeser> Freman: pretty much
[01:05:23] <cheeser> but still, never pass unsanitized user inputs to the database
[01:05:44] <Freman> thought that was rule #1
[01:06:00] <cheeser> well, apparently some things still need to be said out loud ;)
[02:06:32] <Freman> yeh like "prelive is live - not staging not testing - it's an environment to test the docker containers to make sure that they don't all throw 500s and break the live website when we switch the version" - doesn't stop them from testing in prelive
[06:44:36] <kryo_> hi, is there an atomic operation to pull an element from one array and push it to another array? (within the same document)
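(For reference: a single update on one document is applied atomically, so a $pull from one array field combined with a $push to another field in the same update moves the value with no in-between state. A minimal mongo-shell sketch, with made-up collection and field names and `someId` as a placeholder; note you need to know the value up front, there is no operator that reads it out and moves it for you.)

    db.tasks.updateOne(
        { _id: someId, pending: "job-42" },   // match only if the value is still in `pending`
        { $pull: { pending: "job-42" },       // remove it from one array...
          $push: { done: "job-42" } }         // ...and append it to the other, atomically
    )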
[09:15:53] <mylord> how does insert know if there’s already a matching record? does it check every single property for equivalence?
[09:33:13] <ravenx> can you have a sharded replica set
[09:33:34] <ravenx> or do i set up my mongodb cluster either replica-set OR sharded?
[09:57:18] <Derick> ravenx: each shard *should* be a replica set
[10:05:34] <ravenx> Derick: thanks
[10:08:31] <ravenx> I'm trying to optimize an insert i'm doing right now
[10:08:47] <ravenx> it is from HDFS on another server, to my mongodb replica set. At the moment I am doing it with this command:
[10:08:56] <ravenx> hadoop fs -cat /38/GB/Worth/of/Data | mongoimport --host rs2_production/server.server.com,server2.server.com --db cluster_category --collection all --upsert --stopOnError --numInsertionWorkers 6 --username {} --password {} --authenticationDatabase itsame
[10:09:05] <ravenx> and this takes anywhere from 8 hours to 18 hours.
[10:09:30] <ravenx> I tried turning off journalling and indexing as well, but it didn't seem to make matters any better. does anyone have any ideas how i can improve the import speed?
[10:20:17] <Ben_1> good morning
[10:21:27] <ravenx> good morning
[10:21:34] <ravenx> do you want to help me with an import problem
[10:23:15] <Ben_1> cheeser: sorry I left yesterday but it was hometime :P the point was, I'm using the async driver, the robomongo GUI doesn't show any collection but the mongo shell does. Anyway, by using insertMany or insertOne the async callback is not called, so I think it's not a robomongo problem
[10:23:21] <Ben_1> ravenx: sure if I can
[10:23:26] <Ben_1> what's the problem?
[10:23:32] <ravenx> I'm trying to optimize an insert i'm doing right now
[10:23:41] <ravenx> it is from HDFS on another server, to my mongodb replica set. At the moment I am doing it with this command:
[10:23:45] <ravenx> hadoop fs -cat /38/GB/Worth/of/Data | mongoimport --host rs2_production/server.server.com,server2.server.com --db cluster_category --collection all --upsert --stopOnError --numInsertionWorkers 6 --username {} --password {} --authenticationDatabase itsame
[10:23:50] <ravenx> and it takes anywhere from 8 to 18 hours :/
[10:25:28] <Ben_1> the question is, is this a hadoop/mongodb problem or is the network not fast enough? what import time are you aiming for?
[10:25:58] <kurushiyama> ravenx: Ben_1 is right. We first need to identify what's going on.
[10:26:41] <ravenx> Ben_1, kurushiyama so i came prepared for that question
[10:26:58] <kurushiyama> ravenx: Version, storage engine?
[10:27:18] <ravenx> and i isolated it out as not a network, or a disk I/O problem. as i have copied the dataset from one server to another in 45 minutes.
[10:27:39] <ravenx> kurushiyama: Version is 3.2 and i'm storing it as a replicaset
[10:27:57] <kurushiyama> ravenx: so WiredTiger.
[10:28:01] <kurushiyama> ravenx: ;)
[10:28:05] <ravenx> yep :)
[10:29:35] <kurushiyama> ravenx: So copying the dataset leaves out a few things, not the least being that you copy bulk data instead of processing small parts ;) It could well still be the number of operations which are killing you.
[10:30:28] <ravenx> true, there is no processing of small parts in that equation, so that is not very telling. i just wanted to see if it was my network that was unbearably slow though.
[10:30:51] <ravenx> when you say the number of operations, you mean that the time it goes from HDFS into the mongodb, and mongo working on each document?
[10:31:07] <kurushiyama> ravenx: That's how import kind of works
[10:32:02] <kurushiyama> ravenx: "it takes anywhere from 8 to 18 hours" you do this regularly?
[10:32:15] <ravenx> hahaha, no, i have just been letting it run these past few days.
[10:32:23] <ravenx> just run it once before i leave work then look at it
[10:33:30] <ravenx> kurushiyama: would getting the HDFS data, splitting them up and then inserting them with 3-5 different instances of mongoimport help matters? or is that too naive a solution?
[10:34:05] <kurushiyama> ravenx: Yes, since we still would be talking of a single replica set.
[10:34:33] <Ben_1> ravenx: I think this would help
[10:34:51] <Ben_1> kurushiyama: do you have experience with mongodb async driver?
[10:35:05] <kurushiyama> Ben_1: As written yesterday: No.
[10:35:27] <Ben_1> oh sorry, can't remember xD
[10:35:54] <ravenx> Ben_1: wait you're saying that splitting up the insert between 3-5 different instances would help? o_o
[10:36:09] <Ben_1> I'm wondering why nobody is using this driver because I think it could push mongodbs performance a lot
[10:36:16] <ravenx> but i suppose i introduce an intermediate step which is to copy the hdfs data, split it, then run mongoimport ;/
[10:37:43] <Ben_1> ravenx: maybe I'm naive too but yes I think that, but it's just a gut feeling
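(A rough sketch of the split-and-parallel-import idea, assuming the export is line-delimited JSON, GNU split is available, and the local paths are made up; whether it actually helps depends on where the bottleneck really is.)

    # stage the export locally, cut it into 4 line-aligned chunks, import them in parallel
    hadoop fs -cat /38/GB/Worth/of/Data > /tmp/export.json
    split -n l/4 /tmp/export.json /tmp/chunk_
    for f in /tmp/chunk_*; do
        mongoimport --host rs2_production/server.server.com,server2.server.com \
            --db cluster_category --collection all --upsert --stopOnError \
            --numInsertionWorkers 6 \
            --username {} --password {} --authenticationDatabase itsame \
            --file "$f" &
    done
    wait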
[10:37:48] <kurushiyama> ravenx: You might want to change the writeConcern for the operations. Add --write-concern "{w:0,j:true}" to the import.
[10:38:12] <ravenx> kurushiyama: let me read up on that
[10:38:14] <kurushiyama> It would help if the bottleneck was the JSON parsing.
[10:38:18] <ravenx> in that example, are you setting it as 0?
[10:39:43] <kurushiyama> ravenx: What you basically do is say: "Acknowledge the receipt of the document, parse it for its validity, write the according insert operation to the journal and return". The default would be to do this on the majority of replica set members.
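(A sketch of where that option would sit on the earlier command; in the 3.2 tools the flag is spelled --writeConcern, if I recall correctly, and the value is the write-concern document discussed above.)

    hadoop fs -cat /38/GB/Worth/of/Data \
        | mongoimport --host rs2_production/server.server.com,server2.server.com \
            --db cluster_category --collection all --upsert --stopOnError \
            --writeConcern '{w: 0, j: true}' \
            --numInsertionWorkers 6 \
            --username {} --password {} --authenticationDatabase itsame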
[10:41:03] <kurushiyama> Actually, I suspect the bottleneck to be the JSON serialization.
[10:41:19] <kurushiyama> And the pipe.
[10:41:26] <ravenx> kurushiyama: json serialization is when it turns JSON to BSON?
[10:41:53] <kurushiyama> Nope, when hadoop fs -cat does its job.
[10:42:01] <Ben_1> thought JSON to BSON is just String.getBytes()
[10:42:12] <kurushiyama> Ben_1: nope
[10:42:18] <ravenx> ah, so json serialization is when hadoop fs -cat does its job
[10:42:20] <ravenx> gotcha
[10:42:49] <kurushiyama> Ben_1: have a look into the driver. BSON has some intricacies.
[10:43:14] <Ben_1> yah I'm reading bsonspec.org right now
[10:44:35] <ravenx> i am still not too sure what --jsonArray does
[10:44:37] <ravenx> o_O
[10:45:18] <kurushiyama> ravenx: and pipes have limited throughput, too, though this should be negligible. --jsonArray does not apply for you.
[10:45:34] <kurushiyama> ravenx: It is a shortcut for very small imports.
[10:47:04] <ravenx> kurushiyama: thanks!
[10:47:17] <ravenx> kurushiyama: when you talk about pipes, you mean that '|' thing in between my hadoop fs and mongoimport right?
[10:47:25] <kurushiyama> ravenx: Yes.
[10:48:55] <ravenx> hm...so could it be that the UNIX pipe is the thing that's making this go down south?
[10:50:02] <kurushiyama> ravenx: Have you already created indices on said collection? No, the pipe is good for more throughput than the network... Don't worry, was just my method of ruling out possibilities. Thinking with my fingers
[10:50:47] <ravenx> kurushiyama: no worries!
[10:50:54] <ravenx> kurushiyama: i have not created any indices on said collection
[10:53:56] <kurushiyama> ravenx: I don't know poo about hadoop (need to change that). But we can rule out that it is a MongoDB problem. Have you had any startup warnings from MongoDB? Open file limits, etc...
[10:54:54] <ravenx> nope
[10:54:56] <kurushiyama> ravenx: connect to the primary and have a look into local.startup_log
[10:55:05] <ravenx> wait, so you're kinda sure that this isn't a mongodb problem?
[10:55:22] <ravenx> kurushiyama: i don't think i've seen any as of recent.
[10:55:25] <kurushiyama> ravenx: No. I want to make sure.
[10:55:48] <kurushiyama> ravenx: Just for your and my peace of mind ;)
[10:55:51] <ravenx> :)
[10:56:15] <ravenx> i'm kinda new to mongo...i just started up mongo and...how do i look into local.startup_log?
[10:56:31] <kurushiyama> connect to the primary
[10:57:43] <kurushiyama> then "use local; db.startup_log.find().sort({startTime:-1}).limit(1)"
[10:57:57] <kurushiyama> Paste it somewhere like pastebin.com
[10:59:26] <ravenx> Error: error: { "$err" : "not authorized for query on local.startup_log", "code" : 13 }
[10:59:30] <ravenx> hm apparently i'm not authorized.
[10:59:34] <ravenx> weird, the sys admin didn't give me access
[10:59:49] <ravenx> gah -_-
[11:03:50] <kurushiyama> ravenx: Get that data, and access to MMS would not hurt, either.
[11:07:13] <ravenx> what is MMS?
[11:07:24] <ravenx> kurushiyama: yeah i will need to talk to our sysadmin, but he is away on vacation -_-
[11:07:27] <ravenx> that is how i got assigned this problem.
[11:09:33] <kurushiyama> <consultant mode="smartypants">There should be documentation about the access credentials</consultant>
[11:11:07] <ravenx> i wonder if this'll help: https://docs.mongodb.org/ecosystem/tools/hadoop/
[11:13:36] <kurushiyama> ravenx: Given the fact that I do not know poo about hadoop, I'd say yes if you can somehow manage to instruct hadoop to copy the database. In case creating a specialized tool is ok for you, you'd just need to iterate over the hadoop data points and use a bulk insert for MongoDB...
[11:13:45] <kurushiyama> ravenx: Should not be too much work.
[11:15:02] <ravenx> kurushiyama: sweet
[11:15:05] <ravenx> thanks for your input kurushiyama
[11:16:57] <kurushiyama> ravenx: If you come back with some more hard data, we can check whether it is MongoDB. mongoimport is not a very fast tool, but 38GB/18h, or about 2GB/h, lets me assume that the problem is somewhere else.
[11:17:24] <ravenx> kurushiyama: gotcha. once i get access i will try and work this out.
[11:17:35] <ravenx> kurushiyama: agreed, i've read that mongoimport is not the fastest tool around.
[11:19:22] <kurushiyama> ravenx: Actually, I have written specialized import tools quite a bit. Should not be too complicated to adapt them if you talk Go.
[11:19:33] <ravenx> only python
[11:19:41] <ravenx> those tools you are talking about are simply just scripts?
[11:19:58] <kurushiyama> ravenx: No. Go code compiles to native binaries.
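(Since Go is out, here is a rough pymongo sketch of the same iterate-and-bulk-insert idea in Python; it assumes the HDFS export is piped in as one JSON document per line, e.g. `hadoop fs -cat /38/GB/Worth/of/Data | python bulk_import.py`, and it leaves out the auth options. Names and the batch size are illustrative.)

    import json
    import sys

    from pymongo import MongoClient

    client = MongoClient("mongodb://server.server.com,server2.server.com/?replicaSet=rs2_production")
    coll = client["cluster_category"]["all"]

    BATCH_SIZE = 1000
    batch = []
    for line in sys.stdin:                           # one JSON document per line from `hadoop fs -cat`
        batch.append(json.loads(line))
        if len(batch) >= BATCH_SIZE:
            coll.insert_many(batch, ordered=False)   # one bulk insert instead of 1000 single inserts
            batch = []
    if batch:
        coll.insert_many(batch, ordered=False)       # flush the final partial batch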
[11:21:58] <ravenx> another question i have is that if i am connecting to a replica set is it enough to just connect to the primary
[11:22:02] <ravenx> or do i have to list my entire cluster in there?
[11:22:06] <ravenx> there as in, my command
[11:22:41] <Derick> ravenx: you should have a majority of the hosts in your connection string
[11:22:50] <Derick> the primary doesn't always stay primary, and can go down too...
[11:22:54] <kurushiyama> ravenx: For the shell, you always connect to a single instance. For the driver, the replica set is in your connection string
[11:22:58] <ravenx> hm, but i thought that the secondaries just read from the primary's oplog
[11:23:22] <kurushiyama> ravenx: Right, but there is a read preference of "secondaryPreferred", for example.
[11:23:23] <ravenx> so if i use `mongoimport` then i use just one host? (primary?)
[11:23:28] <Derick> they do, but that's not really relevant ravenx
[11:23:32] <kurushiyama> ravenx: That's correct
[11:23:54] <ravenx> though that shouldn't impact write performance
[11:24:04] <ravenx> unless i find out that it's writing to both primary and secondary
[11:24:14] <Derick> it can't write to secondaries
[11:24:17] <ravenx> phew
[11:24:18] <Derick> only primaries are writeable
[11:24:26] <ravenx> so then the command passing isn't so much of a problem, then.
[11:24:52] <kurushiyama> Derick: shouldn't the driver autodiscover the remaining members when connecting to a replset? So basically the only reason to have multiple hosts in the connection string is there for the initial connection?
[11:25:02] <ravenx> okay at least that is ruled out. too many snake oil remedies on stackoverflow sometimes.
[11:25:11] <Derick> kurushiyama: yes, it should - and yes, only for initial connection
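(In practice that means a driver connection string lists a few seed hosts plus the replica set name, and the driver discovers the remaining members; a sketch using the host names from the mongoimport command above:)

    mongodb://server.server.com:27017,server2.server.com:27017/cluster_category?replicaSet=rs2_production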
[11:25:13] <ravenx> strangest things i see at times, i swear.
[11:25:39] <ravenx> i would be quite busy if i tried everything they threw at me
[11:26:19] <kurushiyama> Derick: Thanks, was getting cognitive dissonances ;)
[11:39:50] <dddh> there should be some way to shorten ObjectIds and obfuscate them
[12:11:15] <m1dnight_> I am using mongo to store IRC logs. Is it non-idiomatic to use a collection per channel?
[12:12:34] <Ben_1> m1dnight_: I think it's ok because you can find logs really fast by getting collections by channel name
[12:13:00] <m1dnight_> Ah. Okay. I was just wondering because my node.js app uses listCollections() to make a list of logged channels and system.indexes is in there.
[12:13:29] <m1dnight_> db.listCollections({name: {$ne: 'system.indexes'}}) works on my develop machine, but not on my server :o Ill have to investigate.
[12:14:55] <Derick> m1dnight_: sounds like a server version mismatch
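(For reference, the node driver call in question looks roughly like the sketch below; on pre-3.0 servers the driver has to fall back to querying system.namespaces instead of the listCollections command, which, if memory serves, is why the same filter can behave differently between a newer dev machine and an older server.)

    // node driver 2.x; `db` is an open Db handle
    db.listCollections({ name: { $ne: 'system.indexes' } }).toArray(function (err, collections) {
        if (err) throw err;
        console.log(collections.map(function (c) { return c.name; }));   // logged channel names
    });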
[12:48:56] <ravenx> kurushiy_: i'm able to get access to the db now :)
[13:14:28] <Ben_1> does someone know how to change the writeConcern in mongoDB async driver?
[13:15:08] <Ben_1> tried it with MongoClientSettings in combination with SocketSettings but I got an Exception
[13:15:08] <Ben_1> Exception in thread "main" java.lang.NullPointerException
[13:15:08] <Ben_1> at com.mongodb.connection.DefaultClusterFactory.create(DefaultClusterFactory.java:69)
[13:54:58] <Ben_1> cheeser: could this be the reason why my callback is not called? WriteConcern{w=null, wTimeout=null ms, fsync=null, journal=null
[15:26:20] <Ben_1> someone an idea why mongoDB have WriteConcern{w=null, wTimeout=null ms, fsync=null, journal=null}?
[15:27:19] <Ben_1> this is the default setting, I changed nothing that has to do with write concern
[15:30:05] <StephenLynx> driver defaults?
[15:30:17] <StephenLynx> the driver default is not equal to the database default, afaik.
[15:30:34] <StephenLynx> even if your driver defaults are w=null, the db will use 1
[15:33:07] <Ben_1> StephenLynx: the problem is that I use the official async driver and my callback is not called
[15:33:20] <StephenLynx> dunno
[15:33:28] <StephenLynx> i have no experience with the java driver.
[15:33:32] <StephenLynx> thats what you are using, right?
[15:33:35] <Ben_1> StephenLynx: so if the driver default is null, it's a really weird behavior for an async driver
[15:33:37] <StephenLynx> cheeser might know something.
[15:33:47] <Ben_1> yes I'm using Java
[15:33:50] <StephenLynx> he develops it, i guess
[15:34:02] <Ben_1> StephenLynx: no he does not use the async driver, that's what he said yesterday
[15:34:12] <StephenLynx> welp
[15:34:24] <StephenLynx> gg no re
[15:34:38] <Ben_1> but maybe I can mail jeff yemin
[15:39:50] <cheeser> Ben_1: I haven't worked with the async driver. i'm not sure why you'd be getting that write concern. asking the others now.
[15:40:10] <Ben_1> cheeser: I know that's why I did not ask you
[15:40:21] <cheeser> :)
[15:40:31] <cheeser> i'll do my best to track something down for you though
[15:43:16] <Ben_1> thx :P the problem occurs just in the asynchronous driver, the synchronous one returns WriteConcern{w=1, wTimeout=null ms, fsync=null, journal=null} so as StephenLynx said it's a driver default. The problem is, I cannot change that default so my callback will never be called.
[15:44:08] <Ben_1> But I think it's a wrong configuration or something, because if it's a bug inside the driver, someone else should have noticed this problem
[15:52:11] <cheeser> Ben_1: sync and async have the same default WriteConcern, fwiw.
[15:52:47] <Ben_1> I don't know but my output says async have null and sync have 1
[15:54:13] <Ben_1> System.out.println("DEBUG MESSAGE (writeConcern mongoClient): " + mongoClient.getSettings().getWriteConcern()); <<< async and returns WriteConcern{w=null, wTimeout=null ms, fsync=null, journal=null
[15:54:34] <Ben_1> System.out.println("DEBUG MESSAGE (writeConcern mongoClient): " + mongoClient.getWriteConcern()); <<< sync and returns WriteConcern{w=1, wTimeout=null ms, fsync=null, journal=null
[15:55:12] <Ben_1> using the mongodb-driver-async-3.2.2 from the maven repository but also tried 3.2.1 and it does not work either
[15:59:46] <cheeser> Ben_1: that `w=null` tells the server to use the default write concern as configured on the server. unless you've changed that config, the default is acknowledged writes.
[16:00:18] <cheeser> so that default on the driver should be fine. that null means that the driver shouldn't send any write concern and let the server use its default.
[16:01:43] <Ben_1> cheeser: ok thanks, I will check the server's default even though I didn't change anything
[16:02:52] <cheeser> try this test: https://gist.github.com/evanchooly/424b242d291925ef7baa
[16:04:58] <cheeser> even on an unacknowledged write the callback is still called: https://gist.github.com/evanchooly/9d3ab75d5269e994c6d3
[16:05:40] <Ben_1> mh then I have another problem sigh
[16:06:36] <Ben_1> the simple test returns: WriteConcern{w=null, wTimeout=null ms, fsync=null, journal=null
[16:06:54] <Ben_1> now I try the other one
[16:09:07] <Ben_1> cheeser: you are right, your example works! BUT a private variable with an anonymous inner type won't work
[16:09:28] <cheeser> come again?
[16:09:51] <Ben_1> wait I will show you an example
[16:12:42] <cheeser> that'd be helpful
[16:15:36] <Ben_1> wait, I think I found the problem: if the throwable is empty, all the System.out.printlns after the printStackTrace won't generate output.
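(A minimal sketch of the callback shape that bit here, assuming the 3.2 async driver's SingleResultCallback; the point is simply to check whether the throwable is actually set before calling printStackTrace on it. `collection` and `document` are placeholders.)

    // assumes: import com.mongodb.async.SingleResultCallback;
    collection.insertOne(document, new SingleResultCallback<Void>() {
        @Override
        public void onResult(final Void result, final Throwable t) {
            if (t != null) {
                t.printStackTrace();                          // real error from the driver
            } else {
                System.out.println("insert acknowledged");    // success path, t is null here
            }
        }
    });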
[16:17:55] <DrPeeper> good morning!
[16:18:32] <Ben_1> mh, never thought that I wouldn't see an exception in that callback. thx for your help cheeser, so it's no mongodb problem but my stupidity :P
[16:18:39] <kurushiyama> DrPeeper: Well, it's early evening here, but I take the word for the deed. ;)
[16:18:45] <Ben_1> good morning DrPeeper
[16:18:57] <cheeser> Ben_1: i prefer the term "inexperience." :D
[16:19:08] <DrPeeper> I'm going to be the 'noob of the day' :)
[16:19:18] <kurushiyama> DrPeeper: Let's see ;)
[16:19:43] <Ben_1> hehe yes that is the nicer articulation :P
[16:19:52] <DrPeeper> kurushiyama: I just decided to give mongo a shot after a long while using sql :/ So far it makes a lot more sense to me
[16:20:18] <DrPeeper> a bit of a learning curve with designing efficient schemas though
[16:20:46] <kurushiyama> DrPeeper: Both technologies have their use cases. But you are right with the modelling – that's the hardest part, probably.
[16:20:47] <Derick> less of a curve than learning proper normalisation
[16:23:09] <kurushiyama> DrPeeper: So, define your problem ;)
[16:23:34] <DrPeeper> it isn't so much of a problem right now as it is unlearning the table/column approach
[16:24:54] <DrPeeper> for example, I know what types of data will be residing inside the collection; however, I still expect the db to scream at me about 'that field doesn't exist' or 'you are trying to use a string where we expect an int'
[16:25:54] <cheeser> mongo won't care, typically
[16:28:46] <DrPeeper> cheeser: yeah.. but I want it to care :(
[16:28:47] <DrPeeper> haha
[16:29:11] <cheeser> you can do some basic schema validation in 3.2. not sure if it's enough for what you want.
[16:29:21] <cheeser> typically that's an application concern on top of mongodb
[16:31:59] <kurushiyama> DrPeeper: I couldn't agree more with cheeser: Input validation belongs to your application, for a ton of reasons.
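(For the record, a sketch of the 3.2 document validation cheeser mentioned; collection and field names are made up, and $type takes the numeric BSON codes here: 2 for a string, 16 for a 32-bit int.)

    db.createCollection("players", {
        validator: { $and: [ { name: { $type: 2 } }, { score: { $type: 16 } } ] },
        validationLevel: "strict",     // apply to all inserts and updates
        validationAction: "error"      // reject offending writes instead of just warning
    })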
[16:56:22] <DrPeeper> I think I get it
[16:58:14] <DrPeeper> I believe that embedding will be ideal
[16:58:33] <DrPeeper> now to build some kind of magic xml->mongo translator! :D
[17:13:20] <DrPeeper> StephenLynx: why
[17:14:17] <StephenLynx> because XML is cancer.
[17:15:29] <DrPeeper> i've heard a lot of people don't like it
[17:38:11] <magicantler> is there a limit to # of indices that should be used?
[17:46:14] <vane-> Hi all, I apologize if this is a dumb question, if I perform map-reduce within MongoDB, does it delete the data it aggregates? or does it just aggregate?
[18:39:57] <deathanchor> it just aggregates, as long as you don't try to write the output to the same docs :D
[18:41:41] <vane-> deathanchor, I thought so, thanks a bunch!
[18:42:51] <cheeser> or the same collection
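(A small sketch for the archive: mapReduce only reads the source collection; what happens to previously written output depends on the `out` option. Collection and field names are made up.)

    var mapFn = function () { emit(this.channel, 1); };
    var reduceFn = function (key, values) { return Array.sum(values); };
    // writes into a separate collection; "merge" keeps existing docs there, "replace" would overwrite them
    db.logs.mapReduce(mapFn, reduceFn, { out: { merge: "log_counts" } });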
[21:07:53] <noahfx> hello o/
[21:08:51] <noahfx> did you ever have problems with a looped mongorestore?
[21:12:01] <joannac> what do you mean by "looped"?
[21:13:21] <noahfx> joannac: I am restoring a collection, it starts, gets to 100% restored and then starts all over again
[21:13:34] <noahfx> with the same collection, same bson
[21:14:43] <noahfx> joannac: https://paste.debian.net/392419/
[21:17:40] <joannac> what do the server logs say if you set logLevel 1?
[21:38:49] <noahfx> joannac: https://paste.debian.net/392445/
[21:38:57] <noahfx> the collection name this time is kpis
[21:57:10] <noahfx> joannac: I disabled the replica set and it started to work as expected