[01:06:00] <cheeser> well, apparently some things still need to be said out loud ;)
[02:06:32] <Freman> yeh like "prelive is live - not staging, not testing - it's an environment to test the docker containers, to make sure they don't all throw 500s and break the live website when we switch the version" - doesn't stop them from testing in prelive
[06:44:36] <kryo_> hi, is there an atomic operation to pull an element from one array and push it to another array? (within the same document)
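kryo_'s question goes unanswered in the log, but a single update combining `$pull` and `$push` on two different array fields is atomic at the document level. A minimal Python sketch of building such an update spec; the field names, the value, and the pymongo call in the comment are illustrative assumptions, not from the chat:

```python
# Sketch: moving a value between two array fields of the SAME document in
# one update. A single update is atomic per document, so combining $pull
# and $push moves the element without a race. (This moves by value; if you
# first need to read the value, you'd reach for findOneAndModify instead.)

def move_element_update(src_field, dst_field, value):
    """Build an update spec that pulls `value` from one array field and
    pushes it onto another in the same document."""
    return {"$pull": {src_field: value}, "$push": {dst_field: value}}

update = move_element_update("pending", "done", 42)
print(update)
# With pymongo this would be applied as (hypothetical names):
#   collection.update_one({"_id": doc_id}, update)
```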
[09:15:53] <mylord> how does insert know if there’s already a matching record? does it check every single property for equivalence?
[09:33:13] <ravenx> can you have a sharded replica set
[09:33:34] <ravenx> or do i set up my mongodb cluster either replica-set OR sharded?
[09:57:18] <Derick> ravenx: each shard *should* be a replica set
[10:09:05] <ravenx> and this takes anywhere from 8 hours to 18 hours.
[10:09:30] <ravenx> I tried turning off journalling and indexing as well, but it didn't seem to make matters any better. does anyone have any ideas how i can improve the import speed?
[10:21:34] <ravenx> do you want to help me with an import problem
[10:23:15] <Ben_1> cheeser: sorry I left yesterday but it was hometime :P the point was, I'm using the async driver; the robomongo GUI doesn't show any collection but the mongo shell does. Anyway, when using insertMany or insertOne the async callback is not called, so I think it's not a robomongo problem
[10:27:18] <ravenx> and i isolated it out as not a network or disk I/O problem, as i copied the dataset from one server to another in 45 minutes.
[10:27:39] <ravenx> kurushiyama: Version is 3.2 and i'm storing it as a replicaset
[10:27:57] <kurushiyama> ravenx: so WiredTiger, then.
[10:29:35] <kurushiyama> ravenx: So copying the dataset leaves out a few things, not the least being that you copy bulk data instead of processing small parts ;) It could well still be the number of operations which is killing you.
[10:30:28] <ravenx> true, there is no processing of small parts in that equation, so that is not very telling. i just wanted to see if it was my network that was unbearably slow though.
[10:30:51] <ravenx> when you say the number of operations, you mean that the time it goes from HDFS into the mongodb, and mongo working on each document?
[10:31:07] <kurushiyama> ravenx: That's how import kind of works
[10:32:02] <kurushiyama> ravenx: "it takes anywhere from 8 to 18 hours" you do this regularly?
[10:32:15] <ravenx> hahaha, no, i have just been letting it run these past few days.
[10:32:23] <ravenx> just run it once before i leave work then look at it
[10:33:30] <ravenx> kurushiyama: would getting the HDFS data, splitting them up and then inserting them with 3-5 different instances of mongoimport help matters? or is that too naive a solution?
[10:34:05] <kurushiyama> ravenx: Yes, since we still would be talking of a single replica set.
[10:34:33] <Ben_1> ravenx: I think this would help
[10:34:51] <Ben_1> kurushiyama: do you have experience with mongodb async driver?
[10:35:05] <kurushiyama> Ben_1: As written yesterday: No.
[10:35:54] <ravenx> Ben_1: wait you're saying that splitting up the insert between 3-5 different instances would help? o_o
[10:36:09] <Ben_1> I'm wondering why nobody is using this driver because I think it could push mongodb's performance a lot
[10:36:16] <ravenx> but i suppose i introduce an intermediate step, which is to copy the hdfs data, split it, then run mongoimport ;/
[10:37:43] <Ben_1> ravenx: maybe I'm naive too but yes I think that, but it's just a gut feeling
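The split-then-parallel-import idea discussed above can be sketched as: round-robin a JSON-lines dump into N chunk files, one per mongoimport process. The chunk count and file names are assumptions; this only shows the splitting step, not the process launching:

```python
# Sketch of "split the export and run several mongoimport instances":
# distribute a JSON-lines dump round-robin into n buckets, one per worker.

def split_lines(lines, n):
    """Distribute an iterable of lines round-robin into n buckets."""
    buckets = [[] for _ in range(n)]
    for i, line in enumerate(lines):
        buckets[i % n].append(line)
    return buckets

docs = ['{"_id": %d}' % i for i in range(10)]
chunks = split_lines(docs, 3)
print([len(c) for c in chunks])  # -> [4, 3, 3]
# Each chunk would then be written to its own file and fed to a separate
# `mongoimport --file chunk-k.json` process (names hypothetical).
```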
[10:37:48] <kurushiyama> ravenx: You might want to change the writeConcern for the operations. Add --write-concern "{w:0,j:true}" to the import.
[10:38:12] <ravenx> kurushiyama: let me read up on that
[10:38:14] <kurushiyama> It would help if the bottleneck was the JSON parsing.
[10:38:18] <ravenx> in that example, are you setting it as 0?
[10:39:43] <kurushiyama> ravenx: What you basically do is say: "Acknowledge the receipt of the document, parse it for its validity, write the according insert operation to the journal and return". The default would be to do this on the majority of replica set members.
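Putting kurushiyama's `--write-concern` suggestion into a complete command line (database, collection, and file names are placeholders, and `w:0` trades acknowledgement for throughput):

```python
# Sketch of a full mongoimport invocation with the relaxed write concern
# suggested above. shlex.split shows how the shell would tokenize it,
# keeping the write-concern document as a single argument.
import shlex

cmd = (
    "mongoimport --db mydb --collection mycoll "
    "--write-concern '{w:0,j:true}' --file part-00000.json"
)
argv = shlex.split(cmd)
print(argv)
```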
[10:41:03] <kurushiyama> Actually, I suspect the bottleneck to be the JSON serialization.
[10:48:55] <ravenx> hm...so could it be that the UNIX pipe is the thing that's making this go down south?
[10:50:02] <kurushiyama> ravenx: Have you already created indices on said collection? No, the pipe is good for more throughput than the network... Don't worry, was just my method of ruling out possibilities. Thinking with my fingers
[10:50:54] <ravenx> kurushiyama: i have not created any indices on the said collection
[10:53:56] <kurushiyama> ravenx: I don't know poo about hadoop (need to change that). But we can rule out that it is a MongoDB problem. Have you had any startup warnings from MongoDB? Open file limits, etc...
[11:09:33] <kurushiyama> <consultant mode="smartypants">There should be a documentation about the access credentials</consultant>
[11:11:07] <ravenx> i wonder if this'll help: https://docs.mongodb.org/ecosystem/tools/hadoop/
[11:13:36] <kurushiyama> ravenx: Given the fact that I do not know poo about hadoop, I'd say yes, if you can somehow manage to instruct hadoop to copy the database. In case creating a specialized tool is ok for you, you'd just need to iterate over the hadoop data points and use a bulk insert for MongoDB...
[11:13:45] <kurushiyama> ravenx: Should not be too much work.
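kurushiyama's "iterate over the data points and use a bulk insert" suggestion can be sketched as a small batching helper. The batch size and the pymongo call in the comment are assumptions, not from the chat:

```python
# Minimal sketch: group an arbitrary stream of documents into fixed-size
# batches so each MongoDB round trip inserts many documents at once.
def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

rows = ({"n": i} for i in range(2500))
sizes = [len(b) for b in batched(rows, 1000)]
print(sizes)  # -> [1000, 1000, 500]
# With pymongo, each batch would go through (hypothetical names):
#   collection.insert_many(batch, ordered=False)
```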
[11:15:05] <ravenx> thanks for your input kurushiyama
[11:16:57] <kurushiyama> ravenx: If you come back with some more hard data, we can check whether it is MongoDB. mongoimport is not a very fast tool, but 38GB/18h, i.e. about 2GB/h, lets me assume that the problem is somewhere else.
[11:17:24] <ravenx> kurushiyama: gotcha. once i get access i will try and work this out.
[11:17:35] <ravenx> kurushiyama: agreed, i've read that mongoimport is not the fastest tool around.
[11:19:22] <kurushiyama> ravenx: Actually, I have written specialized import tools quite a bit. Should not be too complicated to adapt them if you talk Go.
[11:24:26] <ravenx> so then the command passing isn't so much of a problem either.
[11:24:52] <kurushiyama> Derick: shouldn't the driver autodiscover the remaining members when connecting to a replset? So basically the only reason to have multiple hosts in the connection string is for the initial connection?
[11:25:02] <ravenx> okay, at least that is ruled out. too many snake oil remedies on stack overflow sometimes.
[11:25:11] <Derick> kurushiyama: yes, it should - and yes, only for initial connection
[11:25:13] <ravenx> strangest things i see at times, i swear.
[11:25:39] <ravenx> i would be quite busy if i tried everything they threw at me
[11:26:19] <kurushiyama> Derick: Thanks, was getting cognitive dissonances ;)
[11:39:50] <dddh> there should be some way to shorten ObjectIds and obfuscate them
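dddh's idea is doable outside the database: an ObjectId is 12 raw bytes, usually rendered as 24 hex characters, so URL-safe base64 over the raw bytes gives a 16-character token. A hedged sketch, stdlib only; note this is obfuscation, not encryption, since it is trivially reversible:

```python
# Shorten a 24-hex-char ObjectId to a 16-char URL-safe token and back.
# 12 bytes encode to exactly 16 base64 characters with no padding.
import base64
import binascii

def shorten(oid_hex):
    raw = binascii.unhexlify(oid_hex)          # 24 hex chars -> 12 bytes
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")

def unshorten(token):
    raw = base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))
    return binascii.hexlify(raw).decode()

oid = "56f3d5c89f9c1a3b4d5e6f70"  # example value, not from the chat
tok = shorten(oid)
print(tok, len(tok))
assert unshorten(tok) == oid
```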
[12:11:15] <m1dnight_> I am using mongo to store IRC logs. Is it non-idiomatic to use a collection per channel?
[12:12:34] <Ben_1> m1dnight_: I think it's ok because you can find logs really fast by looking up collections by channel name
[12:13:00] <m1dnight_> Ah. Okay. I was just wondering because my node.js app uses listCollections() to make a list of logged channels and system.indexes is in there.
[12:13:29] <m1dnight_> db.listCollections({name: {$ne: 'system.indexes'}}) works on my develop machine, but not on my server :o Ill have to investigate.
[12:14:55] <Derick> m1dnight_: sounds like a server version mismatch
[12:48:56] <ravenx> kurushiy_: i'm able to get access to the db now :)
[13:14:28] <Ben_1> does someone know how to change the writeConcern in mongoDB async driver?
[13:15:08] <Ben_1> tried it with MongoClientSettings in combination with SocketSettings but I got an Exception
[13:15:08] <Ben_1> Exception in thread "main" java.lang.NullPointerException
[13:15:08] <Ben_1> at com.mongodb.connection.DefaultClusterFactory.create(DefaultClusterFactory.java:69)
[13:54:58] <Ben_1> cheeser: could this be the reason why my callback is not called? WriteConcern{w=null, wTimeout=null ms, fsync=null, journal=null}
[15:26:20] <Ben_1> does someone have an idea why mongoDB has WriteConcern{w=null, wTimeout=null ms, fsync=null, journal=null}?
[15:27:19] <Ben_1> this is the default setting, I changed nothing that has to do with the write concern
[15:40:31] <cheeser> i'll do my best to track something down for you though
[15:43:16] <Ben_1> thx :P the problem occurs just in the asynchronous driver; the synchronous one is returning WriteConcern{w=1, wTimeout=null ms, fsync=null, journal=null}, so as StephenLynx said it's a driver default. The problem is, I cannot change that default, so my callback will never be called.
[15:44:08] <Ben_1> But I think it's a wrong configuration or something, because if it's a bug inside the driver, someone else should have noticed this problem
[15:52:11] <cheeser> Ben_1: sync and async have the same default WriteConcern, fwiw.
[15:52:47] <Ben_1> I don't know, but my output says async has null and sync has 1
[15:55:12] <Ben_1> using mongodb-driver-async-3.2.2 from the maven repository, but I also tried 3.2.1 and it does not work either
[15:59:46] <cheeser> Ben_1: that `w=null` tells the server to use the default write concern as configured on the server. unless you've changed that config, the default is acknowledged writes.
[16:00:18] <cheeser> so that default on the driver should be fine. that null means that the driver shouldn't send any write concern and let the server use its default.
[16:01:43] <Ben_1> cheeser: ok thanks, I will check the server's default even though I didn't change anything
[16:02:52] <cheeser> try this test: https://gist.github.com/evanchooly/424b242d291925ef7baa
[16:04:58] <cheeser> even on an unacknowledged write the callback is still called: https://gist.github.com/evanchooly/9d3ab75d5269e994c6d3
[16:05:40] <Ben_1> mh, then I have another problem *sigh*
[16:06:36] <Ben_1> the simple test returns: WriteConcern{w=null, wTimeout=null ms, fsync=null, journal=null}
[16:18:32] <Ben_1> mh, never thought that I wouldn't see an exception in that callback. thx for your help cheeser, so it's no mongodb problem but my stupidity :P
[16:18:39] <kurushiyama> DrPeeper: Well, its early evening here, but I take the word for the deed. ;)
[16:19:43] <Ben_1> hehe yes that is the nicer articulation :P
[16:19:52] <DrPeeper> kurushiyama: I just decided to give mongo a shot after a long while using sql :/ So far it makes a lot more sense to me
[16:20:18] <DrPeeper> a bit of a learning curve with designing efficient schemas though
[16:20:46] <kurushiyama> DrPeeper: Both technologies have their use cases. But you are right with the modelling – that's the hardest part, probably.
[16:20:47] <Derick> less of a curve than learning proper normalisation
[16:23:09] <kurushiyama> DrPeeper: So, define your problem ;)
[16:23:34] <DrPeeper> it isn't so much of a problem right now as it is unlearning the table/column approach
[16:24:54] <DrPeeper> for example, I know what types of data will be residing inside the collection; however, I still expect the db to scream at me about 'that field doesn't exist' or 'you are trying to use a string where we expect an int'
[17:15:29] <DrPeeper> i've heard a lot of people don't like it
[17:38:11] <magicantler> is there a limit to # of indices that should be used?
[17:46:14] <vane-> Hi all, I apologize if this is a dumb question, if I perform map-reduce within MongoDB, does it delete the data it aggregates? or does it just aggregate?
[18:39:57] <deathanchor> it just aggregates, as long as you don't try to write the output to the same docs :D
[18:41:41] <vane-> deathanchor, I thought so, thanks a bunch!