[00:47:35] <Forbidd3n> Hey everyone. Quick question. I am loading a csv file into multiple collections. If the first few rows of the csv file are duplicates, would it be best to create an array of all the data and then loop the array to insert data and only do one check if a line exists, or would it be alright to do the calls to the collection to see if the value exists, and if so then grab the id for the next collection insert?
[00:48:49] <Forbidd3n> School,Subject,Grade,Teacher - the school, subject and grade are repeated for each different teacher name
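The first approach Forbidd3n describes can be sketched as follows: read the CSV once and cache lookups in a dict, so each unique school/subject/grade triple is checked (and would be inserted) only once instead of querying the collection per row. The field names follow the CSV header given above; the use of plain dicts in place of collection inserts is an assumption for illustration.

```python
import csv
import io

def load_rows(csv_text):
    """Parse School,Subject,Grade,Teacher rows, deduplicating the
    school/subject/grade triple so each unique triple is checked once."""
    seen = {}      # (school, subject, grade) -> assigned id
    teachers = []  # one entry per row, referencing the triple's id
    next_id = 1
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["School"], row["Subject"], row["Grade"])
        if key not in seen:
            # In real code this is where you would insert into the
            # first collection and capture its generated _id.
            seen[key] = next_id
            next_id += 1
        teachers.append({"name": row["Teacher"], "classId": seen[key]})
    return seen, teachers
```

With this shape, duplicate leading rows cost a dict lookup rather than a round-trip to the database.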
[01:05:00] <sumobob> quick question, I have a model with an array of objects with start: Date, and end: Date, given another start and end date how can i find all the models where the query start and end are not in range
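For sumobob's question, one way to express "no element of the array overlaps the query range" is to negate an `$elemMatch`: two ranges overlap iff `start <= qEnd` and `end >= qStart`, so wrapping that in `$not` matches documents with no overlapping element. A sketch that just builds the filter document (the array field name `ranges` is an assumption):

```python
def no_overlap_query(q_start, q_end, field="ranges"):
    """Build a find() filter matching documents whose array of
    {start, end} sub-documents contains NO range overlapping
    [q_start, q_end].  Overlap test: start <= q_end AND end >= q_start,
    negated via $not on an $elemMatch."""
    return {field: {"$not": {"$elemMatch": {
        "start": {"$lte": q_end},
        "end": {"$gte": q_start},
    }}}}
```

The same dict works with datetime values; the comparison operators are type-agnostic on the Python side.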
[07:22:40] <chris|> when I stop the balancer, will that interrupt a currently active balancer run or will the balancer complete the current run and then not get scheduled again?
[07:25:27] <kurushiyama> chris| The other way around. If you try to stop the balancer, this operation will block until the current balancer run is done. https://docs.mongodb.com/manual/tutorial/manage-sharded-cluster-balancer/#disable-the-balancer
[07:25:40] <kurushiyama> chris| However, _why_ do you want to disable the balancer?
[07:27:38] <kurushiyama> chris| Please clarify. Your backups should be unaffected by the balancer.
[07:28:31] <chris|> kurushiyama: stopping the balancer is documented for sharded clusters: https://docs.mongodb.com/manual/tutorial/backup-sharded-cluster-with-filesystem-snapshots/#disable-the-balancer
[07:29:49] <kurushiyama> chris| Ah, FS snapshots. Well, in this case it makes sense. However, in the long run, I rather suggest to set a balancer window and take the snapshots automatically _outside_ said windows.
[07:31:29] <chris|> yes, I am thinking about doing that
[07:46:53] <cppking> hello, my mongodb cluster balancer is not working , https://bpaste.net/show/258289e040b4 anybody help me ?
[07:47:52] <kurushiyama> cppking It is working. Otherwise there would be no errors at all.
[07:54:12] <chris|> the docs say "operations have propagated to the majority of voting nodes" for majority write concern. That seems strange to me, should that not read "majority of replica set members" ?
[07:57:59] <cppking> kurushiyama: you mean the balancer is in a normal state? but right now it is not working?
[07:58:37] <kurushiyama> cppking It has errors. While this surely is a thing to monitor and debug, it does run and already has done some balancing.
[08:01:23] <hentis> We're using TTL expiry on some collections, but I need to be able to restore old backups without losing that data. As part of our DR policy we use mongodb cloud backup services, which restores data to /data. Is there a way to disable the TTL expiry, or to run collmod to remove the expiry from the index, until such time as we want to expire the data manually?
[08:01:32] <cppking> kurushiyama: where is the error log placed?
[08:02:52] <kurushiyama> cppking That depends on your configuration and OS. Please see the docs for that.
[08:04:03] <hentis> Or can the background process that expires the data be disabled without having to collmod the index ?
[08:04:16] <kurushiyama> hentis Uhm... not sure, but shouldn't it be "konnichi wa, Kurushiyama-san". My Japanese is worse than my French, and that means something. ;)
[08:05:10] <kurushiyama> hentis Digging in my brain, I know there is a TTL background process.
[08:06:30] <kurushiyama> hentis If it is a temporary thingy, I would probably disable the TTL monitor altogether: https://docs.mongodb.com/manual/reference/parameters/#param.ttlMonitorEnabled
[08:09:46] <hentis> kurushiyama: haha my japanese is very crappy .. been way too long since I've actively used any of it ... never mind typed it. thanks for the info :)
[08:10:17] <kurushiyama> hentis You are welcome, hentis-sama ;)
[08:43:29] <Ange7> is it possible to group by regex with mongo ? i have documents with id field which contains for example : AA12345, AA54321, BB12345, BB54321... i want count number id started by « AA » and count number id started by « BB »
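Ange7's counting-by-prefix question maps to a `$group` on a substring of the id field. A sketch of the pipeline as a Python list of stages (the field name `id` is taken from the question; whether to use `$substr` or the newer `$substrCP` depends on server version):

```python
def count_by_prefix(prefix_len=2, id_field="$id"):
    """Aggregation pipeline counting documents per id prefix,
    e.g. one bucket for ids starting 'AA' and one for 'BB'."""
    return [
        {"$group": {
            "_id": {"$substr": [id_field, 0, prefix_len]},
            "count": {"$sum": 1},
        }},
    ]
```

Passed to `collection.aggregate(...)`, this would yield one document per distinct two-character prefix with its count.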
[09:02:23] <eugenmayer> hello. Suddenly, installing mongodb 3.2.7 under wheezy does not work, since it depends on libc >= 2.14 and wheezy only has 2.13 - how can this happen, since these packages are wheezy specific?
[09:02:46] <eugenmayer> i am using deb http://repo.mongodb.org/apt/debian wheezy/mongodb-org/stable main as the source
[09:03:45] <eugenmayer> following this guide : https://docs.mongodb.com/manual/tutorial/install-mongodb-on-debian/
[09:06:20] <eugenmayer> seems like 3.2.6 works fine with wheezy, while 3.2.7 does not. Is 3.2.7 jessie only - even though there is no support for jessie yet (according to the docs)?
[09:26:43] <ramnes> so, what about that article posted everywhere
[09:27:01] <ramnes> is MongoDB going to react in a way or another or just take bashing?
[09:28:31] <eugenmayer> ramnes: what are you talking about?
[10:10:44] <kurushiyama> I immediately tried to beat the shit out of my test cluster, with not a single operation failing.
[10:17:56] <ramnes> kurushiyama: this is exactly what I was talking about just before :p
[10:20:28] <kurushiyama> ramnes Just came in. What was it? I am still having a hard time reproducing the problem, with 10M docs and 10 concurrent goroutines modifying the data, and an equal number of goroutines reading the data...
[10:21:37] <ramnes> kurushiyama: not much, I just asked if MongoDB was going to do anything and Derick said that he flagged it internally
[10:22:27] <kurushiyama> Well, I'll publish the test program I wrote in the next few hours.
[10:53:22] <kurushiyama> @Derick Well, I'll let you know when I got my little test project finished. It generates noise and reordering on purpose, and very much so.
[10:57:41] <Derick> https://en.wikipedia.org/wiki/Isolation_%28database_systems%29#Phantom_reads is interesting too
[11:02:12] <kurushiyama> Derick Another thing I have thought of (because the technical details were not disclosed) is that he may have run into SERVER-3645 in some way. But admittedly, I haven't had the time to think it through, yet.
[11:04:12] <Derick> kurushiyama: it'd be great if you can publically post your musings at some point :)
[11:04:42] <kurushiyama> Derick Sorry, not a native speaker: "musings"?
[11:07:45] <kurushiyama> Derick Ah! You bet. That is the reason why I started the project. Actually, it is part of a bigger project, called mongodb-bc. Go figure ;P
[11:10:44] <kurushiyama> Darn, I created so much noise that my mongos was OOMed
[11:11:25] <kurushiyama> Zelest Will do. Please link an appropriate post and it gets on my list.
[11:11:45] <Zelest> hehe, only forwarding comments my "friends" use on IRC to hate on mongo :(
[11:14:31] <kurushiyama> tl;dr No, you do not, since if you need to make sure that data was persisted, you should have a proper write concern; only after it is satisfied does the operation return.
[11:29:47] <Derick> got to go now, back in ~2 hours.
[11:30:35] <Derick> eugenmayer: there is: https://jira.mongodb.org/browse/SERVER-24459
[11:34:19] <eugenmayer> Derick: not that one. 3.2.7 depends on libc >= 2.14 .. but wheezy has 2.13
[11:50:52] <eugenmayer> Derick: maybe there is jessie planned and someone created the wrong packages?
[11:51:37] <eugenmayer> https://packages.debian.org/de/wheezy/libc6 .. max 2.13 and no backports. Upgrading libc6 without backports is madness, so i do not think this is suggested
[11:51:52] <eugenmayer> jessie though has 2.19 https://packages.debian.org/de/jessie/libc6
[12:19:37] <kurushiyama> sector_0 No GUI, but most feature-rich.
[12:20:30] <kurushiyama> urbanizator Any questions about the locking?
[12:21:30] <sector_0> kurushiyama, so the shell is the accepted way to administrate a MongoDB server?
[12:21:48] <urbanizator> kurushiyama nope it's all clear now
[12:21:55] <kurushiyama> sector_0 Accepted? Well, I'd say it is the preferred way.
[12:22:26] <kurushiyama> sector_0 For high level stuff, there is CloudManager.
[12:23:56] <kurushiyama> Derick But as far as I can see, count actually utilizes the index. So, imho, it is not the document which is in undefined state, but the index.
[12:25:56] <kurushiyama> Derick Which should be provable by doing an iteration over the query result, no?
[12:30:26] <kurushiyama> urbanizator The locked _field_ (terminology is important, there is no such thing as a column in MongoDB) is one you define, yes.
[13:10:04] <kurushiyama> ramnes Derick This is the first version. Still has some problems, but it proves that it is a problem of .count() https://github.com/mwmahlberg/incomplete-returns/blob/master/main.go
[13:10:40] <sector_0> on a scale of 1 to 10 how easy is it to go from noob to pro in MongoDb
[13:10:45] <ramnes> kurushiyama: what's your conclusion then?
[13:10:59] <ramnes> kurushiyama: is it really an issue?
[13:12:45] <Derick> kurushiyama: afaik count is a special index scan or something
[13:13:34] <kurushiyama> I do not trust count() too much anyway, since as of now it is rather useless in sharded environments (see SERVER-3645 for details). As far as I can see, it is only the index which might be in an undefined state. The program freezes all modifying operations as soon as a mismatch between the expected documents and the number returned by .count() is found. As of now, iterating through the result set always gives the correct number of documents.
[13:14:09] <Derick> kurushiyama: for that, you need to talk to one of the server guys really - I don't know
[13:14:33] <ramnes> kurushiyama: so basically, you can't reproduce, right?
[13:14:56] <kurushiyama> The program has some locking problems, which I need to debug. Once a mismatch is found, the locking causes basically every count to fail in a way. But that may be worth a look in itself.
[13:15:21] <kurushiyama> ramnes I can reproduce that .count() might return wrong information in write-heavy environments.
[13:15:39] <kurushiyama> ramnes I can not reproduce that the actual result set is wrong.
[13:15:48] <kurushiyama> ramnes Quite the contrary.
[13:23:18] <kurushiyama> Derick I will spice up and document the code a bit. Is there a way to reach out to your "server guys"?
[13:29:49] <jayjo_> I have figured out that aws cli or s3cmd cannot stream recursively (at least effectively). What would be the workaround for a problem like this? Do I need to write a script that recursively finds files in the bucket and streams to mongoimport?
[13:31:04] <saml> you want to insert json files from http server to mongodb?
[13:31:39] <saml> for js in all_json(): mongo.yourdb.yourcollection.insert(js)
[13:34:17] <kali> jayjo_: command line interface tools for s3 are often not that helpful for scripting. many scripting languages have better interfaces (python, node, ...). whatever you're comfortable with
[13:40:48] <jayjo_> so I should just roll my own in python and use boto
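If jayjo_ does roll his own in Python, the S3 side is sketched here only as a comment (assuming boto3, per kali's suggestion; any S3 client works); the testable pieces are parsing JSON lines and batching documents for `insert_many`, which replaces piping each file through mongoimport:

```python
import json
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items, sized for insert_many() calls."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def parse_json_lines(lines):
    """Turn an iterable of JSON lines (one document per line) into dicts,
    skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Hypothetical glue with boto3 (bucket/collection names are assumptions):
#   s3 = boto3.client("s3")
#   for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
#       for obj in page.get("Contents", []):
#           body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"]
#           for batch in batched(parse_json_lines(body.iter_lines()), 1000):
#               collection.insert_many(batch)
```

Batching keeps memory bounded while still amortizing round-trips, which is the main thing mongoimport would have bought.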
[13:55:36] <kurushiyama> As for the verification of MongoDB having an undefined state under certain conditions: well, I actually found out that I did not have a locking problem at all. What happens is that after the first count() returns a wrong value, all concurrent writers are paused. As soon as the result set is verified, those concurrent writers are unpaused _simultaneously_ (almost), resulting in a lot of concurrent writes again, causing .count() to return a wrong
[13:55:36] <kurushiyama> value again, and it restarts from there. So, my conclusion holds and is even reinforced: it is a problem of the way .count() behaves, not of the result sets.
[14:10:56] <kurushiyama> ramnes Assuming you have Go installed and a proper GOPATH set and all, it should work with "go get github.com/mwmahlberg/incomplete-returns", after which $GOPATH/bin/incomplete-returns would be the executable. Append "-h" for the options.
[14:11:27] <kurushiyama> if you wish, I can provide a linux executable, which might be easier.
[14:12:01] <ramnes> I'll give it a shot this evening
[14:12:45] <kurushiyama> Well, I might update it, so you better run "go get -u ..." once in a while.
[15:07:49] <dino82> It appears mms support is down? I can't send in a ticket.
[15:13:37] <dino82> Apparently you can't change the user the mms-agent runs as? mongod vs mongodb
[17:01:30] <bratner> Hi people! Where does one go to ask C++ driver questions?
[17:04:51] <cheeser> here or the mongodb-users list
[17:05:10] <cheeser> probably the list would be better. i don't think any of the devs are here and most of the users here use other languages
[17:05:22] <StephenLynx> yeah, few people use the C or C++ drivers
[17:05:43] <StephenLynx> and I don't think their devs hang around here
[17:33:40] <_1009> Hey all - just wanted to check: for shards, is it recommended to put config nodes on a dedicated node? I noticed that deployments in MongoDB Cloud do so by default. If that is the case, why?
[17:34:51] <cheeser> you don't want those nodes bogged down with external loads.
[17:37:55] <GothAlice> _1009: Configuration servers are queried by mongos query routers to determine which actual data nodes contain the data being queried. It's important that these answers are fast.
[17:51:06] <_1009> OK, thanks - in my case, all data is sharded by id, but I'm always searching by geolocation - does it still hit the config router then, as it will query all nodes for (their part) of the geolocation?
[17:59:29] <kurushiyama> _1009 Shard key not in query => mongos has no clue about the key ranges of the queried field => Query is sent to all shards.
[18:14:50] <_1009> Awesome, that's what I thought - thanks so much!
[19:23:33] <speedio> i have the following mongoose schema for my mongo db http://dpaste.com/0FEPM52 , and im trying to add data to it with the following command http://dpaste.com/2DWQVA6 , but for some reason the nested objects wont get added.. what am i doing wrong?
[19:24:22] <StephenLynx> i recommend you don't use mongoose.
[19:32:46] <StephenLynx> I know its performance is piss poor
[19:32:56] <StephenLynx> less than 20% of the native driver
[19:33:04] <StephenLynx> it doesn't handle dates properly
[19:33:17] <StephenLynx> and uses a crappy cast on ObjectIds
[19:33:34] <StephenLynx> not to mention that it forces you to use mongo wrong.
[19:33:41] <StephenLynx> in a way it wasn't designed to be used
[19:34:26] <alexi5> what's up with the nodejs community? been hearing some bad news about a library consisting of only a string padding function that broke a lot of apps
[19:44:23] <StephenLynx> >Our goal is to make publishing and installing packages as frictionless as possible. In this case, we believe that most users who would come across a kik package, would reasonably expect it to be related to kik.com. In this context, transferring ownership of these two package names achieves that goal.
[19:44:48] <StephenLynx> npm never mentioned anything about "hm, yeah, that could give us legal trouble"
[19:45:54] <StephenLynx> and fuck what lawyers think, they are just greedy rats.
[19:46:14] <StephenLynx> they would sue you for looking at them the wrong way
[20:28:27] <Forbidd3n> How do I store _id in another collection->document in mongo? refId = new MongoId('the id of the reference I want to store') or do I just store this - refId = 'the id of the reference I want to store'
[20:28:39] <Forbidd3n> so the mongoid object or the mongoid string
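The general rule behind Forbidd3n's question: store the reference in the same type as the target's `_id`. If the `_id` is an ObjectId, store an ObjectId (in PHP, `new MongoId(...)`; in Python, `bson.ObjectId(...)`), not its hex string, or equality lookups like `{"_id": refId}` will never match. A small pure-Python helper (no driver required) for recognizing the hex-string form that then needs converting back to the native type before querying; the lowercase-hex assumption matches what drivers emit:

```python
def looks_like_objectid_hex(s):
    """True for the 24-character lowercase-hex string form of an
    ObjectId.  Useful when deciding whether an incoming string should
    be converted back to the native ObjectId type before querying."""
    return (isinstance(s, str) and len(s) == 24
            and all(c in "0123456789abcdef" for c in s))
```

So: accept strings at your API boundary if you must, but convert before building the query document.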
[20:31:51] <alexi5> have you guys heard of developers arguing for using postgresql to store document data, given that it has json and jsonb field types that can be queried?
[20:33:12] <StephenLynx> it takes a lot of ignorance to think mongo just stores json.
[20:33:41] <StephenLynx> and that its features are limited by what and how it stores data.
[20:35:19] <alexi5> what I like about mongodb is the ability to map application objects to documents easily
[20:36:20] <alexi5> what I realized is that systems can use both a document database and a relational one based on their requirements, instead of staying with just one
[20:55:55] <Forbidd3n> is it possible to set unique on nested items?
[21:00:23] <Forbidd3n> If I have a list of ports for example and many ships go to the same ports, would it be best to put the ports in its own collection and reference it by id versus duplicating the port names in each document
[21:00:50] <Forbidd3n> I am guessing that is best to have separate port collection so if the port name changes I only have to update it in one place versus all documents that have that port name
[21:36:49] <kurushiyama> tuvo I have run extensive tests.
[21:38:10] <kurushiyama> tuvo https://github.com/mwmahlberg/incomplete-returns actually proves that this is only a misbehaviour of the count method, which was presumably used to count the number of results.
[21:38:49] <kurushiyama> tuvo when you look at the channel logs, I have made some statements regarding that, too
[21:39:02] <tuvo> kurushiyama: "I have run extensive tests" doesn't mean much. He had huge input, lots of traffic, happened rarely.
[21:39:14] <tuvo> I believe him, after looking at the details
[21:39:25] <kurushiyama> tuvo Have a look at the program
[22:13:33] <Boomtime> "When writing standard output, mongodump does not write the metadata that writes in a <dbname>.metadata.json file when writing to files directly."
[22:14:00] <Boomtime> metadata contains some details about the collection, mainly index definitions
[22:14:40] <Doyle> Seems like a better way to archive would be to dump, then zip the folder after
[22:45:06] <Forbidd3n> How would I set a reference to a nested object from another collection?
[22:45:31] <Forbidd3n> Do I create an ObjectID on the nested item and store it on the referencing collection document(s)?
[22:56:52] <Forbidd3n> If I have to have nested items reference documents in another collection, would it be best to have the nested items in their own collection, or add an id to the nested items and add that id to the other collection's documents?
[22:57:56] <Boomtime> @Forbidd3n: what are the nested documents? can they just be embedded?
[22:59:53] <Forbidd3n> Boomtime: for example. I have ships, lines, ports, itineraries. I was going to embed ships into lines, link each line to ports, and then link ship and port to itinerary
[23:00:31] <Forbidd3n> The reason I have ports as another collection is because I didn't want to duplicate the port name in each line item embedded
[23:00:58] <Forbidd3n> I can embed itineraries, but need to link the embedded itinerary to the port by id
[23:01:06] <Forbidd3n> Boomtime: does this kinda make sense?
[23:04:42] <Boomtime> what is the advantage of avoiding putting the port details directly in the lines?
[23:04:44] <Forbidd3n> well, if I put the same name on all cruise line items and the name changes, I would have to filter through and update them all
[23:04:57] <Boomtime> you can still have the port collection if you like as 'master' reference or something
[23:05:20] <Boomtime> how often do the names change?
[23:05:26] <Forbidd3n> correct, I was thinking of having a collection of ports with _id and link to it
[23:05:39] <Boomtime> how often do port names change? daily?
[23:05:54] <Forbidd3n> the way it is laid out is a cruise line has many ships, ships go to many different ports, itinerary is the date at a specific port
[23:06:03] <Boomtime> how often do port names change? daily?
[23:06:23] <Forbidd3n> port names don't change, but where they go in the itinerary, the port, may change, so the id of the port in the itinerary will have to be updated
[23:06:50] <Forbidd3n> or at least not often; rarely do they change, but of course anything can change
[23:07:00] <Boomtime> so it doesn't matter if the itinerary contains port names and other details, or _ids as well, because those need updating anyway?
[23:07:31] <Forbidd3n> not sure I follow that last comment
[23:07:34] <Boomtime> meanwhile, embedding only the _id of the port means you now have to go look up the details in another collection, every query
[23:08:01] <Boomtime> why not design your schema to carry the information you need for queries?
[23:08:34] <Forbidd3n> so it comes down to either update all port names in the itinerary if it changes or do the extra query to get the port based on the id
[23:08:37] <Boomtime> rather than trying to avoid having the same name more than once, despite it rarely ever changing anyway
[23:09:09] <Boomtime> "so it comes down to either update all port names in the itinerary if it changes" <- if what changes?
[23:13:00] <Forbidd3n> Boomtime: so the schema should be designed based on queries versus memory used to update
[23:14:12] <Boomtime> is a port name even as long as the _id you propose to use instead?
[23:14:18] <Forbidd3n> Boomtime: based on what I have would you throw it all into one collection?
[23:14:44] <Boomtime> i don't know, i'm not going to go through your whole schema - i called out one detail to hopefully get you to start thinking differently
[23:14:49] <Forbidd3n> character count, it may be the same size or a few chars longer
[23:15:39] <Forbidd3n> Boomtime: just trying to get a better understanding of the line between when to use additional collections versus embedding
[23:15:40] <leptone> how can i .find() items in my database by a subproperty of a property of the document instance?
[23:15:50] <Boomtime> queries are almost always the vast majority of the work that a database does - it's nonsensical to optimize writes by avoiding duplication and using references etc, at the cost of query complexity
[23:15:53] <leptone> id like to do something like this: https://gist.github.com/daaceab73f70c9c70500bec80a254fc5
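The answer to leptone's question is dot notation: a filter like `{"profile.address.city": "Oslo"}` matches on a subproperty of a property (the field names here are hypothetical, since the gist isn't reproduced in the log). A tiny in-memory sketch of what the server does with such a path:

```python
def dot_get(doc, path):
    """Resolve an 'a.b.c' dot-notation path against a nested dict,
    the way the server resolves it in find() filters; returns None
    when any segment is missing."""
    cur = doc
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur

def match_subproperty(docs, path, value):
    """In-memory equivalent of collection.find({path: value})."""
    return [d for d in docs if dot_get(d, path) == value]
```

Against a real collection the same thing is just `db.coll.find({"profile.address.city": "Oslo"})`.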
[23:16:44] <Forbidd3n> Boomtime: I am coming from a sql background where you don't duplicate data and use foreign keys to normalize data
[23:20:00] <Forbidd3n> Boomtime: also, is it efficient when the user queries the api for cruise lines that it pulls the entire data even if all they want is the line name itself?
[23:20:47] <Boomtime> probably not, you can use a filter to restrict to only the fields you need
[23:21:20] <Forbidd3n> so it doesn't return the entire object and embedded objects?
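The "filter" Boomtime refers to is a projection: the second argument to `find()`, e.g. `find({}, {"name": 1, "_id": 0})`, so only the listed fields come back instead of the whole document with its embedded objects. An in-memory sketch of inclusive projection semantics for a single document (field names are hypothetical):

```python
def apply_projection(doc, projection):
    """In-memory sketch of an inclusive projection like
    find({}, {"name": 1, "_id": 0}): keep only the listed top-level
    fields, plus _id unless it is explicitly excluded."""
    out = {}
    if projection.get("_id", 1) and "_id" in doc:
        out["_id"] = doc["_id"]
    for field, keep in projection.items():
        if field != "_id" and keep and field in doc:
            out[field] = doc[field]
    return out
```

For Forbidd3n's case, projecting just the line name means embedded ships and itineraries never leave the server.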
[23:22:17] <Forbidd3n> so in your opinion based on my conversation would you say I would create a new document for each line, embed ships and in each ship embed itineraries