[00:47:35] <Forbidd3n> Hey everyone. Quick question. I am loading a csv file into multiple collections. If the first few rows of the csv file are duplicates, would it be best to create an array of all the data and then loop the array to insert data and only do one check if a line exists, or would it be alright to do the calls to the collection to see if the value exists, and if so then grab the id for the next collection insert?
[00:48:49] <Forbidd3n> School,Subject,Grade,Teacher - the school, subject and grade are repeated for each different teacher name
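The first approach Forbidd3n describes can be sketched as follows: read the CSV once and cache lookups in a dict, so each unique school/subject/grade triple is checked (and would be inserted) only once instead of querying the collection per row. The field names follow the CSV header given above; the use of plain dicts in place of collection inserts is an assumption for illustration.

```python
import csv
import io

def load_rows(csv_text):
    """Parse School,Subject,Grade,Teacher rows, deduplicating the
    school/subject/grade triple so each unique triple is checked once."""
    seen = {}      # (school, subject, grade) -> assigned id
    teachers = []  # one entry per row, referencing the triple's id
    next_id = 1
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["School"], row["Subject"], row["Grade"])
        if key not in seen:
            # In real code this is where you would insert into the
            # first collection and capture its generated _id.
            seen[key] = next_id
            next_id += 1
        teachers.append({"name": row["Teacher"], "classId": seen[key]})
    return seen, teachers
```

With this shape, duplicate leading rows cost a dict lookup rather than a round-trip to the database.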
[01:05:00] <sumobob> quick question, I have a model with an array of objects with start: Date, and end: Date, given another start and end date how can i find all the models where the query start and end are not in range
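For sumobob's question, one way to express "no element of the array overlaps the query range" is to negate an `$elemMatch`: two ranges overlap iff `start <= qEnd` and `end >= qStart`, so wrapping that in `$not` matches documents with no overlapping element. A sketch that just builds the filter document (the array field name `ranges` is an assumption):

```python
def no_overlap_query(q_start, q_end, field="ranges"):
    """Build a find() filter matching documents whose array of
    {start, end} sub-documents contains NO range overlapping
    [q_start, q_end].  Overlap test: start <= q_end AND end >= q_start,
    negated via $not on an $elemMatch."""
    return {field: {"$not": {"$elemMatch": {
        "start": {"$lte": q_end},
        "end": {"$gte": q_start},
    }}}}
```

The same dict works with datetime values; the comparison operators are type-agnostic on the Python side.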
[07:22:40] <chris|> when I stop the balancer, will that interrupt a currently active balancer run or will the balancer complete the current run and then not get scheduled again?
[07:25:27] <kurushiyama> chris| The other way around. If you try to stop the balancer, this operation will block until the current balancer run is done. https://docs.mongodb.com/manual/tutorial/manage-sharded-cluster-balancer/#disable-the-balancer
[07:25:40] <kurushiyama> chris| However, _why_ do you want to disable the balancer?
[07:27:38] <kurushiyama> chris| Please clarify. Your backups should be unaffected by the balancer.
[07:28:31] <chris|> kurushiyama: stopping the balancer is documented for sharded clusters: https://docs.mongodb.com/manual/tutorial/backup-sharded-cluster-with-filesystem-snapshots/#disable-the-balancer
[07:29:49] <kurushiyama> chris| Ah, FS snapshots. Well, in this case it makes sense. However, in the long run, I rather suggest to set a balancer window and take the snapshots automatically _outside_ said windows.
[07:31:29] <chris|> yes, I am thinking about doing that
[07:46:53] <cppking> hello, my mongodb cluster balancer is not working , https://bpaste.net/show/258289e040b4 anybody help me ?
[07:47:52] <kurushiyama> cppking It is working. Otherwise there would be no errors at all.
[07:54:12] <chris|> the docs say "operations have propagated to the majority of voting nodes" for majority write concern. That seems strange to me, should that not read "majority of replica set members" ?
[07:57:59] <cppking> kurushiyama: you mean the balancer is in a normal state? but right now it is not working?
[07:58:37] <kurushiyama> cppking It has errors. While this surely is a thing to monitor and debug, it does run and already has done some balancing.
[08:01:23] <hentis> We're using TTL expiry on some collections, but I need to be able to restore old backups without losing that data. As part of our DR policy we use mongodb cloud backup services, which restores data to /data. Is there a way to disable the TTL expiry, or to run collmod to remove the expiry from the index, until such time as we want to expire the data manually?
[08:01:32] <cppking> kurushiyama: where is the error log placed?
[08:02:52] <kurushiyama> cppking That depends on your configuration and OS. Please see the docs for that.
[08:04:03] <hentis> Or can the background process that expires the data be disabled without having to collmod the index ?
[08:04:16] <kurushiyama> hentis Uhm... not sure, but shouldn't it be "konnichi wa, Kurushiyama-san". My Japanese is worse than my French, and that means something. ;)
[08:05:10] <kurushiyama> hentis Digging in my brain, I know there is a TTL background process.
[08:06:30] <kurushiyama> hentis If it is a temporary thingy, I would probably disable the TTL monitor altogether: https://docs.mongodb.com/manual/reference/parameters/#param.ttlMonitorEnabled
[08:09:46] <hentis> kurushiyama: haha my japanese is very crappy .. been way too long since I've actively used any of it ... never mind typed it. thanks for the info :)
[08:10:17] <kurushiyama> hentis You are welcome, hentis-sama ;)
[08:43:29] <Ange7> is it possible to group by regex with mongo ? i have documents with id field which contains for example : AA12345, AA54321, BB12345, BB54321... i want count number id started by « AA » and count number id started by « BB »
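Ange7's counting-by-prefix question maps to a `$group` on a substring of the id field. A sketch of the pipeline as a Python list of stages (the field name `id` is taken from the question; whether to use `$substr` or the newer `$substrCP` depends on server version):

```python
def count_by_prefix(prefix_len=2, id_field="$id"):
    """Aggregation pipeline counting documents per id prefix,
    e.g. one bucket for ids starting 'AA' and one for 'BB'."""
    return [
        {"$group": {
            "_id": {"$substr": [id_field, 0, prefix_len]},
            "count": {"$sum": 1},
        }},
    ]
```

Passed to `collection.aggregate(...)`, this would yield one document per distinct two-character prefix with its count.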
[09:02:23] <eugenmayer> hello. Suddenly, installing mongodb 3.2.7 under wheezy does not work, since it depends on libc >= 2.14 and wheezy only has 2.13 - how can this happen, since these packages are wheezy specific?
[09:02:46] <eugenmayer> i am using deb http://repo.mongodb.org/apt/debian wheezy/mongodb-org/stable main as the source
[09:03:45] <eugenmayer> following this guide : https://docs.mongodb.com/manual/tutorial/install-mongodb-on-debian/
[09:06:20] <eugenmayer> seems like 3.2.6 works fine with wheezy, while 3.2.7 does not. Is 3.2.7 jessie only - even though there is no support for jessie yet (according to the docs)?
[09:26:43] <ramnes> so, what about that article posted everywhere
[09:27:01] <ramnes> is MongoDB going to react in a way or another or just take bashing?
[09:28:31] <eugenmayer> ramnes: what are you talking about?
[10:10:44] <kurushiyama> I immediately tried to beat the shit out of my test cluster, with not a single operation failing.
[10:17:56] <ramnes> kurushiyama: this is exactly what I was talking about just before :p
[10:20:28] <kurushiyama> ramnes Just came in. What was it? I am still having a hard time reproducing the problem, with 10M docs and 10 concurrent goroutines modifying the data, and an equal number of goroutines reading the data...
[10:21:37] <ramnes> kurushiyama: not much, I just asked if MongoDB was going to do anything and Derick said that he flagged it internally
[10:22:27] <kurushiyama> Well, I'll publish the test program I wrote in the next few hours.
[10:53:22] <kurushiyama> @Derick Well, I'll let you know when I got my little test project finished. It generates noise and reordering on purpose, and very much so.
[10:57:41] <Derick> https://en.wikipedia.org/wiki/Isolation_%28database_systems%29#Phantom_reads is interesting too
[11:02:12] <kurushiyama> Derick Another thing I have thought of (because the technical details were not disclosed) is that he may have run into SERVER-3645 in some way. But admittedly, I haven't had the time to think it through, yet.
[11:04:12] <Derick> kurushiyama: it'd be great if you can publically post your musings at some point :)
[11:04:42] <kurushiyama> Derick Sorry, not a native speaker: "musings"?
[11:07:45] <kurushiyama> Derick Ah! You bet. That is the reason why I started the project. Actually, it is part of a bigger project, called mongodb-bc. Go figure ;P
[11:10:44] <kurushiyama> Darn, I created so much noise that my mongos was OOMed
[11:11:25] <kurushiyama> Zelest Will do. Please link an appropriate post and it gets on my list.
[11:11:45] <Zelest> hehe, only forwarding comments my "friends" use on IRC to hate on mongo :(
[11:14:31] <kurushiyama> tl;dr No, you do not, since if you need to make sure that data was persisted, you should have a proper write concern; only after it is satisfied does the operation return.
[11:29:47] <Derick> got to go now, back in ~2 hours.
[11:30:35] <Derick> eugenmayer: there is: https://jira.mongodb.org/browse/SERVER-24459
[11:34:19] <eugenmayer> Derick: not that one. 3.2.7 depends on libc >= 2.14 .. but wheezy has 2.13
[11:50:52] <eugenmayer> Derick: maybe there is jessie planned and someone created the wrong packages?
[11:51:37] <eugenmayer> https://packages.debian.org/de/wheezy/libc6 .. max 2.13 and no backports. Upgrading libc6 without backports is madness, so i do not think this is suggested
[11:51:52] <eugenmayer> jessie though has 2.19 https://packages.debian.org/de/jessie/libc6
[12:19:37] <kurushiyama> sector_0 No GUI, but most feature-rich.
[12:20:30] <kurushiyama> urbanizator Any questions about the locking?
[12:21:30] <sector_0> kurushiyama, so the shell is the accepted way to administrate a MongoDB server?
[12:21:48] <urbanizator> kurushiyama nope it's all clear now
[12:21:55] <kurushiyama> sector_0 Accepted? Well, I'd say it is the preferred way.
[12:22:26] <kurushiyama> sector_0 For high level stuff, there is CloudManager.
[12:23:56] <kurushiyama> Derick But as far as I can see, count actually utilizes the index. So, imho, it is not the document which is in undefined state, but the index.
[12:25:56] <kurushiyama> Derick Which should be provable by doing an iteration over the query result, no?
[12:30:26] <kurushiyama> urbanizator The locked _field_ (terminology is important, there is no such thing as a column in MongoDB) is one you define, yes.
[13:10:04] <kurushiyama> ramnes Derick This is the first version. Still has some problems, but it proves that it is a problem of .count() https://github.com/mwmahlberg/incomplete-returns/blob/master/main.go
[13:10:40] <sector_0> on a scale of 1 to 10 how easy is it to go from noob to pro in MongoDb
[13:10:45] <ramnes> kurushiyama: what's your conclusion then?
[13:10:59] <ramnes> kurushiyama: is it really an issue?
[13:12:45] <Derick> kurushiyama: afaik count is a special index scan or something
[13:13:34] <kurushiyama> I do not trust count() too much anyway, since as of now it is rather useless in sharded environments (see SERVER-3645 for details). As far as I can see, it is only the index which might be in an undefined state. The program freezes all modifying operations as soon as a mismatch between the expected documents and the number returned by .count() is found. As of now, iterating through the result set always gives the correct number of documents.
[13:14:09] <Derick> kurushiyama: for that, you need to talk to one of the server guys really - I don't know
[13:14:33] <ramnes> kurushiyama: so basically, you can't reproduce, right?
[13:14:56] <kurushiyama> The program has some locking problems, which I need to debug. Once a mismatch is found, the locking causes basically every count to fail in a way. But that may be worth a look in itself.
[13:15:21] <kurushiyama> ramnes I can reproduce that .count() might return wrong information in write-heavy environments.
[13:15:39] <kurushiyama> ramnes I can not reproduce that the actual result set is wrong.
[13:15:48] <kurushiyama> ramnes Quite the contrary.
[13:23:18] <kurushiyama> Derick I will spice up and document the code a bit. Is there a way to reach out to your "server guys"?
[13:29:49] <jayjo_> I have figured out that aws cli or s3cmd cannot stream recursively (at least effectively). What would be the workaround for a problem like this? Do I need to write a script that recursively finds files in the bucket and streams to mongoimport?
[13:31:04] <saml> you want to insert json files from http server to mongodb?
[13:31:39] <saml> for js in all_json(): mongo.yourdb.yourcollection.insert(js)
[13:34:17] <kali> jayjo_: command line interface tools for s3 are often not that helpful for scripting. many scripting languages have better interfaces (python, node, ...). whatever you're comfortable with
[13:40:48] <jayjo_> so I should just roll my own in python and use boto
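If jayjo_ does roll his own in Python, the S3 side is sketched here only as a comment (assuming boto3, per kali's suggestion; any S3 client works); the testable pieces are parsing JSON lines and batching documents for `insert_many`, which replaces piping each file through mongoimport:

```python
import json
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items, sized for insert_many() calls."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def parse_json_lines(lines):
    """Turn an iterable of JSON lines (one document per line) into dicts,
    skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Hypothetical glue with boto3 (bucket/collection names are assumptions):
#   s3 = boto3.client("s3")
#   for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
#       for obj in page.get("Contents", []):
#           body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"]
#           for batch in batched(parse_json_lines(body.iter_lines()), 1000):
#               collection.insert_many(batch)
```

Batching keeps memory bounded while still amortizing round-trips, which is the main thing mongoimport would have bought.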
[13:55:36] <kurushiyama> As for the verification of MongoDB having an undefined state under certain conditions: well, I actually found out that I did not have a locking problem at all. What happens is that after the first count() returns a wrong value, all concurrent writers are paused. As soon as the result set is verified, those concurrent writers are unpaused _simultaneously_ (almost), resulting in a lot of concurrent writes again, causing .count() to return a wrong
[13:55:36] <kurushiyama> value again, and it restarts from there. So, my conclusion holds and is even reinforced: it is a problem of the way .count() behaves, not of the result sets.
[14:10:56] <kurushiyama> ramnes Assuming you have Go installed and a proper GOPATH set and all, it should work with "go get github.com/mwmahlberg/incomplete-returns", after which $GOPATH/bin/incomplete-returns would be the executable. Append "-h" for the options.
[14:11:27] <kurushiyama> if you wish, I can provide a linux executable, which might be easier.
[14:12:01] <ramnes> I'll give it a shot this evening
[14:12:45] <kurushiyama> Well, I might update it, so you better run "go get -u ..." once in a while.
[15:07:49] <dino82> It appears mms support is down? I can't send in a ticket.
[15:13:37] <dino82> Apparently you can't change the user the mms-agent runs as? mongod vs mongodb
[17:01:30] <bratner> Hi people! Where does one go to ask C++ driver questions?
[17:04:51] <cheeser> here or the mongodb-users list
[17:05:10] <cheeser> probably the list would be better. i don't think any of the devs are here and most of the users here use other languages
[17:05:22] <StephenLynx> yeah, few people use the C or C++ drivers
[17:05:43] <StephenLynx> and I don't think their devs hang around here
[17:33:40] <_1009> Hey all - just wanted to check: for shards, is it recommended to put config nodes on a dedicated node? I noticed that deployments in MongoDB Cloud do so by default. If that is the case, why?
[17:34:51] <cheeser> you don't want those nodes bogged down with external loads.
[17:37:55] <GothAlice> _1009: Configuration servers are queried by mongos query routers to determine which actual data nodes contain the data being queried. It's important that these answers are fast.
[17:51:06] <_1009> OK, thanks - in my case, all data is sharded by id, but I'm always searching by geolocation - does it still hit the config router then, as it will query all nodes for (their part) of the geolocation?
[17:59:29] <kurushiyama> _1009 Shard key not in query => mongos has no clue about the key ranges of the queried field => Query is sent to all shards.
[18:14:50] <_1009> Awesome, that's what I thought - thanks so much!
[19:23:33] <speedio> i have the following mongoose schema for my mongo db http://dpaste.com/0FEPM52 , and im trying to add data to it with the following command http://dpaste.com/2DWQVA6 , but for some reason the nested objects wont get added.. what am i doing wrong?
[19:24:22] <StephenLynx> i recommend you don't use mongoose.
[19:32:46] <StephenLynx> I know its performance is piss poor
[19:32:56] <StephenLynx> less than 20% of the native driver
[19:33:04] <StephenLynx> it doesn't handle dates properly
[19:33:17] <StephenLynx> and uses a crappy cast on ObjectIds
[19:33:34] <StephenLynx> not to mention that it forces you to use mongo wrong.
[19:33:41] <StephenLynx> in a way it wasn't designed to be used
[19:34:26] <alexi5> what's up with the nodejs community? been hearing some bad news about a library consisting of only a string padding function that broke a lot of apps
[19:44:23] <StephenLynx> >Our goal is to make publishing and installing packages as frictionless as possible. In this case, we believe that most users who would come across a kik package, would reasonably expect it to be related to kik.com. In this context, transferring ownership of these two package names achieves that goal.
[19:44:48] <StephenLynx> npm never mentioned anything about "hm, yeah, that could give us legal trouble"
[19:45:54] <StephenLynx> and fuck what lawyers think, they are just greedy rats.
[19:46:14] <StephenLynx> they would sue you for looking at them the wrong way
[20:28:27] <Forbidd3n> How do I store _id in another collection->document in mongo? refId = new MongoId('the id of the reference I want to store') or do I just store this - refId = 'the id of the reference I want to store'
[20:28:39] <Forbidd3n> so the mongoid object or the mongoid string
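The general rule behind Forbidd3n's question: store the reference in the same type as the target's `_id`. If the `_id` is an ObjectId, store an ObjectId (in PHP, `new MongoId(...)`; in Python, `bson.ObjectId(...)`), not its hex string, or equality lookups like `{"_id": refId}` will never match. A small pure-Python helper (no driver required) for recognizing the hex-string form that then needs converting back to the native type before querying; the lowercase-hex assumption matches what drivers emit:

```python
def looks_like_objectid_hex(s):
    """True for the 24-character lowercase-hex string form of an
    ObjectId.  Useful when deciding whether an incoming string should
    be converted back to the native ObjectId type before querying."""
    return (isinstance(s, str) and len(s) == 24
            and all(c in "0123456789abcdef" for c in s))
```

So: accept strings at your API boundary if you must, but convert before building the query document.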
[20:31:51] <alexi5> have you guys heard of developers arguing for using postgresql to store document data, given that it has json and jsonb field types that can be queried?
[20:33:12] <StephenLynx> it takes a lot of ignorance to think mongo just stores json.
[20:33:41] <StephenLynx> and that its features are limited by what and how it stores data.
[20:35:19] <alexi5> what I like about mongodb is the ability to map application objects to documents easily
[20:36:20] <alexi5> what I realized is that systems can use both a document database and a relational one based on their requirements, instead of staying with just one
[20:55:55] <Forbidd3n> is it possible to set unique on nested items?
[21:00:23] <Forbidd3n> If I have a list of ports for example and many ships go to the same ports, would it be best to put the ports in its own collection and reference it by id versus duplicating the port names in each document
[21:00:50] <Forbidd3n> I am guessing that is best to have separate port collection so if the port name changes I only have to update it in one place versus all documents that have that port name
[21:36:49] <kurushiyama> tuvo I have run extensive tests.
[21:38:10] <kurushiyama> tuvo https://github.com/mwmahlberg/incomplete-returns actually proves that this is only a misbehaviour of the count method, which was presumably used to count the number of results.
[21:38:49] <kurushiyama> tuvo when you look at the channel logs, I have made some statements regarding that, too
[21:39:02] <tuvo> kurushiyama: "I have run extensive tests" doesn't mean much. He had huge input, lots of traffic, happened rarely.
[21:39:14] <tuvo> I believe him, after looking at the details
[21:39:25] <kurushiyama> tuvo Have a look at the program
[22:13:33] <Boomtime> "When writing standard output, mongodump does not write the metadata that writes in a <dbname>.metadata.json file when writing to files directly."
[22:14:00] <Boomtime> metadata contains some details about the collection, mainly index definitions
[22:14:40] <Doyle> Seems like a better way to archive would be to dump, then zip the folder after
[22:45:06] <Forbidd3n> How would I set a reference to a nested object from another collection?
[22:45:31] <Forbidd3n> Do I create an ObjectID on the nested item and store it on the referencing collection document(s)?
[22:56:52] <Forbidd3n> If I have to have nested items reference documents in another collection, would it be best to have the nested items in their own collection, or add an id to the nested items and add that id to the other collection's documents?
[22:57:56] <Boomtime> @Forbidd3n: what are the nested documents? can they just be embedded?
[22:59:53] <Forbidd3n> Boomtime: for example. I have ships, lines, ports, itineraries. I was going to embed ships into lines, link each line to ports, and then link ship and port to itinerary
[23:00:31] <Forbidd3n> The reason I have ports as another collection is because I didn't want to duplicate the port name in each line item embedded
[23:00:58] <Forbidd3n> I can embed itineraries, but need to link the embedded itinerary to the port by id
[23:01:06] <Forbidd3n> Boomtime: does this kinda make sense?
[23:04:42] <Boomtime> what is the advantage of avoiding putting the port details directly in the lines?
[23:04:44] <Forbidd3n> well, if I put the same name on all cruise line items and the name changes, I would have to filter through and update them all
[23:04:57] <Boomtime> you can still have the port collection if you like as 'master' reference or something
[23:05:20] <Boomtime> how often do the names change?
[23:05:26] <Forbidd3n> correct, I was thinking of having a collection of ports with _id and link to it
[23:05:39] <Boomtime> how often do port names change? daily?
[23:05:54] <Forbidd3n> the way it is laid out is a cruise line has many ships, ships go to many different ports, itinerary is the date at a specific port
[23:06:03] <Boomtime> how often do port names change? daily?
[23:06:23] <Forbidd3n> port names don't change, but where they go in the itinerary, the port, may change, so the id of the port in the itinerary will have to be updated
[23:06:50] <Forbidd3n> or at least not often; rarely do they change, but of course anything can change
[23:07:00] <Boomtime> so it doesn't matter if the itinerary contains port names and other details, or _ids as well, because those need updating anyway?
[23:07:31] <Forbidd3n> not sure I follow that last comment
[23:07:34] <Boomtime> meanwhile, embedding only the _id of the port means you now have to go look up the details in another collection, every query
[23:08:01] <Boomtime> why not design your schema to carry the information you need for queries?
[23:08:34] <Forbidd3n> so it comes down to either update all port names in the itinerary if it changes or do the extra query to get the port based on the id
[23:08:37] <Boomtime> rather than trying to avoid having the same name more than once, despite it rarely ever changing anyway
[23:09:09] <Boomtime> "so it comes down to either update all port names in the itinerary if it changes" <- if what changes?
[23:13:00] <Forbidd3n> Boomtime: so the schema should be designed based on queries versus memory used to update
[23:14:12] <Boomtime> is a port name even as long as the _id you propose to use instead?
[23:14:18] <Forbidd3n> Boomtime: based on what I have would you throw it all into one collection?
[23:14:44] <Boomtime> i don't know, i'm not going to go through your whole schema - i called out one detail to hopefully get you to start thinking differently
[23:14:49] <Forbidd3n> character count, it may be the same size or a few chars longer
[23:15:39] <Forbidd3n> Boomtime: just trying to get a better understanding of the line between when to use additional collections versus embedding
[23:15:40] <leptone> how can i .find() items in my database by a subproperty of a property of the document instance?
[23:15:50] <Boomtime> queries are almost always the vast majority of the work that a database does - it's nonsensical to optimize writes by avoiding duplication and using references etc, at the cost of query complexity
[23:15:53] <leptone> id like to do something like this: https://gist.github.com/daaceab73f70c9c70500bec80a254fc5
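The answer to leptone's question is dot notation: a filter like `{"profile.address.city": "Oslo"}` matches on a subproperty of a property (the field names here are hypothetical, since the gist isn't reproduced in the log). A tiny in-memory sketch of what the server does with such a path:

```python
def dot_get(doc, path):
    """Resolve an 'a.b.c' dot-notation path against a nested dict,
    the way the server resolves it in find() filters; returns None
    when any segment is missing."""
    cur = doc
    for part in path.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur

def match_subproperty(docs, path, value):
    """In-memory equivalent of collection.find({path: value})."""
    return [d for d in docs if dot_get(d, path) == value]
```

Against a real collection the same thing is just `db.coll.find({"profile.address.city": "Oslo"})`.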
[23:16:44] <Forbidd3n> Boomtime: I am coming from a sql background where you don't duplicate data and use foreign keys to normalize data
[23:20:00] <Forbidd3n> Boomtime: also, is it efficient when the user queries the api for cruise lines that it pulls the entire data even if all they want is the line name itself?
[23:20:47] <Boomtime> probably not, you can use a filter to restrict to only the fields you need
[23:21:20] <Forbidd3n> so it doesn't return the entire object and embedded objects?
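The "filter" Boomtime refers to is a projection: the second argument to `find()`, e.g. `find({}, {"name": 1, "_id": 0})`, so only the listed fields come back instead of the whole document with its embedded objects. An in-memory sketch of inclusive projection semantics for a single document (field names are hypothetical):

```python
def apply_projection(doc, projection):
    """In-memory sketch of an inclusive projection like
    find({}, {"name": 1, "_id": 0}): keep only the listed top-level
    fields, plus _id unless it is explicitly excluded."""
    out = {}
    if projection.get("_id", 1) and "_id" in doc:
        out["_id"] = doc["_id"]
    for field, keep in projection.items():
        if field != "_id" and keep and field in doc:
            out[field] = doc[field]
    return out
```

For Forbidd3n's case, projecting just the line name means embedded ships and itineraries never leave the server.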
[23:22:17] <Forbidd3n> so in your opinion based on my conversation would you say I would create a new document for each line, embed ships and in each ship embed itineraries