[06:44:39] <nodejunkie> Hi, if I have a backup of my /data/db directory (but do *not* have a mongodump or mongoexport), what is the easiest way to restore my database?
[10:11:47] <Ange7> If I have most documents like this: http://pastebin.com/FQ3tb46G is it possible to do a .find({}) to get all documents which contain cat = foo?
[10:26:13] <kurushiyama> Ange7 why not db.yourcoll.find({"cat.foo":{$exists:true}})? http://pastebin.com/EzTk7t54
[10:35:35] <kurushiyama> Ange7 Nah. Not search on demand. Read it, at least the operator list.
[10:35:40] <kurushiyama> Ange7 It is not that long.
[12:06:52] <Ben_1> Detected unclean shutdown - /var/lib/mongo/mongod.lock is not empty.
[12:07:08] <Ben_1> can I remove that file or should I do something else to start my mongod?
[12:08:55] <Ben_1> cheeser: do you have an idea? :3
[12:25:00] <Ben_1> ok, had to remove all lock files, re-chown several WiredTiger files and use repair :3
[12:48:17] <Ben_1> does "setOnInsert" overwrite values that already exist?
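A minimal sketch of how $setOnInsert behaves in an upsert, assuming a hypothetical "counters" collection: fields under $setOnInsert are only written when the upsert actually inserts a new document; when an existing document matches, they are left untouched.

    // Hypothetical "counters" collection.
    db.counters.update(
        { _id: "pageviews" },
        {
            $inc: { count: 1 },                        // applied on every update
            $setOnInsert: { createdAt: new Date() }    // applied only when the upsert inserts
        },
        { upsert: true }
    )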
[12:51:56] <vista> Hi, can someone tell me more about the "silent data loss" scenario that might occur when using MongoDB? What are the prerequisites for such scenario?
[12:56:05] <kurushiyama> vista Huh? Where did you hear of that?
[12:56:36] <kurushiyama> vista Are you referring to the blog post claiming that MongoDB does not return all queried data?
[12:58:16] <vista> I heard it on various sites on the Internet
[12:58:29] <kurushiyama> vista Well. Ok, let's start.
[12:58:31] <vista> I have no idea if the claims are true or not
[12:58:38] <kurushiyama> vista Have you read in the docs a bit?
[12:58:58] <vista> we are currently considering our options regarding the choice of database
[12:59:37] <kurushiyama> Do you know the notion of "writeConcern"?
[13:01:02] <vista> I have not delved into the specifics, but I have heard about it
[13:01:43] <kurushiyama> Ok, here is how it works: Be it a standalone or replica set you are connecting to, you can specify a writeConcern (by default and per request).
[13:02:05] <kurushiyama> Let us take a 3 member replica set.
[13:02:37] <kurushiyama> If you set the write concern to "majority", a write operation would only return successfully if the data was written to at least 2 nodes.
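A minimal sketch of what kurushiyama describes, assuming a hypothetical "orders" collection on a 3-member replica set; with w: "majority" the insert is only acknowledged once at least 2 of the 3 members have the write.

    // Acknowledge only after a majority of the replica set has the write.
    db.orders.insert(
        { item: "abc", qty: 1 },
        { writeConcern: { w: "majority", wtimeout: 5000 } }
    )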
[13:03:12] <kurushiyama> Now, here is the problem: until 2.6, the default write concern was 0.
[13:03:23] <kurushiyama> Basically, a fire and forget write.
[13:03:48] <kurushiyama> Which _was_ documented, btw – but a lot of people did not bother to read the docs.
[13:05:13] <kurushiyama> As of 2.6, the default write concern is 1
[13:05:58] <kurushiyama> Now, within our replica set, there are still some edge cases in which writes may not be directly accessible (though recoverable).
[13:06:42] <kurushiyama> Say we write with a writeConcern of 1, our primary accepted and acknowledged the write but before the secondaries got the new data, primary goes down.
[13:07:12] <kurushiyama> One of the secondaries gets elected new primary, your application can write.
[13:07:36] <kurushiyama> now, the old primary comes up again and has data the others do not know about.
[13:08:42] <kurushiyama> Since those write operations might already be obsolete, or documents even be deleted, it won't simply apply those write operations to the data set. It saves them as so-called rollback files.
[13:09:08] <kurushiyama> Those rollback files have to be manually reviewed and applied.
[13:10:29] <kurushiyama> Mind you, this is only the case with a writeConcern of 1 and only when the primary goes down before writes made it to a secondary. With a higher writeConcern or when the writes made it to at least one secondary before the primary goes down, everything is fine.
[13:12:07] <kurushiyama> vista In case you have any questions or do not understand a term I use, please ask.
[13:21:50] <jayjo> I am trying to upload data into mongodb, and I believe I have some illegal characters (?). Is it the case that I just can't have '$' or '.' in the keys of a document? Can I have '\' in values?
[13:22:19] <jayjo> Or do I need to set the encoding before the upload?
[13:25:42] <jayjo> I ran a python script to upload all of the data and catch the lines that caused problems. I wrote them to a log, in order to attempt to upload them afterwards. Here is a snippet of the failed json: https://bpaste.net/show/41764f214b90
[13:26:53] <jayjo> The first few lines have a '.' in the key value. So, in my upload script I need to modify the json, is that right? I think the last rows failed as well because the \ is not escaped? Is that the case? I'm not getting meaningful errors
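Backslashes are legal inside BSON string values, so the '\' failures are more likely a JSON-escaping problem in the source file; the '.' (and a leading '$') in field names, however, do have to be rewritten before insert on these versions. A rough sketch in the mongo shell, with a hypothetical sanitizeKeys() helper and "uploads" collection:

    // Replace characters MongoDB rejects in field names ('.' anywhere, leading '$')
    // before inserting. Only recurses into plain nested objects, not arrays.
    function sanitizeKeys(doc) {
        var out = {};
        Object.keys(doc).forEach(function (k) {
            var clean = k.replace(/\./g, "_").replace(/^\$/, "_");
            var v = doc[k];
            out[clean] = (v !== null && typeof v === "object" &&
                          !Array.isArray(v) && !(v instanceof Date))
                ? sanitizeKeys(v)
                : v;
        });
        return out;
    }

    db.uploads.insert(sanitizeKeys({ "some.key": 1, "$bad": 2, nested: { "a.b": 3 } }))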
[13:31:09] <kurushiyama> jayjo In general, you should encode. I tend to use base64 in this case.
[13:31:35] <kurushiyama> vista May I ask what you want to use MongoDB for?
[13:32:31] <jayjo> kurushiyama: shoot... so I need to convert to UTF-8 on the python side, and re-upload the data I have already uploaded without problem? Or just encode the problem data and begin doing it for all future data?
[13:42:14] <kurushiyama> vista Say you want to count the instances of a value occurring. Or the average of a field over all documents. Something like this.
[13:42:35] <kurushiyama> vista Stuff you would typically do with map/reduce.
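A sketch of those two examples using the aggregation framework rather than map/reduce, assuming a hypothetical "events" collection with "status" and "duration" fields:

    // Count occurrences of each distinct "status" value.
    db.events.aggregate([
        { $group: { _id: "$status", count: { $sum: 1 } } }
    ])

    // Average of "duration" over all documents.
    db.events.aggregate([
        { $group: { _id: null, avgDuration: { $avg: "$duration" } } }
    ])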
[13:45:29] <kurushiyama> vista Well, it is not mutually exclusive. What is your demand on uptime? Well, I know, always up, but realistically?
[13:48:19] <mroman> I have documents like {"objects":{ "k1" : { ... }, "k2" : { ... }, "q0" : { ... }}} and I'd like a query that returns me all the "names" of the things beneath "objects"
[13:48:28] <mroman> (i.e. it should return "k1,k2,q0")
[13:48:46] <Derick> mroman: you need to change your schema so that you turn these keys into values
[13:51:56] <jayjo> kurushiyama: sorry - why would I choose base64 over utf-8 if I'm generally storing strings?
[13:52:30] <mroman> {"instruments" : { "k1" : { "density" : 0.9, "temp" : 30 }, "k2" : { "pressure" : 1109 }}, "location" : "VVDB", timestamp : ISODate(...)} <- I have multiple such entries and I somehow want to query all instruments I have ever gotten any data from at location "VVDB" for example
[13:52:59] <mroman> (I can obviously just query for everything where "location == VVDB" and then use some application code to filter out the instruments)
[13:55:03] <jayjo> ok - but isn't utf-8 the default? If the data uploaded without a problem, is that data already in utf-8? and I only need to specify the encoding for all future data?
[13:55:16] <Derick> jayjo: MongoDB's strings should always be UTF-8
[13:55:23] <mroman> Derick: so you get one row per instrument?
[13:55:28] <Derick> it can't store any others (in the normal string type)
[13:55:37] <Derick> mroman: yes. We call rows documents though
[13:55:42] <Derick> how many documents do you have?
[13:56:13] <mroman> then I could just do a distinct query for instrument where location == VVDB
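A sketch of the schema change Derick suggests, assuming a hypothetical "readings" collection with one document per instrument reading, so the instrument name becomes a value rather than a key:

    // One document per reading; the instrument is a value, not a key.
    db.readings.insert({
        instrument: "k1",
        values: { density: 0.9, temp: 30 },
        location: "VVDB",
        timestamp: ISODate("2016-06-27T12:00:00Z")
    })

    // All instruments that ever reported data at location "VVDB".
    db.readings.distinct("instrument", { location: "VVDB" })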
[13:57:47] <jayjo> just to verify then, I wrote an ETL script to upload this data. I didn't encode it. If it worked, it encoded into utf-8 by default(?) Now I can take the lines that I have errors for, explicitly encode them in utf-8, and I should be good?
[13:59:12] <mroman> It could probably be done in a mapreduce function, if I had access to create one server side.
[13:59:32] <mroman> (have to check the docs if I can submit my own mapreduce functions with just read privileges)
[13:59:33] <Derick> jayjo: depends on what your driver does...
[13:59:40] <Derick> mroman: m/r is a killer for performance
[14:00:46] <Derick> mroman: Map/Reduce is not very performant. Much better to use the aggregation framework, but you really need to consider fixing your schema. Apparently it's not designed with data access patterns in mind (or perhaps not for all of them in mind)
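On the restructured "readings" schema sketched above, the aggregation framework answers the same question without map/reduce, and can also return a count per instrument:

    db.readings.aggregate([
        { $match: { location: "VVDB" } },
        { $group: { _id: "$instrument", readings: { $sum: 1 } } }
    ])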
[14:02:19] <kurushiyama> jayjo As far as I can see, we are talking of binary data you intend to store as a string, right?
[14:06:42] <mroman> It was designed to have as few rows as possible :)
[14:07:30] <mroman> if you have one row/document per instrument reading you have location and timestamp duplicated n times
[14:26:27] <kurushiyama> mroman Redundancy is your friend, actually.
[14:31:26] <gzoo> How would I add a document with an NUUID instead of ObjectId as the key?
[14:32:03] <gzoo> I have a collection with several documents keyed with an NUUID (seeing through Robomongo) and I need to add another. Want a NUUID for consistency
[14:35:52] <gzoo> kurushiyama, I'm really new to this so I'd need some help. Do I do something like NUUID(ObjectId())? It didn't work
[14:37:15] <kurushiyama> gzoo I am not aware that mongodb can generate NUUIDs. An _id field can have _any_ value (as long as it is unique within a collection), so I guess the NUUIDs are generated client side.
[14:41:03] <gzoo> kurushiyama, hmm. ok. How does MongoDB know to differentiate between, say, NUUID, ObjectId, CSUUID, so that it shows the type?
[14:41:15] <gzoo> kurushiyama, I'm also checking your link
[14:42:05] <kurushiyama> gzoo MongoDB does not. It is just a field like any other. So it can be of type ObjectId, string, or even bool (which would make for a _very_ small collection).
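A sketch of inserting with a client-supplied _id instead of an ObjectId, using a hypothetical "items" collection; the UUID() shell helper produces a BinData value, and labels like NUUID/CSUUID are just how GUI tools such as Robomongo render the various (legacy) UUID BinData subtypes.

    // _id can be any unique value; here a client-generated UUID.
    db.items.insert({
        _id: UUID("0123456789abcdef0123456789abcdef"),   // hypothetical hex value
        name: "example"
    })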
[14:45:38] <gzoo> kurushiyama, so the NUUID I see in my Robomongo are just a representation built into Robomongo?
[14:46:48] <Lope> When will the 3.3.9 fixes be made available for the Ubuntu apt repo? https://github.com/mongodb/mongo/commit/710159c9602a6738e6455cfb26bc2d70a0454ae2
[14:47:43] <kurushiyama> gzoo Most likely. GUI tools have one major disadvantage – they decouple you from what is really going on. _Especially_ when you start with MongoDB, my suggestion is to use the shell.
[14:48:55] <gzoo> kurushiyama, well thank you for now. You did clarify some things.
[15:04:55] <Lope> kurushiyama: some people make the argument that CLI tools do the same.
[15:05:32] <Lope> kurushiyama: but generally I agree. GUI tools tend to be overly simplistic and leave out a lot of details.
[15:06:07] <kurushiyama> Lope Well, it does not get any closer to the data except for manipulating the datafiles directly, which _maybe_ is not the best idea... ;)
[15:06:37] <Lope> kurushiyama: every GUI/CLI tool pair is unique.
[15:20:56] <spleen> Hello all, I am new to MongoDB and I have some questions
[15:21:47] <spleen> I have a sharded cluster with some sharded collections
[15:22:22] <spleen> I split collection-A and moved it to a new shard
[15:32:58] <kurushiyama> so each and every new document gets written to shard2
[15:33:24] <kurushiyama> spleen since it is assigned the key range from 20160626 to +infinity.
[15:34:04] <kurushiyama> spleen Now, after a while the chunk gets too big and is split.
[15:34:49] <kurushiyama> spleen And again, and again. Now, when there is a certain mismatch between the number of chunks on shard1 and shard2, the balancer kicks in and moves chunks from shard2 to shard1
[15:35:22] <kurushiyama> spleen The metadata is updated accordingly, and all live happily ever after... ...except, you don't.
[15:35:58] <kurushiyama> spleen Say now shard1 is assigned the range from -infinity to 20160627
[15:36:12] <kurushiyama> spleen And again, all new documents are only written to shard2
[15:36:21] <kurushiyama> spleen repeating the process.
[15:37:04] <kurushiyama> spleen In a worst case scenario (and I have seen those), data comes in faster than the balancer can ... ...well, balance out the mess.
[15:37:28] <kurushiyama> spleen Which is very likely what you are experiencing now.
[15:38:18] <kurushiyama> spleen And here comes the really, really bad news: you cannot change the shard key.
[15:39:40] <kurushiyama> spleen You basically have to set up a new sharded collection, with a better shard key selected, and migrate the data.
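A sketch of setting up the new collection with a better key, assuming a hypothetical "logs.events" namespace; a hashed key on a high-cardinality field spreads inserts across shards instead of funnelling them all into the chunk that owns the newest dates:

    // From a mongos shell: shard the new collection on a hashed key.
    sh.enableSharding("logs")
    db.getSiblingDB("logs").events.createIndex({ deviceId: "hashed" })
    sh.shardCollection("logs.events", { deviceId: "hashed" })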
[15:42:43] <kurushiyama> spleen You ok or were you hit by a stroke?
[21:55:57] <TommyTheKid> I have inherited a mongodb cluster, that is in docker containers... Is there an advantage of having mongo server running in a docker container? The "host" is an EC2 node. The nodes were built over 10 months ago, and are still working, but they haven't been touched for updates, and that scares me.
[21:59:07] <bjpenn> i have a lot of open connections on mongo that are driving up the db load, anyone know how i can kill those connections?
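As far as I know there is no shell command on these versions to kill a client connection directly; the usual approach is to find and kill the offending operations and fix the client-side connection pooling. A rough sketch, with the 30-second threshold chosen arbitrarily:

    // List (and optionally kill) operations running longer than 30 seconds.
    db.currentOp({ active: true, secs_running: { $gt: 30 } }).inprog.forEach(function (op) {
        printjson({ opid: op.opid, ns: op.ns, secs: op.secs_running });
        // db.killOp(op.opid);   // uncomment to actually kill
    });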