PMXBOT Log file Viewer


#mongodb logs for Tuesday the 28th of June, 2016

[00:44:32] <django_> hey all
[00:44:52] <django_> running mongodb for this tutorial:
[00:44:53] <django_> http://code.tutsplus.com/tutorials/building-rest-apis-using-eve--cms-22961
[00:45:07] <django_> how should i install mongodb?
[01:09:30] <cheeser> django_: what os?
[01:09:44] <django_> cheeser, ubuntu but i found a guide ty
[01:09:49] <cheeser> ok
[01:10:44] <django_> cheeser, is this correct: mongod --dbpath= /home/projects/Bucket-List-app/data/db
[01:11:06] <cheeser> no =
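A sketch of the corrected invocation, dropping the "=" as suggested; the path is simply the one from the question above:

    mongod --dbpath /home/projects/Bucket-List-app/data/db

(The form --dbpath=/path with no space also works; the original line failed because "--dbpath= " set an empty value.)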
[06:44:39] <nodejunkie> Hi, if I have a backup of my /data/db directory (but do *not* have a mongodump or mongoexport), what is the easiest way to restore my database?
[10:10:45] <Ange7> Hello all
[10:11:47] <Ange7> If I have documents like this: http://pastebin.com/FQ3tb46G is it possible to do « .find({}) » to get all documents which contain cat = foo?
[10:26:13] <kurushiyama> Ange7 why not db.yourcoll.find({"cat.foo":{$exists:true}})? http://pastebin.com/EzTk7t54
[10:30:37] <Ange7> cause i didn't know that :D
[10:31:43] <kurushiyama> Ange7 The documentation is a rich source of information ;)
[10:32:04] <Ange7> thank you kurushiyama
[10:32:12] <Ange7> i searched but didn't find
[10:35:35] <kurushiyama> Ange7 Nah. Not search on demand. Read it, at least the operator list.
[10:35:40] <kurushiyama> Ange7 It is not that long.
[12:06:52] <Ben_1> Detected unclean shutdown - /var/lib/mongo/mongod.lock is not empty.
[12:07:08] <Ben_1> can I remove that file or should I do something else to start my mongod?
[12:08:55] <Ben_1> cheeser: do you have an idea? :3
[12:25:00] <Ben_1> ok had to remove all lock files and rechown several tiger files and use repair :3
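Roughly the sequence described, as a sketch only (the data path comes from the error message above; the service user name and whether --repair is actually needed are assumptions):

    # with mongod stopped:
    rm /var/lib/mongo/mongod.lock
    chown -R mongod:mongod /var/lib/mongo    # service user may be "mongodb" depending on the package
    mongod --dbpath /var/lib/mongo --repair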
[12:48:17] <Ben_1> does "setOnInsert" overwrite values that already exist?
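For reference, $setOnInsert only takes effect when an upsert inserts a new document; it never overwrites values on an existing document. A minimal shell sketch with made-up collection and field names:

    // first upsert inserts: both fields are written
    db.items.update({_id: 1}, {$set: {qty: 5}, $setOnInsert: {createdAt: new Date()}}, {upsert: true})
    // second upsert matches the existing doc: $set applies, $setOnInsert is ignored
    db.items.update({_id: 1}, {$set: {qty: 7}, $setOnInsert: {createdAt: new Date()}}, {upsert: true})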
[12:51:56] <vista> Hi, can someone tell me more about the "silent data loss" scenario that might occur when using MongoDB? What are the prerequisites for such a scenario?
[12:56:05] <kurushiyama> vista Huh? Where did you hear of that?
[12:56:36] <kurushiyama> vista Are you referring to the blog post that MongoDB does not return all queried data?
[12:58:16] <vista> I heard it on various sites on the Internet
[12:58:29] <kurushiyama> vista Well. Ok, let's start.
[12:58:31] <vista> I have no idea if the claims are true or not
[12:58:38] <kurushiyama> vista Have you read in the docs a bit?
[12:58:58] <vista> we are currently considering our options regarding the choice of database
[12:59:02] <vista> A *bit*
[12:59:23] <kurushiyama> Ok.
[12:59:37] <kurushiyama> Do you know the notion of "writeConcern"?
[13:01:02] <vista> I have not delved into the specifics, but I have heard about it
[13:01:43] <kurushiyama> Ok, here is how it works: Be it a standalone or replica set you are connecting to, you can specify a writeConcern (by default and per request).
[13:02:05] <kurushiyama> Let us take a 3 member replica set.
[13:02:37] <kurushiyama> If you set the write concern to "majority", a write operation would only return successfully if the data was written to at least 2 nodes.
[13:03:12] <kurushiyama> Now, here is the problem: until 2.6, the default write concern was 0
[13:03:23] <kurushiyama> Basically, a fire and forget write.
[13:03:48] <kurushiyama> Which _was_ documented, btw – but a lot of people did not bother to read the docs.
[13:04:47] <vista> I see
[13:05:13] <kurushiyama> As of 2.6, the default write concern is 1
[13:05:58] <kurushiyama> Now, within our replica set, there are still some edge cases in which writes may not be directly accessible (though recoverable).
[13:06:42] <kurushiyama> Say we write with a writeConcern of 1: our primary accepts and acknowledges the write, but before the secondaries get the new data, the primary goes down.
[13:07:12] <kurushiyama> One of the secondaries gets elected new primary, your application can write.
[13:07:36] <kurushiyama> Now, the old primary comes up again and has data the others do not know about.
[13:08:42] <kurushiyama> Since those write operations might already be obsolete, or documents even be deleted, it won't simply apply those write operations to the data set. It saves them as so-called replays.
[13:09:08] <kurushiyama> Those replays have to be manually reviewed and applied.
[13:10:29] <kurushiyama> Mind you, this is only the case with a writeConcern of 1 and only when the primary goes down before writes made it to a secondary. With a higher writeConcern or when the writes made it to at least one secondary before the primary goes down, everything is fine.
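A minimal shell sketch of setting the write concern per request, as described above (collection and field names are made up):

    // acknowledged by a majority of the replica set members before the call returns
    db.events.insertOne(
        {ts: new Date(), value: 42},
        {writeConcern: {w: "majority", wtimeout: 5000}}
    )
    // w: 1 (the post-2.6 default) only waits for the primary's acknowledgement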
[13:12:07] <kurushiyama> vista In case you have any questions or do not understand a term I use, please ask.
[13:21:50] <jayjo> I am trying to upload data into mongodb, and I believe I have some illegal characters (?). Is it the case that I just can't have $ or . in the keys of the document? Can I have '\' in values?
[13:22:19] <jayjo> Or do I need to set the encoding before the upload?
[13:25:42] <jayjo> I ran a python script to upload all of the data and catch the lines that caused problems. I wrote them to a log, in order to attempt to upload them afterwards. Here is a snippet of the failed json: https://bpaste.net/show/41764f214b90
[13:26:53] <jayjo> The first few lines have a '.' in the key value. So, in my upload script I need to modify the json, is that right? I think the last rows failed as well because the \ is not escaped? Is that the case? I'm not getting meaningful errors
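For illustration, on the server versions current at the time (pre-3.6), keys containing '.' or starting with '$' are rejected, while '\' in values is fine as long as it is escaped in the JSON source; a shell sketch with made-up data:

    db.imports.insert({"price.usd": 10})        // fails: dotted field names are not valid for storage
    db.imports.insert({path: "C:\\temp\\file"}) // fine: the backslash just has to be escaped in the JSON itself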
[13:30:24] <vista> thank you, kurushiyama
[13:31:09] <kurushiyama> jayjo In general, you should encode. I tend to use base64 in this case.
[13:31:35] <kurushiyama> vista May I ask what you want to use MongoDB for?
[13:32:31] <jayjo> kurushiyama: shoot... so I need to convert to UTF-8 on the python side, and re-upload the data I have already uploaded without problem? Or just encode the problem data and begin doing it for all future data?
[13:32:35] <vista> long-term data storage
[13:32:41] <vista> data integrity is important for us
[13:32:50] <vista> but so is performance on json-like structures
[13:33:15] <kurushiyama> jayjo Since you actually want to access them all the same way, I'd guess the former.
[13:33:49] <kurushiyama> vista Well, do you want to store them as documents or files?
[13:34:13] <vista> documents
[13:35:22] <jayjo> OK - if I'm going to redo this... is there a concern with base64 and BSON? Why do you prefer base64?
[13:35:25] <kurushiyama> vista Are you planning to have your own cluster? What data sizes are we talking about? Will the data get outdated?
[13:36:13] <kurushiyama> jayjo I use base64 when I store binary data in fields...
[13:36:28] <vista> per entry, not that big, about ~2000B at max
[13:36:38] <vista> and no, the data will never get outdated
[13:36:54] <vista> long-term data storage => long-term data archival
[13:37:05] <kurushiyama> How many entries/day? Are we talking of time series data?
[13:37:41] <vista> time series data, yes, and around 5-6 entries per second
[13:38:20] <vista> or perhaps about ~500k per day
[13:39:05] <kurushiyama> vista 1GB/day, basically.
[13:39:12] <kurushiyama> No.
[13:39:14] <kurushiyama> more
[13:40:15] <kurushiyama> vista But we are talking of a more write-heavy application, and most likely some aggregations?
[13:41:13] <vista> aggregations, as in?
[13:42:14] <kurushiyama> vista Say you want to count the instances of a value occurring. Or the average of a field over all documents. Something like this.
[13:42:35] <kurushiyama> vista Stuff you would typically do with map/reduce.
[13:43:51] <vista> no, write-heavy
[13:45:29] <kurushiyama> vista Well, it is not mutually exclusive. How is your demand on uptime? Well, I know, always up, but realistically?
[13:48:19] <mroman> I have documents like {"objects":{ "k1" : { ... }, "k2" : { ... }, "q0" : { ... }}} and I'd like a query that returns me all the "names" of the things beneath "objects"
[13:48:28] <mroman> (i.e. it should return "k1,k2,q0")
[13:48:34] <Derick> mroman: you can't do that
[13:48:46] <Derick> mroman: you need to change your schema so that you turn these keys into values
[13:51:56] <jayjo> kurushiyama: sorry - why would I choose base64 over utf-8 if I'm generally storing strings?
[13:52:30] <mroman> {"instruments" : { "k1" : { "density" : 0.9, "temp" : 30 }, "k2" : { "pressure" : 1109 }}, "location" : "VVDB", timestamp : ISODate(...)} <- I have multiple such entries and I somehow want to query all instruments I have ever gotten any data from at location "VVDB" for example
[13:52:59] <mroman> (I can obviously just query for everything where "location == VVDB" and then use some application code to filter out the instruments)
[13:53:21] <Derick> jayjo: you won't...
[13:53:39] <jayjo> Derick: so I should just use utf-8
[13:53:42] <Derick> sure
[13:53:57] <mroman> basically for instrument in row['instruments']: all_instruments.add(instrument) and I'm done
[13:54:18] <mroman> (and do that for row in rows: )
[13:54:25] <Derick> mroman: instruments, shouldn't that be a collection then?
[13:54:45] <mroman> what do you mean by that?
[13:54:56] <Derick> { "k1" : { "density" : 0.9, "temp" : 30 } should really be: { instrument: "k1", "density" : 0.9, "temp" : 30 }
[13:55:03] <jayjo> ok - but isn't utf-8 the default? If the data uploaded without a problem, is that data already in utf-8? and I only need to specify the encoding for all future data?
[13:55:16] <Derick> jayjo: MongoDB's strings should always be UTF-8
[13:55:23] <mroman> Derick: so you get one row per instrument?
[13:55:28] <Derick> it can't store any others (in the normal string type)
[13:55:37] <Derick> mroman: yes. We call rows documents though
[13:55:42] <Derick> how many documents do you have?
[13:56:13] <mroman> then I could just do a distinct query for instrument where location == VVDB
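A sketch of that distinct query, assuming the reshaped layout Derick suggests (one document per instrument reading, with location copied into each document; the collection name is made up):

    db.instruments.distinct("instrument", {location: "VVDB"})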
[13:56:19] <mroman> I'm aware of that.
[13:56:31] <mroman> but it's not my database :D
[13:57:47] <jayjo> just to verify then, I wrote an ETL script to upload this data. I didn't encode it. If it worked, it encoded into utf-8 by default(?) Now I can take the lines that I have errors for, explicitly encode them in utf-8, and I should be good?
[13:59:12] <mroman> It could probably be done in a mapreduce function, if I had access to create one server side.
[13:59:32] <mroman> (have to check the docs if I can submit my own mapreduce functions with just read privileges)
[13:59:33] <Derick> jayjo: depends on what your driver does...
[13:59:40] <Derick> mroman: m/r is a killer for performance
[13:59:42] <jayjo> I'm using pymongo
[13:59:51] <Derick> jayjo: sorry, I don't know that one
[13:59:59] <mroman> Derick: hu?
[14:00:46] <Derick> mroman: Map/Reduce is not very performant. Much better to use the aggregation framework, but you really need to consider fixing your schema. Apparently it's not designed with data access patterns in mind (or perhaps not for all of them in mind)
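As a sketch of what Derick means, the same question answered with the aggregation framework on the reshaped one-document-per-reading layout (collection name is made up):

    db.instruments.aggregate([
        {$match: {location: "VVDB"}},
        {$group: {_id: "$instrument"}}   // one result per distinct instrument seen at VVDB
    ])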
[14:02:19] <kurushiyama> jayjo As far as I can see, we are talking of binary data you intend to store as a string, right?
[14:06:42] <mroman> It was designed to have as few rows as possible :)
[14:06:55] <mroman> to eliminate redundancy
[14:07:30] <mroman> if you have one row/document per instrument reading, you have location and timestamp duplicated n times
[14:26:27] <kurushiyama> mroman Redundancy is your friend, actually.
[14:31:26] <gzoo> How would I add a document with an NUUID instead of ObjectId as the key?
[14:32:03] <gzoo> I have a collection with several documents keyed with an NUUID (seen through Robomongo) and I need to add another. Want a NUUID for consistency
[14:33:47] <kurushiyama> gzoo Well, generate one.
[14:35:52] <gzoo> kurushiyama, I'm really new to this so I'd need some help. Do I do something like NUUID(ObjectId())? It didn't work
[14:37:15] <kurushiyama> gzoo I am not aware that mongodb can generate NUUIDs. An _id field can have _any_ value (as long as it is unique within a collection), so I guess the NUUIDs are generated client side.
[14:38:28] <kurushiyama> gzoo http://stackoverflow.com/a/31836351/1296707
[14:41:03] <gzoo> kurushiyama, hmm. ok. How does mongodb differentiate between, say, NUUID, ObjectId and CSUUID, so that it shows the type?
[14:41:15] <gzoo> kurushiyama, I'm also checking your link
[14:42:05] <kurushiyama> gzoo MongoDB does not. It is just a field like any other. So it can be either of type objectId, string, even bool (which would make a _very_ small collection).
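For illustration, any unique value works as _id; the shell's UUID() helper produces a standard (subtype 4) UUID, while the legacy .NET NUUID representation (subtype 3) is normally generated client-side by the C# driver, as the Stack Overflow link above discusses. A sketch with made-up names:

    // _id just has to be unique; a standard UUID from the shell helper works
    db.things.insert({_id: UUID("0123456789abcdef0123456789abcdef"), name: "example"})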
[14:45:38] <gzoo> kurushiyama, so the NUUID I see in my Robomongo are just a representation built into Robomongo?
[14:46:48] <Lope> When will the 3.3.9 fixes be made available for the Ubuntu apt repo? https://github.com/mongodb/mongo/commit/710159c9602a6738e6455cfb26bc2d70a0454ae2
[14:47:43] <kurushiyama> gzoo Most likely. GUI tools have one major disadvantage – they decouple you from what is really going on. _Especially_ when you start with MongoDB, my suggestion is to use the shell.
[14:48:55] <gzoo> kurushiyama, well thank you for now. You did clarify some things.
[15:04:55] <Lope> kurushiyama: some people make the argument that CLI tools do the same.
[15:05:32] <Lope> kurushiyama: but generally I agree. GUI tools tend to be overly simplistic and leave out a lot of details.
[15:06:07] <kurushiyama> Lope Well, it does not get any closer to the data except for manipulating the datafiles directly, which _maybe_ is not the best idea... ;)
[15:06:37] <Lope> kurushiyama: every GUI/CLI tool pair is unique.
[15:20:56] <spleen> Hello all, I am new to mongodb and I have some questions
[15:21:47] <spleen> I have a sharded cluster with some sharded collections
[15:22:22] <spleen> I split collection-A and moved it to a new shard
[15:22:53] <spleen> sh.splitAt('collection-A',{ "_id" : { "d" : NumberLong(20160626)}})
[15:23:43] <spleen> sh.moveChunk('collection-A',{ "_id" : { "d" : NumberLong(20160626)}},'MyNewShard')
[15:24:27] <spleen> Today, I have a doubt whether collection-A is being written to MyNewShard...
[15:25:08] <spleen> it is critical because my OldShard is nearly full on disk space...
[15:25:25] <spleen> How could I ensure that my operations are OK?
[15:25:55] <spleen> Thanks for your help
[15:26:04] <kurushiyama> spleen Is this a monotonically increasing NumberLong?
[15:27:04] <kurushiyama> spleen Like an AUTOINCREMENT from SQL world?
[15:28:02] <spleen> kurushiyama, yep
[15:28:12] <kurushiyama> spleen Bad news.
[15:28:27] <kurushiyama> spleen You could not have made a worse mistake.
[15:28:53] <spleen> kurushiyama, the NumberLong is the date of the day I made the operation
[15:29:35] <spleen> kurushiyama, what you mean
[15:30:04] <kurushiyama> spleen Here is the problem. When you shard your collection, it gets split into logical units called chunks.
[15:30:29] <kurushiyama> spleen Those chunks hold a range of documents.
[15:30:53] <spleen> yes
[15:31:11] <kurushiyama> spleen Each shard gets assigned a range of shard keys. Let's keep it simple and say you have two shards.
[15:31:45] <kurushiyama> spleen One is assigned the range from -infinity to X, the other from X+1 to +infinity.
[15:32:07] <spleen> ok
[15:32:15] <kurushiyama> Let us say X is 20160626
[15:32:38] <kurushiyama> Or better 20160625
[15:32:58] <kurushiyama> so each and every new document gets written to shard2
[15:33:24] <kurushiyama> spleen since it is assigned the key range from 20160626 to +infinity.
[15:34:04] <kurushiyama> spleen Now, after a while the chunk gets too big and is split.
[15:34:49] <kurushiyama> spleen And again, and again. Now, when there is a certain mismatch between the number of chunks on shard1 and shard2, the balancer kicks in and moves chunks from shard2 to shard1
[15:35:22] <kurushiyama> spleen The metadata is updated accordingly, and all live happily ever after... ...except, you don't.
[15:35:58] <kurushiyama> spleen Say now shard1 is assigned the range from -infinity to 20160627
[15:36:12] <kurushiyama> spleen And again, all new documents are only written to shard2
[15:36:21] <kurushiyama> spleen repeating the process.
[15:37:04] <kurushiyama> spleen In a worst case scenario (and I have seen those), data comes in faster than the balancer can ... ...well, balance out the mess.
[15:37:28] <kurushiyama> spleen Which is very likely what you are experiencing now.
[15:38:18] <kurushiyama> spleen And here comes the really, really bad news: you cannot change the shard key.
[15:39:40] <kurushiyama> spleen You basically have to set up a new sharded collection, with a better shard key selected, and migrate the data.
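A sketch of the migration kurushiyama describes, using a hashed shard key to avoid the monotonically-increasing hot-shard problem (database, collection, and key names are made up, and whether a hashed key fits the query pattern has to be checked first):

    sh.enableSharding("mydb")
    sh.shardCollection("mydb.collectionA_v2", {_id: "hashed"})
    // then copy the data over, e.g. with mongodump/mongorestore or an application-level migration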
[15:42:43] <kurushiyama> spleen You ok or were you hit by a stroke?
[15:43:20] <kurushiyama> Ouch...
[18:03:39] <Industrial> Hi!
[18:04:04] <Industrial> Say I have a { ..., pageViews: [ { ..., time: 0 } ] }
[18:04:14] <Industrial> How do I query for all times that are greater than 0?
[18:04:23] <Industrial> `db.getCollection('processed').find({ pageViews: [{ time: { $eq: 0 } }] })` returns 0
[18:04:40] <Industrial> so I reckon a $gt will not work either
[18:15:00] <teprrr> .find({ pageviews: { time: { $gt: 0 } } }) ?
[18:15:08] <teprrr> that is, without this []
[18:15:37] <teprrr> I'm not sure, but that'd be my guess. I've been told here before that it's not very clever to do deep nesting though :)
[18:16:56] <cheeser> Industrial: https://docs.mongodb.com/manual/tutorial/query-documents/#query-on-arrays
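Following the linked docs, matching on a field of the array elements uses dot notation (or $elemMatch when several conditions must hold on the same element); sticking with the collection name from the question:

    // any document with at least one pageViews element whose time is greater than 0
    db.getCollection('processed').find({"pageViews.time": {$gt: 0}})
    // equivalent here; required once multiple conditions apply to the same element
    db.getCollection('processed').find({pageViews: {$elemMatch: {time: {$gt: 0}}}})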
[20:26:25] <Industrial> join #loopback
[20:26:26] <Industrial> ups
[21:55:57] <TommyTheKid> I have inherited a mongodb cluster, that is in docker containers... Is there an advantage of having mongo server running in a docker container? The "host" is an EC2 node. The nodes were built over 10 months ago, and are still working, but they haven't been touched for updates, and that scares me.
[21:59:07] <bjpenn> I have a lot of open connections on mongo that's killing the db load. Anyone know how I can kill those connections?
[21:59:11] <bjpenn> the least impactful way?
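One common approach, sketched here, targets expensive operations rather than the sockets themselves: list them with db.currentOp() and stop them with db.killOp() (the threshold and opid are made up):

    db.currentOp({"secs_running": {$gte: 5}})   // list operations running for 5+ seconds
    db.killOp(12345)                            // kill one of them by its opid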
[22:21:23] <oky> TommyTheKid: dockerdockerdockerdockerdockerdocker
[22:21:37] <TommyTheKid> oky: :D