[02:26:46] <sx> Hi, I did not change any settings, but somehow my collection is allowing for duplicate "_id" values... not sure how to fix.
[02:57:32] <sx> I create a new collection, "test", and then add a blank document. An "_id" is generated as expected. Then I insert another document with the same "_id" and it works fine. Now I have a collision on what I thought was a uniquely constrained index. Is this expected behavior?
[02:58:47] <Boomtime> @sx: are you doing this in the shell?
[02:59:12] <Boomtime> can you provide all the outputs, in gist/pastebin, including a find showing the duplicates?
[03:00:11] <sx> did I miss more than one message? It is not expected behavior right?
[03:01:07] <sx> Boomtime: I was working both in shell and in python shell and in adminMongo web gui, so I will try to reproduce in a single env
[03:52:56] <sx> Boomtime: I figured it out, the admin interface I was using was not differentiating between "ObjectId(xx)" and "xx"
[03:53:29] <sx> in reality the ObjectIds were unique, but I was able to override them with strings (which I think is normal)
[04:05:31] <Boomtime> @sx: yep, that sort of thing is somewhat common - always use the mongo shell to verify
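A minimal shell sketch of what happened (collection name and ObjectId value invented): the _id index is unique, but an ObjectId and the string spelling of the same hex digits are different BSON values, so both inserts succeed.

    db.test.insert({ _id: ObjectId("570e1fc7e4b0c6f7a3b4c5d6") })
    db.test.insert({ _id: "570e1fc7e4b0c6f7a3b4c5d6" })  // no collision: a string is not an ObjectId
    db.test.find()
    // { "_id" : ObjectId("570e1fc7e4b0c6f7a3b4c5d6") }
    // { "_id" : "570e1fc7e4b0c6f7a3b4c5d6" }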
[04:07:06] <Boomtime> and yes, _id is mostly a field like any other, it has the unique property of being immutable, and it may not be an array, but otherwise it behaves like a regular field
[04:07:38] <Boomtime> (actually, immutable is not strictly unique to _id, a shard key index gains this super-power automatically as well)
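And a quick sketch of that immutability in the shell (values invented):

    db.test.insert({ _id: 1, name: "a" })
    db.test.update({ _id: 1 }, { $set: { _id: 2 } })
    // rejected: the server refuses to modify the immutable field '_id';
    // to "change" an _id you insert a new doc and remove the old one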
[06:14:37] <Spritzgebaeck> hello, is it possible to store the mongo db data across multiple files? our backup guy asks
[06:17:29] <Boomtime> MongoDB uses a path, specified by the dbpath option, and it expects to have complete ownership of the files in that path - there are multiple files for both the WT and MMAP storage engines
[06:18:52] <Boomtime> the simplest method of backing up a mongodb instance is to shut it down and copy the entire dbpath content - if you have a replica-set you can shut down a secondary, which should have little or no operational impact
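A sketch of that procedure, assuming a systemd-managed mongod with a dbpath of /var/lib/mongodb (both vary by install):

    sudo systemctl stop mongod                               # on a secondary, so the set keeps serving
    sudo cp -a /var/lib/mongodb /backup/mongodb-$(date +%F)  # copy the whole dbpath
    sudo systemctl start mongod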
[06:21:21] <kurushiyama> Spritzgebaeck: I assume he wants to have 2 GB files?
[06:22:52] <kurushiyama> Boomtime: btw: I am not too sure whether you can't change the value of a shard key in a doc. I do not think it is immutable. I haven't tested it, but it should be possible. However, you can not change the designated shard key.
[06:23:18] <kurushiyama> Boomtime: for a sharded collection, that is.
[06:23:24] <Spritzgebaeck> kurushiyama: yeah, something like this
[06:31:20] <Spritzgebaeck> also thank you Boomtime
[06:31:47] <kurushiyama> Spritzgebaeck: I am not aware of any halfway modern filesystem, though, that can't handle file sizes far exceeding most of today's storage capacities.
[06:35:04] <Spritzgebaeck> I think it's more about backing up only the one changed file and not the whole collection file, but this is not my problem :D so the backup guys have to save all the files
[06:36:21] <kurushiyama> Spritzgebaeck: Uhm. The max file size for NTFS is 16TB... And no, they can not simply copy the files over.
[08:20:55] <mroman> Anybody tried versioning documents in mongodb?
[08:21:54] <mroman> for example you have data sets, you analyze them, produce results. but some data sets might turn out to be wrong later.
[08:23:27] <kurushiyama> mroman: So, remove them and rerun the analysis? I fail to see why versioning is needed.
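For what it's worth, a common versioning pattern (a sketch only, not something proposed in the channel; collection and value names are invented) copies the old revision into a history collection before updating in place:

    // shell sketch: archive the old revision, then bump the version
    var someId = ObjectId("570e1fc7e4b0c6f7a3b4c5d7");   // stand-in for the dataset being corrected
    var newData = { points: [1, 2, 3] };                 // stand-in for the corrected data
    var old = db.datasets.findOne({ _id: someId });
    old.datasetId = old._id;  // remember which dataset this revision belongs to
    delete old._id;           // let the history collection assign its own _id
    db.datasets_history.insert(old);
    db.datasets.update({ _id: someId, version: old.version },  // optimistic lock on version
                       { $set: { data: newData }, $inc: { version: 1 } });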
[08:24:11] <kurushiyama> mroman: Good morning, bt.
[08:57:51] <master_op> but I have some questions before starting to use this tool
[08:58:34] <master_op> will the data inserted into mongodb be copied into elasticsearch, or just indexed?
[08:59:20] <kurushiyama> master_op: Better to ask there, then. Personally, I suggest https://github.com/richardwilly98/elasticsearch-river-mongodb
[08:59:58] <master_op> I'm not talking about the river plugin
[09:00:40] <master_op> I'm talking about mongo-connector, which is under mongodb-labs on github
[09:03:43] <kurushiyama> master_op: Believe it or not, I _can_ read. It is just that I'd rather suggest using the river if you want to connect to elasticsearch. Mainly since I know it works.
[09:04:20] <kurushiyama> master_op: With an emphasis on "know", which I can not say for the connector ;)
[09:07:23] <master_op> thank you kurushiyama, but the river doesn't support ES 2.x. anyway, thank you
[09:08:33] <kurushiyama> master_op: https://github.com/mongodb-labs/mongo-connector/wiki/Usage-with-ElasticSearch#elasticsearch-indexes-mappings-and-types As I read it, it is only an index ;)
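For reference, mongo-connector is typically launched along these lines (from memory, so verify the flags against the wiki above; hosts and ports are placeholders). Since it tails the oplog, the source mongod must be running as part of a replica set:

    # stream from a local mongod into a local Elasticsearch instance
    mongo-connector -m localhost:27017 -t localhost:9200 -d elastic_doc_manager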
[09:09:14] <Derick> didn't ES 2.x do away with rivers?
[09:09:54] <kurushiyama> Derick: May well be, never used 2.x
[09:10:04] <master_op> thank you, i will test it now
[09:10:24] <Derick> I heard some rumours - I don't recall exactly
[09:10:31] <kurushiyama> Derick: Rivers were deprecated in Elasticsearch 1.5 and removed in Elasticsearch 2.0.
[09:11:35] <kurushiyama> A shame, if you ask me. But that's the way it is.
[12:08:02] <Keksike> I'm updating our servers' mongodb from 2.6 to 3.2. How can I make sure that the mongo drivers (our software is in Clojure) are up to date?
[12:30:51] <kurushiyama> Keksike: Uhm... You use Monger?
[12:32:28] <kurushiyama> Keksike: Maybe you should ask them to maintain a compat matrix. Unless you have verified compatibility, I would not continue with the update. The wire protocol has changed, iirc.
[12:37:35] <kurushiyama> Keksike: Or maybe you can get a hold of Michael somewhere and ask him directly.
[12:53:15] <aguilbau> Hi, I want to update one of 2 fields in a document, either one or the other but not both. They can be recognized by their value. Is it possible in a single query?
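Nobody picked this up in the channel, but a common workaround (field names and values invented) is two updates whose filters are mutually exclusive, so at most one of them touches a given document:

    // rewrite fieldA only where it holds the marker value...
    db.coll.update({ fieldA: "old" }, { $set: { fieldA: "new" } })
    // ...and fieldB only on docs where fieldA did not match (never both)
    db.coll.update({ fieldA: { $ne: "old" }, fieldB: "old" },
                   { $set: { fieldB: "new" } })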
[13:03:55] <kurushiyama> markizano: cheeser > a node can only be a member of one replica set
[13:04:12] <kurushiyama> markizano: A mongodb node, that is.
[13:04:45] <markizano> right - we were debugging a thing last night, and this was unrelated to what we were discussing, but it was a question I was curious about... so my idea that you'd need 1 arbiter per replSet was correct.
[13:05:00] <kurushiyama> markizano: And skimping on an arbiter is really, really the wrong thing to do. What are even SoftLayer's prices for a cheapo VPS nowadays? Like 10 bucks?
[13:05:28] <markizano> kurushiyama: nah, price for a vps wasn't a concern - we've got that covered.
[13:05:36] <markizano> it was more along the lines of the mechanics of it - does the software allow this?
[13:06:45] <kurushiyama> markizano: The reason why I would not do it is that when the dual-use arbiter machine goes down, the chances of both of your replsets becoming unavailable increase.
[13:06:59] <markizano> oh - good point, kurushiyama
[13:08:11] <kurushiyama> markizano: But one question crosses my mind: Why do you have two replsets for the same DB, even when they don't overlap? I'd rather shard, then.
[13:10:03] <kurushiyama> markizano: Last but not least, I am not too much of a fan of 2+1 replsets anyway. You lose redundancy during maintenance.
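For completeness, adding an arbiter per replica set is a one-liner from a shell connected to that set's primary (hostname invented):

    // each replica set needs its own arbiter member
    rs.addArb("arbiter1.example.net:27017")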
[13:44:29] <Ange7> I'm sorry but I already had this error and I still don't understand it: I try to update one document in my collection: collection.update({_id: A}, {field: foo}) with the option upsert=true, but I get this error: duplicate key error collection: db.collectionName index: _id_ dup key : { A }
[13:44:41] <Ange7> Ok, I want to update this document, so I don't understand why there is a duplicate key error?
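The question went unanswered in the log, but one likely cause (an educated guess): two clients upserting {_id: A} at the same time can both miss the match and both attempt the insert, and the loser sees the duplicate key error. The usual remedy is $set plus a retry:

    // $set avoids replacing the whole document; if a concurrent upsert
    // wins the race and this one fails with a duplicate key error,
    // re-running it will match the now-existing doc and update it instead
    db.collectionName.update({ _id: A }, { $set: { field: foo } }, { upsert: true })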
[15:05:50] <mroman> hm newer mongoimport versions don't show documents/second
[15:05:59] <mroman> but just list how much MB they have imported so far
[15:52:38] <hiro`> Hey all. Having trouble getting mongo to start up. I can run it okay as root, but otherwise, `mongod` will fail to properly startup, giving me errors about "wiredtiger_open: /data/db/WiredTiger.lock: Permission denied"
[15:53:09] <hiro`> Although WiredTiger.lock seems to have all the proper permissions.
[15:54:47] <StephenLynx> start it using a service instead of booting it manually.
[15:54:55] <StephenLynx> what is your init system?
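The usual fix, as a sketch (the service account is often called mongodb, but it varies by distro): running mongod as root once leaves root-owned files in the dbpath, so hand them back to the service user and start through the init system:

    sudo chown -R mongodb:mongodb /data/db   # give the dbpath back to the service user
    sudo systemctl start mongod              # or: sudo service mongod start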
[17:54:05] <Gloomy> I'm trying to manipulate a mongodb object client-side, any idea why it looks like this? https://www.dropbox.com/s/qtganali7kgcbwy/Screenshot%202016-04-13%2019.45.34.png?dl=0
[17:54:26] <Gloomy> is it normal? why doesn't it have a normal JSON object structure?
[17:54:42] <Gloomy> (Sorry if it's a dumb question, I'm really new to all this)
[17:55:41] <kurushiyama> Gloomy: Ok, my first advice then: Ditch all graphical tools. All and any. Use the mongo shell. Nothing else.
[17:56:14] <StephenLynx> why is it an image to begin with?
[17:56:44] <kurushiyama> StephenLynx: Hard to make text out of a graphical tool, I assume ;)
[17:56:54] <Gloomy> Hmm, I'm building a website, kind of hard without a browser :-) that's a screenshot from firebug, maybe I should have begun by stating that
[17:57:50] <StephenLynx> how is any of that related to the database?
[17:57:57] <kurushiyama> Gloomy: Well, firebug doesn't help us here.
[17:58:09] <StephenLynx> yeah, it already went through your back-end
[17:58:14] <StephenLynx> lord knows what happened there.
[17:59:15] <StephenLynx> in the future, use the terminal client with either findOne() or .pretty() to format the output of the data you want to show and paste the text somewhere
[17:59:23] <StephenLynx> I saw it was an image linked and didn't even bother
[18:00:28] <kurushiyama> Gloomy: Do not take it wrong, but not all of us use a MEAN(-ish) stack. Personally, I avoid EAN like the devil avoids holy water.
[18:02:14] <Gloomy> Well that's the thing, I don't know where exactly things get messed up. The data seems to be stored fine in the database, but the object structure is a bit weird once it gets out from ... mongo? monk? express?
[18:02:47] <StephenLynx> youre using linux on the machine that is holding the db, right?
[18:02:52] <Gloomy> Why are angular and express so bad?
[18:03:00] <kurushiyama> Gloomy: Ok, have you _ever_ connected to your MongoDB instance using the shell?
[18:04:18] <Gloomy> kurushiyama Yes, didn't dive in very deep though
[18:04:27] <StephenLynx> node's api is already a high level tool for you to work with, as is the browser JS api.
[18:04:46] <StephenLynx> that makes both of them redundant, but now you are adding a whole new layer of stuff to break
[18:05:09] <StephenLynx> not to mention the performance loss.
[18:05:20] <cheeser> that assumes node is your back end, for one.
[18:05:30] <StephenLynx> yes, because I was talking about express.
[18:05:52] <StephenLynx> but you can say the same for any other web framework for scripted runtime environments aimed at web.
[18:06:08] <StephenLynx> I wouldn't say the same for a framework for C, for example.
[18:06:21] <kurushiyama> Gloomy: Ok, to join the choir. I am kind of with StephenLynx here. People take MEAN and think it will be easy to implement something and that basically it will do itself. Nothing could be farther from the truth, imho, since it decouples you from what is going on in the background. Is it possible to achieve awesome results with MEAN? It sure is. But only if you know your tools.
[18:06:56] <kurushiyama> Down to the bottom, that is.
[18:07:04] <StephenLynx> that too, people skip the fundamentals of the tools they are using and their software ends up as a black box to themselves.
[18:07:14] <StephenLynx> something goes wrong and they have no idea how to fix it.
[18:08:11] <StephenLynx> the whole idea of full stacks is retarded, too. it makes no sense to talk about the front-end and the database at the same time.
[18:08:12] <kurushiyama> Gloomy: Ok, enough bashing. First, we need to see your structure. Please pastebin a sample document. You can get one with db.yourcoll.findOne() (in the shell, findOne() output is already formatted - .pretty() is for find() cursors)
[18:08:17] <Gloomy> Here is the text version directly from the mongo shell. It looks similar to what I had in Firebug, it's just not the JSON object structure I'm used to?
[18:12:21] <Gloomy> And yes, I see what you mean (no pun intended ;-) ). I find it much easier to learn by breaking things, though, than by going through the whole theory first
[18:12:30] <StephenLynx> it's not through the theory
[18:12:35] <StephenLynx> it's through the actual software
[18:27:57] <kurushiyama> cheeser: Rather -- an Ubuntu. With GUI. Guess for mlogvis
[18:27:57] <Gloomy> Ok, perfect. One last question, I have no knowledge whatsoever on databases. Is learning a noSQL DB as first approach detrimental in any way?
[18:28:52] <kurushiyama> Gloomy: Imho, it is better to learn a few NoSQL databases first, and then SQL, which imho is the devil's work, anyway (and always was).
[18:29:35] <Gloomy> The devil sure seems to be lurking in a lot of places.
[18:30:26] <kurushiyama> Gloomy: I tend to suggest MongoDB for doc databases (unsurprisingly), Cayley as a graph database (since it builds on your MongoDB knowledge), Redis as a KV store, and Cassandra, which is sort of a type of its own.
[18:30:45] <StephenLynx> no. a database is just a piece of software holding lots of data.
[18:31:04] <StephenLynx> just don't end up thinking there is the One True Way for databases.
[18:31:20] <StephenLynx> other kind of databases also have their use.
[18:35:41] <StephenLynx> a no-sql relational db would be the tits
[18:36:14] <StephenLynx> while I don't dislike relational dbs, I think SQL has had its time.
[18:36:26] <StephenLynx> and using two languages in the same code is unpleasant, to say the least
[18:39:27] <uuanton> anyone know how to fix this on reload: Job for mongod.service failed. See 'systemctl status mongod.service' and 'journalctl -xn' for details ?
[18:39:40] <uuanton> after machine reboot it works
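The error message itself names the next steps; a short triage sketch (unit name taken from the message):

    systemctl status mongod.service   # exit status plus the last few log lines
    journalctl -xn                    # the surrounding journal entries
    # frequent culprits after a failed start: dbpath permissions or a stale lock file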
[18:50:55] <kurushiyama> StephenLynx: Well, not in the RDBMS sense, but you can relate data app-side, right? And within limited capabilities, $lookup does help.
[18:51:13] <StephenLynx> the database is not relational.
[18:51:14] <cheeser> the database supports references. it just doesn't enforce referential integrity.
[18:51:25] <StephenLynx> it doesn't support them if the reference is not resolved at the database.
[18:51:38] <StephenLynx> dbrefs are just syntactic sugar.
[18:52:09] <kurushiyama> Anyway, you can have similar approaches on cassandra. And the query language is SQLish
[18:52:34] <StephenLynx> of course you can have similar approaches on cassandra.
[18:52:42] <StephenLynx> any db would have to be broken for you not to be able to do that.
[18:53:01] <StephenLynx> "you can read a value and query based on this value"
[18:56:20] <kurushiyama> StephenLynx: See ;) But does SELECT * FROM numberOfRequests WHERE cluster IN ('cluster1', 'cluster2') read familiar?
[18:56:41] <StephenLynx> that is exactly what I want to avoid.
[18:57:02] <StephenLynx> and again, I am not talking about syntactic sugar.
[18:57:19] <StephenLynx> I am not interested in how I interface with the db, but in how the db operates internally.
[18:59:02] <kurushiyama> StephenLynx: That was not really clear to me. There was a good screencast on why SQL clustering is pretty hard to do... I can't really remember where I saw it. Darn
[19:03:38] <kurushiyama> cheeser: I'd guess that $lookup is best to be used when you already have narrowed down to (very) few docs and can save a $in query by using $lookup.
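A sketch of that pattern (collection and field names invented): $match narrows first, so only a handful of docs reach the $lookup stage, which then replaces a separate $in query from the app:

    db.orders.aggregate([
      { $match: { status: "open" } },      // narrow down first
      { $lookup: {                         // then join only the survivors
          from: "customers",
          localField: "customerId",
          foreignField: "_id",
          as: "customer"
      } }
    ])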
[19:05:28] <kurushiyama> StephenLynx: I do some SO for fun... ...and yeah, I am a bit morbid... You would not believe what people are complaining about...
[19:07:48] <kurushiyama> My current fav: "I do an aggregation on 7M docs, match 45k docs, generate a new field, sort by that field and all that takes 8 seconds!"
[19:09:23] <kurushiyama> kexmex: Time it takes to get this or that lock or ticket, iirc.
[19:09:38] <kexmex> kurushiyama : why would that be high spontaneously
[19:10:06] <kurushiyama> kexmex: It is _blazing_ fast, with less than 0.2 _milli_seconds per doc...
[19:11:03] <kurushiyama> kexmex: And this is just counting the matched ones...
[19:11:32] <kurushiyama> kexmex: A guy on SO complained, not me.
[19:11:33] <kexmex> match should be really really fast
[19:12:05] <kexmex> if indexes are good and are in memory i guess
[19:12:19] <kurushiyama> kexmex: Still, with 7M docs it is not negligible.
[19:13:20] <kurushiyama> kexmex: And then, the 45k docs weren't projected...
[19:14:21] <kurushiyama> kexmex: So unless those 45k docs happen to be in the working set, there are quite some read ops involved.
[19:18:21] <kexmex> kurushiyama : that part yea..IO
[19:21:18] <kurushiyama> Or to put it differently: selecting 45k docs out of a set of 7M, with missing docs read from disk, then sorted by an unindexed field which had to be created by an operation before the sort could even start. ;)