[01:04:08] <Kota> hey i was in kind of late this morning and no one was around, but i'm having an issue with a duplicate index on a collection. http://puu.sh/gN3tr/7ff013951f.png i've got two _id_ indexes ascending on _id and I can't get rid of either of them
[01:10:42] <Boomtime> Kota: can you check that via the mongo shell please?
[01:33:11] <Kota> Boomtime: confirmed via shell http://puu.sh/gN5xP/3a971f544c.png
[01:33:30] <Kota> field order variance, but otherwise identical
[01:43:15] <Kota> okay that's just messed up... i attempted to make another index, with a different name and different field and it duplicated my _id_ index yet again
[01:43:43] <Kota> i give up. indexes just hate me today http://puu.sh/gN6ik/b828469d0b.png
[02:29:30] <Kota> Mon Mar 23 19:21:42.132 DATABASE: yhr to dump/yhr Mon Mar 23 19:21:42.133 yhr.chat_pms to dump/yhr/chat_pms.bson Mon Mar 23 19:21:44.302 2064704 objects Mon Mar 23 19:21:44.302 Metadata for yhr.chat_pms to dump/yhr/chat_pms.metadata.json
[02:29:57] <joannac> you don't have a system.indexes.bson file?
[02:30:16] <Kota> oh i only exported that collection. one moment
[02:35:10] <Kota> joannac: i do not have an upload site handy, however I can send via email
[05:44:59] <Kota> GothAlice: https://gist.github.com/dabaer/e21f44ea7a5152525fd9 was what i used
[05:45:20] <GothAlice> Using futures, you get easy multiprocess parallelization. What isn't covered by the timings at the top is the resource utilization. Single-process was using ~33% of one core, ~11% on the mongod core. Four-process was using ~80% of each of four cores, 60% of one mongod core. (Much better utilization.)
[05:45:44] <GothAlice> Hmm, that'd be why you're not blocking at all. XD Each operation is quick, and you're not doing any batching at all.
[05:46:23] <GothAlice> (I really ought to be doing proper batching, but, eh… I'm lazy, and this is a rarely-run benchmark I use to test WiredTiger.)
[05:48:34] <Kota> i copied the data to the new collection, renamed the broken index collection, renamed the new collection to what the broken one was before, and now the renamed broken collection has two duplicate indexes for _id and the new collection has all of the broken collection's old indexes...
[05:48:34] <GothAlice> If the process is aborted part-way through for any reason, though, you'll understandably need to repair the indexes.
[05:49:02] <Kota> no i deleted the collection that resulted from copyTo()
[05:49:05] <GothAlice> Sounds like something somewhere is adding indexes automatically. Any ensure_index calls as import-time side effects?
[05:53:16] <GothAlice> (Heh, checking some of mine to make sure duplicate _id_ isn't actually normal, I notice that none of mine are named anything useful: t_p_1_t_o_1__id_1. wut.)
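The unhelpful-looking name GothAlice quotes is just MongoDB's default index naming: each key and its direction, joined with underscores. A tiny sketch of the convention (my own helper, not a driver API):

```python
def default_index_name(keys):
    """Mimic MongoDB's default index name: 'field_direction' pairs joined by '_'."""
    return '_'.join(f'{field}_{direction}' for field, direction in keys)

# An index on {t_p: 1, t_o: 1, _id: 1} gets the name GothAlice saw:
print(default_index_name([('t_p', 1), ('t_o', 1), ('_id', 1)]))
# t_p_1_t_o_1__id_1
```

Passing an explicit `name` when creating the index is the usual fix for readability.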
[05:53:44] <Kota> i'm going to close mongo and just use shell for the entire process
[06:02:47] <Kota> http://hastebin.com/atihuyexad.json - post rename snapshot...
[06:03:09] <GothAlice> You lost one just renaming the collection?
[06:03:26] <winem_> hi all. I guess some of you have already passed the m202 course (advanced deployments and operations). can you tell me if MMS is only part of the material for the first week, or for all weeks?
[06:04:14] <Kota> i thought this was just a normal okay message; renaming the new collection did not produce it, but that copy on the broken collection returned this error: http://hastebin.com/ucamevudef
[06:05:25] <GothAlice> Number of records alone isn't sufficient to say, this data is the same as this other data. ;)
[06:06:05] <Kota> i haven't verified every record obviously but at first glance the data is what should be there
[06:06:08] <GothAlice> That type of namespace error is a "bad sign" and warrants more scrutiny, and/or a --repair.
[06:07:06] <Kota> what should my next course of action be?
[06:07:47] <winem_> GothAlice, please tell me more about hashing the BSON documents. I would expect issues with timestamps or the id objects. I will wait until Kota is happy and has his issues fixed :)
[06:08:12] <Boomtime> Kota: what version did you say this was?
[06:08:21] <GothAlice> MongoDB out of the box doesn't validate the wholesale integrity of every BSON value it's handed. (Some things, such as the well-formedness of records to insert, it takes on faith unless an option is given to explicitly check.) This means it's fully possible to do some _really_ crazy things with your BSON documents, either on purpose, or accidentally.
[06:08:32] <Boomtime> yep, thanks, i thought 2.4.6 for some reason
[06:09:13] <GothAlice> For example, one could construct a single document with multiple _id values. (Due to the way the document is stored, repeated keys are technically allowable.)
[06:09:13] <Kota> is there a validator i can run it through?
[06:10:21] <GothAlice> Ah! Never mind, it _was_ 2.4 where they fixed that.
[06:10:30] <GothAlice> Thought it was 2.6. (They changed the default, apparently in 2.4, to check-by-default.)
[06:10:41] <Kota> well for now i'm going to update the application component that uses that table, since the new collection is fine, only when i rename it to that old name does it screw up
[06:11:02] <Kota> something's sticky with the indexes
[06:12:43] <Boomtime> indeed, that is what that assertion is checking
[06:13:08] <GothAlice> Huzzah. I assume the failing assertion means a record wasn't returned?
[06:13:21] <GothAlice> Oh, never mind, that's just on rename.
[06:13:30] <Boomtime> somehow, despite the collection not being there, the index assertion of 'not present in destination' is failing.. meaning it trips over something that shouldn't exist
[06:13:58] <Boomtime> when you delete a collection, apparently the _id index is not being deleted with it
[06:14:51] <GothAlice> If there's multiple and an unhandled "multiple documents returned" (or equivalent) exception in the loop (by name?) to clear the indexes…? Fair number of assumptions in that hypothesis.
[08:02:57] <ZorgHCS> girb1: This is exactly why iptables exist, I don't understand why that's not an option?
[08:06:13] <girb1> ZorgHCS: I need to give only read access for a mongo router; all writes should be forbidden even though I have implemented user auth
[08:06:28] <girb1> same as what we do with postgres
[11:52:27] <lxsameer> Is there any work around to store version numbers as keys in the mongodb Hash ?
[13:59:58] <StephenLynx> adding pretty() at the end of the query should provide enough visual aid.
[14:12:08] <ggoodman> Just had my production 3.0.1 primary crash with an OOM error. Anyone around that can help me diagnose the root cause and help me prevent this from happening again?
[14:35:02] <latestbot> I have referenced an id in an array in another document, how can I remove it in mongodb?
[14:41:24] <latestbot> Unset seems to remove the field completely
[14:42:37] <latestbot> Suppose I have a field like this in a document, “tags” : [ "550afb5971634aa331d010f3", "550afb6071634aa331d010f4", "550afb6a71634aa331d010f5" ], I just want to remove the first item say. What would I use?
[14:43:33] <Derick> '$unset' : { 'tags.0' : true } — ought to work
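Worth noting: in MongoDB, `$unset` on an array position sets that element to null rather than shrinking the array, so the usual follow-up is a `$pull` on null. A pure-Python sketch of those semantics (no server involved; the helpers and sample tags are illustrative, not driver APIs):

```python
def unset_index(doc, field, index):
    """Approximate $unset on 'field.N': the array slot becomes None (null)."""
    doc[field][index] = None
    return doc

def pull_value(doc, field, value):
    """Approximate $pull: drop every element equal to value."""
    doc[field] = [v for v in doc[field] if v != value]
    return doc

doc = {'tags': ['a', 'b', 'c']}
unset_index(doc, 'tags', 0)    # doc is now {'tags': [None, 'b', 'c']}
pull_value(doc, 'tags', None)  # doc is now {'tags': ['b', 'c']}
print(doc)
```

So the two-step shape in the shell would be an `$unset` on `tags.0` followed by a `$pull` of null from `tags`.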
[14:52:20] <Alittlemurkling> Hello #mongodb, I'm attempting to construct a hierarchy of distinct medical license specialties. Doctors can have multiple licenses, so I store them in an array of subdocuments. It basically looks like this: {"_id": "1170", "licenses": [ { "specialtyCategory": "Internal Medicine", "specialtyName": "Sleep Medicine"}, {"specialtyCategory": "Pediatrics", "specialtyName": "Sleep Medicine"}] }
[14:53:40] <Alittlemurkling> I would like something that gives the same result as the distinct command, but have been unable to reproduce it with $aggregate.
[14:55:45] <Alittlemurkling> So far, I have come up with this: db.provider.aggregate({$match: {"licenses.specialtyCategory": "A"}}, {$unwind: "$licenses"}, {$match: {"licenses.specialtyCategory": "A"}}, {$project: {"licenses.specialtyName": 1}})
[14:57:11] <d0x> Any idea why I get a java.io.IOException: cannot find class com.mongodb.hadoop.mapred.BSONFileInputFormat with Hive on EMR using the Mongo-Hadoop Connector? This is the bootstrap script I used: https://s3.eu-central-1.amazonaws.com/christian.emr/bootstrap.sh
[14:58:17] <Alittlemurkling> This is how I tried to use distinct: db.provider.distinct("licenses.specialtyName", {"licenses.specialtyCategory": "Pediatrics"})
[14:59:03] <Alittlemurkling> But that would return both specialtyNames (ah, if they were different. Pardon the simplicity of my example).
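The pipeline quoted at 14:55 stops at `$project`; what makes an aggregation behave like `distinct` is a final `$group` stage with `$addToSet` (e.g. `{$group: {_id: null, names: {$addToSet: "$licenses.specialtyName"}}}`). A sketch that simulates those stages in plain Python on the sample document, rather than calling the aggregation framework:

```python
docs = [
    {'_id': '1170', 'licenses': [
        {'specialtyCategory': 'Internal Medicine', 'specialtyName': 'Sleep Medicine'},
        {'specialtyCategory': 'Pediatrics', 'specialtyName': 'Sleep Medicine'},
    ]},
]

names = set()                                          # $addToSet de-duplicates
for doc in docs:                                       # collection scan ($match omitted)
    for lic in doc['licenses']:                        # $unwind: "$licenses"
        if lic['specialtyCategory'] == 'Pediatrics':   # post-unwind $match
            names.add(lic['specialtyName'])

print(sorted(names))  # ['Sleep Medicine']
```

The set is why the duplicate "Sleep Medicine" across categories collapses to one entry, matching what `distinct` returns.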
[15:08:59] <latestbot> Looks like in my case $pull is much better
[15:09:14] <latestbot> Since I am removing the items by their value
[15:11:09] <Derick> latestbot: yes, but that's not what you asked ;-)
[15:11:22] <latestbot> hehehe, sorry for the confusion
[15:14:15] <StephenLynx> Alittlemurkling I think that mongo might not be the tool you need.
[15:15:36] <StephenLynx> on the first example, what collection is that?
[15:15:39] <Alittlemurkling> StephenLynx. Okay. I will just construct the relationship when I import the data.
[15:26:42] <StephenLynx> so it's two different objects, category and specialty?
[15:28:57] <StephenLynx> then the levels ARE defined.
[15:29:18] <StephenLynx> you told me they weren't :v
[15:29:43] <Alittlemurkling> There is a license document. It looks like this: { "specialtyCategory": "A", "specialtyName": "B", "subSpecialty": "C", "licenseNumber": "1", "licenseState": "CO" }
[15:34:55] <Alittlemurkling> Okay. I guess this just means I will construct the relation at import.
[15:35:27] <StephenLynx> on second thought, any relation can be bad depending on the query.
[15:35:45] <Alittlemurkling> Yeah. Which is why I have been embedding everything.
[15:35:56] <pamp> Hi, I have a compound index {MoName: 1, "P.k": 1}. Can I use the index prefix {MoName:1} in a query like db.test.find({MoName:"UtranCell"}).hint({MoName:1})?
[15:43:35] <pamp> Can't I use only the prefix of a compound index?
[16:39:33] <imachuchu> I have an older mongodb instance (~2.0) and am wondering if there is a way to backup/restore it to a more modern version keeping objectid relationships valid?
[16:41:10] <cheeser> there's no real relationship as such. just common values that your app correlates.
[16:43:11] <imachuchu> cheeser: the dump format changed in 2.2, so importing from an older dump kind of works, but not really. I'm more wondering if there is a conversion script somewhere I'm missing
[16:43:40] <cheeser> you could just walk your way up through the upgrades...
[16:43:51] <cheeser> you don't need to dump/restore each time.
[16:44:45] <imachuchu> cheeser: so like install 2.1, then 2.2, then 2.3, etc?
[18:39:07] <d4rklit3> I have a mongo database on compose.io and another one i just made on a dedicated server somewhere. Is it possible to clone the one from compose.io to my new one via the mongo shell on my dedicated host?
[20:15:51] <afroradiohead> I guess I was wondering if I can do an upsert request like "oh these indexed values are duplicates, let me just update this document"
[20:28:29] <afroradiohead> working very well. But yeah that rest of the info was icing on the cake
[20:29:35] <btwotch1> Hi, I am trying to retrieve an object from the mongodb in go via a nested property: http://pastebin.com/tcn4tvtG - does anybody have some helpful comments? thx in advance
[20:48:13] <btwotch1> found it: http://pastebin.com/7FwFyMUC
[20:49:26] <StephenLynx> can't you just read your driver documentation?
[20:51:33] <btwotch1> it has no documentation about nested queries: https://www.google.de/?gfe_rd=cr&ei=7c0RVdOGA6mF8QfbkIDICA&gws_rd=ssl#q=site:labix.org%2Fmgo+nest
[21:29:10] <Progz> I have no insert in this DB for the moment.
[21:34:32] <imachuchu> Progz: no, I think it's only best if the indexes fit into ram. If the db hasn't been queried for a while/at all since starting the server it's completely normal to have them not loaded yet
[21:35:38] <Progz> I have more than 1000 websites online, and a lot of requests on my db. but nothing in ram
[21:38:31] <GothAlice> Progz: You might want to look at your "page fault" counts. If the number is high (which it very likely will be) then MongoDB can't efficiently utilize RAM because the kernel is constantly swapping out chunks behind-the-scenes. (This makes things very, very slow.)
[21:39:43] <GothAlice> Note that the majority of the memory MongoDB uses isn't allocated, it's memory-mapped on-disk files. These will typically show up not as "real" usage, but as "caches", because the kernel can freely move things around to free up the RAM if something else needs it.
[21:45:43] <Progz> ok GothAlice, so when mongostat shows a number around 132 in the faults column, is that a lot?
[21:53:43] <GothAlice> Progz: What's the vsize being reported for you?
[21:54:02] <GothAlice> (res size would be the "working set" memory used to process queries, map/reduce, and manage connections, etc.)
[22:00:36] <GothAlice> 30GB of RAM, assuming 10% overhead in general, means you have 14% as much RAM as you need for that dataset to fit entirely. What's the aggregate index size? (The size of all indexes across all collections?) Not sure the easiest way to get this information other than by getting db.stats() on each DB.
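Each database's `db.stats()` result carries an `indexSize` field, so the aggregate index size is just the sum of that field over every database. A sketch over already-fetched stats documents (the sample numbers are made up for illustration):

```python
def total_index_size(stats_docs):
    """Sum the indexSize field across a list of db.stats() result documents."""
    return sum(s.get('indexSize', 0) for s in stats_docs)

# Hypothetical stats() output from two databases:
stats = [
    {'db': 'yhr', 'indexSize': 512 * 1024 ** 2},   # 512 MiB
    {'db': 'admin', 'indexSize': 16 * 1024},       # 16 KiB
]
print(total_index_size(stats) / 1024 ** 2, 'MiB')
```

With a live connection you would feed it `client[name].command('dbStats')` for each name in `client.database_names()` (PyMongo 2.x naming).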
[22:01:06] <Progz> I only have 1 db (not counting the admin db)
[22:01:36] <d4rklit3> on ubuntu how do i start the mongo service with --auth
[22:02:22] <GothAlice> Progz: Well then. The page faults are the kernel going out to disk to answer a query by MongoDB. This means it'll suffer amazingly bad performance, since you're going back to disk for potentially large chunks of data, in some cases, more than 200 times per second. You really, really, really want to invest in sharding at this point.
[22:03:19] <GothAlice> Notably also http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/ and http://docs.mongodb.org/manual/tutorial/choose-a-shard-key/ to ensure your data gets "evenly" spread.
[22:04:11] <GothAlice> This is MongoDB's official way to split data between multiple hosts in order to bring the per-host data set size below the RAM size of the individual hosts and regain optimal performance.
[22:04:32] <d4rklit3> nvm i got it, it was in the conf file. however I can't seem to be able to connect to the server with auth.
[22:04:41] <d4rklit3> i set up a user on the db in question and those credentials don't work
[22:04:45] <Progz> ok GothAlice I am following a mooc at the mongodb university. I was thinking about sharding but never tried to configure one ^^
[22:05:07] <Progz> Thanks GothAlice ! I have some reading to do
[22:05:43] <Progz> GothAlice: result of db.stats => http://pastebin.com/LaGrdyZS
[22:06:00] <GothAlice> d4rklit3: Several potential issues: did you upgrade to 3.0 from any prior version? If your DB host is 3.0, are your client drivers up-to-date and compatible? (Both ref: http://docs.mongodb.org/manual/release-notes/3.0-compatibility/) Have you already added users? If not, you'll need to use the localhost exception to add the first user, which should be an admin. For this, ref: http://docs.mongodb.org/manual/tutorial/enable-authentication/
[22:06:16] <d4rklit3> i can connect fine if auth = false
[22:07:29] <GothAlice> Progz: Alas, that doesn't quite work. Think of it like a "most recently used" cache of file pages in RAM. Some of those pages will be indexes, sure, but to the kernel, it has no way other than frequency of access to determine if a page is "hot or not", so in some cases your indexes might get paged out to make room for other data being paged in.
[22:09:07] <GothAlice> d4rklit3: The other hint of mine was going to be "are you authenticating against the right database?" Since MongoDB users are local to a database, such as the "admin" DB, you may have to specify if you created them in the wrong one.
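Since users live in a particular database, the database they were created in has to be named when authenticating; in a connection URI that's the `authSource` option. A sketch that only builds such a URI (no connection is attempted; the helper and credentials are illustrative):

```python
from urllib.parse import quote_plus

def mongo_uri(user, password, host, db, auth_source='admin'):
    """Build a MongoDB URI that authenticates against auth_source
    while defaulting operations to db."""
    return (f'mongodb://{quote_plus(user)}:{quote_plus(password)}'
            f'@{host}/{db}?authSource={auth_source}')

print(mongo_uri('app', 's3cret', 'localhost:27017', 'yhr'))
# mongodb://app:s3cret@localhost:27017/yhr?authSource=admin
```

If the user was actually created in the target database itself rather than `admin`, `auth_source` would be that database's name instead.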
[22:09:29] <GothAlice> s/specify/specify when authenticating/
[22:10:06] <d4rklit3> i used robomongo with an ssh tunnel to create a user on my db
[22:10:19] <GothAlice> Please, for now, don't use RoboMongo with 3.0.
[22:10:47] <Progz> GothAlice: Arf... do you know another GUI client ?
[22:10:51] <GothAlice> In the last few days I've seen it doing terrible, terrible things, like creating duplicate indexes on _id, failing 3.0-style authentication, etc.
[22:11:36] <Progz> GothAlice: since today robomongo doesn't display my indexes anymore xD
[22:12:19] <GothAlice> Seriously, what does everyone find so attractive about robomongo over a syntax highlighting shell? I find the "(0) {…}" list items to be purposefully obstructing, not helpful. :/ It hides information from you, and doesn't empower you in terms of querying any more than a bare shell. :/
[22:12:37] <d4rklit3> getting auth failed through terminal
[22:15:32] <d4rklit3> so robomongo would be able to connect authless fine but not with auth?
[22:15:41] <GothAlice> Progz: In terms of finding commands, nothing beats the documentation, which also explains the output of those commands. ;) Even I refer back to the documentation a lot. (There's a reason I'm so quick on the docs.mongodb.org links…)
[22:17:05] <GothAlice> However, I'll repeat my warning from earlier: I've seen robomongo do *destructively* terrible things to people's data in the last week.
[22:17:19] <d4rklit3> i usually just use it to visualize stuff
[22:18:31] <GothAlice> That's one of the issues encountered: just connecting and fetching a collection created a duplicate _id index for one user. (There may or may not have been other things going on, but after cloning the collection from one to another in Python and _not_ connecting robomongo, the duplicate index didn't appear; on firing robomongo up, it spontaneously did.)
[22:19:09] <d4rklit3> i want to see if the damn framework even supports 3.0
[22:19:24] <d4rklit3> the IT guys installed 3.0 on this server and left me with the config!! im just a programmer!
[22:19:28] <d4rklit3> i never said I do sysadmin shit
[22:19:29] <GothAlice> At least mongoose issues have historically been "it just does things weirdly and isn't particularly well documented when they choose to be wacky".
[22:21:19] <d4rklit3> at least thats how i understand it
[22:21:29] <d4rklit3> but im nowhere close to being a DBA
[22:21:38] <GothAlice> Indeed. Efficient, meaningful 12-byte storage of creation timestamp, application server ID, process ID, and auto-increment (per server process) counter.
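The 12-byte layout GothAlice describes (in the classic form used by servers of this era) is 4 bytes of timestamp, 3 of machine ID, 2 of process ID, and 3 of counter. A stdlib-only sketch that splits one of the ids quoted earlier in the log; the helper is illustrative, not the `bson.ObjectId` API:

```python
import datetime

def parse_object_id(hex_str):
    """Split a classic 12-byte ObjectId: 4-byte timestamp, 3-byte machine,
    2-byte pid, 3-byte counter (all big-endian)."""
    raw = bytes.fromhex(hex_str)
    assert len(raw) == 12, 'ObjectId must be 12 bytes / 24 hex chars'
    ts = int.from_bytes(raw[0:4], 'big')
    return {
        'timestamp': ts,
        'generated': datetime.datetime.fromtimestamp(ts, datetime.timezone.utc),
        'machine': int.from_bytes(raw[4:7], 'big'),
        'pid': int.from_bytes(raw[7:9], 'big'),
        'counter': int.from_bytes(raw[9:12], 'big'),
    }

print(parse_object_id('550afb5971634aa331d010f3'))
```

Converting such a value to a string discards none of the bytes, but as GothAlice notes, mixing string and real ObjectId representations makes equality comparisons silently fail.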
[22:22:34] <GothAlice> Turn it into a string (as many mongoose users do), or worse, mix real ObjectIds and strings, and the support incidents begin. ;)
[22:30:18] <joannac> what doesn't work in the app?
[22:32:12] <GothAlice> d4rklit3: Yeah, pardon my Python-esque call syntax in the term. Pretend those are "{_id: " instead of "_id=". ;)
[22:58:04] <ThisIsDog> I'm using PyMongo 2.8 in my application. When I add a document to the database and it already exists, a DuplicateKeyError exception is thrown. When I do e.details on the exception, I'm always getting None. Under what circumstances is details not None?
[22:58:58] <ThisIsDog> I'm trying to get the document existing in the database that caused the DuplicateKeyError
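When `details` comes back empty, often the only recoverable clue is the `dup key` fragment of the server's E11000 message (available via `str(exc)`). A hedged sketch of pulling that fragment out with a regex; the exact message format varies by server version, and the sample message below is constructed for illustration:

```python
import re

def dup_key_from_message(msg):
    """Extract the 'dup key: { ... }' portion of an E11000 error message, if present."""
    m = re.search(r'dup key:\s*({.*})', msg)
    return m.group(1) if m else None

msg = ("E11000 duplicate key error index: yhr.chat_pms.$_id_ "
       "dup key: { : ObjectId('550afb5971634aa331d010f3') }")
print(dup_key_from_message(msg))
# { : ObjectId('550afb5971634aa331d010f3') }
```

The server never returns the conflicting document itself; once you have the offending key value, a separate `find_one` on that field is how you fetch the existing document.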
[23:13:03] <afroradiohead> _id is supposed to be unique right?
[23:13:20] <afroradiohead> i mean, does _id in collections come default with a unique index?