[01:56:55] <GothAlice> joannac: The reason I ask is that my old Okapi implementation stored the weighted, normalized keywords in a separate collection from the records themselves. Ranking happens on that separate collection, and moving the data in is disadvantageous.
[02:28:49] <GothAlice> My kingdom for an ORDER BY `id` IN […]. This Python generator is getting pretty funky.
[02:31:40] <speaker1234> is it safe to use _id as a universe wide record identifier
[02:32:45] <GothAlice> speaker1234: Yes. In fact, MongoDB assumes that it is, automatically providing a unique index across it.
[02:33:30] <GothAlice> speaker1234: ObjectIDs are very carefully designed to avoid the problem Twitter encountered when they became popular: how do you have multiple machines generate unique IDs without collisions? Twitter solved this by creating an entire service platform to do nothing but generate new auto-increment IDs.
[02:33:31] <speaker1234> so if a worker was to change state, all it needs is the _id field
[02:34:30] <GothAlice> MongoDB solved this by providing IDs that combine several different values that are guaranteed to be unique together. ObjectID combines a UNIX timestamp, a per-process auto-increment ID, and some form of host/process identifier.
[02:34:43] <GothAlice> speaker1234: So yes, all you need is the _id. ;)
[02:35:03] <speaker1234> it's late. I am being really focused and ignoring the names the dishes are calling me
[02:35:34] <GothAlice> If we cared what dishes thought of us, we probably wouldn't have invented them. ;^P
[02:36:52] <speaker1234> in looking at the dump, I see the _id is a dictionary of 1 entry. Do I pass that in as an atomic object?
[02:37:23] <GothAlice> speaker1234: At what point are you seeing that result? After the aggregation?
[02:37:40] <GothAlice> (And what are you $group'ing on?)
[02:39:02] <GothAlice> 5446e725124c580d8ee05d39 is the hex representation of the ObjectID.
[02:39:16] <speaker1234> k I got to get some sleep. chant later
[02:40:18] <speaker1234> I'll spend the next two days justifying to one customer why ordinary IT best practices are not up for discussion and why all the efforts I've been making for the past six months have been headed in that direction
[02:40:35] <GothAlice> speaker1234: You generated that ObjectID on 2014-10-21 at 23:07:17 :)
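That date recovery is built into the shell, since the timestamp is the leading four bytes of every ObjectId. A minimal sketch, using the hex string from this conversation:

    // Extract the embedded creation time from an ObjectId in the mongo shell.
    ObjectId("5446e725124c580d8ee05d39").getTimestamp()
    // ISODate("2014-10-21T23:07:17Z")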
[02:40:55] <GothAlice> speaker1234: Good luck with that.
[02:41:03] <GothAlice> It's an argument I have with my managers on a regular basis. ;)
[02:41:22] <speaker1234> it's crazy. I've been struggling for six months trying to get the server room switched into hot/cold aisles to gain a little bit more efficiency because they won't install any additional air-conditioning capacity
[02:41:56] <GothAlice> That's right. An ellipsis of ellipses.
[02:42:18] <GothAlice> It's going to require effort at *some point* to correct that…
[02:42:21] <speaker1234> their 10 Gb network is flaky because it was installed wrong. I got a vendor in; it would cost $10,000 to do it right, so no, instead we are getting Chinese-made Cat 6A cables and stringing them over the ceiling
[02:42:46] <GothAlice> Okay. I'm about to start phoning up Domokun for a hit job on some kittens.
[02:42:52] <speaker1234> if I didn't need the money I would've left a long time ago. I really need a new client
[02:43:09] <GothAlice> Being able to fire a client is a luxury most of us cannot afford, sadly.
[02:43:30] <speaker1234> I have a second client, the one I'm doing this work for, which may be better, but I really don't know. They're probably just crazy in a different way
[02:43:45] <speaker1234> on the plus side, the CTO is very understanding of what I'm fighting for, he has the same problems and is making the same amount of headway
[02:43:50] <GothAlice> All clients are crazy. Period. ;^)
[02:44:11] <speaker1234> that's why consultants charge so much. It's to put up with their insanity
[02:44:37] <GothAlice> 99% of my consulting work is convincing the client of the Right Solution™. There's usually an insane dichotomy between what they *want* and what they actually *need*—it's always an uphill battle.
[02:45:34] <GothAlice> (I.e. at work it's taken nearly a year to convince them that the scope of the current project was too small. At least now my managers are super-excited for what we will be able to offer the clients in the near future in accordance with my original vision, not theirs. ;)
[02:45:35] <speaker1234> I have been accused of trying to build a Michelin one star restaurant of an IT shop. In reality, I'm just trying to keep people from getting food poisoning
[02:50:23] <GothAlice> This is what #mongodb (IRC channel) has become. Discussing social media postings about IT/consulting irks. ;)
[02:51:24] <speaker1234> it's late. I'm sharing divorce recovery stories with a friend by texting and the dishes are still calling me nasty names
[02:52:28] <speaker1234> then I will let you plug away. I want to get up early so I can get a head start on traffic and the excrement storm over the next three days (two days documenting, Friday morning final execution)
[02:52:43] <GothAlice> For some reason this week has gotten off to an insanely productive start with the release of the updated (and jumping from <100 to >600 tests) marrow.schema. Of course I've gotten to bed at 5am the last three days. T_T
[02:53:06] <speaker1234> if I can get this second customer up and running, then I'll have enough money coming in to tell them to go find a trained monkey
[03:30:29] <darkblue_b> I see this https://www.digitalocean.com/community/tutorials/how-to-set-up-a-scalable-mongodb-database
[03:31:37] <shoerain> GothAlice: late reply, I guess, I just wouldn't mind modelling my migration script after someone else's well designed one
[03:33:20] <GothAlice> shoerain: All migration scripts boil down to "execute this series of linear steps to upgrade" and "execute these other series of linear steps to downgrade". At work I don't even use a migration framework, they're just bare functions at the module level in Python. (Though I do have a package of nothing but migration modules with incrementing integer prefixes to track them.)
[03:34:06] <GothAlice> from m001_bootstrap import upgrade; upgrade() # in a production shell
[03:34:20] <joannac> darkblue_b: I don't know what that means. you either want a replica set, or a sharded cluster
[03:34:55] <GothAlice> darkblue_b: sharded cluster, for sure. Sharding allows for easier parallelization of your queries.
[03:35:17] <darkblue_b> a client wants to throw genomic data at it and do cluster analysis.. I assume they have thought about it.. I am new to this but can admin
[03:36:49] <GothAlice> shoerain: And for the *vast majority of changes* MongoDB requires no migrations whatsoever. Obsoleted attributes eventually decay (if deletes are regular), new attributes automatically get assigned on next update, etc. (This just *forces* you to handle both the foo: {$exists: false} and foo: '' cases on a regular basis, for example.)
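A shell sketch of the two cases she means; the collection and field names are illustrative:

    // Match documents written before "foo" existed as well as ones where it was set empty.
    db.things.find({$or: [{foo: {$exists: false}}, {foo: ""}]})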
[03:37:35] <joannac> darkblue_b: "I assume they have thought about it" is a bad assumption to make
[03:45:42] <darkblue_b> .. and this https://bugsnag.com/blog/mongo-sharding
[04:01:51] <darkblue_b> if I make a master mongodb node using say Debian/Ubuntu.. then put names like sh1.myweb.com sh2.myweb.com sh3.myweb.com in the /etc/hosts file.. is that enough ?
[04:02:06] <darkblue_b> or do I have to have the shard nodes resolve from the outside
[05:11:42] <darkblue_b> any thoughts on this ? http://rockmongo.com/
[05:12:05] <darkblue_b> dns server config for a collection of VMs ?
[05:12:43] <darkblue_b> NFS to mount disks to distribute data ?
[05:27:37] <Boomtime> darkblue_b: what do you mean by "NFS to mount disks to distribute data"
[05:28:11] <Boomtime> you can use replica-sets to distribute complete copies of data and achieve high-availability
[07:28:11] <josias> Hi, I have a problem with the mongodb-basic-php driver (http://docs.mongodb.org/ecosystem/drivers/php/) (System: Windows 7, WAMP with PHP 5.5.15). I did everything as written, but an error occurs at Apache startup: PHP Warning: PHP Startup: in Unknown on line 0
[07:28:44] <josias> and: PHP Fatal error: Class 'MongoClient' not found in [...]
[10:43:26] <ut2k3> hi guys our mongodb brings this error: Assertion: 10334:BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: ObjectId('5439ac05a0e4b6b32bb923c9')
[10:45:19] <ut2k3> should db.repairDatabase() solve the thing?
[12:07:15] <johey_> Is the following possible? I have documents containing a 'nodes' key with a list of values, for instance {nodes: ['a', 'b', 'c', 'd']}, {nodes: ['c', 'a']} and {nodes: ['b', 'd']}. Now I want to find all documents containing two given nodes, but only those that come in the same order as in the query. For instance ('a','c') would match the first document but not the second, as the second is in the wrong order, and ('b','d') would find the first and last documents
[12:37:39] <ut2k3> db.repairDatabase() fails with Assertion: 10334:BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB) First element: _id:. Is there a way to fix it?
[13:49:07] <edrocks> is there any way to get a field from the item you remove with $pull?
[14:41:23] <izolate> how do you restart the mongod process?
[14:45:13] <jmfurlott> I am working with another developer, about to build a site using Mongo for data, and we were wondering the best way to share it with each other whenever we pull/push code. Does anyone have any suggestions?
[14:45:34] <Forest> Hello. Can anyone tell me how to calculate the size of documents I want to insert in batch into MongoDB? I am using Node.js.
[14:46:45] <Forest> My problem is that I create either too-small arrays, so the import process takes a lot of time, or too-large batches, so only some of the documents actually get inserted.
[14:47:54] <izolate> everybody's too busy to help here it seems
[14:48:36] <Forest> izolate: do you know the answer?
[14:50:21] <izolate> Sorry, no. I have very little experience with the tool
[14:51:10] <izolate> you may be better off asking stackoverflow
[14:57:38] <GothAlice> izolate: The answer depends on the platform and distribution. On most Linuxes: sudo /etc/init.d/mongodb restart
[15:00:50] <Forest> so no one actually knows how to calculate the size of that BSON?
[15:01:12] <GothAlice> Forest: http://bsonspec.org/implementations.html — grab a library, figure it out. ;)
[15:01:22] <Forest> this issue drives me crazy, MongoDB has a 16 MB limit and I just can't send an array that long because I don't know its size, it's ridiculous
[15:01:58] <Forest> GothAlice: trying that for an hour, can you help me please?
[15:02:07] <GothAlice> Forest: Search on this page for "Array": http://bsonspec.org/spec.html
[15:02:37] <GothAlice> Simple arrays are sent as mappings of incrementing integer keys to the respective array elements.
[15:03:55] <Forest> GothAlice: jesus, I don't understand that. I have an element like {"id":item.id,"loc" : [item.lon,item.lat],tags:item.tags} where tags is another dictionary of key:value pairs
[15:06:49] <GothAlice> Thus the storage required for an array of two integer elements representing lat/long is: "\x04" + cstring() + int32(2) + "\x100\x00" + int32(lat) + "\x101\x00" + int32(long) + "\x00"
[15:07:11] <GothAlice> The cstring() there would be "loc" in your case.
[15:07:40] <GothAlice> That makes the grand total: 1 + 4 + 4 + 3 + 4 + 3 + 4 + 1 = 24 bytes for the lat/long by itself.
[15:08:01] <GothAlice> http://bsonspec.org/spec.html < you can do these calculations yourself.
[15:10:07] <GothAlice> Basically [foo, bar] turns into {0: foo, 1: bar} when stored. BSON cheats. ;)
[15:11:30] <GothAlice> pithagora: That may be a question better asked in the debian chat; seems like confusion around package management.
[15:13:24] <Forest> GothAlice: I don't understand how you are counting it, can you explain again please? I still don't understand how I would calculate it for tags, because it can be empty or contain a variable number of string:string pairs
[15:14:11] <GothAlice> Forest: Are you familiar with C structures at all? That'll determine the approach I take in breaking down what I previously wrote.
[15:16:17] <Forest> GothAlice: no, unfortunately I am not :(
[15:16:35] <GothAlice> Forest: Cool. Time to get the learning hats on. :) http://bsonspec.org/spec.html is what is referred to as BNF (Backus–Naur Form) notation. It describes the valid ways low-level chunks of data can be combined.
[15:17:17] <GothAlice> At the top it describes how large certain named types are: byte is one byte (obviously), int32 is 32 bits or 4 bytes, etc., etc.
[15:18:50] <GothAlice> The first thing in the BNF (document ::=) is the top level of any BSON "document". The first four bytes are a number (int32) telling you how long the whole BSON document is. (This is put up front so that networks can read four bytes quickly to figure out how much more data to expect.) Then there's an "e_list" of additional "element"s and a terminating null byte (\x00).
[15:19:18] <GothAlice> The smallest possible BSON document (an empty one) is then: int32(5) + "\x00", or five bytes.
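If you only need the number rather than the arithmetic, the shell can do the measurement for you. A minimal sketch using the shell built-in Object.bsonsize(), with a document shaped like Forest's:

    // Measure the encoded BSON size of a candidate document before batching it.
    var doc = {id: 12345, loc: [13.4, 52.5], tags: {amenity: "cafe"}};
    Object.bsonsize(doc)   // total bytes, including the length header and terminator

The bson library used by the Node.js driver exposes an equivalent calculateObjectSize() function (the exact call shape depends on the driver version); summing per-document sizes lets you cut a new batch before hitting the 16 MB limit.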
[15:20:46] <Forest> GothAlice: can you help me get just the first four bytes? i am desperate
[15:25:24] <stefandxm> i am sorry to say that if you can't figure that out you won't get far ;)
[15:26:35] <Forest> stefandxm: jesus, I need it for my bachelor thesis where I solve different kinds of things and I just can't proceed if you don't help me. Even if I could calculate it manually, the number of key:tags in tags can vary, so it wouldn't help me much
[15:26:37] <stefandxm> but, maybe there is a js driver that can do this for you
[15:27:11] <stefandxm> i mean like really. how can you go for a bachelor's and not know how to parse a file format / basic programming
[15:27:23] <stefandxm> sorry for being arrogant. but you really need to sit down and think this through ;)
[15:28:06] <Forest> stefandxm: why are you such an asshole and don't want to help me? i just don't get this, sorry :(
[15:28:17] <GothAlice> Forest: Because it varies, there is no way I can give you anything approaching a "correct" answer, or any kind of answer to the question "how big is this data".
[15:28:31] <GothAlice> Forest: Your question is fundamentally flawed.
[15:29:17] <GothAlice> However, I am in the process of figuring out how to *get* the answer in Python, which can be a start for using other BSON libraries to work out your answer.
[15:29:38] <stefandxm> the bson library in c++ is trivial
[15:31:33] <GothAlice> Understanding BNF is critical when trying to examine low-level data structures. Almost everything uses this notation, even whole programming languages. (Python's syntax is defined in BNF, which it uses when compiling the interpreter.)
[15:55:13] <darkblue_b> I have managed to get by without my own DNS.. I suspect those days are numbered
[15:55:49] <GothAlice> I have to call one company to update our external DNS, and another company to update the internal DNS. My first deployment at work went great, except that nobody in-house could connect. XD
[15:56:16] <docdoak> How can I do a self-reference to a field? I need to increase a specific value by 10%
[15:56:19] <darkblue_b> "networking is easy - it works every time.. except the first time"
[15:57:11] <docdoak> in other languages it would be salary *= 1.1
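One way to express that in the shell, assuming MongoDB 2.6+ (which added the $mul update operator) and docdoak's hypothetical employees collection:

    // Multiply every employee's salary by 1.1 in place, without reading it first.
    db.employees.update({}, {$mul: {salary: 1.1}}, {multi: true})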
[16:22:26] <doug1> Or, do I have to build the entire cluster from scratch for the Nth time?
[16:23:45] <GothAlice> AFAIK you demote the shard, let it rebalance, then remove it. http://docs.mongodb.org/manual/tutorial/remove-shards-from-cluster/
[16:25:00] <doug1> so i have to blow the whole cluster away and start from scratch?
[16:25:22] <GothAlice> Why are you even trying to do that? If there are no other shards, you're effectively running non-sharded, and can build up from there.
[16:25:47] <doug1> Because I need to get back to a clean state, so I can test the automation of adding shards
[16:26:44] <GothAlice> Wanting "clean slate" and not wanting to "build the entire cluster from scratch" are mutually exclusive.
[16:26:59] <doug1> If there's a shard there, it's not clean
[16:27:28] <GothAlice> No. No shard = dead, waiting for the first shard.
[16:27:42] <doug1> Ok, that's fine. How do I get back to dead state then?
[16:27:53] <GothAlice> Trying to preserve the mongos setup while removing all mongod instances backing it is not how you produce a clean slate. You nuke the cluster and start again.
[16:28:14] <GothAlice> doug1: Nuking it from orbit is the only way to be sure.
[16:28:55] <doug1> No wonder I've been at this for 4 months. I should email my boss and tell him the only way to reset my test is to rebuild all 10 instances from scratch, in an attempt to explain why this is taking so long
[16:29:05] <GothAlice> And with proper automation, it's not insane at all. Spinning up a new cluster for me takes about 30 seconds to reach an operational state. (Admittedly my OS boot times to all-services-running is about two seconds…)
[16:29:44] <doug1> if you're using gold images, it may take 30s, but you've just moved the effort into the image maintenance
[16:29:54] <GothAlice> doug1: You don't have to nuke the VMs, just the data stores for the mongo[s/d] processes. You can fire them right back up afterwards.
[16:30:16] <doug1> GothAlice: You mean blow away /var/lib/mongodb/* ?
[16:30:51] <doug1> Where tho? My data store on the config server is empty
[16:31:04] <Dioxy> So DBs in Mongo are basically flat JSON files, with MONGO handling functions to parse the JSON?
[16:31:29] <Dioxy> I'm coming from an SQL background
[16:31:42] <doug1> Or, do I have to blow away /var/lib/mongodb/* on all 6 data nodes ?
[16:32:05] <GothAlice> Dioxy: It's a bit more complicated than that. Flat is not a word I'd use to describe Mongo; it's far more appropriate to the "spreadsheet" style of SQL. Mongo also uses BSON, a binary form of JSON that avoids much of the JavaScript legacy (like 53-bit integer accuracy).
[16:32:48] <GothAlice> doug1: The error you gave says all of those other nodes (except one) have already been removed. Nuking those data directories resets MongoDB to a "clean slate".
[16:33:07] <GothAlice> doug1: Just make sure you have backups of your data. ;)
[16:33:46] <GothAlice> doug1: Also, the config server must be storing its data somewhere. Check the mongos config file and command-line arguments to determine the data directory.
[16:35:17] <GothAlice> Dioxy: MongoDB also provides extremely rich querying of nested structures, and the ability to manipulate nested data, too. (I.e. you can atomically append to a list, retrieve only a range of elements or elements that match a query from a list, etc.)
[16:39:20] <Dioxy> GothAlice - I assume you build query strings like any other DB, fire the command at Mongo and iterate the results, then write back to the DB?
[16:40:13] <GothAlice> Dioxy: You build query mappings / dictionaries. I.e. {age: {$gt: 18, $lt: 40}, occupation: {$in: ['IT', 'Accounting']}}
[16:40:57] <GothAlice> Dioxy: And like SQL databases, your MongoDB client driver will expose a cursor to you, which you can limit/skip/iterate/etc. on.
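Putting those two ideas together, a small shell sketch; the collection name and fields follow GothAlice's example:

    // A query mapping plus cursor methods chained before iteration.
    db.people.find({age: {$gt: 18, $lt: 40}, occupation: {$in: ["IT", "Accounting"]}})
        .sort({age: 1})
        .skip(20)
        .limit(10)
        .forEach(printjson)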
[16:42:47] <GothAlice> Dioxy: MongoDB also includes map/reduce support and something called "aggregation pipelines". These let you build some pretty wild queries that can be easily parallelized across a cluster.
[16:43:56] <GothAlice> Dioxy: https://gist.github.com/amcgregor/7bb6f20d2b454753f4f7 is a comparison between two approaches to generate the same results. (One aggregate, one map/reduce.)
[16:44:45] <GothAlice> (ignore the "ohgods.py" file on that; we abstracted aggregate queries for storage within MongoDB here at work. ;)
[16:46:30] <docdoak> I want to copy one "row" from a database into another database (removing the initial)
[16:47:04] <GothAlice> You would find() the original record, insert() it into the other database, check for success (people often forget to do this ;), then remove the original record.
[16:47:26] <docdoak> but can't I do that in one step? db.pastemployees.insert( {db.employees.find( {name: "Raoul Dewan"} ) } ) isn't working for me
[16:47:51] <docdoak> brand new to mongodb obviously
[16:48:02] <GothAlice> db.employees.find() will return a cursor, not the first object. You want findOne in that instance, I believe. (And you don't need to wrap it in {}… the result will already be a dictionary.)
[16:48:46] <GothAlice> You'll still want to check for success, there. And using a temporary variable will allow you to catch other errors more easily (and understandably), such as the original record not existing.
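A minimal shell sketch of that advice, assuming a 2.6+ shell (where insert() returns a WriteResult) and the names from docdoak's example:

    // Move one document between collections: read, insert, verify, then remove.
    var doc = db.employees.findOne({name: "Raoul Dewan"});
    if (doc === null) throw "original record not found";
    var res = db.pastemployees.insert(doc);   // _id comes along, preserving identity
    if (res.hasWriteError()) throw res;       // check for success before deleting
    db.employees.remove({_id: doc._id});      // only now drop the original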
[16:53:21] <doug1> What does this mean? Could not connect to database: 'localhost:27017', reason Failed to connect to a master node at localhost:27017
[16:54:12] <ejb> A $geoNear aggregate returns a doc with 'result' (array of docs) and 'ok' attributes. I want to $project 'result' and do further operations but the projection only includes _ids. What gives?
[16:54:42] <GothAlice> doug1: Sounds like you're connecting to a disconnected slave. (I.e. the slave can't find its master and is unwilling or unable to promote itself to a new master.)
[16:55:10] <GothAlice> ejb: You'll need to include the fields you wish to preserve in each $project operation. {fieldname: 1} is enough to say "hey, keep that!"
[16:55:40] <ejb> GothAlice: yeah, so shouldn't {$project: {result: 1}} include everything in the array?
[16:55:44] <GothAlice> ejb: (excluding _id, which is kept by default) The result of an aggregate (when not $out'ed to a collection) is to return a single record with a nested "results" list of the actual output of the aggregate.
[16:56:07] <GothAlice> (So basically you can ignore the top level of the returned document; MongoDB creates that automatically and it's not accessible from within the pipeline.)
[16:56:08] <tehpwnz> if i do a project that uses mongodb as the db, do i have to draw things such as DFDs and ERDs?
[16:56:18] <ejb> GothAlice: oh, ok. So I shouldn't try to unwind result
[16:56:31] <GothAlice> ejb: Nope; "results" is a lie. Much like the cake.
[16:57:06] <GothAlice> tehpwnz: I'm not sure what you mean; I haven't *had* to "draw" anything in six years. ;)
[16:57:33] <ejb> GothAlice: can I pass the result of $map to $sum or $add?
[16:57:57] <tehpwnz> GothAlice: I mean, I have to write documentation and the like. I'm a college student. Aren't DFDs and ERDs for schema-based DBs?
[16:59:14] <GothAlice> ejb: $map returns the results. I.e. {$project: {foo: {$map: {...}}}} will result in a new field named "foo" being added to each record.
[16:59:54] <ejb> GothAlice: Yeah, I want to project something like { $sub: [ { $add: tagScoresFromMapOp }, distanceScoreFromGeoNear ] }
[16:59:59] <GothAlice> tehpwnz: Ah, I haven't been in uni even longer than that. They may be, but I'm not familiar with those acronyms. For the most part I use schema abstractions on top of MongoDB that can produce diagrams for me automatically. ;)
[17:01:38] <GothAlice> ejb: You need to $unwind on "tagScoresFromMapOp" (whatever you assign that field to) then $group and $sum them, not try to abuse $add (which won't work the way you've got it).
[17:02:00] <GothAlice> $group on _id, $sum as part of the aggregate.
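Sketched as a pipeline, on the assumption that the per-tag scores were already projected into an array field; every name here is a placeholder from this conversation:

    db.places.aggregate([
        // ... the $geoNear and $project/$map stages discussed above ...
        {$unwind: "$tagScores"},                         // one document per array element
        {$group: {
            _id: "$_id",                                 // regroup by original document
            tagTotal: {$sum: "$tagScores"},              // the $sum ejb wanted
            distanceScore: {$first: "$distanceScore"}    // carry the scalar through the $group
        }},
        {$project: {score: {$subtract: ["$tagTotal", "$distanceScore"]}}}
    ])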
[17:02:40] <GothAlice> tehpwnz: like http://f.cl.ly/items/0Y422V0v281b0w0v0I2q/model.pdf (cower in fear at that model)
[17:04:17] <GothAlice> (You almost never actually want to do that for performance reasons.)
[17:05:02] <GothAlice> tehpwnz: It gets worse. http://f.cl.ly/items/003g0F212R2t3N1p3D1x/match-new.pdf is the call graph processing data on that model.
[17:05:20] <GothAlice> tehpwnz: "it's wider than my mom"
[17:06:17] <docdoak> 2014-10-22T13:04:23.116-0400 findAndModifyFailed failed: { "ok" : 0, "errmsg" : "remove and returnNew can't co-exist" } at src/mongo/shell/collection.js:614
[17:06:21] <docdoak> anything I can do about that?
[17:06:29] <docdoak> I wanted to update it before i returned it
[17:07:16] <GothAlice> I'm not actually quite sure how you're getting that error message. What's the line you were trying to execute?
[17:08:48] <ejb> GothAlice: I have a dict of tag -> weight. How do I assign a variable inside of a $let vars block to that weight? vars: { tagWeight: '$$tagWeights[$$tag]' } ?
[17:09:07] <ut2k3_> Hi guys. How is it possible to select documents from a collection in DB1 and insert them into another collection in DB2?
[17:09:49] <GothAlice> ejb: You don't need anything fancy to pass in a dictionary mapping strings to integers. ;)
[17:10:00] <docdoak> GothAlice: <ut2k3_> Hi guys. How is it possible to select and insert documents from a collection of DB1 to another Collection Named DB
[17:29:49] <doug1> So... where does one add the admin user? config server? data node? router? where?
[17:32:06] <ut2k3_> thank you kali it works! i appreciate your help
[17:33:33] <GothAlice> doug1: Config server and the primary, AFAIK. Primary will replicate it to the secondaries but the config server doesn't get synchronized updates of credentials.
[17:42:27] <doug1> GothAlice: Oh. Someone... Joanna last night said mongos and mongod.
[17:44:36] <GothAlice> (I assume replication, thus the 'primary' bit.)
[17:45:04] <doug1> oh for the love of god. trying to add admin user "not master"
[17:45:35] <GothAlice> doug1: I'm suggesting adding the admin user to the "master" (primary) of the replica set. God doesn't need to enter into it.
[17:45:54] <doug1> GothAlice: well that doc says "On each mongos and mongod " ... but that failed with 'not master' when I tried to add the admin user on an arbitrary mongod
[17:46:24] <GothAlice> doug1: Yes, because you *also* have replication. You only need to add the user to the *primary* mongod instance, not the replication secondaries/slaves.
[17:46:28] <doug1> There is no master yet. I need to add the admin user before configuring the replicaset, so that automation, which authenticates as the admin user CAN set up the replicaset
[17:46:42] <doug1> GothAlice: there are no secondaries yet. There isn't even a master
[17:47:17] <mike_edmr> GothAlice: do you think there's any credibility to a higher default write concern or changes in write concern behavior from 2.4->2.6 resulting in not just slower query performance, but much lower concurrency / higher write lock percentage ?
[17:47:30] <GothAlice> doug1: The error message you are receiving indicates that the mongod process you were trying to use *thinks* it's configured in a replica set, but hasn't successfully found its master yet.
[17:48:00] <mike_edmr> it doesn't seem to follow that changing the write concern would make locks be held for longer.. but maybe i am misunderstanding their relationship
[17:48:28] <GothAlice> mike_edmr: It'd certainly result in an overall decrease in performance and throughput, and depending on the speed of your disks could affect write lock percentage.
[17:49:00] <mike_edmr> is the write lock held until the write concern is met?
[17:49:18] <mike_edmr> as opposed to being held for the same time, regardless of write concern?
[17:49:53] <GothAlice> I.e. instead of inserting a record in memory and marking the page as dirty (for eventual reclamation to disk, with a write lock surrounding that periodic task) the higher default write concern may be forcing an immediate sync() to disk of dirty pages, which can be slow.
[17:50:01] <doug1> GothAlice: Logged into all 3 data nodes. None have a replicaset configured.
[17:50:27] <doug1> So it appears that you can't add the admin user until you have master. This isn't documented as far as I can see
[17:50:47] <GothAlice> doug1: Are you able to 'mongo test' and 'db.foo.insert({})' successfully on those servers?
[17:51:01] <GothAlice> (When there's a problem, it's a good idea to simplify down to the bare essentials to help identify what's really going on.)
[17:51:13] <doug1> GothAlice: I'd rather not.... because then I'll have to blow away and start from scratch again, won't I?
[17:51:29] <GothAlice> doug1: No, not really. The 'test' database is meant for things like this.
[17:51:33] <mike_edmr> GothAlice: would that be true going from {w: 1} to {w: majority}, or does it strictly pertain to the fsync/journal options
[17:51:44] <GothAlice> doug1: If you care, you can db.dropDatabase() when you're done. ;)
[17:52:23] <doug1> The insert gets me "WriteResult({ "writeError" : { "code" : undefined, "errmsg" : "not master" } })"
[17:52:51] <GothAlice> mike_edmr: I believe write lock is about fsync/journalling. Upgrading to replication concerns (i.e. majority) would have a substantial impact on performance (network roundtrip latency and waiting on remote fsync()), but not locking.
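For reference, the per-operation form of those write concerns in a 2.6-era shell; the collection name is arbitrary:

    db.foo.insert({x: 1}, {writeConcern: {w: 1}})                     // ack from the primary only
    db.foo.insert({x: 2}, {writeConcern: {w: "majority", j: true}})   // wait for journal + majority replication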
[17:53:23] <GothAlice> doug1: I now suspect you have supplied command-line or configuration file options to your mongod processes that are convincing them they are in a replica set. --keyFile?
[17:53:28] <mike_edmr> GothAlice: thanks, that's helpful. I need to read more about the behavior around writing to disk.
[17:53:52] <doug1> GothAlice: if I connect to the console of them and do "rs.status()" I get nothing
[17:54:18] <GothAlice> doug1: The set might not be configured, but mongod knows you want it. Could you pastebin your mongod.conf and mongod command-line?
[17:56:07] <doug1> but an rs.status() returns nothing
[17:56:17] <doug1> bottom line is... how do I add the &^%%^*()) admin user?
[17:58:24] <GothAlice> Either add the user to whichever server will become the primary before adding any keyFile/replSet options (i.e. in standalone mode) or enable replication *first*, then add the admin user to the primary.
[17:59:13] <GothAlice> You can automate this by spinning up mongod manually (i.e. with a bootstrap config file that omits the bad rules), issuing the appropriate commands to populate /var/lib/mongodb, then starting up the real system-wide daemon after shutting your bootstrap one down.
[18:02:28] <doug1> I can't fathom how people got the chef cookbook to work. Even the folks from mongo who are around the corner came out and said 'oh yeah we know people that are using it'. FFS How?
[18:02:44] <Thomas__> Hi! Is it possible to rename a database by just renaming all files and the folder (directoryperdb is on)?
[18:02:44] <GothAlice> I can't fathom people using chef. ;)
[18:02:59] <doug1> I can't fathom deploying config by hand
[18:03:52] <GothAlice> Thomas__: No. Don't do that. Use db.copyDatabase('old', 'new') followed by use old_database; db.dropDatabase()
[18:04:48] <GothAlice> doug1: I use templates and my automation is predominantly BASH scripts executed as Git hooks or RPC.
[18:05:12] <doug1> GothAlice: I have dozens of categories of servers to look after, not just mongo
[18:05:38] <Thomas__> GothAlice: so it's not possible? Because the database size is about 500GB?
[18:06:08] <Thomas__> It also contains some corruption that cannot be fixed by db.repairDatabase()
[18:06:09] <GothAlice> Thomas__: There would very likely be hanging internal references to the old name contained within the files.
[18:06:09] <doug1> GothAlice: As far as your automation suggestion, I'll defer because I'm bound to hit some other corner case
[18:07:40] <Thomas__> hm damn, the problem is db.repairDatabase is not working and mongodump dies after a certain amount of records
[18:07:57] <GothAlice> Thomas__: Have you tried spinning up mongod with --repair set?
[18:08:23] <GothAlice> I.e. "offline repair" mode?
[18:08:50] <Thomas__> GothAlice: --repair means that mongodb tries to repair _all_ DBs, right? The problem is that there is another (alright) database containing about 2TB of data
[18:08:52] <doug1> hang on hang on.... why can't I add the admin user to mongos? isn't that the point?
[18:09:54] <doug1> I would tend to think that not being able to add users via the router would be a serious architectural flaw
[18:10:08] <GothAlice> Thomas__: In situations like that I rsync the data to a staging machine, shut down, re-rsync (to catch up minor differences more quickly), start it back up, then work in staging to not disrupt other things. Then you'd be able to delete the other databases and --repair just the one you want.
[18:10:20] <doug1> even the docs say "While connected to a mongos, add t"
[18:10:32] <doug1> so... I will try and add to the mongos...
[18:10:34] <GothAlice> doug1: You misunderstand how the router routes things.
[18:17:58] <GothAlice> doug1: http://www.irclogger.com/.mongodb/2014-10-21#1413934651 — joannac was helpful yesterday, and the answer hasn't changed since then.
[18:18:08] <GothAlice> (Yay for IRC loggers reducing duplication of effort. ;)
[18:33:40] <doug1> ok, how do I make a given node in a replicaset primary?
[18:52:54] <doug1> Isn't that just used for internal comms between nodes?
[18:53:03] <doug1> How's that relate to the admin user?
[18:53:12] <doug1> I thought the localhost exception had nothing to do with that?
[18:53:29] <GothAlice> doug1: It relates to authentication. It's how authenticated mongos/mongod servers securely communicate, and enabling it enables auth. (That's what it's for.)
[18:53:45] <doug1> so it's not related at all to the admin user?
[18:53:45] <GothAlice> The localhost exception is to allow you to connect w/o authentication if you are a server-local user. (I.e. have SSH access.)
[19:01:03] <GothAlice> http://uwsgi-docs.readthedocs.org/en/latest/Zerg.html < for multiple years they never documented what --zerg did. --help would list it as a valid option, but it would omit a description for it.
[19:18:07] <GothAlice> That file should not contain nulls. It might potentially contain nulls if you have slow query logging enabled and your query contains nulls.
[19:20:25] <GothAlice> axitkhurana: My production MongoDB logs average about 50MB per day.
[19:20:42] <GothAlice> (With peaks of maybe 200-300MB.)
[19:20:47] <axitkhurana> GothAlice: The first line has a lot of nulls (that increases the size of the file) and the rest of the file has the actual logs.
[19:21:13] <GothAlice> Could you pastebin your mongod.conf and mongod command line?
[19:22:25] <riso> I want to make an update to add a new string variable to each document in mongo, how can I do this?
[19:26:51] <GothAlice> axitkhurana: Hmm. No logappend, so mongod is actually emitting those nulls on each startup? That's weeeeird, Jerry.
[19:27:03] <axitkhurana> GothAlice: Appreciate your help and patience here.
[19:27:26] <GothAlice> axitkhurana: Apologies, but at this point I'm stumped. I have no idea what might be padding that file to such extremes.
[19:27:40] <axitkhurana> GothAlice: no problem, thanks for looking into it.
[19:31:19] <GothAlice> Failing to solve a problem makes me sad. :( You might try enabling logappend, manually clearing the old logfile, and restarting the service at some opportune time that'll impact fewer users. You *really* should set up logrotate on that file to keep it to sane sizes, with manageable history.
[19:31:54] <GothAlice> (logrotate being a third-party utility on Linux, not a configuration option to mongod)
[19:34:31] <Streemo> I plan on using mongoDB and nodejs. Does anyone know any good *Book* or in-depth guide that goes over using mongodb and how it works?
[19:35:05] <axitkhurana> GothAlice: we're using logrotate to compress daily log files, that's why we hadn't noticed the huge log files till now, the compressed ones were very small.
[19:35:14] <daidoji> Streemo: the docs are pretty good for Mongo. A document store is a pretty simple device too
[19:35:33] <daidoji> Streemo: just think of it as a giant key-value store with BSON limitations for the keys and values
[19:35:45] <axitkhurana> GothAlice: I should mention we use replica sets (master slave like config), if it can affect the log files in some way
[19:36:15] <GothAlice> riso: Oops, your question got buried a bit there. db.collection.update({}, {$set: {newvar: somevalue}})
[19:37:47] <Streemo> daidoji: the docs do look pretty good, I'm liking the visual aids. do you recommend reading the entire thing?
[19:37:53] <GothAlice> Streemo: The documentation available on mongodb.org and mongodb.com includes both high-level overviews and the technical details, tutorials, whitepapers, links to blog posts, etc.
[19:37:59] <riso> GothAlice: what is the false and true here? db.Collection.update({}, { $set : { "myfield" : "x" } }, false, true)
[19:38:48] <GothAlice> riso: db.somecollection.help() in the interactive mongo shell. (somecollection should exist)
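For the record, in that positional form the shell signature is db.collection.update(query, update, upsert, multi); riso's call spelled out:

    db.Collection.update(
        {},                         // query: match every document
        {$set: {myfield: "x"}},     // update to apply
        false,                      // upsert: don't insert when nothing matches
        true                        // multi: update all matches, not just the first
    )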
[19:39:03] <daidoji> Streemo: I read the entire thing cause thats kinda what I do, but you can skip a lot of the admin and sharding parts probably until you need fancy things like that
[19:41:16] <GothAlice> Streemo: I appreciate the "consume all available material" approach to learning. I once read (and memorized) the 600 page "HTTP: The Definitive Guide" in one night and wrote a HTTP/1.1 server (in 171 Python opcodes) the next day. :D
[19:41:36] <GothAlice> (Had to do it while the info was still fresh…)
[19:41:58] <Streemo> heh, im at about a fifth of the pace
[19:42:20] <Streemo> 100 pages in a night is plenty enough for me to digest X_X
[19:42:37] <daidoji> GothAlice: do you have a photographic memory?
[19:42:48] <GothAlice> daidoji: When I choose to apply it, yes.
[19:43:42] <Streemo> i try to get the big picture, cause usually i can just find the details when i need them.
[19:44:22] <GothAlice> Streemo: These days it's really about how quickly you can find information, not how much information you can retain. (The internet has had a measurable impact on the structure and function of our memories.)
[19:51:47] <Streemo> i think the first four chapters of the docs should be enough for me
[19:53:14] <Streemo> and yeah i don't think speed reading is for me
[19:53:47] <Streemo> i like to read logical chunks of info and then integrate them into my current understanding chunk by chunk. this is slower, but helps me in the end xD
[20:06:25] <annakarin> hi, I'm a beginner trying to learn. is it possible to pass a "if true, else" argument on .find()? example: .find( { a: "4", b: "3" } ), if true function(), else .find( {a: "3"} )
[20:15:12] <GothAlice> annakarin: Certainly: .find(condition ? {…true criteria…} : {… false criteria …}) — that'll only really work in the interactive shell, though. Each language has its own way of doing ternary statements (the name for those).
[20:15:45] <GothAlice> annakarin: However, the condition (say "function()") will only be evaluated once, for the entire query.
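Spelled out in the shell, with condition() standing in for whatever test annakarin has:

    // The condition picks the criteria once; a single query runs with the winner.
    var criteria = condition() ? {a: "4", b: "3"} : {a: "3"};
    db.things.find(criteria)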
[21:50:15] <Streemo> would mongodb be effective for sites in which users can store pictures, or perhaps via a chat client send pictures/video/media files?
[21:50:31] <daidoji> Streemo: depends on how you use it
[21:52:32] <Streemo> what would be an efficient way to do that? storing links or paths to files?
[21:53:40] <ejb> How can I pass values through $group?
[22:02:23] <daidoji> Streemo: I would store paths to files personally. However, I think GridFS or whatever it's called allows one to store binary data in Mongo documents
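A sketch of the store-a-path approach; all names are illustrative, and the file itself lives on disk, a CDN, or in GridFS:

    // Keep the binary out of the document; store where to find it plus metadata.
    db.uploads.insert({
        user: "streemo",
        path: "/media/2014/10/cat.jpg",
        mime: "image/jpeg",
        uploaded: new Date()
    })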
[22:14:50] <darkblue_b> what is what.. this isnt clear...
[22:15:11] <doug1> joannac: Sigh. So... I'm automating installation of a sharded replica set. The config has replset = foo, but since it has that option, I can't add the admin user until after there's a rs with a master...
[22:15:45] <doug1> if I create the rs first, then I have to have a way to find the master so THEN i can add the admin user...
[22:16:03] <joannac> I don't understand what you are doing
[22:19:48] <doug1> the MMS... don't suppose you know if I can set the identify of the agent via a config file?
[22:20:07] <joannac> doug1: if all your hosts have the same username/pass, you can do it in the agent config
[22:20:58] <doug1> joannac: you mean I can bring up an instance, use <insert-config-management-tool-of-choice> to actually set the identity/role (ie config server or data node) via a config file in the agent?
[22:49:15] <GothAlice> When I return, I'll see if I can dig up that automation script of mine (finally) and sanitize it to Gist for y'all.
[22:49:23] <GothAlice> Seems to be a frequent issue over the last few days I've hung out here.
[22:55:41] <darkblue_b> doug1: this is what I have sitting in /usr/local/bin/prettyJSON.py http://paste.debian.net/128223/
[22:57:03] <doug1> darkblue_b: Thanks. I was trying to write some js to work out if I'm on the master and if I am, then check if teh admin user exists, and if not, try and create it. Not simple
[22:57:25] <doug1> This works manually... if ( db.system.users.find({user:'admin'}).count() < 1 ) { but not from a script...
[23:02:55] <GothAlice> You're trying to execute that remotely, aren't you?
[23:03:10] <GothAlice> (I.e. you aren't benefitting from the localhost exception to authentication.)
[23:03:29] <doug1> GothAlice: Locally. Works when I do it manually. Fails when I do it from a js file (which has conn = new Mongo(); db = conn.getDB("admin"); in it)
[23:03:52] <GothAlice> doug1: How are you running that JS file?
[23:03:59] <doug1> i thought the localhost exception to auth went away after I added the admin user
[23:04:27] <GothAlice> … if you've already added the user, then you'll need to actually authenticate using it to perform that query.
[23:04:32] <doug1> Not sure... the cli is a bit confusing... one variation is mongo -u admin -p changeme test.js
[23:05:16] <doug1> ah wait this works ... mongo -u admin -p changeme admin
[23:05:41] <GothAlice> the conn = new Mongo() at the top...
[23:05:53] <doug1> well it does... until the count request...
[23:07:20] <doug1> if I keep typing, like monkeys on a typewriter, I may get it eventually
[23:07:28] <GothAlice> Did you give your admin user the correct permissions? (My db-local admin users can't, for example, query the system collections in that database even though they are admins, which is intentional.)
[23:14:40] <Boomtime> mongo cli lets you use the uri format or options to specify the database to authenticate against
[23:14:42] <doug1> although strangely it does print two lines
[23:14:50] <GothAlice> doug1: Localhost. Without it the default is likely actually '127.0.0.1', forcing a TCP connection instead of using a local on-disk socket. (Which is how most DBs behave.)
[23:14:50] <doug1> connecting to: localhost/admin and connecting to: localhost:27017/admin
[23:15:03] <Boomtime> the second line is your script
[23:15:07] <GothAlice> doug1: *Because you re-connect!* new Mongo()!
[23:15:11] <Boomtime> remove the script and see what it says
[23:15:43] <Boomtime> you can just use the globals
[23:15:46] <doug1> back to this again now... "errmsg" : "not authorized on admin to execute command { count: \"system.users\", query: { user: \"admin\" }, fields: {} }"
[23:15:50] <GothAlice> That's why I've been pointing at that line and screaming for half an hour. ;)
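The fix that falls out of this exchange: either drop the new Mongo() and use the shell's already-authenticated global db, or authenticate explicitly on the new connection. A sketch using the throwaway credentials from this conversation:

    conn = new Mongo();               // a fresh connection does NOT inherit the shell's -u/-p auth
    db = conn.getDB("admin");
    db.auth("admin", "changeme");     // authenticate this connection explicitly
    if (db.system.users.find({user: "admin"}).count() < 1) {
        db.createUser({user: "admin", pwd: "changeme", roles: ["root"]});
    }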
[23:56:25] <joannac> morenoh149: your app code is not relevant. you have a unique index, and your inserts are causing duplicate key exceptions
[23:56:59] <joannac> figure out where the index came from, and remove it if you think it's not useful
[23:57:25] <joannac> otherwise, you could modify your app code to catch the duplicate key exception and figure out what you want to do with it
[23:58:28] <morenoh149> I think I'll just add a name field. Doesn't make sense for this model but that's what keystone is assuming in its code. Find a name field, use it as the unique id, and build an index with that.