[00:00:00] <Mmike> when I do rs.initiate() on secondary and then do rs.remove('second') and rs.add('second'), on primary I see secondary (but added as primary)
[00:01:26] <Mmike> GothAlice, it gets added properly
[00:02:00] <GothAlice> Mmike: Why are you running rs.initiate() on the secondary? Are you passing in the correct command-line args to tell it it'll be a member?
[00:02:13] <GothAlice> (Command-line or configfile.)
[00:02:44] <GothAlice> rs.initiate() AFAIK means "you're now primary of a new replica set, congratulations!"
[00:03:09] <Mmike> GothAlice, out of despair :) I wanted to see what would happen. Figured if there was a connection issue, that would fail.
[00:03:09] <GothAlice> Thus you end up with two primaries that hate each other if you run rs.initiate() on *both* prior to rs.add()'ing the supposed secondary.
[00:03:29] <Mmike> yes, but why wasn't I able to add the box before doing rs.initiate() on the secondary?
[00:03:41] <GothAlice> Because you forgot to tell it it was going to be a member of a replica set.
[00:03:42] <Mmike> I know I created flawed config.
[00:04:42] <joannac> shut down all 3 mongod processes
[00:04:48] <joannac> clear out the dbpath for all 3
[00:05:02] <GothAlice> Second: https://gist.github.com/amcgregor/c33da0d76350f7018875 — demonstrates the literal minimum number of commands needed to set up a 3x3 sharded replica set with authentication enabled.
[00:05:05] <joannac> start them all with `--replSet marioset`
[00:05:53] <GothAlice> --replSet on the command-line or http://docs.mongodb.org/manual/reference/configuration-options/#replication these options if using a configfile.
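A minimal sketch of the procedure joannac and GothAlice describe, in the mongo shell; the hostnames are assumptions, and rs.initiate() runs exactly once, on the intended primary:

    // all three mongods started with --replSet marioset
    rs.initiate()                         // on the intended primary ONLY
    rs.add("mario2.example.com:27017")    // the others join via rs.add()
    rs.add("mario3.example.com:27017")
    rs.status()                           // confirm both members reach SECONDARY state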
[00:06:22] <Mmike> joannac, that will work if I don't run rs.status() on the secondaries before they become part of the replica set
[00:06:31] <joannac> Mmike: it will work full stop
[00:06:48] <joannac> rs.status() does not affect your being able to add a member to a replica set
[00:06:56] <GothAlice> Mmike: It's the extra rs.initiate() that's messing up the process, from what I can tell.
[00:06:58] <Mmike> yes, but if I run rs.status(), something happens, and I can't add the box to the replset
[00:07:14] <joannac> nothing happens when you run rs.status(). it's a read operation
[00:07:32] <Mmike> hm, ok, let me try again, without rs.status()
[00:07:35] <joannac> running rs.initiate() however, will prevent you from adding a node, like GothAlice said
[00:12:40] <Mmike> hm, seems something else is wrong here, did rs.add() and my 'added' node is still showing: "lastHeartbeatMessage" : "still initializing"
[00:16:28] <joannac> can you connect from secondary to primary?
[00:16:34] <Mmike> that log (primary) is the same as before, when I was adding a secondary. Just when I did rs.status() on secondary and then added a node, secondary's logs were empty
[00:18:07] <joannac> nothing new in the secondary's logs?
[00:18:52] <GothAlice> Being able to connect is… confusing. I see no reason why the secondary wouldn't be able to grab its config. joannac: Hmm, indeed. (These are short logs!) The warning is prominent in the logs and worthy of correction, though. My sleep deprivation is showing. ¬_¬
[00:22:04] <joannac> Mmike: something is definitely weird. Secondary can connect to primary via the shell, but can't for the purposes of getting a config
[00:23:15] <Mmike> joannac, so, when I do rs.add() on pri, pri just connects to sec and tells it, "hey-ho, come join me", and then the secondary connects and asks for config?
[00:24:56] <GothAlice> SELinux with a differing outgoing connection policy between mongo and mongod would exhibit the "shell can connect but daemon can't" issue. I'd expect connection failures to show up in the log, though.
[00:26:33] <GothAlice> (A naive "we're a server daemon!" policy could potentially deny outgoing connections by default.)
[00:29:15] <Mmike> had really weird issues with lxc
[00:30:16] <GothAlice> Container systems always seemed excessively "hacky" to me, even if they're theoretically far more efficient than running full VMs.
[00:40:13] <Mmike> and even with full vms I have issues now :)
[00:41:06] <GothAlice> Heh; OFC I spent most of my effort automating VM build processes, deployment, and scaling, so I'm somewhat tied to that approach. ;^D
[00:41:59] <GothAlice> Mmike: Out of curiosity, which filesystem are you running /var/lib/mongodb from?
[00:43:10] <Mmike> but only because lxc can do snapshotting and cloning pretty fast
[00:44:40] <GothAlice> Wow; my search for "mongodb" and "btrfs" with "-grub" to remove a bunch of news release noise only returns 10K results. Doesn't appear to be a common arrangement. :/
[00:46:15] <Mmike> I wouldn't use btrfs in production
[00:46:31] <GothAlice> http://qnalist.com/questions/5248296/mongodb-on-btrfs-with-compression — not excessively positive results, either, yeah
[00:46:39] <Mmike> if you check btrfs's FAQ, it says, for instance: 'Free space is a tricky concept in btrfs' :)
[00:55:37] <Mmike> so, I re-did this a few times now, and it works ok. I shut down mongodb on the secondary, clear /var/lib/mongodb/*, start mongodb on sec, do rs.add('sec') on pri, wait a bit, do rs.status() on the secondary, and all is golden
[00:55:38] <GothAlice> Mmike: "Suddenly working" is scarier to me than "we know what went wrong, but do you have a backup? We can't fix this.".
[00:56:00] <GothAlice> Okay, that sounds like the normal procedure working normally. Where does it go wrong?
[00:58:17] <GothAlice> (Copy-on-Write will *insanely* fragment your MongoDB data files on btrfs, and I have no idea how a light-weight sparse clone of an existing filesystem will interact…)
[00:58:42] <Mmike> tried doing rs.status() before adding it on the primary, and it didn't matter; it all works
[00:58:51] <Mmike> thnx, lads and gals for your patience :)
[00:59:33] <Mmike> i realized something was wrong when I was tailing mongodb logs on secondary
[00:59:46] <Mmike> only when I restarted the `tail -f` command could I see that the file had changed
[01:00:02] <GothAlice> Heh; this is the week for filesystem oddities. (The other one I resolved related to a filesystem not handing file handles down to child processes correctly, preventing BASH from piping data around because tempfiles couldn't be handed off.)
[01:01:14] <GothAlice> "another one which", I mean. *checks clock* Still too early to pass out. T_T That other issue gave me the niggling idea to ask what filesystem you were on, originally. ;)
[01:02:18] <Mmike> that pushed me to do a 'clean' container, no cloning at all
[01:02:40] <Mmike> i just did it with cloning, works ok. Fails when I do lxc-clone --snapshot
[01:03:05] <GothAlice> If it's still btrfs, make sure to chattr +C /var/lib/mongodb to make sure local COW is disabled, too.
[01:03:16] <GothAlice> (Prior to starting mongod.)
[01:03:52] <GothAlice> It wouldn't do to thrash an SSD, if you're on one, or otherwise cripple performance (on spindle drives) by fragmenting those files during normal operation.
[01:04:58] <Mmike> hm, i have mounted my /srv/ssdtestcrap with 'ssd' option
[01:05:05] <Mmike> that should tell btrfs to 'behave on ssd'
[01:05:20] <Mmike> but it's a scratch drive, an old one, so...
[01:05:49] <GothAlice> Mmike: COW will still thrash the SSD due to the way MongoDB uses memory mapped files. BTRFS will clone and move blocks on each journalled operation…
[01:06:08] <GothAlice> (Either chattr +C or use a subvolume with COW explicitly disabled…)
[09:26:39] <bin> i have a question guys .. i have a field which has an index on it and it's constantly updated (which makes the collection re-index over and over again). This causes huge blocking. Is there a way to avoid that? I'm talking about a date field ..
[10:34:09] <fatih> Is there a way to set a field according to a precondition? Say I want to set the field foo to "test1" if it has the value "a", but want to set it to "test2" if it has the value "b", and so on
[10:34:38] <fatih> basically I have a field and many conditions, so I can find the documents via "$in", giving the conditions in an array
[10:35:03] <fatih> but I couldn't find how to set it once I've found it , because I'm using the update command
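One way to express fatih's per-value updates in the mongo shell would be one targeted update per condition, since a plain update() can't vary $set by the matched value; the "things" collection and the field values here are hypothetical:

    // one update per (match value -> new value) pair
    db.things.update({foo: "a"}, {$set: {foo: "test1"}}, {multi: true})
    db.things.update({foo: "b"}, {$set: {foo: "test2"}}, {multi: true})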
[10:51:43] <simonerich> Hi guys, I have a question. I read in an old stackoverflow thread that it is not very safe or good to run mongodb on just one server
[10:52:11] <simonerich> Is that still the case or do you have any other tips?
[12:28:26] <NoReflex> hello! I have set up a sharded cluster with 3 replica sets (each replica set has 3 members - primary, secondary, arbiter). I have loaded random data into the cluster but the shards are unbalanced (79.3% of the data, 20.69%, and 0%)
[12:32:14] <NoReflex> the sharding is done on a field called ti (which in my test is filled with data like: Math.floor((Math.random() * 1000000)) + 1); I already have a compound index on ti:1, mo:1 where mo is a datetime object, and I noticed mongo uses this index for the sharding process (it did not create an additional single index on ti)
[12:33:32] <NoReflex> the documents are very similar; apart from the two values specified above (ti: integer, mo: datetime) there are another 5 float values and a boolean value
[12:34:49] <NoReflex> why would so much data be stored on one shard only?
[12:38:48] <NoReflex> for example the last shard (the almost empty one) has: { "ti" : 999983 } -->> { "ti" : { "$maxKey" : 1 } } on : rs3 Timestamp(3, 0)
[12:39:16] <NoReflex> so only the values from 999983 to 1000000 which is a very small range
[13:47:03] <ernetas> What's the difference between sharding.clusterRole: shard and just a mongod on a different port?
[13:47:32] <ernetas> I'm upgrading MongoDB 2.4 to 2.6 and to the new YAML format, due to which I'm not sure if I should set the option or not.
[13:51:17] <davonrails> Hi there, does any one know how I can sort mongodb results based on the given match of a regexp ? Is it possible to say lines having 2 matches goes first … ?
[14:34:09] <ssarah> hey guys, what is the rs.status() function equivalent for a single mongod instance? I mean, how do I get info about the mongod I'm connected to?
[14:36:56] <NoReflex> ssarah, what information are you looking for? rs.status() gives you information that pertains to the replication process so that information would not have any meaning for a single instance
[14:37:35] <NoReflex> if you want db or collection information use db.stats or db.collection.stats()
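A sketch of NoReflex's suggestion in the mongo shell, plus db.serverStatus() for general instance information; "mycollection" is a placeholder:

    db.serverStatus()         // uptime, connections, memory, etc. for this mongod
    db.stats()                // statistics for the current database
    db.mycollection.stats()   // statistics for a single collection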
[15:44:19] <GothAlice> Or it might be nuking the capped collection prior to rebuilding it. (Can't tell from here, but capped collections usually store ephemeral data.)
[15:44:53] <GothAlice> That's a lot of data for a capped collection, though. Hmm.
[15:45:51] <kas84> it’s ephemeral data but huge volume
[15:46:01] <Derick> ernetas: I've never seen that?
[15:47:04] <GothAlice> Derick, ernetas: That's the equivalent of the --shardsvr and --configsvr command-line arguments, AFAIK.
[15:47:46] <Derick> GothAlice: no context was supplied so difficult to make that leap :-)
[15:49:16] <kas84> there’s no sharding or replica sets configured
[15:58:47] <Mmike> Hola, lads and ladies! When I do rs.add(), is that an atomic operation? Looking at the javascript behind rs.add(), it seems that there is a race condition possible if two clients try to add nodes at the same time.
[16:00:16] <GothAlice> Mmike: That's not actually something I've ever considered trying to tell my clusters to do… rs.add() will cause an election, which means all clients will be disconnected, so I'm suspecting it'll actually be quite difficult to get two of those operations in "at the same time".
[16:02:35] <Mmike> GothAlice, orchestration stuff I'm using would do that... I wanted to code it all in python, but then a race is obvious... so I was hoping that maaaaaaybe somehow rs.add() would block other rs.adds() :)
[16:02:44] <GothAlice> However because of the way the metadata about the cluster is tracked (in documents inside MongoDB) changes are atomic. Two update operations to set/update elements of the config will be run one after the other, never "at the same time"; if it's a whole-document update, the last one will win.
[16:03:41] <GothAlice> If they're proper incremental updates (I.e. $push, $set only on one of the sub-documents, etc.) the changes (if non-overlapping) will be merged.
[16:04:15] <GothAlice> {$inc: {version: 1}} and {$inc: {version: 1}} will result in a version of 2 (starting at zero) no matter what order you execute them in…
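A minimal sketch of that point, using a hypothetical "counters" collection; because $inc is incremental, the two updates commute instead of clobbering each other:

    db.counters.insert({_id: "cfg", version: 0})
    db.counters.update({_id: "cfg"}, {$inc: {version: 1}})  // client A
    db.counters.update({_id: "cfg"}, {$inc: {version: 1}})  // client B
    db.counters.findOne()  // {_id: "cfg", version: 2} in either order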
[16:04:28] <Mmike> I'm reading what rs.add() helper does.
[16:05:09] <Mmike> it checks existing replicaset config, increments the _id field by one, creates new replicaset document, and calls replSetReconfig
[16:05:26] <Mmike> but there is no 'WAIT! I AM DOING STUFF NOW!'-like code :)
[16:05:53] <GothAlice> If you're building tools to automate this, won't you be executing replSetReconfig in an automated way instead of via the shell? You could implement two-phase commits or atomic locking yourself without any difficulty…
[16:08:20] <ernetas> Derick: http://docs.mongodb.org/manual/reference/configuration-options/#sharding.clusterRole - that's all the context I have. :) I'm migrating from MongoDB 2.4 to 2.6 and also want to switch to YAML configuration, but didn't know if I should set it. Anyways, previously I wasn't setting --shardsvr, too, so I'm completely unfamiliar with it. Does it do anything other than set port to 27018?
[16:08:25] <GothAlice> … or just rate limit your replication automation so that all changes are linearly applied.
[16:08:53] <GothAlice> ernetas: If you are operating single-server, you don't need or want that option.
[16:09:25] <GothAlice> A shard server stores a subset of your data and wants to operate alongside config server(s) (to track where records are stored) and other shards.
[16:10:41] <Mmike> GothAlice, that's the issue, I can't control the 'order' the units are going to be added from within the orchestration
[16:11:24] <Mmike> why would I need two phase commit? I could connect to mongodb, try to grab a lock, when the lock is there, do rs.add() equivalent, release the lock
[16:11:32] <Mmike> and hope no one will meddle with my replica set 'by hand' :)
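A sketch of the advisory lock Mmike describes, against a hypothetical "locks" collection; findAndModify updates a single document atomically, so only one caller at a time can flip "held" from false to true:

    db.locks.insert({_id: "rs-reconfig", held: false})  // one-time setup
    var lock = db.locks.findAndModify({
        query: {_id: "rs-reconfig", held: false},
        update: {$set: {held: true, at: new Date()}}
    });
    if (lock !== null) {
        // ...perform the rs.add() equivalent here...
        db.locks.update({_id: "rs-reconfig"}, {$set: {held: false}});  // release
    }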
[16:11:57] <GothAlice> Mmike: Or have a single "gatekeeper" process which performs the tasks asked of it and can thus *ensure* that tasks like this run linearly.
[16:12:34] <GothAlice> It sounds like you're opening up your cluster's configuration to any process who asks for it, which certainly would open you up to edge cases.
[16:13:27] <jumsom> is there some kind of index that I can create to assert an embedded reference exists before inserting the parent document? or must I wrap the insert in a find for the referenced document?
[16:13:42] <Mmike> actually, the orchestration framework is already using mongodb so I guess I could use that, somehow...
[16:14:04] <kali> use mongodb to setup zookeeper, use zookeeper to setup mongodb
[16:14:16] <GothAlice> jumsom: You'd have to wrap that in a find().count() to check to see if it exists. MongoDB has no concept of referential integrity. (I take the opposite approach; when loading the record out and attempting to access that field I perform the find() and gracefully handle failure to look up.)
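A sketch of that find()-before-insert check, with hypothetical "categories" and "items" collections; note the check is still racy, since the category could be deleted between the two calls:

    var categoryId = ObjectId("545a1b2c3d4e5f6a7b8c9d0e");  // hypothetical id supplied by the client
    if (db.categories.find({_id: categoryId}).count() === 1) {
        db.items.insert({title: "something", category: categoryId});
    } else {
        // the reference would dangle: reject the insert and report an error
    }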
[16:17:10] <jumsom> GothAlice: unfortunately the inverse is not possible, since the referenced doc is used to retrieve the inserted doc after insertion
[16:18:08] <GothAlice> … if you're inserting a document, don't you already have the document? Why retrieve it after?
[16:20:18] <GothAlice> (Are you relying on server-side _id generation? You can create the _id yourself client-side prior to inserting… then you have a complete record, including _id.)
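A sketch of client-side _id generation in the shell (the drivers offer the same):

    var id = new ObjectId();          // generated locally, not by the server
    db.items.insert({_id: id, title: "something"});
    db.items.findOne({_id: id});      // complete record, _id known up front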
[16:20:57] <jumsom> the document is part of a category, i.e. it has a reference to the category of which it is a part. Then from the UI a user selects a category, kind of thing... I'm noticing a case where a user has deleted a category in one instance/browser tab but has an older session open on another page and inserts into the old category
[16:21:40] <GothAlice> So… you're trusting that user-supplied data is gospel. :|
[16:22:15] <GothAlice> (Indeed, a find() would be good in that scenario.)
[16:24:00] <jumsom> to some extent the request is attached to a session and the item has an owner reference so they could only ever insert into their own scope of dead/random categories
[16:28:03] <ernetas> GothAlice: no, that's a sharded replica cluster
[16:37:47] <jumsom> I'm assuming this http://api.mongodb.org/python/current/api/bson/errors.html means that BSON.ObjectID is guaranteed not to throw if you ensure a 24-char hex string is passed into it, or is a try/catch still needed?
[16:40:22] <GothAlice> jumsom: Only construct valid ObjectIds, please, doing otherwise can lead to madness and obtuse, difficult to identify problems. They have a structure that should be followed. (A certain JS driver abstraction builds completely random ObjectIds, and it's caused support incidents recently.)
[16:40:29] <jumsom> woops, that's the Python driver; sure it's the same deal for most clients
[16:41:07] <kali> GothAlice: you're not fun, you know :)
[16:44:19] <jumsom> I'm using the mongodb native JS driver
[16:45:44] <jumsom> am currently just validating the strings before the BSON.ObjectID invocation, i.e. 24-char hex
[16:55:57] <GothAlice> http://docs.mongodb.org/manual/reference/object-id/ — I validate that the date range makes sense (i.e. no future dates), actually use a per-process incremental counter, and use real machine and process identifiers. All of these things are useful. (For example, having real UNIX timestamps in the ObjectId saves me ever needing a "created" date/time field.)
[16:57:02] <GothAlice> Real machine/process IDs mean I can check the audit log (which'll include _id ObjectIds) for weird queries, then know exactly which webapp server to check the logs of, and what time range to examine.
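A sketch of the embedded timestamp GothAlice relies on, using the example id from the MongoDB docs; the future-date check mirrors her validation:

    var id = ObjectId("507f1f77bcf86cd799439011");
    id.getTimestamp()                 // ISODate derived from the first 4 bytes
    id.getTimestamp() <= new Date()   // should always hold; a future date is suspect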
[17:13:18] <GothAlice> chetandhembre: How many application processes (clients) are connected? If each has a pool of connections, they can add up quickly. (Pool size * number of clients + 2 * replicas…)
[17:13:58] <chetandhembre> number of connections per host is 100
[17:14:10] <chetandhembre> clients are about 10 instances
[17:17:19] <GothAlice> That helps to prevent certain network communication issues at the expense of a tiny bit of bandwidth overhead.
[17:17:46] <chetandhembre> thing is, my mongodb query is taking so much time
[17:17:50] <GothAlice> (I.e. it sends a "hey, I'm alive!" notice periodically to ensure the connection stays alive even when idle.)
[17:18:50] <chetandhembre> but 28000 open connections is too much
[17:18:59] <GothAlice> chetandhembre: Well, query performance is strongly affected by the structure of your data and your use of indexes; could you pastebin the query and the .explain() of the query you are having difficulty with?
[17:21:39] <GothAlice> chetandhembre: (As a side note on the connections thing, one of my production clusters does _not_ have keepalives enabled and reports an average of 10 connections per app server host right now.)
[17:38:31] <chetandhembre> my actual java client code is this http://pastebin.com/Nq6hqxJP
[17:40:26] <GothAlice> That query is slow? In the shell, the explain reveals that the query should be instantaneous. From your Java code I'd insert "printf school of debugging" output (of the current UNIX timestamp) before line 9 and after line 9, then after the while loop. If your documents are excessively large, streaming them over the wire may be your bottleneck.
[17:41:12] <GothAlice> Also, there's no need to specify a natural sort… it uses natural sort by default. (And lately there have been several people having issues with sort causing the query planner to reject useful indexes, slowing things down.)
[17:41:58] <chetandhembre> yeah, my documents are quite large
[17:42:24] <GothAlice> Generally you can specify a second argument to find() to retrieve only the fields you actually care about at the time, omitting bulk data.
[17:43:18] <GothAlice> For example, {_id: 1} as the second argument will only fetch the IDs of the matching records. (Less useful here, but you get the idea.) If you specify other fields, _id will be included by default (use {_id: 0} to exclude it).
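A sketch of those projections against a hypothetical "profiles" collection:

    db.profiles.find({}, {_id: 1})              // ids only
    db.profiles.find({}, {name: 1, email: 1})   // named fields; _id included by default
    db.profiles.find({}, {name: 1, _id: 0})     // name only, _id excluded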
[17:44:19] <chetandhembre> actually i am storing key value stuff
[17:50:04] <GothAlice> Use of JSON BLOBs in a document storage system makes baby pandas cry tears of blood… it really sounds like you're trying to use MongoDB in the least MongoDB-ish (and most expensive to use) way possible, here. :/
[17:51:06] <chetandhembre> Do you have any suggestions for me?
[17:51:10] <GothAlice> I.e. if you transform that JSON into BSON (BasicDBObject hierarchy in your case) you'll be able to get MongoDB to give you specific, deeply nested fields back rather than needing to load _all_ of it to pull out certain fields, with the extra deserialization step.
[17:51:18] <GothAlice> chetandhembre: My typing is slow today. :)
[17:52:05] <chetandhembre> i want full profile every time
[17:52:31] <GothAlice> I.e. instead of {_id: "GothAlice", value: '{"foo": 27, …}'} with extra deserialization to turn value into a usable object, store: {_id: "GothAlice", foo: 27, …}
[17:53:34] <chetandhembre> but deserialization happens in my code, right? not on the mongodb server
[17:53:38] <GothAlice> Either way, using real BSON objects and not storing JSON serialized strings will make your data actually queryable (instead of completely opaque), allow changes to be issued incrementally, allow for intelligent filtering of returned fields to only those needed to process a request, etc., etc.
[17:55:00] <GothAlice> What's happening right now is that you construct the BSON query data, fire that off to MongoDB, which then returns with the BSON document which is deserialized into a BasicDBObject tree by your MongoDB client driver. Once that's done, your own code takes over and _also_ has to deserialize the JSON data—a second deserialization that is completely unnecessary, and hides the data from MongoDB.
[17:55:51] <chetandhembre> actually my current use case is only to get all profiles back .. will converting json to bson make the query fast?
[17:55:53] <GothAlice> I.e. with your data you can't ask the database: give me all usernames for all users who registered before the beginning of the year.
[17:56:18] <GothAlice> chetandhembre: Storing the data in the correct, native BSON format means you won't have to completely reprocess the data twice on each access.
[17:56:30] <GothAlice> *And* you'll be able to ask it questions.
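A sketch of the contrast, with hypothetical documents; only the native form can answer GothAlice's "registered before the beginning of the year" question or take incremental updates:

    // opaque: the server cannot see inside the string
    db.users.insert({_id: "a", value: '{"foo": 27, "joined": "2014-01-02"}'})
    // native BSON: every field is queryable and individually updatable
    db.users.insert({_id: "b", foo: 27, joined: ISODate("2014-01-02")})
    db.users.find({joined: {$lt: ISODate("2015-01-01")}}, {_id: 1})
    db.users.update({_id: "b"}, {$inc: {foo: 1}})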
[17:59:45] <chetandhembre> thing is, i already have started filling the db
[18:00:28] <GothAlice> And this is why it's critically important to consider the repercussions of data architecture before beginning to use it.
[18:02:31] <GothAlice> (Storing JSON strings in MongoDB is like storing CSV TEXT in MySQL… neither are things that will work out well in the end, and both ignore the features, and _purpose_, of the respective database architecture.)
[18:33:21] <chetandhembre> is mongodb a better solution for storing key-value data?
[18:38:11] <GothAlice> chetandhembre: There are two situations where I store arbitrary key-value data. The first is a metadata filesystem built on MongoDB, which uses documents with arbitrary fields. I.e. {kind: 'music', artist: 'Alice Bevan-McGregor', …} — not every document in the collection will have an "artist" field; in fact, only "music" and "picture" document types do.
[18:59:45] <jmacdonald> I am trying to connect with Python; I have this 3-liner: https://gist.github.com/jeffmacdonald/9c720ce7062ac1e0acd1
[19:00:12] <jmacdonald> i created "test" before authentication was enabled. then i enabled authentication and somewhat expected this to work. i'm not sure why it isn't.
[19:02:03] <GothAlice> jmacdonald: The database you are using matters greatly for authentication; if you created the admin user in the admin database, you have to tell your client to authenticate against *that* database before switching.
[19:02:09] <GothAlice> jmacdonald: (Authentication is per-database.)
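jmacdonald's gist is Python, but the principle is the same in the mongo shell: authenticate against the database the user was created in, then switch. The credentials here are hypothetical:

    var admin = db.getSiblingDB("admin");
    admin.auth("adminUser", "secret");    // auth where the user account lives
    var test = db.getSiblingDB("test");
    test.foo.find();                      // now permitted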
[19:19:17] <jmacdonald> okay, i got everything working now. Had to include one line in the chef recipe. all good.
[19:19:19] <GothAlice> jmacdonald: Pragmatic automation. I push, Github notifies one of the controllers, the controller notifies all affected servers via MongoDB capped collection, they pull, a post-merge hook runs which identifies files changed in the commit, which packages own those files, which of those have init.d scripts, and automatically reloads/restarts as appropriate.
[19:20:04] <jmacdonald> neat to learn about roles now, but i'm on my way.
[19:20:17] <GothAlice> (Going from a "generic server" to any of the roles, such as "application server" or "database secondary" is a git checkout to another branch; even init.d service associations are handled in Git via symlinks.)
[19:20:43] <jmacdonald> its neat. I'm not going to counter argue :)
[21:55:31] <BadHorsie> Is there something like Kibana for MongoDB?
[23:30:39] <tskaggs> Hi! I'm trying to bulk update a mongodb collection with a new format for pricing and not sure how to do that efficiently. I'm using sails.js as my MVC and just running my functions through there. The items get updated as I need them but time out when I have to hit the update() function. http://hastebin.com/bikituqogu.coffee