PMXBOT Log file Viewer


#mongodb logs for Friday the 28th of November, 2014

[00:00:00] <Mmike> when I do rs.initiate() on the secondary and then do rs.remove('second') and rs.add('second'), on the primary I see the secondary (but it's listed as a primary)
[00:00:02] <Mmike> sec
[00:00:15] <joannac> sigh
[00:01:17] <Mmike> http://pastebin.com/f9U3dfVD
[00:01:26] <Mmike> GothAlice, it gets added properly
[00:02:00] <GothAlice> Mmike: Why are you running rs.initiate() on the secondary? Are you passing in the correct command-line args to tell it it'll be a member?
[00:02:13] <GothAlice> (Command-line or configfile.)
[00:02:44] <GothAlice> rs.initiate() AFAIK means "you're now primary of a new replica set, congratulations!"
[00:03:09] <Mmike> GothAlice, out of despair :) I wanted to see what would happen. Figured if there was a connection issue, that would fail.
[00:03:09] <GothAlice> Thus you end up with two primaries that hate each other if you run rs.initiate() on *both* prior to rs.add()'ing the supposed secondary.
[00:03:29] <Mmike> yes, but why wasn't I able to add the box before doing rs.initiate() on the secondary?
[00:03:41] <GothAlice> Because you forgot to tell it it was going to be a member of a replica set.
[00:03:42] <Mmike> I know I created a flawed config.
[00:04:03] <GothAlice> First: http://docs.mongodb.org/manual/tutorial/deploy-replica-set/
[00:04:04] <Mmike> well, the only thing that needs to be done is to start mongod with --replSet
[00:04:07] <joannac> Mmike: is there data on any of these members?
[00:04:13] <Mmike> or add that to .conf file
[00:04:16] <joannac> I vote you start again if possible
[00:04:18] <Mmike> joannac, nope, fresh start
[00:04:27] <joannac> yeah, start again
[00:04:42] <joannac> shut down all 3 mongod processes
[00:04:48] <joannac> clear out the dbpath for all 3
[00:05:02] <GothAlice> Second: https://gist.github.com/amcgregor/c33da0d76350f7018875 — demonstrates the literal minimum number of commands needed to set up a 3x3 sharded replica set with authentication enabled.
[00:05:05] <joannac> start them all with `--replSet marioset`
[00:05:53] <GothAlice> --replSet on the command-line or http://docs.mongodb.org/manual/reference/configuration-options/#replication these options if using a configfile.
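A minimal mongo-shell sketch of the procedure joannac and GothAlice describe, assuming three fresh mongod processes already started with `--replSet marioset` (the host names below are placeholders):

    // Run on ONE node only, once all three mongods are up with --replSet marioset
    rs.initiate()                // this node becomes the first member and, shortly, PRIMARY
    rs.add("host2:27017")        // add the remaining members from the primary
    rs.add("host3:27017")
    rs.status()                  // read-only; confirms the member states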
[00:06:22] <Mmike> joannac, that will work if I don't run rs.status() on the secondaries before they become part of the replica set
[00:06:31] <joannac> Mmike: it will work full stop
[00:06:48] <joannac> rs.status() does not affect your being able to add a member to a replica set
[00:06:56] <GothAlice> Mmike: It's the extra rs.initiate() that's messing up the process, from what I can tell.
[00:06:58] <Mmike> yes, but if I run rs.status(), something happens, and I can't add the box to the replset
[00:07:14] <joannac> nothing happens when you run rs.status(). it's a read operation
[00:07:32] <Mmike> hm, ok, let me try again, without rs.status()
[00:07:35] <joannac> running rs.initiate() however, will prevent you from adding a node, like GothAlice said
[00:12:40] <Mmike> hm, seems something else is wrong here, did rs.add() and my 'added' node is still showing: "lastHeartbeatMessage" : "still initializing"
[00:12:52] <joannac> check the logs
[00:12:57] <joannac> primary and secondary
[00:14:15] <Mmike> in secondary's log: http://pastebin.com/Ci0q44Hm
[00:14:42] <joannac> and primary?
[00:15:24] <Mmike> primary: http://pastebin.com/WPRsy2ix
[00:15:46] <GothAlice> Mmike: Stuck voting!
[00:16:00] <GothAlice> Mmike: With only two set members, you need an arbiter to make things work reliably.
[00:16:14] <GothAlice> Mmike: See: http://docs.mongodb.org/manual/tutorial/add-replica-set-arbiter/
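For reference, adding the arbiter GothAlice suggests is a single helper call on the primary (the host name is a placeholder; the arbiter's mongod must be started with the same --replSet name first):

    rs.addArb("arbiter-host:27017")   // votes in elections but holds no data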
[00:16:17] <joannac> I don't think it's stuck
[00:16:28] <joannac> can you connect from secondary to primary?
[00:16:34] <Mmike> that log (primary) is the same as before, when I was adding a secondary. Just when I did rs.status() on secondary and then added a node, secondary's logs were empty
[00:16:39] <Mmike> joannac, never tried :) sec
[00:17:02] <Mmike> GothAlice, but I have to add a node and then add another to have three members :) it should show as 'secondary'
[00:17:40] <Mmike> Fri Nov 28 00:16:29.167 [initandlisten] connection accepted from 10.0.3.254:36936 #3 (1 connection now open)
[00:17:47] <Mmike> connects ok from sec to pri
[00:18:07] <joannac> nothing new in the secondary's logs?
[00:18:52] <GothAlice> Being able to connect is… confusing. I see no reason why the secondary wouldn't be able to grab its config. joannac: Hmm, indeed. (These are short logs!) The warning is prominent in the logs and worthy of correction, though. My sleep deprivation is showing. ¬_¬
[00:19:31] <Mmike> joannac, nope, the same error
[00:19:32] <Mmike> but
[00:19:45] <Mmike> something else is wrong on my box (i'm testing this in lxc)
[00:21:00] <Mmike> I did this so many times, it should work (never did rs.status() as described above, though), so it just might be a coincidence
[00:21:05] <Mmike> i'm firing up real vms now
[00:22:04] <joannac> Mmike: something is definitely weird. Secondary can connect to primary via the shell, but can't for the purposes of getting a config
[00:23:15] <Mmike> joannac, so, when I do rs.add() on pri, pri just connects to sec and tells it, "hey-ho, come join me", and then the secondary connects and asks for config?
[00:24:56] <GothAlice> SELinux with a differing outgoing connection policy between mongo and mongod would exhibit the "shell can connect but daemon can't" issue. I'd expect connection failures to show up in the log, though.
[00:26:33] <GothAlice> (A naive "we're a server daemon!" policy could potentially deny outgoing connections by default.)
[00:28:58] <Mmike> we'll see now
[00:29:05] <joannac> Mmike: very loosely, yes
[00:29:15] <Mmike> had really weird issues with lxc
[00:30:16] <GothAlice> Container systems always seemed excessively "hacky" to me, even if they're theoretically far more efficient than running full VMs.
[00:39:48] <Mmike> GothAlice, oh, they're excellent
[00:39:57] <Mmike> I can run half of openstack in it for testing
[00:40:01] <Mmike> and debugging
[00:40:13] <Mmike> and even with full vms I have issues now :)
[00:41:06] <GothAlice> Heh; OFC I spent most of my effort automating VM build processes, deployment, and scaling, so I'm somewhat tied to that approach. ;^D
[00:41:59] <GothAlice> Mmike: Out of curiosity, which filesystem are you running /var/lib/mongodb from?
[00:42:12] <GothAlice> (ext4, reiser, xfs, zfs, …)
[00:42:44] <Mmike> btrfs
[00:43:10] <Mmike> but only because lxc can do snapshotting and cloning pretty fast
[00:44:40] <GothAlice> Wow; my search for "mongodb" and "btrfs" with "-grub" to remove a bunch of news release noise only returns 10K results. Doesn't appear to be a common arrangement. :/
[00:46:15] <Mmike> I wouldn't use btrfs in production
[00:46:31] <GothAlice> http://qnalist.com/questions/5248296/mongodb-on-btrfs-with-compression — not excessively positive results, either, yeah
[00:46:39] <Mmike> if you check btrfs's FAQ, it says, for instance: 'Free space is a tricky concept in btrfs' :)
[00:46:49] <GothAlice> lol
[00:47:07] <GothAlice> At least they admit it. HFS+ still uses a GIANT bitmask file to mark block allocation.
[00:47:16] <Mmike> but when I fire up 15-20 containers to do stuff, on ext4 it takes 5-10 minutes to destroy them all, and that's on ssd
[00:47:52] <joannac> oh
[00:47:54] <Mmike> so btrfs saves me time, greatly (as it takes less than a minute to clean up)
[00:47:56] <joannac> i have a theory
[00:48:07] <joannac> Mmike: pastebin your current rs.status() please?
[00:49:13] <Mmike> joannac, don't have current one
[00:49:31] <Mmike> joannac, say it! :)
[00:49:54] <GothAlice> T_T
[00:50:02] <joannac> Mmike: okay, so we've given up diagnosing this?
[00:50:39] <Mmike> joannac, nope, I'm redeploying inside real vms, but having dns resolution issues there so I'm fixing that, so it's a bit slow :)
[00:50:46] <joannac> ah okay
[00:50:56] <Mmike> in lxc it won't work, and I wanted to make sure I can get it up properly
[00:51:03] <Mmike> will see what I broke with containers later
[00:51:12] <Mmike> but what's your theory?
[00:51:29] <Mmike> ha!
[00:51:34] <Mmike> it worked inside lxc too now!
[00:52:24] <GothAlice> O_o
[00:53:11] <joannac> my theory was you weren't testing correctly
[00:53:26] <joannac> you added replica set members with internal IPs but were testing external IPs
[00:53:39] <joannac> or hostnames
[00:55:37] <Mmike> so, I re-did this a few times now, and it works ok. I shut down mongodb on the secondary, clear /var/lib/mongodb/*, start mongodb on sec, do rs.add('sec') on pri, wait a bit, do rs.status() on the secondary, and all is golden
[00:55:38] <GothAlice> Mmike: "Suddenly working" is scarier to me than "we know what went wrong, but do you have a backup? We can't fix this.".
[00:56:00] <GothAlice> Okay, that sounds like the normal procedure working normally. Where does it go wrong?
[00:56:12] <Mmike> Not sure, yet.
[00:56:21] <Mmike> it's btrfs, I think, to be honest :)
[00:56:40] <Mmike> I just created three containers that are not cloned from the first one
[00:56:42] <GothAlice> This wouldn't be the first bug directly attributable to a filesystem issue I've encountered in the last week.
[00:57:11] <GothAlice> … you need to disable COW on /var/lib/mongodb and all child files and folders.
[00:58:10] <Mmike> aaaaaand, it was btrfs
[00:58:17] <GothAlice> (Copy-on-Write will *insanely* fragment your MongoDB data files on btrfs, and I have no idea how a light-weight sparse clone of an existing filesystem will interact…)
[00:58:42] <Mmike> tried doing rs.status before adding it to the primary and it didn't matter, it all works
[00:58:44] <Mmike> as it should
[00:58:51] <Mmike> thnx, lads and gals for your patience :)
[00:59:33] <Mmike> i realized something was wrong when I was tailing mongodb logs on secondary
[00:59:46] <Mmike> only when I restarted the `tail -f` command could I see that the file had changed
[01:00:02] <GothAlice> Heh; this is the week for filesystem oddities. (The other one I resolved related to a filesystem not handing file handles down to child processes correctly, preventing BASH from piping data around because tempfiles couldn't be handed off.)
[01:01:14] <GothAlice> "another one which", I mean. *checks clock* Still too early to pass out. T_T That other issue gave me the niggling idea to ask what filesystem you were on, originally. ;)
[01:01:46] <Mmike> neat one :)
[01:02:18] <Mmike> that pushed me to do 'clean' container, no cloning at all
[01:02:40] <Mmike> i just did it with cloning, works ok. Fails when I do lxc-clone --snapshot
[01:03:05] <GothAlice> If it's still btrfs, make sure to chattr +C /var/lib/mongodb to make sure local COW is disabled, too.
[01:03:16] <GothAlice> (Prior to starting mongod.)
[01:03:52] <GothAlice> It wouldn't do to thrash an SSD, if you're on one, or otherwise cripple performance (on spindle drives) by fragmenting those files during normal operation.
[01:04:58] <Mmike> hm, i have mounted my /srv/ssdtestcrap with 'ssd' option
[01:05:05] <Mmike> that should tell btrfs to 'behave on ssd'
[01:05:20] <Mmike> but it's a scratch drive, an old one, so...
[01:05:49] <GothAlice> Mmike: COW will still thrash the SSD due to the way MongoDB uses memory mapped files. BTRFS will clone and move blocks on each journalled operation…
[01:06:08] <GothAlice> (Either chattr +C or use a subvolume with COW explicitly disabled…)
[01:07:23] <GothAlice> (nodatacow option)
[01:10:02] <Mmike> ack
[01:10:16] <Mmike> need to script that out somehow in container, but good pointer
[01:10:19] <Mmike> thnx
[01:10:31] <GothAlice> My Google-fu is strong. :) Search this log for mongodb: http://www.mattzone.com/btrfs/btrfs_20140209.html
[01:17:28] <Mmike> actually, snapshotting works fine
[01:17:43] <GothAlice> You've further isolated the cause?
[01:18:50] <Mmike> I just had to use lxc-create -B btrfs, and lxc-clone -B btrfs
[01:19:12] <GothAlice> O_o That's… interesting.
[01:19:16] <Mmike> from what I understood, without it, it will use overlayfs on top of btrfs
[01:19:32] <GothAlice> lol — a many layered cake, this filesystem weirdness.
[01:19:43] <Mmike> I know it does on ext4. There it's just slow (although it saves space), but here it's... somehow broken :) (the overlayfs)
[01:19:48] <Mmike> if it even is overlayfs
[01:19:49] <Mmike> however
[01:19:52] <Mmike> it's not mongodbs issue
[01:20:10] <Mmike> thnx again
[01:20:17] <GothAlice> np
[01:21:03] <GothAlice> Not being able to tail -f reliably makes diagnostics somewhat hard! XD
[01:21:27] <Mmike> yes, but it was also smelling baaaaad
[01:21:55] <Mmike> ok, this part of globe deserves me going to bed! :)
[01:22:05] <GothAlice> Mmike: G'night!
[07:40:00] <nothingPT> Hello
[09:26:39] <bin> i have a question guys .. i have a field which has an index on it and it's constantly updated (which makes the collection reindex over and over again). This causes huge blocking. Is there a way to avoid that? I'm talking about a date field ..
[10:33:16] <fatih> Hi everyone
[10:34:09] <fatih> Is there a way to set a field according to a pre-condition? Say I want to set the field foo to "test1" if it has the value "a" but want to set it to "test2" if it has the value "b" and so on
[10:34:38] <fatih> basically I have a field and many conditions, so I can find the field via "$in" and giving the conditions with an array
[10:35:03] <fatih> but I couldn't find how to set it once I've found it, because I'm using the update command
[10:50:02] <RobConan> Good morning
[10:51:43] <simonerich> Hi guys, I have a question. I read in an old stackoverflow thread that it is not very safe or good to run mongodb just on one server
[10:52:11] <simonerich> Is that still the case or do you have any other tips?
[12:28:26] <NoReflex> hello! I have set up a sharded cluster with 3 replica sets (each replica set has 3 members - primary, secondary, arbiter). I have loaded random data into the cluster but the shards are unbalanced (79.3% of the data, 20.69%, and 0%)
[12:32:14] <NoReflex> the sharding is done on a field called ti (which in my test is filled with data like: Math.floor((Math.random() * 1000000)) + 1; I already have a compound index on ti:1, mo:1 where mo is a datetime object and I noticed mongo uses this index for the sharding process (it did not create an additional single index on ti)
[12:33:32] <NoReflex> the documents are very similar; apart from the two values specified above (ti: integer, mo: datetime) there are another 5 float values and a boolean value
[12:34:49] <NoReflex> why would so much data be stored on one shard only?
[12:38:48] <NoReflex> for example the last shard (the almost empty one) has: { "ti" : 999983 } -->> { "ti" : { "$maxKey" : 1 } } on : rs3 Timestamp(3, 0)
[12:39:16] <NoReflex> so only the values from 999983 to 1000000 which is a very small range
[12:40:37] <NoReflex> any ideas?
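A sketch of the range-sharded setup NoReflex describes, plus the shell helpers for seeing how the chunks ended up distributed; the database and collection names are assumptions:

    // Presumed setup: range sharding on the integer field "ti"
    sh.shardCollection("test.readings", { ti: 1 })

    // Inspecting the result
    sh.status()                              // chunk ranges assigned to each shard
    db.readings.getShardDistribution()       // data size and document counts per shard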
[13:46:37] <ernetas> Hey guys.
[13:47:03] <ernetas> What's the difference between sharding.clusterRole: shard and just a mongod on a different port?
[13:47:32] <ernetas> I'm upgrading MongoDB 2.4 to 2.6 and to the new YAML format, due to which I'm not sure if I should set the option or not.
[13:51:17] <davonrails> Hi there, does any one know how I can sort mongodb results based on the given match of a regexp ? Is it possible to say lines having 2 matches goes first … ?
[13:57:50] <davonrails> Anyone ?
[14:34:09] <ssarah> hey guys, what is the rs.status() function equivalent for a single mongod instance? i mean, how do i know info about the mongod im connected to?
[14:36:56] <NoReflex> ssarah, what information are you looking for? rs.status() gives you information that pertains to the replication process so that information would not have any meaning for a single instance
[14:37:35] <NoReflex> if you want db or collection information use db.stats or db.collection.stats()
[14:45:40] <ssarah> nope
[14:45:44] <ssarah> im using puppet to start mongo
[14:46:06] <ssarah> i just want info on what config it used to start a mongod
[15:07:19] <NoReflex> ssarah, you can use something like:
[15:07:21] <NoReflex> use admin
[15:07:28] <NoReflex> db.runCommand("getCmdLineOpts")
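Put together, with the shell helper that wraps the same command added for reference:

    use admin
    db.runCommand("getCmdLineOpts")   // argv plus the parsed config the server started with
    db.serverCmdLineOpts()            // shell helper equivalent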
[15:21:11] <kas84> hi
[15:21:39] <kas84> is it possible that mongo does any kind of mainteinance operation that rebuilds indexes by itself?
[15:24:20] <Derick> kas84: it doesn't rebuild indexes, but modifies them as you update/insert/delete data from indexed fields
[15:25:24] <kas84> aha, the thing is I have some capped collections running on my mongo and it seems to have done a createIndex operation by itself
[15:26:08] <kas84> I have this on my mongod.log: 2014-11-28T14:45:08.484+0000 [conn20082] command capped_rt.$cmd command: createIndexes { createIndexes: "channel", indexes: [ { name: "t_1", key: { t: 1 } } ] } keyUpdates:0 numYields:0 locks(micros) r:137754 w:123786889 reslen:113 247323ms
[15:30:33] <kas84> Derick: does that make any sense? or could it be a ensureIndex doing something wrong and rebuilding all the indexes?
[15:30:58] <cheeser> ensureIndex doesn't rebuild indexes
[15:33:39] <Derick> cheeser: it does in some drivers, as it's equivalent to createIndex
[15:33:46] <Derick> (for those drivers)
[15:34:00] <cheeser> oh, in the driver. i'm thinking the actual command :)
[15:34:11] <Derick> right, in that case you're right
[15:34:20] <cheeser> but even calling createIndex is not rebuilding an index.
[15:35:31] <Derick> that's true
[15:37:25] <kas84> aha
[15:37:42] <kas84> so why would mongodb rebuild an index?
[15:39:07] <kas84> the index build in my logs is like this…. http://pastebin.com/2wjRKLk1
[15:39:35] <kas84> the index for that collection was already created
[15:40:57] <ernetas> Derick: maybe you could tell what's the purpose of sharding.clusterRole == "shardsvr"?
[15:42:09] <GothAlice> kas84: Your application may be dropping the index first. And wait a minute; that's an index on a capped collection, eh?
[15:42:17] <kas84> yep
[15:42:21] <kas84> it’s on a capped collection
[15:42:57] <kas84> what do you mean by dropping the index? an actual command to drop the index?
[15:43:52] <GothAlice> Aye.
[15:44:19] <GothAlice> Or it might be nuking the capped collection prior to rebuilding it. (Can't tell from here, but capped collections usually store ephemeral data.)
[15:44:53] <GothAlice> That's a lot of data for a capped collection, though. Hmm.
[15:45:51] <kas84> it’s ephemeral data but huge volume
[15:46:01] <Derick> ernetas: I've never seen that?
[15:47:04] <GothAlice> Derick, ernetas: That's the equivalent of the --shardsvr and --configsvr command-line arguments, AFAIK.
[15:47:46] <Derick> GothAlice: no context was supplied so difficult to make that leap :-)
[15:49:16] <kas84> there’s no sharding or replica sets configured
[15:49:23] <kas84> it’s a single mongod instance
[15:53:42] <GothAlice> Derick: Parse error. Insufficient context to contextualize the described lack of context.
[15:54:03] <Derick> exactly :)
[15:58:47] <Mmike> Hola, lads and ladies! When I do rs.add(), is that an atomic operation? Looking at the javascript behind rs.add it seems that there is a race condition possible if two clients try to add nodes at the same time.
[16:00:16] <GothAlice> Mmike: That's not actually something I've ever considered trying to tell my clusters to do… rs.add() will cause an election, which means all clients will be disconnected, so I'm suspecting it'll actually be quite difficult to get two of those operations in "at the same time".
[16:02:35] <Mmike> GothAlice, orchestration stuff I'm using would do that... I wanted to code it all in python, but then a race is obvious... so I was hoping that maaaaaaybe somehow rs.add() would block other rs.adds() :)
[16:02:44] <GothAlice> However, because of the way the metadata about the cluster is tracked (in documents inside MongoDB), changes are atomic. Two update operations to set/update elements of the config will be run one after the other, never "at the same time"; if it's a whole-document update, the last one will win.
[16:02:55] <Mmike> yes, but
[16:03:40] <Mmike> c.version++;
[16:03:41] <GothAlice> If they're proper incremental updates (I.e. $push, $set only on one of the sub-documents, etc.) the changes (if non-overlapping) will be merged.
[16:03:41] <Mmike> and then
[16:03:56] <Mmike> cfg = { _id: max + 1, host: hostport };
[16:04:09] <Mmike> so there is nothing to stop two 'connections' doing that same thing at the same time
[16:04:15] <Mmike> so one could, in theory, fail
[16:04:15] <GothAlice> {$inc: {version: 1}} and {$inc: {version: 1}} will result in a version of 2 (starting at zero) no matter what order you execute them in…
[16:04:28] <Mmike> I'm reading what rs.add() helper does.
[16:04:35] <GothAlice> … uhm.
[16:05:09] <Mmike> it checks existing replicaset config, increments the _id field by one, creates new replicaset document, and calls replSetReconfig
[16:05:26] <Mmike> but there is no 'WAIT! I AM DOING STUFF NOW!'-like code :)
[16:05:53] <GothAlice> If you're building tools to automate this, won't you be executing replSetReconfig in an automated way instead of via the shell? You could implement two-phase commits or atomic locking yourself without any difficulty…
[16:08:20] <ernetas> Derick: http://docs.mongodb.org/manual/reference/configuration-options/#sharding.clusterRole - that's all the context I have. :) I'm migrating from MongoDB 2.4 to 2.6 and also want to switch to YAML configuration, but didn't know if I should set it. Anyways, previously I wasn't setting --shardsvr, too, so I'm completely unfamiliar with it. Does it do anything other than set port to 27018?
[16:08:25] <GothAlice> … or just rate limit your replication automation so that all changes are linearly applied.
[16:08:53] <GothAlice> ernetas: If you are operating single-server, you don't need or want that option.
[16:09:25] <GothAlice> A shard server stores a subset of your data and wants to operate alongside config server(s) (to track where records are stored) and other shards.
[16:10:41] <Mmike> GothAlice, that's the issue, I can't control the 'order' the units are going to be added, from within the orchestration wee
[16:11:24] <Mmike> why would I need two phase commit? I could connect to mongodb, try to grab a lock, when the lock is there, do rs.add() equivalent, release the lock
[16:11:32] <Mmike> and hope noone will mingle with my replicaset 'by hand' :)
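A rough mongo-shell sketch of the lock idea Mmike describes, using a unique _id as the mutex; the collection, lock, and host names are made up for illustration:

    var locks = db.getSiblingDB("orchestration").locks;    // assumed coordination collection
    var res = locks.insert({ _id: "rs-reconfig", owner: "unit-42", at: new Date() });
    if (res.nInserted === 1) {                // unique _id means only one caller wins the lock
        rs.add("newmember:27017");            // reconfigure while holding it
        locks.remove({ _id: "rs-reconfig" });
    } else {
        print("lock is held by someone else; retry later");
    }

(Not bulletproof: if the reconfig triggers an election and drops the connection, the lock has to be cleaned up separately.)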
[16:11:57] <GothAlice> Mmike: Or have a single "gatekeeper" process which performs the tasks asked of it and can thus *ensure* that tasks like this run linearly.
[16:12:34] <GothAlice> It sounds like you're opening up your cluster's configuration to any process who asks for it, which certainly would open you up to edge cases.
[16:12:38] <kali> you need zookeeper :)
[16:13:03] <cheeser> zk++
[16:13:06] <Mmike> I need alcohol! A lot of it :)
[16:13:27] <jumsom> is there some kind of index that I can create to assert an embedded reference exists before inserting the parent document? or must I wrap the insert in a find for the referenced document?
[16:13:42] <Mmike> actually, the orchestration framework is already using mongodb so I guess I could use that, somehow...
[16:14:04] <kali> use mongodb to setup zookeeper, use zookeeper to setup mongodb
[16:14:07] <kali> wait... no.
[16:14:15] <cheeser> INCEPTION!
[16:14:16] <GothAlice> jumsom: You'd have to wrap that in a find().count() to check to see if it exists. MongoDB has no concept of referential integrity. (I take the opposite approach; when loading the record out and attempting to access that field I perform the find() and gracefully handle failure to look up.)
[16:17:10] <jumsom> GothAlice: unfortunately the inverse is not possible since the referenced doc is used to retrieve the inserted doc after insertion
[16:18:08] <GothAlice> … if you're inserting a document, don't you already have the document? Why retrieve it after?
[16:20:18] <GothAlice> (Are you relying on server-side _id generation? You can create the _id yourself client-side prior to inserting… then you have a complete record, including _id.)
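A small shell sketch of both suggestions, checking that the referenced document exists before inserting and generating the _id client-side; the collection and field names are illustrative only:

    var categoryId = ObjectId("547890abcdef1234567890ab");   // placeholder reference
    if (db.categories.find({ _id: categoryId }).count() === 1) {
        db.items.insert({ _id: new ObjectId(),   // generated client-side, known before insert
                          category: categoryId,
                          name: "example item" });
    }

(A plain check-then-insert is still racy if the category can be deleted concurrently.)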
[16:20:57] <jumsom> the document is part of a category.. ie it has a reference to the category of which it is a part.. then from the UI a user selects a category kind of thing... im noticing a case when a user has deleted a category on one instance/browser tab but has an older session open on another page and inserts into the old category
[16:21:06] <jumsom> refreshes and has dead items
[16:21:40] <GothAlice> So… you're trusting that user-supplied data is gospel. :|
[16:22:15] <GothAlice> (Indeed, a find() would be good in that scenario.)
[16:24:00] <jumsom> to some extent the request is attached to a session and the item has an owner reference so they could only ever insert into their own scope of dead/random categories
[16:28:03] <ernetas> GothAlice: no, that's a sharded replica cluster
[16:37:47] <jumsom> Im assuming this http://api.mongodb.org/python/current/api/bson/errors.html means that BSON.ObjectID is guaranteed not to throw if you ensure a 24 char hex string is passed into it or is a try catch still needed?
[16:40:22] <GothAlice> jumsom: Only construct valid ObjectIds, please, doing otherwise can lead to madness and obtuse, difficult to identify problems. They have a structure that should be followed. (A certain JS driver abstraction builds completely random ObjectIds, and it's caused support incidents recently.)
[16:40:29] <jumsom> whoops, that's the python docs; sure it's the same deal for most clients though
[16:40:53] <GothAlice> http://docs.mongodb.org/manual/reference/object-id/
[16:41:07] <kali> GothAlice: you're not fun, you know :)
[16:44:19] <jumsom> im using mongodb native JS driver
[16:45:44] <jumsom> am currently just validating the strings before the BSON.ObjectID invocation ie 24 char hex
[16:55:57] <GothAlice> http://docs.mongodb.org/manual/reference/object-id/ — I validate that the date range makes sense (i.e. no future dates), actually use a per-process incremental counter, and use real machine and process identifiers. All of these things are useful. (For example, having real UNIX timestamps in the ObjectId saves me ever needing a "created" date/time field.)
[16:56:00] <GothAlice> jumsom: ^
[16:57:02] <GothAlice> Real machine/process IDs mean I can check the audit log (which'll include _id ObjectIds) for weird queries, then know exactly which webapp server to check the logs of, and what time range to examine.
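For reference, the embedded timestamp GothAlice is describing can be read straight off an ObjectId in the shell:

    var id = ObjectId()    // 4-byte unix timestamp + machine id + pid + counter
    id.getTimestamp()      // returns the creation time as an ISODate, so no separate "created" field is needed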
[17:10:31] <chetandhembre> hi
[17:11:14] <chetandhembre> open connections to the mongodb server are at 28000
[17:11:19] <chetandhembre> Is it normal ?
[17:13:18] <GothAlice> chetandhembre: How many application processes (clients) are connected? If each has a pool of connections, they can add up quickly. (Pool size * number of clients + 2 * replicas…)
[17:13:58] <chetandhembre> number of connections per host is 100
[17:14:10] <chetandhembre> clients are about 10 instances
[17:14:20] <chetandhembre> and 1 read replica
[17:15:41] <GothAlice> Yes, then the number of connections you are seeing would be a bit high.
[17:15:55] <GothAlice> Have you tuned the connection pool sizes on the clients?
[17:16:16] <chetandhembre> yeah
[17:16:53] <chetandhembre> I set socketKeepAlive to true
[17:16:56] <chetandhembre> is that ok ?
[17:17:19] <GothAlice> That helps to prevent certain network communication issues at the expense of a tiny bit of bandwidth overhead.
[17:17:46] <chetandhembre> thing is, my mongodb query is taking so much time
[17:17:50] <GothAlice> (I.e. it sends a "hey, I'm alive!" notice periodically to ensure the connection stays alive even when idle.)
[17:18:50] <chetandhembre> but 28000 connection open is too much
[17:18:59] <GothAlice> chetandhembre: Well, query performance is strongly affected by the structure of your data and your use of indexes; could you pastebin the query and .explain() of the query you are having difficulty with?
[17:19:20] <chetandhembre> ok
[17:21:39] <GothAlice> chetandhembre: (As a side note on the connections thing, one of my production clusters does _not_ have keepalives enabled and reports an average of 10 connections per app server host right now.)
[17:26:46] <GothAlice> (Connections are multiplied, though: threads * processes * hosts)
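As a rough worked example with the numbers given above (an estimate only; exact totals depend on the driver's internal and monitoring connections): 100 connections per host * 10 client instances is about 1,000 connections to the primary, and roughly double that if each pool also holds connections to the one read replica, so something on the order of 1,000 to 2,000 would be expected, which is why 28,000 stands out.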
[17:31:08] <chetandhembre> http://pastebin.com/6iy2mtdV
[17:31:18] <chetandhembre> that's the get query's explain output
[17:32:45] <GothAlice> n=0; does that query return anything? millis=0; this query runs instantly. :/
[17:34:12] <chetandhembre> actually I ran this query on the mongodb server with the shell
[17:35:37] <GothAlice> What was the query, exactly?
[17:37:53] <chetandhembre> db.demo.find({ "_id": { $in: [ "1303450174", "1044676792", "662652712", "276849495" ] } }).explain()
[17:38:31] <chetandhembre> my actual java client code is this http://pastebin.com/Nq6hqxJP
[17:40:26] <GothAlice> That query is slow? In the shell, the explain reveals that the query should be instantaneous. From your Java code I'd insert "printf school of debugging" output (of the current UNIX timestamp) before line 9 and after line 9, then after the while loop. If your documents are excessively large, streaming them over the wire may be your bottleneck.
[17:41:12] <GothAlice> Also, there's no need to specify a natural sort… it uses natural sort by default. (And lately there have been several people having issues with sort causing the query planner to reject useful indexes, slowing things down.)
[17:41:58] <chetandhembre> yeah, my documents are quite large
[17:42:24] <GothAlice> Generally you can specify a second argument to find() to retrieve only the fields you actually care about at the time, omitting bulk data.
[17:42:29] <chetandhembre> ok
[17:43:18] <GothAlice> For example, {_id: 1} as the second argument will only fetch the IDs of the matching records. (Less useful here, but you get the idea.) If you specify other fields, _id will be included by default (use {_id: 0} to exclude it).
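A quick sketch of the projection being described, reusing the collection from the pasted query; the profile field names are assumptions:

    // Second argument to find() limits which fields come back over the wire
    db.demo.find({ _id: { $in: ["1303450174", "1044676792"] } }, { _id: 1 })    // ids only
    db.demo.find({ _id: "1303450174" }, { username: 1, followers: 1 })          // named fields; _id comes back unless excluded with _id: 0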
[17:44:19] <chetandhembre> actually i am storing key value stuff
[17:44:26] <GothAlice> :|
[17:44:30] <chetandhembre> so I am fetching value only
[17:45:14] <GothAlice> Key/value like {somekey: value}, or key/value like {key: somekey, value: value}?
[17:45:38] <chetandhembre> {_id:key, value:value}
[17:46:00] <chetandhembre> {_id : key, value : value}
[17:46:16] <GothAlice> Are you trying to model EAV using compound keys?
[17:47:34] <chetandhembre> value is large json
[17:47:39] <chetandhembre> is that EAV ?
[17:47:46] <GothAlice> Well, no, what's an example key?
[17:48:39] <chetandhembre> it is instagram user id
[17:48:52] <chetandhembre> value is instagram profile
[17:49:07] <GothAlice> And you have to load and then deserialize the JSON when needed?
[17:50:02] <chetandhembre> yeah
[17:50:04] <GothAlice> Use of JSON BLOBs in a document storage system makes baby pandas cry tears of blood… it really sounds like you're trying to use MongoDB in the least MongoDB-ish (and most expensive to use) way possible, here. :/
[17:51:06] <chetandhembre> DO you have any suggestion for me ?
[17:51:10] <GothAlice> I.e. if you transform that JSON into BSON (BasicDBObject hierarchy in your case) you'll be able to get MongoDB to give you specific, deeply nested fields back rather than needing to load _all_ of it to pull out certain fields, with the extra deserialization step.
[17:51:18] <GothAlice> chetandhembre: My typing is slow today. :)
[17:52:05] <chetandhembre> i want full profile every time
[17:52:31] <GothAlice> I.e. instead of {_id: "GothAlice", value: '{"foo": 27, …}'} with extra deserialization to turn value into a usable object, store: {_id: "GothAlice", foo: 27, …}
[17:53:34] <chetandhembre> but desrialization happen in my code right ? not on mongodb server
[17:53:38] <GothAlice> Either way, using real BSON objects and not storing JSON serialized strings will make your data actually queryable (instead of completely opaque), allow changes to be issued incrementally, allow for intelligent filtering of returned fields to only those needed to process a request, etc., etc.
[17:53:39] <GothAlice> Yes.
[17:55:00] <GothAlice> What's happening right now is that you construct the BSON query data, fire that off to MongoDB, which then returns with the BSON document which is deserialized into a BasicDBObject tree by your MongoDB client driver. Once that's done, your own code takes over and _also_ has to deserialize the JSON data—a second deserialization that is completely unnecessary, and hides the data from MongoDB.
[17:55:51] <chetandhembre> actually my current use case is only to get the whole profile back .. will converting json to bson make the query faster?
[17:55:53] <GothAlice> I.e. with your data you can't ask the database: give me all usernames for all users who registered before the beginning of the year.
[17:56:18] <GothAlice> chetandhembre: Storing the data in the correct, native BSON format means you won't have to completely reprocess the data twice on each access.
[17:56:30] <GothAlice> *And* you'll be able to ask it questions.
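A sketch of the two layouts being contrasted, reusing the {_id: "GothAlice", foo: 27} example; the collection name is a placeholder:

    // Opaque: the value is a JSON string MongoDB cannot see inside
    { _id: "GothAlice", value: "{\"foo\": 27, \"bar\": \"baz\"}" }

    // Native BSON document: queryable, projectable, and updatable in place
    { _id: "GothAlice", foo: 27, bar: "baz" }

    // the "ask it questions" part:
    db.profiles.find({ foo: { $gte: 10 } }, { _id: 1 })
    db.profiles.update({ _id: "GothAlice" }, { $inc: { foo: 1 } })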
[17:56:47] <chetandhembre> ok ok
[17:57:12] <GothAlice> MongoDB is not Redis. MongoDB is far more than just a key/value store.
[17:59:10] <chetandhembre> cool
[17:59:45] <chetandhembre> thing is, i already have started filling the db
[18:00:28] <GothAlice> And this is why it's critically important to consider the repercussions of data architecture before beginning to use it.
[18:02:31] <GothAlice> (Storing JSON strings in MongoDB is like storing CSV TEXT in MySQL… neither are things that will work out well in the end, and both ignore the features, and _purpose_, of the respective database architecture.)
[18:07:33] <chetandhembre> yeah
[18:09:20] <kibibyte> HELLOO
[18:33:21] <chetandhembre> is mongodb a better solution for storing key value data?
[18:38:11] <GothAlice> chetandhembre: There are two situations where I store arbitrary key-value data. The first is a metadata filesystem built on MongoDB, which uses documents with arbitrary fields. I.e. {kind: 'music', artist: 'Alice Bevan-McGregor', …} — not every document in the collection will have an "artist" field; in fact, only "music" and "picture" document types do.
[18:38:13] <cheeser> better than...?
[18:38:42] <chetandhembre> cheeser: maybe couchdb
[18:39:39] <cheeser> well, couchdb is a k/v store so it's been optimized for that...
[18:39:44] <GothAlice> Indeed.
[18:40:05] <cheeser> you *can* use mongodb as a k/v store but that's not really what it was meant for.
[18:40:21] <chetandhembre> GothAlice : yeah ...
[18:40:30] <GothAlice> (And sparse indexes.)
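A one-line sketch of the sparse index being referred to, using the field from the metadata-filesystem example; the collection name is a placeholder:

    // Only documents that actually carry an "artist" field get entries in this index
    db.fsmeta.createIndex({ artist: 1 }, { sparse: true })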
[18:58:00] <jmacdonald> hello.
[18:59:45] <jmacdonald> i am trying to connect with python, i have this 3 liner https://gist.github.com/jeffmacdonald/9c720ce7062ac1e0acd1
[19:00:12] <jmacdonald> i created "test" before authentication was enabled. then i enabled authentication and somewhat expected this to work. i'm not sure why it isn't.
[19:02:03] <GothAlice> jmacdonald: The database you are using matters greatly for authentication; if you created the admin user in the admin database, you have to tell your client to authenticate against *that* database before switching.
[19:02:09] <GothAlice> jmacdonald: (Authentication is per-database.)
[19:02:24] <cheeser> (which is mildly annoying)
[19:03:19] <cheeser> just makes authentication tricky for things like mongo-hadoop
[19:03:30] <cheeser> makes the configuration a bit more complex than it could be
[19:03:43] <GothAlice> True that; also a fail for obviousness.
[19:04:00] <cheeser> yeah. the obviousness piece is the most annoying.
[19:04:02] <jmacdonald> GothAlice: when i created "test" i was totally anonymous.
[19:04:13] <GothAlice> jmacdonald: But then you enabled authentication.
[19:04:17] <jmacdonald> right.
[19:04:19] <GothAlice> Where did you add your first admin user?
[19:04:37] <jmacdonald> well. i let a chef community cookbook do it, so let me go read that to find out :)
[19:04:43] <GothAlice> -_-
[19:05:15] <jmacdonald> okay, in admin
[19:05:23] <jmacdonald> or perhaps that is the db it created. more reading.
[19:05:44] <GothAlice> The admin database AFAIK always exists; it's important to the functioning of the database.
[19:06:06] <GothAlice> And yeah, that's where the first admin user is usually created, or otherwise you'd lose control of your server.
[19:06:36] <jmacdonald> okay, so in that paste i provided, i replaced the word "test" with "admin" and auth still fails.
[19:07:07] <GothAlice> jmacdonald: So, step one, can you connect, authenticate, and do things in the mongo shell?
[19:08:53] <GothAlice> jmacdonald: http://api.mongodb.org/python/current/examples/authentication.html#delegated-authentication — try client.test.authenticate('admin', 'admin', source='admin')
[19:08:57] <jmacdonald> i can locally. testing via network now. (to rule out python i take it)
[19:09:15] <GothAlice> This will at least confirm that the user exists with the password you expect in the database you expect. :)
[19:10:46] <jmacdonald> https://gist.github.com/jeffmacdonald/4baa704669fde8de47aa obviously user does not exist.
[19:10:59] <jmacdonald> if i connect locally without a password , how can i list all users?
[19:11:01] <GothAlice> Or password is incorrect.
[19:11:13] <jmacdonald> oh i see. one sec.
[19:11:33] <jmacdonald> show users shows .. nothing at all.
[19:11:40] <jmacdonald> that explains much :)
[19:12:07] <GothAlice> A ha.
[19:12:10] <GothAlice> You need to add your first user.
[19:12:30] <GothAlice> You're only able to connect due to the "localhost" exception (when no users are configured, but authentication is).
[19:13:01] <GothAlice> jmacdonald: db.createUser({user: 'admin', pwd: 'admin', roles: ['readWriteAnyDatabase', 'userAdminAnyDatabase', 'dbAdminAnyDatabase', 'clusterAdmin']});
[19:13:15] <GothAlice> jmacdonald: That'd be the line you want to run in the mongo shell within the 'admin' database.
[19:13:32] <jmacdonald> i'm sorta learning chef and mongo at the same time here. so bit of a chicken/egg on the go.
[19:14:03] <jmacdonald> thats fine. i am :)
[19:17:00] <GothAlice> Chef, puppet, fabric, buildout, salt, ansible, blueprint, … it's a heavily reinvented wheel, that one. ;)
[19:17:33] <jmacdonald> how would you go about managing 1000 servers?
[19:17:59] <GothAlice> And MongoDB operating as a push message broker.
[19:18:26] <jmacdonald> sounds gross :)
[19:19:17] <jmacdonald> okay, i got everything working now.had to include one line in chef recipe. all good.
[19:19:19] <GothAlice> jmacdonald: Pragmatic automation. I push, Github notifies one of the controllers, the controller notifies all affected servers via MongoDB capped collection, they pull, a post-merge hook runs which identifies files changed in the commit, which packages own those files, which of those have init.d scripts, and automatically reloads/restarts as appropriate.
[19:19:23] <GothAlice> :)
[19:20:04] <jmacdonald> neat to learn about roles now, but i'm on my way.
[19:20:17] <GothAlice> (Going from a "generic server" to any of the roles, such as "application server" or "database secondary" is a git checkout to another branch; even init.d service associations are handled in Git via symlinks.)
[19:20:43] <jmacdonald> its neat. I'm not going to counter argue :)
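A minimal sketch of the capped-collection notification channel GothAlice describes; collection name, size, and document fields are placeholders, and the consumer shown uses the legacy shell's tailable-cursor flags:

    // Producer: a fixed-size collection that preserves insertion order
    db.createCollection("pushes", { capped: true, size: 1024 * 1024 })
    db.pushes.insert({ repo: "app", commit: "abc123", at: new Date() })

    // Consumer: a tailable cursor keeps returning new documents as they arrive
    var cur = db.pushes.find().addOption(DBQuery.Option.tailable).addOption(DBQuery.Option.awaitData)
    while (cur.hasNext()) { printjson(cur.next()) }   // a real consumer would loop and re-poll when this drains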
[21:55:31] <BadHorsie> Is there something like Kibana for MongoDB?
[23:21:55] <huleo> hi
[23:30:39] <tskaggs> Hi! I'm trying to bulk update a mongodb collection with a new format for pricing and not sure how to do that efficiently. I'm using sails.js as my MVC and just running my functions through there. The items get updated as I need them but it times out when I have to hit the update() function. http://hastebin.com/bikituqogu.coffee