pmxbot IRC Log Viewer

[00:00:27] <GothAlice> Their recommended approach, then? *digs through Exocortex for that book*

[00:00:57] <Boomtime> I cannot think of why you'd want to store a random number (in the document) for the purposes of selecting a random document

[00:01:39] <Boomtime> you want the documnents to form an perfectly flat distribution (no randomness) then select an index in that distribution

[00:01:52] <Boomtime> i.e select an index at random

[00:02:29] <Boomtime> how do you select a document, what is the equation?

[00:03:04] <GothAlice> (typing)

[00:03:38] <Boomtime> for example: a die has the 6 faces marked in a perfectly flat distribution (1,2,3,4,5,6, no randomness), the face is then selected at random from the set

[00:03:39] <GothAlice> Generate a pseudo-random number with a recorded starting seed, query the database for records with their recorded random number equal to or greater than that one. If multiple records returned, pick one using another step in the PRNG modulo the number of choices. You can then replay the same (random) choice using the original dataset and the starting seed, which would be actually randomly chosen.

[00:03:45] <GothAlice> Bam.

[00:04:18] <Boomtime> "recorded random number equal to or greater"

[00:04:26] <Boomtime> where did this come from?

[00:04:42] <Boomtime> there is a number stored in the documents generated at some earlier time?

[00:04:53] <GothAlice> Yes. Generated when inserted.

[00:05:01] <Boomtime> then you are not selecting randomly

[00:05:05] <Boomtime> you have bias

[00:05:25] <GothAlice> Yes. Positive bias towards the next approximate match.

[00:05:37] <Boomtime> no, bias in the gaps between

[00:06:03] <GothAlice> Yes; which results in positive bias in returned results, which "slide" up the gap to the next approximate match.

[00:06:06] <Boomtime> you bias those documents which have the earlier luck of having a largely chunk of the number space

[00:06:33] <Boomtime> you should try your method with only a few documents and you'll see how disatrous it is

[00:06:41] <GothAlice> I have many records.

[00:06:57] <Boomtime> it doesn't matter

[00:07:01] <Boomtime> the bias remains

[00:07:06] <GothAlice> Such bias is also acceptable.

[00:07:24] <GothAlice> The P is for Pseudo.

[00:07:28] <Boomtime> yikes, then why bother thinking you have any randomness at all?

[00:08:01] <Boomtime> it isn't pseudo-random either, you're distribution is not flat

[00:09:19] <GothAlice> It is if it describes a balanced binary tree; the random choice, which might not exist, is between two evenly balanced sides.

[00:10:02] <Boomtime> huh? there is no tree in your implementation

[00:10:20] <Boomtime> you have a single number space 0-1, all documents have a "random" number that puts them on that line

[00:10:48] <Boomtime> you then select a "random" point on that line and slide up to hit the first document

[00:11:18] <Boomtime> you have bias, documents on the number line with slightly larger spaces than others get selected more frequently

[00:11:19] <GothAlice> Pseudo-random. ¬_¬ The first insert gets 0.5 — it divides the available space. Now there are two spaces. The next insert gets to divide either the left- or right-hand spaces—this is chosen randomly. Etc.

[00:11:44] <GothAlice> Ad infinitum. Thus accuracy issues.

[00:11:52] <GothAlice> And the original issue. ^_^

[00:12:28] <Boomtime> "The first insert gets 0.5 " <- not random anymore?

[00:13:02] <GothAlice> The first one isn't. The next one is. The one after that isn't (it takes the opposite space.) The one after that is (between four available equal spaces).

[00:13:08] <GothAlice> &c.

[00:13:55] <GothAlice> But the choice is between [0.25, 0.75] on the second insert; not the whole 0-1 range.

[00:16:12] <GothAlice> This evenly distributes the range… thus the "slide up gap" between each will be the same, excluding any orphan leaves in the tree, which will have a bias. However, for this application, such (manageable and minor) bias is acceptable.

[00:19:31] <Boomtime> curious system, so it still has bias but you are constantly in a process of reducing it

[00:20:00] <GothAlice> Well, after inserting three records you'll have 4 equal spaces. No bias.

[00:20:09] <Boomtime> it seems to require significant maintenance.. how do you know what ranges to split on insert?

[00:20:26] <Boomtime> and how do you deal with deletes?

[00:20:31] <Boomtime> do you keep a list somewhere?

[00:21:16] <Boomtime> btw, there is a relatively simple method of selecting documents at random, requiring similar overhead as yours, and has more precision than you'd ever need

[00:21:50] <GothAlice> I have a tiny service generating the next division. It's a stateful generator (capable of resuming) — it's done algorithmically, so I don't really need to track the members of the sets, only the ever reducing range of the "current" set, and the "current" depth in the tree.

[00:22:13] <Boomtime> how do you deal with deletes?

[00:22:15] <GothAlice> And the collection is insert-only.

[00:22:24] <Boomtime> ah.. ok, not a general purpose solution then

[00:22:41] <GothAlice> … almost nothing I do for my own research is general-purpose. ^_^

[00:23:08] <Boomtime> also, the bias condition is by far more common than the relatively rare in-balance/unbiased arrangements

[00:23:46] <Boomtime> but it is at least limited to 2x selectivity, so I suppose you at least knows the bounds of the bias

[00:24:05] <GothAlice> Indeed, but the level of imbalance graphed against number of records will resemble a sawtooth; simply maintain records at the 0 points.

[00:25:20] <GothAlice> (And yes; bias is purely calculable for any given number of records in the tree.)

[00:25:42] <Boomtime> since you have insert only, i expect you have a much simpler method of selecting a doc at random: just have a single 64bit int in each doc, a simple counter

[00:26:16] <Boomtime> first doc =1, second = 2... now pick a random number between 1 to count() inclusive

[00:26:49] <Boomtime> flat distribution at all times, and the random number is completely under your control at all times for replayability

[00:28:54] <GothAlice> The data is already a tree… I was trying to kill two birds with one stone. (The exact range of any given space in the tree or point of division, indexed linearly, is calculable as well…)

[00:29:44] <Boomtime> if your goal isn't to select a random document, then you have some other purpose in mind

[00:29:55] <GothAlice> I'm trying to both store the tree in a very queryable way, but also be able to find random points of entry into those trees.

[00:30:03] <GothAlice> *sub-trees

[00:30:37] <Boomtime> these are different goals, a binary tree is a search system

[00:30:56] <GothAlice> … you've not heard of binary space partition trees?

[00:31:14] <Boomtime> obviously

[00:31:55] <Boomtime> I'm not the one trying to use one for random selection

[00:31:56] <GothAlice> Popular use has them as the storage mechanism for level data in 3D games, as it solves the "what can I see from point x/y/z" problem.

[00:32:19] <Boomtime> yeah, I might have more experience with that than you realize

[00:35:47] <Streemo> does anyone know of a guide to using mongo as a graph db? is this feasible?

[00:36:47] <cheeser> it's not advisable, no.

[00:36:56] <GothAlice> Streemo: Generally not, though technically possible.

[00:37:08] <Streemo> why no?

[00:37:10] <cheeser> if you want a graph db, use neo4j

[00:37:20] <GothAlice> Streemo: At work we learned the hard way to use a graph database to store graphs… don't try to roll something under a different model.

[00:37:23] <cheeser> because mongo doesn't model your data as a graph

[00:38:15] <Streemo> hmmm

[00:38:21] <Streemo> alright i guess that makes sense.

[00:38:38] <GothAlice> Streemo: We store our node data in MongoDB, but the graph in neo4j.

[00:39:04] <Streemo> so the look ups are done in neo4j

[00:39:11] <Streemo> and you grab data from mongo?

[00:40:34] <GothAlice> For the most part, yes. There's NLP on the Java side of things, too.

[00:53:32] <bramgg> I can not for the life of me add anything but Strings to MongoDB documents in Java. Can anyone help? http://pastebin.com/80nNVLMm

[00:54:39] <bramgg> For example "put"ing/"append"ing ("foo", "bar") works fine.

[01:13:41] <bramgg> If I go AFK but someone has an answer, I've posted the question on StackOverflow: https://stackoverflow.com/questions/26899694/create-mongodb-document-with-non-string-values-in-java

[01:27:24] <leotr> hello. I have big collection (about 70Gb). I get poor performance when i scan it. I scan it in sorted order and i created index for that. I even try to hint it but that doesn't help. Looks like it's stuck in some place and that's all...

[01:40:16] <GothAlice> leotr: What's the result of an .explain() on that query?

[02:31:30] <jchia> Hi, why is it recommended to run mongod with numactl --interleave=all on a NUMA machine? How does it help performance? I think it's going to allocate memory on multiple nodes without controlling the cores that the kernel schedules mongod threads to. Couldn't the kernel end up scheduling some mongod thread on a core that has slow access to some memory? Doesn't it make more sense to pin both the memory and the core to the same one for better affinity?

[04:51:03] <Lonesoldier728> hey keep getting an error error MongoError: Unsupported projection option: $meta ... anyone know why cant find too many complaints on google

[04:51:55] <Lonesoldier728> http://pastebin.com/Y8G7cGD9

[04:57:28] <Boomtime> Lonesoldier728: in a mongodb shell, please type db.version()

[04:57:47] <Lonesoldier728> I just found something that mentions I have to enable text search

[04:58:00] <Lonesoldier728> using heroku so trying to figure out how to enable it on mongolab

[04:58:48] <Boomtime> if you have a full text index, and you are using 2.6+ then text search is enabled by default

[04:59:02] <Boomtime> if you are using 2.4.x then it doesn't matter because $meta is not supported

[04:59:58] <Lonesoldier728> i am using mongoose

[05:03:05] <Boomtime> Lonesoldier728: in a mongodb shell, please type db.version()

[05:05:34] <Lonesoldier728> yeah have to connect mongo to my shell it is showing up as not found

[05:09:17] <Lonesoldier728> hm yeah 2.2 on my machine but 2.4 on heroku

[05:12:03] <Boomtime> neither 2.2 or 2.4 support $meta

[05:12:04] <Lonesoldier728> Boomtime just got version 2.6 on my terminal but I am assuming that does not affect what I have on my servers

[05:12:13] <Lonesoldier728> http://blog.mongolab.com/2013/02/experiment-with-mongodb-2-4-on-mongolab/

[05:12:28] <Boomtime> i'm not sure what you mean

[05:12:31] <Lonesoldier728> that is the add-on for heroku I am using for mongodb which apparently has textsearch enabled

[05:12:37] <Boomtime> you mean you are using a 2.6 client shell?

[05:13:15] <Boomtime> client version and server version are independant, though old clients may or may not work with new servers

[05:13:20] <Lonesoldier728> yeah

[05:20:38] <Lonesoldier728> On November 19, 2014, all free Sandboxes databases running MongoDB 2.4 will be upgraded to MongoDB 2.6

[05:20:44] <Lonesoldier728> do I have to wait for that

[05:21:25] <Lonesoldier728> damn yeah since I am using sandbox

[05:25:29] <irc_smirk> hello

[05:25:57] <irc_smirk> im looking for the php driver for osx - a precompiled binary ( mongo.so)

[05:26:03] <irc_smirk> i am not able to compile i with pecl

[05:26:14] <irc_smirk> referencing this page https://thomashunter.name/blog/getting-the-php-mongodb-driver-installed-with-mamp-on-os-x/

[05:26:21] <irc_smirk> unforunately the downloads page on github is no longer there

[05:26:34] <irc_smirk> anyone have mongo.so on osx/linux they cna upload somewhere

[08:08:28] <Demp> pymongo has no size method for cursors?

[08:09:17] <kali> Demp: it should be called count()

[08:10:02] <Demp> I thought count() ignores limit()

[08:10:26] <kali> ha yes, it does

[08:18:40] <ldksofdjk> Ive been learning, mongodb and am trying to get my head around how the expires ttl works. Ive read stackoverflow posts and have added code to make it work (from my understanding of how it works). Am pretty stumped as to why it does not work. Can any one point me in the direction of why my data is not deleting - http://pastebin.com/DGbbUeq0

[08:37:50] <nonuby> would a "text" index be right for field firstName and lastName where a user might search either/or and only partial so query would be {$or [ { "firstname": /nonub/ }, { "lastName": /nonub/ }]} at the moment it takes several seconds on 100k docs

[08:52:48] <ldksofdjk> Ive been learning, mongodb and am trying to get my head around how the expires ttl works. Ive read stackoverflow posts and have added code to make it work (from my understanding of how it works). Am pretty stumped as to why it does not work. Can any one point me in the direction of why my data is not deleting - http://pastebin.com/DGbbUeq0'

[09:47:03] <jchia> Hi, why is it recommended to run mongod with numactl --interleave=all on a NUMA machine? How does it help performance? I think it's going to allocate memory on multiple nodes without controlling the cores that the kernel schedules mongod threads to. Couldn't the kernel end up scheduling some mongod thread on a core that has slow access to some memory? Doesn't it make more sense to pin both the memory and the core to the same one for better affinity?

[09:55:48] <braz> @jchia MongoDB places an atypical workload on a system, where it as a single large multithread process consumes all the systems memory. If NUMA isn't disabled then strange performance problems can happen. This is the case when you are running a single mongod on the machine. NUMA rewards threads who access their own memory more frequently, while punishing threads who access each others memory. MongoDB, and any other multithreaded database, doesn't work well with

[09:57:13] <jchia> braz: so, why not just pin the CPU and memory to one node so that the CPUs that mongod gets scheduled on always access the most local memory?

[09:57:26] <jchia> braz: This is not the same as --interleave

[10:05:51] <braz> jchia: for multiple mongods on a host pinning with the likes of cgroups is a strategy

[10:06:19] <jchia> braz: what about single mongod?

[10:06:22] <braz> jchia: however if you are running only a single mongod you should use the all option

[10:07:45] <jchia> braz: which all option? do you mean --replIndexPrefetch all?

[10:08:29] <braz> jchia: I'll follow up later gotta go for a while

[10:57:31] <kas84> hi!

[10:58:25] <kas84> does anybody know why object size in a capped collection is smaller than a regular one?

[10:59:43] <kali> what ? you're capping the collection size, not the object size

[11:10:59] <TCMSLP> Hi all, we've disabled anonymous login (auth=true) which has prevented anonymous users from querying anything useful.

[11:11:33] <TCMSLP> However, is there a way to actively deny connections to mongodb without auth, besides firewall rules?

[13:47:39] <slvrbckt> hi all, im having a very difficult time getting a shard key to take with my new sharded cluster. seems anything i choose I get a (rather obtuse) error message of " Uniqueness can't be maintained unless shard key is a prefix "

[13:48:08] <slvrbckt> i've tried several combinations. i have a few different "unique index" fields, and always one of them comes up in the error message

[13:48:33] <slvrbckt> i really dont understand that error message

[13:48:48] <slvrbckt> and ive done a bunch of searching, but nothing really clarifies what im doing wrong

[13:49:01] <slvrbckt> im using an existing collection of already over 3+ million records

[13:51:39] <slvrbckt> does anyone know what on earth "unless shard key is a prefix" means? a prefix? of what?

[13:53:12] <kas84> guys, anybody tried 2.8 rc yet?

[13:53:40] <kas84> I want to try wired tiger storage engine, how can I do that?

[14:17:17] <omid8bimo> hello, i need help, i have 2 server in a datacenter in europe. my dataset is around 570 GB; i just installed a node in asia as a secondary-only node. it starts replicating when configured in replicaSet but when its near the end, like 500 GB already synced, im starting to get errors like these:

[14:17:23] <omid8bimo> [rsBackgroundSync] replSet not trying to sync from kookoja-db1:27017, it is vetoed for 477 more second

[14:17:30] <omid8bimo> conn13] assertion 13436 not master or secondary; cannot currently read from this replSet member ns:config.settings query:{}

[14:18:15] <omid8bimo> and primary says this node become stale

[14:18:47] <omid8bimo> can somebody help me? i tried this approach 2 times; a full clean resync, but boths times failed near the end

[15:26:19] <wblu> Does anyone know if a package for 2.8.0-rc0 will be published to http://downloads-distro.mongodb.org/repo/ubuntu-upstart/?

[15:49:07] <omid8bimo> can somebody tell me how a full resync works? i have a primary and a slave, and adding a new secondary in a different datacenter in another country. but the full resync stops near the end and says its become stale!

[16:09:03] <omid8bimo> deoes oplog size matter when doing initial sync?

[16:19:14] <kali> omid8bimo: it matters very much. it has to cover enough time for the main sync to take place

[16:36:37] <speaker1234> GothAlice, are you around?

[16:42:38] <GothAlice> speaker1234: I can be; I'll be walking into another meeting once one of my bosses gets off the phone, though.

[16:43:33] <speaker1234> GothAlice, very quick question. Have potential customer with an old Informix database. They need to upgrade and one of the requirements is simple, non-technologist friendly report generation. Do we have the tools in this area?

[16:44:06] <speaker1234> GothAlice, by tools in this area I mean tools that work with Mongo DB

[16:44:32] <speaker1234> one possibility might be a Libre office bridge

[16:45:35] <GothAlice> speaker1234: Alas, right now I have those tools, but they're fairly deep down the chain for extraction and open-sourcing.

[16:46:32] <speaker1234> ok. thanks. If anything comes of it, you're the first I will call

[16:46:48] <speaker1234> as you know, consulting, it's lots of nibbles, very few bites

[16:46:56] <speaker1234> everybody wants everything for free

[16:47:15] <GothAlice> speaker1234: http://cl.ly/image/0n300z2e3p08 — sample report; can't screenshot real data, though. ;)

[16:47:59] <speaker1234> talk about this later when it becomes real

[17:16:08] <malev> hello! I'm having a super weird issue. I have a webapp with a mongodb and sometimes gives me a 500 because: [conn21] auth: couldn't find user

[17:16:12] <malev> that's from mongo's logs

[17:16:25] <malev> but some others, just work: authenticate db: mapa76 { authenticate: 1, user: "...

[17:16:47] <malev> Usually works one time and fails the second. any Idea on what can be happening?

[17:48:08] <ahawkins> Hello everyone. Do I need to do a fs lock before mongodump?

[17:52:52] <cheeser> no

[17:53:58] <braz> @ahawkins only where your journal's on another volume or you are running without journaling (v bad idea) http://docs.mongodb.org/manual/tutorial/backup-with-filesystem-snapshots/#create-backups-on-instances-that-do-not-have-journaling-enabled

[18:03:14] <ahawkins> braz: roger. Should I restart other processes connected to the server?

[18:04:12] <ahawkins> noticed a process throwing records not found after doing a mongo restore

[18:08:37] <braz> @ahawkins not sure I understand the context fully, the best place is to add a fuller description in https://groups.google.com/forum/#!forum/mongodb-user

[18:08:52] <GothAlice> ahawkins: Your application may have cached some old (and no longer valid) ObjectIds.

[18:09:09] <GothAlice> ahawkins: Generally a good idea to kick app servers in the pants after restoring data.

[18:11:07] <GothAlice> ahawkins: Also, no, mongodump communicating with a mongod process (rather than directly accessing the datafilee on-disk) should never require a filesystem lock; locking is needed in the aforementioned scenarios only when attempting filesystem-level (instead of database-level) backups.

[18:11:54] <ahawkins> GothAlice: wondering if I need to force a flush of some sort before doing mongodump

[18:13:16] <GothAlice> ahawkins: You should not. Depending on use of http://docs.mongodb.org/manual/reference/program/mongodump/#cmdoption--oplog either records added/modified/deleted during the backup are ignored or the additions/modifications/deletions happening during the time it takes to do the dump will also be recorded.

[18:14:49] <GothAlice> (And by ignored, I mean you'll actually have some undefined behaviour. Modifications may be partially applied, ignored, or successful, deletions may be noticed or not, etc.)

[18:35:47] <ahawkins> GothAlice: thanks

[18:36:07] <omid8bimo> kali: oh ok. so its like when the initial sync begins, it marks the state of the dataset, starts from begining up until that mark, and reads the rest from oplog? is this assumption correct?

[18:36:38] <GothAlice> omid8bimo: Roughly, yes.

[18:36:56] <GothAlice> Then on restore it restores the original records as received, then applies the oplog.

[18:39:07] <GothAlice> Ah, different things, but roughly the same process. Initial replica syncs are handled in pretty much the same way as mongodump --oplog, except the synchronization continues forever afterwords, streaming the oplog from the primary.

[18:42:06] <omid8bimo> GothAlice: ok, so what could be my problem here? i did the initial sync twice and both times, i was near the end of replication when i got the "member become stale" error message

[18:42:28] <omid8bimo> my dataset is near 600GB and transfering over 7MB/s bandwidth

[18:43:57] <kali> omid8bimo: check out you oplog depth (rs.printReplicationInfo()) should give it to you

[18:44:15] <kali> omid8bimo: it has to be longer than your initial sync duration

[18:44:33] <omid8bimo> configured oplog size: 51200MB

[18:44:35] <omid8bimo> log length start to end: 99659secs (27.68hrs)

[18:44:47] <GothAlice> omid8bimo: Holy mackerel.

[18:45:13] <kali> and how long does the replication take ?

[18:45:17] <omid8bimo> i thinks with that bandwidth, initial sync is far longer than oplog capacity! am i correct?

[18:46:20] <GothAlice> omid8bimo: In terms of raw bandwidth, that'd take 23.8 (base 1000) or 24.38 (base 1024) hours. It's cutting it way too close. (Since there is extra time overhead beyond just bandwidth transfer.)

[18:46:31] <kali> 600000/7 is about 1 day

[18:46:38] <kali> so it's close

[18:46:53] <GothAlice> 23.8 or 24.38 depending on the unit meant by "GB". ;)

[18:46:58] <kali> yeah :)

[18:47:04] <omid8bimo> damn! so what's the solution here guys? :)

[18:47:19] <GothAlice> omid8bimo: Further increase your already rather astounding oplog size. :|

[18:47:27] <kali> grow the oplog. it's a bit tricky, but there is a procedure somewhere

[18:47:47] <kali> omid8bimo: http://docs.mongodb.org/manual/tutorial/change-oplog-size/

[18:48:33] <kali> omid8bimo: at this side of DB and oplog, it start to make sense to consider sharding (but that is probably not something you want to do today)

[18:48:40] <kali> s/side/size/

[18:49:09] <omid8bimo> kali: yeah i've seen that article; and im rather scared to do it.

[18:49:22] <GothAlice> omid8bimo: http://stackoverflow.com/a/13095913 is a SO answer to a slightly different question, but covers some important details such as maintaining equal oplog size amongst replica set members. http://www.briancarpio.com/2012/04/21/mongodb-the-oplog-explain/ for more general information.

[18:50:09] <omid8bimo> GothAlice: thanks dude

[18:50:50] <kali> omid8bimo: if you have a 3-node replica set working on your current datacenter, you're not taking that big a risk

[18:51:22] <GothAlice> omid8bimo: I've got 25 TiB of data in a three-shard with three replicas for each shard. Heavily write-only dataset with narrow scope queries (i.e. I almost never table scan… uuuuugh it's bad when it does) and maintaining a 24-h oplog requires about 96 GiB of space dedicated to it.

[18:51:39] <omid8bimo> kali: its 2 node actually. one primary, one secondary, and i had a arbiter which i removed to replace it with a new secondary

[18:52:39] <kali> omid8bimo: if it is cloud based, you can create two new nodes in the current DC with the right oplog size, make them join and then remove the old ones

[18:53:31] <kali> GothAlice: 8TB/shard ? on the big side for my taste, but whatever works :)

[18:53:53] <omid8bimo> kali: no its not cloud based. first question that im faced regarding the oplog size increase procedures, is that should take down the whole databases to do that?

[18:54:34] <kali> omid8bimo: not for long

[18:54:40] <kali> omid8bimo: if you do it right :)

[18:54:46] <kali> do you still have the arbiter ?

[18:55:05] <omid8bimo> kali: no, removed

[18:55:27] <kali> and the new node is already there i guess ?

[18:55:36] <GothAlice> kali: Yeah, with data stored over iSCSI across three Drobo arrays, one array per replica, shared between the shards, which is a weird setup.

[18:55:53] <omid8bimo> kali: yeah it is, but stuck at STARTUP2 phase.

[18:55:56] <kali> GothAlice: irk, hardware :)

[18:56:14] <GothAlice> (Lose array A, and the primary of set A goes away and a secondary from each of set B and C. ;)

[18:57:16] <kali> omid8bimo: the bottom line is: you need to have a majority of working nodes in your cluster at all time

[18:57:25] <GothAlice> omid8bimo: You'll need to scale up the oplog before replication can even succeed. :(

[18:57:45] <kali> omid8bimo: so i suggest you reconfigure to make the current remote node an arbiter, to restore some kind of sanity to your RS

[18:57:48] <GothAlice> omid8bimo: Until that's done, I'd add the arbiter back. (I only remove an arbiter after the node to replace it is live.)

[18:58:34] <kali> omid8bimo: then, you'll stop the secondary, and perform the oplog growing procedure on it. during that time, you'll have no redundancy and no secondary. everything will go to your primary

[18:59:26] <kali> omid8bimo: one the secondary is regrown back in the RS, and has catch up, you'll have to stepDown() the current primary. the secondary will become primary and your app should recover quickly

[18:59:56] <kali> omid8bimo: you'll stop your old primary and perform the oplog growing procedure on it, and restart it

[19:00:15] <kali> omid8bimo: then you'll reconfigure to make you arbiter a full data node, and it should resync :)

[19:00:27] <GothAlice> omid8bimo: The above should work nicely considering the active secondary will only have to catch up for the time needed to grow the oplog, which should be far less than 24h, then the old primary will catch up from the old secondary (which would become primary). :)

[19:00:52] <kali> if you don't make mistake, you'll have a few errors when you'll stop the secondary, and a nice bunch of errors when stopping the primary

[19:01:14] <kali> yeah, growing the oplog itself should be a matter of seconds

[19:01:21] <kali> assuming you have a modern file system

[19:03:12] <kali> sounds frightening, but... have you seen philae landing ? :)

[19:03:36] <omid8bimo> file system is xfs

[19:03:42] <omid8bimo> kali: yeah i saw it :D

[19:04:59] <kali> omid8bimo: yeah, xfs is fine. growing the oplog will take seconds

[19:05:57] <kali> omid8bimo: also, plan ahead. you don't want to resize it again in two monthes

[19:06:25] <GothAlice> Resizing is somewhat painful, and a little napkin calculation can save you a lot of time later.

[19:07:14] <omid8bimo> all right. so each node must have its own resize stuff done?

[19:07:25] <GothAlice> (I.e. we benchmarked our distributed RPC capped collection to be 8GB to handle one million simultaneous users' activity based on average document sizes, estimates of the number of events for any particular game session and whatnot.)

[19:07:45] <GothAlice> omid8bimo: Yeah; you want the oplogs to be equally sized across the cluster. (Or, as the SO answer mentioned, expect the one with shortest oplog to win.)

[19:07:56] <kali> yep, the two nodes you have already need a resize, and the remote third too, but it should be easier :)

[19:08:31] <rendar> GothAlice: lack of inode limitations? what you mean? that reiserFS implements all inode functions, while other FS don't?

[19:08:31] <kali> if you can afford it, having 4 ou 5 days of oplog can be a life saver

[19:08:43] <GothAlice> rendar: df -i

[19:08:58] <rendar> GothAlice: i'm not on unix right now, what you mean?

[19:09:20] <GothAlice> rendar: ext* family filesystems have a limit on the number of inodes based on a division of the block device size by block size chosen when initially formatting. (Usually 4KB.)

[19:09:28] <GothAlice> rendar: ReiserFS ignores that silliness.

[19:09:39] <rendar> oh! that's right

[19:09:45] <rendar> yeah

[19:10:07] <omid8bimo> GothAlice: kali: ok, so steps are that i dont touch the primary; add an arbiter to keep the vote/replicaset balance; bring down secondary, increase oplog, bring it up, step down primary so secondary with new oplog becomes primary, then increase oplog on this node, then add the new secondary with initial sync

[19:10:28] <rendar> GothAlice: ignoring that means that you can easily scale up to very huge files, without caring about block size and those stuff, i guess

[19:10:51] <GothAlice> rendar: And more importantly, never have "disk full" errors if some silly user of yours creates billions of one-byte files.

[19:10:59] <kali> omid8bimo: that's the spirit

[19:11:05] <rendar> GothAlice: yeah

[19:11:32] <GothAlice> (Getting a disk full error and having df -h show free space is one of the most confusing sysop things I have ever encountered… long ago. ;)

[19:12:24] <rendar> GothAlice: ext* fs have a minimum 4KB block for each files, so a bunch of little files would take a bunch of 4KB blocks, instead in reiserfs a 6bytes file would take only 6bytes (plus some other structure) but you got the point

[19:13:28] <GothAlice> rendar: This is particularly critical since several of my applications make rather extensive use of hard links. Also that's only true if you enable tail packing; there is still a "block size" in Reiser, it's just hidden away for the most part. (Tail packing allows the last block of a file, with free space at the end, to store another sub-block file at the end efficiently. Makes disk recovery darn hard, though.)

[19:14:01] <rendar> yeah

[19:14:12] <rendar> reiser has a block size, but i bet its not 4Kb

[19:14:43] <GothAlice> (It's set dynamically based on several factors when formatting.)

[19:14:48] <GothAlice> My favourite call from a cleanroom recovery company ever: "Uhm… I don't know how you did it, but this drive you sent in has six petabytes of data on it, in 14 trillion files. Which ones do you want?" XD

[19:15:11] <rendar> lol

[19:15:15] <omid8bimo> one question though, can i do these with the new secondary in STARTUP2 state? so i dont need to add back the arbiter?

[19:15:35] <rendar> 6 petabytes? what was that?

[19:15:42] <kali> omid8bimo: make it an arbiter

[19:15:47] <GothAlice> omid8bimo: No. I'd nuke the new secondary until such time as the oplogs have been scaled up, adding the arbiter back.

[19:16:10] <GothAlice> rendar: Hourly backups for three years, with unmodified files hard-linked to their prior versions. :3

[19:16:31] <GothAlice> rendarn: (Their recovery process couldn't identify hard links, so it treated each link as a distinct file.)

[19:16:43] <rendar> holy shit! but what kind of drives was it?! a cluster or what?

[19:16:50] <rendar> drive*

[19:16:55] <GothAlice> One 2 TiB drive.

[19:16:59] <omid8bimo> all right.

[19:17:30] <GothAlice> (Containing 1.3 TiB of de-duplicated data.)

[19:17:32] <rendar> GothAlice: lol

[19:17:57] <omid8bimo> so i should do all the steps in here? -> http://docs.mongodb.org/manual/tutorial/change-oplog-size/ - such as starting secondary in standalone, saving the oplog etc?

[19:22:38] <kali> omid8bimo: yes, but only after making sure you can stop the secondary (which means your RS must have the primary and an arbiter)

[19:24:25] <omid8bimo> kali: last time i tried to stop secondary i got these errors -> http://paste.debian.net/131644/

[19:24:31] <omid8bimo> does they matter?

[19:24:36] <omid8bimo> *do they..

[19:25:29] <kali> where do they come from ?

[19:26:00] <kali> as long as the primary is still happy, you're fine, anyway

[19:26:59] <GothAlice> (If you see messages about "read only" and "majority of nodes" on the primary, you may have a problem. If not, you'd be good to go to scale up that first secondary.)

[19:27:17] <omid8bimo> so the whole point is that primary stays put?

[19:28:50] <kali> the whole point is that the primary stays primary. for that, it needs to stay in contact with a strict majority of running nodes and arbiters

[19:29:46] <kali> so with an arbiter, when you take the secondary down, as long as the primary "sees" the arbiter, it will stay up. if it disappear, it will step down and become a read only secondary

[19:36:11] <GothAlice> (The arbiter confirms for the primary that it was actually the secondary that lost its connection / went away, and not the primary itself disappearing into a black hole. ;)

[19:37:31] <GothAlice> (To which the primary effectively says, "so what?" and keeps going. Without the arbiter the primary would think it lost its connection and assumes a secondary took over primary duties.)

[19:37:56] <omid8bimo> gotcha

[19:54:10] <omid8bimo> kali: ok, secondary's oplog increased and it caught up to primary; should i wait for it to populate its oplog?

[19:54:53] <omid8bimo> db.getReplicationInfo() -> usedMB is increasing, should i wait for it to complete something?

[19:55:33] <kali> no reason to

[19:55:35] <GothAlice> omid8bimo: You shouldn't need to, no.

[19:55:43] <kali> you can go on

[19:56:19] <omid8bimo> all right

[19:56:23] <GothAlice> omid8bimo: Both boxen will now contain the same actual data, and when you downgrade the primary to expand its oplog, it'll be able to catch up from what has been populated of the secondary's oplog so far.

[20:20:33] <omid8bimo> GothAlice: kali: all right, things are working with new oplog, i will start the new secondary for a full resync now. thanks guys

[20:20:53] <kali> see ? easy as pie

[20:21:01] <GothAlice> :)

[20:22:03] <GothAlice> Also somewhat impressive that even with just two real data nodes (to start) you can do things like that with only extremely minor interruptions in service. (I.e. the time needed to elect the secondary as primary in the second step.)

[20:22:55] <kali> yeah. anyone who have had to restore an oracle backup before must be in awe

[20:23:05] <GothAlice> OFC, you've got degraded network performance for the next 24h while that new secondary populates…

[20:23:17] <GothAlice> kali: Ha!

[20:24:19] <arussel> I'm using reactive mongo to connect. I've run a $set command, I was hoping to do {$set: {"a.b": "x"}} with result: {"a": {"b":"x"}} but I end up with {"a.b":"x"}

[20:24:31] <arussel> any idea what the command I've run to get that result ?

[20:24:53] <arussel> I can't even do it with the mongo console

[20:25:09] <kali> forgot the $set ?

[20:25:29] <GothAlice> 'Cause that does look reasonably correct.

[20:26:24] <arussel> kali: no the set is there

[20:26:49] <GothAlice> A ha. You'll want to use $set to set deep fields.

[20:27:07] <GothAlice> arussel: http://cl.ly/image/293b0t420t1V — with $set, it works correctly for me.

[20:28:32] <davis> hello

[20:28:40] <Jonn> I cant run mongo or mongod

[20:28:50] <Jonn> All I did was brew install mongo

[20:28:59] <GothAlice> arussel: > db.deepsert.update({}, {"c.d": "x"}) -> "can't have . in field names [c.d] at src/mongo/shell/collection.js:155" — not even sure how you're managing to get {"a.b": "x"} at all.

[20:29:09] <Jonn> brew install mongodb that is

[20:29:29] <GothAlice> John: Have you added /usr/local/bin to your PATH environment variable? Try this: export PATH="$PATH:/usr/local/bin"; mongo --version

[20:29:30] <kali> GothAlice: this message comes from the shell. not all drivers are that safe

[20:29:40] <joannac> GothAlice: some driver does it

[20:29:59] <arussel> GothAlice: sorry, I can set deep fields and it works, but somehow I've run a command that, instead of setting deep field manage to set a "a.b" field. I'm trying to find out which is the command that did that.

[20:29:59] <davis> i am trying to migrate a database from one server to another, I found some .bson files on the older server which appear to be backups. however when I run mongoimport on the new server with thesee files it appears that it needs to take either .csv or .json files.

[20:30:17] <joannac> davis: bson -> mongorestore

[20:30:21] <GothAlice> arussel: Aaaah. Alas, good luck with that. :|

[20:30:26] <Jonn> Here is my error

[20:30:27] <Jonn> 2014-11-13T16:29:06.111-0400 warning: Failed to connect to 127.0.0.1:27017, reason: errno:61 Connection refused 2014-11-13T16:29:06.112-0400 Error: couldn't connect to server 127.0.0.1:27017 (127.0.0.1), connection attempt failed at src/mongo/shell/mongo.js:146

[20:30:37] <Jonn> when running mongo

[20:31:04] <kali> Jonn: mongod not running

[20:31:06] <GothAlice> Jonn: Hmm; ps aux | grep mongod — anything there? (Or showing up in Activity Monitor?)

[20:31:22] <davis> joannac: manythanks

[20:31:25] <GothAlice> John: brew info mongodb — also run this, it has some instructions at the end on how to register mongodb as an auto-start service.

[20:31:28] <Jonn> I run mongod

[20:31:30] <Jonn> and get this

[20:31:37] <Jonn> ERROR: dbpath (/data/db) does not exist. Create this directory or give existing directory in --dbpath. See http://dochub.mongodb.org/core/startingandstoppingmongo

[20:31:51] <Jonn> But I created the dir

[20:31:57] <GothAlice> John: mongod, when run manually, has no configuration at all.

[20:31:58] <Jonn> in root and in my project dir

[20:32:25] <joannac> Jonn: ls -la /data/db

[20:32:27] <GothAlice> John: I seriously recommend running "brew info mongoldb" and following those instructions after updating /usr/local/etc/mongod.conf

[20:33:10] <joannac> nope

[20:33:24] <cheeser> Jonn not John

[20:33:24] <joannac> Jonn is his name; no 'h'

[20:33:37] <joannac> g'morning cheeser

[20:33:37] <GothAlice> XD That explains a few things!

[20:34:10] <GothAlice> Jonn: If you follow home-brew's instructions here, your data will go into /usr/local/var/mongodb, and permissions should already be correct.

[20:34:23] <Jonn> im here

[20:35:16] <Jonn> ls /usr/local/var/mongodb gives nothing in there

[20:35:17] <arussel> kali: any idea what command can go from {"a":"b"} to {"a":"c", "x.y":"z"} ?

[20:35:57] <GothAlice> Jonn: The key is that the directory will be created automatically with the correct permissions; your user should already own /usr/local. https://gist.github.com/amcgregor/23e0e36f80122223d565 is my /usr/local/etc/mongod.conf, FYI.

[20:36:19] <GothAlice> ("brew doctor" to confirm /usr/local is set up correctly)

[20:36:51] <Jonn> ill try that

[20:37:00] <Jonn> so the project needs a .conf file for mongo?

[20:37:12] <GothAlice> Jonn: All of that boils down to running "brew info mongodb" and following the instructions given.

[20:37:31] <GothAlice> (These instructions would have been printed out after the initial "brew install mongodb" command, too.)

[20:38:17] <GothAlice> And yes; those instructions require configuration via the mongod.conf file… unless you're particularly happy editing launchd .plist files.

[20:38:30] <Jonn> Im hella confused

[20:38:39] <GothAlice> Jonn: brew info mongodb

[20:38:44] <Jonn> did that

[20:38:50] <Jonn> no instructions

[20:39:09] <Jonn> dependencies, options, caveats

[20:39:19] <GothAlice> Ugh; they restructured it. Caveats are it, but they removed the really useful line.

[20:39:20] <GothAlice> >_<

[20:39:35] <Jonn> No wonder we are on different pages

[20:39:37] <Jonn> :)

[20:39:54] <GothAlice> Jonn: cp /usr/local/Cellar/mongodb/*/homebrew.mxcl.mongodb.plist ~/Library/LaunchAgents/

[20:40:05] <GothAlice> Jonn: launchctl load ~/Library/LaunchAgents/homebrew.mxcl.mongodb.plist

[20:40:15] <GothAlice> (Do that last one *after* updating mongod.conf!)

[20:40:38] <GothAlice> (At least they kept that last line in the info output…)

[20:40:39] <Jonn> did those

[20:40:45] <GothAlice> ps aux | grep mongod

[20:40:58] <Jonn> its there!

[20:41:04] <GothAlice> Jonn: mongo test

[20:41:22] <Jonn> connecting!!!

[20:41:28] <Jonn> Sweet Jebus!

[20:41:51] <Jonn> so when I close my laptop, how do i restart it

[20:41:52] <Jonn> ?

[20:42:06] <GothAlice> You don't. launchd will manage everything for you.

[20:42:08] <GothAlice> :)

[20:42:11] <Jonn> nice

[20:42:20] <Jonn> so every mongo db is stored in that dir?

[20:42:28] <Jonn> from every project I use?

[20:42:43] <cheeser> at least those create with that homebrew setup, yes.

[20:42:46] <Jonn> the same server runs them all on that 27017 port?

[20:42:49] <GothAlice> Yes. Directory-per-db will allow you to symlink those into your project folders if you want.

[20:42:55] <GothAlice> (Launchd can even be configured to only start mongodb when a process tries to connect to it, but that's Very Advanced Technique™.)

[20:43:20] <Jonn> got it

[20:43:22] <GothAlice> I leave my data in the data directory, though.

[20:43:32] <Jonn> Wow. Alice, thanks so much! You are awesome!

[20:43:37] <GothAlice> (And exclude "UNIX Files" from my Time Machine backups!)

[20:43:42] <Jonn> lol

[20:43:46] <Jonn> I can see that

[20:43:51] <Jonn> would be a big one

[20:43:54] <GothAlice> Really… you don't want Time Machine backing up your mongod stripes every hour…

[20:44:47] <cheeser> crashplan++

[20:45:02] <cheeser> though I only have /Users getting backed up

[20:45:21] <GothAlice> Jonn: However! My development environment isn't like my "development staging" environment… for that I use https://gist.github.com/amcgregor/c33da0d76350f7018875 (a BASH script of mine) to start up a complete sharded+replicated+authenticated mini-cluster on one machine. I do this because I need to be able to test and confirm correct behaviour of sharding indexes.

[20:46:05] <GothAlice> (And fault tolerance… after starting the script, I'll put my app under load and kill -9 mongod processes. :)

[20:47:24] <GothAlice> cheeser: My entire at-home dataset (25 TiB + random external drives) is backed up to Backblaze. It took three-four months for the initial backup!

[20:47:35] <cheeser> haha. i'll bet.

[20:47:44] <cheeser> those initial syncs are a bitch

[20:47:49] <GothAlice> And how.

[20:48:29] <cheeser> i have a local 5TB drive that I crashplan to as well as The Cloud

[20:48:38] <GothAlice> omid8bimo: (Be glad your sync process is ~24h… and not four months! ;)

[20:49:10] <omid8bimo> GothAlice: yeah! :D

[20:49:20] <Ontological> I'm having an issue where, if I try to execute commands via a script, I get an error. I can copy and paste the commands in, just fine. http://pastie.org/private/uvaylz54v4lbvez5t5jzg It yells and says "can't have . in field names [one.two.three]" Any suggestions?

[20:52:59] <GothAlice> Ontological: Works for me. https://gist.github.com/amcgregor/2cec36435d929cdca036

[20:53:55] <Ontological> If you copy what I wrote into an bash script and execute it, it will fail. That's my problem.

[20:54:15] <GothAlice> What you wrote doesn't do anything.

[20:54:29] <GothAlice> Er, waitaminute.

[20:54:37] <Ontological> http://pastie.org/private/yuuhorbgrzrg32c2sm6evg GothAlice

[20:54:54] <GothAlice> XD

[20:54:59] <Ontological> :P

[20:55:07] <GothAlice> My brain is clearly queued to sh-bang lines.

[20:55:15] <Ontological> So, that's why I'm super confused

[20:56:14] <Ontological> Let me know if it works for you as a script, though

[20:56:44] <GothAlice> The problem is actually quite obvious. Replace "mongo" in your script with "cat" and re-run to see.

[20:57:34] <GothAlice> (Obvious primarily in hindsight if you weren't aware that variables are expanded even in string blocks.)

[20:57:44] <Ontological> Yeah, but I'e also escaped the $ and it didnt work

[20:58:16] <GothAlice> Ontological: http://cl.ly/image/3d3f2q3c1n0o works for me. :)

[20:59:01] <Ontological> http://pastie.org/private/7ku9cayoyrfk6xegqldpa Is *not* working for me on 2.6.4.

[20:59:39] <Ontological> Well, I guess it's the SSH, then, if that works for you

[20:59:47] <GothAlice> … ssh?

[21:00:19] <Ontological> http://pastie.org/private/ej9kqs3srsla43cyletgw GothAlice

[21:00:38] <Ontological> Thanks for helping troubleshoot. At least I have an idea that it's most likely not mongo itself, but I'm still really confused

[21:00:54] <GothAlice> Hum… mother of god, you're nesting those blocks.

[21:01:32] <GothAlice> "ssh -t git@staging.server.com" — GIT? Also, "ssh -t git@staging.server.com mongo << EOF MONGO" — you can pass commands for SSH to execute…

[21:01:51] <Ontological> Well, this isn't the only command I'm running

[21:02:19] <Ontological> I was trying to improve my bash-ness, but you've discovered I've got some limitations :P

[21:03:00] <GothAlice> Ontological: http://cl.ly/image/2F383V3S2q3i — this approach does, however, work. A better approach in this instance would be to store the script on the other side, and just remotely execute it. ssh example.com ~/bin/process.sh

[21:03:22] <GothAlice> (Possibly with a scp to get it there initially.)

[21:03:27] <Ontological> But then I don't learn where I'm failing and the limitation of bash/ssh/mongo that I've discovered

[21:03:52] <GothAlice> (Nesting block strings is Danger Zone™… since you'll have to escape the escaping of the escaped $. ;)

[21:04:22] <GothAlice> \\\$ = madness

[21:04:32] <Ontological> HAHA

[21:04:34] <Ontological> That worked <#

[21:04:35] <Ontological> <3

[21:04:47] <GothAlice> … really, don't do that, though. Maintainability will suffer.

[21:05:23] <Ontological> I do like your most recent example

[21:05:59] <GothAlice> Ontological: You can even create a secure user that can only ever execute that one command, as an example.

[21:06:24] <GothAlice> (No interactive execution allowed, only one command allowed.)

[21:06:32] <Ontological> I'll just have to disconnect and then re-connect to the server, which is less efficient, but I see your point

[21:07:21] <GothAlice> Ontological: You can get creative with interactive SSH sessions wrapped in screen—you can send commands for screen to emit into the remote terminal. (This will also result in madness, but it does technically work.)

[21:09:39] <Ontological> Okay, I'm taking your advice about not double-nesting, but I *really* appreciate you helping me understand the fix, because it helps me gain a better understanding of what's going on and how I failed.

[21:10:29] <GothAlice> \\\$ \\\\$ \\\\$ \\\\\\$ \\\\$ \\\\\\$ — at a glance, which of those is valid? ;^)

[21:10:53] <Ontological> >_<

[21:10:58] <GothAlice> You see the problem. ;P

[21:11:11] <Ontological> Well, I know that the first one is valid in my current situation, but again, that depends on the situation

[21:21:57] <Mmike> If I have a mongodb server with bunch of data on it, may I just do rs.initiate() to start building replicaset of it?

[21:22:04] <Mmike> is my data going to survive that?

[21:22:58] <GothAlice> Mmike: http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/

[21:23:18] <GothAlice> You shouldn't lose data.

[21:24:54] <Mmike> GothAlice, ack, thnx.

[22:14:39] <GothAlice> Ontological: If you're trying to improve your BASH-fu, https://gist.github.com/amcgregor/c33da0d76350f7018875 might be worth a gander. It does some nifty things. (Note line 62…)

[22:27:30] <BlakeRG> Anyone know why i am getting this error with yum install? http://pastebin.com/raw.php?i=WBM2NP1B

[22:29:57] <Ontological> I'm not on RHEL, but, does 10gen have a repo for RHEL and have you tried that?

[22:31:39] <Ontological> BlakeRG: ^

[22:38:01] <BlakeRG> Ontological: it's already installed, i believe it's trying to upgrade and giving these errors

[22:38:03] <BlakeRG> which is odd

[22:40:09] <Ontological> Thanks again GothAlice

[22:42:23] <ndb> jjkjjk

[22:42:24] <ndb> l:q!

[22:43:08] <Ontological> >_M

[22:46:06] <bpap> is there a way to force MMS monitoring agent to poll db stats once per minute? right now it polls once every 3 minutes, and I don't see any option to set that.

Log file Viewer

Help | Karma | Search:

#mongodb logs for Thursday the 13th of November, 2014