pmxbot IRC Log Viewer

[00:00:00] <Bajix> It seems like at some point, this would require the equivalent of what I was accomplishing via my map function earlier

[00:01:46] <GothAlice> Bajix: Given start_period = timestamp(start_time) - timestamp(start_time) % period_seconds and n_periods = ceil((end_time - start_time) / period_seconds), all_periods = [start_period] + [(start_period + period * period_seconds) for period in range(n_periods)]

[00:01:59] <GothAlice> The trick is in the amplification. Aggregates only amplify on $unwind. :(

[00:02:01] <Bajix> Math.floor((B - A) / ( 60 * 60 * 1000))

[00:02:40] <uuanton> @GothAlice is there a way then to drop local database and then "copy/sync" new oplog to secondaries ?

[00:03:41] <Bajix> Not seeing how we'd do for periods in range either

[00:04:45] <GothAlice> uuanton: Why are you thinking to copy an oplog around?

[00:04:52] <GothAlice> uuanton: That's… never a good idea.

[00:06:05] <uuanton> I mean secondaries resync from scratch even tho they have same data as primary from snapshot

[00:06:45] <GothAlice> uuanton: Based on an _old snapshot, outside the oplog time range_. So that behaviour is completely correct.

[00:06:48] <uuanton> but because i drop local database with production oplog and replset settings there are no common points in oplogs

[00:07:46] <GothAlice> uuanton: Which is why I recommended the approach of spinning up a restore of the snapshot as a brand new replica set instead of attempting to preserve the old configuration.

[00:08:39] <GothAlice> "Basically, restore a single member, convince it it's not a replica set, re set up the replica set, optionally with a fresh snapshot/backup to restore to the new secondaries. That's the process, and the last link there outlines how to force it if it won't cooperate. ;)"

[00:09:38] <GothAlice> The key factor of not having the new secondaries sync the first time is creating a _new_ snapshot, with current oplog time, which hopefully won't have expired by the time it's transferred and restored on the new secondaries.

[00:10:28] <GothAlice> If a new snapshot isn't an option, then syncing the first time must be.

[00:10:41] <yoofoo> Hello

[00:14:59] <GothAlice> Bajix: I'll have to ponder this. This, BTW, is the huge reason I don't ever store multiperiod data which requires further processing; I can highly recommend investigating pre-aggregation for this statistic in the future. Switching to pre-aggregation may also work as a solution, here, but it's a bit more heavyweight than crafting a cunning query or aggregate.

[00:15:30] <Bajix> Well, there could be multiple period durations as well

[00:15:47] <Bajix> This was an after the fact request

[00:15:58] <Bajix> Client troll status

[00:16:34] <GothAlice> (Basically; any time the user performs an event which updates their lastActive, also atomically increment a user count and $addToSet the user's identifier to a pre-aggregate record for the "current period", with a query to not increment if they're already in the set.)

[00:16:51] <GothAlice> Pre-aggregation like that is how we do this activity tracking at work.

[00:16:56] <GothAlice> (And click tracking, and…)

[00:17:53] <GothAlice> http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework being the article I generally link to introduce this approach; it covers several approaches for storing event data including some backing numbers for storage space, indexes, and different query performance.

[00:17:55] <Bajix> So, basically $addToSet the rounded down hour with every update

[00:18:23] <GothAlice> https://gist.github.com/amcgregor/1ca13e5a74b2ac318017#file-eek-py-L29-L47 < this part of my aggregate example code

[00:18:39] <GothAlice> (Snapped to hours, in that example.)

[00:19:06] <Bajix> It doesn't look possible from what I can tell to even use push twice per document

[00:19:26] <GothAlice> Taking your existing data and processing it to populate that pre-aggregate collection can be done client-side and as inefficiently as desired. (Since it only has to happen once, when first implementing this approach.)

[00:19:56] <GothAlice> Nope. The amplification problem; $unwind is the only thing that creates variable amounts of additional data to process, everything else preserves or reduces.

[00:20:00] <Bajix> From what I can tell, $push & $addToSet would be the only options for building an array that can be unwound, neither of which can push multiples

[00:20:23] <Bajix> Yea

[00:20:29] <GothAlice> Correct; sorry if I wasn't clear, "pre-aggregation" is wholly unrelated to crafting a perfect aggregate query.

[00:20:36] <Bajix> So it wouldn't actually be possible without having data pre-seeded to $unwind against

[00:20:47] <GothAlice> It's a completely different solution to the problem that ensures it isn't a problem in the future. ;)

[00:20:55] <GothAlice> Correct.

[00:21:10] <yoofoo> I'm new at mongodb. We are looking at mongodb for an MLM company. I need to know if it is a good choice or not. Most MLM use SQL for the relational/hierarchical nature of MLM. However, mongodb seems to provide more flexibility and performance while also offer supports for relational/ hierarchical. can I achieve the middle ground with mongodb and come out ahead at the end? Please advise.

[00:21:11] <Bajix> That's sort of what I thought going into this

[00:21:49] <GothAlice> yoofoo: MLM describes a graph; I'd recommend using a real graph database to store the connections. For other data storage needs, though, MongoDB can be great. (Even SQL doesn't do graphs well.)

[00:21:51] <Bajix> But it does seem like aggregate is so much more performant as to warrant pre-seeding

[00:22:26] <Bajix> Can $currentDate do current hour at all?

[00:22:41] <GothAlice> Bajix: Pre-aggregation is the "right way" to seed this data, in all likelihood. It'd also let you have one place to update all user-event-in-time-period related statistics.

[00:22:44] <Bajix> If I wanted to do something like addToSet + currentDate rounded down

[00:23:32] <Bajix> GothAlice: Makes sense, thanks

[00:23:35] <yoofoo> GothAlice, Thx. what would recommend for the graph db?

[00:24:27] <GothAlice> Bajix: The expression fed to $addToSet can be as complicated as you need, but I'm still not seeing where you're really going with that. If doctoring up the original event data is what you're thinking, I'd recommend keeping the precalculated data separate. I.e. I store hits in db.hits, but only keep N days of individual events for auditing purposes. But db.analytics are kept forever.

[00:24:39] <GothAlice> yoofoo: Neo4J is what we use at work; it's Java, but good people.

[00:25:26] <GothAlice> Certain questions about your data, i.e. the six-degrees problem, can only really be answered in a sane way by a real graph DB. ;)

[00:25:40] <Bajix> I was just going to add this to my periodic updates to atomically add the current hour, rounded down

[00:25:57] <GothAlice> Bajix: To every user actively logged in? That doesn't sound very scalable. ;)

[00:26:21] <GothAlice> Pre-aggregation is strongly (not "all"… not quite) about scaling your analytics.

[00:27:32] <GothAlice> No matter how many tracked clicks we get at work, there will only ever be one pre-aggregate record per time period per job we're tracking. Querying a two-week time-frame for a job will _never_ query more than 336 records, period, making aggregation over the pre-aggregated data for report display a constant-time affair.

[00:28:32] <Bajix> GothAlice: It would be to their Tokens, which just contain session data, so we wouldn't be doing insane things to the user collection

[00:28:55] <GothAlice> Mixing of concerns, though. Makes my skin creep. ;P

[00:29:05] <Bajix> How so?

[00:29:13] <GothAlice> Data vs. instrumentation of that data.

[00:30:26] <GothAlice> The query time and scalability of including it in the data itself is dependant on the popularity of your site/tool/service. The more users, the slower your reporting will get, and if you go the $unwind approach, it's geometrically worse over time, not linearly.

[00:30:41] <GothAlice> :)

[00:32:09] <GothAlice> Pre-aggregated, identifying the number of active users for any single period (i.e. right now, for a dashboard counter thingy) is O(1). A single exact record fetch on the (hopefully indexed) time period.

[00:32:34] <GothAlice> Not pre-aggregated, it's O(active-user-count).

[00:34:00] <Bajix> The token's storing only how they logged in, how long their session is, whether they're a guest user, and what websocket channel they're using

[00:34:21] <Bajix> It's not the "session data" itself

[00:34:23] <GothAlice> Bajix: But you do see that the number of records being evaluated grows unbounded?

[00:34:48] <GothAlice> Re-phrasing that in English: the more active your site, the more data that needs to be ploughed through to give you an answer to that analytic question.

[00:35:10] <Bajix> How would you get around that though?

[00:35:14] <GothAlice> Pre-aggregation.

[00:36:08] <GothAlice> "Number of active users right now?" becomes db.activity.find({period: utcnow().replace(minute=0, second=0, microsecond=0)}, {total: 1}).first().total

[00:36:47] <GothAlice> "Active users in the last 24 hours?" becomes an aggregate matching the time range of periods $sum'ing total, with the number of records evaluated being… exactly 24.

[00:36:49] <Bajix> Well I have that atomically updated on the websocket channel

[00:36:52] <GothAlice> (And always exactly 24. ;)

[00:38:08] <Bajix> So basically you think I should be atomically updating a doc for the current time period

[00:38:14] <GothAlice> Aye.

[00:38:37] <GothAlice> With the $addToSet and query to ensure you don't add the same user to a single period multiple times.

[00:38:55] <GothAlice> For analytics work, upserts are your friend.

[00:38:57] <Bajix> Well at that point I'd just switch off of MongoDB and use hyperloglogs in redis

[00:39:52] <Bajix> Sure

[00:39:55] <GothAlice> Your approach of bundling time periods into each user's login record would technically… work…

[00:40:02] <Bajix> I need the Tokens regardless for other reasons though

[00:40:12] <Bajix> Such as producing average session durations within a time span

[00:40:24] <Bajix> or comparing login method percentiles

[00:40:57] <GothAlice> Well, trick is, you can fire off the pre-aggregate upsert with a writeConcern of None. It can be re-built (worst case, this has never happened to us but we test the rebuild process anyway ;) from the token data as needed, the result is never used anywhere, etc.

[00:41:18] <GothAlice> So where your user data is required for each request, reading the count of active users… probably won't be.

[00:41:33] <Bajix> I always had the impression though that it would be more preferable to have more docs as opposed to using addToSet and having massive sub documents

[00:41:54] <GothAlice> (Also, for the… scope of the potential query here, running a separate query to read a single integer out of the DB is going to be way, way more performant than running a whole aggregate or map/reduce!)

[00:43:04] <GothAlice> {_id: <date>, total: 0, users: []} — that'd be an example empty pre-aggregate record.

[00:43:19] <Bajix> It's a little more complicated than that

[00:43:35] <GothAlice> For active user count? Oh, right, you are tracking both active and logged in but inactive.

[00:43:36] <Bajix> The tokens build in idempotency of the session duration updating

[00:44:13] <GothAlice> I care not for your tokens; the problem revolves around statistics pulled from their creation and last modification dates, the rest is unimportant. The pre-aggregated records are stored in their own collection.

[00:44:17] <Bajix> They could have multiple open web socket connections, and without tokens to maintain a cursor I wouldn't be able to do idempotent updating

[00:44:28] <Bajix> ok

[00:44:28] <Bajix> fiar

[00:44:30] <GothAlice> Sure. Keep the tokens for that tracking purpose.

[00:44:33] <GothAlice> {_id: <date>, active: 0, inactive: 0, users: []}

[00:44:35] <GothAlice> :P

[00:45:25] <Bajix> So, wouldn't 5 million users in that array be much worse than 5 million docs?

[00:45:52] <Bajix> That would quickily hit the doc size limit

[00:46:38] <Bajix> And I'd be trading a large number of small unwinds for a small number of massive unwinds

[00:47:09] <GothAlice> Depends; if you application code is smart enough, your analytic record doesn't need to track that.

[00:47:17] <Bajix> Any aggregation step after the $unwind is going to be equivalent

[00:47:31] <Bajix> in terms of time complexity

[00:48:04] <GothAlice> However, the 16MB document limit gives… 1.29 million user ObjectIds.

[00:48:09] <GothAlice> Active per time period.

[00:48:33] <GothAlice> Pre-aggregated requires no $unwind.

[00:49:16] <GothAlice> The user ID tracking is solely for the purpose of preventing double-accounting if the application pings back too frequently; I'm not sure how your application exactly tracks user activity, so I go with the safer approach out of the box. Obv. customize any and all of this to fit!

[00:49:16] <Bajix> Oh right derp

[00:49:32] <Bajix> It's using websocket heartbeats

[00:49:39] <Bajix> So double accounting would be an issue

[00:49:40] <GothAlice> Also 1.29 million users should be my problem. Heh.

[00:49:50] <GothAlice> If only. ;P

[00:50:04] <Bajix> What're you working on?

[00:50:23] <GothAlice> HR job distribution, applicant tracking, and career site CMS platform(s).

[00:50:40] <GothAlice> Basically a giant workflow machine.

[00:51:17] <GothAlice> We had just a few records shy of one million INFO level events in the month of January, for a scale idea.

[00:52:24] <Bajix> Ahh

[00:52:35] <Bajix> Shiit

[00:52:41] <GothAlice> (Used by companies such as the CBC, Rona, National Bank, Ultima foods, Sobeys, …)

[00:53:08] <GothAlice> I can say that because they OK'd including their logo in our marketing material. ^_^;

[00:53:24] <GothAlice> >_> <_<

[00:53:54] <Bajix> NDA's = the worst

[00:53:59] <Bajix> I've got some super weird ones as well

[00:54:21] <GothAlice> My favourite NDAs are the ones that include the NDA. So secret you can't admit there's a secret. ¬_¬

[00:54:39] <Bajix> Ah the gag orders

[00:55:08] <Bajix> One of my recent jobs had that... I couldn't talk about the fact that a termination agreement existed

[00:56:02] <GothAlice> Well, that's often to ensure other investors in a project don't get butthurt over each other's negotiated bonuses. ;)

[00:56:42] <GothAlice> Similar weird thing at work with job postings that don't actually mention a company, because the company can't admit they're looking to fill the position. Usually because they currently have someone in the position, and their contract might mention such behaviour…

[00:56:50] <GothAlice> HR is almost as weird as some NDAs.

[00:58:47] <Bajix> Left me real sour... I built a pipeline that accepts JSON/CSV as inputs... the director of engineering wanted me to parse JSON, convert it to CSV, then parse the CSV.... you can probably understand why I told him no

[00:59:08] <GothAlice> I'm glad I wasn't drinking anything when I read that.

[00:59:13] <GothAlice> Or I'd need a new keyboard.

[00:59:42] <GothAlice> I often have to explain patiently why I can't just give staff dumps of unmodified data as Excel workbooks. ¬_¬

[00:59:58] <GothAlice> I always have to ask, no, really, what exactly do you need from this data?

[01:00:32] <Bajix> This was a company in which I was the first engineer at, in which I had architected everything

[01:01:04] <Bajix> Then they got funding, and decided to hire dinosuars that didn't know the technology to manage things...

[01:01:18] <StephenLynx> more money more problems

[01:01:32] <GothAlice> See also: the mythical man month.

[01:01:47] <Waheedi> I have a replicates with 4 nodes, primary node 1 and 2,3,4 are secondaries

[01:02:35] <Bajix> Yea. It set them back 10 months by firing me, and cost many millions

[01:02:56] <Waheedi> 4 was added recently from 1, when I do rs.status() from 1,2,3 I can see 1,2,3,4 and they are syncing fine, but when I check from 4 It seems its stuck somewhere

[01:03:16] <GothAlice> Waheedi: what do the logs on that node say?

[01:03:17] <Waheedi> from the log of node 4 replSet info Couldn't load config yet. Sleeping 20sec and will try again

[01:03:21] <Bajix> Interesting book

[01:03:34] <GothAlice> Waheedi: Check your DNS; make sure that node can successfully resolve all of the others.

[01:03:47] <Waheedi> i talented from everywhere back and forth

[01:03:53] <Waheedi> all working

[01:03:57] <Bajix> Are you GothAlice on twitter?

[01:04:00] <GothAlice> To the names given back by rs.status()

[01:04:02] <GothAlice> Bajix: Aye.

[01:04:14] <Waheedi> yes GothAlice

[01:04:14] <Bajix> I'm twitter creeping you, just FWI

[01:04:17] <GothAlice> Bajix: For the most part that name is me everywhere, except on YouTube. *shakes a fist* There I'm TheRealGothAlice. ;)

[01:04:35] <Bajix> I'm either Jester831, Bajix or Bajix831

[01:05:15] <GothAlice> Waheedi: Could you gist the rs.status() from a working secondary and the non-working one? Also, is authentication enabled on this cluster?

[01:05:28] <Waheedi> no

[01:05:38] <Waheedi> i will gist the rs.status() now

[01:05:38] <GothAlice> Good. Eliminates that as a problem. :)

[01:05:48] <Waheedi> :)

[01:07:14] <Waheedi> GothAlice https://gist.github.com/dakwak/8bf44383cea79f9d6510

[01:07:33] <Waheedi> so the node I'm referring to should be infra-mongo6

[01:08:26] <Waheedi> https://gist.github.com/dakwak/8e244d5f4218c49247b8

[01:08:48] <joannac> what's your evidence that it's stuck?

[01:08:49] <GothAlice> Indeed, that node things mongo6 is very much alive.

[01:08:57] <GothAlice> *thinks

[01:09:04] <Waheedi> the second gist is for infra-mongo6

[01:09:24] <Waheedi> no evidence at all joannac :)

[01:09:27] <GothAlice> Waheedi: Is there any periodic status information in the log?

[01:09:31] <GothAlice> Like, actual on-disk log?

[01:09:48] <Waheedi> other than replSet info Couldn't load config yet. Sleeping 20sec and will try again.

[01:09:52] <joannac> Waheedi: can I have evidence that infra-mongo6 can connect to the others?

[01:10:06] <Waheedi> i can show you telnet gist

[01:10:18] <joannac> along with hostname and hostname -f output please

[01:10:24] <Waheedi> or share my screen on Skype :)

[01:10:27] <Waheedi> lol

[01:10:29] <joannac> I don't care about telnet. I want mongo shell output

[01:10:30] <Waheedi> ok

[01:10:48] <GothAlice> Hmm; confined DNS earlier, but didn't think to check the hostname actually matched. Good call, joannac.

[01:11:12] <GothAlice> *confirmed. Can't type today. Also hey-o, joannac. :) Welcome back to the jungle.

[01:11:19] <Bajix> GothAlice: Thanks for the help. Stepping out

[01:11:38] <GothAlice> Bajix: Have a great one, and I'd love to know how you get your stats issue resolved.

[01:11:51] <Bajix> Added you on FB

[01:11:56] <Bajix> I'll let you know

[01:12:18] <Waheedi> so it seems infra-mongo6 can't connect to itself and that might be the problem

[01:12:24] <Waheedi> it connects to other nodes and others connect to it

[01:13:37] <Waheedi> thank you for the help guys

[01:19:05] <Waheedi> ok now this is what I see from infra-mongo6 now, https://gist.github.com/dakwak/6ac1d41e1eb0aac8dea7

[01:19:35] <Waheedi> while on other nodes they still can see it fine

[01:19:49] <Waheedi> primary has in its configuration too

[01:19:53] <Waheedi> do i need to reconfig?

[01:20:37] <Waheedi> btw now dakwak-mongo6 should talk to itself using 127.0.0.1 but other nodes are talking to it using its public static ip

[01:21:48] <Waheedi> please let me know if it needs to talk to itself using its public ip :)

[01:28:24] <Waheedi> its working now

[01:28:26] <Waheedi> it was my bad

[01:29:00] <Waheedi> and infra-mongo6 is going to elect for primary now :)

[01:42:43] <Freman> wow, mongo clients aren't very consistent... golang/mongo (cli) send query{} but php uses $query{}

[01:47:57] <cheeser> what is the "golang/mongo (cli)" ?

[02:01:40] <GothAlice> wat

[02:02:56] <GothAlice> Freman: You do realize that Go and PHP define variables in very different ways? Google is also not informing me as to what golang/mongo (cli) actually is, or where it comes from.

[02:03:34] <Freman> no, I mean in the encoded bson

[02:04:28] <GothAlice> If whatever that is differed in any way on how it encodes the BSON specification, it wouldn't be able to communicate. BSON is also binary, not ASCII, so… uh… not pasteable into IRC, generally.

[02:04:50] <GothAlice> The "wat" stands. ;P

[02:05:43] <Freman> 2016-02-03_01:19:34.54861 Name: (string) (len=5) "query" - vs - 2016-02-03_01:17:13.81067 Name: (string) (len=6) "$query"

[02:06:12] <Freman> same bit of code generates both, it's just mgo/bson unmarshaller so the difference is the input

[02:06:55] <GothAlice> Okay. Is there a question?

[02:07:05] <cheeser> we're going to need to see both code snippets

[02:07:09] <GothAlice> And how.

[02:07:35] <GothAlice> Since BSON is, as mentioned, a ratified standard with pretty good library coverage, the immediate culprit that jumps to mind has to be user error.

[02:08:53] <Freman> it's the way the libraries are putting their queries together

[02:09:19] <Freman> it's not mongo's fault except that the mongo cli uses "query" but the documentation for the wire protocol is "$query"

[02:09:24] <GothAlice> Also link the library? Still can't find anything calling itself golang/mongo involving a command line interface.

[02:11:08] <Freman> no, the cli that comes with mongo, and the mgo golang library for mongo

[02:11:37] <Freman> http://pastebin.com/Xsv7HDTU is the code echoing that and https://github.com/facebookgo/dvara/blob/master/protocol.go is where readDocument comes from

[02:11:51] <Freman> the document on the wire varies from library to library

[02:12:02] <Freman> (that's better, "on the wire")

[02:12:59] <Freman> not a serious complaaint, I'm just amazed mongo works for everyone

[02:13:14] <Freman> and annoyed that I had to modify my code to work for both

[02:16:27] <cheeser> i think you're conflating things here, Freman

[02:17:27] <cheeser> when you say 'the mongo cli uses "query",' what are you referring to?

[02:23:06] <Freman> gimme a second, I'll do this better

[02:24:20] <yoofoo> When using mongoDB, what is the recommended guideline/practice for mapping the mongodb data using graphDB? For example, leave the data in mongodb flat (with no relationship) and use graphdb to map all the relationships? or what's the guideline? please advise.

[02:27:51] <yoofoo> let me rephrase my last question

[02:29:33] <yoofoo> in MLM, when using graphDB+mongoDB, what is the recommended guideline/practice for mapping the mongodb data using graphDB? For example, leave the data in mongodb flat (with no relationships) and use graphdb to map all the relationships? or what's the guideline? please advise.

[02:36:23] <Freman> This is the on the wire bson (https://docs.mongodb.org/v3.0/reference/mongodb-wire-protocol/#op-query) for the query from php http://pastebin.com/beuNTRRY and the same query from mongo cli http://pastebin.com/bAesaJC0

[02:43:28] <cheeser> i think what you're seeing there is the difference in how drivers submit queries to the server. the command format changed on the server end.

[02:43:33] <joannac> Freman: And? Why do you care?

[02:43:43] <cheeser> these details are largely irrelevant to driver users

[02:48:57] <cheeser> been there... :)

[02:49:09] <cheeser> also, make sure you test getters and setters.

[03:23:29] <Freman> cos I'm not a driver user, I'm an on-the-wire proxy

[03:24:04] <Freman> I don't actually care now that I'm aware of it, I can just strip the $

[03:24:07] <Freman> or add it

[03:33:15] <cheeser> i think you should take the time to understand the differences

[03:33:21] <cheeser> i really don't think it's that simple

[03:33:27] <cheeser> but, it's your code.

[03:34:08] <Freman> looks that simple, mongo still answers both forms, I'm just making sure the queries use indexes, worst case scenario I fail to parse and they get told to check their query again

[03:34:23] <Freman> I'm not changing anything

[06:35:59] <bz-> hi, how can i convert the bson id (objectid) to string to have it available in my data set returned into node

[10:28:21] <noncom> hi

[10:28:31] <noncom> i have this strange picture in my db folder: http://joxi.ru/J2be6W4s4w1RMm

[10:28:47] <noncom> the nb-crm and local databases look like they are the same files of the same size...

[10:28:49] <noncom> why is that?

[10:29:14] <noncom> i am rather new to mongo, so maybe i don't know something essential, but it'd be nice if someone explained, what does "local" actually mean then?

[10:31:40] <noncom> ah, it's rally rather basic... found an answer on SO that it's an oplog to restore back in time if i need

[10:32:56] <CustosLimen> hi

[10:33:01] <CustosLimen> so I'm setting up mongo on rhel

[10:33:26] <CustosLimen> and it installs only one control script - /etc/init.d/mongo

[10:33:45] <CustosLimen> I want arbiter and mongos and config server and data server

[10:33:50] <CustosLimen> so I should copy this ?

[11:00:21] <amcsi_work> Does the direction of an index for a field only matter if it's a compound index?

[11:07:38] <Lope> echo never > /sys/kernel/mm/transparent_hugepage/enabled && cat /sys/kernel/mm/transparent_hugepage/enabled shows me "always madvise [never]"

[11:10:40] <CustosLimen> Lope, and ?

[11:10:57] <CustosLimen> Lope, mine shows "[always] madvise never"

[11:11:04] <Lope> well that's not "never" and mongogb still complains that it's not "never:

[11:11:13] <CustosLimen> I see

[11:11:25] <CustosLimen> Lope, can you pastebin the complaint ?

[11:12:20] <Lope> hmm, I'm going to try reboot. Cos I tried to paste a multi line script in my terminal and it might have written some weird chars into that location. so just incase I corrupted something in the kernel I'm gonna reboot.

[11:12:27] <Lope> To see if it's still an issue.

[11:12:42] <Lope> brb

[11:27:09] <Lope> same siuation: echo never > /sys/kernel/mm/transparent_hugepage/enabled && cat /sys/kernel/mm/transparent_hugepage/enabled shows me "always madvise [never]"

[11:28:37] <Lope> oh, but mongodb is not complaining anymore. apparently the output I'm getting now is acceptable for it.

[11:49:04] <torak> hello everyone. I am trying to create a backend with mongo for my android app. I am using node.js for coding but how can i upload the node scripts to the server? Is it safe to use ftp? Or any other better way for that?

[11:53:30] <StephenLynx> that is just a linux question.

[11:53:44] <StephenLynx> unrelated to both mongo and node.

[12:45:06] <CustosLimen> chi

[12:57:05] <CustosLimen> on rhel, where mongodb installs /etc/init.d/mongod - how can I get multiple instances going ?

[12:57:14] <CustosLimen> or is there no specific mechanism ?

[13:26:48] <synthmeat> CustosLimen: you can just run "mongod" as any user, but be careful to run it with a new config so it doesn't wreck your existing database/logs/dirs

[13:27:40] <CustosLimen> synthmeat, not worried about user so much - but its ok - the documentation did say that for other processes (mongos, arbiter, config server, etc) I should just make new init script based on existing one

[13:28:32] <synthmeat> CustosLimen: it's more maintainable just to run it with a new config to me (for, say, development purposes). ymmv

[13:28:49] <CustosLimen> synthmeat, each instance will use its own config

[13:29:07] <CustosLimen> but they will all be on same server

[13:29:28] <synthmeat> CustosLimen: unrelated, but why would you want to do that in the first place?

[13:30:30] <synthmeat> (few use cases do come to mind, like using different storage engine or alike)

[13:31:40] <CustosLimen> synthmeat, http://i.imgur.com/ovMLnCk.png

[13:32:00] <CustosLimen> also in addition to that each server will have mongos process

[13:32:05] <synthmeat> ah, ok. never went web-scale myself :)

[13:32:36] <CustosLimen> so at max mongodb related process will be 4 on one server, but only one will be data carrying

[14:31:39] <Tomasso> why the following query that should calculate a simple average is always returning 0.000000 ? db.getCollection('Accuchecks').aggregate({$group: {_id: "$algorithm",accuracy: { $avg: "$difference.close"}}})

[14:32:36] <StephenLynx> what number did you expect?

[14:33:24] <StephenLynx> try just getting the sum and amount of documents.

[14:33:44] <Tomasso> StephenLynx: different than 0.. since almost all values from difference.close are different than 0

[14:33:56] <Tomasso> mmm Ok

[14:34:20] <Tomasso> $sum is 0 too

[14:34:33] <StephenLynx> v:

[14:34:42] <StephenLynx> maybe you are getting the wrong field?

[14:35:21] <StephenLynx> maybe the problem is on what you are using for _id?

[15:32:59] <Tomasso> Stupid question: how do you reference nested keys ? { hello : { num: 20} } should be "$hello.num" ?

[15:35:03] <cheeser> "hello.num"

[15:36:15] <StephenLynx> that depends on context.

[15:36:44] <StephenLynx> when you are indicating it on a query or a simple projection you use as cheeser posted.

[15:37:19] <StephenLynx> when you are referencing on an operation as the value of something and not the key, you must include the $.

[15:38:10] <StephenLynx> like {amount:"$aggregatedValue"} will output the value of aggregatedValue in the field amount

[16:26:58] <jfhbrook> I have some find(find).skip(skip).limit(limit) calls that don't appear to be dealing with a deterministic ordering of results, is this expected and/or what could cause this?

[16:27:29] <Tomasso> grgrg is something like $unwind but of objects and not arrays? i still get 0

[16:28:43] <StephenLynx> jfhbrook, if you are not sorting the results

[16:29:02] <StephenLynx> I don't think mongo makes an effort to give you results in a certain order

[16:29:34] <jfhbrook> StephenLynx: suppose they're sorted, but not on literally all fields, and some fields might have the same value

[16:29:46] <StephenLynx> again

[16:29:50] <StephenLynx> are you sorting them?

[16:30:04] <StephenLynx> your example doesn't have a sort.

[16:31:50] <jfhbrook> StephenLynx: yes, irl I'm calling a .sort

[16:32:07] <StephenLynx> then show me the full query.

[16:32:08] <jfhbrook> StephenLynx: that was a shorthand, not a copypasta

[16:32:13] <jfhbrook> I mean

[16:32:25] <jfhbrook> as long as I know that it's possible to get a nondeterministic ordering if the sort isn't absolute

[16:32:28] <jfhbrook> I'm good

[16:32:38] <jfhbrook> it's proprietary code, you understand

[16:32:44] <StephenLynx> what do you mean a nondeterministic ordering?

[16:33:19] <StephenLynx> m8, if you are not creative enough to show pieces of proprietary code without not actually revealing too much, you got bigger issues.

[16:34:19] <jfhbrook> I mean if I have 2 elements a and b, and a and b are equivalent based on my sorting algorithm, sometimes the order will be [a, b] and other times [b, a]

[16:35:12] <StephenLynx> i assume in that case they won't be moved.

[16:35:34] <StephenLynx> so it depends on the order they were in the first place.

[17:23:05] <julio> Hi! I'll have a system where my users can include your tasks. My category are: Morning, Afternoon, Evening and Night. Each user can include your task in a category, example: morning. User1 - Morning: clean the bedroom, wash clothes. Afternoon: make lunch, dry clothes, evening: study for exam, prepare dinner. Night: watch tv, go to sleep. Which is the best way to store this data. One single Database with collections separeted by ID or one collection for each

[17:23:05] <julio> user? I'll need to find some informations. Day X, how many task users Y did. Day X, how many time user Y clean the bathroom...

[17:24:12] <StephenLynx> a single collection for all users.

[17:24:24] <StephenLynx> having dynamic table/collection generation is one of the worst mistakes you can make.

[17:24:34] <julio> oh, ok

[17:24:56] <julio> Can I create a sketch of my collection to you help me (give me your opinion) ?

[17:27:16] <StephenLynx> k

[17:43:02] <julio> StephenLynx, http://pastebin.com/tM6Dw7cK

[17:43:10] <julio> so so like this?

[17:43:27] <StephenLynx> looks good.

[17:43:38] <StephenLynx> except for

[17:43:43] <StephenLynx> why is period an array?

[17:44:13] <julio> yes, the tasks could be made in the morning, afternoon, evening or night

[17:44:16] <StephenLynx> address don't need to be an object either?

[17:44:27] <StephenLynx> yeah, but that is kind of an enum

[17:44:32] <StephenLynx> it will be only one of those 4

[17:44:37] <julio> is it more fast if not a object?

[17:44:58] <StephenLynx> its easier to manipulate and has less overhead.

[17:45:15] <StephenLynx> plus the design doesn't make sense to have it as as sub document

[17:45:25] <julio> ok ok

[17:45:27] <julio> thank you

[17:46:27] <julio> what about indexes, where is the best... _id of users collection and user_id and date on tasks? I'll need to find some task made by user X in day Y...

[17:46:32] <julio> how many tasks, etc

[17:48:23] <StephenLynx> _id is always an unique index

[17:48:29] <StephenLynx> so you don't have to worry about that.

[17:48:48] <StephenLynx> now, having an index on the tasks collection on the user that owns it is a good idea.

[17:49:11] <StephenLynx> since it seems you are going to fetch tasks based on that often

[17:51:37] <julio> my searchs are based in date and period. is it everything of index at date and period?

[17:53:30] <StephenLynx> are you sure of that?

[17:53:40] <StephenLynx> that you don't base your searches on user_id?

[17:54:14] <StephenLynx> and btw

[17:54:15] <StephenLynx> "password": String, min 8,

[17:54:17] <StephenLynx> this is bad design

[17:54:34] <StephenLynx> if you are doing that, its because you are storing passwords in plain text.

[17:54:39] <StephenLynx> and you never do that.

[17:54:48] <StephenLynx> I recommend pbkdf2

[17:54:55] <StephenLynx> but bcrypt is not bad either.

[17:57:49] <GothAlice> StephenLynx: As a rather important note, pbkdf2 is a key derivation function, not a hashing function. It has some peculiarities beyond just salting. Security is hard, man. ;P

[17:58:24] <GothAlice> bcrypt/scrypt tend to be better from the "it's more secure with less understanding" perspective. Black boxes that deal with passwords.

[17:58:25] <StephenLynx> I know, but it works really well for that purpose.

[17:59:06] <StephenLynx> and from what I heard, has a higher ceiling than bcrypt

[17:59:18] <GothAlice> "ceiling"?

[17:59:29] <StephenLynx> it potentially harder to break.

[18:01:37] <GothAlice> Ah, that's likely just due to confusion between the interpretation of the term "rounds" between them.

[18:01:45] <StephenLynx> plus

[18:01:53] <StephenLynx> http://csrc.nist.gov/publications/nistpubs/800-132/nist-sp800-132.pdf

[18:02:04] <GothAlice> pbkdf2 uses 2^rounds, bcrypt I believe just uses the rounds straight.

[18:02:46] <GothAlice> But no, the major key point here is that you're not trying to derive a cryptographic key for use with block ciphers with the user's password, you're trying to verify the user knows a secret. :| Very different problems.

[18:03:24] <julio> StephenLynx, yes. My searchs are based: user_id where date = X and period = Y

[18:03:29] <GothAlice> Which is why I'm a fan of: https://en.wikipedia.org/wiki/Secure_Remote_Password_protocol

[18:03:32] <julio> StephenLynx, I'll read about pbkdf2.

[18:03:40] <StephenLynx> julio, I suggest you make that index on the user_id

[18:03:58] <StephenLynx> not having the others unindexed shouldn't be an immediate concern, but the user_id will

[18:04:10] <julio> just user_id at tasks collection?

[18:04:14] <StephenLynx> yes.

[18:04:24] <StephenLynx> change the index if needed in the future, though.

[18:04:34] <StephenLynx> but that one is the single most important one to have indexed.

[18:04:52] <julio> ok! fine fine thank you. StephenLynx :D

[18:04:55] <StephenLynx> np

[19:12:04] <tinylobsta> i switched from using embedded documents to using references because i ran into a use case where i needed to run a comparison that required having three nested loops. i'm curious, however, how efficient having nested queries is (i'm using mongoose on node) relative to having a nested loop structure?

[19:12:41] <StephenLynx> I strongly suggest that you don't use mongoose.

[19:13:10] <tinylobsta> is it really that inefficient?

[19:13:22] <tinylobsta> i'd have to tear apart my entire app to move away from it

[19:13:39] <StephenLynx> yes.

[19:13:47] <StephenLynx> nearly 10x slower than the native driver.

[19:13:52] <StephenLynx> its just garbage.

[19:14:17] <StephenLynx> not to mention the asinine way it handles certain times like ObjectId and dates

[19:14:18] <tinylobsta> that's frustrating, because every tutorial out there about node + mongodb recommends using mongoose

[19:14:21] <StephenLynx> and vulnerabilities.

[19:14:36] <StephenLynx> as a rule of thumb, I do the opposite of what webdevs recommend.

[19:15:22] <StephenLynx> webdevs in general shouldn't be allowed near computers, let alone to develop software.

[19:16:20] <tinylobsta> https://github.com/mongodb/node-mongodb-native

[19:16:24] <tinylobsta> so this is what i should leverage, then

[19:16:37] <stickperson> for a $projection in an aggregate query, is it possible to say “only give me this field if it exists”

[19:18:16] <GothAlice> stickperson: Hmm, {optionalfield: 1} as a projection always includes an "optionalfield" field in the result, even if the source document didn't have it?

[19:18:42] <GothAlice> It certainly will if you project using the renaming syntax: {myfield: "$optionalfield"}

[19:18:49] <stickperson> GothAlice: only when i do an aggregate query. when i just use find() it doesn’t get returned

[19:19:05] <GothAlice> stickperson: Could you gist/pastebin your aggregate? Or at least, the project stage?

[19:19:14] <stickperson> yea, one sec

[19:22:29] <GothAlice> Weird; I can't seem to reproduce your problem in isolation.

[19:22:44] <GothAlice> http://s.webcore.io/260d311k1A1D

[19:23:14] <stickperson> GothAlice: http://pastebin.com/pDisZSPd

[19:23:17] <GothAlice> I'm actually surprised that mine isn't always returning a field called "name" in that last one.

[19:23:50] <stickperson> GothAlice: syntax may look a little off because i’m using python, but it should be good enough

[19:23:50] <GothAlice> Aaaah, it's not the project that's failing you. It's probably the group.

[19:24:26] <GothAlice> Which field in your paste is the one that shouldn't be appearing if missing in the source document?

[19:24:29] <StephenLynx> tinylobsta, no, thats not it.

[19:24:36] <stickperson> ah right. yea, for this query i do a group after the project. in another query i do a projetion after the group

[19:24:46] <stickperson> GothAlice: let’s say article image

[19:24:49] <StephenLynx> https://www.npmjs.com/package/mongodb this is it.

[19:26:05] <GothAlice> db.foo.aggregate([{$project: {field: "$field", name: "$field"}}, {$group: {_id: 1, age: {$first: "$age"}}}]) → { "_id" : 1, "age" : null }

[19:26:35] <GothAlice> stickperson: Yup, in this example it's the $group doing you in. Fields defined within its stage will always be given a value, even if that value has to be null.

[19:28:02] <stickperson> GothAlice: damn. oh well, not a huge issue. will just have to add some more logic elsewhere

[19:28:04] <stickperson> thanks

[19:29:54] <StephenLynx> oh wait, that is the correct thing. its just with a different name.

[19:30:07] <tinylobsta> yeah, same thing

[19:30:08] <tinylobsta> thanks

[19:30:20] <StephenLynx> btw, let me get the documentation for it

[19:30:26] <StephenLynx> http://mongodb.github.io/node-mongodb-native/2.0/api/

[19:41:34] <torak> hello

[19:42:36] <torak> do i have to install nginx in order to process get and post requests with nodejs?

[19:43:06] <StephenLynx> not related to mongo.

[19:43:09] <StephenLynx> but no, you don't have to.

[19:46:11] <torak> StephenLynx: where should i place my scripts? How would the server know wich js file to run?

[19:46:43] <StephenLynx> unrelated to both mongo and node, depends on how you are running the server.

[19:46:52] <StephenLynx> I already told you that.

[20:10:24] <kexmex> hi

[20:10:29] <kexmex> this new mongo c# 2.0 driver

[20:10:43] <kexmex> i can't figure out a way to get ID after InsertOneAsync

[20:10:52] <kexmex> any ideas?

[20:11:42] <kexmex> it should be in the doc that i am inserting eh?

[20:12:40] <cheeser> kexmex: asking our c# driver guys

[20:12:46] <kexmex> where?

[20:13:59] <cheeser> internal chat

[20:14:27] <cheeser> "it's in the document they just inserted"

[20:14:54] <kexmex> cool i guess i was right )

[20:15:01] <kexmex> but i guess i need writeConcern acknowledged

[20:15:26] <cheeser> in 2.0, i think that's the default

[20:15:28] <cheeser> 2.x

[20:15:52] <kexmex> ah

[20:17:44] <kexmex> thanks for asking for me

[20:19:02] <cheeser> any time

[20:19:20] <cheeser> as long as it's business hours, of course. i may not have a life but craig does. ;)

[20:34:53] <cheeser> it's wild, let me tell you.

[21:14:48] <uuanton> anyone knows how to shutdown replicaset secondary remotely from primary ?

[21:20:11] <joannac> uuanton: you can open a mongoshell from the primary to the secondary and sent the shutdown command?

[21:20:20] <uuanton> i was trying to

[21:20:42] <uuanton> mongo --host ip-10-xxx.internal --eval "db = db.getSiblingDB('admin'); db.shutdownServer"

[21:21:03] <joannac> get rid of the db=db.getSiblingDB part

[21:21:19] <uuanton> mongo --host ip-10-xxx.internal admin --eval "db.shutdownServer"

[21:21:31] <joannac> db.getSiblingDB("admin").shutdown...

[21:22:56] <uuanton> shutdownServer failed: shutdown must run from localhost when running db without auth

[21:24:19] <joannac> oh. well there you go

[21:24:45] <joannac> that makes sense actually. If you don't have auth on you don't want some random connecting and shutting down your server

[21:25:09] <uuanton> i can't use auth, database secured thru firewall and has no direct access to internet

[21:28:38] <joannac> well, the mongod server can't tell that :p

[21:29:07] <joannac> if you can't turn on auth, then you have to ssh across and shutdown via a locahost connection

[21:30:37] <uuanton> joannac fair, how do i shutdown all secondaries (6) to restore data to primary and bring replica set back online ?

[21:31:02] <joannac> why do you need to shutdown all secondaries?

[21:31:04] <uuanton> my goal is to restore snapshot to replica set

[21:31:20] <uuanton> restore production data to staging

[21:31:34] <joannac> oh, you want to restore the snapshot to all members/

[21:31:36] <joannac> hmmm

[21:31:58] <uuanton> it must be an easier way for sure

[21:32:00] <joannac> ssh to each secondary, connect via mongo shell, shutdown command

[21:32:33] <uuanton> thats not going to scale i have to do it everyday

[21:32:59] <joannac> then turn on auth or script it

[21:33:12] <magicantler> how fast does mongo build indexes?

[21:33:22] <joannac> magicantler: how long is a piece of string?

[21:33:23] <uuanton> but how do i bring secondaries back alive ?

[21:33:34] <joannac> uuanton: ssh, start

[21:33:38] <magicantler> joannac: what?

[21:33:43] <magicantler> ok

[21:34:07] <magicantler> suppose we insert lots of documents, then delete them right away. is this a bad measure of performance since maybe it doesn't have time to build indexes? @joannac

[21:34:08] <joannac> magicantler: you think I can tell you that without knowing anything about your data or hardware?

[21:34:49] <joannac> magicantler: I don't understand your question.

[21:35:05] <uuanton> joannac you mean ssh to each secondary and restart mongo ?

[21:35:16] <magicantler> if i insert a shit load of documents, then try and delete them right away, will it be slower than if i had waited 10 minutes before deleting?

[21:35:19] <joannac> uuanton: start the mongod

[21:35:33] <joannac> magicantler: no

[21:35:49] <magicantler> joannac: why's that?

[21:35:49] <GothAlice> magicantler: Indexes are updated as the inserts are applied, so there's no impact there.

[21:36:11] <GothAlice> It's not like it inserts it, then for an indeterminate amount of time the record won't show up because the indexes haven't been updated.

[21:36:18] <GothAlice> ¬_¬

[21:36:37] <uuanton> oh GothAlice mebbe you know the smart way to restore whole replica set to new data, i talked to you yesterday

[21:37:04] <uuanton> im not longer care if secondaries have to resync again from scratch

[21:37:49] <magicantler> GothAlice: I figured it built them behind the scenes, and until it was built, the lookup just ran slower

[21:38:00] <GothAlice> uuanton: Then the process I described yesterday should work great; restore a snapshot, demote it from being a replica set, promote it back to being a *new* replica set, then spin up new secondaries.

[21:38:15] <joannac> magicantler: the insert won't succeed until the index entries are updated

[21:38:27] <magicantler> ah gotcha. thank you!

[21:39:02] <GothAlice> magicantler: Such an approach as to defer index updating wouldn't result in slowness, it'd result in the record not being there for any query using the index. That's a no-go anti-feature in a database engine. ;)

[21:39:22] <joannac> To be fair, we do have background indexes

[21:39:25] <uuanton> sorry im not quite understand the steps. I believe i need to stop secondaries first right

[21:39:34] <GothAlice> joannac: But that's just incremental initial construction, no?

[21:39:43] <GothAlice> I.e. the index doesn't actually "exist" for use until it's done.

[21:39:43] <joannac> yeah

[21:40:15] <joannac> right, that's index creation, not updates

[21:41:13] <GothAlice> uuanton: You don't necessarily need to, no. Especially if you want read-only access to the older data while the restore and new replication is set up.

[21:41:45] <GothAlice> But that'll depend on quite a number of things, like if you're changing hostnames during the restore.

[21:42:33] <uuanton> GothAlice, actually the goal is to access new data as soon as possible. While secondaries are going to be rebuilded with new data primary can still serve the traffic ?

[21:42:43] <GothAlice> At work we use incremental numeric DNS names, like r01s01.db.example.com — replica 1 of set 1. If we restore, we bump the set number, using a CNAME on db.example.com to point to the active cluster.

[21:43:48] <GothAlice> uuanton: Yes, though you need to make sure you have a sufficient oplog size to handle the incoming changes. I.e. it needs to be large enough to record all of the changes that happen while the secondaries are syncing, or the sync will never finish successfully.

[21:45:14] <uuanton> wouldn't secondaries start from scratch ?

[21:45:43] <GothAlice> They would, but step one is to copy the existing data. While that's happening, changes accumulate in the oplog. Once the initial data sync is complete, it then "catches up" by reading through the oplog and applying the changes listed there.

[21:46:03] <GothAlice> Finally, during normal operation, it "tails" the oplog to apply changes as they happen.

[21:46:50] <uuanton> sounds good

[21:48:42] <uuanton> after i shutdown secondaries my primary became secodary o_0

[21:50:06] <cheeser> if there's no majority, yes

[21:51:32] <GothAlice> uuanton: To protect your data, if a primary can't see (ping) a majority of the other nodes it knows about it calmly turns itself into a read-only secondary. It has no confidence that it's actually still the primary, since the missing majority could vote against it and it'd never know.

[21:52:37] <GothAlice> And if that happened, and it did still think it was a primary, it might accept changes to be written that the other one doesn't know about, and you have what's referred to as a "split brain problem". Once the nodes could see each-other again, the equivalent of all hell would break loose. ;P

[21:53:13] <GothAlice> MongoDB prefers to not open gates to netherworlds, so does the safe thing.

[21:55:53] <uuanton> ^^

[22:09:00] <uuanton> swapping drives in pain the the butt, how you guys backup production data ?

[22:09:16] <GothAlice> By having additional secondaries just for the purpose.

[22:09:29] <uuanton> how about staging

[22:10:16] <uuanton> additional secondaries you delayed ?

[22:10:44] <uuanton> someone delete from records how you going to restore the data ?

[22:11:21] <GothAlice> Indeed, we have two replicas in our office. One is "live" (and lets us run reports against our data without needing to pop out to the internet for each query) the other is delayed by 24 hours.

[22:11:38] <GothAlice> The first lets us continue operating even if the rest of the world is on fire. The second helps recover from user error.

[22:13:23] <uuanton> in my office we have staging where all testing and development occurs

[22:13:57] <GothAlice> We stage on the same infrastructure that production runs on, just in a separately scaled set of instances. I like my testing environments to be as close to real as possible. :)

[22:14:06] <uuanton> right now we mongodump production and mongorestore to staging and it takes almost 6 hours

[22:15:13] <GothAlice> Ouch. A faster approach would be to have an in-house secondary you disconnect from the set and demote back to standalone, with a filesystem snapshot made prior to this such that when you're done testing you can roll back the snapshot and it'll catch up with the rest of the cluster again. (Again, assuming sufficient oplog size to cover the testing period.)

[22:15:59] <uuanton> but staging is replica set too

[22:17:37] <uuanton> the goal of staging not only in testing development but to have similar infrastructure as production that if you need to upgrade mongodb or do something that you can practice on staging

[22:18:33] <uuanton> and secondaries located around the world that make it hard to test without same staging infrastructure

[22:18:33] <GothAlice> Sorry, I'm not seeing the difficulty. The process of taking an in-house secondary of your production data, demoting it out of the set, then promoting it to be the primary of a new in-house staging set, would still be faster than mongodump/restore.

[22:19:25] <GothAlice> I.e. it'd take five minutes before it started being able to answer queries.

[22:22:01] <GothAlice> In our case at work we have enough oplog for 50 hours of database activity (at our current level of utilization). That'd let us easily have a "live" (not delayed) secondary in the office that is filesystem snapshotted while a secondary, then demoted and promoted to being the primary of a new in-house set, every 24 or 48 hours (I'd go for 24; 48 doesn't leave comfortable enough headroom for us. ;)

[22:22:43] <GothAlice> Basically letting it breathe for an hour around midnight each night to "catch up" with production, only to re-snapshot, re-demote, and re-promote.

[22:23:02] <uuanton> i agree its a good idea

[22:23:15] <GothAlice> (We don't actually do this, though, we mongodump and restore about once a week, but our data takes tens of minutes to load, not hours. ;)

[22:23:39] <uuanton> but i don't think they would let me to remove and add secondaries from production to restore staging

[22:24:15] <GothAlice> It's a pretty natural way to get MongoDB to synchronize information, given replication is how MongoDB synchronizes information. ;P

[22:24:48] <GothAlice> (And if you add it as a non-voting hidden member, the rest of the set won't even be aware. You can even reduce load on your primary by having this staging secondary replicate and catch up from another secondary.)

[22:25:16] <uuanton> another reason why this is not possible that production and staging completely on different networks

[22:26:14] <uuanton> aws VPC

[22:26:49] <GothAlice> AWS do offer VPNs, though I haven't tried running a replica intermittently over such an arrangement.

[22:27:17] <GothAlice> I have a secondary at home, too, that does the midnight catchup routine. (But not for promotion/demotion or anything, just as a mother backup.)

[22:27:54] <GothAlice> *another

[23:39:09] <Doyle> Hey. Anyone know if the slow listDatabases issue was confirmed in MMAP? Seems that it was unable to be tested. https://jira.mongodb.org/browse/SERVER-20961?jql=project%20in%20(SERVER%2C%20TOOLS)%20AND%20fixVersion%20%3D%203.0.9%20AND%20resolution%20%3D%20Fixed%20

[23:43:46] <Boomtime> @Doyle: all the listed effects concerned wiredtiger only - the fixes also only affected wiredtiger code

[23:44:16] <Boomtime> are you seeing something that you think is a similar problem in mmap?

[23:46:18] <Doyle> listDatabases hung. Connecting to a router via robomongo was not possible. The only thing in the logs that was happening was an index built, and a big query. They both finished at about the same time, and the routers became available again

[23:46:22] <Doyle> it was odd

[23:49:26] <joannac> index build blocks other operations

[23:49:31] <joannac> if it's foreground

Log file Viewer

Help | Karma | Search:

#mongodb logs for Wednesday the 3rd of February, 2016