PMXBOT Log file Viewer

#mongodb logs for Thursday the 20th of November, 2014

[00:00:43] <cheeser> anything that uses 'u' like that is sad.
[00:01:44] <drorh> GothAlice: nice what u linked
[00:02:53] <xissburg> This is pymongo-specific... it seems to me it is not properly handling date timezones
[00:03:22] <xissburg> I set the tzinfo in the date attribute but when I retrieve it, it has no timezone
[00:03:23] <drorh> cheeser: u need to hear it in context... (in case u havnt)
[00:03:42] <cheeser> no, i really don't
[00:04:20] <drorh> the ':' in the song makes the difference
[00:04:20] <GothAlice> xissburg: There are all sorts of caveats to using Python datetime objects with timezones.
[00:04:34] <xissburg> YEah it fucking sucks
[00:04:51] <GothAlice> xissburg: Well, it sucks less than most who run into issues think. Install pytz, use it, be happy. :)
[00:05:23] <xissburg> GothAlice: I am using twisted.amp and it has a amp.utc thingy that I can set to the tzinfo and I am doing it
[00:05:42] <xissburg> It works, but after I retrieve objects from mongo I have to set it again
[00:05:51] <GothAlice> "can set to the tzinfo and I am doing it" — how, *exactly*?
[00:08:10] <xissburg> datetime.datetime.now(utc) where utc is defined here http://twistedmatrix.com/trac/browser/trunk/twisted/protocols/amp.py line 2584, instance of _FixedOffsetTZInfo
[00:08:49] <GothAlice> xissburg: That is very wrong. datetime.datetime.utcnow().replace(tz=utc)
[00:09:07] <xissburg> uh
[00:10:01] <GothAlice> xissburg: Read through the 'pytz' manual — it covers why naive timezone casting will be wrong.
[00:10:26] <GothAlice> (And why it provides its own Timezone.in_timezone(datetime) casting function.)
[00:11:34] <GothAlice> Regardless, PyMongo by default stores datetimes in UTC. (So a naive/tz-less datetime produced by utcnow() would work fine without any extra machinery.) Are you attempting to *store* dates in the DB in different zones (generally a Very Bad Idea™) or is this a display issue?
[00:12:33] <xissburg> GothAlice: shouldn't it be .replace(tzinfo=utc)?
[00:13:16] <GothAlice> Yes. I'm bad for writing p-code examples, sorry. ¬_¬
[00:13:45] <GothAlice> Basically .replace() avoids triggering timezone conversion machinery (it just drops the replacement zone in and says "really, this is a datetime in X timezone!").
[00:14:28] <GothAlice> xissburg: http://leach.it/python/2013/12/21/understanding-pymongo-and-datetimes.html may be a useful article.
[00:19:47] <xissburg> This is the date I have in mongo "date" : ISODate("2014-11-20T00:14:57.131Z"), this is what I get in python 'date': datetime.datetime(2014, 11, 20, 0, 14, 57, 131000)
[00:19:57] <xissburg> and that still gives me a timezone error
[00:24:48] <xissburg> Yea, that date.tzinfo is None, even though it was set to something before saving
[00:27:49] <GothAlice> xissburg: The article I linked explains that.
[00:27:55] <GothAlice> Also explains how to fix it.
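
A minimal sketch of the fix the linked article describes, assuming pytz is installed and using the PyMongo 3 API (collection and field names are invented for the example): PyMongo always stores datetimes in UTC, but by default it returns naive datetime objects; asking the client for tz-aware results restores the tzinfo that xissburg was losing.

    from datetime import datetime

    import pytz
    from pymongo import MongoClient

    # tz_aware=True makes PyMongo attach a UTC tzinfo to every datetime it
    # returns, instead of handing back naive datetimes.
    client = MongoClient('mongodb://localhost:27017', tz_aware=True)
    events = client.test.events

    # Store an aware UTC datetime (PyMongo persists it as UTC either way).
    events.insert_one({'date': datetime.now(pytz.utc)})

    # The retrieved value now carries tzinfo and can be converted for display.
    doc = events.find_one()
    print(doc['date'].tzinfo)                                      # UTC
    print(doc['date'].astimezone(pytz.timezone('Europe/Berlin')))  # local view
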
[00:28:16] <xissburg> Ok.. I will get back to this later
[00:28:18] <xissburg> Thanks!
[00:28:50] <drorh> lol
[00:29:52] <drorh> is it just me or v8 doesnt do destructuring yet?
[01:59:28] <shoerain> why would mongostat --username <username> --password <password> (with the correct values for both) give me assertion 18: auth fails? db.auth(<username>, <password>) works just fine
[02:07:05] <GothAlice> shoerain: Users are tied to databases in MongoDB, so if you added your user to the 'admin' database, you'll need to tell the tools to authenticate against that database specifically, regardless of which other database the user will use.
[02:07:49] <GothAlice> shoerain: --authenticationDatabase
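
For example, if the user was created in the admin database (same placeholders as above):

    mongostat --username <username> --password <password> --authenticationDatabase admin
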
[02:30:28] <shoerain> hmm i see
[02:53:34] <kevc> is it possible to shard over multiple replica sets? so that one collection has a master in NY and another collection has a master in London?
[02:54:06] <GothAlice> kevc: Indeed it is! That's a relatively common setup (sharded replica set).
[02:59:51] <shoerain> So... it looks like I'm making some 50-ish queries on a few database collections in my teensy node.js app every time I serve "GET /". Do I need to worry about batching them myself to reduce the number of queries to the database?
[03:00:45] <GothAlice> Yes.
[03:00:51] <shoerain> Draaaat
[03:00:52] <GothAlice> That sounds like a data locality issue.
[03:01:31] <GothAlice> shoerain: Have I linked you that Java article about schema design yet? (I swear that's my #1 most used link. ;)
[03:01:57] <GothAlice> (http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html)
[03:02:00] <shoerain> I have 50-ish tasks, I look up a collection "Permission" for each task, then I look up a collection "Organization" for each "Permission". so I guess it could be 3x50==150... hmm
[03:02:23] <GothAlice> :|
[03:03:00] <shoerain> heh, what a huge difference testing locally and against a faraway machine makes
[03:03:08] <shoerain> I totally forgot about DB queries....
[03:03:30] <GothAlice> ^_^ Network latency is a thing.
[03:04:23] <kevc> GothAlice: So it seems you can create multiple shards, each with their own replica set. I think where I get stuck is how to say which shards a collection is using. It seems you can only tell a collection to use all shards.
[03:04:53] <GothAlice> Generally the model is to split the replicas across DCs, not the sharding.
[03:05:41] <GothAlice> I.e. you'd have a primary of each shard in the 'local' DC, secondaries in other DCs. (This gives 'local' reads for those other DCs and offsite backup.)
[03:06:48] <GothAlice> You could also have a field, i.e. "dc", on your documents which you shard upon, with the sharding index selecting which DC to send the data to. (But this means that larger queries across DCs get kinda crazy.)
[03:07:09] <kevc> GothAlice: So I get shard A (rs0, primary in NY) and shard B (rs1, primary in London)
[03:08:40] <GothAlice> kevc: http://docs.mongodb.org/manual/core/sharding-shard-key/#sharding-shard-key-query-isolation
[03:08:44] <shoerain> How should I batch a bunch of queries together? this aggregate framework sounds kind of like what I would want...
[03:09:46] <ObjectiveCopley> Not trolling: What advantage does a document based DBMS have over relational? When would I use document based over relational?
[03:11:11] <GothAlice> ObjectiveCopley: I issue a single query, with no joins, returning a single record representing the entire contents of a forum thread, with the sub-documents (replies) sliced to the current page. Query time: unmeasurable. Document systems let you manage, model, and explore your data with fewer (*cough*) constraints.
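
A rough sketch of that single-query forum pattern (collection and field names are made up; replies are embedded in the thread document and paged with a $slice projection):

    from pymongo import MongoClient

    threads = MongoClient().forum.threads

    # One document per thread, with its replies embedded.
    thread_id = threads.insert_one({
        'title': 'Example thread',
        'replies': [{'n': i, 'body': 'reply %d' % i} for i in range(100)],
    }).inserted_id

    # One query returns the thread plus exactly one page of replies.
    page, per_page = 2, 20
    thread = threads.find_one(
        {'_id': thread_id},
        {'title': 1, 'replies': {'$slice': [(page - 1) * per_page, per_page]}},
    )
    print(len(thread['replies']))  # 20 -- only the requested page came back
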
[03:11:16] <kevc> aha, Tag Aware Sharding. Ok, looks like I've got this, thanks!
[03:11:32] <GothAlice> kevc: :)
[03:11:50] <GothAlice> kevc: I might not do this, but apparently it's not totally uncommon.
[03:12:06] <ObjectiveCopley> GothAlice: Hmm.
[03:12:45] <ObjectiveCopley> My coworker and I both have used RDBMS's and Mongo and couldn't think of a use case for Mongo that couldn't be easily solved in a RDBMS but I guess a forum thread would be a good example
[03:12:58] <ObjectiveCopley> Mongo would be much faster I guess
[03:13:22] <GothAlice> ObjectiveCopley: Additionally, MongoDB provides three different query systems (which reuse each other's semantics): normal find (akin to SELECT), map/reduce (map/reduce ;), and aggregate pipelines. Aggregate pipelines (and to a lesser degree map/reduce) can be optimized to run in parallel for some truly extreme performance.
[03:14:00] <GothAlice> http://docs.mongodb.org/manual/core/aggregation-introduction/ gives an overview; it's a very different way of thinking about how you ask questions of and process your data.
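
As a taste of the pipeline style, a sketch against a hypothetical orders collection (the stages roughly correspond to WHERE / GROUP BY / ORDER BY):

    from pymongo import MongoClient

    orders = MongoClient().shop.orders

    pipeline = [
        {'$match': {'status': 'complete'}},            # filter first
        {'$group': {'_id': '$customer_id',             # then group per customer
                    'total': {'$sum': '$amount'},
                    'count': {'$sum': 1}}},
        {'$sort': {'total': -1}},                      # biggest spenders first
        {'$limit': 10},
    ]

    for row in orders.aggregate(pipeline):
        print(row['_id'], row['total'], row['count'])
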
[03:14:22] <ObjectiveCopley> How well does Mongo do with constraints
[03:14:25] <GothAlice> (And one that I find much easier to understand than massive JOIN trees.)
[03:14:27] <GothAlice> It doesn't have them.
[03:14:32] <ObjectiveCopley> Hm
[03:14:37] <kevc> ObjectiveCopley: it's useful when dealing with distributed data
[03:14:39] <GothAlice> Nor does it have referential integrity, nor any way to perform automatic joins of returned data.
[03:14:47] <ObjectiveCopley> So if you absolutely have to have constraints, MongoDB is not a good solution
[03:14:59] <ObjectiveCopley> Something where data integrity is mission critical
[03:15:13] <kevc> ObjectiveCopley: then use SQL for those situations
[03:15:20] <GothAlice> ObjectiveCopley: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html is an excellent (and quite short) overview of how MongoDB thinks about data vs. relational databases when comparing standard data modelling practices.
[03:16:41] <GothAlice> ObjectiveCopley: You *can* implement two-phase commits in MongoDB if you wish, but yeah, pragmatism usually wins. Use the right tool for the job: if you need transactional compliance or enforced referential integrity (instead of referential durability which is the approach in MongoDB), a relational database may be a better solution. It's the same with modelling graphs—or even worse, depending on how naively one tries to implement that in
[03:16:41] <GothAlice> MongoDB.
[03:16:55] <ObjectiveCopley> And I guess if you had a (large) piece of data that could be used by multiple different objects
[03:17:01] <ObjectiveCopley> Mongo would not be good for that situation, right?
[03:17:31] <ObjectiveCopley> At my former startup we managed a lot of uptime data, like pings
[03:17:37] <GothAlice> ObjectiveCopley: Depends quite heavily. A question often raised in this channel revolves around apparent "duplication of data" when the issue is actually one of optimizing query performance.
[03:17:45] <ObjectiveCopley> we found Postgresql and mysql to be absurdly slow for our needs, but mongo was exceptionally fast
[03:17:53] <kevc> We've got a system where data is coming in from multiple sources and new features are being added all the time. We want developers to be able to develop tools querying those datasets as they become available. Doing all of this with say Postgres would be a lot of work.
[03:18:24] <ObjectiveCopley> but that was the only solution we could come up with where mongo made sense
[03:18:59] <ObjectiveCopley> Because you could have a server, which had reports, and each reports had metadata, and if reports ran every 5 minutes, and each report had 3000 pieces of metadata, it became a massive tree of queries
[03:19:14] <ObjectiveCopley> Mongo made it exceptionally fast because we could keep it all in the same object
[03:19:53] <GothAlice> ObjectiveCopley: I have a personal dataset exceeding 25TiB in size, and at work I perform large-scale service monitoring, aggregated logging, and click stat analytics, use a cMS (component Management System, not Content ;), and even wrote our own Sphinx-like Okapi BM-25 full text indexer and parallel ranker for it. All on MongoDB.
[03:20:31] <GothAlice> (The search ranker can rank 10 years of indexed PDFs in less than two seconds. It's so fast we don't even bother to paginate results in the cMS.)
[03:21:00] <GothAlice> (From a city data set; i.e. council meeting summaries, minutes, and bylaws.)
[03:21:59] <GothAlice> We even use it to replace Celery and other distributed task queue systems, a la the presentation I gave on that system: https://gist.github.com/amcgregor/4207375 (1.9 million dRPC requests per second per host at last stress test.)
[03:22:17] <GothAlice> :)
[03:25:20] <GothAlice> Oh… and the authentication service and forums I wrote to power a community site for the second largest alliance in EVE Online. (12K members, also part of a coalition of alliances who also use the services totalling maybe… 16K?)
[03:28:53] <GothAlice> ObjectiveCopley: Apologies for the… tirade… but I wanted to very clearly illustrate that MongoDB can solve many, many different problems in very acceptable ways.
[03:29:09] <scrandaddy> Hey there. I am designing an application that is heavy on http request processing in real time. I would like to use mongo but am concerned about doing a lot of single writes as each request comes in. A solution I thought of is to use redis and batch write to mongo every so often. Is this a better idea? I'm looking at a cloud solution like mongolabs, maybe
[03:29:09] <scrandaddy> performance is something I don't need to worry about in that case?
[03:30:05] <GothAlice> scrandaddy: In theory the majority of execution time will be spent either fetching data over the wire or processing that data, not actually spent inside MongoDB.
[03:31:37] <scrandaddy> Hmm, how exactly are single writes affected by that?
[03:32:13] <GothAlice> Well, first, AFAIK MongoDB has no such concept as a "batched write" (i.e. no multiple insert)… but I'll double check that.
[03:33:18] <Boomtime> yes it does
[03:33:33] <Boomtime> bulk insert, though it's more general than that, "bulk operation"
[03:33:45] <GothAlice> scrandaddy: https://gist.github.com/amcgregor/7be2ec27adc80c9fafa1#file-sync-cvm-py-L21-L54 is an example of our HTTP scraping data processing pipeline from work. (In this case a simplified example that parses RSS instead of having to literally scrape pages, but it still does HTTP for this.) The pipeline is designed to stream—as each job is scraped it's passed up the chain for insertion. (Source gets a minimal set, Parser would then fill
[03:33:46] <GothAlice> in the blanks, and the main JobSync class consumes these "streams".
[03:34:12] <Boomtime> however, bulk ops are wire-protocol only, the operations themselves still take place individually at the server
[03:34:35] <Boomtime> http://docs.mongodb.org/manual/reference/method/db.collection.initializeUnorderedBulkOp/
[03:34:37] <GothAlice> (In this way we can separately scale each step of the processing… the top level JobSync doesn't care about the order of the scraping, so I can just run one which consumes jobs produced by all parsers, etc., and this would simplify batching of data quite substantially if I went that direction.)
[03:36:36] <GothAlice> Boomtime: That's really good to know! I think I'll make use of that.
[03:38:48] <scrandaddy> So using redis to buffer writes might be a good idea?
[03:38:51] <Boomtime> note there is also an 'ordered' version of same
[03:39:01] <Boomtime> scrandaddy: probably not
[03:39:10] <GothAlice> And the important caveat that if there is an error in the data part way through, it'll stop processing that batch.
[03:39:17] <Boomtime> what problem are you trying to solve that redis fixes?
[03:39:41] <Boomtime> GothAlice: that only makes sense for the ordered variant
[03:39:47] <GothAlice> Boomtime: Aye.
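
A sketch of what those bulk operations look like from PyMongo (PyMongo 3 bulk_write API; the documents are made up). ordered=False is the unordered variant Boomtime mentions: the remaining operations still run if one fails, and the errors are reported together at the end.

    from pymongo import InsertOne, MongoClient
    from pymongo.errors import BulkWriteError

    requests_log = MongoClient().app.requests

    ops = [InsertOne({'path': '/item/%d' % i, 'status': 200}) for i in range(1000)]

    try:
        result = requests_log.bulk_write(ops, ordered=False)  # one round trip
        print(result.inserted_count)
    except BulkWriteError as exc:
        print(exc.details['writeErrors'])  # per-operation failures, if any
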
[03:39:50] <scrandaddy> I'm just worried that if I have too many requests coming in doing single writes to mongo I will bog the system down
[03:40:12] <Boomtime> and how does redis fix that?
[03:42:54] <Boomtime> scrandaddy: you want to use redis as a "cache" except that writes have to be written sooner or later to MongoDB.. so the same number of writes still exist, and the cache is invalidated when you do this.. so how does redis help at all?
[03:44:44] <GothAlice> scrandaddy: If you want deferred processing (i.e. one process that streams HTTP data into MongoDB, another that processes that data into real documents) you can insert the raw data into a capped collection that is being watched by a tailing cursor in the process that parses the data, but that will potentially make the "bogging down" problem worse, as the capped collection is by definition limited in size. This can give you some buffer time to
[03:44:44] <GothAlice> scale up more workers when needed, though.
[03:45:18] <scrandaddy> Isn't redis going to be faster and make the requests faster?
[03:45:34] <scrandaddy> And then I can write to mongo offline
[03:46:27] <scrandaddy> Hmm that could work I will look at capped collections
[03:46:28] <Boomtime> you seem to be inventing reasons to use redis, but can't quantify those reasons
[03:46:29] <GothAlice> scrandaddy: As a note, redis scaling badly as a way of tracking distributed task execution (similar to using it to buffer data processing), oddly enough, is what killed my last project.
[03:47:11] <Boomtime> if you don't want persistent storage then you might be better off with redis only
[03:47:29] <Boomtime> do you want the data stored persistently or not?
[03:48:43] <scrandaddy> Yeah, it will eventually live in s3 but I would like to make some of it available over an api before hand and my understanding is that redis might only live for minutes at a time
[03:48:44] <Boomtime> adding redis to your setup (in conjunction with mongodb) will add a new point of failure, add latency to all persistent storage (your data is more vulnerable) and add complexity to your deployment
[03:49:21] <Boomtime> meanwhile, you are not able to express exactly what benefit it offers
[03:49:49] <GothAlice> Optimization without measurement is by definition premature.
[03:49:54] <scrandaddy> Fair enough, you guys have convinced me
[03:50:24] <scrandaddy> I'll stick to mongo for now and checkout capped collections.
[03:50:51] <scrandaddy> GothAlice: what if instead of watching with a tail cursor I just processed the raw data every minute?
[03:51:41] <GothAlice> scrandaddy: https://gist.github.com/amcgregor/4207375 may be helpful. In that situation (which you certainly could do!) you'd have to make very sure that your capped collection is large enough to handle the expected time period… with sufficient head-room to give you time to react to scaling issues.
[03:52:16] <GothAlice> (At work, using a capped collection as per that presentation, we estimated 8GB capped collection to maintain 1M simultaneous users' games.)
[03:52:50] <GothAlice> The "head-room" on that was that we didn't have a million users… not even close. ;)
[04:01:05] <scrandaddy> Oh ok cool
[04:01:15] <scrandaddy> Thanks for the advice
[07:56:27] <Duikb00t> When do you use MongoDB? I am now only working with MySQL.
[07:58:28] <GothAlice> Duikb00t: http://irclogger.com/.mongodb/2014-11-19#1416454077-1416452929 :)
[08:00:02] <Duikb00t> So it's made for large web applications with large and real-time data?
[08:01:01] <GothAlice> Or tiny ones. (Forums are conceptually quite simple an application.)
[08:02:22] <GothAlice> One of my work datasets is ~100MB, my home dataset (as mentioned in the log) is rather terrifyingly large.
[08:03:13] <Duikb00t> That isn't so large I think?
[08:04:10] <GothAlice> 100MB isn't very much at all. (Maybe 20,000 records total in that set spread across several collections; most of that data was slurped via web scraping.)
[08:05:31] <GothAlice> Duikb00t: I do have to apologize, though, for having to run. It's 3am here and I need some offline maintenance so I can be functional tomorrow. ;) I mainly wanted to point you at that log as your question isn't uncommon.
[09:09:06] <Sunuba> Greetings all. I am new and I do not know how I should ask for help here or privately. Can someone please guide me?
[09:11:10] <Sunuba> Anyone here?
[10:17:01] <kevc> When I try to --upgrade on mongos it shows "Config database is at version v5"
[10:18:36] <kevc> but when I look at db.locks.all() there's an entry there for "upgrading config database to new format v5"
[14:28:52] <Mmike> Is there somewhere a list of return codes that rs.add() returns?
[15:29:18] <sekyms> If I have a schema and one of the values is an array
[15:29:38] <sekyms> how can I add a new array item to the existing array?
[15:30:07] <Derick> db.collection.update( { _id: yourid }, { $push: { arrayfield: 'newvalue' } } );
[15:30:22] <sekyms> oh cool there is $push
[15:30:31] <sekyms> Derick is the syntax the same in mongoose?
[15:34:12] <Derick> sekyms: sorry, I don't know mongoose
[15:34:22] <sekyms> no worries, what you gave me was helpful
[15:34:23] <sekyms> cheers
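
The PyMongo equivalent of Derick's $push, for reference (collection and field names invented):

    from pymongo import MongoClient

    coll = MongoClient().test.things
    doc_id = coll.insert_one({'tags': ['a']}).inserted_id

    # $push appends to the existing array in place; $addToSet would do the
    # same while skipping duplicates.
    coll.update_one({'_id': doc_id}, {'$push': {'tags': 'b'}})
    print(coll.find_one(doc_id)['tags'])  # ['a', 'b']
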
[17:14:03] <AlexZanf> Hey guys, I need to do a deep populate within a geoNear. How would i do that?
[17:14:13] <Derick> what's a deep populate?
[17:23:15] <AlexZanf> Derick, so lets say, i have articles, and articles has comments, and comments has tags, I would like to get articles, with the comments and with the comment tags
[17:23:16] <brianseeders> I think it's a mongoose term
[17:23:54] <AlexZanf> not using mongoose, thought it was a standard term, if not, how do you refer to that?
[17:29:23] <brianseeders> you would typically manually query for the related documents in your application code
[17:29:33] <brianseeders> i'm not super familiar with mongoose, but I believe it has a helper for it called populate
[17:30:31] <AlexZanf> brianseeders, oh I'm not using mongoose, but is there an example of manually querying? wouldn't that be slow? if you get 1000 results, you have to manually query all 1000, and say they each have 100 associations, then you need to query those, and so on
[17:32:53] <brianseeders> yeah
[17:33:20] <brianseeders> although you should be able to collect up all the IDs you need to pull on the second step and do them in bulk with something like $in
[17:35:08] <brianseeders> mongo shines when the data is de-normalized and you don't need arbitrary joins like that
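
A sketch of that batched lookup for the articles/comments/tags case (three queries total instead of one per document; the reference field names are assumed from the discussion):

    from pymongo import MongoClient

    db = MongoClient().blog

    # Step 1: whatever query produces the articles (geoNear, find, ...).
    articles = list(db.articles.find().limit(100))

    # Step 2: one query for every comment referenced by those articles.
    comment_ids = [cid for a in articles for cid in a.get('comment_ids', [])]
    comments = {c['_id']: c for c in db.comments.find({'_id': {'$in': comment_ids}})}

    # Step 3: one query for every tag referenced by those comments.
    tag_ids = [tid for c in comments.values() for tid in c.get('tag_ids', [])]
    tags = {t['_id']: t for t in db.tags.find({'_id': {'$in': tag_ids}})}

    # Stitch the tree back together in application code.
    for a in articles:
        a['comments'] = [comments[cid] for cid in a.get('comment_ids', [])]
        for c in a['comments']:
            c['tags'] = [tags[tid] for tid in c.get('tag_ids', [])]
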
[17:37:45] <sekyms> Is there a more appropriate place to ask a Mongoose question?
[17:41:31] <sekyms> I'm not sure why this isn't working https://gist.github.com/smykes/6f9b86d5c9c1bb26354f
[17:47:44] <kephu> hi
[17:48:27] <brianseeders> There's a #mongoosejs sekyms
[17:48:33] <AlexZanf> brianseeders, oh awesome, have a good example of that?
[17:48:35] <sekyms> oh there is?
[17:48:40] <sekyms> how did I miss that
[17:48:42] <sekyms> thank you brianseeders
[17:49:06] <kephu> in my aggregation query I wanted to pick the largest of the two: $size: {$setDifference: [array1, array2]} and $size: {$setDifference: [array2, array1]}
[17:49:17] <kephu> but I seem to be messing up or approaching this the wrong way
[17:49:22] <kephu> how do i do this?
[17:49:56] <GothAlice> kephu: http://docs.mongodb.org/v2.6/reference/operator/meta/max/
[17:50:07] <GothAlice> Er, wrong $max. Uno momento. XD
[17:50:53] <kephu> GothAlice: yeah, I figured that had something to do with that, but I'm apparently messing it up a bit, I think
[17:51:40] <GothAlice> http://docs.mongodb.org/v2.6/reference/operator/aggregation/max/#grp._S_max — this would require unwinding on the array of the two sizes first
[17:52:52] <kephu> "$max is an accumulator operator available only in the $group stage." gah
[17:54:25] <GothAlice> OTOH… I'm wondering if you really need to do that $setDifference twice… if you're only counting the length of the intersection (or length of the difference… since a~b is congruent to b~a when comparing common membership…)
[17:56:06] <kephu> GothAlice: yeah tbh my set maths are sorta rusty
[17:56:44] <kephu> GothAlice: yeah, thats my problem right there
[17:56:55] <kephu> anyway, I gotta group them thangs at some stage anyway
[17:56:58] <rendar> GothAlice: lol "uno momento" ? its "un momento" :P
[17:57:12] <GothAlice> rendar: Meh. :P I'm allergic to melk.
[17:57:20] <rendar> eheh
[17:57:42] <rendar> hmm melk? my translator can't find it
[17:58:20] <GothAlice> rendar: Mispronunciation of "milk" — my friends and I have some atypical ways of saying things, so don't mind my butchering of other language snippets.
[18:00:34] <rendar> lol
[18:02:20] <GothAlice> kephu: When working with sets… it's pretty important to have a solid grasp of the basic set operations. Combining them in creative ways is the path to greatness lined with much mathematical theory.
[18:02:45] <kephu> well yeah
[18:03:06] <kephu> but yeah, I was right on that one, I *do* need two operations here :D
[18:03:30] <GothAlice> There's an alternate approach…
[18:05:01] <GothAlice> Get the length of the union of the two sets subtracted from the maximal of the length of those two lists. You could potentially use an $if (to pick the larger of the two) with a $let (to define the two lengths for the conditional).
[18:06:31] <GothAlice> (You could also implement a non-$group'd $max that way, too, for the first case. I'd pre-calculate and store the lengths (or single maximal length) if this will be a common query. I.e. every time you $push, also $inc, each time you $pull also $inc: -1)
[18:07:07] <GothAlice> Pre-calculation of that can become a PITA if you don't carefully handle maintaining those lists as actual sets.
[18:22:04] <kephu> GothAlice: okay, I am at a loss on how to actually use $max :D
[18:25:58] <jshultzy> hey guys, anyone kind enough to help me out with a problem im having in my mongo master/slave config?
[18:29:32] <jshultzy> i've searched all over the net including the mongo forums but couldn't find a solution. Basically what's happening is that I'm receiving the following error in the mongo logs on the slave: [replslave] all sources dead: data too stale halted replication, sleeping for 5 seconds. I tried shutting down the mongo daemon, clearing out the data directory on the slave, and doing a full re-sync from master, but what happens is that
[18:29:32] <jshultzy> Thu Nov 20 18:17:51.842 [replslave] repl: applied 3 operations
[18:29:32] <jshultzy> Thu Nov 20 18:17:51.890 [replslave] repl: end sync_pullOpLog syncedTo: Nov 19 19:17:11 546cecb7:d
[18:29:32] <jshultzy> Thu Nov 20 18:17:51.890 [replslave] repl: sleep 1 sec before next pass
[18:29:33] <jshultzy> Thu Nov 20 18:17:52.899 [replslave] repl: syncing from host:10.0.4.179
[18:29:33] <jshultzy> Thu Nov 20 18:17:52.899 [replslave] repl: old cursor isDead, will initiate a new one
[18:29:33] <jshultzy> Thu Nov 20 18:17:52.971 [replslave] repl: nextOpTime Nov 20 13:10:08 546de830:8 > syncedTo Nov 19 19:17:11 546cecb7:d
[18:29:34] <jshultzy> repl: time diff: 64377sec
[18:29:34] <jshultzy> repl: tailing: 0
[18:29:35] <jshultzy> repl: data too stale, halting replication
[18:29:35] <jshultzy> Thu Nov 20 18:17:53.015 [replslave] caught SyncException
[18:29:36] <jshultzy> Thu Nov 20 18:17:53.015 [replslave] repl: sleep 10 sec before next pass
[18:29:36] <jshultzy> [replslave] all sources dead: data too stale halted replication, sleeping for 5 seconds
[18:29:48] <Xe> jshultzy: http://pastebin.com exists
[18:29:54] <Xe> flooding the channel is considered rude
[18:30:32] <jshultzy> sorry, but I'm new to IRC. I just set up an account to see if I can get any help as a last resort
[18:34:53] <kali> it looks like your server is too out of sync to recover with the oplog
[18:35:14] <kali> jshultzy: ^
[18:35:48] <jshultzy> thanks kali for responding. hmm, even after clearing out the old database data directory from the slave and re-syncing, what do you recommend to do
[18:36:48] <kali> how much time did it take to get to the error you get ?
[18:37:17] <jshultzy> oh gosh probably like 14 or so hours of running the resync to pull all the 850GB of database over to the slave
[18:37:43] <kephu> GothAlice: okay i am at a complete loss here :|
[18:38:26] <jshultzy> maybe a little less or more but it was running since yesterday
[18:39:18] <kali> jshultzy: yeah, ok. when you say "master/slave"... are you actually using master/slave or is it a replica set ?
[18:40:18] <jshultzy> i think its a master/slave... the following is from the config:
[18:40:18] <jshultzy> slave = true
[18:40:18] <jshultzy> source = 10.0.4.179
[18:40:35] <kali> ho, boy.
[18:41:18] <kali> master/slave has been deprecated for ages
[18:41:35] <kali> what version of mongodb are you running ?
[18:42:20] <jshultzy> 2.4.6 i believe
[18:43:28] <kali> ok, not so bad. well, i have no idea how to operate and maintain actual master/slave setup
[18:44:16] <jshultzy> think its just time for me to upgrade and use replset instead of master/slave. Thanks anyways kali
[18:44:33] <kali> it might help yeah
[18:45:14] <kali> one more thing: your actual problem is that the oplog is too small for the amount of time the slave needs to pull your data from the master
[18:45:36] <jshultzy> can i change the size of oplog?
[18:45:51] <kali> well, i know how to do that on replica set.
[18:46:02] <GothAlice> kephu: You have two possible approaches: $if with $let to calculate the lengths for use in the condition and for use in the true and false result output (simulate "max" on two elements) or use $group and $max in the $group projection. (This will require a $unwind before-hand.)
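
A sketch of the first approach. Note the conditional is spelled $cond in the aggregation framework (there is no $if), and $let (2.6+) binds the two sizes so each $setDifference is evaluated only once; the array1/array2 field names follow kephu's description.

    from pymongo import MongoClient

    coll = MongoClient().test.sets

    pipeline = [
        {'$project': {
            'larger_diff': {
                '$let': {
                    'vars': {
                        'ab': {'$size': {'$setDifference': ['$array1', '$array2']}},
                        'ba': {'$size': {'$setDifference': ['$array2', '$array1']}},
                    },
                    'in': {'$cond': [{'$gte': ['$$ab', '$$ba']}, '$$ab', '$$ba']},
                },
            },
        }},
    ]

    for doc in coll.aggregate(pipeline):
        print(doc['_id'], doc['larger_diff'])
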
[18:49:10] <kali> jshultzy: it is possible that just stopping the master, changing the command line argument (or conf) and starting it again does the trick, but I have no first (or even second) hand experience in the matter
[18:49:21] <kali> jshultzy: http://docs.mongodb.org/manual/core/master-slave/
[18:49:50] <kali> jshultzy: it's actually very likely to work, tbh
[18:50:46] <jshultzy> what should I change in the conf?
[18:51:50] <kali> jshultzy: the oplog size. but i would feel better if you find out somebody who has done this before on a master/slave setup :)
[18:53:42] <kali> jshultzy: for reference, this is the procedure for a replica set: http://docs.mongodb.org/manual/tutorial/change-oplog-size/
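
For reference, in the legacy master/slave mode the oplog size is a startup option on the master, so the change would look roughly like this (the size is in megabytes and the values here are only illustrative):

    mongod --master --oplogSize 20480 --dbpath /data/db
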
[18:53:57] <jshultzy> Kali, I will look into this but you gave me something to look into now. Before, I literally was out of ideas. Thank you soo much!
[18:54:37] <kali> jshultzy: it involves keeping a "seed", but i think this is only relevant on a secondary
[19:22:14] <shoerain> hmm is there an equivalent of mongoose's populate on the node-mongodb-native driver?
[19:32:10] <AlexZanf> brianseeders, still around?
[20:09:50] <shoerain> hmmm yeah I can
[20:10:27] <shoerain> er, yeah I can't seem to find something akin to http://mongoosejs.com/docs/populate.html in node-mongodb-native. DBRef/manual references seem like what I would want
[20:14:22] <mtrivedi> Hello IRC users, we want to achieve write scaling in a production mongodb cluster. For my tests, I have added a hashed shard key to achieve even writes to all shards, but overall throughput remains the same as before using the hashed key
[21:36:53] <qutwala> hi, there, collection.findAndModify(criteria[, sort[, update[, options]]], callback) wont work on 2.6.3 version :( Any ideas for callback?
[21:38:49] <reelman14> Hi, even after creating an index on all the fields queried on, mongodb is scanning the entire collection. I have the index creation and query both here: http://pastie.org/9733221 Can you please tell what is wrong with the index, or what the right index to create is?
[21:49:02] <ianp> is there a convenient way to do an update to copy a property tree to another place in mongo? like I have document.props which is nested json
[21:49:14] <ianp> and i want to copy it to document.otherprops.props
[21:49:37] <cheeser> pull it in to your app, write it back out
[21:49:54] <ianp> so in other words, write some code using a driver to do it
[21:49:59] <cheeser> yeah
[21:50:02] <ianp> K
[21:50:11] <ianp> probably be easier even if there was a query to do it
[21:50:12] <GothAlice> Well, by "write it back out" a better approach would be to $set the new value and $unset the old one, rather than pushing the whole document back out.
[21:50:19] <ianp> I don't care about unsetting it even
[21:50:25] <ianp> I just have an app that's looking for it in the other place
[21:50:30] <ianp> want to patch it in production
[21:51:26] <GothAlice> var props = db.example.findOne({_id: ObjectId('…')}, {props: 1}).props; db.example.update({_id: ObjectId('…')}, {$set: {'otherprops.props': props}})
[21:56:20] <GothAlice> ianp: Also, that sounds like a rather terrible plan. :/
[21:57:08] <GothAlice> (Without unsetting) each record will grow, and potentially grow substantially if that's a nested document being moved, so each record will have to be moved in order to accomplish the update.
[21:57:51] <GothAlice> (I.e. an update like that will thrash your IO for some time.)
[21:58:56] <ianp> It's only 500 records
[21:59:17] <GothAlice> Read-only data?
[22:00:42] <GothAlice> (Asking the last due to the duplication issue; if those nested values are written to, you'll have to worry about moving the data back to the top-level field.)
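
Following the point about unsetting the old field, a one-pass sketch of that production patch (field names taken from the conversation, so treat them as hypothetical):

    from pymongo import MongoClient

    coll = MongoClient().mydb.example

    # Move props -> otherprops.props and drop the old copy in the same update,
    # so documents don't grow and the data isn't duplicated.
    for doc in coll.find({'props': {'$exists': True}}, {'props': 1}):
        coll.update_one(
            {'_id': doc['_id']},
            {'$set': {'otherprops.props': doc['props']},
             '$unset': {'props': True}},
        )
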
[22:06:02] <hfp_work> Hey all, I have a running `mongod` on my machine. I want to run some queries against it so I do `mongo`. And then nothing happens. mongo just hangs there saying nothing whatsoever. If I enter any command, nothing happens. What am I missing?
[22:06:50] <GothAlice> hfp_work: If you run the following, do you see the running mongod process? ps aux | grep mongod
[22:07:24] <GothAlice> (Also, what command-line arguments did you pass to that mongod process? If you specified a configuration file, could you pastebin it?)
[22:08:13] <hfp_work> GothAlice: Yes I see it and I launched it with `mongod --dbpath=db`
[22:09:51] <GothAlice> hfp_work: So far so good. What's the output (should be one line) of: sudo lsof | grep mongod | grep TCP
[22:09:55] <GothAlice> (This command may take some time to run.)
[22:10:13] <GothAlice> (And I only need end of the line after "TCP".)
[22:11:20] <hfp_work> GothAlice: `*:27017 (LISTEN)`
[22:19:17] <sandis> Im new to mongodb from postgresql. Is there a good "migration guide" to get the thinking?