[00:54:35] <josaliba> If I run the script once, it creates two documents
[00:55:47] <Boomtime> that will insert a new document each time it runs, so if you've checked this twice, even accidentally, you'll have two (or more) documents
[01:08:51] <Boomtime> emphasis on the S: mongos is a real process but it means a very different thing from mongod
[01:10:43] <josaliba> I'm reinstalling everything to see
[01:14:25] <josaliba> didn't help, still have the same problem :(
[01:15:16] <Boomtime> you seem to have the same problem with everything - two mongodb instances will not cause two documents to be inserted
[01:16:00] <Boomtime> meanwhile, two instances of the app would not cause two documents to be inserted either - somehow the same script is being called twice
[01:17:02] <Boomtime> php echo goes back to the client that is rendering that output, it means nothing if the client only renders one stream - where would you look to see the other stream?
[01:17:36] <Boomtime> mongodb is seeing two different processes connect and each one inserts a document
[01:18:29] <Boomtime> given that the process IDs appear to be sharing the same internal counter, I would suggest it is likely these are threads of the web-server
[01:18:47] <Boomtime> the timestamps are identical
[01:20:05] <Boomtime> all the evidence points at two threads on the web-server, either the script is being invoked twice by a client, or it is effectively running twice for some other reason
[01:20:55] <Boomtime> you can enable deeper logging on the mongod to see if it confirms there are two connections
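A minimal sketch of the "deeper logging" idea Boomtime mentions, assuming a 2.6-era deployment and run in the mongo shell against the server in question:

    // Raise the server-wide log verbosity so every incoming connection and the
    // operations it issues show up in the mongod log.
    db.adminCommand({ setParameter: 1, logLevel: 2 })

    // ...reproduce the double insert, then inspect the log for two connections...

    // Restore the default verbosity afterwards.
    db.adminCommand({ setParameter: 1, logLevel: 0 })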
[03:44:19] <zzing> I am wondering if mongo would be a good fit for data that contains tables of tabular data — but each table has metadata such as a citation, caption, name — also where it might be desired to be able to search both the metadata and the tabular data.
[03:45:13] <zzing> (and the tabular data is almost certainly unique (down to the column names and types) for each table)
[03:56:34] <zzing> I have been considering some models, from a relational to various 'nosql' types. I know exactly what to expect for the relational, but this data is not very relational.
[04:20:20] <zzing> Is there any way of recording/tracking changes to documents? I am thinking it is a useful idea, but not essential.
[04:22:57] <Boomtime> the oplog tracks changes to all documents by necessity though it is of a fixed size (circular buffer) and I suspect you mean more in a "historical versions" type of way
[04:23:34] <Boomtime> it would be up to you to build a versioning system for your data if that is what you want
[04:29:52] <zzing> Boomtime, sounds like an interesting challenge :-) I will probably figure something.
[04:33:51] <GothAlice> zzing: Versioning is indeed fun; at work we parse the oplog stream for updates of interest and record them permanently in a separate audit log collection.
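A rough shell sketch of the approach GothAlice describes; the database, collection, and namespace names here ("mydb.stuff", "audit.log") are placeholders, and a real implementation would use a tailable cursor rather than a one-off scan:

    // Copy oplog entries touching a collection of interest into a permanent
    // audit collection before the (capped) oplog rolls over.
    var oplog = db.getSiblingDB("local").oplog.rs;
    var audit = db.getSiblingDB("audit").log;

    oplog.find({ ns: "mydb.stuff", op: { $in: ["i", "u", "d"] } }).forEach(function (entry) {
        audit.insert(entry);
    });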
[04:34:50] <zzing> Realistically, most of the updates will be small things, so it wouldn't be too bad to do a kind of diff to roll back changes
[04:35:05] <zzing> But luckily most of the data is enter once and query only
[04:35:44] <Boomtime> the oplog is a record of the operations that you perform, which might be close to a diff if you structure your updates properly
[04:36:12] <Boomtime> but the oplog collection itself will rollover after a while, so you'd need to record it permanently the way GothAlice describes
[04:36:30] <Boomtime> or you can roll your own some other way
[04:37:26] <zzing> I could do it more easily by recording changes when I make them through the REST interface
[04:37:42] <GothAlice> It lets me "seek back" through changes to individual fields and apply them in reverse (pushes/pops) or simply lets me jump to explicitly $set values from any point in time.
[04:38:15] <GothAlice> zzing: That was my initial approach, but I found parsing the oplog to be easier than updating waaaay too many queries. (I wanted auditing of *everything*, not just certain fields or certain collections.)
[04:39:03] <Boomtime> GothAlice: how do you undo a $set ?
[04:39:14] <GothAlice> By jumping to the $set prior to that one.
[04:39:28] <Boomtime> assuming you have everything, fair enough
[04:39:36] <GothAlice> (or to the original insert if we've run out of updates to scan ;)
[04:41:33] <zzing> GothAlice, shall be interesting when I get there. I am at the paper stage planning everything that I will be doing first :P
[04:42:22] <GothAlice> There's also a nifty hack involving use of $comment to pass data (like, say, the "effective user" ObjectId of the browser session initiating the query) around. :3
[04:43:17] <GothAlice> I resisted the ironic temptation to encode JSON data into it. ;)
[08:21:16] <foofoobar> Hi. I'm trying to filter in a collection for every row where "archived" == false or "archived" is not set, what is the idiomatic way to do this?
[08:22:12] <foofoobar> (archived should be a boolean)
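Two ways to express that filter in the shell, assuming "archived" is only ever a boolean when present; the collection name is a placeholder:

    // Explicit form: false OR missing.
    db.items.find({ $or: [ { archived: false }, { archived: { $exists: false } } ] })

    // Shorter form: anything that is not literally true.
    db.items.find({ archived: { $ne: true } })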
[15:37:07] <SahanH> what's the ideal document, http://d.pr/n/1b78j+ or http://d.pr/n/ydpC+
[15:42:48] <Mmike> So, I'm trying to do rs.isMaster() in python, and then parse the output, but I get this ISODate(...)... is there a way to get rid of it?
[15:43:20] <brianseeders> anyone have experience with https://github.com/dropbox/hydra and MongoDB 2.6?
[15:43:43] <brianseeders> Trying to figure out a way to migrate a large dataset with minimal downtime
[15:44:20] <brianseeders> That doesn't use replication
[15:48:38] <Mmike> brianseeders: why not? Use replicasets
[15:52:57] <brianseeders> because our existing dataset doesn't work with replicasets
[15:53:41] <brianseeders> I've tried it every major version release, and it's never able to do the initial data sync
[15:55:43] <brianseeders> For example, when I tried it on 2.4.x, the collections would get duplicate key errors on the initial sync and just stop
[15:56:00] <brianseeders> But if you query the collection for the stated key, there's only one document
[16:44:31] <theRoUS> Derick: i have a db with a lot of records concerning nagios alerts. i'd like to do some data reduction/grouping by type, but the only field with the relevant info has '<hostname>/<check> <status>'. i'd like to select and group on just the '<check> <status>' portion.
[16:44:57] <theRoUS> is there a way to do that? that I can experiment with in the mongo tool before going to code?
[16:52:04] <Derick> theRoUS: is it stored as a string or as multiple fields?
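If it turns out to be a single string, one way to experiment in the mongo shell is to strip the "<hostname>/" prefix client-side and tally what remains; the collection and field names here are guesses:

    // Group nagios alerts by the "<check> <status>" portion of a
    // "<hostname>/<check> <status>" string.
    var counts = {};
    db.alerts.find({}, { event: 1 }).forEach(function (doc) {
        var key = doc.event.substr(doc.event.indexOf("/") + 1);  // "<check> <status>"
        counts[key] = (counts[key] || 0) + 1;
    });
    printjson(counts);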
[17:24:22] <drorh> Hi. I come from a relational background and have a strong urge to fall back to MySQL, although the project I'm doing dictates MongoDB... I was wondering if there is a "when not to use mongo" sort of article to read...?
[17:26:38] <kali> drorh: why don't you tell us what you're struggling with instead?
[17:30:24] <drorh> kali, In the project, which deals with flight trips, I consume 3 APIs, each of which has its own structure and identifiers for things that are potentially the same (other than pricing, potentially).
[17:31:54] <drorh> kali, the front end is to consume this mongo DB regardless of which API the data came from
[17:41:50] <GothAlice> (Note the Java article linked there; it does a good job summarizing how to think about MongoDB structuring from a relational background.)
[17:42:07] <drorh> I'm thinking: each document represents a unique trip, and embeds 3 arrays of documents, each of which is a list of the potential configurations per API. Am I on the right track?
[17:42:33] <GothAlice> There are some constraints when you have multiple lists in a document; you can effectively only deeply query one at a time, if that's an issue.
[17:46:14] <drorh> I'm not sure I'm going to need to deep query.
[17:48:05] <GothAlice> Perfect, then such an arrangement can certainly work out, though I'd need to see schema ("arrays of documents" sounds scary ;) to see if this is the best fit for your use case. (I.e., what questions do you need to ask of the data?)
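A possible shape for the document drorh describes, with one array of candidate configurations per upstream API; every field name here is a placeholder rather than a recommendation:

    // One document per unique trip, embedding per-API configuration lists.
    db.trips.insert({
        origin: "TLV",                       // origin is fixed for this project
        destination: "LON",
        depart_date: ISODate("2015-03-01"),
        return_date: ISODate("2015-03-08"),
        api_a: [ { price: 420, carrier: "XX", raw_id: "a-123" } ],
        api_b: [ { price: 395, carrier: "YY", raw_id: "b-987" } ],
        api_c: [ ]
    })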
[17:49:54] <GothAlice> Oh, your old pastebin links still work.
[17:54:04] <GothAlice> I used an iPad for three months as my primary development machine… just needs the right apps. ;) (TextExpander, Colloquy, Textastic programmers' editor, Cathode SSH client, Transmit for general purpose connectivity, etc.)
[18:08:04] <GothAlice> drorh: One really important thing when using devices with intermittent connectivity: run an IRC bouncer somewhere. Then you won't miss messages if you lose connection. :)
[18:08:14] <GothAlice> (It's why I basically never go offline.)
[18:19:51] <drorh> GothAlice the simplest question I'd ask the data is: give me the cheapest trip information for a specified destination (origin is fixed) and date interval
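Against a shape like the one sketched above, that question could look roughly like this; the denormalised "cheapest_price" field is an assumption (maintained whenever the per-API arrays are updated), not something from the conversation:

    // Cheapest trip to a destination within a date window, origin being fixed.
    db.trips.find({
        destination: "LON",
        depart_date: { $gte: ISODate("2015-03-01"), $lte: ISODate("2015-03-15") }
    }).sort({ cheapest_price: 1 }).limit(1)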
[18:29:50] <GothAlice> brianseeders: What steps, exactly, did you follow to migrate that?
[18:31:42] <brianseeders> Well, I went by this: http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
[18:32:37] <GothAlice> brianseeders: The error messages you had were on a fresh secondary you were spinning up (as part of the "Expand the Replica Set" step?)
[18:33:12] <brianseeders> I stopped mongod on the standalone, edited my conf file to include replset=rs0, started it back up, and did rs.initiate()
[18:33:56] <brianseeders> then i created a new instance, edited /etc/hosts to ensure that the host for the first replica would resolve, added replset=rs0 to the config, and started it up
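For reference, the same steps in shell form, with placeholder hostnames; the config-file option is spelled replSet in the old ini-style format:

    // On the former standalone, after restarting it with "replSet=rs0":
    rs.initiate()

    // Once the new host is running with the same replSet name and hostnames
    // resolve both ways, add it from the primary:
    rs.add("mongo2.example.com:27017")

    // Watch the initial sync progress:
    rs.status()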
[18:35:14] <brianseeders> the new replica fetched several databases/collections fine, but when it got to that one, that error appeared
[18:35:20] <GothAlice> (Though shouldn't you be using a hostname instead of IP there? ;)
[18:35:26] <brianseeders> and then it started the sync again from scratch
[18:35:58] <drorh> GothAlice ur silence is perfectly understandable:). rest assured I'll be back more co concerte
[18:36:44] <drorh> wow I managed to stutter in a text message!
[18:36:45] <GothAlice> drorh: The "final destination" ordered by price is a relatively simple query. What types of queries would you need to make against the "api" data?
[18:40:05] <GothAlice> Correcting that is beyond my ability to assist; like having the same key twice in a document (which is apparently possible) it's something I've never seen before. :/
[18:41:00] <drorh> GothAlice the admin, as an expert in the field, will have a panel through which he configures the frequency (or complete lack) of destination+api+timeframe calls to the remote service
[18:42:17] <drorh> GothAlice this is the easy part the way I see it
[18:43:38] <GothAlice> The key question for me is: when accessing a stored route, do you need access to all of the bound APIs? Conversely, when accessing the information for a *single* API, do you need access to the rest? The route itself? (Likely "no, no, yes")
[18:44:32] <brianseeders> So I did a repair on that database, and getCollectionNames() now only returns one instance of "forms", but db.stats() still reports too many collections
[18:44:42] <brianseeders> Wonder what will happen now
[18:45:09] <GothAlice> brianseeders: If I encountered this, the first thing I'd do is a very thorough mongodump, then a filesystem snapshot with mongod offline. Just in case.
[18:45:30] <drorh> GothAlice I'm not sure yet if the arch spec says that the agent (the ongoing script that fills mongo) is the data source to the front, or there is an rdb in between
[18:45:30] <GothAlice> (There's no such thing as too many backups.)
[18:45:47] <brianseeders> I'm working off of a cloned snapshot :)
[18:45:56] <GothAlice> brianseeders: :) That makes me extremely happy.
[18:47:34] <drorh> GothAlice the front is api agnostic, up until billing comes in
[18:48:03] <GothAlice> drorh: I'm not entirely clear on the impact of your last two statements.
[18:51:23] <drorh> GothAlice the 2nd statement is still problematic?
[18:52:47] <GothAlice> From what I can tell you have several components: A front-end doing costing analysis for trip planning (neat, btw!), a back-end runner that continually refreshes the data from external APIs, and a management interface to control the behaviour of the runner, correct?
[18:52:49] <drorh> GothAlice the billing note should have been named booking note
[18:54:28] <GothAlice> As an interesting note, if you *do* need to integrate a relational database for (one would hope would be an excellent reason like transactional safety ;) I would recommend Postgres; if you use that you can connect Postgres to MongoDB and include MongoDB data in your Postgres queries without having to duplicate the data.
[19:00:14] <GothAlice> I'd have to ask: which "side" of the API does the question relate to? Back-end database interface? Front-end public API published by your app?
[19:01:57] <GothAlice> One generally writes code against a database layer (ORM, ODM, etc.) and some of these layers are truly agnostic. (For an example from Python, SQLAlchemy supports basically all forms of relational back-end, so you can freely migrate if need be without modifying your code.)
[19:02:13] <GothAlice> Unfortunately because of the fundamental differences in approach between document and relational databases, being able to be agnostic to *that* is much harder.
[19:03:00] <GothAlice> (You could bind MongoDB collections entirely into Postgres and use them completely through Postgres… but you're completely defeating the point of having MongoDB at all in that setup, you might as well drop it and just use Postgres.)
[19:03:28] <drorh> GothAlice by api/s I only mean the remote services consumed by the agent. by front-end I mean queries by an end user, roughly and succinctly
[19:04:31] <GothAlice> Right. The front-end is completely separate from the agent pulling in the data; the agent's job would be to transform foreign data structures returned by API calls into the format your front-end would expect.
[19:05:03] <GothAlice> (You would likely need to write specialized conversion routines for each external API you want your agent to consume.)
[19:05:24] <GothAlice> https://gist.github.com/amcgregor/7be2ec27adc80c9fafa1#file-sync-cvm-py-L21-L54 is an example of one of our API adapters. :)
[19:05:45] <GothAlice> (We're pulling in job data instead of travel data, but the idea is basically the same.)
[19:07:08] <ianp> Spring-data is like that for java based platforms. So grails (groovy-based rails thing on java) uses spring data and you can use it with mongo or SQL db's
[19:07:24] <ianp> It's quite nice considering the complexity of the problem
[19:08:01] <ianp> anyway, that's why the 'DAO' pattern exists
[19:08:13] <ianp> you put all your db specific stuff in one layer
[19:08:39] <ianp> not sure what you're thanking me for .. but yw :>
[19:08:49] <GothAlice> ianp: Heh, the last project we used DAO on had most of our unit tests testing the abstraction layer instead of our own code. ¬_¬
[19:10:25] <ianp> but still an example of people not thinking about what they're actually doing
[19:11:15] <GothAlice> "Look, I just wrote this neat test that makes sure we can store some of our more complicated structures and read them back out!" "Uhm… if we can't insert and fetch we've already lost. Good job."
[19:13:46] <drorh> I'm not sure this was referenced to me, but if it was then ouch
[19:14:18] <GothAlice> Heh, no, general commentary on using abstraction and spending too much time fretting that the abstraction even works. (The abstraction layer should have its own tests…)
[19:25:50] <drorh> GothAlice I had no doubt about having conversion routines, but when you said so explicitly I realized I'm going to implement 2 collections. 1 maintains isomorphism with the APIs, and the other fills up (if not existent for the query from the front) with converted data on demand and sends it to the front
[19:28:05] <GothAlice> Why not simply process all data as it arrives and include whatever tracking (update timestamp, tagged identifier, etc.) is needed for the agent to track state?
[19:28:40] <GothAlice> (For example, our job data pulls all jobs in, but only inserts "new" ones, checking modification times to "update" existing ones if needed based on a unique combination of (company, job_reference).)
[19:30:10] <GothAlice> That happens at https://gist.github.com/amcgregor/7be2ec27adc80c9fafa1#file-source-syndicated-py-L65-L69 in our codebase (that's the duplicate check; this code is a snapshot from before we added updating).
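The gist is Python, but the underlying "insert new, update existing" pattern looks roughly like this in the shell; collection and field names only loosely mirror the job-feed example and are illustrative:

    // A unique compound key prevents duplicates outright...
    db.jobs.ensureIndex({ company: 1, job_reference: 1 }, { unique: true })

    // ...and an upsert keyed on that pair inserts new records or refreshes
    // existing ones in a single operation.
    db.jobs.update(
        { company: "acme", job_reference: "J-1042" },
        { $set: { title: "Widget polisher", modified: ISODate() } },
        { upsert: true }
    )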
[19:33:00] <drorh> I think that would require the APIs to be mutually isomorphic; otherwise a non-essential abstraction is needed
[19:34:03] <drorh> the conversions would get rid of data and structure not essential for the front
[19:34:47] <GothAlice> Hmm. Surely there is a unique key buried somewhere in that data. ;) As long as the number of automatic translations needed on-demand is bounded (i.e. I can't accidentally trigger 1,000 on-demand translations with a single request) such an approach should be fine.
[19:35:16] <GothAlice> However this does add data processing to the front-end.
[19:36:36] <drorh> GothAlice too many unique keys at different places for different APIs. that's exactly the issue :)
[19:38:14] <drorh> GothAlice what new processing does this introduce to the front?
[19:38:47] <GothAlice> But MongoDB doesn't have to care. :) {foreign_key: {foo: 27, bar: "green"}, …} — unique index on foreign_key and you can have a per-API subdocument schema there.
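A small sketch of that idea; the uniqueness check compares the whole embedded document, so each API can keep its own identifier shape under one field (names are illustrative):

    db.trips_raw.ensureIndex({ foreign_key: 1 }, { unique: true })

    db.trips_raw.insert({ api: "a", foreign_key: { route: 27, fare_class: "green" } })
    db.trips_raw.insert({ api: "b", foreign_key: { trip_id: "b-987" } })
    // Re-inserting a document with an identical foreign_key subdocument
    // fails with a duplicate key error.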
[19:40:42] <GothAlice> The "on-demand" aspect of conversion would mean conversions are triggered by the front-end.
[19:43:32] <drorh> the front would just ask for converted data. most of the time some other instance of front already implicitly made sure the converted data already exists
[19:53:06] <GothAlice> laurensvanpoucke: You certainly can use a BLOB-like field stored with your main document. The limit used to be 4MB per document, so separating it was more of a concern in the past. (Now it's 16MB per document.) GridFS handles splitting large files up for you. It's also often a good idea to separate it out to speed up queries (i.e. if MongoDB needs to scan the table to answer a query, having the BLOBs in there will slow it down.)
[19:53:15] <GothAlice> Generally one stores metadata separate from the data. :)
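The simplest version of that separation, assuming payloads comfortably under the 16MB document limit (above that, GridFS does the splitting for you); all names are placeholders:

    // The binary payload lives in its own collection, referenced by _id, so
    // metadata queries never have to page the blobs in.
    var blobId = ObjectId();
    db.attachments.insert({ _id: blobId, data: BinData(0, "aGVsbG8gd29ybGQ=") })
    db.documents.insert({ title: "Q3 report", author: "jane", attachment_id: blobId })

    // A metadata-only query touches only the small documents:
    db.documents.find({ author: "jane" }, { title: 1 })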
[19:56:21] <laurensvanpoucke> and the gist I showed you, what do you think about that ?
[19:56:25] <GothAlice> (I priced it out: by not storing my data there I can afford to use Drobo disk arrays and replace every drive in each of the three arrays every month!)
[20:04:06] <GothAlice> After a certain point, using SaaS/PaaS/*aaS makes no financial sense.
[20:04:19] <GothAlice> brianseeders: I'm glad you got that fixed. T'was a crazy situation to be in.
[20:07:14] <brianseeders> Yes, the repair fixed it and the data seems to be okay
[20:09:14] <brianseeders> I'm glad to see that the erroneous duplicate key errors that I encountered when I originally tried to make this a replica set (2 years ago?) are gone
[20:10:00] <brianseeders> I tried to migrate away from a standalone instance back then with the same dataset and just gave up
[20:10:33] <brianseeders> That was when 2.4 first came out
[20:10:48] <GothAlice> Things have come a long way since 2.4.0. :)
[20:16:25] <brianseeders> Now I just need to figure out what the new setup should be and an actual migration plan
[20:17:29] <GothAlice> I have a handy script that sets up a (local) sharded replica set with 3x3 members and authentication enabled. Spread that across multiple hosts and you have an instant setup. :)
[20:18:15] <GothAlice> Also, MMS is great for setting up new clusters. (But you need to set them up initially with MMS, I believe.)
[20:18:54] <cheeser> depends on what you want to do with them
[20:19:34] <cheeser> monitoring/backup can be done on preexisting clusters. provisioning not so much.
[20:19:49] <GothAlice> That's the ticket. I keep forgetting the word "provision". XD
[20:21:15] <brianseeders> What I ultimately will want to do is set up a new cluster, replicate to it from the standalone instance (which is the only instance receiving reads/writes), then cut over to the new cluster
[20:21:22] <drorh> GothAlice the arch spec pretty much contradicts what the client said 2 days ago. now I have a tactical opportunity to be rid of the rdb in the middle :)
[20:21:57] <GothAlice> drorh: One of the reasons I love design documents so much. Clients "say" many things… ;)
[20:22:10] <brianseeders> I'm guessing Priority 0 members with some re-configuration at cutover time
[20:22:55] <GothAlice> brianseeders: You could do it pretty much as you describe. Turn the current solo server into a RS primary, add the other hosts in the new cluster, then just offline the original primary during a period of idle insert/update activity.
[20:23:16] <GothAlice> brianseeders: The new "cluster" will take over.
[20:24:02] <brianseeders> I would just need to re-configure them to be priority: 1, I assume
[20:25:07] <GothAlice> brianseeders: I don't think you'd even need to worry about having priorities in the first place.
[20:25:13] <drorh> GothAlice I wanna show you the arch sketch.. it looks like nonsense and I want to confirm this. k?
[20:25:27] <GothAlice> brianseeders: The solo will become primary, and will remain so until it's brought offline.
[20:25:36] <GothAlice> drorh: Sure! I once joined a project part-way through the specification process; they had opted to use zeromq, rabbitmq, redis, membase, and postgres. After I cleaned the coffee off my keyboard I offered to replace *all* of that with MongoDB. ;)
[20:26:46] <GothAlice> (Why worry about scaling five different services?! Blew my mind: and they even had a reason for the apparent duplication of queues: one had persistence, the other was faster. ¬_¬)
[20:26:49] <brianseeders> Well, I'm trying to avoid a situation where the solo becomes inaccessible because of problems on the new instances or something
[20:27:06] <brianseeders> During testing, I brought down that second instance and the original one demoted itself
[20:27:15] <GothAlice> Ah, that's because you didn't have an arbiter.
[20:27:44] <GothAlice> Remember: a primary can only exist if some node, somewhere can reach a "majority" of the cluster. With two hosts, if one goes down there is no way to get 50.1% of the vote.
[20:28:13] <GothAlice> An arbiter lets the hosts figure out if they lost *their* connection, or if the other host did.
[20:28:55] <brianseeders> Right, that's why I wanted to make the new ones priority 0
[20:29:19] <brianseeders> Just to try to not introduce more places for problems in the old environment before the cutover
[20:29:42] <GothAlice> The correct solution is to have an arbiter; you can even co-host it on the same machine as the primary if you wish, for the purposes of avoiding read-only mode if the secondaries go away.
[20:30:04] <GothAlice> The end goal is to migrate to a clustered setup, though, isn't it?
[20:30:29] <GothAlice> (Don't co-host if you want that original primary to stick around for extended periods, though.)
[20:30:52] <brianseeders> As part of a migration of our entire infrastructure somewhere else
[20:32:37] <GothAlice> I'd perform the following process for the migration: 1. Promote solo to RS primary. 2. Add one secondary in the new site and an arbiter on the same host as the primary. 3. Let initial replication complete. If the secondary goes away at this point, the primary will keep chugging along as if nothing really matters. ;)
[20:33:43] <GothAlice> 4. Shut down the secondary in the new location. Perform a clone of the /var/lib/mongodb to two new hosts. (This will eliminate initial seeding of the new secondaries and save some time. ;) 5. Bring all three new hosts back online. 6. When satisfied, remove the old primary and arbiter, new cluster will take over.
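The same plan expressed as shell commands, with placeholder hostnames and ports; exact member lists would of course come from rs.conf():

    // 1. On the old solo server, after restarting it with a replSet name:
    rs.initiate()

    // 2. Add the first secondary at the new site, plus an arbiter co-hosted
    //    with the old primary so it keeps a majority if the new site drops away:
    rs.add("new-site-1.example.com:27017")
    rs.addArb("old-primary.example.com:27018")

    // 4-5. After cloning the synced dbpath to the other two new hosts and
    //      starting them, add them as well:
    rs.add("new-site-2.example.com:27017")
    rs.add("new-site-3.example.com:27017")

    // 6. When satisfied, retire the old primary and the arbiter:
    rs.remove("old-primary.example.com:27017")
    rs.remove("old-primary.example.com:27018")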
[20:37:27] <GothAlice> brianseeders: Was that… clear or informative?
[20:41:58] <brianseeders> That is clear and probably what I will do
[20:45:41] <GothAlice> The trick with that process is if the secondaries become inaccessible during the migration, the primary will simply keep functioning. And, after step 2, before step 5, if the primary goes away, that first secondary will go into read-only mode (it won't become primary).
[20:46:34] <GothAlice> On step 5, though, the new cluster is ready and waiting to take over.
[20:50:22] <brianseeders> what prevents the secondary from being promoted?
[20:51:28] <GothAlice> That if the box hosting the primary and arbiter can't be contacted, the secondary can't reach > 50% of the hosts and isn't confident enough to take over.
[20:51:42] <GothAlice> (I.e. it can't verify if *it* lost its connection, or if everyone else did.)
[20:52:01] <brianseeders> Oh I see, I was thinking if the main instance went offline, but not the whole machine
[20:52:13] <brianseeders> in which case the secondary would promote
[20:52:27] <GothAlice> True; process-level failures would throw a wrench into this.
[20:53:00] <GothAlice> However the window for failure is only during the initial sync of the first secondary—how long does it take to sync your data the first time?
[20:53:27] <brianseeders> It will probably take a few hours
[20:53:47] <GothAlice> Have you ever had the primary mongod process just "go away" (without the host machine itself having connectivity issues)?
[20:54:21] <brianseeders> Goes away with no error messages
[20:54:26] <brianseeders> We just have to deal with it
[20:55:03] <GothAlice> brianseeders: I have hosts with 900+ days of uptime and service-level uptime reaching six nines. Things spontaneously disappearing is a Bad Situation™.
[20:58:20] <brianseeders> Yeah... It's not getting OOM-killed or anything
[20:58:35] <brianseeders> and the log just shows normal operation, followed by start-up messages
[21:00:01] <GothAlice> You'll still need the arbiter, but adding that first secondary as priority zero (then changing it back after) during the initial replication will prevent it from ever becoming primary in the event the primary mongod process dies but the arbiter doesn't.
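One way to do the priority-zero-then-restore dance, assuming the freshly added member is the last entry in the config (verify against rs.conf() first):

    var cfg = rs.conf();
    cfg.members[cfg.members.length - 1].priority = 0;   // cannot be elected primary
    rs.reconfig(cfg);

    // ...after the initial sync and cutover preparations are done...
    cfg = rs.conf();
    cfg.members[cfg.members.length - 1].priority = 1;   // back to a normal member
    rs.reconfig(cfg);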
[21:00:11] <GothAlice> I'm glad you're replacing that box, though!
[21:00:23] <brianseeders> I'm sure it doesn't help that the data is ancient
[21:00:51] <brianseeders> Migrations/Upgrades without re-importing or replicating the data
[21:01:50] <brianseeders> The data probably started on 1.8 or maybe even 1.6
[21:12:58] <GothAlice> ^_^ My crazy home dataset started on 1.6. (The data's been gathering since 2000/2001, originally in a relational database.)
[21:19:07] <brianseeders> I'm out for today, good talking and thanks for the help
[21:20:59] <rgenito> is there a way to have a mongo client without the mongo server?
[21:21:09] <rgenito> just checking ... i'm on ubuntu and i'm thinking to apt-get install mongo-clients
[21:21:27] <rgenito> ...just don't wanna have to do extra work @.@
[21:22:11] <GothAlice> rgenito: If you want to be particularly "conservative of time" you can just download the precompiled binaries from mongodb.org, extract them somewhere, and pick out the tools you want.
[21:22:40] <GothAlice> rgenito: Generally it's advisable to use your system's package manager, of course, which would include the daemon program… which you just don't have to run if you don't want it.
[21:29:38] <drorh> should I "int codify" all my indexes?
[21:30:41] <GothAlice> drorh: … what exactly do you mean by "int codify"?
[21:30:48] <Vile> Why do you want to do so? Is there any benefit to that in your case? Are there drawbacks?
[21:31:39] <drorh> well computers prefer ints over strings of ints to search by
[21:32:22] <Vile> Computers have no preferences. After all, strings are just long integers
[21:32:43] <Vile> But architects and developers and users do
[21:33:41] <drorh> for example if I have a destination city index, I can use the 3-letter code, i.e. LON, but instead I can define application-wide that 1 means LON etc
[21:34:26] <GothAlice> drorh: If you use hashed indexes, strings of any length are turned into a number internally. ;)
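A small sketch of the hashed-index suggestion; collection and field names are placeholders:

    // The string code is hashed to a 64-bit integer internally and equality
    // lookups use the index, so there is no need to invent numeric codes
    // in the application.
    db.trips.ensureIndex({ destination: "hashed" })
    db.trips.find({ destination: "LON" })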
[21:56:33] <drorh> I remember source diving MySQL once. never did that again lol. I'm guessing mongodb would not scare me away so bad. we'll see :)
[21:57:48] <GothAlice> drorh: I actually *had* to source dive MySQL in order to reverse engineer the on-disk InnoDB format after a multi-zone AWS failure corrupted things. Took me 36 hours—straight—to recover that data.
[21:58:03] <GothAlice> (And that was immediately prior to dental surgery to extract my wisdom teeth.)
[22:18:00] <drorh> GothAlice do you consider the online official docs and reference sufficient for my needs as you understand them, or is there maybe a really good book I can read?
[22:19:17] <GothAlice> The official documentation is quite good, though it's important to sometimes hunt around as the best description for a given task may be in a tutorial instead of reference. I also use search engines quite heavily, seeking out "real-world" use cases on blogs and things that fit my needs.
[23:21:57] <mlg9000> how does write concern come into play?
[23:22:51] <mlg9000> I was having sudden performance issues with my app, I disabled my furthest replica and boom.. right back up
[23:24:26] <GothAlice> Writes obviously only happen to the primary, but when requesting confirmation of a minimum level of replication you can use tags to identify which groups of replicas you care about replication to. I.e. if you have a very, very slow offsite replica for backup purposes only, don't include it in the tag set to ensure confirmed writes never wait for *it* to respond.
[23:26:11] <mlg9000> ok, default is Acknowledged but I'm not clear from reading that documentation what exactly that means
[23:26:58] <mlg9000> I really don't care if replica writes take awhile and I don't want them bogging performance down
[23:27:35] <mlg9000> it's an inventory system; replicas are there in case we need to access data locally in the event the primary site is offline
[23:28:25] <GothAlice> To ensure that, make sure your primary is as close to your application servers as possible (in terms of latency). Acknowledged means when you issue a write the primary will receive the data, check that it looks sane, and tell you all is well. (You know it got it, but you have no guarantee that the change has propagated to the rest of the cluster.)
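A sketch of the tag-set mechanism GothAlice refers to; member indexes, tag names, and the collection are all placeholders, and the slow offsite replica is simply left out of the custom mode:

    var cfg = rs.conf();
    cfg.members[0].tags = { main: "a" };           // main-site members carry the
    cfg.members[1].tags = { main: "b" };           // "main" tag with distinct values
    cfg.members[2].tags = { offsite: "backup" };   // slow replica: no "main" tag
    cfg.settings = cfg.settings || {};
    cfg.settings.getLastErrorModes = { twoMain: { main: 2 } };  // two distinct "main" values
    rs.reconfig(cfg);

    // Writes acknowledged under this mode never wait for the offsite member:
    db.inventory.insert({ sku: "abc", qty: 5 }, { writeConcern: { w: "twoMain" } })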
[23:30:32] <mlg9000> hum... my application server which does all the writes is on the same host as my mongodb primary. Not sure how a remote replica would slow things down then but it definitely is