[01:05:55] <kba> I'm somewhat new to mongodb, and I'm still trying to figure out the optimal architecture. Say you were developing a system where some "chats" would be related to some objects X. If I was using an RDBMS, I'd have a table with chat messages and a table with Xs.
[01:06:10] <kba> with NoSQL, would it make more sense to have each "chat" associated with each X, so to speak?
[01:06:35] <kba> I'd never need to search all chats, so there's no reason why the chats should be in one collection, I feel.
[01:07:17] <kba> I'm just having a hard time wrapping my head around it. It's hard to let go of a decade of habits, I suppose
[01:07:51] <kba> So what would you guys suggest? I can't immediately see a downside to having only one collection of Xs with their chats associated.
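The embedded design kba is describing can be sketched with plain dicts of the kind you would hand to pymongo's insert_one/update_one; every collection and field name here is a hypothetical example, not anything from the question.

```python
from datetime import datetime, timezone

# Hypothetical shapes: each X document embeds its own chat log, so
# fetching an X brings its chat along in a single read.
def new_x(name):
    """An X document with an initially empty embedded chat array."""
    return {"name": name, "chats": []}

def push_chat(author, text):
    """Update document appending one message to the embedded array."""
    return {"$push": {"chats": {"author": author, "text": text,
                                "at": datetime.now(timezone.utc)}}}

x = new_x("widget-42")
update = push_chat("kba", "hello")
# with pymongo: xs.update_one({"name": "widget-42"}, update)
```

The main costs of embedding are that you can't easily search across all chats (which kba says isn't needed anyway) and that a single document is capped at 16 MB, so an unboundedly growing chat would eventually force a separate collection regardless.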
[02:53:57] <Boomtime> anyway, can you try the method i mentioned? connect without credentials and use the db.auth - you can do this from the first session
[02:55:54] <Boomtime> i can only speculate the database name is still not being used for auth in that case
[02:56:49] <Boomtime> you can try adding the authdb option: http://docs.mongodb.org/manual/reference/program/mongo/#authentication-options
[02:56:59] <Boomtime> but i thought it defaulted to the database name specified in the connect string
[02:57:31] <Boomtime> and it is supposed to "If you do not specify a value for --authenticationDatabase, mongo uses the database specified in the connection string." ..
[02:57:43] <Boomtime> mysterious, but worth testing
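The two options Boomtime mentions look like this from a driver's point of view; host, user, and database names are all invented for the sketch.

```python
# Option 1: pin the authentication database in the connection string
# via the authSource option (the driver-side analogue of the shell's
# --authenticationDatabase flag):
uri = "mongodb://appuser:secret@db.example.com:27017/mydb?authSource=mydb"
# with pymongo: client = MongoClient(uri)

# Option 2: connect without credentials, then authenticate explicitly,
# like the shell's db.auth("appuser", "secret"):
#   client = MongoClient("db.example.com")
#   client.mydb.authenticate("appuser", "secret")  # pymongo 2.x/3.x API
```

Without authSource, the driver is supposed to fall back to the database named in the connection string, which is exactly the behavior being questioned above, so setting it explicitly is a cheap test.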
[13:02:20] <vagelis> May i ask a PyMongo question please?
[13:03:25] <vagelis> Well, I'll ask here, since there isn't a pymongo room.
[13:04:14] <vagelis> I want to delete an element from a list. I use $pull in my Python script, but when I check the documents in the mongo shell they don't seem to get updated. What am I missing?
[13:10:52] <vagelis> Seems I had to wait a bit for the changes to take effect, thanks anyway.
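Rather than eyeballing the shell, a $pull can be verified programmatically: update_one returns a result whose modified_count says whether the server applied the change. A sketch with hypothetical field names; the server call is shown in a comment and the $pull semantics are simulated in plain Python.

```python
def pull_spec(value):
    """Update document removing every matching element from "tags"."""
    return {"$pull": {"tags": value}}

# With pymongo (hypothetical collection):
#   result = coll.update_one({"_id": 1}, pull_spec("old"))
#   result.modified_count == 1 proves the server applied the change.
doc = {"_id": 1, "tags": ["old", "new", "old"]}
target = pull_spec("old")["$pull"]["tags"]
doc["tags"] = [t for t in doc["tags"] if t != target]  # what $pull does server-side
```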
[17:37:23] <mrmccrac> from what I see there's no mongodb setting to limit the amount of memory the shard process uses. Right now it's using up basically all the memory on the machine, and eventually it starts swapping and kills the server
[17:38:22] <mrmccrac> anyone else run into this or have any ideas on how to deal with it?
[19:21:01] <StephenLynx> it's pretty shitty that 10gen claims “strict consistency”, but most people here actually don't go for that and accept the way it works as the intended design, IMO.
[19:21:58] <StephenLynx> that is among the reasons we are constantly suggesting other databases if the usecase doesn't fit mongo.
[19:21:59] <GothAlice> tejasmanohar: There are a number of phrases in that document that give me levels of dubious. There's also that in my own testing, a similar "nemesis process" that kill -9's whole mongod node VMs at the dom0 level while those nodes are being 80-90% stressed was unable to reproduce a conflicting document rollback, so for our dataset and query needs the issues illustrated in that post are irrelevant.
[19:22:41] <StephenLynx> plus, we haven't actually witnessed these flaws.
[19:22:55] <StephenLynx> I never performed tests as strict as hers, but I never had any issues either.
[19:23:00] <GothAlice> As in all cases: don't just take someone's word for it; test your actual usage patterns at high stress levels. Optimization without measurement is by definition premature, and until you measure a problem you can't really tell that you have one, so any action you take to correct it will likely be wrong.
[19:24:33] <StephenLynx> I just take people's word and think it might have a couple of errors on very edge cases.
[19:24:41] <StephenLynx> and use it with that in mind.
[19:25:06] <GothAlice> I had a co-worker whose job, for two weeks, was to try to destroy MongoDB.
[19:25:17] <StephenLynx> did he get anything at all?
[19:25:26] <GothAlice> Assuming a Facebook game with one million simultaneous active users using every feature of the game.
[19:25:55] <GothAlice> He kill -9'd > 60% of the nodes (not just mongod, but also mongos, and config servers) before things started to notice problems.
[19:26:26] <GothAlice> Our scoring mechanism, however, keenly avoided rollback issues.
[19:27:44] <GothAlice> (It was an update-on-decrement system that calculated actual scores from the in-db base value, the update time, and the delta between the update time and now, for regenerating "energy" and other resources.)
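That update-on-decrement idea can be sketched as a pure function (all numbers and names here are hypothetical): the database stores only a base value and its timestamp, the current value is derived at read time, and writes happen only on explicit spends, so a rolled-back write loses at most the last spend, never the regeneration.

```python
from datetime import datetime, timedelta

def current_energy(base, updated_at, now, regen_per_sec, cap):
    """Derive the live value from the stored base plus elapsed time.

    No per-tick writes are needed; regeneration is pure arithmetic.
    """
    elapsed = (now - updated_at).total_seconds()
    return min(cap, base + elapsed * regen_per_sec)

t0 = datetime(2015, 3, 14, 12, 0, 0)
# 30 seconds after storing base=10 at 0.5 energy/sec:
val = current_energy(10, t0, t0 + timedelta(seconds=30), 0.5, 100)
```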
[19:29:51] <GothAlice> From one: "the systems (There where now effectively two distributed systems that operating independently of each other) maintained two simultaneous primaries for the same data set" <- LOLNO
[19:30:13] <StephenLynx> "[Microsoft SQL Server 2012 R2] It’s the best because it has 25+ years behind it." So I assume COBOL is the apex of corporate languages.
[19:30:18] <GothAlice> Sorry, but a minority partition would enter read-only, buddy…
[19:30:47] <GothAlice> StephenLynx: I've still got snippets of COBOL and REBOL around here…
[19:31:01] <StephenLynx> yeah, but you don't defend them just because they're old.
[19:31:23] <GothAlice> Accounting also uses a Filemaker app with 'macroman' encoding. Data interchange is hideously painful with the webapp.
[19:32:51] <StephenLynx> back on topic, I don't bother defending FOSS. FOSS doesn't need to prove anything to the unwilling.
[19:33:30] <GothAlice> "Mongo has no way to reliably detect a partition, let alone enter a read-only mode." < another wrong comment. So, tejasmanohar, take articles like these with huge grains of salt. ;)
[19:34:25] <GothAlice> (Run two mongod data nodes and an arbiter in their own VMs, pause the arbiter, watch the primary freak out and go read-only.)
[19:35:16] <GothAlice> Well, pause a data node and the arbiter. Two thirds majority and whatnot.
[19:36:00] <StephenLynx> plus, people obsessing over mongo not having ACID don't realize mongo is not meant to be THE ONE TRUE WAY.
[19:37:14] <preaction> you mean i have to learn more than one thing?
[19:37:18] <GothAlice> Yup. I even lower my write concerns to "ignore even socket errors" for some queries. Most of my queries are tuned individually to the level of reliability they need, based on the consequences of a failed write (i.e. can the data be regenerated?). For example, our per-hour pre-aggregated analytics are ephemeral. Couldn't care less if they went pif from one moment to the next, as they can be fully regenerated.
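The per-query reliability tuning described here maps onto write concern levels. A sketch of the wire-level documents, with pymongo's WriteConcern wrapper shown in comments; the collection name is made up.

```python
# Which level fits depends on whether a lost write can be regenerated.
FIRE_AND_FORGET = {"w": 0}              # unacknowledged: ephemeral, regenerable stats
ACKNOWLEDGED = {"w": 1}                 # primary acknowledged: the usual default
DURABLE = {"w": "majority", "j": True}  # majority-replicated and journaled

# with pymongo, applied per collection handle:
#   from pymongo import WriteConcern
#   stats = db.get_collection("stats", write_concern=WriteConcern(w=0))
```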
[19:37:48] <GothAlice> preaction: Indeed. One should generally use the right tool for a given job.
[19:38:10] <preaction> quelle horreur! i shall instead use mysql for everything!
[19:38:48] <preaction> but it's got full text search and i can store json blobs and there's handlersocket and it can do everything!!
[19:38:49] <GothAlice> MySQL is worse than MongoDB in many ways, from the standpoint of "I'd like one DB solution and not need to think about it again".
[19:39:50] <preaction> heh. i remember when i had to change some data from latin1 to utf-8, and it turned out that since i didn't declare my connection as utf-8, it just stored latin1 bytes in utf-8 columns...
[19:41:32] <GothAlice> Any time you need a join, you've failed to think outside the spreadsheet when it comes to structuring your data. ;P
[19:41:44] <StephenLynx> either that or you duplicate data and update it several times when it changes.
[19:41:46] <preaction> relational data is a thing, but apparently joins are terrible and subselects are better now?
[19:42:04] <GothAlice> Again: optimization without measurement…
[19:42:27] <GothAlice> Jumping on bandwagons is a great way to write code you don't understand.
[19:44:46] <preaction> but without that, how will i keep up in conversations with the popular kids?
[20:40:59] <coderman2> how important is setting up a replicaset when setting up a mongo cluster?
[20:42:18] <GothAlice> coderman2: Sharding is like striping RAID. No redundancy. Replication is mirrored RAID: all redundancy. Mix the two and you get RAID 10.
[20:44:29] <coderman2> I'm considering moving the largest table in my postgres database over to a mongo cluster (keeping the rest of the relational tables in pg for now)... this table is nosql in nature...
[20:44:32] <GothAlice> On a database level, one shards for two main reasons: geographic distribution of data (i.e. you need parts of your data in different datacenters for performance reasons), or you have more data than fits in the RAM of one node. A replica set roughly duplicates the memory requirements across machines, but gives you read-only nodes you can geographically distribute, if a little bit of lag in the data is OK. You can also delay one by an amount of time, to allow for data recovery.
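A delayed member like the one described is just part of the replica set config. A hypothetical three-member example; the delay field is secondaryDelaySecs on modern servers (slaveDelay on servers of this era).

```python
# Config document for the replSetInitiate command; hostnames invented.
# Member 2 is hidden and trails the primary by an hour, giving a
# rolling window for recovering from accidental deletes.
rs_config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db0.example.com:27017"},
        {"_id": 1, "host": "db1.example.com:27017"},
        {"_id": 2, "host": "db2.example.com:27017",
         "priority": 0,               # never becomes primary
         "hidden": True,              # invisible to normal reads
         "secondaryDelaySecs": 3600},
    ],
}
# with pymongo: client.admin.command("replSetInitiate", rs_config)
```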
[20:45:33] <GothAlice> coderman2: I can highly recommend https://mms.mongodb.com/ (full disclosure: satisfied customer). It's free for < 8 hosts, and manages setting up the cluster stuff for you.
[20:45:33] <coderman2> geo distribution is not a concern for me, mainly performance and flexibility of queries
[20:46:30] <GothAlice> I'd start off with a two-node replica set with an arbiter on your app server, as long as that one table you're migrating can be held in RAM on those nodes.
[20:46:53] <GothAlice> (MMS also offers offsite backup which is free for < 1GB of data, I believe.)
[20:47:31] <GothAlice> The setup I describe will be resilient to one of the database servers going away due to network connection issues or crash… the application would just keep on truckin'.
[20:47:31] <coderman2> the entire collection has to fit in ram?
[20:47:43] <GothAlice> Doesn't have to, but it's "really nice to have".
[20:47:48] <GothAlice> It's more critical that your indexes fit in RAM.
[20:48:25] <GothAlice> Depends on the queries you need to run, though.
[20:49:00] <GothAlice> (Regular expression searches require more of the data to be actually in RAM, as non-prefix searches can't use an index, for example.)
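The prefix point is easy to demonstrate in plain Python: a regex anchored with ^ confines its matches to one contiguous range of index keys, so the server can range-scan the index, while an unanchored pattern can match anywhere and forces examining every key or document.

```python
import re

prefix = re.compile(r"^app")   # index-friendly: all matches share the literal prefix
anywhere = re.compile(r"log")  # must examine everything

keys = ["apple", "applet", "catalog", "dialog"]
prefix_hits = [k for k in keys if prefix.match(k)]
anywhere_hits = [k for k in keys if anywhere.search(k)]
```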
[20:49:31] <coderman2> I'm at 1.6B rows right now in this table (which is partitioned by month), and each row is going to be roughly 500 bytes in mongo
[20:49:47] <GothAlice> Hmm. Let me link you something.
[20:50:59] <GothAlice> This compares various ways of storing time-series data (sensor buoy readings, in this example) and how it impacts query performance and various aspects of storage (data and index) size.
[20:52:08] <GothAlice> You mentioned "partitioned by month" and I instantly think: pre-aggregation. This also _greatly_ improves aggregate query performance by allowing your reads to select on time increments in roughly constant time, based on the size of the range being selected. (Regardless of the number of individual recorded events.)
[20:52:49] <GothAlice> I.e. if you have a graph that always shows a month of data, it will always take the same amount of time to generate, within reasonable deviation to external factors.
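A minimal pre-aggregation sketch, with an assumed bucket layout and field names: each incoming event performs one upserted $inc into its month's document, and a month-long graph then becomes a single document read, regardless of how many events landed in that month.

```python
from datetime import datetime

def month_bucket_update(ts, count=1):
    """Filter and update documents for a per-month counter bucket."""
    bucket_id = ts.strftime("%Y-%m")           # e.g. "2015-03"
    return ({"_id": bucket_id},
            {"$inc": {"total": count, "byDay.%d" % ts.day: count}})

# with pymongo, one call per event:
#   filt, upd = month_bucket_update(event_time)
#   coll.update_one(filt, upd, upsert=True)
filt, upd = month_bucket_update(datetime(2015, 3, 14))
```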
[20:54:02] <coderman2> none of my queries are based on aggregations of this table; I keep counters in summary tables that do that much faster
[20:54:32] <GothAlice> So this is a write-heavy workload, not read-heavy. Cool.
[20:54:39] <coderman2> i mainly have about 5 columns that are static the rest are json-ish
[20:54:58] <GothAlice> http://docs.mongodb.org/ecosystem/use-cases/storing-log-data/ < this goes into some of the particulars of tuning for a write-heavy, log like workload.
[20:54:59] <coderman2> i would likely index those 5
[20:55:25] <GothAlice> Do any of them describe a unique index?
[20:55:43] <GothAlice> If so, you can potentially combine them in the _id as a natural way (in MongoDB) to express that.
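Combining fields into the _id just means using a subdocument as the key; a sketch with hypothetical fields. One caveat worth a comment: the server compares subdocument _ids field by field in order, so the key must always be built with the same field order.

```python
def make_id(host, seq):
    """If (host, seq) is already unique, it can serve as the _id
    itself, avoiding a separate unique index.  Always build it with
    the same field order, since _id comparison is order-sensitive."""
    return {"host": host, "seq": seq}

event = {"_id": make_id("web-1", 1001), "payload": "..."}
# with pymongo: events.insert_one(event) would raise DuplicateKeyError
# on a repeated (host, seq), with no extra index needed
```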
[20:55:50] <coderman2> I'm still going to use postgres to generate my sequence number for this table
[20:55:55] <coderman2> so that would be the unique id
[20:57:46] <coderman2> i was a little shocked at how many machines are required to run a mongo instance. I'm reading you need 1+ config servers, 1+ query routers, 1+ shards
[20:57:54] <coderman2> do you typically put config/query on the same host?
[20:58:02] <GothAlice> Certainly, you can do that.
[20:58:20] <coderman2> how many query nodes do you recommend per shard?
[20:58:51] <coderman2> I typically have about 100 connections to my db running various processes on the data, if that helps
[20:59:23] <GothAlice> It's a good idea to have more than one of each, but one can do. I put a query router and config server on each app server, actually, each app process talks to their local one, and each app process runs 10-100 threads (and thus connections).
[21:00:01] <GothAlice> With each app server running an app process per core minus one.
[21:01:41] <coderman2> I currently have 3 processing vms (16 cores each) that run all the processing in python. I'm thinking 1 query router/config server with 8 cores might suffice
[21:03:14] <coderman2> and then I have to figure out how many data nodes I need. I think around 4 with 500 GB of storage each would give me enough space to double in size to start
[21:03:16] <GothAlice> Okay, so, a query router and config server are used in sharding setups; that is, the setups that partition data rather than add redundancy. They also use very few resources in general, compared to the actual storage nodes.
[21:03:30] <GothAlice> Thus a query router / config server combo needs… 3 cores.
[21:04:02] <GothAlice> … or just co-lo them on your app servers, and make the config data redundant. This frees up more hosts for use as storage nodes.
[21:04:48] <GothAlice> You combine redundancy and data segregation by treating separate replica sets as shards.
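Registering replica sets as shards looks like this from a driver; hostnames are invented. Each addShard names a set and a seed list, and the command is run against the admin database of a mongos.

```python
# with pymongo, against a mongos:
#   client.admin.command("addShard", add_shard["rs0"])
add_shard = {
    "rs0": "rs0/db0a.example.com:27017,db0b.example.com:27017",
    "rs1": "rs1/db1a.example.com:27017,db1b.example.com:27017",
}
```

Each shard being a full replica set is what gives the RAID-10-like combination described earlier: data is partitioned across sets, and redundant within each set.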
[21:08:13] <coderman2> how many cores do you recommend for shards?
[21:12:37] <coderman2> 8 cores, 16 GB RAM, 750 GB storage is gonna cost me 482 per month
[21:16:38] <GothAlice> Sharding would let you divide your data amongst more servers with less space required on each, which may be cheaper than larger, monolithic servers.
[21:21:58] <coderman2> all the cost is in the cpu/memory though
[21:22:03] <coderman2> so the more hosts the more expensive
[23:26:21] <NoOutlet> The best way to find out is to ask your question.
[23:30:33] <DeadPixel_> NoOutlet: I'm building this application where users can log in. However, a user has a certain role, and that role also has some data. The idea can be seen here (https://gist.github.com/MopperigeKat/0989201282011cdf4f5f). I just don't know how I should tell which user has which role
[23:33:40] <DeadPixel_> No, it's not, or well, it could be, if that would make it easier
[23:33:45] <NoOutlet> Such that having a User with id 3804 will mean that you have either a Pilot or a Passenger with id 3804?
[23:33:57] <DeadPixel_> No, right now, no, but it could be if that would be easier
[23:36:29] <NoOutlet> Perhaps it would be best if flightHistory and orders existed on the user.
[23:36:35] <NoOutlet> Is there a reason not to do that?
[23:37:56] <DeadPixel_> well, there will be a lot more role-specific values, so that would make a mess imo
[23:40:08] <NoOutlet> A document doesn't have to have all of the properties of each of the roles unless the properties have values.
[23:40:39] <NoOutlet> That is, a user that is a passenger and not a pilot does not need to have the flightHistory property.
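NoOutlet's suggestion in document form; the field names loosely follow the gist and should be treated as assumptions. One users collection, a role field, and role-specific fields present only where they apply.

```python
pilot = {"_id": 3804, "role": "pilot",
         "flightHistory": [{"flight": "AZ-12", "hours": 3.5}]}
passenger = {"_id": 5117, "role": "passenger",
             "orders": [{"ticket": "AZ-12-14F"}]}

# with pymongo, role lookup is one query on one collection:
#   users.find({"role": "pilot"})
pilots = [u for u in (pilot, passenger) if u["role"] == "pilot"]
```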
[23:47:59] <keeger> hello. I have a question about mongo used with a webserver
[23:48:34] <keeger> do I need to open a connection between the webserver and mongo on every incoming request? or do I open one connection and just send commands over it?
[23:50:35] <StephenLynx> you reuse the connection anyway.
[23:50:58] <keeger> ah i see what you mean, so the driver pretty much should handle that for me?
[23:51:02] <StephenLynx> if the driver doesn't implement a connection pool, it's advisable to implement one yourself to support higher concurrency levels.
[23:58:51] <keeger> well the driver doc says it defaults to a 4k pool size
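The reuse pattern being discussed is a single process-wide client created once at startup; a sketch of the singleton shape, with the pymongo-specific parts confined to comments since the exact driver in use isn't stated.

```python
_client = None

def get_client():
    """Process-wide singleton.  With pymongo, the stand-in below would
    be MongoClient("mongodb://db.example.com", maxPoolSize=100),
    created once; every request then shares the driver's internal
    connection pool."""
    global _client
    if _client is None:
        _client = object()  # stand-in for the real MongoClient
    return _client
```

Opening a new client per incoming request would add a TCP handshake (and possibly auth) to every page load and defeat the pool entirely.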