[00:03:45] <annoymouse> Not sure which one works best with mongo though
[00:04:03] <GothAlice> How will you be querying your data?
[00:04:11] <GothAlice> Do you need to preserve the blogs in some arbitrary order?
[00:04:56] <annoymouse> GothAlice: I'm not really sure how I will query the data. And the blogs don't necessarily have to be in any order
[00:05:09] <annoymouse> But the posts should be sorted by rating or views
[00:07:44] <GothAlice> Then I'd say that blogs would have owners, and posts would have blogs. (Don't use a generic name between collections; it makes things confusing. ;)
[00:13:31] <bazineta> I would have a colleciton of blogs, a collection of users, and an authorization colleciton that associates a user objectID to a blog objectID with an associated permission level, i.e., owner, manager, viewer, etc.
[00:13:48] <bazineta> Also, I would spell collection correctly.
[00:15:14] <bazineta> Then a collection of posts, associated to a blog via the blog objectID
[00:21:27] <bazineta> You want to associate a blog to a user, but think of the case where a given blog might have more than one manager or owner.
[00:21:53] <bazineta> That's a many-to-one situation, so you can model that a few ways, but a collection associating the two is often convenient.
[00:22:27] <bazineta> That collection would contain two reference objectIDs, i.e., a blog objectID and a user objectID, plus a permission level saying: this user has this authority over this blog.
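A minimal sketch of that authorization collection in the mongo shell; the collection and field names here are illustrative, not something bazineta prescribed:

    // one document per (user, blog) pair, with a permission level
    db.authorizations.insert({
        blog: ObjectId("547d9c3be4b0a1b2c3d4e5f6"),   // _id of a blog document
        user: ObjectId("547d9c3be4b0a1b2c3d4e5f7"),   // _id of a user document
        level: "owner"                                 // e.g. "owner", "manager", "viewer"
    })

    // make the lookups cheap and the pairing unique
    db.authorizations.ensureIndex({ user: 1, blog: 1 }, { unique: true })

    // all blogs this user can at least manage
    db.authorizations.find({
        user: ObjectId("547d9c3be4b0a1b2c3d4e5f7"),
        level: { $in: ["owner", "manager"] }
    })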
[00:23:22] <GothAlice> https://github.com/bravecollective/forums/tree/develop/brave/forums/component — this is forum/bulletin board software with tag-based security. (Users have groups, specifically.) If your blogs can have comments, a forum is a pretty close analog.
[00:24:37] <GothAlice> In this case there are forums (blogs; collection), with threads (posts; collection), and comments (embedded in threads), with users "owning" everything.
[00:24:49] <bazineta> There are many paths, but I've found the authorization collection to be easy. One could also handle it via an array of users in the blog object. There are positives and negatives to each approach.
[00:25:00] <annoymouse> I plan on using Disqus for comments
[00:25:59] <annoymouse> bazineta: And then the blogs would have the posts
[00:26:05] <GothAlice> Forums/threads remain a similar concept to blogs/posts. (And good on you for not reinventing the comments wheel.)
[00:26:19] <bazineta> Most of what we deal with is unbounded and large, so a collection works well. If it's bounded and small, having the relations right in the document is a fine approach. Depends on the application, really.
[00:27:18] <GothAlice> That's why I use a different abstraction; don't embed users into the other records, embed roles (singular, or multiple, unique, or not).
[00:27:53] <GothAlice> The forums there use simple string "tag"-based security (with users having "tags" supplied by an upstream auth system).
[00:38:16] <androidbruce> Any way in mongojs to chain a map-reduce without writing a tmp collection between map-reduces?
[00:45:34] <Boomtime> androidbruce: not that I know of in MR, but what you want sounds like aggregation
[00:46:39] <Boomtime> you can chain multiple match/group stages together, which is roughly analogous to what you're doing in a single MR
[00:47:30] <Boomtime> of course, in MR you get to write the function yourself in free-form code, whereas in aggregation you need to be able to frame your match/group into the operators you have available
[00:47:46] <Boomtime> aggregation is much more efficient though
[00:48:00] <androidbruce> Incremental MR a possibility?
[00:49:51] <androidbruce> Thanks Boomtime I'll grep through
[00:50:01] <Boomtime> seriously though, MR is what you use when you have exhausted literally everything else, or if you just don't give a crap about database performance
[00:50:38] <cheeser> it's the regex of analytics :D
[00:52:34] <androidbruce> MR seemed to be an option
[00:52:50] <Boomtime> you should try aggregation first
[00:53:39] <GothAlice> I have a cluster at home containing 25 TiB of data… and I use aggregate queries. Not needing to execute a JavaScript runtime (and avoiding some extra locking AFAIK) makes these indexed queries insanely fast.
[00:54:05] <GothAlice> (Always $match at the beginning of your aggregate pipeline to benefit from indexes.)
[00:55:54] <GothAlice> You can also increase the amount of RAM available (or tell it to page to disk) for over-large queries. See: http://docs.mongodb.org/manual/core/aggregation-pipeline-limits/
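As a rough shell sketch of the shape GothAlice describes (collection and field names invented), with the $match first so an index on those fields can be used, and allowDiskUse for over-large intermediate results:

    db.events.aggregate([
        { $match: { type: "view", created: { $gte: ISODate("2014-01-01") } } },  // benefits from an index on {type, created}
        { $group: { _id: "$post", views: { $sum: 1 } } },
        { $sort: { views: -1 } },
        { $limit: 100 }
    ], { allowDiskUse: true })   // lets over-large group stages spill to disk (2.6+)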
[01:05:26] <GothAlice> androidbruce: http://pauldone.blogspot.ca/2014/03/mongoparallelaggregation.html is another link you may be interested in; this is about the performance difference between a highly optimized map/reduce approach and both unoptimized and client-side parallel optimizations for aggregate queries.
[01:05:31] <GothAlice> (Aggregate queries win by a lot.)
[01:41:16] <annoymouse> bazineta: Quick question about what you suggested. Why do I need a separate collection for perms? Can't I just make an attribute in blog that lists who can edit?
[01:42:55] <bazineta> That's also ok. Depends on how granular you need to be and if there will be a lot or a few. If it's a lot, then a collection works well, if it's a few, putting it in the document works.
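The embedded variant mentioned above might look roughly like this (again, names are only illustrative):

    // permissions embedded in the blog document itself; fine while the list stays small
    db.blogs.insert({
        title: "My Blog",
        editors: [
            { user: ObjectId("547d9c3be4b0a1b2c3d4e5f7"), level: "owner" },
            { user: ObjectId("547d9c3be4b0a1b2c3d4e5f8"), level: "manager" }
        ]
    })

    // blogs a given user may edit
    db.blogs.find({ "editors.user": ObjectId("547d9c3be4b0a1b2c3d4e5f7") })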
[01:48:11] <peterp> Hi guys, I'm looking to connect to a remote mongodb server. is SSH tunneling the best way to accomplish this in terms of reasonable security?
[01:48:49] <peterp> This instance will be the staging database
[01:48:53] <GothAlice> peterp: That's one approach. I use SSL and IPSec firewall setups.
[01:49:44] <GothAlice> SSH isn't generally meant for persistent connections, though, so I try to avoid it for anything that might need to be connected >24h.
[01:50:04] <peterp> for SSL I would need a certificate right?
[01:50:32] <peterp> If it's technically involved, I probably shouldn't go down that route?
[01:50:56] <peterp> Basically, I'm new to separating the database from the rest of the application, so I'm having trouble with connecting to my remote mongoDB
[01:51:12] <peterp> I did find SSH tunneling to do the trick, but like you're saying, I'm not sure this is an ideal route to take
[01:51:28] <GothAlice> peterp: Likely your mongod process is only listening on 127.0.0.1 (localhost).
[01:51:35] <peterp> Since this is a marketing website, there's nothing I need to hide that the user won't already see
[01:51:36] <GothAlice> Which is why SSH tunnelling would work.
[01:53:18] <GothAlice> If your application and database hosts are in the same datacenter, you may be able to set up a VPN between them, or your provider may offer a way to build private virtual machine networks. If at any point you need to connect over an un-trusted wire, use encryption. (Generally means encrypted VPN tunnels or SSL encryption.)
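A minimal Node.js sketch of the encrypted-connection side, assuming the remote mongod has been started with SSL enabled; hostname, database, and collection names are placeholders:

    var MongoClient = require('mongodb').MongoClient;

    // ssl=true asks the driver to wrap the connection in TLS/SSL; the mongod on
    // db.example.com must itself be configured for SSL.
    MongoClient.connect('mongodb://db.example.com:27017/staging?ssl=true', function (err, db) {
        if (err) throw err;
        db.collection('pages').count(function (err, n) {
            if (err) throw err;
            console.log('connected over SSL, pages:', n);
            db.close();
        });
    });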
[01:53:27] <crocket> I want to manage mongodb schema versions on nodejs.
[01:55:40] <GothAlice> You write the createIndex/ensureIndex commands in the upgrade portion of your migrations.
[01:55:50] <GothAlice> Why would it need to "provide" anything explicitly for indexes?
[01:55:50] <peterp> VPN is the route I am taking. Thank you GothAlice
[01:55:53] <crocket> GothAlice, But then, there is no built-in versioning.
[01:56:24] <crocket> An index may exist in various configurations in multiple versions.
[01:56:41] <crocket> How does mongopatch know what patch to apply?
[01:58:39] <crocket> GothAlice, Did you use mongopatch?
[01:58:54] <crocket> mongo-migrate stores finished migrations in migrations collection.
[01:59:53] <GothAlice> crocket: We established, the last time you were asking these questions, that you only ever need to worry about upgrading, not downgrading. MongoPatch migrations have a "shouldUpdate" callback; even if it doesn't provide versioning etc. you could roll it yourself quite easily if you wanted it.
[02:00:10] <GothAlice> I use none of these systems, I rolled my own very similar to mongopatch for Python.
[02:00:28] <crocket> GothAlice, How does mongopatch know which patches to apply if it doesn't have a versioning mechanism?
[02:00:55] <GothAlice> Please refer to the fine manual: https://www.npmjs.org/package/mongopatch#runing-patches
[02:02:49] <crocket> GothAlice, It seems to detect a document version by simply inspecting document keys.
[02:03:09] <crocket> It's like feature detection, which I don't know how to do reliably.
[02:37:26] <crocket> I want to deal with it with automation.
[02:38:14] <crocket> If a new guy joins a company and has no clue about the history of mongodb schema, he's gone.
[02:38:16] <GothAlice> crocket: What's the #1 migration in most databases? Adding a field. In MongoDB, you don't need a migration for that, you just somedocument.get('myNewField', someDefaultValue) and boom, it works, regardless of whether the record defines the field.
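The same idea, sketched in mongo-shell JavaScript with invented names ("views" standing in for the newly added field):

    // works whether or not the stored document was ever written with a "views" field
    var doc = db.posts.findOne({ title: "Hello world" });
    var views = (doc && doc.views !== undefined) ? doc.views : 0;   // application-level default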
[02:38:17] <cheeser> crocket: yes, it does. and we write code when that happens.
[02:38:35] <cheeser> i think you're inventing problems where none exist
[02:38:39] <crocket> cheeser, It would have been a lot better if you wrote a generalized solution.
[02:38:59] <cheeser> the history of the schema is pointless to a new developer. the current state is all that matters.
[02:39:10] <GothAlice> crocket: If no current solution fits your needs, that "write a generalized solution" thing cuts both ways.
[02:39:54] <crocket> cheeser, Not all documents are in the same version.
[02:40:10] <GothAlice> crocket: And here is where I gave you my automation: http://irclogger.com/.mongodb/2014-11-06#1415253098 (also mentioned other uses for that exact automation)
[02:40:39] <GothAlice> crocket: Mismatched versions don't matter for 99% of my migrations.
[02:41:26] <GothAlice> (Even if the added field is a calculated field, it'll be calculated on first use then saved with whatever other update gets run on that document.)
[02:41:46] <GothAlice> I only maintain a current state on indexes.
[02:42:08] <crocket> GothAlice, Do you mean you just store the latest version of indexes in your git repository?
[02:42:51] <GothAlice> Yup! Some code that runs on startup runs ensureIndex across all of the *current* indexes (in Python models) and to hell with the old ones. (If I need to remove one, I write a trivial migration to do it when I clean-up the associated fields.)
[02:43:20] <GothAlice> Since versioning your indexes separately from the rest of your model basically never happens.
[02:46:44] <GothAlice> I have an application. It contains a data model (schema). This is the canonical (current) data structure, always. (master branch = production, develop = development staging, feature/* = where code gets written) In production and staging there is a git hook that pretty much simply runs any .py files that get added to the "migrations" folder. Pull, if there's a migration, it runs.
[02:47:06] <GothAlice> These migrations are trivial chunks of basic MongoDB operations. Insert some records, transform some fields, do whatever.
[02:47:50] <GothAlice> None of them ever add an index; none of them need to. Each time my application service starts, it runs through all known models and runs db.collection.ensureIndex across each one. If they didn't exist before, they will now.
[02:48:39] <cheeser> e.g., using morphia, you'd call datastore.ensureIndexes()
[02:48:48] <cheeser> do that at startup and you're set
[02:48:49] <GothAlice> If I remove some fields, and they were indexed, in the migration where I delete the fields from all documents ($unset, I believe; it's late ^_^) I'll also delete the indexes.
[02:49:18] <GothAlice> Indexes are tied very heavily to the model structure, so there's no point separately "versioning" the migrations, and in my case, no point "versioning" at all; migrations are always run when first pulled.
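In shell terms the two halves of that pattern might look like this; it's a sketch of the approach just described, not GothAlice's actual code, and the collection, field, and index names are made up:

    // run at every application start: declare only the *current* indexes
    db.threads.ensureIndex({ forum: 1, modified: -1 })
    db.threads.ensureIndex({ "comment.author": 1 })

    // a one-off migration, run once when a field (and its index) is retired
    db.threads.update({}, { $unset: { legacyScore: "" } }, { multi: true })
    db.threads.dropIndex({ legacyScore: 1 })   // assumes such an index existed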
[02:50:15] <GothAlice> (Also note I need to store no extra data of any kind—against documents or the database in general—to track state in this arrangement.)
[02:50:42] <GothAlice> Same with data migrations; they're just a .py file bundled with a .yaml file (if bulk loading in my case) that streams the YAML into the DB.
[02:53:17] <GothAlice> http://irclogger.com/.mongodb/2014-11-06#1415252744-1415252790 — I'll have to re-gist since it was private code that I later deleted.
[02:54:41] <GothAlice> crocket: https://gist.github.com/amcgregor/3dc373b16b1a5b410e2f — here's a different migration than the last one I gave you, but demonstrates the idea.
[02:54:55] <cheeser> what's so mysterious about that?
[02:54:58] <JesusRifle> I'm going totally mad with some nodejs code, can anyone explain why this snippet starts printing null? http://hastebin.com/xubomataca.js
[02:55:05] <crocket> Does irclogger.com log every freenode channel?
[02:55:06] <GothAlice> (and irclogger is easier to link to than my own IRC logs)
[02:58:06] <GothAlice> Some migrations mutate existing documents or collections; that one just dumps data into one. What the migration itself does doesn't matter; as you can see, you can literally write whatever code you want.
[02:58:10] <cheeser> the migration updates the docs, deletes any old indexes, the app creates any missing indexes at start up
[02:58:10] <crocket> GothAlice, What if I shuffled fields?
[02:58:57] <crocket> I thought "the docs" meant the documentation.
[02:59:11] <cheeser> we weren't talking about documentation
[02:59:30] <JesusRifle> GothAlice: cheers for the help, im getting this back when printing error. { [MongoError: Connection Closed By Application] name: 'MongoError' }
[02:59:33] <crocket> When you say "docs", it usually means 'documentation'.
[02:59:47] <JesusRifle> I've also gone ahead and fixed the conflict
[03:04:03] <JesusRifle> I understand the code I have really doesn't do much, but I don't understand why I can't query for all documents and then query for them each individually one at a time
[03:04:03] <GothAlice> JesusRifle: /var/log/mongodb/mongod.log is the correct file; make sure mongod can write to it (ps aux | grep mongod — see who it is running as, make sure /var/log/mongodb can be entered by that user, too)
[03:05:06] <JesusRifle> there are entries in mongod.log but nothing I could see that was useful
[03:06:56] <JesusRifle> last thing is "waiting for connections on port 27017" all seems good
[03:06:58] <GothAlice> JesusRifle: db.close(); — what, precisely, does this do?
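The question hints that an asynchronous find() may still be in flight when db.close() runs, which would explain "Connection Closed By Application". If that's the case, the fix is simply to close only after the last callback has finished. A minimal Node.js sketch (database and collection names invented):

    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://localhost:27017/test', function (err, db) {
        if (err) throw err;
        db.collection('things').find().toArray(function (err, docs) {
            if (err) throw err;
            console.log(docs);
            db.close();   // close only once the last callback is done with the connection
        });
    });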
[03:11:46] <crocket> Python beats nodejs in documentation.
[03:12:44] <GothAlice> crocket: Except that a majority of the Node.js MongoDB questions I have attempted to assist with over the last two weeks have somehow come down to a fault in the operation or use of callbacks, or callbacks have obscured the problem.
[03:12:51] <GothAlice> Python beats JS in terms of obviousness.
[03:13:03] <crocket> GothAlice, You can avoid callbacks.
[03:14:56] <GothAlice> http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf — §9.3.1 ToNumber Applied to the String Type, page 44 through 46.
[03:15:53] <GothAlice> Python also incorporates an arbitrary precision type (good for finance) and transparent bignum support, so, like, I can actually do crypto in Python.
[03:16:11] <cheeser> yeah. i wish java had that feature
[03:23:16] <GothAlice> Frameworks are there to provide a stable foundation to build your own application on top of. What's the point if your foundation never solidifies? Make it modular (like WebCore is) and the framework itself need not change to accommodate different needs; you only need to pick different plugins.
[03:41:17] <GothAlice> One only *needs* websockets if one is transmitting binary data, AFAIK.
[03:41:55] <GothAlice> (And even then, you could just base64 things, but websockets would be more efficient on binary data, so even then, one wouldn't *need* it.)
[03:42:50] <crocket> GothAlice, You make it sound as if websocket were bad for real-time communication.
[03:43:41] <GothAlice> They're unneeded for real-time communication. Also a lot harder to debug. &c. &c.
[03:51:33] <crocket> Should I use mongodb driver or mongoose on nodejs?
[03:52:30] <GothAlice> (The reason I ask is that https://gist.github.com/amcgregor/2712270#file-jquery-nhpm-js is the entirety of the "library" supporting the chat widget I demonstrated.) Mongoose looks reasonable, and using schemas can be handy, esp. as a validation layer for input.
[03:53:14] <crocket> GothAlice, It relies on external HTTP push server.
[03:54:16] <GothAlice> crocket: That JavaScript is client-side, so yes, it relies on a messaging server of some kind. (I have an implementation in Python, NHPM provides one for Nginx, and the protocol is trivial.) https://pushmodule.slact.net/protocol.html
[03:55:06] <GothAlice> (Read HTTP: The Definitive Guide before attempting such a task.)
[03:55:14] <crocket> GothAlice, As I said, I decided to go with nodejs and websocket.
[03:55:54] <crocket> That way, I can just finish the job and be relieved of the job in 3 months.
[03:56:08] <crocket> I can do what I like after I quit.
[03:56:20] <GothAlice> crocket: You: "I need to make a web server." Me: "Here's one you can use as a point of reference." I'll add that it's an async server, very suitable for translation into Node.js, with much callback goodness.
[03:56:47] <crocket> GothAlice, I guess NHPM doesn't have enough support on nodejs.
[03:57:06] <GothAlice> crocket: Have you read the protocol doc?
[04:01:36] <GothAlice> (You can't randomly seek in a btree index.)
[04:02:07] <quuxman> but couldn't it crawl around, guessing how far to jump ahead?
[04:02:39] <GothAlice> quuxman: That would give highly variable performance all the time instead of relatively linear performance as you more deeply seek.
[04:02:54] <GothAlice> (One of the reasons why ordering the columns in compound indexes becomes important. ;)
[04:03:24] <GothAlice> (By order, I mean ascending/descending, OFC.)
[04:03:26] <quuxman> so how do you recommend implementing pagination with a query that does a group?
[04:04:54] <GothAlice> quuxman: In cases where seeking pops the query beyond the acceptable delay, I have it write results to a temporary collection, then use that instead. (Saving the time to re-aggregate on subsequent pages.) It "snapshots" the results when the query is made, and I have to record and clean-up the collections after they're no longer used. (15 minutes after last access.) If it's been cleaned up and another page is requested, we re-query (and
[04:04:54] <GothAlice> accept that it might be slow for the user that time.)
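A sketch of that snapshot approach with invented names: run the aggregation once with $out into a per-query collection, then page from it with plain find():

    // first page request: materialize the grouped results once
    db.orders.aggregate([
        { $match: { status: "complete" } },
        { $group: { _id: "$customer", total: { $sum: "$amount" } } },
        { $sort: { total: -1 } },
        { $out: "tmp_report_abc123" }   // the snapshot collection, dropped ~15 minutes after last use
    ])

    // every later page is a cheap read against the snapshot
    db.tmp_report_abc123.find().skip(40).limit(20)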
[04:05:32] <quuxman> crazy. Is pagination just fundamentally hard?
[04:06:22] <GothAlice> But I'm also notorious for not paginating things. I.e. I can emit a search for everything in my CMS, in a city dataset covering 10+ years of PDFs and whatnot (including result ranking, against an empty query), and display all results in < 2 seconds.
[04:09:40] <Boomtime> note there are bandwidth and db performance questions in the mix - if your search constraints are such that these are not applicable then the answer is also yes
[04:56:46] <GothAlice> (Mine by virtue of the fact that it's the one I use.)
[04:56:57] <quuxman> GothAlice: "$group does not order its output documents". Does that mean indexes on the collection that is grouped can't be used to order the output?
[04:57:39] <GothAlice> quuxman: However, I recall something about certain optimizations in recent 2.6 relating to having $sort immediately follow certain operations…
[04:58:51] <GothAlice> Nevermind, it was about $sort followed by $limit.
[04:58:59] <quuxman> but then on $first it says "Order is only defined if the documents are in a defined order." which I read as maintaining the order of the query that went into the $group
[05:46:38] <crocket> I'm getting started up with mongodb on nodejs.
[07:45:23] <Tark> hey guys! I have a collection of documents with field price: [10, 100] or price: [0, 500]. I need to return a cursor with fields without zero. Aggregation framework returns a CommandCursor without a count() method (why?)
[08:02:43] <crocket> Are collection names case-insensitive?
[08:34:22] <Soothsayer> Is there anything unusual about setting a BinData (md5) for the _id field?
[08:34:47] <Soothsayer> I am maintaining a collection that maps URLs to some kind of resource for URL rewriting. I’ve made the _id as the md5 binary of the URL so I don’t have to create another unique index
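A shell sketch of that, using the shell's built-in hex_md5() and HexData() helpers (BinData subtype 5 is the MD5 subtype); the collection and field names are invented:

    var url = "http://example.com/some/very/long/path?with=query&params=1";

    // store the 16-byte MD5 of the URL as the primary key
    db.rewrites.insert({ _id: HexData(5, hex_md5(url)), target: "/products/42" })

    // lookup at rewrite time uses the same digest
    db.rewrites.findOne({ _id: HexData(5, hex_md5(url)) })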
[09:31:02] <Hypfer> hi, any idea where I can get valid international addresses which I can simply import into mongo?
[09:49:49] <awasthinitin> anybody used any gui-based mongodb manager for a node server? Linux server
[10:35:24] <Avihay_work> Soothsayer: there's the possibility of hash collisions. I'm guessing that if your URLs are unique (or otherwise an md5 scheme would be pointless), you can use them as the _id, and I'm guessing mongo will use hashes with collision solving for the index in both cases
[10:35:52] <Avihay_work> disclaimer - I started reading about mongo "yesterday"
[10:40:21] <Soothsayer> Avihay_work: naa, I’m beyond those issues of hash collision. I’m more dealing with issues of value length of indexed field
[11:00:23] <kali> Soothsayer: don't worry that is fine
[11:00:59] <kali> Soothsayer: the first limit you're going to hit this way is at 700 bytes or so (the index will truncate the value)
[11:01:34] <Soothsayer> kali: got it.. but if im storing an md5 binary i won’t hit that limit anyway
[11:01:42] <Soothsayer> kali: so I can have really long URLs without any hassle
[11:02:25] <Soothsayer> kali: it's more work at the application and overall debugging / management level (even most Mongo GUIs - Robomongo, Rockmongo, etc. - do not know how to deal with a document with an _id as binary content)
[11:02:32] <Soothsayer> so i was wondering if its worth the effort
[11:02:33] <kali> i used exactly the same trick to manage wikipedia titles at some point
[11:04:02] <Soothsayer> kali: ye, got you, that was my earlier strategy.. but then felt if im being efficient, lets be really efficient if the effort is not exponential
[11:04:31] <kali> i think binary _id should be manageable :)
[12:51:14] <crocket> Why does mongoose convert the first capital letter to lowercase in the collection name?
[12:59:30] <kali> crocket: that's a "feature" borrowed from the rails framework: translating ClassName to collection_name. i hate it. most of the time, these frameworks also try (and regularly fail) to make up a plural name for the class and use it as the collection name
[12:59:49] <kali> crocket: mongoose certainly allows you to specify an arbitrary name for the collection anyway
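For example (a sketch): the collection name can be pinned either through the schema's collection option or via the third argument to mongoose.model():

    var mongoose = require('mongoose');

    var blogSchema = new mongoose.Schema(
        { title: String },
        { collection: 'Blog' }          // pin the collection name in the schema options...
    );

    var Blog = mongoose.model('Blog', blogSchema, 'Blog');   // ...or via the third argument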
[13:08:26] <crocket> kali, It's not even documented.
[15:09:44] <gswallow> We have been hit with this bug: https://jira.mongodb.org/browse/SERVER-15369
[15:11:08] <gswallow> Oddly, we are on the kernel that's supposed to fix this bug (3.13.0-39-generic #66~precise1). We are running Mongo 2.6.3.
[15:11:44] <gswallow> There's mention in the bug report that you can fix the corrupted namespace file, though with the size of the impacted database being a whopping 0.28 GB, I could just copy over the database. It's not hot.
[16:12:03] <ctorp> Is it possible to force the order of the keys in a .find() result?
[16:13:02] <ctorp> ie returned object {"first":bar, "last":baz} instead of {"last":baz, "first":bar}
[16:35:37] <TCMSLP> hey all - quick question - we're trying to disable anonymous access to mongo (2.4.6) via AUTH=true but we can still connect without credentials
[17:18:36] <GothAlice> What I said: that index can be used to filter results (used in the query), but data won't be pulled from the index to try to optimize the returned results.
[17:23:17] <GothAlice> Actually, that whole link you gave explains it: you can query a collection (retrieving data that requires lookup in the collection, which uses indexes to filter) or you can query an index (only retrieve data that is indexed, uses the index to filter and retrieve data.)
[17:23:32] <GothAlice> Indexes on embedded documents can't be used for the latter of those two.
[17:24:26] <GothAlice> It's not like the index won't otherwise be used, however. .find({"user.login": "tester"}) will use the index and be nicely efficient.
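Roughly, with invented collection and field names: a query on a top-level indexed field that projects only indexed fields can be answered from the index alone (a covered query), while an index on an embedded field still filters but the documents are fetched to build the output:

    db.accounts.ensureIndex({ login: 1 })
    // covered: both the filter and the returned data come straight from the index
    db.accounts.find({ login: "tester" }, { _id: 0, login: 1 })

    db.events.ensureIndex({ "user.login": 1 })
    // the index is still used to filter, but the documents themselves are read for the output
    db.events.find({ "user.login": "tester" })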
[17:52:58] <shoshy> hey, for some weird reason my mongo "stopped" working (on an instance), i used the bitnami one for quite some time, but since today it just stopped, here's what happens when i run mongo: http://pastie.org/9712248
[17:53:20] <shoshy> i want to backup the db, i was able to fork it... but not sure how i can update
[19:24:34] <NaN> how can I see the log or the last inserted data?
[19:27:36] <bazineta> Curious if anyone is using EC2 general purpose volumes for data, instead of provisioned. I'm having a hard time seeing the downside; they're faster, so why are they cheaper?
[19:28:22] <bazineta> i.e., 200GB 1K PIOPS is $90/month, but a GP SSD at 500GB is 1.5K to 3K burst IOPS, and it's $47/month.
[19:33:08] <bazineta> The second has the 3 IOPS/GB info
[19:36:16] <bazineta> So the confusing thing is, as I read that, let's say I bought a 1TB GP SSD volume. I can run 3K IOPS on that indefinitely. It's dramatically cheaper than an 3K PIOPS volume. So why would I consider a PIOPS volume?
[19:36:55] <kali> it has changed since the last time i had to provision capacity
[19:37:51] <bazineta> Yes, same here. I've used PIOPs to date, but now I'm starting to saturate during background flush, so was looking into options.
[19:41:15] <shoshy> hey, if i've copied all the files @ /data/db/ can i just copy them to a different instance with mongo and enjoy the "new" , restored db?
[19:41:27] <kali> bazineta: well, i'm like you. i can't make sense of it
[19:41:31] <bazineta> And it seems practically ideal; flush takes about 1.5 to 2 seconds out of every minute, so I'd always be full of burst credits.
[19:41:46] <kali> bazineta: might be the right time to get in touch with support if you have that option
[19:42:09] <shoshy> because something weird happened with my mongo, and i can't seem to get into the cli / run it properly.. so i've spawned a new instance and just want to restore it
[19:42:34] <kali> shoshy: if the files are "sane", it will work
[19:42:42] <bazineta> @shoshy journal as well if you have it separate, otherwise should be ok
[19:42:58] <kali> shoshy: provided that the mongod was switched off when you copied the data
[19:44:29] <shoshy> kali: i see, thanks! and how can i maybe try to run the cli of mongo? this is the error i get when i try to "sudo mongo" http://pastie.org/9712561
[19:49:38] <mwatson> Hi, I'm having an issue with duplicate keys during an aggregate query out operation, with the main objective of unwinding an array in each document. What's the recommended way to generate a new id for each document?
[21:51:19] <idni> hi. since i've never worked with mongodb - is it appropriate for things like e.g. a pretty big project management application? sounds hard to do something like this without joins
[21:51:50] <GothAlice> idni: MongoDB gives you the flexibility to model your data in very different ways than most are used to.
[21:53:01] <GothAlice> For example, instead of having a table of threads and a table of replies and a table of forums those threads belong to, and rely heavily on joins, you might have one "collection" of threads with the replies stored in the same record as the thread the replies belong to, and a collection of "forums".
[21:53:40] <hfp> Hi all, what is the best way to use mongoimport with a tsv file on travis-ci? All my builds fail because it doesn't load the mongo data... I have `mongoimport -d cities -c cities --file data/cities_canada-usa.tsv --jsonArray --type tsv --headerline` in my travis file. How do you guys do it?
[21:53:48] <GothAlice> MongoDB provides ways of returning specific subsets (slicing, get one by query) as well as atomically manipulate (such as append to a list, etc).
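A sketch of such a thread document and the kinds of targeted operations meant here (all names illustrative):

    var threadId = ObjectId();   // normally the _id of an existing thread
    db.threads.insert({
        _id: threadId,
        title: "Schema design questions",
        forum: "general",
        comments: [
            { author: ObjectId("547d9c3be4b0a1b2c3d4e5f7"), message: "First post", posted: new Date() }
        ]
    })

    // atomically append a reply to that thread
    db.threads.update(
        { _id: threadId },
        { $push: { comments: { author: ObjectId("547d9c3be4b0a1b2c3d4e5f8"), message: "A reply", posted: new Date() } } }
    )

    // fetch just the latest 10 replies of the thread
    db.threads.find({ _id: threadId }, { comments: { $slice: -10 } })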
[21:54:51] <idni> but, as in your forum example, what about the users table? threads are posted by users, which are (probably?) stored in a user "collection"?
[21:56:51] <GothAlice> You'd do manual joins where needed.
[21:57:19] <joannac> hfp: is your data actually a json array? doesn't sound like it
[21:57:43] <idni> but in big applications there will be tons of "manual joins"?
[21:57:46] <GothAlice> I.e. if you want to allow lookup of all threads containing a reply by a certain user, by username, you might db.users.findOne({username: 'GothAlice'}, {_id: 1}) to get the user's ID, then db.threads.find({'comment.author': GothAlice._id})
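Spelled out in the shell, that two-step "manual join" is just the following (names as in the example above):

    var author = db.users.findOne({ username: "GothAlice" }, { _id: 1 });
    if (author === null) {
        print("no such user");   // the error condition worth trapping explicitly
    } else {
        db.threads.find({ "comment.author": author._id }).forEach(printjson);
    }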
[21:58:05] <hfp> joannac: works locally with this argument, I didnt check without it
[21:58:28] <hfp> joannac: the mongoimport statement should be in `before_script` or in `script` ?
[21:58:46] <GothAlice> idni: This better models reality, though. It should be an error condition if a username doesn't exist, this provides a nice point to trap that condition.
[21:59:44] <GothAlice> idni: https://github.com/amcgregor/forums/blob/develop/brave/forums/component/thread/model.py#L69-L79 is an example of this model, in Python. :)
[22:08:36] <GothAlice> I'd read this from beginning to end; it includes specific coverage of typical relationship patterns.
[22:10:14] <idni> i am definitely going to read it, thanks!
[22:11:18] <GothAlice> My CMS datasets use a combination of nested set, parent references, array of ancestors, and materialized paths in order to model hierarchical site structures. (Each part of that optimizes a specific type of query about the tree.)
[22:12:23] <GothAlice> You can get pretty creative with MongoDB, and some apparent instances of "data duplication" may be a good thing… as always, MongoDB really forces you to think about how you want to *use* your data.
[22:13:40] <idni> i think i need to be creative with a nosql database? :P
[22:13:42] <cheeser> i was just having a conversation about that with some people here in the office
[22:16:35] <GothAlice> idni: Well, the first thing a CMS would do would be to map a URL to a document, but it'd also need to worry about the security of each folder up to the root. To query for that most efficiently I do two queries: query for "path" in every combination of path elements up (i.e. /, /foo/, /foo/bar/, /foo/bar/baz), ordered by the path, querying for ACL rules.
[22:17:09] <GothAlice> This finds the deepest document in the database matching that path, gives me an efficient way to run through the security for each level, and then I do a second query to get the rest of the data about the deepest document.
[22:19:31] <GothAlice> Breadcrumb lists are common in CMSes, too, so I also store a list of the IDs and titles of each parent, which combined with the path produce a complete breadcrumb navigation without having to load any extra data. (If a title is updated, you can query and update all references in one operation…)
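A sketch of that first query, assuming materialized path strings on a hypothetical "assets" collection:

    // for a request to /foo/bar/baz
    var prefixes = ["/", "/foo/", "/foo/bar/", "/foo/bar/baz"];

    // one query pulls the ACL rules of every ancestor plus the target itself;
    // sorting on path puts the deepest existing document last
    db.assets.find(
        { path: { $in: prefixes } },
        { path: 1, acl: 1 }
    ).sort({ path: 1 })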
[22:19:39] <GothAlice> Much thought went into the modelling. :)
[22:24:15] <idni> but, hmm, the more i think about it, the more complicated it gets :p
[22:25:50] <idni> for example, my current erp solution has a feed of all actions (new projects, changes in projects, etc.). whether these entries are shown depends on so many things (permissions, are you part of the project, etc.)
[22:26:56] <idni> the feed table itself just has a bunch of columns (action, user_id, project_id, etc.) - without joins, i cant imagine how to implement this feed :o
[22:29:13] <GothAlice> At work we event log. ¬_¬ Quite a bit, actually.
[22:29:21] <GothAlice> Including all HTTP requests and responses, with session data.
[22:30:44] <GothAlice> One approach I like to the security issue is the use of automatically generated textual tags; then each log entry can contain a few bytes per entity, group, role, or whatever is allowed to see the record.
[22:33:47] <idni> ive never heard of generated textual tags
[22:37:50] <GothAlice> idni: The group "Support Staff" has the tag "a". Any record support staff are allowed to see has the string "a" in a field containing a list of tags, etc. (You can even go so far as to have split readers/writers and limit your updates to only records containing tags relevant to the current user, and other interesting combinations. :)
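In query terms, the sketch is (collection, field, and tag values invented):

    // a log entry visible to the "Support Staff" group (tag "a") and one other group
    db.httplog.insert({
        path: "/admin/orders",
        tags: ["a", "q"]          // tags of the groups/roles allowed to see this record
    })

    // the current user's tags come from their groups; only matching records are returned
    var userTags = ["a", "x"];
    db.httplog.find({ tags: { $in: userTags } })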
[22:44:01] <hfp> joannac: I dont know, I dont get it. The tests pass on my machine but when I run it on travis it fails. the mongoimport runs fine but then the app can't see the data. I use localhost:27017 for the mongodb server in my app, is this different on travis?
[22:44:21] <GothAlice> idni: The tags are really just an optimization allowing me to avoid passing in large sets of "in"s in various combinations of $or and $and; it's like pre-aggregated security..
[22:46:37] <joannac> hfp: can you connect with the mongo shell and see the data?
[22:52:51] <joannac> hfp: I don't know what that means. travis just runs tests and possibly scripts for you. where's your actual mongod process?
[22:53:09] <GothAlice> joannac: Travis provides certain services. MongoDB is one of them.
[22:53:30] <joannac> so your mongodb server (where all your data lives) is in the cloud?
[22:55:29] <GothAlice> https://github.com/marrow/marrow.task/blob/develop/.travis.yml (note the "services:" section) and I use https://github.com/marrow/marrow.task/blob/develop/.travis/install.sh#L8-L12 to make sure it's current.
[22:56:09] <GothAlice> hfp: Link to the failing build?