pmxbot IRC Log Viewer

[00:02:17] <annoymouse> If I'm making a CMS for a blog type website, and I have users, who own blogs, who own posts, how should I represent it?

[00:02:41] <annoymouse> Should I make three models and place an "owner" attribute in the blogs and posts schema?

[00:03:15] <annoymouse> Or should I make an attribute in users that is a list of the blogs owned by that user

[00:03:24] <annoymouse> Or should I make an attribute in users that is a list of the ids blogs owned by that user

[00:03:30] <annoymouse> three options rn

[00:03:45] <annoymouse> Not sure which one works best with mongo though

[00:04:03] <GothAlice> How will you be querying your data?

[00:04:11] <GothAlice> Do you need to preserve the blogs in some arbitrary order?

[00:04:56] <annoymouse> GothAlice: I'm not really sure how I will query the data. And the blogs don't necessarily have to be in any order

[00:05:09] <annoymouse> But the posts should be sorted by rating or views

[00:07:44] <GothAlice> Then I'd say that blogs would have owners, and posts would have blogs. (Don't use a generic name between collections; it makes things confusing. ;)

[00:13:31] <bazineta> I would have a colleciton of blogs, a collection of users, and an authorization colleciton that associates a user objectID to a blog objectID with an associated permission level, i.e., owner, manager, viewer, etc.

[00:13:48] <bazineta> Also, I would spell collection correctly.

[00:15:14] <bazineta> Then a collection of posts, associated to a blog via the blog objectID

[00:20:02] <annoymouse> Now I'm really confused

[00:20:31] <annoymouse> bazineta: What's an authorization collection?

[00:20:59] <bazineta> So, you have users. You have blogs.

[00:21:04] <annoymouse> Right

[00:21:27] <bazineta> You want to associate a blog to a user, but think of the case where a given blog might have more than one manager or owner.

[00:21:53] <bazineta> That's a many to one situation, so you can model that a few ways, but a collection associating the two is often convenient.

[00:22:27] <bazineta> That collection would contain two referenence objectIDs, i.e., a blog objectID and a user objectID, then some permission level saying, this user has this authority to this blog.

[00:23:22] <GothAlice> https://github.com/bravecollective/forums/tree/develop/brave/forums/component — this is forum/bulletin board software with tag-based security. (Users have groups, specifically.) If your blogs can have comments, a forum is a pretty close analog.

[00:24:37] <GothAlice> In this case there are forums (blogs; collection), with threads (posts; collection), and comments (embedded in threads), with users "owning" everything.

[00:24:49] <bazineta> There are many paths, but I've found the authorization collection to be easy. One could also handle it via an array of users in the blog object. There are positives and negatives to each approach.

[00:25:00] <annoymouse> I plan on using Disqus for comments

[00:25:59] <annoymouse> bazineta: And then the blogs would have the posts

[00:26:05] <GothAlice> Forums/threads remain a similar concept to blogs/posts. (And good on you for not reinventing the comments wheel.)

[00:26:19] <bazineta> Most of what we deal with is unbounded and large, so a collection works well. If it's bounded and small, having the relations right in the document is a fine approach. Depends on the application, really.

[00:27:18] <GothAlice> That's why I use a different abstraction; don't embed users into the other records, embed roles (singular, or multiple, unique, or not).

[00:27:53] <GothAlice> The forums there use simple string "tag"-based security (with users having "tags" supplied by an upstream auth system).

[00:38:16] <androidbruce> Anyway in Mongojs to chain a mapreduce without writing a tmp collection between map reduces?

[00:45:34] <Boomtime> androidbruce: not that I know of in MR, but what you want sounds like aggregation

[00:45:47] <androidbruce> Oh?

[00:46:39] <Boomtime> you can chain multiple match/group stages together, which is roughly analogous to what you're doing in a single MR

[00:47:30] <Boomtime> of course, in MR you get to write the function yourself in free form code, where-as in aggregation you need to be able to frame your match/group into the operators you have available

[00:47:46] <Boomtime> aggregation is much more efficient though

[00:48:00] <androidbruce> Incremental MR a possibility?

[00:49:05] <Boomtime> somewhat: http://docs.mongodb.org/manual/tutorial/perform-incremental-map-reduce/

[00:49:51] <androidbruce> Thanks Boomtime I'll grep through

[00:50:01] <Boomtime> seriously though, MR is what you use when you have exhausted literally everything else, or if you just don't give a crap about database performance

[00:50:38] <cheeser> it's the regex of analytics :D

[00:51:00] <androidbruce> Hah

[00:51:03] <GothAlice> https://gist.github.com/amcgregor/7bb6f20d2b454753f4f7

[00:51:16] <GothAlice> ^ This is a comparison between map/reduce and aggregate query approaches to the same thing.

[00:51:29] <GothAlice> (With a Python snippit at the end to encapsulate storage of aggregate queries within MongoDB itself.)

[00:52:06] <androidbruce> We have some monster collections we need to get a handle on

[00:52:12] <androidbruce> Like billions of records

[00:52:22] <androidbruce> So

[00:52:34] <androidbruce> MR seemed to be an option

[00:52:50] <Boomtime> you should try aggregation first

[00:53:39] <GothAlice> I have a cluster at home containing 25 TiB of data… and I use aggregate queries. Not needing to execute a JavaScript runtime (and avoiding some extra locking AFIK) makes these indexed queries insanely fast.

[00:54:05] <GothAlice> (Always $match at the beginning of your aggregate pipeline to benefit from indexes.)

[00:55:54] <GothAlice> You can also increase the amount of RAM (or tell it to page to disk) over-large queries. See: http://docs.mongodb.org/manual/core/aggregation-pipeline-limits/

[00:57:08] <androidbruce> Very good information

[00:57:15] <androidbruce> Thanks for all the help

[00:57:52] <GothAlice> No worries.

[01:05:26] <GothAlice> androidbruce: http://pauldone.blogspot.ca/2014/03/mongoparallelaggregation.html is another link you may be interested in; this is about the performance difference between a highly optimized map/reduce approach and both unoptimized and client-side parallel optimizations for aggregate queries.

[01:05:31] <GothAlice> (Aggregate queries win by a lot.)

[01:06:34] <bazineta> Good info, Alice, thanks.

[01:41:16] <annoymouse> bazineta: Quick question about what you suggested. Why do I need a separate collection for perms? Can't I just make an attribute in blog that lists who can edit?

[01:42:55] <bazineta> That's also ok. Depends on how granular you need to be and if there will be a lot or a few. If it's a lot, then a collection works well, if it's a few, putting it in the document works.

[01:48:11] <peterp> Hi guys, I'm looking to connect to a remote mongodb server. is SSH tunneling the best way to accomplish this in terms of reasonable security?

[01:48:49] <peterp> This instance will be the staging database

[01:48:53] <GothAlice> peterp: That's one approach. I use SSL and IPSec firewall setups.

[01:49:44] <GothAlice> SSH isn't generally meant for persistent connections, though, so I try to avoid it for anything that might need to be connected >24h.

[01:50:04] <peterp> for SSL I would need a certificate right?

[01:50:09] <GothAlice> Correct.

[01:50:17] <GothAlice> Though you can technically generate one of your own.

[01:50:27] <GothAlice> (Crypto is crypto.)

[01:50:32] <peterp> If its technically, I probably shouldnt go down that route?

[01:50:56] <peterp> Basically, I'm new to separating the database from the rest of the application, so I'm having trouble with connecting to my remote mongoDB

[01:51:12] <peterp> I did find SSH tunneling to do the trick, but like you're saying, I'm not sure this is an ideal route to take

[01:51:28] <GothAlice> peterp: Likely your mongod process is only listening on 127.0.0.1 (localhost).

[01:51:35] <peterp> Since this is a marketing website, there's nothing I need to hide that the user won't already see

[01:51:36] <GothAlice> Which is why SSH tunnelling would work.

[01:51:42] <peterp> Yes, that is the case

[01:51:52] <peterp> Do you think over the internet is a better approach than SSH tunneling?

[01:52:09] <GothAlice> Not at all.

[01:53:18] <GothAlice> If your application and database hosts are in the same datacenter, you may be able to set up a VPN between them, or your provider may offer a way to build private virtual machine networks. If at any point you need to connect over an un-trusted wire, use encryption. (Generally means encrypted VPN tunnels or SSL encryption.)

[01:53:27] <crocket> I want to manage mongodb schema versions on nodejs.

[01:53:32] <crocket> What tools do I have?

[01:53:38] <crocket> mongo-migrate, mongopatch, ...

[01:54:04] <peterp> So I should avoid using a public IP for mongo to listen to

[01:54:19] <peterp> I see. That makes sense. I am using digitalocean, perhaps I will be in the same data center

[01:54:45] <GothAlice> T_T digitalocean

[01:54:59] <peterp> is T_T a good or bad thing

[01:55:00] <crocket> GothAlice, hello again

[01:55:19] <crocket> GothAlice, I'm looking at mongopatch, and it doesn't seem to provide a mechanism to manage indexes.

[01:55:31] <crocket> Am I wrong?

[01:55:40] <GothAlice> You write the createIndex/ensureIndex commands in the upgrade portion of your migrations.

[01:55:50] <GothAlice> Why would it need to "provide" anything explicitly for indexes?

[01:55:50] <peterp> VPN is the route I am taking. Thank you GothAlice

[01:55:53] <crocket> GothAlice, But then, there is no built-in versioning.

[01:56:24] <crocket> An index may exist in various configurations in multiple versions.

[01:56:41] <crocket> How does mongopatch know what patch to apply?

[01:58:39] <crocket> GothAlice, Did you use mongopatch?

[01:58:54] <crocket> mongo-migrate stores finished migrations in migrations collection.

[01:59:53] <GothAlice> crocket: We established, the last time you were asking these questions, that you only ever need to worry about upgrading, not downgrading. MongoPatch migrations have a "shouldUpdate" callback; even if it doesn't provide versioning etc. you could roll it yourself quite easily if you wanted it.

[02:00:10] <GothAlice> I use none of these systems, I rolled my own very similar to mongopatch for Python.

[02:00:28] <crocket> GothAlice, How does mongopatch know which patches to apply if it doesn't have a versioning mechanism?

[02:00:55] <GothAlice> Please refer to the fine manual: https://www.npmjs.org/package/mongopatch#runing-patches

[02:02:49] <crocket> GothAlice, It seems to detect a document versino by simply inspecting document keys.

[02:03:09] <crocket> It's like feature detection which I don't know how to do reliably.

[02:03:23] <crocket> versino -> version

[02:08:57] <crocket> OK

[02:08:58] <crocket> mongopatch logs individual document changes in a log db.

[02:09:11] <crocket> Does it log index changes?

[02:16:21] <crocket> GothAlice, I don't think mongopatch is up for the job.

[02:23:19] <crocket> I think mongodb migration is split in two parts.

[02:23:23] <crocket> The one is index migration.

[02:23:30] <crocket> The other is document migration.

[02:23:41] <crocket> Document migration should be done for each document.

[02:23:52] <crocket> Index migration can be done per collection?

[02:24:06] <crocket> I think index migration can be done on a database.

[02:31:56] <crocket> How do I handle index migration and document migration on mongodb?

[02:33:35] <cheeser> handle how? what do you mean by migration?

[02:33:46] <crocket> schema versioning

[02:33:52] <cheeser> oh.

[02:34:07] <crocket> Index and document should be handled separately.

[02:34:16] <cheeser> well, we write code to reshape our documents and drop/create indexes

[02:34:24] <crocket> cheeser, we write code?

[02:34:32] <crocket> Who is "we"?

[02:34:40] <cheeser> me and the people i work with

[02:35:03] <crocket> You write a customized solution to that problem?

[02:35:16] <cheeser> yes

[02:35:21] <crocket> That's not very good

[02:35:27] <GothAlice> I think the key point that hasn't been grasped is that this isn't really a _problem_ in MongoDB.

[02:35:29] <crocket> Everyone has to reinvent the wheel.

[02:35:31] <cheeser> it works

[02:35:40] <GothAlice> It's so not a problem that most migrations are a few lines of JS injected into the MongoDB shell.

[02:35:58] <cheeser> well, ours are java based but yeah.

[02:36:04] <crocket> GothAlice, Yet, it is a lot of work if you don't automate it for years.

[02:36:24] <cheeser> if you can go years without a schema migration, then there's no code to write anyway.

[02:36:29] <GothAlice> crocket: I gave you my automation. It was one line. On each pull check for the addition of migrations, if found, run them.

[02:36:44] <crocket> GothAlice, your automation?

[02:37:14] <crocket> cheeser, Schema migration happens.

[02:37:26] <crocket> I want to deal with it with automation.

[02:38:14] <crocket> If a new guy joins a company and has no clue about the history of mongodb schema, he's gone.

[02:38:16] <GothAlice> crocket: What's the #1 migration in most databases? Adding a field. In MongoDB, you don't need a migration for that, you just somedocument.get('myNewField', someDefaultValue) and boom, it works, regardless of if the record defines the field.

[02:38:17] <cheeser> crocket: yes, it does. and we write code when that happens.

[02:38:35] <cheeser> i think you're inventing problems where none exist

[02:38:39] <crocket> cheeser, It would have been a lot better if you wrote a generalized solution.

[02:38:59] <cheeser> the history of the schema is pointless to a new developer. the current state is all that matters.

[02:39:10] <GothAlice> crocket: If no current solution fits your needs, that "write a generalized solution" thing cuts both ways.

[02:39:54] <crocket> cheeser, Not all documents are in the same version.

[02:40:10] <GothAlice> crocket: And here is where I gave you my automation: http://irclogger.com/.mongodb/2014-11-06#1415253098 (also mentioned other uses for that exact automation)

[02:40:39] <GothAlice> crocket: Mismatched versions don't mater for 99% of my migrations.

[02:40:40] <crocket> GothAlice, Wuh....

[02:41:08] <crocket> What about index versioning?

[02:41:26] <GothAlice> (Even if the added field is a calculated field, it'll be calculated on first use then saved with whatever other update gets run on that document.)

[02:41:29] <GothAlice> I don't.

[02:41:32] <GothAlice> There's no point.

[02:41:46] <GothAlice> I only maintain a current state on indexes.

[02:42:08] <crocket> GothAlice, Do you mean you just store the latest version of indexes in your git repository?

[02:42:51] <GothAlice> Yup! Some code that runs on startup runs ensureIndex across all of the *current* indexes (in Python models) and to hell with the old ones. (If I need to remove one, I write a trivial migration to do it when I clean-up the associated fields.)

[02:43:20] <GothAlice> Since versioning your indexes separately from the rest of your model basically never happens.

[02:44:00] <crocket> I'm very confused

[02:46:44] <GothAlice> I have an application. It contains a data model (schema). This is the canonical (current) data structure, always. (master branch = production, develop = development staging, feature/* = where code gets written) In production and staging there is a git hook that pretty much simply runs any .py files that get added to the "migrations" folder. Pull, if there's a migration, it runs.

[02:47:06] <GothAlice> These migrations are trivial chunks of basic MongoDB operations. Insert some records, transform some fields, do whatever.

[02:47:50] <GothAlice> None of them ever add an index; none of them need to. Each time my application service starts, it runs through all known models and runs db.collection.ensureIndex across each one. If they didn't exist before, they will now.

[02:48:39] <cheeser> e.g., using morphia, you'd call datastore.ensureIndexes()

[02:48:48] <cheeser> do that at startup and you're set

[02:48:49] <GothAlice> If I remove some fields, and they were indexed, in the migration where I delete the fields from all documents ($unset, I believe; it's late ^_^) I'll also delete the indexes.

[02:49:18] <GothAlice> Indexes are tied very heavily to the model structure, so there's no point separately "versioning" the migrations, and in my case, no point "versioning" at all; migrations are always run when first pulled.

[02:49:28] <crocket> ok

[02:49:37] <crocket> I can call ensureIndex on startup.

[02:49:40] <GothAlice> (If several will run in one pull, and order matters, the list of scripts is sorted alphabetically first.)

[02:49:40] <crocket> What about data migration?

[02:50:15] <GothAlice> (Also note I need to store no extra data of any kind—against documents or the database in general—to track state in this arrangement.)

[02:50:42] <GothAlice> Same with data migrations; they're just a .py file bundled with a .yaml file (if bulk loading in my case) that streams the YAML into the DB.

[02:50:47] <GothAlice> It's… like, three lines.

[02:50:57] <crocket> GothAlice, Can you show me an example?

[02:52:56] <GothAlice> Right, I had one that I already showed you.

[02:53:11] <crocket> The git hook command?

[02:53:17] <crocket> I want to see .py

[02:53:17] <GothAlice> http://irclogger.com/.mongodb/2014-11-06#1415252744-1415252790 — I'll have to re-gist since it was private code that I later deleted.

[02:53:38] <crocket> What is irclogger?

[02:54:20] <cheeser> the name pretty much says it...

[02:54:40] <crocket> What is irclogger.com for?

[02:54:41] <GothAlice> crocket: https://gist.github.com/amcgregor/3dc373b16b1a5b410e2f — here's a different migration than the last one I gave you, but demonstrates the idea.

[02:54:50] <cheeser> for logging irc, of course.

[02:54:55] <cheeser> what's so mysterious about that?

[02:54:58] <JesusRifle> I'm going totally mad with some nodejs code, can anyone explain why this snippet starts printing null? http://hastebin.com/xubomataca.js

[02:55:05] <crocket> Does irclogger.com log every freenode channel?

[02:55:06] <GothAlice> (and irclogger is easier to link to than my own IRC logs)

[02:55:21] <cheeser> crocket: of course not

[02:55:49] <GothAlice> JesusRifle: console.log(err); // See what's actually happening there.

[02:56:18] <cheeser> JesusRifle: i'm not a big node guy, but i'd rethinking those funcation parameter names passed to that findOne() callback

[02:56:40] <GothAlice> There's some conflict there, yes.

[02:57:06] <crocket> GothAlice, That looks like a static collection versioning.

[02:57:10] <cheeser> it *should* be fine given the standard scoping rules but javascript is weird like that and removing any ambiguity can only help

[02:57:17] <crocket> Versioning over a static collection that's not meant to change.

[02:57:22] <GothAlice> crocket: It's not versioning. There is no versioning.

[02:57:29] <GothAlice> Migration gets added to the code: it gets run.

[02:57:31] <GothAlice> Bam, that's it.

[02:58:06] <GothAlice> Some migrations mutate existing documents or collections, that one just dumps data into one. What the migration itself does doesn't matter, you can see you can literally write whatever code you want.

[02:58:10] <cheeser> the migration updates the docs, deletes any old indexes, the app creates any missing indexes at start up

[02:58:10] <crocket> GothAlice, What if I suffled fields?

[02:58:37] <crocket> cheeser, the docs?

[02:58:40] <GothAlice> documents

[02:58:47] <cheeser> geez

[02:58:57] <crocket> I thought "the docs" meant the documentations.

[02:59:11] <cheeser> we weren't talking about documentation

[02:59:30] <JesusRifle> GothAlice: cheers for the help, im getting this back when printing error. { [MongoError: Connection Closed By Application] name: 'MongoError' }

[02:59:33] <crocket> When you say "docs", it usually means 'documentations'.

[02:59:47] <JesusRifle> I've also gone ahead and fixed the conflict

[02:59:48] <cheeser> no. it doesn't.

[03:00:22] <GothAlice> JesusRifle: What does it say in your MongoDB logs?

[03:00:25] <crocket> cheeser, We came from different backgrounds, so don't expect people to know what abbreviations mean all the time.

[03:00:47] <cheeser> well, i do expect people to apply a little critical reasoning and context clues.

[03:01:13] <crocket> cheeser, I thought you were talking about updating documentations via some bizzare procedures.

[03:01:22] <crocket> It was a possibility that I couldn't rule out.

[03:01:36] <GothAlice> crocket: That code could update the "docs" if it wanted to.

[03:01:37] <GothAlice> That's the point.

[03:01:45] <GothAlice> That code is generic code; nothing db-specific.

[03:02:07] <cheeser> crocket: that's an incredibly long leap

[03:02:10] <GothAlice> Just so happens to be handy to run code to change the database when deploying, so most of them do do databasey things.

[03:02:16] <crocket> cheeser, As I said different backgrounds.

[03:02:27] <cheeser> clearly

[03:02:32] <crocket> cheeser, And, we haven't shared much in common.

[03:02:37] <JesusRifle> which logs should i be looking at? ivs tried tailing /var/log/mongodb/mongod.log

[03:02:47] <GothAlice> JesusRifle: What OS are you using?

[03:02:57] <JesusRifle> linux, archlinuc

[03:03:03] <JesusRifle> archlinux

[03:03:06] <cheeser> ah yeah!

[03:04:03] <JesusRifle> I understand the code I have reallt doesnt do much, but I don't understand why I can't query for all documents and then query for them each individually one at a time

[03:04:03] <GothAlice> JesusRifle: /var/log/mongodb/mongod.log is the correct file; make sure mongod can write to it (ps aux | grep mongod — see who it is running as, make sure /var/log/mongdb can be entered by that user, too)

[03:05:06] <JesusRifle> there are entries in mongod.log but nothing I could see that was useful

[03:06:56] <JesusRifle> last thing is "waiting for connections on port 27017" all seems good

[03:06:58] <GothAlice> JesusRifle: db.close(); — what, precisely, does this do?

[03:07:06] <GothAlice> (in your original code)

[03:07:26] <JesusRifle> bugger, im closing before all the callbacks have finished :)

[03:07:49] <crocket> Does anyone use Mongoose on nodejs?

[03:08:35] <JesusRifle> brilliant thanks GothAlice

[03:08:42] <GothAlice> Is Mongoose the one that generates completely random ObjectIds?

[03:09:02] <crocket> I don't know.

[03:09:08] <crocket> Mongoose looks good?

[03:09:38] <GothAlice> I avoid JS, and noted adding another item to the reasons why I don't use it as a primary language, so no. It looks terrible. ;P

[03:09:50] <GothAlice> (And no, Mongoose does right by its ObjectId instances.)

[03:10:21] <crocket> GothAlice, What do you use?

[03:10:22] <crocket> python?

[03:10:28] <GothAlice> Yes. And lots-o BASH.

[03:10:38] <crocket> Python is not significantly better than js.

[03:10:42] <JesusRifle> yes as you might have guessed this is my first time with node

[03:10:53] <GothAlice> crocket: Reeeeeally nau.

[03:11:12] <crocket> Python is clean, but once you get the hang of JS, you don't feel much of difference.

[03:11:26] <cheeser> new version of Motor today

[03:11:27] <crocket> At least in nodejs.

[03:11:37] <crocket> OK.

[03:11:46] <crocket> Python beats ndoejs in documentation.

[03:12:44] <GothAlice> crocket: Except a majority of Node.js MongoDB questions I have attempted to assist with for the last two weeks has somehow faulted the operation or use of callbacks, or callbacks have obscured the problem.

[03:12:51] <GothAlice> Python beats JS in terms of obviousness.

[03:13:03] <crocket> GothAlice, You can avoid callbacks.

[03:13:19] <crocket> You can use promises.

[03:14:56] <GothAlice> http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf — §9.3.1 ToNumber Applied to the String Type, page 44 through 46.

[03:15:05] <crocket> It seems python is better.

[03:15:08] <crocket> not much, though.

[03:15:53] <GothAlice> Python also incorporates an arbitrary precision type (good for finance) and incorporates transparent bignum support, so, like, I can actually do crypto in Python.

[03:16:11] <cheeser> yeah. i wish java had that feature

[03:16:14] <cheeser> and array slicing

[03:16:18] <GothAlice> And Pypy's JIT does some mind-bending things. (Including producing Fibonacci solvers that run faster than equivalent C code.)

[03:17:03] <GothAlice> Python also runs PHP and Ruby code faster than PHP (or even Facebook's PHP-to-C++ compiler) or Ruby.

[03:17:04] <GothAlice> ;P

[03:17:08] <crocket> I learned python 3 a while ago.

[03:17:11] <cheeser> PHP is terrible

[03:17:15] <crocket> I could probably start a new project in python 3.

[03:17:18] <cheeser> no one should ever use it

[03:17:31] <GothAlice> cheeser: (You skipped over the funny bit where Python did PHP better than PHP.)

[03:17:38] <crocket> I need to make a websocket chat application with mongodb as storage.

[03:17:46] <crocket> What web frameworks do you recommend for python?

[03:17:51] <cheeser> sorry. the black hole of PHP suckitude snagged me

[03:17:54] <GothAlice> I run my own: http://web-core.org

[03:18:20] <GothAlice> It's been in use for about 10 years, stable and unmodified in… 6 or so?

[03:18:25] <GothAlice> *in/for

[03:18:32] <crocket> GothAlice, That means dead.

[03:18:37] <GothAlice> Except not.

[03:18:58] <crocket> unmodified in 6 years?

[03:19:13] <cheeser> that means stable

[03:19:14] <GothAlice> Pretty much. WSGI hasn't changed in even longer.

[03:19:18] <cheeser> dead means not in use.

[03:19:22] <GothAlice> ^

[03:19:36] <crocket> I wouldn't use a project that hasn't had a single change in the last 6 years.

[03:19:37] <GothAlice> It's a microframework that predates most of the term's usage in Python. (I.e. before Flask and Bottle.)

[03:19:44] <GothAlice> crocket: Your loss. :)

[03:19:58] <crocket> It means the community around it dead ,too.

[03:20:02] <GothAlice> https://github.com/marrow/WebCore/blob/rewrite/examples/monster.py — these are examples of various controller structures

[03:20:33] <crocket> I need to make a web chat application with websocket.

[03:20:35] <crocket> Which framework?

[03:20:39] <GothAlice> crocket: It has users, including several cities.

[03:20:42] <crocket> I probably need to go to #python

[03:20:44] <GothAlice> crocket: Wrong question.

[03:20:56] <GothAlice> crocket: http://ecanus.craphound.ca/chat/

[03:21:01] <crocket> GothAlice, If it has users, users should have submitted pull requests.

[03:21:16] <GothAlice> crocket: Why? If there was nothing that needs to be added?

[03:21:25] <GothAlice> https://gist.github.com/amcgregor/2712270 is the code behind that chat widget.

[03:21:30] <crocket> GothAlice, Customer needs can't be anticipated in advance.

[03:21:36] <cheeser> really, crocket? you're going to lecture GothAlice about the health of her own framework she's used happily for 10 years?

[03:21:40] <GothAlice> crocket: Then you fail to understand what a framework is for.

[03:21:57] <crocket> GothAlice, It seems your project is a very exception.

[03:21:58] <GothAlice> Try out the chat example, though; I may be able to save you a lot of time.

[03:22:39] <crocket> GothAlice, Where is the chat example again?

[03:22:47] <crocket> I need a live demo.

[03:23:04] <crocket> Oh, that link

[03:23:16] <GothAlice> Frameworks are there to provide a stable foundation from which to build your own application on top of. What's the point if your stable never solidifies? Make it modular (like WebCore is) and the framework itself need not change to accommodate different needs, you only need to pick different plugins.

[03:41:17] <GothAlice> One only *needs* websockets if one is transmitting binary data, AFIK.

[03:41:55] <GothAlice> (And even then, you could just base64 things, but websockets would be more efficient on binary data, so even then, one wouldn't *need* it.)

[03:42:50] <crocket> GothAlice, You make it sound as if websocket were bad for real-time communication.

[03:43:41] <GothAlice> They're unneeded for real-time communication. Also a lot less manageable to debug. &c. &c.

[03:44:08] <cheeser> they're not so bad to debug.

[03:44:18] <crocket> Not on nodejs

[03:44:22] <cheeser> wireshark, e.g., supports the framing format

[03:44:41] <GothAlice> Yeahp. Too cool for JSON. ;P

[03:46:09] <cheeser> nodejs is orthogonal to websockets.

[03:46:26] <cheeser> there are many reasons to not use nodejs. that's just one more.

[03:46:47] <crocket> cheeser, For now, I have decided to go with node.js

[03:46:58] <crocket> I'm going to quit my current company in 3 months anyway.

[03:47:14] <crocket> And, I'm already familiar with nodejs ecosystem.

[03:47:17] <cheeser> i'm not going back down that rabbit hole

[03:47:35] <crocket> cheeser, as if python were not a rabbit hole.

[03:47:58] <cheeser> i'm not going to debate the issue.

[03:48:10] <crocket> cheeser, java is not better in my opinion.

[03:50:05] <crocket> GothAlice, There are some mature websocket libraries for nodejs and web browsers.

[03:50:16] <GothAlice> crocket: Why need a "library"?

[03:50:39] <crocket> GothAlice, You don't want to deal with a protocol without a library.

[03:51:04] <GothAlice> Would jQuery count as a "library" in this case, even though XHR is a minor part of the generalist package?

[03:51:26] <crocket> Anyway

[03:51:33] <crocket> Should I use mongodb driver or mongoose on nodejs?

[03:52:30] <GothAlice> (The reason I ask is that https://gist.github.com/amcgregor/2712270#file-jquery-nhpm-js is the entirety of the "library" supporting the chat widget I demonstrated.) Mongoose looks reasonable, and using schemas can be handy, esp. as a validation layer for input.

[03:53:14] <crocket> GothAlice, It relies on external HTTP push server.

[03:53:22] <crocket> doensn't it?

[03:54:16] <GothAlice> crocket: That JavaScript is client-side, so yes, it relies on a messaging server of some kind. (I have an implementation in Python, NHPM provides one for Nginx, and the protocol is trivial.) https://pushmodule.slact.net/protocol.html

[03:54:17] <crocket> I need to make a web server.

[03:54:50] <GothAlice> crocket: https://github.com/marrow/marrow.server.http/blob/develop/marrow/server/http/protocol.py#L31 — I've done that, too. (And they're harder than you think.)

[03:54:51] <GothAlice> ;)

[03:55:06] <GothAlice> (Read HTTP: The Definitive Guide before attempting such a task.)

[03:55:14] <crocket> GothAlice, As I said, I decided to go with nodejs and websocket.

[03:55:54] <crocket> That way, I can just finish the job and be relieved of the job in 3 months.

[03:56:08] <crocket> I can do what I like after I quit.

[03:56:20] <GothAlice> crocket: You: "I need to make a web server." Me: "Here's one you can use as a point of reference." I'll add that it's an async server, very suitable for translation into Node.js, with much callback goodness.

[03:56:47] <crocket> GothAlice, I guess NHPM doesn't have enough support on nodejs.

[03:57:06] <GothAlice> crocket: Have you read the protocol doc?

[03:57:10] <crocket> GothAlice, no

[03:57:15] <GothAlice> Right-o, then.

[03:57:17] <crocket> GothAlice, I don't care protocols.

[03:57:22] <crocket> I care about API.

[03:57:30] <crocket> and community support.

[03:58:51] <quuxman> Is there a way to make a query where a certain field is not duplicated in the results?

[03:59:00] <GothAlice> quuxman: Define duplicated?

[03:59:32] <quuxman> so if foo.entity_id is the same as a previous foo.entity_id in the result set, it would skip over it

[03:59:56] <GothAlice> quuxman: That's an aggregate query and use of $first in a $group operation.

[04:00:24] <quuxman> sounds good

[04:00:54] <GothAlice> (Group on foo.entity_id, then $first everything? I suspect there is a smoother way.)

[04:01:03] <quuxman> Another related question: is limit and skip still scan all the records that would've been scanned without those options?

[04:01:24] <GothAlice> AFIK yes, skip will always either walk the collection, or walk an index.

[04:01:35] <quuxman> :-[

[04:01:36] <GothAlice> (You can't randomly seek in a btree index.)

[04:02:07] <quuxman> but couldn't it crawl around, guessing how far to jump ahead?

[04:02:39] <GothAlice> quuxman: That would give highly variable performance all the time instead of relatively linear performance as you more deeply seek.

[04:02:54] <GothAlice> (One of the reasons why ordering the columns in compound indexes becomes important. ;)

[04:03:24] <GothAlice> (By order, I mean ascending/descending, OFC.)

[04:03:26] <quuxman> so how do you recommend implementing pagination with a query that does a group?

[04:04:54] <GothAlice> quuxman: In cases where seeking pops the query beyond the acceptable delay, I have it write results to a temporary collection, then use that instead. (Saving the time to re-aggregate on subsequent pages.) It "snapshots" the results when the query is made, and I have to record and clean-up the collections after they're no longer used. (15 minutes after last access.) If it's been cleaned up and another page is requested, we re-query (and

[04:04:54] <GothAlice> accept that it might be slow for the user that time.)

[04:05:32] <quuxman> crazy. Is pagination just fundamentally hard?

[04:05:59] <Boomtime> to do well, yes

[04:06:04] <Boomtime> it's easy to do badly

[04:06:22] <GothAlice> But I'm also notorious for not paginating things. I.e. I can emit a search for everything in my CMS in a city dataset covering 10+ years of PDFs and whatnot (including result ranking on nothing) and display all results in < 2 seconds.

[04:06:36] <GothAlice> One page.

[04:07:06] <GothAlice> Why obscure data from the users when you don't need to, eh? ;)

[04:07:45] <Boomtime> depends if you are prepared to send your database to the user

[04:08:04] <GothAlice> I'm willing to send the title, link, and description of every asset they have authorized access to, yes.

[04:08:16] <Boomtime> then sure, send away

[04:09:40] <Boomtime> note there is bandwidth and db performance questions in the mix - if you're search constraints are such that these are not applicable then the answer is also yes

[04:09:48] <Boomtime> you're = your

[04:50:22] <fly-away> Starting mongod: /usr/bin/dirname: extra operand `2>&1.pid'

[04:50:22] <fly-away> Try `/usr/bin/dirname --help' for more information.

[04:50:29] <fly-away> great start, mongo

[04:51:06] <GothAlice> fly-away: I suspect you are laying blame in the wrong place. How, exactly, are you attempting to start the service?

[04:51:44] <fly-away> /etc/init.d/mongod start

[04:51:50] <GothAlice> Which distro?

[04:52:01] <fly-away> centos 6

[04:52:05] <GothAlice> Ouch.

[04:52:30] <GothAlice> Well, there's a problem with your initscript. How did you install mongodb?

[04:52:51] <GothAlice> (MongoDB doesn't ship from MongoDB with an initscript…)

[04:52:54] <GothAlice> (AFIK)

[04:53:17] <fly-away> there is a problem with mongodb-org package

[04:53:37] <fly-away> which I just got from official repo

[04:54:06] <GothAlice> "2>&1.pid" looks more like a configuration issue. Is there a /etc/conf.d/mongod or /etc/defaults/mongod?

[04:54:24] <GothAlice> (And could you pastebin your /etc/rc.d/mongod or /etc/init.d/mongod?)

[04:55:04] <fly-away> https://groups.google.com/forum/#!topic/mongodb-user/7FuQD3UmcZU

[04:55:17] <fly-away> its not mine

[04:55:27] <fly-away> its all default

[04:56:36] <GothAlice> Here's mine: https://gist.github.com/amcgregor/8ccb06ebc15c7fb4ddd4

[04:56:46] <GothAlice> (Mine by virtue of the fact that it's the one I use.)

[04:56:57] <quuxman> GothAlice: "$group does not order its output documents". Does that mean indexes on the collection that is grouped can't be used to order the output?

[04:57:08] <GothAlice> quuxman: Correct.

[04:57:16] <quuxman> that makes it useless for me

[04:57:39] <GothAlice> quuxman: However, I recall something about certain optimizations in recent 2.6 relating to having $sort immediately follow certain operations…

[04:58:51] <GothAlice> Nevermind, it was about $sort followed by $limit.

[04:58:59] <quuxman> but then on $first it says "Order is only defined if the documents are in a defined order." which I read as maintaining the order of the query that went into the $group

[04:59:18] <GothAlice> Aye.

[04:59:53] <quuxman> hm, but expressions can't sort

[05:00:07] <GothAlice> $sort prior to $group

[05:00:42] <GothAlice> If you $match, then $sort, then $group the first two stages can use the index.

[05:01:02] <quuxman> I think that'll be fine

[05:46:31] <crocket> Hey man

[05:46:38] <crocket> I'm getting started up with mongodb on nodejs.

[07:45:23] <Tark> hey guys! I have a collection of documents with field price: [10, 100] or price: [0, 500]. I need return a cursor with fields without zero. Aggregation framework returns a CommandCursor without a count() method (why?)

[07:45:30] <Tark> what should I do?

[07:45:44] <crocket> Man

[07:47:25] <Tark> aw... hm... maybe $elemMatch would works

[07:58:55] <Tark> yep, it works. Am I good? I am good!

[07:58:57] <Tark> thanks!

[08:02:43] <crocket> Are collection names case-insensitive?

[08:34:22] <Soothsayer> Is there anything unusual about setting a BinData (md5) for the _id field?

[08:34:47] <Soothsayer> I am maintaining a collection that maps URLs to some kind of resource for URL rewriting. I’ve made the _id as the md5 binary of the URL so I don’t have to create another unique index

[09:31:02] <Hypfer> hi, any idea where I can get valid international addresses which I can simply import into mongo?

[09:49:49] <awasthinitin> anybody used any gui based mongodb manager for node server. Linux server

[09:49:52] <awasthinitin> ?

[10:05:34] <Hypfer> awasthinitin: what?

[10:06:22] <awasthinitin> hypfer: looking for a phpmyadmin like manager for mongodb

[10:07:51] <awasthinitin> to install on a linux server running nodejs

[10:22:09] <fleetfox> Hello, NumberLong(0), to bool?

[10:24:46] <fleetfox> I'm confused, i have "this.a || this.b" in $where. Both are NumberLong(0), and i get true

[10:25:04] <fleetfox> Isn't 0 a falsy variable in js?

[10:32:53] <Hypfer> awasthinitin: take a look at robomongo. its not web basedbut it works fine.

[10:32:57] <fleetfox> rs0:PRIMARY> Boolean(NumberLong(0));

[10:32:57] <fleetfox> true

[10:32:57] <fleetfox> rs0:PRIMARY> Boolean(0);

[10:32:57] <fleetfox> false

[10:33:04] <fleetfox> How does this make sense ?

[10:35:24] <Avihay_work> Soothsayer: there's the possibility of hash collisions. I'm guessing that if your URLs are unique (or otherwise an md5 scheme would be pointless), you can use them as the _id, and I'm guessing mongo will use hashes with collision solving for the index in both cases

[10:35:52] <Avihay_work> disclaimer - I started reading bout mongo "yesterday"

[10:40:21] <Soothsayer> Avihay_work: naa, I’m beyond those issues of hash collision. I’m more dealing with issues of value length of indexed field

[11:00:23] <kali> Soothsayer: don't worry that is fine

[11:00:59] <kali> Soothsayer: the first limit you're going to hit this way is at 700bytes or so (the index will truncate the value)

[11:01:34] <Soothsayer> kali: got it.. but if im storing an md5 binary i won’t hit that limit anyway

[11:01:42] <Soothsayer> kali: so I can have really long URLs without any hassle

[11:02:17] <kali> exactly

[11:02:25] <Soothsayer> kali: its more work at the application and over-all debugging / management level.. (even most Mongo GUIs - Robomongo, Rockmongo, etc.. do not know how to deal with a document with an _id as binary content)

[11:02:32] <Soothsayer> so i was wondering if its worth the effort

[11:02:33] <kali> i use exactly the same trick to manage wikipedia titles at some point

[11:02:47] <Soothsayer> got it

[11:03:34] <kali> true, it can be unpractical at some points. an alternative is to store the hash as a md5 string, but i don't recommend it

[11:03:44] <kali> i mean a hex string

[11:03:58] <kali> or in base64 or whatever

[11:04:02] <Soothsayer> kali: ye, got you, that was my earlier strategy.. but then felt if im being efficient, lets be really efficient if the effort is not exponential

[11:04:31] <kali> i think binary _id should be manageable :)

[12:50:43] <crocket> Does anyone use mongoose?

[12:51:14] <crocket> Why dose mongoose convert the first capital letter to a lowercase in collection name?

[12:59:30] <kali> crocket: that's a "feature" borrowed to rails framework: translating ClassName to collection_name. i hate it. most of the time, these framework also try (and regularly fail) to make up a plural name for the class and use it as the collection name

[12:59:49] <kali> crocket: mongoose certainly allows you to specify an arbitrary name for the collection anyway

[13:08:26] <crocket> kali, It's not even documented.

[13:46:13] <feathersanddown> Hi :)

[14:09:22] <crocket> kali : Where is it documented?

[14:23:18] <kali> crocket: no idea

[14:23:27] <crocket> kali, I found it.

[14:23:32] <crocket> I hate pluralization.

[14:23:36] <crocket> It only adds to confusion.

[14:23:51] <kali> +1

[14:23:56] <crocket> It leads to anger and confusion.

[14:24:06] <kali> it's soooo 2007

[14:25:47] <crocket> It's 2014.

[14:25:49] <crocket> 7 years

[14:26:27] <crocket> Anyway

[14:26:32] <crocket> I've read mongoose guide.

[14:27:12] <cheeser> oh, dera

[15:09:44] <gswallow> We have been hit with this bug: https://www.google.com/url?q=https%3A%2F%2Fjira.mongodb.org%2Fbrowse%2FSERVER-15369&sa=D&sntz=1&usg=AFQjCNE3zw7Kx80E1Z3noeykhESwFA29ww

[15:10:05] <gswallow> Ungooglified: https://jira.mongodb.org/browse/SERVER-15369

[15:11:08] <gswallow> Oddly, we are on the kernel that's supposed to fix this bug (3.13.0-39-generic #66~precise1). We are running Mongo 2.6.3.

[15:11:44] <gswallow> There's mention in the bug report that you can fix the corrupted namespace file, though with the size of the impacted database being a whopping 0.28 GB, I could just copy over the database. It's not hot.

[16:12:03] <ctorp> Is it possible to force the order of the keys in a .find() result?

[16:13:02] <ctorp> ie returned object {"first":bar, "last":baz} instead of {"last":baz, "first":bar}

[16:19:46] <cheeser> no

[16:24:28] <ctorp> cheeser: thanks. good to know.

[16:35:37] <TCMSLP> hey all - quick question - we're trying to disable anonymous access to mongo (2.4.6) via AUTH=true but we can still connect without credentials

[16:35:46] <TCMSLP> any $clue?

[16:36:03] <GothAlice> TCMSLP: Have you added a user?

[16:36:22] <TCMSLP> This is part of an Openshift install, so there's an openshift user

[16:36:26] <TCMSLP> with password set

[17:12:22] <syllogismos> http://docs.mongodb.org/manual/core/query-optimization/#limitations

[17:12:38] <syllogismos> can someone explain the second limitation regarding indexes

[17:13:24] <syllogismos> { _id: 1, user: { login: "tester" } }

[17:13:30] <syllogismos> ^ my document

[17:13:41] <syllogismos> { "user.login": 1 }

[17:13:44] <syllogismos> ^ my index

[17:14:02] <syllogismos> db.users.find( { "user.login": "tester" }, { "user.login": 1, _id: 0 } )

[17:14:14] <syllogismos> ^ the above query is not covered?

[17:14:40] <syllogismos> sample query that is covered by the index?

[17:15:42] <GothAlice> The index, in that situation, is used for filtering only. It can't be used to optimize ("indexOnly") the results.

[17:17:42] <syllogismos> However, the query can use the { "user.login": 1 } index to find matching documents.

[17:17:46] <syllogismos> what does that mean?

[17:18:13] <syllogismos> I got this example from the docs

[17:18:21] <syllogismos> GothAlice

[17:18:36] <GothAlice> What I said: that index can be used to filter results (used in the query), but data won't be pulled from the index to try to optimize the returned results.

[17:20:40] <syllogismos> db.users.find({"user.login": "tester"}, {"_id: 0})

[17:20:49] <syllogismos> db.users.find({"user.login": "tester"}, {_id: 1})

[17:20:51] <syllogismos> this works

[17:20:54] <syllogismos> the second one?

[17:21:15] <syllogismos> db.users.find({"user.login": "tester"}, {_id: 1, "user.login": 0}) more specifically

[17:22:09] <GothAlice> It'll work regardless.

[17:22:14] <GothAlice> Read the section on "indexOnly".

[17:22:53] <syllogismos> I mean if we are taking advantage of the indexing,

[17:23:09] <syllogismos> ok, I will read about indexOnly now..

[17:23:11] <syllogismos> thanks GothAlice

[17:23:17] <GothAlice> Actually, that whole link you gave explains it: you can query a collection (retrieving data that requires lookup in the collection, which uses indexes to filter) or you can query an index (only retrieve data that is indexed, uses the index to filter and retrieve data.)

[17:23:32] <GothAlice> Indexes on embedded documents can't be used for the latter of those two.

[17:24:26] <GothAlice> It's not like the index won't otherwise be used, however. .find({"user.login": "tester"}) will use the index and be nicely efficient.

[17:26:56] <GothAlice> syllogismos: ^

[17:28:21] <syllogismos> yeah im trying to understand it :) it is 11:00 pm and im tired

[17:28:22] <syllogismos> :D

[17:44:04] <Rickky> morning

[17:52:58] <shoshy> hey, for some weird reason my mongo "stopped" working (on an instance), i used the bitnami one for quite some time, but since today it just stopped, here's what happens when i run mongo: http://pastie.org/9712248

[17:53:20] <shoshy> i want to backup the db, i was able to fork it... but not sure how i can update

[19:24:34] <NaN> who can I see the log or the last inserted data?

[19:27:36] <bazineta> Curios if anyone is using EC2 general purpose data volumes for data, instead of provisioned. I'm having a hard time seeing the downside; they're faster, so why are they cheaper?

[19:28:22] <bazineta> i.e., 200GB 1K PIOPS is $90/month, buth a GP SSD at 500GB is 1.5K to 3K burst IOPS, and it's $47/month.

[19:28:41] <bazineta> Surely I'm missing something.

[19:29:17] <kali> bazineta: without piops, you have no warranty at all on bandwidth

[19:29:43] <kali> bazineta: with piops, you know what you get, and you don't get more, but at least you know you'll always get that

[19:29:53] <bazineta> They guarantee 3 IOPS/GB on the GP SSD

[19:30:03] <kali> for a burst

[19:30:06] <bazineta> i.e., at 500GB, it's 1.5K IOPS

[19:30:09] <bazineta> Bursts to 3K

[19:30:09] <kali> but not for a long time

[19:30:17] <kali> not sustained

[19:31:04] <bazineta> Unless I'm misreading the docs, the 3 IOPS/GB is a minimum, guaranteeed

[19:31:15] <bazineta> with burst above that, scaling in time based on the volume size

[19:31:57] <kali> mmm ok, give me the links you're looking at

[19:32:41] <bazineta> http://aws.amazon.com/blogs/aws/enhanced-ebs-throughput/

[19:32:56] <bazineta> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

[19:33:08] <bazineta> The second has the 3 IOPS/GB info

[19:36:16] <bazineta> So the confusing thing is, as I read that, let's say I bought a 1TB GP SSD volume. I can run 3K IOPS on that indefinitely. It's dramatically cheaper than an 3K PIOPS volume. So why would I consider a PIOPS volume?

[19:36:55] <kali> it has changed this last time i had to provision capacity

[19:37:06] <kali> s/this/since/

[19:37:51] <bazineta> Yes, same here. I've used PIOPs to date, but now I'm starting to saturate during background flush, so was looking into options.

[19:41:15] <shoshy> hey, if i've copied all the files @ /data/db/ can i just copy them to a different instance with mongo and enjoy the "new" , restored db?

[19:41:27] <kali> bazineta: well, i'm like you. i can't make sense of it

[19:41:31] <bazineta> And it seems practically ideal; flush takes about 1.5 to 2 seconds out of every minute, so I'd always be full of burst credits.

[19:41:46] <kali> bazineta: might be the right time to get in touch with support if you have that option

[19:42:09] <shoshy> because something werid happend with my mongo, and i can't seem to get into the cli / run it properly.. so i've spawned a new instance and just want to restore it

[19:42:34] <kali> shoshy: if the files are "sane", it will work

[19:42:42] <bazineta> @shoshy journal as well if you have it separate, otherwise should be ok

[19:42:58] <kali> shoshy: provided that the mongod was switched of when you copied the data

[19:44:29] <shoshy> kali: i see, thanks! and how can i maybe try to run the cli of mongo? this is the error i get when i try to "sudo mongo" http://pastie.org/9712561

[19:45:12] <cheeser> mongod isn't running

[19:45:12] <shoshy> bazineta: thank you!

[19:45:49] <kali> shoshy: you need to check out the mongod logs

[19:45:55] <ron> SHOSHANA!

[19:48:39] <shoshy> cheeser, kali: thank you

[19:49:38] <mwatson> Hi, I'm having an issue with duplicate keys during an aggregate query out operation, with the main objective of unwinding an array in each document. What's the recommended way to generate a new id for each document?

[19:54:54] <bazineta> kali, https://forums.aws.amazon.com/thread.jspa?messageID=551626&#551626

[19:55:36] <mwatson> https://groups.google.com/forum/#!topic/mongodb-user/fjp1URnu4mo any help would be greatly appreciated

[19:56:19] <kali> bazineta: ha. for tiny disks.

[19:56:45] <bazineta> kali pretty much, for anything of reasonable size, looks like the GP SSD is the way to go

[19:58:29] <kali> bazineta: indeed.

[20:03:39] <mirageglobe> bit of a side track.. anyone uses bootstrap3? and what is the room for that?

[20:05:42] <mk5053> Can anyone help me with my $aggregate [$unwind $out] duplicate index issue?

[20:32:13] <GothAlice> mk5053: You're mwatson? ;) I'm looking at it now.

[20:32:27] <GothAlice> mk5053: What is the exact duplicate key error?

[20:32:37] <mk5053> GothAlice: yep!

[20:33:30] <GothAlice> mirageglobe: You want ##bootstrap — "unofficial" channels have two pound symbols at the front.

[20:34:27] <Derick> £ ← pound sign

[20:34:37] <GothAlice> Not pound symbol.

[20:34:40] <GothAlice> :P

[20:34:41] <mk5053> GothAlice: The error depends on my exact project operation. If I hide the _id field, the out operation inserts no documents

[20:35:35] <GothAlice> mk5053: Sounds like you really want a new compound ID. Say, _id: {oid: "$_id", value: "$value"}

[20:35:50] <GothAlice> (oid = "old" or "original" ID)

[20:36:19] <GothAlice> You're effectively re-"unique"ing on that compound key. :)

[20:36:40] <mk5053> GothAlice: Yeah that should work!

[20:36:55] <GothAlice> (Or, use $value.timestamp, if no two events can share the same exact moment in time.)

[20:37:05] <GothAlice> (It'd be more efficient.)

[20:37:33] <GothAlice> Heck, I'm blind. $value.distinct_id — you have a distinct ID. XD

[20:53:46] <mirageglobe> thanks!

[21:51:19] <idni> hi. since ive never worked with mongo db - is it appropriate for things like e.g. a pretty big project management software? sounds hard to do something like this without joins

[21:51:50] <GothAlice> idni: MongoDB gives you the flexibility to model your data in very different ways than most are used to.

[21:53:01] <GothAlice> For example, instead of having a table of threads and a table of replies and a table of forums those threads belong to, and rely heavily on joins, you might have one "collection" of threads with the replies stored in the same record as the thread the replies belong to, and a collection of "forums".

[21:53:27] <idni> sounds pretty intereseting

[21:53:40] <hfp> Hi all, what is the best way to use mongoimport with a tsv file on travis-ci? All my builds fail because it doesn't load the mongo data... I ahve `mongoimport -d cities -c cities --file data/cities_canada-usa.tsv --jsonArray --type tsv --headerline` in my travis file. How do you guys do it?

[21:53:48] <GothAlice> MongoDB provides ways of returning specific subsets (slicing, get one by query) as well as atomically manipulate (such as append to a list, etc).

[21:54:51] <idni> but, as in your example the forum, whats about the users table? threads are posted by users, which are (probably?) stored in a user "collection"?

[21:56:43] <GothAlice> Yup.

[21:56:51] <GothAlice> You'd do manual joins where needed.

[21:57:19] <joannac> hfp: is your data actually a json array? doesn't sound like it

[21:57:43] <idni> but in big applications there will be tons of "manual joins"?

[21:57:46] <GothAlice> I.e. if you want to allow lookup of all threads containing a reply by a certain user, by username, you might db.users.findOne({username: 'GothAlice'}, {_id: 1}) to get the user's ID, then db.threads.find({'comment.author': GothAlice._id})

[21:58:05] <hfp> joannac: works locally with this argument, I didnt check without it

[21:58:28] <hfp> joannac: the mongoimport statement should be in `before_script` or in `script` ?

[21:58:46] <GothAlice> idni: This better models reality, though. It should be an error condition if a username doesn't exist, this provides a nice point to trap that condition.

[21:59:44] <GothAlice> idni: https://github.com/amcgregor/forums/blob/develop/brave/forums/component/thread/model.py#L69-L79 is an example of this model, in Python. :)

[22:00:21] <idni> thanks a lot! :)

[22:01:07] <idni> still not totally sure about this techonlogy, but still sounds interesting :)

[22:04:51] <GothAlice> idni: I've been a happy user for more than 6 years, with 25 TiB of data in my personal dataset.

[22:06:10] <idni> so much data without joins scares me a lot :P

[22:06:47] <Derick> you design your schema differently isntead

[22:06:50] <Derick> instead*

[22:07:44] <idni> thats probably my problem. i cannot imagine how to build a database without (tons of) relationships

[22:08:03] <GothAlice> idni: http://docs.mongodb.org/manual/data-modeling/

[22:08:07] <GothAlice> We have you covered. :)

[22:08:36] <GothAlice> I'd read this from beginning to end; it includes specific coverage of typical relationship patterns.

[22:10:14] <idni> i am definitely going to read it, thanks!

[22:11:18] <GothAlice> My CMS datasets use a combination of nested set, parent references, array of ancestors, and materialized paths in order to model hierarchical site structures. (Each part of that optimizes a specific type of query about the tree.)

[22:12:23] <GothAlice> You can get pretty creative with MongoDB, and some apparent instances of "data duplication" may be a good thing… as always, MongoDB really forces you to think about how you want to *use* your data.

[22:12:44] <GothAlice> idni: ^

[22:13:40] <idni> i think i need to be creative with a nosql database? :P

[22:13:42] <cheeser> i was just having a conversation about that with some people here in the office

[22:16:35] <GothAlice> idni: Well, the first thing a CMS would do would be to map a URL to a document, but it'd also need to worry about the security of each folder up to the root. To query for that most efficiently I do two queries: query for "path" in every combination of path elements up (i.e. /, /foo/, /foo/bar/, /foo/bar/baz), ordered by the path, querying for ACL rules.

[22:17:09] <GothAlice> This finds the deepest document in the database matching that path, gives me an efficient way to run through the security for each level, and then I do a second query to get the rest of the data about the deepest document.

[22:19:31] <GothAlice> Breadcrumb lists are common in CMSes, too, so I also store a list of the IDs and titles of each parent, which combined with the path produce a complete breadcrumb navigation without having to load any extra data. (If a title is updated, you can query and update all references in one operation…)

[22:19:39] <GothAlice> Much thought went into the modelling. :)

[22:19:52] <idni> acl rules?

[22:19:57] <GothAlice> Access Control List.

[22:20:15] <GothAlice> I.e. https://github.com/bravecollective/core/blob/develop/brave/core/group/acl.py

[22:21:31] <idni> ah ok

[22:24:15] <idni> but, hmm, the more i think about it, the more complicated it gets :p

[22:25:50] <idni> for example, my current erp solution has a feed of all actions (new projects, changes in projects, etc.). if these entries are shown depends on so much things (permissions, are you part of the project, etc.)

[22:26:56] <idni> the feed table itself just has a bunch of columns (action, user_id, project_id, etc.) - without joins, i cant imagine how to implement this feed :o

[22:29:13] <GothAlice> At work we event log. ¬_¬ Quite a bit, actually.

[22:29:21] <GothAlice> Including all HTTP requests and responses, with session data.

[22:30:44] <GothAlice> One approach I like to the security issue is the use of automatically generated textual tags; then each log entry can contain a few bytes per entity or group or role or whatever who is allowed to see the record.

[22:33:47] <idni> ive never heard of generated textual tags

[22:37:50] <GothAlice> idni: The group "Support Staff" has the tag "a". Any record support staff are allowed to see has the string "a" in a field containing a list of tags, etc. (You can even go so far as to have split readers/writers and limit your updates to only records containing tags relevant to the current user, and other interesting combinations. :)

[22:39:00] <idni> ah, i see

[22:39:40] <idni> but sounds bit difficult when there come new groups?

[22:39:49] <joannac> hfp: sounds like a travis-ci problem then, if it works normally

[22:40:52] <GothAlice> idni: Depends on the group. Most are simple $push appends to certain subsets based on queries across the other fields.

[22:43:19] <idni> hmm i see

[22:44:01] <hfp> joannac: I dont know, I dont get it. The tests pass on my machine but when I run it on travis it fails. the mongoimport runs fine but then the app can't see the data. I use localhost:27017 for the mongodb server in my app, is this different on travis?

[22:44:21] <GothAlice> idni: The tags are really just an optimization allowing me to avoid passing in large sets of "in"s in various combinations of $or and $and; it's like pre-aggregated security..

[22:46:37] <joannac> hfp: can you connect with the mongo shell and see the data?

[22:47:21] <hfp> joannac: on the travis test?

[22:52:51] <joannac> hfp: I don't know what that means. travis just runs tests and possibly scripts for you. where's your actual mongod process?

[22:53:09] <GothAlice> joannac: Travis provides certain services. MongoDB is one of them.

[22:53:30] <joannac> so your mongodb server (where all your data lives) is in the cloud?

[22:55:29] <GothAlice> https://github.com/marrow/marrow.task/blob/develop/.travis.yml (note the "services:" section) and I use https://github.com/marrow/marrow.task/blob/develop/.travis/install.sh#L8-L12 to make sure it's current.

[22:56:09] <GothAlice> hfp: Link to the failing build?

Log file Viewer

Help | Karma | Search:

#mongodb logs for Tuesday the 11th of November, 2014