[01:33:04] <joannac> so you'll get lots of collections called assets, games, players etc
[01:33:07] <Synt4x`> I know they will each be loaded as a collection, but within which DB?
[01:35:09] <Synt4x`> that makes sense right? the DB is different than the collection? like usually I had to say: use MY_DB and my collections would be there
[14:51:10] <Bitpirate> I'm using mongoose to update/insert (when it doesn't exist) a document, how can I do this? because save() just keeps inserting
[15:45:22] <quuxman> Ironically I think when you really get into the details of queries and documents, Python's type system does a much better job representing mongo input and output than JS
[15:45:56] <quuxman> but for almost all practical purposes you can just treat the data as dicts
[15:45:57] <GothAlice> quuxman: Yes and no. The SON container types had to be written (instead of relying on the underlying and built-in list/dict types) due to the need to preserve order of the mappings.
[15:46:24] <GothAlice> {name: 1, age: -1} — the result of sorting on this as a Python dictionary is undefined.
[15:46:34] <quuxman> (I was responding to crocket saying Mongo works well with nodejs :-P)
[15:46:53] <GothAlice> I'm suggesting each language has had its compromises. ;)
[15:47:44] <GothAlice> The callback structure in Node makes debugging somewhat painful, as we've noticed over the last week. ^_^
[15:47:51] <quuxman> In Python it's actually still a little obnoxious to write the order, because whenever you write { ... } in code, the order of ... is lost
[15:48:23] <GothAlice> I use a wrapper that takes positional string arguments, i.e. .sort('name', '-age') and does the right thing.
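A minimal sketch of that kind of wrapper, assuming PyMongo; the collection and field names are hypothetical. The point is that the sort specification reaches the driver as an ordered list of (field, direction) pairs rather than a dict:

    from pymongo import MongoClient, ASCENDING, DESCENDING

    def sort_by(cursor, *fields):
        # '-age' means descending on age; anything else means ascending.
        spec = [(f.lstrip('-'), DESCENDING if f.startswith('-') else ASCENDING)
                for f in fields]
        return cursor.sort(spec)

    people = MongoClient().test.people          # hypothetical collection
    for doc in sort_by(people.find(), 'name', '-age'):
        print(doc)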
[15:48:33] <quuxman> And I'm pretty sure Python doesn't support overloading parsing, so {} is not a dict in some cases
[15:48:33] <cheeser> document order should be insignificant anyway outside of an array
[15:48:54] <GothAlice> quuxman: False. {foo: bar} will always result in a dict. {foo, bar} will result in a set.
[15:49:05] <quuxman> right, I was trying to say that
[15:49:23] <GothAlice> It's a pretty clear syntax difference, though; one not often confused.
[15:49:26] <quuxman> oh, right. I needed to put a ':' in there
[15:49:42] <quuxman> I actually didn't know about {foo, bar}. Cool
[15:49:51] <GothAlice> cheeser: It's not. If you index on a field containing sub-documents, and want to match an exact sub-document, order is critical.
[15:50:31] <cheeser> oh for comparing document to document. right.
[15:50:36] <GothAlice> (The two documents hash differently.)
[15:50:39] <quuxman> I guess it looks like a dict comprehension without a ':' on the LHS
[15:50:46] <cheeser> but that's a shortcoming of the comparison
[15:51:05] <quuxman> Sometimes I learn more about Python in #mongodb than in #python (most of the time)
[15:51:07] <GothAlice> cheeser: I like fast hash comparisons. Deep field-by-field comparison would be hideously inefficient.
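A small illustration of that point, assuming PyMongo and a hypothetical collection: an exact match against an embedded document compares field order as well as values.

    from bson import SON
    from pymongo import MongoClient

    demo = MongoClient().test.demo              # hypothetical collection
    demo.delete_many({})
    demo.insert_one({'pos': SON([('x', 1), ('y', 2)])})

    print(demo.count_documents({'pos': SON([('x', 1), ('y', 2)])}))   # 1: same order
    print(demo.count_documents({'pos': SON([('y', 2), ('x', 1)])}))   # 0: order differs
    # Matching on the dotted fields instead ({'pos.x': 1, 'pos.y': 2}) ignores order.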
[15:51:49] <GothAlice> quuxman: (apologies to the #mongodb gods about this) but I started ##python-friendly after dissatisfaction with #python.
[15:51:55] <quuxman> this is amazing. Set comprehensions. :-)
[15:52:10] <GothAlice> quuxman: Technically it's a set notation generator comprehension, but…
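For reference, the literal forms being discussed:

    d = {'foo': 1, 'bar': 2}              # dict literal: key/value pairs with colons
    s = {'foo', 'bar'}                    # set literal: no colons
    squares = {n * n for n in range(5)}   # set comprehension -> {0, 1, 4, 9, 16}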
[16:15:22] <winem_> good evening. I have a quick question regarding a small development setup. I have 2 nodes that can be used to run mongodb services. We use the recommended setups with 11 nodes in the prod env (more shards will be added soon)
[16:15:35] <winem_> which service would you run on which of the 2 hosts in the dev environment? or would you recommend a 3rd one?
[16:16:46] <winem_> my current plan is to run a config srv, a mongos and 1 shard member on node 1. for node 2, option 1 is a 2nd shard with 2 replicas so I have a whole replica set; option 2 is 2 shard members on node 1 and only the 2 replicas per shard on the 2nd node
[16:18:11] <GothAlice> winem_: In development I use something like https://gist.github.com/amcgregor/c33da0d76350f7018875 to start up a replicated, sharded, authenticated cluster on one box for testing. This could be spread across two fairly easily. (There are two shards in this setup…)
[16:20:27] <winem_> great, let me check your link.
[16:38:52] <GothAlice> winem_: Was the script useful?
[16:39:48] <winem_> got a call. so far I've only been able to download it and open it in sublime
[17:20:33] <winem_> it looked like the lines beginning with -- were comments.. now it looks fine
[17:28:09] <GothAlice> winem_: Must have been set to a SQL syntax, I'm guessing. ;) You should be able to modify the vars at the top and run locally if you wish, just to try it out.
[17:29:15] <winem_> and I'll check for an option to merge the duplicate lines like 83 and 84
[17:29:54] <winem_> I guess you could do it with a single echo command, using tee to pipe the output to stdout and to the stdin of mongo
[17:34:39] <GothAlice> Aye; I was lazy when stripping those functions of confidential code. XD
[17:35:34] <GothAlice> (The real code uses ebegin/eend with proper error handling. It's quite pretty.)
[17:36:18] <winem_> lazy admins are good admins - that's one thing the developers and admins told me when I started to discover linux and work as an admin. some years later, I know what they meant :)
[17:36:46] <GothAlice> winem_: My philosophy is: "Anything worth doing twice is worth automating." ;)
[17:55:02] <winem_> I really like your coding style :)
[17:57:04] <GothAlice> I develop "best practices" and stick to them. Recently I've changed my FOSS project pattern to something more like: https://github.com/marrow/marrow.schema
[17:57:11] <winem_> the services will be run by root, right?
[17:57:14] <GothAlice> (That has more individual tests than executable statements in the library.)
[17:57:44] <GothAlice> winem_: No. In my real script I use start-stop-daemon to do the launching and service stopping, and it allows you to chroot and setuid/setgid on startup.
[17:58:51] <winem_> will take a look at the FOSS project on the weekend. it sounds good because I also have to rewrite a lot of setup scripts from prior colleagues that have no error or exception handling..
[17:59:51] <winem_> and it's quite funny if you run a script without any errors but only a few things work as expected..
[18:13:15] <GothAlice> StanAccy: In my logging code I have a write concern of zero (I don't care), and I don't even check if the server received the data. Makes the inserts very, very fast, even for ridiculously large amounts of data. (I.e. every HTTP request, including session data, and every HTTP response are logged, cost is ~6ms.)
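Roughly what that fire-and-forget logging looks like with PyMongo; the database, collection and field names are hypothetical:

    import datetime
    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient()
    # w=0: the driver does not wait for any acknowledgement from the server.
    log = client.logs.get_collection('http', write_concern=WriteConcern(w=0))

    log.insert_one({
        'ts': datetime.datetime.utcnow(),
        'method': 'GET',
        'path': '/',
        'status': 200,
    })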
[18:19:28] <winem_> Goth, how would you use the 2nd node I have... some more shards or slaves? any recommendation?
[18:21:58] <GothAlice> winem_: From my script, right now both replicated shards are on one boxen. You could split each shard onto its own boxen. (Two boxen, each with three replica nodes, arranged into a sharded pair.)
[18:22:15] <GothAlice> Obviously the replication is somewhat pointless on one box, but this is a test rig. ;)
[18:23:34] <winem_> yes, it's only the test rig, so I'm asking myself whether to use the 2nd node for our devs or for my team - the ops :)
[18:28:23] <winem_> and another question you might answer. I was at a conference talk about mongoDB in production use and I'm very sure the speaker said you can use a list of "columns" as the shard key, not only a single one
[18:28:40] <winem_> but I can't find that on docs.mongodb.org
[18:28:48] <GothAlice> My production cluster is divided into three pools: application servers, database nodes, and ops. Ops manages and automates everything else a la https://gist.github.com/amcgregor/4342022 (VM name pool and tracking collection), https://gist.github.com/amcgregor/2032211 (logging), etc.
[18:29:15] <winem_> I'd like to use a shard key composed of a timestamp, application name and user id
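Compound shard keys do exist; a hedged sketch of what sharding on several fields looks like, assuming PyMongo connected to a mongos (database, collection and field names are hypothetical):

    from bson import SON
    from pymongo import MongoClient

    mongos = MongoClient('localhost', 27017)    # must point at a mongos
    mongos.admin.command('enableSharding', 'tracking')
    # Field order in the key matters, hence SON rather than a plain dict.
    # Leading with a monotonically increasing timestamp tends to funnel all
    # inserts to one shard, so the application/user fields come first here.
    mongos.admin.command('shardCollection', 'tracking.events',
                         key=SON([('application', 1), ('userid', 1), ('ts', 1)]))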
[18:32:51] <winem_> yes, I started with several mysql clusters using everything you can imagine regarding replication, HA with corosync, pacemaker, DRBD and so on.. then I switched to a new company and had only a few (unfortunately really old and basic) mysql hosts and had to use oracle
[18:33:44] <GothAlice> We rolled our own HA oplog replication on Postgres. As a nifty fact, the experience from doing that (and automating it) resulted in my current behaviour of not giving my database servers any form of permanent storage.
[18:34:12] <winem_> now I want to go back, and I don't like mysql any more - since oracle bought it some things like updates and release notes are poorly documented, etc. so I decided to use postgres and to build the base for our new tracking services with mongoDB, which inspires me a lot :)
[18:35:28] <GothAlice> winem_: https://gist.github.com/amcgregor/1ca13e5a74b2ac318017 — this is an example pre-aggregated click tracking record and a minor query to get stats back by day of the week. :)
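The gist itself isn't reproduced here, but the general pre-aggregation idea looks something like this sketch, assuming PyMongo and hypothetical names (one document per link per day, with per-hour counters bumped in place):

    import datetime
    from pymongo import MongoClient

    clicks = MongoClient().tracking.clicks      # hypothetical collection

    def record_click(link_id, when=None):
        when = when or datetime.datetime.utcnow()
        day = when.replace(hour=0, minute=0, second=0, microsecond=0)
        clicks.update_one(
            {'link': link_id, 'day': day},
            {'$inc': {'total': 1, 'hours.%d' % when.hour: 1}},
            upsert=True)

    # Roll the per-day documents up by day of the week.
    pipeline = [
        {'$group': {'_id': {'$dayOfWeek': '$day'}, 'clicks': {'$sum': '$total'}}},
        {'$sort': {'_id': 1}},
    ]
    for row in clicks.aggregate(pipeline):
        print(row)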
[18:35:40] <winem_> ok, oplog went on my todo-list too :)
[18:36:51] <winem_> thanks. saves some time and I see how someone else is using it :)
[18:36:51] <GothAlice> winem_: https://developer.rackspace.com/blog/postgresql-plus-wal-e-plus-cloudfiles-equals-awesome/ for a longer write-up.
[18:37:13] <mike_edmr> I made a comment on a nodejs package, telling the author he could get rid of his _created field since it's redundant with the ObjectId
[18:37:20] <mike_edmr> he said it was too awkward to query IDs by date
[18:38:28] <GothAlice> Works like a hot darn, actually.
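For reference, querying ObjectIds by date is short with PyMongo's bson helpers (collection name hypothetical); ObjectIds embed their creation time:

    import datetime
    from bson import ObjectId
    from pymongo import MongoClient

    posts = MongoClient().test.posts            # hypothetical collection
    start = ObjectId.from_datetime(datetime.datetime(2014, 11, 1))
    end = ObjectId.from_datetime(datetime.datetime(2014, 12, 1))

    november = posts.find({'_id': {'$gte': start, '$lt': end}})
    # The creation time can also be read back from any ObjectId:
    doc = posts.find_one()
    if doc:
        print(doc['_id'].generation_time)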
[18:39:35] <winem_> mike_edmr, let's create a self-help group :)
[18:39:58] <winem_> no one cares about normalization and optimization (indexes!!!!!)
[18:40:50] <cheeser> normalization isn't always an optimization.
[18:41:09] <cheeser> optimization is a function of use case
[18:41:15] <winem_> a friend studies in hamburg, germany... and their prof said "let's talk about mysql and database structures today"... she was quite happy and started her shell...
[18:41:27] <GothAlice> That's why I write things like complete C10K-capable, HTTP/1.1-compliant web servers in 171 opcodes… and other insane simplifications like that. ;)
[18:41:41] <winem_> then the prof says "please open this link. we'll use phpmyadmin, it's everything a developer needs"... this made me cry :(
[18:44:05] <winem_> have to leave the chat for a short meeting.. will be back in 20 minutes
[19:13:57] <winem_> sounds like you had a lot of fun...
[19:14:45] <GothAlice> It took 36 hours straight to recover everything, and that bout of insomnia was immediately followed by extraction of my wisdom teeth and four impacted molars in front of the wisdom teeth.
[19:15:28] <GothAlice> (Knew I had no chance of recovering the data immediately _following_ dental surgery… so I had to rush to get it all recovered before.)
[19:15:32] <winem_> but I think that every admin has to go through his personal nightmare in his career at least once :)
[19:16:26] <GothAlice> Cross-zone failures on AWS (which, in theory, should be impossible) would certainly classify as a nightmare.
[19:17:35] <GothAlice> (That's when we switched to Rackspace. At least our SLA doesn't over-promise any more. ;)
[19:17:43] <winem_> one of my nightmares occurred last week. we were a whole week without any production environment because one (yes, only one!) storage system in the data center of our hoster crashed...
[19:18:01] <cheeser> over promise. under deliver. that's the secret to a successful business, right?
[19:18:03] <winem_> don't be happy about an oracle RAC installation if both nodes are on the same storage...
[19:18:36] <GothAlice> We use MooseFS distributed filesystems across our application servers. Single points of failure just don't happen here. ;)
[19:19:07] <winem_> the good thing about it: my boss gave me the budget to switch to another hoster with 2 data centers and software and technologies which are up to date, etc...
[19:19:19] <winem_> this wasn't planned until mid-2015 :D
[19:19:44] <winem_> MooseFS? never heard of that, to be honest... is it comparable to hadoop?
[19:19:56] <GothAlice> Heh; I had to suggest replacing our office internet connection. I wanted to set up an in-house replica of our production data and realized it'd just destroy the connection.
[19:20:06] <GothAlice> winem_: MooseFS and Hadoop are very different beasts.
[19:20:44] <GothAlice> MooseFS is a FUSE filesystem that has per-file and per-tree replication control, undeletion, and minor versioning support, amongst other goodies.
[19:20:54] <GothAlice> It's meant to be used as a standard filesystem.
[19:22:02] <winem_> but I guess I have enough new technologies and software I'd like to implement by the end of the year..
[19:23:17] <GothAlice> Yeah; my cluster stack has been growing for 6 years, now.
[19:23:25] <GothAlice> Pro tip: don't try to slap everything together all at once. ;)
[19:23:53] <winem_> exactly this is what I'd like to avoid
[19:25:11] <winem_> moving from oracle to postgres, using puppet, mongodb for our tracking data, replacing the applications manager with nagios and the movement of our whole production and staging environment to a new hoster should be enough for the next months :)
[19:25:27] <winem_> and I forgot that we started to use liferay as portal server...
[19:27:42] <GothAlice> That's my replacement, across several hundred nodes in several clusters.
[19:27:42] <winem_> and I play around with hubot...
[19:27:59] <winem_> so you use git for the setup of your servers?
[19:28:06] <GothAlice> Yup. Let me dig something up for you…
[19:28:29] <GothAlice> for cmd in $(git diff --name-only @{1}.. | xargs qfile -Cq | xargs --no-run-if-empty equery files | grep init.d/ | sort -u); do ${cmd} reload; done #truepowerofunix
[19:28:31] <winem_> sure :) I'm interested in everything I can learn from
[19:29:12] <GothAlice> That's a git post-merge hook that examines the last pull for which files changed, asks the system which packages own those files, then finds *all* files for those packages, filters to initialization scripts, and reloads the packages affected by the pull.
[19:29:16] <mike_edmr> Fabric in Python is push-based config management
[19:29:59] <winem_> holy hell... it's as easy as it is brilliant...
[19:30:01] <GothAlice> (With my management cluster, specifically Shodan, sending push notifications to the VMs over MongoDB capped collections when I push to Github, it gets notified, then notifies the VMs to pull.)
[19:31:36] <GothAlice> All of my VMs are identical: I run a purely homogenous environment. On startup they examine the kernel command line for which role (branch) to check out, switch to it, and attach the hooks provided by the role.
[19:32:18] <GothAlice> Part of those branches are the symlinks that control /etc/init.d service registration.
[19:32:20] <winem_> which leads to setup scripts with > 1000 lines which might be done with much less in other languages... but I just have to do some things in bash - otherwise it doesn't feel right :)
[19:33:17] <GothAlice> winem_: You may notice I tend to do a *lot* with very few lines. ;)
[19:33:22] <winem_> GothAlice, this is brilliant.. this shows how much we're influenced by all the companies, magazines and communities who say that you have to use puppet, chef and so on... it can be that simple
[19:34:21] <winem_> I guess every one of us does this sometimes... also in scripts.. some of them get a comment like "don't know what happens below, but trust me - it works" when you take a look at them some months later
[19:34:23] <mike_edmr> it's a powerful tool. just use it smartly and no issues
[19:34:25] <GothAlice> The key point here is that if Shodan doesn't get two pings from a VM in a row it literally spins up a new VM to replace the dying one and kills the dying one.
[19:34:40] <GothAlice> Why have processes running to "repair" running systems when you can just replace the faulty component?
[19:38:00] <GothAlice> winem_: I don't. While I did have remarkable physical boxen stability, VMs have let me take stability to the next level. (I don't measure reliability with nines any more… uptime on my primary is approaching three years continuous.)
[19:39:06] <winem_> yes, I know what you mean and you're completely right. but some things that required a lot of brain, nerves and nights are gone with virtualization
[19:41:03] <GothAlice> E-gads! I only ever bother checking uptime on my clusters these days, but one of my physical boxen deployments for a local company has an uptime exceeding 8 years!
[19:41:24] <winem_> this is what it should look like!
[19:41:45] <GothAlice> ¬_¬ It's an AMD Athlon box. Holy mackerel.
[19:46:01] <GothAlice> I'm, personally, waiting for the day they accidentally drywall in the server closet. I'm imagining following the CAT5 into a room with no doors. XD
[19:48:34] <winem_> the mongos is listening on 127.0.0.1 on port 37017, right=
[19:48:43] <winem_> too late to hit the right keys... -.-'
[19:49:08] <GothAlice> winem_: My testing script adds 10000 to the default port number to keep it out of the way during the test run. I'd recommend setting that variable back to the default.
[19:50:14] <winem_> ok. but this should be fine for a small environment to play around with
[19:51:10] <winem_> I know I could find this in the official docs, but I'm really tired and have to ride home 30 kilometers by bike.. does listening on 127.0.0.1 also allow connections from remote hosts?
[19:51:34] <winem_> not sure if it has to be 127.0.0.1 or if it's the default and only allows connections from localhost..
[19:51:34] <GothAlice> winem_: No. Listen on 0.0.0.0 to do that.
[19:52:02] <GothAlice> "listen" determines which network interface is used when creating the server sockets. 127.0.0.1 = only loopback connections are possible. 0.0.0.0 = any.
[19:52:12] <winem_> switching the bind_ip in /etc/mongod.conf affects all instances / services, right?
[19:52:18] <GothAlice> (You can also specify things like the "internal network" interface, i.e. on rackspace/amazon.)
[19:52:30] <GothAlice> winem_: That script of mine uses no configuration files.
[19:53:04] <winem_> like I said I'm really tired :D
[19:53:26] <GothAlice> The default is to bind to all interfaces when unspecified.
[19:53:55] <winem_> working for more than 110 hours in the last week to create a lot of workarounds and set up servers in our development environment to take over the productive services temporarily was not very funny...
[19:55:10] <winem_> imagine the most unprofessional hosting company and you still have no idea of what I have seen in the week without the prod env...
[19:55:22] <cheeser> that still leaves you 8 hours to eat/sleep every day. no excuses!
[19:55:54] <GothAlice> vm bundle role=app tag=2.3.1 test register upload deploy cluster=production hosts=3 expire=4h — create three temporary (four hour) nodes in the production set after running the full test suite and creating, registering, and uploading a new VM image.
[19:56:04] <winem_> took them 2 and a half days to find out how to check whether any data was written to the tapes... and 2 more to find out how to restore the backups..
[19:56:19] <winem_> Goth, I don't want to hear that :D
[19:56:33] <GothAlice> Could be worse. One hosting company physically lost a server of mine.
[19:56:42] <GothAlice> Literally couldn't find it in the racks.
[19:57:06] <winem_> another great example: they budget 4 hours to set up a jdk, jboss and database driver.. my script does this in <5 minutes (including the time it takes to grab the sources from elsewhere)...
[19:57:33] <winem_> ok, then I have something that might make you feel a bit better
[19:58:11] <winem_> https://www.youtube.com/watch?v=QHTeDus8AnY this was one of the best talks on the code.talks this year - at least it was pure entertainment
[19:58:57] <GothAlice> winem_: Speaking of performance and delays… compiling a kernel from depclean takes 54 seconds on my main cluster. It takes Ubuntu longer to download and extract the binary image. ;)
[20:01:06] <cheeser> i'd have been terribly surprised if there wasn't.
[20:01:33] <GothAlice> MongoDB and PHP are two of the packages available in the cluster that *require* -j1 during compilation or they fail to compile. Fun stuff.
[20:04:10] <GothAlice> That's a great presentation, BTW.
[20:04:22] <GothAlice> I had to fire a client for doing things like import($_GET['_page'])
[20:04:48] <winem_> very good that you fire them... I've seen such situations go without any consequences too often
[20:05:01] <GothAlice> He got hosed twice because of two different failures on his part.
[20:05:20] <GothAlice> My servers now explicitly disable fopen everything, the mail() function, and a bunch of other things. I've neutered PHP.
[20:08:56] <winem_> what about developers with access to the prod env? :)
[20:09:38] <GothAlice> Depends on the cluster. One cluster is a general managed hosting system for "unwashed masses", so people have access to their own MooseFS homes. The dedicated application clusters are more strict.
[20:09:55] <GothAlice> (We also have a silly hat for people doing things in prod.)
[20:10:56] <winem_> we have a plush toy called blame :)
[20:11:41] <winem_> it's a fluffy black ball with huge teeth in a cage... ok ok, actually, it's on my desk
[20:11:58] <GothAlice> … our continuous integration board is powered by ponies.
[20:12:24] <winem_> installed a new git on a server and typed rm -rf in the wrong window.. however, it was installed again <5 minutes later, thanks to documentation and automation
[20:12:45] <winem_> which tools do you use for CI?
[20:13:02] <GothAlice> An incredibly important note: don't have the root git repo folder called ".git".
[20:13:53] <GothAlice> https://gist.github.com/amcgregor/4947201 — Here's an ancient and ugly copy/paste merge of some of my Amazon startup automation.
[20:17:08] <GothAlice> Montréal is Canada's capital of Python development.
[20:17:15] <winem_> canada is a great country and I'd love to live near whistler :)
[20:17:31] <winem_> really? this explains a lot...
[20:17:43] <winem_> do you know jardineworks, e.g.?
[20:18:03] <GothAlice> Also helps that Ubisoft is here. ^_^ When people on the street ask me what I do and I say "software engineering", they inevitably ask if I work for Ubi. XD
[20:19:25] <winem_> I could create a team with very smart guys from canada... frontend and backend developers, database administrators, even QA... should I move? ;)
[20:25:55] <idok> hello... I'm trying to deploy my apps on a debian server... I get this error: OperationFailure: database error: Can't canonicalize query: BadValue geo near accepts just one argument when querying for a GeoJSON point. Extra field found: $maxDistance: 10000
[20:26:18] <GothAlice> idok: Server versions on both boxen?
[20:26:47] <idok> GothAlice: I just updated now : 2.6.5
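On 2.6 that error usually means $maxDistance sits outside the $near document; with GeoJSON points it has to go inside, next to $geometry. A sketch with hypothetical names, assuming PyMongo and a 2dsphere index:

    from pymongo import MongoClient

    places = MongoClient().test.places          # hypothetical collection
    query = {
        'location': {
            '$near': {
                '$geometry': {'type': 'Point', 'coordinates': [-73.97, 40.77]},
                '$maxDistance': 10000,          # metres, inside $near, not beside it
            }
        }
    }
    for place in places.find(query):
        print(place)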
[20:27:29] <samgranger> Out of interest, which one of these 2 options is the best for saving friend relationships? https://gist.github.com/levicook/4132037
[20:28:08] <GothAlice> samgranger: A real graph database.
[20:28:09] <idok> GothAlice: ouch.. when I run mongo now , I get http://paste.kde.org/pmyu71yai
[20:31:49] <samgranger> GothAlice: thanks, I'll do some research
[20:32:24] <GothAlice> samgranger: Of the two, I'd stick with option #1 until you have enough data to properly benchmark test your queries. It's slightly more semantic. See also: http://stackoverflow.com/questions/5125709/storing-a-graph-in-mongodb and https://github.com/Voronenko/Storing_TreeView_Structures_WithMongoDB
[20:32:53] <GothAlice> TreeView != friendship relationships, but still, it demonstrates several interesting ways of structuring related data.
[20:33:35] <samgranger> GothAlice: Thanks - would I be best saving friend requests into a separate collection?
[20:34:01] <GothAlice> At work we got hackernews'd and lifehacker'd the same week; database growth of users and their entire social graphs was so significant, it killed the project. (Some of our algorithms naively attempted to match one record against all people records…)
[20:34:31] <GothAlice> (So yeah… think carefully about how you store graphs.)
[20:34:41] <samgranger> GothAlice: I'd like to prevent that :)
[20:34:55] <samgranger> I've been hackernews'd before and it was pretty hefty
[20:35:12] <GothAlice> Storage in a separate collection may be worthwhile, however it'd be easy to extract from the first approach of storing within the user records themselves, and indexes will make it fast either way.
[20:35:21] <GothAlice> And external storage complicates initial prototyping.
[20:36:04] <GothAlice> Also depends on if your relationships can be unidirectional, or only bidirectional. (I.e. real "friends" vs. "followers".)
[20:37:36] <samgranger> GothAlice: Real friends, not followers :)
[20:38:36] <GothAlice> Okay, if there is no chance for unidirectional, store externally in simple pairs: {_id: [ObjectId("user 1"), ObjectId("user 2")]} — and yeah, that's a thing.
[20:38:48] <samgranger> GothAlice: What project got hackernews'd btw?
[20:42:06] <GothAlice> (matchFWD was the thing with the Jython NLP code.)
[20:42:29] <winem_> Goth, I rebooted the server for some final tests, executed your script again (without bind_ip in $COMMON) and I get the following messages during startup http://apaste.info/pro
[20:42:40] <winem_> but all servers seem to be up and running
[20:43:20] <GothAlice> winem_: Seems to have started twice somehow. Your PID files likely don't match real processes any more, so stopping may require manual intervention.
[20:43:22] <winem_> and I'm wondering why they are still listening on 127.0.0.1 if I remove bind_ip 127.0.0.1 from COMMON
[20:43:45] <GothAlice> winem_: Is it also listening on the other interfaces? (lsof | grep mongod | grep TCP)
[20:43:57] <winem_> ok, pidfiles might be a good hint. will check that :)
[20:44:24] <GothAlice> winem_: Nah; line 33 is the first "start" invocation finishing, the next line clearly indicates that "start" was invoked again, somehow.
[20:44:41] <GothAlice> Starting a second time potentially nukes the PID files, preventing the "stop" command from working.
[20:45:01] <winem_> yes, this would make sense... let me double-check the script...
[20:45:10] <GothAlice> winem_: https://gist.github.com/amcgregor/8ccb06ebc15c7fb4ddd4#file-mongodb-sh-L23 — start-stop-daemon is your friend. ;)
[21:06:33] <samgranger> GothAlice: Do you use mongoose/node?
[21:08:24] <samgranger> Well, wondering what a Schema design would look like for that simple friend pair :/
[21:08:55] <StanAccy> I read that Mongo provides ACID props on a document level (i.e. row equivalent). Is there any way to guarantee transactional semantics on a series of document updates?
[21:15:59] <GothAlice> StanAccy: There are several approaches to that.
[21:16:05] <GothAlice> samgranger: Alas, I don't node.
[21:18:09] <kali> there is this doc from the mongodb docs too: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
[21:18:32] <GothAlice> Unfortunately because atomic operations are only document-level in MongoDB, you effectively need a "maintenance" worker (i.e. like Postgres' vacuuming) to commit these abstracted transactions.
[21:19:12] <kali> but getting full tx semantics will require a lot of very hard work
[21:19:28] <GothAlice> If that's a hard requirement, use something more ACID than MongoDB.
[21:20:00] <GothAlice> ("Right tool for the right job" pragmatism, as much as I love MongoDB. Same for graphs. ;)
[21:20:09] <kali> having a few places where you need some stronger semantics than doc acidity in an app is kinda ok, if you know what you're doing. but don't start something with tx all over the place with mongodb
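A heavily simplified sketch of the two-phase-commit pattern from the linked tutorial, assuming PyMongo and hypothetical names; real code also needs the recovery and rollback steps the tutorial describes:

    from pymongo import MongoClient

    db = MongoClient().bank
    db.accounts.insert_many([{'_id': 'A', 'balance': 100, 'pendingTransactions': []},
                             {'_id': 'B', 'balance': 0, 'pendingTransactions': []}])

    # 1. Record the intended transfer in its own document.
    txn_id = db.transactions.insert_one(
        {'source': 'A', 'destination': 'B', 'value': 25, 'state': 'initial'}).inserted_id

    # 2. Mark it pending, then apply it to each account exactly once.
    db.transactions.update_one({'_id': txn_id, 'state': 'initial'},
                               {'$set': {'state': 'pending'}})
    db.accounts.update_one({'_id': 'A', 'pendingTransactions': {'$ne': txn_id}},
                           {'$inc': {'balance': -25},
                            '$push': {'pendingTransactions': txn_id}})
    db.accounts.update_one({'_id': 'B', 'pendingTransactions': {'$ne': txn_id}},
                           {'$inc': {'balance': 25},
                            '$push': {'pendingTransactions': txn_id}})

    # 3. Commit, then clean up the per-account markers.
    db.transactions.update_one({'_id': txn_id, 'state': 'pending'},
                               {'$set': {'state': 'applied'}})
    db.accounts.update_many({'_id': {'$in': ['A', 'B']}},
                            {'$pull': {'pendingTransactions': txn_id}})
    db.transactions.update_one({'_id': txn_id, 'state': 'applied'},
                               {'$set': {'state': 'done'}})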
[21:24:44] <GothAlice> samgranger: Oh, also remember to sort the pairs before storing them. Don't want accidental inverse duplicates.
[21:26:00] <samgranger> GothAlice: MongoError: can't use an array for _id - should I really be saving them in _id?
[21:26:24] <GothAlice> Ah, right. Need to use a mapping there.
[21:30:48] <samgranger> quite new to mongo and really want to make sure I'm getting everything right :)
[21:31:51] <GothAlice> Separating this in your simplified case (no unidirectional links) will increase query efficiency. You'll only have to insert a record to make friends, rather than updating two friend records. (This is important in the case where the record grows and needs to be moved around on-disk to store the added value.)
[21:32:08] <GothAlice> s/friend records/user records
[21:33:04] <samgranger> should I add a reference to Users with the ObjectIds?
[21:33:22] <GothAlice> samgranger: That's exactly what the embedded document stored in _id does.
[21:33:48] <GothAlice> (And the sorting thing still applies: a should always < b.)
[21:36:22] <GothAlice> I also used single-letter key names intentionally; since you may have a large number of individual relationships the space used by the key strings adds up quickly.
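Putting those pieces together, a sketch of the pair document being described (sorted pair inside _id, single-letter keys), assuming PyMongo; the names are hypothetical:

    from bson import SON, ObjectId
    from pymongo import MongoClient
    from pymongo.errors import DuplicateKeyError

    friends = MongoClient().social.friends      # hypothetical collection
    # Secondary indexes on the embedded _id fields keep the lookups below fast.
    friends.create_index('_id.a')
    friends.create_index('_id.b')

    def befriend(user1, user2):
        a, b = sorted([user1, user2])           # a always < b: no inverse duplicates
        try:
            friends.insert_one({'_id': SON([('a', a), ('b', b)])})
        except DuplicateKeyError:
            pass                                # already friends; _id enforces uniqueness

    def friends_of(user):
        for doc in friends.find({'$or': [{'_id.a': user}, {'_id.b': user}]}):
            pair = doc['_id']
            yield pair['b'] if pair['a'] == user else pair['a']

    befriend(ObjectId(), ObjectId())            # stand-ins for two user _ids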
[22:02:47] <GothAlice> You have to define _id (or id, depending on how your abstraction layer deals with it) as being an embedded document (or generic arbitrary mapping without schema, again, depending on abstraction).
[22:03:24] <GothAlice> class SomeDocument(Document): id = DictField() # MongoEngine "generic" example.
[22:04:46] <GothAlice> class Relationship(EmbeddedDocument): a = ReferenceField(User); b = ReferenceField(User) \n\n class SomeDocument(Document): id = EmbeddedDocumentField(Relationship) # MongoEngine "complete" example.
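The same idea as those one-liners, expanded into something runnable with MongoEngine; class, field and database names are illustrative only, and the pair field is marked primary_key so it is stored as _id:

    from mongoengine import (Document, EmbeddedDocument, EmbeddedDocumentField,
                             ReferenceField, StringField, connect)

    connect('social')                           # hypothetical database

    class User(Document):
        name = StringField(required=True)

    class Pair(EmbeddedDocument):
        a = ReferenceField(User)                # keep a < b when constructing
        b = ReferenceField(User)

    class Friendship(Document):
        pair = EmbeddedDocumentField(Pair, primary_key=True)   # stored as _id

    alice, bob = User(name='Alice').save(), User(name='Bob').save()
    a, b = sorted([alice, bob], key=lambda u: u.id)
    Friendship(pair=Pair(a=a, b=b)).save()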