[00:41:23] <_blizzy_> cheeser, is it possible to explain what a mongodb database would look like using a dictionary or json?
[00:43:13] <Boomtime> _blizzy_: you can use mongoexport to get a JSON dump of a mongodb database.. is that what you are after?
[00:43:51] <_blizzy_> Boomtime, I understand that SQL databases are tables, but I'm having a hard time wrapping my head around what a mongodb database would look like.
[00:44:08] <GothAlice> _blizzy_: Many of the concepts are similar.
[00:44:57] <GothAlice> _blizzy_: See: http://docs.mongodb.org/manual/reference/sql-comparison/ and http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/
[00:49:07] <GothAlice> _blizzy_: To continue your JSON comparison, here's a better sample, in JavaScript notation. (Works in Python, too.)
[00:51:57] <GothAlice> This highlights a few things: "records" (documents) in a collection don't need to be exactly the same in their fields. There are no schemas, though it's generally a good idea to keep things pretty similar. (In this case, a "gender" of "r", robot, has an extra field, "reference".) Also, objects can nest "complex" datatypes, like lists (arrays), and even other dicts (called "embedded documents").
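A minimal sketch of the kind of sample being described (collection and field names are illustrative; the robot record carries the extra "reference" field mentioned above):

    // hypothetical "people" collection, mongo shell / JavaScript notation
    db.people.insert([
        { name: "Alice",  gender: "f", tags: ["staff", "admin"] },                     // array field
        { name: "Bob",    gender: "m", address: { city: "Montreal", country: "CA" } }, // embedded document
        { name: "Bender", gender: "r", reference: "Futurama", tags: ["robot"] }        // extra field, no schema enforced
    ])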
[00:54:29] <_blizzy_> GothAlice, that makes so much sense now. thank you.
[00:55:01] <GothAlice> For slightly more advanced reading, see: http://docs.mongodb.org/manual/data-modeling/, and, if you're coming from a relational/SQL world, see: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html
[01:00:34] <Zyphonic> Anyone here familiar with pymongo and setting up custom datatypes for read/write? I went off of this article here - http://api.mongodb.org/python/current/examples/custom_type.html but the API docs for the Database type say it's being deprecated in 3.0
[01:01:03] <Zyphonic> http://api.mongodb.org/python/current/api/pymongo/database.html#pymongo.database.Database.add_son_manipulator is the link to the method that's deprecated
[04:38:14] <Streemo> I want my objects to have multiple Ids, because I am keeping track of all individual instances. When I serve a page, I make a query to find an object based on some given instance ID. Is it faster to have the object contain an instanceID array field? or would it be faster to have an entirely different collection and do two queries?
[04:38:56] <joannac> how many instances per document?
[04:43:59] <Streemo> i mean it's really coming down to db.objects.findOne({instanceArrayField: instanceId}) versus: db.objects.findOne({_id: db.instances.findOne({_id: instanceId}).objectId})
[04:44:12] <Streemo> depending on if i normalize or denormalize my data
[04:59:34] <Streemo> joannac: i think if i were to plot the number vs. length of arrays, the distribution would be peaked around 5-10, which isn't too bad.
[04:59:48] <Streemo> the occasional outlier won't make a huge difference
[05:02:55] <Streemo> joannac: what do you think is better? Doing a single query on 10,000 objects with 10-15 single value fields (plus the one array field) having to shuffle through a len = 10 array per object OR: Doing one query on 100,000 very small objects with only two single value fields, and then another query on 10,000 objects only going through single value fields?
[05:03:44] <Streemo> 10,000 length-10 arrays queried once, versus 100,000 single fields queried and then 10,000 single fields queried. pretty much
[05:05:25] <joannac> why would you need to shuffle through a length 10 array?
[05:05:39] <Streemo> that's the instance id thing we talked about
[05:05:47] <Streemo> each object has ~10 instances
[05:05:48] <joannac> you would just index it, surely
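A sketch of the indexed single-collection approach joannac is pointing at, reusing the hypothetical names from above:

    // a multikey index is built automatically when the indexed field is an array
    db.objects.createIndex({ instanceArrayField: 1 })

    // the lookup then hits the index instead of scanning each ~10-element array
    db.objects.findOne({ instanceArrayField: instanceId })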
[08:42:35] <amitprakash> Hi, while reading on http://blog.mongodb.org/post/87200945828/6-rules-of-thumb-for-mongodb-schema-design-part-1/2/3 .. they've mentioned two way referencing
[08:42:55] <amitprakash> however, how does two-way referencing handle the problems that might arise due to race conditions..
[08:44:42] <amitprakash> i.e. say user1 wants to assign task X to person Y, while user2 wants to assign task X to person Z. if we update as set(task.user); set(user.task).. then the order of execution could follow the pattern set(taskX.user=Y); set(taskX.user=Z); set(userY.task=X); set(userZ.task=X)
[08:45:29] <amitprakash> is there a way to ensure transactions while updating two-way referenced documents
[08:48:47] <kali> amitprakash: nope, there is no built-in multidoc strategy. but in your case, i don't think it is an issue: update the task collection first, while checking that the task is not allocated. then, and only if it worked, update the denormalization in the user collection
[08:50:10] <kali> amitprakash: basically, as long as you express the invariant "a task is assigned to 0 or 1 user" as something that is verifiable on one single document, it stays relatively easy
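A sketch of kali's two-step approach, with hypothetical field names (tasks.user, users.tasks): the invariant is checked on the single task document, and the user-side denormalization only happens if that conditional update actually modified something.

    // step 1: claim the task only if it is not already allocated
    var res = db.tasks.update(
        { _id: taskId, user: { $exists: false } },
        { $set: { user: userId } }
    )

    // step 2: only on success, write the denormalized reference on the user
    if (res.nModified === 1) {
        db.users.update({ _id: userId }, { $addToSet: { tasks: taskId } })
    }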
[08:51:23] <amitprakash> kali, the issue is with query statements, the conditions to add might not be as simple as checking if taskT has a user assigned, but significantly more complex
[08:52:29] <amitprakash> putting the condition as the query param might either be cpu intensive or not possible due to external dependencies
[08:53:48] <amitprakash> kali, a condition could be something of the sort: only if there are no users with a count of this tasktype assigned > some number
[08:54:33] <amitprakash> i.e. only assign this "easy" task to a user if this task has no users and there are no users with more than 50 "easy" tasks
[08:55:26] <amitprakash> there are around 30m users, so doing that count aggregation takes time
[08:56:34] <kali> you need to denormalize that kind of thing anyway. you need to maintain an easy task counter on the user collection with an index on it.
[08:56:57] <kali> bottom line is: mongodb provides no ready-to-use tool to maintain multidoc invariant. you have to do it on your own
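A sketch of the denormalized counter kali describes (the easyTaskCount field name is made up):

    // bump the per-user counter whenever an "easy" task is assigned
    db.users.update({ _id: userId }, { $inc: { easyTaskCount: 1 } })

    // index it so the "fewer than 50 easy tasks" check avoids the 30M-user aggregation
    db.users.createIndex({ easyTaskCount: 1 })
    db.users.findOne({ easyTaskCount: { $lt: 50 } })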
[08:57:14] <amitprakash> and I cant lock individual documents either
[08:57:17] <kali> (there is one exception, it's the "unique" index)
[08:57:37] <kali> amitprakash: you can't do it at the mongodb level, but you can implement a lock on top of it
[08:57:59] <kali> amitprakash: but with this type of constraint, you'll have to wonder if you have picked the right database for the job
[08:59:33] <kali> amitprakash: everything from pessimistic locking to two phase commit can be implemented on top of mongodb, but it's up to the application
[09:12:42] <FlynnTheAvatar> Hi, I have issues installing mongodb-org 3.0.0-2 and 3.0.1-0 with yum - error is mongodb-org conflicts with mongodb-org-server-3.0.0-2.el7.x86_64
[09:43:45] <Stiffler> how do I store objects inside an object in mongo?
[09:45:57] <Derick> Stiffler: by nesting them? Not sure what you're trying to ask, but this works: db.collection.insert( { top_level_field: { nested_object_field1: 4, nested_object_field2: 'foo' } } )
[12:46:52] <cheeser> um. you can query against them. you can return only the subdoc if you want. but that'd be at the timetable level because that's the 'doc' in question.
[12:50:14] <Stiffler> if I can, then how do I get the array field containing only the elements whose type equals 15?
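One way to do that (collection and field names are guesses based on the conversation): match documents whose array holds an element with type 15, and use the positional projection to return just the first matching element.

    db.timetables.find(
        { entries: { $elemMatch: { type: 15 } } },  // documents with at least one entry of type 15
        { "entries.$": 1 }                          // project only the first matching array element
    )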
[12:58:07] <cheeser> i actually don't use arrays that much as they're a bit wonky to work with but i have to fix some morphia bugs around them so off i go. :D
[13:16:53] <pamp> Hi, what are the hardware requirements for mongos and config servers in a production cluster.. RAM and HD?
[14:08:37] <FlynnTheAvatar> (replace el7 with el6 for centos 6.6)
[14:09:07] <wrighty> Aye, that's working. Slightly annoying that these issues have made it up to the release. :(
[14:09:13] <wrighty> Ah well - thanks for the solution! :D
[14:30:51] <pamp> Hi, what are the hardware requirements for config servers and mongos (routers) in a production cluster?
[14:32:03] <pamp> I'm planning a cluster with two shards with one Replica Set each
[14:33:06] <pamp> what's the best approach? use 7 machines (2 shards, 2 routers, 3 config servers), or put one shard, one router and one config server on the same server?
[14:33:08] <cheeser> config servers don't need much
[14:34:08] <pamp> don't need much, like what? in RAM, CPU and HD
[14:34:19] <cheeser> yeah. those are pretty light.
[14:36:38] <pamp> an Azure virtual machine instance A0, with 1 core, 0.75 GB RAM and 20 GB disk size — is that enough?
[14:57:15] <GothAlice> crised: If you want high-availability (i.e. if one DB host goes down, your whole app doesn't go down) you'll need at least two DB hosts and an arbiter that could live on the application host.
[14:57:37] <GothAlice> Otherwise, have a really good backup plan. ;)
[14:57:56] <crised> GothAlice: data won't be lost, right?
[14:58:57] <GothAlice> crised: There is the possibility for data loss, yes. If your application is in the middle of doing something with the data (i.e. saving a user's changes) and the DB host goes away, so does the in-progress change.
[14:59:30] <crised> GothAlice: ok.... What other option do I have?
[15:01:59] <GothAlice> Well, two for the DB, one for your app.
[15:02:39] <crised> GothAlice: What if I expand later.... and want to have multiple light apps
[15:02:56] <crised> Shouldn't it be better to have 3 instances?
[15:04:00] <GothAlice> Well, for redundancy you only really need two copies of your data. For reliability (high availability) you'd need to add an arbiter to the cluster, but an arbiter doesn't really need its own host. All it does is vote, not store data.
[15:06:10] <crised> GothAlice: isn't there a turnkey solution for this?
[15:06:21] <GothAlice> Indeed. When you have a replica set with two nodes and one of them dies, the other needs some way to determine if _it_ is encountering the network failure, or if the other host is the one flaking out. An arbiter is a "neutral third party" that lets the remaining node know. (I.e. a node needs to see > 50% of the other nodes for it to become primary in the event of a failure.)
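A sketch of that layout — two data-bearing nodes plus an arbiter living on the application host (hostnames are placeholders):

    // run once from the mongo shell on the first data-bearing node
    rs.initiate({
        _id: "rs0",
        members: [
            { _id: 0, host: "db1.example.com:27017" },
            { _id: 1, host: "db2.example.com:27017" },
            { _id: 2, host: "app1.example.com:27017", arbiterOnly: true }  // votes only, stores no data
        ]
    })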
[15:24:13] <cheeser> MMS can spin them up for you.
[15:24:53] <crised> cheeser: since it's EBS backed then, there is no chance of data loss?
[15:25:57] <cheeser> be wary of absolute claims :)
[15:26:11] <crised> cheeser: yes, but it *should* be no data loss
[15:26:44] <cheeser> that's the idea at least, yes
[15:26:57] <GothAlice> crised: No. Any time you're dealing with moving parts, there's always risk. Having a replica set reduces the risk of catastrophic failure to scenarios where basically everyone in the same zone is having issues. We don't use AWS at work because it ate our data and required 36 hours of reverse engineering corrupted files to recover after a multi-zone cascade failure.
[15:54:02] <bagpuss_thecat> cheeser: I believe so... all I have are two hosts, and I can't seem to deploy the monitoring agent to them
[15:54:30] <bagpuss_thecat> I have two unpublished changes, but I always get "Another session or user has already published changes" when I try to confirm them
[16:38:30] <Derek57> Hi all, had a question. Has anyone been having an issue with WiredTiger and memory? On the previous storage engine, I was able to run a server with 2 compounds and 2.5 million documents on a 48GB server. Now with WiredTiger, I'm maxing out 112GB of RAM and 16GB of swap, with intermittent crashes from segmentation faults, and half the time it's impossible to run .count().
[16:39:09] <Derek57> And not sure if that last message sent. Had a bit of an issue re-setting up my IRC client.
[16:44:12] <Derek57> Ah... Well that all looks very familiar to my recent issues. A bit relieved that it wasn't something I had misconfigured.
[16:53:27] <GothAlice> NoOutlet: Alas, no, couldn't reproduce. I'm actually suspecting a bug in VirtualBox's SATA implementation. :/
[16:59:40] <Derek57> Oh. I'm running it currently in Azure, possibly that could be related?
[16:59:57] <Derek57> As it's not on a dedicated box.
[17:10:36] <GothAlice> Derek57: Alas, my issue is unrelated to yours. Mine is about high-stress benchmarking (hosing a node for ten minutes of multi-megabyte-per-second mixed find() and update() operations) killing the VM. ;) I am using WiredTiger, though.
[17:14:08] <Derek57> Fair enough! I may switch back to mmapv1 for the time being, and will do more research in to why I may be having the issue. Just wanted a quick look to see if anyone else was having something related. :)
[18:36:48] <Petazz> Hi! I'm trying to calculate how many users there were in a db grouped by the date saved in a user's ObjectId. I'm trying to do this with mapReduce but not really getting a sane result. Why? http://pastebin.com/GiTsh4mq
[18:37:18] <Petazz> I know there is a return 1 in there now, but even then the result is not 1 for each day
[18:41:20] <kali> apart from that, you're doing the most common mistake with map/reduce. reduce will be called 0 to N times for one given key. so the map output must match the "form" of your expected value
[18:41:55] <kali> and reduce can only reduce: its output has the same "form" as the map output, and the items in the values array have that very same form too
[18:43:05] <kali> so you need to emit(id,1) in map, and reduce must return the sum of the integers it will find in values
[18:43:17] <kali> but use aggregation pipeline anyway
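A minimal sketch of what kali is describing, grouping per day by the ObjectId's embedded timestamp; the map output and the reduce output share the same "form" (a number). The aggregation equivalent below assumes a separate "created" Date field, since the pipeline can't pull the timestamp out of the _id here.

    // map/reduce: emit 1 per document, reduce sums the values
    db.users.mapReduce(
        function () { emit(this._id.getTimestamp().toISOString().slice(0, 10), 1); },
        function (key, values) { return Array.sum(values); },
        { out: { inline: 1 } }
    )

    // aggregation pipeline equivalent (assumes a "created" Date field exists)
    db.users.aggregate([
        { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$created" } }, count: { $sum: 1 } } }
    ])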
[18:44:29] <Petazz> Hmm ok, this was basically an exercise to try to figure out how mapReduce works. So map is called once per every document and then reduce is called N times for a key?
[18:45:28] <GothAlice> Petazz: https://gist.github.com/amcgregor/1623352 was the first map/reduce I ever wrote that worked. ;)
[18:45:29] <kali> Petazz: 0 if there is only one mapped document for one given key, and then in batches of 1000 iirc
[18:46:26] <Petazz> Ah ok so it is called once per key, but the array is limited to 1000?
[18:47:04] <kali> nope. if there are 1010 values, you'll get called once for the first 1000 values, then once for the resulting value plus the remaining 10
[18:47:38] <kali> or any other combination. it's whatever mongodb decides
[18:49:41] <Petazz> Ok so this is basically for sum operations
[18:49:52] <Petazz> Since the array can be handled "recursively" if you will
[18:50:07] <GothAlice> Petazz: Well, map/reduce has a number of applications.
[18:51:43] <GothAlice> Wow, that last search of mine returned some odd results for "strange uses for map reduce". Like, "10 strange uses for blood", and even less safe-for-work entries. Thanks, Google.
[18:54:17] <Petazz> Let's say I wanted to have a document that holds the number of users per day so that I don't have to calculate the historic data again, what would be the best way to do it?
[18:55:09] <GothAlice> Petazz: http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework covers a number of methods of storing historical analytics data and covers their impact in performance and disk space.
[18:55:52] <GothAlice> (We do click and other event tracking at work. This article was invaluable when it came time to model our own data.)
[18:56:49] <GothAlice> If you do bulk processing under a cron job you're placing intermittent high load on your infrastructure.
[18:57:14] <GothAlice> You're also delaying your metrics by, potentially, the full time period between cron runs. Our analytics (http://cl.ly/image/142o1W3U2y0x) are live. :)
[18:58:14] <Petazz> Nice. Haven't read much about aggregation with mongo yet :/
[18:58:33] <GothAlice> (Using pre-aggregation as described in the article I linked. When saving the Hit document, I "upsert" an Analytic document for the relevant combination of hourly time period and other criteria like invoice, company, job, etc.) We only keep Hits around for a short time using a TTL index that automatically cleans up old (by time) data.
[19:00:39] <GothAlice> Hits are accurate to the millisecond, but we only keep a week of them around. Analytics are only hour-accurate. https://gist.github.com/amcgregor/1ca13e5a74b2ac318017 is an example Analytic document and aggregate query to give the two-week comparison line chart from that dashboard screenshot I linked.
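A sketch of that pre-aggregation upsert (field names are invented): every incoming hit bumps counters on the hourly Analytic bucket, creating the bucket on first use.

    var hour = new Date();
    hour.setMinutes(0, 0, 0);  // truncate to the hour

    db.analytics.update(
        { period: hour, company: companyId, job: jobId },  // one bucket per hour + criteria
        { $inc: { hits: 1 } },
        { upsert: true }                                   // create the bucket if it doesn't exist yet
    )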
[19:13:35] <rpcesar> hello. I have used mongo extensively for many applications in the past, I am used to it and I love it, however I am entering a situation where I am going to be severely memory (RAM) constrained on a project. My working set of data will never fit and "moar ram" is not an option. I guess what I want is to keep indexes in RAM, and let the data itself primarily fetch off disk. I am worried about the performance profile of mongo pul
[19:14:50] <rpcesar> is there any way to force/configure mongo to keep the indexes, or at least primary index, in ram at all times, or pending that is there an alternative database that better fits this performance profile?
[19:17:13] <GothAlice> rpcesar: The default mmapv1 backend relies on the system's kernel to best handle paging blocks from disk into RAM. I don't even know if there is a least-recently-used cache, or what mechanism it would use to determine which pages can be swapped out. AFAIK there is no way to keep something in RAM within MongoDB except by regularly priming it (i.e. making a query that requires running the entire index.)
[19:18:06] <GothAlice> Also, as a note, in that situation any query that can't be fully covered by an index (i.e. needs to compare values within the documents) will be ludicrously slow.
[19:19:21] <rpcesar> this will be primarily used as a K/V store, so that's not really a problem (I would basically use a big slave occasionally to do aggregates, but everything else would be ObjectID / Natural Index only)
[19:19:58] <rpcesar> but yea, do you know any database that better fits the performance profile I am talking about, preferably one that plays somewhat nice with mongo?
[19:20:46] <GothAlice> rpcesar: What's your write load going to be like?
[19:21:57] <GothAlice> Also, there's an optimization that may save some space, if you have lots of records: use single-letter field names. Since MongoDB needs to store the field names within each document, the space allocated to these names can add up quickly if you have many small documents.
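For example (values made up), these two documents carry the same data, but the second stores far fewer bytes of key names per document:

    { "timestamp": ISODate("2015-03-20T19:21:00Z"), "userAgent": "Mozilla/5.0", "remoteAddress": "10.0.0.1" }
    { "t": ISODate("2015-03-20T19:21:00Z"), "u": "Mozilla/5.0", "r": "10.0.0.1" }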
[19:22:00] <rpcesar> expecting about equal reads and writes.
[19:23:07] <GothAlice> https://en.wikipedia.org/wiki/Berkeley_DB might be a candidate (it's a K/V store), but I'm currently unaware of how it operates in constrained environments.
[19:23:51] <rpcesar> ive looked at LevelDB and Tokyo Cabinet, using redis as an external index. In all these cases though, they seem to suck up all the ram they can.
[19:24:13] <GothAlice> Indeed, most database engines will do that.
[19:24:14] <rpcesar> what I am trying to do here is have a custom index, which I cannot express easily in mongo or the other options, residing in redis.
[19:26:04] <GothAlice> (Same with projects involving ZeroMQ/RabbitMQ. And memcache/membase.) For graphs, though, you should invest in a real graph database.
[19:26:13] <GothAlice> Rather than attempting to mash several non-graphs together.
[19:26:19] <rpcesar> ive looked into a few in that area.
[19:26:46] <rpcesar> Neo4J being one of them; their idea of what constitutes a graph and mine were somewhat different.
[19:27:15] <jclif> Hi all. We're trying to set up a sharded cluster from an existing replica set, and were wondering about a few details. We have around 1.5 TB of data in our current setup, and plan to connect this replica set to a cluster of 5 additional replica sets with 2 400GB instances. The idea will be to retire our initial replica set after the data has been migrated. Is this possible?
[19:28:05] <Boomtime> 1.5TB of data, across how many collections? how big is the biggest collection?
[19:28:34] <jclif> the biggest collection is around 800 gb
[19:28:58] <GothAlice> jclif: Yes. Initial chunk migration might take some time with that much data, though. A big consideration: sharding key. With a bad key, records won't evenly distribute amongst the nodes.
[19:29:23] <GothAlice> (And having MongoDB try to dump 800GB into a 400GB shard is clearly a no-go. ;)
[19:30:21] <Boomtime> 800GB collection will be very difficult to shard
[19:30:59] <jclif> yeah, that's been a big issue for us; we went into the mongo office in nyc for help on choosing a sharding key, but the consensus seemed to be that we needed to just use a shard key with the id
[19:31:08] <jclif> is it not possible to shard with a collection of that size?
[19:31:53] <Boomtime> you may be able to do it by increasing the chunk size first
[19:32:20] <GothAlice> jclif: That'd give uniform distribution amongst nodes, without any concern for data locality (improving query performance). Looks like the over-large collections would need to be dumped, cleared, sharded, then restored, or values tuned as Boomtime suggests.
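The mechanics being discussed, sketched with placeholder names from a mongos: enable sharding, optionally raise the chunk size before sharding the very large collection, then shard on the chosen key.

    sh.enableSharding("mydb")

    // raise the chunk size (in MB) to reduce the number of initial chunks/migrations
    db.getSiblingDB("config").settings.save({ _id: "chunksize", value: 256 })

    // e.g. a hashed _id key for uniform distribution (at the cost of data locality)
    sh.shardCollection("mydb.events", { _id: "hashed" })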
[19:33:47] <Petazz> I guess there's no way to run custom javascript within the aggregation framework?
[19:34:07] <Petazz> Just found the jira ticket for my specific problem: https://jira.mongodb.org/browse/SERVER-9406
[19:34:26] <GothAlice> Petazz: Yeah, welcome to the club, on that ticket.
[19:34:58] <GothAlice> Petazz: The point of the aggregate framework is to get away from needing to spin up the JS VM.
[19:35:13] <GothAlice> (Aggregate queries are faster than the equivalent map/reduce in every case I've tested.)
[19:35:40] <Petazz> Yea, found that reason too. I guess the operators are native C++ so they should run faster
[19:42:20] <daaku> we're hitting a panic in the mgo driver where the server is returning more documents than numberToReturn in an OP_GET_MORE. the docs at http://docs.mongodb.org/meta-driver/latest/legacy/mongodb-wire-protocol/ are not entirely clear, but it seems like the server is allowed to do this?
[19:42:58] <daaku> (we're getting this in less than 1% of our queries, so it's quite rare)
[19:43:55] <jclif> i misspoke; our largest db is 800gb; the largest collection is 350gb. with our key being so small, and chunk size correctly set, given that we can properly shard this collection, how would one handle the growth of such a collection?
[19:51:32] <GothAlice> fewknow: Okay, mongo-connector is awesome. Not sure why I haven't cannibalized it earlier. (Dex is likewise awesome, and a 10gen tool I regularly use.)
[19:51:53] <fewknow> GothAlice: I have been using it for a while...even made contributions to the code.
[20:29:08] <delinquentme> with regards to mongo ( lulz, duh ) ... how do I figure out whether my particular use case ... can be well contained within a single server instance?
[20:29:32] <GothAlice> delinquentme: A good general rule is: can your data fit in the RAM of that single instance?
[21:16:10] <greyTEO> does $addToSet work on an array of objects?
[21:16:28] <greyTEO> from my test, it does not validate the object
[21:27:54] <greyTEO> fewknow, I am mainly doing it to denormalize my data
[21:28:35] <greyTEO> i have 3 collections. 1 contains all 3 as a complete object.
[21:29:22] <greyTEO> They are inserted/updated by Apache Spark. I wanted to nest the documents to avoid having references and multiple lookups..
[23:03:55] <grazfather> Hey guys, I am looking for an operator like $addToSet, but that only checks for a certain key. e.g. I have a list of simple dictionaries, and I want to make sure the _name_ is unique, not necessarily the whole dictionary
[23:05:39] <daidoji> grazfather: can you give an example?
[23:10:02] <ejb> Hello, I'm looking for some design advice. I'd like to build a simple product comparison / review engine. Products will have a variable number of attributes so mongo came to mind. Are there any frameworks out there that might help me with this concept?
[23:11:12] <ejb> As an example, consider bicycles. I would essentially have a giant table with all of the attributes that might matter to someone shopping for a bicycle: size, speeds, weight, price, etc.
[23:11:39] <grazfather> daidoji: Sure. I have a collection 'clients' which has a field 'logins'. Logins is a list of items whose schema is {'url': str, 'username': str, 'password': str}. I want to update a certain URL's username and password. 'push' will add a duplicate, and $addToSet will verify that url, u/n, and password all match
[23:12:08] <ejb> I'd like to be able to add products (and their attributes) through a simple UI. If there's already something out there that has the UI, even better.
[23:21:49] <daidoji> grazfather: and what end result are you looking for?
[23:22:20] <daidoji> ejb: hmm thats pretty vague. From a data modeling standpoint you might be able to do all that with Mongo
[23:22:52] <daidoji> ejb: UIs, frameworks, and picking all those things are a bit outside the purview of this channel
[23:23:16] <daidoji> ejb: if you're asking those questions my advice would be to pick any of them and start building what you have in mind and you'll learn as you go along
[23:24:24] <daidoji> ejb: so Rails or Django etc... for creating a web UI and frontend, model your data in Mongo etc.., and basically start building stuff and see what breaks
[23:24:37] <ejb> daidoji: yeah, I was hoping that keeping it vague would yield some creative ideas. I'm mostly looking to cut out the CRUD work and get right to the actual idea
[23:25:15] <ejb> daidoji: cool, I'm versed in django. Came into the mongo world sideways, via meteor
[23:25:51] <daidoji> ejb: roger, then I'd stick with Django (mongoengine which works mostly okay) and the Admin console to build stuff fast
[23:26:12] <daidoji> ejb: although in my experience if you want to go beyond the basics, admin console starts becoming quite a burden
[23:26:21] <ejb> daidoji: do you still get the admin ui when using mongo with django?
[23:26:39] <daidoji> ejb: if you keep a disciplined schema
[23:26:50] <daidoji> ejb: I've never used grappelli so I wouldn't know
[23:27:08] <daidoji> ejb: but I followed the tutorial using mongoengine
[23:27:16] <daidoji> and everything seemed to work pretty well
[23:28:18] <daidoji> ejb: GothAlice might have more info for you as she's much more well versed in all that than I probably am, so she might have more ideas for you
[23:31:18] <GothAlice> Possibly with the fiery passion of a thousand burning hot Latino stellar bodies. (English sucks. "Stars" is ambiguous. "Suns" maybe?)
[23:31:34] <grazfather> daidoji: I want to be able to update a client entry such that it'll add a new url to logins, but if a login with the same url (i don't care about u/n and password matching) exists, instead update that entry in the array
[23:32:09] <GothAlice> grazfather: I have a "LoginAttempt" collection for the purpose of capturing both failures and successes, for auditing. It simplifies what you are trying to do, a lot.
[23:32:55] <grazfather> I don't think that's applicable at all?
[23:33:38] <daidoji> GothAlice: yeah I'm not a fan either but ejb is familiar with it
[23:33:39] <GothAlice> Update-if-not-different and upserts.
[23:34:09] <GothAlice> "add a new, but if exists, update it" is the exact definition of an upsert. ;)
[23:34:55] <grazfather> Yeah but it is a field in an entry, not an entry in its own collection
[23:36:01] <daidoji> GothAlice: do you have any experiences in the best way to transfer data between Mongo instances (short of writing scripts)?
[23:36:23] <daidoji> GothAlice: use case is I'm using one instance as a Data Warehouse type endpoint and then need to occasionally replicate that over to Production
[23:36:41] <daidoji> I've been mongoexporting/importing --upsert but I was wondering if there was a better way
[23:36:49] <GothAlice> grazfather: http://docs.mongodb.org/manual/reference/operator/query/elemMatch/ with http://docs.mongodb.org/manual/reference/operator/update/positional/ will let you "update if it's there". You can then check for nModified/nUpdated, if zero, insert. But this introduces a race condition that won't exist if you pivot your data and turn those embedded documents into their own collection and use real upserts.
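A sketch of that two-step approach using grazfather's clients/logins schema (variable names are placeholders; note the race window between the two statements that GothAlice mentions):

    // try to update an existing login entry with a matching url
    var res = db.clients.update(
        { _id: clientId, logins: { $elemMatch: { url: theUrl } } },
        { $set: { "logins.$.username": newUser, "logins.$.password": newPass } }
    )

    // nothing matched: push a fresh entry instead
    if (res.nMatched === 0) {
        db.clients.update(
            { _id: clientId },
            { $push: { logins: { url: theUrl, username: newUser, password: newPass } } }
        )
    }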
[23:36:54] <daidoji> (like to capture deletes from the Data Warehouse instance etc...
[23:37:19] <GothAlice> daidoji: I was introduced to https://github.com/10gen-labs/mongo-connector today.
[23:37:38] <daidoji> GothAlice: word, I'll check it out thanks
[23:37:40] <GothAlice> daidoji: It sounds like mongo-connector is pretty much what you're looking for.