[00:14:12] <GothAlice> cheeser: The dancing comes from my JIRA tickets having been fixed, simplifying https://github.com/marrow/mongo/blob/feature/query/marrow/mongo/util/capped.py#L10-L45 to a very minor helper.
[00:14:20] <GothAlice> Vs. before with threads and signals and mutexes and timers and …
[00:14:34] <rendar> pymongo is the python driver for mongodb, right?
[00:19:42] <GothAlice> Got accidentally fixed with the move to find/getMore commands.
[00:21:21] <GothAlice> cheeser: Started as https://gist.github.com/amcgregor/52a684854fb77b6e7395#file-worker-py-L62-L110 then became https://github.com/marrow/task/blob/develop/marrow/task/queryset.py?ts=4#L32-L85 before finally getting cleaned up in marrow.mongo
[01:58:59] <monque> hey guys. got a question: let's assume I've got a document structure like this: a embeds many b, b embeds many c, c embeds many d. is there a way to query all d without starting at a?
[02:31:12] <GothAlice> monque_: No. But I don't think you're asking what you think you're asking, nor modelling your data in a way that makes any sense.
[02:34:22] <GothAlice> MongoDB supports, generally, only a single array per document for many operations, notably $elemMatch, the principal way of targeting a single sub-document array element for other operations. https://docs.mongodb.org/manual/reference/operator/projection/positional/#array-field-limitations
[02:49:45] <GothAlice> Additionally, variably named fields, i.e. $a.$b.$c.$d, are a no-go for any form of querying, really, except fully explicit paths, and likely without the benefit of indexes.
[09:26:22] <SimpleName> document: a = { "normal_id" : "789", "nums" : [ { "num" : 2 }, {"num" : 3} ], "c" : 2 }. what should I do if I want to find a document where nums[0]['num'] equals 2?
[09:26:35] <SimpleName> how can I write the query condition
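A minimal pymongo sketch of both readings of that question, assuming a collection named "docs" (illustrative name): "nums.0.num" targets the first array element specifically, while $elemMatch (mentioned above) matches any element.

    from pymongo import MongoClient

    docs = MongoClient().test.docs

    # First array element specifically: nums[0]['num'] == 2
    first_match = docs.find_one({"nums.0.num": 2})

    # Any element of nums having num == 2 (the $elemMatch form)
    any_match = docs.find_one({"nums": {"$elemMatch": {"num": 2}}})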
[09:28:48] <x4w3> Good morning, how could i do a backup of my database in mongo, please?
[10:41:52] <nalum> hello all, I'm using the mongo-connector to sync data between elasticsearch and mongodb. If I have a dataset in elasticsearch already and stop mongo-connector then start it again, doing a complete resync, does it try to overwrite the existing data or does it see that it's there and skip it?
[11:05:52] <solata> i have an existing mongo database with entities where different properties have static urls inside them. i would like to replace every occurrence of, for example, /my/static/folder/* with /new/folder/*. is there a way to do this easily?
[11:06:39] <solata> (except mongoexport, load in a text editor, search and replace, mongoimport)
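A rough pymongo sketch of doing that replacement in place, assuming the URLs live in a top-level string field named "url" (a hypothetical name; adjust the filter and field to the actual schema):

    from pymongo import MongoClient, UpdateOne

    entities = MongoClient().mydb.entities

    ops = []
    for doc in entities.find({"url": {"$regex": "^/my/static/folder/"}}, {"url": 1}):
        new_url = doc["url"].replace("/my/static/folder/", "/new/folder/", 1)
        ops.append(UpdateOne({"_id": doc["_id"]}, {"$set": {"url": new_url}}))

    if ops:
        entities.bulk_write(ops, ordered=False)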
[15:21:35] <tantamount> I basically want to do a join and store a new field which is the mapping of an array of DBRefs
[15:21:52] <tantamount> So, denormalize one of the fields stored in the foreign document
[15:22:49] <tantamount> I can do a "for" on the array of DBRefs but I thought JavaScript had mapping primitives now... although maybe only for arrays, and this array seems to actually be an object
[15:29:25] <tantamount> Array.fetchRefs seems to do the trick
[15:29:49] <tantamount> Not sure if that's even documented heh
[16:02:47] <tantamount> Why doesn't Array.unique() work on a list of DBRefs?
[16:03:31] <tantamount> I suppose it's because they're all different object instances?
[16:38:11] <uehtesham90_> hello, so i am running a cron job which reads files and inserts their data into mongodb, and there is a high possibility of getting duplicate-key errors (which i do)... i wanted to know what the best way is to handle this (using pymongo)
[16:38:36] <uehtesham90_> currently what i do is that for each data point, i first check if it's in the database or not... if it is not, then i insert it
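One common alternative to that check-then-insert pattern is to let a unique index reject duplicates and catch the error; a sketch, where "source_file" and "line_no" are hypothetical fields standing in for whatever makes a data point unique:

    from pymongo import MongoClient, errors

    points = MongoClient().mydb.datapoints
    points.create_index([("source_file", 1), ("line_no", 1)], unique=True)

    def insert_point(point):
        try:
            points.insert_one(point)
        except errors.DuplicateKeyError:
            pass  # already loaded by an earlier cron run; safe to ignore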
[17:29:42] <GothAlice> Silenced: From the article: "MongoDB only provides packages for 64-bit long-term support Ubuntu releases. Currently, this means 12.04 LTS (Precise Pangolin) and 14.04 LTS (Trusty Tahr). While the packages may work with other Ubuntu releases, this is not a supported configuration."
[17:30:10] <StephenLynx> i recommend centos, Silenced
[17:30:34] <StephenLynx> they keep up with their releases.
[17:30:40] <kurushiyama> I am totally with StephenLynx, Silenced.
[17:31:08] <Silenced> StephenLynx: What package manager does centos use ?
[17:31:09] <GothAlice> I'm a fan of Gentoo, as it lets me compile in some of the Enterprise edition features, such as SSL support, and disable JavaScript.
[18:22:12] <GothAlice> But several points come immediately to mind: IRC is more open, more capable, and gives the freedom of choice both of service provider and client, with effectively no use cost. Why pay for the privilege of vendor lock-in?
[18:22:46] <uuanton> i understand it's free, you just pay for a server
[18:24:19] <uuanton> I just joined 4 different slack channels for free, too bad there are only a few people in them
[18:25:16] <GothAlice> Federation being a Future™ feature, yeah, I'd expect most Slack channels to be pretty small.
[18:26:14] <uuanton> GothAlice I remember you mentioned how to increase the oplog to store 24 hours of data
[18:26:16] <GothAlice> (And certainly, randoms joining a channel doesn't cost… the random joining the channel. But just think: with federation and a base price of $12.50/user/month, #mongodb alone would cost $3,787.50 per month.)
[18:32:17] <uuanton> so it requires stepping down the primary then, hmmm
[18:32:42] <uuanton> last time i stepped down the primary there was 10 min of application downtime
[18:32:49] <GothAlice> Indeed; capped collections can't be resized, AFAIK, thus requiring a wholly new oplog be created. Can't have secondaries tailing the old one when that happens.
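For reference, a small pymongo sketch (connection details assumed) that measures how many hours the current oplog actually covers, i.e. the number to compare against a 24-hour target before resizing:

    from pymongo import MongoClient

    oplog = MongoClient().local["oplog.rs"]

    first = oplog.find().sort("$natural", 1).limit(1)[0]["ts"]
    last = oplog.find().sort("$natural", -1).limit(1)[0]["ts"]
    print("oplog window: %.1f hours" % ((last.time - first.time) / 3600.0))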
[18:33:49] <kurushiyama> uuanton: Then there is something _seriously_ wrong with your setup.
[18:34:40] <kurushiyama> uuanton: We do a stepdown randomly during failtests.
[18:35:50] <GothAlice> I randomly kill -9 VMs during our own testing, and application failover is usually instantaneous. (Fast enough that it basically doesn't notice.)
[18:36:23] <uuanton> how do you check failover in newrelic ?
[18:37:11] <kurushiyama> GothAlice: Well, we do see failed writes, but we simply retry the write after a certain threshold. uuanton: I can only speak for myself, but I use scripts and MMS
[18:37:36] <GothAlice> StephenLynx: Hey now, it might be silly, but clearly it provides something _someone_ wants. ;P
[18:37:48] <StephenLynx> yeah, it provides slack with money :v
[18:38:14] <StephenLynx> it's pretty much how MS got into the enterprise business.
[18:38:28] <kurushiyama> Well. Slack... isn't that the service where you get private chat rooms on demand? Wow, that's a totally new idea nobody had in the early days of the internet...
[18:38:47] <StephenLynx> take a completely bullshit product, inferior to the open and free alternatives, market the crap out of it, make money
[18:39:08] <StephenLynx> they went to the extent of buying articles saying IRC was dying
[18:39:13] <StephenLynx> because people were migrating to it
[18:53:17] <kurushiyama> GothAlice: Please accept my sincere apologies.
[19:55:15] <nalum> hello all, I've attempted to set up mongo-connector to deal with a single collection but it doesn't want to sync the data. The collection has 4 documents in it, if I switch to a different collection, which has ~1000 documents, it syncs fine. Any ideas on what might be happening? There is an error about SSL validation but nothing else.
[19:55:32] <nalum> the SSL validation error happens on the working sync config as well
[20:01:54] <nalum> this is the config - http://pastebin.com/uiD9TYUA
[21:15:05] <Doyle> Hey. What's the best way to estimate the iops required for separate journal and index volumes?
[21:30:17] <uuanton> how big should a mongodb log drive be to accommodate a 700gb database replica? is logappend=true a good idea?
[21:35:08] <Doyle> uuanton, you'll want to set up a logrotate script of some kind against that directory. You can send SIGUSR1 (check on that) to initiate a log rotation. I have a cron that looks at the current log file size; if it's over 2g, it sends the signal to the pid & zips the old log file, then checks if there are more than x tar.gz files in the directory and deletes the oldest.
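A rough Python sketch of the rotation step Doyle describes, using the logRotate admin command instead of SIGUSR1 (both trigger the same rename); the log path and 2 GB threshold are illustrative:

    import gzip, os, shutil
    from pymongo import MongoClient

    LOG = "/var/log/mongodb/mongod.log"

    if os.path.getsize(LOG) > 2 * 1024 ** 3:              # over ~2 GB
        MongoClient().admin.command("logRotate")          # mongod renames the active log
        log_dir = os.path.dirname(LOG)
        for name in os.listdir(log_dir):
            if name.startswith("mongod.log.") and not name.endswith(".gz"):
                path = os.path.join(log_dir, name)
                with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                    shutil.copyfileobj(src, dst)
                os.remove(path)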
[21:45:15] <Doyle> Would it be correct to say that your journal & data volumes have to have the same iops? And that if running a separate index volume, you typically want as many iops as possible?
[21:48:26] <uuanton> i put both journal and data on the same drive
[21:52:29] <uuanton> the reason i did that is because I snapshot it a lot and restore it
[21:52:56] <Doyle> uuanton, yea, the consistency required when backing up with snapshots is a concern also
[21:54:27] <torak> I moved to mongodb from Parse.com but i am not sure how i should keep the 'pointers', as they call them in Parse. I mean, there is a column for holding the owner's objectId; this could be stored as one string, but i don't know why they make it an object and put 3 strings in it. they use it like that: "List_ID" : {
[21:59:19] <GothAlice> torak: If the framework you are using has expectations, it's best to follow those expectations. As an important note, it seems its expectations are somewhat… "pants on head" in that DBRef is already a thing (https://docs.mongodb.org/manual/reference/database-references/) and ObjectIds aren't strings.
[22:00:10] <Doyle> Anyone know what the impact of having too many indexes is?
[22:00:34] <GothAlice> Wasted space and slower inserts/updates.
[22:00:51] <torak> GothAlice: hmm. i see. There is already a thing for it, so we don't have to create our references by hand like we do in mysql
[22:00:58] <Doyle> bc all those indexes have to be updated then... right?
[22:01:42] <GothAlice> torak: Well, yes and no. MongoDB doesn't support joins except LEFT OUTER during an aggregate query pipeline (i.e. not during normal querying). So it's still 100% manual.
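A hedged sketch of that one supported join (a left outer join via $lookup in the aggregation pipeline, MongoDB 3.2+); the collection and field names here are assumptions, not torak's actual schema:

    from pymongo import MongoClient

    db = MongoClient().mydb

    results = db.lists.aggregate([
        {"$lookup": {
            "from": "users",            # foreign collection
            "localField": "owner_id",   # field on the "lists" documents
            "foreignField": "_id",      # field on the "users" documents
            "as": "owner",              # array of matches; empty if none (left outer)
        }},
    ])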
[22:02:00] <GothAlice> Doyle: Correct. Additionally, querying probably won't use them.
[22:02:26] <GothAlice> But the basic impact is that no insert/update/delete is "complete" until the indexes are updated.
[22:02:32] <torak> GothAlice: thank you for your help :)
[22:03:49] <GothAlice> torak: The biggest point here is that ObjectIds are ObjectIds, not strings. Storing them as strings a) wastes space (29 bytes instead of 12 per, once stored as a BSON string), and b) requires additional conversion to make them useful, since an ObjectId is actually a compound value containing creation time, server node, process ID, and a per-process counter.
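A small illustration of that point using the bson package that ships with pymongo: the ObjectId value itself is 12 bytes, its hex form is 24 characters, and the creation time can be read straight off it; DBRef is the pre-existing reference type mentioned earlier.

    from bson import DBRef, ObjectId

    oid = ObjectId()
    print(len(oid.binary))        # 12 bytes as stored
    print(len(str(oid)))          # 24 characters as a hex string
    print(oid.generation_time)    # embedded creation timestamp

    ref = DBRef("lists", oid)     # a standard reference to another document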
[22:03:52] <Doyle> thanks GothAlice. db.stats outputs counts in bytes, right?
[22:04:07] <GothAlice> Doyle: Unless you pass in an argument telling it which scale you want, yes.
[22:04:22] <GothAlice> db.stats(1024) for KB, for example.
[22:05:08] <gzn> need help creating scalable db schema for the following prompt http://pastebin.com/gjxGG3F3
[22:05:33] <GothAlice> Doyle: Sure can, but that's the shell or your language doing the calculation, of course. ;P
[22:07:01] <GothAlice> gzn: That looks an awful lot like a homework question.
[22:08:56] <gzn> it's for an interview. i was thinking database sharding. i've been working as a lamp dev so long and mysql doesn't work too great for partial searches. or i could do elasticsearch on top of mysql, but i figure there has to be a better way
[22:10:16] <GothAlice> MongoDB sharding would naturally split up the query across multiple back-end mongod nodes according to the shard key. For example, you could randomly assign records to nodes by hashing the ID ("cheap/easy" but often sub-optimal), group them by area code, etc.
[22:11:05] <GothAlice> Not sure about how mongos handles returning partial responses, but I'm guessing it'd only do that if sorting on $natural.
[22:11:34] <GothAlice> I.e. you'd be able to stream responses as the individual shards generate them, but only if not otherwise sorting (or needing to perform another operation over the whole set of results).
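A hedged sketch of the two shard-key styles just mentioned, issued as admin commands from pymongo against a mongos; the "phonebook" database/collection names and the mongos address are made up for illustration:

    from pymongo import MongoClient

    admin = MongoClient("mongodb://mongos-host:27017").admin

    admin.command("enableSharding", "phonebook")

    # "Cheap/easy": spread records pseudo-randomly by hashing the _id.
    admin.command("shardCollection", "phonebook.records", key={"_id": "hashed"})

    # Or instead, range-shard so related records group together, e.g.:
    # admin.command("shardCollection", "phonebook.records", key={"area_code": 1})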
[22:16:42] <Doyle> GothAlice, when you said querying probably wouldn't use those indexes, why would that be?
[22:17:33] <GothAlice> Doyle: Because MongoDB uses a "we'll try to optimize as much as we can, but don't take too long doing it" approach to query planning. If it can't figure out a sane intersection fast enough, it'll simply ignore them. Previously it didn't even do intersections, meaning only one index could _possibly_ be used for any given query.
[22:18:37] <GothAlice> This is why having fewer indexes, but using compound indexes, is so strongly emphasized. Note the "Prefixes" section: https://docs.mongodb.org/manual/core/index-compound/
[22:19:26] <GothAlice> Also why "arbitrary metadata" (having dynamically named fields on your documents) is basically a no-go: it's impossible to efficiently index "arbitrary data".
[22:20:00] <GothAlice> ({foo: 27, bar: 42} vs. {metadata: [{name: "foo", value: 27}, {name: "bar", value: 42}]})
[22:21:35] <GothAlice> The latter you can index with {"metadata.name": 1, "metadata.value": 1} quite easily, and that single index will help with "find documents with metadata named foo" queries, as well as "find documents with metadata named foo that equals 27" queries. The former… can't use indexes on $exists at all, so the first example query is basically out. ;P
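A short pymongo sketch of that metadata pattern and its single compound index; the "docs" collection name is illustrative:

    from pymongo import MongoClient

    docs = MongoClient().mydb.docs
    docs.create_index([("metadata.name", 1), ("metadata.value", 1)])

    # Uses the index prefix: documents having metadata named "foo".
    named_foo = docs.find({"metadata.name": "foo"})

    # Uses the full index: metadata named "foo" whose value equals 27,
    # constrained to the same array element via $elemMatch.
    foo_is_27 = docs.find({"metadata": {"$elemMatch": {"name": "foo", "value": 27}}})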
[22:28:48] <Doyle> GothAlice, do you have any input on how many iops would be required for separate journal and index volumes under wired tiger? It seems the journal needs only a fraction of what's required by the dbpath
[22:30:40] <GothAlice> Alice's Law #146: Optimization without measurement is by definition premature.
[22:31:04] <GothAlice> Theorizing about iops is not entirely useful because it'll depend entirely on your dataset and load.
[22:32:12] <GothAlice> On my dataset at home, for example, the journal sees about 2x as many iops as the actual dataset. Mostly because I'm lazy and didn't prune out a bunch of ineffective update-if-not-different queries that basically can't ever succeed (adding the update to the journal, but not actually touching the data files).
[22:51:26] <Doyle> Is there an iops profiling tool to give an indication of iops against dbpath vs index? I'm asking for too much right now. :P I could swear there was a command to give iops against a file under linux, but I can't think of it. That would be useful.