pmxbot IRC Log Viewer

[00:09:44] <GothAlice> Finally getting a chance to open-source some things from work. :D

[00:10:23] <cheeser> ~dance

[00:10:28] <cheeser> :(

[00:10:39] <GothAlice> https://github.com/marrow/mongo/blob/feature/query/example/dashboard/model.py?ts=4 < sample model, https://github.com/marrow/mongo/blob/feature/query/example/dashboard/controller.py?ts=4 < light weight isolated statistics generation API generating JSON using it

[00:10:41] <rendar> GothAlice: cool

[00:11:18] <rendar> GothAlice: finally? who prevented to do that before? :)

[00:11:26] <GothAlice> Time.

[00:11:29] <rendar> i see

[00:12:00] <GothAlice> This provides a clearer example of the "let's not obfuscate pymongo behind multiple layers of random stuff" philosophy.

[00:12:12] <GothAlice> marrow.mongo – the anti-ODM DAO.

[00:12:14] <rendar> lol :)

[00:14:12] <GothAlice> cheeser: The dancing comes from my JIRA tickets having been fixed, simplifying https://github.com/marrow/mongo/blob/feature/query/marrow/mongo/util/capped.py#L10-L45 to a very minor helper.

[00:14:20] <GothAlice> Vs. before with threads and signals and muteness and timers and …

[00:14:34] <rendar> pymongo are python bindings for mongodb right?

[00:14:38] <GothAlice> Correct.

[00:14:59] <rendar> which in turns run on pypy, iirc...

[00:15:14] <GothAlice> m.mongo aims to augment the existing tools, which after the face lift in 3.x, are quite pleasant to use out of the box.

[00:15:44] <GothAlice> rendar: https://travis-ci.org/marrow/mongo :)

[00:16:01] <rendar> :)

[00:18:56] <cheeser> GothAlice: which issue?

[00:19:16] <GothAlice> https://jira.mongodb.org/browse/SERVER-15815

[00:19:42] <GothAlice> Got accidentally fixed with the move to find/getMore commands.

[00:21:21] <GothAlice> cheeser: Started as https://gist.github.com/amcgregor/52a684854fb77b6e7395#file-worker-py-L62-L110 then became https://github.com/marrow/task/blob/develop/marrow/task/queryset.py?ts=4#L32-L85 before finally getting cleaned up in marrow.mongo

[00:44:35] <cheeser> GothAlice: ah!

[01:58:59] <monque> hey guys. got a question: lets assume ive a document structure like this: a embeds many b embends many c embeds many d. is there a way to query all d without starting at a ?

[02:31:12] <GothAlice> monque_: No. But I don't think you're asking what you think you're asking, nor modelling your data in a way that makes any sense.

[02:31:17] <GothAlice> http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html

[02:34:22] <GothAlice> MongoDB supports, generally, only a single array per document for many operations, notably $elemMatch, the principal way of targeting a single sub-document array element for other operations. https://docs.mongodb.org/manual/reference/operator/projection/positional/#array-field-limitations

[02:49:45] <GothAlice> Additionally, variably named fields, i.e. $a.$b.$c.$d, are a no-go for any form of querying, really, except absolutely explicit, likely without the benefit of indexes.

[09:26:22] <SimpleName> document: a = { "normal_id" : "789", "nums" : [ { "num" : 2 }, {"num" : 3} ], "c" : 2 } what should I do if I want to find a document that nums[0]['num'] equal to 2

[09:26:35] <SimpleName> how can I write the query condition

[09:28:48] <x4w3> Good morning, how could i do a backup of my database in mongo, please?

[09:29:56] <ramnes> x4w3: mongodump

[09:30:05] <x4w3> ok tk i read about it

[09:31:18] <x4w3> ramnes: like this: mongodmongodump --host mongodb1.example.net --port 3017 --username user --password pass --out /opt/backup/

[09:31:29] <ramnes> yep

[09:40:16] <ramnes> yesterday I made an "awesome" list for MongoDB: https://github.com/ramnes/awesome-mongodb

[09:40:28] <ramnes> what do you guys think of it? :)

[09:42:09] <ramnes> I really gave my all and put all my Gimp skills into that logo

[09:46:54] <x4w3> ramnes: nice

[10:02:34] <x4w3> ramnes: mongodump --db 3garbage --out /home/desarrollo/egarbage | gzip -9 > mongo_bck_22_02_2016.tar.gz

[10:03:04] <x4w3> file maybe mongo_bck_22_02_2016.gz

[10:41:52] <nalum> hello all, I'm using the mongo-connector to sync data between elasticsearch and mongodb. If I have a dataset in elasticsearch already and stop mongo-connector then start it again, doing a complete resync, does it try to overwrite the existing data or does it see that it's there and skip it?

[11:00:36] <nalum> Anyone?

[11:04:29] <solata> Hi

[11:05:52] <solata> i have an existing mongo database with entities where different properties have static urls inside them. i would like to replace every occurance of for example /my/static/folder/* to /new/folder/*. is there a way to do this easily ?

[11:06:39] <solata> (except mongoexport ,load in texteditor, search and replace, mongoimport)

[11:09:05] <solata> something like collection.db.update({anyfield: { $regex: /^\/myfolder(.*)/ }}, { $set: { $foundfield: '/newfolder{CAPTUREGROUP0}'}}, {multi: true});

[11:44:51] <kurushiyama> solata: It would be easiest to iterate over the docs and do a bulk operatio, setting only the part you want to change.

[13:18:53] <dddh> https://image-store.slidesharecdn.com/17ac6c48-3686-481e-8ca2-cc377d43053c-original.jpeg

[13:41:25] <ramnes> lold

[13:58:45] <Ben_1> StephenLynx: last week u said it's not so performant to use a string _id like "device:sensor:timestamp" am I right?

[13:59:29] <StephenLynx> not as using separate indexed fields, yes.

[13:59:43] <Ben_1> https://www.mongodb.com/presentations/mongodb-time-series-data-part-2-analyzing-time-series-data-using-aggregation-framework

[13:59:58] <StephenLynx> tl,dr;

[14:00:03] <StephenLynx> whats the gist of it?

[14:00:48] <Ben_1> it's a 40 minutes video of an mongodb architect that is using that _id string instead of an compound index

[14:01:09] <Ben_1> on 20:00 - 23:00 he is explaining why

[14:01:54] <StephenLynx> i never said anything about compound indexes though.

[14:02:21] <Ben_1> using seperate attributes instead of that string index, is a compound index

[14:03:06] <StephenLynx> is not.

[14:03:43] <StephenLynx> its just multiple indexes on the same collection

[14:03:51] <StephenLynx> a compound index is an index that uses several fields.

[14:04:53] <StephenLynx> that regex must be left anchored

[14:04:58] <StephenLynx> its really limited on what you can do there.

[14:05:20] <StephenLynx> even more than a compound index, if you were to use one.

[14:05:38] <StephenLynx> you are just grasping for straws

[14:05:44] <StephenLynx> do w/e you want

[14:06:46] <Ben_1> for sure it have to be left anchored but I don't know any case for me that is not working with that way.

[14:07:14] <Ben_1> maybe just wanted to argue that it is not so bad as you said last week

[14:39:39] <Sagar> i am getting this error while using genghis.php app on listing my collection

[14:39:40] <Sagar> 2016-02-22T19:49:13.357+0530 I ACCESS [conn141] Unauthorized: not authorized on admin to execute command { listDatabases: 1 }

[14:39:42] <Sagar> what could be wrong?

[15:09:28] <Sagar> ..

[15:11:16] <Ben_1> Sagar: it seems your application is not authorized to use this command ?!

[15:11:50] <Ben_1> maybe wront credentials

[15:11:53] <Ben_1> *wrong

[15:19:11] <tantamount> What's the best way to map an array of DBRefs?

[15:19:38] <tantamount> It seems like when I query the object, the field is actually an object rather than an Array

[15:20:51] <tantamount> Even though GUIs recognize it as an array field

[15:20:53] <cheeser> come again?

[15:21:35] <tantamount> I basically want to do a join and store a new field which is the mapping of an array of DBRefs

[15:21:52] <tantamount> So, denormalize one of the fields stored in the foreign document

[15:22:49] <tantamount> I can do a "for" on the array of DBRefs but I thought JavaScript had mapping primitives now... although maybe only for arrays, and this array seems to actually be an object

[15:29:25] <tantamount> Array.fetchRefs seems to do the trick

[15:29:49] <tantamount> Not sure if that's even documented heh

[16:02:47] <tantamount> Why doesn't Array.unique() work on a list of DBRefs?

[16:03:31] <tantamount> I suppose it's because they're all different object instances?

[16:38:11] <uehtesham90_> hello, so i am running a cron job which reads files and inserts data from those to mongo db. and there is high possibility of getting duplicate errors (which i do)...i wanted to know what is the best way to handle this (using pymongo)

[16:38:36] <uehtesham90_> currently what i do is that for each data point, i first check if its in the database or not...if it is not, then i inserted

[16:38:40] <uehtesham90_> else skip it

[16:38:52] <uehtesham90_> i want to know if there is a better (or faster) way to do it

[16:38:59] <cheeser> why not just insert and ignore DupeKeyExceptions?

[16:39:17] <uehtesham90_> that was my alternate solution

[16:39:47] <uehtesham90_> but i do some pre-processing to update the record with additional data and then insert it

[16:40:15] <uehtesham90_> but regardless, these two solutions are good enough right?

[16:41:43] <cheeser> it's your app. you get to define that.

[17:26:19] <Silenced> mongodb is not working in my ubuntu 15.10 . How to install it the "working" way ?

[17:26:44] <cheeser> depends on what "not working" means

[17:27:15] <GothAlice> Silenced: Have you followed the instructions here https://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/ ?

[17:27:32] <Silenced> cheeser: "Failed to start mongod.service: Unit mongod.service failed to load: No such file or directory."

[17:27:37] <Silenced> This is what i got

[17:28:05] <cheeser> sounds like a missing data dir

[17:28:17] <Silenced> GothAlice: Yes

[17:28:39] <Silenced> It worked in my Ubuntu 14.04.

[17:28:43] <Silenced> But not in 15.10

[17:29:42] <GothAlice> Silenced: From the article: "MongoDB only provides packages for 64-bit long-term support Ubuntu releases. Currently, this means 12.04 LTS (Precise Pangolin) and 14.04 LTS (Trusty Tahr). While the packages may work with other Ubuntu releases, this is not a supported configuration."

[17:30:10] <StephenLynx> i recommend centos, Silenced

[17:30:34] <StephenLynx> they keep up with their releases.

[17:30:40] <kurushiyama> I am totally with StephenLynx, Silenced.

[17:31:08] <Silenced> StephenLynx: What package manager does centos use ?

[17:31:09] <GothAlice> I'm a fan of Gentoo, as it lets me compile in some of the Enterprise edition features, such as SSL support, and disable JavaScript.

[17:31:13] <cheeser> Silenced: yum

[17:31:30] <cheeser> GothAlice: doesn't ssl come in the community builds now?

[17:31:34] <Silenced> cheeser: So can i go with Arch then ?

[17:31:41] <cheeser> Silenced: um. que?

[17:31:50] <Silenced> Means ?

[17:31:56] <StephenLynx> yum, Silenced

[17:32:06] <cheeser> how'd we get from centos/yum to arch?

[17:32:17] <StephenLynx> kek

[17:33:03] <Silenced> oh arch uses pacman :/ My bad. Confused

[17:33:52] <Silenced> Anyways. Let me try Centos then

[17:34:47] <GothAlice> cheeser: "Changed in version 3.0: Most MongoDB distributions now include support for TLS/SSL." Key emphasis on "most". ;P

[17:36:24] <Silenced> Well i found someother solution. There is docker image for mongo ;) Now i can run it

[17:37:58] <StephenLynx> >docker

[17:40:07] <cheeser> GothAlice: ah. i was pretty sure we'd plugged that gap.

[17:40:16] <cheeser> any idea of what distros don't have it?

[17:40:19] <cheeser> probably windows

[17:40:33] <GothAlice> Heh, yeah, I believe it's the Windows ones.

[17:40:38] <GothAlice> "When in doubt, compile it yourself!"

[17:40:48] <Derick> GothAlice: good luck on Windows with that one :)

[17:40:48] <cheeser> gentoo++

[17:41:56] <StephenLynx> >>windows

[17:41:59] <StephenLynx> :v

[17:42:05] <StephenLynx> #JUST

[18:17:42] <uuanton> is there any mongodb slack channel ?

[18:21:09] <GothAlice> … probably?

[18:22:12] <GothAlice> But several points come immediately to mind: IRC is more open, more capable, and gives the freedom of choice both of service provider and client, with effectively no use cost. Why pay for the privilege of vendor lock-in?

[18:22:46] <uuanton> i understand its free you just pay for a server

[18:23:12] <GothAlice> https://slack.com/pricing

[18:24:19] <uuanton> I just joined 4 different slack channel for free too bad its only few people there

[18:25:16] <GothAlice> Federation being a Future™ feature, yeah, I'd expect most Slack channels to be pretty small.

[18:26:14] <uuanton> GothAlice I remember you mentioned how to increase oplog to store 24hours of data

[18:26:16] <GothAlice> (And certainly, randoms joining a channel doesn't cost… the random joining the channel. But just think, with federation and a base price of $12.50/user/month, #mongodb alone would cost $3,787.50 per month.

[18:26:54] <GothAlice> https://docs.mongodb.org/manual/tutorial/change-oplog-size/

[18:32:17] <uuanton> so it requires to stepdown primary then hmmm

[18:32:42] <uuanton> last time i stepdown primary there were 10min application downtime

[18:32:49] <GothAlice> Indeed; capped collections can't be resized, AFIK, thus requiring a wholly new oplog be created. Can't have secondaries tailing the old one when that happens.

[18:33:49] <kurushiyama> uuanton: Then there is something _seriously_ wrong with your setup.

[18:34:40] <kurushiyama> uuanton: We do a stepdown randomly during failtests.

[18:34:55] <uuanton> intresting

[18:35:50] <GothAlice> I randomly kill -9 VMs during our own testing, and application failover is usually instantaneous. (Fast enough that it basically doesn't notice.)

[18:36:23] <uuanton> how do you check failover in newrelic ?

[18:37:11] <StephenLynx> yeah, slack is bullshit

[18:37:11] <kurushiyama> GothAlice: Well, we do see failed writes, but we simpyl rewrite after a certain threshold. uuanton: I can only speak for myself, but I use scripts and MMS

[18:37:36] <GothAlice> StephenLynx: Hey now, it might be silly, but clearly it provides something _someone_ wants. ;P

[18:37:48] <StephenLynx> yeah, it provides slack with money :v

[18:37:53] <GothAlice> Mostly that.

[18:38:14] <StephenLynx> its pretty much how MS got into the enterprise business.

[18:38:28] <kurushiyama> Well. Slack... isnt that the service where you get private chat rooms on demand? Wow, that's a totally new idea, nobody had in the early days of the internet...

[18:38:47] <StephenLynx> have a complete bullshit and inferior product than the open and free alternatives, market the crap out of it, make money

[18:39:08] <StephenLynx> they got the extend of buying articles saying IRC was dying

[18:39:13] <StephenLynx> because people were migrating to it

[18:45:55] <uuanton> slack is for newfags

[18:47:04] <uuanton> but i like it a lot

[18:51:18] <kurushiyama> uuanton: A customer "invited" me to it. I find it as usable as crap.

[18:51:58] <GothAlice> So, not sure if something changed during one of my "naps", but can we keep the phrasing at least somewhat polite?

[18:52:22] <GothAlice> This isn't #bashslack.

[18:53:17] <kurushiyama> GothAlice: Please take my sincere aplogies.

[19:55:15] <nalum> hello all, I've attempted to set up mongo-connector to deal with a single collection but it doesn't want to sync the data. The collection has 4 documents in it, if I switch to a different collection, which has ~1000 documents, it syncs fine. Any ideas on what might be happening? There is an error about SSL validation but nothing else.

[19:55:32] <nalum> the SSL validation happen on the working sync config as well

[20:01:54] <nalum> this is the config - http://pastebin.com/uiD9TYUA

[21:15:05] <Doyle> Hey. What's the best way to estimate the iops required for separate journal and index volumes?

[21:30:17] <uuanton> how big mongodb log drive should be to accommodate 700gb database replica. Does logappend=true good idea ?

[21:35:08] <Doyle> uuanton, you'll want to setup a logrotate script of some kind against that directory. You can send SIGUSR1 (check on that) to initiate a log rotation. I have a cron that looks at the current log filze size, if over 2g, send signal to pid, & zip old log file. Check if more than x tar.gz in directory, delete oldest

[21:44:43] <uuanton> thanks

[21:45:15] <Doyle> Would it be correct to say that your journal & data volumes have to have the same iops? And that if running a separate index volume, you typically want as many iops as possible?

[21:48:26] <uuanton> i put both journal and data on same drive

[21:49:20] <uuanton> 2100 / 3000 iops

[21:51:06] <Doyle> I'm looking at the old mongodb aws piops doc, and it's saying that it gives the journal a 25G drive with 250 piops

[21:51:28] <Doyle> Letting the journal sit with data on a drive with an overhead in iops seems reasonable

[21:52:06] <torak> Hello

[21:52:29] <uuanton> the reason i did that because I snapshot it a lot and restore it

[21:52:56] <Doyle> uuanton, yea, the consistency required when backing up with snapshots is a concern also

[21:54:27] <torak> I moved to the mongodb from Parse.com but i am not sure about how should i keep the 'pointers' they call it like that in parse. I mean There is a column for holding the owners objectId this can be stored as one string. but i dont know why they make them an object and put 3 strings in it, they use it like that: "List_ID" : {

[21:54:27] <torak> "__type" : "Pointer",

[21:54:27] <torak> "className" : "List",

[21:54:27] <torak> "objectId" : "3RnaMPdX3D"

[21:54:27] <torak> },

[21:55:00] <torak> But i can use this just like that, "listId": "332f2ed"

[21:55:07] <Doyle> uuanton, yea, the consistency required when backing up with snapshots is a concern also

[21:55:41] <Doyle> The journal is super tiny, so it can't be eating much in terms of iops. 2TB db yields a 2G journal or so

[21:57:45] <torak> can anyone help ?

[21:58:34] <uuanton> torak i can't help sorry

[21:59:17] <torak> uuanton: thank you :(

[21:59:19] <GothAlice> torak: If the framework you are using has expectations, it's best to follow those expectations. As an important note, it seems its expectations are somewhat… "pants on head" in that DBRef is already a thing (https://docs.mongodb.org/manual/reference/database-references/) and ObjectIds aren't strings.

[22:00:10] <Doyle> Anyone know what the impact of having too many indexes is?

[22:00:34] <GothAlice> Wasted space and slower inserts/updates.

[22:00:51] <torak> GothAlice: hmm. i see. There is already a thing for it. So we dont have to create our referances by hand like we do in mysql

[22:00:58] <Doyle> bc all those indexes have to be updated then... right?

[22:01:03] <Doyle> on each insert/update

[22:01:42] <GothAlice> torak: Well, yes and no. MongoDB doesn't support joins except LEFT OUTER during an aggregate query pipeline (i.e. not during normal querying). So it's still 100% manual.

[22:01:44] <starfly> and remove

[22:02:00] <GothAlice> Doyle: Correct. Additionally, querying probably won't use them.

[22:02:26] <GothAlice> But the basic impact is that no insert/update/delete is "complete" until the indexes are updated.

[22:02:32] <torak> GothAlice: thank you for your help :)

[22:03:49] <GothAlice> torak: The biggest point here is that ObjectIds are ObjectIds, not strings. Storing them as strings a) wastes space (27 bytes instead of 11 per), and b) requires additional conversion to make them useful, since an ObjectId is actually a compound value containing creation time, server node, process ID, and a per-process counter.

[22:03:52] <Doyle> thanks GothAlice. db.stats outputs counts in bytes, right?

[22:04:07] <GothAlice> Doyle: Unless you pass in an argument telling it which scale you want, yes.

[22:04:22] <GothAlice> db.stats(1024) for KB, for example.

[22:05:08] <gzn> need help creating scalable db schema for the following prompt http://pastebin.com/gjxGG3F3

[22:05:19] <Doyle> Ah, nice, can it do 1024*1024?

[22:05:27] <Doyle> well... I can just put in that num :P

[22:05:30] <Doyle> Thanks

[22:05:33] <GothAlice> Doyle: Sure can, but that's the shell or your language doing the calculation, of course. ;P

[22:07:01] <GothAlice> gzn: That looks an awful lot like a homework question.

[22:08:56] <gzn> its for an interview, I was thinking database sharding. ive been working as a lamp dev so long and mysql doesnt work too great for partial searches. or i could do elastic search on mysql but i figure there has to be a better way

[22:10:16] <GothAlice> MongoDB sharding would naturally split up the query across multiple back-end mongod nodes according to the "sharding index". For example, you could randomly assign records to nodes by hashing the ID ("cheap/easy" but often sub-optimal), grouped by area code, etc.

[22:11:05] <GothAlice> Not sure about how mongos handles returning partial responses, but I'm guessing it'd only do that if sorting on $natural.

[22:11:34] <GothAlice> I.e. you'd be able to stream responses as the individual shards generate them, but only if not otherwise sorting (or needing to perform another operation over the whole set of results).

[22:12:05] <GothAlice> https://docs.mongodb.org/manual/core/sharding-introduction/

[22:16:42] <Doyle> GothAlice, when you said querying probably wouldn't use those indexes, why would that be?

[22:17:33] <GothAlice> Doyle: Because MongoDB uses a "we'll try to optimize as much as we can, but don't take too long doing it" approach to query planning. If it can't figure out a sane intersection fast enough, it'll simply ignore them. Previously it didn't even do intersections, meaning only one index could _possibly_ be used for any given query.

[22:18:37] <GothAlice> This is why having fewer indexes, but using compound indexes, is so strongly focused on. Note the "Prefixes" section: https://docs.mongodb.org/manual/core/index-compound/

[22:19:03] <Doyle> Amazing info, thanks GothAlice

[22:19:26] <GothAlice> Also why "arbitrary metadata" (having dynamically named fields on your documents) is basically a no-go: it's impossible to efficiently index "arbitrary data".

[22:20:00] <GothAlice> ({foo: 27, bar: 42} vs. {metadata: [{name: "foo", value: 27}, {name: "bar", value: 42}]})

[22:21:35] <GothAlice> The latter you can index {"metadata.key", "metadata.value"} quite easily, and that single index will help with "find documents with metadata named foo" queries, as well as "find documents with metadata named foo that equals 27" queries. The former… can't use indexes on $exists at all, so the first example query is basically out. ;P

[22:28:48] <Doyle> GothAlice, do you have any input on how many iops would be required for separate journal and index volumes under wired tiger? It seems the journal needs only a fraction of what's required by the dbpath

[22:30:40] <GothAlice> Alice's Law #146: Optimization without measurement is by definition premature.

[22:31:04] <GothAlice> Theorizing about iops is not entirely useful because it'll depend entirely on your dataset load.

[22:32:12] <GothAlice> On my dataset at home, for example, the journal sees about 2x as many iops as the actual dataset. Mostly because I'm lazy and didn't prune out a bunch of ineffective update-if-not-different queries that basically can't ever succeed (adding the update to the journal, but not actually touching the data files).

[22:35:30] <Doyle> Fair, thanks GothAlice

[22:51:26] <Doyle> Is there an iops profiling tool to give an indication of iops against dbpath vs index? I'm asking for too much right now. :P I could swear there was a command to give iops against a file under linux, but I can't think of it. That would be useful.

[23:19:57] <synapse> StephenLynx hey there

[23:20:44] <StephenLynx> wot

Log file Viewer

Help | Karma | Search:

#mongodb logs for Monday the 22nd of February, 2016