[01:33:04] <joannac> so you'll get lots of collections called assets, games, players etc
[01:33:07] <Synt4x`> I know they will each be loaded as a collection, but within which DB?
[01:35:09] <Synt4x`> that makes sense right? the DB is different than the collection? like usually I had to say: use MY_DB and my collections would be there
[14:51:10] <Bitpirate> I'm using mongoose to update/insert (when it doesn't exist) a document, how can I do this? because save() just keeps inserting
[15:45:22] <quuxman> Ironically I think when you really get into the details of queries and documents, Python's type system does a much better job representing mongo input and output than JS
[15:45:56] <quuxman> but for almost all practical purposes you can just treat the data as dicts
[15:45:57] <GothAlice> quuxman: Yes and no. The SON container types had to be written (instead of relying on the underlying and built-in list/dict types) due to the need to preserve order of the mappings.
[15:46:24] <GothAlice> {name: 1, age: -1} — the result of sorting on this as a Python dictionary is undefined.
[15:46:34] <quuxman> (I was responding to crocket saying Mongo works well with nodejs :-P)
[15:46:53] <GothAlice> I'm suggesting each language has had its compromises. ;)
[15:47:44] <GothAlice> The callback structure in Node makes debugging somewhat painful, as we've noticed over the last week. ^_^
[15:47:51] <quuxman> In Python it's actually still a little obnoxious to write the order, because whenever you write { ... } in code, the order of ... is lost
[15:48:23] <GothAlice> I use a wrapper that takes positional string arguments, i.e. .sort('name', '-age') and does the right thing.
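A minimal sketch of that kind of wrapper, assuming PyMongo; the collection and field names are hypothetical. The point is that the sort specification reaches the driver as an ordered list of (field, direction) pairs rather than a dict:

    from pymongo import MongoClient, ASCENDING, DESCENDING

    def sort_by(cursor, *fields):
        # '-age' means descending on age; anything else means ascending.
        spec = [(f.lstrip('-'), DESCENDING if f.startswith('-') else ASCENDING)
                for f in fields]
        return cursor.sort(spec)

    people = MongoClient().test.people          # hypothetical collection
    for doc in sort_by(people.find(), 'name', '-age'):
        print(doc)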
[15:48:33] <quuxman> And I'm pretty sure Python doesn't support overloading parsing, so {} is not a dict in some cases
[15:48:33] <cheeser> document order should be insignificant anyway outside of an array
[15:48:54] <GothAlice> quuxman: False. {foo: bar} will always result in a dict. {foo, bar} will result in a set.
[15:49:05] <quuxman> right, I was trying to say that
[15:49:23] <GothAlice> It's a pretty clear syntax difference, though; one not often confused.
[15:49:26] <quuxman> oh, right. I needed to put a ':' in there
[15:49:42] <quuxman> I actually didn't know about {foo, bar}. Cool
[15:49:51] <GothAlice> cheeser: It's not. If you index on a field containing sub-documents, and want to match an exact sub-document, order is critical.
[15:50:31] <cheeser> oh for comparing document to document. right.
[15:50:36] <GothAlice> (The two documents hash differently.)
[15:50:39] <quuxman> I guess it looks like a dict comprehension without a ':' on the LHS
[15:50:46] <cheeser> but that's a shortcoming of the comparison
[15:51:05] <quuxman> Sometimes I learn more about Python in #mongodb than in #python (most of the time)
[15:51:07] <GothAlice> cheeser: I like fast hash comparisons. Deep field-by-field comparison would be hideously inefficient.
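A small illustration of that point, assuming PyMongo and a hypothetical collection: an exact match against an embedded document compares field order as well as values.

    from bson import SON
    from pymongo import MongoClient

    demo = MongoClient().test.demo              # hypothetical collection
    demo.delete_many({})
    demo.insert_one({'pos': SON([('x', 1), ('y', 2)])})

    print(demo.count_documents({'pos': SON([('x', 1), ('y', 2)])}))   # 1: same order
    print(demo.count_documents({'pos': SON([('y', 2), ('x', 1)])}))   # 0: order differs
    # Matching on the dotted fields instead ({'pos.x': 1, 'pos.y': 2}) ignores order.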
[15:51:49] <GothAlice> quuxman: (apologies to the #mongodb gods about this) but I started ##python-friendly after dissatisfaction with #python.
[15:51:55] <quuxman> this is amazing. Set comprehensions. :-)
[15:52:10] <GothAlice> quuxman: Technically it's a set notation generator comprehension, but…
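For reference, the literal forms being discussed:

    d = {'foo': 1, 'bar': 2}              # dict literal: key/value pairs with colons
    s = {'foo', 'bar'}                    # set literal: no colons
    squares = {n * n for n in range(5)}   # set comprehension -> {0, 1, 4, 9, 16}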
[16:15:22] <winem_> good evening. I have a quick question regarding a small development setup. I have 2 nodes that can be used to run mongodb services. We use the recommended setups with 11 nodes in the prod env (more shards will be added soon)
[16:15:35] <winem_> which service would you run on which of the 2 hosts in the dev environment? or would you recommend a 3rd one?
[16:16:46] <winem_> my current plan is to run a config srv, a mongos and 1 shard member on node 1. for node 2, option 1 is a 2nd shard with 2 replicas so I have a whole replica set; option 2 is 2 shard members on node 1 and only the 2 replicas per shard on the 2nd node
[16:18:11] <GothAlice> winem_: In development I use something like https://gist.github.com/amcgregor/c33da0d76350f7018875 to start up a replicated, sharded, authenticated cluster on one box for testing. This could be spread across two fairly easily. (There are two shards in this setup…)
[16:20:27] <winem_> great, let me check your link.
[16:38:52] <GothAlice> winem_: Was the script useful?
[16:39:48] <winem_> got a call. so far I've only been able to download it and open it in sublime
[17:20:33] <winem_> it looked like the lines beginning with -- were comments.. now it looks fine
[17:28:09] <GothAlice> winem_: Must have been set to a SQL syntax, I'm guessing. ;) You should be able to modify the vars at the top and run locally if you wish, just to try it out.
[17:29:15] <winem_> and I'll check for an option to merge the duplicate lines like 83 and 84
[17:29:54] <winem_> I guess you could do it with a single echo command, using tee to pipe the output to stdout and to the stdin of mongo
[17:34:39] <GothAlice> Aye; I was lazy when stripping those functions of confidential code. XD
[17:35:34] <GothAlice> (The real code uses ebegin/eend with proper error handling. It's quite pretty.)
[17:36:18] <winem_> lazy admins are good admins - that's one thing the developers and admins told me when I started to discover linux and work as an admin. some years later, I know what they meant :)
[17:36:46] <GothAlice> winem_: My philosophy is: "Anything worth doing twice is worth automating." ;)
[17:55:02] <winem_> I really like your coding style :)
[17:57:04] <GothAlice> I develop "best practices" and stick to them. Recently I've changed my FOSS project pattern to something more like: https://github.com/marrow/marrow.schema
[17:57:11] <winem_> the services will be run by root, right?
[17:57:14] <GothAlice> (That has more individual tests than executable statements in the library.)
[17:57:44] <GothAlice> winem_: No. In my real script I use start-stop-daemon to do the launching and service stopping, and it allows you to chroot and setuid/setgid on startup.
[17:58:51] <winem_> will take a look at the FOSS project on the weekend. it sounds good because I also have to rewrite a lot of setup scripts from prior colleagues that have no error or exception handling..
[17:59:51] <winem_> and it's quite funny if you run a script without any errors but only a few things work as expected..
[18:13:15] <GothAlice> StanAccy: In my logging code I have a write concern of zero (I don't care), and I don't even check if the server received the data. Makes the inserts very, very fast, even for ridiculously large amounts of data. (I.e. every HTTP request, including session data, and every HTTP response are logged, cost is ~6ms.)
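Roughly what that fire-and-forget logging looks like with PyMongo; the database, collection and field names are hypothetical:

    import datetime
    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient()
    # w=0: the driver does not wait for any acknowledgement from the server.
    log = client.logs.get_collection('http', write_concern=WriteConcern(w=0))

    log.insert_one({
        'ts': datetime.datetime.utcnow(),
        'method': 'GET',
        'path': '/',
        'status': 200,
    })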
[18:19:28] <winem_> Goth, how would you use the 2nd node I have... some more shards or slaves? any recommendation?
[18:21:58] <GothAlice> winem_: From my script, right now both replicated shards are on one boxen. You could split each shard onto its own boxen. (Two boxen, each with three replica nodes, arranged into a sharded pair.)
[18:22:15] <GothAlice> Obviously the replication is somewhat pointless on one box, but this is a test rig. ;)
[18:23:34] <winem_> yes, it's only the test rig, so I'm asking myself whether to use the 2nd node for our devs or for my team - the ops :)
[18:28:23] <winem_> and another question you might answer. I was at a conference talk about mongoDB in production use and I'm very sure the speaker said you can use a list of "columns" as the shard key, not only a single one
[18:28:40] <winem_> but I can't find that on docs.mongodb.org
[18:28:48] <GothAlice> My production cluster is divided into three pools: application servers, database nodes, and ops. Ops manages and automates everything else a la https://gist.github.com/amcgregor/4342022 (VM name pool and tracking collection), https://gist.github.com/amcgregor/2032211 (logging), etc.
[18:29:15] <winem_> I'd like to use a shard key composed of a timestamp, application name and user id
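Compound shard keys do exist; a hedged sketch of what sharding on several fields looks like, assuming PyMongo connected to a mongos (database, collection and field names are hypothetical):

    from bson import SON
    from pymongo import MongoClient

    mongos = MongoClient('localhost', 27017)    # must point at a mongos
    mongos.admin.command('enableSharding', 'tracking')
    # Field order in the key matters, hence SON rather than a plain dict.
    # Leading with a monotonically increasing timestamp tends to funnel all
    # inserts to one shard, so the application/user fields come first here.
    mongos.admin.command('shardCollection', 'tracking.events',
                         key=SON([('application', 1), ('userid', 1), ('ts', 1)]))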
[18:32:51] <winem_> yes, I started with several mysql clusters using everything you can imagine regarding replication, HA with corosync, pacemaker, DRBD and so on.. then I switched to a new company and had only a few (unfortunately really old and basic) mysql hosts and had to use oracle
[18:33:44] <GothAlice> We rolled our own HA oplog replication on Postgres. As a nifty fact, the experience from doing that (and automating it) resulted in my current behaviour of not giving my database servers any form of permanent storage.
[18:34:12] <winem_> now I want to go back, and I don't like mysql any more - since oracle bought it some things like updates and release notes are poorly documented, etc. so I decided to use postgres and to build the base for our new tracking services with mongoDB, which inspires me a lot :)
[18:35:28] <GothAlice> winem_: https://gist.github.com/amcgregor/1ca13e5a74b2ac318017 — this is an example pre-aggregated click tracking record and a minor query to get stats back by day of the week. :)
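The gist itself isn't reproduced here, but the general pre-aggregation idea looks something like this sketch, assuming PyMongo and hypothetical names (one document per link per day, with per-hour counters bumped in place):

    import datetime
    from pymongo import MongoClient

    clicks = MongoClient().tracking.clicks      # hypothetical collection

    def record_click(link_id, when=None):
        when = when or datetime.datetime.utcnow()
        day = when.replace(hour=0, minute=0, second=0, microsecond=0)
        clicks.update_one(
            {'link': link_id, 'day': day},
            {'$inc': {'total': 1, 'hours.%d' % when.hour: 1}},
            upsert=True)

    # Roll the per-day documents up by day of the week.
    pipeline = [
        {'$group': {'_id': {'$dayOfWeek': '$day'}, 'clicks': {'$sum': '$total'}}},
        {'$sort': {'_id': 1}},
    ]
    for row in clicks.aggregate(pipeline):
        print(row)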
[18:35:40] <winem_> ok, oplog went on my todo-list too :)
[18:36:51] <winem_> thanks. saves some time and I see how someone else is using it :)
[18:36:51] <GothAlice> winem_: https://developer.rackspace.com/blog/postgresql-plus-wal-e-plus-cloudfiles-equals-awesome/ for a longer write-up.
[18:37:13] <mike_edmr> I made a comment on a nodejs package, telling the author he could get rid of his _created field since it's redundant with the ObjectId
[18:37:20] <mike_edmr> he said it was too awkward to query IDs by date
[18:38:28] <GothAlice> Works like a hot darn, actually.
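For reference, querying ObjectIds by date is short with PyMongo's bson helpers (collection name hypothetical); ObjectIds embed their creation time:

    import datetime
    from bson import ObjectId
    from pymongo import MongoClient

    posts = MongoClient().test.posts            # hypothetical collection
    start = ObjectId.from_datetime(datetime.datetime(2014, 11, 1))
    end = ObjectId.from_datetime(datetime.datetime(2014, 12, 1))

    november = posts.find({'_id': {'$gte': start, '$lt': end}})
    # The creation time can also be read back from any ObjectId:
    doc = posts.find_one()
    if doc:
        print(doc['_id'].generation_time)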
[18:39:35] <winem_> mike_edmr, let's create a self-help group :)
[18:39:58] <winem_> no one cares about normalization and optimization (indexes!!!!!)
[18:40:50] <cheeser> normalization isn't always an optimization.
[18:41:09] <cheeser> optimization is a function of use case
[18:41:15] <winem_> a friend studies in hamburg, germany... and their prof said "let's talk about mysql and database structures today"... she was quite happy and started her shell...
[18:41:27] <GothAlice> That's why I write things like complete C10K-capable, HTTP/1.1-compliant web servers in 171 opcodes… and other insane simplifications like that. ;)
[18:41:41] <winem_> then the prof says "please open this link. we'll use phpmyadmin, it's everything a developer needs"... this made me cry :(
[18:44:05] <winem_> have to leave the chat for a short meeting.. will be back in 20 minutes
[19:13:57] <winem_> sounds like you had a lot of fun...
[19:14:45] <GothAlice> It took 36 hours straight to recover everything, and that bout of insomnia was immediately followed by extraction of my wisdom teeth and four impacted molars in front of the wisdom teeth.
[19:15:28] <GothAlice> (Knew I had no chance of recovering the data immediately _following_ dental surgery… so I had to rush to get it all recovered before.)
[19:15:32] <winem_> but I think that every admin has to go through his personal nightmare in his career at least once :)
[19:16:26] <GothAlice> Cross-zone failures on AWS (which, in theory, should be impossible) would certainly classify as a nightmare.
[19:17:35] <GothAlice> (That's when we switched to Rackspace. At least our SLA doesn't over-promise any more. ;)
[19:17:43] <winem_> one of my nightmares occurred last week. we were a whole week without any production environment because one (yes, only one!) storage system in the data center of our hoster crashed...
[19:18:01] <cheeser> over promise. under deliver. that's the secret to a successful business, right?
[19:18:03] <winem_> don't be happy about an oracle RAC installation if both nodes are on the same storage...
[19:18:36] <GothAlice> We use MooseFS distributed filesystems across our application servers. Single points of failure just don't happen here. ;)
[19:19:07] <winem_> the good thing about it: my boss gave me the budget to switch to another hoster with 2 data centers and software and technologies which are up to date, etc...
[19:19:19] <winem_> this wasn't planned until mid-2015 :D
[19:19:44] <winem_> MooseFS? never heard of that, to be honest... is it comparable to hadoop?
[19:19:56] <GothAlice> Heh; I had to suggest replacing our office internet connection. I wanted to set up an in-house replica of our production data and realized it'd just destroy the connection.
[19:20:06] <GothAlice> winem_: MooseFS and Hadoop are very different beasts.
[19:20:44] <GothAlice> MooseFS is a FUSE filesystem that has per-file and per-tree replication control, undeletion, and minor versioning support, amongst other goodies.
[19:20:54] <GothAlice> It's meant to be used as a standard filesystem.
[19:22:02] <winem_> but I guess I have enough new technologies and software I'd like to implement by the end of the year..
[19:23:17] <GothAlice> Yeah; my cluster stack has been growing for 6 years, now.
[19:23:25] <GothAlice> Pro tip: don't try to slap everything together all at once. ;)
[19:23:53] <winem_> exactly this is what I'd like to avoid
[19:25:11] <winem_> moving from oracle to postgres, using puppet, mongodb for our tracking data, replacing the applications manager with nagios and the movement of our whole production and staging environment to a new hoster should be enough for the next months :)
[19:25:27] <winem_> and I forgot that we started to use liferay as portal server...
[19:27:42] <GothAlice> That's my replacement, across several hundred nodes in several clusters.
[19:27:42] <winem_> and I play around with hubot...
[19:27:59] <winem_> so you use git for the setup of your servers?
[19:28:06] <GothAlice> Yup. Let me dig something up for you…
[19:28:29] <GothAlice> for cmd in $(git diff --name-only @{1}.. | xargs qfile -Cq | xargs --no-run-if-empty equery files | grep init.d/ | sort -u); do ${cmd} reload; done #truepowerofunix
[19:28:31] <winem_> sure :) I'm interested in everything I can learn from
[19:29:12] <GothAlice> That's a git post-merge hook that examines the last pull for which files changed, asks the system which packages own those files, then finds *all* files for those packages, filters to initialization scripts, and reloads the packages affected by the pull.
[19:29:16] <mike_edmr> Fabric in Python is push-based config management
[19:29:59] <winem_> holy hell... it's as easy as it is brilliant...
[19:30:01] <GothAlice> (With my management cluster, specifically Shodan, sending push notifications to the VMs over MongoDB capped collections when I push to Github, it gets notified, then notifies the VMs to pull.)
[19:31:36] <GothAlice> All of my VMs are identical: I run a purely homogenous environment. On startup they examine the kernel command line for which role (branch) to check out, switch to it, and attach the hooks provided by the role.
[19:32:18] <GothAlice> Part of those branches are the symlinks that control /etc/init.d service registration.
[19:32:20] <winem_> which leads to setup scripts with > 1000 lines which might be done with much less in other languages... but I just have to do some things in bash - otherwise it doesn't feel right :)
[19:33:17] <GothAlice> winem_: You may notice I tend to do a *lot* with very few lines. ;)
[19:33:22] <winem_> GothAlice, this is brilliant.. this shows how much we're influenced by all the companies, magazines and communities who say that you have to use puppet, chef and so on... it can be that simple
[19:34:21] <winem_> I guess every one of us does this sometimes... also in scripts.. some of them get a comment like "don't know what happens below, but trust me - it works" when you take a look at them some months later
[19:34:23] <mike_edmr> it's a powerful tool. just use it smartly and no issues
[19:34:25] <GothAlice> The key point here is that if Shodan doesn't get two pings from a VM in a row it literally spins up a new VM to replace the dying one and kills the dying one.
[19:34:40] <GothAlice> Why have processes running to "repair" running systems when you can just replace the faulty component?
[19:38:00] <GothAlice> winem_: I don't. While I did have remarkable physical boxen stability, VMs have let me take stability to the next level. (I don't measure reliability with nines any more… uptime on my primary is approaching three years continuous.)
[19:39:06] <winem_> yes, I know what you mean and you're completely right. but some things that required a lot of brain, nerves and nights are gone with virtualization
[19:41:03] <GothAlice> E-gads! I only ever bother checking uptime on my clusters these days, but one of my physical boxen deployments for a local company has an uptime exceeding 8 years!
[19:41:24] <winem_> this is what it should look like!
[19:41:45] <GothAlice> ¬_¬ It's an AMD Athlon box. Holy mackerel.
[19:46:01] <GothAlice> I'm, personally, waiting for the day they accidentally drywall in the server closet. I'm imagining following the CAT5 into a room with no doors. XD
[19:48:34] <winem_> the mongos is listening on 127.0.0.1 on port 37017, right=
[19:48:43] <winem_> too late to hit the right keys... -.-'
[19:49:08] <GothAlice> winem_: My testing script adds 10000 to the default port number to keep it out of the way during the test run. I'd recommend setting that variable back to the default.
[19:50:14] <winem_> ok. but this should be fine for a small environment to play around with
[19:51:10] <winem_> I know I could find this in the official docs, but I'm really tired and have to ride home 30 kilometers by bike.. does listening on 127.0.0.1 also allow connections from remote hosts?
[19:51:34] <winem_> not sure if it has to be 127.0.0.1 or if it's the default and only allows connections from localhost..
[19:51:34] <GothAlice> winem_: No. Listen on 0.0.0.0 to do that.
[19:52:02] <GothAlice> "listen" determines which network interface is used when creating the server sockets. 127.0.0.1 = only loopback connections are possible. 0.0.0.0 = any.
[19:52:12] <winem_> switching the bind_ip in /etc/mongod.conf affects all instances / services, right?
[19:52:18] <GothAlice> (You can also specify things like the "internal network" interface, i.e. on rackspace/amazon.)
[19:52:30] <GothAlice> winem_: That script of mine uses no configuration files.
[19:53:04] <winem_> like I said I'm really tired :D
[19:53:26] <GothAlice> The default is to bind to all interfaces when unspecified.
[19:53:55] <winem_> working for more than 110 hours in the last week to create a lot of workarounds and set up servers in our development environment to take over the productive services temporarily was not very funny...
[19:55:10] <winem_> imagine the most unprofessional hosting company and you still have no idea of what I have seen in the week without the prod env...
[19:55:22] <cheeser> that still leaves you 8 hours to eat/sleep every day. no excuses!
[19:55:54] <GothAlice> vm bundle role=app tag=2.3.1 test register upload deploy cluster=production hosts=3 expire=4h — create three temporary (four hour) nodes in the production set after running the full test suite and creating, registering, and uploading a new VM image.
[19:56:04] <winem_> took them 2 and a half days to find out how to check whether any data was written to the tapes... and 2 more to find out how to restore the backups..
[19:56:19] <winem_> Goth, I don't want to hear that :D
[19:56:33] <GothAlice> Could be worse. One hosting company physically lost a server of mine.
[19:56:42] <GothAlice> Literally couldn't find it in the racks.
[19:57:06] <winem_> another great example: they budget 4 hours to set up a jdk, jboss and database driver.. my script does this in <5 minutes (including the time it takes to grab the sources from elsewhere)...
[19:57:33] <winem_> ok, then I have something that might make you feel a bit better
[19:58:11] <winem_> https://www.youtube.com/watch?v=QHTeDus8AnY this was one of the best talks on the code.talks this year - at least it was pure entertainment
[19:58:57] <GothAlice> winem_: Speaking of performance and delays… compiling a kernel from depclean takes 54 seconds on my main cluster. It takes Ubuntu longer to download and extract the binary image. ;)
[20:01:06] <cheeser> i'd have been terribly surprised if there wasn't.
[20:01:33] <GothAlice> MongoDB and PHP are two of the packages available in the cluster that *require* -j1 during compilation or they fail to compile. Fun stuff.
[20:04:10] <GothAlice> That's a great presentation, BTW.
[20:04:22] <GothAlice> I had to fire a client for doing things like import($_GET['_page'])
[20:04:48] <winem_> very good that you fire them... I've seen such situations go without any consequences too often
[20:05:01] <GothAlice> He got hosed twice because of two different failures on his part.
[20:05:20] <GothAlice> My servers now explicitly disable fopen everything, the mail() function, and a bunch of other things. I've neutered PHP.
[20:08:56] <winem_> what about developers with access to the prod env? :)
[20:09:38] <GothAlice> Depends on the cluster. One cluster is a general managed hosting system for "unwashed masses", so people have access to their own MooseFS homes. The dedicated application clusters are more strict.
[20:09:55] <GothAlice> (We also have a silly hat for people doing things in prod.)
[20:10:56] <winem_> we have a plush toy called blame :)
[20:11:41] <winem_> it's a fluffy black ball with huge teeth in a cage... ok ok, actually, it's on my desk
[20:11:58] <GothAlice> … our continuous integration board is powered by ponies.
[20:12:24] <winem_> installed a new git on a server and typed rm -rf in the wrong window.. however, it was installed again <5 minutes later, thanks to documentation and automation
[20:12:45] <winem_> which tools do you use for CI?
[20:13:02] <GothAlice> An incredibly important note: don't have the root git repo folder called ".git".
[20:13:53] <GothAlice> https://gist.github.com/amcgregor/4947201 — Here's an ancient and ugly copy/paste merge of some of my Amazon startup automation.
[20:17:08] <GothAlice> Montréal is Canada's capital of Python development.
[20:17:15] <winem_> canada is a great country and I'd love to live near whistler :)
[20:17:31] <winem_> really? this explains a lot...
[20:17:43] <winem_> do you know jardineworks, e.g.?
[20:18:03] <GothAlice> Also helps that Ubisoft is here. ^_^ When people on the street ask me what I do and I say "software engineering", they inevitably ask if I work for Ubi. XD
[20:19:25] <winem_> I could create a team with very smart guys from canada... frontend and backend developers, database administrators, even QA... should I move? ;)
[20:25:55] <idok> hello... I'm trying to deploy my apps on a debian server... I get this error: OperationFailure: database error: Can't canonicalize query: BadValue geo near accepts just one argument when querying for a GeoJSON point. Extra field found: $maxDistance: 10000
[20:26:18] <GothAlice> idok: Server versions on both boxen?
[20:26:47] <idok> GothAlice: I just updated now : 2.6.5
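On 2.6 that error usually means $maxDistance sits outside the $near document; with GeoJSON points it has to go inside, next to $geometry. A sketch with hypothetical names, assuming PyMongo and a 2dsphere index:

    from pymongo import MongoClient

    places = MongoClient().test.places          # hypothetical collection
    query = {
        'location': {
            '$near': {
                '$geometry': {'type': 'Point', 'coordinates': [-73.97, 40.77]},
                '$maxDistance': 10000,          # metres, inside $near, not beside it
            }
        }
    }
    for place in places.find(query):
        print(place)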
[20:27:29] <samgranger> Out of interest, which one of these 2 options is the best for saving friend relationships? https://gist.github.com/levicook/4132037
[20:28:08] <GothAlice> samgranger: A real graph database.
[20:28:09] <idok> GothAlice: ouch.. when I run mongo now , I get http://paste.kde.org/pmyu71yai
[20:31:49] <samgranger> GothAlice: thanks, I'll do some research
[20:32:24] <GothAlice> samgranger: Of the two, I'd stick with option #1 until you have enough data to properly benchmark test your queries. It's slightly more semantic. See also: http://stackoverflow.com/questions/5125709/storing-a-graph-in-mongodb and https://github.com/Voronenko/Storing_TreeView_Structures_WithMongoDB
[20:32:53] <GothAlice> TreeView != friendship relationships, but still, it demonstrates several interesting ways of structuring related data.
[20:33:35] <samgranger> GothAlice: Thanks - would I be best saving friend requests into a separate collection?
[20:34:01] <GothAlice> At work we got hackernews'd and lifehacker'd the same week; database growth of users and their entire social graphs was so significant, it killed the project. (Some of our algorithms naively attempted to match one record against all people records…)
[20:34:31] <GothAlice> (So yeah… think carefully about how you store graphs.)
[20:34:41] <samgranger> GothAlice: I'd like to prevent that :)
[20:34:55] <samgranger> I've been hackernews'd before and it was pretty hefty
[20:35:12] <GothAlice> Storage in a separate collection may be worthwhile, however it'd be easy to extract from the first approach of storing within the user records themselves, and indexes will make it fast either way.
[20:35:21] <GothAlice> And external storage complicates initial prototyping.
[20:36:04] <GothAlice> Also depends on if your relationships can be unidirectional, or only bidirectional. (I.e. real "friends" vs. "followers".)
[20:37:36] <samgranger> GothAlice: Real friends, not followers :)
[20:38:36] <GothAlice> Okay, if there is no chance for unidirectional, store externally in simple pairs: {_id: [ObjectId("user 1"), ObjectId("user 2")]} — and yeah, that's a thing.
[20:38:48] <samgranger> GothAlice: What project got hackernews'd btw?
[20:42:06] <GothAlice> (matchFWD was the thing with the Jython NLP code.)
[20:42:29] <winem_> Goth, I rebooted the server for some final tests, executed your script again (without bind_ip in $COMMON) and I get the following messages during startup http://apaste.info/pro
[20:42:40] <winem_> but all servers seem to be up and running
[20:43:20] <GothAlice> winem_: Seems to have started twice somehow. Your PID files likely don't match real processes any more, so stopping may require manual intervention.
[20:43:22] <winem_> and I'm wondering why they are still listening on 127.0.0.1 if I remove bind_ip 127.0.0.1 from COMMON
[20:43:45] <GothAlice> winem_: Is it also listening on the other interfaces? (lsof | grep mongod | grep TCP)
[20:43:57] <winem_> ok, pidfiles might be a good hint. will check that :)
[20:44:24] <GothAlice> winem_: Nah; line 33 is the first "start" invocation finishing, the next line clearly indicates that "start" was invoked again, somehow.
[20:44:41] <GothAlice> Starting a second time potentially nukes the PID files, preventing the "stop" command from working.
[20:45:01] <winem_> yes, this would make sense... let me double-check the script...
[20:45:10] <GothAlice> winem_: https://gist.github.com/amcgregor/8ccb06ebc15c7fb4ddd4#file-mongodb-sh-L23 — start-stop-daemon is your friend. ;)
[21:06:33] <samgranger> GothAlice: Do you use mongoose/node?
[21:08:24] <samgranger> Well, wondering what a Schema design would look like for that simple friend pair :/
[21:08:55] <StanAccy> I read that Mongo provides ACID props on a document level (i.e. row equivalent). Is there any way to guarantee transactional semantics on a series of document updates?
[21:15:59] <GothAlice> StanAccy: There are several approaches to that.
[21:16:05] <GothAlice> samgranger: Alas, I don't node.
[21:18:09] <kali> there is this doc from the mongodb docs too: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
[21:18:32] <GothAlice> Unfortunately because atomic operations are only document-level in MongoDB, you effectively need a "maintenance" worker (i.e. like Postgres' vacuuming) to commit these abstracted transactions.
[21:19:12] <kali> but getting full tx semantics will require a lot of very hard work
[21:19:28] <GothAlice> If that's a hard requirement, use something more ACID than MongoDB.
[21:20:00] <GothAlice> ("Right tool for the right job" pragmatism, as much as I love MongoDB. Same for graphs. ;)
[21:20:09] <kali> having a few places where you need some stronger semantics than doc acidity in an app is kinda ok, if you know what you're doing. but don't start something with tx all over the place with mongodb
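A heavily simplified sketch of the two-phase-commit pattern from the linked tutorial, assuming PyMongo and hypothetical names; real code also needs the recovery and rollback steps the tutorial describes:

    from pymongo import MongoClient

    db = MongoClient().bank
    db.accounts.insert_many([{'_id': 'A', 'balance': 100, 'pendingTransactions': []},
                             {'_id': 'B', 'balance': 0, 'pendingTransactions': []}])

    # 1. Record the intended transfer in its own document.
    txn_id = db.transactions.insert_one(
        {'source': 'A', 'destination': 'B', 'value': 25, 'state': 'initial'}).inserted_id

    # 2. Mark it pending, then apply it to each account exactly once.
    db.transactions.update_one({'_id': txn_id, 'state': 'initial'},
                               {'$set': {'state': 'pending'}})
    db.accounts.update_one({'_id': 'A', 'pendingTransactions': {'$ne': txn_id}},
                           {'$inc': {'balance': -25},
                            '$push': {'pendingTransactions': txn_id}})
    db.accounts.update_one({'_id': 'B', 'pendingTransactions': {'$ne': txn_id}},
                           {'$inc': {'balance': 25},
                            '$push': {'pendingTransactions': txn_id}})

    # 3. Commit, then clean up the per-account markers.
    db.transactions.update_one({'_id': txn_id, 'state': 'pending'},
                               {'$set': {'state': 'applied'}})
    db.accounts.update_many({'_id': {'$in': ['A', 'B']}},
                            {'$pull': {'pendingTransactions': txn_id}})
    db.transactions.update_one({'_id': txn_id, 'state': 'applied'},
                               {'$set': {'state': 'done'}})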
[21:24:44] <GothAlice> samgranger: Oh, also remember to sort the pairs before storing them. Don't want accidental inverse duplicates.
[21:26:00] <samgranger> GothAlice: MongoError: can't use an array for _id - should I really be saving them in _id?
[21:26:24] <GothAlice> Ah, right. Need to use a mapping there.
[21:30:48] <samgranger> quite new to mongo and really want to make sure I'm getting everything right :)
[21:31:51] <GothAlice> Separating this in your simplified case (no unidirectional links) will increase query efficiency. You'll only have to insert a record to make friends, rather than updating two friend records. (This is important in the case where the record grows and needs to be moved around on-disk to store the added value.)
[21:32:08] <GothAlice> s/friend records/user records
[21:33:04] <samgranger> should I add a reference to Users with the ObjectIds?
[21:33:22] <GothAlice> samgranger: That's exactly what the embedded document stored in _id does.
[21:33:48] <GothAlice> (And the sorting thing still applies: a should always < b.)
[21:36:22] <GothAlice> I also used single-letter key names intentionally; since you may have a large number of individual relationships the space used by the key strings adds up quickly.
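Putting those pieces together, a sketch of the pair document being described (sorted pair inside _id, single-letter keys), assuming PyMongo; the names are hypothetical:

    from bson import SON, ObjectId
    from pymongo import MongoClient
    from pymongo.errors import DuplicateKeyError

    friends = MongoClient().social.friends      # hypothetical collection
    # Secondary indexes on the embedded _id fields keep the lookups below fast.
    friends.create_index('_id.a')
    friends.create_index('_id.b')

    def befriend(user1, user2):
        a, b = sorted([user1, user2])           # a always < b: no inverse duplicates
        try:
            friends.insert_one({'_id': SON([('a', a), ('b', b)])})
        except DuplicateKeyError:
            pass                                # already friends; _id enforces uniqueness

    def friends_of(user):
        for doc in friends.find({'$or': [{'_id.a': user}, {'_id.b': user}]}):
            pair = doc['_id']
            yield pair['b'] if pair['a'] == user else pair['a']

    befriend(ObjectId(), ObjectId())            # stand-ins for two user _ids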
[22:02:47] <GothAlice> You have to define _id (or id, depending on how your abstraction layer deals with it) as being an embedded document (or generic arbitrary mapping without schema, again, depending on abstraction).
[22:03:24] <GothAlice> class SomeDocument(Document): id = DictField() # MongoEngine "generic" example.
[22:04:46] <GothAlice> class Relationship(EmbeddedDocument): a = ReferenceField(User); b = ReferenceField(User) \n\n class SomeDocument(Document): id = EmbeddedDocumentField(Relationship) # MongoEngine "complete" example.
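The same idea as those one-liners, expanded into something runnable with MongoEngine; class, field and database names are illustrative only, and the pair field is marked primary_key so it is stored as _id:

    from mongoengine import (Document, EmbeddedDocument, EmbeddedDocumentField,
                             ReferenceField, StringField, connect)

    connect('social')                           # hypothetical database

    class User(Document):
        name = StringField(required=True)

    class Pair(EmbeddedDocument):
        a = ReferenceField(User)                # keep a < b when constructing
        b = ReferenceField(User)

    class Friendship(Document):
        pair = EmbeddedDocumentField(Pair, primary_key=True)   # stored as _id

    alice, bob = User(name='Alice').save(), User(name='Bob').save()
    a, b = sorted([alice, bob], key=lambda u: u.id)
    Friendship(pair=Pair(a=a, b=b)).save()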