PMXBOT Log file Viewer


#mongodb logs for Sunday the 7th of February, 2016

[00:27:23] <macwinner> hi, for updating to mongo 3.2 from 3.0, can I just update each node and continue to use mmap.. then to move to WiredTiger, I could take secondaries in the replica set, stop them, delete the data directory, change to WiredTiger in the config, and then start the secondary and wait for sync?
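That matches the documented rolling approach: upgrade binaries in place on MMAPv1 first, then switch each secondary to WiredTiger by wiping its data directory and letting it resync. The engine switch itself is one config line; a sketch, assuming the YAML config format and a hypothetical dbPath:

```yaml
# mongod.conf on the secondary being switched (YAML format, 2.6+).
# The data directory must be empty so the node performs a full
# initial sync into the new engine when it restarts.
storage:
  dbPath: /var/lib/mongodb   # assumed path
  engine: wiredTiger         # was mmapv1
```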
[01:53:21] <hdon> hi all :) i just read "What’s missing from MongoDB is a SQL-style join operation" at http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/ -- is this still the case today?
[01:55:44] <StephenLynx> no
[01:55:44] <StephenLynx> it never was, honestly. but I heard there is something like that in 3.2
[01:55:44] <StephenLynx> i think cheeser knows better about it.
[01:55:45] <StephenLynx> and that article is pure FUD.
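The 3.2 feature being alluded to is most likely $lookup, a left-outer-join stage added to the aggregation pipeline in MongoDB 3.2. A minimal sketch, with hypothetical orders and customers collections:

```javascript
// Hypothetical schema: each order stores a customerId; $lookup (new in
// MongoDB 3.2) attaches the matching customer documents to each order.
db.orders.aggregate([
  {
    $lookup: {
      from: "customers",         // collection to join with
      localField: "customerId",  // field in the orders documents
      foreignField: "_id",       // field in the customers documents
      as: "customer"             // output array field
    }
  }
])
```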
[01:56:03] <hdon> StephenLynx, weird
[01:56:32] <hdon> well i'm not done with the article but it seems to be making a pretty certain allegation (not the U in FUD)
[01:57:11] <StephenLynx> the thing is
[01:57:15] <StephenLynx> I read that article too.
[01:57:19] <StephenLynx> and the problem was not with mongo
[01:57:27] <StephenLynx> but with people using the wrong tool for the scenario.
[01:57:42] <StephenLynx> mongo is not bad, mongo just doesn't fit what they needed.
[01:57:46] <hdon> that's usually the case isn't it?
[01:57:52] <hdon> :3
[01:57:57] <StephenLynx> not according to that article title
[01:58:06] <hdon> but i'm wondering -- is mongo still the wrong tool for their use case?
[01:58:11] <StephenLynx> yes.
[01:58:17] <hdon> article titles are hyperbolic clickbait
[01:58:21] <StephenLynx> mongo is not a relational database, which is what they need.
[01:58:26] <hdon> mmm
[01:58:28] <hdon> ok, thanks
[01:58:44] <StephenLynx> if your system relies heavily on relations, then mongo is not what you need.
[01:59:06] <hdon> yeah, that is my use case. oh well, i'd wanted to use mongo for something for a while now.
[01:59:20] <hdon> can i ask, if mongo's weakness is relations/joins, where does it really shine?
[01:59:49] <hdon> maybe i'm just part of the rdbms cult and can't think about living without joins, but it seems to me to be a common need, and living in a world of maximum denormalization doesn't sound fun to me
[01:59:56] <StephenLynx> at large amounts of data, especially when it's focused on reading and your model is flexible.
[02:00:04] <StephenLynx> and when your data is not too relational.
[02:00:08] <hdon> mmm
[02:00:35] <StephenLynx> mongo shines when it has to handle ginormous datasets and you have to constantly add more servers to your cluster.
[02:00:51] <hdon> ahh
[02:01:07] <hdon> well thanks StephenLynx, i'll try to keep this in mind
[02:01:10] <StephenLynx> np
[02:01:19] <StephenLynx> I have a project where I use it: lynxhub.com
[02:01:31] <StephenLynx> I also use mongo to store uploaded files.
[02:01:52] <hdon> lol
[02:02:09] <hdon> an application based on mongo compared to PHP?
[02:02:21] <StephenLynx> I don't use PHP
[02:02:23] <StephenLynx> I use node
[02:02:29] <StephenLynx> and you could use mongo with PHP
[02:02:31] <hdon> i hate php :(
[02:02:34] <StephenLynx> yeah, me too
[02:02:39] <hdon> oh but
[02:02:49] <StephenLynx> part of my goal with this project is to try and get people off PHP
[02:02:51] <hdon> i thought you were comparing lynxchan application to php with the 30x performance figure
[02:03:07] <StephenLynx> I am referring to the difference between node and PHP
[02:03:28] <hdon> yeah we use it where i work but i'm moving us away from it... the ceo keeps holding me back though. all time budgets go to maintaining legacy php code :(
[02:03:29] <StephenLynx> which averages around that in the benchmarks I saw
[02:04:07] <hdon> although i did do a project recently to challenge one of my ideas about php: i wrote a traditional client/server request/response preforking server in PHP. i was surprised it could be done, and pretty easily.
[02:04:25] <StephenLynx> preforking?
[02:05:54] <hdon> preforking: start listening for connections, then fork; each process then waits for connections and serves requests. it's an old model, still used today. there are more efficient models, but it's nice for applications where you need a separate process to serve each request and you can decide on the number of forks ahead of time
[02:06:16] <hdon> in my case it's just load balancing between other remote services and the number of processes scales 1:1 with the number of remote nodes, so it was the perfect choice
[02:06:39] <StephenLynx> ah
[02:06:40] <StephenLynx> got it
[02:06:50] <hdon> apache2 still ships preforking configurations i believe
[02:06:59] <StephenLynx> could your child processes communicate with the master process?
[02:07:48] <hdon> yes. in my case, the first process to run becomes the master process. you call pipe(2) or similar before forking, and then the parent and child each get one end of the pipe
[02:09:09] <hdon> another advantage of preforking is that it's somewhat easier to work with using common *nix signals. if for some reason one of your processes is hanging or something, you can send it a signal to deal with it.
[02:09:28] <hdon> but you don't need this model to be able to manage your server, it's just nice because it's easy
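Node has no direct equivalent of pipe(2)/fork(2), but child_process.fork() gives a comparable parent/child setup: the child is a new Node process with a built-in message channel. A minimal sketch, with a hypothetical worker.js:

```javascript
// parent.js -- spawn a child Node process; fork() wires up an IPC
// channel (a pipe under the hood) between parent and child.
const { fork } = require('child_process');

const child = fork('./worker.js');  // hypothetical worker script
child.on('message', (msg) => console.log('from child:', msg));
child.send({ job: 'ping' });

// worker.js -- the child's end of the channel.
process.on('message', (msg) => {
  process.send({ reply: 'pong', received: msg });
});
```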
[02:09:54] <StephenLynx> it seems like standard cluster behaviour on node.
[02:10:15] <hdon> i've made several toys with node but never did a cluster with it
[02:10:23] <StephenLynx> it works great out of the box.
[02:10:55] <StephenLynx> you know if the process is a master or a worker, your master spawns workers, workers share the HTTP listener transparently.
[02:11:04] <StephenLynx> you can send messages back and forth
[02:11:13] <hdon> this is i assume some kind of library/framework executed by node which implements clustering?
[02:11:22] <StephenLynx> it's part of the default library.
[02:11:24] <StephenLynx> cluster
[02:11:34] <hdon> oh, cool
[02:11:53] <StephenLynx> that project would probably have been much easier and more efficient with that.
[02:12:21] <StephenLynx> this is how people make node use all cores, since it's single-threaded.
[02:12:22] <hdon> >A single instance of Node.js runs in a single thread. To take advantage of multi-core systems the user will sometimes want to launch a cluster of Node.js processes to handle the load.
[02:12:38] <hdon> it says "thread" but i guess it means "process" ?
[02:12:54] <StephenLynx> no, it means thread.
[02:12:54] <hdon> or maybe it means "can only run your code in a single thread"
[02:13:11] <StephenLynx> yeah, it's always single-threaded. this spawns multiple processes, each one with one thread.
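A minimal sketch of the stock cluster behaviour described above: the master forks one worker process per core, the workers transparently share a single HTTP listener, and messages flow over the built-in channel.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // Master: spawn one single-threaded worker process per core.
  os.cpus().forEach(() => {
    const worker = cluster.fork();
    worker.on('message', (msg) => console.log('worker says:', msg));
  });
} else {
  // Workers: all listen on the same port; the cluster module shares
  // the socket between them transparently.
  http.createServer((req, res) => {
    res.end('handled by pid ' + process.pid + '\n');
  }).listen(8080);
  process.send({ pid: process.pid, status: 'listening' });
}
```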
[02:13:11] <hdon> mmmm
[02:13:52] <hdon> i worked on GPSEE (a spidermonkey-based competitor of nodejs that no one ever noticed) while nodejs was becoming popular. we had threading, but this is not strictly allowed by ecmascript specs, certainly not the way we were doing it
[02:14:28] <hdon> i had designs to apply static code analysis to break up a program into parts that could be run in separate threads and profiling the code to see where best to break it up
[02:14:40] <hdon> but i was never budgeted time for that project :3
[02:14:43] <StephenLynx> maybe it's part of V8 behaviour.
[02:14:49] <hdon> it is
[02:14:53] <StephenLynx> that never implemented threads.
[02:15:06] <StephenLynx> because sure, if your VM implements threads you could have JS threads.
[02:15:17] <hdon> yeah ecmascript does not (or did not at the time) ever allow multiple threads, so it was really a hack that we made it work at all
[02:15:24] <StephenLynx> :v
[02:15:54] <hdon> iiuc nodejs used a thread pool for i/o at one point (maybe still does?) though
[02:16:12] <hdon> which is totally reasonable
[02:16:24] <StephenLynx> I heard it does, behind the scenes.
[02:16:32] <StephenLynx> but it's an internal behaviour.
[02:16:37] <StephenLynx> to listen to events.
[02:16:51] <StephenLynx> and your code still always runs on a single thread
[02:17:08] <hdon> and in fact might enable some types of applications to optimize performance by offloading certain tasks to the node runtime to be executed outside of the javascript context (say, encoders and decoders running on your i/o before it reaches your js context)
[02:17:34] <StephenLynx> yeah, I do that.
[02:17:39] <hdon> :)
[02:17:41] <StephenLynx> my software generates graphs
[02:17:52] <hdon> like, plots and charts?
[02:17:56] <StephenLynx> sec
[02:17:57] <hdon> or like networks?
[02:18:06] <StephenLynx> http://lynxhub.com/graphs.js
[02:18:15] <hdon> ahhh
[02:18:22] <hdon> so how does that work?
[02:18:36] <StephenLynx> so what I do when I have to rebuild them all is to run one loop for each core
[02:18:58] <StephenLynx> and each loop will launch commands to imagemagick and only proceed when the command finishes
[02:19:10] <StephenLynx> this way I always have one command for imagemagick running for each core
[02:19:11] <hdon> how do they dequeue their jobs? does the nodejs cluster module provide task concurrency primitives?
[02:19:19] <StephenLynx> dunno
[02:19:25] <StephenLynx> I just listen to the callback
[02:19:32] <hdon> can i see that code?
[02:19:41] <StephenLynx> gitgud.io/LynxChan/LynxChan
[02:19:54] <StephenLynx> its on src/be/dbMigrations.js, at the bottom
[02:20:16] <StephenLynx> aah
[02:20:17] <StephenLynx> wait
[02:20:26] <StephenLynx> you mean, how do I know when each loop finishes?
[02:20:31] <hdon> no i mean
[02:20:43] <hdon> so the loops are working to exhaust a collection of imagemagick jobs
[02:20:51] <hdon> how do they get their jobs from a single collection?
[02:20:53] <StephenLynx> not exactly
[02:20:57] <hdon> oh
[02:20:59] <StephenLynx> they run based on a date
[02:21:08] <StephenLynx> once they are over the limit date, they finish
[02:21:18] <StephenLynx> once they all finish, I finish the operation
[02:21:43] <hdon> so each loop does jobs i%n==m or i/n==m or some other similar division?
[02:22:01] <StephenLynx> while date < maxDate
[02:22:18] <StephenLynx> or something, can't recall exactly
[02:22:23] <hdon> but how does each loop know that another process hasn't already done the job?
[02:22:28] <StephenLynx> they dont
[02:22:32] <StephenLynx> oh
[02:22:33] <StephenLynx> that
[02:22:46] <hdon> looking at the code
[02:22:49] <hdon> i think i misunderstood you
[02:22:52] <StephenLynx> they skip by the number of cores when incrementing the date
[02:23:07] <hdon> so you have ONE nodejs process, but more imagemagick processes (one for each core) ?
[02:23:11] <StephenLynx> so if I got 1 core, each iteration increases the date by X, 2 cores, 2X
[02:23:11] <hdon> oh ok
[02:23:14] <hdon> i did understand
[02:23:22] <hdon> this is what i mean by "i%n==m"
[02:23:34] <StephenLynx> I didn't get that :v
[02:23:38] <hdon> i is job number, n is number of cores, m is process/thread number
[02:23:46] <hdon> so it does the job when i % n == m :)
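A sketch of that partitioning scheme as described, not LynxChan's actual code: loop m of n starts m steps in and advances the date by n steps per iteration (the i % n == m split), waiting for each ImageMagick command to finish before launching the next, so exactly one convert process runs per core.

```javascript
const { exec } = require('child_process');
const os = require('os');

const STEP = 24 * 60 * 60 * 1000; // assumed granularity: one graph per day

// One loop per core; loop m handles dates m, m+n, m+2n, ... and only
// recurses when its current ImageMagick command has finished.
function runLoop(date, maxDate, strideMs, done) {
  if (date >= maxDate) return done();
  // placeholder convert invocation standing in for the real graph command
  exec('convert -size 1x1 xc:white graph-' + date.getTime() + '.png',
    (err) => {
      if (err) return done(err);
      runLoop(new Date(date.getTime() + strideMs), maxDate, strideMs, done);
    });
}

const cores = os.cpus().length;
let pending = cores;
for (let m = 0; m < cores; m++) {
  runLoop(new Date(Date.now() - 30 * STEP + m * STEP), new Date(),
    cores * STEP, () => {
      // the whole rebuild finishes only once every loop has finished
      if (--pending === 0) console.log('all graph loops done');
    });
}
```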
[02:24:11] <hdon> well i want to get to work on this project
[02:24:18] <hdon> thanks for the chat :)
[02:24:28] <StephenLynx> np
[02:25:06] <StephenLynx> and yes, one node process and one imagemagick process per core
[02:25:06] <StephenLynx> since the bulk of the CPU load happens in imagemagick
[02:25:32] <StephenLynx> my code gets to use 100% of all cores, since my own code generating the commands doesn't bottleneck
[02:34:26] <hdon> StephenLynx, i think the common wisdom is to run n+1 jobs since the first bottleneck is usually i/o :)
[02:34:46] <hdon> StephenLynx, it gets more complicated than that but it's the heuristic i've always used
[02:37:51] <StephenLynx> yeah, that could work, but my CPU tells me otherwise.
[02:38:42] <StephenLynx> and that could cause the system running to become unresponsive
[02:38:52] <StephenLynx> since it is designed to run while the server is running too.
[02:39:20] <StephenLynx> so I'd like to leave some breathing room
[08:23:11] <hex20dec> Hey people, how do I add authentication to my mongod.conf file so that I don't have to specify --auth every time I launch the mongod daemon?
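The config-file equivalent of --auth is the security.authorization setting, assuming the YAML config format mongod has used since 2.6:

```yaml
# mongod.conf -- same effect as launching mongod with --auth
security:
  authorization: enabled
```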
[12:23:05] <mylord> is search by subdoc key O(1) about the same as search by main doc key?
[12:24:16] <mylord> eg, { {a:1}, {b:2} } same as { a:{}, b:{} } ? find({a:{$exists:true}}) is same speed as find({a}) ?
[12:25:02] <mylord> cuz i’m deciding how to store, 1 big obj, or ~6010 small ones
[12:25:29] <mylord> 1 big obj, or 6011 small ones.. what are the pros/cons
[13:27:57] <mylord> how to find by property name?
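Finding documents by property name is done with $exists, and neither layout gives O(1) lookups: both go through a B-tree index (roughly O(log n)) when one exists. A sketch against a hypothetical collection; whether the planner uses the index for $exists can depend on the server version:

```javascript
// Documents that have an "a" property at all:
db.things.find({ a: { $exists: true } })

// A regular (non-sparse) index on the field generally lets that query
// use index bounds instead of scanning the whole collection:
db.things.createIndex({ a: 1 })
```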
[17:48:48] <molavy> hi
[17:49:34] <molavy> i'm developing a service that shows nearby clients to each client, according to some filter
[17:50:25] <molavy> i want to know the best way to record geolocation data and quickly filter over it to find nearby
[17:50:31] <molavy> clients
[17:50:43] <molavy> or whether i should use another db engine
[17:52:02] <molavy> any idea?
[17:56:59] <molavy> there are some options for saving the data at https://docs.mongodb.org/manual/applications/geospatial-indexes/
[17:57:11] <molavy> which would you prefer in this case
[17:57:14] <molavy> ?
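Of the options on that page, the usual fit for nearby-client queries is a 2dsphere index over GeoJSON points, queried with $near. A minimal sketch, with a hypothetical clients collection (GeoJSON order is [longitude, latitude]):

```javascript
// Index the clients' positions.
db.clients.createIndex({ location: "2dsphere" })

db.clients.insert({
  name: "client-1",
  online: true,
  location: { type: "Point", coordinates: [51.389, 35.689] }
})

// Clients within 5 km of a point, nearest first, with an extra filter.
db.clients.find({
  online: true,  // hypothetical extra filter criterion
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [51.42, 35.70] },
      $maxDistance: 5000  // metres
    }
  }
})
```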
[18:06:46] <hdon> hi all :) i understand that mongo does not have something like join, which means the application developer must implement something like join himself if he needs it. but to avoid the round-trip overhead between mongo and application, does mongo have anything like stored procedures?
[18:09:37] <StephenLynx> no
[18:10:09] <StephenLynx> and if your relation is 1-n you could have an array on your documents.
[18:10:18] <StephenLynx> instead of a separate collection.
[18:10:51] <StephenLynx> provided your sub-documents are not too complex; complex nested sub-documents usually cause more trouble
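A sketch of that embedding, using a hypothetical posts/comments shape:

```javascript
// Instead of a separate comments collection holding a postId reference,
// keep the 1-n side as an array of flat sub-documents.
db.posts.insert({
  title: "hello world",
  comments: [
    { author: "alice", text: "first" },
    { author: "bob", text: "second" }
  ]
})

// One round trip fetches the post together with all its comments.
db.posts.findOne({ title: "hello world" })
```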
[18:11:14] <hdon> hmm... but, there appears to be "stored javascript" -- this is not similar to a stored procedure?
[18:11:41] <StephenLynx> link
[18:11:44] <hdon> http://dirolf.com/2010/04/05/stored-javascript-in-mongodb-and-pymongo.html
[18:12:06] <StephenLynx> nope
[18:12:12] <StephenLynx> that code runs on client-side
[18:12:14] <StephenLynx> not on the db.
[18:12:27] <StephenLynx> wait
[18:12:31] <hdon> https://docs.mongodb.org/manual/tutorial/store-javascript-function-on-server/
[18:12:33] <StephenLynx> let me double check it
[18:13:30] <StephenLynx> >Once loaded, you can invoke the functions directly in the shell, as in the following example:
[18:13:44] <StephenLynx> >Once you save a function in the system.js collection, you can use the function from any JavaScript context; e.g. $where operator, mapReduce command or db.collection.mapReduce().
[18:14:17] <hdon> yes
[18:14:20] <StephenLynx> hm
[18:15:07] <StephenLynx> my guess is they still run on the client side.
[18:15:10] <StephenLynx> but I am not sure.
[18:15:28] <hdon> read the second link
[18:15:47] <hdon> it warns at the very beginning:
[18:15:48] <hdon> >Do not store application logic in the database.
[18:15:58] <hdon> >There are performance limitations to running JavaScript inside of MongoDB
[18:16:04] <StephenLynx> ah
[18:16:14] <StephenLynx> yeah, it runs on the server.
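For reference, the pattern from the linked tutorial: the function is stored in the special system.js collection and becomes callable from server-side JavaScript contexts ($where, mapReduce) or from a shell session after loading.

```javascript
// Store a function server-side in the system.js collection.
db.system.js.save({
  _id: "echoFunction",
  value: function (x) { return x; }
})

// In the shell: pull the stored functions into the current session...
db.loadServerScripts()

// ...then call it directly (or reference it inside $where / mapReduce).
echoFunction(3)
```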
[18:16:34] <hdon> hmmmmmmmm
[18:17:26] <StephenLynx> but honestly
[18:17:35] <StephenLynx> how many queries are you saving there?
[18:17:36] <StephenLynx> one?
[18:17:38] <StephenLynx> two?
[18:18:33] <StephenLynx> if the amount of queries you have to perform for a fake relation increases with amount of data, you should change your model.
[18:19:50] <hdon> well in our discussion yesterday, iirc you suggested that mongo was not a good choice for a highly relational data set. if this is a widespread view among mongo users, maybe they are underutilizing stored javascript to achieve performant relational schemas?
[18:21:04] <StephenLynx> nah
[18:21:11] <StephenLynx> that is abusing the feature.
[18:21:13] <StephenLynx> and a hack.
[18:21:17] <StephenLynx> you saw the warning.
[18:21:56] <StephenLynx> these users would just use a different db if they needed highly relational features.
[18:23:18] <hdon> the warning says not to put application code in mongo. but schema code, such as allowing for a join, is where it makes perfect sense. at most it will getprop, invoke a new query, and setprop, in order to compose the response document.
[18:23:38] <StephenLynx> the warning says there are limitations.
[18:24:12] <StephenLynx> and by now map-reduce has fallen out of use
[18:24:17] <StephenLynx> in favour of aggregation.
[18:24:34] <StephenLynx> so IMO it's not a tool that's getting much attention from developers.
[18:24:51] <hdon> it says there are performance limitations, which should scale with the number and type of operations, which is why i think it might be okay for (getprop, query, setprop)
[18:25:11] <StephenLynx> ¯\_(ツ)_/¯
[18:25:32] <hdon> :3
[18:25:41] <StephenLynx> just don't write yet another "why mongo is worse than ebola cancer after I used it wrong" article.
[18:26:00] <hdon> haha
[18:26:06] <hdon> not my style to create clickbait
[21:37:19] <Freman> any way to tell mongo/mongos to cancel a query when the client disconnects?
[21:38:54] <cheeser> not really, no.
[21:39:39] <Freman> bugger, even with queryguard preventing unindexed queries they're still able to run some db-killing queries, and then they don't wait for them to finish and just disconnect
[21:39:58] <Freman> I suppose I can get queryguard to remember the query then find/kill it
[21:40:32] <cheeser> depending on the driver, you might be able to get at the operation/cursor id and work something out but you'll have to dig around under the hood to work that out
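The shell-level pieces such a proxy could glue together are db.currentOp() to locate the orphaned query and db.killOp() to cancel it; a sketch with a hypothetical namespace and threshold:

```javascript
// Find queries that have been running too long against one namespace...
var ops = db.currentOp({
  op: "query",
  ns: "mydb.mycollection",    // hypothetical namespace
  secs_running: { $gt: 30 }   // hypothetical threshold
}).inprog;

// ...and kill each one by its operation id.
ops.forEach(function (op) { db.killOp(op.opid); });
```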
[21:40:52] <Freman> I'm parsing the protocol on the wire, you don't get much more under the hood :D
[21:44:01] <StephenLynx> any particular reason for that instead of using a higher level driver?
[21:45:37] <Freman> proxying their connections to prevent them running queries that don't use an index. Ideally they wouldn't try connecting with robomongo/cli at all and they'd just use the web interface, but *shrug*
[21:45:54] <Freman> https://github.com/freman/queryguard
[21:46:35] <StephenLynx> what keeps you from doing that on a higher level driver?
[21:46:57] <Freman> uh, the higher level driver being mongo or robomongo?
[21:47:17] <StephenLynx> being the back-end the people connect to.
[21:47:32] <Freman> no, mongo or robomongo being the client they're using
[21:47:47] <StephenLynx> wait, you allow
[21:47:49] <StephenLynx> random people
[21:47:53] <Freman> I know right
[21:47:54] <StephenLynx> to connect directly to your database?
[21:48:26] <Freman> If we tried to stop it there'd be a huge stink
[21:48:33] <StephenLynx> what
[21:48:34] <Freman> ever since installing it the instances of queries breaking the insert rate have fallen from every day to once every now and then.
[21:48:34] <StephenLynx> the
[21:48:36] <StephenLynx> fuck
[21:48:59] <StephenLynx> ever thought of getting a new job? :v
[21:49:04] <StephenLynx> that is TDWTF material
[21:49:11] <Freman> this job is 1000% better than $job-1
[21:51:08] <Freman> the fact that we have a collection that's 260+gb full of logs (99.9% of which no-one ever looks at)... is something that irks me
[23:27:26] <godzirra> Hey guys. If I'm doing a db.collection.find() is there a way to change the name of a column on the return? i've got a column that's columnName: [ "entry1", "entry2" ], and I'm trying to label the two array entries.
[23:28:34] <godzirra> It's a geojson point if that matters. I'm just trying to get them as "latitude" and "longitude"
[23:29:19] <joannac> godzirra: aggregation and $project
[23:29:34] <godzirra> joannac: Thanks. I'll read up.
[23:43:01] <godzirra> Okay. I've got to the point where I can fetch the coordinates, but I can't seem to figure out how to get the specific items inside the array. https://gist.github.com/slooker/e017b69074a94dc9986e
[23:49:21] <joannac> oh, maybe you can't, then
[23:49:29] <joannac> I thought you could project based on array indexes
[23:49:36] <godzirra> Well crud.
[23:49:44] <godzirra> Thanks anyways, joannac
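For the record, projecting by array index did become possible with $arrayElemAt, added to the aggregation pipeline in MongoDB 3.2; a sketch assuming the GeoJSON field is named loc (GeoJSON coordinates are [longitude, latitude]):

```javascript
db.points.aggregate([
  {
    $project: {
      longitude: { $arrayElemAt: ["$loc.coordinates", 0] },
      latitude:  { $arrayElemAt: ["$loc.coordinates", 1] }
    }
  }
])
```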