[00:27:23] <macwinner> hi, for updating to mongo 3.2 from 3.0, can I just update each node and continue to use mmapv1.. then to move to WiredTiger, I could take each secondary in the replica set, stop it, delete its data directory, change the storage engine to WiredTiger in the config, then start it and wait for the initial sync?
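(For reference, the storage-engine switch that procedure implies is a small change in a YAML-style mongod.conf; the path and replica set name below are illustrative, not from the question:)

```yaml
# illustrative fragment for the wiped secondary; dbPath must point at
# the emptied directory so the node performs a full initial sync
storage:
  dbPath: /var/lib/mongodb   # hypothetical path
  engine: wiredTiger         # previously mmapv1
replication:
  replSetName: rs0           # hypothetical replica set name
```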
[01:53:21] <hdon> hi all :) i just read "What’s missing from MongoDB is a SQL-style join operation" at http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/ -- is this still the case today?
[01:58:44] <StephenLynx> if your system relies heavily on relations, then mongo is not what you need.
[01:59:06] <hdon> yeah, that is my use case. oh well, i'd wanted to use mongo for something for a while now.
[01:59:20] <hdon> can i ask, if mongo's weakness is relations/joins, where does it really shine?
[01:59:49] <hdon> maybe i'm just part of the rdbms cult and can't think about living without joins, but it seems to me to be a common need, and living in a world of maximum denormalization doesn't sound fun to me
[01:59:56] <StephenLynx> at large amounts of data, especially when it's focused on reading and your model is flexible.
[02:00:04] <StephenLynx> and when your data is not too relational.
[02:02:49] <StephenLynx> part of my goal with this project is to try and get people off PHP
[02:02:51] <hdon> i thought you were comparing lynxchan application to php with the 30x performance figure
[02:03:07] <StephenLynx> I am referring to the difference between node and PHP
[02:03:28] <hdon> yeah we use it where i work but i'm moving us away from it... the ceo keeps holding me back though. all time budgets go to maintaining legacy php code :(
[02:03:29] <StephenLynx> that averages around that on the benchmarks I saw
[02:04:07] <hdon> although i did do a project recently to challenge one of my ideas about php: i wrote a traditional client/server request/response preforking server in PHP. i was surprised it could be done, and pretty easily.
[02:05:54] <hdon> preforking: start listening for connections, then fork; each process then waits for connections and serves requests. it's an old model, still used today. there are more efficient models, but it's nice for applications where you need a separate process to serve each request and you can decide on the number of forks ahead of time
[02:06:16] <hdon> in my case it's just load balancing between other remote services and the number of processes scales 1:1 with the number of remote nodes, so it was the perfect choice
[02:06:50] <hdon> apache2 still ships preforking configurations i believe
[02:06:59] <StephenLynx> could your child processes communicate with the master process?
[02:07:48] <hdon> yes. in my case, the first process run becomes the master process. you call pipe(2) or similar before forking, and then the parent and child each get one end of the pipe
[02:09:09] <hdon> another advantage of preforking is that it's somewhat easier to work with using common *nix signals. if for some reason one of your processes is hanging or something, you can send it a signal to deal with it.
[02:09:28] <hdon> but you don't need this model to be able to manage your server, it's just nice because it's easy
[02:09:54] <StephenLynx> it seems like standard cluster behaviour on node.
[02:10:15] <hdon> i've made several toys with node but never did a cluster with it
[02:10:23] <StephenLynx> it works great out of the box.
[02:10:55] <StephenLynx> you know if the process is a master or a worker, your master spawns workers, workers share the HTTP listener transparently.
[02:11:04] <StephenLynx> you can send messages back and forth
[02:11:13] <hdon> this is i assume some kind of library/framework executed by node which implements clustering?
[02:11:22] <StephenLynx> it's part of node's standard library.
[02:11:53] <StephenLynx> that project would probably have been much easier and more efficient with it.
[02:12:21] <StephenLynx> this is how people make node use all cores, since it's single-threaded.
[02:12:22] <hdon> >A single instance of Node.js runs in a single thread. To take advantage of multi-core systems the user will sometimes want to launch a cluster of Node.js processes to handle the load.
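(A minimal sketch of the built-in cluster behaviour described above, using only Node core modules; the port and message shape are illustrative:)

```javascript
var cluster = require('cluster');
var http = require('http');
var os = require('os');

if (cluster.isMaster) {
  // master: spawn one worker per core and listen for their messages
  os.cpus().forEach(function () {
    var worker = cluster.fork();
    worker.on('message', function (msg) {
      console.log('worker ' + worker.process.pid + ' says:', msg);
    });
  });
} else {
  // worker: all workers share this HTTP listener transparently
  http.createServer(function (req, res) {
    process.send({ served: req.url });  // message back to the master
    res.end('handled by ' + process.pid + '\n');
  }).listen(8000);
}
```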
[02:13:52] <hdon> i worked on GPSEE (a SpiderMonkey-based competitor of nodejs that no one ever noticed) while nodejs was becoming popular. we had threading, but this is not strictly allowed by the ECMAScript specs, certainly not the way we were doing it
[02:14:28] <hdon> i had designs to apply static code analysis to break a program up into parts that could run in separate threads, and to profile the code to see where best to break it up
[02:14:40] <hdon> but i was never budgeted time for that project :3
[02:14:43] <StephenLynx> maybe it's part of V8's behaviour.
[02:16:51] <StephenLynx> and your code still always runs on a single thread
[02:17:08] <hdon> and in fact might enable some types of applications to optimize performance by offloading certain tasks to the node runtime to be executed outside of the javascript context (say, encoders and decoders running on your i/o before it reaches your js context)
[02:25:06] <StephenLynx> and yes, one node core and one imagemagick process per core
[02:25:06] <StephenLynx> since the bulk of the CPU load happens on imagemagick
[02:25:32] <StephenLynx> my code gets to use 100% of all cores, since my own code generating the command doesn't bottleneck
[02:34:26] <hdon> StephenLynx, i think the common wisdom is to run n+1 jobs since the first bottleneck is usually i/o :)
[02:34:46] <hdon> StephenLynx, it gets more complicated than that but it's the heuristic i've always used
[02:37:51] <StephenLynx> yeah, that could work, but my CPU tells me otherwise.
[02:38:42] <StephenLynx> and that could cause the system it's running on to become unresponsive
[02:38:52] <StephenLynx> since it is designed to run while the server is running too.
[02:39:20] <StephenLynx> so I'd like to leave some breathing room
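(A hedged sketch of the worker-pool sizing being debated here: cap concurrent imagemagick children at the core count, or cores + 1 per hdon's heuristic. The file names and convert arguments are made up:)

```javascript
var os = require('os');
var execFile = require('child_process').execFile;

var limit = os.cpus().length;      // or + 1 for the n+1 heuristic
var queue = ['a.png', 'b.png'];    // hypothetical pending jobs
var running = 0;

function next() {
  if (running >= limit || !queue.length) return;
  running++;
  var file = queue.shift();
  // 'convert' is imagemagick; the resize arguments are illustrative
  execFile('convert', [file, '-resize', '50%', 'thumb-' + file],
      function (err) {
        running--;
        if (err) console.error(err);
        next();                    // start the next queued job
      });
  next();                          // fill any remaining slots
}
next();
```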
[08:23:11] <hex20dec> Hey people, how do I add authentication to my mongod.conf file so that I don't have to specify --auth every time I launch the mongod daemon?
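(In the YAML config format, the documented equivalent of --auth is the security.authorization setting:)

```yaml
security:
  authorization: enabled
```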
[12:23:05] <mylord> is search by subdoc key O(1) about the same as search by main doc key?
[12:24:16] <mylord> eg, { {a:1}, {b:2} } same as { a:{}, b:{} } ? find({a:{$exists:true}}) is same speed as find({a}) ?
[12:25:02] <mylord> cuz i’m deciding how to store, 1 big obj, or ~6010 small ones
[12:25:29] <mylord> 1 big obj, or 6011 small ones.. what’s the pro/cons
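(Neither lookup is O(1): both shapes walk a B-tree index at best, or scan the collection without one. A shell sketch of the two shapes with hypothetical names, plus the index and explain() check that settles the speed question:)

```javascript
// one big doc holding everything under subkeys:
db.stuff.insert({ _id: 1, a: { v: 1 }, b: { v: 2 } });

// ...versus many small docs, one per key:
db.stuff2.insert({ key: 'a', v: 1 });
db.stuff2.insert({ key: 'b', v: 2 });

// for the small-doc shape, index the lookup field and inspect the
// plan; an unindexed $exists query scans instead
db.stuff2.createIndex({ key: 1 });
db.stuff2.find({ key: 'a' }).explain('executionStats');
```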
[18:06:46] <hdon> hi all :) i understand that mongo does not have something like join, which means the application developer must implement something like join himself if he needs it. but to avoid the round-trip overhead between mongo and application, does mongo have anything like stored procedures?
[18:13:30] <StephenLynx> >Once loaded, you can invoke the functions directly in the shell, as in the following example:
[18:13:44] <StephenLynx> >Once you save a function in the system.js collection, you can use the function from any JavaScript context; e.g. $where operator, mapReduce command or db.collection.mapReduce().
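(The pattern those quotes describe, from the server-side JavaScript docs; echoFunction is the docs' own toy example:)

```javascript
db.system.js.save({
  _id: 'echoFunction',               // the name you call it by
  value: function (x) { return x; }
});

db.loadServerScripts();              // load system.js functions into this shell
echoFunction(3);                     // -> 3
```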
[18:18:33] <StephenLynx> if the amount of queries you have to perform for a fake relation increases with amount of data, you should change your model.
[18:19:50] <hdon> well in our discussion yesterday, iirc you suggested that mongo was not a good choice for highly relational data sets. if this is a widespread view among mongo users, maybe they are underutilizing stored javascript to achieve performant relational schemas?
[18:21:56] <StephenLynx> these users would just use a different db if they needed highly relational features.
[18:23:18] <hdon> the warning says not to put application code in mongo. but schema code, such as allowing for join, is where it makes perfect sense. at most it will getprop, invoke a new query, and setprop, in order to compose the response document.
[18:23:38] <StephenLynx> the warning says there are limitations.
[18:24:12] <StephenLynx> and by now the map-reduce has fallen out of use
[18:24:17] <StephenLynx> in favour of aggregation.
[18:24:34] <StephenLynx> so IMO it's not a tool that's getting much attention from developers.
[18:24:51] <hdon> it says there are performance limitations, which should scale with number and type of operations, which is why i think it might be okay for (getprop, query, setprop)
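(A hedged sketch of the (getprop, query, setprop) join hdon is describing, as a stored function; the collections and fields are hypothetical. Loaded via loadServerScripts() it runs client-side in the shell; running it server-side would mean the deprecated db.eval or a $where/mapReduce context:)

```javascript
db.system.js.save({
  _id: 'joinAuthor',
  value: function (postId) {
    var post = db.posts.findOne({ _id: postId });               // getprop
    if (post) {
      post.author = db.users.findOne({ _id: post.authorId });   // query
    }
    return post;                                                // setprop: composed doc
  }
});
```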
[21:39:39] <Freman> bugger, even with queryguard preventing unindexed queries they're still able to run some db-killing queries, and then they don't wait for them to finish and just disconnect
[21:39:58] <Freman> I suppose I can get queryguard to remember the query then find/kill it
[21:40:32] <cheeser> depending on the driver, you might be able to get at the operation/cursor id and work something out, but you'll have to dig around under the hood
[21:40:52] <Freman> I'm parsing the protocol on the wire, you don't get much more under the hood :D
[21:44:01] <StephenLynx> any particular reason for that instead of using a higher level driver?
[21:45:37] <Freman> proxying their connections to prevent them running queries that don't use an index. Ideally they wouldn't try connecting with robomongo/cli at all and they'd just use the web interface, but *shrug*
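(One way to do the find-and-kill step from the shell; the op-type check and 5-second threshold are illustrative:)

```javascript
db.currentOp().inprog.forEach(function (op) {
  if (op.op === 'query' && op.secs_running > 5) {
    printjson(op.query);   // see what you're about to kill
    db.killOp(op.opid);
  }
});
```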
[21:49:11] <Freman> this job is 1000% better than $job-1
[21:51:08] <Freman> the fact that we have a collection that's 260+gb full of logs (99.9% of which no-one ever looks at)... is something that irks me
[23:27:26] <godzirra> Hey guys. If I'm doing a db.collection.find() is there a way to change the name of a column on the return? i've got a column that's columnName: [ "entry1", "entry2" ], and I'm trying to label the two array entries.
[23:28:34] <godzirra> It's a geojson point if that matters. I'm just trying to get them as "latitude" and "longitude"
[23:29:19] <joannac> godzirra: aggregation and $project
[23:43:01] <godzirra> Okay. I've got to the point where I can fetch the coordinates, but I can't seem to figure out how to get the specific items inside the array. https://gist.github.com/slooker/e017b69074a94dc9986e
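(One possible shape for the $project joannac suggested: $arrayElemAt (MongoDB 3.2+) pulls the entries out by position. The 'loc' field name is a guess at the schema, and GeoJSON coordinate order is [longitude, latitude]:)

```javascript
db.collection.aggregate([
  { $project: {
      longitude: { $arrayElemAt: ['$loc.coordinates', 0] },
      latitude:  { $arrayElemAt: ['$loc.coordinates', 1] }
  } }
]);
```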