PMXBOT Log file Viewer


#mongodb logs for Wednesday the 11th of May, 2016

[01:30:51] <hays_> are big-O performance guarantees listed for various mongodb operations?
[01:31:02] <hays_> particularly interested in array/list
[01:49:22] <hays_> anyone?
[01:53:31] <jr3> How can I determine my working set and needed iops on my volume?
[01:54:48] <joannac> hays_: I don't understand the question
[01:55:03] <hays_> joannac: why not?
[01:55:51] <jr3> I think he's asking where to see the cost of operations against an array in a document?
[01:56:06] <joannac> jr3: performance testing while varying iops and physical memory
[01:57:50] <hays_> you guys do know what O notation is, right?
[01:58:03] <joannac> hays_: it varies depending on things like: is the document in memory?, are the index(es) that refer to the array field in memory, what other stuff is happening on the server, etc etc
[01:58:55] <joannac> If I have an array with 100 elements, unindexed, all in memory, and I make a single addition, and the document size increases and doesn't need to move, it'll probably be fast
[01:59:39] <hays_> so. O(fast) or O(slow)?
[01:59:45] <joannac> If I have an array with 100 elements, indexed in 5 indexes, all in memory, and I make a single addition, and the document size increases and needs to move, the document needs to be written to a new position on disk, and 5 * 100 index entries need to be updated
[01:59:54] <joannac> That will obviously be slower
[02:00:19] <hays_> so that's typically when you do amortized analysis
[02:03:51] <hays_> meh. guess i'll keep searching. it's a bit troublesome to me if things like this aren't guaranteed.. how do you design a data schema if you don't know when you start hitting O(n) operations that could become a problem when you scale out
[02:23:43] <YokoBR> hi guys
[02:23:57] <YokoBR> I'm using mongo-beta with php 7
[02:24:17] <YokoBR> It's so much harder than the older version (mongo instead of mongodb) on php :(
[02:34:53] <Derick> YokoBR: did you have a look at the php library to go with the new php extension as well?
[02:35:12] <YokoBR> Derick: Which one?
[02:35:26] <Derick> http://mongodb.github.io/mongo-php-library/
[02:42:08] <YokoBR> Derick: I've tried that, but I always get Class 'App\Controller\MongoDB\Client' not found
[02:42:27] <YokoBR> oops
[02:42:32] <YokoBR> found the problem
[02:42:34] <YokoBR> sorry
[02:43:11] <Derick> :)
[02:43:16] <Derick> that sounded like a namespace issue
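
A minimal sketch of the namespace issue Derick suspects, assuming the mongodb/mongo-php-library Client class: inside a namespaced file, an unqualified MongoDB\Client resolves relative to the current namespace (hence the reported App\Controller\MongoDB\Client), so the class has to be imported or fully qualified.

    <?php
    namespace App\Controller;

    use MongoDB\Client;   // import, so "Client" resolves to MongoDB\Client

    $client = new Client('mongodb://localhost:27017');
    // ...or skip the import and use a fully qualified name:
    $client = new \MongoDB\Client('mongodb://localhost:27017');
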
[03:16:37] <YokoBR> what would be a good practice to control user sessions in my app?
[03:17:19] <YokoBR> On mysql i create a table "instance", writing and updating the time that the user is online, and then expire it
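
One way to get that expire-the-row behaviour in MongoDB is a TTL index: the server's background monitor deletes documents whose indexed date field is older than expireAfterSeconds. A sketch with the PHP library, where $sessions, the session field, and lastSeen are hypothetical names:

    <?php
    // TTL index: documents whose lastSeen is more than 1800 seconds old
    // are removed automatically by the server.
    $sessions->createIndex(['lastSeen' => 1], ['expireAfterSeconds' => 1800]);

    // On each request, refresh the timestamp (upsert creates the doc).
    $sessions->updateOne(
        ['session' => $sessionId],
        ['$set' => ['lastSeen' => new MongoDB\BSON\UTCDateTime((int) (microtime(true) * 1000))]],
        ['upsert' => true]
    );
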
[03:48:34] <YokoBR> Derick: how do I delete by _id with this? $result = $collection->deleteOne( [ '_id' => $instance ] );
[04:06:05] <YokoBR> please, how do i delete by the ObjectID?
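
The usual gotcha here, sketched without having seen YokoBR's code: _id values generated by the driver are BSON ObjectIDs, so a filter containing the plain hex string matches nothing; the string has to be wrapped first.

    <?php
    // $idString is the 24-character hex representation of the ObjectID.
    $result = $collection->deleteOne([
        '_id' => new MongoDB\BSON\ObjectID($idString),
    ]);
    echo $result->getDeletedCount();   // 1 if a matching document was removed
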
[05:55:39] <YokoBR> guys, i'm trying to fetch one result but it's always null
[05:55:40] <YokoBR> http://pastebin.com/q3vJzqfb
[06:36:31] <Zelest> Derick, awake yet? :D
[06:44:16] <YokoBR> What's wrong with $collection->findOneAndUpdate( [ 'session' => $session ], [ 'time' => $time ] ); ? I'm getting "First key in $update argument is not an update operator"
[06:45:09] <joannac> looks like a syntax problem to me
[06:45:16] <joannac> do you need a $set in there?
[06:47:14] <YokoBR> joannac: I just need to get the document that has session = "b7ffa11e4a4955e542bcd38cce90711b", then update "time" with the current time
[07:05:25] <YokoBR> joannac: also i'm using php
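
What joannac is pointing at, as a sketch: the second argument to findOneAndUpdate must be an update document whose top-level keys are update operators, which is exactly what the error message complains about.

    <?php
    $result = $collection->findOneAndUpdate(
        ['session' => $session],         // filter
        ['$set' => ['time' => $time]]    // operator, not a bare field name
    );
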
[07:58:31] <Zelest> Using the mongodb and phplib, what is the best way to figure out which server I'm reading from? E.g, after a findOne, how can I know which server that data came from?
[08:01:34] <kurushiyama> Zelest: Why the heck would you care? For a standalone it is obvious, you really should only care whether you read from primary or secondary in a replica set, and for a sharded cluster it is supposed to be transparent.
[08:04:01] <Zelest> kurushiyama, In this case, it's for debugging purposes. :)
[08:04:01] <sivi> check
[08:04:13] <kurushiyama> sivi: Ack
[08:04:31] <kurushiyama> Zelest: Please explain.
[08:04:38] <sivi> tks
[08:04:39] <Boomtime> @Zelest: do you use secondary reads?
[08:04:59] <Boomtime> or do you use the default, and thus recommended, primary reads only?
[08:05:49] <Zelest> I have a replica set of 5 nodes and I use "nearest" atm. The idea was to check the latency while reading from different nodes.. Nothing I plan on using in production
[08:07:18] <kurushiyama> Zelest: So you want to debug MongoDB? Nearest is the one with the lowest latency, however I am not sure which measuring interval applies here.
[08:07:24] <Boomtime> ok, nearest can be quite tricky to tell where it went - most drivers support verbose logging of some sort, so you could figure it out after the fact
[08:07:47] <Boomtime> you could also use the server verbose logging to figure it out
[08:08:16] <Zelest> Ah
[08:08:20] <Boomtime> nearest determines ping times from the client driver itself - based on how fast the server responds to an isMaster periodically - so it isn't just network latency
[08:08:53] <Zelest> I just remember Derick gave me some cute code for the old driver which basically showed the entire "connect sequence" where it selected and filtered nodes based on various options :)
[08:09:23] <Boomtime> the lowest server ping in the set determines the lower bound, all servers within a threshold value of that are considered to be 'nearest' - the driver then picks from this set at random
[08:09:55] <Boomtime> the threshold value is usually controllable via an option, i believe the default is 15ms or something like that
[08:10:31] <Boomtime> certainly, it's large enough that if all servers are in the same datacentre, or even just the same city, they'll probably all be 'nearest'
[08:10:48] <kurushiyama> Boomtime: Do you know when those latencies are determined? I guess election would make sense, but are they checked regularly?
[08:10:59] <Boomtime> yes, they update from time to time
[08:11:13] <Boomtime> i think the cadence of update is determined by the driver.. lemme check
[08:11:22] <kurushiyama> Boomtime: Thank you!
[08:11:30] <Boomtime> FYI: https://github.com/mongodb/specifications/blob/master/source/server-selection/server-selection.rst
[08:11:55] <Boomtime> so.. somewhere here: https://github.com/mongodb/specifications/blob/master/source/server-selection/server-selection.rst#round-trip-times-and-the-latency-window
[08:13:34] <Boomtime> doesn't actually dictate how often a driver should refresh its view - i think most are around every 10 seconds
[08:14:52] <kurushiyama> Boomtime: If I read it right, basically every time the driver does an isMaster on the connected nodes the latency is updated. heartbeat interval?
[08:15:26] <Boomtime> right, isMaster is specifically dictated to be the command to be used for ping tracking
[08:15:50] <Boomtime> but it doesn't say how often a quiescent driver is required to do that - it will do it on every new connection regardless
[08:16:25] <Boomtime> but once the connection pool reaches steady-state, the spec doesn't seem to dictate further updates
[08:16:53] <Boomtime> the pool can't go completely idle though, as connections are required to have a max lifetime.. so at least a small churn will occur
[08:17:49] <kurushiyama> Boomtime: So basically, one can only be sure of the latencies during startup of the driver and after an election, strictly speaking?
[08:17:56] <Boomtime> or.. it could be this: https://github.com/mongodb/specifications/blob/master/source/server-selection/server-selection.rst#heartbeatfrequencyms
[08:18:29] <Boomtime> oh of course, it's in the discovery spec: https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst#heartbeatfrequencyms
[08:18:54] <Boomtime> but i got it right :-) 10 seconds
[08:19:30] <Boomtime> ok, gotta go - hope it helped
[08:20:04] <kurushiyama> Boomtime: A lot for me! Thank you very much. Did not even know those specs existed. Have something to read!
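
The knobs discussed above are plain connection-string options; a sketch with the PHP library (host names are placeholders, option names per the server-selection and SDAM specs Boomtime linked):

    <?php
    $client = new MongoDB\Client(
        'mongodb://node1,node2,node3/?replicaSet=rs0'
        . '&readPreference=nearest'
        . '&localThresholdMS=15'        // latency window above the lowest ping
        . '&heartbeatFrequencyMS=10000' // topology re-check cadence, 10 s default
    );
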
[08:52:11] <sivi> can anyone tell me what usually causes an empty reply from mongo daemon ?
[08:53:01] <kurushiyama> sivi: Have you tried via shell?
[08:53:14] <sivi> shell works fine
[08:54:21] <kurushiyama> So presumably the answer is not exactly empty. What language do you use to access MongoDB?
[08:55:06] <sivi> express/ node.js
[08:56:55] <kurushiyama> sivi: Then, I am basically out – I do not speak either.
[08:57:55] <sivi> ok thanks anyway.
[08:59:12] <sivi> maybe it is because my mongodb is on two libraries, one in the default /data/ and another one in /var/lib
[08:59:13] <sivi> ?
[09:00:32] <profeten> Hi, I am trying to start my mongod instance with authentication enabled
[09:00:58] <profeten> I'll just quickly paste my .conf
[09:02:05] <profeten> https://gist.github.com/anonymous/313b570a6378c60db0c33ba6a4225de8
[09:02:11] <profeten> configured as such
[09:02:32] <profeten> but when I am trying to start/restart the service using sudo service mongod start/restart
[09:02:36] <profeten> it just fails
[09:03:04] <profeten> tried to check the mongo log files but had no luck finding out what was wrong
[09:03:26] <profeten> according to the google results I could find this is the correct way of setting it up?
[09:04:30] <kurushiyama> sivi: Since your MongoDB can only ever have one dbpath, you should be careful there. And it is not called a library, but a directory. Terminology is _important_
[09:05:13] <kurushiyama> sudo is NOT the Unix way of saying "I mean it!"
[09:06:49] <sivi> Ok? But mongodb forces me to use /data/.. and the default installation of mongo was in /var/lib. What should one do in that case? What is considered best practice?
[09:06:52] <kurushiyama> profeten: Absolutely nothing in the log files? Have you checked your OS logs?
[09:07:05] <profeten> kurushiyama: yes there was nothing in the mongo files :S
[09:07:34] <kurushiyama> profeten: And system logs? It is hard to believe that your system would let a service silently fail.
[09:08:07] <profeten> I will check again!
[09:11:11] <profeten> [initandlisten] exception in initAndListen: 98 Unable to create/open lock file: /data/db/mongod.lock errno:13 Permission denied Is a mongod instance already running?, terminating
[09:11:14] <profeten> ok so I got something
[09:11:29] <profeten> I will see if I can make some permission correction and see if that solves the problem
[09:13:00] <profeten> kurushiyama: thanks, I'm just way too tired today
[09:13:14] <profeten> studied for exams until 3 in the morning...
[09:16:15] <Zelest> Earlier one could use http://www.php.net/manual/en/mongoclient.close.php to close *ALL* connections (to speed up the detection of the new master in case of a replica member crashing) .. how can I do this using the mongodb/phplib driver? *poke Derick
[09:16:30] <Zelest> E.g, $mc->close(true);
[09:23:18] <Zelest> Is that even possible in the new driver? *can't find it in any of the docs*
[09:24:29] <sivi> is it recommended to include all libraries when installing mongodb?
[09:24:57] <sivi> (sorry for all these newbie issues)
[09:38:42] <samwierema> Is it possible to do a regex search (or a partial search) on a text index? If not, is there another way I can achieve/simulate it?
[09:48:10] <remonvv> \o
[09:48:15] <Zelest> o/
[09:49:33] <remonvv> Question; we're having a weird issue. Every now and then one of our queries returns data that isn't actually present in the database. It seems to only occur on very large cursor walks.
[09:50:25] <remonvv> Same query from the mongo shell works correctly, restarting the query through the Java driver gives the same "magic" data.
[09:50:33] <sivi> resolved: the problem was in the express app, where require('http') had been changed to require('https')
[09:51:34] <remonvv> For example, we have a large collection of scores that are always a multiple of hundred. Once every week or two we suddenly get a number that isn't a multiple of hundred (95961 today for example). Querying on that value returns no results so it isn't present in the data.
[09:53:35] <remonvv> To thicken the plot a bit: say instance A is doing the query, it will always return wrong results. If we execute the exact same query on instance B, it does not find that wrong value and completes the task as expected.
[09:53:58] <remonvv> Does this ring a bell for anyone?
[10:08:03] <mroman> Is there a neat way to cache query results directly supported by mongo?
[10:08:20] <mroman> or more specifically: periodic queries that'll be re-run every $time
[10:09:21] <mroman> like a stored procedure that is executed by the DB every $time, and calls to that procedure will immediately return the result of the last run (so there's never any delay in calling the procedure from a client)
[10:11:49] <mroman> otherwise i'll set up some cronjob calling a script
[10:46:00] <kurushiyama> mroman: Sure. Formulate them as an aggregation and use an $out stage. However, you'd have to run them yourself. That should be easy enough nowadays, almost every language supports timed events one way or the other.
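
A sketch of that suggestion, with hypothetical collection and field names: run the aggregation on a timer, let $out replace a results collection, and have callers read the precomputed collection so they never wait on the query itself.

    <?php
    // Re-run periodically (cron or an in-app timer); $out atomically swaps
    // in the freshly computed results.
    $db->events->aggregate([
        ['$group' => ['_id' => '$category', 'total' => ['$sum' => 1]]],
        ['$out'   => 'cached_category_totals'],
    ]);

    // Callers get the last run's results immediately:
    $cursor = $db->cached_category_totals->find();
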
[12:08:54] <mroman> Ok so I have a tree structure...
[12:09:07] <mroman> {"parent" : ObjectId("...")}
[12:09:08] <mroman> like that.
[12:09:32] <mroman> like uhm.
[12:10:55] <mroman> {"data":"foo", "_id" : 0} {"data":"bar", "parent" : 0, "_id" : 1} {"data":"baz", "parent":0,"_id" : 2} <- that's just the tree.
[12:11:34] <mroman> then I have entries {"node" : 1, "content" : "content..."} where node points to a node in the tree
[12:11:43] <mroman> so it's trivially easy to list everything under foo/bar
[12:12:13] <mroman> but listing everything under foo/ can probably not be done in a single query?
[12:12:57] <mroman> the only way I can come up with is using "path" instead of "node" i.e. {"path" : [0, 1], "content" : "content..."}
[12:13:05] <mroman> and then search for "path" contains 0 to list everything under foo/
[12:14:55] <mroman> or I query all node ids before doing the query and then make something like {"node" : {"$in" : ids}}
[12:47:01] <kurushiyama> mroman: Can you give a sample doc?
[12:48:06] <mroman> http://codepad.org/oxtnqhhY
[12:49:01] <mroman> It's pretty simple to build the tree or construct the path to any specific node
[12:49:31] <kurushiyama> mroman: And you want to "cache" the results of the query for a parent, I assume?
[12:49:35] <mroman> you just query for everything having no parent, then recurse etc.
[12:49:40] <mroman> no.
[12:49:47] <mroman> those are two separete problems :)
[12:49:49] <mroman> *separate
[12:49:52] <kurushiyama> Ah
[12:50:13] <mroman> it's a commenting system
[12:50:19] <mroman> with a tree structure of topics
[12:50:44] <mroman> so a topic has the id of the entry in the tree it's under
[12:51:05] <kurushiyama> I would not do that with parent references, tbh
[12:51:17] <kurushiyama> I'd probably rather use materialized pths
[12:51:23] <kurushiyama> s/pths/paths/
[12:51:42] <mroman> but if you want to list all topics under for example ~/science you need to find all ids of all entries in the tree under ~/science
[12:51:59] <kurushiyama> Whut?
[12:52:06] <mroman> well yeah
[12:52:08] <kurushiyama> Say you have ~/science.
[12:52:27] <mroman> a comment in ~/science/computer will have {"node" : <id of ~/science/computer>}
[12:52:39] <mroman> a comment directly under ~/science will have {"node" : <id of ~/science>}
[12:53:02] <mroman> so to list everything under ~/science you need to query for everything having either <id of ~/science/computer> or <id of ~/science>
[12:53:05] <kurushiyama> mroman: db.comments.find({path:"~/science"})
[12:53:19] <mroman> well you'd need regexes
[12:53:31] <kurushiyama> Yes. But only one query
[12:53:42] <kurushiyama> No crisscrossing
[12:53:42] <mroman> db.comments.find({path: {"$regex" : "~/science/(.*)"}})
[12:53:48] <mroman> something of that order
[12:53:57] <kurushiyama> mroman: correct.
[12:54:17] <mroman> I'm just worried regex blows up when you have 500 million comments
[12:54:47] <mroman> I don't know how $in or $regex compare
[12:54:57] <kurushiyama> mroman: Well, assuming that 'path' is indexed, I'd guess the index would be like... a couple of hundred M at worst...
[12:55:21] <mroman> I don't think an index helps when doing regex :)
[12:55:27] <kurushiyama> WHUUUT?
[12:55:32] <kurushiyama> mroman: Lemme check
[12:55:35] <mroman> I can't imagine how that'd work
[12:56:18] <mroman> "For case sensitive regular expression queries, if an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan."
[12:56:22] <mroman> hm
[12:56:24] <mroman> ok
[12:56:29] <mroman> but it's still an O(n) operation
[12:56:55] <kurushiyama> mroman: And you assume this to be slower than MULTIPLE queries?
[12:57:14] <mroman> no
[12:57:17] <kurushiyama> mroman: With a potential arbitrary depth...
[12:57:35] <mroman> well I'd need a query to find all ids under ~/science
[12:57:36] <kurushiyama> Ok, I have a sample DB with some 10M docs.
[12:57:39] <kurushiyama> Gimme a sec
[12:57:44] <mroman> but since the tree isn't really large that shouldn't be a huge problem
[12:58:24] <mroman> also I can cache that in my application
[12:58:30] <mroman> since the tree doesn't really change
[12:58:37] <mroman> maybe once a month or so
[12:58:50] <mroman> so I don't even really need a db query to find all ids for the $in operator
[12:59:18] <kurushiyama> I do not get why you would rather add a moving part than use a simple query. ESPECIALLY if the tree is not deep.
[12:59:43] <mroman> well that really depends on what's faster
[13:00:05] <mroman> doing {"node" : {"$in" : <ids>}} or {"node" : {"$regex" : <path>}}
[13:00:38] <kurushiyama> mroman: That depends on the size of <ids>
[13:01:00] <mroman> in the first case node references the ObjectId of the tree-entry, in the second case node is actually a path
[13:01:07] <mroman> in ascii
[13:01:28] <mroman> I'm guessing about 20ids maybe
[13:01:44] <mroman> I can actually just use both
[13:01:59] <mroman> having "node" and "path" :)
[13:02:34] <mroman> I was raised hating redundancy
[13:02:37] <kurushiyama> Wait. Are we talking of the fact that you need a document for a tree node? Or do you just need the comments below a tree node?
[13:02:43] <mroman> and always storing the full path with each entry is incredibly redundant :)
[13:03:05] <mroman> I need all comments below a tree node
[13:03:19] <kurushiyama> mroman: Disk space is extremely cheap. If redundancy boosts the performance of a more common use case, I'd always take it.
[13:04:14] <kurushiyama> mroman: Especially given WiredTiger's compression options.
[13:04:43] <mroman> http://codepad.org/OYUnq7Mp <- see here @below a tree node
[13:05:10] <mroman> you can actually call them folders
[13:05:15] <mroman> listing all files in a folder is easy
[13:05:26] <mroman> listing all files in a folder and its subfolders
[13:05:36] <kurushiyama> mroman: I know what a tree structure is.
[13:05:55] <kurushiyama> mroman: If not, I should probably work as a rye bread instead.
[13:07:03] <mroman> so currently a comment has a reference to the node in the tree
[13:07:30] <mroman> whatever
[13:07:41] <mroman> there are the two options: use $in or use $regex
[13:07:49] <mroman> I'll probably have to try both :)
[13:08:01] <mroman> $regex will suck if you rename stuff :D
[13:08:33] <mroman> then you have to do string replacements :)
[13:08:43] <kurushiyama> mroman: The thing is: how do you get the values for $in? And more importantly: how do you keep those values consistent? Assume you have 2 application servers accessing mongodb... ;)
[13:08:58] <kurushiyama> mroman: Uhm, sort of.
[13:09:10] <mroman> the values for $in?
[13:09:16] <kurushiyama> Yes.
[13:09:33] <mroman> like that: http://codepad.org/2kcIs5x8
[13:09:55] <kurushiyama> You are kidding me, right?
[13:10:28] <kurushiyama> You discuss whether potentially arbitrary numbers of queries are faster than a regex query?
[13:10:54] <kurushiyama> In a recursion? In JS, I assume?
[13:11:47] <mroman> kurushiyama: That's just to get the ids of all subfolders.
[13:11:56] <kurushiyama> mroman: Yes
[13:12:13] <kurushiyama> mroman: But we still talk of one query per child node...
[13:12:18] <mroman> which can't be done in a single query
[13:12:23] <mroman> unless you use document nesting maybe
[13:12:28] <kurushiyama> Right.
[13:13:51] <mroman> to find all comments in all subfolders you either know the path as a string and do the $regex or you know all the ids of all the subfolders and do the $in
[13:14:02] <kurushiyama> Say we have a structure like this: ~/science/computer. With your approach, you'd need 3 queries alone for identifying your <ids>, plus an additional query for identifying the comments. And you discuss the complexity of a single query solution?
[13:14:37] <mroman> well those 3 queries for the ids are basically for free.
[13:14:44] <kurushiyama> WHUT?
[13:15:40] <mroman> you can do them at application startup and cache them in-memory
[13:15:59] <kurushiyama> mroman: Oh yes, great. And how do you keep it consistent with changes?
[13:16:01] <mroman> and compared to doing 500 million regexes those 3 queries, even without the cache, are probably so fast you can ignore them.
[13:16:23] <mroman> the tree doesn't really change much
[13:16:24] <kurushiyama> mroman: Whatever.
[13:16:36] <mroman> and if it did you just refresh the cache
[13:16:41] <kurushiyama> mroman: We have no base for discussion here.
[13:16:44] <mroman> that's a trivial thing to do.
[13:16:48] <mroman> no.
[13:16:54] <kurushiyama> mroman: Cross node refreshing? Good luck.
[13:17:29] <mroman> the whole discussion point was basically whether there's a better way than using $in, and you suggested using $regex
[13:18:38] <mroman> I mean really whether you know ~/science/computer or [0,1,2] doesn't matter.
[13:18:45] <mroman> it's just a different encoding for the path.
[13:19:04] <kurushiyama> Oh, ok. Good to know...
[13:19:53] <mroman> I thought that was clear :)
[13:20:50] <mroman> oh wait
[13:20:51] <kurushiyama> mroman: It is outright wrong. I was being sarcastic. Your access method changes dramatically.
[13:21:54] <kurushiyama> mroman: You are not even storing comparable values.
[13:22:31] <mroman> I don't know how mongo handles $in
[13:23:12] <mroman> but {"node" : {$in : [1 , 2, 3]}} is basically the union of {"node":1},{"node":2},{"node":3} which should be quite fast
[13:23:39] <mroman> especially if you have an index on node
[13:24:59] <mroman> because that's just n-comparisons per id
[13:25:07] <kurushiyama> mroman: You are looking at individual parts of a single use case here. From my point of view, that makes little to no sense, since you want the use case as a whole to be optimized.
[13:25:50] <mroman> *n-comparisons per entry if you linearly scan through all entries without an index
[13:26:37] <mroman> hm...
[13:26:49] <mroman> let me just write some scripts that generate bullshit data for both cases and then I can compare performance
[13:32:02] <mroman> might take a while
[13:32:06] <mroman> importing about 4k docs per second
[13:35:17] <mroman> kurushiyama: most likely the use case will be "What's the newest comment under ~/science"
[13:35:53] <mroman> Of course, I could have another collection that is updated on new comment insertion so I don't have to query comments then sort by timestamp
[13:37:28] <mroman> but if you do that you need to walk up the tree anyway
[13:37:43] <mroman> because if you post a new comment in ~/science/computer it'll also be the newest comment in ~/science and also in ~
[13:41:40] <kurushiyama> db.comments.find({path:/^~\/science.*/}). Done
[13:41:54] <mroman> yeah
[13:41:54] <kurushiyama> mroman: Add limits to your liking
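
The same query through the PHP library, as a sketch: the regex is anchored and case-sensitive, so per the docs passage quoted earlier the server can match it against an index on path instead of scanning every comment (the timestamp sort field is a hypothetical name).

    <?php
    $comments = $collection->find(
        ['path' => new MongoDB\BSON\Regex('^~/science/', '')],
        ['sort' => ['timestamp' => -1], 'limit' => 20]
    );
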
[13:42:21] <mroman> I don't have a big enough system to actually compare the performance of $in to $regex :)
[13:42:48] <mroman> With around 10 million comments it's practically the same performance
[13:43:44] <kurushiyama> saving a lot of queries or moving parts
[13:44:23] <kurushiyama> Add limits ;)
[13:45:38] <mroman> and I'm not risking inserting more than that for now
[13:45:56] <mroman> otherwise mongodb won't start anymore :)
[13:46:10] <kurushiyama> I do on my 6y old 4GB laptop... ;)
[13:46:52] <mroman> yeah I had two systems crashing when trying to find sha1 collisions using mongo
[13:47:10] <mroman> both died when doing aggregation (even with allowDiskUse)
[13:47:22] <mroman> one I could restart, the other one wouldn't even start anymore
[13:48:00] <mroman> basically what I've learned from that is: use 64bit machine with lots of RAM and hope for the best
[13:48:13] <kurushiyama> nope
[13:48:20] <mroman> but first impression stays :)
[13:48:37] <kurushiyama> Wrong impression, maybe
[13:48:43] <mroman> well yeah
[13:48:52] <mroman> I used to be a sysadmin for MySQL and MSSQL stuff
[13:49:04] <mroman> and seeing a whole DBMS go down because a query failed is like a total shocker
[13:49:13] <kurushiyama> How much RAM was free for MongoDB? What did your app eat?
[13:49:22] <kurushiyama> OS?
[13:49:29] <mroman> we've already had that discussion.
[13:50:02] <kurushiyama> Maybe. I do not recall that stuff.
[13:50:06] <mroman> and my opinion on that is still the same
[13:50:35] <mroman> I don't give a *** how much RAM was free at that point; a database system must not become completely unavailable because a query was a bit costly
[13:51:18] <mroman> if there's not enough RAM for the query, stop the query and return to normal operation, so that at least other queries and other databases are still available instead of going offline
[13:51:18] <kurushiyama> mroman: Well, that depends on a lot of factors. If the OOM killer kicks in, that is for a reason. ;)
[13:51:56] <mroman> granted, the one on the VM was due to the OOM killer
[13:52:07] <mroman> that's not really mongos fault then
[13:52:28] <kurushiyama> mroman: Misusing a system and blaming it on the system because you had different expectations does not help you in any way, does it? ;P
[13:53:49] <mroman> well if the OS isn't notifying you properly that there's not much memory left, or no memory left, then it's not the application's fault, yes.
[13:55:11] <kurushiyama> Oh, and the OS does inform you. In a way. It kills processes. Personally, I never had an OOM killer problem. Some reasonable swap space on top of proper dimensions seems to help in that regard. ;)
[13:55:52] <mroman> well usually you might want to employ your own memory allocation stuff
[13:56:07] <kurushiyama> uhm... no?
[13:56:11] <mroman> and you allocate enough memory to at least recover
[13:56:29] <kurushiyama> I'm happy to leave memory allocation to the software...
[13:56:42] <mroman> I meant the software handling allocation
[13:57:00] <mroman> because if a database starts
[13:57:10] <mroman> it must be able to remain operational
[13:57:26] <kurushiyama> The thing is that OOM killer is an OS function. It does _exactly_ what you demand: It keeps the system working.
[13:57:56] <kurushiyama> It is exactly based on allocations.
[13:58:03] <mroman> as far as I know, in the case of overcommitting the OS never tells you "No" to memory
[13:58:31] <mroman> so your mallocs() always succeed
[13:58:46] <mroman> but stable software must be able to recover from malloc() not succeeding
[13:59:14] <mroman> if during a query operation malloc() fails and the whole DBMS goes down
[13:59:17] <kurushiyama> Think it through. You'd need to check _every_ memory allocation for success.
[13:59:22] <mroman> well duh...
[13:59:26] <mroman> obviously
[13:59:59] <kurushiyama> That would be like try{ Int a = 1}catch(MemoryAllocationError m){...}
[14:00:04] <kurushiyama> for _everything_
[14:00:23] <mroman> for heap allocations
[14:00:28] <mroman> yes
[14:00:28] <kurushiyama> If that was to be necessary, you'd be better off writing the data to paper.
[14:00:55] <mroman> if you don't, the whole DBMS goes down
[14:01:12] <mroman> unless you have some other isolation employed
[14:01:15] <kurushiyama> mroman: Expectations. Would you rather have the OS explode? Then NOTHING is won.
[14:01:22] <mroman> what
[14:01:41] <mroman> that's got nothing to do with the OS.
[14:01:45] <kurushiyama> mroman: If there was no OOM killer, the OS had no defense against overcommitting.
[14:02:02] <kurushiyama> And the OOM killer _has_ to be an OS function.
[14:02:17] <mroman> I'm not even talking about the OOM killer :(
[14:02:30] <kurushiyama> But you are talking of memory management
[14:02:35] <mroman> yes
[14:03:12] <mroman> if you ask for more memory from the os and the os has nothing more available
[14:03:28] <mroman> then the OS will say no to your allocation.
[14:03:35] <kurushiyama> So let us say mongodb would preallocate some memory to prevent the OOM killer from kicking in.
[14:03:43] <kurushiyama> Whut?
[14:03:57] <kurushiyama> That would cause exactly what I described above
[14:04:29] <mroman> It would cause nothing other than your malloc call to fail
[14:04:41] <mroman> The OS remains operational
[14:04:54] <kurushiyama> IT DOES ANYWAY!
[14:04:59] <mroman> that's the whole point of the OS to remain operational. It doesn't hand out more memory that it can afford.
[14:05:03] <mroman> *than
[14:05:14] <kurushiyama> But without the need to check every memory allocation.
[14:05:34] <kurushiyama> Oh, and btw, how would you deal with stack, then?
[14:05:38] <kurushiyama> check that, too?
[14:05:46] <mroman> .max_stack?
[14:06:22] <mroman> If you don't dynamically allocate stuff on the stack and know your max_recursion_depth then stack_size is fixed
[14:06:28] <mroman> it doesn't grow nor shrink during operation
[14:06:35] <kurushiyama> Well, bottom line is: "It does not work like I want it to work, so it is the applications/OS/weathers fault."
[14:07:15] <kurushiyama> mroman: Afaik, you can actually suggest changes to POSIX
[14:07:26] <mroman> well that depends on what kind of guarantees you want in terms of remaining operational
[14:08:00] <mroman> I generally don't want my db to die no matter what circumstances...
[14:08:34] <kurushiyama> mroman: No. There are standards. If you want to stay operational: do your job and find proper dimensions. If you can not, use a resource limitation tech to your liking. If you can not do that, you have a problem
[14:08:57] <kurushiyama> So you'd rather kill your OS...?
[14:09:29] <mroman> what...
[14:09:40] <kurushiyama> There is a reason why the OOM killer kicks in: preventing all memory from being allocated, so that you can still do administrative tasks.
[14:09:49] <kurushiyama> No mem == no new ssh instance
[14:09:55] <kurushiyama> No mem == no bash
[14:10:05] <kurushiyama> No mem == no login
[14:11:22] <kurushiyama> So if a process would cause all RAM to be allocated, the process taking up the most RAM gets killed. Simply because you only have to kill one process instead of many to keep the OS operational.
[14:11:43] <kurushiyama> s/RAM/available memory/
[14:12:04] <mroman> what
[14:12:08] <kurushiyama> mroman: Like it or not, but that is how it works.
[14:12:25] <mroman> even if a process allocates pretty much all the memory
[14:12:28] <mroman> the OS still remains operational
[14:12:41] <mroman> maybe there's not enough RAM left to start a new bash
[14:12:43] <mroman> but the OS is still there
[14:13:33] <kurushiyama> mroman: Nevermind. Make your proposals to change malloc to your liking and see how it goes.
[14:14:37] <mroman> you can disable overcommitting in linux
[14:14:40] <mroman> and you can set memlimits
[14:15:38] <mroman> but I couldn't really get mongodb operational in such an environment.
[14:16:15] <mroman> the only real problem I have with mongodb is, personally, that it dies when memory allocation fails.
[14:16:40] <kurushiyama> mroman: Like _every.other.program_
[14:16:41] <mroman> which means I can't even set memlimits and turn off overcommit.
[14:16:58] <mroman> kurushiyama: yes, but a DBMS isn't every other program
[14:17:07] <mroman> I expect strong guarantees from a DBMS
[14:17:12] <mroman> even if it has to use file i/o
[14:17:18] <mroman> even if that's going to be slow
[14:17:53] <kurushiyama> mroman: Then use one to your liking which complies with that.
[14:18:02] <kurushiyama> mroman: Or file a bug
[14:18:34] <kurushiyama> mroman: or even better, do a PR with the according changes
[14:20:24] <mroman> If I were to input data from live feeds at high frequency, somebody does a query on some database and the whole system dies, I'm going to lose data.
[14:21:15] <mroman> but i'll shut up now :)
[14:23:11] <mroman> (I'm too damaged from embedded systems, nanokernels and stuff like that :))
[16:00:36] <cpama> hello all. how would i write a mongo query that would be the equivalent of this: SELECT * FROM `reporting` where type="location_status" GROUP BY name order by playbook_date desc
[16:01:08] <cpama> i'm using a php driver. and so far I have this (working) query:
[16:01:12] <kali> cpama: check out the aggregation framework. this is close to a textbook example
[16:01:22] <cpama> ok
[16:01:31] <cpama> I will google that keyword
[16:01:33] <cpama> thanks!
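
A sketch of that SQL in the aggregation framework, assuming the intent is "the newest playbook_date document per name" (MySQL's SELECT * combined with GROUP BY is otherwise ambiguous):

    <?php
    $cursor = $db->reporting->aggregate([
        ['$match' => ['type' => 'location_status']],
        ['$sort'  => ['playbook_date' => -1]],
        // After the sort, $first per group is the newest document.
        ['$group' => ['_id' => '$name', 'latest' => ['$first' => '$$ROOT']]],
    ]);
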
[16:08:27] <deathanchor> questions about mongo3.2, if we start with unencrypted dbs and move to enterprise for encryption at rest, do we need to rebuild the dbs?
[16:09:05] <deathanchor> can the data be encrypted after the fact?
[16:09:30] <deathanchor> or is there a migration process that requires some downtime?
[16:30:01] <dhanasekaran> Hi guys, I'd like to know, will mongoexport lock the table?
[16:30:29] <dhanasekaran> lock the collection
[17:33:30] <deathanchor> dhanasekaran: yes (for 2.x); using a secondary is a way to avoid that.
[17:35:31] <dhanasekaran> deathanchor: currently I am using 3.2.5, so no lock, right?
[18:11:00] <awal> hello! any ideas when official packages for ubuntu 16.04 will be released? even a link to a tracking bug?
[18:13:38] <awal> nvm found it. https://jira.mongodb.org/browse/SERVER-23043?focusedCommentId=1257859&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1257859 is this "couple months" timeline literally correct? o.O
[18:36:25] <ayee> If I have an AMI built that has the automation agent baked in, with the right apikey and config, is there a way, when I'm in mongodb ops manager, to have it automatically spin up however many nodes I need for various platforms? After I decide my cluster size, I click aws.. the flavor.. then it spins up n instances of that flavor using awscli?
[18:36:32] <ayee> and it picks my AMI with my baked agent in there
[18:57:27] <cpama> Hi there. i'm a mongodb noob. i created some documents using the robomongo gui, but now i'm trying to update documents via the command line. I'm not getting any errors, but it's also not updating any record. here's the code and the document: http://pastebin.com/Mx0AMThw
[19:04:24] <cpama> i figured it out.
[21:42:06] <kurushiyama> cpama: What was it?