[01:55:51] <jr3> I think he's asking where to see the cost of operations against an array in a document?
[01:56:06] <joannac> jr3: performance testing while varying iops and physical memory
[01:57:50] <hays_> you guys do know what O notation is right?
[01:58:03] <joannac> hays_: it varies depending on things like: is the document in memory?, are the index(es) that refer to the array field in memory, what other stuff is happening on the server, etc etc
[01:58:55] <joannac> If I have an array with 100 elements, unindexed, all in memory, and I make a single addition, and the document size increases but the document doesn't need to move, it'll probably be fast
[01:59:45] <joannac> If I have an array with 100 elements, indexed in 5 indexes, all in memory, and I make a single addition, and the document size increases and needs to move, then the document needs to be written to a new position on disk, and 5 * 100 index entries need to be updated
[01:59:54] <joannac> That will obviously be slower
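That variance is easy to probe empirically. A minimal timing sketch with the PHP library (collection and field names are invented for illustration) that compares the same $push with and without a secondary index on the array field:

```php
<?php
// Minimal benchmark sketch; collection/field names are hypothetical.
require 'vendor/autoload.php';

$client = new MongoDB\Client('mongodb://localhost:27017');
$coll   = $client->selectCollection('test', 'arraybench');

$coll->drop();
$coll->insertOne(['_id' => 1, 'scores' => range(1, 100)]);
// Uncomment to compare the indexed case joannac describes:
// $coll->createIndex(['scores' => 1]);

$start = microtime(true);
$coll->updateOne(['_id' => 1], ['$push' => ['scores' => 101]]);
printf("\$push took %.3f ms\n", (microtime(true) - $start) * 1000);
```

Repeating this while varying document size, index count, and memory pressure is essentially the testing joannac suggested.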
[02:00:19] <hays_> so that's typically when you do amortized analysis
[02:03:51] <hays_> meh. guess i'll keep searching. it's a bit troublesome to me if things like this aren't guaranteed.. how do you design a data schema if you don't know when you start hitting O(n) operations that could become a problem when you scale out
[06:44:16] <YokoBR> What's wrong with $collection->findOneAndUpdate( [ 'session' => $session ], [ 'time' => $time ] ); ? I'm getting First key in $update argument is not an update operator
[06:45:09] <joannac> looks like a syntax problem to me
[06:47:14] <YokoBR> joannac: I just need to get the document that has session = "b7ffa11e4a4955e542bcd38cce90711b", then update "time" with the current time
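The error is because the driver expects the second argument to be an update document whose first key is an operator like $set; as written it looks like a replacement document. A corrected sketch (assuming the mongodb/mongo-php-library, and a BSON date for the current time):

```php
<?php
// Wrap the new value in $set so it is treated as an update operator,
// not as a full replacement document.
$now = new MongoDB\BSON\UTCDateTime(time() * 1000); // current time in ms

$doc = $collection->findOneAndUpdate(
    ['session' => $session],
    ['$set' => ['time' => $now]],
    // return the updated document rather than the pre-update one
    ['returnDocument' => MongoDB\Operation\FindOneAndUpdate::RETURN_DOCUMENT_AFTER]
);
```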
[07:58:31] <Zelest> Using the mongodb and phplib, what is the best way to figure out which server I'm reading from? E.g, after a findOne, how can I know which server that data came from?
[08:01:34] <kurushiyama> Zelest: Why the heck would you care? For a standalone it is obvious, you really should only care whether it's from primary or secondary in a replica set, and for a sharded cluster it is supposed to be transparent.
[08:04:01] <Zelest> kurushiyama, In this case, it's for debugging purposes. :)
[08:04:39] <Boomtime> @Zelest: do you use secondary reads?
[08:04:59] <Boomtime> or do you use the default, and thus recommended, primary reads only?
[08:05:49] <Zelest> I have a replicaset of 5 nodes and I use "nearest" atm. The idea was to check the latency while reading from different nodes.. Nothing I plan on using in production
[08:07:18] <kurushiyama> Zelest: So you want to debug MongoDB? Nearest is the one with the lowest latency, however I am not sure which measuring interval applies here.
[08:07:24] <Boomtime> ok, nearest can be quite tricky to tell where it went - most drivers support verbose logging of some sort, so you could figure it out after the fact
[08:07:47] <Boomtime> you could also use the server verbose logging to figure it out
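For the PHP driver specifically, the mongodb extension can dump its internal trace, including server selection, via the mongodb.debug INI setting. A sketch ('stderr' here is one option; a directory path to write log files into also works):

```php
<?php
// Route the extension's debug trace (topology changes, server
// selection, etc.) to stderr; also settable in php.ini.
ini_set('mongodb.debug', 'stderr');

$client = new MongoDB\Client('mongodb://localhost:27017/?readPreference=nearest');
$client->selectCollection('test', 'foo')->findOne([]);
```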
[08:08:20] <Boomtime> nearest determines ping times from the client driver itself - based on how fast the server responds to an isMaster periodically - so it isn't just network latency
[08:08:53] <Zelest> I just remember Derick gave me some cute code for the old driver which basically showed the entire "connect sequence" where it selected and filtered nodes based on various options :)
[08:09:23] <Boomtime> the lowest server ping in the set determines the lower bound, all servers within a threshold value of that are considered to be 'nearest' - the driver then picks from this set at random
[08:09:55] <Boomtime> the threshold value is usually controllable via an option, i believe the default is 15ms or something like that
[08:10:31] <Boomtime> certainly, it's large enough that if all servers are in the same datacentre, or even just the same city, they'll probably all be 'nearest'
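Both knobs are ordinary connection-string options, so an experiment like Zelest's can pin them explicitly. A sketch (hostnames are placeholders; 15 is the commonly cited default):

```php
<?php
// readPreference=nearest with an explicit latency window:
// localThresholdMS widens or narrows the band around the lowest
// observed ping time within which servers count as 'nearest'.
$client = new MongoDB\Client(
    'mongodb://node1,node2,node3,node4,node5/?replicaSet=rs0'
    . '&readPreference=nearest&localThresholdMS=15'
);
```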
[08:10:48] <kurushiyama> Boomtime: You know when those latencies are determined? I guess election would make sense, but are they checked regularly?
[08:10:59] <Boomtime> yes, they update from time to time
[08:11:13] <Boomtime> i think the cadence of update is determined by the driver.. lemme check
[08:13:34] <Boomtime> the spec doesn't actually dictate how often a driver should refresh its view - i think most are around every 10 seconds
[08:14:52] <kurushiyama> Boomtime: If I read it right, basically every time the driver does an isMaster on the connected nodes the latency is updated. heartbeat interval?
[08:15:26] <Boomtime> right, isMaster is specifically dictated to be the command to be used for ping tracking
[08:15:50] <Boomtime> but it doesn't say how often a quiescent driver is required to do that - it will do it on every new connection regardless
[08:16:25] <Boomtime> but once the connection pool reaches steady-state, the spec doesn't seem to dictate further updates
[08:16:53] <Boomtime> the pool can't go completely idle though, as connections are required to have a max lifetime.. so at least a small churn will occur
[08:17:49] <kurushiyama> Boomtime: So basically, one can only be sure during startup of the driver and after an election, strictly speaking?
[08:17:56] <Boomtime> or.. it could be this: https://github.com/mongodb/specifications/blob/master/source/server-selection/server-selection.rst#heartbeatfrequencyms
[08:18:29] <Boomtime> oh of course, it's in the discovery spec: https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst#heartbeatfrequencyms
[08:18:54] <Boomtime> but i got it right :-) 10 seconds
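heartbeatFrequencyMS is likewise settable from the connection string if the 10-second default is too coarse for an experiment (a sketch; the hostname is a placeholder, and the SDAM spec forbids values below 500ms):

```php
<?php
// Re-check the topology (isMaster, and with it the ping times used
// by 'nearest') every 2 seconds instead of the default 10.
$client = new MongoDB\Client(
    'mongodb://node1/?replicaSet=rs0&heartbeatFrequencyMS=2000'
);
```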
[08:19:30] <Boomtime> ok, gotta go - hope it helped
[08:20:04] <kurushiyama> Boomtime: A lot for me! Thank you very much. Did not even know those specs existed. Have something to read!
[08:52:11] <sivi> can anyone tell me what usually causes an empty reply from mongo daemon ?
[08:53:01] <kurushiyama> sivi: Have you tried via shell?
[09:03:04] <profeten> tried to check the mongo log files but had no luck finding out what was wrong
[09:03:26] <profeten> according to the google results I could find, this is the correct way of setting it up?
[09:04:30] <kurushiyama> sivi: Since your MongoDB can only ever have one dbpath, you should be careful there. And it is not called a library, but a directory. Terminology is _important_
[09:05:13] <kurushiyama> sudo is NOT the Unix way of saying "I mean it!"
[09:06:49] <sivi> Ok, but MongoDB forces me to use /data/.. and the default installation of mongo was in /var/lib. What should one do in that case? What is considered best practice?
[09:06:52] <kurushiyama> profeten: Absolutely nothing in the log files? Have you checked your OS logs?
[09:07:05] <profeten> kurushiyama: yes there was nothing in the mongo files :S
[09:07:34] <kurushiyama> profeten: And system logs? It is hard to believe that your system would let a service silently fail.
[09:11:29] <profeten> I will see if I can make some permission correction and see if that solves the problem
[09:13:00] <profeten> kurushiyama: thanks, I'm just way too tired today
[09:13:14] <profeten> studied for exams until 3 in the morning...
[09:16:15] <Zelest> Earlier one could use http://www.php.net/manual/en/mongoclient.close.php to close *ALL* connections (to speed up the detection of the new master in case of a replica member crashing) .. how can I do this using the mongodb/phplib driver? *poke Derick
[09:23:18] <Zelest> Is that even possible in the new driver? *can't find it in any of the docs*
[09:24:29] <sivi> is it recommended to include all libraries when installing mongodb?
[09:24:57] <sivi> (sorry for all these newbie issues)
[09:38:42] <samwierema> Is it possible to do a regex search (or a partial search) on a text index? If not, is there another way I can achieve/simulate it?
[09:49:33] <remonvv> Question; we're having a weird issue. Every now and then one of our queries returns data that isn't actually present in the database. It seems to only occur on very large cursor walks.
[09:50:25] <remonvv> Same query from the mongo shell works correctly, restarting the query through the Java driver gives the same "magic" data.
[09:50:33] <sivi> resolved: the problem was that in the express app require('http') had been changed to require('https'), which broke it
[09:51:34] <remonvv> For example, we have a large collection of scores that are always a multiple of a hundred. Once every week or two we suddenly get a number that isn't a multiple of a hundred (95961 today, for example). Querying on that value returns no results, so it isn't present in the data.
[09:53:35] <remonvv> To thicken the plot a bit: if instance A is doing the query, it will always return wrong results. If we execute the exact same query on instance B, it will not find that wrong value and completes the task as expected.
[09:53:58] <remonvv> Does this ring a bell for anyone?
[10:08:03] <mroman> Is there a neat way to cache query results directly supported by mongo?
[10:08:20] <mroman> or more specifically: periodic queries that'll be re-run every $time
[10:09:21] <mroman> like a stored procedure that is executed by the DB every $time, where calls to that procedure immediately return the result of the last run (so there's never any delay in calling the procedure from a client)
[10:11:49] <mroman> otherwise i'll set up some cronjob calling a script
[10:46:00] <kurushiyama> mroman: Sure. Formulate them as an aggregation and use an $out stage. However, you'd have to run them yourself. That should be easy enough nowadays, almost every language supports timed events one way or the other.
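A sketch of that pattern in PHP (collection and field names are invented): the expensive aggregation is re-run on a timer and materializes its result via $out, which atomically replaces the cache collection, so readers never wait:

```php
<?php
// Run this from cron or any scheduler every $time.
require 'vendor/autoload.php';

$client = new MongoDB\Client('mongodb://localhost:27017');
$events = $client->selectCollection('app', 'events');

$events->aggregate([
    ['$match' => ['type' => 'pageview']],
    ['$group' => ['_id' => '$page', 'hits' => ['$sum' => 1]]],
    ['$out'   => 'stats_cache'],  // atomically swaps in the new results
]);

// Clients read the last materialized run with no delay:
$cached = $client->selectCollection('app', 'stats_cache')->find([]);
```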
[12:08:54] <mroman> Ok so I have a tree structure...
[12:55:35] <mroman> I can't imagine how that'd work
[12:56:18] <mroman> "For case sensitive regular expression queries, if an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan."
[13:02:34] <mroman> I was raised hating redundancy
[13:02:37] <kurushiyama> Wait. Are we talking of the fact that you need a document for a tree node? Or do you just need the comments below a tree node?
[13:02:43] <mroman> and always storing the full path with each entry is incredibly redundant :)
[13:03:05] <mroman> I need all comments below a tree node
[13:03:19] <kurushiyama> mroman: Disk space is extremely cheap. If redundancy boosts the performance of a more common use case, I'd always take it.
[13:04:14] <kurushiyama> mroman: Especially given WiredTiger's compression options.
[13:04:43] <mroman> http://codepad.org/OYUnq7Mp <- see here @below a tree node
[13:05:10] <mroman> you can actually call them folders
[13:05:15] <mroman> listing all files in a folder is easy
[13:05:26] <mroman> listing all files in a folder and its subfolders is the tricky part
[13:05:36] <kurushiyama> mroman: I know what a tree structure is.
[13:05:55] <kurushiyama> mroman: If not, I should probably work as a rye bread instead.
[13:07:03] <mroman> so currently a comment has a reference to the node in the tree
[13:07:41] <mroman> there are the two options: use $in or use $regex
[13:07:49] <mroman> I'll probably have to try both :)
[13:08:01] <mroman> $regex will suck if you rename stuff :D
[13:08:33] <mroman> then you have to do string replacements :)
[13:08:43] <kurushiyama> mroman: The thing is: how you get the values for $in? And more important: How do you keep those values consistent? Assume you have 2 application servers accessing mongodb... ;)
[13:13:51] <mroman> to find all comments in all subfolders you either know the path as a string and do the $regex or you know all the ids of all the subfolders and do the $in
[13:14:02] <kurushiyama> Say we have a structure like this: ~/science/computer. With your approach, you'd need 3 queries alone for identifying your <ids>, plus an additional query for identifying the comments. And you discuss the complexity of a single query solution?
[13:14:37] <mroman> well those 3 queries for the ids are basically for free.
[13:20:51] <kurushiyama> mroman: It is outright wrong. I was being sarcastic. Your access method changes dramatically.
[13:21:54] <kurushiyama> mroman: You are not even storing comparable values.
[13:22:31] <mroman> I don't know how mongo handles $in
[13:23:12] <mroman> but {"node" : {$in : [1 , 2, 3]}} is basically the union of {"node":1},{"node":2},{"node":3} which should be quite fast
[13:23:39] <mroman> especially if you have an index on node
[13:24:59] <mroman> because that's just n-comparisons per id
[13:25:07] <kurushiyama> mroman: You are looking at individual parts of a single use case here. From my point of view, that makes little to no sense, since you want the use case as a whole to be optimized.
[13:25:50] <mroman> *n-comparisons per entry if you linearly scan through all entries without an index
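Side by side, the two options mroman is weighing look like this (a sketch; field names and ids are hypothetical, and $comments is assumed to be a MongoDB\Collection):

```php
<?php
// Option 1: materialized path. One query, anchored prefix regex
// over an indexed "path" string; redundant storage, simple reads.
$byPath = $comments->find(['path' => new MongoDB\BSON\Regex('^~/science/', '')]);

// Option 2: node-id references. First walk the folder tree to
// collect the ids of all subfolders, then a single $in query.
$subfolderIds = [12, 34, 56]; // gathered by walking the tree
$byIds = $comments->find(['node' => ['$in' => $subfolderIds]]);
```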
[13:32:06] <mroman> importing about 4k docs per second
[13:35:17] <mroman> kurushiyama: most likely the use case will be "What's the newest comment under ~/science"
[13:35:53] <mroman> Of course, I could have another collection that is updated on new comment insertion so I don't have to query comments then sort by timestamp
[13:37:28] <mroman> but if you do that you need to walk up the tree anyway
[13:37:43] <mroman> because if you post a new comment in ~/science/computer it'll be also the newest comment in ~/science and also in ~
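With a materialized path, "the newest comment under ~/science" collapses into one indexed query, with no need to maintain denormalized "latest" pointers up the tree. A sketch (assumes a hypothetical compound index on path and created):

```php
<?php
// Prefix-match the path, sort by timestamp descending, take one.
// A compound index like { path: 1, created: -1 } keeps this cheap.
$newest = $comments->findOne(
    ['path' => new MongoDB\BSON\Regex('^~/science/', '')],
    ['sort' => ['created' => -1]]
);
```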
[13:49:29] <mroman> we've already had that discussion.
[13:50:02] <kurushiyama> Maybe. I do not recall that stuff.
[13:50:06] <mroman> and my opinion on that is still the same
[13:50:35] <mroman> I don't give a *** how much RAM was free at that point; a database system must not become completely unavailable because a query was a bit costly
[13:51:18] <mroman> if there's not enough RAM for the query, stop the query and return to normal operation, so that at least other queries and other databases are still available instead of going offline
[13:51:18] <kurushiyama> mroman: Well, that depends on a lot of factors. If OOM killer kicks in, that is for a reason. ;)
[13:51:56] <mroman> granted, the one on the VM was due to the OOM killer
[13:52:07] <mroman> that's not really mongos fault then
[13:52:28] <kurushiyama> mroman: Misusing a system and blaming it on the system because you had different expectations does not help you in any way, does it? ;P
[13:53:49] <mroman> well, if the OS isn't notifying you properly that there's not much memory left, or none left, then it's not the application's fault, yes.
[13:55:11] <kurushiyama> Oh, and the OS does inform you. In a way. It kills processes. Personally, I never had an OOM killer problem. Some reasonable swap space on top of proper dimensions seem to help in that regard. ;)
[13:55:52] <mroman> well usually you might want to employ your own memory allocation stuff
[14:06:22] <mroman> If you don't dynamically allocate stuff on the stack and know your max_recursion_depth then stack_size is fixed
[14:06:28] <mroman> it doesn't grow nor shrink during operation
[14:06:35] <kurushiyama> Well, bottom line is: "It does not work like I want it to work, so it is the application's/OS's/weather's fault."
[14:07:15] <kurushiyama> mroman: Afaik, you can actually suggest changes to POSIX
[14:07:26] <mroman> well that depends on what kind of guarantees you want in terms of remaining operational
[14:08:00] <mroman> I generally don't want my db to die no matter what circumstances...
[14:08:34] <kurushiyama> mroman: No. There are standards. If you want to stay operational: do your job and find proper dimensions. If you can not, use a resource limitation tech to your liking. If you can not do that, you have a problem
[14:08:57] <kurushiyama> So you'd rather kill your OS...?
[14:09:40] <kurushiyama> There is a reason why the OOM killer kicks in: preventing all memory from being allocated, so that you can still do administrative tasks.
[14:09:49] <kurushiyama> No mem == no new ssh instance
[14:11:22] <kurushiyama> So if a process would cause all RAM to be allocated, the process taking up the most RAM gets killed. Simply because you only have to kill one process instead of many to keep the OS operational.
[14:18:34] <kurushiyama> mroman: or even better, do a PR with the according changes
[14:20:24] <mroman> If I were to input data from live feeds at high frequency, and somebody does a query on some database and the whole system dies, I'm going to lose data.
[14:23:11] <mroman> (I'm too damaged from embedded systems, nanokernels and stuff like that :))
[16:00:36] <cpama> hello all. how would i write a mongo query that would be the equivalent of this: SELECT * FROM `reporting` where type="location_status" GROUP BY name order by playbook_date desc
[16:01:08] <cpama> i'm using a php driver. and so far I have this (working) query:
[16:01:12] <kali> cpama: check out the aggregation framework. this is close to a textbook example
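A rough translation of that SQL into the aggregation framework with the PHP library (a sketch; it assumes the intent is "the most recent document per name", since SQL GROUP BY with ungrouped columns is ambiguous anyway):

```php
<?php
// WHERE type = 'location_status', then keep the newest document
// per name by sorting before grouping and taking $first.
$cursor = $collection->aggregate([
    ['$match' => ['type' => 'location_status']],
    ['$sort'  => ['playbook_date' => -1]],
    ['$group' => [
        '_id' => '$name',
        'doc' => ['$first' => '$$ROOT'],  // newest document for this name
    ]],
]);
```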
[16:08:27] <deathanchor> questions about mongo3.2, if we start with unencrypted dbs and move to enterprise for encryption at rest, do we need to rebuild the dbs?
[16:09:05] <deathanchor> can the data be encrypted after the fact?
[16:09:30] <deathanchor> or is there a migration process that requires some downtime?
[16:30:01] <dhanasekaran> Hi Guys, I'd like to know: will mongoexport lock the collection?
[17:33:30] <deathanchor> dhanasekaran: yes (for 2.x); using a secondary is a way to avoid that.
[17:35:31] <dhanasekaran> deathanchor: currently I am using 3.2.5, so no lock, right?
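For what it's worth, recent versions of the tools let you target a secondary directly; something like the following (hosts and names are placeholders, and the flag's availability depends on your tool version, so check mongoexport --help):

```
mongoexport --host "rs0/node1:27017,node2:27017" \
    --readPreference secondary \
    --db mydb --collection scores --out scores.json
```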
[18:11:00] <awal> hello! any ideas when official packages for ubuntu 16.04 will be released? even a link to a tracking bug?
[18:13:38] <awal> nvm found it. https://jira.mongodb.org/browse/SERVER-23043?focusedCommentId=1257859&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1257859 is this "couple months" timeline literally correct? o.O
[18:36:25] <ayee> If I have an AMI built that has the automation agent baked in, with the right API key and config, is there a way in MongoDB Ops Manager to have it automatically spin up however many nodes I need for various platforms? After I decide my cluster size, I click aws.. the flavor.. then it spins up n instances of that flavor using awscli?
[18:36:32] <ayee> and it picks my AMI with my baked agent in there
[18:57:27] <cpama> Hi there. i'm a mongodb noob. i created some documents using the robomongo gui, but now i'm trying to update documents via the command line. I'm not getting any errors, but it's also not updating any record. here's the code and the document: http://pastebin.com/Mx0AMThw