[00:23:01] <Freman> kurushiyama: https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#TasksMax= is why mongod wasn't working for us :D
[07:20:05] <sivi> Hello, a question about the Node client via SSL:
[07:20:12] <sivi> var cert = fs.readFileSync(__dirname + "/ssl/client.pem");
[07:20:13] <sivi> var key = fs.readFileSync(__dirname + "/ssl/client.pem");
[07:20:33] <sivi> this is from the MongoDB guide, is it correct?
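
Reading the same client.pem for both cert and key is fine when that PEM file contains both the certificate and the private key, which is what the guide assumes. A minimal, hedged sketch of how those buffers are typically handed to the 2.x Node driver (exact option nesting varies by driver version):

    // Sketch only: assumes the 2.x Node driver and a client.pem holding both
    // the certificate and the private key; the ca.pem path is hypothetical.
    var fs = require('fs');
    var MongoClient = require('mongodb').MongoClient;

    var cert = fs.readFileSync(__dirname + "/ssl/client.pem");
    var key = fs.readFileSync(__dirname + "/ssl/client.pem");

    MongoClient.connect("mongodb://localhost:27017/test?ssl=true", {
      server: {
        sslValidate: true,
        sslCA: fs.readFileSync(__dirname + "/ssl/ca.pem"),
        sslCert: cert,
        sslKey: key
      }
    }, function (err, db) {
      if (err) throw err;
      // ... use db ...
      db.close();
    });
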
[10:47:37] <m0rpho> it's very difficult to trace as it's a production server environment with lots of connections and it doesn't happen in the local testing environment
[10:50:34] <m0rpho> I just thought you guys might have some clues or experience with a 10s timeout somewhere
[11:10:19] <m0rpho> and right after one pymongo connection is closed a new one is then instantiated
[11:10:39] <m0rpho> and then exactly after 10 seconds this connection is closed again
[11:11:45] <kurushiyama> m0rpho: Uhm, that sounds like the connection pool doing its stuff, no?
[11:12:30] <m0rpho> kurushiyama: is there a default 10s timeout?
[11:13:37] <m0rpho> do you have any idea what I should look for? socketTimeoutMS? maxIdleTimeMS?
[11:13:58] <kurushiyama> m0rpho: I have no clue about pymongo, I just came up with a theory that might fit the facts... ;) maxIdleTimeMS sounds about right.
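
Both knobs are standard connection-string options, so one way to test the theory is to set them explicitly on the URI. This is only an illustration (the asker is on pymongo, and whether a given driver version honors each option varies), shown here in Node style with made-up host and values:

    // Hedged sketch: hypothetical host/db names and timeout values.
    var MongoClient = require('mongodb').MongoClient;

    var uri = "mongodb://db.example.com:27017/app" +
              "?maxIdleTimeMS=60000" +     // how long a pooled connection may sit idle
              "&socketTimeoutMS=30000";    // how long a single send/receive may block

    MongoClient.connect(uri, function (err, db) {
      if (err) throw err;
      // pooled connections idle for longer than maxIdleTimeMS are closed and
      // replaced, which would look exactly like the observed behaviour
      db.close();
    });
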
[11:44:03] <Lumio> Hey guys! I was wondering what the best practice is here… I’m thinking of having all my invoices for my clients saved in MongoDB, and I would attach the generated PDF file to each one. Do you think this is good practice? I would only store it, so I have everything in one place.
[12:17:36] <grug> Lumio: no - you have all the information in your database (presumably) already, so just generate it when you need it
[12:17:51] <grug> or, generate it and store it on a static host such as S3
[12:17:56] <grug> but don't store that shit in your db
[12:32:09] <Dlabz> Hi, all. If I load a huge JSON file with geometry into MongoDB, where one of the properties of the root is an array of individual geometric objects, will I be able to query for them? Will that be efficient? Thanks.
[12:34:30] <compeman> Dlabz it would be better if you shared an example
[12:35:08] <Dlabz> let me make a smaller one, as my files are huge...
[12:40:13] <Dlabz> "products" property can be enywhere between 10K and 30K objects
[12:41:13] <Dlabz> I'm having issues parsing those files, so I'm planning to pass the file to mongodb, and use nodejs, protocol buffers and websockets to load individual pieces
[12:41:22] <compeman> Dlabz, it will be better if you
[12:57:44] <Dlabz> so, since I need to change my JSON files to be able to load them into mongo in a correct way, what's my best option to programmatically convert them to the preferred format?
[12:58:22] <Dlabz> I've seen the option to load json in steps, using node.js
[13:01:34] <Dlabz> so, any individual object needs to be transferred to the client eventually
[13:02:03] <Dlabz> or, a set of individual items from more than one file
[13:04:43] <Dlabz> obviously, end-users don't really think in terms of optimizing the graphics for web, so I end up with a whole hospital of flexy-tubes... easily thousands of triangles
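
Loading the JSON in steps with Node, as mentioned above, is a reasonable route: stream the file, peel each element out of the root "products" array, and insert it as its own document so the individual pieces can be queried and indexed. A rough sketch, assuming the JSONStream package and the 2.x Node driver (file and collection names are made up):

    // Rough sketch only: assumes `npm install JSONStream mongodb` and
    // hypothetical file/collection names.
    var fs = require('fs');
    var JSONStream = require('JSONStream');
    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://localhost:27017/geometry', function (err, db) {
      if (err) throw err;
      var products = db.collection('products');
      var batch = [];

      var stream = fs.createReadStream('huge-model.json')
        .pipe(JSONStream.parse('products.*'));   // emits one element of the root array at a time

      stream.on('data', function (product) {
        batch.push(product);
        if (batch.length === 1000) {
          stream.pause();                        // crude backpressure
          products.insertMany(batch.splice(0), function (err) {
            if (err) throw err;
            stream.resume();
          });
        }
      });

      stream.on('end', function () {
        if (batch.length === 0) return db.close();
        products.insertMany(batch, function () { db.close(); });
      });
    });
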
[14:14:50] <JsonTooBig_> hi there, I wonder if anyone can give me a hand, I am getting the following error message while replicating to a new 3.2.6 node: [NetworkInterfaceASIO-BGSync-0] Assertion: 10334:BSONObj size: 17338193 (0x1088F51) is invalid. Size must be between 0 and 16793600(16MB) First element: id: 4408015349485
[14:16:40] <JsonTooBig_> this happens several hours into the replication, and afterwards the replication process has to restart
[14:16:47] <JsonTooBig_> any ideas or suggestions?
[14:21:48] <silviolucenajuni> JsonTooBig_: Could it be a problem caused by a large document being replicated? Maybe your master has a higher document size limit than your secondary??
[14:22:53] <JsonTooBig_> it's possible, but the error context doesn't give us much information to go by, we don't even know the collection that contains the offending document
[14:24:22] <JsonTooBig_> also the element id in the error message doesn't make sense to us, because we don't use numeric ids for any of our documents
[14:26:29] <kurushiyama> JsonTooBig_: It is too big, and you have the ID...
[14:27:11] <kurushiyama> JsonTooBig_: You might want to check other collections and or databases as well.
[14:27:19] <JsonTooBig_> that is the thing, that id isn't for any of our documents, we don't use integer IDs for anything
[14:28:13] <kurushiyama> Well, it has to come from somewhere. You should check, even if it does not make sense to you. Maybe somebody did something stupid.
[14:35:09] <kurushiyama> silviolucenajuni: Yup, increasing the debug level would probably help.
[14:37:03] <kurushiyama> Whereas I would be more concerned about the fact that somebody managed to grow a doc beyond the BSON size limit. I have never actually tried that and can only assume that it could be caused by updates.
[14:46:24] <silviolucenajuni> JsonTooBig_: is looping over all documents in all collections in all DBs to check document sizes an option?
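
If that brute-force route is acceptable, the mongo shell can do it with Object.bsonsize(). It is a full scan of everything, so it belongs on a secondary or in a quiet window; the 95% threshold below is just for illustration:

    // Hedged sketch: walk every database/collection and print documents
    // that are at or near the 16MB BSON limit.
    var limit = 16 * 1024 * 1024;
    db.adminCommand({ listDatabases: 1 }).databases.forEach(function (d) {
      var database = db.getSiblingDB(d.name);
      database.getCollectionNames().forEach(function (c) {
        database.getCollection(c).find().forEach(function (doc) {
          var size = Object.bsonsize(doc);
          if (size > limit * 0.95) {
            print(d.name + "." + c + "  _id: " + doc._id + "  " + size + " bytes");
          }
        });
      });
    });
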
[15:27:37] <kurushiyama> ange7: The thing is that it _is_ slow. It is no replacement for a JOIN. It is just there for saving an extra query for small result sets. You should rather go with redundancy in most cases.
[15:29:08] <ange7> kurushiyama: ok so on a million rows it's not recommended ?
[15:30:14] <kurushiyama> ange7: That depends. It is not about the _data_ set, but the result set of your current stage in the pipeline.
[15:32:31] <kurushiyama> $lookup might well be used to save an additional query for small result sets. But since it either has to be called for every doc in the result set or as an $in query, I expect it to be less than perfect for medium result sets already, not speaking of large ones.
[15:34:06] <oky> i would expect it to avoid the N+1 query if it can
[15:34:52] <kurushiyama> oky: Morning. Well, it does correlations between tables. I have no clue about how it is implemented, but as described above, I doubt it will be good for more than just a couple of dozen docs in the result set.
[15:36:11] <kurushiyama> ange7: As said: That depends on the number of docs in the pipeline after $match. If you need to correlate, you are most likely _much_ better off utilizing redundancy properly.
[15:37:07] <kurushiyama> oky: Whereas with small result sets, it _might_ be quite reasonable.
[15:39:20] <kurushiyama> ange7: If you would reference authors by something other than their name, say you use an ObjectId as _id in authors, you would need to look up the author name for each post
[15:40:55] <kurushiyama> ange7: But having something like {_id: new ObjectId(), title:"Use redundancy wisely", author:"Kurushiyama", text:"blah"}, you'd have the author name right away. Albeit the author name might be redundant here.
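
To make the trade-off concrete, here is roughly what the two approaches look like in the shell (collection and field names are hypothetical); $lookup requires MongoDB 3.2+:

    // Normalized: every post in the result set needs the author resolved.
    db.posts.aggregate([
      { $match: { tags: "mongodb" } },        // keep the result set small first
      { $lookup: {
          from: "authors",
          localField: "authorId",             // ObjectId stored on the post
          foreignField: "_id",
          as: "author"
      } }
    ]);

    // Redundant: the author name is already on the post, no second stage needed.
    db.posts.find({ tags: "mongodb" }, { title: 1, author: 1 });
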
[15:42:45] <ange7> It's not an ObjectId ahaha, I'll fix this :p
[15:50:20] <ange7> I have a dataset of 1 billion documents, I can't add a column now lol
[15:51:59] <kurushiyama> ange7: Well, if your data model does not suit your needs, you _should_ change it. If you can not, you have to live with what you have.
[15:53:17] <StephenLynx> why can't you add a field?
[15:53:31] <silviolucenajuni> Is there any difference between a simple query db.collection.find({'a':'a', 'b':'b'}) and db.collection.find({'$and': [ {'a':'a'}, {'b':'b'} ] }) ?
[15:54:26] <kurushiyama> silviolucenajuni: Not sure what the query optimizer would make of it, but they should be equal. Albeit I find the first version much more readable.
[15:54:36] <StephenLynx> if you are wondering why $and exists, it's for cases where you can't use the first syntax
[15:54:54] <StephenLynx> like when you want to check for two conditions on the same field
[15:54:55] <kurushiyama> Or nested conditions, I'd guess
[15:55:20] <StephenLynx> hm, I think I said something slightly wrong there.
[15:55:27] <Derick> StephenLynx: you mean like: { 'a': { $lte : 4, $gt : 7 } } ?
[15:55:50] <Derick> StephenLynx: only a little, there are indeed cases where that trick doesn't work I believe
[15:56:23] <StephenLynx> anyway, the first is your default syntax and the second is for exceptional cases.
[15:56:51] <kurushiyama> Derick: But the optimizer makes them identical, right?
[15:57:01] <Derick> https://docs.mongodb.com/manual/reference/operator/query/and/#and-queries-with-multiple-expressions-specifying-the-same-operator is a good example
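
The case the linked page covers is when the same key would have to appear twice in one query document, which a plain object cannot express, e.g. two $or clauses (field names hypothetical):

    // { $or: [...], $or: [...] } would collapse into a single key,
    // so the explicit $and form is required here.
    db.inventory.find({
      $and: [
        { $or: [ { qty: { $lt: 10 } }, { qty: { $gt: 50 } } ] },
        { $or: [ { sale: true }, { price: { $lt: 5 } } ] }
      ]
    });
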
[16:02:01] <ange7> kurushiyama: do you think it is possible to make one update query to add the author field to my collection from my author collection ? :/
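
There is no single cross-collection update in 3.2; the usual workaround is to loop over the smaller collection and issue one multi-update per author. A hedged sketch with hypothetical collection and field names:

    // Backfill the denormalized author name onto every matching post.
    db.authors.find({}, { name: 1 }).forEach(function (a) {
      db.posts.update(
        { authorId: a._id },              // posts still referencing the ObjectId
        { $set: { author: a.name } },
        { multi: true }
      );
    });
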
[17:29:55] <cffworld> Mongo question: How can I tell if a member of a replication set is frozen or not? I can freeze with `rs.freeze()` but have no idea how to tell if it is frozen.
[17:31:35] <edrocks> cffworld: is it in rs.status()?
[17:32:20] <cffworld> edrocks: but I don't see anything that suggests that it's frozen
[17:32:43] <kurushiyama> cffworld: So it is not? It can only be one or the other.
[17:33:18] <kurushiyama> Well, I would guess it is somewhere in admin.
[17:33:31] <edrocks> cffworld: you can unfreeze it if you set to 0 https://docs.mongodb.com/v3.0/reference/command/replSetFreeze/#dbcmd.replSetFreeze
[17:33:33] <cffworld> kurushiyama: sorry, I'm not following. It can only be either of what?
[17:33:40] <edrocks> idk if/where you find if it's frozen though
[17:34:29] <kurushiyama> cffworld: edrocks asked whether it is in rs.status(). You answered "yes". Then you said you can not tell. So IS it in rs.status or not?
[17:35:15] <cffworld> kurushiyama: ah sorry for the confusion. Yes, it is in rs.status. But there is nothing in rs.status that says anything about being frozen
[17:37:33] <kurushiyama> cffworld: "Yes, it is in rs.status. But there is nothing in rs.status" What?!?
[17:42:06] <cffworld> kurushiyama: can't tell if trolling or...
[18:18:13] <silviolucenajuni> saml: the query db.collection.find({'a': 'a', 'b':'b'}) is an implicit $and, while db.collection.find({'$and': [{'a':'a'}, {'b':'b'}] }) is an explicit $and. In both queries the filters are evaluated with short-circuiting: if a document doesn't have field 'a' equal to 'a', mongodb doesn't check whether field 'b' equals 'b'
[18:19:22] <saml> silviolucenajuni, how can you be sure of the ordering when you write an implicit $and?
[18:19:42] <saml> i guess javascript object keys are strictly ordered by a rule?
[18:23:26] <kurushiyama> saml: Aye, they are. Order of implicit and explicit is preserved.
[18:33:51] <kurushiyama> saml: Huh. Can you give an example?
[18:39:59] <saml> imagine a collection with docs like {authoredBy:'author name', tags: ['tag'], publishDate:ISODate()} and I have 20 different queries around authoredBy and tags. some use regex (authoredBy contains something ignoring case). I need to write a report about document count per each 20 query groups by month
[18:42:11] <saml> $group._id is like {year: {$year: '$publishDate'}, month: {$month: '$publishDate'}, and the query}
[18:42:35] <kurushiyama> saml: Under usual provisions, I'd probably write 20 aggregations using the original statements, then group as you described.
[18:43:52] <kurushiyama> saml: Though I have to admit that I still don't have a grasp on the use case.
[18:45:10] <saml> number of documents per month by author = simple aggregation. now modify the "by author" part with a more involved query that the business wants
[18:45:11] <kurushiyama> saml: Might well be that you can/should define reporting use cases, from which the aggregations derive rather naturally.
[18:46:25] <kurushiyama> saml: In this case, I'd really go for working out the use cases with the suits and write according aggregations from scratch.
[18:48:11] <kurushiyama> saml: new docs/tag/time unit comes to my mind, to find the hotspots.
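
One of those 20 per-query reports would then be an ordinary aggregation over the document shape described above; the $match stage is whatever that query group needs, and the collection name below is made up:

    // Hedged sketch: count documents per year/month for one query group.
    db.articles.aggregate([
      { $match: { authoredBy: /smith/i, tags: "news" } },   // hypothetical query group
      { $group: {
          _id: { year: { $year: "$publishDate" }, month: { $month: "$publishDate" } },
          count: { $sum: 1 }
      } },
      { $sort: { "_id.year": 1, "_id.month": 1 } }
    ]);
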
[19:36:20] <kurushiyama> oky: Right. Say you have tags host and region. So you would have queries that are _roughly_ comparable to db.measurement.find({host:"somehostname", date:{$gte:someDate,$lte:someOtherDate}})
[19:37:38] <kurushiyama> or db.measurement.find({region:"NCSA", date:{$gte:someDate,$lte:someOtherDate}})
[19:38:04] <kurushiyama> (both would not make much sense, in case we talk of FQDN, at least)
[19:38:28] <silviolucenajuni> Anyone know if rule #27 from the book 50 Tips and Tricks for MongoDB Developers by Chodorow is still a valid rule in mongo 3.2? I'm trying to replicate the results but I can not see a difference.
[19:38:52] <silviolucenajuni> Tip #27: AND-queries should match as little as possible as fast as possible
[19:38:52] <kurushiyama> silviolucenajuni: If you would quote it... ;)
[19:40:06] <silviolucenajuni> Tip #27: AND-queries should match as little as possible as fast as possible
[19:41:45] <kurushiyama> silviolucenajuni: Out of context, I guess it is wise to follow Kristina's advice until she is proven wrong ;)
[19:42:52] <kurushiyama> oky: The interesting part of Influx (for me) is that it is rather easy to group by time _intervals_, which can become a pita in other DBs
[19:44:00] <yopp> yeah, but the main problem with influx is that they don't have basic things like increments and stuff
[19:44:12] <kurushiyama> oky: Keeping the example above, you could do something like (pseudocode) db.measurement.find({host:"somehostname", date:{$gte:someDate,$lte:someOtherDate}}).groupBy("5 mins").average()
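
The pseudocode above can be emulated in the aggregation framework by rounding each timestamp down to its 5-minute bucket. A hedged sketch, assuming a numeric "value" field and the same variables as in the pseudocode:

    // Group measurements into 5-minute buckets and average them.
    db.measurement.aggregate([
      { $match: { host: "somehostname",
                  date: { $gte: someDate, $lte: someOtherDate } } },
      { $group: {
          _id: {                                   // bucket start time
            $subtract: [
              "$date",
              { $mod: [ { $subtract: [ "$date", new Date(0) ] }, 1000 * 60 * 5 ] }
            ]
          },
          avgValue: { $avg: "$value" }
      } },
      { $sort: { _id: 1 } }
    ]);
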
[19:51:44] <kurushiyama> saml: Na. InfluxDB, for example.
[19:51:48] <saml> A time series database (TSDB) is a software system that is optimized for handling time series data, arrays of numbers indexed by time (a datetime or a datetime range).
[19:51:58] <saml> mongodb can index datetime field
[19:52:17] <yopp> saml, the overhead for storing time series is a major problem
[19:52:35] <yopp> the overhead for storing the timestamp with each time series value
[19:52:54] <yopp> in the case of a 32-bit value you will have a 64-bit timestamp
[19:57:52] <yopp> basically you need to store raw data indefinitely
[19:58:03] <edrocks> yopp: what do you mean? like robots?
[19:58:10] <saml> and aggregate the entire data set every hour?
[19:58:22] <kurushiyama> yopp: That is a problem, admitted. Though latency will kill you anyway. You'd probably need local window aggregation for electronics.
[19:59:06] <saml> what do you want to do in the end? make a realtime graph?
[20:00:58] <yopp> you have hundreds of sensors on each pump
[20:01:34] <yopp> and you have a formula that allows you to use historical sensor values to predict the breakdown
[20:01:57] <yopp> so you can send the guys in the field before your pump is dead
[20:02:09] <yopp> but you can't just use "momentary values"
[20:02:11] <kurushiyama> yopp: Well, you'd only report values exceeding certain thresholds, no?
[20:02:22] <saml> sure. archiving them is one thing. doing real time analysis of high throughput is another. and analysis yields manageable sized output
[20:02:42] <yopp> saml, you need to analyze data in realtime to predict breakdown
[20:02:58] <saml> all those events, you can archive somewhere. and you also run analysis transforming raw events into something more meaningful to you
[20:03:30] <yopp> it doesn't work like: okay, once a week we will put this shit in our hadoop cluster and see what it says
[20:03:32] <saml> I'm sure the maths you use can memoize historical values
[20:03:36] <kurushiyama> yopp: I doubt that. exceeding stddev as a trigger would be sufficient. The amount by which stddev is exceeded allows breakdown projection.
[20:07:20] <oky> yopp: i used to work on something like that, too
[20:08:25] <oky> yopp: did you use a custom cluster solution?
[20:09:24] <yopp> nope, they had hardware that could do that on site
[20:10:16] <yopp> and the problem was like: how can we do that not just on a single site, but in our warm office, to get a whole picture and to optimize the maintenance routine
[20:28:32] <cpama> hi everyone. I have the following code: http://pastebin.com/y9gTGFfc
[20:28:55] <cpama> as you can see, I have two functions... both update the same collection... but different fields in a document
[20:29:26] <cpama> my bug is that when I run the function update_location_status it works... creates the record if it doesn't exist... and I can run it multiple times... everything looks good.
[20:29:39] <cpama> but then another module comes along and wants to add data to an existing location record.
[20:29:58] <cpama> when it calls the function update_location_data the existing document is wiped out and replaced with this new document.
[20:32:08] <cpama> I think maybe the problem is that I'm passing in an array with all the fields I want to update, but this is being interpreted as "rewrite the entire doc using this new array". Really, what I want is to update the existing doc with updated values for only the fields defined in the array
[20:41:05] <cpama> I figured it out. Had to pass not just the array of new data, but array('$set' => $newdata)
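
That matches MongoDB's update semantics: a bare document as the update argument replaces the whole document, while $set only touches the listed fields. The same distinction in shell terms (collection and field names hypothetical):

    // Replacement: the matched document becomes just { _id: ..., status: "active" }.
    db.locations.update({ _id: 123 }, { status: "active" }, { upsert: true });

    // $set: only "status" changes, every other field is left alone.
    db.locations.update({ _id: 123 }, { $set: { status: "active" } }, { upsert: true });
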