PMXBOT Log file Viewer

#mongodb logs for Wednesday the 18th of May, 2016

[00:23:01] <Freman> kurushiyama: https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#TasksMax= is why mongod wasn't working for us :D
[07:20:05] <sivi> Hello a question about node client via ssl:
[07:20:12] <sivi> var cert = fs.readFileSync(__dirname + "/ssl/client.pem");
[07:20:13] <sivi> var key = fs.readFileSync(__dirname + "/ssl/client.pem");
[07:20:33] <sivi> this is from mongodb guide is it correct?
[07:21:32] <sivi> http://mongodb.github.io/node-mongodb-native/2.0/tutorials/enterprise_features/?_ga=1.41901026.1114324834.1461242741
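
A rough sketch of what that tutorial is driving at, for the 2.x node driver (the sslCA/sslCert/sslKey option names are the ones I recall from the driver docs; reading both cert and key from the same client.pem is fine as long as that file contains both the certificate and the private key):

    var MongoClient = require('mongodb').MongoClient,
        fs = require('fs');

    // client.pem is assumed to hold both the client certificate and its private key
    var cert = fs.readFileSync(__dirname + "/ssl/client.pem");
    var key = fs.readFileSync(__dirname + "/ssl/client.pem");
    var ca = [fs.readFileSync(__dirname + "/ssl/ca.pem")];

    MongoClient.connect('mongodb://server:27017/test?ssl=true', {
      server: { sslValidate: true, sslCA: ca, sslCert: cert, sslKey: key }
    }, function(err, db) {
      if (err) throw err;
      console.log('connected over SSL');
      db.close();
    });
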
[10:18:45] <m0rpho> hi there, my pymongo connections are dropped exactly after 10 seconds, do you guys have any guess what that could be?
[10:40:16] <kurushiyama> m0rpho: All of them or just some?
[10:40:45] <m0rpho> all of them
[10:41:02] <kurushiyama> Strange.
[10:41:30] <m0rpho> is there any kind of timeout or something like that? anything with 10s?
[10:43:28] <m0rpho> i use nginx+uwsgi+django+humbledb+pymongo
[10:43:34] <m0rpho> and i just cant find any timeout
[10:44:07] <m0rpho> maybe a kernel parameter?
[10:46:53] <Derick> strace it and find out?
[10:47:37] <m0rpho> its very difficult to trace as its a production server environment with lots of connections and it doesnt happen in the local testing environment
[10:50:34] <m0rpho> i just thought you guys might have any clues or experiences where there is a 10s timeout
[11:10:19] <m0rpho> and right after one pymongo connection is closed a new one is then instantiated
[11:10:39] <m0rpho> and then exactly after 10 seconds this connection is closed again
[11:10:43] <m0rpho> this is so strange...
[11:11:45] <kurushiyama> m0rpho: Uhm, that sounds like the connection pool doing its stuff, no?
[11:12:30] <m0rpho> kurushiyama: is there a default 10s timeout?
[11:13:37] <m0rpho> do you have any idea what I should look for? socketTimeoutMS? maxIdleTimeMS?
[11:13:58] <kurushiyama> m0rpho: I have no clue about pymongo, I just came up with a theory that might fit the facts... ;) maxIdleTimeMS sounds about right.
[11:14:52] <m0rpho> ok thanks I'll try ;)
[11:15:59] <m0rpho> not supported by pymongo :/
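
For anyone hitting the same symptom later: the timeouts that pymongo did support at this point can be set as connection string options, for example (placeholder host and database names):

    mongodb://dbhost:27017/mydb?socketTimeoutMS=30000&connectTimeoutMS=10000

maxIdleTimeMS was not one of them at the time, as noted above.
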
[11:44:03] <Lumio> Hey guys! I was wondering what the best practice is here… I’m thinking of having all my invoices for my clients saved in a MongoDB and I would attach the generated PDF file within it. Do you think this is good practice? I would only store it, so i have everything in one place.
[12:17:36] <grug> Lumio: no - you have all the information in your database (presumably) already, so just generate it when you need it
[12:17:51] <grug> or, generate it and store it on a static host such as S3
[12:17:56] <grug> but don't store that shit in your db
[12:18:12] <Lumio> grug: good point
[12:32:09] <Dlabz> Hi, all. If I load a huge json file with geometry into mongodb, where one of the properties of the root is an array of individual geometric objects, will I be able to query for them? Will that be efficient? Thanks.
[12:34:30] <compeman> Dlabz it would be better if you shared an example
[12:35:08] <Dlabz> let me make a smaller one, as my files are huge...
[12:38:59] <Dlabz> compeman: https://gist.github.com/dlabz/0d130eaa681250200ce2b84a372727a4
[12:39:11] <Dlabz> well, this is the general idea
[12:39:35] <cheeser> you should be fine, yes.
[12:40:13] <Dlabz> the "products" property can be anywhere between 10K and 30K objects
[12:41:13] <Dlabz> I'm having issues parsing those files, so I'm planning to pass the file to mongodb, and use nodejs, protocol buffers and websockets to load individual pieces
[12:41:22] <compeman> Dlabz, it will be better if you
[12:41:26] <compeman> write a script to
[12:41:45] <compeman> write these "products" to a products collection in your mongodb
[12:41:50] <compeman> and
[12:42:01] <compeman> the unique string as 2RHTxMxOFjvm0000002zHJ
[12:42:16] <compeman> a value of a property such as pid
[12:42:29] <compeman> the g and the bounds look ok
[12:43:11] <compeman> e.g ; pid: "2RHTxMxOFjvm0000002zHJ", g:[], bounds:[]
[12:43:27] <compeman> is one of document in your products collection
[12:43:38] <Dlabz> ah, I get it
[12:43:44] <compeman> then you can use
[12:43:45] <compeman> sailsjs
[12:43:50] <compeman> for apifying it
[12:43:54] <compeman> it is really simple
[12:43:59] <compeman> you will just write a model
[12:44:36] <compeman> with attributes :{ pid: {type:'TEXT',unique:true}; ..etc}
[12:44:41] <compeman> or mongoose.
[12:44:45] <compeman> you can do i think.
[12:44:58] <compeman> you have good data, just simplify it ;)
[12:45:39] <Dlabz> yeah, I was considering that option... but I'd face a problem with older files, which I'd all need to regenerate
[12:46:26] <compeman> specially the unique product string is really important
[12:46:27] <compeman> here.
[12:46:39] <compeman> it is dynamic property name
[12:46:47] <compeman> it is really hard to query
[12:46:53] <compeman> you get it i think
[12:47:05] <cheeser> compeman: please don't press enter every 4 words. complete a thought. or four. then hit enter.
[12:47:25] <Dlabz> yeah, I do... I already had that idea, but I was hoping there's a trick to it
[12:47:27] <cheeser> literally most of my screen is full of half statements from you
[12:47:34] <Dlabz> cheeser: he's obviously on a phone
[12:47:46] <cheeser> ¯\_(ツ)_/¯
[12:49:55] <Dlabz> damn, now compeman left :(
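
A sketch of the conversion compeman was describing, assuming the shape from the gist ({ products: { "<pid>": { g: [...], bounds: [...] }, ... } }) and using the pid/g/bounds names from the discussion; the file and database names are made up:

    var MongoClient = require('mongodb').MongoClient,
        fs = require('fs');

    MongoClient.connect('mongodb://localhost:27017/bim', function(err, db) {
      if (err) throw err;
      var data = JSON.parse(fs.readFileSync('model.json'));
      // turn each dynamic "products" key into its own document, with the key as a real field
      var docs = Object.keys(data.products).map(function(pid) {
        return { pid: pid, g: data.products[pid].g, bounds: data.products[pid].bounds };
      });
      var products = db.collection('products');
      products.insertMany(docs, function(err) {
        if (err) throw err;
        // a unique index on pid makes fetching individual pieces cheap
        products.createIndex({ pid: 1 }, { unique: true }, function() { db.close(); });
      });
    });
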
[12:50:29] <pihpah> I can't create a new user: db.createUser is not a function. What's wrong?
[12:51:09] <cheeser> what version are you on?
[12:53:04] <pihpah> db version v2.4.14
[12:53:16] <StephenLynx> v:
[12:53:29] <StephenLynx> i dont even think thats supported.
[12:53:35] <StephenLynx> but I might be wrong
[12:54:38] <cheeser> in 2.4 it's addUser()
[12:54:41] <cheeser> https://docs.mongodb.com/v2.4/reference/security/
[12:55:45] <pihpah> thanks
[12:56:16] <cheeser> np
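
For the archives, the shell syntax on either side of that version boundary, roughly as the linked docs describe it:

    // MongoDB 2.4
    use admin
    db.addUser({ user: "admin", pwd: "secret", roles: ["userAdminAnyDatabase"] })

    // MongoDB 2.6 and later
    use admin
    db.createUser({ user: "admin", pwd: "secret", roles: [{ role: "userAdminAnyDatabase", db: "admin" }] })
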
[12:57:44] <Dlabz> so, since I need to change my json files to be able to load them into mongo the correct way, what's my best option to programmatically convert them to the preferred format?
[12:58:22] <Dlabz> I've seen the option to load json in steps, using node.js
[12:58:42] <Dlabz> or is there a better trick?
[12:59:39] <Dlabz> I'm messing with all this since php and clients are crashing due to the huge file size...
[13:00:10] <Dlabz> ... so loading the json file for conversion will still be a huge strain to the server...
[13:00:24] <cheeser> why do you need to change them?
[13:00:26] <StephenLynx> how large are they?
[13:00:44] <Dlabz> sometimes 100mb or more
[13:00:49] <StephenLynx> daium
[13:01:05] <Dlabz> yeah... it's a BIM server
[13:01:34] <Dlabz> so, any individual object needs to be transferred to the client eventually
[13:02:03] <Dlabz> or, a set of individual items from more than one file
[13:04:43] <Dlabz> obviously, end-users don't really think in terms of optimizing the graphics for web, so I end up with a whole hospital of flexy-tubes... easily a thousand triangles
[13:04:58] <Dlabz> .. per tube segment
[13:05:15] <Dlabz> and then 10k of those...
[13:05:27] <Dlabz> it adds up quickly
[13:06:28] <Dlabz> so far, I'm thinking mongodb might not be the correct tool for this task...
[13:07:13] <Dlabz> if I need to parse the huge files using node.js, there's really no point in using the mongodb...
[13:07:45] <Dlabz> I can just store parsed pieces in binary form, since I'll be using that to transfer data to client
[13:09:18] <Dlabz> well, it was nice talking to my self :)
[13:09:29] <Dlabz> cheers, rubber duckies
[13:09:40] <cheeser> good luck out there. :D
[14:14:50] <JsonTooBig_> hi there, I wonder if anyone can give me a hand, I am getting the following error message while replicating to a new 3.2.6 node: [NetworkInterfaceASIO-BGSync-0] Assertion: 10334:BSONObj size: 17338193 (0x1088F51) is invalid. Size must be between 0 and 16793600(16MB) First element: id: 4408015349485
[14:16:40] <JsonTooBig_> this happens several hours into the replication, and afterwards the replication process has to restart
[14:16:47] <JsonTooBig_> any ideas or suggestions?
[14:21:48] <silviolucenajuni> JsonTooBig_: Could it be a problem with a large document being replicated? Maybe your master has a higher document size limit than your secondary?
[14:22:53] <JsonTooBig_> its possible, but the error context doesn't give us much information to go by, we don't even know the collection that contains the offending document
[14:24:22] <JsonTooBig_> also the element id in the error message doesn't make sense to us, because we don't use numeric ids for any of our documents
[14:26:29] <kurushiyama> JsonTooBig_: It is too big, and you have the ID...
[14:27:11] <kurushiyama> JsonTooBig_: You might want to check other collections and or databases as well.
[14:27:19] <JsonTooBig_> that is the thing, that id isn't for any of our documents, we don't use integer IDs for anything
[14:28:13] <kurushiyama> Well, it has to come from somewhere. You should check, even if it does not make sense to you. Maybe somebody did something stupid.
[14:28:43] <JsonTooBig_> well, it is possible
[14:29:30] <JsonTooBig_> the error context doesn't mention the collection, I will write a script to iterate and check them all
[14:30:17] <kurushiyama> JsonTooBig_: How many collections/dbs do you have?
[14:31:01] <JsonTooBig_> just over 120 collections, across 6 dbs
[14:31:31] <kurushiyama> o.O
[14:32:10] <kurushiyama> That is quite some.
[14:32:23] <silviolucenajuni> kurushiyama: Isn't there a way to get a log with a more verbose error?
[14:33:17] <kurushiyama> silviolucenajuni: IIRC, the log level increases _what_ is logged, not the level of detail. But the idea is good. Lemme check.
[14:33:21] <JsonTooBig_> i have the verbose error, it is just a lot of noise, other than that assertion which triggers an exit with signal 6
[14:33:36] <JsonTooBig_> kk will try, brb
[14:35:09] <kurushiyama> silviolucenajuni: Yup, increasing the debug level would probably help.
[14:37:03] <kurushiyama> Whereas I would be more concerned about the fact that somebody managed to increase a doc beyond the BSON size limit. I have never actually tried that and can only assume that this could be caused through updates.
[14:46:24] <silviolucenajuni> JsonTooBig_: is looping over all documents in all collections in all dbs to check the document sizes an option?
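
A rough mongo shell script for that check, iterating every database and collection and printing anything close to the 16MB BSON limit (with 120 collections across 6 dbs it will take a while):

    var LIMIT = 16 * 1024 * 1024;
    db.getMongo().getDBNames().forEach(function(dbName) {
      var d = db.getSiblingDB(dbName);
      d.getCollectionNames().forEach(function(collName) {
        d.getCollection(collName).find().forEach(function(doc) {
          var size = Object.bsonsize(doc);
          if (size > LIMIT - 100 * 1024) {   // flag anything within 100KB of the limit
            print(dbName + "." + collName + "  _id: " + tojson(doc._id) + "  " + size + " bytes");
          }
        });
      });
    });
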
[15:20:17] <ange7> Hey
[15:20:28] <ange7> does someone know about the performance of $lookup?
[15:25:01] <StephenLynx> is that new join thing?
[15:25:21] <StephenLynx> if I am not mistaken, it's kind of slow, but I might be wrong. haven't looked into it yet.
[15:26:40] <kurushiyama> StephenLynx: Yes
[15:27:37] <kurushiyama> ange7: The thing is that it _is_ slow. It is no replacement for a JOIN. It is just there for saving an extra query for small result sets. You should rather go with redundancy in most cases.
[15:29:08] <ange7> kurushiyama: ok so on millions of rows it's not recommended?
[15:30:14] <kurushiyama> ange7: That depends. It is not about the _data_ set, but the result set of your current stage in the pipeline.
[15:32:31] <kurushiyama> $lookup might well be used to save an additional query for small result sets. But since it has either to be called for every doc in the result set or as an $in query, I expect it to be less than perfect for medium result sets already, not speaking of large ones.
[15:33:26] <ange7> $match + $lookup + $group => timeout lol it's very slow
[15:33:36] <oky> $lookup sounds terrible, if that's really how it works
[15:33:42] <oky> morning, everyone
[15:34:06] <oky> i would expect it to avoid the N+1 query if it can
[15:34:52] <kurushiyama> oky: Morning. Well, it does correlations between tables. I have no clue about how it is implemented, but as described above, I doubt it will be good for more than just a couple of dozens docs in the result set.
[15:36:11] <kurushiyama> ange7: As said: That depends on the number of docs in the pipeline after $match. If you need to correlate, you are most likely _much_ better off utilizing redundancy properly.
[15:37:07] <kurushiyama> oky: Whereas with small result sets, it _might_ be quite reasonable.
[15:37:26] <ange7> « utilizing redundancy properly » ^^
[15:37:49] <kurushiyama> ange7: Uhm. Yes. What is so funny about that?
[15:37:57] <ange7> i don't understand lol
[15:38:26] <kurushiyama> ange7: Say you have posts and authors and you want to display the posts with author names.
[15:38:56] <ange7> Yes
[15:39:20] <kurushiyama> ange7: If you would reference authors by something other than their name, say you use an ObjectId as _id in authors, you would need to look up the author name for each post
[15:40:55] <kurushiyama> ange7: But having something like {_id: new ObjectId(), title:"Use redundancy wisely", author:"Kurushiyama", text:"blah"}, you'd have the author name right away. Albeit the author name might be redundant here.
[15:42:45] <ange7> It's not objectId ahaha i fix this :p
[15:43:00] <kurushiyama> ange7: Whut?
[15:43:20] <kurushiyama> ange7: It is just an _example_.
[15:43:54] <kurushiyama> ange7: Your use case might be different. The point is that post holds redundant data to save you one or more queries.
[15:45:07] <kurushiyama> ange7: Or a $lookup.
[15:46:34] <ange7> i don't understand the final part of your explanation, sorry.
[15:47:14] <kurushiyama> ange7: Compare {_id: new ObjectId(), title:"Use redundancy wisely", author:authorObjectId, text:"blah"} and {_id: new ObjectId(), title:"Use redundancy wisely", author:"Kurushiyama", text:"blah"}
[15:47:42] <kurushiyama> ange7: For the first version, you need a query to get the author's name
[15:47:51] <kurushiyama> ange7: for the second, you do not.
[15:49:41] <ange7> yes,
[15:50:20] <ange7> I have a dataset of 1 billion documents i can't add one column now lol
[15:51:59] <kurushiyama> ange7: Well, if your data model does not suit your needs, you _should_ change it. If you can not, you have to live with what you have.
[15:53:17] <StephenLynx> why can't you add a field?
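
To make the comparison concrete (posts/authors and the field names are just the example from above): the $lookup version joins at read time and pays the per-result-set cost discussed earlier, while the redundant version stores the author name on the post so a plain find() already has everything needed for display.

    // join at query time
    db.posts.aggregate([
      { $match: { title: /redundancy/i } },
      { $lookup: { from: "authors", localField: "author", foreignField: "_id", as: "authorDoc" } }
    ]);

    // vs. redundancy: the author name was written into the post up front
    db.posts.find({ title: /redundancy/i }, { title: 1, author: 1, text: 1 });
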
[15:53:31] <silviolucenajuni> Is there any difference between a simple query db.collection.find({'a':'a', 'b':'b'}) and db.collection.find({'$and': [ {'a':'a'}, {'b':'b'} ] })?
[15:53:57] <StephenLynx> don't think so
[15:54:26] <kurushiyama> silviolucenajuni: Not sure what the query optimizer would make of it, but they should be equal. albeit I find the first version much more readable.
[15:54:36] <StephenLynx> if you are figuring why $and exists, its for cases where you can't use the first syntax
[15:54:54] <StephenLynx> like when you want to check for two conditions on the same field
[15:54:55] <kurushiyama> Or nested conditions, I'd guess
[15:55:20] <StephenLynx> hm, I think I said something slightly wrong there.
[15:55:27] <Derick> StephenLynx: you mean like: { 'a': { $lte : 4, $gt : 7 } } ?
[15:55:32] <StephenLynx> yeah
[15:55:34] <StephenLynx> I was mistaken
[15:55:50] <Derick> StephenLynx: only a little, there are indeed cases where that trick doesn't work I believe
[15:56:23] <StephenLynx> anyway, the first is your default syntax and the second is for exceptional cases.
[15:56:51] <kurushiyama> Derick: But the optimizer makes them identical, right?
[15:57:01] <Derick> https://docs.mongodb.com/manual/reference/operator/query/and/#and-queries-with-multiple-expressions-specifying-the-same-operator is a good example
[15:57:06] <Derick> kurushiyama: I believe so
[15:57:29] <kurushiyama> Good enough for me ;)
[15:57:46] <silviolucenajuni> you know any example of a query that can only be written with $and ?
[15:57:53] <silviolucenajuni> thx man.
[15:57:56] <Derick> StephenLynx: I just linked one
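
The case the linked docs section covers boils down to this: a query document cannot repeat the same key (here $or) twice, so the two conditions have to be wrapped in an explicit $and. A sketch of that shape:

    db.inventory.find({
      $and: [
        { $or: [ { qty: { $lt: 10 } }, { qty: { $gt: 50 } } ] },
        { $or: [ { sale: true }, { price: { $lt: 5 } } ] }
      ]
    });
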
[15:59:09] <ange7> kurushiyama: thank you for your help
[16:00:55] <kurushiyama> ange7: You are welcome.
[16:02:01] <ange7> kurushiyama: do you think it is possible to make one update query that adds the author field to my collection from my author collection? :/
[16:02:32] <ange7> posts.update({author: { author.name } }
[16:03:12] <kurushiyama> ange7: I do not think so. You need to do a migration. I am off, will be back in an hour.
[16:09:55] <StephenLynx> ange7, nope, you will have to fetch the data and then update the other collection on a separate query
[16:10:04] <StephenLynx> however, you could use bulkwrite
[16:10:09] <StephenLynx> and do in groups
[16:10:23] <StephenLynx> instead of doing one by one or all of them at once
[16:10:34] <StephenLynx> so you can balance out
[16:10:44] <ange7> bulkWrite to update its possible ?
[16:10:47] <StephenLynx> yes
[16:11:07] <StephenLynx> just keep in mind that it probably won't be able to handle a billion or so operations at once
[16:12:18] <ange7> i will try tomorrow :)
[16:12:23] <ange7> thank you for your help guys
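
A sketch of that migration with the node driver's bulkWrite, batched so no single call has to carry anywhere near the whole billion updates. Collection and field names follow the posts/authors example above; the authorId reference field is hypothetical:

    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://localhost:27017/blog', function(err, db) {
      if (err) throw err;
      db.collection('authors').find().toArray(function(err, authors) {
        if (err) throw err;
        // one updateMany per author: copy the name onto every post that references it
        var ops = authors.map(function(a) {
          return { updateMany: { filter: { authorId: a._id },
                                 update: { $set: { author: a.name } } } };
        });
        var batchSize = 1000, i = 0;
        (function next() {
          if (i >= ops.length) { db.close(); return; }
          db.collection('posts').bulkWrite(ops.slice(i, i + batchSize), { ordered: false }, function(err) {
            if (err) throw err;
            i += batchSize;
            next();
          });
        })();
      });
    });
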
[17:29:55] <cffworld> Mongo question: How can I tell if a member of a replication set is frozen or not? I can freeze with `rs.freeze()` but have no idea how to tell if it is frozen.
[17:31:35] <edrocks> cffworld: is it in rs.status()?
[17:32:07] <cffworld> edrocks: yes
[17:32:20] <cffworld> edrocks: but i dont see anything that suggests that it's frozen
[17:32:43] <kurushiyama> cffworld: So it is not? It can be only either of them.
[17:33:18] <kurushiyama> Well, I would guess it is somewhere in admin.
[17:33:31] <edrocks> cffworld: you can unfreeze it if you set to 0 https://docs.mongodb.com/v3.0/reference/command/replSetFreeze/#dbcmd.replSetFreeze
[17:33:33] <cffworld> kurushiyama: sorry im not following. It can be only either of what?
[17:33:40] <edrocks> idk if/where you find if it's frozen though
[17:34:29] <kurushiyama> cffworld: edrocks asked whether it is in rs.status(). You answered "yes". Then you said you can not tell. So IS it in rs.status or not?
[17:35:15] <cffworld> kurushiyama: ah sorry for the confusion. Yes, it is in rs.status. But there is nothing in rs.status that says anything about being frozen
[17:37:33] <kurushiyama> cffworld: "Yes, it is in rs.status. But there is nothing in rs.status" What?!?
[17:42:06] <cffworld> kurushiyama: can't tell if trolling or...
[17:42:48] <kurushiyama> o.O
[17:43:10] <kurushiyama> edrocks: Was it me being unfriendly?
[17:43:54] <edrocks> kurushiyama: I don't think so
[17:47:32] <kurushiyama> I really would have liked to help him, but I did not quite get him.
[17:48:02] <edrocks> did he leave?
[17:48:25] <kurushiyama> edrocks: Yes :/
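
For the record, the calls from the page edrocks linked (rs.freeze wraps replSetFreeze; a frozen member simply will not seek election until the timer expires or is reset):

    rs.freeze(120)   // keep this member from seeking election for 120 seconds
    rs.freeze(0)     // lift the freeze immediately
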
[18:08:17] <saml> hey, in query $and short circuits?
[18:08:40] <saml> $and: [queryThatHitsIndex, verySlowScan]
[18:08:52] <saml> vs. {queryThatHitsIndex, verySlowScan}
[18:09:33] <kurushiyama> saml: in general $and is redundant except for special cases. can you show the complete query?
[18:09:48] <kurushiyama> edrocks: saml was the other Gopher.
[18:10:24] <saml> {status:'good', name: /yolo/i} vs. {$and: [{status:'good'}, {name: /yolo/i}]}
[18:10:35] <saml> no i node.js
[18:10:38] <saml> javascript for life
[18:10:44] <saml> i'm kidding
[18:10:51] <kurushiyama> saml: :P
[18:10:59] <saml> i don't like node.js but the only jobs i can find are node.js jobs
[18:10:59] <edrocks> js => uglify goes from 20sec to 2.8min
[18:11:08] <saml> js?
[18:11:17] <saml> we have css minification that runs for 30 minutes
[18:11:17] <silviolucenajuni> saml: $and, explicit and implicit, is short-circuit.
[18:11:18] <kurushiyama> saml: You freelance?
[18:11:31] <edrocks> saml: wtf?
[18:11:43] <saml> it's mostly due to the way they use globbing
[18:11:46] <kurushiyama> saml: And btw, you can have a compound index on status and name...
[18:12:15] <saml> name is crazy field. i guess i should've set up full text index on name
[18:12:27] <saml> silviolucenajuni, i don't understand
[18:12:48] <saml> kurushiyama, no i don't freelance. i can't market myself as a competitive consultant. not really a good businessman
[18:13:22] <saml> edrocks, on SSD, it runs 10mins. so fast
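
A sketch of what kurushiyama suggested, with the caveat that an unanchored, case-insensitive regex like /yolo/i cannot use index ranges, so a text index may serve a "contains" search better (keeping in mind $text matches whole words, not arbitrary substrings; names are from the example above):

    // compound index: the equality part (status) narrows the scan
    db.collection.createIndex({ status: 1, name: 1 });
    db.collection.find({ status: "good", name: /yolo/i });

    // alternative: a text index on name
    db.collection.createIndex({ name: "text" });
    db.collection.find({ status: "good", $text: { $search: "yolo" } });
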
[18:13:27] <uuanton> hi y'all. Kind people, help me execute test.js on a remote host
[18:13:50] <uuanton> mongo --host host1.example.com test.js
[18:15:53] <uuanton> something to print the output
[18:16:03] <uuanton> printjson or something
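
That invocation is right; the thing to remember is that a scripted mongo shell only prints what the script explicitly prints, e.g.:

    // test.js
    var result = db.adminCommand({ listDatabases: 1 });
    printjson(result);

    // run it against the remote host:
    //   mongo --host host1.example.com --quiet test.js
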
[18:18:13] <silviolucenajuni> saml: the query db.collection.find({'a': 'a', 'b':'b'}) is an implicit $and, while db.collection.find({'$and': [{'a':'a'}, {'b':'b'}] }) is an explicit $and. In both, the filters are evaluated with short-circuiting: if a document doesn't have field 'a' equal to 'a', mongodb doesn't check whether field 'b' equals 'b'
[18:19:22] <saml> silviolucenajuni, how can you be sure of the ordering when you write an implicit $and?
[18:19:42] <saml> i guess javascript object keys are strictly ordered by a rule?
[18:23:26] <kurushiyama> saml: Aye, they are. Order of implicit and explicit is preserved.
[18:25:46] <silviolucenajuni> https://docs.mongodb.com/v3.0/reference/operator/query/and/
[18:30:05] <saml> hey, if the $group _id is a varying query, i can't use aggregation, right?
[18:31:01] <saml> i guess i can label each query and project the label and $group by the label
[18:33:17] <saml> no this does not make sense
[18:33:51] <kurushiyama> saml: Huh. Can you give an example?
[18:39:59] <saml> imagine a collection with docs like {authoredBy:'author name', tags: ['tag'], publishDate:ISODate()} and I have 20 different queries around authoredBy and tags. some use regex (authoredBy contains something, ignoring case). I need to write a report of document counts for each of the 20 queries, grouped by month
[18:42:11] <saml> $group._id is like {year: {$year: '$publishDate'}, month: {$month: '$publishDate'}, and the query}
[18:42:35] <kurushiyama> saml: Under usual provisions, I'd probably write 20 aggregations using the original statements, then group as you described.
[18:43:50] <saml> yeah
[18:43:52] <kurushiyama> saml: Though I have to admit that I still don't have a grasp on the use case.
[18:45:10] <saml> number of documents per month by author = simple aggregation. now modify the "by author" part with a more involved query that the business wants
[18:45:11] <kurushiyama> saml: Might well be that you can/should define reporting use cases, of which the aggregations derive rather naturally.
[18:46:25] <kurushiyama> saml: In this case, I'd really go for working out the use cases with the suits and write according aggregations from scratch.
[18:48:11] <kurushiyama> saml: new docs/tag/time unit comes to my mind, to find the hotspots.
[18:49:18] <saml> yup thanks
[18:50:53] <kurushiyama> saml: Sorry that I can only deliver commonplaces, but for better answers I need more detailed info.
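
A sketch of one of those 20 report aggregations, using the document shape saml described (the collection name and the particular $match conditions are made up; each of the 20 queries would get its own $match):

    db.docs.aggregate([
      { $match: { authoredBy: /smith/i, tags: "breaking" } },
      { $group: {
          _id: { year: { $year: "$publishDate" }, month: { $month: "$publishDate" } },
          count: { $sum: 1 }
      } },
      { $sort: { "_id.year": 1, "_id.month": 1 } }
    ]);
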
[19:05:39] <uuanton> anyone know why, when I issue db.updateUser(), it doesn't return anything?
[19:06:35] <uuanton> db.getSiblingDB('admin').updateUser("admin", { pwd : xxx });
[19:31:29] <oky> kurushiyama: you into other dbs like influx or prometheus? i see a lot of talk about them
[19:31:38] <oky> i'm trying to figure out what they do and what they don't do
[19:32:31] <oky> f.e. a lot of these DBs are calling themselves 'time series DBs' - does that mean only time series queries can be made?
[19:32:38] <kurushiyama> oky: Do a lot of Influx, lately
[19:33:02] <kurushiyama> oky: I can not talk of Prometheus
[19:33:21] <kurushiyama> oky: In InfluxDB, there is a notion of "tags"
[19:33:43] <edrocks> in new influxdb
[19:34:36] <oky> kurushiyama: yeah, so 'tags' = indexed column, 'fields' = unindexed column
[19:36:20] <kurushiyama> oky: Right. Say you have tags host and region. So you would have queries that are _roughly_ comparable to db.measurement.find({host:"somehostname", date:{$gte:someDate,$lte:someOtherDate}})
[19:37:38] <kurushiyama> or db.measurement.find({region:"NCSA", date:{$gte:someDate,$lte:someOtherDate}})
[19:38:04] <kurushiyama> (both would not make much sense, in case we talk of FQDN, at least)
[19:38:28] <silviolucenajuni> Anyone know if rule #27 of the book 50 Tips and Tricks for MongoDB Developers by Chodorow is still a valid rule in mongo 3.2? I'm trying to replicate the results but I can not see a difference.
[19:38:52] <silviolucenajuni> Tip #27: AND-queries should match as little as possible as fast as possible
[19:38:52] <kurushiyama> silviolucenajuni: If you would quote it... ;)
[19:40:06] <silviolucenajuni> Tip #27: AND-queries should match as little as possible as fast as possible
[19:41:45] <kurushiyama> silviolucenajuni: Out of context, I guess it is wise to follow Kristina's advice until she is proven wrong ;)
[19:42:52] <kurushiyama> oky: The interesting part of Influx (for me) is that it is rather easy to group by time _intervals_, which can become a pita in other DBs
[19:44:00] <yopp> yeah, but the main problem with influx is that they don't have basic things like increments and stuff
[19:44:12] <kurushiyama> oky: Keeping the example above, you could do something like (pseudocode) db.measurement.find({host:"somehostname", date:{$gte:someDate,$lte:someOtherDate}}).groupBy("5 mins").average()
[19:44:27] <saml> is influx web scale
[19:44:36] <kurushiyama> yopp: I do not see the need for an increment for time series data.
[19:44:43] <kurushiyama> saml: WebScale is a fallacy.
[19:44:46] <yopp> kurushiyama, bad for you
[19:45:01] <kurushiyama> saml: You can make sqlite webscale, if necessary. ;)
[19:45:10] <kurushiyama> yopp: Enlighten me with an example ;)
[19:45:33] <saml> db.storage.engine = sqlite
[19:46:11] <kurushiyama> saml: But to answer the question properly: Yes, there is sharding. Albeit it has recently become an enterprise feature.
[19:46:23] <edrocks> saml: tip don't try and use uglifyjs on already minified code
[19:46:31] <edrocks> just spent whole day figuring that out
[19:46:33] <saml> woot
[19:46:39] <saml> what's problem with that?
[19:46:53] <edrocks> it makes the uglifyjs time take like 20x longer
[19:47:05] <yopp> kurushiyama, simplest one: http response codes
[19:47:10] <edrocks> my build time went from like 18s to almost 3min
[19:47:13] <saml> what did you use to minify in the first pass?
[19:47:21] <saml> uglify twice?
[19:47:23] <yopp> hey database, here a request with code 500
[19:47:26] <edrocks> from libraries
[19:47:48] <edrocks> was trying to make dev builds faster by loading minified libraries instead of their source
[19:48:00] <yopp> hey database, show me the stats of requests by codes for last hour with 1 min resolution
[19:48:04] <saml> that's weird
[19:48:12] <yopp> and boom, influx can't do that
[19:48:33] <edrocks> saml: https://github.com/mishoo/UglifyJS2/pull/1024
[19:49:57] <kurushiyama> yopp: Whut?
[19:49:59] <saml> yopp, db.request_logs.aggregate({$match:{ts:{$lt:ISODate(1 hour ago)}}}, {$group:{_id:{$hour: '$ts', $minute: '$ts', status: '$status_code'}, count: {$sum:1}}, {$sort: ...
[19:50:27] <yopp> saml, thanks, but then I don't need influx
[19:50:32] <saml> thanks edrocks
[19:50:48] <saml> yopp, i was imagining about $minute . it might not exist
[19:50:48] <yopp> kurushiyama, whut "whut?"
[19:51:04] <saml> https://docs.mongodb.com/v3.0/reference/operator/aggregation/minute/ it does exist
[19:51:18] <yopp> saml, in mongo yes, but we are talking about time series databases
[19:51:19] <kurushiyama> yopp: If you want to query by that, use a code tag?
[19:51:28] <yopp> for example, yes
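
Cleaning up saml's sketch so it would actually run (collection and field names are his; note it needs $gte for "the last hour", and the $hour/$minute operators belong inside named _id keys):

    db.request_logs.aggregate([
      { $match: { ts: { $gte: new Date(Date.now() - 60 * 60 * 1000) } } },
      { $group: {
          _id: { hour: { $hour: "$ts" }, minute: { $minute: "$ts" }, status: "$status_code" },
          count: { $sum: 1 }
      } },
      { $sort: { "_id.hour": 1, "_id.minute": 1 } }
    ]);
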
[19:51:29] <saml> what's time series database?
[19:51:32] <saml> cassandra?
[19:51:43] <yopp> saml, influxdb
[19:51:44] <kurushiyama> saml: Na. InfluxDB, for example.
[19:51:48] <saml> A time series database (TSDB) is a software system that is optimized for handling time series data, arrays of numbers indexed by time (a datetime or a datetime range).
[19:51:58] <saml> mongodb can index datetime field
[19:52:17] <yopp> saml, the overhead for storing time series is the major problem
[19:52:35] <yopp> the overhead of storing the timestamp in a time series
[19:52:54] <yopp> in the case of a 32-bit value you will have a 64-bit timestamp
[19:52:55] <yopp> plus indexes
[19:53:00] <yopp> plus bson overhead
[19:53:14] <saml> is this for logging and graphing?
[19:53:31] <yopp> and at the end, to store 32 bits you will waste like a hundred bytes
[19:53:41] <kurushiyama> yopp: How else would you store timestamps efficiently? Especially when we talk of comparisons?
[19:53:45] <saml> in mongo yes
[19:53:55] <yopp> saml, basically in every database
[19:54:02] <saml> not in /dev/null
[19:54:09] <yopp> yeah, it's the best
[19:54:12] <yopp> web scale!
[19:54:21] <saml> for hourly data, why not just store in file
[19:54:29] <yopp> kurushiyama, billion of options, starting from bitmap indexes and stuff
[19:54:31] <saml> not sure how much log you get for an hour
[19:54:58] <saml> maybe 4GB, which is limit before you go full big data solution
[19:55:02] <yopp> huh
[19:55:05] <kurushiyama> yopp: and bitmap comparisons are faster than integer comparisons? o.O
[19:55:13] <saml> are you twitter.com ?
[19:55:29] <saml> twitter probably gets a lot of data in an hour
[19:55:46] <yopp> saml, nope, but imagine you have a small factory and you need to store historical values of all sensors
[19:56:31] <saml> yeah storing metrics requires departure from rdbms
[19:56:33] <edrocks> isn't rollup a big plus for using a real time series db like influx?
[19:56:41] <yopp> when we speak about slow processes, like temperature, which can't change like a hundred times per second, it's easy
[19:56:47] <edrocks> ie after a few days you don't store super precise data
[19:57:05] <yopp> but when we start talking, say, electricity we are talking about 50-60Hz ;)
[19:57:37] <yopp> edrocks, in automation — no
[19:57:52] <yopp> basically you need to store raw data indefinitely
[19:58:03] <edrocks> yopp: what do you mean? like robots?
[19:58:10] <saml> and aggregate entire data every hour?
[19:58:22] <kurushiyama> yopp: That is a problem, admitted. Though latency will kill you anyway. You'd probably need local window aggregation for electronics.
[19:59:06] <saml> what would you like to do in the end? make a realtime graph?
[19:59:14] <yopp> saml, a lot of things
[19:59:27] <saml> i don't think single db fits your requirements of a lot of things
[19:59:42] <saml> start with what's actually required. make simple things
[20:00:35] <yopp> saml, from my experience, say you have an oilfield
[20:00:47] <yopp> you have thousands of pumps
[20:00:58] <yopp> you have hundreds of sensors on each pump
[20:01:34] <yopp> and you have a formula that allows you to use historical sensor values to predict the breakdown
[20:01:57] <yopp> so you can send the guys in the field before your pump is dead
[20:02:09] <yopp> but you can use "momentary values"
[20:02:11] <kurushiyama> yopp: Well, you'd only report values exceeding certain thresholds, no?
[20:02:22] <saml> sure. archiving them is one thing. doing real time analysis of high throughput is another. and analysis yields manageable sized output
[20:02:42] <yopp> saml, you need to analyse data in realtime to predict breakdown
[20:02:58] <saml> all those events, you can archive somewhere. and you also run analysis transforming raw events into something more meaningful to you
[20:03:30] <yopp> it's not working like: okay, once a week we will put this shit in our hadoop cluster and see what it says
[20:03:32] <saml> i'm sure maths you use can memoize historical values
[20:03:36] <kurushiyama> yopp: I doubt that. exceeding stddev as a trigger would be sufficient. The amount by which stddev is exceeded allows breakdown projection.
[20:03:44] <yopp> nope
[20:03:46] <saml> history is history. set in stone. do calculation over history once.
[20:04:01] <saml> then on top of those calculated values, calculate breakdowns
[20:04:07] <saml> of current events
[20:04:38] <yopp> and once again
[20:04:40] <saml> maybe not.. i don't think weather forecast requires scanning all historical values everyday to make forecast
[20:04:56] <saml> just hire some ph.d to come up with better math
[20:05:06] <yopp> you are analysing the process, not the "snapshot". you need to compare current state with previous state, with previous state trends
[20:05:16] <yopp> and surprise: when you get new data, the trend changes
[20:05:37] <saml> sure there are math techniques. i think you need big data scientists ph.d
[20:06:02] <kurushiyama> yopp: So how do you take variance into account when only comparing two values?
[20:06:12] <saml> and your expertise in databases could help them implement their maths efficiently
[20:06:31] <yopp> saml, and it's pretty much how it works today
[20:06:40] <yopp> but still, tsdbs are in their infancy
[20:07:04] <oky> yopp: do you work in an industry that has software like what you are describing?
[20:07:16] <yopp> oky, not anymore ;)
[20:07:20] <oky> yopp: i used to work on something like that, too
[20:08:25] <oky> yopp: used a custom cluster solution?
[20:09:24] <yopp> nope, they had a hardware that can do that on site
[20:10:16] <yopp> and the problem was like: how can we do that not just on a single site, but in our warm office, to get the whole picture and optimize the maintenance routine
[20:28:32] <cpama> hi everyone. i have the following code: http://pastebin.com/y9gTGFfc
[20:28:55] <cpama> as you can see, i have two functions... both update the same collection...but different fields in a document
[20:29:26] <cpama> my bug is that when I run the function update_location_status it works... creates the record if it doesn't exist... and i can run it multiple times... everything looks good.
[20:29:39] <cpama> but then another module comes along and wants to add data to an existing location record.
[20:29:58] <cpama> when it calls the function update_location_data the existing document is wiped out and replaced with this new document.
[20:30:01] <cpama> i hope i'm making sense.
[20:32:08] <cpama> i think maybe the problem is that i'm passing in an array with all the fields i want to update, but this is being interpreted as rewrite the entire doc using this new array. Really, what I want is to update the existing doc with updated values for only the fields defined in the array
[20:41:05] <cpama> i figured it out. had to pass not just the array of new data, but array('$set'=>$newdata)
[20:41:12] <cpama> sorry for the noise.
[20:41:41] <kurushiyama> cpama: Glad you found it.
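
The distinction cpama ran into, in shell terms (collection and field names are made up): an update document without update operators replaces the whole document, while $set only touches the listed fields.

    // replacement: the existing document is wiped and becomes exactly this
    db.locations.update({ _id: "store-42" }, { status: "open" }, { upsert: true });

    // $set: only the listed fields change, the rest of the document is kept
    db.locations.update({ _id: "store-42" }, { $set: { status: "open" } }, { upsert: true });
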
[22:06:58] <bweston92> Why are some integers wrapped in NumberLong and some aren't?
[22:07:19] <bweston92> Seems $gt/$gte/$lt only work with NumberLong?
[23:22:43] <jr3> is there a way to query the date inside of an objectid so I can find the number of docs made in a date range
[23:30:40] <Boomtime> https://www.kchodorow.com/blog/2011/12/20/querying-for-timestamps-using-objectids/
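
The trick from that post, roughly: the first four bytes of an ObjectId are a unix timestamp in seconds, so you can build boundary ObjectIds from dates and range-query _id (shell sketch, collection name made up):

    function objectIdFromDate(d) {
      // seconds since the epoch as 8 hex chars, padded out to a full 24-char ObjectId
      return ObjectId(Math.floor(d.getTime() / 1000).toString(16) + "0000000000000000");
    }

    db.docs.find({
      _id: { $gte: objectIdFromDate(new Date("2016-05-01")),
             $lt:  objectIdFromDate(new Date("2016-05-18")) }
    }).count();
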