PMXBOT Log file Viewer

#mongodb logs for Thursday the 11th of February, 2016

[01:04:40] <Freman> http://pastebin.com/Wmd8zKa1 - munging things in
[01:56:03] <nyexpress> hi
[02:00:58] <nyexpress> I am dealing with quite a dilemma here: I have a user who has money in their account, and it fluctuates as deposits and withdrawals are made. I would like to calculate the total transacted amount over the period of a month, and it's becoming quite a challenge to figure out how to approach it. My equation is: ( $today['withdraws'] - $lastMonth['withdraws'] ) + ( $today['deposits'] - $lastMonth['deposits'] ), however a user can clear their
[02:00:58] <nyexpress> account spontaneously, which resets all the counts..
[09:38:56] <Ange7> Hey all
[09:43:10] <Ange7> can someone help me and explain how to do this: http://pastebin.com/Xk3bnhYY
[09:44:52] <Ceunincksken> Maybe I can
[09:45:02] <Ceunincksken> Do you have a sample JSON file which I can import in Mongo to test?
[09:48:19] <Ange7> No i don't have json :(
[09:49:06] <Ceunincksken> Ok, but I don't understand your question fully. Let's recap.
[09:49:31] <Ceunincksken> Is my understanding correct that you have a collection that contains 3 fields (_id, user id and a date) ?
[09:50:15] <Ange7> yes
[09:51:27] <Ceunincksken> And what is your expected output?
[09:51:41] <Ceunincksken> You enter a date range and you want all the users that lie within this range?
[09:52:43] <Ange7> I enter a date range and i want all documents that appear only once in my collection within this range.
[09:53:18] <Ceunincksken> Ok, give me some time to summarize this.
[09:55:09] <Ange7> thank you Ceunincksken
[09:58:31] <Lope> I want to run something like FindAndModify, but I want it to modify and return multiple documents. But the documentation says: "The findAndModify command modifies and returns a single document."
[10:00:30] <Lope> Okay I'll just do a normal find on all the docs, then update them in another command.
[10:00:58] <Ceunincksken> @Ange7: Can you read the following PasteBin and tell me if this is what you want? http://pastebin.com/AaACip8g
[10:02:59] <Ange7> Ceunincksken: i check
[10:04:22] <Ange7> yes that's exactly this
[10:04:36] <Ceunincksken> Ok, working on it. Please be patient. :)
[10:05:23] <Ange7> thank you ! :)
[10:09:11] <Ceunincksken> Ange7: Do you only need the user_id to return or not?
[10:13:41] <Ange7> Ceunincksken: Yes
[10:13:59] <Ceunincksken> Ok, finished, hang on while I post on PasteBin.
[10:14:09] <Ange7> i only need count of user_id returned
[10:18:41] <Ceunincksken> Ange7: Here you go: http://pastebin.com/aTK3T0DN
[10:18:45] <Ange7> i check
[10:21:17] <Ange7> WoW
[10:21:21] <Ange7> It works perfectly
[10:21:25] <Ange7> Thank you Ceunincksken
[10:21:30] <Ceunincksken> You're welcome :-)
[10:21:34] <Ceunincksken> Glad I could help.
[10:21:43] <Ange7> i didn't know that you could have $group & $match in the same pipeline
[10:22:25] <Ceunincksken> You can combine all operators, such as $group, $match, $geoNear, $limit, ..., in one pipeline
[10:22:38] <Ceunincksken> Can get quite complex though.
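For reference, a minimal sketch of the kind of pipeline being discussed (the pastebin contents are not part of this log), assuming a collection named events with user_id and date fields; all names and dates are guesses:

    db.events.aggregate([
        // keep only documents whose date falls inside the range
        { $match: { date: { $gte: ISODate("2016-01-01"), $lt: ISODate("2016-02-01") } } },
        // count how many documents each user has in that range
        { $group: { _id: "$user_id", count: { $sum: 1 } } },
        // keep only the users that appear exactly once
        { $match: { count: 1 } },
        // collapse to a single document holding the number of such users
        { $group: { _id: null, users: { $sum: 1 } } }
    ])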
[10:25:04] <Ange7> Okay
[10:25:16] <Ange7> Thank you Ceunincksken
[10:25:53] <Ceunincksken> Ange7: No Problem.
[10:29:08] <valera> hi, is there a way to migrate sharded cluster to a single server ?
[10:45:56] <Keksike> I need to somehow monitor what kind of queries go into my mongodb and get some kind of stats from them (what's their operation-time, do they hit indexes, etc.). What monitoring tool can I use to log this information?
[10:48:51] <newbsduser> hello, i have a very big dataset. if i don't have enough memory, mongodb responses get very slow. can ssd disks with raid solve the io performance problem? (for instance 6 x 450G with raid10)
[11:00:06] <Ceunincksken> Keksike: You have 2 options. Run your query with explain() or set the profiling level on your database.
[11:01:18] <Keksike> Ceunincksken: thanks. Where will explain() log the result?
[11:01:29] <Ceunincksken> newbsduser: Going with SSD disks can solve your problem, but keep in mind that SSD disks are expensive and they are still slower than RAM. For optimal performance, you want your data to be in RAM. Why not add an extra server and use sharding?
[11:02:36] <Ceunincksken> Keksike: When you run a query with explain(), you don't get the results of your query, but you get the explanation of your query in the shell. So, you need to execute your queries directly in your shell, not in your application and see if indexes are being used and if a collection scan needs to be executed, ...
[11:02:55] <Ceunincksken> Keksike: Try to run db.<collection>.find({}).explain() in your shell and see the outcome.
[11:03:43] <Ceunincksken> Keksike: You should use explain with "allPlansExecution" to get the most detailed stats. See: https://docs.mongodb.org/manual/reference/method/cursor.explain/
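For reference, the explain invocation with that verbosity, run against an example collection in the shell:

    // "allPlansExecution" also reports the plans the query planner rejected
    db.orders.find({ status: "shipped" }).explain("allPlansExecution")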
[11:03:56] <Keksike> thanks
[11:04:23] <Ceunincksken> You're welcome.
[11:05:45] <Keksike> is there any way to use this .explain inside my clojure-program and log the result of the .explain() into a file?
[11:06:25] <Ceunincksken> Keksike: Not that I'm aware of.
[11:07:16] <Keksike> hmm, damn. that would have been quite perfect
[11:07:29] <Ceunincksken> But, you can try to set the profiling level on your database, it might help.
[11:07:35] <Keksike> yeah, I'll try that
[11:07:50] <Ceunincksken> run db.setProfilingLevel(2) on your database and every query you execute gets logged in a collection named 'system.profile'.
[11:07:58] <Keksike> I'm basically trying to figure out/debug why a certain operation is taking such a long time in my software
[11:08:29] <Keksike> yeah I will do that
[11:08:37] <Ceunincksken> Go with 'setProfilingLevel(2)' and look at the 'system.profile' collection. It might give you a clue.
[11:08:45] <Ceunincksken> Also the logging of the database might help to identify the problem.
[11:09:02] <Keksike> logging of the database?
[11:09:12] <Ceunincksken> look at the file /etc/mongod.conf
[11:09:22] <Ceunincksken> There's a setting that defines where the database writes its logging information.
[11:09:37] <Ceunincksken> but the setProfilingLevel should give you better information.
[11:09:51] <newbsduser> Ceunincksken, if i have multiple clients which are sending data to a single server, can SSD disks (for instance 6 x 450G with raid10) be the correct solution?
[11:09:54] <Keksike> the problem with setProfilingLevel(2) is that it logs all the operations, not just the ones that I want to debug, and it's a pain to filter through all of the logs just to find what I'm looking for. that's why I wanted to use .explain()
[11:10:17] <Keksike> but yeah, I will try that now.
[11:10:33] <Ceunincksken> You can filter the system.profile on ns and query.find to find what you are looking for.
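A sketch of the profiling workflow described above; the namespace, operation type and threshold in the filter are only placeholders:

    // log every operation on the current database into db.system.profile
    db.setProfilingLevel(2)
    // later, pull out only the slow queries against one collection
    db.system.profile.find({
        ns: "mydb.orders",          // namespace: <database>.<collection>
        op: "query",
        millis: { $gt: 100 }
    }).sort({ ts: -1 }).pretty()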
[11:11:15] <Keksike> yup. thanks a lot for your help Ceunincksken
[11:11:36] <Ceunincksken> You're welcome.
[11:12:13] <Ceunincksken> newbsduser: It can be but I would go with sharding.
[11:45:12] <lowbro> I have a collection with millions of records. If I configure a ttl-index in doctrine-odm on the entity for that collection, how exactly will it be applied? Will applying it (and therefore removing hundreds of thousands of documents) have a critical impact on cpu and/or memory usage?
[11:46:36] <Ceunincksken> lowbro: By default, when you build an index, it's created in the foreground, preventing all read/write operations on Mongo. You can consider creating them in the background. See https://docs.mongodb.org/manual/tutorial/build-indexes-in-the-background/
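For example, a TTL index created in the background so it does not block reads and writes while it builds; the field name and expiry are assumptions:

    // expire documents roughly 30 days after the value in createdAt
    db.events.createIndex(
        { createdAt: 1 },
        { expireAfterSeconds: 2592000, background: true }
    )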
[11:48:44] <lowbro> Ceunincksken: Okay, I will have a look into that. Thanks
[11:49:10] <Ceunincksken> You're welcome.
[12:01:36] <Ceunincksken> cls
[12:01:37] <Ceunincksken> clear
[12:01:38] <Ceunincksken> clear
[12:03:01] <Derick> you need the /
[12:03:59] <Ceunincksken> Realized it too late :-)
[12:09:43] <lowbro> lol, never knew /clear works in IRC
[12:09:57] <lowbro> ... and I've been using it for over a decade now
[12:11:56] <rom1504> it's not part of the irc protocol
[12:12:03] <rom1504> some clients implement it
[12:14:04] <lowbro> good to know
[13:08:49] <igorwww> hello! I have a question about migrating from mongo-php-driver-legacy to mongo-php-driver, it seems like there is no equivalent to MongoCursor::timeout(), is that correct?
[13:11:32] <rom1504> there is no doc, so who knows
[13:11:48] <igorwww> there is documentation here: http://php.net/mongodb
[13:13:15] <Derick> igorwww: I'm going to need to check that.
[13:13:24] <Derick> rom1504: what makes you say that?
[13:13:34] <rom1504> igorwww: http://stackoverflow.com/a/24339516/1658314 ?
[13:13:59] <rom1504> Derick: spending 3min trying to find the doc for mongo-php-driver with google and failing
[13:14:00] <igorwww> rom1504: that is the legacy driver
[13:14:04] <rom1504> it's well hidden
[13:14:19] <igorwww> the legacy driver supports $cursor->timeout(10000);
[13:14:45] <igorwww> the new driver does not appear to have that method
[13:14:53] <igorwww> legacy driver doc: php.net/mongo
[13:14:58] <igorwww> new driver doc: php.net/mongodb
[13:15:21] <igorwww> the new driver is based on mongo-c-driver
[13:15:36] <Derick> igorwww: it is possible we overlooked that one...
[13:16:20] <igorwww> Derick: I see, is it supported by the c driver?
[13:16:59] <Derick> I am not sure, I'm going to need to investigate.
[13:17:26] <igorwww> ok, thank you. let me know if I can do anything to help
[13:18:27] <Derick> I'll be making a ticket in Jira
[13:19:34] <Derick> igorwww: https://jira.mongodb.org/browse/PHPC-576
[13:19:39] <Derick> feel free to add comments
[13:20:16] <igorwww> Derick: thanks
[13:20:38] <rom1504> igorwww: http://php.net/manual/en/mongodb-driver-manager.construct.php + https://docs.mongodb.org/manual/reference/connection-string/#connections-connection-options uri.socketTimeoutMS
[13:20:42] <rom1504> is that not equivalent ?
[13:20:56] <igorwww> rom1504: not exactly
[13:21:09] <Derick> nope, that's a connection timeout
[13:21:16] <igorwww> the cursor gets the connection timeout by default
[13:21:24] <igorwww> but there is also a setting per cursor in the old driver
[13:21:50] <Derick> cursor and connection timeout are unrelated I believe...
[13:22:16] <Derick> lunch now, we (php team) will have a chat in the afternoon about it
[13:22:30] <rom1504> it sounds to me like uri.connectTimeoutMS != uri.socketTimeoutMS
[13:22:33] <igorwww> I was referring to https://github.com/mongodb/mongo-php-driver-legacy/blob/master/cursor.c#L180
[13:22:44] <Derick> igorwww: i know you are :)
[13:22:54] <Derick> rom1504: oh yeah, sockettimeoutms is the one
[13:23:08] <igorwww> anyways, thanks for looking into this
[13:24:53] <rom1504> a general thing about mongodb docs : the getting started docs are nice, but when you want to find the reference, it's hard to find (if it exists at all)
[13:25:19] <rom1504> well
[13:25:24] <rom1504> you can find it with google
[13:25:25] <cheeser> varies by project, really
[13:25:39] <cheeser> e.g., http://mongodb.github.io/mongo-java-driver/
[13:25:40] <rom1504> but if you try to find it from the mongo site, it's not obvious
[13:26:00] <cheeser> https://docs.mongodb.org/ecosystem/drivers/
[13:26:09] <rom1504> cheeser: well yes exactly my point
[13:26:19] <rom1504> let's go to http://mongodb.github.io/mongo-java-driver/
[13:26:25] <rom1504> I get a link to "reference"
[13:26:38] <rom1504> http://mongodb.github.io/mongo-java-driver/3.2/ , that's a general description
[13:26:38] <igorwww> https://github.com/mongodb/mongo-php-driver#documentation https://github.com/mongodb/mongo-php-driver-legacy#documentation
[13:26:41] <igorwww> :)
[13:27:21] <rom1504> yeah well ok you have to find it in github
[13:28:34] <rom1504> https://docs.mongodb.org/ecosystem/drivers/php/ "PHP Library API Reference" -> http://mongodb.github.io/mongo-php-library/ still a getting started
[13:28:39] <rom1504> just saying it's a bit confusing
[13:31:03] <rom1504> (and there is no reference for mongodb query syntax AFAIK)
[13:31:13] <rom1504> (except if I just couldn't find it)
[13:38:30] <igorwww> Derick: seems like mongoc_cursor_set_max_await_time_ms() from mongo-c-driver should do the trick
[13:48:52] <mongodbuser2> Hello, I have a question that I've not been able to find the answer to. My _id is strictly increasing and no documents will ever be deleted. All queries will filter on _id with both $gt and $lt. Is there a way for me to have MongoDB use the index to find the first element, then perform a collscan until it hits the last item? As far as I understand MongoDB will use the index to retrieve all items by default, which I believe wo
[13:51:12] <mongodbuser2> I can also add that it seems that .hint({$natural: 1}) gives slightly better performance than when I let it use the index.
[13:51:21] <rom1504> mongodbuser2: your message was cut at "default, which I believe wo"
[13:51:44] <mongodbuser2> oops, sorry, here's the rest: (...) which I believe would be a bit slower (but still O(n)).
[13:52:54] <mongodbuser2> I guess what I'm asking is; Is there a way to have MongoDB exploit the fact that I know _id is strictly increasing when doing range queries?
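Roughly, the two plans being compared would look like this in the shell; the collection name and the bounds are placeholders:

    // default plan: the _id index is used for both bounds
    db.items.find({ _id: { $gt: 1000, $lt: 5000 } })
    // forcing the forward collection scan mentioned above instead
    db.items.find({ _id: { $gt: 1000, $lt: 5000 } }).hint({ $natural: 1 })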
[14:07:29] <hakrim> Hi
[14:07:38] <lowbro> Ohai
[14:07:48] <hakrim> Does anyone have experience with the pymongo driver?
[14:07:59] <hakrim> especially with the bulk api
[14:08:05] <StephenLynx> GothAlice does
[14:08:17] <StephenLynx> she's usually busy though.
[14:08:47] <GothAlice> hakrim: Ask your question, gatekeeper, I am not afraid! (IRC is asynchronous, so asking the question instead of asking to ask is generally more effective. ;)
[14:11:45] <hakrim> GothAlice: I'm trying to check whether my bulk has any operations in it before executing, but I'm unable to figure out how to do it.
[14:15:04] <gablank> Hello world
[14:15:07] <GothAlice> hakrim: Using BulkOperationBuilder or …?
[14:15:25] <GothAlice> hakrim: For the most part the actual operations being queued up are hidden behind mangled "private" attributes.
[14:15:49] <hakrim> GothAlice
[14:16:20] <GothAlice> http://s.webcore.io/2k3T1k3G0a2c
[14:16:22] <hakrim> GothAlice: I use initialize_ordered_bulk_op()
[14:16:44] <GothAlice> http://api.mongodb.org/python/current/api/pymongo/bulk.html < not seeing that :/
[14:17:12] <GothAlice> Aaah, collection method.
[14:17:33] <GothAlice> Right, so that's returning one of those objects from the bulk module.
[14:17:51] <GothAlice> See my screenshot; you _can_ technically pull out the queued up operations, but… it's not pretty.
[14:18:30] <GothAlice> http://s.webcore.io/0F2C041O3a0r < same process
[14:19:25] <hakrim> Hmmm
[14:19:30] <hakrim> Thanks
[14:20:00] <GothAlice> No worries. (Being private like this means it could change without any warning between versions; I'd armour your code against the attribute not existing—try/except AttributeError.)
[14:20:18] <hakrim> It seems it could solve my problem. It also seems that I still have a lot to learn
[14:21:31] <GothAlice> hakrim: To describe how I found this: I used "ptipython" which is an improved syntax-highlighting Python REPL (pip install ipython ptpython), imported the various things, and used tab completion to explore the attributes of the object. I tracked down which attribute stored the operations by examining the code for the execute method by entering: a.execute??
[14:21:42] <GothAlice> A good REPL shell can save a lot of time exploring things. :)
[14:22:06] <hakrim> On a different topic, what is an acceptable write speed for a well made script importing from txt/json to mongo?
[14:23:11] <gablank> What clients are you using for exploring your dbs? Currently using RoboMongo but it seems it doesn't support WiredTiger.
[14:23:23] <GothAlice> JSON parsing is several orders of magnitude slower than string splitting (a la CSV), so I'd expect it to be substantially slower. On a single local node I get insertion rates in the order of two million documents per second, but that's constructing new records, not trying to translate data.
[14:23:29] <hakrim> GothAlice: You're so far out of my league that it actually feels nice
[14:24:15] <hakrim> Don't
[14:24:58] <GothAlice> gablank: I use either the mongo shell (quick queries) or a Python REPL. Exploration using standard queries and aggregate queries.
[14:25:33] <hakrim> The script I'm working on averages 200 docs/s. And we're supposed to be doing "Big Data" ...
[14:28:04] <gablank> When using range based sharding, does MongoDB adjust the limits automatically to balance the number of documents per shard?
[14:46:03] <hakrim> GothAlice: Thanks for the ipython thing, it's magic! No more reading obscure module source code!
[14:46:35] <GothAlice> Hehe, I recommend 'ptipython' not just 'ipython'. The former has improved tab completion display, better paste mode, etc.
[14:46:59] <GothAlice> (ptpython is the package, and if ipython is installed it can use it for the %timeit and ? helpers via running "ptipython".)
[14:47:59] <hakrim> Even better
[15:43:01] <yopp> hey
[15:44:16] <yopp> quick question about ruby driver: is it possible to get "stripped" command (without condition arguments) in instrumentation?
[15:44:50] <yopp> like {"find"=>"sites", "filter"=>{"author_id"=>{"$in"=>[BSON::ObjectId('xyz')]}}} becomes {"find"=>"sites", "filter"=>{"author_id"=>{"$in"=>[???]}}}
[16:24:10] <tantamount> If you have a collection with a mixed amount of apples and bananas, e.g. ['a', 'b', 'b', 'a', 'a'], how can you aggregate how many apples and bananas there are such that you end up with a document like {apple: 3, banana: 2}?
[16:24:55] <tantamount> Consider that the keys apple and banana cannot be hard-coded because there could be any number of different varieties of fruit in the collection
[16:31:00] <GothAlice> tantamount: Variable field names are a no-no.
[16:31:52] <mylord> how do I prepend “abc” to icon_url in this doc [ { items: [ { icon_url …
[16:32:04] <GothAlice> tantamount: So in this instance, the practical method is to store the counts concretely and keep the counts up-to-date at the same time you execute updates to the overall list.
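A minimal sketch of the approach Alice describes, keeping a counter in step with the list; the collection, array and counter names are hypothetical:

    // push the new choice and bump its counter in the same atomic update
    db.polls.updateOne(
        { _id: pollId },                       // pollId: the document being updated
        { $push: { choices: "apple" }, $inc: { "counts.apple": 1 } }
    )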
[16:32:24] <mylord> correction: how do I prepend “abc” to the value of icon_url in this doc [ { items: [ { icon_url: “xyz” …
[16:32:48] <mylord> to all* icon_url in all docs
[16:32:54] <tantamount> GothAlice: sadly that is not an option
[16:33:19] <GothAlice> Well, what you're asking is only possible at the application level, unfortunately. You can't project into variably-named fields.
[16:34:02] <GothAlice> mylord: Is your intention to update those documents with the resulting value, or simply get the records returned with the text prepended (but don't update the data in the DB)?
[16:34:15] <mylord> update the DB
[16:34:25] <mylord> or, actually, the latter would be cool to know also
[16:35:16] <GothAlice> Alas, there is no update operator to concatenate. During aggregate projection, however, you can https://docs.mongodb.org/manual/reference/operator/aggregation/concat/
[16:35:41] <mylord> ok, I had trouble converting that doc into something that works for me
[16:35:56] <mylord> and how do I merely print the concatenated result?
[16:35:58] <GothAlice> You could, in theory, aggregate and $out the updated records into a new collection, then swap the collections.
[16:37:06] <GothAlice> mylord: http://s.webcore.io/203U1z2j1b3G
[16:37:40] <mylord> will that change the DB?
[16:37:51] <GothAlice> Nope.
[16:37:57] <GothAlice> As mentioned, there is no update operator to concatenate.
[16:39:16] <GothAlice> Aggregate queries return documents transformed by the aggregation pipeline (that argument to .aggregate()). Using $out as an aggregation pipeline stage will let you save the results somewhere, but you might suffer madness if you try to $out to the same collection that it's reading from. Ref my possible solution by swapping collections.
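A sketch of that projection for a single top-level string field; the collection and field names are assumed, and a nested array like the items example above would additionally need $unwind or $map:

    db.docs.aggregate([
        // build the prefixed value during projection; the source collection is untouched
        { $project: { icon_url: { $concat: ["abc", "$icon_url"] } } },
        // optionally persist the transformed documents to a separate collection
        { $out: "docs_prefixed" }
    ])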
[16:42:19] <mylord> I just did it in node.. :/
[16:42:25] <mylord> on the results..
[16:42:37] <mylord> SQL FTW?
[16:43:57] <tantamount> GothAlice: what if I only needed a unique list sorted by frequency? So the result could be ['a', 'b'] if apples were more popular than bananas or ['b', 'a'] if the reverse were true
[16:45:09] <GothAlice> tantamount: Again, by having the choices be variable, you're… somewhat limited in what you can do DB-side.
[16:45:24] <tantamount> Yes, it seems you may be right
[16:46:06] <tantamount> GothAlice: what if we unwound the collection and then grouped by it?
[16:46:13] <tantamount> Surely that would yield a count of each one
[16:47:21] <GothAlice> It would. It wouldn't produce what you were asking for, though, each type of choice would be a separate document instead of having one document with multiple fields. It'd certainly work, though.
[16:47:59] <GothAlice> So you might need to be careful to group by not just the unrolled list choice, but also the original ID in order to preserve that association, if needed.
[16:48:13] <tantamount> Right, it would be only a step in that direction, but I'm sure a series of projections and further manipulations later would be able to yield that result
[16:48:36] <tantamount> I haven't completely thought it through yet ;)
[16:48:57] <GothAlice> And yeah, no, same limitation. Once exploded into distinct documents, re-combining them suffers the same "no variable field names" problem.
[16:49:57] <tantamount> Yeah, I'm sorry for the confusion, but the end result needn't be {apple: 3, banana: 2}; ['a', 'b'] would suffice, where the order is significant because it is sorted by frequency
[16:50:16] <GothAlice> Yup, that's doable.
[16:50:21] <tantamount> Sweet
[16:50:47] <GothAlice> $unwind, $group / $project, $sort, $group / $push
[16:51:20] <tantamount> On it :)
[16:51:22] <GothAlice> Possibly with some {$sum: 1} in the first $group projection.
[16:53:57] <GothAlice> mylord: And heh, sure. But you can't blame the screwdriver for being a poor hammer. (I.e. it helps to use the tool the way it was intended.) ;P
[16:55:03] <mylord> GothAlice: agreed.. #node.js and #javascript often tell me this when I ask anything mongo: http://cryto.net/~joepie91/blog/2015/07/19/why-you-should-never-ever-ever-use-mongodb/
[16:55:06] <GothAlice> MongoDB generally intends people to really, really think about their data and specifically how their data is consumed/used/queried, before actually committing data in there. Certain adjustments after-the-fact are legitimately difficult.
[16:55:06] <mylord> lol
[16:55:26] <GothAlice> At least that one is recent.
[16:55:29] <GothAlice> See also: https://blog.serverdensity.com/does-everyone-hate-mongodb/
[16:55:48] <mylord> In that case, seems good to use the facilities of pre-configured relational DB, if you know how things should be setup from start
[16:55:52] <mylord> ok, will check
[16:57:21] <mylord> GothAlice: Some systems are easier to screw up. It’s good when a product is hard to screw up. Just saying
[16:57:43] <GothAlice> The counter-points to those initial "it's evil" points are trivial. 1. RTFM and learn how to use writeConcern. 2. RTFM. 3. Plenty of counter examples of insane performance. (My own being ~2 million record/second bidirectionally confirmed RPC calls.) 4. Forces? It does no such thing, that's just a developer experience thing. 5. The referenced article for "locking issues" describes locking issues in MySQL, not MongoDB.
[16:57:47] <GothAlice> Etc., etc., etc.
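To illustrate the writeConcern point, a 3.2-shell write that waits for majority acknowledgement and journaling; the collection and document are arbitrary:

    db.orders.insertOne(
        { status: "new" },
        { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
    )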
[16:58:01] <mylord> I don’t like node either. Wish I used C# or Scala.. I don’t need asynchronous EVERYWHERE.. hard to code it
[16:58:30] <mylord> ya, that article is bad, there’s a better one
[16:58:30] <GothAlice> mylord: I think a key point in an article which actually bothers to cite references is that the references should say the things the summary suggests they do. :|
[17:00:44] <GothAlice> Also "nightmare to scale and maintain". XD My servers average three years uptime, or did until recently darn you Rackspace maintenance, and I've been storing greater than 30 terabytes of data in MongoDB for many years. (Since 1.2 or 1.4, can't really remember at this point, though back then it was more like 20TB.)
[17:01:29] <GothAlice> mylord: Lesson of the day: check angry blog post references, and take them with a grain of salt in that the entire thing may be because they didn't read. ;P
[17:03:22] <GothAlice> The article's points about using the right tool for the job are spot-on, though. If you're using Mongoose, use something else. If you're storing highly relational data? Use a relational database. (Similarly, for the love of the gods, if you're storing a graph use a graph database… faking graphs is even harsher than faking joins!)
[17:03:33] <mylord> ya, agreed.. Just example of easy system: I didn't know C#.. I jumped on C# / SQL project.. everything took just minutes to write correctly the first time.. Node + Mongo.. took me so long to get things working as intended, and continues that way.. I probably need to read more, but there's a lot to read, and all kinds of special tricks to get things done.. SQL big queries can be confusing, but it all follows a small set of keywords in
[17:03:33] <mylord> logical ways. Node, promise-exception hell now..
[17:04:21] <mylord> Ya, right tool for right task for sure.. I think there are some cases when Document storage makes sense.. tho most problems are relational in nature
[17:04:52] <GothAlice> MongoDB is a very different way of thinking about your data. If you have been trained to think of data like an Excel spreadsheet, you've basically already lost. ;)
[17:05:21] <GothAlice> Data is so much more than CSV files.
[17:05:33] <GothAlice> (And that's all SQL is. Glorified Excel.)
[17:05:43] <mylord> I do get things done with mongo, and understand it’s different.. It’s very cool in a way
[17:06:04] <mylord> having some buyers remorse.. grass is greener.. i tend to do that
[17:06:25] <mylord> nice to have learned another way of thinking
[17:07:17] <GothAlice> mylord: The most recent example of awesome I use is the document validation system in MongoDB. People often complain Mongo doesn't have schemas. Well, they're not exactly correct. https://docs.mongodb.org/manual/core/document-validation/
[17:07:52] <mylord> what do you think of CouchDB or PostgreSQL document storage?
[17:07:56] <GothAlice> Where a relational/SQL database has a fixed set of columns for a table, MongoDB instead lets you configure an _insanely expressive_ document query which must match for a document to be "valid". This lets you set up conditionals, for example.
[17:08:11] <GothAlice> (I.e. "if this user is an administrator, their e-mail address must end in @companyname.com".)
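A sketch of that conditional expressed as a 3.2 document validator; the collection and field names are illustrative:

    db.runCommand({
        collMod: "users",
        validator: { $or: [
            { role: { $ne: "admin" } },                     // non-admins: no extra constraint
            { email: { $regex: /@companyname\.com$/ } }     // admins must use a company address
        ] },
        validationLevel: "strict"
    })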
[17:09:57] <freeone3000> So I have a monitoring agent which is complaining that: https://gist.github.com/freeone3000/0c00776aed960b617d39 . However, I for sure have hosts in my MMS group in the config, and the host is being counted as "down" from MMS (even though application servers can hit it fine), so there's no other monitoring agent.
[17:10:10] <freeone3000> How can I get it to properly recognize the state of its monitoring?
[17:10:50] <igorwww> hello again :) I am running into an error with the mongoc driver for php when trying to set a $comment https://gist.github.com/igorwwwwwwwwwwwwwwwwwwww/ce8a9695e594f0d38004
[17:11:01] <mylord> that’s cool.. I guess if you’re *really* good with it, it provides more power.. but common things like queries across documents, complex updates, etc, that are easy in SQL, take some time to understand how to do right in Mongo.. hard to figure out the correct data model, and have to think about it.. in SQL there’s sort of 1 correct way, orthogonal
[17:11:17] <igorwww> I'm not quite sure where to put $query or $comment in this case to make it work
[17:11:26] <igorwww> would appreciate some guidance
[17:11:44] <mylord> compound that with asynch in node, and it’s been sort of cumbersome to me in some ways, but nice in others
[17:13:40] <freeone3000> Ah. It looks like the monitoring agent for my first cluster is also attempting to monitor my second cluster. That's not going to fly. What's the proper way to segregate MMS monitoring agents for multiple replica sets? I want one per replset because they have different auth
[17:13:53] <igorwww> I found test_comment() in tests/test-mongoc-collection-find.c; but I was not able to deduce from this where the $query should go with a findandmodify
[17:17:30] <tantamount> GothAlice: I still don't get it :( If I sort by fruit, it will just sort lexically. How do I know how many there are of each?
[17:17:34] <GothAlice> mylord: Queries across documents… are relational thinking. You instead structure your documents to meet query requirements: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html
[17:17:50] <mylord> ya, but then you might want them different later
[17:18:23] <mylord> i do some of that tho.. nesting them as i need them.. it took a lot of thinking, a lot of changing, at first, esp on first 2 mongodb projects
[17:18:33] <mylord> with orthogonal you don’t need to worry about every future need
[17:18:44] <mylord> but there are other problems
[17:19:43] <mylord> with mongodb, i setup things as i need them, mostly for presenting the client, already nested with all the data i need.. but then sometimes i’d be like.. oh, i should have structured differently
[17:20:01] <mylord> but .. anyhow with sql, you need to think how the API will structure the JSON.. so same problem
[17:20:15] <mylord> with mongo it’s nice you’re already structuring things how you want to e.g. present them
[17:20:57] <GothAlice> tantamount: You need to unwind, then sum, then re-group. I.e. [ {$unwind: "$fruit"}, {$group: {_id: {oid: "$_id", fruit: "$fruit"}, count: {$sum: 1}}}, {$sort: {"_id.oid": 1, "count": 1}}, {$group: {_id: "$_id.oid", "orderedfruit": {$push: "$_id.fruit"}}} ]
[17:21:01] <GothAlice> Something like that, anyway. ;)
[17:21:37] <mylord> no translation needed to/from sql, tho maybe doc engine is a bit harder to work with, generally speaking, for complex things on the data itself, in the interim operations
[17:22:37] <GothAlice> "Throw data at the wall to see what sticks" doesn't work any better in SQL than it does in MongoDB, mylord. ¬_¬ If you don't plan, you plan for failure.
[17:23:47] <GothAlice> (A six-way join with sub-queries is as much a bad sign in SQL as almost any type of relational-ness is a bad sign in MongoDB. ;)
[17:24:15] <mylord> It’s easier for me to plan in SQL.. the solutions seem totally obvious to me. So it’s a nice challenge now.
[17:24:55] <GothAlice> I can highly recommend reading through that link I gave you last; it might help give a summary of the major differences and their approaches.
[17:25:03] <mylord> ok, will try
[17:25:28] <mylord> Have 10 bugs to fix tonight.. will try to not watch politics instead of read tech articles
[17:25:31] <GothAlice> Though interestingly, aggregate queries can now do left joins. :)
[17:26:10] <GothAlice> Left outer, specifically: https://docs.mongodb.org/manual/reference/operator/aggregation/lookup/
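A minimal $lookup sketch; the collection and field names are assumed:

    db.orders.aggregate([
        { $lookup: {
            from: "customers",            // collection to join against
            localField: "customer_id",
            foreignField: "_id",
            as: "customer"                // matching documents land here as an array
        } }
    ])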
[17:26:26] <mylord> Cool. I just feel so comfy in SQL.. anything you need, I can do, even if I didn’t know how before. Very simple. I’m starting to get there with MongoDB as well.
[17:26:44] <mylord> It’s a great community in here to help as well
[17:26:52] <GothAlice> We try! ^_^
[17:27:13] <mylord> Sucky with SQL is all that DB setup time. In MS SQL they have nice tools, but PostgreSQL, not so much
[17:27:53] <GothAlice> Nice tools, vendor lock-in, and licensing.
[17:28:17] <GothAlice> Everything always comes down to "pick two of: good, fast, or cheap".
[17:32:41] <GothAlice> tantamount: Any success?
[17:32:55] <tantamount> GothAlice: just cracked it :^)
[17:33:03] <tantamount> Thanks for the assistance
[17:36:48] <GothAlice> It never hurts to help! :)
[17:52:41] <freeone3000> How can I tell my MMS Monitoring Agent which hosts I want it to monitor? MMS is trying to use one monitoring agent for all clusters - I want one agent for one cluster, and another agent for a second cluster.
[17:55:11] <ParisHolley> production cluster crashed. can anyone provide any idea why PIDs are going into the 100K+ mark? i boosted my max pid to help mitigate the issue, but it just keeps draining more cpu and the server is unresponsive. granted, I have a ton of connections coming in (~30K), but this has never been an issue in the past
[17:56:08] <saira_123> hi Professionals plotting mongostat output , can someone tell me if it is possible to plot everything in one graph?
[17:58:24] <saira_123> hi Professionals plotting mongostat output , can someone tell me if it is possible to plot everything in one graph?
[17:58:43] <saira_123> e.g insert query update delete getmore command % dirty % used flushes vsize res qr|qw ar|aw netIn netOut conn time *0 4867 *0 *0 0 1|0 0.2 20.4 0 3.2G 2.6G 0|0 2|0 940k 286m 4 2016-02-11T22:52:05+05:00 *0 5905 *0 *0 0 1|0 0.2 20.4 0 3.2G 2.6G 0|1 2|0 1m 344m 4 2016-02-11T22:52:06+05:00
[19:15:17] <Complexity> Hello
[19:47:03] <uuanton> hi, when secondaries are in STARTUP2 state and traffic comes to them, what happens?
[19:47:24] <uuanton> does it go to the primary, or error?
[19:51:56] <uuanton> &readPreference=secondaryPreferred
[19:53:54] <GothAlice> uuanton: I'm… not entirely sure what led you to ask that question. Until it's actually in a running state (not startup or init-sync) it won't answer queries at all.
[19:54:28] <GothAlice> So it must go to a different secondary (one already started and waiting) or the primary.
[19:55:33] <uuanton> oh glad you're here Alice. There was a fatal error on a few secondaries; i cleared the data folder and am resyncing all secondaries from the primary
[19:56:24] <uuanton> only the primary is up to date right now and the application doesn't work, because the website is supposed to use &readPreference=secondaryPreferred
[19:56:46] <GothAlice> Well, secondaryPreferred should fall back on the primary; only a plain 'secondary' read preference should actually fail if there are no secondaries. :/
[19:57:34] <WalterTamboer> Hi, is there anyone that can help me with the mongodb driver for PHP?
[19:58:03] <uuanton> that's what i thought but i dunno
[19:58:03] <WalterTamboer> Specifically: http://php.net/manual/en/class.mongodb-driver-command.php
[22:14:55] <CustosLimen> so I'm doing an import to wiredtiger
[22:15:04] <CustosLimen> and it just stalls randomly (as far as I can see)
[22:15:26] <CustosLimen> like not allowing inserts for 60 seconds
[22:15:30] <CustosLimen> import with mongoimport
[22:16:12] <CustosLimen> stalled for 2 minutes now
[22:22:23] <CustosLimen> very concerning
[23:08:26] <GothAlice> CustosLimen: Is it building an index?
[23:08:39] <GothAlice> CustosLimen: Generally a good idea to look at the server logs to see what's going on, too.
[23:09:24] <CustosLimen> GothAlice, I was looking at mongotop
[23:09:30] <CustosLimen> mongostat I mean
[23:09:34] <CustosLimen> https://bpaste.net/show/e6ed383b5663
[23:10:05] <CustosLimen> GothAlice, look at line 690 onwards
[23:10:10] <CustosLimen> there is a stall
[23:10:43] <CustosLimen> will check logs later
[23:10:59] <CustosLimen> they are rather noisy
[23:11:19] <CustosLimen> but should indexes not be built inline with import ?
[23:11:57] <CustosLimen> what bothers me is if this happens in production use it will be disaster
[23:13:49] <GothAlice> You regularly run parallel, aggressively batched, unlimited rate multi-collection bulk inserts in production?
[23:14:09] <CustosLimen> no - but I don't understand why it's doing what it's doing ;)
[23:14:55] <CustosLimen> GothAlice, look, I'm not saying from this I can assume it will happen in production use cases - but without understanding why this is happening I will remain worried
[23:15:04] <CustosLimen> but yeah, will spend more time on it tomorrow
[23:15:12] <CustosLimen> it's 01:00 here
[23:16:33] <GothAlice> … logs.
[23:16:47] <CustosLimen> GothAlice, yeah yeah ;) tomorrow
[23:25:17] <beryl> hello
[23:26:43] <beryl> now that mongodb is using WiredTiger and there is no limit on the number of collections, is it a bad pattern to create something like one collection per user?
[23:27:15] <beryl> in my company we would like our users to have something like a bucket of data
[23:27:15] <GothAlice> beryl: Consider: do you ever want to be able to ask a question across multiple users' data at once?
[23:27:58] <beryl> no, every collection is self contained, no "joins"
[23:28:05] <GothAlice> Any time I hear someone wanting to do something just because some limit has been lifted I'm reminded of: http://s.webcore.io/d41m
[23:28:33] <beryl> haha i get that
[23:28:57] <beryl> but let's asume that i want every user to have some sort of a bucket
[23:29:09] <GothAlice> And I mean question in a more general sense than "supply answers to your users questions". I mean the company's questions, too.
[23:29:12] <beryl> where he can put anything
[23:29:39] <GothAlice> "Anything" — in terms of database design, you've already lost. :|
[23:30:32] <beryl> well let's say some json, size limited
[23:31:19] <beryl> the user can only list the content of its bucket and add an item to it
[23:31:47] <GothAlice> So… how does adding a field to the data referencing the user who owns it not satisfy that requirement?
[23:33:04] <freeone3000> beryl: You mean like {"user": "user", "data": { } }?
[23:33:11] <freeone3000> beryl: That can already exist in an existing collection.
[23:33:42] <GothAlice> (Dynamic fields being a _terrible_ choice with _substantial_ ramifications, but that's a different issue than tracking who owns what.)
[23:33:45] <beryl> it totally satisfies it, but out of curiosity and open-mindedness, is it reasonable to create a collection per user and if not why exactly? that's the question that i can't an answer
[23:34:09] <beryl> find*
[23:34:48] <GothAlice> Split index building, so extra overhead there. On-disk stripe allocation… so, more overhead there. It gives a false sense of data isolation; you can, without some careful rain dancing, easily switch from one collection to another, making the isolation entirely virtual.
[23:35:38] <GothAlice> And you lose on any ability to correlate data. Even if your users don't need it, your company might want to be able to ask questions of the data that goes beyond "one user at a time". I.e. if you need to find out which user has X data, instead of one query that'll be indexed and _blindingly_ fast, you'll need to issue N queries, one per user.
[23:36:18] <GothAlice> For real separation, and the additional hardware costs that actually involves, you can use sharding with a sharding key per user.
[23:36:32] <GothAlice> Er, rather, a sharding key referencing a unique value per user. Like their ID, or username.
[23:36:57] <GothAlice> Then user A's data is on shard A, user B's data is on shard B, etc.
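Roughly, that arrangement in the shell; the database, collection and key names are placeholders:

    sh.enableSharding("appdb")
    // range-shard on the owner so each user's documents cluster on one shard
    sh.shardCollection("appdb.buckets", { owner: 1 })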
[23:37:20] <beryl> even though it is the same collection, ok i get it
[23:37:47] <beryl> I was afraid that because one user will have a lot of data, it would slow down every other user
[23:38:13] <GothAlice> With proper indexes, i.e. including the owner field as a first field in any compound index, it'll stay fast.
[23:38:32] <GothAlice> Though MongoDB can also do index intersections these days.
[23:38:44] <GothAlice> Sorry! Multi-key, not compound.
[23:38:48] <GothAlice> https://docs.mongodb.org/manual/core/index-multikey/
[23:39:20] <GothAlice> … I either need more sleep, or more caffeine, or both. XD
[23:39:29] <GothAlice> https://docs.mongodb.org/manual/core/index-compound/
[23:39:39] <beryl> hahaha
[23:40:54] <beryl> ok, so sharding by user id, multi-key index on the user id and the _id of the record?
[23:42:52] <GothAlice> Sharding by owner is one form of "true" data isolation (that still allows for broader queries), including the owner as a prefix to your compound indexes keeps the queries fast (since basically every query will need to include the owner). "True" isolation that doesn't allow for broader queries is to split into multiple databases and use a connection pool per database, with a MongoDB-level user account per database.
[23:43:41] <GothAlice> If you're doing a lookup by _id, then the owner check is a security measure, not a method of making the query more specific. _id is automatically indexed.
[23:44:32] <GothAlice> For example, if you have a "name" field (just as an example ;) which you want to index, you'd actually index {owner: 1, name: 1} — then queries like db.foo.find({owner: "GothAlice", name: "Bob Dole"}) will be fast.
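In shell form, the index and the query it serves, using the same example names:

    db.foo.createIndex({ owner: 1, name: 1 })
    db.foo.find({ owner: "GothAlice", name: "Bob Dole" })   // served by the compound index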
[23:44:42] <beryl> I don't need true isolation, i just don't want a user with 100 000 records to slow down users with 10
[23:45:05] <beryl> ok
[23:45:26] <GothAlice> beryl: We generate a million log events per month, each user only seeing a fraction of those on their dashboard. Such concern is unfounded, generally.
[23:46:03] <GothAlice> (As long as you have good indexes. Performance, regardless of this isolation problem, almost always comes down to the indexes.)
[23:46:22] <beryl> ok perfect :)
[23:47:22] <beryl> thank you very much for your answers
[23:48:05] <GothAlice> beryl: Alice's Law 146: Optimization without measurement is by definition premature. [https://gist.github.com/amcgregor/9f5c0e7b30b7fc042d81] Turning a collection structured "naturally" (with owner references) into a sharded collection or other arrangement is relatively trivial. Merging a metric ton of individual collections into one after realizing it just makes everything harder… is much, much more painful, too.
[23:49:05] <beryl> haha, now i get that
[23:52:06] <GothAlice> I totally need a bot so I can just reference a law by number and have it pull it up. XD
[23:54:08] <beryl> that would actually be awesome
[23:54:15] <beryl> i would be like a judge
[23:54:32] <beryl> you would be like a judge
[23:54:41] <GothAlice> Well, someone would be, at least. :P
[23:59:57] <chalker_> Hi all. A quick question — is christkv reachable via IRC?