PMXBOT Log file Viewer

#pypa-dev logs for Thursday the 2nd of June, 2016

[18:12:19] <robhudson> any sprinting happening on warehouse at pycon?
[18:16:54] <carljm> robhudson: yes, I heard it mentioned, but I forget who... I think maybe it was happening alongside the core CPython sprint? Maybe?
[18:31:01] <dstufft> robhudson: I'm not at PyCon so I don't know if anyone is doing anything there, I am however around on IRC and GitHub and what not
[18:31:13] <dstufft> and can do other mediums if that's useful to people
[18:31:57] <pumazi_> dstufft: We are working on warehouse at pycon. Just a brief FYI
[18:32:20] <dstufft> pumazi_: hi!
[18:32:34] <pumazi> Hiya. :)
[18:32:36] <dstufft> pumazi_: let me just reiterate what I just told robhudson before you came in then :)
[18:32:52] <dstufft> pumazi: I'm not at PyCon so I don't know if anyone is doing anything there, I am however around on IRC and GitHub and what not and I can do other mediums if that's useful to people
[18:32:52] <karenc> robhudson: we're sprinting on warehouse in portland ballroom 252
[18:33:55] <pumazi> dstufft: Thanks. That's good to know. We are currently working our way through the installation and setup.
[18:34:55] <pumazi> dstufft: We are noticing some differences between the setup in the docs vs the setup in the readme. Which would you recommend we use/update?
[18:35:17] <dstufft> umm
[18:35:38] <dstufft> I think the README is up to date
[18:35:50] <dstufft> the docs should have the same info, but maybe diverged a bit
[18:38:12] <robhudson> pumazi: cool, is there a paper? I did a pass through earlier and didn’t find anyone.
[18:38:31] <robhudson> pumazi: I’m comfortable with elasticsearch so I was looking for ES bugs or features if there are any
[18:39:02] <robhudson> I was just looking at https://github.com/pypa/warehouse/issues/328 e.g.
[18:39:27] <dstufft> robhudson: awesome!
[18:39:34] <dstufft> I am not very good at ES (or search)
[18:39:59] <robhudson> an ES prefix query would solve that use case. If you need substring search within strings it gets more complicated.
[18:41:00] <pumazi> robhudson: We have a paper on the table in the room if you want to stop by. :)
[18:42:06] <robhudson> pumazi: cool. I’ll finish doing some set up and come by in a bit. Thanks.
[18:43:58] <dstufft> robhudson: re warehouse#328 - We don't need the most perfect implementation ever- if prefix is easy to do and add in then that's a pretty good cut and we can wait to see if people want more than that later.
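For reference, the "prefix is easy to do" option for warehouse#328 boils down to a single Elasticsearch prefix query. The sketch below builds that query body as a plain Python dict; the field name `name` and the assumption that it is indexed as a lowercased single token are illustrative guesses, not Warehouse's actual mapping.

```python
from typing import Any, Dict


def build_prefix_query(term: str, field: str = "name") -> Dict[str, Any]:
    """Return an Elasticsearch query body matching documents whose `field`
    starts with `term`.

    The default field "name" is an assumption for illustration.
    """
    # Prefix queries are term-level queries and are not analyzed, so the
    # input is lowercased here on the assumption that the indexed field is
    # stored lowercased as a single token.
    return {"query": {"prefix": {field: term.lower()}}}


print(build_prefix_query("Djan"))
# {'query': {'prefix': {'name': 'djan'}}}
```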
[18:44:24] <robhudson> dstufft: awesome. I’ll start there.
[18:46:38] <dstufft> pumazi: hopefully setup shouldn't be too hard- the docker containers are meant to make it easy once you get docker installed, but I never know if it actually is or if that's just me already knowing how to do that
[19:15:45] <robhudson> once the docker is built is there a way to populate it with some data?
[19:18:49] <pumazi> robhudson: It looks like `make initdb` does that, but I haven't gotten that far yet.
[19:22:04] <robhudson> dstufft: I’m looking at the search tests but they only contain the “summary” field for search queries. I don’t see how that is being limited in the tests. E.g. if I only apply prefix query to “name” I would need to assert that somehow.
[19:23:31] <dstufft> yea `make initdb` should populate it
[19:23:54] <dstufft> robhudson: the search tests might just not be very complete- I don't think they hit ES so they're probably hitting constructed responses
[19:24:22] <robhudson> dstufft: ah, I found it. It’s part of the xmlrpc search request which fields to search over.
[19:24:46] <dstufft> oh you're looking at the xmlrpc tests?
[19:25:01] <dstufft> we have the XMLRPC search and the web search
[19:25:08] <dstufft> the xmlrpc search is a bit... special
[19:26:06] <robhudson> oh, I grepped for search and all I found was the xmlrpc one. Let me look for the web search.
[19:27:42] <dstufft> https://github.com/pypa/warehouse/blob/master/warehouse/views.py#L142-L201
[19:28:56] <robhudson> thanks
[19:29:30] <robhudson> do xmlrpc and web search maintain parity?
[19:30:32] <dstufft> they both use the same index, but they diverge a bit I guess in their implementation
[19:31:43] <dstufft> a lot of that is because the xmlrpc search is an API so it has to maintain compatibility in terms of what options it supports from the original search function like a decade ago :)
[19:32:32] <robhudson> I’m sorry :)
[19:33:22] <dstufft> (I also don't care a whole lot about the xmlrpc API, it's a bad API and we shouldn't have it, and I hope to someday kill it)
[19:37:05] <robhudson> is there an easy way to tweak the logging config?
[19:38:03] <dstufft> robhudson: you'd have to change the code
[19:38:22] <dstufft> https://github.com/pypa/warehouse/blob/master/warehouse/logging.py#L62-L88
[19:43:41] <pumazi> Any chance you could use `pyramid_sawing` for logging configuration?
[19:45:14] <dstufft> pumazi: never heard of it, though googling the main thing I'd be worried about is the fact it's a config file- we don't have any other config files in warehouse atm (it's all env vars)
[19:48:10] <pumazi> Ok, that makes sense. I'm the author of that package BTW.
[19:49:10] <robhudson> this project is set up very well. kudos.
[19:54:39] <dstufft> robhudson: thanks :]
[19:54:57] <dstufft> pumazi: ah cool :] We deploy on Heroku atm (though that might change), hence the env var configuration
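Since Warehouse takes its configuration from environment variables rather than config files, one way to make logging tweakable without editing code is to read a level from the environment and feed it to `logging.config.dictConfig`. This is a minimal sketch only: the `LOG_LEVEL` variable name and the handler/formatter layout are hypothetical and not taken from Warehouse's `logging.py`.

```python
import logging
import logging.config
import os
from typing import Mapping


def configure_logging(environ: Mapping[str, str] = os.environ) -> None:
    """Configure the root logger from an environment variable.

    LOG_LEVEL is a hypothetical variable name used for illustration.
    """
    # Fall back to INFO when the variable is unset; level names are
    # case-insensitive here because we uppercase them.
    level = environ.get("LOG_LEVEL", "INFO").upper()
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "plain": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
        },
        "handlers": {
            # StreamHandler writes to stderr by default.
            "console": {"class": "logging.StreamHandler", "formatter": "plain"},
        },
        "root": {"level": level, "handlers": ["console"]},
    })
```

Passing the mapping in explicitly keeps the function easy to test, e.g. `configure_logging({"LOG_LEVEL": "debug"})`.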
[20:09:28] <robhudson> dstufft: do you have any numbers on package popularity?
[20:09:53] <dstufft> robhudson: the closest thing we have to that is download stats, which aren't wired up into Warehouse yet
[20:10:04] <robhudson> I notice if I search for “django”, e.g., that Django isn’t at the top. I imagine we could boost exact matches, or also boost by some popularity measure.
[20:10:28] <dstufft> nlh_: btw, folks are sprinting on Warehouse at PyCon :]
[20:10:51] <dstufft> robhudson: yea- We have stuff available in a BigQuery db, but not integrated into Warehouse yet, other than the File.downloads model
[20:10:53] <nlh_> dstufft: Hoorah!!!!
[20:11:20] <robhudson> dstufft: ok, I may file that as a separate issue. Until stats are incorporated an exact match may help.
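Once download stats do land in the index, the "boost by some popularity measure" idea maps naturally onto Elasticsearch's `function_score` query with `field_value_factor`. The sketch below wraps any base query that way; the numeric `downloads` field is hypothetical, since per the discussion the stats are not in Warehouse's index yet.

```python
from typing import Any, Dict


def with_popularity_boost(base_query: Dict[str, Any],
                          field: str = "downloads") -> Dict[str, Any]:
    """Wrap an Elasticsearch query so more-downloaded projects score higher.

    "downloads" is a hypothetical numeric field used for illustration.
    """
    return {
        "function_score": {
            "query": base_query,
            "field_value_factor": {
                "field": field,
                # log2p computes log10(2 + value): it dampens huge counts and
                # stays positive at 0, so projects without stats keep a
                # nonzero score under boost_mode "multiply".
                "modifier": "log2p",
                # Treat documents missing the field as having zero downloads.
                "missing": 0,
            },
            # Multiply the text-relevance score by the popularity factor.
            "boost_mode": "multiply",
        }
    }
```

A plain `log1p` modifier would be a trap here: it yields 0 for zero downloads, which multiplied into the score would hide every project without stats.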
[20:11:36] <dstufft> robhudson: there is https://github.com/pypa/warehouse/pull/1182 which adds it to the project level, where you'd get total downloads over total length of time the thing has existed, but there's some issues with that still
[21:53:40] <robhudson> is `first, *_ = foo.split()` a python3 syntax? I’ve not seen this before.
[21:53:52] <dstufft> um
[21:53:58] <dstufft> I think it's py3 syntax yea
[21:54:37] <dstufft> yea, py3 only
[21:54:54] <dstufft> it's basically the same as *args in a function signature
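The syntax in question is extended iterable unpacking (PEP 3132, Python 3 only). A short illustration, with a made-up input string:

```python
# Extended iterable unpacking (PEP 3132): the starred target collects all
# remaining items into a list, much like *args collects extra positional
# arguments in a function signature.
line = "warehouse is the next-generation PyPI"

first, *_ = line.split()                  # first word; the rest goes into a throwaway list
first_word, *middle, last_word = line.split()  # the star can also sit in the middle

# A Python 2 compatible way to get just the first word:
first_py2 = line.split()[0]

print(first, middle, last_word)
# warehouse ['is', 'the', 'next-generation'] PyPI
```

Both spellings fail on an empty string: the unpacking raises `ValueError`, the indexing raises `IndexError`.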
[21:58:39] <robhudson> dstufft: in order to work in a prefix query, I think I’m going to need to rewrite the query building in the view and not use a multi_match query. I guess the good thing is it will allow for easy customization of the search query down the road. But just want to make sure that’s fine.
[21:58:55] <robhudson> It’d actually look more like how the xmlrpc search is built up
[21:59:27] <dstufft> robhudson: I'm not sure I understand what that means :) but I'm certainly not married to whatever it does now
[22:00:27] <dstufft> I forget if I added the multi query or if someone else did that, I know that when I added it I was just guessing at what the right thing to do was :)
[22:02:59] <robhudson> ok cool. I’ll ensure tests pass after the refactor. And then add in the prefix query. And possibly the name exact match. I’ll see how far I get. :)
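Replacing the single `multi_match` with an explicitly built `bool` query is what makes it easy to bolt on the prefix and exact-name clauses discussed above. The sketch below shows that shape; the field names (`name`, `summary`, `description`) and the boost values are assumptions, not Warehouse's real mapping or tuning.

```python
from typing import Any, Dict, List


def build_search_query(terms: str) -> Dict[str, Any]:
    """Build a bool query combining full-text, prefix and exact-name clauses.

    Field names and boosts are illustrative assumptions.
    """
    normalized = terms.lower()
    should: List[Dict[str, Any]] = [
        # Full-text relevance across several fields (what a lone multi_match did).
        {"multi_match": {"query": terms,
                         "fields": ["name^2", "summary", "description"]}},
        # Prefix match on the project name, e.g. "djan" -> "django".
        {"prefix": {"name": {"value": normalized, "boost": 2}}},
        # Large boost for an exact name match so "django" ranks Django first.
        # Assumes `name` is indexed as one lowercased token; on an analyzed
        # field this would also boost names that merely contain the token.
        {"term": {"name": {"value": normalized, "boost": 10}}},
    ]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}
```

Because each clause is a separate dict, a test can assert that the prefix clause targets only `name` without hitting Elasticsearch.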
[22:07:36] <dstufft> robhudson: basically, the thing I care about is search is awesome for people, how that's achieved I care less about as long as it works in Heroku :]
[22:08:25] <robhudson> dstufft: ok. Are you using a heroku add-on for ES?
[22:08:37] <robhudson> bonsai? or something else?
[22:08:46] <dstufft> robhudson: not an addon, but we are using Elastic Cloud
[22:11:14] <robhudson> ok cool
[22:11:34] <robhudson> as long as the versions align with what’s in docker it’ll all work
[22:12:57] <dstufft> I think we have the docker versions set to be the same X.Y
[22:13:02] <dstufft> but maybe not the same X.Y.Z
[22:14:12] <dstufft> although I guess we have 2.2 and 2.3 is the latest
[22:14:16] <dstufft> so maybe we should upgrade
[22:15:06] <robhudson> dstufft: can you show me where `request.es` is set up? I don’t see where that gets attached.
[22:16:26] <dstufft> request.es == the return of calling https://github.com/pypa/warehouse/blob/master/warehouse/search.py#L40-L51
[22:16:36] <dstufft> (once per request)
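"Once per request" describes a lazily computed, cached request attribute. Pyramid supports this via `config.add_request_method(callable, name=..., reify=True)`; the stand-alone sketch below only imitates that behaviour with `functools.cached_property` (Python 3.8+) so it runs without Pyramid. `FakeRequest` and `make_client` are hypothetical names, not Warehouse code.

```python
from functools import cached_property
from typing import Any, Callable, List


class FakeRequest:
    """Stand-in for a Pyramid request; illustrative only."""

    def __init__(self, client_factory: Callable[["FakeRequest"], Any]) -> None:
        self._client_factory = client_factory

    @cached_property
    def es(self) -> Any:
        # The factory runs on first access; later accesses on the same
        # request return the cached object, like a reified request method.
        return self._client_factory(self)


calls: List[int] = []


def make_client(request: FakeRequest) -> object:
    # Hypothetical factory; a real one would build an Elasticsearch client.
    calls.append(1)
    return object()


req = FakeRequest(make_client)
assert req.es is req.es      # same object within one request
other = FakeRequest(make_client)
other.es                     # a new request builds its own client
print(len(calls))
# 2
```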
[22:17:42] <dstufft> robhudson: by the way Dustin Ingram has done a bunch of work on the search
[22:18:48] <dstufft> (he honestly probably understands it better than I do)
[22:18:53] <dstufft> I think he's also at PyCon :]
[22:19:12] <dstufft> (or maybe he left already, I don't know)
[22:39:45] <dstufft> robhudson: I'm talking to Dustin, he's asking if you're using the dev DB and getting weird results because of that (if you're running locally, you're using the dev db)
[22:48:13] <robhudson> dstufft: yeah I’m using the dev db.
[22:48:27] <robhudson> dstufft: is there an instance up and running that is using real data?
[23:21:13] <dstufft> robhudson: pypi.io
[23:21:28] <dstufft> robhudson: but, I can send you a copy of the real DB (that has been cleaned so as not to have private info in it)
[23:21:36] <dstufft> that's probably useful if you're working on search
[23:29:50] <robhudson> dstufft: +1
[23:30:15] <dstufft> I'm pulling down a copy now, then I'll run my little cleaner script and send it to you
[23:30:57] <dstufft> my only real ask is that you don't be a jerk- if you find something sensitive I forgot to clean, let me know so I can fix my script and trash your copy of the data plz
[23:31:48] <robhudson> sure thing
[23:34:14] <dstufft> once you get it, you can drop it in dev/ as like, dev/prod.sql.xz and then do something like ``make initdb DB=prod`` (you'll want to have run something like ``docker-compose rm`` or ``make purge`` first... the second one of those will take a lot longer to rebuild though).
[23:34:46] <dstufft> oh
[23:34:47] <dstufft> nvm
[23:34:51] <dstufft> initdb does the right thing already
[23:34:59] <dstufft> so just ``make initdb DB=prod`` or so :D
[23:37:56] <robhudson> great
[23:54:20] <dstufft> clean script takes so long QQ