[18:32:36] <dstufft> pumazi_: let me just reiterate what I just told robhudson before you came in then :)
[18:32:52] <dstufft> pumazi: I'm not at PyCon so I don't know if anyone is doing anything there, I am however around on IRC and GitHub and what not and I can do other mediums if that's useful to people
[18:32:52] <karenc> robhudson: we're sprinting on warehouse in portland ballroom 252
[18:33:55] <pumazi> dstufft: Thanks. That's good to know. We are currently working our way through the installation and setup.
[18:34:55] <pumazi> dstufft: We are noticing some differences between the setup in the docs vs the setup in the readme. Which would you recommend we use/update?
[18:39:34] <dstufft> I am not very good at ES (or search)
[18:39:59] <robhudson> an ES prefix query would solve that use case. If you need substring search within strings it gets more complicated.
[18:41:00] <pumazi> robhudson: We have a paper on the table in the room if you want to stop by. :)
[18:42:06] <robhudson> pumazi: cool. I’ll finish doing some set up and come by in a bit. Thanks.
[18:43:58] <dstufft> robhudson: re warehouse#328 - We don't need the most perfect implementation ever- if prefix is easy to do and add in then that's a pretty good cut and we can wait to see if people want more than that later.
[18:46:38] <dstufft> pumazi: hopefully setup shouldn't be too hard- the docker containers are meant to make it easy once you get docker installed, but I never know if it actually is or if that's just me already knowing how to do that
[19:15:45] <robhudson> once the docker is built is there a way to populate it with some data?
[19:18:49] <pumazi> robhudson: It looks like `make initdb` does that, but I haven't gotten that far yet.
[19:22:04] <robhudson> dstufft: I’m looking at the search tests but they only contain the “summary” field for search queries. I don’t see how that is being limited in the tests. E.g. if I only apply prefix query to “name” I would need to assert that somehow.
[19:23:31] <dstufft> yea `make initdb` should populate it
[19:23:54] <dstufft> robhudson: the search tests might just not be very complete- I don't think they hit ES so they're probably hitting constructed responses
[19:24:22] <robhudson> dstufft: ah, I found it. It’s part of the xmlrpc search request which fields to search over.
[19:24:46] <dstufft> oh you're looking at the xmlrpc tests?
[19:25:01] <dstufft> we have the XMLRPC search and the web search
[19:25:08] <dstufft> the xmlrpc search is a bit... special
[19:26:06] <robhudson> oh, I grepped for search and all I found was the xmlrpc one. Let me look for the web search.
[19:29:30] <robhudson> do xmlrpc and web search maintain parity?
[19:30:32] <dstufft> they both use the same index, but they diverge a bit I guess in their implementation
[19:31:43] <dstufft> a lot of that is because the xmlrpc search is an API so it has to maintain compatibility in terms of what options it supports from the original search function like a decade ago :)
[19:43:41] <pumazi> Any chance you could use `pyramid_sawing` for logging configuration?
[19:45:14] <dstufft> pumazi: never heard of it, though googling the main thing I'd be worried about is the fact it's a config file- we don't have any other config files in warehouse atm (it's all env vars)
[19:48:10] <pumazi> Ok, that makes sense. I'm the author of that package BTW.
[19:49:10] <robhudson> this project is set up very well. kudos.
[19:54:57] <dstufft> pumazi: ah cool :] We deploy on Heroku atm (though that might change), hence the env var configuration
[20:09:28] <robhudson> dstufft: do you have any numbers on package popularity?
[20:09:53] <dstufft> robhudson: the closest thing we have to that is download stats, which aren't wired up into Warehouse yet
[20:10:04] <robhudson> I notice if I search for “django”, e.g., that Django isn’t at the top. I imagine we could boost exact matches, or also boost by some popularity measure.
[20:10:28] <dstufft> nlh_: btw, folks are sprinting on Warehouse at PyCon :]
[20:10:51] <dstufft> robhudson: yea- We have stuff available in a BigQuery db, but not integrated into Warehouse yet, other than the File.downloads model
[20:11:20] <robhudson> dstufft: ok, I may file that as a separate issue. Until stats are incorporated an exact match may help.
[20:11:36] <dstufft> robhudson: there is https://github.com/pypa/warehouse/pull/1182 which adds it to the project level, where you'd get total downloads over total length of time the thing has existed, but there's some issues with that still
[21:53:40] <robhudson> is `first, *_ = foo.split()` a python3 syntax? I’ve not seen this before.
[21:54:54] <dstufft> it's basically the same as *args in a function signature
[21:58:39] <robhudson> dstufft: in order to work in a prefix query, I think I’m going to need to rewrite the query building in the view and not use a multi_match query. I guess the good thing is it will allow for easy customization of the search query down the road. But just want to make sure that’s fine.
[21:58:55] <robhudson> It’d actually look more like how the xmlrpc search is built up
[21:59:27] <dstufft> robhudson: I'm not sure I understand what that means :) but I'm certainly not married to whatever it does now
[22:00:27] <dstufft> I forget if I added the multi query or if someone else did that, I know that when I added it I was just guessing at what the right thing to do was :)
[22:02:59] <robhudson> ok cool. I’ll ensure tests pass after the refactor. And then add in the prefix query. And possibly the name exact match. I’ll see how far I get. :)
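A rough sketch of the kind of query body robhudson describes: the existing multi_match clause kept alongside a prefix clause and a boosted exact match on the name. This is not Warehouse's actual code. The function name, the field list, the lowercasing, and the boost value are all assumptions, and the body is built as a plain dict so it runs without an Elasticsearch client.

```python
def build_search_query(terms, fields=("name", "summary", "description")):
    """Return an Elasticsearch query body (a plain dict) for a search string.

    Hypothetical helper: field names and the boost value are assumptions.
    """
    lowered = terms.lower()  # assumes the indexed name terms are lowercased
    return {
        "query": {
            "bool": {
                "should": [
                    # Full-text match across several fields.
                    {"multi_match": {"query": terms, "fields": list(fields)}},
                    # Prefix match on the project name, e.g. "djan" finds "django".
                    {"prefix": {"name": {"value": lowered}}},
                    # Exact name match, boosted so it ranks above partial hits.
                    {"term": {"name": {"value": lowered, "boost": 10}}},
                ],
                # At least one of the clauses above has to match.
                "minimum_should_match": 1,
            }
        }
    }


body = build_search_query("Djan")
print(body["query"]["bool"]["should"][1])
```

Composing the clauses under a `bool` query keeps each piece separately adjustable, which is the customization benefit mentioned above; the right boost would need tuning against real data.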
[22:07:36] <dstufft> robhudson: basically, the thing I care about is that search is awesome for people, how that's achieved I care less about as long as it works in Heroku :]
[22:08:25] <robhudson> dstufft: ok. Are you using a heroku add-on for ES?
[22:17:42] <dstufft> robhudson: by the way Dustin Ingram has done a bunch of work on the search
[22:18:48] <dstufft> (he honestly probably understands it better than I do)
[22:18:53] <dstufft> I think he's also at PyCon :]
[22:19:12] <dstufft> (or maybe he left already, I don't know)
[22:39:45] <dstufft> robhudson: I'm talking to Dustin, he's asking if you're using the dev DB and getting weird results because of that (if you're running locally, you're using the dev db)
[22:48:13] <robhudson> dstufft: yeah I’m using the dev db.
[22:48:27] <robhudson> dstufft: is there an instance up and running that is using real data?
[23:30:15] <dstufft> I'm pulling down a copy now, then I'll run my little cleaner script and send it to you
[23:30:57] <dstufft> my only real ask is that you don't be a jerk- if you find something sensitive I forgot to clean, let me know so I can fix my script and trash your copy of the data plz
[23:34:14] <dstufft> once you get it, you can drop it in dev/ as like, dev/prod.sql.xz and then do something like ``make initdb DB=prod`` (you'll want to have run something like ``docker-compose rm`` or ``make purge`` first; the second one of those will take a lot longer to rebuild, though).