PMXBOT Log file Viewer


#pypa-dev logs for Tuesday the 2nd of June, 2020

[00:03:25] <dalley> I know the XML-RPC APIs (https://warehouse.readthedocs.io/api-reference/xml-rpc/#mirroring-support) are marked deprecated, but is there any idea of the timeline on which they will be removed? and when they are removed, will the JSON APIs be extended to cover some of the lost functionality?
[00:12:38] <dalley> background: we write repository management software for Python, Ruby, Debian, RPM repos and so forth, our Python frontend needs to be able to mirror from PyPI, and it appears the only straightforward way to do this is through the XML-RPC APIs
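For context, the mirroring support dalley links to is PyPI's serial-based changelog, reachable with the stdlib XML-RPC client. A minimal sketch follows; the method names come from the Warehouse XML-RPC docs linked above, while `packages_to_resync` is an invented helper for illustration (changelog entries are tuples whose first element is the project name):

```python
# Sketch of PyPI's XML-RPC mirroring support (deprecated, per the docs
# linked above). Network calls are kept behind the __main__ guard.
import xmlrpc.client

PYPI = "https://pypi.org/pypi"

def packages_to_resync(events):
    """Given changelog entries (name, version, timestamp, action, serial),
    return the unique set of project names a mirror needs to re-fetch."""
    return {event[0] for event in events}

if __name__ == "__main__":
    client = xmlrpc.client.ServerProxy(PYPI)
    serial = client.changelog_last_serial()               # mirror high-water mark
    events = client.changelog_since_serial(serial - 100)  # recent change events
    print(packages_to_resync(events))
```

A mirror stores the last serial it processed and periodically asks for everything since; this is essentially the loop bandersnatch runs.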
[01:01:33] <sumanah> dalley: hi - saw your question
[01:02:07] <sumanah> dalley: so, right now, as far as I know, there's no timeline for the Warehouse API overhaul. We're seeking funding for it https://wiki.python.org/psf/Fundable%20Packaging%20Improvements#Revamp_PyPI_API
[01:02:35] <sumanah> dalley: you mentioned "we write repository management software" - who's the we, if I may ask? :-)
[01:03:53] <sumanah> dalley: you'll probably want to subscribe to some individual GitHub issues for updates on the deprecation and addition of particular bits of the Warehouse API, such as https://github.com/pypa/warehouse/issues/284 , https://github.com/pypa/warehouse/issues/7730 , and https://github.com/pypa/warehouse/issues/1393
[01:05:21] <dalley> sumanah, re: "we" https://pulpproject.org/
[01:07:54] <sumanah> oh got it!
[01:08:06] <sumanah> hi Pulp :-)
[01:09:04] <sumanah> dalley: you mentioned mirroring from PyPI.
[01:09:18] <sumanah> dalley: this may be obtuse of me, but - bandersnatch doesn't work for you? to help do that?
[01:09:34] <sumanah> https://packaging.python.org/key_projects/#bandersnatch
[01:14:01] <sumanah> heading off - if you would like a longer conversation, https://mail.python.org/archives/list/distutils-sig@python.org/ is good
[14:38:28] <dalley> sumanah, sorry I didn't get a chance to respond yesterday :)
[14:39:44] <dalley> the tl;dr of why we can't just use bandersnatch is that our backend is totally different. we focus a lot on the actual "repository management" use cases and so we ingest everything into the database rather than saving flat files in a tree structure in a particular directory
[14:41:49] <cooperlees> dalley: Austin and I tried very hard to get new APIs started ...
[14:41:58] <cooperlees> As in asmacdo ...
[14:42:13] <cooperlees> Was a lot of bike shedding :(
[15:12:04] <dalley> yeah, I can imagine :/
[15:39:45] <techalchemy> dalley, same story here basically
[15:40:24] <techalchemy> cooperlees, any reason not to just ingest metadata into a database and generate index files from there?
[15:41:15] <cooperlees> I have thought about sqlite or any DB support.
[15:41:15] <cooperlees> But originally I wanted to keep bandersnatch's dependencies down
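The "ingest metadata into a database, generate index files from there" idea techalchemy floats could look roughly like this. A sketch only: the two-column-plus schema and function names are invented for illustration and are not bandersnatch's actual storage layout; the rendered page follows the PEP 503 "simple" index shape:

```python
# Sketch: ingest release metadata into SQLite and render a PEP 503
# "simple" index page from it. Schema and helpers are hypothetical.
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE releases (project TEXT, filename TEXT, url TEXT)")
    return db

def ingest(db, project, filename, url):
    # Called once per file during a mirror sync pass.
    db.execute("INSERT INTO releases VALUES (?, ?, ?)", (project, filename, url))

def simple_index_page(db, project):
    """Render the per-project page of a PEP 503 simple index from the DB."""
    rows = db.execute(
        "SELECT filename, url FROM releases WHERE project = ? ORDER BY filename",
        (project,),
    ).fetchall()
    links = "\n".join(f'<a href="{url}">{fn}</a><br/>' for fn, url in rows)
    return f"<!DOCTYPE html><html><body>\n{links}\n</body></html>"

db = make_db()
ingest(db, "pulpcore", "pulpcore-3.3.0-py3-none-any.whl",
       "https://example.invalid/pulpcore-3.3.0-py3-none-any.whl")
print(simple_index_page(db, "pulpcore"))
```

With this split, the files can live in any storage backend while the index pages are generated (or cached) on demand, which is the design being discussed below.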
[15:42:22] <dalley> techalchemy, you're working on something like this as well?
[15:42:33] <techalchemy> dalley, yep
[15:43:27] <dalley> public project or internal?
[15:44:38] <cooperlees> Then you can thrash that all you want
[15:44:39] <techalchemy> dalley: i'll say something like, the plan is to upstream and/or open source as much as possible
[15:45:00] <cooperlees> RedHat vs. Canonical! :O
[15:45:21] <dalley> cooperlees, hey, I'd settle with just making it easier to use bandersnatch as a library, to get the stream of data and then the backend can handle it however it wants
[15:45:28] <dalley> lololol
[15:45:35] <cooperlees> dalley: I've seen no issues open to ask for this :)
[15:45:40] <cooperlees> How am I meant to know this is a want
[15:46:07] <dalley> cooperlees, I only started thinking of it about 12 hours ago :) I can totally file an issue
[15:46:09] <techalchemy> cooperlees, https://github.com/cooperlees/banderfront <- basically just needs a database
[15:46:56] <cooperlees> happy to evolve on a more usable idea - banderfront was just my way to allow people to use object stores and not need static HTML files
[15:47:07] <techalchemy> then you only store files in <storage backend> and tell the database about them during the sync
[15:47:22] <cooperlees> and cache HTML in redis or cache of choice?
[15:47:49] <cooperlees> Or even directly from DB if you want I guess
[15:48:08] <techalchemy> i mean, i happen to be ok with the design of banderfront as is as long as you can point it at a db
[15:50:04] <techalchemy> dalley, you're with RH? I wonder how many teams are working on repository management at rh
[15:50:44] <dalley> I work on Pulp, yeah
[15:52:29] <techalchemy> hm yeah redhat has a lot of teams working on some variation of repositories/dependencies/depsolving
[15:57:25] <dalley> I know there's Pulp, Quay (for containers, came from an acquisition I'm pretty sure), and one tool that's just a stripped down fork of Pulp with a more specific target use case
[15:58:46] <dalley> and Spacewalk, which has been winding down for a while
[16:01:06] <cooperlees> dalley: Is @werwty still on pulp? If so, tell her I say hi!
[16:01:20] <dalley> she's not but I will anyways :)
[16:01:44] <cooperlees> I got to help her do her first asyncio commit :D
[16:01:45] <cooperlees> https://github.com/cooperlees/pypi-api/commit/16858c230dc9cb07fba25806f8e5919dad88aa78
[16:02:21] <cooperlees> I had to look it up to remember her name
[16:02:21] <cooperlees> haha
[16:03:13] <cooperlees> That was the proposed async client to the new paginated JSON API
[16:03:47] <cooperlees> pagination for CDN friendliness
[16:23:44] <techalchemy> dalley, i know the folks who work on https://github.com/thoth-station?type=source somewhat, and I know some folks who work on bits of like dnf and whatnot
[16:24:12] <techalchemy> nobody that well
[16:26:20] <dalley> ah I was thinking specifically of repository management type stuff
[16:27:24] <dalley> cooperlees, supposing I do file an issue, what level of detail should it start out with, just a general idea of the use case? and techalchemy, it seems like it would be useful to you as well so definitely chime in
[16:28:10] <cooperlees> describe all your goals and what bandersnatch can't do that would make your life easier and we can start a discussion there.
[16:28:15] <techalchemy> dalley, repository management is super specific I don't know a lot of people who are that focused
[16:28:18] <cooperlees> Let's not focus on the how, but the why and expected output
[16:28:54] <cooperlees> techalchemy: You should all have meetups since it's your common goal to make it easier - pending employers being happy
[16:29:22] <techalchemy> yeah for sure we can chat more
[16:29:54] <cooperlees> dalley: welcome to come to #bandersnatch where we can deep dive and I'm happy to draw diagrams in a google doc then thrash it out over a Video call
[16:30:03] <cooperlees> divide it up and cut out the work bandersnatch needs
[16:30:05] <techalchemy> +1 to that also
[16:30:19] <cooperlees> The more people that use it the happier I am for the hours I've spent on it :)
[16:30:34] <dalley> cool - I'll need to read through the bandersnatch source code first to get a good idea of what it currently does..
[16:31:01] <dalley> why did they switch away?
[16:31:04] <cooperlees> Well, a few small mirrors for project specific stuff but the current Python Foundation people didn't agree it was needed
[16:31:24] <cooperlees> I disagree, but I'm not on that team anymore so whatever
[16:31:40] <cooperlees> I had it running in 3 DCs geoload balanced with NGINX frontend
[16:31:42] <dalley> ah, you mean they just stopped using it as a company-wide cache except in a few places
[16:31:46] <sumanah> wait, reading backscroll....
[16:31:47] <cooperlees> yeah
[16:32:03] <cooperlees> So I effectively maintain it but don't use it anymore
[16:32:06] <cooperlees> very sad
[16:32:17] <cooperlees> Facebook was my "test bed" haha
[16:32:23] <sumanah> "the current Python Foundation people didn't agree it was needed" - you mean a team within Facebook called Python Foundation?
[16:32:30] <cooperlees> sumanah: ya
[16:32:34] <sumanah> ohhhhh ok thanks
[16:32:40] <techalchemy> facebook is hilarious
[16:32:42] <dalley> you work for Facebook? that's neat
[16:32:46] <cooperlees> Lukasz started it and brought me in
[16:32:51] <sumanah> Sometimes I see people refer to Python Software Foundation (the nonprofit) as "Python Foundation" so I wanted to make sure I understood!
[16:32:58] <cooperlees> Lukasz == ambv == Python 3 release manager
[16:33:11] <cooperlees> Former desk mate
[16:33:14] <cooperlees> Now back in Poland
[16:33:17] <cooperlees> creator of black etc.
[16:33:23] <techalchemy> you had to share a desk? i thought facebook is rich
[16:33:43] <dalley> it seems more efficient to maintain one cache than many caches spread throughout the company
[16:33:58] <cooperlees> dalley: Who knows
[16:34:08] <cooperlees> Their main reason was Gluster sucks. I agree.
[16:34:34] <cooperlees> So I wanted to add high level block storage cause "they didn't have time" so they can plug in our S3 like storage if they ever want it again
[16:34:43] <cooperlees> then techalchemy came along
[16:34:47] <cooperlees> (Y)
[16:34:53] <techalchemy> yeah, adding S3 atm
[16:37:29] <cooperlees> dalley: I even originally only sync'd 1 mirror then replicated internally, but that turned out to be painful (basically gluster reasons again)
[16:37:38] <cooperlees> So we were once really good citizens!
[16:47:37] <dalley> we currently have code that syncs from PyPI, it just doesn't support mirroring atm, it works on an explicit whitelist only and makes REST API calls manually
[16:48:01] <dalley> which is not really very useful tbh
[16:50:12] <dalley> using bandersnatch as a library to handle the interface with PyPI would be pretty much the best case scenario
[16:50:23] <sumanah> dalley: do you happen to know whether there is any time or budget (within Red Hat's Pulp team) to actually work on Warehouse a bit, to support you better?
[16:51:06] <techalchemy> sumanah, i think the bigger concern is whether warehouse would accept any of those changes
[16:51:06] <cooperlees> dalley: That should be pretty doable today. It's all aiohttp based tho, so you'll need to go async
[16:51:28] <sumanah> techalchemy: could you expand on that a bit?
[16:51:40] <cooperlees> sumanah: Example: We couldn't even get an XMLRPC API replacement accepted which benefited everyone ...
[16:51:49] <techalchemy> a non xmlrpc api *
[16:51:57] <techalchemy> which is the main thing everyone actually needs
[16:52:03] <sumanah> could we back up a bit? I may have missed some history here
[16:52:09] <techalchemy> and there is constant pushback against actually using the json api for anything
[16:52:28] <techalchemy> which makes me wonder why it exists
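For reference, the JSON API being discussed serves per-project metadata at https://pypi.org/pypi/<project>/json. A small sketch of pulling dependency data out of such a response; the response shape (`{"info": {"requires_dist": [...]}}`) is what the live endpoint returns, while the parsing helper is ours:

```python
# Sketch: read dependency metadata from PyPI's per-project JSON API.
# Only the helper is exercised offline; the network call sits behind
# the __main__ guard.
import json
import urllib.request

def requires_dist(doc):
    """Extract the requires_dist list from a /pypi/<name>/json document.

    The field is null for projects that uploaded no dependency metadata,
    so normalize that to an empty list."""
    return doc.get("info", {}).get("requires_dist") or []

if __name__ == "__main__":
    with urllib.request.urlopen("https://pypi.org/pypi/requests/json") as resp:
        doc = json.load(resp)
    print(requires_dist(doc))
```

That nullable `requires_dist` is part of why the API frustrates tool authors: whether dependency data is present at all depends on how each release was built and uploaded.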
[16:54:06] <sumanah> so I remember that back when we were pushing toward getting Warehouse deployed fully as production, and decommissioning legacy, there was some discussion of how much to prioritize various API improvements
[16:55:00] <dalley> sumanah, well, that sort of already happened see: https://github.com/pypa/warehouse/pull/4078
[16:55:03] <sumanah> Could you please clarify regarding the pushback on using the Warehouse JSON API?
[16:55:09] <sumanah> I'm reading now
[16:55:59] <sumanah> ok, so, my apologies on the delay in review and thus the bitrotting
[16:57:45] <sumanah> in the past few years we've been able to make some sustained improvements in various bits of PyPI, and actually review and merge more stuff, during the periods when there were paid people working on Warehouse during paid work time
[16:58:42] <sumanah> this is why the Packaging WG is seeking money to fund several things, including a big revamp of the API https://wiki.python.org/psf/Fundable%20Packaging%20Improvements#Revamp_PyPI_API
[16:59:10] <cooperlees> This hurts when I and some others spent so many hours for free to do a lot of that
[16:59:46] <cooperlees> But on the other side, very happy it might finally happen.
[17:00:04] <sumanah> cooperlees: what I've seen in the past is:
[17:00:43] <sumanah> volunteer effort gets us part of the way there, and then concentrated paid time can build on those PRs and past branches in order to get things reviewed, tested, polished, deployed
[17:01:19] <sumanah> like how the two-factor auth work last year, funded by Open Tech Fund, was informed by a PR by, I think, steiza that started that implementation
[17:01:52] <cooperlees> Where I pulled the pin was that both asmacdo and I didn't want to spend the tens of hours meeting Warehouse's unrealistic ~100% test coverage requirement before people would say yes, we'll actually accept this if you do it
[17:02:06] <sumanah> ah I didn't know that was a barrier
[17:02:09] <sumanah> for this
[17:02:22] <cooperlees> we did everything asked by key warehouse people but no one would ever say yes we'll accept this if you do the work
[17:02:28] <sumanah> I just saw the message closing it where "This code has rotted a bit, so I'm closing."
[17:02:39] <sumanah> cooperlees: that must have been very frustrating. I'm sorry
[17:02:39] <cooperlees> And then I had split brain between key warehouse people which was also frustrating
[17:02:52] <cooperlees> Go speak to X, go ask Y, I'm not sure
[17:03:00] <sumanah> cooperlees: super frustrating!!
[17:03:06] <cooperlees> I remembered we're all volunteers so I accepted it, but I've seen others get to the same spot.
[17:03:15] <sumanah> :( :(
[17:03:31] <sumanah> I think one good thing about a paid project is that it forces decisions
[17:03:58] <sumanah> and another is that it guarantees reviewer time and much faster feedback
[17:05:33] <sumanah> on a separate note -- I did not know anyone was pushing back and saying "don't use the Warehouse API" -- this is a surprise to me
[17:08:24] <techalchemy> i can possibly find some github issues where its come up
[17:08:55] <sumanah> thank you!
[17:24:31] <techalchemy> dalley, I just saw https://github.com/pypa/pip/issues/7406#issuecomment-589987961 also
[17:25:04] <techalchemy> have you tried to use libsolv for python depsolving?
[17:26:57] <techalchemy> sumanah, https://github.com/pypa/pip/issues/7406#issuecomment-583891169 is one example
[17:27:07] <techalchemy> though it mentions pip specifically
[17:33:07] <sumanah> techalchemy: understood. OK, I updated the funding request at https://wiki.python.org/psf/Fundable%20Packaging%20Improvements#Revamp_PyPI_API to reflect this, and I commented on https://github.com/pypa/warehouse/issues/284 to ask for maintainers' opinions on "whether this kind of feature would be welcome if someone were to try again at implementing it."
[17:33:28] <sumanah> thanks for the background context techalchemy cooperlees dalley
[17:33:36] <techalchemy> basically i think a PEP for the Json API would be step 1
[17:33:44] <sumanah> yes
[17:35:44] <dalley> techalchemy, I haven't but I also think it wouldn't work well due to the way metadata works
[17:40:14] <dalley> so with RPM metadata (I think both DNF and Zypper use libsolv), all the metadata for every package and their dependencies is local and provided up front
[17:40:29] <dalley> and the way libsolv works really relies on that being the case
[17:41:11] <dalley> and it's unfortunately not the case w/ Python metadata
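The contrast dalley draws can be made concrete: an RPM-style solver reads the whole dependency graph from repodata up front, while against PyPI a resolver only learns a package's dependencies after fetching (or downloading and inspecting) that package. A toy illustration, with an in-memory dict standing in for the per-package network fetches; the package names and graph are made up:

```python
# Toy illustration: with PyPI-style metadata, each FAKE_INDEX lookup
# below would be a network round-trip; with RPM repodata the entire
# graph is a single local document. Names and graph are invented.
FAKE_INDEX = {
    "app": ["libfoo", "libbar"],
    "libfoo": ["libbaz"],
    "libbar": [],
    "libbaz": [],
}

def resolve(root):
    """Walk the dependency closure, counting per-package metadata fetches."""
    fetches = 0
    seen, stack = set(), [root]
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        deps = FAKE_INDEX[pkg]  # one "network fetch" per package on PyPI;
        fetches += 1            # a local lookup when repodata is complete
        stack.extend(deps)
    return seen, fetches

closure, fetches = resolve("app")
print(sorted(closure), fetches)
```

A SAT solver like libsolv wants the full graph before it starts encoding clauses, which is why the lazy, fetch-as-you-go shape of PyPI metadata is a poor fit for it.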
[19:02:21] <cooperlees> techalchemy: Only reason I didn't go the PEP route was the goal was to reach 1:1 XMLRPC parity, then put an XMLRPC proxy in front of the new API to get more metrics around who's hitting it, then release the new API, officially deprecate the XMLRPC API, move everything we control over to the new API, and try to contact stragglers.
[19:02:27] <cooperlees> Even offered to help with all that
[19:54:25] <sumanah> bhrutledge: if you could help this person https://github.com/pypa/twine/pull/642#issuecomment-637768623 I would really appreciate it - my time is pretty well spoken for today
[21:20:21] <techalchemy> dalley, yeah I'm the one in the thread who mentioned how the metadata is the biggest issue with python dependency solving... I feel like I struggle making that point but I'm not sure it's because they disagree -- it might just be because fixing python metadata would be hard
[21:20:34] <techalchemy> they=whomever
[21:22:56] <dalley> yeah, coordinating a change like that between pip, pypi, and every other stakeholder would be a nightmare
[21:23:15] <dalley> would open a lot of doors though
[21:24:50] <sumanah> hi, fixing Python metadata, you said a thing I'm a fan of
[21:26:47] <sumanah> dalley: there's the work of increasing strictness going forward https://discuss.python.org/t/increasing-pips-pypis-metadata-strictness-summit-followup/1855/ and then there's the work of fixing up all the metadata in the already-uploaded packages....
[21:27:02] <sumanah> which is https://wiki.python.org/psf/Fundable%20Packaging%20Improvements#Audit_and_update_package_metadata
[21:27:35] <sumanah> oh wait, I may have misunderstood
[21:27:58] <sumanah> I meant "increasing the strictness of compliance checks" you maybe mean "changing the metadata format"
[21:29:18] <dalley> yeah, something more along those lines
[21:30:02] <dalley> not needing to make additional requests to perform depsolving
[21:41:34] <sumanah> dalley: like, a metadata-only API would be great, too?
[21:42:27] <sumanah> https://github.com/pypa/warehouse/issues/474 ?
[21:43:28] <dalley> sumanah, just to be clear we're just discussing the reason why libsolv in particular would be a bad fit. even the issue you mentioned (474) wouldn't be enough for that use case
[21:43:36] <sumanah> ah ok
[21:44:37] <dalley> what you would need to make a SAT-based dependency solver (like libsolv) work is a local copy of all of the dependency information for every package
[21:46:03] <dalley> like RPM or Cargo have (I assume Debian too, but I don't know as much about that ecosystem)
[21:48:01] <dalley> techalchemy, we do use libsolv in the Pulp RPM plugin which is why I ended up packaging the wheels in PyPI
[21:48:51] <techalchemy> sumanah, there is no open issue that would ever come close to allowing a sat solver to work for python
[21:49:02] <techalchemy> any conversation i've ever had on the topic has basically died in 2 seconds
[21:49:22] <sumanah> ok
[21:49:50] <techalchemy> dalley, that makes sense -- the conda folks are allegedly doing some kind of sat solving with python packages btw so I'm kind of lying
[21:49:52] <sumanah> the SAT solver aspect of it was one I hadn't seen because of my hasty skimming of backlog. Sorry
[21:51:03] <techalchemy> sumanah, the only proposal there's ever really been on the topic has been a discussion between me and pradyun in here where i suggested that we spin up builders and try to generate metadata for every artifact on all major platforms for all of the things that get uploaded
[21:51:37] <sumanah> techalchemy: ok. do we know anything about how much that would cost?
[21:51:53] <dalley> conda I think has their own entirely different repository setup, I've never looked into how it works exactly past the fact that it's totally different
[21:51:54] <techalchemy> in terms of compute resources or developer time :p
[21:53:29] <techalchemy> sumanah, I don't think anyone has the slightest idea about either piece ^ but the discussion dies when I've had it because historical artifacts still won't have metadata... for building a sat solver it's not a huge deal since you'd start to have that metadata going forward
[21:53:56] <techalchemy> anyhow, we've concluded we don't really need to do sat solving, but that doesn't change the value of having static metadata available up front
[21:54:18] <techalchemy> i'm not sure if anyone else besides me cares about that
[22:05:38] <sumanah> I am guessing at least a few people do, but I am nearing the end of my work day and cannot do the leadership thing right now of helping you find them
[22:55:48] <sumanah> Catch y'all later.
[23:10:32] <travis-ci> pypa/pip#16650 (master - 549a9d1 : Pradyun Gedam): The build passed.
[23:10:32] <travis-ci> Change view : https://github.com/pypa/pip/compare/69a4cb3eedcd84befba5b2d8c3e8b8296e139496...549a9d11a1f269206809b8276bee41abbc7cb533
[23:10:32] <travis-ci> Build details : https://travis-ci.org/pypa/pip/builds/694047649