[07:39:12] <pynpong> Hi, I am relatively new to Python and I am currently using pyenv so that I don't contaminate my system's default Python installation with other versions I may want to use for dev. I would like to install pip now, but I am not sure how I would go about it. For example, `python get-pip.py` is one way, but are there any other methods? Finally, since I am using `pyenv`, what is the recommended Python version to get pip? Thanks!
[21:18:00] <dstufft> r1chardj0n3s: I don't know of a super good way to prevent the problem in general :/ It feels pretty crummy for projects to need to try and register common spelling mistakes to stop that, but I'm not sure if it's a good idea for claiming a name on PyPI to also try and claim "similar" names (and how do you even define what a similar name is? Levenshtein distance of 1?)
[21:19:02] <tomprince> How many conflicts are there at Levenshtein distance of 2?
[21:20:31] <dstufft> dunno! I have no numbers to back things up
[21:21:41] <r1chardj0n3s> now, I grab a list of names...
[21:22:46] <tomprince> I guess there might be some with inserting a single letter or two letter prefix (which is perhaps reasonable) and very few others.
[21:23:12] <tomprince> I can imagine requiring manual review for conflicts.
[21:34:03] <r1chardj0n3s> this looks simple enough - I've just gotta do the school run and then I'll get back to it
[22:01:31] <dstufft> If you see this page then you came here because you installed some honey pot package over pip. All data that is sent to this server is for pentesting purposes. For more information cosider visiting my blog.
[22:13:28] <r1chardj0n3s> even at distance 1.. there's a heck of a lot of TLA and 4LA names that collide
[22:18:05] <dstufft> r1chardj0n3s: it might make sense to step in the requirements, e.g. at 3-4 letters you're going to have a lot of collisions no matter what, but maybe at 5-6 a distance of 1 is good, but at 6+ a distance of 2
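The stepped idea above can be sketched roughly as follows. This is only an illustration, not anything PyPI is known to run: `max_distance_for` and `too_similar` are hypothetical names, and the cutoffs (nothing for names of 4 letters or fewer, distance 1 for 5-6 letters, distance 2 for 7 or more) are one arbitrary way of resolving the overlap at 6 in the suggestion.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def max_distance_for(name: str) -> int:
    """Hypothetical stepped threshold; the cutoffs are assumptions for illustration."""
    if len(name) <= 4:
        return 0  # short names collide too often for a distance check to be useful
    if len(name) <= 6:
        return 1
    return 2


def too_similar(new_name: str, existing: str) -> bool:
    """True if new_name is a near miss of an existing name (exact matches excluded)."""
    distance = levenshtein(new_name, existing)
    return 0 < distance <= max_distance_for(existing)
```

With these assumed cutoffs, `too_similar("reqeusts", "requests")` is flagged, while two three-letter names one edit apart are not.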
[22:25:25] <dstufft> Like the name change policy and a CoC, an AUP defines what we consider OK and not OK to put on PyPI, things don't need to be overtly malicious for us not to want to host them. This guy is apparently claiming it was pentesting, that's not exactly malicious but it's something an AUP wouldn't allow
[22:25:44] <dstufft> or well probably wouldn't allow
[22:28:49] <dstufft> I wonder if there's better science behind this than Levenshtein distance
[22:29:34] <dstufft> r1chardj0n3s: eh, I don't really think any of this is curation
[22:30:02] <r1chardj0n3s> my point was that the only way to solve this problem is with curation
[22:30:15] <r1chardj0n3s> anything else is pissing into the wind
[22:31:11] <tomprince> Well, curation isn't a black-and-white choice.
[22:31:19] <tomprince> There are many potential levels of curation.
[22:31:36] <dstufft> the only way to solve it completely is curation, yes, that doesn't mean there aren't ways to make it less of a problem. Without actually sitting down and doing some numbers though I don't know where the line between "complete wild west" and "whack a mole" lies
[22:32:00] <tomprince> This incident suggests that perhaps some level of curation is appropriate at small Levenshtein distances.
[22:32:02] <dstufft> We already do some normalization to prevent things like django and Django being different, and lol and 1ol
[22:33:00] <dstufft> it's possible that if we fiddle with the constraints something more (like Levenshtein distance, or Soundex or something) might be another useful thing to apply, but it's also completely possible that the noise is too much
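For the kind of normalization mentioned above (django vs Django, lol vs 1ol), a minimal sketch might look like this. The `normalize` step follows the usual lowercase-and-collapse-separators rule for project names; the `CONFUSABLES` table and `collision_key` are hypothetical and cover only the two look-alike pairs needed for the example, not whatever PyPI actually applies.

```python
import re


def normalize(name: str) -> str:
    """Lowercase the name and collapse runs of '-', '_' and '.' into a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Hypothetical look-alike table: digit one reads as letter l, zero as letter o.
CONFUSABLES = str.maketrans({"1": "l", "0": "o"})


def collision_key(name: str) -> str:
    """Key under which two names are treated as the same for registration purposes."""
    return normalize(name).translate(CONFUSABLES)


def collides(new_name: str, existing_names: list[str]) -> bool:
    """True if new_name maps to the same key as any already registered name."""
    taken = {collision_key(existing) for existing in existing_names}
    return collision_key(new_name) in taken
```

Under these assumptions `collides("1ol", ["lol"])` and `collides("Django", ["django"])` are both true, while a genuinely different name such as "request" next to "requests" passes, which is where a distance or phonetic check would have to take over.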
[22:33:09] <dstufft> there's another axis to this as well
[22:33:26] <dstufft> the more popular a package is, the more useful it is to protect it
[22:33:53] <dstufft> something like setuptools or requests is a lot more useful than billy-bobs-i-have-no-users-and-was-only-uploaded-once-in-2008
[22:34:24] <dstufft> so it's possible to be stricter about matches to the top X packages than to the bottom Y
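The popularity axis could be sketched like this, assuming a list of existing names already sorted from most to least popular. Everything here is hypothetical (the `conflicts` helper, the `top_x` cutoff, and the `strict`/`lenient` distances); it only shows the shape of "protect the top X more than the rest".

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def conflicts(new_name, ranked_names, top_x=100, strict=2, lenient=0):
    """Return existing names that new_name is too close to.

    ranked_names is assumed to be ordered most popular first. Names ranked
    within top_x are protected out to distance `strict`; everything else
    only out to `lenient`. Exact matches are skipped, since an already
    taken name would be rejected separately.
    """
    hits = []
    for rank, existing in enumerate(ranked_names, start=1):
        limit = strict if rank <= top_x else lenient
        distance = levenshtein(new_name, existing)
        if 0 < distance <= limit:
            hits.append(existing)
    return hits
```

For example, with `ranked = ["requests", "setuptools", "billy-bobs-package"]` and `top_x=2`, a new name "reqeusts" conflicts with "requests", while a near miss of the unpopular third entry is let through.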
[23:14:36] <r1chardj0n3s> dstufft: I now have some Lev numbers for you
[23:14:50] <r1chardj0n3s> I need to crunch them a bit ;)
[23:15:02] <r1chardj0n3s> Imma do a nester nuke first, clean out the obvious trash
[23:17:12] <Ivo> dstufft: something happen to rst rendering? https://pypi.python.org/pypi/virtualenv/
[23:18:55] <dstufft> Ivo: I think virtualenv might have been like that for a while, but in general rst rendering changed everywhere on PyPI and uses readme now
[23:19:09] <dstufft> you can check what the actual failures are by running pip install readme && python setup.py check -r -s
[23:20:04] <dstufft> (that doesn't yet take into account "soft" failures, like "you used an element that PyPI is going to escape so it's not going to render correctly", but it will take into account any rst failures)
[23:24:42] <Ivo> is setuptools making use of readme now?
[23:25:19] <r1chardj0n3s> huh, dstufft, just running the nester nuker, and a bunch of the nester files are missing. I wonder whether people are finally going in and trying to nuke their own nester projects, but only nuking the files, so the db is left with broken file references?
[23:59:48] <r1chardj0n3s> ionelmc: I noticed them start to appear back when the book was first published, a few years back. Wrote a script to nuke them. Including the German translation :/