PMXBOT Log file Viewer

#pypa-dev logs for Sunday the 22nd of February, 2015

[07:39:12] <pynpong> Hi, I am relatively new to Python and I am currently using pyenv in order to not contaminate my system's default Python installation with other versions I may want to use for dev. I would like to install pip now, but I am not sure how I would go about it. For example, `python get-pip.py` is one way, but are there any other methods? Finally, since I am using `pyenv`, what is the recommended Python version to get pip? Thanks!
[07:47:18] <pynpong> views/
[07:47:20] <pynpong> views?
[07:47:29] <pynpong> hello
[20:16:43] <r1chardj0n3s> hey dstufft, you awake?
[20:55:21] <dstufft> r1chardj0n3s: sorry was away, am here now
[21:08:04] <r1chardj0n3s> hai, so I guess we need the "how to deal with malicious code" policy finally eh?
[21:10:37] <r1chardj0n3s> thanks for dealing with it.
[21:15:33] <dstufft> r1chardj0n3s: no problem
[21:18:00] <dstufft> r1chardj0n3s: I don't know of a super good way to prevent the problem in general :/ It feels pretty crummy for projects to need to try and register common spelling mistakes to stop that, but I'm not sure if it's a good idea for claiming a name on PyPI to also try and claim "similar" names (and how do you even define what a similar name is? Levenshtein distance of 1?)
[21:19:02] <tomprince> How many conflicts are there at Levenshtein distance of 2?
[21:20:31] <dstufft> dunno! I have no numbers to back things up
[21:20:58] <dstufft> it's a good question though
[21:21:36] <r1chardj0n3s> https://pypi.python.org/pypi/python-Levenshtein/ \o/
[21:21:41] <r1chardj0n3s> now, I grab a list of names...
[21:22:46] <tomprince> I guess there might be some with inserting a single letter or two letter prefix (which is perhaps reasonable) and very few others.
[21:23:12] <tomprince> I can imagine requiring manual review for conflicts.
[21:34:03] <r1chardj0n3s> this looks simple enough - I've just gotta do the school run and then I'll get back to it
[22:01:29] <dstufft> r1chardj0n3s:
[22:01:31] <dstufft> "
[22:01:31] <dstufft> Usage of this project
[22:01:31] <dstufft> If you see this page then you came here because you installed some honey pot package over pip. All data that is sent to this server is for pentesting purposes. For more information cosider visiting my blog.
[22:01:31] <dstufft> "
[22:01:35] <dstufft> https://zzz.scrapeulous.com/
[22:01:40] <dstufft> the page says that now lol
[22:01:46] <r1chardj0n3s> hahaha
[22:01:49] <r1chardj0n3s> yeah, right
[22:03:01] <dstufft> https://www.reddit.com/r/Python/comments/2wr93b/this_one_looks_odd_doesnt_it/
[22:03:12] <dstufft> apparently people are reporting him to his registrar and shit too
[22:05:54] <r1chardj0n3s> yeah, so there's a *lot* of Lev <= 2 matches
[22:06:34] <r1chardj0n3s> Lev == 1 is more likely for haxxing though, so I'll look just for those
[22:08:04] <r1chardj0n3s> ugh *nester
[22:13:28] <r1chardj0n3s> even at distance 1.. there's a heck of a lot of TLA and 4LA names that collide
[22:18:05] <dstufft> r1chardj0n3s: it might make sense to step the requirements, e.g. at 3-4 letters you're going to have a lot of collisions no matter what, but maybe at 5-6 a distance of 1 is good, but at 6+ a distance of 2
[22:18:06] <dstufft> or something
[22:18:38] <r1chardj0n3s> if they're going for the typos then distance has to be 1
[22:19:04] <tomprince> And I think certain edits are less problematic and more likely (insertion at the beginning, in particular, comes to mind).
[22:19:27] <tomprince> Something like json vs. ujson (not that the former is a PyPI package)
[22:19:56] <r1chardj0n3s> this would only produce a flag list, not be used to instaban
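The flag-list idea discussed above can be sketched roughly as follows. This is only a minimal illustration, not the script r1chardj0n3s actually ran: `levenshtein`, `max_distance` and `flag_similar` are hypothetical names, the edit distance is implemented by hand rather than with python-Levenshtein, and the length cut-offs in `max_distance` are one assumed reading of the "step the requirements" suggestion (5-6 letters: distance 1, 7 or more: distance 2).

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def max_distance(name: str) -> int:
    """Hypothetical stepped threshold; the cut-offs are assumptions."""
    if len(name) < 5:
        return 0  # TLA/4LA names collide too often to be worth flagging
    if len(name) <= 6:
        return 1
    return 2


def flag_similar(names):
    """Return (a, b, distance) tuples for manual review, never for auto-banning."""
    flagged = []
    for a, b in combinations(sorted(set(names)), 2):
        limit = min(max_distance(a), max_distance(b))
        if limit == 0:
            continue
        d = levenshtein(a, b)
        if 0 < d <= limit:
            flagged.append((a, b, d))
    return flagged
```

Comparing every pair is quadratic in the number of names, so for the full PyPI index a real run would probably want a faster distance implementation or some pre-filtering by name length.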
[22:20:32] <r1chardj0n3s> dstufft.testpkg, dstufft.testpkg2, dstufft.testpkg22, dstufft.testpkg3 man what a hax
[22:20:50] <dstufft> I can't help it, I'm a bad bad man
[22:22:01] <dstufft> r1chardj0n3s: we might want to poke VanL, maybe we need an AUP
[22:22:09] <r1chardj0n3s> but then there's django-invite, django-inviter, django-inviter2 ;)
[22:22:49] <tomprince> I would be inclined to default to blocking, at least after a transition period, with a procedure for bypassing.
[22:23:15] <r1chardj0n3s> we have an AUP, but it doesn't mention malicious intent
[22:23:51] <r1chardj0n3s> because, well, what would be the point?
[22:23:56] <dstufft> r1chardj0n3s: where is the AUP? All I see is a thing that says "you give us a license"
[22:24:19] <r1chardj0n3s> ok, yeah, that's it
[22:25:25] <dstufft> Like the name change policy and a CoC, an AUP defines what we define as OK and not OK to put on PyPI, things don't need to be overtly malicious for us not to want to host them. This guy is apparently claiming it was pentesting, that's not exactly malicious but it's something an AUP wouldn't allow
[22:25:44] <dstufft> or well probably wouldn't allow
[22:25:54] <r1chardj0n3s> https://pypi.python.org/pypi/bphython
[22:26:10] <r1chardj0n3s> actually has what seems to be a version of bpython
[22:26:26] <r1chardj0n3s> but may very well not be :/
[22:28:08] <r1chardj0n3s> and my checker wouldn't pick up bphyton (not that it exists)
[22:28:35] <r1chardj0n3s> curation, you say?
[22:28:49] <dstufft> I wonder if there's better science behind this than Levenshtein distance
[22:29:34] <dstufft> r1chardj0n3s: eh, I don't really think any of this is curation
[22:30:02] <r1chardj0n3s> my point was that the only way to solve this problem is with curation
[22:30:15] <r1chardj0n3s> anything else is pissing into the wind
[22:31:11] <tomprince> Well, curation isn't a black-and-white choice.
[22:31:19] <tomprince> There are many potential levels of curation.
[22:31:36] <dstufft> the only way to solve it completely is curation, yes, that doesn't mean there aren't ways to make it less of a problem. Without actually sitting down and doing some numbers though I don't know where the line between "complete wild west" and "whack a mole" lies
[22:32:00] <tomprince> This incident suggests that perhaps some level of curation is appropriate at small Levenshtein distances.
[22:32:02] <dstufft> We already do some normalization to prevent things like django and Django being different, and lol and 1ol
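The kind of normalization mentioned here can be illustrated with a small sketch. It is not PyPI's actual rule set: the `CONFUSABLES` map and the `normalize` and `find_collision` helpers are hypothetical and only cover the two cases named in the message (case differences, and the digit 1 standing in for the letter l), plus 0 for o as an assumed extra.

```python
# Hypothetical confusable-character map; NOT PyPI's real normalization rules.
CONFUSABLES = str.maketrans({"1": "l", "0": "o"})


def normalize(name: str) -> str:
    """Fold case and look-alike characters so similar-looking names compare equal."""
    return name.lower().translate(CONFUSABLES)


def find_collision(new_name, existing):
    """Return the existing name that new_name collides with, or None."""
    key = normalize(new_name)
    for name in existing:
        if normalize(name) == key:
            return name
    return None
```

With this sketch, `find_collision("1ol", ["lol"])` reports a collision with `lol`, while a name with no normalized match returns `None`.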
[22:33:00] <dstufft> it's possible that if we fiddle with the constraints something more (like Levenshtein distance, or soundex or something) might be another useful thing to apply, but it's also completely possible that the noise is too much
[22:33:09] <dstufft> there's another axis to this as well
[22:33:26] <dstufft> the more popular a package is, the more useful it is to protect it
[22:33:53] <dstufft> something like setuptools or requests is a lot more useful than billy-bobs-i-have-no-users-and-was-only-uploaded-once-in-2008
[22:34:24] <dstufft> so it's possible to be stricter about matches to the top X packages than to the bottom Y
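One way to picture "stricter about the top X than the bottom Y" is the sketch below. Everything in it is an assumption for illustration: the `downloads` mapping, the `top_n`, `strict_limit` and `lax_limit` parameters, and the function name `flag_against_popular` are all hypothetical, and `distance` can be any callable that scores two names (an edit-distance function, for instance).

```python
from typing import Callable, Dict, List, Tuple


def flag_against_popular(
    candidate: str,
    downloads: Dict[str, int],
    distance: Callable[[str, str], int],
    top_n: int = 100,
    strict_limit: int = 2,
    lax_limit: int = 0,
) -> List[Tuple[str, int]]:
    """Flag existing names that a new candidate name is suspiciously close to.

    The top_n most-downloaded names are checked with strict_limit; all other
    names use lax_limit, where 0 means "do not check at all".
    """
    ranked = sorted(downloads, key=downloads.get, reverse=True)
    flagged = []
    for rank, name in enumerate(ranked):
        if name == candidate:
            continue
        limit = strict_limit if rank < top_n else lax_limit
        if limit == 0:
            continue
        d = distance(candidate, name)
        if d <= limit:
            flagged.append((name, d))
    return flagged
```

The design choice is that a near-miss of a heavily used name such as requests gets flagged, while a near-miss of a package nobody installs does not add to the review queue.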
[23:14:36] <r1chardj0n3s> dstufft: I now have some Lev numbers for you
[23:14:43] <dstufft> r1chardj0n3s: nice!
[23:14:45] <r1chardj0n3s> but they're a bit crap
[23:14:50] <dstufft> :(
[23:14:50] <r1chardj0n3s> I need to crunch them a bit ;)
[23:15:02] <r1chardj0n3s> Imma do a nester nuke first, clean out the obvious trash
[23:17:12] <Ivo> dstufft: something happen to rst rendering? https://pypi.python.org/pypi/virtualenv/
[23:18:55] <dstufft> Ivo: I think virtualenv might have been like that for a while, but in general rst rendering changed everywhere on PyPI and uses readme now
[23:19:09] <dstufft> you can check what the actual failures are by running `pip install readme && python setup.py check -r -s`
[23:20:04] <dstufft> (that doesn't yet take into account "soft" failures, like "you used an element that PyPI is going to escape so it's not going to render correctly", but it will take into account any rst failures)
[23:24:42] <Ivo> is setuptools making use of readme now?
[23:25:19] <r1chardj0n3s> huh, dstufft, just running the nester nuker, and a bunch of the nester files are missing. I wonder whether people are finally going in and trying to nuke their own nester projects, but only nuking the files, and the db leaves broken file references?
[23:25:56] <dstufft> Ivo: No
[23:26:38] <Ivo> h thats bollocks
[23:27:13] <Ivo> I think the problem is that Introduction uses = and Releases uses ~
[23:27:27] <Ivo> as the header marker
[23:31:17] <r1chardj0n3s> ah dstufft, no, it was your file-removal change (retaining the file names) that is the issue :) no worries though
[23:31:34] <dstufft> r1chardj0n3s: ah :D
[23:31:50] <r1chardj0n3s> 90 packages nuked
[23:47:32] <ionelmc> r1chardj0n3s: my package!
[23:47:44] <r1chardj0n3s> yep
[23:47:49] <ionelmc> ok, kidding a bit, what packages were removed?
[23:47:58] <r1chardj0n3s> nested-list-printers
[23:48:05] <ionelmc> oh
[23:48:12] <r1chardj0n3s> created by people following the example in a book
[23:48:19] <ionelmc> how do you find those out?
[23:48:31] <ionelmc> in a _book_?!
[23:51:15] <Ivo> dstufft: could you pull virtualenv master and re-register it?
[23:53:37] <dstufft> Ivo: updated
[23:53:46] <Ivo> fuck you too pypy
[23:55:26] <Ivo> dstufft: cheers!
[23:55:56] <Ivo> r1chardj0n3s: hahahaha
[23:59:48] <r1chardj0n3s> ionelmc: I noticed them start to appear back when the book was first published, a few years back. Wrote a script to nuke them. Including the German translation :/