PMXBOT Log file Viewer

#pypa-dev logs for Friday the 29th of July, 2016

[09:20:41] <pombreda> Hiya :) Any better/more polite way to get all PyPI metadata short of using either the xmlrpc or the JSON api, doing one call for each package/version?
[12:37:23] <dstufft> pombreda: no, that's the only way -- and preferably use the JSON api
[12:59:48] <pombreda> dstufft: ok. I will try to be very gentle then. And ensure that I couple this with xmlrpc calls to get only the new incremental parts?
[13:00:31] <dstufft> pombreda: yea, JSON API requests are more or less free for us. Typically they'll get served out of the cache and a cache hit is basically free
[13:00:43] <pombreda> dstufft: good
[13:00:57] <dstufft> If it's a cache miss then it's not free, but cache hits never hit our own servers, they get served by Fastly and we don't even notice
[13:01:01] <pombreda> dstufft: shall I spread it over pypi.python.org and pypi.io?
[13:01:06] <dstufft> XMLRPC API requests are always a cache miss
[13:01:29] <dstufft> pombreda: no need, you can just pick one and go -- pypi.python.org is more likely to have cache hits though
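
(Editor's sketch, not part of the conversation.) The one-request-per-package/version approach discussed above might look roughly like the following. The URL layout `<base>/<name>/json` and `<base>/<name>/<version>/json`, the helper names `json_url` and `fetch_metadata`, and the choice of `pypi.python.org` as the single host are assumptions made for illustration; the log itself does not spell them out.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: the host named in the log serves the JSON API under /pypi.
BASE_URL = "https://pypi.python.org/pypi"


def json_url(name, version=None, base_url=BASE_URL):
    """Build the JSON URL for a project, or for one release of it."""
    parts = [base_url.rstrip("/"), quote(name, safe="")]
    if version is not None:
        parts.append(quote(version, safe=""))
    parts.append("json")
    return "/".join(parts)


def fetch_metadata(name, version=None, timeout=30):
    """Fetch and decode one JSON document (this performs a network request)."""
    with urllib.request.urlopen(json_url(name, version), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Sticking to one host, as suggested above, keeps repeated requests for the same URL on the same cache.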
[13:01:32] <pombreda> dstufft: but to get updates since a date, xmlrpc is the way, right?
[13:01:39] <dstufft> pombreda: yea
[13:02:06] <dstufft> pombreda: though you may want to use the serial number instead of date
[13:02:09] <dstufft> like bandersnatch does
[13:02:18] <dstufft> the date can miss some records
[13:03:41] <pombreda> dstufft: thanks!
[13:05:18] <dstufft> pombreda: no problem
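
(Editor's sketch, not part of the conversation.) The serial-based incremental update suggested above could be approximated as below. It assumes the XML-RPC method `changelog_since_serial` returns events shaped as `(name, version, timestamp, action, serial)`; the helper names `changed_since` and `poll` and the endpoint URL are hypothetical, and this is not a description of how bandersnatch itself is implemented.

```python
import xmlrpc.client


def changed_since(events, last_serial):
    """Return (sorted changed project names, newest serial seen).

    Each event is assumed to be (name, version, timestamp, action, serial).
    """
    names = set()
    newest = last_serial
    for name, _version, _timestamp, _action, serial in events:
        if serial > last_serial:
            names.add(name)
            newest = max(newest, serial)
    return sorted(names), newest


def poll(last_serial, url="https://pypi.python.org/pypi"):
    """Ask the index for events after last_serial (network request, always a cache miss)."""
    client = xmlrpc.client.ServerProxy(url)
    events = client.changelog_since_serial(last_serial)
    return changed_since(events, last_serial)
```

Storing the returned serial and passing it to the next poll avoids the gaps that a date-based query can leave; only the changed projects then need a fresh JSON request.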
[13:06:58] <pombreda> dstufft: would there be a way to arrange for a one time dump of the DB? I guess the model is in warehouse now right?
[13:06:59] <pombreda> https://github.com/pypa/warehouse/blob/master/warehouse/packaging/models.py#L75
[13:11:06] <dstufft> pombreda: yea that's where the model is defined now, I'm not sure I'll be able to get to that any time soon
[13:11:58] <pombreda> dstufft: no prob. I can do without alright, I just want to avoid something looking like a DDOS :P
[13:12:18] <dstufft> pombreda: oh, we get like 3 billion HTTP requests a month
[13:12:23] <pombreda> and once the initial hurdle is over, the volume will be low
[13:12:33] <dstufft> We won't even notice ~80k
[13:12:37] <pombreda> :P
[13:12:48] <dstufft> particularly if they're to the JSON endpoint
[13:13:15] <pombreda> well that would be 80K times the number of releases for each package, so more in the range of 500K one time
[13:13:48] <pombreda> dstufft: there used to be a mirror in China too?
[13:13:59] <dstufft> yea that's true if you're looking for all the metadata
[13:14:12] <dstufft> pombreda: um, I think some folks had a bandersnatch mirror in china
[13:14:18] <dstufft> no idea if it's still running or not
[13:14:36] <pombreda> I will check https://pypi-mirrors.org/ seems handy
[13:15:06] <dstufft> https://pypi.douban.com/ seems plausible
[13:15:44] <pombreda> and the gocept one too :) so I can both be polite and spread the load somehow :P
[13:21:24] <pombreda> dstufft: none of them mirrors the JSON data so they are kinda useless to me