#pil logs for Wednesday the 9th of October, 2013

[15:28:46] <wiredfool> ttback: re your question from a few days back
[15:28:55] <wiredfool> there's no explicit multiprocessing support
[15:29:09] <ttback> wiredfool: in python?
[15:29:15] <wiredfool> in pillow
[15:29:25] <ttback> wiredfool: okay, sorry, was in python group
[15:29:31] <ttback> wiredfool: i see
[15:29:39] <wiredfool> I believe that pil releases the GIL deep in the guts when doing image manipulations
[15:29:52] <wiredfool> pil/pillow
[15:30:14] <ttback> wiredfool: what does it mean to release the GIL?
[15:30:33] <ttback> so you could actually run pil/pillow in a multi-thread and works in parallel?
[15:30:36] <wiredfool> the global interpreter lock
[15:31:05] <ttback> i haven't gotten around to trying this, but right now i'm thinking to run a python module using pillow as a subprocess
[15:31:13] <ttback> so i can get around the GIL
[15:31:24] <ttback> so have an independent module doing all the image processing
[15:31:33] <wiredfool> I think it's possible to have multiple image ops at the same time, but any python level stuff is going to be effectively single core
[15:31:39] <ttback> and the main program manages it as a subprocess
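The GIL point above can be sketched with a thread pool: if Pillow's filter code releases the GIL while it works in C, the blurs below can genuinely overlap on multiple cores; if not, they serialize. A minimal sketch, assuming Pillow is installed (the `blur` helper, sizes, and radius are illustrative, not from the log):

```python
from concurrent.futures import ThreadPoolExecutor
from PIL import Image, ImageFilter  # assumes Pillow is installed

def blur(size):
    # Image.filter runs in C; if the GIL is released there,
    # several blurs can overlap across cores even from threads.
    img = Image.new("RGB", size, "white")
    return img.filter(ImageFilter.GaussianBlur(radius=4)).size

with ThreadPoolExecutor(max_workers=4) as pool:
    sizes = list(pool.map(blur, [(128, 128)] * 8))
```

As wiredfool says, whether this actually gains anything has to be tested; any Python-level bookkeeping around the filter calls stays effectively single-core.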
[15:32:00] <ttback> that's what kinda bothers me
[15:32:13] <wiredfool> if you can architect it as a queue w/ producer/consumer then you can add workers as necessary
[15:32:14] <ttback> we are trying to process a ton of pictures on multi-core machines
[15:32:32] <ttback> hmm
[15:33:00] <ttback> currently the program is architected that way for python threads
[15:33:07] <ttback> using Queue.Queue as queue for process tasks
[15:33:11] <ttback> then thread consumes them
[15:33:18] <wiredfool> yep
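The architecture ttback describes, a Queue.Queue feeding consumer threads, looks roughly like this; the `item * 2` body is a stand-in for the real image work, and the sentinel-shutdown pattern is one common convention, not taken from the log:

```python
import queue      # this was the Queue module on Python 2
import threading

tasks = queue.Queue()
results = []

def worker():
    while True:
        item = tasks.get()
        if item is None:            # sentinel tells the worker to exit
            tasks.task_done()
            break
        results.append(item * 2)    # stand-in for the real image work
        tasks.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for n in range(10):                 # producer side: enqueue tasks
    tasks.put(n)
for _ in threads:                   # one sentinel per worker
    tasks.put(None)

tasks.join()                        # block until every task is processed
```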
[15:33:21] <ttback> but GIL made sure threads in python aren't really threads
[15:33:29] <ttback> as you effectively have a single thread
[15:33:52] <ttback> that's what i'm getting from looking at the log and hearing from other ppl
[15:33:53] <wiredfool> you might be able to do that with multiple threads and have a gain, you'd have to test it
[15:34:03] <ttback> so i guess the queue has to be independent from the app?
[15:34:24] <ttback> or the workers
[15:34:32] <ttback> right now i am thinking to spawn subprocesses cuz of this
[15:34:34] <wiredfool> personally, I'm marshalling the workers over http
[15:34:37] <ttback> still have the Queue for tasks
[15:34:46] <ttback> but spawn subprocess workers instead of threads
[15:34:50] <wiredfool> sounds like a good idea
[15:35:09] <ttback> yeah, we could roll beanstalk but that http call for large scanned tiffs
[15:35:15] <ttback> doesn't sound too efficient
[15:35:27] <wiredfool> heh. I do it for small scanned tiffs
[15:35:55] <ttback> yeah, i'm doing like page scans
[15:36:13] <ttback> so the ones we worry about aren't small enough for this
[15:36:29] <ttback> if they were super small, it might not be a problem
[15:36:58] <wiredfool> most of mine are <100k
[15:37:05] <ttback> tho the thing with multiprocessing, i am still profiling more to see if it actually costs us more time
[15:37:11] <wiredfool> and the workflow works well with simple posts
[15:37:17] <ttback> cuz it looks like the upload and download from cloud storage cost 20 times more
[15:37:20] <ttback> than what pil will cost
[15:37:24] <ttback> even on single thread
[15:37:30] <wiredfool> yikes
[15:37:46] <ttback> so i am checking the metrics first before doing any huge changes
[15:38:02] <ttback> it appears that unless i get it down to more than 5 times faster
[15:38:15] <ttback> the i/o operation will negate any gains from multiprocessing the images
[15:38:46] <ttback> it's almost like optimizing a file processor on a slow hard drive
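The 20:1 I/O-to-CPU ratio mentioned above puts a hard ceiling on any multiprocessing gain; this is just Amdahl's law, since only the PIL part parallelizes. A quick check of the numbers from the log:

```python
io_cost, cpu_cost = 20.0, 1.0   # cloud I/O is ~20x the PIL work (from the log)

def overall_speedup(workers):
    # Amdahl's law: the I/O stays serial, only the CPU part splits
    serial = io_cost + cpu_cost
    parallel = io_cost + cpu_cost / workers
    return serial / parallel
```

Even with unlimited workers the best case is 21/20 = 1.05x overall, which matches ttback's conclusion that the I/O will negate the gains.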
[15:39:07] <wiredfool> are your image operations expensive per image?
[15:39:14] <ttback> doesn't seem like it
[15:39:20] <ttback> it's just a gaussian blur
[15:39:42] <ttback> and some concat
[15:40:05] <ttback> i think the blur is the most expensive operation
[15:40:25] <wiredfool> it certainly can be, especially at high radius.
[15:40:55] <wiredfool> I can see where shipping the images off would wind up being a lot slower.
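One way to settle the "is the blur the expensive part" question is to time it directly at a few radii before restructuring anything, which fits ttback's check-the-metrics-first approach. A sketch, assuming Pillow; the image size and radii are arbitrary and timings will vary by machine and Pillow version:

```python
import time
from PIL import Image, ImageFilter  # assumes Pillow is installed

img = Image.new("RGB", (1024, 1024), "white")

for radius in (2, 8, 32):
    t0 = time.perf_counter()
    img.filter(ImageFilter.GaussianBlur(radius=radius))
    print(radius, round(time.perf_counter() - t0, 4))
```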
[15:41:16] <ttback> especially when this isn't a real-time web app
[15:41:27] <ttback> so the processing, as long as it is correct and completes sometime
[15:41:32] <ttback> i guess it's fine
[15:42:09] <ttback> if we want to process user uploads in real time, that's gonna take some thoughts
[15:42:32] <ttback> thanks for the reply tho
[15:42:39] <ttback> didn't know what PIL does deep down at all
[15:42:44] <ttback> may have to dig more
[15:42:44] <wiredfool> throwing a queue in the loop to decouple the processing from whatever else is going on is usually a win for latency
[15:42:53] <ttback> it would be nicer if they have a multiprocess version of pil tho
[15:43:31] <ttback> something like you can punch a list of images in
[15:43:39] <ttback> and it handles the multi-core part for you
[15:43:49] <wiredfool> so like multiprocess map
[15:44:23] <wiredfool> That should be doable at a level a little higher than pillow
[15:44:27] <ttback> pillow seems to focus on porting pil to other platforms
[15:44:49] <ttback> yeah
[15:44:56] <ttback> sounds like a diff project
[15:45:02] <wiredfool> well, it was packaging, now it's packaging + bugfixes + features + active development
[15:45:28] <ttback> it seems like pil/pillow should focus on new image processing packages
[15:45:45] <ttback> and something else focus on the multiprocess
[15:46:00] <ttback> there is a long list of python modules claiming to help ppl leverage multicores
[15:46:12] <wiredfool> yeah. I've written a few myself
[15:46:40] <ttback> like this one http://www.parallelpython.com/
[15:47:13] <wiredfool> time for multiprocessing for humans
[15:47:34] <ttback> have you followed anything on google's go?
[15:47:46] <wiredfool> some. I'm meaning to try a project in it
[15:48:02] <ttback> currently go appears to market itself as the kit of multiprocessing for humans
[15:48:36] <ttback> it almost sounds like, if you write anything in go, it will automatically leverage multiple cores
[15:48:40] <ttback> i am not sure if it is all hype yet
[15:48:51] <ttback> prob have to try it
[15:49:07] <wiredfool> I think you still have to think about it, but the language tends to push you towards doing things the right way for concurrency
[15:49:17] <ttback> right now python just gives ppl an impression that it will not be the choice if you want to do multi-core programming
[15:49:36] <wiredfool> yeah. it would.
[16:03:27] <iElectric> wiredfool: did you get a chance to test my patch
[16:06:58] <wiredfool> I haven't
[16:07:14] <wiredfool> It's in the queue though
[16:09:59] <iElectric> thanks!