A while back I learned about concurrent.futures, a new module included in Python 3.2. It lets you create a pool of processes or threads, distribute work to it and then wait for that work to finish. The API is dead simple and there is also a backport of the module for Python 2.
Today I was working on a small tool we have at work that helps us build Debian packages for a number of distributions. It basically calls debuild a number of times, and I’m very happy with the result after using this module. Since the tool is just a helper, it used to build all packages one after another, but as the number of packages grows, doing the builds serially just takes too long. Since the tasks are CPU bound, threads are of no use here because of the GIL, so going multi-process is what helps.
The idea is simple: create a process pool as big as the number of cores we have available, send the hard work to it and wait until all the workers are done. Here is an example of such a use case:
[gist]https://gist.github.com/3660656[/gist]
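In case the gist does not load, here is a minimal sketch of the same idea. It is not the actual tool: build and build_all are made-up names, the executor_class parameter is only there to show that the thread and process pools share one API, and harmless placeholder commands stand in for the real debuild calls.

```python
import concurrent.futures
import multiprocessing
import subprocess
import sys


def build(command):
    """Run one build command and return its exit code."""
    # Hypothetical helper: the real tool would invoke debuild here.
    return subprocess.call(command)


def build_all(commands, executor_class=concurrent.futures.ProcessPoolExecutor):
    """Run all commands in a pool with one worker per core."""
    workers = multiprocessing.cpu_count()
    with executor_class(max_workers=workers) as executor:
        # map() hands each command to a worker and yields the results
        # in the same order as the input; list() waits for all of them.
        return list(executor.map(build, commands))


if __name__ == "__main__":
    # Placeholder commands, assumed for illustration only.
    commands = [
        [sys.executable, "-c", "print('building package %d')" % i]
        for i in range(4)
    ]
    for command, code in zip(commands, build_all(commands)):
        print(code, command)
```

Leaving the pool creation inside the main guard matters on platforms where worker processes re-import the main module. Also, ProcessPoolExecutor already defaults to the number of processors when max_workers is not given, so the explicit cpu_count() call is just there to make the sizing visible.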
As you can see, the API is really a pleasure to use and it solved the problem effectively. :-)
:wq