Trying out concurrent.futures

2012-09-07 by saghul

A while back I learned about concurrent.futures, a module added in Python 3.2. It lets you create a pool of processes or threads, distribute work to it, and wait for that work to finish. The API is dead simple, and a backport of the module is available for Python 2.
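A minimal sketch of what that looks like, with a made-up `square` function standing in for real work (the names and the worker count are just for illustration):

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    # Hypothetical stand-in for real work; any callable can be used.
    return n * n


def squares(numbers, workers=4):
    # Leaving the with block waits for all pending work to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() distributes the calls and yields results in input order.
        return list(pool.map(square, numbers))
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` keeps the same interface, which is a big part of why the API feels so simple.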

Today I was working on a small tool we use at work to build Debian packages for a number of distributions; it basically calls debuild a number of times. I’m very happy with the result after using this module. Since the tool is just a helper, it used to build all the packages one after another, but as the number of packages grows, running the builds serially takes too long. Since the tasks are CPU bound, threads are of no use here because of the GIL, so going multi-process helps.

The idea is simple: create a process pool as big as the number of cores available, send the hard work to it, and wait until everything is done. Here is an example of such a use case:

[gist]https://gist.github.com/3660656[/gist]
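In case the gist does not load, here is a rough sketch of the same idea. It is not the exact code from the gist: `build_package`, `run_all` and the `executor_cls` parameter are names made up for illustration, and the bare `debuild` call is an assumption about how the tool gets invoked.

```python
import multiprocessing
import subprocess
from concurrent.futures import (ProcessPoolExecutor, ThreadPoolExecutor,
                                as_completed)


def build_package(path):
    """Hypothetical worker: run debuild inside one package directory."""
    # subprocess.call returns the exit status of the command.
    return subprocess.call(["debuild"], cwd=path)


def run_all(worker, items, executor_cls=ProcessPoolExecutor):
    """Run worker(item) for every item in a pool and collect the results.

    executor_cls is an assumption added for illustration; passing
    ThreadPoolExecutor works too, since both share the same interface.
    """
    results = {}
    # One worker per available core.
    with executor_cls(max_workers=multiprocessing.cpu_count()) as pool:
        # Map each future back to the item it was submitted for.
        futures = {pool.submit(worker, item): item for item in items}
        # as_completed yields futures as they finish, in any order.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Calling `run_all(build_package, ["pkg-a", "pkg-b"])` (placeholder directory names) would return a dict mapping each directory to debuild's exit status. With a process pool, make that call from under an `if __name__ == "__main__":` guard so it stays safe on platforms that spawn worker processes instead of forking.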

As you can see, the API is really a pleasure to use, and it solved the problem effectively. :-)

:wq

3 thoughts on “Trying out concurrent.futures”

Juanlu001 says:

2013-09-06 at 23:38

I don’t fully understand the advantages of using `ProcessPoolExecutor` versus the old `Pool` in multiprocessing. Consider these two pieces of code, adapted from yours:

https://gist.github.com/Juanlu001/6470913

On my computer:

[juanlu@nebulae]─[~/Development/Python/test]
└[00:35:30] $ time python pool_multiprocessing.py

real 0m0.190s
user 0m0.217s
sys 0m0.020s
[juanlu@nebulae]─[~/Development/Python/test]
└[00:35:31] $ time python pool_futures.py

real 0m2.426s
user 0m3.400s
sys 0m0.577s

I didn’t understand the advantages, and now I understand the performance even less. Trying to run cProfile on them gives strange errors, so I couldn’t figure out the issue. What do you think?

1. Juanlu001 says:
  
  2013-09-06 at 23:45
  
  I made a change so now they look almost identical, and the performance of pool_futures.py is still very poor:
  
  https://gist.github.com/Juanlu001/6470913
  
2. saghul says:
  
  2013-09-06 at 23:46
  
  To be honest, I have never used multiprocessing.Pool myself :-S I’m not sure how much overhead futures add here, but you might have found a nasty bug. They should perform equally, I guess.
  