Trying out concurrent.futures

A while back I learned about concurrent.futures, a new module included in Python 3.2. It allows you to create a pool of processes or threads to which you can distribute work and then wait for it to finish. The API is dead simple, and a backport of the module exists for Python 2.

Today I was working on a small tool we have at work that helps us build Debian packages for a number of distributions; it basically calls debuild a number of times, and I’m very happy with the result after using this module. Since the tool is just a helper, it used to build all packages one after another, but as the number of packages grows, doing the work serially just takes too long. The tasks are CPU-bound, so threads are of no use here because of the GIL, but going multi-process helps.

The idea is simple: create a process pool as big as the number of cores we have available, send the hard work to it and wait until it is all done. Here is an example of such a use case:
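The original snippet is not preserved here, so below is a minimal sketch of the approach described above. The package names and the `build_package` helper are hypothetical stand-ins; in the real tool the helper would shell out to debuild instead.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

# Hypothetical list of packages to build; the real tool would run
# debuild for each one.
PACKAGES = ["foo", "bar", "baz"]

def build_package(name):
    # Placeholder for the actual build, e.g.:
    #   subprocess.check_call(["debuild", ...], cwd=name)
    return "%s built" % name

def main():
    # With no max_workers argument, ProcessPoolExecutor creates one
    # worker process per available core.
    with ProcessPoolExecutor() as executor:
        # Submit every build job, then collect results as they finish.
        futures = [executor.submit(build_package, pkg) for pkg in PACKAGES]
        for future in as_completed(futures):
            print(future.result())

if __name__ == "__main__":
    main()
```

`executor.submit()` returns a `Future` immediately, and `as_completed()` yields each one as soon as its result is ready, so slow builds don't block the reporting of fast ones.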

As can be seen, the API is really a pleasure to use and it did solve the problem effectively. :-)


3 thoughts on “Trying out concurrent.futures”

  1. I don’t fully understand the advantages of using `ProcessPoolExecutor` versus the old `Pool` in multiprocessing. Consider these two pieces of code, adapted from yours:

    On my computer:

    └[00:35:30] $ time python

    real 0m0.190s
    user 0m0.217s
    sys 0m0.020s
    └[00:35:31] $ time python

    real 0m2.426s
    user 0m3.400s
    sys 0m0.577s

    I didn’t understand the advantages, and now I understand the performance even less. Trying to run cProfile on them gives strange errors, so I couldn’t figure out the issue. What do you think?

    1. To be honest, I’ve never used multiprocessing.Pool myself :-S I’m not sure how much overhead futures add here, but you might have found a nasty bug; they should perform equally, I guess.
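The commenter's two snippets are not preserved, but a comparison like the one described might be sketched as follows. The `square` workload and the two wrapper functions are hypothetical stand-ins, not the commenter's actual code; both pools use their default worker count (one per core).

```python
import time
from multiprocessing import Pool
from concurrent.futures import ProcessPoolExecutor

def square(n):
    # Trivial stand-in workload; a real benchmark would use a
    # CPU-bound task.
    return n * n

def with_pool(numbers):
    # The older multiprocessing.Pool API.
    with Pool() as pool:
        return pool.map(square, numbers)

def with_executor(numbers):
    # The concurrent.futures equivalent; Executor.map is lazy,
    # so force it with list().
    with ProcessPoolExecutor() as executor:
        return list(executor.map(square, numbers))

if __name__ == "__main__":
    numbers = list(range(1000))
    for fn in (with_pool, with_executor):
        start = time.time()
        fn(numbers)
        print(fn.__name__, time.time() - start)
```

Both calls produce identical results; any timing gap between them comes down to pool start-up cost and how each implementation chunks the iterable across workers.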

Leave a Reply