greenlet local storage on greenlet 0.4.0

Greenlet 0.4.0 brought an interesting new feature: an instance dictionary on each greenlet object, which makes it a lot simpler to implement greenlet local storage. Here is how greenlet local storage is currently implemented in Eventlet and in Gevent.

As can be seen, the implementation is not particularly straightforward, mainly because the actual data needs to be stored in a separate entity and mapped to each greenlet.

Thanks to the instance dictionary added in 0.4.0, we can use an attribute on the greenlet itself to keep the locally stored objects. The plan is to use a dictionary called __local_dict__ and store the greenlet local attributes there. Here is what it looks like:

[gist]https://gist.github.com/4151154[/gist]
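For readers who cannot open the gist, here is a minimal sketch of the idea. It is not the gist's code. The class name Local and the per-instance keying by id() are my own assumptions, and the fallback to threading.current_thread is only there so the sketch runs without greenlet installed (thread objects have an instance dictionary too).

```python
try:
    # Needs greenlet >= 0.4.0, where greenlet objects gained an instance dict.
    from greenlet import getcurrent
except ImportError:
    # Stand-in for illustration only: Thread objects also carry a __dict__.
    from threading import current_thread as getcurrent


class Local(object):
    """Attribute container whose contents are private to the current greenlet."""

    def _storage(self):
        # One __local_dict__ per greenlet, stored in the greenlet's own
        # instance dictionary. Keying by id(self) (an assumption of this
        # sketch) keeps separate Local instances from sharing attributes.
        per_greenlet = getcurrent().__dict__.setdefault("__local_dict__", {})
        return per_greenlet.setdefault(id(self), {})

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self._storage()[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self._storage()[name] = value

    def __delattr__(self, name):
        try:
            del self._storage()[name]
        except KeyError:
            raise AttributeError(name)


request_state = Local()
request_state.user = "alice"   # visible only from the current greenlet
```

Because the data lives on the greenlet object itself, it goes away together with the greenlet, and no separate mapping from greenlets to values has to be maintained.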

Hope it’s of use.

:wq

Using functools.partial instead of saving arguments

I’m a big fan of functools.partial myself. It allows you to take a function, preset some of its arguments and get back another callable which you can call with more arguments. Strictly speaking this is partial application rather than currying, but I’ve heard people refer to functools.partial like that.
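A small illustration of what that means in practice. The function power and the names square and cube are made up for this example.

```python
from functools import partial


def power(base, exponent):
    return base ** exponent


# Preset the keyword argument; the positional one is supplied later.
square = partial(power, exponent=2)
cube = partial(power, exponent=3)

print(square(4))  # 16
print(cube(2))    # 8

# A partial object remembers what it wraps.
print(square.func is power)  # True
print(square.keywords)       # {'exponent': 2}
```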

Today while browsing the source code for the futures module I came across this class:

[gist]https://gist.github.com/4048428[/gist]

The moment I saw that func, args and kwargs were saved as attributes in the instance I thought: “why not use partial and save a single attribute?”. Then I thought that maybe performance had something to do with it, so I wrote a dead simple test to check it out:

[gist]https://gist.github.com/4048405[/gist]
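In case the gists are not at hand, the comparison can be sketched roughly like this. The names WorkItem, WorkItemPartial and do_test come from the timing output in this post, but the bodies are my reconstruction and may differ from the original test; the helper add is purely hypothetical.

```python
from functools import partial


class WorkItem(object):
    """Keeps the callable and its arguments as three separate attributes."""

    def __init__(self, fn, *args, **kwargs):
        self.fn = fn
        self.args = args
        self.kwargs = kwargs

    def run(self):
        return self.fn(*self.args, **self.kwargs)


class WorkItemPartial(object):
    """Keeps a single attribute: the callable with its arguments bound."""

    def __init__(self, fn, *args, **kwargs):
        self.fn = partial(fn, *args, **kwargs)

    def run(self):
        return self.fn()


def add(a, b=0):
    # Hypothetical workload, just something cheap to call.
    return a + b


def do_test(cls):
    # Create one work item and run it, which is what gets timed.
    return cls(add, 1, b=2).run()
```

Both classes behave the same from the outside, so do_test(WorkItem) and do_test(WorkItemPartial) return the same value and only the timings differ.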

Here are the results using CPython 2.7.3:

In [3]: %timeit testpartial.do_test(testpartial.WorkItem)
100000 loops, best of 3: 5.26 us per loop

In [4]: %timeit testpartial.do_test(testpartial.WorkItemPartial)
100000 loops, best of 3: 2.65 us per loop

We are talking microseconds here, but the version with partial is still almost twice as fast.

Now, let’s see how PyPy performs:

In [9]: %timeit -n 10000 testpartial.do_test(testpartial.WorkItem)
10000 loops, best of 3: 154 ns per loop
Compiler time: 554050.78 s

In [10]: %timeit -n 10000 testpartial.do_test(testpartial.WorkItemPartial)
10000 loops, best of 3: 2.49 us per loop

Fun, the version without partial goes into the nanoseconds! And the one with partial barely improves over CPython. Interesting.

So what’s the takeaway here? Well, whenever I see fit I use partial: the code looks nicer and, on CPython at least, it’s apparently faster, so why not? 🙂

:wq

Fast(er) locks in Python?

While searching for some information on Python locks I recently ran across this great post by David Beazley. In it, he explains how the synchronization primitives in the Python standard library are implemented. Basically, Lock is implemented as a binary semaphore in C, and the rest are implemented in pure Python. Even though the post is from 2009, this is still the case. UPDATE: As Antoine Pitrou points out in the comments, starting with CPython 3.2, RLock is now implemented in C.

This got me thinking. As you know I’ve created pyuv, a Python wrapper for libuv, and libuv includes cross-platform implementations of mutexes, semaphores, conditions, rwlocks and barriers, which I never bothered to add to pyuv. I just didn’t add them because I thought they didn’t add any value, but after reading David’s article I decided to do a quick test: implement wrappers for a mutex and a condition variable and use them in a Queue implementation in order to see if there was any difference in performance. Not that I ever ran into performance issues related to that, but I was curious anyway 🙂

Someone may think “oh, but given that Python has the GIL, how does using multiple threads and speeding up the locks matter?”. The GIL is released whenever a blocking I/O operation is performed, so if your Python application is multithreaded and mainly deals with I/O-bound tasks, the GIL is not that relevant. If your application is CPU bound, however, you had better have a look at the multiprocessing module.
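A quick way to convince yourself of this is the sketch below. It uses time.sleep as a stand-in for a blocking I/O call (sleep releases the GIL just like a blocking read would); the function names and the numbers are arbitrary.

```python
import threading
import time


def fake_io(delay):
    # Stand-in for a blocking I/O call: the GIL is released while waiting.
    time.sleep(delay)


def run_threads(count, delay):
    """Run `count` threads that each block for `delay` seconds; return wall time."""
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(count)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


elapsed = run_threads(4, 0.25)
# The four waits overlap, so this is close to 0.25 s rather than 1 s.
print("4 threads took %.2f s" % elapsed)
```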

So, let’s get into the code! I implemented Barrier, Condition, Mutex, RWLock and Semaphore in this pyuv branch, which directly wrap their libuv counterparts. Then I copied the Queue implementation from the standard library and used the freshly wrapped synchronization primitives:

[gist]https://gist.github.com/3997233[/gist]
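To show where the mutex and the condition variable come into play, here is a stripped-down queue. It is only a sketch: it uses threading.Lock and threading.Condition as stand-ins, since I’m not reproducing the API of the pyuv wrappers here, and TinyQueue is a made-up name that is far simpler than the Queue in the gist.

```python
import threading
from collections import deque


class TinyQueue(object):
    """Unbounded FIFO queue guarded by one mutex and one condition."""

    def __init__(self):
        self._items = deque()
        # Stand-ins: the experiment swaps these two for libuv-backed wrappers.
        self._mutex = threading.Lock()
        self._not_empty = threading.Condition(self._mutex)

    def put(self, item):
        with self._mutex:
            self._items.append(item)
            # Wake up one consumer blocked in get().
            self._not_empty.notify()

    def get(self):
        with self._not_empty:
            # Loop to guard against spurious or stolen wakeups.
            while not self._items:
                self._not_empty.wait()
            return self._items.popleft()
```

Every put and get takes the mutex, and every blocked get waits on the condition, so the cost of those two primitives is paid on each operation. That is why faster locks can show up in a Queue benchmark.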

For the testing part, I used the timeit function from IPython with 5 runs. Not sure if it’s the best way, but results suggest it is a good way 🙂 Here is the test script:

[gist]https://gist.github.com/3997244[/gist]

Here are the results:

The tests were run with 2, 4 and 100 threads, and since I was testing performance, I added PyPy to the mix. Now, as you can see in the results, the custom Queue is about 33% faster than the one in the standard library, so I was pretty happy about that. Until I tested PyPy. It just beats the shit out of both, which is awesome 🙂

The performance increase on CPython is nice. There is a downside, however: libuv handles errors in these primitives a bit “abruptly” by calling abort(). This means that if you use the locks incorrectly your program will core dump. I personally like it, because it helps you find and fix the problem right away, but not everyone may like it.

:wq

 

uvent: a gevent core implemented using libuv

After working on it on and off for a few weeks, today I’m open sourcing uvent. uvent is a gevent core implementation using libuv.

I’ve written about libuv earlier in this blog, but in short: libuv is the new black. Especially when we are talking about asynchronous IO.

uvent is still experimental, but results are encouraging: not all tests are failing! :–) I think libuv is the perfect match for a library such as gevent. We’ll see where this goes.

The code can be checked out here, and for those who want to know the scary details I wrote a short document with some notes here.

Hope you like it!

:wq

Interactive Python interpreter over TCP for Tornado applications

The Python standard library never ceases to surprise me. I knew about Twisted’s manhole but yesterday I ran across the code module, which is in the Python standard library.

The code module allows you to create a fully fledged Python REPL. So, in order to try it out I decided to implement a “backdoor” server for Tornado applications.

The idea is pretty simple: use Tornado’s TCPServer to serve a Python interpreter. Of course you should only do this on localhost! This can come in really useful for debugging, as it doesn’t take any resources from the application, and allows for live inspection.

The result is here on GitHub.

Here is an example, hope you like it!

[gist]https://gist.github.com/3739834[/gist]
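Leaving Tornado aside, the heart of such a backdoor is feeding lines into a code.InteractiveConsole and collecting what it prints. The sketch below shows only that part, under my own assumptions: CapturingConsole and feed are made-up names, and a real server would send the captured output back over the socket instead of keeping it in a buffer.

```python
import code
import contextlib
import io


class CapturingConsole(code.InteractiveConsole):
    """Interactive console that collects its output in a string buffer."""

    def __init__(self, namespace=None):
        code.InteractiveConsole.__init__(self, namespace)
        self.output = io.StringIO()

    def write(self, data):
        # InteractiveConsole calls write() for tracebacks and syntax errors.
        self.output.write(data)

    def feed(self, line):
        """Push one line of source; return True if more input is needed."""
        with contextlib.redirect_stdout(self.output):
            return self.push(line)


# Expose some application state to the console, as a backdoor would.
console = CapturingConsole({"answer": 42})
console.feed("answer + 1")          # evaluated like at the >>> prompt
needs_more = console.feed("def f():")  # incomplete block, so more input is needed
```

The return value of feed tells the server which prompt to show next: the regular one after a complete statement, or the continuation prompt while a block is still open.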

:wq

Trying out concurrent.futures

A while back I learned about concurrent.futures, a new module included in Python 3.2. It allows you to create a pool of processes or threads to which you can distribute work and then wait for it to finish. The API is dead simple and there is a backport of the module for Python 2 here.

Today I was working on a small tool we have at work to help us build Debian packages for a number of distributions, which basically calls debuild a number of times, and I’m very happy with the result after using this module. Since the tool is just a helper, it used to build all packages one after another, but as the number of packages grows, doing the operations serially just takes too long. Since the tasks are CPU bound, threads are of no use here because of the GIL, so going multi-process can help.

The idea is simple: create a process pool as big as the number of cores we have available, send the hard work to them and wait until they are all done. Here is an example of such a use case:

[gist]https://gist.github.com/3660656[/gist]
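The general shape of that approach can be sketched as follows. This is not the tool’s code: build_package is a made-up stand-in that burns some CPU instead of calling debuild, and the executor_cls parameter is my own addition so that the pool type can be swapped.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed


def build_package(distribution):
    """Stand-in for the real work (the actual tool would invoke debuild)."""
    checksum = sum(i * i for i in range(1000))  # pretend to be CPU bound
    return distribution, checksum


def build_all(distributions, executor_cls=ProcessPoolExecutor):
    """Run one build per distribution in a pool sized to the number of cores."""
    results = {}
    with executor_cls(max_workers=os.cpu_count()) as executor:
        futures = [executor.submit(build_package, name) for name in distributions]
        # Collect results in completion order, not submission order.
        for future in as_completed(futures):
            name, outcome = future.result()
            results[name] = outcome
    return results
```

With the default process pool, call build_all from under an `if __name__ == "__main__":` guard, since worker processes may re-import the main module. Passing executor_cls=ThreadPoolExecutor runs the same code on threads, which is handy for comparing the two.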

As can be seen, the API is really a pleasure to use and it did solve the problem effectively. :–)

:wq

pycares: Python bindings for c-ares, released!

Hi! As you all probably know, DNS lookups can get tricky at times, since they block. c-ares is a completely asynchronous DNS resolver written in C, and pycares exports its functionality to Python.

After giving the idea some thought I decided to remove the c-ares based resolver which I had implemented in pyuv and created a standalone package instead.

This way, the implementation is completely independent and it can be used with any asynchronous library like Eventlet, Gevent, pyev, …

Since the c-ares build process can be a bit tricky and uses autotools, I decided to use a modified version of the build system used in libuv, which is based on plain Makefiles.

You can get the source code on the GitHub repository and check out the documentation here.

Hope you like it!

:wq

pyuv 0.8.0 released!

Yesterday NodeJS 0.8.0 was released, and libuv got its dedicated branch for maintenance of the 0.8.X version cycle. Since pyuv now implements all the features offered by libuv I chose to follow the same path and branch out pyuv 0.8.0.

From now on pyuv’s v0.8 branch will only get bugfixes from the corresponding libuv branch. pyuv’s master branch will use libuv master and have version number 0.9.0-dev. There will not be a release of that branch soon, I guess.

So, what’s new in pyuv 0.8.0 then?

  • FSPoll handle, for polling for file changes using the stat syscall
  • Several fixes in the fs module for Windows
  • Bugfixes that came with libuv

For version 0.9, apart from following libuv’s development I have plans to make the following changes:

  • Move the getaddrinfo function to the util module
  • Overhaul the dns module, provide a raw wrapper on top of c-ares
    instead of exposing a complete DNS resolver (I’ll cover this in a
    future blog post, when I approach the implementation)

Last but not least, I’d like to thank the NodeJS and libuv developers again. They are really doing a great job!

:wq

Integrating Twisted and Tornado with pyuv

With yesterday’s pyuv release the door was opened for integrating pyuv with other event loops or applications. Thanks to the Poll handle we can now create a regular socket in Python and put it in pyuv’s event loop, so we can also use pyuv to replace other event loops ;–)
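The pattern behind this is "hand an existing socket to the loop and get a callback when it is readable". The sketch below shows that pattern with the standard library’s selectors module as a stand-in, not with pyuv itself, so none of these calls are pyuv’s API; run_once and on_readable are made-up names.

```python
import selectors
import socket

received = []


def on_readable(sock, events):
    # Called by the loop once the socket has data waiting.
    received.append(sock.recv(1024))


def run_once(selector, timeout=1.0):
    """Wait for ready sockets and dispatch each to its registered callback."""
    handled = 0
    for key, events in selector.select(timeout):
        callback = key.data
        callback(key.fileobj, events)
        handled += 1
    return handled


# A regular Python socket pair, created outside of any event loop.
reader, writer = socket.socketpair()

selector = selectors.DefaultSelector()
selector.register(reader, selectors.EVENT_READ, on_readable)

writer.sendall(b"ping")
run_once(selector)

selector.unregister(reader)
selector.close()
reader.close()
writer.close()
```

A Tornado IOLoop or a Twisted reactor needs exactly this kind of readiness notification for sockets it does not own, which is why being able to poll arbitrary file descriptors is what makes those integrations possible.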

I created a couple of projects for toying around with this feature. The first project implements a Tornado IOLoop which runs on top of pyuv, and the second one implements a Twisted reactor on top of pyuv.

They are not feature complete yet, but basics are working and I’ll be adding more features as time allows.

Go check them out on GitHub!

:wq

pyuv 0.7.0 released!

It’s been a while since I last mentioned pyuv, so let me show you what the new version has to offer.

But first, I’d like to thank Ben Noordhuis, Bert Belder and all contributors to the libuv project for such a great job. The library is a pleasure to work with and it gets better every week!

That said, pyuv 0.7.0 is out and full of nice features, check them out here. To highlight some:

  • Ability to poll arbitrary file descriptors
  • Overhauled reference counting scheme
  • Python 3 support (was added in a previous release)
  • Windows support (was added in a previous release)
  • And more!

You can download pyuv from the GitHub repository or just install it with pip:

pip install pyuv

Documentation is available here as always.

As of right now pyuv implements all features available on libuv, which took a while to accomplish, but we are finally there :–)

I’ll be publishing a couple of projects built with pyuv shortly, so stay tuned!

:wq