Python data structures

Last night I was browsing pyvideo and found some really interesting videos on Python built-in data structures and their advanced usage.

Here are the three that I liked the most:

  • “The Mighty Dictionary” by Brandon Rhodes
  • “Mastering Team Play: Four powerful examples of composing Python tools” by Raymond Hettinger
  • “The Data Structures of Python” by Alex Gaynor

If you haven’t watched them yet, you should.


pythonz: a Python installation manager

It’s been a while since I last posted anything around here, so it’s about time :-)

Today I’m releasing pythonz, a Python installation manager that works with CPython, Stackless and PyPy.

It’s a fork of pythonbrew with some features removed, some bug fixes and some new features. The main goal of this project is to provide a way to install different Python interpreters. Just that. How to pick one and use it afterwards is out of the scope of the project; for that I recommend virtualenv and virtualenvwrapper.

Let’s see it in action!

First, let’s install pythonz:

curl -kL | bash

Once pythonz is installed, add the following line to your .bashrc:

[[ -s $HOME/.pythonz/etc/bashrc ]] && source $HOME/.pythonz/etc/bashrc

Also, let’s source it now so we can start using it right away:

source ~/.pythonz/etc/bashrc

Let’s install some Pythons!

pythonz install --type cpython --no-test 2.7.2
pythonz install --type stackless --no-test 2.7.2
pythonz install --type pypy --url 1.8

With the above commands you’ll get CPython 2.7.2, Stackless 2.7.2 and PyPy 1.8 installed on your system.

Note that PyPy is installed in binary form, so you’ll need to pass the correct URL, matching the desired version and your architecture.

Now let’s create some virtualenvs with the Python versions we just installed:

mkvirtualenv -p ~/.pythonz/pythons/CPython-2.7.2/bin/python cpython272
mkvirtualenv -p ~/.pythonz/pythons/Stackless-2.7.2/bin/python stackless272
mkvirtualenv -p ~/.pythonz/pythons/PyPy-1.8/bin/python pypy18

To use one of the newly created virtualenvs, we can now use workon and deactivate as usual:

saghul@hal:~$ workon pypy18
(pypy18)saghul@hal:~$ python
Python 2.7.2 (0e28b379d8b3, Feb 09 2012, 18:31:14)
[PyPy 1.8.0 with GCC 4.2.1] on darwin
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``Therefore, specific information,
I was in an ideal context, I had to realize the faith''
(pypy18)saghul@hal:~$ deactivate

Hope you enjoy using it as much as I do :-)

Get the source on GitHub!


pyuv 0.2.0 released

It’s been a while since I released pyuv. In the meantime the libuv guys have been really busy adding lots of cool stuff, and with this release pyuv now wraps most parts of libuv. Here is a short changelog highlighting the important additions:

  • Made the default loop a singleton
  • Added TTY handle
  • Moved all exception definitions to a standalone file
  • Added set_membership function to UDP handle
  • Added ability to write a list of strings to IOStream objects
  • Added ability to send lists of strings on UDP handles
  • Added open function to Pipe handle
  • Added Process handle
  • Added ‘data’ attribute to all handles for storing arbitrary objects
  • Refactored ThreadPool
  • Implemented pending_instances function on Pipe handle
  • Implemented nodelay, keepalive and simultaneous_accepts functions
    on TCP handle
  • Added ‘counters’ attribute to Loop
  • Added ‘poll’ function to Loop
  • Added new functions to fs module: unlink, mkdir, rmdir, rename,
    chmod, fchmod, link, symlink, readlink, chown, fchown, fstat
  • Added new functions to util module: uptime, get_process_title,
    set_process_title, resident_set_size, interface_addresses, cpu_info

For the next release I plan to add the missing asynchronous filesystem operations and to work on Windows support (I even got a pull request to start with, yay!). Stay tuned!

Check it out!



pyuv: Python bindings for libuv

After working on this on and off in my free time, I’m happy to share it. pyuv is a Python interface to libuv, a high-performance, portable library for asynchronous network communication and more. libuv is the platform layer for the well-known NodeJS.

pyuv implements the following features:

  • Asynchronous TCP
  • UDP
  • Named pipes
  • Asynchronous DNS resolver
  • Timers
  • Running operations in a ThreadPool
  • Async handle
  • Prepare, idle and check handles
  • Signal handle
  • System memory information
  • System load information
  • Getting executable path
  • Asynchronous filesystem operations (stat, lstat)

libuv implements more features, which I plan to wrap as time allows. In the meantime, grab the source and have a look at the examples in the GitHub repository.

Documentation is available through ReadTheDocs here.

Hope you like it! :-)


A ZeroMQ to WebSocket gateway, take 2

A while ago I posted a way to build a gateway between ZeroMQ and WebSocket. That mechanism required a custom WebSocket server that had to be maintained, and I wasn’t really happy with it anyway, so I thought I’d find another way.

Moreover, WebSocket alone is probably not enough. Since the WebSocket specification has changed a lot, different browsers have implemented different versions of the draft. WebSocket could even be disabled, God knows why, so we need a fallback mechanism. And then I discovered SocketIO. With SocketIO you can establish a persistent connection between a browser and a server in many different ways: WebSocket, Flash socket, long polling, etc. This is great stuff.

If I’m not mistaken, SocketIO was born for NodeJS but was later ported to other programming languages and frameworks. The one I’m particularly interested in is TornadIO, a SocketIO implementation on top of Tornado. In case you have lived in a cave for the last several years ;-) you should know that Tornado is the web server that powered FriendFeed, which was acquired by Facebook.

So we now have a nice persistent connection between the browser (using the SocketIO JavaScript library) and our server (using TornadIO), but how do we add ZeroMQ to the mix? It couldn’t be easier: pyzmq, the Python binding for ZeroMQ, provides a way to integrate its event loop with Tornado’s, so there is really nothing else to be done to integrate them. :-)

Here is an example of a ZeroMQ to WebSocket (SocketIO, really) gateway built as described above. The client connects to the server through SocketIO (using the web browser) and just sits there waiting for data. The server pushes anything that arrives through the ZeroMQ socket to all clients connected through SocketIO, and the messages simply get printed on the screen. It’s a simple example, but you get the idea ;-)



Experimenting with metaclasses and thread-local storage

Today I found myself thinking about a straightforward way of having “thread-local singletons”, that is, only one instance of a given class per thread. I don’t have a clear use case for this, but I think it could be applied to something like a “one event loop per thread” concept, for example.

This can be done easily by saving an instance in a module-level variable stored in a threading.local container, but I wanted to explicitly try to create the object and always get the same instance back, as long as I’m on the same thread.
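For comparison, the module-level threading.local approach looks roughly like this. It is a minimal sketch; EventLoop and get_loop are hypothetical names, not part of any library:

```python
import threading

# Module-level container: every thread sees its own set of attributes on it.
_local = threading.local()


class EventLoop:
    """Hypothetical stand-in for a per-thread event loop."""


def get_loop():
    """Return this thread's EventLoop, creating it on first use."""
    loop = getattr(_local, "loop", None)
    if loop is None:
        loop = _local.loop = EventLoop()
    return loop
```

It works, but callers must remember to use get_loop() instead of instantiating the class directly, which is exactly what the metaclass approach avoids.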

Here is the solution I came up with:


A metaclass is used to either create the new class instance or return the existing one from the thread-local storage. Since the class will only be instantiated once (the first time), I have chosen to ignore all initialization arguments; otherwise the code would be misleading, because whatever arguments were passed, the same object would be returned. This could be improved by keeping a mapping between tuples of initialization arguments and object instances, though.
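A minimal sketch of such a metaclass, written for Python 3 (on Python 2 you would set __metaclass__ in the class body instead). ThreadLocalSingleton and Loop are hypothetical names, and this version simply accepts no constructor arguments, in the spirit of ignoring them:

```python
import threading


class ThreadLocalSingleton(type):
    """Metaclass: each class using it gets at most one instance per thread."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # A separate thread-local container for every class built by this metaclass.
        cls._tls = threading.local()

    def __call__(cls):
        # No constructor arguments: the same object comes back on every call
        # made from the same thread, so arguments would have no effect anyway.
        instance = getattr(cls._tls, "instance", None)
        if instance is None:
            instance = super().__call__()
            cls._tls.instance = instance
        return instance


class Loop(metaclass=ThreadLocalSingleton):
    """Hypothetical example class: one Loop per thread."""
```

Calling Loop() twice in one thread returns the same object, while a second thread gets its own.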

Anyhow, this was a 5-minute quick-and-dirty hack, but do let me know if you hate it a lot :-)


Don’t forget __ne__

Yesterday, while going through some code with a coworker, we lost 5 minutes trying to find out why 2 objects that looked the same were not evaluated as equal. We did have the __eq__ method defined on the class, yet the comparison was evaluating to False. Why?

Actually, we were not testing for equality but for difference, so I thought: “hmm, is __ne__ required in this case?”. It is. I felt stupid for not remembering it, but fortunately the Python documentation contains a great explanation:

There are no implied relationships among the comparison operators. The truth of x==y does not imply that x!=y is false. Accordingly, when defining __eq__(), one should also define __ne__() so that the operators will behave as expected.

Let’s see a quick example showing the failure, and hopefully remember it for next time:
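A minimal sketch of a class that defines both methods (Point is a hypothetical name). The comment in __ne__ marks what goes wrong without it on Python 2. Note that Python 3 changed the default: there, __ne__ delegates to __eq__ and inverts the result unless it is NotImplemented, so this particular pitfall is specific to Python 2.

```python
class Point(object):
    """Hypothetical value object compared by its coordinates."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __ne__(self, other):
        # Without this method, Python 2 falls back to identity comparison,
        # so Point(1, 2) != Point(1, 2) would evaluate to True.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result
```

Returning NotImplemented (rather than False) for foreign types lets Python try the reflected operation on the other operand.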



Implementing a ZeroMQ – WebSocket gateway

In the last post we saw a simple PubSub application using ZeroMQ. Today we are going to extend that and publish all the data we get from the ZeroMQ publishers to web clients by using WebSocket.

The first problem I ran into was choosing a WebSocket server to suit my needs. I had a look at Eventlet, Gevent, txWebsocket and some others, but since what I wanted was quite simple I decided to write my own, in order to avoid such dependencies. Connecting both worlds (ZeroMQ and WebSocket) seemed like the big problem, so I thought the best way to solve it would be to put all sockets in a polling loop running on its own thread, thus avoiding possible multithreading issues. This works thanks to the ZeroMQ Poller, which is able to poll ZeroMQ sockets and anything that defines a fileno() method. Awesome stuff.

Since the single-threaded polling experiment worked, I decided to turn it into a small standalone project (without ZeroMQ, using select.poll instead) for possible future use. You can find it here.
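The single-threaded polling idea can be sketched with the standard library alone. Here the selectors module (which picks poll, epoll or kqueue where available) stands in for the ZeroMQ Poller, and plain sockets stand in for the ZeroMQ socket and the WebSocket clients; relay_once is a hypothetical helper, not code from the project:

```python
import selectors
import socket


def relay_once(sel, source, clients, timeout=1.0):
    """Wait up to `timeout` seconds for data on `source` and copy it to every client.

    `sel` is a selector on which `source` is registered for reading. Returns the
    bytes relayed, or b"" if nothing arrived in time. (A b"" from recv() would
    mean the source was closed; a real gateway should unregister it then.)
    """
    for key, _events in sel.select(timeout):
        # Anything with a fileno() method can be registered, not only sockets.
        if key.fileobj is source:
            data = source.recv(4096)
            for client in clients:
                # One-way gateway: we only write to clients, never read from them.
                client.sendall(data)
            return data
    return b""
```

Calling relay_once in a loop from a single thread gives the broadcast behaviour described here: every socket is touched only by that thread, so no locking is needed.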

Back to our gateway, let’s check the code:



As you may have guessed, it’s the same simple multicast producer we saw in the previous post.

client web page:


consumer and WebSocket server:


I had never used poll or the ZeroMQ Poller before, but both were pretty easy to get started with. In a nutshell, the server takes every piece of data it gets from the ZeroMQ socket and broadcasts it to all the connected web clients through WebSocket. The gateway is one-way only: input coming from the WebSockets is not sent anywhere. I’ll leave that as an exercise for the reader. :-)

To test it, just run the server and serve the page. You can use this to serve the page.


Implementing a PubSub based application with Python and ZeroMQ

A while back I came across this awesome post by Nicholas Piël about ZeroMQ, and I added “test ZeroMQ” to my never-ending ToCheck list. Today I found an excuse to check it out, and here is what I did with it.

I wanted to implement a producer-consumer application to aggregate data contained in an arbitrary number of files on an arbitrary number of nodes. ZeroMQ seemed like a good tool for this, so let’s see a hello world example.

producer:




As you can see, the code is very straightforward. We create a socket, declare it as a publisher (PUB) or subscriber (SUB) accordingly and, in the case of the subscriber, subscribe to the topics we care about.
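To make the subscription step concrete: a ZeroMQ SUB socket keeps a message only if it starts with one of the subscribed byte prefixes; an empty subscription matches everything, and a SUB socket with no subscriptions receives nothing. The toy classes below are hypothetical, pure-Python stand-ins (no ZeroMQ involved) that model only this filtering rule:

```python
class Subscriber:
    """Toy model of a SUB socket's topic filter (hypothetical, in-process)."""

    def __init__(self):
        self.prefixes = []
        self.received = []

    def subscribe(self, prefix):
        self.prefixes.append(prefix)

    def deliver(self, message):
        # Prefix match on raw bytes; b"" is a prefix of every message.
        if any(message.startswith(prefix) for prefix in self.prefixes):
            self.received.append(message)


class Publisher:
    """Toy model of a PUB socket: hands every message to every subscriber."""

    def __init__(self):
        self.subscribers = []

    def send(self, message):
        for subscriber in self.subscribers:
            subscriber.deliver(message)
```

The real sockets do this across processes and machines, but the rule deciding what a subscriber sees is the same prefix check.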

It looks like we are writing to or reading from a standard socket, but this socket is on steroids: you can write to it even if the other end is not connected, all the queuing logic is taken care of for you, it accepts multiple connections, and many other things.

UPDATE: After re-reading the docs, it seems that ZeroMQ sockets are not really thread-safe unless you pass them between threads using a “full memory barrier”. I guess the fact that Python has the GIL is what keeps my example from crashing :-)

ZeroMQ is also thread-safe, and this makes our code a lot simpler than it would be if ZeroMQ weren’t thread-safe. Let’s see an example with 10 threads simultaneously sending messages through a ZeroMQ socket without any locking mechanism.
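Given the thread-safety caveat in the update above, a safer variant is to let the 10 threads hand their messages to a single thread that owns the socket, through a thread-safe queue.Queue. This sketch swaps in that pattern rather than sharing the socket; run_producers and send are hypothetical names, and a list’s append method stands in for a ZeroMQ socket’s send:

```python
import queue
import threading


def run_producers(send, n_threads=10, per_thread=100):
    """Have n_threads producers funnel messages to one sender thread.

    `send` is only ever called from the sender thread, so whatever it wraps
    (for example a ZeroMQ socket) is never shared between threads.
    """
    q = queue.Queue()
    stop = object()  # sentinel telling the sender thread to exit

    def producer(i):
        for j in range(per_thread):
            q.put(f"thread {i} message {j}")

    def sender():
        while True:
            item = q.get()
            if item is stop:
                break
            send(item)

    sender_thread = threading.Thread(target=sender)
    sender_thread.start()
    producers = [threading.Thread(target=producer, args=(i,)) for i in range(n_threads)]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    # All producers are done, so the sentinel is the last item in the queue.
    q.put(stop)
    sender_thread.join()
```

The queue does the locking, so the producer code stays as simple as in the lock-free version.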


The approach we took is very flexible in the sense that any number of producers may connect to our consumer, but they need to know where (in terms of IP and port) our consumer is or will be. Could we change that somehow?

Yes, we can. We can use multicast for this: all producers join a multicast group and send their messages there, and the consumer joins the same group to receive them. Of course, this only makes sense in a controlled environment where the data is not critical or where no unauthorized party is able to join the multicast group.
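ZeroMQ handles multicast through its PGM-based transports, which this sketch does not use. To show what joining a group means at the socket level, here is a plain UDP version using only the standard library; the group address 239.192.1.1, port 5555 and all function names are hypothetical:

```python
import socket

GROUP = "239.192.1.1"  # hypothetical administratively scoped multicast group
PORT = 5555            # hypothetical port


def membership_request(group, interface="0.0.0.0"):
    """Pack an ip_mreq: the group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_receiver(group=GROUP, port=PORT):
    """Consumer side: a UDP socket bound to `port` that has joined `group`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Join the group; interface 0.0.0.0 lets the kernel choose which one.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return sock


def open_sender(ttl=1):
    """Producer side: a UDP socket whose datagrams stay on the local network (ttl=1)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock
```

A producer would then call sender.sendto(b"data", (GROUP, PORT)) and the consumer receiver.recvfrom(4096). Neither side needs to know the other’s address, only the group, which is the property that makes this attractive here.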

The funny thing is that we only need to change one line of code in the producer and another one in the consumer in order to make them work in multicast mode.





I’ve started a small project I’ll be working on in my free time, hope to post something about it soon :–)