Implementing a PubSub based application with Python and ZeroMQ

A while back I came across this awesome post by Nicholas Piël about ZeroMQ and added “test ZeroMQ” to my never-ending ToCheck list. Today I found an excuse to check it out, and here is what I did with it.

I wanted to implement a producer-consumer application to aggregate data contained in an arbitrary number of files on an arbitrary number of nodes. ZeroMQ seemed like a good tool for this, so let’s look at a hello world example.

producer:

[gist]https://gist.github.com/1046936[/gist]

consumer:

[gist]https://gist.github.com/1046938[/gist]

As you can see, the code is very straightforward. We create a socket, declare it as a publisher (PUB) or subscriber (SUB) accordingly and, in the case of the subscriber, subscribe to the topics we care about.
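Since the gists live elsewhere, here is a minimal, self-contained sketch of the same PUB/SUB idea with a current pyzmq on Python 3. It is not the code from the gists: the topic names, the localhost TCP endpoint and the half-second sleep are my own assumptions. The sleep is there because a subscription takes a moment to reach the publisher, and a PUB socket silently drops messages for subscribers it doesn’t know about yet (the “slow joiner” effect).

```python
import time

import zmq


def pubsub_demo(messages, topic):
    """Publish `messages` and return the ones a subscriber to `topic` receives."""
    ctx = zmq.Context()

    # Publisher: bind to a free local TCP port.
    pub = ctx.socket(zmq.PUB)
    port = pub.bind_to_random_port("tcp://127.0.0.1")

    # Subscriber: connect and subscribe. Filtering is a prefix match
    # on the beginning of each message.
    sub = ctx.socket(zmq.SUB)
    sub.connect(f"tcp://127.0.0.1:{port}")
    sub.setsockopt_string(zmq.SUBSCRIBE, topic)

    # Give the subscription time to propagate, otherwise the first
    # messages may be dropped by the PUB socket.
    time.sleep(0.5)

    for message in messages:
        pub.send_string(message)

    received = []
    # poll() returns 0 once nothing has arrived for 500 ms.
    while sub.poll(timeout=500):
        received.append(sub.recv_string())

    pub.close(linger=0)
    sub.close(linger=0)
    ctx.term()
    return received


if __name__ == "__main__":
    print(pubsub_demo(["weather sunny", "sports 2-1", "weather rainy"], "weather"))
```

Only the two “weather” messages reach the subscriber; the filtering happens inside ZeroMQ, not in our code.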

It looks like we are writing to or reading from a standard socket, but this socket is on steroids. You can write to it even if the other end is not connected, all the queuing logic is taken care of for you, it accepts multiple connections, and much, much more.

UPDATE: After re-reading the docs, it seems that ZeroMQ sockets are not really thread-safe unless you pass them between threads using a “full memory barrier”. I guess the fact that Python has the GIL is what keeps my example from crashing :–)

ZeroMQ is also thread-safe (with the caveat from the update above), and this makes our code a lot simpler than it would be otherwise. Let’s see an example with 10 threads simultaneously sending messages through a ZeroMQ socket without any locking mechanism.

[gist]https://gist.github.com/1046942[/gist]
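Given the update above, the pattern the ZeroMQ documentation recommends is to share only the context between threads and give each thread its own socket. Here is a sketch of that with pyzmq. It is not the gist’s code: I switched to PUSH/PULL over an inproc:// endpoint (my choice, not the post’s) so that no message is lost while the threads connect, which keeps the result deterministic.

```python
import threading

import zmq


def collect_from_threads(num_threads=10, address="inproc://collector"):
    """Each thread sends one message through its own PUSH socket."""
    ctx = zmq.Context()  # the Context is safe to share between threads

    sink = ctx.socket(zmq.PULL)
    sink.setsockopt(zmq.RCVTIMEO, 5000)  # raise zmq.Again instead of hanging forever
    sink.bind(address)  # bind before anyone connects (old libzmq requires it for inproc)

    def worker(n):
        # One socket per thread: sockets must not be shared between threads.
        sock = ctx.socket(zmq.PUSH)
        sock.connect(address)
        sock.send_string(f"hello from thread {n}")
        sock.close()

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(num_threads)]
    for t in threads:
        t.start()

    received = [sink.recv_string() for _ in range(num_threads)]

    for t in threads:
        t.join()
    sink.close()
    ctx.term()
    return received


if __name__ == "__main__":
    for line in sorted(collect_from_threads()):
        print(line)
```

No locks are needed because no socket is ever touched by more than one thread.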

The approach we took is very flexible in the sense that any number of producers may connect to our consumer, but they need to know where (in terms of IP and port) our consumer is or will be. Could we change that somehow?

Yes, we can: with multicast. We can make all producers join a multicast group and send their messages there. Then the consumer would also join the same group and receive them. Of course, this only makes sense in a controlled environment where the data is not critical or where no unauthorized party is able to join the multicast group.

The funny thing is that we only need to change one line of code in the producer and another one in the consumer in order to make them work in multicast mode.

producer:

[gist]https://gist.github.com/1046944[/gist]

consumer:

[gist]https://gist.github.com/1046947[/gist]

I’ve started a small project that I’ll be working on in my free time; I hope to post something about it soon :–)

:wq

Using any Python module in your Tropo scripting application

In case you didn’t know, Tropo is a set of simple and powerful APIs that allow you to build rich voice, IM and text applications in the cloud. It supports many programming languages, Python among them, of course. Development is free, too, so go and get an account to start playing with it!

With Tropo you can develop applications in 2 ways: hosted scripting or web API. If you use the web API, the code runs on your own server, so there is no problem there. With hosted scripting applications, however, we face limitations imposed by the runtime: the Python interpreter is actually Jython 2.5, which predates the json module added to the standard library in Python 2.6. One of the things I really miss is a module to encode/decode JSON.

But hey, didn’t we say in the previous post that we could import modules directly from the web? Let’s just leverage that and build a test application.

My test application just replies with a random Chuck Norris quote, using the JSON API provided by The Internet Chuck Norris Database. The good thing about Tropo is that you only need to write your application once, and it’s multi-channel: you can connect to it using Jabber and you’ll get a text back, or you can call it using SIP or Skype and you’ll hear the quote via text to speech. Cool, huh?!
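Decoding the API’s answer is the only JSON work this app needs. Below is a sketch in modern Python 3 using the standard json module that Jython 2.5 lacks. The response layout ({"type": "success", "value": {"joke": ...}}) is my recollection of how The Internet Chuck Norris Database replied and should be treated as an assumption, as should the HTML unescaping (I believe the jokes came with entities such as &quot;). The Tropo-specific part, sending the text back to the caller, is left out.

```python
import html
import json


def extract_quote(payload):
    """Return the quote text from an ICNDB-style JSON payload (assumed layout)."""
    data = json.loads(payload)
    if data.get("type") != "success":
        raise ValueError(f"unexpected response type: {data.get('type')!r}")
    # Turn entities like &quot; into plain characters so text to speech reads them.
    return html.unescape(data["value"]["joke"])


if __name__ == "__main__":
    sample = '{"type": "success", "value": {"id": 1, "joke": "A &quot;placeholder&quot; quote", "categories": []}}'
    print(extract_quote(sample))
```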

Lets see the code:

[gist]https://gist.github.com/1045544[/gist]

The code should be simple and self-explanatory, apart from the importing magic. The application is running, so you can test it:

Skype Voice: +990009369996104671
SIP Voice: sip:9996104671@sip.tropo.com
Jabber: saghul_test2@tropo.im

Happy Tropo-ing!

:wq

Importing Python modules from the web

Last week I came across this piece of code (sorry, I lost the original source, if you know it please let me know!) which uses GitHub raw mode and Python’s import hooks to import modules directly from GitHub.

I modified it slightly to make it a bit more generic and allow you to import Python modules from arbitrary web URLs. Why would you want to do this? Is it safe? Well, you shouldn’t use this unless you are running your script in some restricted environment where you can’t import arbitrary modules. Security is a real concern here, since the downloaded code runs with your script’s privileges, so be careful!

Here is the code:

[gist]https://gist.github.com/1037345[/gist]

It contains 3 test cases that show how it could be used. I’ll post a ‘real world’ usage later today.
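The gist targets the Python 2 import hooks of the time. For reference, here is a sketch of the same idea with Python 3’s importlib machinery: a meta path finder that is also the loader. It is not the gist’s code; the URLImporter and install names, the <base_url>/<module>.py layout and the pluggable fetch function are my own assumptions. The finder is appended to the end of sys.meta_path so locally installed modules always win, and only top-level modules are handled.

```python
import importlib.abc
import importlib.util
import sys
import urllib.error
import urllib.request


def http_fetch(url):
    """Return the text at `url`, or None if it cannot be fetched."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.URLError:
        return None


class URLImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Import top-level modules from <base_url>/<name>.py."""

    def __init__(self, base_url, fetch=http_fetch):
        self.base_url = base_url.rstrip("/")
        self.fetch = fetch
        self._sources = {}

    def find_spec(self, fullname, path, target=None):
        if "." in fullname:  # packages/submodules are out of scope here
            return None
        url = f"{self.base_url}/{fullname}.py"
        source = self.fetch(url)
        if source is None:
            return None  # the import then fails with ModuleNotFoundError
        self._sources[fullname] = source
        return importlib.util.spec_from_loader(fullname, self, origin=url)

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        # WARNING: this executes whatever the URL served.
        source = self._sources.pop(module.__name__)
        code = compile(source, module.__spec__.origin, "exec")
        exec(code, module.__dict__)


def install(base_url, fetch=http_fetch):
    """Append a URLImporter to the end of sys.meta_path so local modules win."""
    importer = URLImporter(base_url, fetch)
    sys.meta_path.append(importer)
    return importer
```

With a real (here hypothetical) base URL you would call install("https://example.com/pymods") and then a plain import mymodule. Making fetch pluggable also lets you test the hook without any network access.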

:wq

Name binding and UnboundLocalError in Python

Yesterday I ran into a silly error which wasn’t very obvious at first, so I thought I’d share. It happened after renaming some module imports, here is the code and the traceback:

[gist]https://gist.github.com/1028793[/gist]

As you can see, there is a module called host and also a local variable with the same name, used in a for loop. Then why does Python think the first use of the host variable happens before its assignment? The answer can be found here in the Python reference guide. Let me highlight an important paragraph:

If a name binding operation occurs anywhere within a code block, all uses of the name within the block are treated as references to the current block. This can lead to errors when a name is used within a block before it is bound. This rule is subtle. Python lacks declarations and allows name binding operations to occur anywhere within a code block. The local variables of a code block can be determined by scanning the entire text of the block for name binding operations.

This means that because the host variable is assigned somewhere in the function (no matter where), it is local to the whole function, and the global scope is not taken into account. One might think that adding global host at the top of the function would solve it. But that is plain wrong: if you do that, the host module will be overwritten by the last value of the host variable after the for loop finishes.

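Then how do we solve it? Just rename either the imported module or the local variable; that is the clean fix in both Python 2.x and Python 3. The Python 3 nonlocal keyword has been suggested for this (as pointed out here), but it only rebinds names from an enclosing function scope, so it does not help with a name imported at module level.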
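Here is a self-contained reproduction of the same trap, using the standard library’s string module as a stand-in for the post’s host module. The functions are my own illustration, not the gist. The last part compiles a small snippet to check what Python says about nonlocal pointing at a module-level name.

```python
import string  # stands in for the post's `host` module


def broken(words):
    # `string` is bound by the for loop below, so Python treats it as local
    # to the whole function, and this first use raises UnboundLocalError.
    letters = string.ascii_lowercase
    result = []
    for string in words:
        result.append(string.upper())
    return letters, result


def fixed(words):
    letters = string.ascii_lowercase  # refers to the module again
    result = []
    for word in words:  # renamed loop variable, no clash with the module
        result.append(word.upper())
    return letters, result


# nonlocal only looks at enclosing *function* scopes, so pointing it at a
# module-level name is rejected when the code is compiled.
NONLOCAL_ATTEMPT = """
import string

def attempt(words):
    nonlocal string
    for string in words:
        pass
"""

if __name__ == "__main__":
    try:
        broken(["a"])
    except UnboundLocalError as exc:
        print("broken:", exc)

    print(fixed(["ab"]))

    try:
        compile(NONLOCAL_ATTEMPT, "<nonlocal demo>", "exec")
    except SyntaxError as exc:
        print("nonlocal:", exc.msg)
```

Renaming the import instead (import string as string_module) works just as well; what matters is that the module and the loop variable no longer share a name.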

:wq

Be careful with default function arguments

This is quite a common mistake in Python: using mutable types as default function arguments. Let’s see how it happens.

First, here is an example of default function arguments with immutable types:

[gist]https://gist.github.com/1008875[/gist]

As you can see, the bar variable has a different id on each instance: they are, in fact, different objects. Now let’s see what happens if the default argument is a mutable type, like a list.

[gist]https://gist.github.com/1008874[/gist]

Oh, the bar variable is modified on both instances! It has the same id, so they are effectively references to the same object.

Why is this? In Python, default argument values are evaluated only once, when the def statement is executed (typically when the module is first imported), not on every call. So a single list object is created, and its reference is used as the default argument for every instance of the class we create.
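A quick way to see when the default is evaluated is to give the default expression a visible side effect. This sketch uses a plain function instead of the post’s class, and the names are mine. The list is built once, when def runs, and is kept in the function’s __defaults__ tuple.

```python
evaluations = []


def make_default():
    # Side effect: record every time the default expression is evaluated.
    evaluations.append("evaluated")
    return []


def append_one(bar=make_default()):  # make_default() runs here, exactly once
    bar.append(1)
    return bar


first = append_one()
second = append_one()
print(first is second)          # True: both calls got the same list
print(first)                    # [1, 1]
print(len(evaluations))         # 1
print(append_one.__defaults__)  # ([1, 1],)
```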

To avoid this, we should use None as the default value and then set bar to a new empty list if the argument is None:

[gist]https://gist.github.com/1008879[/gist]
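For completeness, here is a minimal version of the None-sentinel fix, using a hypothetical Foo class with a bar argument along the lines of the post’s example:

```python
class Foo:
    def __init__(self, bar=None):
        # None is immutable, so sharing it as a default is harmless;
        # a fresh list is created on every call that omits `bar`.
        if bar is None:
            bar = []
        self.bar = bar


a = Foo()
b = Foo()
a.bar.append(1)
print(a.bar, b.bar)    # [1] []
print(a.bar is b.bar)  # False
```

Testing with is None rather than if not bar matters: an empty list passed explicitly by the caller is then kept as the caller’s own object instead of being silently replaced.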

:wq

TunnelIt, a reverse SSH forwarder

Hi! Today I want to show you a small project I’ve been working on in my free time. I called it TunnelIt, and it’s a reverse SSH forwarder.

I took inspiration from the popular Tunnlr service and built it using Twisted. Twisted is a really good tool for this purpose because, apart from being a cross-platform asynchronous framework, it implements many well-known protocols, such as SSH.

Code and instructions are up on the TunnelIt repository on GitHub, feel free to use it, fork it and send pull requests :–)

Happy tunneling!

:wq

Serializing DB operations with Twisted and SQLObject

Long time no write! A while ago I wrote a post about how to combine Twisted with SQLObject. The method described there, however, doesn’t work particularly well with SQLite.

Since an SQLite database is a single file (or a memory area), writes need to lock it. In the other post we used deferToThread, which runs a function on a different thread and returns a Deferred that fires when the operation is finished. The thread the operation runs on is taken from the reactor thread pool, so it’s not always the same thread. This means that if more than one write operation is executed at almost the same time, the first operation will get the lock, and the others will need to wait. With SQLObject, the operation will fail if the lock can’t be acquired, so let’s see what we can do about it.

One option would be to use the timeout parameter and set it to some reasonable value, so that the operation waits that long before giving up on the lock. Even then it’s not guaranteed to get the lock, so this could be complemented with a wait-and-retry strategy.
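SQLObject’s own timeout handling isn’t shown here. As a rough sketch of the wait-and-retry idea, this uses the standard sqlite3 module directly: its timeout argument is how long SQLite itself waits for a lock before raising “database is locked”. The execute_with_retry helper and its retries/delay values are hypothetical, my own choices.

```python
import sqlite3
import time


def execute_with_retry(conn, sql, params=(), retries=5, delay=0.2):
    """Run one write statement, retrying while SQLite reports a lock.

    Returns the number of failed attempts before the write succeeded.
    """
    for attempt in range(retries):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(sql, params)
            return attempt
        except sqlite3.OperationalError as exc:
            # Only retry lock errors, and give up after the last attempt.
            if "locked" not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(delay)
    raise ValueError("retries must be at least 1")


if __name__ == "__main__":
    # timeout: seconds SQLite waits for a lock before each attempt fails.
    conn = sqlite3.connect("example.db", timeout=2.0)
    conn.execute("CREATE TABLE IF NOT EXISTS log (msg TEXT)")
    execute_with_retry(conn, "INSERT INTO log (msg) VALUES (?)", ("hello",))
```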

Or we can just run all database operations in a single thread. This way only one operation is executed at a time, so we avoid the issue. Also, since we return Deferreds, nothing blocks waiting for results.

To do this I chose to use Twisted’s ThreadPool class with a single thread. The reason for doing it this way is that Twisted provides nice functions to run operations on one of these pools and return a Deferred.

First we set up our thread pool:

[gist]https://gist.github.com/994097[/gist]

We will need to stop the thread pool when our application ends, hence the call to addSystemEventTrigger. It will run the specified function when the reactor is about to be stopped.

Then we’ll create a decorator for the functions that should run on the thread pool:

[gist]https://gist.github.com/994098[/gist]

Since the thread pool has only a single thread, we are effectively serializing all database operations, as long as they are called through our decorator. Let’s see a complete example:

[gist]https://gist.github.com/994096[/gist]
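Outside Twisted, the same idea (one dedicated worker thread for every database call, with callers getting back a handle to the future result) can be sketched with the standard library. This is a swapped-in technique, not the post’s code: concurrent.futures.ThreadPoolExecutor(max_workers=1) plays the role of the single-thread ThreadPool, a Future plays the role of the Deferred, plain sqlite3 replaces SQLObject, and SerialDB and serialized are hypothetical names. The connection is opened inside the worker thread, so it never crosses threads.

```python
import functools
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor


class SerialDB:
    """Run every database call on one dedicated worker thread."""

    def __init__(self, path=":memory:"):
        self._local = threading.local()
        # max_workers=1 is what serializes the calls.
        self._executor = ThreadPoolExecutor(
            max_workers=1, initializer=self._connect, initargs=(path,)
        )

    def _connect(self, path):
        # Runs once inside the worker thread, so the connection never leaves it.
        self._local.conn = sqlite3.connect(path)

    def run(self, func, *args, **kwargs):
        """Queue func(conn, *args, **kwargs) and return a Future for its result."""
        return self._executor.submit(lambda: func(self._local.conn, *args, **kwargs))

    def close(self):
        self.run(lambda conn: conn.close()).result()
        self._executor.shutdown(wait=True)


def serialized(db):
    """Decorator: calls to the wrapped function are queued on db's worker thread."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return db.run(func, *args, **kwargs)
        return wrapper
    return decorator


db = SerialDB()


@serialized(db)
def create_table(conn):
    conn.execute("CREATE TABLE notes (body TEXT)")
    conn.commit()


@serialized(db)
def add_note(conn, body):
    conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
    conn.commit()


@serialized(db)
def count_notes(conn):
    return conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
```

As with the Deferred version, callers aren’t blocked until they ask for the value: add_note("hi") returns immediately, and .result() waits only where the result is actually needed.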

Hope you find it useful!

:wq

lighttpd + PHP-FPM on Debian Squeeze

I’m not a super-BOFH, but I do like to host my own blog (not this one, though) and play with stuff.

In the past I used Apache, but I never liked the VirtualHost configuration, especially when it gets big. Don’t get me wrong, I’m not saying it’s wrong; it’s just not right for me. So I switched to lighttpd, and I loved its configuration. It also performs well.

This lighttpd server hosts a couple of WordPress blogs, among other stuff, and I noticed some slowness from time to time. Since it’s not critical to me, I didn’t bother much. But then, reading some of my RSS feeds, I came across PHP-FPM. I had never heard of it before.

It runs as a system daemon and takes care of spawning PHP processes to handle requests when necessary, among other things. Its performance should be better than the old FastCGI setup, so I decided to give it a try.

There is no PHP-FPM package in Debian Squeeze, but fortunately we can use the Dotdeb repository, which does have one. Follow the instructions here and install PHP-FPM:

apt-get install php5-fpm

You may also want to upgrade all your PHP-related packages; Dotdeb has good, up-to-date packages.

Now we need to configure lighttpd to use PHP-FPM. Let’s create a configuration file for that in /etc/lighttpd/conf-available with the following content:

[gist]https://gist.github.com/936469[/gist]

I suggest you name the file 10-fastcgi-fpm.conf. Now let’s disable the old FastCGI modules and enable our new one:

lighttpd-disable-mod fastcgi
lighttpd-disable-mod fastcgi-php
lighttpd-enable-mod fastcgi-fpm

And last, restart the server:

/etc/init.d/lighttpd force-reload

I didn’t benchmark it thoroughly, but my rtGUI page used to take up to 10 seconds to load and now it takes less than a second. Not bad!

:wq!

Checking Google Voice SIP service availability

During the past weeks Google has been turning the inbound SIP service in Google Voice on and off. You would wake up and see someone on Twitter claiming that it worked, and hours later it just didn’t work anymore. So I thought about building a way to ping the Google Voice servers and present this information nicely.

When inbound SIP support is enabled Google Voice servers (which run OpenSIPS, by the way) respond to OPTIONS requests, so my idea was very simple: send a SIP OPTIONS message to GV servers and print the result somewhere. This somewhere had to be the web. But I didn’t want to host a website, so I thought “hey, maybe it’s time to try that Google App Engine!”.
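For the curious, here is roughly what such a probe looks like on the wire. The sketch builds a bare-bones SIP OPTIONS request (RFC 3261 headers) with the standard library, sends it over UDP and reads the status code of the first reply. It is not SIPwPing, which uses python-sipsimple and Twisted, and it could not run on GAE itself, whose sandbox does not allow sockets. The user name is a placeholder, and behind NAT a real probe would also need the public address in Via/Contact.

```python
import socket
import uuid


def build_options(target_uri, local_host, local_port, user="pinger"):
    """Build a minimal SIP OPTIONS request as a string."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # RFC 3261 branch magic cookie
    lines = [
        f"OPTIONS {target_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{target_uri}>",
        f"From: <sip:{user}@{local_host}>;tag={uuid.uuid4().hex[:10]}",
        f"Call-ID: {uuid.uuid4().hex}@{local_host}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:{user}@{local_host}:{local_port}>",
        "Accept: application/sdp",
        "Content-Length: 0",
        "",  # empty line ends the headers; there is no body
        "",
    ]
    return "\r\n".join(lines)


def parse_status(response):
    """Return the numeric status code from a SIP response's first line."""
    parts = response.split("\r\n", 1)[0].split(" ", 2)
    if len(parts) < 2 or parts[0] != "SIP/2.0":
        raise ValueError(f"not a SIP response: {parts!r}")
    return int(parts[1])


def ping(target_uri, server, port=5060, timeout=5.0):
    """Send one OPTIONS over UDP; return the status code, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((server, port))
        # Local address picked by the OS; behind NAT this is a private address.
        local_host, local_port = sock.getsockname()
        sock.send(build_options(target_uri, local_host, local_port).encode("utf-8"))
        try:
            data = sock.recv(65535)
        except socket.timeout:
            return None
    return parse_status(data.decode("utf-8", "replace"))
```

A 200 from ping("sip:someone@example.com", "sip.example.com") (placeholder names) would mean the server answers OPTIONS; None means no answer within the timeout.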

Applications run in a sandboxed environment on GAE, so you can’t have extension modules or even access the socket module. But you can make outgoing HTTP requests. With these limitations in mind this is what I came up with:

GAE has cron job support, that is, it can make a GET request at regular intervals. So why not leverage this and make a GET request every 5 minutes to some web service that tells us the GV SIP service status, and store the result? This way we don’t need to connect anywhere every time someone wants to see this data, because it is kept up to date by the cron job.

This required a component able to handle HTTP requests and generate SIP OPTIONS requests. I didn’t have one, but since I work with the python-sipsimple library every day and know some Twisted, it took me very little time to build it. I called it SIPwPing and it’s available on my GitHub.

Some of you may think this is completely over-engineered. And you are right. But I wanted to play with GAE and this was a very good excuse to do so. 🙂

The application is now live at http://gvoice-sip-status.appspot.com and the source code will be available on my GitHub account later today.

PS: The application is using the free plan, and if any quota is exceeded at some point I won’t bother much, since I already had the fun of making it work 🙂

:wq


FOSDEM: awesomeness^3

It has been a very long weekend, I’m writing this on the train while traveling back from FOSDEM. I knew the event was big, but I couldn’t imagine it was that big.

Everything was perfect: the talks, the speakers, the facilities, the connection, … Especially the connection. At conferences the network usually sucks, because the access points can’t cope with the load or there is not enough bandwidth for everyone, who knows. This was not the case at FOSDEM. Every device got a public IPv4 address and an IPv6 one as well. We did a simple test and downloaded 10GB from an FTP server. It took 2 minutes.

I was lucky enough to get the opportunity to give a talk in the Open Source Telephony Devroom about SIPSIMPLE SDK, the SIP library we develop at work.

The talk went through the implementation of different server applications for SylkServer, a SIP application server we recently launched. Simple SIP client examples were also shown. The presentation is available for viewing and downloading here:

[slideshare id=6830569&doc=developingrichsipapplicationswithsipsimple-110206091415-phpapp02]

The code examples used for the presentation can be downloaded from my GitHub repository.

Once more, kudos to the organization of FOSDEM; it was a really great event and I would definitely recommend it to anyone. Hope to be there again next year!

:wq