Importing Python modules from the web

Last week I came across this piece of code (sorry, I lost the original source; if you know it, please let me know!) which uses GitHub raw mode and Python’s import hooks to import modules directly from GitHub.

I modified it slightly to make it a bit more generic, so that you can import Python modules from arbitrary web URLs. Why would you want to do this? Is it safe? Well, you shouldn’t use this unless you are running your script in some restricted environment where you can’t import arbitrary modules. Security is a real concern here: the downloaded code is executed as-is, so be careful and only import from URLs you trust!

Here is the code:

[gist]https://gist.github.com/1037345[/gist]

It contains 3 test cases that show how it could be used. I’ll post a ‘real world’ usage later today.
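
If the gist is not at hand, the general idea can be sketched with Python 3’s importlib machinery. This is not the gist’s code, only a minimal illustration under a few assumptions: the class name UrlFinder and the injectable fetch callable are made up for the example, and each module is assumed to live at <base URL>/<module name>.py.

```python
import importlib.abc
import importlib.util
import sys
import urllib.request


class UrlFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader that imports modules from a base URL."""

    def __init__(self, base_url, fetch=None):
        self.base_url = base_url.rstrip("/")
        # `fetch` takes a URL and returns source text; injectable for testing.
        self.fetch = fetch or self._http_fetch

    @staticmethod
    def _http_fetch(url):
        with urllib.request.urlopen(url) as response:
            return response.read().decode("utf-8")

    def find_spec(self, fullname, path=None, target=None):
        url = "%s/%s.py" % (self.base_url, fullname.replace(".", "/"))
        try:
            source = self.fetch(url)
        except Exception:
            return None  # not available remotely: let the import fail normally
        spec = importlib.util.spec_from_loader(fullname, self, origin=url)
        spec.loader_state = source  # keep the source around for exec_module
        return spec

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        spec = module.__spec__
        # The downloaded code runs with full privileges: trust the URL!
        exec(compile(spec.loader_state, spec.origin, "exec"), module.__dict__)


# Usage (the URL is a placeholder):
#   sys.meta_path.append(UrlFinder("https://example.invalid/modules"))
#   import some_remote_module
```

Appending the finder to sys.meta_path, rather than inserting it first, means it is only consulted after the regular import machinery has failed to find the module.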

:wq

Name binding and UnboundLocalError in Python

Yesterday I ran into a silly error which wasn’t very obvious at first, so I thought I’d share. It happened after renaming some module imports. Here is the code and the traceback:

[gist]https://gist.github.com/1028793[/gist]

As you can see, there is a module called host and also a local variable with the same name, used in a for loop. Then why does Python think the first use of the host variable happens before its assignment? The answer can be found here in the Python reference guide. Let me highlight an important paragraph:

If a name binding operation occurs anywhere within a code block, all uses of the name within the block are treated as references to the current block. This can lead to errors when a name is used within a block before it is bound. This rule is subtle. Python lacks declarations and allows name binding operations to occur anywhere within a code block. The local variables of a code block can be determined by scanning the entire text of the block for name binding operations.

This means that because the host variable is assigned in the function, no matter where in the function, its scope is that function, and the global scope is not taken into account. One could think that adding global host at the top of the function would solve it. But that is plain wrong. If you do that, the host module will be overwritten by the last value of the host variable after iterating the for loop.

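Then how do we solve it? Just change the name of either the imported module or the local variable; in Python 2.X there is no other way. In Python 3 the nonlocal keyword exists, as pointed out here, but it only applies to names bound in an enclosing function, not to module-level names, so renaming remains the fix.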
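
A minimal reproduction, assuming nothing about the original code beyond what is described above: the standard string module is imported under the name host as a stand-in for the real host module, and the function names are invented for the example.

```python
import string as host  # stand-in for the imported `host` module (assumption)


def broken(names):
    # `host` is assigned by the for loop below, so the compiler treats it as
    # local to the whole function; this line raises UnboundLocalError.
    first_letter = host.ascii_lowercase[0]
    for host in names:
        pass
    return first_letter


def fixed(names):
    # Renaming the loop variable leaves `host` referring to the module.
    first_letter = host.ascii_lowercase[0]
    for hostname in names:
        pass
    return first_letter
```

Calling broken() fails even with an empty list, because the decision that host is local is made when the function is compiled, not when the loop runs.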

:wq

Be careful with default function arguments

This is quite a common mistake in Python: using mutable types as default function arguments. Let’s see how this happens.

First, here is an example of default function arguments with immutable types:

[gist]https://gist.github.com/1008875[/gist]

As you can see, the bar variable has a different id on each instance; they are, in fact, different objects. Now let’s see what happens if the default argument is a mutable type, like a list.

[gist]https://gist.github.com/1008874[/gist]

Oh, the bar variable is modified on both instances! It has the same id, so they are effectively references to the same object.

Why is this? In Python, default argument values are evaluated only once, when the def statement that defines the function is executed. So a single list object is created, and a reference to it is used as the default argument for every instance of the class we create.

In order to avoid this we should pass None as the default value and then set bar to the empty list if the passed argument is None:

[gist]https://gist.github.com/1008879[/gist]
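
As a compact sketch of the same pitfall and its fix (the class names Shared and Separate are invented for this example and are not taken from the gists):

```python
class Shared:
    def __init__(self, bar=[]):  # the list is created once, at definition time
        self.bar = bar


class Separate:
    def __init__(self, bar=None):
        # A fresh list is created on every call that omits `bar`.
        self.bar = [] if bar is None else bar


a, b = Shared(), Shared()
a.bar.append(1)          # also visible through b.bar: same list object

c, d = Separate(), Separate()
c.bar.append(1)          # d.bar stays empty: separate list objects
```

Comparing with "is None" rather than a plain truth test keeps an explicitly passed empty list intact, which matters if the caller wants to share that list on purpose.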

:wq

TunnelIt, a reverse SSH forwarder

Hi! Today I want to show you a small project I’ve been working on during some free time. I called it TunnelIt and it’s a reverse SSH forwarder.

I borrowed inspiration from the popular Tunnlr service and built it using Twisted. Twisted is a really good tool for this purpose because apart from being a cross-platform asynchronous framework, it implements many well known protocols, such as SSH.

Code and instructions are up on the TunnelIt repository on GitHub, feel free to use it, fork it and send pull requests :–)

Happy tunneling!

:wq

Serializing DB operations with Twisted and SQLObject

Long time no write! A while ago I wrote a post about how to combine Twisted with SQLObject. The method described there, however, doesn’t work particularly well when using SQLite.

Since SQLite databases are a single file (or a memory area) some operations will need to lock when writing. In the other post we used deferToThread, which will run a function on a different thread and return a Deferred which will fire when the operation is finished. The thread on which the operation will be run is taken from the reactor thread pool, so it’s not a single thread. This means that if more than one write operation is executed almost at the same time, the first operation will get the lock, and the others will need to wait. When using SQLObject, the operation will fail if the lock can’t be acquired, so let’s see what we can do about it.

One option would be to use the timeout parameter and set it to some reasonable value, so that the operation would wait that amount of time before giving up on the lock. It’s not 100% guaranteed that it will get it, so this could be complemented with a wait-retry strategy.

Or we can just run all database operations in a single thread. This way only one operation will be executed at a time so we avoid the issue. Also, since we return deferreds nothing will be blocked waiting for results.

In order to do this I chose to use Twisted’s ThreadPool class, with a single thread. The reason for doing it this way is that Twisted provides us with nice functions to run operations on one of these pools and return a deferred.

First we set up our thread pool:

[gist]https://gist.github.com/994097[/gist]

We will need to stop the thread pool when our application ends, hence the call to addSystemEventTrigger. It will run the specified function when the reactor is about to be stopped.

Then we’ll create a decorator which we’ll use to decorate functions that will run on the thread pool:

[gist]https://gist.github.com/994098[/gist]

Since the thread pool only has a single thread, we are effectively serializing all database operations, as long as they are called using our decorator. Let’s see a complete example:

[gist]https://gist.github.com/994096[/gist]
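
The serialization idea itself does not depend on Twisted. As a rough analogy using only the standard library (this is not the code from the gists: concurrent.futures stands in for Twisted’s ThreadPool, futures stand in for deferreds, and all the names are invented), a single-worker pool plus a decorator looks like this:

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

# One worker thread means database calls run strictly one after another.
db_pool = ThreadPoolExecutor(max_workers=1)
_local = threading.local()


def run_in_db_thread(func):
    """Run the decorated function on the single database thread."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        return db_pool.submit(func, *args, **kwargs)  # returns a Future
    return wrapper


def _connection():
    # Created lazily inside the worker, so it is only used from that thread.
    if not hasattr(_local, "conn"):
        _local.conn = sqlite3.connect(":memory:")
        _local.conn.execute("CREATE TABLE log (thread TEXT, value INTEGER)")
    return _local.conn


@run_in_db_thread
def insert(value):
    conn = _connection()
    conn.execute("INSERT INTO log VALUES (?, ?)",
                 (threading.current_thread().name, value))
    conn.commit()


@run_in_db_thread
def summary():
    # Returns (number of rows, number of distinct threads that wrote them).
    query = "SELECT COUNT(*), COUNT(DISTINCT thread) FROM log"
    return _connection().execute(query).fetchone()
```

Because every write goes through the same thread, no two operations ever compete for the SQLite lock, which is exactly the property the Twisted version relies on.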

Hope you find it useful!

:wq

lighttpd + PHP-FPM on Debian Squeeze

I’m not a super-BOFH but I do like to host my own blog (not this one, though) and play with stuff.

In the past I used Apache, but I never liked the VirtualHosts configuration, especially when it gets big. Don’t get me wrong, I’m not saying it’s not right, it’s just not right for me. So I switched to lighttpd and I loved the configuration. It also has good performance.

This lighttpd server hosts a couple of WordPress blogs among other stuff and I noticed some slowness from time to time. Since it’s not critical to me I didn’t bother much. But then, reading some of my RSS feeds, I read about PHP-FPM, which I had never heard of before.

PHP-FPM runs as a system daemon and takes care of spawning PHP processes to handle requests when necessary, among other things. Its performance should be better than the old FastCGI setup, so I decided to give it a try.

There is no PHP-FPM package on Debian Squeeze, but fortunately we can use the Dotdeb repository, which does have one. Follow the instructions here and install PHP-FPM:

apt-get install php5-fpm

You may also want to upgrade all your PHP related packages; Dotdeb contains good and updated packages.

Now we need to configure lighttpd to use PHP-FPM. Let’s create a configuration file for that in /etc/lighttpd/conf-available with the following content:

[gist]https://gist.github.com/936469[/gist]

I suggest you name the file 10-fastcgi-fpm.conf. Now let’s disable the old FastCGI modules and enable our new one:

lighttpd-disable-mod fastcgi
lighttpd-disable-mod fastcgi-php
lighttpd-enable-mod fastcgi-fpm

And last, restart the server:

/etc/init.d/lighttpd force-reload

I didn’t benchmark it thoroughly, but my rtGUI page used to take up to 10 seconds to load and now it takes less than a second. Not bad!

:wq!

Checking Google Voice SIP service availability

Over the past weeks Google has been turning the inbound SIP service in Google Voice on and off. You would wake up and see someone on Twitter claiming that it worked, and hours later it just didn’t work anymore. So I thought about building a way to ping the Google Voice servers and present this information in a nice way.

When inbound SIP support is enabled Google Voice servers (which run OpenSIPS, by the way) respond to OPTIONS requests, so my idea was very simple: send a SIP OPTIONS message to GV servers and print the result somewhere. This somewhere had to be the web. But I didn’t want to host a website, so I thought “hey, maybe it’s time to try that Google App Engine!”.

Applications run in a sandboxed environment on GAE, so you can’t have extension modules or even access the socket module. But you can make outgoing HTTP requests. With these limitations in mind this is what I came up with:

GAE has cron job support, that is, it can do a GET request at regular intervals. So why not leverage this and make a GET request every 5 minutes to some web service which tells us the GV SIP service status, and store the result? This way we don’t need to connect anywhere every time someone wants to see this data, because it is kept up to date by the cron job.

This required a component able to handle HTTP requests and generate SIP OPTIONS requests. I didn’t have one, but since I work every day with the python-sipsimple library and I know some Twisted, it took me very little time to build it. I called it SIPwPing and it’s available on my GitHub.
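
To give an idea of what such a ping looks like on the wire, here is a minimal sketch that builds an OPTIONS request by hand, following the message layout of RFC 3261, and reads the status code from a response. It is not SIPwPing’s code (which builds on python-sipsimple); the function names and the host names used with it are hypothetical, and actually sending the message over UDP and matching the response is left out.

```python
import uuid


def build_options(target, local_host, local_port=5060, user="ping"):
    """Return a SIP OPTIONS request (RFC 3261 layout) addressed to `target`."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # branch must start with this cookie
    tag = uuid.uuid4().hex[:8]
    call_id = uuid.uuid4().hex
    lines = [
        "OPTIONS sip:%s SIP/2.0" % target,
        "Via: SIP/2.0/UDP %s:%d;branch=%s" % (local_host, local_port, branch),
        "Max-Forwards: 70",
        "From: <sip:%s@%s>;tag=%s" % (user, local_host, tag),
        "To: <sip:%s>" % target,
        "Call-ID: %s@%s" % (call_id, local_host),
        "CSeq: 1 OPTIONS",
        "Contact: <sip:%s@%s:%d>" % (user, local_host, local_port),
        "Accept: application/sdp",
        "Content-Length: 0",
        "",  # blank line terminates the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status(response):
    """Return the numeric status code from the first line of a SIP response."""
    version, code, _reason = response.split("\r\n", 1)[0].split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError("not a SIP response")
    return int(code)
```

With these two helpers, a status check boils down to sending build_options(...) to the server and treating a 200 from parse_status(...) as "service is up", and a timeout or an error code as "down" or "unknown".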

Some of you may think this is completely over-engineered. And you are right. But I wanted to play with GAE and this was a very good excuse to do so. 🙂

The application is now live at http://gvoice-sip-status.appspot.com and the source code will be available on my GitHub account later today.

PS: The application is using the free plan and if any quota is exceeded at some point I won’t bother much, since I already had my fun making it work 🙂

:wq

FOSDEM: awesomeness^3

It has been a very long weekend, I’m writing this on the train while traveling back from FOSDEM. I knew the event was big, but I couldn’t imagine it was that big.

Everything was so perfect: the talks, the speakers, the facilities, the connection… Especially the connection. At conferences the network usually sucks because the access points can’t cope with the load or there is not enough bandwidth for everyone… who knows. This was not the case at FOSDEM. Every device got a public IPv4 address and also an IPv6 one. We did a simple test and downloaded 10GB from an FTP server. It took 2 minutes, which works out to more than 80 MB/s.

I was lucky and got the opportunity to give a talk in the Open Source Telephony Devroom about SIPSIMPLE SDK, the SIP library we develop at work.

The talk went through the implementation of different server applications for SylkServer, a SIP application server we recently launched. Simple SIP client examples were also shown. The presentation is available for viewing and downloading here:

[slideshare id=6830569&doc=developingrichsipapplicationswithsipsimple-110206091415-phpapp02]

The code examples used for the presentation can be downloaded from my GitHub repository.

Once more, kudos to the organization of FOSDEM. It was a really great event and I would definitely recommend it to anyone. Hope to be there again next year!

:wq

Managing timezone aware datetime objects

Dealing with date and time can be quite tricky, especially if we need to convert local times between different timezones. Let’s see what the standard Python datetime module does:

[gist]https://gist.github.com/805087[/gist]

The generated timestamp is in ISO 8601 format, although the microseconds could well be left out, but that’s not the biggest problem. The problem is that the offset from UTC is not included. If a timestamp is expressed in local time, the offset from UTC must be provided. Let’s see an example:

2011-02-01T00:56:23+01:00

That means that the local time is 00:56:23, in a timezone one hour ahead of UTC. Python datetime objects are not timezone aware unless a tzinfo is attached to them. Getting proper timezone information is a tough matter, but fortunately the dateutil module will help. Let’s see it in action:

[gist]https://gist.github.com/805107[/gist]

Hey! Now the timestamp is generated correctly! (I set the microsecond to 0 so that the fractional seconds are left out.) We can also use the dateutil module to parse a timestamp and generate a timezone-aware object out of it:

[gist]https://gist.github.com/805117[/gist]

One last note: if you need to perform operations such as calculating the difference between two datetime objects, they both need to be timezone aware; mixing aware and naive objects raises a TypeError:

[gist]https://gist.github.com/805121[/gist]
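
The same points can be checked without any third-party module. The sketch below uses only the standard library’s fixed-offset timezone class instead of dateutil (so it knows nothing about daylight saving rules), and the date is the example timestamp from above:

```python
from datetime import datetime, timedelta, timezone

cet = timezone(timedelta(hours=1))           # fixed +01:00 offset, no DST rules
naive = datetime(2011, 2, 1, 0, 56, 23)      # no tzinfo attached
aware = naive.replace(tzinfo=cet)            # same wall clock time, now aware

naive_text = naive.isoformat()               # '2011-02-01T00:56:23'
aware_text = aware.isoformat()               # '2011-02-01T00:56:23+01:00'
as_utc = aware.astimezone(timezone.utc)      # same instant, expressed in UTC
parsed = datetime.fromisoformat(aware_text)  # parses back to an aware object

try:
    aware - naive                            # mixing aware and naive objects
except TypeError as exc:
    mixing_error = str(exc)
```

A fixed offset is enough to produce and parse timestamps like the one above; for real timezones with daylight saving changes, a timezone database such as the one dateutil gives access to is still needed.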

:wq

Using SQLObject with Twisted

Over the past week I’ve been working on a small personal project with Twisted. I need to access a database to store and retrieve data, so I started with the obvious, using APIs provided by Twisted.

Twisted provides a database API called adbapi. The API is pretty straightforward, and the operations I wanted to perform were not rocket science anyway, so it served the purpose, but I wasn’t 100% satisfied.

I was using the runQuery function, mainly, and putting there a regular SQL statement. I didn’t like that. Then I remembered SQLObject.

SQLObject is an ORM, providing an object oriented API to several databases (I was aiming at MySQL and SQLite). This is what I was looking for. But there is a problem: accessing a database is a blocking operation.

Twisted uses the reactor pattern, so you can’t run a blocking operation in the event loop’s thread without affecting the whole program. Database access libraries tend to be blocking, so Twisted runs database operations in another thread and then delivers the results in a callback in the main thread. This creates the illusion of non-blocking database access.

In order to integrate SQLObject nicely with Twisted this is exactly what we want to do. We’ll defer all database operations to another thread and we’ll get results (or failures) in callback functions. The key function here is deferToThread which will run the specified function in the reactor thread pool and return a deferred. In order to make our life easier we’ll use a decorator which will run the decorated function in the reactor’s thread pool and return a deferred:

[gist]https://gist.github.com/793936[/gist]

Now let’s see a simple (yet complete) example of how all this works:

[gist]https://gist.github.com/793982[/gist]

As you can see the results (or errors) are retrieved in the got_result/got_error callback functions asynchronously, and as the operation was executed in a different thread this didn’t affect the main event loop.
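
For readers more familiar with the standard library, the same hand-off can be sketched with asyncio, where loop.run_in_executor plays roughly the role that deferToThread plays in Twisted. This is an analogy rather than Twisted code: the decorator name, the fake query and the tick counter are all invented for the example.

```python
import asyncio
import functools
import time


def run_in_thread(func):
    """Run the blocking `func` in the event loop's default thread pool."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        call = functools.partial(func, *args, **kwargs)
        return loop.run_in_executor(None, call)  # returns an awaitable future
    return wrapper


@run_in_thread
def blocking_query(value):
    time.sleep(0.2)  # stands in for a blocking database call
    return value


async def main():
    future = blocking_query("row")
    ticks = 0
    while not future.done():   # the event loop keeps running in the meantime
        ticks += 1
        await asyncio.sleep(0.01)
    return await future, ticks
```

Running asyncio.run(main()) returns the query result together with the number of loop iterations that happened while the query was sleeping in its worker thread, which shows that the event loop was never blocked.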

:wq