Name binding and UnboundLocalError in Python

Yesterday I ran into a silly error which wasn’t very obvious at first, so I thought I’d share. It happened after renaming some module imports. Here is the code and the traceback:

[gist]https://gist.github.com/1028793[/gist]

As you can see, there is a module called host and also a local variable with the same name, used in a for loop. Then why does Python think the first use of the host variable happens before its assignment? The answer can be found here in the Python reference guide. Let me highlight an important paragraph:

If a name binding operation occurs anywhere within a code block, all uses of the name within the block are treated as references to the current block. This can lead to errors when a name is used within a block before it is bound. This rule is subtle. Python lacks declarations and allows name binding operations to occur anywhere within a code block. The local variables of a code block can be determined by scanning the entire text of the block for name binding operations.

This means that because the host variable is assigned in the function, no matter where in the function, its scope is that function, and the global scope is not taken into account. One could think that adding global host at the top of the function would solve it. But that is plain wrong. If you do that, the host module will be overwritten by the last value of the host variable after iterating the for loop.

Then how do we solve it? Just change the name of either the imported module or the local variable. Python 3’s nonlocal keyword does not help here: it only rebinds names from an enclosing function scope, so it cannot refer to a module-level name like host.
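Here is a minimal sketch of the problem and of the rename fix. The original gist is not reproduced here, so the standard os module stands in for the host module, and the function names are made up for illustration:

```python
import os  # stands in for the post's `host` module


def broken(names):
    # `os` is bound by the for loop below, so it is local to the whole
    # function and this line fails before the loop ever runs.
    separator = os.sep
    for os in names:
        pass
    return separator


def fixed(names):
    # With the loop variable renamed, `os` refers to the module again.
    separator = os.sep
    for name in names:
        pass
    return separator


try:
    broken(["a", "b"])
except UnboundLocalError as exc:
    error = exc  # kept around so it can be inspected

separator = fixed(["a", "b"])
```

Calling broken() raises UnboundLocalError on its first line, while fixed() works because nothing in its body binds the name os.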

:wq

Be careful with default function arguments

This is a quite common mistake in Python: using mutable types as default function arguments. Let’s see how this happens.

First, here is an example of default function arguments with immutable types:

[gist]https://gist.github.com/1008875[/gist]

As you can see, the bar variable has a different id on each instance; they are, in fact, different objects. Now let’s see what happens if the default argument is a mutable type, like a list.

[gist]https://gist.github.com/1008874[/gist]

Oh, the bar variable is modified on both instances! It has the same id, so they are effectively references to the same object.

Why is this? In Python, default argument values are evaluated only once, when the def statement that defines the function is executed. So, a list object is created, and its reference is used as the default argument for every instance of the class we create.

In order to avoid this we should pass None as the default value and then set bar to the empty list if the passed argument is None:

[gist]https://gist.github.com/1008879[/gist]
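For reference, here is a small self-contained sketch of both behaviours. The class names Shared and Separate are hypothetical and not taken from the gists:

```python
class Shared:
    def __init__(self, bar=[]):  # the list is created once, at definition time
        self.bar = bar


class Separate:
    def __init__(self, bar=None):
        # A fresh list is created on every call that omits `bar`.
        self.bar = [] if bar is None else bar


a, b = Shared(), Shared()
a.bar.append(1)
shared_result = b.bar        # b sees the item appended through a

c, d = Separate(), Separate()
c.bar.append(1)
separate_result = d.bar      # d is unaffected
```

With the None default, each instance gets its own list unless the caller explicitly passes one in.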

:wq

TunnelIt, a reverse SSH forwarder

Hi! Today I want to show you a small project I’ve been working on during some free time. I called it TunnelIt and it’s a reverse SSH forwarder.

I borrowed inspiration from the popular Tunnlr service and built it using Twisted. Twisted is a really good tool for this purpose because, apart from being a cross-platform asynchronous framework, it implements many well-known protocols, such as SSH.

Code and instructions are up on the TunnelIt repository on GitHub. Feel free to use it, fork it and send pull requests :-)

Happy tunneling!

:wq

Serializing DB operations with Twisted and SQLObject

Long time no write! A while ago I wrote a post about how to combine Twisted with SQLObject. The method described there, however, doesn’t work particularly well when using SQLite.

Since SQLite databases are a single file (or a memory area) some operations will need to lock when writing. In the other post we used deferToThread, which will run a function on a different thread and return a Deferred which will fire when the operation is finished. The thread on which the operation will be run is taken from the reactor thread pool, so it’s not a single thread. This means that if more than one write operation is executed almost at the same time, the first operation will get the lock, and the others will need to wait. When using SQLObject, the operation will fail if the lock can’t be acquired, so let’s see what we can do about it.

One option would be to use the timeout parameter and set it to some reasonable value, so that the operation would wait that amount of time before giving up on the lock. It’s not 100% guaranteed that it will get it, so this could be complemented with a wait-retry strategy.

Or we can just run all database operations in a single thread. This way only one operation will be executed at a time so we avoid the issue. Also, since we return deferreds nothing will be blocked waiting for results.

In order to do this I chose to use Twisted’s ThreadPool class, with a single thread. The reason for doing it this way is that Twisted provides us with nice functions to run operations on one of these pools and return a deferred.

First we set up our thread pool:

[gist]https://gist.github.com/994097[/gist]

We will need to stop the thread pool when our application ends, hence the call to addSystemEventTrigger. It will run the specified function when the reactor is about to be stopped.

Then we’ll create a decorator which we’ll use to decorate functions that will run on the thread pool:

[gist]https://gist.github.com/994098[/gist]

Since the thread pool only has a single thread, we are effectively serializing all database operations, as long as they are called using our decorator. Let’s see a complete example:

[gist]https://gist.github.com/994096[/gist]
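To illustrate the idea without depending on Twisted or SQLObject, here is a rough sketch using only the standard library. It is not the code from the gists: a single-worker ThreadPoolExecutor plays the role of the one-thread pool, Future objects play the role of deferreds, and the decorator name run_in_db_thread and the log table are invented for the example.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

# One worker thread means database operations run strictly one at a time.
db_pool = ThreadPoolExecutor(max_workers=1)
_conn = None


def run_in_db_thread(func):
    """Schedule `func` on the single database thread and return a Future."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        return db_pool.submit(func, *args, **kwargs)
    return wrapper


def _get_conn():
    # Created lazily inside the worker thread, so sqlite3's same-thread
    # check is always satisfied.
    global _conn
    if _conn is None:
        _conn = sqlite3.connect(":memory:")
        _conn.execute("CREATE TABLE log (msg TEXT)")
    return _conn


@run_in_db_thread
def add_entry(msg):
    conn = _get_conn()
    conn.execute("INSERT INTO log VALUES (?)", (msg,))
    conn.commit()


@run_in_db_thread
def count_entries():
    return _get_conn().execute("SELECT COUNT(*) FROM log").fetchone()[0]


# Fire off ten writes "at the same time"; they are queued and run in order.
pending = [add_entry("entry %d" % i) for i in range(10)]
for future in pending:
    future.result()

total = count_entries().result()
db_pool.shutdown()
```

Because every operation goes through the same worker, no two writes ever compete for the SQLite lock.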

Hope you find it useful!

:wq

Checking Google Voice SIP service availability

During the past weeks Google has been turning the inbound SIP service in Google Voice on and off. You would wake up and see someone on Twitter claiming that it worked, and hours later it just didn’t work anymore. So I thought about building a way to ping the Google Voice servers and present this information in a nice way.

When inbound SIP support is enabled Google Voice servers (which run OpenSIPS, by the way) respond to OPTIONS requests, so my idea was very simple: send a SIP OPTIONS message to GV servers and print the result somewhere. This somewhere had to be the web. But I didn’t want to host a website, so I thought “hey, maybe it’s time to try that Google App Engine!”.

Applications run in a sandboxed environment on GAE, so you can’t have extension modules or even access the socket module. But you can make outgoing HTTP requests. With these limitations in mind this is what I came up with:

GAE has cron job support, that is, it can do a GET request at regular intervals. So why not leverage this and make a GET request every 5 minutes to some web service which tells us the GV SIP service status and store it? This way we don’t need to connect anywhere every time someone wants to see this data, because it is kept up to date by the cron job.

This required a component which must be able to handle HTTP requests and generate SIP OPTIONS requests. I didn’t have this, but since I work every day with the python-sipsimple library and I know some Twisted, it took me very little time to build one. I called it SIPwPing and it’s available on my GitHub.

Some of you may think this is completely over-engineered. And you are right. But I wanted to play with GAE and this was a very good excuse to do so. 🙂

The application is now live at http://gvoice-sip-status.appspot.com and the source code will be available on my GitHub account later today.

PS: The application is using the free plan, and if any quota is exceeded at some point I won’t worry much, since I already had my fun making it work 🙂

:wq

[Screenshots: Gae_gv_error, Gae_gv_unknown, Gae_gv_ok]

Managing timezone aware datetime objects

Dealing with date and time can be quite tricky, especially if we need to convert local times from different timezones. Let’s see what the standard Python datetime module does:

[gist]https://gist.github.com/805087[/gist]

The generated timestamp is in ISO 8601 format. Well, not really: the microseconds should be removed, but that’s not the biggest problem. The problem is that the offset from UTC is not added. If a timestamp is generated in local time, the offset to UTC must be provided. Let’s see an example:

2011-02-01T00:56:23+01:00

That means that the local time is 00:56:23, which is one hour ahead of UTC. Regular Python datetime objects are not timezone aware. Getting proper timezone information is a tough matter, but fortunately the dateutil module will help. Let’s see it in action:

[gist]https://gist.github.com/805107[/gist]

Hey! Now the timestamp is generated correctly! (I set the microsecond to 0 so that it’s fully ISO 8601 compliant.) We can also use the dateutil module to parse a timestamp and generate a timezone-aware object out of it:

[gist]https://gist.github.com/805117[/gist]
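The same round trip can be sketched with the standard library alone. Note the assumptions: this uses a fixed +01:00 offset built with datetime.timezone instead of dateutil’s timezone handling, and datetime.fromisoformat (Python 3.7+) instead of dateutil’s parser.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +01:00 offset rather than the machine's local timezone.
cet = timezone(timedelta(hours=1))

naive = datetime(2011, 2, 1, 0, 56, 23)
aware = naive.replace(tzinfo=cet)

naive_stamp = naive.isoformat()   # no UTC offset in the output
aware_stamp = aware.isoformat()   # offset appended to the timestamp

# Parsing the timestamp back gives a timezone-aware object again.
parsed = datetime.fromisoformat(aware_stamp)
```

Here naive_stamp is '2011-02-01T00:56:23' and aware_stamp is '2011-02-01T00:56:23+01:00', matching the example timestamp above.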

One last note: if you need to perform operations such as calculating the difference between two datetime objects, they both need to be timezone aware, otherwise bad things happen:

[gist]https://gist.github.com/805121[/gist]
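A short sketch of what “bad things” means in practice, again using only the standard library and fixed offsets as a stand-in for dateutil:

```python
from datetime import datetime, timedelta, timezone

aware_local = datetime(2011, 2, 1, 0, 56, 23,
                       tzinfo=timezone(timedelta(hours=1)))
aware_utc = datetime(2011, 1, 31, 23, 56, 23, tzinfo=timezone.utc)
naive = datetime(2011, 2, 1, 0, 56, 23)

# Two aware objects: the offsets are taken into account.
same_instant = aware_local - aware_utc

# Mixing aware and naive objects is refused outright.
try:
    aware_local - naive
except TypeError as exc:
    mixing_error = exc
```

The two aware objects describe the same instant, so their difference is a zero timedelta, while subtracting the naive object raises TypeError.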

:wq

Using SQLObject with Twisted

Over the past week I’ve been working on a small personal project with Twisted. I need to access a database to store and retrieve data, so I started with the obvious, using APIs provided by Twisted.

Twisted provides a database API called adbapi. The API is pretty straightforward, and the operations I wanted to perform were not rocket science anyway, so it served the purpose, but I wasn’t 100% satisfied.

I was using the runQuery function, mainly, and putting there a regular SQL statement. I didn’t like that. Then I remembered SQLObject.

SQLObject is an ORM, providing an object oriented API to several databases (I was aiming at MySQL and SQLite). This is what I was looking for. But there is a problem: accessing a database is a blocking operation.

Twisted uses the reactor pattern, so you can’t run a blocking operation in the event loop’s thread without affecting the whole program. Database access libraries tend to be blocking, so Twisted runs database operations in another thread and then gets results in a callback in the main thread. This creates the illusion of non-blocking database access.

In order to integrate SQLObject nicely with Twisted this is exactly what we want to do. We’ll defer all database operations to another thread and we’ll get results (or failures) in callback functions. The key function here is deferToThread which will run the specified function in the reactor thread pool and return a deferred. In order to make our life easier we’ll use a decorator which will run the decorated function in the reactor’s thread pool and return a deferred:

[gist]https://gist.github.com/793936[/gist]

Now let’s see a simple (yet full) example of how all this works:

[gist]https://gist.github.com/793982[/gist]

As you can see the results (or errors) are retrieved in the got_result/got_error callback functions asynchronously, and as the operation was executed in a different thread this didn’t affect the main event loop.
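For readers without Twisted at hand, the same pattern can be approximated with the standard library. This is only an analogue, not the Twisted API from the gists: asyncio’s event loop stands in for the reactor, run_in_executor stands in for deferToThread, and the decorator name run_in_thread and the person table are assumptions made for the example.

```python
import asyncio
import functools
import sqlite3


def run_in_thread(func):
    """Run the blocking `func` in the event loop's default thread pool."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        call = functools.partial(func, *args, **kwargs)
        return await loop.run_in_executor(None, call)
    return wrapper


@run_in_thread
def count_people():
    # Blocking database work; the connection lives entirely in the worker.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE person (name TEXT)")
        conn.execute("INSERT INTO person VALUES ('alice')")
        return conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    finally:
        conn.close()


async def main():
    # The event loop stays free while the query runs in another thread.
    return await count_people()


result = asyncio.run(main())
```

As with the Twisted version, the caller only sees an asynchronous result, and the blocking call never runs in the event loop’s thread.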

:wq

Setting core dump limits from Python

It’s very annoying for a program to crash, but it happens. When it does, if the system was configured to dump the core, you’ll get a core file with hopefully enough information to debug the crash.

But we don’t know if the system will be configured to dump the core, so we may want to do this from our Python program. We can use the resource module in the standard library. The following example shows how to enable core dumps if the --dump-core option is specified:

[gist]https://gist.github.com/772101[/gist]

PS: This worked just fine for the root user but didn’t seem to do the job for a normal user.
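Here is a minimal sketch of the idea (Unix only, and not necessarily what the gist does). One possible explanation for the PS above, offered as an assumption: an unprivileged process may raise its soft limit only up to the hard limit, whereas root can raise both. The sketch therefore lifts the soft limit to whatever the hard limit allows:

```python
import argparse
import resource


def enable_core_dumps():
    # Raise the soft limit up to the hard limit; any user is allowed to do this.
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--dump-core", action="store_true",
                        help="allow the process to write a core file")
    options = parser.parse_args(argv)
    if options.dump_core:
        return enable_core_dumps()
    return resource.getrlimit(resource.RLIMIT_CORE)


limits = main(["--dump-core"])
```

If the hard limit itself is 0, core dumps stay disabled for a normal user and have to be enabled by the system configuration instead.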

Implementing registry pattern with class decorators

Registry is a quite common design pattern which I recently needed to implement, and I found Python class decorators very useful, so I thought I’d write about it. :-)

In the registry pattern we have a global object we call the registry or registrar which contains references to shared objects and it’s the only entity which can access those.

In my case, I had a plugin style architecture in which different plugins might be dynamically added without any configuration. The first implementation I made used a metaclass to add the plugin class to the registry:

[gist]https://gist.github.com/771884[/gist]

That worked, at least at the beginning. But later I found a limitation: plugins needed the Plugin metaclass, so I couldn’t use another metaclass on them if I needed to. And at some point I did feel the need to do that. I wanted the plugins to use the Singleton metaclass (to implement the Singleton pattern) but then I couldn’t use the Plugin metaclass because only one metaclass can be used (and I didn’t want to create a SingletonPlugin metaclass). And class decorators saved the day:

[gist]https://gist.github.com/771888[/gist]

Class decorators are supported only in Python >= 2.6, but that wasn’t a problem. :-)
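A minimal sketch of the decorator approach, written in Python 3 syntax. The names registry, plugin and EchoPlugin are illustrative and not taken from the gists, and the Singleton metaclass is just one common way to write it:

```python
registry = {}


def plugin(cls):
    """Class decorator: record the class in the registry and return it unchanged."""
    registry[cls.__name__] = cls
    return cls


class Singleton(type):
    """Metaclass that hands out a single shared instance per class."""
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


@plugin
class EchoPlugin(metaclass=Singleton):
    def run(self, text):
        return text
```

Because registration happens in the decorator, the metaclass slot stays free for Singleton, or for any other metaclass a plugin might need.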

:wq

lambda vs. functools.partial

In Python we have a way to create anonymous functions at runtime: lambda. With lambda we can create a function that will not be bound to any name. That is, we don’t need to def a function, and lambda can be used instead.

[gist]https://gist.github.com/707898[/gist]

Lambda can also be used to call another function by fixing certain arguments:

[gist]https://gist.github.com/707899[/gist]

However, lambda functions use late binding for the variables they refer to. That is, when you create a lambda function passing a variable as an argument, it’s not immediately copied; a reference to the scope is kept and the value is resolved when the function is called. Thus, if the value of that variable changes within that scope, the lambda function will be called with the changed value. Let’s see it in action:

[gist]https://gist.github.com/707901[/gist]

Unexpected? It depends, but it was definitely unexpected when I ran into it. The solution for this is to bind early, by copying the value at the time of the creation of the function. We can do this in two ways:

Using a fixed parameter in the lambda function:

[gist]https://gist.github.com/707903[/gist]

Using functools.partial:

[gist]https://gist.github.com/707904[/gist]

I personally like the second approach, using partial, since it’s more explicit and I really see option one as a workaround for how lambda functions work.
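To summarise, here is a compact sketch that puts the late-binding behaviour next to both fixes. The helper function show is made up for the example:

```python
from functools import partial


def show(value):
    return value


# Late binding: every lambda looks up `i` when called, after the loop ended.
late = [lambda: show(i) for i in range(3)]
late_results = [f() for f in late]

# Early binding, option 1: a default argument captures the current value.
defaults = [lambda i=i: show(i) for i in range(3)]
default_results = [f() for f in defaults]

# Early binding, option 2: partial stores the argument at creation time.
partials = [partial(show, i) for i in range(3)]
partial_results = [f() for f in partials]
```

The late-bound lambdas all return 2, while both early-binding variants return 0, 1 and 2 as intended.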