A while ago I was bored at home and decided to play a bit with the Twitter API. As a result I created a simple Twitter Bot which retewitted everything with the *#asterisk* hashtag (http://twitter.com/asteriskbot).>
It’s functionality was very simple andI didn’t even touch the (crappy) code for months, but Twitter decided to drop the basic authentication support in favor of OAuth so it was the right time for updating the bot.
Since the bot was created lots of other bots have, and it feels annoying to get the same tweet over and over again, so I though I’d try to make the bot a bit *smarter*. The main goal was to detect if a tweet was already tweeted (or RTd) not to repeat things. AFAIK there are 2 types of retweets:
- “some text RT @user1 RT @user2 some super cool tweet here!”
- “some super cool tweet! (via @user)”
The current bot detects both formats by using regular expressions and extracts the tweet itself. However, I felt this could not be enough, because we could get the retweet *before* the real tweet so it’s be great to save the entire tweet and then look for it.
As the bot was using a SQLite database I decided to use its *Full Text Search* (fts3) capability, inspired by a colleague. With FTS searching for an entire string is amazingly faster than doing a regular SELECT query. Here is an example taken from the SQLite website http://www.sqlite.org/fts3.html
CREATE VIRTUAL TABLE enrondata1 USING fts3(content TEXT); /* FTS3 table */
CREATE TABLE enrondata2(content TEXT); /* Ordinary table */
SELECT count(*) FROM enrondata1 WHERE content MATCH 'linux'; /* 0.03 seconds */
SELECT count(*) FROM enrondata2 WHERE content LIKE '%linux%'; /* 22.5 seconds */
The (silly) code for this bot is available on GitHub: http://github.com/saghul/twitterbot
Happy tweeting!