I had the idea of turning the Raspberry Pi into a Twitter radio. Not novel, and not very useful for the boat, although vocalising NAVTEX messages as they arrive might be handy. Anyway, it's fun.
In a previous XQuery version I used Twitter search and the returned RSS feed, but in view of today's announcement that Twitter is dropping XML support, that feed may not be around much longer. So I tried the JSON feed instead, which was in fact easier to handle in Python anyway, e.g.
http://search.twitter.com/search.json?q=from:george_szirtes
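For reference, that endpoint returned a JSON object whose `results` array holds the tweets; a minimal sketch of pulling out the two fields the script relies on, using a cut-down sample payload rather than a live fetch (the sample text is invented, but the `results`, `text` and `created_at` field names are those of the v1 search API):

```python
import json

# a cut-down sample of the JSON the old search API returned
sample = '''{"results": [
    {"created_at": "Mon, 01 Oct 2012 09:00:00 +0000",
     "text": "Morning all. The geese are back on the river."}
]}'''

parsed = json.loads(sample)
for tweet in parsed['results']:
    # these two fields are all the radio needs: when, and what to say
    print(tweet['created_at'], ":", tweet['text'])
```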
The script polls the search results at intervals and parses the JSON into tweets. The text needs cleaning up to be suitable for TTS. To avoid repeating tweets, I retain the timestamp of the last tweet spoken and ignore tweets with an earlier timestamp. The results arrive newest first, so they are reversed to come out in chronological order:
#!/usr/bin/env python3
import time, json, re, sys
from urllib.request import urlopen
from urllib.parse import quote_plus
from email.utils import parsedate_tz

# local modules
import speak

# todo:
#   use a different tone to indicate @ and hash
#   specify a voice so different channels can be distinguished

# e.g. ./tweets.py from:george_szirtes

# replacements will be expanded as new abbreviations are encountered
replacements = {"U.S.": "United States", "&": "and", "...": "etcetera"}

# RE to remove all the http links
removeLinks = re.compile(r'https?://\S*')

class twitter_json(object):
    def __init__(self, query, pause, refresh):
        self.query = query + " lang:en"
        self.url = "http://search.twitter.com/search.json?q=" + quote_plus(self.query)
        print(self.url)
        self.last = "2000-01-01T00:00:00"
        self.pause_secs = pause
        self.refresh_secs = refresh

    def refresh(self):
        page = urlopen(self.url).read().decode("UTF8")  # fetch the JSON
        parsed_page = json.loads(page)
        tweets = []
        for tweetj in parsed_page['results']:
            tweet = tts_tweet(tweetj)
            if tweet.pubDate > self.last and tweet.accept():
                tweet.clean()
                tweets.append(tweet)
        if len(tweets) > 0:
            self.last = tweets[0].pubDate  # results arrive newest first
            tweets.reverse()  # to get them into chronological order
            for tweet in tweets:
                print(">", tweet.pubDate, ":", tweet.text)
                speak.say(tweet.cleanText)
                time.sleep(self.pause_secs)
        time.sleep(self.refresh_secs)

class tts_tweet(object):
    def __init__(self, tweetj):
        self.text = tweetj['text']
        pubDate = parsedate_tz(tweetj['created_at'])
        self.pubDate = time.strftime("%Y-%m-%dT%H:%M:%S", pubDate[0:9])

    def clean(self):
        mtext = removeLinks.sub(" ", self.text)
        mtext = mtext.replace("#", " ")
        mtext = mtext.replace("@", " ")
        mtext = mtext.replace("'", "")
        mtext = speak.expand(mtext, replacements)
        self.cleanText = mtext

    def accept(self):
        # skip retweets and replies
        if self.text.startswith("RT") or self.text.startswith("@"):
            return False
        return True

query = sys.argv[1]  # query string
tweeter = twitter_json(query, 4, 20)
while True:
    tweeter.refresh()
The full code is on GitHub.
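The cleaning step is simple enough to try standalone. This sketch inlines the link-stripping regex and the replacement table from the script; `speak.expand` isn't shown in this post, so I'm assuming it does straightforward substring substitution:

```python
import re

removeLinks = re.compile(r'https?://\S*')
replacements = {"U.S.": "United States", "&": "and", "...": "etcetera"}

def clean(text):
    # drop links, then soften Twitter punctuation for the TTS engine
    text = removeLinks.sub(" ", text)
    text = text.replace("#", " ").replace("@", " ").replace("'", "")
    # assumed behaviour of speak.expand: plain substring replacement
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

print(clean("Fog in the U.S. & UK ... see http://example.com #weather"))
```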
Nothing much to this script and nothing specific to the RPi, except that the RPi makes it feasible to do fun things, like a roomful of tweet vocalisers with which visitors could interact. I guess some filtering or human mediation would be needed here - my attempts to do this in lectures via SMS and a collaborative browser-based whiteboard elicited mainly shockingly bad language.
I'm looking for Twitter streams worth listening to. Tom Sutcliffe on Saturday Review recommended the poet George Szirtes @george_szirtes and he is excellent. I wish I had had this running when Jennifer Egan's story Black Box was being tweeted by the New Yorker. A news channel would generate tweets at a useful rate and might be amusing to have in the background. However, the remaining problem is to clean the text better and add SSML markup so the TTS can do a better job of vocalising it. It's a bit challenging to listen to, but actually I'm amazed how well espeak does.
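As a first step towards that, espeak's -m flag tells it to interpret markup, including a subset of SSML, so mentions and hashtags could be given a different pitch rather than just stripped out - which would also tick off the first todo in the script. A hypothetical sketch (the prosody value is a guess to be tuned by ear):

```python
import re
import subprocess

def to_ssml(text):
    # wrap @mentions and #hashtags in a higher-pitched prosody element
    # so they stand out when spoken
    text = re.sub(r'[@#](\w+)',
                  r'<prosody pitch="high">\1</prosody>', text)
    return "<speak>" + text + "</speak>"

ssml = to_ssml("Good morning from @george_szirtes #poetry")
print(ssml)
# espeak -m interprets the markup; commented out so the sketch runs anywhere
# subprocess.run(["espeak", "-m", ssml])
```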