
Bristol Big Day Out - an XQuery/jQuery/Twitter photo wall

Background

The Bristol Civic Society are planning to celebrate National Civic Day on the 25th of June with events around Bristol. We had a brainstorming session with the project coordinator, Jeff Lucas, at Bristol Dorkbot earlier this week, and there were a number of ideas for using social media to create a vibrant, interactive multimedia experience:

  • photo-booths in parks
  • tracker devices to outline the boundaries of public spaces
  • mapping Flickr photos from the day
  • aggregating photos and texts, possibly geo-coded, into a single stream

We will be sailing soon, so I won't be there for the event. However, I thought it would be interesting to explore the task of gathering photos posted via Twitter as a start for students who will be helping in these projects. On the way, I discovered http://picfog.com - it looks great and I stole the thumbnail idea from this site.

My prototype creates a wall of photos from tweets, e.g. one for tweets containing "paris" and another for "bristol". A note of caution - this is not work-safe - I've been rather disgusted while testing each service to see the kinds of photos which can appear in tweets. Neither is it a production service. Parameters set the initial search string, the maximum number of pages to search and the refresh period in minutes. Initialization takes a while but incremental updates are quicker. Performance for an event like the Civic Society day should be better because we can search by a chosen hashtag.

Implementation

The starting point is the Twitter search API and its search operators. This feed is public and not rate-limited, so we can easily retrieve the latest tweets containing a word or hashtag as Atom (XML).

To provide an incremental update to the page, we use the id of the tweets and the since_id property to search only those tweets posted since the last update. We also limit the search to tweets with links, but searching is still quite slow because the hit rate on photos is quite low. Search results are paged, with a maximum of 100 results per page, so we have to repeat the search over several pages.

(: Build the search string from a <query> element and fetch at most $query/n pages. :)
declare function wall:tweets($query as element(query)) as element(atom:entry)* {
let $search := if ($query/tag) then concat("#",$query/tag) else $query/q
(: "filter:links" restricts the search to tweets which contain links :)
let $so := encode-for-uri(concat($search, " filter:links"))
return 
   wall:tweets-x($so,$query/n,1,$query/since_id)
};

(: Fetch page $page, then recurse until a short page arrives or $n pages have been read. :)
declare function wall:tweets-x(
               $so as xs:string, 
               $n as xs:integer,
               $page as xs:integer,
               $since_id as xs:string?)
        as element(atom:entry)* {
(: "&" must be written as "&amp;" inside an XQuery string literal :)
let $url := concat("http://search.twitter.com/search.atom?rpp=100&amp;page=",$page,"&amp;q=",$so, 
                   if (exists($since_id)) then concat("&amp;since_id=",$since_id) else ()
                   )
let $entries := doc($url)//atom:entry
return 
   if (count($entries) < 100 or $page >= $n)
   then $entries   (: last page of results, or page limit reached :)
   else ($entries, wall:tweets-x($so, $n , $page + 1 , $since_id))
};
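
To make the request concrete, here is a minimal sketch which repeats only the URL construction as a pure function, so that it can be evaluated without calling the service. The function name local:search-url and the hashtag "bigdayout" are illustrative assumptions, and the shape of the query element is inferred from the paths used in wall:tweets rather than taken from the application.

```xquery
xquery version "1.0";

(: Hypothetical helper: builds the search URL for one page of results. :)
declare function local:search-url(
    $query as element(query),
    $page as xs:integer) as xs:string {
  let $search :=
    if ($query/tag) then concat("#", $query/tag) else string($query/q)
  (: encode-for-uri escapes "#", the space and ":" in the search string :)
  let $so := encode-for-uri(concat($search, " filter:links"))
  return concat(
    "http://search.twitter.com/search.atom?rpp=100&amp;page=", $page,
    "&amp;q=", $so,
    if (exists($query/since_id))
    then concat("&amp;since_id=", $query/since_id)
    else ""
  )
};

(: Assumed query document: a hashtag, a page limit and the last id seen. :)
let $query :=
  <query>
    <tag>bigdayout</tag>
    <n>3</n>
    <since_id>12345</since_id>
  </query>
return local:search-url($query, 1)
```

Evaluated as written, this should return http://search.twitter.com/search.atom?rpp=100&page=1&q=%23bigdayout%20filter%3Alinks&since_id=12345, which shows how the hashtag and the link filter end up in the q parameter.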


From those tweets we need to extract the URLs of the photo pages. There are a number of photo services in use, including twitpic, twitgoo, yfrog, lockerz and instagr, and others.

Descriptors define these services, e.g.:

 <source name="twitpic" 
           root="http://twitpic.com/"
           path="$page//div[@id='photo-wrap']/img/@src" 
           thumbnail="http://twitpic.com/show/thumb/*" 
   />


A regexp concat($source/@root,"[a-z|A-Z|0-9|/]+") is used to locate URLs in the tweet text which match one of these services. For some services (twitpic, twitgoo and yfrog), we can derive a persistent URL for the thumbnail from the page URL. The thumbnail attribute defines the pattern of this modified URL, where * stands for the part of the page URI after the root. So for a tweet containing http://twitpic.com/abcdef , the URL http://twitpic.com/show/thumb/abcdef is a persistent link to a thumbnail.
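
The matching and thumbnail steps might be sketched as follows. This is not the wall's actual code: the function names local:photo-urls and local:thumbnail and the sample tweet text are assumptions, and only the descriptor attributes come from the example above. The sketch sticks to XQuery 1.0 functions, so it splits the text on whitespace and tests each token instead of extracting matches directly. The character class is written without the | separators, since inside square brackets a | would match a literal bar.

```xquery
xquery version "1.0";

(: Hypothetical helper: page URLs in $text which belong to $source. :)
declare function local:photo-urls(
    $text as xs:string,
    $source as element(source)) as xs:string* {
  (: escape the dots in the root so that they only match a literal "." :)
  let $root := replace($source/@root, "\.", "\\.")
  let $pattern := concat("^(", $root, "[a-zA-Z0-9/]+).*$")
  for $token in tokenize($text, "\s+")
  where matches($token, $pattern)
  (: keep the URL itself and drop any trailing punctuation :)
  return replace($token, $pattern, "$1")
};

(: Hypothetical helper: substitute the page id for "*" in @thumbnail. :)
declare function local:thumbnail(
    $url as xs:string,
    $source as element(source)) as xs:string? {
  if (exists($source/@thumbnail))
  then (
    let $id := substring-after($url, $source/@root)
    return concat(
      substring-before($source/@thumbnail, "*"),
      $id,
      substring-after($source/@thumbnail, "*"))
  )
  else ()
};

let $source :=
  <source name="twitpic"
          root="http://twitpic.com/"
          path="$page//div[@id='photo-wrap']/img/@src"
          thumbnail="http://twitpic.com/show/thumb/*"/>
let $text := "Harbourside today http://twitpic.com/abcdef, lovely #bristol"
for $url in local:photo-urls($text, $source)
return local:thumbnail($url, $source)
```

For the sample text this should return http://twitpic.com/show/thumb/abcdef. A descriptor without a thumbnail attribute yields the empty sequence, which is the case where the page itself has to be fetched.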

However, a service like lockerz or instagr requires the page to be fetched and the path expression evaluated to retrieve the image URL, which is much slower.

Matching tweets are converted to an application-specific XML format, which is then transformed to HTML for use with AJAX, converted to a full HTML page, or cached.

To make a wall which updates, we need a bit of JavaScript to use AJAX to get this incremental set of photos and to update the wall.
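
Each incremental request needs the highest tweet id seen so far, to pass back as since_id. Here is a sketch of how the server side might derive it from a batch of Atom entries. It assumes that entry ids have the form tag:search.twitter.com,2005:NNNN, with the numeric tweet id after the last colon; the function name local:latest-id and the sample feed are hypothetical.

```xquery
xquery version "1.0";
declare namespace atom = "http://www.w3.org/2005/Atom";

(: Hypothetical helper: the largest numeric tweet id in a batch, :)
(: or the empty sequence if the batch is empty. :)
declare function local:latest-id(
    $entries as element(atom:entry)*) as xs:integer? {
  max(
    for $entry in $entries
    (: assumed id format: the tweet id follows the last ":" :)
    return xs:integer(tokenize($entry/atom:id, ":")[last()])
  )
};

(: Made-up sample feed, just to exercise the function. :)
let $feed :=
  <feed xmlns="http://www.w3.org/2005/Atom">
    <entry><id>tag:search.twitter.com,2005:1001</id></entry>
    <entry><id>tag:search.twitter.com,2005:1003</id></entry>
    <entry><id>tag:search.twitter.com,2005:1002</id></entry>
  </feed>
return local:latest-id($feed/atom:entry)
```

For the sample feed this returns 1003. The ids are compared as integers rather than strings so that ids of different lengths still sort correctly.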

To do

What is mainly missing is server-side caching. This is needed for several reasons:

  • improved performance when multiple browsers are looking at the same search - on the local server and on Twitter's.
  • moderation can be interposed - a moderator's screen would present the same wall but with a check box to authorise an image - the public wall would then only see authorised images.
  • integration of images coming from other sources - email, Flickr GeoRSS feed, etc.
  • better de-duping - duplicates in a batch are removed, but for the incremental update, they need de-duping over the full set - or JavaScript should do this
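
As an illustration of the last point, de-duping a new batch against a cache could look roughly like this. The photo element with a url attribute is a hypothetical stand-in for the application-specific format, which is not shown here, and local:new-photos is an illustrative name.

```xquery
xquery version "1.0";

(: Hypothetical helper: photos in $batch whose URL is neither in the :)
(: cache nor earlier in the same batch. Batch order is preserved. :)
declare function local:new-photos(
    $cached as element(photo)*,
    $batch as element(photo)*) as element(photo)* {
  for $photo at $i in $batch
  let $earlier := subsequence($batch, 1, $i - 1)
  where not($photo/@url = $cached/@url)
    and not($photo/@url = $earlier/@url)
  return $photo
};

(: Made-up sample data. :)
let $cached := (
  <photo url="http://twitpic.com/aaa"/>,
  <photo url="http://twitpic.com/bbb"/>
)
let $batch := (
  <photo url="http://twitpic.com/bbb"/>,
  <photo url="http://twitpic.com/ccc"/>,
  <photo url="http://twitpic.com/ccc"/>,
  <photo url="http://twitpic.com/ddd"/>
)
return local:new-photos($cached, $batch)
```

For the sample data only the first ccc photo and the ddd photo are returned. The comparison against earlier items is quadratic in the batch size, which should be acceptable for batches of a few hundred tweets.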

Further enhancements needed include:

  • Mapping images - locations are being gathered where they are provided but these are very sparse - geo-coded Flickr photos could be fed into the cache.
  • Additional services - tumblr is not working yet
  • Include shortened URIs such as Twitter's link service http://t.co. These would need to be fetched and analysed to see what actual service is being used - much better to avoid shortening URIs.