Twitter Photo wall - server-side caching

The Twitter photo wall needs server-side caching for several reasons:

  • improved performance when multiple browsers are viewing the same search, both on the local server and on Twitter's
  • moderation can be interposed: a moderator's screen presents the same wall but with a check box to authorise each image, and the public wall then shows only accepted photos
  • integration of images coming from other sources: email, a Flickr GeoRSS feed, etc.
  • better de-duping: duplicates within a batch can be removed, but for incremental updates photos need de-duping against the full set
  • a permanent record of photos for the event would be desirable

In version 2 of the photo wall, an authorised user can create, modify and delete photo walls. A wall may be moderated or unmoderated. If moderated, a moderation screen shows all unmoderated photos, the oldest at the top. The moderator can accept or reject a photo by clicking the appropriate button. The moderation element of the photo in the cache is then updated and the photo is removed from the moderation screen. The public page of a moderated wall shows only accepted photos. Each photo is timestamped so that screens can be updated with those photos acquired since the last refresh.
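The moderation rules above can be sketched in a few lines. This is an illustration in Python rather than the XQuery the prototype is written in, and every name in it (`Photo`, `moderation_queue`, `moderate`, `public_photos`, the `updated` field) is hypothetical. It also assumes that a photo's timestamp is bumped when it is moderated, so that a newly accepted photo counts as "new" for the public page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Photo:
    url: str
    acquired: float            # when the photo was first cached (epoch seconds)
    updated: float             # assumption: bumped again when the photo is moderated
    status: str = "pending"    # "pending", "accepted" or "rejected"


def moderation_queue(photos: List[Photo]) -> List[Photo]:
    """Unmoderated photos, oldest at the top."""
    pending = [p for p in photos if p.status == "pending"]
    return sorted(pending, key=lambda p: p.acquired)


def moderate(photo: Photo, accept: bool, now: float) -> None:
    """Record the moderator's decision; the photo then leaves the queue."""
    photo.status = "accepted" if accept else "rejected"
    photo.updated = now


def public_photos(photos: List[Photo], wall_is_moderated: bool,
                  since: float = 0.0) -> List[Photo]:
    """Photos the public page should add since its last refresh."""
    visible = [p for p in photos
               if not wall_is_moderated or p.status == "accepted"]
    return [p for p in visible if p.updated > since]
```

A public page would remember the time of its last refresh and pass it as `since`, so each refresh only transfers the photos it has not yet shown.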

This architecture is more efficient, since the Twitter search is done only once and each public page only has to access the cached data. However, there are now three lags in the system: the interval between searches of the Twitter stream, the pause whilst a human operator moderates the photos, and the interval between refreshes of the wall.

You can view a few walls which have been created but are most likely currently stopped. If anyone wants a login to use the prototype, drop me a line.

Implementation

The data for each wall is the query description and a sequence of photo descriptors. This structure is held in an XML file in the database. It is updated in situ using the eXist-db XQuery update extensions when new photos are acquired, when photos are moderated and when the query parameters are updated.

When a wall is created, the XML file is created containing the initial query. The Twitter stream is searched for the first time using the query parameters, and matching photos are added to the XML file, ignoring duplicates. In addition, a task is scheduled to re-run the search at the defined refresh rate.
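To make the "add matching photos, ignoring duplicates" step concrete, here is a minimal sketch using Python's `xml.etree.ElementTree` in place of the eXist-db update extensions. The element and attribute names (`wall`, `query`, `photos`, `photo`, `src`, `acquired`) are invented for illustration and are not the schema the prototype uses. Scheduling the repeat search is left out.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def new_wall(query: str) -> ET.Element:
    """Create the wall document holding the initial query and no photos."""
    wall = ET.Element("wall")
    ET.SubElement(wall, "query").text = query
    ET.SubElement(wall, "photos")
    return wall


def add_photos(wall: ET.Element, found: Iterable[str], now: str) -> int:
    """Append photos from one search run, de-duping against the full set.

    `found` is the list of photo URLs returned by a search; `now` is the
    acquisition timestamp as a string. Returns the number actually added.
    """
    photos = wall.find("photos")
    # Everything already cached, so duplicates across runs are skipped too.
    seen = {p.get("src") for p in photos.findall("photo")}
    added = 0
    for src in found:
        if src in seen:
            continue          # duplicate within this batch or an earlier one
        seen.add(src)
        ET.SubElement(photos, "photo", {"src": src, "acquired": now})
        added += 1
    return added
```

The same function serves both the first search and every scheduled re-run, because the duplicate check always looks at the whole cached set rather than just the current batch.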

The moderation page uses AJAX both to fetch the set of all unmoderated photos repeatedly and, as each moderation decision is made, to update the moderation status of the photo in the database. The public page also uses AJAX to fetch newly acquired or moderated photos.

The current line count is about 600 lines of XQuery and 30 lines of JavaScript/jQuery.

To do

The wall display is very basic. New photos are inserted at the top of the page which results in a jerky appearance. One idea would be for new photos to appear at the centre of the screen and drift outwards, reducing in size as new photos arrive - well beyond my JavaScript skills.

Short URLs like Twitter's own t.co need to be converted to their unabbreviated form to detect which photo service is being used, if any. The usual approach with curl is to ask it not to follow the Location header in the redirect response, but the httpclient module in eXist does not have this option. This means that the page has to be fetched and then analysed to see whether it is an image service and, if so, which one: messy.
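For comparison, this is roughly what the "don't follow the redirect" approach looks like outside eXist. It is a sketch in Python using the standard `http.client` module, which never follows redirects by itself, so the `Location` header can be read directly. It does not solve the problem for the eXist httpclient module. The function names and the `PHOTO_HOSTS` table are hypothetical, and the hosts listed are only examples.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit

# Illustrative table only: host name -> photo service label.
PHOTO_HOSTS = {
    "twitpic.com": "twitpic",
    "yfrog.com": "yfrog",
    "flickr.com": "flickr",
}


def expand_once(short_url: str, timeout: float = 5.0) -> Optional[str]:
    """Return the Location a short URL redirects to, without following it."""
    parts = urlsplit(short_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)      # headers are enough, no body needed
        response = conn.getresponse()
        if response.status in (301, 302, 303, 307, 308):
            return response.getheader("Location")
        return None                     # not a redirect
    finally:
        conn.close()


def photo_service(url: str) -> Optional[str]:
    """Name the photo service a full URL belongs to, or None if unknown."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return PHOTO_HOSTS.get(host)
```

A caller would chain the two, for example `photo_service(expand_once(link) or link)`. This handles a single hop only; a link that passes through more than one shortener would need `expand_once` applied repeatedly.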