Conversion from Posterous

I migrated from Blogger a few years ago and now I have to move from Posterous. This latest manifestation is running on my own server, written in XQuery on eXist-db. It is a work in progress. The code is on GitHub.

The conversion has been a bit tricky. The Posterous export is rather flawed. The entries are supplied in two formats, HTML and RSS, but initially neither was well-formed XML. The XML files lacked namespace declarations, although after I pointed this out the XML is now OK. However, links to images still point to the Posterous site. The HTML points to the exported images, but it isn't well-formed either, and the published/draft status is missing. On balance, HTML seemed the lesser of two evils, so my conversion uses it as its input.

After unzipping the export zip file, I flattened the images, which were held in year and month subdirectories, into a single directory. I similarly flattened the HTML files. The conversion script reads these files and converts each to an XML file in my blog XML format. During the conversion I can fix some of the HTML. For example, Gist links are converted to the corresponding JavaScript include.
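The two steps above can be sketched roughly as follows. This is not the conversion script itself, which is written in XQuery; it is a small Python illustration under assumptions of my own: that file names are unique across the year and month subdirectories, and that a Gist link in the export is a plain anchor whose href is a bare gist.github.com URL. The function names `flatten` and `gist_links_to_includes` are hypothetical.

```python
import re
import shutil
from pathlib import Path


def flatten(src: Path, dest: Path) -> None:
    """Copy every file under src (at any depth) straight into dest.

    dest is assumed to lie outside src. A name clash raises an error
    rather than silently overwriting an earlier file.
    """
    dest.mkdir(parents=True, exist_ok=True)
    for path in sorted(src.rglob("*")):
        if path.is_file():
            target = dest / path.name
            if target.exists():
                raise FileExistsError(f"name clash while flattening: {path.name}")
            shutil.copy2(path, target)


# Assumed link shape: <a href="https://gist.github.com/[user/]id">...</a>
GIST_LINK = re.compile(
    r'<a\s+href="(https?://gist\.github\.com/(?:[\w-]+/)?[0-9a-f]+)/?"[^>]*>.*?</a>',
    re.IGNORECASE | re.DOTALL,
)


def gist_links_to_includes(html: str) -> str:
    """Replace each Gist anchor with the script include that embeds the Gist."""
    return GIST_LINK.sub(r'<script src="\1.js"></script>', html)
```

A regular expression is enough here only because the input is not well-formed; with clean XML, a proper parser or an XQuery transformation would be the safer choice.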

Despite the simple nature of the blog, which lacks commenting, let alone an editor, and has an ugly URL, I like the feeling of being in control. Access to the blog feels much easier than in Posterous already. I can also find and begin to repair the alarming number of bad links. It would be interesting to use this data to work out the half-life of a link, i.e. how long before there is a 50% chance that it's dead.
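As a rough sketch of how such a half-life might be estimated, suppose (and this is purely an assumption) that links die at a constant rate, so the fraction still alive after t years is 0.5 ** (t / half_life). For a batch of links that are all about the same age, the half-life then follows from the surviving fraction. The function name `link_half_life` is hypothetical.

```python
import math


def link_half_life(age_years: float, total: int, alive: int) -> float:
    """Estimate a link half-life in years from one cohort of same-age links.

    Assumes exponential decay: alive / total == 0.5 ** (age / half_life),
    hence half_life == age * ln(2) / ln(total / alive).
    """
    if not 0 < alive < total:
        raise ValueError("need some live and some dead links to estimate a half-life")
    return age_years * math.log(2) / math.log(total / alive)
```

With invented figures, 100 links from four-year-old posts of which 25 still resolve would give `link_half_life(4, 100, 25)`, a half-life of about two years. Real posts span many ages, so a fuller treatment would fit the decay rate across all of them rather than one cohort.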

One improvement over the Posterous blog is a print stylesheet that clears out all the web furniture; it is surprising how many blogs aren't printable.

March 20

The server has been updated to eXist version 2.0 and has a clean URL now, via Apache mod_rewrite and XQuery scripts.