
A masterclass in XSLT and yet more Poor Man's Pipeline (POMP?)

More fiddling with my little pipeline framework this morning, this time to add a simple path operator. The initial application was to pull a table from a Wikipedia page:

WWII Casualties extracted
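To make the idea of a path step concrete, here is a minimal sketch in Python rather than in the framework itself. Everything in it is an assumption for illustration: the names `path_step` and `run_pipeline`, the tiny stand-in page, and the `wikitable` class used to pick the table. The real pipeline presumably evaluates full XPath; the standard library only supports a small subset.

```python
import xml.etree.ElementTree as ET


def path_step(expr):
    """Build a step that returns the first node matching a path expression.

    Hypothetical helper: ElementTree understands only a limited XPath subset.
    """
    def step(doc):
        node = doc.find(expr)
        if node is None:
            raise ValueError(f"path {expr!r} matched nothing")
        return node
    return step


def run_pipeline(doc, steps):
    """Feed the document through each step in turn."""
    for step in steps:
        doc = step(doc)
    return doc


# Stand-in for a fetched page (made-up content, not the Wikipedia article).
PAGE = """<html><body>
<table class="nav"><tr><td>menu</td></tr></table>
<table class="wikitable">
<tr><th>Country</th><th>Deaths</th></tr>
<tr><td>Exampleland</td><td>10</td></tr>
</table>
</body></html>"""

page = ET.fromstring(PAGE)
table = run_pipeline(page, [path_step(".//table[@class='wikitable']")])
print(table.get("class"), len(table.findall("tr")))
```

Because a path step has the same shape as any other step (document in, document out), it can sit anywhere in the sequence, before or after a transformation.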

For later processing, this HTML table needs to be cleaned up to plain XML with element names taken from the headings. I habitually write this kind of generic transformation in XQuery, but that doesn't play so well with the pipeline architecture. XSLT would fit better, but I wasn't quite sure how to write this generalised transformation. My plea for help via Twitter elicited an early morning masterclass in XSLT from @JeniT and @AlainCouthures (thanks again, folks, I'm really touched). Here is Jeni's work.
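For readers who want the shape of that generic transformation without opening the stylesheet, the following is a rough Python sketch of the same idea. It is not Jeni's XSLT and makes no claim to match it. The function names, the rule for turning a heading into an element name, and the sample table are all assumptions.

```python
import re
import xml.etree.ElementTree as ET


def element_name(heading, position):
    """Turn heading text into a usable element name (simplified, assumed rule)."""
    name = re.sub(r"[^A-Za-z0-9_]+", "_", heading.strip()).strip("_")
    if not name:
        return f"col{position}"   # fall back to a numbered name
    if name[0].isdigit():
        name = "_" + name         # element names cannot start with a digit
    return name


def table_to_xml(table):
    """Rebuild an HTML table as <table><row>...</row></table>, naming cells after the headings."""
    rows = table.findall(".//tr")
    names = [
        element_name("".join(th.itertext()), i)
        for i, th in enumerate(rows[0].findall("th"), start=1)
    ]
    out = ET.Element("table")
    for tr in rows[1:]:
        cells = tr.findall("td")
        if not cells:
            continue
        row = ET.SubElement(out, "row")
        for name, td in zip(names, cells):
            # Collapse runs of whitespace inside the cell text.
            ET.SubElement(row, name).text = " ".join("".join(td.itertext()).split())
    return out


# Made-up sample data, not taken from the Wikipedia table.
HTML = """<table>
<tr><th>Country</th><th>Total deaths</th><th>% of population</th></tr>
<tr><td>Exampleland</td><td>1,000</td><td>0.5</td></tr>
</table>"""

result = table_to_xml(ET.fromstring(HTML))
print(ET.tostring(result, encoding="unicode"))
```

The awkward part in any language is the heading-to-name mapping: headings can contain spaces, punctuation, or nothing at all, so some normalising rule is needed before they are legal element names.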

In the end I used a simpler transformation with numbered column headings. Adding that step, I can now get the XML. It's messy because the Wikipedia tables have complex cells, so the XSLT needs to be specialised for scraping:

WWII Casualties as XML
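The numbered-column variant, with some flattening of complex cells, might look roughly like this in Python. This is a sketch under assumptions, not the stylesheet used in the pipeline: the post does not say what makes the cells complex, so links, line breaks and footnote markers are guesses, and the names `raw_text`, `cell_text` and `numbered_rows` are hypothetical.

```python
import xml.etree.ElementTree as ET


def raw_text(node, skip):
    """Gather the text under a node, dropping skipped tags and treating <br/> as a space."""
    parts = [node.text or ""]
    for child in node:
        if child.tag == "br":
            parts.append(" ")
        elif child.tag not in skip:
            parts.append(raw_text(child, skip))
        parts.append(child.tail or "")
    return "".join(parts)


def cell_text(cell, skip=("sup",)):
    """Flatten a cell to plain text with whitespace collapsed."""
    return " ".join(raw_text(cell, skip).split())


def numbered_rows(table):
    """Emit each data row as <row><col1>..</col1><col2>..</col2>...</row>."""
    out = ET.Element("table")
    for tr in table.findall(".//tr"):
        cells = tr.findall("td")
        if not cells:             # heading rows contain only <th>
            continue
        row = ET.SubElement(out, "row")
        for i, td in enumerate(cells, start=1):
            ET.SubElement(row, f"col{i}").text = cell_text(td)
    return out


# Made-up sample with a link, a footnote marker and a line break.
HTML = """<table>
<tr><th>Country</th><th>Deaths</th></tr>
<tr><td><a href="#">Exampleland</a><sup>[1]</sup></td><td>1,000<br/>to 2,000</td></tr>
</table>"""

rows = numbered_rows(ET.fromstring(HTML))
print(ET.tostring(rows, encoding="unicode"))
```

Numbered names sidestep the problem of turning arbitrary headings into element names, at the cost of later steps having to know which column is which.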

Next comes the transform to a visualization. Another XSLT subsets the table to get just the British Imperial Forces casualties and creates the graph XML, and a final transformation turns that into a pie chart using Jan's XSLT.
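As an illustration of the subsetting step only, here is a small Python sketch that picks one row from a cleaned table with numbered columns and emits one element per pie slice. The `graph` and `slice` vocabulary is invented for this example and is not the input format of Jan's XSLT. The row labels and numbers are placeholders, not casualty figures.

```python
import xml.etree.ElementTree as ET


def row_to_graph(table, label, columns):
    """Find the row whose first column equals `label` and emit one <slice> per chosen column.

    `columns` maps numbered column names to slice labels (hypothetical format).
    """
    for row in table.findall("row"):
        if row.findtext("col1") == label:
            graph = ET.Element("graph", title=label)
            for col, name in columns.items():
                # Strip thousands separators so the value is a plain number.
                value = row.findtext(col, default="0").replace(",", "")
                ET.SubElement(graph, "slice", label=name, value=value)
            return graph
    raise LookupError(f"no row labelled {label!r}")


# Placeholder data only.
TABLE = ET.fromstring(
    "<table>"
    "<row><col1>Exampleland</col1><col2>1,000</col2><col3>250</col3></row>"
    "<row><col1>Otherland</col1><col2>40</col2><col3>60</col3></row>"
    "</table>"
)

graph = row_to_graph(TABLE, "Exampleland", {"col2": "military", "col3": "civilian"})
print(ET.tostring(graph, encoding="unicode"))
```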

The pipeline can be run from the location line for testing and then embedded in a generated page:

WWII Casualties as HTML

Next job is to create a pipeline editor which will make it easier to view the stylesheets and the intermediate documents.