Banging my head on the table: elements not attributes for data! elements not attributes for data!
Every time I cut corners and use attributes for base data, I come to regret it. I usually don't, but my attempts at page scraping to XML and then RDF did take this approach, and it was silly. There is nowhere to put attributes of the data, like its datatype or a restriction to zero/one multiplicity. Duh!
Here's what I should do.
First, convert the page to a formalized XML with nested elements. Resources have an id attribute whose value is valid in a URI; properties have attributes to define datatype and language. I need a permissive schema definition to check this structure - must learn Schematron.
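To make the convention concrete, a formalized document might look something like the sketch below. The element names follow the league example in this post; the ids and values are made-up placeholders, not real data.

```xml
<!-- Resources (football-league, team) carry an id attribute;
     properties (label, source, position, points) carry their value as
     element content, with an optional datatype attribute. -->
<football-league id="Example_League">
  <label>Example League</label>
  <source datatype="uri">http://example.org/league-table</source>
  <team id="Example_Team">
    <label>Example Team</label>
    <position datatype="xs:integer">1</position>
    <points datatype="xs:integer">0</points>
  </team>
</football-league>
```

Because each value is an element, there is room to hang further metadata on it later without restructuring the document.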
Here is the revised code for the Premier League:
import module namespace convert = "http://www.cems.uwe.ac.uk/xmlwiki/convert" at "../lib/convert.xqm";

let $uri := "http://news.bbc.co.uk/sport1/hi/football/eng_prem/table/default.stm"
let $html := convert:get-html($uri)
let $table := $html//table[@class="fulltable"]
let $date := $html//div[@class="fulltableHeader"]/text()
let $xml :=
  element football-league {
    attribute id {"Barclays_Premier_League"},
    element label {"Barclays Premier League"},
    element valid-date { $date },
    element acquired { attribute datatype {"xs:dateTime"}, current-dateTime() },
    element source { attribute datatype {"uri"}, $uri },
    for $row in $table/tr[@class=("r1","r2")]
    return
      element team {
        attribute id { replace($row/td[2], " ", "_") },
        element label { string($row/td[2]) },
        element position { attribute datatype {"xs:integer"}, string($row/td[1]) },
        element games-played { attribute datatype {"xs:integer"}, string($row/td[3]) },
        element goal-difference { attribute datatype {"xs:integer"}, string($row/td[14]) },
        element points { attribute datatype {"xs:integer"}, string($row/td[15]) }
      }
  }
return $xml
Here is the Premier League data as XML: Live or Cached.
The date needs reformatting to an ISO date.
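One way the date could be reformatted is sketched below. This is only a guess at the shape of the problem: local:to-iso-date is a hypothetical helper, not part of the convert module, and it assumes the text has already been trimmed down to something like "8 February 2009". The actual header text on the BBC page may well differ and need extra cleaning first.

```xquery
xquery version "1.0";

(: Hypothetical helper, not part of the original convert module.
   Assumes input of the form "day MonthName year", e.g. "8 February 2009".
   Returns the empty sequence when the text does not fit that pattern. :)
declare function local:to-iso-date($text as xs:string) as xs:date? {
  let $months := ("January", "February", "March", "April", "May", "June",
                  "July", "August", "September", "October", "November", "December")
  let $parts := tokenize(normalize-space($text), " ")
  return
    if (count($parts) ne 3) then ()
    else
      let $month := index-of($months, $parts[2])
      return
        if ($parts[1] castable as xs:integer and exists($month))
        then
          (: adding 100 and dropping the first digit zero-pads to two digits :)
          let $iso := concat($parts[3], "-",
                             substring(string(100 + $month), 2), "-",
                             substring(string(100 + xs:integer($parts[1])), 2))
          return if ($iso castable as xs:date) then xs:date($iso) else ()
        else ()
};

local:to-iso-date("8 February 2009")
```

The result could then go into valid-date along with a datatype attribute of "xs:date", in the same way as the other typed properties.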
This formalized XML can then be converted to RDF with a revised XQuery function:
(:~
 : convert formalized XML to RDF
 : elements which become resources have an id attribute, properties may have a datatype attribute, which is uri if a URI
 : @param element the XML element to be converted to RDF
 : @param base base for resource URIs
 : @param path hierarchical path to element resource - initially ()
 : @param prefix default prefix for local property names
 : @param map XML document used to map local names to external vocab names
 :)
declare function convert:element-to-rdf-v2 ($element, $base, $path, $prefix, $map) {
  let $epath := concat($path, "/", local-name($element), "/", $element/@id)
  return (
    element rdf:Description {
      attribute rdf:about { concat($base, "/resource", $epath) },
      element rdf:type {
        attribute rdf:resource { concat($base, "/vocab/type/", local-name($element)) }
      },
      for $property in $element/*[empty(@id)]
      let $localname := local-name($property)
      let $localname :=
        if ($map/property[@local = $localname])
        then string($map/property[@local = $localname]/@external)
        else concat($prefix, $localname)
      return
        element {$localname} {
          if ($property/@datatype = "uri")
          then attribute rdf:resource { string($property) }
          else (
            if ($property/@datatype)
            then attribute rdf:datatype { $property/@datatype }
            else (),
            string($property)
          )
        },
      for $child in $element/*[@id]
      return
        element { concat($prefix, local-name($child)) } {
          attribute rdf:resource {
            concat($base, "/resource", $epath, "/", local-name($child), "/", $child/@id)
          }
        }
    },
    for $child in $element/*[@id]
    return convert:element-to-rdf-v2($child, $base, $epath, $prefix, $map)
  )
};
The $map provides a mapping between local names and external names. If no entry is found, the local name is prefixed by $prefix. For example:
<map>
  <property local="label" external="rdfs:label"/>
  <property local="latitude" external="geo:lat"/>
  <property local="longitude" external="geo:long"/>
</map>
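A call to the function might look roughly like this. It is a sketch under several assumptions: the base URI http://example.org, the prefix "ex:" and the team data are all made up for illustration, and the rdf, rdfs and ex prefixes are assumed to be declared inside convert.xqm, since the computed element names are resolved there. The map is passed as the map element itself, so that $map/property finds the entries.

```xquery
import module namespace convert = "http://www.cems.uwe.ac.uk/xmlwiki/convert" at "../lib/convert.xqm";

(: Map only "label"; every other property falls back to the default prefix. :)
let $map :=
  <map>
    <property local="label" external="rdfs:label"/>
  </map>
(: Placeholder resource, not real league data. :)
let $team :=
  <team id="Example_Team">
    <label>Example Team</label>
    <points datatype="xs:integer">0</points>
  </team>
return
  (: $path starts as the empty sequence, as the function's doc comment says :)
  convert:element-to-rdf-v2($team, "http://example.org", (), "ex:", $map)
```

Reading through the function, this should yield one rdf:Description with rdf:about="http://example.org/resource/team/Example_Team", an rdf:type pointing at http://example.org/vocab/type/team, an rdfs:label (mapped) and an ex:points (prefixed by default) carrying rdf:datatype="xs:integer". Since team has no children with an id, there are no nested resources and no recursive calls.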
RDF output - cached or Live (ish)
Still not Linked data of course, for all the reasons mentioned in previous posts, but it's a cleaner approach to data/XML/RDF conversion. Now to tackle the election data again.