This process used three custom XQuery scripts to convert the CSV to XML, and the recursive XML-to-RDF function from the last post to generate the RDF. The minor problems encountered included:
An XQuery script provides a basic, standardized extract from the XML datasets, e.g. Bath
The difference between entity/attribute/relationship based models (including XML) and RDF really struck me with this exercise.
Entities don't need to be created. Resources like the election, the constituencies and the candidates are never themselves created. All we do is add triples which make statements about their URIs. Of course it's nice if the URIs are dereferenceable, but the RDF is usable without this.
The model is composable. The results RDF/XML document adds constituency properties like the name, and candidate properties like the name, party and number of votes, but does not say who was elected. The constituency RDF/XML document provides properties about voting numbers in the constituency, for example the turnout, and, for the winning candidate, the boolean elected property. When uploaded to the same dataset, these separate triples are pooled, and SPARQL queries can return aggregated data based on common types, properties and resource URIs.
Triples are unique. Duplicate triples are generated in these three files because of the way the separate XML files are converted to RDF, but these will be ignored when loaded into the same datastore, or should be ignored by the query engine if they are in different graphs. RDF is kind of auto-normalized.
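To make the last two points concrete, here is a minimal sketch in Python that treats a graph as a plain set of (subject, predicate, object) tuples. It is not the XQuery used for this exercise: the example.org namespace, the property names and all the numbers are invented for illustration, and a real store would be queried with SPARQL rather than list comprehensions.

```python
# A graph modelled as a set of (subject, predicate, object) tuples.
# BASE, the property names and every value below are made up for this
# sketch; they are not the URIs or figures used in the real dataset.
BASE = "http://example.org/election/"


def uri(local):
    """Build a URI string in the hypothetical namespace."""
    return BASE + local


# Triples as they might come from the results document.
results_doc = {
    (uri("constituency/bath"), "rdf:type", uri("Constituency")),
    (uri("constituency/bath"), "rdfs:label", "Bath"),
    (uri("candidate/c1"), "rdf:type", uri("Candidate")),
    (uri("candidate/c1"), uri("constituency"), uri("constituency/bath")),
    (uri("candidate/c1"), uri("votes"), 100),
}

# Triples as they might come from the constituency document.
# The first two repeat statements already made in results_doc.
constituency_doc = {
    (uri("constituency/bath"), "rdf:type", uri("Constituency")),
    (uri("constituency/bath"), "rdfs:label", "Bath"),
    (uri("constituency/bath"), uri("turnout"), 70),
    (uri("candidate/c1"), uri("elected"), True),
}

# Loading both into one store is a set union: duplicates simply vanish.
store = results_doc | constituency_doc


def objects(graph, subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def elected_with_votes(graph):
    """Map each elected candidate to the vote counts stated about them."""
    winners = [s for s, p, o in graph if p == uri("elected") and o is True]
    return {w: objects(graph, w, uri("votes")) for w in winners}
```

Neither document alone can answer "how many votes did the winner get": results_doc has the votes but not the elected flag, and constituency_doc has the flag but not the votes. Only the pooled store can, and nothing had to be declared or created before the two sets were merged.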
Literal / Resource dilemma
The age-old Attribute/Entity dilemma appears in RDF as the Literal/Resource dilemma. When modeling the candidates, I modeled the party as a literal. I later added the distribution of seats, whereupon I was forced to model a Party as a Resource with properties like label and number of seats. This meant reworking the results XML. Perhaps I should have anticipated this, but the nature of RDF means that the scope is not closed as it is in localised modeling, so these challenges will inevitably happen. However, a generic process for schema evolution is possible in RDF, partly because RDF datasets are typically highly redundant. It thus seemed OK to add links to party resources alongside the initial party-name literals.
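That kind of evolution might be sketched as follows, again in Python with triples as plain tuples rather than the XQuery actually used. The function name, the partyName and party properties, the slug rule for minting party URIs and the sample data are all assumptions made for the sketch. A real RDF library would also distinguish literals from URI references by type, which plain strings do not.

```python
def add_party_resources(graph, name_prop, link_prop, base):
    """Return graph plus a link to a party resource for every party-name literal.

    The original literal triples are kept. For each one, two triples are
    added: candidate -> party URI, and party URI -> label. The URI is
    minted from the name by a simple (hypothetical) slug rule.
    """
    added = set()
    for s, p, o in graph:
        if p == name_prop and isinstance(o, str):
            party_uri = base + "party/" + o.lower().replace(" ", "-")
            added.add((s, link_prop, party_uri))
            added.add((party_uri, "rdfs:label", o))
    return graph | added


# Invented sample data: two candidates standing for the same party.
base = "http://example.org/election/"
candidates = {
    (base + "candidate/c1", base + "partyName", "Example Party"),
    (base + "candidate/c2", base + "partyName", "Example Party"),
}

evolved = add_party_resources(
    candidates, base + "partyName", base + "party", base
)
```

Because triples are unique, the label for the party is stated once even though two candidates point at it, and running the function again on its own output adds nothing new. Properties such as the number of seats can then be attached to the party URI without touching the candidate triples.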
Now to get a new Talis store to put this stuff into ...