I started the election modeling from the data sources, but I should have constructed a data model first, and used the model to drive decisions about the mapping.
For teaching ER modeling and model-driven development through to relational databases, I have developed some XML/XQuery based tools for modeling and model-transformation. The sub-text of this tool-set is the power of XML representation and transformations.
I developed this model of the election data in my XML vocabulary (with a couple of tweaks); it can be transformed to a diagram using the tool and Graphviz (where would I be without it!).
The notation is basic ER with crow's foot. I struggled to get Graphviz to generate a bar across relationships to denote dependent (weak) relationships, in which the foreign key becomes part of the child primary key. I failed, but hit on the idea of cutting one leg of the crow's foot, which is nicely symbolic of a weak link.
In this notation, derived attributes (so far only rdfs:label) are included and a rule defines the derivation - this is more easily seen in the Data Dictionary view. The key addition was to be able to generate a basic rdf/rdfs vocabulary from this model. Since the model has additional semantics, I will later add some OWL to capture relationship multiplicities.
The code to generate the RDF is quite small. One advantage of this model-based approach for me: I have trouble remembering which property is in rdf and which in rdfs, and the generator now records that in one place. I think the code has it right!
declare function er:model-to-vocab($model) {
  <rdf:RDF xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:owl="http://www.w3.org/2002/07/owl#"
           xml:base="{$model/@vocab-base}">
    {
      for $entity in $model/entity[not(contains(@name, ":"))] (: just the local namespace :)
      let $class := string($entity/@name)
      return
        element rdfs:Class {
          attribute rdf:about {$class},
          element rdfs:label {$class},
          if ($entity/comment)
          then element rdfs:comment {string($entity/comment)}
          else ()
        }
    }
    {
      for $property in $model/entity/attribute[not(contains(@name, ":"))] (: just the local namespace :)
      let $datatype := string(($property/@xmltype,
                               $model/type[@name = $property/@type]/@xmltype,
                               "xs:string")[1])
      let $class := string($property/../@name)
      let $name := string($property/@name)
      return
        element rdf:Property {
          attribute rdf:about {$name},
          element rdfs:label {$name},
          element rdfs:domain {attribute rdf:resource {$class}},
          element rdfs:range {attribute rdf:resource {$datatype}},
          if ($property/comment)
          then element rdfs:comment {string($property/comment)}
          else ()
        }
    }
    {
      for $relationship in $model/relationship
      let $domain := $relationship/role[1]
      let $range := $relationship/role[2]
      return
        element rdf:Property {
          attribute rdf:about {string($relationship/@name)},
          element rdfs:label {string($relationship/name)},
          element rdfs:domain {attribute rdf:resource {string($domain/@entity)}},
          element rdfs:range {attribute rdf:resource {string($range/@entity)}},
          if ($relationship/comment)
          then element rdfs:comment {string($relationship/comment)}
          else ()
        }
    }
  </rdf:RDF>
};
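To make the input shape concrete, here is a small sketch of a model fragment and of the datatype fallback used for attributes. The element and attribute names are inferred only from the paths the function reads (`entity`, `attribute`, `type`, `relationship/role`, `@vocab-base`); the entities, the `count` type and the example.org base are invented for illustration and are not taken from the real election model.

```xquery
xquery version "1.0";

(: Hypothetical model fragment; names of entities, types and the base URI
   are assumptions made for this example only. :)
declare variable $model :=
  <model vocab-base="http://example.org/election/">
    <type name="count" xmltype="xs:integer"/>
    <entity name="Constituency">
      <comment>An electoral district.</comment>
      <attribute name="name"/>
      <attribute name="electorate" type="count"/>
    </entity>
    <entity name="Candidate">
      <attribute name="surname" xmltype="xs:string"/>
    </entity>
    <relationship name="candidate">
      <role entity="Constituency"/>
      <role entity="Candidate"/>
    </relationship>
  </model>;

(: Same fallback chain as in the function above: an explicit @xmltype wins,
   then the @xmltype of the named model type, then xs:string. :)
for $property in $model/entity/attribute
return
  concat($property/../@name, ".", $property/@name, " : ",
         string(($property/@xmltype,
                 $model/type[@name = $property/@type]/@xmltype,
                 "xs:string")[1]))
```

Run as a main module, this should list `Constituency.name : xs:string`, `Constituency.electorate : xs:integer` and `Candidate.surname : xs:string`, which are the ranges the generator would emit for those attributes.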
Second (or is that third?) Thoughts
Naturally, developing the model has caused me to change some aspects of my RDF model.
I initially replaced the local label property with rdfs:label but this left no local names. This is a bad move since there is nowhere to explain the semantics of the name. So the model re-introduces names as local attributes but also defines rdfs:label and ids as derived attributes. I think that's a good general principle: always include the local properties even when they will be replicated with other, linking properties. I think Leigh includes this in his Linked Data Patterns book.
I initially distinguished between types and properties by path in the vocab. That was a bad move; I've changed to uppercase class names and lowercase property names so that the same word, say 'candidate', can be used both for the Class and for a property of a constituency.
Because property names are scoped by the vocabulary whereas ER attribute names are scoped by the entity type, more care is required to name attributes so that any non-uniqueness is intentional.
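One way to keep an eye on this is to list the attribute names that occur in more than one entity, so each case can be confirmed as intentional. The following is a minimal sketch, not part of the tool-set: `local:shared-attribute-names` and the `shared` result element are hypothetical, and the model shape (`model/entity/attribute/@name`) is assumed to match what the vocabulary generator reads.

```xquery
xquery version "1.0";

(: Hypothetical helper: report attribute names used by more than one entity.
   In the generated vocabulary each such name becomes a single rdf:Property. :)
declare function local:shared-attribute-names($model as element()) as element(shared)* {
  for $name in distinct-values($model/entity/attribute/@name)
  let $owners := $model/entity[attribute/@name = $name]
  where count($owners) > 1
  order by $name
  return
    <shared name="{$name}" entities="{string-join($owners/@name, ', ')}"/>
};

(: Illustrative input only; these entities are invented for the example. :)
local:shared-attribute-names(
  <model>
    <entity name="Constituency"><attribute name="name"/></entity>
    <entity name="Party"><attribute name="name"/><attribute name="colour"/></entity>
  </model>
)
```

For this input the only name reported is `name`, shared by Constituency and Party; `colour` belongs to one entity and is left out.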
Resource paths
The model provides a rationale for the resource paths. These are type/primary key structures. The weak one-many relationships give rise to concatenated primary keys. Primary keys will be generated by coercing the primary key elements in the XML to a string acceptable in a URI (for example by replacing spaces with underscores).
A future task will be to use this model to guide the translation process. For now, I just have to go back and rework the XML and RDF translators and test this model by uploading to a triple store. Lesson learnt (again): model first, code later.
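The path construction might look something like the sketch below. Only the space-to-underscore rule comes from the text; the function names, the use of `/` to join the parts of a concatenated key, the extra percent-encoding via `encode-for-uri`, and the sample key values are all assumptions made for illustration.

```xquery
xquery version "1.0";

(: Hypothetical: coerce one key value into a URI-friendly segment.
   Whitespace is trimmed and collapsed, spaces become underscores, and any
   remaining unsafe characters are percent-encoded (an added assumption). :)
declare function local:key-segment($value as xs:string) as xs:string {
  encode-for-uri(replace(normalize-space($value), " ", "_"))
};

(: Hypothetical: build type/key1/key2..., with the parent key parts listed
   first for an entity on the child side of a weak relationship. :)
declare function local:resource-path($type as xs:string, $keys as xs:string*) as xs:string {
  string-join(($type, for $k in $keys return local:key-segment($k)), "/")
};

(: Invented example: a Candidate keyed by constituency name plus surname. :)
local:resource-path("Candidate", ("Bristol West", "Smith"))
```

With these assumptions the call yields `Candidate/Bristol_West/Smith`, which would be resolved against the vocabulary base to give the resource URI.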