The Wallace Line.

ER models and RDF vocabularies

I started the election modeling from the data sources, but I should have constructed a data model first and used it to drive decisions about the mapping.

For teaching ER modeling and model-driven development through to relational databases, I have developed some XML/XQuery-based tools for modeling and model transformation. The subtext of this toolset is the power of XML representation and transformation.

I developed this model of the election data in my XML vocabulary (with a couple of tweaks). The tool and Graphviz (where would I be without it!) transform it into a diagram.

The notation is basic ER with crow's foot. I struggled to get Graphviz to draw a bar across a relationship to denote a dependent (weak) relationship, in which the foreign key becomes part of the child's primary key. I failed, but hit on the idea of cutting one leg of the crow's foot, which is nicely symbolic of a weak link.

In this notation, derived attributes (so far only rdfs:label) are included and a rule defines the derivation - this is more easily seen in the Data Dictionary view. The key addition was the ability to generate a basic RDF/RDFS vocabulary from this model. Since the model has additional semantics, I will later add some OWL to capture relationship multiplicities.

The code to generate the RDF is quite small. One advantage for me of this model-based approach is that I have trouble remembering which property is in rdf and which in rdfs - I think the code has it right!



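(: Generate a basic RDF/RDFS vocabulary from an ER model: one rdfs:Class per
   entity, and one rdf:Property per attribute and per relationship. :)
declare function er:model-to-vocab($model) {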
<rdf:RDF
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
   xmlns:owl="http://www.w3.org/2002/07/owl#" 
   xml:base="{$model/@vocab-base}" 
   >
  {for $entity in $model/entity[not(contains(@name,":"))]  (:just the local namespace :)
   let $class := string($entity/@name)
   return 
   element rdfs:Class {
        attribute rdf:about {$class},
        element rdfs:label {$class},
        if ($entity/comment)
              then element rdfs:comment { string($entity/comment)}
              else ()
       }
  }
  {for $property in $model/entity/attribute[not(contains(@name,":"))]  (:just the local namespace :)
        (: datatype: the attribute's own @xmltype, else the @xmltype of its named type, else xs:string :)
        let $datatype := string(($property/@xmltype,$model/type[@name = $property/@type]/@xmltype,"xs:string")[1])
        let $class := string($property/../@name)
        let $name := string($property/@name)
        return
           element rdf:Property {
              attribute rdf:about {$name},  
              element rdfs:label {$name},
              element rdfs:domain {attribute rdf:resource {$class}},
              (: NB: $datatype is a prefixed name such as xs:string and is not expanded to a full URI here :)
              element rdfs:range {attribute rdf:resource {$datatype}},
              if ($property/comment)
              then element rdfs:comment { string($property/comment)}
              else ()
           }
   }
 
  {for $relationship in $model/relationship
   let $domain := $relationship/role[1]
   let $range := $relationship/role[2]
   return 
     element rdf:Property {
              attribute rdf:about {string($relationship/@name)},  
              element rdfs:label {string($relationship/@name)},
              element rdfs:domain {attribute rdf:resource {string($domain/@entity)}},
              element rdfs:range {attribute rdf:resource {string($range/@entity)}},
              if ($relationship/comment)
              then element rdfs:comment { string($relationship/comment)}
              else ()
          }
  }
</rdf:RDF>
};
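
The datatype line in the function relies on a handy XQuery idiom: empty sequences vanish when sequences are concatenated, so taking item [1] picks the first value that is actually present. Here is a minimal sketch of that fallback on its own. The model fragment is hypothetical (the type and attribute names are invented for illustration); only its element and attribute structure follows the paths used in the function.

```xquery
xquery version "1.0";

(: Hypothetical model fragment: names are invented, structure follows
   the paths used by er:model-to-vocab. :)
declare variable $model :=
  <model>
    <type name="count" xmltype="xs:integer"/>
    <entity name="Constituency">
      <attribute name="electorate" type="count"/>
      <attribute name="declared" xmltype="xs:date"/>
      <attribute name="name"/>
    </entity>
  </model>;

for $property in $model/entity/attribute
(: first non-empty of: own @xmltype, the named type's @xmltype, the default :)
let $datatype := string(($property/@xmltype,
                         $model/type[@name = $property/@type]/@xmltype,
                         "xs:string")[1])
return concat($property/@name, " : ", $datatype)
```

With this fragment, electorate picks up xs:integer through its named type, declared uses its own xs:date, and name falls back to xs:string.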

Second (or is that third?) Thoughts

Naturally, developing the model has caused me to change some aspects of my RDF model.

I initially replaced the local label property with rdfs:label, but this left no local names. This is a bad move since there is nowhere to explain the semantics of the name. So the model re-introduces names as local attributes but also defines rdfs:label and ids as derived attributes. I think that's a good general principle - always include the local properties even when they will be replicated with other, linking properties. I think Leigh includes this in his Linked Data Patterns book.

I initially distinguished between types and properties by path in the vocab - bad move - I've changed to uppercase class names and lowercase property names so that the same word, say 'candidate', can be used both for the Class and for a property of a constituency.
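
To see why the case convention is enough, here is a small sketch that resolves both forms of one word against a vocabulary base. The base URI and the two helper functions are hypothetical, made up for this illustration; resolve-uri, upper-case and lower-case are standard functions.

```xquery
xquery version "1.0";

(: Hypothetical vocabulary base, standing in for the model's @vocab-base :)
declare variable $base := "http://example.org/vocab/";

(: Class names start with an uppercase letter :)
declare function local:class-uri($word as xs:string) as xs:anyURI {
  resolve-uri(concat(upper-case(substring($word, 1, 1)), substring($word, 2)), $base)
};

(: Property names start with a lowercase letter :)
declare function local:property-uri($word as xs:string) as xs:anyURI {
  resolve-uri(concat(lower-case(substring($word, 1, 1)), substring($word, 2)), $base)
};

(local:class-uri("candidate"), local:property-uri("candidate"))
```

The two results differ only in the case of one letter (.../Candidate and .../candidate), which is all that is needed to keep the class and the property apart in the same vocabulary.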

Because property names are scoped by the vocabulary whereas ER attribute names are scoped by the entity type, more care is required to name attributes so that any non-uniqueness is intentional.
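
A quick check for unintended sharing can be run against the model itself. The sketch below lists every attribute name that appears in more than one entity, so each case can be confirmed as deliberate. The model fragment is hypothetical; only the entity/attribute/@name structure is taken from the generator code.

```xquery
xquery version "1.0";

(: Hypothetical model fragment with one attribute name used twice :)
declare variable $model :=
  <model>
    <entity name="Constituency">
      <attribute name="name"/>
      <attribute name="region"/>
    </entity>
    <entity name="Candidate">
      <attribute name="name"/>
      <attribute name="party"/>
    </entity>
  </model>;

for $name in distinct-values($model/entity/attribute/@name)
(: every entity that declares an attribute with this name :)
let $owners := $model/entity[attribute/@name = $name]
where count($owners) > 1
return
  <shared-attribute name="{$name}" entities="{string-join($owners/@name, ', ')}"/>
```

Here only name is reported, shared by Constituency and Candidate. In the generated vocabulary that becomes a single property, so it is worth deciding whether both entities really mean the same thing by it.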

Resource paths

The model provides a rationale for the resource paths. These are type/primary key structures. The weak one-to-many relationships give rise to concatenated primary keys. Primary keys will be generated by coercing the primary key elements in the XML to a string acceptable in a URI (for example by replacing spaces with underscores).
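
As a rough sketch of that coercion, under some assumptions of my own: spaces become underscores, anything else that is unsafe in a URI is percent-encoded with the standard encode-for-uri function, and the parts of a concatenated key are joined with "/". The function names and the sample values are hypothetical.

```xquery
xquery version "1.0";

(: Coerce one primary key value into a URI-safe path segment.
   Assumption: spaces map to underscores, the rest is percent-encoded. :)
declare function local:key-segment($value as xs:string) as xs:string {
  encode-for-uri(replace(normalize-space($value), " ", "_"))
};

(: Build a type/key path; a weak entity passes its parent's key first.
   Assumption: key parts are separated by "/". :)
declare function local:resource-path($type as xs:string, $keys as xs:string*) as xs:string {
  string-join(($type, for $k in $keys return local:key-segment($k)), "/")
};

local:resource-path("Candidate", ("Bristol West", "Jane Smith"))
```

For these sample values the path comes out as Candidate/Bristol_West/Jane_Smith, with the parent constituency's key leading the child's own key.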

A future task will be to use this model to guide the translation process. For now, I just have to go back and rework the XML and RDF translators and test this model by uploading to a triple store. Lesson learnt (again): model first, code later.