
Model-based triple materialisation.

I've been playing with a process whereby derived properties are defined in the conceptual data model and then materialised in the RDF triple store using a common script.

For example, here is the definition of the seats-won attribute of the Party entity in the Election model:



<attribute name="seats-won" type="int" min="0" derived="true">
    <sparql>
        <![CDATA[
select distinct ?party (count(?candidate) as ?seatswon) where {
  ?constituency a :Constituency;
                :MP ?candidate.
  ?candidate :party ?party.
} group by ?party
]]>
    </sparql>
    <comment>The number of winning candidates in this party as computed from the raw data</comment>
</attribute>


The query returns pairs of values: the first is the subject (here the party URI) and the second is the value of the attribute for that subject (here the number of seats won). From these pairs we can generate triples of the form

  $subject :seats-won  $value

as RDF and post them back to the triple store.
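As a rough illustration of that step, here is a minimal Python sketch that turns (subject, value) pairs into N-Triples lines. It is not the script used here: the function name pairs_to_ntriples, the example.org namespace, the sample party URIs and counts, and the choice of xsd:int for the model's "int" type are all assumptions made for the example.

```python
# Assumption: the model's type="int" is serialised as an xsd:int typed literal.
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"


def pairs_to_ntriples(pairs, predicate_uri, datatype=XSD_INT):
    """Turn (subject URI, value) pairs into N-Triples lines."""
    lines = []
    for subject, value in pairs:
        # One triple per pair: $subject :seats-won $value
        lines.append(f'<{subject}> <{predicate_uri}> "{value}"^^<{datatype}> .')
    return lines


# Hypothetical namespace and query results, for illustration only.
NS = "http://example.org/election#"
pairs = [
    ("http://example.org/party/A", 3),
    ("http://example.org/party/B", 1),
]

triples = pairs_to_ntriples(pairs, NS + "seats-won")
print("\n".join(triples))
```

The resulting lines could then be posted to the store in whatever RDF serialisation it accepts; N-Triples is used here only because it is the simplest to generate by hand.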

Generated RDF

Other attributes require some post-processing. To generate the links to the TheyWorkForYou site, we define a derived attribute as:



<attribute name="theyworkforyou" type="uri" derived="true">
    <rule>
        <expression>
            concat(
                "http://www.theyworkforyou.com/mp/?c=",
                replace(lower-case("{name}"), ' ', '_')
            )
        </expression>
        <param attribute="constituency-name" name="name"/>
    </rule>
</attribute>


Since this is based on the constituency name, we first generate and execute a SPARQL query to get pairs of constituency URIs and names. Local names in the query are sometimes required because SPARQL variable names can't contain hyphens, which is why the param maps constituency-name to the local name "name". We then construct and evaluate an XPath expression to build the URI to the site for each constituency. The attribute type is uri (a reference to rdf:resource in the model). Again, the resultant RDF can then be posted back to the data store to materialise this derived attribute.
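For readers who don't think in XPath, the rule amounts to the following Python sketch. The string handling mirrors the expression in the model; the function names, the example.org subject URI and the predicate URI are hypothetical and are there only to make the example self-contained.

```python
TWFY_BASE = "http://www.theyworkforyou.com/mp/?c="


def theyworkforyou_uri(name):
    """Mirror the rule: concat(base, replace(lower-case(name), ' ', '_'))."""
    return TWFY_BASE + name.lower().replace(" ", "_")


def materialise(pairs, predicate_uri):
    """Turn (constituency URI, constituency name) pairs into N-Triples lines.

    The object is written as a URI reference, not a literal, because the
    attribute type in the model is uri.
    """
    return [
        f"<{subject}> <{predicate_uri}> <{theyworkforyou_uri(name)}> ."
        for subject, name in pairs
    ]


# Hypothetical query result and predicate URI, for illustration only.
pairs = [("http://example.org/constituency/Bristol_West", "Bristol West")]
lines = materialise(pairs, "http://example.org/election#theyworkforyou")
print(lines[0])
```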

A benefit of this model-based approach is that the derivation rule for the attribute is explicit. It could be applied lazily when generating a view of the resource or, as in this implementation, applied eagerly and the result materialised into the data store. Eager materialisation will require retraction of the previously generated triples to avoid getting duplicate values for these functional properties. I think this would be done by retaining the RDF file and then generating a change set to update the Talis store, but I haven't done that yet.
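The retraction idea could be sketched as a plain set difference between the retained triples and the newly generated ones. This is only a guess at the shape of the computation: the function name change_set and the sample triples are made up, and the output is not in any particular store's change-set format.

```python
def change_set(old_triples, new_triples):
    """Compare two collections of N-Triples lines.

    Returns the lines to retract (present before, gone now) and the
    lines to add (new this time). Unchanged triples appear in neither.
    """
    old, new = set(old_triples), set(new_triples)
    return {"remove": sorted(old - new), "add": sorted(new - old)}


# Hypothetical previous and current runs for one functional property.
previous = [
    '<http://example.org/party/A> <http://example.org/election#seats-won> "3" .',
    '<http://example.org/party/B> <http://example.org/election#seats-won> "1" .',
]
current = [
    '<http://example.org/party/A> <http://example.org/election#seats-won> "4" .',
    '<http://example.org/party/B> <http://example.org/election#seats-won> "1" .',
]

changes = change_set(previous, current)
print(changes)
```

Retracting the "remove" lines before posting the "add" lines would leave party A with a single seats-won value instead of two.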

Later:

This was useful just now when I learnt how to link to the TheyWorkForYou candidate quiz.  I defined the new constituency attribute in the model, generated the RDF and uploaded in a few minutes (including detecting a small bug!)

http://www.cems.uwe.ac.uk/xmlwiki/rdf/Election/UK2010/Constituency/Bristol_West

 http://www.cems.uwe.ac.uk/xmlwiki/rdf/Constituency/Bristol_West  (model revised)