Derived triples in RDF

Working on generating the election RDF got me puzzling about derived triples: how they arise, and whether they should be materialized in the RDF.

It seems to me that there are several use-cases for derived triples, depending on the complexity and stability of the derivation.

Triples inferred from a vocabulary. If election:candidate-name is defined as an rdfs:subPropertyOf rdfs:label, then a query engine supporting RDFS (or OWL) inference would be able to return the candidate-name in response to a request for the rdfs:label. Most SPARQL engines still don't perform even this simple inference, so we need to add rdfs:label redundantly to the base RDF. Such early materialisation is presumably unproblematic because the relationship between election:candidate-name and rdfs:label is likely to be stable. Attribution is not needed here, since the schemas provide the rationale for the inference.
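For readers unfamiliar with the rule, here is a minimal sketch of what "adding rdfs:label redundantly" amounts to. It models triples as plain Python tuples rather than using a real RDF library, and all the IRIs (ex:cand1 and so on) are hypothetical abbreviations. It applies the RDFS subproperty rule repeatedly until no new triples appear, which is what an inferencing engine would do for us.

```python
# Triples are modelled as plain (subject, predicate, object) tuples.
# All IRIs used with this sketch are hypothetical abbreviations.
SUB_PROPERTY_OF = "rdfs:subPropertyOf"


def materialize_subproperties(triples):
    """Apply the RDFS subproperty rule until nothing new is produced:
    if (p rdfs:subPropertyOf q) and (s p o), then add (s q o).
    Iterating to a fixpoint also follows chains of subproperties."""
    closed = set(triples)
    while True:
        # (subproperty, superproperty) pairs currently in the graph
        pairs = [(s, o) for (s, p, o) in closed if p == SUB_PROPERTY_OF]
        new = {
            (s, sup, o)
            for (sub, sup) in pairs
            for (s, p, o) in closed
            if p == sub
        } - closed
        if not new:
            return closed
        closed |= new


base = {
    ("election:candidate-name", SUB_PROPERTY_OF, "rdfs:label"),
    ("ex:cand1", "election:candidate-name", "Candidate A"),
}
# The redundant rdfs:label triple is materialized alongside the original one.
print(("ex:cand1", "rdfs:label", "Candidate A") in materialize_subproperties(base))  # True
```

Because the derivation depends only on the schema, the materialized label can safely be regenerated at any time.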

Inverse properties. Currently the Election RDF includes triples of the form

   Candidate party Party

but not the inverse triples

   Party candidate Candidate

A SPARQL query can find all the properties of, say, a ?Party, in either direction, using a UNION:

    { ?Party ?p ?o } UNION { ?s ?p ?Party }

If the inverse property were declared as such in OWL (with owl:inverseOf), a query engine could infer this relationship. In practice, though, someone writing a query has to understand the data model and the vocabulary anyway, so inverse properties are of little use.
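The trade-off can be sketched in a few lines of Python, again with triples as tuples. This assumes a hypothetical vocabulary in which :candidate is declared owl:inverseOf :party. One function materializes the inverse triples; the other mimics the UNION query, which finds the same relationship without them.

```python
# Hypothetical vocabulary: :candidate is assumed to be declared owl:inverseOf :party.
INVERSES = {":party": ":candidate"}


def materialize_inverses(triples, inverses):
    """For every (s, p, o) whose p has a declared inverse, add (o, inverse, s)."""
    derived = {(o, inverses[p], s) for (s, p, o) in triples if p in inverses}
    return set(triples) | derived


def properties_of(node, triples):
    """Mimic {?node ?p ?o} UNION {?s ?p ?node}: edges in both directions,
    found without any inverse triples being materialized."""
    outgoing = {(p, o) for (s, p, o) in triples if s == node}
    incoming = {(s, p) for (s, p, o) in triples if o == node}
    return outgoing, incoming


base = {("ex:cand1", ":party", "ex:snp")}
print(properties_of("ex:snp", base))  # (set(), {('ex:cand1', ':party')})
print(("ex:snp", ":candidate", "ex:cand1") in materialize_inverses(base, INVERSES))  # True
```

The UNION already recovers the party's candidates, so the materialized inverse adds storage and maintenance cost without adding information.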

Short-cuts. To get the name of a candidate's party, the SPARQL query would need to include:

   ?candidate :party ?party .
   ?party rdfs:label ?partyname .

It is tempting to add a short-cut property :party-name to Candidate to avoid this join, but there is no limit to such short-cuts, and multiple paths to the same data are confusing, so these don't seem like a good idea.
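To make the cost concrete, here is a small Python sketch with triples as tuples. It contrasts the two-hop join with a materialized short-cut. The :party-name property is the hypothetical one discussed above, not part of the dataset. Every short-cut like this is one more derived property that has to be kept in step with :party and rdfs:label.

```python
def objects(triples, subject, predicate):
    """All ?o such that (subject predicate ?o) is in the graph."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def party_names(triples, candidate):
    """Two-hop join: ?candidate :party ?party . ?party rdfs:label ?partyname ."""
    return {
        name
        for party in objects(triples, candidate, ":party")
        for name in objects(triples, party, "rdfs:label")
    }


def add_party_name_shortcut(triples):
    """Materialize the hypothetical :party-name short-cut for every candidate."""
    candidates = {s for (s, p, o) in triples if p == ":party"}
    shortcut = {
        (c, ":party-name", name)
        for c in candidates
        for name in party_names(triples, c)
    }
    return set(triples) | shortcut


base = {("ex:cand1", ":party", "ex:snp"), ("ex:snp", "rdfs:label", "SNP")}
print(party_names(base, "ex:cand1"))  # {'SNP'}
```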

Aggregates. The number of seats a party has won can be computed using grouping and counting, in a query like this:

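PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# the default ':' prefix (election vocabulary) is not shown here
SELECT ?party (COUNT(DISTINCT ?candidate) AS ?seatswon) WHERE {
    ?election rdfs:label "UK2005".
    ?election :constituency ?constituency.
    ?constituency :MP ?candidate.
    # count each winning candidate once, whatever labels it carries
    ?candidate :party ?party.
} GROUP BY ?party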

but this query needs SPARQL 1.1 aggregates and is expensive to compute. However, it is easy to generate the derived triples from the query result and push them into the RDF datastore. This is worth doing because the base data lumps all the minor parties into "Other", so I ran an XQuery script to generate the appropriate triples and added them to the dataset. Parties now have a derived seats-won property.
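The materialization step is essentially "run the aggregate once, then write the answers back as triples". The sketch below does this in Python rather than XQuery (it is not the author's script). Triples are tuples, and the :seats-won IRI is an assumed name for the seats-won property. Like the query, it finds the election by its rdfs:label and counts each MP once per party.

```python
from collections import Counter


def seats_won(triples, election_label):
    """Rough equivalent of the GROUP BY query: count the MPs of each party
    across the constituencies of the election with the given rdfs:label."""
    elections = {s for (s, p, o) in triples if p == "rdfs:label" and o == election_label}
    constituencies = {o for (s, p, o) in triples if s in elections and p == ":constituency"}
    mps = {o for (s, p, o) in triples if s in constituencies and p == ":MP"}
    return Counter(o for (s, p, o) in triples if s in mps and p == ":party")


def seats_won_triples(counts):
    """Turn the aggregate into derived triples using an assumed :seats-won property."""
    return {(party, ":seats-won", n) for party, n in counts.items()}


base = {
    ("ex:uk2005", "rdfs:label", "UK2005"),
    ("ex:uk2005", ":constituency", "ex:c1"),
    ("ex:uk2005", ":constituency", "ex:c2"),
    ("ex:c1", ":MP", "ex:m1"),
    ("ex:c2", ":MP", "ex:m2"),
    ("ex:m1", ":party", "ex:lab"),
    ("ex:m2", ":party", "ex:lab"),
}
print(sorted(seats_won_triples(seats_won(base, "UK2005"))))  # [('ex:lab', ':seats-won', 2)]
```

The expensive grouping runs once at load time; afterwards a query for a party's seats-won is a simple lookup.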

External links. For example, given a constituency name, TheyWorkForYou provides an API to get its easting/northing. I have an XQuery script which converts eastings/northings to latitudes and longitudes. We can then use these coordinates in, say, OpenStreetMap to link to a map. These triples can either be added to the base RDF in the election dataset or generated on the fly by a user. If they are materialised into the RDF, they become available to the SPARQL query engine; however, provenance is then needed, since the resources used to perform the materialization are arbitrary.
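One common way to record that provenance (not necessarily what this dataset does) is to put each batch of derived triples in its own named graph and describe the graph with PROV-O terms such as prov:wasDerivedFrom. The Python sketch below models this with (s, p, o, graph) quads. The graph names, the "meta" graph and the source/method strings are all hypothetical. A side benefit is that the whole batch can later be dropped in one step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    source: str        # external resource consulted (hypothetical label)
    method: str        # script that performed the conversion (hypothetical)
    generated_at: str  # ISO 8601 timestamp


def materialize_with_provenance(derived, graph, prov):
    """Place derived triples in their own named graph and describe that graph
    with PROV-O triples (stored in a 'meta' graph); returns (s, p, o, g) quads."""
    quads = {(s, p, o, graph) for (s, p, o) in derived}
    quads |= {
        (graph, "prov:wasDerivedFrom", prov.source, "meta"),
        (graph, "prov:wasGeneratedBy", prov.method, "meta"),
        (graph, "prov:generatedAtTime", prov.generated_at, "meta"),
    }
    return quads


def drop_graph(quads, graph):
    """Retract a whole batch of derived triples together with its provenance."""
    return {q for q in quads if q[3] != graph and q[0] != graph}
```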

Derived triples are logically redundant, and redundant data raises just the same problems in RDF datastores as it does in an un-normalized relational database. Changes in the base data require re-computation of the derived triples, so any change must retract not only the affected base triples but also all the triples materialized from them. Since maintaining a trace of which triples have been derived from what is hard, I guess the normal practice is to retract the entire dataset and regenerate it from scratch. This is feasible if RDF is treated only as a data warehouse, but is not practical if an RDF dataset is a live database.
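To show what "maintaining a trace" involves, here is a deliberately naive Python sketch of dependency tracking, a simple form of truth maintenance. It assumes each derived triple has exactly one derivation; real systems must also handle alternative derivations. Each derived triple records the triples it was computed from, so retracting a base triple cascades to everything built on it. Recomputing replacement values is left out.

```python
from collections import defaultdict


class DerivationStore:
    """Naive dependency tracking: remembers which triples each derived triple
    was computed from, so retracting a triple also retracts its dependents.
    Assumes each derived triple has exactly one derivation."""

    def __init__(self):
        self.base = set()
        self.derived = {}                   # derived triple -> supporting triples
        self.dependents = defaultdict(set)  # supporting triple -> derived triples

    def assert_base(self, triple):
        self.base.add(triple)

    def assert_derived(self, triple, support):
        self.derived[triple] = set(support)
        for t in support:
            self.dependents[t].add(triple)

    def retract(self, triple):
        """Remove a triple and, transitively, every derived triple that used it."""
        self.base.discard(triple)
        self.derived.pop(triple, None)
        for d in self.dependents.pop(triple, set()):
            self.retract(d)

    def triples(self):
        return self.base | set(self.derived)
```

Even this toy version roughly doubles the bookkeeping per derived triple, which is why regenerating from scratch is so often the pragmatic choice.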

Of these, only the rdfs:labels have been added. The first query can now be written as (arguably less readable):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?cname ?name where {
  ?election rdf:type :Election;
        rdfs:label "UK2005";
        :party ?party.
  ?party rdfs:label "SNP".
  ?cand :party ?party;
       rdfs:label ?name.
  ?cons :candidate ?cand;
       rdfs:label ?cname.