Linking Candidates in the Election RDF

In the Election model, Candidate entities describe a Candidate in a Constituency in an Election. Many Candidates in the 2010 election are the same person as a Candidate in the 2005 election. Two problems arise in modelling this situation: we have to decide which Candidates are the same person, and we have to model that relationship.

Matching

Most matches can be found by automated comparison. First we extract the candidate URIs and names, together with their constituency names, from the RDF datastore with a SPARQL query for each election. The query for 2005 is:

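PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# The default ':' prefix is the Election model's own namespace and must also be declared.
select ?cand ?name ?cname where {
    ?election rdfs:label "UK2005" .
    ?election :constituency ?constituency .
    ?constituency :candidate ?cand ;
                  rdfs:label ?cname .
    ?cand rdfs:label ?name .
}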
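Before matching, the rows returned by the query can be grouped by constituency. What follows is a minimal sketch. It assumes the endpoint returns the standard SPARQL 1.1 JSON results format. The sample response, its URIs and the "Williams S" name key are all made up for illustration.

```python
import json
from collections import defaultdict


def group_by_constituency(results_json):
    """Group (candidate URI, candidate name) pairs by constituency name."""
    data = json.loads(results_json)
    groups = defaultdict(list)
    for row in data["results"]["bindings"]:
        # Each SELECT variable appears as {"type": ..., "value": ...}
        groups[row["cname"]["value"]].append(
            (row["cand"]["value"], row["name"]["value"])
        )
    return dict(groups)


# Hypothetical sample response; the URIs are illustrative only.
sample = json.dumps({
    "head": {"vars": ["cand", "name", "cname"]},
    "results": {"bindings": [
        {"cand": {"type": "uri", "value": "http://example.org/2005/cand/1"},
         "name": {"type": "literal", "value": "Williams S"},
         "cname": {"type": "literal", "value": "Bristol West"}},
    ]},
})
print(group_by_constituency(sample))
```

Running the same query with the label "UK2010" gives the second dictionary to match against.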

Since the format of names differs between the two elections, the "first name surname" form in the 2010 list is converted to the "surname and initials" form used in the 2005 data. The two lists of candidates are then fuzzy-matched on these keys. Constituencies are matched by exact name (constituencies whose names have changed will have to be handled separately).
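Here is a sketch of the key conversion and the constituency pairing. It makes two assumptions. The first is that the surname is the last word of the 2010 name, so multi-word surnames will be mangled. The second is that the 2005 key looks like "Smith J A". The function names are hypothetical.

```python
def to_surname_initials(full_name):
    """Convert 'First Middle Surname' into 'Surname F M' (assumed 2005 key format).

    Treats the last whitespace-separated token as the surname.
    """
    parts = full_name.split()
    if not parts:
        return ""
    surname, forenames = parts[-1], parts[:-1]
    initials = " ".join(name[0].upper() for name in forenames)
    return f"{surname} {initials}".strip()


def pair_constituencies(by_cname_2005, by_cname_2010):
    """Return constituency names present in both elections (exact match only).

    Renamed constituencies drop out here and must be paired by hand.
    """
    return sorted(set(by_cname_2005) & set(by_cname_2010))


print(to_surname_initials("John Andrew Smith"))  # Smith J A
```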

The fuzzy match takes the two sets of names for each pair of constituencies and matches them when their Levenshtein distance is below a threshold value. This yielded about 650 matches, of which, on inspection, 4 were or might have been false positives. The number of false negatives is, of course, unknown.
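The per-constituency fuzzy match can be sketched as follows. This is a minimal version that pairs each 2010 name with its nearest 2005 name. The threshold actually used is not stated, so the default of 3 is a hypothetical choice.

```python
def levenshtein(a, b):
    """Edit distance: fewest insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def match_names(names_2005, names_2010, threshold=3):
    """Pair each 2010 name with its closest 2005 name in the same constituency,
    keeping the pair only if the distance is strictly below the threshold."""
    matches = []
    for new in names_2010:
        scored = [(levenshtein(old, new), old) for old in names_2005]
        if not scored:
            continue
        dist, old = min(scored)
        if dist < threshold:
            matches.append((old, new))
    return matches


print(match_names(["Williams S", "Smith J"], ["Wiliams S", "Brown A"]))
```

Choosing the nearest name, rather than accepting every pair under the threshold, avoids matching one 2010 candidate to two 2005 candidates with similar names.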

Modelling

It seems most natural to introduce a Person entity and then link each Candidate to its Person. However, we lack a natural unique identifier for a Person and would be forced to generate a local surrogate key. For ease of generation this would need to be a UUID, but that gives rise to unreadable URIs.
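The readability problem shows up as soon as such a key is minted. This sketch generates a Person URI from a random UUID. The base namespace is hypothetical.

```python
import uuid

# Hypothetical base namespace; the model's real namespace is not given here.
BASE = "http://example.org/election/person/"


def new_person_uri():
    """Mint a surrogate-key URI for a Person from a random (version 4) UUID."""
    return BASE + str(uuid.uuid4())


print(new_person_uri())  # a different, opaque 36-character key on every run
```

Nothing in the resulting URI tells a reader which person it denotes, unlike a Candidate URI built from the election and constituency.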

An alternative is to link Candidates directly with a custom relationship, for example :samePersonAs. This local property is symmetric and transitive. I initially considered reusing owl:sameAs, but that would be quite inappropriate. owl:sameAs asserts that two URIs denote the same individual, so every property of one would apply to the other. Here only the personhood of the matched Candidates is the same, not the whole set of properties (which includes the number of votes cast). [A paper by Halpin and Hayes was useful here].
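Because :samePersonAs is symmetric and transitive, the matched pairs partition the Candidates into groups, each standing for one person. Here is a minimal union-find sketch of that closure. The candidate identifiers are made up for illustration.

```python
def person_groups(pairs):
    """Group candidates linked by a symmetric, transitive samePersonAs relation."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two groups

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())


# cand1-cand2 and cand2-cand3 imply cand1-cand3 by transitivity.
print(person_groups([("cand1", "cand2"), ("cand2", "cand3"), ("cand4", "cand5")]))
```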

For a symmetric property, there is a strong case for adding both the triple and its inverse. Otherwise the query would have to be expressed as:

 {?cand  :samePersonAs ?other} UNION {?other  :samePersonAs ?cand}

However, RDF browsers would then show both triples unless the inverses were filtered out. For the moment, only one triple of each pair has been added.
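The trade-off can be seen on an in-memory set of (subject, object) pairs. One option materialises the inverses up front. The other keeps one triple per pair and looks in both directions at query time, as the UNION pattern does. The function names and identifiers are hypothetical.

```python
def with_inverses(pairs):
    """Materialise the symmetric closure: add (b, a) for every stored (a, b)."""
    return set(pairs) | {(b, a) for a, b in pairs}


def same_person_as(cand, pairs):
    """Mimic the UNION pattern: find cand as either subject or object."""
    return ({o for s, o in pairs if s == cand} |
            {s for s, o in pairs if o == cand})


stored = {("cand2005", "cand2010")}  # only one direction stored
print(same_person_as("cand2010", stored))  # found via the reverse direction
```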

Application

RDF triples using :samePersonAs were added to the datastore for each pair of matched Candidates. For example, Steven Williams in Bristol West was a candidate (and is now the MP) in both 2005 and 2010, so his 2005 Candidate is linked to his 2010 Candidate.
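Generating the triples from the matched pairs is straightforward. This sketch writes one N-Triples statement per pair, in one direction only, as described above. The property namespace and candidate URIs are hypothetical stand-ins for the model's real ones.

```python
# Hypothetical namespace for samePersonAs; the real one is not given here.
SAME_PERSON_AS = "http://example.org/election#samePersonAs"


def same_person_triples(matches):
    """Serialise one N-Triples statement per matched (2005, 2010) candidate pair."""
    return [f"<{old}> <{SAME_PERSON_AS}> <{new}> ." for old, new in matches]


for line in same_person_triples([
    ("http://example.org/2005/BristolWest/Williams",
     "http://example.org/2010/BristolWest/Williams"),
]):
    print(line)
```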

(Later)

This turned out to be a bad idea: in version 2 I've included Person as a class (which will need mapping to foaf:Person at some point).
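One way the version 2 approach might look is shown below. Each group of matched Candidates gets one Person node with a UUID key. The class is mapped to foaf:Person with rdfs:subClassOf, though owl:equivalentClass would be another option. Only the FOAF, RDF and RDFS URIs are real. The election namespace and the `person` linking property are assumptions.

```python
import uuid

# Hypothetical election namespace; FOAF, RDF and RDFS URIs are the standard ones.
EL = "http://example.org/election#"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def person_triples(groups):
    """Emit N-Triples giving each group of matched Candidates one Person node."""
    lines = [f"<{EL}Person> <{RDFS_SUBCLASS}> <{FOAF_PERSON}> ."]
    for group in groups:
        person = f"{EL}person-{uuid.uuid4()}"
        lines.append(f"<{person}> <{RDF_TYPE}> <{EL}Person> .")
        for cand in group:
            # Hypothetical property linking a Candidate to its Person.
            lines.append(f"<{cand}> <{EL}person> <{person}> .")
    return lines
```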