Viewing and reviewing the Department for Business, Innovation and Skills Linked Data

Talis has been leading the development of a demonstration of linked data for the Department for Business, Innovation and Skills. Richard Wallis describes the project and some of the visualisations in Nodalities.

I ran my inference browser over the SPARQL endpoint. It took a few minutes to build the whole model: types and their cardinalities, properties and their multiplicities, and inverse properties. The URIs and their prefixes are added by hand at present. From this data the browser can generate an entity-relationship diagram.
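As a rough illustration of what building such a model involves, here is a minimal sketch, not the browser's actual code. It assumes the triples have already been fetched from the endpoint into a Python list of (subject, predicate, object) string tuples using prefixed names; the function names are hypothetical. For each (domain type, property, range type) it records the most values any one subject has (forward multiplicity) and the most subjects sharing one value (inverse multiplicity), then emits the result as a Graphviz DOT entity-relationship diagram.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"
UNTYPED = "(untyped)"  # placeholder range for literals and resources with no rdf:type


def build_model(triples):
    """Summarise a set of (subject, predicate, object) triples as an ER-style model.

    Returns (type_counts, edges), where edges maps
    (domain_type, property, range_type) -> (forward_max, inverse_max).
    """
    types_of = defaultdict(set)  # subject -> its rdf:type values
    for s, p, o in triples:
        if p == RDF_TYPE:
            types_of[s].add(o)

    # Type cardinality: number of distinct instances of each type.
    type_counts = Counter(t for ts in types_of.values() for t in ts)

    forward = defaultdict(Counter)  # values per subject, per relationship
    inverse = defaultdict(Counter)  # subjects per value, per relationship
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for dom in types_of.get(s, {UNTYPED}):
            for rng in types_of.get(o, {UNTYPED}):
                forward[(dom, p, rng)][s] += 1
                inverse[(dom, p, rng)][o] += 1

    edges = {
        key: (max(forward[key].values()), max(inverse[key].values()))
        for key in forward
    }
    return type_counts, edges


def multiplicity(n):
    return "1" if n == 1 else "*"


def to_dot(type_counts, edges):
    """Render the model as Graphviz DOT: boxes for types, labelled edges for properties."""
    lines = ["digraph model {"]
    for t, n in sorted(type_counts.items()):
        lines.append(f'  "{t}" [shape=box, label="{t} ({n})"];')
    for (dom, p, rng), (fwd, inv) in sorted(edges.items()):
        lines.append(
            f'  "{dom}" -> "{rng}" [label="{p}", '
            f'taillabel="{multiplicity(inv)}", headlabel="{multiplicity(fwd)}"];'
        )
    lines.append("}")
    return "\n".join(lines)
```

Piping the output of `to_dot` through Graphviz (for example `dot -Tsvg`) would then draw the diagram.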

This prototype browser supports limited faceted browsing, though not yet from the diagram.

Starting at the Types list, navigating to a type, say foaf:Project, shows the properties used, their ranges and their multiplicities. Clicking the + sign adds this type to a filter condition and shows the resources of that type. Over 2000 resources match the current filter (foaf:Project), so only a sample of 200 is listed.

To restrict the resources further, list the properties of the set. This page shows the properties of the members of the set, the number of distinct values in the range of each property, and the percentage of resources with a value for that property (shown as a tooltip and a bar chart). The bar chart is green if all resources have a value, yellow if the property is multi-valued, and red if some values are missing. Where only a small proportion are missing, there is a link to those resources to help with validating the data (for example, 2 projects have missing funders). The number of distinct values indicates which properties would be most useful for refining the filter.
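The per-property statistics on that page can be approximated in a few lines. This is a hedged sketch over an in-memory list of (subject, predicate, object) tuples, not the browser's implementation; the colour precedence (missing values outrank multi-valued) and the hypothetical `max_missing_listed` threshold for offering a link to the incomplete resources are assumptions.

```python
from collections import defaultdict


def facet_stats(resources, triples, max_missing_listed=5):
    """Per-property statistics for the resources matching the current filter."""
    members = set(resources)
    values = defaultdict(lambda: defaultdict(list))  # property -> subject -> values
    for s, p, o in triples:
        if s in members:
            values[p][s].append(o)

    stats = {}
    for p, by_subject in values.items():
        missing = sorted(members - set(by_subject))
        multi_valued = any(len(vs) > 1 for vs in by_subject.values())
        # Assumed precedence: missing values (red) outrank multi-valued (yellow).
        if missing:
            colour = "red"
        elif multi_valued:
            colour = "yellow"
        else:
            colour = "green"
        stats[p] = {
            "distinct": len({o for vs in by_subject.values() for o in vs}),
            "percent": 100.0 * len(by_subject) / len(members),
            "colour": colour,
            # Only offer a list when few are missing; None means "too many to list".
            "missing": missing if len(missing) <= max_missing_listed else None,
        }
    return stats
```

Sorting the result by `distinct` is one simple way to surface the properties most useful for refining the filter.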

Looking at the numbers, we see a few properties which are complete or nearly so, but a large number with low levels of completion. For example, only 9% of projects have an area defined. I guess this is an artifact of the current sample data, but analyses such as this are useful for assessing data quality. The numbers also hint at some further possible normalisation: Project ID and Project Abstract have the same cardinality, as do SICCode and SIC Description, suggesting hidden types here.
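One way to follow up the equal-cardinality hint is to check whether two single-valued properties actually pair their values one-to-one across the resources, as a code and its description would. The sketch below illustrates that idea under stated assumptions (in-memory triples, a hypothetical `candidate_hidden_types` helper, hypothetical `ex:` property names in the test data). Two properties that are unique per resource will also pass this check, so a match is a prompt for a closer look rather than proof of a hidden type.

```python
from collections import defaultdict
from itertools import combinations


def candidate_hidden_types(resources, triples):
    """Return pairs of single-valued properties whose values correspond one-to-one."""
    members = set(resources)
    value_of = defaultdict(dict)  # property -> subject -> value
    multi_valued = set()
    for s, p, o in triples:
        if s not in members:
            continue
        if s in value_of[p] and value_of[p][s] != o:
            multi_valued.add(p)
        value_of[p][s] = o

    props = sorted(p for p in value_of if p not in multi_valued)
    distinct = {p: len(set(value_of[p].values())) for p in props}

    pairs = []
    for a, b in combinations(props, 2):
        # Same cardinality is the first hint; constant properties are skipped.
        if distinct[a] != distinct[b] or distinct[a] < 2:
            continue
        shared = value_of[a].keys() & value_of[b].keys()
        if not shared:
            continue
        seen = {(value_of[a][s], value_of[b][s]) for s in shared}
        # One-to-one: each value of a meets exactly one value of b, and vice versa.
        if len(seen) == len({x for x, _ in seen}) == len({y for _, y in seen}):
            pairs.append((a, b))
    return pairs
```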

Looking at Funding Mechanism (83% completion, 7 distinct values), we see that the vast majority are classed as Research Grants. Only a small number are Fellowships, and following the link from Fellowship itself shows everything about that resource: it is only a code and a label, but the backlinks show the Projects of which it is the funding mechanism. The external link to the resource URI (shown as an arrow) takes us to an invalid page (as do most, if not all, of the URIs).
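Both steps here, tallying the values of one property across the set and showing a resource together with its backlinks, come down to simple scans over the triples. A minimal sketch, assuming the same in-memory list of (subject, predicate, object) tuples; the function names are hypothetical, and the `ex:` names in the test data are stand-ins, since the actual property names in the BIS vocabulary are not given here.

```python
from collections import Counter


def value_counts(prop, resources, triples):
    """How often each value of prop occurs among the given resources."""
    members = set(resources)
    return Counter(o for s, p, o in triples if p == prop and s in members)


def describe(resource, triples):
    """Outgoing (property, value) pairs and incoming (subject, property) backlinks."""
    outgoing = sorted((p, o) for s, p, o in triples if s == resource)
    backlinks = sorted((s, p) for s, p, o in triples if o == resource)
    return outgoing, backlinks
```

For a bare code-and-label resource, the backlinks are what connect it to the Projects that use it.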

To refine the filter we click the add link. The next page shows the (reduced) properties of the filtered resources, now foaf:Projects with a Funding Mechanism of Fellowship. Curiously, there are more abstracts than projects; perhaps some are non-English versions? In the navigation bar, clicking on Resources shows the matching resources. From this list (there is no search here yet) one can navigate to a project on, say, Engineering Culture, read the abstract, and follow the role link to the (only) Project Role. This resource has no rdfs:label, so it's not clear what the role means, but it reveals the institution (the browser has a problem with multiple labels here) and the researcher.

Institutions are related to Locations (a minted type in the bis vocabulary), which have latitude and longitude, so the browser knows they can be shown on Google Maps. It seems that URIs for the institutions have been minted for this dataset, but there are URIs in the education dataset, for example for King's College London, which could be used instead.
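Each refinement adds one more condition to a conjunction, which maps naturally onto a SPARQL basic graph pattern. The following sketch shows how such a filter might be turned into a query for the endpoint; it is not taken from the browser, the `filter_to_sparql` name is hypothetical, it assumes values are already valid SPARQL terms (no escaping is done), and the `ex:` prefix IRI is a made-up placeholder. The default limit of 200 mirrors the sample size mentioned earlier, and the `count_only` variant uses the SPARQL 1.1 `COUNT` aggregate to get the total number of matches.

```python
def filter_to_sparql(rdf_type, conditions, prefixes=None, limit=200, count_only=False):
    """Build a SPARQL query for resources of rdf_type matching every (property, value) pair.

    All terms are assumed to be valid SPARQL already: prefixed names,
    <IRIs> or quoted literals. Nothing is escaped here.
    """
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in (prefixes or {}).items()]
    if count_only:
        lines.append("SELECT (COUNT(DISTINCT ?r) AS ?n) WHERE {")
    else:
        lines.append("SELECT DISTINCT ?r WHERE {")
    lines.append(f"  ?r a {rdf_type} .")
    for prop, value in conditions:  # each condition narrows the set (logical AND)
        lines.append(f"  ?r {prop} {value} .")
    lines.append("}")
    if not count_only:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


prefixes = {"foaf": "http://xmlns.com/foaf/0.1/", "ex": "http://example.org/bis#"}
conditions = []
print(filter_to_sparql("foaf:Project", conditions, prefixes, count_only=True))
# Refining the filter appends one more condition.
conditions.append(("ex:fundingMechanism", "ex:Fellowship"))
print(filter_to_sparql("foaf:Project", conditions, prefixes))
```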

I vacillate about the continued development of this prototype browser, but after this exercise I'm encouraged to believe that it would have value in understanding and reviewing both the data model and the data content of RDF datasets as they are developed.




Well impressed by your RDF browser. The greatest difficulty I find in formulating a SPARQL query is getting an idea of the model underlying the data; this gives a pretty good picture.