The Wallace Line.


John Berger's Ways of Seeing - a linked-data collaborative illustrated edition

John Berger's Ways of Seeing is a seminal work of art criticism, though it now feels dated nearly 40 years after publication. It is the next book for my reading group in Bishopston.

Berger uses a large number of paintings to support his argument, but these are printed rather small and in black and white. I thought it would be a nice experiment to develop an online linked-data version with full-colour images, a hi-res version where available, and links to in-depth pages about each painting, its artist and its location.

To prepare this site, I scanned the pages of the List of Works Reproduced section and OCRed the images using SimpleOCR to create a raw (and flawed) text list. This file was parsed with XQuery to identify the different data items, cleaned up by hand, and then split so that data on the paintings is held separately from data on the artists, the locations and the use of each painting as an illustration. The result is a browsable database from which the pages of the book can be reconstructed in colour.
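The original pipeline used XQuery; the Python sketch below only illustrates the same idea of turning a flat OCR list into separate, linked tables. The line format (`<page>. <artist>: <title>. <location>`), the `Catalogue` class and the `A`/`L`/`P` id scheme are all hypothetical, chosen for illustration rather than taken from the book or the site. Lines that don't match the assumed pattern are set aside for correction by hand.

```python
import re
from dataclasses import dataclass, field

# Hypothetical format of one OCR'd line (an assumption, not Berger's actual layout):
#   "<page>. <artist>: <title>. <location>"
LINE_RE = re.compile(
    r"^\s*(?P<page>\d+)\.\s+(?P<artist>[^:]+?):\s+(?P<title>.+?)\.\s+(?P<location>.+?)\s*$"
)


@dataclass
class Catalogue:
    artists: dict = field(default_factory=dict)        # artist name -> artist id
    locations: dict = field(default_factory=dict)      # location name -> location id
    paintings: dict = field(default_factory=dict)      # (artist id, title) -> painting record
    illustrations: list = field(default_factory=list)  # (page, painting id): a painting's use in the book
    rejects: list = field(default_factory=list)        # lines left for hand correction

    def _id(self, table: dict, name: str, prefix: str) -> str:
        # Assign a new id the first time a name is seen, then reuse it.
        if name not in table:
            table[name] = f"{prefix}{len(table) + 1}"
        return table[name]

    def add_line(self, line: str) -> None:
        m = LINE_RE.match(line)
        if m is None:
            self.rejects.append(line)  # OCR too garbled to parse automatically
            return
        artist_id = self._id(self.artists, m["artist"].strip(), "A")
        location_id = self._id(self.locations, m["location"].strip(), "L")
        key = (artist_id, m["title"].strip())
        if key not in self.paintings:
            self.paintings[key] = {
                "id": f"P{len(self.paintings) + 1}",
                "title": key[1],
                "artist": artist_id,
                "location": location_id,
            }
        self.illustrations.append((int(m["page"]), self.paintings[key]["id"]))


cat = Catalogue()
for raw in [
    "12. Frans Hals: Regents of the Old Men's Alms House. Frans Hals Museum, Haarlem",
    "13. Frans Hals: Regentesses of the Old Men's Alms House. Frans Hals Museum, Haarlem",
    "l4 Frans Ha1s Regents",  # typical OCR damage: rejected for manual clean-up
]:
    cat.add_line(raw)
```

Here the two Hals lines share one artist record and one location record, while the damaged third line ends up in `cat.rejects` rather than being guessed at.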

There is much work still to be done on the database to locate and check links to online images and supporting web pages, particularly for the 150 or so paintings. To help with this task, I'm recruiting my friends in the reading group in an experiment in collaborative editing. We will see what level of contribution the project attracts. In the meantime, I'm about 40% of the way through the linking task.

The application architecture is based on an ER model (EAR model) of the underlying XML database, and the editor and the views of the database are generated on the fly from that model. This model-based approach also allows derived properties to be computed on demand. 80% of the code is generic, and that proportion is increasing as I refactor. The model does have limitations; in particular, the framework doesn't yet handle that old bugbear, multi-valued properties. Once the data has been collected, I intend to make it available as RDF, with both the instance data and the vocabulary generated from the model.
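To make the model-driven idea concrete, here is a small Python sketch (the site itself is built on XML and XQuery, so this is not its code). It reads EAR loosely as entity, attribute, relationship. One model description drives a generic record view, a derived property computed on demand, and the RDF vocabulary and instance triples. The `Entity` class, the two-entity `MODEL`, the `paintingCount` property and the `http://example.org/wos/` namespace are hypothetical. As in the framework described above, every property is single-valued.

```python
from dataclasses import dataclass, field
from typing import Callable

BASE = "http://example.org/wos/"  # hypothetical namespace, for illustration only
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"


@dataclass
class Entity:
    attributes: list[str]                                        # stored, single-valued
    relationships: dict[str, str] = field(default_factory=dict)  # property -> target entity
    derived: dict[str, Callable[[str, dict], object]] = field(default_factory=dict)  # computed on demand


# Hypothetical two-entity model; a fuller model would add locations and illustrations.
MODEL = {
    "artist": Entity(
        attributes=["name"],
        derived={"paintingCount": lambda key, db: sum(
            1 for p in db["painting"].values() if p["artist"] == key)},
    ),
    "painting": Entity(attributes=["title"], relationships={"artist": "artist"}),
}


def view(entity: str, key: str, db: dict) -> dict:
    """Generic view of one record, built by walking the model rather than per-entity code."""
    spec, record = MODEL[entity], db[entity][key]
    out = {a: record[a] for a in spec.attributes}
    out.update({r: record[r] for r in spec.relationships})
    out.update({d: fn(key, db) for d, fn in spec.derived.items()})
    return out


def vocabulary() -> list[tuple[str, str, str]]:
    """RDF vocabulary triples generated from the model: one class per entity, one property per field."""
    triples = []
    for name, spec in MODEL.items():
        triples.append((BASE + name.capitalize(), RDF_TYPE, RDFS_CLASS))
        for prop in [*spec.attributes, *spec.relationships, *spec.derived]:
            triples.append((BASE + prop, RDF_TYPE, RDF_PROPERTY))
    return triples


def instances(db: dict) -> list[tuple[str, str, str]]:
    """RDF instance triples; relationships become links between resource URIs, other values quoted literals."""
    triples = []
    for name, spec in MODEL.items():
        for key in db[name]:
            subject = f"{BASE}{name}/{key}"
            triples.append((subject, RDF_TYPE, BASE + name.capitalize()))
            for prop, value in view(name, key, db).items():
                target = spec.relationships.get(prop)
                obj = f"{BASE}{target}/{value}" if target else f'"{value}"'
                triples.append((subject, BASE + prop, obj))
    return triples


db = {
    "artist": {"A1": {"name": "Frans Hals"}},
    "painting": {"P1": {"title": "Regents of the Old Men's Alms House", "artist": "A1"}},
}
print(view("artist", "A1", db))  # {'name': 'Frans Hals', 'paintingCount': 1}
```

The point of the design is that adding an attribute or a derived property to `MODEL` changes the views and the RDF output together, with no new view code. A real export would use a proper RDF library and escape literals rather than quoting strings by hand.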

The application raises several copyright issues. I'd love to be able to include Berger's text, but that would clearly infringe copyright. Perhaps I should ask, but whom? I do wonder whether even using the List of Works Reproduced is OK. The use of images of paintings is, I have discovered, an ongoing issue. Whilst (I think) all the paintings themselves are out of copyright because of their age, the question of copyright in photographs of paintings remains. Wikimedia Commons takes the approach, guided by the case of Bridgeman Art Library v. Corel Corp., that faithful photographs of two-dimensional works in the public domain cannot themselves be copyrighted. The Bridgeman Art Library and other picture libraries continue to claim copyright on their images. Wherever possible I've used images from Wikimedia Commons, and all power to this great collection.