Ontology-Based Metadata

Click here to browse the Beethoven knowledge-base.

This page describes the University of Michigan Digital Library (UMDL) Beethoven Project. First, we designed a formal ontology of bibliographic relationships. Then we generated a knowledge-base of metadata from (493) US MARC records on works by or about Beethoven. Our Java interface lets you browse families of works by navigating relations between works -- a capability not provided by the average library catalog!

Contents:
Bibliographic Relations
Generating the Knowledge-Base
Potential Benefits
People
Publications

Bibliographic Relations

Bibliographic relations are relationships between works. They can be quite complex. For example, consider a recording of Beethoven's opera "Fidelio". The recording was performed by an orchestra and singers, based on some arrangement, and might have been remastered at some point. The libretto was originally adapted from a play by J. N. Bouilly, revised several times, and translated from the French. A copy of the score may be available, and perhaps there is also a movie version.

Currently, most catalog information about bibliographic relations is found in notes written in natural language. These notes are certainly valuable for research, but they are difficult to use computationally.

The UMDL Ontology group has developed a formal conceptualization of bibliographic relations. The ontology defines a fairly elaborate structure of precisely defined concepts. For example, instead of talking about a "work", we talk about CONCEPTIONs (associated with the idea), EXPRESSIONs (the work's content), MANIFESTATIONs (the work's publishing format), DIGITIZATIONs (the encoding), and INSTANCEs (the particular copy). A translation and an original work will share the same CONCEPTION. A new edition will share its EXPRESSION with the original, a reprint will share the MANIFESTATION, and so on. Thus works are part of families. Families are structured as trees. Each kind of relationship between works is located at a certain level in the structure (see Figure 1). The UMDL ontology is described in stylized natural language, and you may also view the implementation in description logic (Loom).

Figure 1 - A work with five derivation relations

Generating the Knowledge-Base

Generating the knowledge base involved mapping data from MARC to the ontology, and reasoning about the data to establish relationships. More specifically, we generated the knowledge base in four steps:
  1. Convert binary MARC data to tagged text. We map MARC fields and values to ontology concepts with a set of control files that identify the MARC fields and codes, and establish priorities and conditions to resolve cases where several MARC values produce one ontology value, or where one MARC fields can produce several ontology values.

  2. Extract coded attributes and values from natural language comments in the tagged text. This was done in a rapid, mindless way that could also be achieved computationally with natural language processing (we spent an average of only 40 seconds per record on the transcription).

  3. Convert the tagged text to Loom TELL statements. At this stage every value is treated as a distinct object.

  4. Reason about the data to establish relationships. First, merge matching conceptions, expressions, and manifestations. Then, based on transcribed clues and other information, deduce relations between works. When a preceding work is not in the collection, generate knowledge-base objects that reflect what we do know about it.

Frequently, the MARC data is imprecise or partial. It is often difficult or impossible to identify relationships with absolute certainty, even for trained professionals with hours of time available to play detective! To represent this data in the knowledge-base, therefore, we decided to use "tentative" relations in cases where we were not able to deduce relationships with confidence.

Potential Benefits

We identify four important classes of benefits that may result from this work.

People

The Beethoven Project was designed and implemented by Peter Weinstein. Gene Alloway and Judy Ahronheim added their deep and extensive knowledge of library matters.

Publications

These papers are not immediately available due to copyright restrictions, but you may request copies.


The UMDL ontology.
An online interface for browsing the knowledge base.
To mail Peter Weinstein.
To mail the UMDL Ontology group.