Integrating Ontological Metadata:
algorithms that predict semantic compatibility

This dissertation attacks the fundamental problem of semantic heterogeneity: the need for elements of large distributed systems to communicate using terminology that, from a global perspective, has inconsistent meanings. This problem avoids many of the difficulties of natural language understanding (statements are preparsed, do not use pronouns, and are never ironic), but it still requires bridging the same gap between symbols and the world.

To provide a basis for communication, I require that terminology be defined in differentiated ontologies. In these structures, concepts are defined by their relations to other concepts using description logic or an equivalent formalism. Rather than working with terms as tokens, therefore, I work with a directed graph for each term. I also require that local concepts inherit definitional structure from shared concepts, which provides common ground for comparison. These restrictions reflect my expectation that, while diverse participants can adhere to terminological standards at a general level, it is important to permit specialized meanings to diverge in local communities.

I have developed a number of ways to compare pairs of concept definitions. I call these measures of description compatibility. Some of these measures build one-to-one correspondences between two concept graphs. Others use the syntax of the definitions and the location of the concepts in the ontological structure in different ways, with varying requirements for knowledge of the domain.
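
To illustrate what a correspondence-based measure might look like, the following is a minimal sketch and not one of the measures developed in the dissertation. It assumes, purely for illustration, that a concept definition is flattened to a list of (role, filler) edges, and that two edges score 1.0 when both role and filler agree and 0.5 when only the role agrees. The names edge_similarity, one_to_one_correspondence, and correspondence_score, the scoring weights, and the greedy pairing strategy are all hypothetical.

```python
from itertools import product


def edge_similarity(edge_a, edge_b):
    """Score two (role, filler) edges; the weights are illustrative assumptions."""
    role_a, filler_a = edge_a
    role_b, filler_b = edge_b
    if role_a != role_b:
        return 0.0
    return 1.0 if filler_a == filler_b else 0.5


def one_to_one_correspondence(edges_a, edges_b):
    """Greedily pair edges so that each edge is used at most once."""
    candidates = sorted(
        (
            (edge_similarity(a, b), i, j)
            for (i, a), (j, b) in product(enumerate(edges_a), enumerate(edges_b))
        ),
        key=lambda item: (-item[0], item[1], item[2]),  # best scores first
    )
    used_a, used_b, pairs = set(), set(), []
    for score, i, j in candidates:
        if score > 0 and i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((edges_a[i], edges_b[j], score))
    return pairs


def correspondence_score(edges_a, edges_b):
    """Normalize the matched similarity by the size of the larger definition."""
    if not edges_a and not edges_b:
        return 1.0  # two empty definitions are treated as identical
    pairs = one_to_one_correspondence(edges_a, edges_b)
    return sum(score for _, _, score in pairs) / max(len(edges_a), len(edges_b))


car = [("has-part", "wheel"), ("has-color", "red")]
truck = [("has-part", "wheel"), ("has-color", "blue"), ("has-owner", "person")]
print(correspondence_score(car, truck))
```

A greedy pairing is used here only because it is short; an optimal assignment over the same pairwise scores would be a natural alternative.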

To evaluate the description-compatibility measures, I generate description-logic ontologies in artificial worlds. The key idea here is to define concepts that distinguish selected objects from other objects, by specializing existing concepts. In real-world applications, it is difficult to quantify the meaning of a concept. In my simulations, however, a concept's meaning is a set of objects in a finite universe. The semantic compatibility of two concepts is a function of the intersection of their object sets. The simulations thus provide a model-theoretic semantic basis for studying the accuracy of the syntactic description-compatibility measures (for example, see Figure 1).

Figure 1: Semantic basis of the "probabilistic shared roles" measure
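
The set-based notion of meaning used in the simulations can be sketched in a few lines. The abstract says only that semantic compatibility is a function of the intersection of two object sets; the particular function below (intersection size divided by union size) is an assumption chosen for illustration, and the name semantic_compatibility is hypothetical.

```python
def semantic_compatibility(extension_a, extension_b):
    """Overlap of two concept extensions in a finite universe.

    Assumption: compatibility is measured as |A & B| / |A | B|. The source
    states only that it is some function of the intersection.
    """
    a, b = frozenset(extension_a), frozenset(extension_b)
    union = a | b
    if not union:
        return 0.0  # neither concept denotes any object
    return len(a & b) / len(union)


# Objects in the artificial world are represented here by integer identifiers.
concept_a = {1, 2, 3}
concept_b = {2, 3, 4}
print(semantic_compatibility(concept_a, concept_b))
```

Because both extensions are explicit finite sets, a value like this can serve as the reference against which a syntactic description-compatibility measure is scored.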

Measures of description compatibility can be used to translate queries across heterogeneous database schemas, and to guide search for reusable components or agent services in communities that subscribe to differentiated ontologies.


For more information see:
Weinstein, P., Birmingham, W.P., 1998, Comparing Concepts in Differentiated Ontologies (submitted for conference review).