Project Proposal for EECS 592 (Winter 1995)

February 16th, 1995

Fritz Freiheit

KB Assertion Set to KB Assertion Set Correlations

I propose to analyze various sets of knowledge base (KB) assertions (the search sets) against a specified set of KB assertions (the profile set) to determine the degree of correlation between each search set and the profile set. This analysis should result in a preferential ranking of the search sets.

The motivation for this work stems from the need to take high level, or abstract, descriptions of the contents of KBs or other types of databases (including literature collections) and determine the likelihood that they contain some lower level, i.e. detailed, item of interest. As an example, one of the primary components of the University of Michigan Digital Library (UMDL) project is its collections (of documents), each of which is described at a high level by a database that contains assertions about its contents. One potential way for users to search for documents of interest is to specify a set of assertions as a profile (of their interests), which can then be correlated with the content descriptions of the various collections to determine whether documents of interest might be contained within them.

An outline of this analysis is as follows:

1. Perform a literature search, both in the AI domain and the library science domain, to determine what the current "state of the art" is.

2. Determine what must be known about the KB structure to support this task and construct a high level description of it.

3. Construct a high level description of the correlation algorithm.

4. Compare the minimal structure to a sample test bed KB (within the context of the UMDL).

5. Implement an algorithm to perform the preferential ranking in the test bed KB.

1. Literature Search

The literature search is intended to determine, as far as I am able, the extent to which similar work has already been done. I expect to find that significant work of this nature exists, both in computer science (more specifically, in AI) and within the arena of library science. The selective dissemination of information (SDI, as it is called in library science) has been around since at least the 1970s and is a good starting point for the library science end of things. The various forms of knowledge representation (KR) in AI present another potentially rich resource for the literature search.

2. Knowledge of KB Structure

To support the task of generating correlations between sets of assertions, I expect that a minimum amount must be known about a KB's structure. At this level I propose to form an abstract description of KB structure. This description will support the construction of a high level correlation algorithm.
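
As a purely illustrative sketch of what such an abstract description might look like, the Python fragment below assumes that each assertion can be reduced to a (subject, predicate, object) triple and that a KB exposes a named set of such triples. The triple form, the names Assertion, AssertionSet and predicates_used, and the sample values are all assumptions made for illustration; none of them is drawn from the UMDL design.

```python
from typing import FrozenSet, NamedTuple


class Assertion(NamedTuple):
    """One KB assertion, assumed (hypothetically) to be a simple triple."""
    subject: str
    predicate: str
    obj: str


class AssertionSet(NamedTuple):
    """A named set of assertions, e.g. a collection description or a profile."""
    name: str
    assertions: FrozenSet[Assertion]


def predicates_used(aset: AssertionSet) -> FrozenSet[str]:
    """Return the vocabulary of predicates that the set relies on."""
    return frozenset(a.predicate for a in aset.assertions)


# Hypothetical profile set; the topic and format values are invented.
profile = AssertionSet(
    name="profile",
    assertions=frozenset({
        Assertion("document", "has-topic", "astronomy"),
        Assertion("document", "has-format", "journal-article"),
    }),
)
```

Knowing at least which predicates two sets have in common is one candidate for the minimum structural knowledge the correlation step would need.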

3. High Level Correlation Algorithm

The construction of a high level description of the correlation algorithm for comparing assertion sets should be supported by the results of the literature search. At minimum, an algorithm could be constructed based on a syntactic comparison of assertions in the KB. A stronger algorithm might be constructed by bringing in some level of semantic description (at an abstract level) based on the KB structure determined in step 2 above.
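
To make the minimal, purely syntactic case concrete, here is a small Python sketch. It is not the algorithm this project will arrive at; it simply assumes that assertions are (subject, predicate, object) triples that match only when identical, and it uses Jaccard overlap (shared assertions divided by all distinct assertions) as a stand-in correlation measure. The function names and the sample collections are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed form of an assertion: (subject, predicate, object).
Assertion = Tuple[str, str, str]


def syntactic_correlation(profile: FrozenSet[Assertion],
                          search_set: FrozenSet[Assertion]) -> float:
    """Jaccard overlap between the profile set and one search set."""
    union = profile | search_set
    if not union:
        return 0.0  # two empty sets: treat as uncorrelated
    return len(profile & search_set) / len(union)


def rank_search_sets(
    profile: FrozenSet[Assertion],
    search_sets: Dict[str, FrozenSet[Assertion]],
) -> List[Tuple[str, float]]:
    """Order search sets by descending correlation; ties break by name."""
    scored = [(name, syntactic_correlation(profile, s))
              for name, s in search_sets.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Invented sample data, for illustration only.
profile = frozenset({
    ("doc", "topic", "astronomy"),
    ("doc", "format", "article"),
})
search_sets = {
    "collection-a": frozenset({
        ("doc", "topic", "astronomy"),
        ("doc", "format", "article"),
        ("doc", "language", "english"),
    }),
    "collection-b": frozenset({
        ("doc", "topic", "botany"),
        ("doc", "format", "article"),
    }),
    "collection-c": frozenset({("doc", "topic", "geology")}),
}
ranking = rank_search_sets(profile, search_sets)
```

With this sample data, collection-a ranks first, collection-b second, and collection-c last. A semantic variant would replace the exact-match test with a comparison informed by the KB structure, for example treating a broader topic as a partial match for a narrower one.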

4. Compare KB Structure to Test Bed

The next step is to test the correlation algorithm. To do this, a mapping from the abstract description of the KB to some real KB is necessary. I propose to use the UMDL project as a test bed. The collection entities in the UMDL are good examples of KBs that assert high level descriptions of their contents, while the user profiles provide a set of assertions against which to compare the content assertions. The UMDL offers a significant number of existing collection descriptions of real data. Because the high level descriptions of the collection entities actually have underlying detailed information attached to them, the results of any correlations can be checked.

5. Implement a Correlation Algorithm

The high level description of the correlation algorithm, together with the mapping between the abstract KB structure and the actual KB structure that the UMDL represents, will allow the construction and testing of a correlation algorithm.

There are several potential benefits of this project. They include the identification of modifications or additions to the UMDL collection and user profile architectures and implementations. In addition, a real implementation of the algorithm would produce components of the UMDL that do useful work.

The final step will be to present the results of this project in report form and as a Web page. This will include a description of the results of each of the steps of the outline above.