MetaData Mine-ing
Peter Weinstein, Brian Dunkel, Nandit Soparkar
The quantity and complexity of information on the Web are growing rapidly.
While several simple interactive search tools exist to help locate items
of interest on the Web, the increasing sophistication of uses and the sheer
quantity of data together indicate a need for techniques to locate relevant
application-specific information. To this end, we describe a two-stage,
customizable information-indexing strategy that is especially suited to
situations with specialized and long-term needs. In the first stage, a
customized filter is configured by selecting from a library of test and
data extraction functions. A search tool "crawls" the Web in some
appropriate way, and assesses encountered documents using the filter.
Useful metadata is extracted from selected documents and stored in a
proprietary metadatabase. In the second stage, the metadatabase, which
may be regarded as a view of the Web, may be queried to locate information
relevant to a specific inquiry. Our approach potentially achieves greater
flexibility and specificity than currently available search engines.
Also, filter libraries can incorporate domain-specific, or ad hoc,
information filtering mechanisms. We describe the preliminary design,
implementation, and experimentation for our proof-of-concept effort.
Test results for sites passing the filter are stored as relational data.
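
The two-stage strategy described above can be illustrated with a minimal
sketch. It is not the authors' implementation: every function, class, table,
and URL name below is hypothetical, the "crawl" is replaced by a small
in-memory dictionary of documents, and SQLite stands in for whatever
relational store the metadatabase actually uses. The first stage assembles a
filter from a small library of test and extraction functions and stores the
extracted metadata; the second stage queries the stored metadata.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# --- Hypothetical library of test functions (document text -> bool) ---
def mentions(term: str) -> Callable[[str], bool]:
    """Build a test that passes when the document mentions `term`."""
    return lambda text: term.lower() in text.lower()

# --- Hypothetical library of extraction functions (document text -> str) ---
def extract_title(text: str) -> str:
    """Return the contents of the first <title> element, or '' if absent."""
    m = re.search(r"<title>(.*?)</title>", text, re.IGNORECASE | re.DOTALL)
    return m.group(1).strip() if m else ""

@dataclass
class Filter:
    """A customized filter: a selection of tests and named extractors."""
    tests: List[Callable[[str], bool]]
    extractors: Dict[str, Callable[[str], str]]

    def accepts(self, text: str) -> bool:
        # A document is selected only if every configured test passes.
        return all(test(text) for test in self.tests)

    def extract(self, text: str) -> Dict[str, str]:
        return {name: fn(text) for name, fn in self.extractors.items()}

def build_metadatabase(documents: Dict[str, str], flt: Filter) -> sqlite3.Connection:
    """Stage 1: assess each document and store metadata for those selected.

    `documents` maps URL -> text and stands in for a real crawl (assumption).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metadata (url TEXT, field TEXT, value TEXT)")
    for url, text in documents.items():
        if flt.accepts(text):
            for field, value in flt.extract(text).items():
                conn.execute(
                    "INSERT INTO metadata VALUES (?, ?, ?)", (url, field, value)
                )
    conn.commit()
    return conn

def query_titles(conn: sqlite3.Connection, keyword: str) -> List[Tuple[str, str]]:
    """Stage 2: query the metadatabase rather than the Web itself."""
    rows = conn.execute(
        "SELECT url, value FROM metadata "
        "WHERE field = 'title' AND value LIKE ? ORDER BY url",
        (f"%{keyword}%",),
    )
    return rows.fetchall()
```

In this sketch, a filter such as
Filter(tests=[mentions("data mining")], extractors={"title": extract_title})
would be built once for a long-term need, applied while documents are
gathered, and the resulting table queried repeatedly afterwards.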
This paper is currently available in hard copy only.