Summary of Query Responses


December 1996 Responses


Glossary
Glossaries of Internet, cataloging, and computer terminology can be found at: http://www.library.nwu.edu/iesca/glossary/

Business and Economic Resources Online (BeOnline) Project

Response from Allene F. Hayes, Project Leader
Library of Congress, Library Services
Computer Files Team LM-543
Washington, D.C. 20540-4371

The pilot project Business and Economic Resources Online (BeOnline) is a model designed to improve library users' access to Internet resources through the LC Web site. LC is still in the process of developing its action plan and has not created any records yet. The tools in use are Netscape, HTML Assistant Pro (for the home page), possibly MUMS, and the TCEC application (an LC in-house program). The project will focus on free electronic resources in the areas of business and economics, both monographs and serials. Some information about the project plan was posted on Intercat about five months ago. Selection and cataloging guidelines have also been developed.


Max-Planck-Gesellschaft Project

Report from Dr. Heinrich C. Kuhn (coordinator, libraries &c.)
Postfach 10 10 62 / D-80084 Muenchen
T: +49-89-2108 1565 / F: +49-89-2108 1565
eMail: hck@ipp-garching.mpg.de, kuhn@mpg-gv.mpg.de

"Max-Planck-Society (non profit, mostly sponsored by public money) has some 80 institutes doing basic research in various science and humanities disciplines. This research is supported by some 80 libraries and other units for information gathering and providing, most of which are part of the institutes.

In various places they have various internetographies, clearinghouses, and the like for various subjects, and entries are frequently duplicated across these indexes. Some of the indexes cover electronic material only; others cover printed material as well. The cataloguers find "normal cataloguing" unsuited to at least part of the electronic material. Part of the material catalogued is pertinent to more than one subject and has to be indexed in more than one way, each more or less specific to certain disciplines. Cataloguers suffer extra work, due to lack of cooperation, when indexing "non-standard" materials. Customers and librarians complain that almost anything they need is indexed somewhere in the various catalogues, internetographies, and the like, but that it becomes less and less likely that they will be able to find out where and how information about the item they seek might be retrieved. The "big" search engines are not enough help, due to their lack of really specific indexing and of up-to-dateness. Maintaining the structures of the internetographies becomes more and more cumbersome. Metadata contained in the electronic documents is scarce and often not sufficient to retrieve the items searched. Indexing by the authors themselves is often not a viable solution.

The planned solution combines intellectual and automatic indexing according to a very detailed and very flexible set of categories; it uses a relational database for indexing, retrieval, and the automatic generation of several types of documents, with a WWW-based interface for both indexing and retrieval, and it supports cooperative indexing. Indexing can (but does not have to) make use of several indexing schemes at the same time.

The project started in late autumn 1996. The "solution database" is new; several thousand entries in clearinghouses, internetographies, etc. are to be merged into it. They are using an RDBMS plus some components that they will have to program or have programmed. The project will start with various types of electronic material, but the structure is such that it can incorporate library catalogues and "classical" bibliographic database entries as well. There is a description of the set of categories used for indexing; the descriptions are in German only, and Dr. Kuhn can send the Task Force a copy if it is interested.
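
The response describes the intended architecture only in outline. As a rough illustration, assuming an SQL-backed design (the report names only "an RDBMS"; all table and column names below are invented), a relational structure in which one resource can be indexed under several schemes at once might look like this:

    import sqlite3

    # Illustrative only: the Max-Planck project's actual schema is not
    # described in the report. This sketch lets one resource carry index
    # terms drawn from several indexing schemes at the same time.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE resource (id INTEGER PRIMARY KEY, title TEXT, url TEXT);
        CREATE TABLE scheme   (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE term     (id INTEGER PRIMARY KEY,
                               scheme_id INTEGER REFERENCES scheme(id),
                               label TEXT);
        -- One row per (resource, term): a resource may hold terms
        -- from many schemes at once.
        CREATE TABLE indexing (resource_id INTEGER REFERENCES resource(id),
                               term_id     INTEGER REFERENCES term(id),
                               PRIMARY KEY (resource_id, term_id));
    """)

    # Retrieval: find resources indexed under a given label in any scheme.
    hits = con.execute("""
        SELECT r.title, s.name, t.label
          FROM resource r
          JOIN indexing i ON i.resource_id = r.id
          JOIN term t     ON t.id = i.term_id
          JOIN scheme s   ON s.id = t.scheme_id
         WHERE t.label = ?
    """, ("history of science",)).fetchall()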


University of Nebraska--Lincoln

Libraries Internet Resources Catalog
Report from Sue Ann Lewandowski, 402-472-3545

The project started in April 1995 and includes nearly 1,000 records. They use no special software, relying on NetTerm and Netscape, and sometimes Notepad, Ewan, and Qtvnet. They began by cataloging Gopher sites; now they catalog only Web sites.

There are no written descriptions or documentation for this project, but the Web catalog can be accessed directly at http://libfind.unl.edu:2020/home.html. The sources are hot-linked from the catalog records. They place each resource in one of 22 subject categories (e.g. Distance Education; Chemistry; Law, Political Science, & Govt.) and then assign "subject keywords," which are all valid LCSH headings. Each record also has an accession number. Visiting the site shows how the catalog is designed and how the records are constructed.
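
Taken together, each record combines an accession number, one of the 22 categories, and one or more LCSH headings. A minimal sketch of such a record, with all values invented for illustration (the actual record layout is visible only in the catalog itself):

    # Hypothetical model of a UNL Internet-resource record; the field names
    # and values are illustrative, not copied from the catalog.
    record = {
        "accession_number": "IR-0001",     # assumed numbering style
        "title": "An example chemistry resource",
        "url": "http://example.edu/chem/",
        "category": "Chemistry",           # one of the 22 subject categories
        "subject_keywords": [              # all valid LCSH headings
            "Chemistry--Study and teaching",
        ],
    }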

Project Runeberg (Nordic Literature and Art)

Report by Lars Aronsson, coordinator of Project Runeberg
aronsson@lysator.liu.se

Project "Runeberg" started in December 1992 and has published 170 titles of Nordic literature and art in full electronic text on the Internet. This mens scanning old books, converting the text to HTML file, and putting them up on the web server at http://www.lysator.liu.se/runeberg/

Project Runeberg uses general-purpose Internet and UNIX software. It publishes only literature in the Nordic languages, which are Swedish, Danish, Norwegian, Icelandic, and Finnish, and it also publishes information about Nordic authors. All there is to read about Project Runeberg is on the Web at http://www.lysator.liu.se/runeberg/. Project Runeberg is not a library catalog; however, it does publish information about the "collections" it "possesses". The collections are of electronic text, and the texts themselves are of course published as well.


Project Aristotle

Automated Categorization of Web Resources
Gerry McKiernan, Curator, CyberStacks(sm)
Iowa State University
152 Parks Library
Ames IA 50011

"Project Aristotle(sm),is a clearinghouse of projects and research devoted to the automated categorization of Web resources. The URL for Project Aristotle(sm) is: http://www.public.iastate.edu/~CYBERSTACKS/Aristotle.htm

For each project, its name, principal investigator, project description, and relevant citations are provided, if known. A hotlink to a demonstration or prototype is also provided, if available. Entries are organized alphabetically by the name of the organization with which the principal investigator is affiliated."


Suggestions from Linda Hill, Alexandria Digital Library Project

Linda suggested that we should, of course, expand our investigation to include the 'traditional' approach of abstracting and indexing services which have curiously been 'traditionally' ignored by the core cataloging community. There is a wealth of information about that, including well-established thesauri for subject domains and a very respectable standard (Z39.19) from NISO - National Information Standards Organization.

In addition, we should investigate the FGDC metadata scheme and the various digital library initiatives. The Federal Geographic Data Committee has developed an extensive content metadata standard for the description of geospatial data sets; information about it is at http://www.fgdc.gov. Mappings from this standard to MARC have been done, and several other developments concern it: ISO is developing a standard based on the FGDC work, and FGDC intends to base the revision of its standard on whatever ISO produces; and the National Biological Service of the USGS has adapted the FGDC content standard to describe biological data sets.

Several of the federally-funded digital library projects are working on metadata-type issues. The University of Illinois is working with publishers of engineering journals to provide SGML versions of the issues directly to the library in electronic form. See http://dli.grainger.uiuc.edu/. The University of California at Santa Barbara (the Alexandria Project) has combined the FGDC Content Standard and MARC fields to create a metadata schema for their collection. See http://alexandria.sdc.ucsb.edu.

Also, you should look at the GILS approach to a system for describing and locating primarily government-generated information. This approach is receiving widespread attention, in particular from Canada and the European countries. See http://www.usgs.gov/gils/index.html.


Spring 1996 Responses


The Fermi National Accelerator Laboratory project/response by Robert Atkinson

Fermilab, along with several other high-energy physics laboratories around the world, participates in building a database which tracks physics preprints, now mostly electronic. The preprint file (HEP) is in a SPIRES database at the Stanford Linear Accelerator Center (SLAC) in Palo Alto. They add records to this database, and SLAC runs an automatic conversion into SPIRES of preprint citations for electronic papers on the Los Alamos e-print archive (http://xxx.lanl.gov/). They telnet to SPIRES to catalog, then download selected records for conversion to MARC (including URLs) and batch loading into their online catalog.
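
For illustration, a conversion along the lines described (citation data in, MARC-style fields with an embedded URL out) might look like the following sketch; the input dictionary, the helper function, and the indicator values are assumptions, not SLAC's actual program:

    # Illustrative only. Tag choices follow common USMARC practice:
    # 100 main entry, 245 title, 088 report number, 856 electronic location.
    def cite_to_marc(cite):
        """Map a preprint citation dict to (tag, value) MARC-style fields."""
        return [
            ("100", "1 $a" + cite["author"]),
            ("245", "10$a" + cite["title"]),
            ("088", "  $a" + cite["report_number"]),
            ("856", "40$u" + cite["url"]),   # URL of the electronic paper
        ]

    print(cite_to_marc({
        "author": "Atkinson, R.",
        "title": "An example preprint",
        "report_number": "FERMILAB-PUB-96-001",
        "url": "http://xxx.lanl.gov/abs/hep-ex/9601001",
    }))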

The project started in December 1992 and contains over 1,000 records. These records are accessioned by report number; there is no classification scheme. Some records contain subject headings, assigned by a laboratory in Hamburg, which are drawn from a special physics thesaurus developed by the Deutsches Elektronen-Synchrotron.

The Social Science Information Gateway (SOSIG) project in the UK/response by Debra Hiom, SOSIG project researcher.

This is a higher education (HE) funded project to provide a catalog of high-quality resources on the Internet. They are one of a number of subject-based services funded under the Electronic Libraries Programme (eLib): http://sosig.ac.uk/

The project started in July 1994 and contains approximately 1,000 records. They are using software developed as part of the ROADS project, and a variety of materials are cataloged. As part of the ROADS project they use IAFA templates as the standard for describing resources, and they use UDC as a classification tool.
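
An IAFA template is a plain-text list of attribute:value pairs describing one resource. The sketch below generates one; the attribute names only approximate those of ROADS-era templates, and the UDC number is illustrative:

    # Illustrative only: SOSIG's actual templates may use different
    # attribute names and additional variant fields.
    def iafa_template(fields):
        """Render attribute:value pairs as an IAFA-style template."""
        return "\n".join("%s: %s" % (name, value) for name, value in fields)

    print(iafa_template([
        ("Template-Type", "DOCUMENT"),
        ("Title", "An example social science resource"),
        ("URI", "http://example.ac.uk/resource"),
        ("Description", "A short evaluated description of the resource."),
        ("Subject-Descriptor-Scheme", "UDC"),  # SOSIG classifies with UDC
        ("Subject-Descriptor", "316"),         # UDC class for sociology
    ]))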


The Iowa Policy and Planning Data Project (IPPDP)/response by Gregory Wool, Monographs/Special Projects Cataloger

The Iowa Policy and Planning Data Project (IPPDP) sought to broaden access to machine-readable numeric data files held in various agencies and offices (*not* libraries) of the State of Iowa. Its bibliographic component, the Iowa Planning Data Catalog (IPDC), extends traditional library cataloging (MARC, AACR2, LCSH) to these resources and displays the records in a specialized database on the Iowa State University Library's NOTIS system.

What is nontraditional about IPDC, aside from the situation of a library cataloging materials it does not control, is that the records are based on information supplied by the custodians of the data on questionnaire forms they fill out. At present, the forms are paper, and the conversion to MARC is manual, but that probably could be changed easily enough. The supplied information is transferred into the records with very little editing, and the consistency of mapping questionnaire items to MARC fields, while not perfect, is very high.
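
A sketch of the kind of questionnaire-to-MARC mapping described; the questionnaire item names here are invented (the actual IPDC form is not reproduced in the response), while the MARC tags are standard ones for such data:

    # Hypothetical mapping from questionnaire items to MARC fields.
    QUESTIONNAIRE_TO_MARC = {
        "data_set_title":   ("245", "00$a"),   # title statement
        "custodian_agency": ("710", "2 $a"),   # corporate name added entry
        "summary":          ("520", "  $a"),   # summary note
        "geographic_scope": ("522", "  $a"),   # geographic coverage note
        "file_format":      ("516", "  $a"),   # type of computer file note
    }

    def questionnaire_to_marc(answers):
        """Carry supplied answers into MARC fields with minimal editing."""
        return [
            (tag, prefix + answers[item])
            for item, (tag, prefix) in QUESTIONNAIRE_TO_MARC.items()
            if answers.get(item)
        ]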

The project started in the spring of 1992 and contains about 130 records. There are records from OCLC and original records for Internet resources in the catalog as well. The focus has always been machine-readable numeric data files of potential interest to policy planners and researchers in Iowa. Recently they have begun to catalog Internet sites on the same basis (but quite traditionally).


The Alexandria Digital Library (ADL) project/response by Mary Larsgaard

The following relates to the Alexandria Digital Library (ADL), one of six NSF-funded Digital Libraries Initiative grants; ADL focuses on georeferenced information, whether in hardcopy or digital form.

The project started in October 1994 and contains about 3,300 records. Sybase is being used as the database manager, with Illustra, Oracle, and O2 either currently being tried out or scheduled to be tried out. Inputting for most fields is done in Microsoft Access, since one goal of the ADL is to have distributed software, so that persons who are first-time metadata creators, or who seldom do such work, will have an interface with which - given the near omnipresence of MS Office - they are likely to be familiar already.

Currently ADL focuses on spatial data, and especially on remote-sensing imagery such as aerial photographs and satellite images. The aerial photographs have been scanned into digital form.

The ADL started out using USMARC fields, AACR2R, and LCSH, and has added to or departed from them as appropriate for the materials. For example, about 26 kinds of technical information for remote-sensing imagery (e.g., altitude of sensor) have been taken out of the General Note (500) field and each given a separate field of its own, as additions to USMARC. They have just started an Alexandria Metadata Manual, arranged by field, that provides information on each field, and they keep records, by field number/name, of ADL departures from USMARC, AACR2R, and LCSH. A minor example of the latter is the use of "Earth (Planet)" for items that depict the Earth's surface: LCSH has no subject heading for thematic spatial data items - "World maps" may be used only for general materials - and it does not follow for Earth the pattern of LCSH treatment of all other planets, which is "Name of planet (Planet)".
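
As an illustration of that restructuring, the sketch below promotes "key: value" technical notes out of the General Note (500) field into separate fields; the local tags 951 and 952 are invented here, since the response does not list ADL's actual added tags:

    # Hypothetical local field tags for promoted technical notes.
    LOCAL_FIELDS = {
        "altitude_of_sensor": "951",
        "sensor_type":        "952",
    }

    def promote_notes(record):
        """Move recognized 'key: value' 500 notes into their own fields."""
        promoted, kept = [], []
        for tag, value in record:
            key, _, rest = value.partition(":")
            local = LOCAL_FIELDS.get(key.strip().lower().replace(" ", "_"))
            if tag == "500" and local:
                promoted.append((local, rest.strip()))
            else:
                kept.append((tag, value))
        return kept + promoted

    print(promote_notes([("245", "00$aExample image"),
                         ("500", "Altitude of sensor: 9000 m")]))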

ADL will be working with the University of Illinois and the University of Arizona on a project to merge LCSH with at least one and possibly two other subject-heading thesauri, namely GeoRef and the Art and Architecture Thesaurus, aiming to produce a subject-heading thesaurus with terms appropriate for both beginning and advanced users. ADL is also very much involved with the efforts of the U.S. Federal Geographic Data Committee and its Content Standard for Digital Geospatial Metadata. At a recent meeting of FGDC with users of the standard, there was a move toward using SGML, so it seems likely that ADL will be working toward using SGML.

The Alexandria Digital Library has recently loaded about 3,200 records for scanned Space Shuttle photographs. This brings up another ADL cataloging practice that is done infrequently in standard library work: the ingest by the system of records that are not in USMARC format. They construct a "crosswalk" between the fields in the set of records to be ingested and what they call the Alexandria Metadata Schema; any fields that do not exist in that schema are added to it.
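
A minimal sketch of such a crosswalk ingest, following the stated behavior (unmapped incoming fields are added to the schema); all field names are illustrative, not ADL's actual schema:

    # Hypothetical crosswalk from incoming (non-USMARC) field names
    # to Alexandria-style schema fields.
    crosswalk = {
        "photo_title":  "title",
        "mission":      "mission_id",
        "center_point": "coordinates",
    }

    def ingest(record, schema_fields):
        """Map one record through the crosswalk, extending the schema."""
        out = {}
        for src_field, value in record.items():
            target = crosswalk.get(src_field)
            if target is None:              # field unknown to the schema:
                target = src_field          # adopt it under its own name
                schema_fields.add(target)   # and add it to the schema
            out[target] = value
        return out

    schema = {"title", "mission_id", "coordinates"}
    print(ingest({"photo_title": "STS-59 view", "film_type": "70mm"}, schema))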


OCLC Intercat Cataloging Project Colloquium/response by Amanda Xu, Serials Cataloger at MIT Libraries

Field Report: Access Information on the Internet: A Feasibility Study of USMARC Format and AACR2/by Amanda Xu and Stephen Skuce
M.I.T. Libraries

When the MIT Libraries joined the OCLC Intercat Project, their first concern was the feasibility of using the MARC formats and AACR2 for describing and accessing Internet resources of various types. Are there any other information discovery and retrieval standards or techniques that can adequately replace our traditional cataloging tools?

This field report searches for answers via the titles they contributed to the project, mapping the data elements and data structures designed for describing Internet resources across metadata standards such as the Dublin Core Metadata Element Set, the TEI header, the Uniform Resource Characteristic (URC), and the USMARC format. The report compares the relative flexibility, compatibility, comprehensiveness, reliability, and sophistication of data structure of each of these standards.
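
For a flavor of the element-level mapping involved, the sketch below carries a few widely cited Dublin Core-to-USMARC correspondences; it is abridged and illustrative, not the report's full analysis:

    # A small, abridged Dublin Core -> USMARC correspondence table.
    DC_TO_USMARC = {
        "Title":       "245",   # title statement
        "Creator":     "100",   # main entry, personal name
        "Subject":     "650",   # subject added entry, topical
        "Description": "520",   # summary note
        "Publisher":   "260",   # publication (subfield $b)
        "Date":        "260",   # publication (subfield $c)
        "Identifier":  "856",   # electronic location and access (URL)
    }

    def map_elements(dc_record):
        """Project Dublin Core elements onto USMARC tags, one pair each."""
        return [(DC_TO_USMARC[k], v) for k, v in dc_record.items()
                if k in DC_TO_USMARC]

    print(map_elements({"Title": "An example resource",
                        "Identifier": "http://example.edu/"}))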

The report also evaluates primary search tools currently available on the Internet, such as robot-based search engines, general-purpose catalogs, and locally created "classification" schemes (e.g. library web pages) that arrange resources alphabetically, chronologically, geographically, by subject, or in various combinations thereof. While these engines and catalogs are powerful tools for retrieving massive amounts of data, their search results are usually indiscriminate, so that the user must spend a great deal of time identifying worthwhile and reliable information. The typical library web page, on the other hand, presents evaluated materials, but usually offers limited access points and segregates Internet resources from the library's catalog.

Internet resources organized by MARC formats and AACR2 offer important benefits:

  1. they have been filtered by library subject selectors to suit the needs of a given user community
  2. they have been controlled formally and concisely via bibliographic description, authority control, and subject analysis
  3. the automated library systems in which they reside have been developed to handle sophisticated searches of very large data quantities.
But most important, Internet resources can be integrated with the millions of bibliographic entities already indexed with MARC formats.

This experience helped them to understand the pros and cons of Internet information access standards and technology, with emphasis on the value of the USMARC format and AACR2. It also helped to identify the limitations of, and potential ways to improve, the USMARC format and AACR2.


Last updated: 2/13/97
By: Magda El Sherbini melsher@magnus.acs.ohio-state.edu