Federal Depository Library Program
ADMINISTRATIVE NOTES
Newsletter of the Federal Depository Library Program
---------------------------------------------------------------------
May 1, 2001    GP 3.16/3-2:22/07    (Vol. 22, no. 07)
---------------------------------------------------------------------
FDLP Electronic Collection Update

Remarks by George Barnum
Electronic Collection Manager
Depository Library Council Meeting
San Antonio, TX
April 2, 2001

I'd like to take a moment as I begin this to thank Mr. DiMario for altering his travel arrangements, so that this morning's lineup had to be shifted slightly, taking me out of the "right before lunch" slot, which is never an enviable place to be. It is, as ever, a pleasure to be before you with an update on what's simmering on my stove currently.

My objective this morning is to update you a bit, and remind you as well, about where we are and what we're doing about the creation of this "comprehensive digital library of U.S. Government information."

First some background. In April, 2001 we are almost five years into the transition to a more or mostly electronic FDLP. We can observe that in practice the FDLP has evolved to perform four broad functions:

* Deposit. The functions that relate to selection, acquisition, distribution, and physical control of publications (classification, etc.) by GPO, including the retention of ownership of deposited publications by the Government, and inspection to assure compliance;
* Assurance of current and permanent public access, including the requirements made of depository libraries for free access to the general public, retention schedules, and service to users of Government information;
* Provision of locator tools, including the statutorily mandated catalogs and indexes GPO produces as well as bibliographic description and other types of finding aids;
* Promotion and facilitation of use, including training opportunities, conferences, and marketing.

It is in the first two categories, deposit and assurance of access, that the transition to a more electronically-based program has had the most fundamental effect. In the print world the system of deposit provides a stable and secure environment in which information is, as a by-product of the legal requirement that Government printing be either performed or contracted for by GPO, funneled into a geographically distributed and fairly closely regulated system of outlets. In the Internet environment, federal agencies no longer have an imperative to involve GPO in the dissemination of their information, and the need for redundant housing of copies of publications to achieve geographical equity is obviated by the ability to use a single source from multiple remote locations. At the same time, needs and expectations on the part of librarians and library users for access to this information have grown.

The attempt to reinvent distributed, permanent access has centered on the creation of the FDLP Electronic Collection, a digital library conceived on fairly traditional library collection development principles, and consisting of an interdependent set of locator tools, user interfaces, links to content on agency servers, a digital archive, and various kinds of metadata.
The collection is being built using a standard collection development document which emphasizes a blending of new and adapted roles for the depository program. Fundamentally, the FDLP must continue to provide access, through its network of designated libraries, to the information that its enabling statute describes as being in scope.

The everyday realities of providing both actual electronic access and bibliographic/intellectual access tools have been in a state of almost constant change since the first introduction of electronic products in the early 1990s. Previously the processing of materials from the printing press through GPO's verification and distribution mechanisms and into libraries was a highly detailed process not far removed either in concept or practice from other mass-production processes employed in a large printing and publishing concern. The shift to a digital FDLP has altered this model, changing the skills and workflow required to provide access. As you well know, the size and composition of the workforce performing these tasks at GPO is changing, with an increase in the need for so-called knowledge workers superseding the need for production-line materials handlers and lower-level clerical employees. Many of you are seeing a shift in the skills needed within your libraries and documents operations as well.

I've now been back at GPO as Electronic Collection Manager for 18 months. During my first tour at GPO, between mid-1997 and mid-1999, we began in earnest to erect the framework of the "more electronic FDLP" described in the 1996 Transition Study and Transition Plan. We published the Collection Plan in 1998, giving the Electronic Collection not only its name but its basic structure: the universe of U.S. Government electronic publications divided into four broad categories:

* Core legislative and regulatory GPO Access products that reside permanently on Government Printing Office (GPO) servers (e.g.: Congressional Record, Bills, Slip Laws, House and Senate Reports & Documents)
* Products which GPO manages on the GPO Access site, and content partnerships
* Products that GPO identifies, describes, and links to but which remain under the control of the originating agencies
* Tangible electronic Government information products distributed to Federal depository libraries (e.g.: CD/ROM; DVD; floppy disk).

In the context of the Collection Plan we identified key areas of activity that, taken together, comprise an architecture for the collection:

* Intake
  1. Discovery
  2. Evaluation
  3. Selection
  4. Acquisition
* Registry
  1. Item number/Classification assignment
  2. New Electronic Titles
* Storage
  1. FDLP/EC Archive
  2. Partner Archives
  3. Agency Agreements
* Cataloging and Locators
  1. CGP
  2. PURL
  3. Browse Topics
* User Interface

We've been active in every one of these areas, and all the areas are closely interconnected. I want to focus today on our archive activities, but bear in mind that it's impossible to talk about archiving without talking about many of the other areas of activity.

To begin with, I want to explain, once again, just what we're doing about archiving for permanent public access. Remember that that's our point: permanent public access. We're not an archival repository in the traditional, "preserving essential evidence" sense. Our strategy is (and has been) that no single solution will be the be-all and end-all of archiving.
Thus we're putting together a varied menu of solutions that includes:

* Agreements with agencies that are willing to guarantee that their publications will remain available on the web, from the agency server, for all time (the most recent are NCLIS and NLM)
* Partner sites such as UIC and North Texas which have specific emphases
* The core legislative and regulatory material on GPO Access, which is permanent by statute
* Our own on-site archive

We are also investigating other kinds of solutions:

* Archiving on servers operated by contractors/vendors (including redundant/mirror sites or failsafe arrangements)
* The Stanford LOCKSS model, which distributes copies to caches in a ring

Our own archiving effort is in full operation, for publications that meet the following criteria:

* Electronic only in the FDLP (no paper distribution)
* Not covered by an agency agreement
* Not included in a depository partnership
* Not available only in a proprietary format or with proprietary access software

We are capturing publications, listing them in NET, cataloging them (as appropriate) in CGP, and retaining a copy of the captured publication on our servers. Every one of these new additions receives a PURL. Those links are being checked regularly and when we discover that a publication is no longer available on the publishing site (by that check or by being told by someone else) we verify what has become of the publication and redirect the PURL to the archived copy. This actually happens with two titles currently. The final piece of our initial development of this model is the screen that appears to the user when that redirect to the FDLP/EC Archive takes place, informing the user that the copy they're getting is an archived copy.

Is this system perfect? No, probably not. Is it working so far? Yes. Is it a retrospective effort covering every single electronic title we've ever heard of? Nope. We started it in a full-scale way in 2000.
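
For readers who think in code, the archiving criteria and the PURL redirect described above can be sketched roughly as follows. This is only an illustrative Python sketch under stated assumptions, not a description of GPO's actual software: the `Publication` record, its field names, and the two functions are hypothetical, and the `source_available` flag simply stands in for the result of the regular link check.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Publication:
    """Hypothetical record for one electronic title (field names are illustrative)."""
    title: str
    purl: str                    # persistent URL handed to users
    source_url: str              # location on the publishing agency's server
    archive_url: Optional[str]   # copy retained in the FDLP/EC Archive, if any
    electronic_only: bool        # no paper distribution in the FDLP
    agency_agreement: bool       # agency has agreed to keep it available
    partner_hosted: bool         # included in a depository partnership
    proprietary_only: bool       # only in a proprietary format or software


def eligible_for_archive(pub: Publication) -> bool:
    """Apply the four criteria listed in the talk for the on-site archive."""
    return (
        pub.electronic_only
        and not pub.agency_agreement
        and not pub.partner_hosted
        and not pub.proprietary_only
    )


def resolve_purl(pub: Publication, source_available: bool) -> Tuple[str, bool]:
    """Return (target URL, is_archived_copy) for a PURL lookup."""
    if source_available or pub.archive_url is None:
        return pub.source_url, False
    # The source is gone: point the PURL at the archived copy and flag it,
    # so the user can be told they are getting an archived copy.
    return pub.archive_url, True
```

The second value returned by `resolve_purl` corresponds to the notice screen mentioned above: when it is true, the user is being sent to the archived copy rather than the agency's own.
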
That said, I can tell you that we're negotiating right now with a depository library to do the retrospective work that will get all the pre-FDLP/EC Archive PURLs (about 2500) and URLs (about 2200) checked, verified, and archived (and in the case of the URLs, PURL-ed). I can't reveal yet which institution is bravely contemplating this task, but an announcement will be forthcoming VERY soon, and if YOUR valiant institution would like to help out in the effort, see me and we can talk.

In January we went live with a page of Frequently Asked Questions about the Electronic Collection and our discovery, archiving, and cataloging activities. Let me encourage you to check the FAQ at http://www.access.gpo.gov/su_docs/fdlp/ec/faq.html

As a little sidebar to this archiving activity, I want to talk a bit about the best library conference I've ever been to. I was fortunate to give a paper (that I wrote with the collaboration of former Transition Specialist Steve Kerchoff) at "Preservation 2000: An International Conference on the Preservation and Long Term Accessibility of Digital Materials" sponsored by the Cedars project in the UK, and held in the north of England at York in early December. In one room were 150 people from libraries across Europe and the US who are making the preservation of digital library materials happen. It was there that I learned about (and got very excited about) LOCKSS, and heard about initiatives at the National Library of Canada, the Bibliotheque Nationale de France, the Cedars group in the UK, and the National Library of Australia, among others. I learned that our activities compare extremely favorably with those cutting-edge projects. I also learned that the National Library of Australia, whose digital library program is extremely successful, planned their work in much the same pragmatic, seat-of-the-pants way that we have, and are actually using some of the same cheap-but-good software that we are for capturing publications from Web sites.
The proceedings from the conference can be found at , and my presentation has appeared in Administrative Notes (v. 22, # 5, 3/15/01).

OK, back to the update. Most of you know that we have been working on various sorts of projects with OCLC, Inc. for the last several years. At the conclusion of the GPO/OCLC/ERIC pilot project, we began talking with them about the part of that project that we felt never got addressed: digital archiving. Through late 1999 and into last year, we worked on developing a high-level requirements document that describes a toolkit for discovering, documenting, saving, storing, and cataloging electronic publications. We are presently at work on an investigative project in which OCLC is prototyping a system to manage all these functions in an integrated environment. It will be based on the CORC interface, with lots of added functionality. OCLC has promised us a first working model by summer. We hosted a site visit by some members of the development team at the big red buildings last week, and it's safe to say that everybody who participated learned a lot.

The initial prototype will give the staff in DAB a tool for making sure that all the bits and pieces of information that are created while we're acquiring a publication are recorded in a systematic way, and that the information, such as item and class, PURL and originating URL, dates, and all the rest, is turned over into other parts of the process, like archiving and bibliographic control. The site visit was a real eye-opener for the OCLC project team, most of whom are pretty solidly of the dot-com generation, and who've never really come across a production-line setting like ours.
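
To make the idea of a systematic acquisition record a little more concrete, here is a small Python sketch of the kind of record the prototype is meant to keep complete before it is handed on to archiving and bibliographic control. It is an assumption-laden illustration only: the `AcquisitionRecord` class, its field names, and the `missing_fields` helper are hypothetical and do not reflect the OCLC prototype or any GPO system.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class AcquisitionRecord:
    """Hypothetical record of what is noted while acquiring a publication."""
    item_number: Optional[str] = None
    classification: Optional[str] = None
    purl: Optional[str] = None
    originating_url: Optional[str] = None
    date_captured: Optional[str] = None  # e.g. "2001-04-02"


def missing_fields(record: AcquisitionRecord) -> List[str]:
    """Name the fields still empty, so an incomplete record is not passed along."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]
```

A record would only be turned over to the next step once `missing_fields` returns an empty list.
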
As the project develops, we hope to gain functionality for identifying electronic publications (the part I like to call 21st century gray bins, in honor of our venerable old GPO sorting bins that help us sort out newly received pubs for classification) for storage on a "vault" server operated by OCLC, and for other activities surrounding the actual download or "harvest" of publications. The tools that we hope to come out of this project with will support the functions that we've had the most trouble with since we began dealing with electronic publications: the acquisitions and classification tasks that are supported by our old mainframe systems, and their interaction with the creation of NET and the FDLP/EC Archive. You've heard about our systems modernization efforts from Gil Baldwin earlier this morning, and will hear more detail from Laurie Hall. We're hopeful that this project will dovetail in nicely.

There is another major project that we're at work on that I want to report to you about. As you probably know by now, the Census Bureau has (finally) firmed up the dissemination plans for the Census 2000 data products. You've been walked and talked through this a number of times already, I'm sure. There will be basic printed reports. All the Summary Files (which correspond to the Summary Tape Files of yore) will be available on disk with accompanying retrieval software (it's looking like DVD will be the optical medium of choice) and these will be distributed to depositories. In addition, the Summary Files will be available as compressed comma-delimited ASCII files by ftp from Census.gov. This was very welcome news for us because, from a permanent access point of view, the DVDs with proprietary software aren't very good. We are now in negotiation with a major academic library depository to establish a partnership in which the ASCII Summary Files will be monitored, downloaded, and archived by the partner. They also hope to develop a front end for using the files.
This will be a specifically FDLP effort, and Census is very supportive, since it will help reduce load on their servers and make the data more widely available. We'll be making a more formal and detailed announcement very soon, as well as looking at other accompanying projects.

I wanted to wrap this update up on a positive note, saying something hopeful and upbeat. It's a ticklish time for that, since so much of what lies ahead is unclear to us at this point. I can look with some satisfaction over the time I've been at GPO and point to some real progress in creating the more electronic FDLP. If you go by the numbers, our numbers are strong. There are many ways to slice and dice that figure of 61%, but the reality is that we're handling ever-more electronic publications, and finding ways to cope with this transitional period. A lot of our biggest challenges currently are fairly transient in nature, centered on keeping up with our "traditional" work while incorporating the electronic pubs in a way that we feel is consistent with the statutory mandate. We continue to feel that eventually we will see most of the publications in the program disseminated in electronic form. We're certainly taking as active a stance to accomplish that as we can, but we're also led to a great extent by the agencies themselves, who continue to buy printing services from GPO.