ADMINISTRATIVE NOTES
Newsletter of the Federal Depository Library Program
---------------------------------------------------------------------
February 15, 2001
GP 3.16/3-2:22/03
(Vol. 22, no. 03)
---------------------------------------------------------------------

EIDS Update

Remarks by T.C. Evans
Director, Office of Electronic Information Dissemination
Before the Federal Documents Task Force
Government Documents Round Table
American Library Association
Washington, DC
January 13, 2001

Introduction

As always, I appreciate the opportunity to update the library community on the current and future state of GPO Access. Since we last got together, many things have happened and many more are in the offing.

Of particular note is the death of Marty Mehlberg, who as the manager of the Text Processing Section worked tirelessly to ensure the availability and integrity of GPO Access data. He is sorely missed. I would also like to note the recent retirement of another key cog in the birth and development of GPO Access, Russ Duncan. Russ headed the Graphic Systems Development Division and personally programmed many of the applications on GPO Access, and he will also be greatly missed.

Size

GPO Access continues to grow, with over 1,700 official government databases offered through some 80 applications. At this time, over 200,000 electronic titles are available through the FDLP Electronic Collection, with more than 116,000 titles on GPO servers and almost 84,000 titles linked to from GPO Access.

Usage

GPO Access usage continues to amaze, with recent months bringing us to some significant milestones. The more than 26 million retrievals in October propelled total usage of GPO Access to over 1 billion documents retrieved since the service premiered in 1994. The average number of monthly retrievals is steady at just above 26 million and the average size of these documents is currently about 49Kb.
According to the Center for Advanced Computing Research at CalTech, 2Kb equals one typewritten page. Therefore the average document retrieved from GPO Access equates to some 24.5 typewritten pages. This means that the average number of monthly retrievals from GPO Access (26 million documents at about 49Kb each) measures almost 1.3 terabytes in size and is equivalent to 637 million typewritten pages.

As is usually the case, information on hot topics brings on a burst of use. The Supreme Court decisions relating to the recent election certainly fit this profile. On the day after the Court released its final decision, the site recorded some 6.4 million page views, compared to 1.5 million page views during an average month. User Support contacts went through the roof at the same time, with the GPO Access User Support Team handling nearly 5,500 e-mails from the Supreme Court site in a single week. This represented almost three times the monthly average number of e-mails.

Referrals to GPO Access from other Web Sites

We have begun monitoring the number of referrals to GPO Access from other Web sites and which sites are most often referring users to us. This is accomplished through the use of referral logs that record the host domain from which a referred user was directed to one of the pages on GPO Access. It is important to note, however, that this does not measure the number of links established on other sites using the GETDOC feature that pull documents directly from our databases.

It has been most gratifying to see just how many referrals occur in the short time we have been analyzing these logs, as well as the broad array of sites that direct users to us. The numbers have remained remarkably consistent for the first two months we have analyzed. For October and November we averaged some 600,000 referrals, with about 58% of those coming from no specific referrer.
This group includes referrals from favorite lists stored on Web browsers, search engines that do not forward referral information from their results lists, and others in which no information is provided from which the referrer can be identified. There is another three to four percent whose address cannot be resolved from the information provided.

The largest identifiable group of referrals comes from other Government sites. Representing more than 17% of the total, this is clear evidence of how the information on GPO Access is used to facilitate the missions of agencies across Government. It is also evidence of the broad diversity of the constituencies who are served through use of the products and services we provide.

Next in the referral pecking order are the dot-coms at between 11 and 12%. Heading the list is a string of popular search engines, led in both months by Google. It will be important to remember this category when I discuss our search engine project in a few minutes.

No other category approaches 10%. This includes education (.edu) addresses at a little more than 5% and organizations (.org) at just under 2%.

Based on a list of domain addresses for Federal depository libraries maintained by the Library Programs Service, depository sites account for approximately 3% of the total referrals to GPO Access pages. Top among the more than 500 depository addresses are sites at the University of Maryland, the University of Michigan, Louisiana State University, the University of North Texas, and Vanderbilt University. About 400 of them, however, send more than five referrals per month, the number commonly suggested to be the best indicator of at least one prominent link at the site.

Part of the impetus for reviewing these referral logs came from a request to determine how many referrals we have been receiving from FirstGov.
In the first two months, their totals have consistently represented about one half of one percent of the total referrals received. It is interesting that we see FirstGov referrals from 11 different addresses that are not redirects. This means that they are maintaining these as separate individual sites.

As more data is received and time permits, additional analysis will be performed. I will continue to report these results as they become available.

System Performance

System performance has improved and efforts to enhance system response time continue. The increased bandwidth easily withstood the onslaughts during the election, although a severe strain was placed on our server farm. At one point during the process, the Supreme Court materials were being served from 19 servers through the server controller array.

We are currently exploring a relationship with a prominent content delivery network that should produce dramatic additional improvement in service from GPO Access. This will allow copies housed on some 8,500 servers located at Internet service providers around the world to quickly supply large and popular files to nearby users, while greatly reducing the load on our server farm.

What's new on GPO Access

There are a number of recent changes to GPO Access that should be mentioned. The most notable are:

* The Economic Report of the President 2001 is now available.
* A new browse feature is available for the United States Code. The browse feature allows users to browse individual U.S. Code titles, down to the section level, for the latest available update.
* The United States Government Printing Office Style Manual, 2000 is available.
* The United States Government Policy and Supporting Positions, 2000 (the Plum Book) is available.
* All volumes of The Public Papers of the Presidents of the United States covering the period of 1994 through 1998 are now available.
* A new browseable table of contents feature is available on the Weekly Compilation of Presidential Documents, beginning with the first issue of 2001.

What's on the Horizon for GPO Access

As always, work is under way to add more content to GPO Access and to refine access to the materials already provided. Some key examples of current efforts are:

* An agreement with the Department of Labor to put the Davis-Bacon Wage Determination materials on GPO Access has been reached and a written memorandum of understanding has been delivered for signature. The application has been built and reviewed by Labor, and the release date is currently scheduled to coincide with the release of the new basic manual in early February.
* The FY 2002 Economic Outlook, Highlights from 1994 to 2001, FY 2002 Baseline Projections will tentatively be available on GPO Access and for sale through the Superintendent of Documents, January 16, 2001.
* EIDS staff is in the process of evaluating Helpdesk software for customer support that will further improve customer service available through the GPO Access User Support Team.
* An eCFR application, which will be updated daily as opposed to the current quarterly updated Code of Federal Regulations application, should be fully available by summer.
* As a result of the development of the free eCFR application, the Sales program is developing a new e-mail subscription service. Customers will be able to purchase subscriptions that will allow them to be notified via e-mail of any changes in one or more CFR titles and/or parts, as they are published in the Federal Register.

Search Engine Project

The fifth installment of our ongoing effort to improve the accessibility of GPO Access resources through popular search engines has been completed.
Although the full report is available on the Federal Bulletin Board, the following stood out among the results of this effort:

* The numbers indicated that overall performance again declined, with test searches returning a top-30 hit only 25% of the time. Top-10 returns dropped to 21% for all searches.
* Of the seven GPO Access pages studied, four did improve in top-30 performance (U.S. Government Online Bookstore 67%, Ben's Guide to the U.S. Government 43%, The Catalog of U.S. Government Publications 32%, and the Federal Register 15%), but three did worse (CBDNet -3%, GPO Access Home Page -23%, and the Congressional Record -23%). Sadly, the decreases were in the pages that are most successful, bringing the overall average down.
* Of the 23 search engines studied, 12 increased in top-30 performance, 10 decreased, and two remained the same. GoogleUncleSam was far and away the best performer, as 46% of the searches yielded the appropriate GPO Access page in the search results. The next best was Excite at 37%, then Magellan at 34%, and Google, Lycos, and MSN Search rounded out the top five at 31% each. FirstGov finished in a tie for 10th at 26%. Yahoo, the search engine/directory/portal discussed in an interesting article by Laura Cohen in the January issue of American Libraries, found the appropriate GPO Access page only 20% of the time.
* It was clear that what we have done to date is not working and that it is not easy for potential users to find the resources of GPO Access through these search engines.
* In addition to search engines, directories and other portals were examined as well. Based on our initial exploration, it is clear that we need to learn more about them and how they work before we can adequately measure their performance as it relates to GPO Access. Reportable results should come out of the sixth installment of the project.
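
As a rough illustration of how top-30 and top-10 hit rates like those above can be tallied, here is a minimal Python sketch. It is not the method used in the study: the record layout, the placeholder engine names, the sample ranks, and the function name hit_rates are all hypothetical, chosen only to show the arithmetic. Each test search is recorded as the engine used, the GPO Access page that should have been found, and the position at which that page appeared (or None if it did not appear at all).

```python
from collections import defaultdict


def hit_rates(results, cutoff=30):
    """Return, per engine, the share of test searches whose expected
    page appeared at or above the given rank cutoff.

    results: iterable of (engine, page, rank) tuples, where rank is the
    1-based position of the expected page, or None if it was not found.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for engine, _page, rank in results:
        totals[engine] += 1
        if rank is not None and rank <= cutoff:
            hits[engine] += 1
    return {engine: hits[engine] / totals[engine] for engine in totals}


# Hypothetical sample data; engine names and ranks are made up.
sample = [
    ("EngineA", "Federal Register", 4),
    ("EngineA", "Congressional Record", 35),
    ("EngineA", "GPO Access Home Page", None),
    ("EngineA", "CBDNet", 12),
    ("EngineB", "Federal Register", None),
    ("EngineB", "Congressional Record", 9),
]

top30 = hit_rates(sample, cutoff=30)
top10 = hit_rates(sample, cutoff=10)
```

Running the same records through two cutoffs keeps the top-30 and top-10 figures directly comparable, since both are computed over the same set of test searches.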
As a result of these findings, we have taken a number of steps to improve performance and the results of these actions will be measured as part of the sixth installment of the project. In doing so, we reached the end of those things that could be done for free, so we have begun to test the use of methods which have a cost associated with them. The steps taken are:

* We revamped the meta tags embedded in our study pages based on excellent feedback received in an open forum held at the Depository Library Conference in October.
* We have begun to insert Dublin Core metadata elements into major GPO Access pages to help search engines index these resources.
* We have subscribed to a submission service that registers our pages with over 1,000 search engines and directories each month and provides us with reports on the success of their efforts.
* We have purchased and begun using software to continually provide fresh submissions of our own to more than 1,000 search engines and directories.

In addition to monitoring the effects of these efforts to improve our positioning, we are also continuing our research to find out as much as possible about how search engines and directories work. There are a number of challenges to this, including the fact that these organizations are extremely reluctant to discuss their methods with us.

Another disturbing challenge is the apparent commercialization of the process. There are increasing indications that the industry is moving more and more toward allowing sites to purchase positioning, using techniques that give them favored status in the indexing process as a result of buying keywords or advertising. We have begun exploring the process of keyword buying and plan to test the procedure, despite our philosophical disagreement with the practice, to see if it can improve performance. The implications of this last trend are disturbing.
We do not yet know how pervasive the practice is, but we are compelled to learn about it and test it if we are to achieve the goal of improving the visibility of the products and services of GPO Access. The popularity of these imperfect tools demands it, since in our most recent survey one third of the respondents stated that they had found GPO Access through a search engine.

Thank you for your attention and I urge you to stop by Booth Number 347 and see the additions and changes to GPO Access. As always, I want to thank you for your feedback and I look forward to discussing your ideas for a better GPO Access during the conference.