2.1 Background and Introduction

This is a multi-group proposal from the University of Michigan for coordinated research and development to gain insight into the creation, operation, and use of advanced digital libraries. The participants are faculty, research staff, and students from the Department of Electrical and Computer Engineering, the School of Information and Library Studies, the Department of Atmospheric, Ocean and Space Sciences, the University Libraries, the Computer-Aided Engineering Network, and the Information Technology Division, together with partners from the Ann Arbor Public Schools and Library, the New York Public Library, Bellcore, McGraw-Hill, UMI, Elsevier, Encyclopedia Britannica, IBM, Apple, and Kodak. We also have access to significant public domain collections of scientific data and information. We use the term "digital library" as the generic name for federated structures that provide humans with both intellectual and physical access to the huge and growing worldwide networks of information encoded in multimedia digital formats.

Nearly all new information is now born in digital form (computer-created documents, DDD audio recordings, digital video recorders, Photo CD, scanners, etc.), and the rate of retrospective conversion is large and growing. Information (1) encoded in digital formats can be represented by electronic, optical, and magnetic phenomena, and can thus be transmitted and processed near the speed of light and stored at atomic-scale densities. This "de-massified" information world also operates at minuscule energy levels compared with the world of paper and transportation.

The fundamental mission and centuries-old tradition of libraries and the library profession has been to provide intellectual and physical access to, and preservation of, the human record. Although this mission will remain vitally important, the manner in which libraries fulfill it is being radically altered by the sudden change in the physical basis for information representation. This shift will alter the structures and processes by which humans create, find, and use/re-use the information they need and want. In particular, the digital library has the potential:

But realizing these potentials requires much more than "bitways" and network navigation of hierarchies of shared files. Although "surfing the Internet" provides physical access to useful information, we need much better systems to provide productive, selective, and relevant intellectual access in reasonable time. Building such systems requires fundamental research and the development of user-centered pilot projects that address a complex array of technical and socio-economic issues. It also requires major investment in educating and re-educating the people who will create, manage, and use this new world of digital libraries, and who will teach others to use it.

Recognizing the potential of digital libraries and related concepts such as the collaboratory, knowledge management models, and ubiquitous computing, the University of Michigan has set a strategic course for leadership in research, exploratory development, and education of the human resources needed to create, understand, and use such environments. In particular, many at the University are convinced that these ideas are the basis not only for new forms of commerce, but also for new forms of scholarly communication, new modes of collaborative learning, and indeed new forms of world universities.

Although the potential is high, we understand that realizing it requires long-term investment and mastery of a complex array of technical and social/organizational challenges. The performance of the raw technology is now well ahead of our understanding of how to apply it to serve real people doing real things. Without success in the kind of research and development proposed here, our greatest fear is that the technology will create disorder, confusion, and much wasted time.

As a major part of this strategic agenda, the President of the University, James Duderstadt, has given the School of Information and Library Studies (SILS) new leadership, new resources, and a new future-oriented mandate. The three-part mandate is 1) to restructure and augment itself to educate the information/knowledge resource professionals who will lead us into the new age of digital libraries; 2) to be a home and bridge-builder for multidisciplinary research focused on the design and understanding of these systems; and 3) to be a "skunk works" for creating testbeds to better understand and demonstrate the application of information technology in ways that contribute directly to the strategic, fundamental mission of the University (creation, dissemination, and preservation of knowledge).

To move quickly on the first part of the mandate, we have submitted a large five-year proposal to the W. K. Kellogg Foundation to enable us to form and lead a national consortium which will define and pilot a new program to educate information resource professionals for the world of digital libraries. The proposal has received strong support from Kellogg senior program officers and will be recommended for funding to the Foundation's Board of Directors in February 1994.

We are now submitting this multi-group proposal to enable rapid progress on the other parts of the mandate. We seek a four-year cooperative agreement under the NSF/ARPA/NASA Digital Library Initiative for coordinated experimental research to explore basic issues in the structure and behavior/function of large-scale, distributed (but federated), and continuously evolving digital information environments. The University and its external partners are offering significant co-investment.

We will focus particularly on mechanisms to support humans in the timely, relevant, and economical harvesting of the information they need and want in a vast and growing "information wilderness." Our ultimate goal is to help create an environment in which people (working alone or in groups) have on their desktops a personalized ("special") library built upon collections of worldwide information sources. We want an environment that makes clients smart about information and its use.

As illustrated by the "UMDL Venn diagram" in Figure 2.1, "Constituent Competencies," the project is organized around the synergistic intersection of three sub-activities: relevant basic research in computer, information, and social science; design and construction of an evolving testbed system; and its deployment, use, and evaluation. These activities build on the complementary strengths of multiple groups, both internal and external to the University. Figure 2.1 also summarizes the competencies and partners in the project. Figure 2.2, "Overview of the UMDL Project and Proposal," is a graphical overview of the project and the proposal that we hope will also serve as a reader's guide.

Figure 2.1. Constituent Competencies: the linkages among participants, research specializations, and the three focal points of the project's research.

Figure 2.2. Overview of the UMDL Project and Proposal: a roadmap for the entire project, with references to the relevant sections of the proposal.

2.2 Relevant Basic Research

Research toward the broad goal of personalized harvesting ("berry picking") in the information wilderness is organized around an agency-based architecture. Beginning from the perspective of the human's desktop (or other personal information appliance), we will explore the creation and evaluation of 1) information viewers and query facilities; 2) supporting search and retrieval services; 3) data and document structures; and 4) collection and access management services. Key issues for investigation in each of these areas are summarized below:

Information viewers and query facilities

Adaptive, context-dependent ("what for and when needed?") and experience (novice/expert) dependent interface agents.

Search and retrieval services

Methods for creating agents to conduct distributed database searches and make decisions with incomplete information.

Data and document structure services

Methods for structuring and extracting meta-data and native structure from documents and continuous media.

Collection and access management services

Development of protocols and ontologies for network federation across disparate systems.

2.3 Design and Construction of a Testbed

The proposed research is focused and grounded by the goal of designing, constructing, deploying, and evaluating a digital library testbed. Although much of the research will be generic with respect to subject area, our testbed will focus on the subject domain of earth and space science (ESS). This choice of ESS was motivated by the following considerations:

As part of this proposal, the University of Michigan plans a comprehensive deployment activity, both on and off the University of Michigan campus. Partnerships have been established with both the publishing community and a representative user community; these will allow us to test and evaluate the proposed research under realistic user conditions and with a large and representative collection. These established relationships, together with an image-based digital library system (DIRECT) already in place at the University, will allow us to begin initial deployment of the testbed immediately.

NSF is currently supporting an experimental collaboratory for upper atmospheric research involving Michigan scientists and colleagues in Denmark, the University of Maryland, and the Stanford Research Institute. The digital library testbed will be made available to the researchers in this project, and will provide an important supplement to the ongoing collaboratory research. The Foundation has also awarded several curriculum development projects to the Michigan AOSS and Computer Science faculty to develop new approaches to teaching and learning in this discipline at the high school level.

Basic science curricula in high schools contain earth and space science concepts, making these topics highly relevant to 10th-12th grade students. Through our digital library, it will be possible to provide these students with a broad range of information in a variety of formats and to connect them to researchers and research activity at the University, as well as at remote specialized laboratories. We have developed working relationships with several high school programs in Ann Arbor, and are developing them with Bloomfield Township, Battle Creek, the New York Public Library, and Stuyvesant High School in New York.

In addition to having significant links already established to these test user populations, we have also forged strategic alliances with several primary and secondary publishers of key journals, textbooks, magazines, and other reference materials relevant to this domain. These publishing partners, including Elsevier, McGraw-Hill, and University Microfilms, have agreed to make available an extensive array of published materials, in full-text, formatted form, for the purposes of this research project. Initially, many of these materials will be provided in image (scanned) form. However, we have agreements with all of these publishers to work aggressively with them to capture much of the material in source digital form, and to provide it to us over time in a structured digital form such as SGML.

The collection will include journal, monograph, and reference material spanning the range of user sophistication and types of resources, e.g., journals such as Aviation Week and Space Technology, Remote Sensing of Environment, and Atmospheric Research; the McGraw-Hill Encyclopedia of Science and Technology; and Elsevier's major bibliographic tool, GEOBASE.

Data sets to be included in the testbed will come from a variety of sources, including federal agencies, professional associations, and academic providers. Partnerships with academic units at the University will allow us to include significant data such as the EPA Air Quality Archives, IRIS Real-Time Seismic Data, and UNIDATA Real-Time Meteorological Data. These will be complemented by geographic data, including Michigan Land Use/Land Cover Data, the NASA GISS Global Vegetation and Land Use Databases, and an array of U.S. Geological Survey data.

We start deployment of the testbed with a significant advantage: a small software development project at the University of Michigan, DIRECT (Desktop Information Resources and Collaboration Technology), has produced a prototype digital library system for image-based documents. Funding for DIRECT has come from Digital Equipment Corporation and internal University funds. The initial deployment of DIRECT has been undertaken with a journal set provided by Elsevier Science Publishers under its TULIP (The University Licensing Program) initiative.

2.4 Deployment, Use, and Assessment

The client (user) communities for the testbed will include expert researchers, graduate, undergraduate, and high school students, and the general public. We will build a microcosm of content levels and media types ranging from page images to interactive, compound documents and real-time interaction with scientific data, replays of collaborative sessions, and human expertise. We will also address how users of the UMDL can add content back into the digital library.

Participants in this part of the project include clients and staff of the University of Michigan Libraries; clients and staff of the Business and Science Division of the New York City Public Library System; faculty, science teachers, media specialists/librarians, and students involved in creating and applying knowledge-scaffolding strategies for learner-centered science education at selected sites in the public schools and libraries mentioned above; and the international research community that has formed around the Upper Atmospheric Research Collaboratory project centered at the University of Michigan.

Usage will be monitored, both for any billing needs and anonymously for usage studies and research. Usage statistics will be fed back to the researchers and user-studies groups, who will in turn suggest changes, improvements, and new features for further testing. By repeating this iterative process throughout the project, the system will continue to evolve both to incorporate new research results and to meet the changing needs of the user community.

Currently, learners, whether high school students or library patrons, have very limited access to timely scientific information. Students must make do with textbooks that can be years out of date, while library patrons are confronted with digests of scientific information. The UM Digital Library effort should provide a model for how learners can make "interesting use" of timely, primary-source scientific information. That is, learners, whether high school students or adults, can access and manipulate scientific information to answer their own questions, and can carry on informed discussions and arguments using those data. From the educational standpoint, then, the challenge of this effort is how to scaffold the process so that learners can indeed transform information into knowledge and understanding.

Critical to the exploitation of these resources will be ongoing programs of training, user assistance, and outreach to promote use of the digital library. Closely associated with these issues of user support is the ongoing development of the collection of information resources through continued partnerships with information providers, including commercial, governmental, and academic sources. The user support structures envisioned for this project will bring together these themes of technical assistance, user skill development, and responsiveness to user needs, both in tapping existing information resources and in developing future ones.

2.5 Project Organization and Management

The project will be managed by a Project Operating Committee chaired by the Project Director, Professor and Dean Daniel Atkins. The Project Operating Committee will consist of representatives of each of the three areas of the project: basic research, testbed design and construction, and deployment and assessment. This committee will meet weekly to review project status and to direct or resolve operational issues. In addition, the operating committee and the entire project team will receive counsel from an external advisory group consisting of Peter Banks, Dean of the College of Engineering and Professor of Space Science, University of Michigan; John Seely Brown, Vice President and Chief Technologist, Xerox Corporation; Richard LeFaivre, Vice President, Advanced Technologies Group, Apple Corporation; Karen Hunter, Vice President and Special Advisor to the President, Elsevier Sciences Group; and Anne Okerson, Director, Office of Scientific and Academic Publishing, Association of Research Libraries.

1. We use the term information to represent the broader hierarchy of data, information, knowledge and, if you like, wisdom.

2. The digital library, appropriately generalized, is strongly related to the concept of a "collaboratory." As part of a collaboratory, the digital library supports the usual library function of informing and diffusing intellectual work. In addition, however, it offers the potential for capturing not only the end products of intellectual work, but also the process and rationale, both formal and informal, by which they are created.


Comments or questions may be sent to: UMDL.INFO@umich.edu