March 9-11, 1997
Report Version: September 20, 1997
Supported by a Grant from the National Science Foundation
(NSF-IRI-9712586) to the
University of Michigan School of Information
Daniel E. Atkins, Project Director
atkins@umich.edu
This report was written by Paul Duguid (duguid@socrates.berkeley.edu) with editing and modifications by Daniel E. Atkins, based upon input from many workshop participants.
Three years after the launch of the Digital Library Initiative (DLI) and as the initial period of funding draws to a close, this workshop was convened to consider the next step in this vein of research. The very broad title of the workshop, "Distributed Knowledge Work Environments," was deliberately chosen to encourage thinking that would transcend current notions of digital libraries. The consensus after the workshop was, however, that the phrase "digital library" did not overly constrain another round of advanced initiatives and that the phrase "distributed knowledge work environments" was unnecessarily broad. It was noted, however, that the concept of a "digital library" is not merely equivalent to a digitized collection with information management tools. It is rather an environment to bring together collections, services, and people in support of the full life cycle of creation, dissemination, use, and preservation of data, information, and knowledge. The challenges and opportunities that motivate an advanced digital library research initiative are associated with this broad view of digital library environment. Additional digital library research will also both exploit and help motivate investments in advanced networking and high-end computation.
The participants included representatives of the initial DLI projects, representatives of the DLI funding agencies (NSF, DARPA, NASA), and representatives of various bodies both public and private involved in similar activities and/or for whom the work in progress offers considerable promise in dealing with the mounting problems of collecting, archiving, processing, and presenting digital data. (For a list of participants, see Appendix A.)
With the spread of the Internet worldwide, and the large-scale adoption of the World Wide Web as an environment for publishing and sharing, our society has stretched its vision of the reality of the anytime, anyplace, any format digital information world. We now more widely realize both the potential and the shortcomings of the Web, and the importance of improving the utility, effectiveness, performance, scalability and sustainability of current and future digital services and collections. Although the Web provides access to millions of sites, it can obscure quality, genre, and source of information.
Previous private and government investments in initiatives on scientific databases, collaboratories, and digital libraries have advanced the state of the art as well as education in these areas. They have helped the United States build upon its earlier lead in electronic publishing and information access technologies, and provided glimpses of how research, learning, government and commercial activities can become more competitive in a world entering the Information Age. Large and important sectors of the United States economy depend upon efficient support of knowledge workers, and large-scale funding of digital library projects in Australia, Japan, Singapore, Korea, the United Kingdom, and Europe provides opportunities for joint ventures as well as encouragement for further United States initiatives.
Work on digital libraries aims to help with generating, sharing and using knowledge. It aims to improve practices of communities so they are more effective, efficient, productive and maximize the benefits of collaboration. It seeks to extend the content and utility of digital libraries to aid existing communities and to facilitate the emergence of new communities of discourse, research, and learning. Communities in this case are defined on multiple dimensions: geography, common interests, values, needs, culture, language, goals, etc.
The workshop quickly revealed a strong, widely shared sense of the progress that has been made in understanding the primary digital library issues over the last three years and of the direction that future work might take. This report sets out to map that direction in broad terms, noting first the promise that participants saw in digital library research (section 2). It then goes on to discuss the central issues around which it seemed particularly important to frame future research in order to fulfill this promise (section 3). It presents this framework under the headings "System-Centered Issues," "Collection-Centered Issues," and "User-Centered Issues." (No priority is implied in this order.) It next turns to consider various interested groups with whom it seemed likely that partnerships would prove synergistic (section 4). Some of these are public institutions and some private. The report then reflects the many discussions held about the structure of future research, including questions of size and duration of projects (section 5). Finally, it offers some conclusions drawn at the workshop.
The participants of the workshop represented different interests and came from a range of distinct academic disciplines and sectors of society. At the workshop they divided into four heterogeneous breakout groups that worked together for two days and then reported back in a plenary session. (For the participants in each group, see Appendix B.)
Given the diversity of participants, there were, inevitably, diverging views. It would be misleading, then, for this report to suggest that there was simple unanimity. There was not. For example, some favored research in high-end systems with specialized testbeds, others championed a more populist approach; some took a tool-centered view, others were more system- or user-oriented. Few, though, argued that any of these views were mutually exclusive. Indeed, most argued for future initiatives that should embrace them all. The four working groups created basically the same set of research issues and programmatic features but with different emphases. Thus, at a high level, this report is genuinely a community consensus.
Overall, there was remarkable agreement on both the need for a complementary research program and the general direction it should take. Indeed, when participants in a concluding session were asked what they had heard that had surprised or discomfited them in the reports from groups in which they had not participated, there was little to report.
Dan Atkins of the University of Michigan opened the workshop by laying out the "vision space" created by the convergence of technological and social forces. He, with Y.T. Chien and John Cherniavsky of NSF, depicted a future of converging platforms, converging institutions, and converging media coupled with diverging possibilities and uses for digital libraries. Part of this future could be understood by extrapolation from present practice and technologies, Atkins argued, but much would require more radical insight, innovation, and transformation.
The participants responded to this plea for creative thinking with a variety of pictures of possible futures. The goal of creating large scale, ubiquitously accessible, fully integrated, user-centered digital libraries to support both learning and knowledge work was widely supported. Here, in particular, the value of digital library research for education, research, and commerce was especially clear.
Among the specific challenges researchers set themselves were:
Digital library research, it was argued, should provide users with toolkits to overcome the risk of an information surfeit, allowing people to navigate, to make sense, and to use productively increasingly rich and heterogeneous data sources. Overall, the goal of supporting analysis and discovery in the emerging information space was seen as a crucial promise.
Others pointed to the potential of transnational and multilingual databases to promote intercultural and international harmony. The pursuit of technical, semantic, and linguistic interoperability was presented as viable and particularly important. Several people noted that we should avoid an English-language-only approach.
In general terms, some of the visionary goals that would be made possible through the "coevolution of high performance computing and communication and digital libraries" might include systems that could support "collocation" and "collaboratories." These should be extensional, compositional, high-capability environments that seamlessly interconnect people and knowledge. Digital libraries should, for example, become part of the infrastructure supporting scientific analysis of collections of data. The ability to analyze terabytes of data will require supercomputer performance, and scientific applications in turn will need access to digital libraries as a mechanism for organizing and publishing observations and simulations.
High performance digital libraries might support a "unified field theory" of knowledge representation that could help capture and make usable everyday experience and present, summarize, and visualize data from dynamic sources. And through all this, the necessity for digital library research for facilitating collection, preservation, and long-term access (physical and intellectual access) to digital artifacts was revisited.
The goal is to help people find appropriate, timely, trusted information among the exploding number of digital collections offered by individuals and organizations. Furthermore, we reminded ourselves often that as information becomes more plentiful, human attention is becoming scarcer. Put more simply, it was suggested that the goal was to use the computer and communication technology that is fueling information overload to reverse it and, simultaneously, to eliminate unintended information depletion. Customizable individual and organizational tools capable of integrating documents, searching archives, and filtering data from repositories anywhere in the world would need to be prepared for desktop use by ordinary users.
In the near term, the digital library initiatives are directed toward scientific research and education. Longer term, the knowledge and technology realized through the initiatives will apply to a very wide array of globally connected research, learning, and commerce activities. Examples of the promise mentioned at the workshop are:
The purpose of a digital library initiative is to develop understanding and technology to a point where projects such as the above would be feasible.
If they differed on specifics, all agreed that the promise of digital libraries was both immense and crucially important to the national interest. They also agreed that there was a great deal of work to be done to fulfill the promise and that digital libraries research requires experimental systems with real collections and real users. Many noted that research will have to deal with the complex problem of balancing present demands and future goals.
To prompt thought on these issues and to move beyond conventional assumptions, the initial plenary session included presentations by Bill Arms (CNRI) and Ron Larsen (DARPA). Christine Borgman (UCLA) raised user-centered issues while Margaret Hedstrom (University of Michigan) spoke on the importance of long-term access to collections.
At its most general, the problem faced is one of managing complexity--the complexity of systems, of resources, and of users. Digital libraries must work with a highly diverse range of collections of digital objects, assembled on different principles by numerous contributors and continuously changing as more content and value are added to them. Equally, they must work with users who will be as diverse as society itself, with ever-changing needs and expectations, while breaking down conventional distinctions around which existing collections were shaped--such as expert/novice or provider/user. They must be useful to different communities for different purposes, at different times.
Some participants schematically grouped issues into three areas, each with its own particular tensions and problems. This report lays out the three separately here. It is important to note, however, first, that most saw these as interdependent, not independent, aspects of digital libraries research, and, second, that views diverged over where emphasis and concentration should best fall.
Work from the original DLI, coupled with the simultaneous impact of the World Wide Web, has brought into focus the system-centered issues that will need intensive study for systems, as one participant put it, "to provide the connective tissue to bring users and collections together." Research will continue to address system architectures and their functional components, addressing issues of scale, interoperability, extensibility, federation, and composability.
The information infrastructure has scaled up dramatically and, driven by Moore's law, continues to obey a power law for growth in capacity and an inverse power law for cost. Yet, as one participant put it, the real problem is that "We don't know yet how to use the Internet productively and effectively." This presents digital libraries with the challenge of applying increasing computational capacity and bandwidth to manage terabytes of information that need to be accessible in full, yet be reducible to usable, human scale.
These will require a scalable, open architecture that is
These unlimited resources will inevitably comprise multiple data sources, heterogeneous objects, and multiple schemas federated on a global scale. Moreover, they will be built on and consulted through diverse platforms by equally diverse and distributed users. Top-level issues here include issues of cross-domain, seamless federation that allows:
It was noted in several ways that the design of a digital library should not be posed as physical vs. digital objects (atoms vs. bits) but rather as co-existence and interoperability between the two. Emerging digital repositories will co-exist with more traditional libraries for an indefinite period. In addition, users of digital repositories will be converting digital documents to paper documents (i.e., printing and faxing them), as well as converting paper documents to digital ones (i.e., performing scanning and document recognition). Facilitating the transparent interoperation of paper and digital documents poses technical and social challenges.
Few of the huge number of collections a digital library will bring together will be static. Most will grow as the platform, the collections, and the users themselves develop and grow. Such changes will need adaptable, dynamic, flexible systems able to deal with interactive use. Issues here include
Across these adaptations, digital libraries will face the challenge of preserving and presenting context as a key way to provide structure to unstructured data. Doing so will call for a better understanding and deployment of:
Furthermore, multimedia and multimodal databases will present new challenges as users look beyond simple key word and Boolean text search for means to explore not just text but images, video, or music as well. Will we, one participant asked, be able to search by singing or by sketching as easily as by typing?
Around these issues, developing research will encounter issues of
As mentioned earlier, the stretched vision of an advanced library supports the full cycle of knowledge creation and use by individuals, teams, organizations, and communities. Special attention should be given to digital libraries that support collaboration in all four combinations of same or different time and same or different place.
A second initiative, many thought, would benefit from more symbiotic partnerships between systems developers and existing collections. This would bring to light a number of the issues faced by collection holders, including
One of the most energetically discussed questions at the conference involved issues of archiving. Preservation has, of course, been one of the major contributions of conventional libraries and will remain one for digital libraries. It was felt that more interaction between digital library research and investigations into long-term digital preservation would be particularly fruitful. In the past, preservation has mostly been addressed in practical ways and has reflected the need to rescue digital data from imminent destruction rather than to consider its long-term viability from the start. Discussion of a more principled approach returned the workshop again to issues of standards, metadata, and interoperability.
Research in digital libraries must always be motivated by the information needs of people. On-line information is breaking down the traditional separation between author, designer, publisher, librarian, user, archivist, etc. (And one person in his time plays many parts.) The rapid growth of on-line information has created a new set of research challenges that can be described as "human-centered research." Although many of the achievements of the current DLI have been user-centered, a new digital library initiative, many thought, would benefit by being even more responsive to users and use.
At the conference, there was considerable discussion about the question of information selectivity. How can all the information in the world be distilled into the small quantity most relevant to a specific individual? One approach to this question is to take digital library collections as they are and devise enhanced methods for exploring, searching, filtering, and so forth. Another approach is to consider the construction and use of collections together. Within this general theme, some of the issues raised were:
Continuing digital library work, many thought, should be responsive to the overlapping but distinct needs of individuals, communities, and institutions. It would face the challenge of simultaneously augmenting privacy and trust while underwriting seamless collaboration and collaboratories.
Steve Griffin of NSF presented some desirable distinctions between the current DLI and future programs. DLI, he noted, had involved
He suggested that future initiatives should, by contrast, have
Some participants further stressed the need for more emphasis on the applications of digital libraries in order to build user support for digital libraries, to deliver value to teachers and scholars in different contexts, to link up with the commercial publishing world, and to focus research in the most valuable directions.
In the plenary session on the first day of the workshop, Y.T. Chien of the NSF stressed the importance of strategic partnership--private, public, and international--to give synergy to digital library research. In order to build and deploy information and knowledge work environments, it was pointed out, coinvestment is required.
Partners will help share the cost of research in a variety of ways. Some will provide funding; others may bring their own expertise as an in-kind contribution. And yet others may bring existing collections, authentic users, and a valuable understanding of real-world problems to the table. As one put it, there should be "diverse ways of participating."
All natural types of collaborations and cofunding opportunities should be encouraged to meet research objectives. Some problems, like solving multilingual and multicultural access challenges, by their nature suggest certain types of collaboration (e.g., with initiatives in Europe or Asia). Some types of collections (e.g., images, space data, geographic information) suggest involvement of NASA, the European Space Agency, the National Agricultural Library, or others. Some private foundations may help fund and encourage digital library research in subjects beyond traditional science and engineering (e.g., the arts and humanities). Other foundations might invest in the exploration of digital libraries in the education of youth as well as the life-long learning of adults. In summary, cofunding to meet research objectives and increase leveraging of federal dollars is highly desirable.
At the workshop, several possible partners were represented; these included
Among those agencies potentially interested in joint-agency initiatives and sharing the costs of research with the NSF and participating institutions, people suggested DARPA and NASA (which had both been involved in DLI), and NIH. It was also hoped that private foundations might be willing to participate, especially in socially relevant application domains.
Participants also suggested agencies and institutions with collections of their own that might be interested in partnering research centers in continuing research. These included:
It was clearly recognized that research will need cooperation among many fields. These would include
Bruce Schatz of the University of Illinois introduced the discussion of structure of digital library initiatives at the plenary session. He suggested a pyramid of activities whose costs are shared among partners. A central aim should be to leverage funds and communities with a series of projects designed to draw on existing knowledge bases to produce a high return for low cost.
There was general agreement over the days of the workshop that any new initiative should be structured to encompass a "diversified portfolio" of research, embracing small and large, highly specific and highly general research trajectories. These would call for strategically structured funding initiatives involving cooperative agreements and grants with the various partners suggested in the previous section.
There was widespread agreement that to pursue the full range of research questions embraced by the digital library, small projects working in tandem with larger ones would be particularly efficacious. These could produce diversity without duplication and coordination without stifling of initiative. Both will be needed to pursue the different kinds of interoperability. Such a structure would also allow several projects to work around a particular testbed, allowing the digital collection to coordinate the research community (much as text collections have done in the past). It was generally agreed that one of the beneficial outcomes of the first initiative and related efforts is a cooperative community of digital library researchers and that a new initiative should seek to foster a similar but larger community.
Instead of the handful of projects of the first initiative, many proposed these alliances as a way to increase the number of projects by one or even two orders of magnitude without having a similar effect on cost. It was also suggested that different projects be organized with different periodicity to provide successive waves of research, building on antecedents, within the overall funding cycle.
It was suggested that such projects be selected with a mix of the following attributes:
· research projects--projects to engage in new research directions that will not likely be addressed by industry or routine implementation projects,
· testbed projects--building environments to support, focus, and inform research work, and
· clearinghouse projects--to gather and disseminate existing research, testbeds and tools.
It was suggested that projects could be funded for as little as $100,000 per year to as much as $5,000,000.
It was suggested that funding should be as much through cooperative agreements as through direct grants.
Sponsor funding of DLI has been leveraged at ratios of 1:1 to 2:1 (that is, $1 to $2 of matching in cash or in kind per $1 of Federal support), depending on the particular project. Similar or higher levels should be pursued on future projects:
· Testbed projects should be highly leveraged because of prior work and the potential support of commercial interests.
· Clearinghouse projects can be highly leveraged if ties are made to other clearinghouse or repository projects.
· Research projects must be highly leveraged because of their cost and because their success depends on partnerships.
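As a rough illustration of the leverage arithmetic above, the sketch below computes the total resources a project would command at a given matching ratio. The function name and the pairing of funding levels with ratios are assumptions made for illustration only; the workshop did not say which projects attained which ratios.

```python
def total_resources(federal_dollars: float, match_ratio: float) -> float:
    """Return federal support plus matching support.

    match_ratio is dollars of matching (cash or in kind) per $1 of
    federal support, so a 2:1 ratio is passed as 2.0.
    """
    if federal_dollars < 0 or match_ratio < 0:
        raise ValueError("amounts and ratios must be non-negative")
    return federal_dollars * (1.0 + match_ratio)


# Hypothetical pairings: the smallest and largest funding levels
# suggested at the workshop, at the 1:1 and 2:1 ratios reported for DLI.
for federal in (100_000, 5_000_000):
    for ratio in (1.0, 2.0):
        total = total_resources(federal, ratio)
        print(f"${federal:,} federal at {ratio:.0f}:1 -> ${total:,.0f} total")
```

Under these assumptions, each federal dollar supports two to three dollars of total activity, which is why higher matching ratios allow more projects without a proportional rise in federal cost.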
Projects might run from two to five years. The shorter term might be adequate for small projects. Larger projects would need the longer time to deliver robust results.
Workshop background and source material.
Please forward any comments to Daniel Atkins (atkins@umich.edu).