This essay has three main sections. The first section will explore the concept
of the digital library, examine how it differs from traditional libraries, and
in the process develop a definition of the digital library. The second section
is devoted to developing a set of criteria for evaluating digital library
models and proposing a conceptual model to help clarify what a digital library
should be. The final section concerns the aftermath of having created a
digital library. Once one exists, how should it be stocked with resources?
I. What is a Digital Library?
Before I begin talking about models for the digital library, or criteria for
judging one, I will first develop what I think a digital library is. The term
itself is widely used in a variety of contexts and has already been endowed
with so many connotations that to discuss a "digital library" without
explaining what is meant is to risk misunderstanding. At the same time, a
precise description is difficult since the concept of the digital library is
evolving rapidly. I shall endeavor to at least narrow the range of concepts
embodied in the rubric to a manageable and coherent set.
Toward a Working Definition
Thus, a digital library should have whatever set of resources is of most use to its clientele. And, since it is irrelevant to the search engine how the materials are displayed to the user, the organizational scheme that most benefits a set of users should be employed for that set. The flexibility of the digital library is one of its main selling points. The digital library can be many things because it can be customized. It should therefore be able to attract any user who does not currently use the traditional library because of the arcane or misunderstood organizational scheme employed there. If a user wants to browse a collection in Library of Congress cataloging order, so be it. If she prefers a straight alphabetical order, then she should be accommodated.
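The point that display order is independent of the underlying collection can be illustrated with a minimal sketch. Everything named here (Item, lc_sort_key, ORDERINGS, browse, and the sample call numbers) is hypothetical and chosen only for illustration; real Library of Congress call numbers carry cutter numbers and dates that this simplified key ignores.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    call_number: str  # simplified LC-style call number (illustrative only)


def lc_sort_key(item):
    """Order by class letters, then by the numeric part as a number."""
    match = re.match(r"([A-Z]+)(\d+(?:\.\d+)?)", item.call_number)
    if match is None:
        # Unparseable call numbers fall back to plain string order.
        return (item.call_number, 0.0)
    return (match.group(1), float(match.group(2)))


# Each user preference maps to a sort key; the collection itself never changes.
ORDERINGS = {
    "lc": lc_sort_key,
    "alphabetical": lambda item: item.title.lower(),
}


def browse(collection, preference):
    """Return the same items, arranged the way this user prefers."""
    return sorted(collection, key=ORDERINGS[preference])


SAMPLE = [
    Item("Whales", "QL737"),
    Item("Archives", "CD950"),
    Item("Flight", "TL570"),
]
```

Calling browse(SAMPLE, "lc") and browse(SAMPLE, "alphabetical") returns the same three items in two different arrangements. Supporting a further scheme for another group of users would mean adding one entry to ORDERINGS, not reorganizing the collection.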
As for content, this is an even broader question. Just as the digital
library is not bound by a physical organization, it will not be (by its very
nature) bound by any "spatial" constraint--since "space" is no longer an apt
term. As both Bill Birmingham and Mike Wellman described [Lectures to ILS 605,
October 5, 1994 and November 30, 1994, respectively], and as is logical
given the capabilities of decentralized computing, there will not be a digital
library, but rather digital libraries, interconnected but independent. Many
heralds of the digital future speak with hushed tones about the wonders of
decentralization and how it will change the face of information retrieval and
access. These heralds have, perhaps, become too caught up in the future to
remember that, at present, information is already distributed. As outlined
above, there already exists a variety of paper libraries with specializations
ranging from the very broad to the very narrow. Libraries are already
decentralized. They are even interconnected, but use Interlibrary Loan (ILL)
in place of the NSF Backbone. The true advantage of the digital library will
not be decentralization or interconnectedness, but the speed at which a user
can access any particular digital library within the overall environment.
Existing models
A digital library can be defined broadly or narrowly. It seems to me that the
digital library is, in fact, not a new creation--we already have and use them,
although their current form is not as extensive or elaborate as what is
envisioned for the future. As a starting definition, a digital library is a
collection of electronic resources which can be searched from a common (often,
but not necessarily, central) location. Dialog, the well-known collection of
on-line databases, is a digital library, as is Lexis-Nexis. User interfaces
for these databases are not as developed as they might be, and require a great
deal of training and experience for successful use. Nonetheless, they are
digital libraries. A large number of disparate information resources are
available "under one roof," as it were, and although the vocabulary and syntax
needed to search one file may be different from that needed for another, the
process is essentially the same.
Although this definition is a good starting point, it does not quite capture
the scope (I might even say "grandeur") of what will soon be the digital
library. The future digital library should be much larger, should allow
searches without much, if any, specialized training or experience, should adapt
to the needs and expectations of the user, and should be.
II. Archives as a Model
This section to be added
Any digital library design must, therefore, be able to expand to handle many more users, and provide access to many more resources, than now seem likely to exist. One lesson (and possibly even a rule) of the information revolution of the last few years is that use grows far faster than expectations. For example:
These are but two examples of the phenomenal growth experienced by Internet resources and service providers. While such rapid rates cannot continue indefinitely, it is reasonable to expect the number of service providers and users to increase significantly over the foreseeable future. Any system must be expandable beyond what seems reasonable at the time it is created.
When the digital library is perceived as better to use than the traditional library for a given purpose, the usefulness criterion will have been met. This is not to say that a digital library cannot be somewhat useful; it might work sufficiently well in a given area to be useful for one purpose, or for one group of people, but not for another.
Can users find information on a topic? If the system allows users to find information on a topic of interest to them, then the system has passed the most important test of usability. Obviously, a digital library will not contain information on every topic immediately. There will be a prolonged development stage before even an approximation of "all knowledge" is available. In the short run, though, if a digital library can provide the bulk of its users with information resources they need, it will be useable.
Is the information provided by the digital library at an appropriate level? A useable digital library will have information in a variety of formats and for a variety of purposes--from cursory and introductory to thorough and intensive. Not all users will want highly detailed, footnoted, and researched information; for many purposes, an overview will do. The age and education level of the individual user must also be taken into account. A grade school student will not be able to use a graduate-level explanation of how an aircraft flies, just as a Ph.D. candidate will have little use for a high-school text on political party theory.
Is the access interface sufficiently easy to use that it can be employed by people at different educational levels? No matter how good the program that matches users with resources, no good results will be achieved if the user cannot effectively tell the computer what he wants. The interface must be intuitive, and must be flexible enough that more advanced users, who better know how to use it, can access more advanced functions. A WWW-browsing tool like Netscape or Lynx might be an appropriate model for the interface used at a lower level of library savvy--much as inexperienced library users go right for the card catalog or on-line public-access catalog without first looking through a thesaurus of subject terms (LCSH, for example). For more advanced users, a less scripted interface would be appropriate.
A wonderful metaphor for this, coined by Yuri Rubinsky, is that, much like Disneyland, a digital library must keep the technology (the "magic", if you will) hidden. In Disneyland, the magic is tunnels beneath the entire park--the same tunnels are beneath Space Mountain as Mr. Toad's Wild Ride. In the digital library, the tunnels become the programming--completely transparent to the user. There is, in Mr. Rubinsky's phrase, no "difference between 'asking a question' and 'doing research'." [Yuri Rubinsky, Electronic Texts the Day after Tomorrow, p. 12.]
Usability is not an absolute; what works in one environment, with one group of users, will not work so well (or at all) in a different context. The system must therefore be able to communicate at a variety of levels.
There are certainly other measures by which a digital library can be evaluated. I am not considering economic measures because I do not think that they should be the guiding force by which a digital library should be judged--not to say that basic economic factors will not or should not be considered, but that while the costs of a digital library can be added quite easily, the benefits to society of better information are invaluable but unfortunately inestimable. In an article which advocates the quickest possible development and implementation of the digital library, Brian Hawkins of Brown University writes that "the electronic library is specifically both a solution to the economic problems facing libraries and a vehicle for a new functionality that promises to transform scholarship and bring the cultural, social, and economic benefits of information to many." [Brian Hawkins, "Creating the Library of the Future: Incrementalism Won't Get Us There", New Scholarship: New Serials, 1994.]
I think that this dream is what underpins much of the enthusiasm for the digital library. The digital library is neither the first technological/philosophical creation to be hailed as mankind's panacea, nor will it be the last. There is a tendency in the world of Internet and computer experts to do something just because it can be done. I think those people interested in the digital library should think carefully about what a digital library should be, and be careful to create a system that meets the criteria outlined above, and others, and does not provide features that are wonders of programming but do not serve a particularly useful function.
Collection development in the digital library will not be concerned with finding resources (as it is today), but with separating the wheat from the chaff. If previous experience with the Internet is any guide at all, the future of the digital library will be an information feast, not a famine. The proliferation of information on the Internet, and particularly on the World Wide Web (WWW), leads me to believe that the digital library's shelves will be filled. Again, based on the experience of the WWW, quantity is not likely to be the main issue, but quality without doubt is.
It might seem that a collection development policy is not important for a digital library because of the interconnected nature of the network. It is true that any resource will likely be made available on any terminal. But I think it also likely that the "Ann Arbor Public Virtual Library" will have some resources loaded and ready to use, while others will have to be located remotely (much like the distinction between the reference room and the stacks in most public and academic libraries today).
The digital library actually has two separate foci. A particular resource might be created with a very narrow audience in mind. The Human Genome Project, for example, may be vastly important to a certain class of scientists and scholars, but beyond that relatively small group is not comprehensible. Or a resource might be of very broad interest--electronic texts of out-of-copyright literature, for example, are in this category. A collection development policy does not really come into play at this level. However, at the broader, system, level, it does. In this case, it could well be embedded in the computer code that matches a user with a resource by whatever mechanism.
In the archival environment, it is sometimes possible to find a single collection that answers a specific question--especially if the subject of research is an individual. When the topic being covered is an institution, it is often necessary to examine several different collections to discover the entire story. Much the same will be true in the digital library environment. The relationships among and between collections are sometimes explicit--noted on the catalog card or computer record, as is the case in libraries as well--but often not. The contents of one collection will lead the researcher to a second, and so on. Digital collections must develop themselves so that the researcher who finds one collection can move to another transparently.
The digital library must make the same decisions, but as above must do so at a different level. The creator of a resource must determine its level of use and interest and inform the digital library of that so appropriate users can be directed to it. While this process might be hidden from the user, it must be done carefully and accurately. High-school aged users will find the digital library useless without materials accessible to that educational level, while the layperson will find abstruse technical descriptions of chemical reactions unintelligible. It goes without saying, though, that the system must not prevent users from finding information that is not at their presumed level.
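One way to reconcile these two requirements (directing users to material at their level without hiding anything else) is to rank rather than filter. The sketch below is only an illustration under stated assumptions: the four-step LEVELS scale, the Resource record, and rank_for_user are hypothetical names, and the creator-declared level is assumed to be a single label.

```python
from dataclasses import dataclass

# Hypothetical scale, ordered from least to most advanced.
LEVELS = ["grade school", "high school", "undergraduate", "graduate"]


@dataclass(frozen=True)
class Resource:
    title: str
    level: str  # declared by the resource's creator; must be one of LEVELS


def rank_for_user(resources, user_level):
    """Put resources nearest the user's presumed level first.

    Nothing is removed: material above or below the presumed level
    simply appears later in the list, so it can still be found.
    """
    target = LEVELS.index(user_level)
    return sorted(
        resources,
        key=lambda resource: abs(LEVELS.index(resource.level) - target),
    )
```

Because the function only reorders its input, a grade school student still sees the graduate-level explanation of flight, just after the introductory ones. A design that filtered by level would be simpler to present but would violate the requirement that users not be prevented from reaching material outside their presumed level.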
The digital environment is less sure, though; resources can come and go, and be changed at whim. A good digital library must act like an archives and seize on a good resource when it appears. The choice is less irrevocable for a digital library than an archives; the usual alternative disposition of documents not taken by an archives is the local landfill, which will not happen with electronic resources. Nonetheless, the digital library must be sure that its policy neither excludes chance arrivals nor accepts them all.
In neither digital libraries nor traditional libraries does a collection development policy state explicitly what resources should be purchased (except to the extent that periodically revised reference sources might be mentioned). In the digital library environment, however, such prescriptions are, in the short run, not recommended because sources will come and go. At present, WWW and other servers come on and off line with abandon, and often refuse to connect to users if too many are already accessing them. Furthermore, a truly useful resource could be entered into a digital library from one server, only to disappear when the creator moves on. Since the network will be decentralized--resources will not be stored on one machine--there is very little to prevent the only copy of a resource from vanishing into the void--the equivalent of "out of print" except that, in the digital environment, there would not necessarily be a library which possessed a copy from which it could be borrowed.
The need for a collection development policy in the digital library environment is probably even greater than it is for the traditional library. The traditional library is just that--traditional--so there are conventions and expectations about what should be found on the shelves. Since the mass-use digital library is a new and rapidly evolving concept, a collection development policy for it is all the more urgent. A digital library which includes resources simply because they are available might be acceptable in the earlier, testbed, phases of development, but a "working" digital library so created will be a very poor one indeed. Unfortunately, the foregoing discussion presumes the existence of sufficient resources to select among them. At the current stage of digital library development, implementing the above outline is not likely to be very effective. It is presented not as a description of what should be done today, but as a way of thinking about the problem in the future.
As a last thought, I would like to mention briefly an interesting model for developing WWW resources. A class titled Internet Resource Discovery and Organization has been offered at the University of Michigan's School of Information and Library Studies (SILS) for the past two years. It mixes the traditional collection development issues with the Internet. The course focuses on locating, evaluating and describing Internet resources on a specific topic. The evaluation and description take the form of a guide to that subject area which is made available through various Internet tools to the world at large. While only a beginning, the combined efforts of two years of students, and many other, non-SILS, people have resulted in about 150 subject guides to Internet resources--a first step toward cataloging the Internet. While that is not the avowed purpose of these guides, and they by no means cover all, or even most, of the information available, it is a start, and one that uses a mix of traditional library tools and concepts with the new organizational tools of the Internet.
As strong as the inclination to start fresh may be, I think that would be a mistake. The current world of library and archival science has a great deal to offer the digital library environment. Our expertise is not in programming, but in helping the programmers create interfaces between the database and the end user (who, it must never be forgotten, is not a computer expert, is not a librarian or archivist, and, for that matter, is not even an expert in navigating the public library of today). It is likely that many ILS types and many computer programmers will not see the importance of working with one another. While unfortunate, it is to be expected. Whatever systems are created and thrown into the marketplace for public use, the ones that are easiest to use and most successful at locating appropriate information resources will be the ones in highest demand (especially if there is money involved). The successful ones will take advantage of what both the traditional and digital libraries have to offer.