Intellectual Property and Economic Issues for Digital Libraries: A Framework for Research
Preface: This document presents some of the significant intellectual property and economic issues bearing on digital libraries, and outlines a framework for organizing public research on these and related questions. It was drafted by Spyros Lalis, Christos Nikolaou, and Michael P. Wellman, based in large part on contributions by members of the US-European Working Group on Intellectual Property and Economic Issues in Digital Libraries (DL.IPE). The current white paper should be considered a preliminary draft, not officially endorsed by the members of the working group.
Introduction
With the rapid advance of technology, information is increasingly being captured, processed and produced in digital form. Instead of employing special purpose facilities for printing text, developing film negatives, and broadcasting audio and moving images, information can be made available to the entire world simply by putting it on a server connected to the Internet. Information may then be accessed using off-the-shelf personal computers with appropriate software, rather than being communicated on paper or interpreted via dedicated devices such as radios, hi-fi systems, television receivers, and video recorders. In other words, all kinds of information can be published through a common, global infrastructure in a straightforward way.
Digital libraries (for example, traditional library institutions making their collections electronically available, publishing houses operating their own electronic archives, and other providers of digital works) are at the epicenter of this development. Acting as an essential link between readers, authors, and other digital libraries, they maintain indexed stores of digital intellectual works, guarantee free or favorable access for their community of recognized users, and perhaps even support value-added information services. Thus they are strongly interested in shaping and disseminating electronic publications in a cost-effective manner, controlling access to their collections, and achieving advanced interlibrary cooperation.
It is essential to realize that electronic publication revolutionizes the world of information, disrupting historical balances that have set the rules according to which publishers and libraries have cooperated for centuries. This change yields considerable advantages, but also gives rise to problems. On the one hand, information providers can publish information faster, cheaper, and for a wider client community than ever before, while consumers can consult a multitude of information sources located all over the world, from a single point of access. On the other hand, there are major intellectual property and economic issues. Since digital information can be copied at almost zero cost, at lightning speed, and without any loss of quality, making information available in digital form can be unattractive both for authors and distributors. In addition, due to the low cost of electronic publishing in combination with economic globalization, new information providers may emerge, which will compete with the existing publishing institutions in a global market for electronic information.
These are core issues of the modern information society. Thus, they should not be treated lightly, simply by acknowledging the fact that the format in which information is being conveyed has changed and that some jobs have become easier. Only by addressing them in an organized, practical, and holistic way will it be possible to upgrade structures and mechanisms of the traditional library and publishing business in order to achieve economic and social prosperity. In this paper we focus on major legal, economic and technical problems, give an overview of the controversial situations that arise due to the novel capabilities brought by technology, and point out open research issues.
Legal Issues
A key problem in electronic publishing is that current legislation does not deal with the intricacies of computer-based, networked systems, resulting in many gray areas [Dreier 98]. In some cases, strict application of law in its current form can even result in severe restrictions that eliminate advantages brought by technology. Of course, it is possible to reinterpret existing law in its application to intellectual works in a digital networked environment. But this will simply expand the area of uncertainty. It is in the interest of owners, intermediate information providers, and users of digital works to reduce the risks arising from ambiguous law or law that is costly to interpret.
Cached Copies
Generally speaking, copyright legislation forbids the copying and distribution of a work without license from its owner. Moreover, copyright protects work from being plagiarized, even if no act of copying takes place, thereby serving as a major guard of intellectual property. While both protection modes are absolutely necessary for digital works, the notion of a copy may have to be relaxed or refined for digital artifacts. The main reason is that it is generally desirable to download copies of electronic objects from servers into the local memory of computers before displaying them to the user.
Caching is essential for achieving satisfactory performance when network connections are slow and one wishes to randomly browse through the contents of large documents. This is becoming increasingly important, since state-of-the-art digital works can be highly complex multimedia compilations. In fact, digital objects can be more than passive units of information; they may contain active objects (programs) that have to be installed on the client machine in order to use the document. Hence, caching at least parts of a document into local memory cannot be avoided.
The notion of time is particularly critical in this respect. In other words: what should the lifetime of a cached copy be, or how long can a cached copy remain on a computer before becoming a "true" copy? Destroying cached data eagerly, perhaps while the user is still consulting the corresponding documents, clearly defeats the purpose. Yet on the other hand it is not acceptable to keep such copies indefinitely. A solution to this problem could be to keep cached copies in the memory of computers for the duration of the connection with the server providing access to the document, or to let cached copies bear a lifetime value indicating when these are to be destroyed. Disconnected operation as a result of temporary network failure poses an additional problem, because the time period during which there is no communication with the server holding the original document can be quite long. Should a cached copy then be regarded as a true copy, despite the fact that the user has no intention of duplicating the object in question?
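As an illustration of the two options just mentioned, the following minimal Python sketch lets each cached copy bear a lifetime value and also discards all copies when the session with the server ends. The names CachedCopy, DocumentCache, and end_session, and the use of seconds as the time unit, are assumptions made for this example only; the sketch does not settle the legal question of when a cached copy becomes a true copy.

```python
from dataclasses import dataclass


@dataclass
class CachedCopy:
    """A temporary copy of a digital object (hypothetical structure)."""
    object_id: str
    data: bytes
    fetched_at: float  # time of download, in seconds on the caller's clock
    lifetime: float    # how long the copy may be kept, in seconds

    def expired(self, now: float) -> bool:
        return now - self.fetched_at >= self.lifetime


class DocumentCache:
    """Keeps cached copies only for their lifetime or until the session ends."""

    def __init__(self):
        self._copies = {}

    def store(self, object_id, data, now, lifetime):
        self._copies[object_id] = CachedCopy(object_id, data, now, lifetime)

    def get(self, object_id, now):
        copy = self._copies.get(object_id)
        if copy is None:
            return None
        if copy.expired(now):
            # Lifetime elapsed: destroy the copy instead of serving it.
            del self._copies[object_id]
            return None
        return copy.data

    def end_session(self):
        # Connection to the server closed: destroy every cached copy.
        self._copies.clear()


cache = DocumentCache()
cache.store("doc-1", b"page data", now=0.0, lifetime=60.0)
within_lifetime = cache.get("doc-1", now=30.0)
after_lifetime = cache.get("doc-1", now=60.0)
```

Disconnected operation is deliberately left out: whether the lifetime should keep running while the server is unreachable is exactly the open question raised above.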
To address these questions, an unambiguous and practical distinction between true and cached copies of digital objects is needed, with the former being subject to copyright and the latter being considered merely as temporary copies that are created by software for efficiency purposes. It could also be important to distinguish between voluntary and involuntary acts of copying. For instance, explicitly activating a "save as" command from within programs that display digital works can be considered as an act of infringement, whereas the transparent creation of temporary copies by programs or the operating system itself could be regarded as fair use.
Access
Actually, in electronic publishing, distribution as defined in copyright law may be less important than access. Contrary to a physical object that cannot be viewed outside the premises of the institution where it is being kept unless it is copied and distributed to other locations, objects stored in on-line digital libraries can be viewed over the network. For public digital libraries connected to the World Wide Web, this has major consequences. Users are not primarily interested in copying published objects, because these can be accessed at any point in time and from any location with an Internet connection.
Straightforward access also introduces problems. By mounting a copyright protected document on an unprotected public server, one does not directly infringe copyright but does enable an act of copying and distribution by others. Thus, access turns distribution on its head, inverting the active and passive entities involved, which suggests that the notions of primary and contributory infringement be reversed. It would also be possible to handle passive and active infringement in analogy to public performance and distribution, respectively, as defined in the current copyright legislation. Recent court rulings in the US seem to adopt this rationale, having condemned intermediate information providers for exposing protected works to the open public.
Above all, this stresses the significance of access control. From a legal point of view, legislation concerning violation of access restrictions in computer systems must be developed to back up access management software and to discourage potential intruders from breaching access and copy protection systems. Closely related to this issue is the great controversy raised in the U.S. as bill H.R. 2281 is up for a vote in the House of Representatives. This bill introduces circumvention of copy-protection systems, or manufacturing of technology that can be used for this purpose, as a new kind of crime. It is claimed that this is done in a manner that could essentially nullify the doctrine of "fair use" and restrict activities that are otherwise lawful. For instance, according to this bill, going around a protection system is considered an offense even if the person is legally entitled to the information being accessed.
Traversing, Indexing, Archiving, Referencing
Due to the openness and increased interoperability of modern information systems, and web-based systems in particular, it becomes straightforward to implement programs that traverse collections and extract part or all of their content automatically. One reason for doing this is to locate objects of interest. This approach can also be used to populate private digital libraries and data warehouses, or to build private metadata indices that are used simply as a front-end for more efficient searching. Notably, the latter is the method employed by most search engines in order to construct their indices. The legality of these actions is questionable though. Using for free the advanced searching facilities that have been implemented by others, perhaps at a substantial cost, is not necessarily acceptable, even if the search interface or data offered is public.
An intuitive solution would be to let owners of digital collections specify whether automatic traversals are desired. It should also be possible for owners to adopt different access policies for various classes of clients. For instance, a node administrator of a federated digital library may wish to grant programs of cooperating institutions access to its collections, yet at the same time enjoy full legal protection against intruders. The economic impact also plays an important role. Making information collection traversals purely for private purposes (for example, to maintain a personal archive that is never published, without any intent to exploit results commercially) may be acceptable. On the contrary, extracting data out of a public web site to republish them or use them to publish value-added information could interfere with the commercial interests of the collection owner. A rudimentary legislative analysis is in order so that information brokers can rely on robust legislation, instead of gambling with the gray areas of law.
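One existing, purely technical convention for stating such traversal policies is the robots exclusion file (robots.txt), which this paper does not otherwise discuss. The sketch below uses Python's standard urllib.robotparser module to show how a collection owner might admit the harvesting program of a cooperating institution while excluding all other automatic traversals of its collections. The agent name partner-harvester, the host library.example, and the paths are hypothetical. The convention is advisory only and does not by itself give the owner any legal protection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy published by the collection owner.
POLICY = """\
User-agent: partner-harvester
Allow: /

User-agent: *
Disallow: /collections/
"""


def load_policy(text):
    """Parse a robots.txt policy supplied as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


policy = load_policy(POLICY)
item = "http://library.example/collections/item1"

# The cooperating institution's program may traverse the collection.
partner_allowed = policy.can_fetch("partner-harvester", item)

# Any other program is asked to stay out of the collection ...
stranger_allowed = policy.can_fetch("unknown-crawler", item)

# ... but may still read pages outside of it.
stranger_public = policy.can_fetch("unknown-crawler",
                                   "http://library.example/about.html")
```

A well-behaved traversal program would consult such a policy before each request; nothing in the mechanism prevents a program from ignoring it, which is why the legal questions above remain.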
In the U.S., this issue is being addressed via bill H.R. 2652: the "Collections of Information Antipiracy Act". However, critics argue that if the bill is enacted there will be no limit to the kind of information that can receive protection once it is put into a database. For example, this may seriously limit the access to public domain data simply by dropping them into a computerized database.
With the WWW technology supporting hyperlinks, references can also become the source of subtle legal problems. Namely, one may be violating the law simply by publishing a document that has links to other digital objects stored in remote web collections. The obvious case is when the referenced objects fall under the protective regime of copyright, because by introducing online links to them one essentially makes them available to the open public.
But legal issues go far beyond this simple scenario. A somewhat different case would be for an author to include a link to an object that is initially public domain, but becomes copyrighted at a later point in time without the author taking notice. Which party is then responsible if the protected document is copied or accessed via that particular link? A similar situation arises if the contents of the linked object change unexpectedly, say revealing illegal material (e.g., pornography). Furthermore, assume that the owner of a collection receives financial support via advertisements on the collection's main web page. Then, creating links that allow readers to access documents of this collection directly rather than through the main page represents a potential financial hazard. This is because the hit count of the main page may drop and companies may no longer wish to use this site for promotion.
Coming up with appropriate legislation for regulating these issues is important. Moreover, this must be achieved in a way that does not unnecessarily restrict the huge potential of hyperlink technology.
Privacy
Libraries have historically guarded the privacy of patron records quite zealously. There are laws in most parts of the U.S. that forbid libraries from revealing records of user activities in the library and that make libraries responsible for protecting user privacy. Thus, an important issue is to what extent digital library developers are required to respect user privacy and build in anonymity features to protect it.
For instance, it would be easy to compile a list of materials particular readers might be interested in and what prices they might be willing to pay for them. This is valuable information that can be used to adjust the prices of products, possibly but not necessarily in favor of the consumer, and to tune the performance parameters of the underlying system. But it is questionable whether this is fair use of user data.
The digital library developer might be tempted not just to compile such profiles itself, but also to offer user profiles for sale or exchange with other digital libraries. Suppose that digital libraries share lists of people who are not willing to pay the price requested for a piece of information. Who will be responsible for the defamation if a particular user is not really a deadbeat but has a legitimate grievance against the digital library? Is it legal to put a person's name on a list that is published without consent?
These issues are very important because computer systems can easily track user activity. Unless appropriate legislation is developed to guarantee privacy, people may be wary of using such systems.
Contracts vs. Copyright
One way to ameliorate the uncertainty about existing copyright and intellectual property laws is to rely more heavily on explicit contracts and license agreements. Using contracts to specify allowable uses can reduce the scope of situations where the background doctrines of fair use and copyright apply. This would require that librarians and other representatives of intellectual property users develop expertise in the design, negotiation, and verification of license agreements [ARL], [LIBLICENSE].
Digital libraries must also deal with legal problems that result from their international scope. Transactions can occur across many government and legal domains and it is not clear which laws control what content and actions. Digital libraries will have to conform to various jurisdictional laws and policies regarding the content provided as well as addressing differing intellectual property laws.
Virtual Space, Agents, Active Objects
The modern internetworked environment has many intricacies concerning the application of conventional law. The virtual space in which people and objects interact defies the customary understanding of public and private space. As a consequence, agreements whose legitimacy traditionally depended on whether the venue in which a transaction occurs is public or private space become very impractical for a global communication infrastructure that allows practically anyone to access anything from any location. A more promising direction seems to be to concentrate on the act being performed, instead of the location where it was initiated.
In addition, in computer-based systems, programs perform actions on behalf of users, both with and without their knowledge. Invoking software may thus result in the duplication and display of an object, or in trespassing of a protected information domain. A program can also delegate one or more tasks to other software agents, possibly residing on different machines, which in turn may activate other software components, so that a highly complex interaction may result from a single user request. Such (hidden) invocations will increase in the future, as we move from accessing static documents to interacting with computable objects. In other words, the reader of a work is transformed into a user interacting with active objects that may trigger additional actions on other objects, in ways quite different from what is possible with print.
It becomes evident that in order to establish human volition or agency it is necessary to investigate how systems may affect personal liability. The circumstances must be identified under which the party responsible for an illegal act performed by a program is the person that invoked it, the host institution that provided it and enabled its execution, or the person who developed it. Of particular significance are indirect acts of infringement or violation, where it is not always clear whether the offense can be attributed to the person who initiated the execution or to some other party that enabled activation of an intermediate program.
Economic and Policy Issues
Electronic publishing changes the way institutions produce and distribute intellectual work. Furthermore, it can boost new forms of business based on the exchange and combination of information to compile value-added products. In this newly emerging market for information, it is of major importance to determine how digital libraries can coexist and be economically viable. Hence, the essential ingredients of this market have to be captured and analyzed in order to gain insight about the rules that will lead to thriving information economies.
Operating Costs
The costs of electronic publishing and distribution are significantly less than those of print, but this does not mean that they have disappeared. Even though the process of editing manuscripts and distributing copies of books, journals, and magazines has changed, the process of filtering, labeling, refining, and packaging still requires considerable resources. Collecting material that comes in all possible digital formats, selecting a page layout, and choosing an appropriate way to structure documents within an edition can be time-consuming tasks. Moreover, technology produces new ways and tools for processing and presenting information, so that the staff must be continually trained to take full advantage of the new potential. In fact, the rate at which changes are imposed onto the publishing community is sometimes so high that there is hardly enough time to get accustomed to the current generation of tools before the arrival of the next generation.
An additional, and probably much more important, cost factor is indexing and archiving. Dazzled by the huge potential of technology one often forgets that once a digital work has been compiled or acquired, it must be classified, catalogued, and stored in a database from where it can be retrieved on request. Modern software systems support these tasks at little incremental cost, but their maintenance is not free. This is due to the fact that media and storage technology evolves rapidly, and one is forced to upgrade and extend such systems periodically, which can be quite expensive both in terms of time and money.
From the above, it can be inferred that the overhead involved in maintaining a digital library is far from negligible. It is thus necessary to identify the cost factors of electronic publishing and archiving, instead of burying them in the overhead attributed to the general "cost of doing business". Only then will it be possible to develop new economic models that can account for these costs and promote sustainable development.
Revenue Model
Digital technology has changed the cost structure of information duplication and distribution. In traditional paper publishing, duplication was costly and there was a strong tradeoff between duplication and fidelity. For example, it is less expensive to photocopy a book than to produce an offset-printed copy, but quality suffers. With a digitally published document, duplication cost is practically $0 with fidelity of 100%. To the extent that provision of publishing and distributing services is competitive, which seems to be the case, there will be a strong pressure to sell products and services that cost $0 for a price of $0. Zero or low per-copy transaction pricing will not amortize the cost of first-copy production and management of document collections.
It is precisely this new context that dictates a paradigm shift in the accompanying pricing strategies. New economic models are needed to achieve socially desirable competition and widespread publishing of digital documents. These must deviate from the traditional per-copy or per-unit approach. Building, operating, and managing collections of digital works are the new cost factors that greatly determine the pricing policy that is needed to recover value from users.
In this respect, nonlinear pricing schemes for information goods seem to play a decisive role [Varian 1996], [MacKie-Mason and Riveros 97], [MacKie-Mason and Jankovich 97]. The basic idea is simple: when the user preferences for the number of information goods are heterogeneous, publishers may be able to recover more value by offering a variable price per unit. In other words, total revenue is not linear in the quantity purchased, which effectively serves to sort customers into different price classes.
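The effect can be seen in a small numerical sketch. The Python code below compares the best uniform per-article price with one declining block tariff for two hypothetical consumers. The marginal valuations, the candidate prices, the tariff, and the assumption that an indifferent consumer buys are all invented for illustration and are not taken from the cited papers.

```python
def best_quantity(marginal_values, unit_prices):
    """Quantity that maximizes the consumer's surplus (ties favor buying more)."""
    best_q, best_surplus = 0, 0
    limit = min(len(marginal_values), len(unit_prices))
    for q in range(1, limit + 1):
        surplus = sum(marginal_values[:q]) - sum(unit_prices[:q])
        if surplus >= best_surplus:
            best_q, best_surplus = q, surplus
    return best_q


def revenue(consumers, unit_prices):
    """Total payment when every consumer picks a surplus-maximizing quantity."""
    return sum(sum(unit_prices[:best_quantity(values, unit_prices)])
               for values in consumers)


# Hypothetical willingness to pay for the 1st, 2nd, ... article.
light_reader = [10, 4]
heavy_reader = [10, 8, 6, 4]
consumers = [light_reader, heavy_reader]

# Linear pricing: the same price for every article, best of four candidates.
uniform_revenue = max(revenue(consumers, [p] * 4) for p in (4, 6, 8, 10))

# Nonlinear pricing: 10 for the first article, 4 for each further one.
block_tariff = [10, 4, 4, 4]
nonlinear_revenue = revenue(consumers, block_tariff)
```

Under these assumed numbers the best uniform price yields a revenue of 24, whereas the block tariff yields 36: the light reader pays 14 for two articles and the heavy reader pays 22 for four, so the tariff sorts the two readers into different effective price classes.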
It is also important to explore new product bundling structures in order to accommodate the transformation of publishing, including distribution, into a zero incremental cost business. We have barely begun to scratch this surface. As observed in [Bakos and Brynjolfsson 97], relatively little economic literature contemplates the possibility that the incremental cost of production and distribution for an item would be near zero. More recent work focuses on bundling and how different bundles of information goods affect the publisher's ability to extract value from consumers. This problem has been studied only for highly stylized settings, however. Further work is required to model the decision of a publisher to design multiple sub-bundles containing disjoint sets of some of the publisher's articles.
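A toy case shows why bundling can matter when the incremental cost is zero. In the Python sketch below, two hypothetical readers value two articles in opposite ways; the valuations are invented for illustration and the calculation is a textbook-style example, not the model of the cited work.

```python
def best_posted_price(valuations):
    """Return (revenue, price) for the best single posted price, zero marginal cost."""
    best = (0, 0)
    for price in sorted(set(valuations)):
        buyers = sum(1 for v in valuations if v >= price)
        best = max(best, (price * buyers, price))
    return best


# Hypothetical valuations: reader A mainly wants article 1, reader B article 2.
article_1 = [8, 2]
article_2 = [2, 8]

# Selling the articles separately, each at its own best price.
separate_revenue = (best_posted_price(article_1)[0]
                    + best_posted_price(article_2)[0])

# Selling both articles only as a bundle: each reader values the pair at 10.
bundle_values = [a + b for a, b in zip(article_1, article_2)]
bundle_revenue, bundle_price = best_posted_price(bundle_values)
```

With these numbers, separate sale earns 16 (each article is sold once at a price of 8), while a bundle priced at 10 is bought by both readers and earns 20. The gain comes from the bundle evening out the differences between the readers' valuations; it would shrink or vanish with other valuation patterns.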
When thinking about novel pricing and product bundling structures, it becomes evident that the problem space is too big. There are a surprisingly large number of different dimensions along which one can structure bundles or nonlinear pricing schemes. For instance, a journal comes in volumes consisting of several issues, which contain many articles. In turn, articles can be viewed as compositions of finer-grained objects, such as bibliographic header, abstract, sections, figures, pictures, video, and audio. Various bundles could be constructed along any of these dimensions; in addition, pricing could be nonlinear in the quantity of all or some of these items. There could also be differentiated strategies depending on the access rights being transferred.
Optimizing all of these factors is unlikely to be feasible, and consumers would surely rebel against offerings that are too complex by spending less on digital products. Publishing and distribution institutions and market forms will evolve with negotiation flexible only on a small subset of these many dimensions, just as in traditional publishing. These dimensions will have to be determined by the interaction between the heterogeneity of consumer preferences across different dimensions, and the costs of bundled production and distribution. Hence, it will be important for institutions to remain flexible. They will have to constantly monitor consumer behavior and dynamically adjust their policy to meet the demands of the market.
Market Positioning
With the advancement of technology, publishers, library institutions, and other information providers suddenly become players in the global market for digital information. Thus, seizing a firm position in this market becomes a major goal, since it will determine the viability of the digital libraries maintained by these institutions.
The Internet and the WWW are a tough place to compete. Users have an astonishingly accurate knowledge of what is being offered; they pinpoint high-quality work and abandon second-class sites without hesitation and at considerable speed. Then, with e-mail and thousands of interest groups around, the word spreads quickly, so that various tips and recommendations are forwarded within a few hours even to those who did not have time to investigate a particular domain.
Making digital work available for a low price is no long-term solution unless the product is of high fidelity [Getz 97]. In this case the provider might well receive a reasonable price for what is being offered. On the contrary, inferior products, even if offered for free, will drive users away from a digital library, so that it will get increasingly hard to recover operating costs no matter how low. Hence, quality of service and focus on specific client communities seem to be the rules of survival. This has to be achieved via rigorous market analysis, satisfactory performance, and efficient user interfaces. In turn, this requires further work in the corresponding research areas, targeted to the specific characteristics of the new, global communication infrastructure and the new media paradigms being used.
The position of traditional public and academic libraries in the information market is particularly difficult. With authors and publishers of articles, books, journals, and magazines being able to provide direct access to their digital works, library institutions face an identity crisis. Libraries have no option but to enter the market of free trade aggressively [Reenen 98], by engaging in publishing activities of their own, designed for their well-defined client communities. For instance, cultural libraries and museums can exploit the material that is already available in their premises, by digitizing it and making it electronically available to the wide public. Scientific and academic libraries can start to publish issues of conference proceedings, scientific journals and reports, and even course material suitable for distance-learning purposes. Public and scholarly libraries could also strive for advanced federative schemes that will enhance cooperation between participating institutions and enable the implementation of value-added information services. However, experience with large federated systems is limited, and further research is necessary to achieve truly satisfactory results [Lalis and Nikolaou 1998].
At the same time, it is important to work towards new formulas that will make it attractive for publishers to allow libraries to act as intermediate providers of digital works. In a sense, this is contradictory to the open market approach where readers purchase electronic documents as individuals directly from the source that produces them. However, a purely commodity-oriented model could erode the rights of peer groups, most notably researchers and academics, that have greatly contributed to the advancement of knowledge. This issue unavoidably boils down to the following simple questions: How significant is the primary function of a library, namely to provide certain communities with cheap (or entirely free) access to intellectual works? And assuming this is indeed acknowledged, who shall provide the financial support to leverage the acquisition cost of external works? These are questions that must be addressed in depth, carefully weighing their social and economic impact.
Market vs. Legal Incentives
One of the hottest issues in electronic publishing policy and research debate is the provision of incentives to avoid unlawful copying of digital work, to compensate authors for their effort, and to create new intellectual property. Legal mechanisms (e.g., copyright) certainly help, by discouraging users from using a digital object without providing compensation to its creator. In this case, a potential violator of a property right considers the balance of benefits from use without compensation and the expected costs of legal penalties in order to choose whether to respect the property rights or not. But since digital works can be accessed and copied easily and possibly without a trace, such legislation may not bring the desired results.
A different, and perhaps more effective, way to achieve these goals is to use market incentives. A market-based approach creates costs of use without compensation, which must be compared to the benefits of such use. Copy-protection schemes are a good example. Suppose that a company distributes digital works (e.g., documents and programs) in the form of "protected" packages that cannot be copied in a straightforward way. Although this mechanism may not be 100% bullet-proof, if the process of bypassing it is costly compared to the acquisition price, then people will most probably buy, rather than try to crack, digital products. There has been concern, though, that technical protection systems would be used to undermine access to public domain works, or to interfere with fair use or other public purposes of copyright law. Digital library developers need to understand the public policy implications of these protection systems.
In practice, a combination of legal and market methods is used to achieve the desirable behavior: violating copyright by photocopying a book is discouraged both by the expected legal penalties and by the cost of photocopying. The balance between these two methods of providing compensation to creators depends in large part on the relative costs and efficiencies of the methods. The radical decrease in the cost of reproduction technologies shifts the cost balance. Sound policy needs to be informed by a consideration of what is now and will soon be possible given the changes in relative costs. For example, it may be that creators will on average be able to obtain higher returns than before simply from market-based methods of recovery; thus a costly and possibly clumsy legal apparatus to accomplish the policy purpose may not be needed [Schlachter 97]. Conversely, the relative cost shift may be more helpful to those who would appropriate intellectual property without providing compensation, in which case a more restrictive legal method is needed to maintain the same level of compensation to creators.
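The balance described in this section can be written down as a simple comparison. The Python sketch below assumes a risk-neutral user whose benefit from uncompensated use equals the purchase price saved; the function names and all numbers (price, detection probability, penalty, copying and circumvention costs) are hypothetical and serve only to show how the two kinds of deterrent add up.

```python
def expected_cost_of_infringing(detection_probability, penalty, copying_cost):
    """Legal deterrent (expected penalty) plus market deterrent (cost of copying)."""
    return detection_probability * penalty + copying_cost


def would_infringe(price, detection_probability, penalty, copying_cost):
    """True if copying is cheaper, in expectation, than paying the price."""
    cost = expected_cost_of_infringing(detection_probability, penalty, copying_cost)
    return cost < price


# Photocopying a book: the copying cost alone nearly matches the price.
photocopy = would_infringe(price=40, detection_probability=0.01,
                           penalty=500, copying_cost=45)

# Unprotected digital copy: copying is free, only the legal deterrent remains.
digital = would_infringe(price=40, detection_probability=0.01,
                         penalty=500, copying_cost=0)

# Copy-protected package: bypassing the protection is costly (market method).
protected = would_infringe(price=40, detection_probability=0.01,
                           penalty=500, copying_cost=60)

# Stricter enforcement with free copying (legal method).
enforced = would_infringe(price=40, detection_probability=0.1,
                          penalty=500, copying_cost=0)
```

In this toy setting only the unprotected digital copy is worth infringing. Either a technical barrier or stronger enforcement restores the deterrent, which is the trade-off between market and legal methods discussed above.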
Technical Issues
The aforementioned legal and economic issues have to be addressed by taking into account several technical issues. This is because technology partly created these problems in the first place, and it is through technology that new visions and paradigms can be developed. The main research issues that will considerably influence the evolution of electronic publishing are described below.
Open Architectures for Electronic Publishing and Commerce
In order for economies based on the use and exchange of electronic information to function effectively, infrastructures are needed that provide middleware services. A comprehensive commerce infrastructure should support the entire commerce cycle, including finding goods and services (and associated agents) of interest, negotiating terms, as well as the actual exchange.
This requires a classification of the active components of such a system. Customers, merchants, institutions providing intermediary services, and intelligent agents acting on behalf of these players, must all be modelled in a sufficiently rigorous yet flexible way. In addition, it is necessary to describe the interactions between these parties and implement components that will support these functions efficiently. Thus, directory and repository services for locating the various components of the system, market services for matching requests of producers and consumers of goods, control mechanisms for authorization and protection, and languages for expressing the complex interactions of commercial transactions must be developed. These commonly used functions have to be factored out into system objects that can be shared across diverse hardware and software platforms. They should also be implemented in the form of generic services that provide support for basic requirements in electronic commerce applications, and which can be used to develop more advanced, value-added services.
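As a rough illustration of what a generic market service for matching producers and consumers might look like, consider the following sketch. The Offer type, the matching rule (highest bids served first, each paired with the cheapest compatible ask, trading at the asking price), and the party names are all assumptions made for this example; none of the cited architectures is claimed to work this way.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    party: str
    service: str
    price: float  # bid ceiling for a consumer, asking price for a producer


def match_requests(bids, asks):
    """Pair each consumer bid with the cheapest compatible, still-unmatched ask."""
    matches = []
    remaining = sorted(asks, key=lambda a: a.price)
    # Serve the highest bids first (an assumed priority rule).
    for bid in sorted(bids, key=lambda b: b.price, reverse=True):
        for ask in remaining:
            if ask.service == bid.service and ask.price <= bid.price:
                matches.append((bid.party, ask.party, ask.service, ask.price))
                remaining.remove(ask)
                break  # stop iterating right after mutating the list
    return matches


# Hypothetical participants and prices.
bids = [
    Offer("reader-1", "full-text-search", 5.0),
    Offer("reader-2", "full-text-search", 2.0),
    Offer("reader-3", "translation", 4.0),
]
asks = [
    Offer("library-A", "full-text-search", 3.0),
    Offer("library-B", "full-text-search", 4.5),
    Offer("library-C", "translation", 6.0),
]
matches = match_requests(bids, asks)
print(matches)  # [('reader-1', 'library-A', 'full-text-search', 3.0)]
```

Factoring the matching rule into a single function reflects the point made above: the same generic service can be reused by different applications, each supplying its own offers.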
Work in this area is underway in the form of general-purpose architectures and market-based resource allocation models [Ferguson et al. 95], [Lalis et al. 98], [Marazakis et al. 97], [MARX], [Mullen and Wellman 96], [Sairamesh et al. 96a], [Tsvetovaty et al. 97], which can support a variety of electronic commerce applications. There are also a number of projects that focus on mechanisms for establishing pricing structures and for effectively manipulating large amounts of information on the Internet, especially for networked digital libraries [Atkins et al. 96], [Paepcke et al. 96], [Sairamesh et al. 96b], [Schatz et al. 96], [Wilensky 96].
Cryptography and Watermarking
The Internet raises security issues that were unknown in closed, private systems. Information must be exchanged over untrustworthy network connections without allowing eavesdroppers to interpret it, or change and retransmit it without being noticed. It is important to guarantee that persons will not be able to masquerade, either to avoid the legal consequences of their actions, or to seize access rights they are not entitled to. Last but not least, in many cases it is desirable to protect digital products not only during their transfer between different computer systems, but also for their entire lifetime.
In this area, research has already produced many results. Cryptographic methods, such as the Data Encryption Standard and Public Key Cryptography, allow data to be encrypted in such a way that it is extremely time consuming to decode without the key. These methods have been widely exploited to implement a broad range of protection and authentication tools that can be bought off the shelf.
However, watermarking continues to be an open research issue. While it is possible to place digital signatures within objects, robustness must be enhanced to deal with various copying and modification schemes. The author's signature has to be preserved despite potential changes that may be inflicted on the original object, so that by-products can be tracked. Also, due to the particular processing intricacies of the various media, different watermarking techniques may have to be developed for text, pictures, moving images, and audio.
Contracting and Control Services
As already mentioned, transactions are getting increasingly complex, involving mediating software components that act on behalf of persons and institutions. Moreover, these components can reside on different machines, located in different institutional domains. It is thus necessary to develop control services for open, dynamic systems.
One challenge is to come up with scalable contracting, authorization, and access control schemes. The conventional Kerberos architecture, where tokens carry authorization and access information in a way that allows them to be exchanged between different systems without being manipulated, largely addresses secure authorization over insecure channels. But once clients receive data from the server, they have unlimited control over it.
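The idea of a token that can pass between systems without being manipulated can be sketched with a keyed message authentication code. This is a deliberately simplified stand-in and not the Kerberos protocol, which additionally involves ticket-granting services, session keys, and timestamps. The key, the claim names, and the token layout below are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical key shared by the issuing service and the content server.
SECRET_KEY = b"demo-key-shared-by-issuer-and-server"


def issue_token(claims, key=SECRET_KEY):
    """Serialize the claims and append an HMAC-SHA256 tag over them."""
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode("ascii") + "."
            + base64.urlsafe_b64encode(tag).decode("ascii"))


def verify_token(token, key=SECRET_KEY):
    """Return the claims if the tag matches the payload, otherwise None."""
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(payload)


token = issue_token({"user": "alice", "rights": ["read"]})
print(verify_token(token))  # {'rights': ['read'], 'user': 'alice'}

# A client that rewrites the claims but reuses the old tag is rejected.
forged_payload = json.dumps({"rights": ["read", "print"], "user": "alice"}, sort_keys=True)
forged = (base64.urlsafe_b64encode(forged_payload.encode("utf-8")).decode("ascii")
          + "." + token.split(".")[1])
print(verify_token(forged))  # None
```

Note that the sketch protects only the authorization decision. It does nothing about the limitation stated above: once the server has released the data, the token no longer constrains what the client does with it.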
With digital work growing beyond the notion of the traditional passive document, essentially becoming an active object, advanced protection mechanisms and new forms of compensation recovery are becoming feasible. Namely, it is possible to build systems where objects can be freely copied and distributed without losing control over who can effectively use them. These systems employ access control mechanisms that dynamically link to electronic payment or authorization systems to control usage at the point of consumption and allow users to dynamically acquire access rights by paying the corresponding amount of money. Examples include the work on trusted systems [Stefik 97a], [Stefik 97b], "cryptolope" technology [Lotspiech 97], consumption-oriented copyright enforcement mechanisms [Prevelakis et al. 97], the digital object identifier [Rosenblatt 97], and the ideas of superdistribution [Cox 97].
Notably, these technological developments may considerably reduce the importance of copyright law; code is an efficient means of regulation, simply because the user of a program has no choice. While it is still possible to "hack" around a technical protection system, this type of action is likely to be discouraged via appropriate legislation.
Another problem is that institutions may employ different paradigms for arranging their products and implementing access control internally. This means that appropriate standards and conversion mechanisms must be designed not only to pass control information from one system to another, but also to achieve semantic interoperability between the various subdomains of control. Thus, there is a strong need for establishing common terms and conditions that can accommodate the vast potential complexity of the product space, as well as languages and tools for understanding, negotiating, and contracting.
Standards for exchanging descriptions of products between computer systems, such as the Resource Description Framework [RDF] and the Standard for the Exchange of Product Data [STEP], are already available. Current work focuses on rights management mechanisms that allow users to inspect and edit protection rules for recording the usage terms and conditions for each property, support the storage and preservation of such rules, and allow their transmission over the network. An overview and a suggested combination of existing tools in an integrated infrastructure are given in [Gladney and Lotspiech 98]. This research area is very important for developing scalable and flexible contract technology that will enable dynamic negotiation, with a minimum of required trust, between the various components in an open distributed system such as the Internet.
Monitoring mechanisms are also needed to record the various actions that take place in the systems. This data can be further processed to produce value added traffic statistics according to which institutions may design their pricing, admission, and contracting policy. The Application Response Measurement standard [ARM], proposing how such information can be provided by the various underlying components, is definitely a step in the right direction. Moreover, with appropriate tools it should be possible to actively monitor or reconstruct complex transactions that spread across several domains. This can be extremely valuable in cases of dispute, where it is essential to identify the intermediate components that were activated as well as trace back the party that initiated the invocation.
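To illustrate how recorded actions might be used to trace a transaction back to its initiator, the following sketch assumes that every component logs an event carrying a shared transaction identifier and the name of the component that invoked it. The Event record and its field names are hypothetical and are not taken from the ARM standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    transaction_id: str
    component: str
    invoked_by: Optional[str]  # None marks the party that started the transaction


def reconstruct_chain(log, transaction_id, last_component):
    """Walk back from a component to the initiator of the given transaction."""
    callers = {e.component: e.invoked_by
               for e in log if e.transaction_id == transaction_id}
    chain = [last_component]
    while callers.get(chain[-1]) is not None:
        caller = callers[chain[-1]]
        if caller in chain:  # guard against cycles in a malformed log
            break
        chain.append(caller)
    return list(reversed(chain))


# Hypothetical log gathered from components in several domains.
log = [
    Event("tx-42", "user-agent", None),
    Event("tx-42", "broker", "user-agent"),
    Event("tx-42", "payment-service", "broker"),
    Event("tx-42", "library-server", "broker"),
    Event("tx-7", "library-server", "other-agent"),
]
print(reconstruct_chain(log, "tx-42", "library-server"))
# ['user-agent', 'broker', 'library-server']
```

The shared transaction identifier is what allows events recorded independently in different domains to be joined after the fact, which is the capability needed in a dispute.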
The Underlying Communication Infrastructure
The networking infrastructure enabled the vision of a global information system. It will also play a decisive role in the formation of information economies. Independently of the products offered to the public, digital libraries and information providers will largely depend on the network to efficiently serve their clients. Hence, while the importance of geography diminishes, it is the network topology that begins to influence the balance of electronic trade. Practically the same limitations that were imposed by geographic distances a few decades ago will now be caused by the lack of bandwidth.
For digital libraries, information providers, and any business connected to the Internet, the implication is clear. Consumers will often choose a site of a particular company or institution simply because it responds fast to their requests. Even if they are aware of a "better" site, they will not invoke its services if these prove to be considerably slower. Hence, fairness in electronic trade can be achieved only if there is enough connectivity to guarantee that clients will access competing sites via connections with sufficient capacity in order to evaluate their true capabilities without experiencing network delays. Connectivity also influences the willingness of users to respect legislation, such as copyright. For instance, a user who can promptly connect to a server and access information of interest will not feel the need to copy it. These considerations must be taken into account when designing the next-generation backbones that will support the communications of the information society.
Research Agenda
It is clear from the discussion above that the future of digital libraries is marked by considerable uncertainty, much of which can be categorized as issues of intellectual property and economics. Many open questions exist, both about what is technologically possible, and what will actually happen. Various types of organizations, including libraries, schools and universities, publishers, learned societies, and technology providers, as well as individuals (e.g., researchers, authors), have a stake in the outcomes, and would benefit generally from a reduction in uncertainty. Research may shed light on these questions, by exploring possibilities, and by collecting and disseminating data about the current situation to inform choices made by the various entities that will shape this future.
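Our proposed framework for DL.IPE is premised on a vision of the evolving information environment as an adaptive, open, exchange system for library services. Research can be organized at three levels: the institutional and social policy context, the architectures and mechanisms of the computational infrastructure, and the content and services provided.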
Any research agenda for DL.IPE must take into account the diversity of types of questions and stakeholders, and employ a corresponding variety of research modes and research disciplines.
Finally, we note that this research takes place within an extraordinarily dynamic environment, where new developments (technological, commercial, and institutional) regularly introduce dramatic structural changes to the operating environment. For this reason, it is doubtful that an agenda comprising specific questions to be addressed will be very stable. Nevertheless, examples of current questions are perhaps suggestive of possible near-term research achievements, and may provide a useful guide to those setting public research priorities.
Public and International Research
Before presenting the specific DL.IPE framework, some remarks are in order regarding the need for research that is publicly supported and internationally coordinated.
Although much useful research is being conducted, and will continue to be conducted, by private entities, there exists a distinct public interest in DL.IPE research. The various ways in which technology could develop will have different impacts on the respective stakeholders, and therefore there may be an important public sector role in facilitating those developments deemed most beneficial. For example, alternate configurations of electronic commerce infrastructure may make certain types of information distribution policies more or less feasible. Understanding exactly what these effects are will itself be an important product of publicly supported research.
Information about the economic environment (e.g., the demand for various forms of information goods and services) can be a significant strategic asset to participants in the information economy. For this reason, it is natural to expect that "market research" and other data collected by private entities may not be made available for public uses: decision making by policy makers and the broader class of participants. Therefore, it will be largely up to the public sector to provide such information, resulting in more principled policy decisions, and more effective deployment of resources by individuals, libraries, etc.
It is equally clear that the interest in DL.IPE research results spans national boundaries. Although many particular questions are specific to particular legal systems, existing academic institutions, or regional or cultural conventions, the information systems themselves will generally serve international constituencies. International coordination of research can serve both to make more effective use of research resources generally (organizing and disseminating results over a broader scope of participants), and to deal specifically with issues that arise at the interface of national systems (e.g., law, trade).
A Framework for DL.IPE Research
In framing DL.IPE research, we make several basic assumptions:
Attempting to understand, and even more so to design or influence, such a complex and dynamic system requires a broad array of tools from a variety of academic disciplines. The most directly relevant disciplines include social sciences (economics, psychology, sociology), information sciences (computer science, library science), and law. Although it is not possible to partition the research question by discipline (as argued below, many questions require multiple perspectives), our framework layers the concerns into realms loosely corresponding to these areas of study. At the broadest level, we consider the institutional and social policy context within which all digital library mechanisms must operate. The specific architectures and mechanisms available (the computational infrastructure) constitute the next level. The most detailed level concerns the particular content and services to be provided. We consider these layers briefly in turn.
Institutional and Social Policy Context
Any set of mechanisms that mediates exchanges among individuals and organizations is subject to some underlying institutional framework, including legal principles and statutes, mediating institutions, and conventions. For systems that deal with information goods and services, the most obviously relevant element of this underlying framework is intellectual property (IP) law. Applicable laws, and prevailing interpretations of those laws, define the status of existing intellectual property, and entitlements (e.g., fair use) of those "possessing" information. Moreover, IP law, in conjunction with broader commercial law, governs the forms of exchange of information services that are possible and enforceable within various legal jurisdictions.
It is commonplace to observe the incongruity of legal regimes crafted for old print media applied to new electronic realms. But rapidly changing technology leaves no alternatives; continual evolution of law and persistent uncertainty are to be expected as long as the target keeps moving. Interest groups exert political forces based on their best predictions of the effects of alternate regimes. One role of IP research is to inform the ongoing debates, so that all interested parties (including those who may not be actively participating in the process) may understand better the implications of current and proposed laws. For example, Samuelson's series of "Legally Speaking" articles in CACM (e.g., [Samuelson 97]) presents to computer specialists a legal scholar's perspective on issues affecting the distribution and use of their artifacts.
The policy context goes beyond IP law as well. Privacy policies dictate conventions or rules regarding dissemination of information about individuals. Certification authorities provide means to authenticate identity or other facts, for example by providing pedigrees for digital documents. The broader legal regime and enforcement authority determine what rules must be respected, and what remedies are available when they are not.
Architecture and Mechanism
Given an underlying policy context, there will be a large space of possible system architectures, and component mechanisms, for constructing digital libraries. As noted above, the dynamic open nature of the library environment argues against a rigid architectural design. That is, any design must allow for new entrants, and accommodate new types of information goods and services.
Perhaps the most basic need is for information infrastructure specifically to support institutions mediating arrangements among entities participating in digital libraries. Taking the predominant type of arrangement to be economic, that is, based on exchange of goods and services, the major type of infrastructure required is that supporting commerce. This includes much more than payment mechanisms; indeed, payment is only a small part of the final stage in commerce: executing the exchange. A comprehensive commerce infrastructure would support the entire commerce cycle, including finding goods and services (and associated agents) of interest, negotiating terms, as well as the actual exchange. Components of such an infrastructure (i.e., the "middleware") might include:
This list is a rough cut, and there is no real need at this point to decide exactly what should or should not be included. Much of it is necessarily demand-driven, based on what research and experience at the service level suggest. That is, from the IPE perspective, the aim is to design architectures and mechanisms that support effective bottom-up organization of the overall digital library.
Content and Services
Computational infrastructure is but an empty shell for creation of digital libraries. The substance of what the library actually provides is in the available content and services. Indeed the content and service providers themselves, driven by demands of information consumers, represent perhaps the major source of innovation for digital libraries, present and future.
Those providing library services face difficult decisions regarding what services they should develop, how they should be delivered, and on what terms they should be offered. Those requiring services (perhaps including some of the same entities providing other services) face equally hard choices about which services to use, and what terms to offer. These are not necessarily different fundamentally from analogous choices in the non-digital or semi-digital realm. However, as the digital environment is newer and thus less familiar (and the multiplicity of choices perhaps greater), there exists a great need for guidance in how to approach these decisions.
For instance, many academic electronic journals are relatively new enterprises, run by research communities without special expertise in publishing (e.g., JAIR, the Journal of Artificial Intelligence Research [Wellman and Minton 98]). As far as we know, there exists no systematic compilation of guidance for those setting up new journals, advising them on the many critical strategic decisions they face. This is probably due to the fact that there is no accepted wisdom on such matters. Similarly, society publishers taking their collections online (e.g., ACM's digital library) must blaze their own trails, as little collected experience yet exists on the successes or failures of such efforts.
Of course, there exists much work in the social sciences bearing on the behavior of researchers, learners, and other users of information. Similarly, economic models of consumers and producers may be applicable to these environments (though information goods have several special properties that can increase the complexity of analysis). However, these models require empirical calibration, often unsupportable by existing public data. For example, exactly what are the expected "first copy" costs of preparing content for various modes of distribution? The benefit side (that is, the value of information services) can be much harder to measure than the costs. Further research will be required to understand how to translate usage information into more generally applicable measures of value.
Research Modes
The preceding comments entail an extensive multidisciplinary program of research across a broad front of DL.IPE issues. In addition to the combination of disciplines, a successful body of effort must also include a combination of several research modes, including the following:
Conclusion
Technology has radically changed the world of publishing, thereby rewriting the rules according to which authors, publishers, libraries, and users interact with each other. Digital libraries are intricately entangled in this development. In transitioning from a world of print to electronic publications, there is an opportunity to analyze shifting costs, to design new economic models, to promote open infrastructures, and to adjust legislation. However, it is important that this is done in a balanced way that will protect both economic and social interests, thereby guaranteeing the welfare of the information society.
References
[Alrashid et al. 98] T.M. Alrashid, J.A. Barker, B.S. Christian, S.C. Cox, M.W. Rabne, E.A. Slotta, and L.R. Upthegrove, Safeguarding Copyrighted Contents: Digital Libraries and Intellectual Property Management, D-Lib Magazine, April 1998.
http://www.dlib.org/dlib/april98/04barker.html
[ARM] Application Response Measurement 2.0 API Guide.
http://www.cmg.org/regions/cmgarmw
[ARL] Association of Research Libraries, Principles for Licensing Electronic Resources Final Draft, 1997.
http://arl.cni.org/scomm/licensing/principles.html
[Atkins et al. 96] D.E. Atkins, W.P. Birmingham, E.H. Durfee, E.J. Glover, T. Mullen, E.A. Rundensteiner, E. Soloway, J.M. Vidal, R. Wallace, and M. Wellman, Toward Inquiry-Based Education Through Interacting Software Agents, IEEE Computer, Special Issue on the US DLI, 1996.
http://computer.org/computer/dli/r50069/r50069.htm
[Bakos and Brynjolfsson 97] Y. Bakos and E. Brynjolfsson, Bundling Information Goods: Pricing, Profits and Efficiency, The Economics of Digital Information (tentative), Cambridge MA, MIT Press, 1998.
[Cox 97] B. Cox, Object as Property, IEEE Software magazine, January 1997.
http://www.virtualschool.edu/cox/IEEE97.html
[Dreier 98] T. Dreier, Copyright Principles in a Digital Scientific World, I. Butterworth (ed.) The Impact of Electronic Publishing on the Academic Community, Portland Press, April 1997.
http://www.portlandpress.co.uk/books/online/tiepac/session2/ch2.htm
[Ferguson et al. 95] D.F. Ferguson, C. Nikolaou, J. Sairamesh, and Y. Yemini, Economic Models for Allocating Resources in Computer Systems, Scott Clearwater (ed.), Market Based Control: A Paradigm for Distributed Resource Allocation, World Scientific Publishing Co., 1995.
[Getz 97] M. Getz, An Economic Perspective on E-Publishing in Academia, Journal of Electronic Publishing, Archive.
http://www.press.umich.edu/jep/archive/getz.html
[Gladney and Lotspiech 98] H.M. Gladney and J.B. Lotspiech, Safeguarding Digital Library Contents and Users, D-Lib Magazine, May 1998.
[Lalis and Nikolaou 98] S. Lalis and C. Nikolaou, Federated Digital Libraries in Europe - A Forecast, Proceedings of the #### Conference, Academia Europea, Darmstadt, Germany, April 1998.
[Lalis et al. 98] S. Lalis, C. Nikolaou, D. Papadakis, and M. Marazakis, Market-Driven Service Allocation in a QoS-capable Environment, Technical Report TR 217, Institute of Computer Science FORTH, 1998.
[LIBLICENSE] Licensing digital information: A resource for librarians.
http://www.library.yale.edu/~llicense/index.shtml
[Lotspiech 97] J.B. Lotspiech, U. Kohl, and M.A. Kaplan, Safeguarding Digital Library Contents and Users: Protecting Documents Rather than Channels, D-Lib Magazine, September 1997.
http://www.dlib.org/dlib/september97/ibm/09lotspiech.html
[MacKie-Mason and Jankovich 97] J.K. MacKie-Mason and A. Jankovich, Pricing Electronic Access to Knowledge: A Field Experiment, Library Acquisitions: Theory and Practice, 1997.
[MacKie-Mason and Riveros 97] J.K. MacKie-Mason and J. Riveros, Economics and Electronic Access to Scholarly Information, The Economics of Digital Information (tentative), Cambridge MA, MIT Press, 1997.
[Marazakis et al. 97] M. Marazakis, D. Papadakis, and C. Nikolaou, The Aurora Architecture for Developing Network-Centric Applications by Dynamic Composition of Services, Technical Report TR 213, Institute of Computer Science FORTH, 1997.
[MARX] The MARX Project
http://ai.eecs.umich.edu/MARX/
[Mullen and Wellman 96] T. Mullen and M.P. Wellman, Market-based Negotiation for Digital Library Services, Proceedings of the 2nd USENIX Workshop on Electronic Commerce, November 1996.
http://ai.eecs.umich.edu/people/mullen/papers/usenix/
[Paepcke et al. 96] A. Paepcke, S.B. Cousins, H. Garcia-Molina, S.W. Hassan, S.P. Ketchpel, M. Roscheisen, and T. Winograd, Using Distributed Objects for Digital Library Interoperability, IEEE Computer, Vol. 29 No. 5, May 1996.
http://computer.org/computer/dli/r50061/r50061.htm
[Powell and Schatz] K. Powell and B. Schatz, The Interspace Kernel Architecture
http://www.canis.uiuc.edu/interspace/technical/canis-report-0004.html
[Prevelakis et al. 97] V. Prevelakis, D. Konstantas, and J.H. Morin, Issues for the Commercial Distribution of Electronic Documents, Proceedings of the 2nd European Research Seminar on Advances in Distributed Systems, Zinal, Switzerland, March 1997.
[RDF] Resource Description Framework
http://www.w3.org/RDF/
[Reenen 98] J. van Reenen, Library Consumerism in the Digital Age, The Journal of Electronic Publishing, March 1998.
http://www.press.umich.edu/jep/03-03/vanreenen.html
[Rosenblatt 97] B. Rosenblatt, The Digital Object Identifier - Solving the Dilemma of Copyright Protection Online, The Journal of Electronic Publishing, Vol. 3 No. 2, December 1997.
http://www.press.umich.edu/jep/03-02/doi.html
[Sairamesh et al. 96a] J. Sairamesh, D. Ferguson, and Y. Yemini, Pricing Paradigms in Information Systems and Networks, presented at the Workshop on Networking Games and Resource Allocation, New York, March 1996.
[Sairamesh et al. 96b] J. Sairamesh, C. Nikolaou, D.F. Ferguson, and Y. Yemini, Pricing Services in Digital Libraries, Proceedings of the 1st DELOS Workshop on Digital Libraries, Sophia Antipolis, March 1996.
[Samuelson 97] P. Samuelson, The never-ending struggle for balance, Communications of the ACM 40(5):17-21, 1997.
[Schatz et al. 96] B. Schatz, W.H. Mischo, T.W. Cole, J.B. Hardin, A.P. Bishop, and H. Chen, Federating Diverse Collections of Scientific Literature, IEEE Computer, Special Issue on the US DLI, 1996.
http://computer.org/computer/dli/r50028/r50028.htm
[Schlachter 97] E. Schlachter, The Intellectual Property Renaissance in Cyberspace: Why Copyright Law could be Unimportant on the Internet, Berkeley Technology Law Journal, Vol. 12 No. 1, 1997.
[Stefik 97a] M. Stefik, Shifting the Possible: How digital property rights challenge us to rethink digital publishing, Berkeley Technology Law Journal, Vol. 12 No. 1, pp. 137-159, 1997.
[Stefik 97b] M. Stefik, Trusted Systems, Scientific American, Vol. 276 No. 3, pp. 78-81, 1997.
[STEP] Standard for the Exchange of Product Data
http://www.nist.gov/sc4
[Tsvetovaty et al. 97] M. Tsvetovaty, M. Gini, B. Mobasher, and Z. Wieckowski, MAGMA: An Agent-based Virtual Market for Electronic Commerce, Applied Artificial Intelligence, 1997.
[Varian 96] H. Varian, Differential Pricing and Efficiency, School of Information Management and Systems, University of California, Berkeley CA, 1996.
[Wellman and Minton 98] M.P. Wellman and S. Minton, JAIR: An electronic journal by and for the AI research community, IEEE Expert 13(1), 1998.
[Wilensky 96] R. Wilensky, Toward Work-Centered Digital Information Services, IEEE Computer, Special Issue on the US DLI, 1996.
http://computer.org/computer/dli/r50037/r50037.htm
[WFTC 96] Workshop on Formalisms for Terms and Conditions, Report from the DLI workshop at Columbia University, September 24-26, 1996.