Interoperability

Report from Partial Group Meeting - Pisa, September 1997

The ERCIM part of the Interoperability Working Group met for an informal get-together during the First European Digital Library Conference in Pisa, Italy, on September 3, 1997.

The following researchers participate in the group (those in bold were present):

  • Markus Tresch / Hans-Joerg Schek (ETH Zurich, Switzerland)
  • Fausto Rabitti (CNR, Italy)
  • Sophie Cluet / Serge Abiteboul (INRIA, France)
  • Vassilis Christophides / Panos Constantopoulos (FORTH, Greece)
  • Olle Olsson / Sverker Janson (SICS, Sweden)

The goal of the workshop was to come up with a short position paper describing the group's opinion on the role of interoperability in digital libraries, to be sent to our US counterparts. The individual opinions are summarized below.

Fausto Rabitti

Topics of interest for CNR (CNUCE-IEI), Pisa:
  • Interoperability between Digital Libraries
    • standards (e.g., Z39.50) and their extensions
    • integration of traditional Online Library Systems (IR&DBMS technology, like ISIS, SABINI, and SQL-based systems) and Digital Library systems (like DIENST/NCSTRL)
    • extension of interoperation standards to accommodate digital documents and queries including multimedia data types

  • Supporting multimedia information retrieval in Digital Libraries
    • extensions of digital document structures as well as query languages to include multimedia data types
    • integration of indexing and retrieval mechanisms developed for multimedia databases (e.g. image databases)

Serge Abiteboul

From a DL perspective, database systems, and object database systems in particular, provide a good basis for future DL systems. More generally, database research provides solutions to many DL issues, even if these are partial or fragmented. From a DB perspective, digital libraries offer attractive applications and challenges for DBMS technology. They suggest a number of improvements to DBMSs that could be beneficial beyond DL applications.

We are studying DL from a database perspective, and more particularly the problems posed by storing documents in a database and accessing them from it. This is a continuation of previous work on document or hypertext databases, cartographical databases, and object repositories for software engineering or for the Web. We focus on data that is not as regular as in conventional (relational) database applications. We are interested in issues such as bulk loading, document restructuring, access to external documents, introduction of IR features in object database systems, and maintenance of data replicated under multiple forms (e.g., HTML and object database).

Olle Olsson

A digital library is defined by
  • its actual content
  • the way this content is described according to the description model for the library
  • the way the library can be accessed

The content of digital libraries will be broadened to cover not only classical document objects, but also all kinds of multimedia objects, as well as actual services. This means that not only need the objects in the library be described in novel ways, but also the entire concept of object delivery needs to be revised. It cannot be assumed that the object is delivered as a block of bytes to be rendered at the client site. A library can contain a translation service, with which the user (client) interacts to get the desired service.

The user cannot assume the responsibility of handling all the variations in description models and library access. This responsibility must reside in intelligent software that mediates between the view of the user and the view as seen from the library itself. This is where the concept of agents provides leverage.

Intelligent agents interact with and support the user in the task of clarifying what the user is interested in. The responsibility of the agent is both to map the user's descriptors into the models supported by the libraries, and to use these models to support the user's task of specifying the requirements.

The agent then contacts the digital libraries (information suppliers), adapting to the protocols supported by each library, translates the user's requirements into the structure and terminology supported by that library, and requests information about what matching objects the library possesses.

In the electronic marketplace, using the services of digital libraries may be associated with payments, which means that the user would like to have a good cost/benefit ratio. To off-load the cost/benefit evaluation from the user (imagine scanning thousands of libraries for some object), the agent assumes responsibility for finding a satisficing offer, that is, one that is good enough rather than provably the best. Future libraries will be able to offer information services at different quality levels, each associated with a different price. To further extend the market metaphor, the agent can negotiate with the libraries, with the aim of getting a better offer.

By introducing agents in this way as mediators between users and actual libraries, we achieve a greater degree of interoperability as well as flexibility. Heterogeneous library description models can be accommodated, as well as heterogeneous protocols for interacting with the libraries. Moreover, users do not have to adjust to new or updated standards in the basic set of library protocols.

The conclusion is that flexible interoperability is best achieved by adding a "middle-ware" layer of intelligent agents that mediate between libraries and users.
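As a rough illustration of the agent layer described above, the following Python sketch shows an agent that translates a user's term into each library's vocabulary and stops at the first offer that is good enough. It is only a sketch under stated assumptions: the names `Offer`, `LibraryAdapter`, and `find_satisficing_offer`, the price and quality fields, and the two toy libraries are all hypothetical and do not come from the position statement; real library protocols and negotiation are not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Offer:
    library: str
    item: str
    price: float    # hypothetical price unit
    quality: float  # hypothetical quality score between 0.0 and 1.0


@dataclass
class LibraryAdapter:
    """Hides one library's vocabulary and access method from the user."""
    name: str
    vocabulary: Dict[str, str]            # user term -> library term
    search: Callable[[str], List[Offer]]  # stands in for the library's protocol

    def query(self, user_term: str) -> List[Offer]:
        # Translate into the library's terminology; fall back to the
        # user's own term when no mapping is known.
        return self.search(self.vocabulary.get(user_term, user_term))


def find_satisficing_offer(adapters: List[LibraryAdapter], user_term: str,
                           max_price: float, min_quality: float) -> Optional[Offer]:
    """Return the first acceptable offer, not necessarily the best one."""
    for adapter in adapters:
        for offer in adapter.query(user_term):
            if offer.price <= max_price and offer.quality >= min_quality:
                return offer
    return None


# Two toy libraries with different vocabularies (illustrative data only).
def _library_a(term: str) -> List[Offer]:
    return {"cartography": [Offer("A", "Atlas scan", 12.0, 0.9)]}.get(term, [])


def _library_b(term: str) -> List[Offer]:
    return {"maps": [Offer("B", "Map collection", 4.0, 0.7)]}.get(term, [])


adapters = [
    LibraryAdapter("A", {"maps": "cartography"}, _library_a),
    LibraryAdapter("B", {}, _library_b),
]
offer = find_satisficing_offer(adapters, "maps", max_price=5.0, min_quality=0.6)
```

With these toy values, library A's offer is rejected as too expensive and the agent settles on library B's offer. The user only ever supplies the term "maps" and the acceptance thresholds; vocabulary differences stay inside the adapters.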

Vassilis Christophides

Providing uniform access to multiple, distributed, heterogeneous, autonomous databases is a topic that has been studied in the database research community for well over a decade (see related literature on Multidatabase/Federated Systems). We believe that Digital Libraries and Museums applications can benefit a lot from previous experience on this topic, in order to provide retrieval functionalities that go beyond traditional keyword- or full-text-based search. Some effort has to be devoted to examining in more detail which features of heterogeneous database querying and integration are also applicable in this context.

One fundamental requirement is interoperability, i.e., the ability to uniformly share, interpret and manipulate information from heterogeneous sources. There are several issues involved in the requirement of interoperability for Digital Libraries and Museums: semantic issues (e.g., interpreting and cross-relating information from various organizations, queries in multiple languages, etc.), syntactic issues (e.g., heterogeneity in data forms and structures, query processing capabilities, etc.) and systems issues (e.g., platforms, communication protocols, etc.).

During the last few years, the Information Systems & Software Technology Division of ICS-FORTH has been involved in several projects addressing the above issues: development of domain-specific ontologies, federated object-oriented schemata, Z39.50 & WWW gateways, etc. Today our research interests are directed towards the exploitation, using semantic models, of shared concepts and metadata between the various information sources in order to provide advanced query mediation services.

The term metadata refers to data about the meaning, content, format, organization, or purpose of the data. Metadata may be as simple as the general-purpose Access Points (e.g., the various Z39.50 profiles, the Dublin Core elements, etc.) which are widely used today, or more complex, such as entire structures of data (e.g., relational or object-oriented database schemata, SGML DTDs, etc.) and knowledge descriptions concerning the source, derivation, units, accuracy, and history of individual data items. A major challenge for interoperability is then metadata interchange (using for instance XML-Data), whereby each data source and potential receiver of information may operate with different metadata.

A mediator service may employ this approach, (semi-)automatically integrating the metadata of the sources in order to provide sophisticated query processing: a) determine the appropriate set of information sources to answer the query, and generate the appropriate subqueries or commands for each information source; and b) obtain results from the information sources, perform appropriate translation, filtering, and merging of the information, and return the final answer to the user or application. Knowledge about the metadata concerning the data sources not only allows the user to better refine his/her needs (and therefore improve precision), but also to express structure-based queries (and therefore derive information not actually stored in the data sources).
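Steps a) and b) can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the ICS-FORTH mediator: each source's metadata is reduced to a flat mapping from common field names to source-local field names, queries are equality conditions on common fields, and the names `Source`, `select_sources`, `make_subquery`, `translate`, and `mediate`, as well as the sample records, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Source:
    name: str
    field_map: Dict[str, str]  # common (mediator) field -> source-local field
    fetch: Callable[[Dict[str, str]], List[dict]]  # runs a subquery at the source


def select_sources(sources: List[Source], query: Dict[str, str]) -> List[Source]:
    """Step a), part 1: keep sources whose metadata covers every queried field."""
    return [s for s in sources if all(f in s.field_map for f in query)]


def make_subquery(source: Source, query: Dict[str, str]) -> Dict[str, str]:
    """Step a), part 2: rewrite the query using the source's own field names."""
    return {source.field_map[f]: v for f, v in query.items()}


def translate(source: Source, record: dict) -> dict:
    """Step b): map a source record back to the common field names."""
    reverse = {local: common for common, local in source.field_map.items()}
    return {reverse[k]: v for k, v in record.items() if k in reverse}


def mediate(sources: List[Source], query: Dict[str, str]) -> List[dict]:
    merged, seen = [], set()
    for source in select_sources(sources, query):
        for record in source.fetch(make_subquery(source, query)):
            common = translate(source, record)
            # Filter: re-check the query on the translated record.
            if any(common.get(f) != v for f, v in query.items()):
                continue
            key = tuple(sorted(common.items()))
            if key not in seen:  # merge: drop duplicates across sources
                seen.add(key)
                merged.append(common)
    return merged


def _fetch_from(rows: List[dict]) -> Callable[[Dict[str, str]], List[dict]]:
    """Build a toy source that answers equality subqueries over in-memory rows."""
    def fetch(subquery: Dict[str, str]) -> List[dict]:
        return [r for r in rows if all(r.get(k) == v for k, v in subquery.items())]
    return fetch


# Illustrative sources with made-up records and field names.
museum = Source("museum", {"creator": "artist", "title": "work"},
                _fetch_from([{"artist": "A. Author", "work": "Sample Map"}]))
library = Source("library", {"creator": "author", "title": "ti"},
                 _fetch_from([{"author": "A. Author", "ti": "Sample Map"},
                              {"author": "B. Writer", "ti": "Other Text"}]))
archive = Source("archive", {"title": "name"},
                 _fetch_from([{"name": "Sample Map"}]))

results = mediate([museum, library, archive], {"creator": "A. Author"})
```

In this toy run the archive is skipped because its metadata has no creator field, and the identical record returned by the museum and the library is merged into a single answer. Real metadata integration would need richer mappings (structure, units, value conversions) than the flat dictionaries assumed here.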

Markus Tresch

Database research groups have developed federated database technology. ETH Zurich, for example, has developed techniques for the coordination of distributed, heterogeneous, and autonomous subsystems in a CIM environment, which must now be made applicable to other applications such as digital libraries.

On top of the low-level protocols available for interoperability of digital libraries, concepts are required to coordinate the dependencies that will exist between information and services that are distributed (replicated and/or partitioned) over multiple digital libraries. Hence, federated database system technology for managing distributed, autonomous and heterogeneous library repositories must be employed.

In particular, this group will consider the following high-level interoperability issues:

  • multi-database query processing for efficiently retrieving distributed information that is stored in cooperating libraries,
  • distributed transaction processing for the consistent management of inter-library dependencies, e.g., as they occur between repositories and indexing services,
  • agents for monitoring autonomous library systems in order to propagate changes to distributed information.

The interoperability working group intends to maintain a close connection with the metadata working group because there are common research issues. In particular, the role of CORBA as an object-oriented communication infrastructure is central for both groups. Close relationships also exist with the retrieval and optimisation group.