Interoperability
Report from Partial Group Meeting - Pisa, September 1997
The ERCIM part of the Interoperability Working Group met for
an informal get-together during the First European Digital Library Conference
in Pisa, Italy, on September 3, 1997.
The following researchers participate in the group (those in bold were present):
- Markus Tresch / Hans-Joerg Schek (ETH Zurich, Switzerland)
- Fausto Rabitti (CNR, Italy)
- Sophie Cluet / Serge Abiteboul (INRIA France)
- Vassilis Christophides / Panos Constantopoulos (FORTH Greece)
- Olle Olsson / Sverker Janson (SICS Sweden)
The goal of the workshop was to come up with a short position paper
describing the group's opinion on the role of interoperability in
digital libraries, to be sent to our US counterparts. The individual opinions
are summarized below.
Fausto Rabitti
Topics of interest for CNR (CNUCE-IEI), Pisa:
- Interoperability between Digital Libraries
- standards (e.g., Z39.50) and their extensions
- integration of traditional Online Library Systems (IR&DBMS technology, like ISIS, SABINI, and SQL-based systems) and Digital Library systems (like DIENST/NCSTRL)
- Extension of interoperation standards to accommodate digital documents and queries including multimedia data types
- Supporting multimedia information retrieval in Digital Libraries
- extensions of digital document structures as well as query languages to include multimedia data types
- integration of indexing and retrieval mechanisms developed for multimedia databases (e.g. image databases)
Serge Abiteboul
From a DL perspective, database systems, and in particular object
database systems, provide a good basis for future DL systems. More
generally, database research provides solutions to many DL issues, even
if these are partial or fragmented. From a DB perspective, digital
libraries offer attractive applications and challenges for DBMS
technology. They suggest a number of improvements to DBMSs that could
be beneficial beyond DL applications.
We are studying DL from a database perspective and more particularly
problems posed by the storing and accessing of documents in/from a
database. This is a continuation of previous work on document or
hypertext databases, cartographical databases, and object repositories
for software engineering or for the Web. We focus on data that is not
as regular as in conventional (relational) database applications. We
are interested in issues such as bulk loading, document restructuring,
access to external documents, introduction of IR features in object
database systems, and maintenance of data replicated under multiple forms
(e.g., HTML and object database).
Olle Olsson
A digital library is defined by:
- its actual content
- the way this content is described according to the description model for the library
- the way the library can be accessed
The content of digital libraries will be broadened to cover not only
classical document objects, but also all kinds of multimedia objects,
as well as actual services. This means that not only need the objects
in the library be described in novel ways, but also the entire concept
of object delivery needs to be revised. It cannot be assumed that the
object is delivered as a block of bytes to be rendered at the client
site. A library can contain a translation service, with which the user
(client) interacts to get the desired service.
The user cannot assume the responsibility to handle all the variations
in description models and library access. This responsibility must
rest with intelligent software that mediates between the view of the
user and the view as seen from the library itself. This is where the
concept of agents provides leverage.
Intelligent agents interact with and support the user in the task of
clarifying what the user is interested in. The responsibility of the
agent is both to map the user's descriptors into the models supported by
the libraries, and to use these models to support the user's task of
specifying the requirements.
The agent then contacts the digital libraries (information suppliers),
adapting to the protocols supported by each library, translates the
user's requirements into the structure and terminology supported by the
library, and requests information about what matching objects the
library possesses.
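
The translation step described above can be illustrated with a minimal sketch. It assumes that each library's description model can be reduced to a simple field-name mapping; the `LibraryProfile` class, the `translate_query` function, and the field names are hypothetical and not part of any system mentioned in this report.

```python
from dataclasses import dataclass


@dataclass
class LibraryProfile:
    """Hypothetical description of one library and its field vocabulary."""
    name: str
    field_map: dict  # user-level field name -> library-specific field name


def translate_query(user_query: dict, profile: LibraryProfile) -> dict:
    """Rewrite a user-level query into the terminology of one library.

    Fields the library does not support are dropped rather than guessed.
    """
    return {
        profile.field_map[field]: value
        for field, value in user_query.items()
        if field in profile.field_map
    }


# Two illustrative libraries with different description models.
lib_a = LibraryProfile("A", {"author": "creator", "title": "title"})
lib_b = LibraryProfile("B", {"author": "au", "title": "ti", "year": "yr"})

query = {"author": "Abiteboul", "year": 1997}
translated = {lib.name: translate_query(query, lib) for lib in (lib_a, lib_b)}
```

A real agent would also have to translate values (terminology, languages) and adapt to each library's protocol; the sketch covers only the renaming of fields.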
In the electronic marketplace, using the services of digital libraries
may be associated with payments, which means that the user would like
to have a good cost/benefit ratio. To off-load the cost/benefit
evaluation from the user (imagine scanning thousands of libraries for
some object), the agent assumes responsibility for finding a
satisficing offer. Future libraries will be able to offer information
services at different quality levels, each associated with a
different price. To further extend the market metaphor, the agent can
negotiate with the libraries, with the aim of
getting a better offer.
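
As a rough illustration of what finding a satisficing offer could mean in practice, the sketch below accepts the first offer that meets the user's thresholds instead of scanning every library for the optimum. The `Offer` fields, the threshold parameters, and the numeric quality score are assumptions made for the example only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Offer:
    library: str
    price: float    # cost quoted by the library
    quality: float  # agent's estimate of the benefit, from 0.0 to 1.0


def first_satisficing(offers: Iterable[Offer], max_price: float,
                      min_quality: float) -> Optional[Offer]:
    """Return the first offer that is good enough, or None if there is none.

    The loop stops at the first acceptable offer, so later (possibly
    better) offers are never examined.
    """
    for offer in offers:
        if offer.price <= max_price and offer.quality >= min_quality:
            return offer
    return None


offers = [
    Offer("A", price=12.0, quality=0.9),
    Offer("B", price=4.0, quality=0.7),
    Offer("C", price=3.0, quality=0.8),
]
chosen = first_satisficing(offers, max_price=5.0, min_quality=0.6)
```

Here the agent settles on library B even though C is cheaper and better, which is the trade-off satisficing makes in exchange for not contacting every library. Negotiation is not modelled.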
By introducing agents in this way as mediators between users and
actual libraries, we achieve a greater degree of interoperability as
well as flexibility. Heterogeneous library description models can be
accommodated, as well as heterogeneous protocols for interacting with
the libraries. And the users do not have to adjust to new/updated
standards in the basic set of library protocols.
The conclusion is that flexible interoperability is best achieved by
adding a "middle-ware" layer of intelligent agents that mediate
between libraries and users.
Vassilis Christophides
Providing uniform access to multiple, distributed, heterogeneous,
autonomous databases is a topic that has been studied in the database
research community for well over a decade (see related literature on
Multidatabase/Federated Systems). We believe that Digital Library
and Museum applications can benefit a lot from previous experience on
this topic, in order to provide retrieval functionalities that go
beyond traditional keyword- or full-text-based search. Some effort has
to be devoted to examining in more detail which features of
heterogeneous database querying and integration are also applicable in
this context.
One fundamental requirement is interoperability, i.e., the ability to
uniformly share, interpret and manipulate information from
heterogeneous sources. There are several issues involved in the
requirement of interoperability for Digital Libraries and Museums:
semantic issues (e.g., interpreting and cross-relating information
from various organizations, queries in multiple languages, etc.),
syntactic issues (e.g., heterogeneity in data forms and structures,
query processing capabilities, etc.) and systems issues (e.g.,
platforms, communication protocols, etc.).
During the last few years, the Information Systems & Software
Technology Division of ICS-FORTH has been involved in several projects
addressing the above issues: development of domain specific
ontologies, federated object-oriented schemata, Z39.50 & WWW gateways,
etc. Today our research interests are directed towards the
exploitation, using semantic models, of shared concepts and metadata
between the various information sources in order to provide advanced
query mediation services. The term metadata refers to data about the
meaning, content, format, organization, or purpose of the
data. Metadata may be as simple as the general purpose Access Points
(e.g., the various Z39.50 profiles, the Dublin Core elements, etc.)
which are widely used today, or more complex ones, such as entire
structures of data (e.g., relational or object-oriented database
schemata, SGML DTDs, etc.) and knowledge descriptions concerning the
source, derivation, units, accuracy, and history of individual data
items. A major challenge for interoperability is then metadata
interchange (using for instance XML-Data), whereby each data source
and potential receiver of information may operate with different
metadata.
A mediator service may employ this approach by
(semi-)automatically integrating the metadata of sources in order to
provide sophisticated query processing: a) determining the appropriate
set of information sources to answer the query and generating
the appropriate subqueries or commands for each information source, and
b) obtaining results from the information sources, performing appropriate
translation, filtering, and merging of the information, and
returning the final answer to the user or application. Knowledge about
the metadata concerning the data sources not only allows the user to
better refine his/her needs (and therefore improve precision), but
also to express structure-based queries (and therefore derive
information not actually stored in the data sources).
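
The two mediator steps can be sketched as follows. This is a minimal in-memory illustration, assuming that a source's metadata is simply the set of fields it can be queried on; the `Source` class, the `mediate` function, and the sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    name: str
    fields: set          # metadata: fields this source can be queried on
    records: List[dict]

    def search(self, subquery: dict) -> List[dict]:
        """Return records matching every field/value pair of the subquery."""
        return [r for r in self.records
                if all(r.get(k) == v for k, v in subquery.items())]


def mediate(query: dict, sources: List[Source]) -> List[dict]:
    merged, seen = [], set()
    for src in sources:
        # (a) Select only sources whose metadata covers the query; the
        #     subquery here is simply the query itself.
        if not set(query) <= src.fields:
            continue
        # (b) Collect results, filter duplicates, and record provenance.
        for rec in src.search(query):
            key = (rec.get("title"), rec.get("author"))
            if key not in seen:
                seen.add(key)
                merged.append({**rec, "source": src.name})
    return merged


sources = [
    Source("lib1", {"author", "title"}, [{"author": "X", "title": "T1"}]),
    Source("lib2", {"author", "title", "year"},
           [{"author": "X", "title": "T1", "year": 1997},
            {"author": "X", "title": "T2", "year": 1997}]),
    Source("lib3", {"title"}, [{"title": "T3"}]),
]
answer = mediate({"author": "X"}, sources)
```

In this example lib3 is never contacted because its metadata shows it cannot answer author queries, and the copy of T1 held by lib2 is filtered out during merging. A real mediator would also translate subqueries and results between schemata.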
Markus Tresch
Database research groups have developed federated database technology.
ETH Zurich, for example, has developed techniques for
the coordination of distributed, heterogeneous, and autonomous
subsystems in a CIM environment; these must now be made applicable to
other applications, such as digital libraries.
On top of the low-level protocols available for interoperability of digital
libraries, concepts are required to coordinate dependencies that will
exist between information and services that are distributed
(replicated and/or partitioned) over multiple digital libraries. Hence,
federated database system technology for managing distributed,
autonomous and heterogeneous library repositories must be employed.
In particular, this group will consider the following high-level
interoperability issues:
- multi-database query processing for efficiently retrieving distributed information that is stored in cooperating libraries,
- distributed transaction processing for the consistent management of inter-library dependencies, e.g., as they occur between repositories and indexing services,
- agents for monitoring autonomous library systems in order to propagate changes to distributed information.
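
The last item, a monitoring agent that propagates changes from a repository to a dependent indexing service, might look like the sketch below. It assumes the repository is autonomous and sends no notifications, so the agent polls and compares against what it saw last. All class names are hypothetical, and deletions are not handled.

```python
from typing import Dict, List


class Repository:
    """Hypothetical autonomous repository: document text keyed by id."""
    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}


class IndexService:
    """Hypothetical indexing service holding its own copy of each document."""
    def __init__(self) -> None:
        self.indexed: Dict[str, str] = {}

    def refresh(self, doc_id: str, text: str) -> None:
        self.indexed[doc_id] = text


class MonitorAgent:
    """Polls a repository and forwards new or changed documents to the index."""
    def __init__(self, repo: Repository, index: IndexService) -> None:
        self.repo = repo
        self.index = index
        self.last_seen: Dict[str, str] = {}

    def poll(self) -> List[str]:
        changed = [doc_id for doc_id, text in self.repo.docs.items()
                   if self.last_seen.get(doc_id) != text]
        for doc_id in changed:
            self.index.refresh(doc_id, self.repo.docs[doc_id])
            self.last_seen[doc_id] = self.repo.docs[doc_id]
        return changed


repo, index = Repository(), IndexService()
agent = MonitorAgent(repo, index)
repo.docs["d1"] = "first version"
first = agent.poll()
second = agent.poll()
repo.docs["d1"] = "second version"
third = agent.poll()
```

Consistency here is only eventual: between two polls the index may lag behind the repository, which is the kind of inter-library dependency that distributed transaction processing would address.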
The interoperability working group intends to maintain a close connection
with the metadata working group because there are common research
issues. In particular, the role of CORBA as an object-oriented communication
infrastructure is central for both groups. Close relationships also exist
with the retrieval and optimisation group.