We are now three years into the UMDL project. At the start of the project, one of our objectives was to explore various ways of defining, building, evaluating, and deploying digital libraries. Toward this objective, we created an agent architecture that served as the basis for a deployable testbed. This architecture was populated with a variety of agents that provide services, content, and user interfaces.
In the course of building and researching digital libraries over the past three years, we have gained both experience and insight into what digital libraries are all about. We have spent some time over the past several months evaluating what we have accomplished, where we have fallen short, and most importantly, where we need to go in the remaining year. The plans for the coming year reflect some of this evaluation.
Over the past 6 months, we have continued our development and integration of commerce mechanisms within the UMDL architecture. We designed protocols for automatic generation of markets for information goods and services, which were demonstrated at the December DLI meeting at Stanford. The auction manager determines which markets are applicable for which agents, and creates new service descriptions and markets when appropriate. Several additional agent types have been enhanced to participate in market-based negotiation.
The Service Classifier Agent (SCA), developed in 1996, is the first UMDL agent based on the ontology. To advertise their services, agents define their capabilities using terminology from the UMDL ontology. In other words, this is the language that is used to describe goods and services in UMDL.
In addition to defining capabilities, the SCA is able to perform inference over the language. This allows the SCA to provide sophisticated "directory" services, such as naming and lookup.
The SCA works as follows: agents send their definitions to the SCA. The SCA uses Loom, a description logic system, to place the definitions into their proper locations within a highly organized taxonomy of available services. This type of computational reasoning, called automatic classification, is made possible by the precision of ontological definition. We call the taxonomy of available services a "dynamic ontology", since concepts are defined at runtime. Agents can also define services that build on definitions created by other agents. Using the dynamic ontology, the SCA helps agents requesting services find the best agent available to meet their needs. Because ontologies are very expressive, descriptions of desired services can be highly complex: multi-dimensional, at any level of granularity, and from any perspective.
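The classification step described above can be illustrated with a minimal sketch. It assumes a drastically simplified model in which a service definition is just a set of features, and one definition subsumes another when its features are a subset of the other's. Loom's description logic is far more expressive than this; the class name, method names, and feature strings below are hypothetical and are not part of the SCA or of Loom.

```python
class ServiceTaxonomy:
    """Toy 'dynamic ontology': definitions arrive at runtime and are
    placed by subsumption (fewer features = more general)."""

    def __init__(self):
        self._defs = {}  # service name -> frozenset of features

    def define(self, name, features):
        """Register (or redefine) a service description at runtime."""
        self._defs[name] = frozenset(features)

    def parents(self, name):
        """Return the most specific definitions that strictly subsume `name`."""
        feats = self._defs[name]
        subsumers = {n: f for n, f in self._defs.items() if f < feats}
        # Drop any subsumer that is itself more general than another subsumer.
        return sorted(
            n for n, f in subsumers.items()
            if not any(f < g for g in subsumers.values())
        )

    def find(self, requested):
        """Return services satisfying the request, closest match first."""
        requested = frozenset(requested)
        matches = [
            (len(f - requested), n)
            for n, f in self._defs.items()
            if requested <= f
        ]
        return [n for _, n in sorted(matches)]


tax = ServiceTaxonomy()
tax.define("search", {"search"})
tax.define("text-search", {"search", "text"})
tax.define("astronomy-text-search", {"search", "text", "astronomy"})
print(tax.parents("astronomy-text-search"))
print(tax.find({"search", "text"}))
```

Because parents are computed on demand rather than stored, a more general or more specific definition registered later is picked up automatically, which is the property the term "dynamic ontology" refers to.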
In 1996, the SCA was utilized in the Service Markets Society (SMS), a simulation of a digital library implemented as a computational economy. The SMS is the most thorough demonstration of the UMDL architecture thus far in the project. Agents use auctions to buy services at prices that reflect current supply and demand. The SMS includes six types of agents, with potentially dozens of active instances. The SCA permits agents providing services in the SMS to dynamically define their capabilities, and agents requesting services to find the best service currently available. When new agents are launched that provide services that meet existing requests better than existing services, the agents requesting those services automatically switch to the new agent. The agents requesting services periodically repeat their search for the best available service, so they make the switch to the new service without requiring any modification or even notification. Thus service classification encourages third party development of new agents, by eliminating barriers to profitable participation in the society.
Substantial progress was made in 1996 towards developing, and using, the UMDL ontology. The content area of the ontology is now developed to the point where we are ready to use it to describe a small collection. We have also built an agent that uses the ontology to classify agent services, in a dynamic way that greatly enhances the ability of societies of agents to change and grow.
We decided to base the structure of the content area of the ontology on a hierarchy proposed by the International Federation of Library Associations (IFLA). This hierarchy describes work as a creative process, starting with an abstract idea, and ending with concrete (digital) information in the library. We extend the IFLA hierarchy to be appropriate for digital libraries by adding a final stage in the realization of work, and we greatly sharpen the original definitions by relating the stages to other ontology concepts. Thus, we have WORK (an abstract idea), EXPRESSION (content specified in some GENRE), MANIFESTATION (produced in a PUBLISHING-FORMAT), ITEM (encoded in a DIGITAL-FORMAT), and INSTANCE (a particular digital copy). Our current concept definitions are represented in stylized English. To test, refine, and demonstrate the ontology's content area, we have planned a project to convert 1000 records related to Beethoven from USMARC to a knowledge base in Loom. The ontology concept definitions will be translated manually from the stylized English to Loom's representation language.
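As an illustration of the five-stage hierarchy, the following sketch models each stage as a record that must realize a record of the immediately preceding stage. It is written in Python rather than Loom, and the class names, field names, and labels are hypothetical; it shows only the chain structure, not the ontology's actual concept definitions.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    WORK = 1            # an abstract idea
    EXPRESSION = 2      # content specified in some genre
    MANIFESTATION = 3   # produced in a publishing format
    ITEM = 4            # encoded in a digital format
    INSTANCE = 5        # a particular digital copy


@dataclass(frozen=True)
class Realization:
    stage: Stage
    label: str
    realizes: Optional["Realization"] = None

    def __post_init__(self):
        # A WORK is the root; every other stage realizes the stage before it.
        if self.stage is Stage.WORK:
            if self.realizes is not None:
                raise ValueError("a WORK is the root of the chain")
        elif self.realizes is None or self.realizes.stage != self.stage - 1:
            raise ValueError(f"{self.stage.name} must realize the preceding stage")

    def chain(self):
        """Stage names from the WORK down to this record."""
        node, names = self, []
        while node is not None:
            names.append(node.stage.name)
            node = node.realizes
        return names[::-1]


# Illustrative labels only; not records from the planned conversion project.
work = Realization(Stage.WORK, "a symphony (abstract idea)")
expr = Realization(Stage.EXPRESSION, "orchestral score", work)
manif = Realization(Stage.MANIFESTATION, "published edition", expr)
item = Realization(Stage.ITEM, "scanned page images", manif)
copy = Realization(Stage.INSTANCE, "copy on a library server", item)
print(copy.chain())
```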
Metadata creation in the UMDL proceeded at an accelerated pace in 1996. Rapid and substantial changes in the ontology demanded better and more comprehensive external representation. In support, members of the group examined numerous current metadata efforts, including USMARC, GILS, the Dublin Core, and MCF from Apple. This effort and the ontology development resulted in two substantial iterations of our metadata attributes and domains. We were able to represent all stages of realization of particular works in our current metadata set, which will enable anchors to particular licenses and services associated with different stages of a work, and provide a means of associating related works. We also began representing relationships, not only single concepts. Our first effort at this is seen in our audience/concept level attribute, which seeks to create a meaningful descriptive matrix of the two concepts. This enables a far more comprehensive and meaningful representation than having the two concepts in separate attributes. Further efforts to represent these kinds of relationships in metadata are continuing.
Overall, our efforts in this area so far have created a flexible, conceptually extensive metadata set for Intellectual Work, and a foundation for the next stage of representation of services and licenses.
The AUI component of the UMDL project has as its charter to push the state of the art in human-computer interaction in a digital library context. In mid-1996 we modified our group goals because we had become convinced that a substantial benefit in innovation could result from a much deeper understanding of information gathering and sensemaking activities, and from a more comprehensive resulting design process. We undertook a conscious effort of "user centered invention" to support extended information gathering, organizing, and using tasks. User centered invention is known to be difficult, so we spent considerable effort during the second half of the year developing a new general methodology in close conjunction with pursuing basic user understanding and system design. The methodology is based on what we are calling Technologically Informed Task Generalization Analysis (TI/TGA). It mixes a particularly broad and thorough look at a variety of related tasks with simultaneous technology explorations (building "micro-prototypes"). This yields a general understanding of what people need to do in diverse information gathering circumstances (in our case) that is more likely to allow novel technology solutions. In the second half of 1996 we developed this methodology, applied it to the information gathering tasks, drew design implications and inspirations from this analysis, built various micro-prototypes, and began the general system design.
For the coming year, our plan is to extend the Auction Manager, the mediator agent responsible for generating and tracking auction agents. Given a description of goods or services of interest, the Auction Manager will use the Service Classifier Agent to identify existing auctions serving those interests, and generate new auctions when necessary. Since the number of potential service offerings and negotiation options is unbounded, we require some mechanisms to manage the scope of markets actually available to agents. Our approach will be to balance the economic efficiency benefits of additional markets against their additional transaction costs. The Auction Manager will provide a vehicle for experimenting with alternate market configurations, auction-generation policies, and agent strategies for market search and bidding.
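One possible auction-generation policy is sketched below, under strong simplifying assumptions: descriptions are feature sets, the classifier is reduced to a caller-supplied matching function, and a new auction is opened only when an estimated efficiency gain exceeds a fixed transaction cost. The class, its parameters, and the threshold rule are all hypothetical; the report does not specify the actual policy.

```python
class AuctionManager:
    """Toy mediator: reuse a matching auction if one exists, otherwise
    open a new one only when it is worth its transaction cost."""

    def __init__(self, matches, transaction_cost):
        # matches(wanted, offered) -> bool stands in for the Service Classifier.
        self.matches = matches
        self.transaction_cost = transaction_cost
        self.auctions = []  # descriptions of currently open auctions

    def request(self, description, expected_gain):
        """Return the auctions an agent should use for `description`."""
        existing = [a for a in self.auctions if self.matches(description, a)]
        if existing:
            return existing
        if expected_gain > self.transaction_cost:
            self.auctions.append(description)
            return [description]
        return []  # not worth opening a new market


def subset_match(wanted, offered):
    """Assumed matching rule: the auction offers everything wanted."""
    return wanted <= offered


manager = AuctionManager(subset_match, transaction_cost=1.0)
astro = frozenset({"search", "astronomy"})
print(manager.request(astro, expected_gain=2.0))                   # opens a new auction
print(manager.request(frozenset({"search"}), expected_gain=0.1))   # reuses it
print(manager.request(frozenset({"translate"}), expected_gain=0.5))  # declined
```

Varying `transaction_cost` or swapping the matching function is one way such a component could be used to compare market configurations.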
In 1996, we first applied the ontology to service classification, a technique that helps agents select services (i.e., goods and markets) in societies of agents that are dynamic and evolving. In 1997, we plan to extend the capabilities of service classification to handle societies that are large and terminologically heterogeneous. In large societies, it will be necessary to have multiple Service Classifier Agents (SCAs) that communicate with each other. As the number and variety of agents grow, we expect that different terminologies will emerge; we term this a terminologically heterogeneous society. In these societies, SCAs will maintain ontologies that do not share all of their vocabulary, and it will be necessary to translate between these overlapping but different ontologies.
In 1996, the SCA was utilized in the Service Markets Society (SMS), a simulation of a digital library implemented as a computational economy. One of the goals of the SMS is to integrate various UMDL agents (e.g., auctions, task planners, and SCAs), and to show how they can be used both to demonstrate the principles of the UMDL architecture and to provide useful services.
The SMS currently includes six types of agents, with potentially dozens of active instances. In the second half of 1997, we plan to scale up the SMS to include thousands of agents. This simulation will provide a testbed for demonstrating and testing many facets of our research. These may include:
In 1997, we plan to continue the development of concept definitions in the ontology, and also to begin to test our work by using the ontology to describe a collection. This first application of the content area of the ontology will demonstrate the feasibility of developing ontology-based catalogs of various intellectual works, and the significant benefits for powerful queries that result. We also expect this work to help us advance the precision of our definitions in several tricky areas of the ontology.
The UMDL agent-building toolkit will be a set of tools for simplifying the task of building agents that are compliant with the UMDL Agent Architecture. The toolkit will allow users to design agents based on the protocols that they engage in to achieve goals, while abstracting away the complex code needed for communication. These protocols can be created specifically for new types of agents, or can be combined with previously designed protocols from a protocol library supplied with the toolkit. The library will contain protocols already used by UMDL, as well as by related agent systems, where possible.
These protocols define the type, sequence, and content of messages exchanged among a set of agents. Once the agent is specified in terms of the protocols it uses, skeleton code will be generated for the agent that implements the communication necessary for the agent to interact with other agents. Users will then fill in the necessary decision-making capabilities to make a fully functioning agent.
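To make the idea of protocol-driven skeleton generation concrete, here is a minimal sketch. It assumes a protocol is an ordered list of (sender role, receiver role, message type) steps and emits Python source with one unimplemented handler per message the chosen role receives. The protocol format, the function name, and the example roles are hypothetical; the toolkit's real specification language and output are not described in this report.

```python
def generate_skeleton(class_name, role, protocol):
    """Return Python source for an agent class playing `role` in `protocol`.

    Message types must be valid Python identifiers, since each one
    becomes part of a handler method name.
    """
    lines = [f"class {class_name}:"]
    for i, (sender, receiver, message) in enumerate(protocol):
        if receiver != role:
            continue
        # If the next step is sent by this role, mention it as the expected reply.
        nxt = protocol[i + 1] if i + 1 < len(protocol) else None
        reply = nxt[2] if nxt and nxt[0] == role else None
        hint = f"reply with {reply!r}" if reply else "no reply expected"
        lines.append(f"    def on_{message}(self, content):")
        lines.append(f'        """Handle {message!r} from {sender}; {hint}."""')
        lines.append("        raise NotImplementedError('decision logic goes here')")
        lines.append("")
    if len(lines) == 1:
        lines.append("    pass")  # role receives no messages in this protocol
    return "\n".join(lines)


# A hypothetical three-step negotiation protocol.
NEGOTIATION = [
    ("buyer", "seller", "request_bid"),
    ("seller", "buyer", "bid"),
    ("buyer", "seller", "accept"),
]

print(generate_skeleton("SellerAgent", "seller", NEGOTIATION))
```

The generated handlers raise `NotImplementedError` so that the places where a user must supply decision-making logic are explicit.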
In 1996, we developed two iterations of metadata drawn from the ontology, along with registry interfaces that use the metadata to catalogue collections. In 1997 we will further elaborate and define metadata elements drawn from the ontology, with at least one more iteration of the metadata describing works, and we will create metadata attributes and domains describing services.
This year, we will focus on taking our various design implications and design fragments and prototyping one or more designs based on the novel Pad++ (infinite zoom) and CCR (social/spatial computing) platforms. We will also continue to advance our understanding and modeling activities and our methodology.
Since the UMDL became operational, search and retrieval of web pages has been limited to conspectus queries. UMDL did not search the text of web pages in collection queries. Search and retrieval staff developed the web book agent to make it possible for UMDL users to search web pages.
A web book is a collection of pages from a single web site. Computer programs create web books by following hyperlinks from a head page to other pages that are related to the head page and reside on the same server. Web books are identified and gathered into a collection that is made available to the UMDL. Staff created computer-based tools that enable collection builders to find content and perform collection development activities on the World Wide Web. This is done through a web browser and standard forms, which the collection builder uses to identify a web book's head page. The builder can then review the automatically collected pages. When satisfied with them, the builder creates a conspectus entry for the web book.
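The gathering step can be sketched as a breadth-first traversal from the head page that keeps only links on the same host. This is an illustration, not the UMDL tool: the function names are hypothetical, "related to the head page" is approximated here by "same host", and page retrieval is delegated to a caller-supplied `fetch` function so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkParser(HTMLParser):
    """Collect href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def build_web_book(head_url, fetch, max_pages=100):
    """Return URLs reachable from head_url on the same host, breadth-first.

    fetch(url) must return the page's HTML as a string, or None on failure.
    """
    host = urlparse(head_url).netloc
    seen, queue, pages = {head_url}, deque([head_url]), []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages.append(url)
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            target = urldefrag(urljoin(url, href)).url
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


# A tiny in-memory "site" standing in for real HTTP retrieval.
SITE = {
    "http://example.org/": '<a href="a.html">A</a> <a href="http://other.example.com/x">X</a>',
    "http://example.org/a.html": '<a href="/">home</a> <a href="b.html#top">B</a>',
    "http://example.org/b.html": "no links here",
}
print(build_web_book("http://example.org/", SITE.get))
```

The returned list corresponds to the automatically collected pages that a collection builder would review before creating the conspectus entry.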
A collection maintenance program indexes both web book pages and conspectus entries using OpenText's PAT indexing algorithms. The collection interface agent (CIA) is linked to the newly indexed web book so that UMDL users can search and retrieve the web book through conspectus queries and collection queries. The collection query language of the UMDL allows users to search on many different parts of each page in the web book collection, e.g., title, author, subjects from the Broad Subject Ordering (BSO), language, and other criteria. When users' queries are searched in the web book collection, the CIA handles the UMDL query language and other protocols. It passes the query to a PAT-specific agent that handles OpenText's query language and search engine. Query results are enhanced with HTML tags so that UMDL's User Interface Agent can display web book retrievals to users on their web browser.
The past year has seen an evolution of the testbed into various communities of agents. One community makes up the production system, which is the collection of agents that implement UMDL as it will be deployed into the local public schools this year. Other communities implement research subsystems, which are prototypes for services and protocols that should eventually find their way into the production system.
This year has seen only a slight increase in the number of agents in the production system. However, every agent in the production system has been partially or completely redesigned and reimplemented to increase its performance, functionality, and dependability. Our goal is to evolve these agents from research prototypes to production-quality agents capable of providing the level of service necessary for our scale of deployment.
The most significant effort has gone into the redesign of our user interface agent (UIA) and the development of a completely new user interface, Artemis, implemented in Java. Artemis provides a graphical interface to a personal user workspace. Functions include vocabulary assistance for search, result browsing, search history maintenance, and bins for storing selected search results. The UIA was redesigned and reimplemented to provide all these functions.
Work is underway to develop a full-fledged Collection Interface Agent (CIA) for the encyclopedias in our testbed. The CIA will be implemented in Q1-2 '97 in a staged effort, gradually taking more advantage of the SGML capabilities. For example, it will begin with a simple implementation of title, subject, and full-text access, with simple hit lists; later work will take advantage of "scaffolding" mechanisms available through searching the SGML to improve feedback through the CIA.
New research has been initiated on the intersection between SGML and user interface issues, particularly in the area of navigation. The Meta Content Format (used, for example, with Pad++ and Apple's HotSauce) was automatically generated from the SGML subject information in order to develop graphical browsing of subject information.
In the campus's Digital Library Production Service, related work continues in developing and delivering significant content to the University community. This includes a collaborative undertaking with Cornell University to convert and provide access to approximately 5,000 volumes of American history resources, a visual resource project including some 10,000 museum images, and delivery of some 20,000 encoded texts in the humanities. Michigan is also in the process of loading all 1100 journals from Elsevier and embarking on pricing research with test institutional sites. Our SGML host program for humanities texts now has 12 customer sites for content and middleware.
Based on our experiences this past year in the high schools in Ann Arbor, we are in the process of revising our online curriculum materials (e.g., students tend not to read the materials, and thus we are moving toward conveying information in a more visual, direct manner). Moreover, we are working closely with the participating high school teachers to develop effective strategies for dealing with the wide range of issues involved in having students carry out science inquiry using online resources (e.g., how to help students stay engaged over a multi-day investigation, and how to help students reformulate their driving questions and online searches as a function of what they find during that investigation). Thus, we have developed a set of strategies for generating driving questions (e.g., brainstorm 3 questions before going to the computer lab; go to the computer lab and visit some "hot sites" first to gain background, and then develop 3 driving questions; develop 3 questions, since some questions will not be researchable using online resources). We are in the process of producing a "how to" guide for teachers, in which we codify instructional strategies that they might use to better support their students in the various inquiry activities.
Work will begin on building search scaffolding agents to increase the precision of results from Boolean-based retrieval engines. We plan to implement scaffolding for search engines using pre- and post-processing.
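One way pre- and post-processing could wrap a Boolean engine is sketched below. The specific steps are assumptions chosen for illustration and are not taken from the project plan: pre-processing turns a natural-language question into a conjunctive query by dropping common words, and post-processing re-orders the engine's hits by how many query terms appear in each title. The stopword list, function names, and hit format are all hypothetical.

```python
STOPWORDS = {"the", "a", "an", "of", "in", "on", "what", "is", "are", "how", "do"}


def preprocess(question):
    """Turn a question into a Boolean AND query of its content words."""
    terms = [w.strip("?.,!").lower() for w in question.split()]
    terms = [t for t in terms if t and t not in STOPWORDS]
    # dict.fromkeys removes duplicates while preserving order.
    return " AND ".join(dict.fromkeys(terms))


def postprocess(query, hits, limit=10):
    """Re-rank hits by the number of query terms found in the title."""
    terms = query.split(" AND ")
    ranked = sorted(hits, key=lambda h: -sum(t in h["title"].lower() for t in terms))
    return ranked[:limit]


def scaffolded_search(question, engine, limit=10):
    """engine(query) is any callable returning a list of {'title': ...} hits."""
    query = preprocess(question)
    return postprocess(query, engine(query), limit)


def fake_engine(query):
    """Stand-in for a real Boolean retrieval engine; ignores the query."""
    return [{"title": "Moon phases"}, {"title": "Age of the Moon"}]


print(preprocess("What is the age of the Moon?"))
print(scaffolded_search("What is the age of the Moon?", fake_engine))
```

Because `sorted` is stable, hits with equal scores keep the order in which the underlying engine returned them.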
In spring 1997, Artemis will be deployed as the front end to the UMDL. There will be a rich set of collections in the areas of geology and astronomy represented in the UMDL Conspectus. Students, therefore, should be able to rely solely on the UMDL for their online resources.
Inasmuch as the UMDL/Artemis explicitly scaffolds the inquiry process and provides a rich set of collections from which to draw, we expect that the high school students should be able to be more effective in their inquiry activities. We should see this impact in both process measures (e.g., amount of time spent online searching vs. time spent constructing an argument) and product measures (e.g., quality of final report). We have some baselines from this year's evaluation effort that we may be able to use for comparative purposes.
Work will continue on workshops to teach teachers and students about Internet and UMDL searching and the integration of results into classroom assignments. Work will begin on developing data-collection instruments to enable us to conduct a pilot study of high schoolers' success using UMDL. Basically, we are asking the question "Do UMDL users retrieve useful material?" The pilot study will help us develop longitudinal studies at high schools in Ann Arbor and Detroit in the 1997-98 academic year.
Populate the UMDL with Supportive Agents. By late summer, we will be using the UMDL agents to develop a range of new services to better support inquiry, e.g., a PointCast-like information "push" service that provides students with sources, a registry for driving questions and relevant resources that uses collaborative filtering techniques, and a range of data/multimedia viewers to bring more than text and static images to students.