University of Michigan Digital Library Project

NSF Cooperative Agreement ITI 9411287

Quarterly Report

Daniel E. Atkins, Project Director

May, 1997

 

Introduction

The following is a brief summary of the objectives and progress of the University of Michigan Digital Library (NSF-UMDL) project from February 1997 through April 1997.

 

Advanced User Interface

In the first quarter of 1997, the Advanced User Interface group has focused on bringing last year's background research to bear on design activity, in an effort to arrive at a novel yet well-informed design for an advanced user interaction paradigm for the DL.

In the latter part of last year, we drew out a set of design implications from our observational and analytic work and made a first, approximate round of global design sketches. From these we identified common themes, which we pulled together into general design desiderata. A principal part of this effort was to create a list of properties and behaviors that we want all information objects in the system to exhibit as seamlessly and automatically as possible. For example, as far as possible, we want all objects in the system to be fractal: blurring the distinction between collections and documents, we envision everything as nested information objects, from super-collections to collections to documents to sections to paragraphs, all the way down to words. We also want all objects, whether in external DL collections or in the local user's workspace, to be indexed and queryable at any level of granularity.

Furthermore, any object should be able to serve *as* a query ("sweep out these objects-as-queries and submit them to those objects-as-collections"). We also want all these fractal objects to be navigable, with internal and external structure that can be visually presented and browsed. Our list continues with other desiderata, including objects being enriched with usage history, carrying evaluations for social filtering, being shared with other users, and so on. We explored this list in some detail, refining the concepts and looking at ways they might be implemented in the interface. We then prioritized the list, made a few more global design overviews, and have started to work in earnest on larger, integrative designs and corresponding initial prototypes.

These larger pieces are currently heading in two major directions. The first pushes hard on queryability, navigability, and history in a multiscale zooming metaphor with some new interface widgets and interaction styles. It is being implemented in the infinite zooming workspace PAD++, and also involves developing our own small vector retrieval engine for the user's local workspace. The second design prototype pushes more explicitly on the shared, social possibilities of cyber information access for the DL. We are using a new substrate called CCR, being developed by David Ackley under an ARPA grant, which resembles a computationally rich, distributed, peer-to-peer MOO. We are using it to create a cyber library space in which an information droid, hooked into the current UMDL, answers questions from a group of human participants, who can interact with each other and with the results in various ways. Both efforts are currently in the intermediate design and early implementation phases.
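The local vector retrieval engine and the objects-as-queries idea can be illustrated with a small sketch. It assumes plain TF-IDF weighting with cosine similarity, a standard vector-space technique; the report does not say which weighting the group's engine uses, and `LocalVectorIndex`, `search`, and `object_as_query` are hypothetical names. Each workspace object is indexed as a term vector, and any object's own text can be submitted as a query against the others.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class LocalVectorIndex:
    """Hypothetical TF-IDF index over information objects in a local workspace."""

    def __init__(self, objects):
        self.objects = objects  # object id -> text
        counts = {oid: Counter(tokenize(text)) for oid, text in objects.items()}
        doc_freq = Counter()
        for c in counts.values():
            doc_freq.update(c.keys())
        n = len(objects)
        # Smoothed inverse document frequency: rarer terms get larger weights.
        self.idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
        self.vectors = {oid: self._weigh(c) for oid, c in counts.items()}

    def _weigh(self, counts):
        # Term frequency times IDF; terms unknown to the index get weight zero.
        return {t: tf * self.idf.get(t, 0.0) for t, tf in counts.items()}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def search(self, query_text, k=3):
        """Return the k objects most similar to a free-text query."""
        query = self._weigh(Counter(tokenize(query_text)))
        scored = [(oid, self._cosine(query, vec)) for oid, vec in self.vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

    def object_as_query(self, oid, k=3):
        """Use an indexed object's own text as the query ("objects-as-queries")."""
        hits = self.search(self.objects[oid], k + 1)
        return [hit for hit in hits if hit[0] != oid][:k]


workspace = {
    "notes": "volcano eruption lava magma",
    "report": "magma chambers feed volcano eruptions",
    "recipe": "bake bread with flour and yeast",
}
index = LocalVectorIndex(workspace)
print(index.object_as_query("notes", k=1))
print(index.search("bread and yeast", k=1))
```

Because every object is just text plus a vector, the same index could in principle hold objects at any granularity (a paragraph, a section, or a whole collection), which is the property the fractal design calls for.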

In addition, we are continuing to interview information seekers, particularly those engaged in lightweight tasks, and to fill out and organize our extensive literature review, which surveys efforts to understand seekers and users of information as well as relevant parts of the interface design literature.

 

Collection Development

• Content

In the past quarter, we have added reference, periodical, and web-based content and laid the foundation for GIS resources. For reference, we have added access to the full text of the Encyclopedia Britannica, for use at the high schools and the University. Access to periodical content has increased with the addition of Proquest Direct from UMI, which gives our digital library access to over 190 journals and magazines from many fields, suitable for a wide variety of user populations. We have also added over 150 web sites, bringing our total of accessible web pages to over 2000. These sites are fully indexed using Opentext's PAT software, and more are added weekly.

• GIS

Our GIS efforts have centered on the creation of a web site that will provide access to GIS tools and data. In particular, we have identified 15 gigabytes of census and map data from Tiger Inc, and another gigabyte of basic world map data from ArcView. The web site will allow users to download PC-based GIS tools and to access data remotely. Future additions to the site will include Unix-based tools and direct web-based access to GIS data on all platforms. The primary audience of this site will be University users.

• Registry Functions

To facilitate the registration of content, the Registry Group has improved the registry interface and added functionality to it. The interface now has a more orderly arrangement of sections, and sections no longer needed for registration (i.e., format) have been eliminated. Functionally, we have improved access to the registry page, added simple automatic information-gathering functions, added record management functions, and automated the incorporation of records into the registry from the interface output. The registry page has been moved to a production machine, which has greatly improved both access to the interface and its speed. The information-gathering functions added include an automated site-description gathering tool and a web document tree finder. The record management functions include the ability to delete and overwrite old records. New records can now be added to the registry automatically, without human intervention, greatly shortening the time it takes for registered items to be fully incorporated into the UMDL.

 

 

Collection Search and Retrieval

The Webbook Agent and collection code have been completed. The agent is undergoing testing before its inclusion as a collection in daily use in the UMDL. The output interface is being fine-tuned, and the collection librarians are building up Webbooks for use in the science classroom. As of April, almost 300 Webbooks in the areas of science, astronomy, and geology have been collected, totaling 15 megabytes of data, or an average of 50 kilobytes per Webbook. We have refined the collection process to give the librarian greater control over the pages that are collected in a Webbook.

The librarian presents the system with the beginning page of the Webbook. Our Webbook Agent collects related pages off the Web and presents this first rough collection back to the librarian to refine. The librarian prunes out extraneous pages and fills in the collection's metadata: the title, author, descriptions, and added entry points. This is then passed to the Webbook maintenance program, which builds the full collection using pages off the Web.
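The collect, prune, and finalize cycle described above can be sketched as follows. The sketch assumes that "related pages" means pages on the same host as the seed page and within a fixed number of link hops; the report does not state the Webbook Agent's actual criteria. The in-memory link map stands in for fetching pages off the Web, and all function names and URLs are hypothetical.

```python
from collections import deque
from urllib.parse import urlparse


def collect_rough_webbook(seed, links, max_depth=2):
    """Breadth-first collection of pages reachable from the librarian's seed page.

    `links` maps each URL to the URLs it links to; it stands in for fetching
    pages off the Web. The relatedness rule (same host as the seed, at most
    `max_depth` link hops away) is an assumption made for this sketch.
    """
    host = urlparse(seed).netloc
    seen = {seed}
    rough = [seed]
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target not in seen and urlparse(target).netloc == host:
                seen.add(target)
                rough.append(target)
                queue.append((target, depth + 1))
    return rough


def finalize_webbook(rough, pruned, metadata):
    """Drop the pages the librarian pruned and attach the collection metadata."""
    return {"pages": [page for page in rough if page not in pruned], **metadata}


links = {
    "http://geo.example.org/rocks": [
        "http://geo.example.org/igneous",
        "http://geo.example.org/ads",
        "http://other.example.com/rocks",
    ],
    "http://geo.example.org/igneous": ["http://geo.example.org/basalt"],
    "http://geo.example.org/basalt": ["http://geo.example.org/deep"],
}
rough = collect_rough_webbook("http://geo.example.org/rocks", links)
webbook = finalize_webbook(
    rough,
    pruned={"http://geo.example.org/ads"},
    metadata={"title": "Rocks and Minerals", "author": "UMDL librarian"},
)
print(webbook["pages"])
```

Here the off-site link and the page three hops away are never collected, and the librarian's pruning removes the advertising page before the metadata is attached.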

Once a collection is assembled and indexed, the search agent allows users to search its pages and presents the search results, marked up in HTML, in a commonly used Web browser. The user can then click on a result to follow its link to the original Web page.

Now that the Webbook Agent is nearly complete and documented, the Search & Retrieval Group has turned its attention to planning the development and testing of scaffolding, to determine whether it improves recall and precision over large collections. Scaffolding is a ranking method for Boolean-based retrieval systems that weights search hits according to the location of query matches within a structured document. Our initial discussions have focused on identifying a large structured document collection, on the methodology for conducting the experiment, and on the actual scaffolding process applied to structured documents.
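To make location-based weighting concrete, here is a minimal sketch: a Boolean AND query selects the hits, and each hit is then ranked by where the query terms matched. The element names and weights (title 5, heading 3, body 1) are assumptions chosen for illustration, not values from the planned experiment.

```python
# Hypothetical location weights: a match in a more prominent element counts more.
ELEMENT_WEIGHTS = {"title": 5.0, "heading": 3.0, "body": 1.0}


def words_in(text):
    return set(text.lower().split())


def boolean_and_hits(docs, terms):
    """Boolean retrieval: a document is a hit only if every term occurs in it."""
    hits = []
    for doc_id, elements in docs.items():
        words = set().union(*(words_in(text) for text in elements.values()))
        if all(term in words for term in terms):
            hits.append(doc_id)
    return hits


def scaffold_rank(docs, terms):
    """Order the Boolean hits by where in each document the query terms matched."""
    terms = [term.lower() for term in terms]

    def location_score(doc_id):
        return sum(
            ELEMENT_WEIGHTS.get(element, 1.0)
            for element, text in docs[doc_id].items()
            for term in terms
            if term in words_in(text)
        )

    return sorted(boolean_and_hits(docs, terms), key=location_score, reverse=True)


docs = {
    "a": {"title": "Plate tectonics", "body": "earthquakes occur along faults"},
    "b": {"title": "Earthquakes", "body": "plate boundaries and tectonics"},
    "c": {"title": "Volcanoes", "body": "lava flows downhill"},
}
print(scaffold_rank(docs, ["plate", "tectonics"]))
print(scaffold_rank(docs, ["earthquakes"]))
```

The Boolean step alone would return documents a and b in no particular order for either query; the location weights put the document whose title matches first.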

 

 

Education, Deployment and Evaluation

• First Beta Test of UMDL/Artemis in a School

The UMDL with the Artemis interface was used for four consecutive days in April in a middle-school classroom in Ann Arbor. This was the first real use of the UMDL in a classroom setting, and the first real test of the UMDL registry architecture and agents. The UMDL Conspectus contained collections focused on geology; to prepare for this test, 3 cybrarians worked half-time for 3 months registering these collections.

To support this classroom test, we revised curriculum materials to incorporate the changes that using the UMDL afforded. For example, we changed the introductory Scavenger Hunt and the Geology Unit.

Our intent in this exercise was to explore whether UMDL/Artemis performs acceptably on current-generation Apple Macintosh computers, since Artemis is written in Java and the UMDL makes significant networking demands. We can report that while there are clearly issues that need to be examined, this first in situ beta resulted in a usable system: students actually searched the UMDL collections, found materials, and created reports based on those finds.

The bottom line: we are now confident that we can move at full pace to install the current version of UMDL/Artemis (albeit with some key changes) in the two target high schools in Ann Arbor.

• Constructed Detailed UMDL Evaluation Plan

We expanded and reshaped the evaluation plan included in the original proposal, based on what we have learned in the past 3 years and on new ideas that have arisen as the UMDL has been under construction. We are putting the final touches on a detailed evaluation plan that will guide data collection and analysis next year at the high school sites in Ann Arbor.

The questions on which we will gather data include:

• What is the impact of the UMDL on science learning in classrooms? For example, we expect that the UMDL will enable students to find relevant materials more easily, which in turn will let them spend more time on content and less time searching; we expect that the UMDL will help students construct keyword searches more effectively, and thus shorten the time needed to find relevant materials; and we expect that the virtual collections in the UMDL will provide students with more tailored information, again letting them spend more time on the content and less time searching for materials.

• What are the performance characteristics of the UMDL? For example, we will gather baseline data comparing UMDL performance against simply searching the Internet, on measures such as relevant hits returned and time spent reformulating queries.

• Design of New Agents for the UMDL

To better support students in using the UMDL for research (that is, to provide them with scaffolding), we are designing a new set of agents. For example, we will be implementing a Recommender Agent that uses a form of social filtering to suggest good resources to students based on other students' previous use of those resources. We will also be implementing a Ranking Agent that helps cybrarians evaluate collections with respect to criteria such as readability and non-textual elements. Our intent is both to exercise the UMDL agent architecture and to create, in a straightforward manner, agents that provide unique research support.
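The social filtering behind the planned Recommender Agent might work along the lines of this user-based sketch, which scores resources a student has not yet used by how similar their users' histories are to that student's. The Jaccard similarity measure, the function names, and the resource identifiers are all assumptions; the report does not specify the agent's filtering method.

```python
def jaccard(a, b):
    """Overlap between two students' sets of used resources (0 to 1)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(usage, student, k=3):
    """Suggest resources that similar students used but `student` has not.

    Each candidate resource is scored by summing the similarity of every
    other student who used it, a simple user-based social filter.
    """
    mine = usage.get(student, set())
    scores = {}
    for other, theirs in usage.items():
        if other == student:
            continue
        similarity = jaccard(mine, theirs)
        if similarity == 0:
            continue  # students with no shared history contribute nothing
        for resource in theirs - mine:
            scores[resource] = scores.get(resource, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda r: (-scores[r], r))[:k]


usage = {
    "ana": {"quake-maps", "rock-cycle"},
    "ben": {"quake-maps", "rock-cycle", "plate-atlas"},
    "cy": {"quake-maps", "volcano-cam"},
    "dee": {"star-charts"},
}
print(recommend(usage, "ana"))
```

A student whose history overlaps with no one else's (here "dee") gets no recommendations, which is the usual cold-start limitation of purely social filtering.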

• Plan for UMDL Rollout at High Schools in Ann Arbor

This spring, we plan to introduce UMDL/Artemis at Huron High School and at Community High School (CHS) in Ann Arbor. At Huron, we will use the UMDL for a unit on astronomy; at CHS, students will use the UMDL for their final, independent project. Our intent in this activity is to better understand the key issues involved in putting such a sophisticated tool into place. The insights gathered in this exercise will help us reshape the system over the summer for full-scale deployment at these high schools in Fall 1997.

 

Service/Market Society (SMS)

During this quarter, the architecture and economics groups joined efforts around the Service/Market Society (SMS). SMS brings together research in markets, ontologies, and strategic behavior. In SMS, markets are used to balance queries among a set of task-planning agents (TPAs), and the Service-Classifier Agent (SCA) uses the evolving UMDL ontology to uniquely name classes of agents that perform the same service. The SCA allows agents to find a particular type of service more easily in a potentially large space of services.
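As a rough illustration of how the SCA can give agents offering the same service a single class name, the sketch below normalizes attribute/value service descriptions and assigns each distinct description a unique name. This is a simplification under stated assumptions: the real SCA reasons over the UMDL ontology, whereas this sketch only matches normalized descriptions exactly, and `ServiceClassifier` and its naming scheme are hypothetical.

```python
class ServiceClassifier:
    """Hypothetical service-classifier sketch.

    Each advertised service is described by attribute/value pairs assumed to
    come from a shared ontology. Descriptions that are equal after
    normalization share one unique class name, so agents offering the same
    service can be found under that name.
    """

    def __init__(self):
        self._names = {}    # normalized description -> class name
        self._members = {}  # class name -> ids of agents offering that service

    def register(self, agent_id, description):
        """Record an agent's service and return the name of its service class."""
        key = tuple(sorted((k.lower(), str(v).lower()) for k, v in description.items()))
        name = self._names.get(key)
        if name is None:
            name = f"service-class-{len(self._names) + 1}"
            self._names[key] = name
            self._members[name] = set()
        self._members[name].add(agent_id)
        return name

    def providers(self, class_name):
        """List the agents registered under a service class."""
        return sorted(self._members.get(class_name, set()))


sca = ServiceClassifier()
c1 = sca.register("agent-a", {"subject": "Geology", "audience": "high school"})
c2 = sca.register("agent-b", {"audience": "High School", "subject": "geology"})
c3 = sca.register("agent-c", {"subject": "astronomy", "audience": "high school"})
print(c1, c2, c3, sca.providers(c1))
```

Agents a and b describe the same service with different capitalization and attribute order, so they land in the same class; agent c gets a class of its own.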

The SMS group spent considerable time designing a set of experiments to demonstrate the ability of the SMS, and therefore of the UMDL, to scale. While experiment design is still ongoing, we have developed a preliminary set of experiments that we will begin to execute this coming year.

The SMS group also continued to develop new agents and to expand the capabilities of existing agents, as described below.

Specific Accomplishments:

• Web Agent

The Web Agent is a UMDL agent that takes queries from Artemis and posts them to HotBot. The Web Agent was made more robust this quarter, and is currently being used in the high schools.

• Using Preferences for Document Search

We have developed a few prototype search mechanisms that order documents according to a user's preferences. These preferences, which are orthogonal to subject or topic, include such things as preferring shorter documents to longer ones, preferring local weather reports to national ones, and so forth.

We are currently working on formalizing these notions, and determining if there is a way to incorporate some of them into Artemis.
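One possible formalization, offered only as a sketch, treats each preference as a sort key and orders documents lexicographically, so the first preference dominates and later ones only break ties. The document fields (`scope`, `words`) and the preference functions are hypothetical stand-ins for the examples mentioned above.

```python
def prefer_local(doc):
    """Preference: local reports before national ones (0 sorts first)."""
    return 0 if doc["scope"] == "local" else 1


def prefer_shorter(doc):
    """Preference: shorter documents before longer ones."""
    return doc["words"]


def order_by_preferences(docs, preferences):
    """Order documents lexicographically by a ranked list of preferences.

    The first preference dominates; later ones only break ties. Topic
    relevance is assumed to have been handled already by the search itself.
    """
    return sorted(docs, key=lambda doc: tuple(pref(doc) for pref in preferences))


reports = [
    {"id": "national-long", "scope": "national", "words": 1200},
    {"id": "local-long", "scope": "local", "words": 1200},
    {"id": "local-short", "scope": "local", "words": 300},
    {"id": "national-short", "scope": "national", "words": 300},
]
by_scope = [d["id"] for d in order_by_preferences(reports, [prefer_local, prefer_shorter])]
by_length = [d["id"] for d in order_by_preferences(reports, [prefer_shorter, prefer_local])]
print(by_scope)
print(by_length)
```

Reordering the preference list changes the ranking, which is one way a user could say which preference matters most; a weighted sum of preference scores would be an alternative design.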

• Agent Interpreter

We are developing a new agent interpreter that can ultimately be used to help quickly design and build UMDL agents. So far, part of a "subscription" service agent has been coded in this interpreter. The new interpreter will not be available until it is rewritten in C++.

• A Stochastic Agent for Multiagent Contracts

The goal of this research is to design a stochastically-intelligent contracting agent for the multiagent environment. By putting such a stochastic agent in the UMDL, we can see (1) how the stochastic agent achieves its profit (or loss), and (2) its impact on the society (and vice versa).

The stochastic agent is a decision-theoretic agent: it models the situation and chooses the best action. More specifically, it models the future contracting process using a stochastic model (a Markov chain) and then uses that model to find the offer that maximizes its utility.
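A minimal version of this decision-theoretic step is sketched below for a seller choosing a price. The Markov chain has one transient state ("negotiating") and two absorbing states ("deal" and "buyer left"). The buyer-acceptance model, the cost, the leave probability, and the discount factor are all illustrative assumptions, not parameters of the UMDL agent, whose actual chain models the contracting process in more detail.

```python
def acceptance_prob(price):
    """Hypothetical buyer model: the chance of acceptance per round falls with price."""
    return max(0.0, min(1.0, 0.6 * (1 - price / 10)))


def expected_utility(price, cost=2.0, leave_prob=0.1, discount=0.9):
    """Discounted expected profit of making and holding one offer.

    States: "negotiating" (transient), "deal" and "buyer left" (absorbing).
    Each round the buyer accepts with probability a, leaves with probability
    leave_prob, and otherwise negotiation continues. Assumes a + leave_prob <= 1.
    The value U of the negotiating state satisfies
        U = a * (price - cost) + discount * (1 - a - leave_prob) * U,
    which gives the closed form returned below.
    """
    a = acceptance_prob(price)
    stay = 1.0 - a - leave_prob
    return a * (price - cost) / (1.0 - discount * stay)


def best_offer(candidates, **model):
    """Pick the candidate offer with the highest expected utility."""
    return max(candidates, key=lambda price: expected_utility(price, **model))


prices = [3, 4, 5, 6, 7, 8, 9]
for p in prices:
    print(p, round(expected_utility(p), 3))
best = best_offer(prices)
print("best offer:", best)
```

The trade-off the model captures is that a higher price earns more per deal but is accepted less often and risks the buyer leaving; under these assumed numbers the balance falls at a price of 7.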

Last quarter, we prepared stand-alone experiments with two buyers and three sellers. The experiments demonstrated that the stochastic agent achieves higher profit and that its profit is stable no matter which agents it competes with.

At present, the objective is to integrate the stochastic agent into the UMDL SMS and conduct the experiments there. The SMS environment differs from the previous experimental settings. First, there is no longer a preset number of buyers and sellers; they may come and go dynamically. Second, the protocol they use is auction-based. We have built a new Markov chain model for the UMDL protocol and integrated the stochastic TPA into the UMDL system, and we are currently setting up new experiments to measure the performance of the stochastic agent in the UMDL system.

We have also implemented 0-level (i.e., reinforcement) and 1-level (i.e., modeling of other agents) learning in the TPA and the demo UIAs, and modified the Auctioneer agent to handle more complicated bids and to broadcast results. We then ran numerous scenarios consisting of several TPAs and UIAs, along with an Auctioneer, a Registry, an Auction Manager, and a Service Classifier Agent, with the agents distributed over the CAEN network. The tests were meant to simulate actual UMDL usage: the demo UIAs repeatedly bought services from the TPA (via the auction), and the TPA delivered those services (unlike the Stanford demo, where no TPA services were ever delivered). Test runs show that even with truly "greedy" agents (unlike the Stanford demo, where agents' bids were determined by a simple function), the UMDL reaches a price equilibrium.

We also demonstrated how learning can be used to make the UMDL more robust. That is, as agents/services are added/retired and as agents have different opinions about quality of service, the system behaves in a reasonable way without the need for a "big brother" approach to agent certification.

• Markets

For certain services that can be described in a well-structured manner, we have developed a set of transformation rules. These rules generate the possible ways in which such structured services could be automatically transformed into other, related services. This will allow us to automatically link related markets, thereby increasing the likelihood that an agent can both find what it needs and transact within that market.

So far these rules have been tested outside of UMDL; we are now in the process of integrating them into the UMDL environment.
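The transformation-rule idea can be sketched by representing a structured service as attribute/value pairs and applying rules that generate related services, such as dropping a constraint or replacing the subject with a broader ontology term. Both rules, the `BROADER` map, and the attribute names are hypothetical; the report does not list the actual rules.

```python
# Hypothetical broader-term links standing in for the UMDL ontology.
BROADER = {"seismology": "geology", "geology": "earth science"}


def drop_constraint(service):
    """Rule: remove one non-subject attribute, giving a more general service."""
    for key in service:
        if key != "subject":
            yield {k: v for k, v in service.items() if k != key}


def broaden_subject(service):
    """Rule: replace the subject with its broader ontology term, if any."""
    parent = BROADER.get(service.get("subject"))
    if parent is not None:
        yield {**service, "subject": parent}


RULES = [drop_constraint, broaden_subject]


def related_services(service, max_steps=2):
    """Apply every rule repeatedly (breadth-first) to list related services."""
    seen = {frozenset(service.items())}
    frontier = [service]
    related = []
    for _ in range(max_steps):
        next_frontier = []
        for current in frontier:
            for rule in RULES:
                for candidate in rule(current):
                    key = frozenset(candidate.items())
                    if key not in seen:
                        seen.add(key)
                        related.append(candidate)
                        next_frontier.append(candidate)
        frontier = next_frontier
    return related


related = related_services({"subject": "seismology", "level": "high school"})
for service in related:
    print(service)
```

Each generated description could then be looked up as a market in its own right, which is one way the rules might let related markets be linked automatically.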

• SCA

The SCA is being readied for use in a complementary project, the School of Information Paper Trail (a specialized digital library for Information Science). In this application, the SCA will create an ontology of keywords describing papers in Paper Trail. From this application, we will gain insight into how ontologies can support and enhance document search and retrieval tasks.

• Protocol Semantics

We have begun an effort to formalize the semantics of the protocols used by agents in UMDL and Paper Trail. These formal semantics will eventually make it easier for third parties to integrate into UMDL.

 

Testbed Construction

This quarter has seen a number of incremental enhancements to the UMDL testbed. Our user interface, Artemis, has continued to improve. Based on feedback from the March geology unit at a local school (see the Education, Deployment and Evaluation section), we identified a number of additional necessary features. These have been implemented and will be used in an astronomy unit in May.

After successful negotiations with UMI, we now have access to a large number of their earth and space science periodicals via their Proquest Direct web interface. We have developed a prototype collection interface agent to provide access to this collection from UMDL. We are currently testing this agent and plan to use it during the May astronomy unit.

We have implemented an updated version of our web page registration tools. These tools are used by our librarians to register interesting and relevant web sites in UMDL. This new version contains a number of enhancements and new functions that were suggested by our librarians.

 

Meetings and Visitors

January 1997

1/27/97 Amy Friedlander, CNRI, NSF Digital Library Kiosk Project, met with OC members

1/29/97 John Adams, writer, NSF project "Beyond the Internet: Knowledge Networking", met with Dan Atkins and Elliot Soloway

February 1997

 

2/5/97 Japanese Digital Library Delegation, 17 visitors representing Hitachi, Fujitsu, NEC, IBM Japan, Toshiba, Mitsubishi Electric, Oki, Unisys Japan, and Ricoh, met with OC members

2/6/97 Kevin Kelly - Wired Magazine, presentation to School of Information faculty and students

2/6/97 William Richardson - President, Kellogg Foundation, presentation "Adding Value and Virtue: The New Challenge of Higher Education"

2/10/97 Joerg Mueller and Mike Wooldridge, Mitsubishi London, met with OC members

2/12/97 ALISE Conference, Washington, DC

 

March 1997

3/9/97 Workshop in Santa Fe, New Mexico

3/25/97 CHI Meeting, Atlanta, Georgia

 

April 1997

4/11/97 Jeff Walz, Intel Corporation, met with Dan Atkins

4/21/97 Nordic Digital Library Delegation, 7 visitors representing Helsinki University Library, Norwegian University of Science and Technology, BIBSYS, Norway, met with OC members

4/22/97 Don Norman, presentation to School of Information faculty and students, "Toward the third generation of the PC: Lessons learned from Thomas Edison"