This is the quarterly progress report for the period August through October 1997 for the NSF/DARPA/NASA DLI project at the University of Michigan. The focus of the research is the creation and understanding of architectures for information brokering across heterogeneous, distributed collections and services. The architecture is based upon dynamic teams of intelligent agents, and instances of the architecture have been prototyped and tested in the context of supporting new types of more authentic science education in middle and high schools.
As we move into the last year of funding of the current project, we are seeking a balance between extension of the project in new directions and the capture and coalescence of results and technology created to date.
The third quarter of 1997 saw deepening of new prototypes that we began developing earlier in the year, and administratively, some personnel changes.
One prototype explored queriability, navigability and history in a multiscale zooming metaphor with some new interface widgets and interaction styles. It has resulted in a tighter integration of querying with both the context from which the query sprang and the context in which the results are situated, yielding several interesting synergies of query and navigation. One of its new technical aspects was a new interface technique for working with ad hoc sets of items in large information worlds. These so-called "e-sets" allow the user to gather, visualize and manipulate subsets of information objects that arise in searching, exploring, and organizing information. This quarter they were the subject of a special Directed Study effort for one of the graduate students, who did an expanded literature review, formalized the concepts, wrote a long paper, and passed an oral qualifying exam on the topic. One result is a clearer sense of what features should be developed next (e.g., nested e-sets, better e-set history organization), and where else they might be useful (e.g., in other information environments like those from Xerox PARC). We will be pushing on some of these directions in the coming months. We are also exploring ways to speed the system up and make it more extensible by at least partial reimplementation with C++ instead of Tcl/Tk scripts.
The second prototype pushed more explicitly on the shared, social possibilities of cyber information access for the DL. Playing off the inspiration that there is "No need for silence in the digital library!", we had previously created an early prototype cyber library space where information droids (e.g., talking to UMDL agents) and groups of people could interact in a MOO-like environment. This has raised issues about understanding multi-agent planning and communication, particularly for information gathering, and we have spent time this last quarter looking into the existing literature (e.g., in distributed AI, CSCW) on the topic. We have also extended the code for our image-bots to support new features of the underlying distributed computation platform (CCR) including new image data types.
Other activities include reimplementation of our early File Fider micro-prototype, which gave a multiscale (infinite pan/zoom) hybrid query and navigation interface to the UNIX directory hierarchy. The code is cleaner, includes the Glimpse search engine for allowing content search of the file system, and is now in use by some members of the research team for interaction with their filespaces. The intention is to give us more first hand experience with actually living and working in a multiscale information environment.
Finally, on an administrative note, two talented master's students working on the project (totaling 1.5 GSRA-equiv) graduated this summer, and have only partially (0.5 GSRA) been replaced. We are getting another CS graduate student up to speed this semester, and he will be working full time starting in January.
The number of registered sites to support the science education deployment experiments has nearly doubled to 649 sites, for a total of approximately 5200 pages of text and images. A number of these sites are "front ends" to databases, and allow access to much more information than the site's page count reflects. Most of these sites are targeted toward high and middle-school use, with an emphasis on astronomy, weather, and the environment.
The registration interface side has seen little visible change, but it is much more robust. Dan Kiskis has worked with Yuhua Liu's code to see that the interface and the current functionality are stable, relatively fast, and accessible across campus. We currently have three individuals adding content through the interface, and are able to respond quickly to needs for sites from the deployment group.
Negotiations with Simon & Schuster to add significant newspaper content to UMDL are nearly final.
The Search and Retrieval committee is experimenting with scaffolding, a ranking method that uses the structure of a document when ordering a retrieval set. The scaffolding algorithm values some parts of the document over others. Information in the title has more value than information in the body of the document. Consequently, documents with hits in the title are ranked higher than documents with hits only in the body. Similar ranking occurs for phrases versus the individual words in the phrase.
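The ranking idea can be illustrated with a minimal sketch. This is not the committee's program: the weights, the `Document` type, the function names, and the sample documents are all assumptions made for illustration. The report states only that title hits outrank body hits and that phrase hits outrank hits on individual words.

```python
import re
from dataclasses import dataclass

# Hypothetical weights chosen for illustration only.
TITLE_WEIGHT = 10
BODY_WEIGHT = 1
PHRASE_BONUS = 5


@dataclass
class Document:
    doc_id: str
    title: str
    body: str


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(tokens, phrase):
    """True if the token list contains the phrase tokens contiguously."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def field_score(text, query):
    """Score one field: distinct query words found, plus a bonus for the whole phrase."""
    tokens = tokenize(text)
    words = tokenize(query)
    word_hits = len(set(words) & set(tokens))
    phrase_hit = len(words) > 1 and contains_phrase(tokens, words)
    return word_hits + (PHRASE_BONUS if phrase_hit else 0)


def scaffold_rank(docs, query):
    """Return ids of matching documents, best first; ties are broken by id."""
    scored = []
    for doc in docs:
        score = (TITLE_WEIGHT * field_score(doc.title, query)
                 + BODY_WEIGHT * field_score(doc.body, query))
        if score > 0:
            scored.append((score, doc.doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Made-up sample documents.
DOCS = [
    Document("d1", "Acid rain in the Great Lakes", "A study of lakes."),
    Document("d2", "Weather patterns", "Acid rain damages forests."),
    Document("d3", "Rain gauges", "Acid levels are measured after rain."),
]
```

With these assumed weights, the query "acid rain" places the title phrase match (d1) first. A single-word title hit (d3) also outranks a phrase hit in the body (d2); whether that is the desired behavior is exactly the kind of question the planned experiment would answer.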
The scaffolding algorithm should be tested with a large set of documents. The committee has selected a large corpus of full-text documents used by the TREC conference: the TIPSTER data collection from LDC. The collection contains 200,000 articles and 150 queries with results. An important part of the collection is the list of relevant documents for each query. The documents consist of four types of text: Associated Press stories, the Federal Register, Wall Street Journal articles, and computer articles from select Ziff-Davis magazines.
We are working on the design of the computer program to be used in the experiment. The program will use OpenText's PAT search engine as the basic search program. The data is in SGML-compliant form, and PAT can parse the individual components of each document type, so we can distinguish between the title of an article and its body. The program will use multiple PAT search commands to process and rank the results of searches. For experimental purposes, we will obtain the list of all matches after each part of the scaffolding process.
The Search and Retrieval committee will need to complete the computer program design and then the experimental design before completing the research. Then we will need to analyze the results and report on our conclusions to the other committees of the UMDL.
This quarter was spent preparing for and deploying UMDL in a number of Ann Arbor Public Schools. In preparation for a September 24th deployment, we identified a number of essential features. These were implemented during the course of the Summer. We also made a number of dependability and performance improvements in the system. We virtually eliminated all situations where software faults or other system faults could lead to users' workspaces being corrupted and rendered unusable for subsequent sessions. This was a problem that had plagued us in our beta tests. We also totally redesigned and reimplemented our task planning agent. This is the agent which performs the collection-level metadata search to identify candidate collections. The new version of the task planner exhibits an order of magnitude improvement in performance when compared to the previous version. It also has greatly improved scaling properties.
August and the first half of September were spent testing UMDL and the Artemis interface. The system was modified as necessary based on the results of the tests. Unit and integration tests with simulated workloads were used. We also performed a number of load tests with users exercising the system from the same clients that the students would be using in the schools.
The deployment has been quite successful. We have over 300 users on the system each day. There have been some problems reported, especially as we start to bring new schools on-line. However, almost all the problems reported were a consequence of the particular client machine configurations being used and not bugs in our software.
Evaluation of UMDL/Artemis Underway
We have decided to focus our attention on one high school and examine how its science students use UMDL/Artemis. Thus, at Community High School, we are collecting three kinds of data from the 10th grade science students (100 students):
(1) Log file data: We track what each student is doing during an on-line session. Our intent here is to collect data from the larger group and use statistical analysis techniques.
(2) On-line process video data: Here we track a small number of students as they engage in an interactive on-line session; we record the screen continually and on top of that screen capture we record the students' verbal comments. (Students work in teams, typically, and engage in discussion.)
(3) Interview data: After an on-line session we carry out video-taped interviews with students, asking them to explain their on-line behaviors.
We started these data collection methods in October and will continue until May.
Incorporation of New Agents for the UMDL
In order to make the UMDL a better research tool, we have added or changed several components:
• The Recommendation Agent is now integrated into the UMDL: this agent enables students and teachers to create "recommended collections" easily.
• DQ Bin Bookmarking: Students can now add non-UMDL hits into their DQ Bins. This functionality enables students to almost entirely "live" in the UMDL environment.
• Thesaurus Revision: We are revamping the way broad topics are searched; it was simply too complex before. Currently, we have an alphabetized list of topics, rather than the arbitrary hierarchical list presented by Sears.
• Active DQ Bin: We are adding a "push" component to the DQ Bins; they will continue searching for items even after the student has quit the system; when the student logs on again, the student will be notified of those new finds.
Altogether, these new agents give the UMDL significantly increased functionality: we know of no other environment for doing on-line research that has all these functions.
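The "push" behavior of the Active DQ Bin can be illustrated with a minimal sketch. The class name `ActiveBin`, its methods, and the `search` callable are hypothetical and do not reflect the actual UMDL code; the sketch assumes only that a bin re-runs a saved query while the student is away and reports unseen items at the next login.

```python
class ActiveBin:
    """Hypothetical bin that keeps searching while its owner is logged off."""

    def __init__(self, query):
        self.query = query
        self.seen = set()    # every item id the bin has ever found
        self.pending = []    # new finds not yet shown to the student

    def refresh(self, search):
        """Re-run the saved query; `search` maps a query to a list of item ids."""
        for item_id in search(self.query):
            if item_id not in self.seen:
                self.seen.add(item_id)
                self.pending.append(item_id)

    def collect_on_login(self):
        """Return the finds made since the last login and clear the pending list."""
        new_items, self.pending = self.pending, []
        return new_items
```

A background process would call `refresh` periodically, and the interface would call `collect_on_login` when the student returns. Repeated refreshes do not report the same item twice.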
SMS Structure
One of our goals when building the University of Michigan Digital Library (UMDL) was to provide an architecture for a digital library that can continually reconfigure itself as users, contents, and services come and go. This has been achieved by the development of a multi-agent infrastructure with agents that buy and sell services from each other using our commerce and communication protocols. We refer to the services/protocols offered by this infrastructure as the Service Market Society (SMS). The SMS allows for the decentralized (scalable) ongoing configuration of an extensible set of users and services.
We are currently extending the SMS by incorporating agents that learn which specific goods they prefer and protocols that allow these agents to request the creation of new auctions. In order to maintain a high social welfare for the system, we are giving exclusive rights of creating new auctions to the Auction Manager Agent. This will give us a central point of control and make it easier to experiment with different protocols for auction creation.
Our experiments will show an evolving UMDL. Starting with just one auction, the system will automatically create new auctions that satisfy the agents' needs or desires for more specific (i.e., described to a finer level of detail in the UMDL ontology) auctions. The system will also eliminate undesired auctions automatically. The agents in the system will all act in their own (not the collective) best interest. Even so, we hope to show that the total social welfare in our system approximates the theoretical maximum, thanks to our protocols and agent learning capabilities.
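The auction life cycle described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the SMS protocol: the class, the request threshold, and the rule for removing idle auctions are all hypothetical. Service descriptions are modeled as tuples, where a longer tuple stands for a more specific description.

```python
from collections import Counter


class AuctionManager:
    """Hypothetical manager with the exclusive right to create auctions."""

    def __init__(self, root, create_threshold=2):
        self.root = root
        self.auctions = {root: 0}   # description -> bids seen this period
        self.requests = Counter()   # description -> pending agent requests
        self.create_threshold = create_threshold

    def request_auction(self, description):
        """An agent asks for an auction; return True if it exists afterwards."""
        if description in self.auctions:
            return True
        self.requests[description] += 1
        if self.requests[description] >= self.create_threshold:
            self.auctions[description] = 0
            del self.requests[description]
            return True
        return False

    def record_bid(self, description):
        """Count one bid placed in an existing auction."""
        self.auctions[description] += 1

    def end_period(self):
        """Remove auctions that drew no bids (never the root), then reset counts."""
        idle = [d for d, bids in self.auctions.items()
                if bids == 0 and d != self.root]
        for d in idle:
            del self.auctions[d]
        for d in self.auctions:
            self.auctions[d] = 0
        return idle
```

Keeping creation and removal in one object mirrors the central point of control mentioned above, so that different creation protocols could be tried by changing only `request_auction`.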
In the last quarter, the SMS group has been enhancing the capabilities of several agents, particularly the Service Classifier and the Preference Agent, and the SMS infrastructure.
We have also been planning a series of experiments to validate the dynamic behavior of the SMS. The idea is to show that the agent system responds gracefully to fluctuations in user demand for services by allocating new resources (agents and auctions) where demand increases, and then reclaiming those resources when demand decreases. We plan to demonstrate some of these properties next quarter.
Details on various projects within SMS follow:
Service Classification
During the last quarter we modified the classification strategy of the Service Classifier Agent (SCA), in preparation for the greater demands for inferential reasoning that will be required for the next Service Market Society (SMS) demo.
We designed research to develop support for agent selection of services in large societies where agents use localized language that extends a common ontology. This involved analyzing algorithms for mapping between expressions in formal ontologies, and developing a strategy to evaluate these algorithms with simulations in a way that is theoretically well-founded.
We also presented a paper on service classification at the digital library workshop at the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97), Nagoya, Japan; and a poster at the Fourteenth National Conference on Artificial Intelligence (AAAI-97), Providence, Rhode Island.
Ontology
Work on the ontology in the third quarter was oriented towards completion of the Beethoven project: a test of our ontology by implementation of a knowledge-base of collection metadata converted from US MARC records. We continued to extend the ontology to handle the complexity of relations in the MARC data, including a design for handling containment relations (when a work includes several sub-works). We implemented the first stage of the data conversion process: translation of the US MARC data to an intermediate, editable format (prior to conversion to assertions in Loom, the language we use to represent the knowledge-base). To guide this conversion we developed an initial version of a set of "mapping rules" control files, which we will continue to refine. Finally, we developed an initial interface design for the demonstration that will illustrate the powerful queries supported by ontology-based collection metadata.
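The first conversion stage can be pictured with a small sketch. The tag and subfield pairs used here (100 $a, 245 $a, 260 $c) are standard MARC fields, but the slot names, the rule table, the output format, and the sample record are assumptions for illustration; the report does not describe the actual control file format.

```python
# Hypothetical "mapping rules": (MARC tag, subfield code) -> slot name.
MAPPING_RULES = {
    ("100", "a"): "creator",   # main entry, personal name
    ("245", "a"): "title",     # title statement
    ("260", "c"): "date",      # date of publication
}


def convert_record(fields):
    """Translate (tag, code, value) triples into editable 'slot: value' lines.

    Returns the converted lines plus the (tag, code) pairs that no rule
    covered, so the rule set can be refined over time.
    """
    lines = []
    unmapped = []
    for tag, code, value in fields:
        slot = MAPPING_RULES.get((tag, code))
        if slot is None:
            unmapped.append((tag, code))
        else:
            # Drop trailing cataloging punctuation such as " /" or ",".
            lines.append(f"{slot}: {value.rstrip(' /,:;')}")
    return lines, unmapped


# Made-up sample record.
SAMPLE = [
    ("100", "a", "Beethoven, Ludwig van,"),
    ("245", "a", "Symphony no. 5 /"),
    ("260", "c", "1989"),
    ("300", "a", "1 score"),
]
```

Reporting the unmapped fields alongside the output is one simple way to support the iterative refinement of the rules.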
We also presented a paper on our ontology for digital library content at the Second ACM Conference on Digital Libraries, Philadelphia, Pennsylvania.
Strategic Agents
We fully expect strategic agents to emerge in the UMDL system and try to take advantage of its dynamics. We have demonstrated that the strategic bidding strategy (called p-strategy) accumulates more profit than the other types of strategies in most cases. When the system is less dynamic, however, we have found that a simpler strategy (bidding the previous clearing price) works better.
Now, we are asking questions about the collective behavior of such strategic agents. We will compare the absolute and relative performance of each p-strategy agent, depending on the number of other p-strategy agents in the system. In addition, we will see how strategic inefficiency (losing the deals because of strategic misrepresentation) and surplus extraction (getting more profits per deal) actually affect the overall system's performance.
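The simpler strategy and the notion of strategic inefficiency can be illustrated with a toy sketch. The p-strategy itself is not shown, since its details are not given here. The auction rule is an assumption: a buyer trades when its bid is at least the clearing price and pays the clearing price. Both function names are hypothetical.

```python
def next_bid(valuation, previous_clearing_price):
    """Bid the previous clearing price, but never more than the item is worth.

    With no price history (None), fall back to bidding the true valuation.
    """
    if previous_clearing_price is None:
        return valuation
    return min(valuation, previous_clearing_price)


def buyer_outcome(valuation, bid, clearing_price):
    """Return (won, surplus) for a buyer under the assumed auction rule."""
    won = bid >= clearing_price
    surplus = valuation - clearing_price if won else 0
    return won, surplus
```

For example, a buyer with valuation 10 who bids the previous clearing price of 6 loses the deal if the price rises to 7, even though a truthful bid of 10 would have won with a surplus of 3. That lost deal is an instance of the strategic inefficiency discussed above.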
UMDL Agentification Language
The UMDL Agentification Language is still under development. We have built an early version of the underlying interpreter, which is integrated with the UMDL Agent code. The interpreter supports the use of "mental states" and "mental rules" to control the behavior of an agent. The current version uses Prolog to control the mental states, while using C++ to execute the behavior of the agent. We are planning to begin development of the compiler for the language within the next week. The compiler will allow developers to compile the language into C++, which will then be compiled by the Solaris C++ compiler.
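The split between "mental rules" that select a behavior and code that executes it can be sketched as follows. The actual interpreter uses Prolog and C++; this Python sketch only illustrates the idea, and the rule format, class name, and behaviors are hypothetical.

```python
class MentalAgent:
    """Hypothetical agent whose behavior is chosen by rules over its mental state."""

    def __init__(self, rules, behaviors, initial_state=()):
        self.state = set(initial_state)   # mental state: a set of facts
        self.rules = rules                # (required, forbidden, behavior name)
        self.behaviors = behaviors        # behavior name -> callable(agent)

    def step(self):
        """Fire the first rule whose condition holds; return its behavior name."""
        for required, forbidden, name in self.rules:
            if set(required) <= self.state and not (set(forbidden) & self.state):
                self.behaviors[name](self)
                return name
        return None


def do_search(agent):
    agent.state.add("has_results")


def do_report(agent):
    agent.state.add("reported")


RULES = [
    (("has_query",), ("has_results",), "search"),
    (("has_results",), ("reported",), "report"),
]
BEHAVIORS = {"search": do_search, "report": do_report}
```

Starting from the state `{"has_query"}`, successive calls to `step` fire "search", then "report", and then no rule applies.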
Preference Agent
In the last quarter, the Preference Agent has gone from a simple concept to a nearly functional agent. The Preference Agent's modules are nearly complete, and will be integrated into the Testbed very shortly. We have already been able to use some of the components of the Preference Agent to score and order actual results from a web search engine. We currently have components that process HotBot and ProQuest result items, as well as a defined format for the preferences. The actual agent is intended to be fully functional at least a month before the Berkeley demo.
A demo of the Preference Agent can be accessed at
http://www-personal.umich.edu/~compuman/QueryWPrefs.html.
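Scoring and ordering result items against a set of preferences can be sketched in a few lines. The preference format, the item fields, and the weights below are assumptions for illustration and are not the Preference Agent's defined format.

```python
def score_item(item, prefs):
    """Sum the weights of preferred terms in the title, plus a source weight."""
    words = set(item["title"].lower().split())
    score = sum(weight for term, weight in prefs["terms"].items() if term in words)
    score += prefs["sources"].get(item["source"], 0)
    return score


def order_results(items, prefs):
    """Return items sorted best first; equal scores keep their original order."""
    return sorted(items, key=lambda item: score_item(item, prefs), reverse=True)


# Made-up preferences and result items.
PREFS = {
    "terms": {"weather": 2, "astronomy": 3},
    "sources": {"ProQuest": 1},
}
ITEMS = [
    {"id": "a", "title": "Weather maps", "source": "HotBot"},
    {"id": "b", "title": "Astronomy for students", "source": "ProQuest"},
    {"id": "c", "title": "Cooking tips", "source": "HotBot"},
]
```

Because the scoring function sees only a common item structure, results from different sources can be merged into one ordered list once each source's items are converted to that structure.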
August, 1997
8/13 Visit to National Science Foundation by Dan Atkins
8/15 Sarah Horrigan, Senate Appropriations Committee member, visit to SI
September, 1997
9/15 IBM and Marist College visit (11 total) meetings and presentations by OC members
9/17 Larry Smarr, NCSA Director, w/Larry Jackson, Lex Lane, and Ed Grossman (NCSA) met with Dan Atkins and SI faculty meetings and presentation
9/22 Presentation for Bain & Company Advisory Council at Keystone, Colorado by Dan Atkins
9/23 Barbara O'Keefe, Noshir Contractor, Patricia Jones, Michael Dorneich, and James Jacobs (NCSA) met with Dan Atkins and SI faculty meetings and presentation
9/25 NAS workshop in Washington, DC attended by Dan Atkins
October, 1997
10/7 Dr. Homer Neal, CERN, meeting with Dan Atkins, Joseph Hardin
10/8 Ed Fox, Virginia Tech, meeting and demonstrations with OC members and faculty
10/10 Kenneth Crews presentation to UM community on copyright issues, met with Dan Atkins
10/10 Ramesh Jain, University of California, San Diego, met with Olivia Frost and Dan Atkins
10/14 Shanghai Jiao Tong University delegation (5) met with Dan Atkins
10/15 Dow Chemical visit and presentations to faculty
10/22 John Evans and AAAS meeting and presentation to faculty
10/23 Mark Abel, Intel Corporation, meetings and demonstrations by OC members