
The broad goal of this project is to create and evaluate a modular, scalable, and extensible architecture for digital libraries. The architecture must support information access and brokering in large-scale, heterogeneous, dynamic organizations of hybrid (digital and print-on-paper) collections and services. The architecture is based upon software agents which represent collections and/or services. The research is being focused and informed by testbed construction, deployment, and evaluation of multimedia earth and space science collections. The library is intended to support research in these domains as well as project-centered learning in high school science classes based upon enhanced access to primary resources.
The operational testbed to date has focused on creating basic query processing/information retrieval capabilities, as highlighted in this report. We emphasize, however, that the research and the system being built are ultimately intended to produce an information services economy, including protection for intellectual property, methods for payment, and specialized agents for information brokering.
Providing a single interface to NSF-UMDL-accessible collections. We are looking specifically at methods that let a variety of users search, through a single user interface, NSF-UMDL collections that feature different search engines and interfaces. We have started with the development of a Z39.50 gateway to bibliographic collections. The first bibliographic-collection candidates are MIRLYN (the University of Michigan's online catalog that provides access to the library catalog and several abstracting and indexing databases, e.g., PAIS, PsycINFO, INSPEC, and Compendex), FTL (the NSF-UMDL-specific search engine that provides access to locally mounted collections, e.g., Elsevier, UMI and JSTOR collections), and WAIS (a retrieval system that has been applied to bibliographic and full-text collections on the Internet).
NSF-UMDL software-development staff have created ZClient, a WWW/Z39.50 gateway recoded from the CNIDR ZClient. ZClient supports the Initialize, Search, and Present functions of the Z39.50 standard. It can display USMARC, SUTRS, and HTML records, as well as any diagnostic messages that arise during a session. Staff have tested it on Z39.50 servers running the CNIDR zserver package, DRA, and NOTIS, and are currently running tests with WAIS servers and our local FTL server (with a Z39.50 port). The URL for ZClient is: http://bandwidth.engin.umich.edu/Z39.50/index.html.
The ZClient enables NSF-UMDL users to fill out a single HTML form displayed on a WWW browser. When users click the "submit" button, the browser initiates an HTTP transfer and sends all form data to the HTTPD server. The HTTPD server passes the form data to the ZClient executable file. The ZClient parses the form data and initiates a Z39.50 session with the specified Z39.50 servers. The ZClient negotiates with the servers and begins the search for the given query. Upon receiving retrieved records from the servers, the ZClient converts them into a readable HTML file. The HTTPD server sends this file back to the WWW browser, which displays the HTML data for the users to scan.
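The request flow described above can be sketched in a few lines. This is only an illustration of the form-to-HTML round trip, not the ZClient code itself: the field names `server` and `query`, the `FAKE_SERVERS` table, and all function names are assumptions, and the Z39.50 session is replaced by an in-memory lookup.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical stand-in for remote Z39.50 servers. A real gateway would open
# a network session (Initialize, Search, Present) instead of reading a dict.
FAKE_SERVERS = {
    "mirlyn": ["Digital libraries and agents", "Earth science atlas"],
    "ftl": ["Earth observation journals"],
}


def parse_form(body):
    """Decode URL-encoded form data as the HTTP server would pass it on."""
    fields = parse_qs(body)
    return {
        "servers": fields.get("server", []),
        "query": fields.get("query", [""])[0],
    }


def search(server, query):
    """Return titles on one (simulated) server that contain the query."""
    titles = FAKE_SERVERS.get(server, [])
    return [t for t in titles if query.lower() in t.lower()]


def handle_request(body):
    """Parse the form, query each selected server, and render HTML."""
    request = parse_form(body)
    items = []
    for server in request["servers"]:
        for title in search(server, request["query"]):
            items.append(f"<li>{escape(title)}</li>")
    heading = f"<h1>Results for {escape(request['query'])}</h1>"
    return heading + "<ul>" + "".join(items) + "</ul>"
```

Calling `handle_request("server=mirlyn&server=ftl&query=earth")` returns one HTML page that merges the matching records from both simulated servers, which is the single-interface behavior the gateway aims for.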
Staff are currently enhancing CNIDR's zserver to develop an interface between FTL and Z39.50 clients. Isite uses the script search engine to bridge two search engine types. The script search engine is essentially just a shell script that invokes the FTL search engine with the query from a Z39.50 client. The zserver was modified to translate queries received in the mandated RPN (Reverse Polish Notation) format into the appropriate query format for FTL. When the zserver receives a query for the FTL database, it executes a shell script which in turn invokes the FTL search engine. After a successful FTL search, the zserver returns the search result to the Z39.50 client in HTML format.
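As a rough illustration of the query translation step, the sketch below converts a postfix (RPN) token list into a parenthesized infix Boolean string using a stack. The FTL query syntax is not given in this report, so the infix output format, the operator names, and the function name `rpn_to_infix` are all assumptions made for the example.

```python
# Operator names are assumed for illustration; every operator is binary.
OPERATORS = {"AND", "OR", "AND-NOT"}


def rpn_to_infix(tokens):
    """Convert a postfix token list into a parenthesized infix query string."""
    stack = []
    for token in tokens:
        if token in OPERATORS:
            if len(stack) < 2:
                raise ValueError(f"operator {token!r} is missing an operand")
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {token} {right})")
        else:
            # Anything that is not an operator is treated as a search term.
            stack.append(token)
    if len(stack) != 1:
        raise ValueError("query must reduce to a single expression")
    return stack[0]
```

For example, `rpn_to_infix(["ozone", "depletion", "AND", "antarctica", "OR"])` yields `((ozone AND depletion) OR antarctica)`. A string of this kind could then be handed to a shell script as its query argument.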
Collection Search and Retrieval. To ensure the precision of retrieved bibliographic data, research staff are designing search trees for MIRLYN, FTL, and WAIS search functions. Search trees are paths with branches or choices that guide systems in their retrieval of bibliographic data. Search trees reflect the searching approaches that expert searchers would use to retrieve bibliographic and/or structured data. Search trees shoulder much of the burden of retrieving useful information in response to user-entered queries.
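One way to picture a search tree is as a set of search strategies linked by branches that are taken depending on how many records the current strategy retrieves. The sketch below is a hypothetical example only: the sample records, the three strategies, and the branch names `if_none` and `if_many` are assumptions, not the trees being designed for MIRLYN, FTL, or WAIS.

```python
# Illustrative records; not taken from any NSF-UMDL collection.
RECORDS = [
    {"title": "Ozone depletion over Antarctica", "subject": "ozone layer"},
    {"title": "Introduction to atmospheric ozone", "subject": "atmosphere"},
    {"title": "Antarctic ice cores", "subject": "glaciology"},
]


def all_words_in_title(query, rec):
    return all(w in rec["title"].lower() for w in query.lower().split())


def any_word_anywhere(query, rec):
    text = (rec["title"] + " " + rec["subject"]).lower()
    return any(w in text for w in query.lower().split())


def title_starts_with(query, rec):
    return rec["title"].lower().startswith(query.lower())


# Root strategy with two branches: broaden on no hits, narrow on too many.
SEARCH_TREE = {
    "name": "all words in title",
    "match": all_words_in_title,
    "if_none": {"name": "any word in title or subject", "match": any_word_anywhere},
    "if_many": {"name": "title starts with query", "match": title_starts_with},
}


def run_tree(node, query, records, max_hits=2):
    """Walk the tree, returning the path of strategies tried and the final hits."""
    path = []
    while True:
        hits = [r["title"] for r in records if node["match"](query, r)]
        path.append(node["name"])
        if not hits and node.get("if_none"):
            node = node["if_none"]      # too few results: broaden
        elif len(hits) > max_hits and node.get("if_many"):
            node = node["if_many"]      # too many results: narrow
        else:
            return path, hits
```

The returned path records which branches were taken, so the same structure can also be used to explain to a user why a broader or narrower search was run.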
Methods for searching the conspectus--the digital catalog describing the collections in the digital library. The intent is to develop effective ways to partition the collection space for a given user query, so that the search for specific items can then proceed effectively and efficiently in those collections identified as having a high probability of being relevant to the user's request.
We have been looking specifically at methods for operationalizing a task language for planning conspectus queries. The goal is to make the conspectus search much more flexible and interactive than it is presently. This has involved the following task at both the logical (conceptualization and operationalization) and physical (implementation) levels:
Design of an intellectual property license management scheme. We have identified the basic concepts of a scheme for specifying and managing usage licenses for digital works. Licenses represent the authority to use or distribute elements in the digital library. The purpose of a general usage license language is to enable dynamic negotiation and enforcement of intellectual property rights in the NSF-UMDL. The language is part of a more general scheme for describing information goods and services, under ongoing development.
Interoperability with Stanford University. Our initial interoperability goal is to support document retrieval between the two sites. We will be able to access collections connected to their architecture, and they will be able to access ours. To create a bridge between our disparate DL architectures, staff have met with members of the Stanford team to outline a protocol for querying each other's collections and returning documents. We have agreed on Xerox's ILU implementation of the CORBA distributed object standard as our common communication transport. To achieve interoperability, we are implementing a proxy object in our architecture which will allow access to those of our collections that support the Z39.50 protocol. Our proxy object will be accessible to Stanford via ILU communication mechanisms. Likewise, they will implement proxy objects within their architecture which we may access via ILU to query their collections.
Electronic Atlas. Efforts are underway to deploy a spatial data search engine and relevant collections within the testbed. The focus is on developing the capability to create thematic maps of social and earth science data as well as the facility for the creation of subsets of large GIS files. A second phase of this effort would produce images of map data that can be used to query either the spatial database system or other information systems and also develop the means for overlaying the map images with vector data.
Meetings. Below is a summary of meetings hosted or attended by project staff which are related to the research effort. These meetings are a subset of the meetings and conferences attended by project members during this time period.
2/14/95: Bob Neches, ARPA. Project staff: Dan Atkins, Bill Birmingham, Elliot Soloway, Mike Wellman
3/7/95: Lance Mitchell, Focus Hope. Project staff: Doug Orr (Further meetings are being scheduled to investigate a potential research partnership between Focus Hope and NSF-UMDL.)
3/8/95: Larry Masinter, Xerox PARC. Project staff: Dan Atkins, Wendy Lougee, Randy Frank, Bill Birmingham
3/15/95: IBM Planning Session on Digital Libraries. Project staff: Dan Atkins, Wendy Lougee, John Price-Wilkin, Randy Frank, Laurie Crum
3/28/95: Apple Executives. Project staff: Dan Atkins, Randy Frank
3/28/95: Hector Garcia-Molina and digital library team, Stanford University. Project staff: Mike Wellman
3/31/95: Arthur Keller, Stanford. Project staff: Elke Rundensteiner, Mike Wellman
4/10-11/95: Spring 1995 CNI Task Force Meeting. Project staff: Randy Frank, Wendy Lougee, Mike Wellman
4/14/95: Apple planning session. Project staff: Wendy Lougee, Elliot Soloway
4/24-25/95: DLI All-project meeting at UIUC. Project staff: Dan Atkins, Randy Frank, Karen Drabenstott, Joan Durrance, Amy Warner, Elke Rundensteiner, Wendy Lougee, Doug Orr, John Price-Wilkin, Laurie Crum, Bill Birmingham
4/27/95: Eugene Miya, NASA. Project staff: Randy Frank, Ken Alexander, Doug Orr, Greg Peters, Dan Kiskis, Bill Birmingham, Mike Wellman, Wendy Lougee, Amy Warner, John Price-Wilkin, Gene Alloway, Karen Drabenstott, Elliot Soloway
4/28/95: Marvin Weinberger, Infonautics. Project staff: Dan Atkins, Elliot Soloway, Mike Wellman, Wendy Lougee, Randy Frank
5/9/95: Benjamin Grosof, IBM T.J. Watson Research Center. Project staff: Mike Wellman
5/10/95: Willy Chiu and digital library technical team, IBM. Project staff: UMDL Team (Further explorations for potential research partnerships are continuing among the IBM digital library and UM digital library project teams. Areas of potential collaboration include image search and retrieval, and digital library design and implementation challenges.)
5/16/95: Peter Bono, Fraunhofer Institute in Darmstadt at UM. Project staff: Dan Atkins, Kathy Willis, Bill Birmingham, Randy Frank, Karen Drabenstott
5/18-19/95: HPCC/IITA Digital Library Workshop. Project staff: Randy Frank, Elke Rundensteiner
5/24-25/95: Hector Garcia-Molina and other team members, Stanford University. Project staff: Doug Orr, Bill Birmingham
6/6/95: Mike Lesk, Bellcore. Project staff: Dan Atkins, Bill Birmingham, Randy Frank, Wendy Lougee, Karen Drabenstott
6/8/95: The Morino Institute, Coalition for Networked Information, Mitre. Project staff: Dan Atkins
6/9/95: Ray Tacoma, nCUBE. Project staff: Laurie Crum, Gene Alloway
6/16/95: Marti Hearst and Mark Stefik, Xerox PARC. Project staff: Mike Wellman
6/26/95: Benjamin Grosof and others, IBM T.J. Watson Research Center. Project staff: Mike Wellman
7/10-14/95: ARPA/CSTO Joint PI Meeting. Project staff: Bill Birmingham, Doug Orr
7/10/95: Paul DuCloy, INRIA Digital Library Project Office and tour of new French National Library (France). Project staff: Dan Atkins
7/11/95: Presentation to Elsevier senior editors and interaction with Elsevier electronic publishing projects (Netherlands). Project staff: Dan Atkins
7/12/95: Jose Encarnacao, House of Graphics, Fraunhofer Institute (Germany). Project staff: Dan Atkins
7/13/95: Dr. Erich Neuhold, Dr. Norbert A. Streitz and faculty, GMD-IPSI (Integrated Publication and Information Systems Institute) (Germany). Project staff: Dan Atkins
7/14/95: Michael Moore, University of Pennsylvania. Project staff: NSF-UMDL Team
7/27-28/95: Handle System Workshop. Project staff: Dan Kiskis, Fritz Freiheit
We will continue the development currently in place. Specifically, we will be working on the following architectural improvements:
In terms of design, we will work on the following:
Our next steps in the development of a single interface for bibliographic retrieval engines are to:
Applying bibliographic retrieval through a Z39.50 gateway to HTML-coded, Internet-accessible documents would be a disservice to users because they could not specify their queries to take advantage of structured information for the purpose of ensuring the precision of their search retrievals. Thus, we are surveying search engines that take advantage of structured information. Although this will result in two different user interfaces (for HTML documents and for bibliographic data), we plan to design the user interface so that differences between the two types of data and query processing will be transparent to users.
The research has proceeded largely as planned. We made one minor scheduling change to pursue the Stanford CORBA interface, which required moving up one of our scheduled projects by one year. This did not have a major impact on the schedule; it delayed the reimplementation of the agentware by only about one month. We are now back on schedule.
The pace of the research in general has been fine, and according to schedule (except as noted).
Basic information about the progress of the NSF-UMDL project is available through the project HomePage which is located at the following URL: http://www.sils.umich.edu/UMDL/HomePage.html
Additional URLs are noted in the appropriate sections of this report. A draft of the current architecture specification is being sent with this report.
Remora agents. Since the basic architecture is operating, we are adding new functions in the form of new agents. We have recently completed the initial implementation of a "remora" agent, which posts standing queries to NSF-UMDL. Through this agent, a user can be notified when the content of a specific site changes, or when anything that matches a query becomes available to the library (a notification service). A working prototype of the remora can be visited at http://www.engin.umich.edu/~cerebus/ontology/ontology.html.
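The standing-query idea can be illustrated with a small sketch. It is not the remora implementation: the class name `RemoraAgent`, its methods, and the rule that a document matches when its title contains every query term are all assumptions chosen to keep the example short.

```python
class RemoraAgent:
    """Toy notification service: users post standing queries, and each new
    document is checked against every query when it arrives."""

    def __init__(self):
        self.standing_queries = []  # list of (user, set of query terms)

    def post(self, user, query):
        """Register a standing query on behalf of a user."""
        self.standing_queries.append((user, set(query.lower().split())))

    def document_added(self, title):
        """Return the users to notify about a newly available document."""
        words = set(title.lower().split())
        return [user for user, terms in self.standing_queries if terms <= words]
```

In use, a call such as `agent.post("alice", "ozone antarctica")` is made once, and every later `agent.document_added(...)` call reports whether Alice should be notified.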
Development and implementation of a distributed database architecture for the Conspectus database (registry). The conspectus (registry) database is designed to contain the registration information for all the agents in the NSF-UMDL system, along with descriptions of their content and capabilities. This includes, for instance, the description of collections available within the NSF-UMDL architecture, such as ACM journals, a web page, and other such resources. A persistent implementation of the conspectus database using an SQL server (currently, Sybase) has been built and now replaces our initial main-memory solution. The registry agent, operating on top of the registry database, now provides support for simple conspectus retrieval and registration update services -- while assuring consistency, concurrency, and recovery.
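A minimal sketch of registration and retrieval against an SQL-backed registry follows. It uses Python's built-in `sqlite3` module purely as a stand-in for the Sybase server, and the table layout, column names, and function names are assumptions; the actual conspectus schema is not described in this report.

```python
import sqlite3


def open_registry(path=":memory:"):
    """Open (or create) a registry database with one table of agents."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS agents ("
        " name TEXT PRIMARY KEY,"
        " kind TEXT NOT NULL,"
        " description TEXT NOT NULL)"
    )
    return conn


def register(conn, name, kind, description):
    """Add or update an agent's registration inside a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT OR REPLACE INTO agents (name, kind, description)"
            " VALUES (?, ?, ?)",
            (name, kind, description),
        )


def find(conn, term):
    """Return the names of agents whose description mentions the term."""
    rows = conn.execute(
        "SELECT name FROM agents WHERE description LIKE ? ORDER BY name",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

Wrapping each update in a transaction is what lets the database, rather than the agent code, take responsibility for consistency and recovery.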
Logically, other NSF-UMDL components see only one registry agent; however, we are now beginning to explore the development of a more powerful, distributed, and open architecture. One reason is that the information the registry database could potentially manage spans heterogeneous data types, ranging from a BSO classification of the collections, to author-index support for known-item searches, to descriptions of agent capabilities. This demands a distributed registry architecture that integrates heterogeneous databases and search engines. In addition, to increase the performance and reliability of the registry agent, we plan to dynamically replicate and redistribute the conspectus database so that it can handle large numbers of search and update requests amid continuous changes in the NSF-UMDL system components (agents joining or leaving the system, and task planners performing complex conspectus searches and retrievals). NSF-UMDL has recently arranged a partnership with Sybase, which will provide Sybase software, including the Sybase server and replication servers. Our goal now is to explore the use of the Sybase replication servers to achieve a powerful distributed search paradigm that, while robust and scalable, is transparent to the rest of the NSF-UMDL system.
Design and development of the task planner agents. We are implementing task planner agents which will apply artificial intelligence techniques to the task of searching and retrieving documents. Our current implementation uses the locally developed UM-PRS, which is based on the Procedural Reasoning System, as an engine for executing search and retrieval plans. These plans use information from the user profile and query parameters to choose which collections can best satisfy the user's query. The task planner may negotiate for the services of other agents in order to better satisfy the user's query. For example, the task planner may make use of an agent which is an authority on the Broad System of Ordering to find alternate search terms. We are developing an initial version of the task planner which makes use of available agents and the current state of the conspectus and agent capability description languages.
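The collection-selection step can be pictured with the following sketch. It does not use UM-PRS or any of its plan syntax; it is a plain-Python illustration in which the collection descriptions, the `levels` profile field, the thesaurus table, and the overlap-count scoring are all hypothetical.

```python
# Hypothetical collection descriptions, as a conspectus might supply them.
COLLECTIONS = [
    {"name": "collection-a", "subjects": {"geology", "atmosphere"}, "level": "research"},
    {"name": "collection-b", "subjects": {"geology", "astronomy", "atmosphere"}, "level": "general"},
    {"name": "collection-c", "subjects": {"astronomy"}, "level": "research"},
]

# Stand-in for a thesaurus agent that suggests alternate search terms.
THESAURUS = {"weather": {"atmosphere"}}


def plan(query_terms, profile):
    """Rank the collections most likely to satisfy the query for this user."""
    terms = set(query_terms)
    for term in query_terms:
        terms |= THESAURUS.get(term, set())   # expand with alternate terms
    ranked = []
    for coll in COLLECTIONS:
        if coll["level"] not in profile["levels"]:
            continue                          # filtered out by the user profile
        score = len(terms & coll["subjects"])
        if score:
            ranked.append((score, coll["name"]))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in ranked]
```

A high school user whose profile allows only `general` material would be routed to a single collection here, while a researcher issuing the same query would be offered more.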
All text-based collection contributions to the testbed have been received, with ongoing issues of journals regularly shipped. Three primary publisher contributions are represented: UMI journals, Elsevier Science journals, and McGraw Hill reference works. UMI has provided back files of image collections; Elsevier has begun image shipments with 1995 issues. McGraw Hill's encyclopedic works have been deployed in SGML (though they can be rendered as HTML), and we are awaiting the availability of the SGML viewing client (Panorama) for all platforms (Windows is available now; Mac is expected in the fall).
Negotiations with additional publishers are underway. A contract has recently been drawn up for the addition of a multimedia encyclopedia from Groliers (Americana). The University Library has contributed the campus license support; Groliers has contributed the license extension for the high school sites. Other publishers at various stages of negotiation include: EBSCO (high school science journals), Cambridge University Press, Academic Press, Infonautics, and the American Chemical Society. University Library science librarians have identified a list of additional relevant publishers to be pursued as well.
In addition to publisher-provided content, project librarians have identified relevant Internet sites for registration in the Conspectus. These sites, along with curriculum-development sites identified or created by the deployment team, are now part of the testbed collections.
Tom Finholt, an Assistant Professor in Psychology and member of the Upper-Atmospheric Research Collaboratory (UARC) project, has recently joined the NSF-UMDL to design and implement evaluation strategies for the project. Working closely with the deployment, testing and evaluation team, he has begun to develop a design methodology which will be used at the deployment sites.
We have conducted two focus group interviews. The first group consisted of high school teachers and the second group consisted of school media specialists and public library librarians. The interview questions elicited information about the resource needs and research patterns of student and teacher populations. The teacher group provided a framework for the basic information needs which the NSF-UMDL can satisfy, such as: timeliness of resources, topical rather than historical information, and an immediate, centralized source of information. The teacher group also identified functions which should be incorporated into the user interface agent, such as: information filters to aid in finding material of the appropriate reading level, treatment of work, size of document, and searching aids which reduce the amount of time needed for students to find relevant information. The group of librarians and media specialists was able to specify six key issues which the user interface agent must address:
Deployment efforts for the NSF-UMDL began in January, 1995, with identification of needs for Pioneer High School, the Ann Arbor Public Library, and Community High School. Inasmuch as the NSF-UMDL was still under construction, goals were established which were intended to lay the groundwork for future use of the NSF-UMDL. The main focus was to create a culture that routinely used search engines and on-line resources. The narrow goal of the effort was to design a curriculum which utilized resources found on the Internet. More broadly, the goal was to foster "an inquiry approach toward learning which encourages two way communication and which results in a product to be showcased."
Initially, teachers and/or media specialists were identified at the three pilot sites and at Huron High School. A liaison from the school district was hired as a half-time employee of the NSF-UMDL. This position is being funded by the grant from the W.K. Kellogg Foundation, which was awarded to the School of Information and Library Studies to support innovation in the school and its curriculum. The Ann Arbor Public Schools' science coordinator and assistant science coordinator were also included as participants.
Training sessions for teachers and media specialists were set up. These were conducted at the University of Michigan once a month, for a total of 4 all-day sessions. Employees from Pioneer High School, Community High School, Ann Arbor Public Library and Huron High School attended each of the sessions. Participants were given release time from their normal school assignments, and substitutes were provided. The content of the training sessions included using e-mail, connectivity issues, using Web browsers, and designing curriculum. Time was also available during the training sessions for feedback and suggestions.
The deployment effort was successful in exposing a large number of students (over 300), teachers (10), media specialists (4), computer specialists (2), and librarians (4) to the use of digital resources for educational purposes, including research and communication. Consistent with the project goals for this semester, curriculum units that included using the Internet were piloted. Artifacts (reports) from the projects have been collected and will be evaluated further.
Once the systems were in place, they generated considerable interest and enthusiasm among teachers, media specialists and students. There was nearly unanimous feedback that everyone wanted more time to look for resources and explore. Teachers are eager to participate in the project next year and they expect great enthusiasm from their students.
Pioneer High School. Pioneer High School is a comprehensive high school located in Ann Arbor, MI. It has approximately 2000 students in four grades. The media center includes an on-line catalog and circulation system; several computers with modems set up to dial out to specific services including Prodigy and the Ann Arbor Public Library; and a CD-ROM magazine database (ProQuest). There are computers available in labs for business education and data processing classes and computers available for classroom usage on a somewhat random basis.
Participants at Pioneer High School were 4 science teachers (see below), the Media Specialist, and the computer coordinator. The science classes involved were:
Teacher 1: 5 Earth Science classes, 150 Students
Teacher 2: 1 Earth Science class, 28 students
Teacher 3: 1 Earth Science class, 25 students
Teacher 4: 1 Advanced Biology class, 15 students
Total: 8 classes, 218 students
Each class was to be given a research project that included an on-line element as well as a traditional library research element. All of Pioneer's participants were provided with e-mail addresses and began to use e-mail this semester.
Community High School. The original goal of the NSF-UMDL project for Community for this semester was to incorporate Internet searching into the Foundations of Science curriculum, provide e-mail for teachers and students, and showcase student findings on Community's Web site.
Community has been the site for several University research projects, including the Foundations of Science program and a current grant studying project-based science. The environment is "technology rich" -- this year's 9th grade class of 100 students was provided with one Powerbook for every two students through the Project Based Science grant. The Powerbooks are available during every science class and are used several days a week for class activities. Students also check their Powerbooks out for use in other classes and to take home. All of these classes participated in the NSF-UMDL project.
One ninth grade science teacher and the media specialist participated directly in the NSF-UMDL project by attending training sessions. The other two 9th grade science teachers participated indirectly, using the same curriculum unit for their classes as the one designed by the main participating teacher. All three science teachers are part of the Project Based Science grant and share a common planning time. E-mail addresses were provided to the participants who didn't already have them, and all 4 Community High School staff in the project were successfully using e-mail this semester. Twenty tenth graders who were in the initial class of Foundations of Science (FOS) also participated in the NSF-UMDL project.
We are pleased to note that effective July 1, Dr. George Furnas, formerly a senior member of technical staff at Bellcore, became a Professor at SILS. George has a distinguished research record in computer-human interface design and information visualization and is beginning to participate in the NSF-UMDL Project. In particular he will begin by exploring the application of infinitely zoomable interfaces to the NSF-UMDL user interface. This work is being supported in part by an ARPA contract between Bellcore, the University of New Mexico, and the University of Michigan.
We are also pleased to announce that Dr. Margaret Hedstrom, formerly Director of the New York State Archives, has accepted a professorship at SILS beginning September 1, although she is not yet formally involved in the NSF-UMDL project. She will be focusing, among other things, on the design and evaluation of archival and records management systems in the context of digital libraries.
Doug Orr, a research programmer, has recently left the project to pursue entrepreneurial activities in industry. He will continue as an occasional consultant to the project. Doug's responsibilities have been transferred to Dan Kiskis without disruption of progress. Spencer Thomas, an experienced research computer scientist from the UM Center for Information Technology Integration, is joining the SILS team to work jointly on the Mellon Journal Storage (JSTOR) Project as well as NSF-UMDL.
We are in the final stages of recruiting a post-doc for the NSF-UMDL Project. This person will make personal contributions to the architectural research as well as provide management to strengthen the effectiveness of interactions between the research activities of the graduate students and the testbed construction and evaluation.
Comments or questions may be sent to: UMDL.INFO@umich.edu