SemNet:
Three-Dimensional Graphic Representations of Large Knowledge
Bases
Kim Michael Fairchild, Steven E. Poltrock,
MCC
and
George W. Furnas
Bell Communications Research
Table of Contents
1.0 Introduction
2.0 SemNet's Implementation
3.0 Positioning Knowledge Elements
3.1 Mapping Functions
3.2 Proximity Based Functions
3.2.1 Multidimensional Scaling
3.2.2 Heuristics
3.3 Personalized Positioning
4.0 Coping with Large Knowledge Bases
4.1 Fisheye Views
4.1.1 Fisheye Views from Clustering
4.1.2 Fisheye Views from Three-Dimensional Point Perspective
4.1.3 Fisheye Views from Sampling Density.
4.2 The Display of Arcs
5.0 Navigation and Browsing
5.1 Recognizing Locations
5.2 Controlling Location
5.2.1 Relative Movement
5.2.2 Absolute Movement
5.2.3 Teleportation
5.2.4 Hyperspace Movement
5.2.5 Moving the Space
6.0 Showing Dynamic Execution of a Knowledge-Base
7.0 Conclusion
8.0 References
1.0 Introduction
In a few decades, computers have increased the information available
to people by many orders of magnitude. The rate of this information
explosion is continuing to increase, straining the ability of
computer users to comprehend or manage available data. Fifth-generation
computers are expected to intensify this problem, but simultaneously
to help manage the information. With their ability to quickly
access and manipulate symbolic knowledge, fifth-generation computers
should revolutionize problem solving and data management. But
symbolic knowledge is also a kind of information, different from
numeric data, and poses new problems for computer users. How should
large data bases of symbolic knowledge (i.e., knowledge bases)
be presented to users? How should users explore, manipulate, and
modify these knowledge bases? SemNet was developed to address
these questions.
SemNet is an exploratory research project undertaken to advance
understanding of the problems facing both users and developers
of large knowledge bases. The immediate objective of the SemNet
project is to identify important problems and a collection of
possible solutions to these problems. The problems and solutions
that have been investigated, reflecting the disciplines of the
authors, derive from the convergence of computer science, measurement
and scaling, and cognitive psychology. Because this research explores
uncharted territory, our emphasis was on identifying and informally
evaluating many alternatives instead of conducting formal evaluations
of a few alternatives. A longer term objective is to develop a
generic approach to knowledge-base browsing and editing by combining
and optimizing the best solutions.
The major problem addressed in the SemNet project was how to present
large knowledge bases so they can be comprehended by a user or
developer. To comprehend a knowledge base, we hypothesize, a user
must recognize (1) the identities of individual elements in the
knowledge base, (2) the relative position of an element within
an hierarchical context, and (3) explicit relationships between
elements. Consequently, research was focused on ways to represent
elements and their interrelationships within the context of a
large knowledge base.
SemNet represents knowledge bases as directed graphs in a three-dimensional
space (Fairchild, 1985; Fairchild & Poltrock, 1986). SemNet represents
knowledge bases graphically because knowledge bases represent
information about relationships between symbolic entities, and
graphics are an effective way to communicate relationships among
objects. Furthermore, we wanted to exploit the skills that people
have already developed for recognizing visual patterns and moving
in three-dimensional space.
Figure 1 shows how SemNet represents part of a knowledge base
containing Prolog rules. SemNet represents elements of the knowledge
base as labeled rectangles connected by lines or arcs. The arcs
are color coded to represent specific kinds of relationships between
the knowledge-base elements. In Figure 1 the rectangles represent
Prolog modules labeled with the module names. The arcs show the
modules that each module can cause to execute. The knowledge base
is explored by traveling through the three-dimensional space.
A user of SemNet manipulates information in the knowledge base
by direct manipulation of knowledge-base objects, either the rectangles
or the arcs, or by manipulation of tools, which are the blue rectangles
at the top and bottom of Figure 1.
-------------------------------
Insert Color Plate 1 about here
-------------------------------
Three-dimensional graphic representations of knowledge bases,
such as the one shown in Figure 1, can help reveal the organization
of the knowledge base. However, graphic representations do not
automatically solve all problems associated with exploring, manipulating,
and modifying very large knowledge bases. They simply transform
very large knowledge bases into very large directed graphs. The
major emphasis of research with SemNet was to investigate solutions,
or partial solutions, to the problems that arise when working
with three-dimensional graphic representations of large knowledge
bases. Other approaches to management of large amounts of information
are presented in chapters on the Memory Extender by Jones and
on Hypertext Systems by Halasz, Conklin, and Gullichsen. In this
chapter, the major problems are defined, some potential solutions
are explored, and the particular solutions implemented in SemNet
are described.
2.0 SemNet's Implementation
SemNet is intended to be a general purpose tool for exploring
and manipulating arbitrary knowledge-bases represented as directed
graphs. SemNet can be used as an interface, or part of an interface,
to arbitrary knowledge-base applications. This section describes
how applications communicate with SemNet, how different kinds
of knowledge bases may be represented, and the hardware required
for SemNet.
Figure 2 shows the components of the SemNet system. These components
are implemented on two machines, a Silicon Graphics IRIS Workstation
and a machine containing the application program and the knowledge
base. In the IRIS, the SemNet user-interface layer, written in
C, controls the display and allows the knowledge base to be explored
and edited through direct manipulation of graphic objects. In
the application machine, the SemNet knowledge-base interface layer,
written in LISP, connects arbitrary LISP applications and knowledge
bases to the SemNet user-interface layer through a control language.
These two layers (the user-interface layer and the knowledge-base
interface layer) comprise a general purpose system, and may be
attached to arbitrary knowledge-base applications.
When the system is initialized, pointers to every element of the
knowledge base are sent by the knowledge-base application to the
knowledge-base interface layer. The knowledge-base interface
layer constructs a description of each of the elements and their
interconnections and sends these descriptions via ethernet or
a serial connection to the SemNet user-interface layer. The user-interface
layer generates the SemNet graphical display. The user, via the
mouse, edits the graphic representation as desired. The resulting
changes are sent to the knowledge-base interface layer, which
in turn passes them to the application for interpretation. The
resulting knowledge-base modifications are then sent back through
the knowledge-base interface to the SemNet user-interface layer
for modification of the graphic display.
-------------------------------
Insert Figure 2 about here
-------------------------------
All the components shown in Figure 2 are required to use SemNet
with arbitrary knowledge base applications. However, for many
purposes the SemNet user-interface layer can be used in a stand-alone
mode. The SemNet user-interface layer can read a file containing
descriptions of the knowledge base like the descriptions generated
by the interface layer. The user can browse and manipulate the
graphic representation of the knowledge base, but all editing
functions that require the application are disabled. This stand-alone
mode is useful for training new users, investigating new graphic
techniques, and for demonstrations.
When SemNet is integrated with a knowledge-base application, a
mapping must be defined between elements of the knowledge base
and the graphical objects that represent these elements in SemNet.
This mapping defines a graph structure representation of the knowledge
base. For a knowledge base represented using frames, this mapping
is straightforward. Each frame is represented as a labeled rectangle
(the nodes of the network) and slots of the frames are represented
as lines (or arcs), color coded to indicate slot type. For a knowledge
base represented using rules, defining the mapping requires some
ingenuity and analysis of the task to be performed using SemNet.
For example, the nodes of the network could represent the antecedents
and consequents of rules with arcs representing implication.
SemNet has been used by linguists in the Natural Language Project
at MCC (Slocum & Cohen, 1986) to debug the morphological analyzers
that they are developing. These morphological analyzers are knowledge
bases expressed as rules organized into groups. SemNet represents
each group as a node. The arcs connecting the groups represent
paths for messages passed between the groups. Messages passed
from one rule group to another are represented as labeled objects
that travel along the arcs connecting nodes, providing a dynamic
simulation of the morphological analysis. A formal evaluation
of SemNet was conducted in this setting (Shook, 1986) and served
as the foundation for many of the evaluative statements that follow.
3.0 Positioning Knowledge Elements
A principal reason that SemNet represents a knowledge base as
a graphical network is to reveal the organization or structure
of the knowledge base. One of the obstacles to comprehending the
organization of the knowledge base is the large number of arc
crossings in any large knowledge base. An arbitrary graph cannot,
in general, be embedded in a plane without some arcs intersecting.
In a flat display, such intersections impede the eyes' efforts
to trace interconnections, and crossing points may also be visually
confused with nodes. Fortunately, if three dimensions are available,
nodes can always be assigned to positions such that no arcs intersect.
Of course, arcs still intersect in two-dimensional views of the
knowledge base, but when the viewpoint moves, users perceive a
three-dimensional space, and the arcs no longer appear to intersect.
The resulting visual simplification of the structure was one of
the primary motivations for choosing a three-dimensional representation
for the SemNet interface.
The positions of the knowledge elements in three-dimensional space
also influence how effectively the knowledge base organization
is revealed. In Figure 1 the assigned positions do not seem too
important because relatively few nodes are displayed; Figure 1
shows only a small part of a much larger knowledge base. The entire
knowledge base is displayed in Figure 3 with each knowledge element
randomly assigned to a position. In Figure 3 the labels have been
removed from the nodes and the node size has been reduced. No
clear organization emerges in Figure 3; randomly assigned positions
cause a dense maze of interconnections and obscure relationships
among the elements.
-------------------------------
Insert Color Plate 3 about here
-------------------------------
The problem is how to assign positions to knowledge elements so
that the organization or structure of the knowledge base is maximally
apparent to the user. A general solution to this problem probably
does not exist; the optimal method of position assignment for
one knowledge base may be totally inappropriate for another. Some
knowledge bases may contain information that can be used to compute
effective positions. For other knowledge bases, the user may have
information that is not stated explicitly in the knowledge base,
and this information may allow the user to assign effective positions.
Three potential sources of information for determining the positions
of knowledge elements have been explored in SemNet. First, the
positions of knowledge elements can be based on their properties
through mapping functions, described below. Second, the connectivity
between knowledge elements can be used to assign related elements
to adjacent positions. Third, the user can assign knowledge elements
to positions based on information that is not represented in the
knowledge base.
3.1 Mapping Functions
In many cases, the elements of a knowledge base have properties
that can be mapped to positions in three-dimensional space, establishing
a spatial representation of the knowledge base organization. Suppose,
for example, that SemNet was used to browse a semantic network
describing the properties of all mammals. Rips, Shoben, and Smith
(1973) found that people organize animals along three principal
dimensions: size, predacity, and domesticity. If these three properties
are defined in the knowledge base, or can be computed from other
properties in the knowledge base, then these properties can be
transformed into x, y, and z coordinates. The result should be
a spatial representation of the knowledge base that closely corresponds
to the user's conceptual representation. Hamsters and tigers would
be assigned opposite values on all three dimensions of this space.
In a large knowledge base there may be many dimensions that could
be used to organize the knowledge base, depending on the user's
task. In SemNet, the user may select the dimensions from a predefined
list.
To use mapping functions in SemNet, the application knowledge
base must contain a set of ordering functions, each of which produces
an ordering of all knowledge elements. The names of these ordering
functions are passed to the SemNet user-interface layer (see section
2.0). When a user accesses the MappingFunctions tool, SemNet presents
a list of the ordering functions. The user indicates which functions
should correspond to each of the three display dimensions. SemNet
sends these selections to the application, which returns normalized
coordinates for each knowledge element.
3.2 Proximity Based Functions
In many knowledge-base applications, the relationships between
knowledge elements may provide the key to understanding the knowledge-base
organization. These relationships are easier to perceive if related
knowledge elements are close together and unrelated knowledge
elements are far apart. Several techniques have been explored
in SemNet for determining knowledge element positions from the
relationships between knowledge elements.
3.2.1 Multidimensional Scaling
One technique uses multidimensional scaling to assign positions
to elements so that two elements directly connected by an arc
are closer together than elements with no direct connections.
If two elements can be connected by more than one arc, the technique
assigns positions so that the distance between two elements decreases
as the number of arcs connecting them decreases. A test of this
technique was conducted using a knowledge base of 53 elements
that all had connections to other knowledge elements.
The first step in this technique is to construct a matrix summarizing
the interconnections among the knowledge elements. The entry for
row i and column j of the matrix is the number of slots in element
i that point to element j. This matrix is made symmetric by summing
across the diagonal (the element in row i, column j is added to
the element in row j, column i). The symmetric matrix represents
the number of interconnections between elements without regard
to the direction of the connection.
The positions of the knowledge elements are defined by nonmetric
multidimensional scaling applied to this symmetric matrix. This
scaling technique yields a three-dimensional solution such that
the distance between elements is monotonically related to the
number of interconnections between the elements, to the extent
that is possible. In our test, the KYST program (based on Kruskal,
1964a,b and described in Kruskal, Young, & Seery, 1973) with monotone
regression and a Torsca starting procedure was used to perform
this analysis. The choice of a starting procedure is important
because the program iteratively adjusts the elements' positions
and may reach a local optimum or saddle point from poor initial
values. After 50 iterations the stress was 0.02, which indicates
a good fit to the requirement of monotonicity, and changes in
the position of elements were in the third decimal place. The
final, three-dimensional solution was rotated to principal components.
To test the effectiveness of this technique, SemNet was initially
given random positions for the 53 elements of the knowledge base.
Qualitatively, this representation of the knowledge base was unaesthetic
and confusing. Many of the arcs connecting the knowledge elements
were long, and the arcs nearly obscured the knowledge elements.
Then the elements were moved to the locations determined by the
multidimensional scaling analysis. The results were striking.
Not only were both ends of most arcs visible on a single screen,
but since the arcs tended to be short, they were easier to trace,
and even their two-dimensional projections had fewer crossings.
It became apparent for the first time, for example, that our
set of test nodes was in fact two disconnected subsets. There
were further beneficial effects: Interesting subsets of interconnected
nodes became visible, highly connected nodes took central positions,
and more isolated nodes moved to the periphery.
Applying this multidimensional scaling technique to large knowledge
bases poses some problems. Iterative nonmetric multidimensional
scaling programs are slow, and therefore inappropriate for real-time
analysis of very large knowledge bases. This problem may not be
insurmountable, however. The analysis could be performed off-line
(overnight, for example), and simple heuristics could be used
to position new knowledge elements as they are defined. For example,
an element connected to more than one other element could be positioned
at the centroid of all the elements' positions.
3.2.2 Heuristics
Other techniques use heuristics to move related knowledge elements
closer together. These heuristics require an initial position
for every element, possibly a randomly chosen position, and a
specified minimum distance between any two knowledge elements.
This minimum distance ensures that nodes remain far enough apart
to be discriminable. The heuristics define a new position for
each element of the knowledge base, one at a time, and are applied
iteratively until the knowledge elements no longer move noticeably.
The centroid heuristic defines the new position of a knowledge
element as the weighted mean of the positions of all related knowledge
elements, with each position weighted by the number of arcs that
directly connect the elements. Because of these weights, a knowledge
element moves closer to those knowledge elements with which it
shares the most relationships. This heuristic could assign different
weights to different kinds of relationships, accommodating differences
in relationship importance. If the computed centroid is too close
(less than the specified minimum distance) to any knowledge element,
a new position is found along an arc between the original position
and the centroid.
The centroid heuristic is fast and could readily be used in conjunction
with the multidimensional scaling approach described above. The
solution obtained is, however, strongly dependent on the initial
positions. In tests we conducted with a large knowledge base,
the centroid heuristic improved the original random positions
significantly, but never untangled the complex web of interconnections.
An annealing heuristic was motivated by recent research on simulated
annealing! (Kirkpatrick, Gelatt, & Vecchi, 1983). This technique
is slower than the centroid heuristic but more successful at untangling
the interconnections. A new position for each element is computed
by adding a vector of random orientation and length to the current
position. A scale factor controlled by the user determines the
maximum length of the random vector. The element is moved to the
new position if it is closer than the current position to the
centroid, but not too close to any other element. By setting the
scale factor high initially, knowledge elements can be moved large
distances, potentially untangling the interconnections. When new
positions are no longer found, the scale factor is gradually reduced
until the changes in positions are acceptably small.
Like the centroid heuristic, the annealing heuristic is sensitive
to initial positions, but it yielded solutions that were judged
substantially more effective than the solutions of the centroid
heuristic. Figures 3 and 4 show the same knowledge base with positions
assigned randomly in Figure 3 and positions adjusted by the annealing
heuristic in Figure 4. An organization of the knowledge base is
clearly visible in Figure 4, making the knowledge base appear
smaller and more comprehensible.
-------------------------------
Insert Color Plate 4 about here
-------------------------------
3.3 Personalized Positioning
In some cases, the user may have information that is not in the
knowledge base but should be considered when assigning positions
to the knowledge elements. The user can simply move the knowledge
elements to these positions and leave them there. This is the
location method we use in our daily lives: We put things where
we want and expect them to be there next time we look for them.
This method can be combined with other positioning techniques.
For example, the user can fix the positions of certain knowledge
elements, then let all others be adjusted by the annealing or
centroid heuristic. These heuristics will treat the fixed positions
as anchors, and all connected knowledge elements will be grouped
around these anchors. For example, a linguist could fix one verb
analysis node and one noun analysis node at opposite ends of the
space. Then the annealing or centroid heuristic could be used
to pull apart the remaining noun and verb nodes.
4.0 Coping with Large Knowledge Bases
Whatever strategies are used to organize the display of a graph,
the number of nodes and arcs can eventually be overwhelming. Two
limitations affect performance with large knowledge bases. First,
the graphic hardware is limited in the speed with which it can
compute and display information. As the number of objects to be
displayed increases, the system's responsiveness decreases, degrading
the real-time user interaction. Second, humans are limited in
their ability to discriminate and attend to objects on the display.
Somehow, the total amount of information must be reduced. We addressed
this issue in SemNet by inventing ways to find and display useful
subsets of the knowledge base. The simplest methods involve selection
by the user. More sophisticated methods were derived from Generalized
Fisheye Views (Furnas, 1986).
When knowledge elements are of different types and not all types
are of current interest, a simple subset-by-type strategy can
be adopted. The user initiates this strategy by simply identifying
which type of element should be displayed. For example, a user
could specify that only mammals should be displayed in a knowledge
base about animals. This capability has not been fully implemented
in SemNet, but can be accomplished through interacting with the
application program.
The underlying spatial metaphor of SemNet allows several strategies
for dividing the knowledge base into subsets using spatial criteria.
One strategy is a natural consequence of the three-dimensional
graphics hardware, which shows only the part of the knowledge
base within a restricted viewing angle when looking in a particular
direction from the viewpoint. This allows the user to concentrate
on the subset of nodes within this viewing angle. Moreover, the
perspective view makes objects nearer the viewpoint appear larger,
helping the user to examine local neighborhoods more effectively.
These local neighborhoods will be comprehensible only if related
elements are within the same neighborhoods. In other words, proximity
in euclidean space should correspond to proximity in the graph
structure. In general, however, it is not mathematically possible
to achieve perfect correspondence between proximity in arbitrary
graph structures and proximity in three-dimensional euclidean
space, so SemNet provides a view_neighbors tool that temporarily
pulls all the nodes adjacent to any selected node into view. The
view_neighbors tool is described in more detail in section 5.2.4.
4.1 Fisheye Views
In addition to spatial subsets, SemNet's underlying spatial metaphor
also supports more sophisticated information reduction strategies
called Generalized Fisheye Views (Furnas, 1986). These strategies
display details near a focal point and only more important landmarks
further away. Such views attempt to give a useful balance of local
details and surrounding context. Conceptually, these views are
implemented by computing a degree-of-interest value for every
object, then displaying only those objects that have a degree-of-interest
value greater than a criterion. The degree-of-interest value for
any object increases as the a priori importance of the object
increases, and decreases as distance increases from the object
to the point where the user is currently focused. The following
sections discuss several generalized fisheye viewing strategies,
two of which, one explicit and one implicit, are currently implemented
in SemNet.
4.1.1 Fisheye Views from Clustering
Construction of a fisheye view requires that (1) interactions
can be characterized as focused at a point in a reasonably static
structure, (2) there be a notion of distance, and (3) there be
a notion of differential a priori importance of objects. The first
two prerequisites were readily satisfied in SemNet. The knowledge
base is the stable structure, the focal point is a location in
three-dimensional space, and distance can be defined as euclidean.
Notions of differential importance, however, were more elusive.
One might consider nodes with more incident arcs to be more important.
Or perhaps the semantics of the domain might associate importance
with some other feature of the network. For example, in a knowledge
base, nodes higher in an ISA-sublattice might be considered more
important. Or there might be purely non-structural application-specific
importances (e.g., creation date). Any of these definitions of
importance could be used in a standard fisheye strategy that eliminates
less important nodes as distance increases. While all these definitions
seemed reasonable for particular applications, none seemed sufficiently
general.
The absence of satisfactory explicit notions of differential importance
led us to pursue new kinds of generalized fisheye views, all based
on somehow identifying implicit notions of differential importance
and making these notions explicit. For example, even though no
individual knowledge-base element may be more important than others,
sets of elements may exist that are more important than individual
elements. If we could find a way to identify such sets, then an
implicit structure could be made explicit by (1) introducing new
cluster-objects that represent sets, and (2) assigning greater
importance to these cluster-objects than to individual elements.
Consider the consequences that would emerge if we succeeded in
finding ways to identify important sets of elements. Every element
could be assigned to a unique set represented by a cluster-object,
then every set could be assigned to a superset represented by
another kind of cluster-object, and so on. The result would be
a new hierarchical structure, with knowledge-base elements at
the bottom and successive layers of cluster-objects above the
objects that they contain. In this metastructure, importance corresponds
to height in the hierarchy even though no general definition of
differential importance was available in the original structure.
Furthermore, successively higher layers of the metastructure have
fewer and fewer objects in them, allowing a fisheye strategy to
be implemented by displaying knowledge elements near the focal
point but only cluster-objects further from the focal point. As
distance from the focal point increases, the displayed cluster-objects
correspond to higher levels of the metastructure. The user can
concentrate on knowledge elements in the neighborhood of the focal
point in the global context of sets of elements represented by
cluster-objects.
How should these cluster-objects be represented? They should probably
be discriminable from individual knowledge elements, so that the
user can understand the distortion or transformation of the view
introduced by the fisheye process. For example, they may be represented
by a special cluster icon carrying some identifying information.
This information might indicate the level of the cluster and some
sort of label. They might be labeled by an explicit cluster-name
(if one is available), or labeled by example, i.e., displayed
with a short list of some knowledge elements inside the cluster,
either randomly or systematically chosen (cf., Dumais & Landauer,
1984). Alternatively, one might try to make semantically appropriate
pictographs for each individual pseudo-object, but this seems
difficult to automate.
Recognizing the advantages of identifying important sets of elements,
we required a method for assigning the elements to these sets.
In SemNet euclidean neighborhoods were used to assign elements
to sets, which again emphasizes the importance of the positions
assigned to elements. If semantic information is used to assign
positions to the elements, then these euclidean neighborhoods
may approximate meaningful subdivisions of the elements. Simple
hierarchical clustering was imposed on the nodes by dividing the
space in half along the x, y, and z axes. Each of the resulting
eight sub-regions was then itself similarly subdivided, and so
on recursively down three levels. This successive octal subdivision
yields a rooted 8-ary tree, with the SemNet objects partitioned
among its leaves. The internal nodes of the tree are the cluster-objects.
Figure 5 shows the resulting SemNet display for one view of the
same knowledge base depicted in Figures 3 and 4. The objects that
are displayed in this fisheye view depend on the position of the
viewpoint, which the user controls using the mouse. In the middle
of the display, knowledge-base elements, such as yesno and quantify,
are visible because these elements are in the same subdivision
as the viewpoint. The large green rectangles, such as pt and is,
are cluster-objects that represent the subdivisions adjacent to
the subdivision containing the viewpoint. The orange rectangles,
such as terminal, are cluster-objects representing sets of four
subdivisions that are further away from the current subdivision.
This fisheye technique of representing the more remote regions
in correspondingly larger chunks essentially reduces the number
of objects to be displayed on the screen logarithmically, yet
preserves a balance of local detail and global context. A blue
band at the bottom of each cluster-object represents the proportion
of the knowledge base located within the volume represented by
that icon. Each cluster-object is positioned at the center of
the volume it covers and labeled with the name of its most highly
connected node (a label-by-example heuristic).
-------------------------------
Insert Color Plate 5 about here
-------------------------------
Note that tree distance is used to determine the degree-of-interest
value for all the objects. Cluster objects are interesting because
they have high a priori importance, but knowledge-base elements
in one subdivision are interesting because they are all equally
close (in tree distance) to the viewpoint. Although the tree is
based on a subdivision of euclidean space, tree distance is not
identical to distance in euclidean space. Tree distance was chosen
because the subdivision structure is static and easily computed,
and the algorithms for what to display in any tree-structure fisheye
are very fast. However, tree distance has several drawbacks. First,
when the viewpoint is moved, the objects that are displayed change
abruptly at (invisible) subdivision boundaries. This distracting
effect is ameliorated by using euclidean distance to tell when
a boundary is being approached, then highlighting the clusters
that are about to change. For example, in Figure 5 the is and
pt cluster objects have light green borders, indicating that the
viewpoint is near the boundaries of those subdivisions. If the
viewpoint moves much closer, the knowledge-base elements in one
of those subdivisions will be exposed and the current subdivision
will be represented by a cluster object. A second drawback of
tree distance is that the subdivision structure is very heterogeneous
with respect to the embedding euclidean space. As a result, a
small motion crossing a low-level subdivision boundary has a much
smaller effect than a similar motion crossing one of the highest
level subdivisions (in which case everything changes). We were
unable to fix this problem since, as far as we know, it can be
remedied only by adopting some other metastructure (e.g., a non-hierarchical
clustering model).
We have already noted that the subdivision strategy for assigning
knowledge-base elements to sets increases the importance of the
positions assigned to the elements. In order for any spatial subdivisions
to be meaningful, elements must first be placed in the space so
that nearby elements are closely related. Otherwise the content
of the spatially defined clusters will be essentially random,
and the resulting metastructure will not have captured the implicit
importance structure. The fisheye view would not make sense. Thus,
either very meaningful mapping functions or proximity-based node
placement techniques must be used. Furthermore, the octant subdivision
strategy implicitly assumes that cuts perpendicular to the x,
y, and z axes are the most appropriate. This may be defensible
when coordinates come from mapping functions that make these axes
meaningful. When object positions result from proximity-based
techniques however, the final orientation of the cloud of points
is typically arbitrary@, so the cuts will be similarly arbitrary.
One possible solution would be to do the clustering more intelligently,
based on the distribution of the nodes and not simply on nested,
binary, orthogonal subdivisions.
4.1.2 Fisheye Views from Three-Dimensional Point Perspective
The SemNet interface contains another strategy for managing size.
It comes naturally from the way three-dimensional graphics are
used, and it exemplifies another generalized fisheye strategy
for worlds that do not have explicit differential a priori importance.
A balance of local detail and global context arises automatically
from the geometry of point perspective in three dimensions: A
few nearby points loom large and those further away appear smaller
and smaller. We argue that this is not trivial -- an orthogonal
projection of three-dimensional space would not achieve this fisheye-like
balance, and would be correspondingly less useful. This ability
of a movable perspective viewpoint in three-dimensional space
to help deal with size and complexity through a combination of
geometric subsets and fisheye views was the original motivation
for its use in the SemNet interface.
It is useful to analyze this fisheye effect, not in terms of the
geometry of projection, but in terms of a general metastructure
strategy. This time, however, the metastructure is not a hierarchy
of clusters. Instead, each SemNet object is associated with a
continuous set of graphic representations that differ only in
size. These graphic representations are the same as those that
result from perspective geometry. Combining these sets with the
original knowledge-base structure yields a metastructure with
a new size dimension orthogonal to the original structure. Now
assume that the a priori importance of a graphic representation
is inversely related to size: Smaller images of each object are
more important than larger images. This assumption may seem counterintuitive,
but reflects the intuition that it is more important that an object
be represented, even if it is small, than not represented at all.
The fisheye view that results from this elaborated importance
structure and euclidean distance is equivalent to the view that
results from perspective geometry. Since the higher level graphic
representations are smaller, the resulting views are reduced compared
to a non-fisheye view (such as would result from an orthogonal
projection). The geometric knowledge embedded in the three-dimensional
graphics hardware selects the correct size for each object in
the view quickly and automatically.
Theoretically, this fisheye conceptualization supports other transformations
of the objects' representations. For example, the objects could
change in form as well as size. The most detailed representation
of an object (for use at close range) might be an icon containing
the node's full English title and perhaps other useful information.
Next might be a smaller box containing only the node's title,
next only a small unlabeled box, and finally just a small point.
The fisheye view would select the appropriate form and size for
each knowledge-base element by first computing degree-of-interest
values from distance and a priori importance, then displaying
the representations with values closest to the criterion. At present,
we simply make use of the IRIS hardware and the geometric fisheye;
consequently, objects change only in size.
4.1.3 Fisheye Views from Sampling Density.
In addition to the two fisheye strategies currently implemented,
there is at least one other strategy that we could have used that
does not depend on explicit values of differential a priori importance.
Consider a structure whose layout exhibits a kind of graded spatial
autocorrelation between the objects: Only small changes happen
from neighbor to neighbor in the space; major changes happen only
gradually. In pictures, for example, local changes usually just
delineate small details. In such a case, sampling density can
be used to make explicit the distinction between major and minor
features. In a picture, the image would be sampled very densely
near the point of interest to give high resolution and detail.
Regions successively further away would be sampled less and less
densely, and only major changes in the image would be visible.
This corresponds exactly to what the human retina does to achieve
information compression. In SemNet a comparable strategy could
be implemented since a proximity-based node layout should give
the right kind of autocorrelational structure. Thus, one would
sample the three-dimensional space densely near the current point
of view, showing all objects. Further away, only a correspondingly
smaller proportion of the objects would be permitted in the view
-- only a coarse sampling to give the gist of the more remote
regions.
4.2 The Display of Arcs
Two issues arise in the display of arcs. First, if not all nodes
are visible, which arcs should be shown? In SemNet a simple strategy
was adopted: Show a directed arc if and only if its origin node
was explicitly visible.
A second problem, one we did not address, involves what to do
when the number of arcs, not the number of nodes, is extremely
large. This problem can be solved by following a similar approach.
First, strategies for dividing the arcs into sets are needed.
If, as in a knowledge base, the arcs are of different types (e.g.,
different semantic relations), those arcs corresponding to types
that are not of current interest may be deleted. Fisheye strategies
are also possible if arcs have differential importance -- for
example, those of the ISA-sublattice in the knowledge base. Then
less important arcs would be deleted as distance increased.
5.0 Navigation and Browsing
To explore graphical representations of large knowledge bases,
such as those shown in Figures 3 and 4, the user must be able
to move the viewpoint or move the entire knowledge base. Suppose,
for example, a user needs to inspect a particular knowledge element
and all its connections in Figure 4. This task could be accomplished
by moving the viewpoint close to that element, restricting the
view to the small set of elements in a local region. Although
this technique provides a powerful method for coping with the
size and complexity of the knowledge base, it raises a new set
of problems associated with navigating in a virtual three-dimensional
space. These problems can be decomposed into two aspects of navigation:
recognizing locations and controlling locations.
5.1 Recognizing Locations
How does the user know where the current viewpoint is in a complex
knowledge space? One solution is to provide navigation aids similar
to those found in the real world. For example, when traveling
in a strange city, people use abstract scale models or maps to
discover where places are in relation to each other and how to
get to new places. Landmarks, both natural and man-made, often
help to identify the current position. SemNet provides tools
similar to maps that show the current position of the viewpoint
in the x-y and the x-z planes. Structures similar to landmarks
may occur naturally when knowledge elements form structures that,
with experience, become recognizable.
When following a complex path in the real world, people sometimes
leave marks so the path can be retraced later. This approach could
be adopted in SemNet, allowing the user to place identifiable
markers on each section of the knowledge; for example, an icon
for an elephant could represent information about large African
animals. In addition, SemNet could show the path that was followed
to reach the current location and provide automatic means for
retracing it. Path retracing could have the added benefit of helping
to re-establish the context that led to the current position.
These capabilities do not currently exist in SemNet.
>From our experiences with navigating through a three-dimensional
knowledge space, the single most important feature of the interface
is to make the user experience a real, three-dimensional space.
diSessa (1985) terms such a user a naive realist - that what the
user sees and manipulates on the display screen is the knowledge
itself, rather than simply an interface to actions that manipulate
an invisible system. The same control movements the user already
makes to control the real world should map directly into the virtual
world. One important component of this is the quality of the three-dimensional
imagery (Poltrock et al, 86). Real-time movement of graphical
objects, motion parallax, graphic depth cues, and stereopsis will
enhance this display.
Many cues help a user to recognize the current static position
of the viewpoint. The organization of the knowledge base and features
of the graphical objects provide perceptual information about
the viewpoint position. For example, objects near the viewpoint
are larger than objects far from the viewpoint. Other ways that
graphical objects communicate location were described in section
4.0. A map of the space is provided by an Absolute Positioning
tool (the blue rectangle at the lower left in Figures 1, 3, and
4), which shows the current position of the viewpoint.
Because the space is three dimensional, the relative depth of
objects is important too. When the viewpoint is stationary, however,
the knowledge base looks flat because there are few effective
cues to depth. Two methods were explored for enhancing the depth
effects in a static display. One method is to vertically oscillate
the viewpoint, creating the impression that the knowledge base
is rocking around the x axis. Although this method improves perception
of depth, linguists in MCC's Natural Language Project found it
distracting. The second method is to make small random movements
of the viewpoint. This method appears to be as effective as vertical
oscillations and is more pleasant to watch.
5.2 Controlling Location
How should the user control the position of the viewpoint, thereby
determining the portion of the knowledge base that is displayed?
Five different methods have been explored for either moving the
viewpoint in a complex three-dimensional knowledge space or moving
the knowledge base itself.
5.2.1 Relative Movement
The first method implemented in SemNet evolved from past experience
with real-time visual simulation systems for training pilots to
fly helicopters. This method provides independent controls for
three orthogonal rotations of the viewpoint and movement forward
and backward along the line of sight. This movement method is
accessed by selecting the helicopter flight option on SemNet's
main system menu. A submenu allows the user to select roll, pitch,
yaw, or forward and backward movement. Tools for adjusting the
velocity of movement and rotation are also provided.
This relative movement method has proven difficult to use for
several reasons. First, changes in orientation are confusing,
perhaps because there are no visual cues such as the gauges provided
in helicopters and airplanes to indicate the current orientation.
Consequently, users quickly become lost and disoriented. To help
users recover their orientation, functions were added to the flight
menu that return the viewpoint to the center of the space and
level off the orientation. Second, the relative movement method
is slow and awkward to use. Users must often iteratively adjust
one control after another to reach a location that is readily
visible on the display. Characteristics of the available control
devices (i.e., the mouse and keyboard) may be largely responsible
for these problems. If the user had a joystick and accelerator
that could be manipulated simultaneously, the helicopter model
would probably be an effective method for controlling viewpoint.
Some of the three-dimensional control devices that are gradually
becoming available commercially may provide acceptable substitutes
for an accelerator and joystick.
5.2.2 Absolute Movement
With a map of the three-dimensional knowledge space, the user
can point to the desired viewpoint location. One kind of map used
in SemNet is shown in Figures 1, 3, and 4. Because the space is
three dimensional, the map shown in the figures has two two-dimensional
parts, one representing the x-y plane and the other representing
the y-z plane. The position of the viewpoint in the three-dimensional
knowledge space is represented by an asterisk in each plane of
the map. The user manipulates the position of the viewpoint by
moving the asterisk in one plane at a time using the mouse. A
filter ensures that the viewpoint moves smoothly, retaining the
experience of travel through a three-dimensional space.
This absolute movement method is quicker and easier to use than
the relative movement method described above. It poses new problems
however. First, viewpoint positioning is not very accurate; it
is easy to move rapidly to an approximate position but difficult
to adjust the viewpoint precisely. This problem could be solved
by allowing the user to adjust the filter so the viewpoint moves
more slowly. The second problem is that using two-dimensional
maps of a three-dimensional space imposes an extra cognitive demand
on the user, forcing the user to determine which map to use and
which direction to move the mouse to cause desired movements.
Moving the mouse up and down in the x-y plane or left and right
in the y-z plane causes the viewpoint to move along the y axis.
It is easy to confuse these motions and difficult to determine
the correct mouse movements required to move the viewpoint toward
a particular object. This unnaturalness may be solved when three-dimensional
control and display devices are available, allowing a direct coupling
of the user's actions and the display consequences.
Two different absolute movement tools have been implemented in
SemNet to explore alternative ways of controlling three-dimensional
movement in two-dimensional maps. One tool, shown in the figures,
displays two maps; the other tool shows three maps corresponding
to the x-y, y-z and z-x planes. The third map is, of course, redundant;
it provides no information that is not available in the other
two maps. More importantly, users rarely used the z-x map, but
restricted their attention to the other two maps.
The tool with three maps also presented a miniature version of
the knowledge base, which was expected to help the user navigate.
Any advantage of this information was overwhelmed, however, by
the heavy processing load that it imposed on the computer. Movements
were less smooth because of processing required to maintain the
map images.
5.2.3 Teleportation
Teleportation, a movement method based on cache table principles,
works nicely with other movement methods. Once the user has been
to a location in the knowledge space, the chance of needing to
go back to that same location increases dramatically. Typically,
when entering new information into a knowledge-base, a user will
find information that is similar to the piece to be entered, make
a copy, and edit it. Teleportation allows the user to pick a recently
visited knowledge element from a two-dimensional list (a menu)
and instantly move to the location of the knowledge element. By
selecting options from this menu, the user could pop to a location,
make a copy of a knowledge element, then pop back to the original
location to continue editing. This concept can easily be extended
to create various types of bookkeeping lists that store concepts
to be added, locations of interesting places, and concepts to
be tested.
5.2.4 Hyperspace Movement
Browsing a knowledge base typically involves examining knowledge
elements one at a time, observing which knowledge elements are
related to the one under examination, then following a selected
relationship to examine another knowledge element. The hyperspace
movement method facilitates this kind of browsing. It is particularly
useful in a large knowledge base in which the relationships among
knowledge elements are not easily recognized and in situations
in which a user is searching for an element similar to one that
has been found. In a knowledge base about animals, a user could
follow the eats relationship from tiger to mongoose to snake to
hamster, thereby traveling from one end of the knowledge base
to the other.
Hyperspace movement is accomplished by first selecting a function
called view_neighbors. This function temporarily moves all the
nodes connected to the selected knowledge element, positioning
them around it. When this function is de-activated, or when a
new node is selected, these nodes snap back to their original
positions. If the next node selected is one of the moved nodes,
the viewpoint follows this node back to its original position,
the newly selected node is in the center of the display and all
related nodes are distributed around it.
The hyperspace movement method is particularly useful when clustering
methods are used to compress the information in the knowledge
base (see section 4.0). If an element is related to an element
in another cluster, this method can be used to pull the connected
element from the cluster. If the user decides this element is
important, a single keystroke will move the viewpoint to this
new element.
5.2.5 Moving the Space
All the movement methods described above involve moving the viewpoint
through a static knowledge space. It is not clear that this is
the best way to represent the movement task. Perhaps movement
control would seem more natural to a user if it was conceptualized
as manipulating the position of the knowledge base instead of
manipulating the viewpoint. In support of this alternative, research
has shown that when children are asked to describe how a structure
would appear after some movement, they are more accurate if they
are asked to imagine that the structure is rotated than if they
imagine themselves moving to another viewpoint. Three SemNet tools
allow the user to set the attitude or orientation of the knowledge
space while keeping the viewpoint fixed. Initial findings suggest
that this movement method is effective but it has not been systematically
compared with the other methods.
6.0 Showing Dynamic Execution of a Knowledge Base
After a knowledge base has been entered, the user must test and
debug it. Typically, the user makes a query or an assertion to
the knowledge base, which processes this input and makes internal
changes to the data, usually returning a text string. To debug
this process, the user needs a history of the nodes that were
visited and the paths that were taken. In the past, this history
was obtained by printing the current execution state as each node
was visited. For a complex knowledge bases this debugging trace
may be hundreds of pages long, overloading the user.
The approach taken in SemNet is to show dynamically the changes
that result from knowledge-base processing. During knowledge base
execution, when one node completes its processing and passes its
result to another node, an object called a sprite travels down
arcs between the nodes to show the progress of the execution.
Attached to the sprite is a label showing the current value of
the computation. In the Natural Language Project application,
this label is the piece of the word the system is analyzing.
Since the knowledge base executes much faster than a human can
follow, the speed of the sprite is under the user's control. To
show the history of knowledge-base execution, both arcs and nodes
have three color-coded states: background, active, and relaxed.
Initially, all objects are set to the background color. When the
sprite traverses an arc, the color of the arc and its connected
nodes change to active. Both the nodes and the arc change to the
relaxed color when the sprite has moved on to other arcs and nodes.
By examining the color of the nodes and arcs, the user can determine
which nodes have executed, which helps to localize errors.
When the user has determined that a particular region of the knowledge
base is responsible for an error, extra debugging aids can be
requested for the nodes in that region. When one of these nodes
is encountered, a temporary fixed menu appears, which contains
the rules that the node represents. Highlighting is used to show
how each rule in the menu responded to the input.
7.0 Conclusion
As knowledge bases become larger, more powerful semantic and syntactic
techniques will be required for exploring and manipulating the
knowledge. SemNet offers a syntactic approach to solving this
problem. SemNet's representation of a knowledge base is based
entirely on the names and positions of the knowledge elements
and the connections among them. SemNet demonstrates how major
problems in knowledge base management can be solved, or partially
solved, with only this syntactic information. Of course, semantic
information available to the application program could be used
in conjunction with SemNet's syntactic techniques to achieve a
more effective solution.
The positions of the knowledge elements greatly affect comprehension
of the knowledge base structure. The application program assigns
initial positions, allowing use of semantic information that is
unavailable to SemNet. Alternatively, positions can be determined
by heuristics designed to put knowledge elements close together
spatially if they are neighbors in the graph structure formed
by their interconnections. These methods for assigning positions
become increasingly important and increasingly less effective
as the knowledge base increases in size, pointing to the need
for further research on this problem.
An important issue was how, using only syntactic information,
SemNet could reduce the displayed information by presenting less
information about those knowledge elements unimportant to the
user's immediate concern. To some extent, this information reduction
is accomplished automatically by the three-dimensional graphic
hardware if proximity-based positioning has been used. Objects
near the viewpoint, which is assumed to represent the region of
immediate interest, are displayed at full size. Object size diminishes
as distance from the viewpoint increases, and objects outside
the field of view are not displayed at all. The result is a fisheye
view based on three-dimensional perspective. This method was made
more effective by combining it with a fisheye view based on clustering.
All knowledge elements within local regions of euclidean space
were assigned to clusters, neighboring clusters were assigned
to higher level clusters, and so on. Only knowledge elements near
the viewpoint are presented; further from the viewpoint, cluster-objects
are displayed that represent all the knowledge elements in a region
of the space. The combination of these two methods greatly reduce
the displayed information while providing both local details and
global context.
SemNet was an exploratory project to investigate alternative solutions
to problems endemic to large knowledge-base systems. SemNet was
never intended to provide solutions to all knowledge-base problems.
SemNet was intended to help users recognize and understand the
structure or organization of a large knowledge base. To achieve
this understanding, we hypothesize, the user must recognize the
identities of knowledge-base elements, explicit relationships
between elements, and the relative position of elements within
the knowledge base. To expand an old aphorism, SemNet attempts
to reveal the structure of the forest, not details about trees.
The details of the knowledge-base are de-emphasized and the structure
is emphasized by representing the elements of the knowledge base
as simple, labeled rectangles and the relationships between elements
as color-coded arcs. Navigation tools enable a user to explore
the structure, view different parts of it, and follow relationships
from one element to another. Positioning heuristics and fisheye
techniques are intended to ensure that the view presented to the
user communicates the logical structure of the knowledge base.
Although SemNet was an exploratory research project, an outcome
of the project was a system that can be integrated with a wide
range of applications. SemNet solves some but not all problems
that are commonly encountered in knowledge-base applications.
Indeed, a single system cannot address all knowledge-base problems
because they depend on the application and the tasks that are
to be performed. Consequently, SemNet is intended to be used in
conjunction with an application program that would provide access
to the contents of the knowledge-base elements. In the Prolog
example shown in Figures 1 and 3-5, SemNet shows the relationships
among Prolog modules, while an application program would display
the contents of these modules and support module construction
and editing. The Natural Language Project accomplished a tighter
marriage of SemNet with the application program. When a linguist
debugs a rule system using SemNet, the nodes represent collections
of rules responsible for analyzing morphemes that visibly travel
from one node to another. When an error is isolated to a node,
the linguist can request a display of the rule collection, which
SemNet obtains from the application program. Other potential applications
of SemNet are communicating the structure of large projects, debugging
object-oriented programs, browsing through hypermedia documents,
and representing connectionist models.
8.0 References
diSessa, Andrea A. (1985). A Principled Design for an Integrated
Computational Environment. Human-Computer Interaction, 1, 1-47.
Dumais, S.T. & Landauer, T.K. (1984). Describing categories of
objects for menu retrieval systems. Research Methods, Instruments
and Computers, 16, 242-248.
Fairchild, K. M. (1985). Construction of a Semantic Net Virtual
World Metaphor (Tech. Rep. No. HI-163-85), Austin, TX: MCC.
Fairchild, K. M. & Poltrock, S. (1986). SemNet [Videotape]. Presented
at CHI'86 Human Factors in Computing Systems, Boston, 1986, and
(Tech. Rep No HI-104-86), Austin, TX: MCC.
Fairchild, K. M. & Poltrock, S. (1987). Soaring through Knowledge
Space: SemNet 2.1 [Videotape]. Presented at CHI'87 Human Factors
in Computing Systems, Toronto, 1987, and (Tech. Rep No HI-104-86
Rev 1), Austin, TX: MCC.
Furnas, G. W. (1986). Generalized fisheye views. Proceedings
of CHI'86 Human Factors in Computing Systems (pp. 16-23), New
York: ACM.
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization
by simulated annealing. Science, 220, 671-680.
Kruskal, J. B. (1964a). Multidimensional scaling by optimizing
goodness of fit to a non-metric hypothesis, Psychometrika, 29,
1-27.
Kruskal, J. B. (1964b). Non-metric multidimensional scaling: a
numerical method, Psychometrika, 29, 28-42.
Kruskal, J. B., Young, F. W., & Seery, J. B. (1973). How to use
KYST: A very flexible program to do multidimensional scaling (Bell
Laboratories Technical Memorandum), Murray Hill, NJ.
Poltrock, S., Shook, R. E., Fairchild, K., Lovgren, J. E., Tarlton,
P. N, Tarlton, M. & Hauser, M. (1986). Three-dimensional interfaces:
The promise and the problems (Tech. Rep. No. HI-291-86), Austin,
TX.: MCC.
Rips, L. J., Shoben, E. J., & Smith, E. E. (1973). Semantic distance
and the verification of semantic relations. Journal of Verbal
Learning and Verbal Behavior, 12, 1-20.
Shook, Robert E. (1986). SemNet: A Conceptual and Interface Evaluation
(Tech. Rep. No. HI-320-86-P), Austin, TX: MCC.
Slocum J., & Cohen, R. (1986). NABU Documentation, Natural Language
Processing Project. (Tech. Rep. No. AI-228-86-Q), Austin, TX:
MCC.
Footnotes
! The annealing heuristic is not strictly an example of simulated
annealing, but it is similar in that a scale factor serves a role
similar to temperature in annealing, which produces stable molecular
structures by slowly decreasing the temperature from a high initial
value.
@ Many proximity scaling packages, like KYST or MDS, rotate to
some canonical orientation, such as principal components. While
such orientations are well defined they are rarely semantically
interpretable, so cuts based on halving the canonically oriented
axes are also unlikely to be semantically sensible. The problem
really is that the current subdivision scheme is blind; it is
totally oblivious to the configuration of points, and any natural
structure (e.g., natural clustering) they might have. That is,
of course, why the technique can be so simple and so fast. An
alternative is to use subdivision techniques, like aglomerative
clustering, that more tediously (e.g., O(n#)) construct a hierarchy
that reflects the detailed positions of points. The hope would
be that it would thereby make semantically sensible clusters.
Figure Captions
Figure 1. Part of a knowledge base consisting of Prolog modules.
Figure 2. Conceptual architecture of SemNet.
Figure 3. The complete knowledge base of Prolog modules with nodes
assigned to random positions.
Figure 4. The complete knowledge base of Prolog modules with node
positions determined by the annealing heuristic.
Figure 5. A fisheye view of the complete knowledge base of Prolog
modules.