Lexical Navigation: Visually Prompted Query Expansion and Refinement

Preliminary Version - January 14, 1997

James W. Cooper and Roy J. Byrd

IBM Thomas J. Watson Research Center

30 Sawmill River Rd, Hawthorne, NY 10532, USA

Tel: (914) 784-7285, -6815

Fax: (914) 784-6307, -7455

jwcnmr@watson.ibm.com, byrd@watson.ibm.com

James W. Cooper is a Research Staff Member in the Digital Libraries group at the IBM T. J. Watson Research Center. He holds a Ph.D. in chemistry from the Ohio State University. His areas of research include user interfaces, graphical interface design and object oriented programming.

Roy J. Byrd is a Research Staff Member in the Digital Libraries department at the IBM T. J. Watson Research Center. He holds a B.A. degree in theory and composition of music from Yale and is a Ph.D. candidate in theoretical linguistics at New York University. His current research interests include large-scale text analysis for discovering domain-specific vocabularies, construction of lexical networks from relationships over items in these vocabularies, and the application of these networks in building computer systems and their human-computer interfaces.

Abstract

We have designed a system, termed Lexical Navigation, which provides an interface allowing a user to expand or refine a query based on the actual content of the collection. In this work we have built a client-server system, written in Java, that allows users to issue queries, have additional terms suggested to them, explore lexical relationships, and discover additional keywords. We find that this methodology constitutes a powerful tool for searching large data collections.

1. Introduction

Search of document databases, as supported for example by commercial services such as Dialog, Lexis-Nexis, and Derwent or by World-Wide Web search engines such as Altavista, Lycos, and Infoseek, is not usually a one-step process. Rather, successful searches, in the overwhelming majority of cases, involve user iteration over a sequence of operations. Typical operations include the ones listed in Table 1.

Users can have various goals when performing a search. Among the possible goals are finding the answer to a specific question, exploring a subject area in order to understand it, and locating material suitable for quotation or illustration.

As a search session proceeds, the user's goals may change. For example, upon obtaining a surprising answer to her specific question, a user might next enter exploratory mode in order to understand the answer and finally move to quotation mode in order to find the best way to explain the answer to her students and colleagues.

Table 1. Operations performed during document database search
formulate information requirement
select a document collection ("file", "category", "source", etc.)
*map information requirement to query terms
*enter a query
*expand query with additional query terms
*process query
read hitlist of resulting documents
*match hitlist items to query terms
read summaries of hitlist documents
retrieve and read hitlist documents
*find relevant query terms in documents
find relevant domain concepts in documents
*map new domain concepts to additional query terms

The starred items in Table 1 illustrate the fact that queries, which express the user's information requirements, are central to most of the search process operations. Users focus most of their attention and energy on the creation of queries which adequately express their requirements and which will effectively retrieve documents which meet their goals. In practice, this means that users work very hard to come up with query terms which express the concepts that they want to find, exemplify, or explore.

During most search processes, a query will be written, expanded, issued, matched to documents, re-formulated, and re-issued many times. Successful document retrieval systems must therefore provide not only good document-management functions but also good query-management functions. Included in the query-management functions must be a good set of facilities for expanding and refining queries, for keeping track of query modifications, and for helping the user understand the effect these modifications have on queries' effectiveness at retrieving documents which meet the information requirement.

This paper describes an approach to query expansion and query refinement which employs active user interface objects to prompt users with query terms which are likely to be productive. A central role in our approach is played by collection-specific vocabularies which are used as the source of the query terms with which users are prompted.

Section 2 of the paper describes our approach to query modification and contrasts it with previous methods described in the literature. Section 3 describes the user interface architecture that provides access to the system's facilities, with special attention being paid to facilities for query modification. Section 4 gives a brief overview of the text analysis mechanisms used to derive the collection-specific vocabularies. Finally, section 5 summarizes the current status of the work and describes future activities based on the ideas presented here.

2. An Approach to Query Refinement

The problem of guiding a user to formulate useful queries that obtain useful results from large data collections has been widely studied. Buckley et al. [Buckley et al., 1996] used local feedback, and Xu and Croft [Xu & Croft, 1996] combined local feedback with local context analysis, using the 50 most frequent terms and 10 two-word phrases from the top-ranked documents to perform query expansion.

This problem becomes more involved when we include the difficulty of providing a simple user interface suitable for infrequent use by non-technical users. Envision is one such interface [Nowell, et al., 1996], although it was designed primarily for use in searching technical journal collections. Another approach, termed Scatter/Gather [Hearst & Pedersen, 1996], emphasizes clustering of similar documents into several subgroups in response to the initial query. Fireworks [Hendry & Harper, 1996] presents a "scattered screen" environment called SketchTrieve, which allows queries, documents, authors, and results to be manipulated as separate, movable windows.

When translating an information requirement into a query for a document retrieval system, a user must convert concepts involved in his requirement into query terms which will match terms found in documents in the database. Such a user faces "the vocabulary problem" described in Furnas et al. (1987). One aspect of this problem is that a word may not be precise enough to ensure a focused response. For example, a query containing the term "fruit" will not be as helpful as one containing "pear," "nectarine," or "raisin," assuming that the latter concepts are part of the information requirement. A related aspect of the vocabulary problem concerns ambiguous words. An otherwise unconstrained query containing the query term "nut" is likely to return documents related to hardware, seeds, and eccentric people, assuming such documents are in the database. Finally, the vocabulary problem also encompasses the fact that (according to Voorhees (1994)) large full-text databases contain many different expressions for the same concept.

In the information retrieval literature, solutions to the various aspects of the vocabulary problem are usually described and evaluated as methods for improving an IR system's recall and precision. It is generally believed that recall can be enhanced by including additional terms that are likely to occur with -- or to mean the same thing as -- terms already in the query. The corresponding belief about precision is that ambiguous query terms can be disambiguated by also including in the query terms which are likely to occur with only one sense of the ambiguous term. For example, if the query about "nut" also contained "bolt," then retrieved documents will be more likely to involve hardware than seeds or eccentrics.

Various manual and automatic techniques for enhancing queries under these assumptions are described in Harman (1992), Voorhees (1994), and Xu and Croft (1996). The techniques can be categorized as either (1) "query expansion" - adding query terms related to information requirement concepts and obtained from a thesaurus - or (2) "relevance feedback" - adding query terms obtained from documents known to be relevant to the information requirement. Although some successes have been reported with automatic techniques (see Xu and Croft (1996)), they have generally not been spectacular, for several reasons.

First, reliable technology for automatically disambiguating query terms is not available. While Lesk (1986) and subsequent work based on it have shown how to use word overlap with definitions in machine-readable dictionaries (MRDs) to disambiguate some common words, those techniques will not work for queries containing proper names or technical terms which are not present in dictionaries. Furthermore, most (initial) document queries consist of only a few words and are, hence, not suitable for use with the word overlap techniques, which require a significant amount of word context. Even when they do work, these techniques do not provide a correct answer in a sufficient number of cases to be of much use for query expansion.

Another problem with automatic techniques which rely on pre-existing thesauruses is that such thesauruses typically do not contain the most useful types of information for query expansion. New names and technical terms appear faster than they can be captured in published thesauruses. Furthermore, such thesauruses often contain only synonym, hypernym, and hyponym relationships. Real relations among technical terms and names are much richer, and query expansion methods that ignore them are at a disadvantage.

A final problem with automatic query expansion techniques is that they ignore perhaps the best sources of knowledge about what would be good terms to add to queries: the user and (except for relevance feedback techniques) the documents.

LexNav Operation

In our attempt to address the above problems, we developed a technique that we term "Lexical Navigation" and a user interface that we call LexNav. In LexNav, we address the vocabulary problem using methods which take advantage of all of the collection's documents and of the user's intuitions about concepts related to the ones in the information requirement. Before any queries are issued, we use a suite of text analysis tools to develop a vocabulary of the ordinary words, proper names, and technical terms by analyzing the entire collection. Additional related tools organize the vocabularies into an extensive set of relationships consisting of triples of

<vocabulary-item::relation::vocabulary-item>.

In these relationships, the "vocabulary-items" are used to represent the real-world concepts that they name. The "relations" give either the name or the strength of an association between the related concepts.
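For concreteness, a relation triple of this kind could be held in a small record like the following Java sketch; the class and field names are illustrative only, not the actual Talent or LexNav data structures.

```java
// Hypothetical sketch of a lexical-network triple; names are illustrative,
// not the actual Talent/LexNav classes.
public class RelationTriple {
    public final String leftItem;    // canonical form of the first vocabulary item
    public final String relation;    // either a relation name ("governor of") ...
    public final double strength;    // ... or a numeric association strength
    public final String rightItem;   // canonical form of the second vocabulary item
    public final boolean named;      // true for named relations, false for unnamed

    public static RelationTriple named(String left, String name, String right) {
        return new RelationTriple(left, name, 0.0, right, true);
    }

    public static RelationTriple unnamed(String left, double strength, String right) {
        return new RelationTriple(left, null, strength, right, false);
    }

    private RelationTriple(String left, String rel, double s, String right, boolean n) {
        this.leftItem = left; this.relation = rel; this.strength = s;
        this.rightItem = right; this.named = n;
    }
}
```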

It has been pointed out in Byrd et al. (1995) that the largely multi-word items in Talent vocabularies (Talent, our "Text Analysis and Language Engineering Tools" suite, is described in Section 4) are highly effective at enhancing the precision of query processing. In terms of Lesk's example, the term "pine cone" is not ambiguous. Most modern search engines would zero in on documents containing this term, while de-emphasizing or excluding other mentions of "cone".

After extraction, the entire set of relationships for a collection is organized into a "lexical network," in which the vocabulary items are the nodes and the relations are the links. The lexical network is maintained on a server - usually the same server as the one on which the documents themselves and the search engine index are kept. LexNav server processes are used to match initial queries and query terms to nodes in the lexical network. The links are used for navigating to nodes of related vocabulary items, which are then returned to client LexNav processes for use in prompting the user during query modification.

A by-product of the derivation of lexical networks is another object called a "context thesaurus" which, given a probe consisting of a single word or an entire query, produces a ranked list of vocabulary items (i.e., nodes in the network) that are related to the probe. The context thesaurus is somewhat similar to (and was inspired by) the "phrase finder" procedures described in Jing and Croft (1994). LexNav's context thesaurus server is also used to respond to a client request for query terms to be added to an initial query.

During a LexNav query session, the user has constant access to the lexical network and context thesaurus servers. At any point during the query process, lists of vocabulary items representing concepts related to the current query or document can be viewed. Users can expand the query with new query terms by simply clicking on desired vocabulary items. Furthermore, navigation through the context thesaurus and lexical network themselves is also iterative: users may move from node to node with a simple point and click interface over a visual representation of the network's contents.

It is useful to consider the properties of the LexNav interface in the context of a set of design criteria developed in Spink (1994). That paper analyzes data obtained during a study of query development by users and human intermediaries (i.e., librarians). At issue are the roles and sources of "term relevance feedback" (TRF) in query expansion. Specifically, TRF refers to query modification using terms obtained from documents retrieved earlier in the query process. In fact, however, the paper also discusses terms that originate with users, intermediaries, and thesauruses.

Spink lists four findings concerning the terms in final successful queries:

1. TRF is the source of relatively few terms (compared to the other sources).

2. TRF terms are more successful (than those from other sources) at finding relevant documents.

3. TRF terms chosen by intermediaries are more successful at finding relevant documents (than those chosen by users).

4. TRF terms come mostly from document titles and descriptors (rather than from abstracts or texts).

There is a dilemma in findings 1 and 2: TRF terms are the most successful but least abundant query terms. LexNav can help users overcome this dilemma by showing them related terms from all text in the collection, not just from the titles and descriptors of a few documents seen on early hit lists (see finding 4). Relevant related vocabulary items are available in the context thesaurus and lexical network, which, in effect, offer predigested relevance feedback information for the entire document collection.

If we assume that the relative success of TRF terms selected by intermediaries (reported in finding 3) is due to the intermediaries' familiarity with the document collection, then we can argue that vocabulary items and relationships drawn from the database have the same advantage. In other words, these vocabulary items are guaranteed to exist in the document collection and, hence, to retrieve documents containing them.

3. Architecture of the Lexical Navigator Client

This section presents the user interface client developed for LexNav and illustrates its operation with examples which also illustrate the behavior and use of the context thesaurus and lexical network. Working with users and other IR experts in the group, we developed the LexNav interface, which supports most of the operations in Table 1. We felt that it was important that the user interface look professional but simple and that it run on common platforms like Windows as well as on some of our developers' Unix workstations.

Accordingly, we developed the interface program in Java [Arnold & Gosling, 1996], and after some experimentation decided that it should be presented as a Java applet in a Web browser rather than as a stand-alone application, thus making the entire search system available for testing by users on our company-wide intranet.

Java processes running as applets embedded in Web pages have significant security restrictions to prevent foreign applets downloaded with Web pages from doing mischief to your computer system. However, such applets are still allowed to make TCP/IP socket connections back to the HTTP server machine and this provides the client-server communication pathway (Cooper, 1996).
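As a rough illustration of this pathway, the following sketch opens a socket back to a server host and exchanges a single line-oriented request and response. The host name, port number, and command string are placeholders, and the code uses current Java idioms rather than the Java 1.0 applet API used in the original system.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

// Minimal sketch: a client opening a socket back to the host it was served
// from and exchanging one line-oriented request/response. The port number
// and command syntax are placeholders, not the actual LexNav protocol.
public class LexNavConnectionSketch {
    public static void main(String[] args) throws Exception {
        String serverHost = "search.example.com";   // would be getCodeBase().getHost() in an applet
        int serverPort = 8001;                       // hypothetical server port

        try (Socket socket = new Socket(serverHost, serverPort);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {

            out.println("RELATED airbus");           // hypothetical command
            String line;
            while ((line = in.readLine()) != null) { // read returned terms, one per line
                System.out.println(line);
            }
        }
    }
}
```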

The Java Server

The actual retrieval of information for queries was accomplished by sending the queries to a server process, also written in Java but running on an IBM RS/6000. This server process, in turn, calls any of a number of C programs which perform the queries against the various components of the LexNav system. Each instance of the server provides a separate connection to the retrieval system and recognizes six commands; it invokes programs based on the arguments to these commands and receives their data through standard output. The server instance then formats and returns these data to the query process on the client workstation.
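The pattern of invoking an external query program and capturing its standard output might look roughly like the following sketch; the tool path, arguments, and output handling are assumptions for illustration, not the actual LexNav server code.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Sketch of the server-side pattern described above: run an external
// (here, hypothetical) query program with the command's arguments and
// collect its standard output for return to the client.
public class ExternalQuerySketch {
    public static List<String> runQueryTool(String tool, String... args) throws Exception {
        List<String> command = new ArrayList<>();
        command.add(tool);                         // e.g. a path to a C query program
        for (String a : args) command.add(a);

        Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);                   // each line is one result record
            }
        }
        p.waitFor();
        return lines;                              // formatted and sent back to the client
    }
}
```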

The commands each server instance supports include

The Lexical Navigator Client


Figure 1

Figure 2

Figure 1 shows the main Lexical Navigation interface. You can enter queries in the multi-line text box at the top and then either click the query button directly to inspect matching document titles in the right-hand list box, or click the "related concepts" button, which calls the context thesaurus server process to return terms related to the query terms. In Figure 2, we show the context thesaurus keyword results of the query "airbus" on the 1988 Wall Street Journal collection. Terms from the context thesaurus are computed to have at least a strong unidirectional relationship to the query terms.

Once these additional terms have been identified, you can add any of them to the query, or you can investigate them further by asking what named and unnamed relationships exist between these (sometimes surprising) terms and other terms you expect to find in the collection. Figure 3a shows these relationships for several of the query terms and makes clear why some of the terms were found to be relevant. Two kinds of relationships are shown: named and unnamed. Named relationships show the type of relation between the terms; for unnamed relationships, the strength of the relation is shown.

Figure 3a
Figure 3b

The interface allows you to add any marked term to the query at this point or to navigate further in relationship space by selecting one or more terms and clicking on the "Search" button. We see the results of further navigation in Figure 3b, where we have moved outward one relationship level to find terms related to the selected terms. Again, you can add these terms to the query or navigate further. In addition, you can view these relationships graphically by clicking on the "Plot" button, which produces the display shown in Figure 4. Each term can be expanded further to discover all of the relationships for each additional term. This represents an additional, powerful way of navigating through concept space to discover distant but significant relationships between terms.


Figure 4

Searching for Documents

We have now narrowed our original query using the context thesaurus and terms from the relationships index and will use this refined query to search the collection. The result of this search is shown in Figure 5, where right-clicking on any document title in the list will produce a list of the search terms contained in that document.


Figure 5

Figure 6

Clicking on the "View" button produces the display in Figure 6. This Web page is constructed dynamically by the server, and consists of a frame where the left third is a Java applet and the right two thirds is a HTML-wrapped version of the Tipster1 format data, with the search terms converted to boldface.

The list box at the top left is a keyword-in-context display made from a custom Java control. Clicking on any line in this KWIC display moves the document in the right frame to that point. The list box in the lower left contains a list of vocabulary items discovered in that document. Finally, clicking on the "Show Keywords" button at the lower left displays the document with the discovered items highlighted.

Interface Summary

In summary, the interface provides a simple way to access a number of complex relationships and, starting from users' originally vague queries, helps them focus their search on specific areas within large collections.

User Evaluations of the Interface

As part of the evaluation of Lexical Navigation, we enlisted the help of the User Interface group at IBM Research and also gathered the reactions of a number of lab visitors from both technical and non-technical backgrounds. The overall informal evaluation by the non-technical visitors was very favorable, although most needed some explanation of the concept of related terms.

The User Interface research group provided a number of practical suggestions for improvements as well as noting some features that were difficult to implement with the visual controls provided in Java 1.0. The suggestions included providing navigational push buttons to move from one document to another without going back to the main document list, and providing a progress indicator while the client is waiting for a server process. Since it is not easy to predict how long a server process will take, we have elected to provide an animated icon to simulate "work" while the server computes the terms or documents requested.

Another valuable suggestion was presenting keywords-in-context in the same order they were entered in the original query, on the assumption that the user entered them with the most important keywords first.

Other wishes included solving technical problems related to applets running on separate Web pages. For example, the keyword list that is generated in an applet on the second Web page can communicate back to the search term list only through a server command.

4. Lexical Navigation Server Processes

The LexNav server processes are divided into two categories. The first includes those processes which operate on the document collections at "indexing" time and create the vocabularies, context thesauruses, and lexical networks. These processes are built using the extraction tools in Talent ("Text Analysis and Language Engineering Tools"). The second category of server processes in LexNav are those which allow the client to retrieve information in the vocabularies, context thesauruses, and lexical networks.

The following gives brief descriptions of the Talent extraction tools and access facilities used in LexNav. Fuller descriptions of these and other components of Talent are available in Byrd et al. (1995), Ravin and Wacholder (1996), Justeson and Katz (199x), and forthcoming publications.

Talent Extraction Tools

The Talent extraction tools comprise a set of heuristic text analysis functions designed to extract and organize vocabularies and lexical networks from ordinary text using fast (usually finite-state) techniques. To obtain speed, the heuristics sacrifice accuracy, with the result that they can sometimes miss good items or relationships and extract bad ones. However, we have adopted a number of strategies which keep the mistakes to a minimum and usually result in the overwhelming majority of extracted material being good.

In the context of user interfaces to computer systems, such as the LexNav interface, we further expect that human users' semantic processing capability, combined with their knowledge of the application domain, will allow them to ignore the small numbers of mistakes in Talent extracts, treating them as irrelevant at best and funny or mildly annoying at worst. We have not studied this carefully but, at least anecdotally, our expectations have been borne out.

Talent vocabularies consist of "vocabulary items." Each item has a "canonical form" and zero or more "variant forms." Strictly speaking, the canonical form is also a variant and is sometimes called the "canonical variant." Depending on the item, the canonical form is chosen because it is the least ambiguous, the most explicit, the longest, the lemma, etc., among all the variants. Vocabulary items also have a category associated with them. The combination of canonical form and category is unique within a Talent vocabulary. Depending on the application, vocabulary items may also have collection statistics stored with them. Function words and many non-content stop words are excluded.
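A vocabulary item of this kind might be modeled roughly as in the following sketch; the class name, fields, and category values are drawn from the descriptions in this section and are not the actual Talent implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a Talent-style vocabulary item as described above;
// field and class names are ours, not the actual Talent data structures.
public class VocabularyItem {
    public enum Category { PERSON, PLACE, ORGANIZATION, OTHER, UNCAT, TERM }

    public final String canonicalForm;          // least ambiguous / most explicit variant
    public final Category category;             // canonical form + category is unique
    public final List<String> variants = new ArrayList<>();
    public int collectionFrequency;             // optional occurrence statistics

    public VocabularyItem(String canonicalForm, Category category) {
        this.canonicalForm = canonicalForm;
        this.category = category;
        this.variants.add(canonicalForm);       // the canonical form is itself a variant
    }

    public void addVariant(String variant) {
        if (!variants.contains(variant)) variants.add(variant);
    }
}
```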

Nominator. Talent's name extraction tool is called Nominator. It finds single- and multi-word names using heuristics based mainly on capitalization patterns together with a large inventory of word patterns characteristic of proper names. Nominator can assign names to the categories Person, Place, Organization, Other (for names of laws and other miscellany), and Uncat. An important aspect of its operation is that it can find multiple variant forms of names that occur in the same document. These variants can be long and short versions of person names ("John F. Kennedy" vs. "J. Kennedy"). They can include or exclude empty words like "Corp.", "Senator", or "Esq.". Acronyms and abbreviations of fullforms occurring in the same document are included among the fullform's variants.

Terminator. Talent's technical term extraction tool, Terminator, finds multi-word technical terms. Terminator uses a heuristic based on the structure of technical terms: technical terms usually have a structure drawn from a subset of the noun phrase patterns. The structure of terms in text is determined with a regular expression over the possible parts-of-speech of the word tokens in the text. No parsing or part-of-speech disambiguation is used. Variant forms of terms include case variations and forms in which one or more of the words are inflected. Canonical variants tend to be lower-case and to contain lemma forms. Terminator's heuristic also requires that a term be repeated in a document (at least once) in order to be recognized. This captures the notion that technical terms are things that documents can be "about". Arbitrary noun phrases that are never repeated in an entire document collection are probably not technical terms and are not included in Talent vocabularies.
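A much-simplified version of this heuristic is sketched below: a candidate must match a noun-phrase-like part-of-speech pattern and must occur at least twice. The part-of-speech encoding and the exact pattern are simplifying assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch of a Terminator-style heuristic: a candidate multi-word term must
// match a noun-phrase-like part-of-speech pattern and be repeated within a
// document. The POS encoding and pattern are simplified assumptions.
public class TerminatorSketch {
    // Candidates are encoded as strings of POS codes, e.g. "A N N" for
    // adjective noun noun; accepted shape: one or more modifiers, ending in a noun.
    private static final Pattern TERM_SHAPE = Pattern.compile("((A|N) )+N");

    public static boolean hasTermShape(String posSequence) {
        return TERM_SHAPE.matcher(posSequence).matches();
    }

    // Keep only candidates that both look like terms and occur at least twice.
    public static Map<String, Integer> selectTerms(List<String> candidateOccurrences,
                                                   Map<String, String> posOf) {
        Map<String, Integer> counts = new HashMap<>();
        for (String c : candidateOccurrences) counts.merge(c, 1, Integer::sum);
        counts.entrySet().removeIf(e ->
                e.getValue() < 2 || !hasTermShape(posOf.getOrDefault(e.getKey(), "")));
        return counts;                          // surviving terms with their frequencies
    }
}
```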

Abbreviator. Talent's Abbreviator tool can recognize abbreviations and acronyms in a document and match them with their corresponding fullforms. Its recognizer contains a finite-state process, which identifies the local syntactic contexts in which abbreviations and acronyms typically occur. Identified contexts are submitted to a more detailed matching procedure which examines the candidate shortforms and fullforms that they contain. This procedure consists primarily of constraints on the matches between letters in the two candidate forms. Pairs which survive the matching filter are added to the vocabulary, with the fullform becoming the canonical form.
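A drastically simplified version of the letter-matching constraint appears below; it merely requires each letter of the shortform to begin a successive word of the fullform, whereas the actual Abbreviator applies richer constraints.

```java
// Simplified sketch of the shortform/fullform matching constraint described
// above: each letter of the acronym must begin a successive word of the
// candidate fullform. The real Abbreviator uses richer constraints.
public class AbbreviatorSketch {
    public static boolean matches(String shortform, String fullform) {
        String[] words = fullform.trim().split("\\s+");
        String acronym = shortform.replace(".", "").toUpperCase();
        if (acronym.length() != words.length) return false;
        for (int i = 0; i < words.length; i++) {
            if (Character.toUpperCase(words[i].charAt(0)) != acronym.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("IBM", "International Business Machines")); // true
        System.out.println(matches("IBM", "Internal Revenue Service"));        // false
    }
}
```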

Aggregator. The initial vocabularies formed from the output of Nominator, Terminator, Abbreviator, and other extractors (not mentioned here) must be aggregated by Talent's Aggregator before they are useful. Duplicate vocabulary items found in multiple documents must be eliminated.

Near-duplicates must be examined to determine whether they can refer to the same concept; if so, they are merged. Case redundancy and morphological redundancy among an item's variants must be removed. Resulting categories must be assigned to merged vocabulary items. For some applications, Aggregator computes occurrence statistics and stores them with the items. Aggregator's output is the Talent vocabulary.

Context Thesaurus. As already mentioned, the context thesaurus is a device which can produce a ranked list of vocabulary items related to a query. The context thesaurus's basic data structure is an IR index of a set of "pseudo-documents." There is one pseudo-document per vocabulary item; the item serves as the pseudo-document's title, and the body contains the sentential contexts in which the vocabulary item occurs in the original document collection.

Talent constructs the context thesaurus in a three-step process. First, a concordance is constructed, listing all occurrences of every vocabulary item in the collection. Second, the set of sentential contexts for each vocabulary item, in turn, is inspected, filtered, and gathered into a pseudo-document. Third, the pseudo-documents are indexed with a standard IR system's indexer. In our prototype application, we use the Guru IR system (Maarek (1991)). However, any system that can handle queries of the type we want to process and whose query engine can provide ranked document retrieval would be suitable. Access to the resulting context thesaurus is simple. When any query is processed against the index of pseudo-documents, the hitlist produced is a list of ranked vocabulary items where the rank indicates the degree of relatedness between the item and the query.
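The second step, gathering sentential contexts into pseudo-documents, could be sketched as follows; sentence matching here is deliberately crude, and the indexing step performed by the IR engine is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the pseudo-document step described above: for each vocabulary
// item, gather the sentences in which it occurs; the resulting "documents"
// (titled by the item) are then handed to an ordinary IR indexer.
public class PseudoDocumentBuilder {
    public static Map<String, String> build(List<String> sentences,
                                            List<String> vocabularyItems) {
        Map<String, List<String>> contexts = new HashMap<>();
        for (String item : vocabularyItems) contexts.put(item, new ArrayList<>());

        for (String sentence : sentences) {
            String lower = sentence.toLowerCase();
            for (String item : vocabularyItems) {
                if (lower.contains(item.toLowerCase())) {   // crude concordance step
                    contexts.get(item).add(sentence);
                }
            }
        }

        Map<String, String> pseudoDocs = new HashMap<>();
        for (Map.Entry<String, List<String>> e : contexts.entrySet()) {
            // title = the vocabulary item; body = its sentential contexts
            pseudoDocs.put(e.getKey(), String.join(" ", e.getValue()));
        }
        return pseudoDocs;                                   // to be indexed by the IR engine
    }
}
```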

Unnamed Relations. The notion behind unnamed relations is simple: vocabulary items which often appear together are likely to be related. Furthermore, the extent to which two items appear together can be used to measure the strength of the relationship between them.

Many implementations of the notions behind unnamed relations are possible. The current implementation uses the context thesaurus. For a pair of vocabulary items, the unnamed-relation process determines the extent to which one item occurs in the context of the other by querying the context thesaurus for each item. Recall that the context thesaurus assigns a rank to the items on a query's hitlist. When the query is a vocabulary item, the rank of a hitlist item can be interpreted as a numerical value of the strength of association between the query item and the hitlist item. If each item appears on the other item's hitlist with a rank that exceeds some threshold, and if the ranks are fairly close, we assume that there is a symmetric relation between the two items. The strength of the relationship is computed to be the average of the two ranks from the two queries. In Guru, hitlist ranks are normalized values between 0 and 100. The unnamed relation process uses a rank of 50 as the threshold.
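The symmetry test and strength computation just described can be sketched as follows; the hitlist maps and the closeness tolerance are our own assumptions, and only the rank threshold of 50 is taken from the description above.

```java
import java.util.Map;

// Sketch of the symmetric unnamed-relation test described above. The maps
// stand in for context-thesaurus hitlists: for a query item, they give the
// normalized rank (0-100) of each item appearing on its hitlist.
public class UnnamedRelationSketch {
    private static final double THRESHOLD = 50.0;   // rank threshold from the text
    private static final double CLOSENESS = 20.0;   // "fairly close" is our assumption

    // Returns the relation strength, or -1 if the items are not symmetrically related.
    public static double symmetricStrength(Map<String, Double> hitlistOfA,
                                           Map<String, Double> hitlistOfB,
                                           String itemA, String itemB) {
        Double rankBonA = hitlistOfA.get(itemB);     // rank of B when querying with A
        Double rankAonB = hitlistOfB.get(itemA);     // rank of A when querying with B
        if (rankBonA == null || rankAonB == null) return -1;
        if (rankBonA < THRESHOLD || rankAonB < THRESHOLD) return -1;
        if (Math.abs(rankBonA - rankAonB) > CLOSENESS) return -1;
        return (rankBonA + rankAonB) / 2.0;          // strength = average of the two ranks
    }
}
```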

Unnamed relations are represented as triples of

<vocabulary-item::strength::vocabulary-item>

The context thesaurus itself also implicitly contains a set of asymmetric unnamed relations. When any vocabulary item is used as a query, the items on the context thesaurus hitlist are in a one-way relationship to the query item. The relationship between the two items is asymmetric unless the hitlist item, when used as a query, produces the original query item on its own context thesaurus hitlist, as described. For example, the relationship between "Alphonse Gratz" (who lives in New York State) and "New York State" is most likely asymmetric, whereas the relationship between "George Pataki" (governor of New York State) and "New York State" is probably symmetric. Of course the distinction between symmetric and asymmetric relations is a fluid one and depends on the text in the document collection.

Named Relations. Talent extracts named relations between vocabulary items by exploiting text patterns in which such relationships are typically expressed. These text patterns include appositive constructions, possessive constructions, parentheticals, short copular clauses, coordinate constructions, etc. As with many of the other Talent extractors, a set of finite state processes are used to identify the occurrences of text patterns of interest and then to analyze and filter the material in the patterns to extract relations.

The named relation extractor operates in "discovery mode." That is, it is not necessarily primed with specific relation names that it is looking for. Rather, the relation names, along with the participating vocabulary items are identified by virtue of the positions they occupy in the defining text patterns. This allows for the extraction of collection-specific relations.

Named relations are represented as triples of

<vocabulary-item::relation-name::vocabulary-item>

and are inherently asymmetric.

Lexical Networks. Named and unnamed relations are combined to form the lexical networks used in LexNav. The triples constituting the relations are stored in an LGraph data structure (see below).

In the current implementation, only symmetric unnamed relations are included in the networks; as mentioned above, LexNav users have access to asymmetric relations through the context thesaurus. In the future this decision may be revised, and some asymmetric relations may also be stored in lexical networks.
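An adjacency-list structure along these lines might look like the following sketch; the class and method names are illustrative and do not reflect the actual LGraph API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of an LGraph-like adjacency structure for the lexical network:
// nodes are vocabulary items, edges carry a relation name or a strength.
// Class and method names are illustrative, not the actual LGraph API.
public class LexicalNetworkSketch {
    public static class Edge {
        public final String target;      // the related vocabulary item
        public final String relation;    // relation name, or null for unnamed relations
        public final double strength;    // association strength for unnamed relations
        Edge(String target, String relation, double strength) {
            this.target = target; this.relation = relation; this.strength = strength;
        }
    }

    private final Map<String, List<Edge>> adjacency = new HashMap<>();

    public void addNamed(String from, String relation, String to) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>())
                 .add(new Edge(to, relation, 0.0));
    }

    public void addUnnamed(String a, double strength, String b) {
        // symmetric: store the edge in both directions
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(new Edge(b, null, strength));
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(new Edge(a, null, strength));
    }

    public List<Edge> neighbors(String item) {
        return adjacency.getOrDefault(item, new ArrayList<>());
    }
}
```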

Talent Access Facilities

As described in Section 3, the LexNav server is built from a number of processes that support the various access commands issued by the client.

Guru. The search engine used for document retrieval is Guru. It should be noted that Guru itself does not know about or use Talent vocabulary items in its processing. Like many IR systems supporting free-text queries, however, it does exploit word proximity during query processing. The fact that queries enhanced with the vocabulary-based methods described in the paper show improved performance in Guru suggests that the enhancement methods are independent of the underlying query processing engine.

Context Thesaurus. Guru is used to access the context thesaurus, by processing context thesaurus queries against the pseudo-document index described above. Items from the top of the ranked hitlist are returned to the client.

LGraph. Named and unnamed relations, their strengths, and their directions are retrieved from the lexical networks stored in LGraph data structures. The LGraph access functions permit efficient access to connected subparts of large lexical networks. Typical access parameters are vocabulary items and/or relation names or relation strengths.

GLOW. GLOW ("Graph Layout or What!") is a visual tool for accessing graph structures, such as lexical networks, through a two-dimensional user interface. It consists of a client part, illustrated in Section 2, and a server part. The server uses LGraphs to access lexical network data and contains layout algorithms (based on the work described in Tunkelang(xxxx)) to compute two dimensional presentations of lexical networks for display at the client. They layout algorithms are incremental and thus support LexNav's navigation metaphor very well.

5. Conclusions and Future Work

This work has allowed us to put a number of fairly complex technologies in front of inexperienced users, and it provides us both with a way to illustrate their effectiveness and with a way to evaluate the users' responses to specific interface representations. Our initial conclusions are that users quickly grasped the value of the context thesaurus and made effective use of it, and that they found the relationship information diverting, although we have not yet made it integral to the search process. However, the use of Java and a Web browser client to the search technology was generally deemed extremely successful.

Further work in this area will certainly involve a revision of the interface using Java 1.1 tools and controls, and tighter integration among the context thesaurus, keywords, and relationships. In addition, we plan to replace the Guru search engine with an industrial-strength system such as IBM's Search Manager. Since that system supports Boolean queries, this change will also affect the revision of the user interface.

Finally, a number of enhancements to the Talent tools are planned, with the goals of improving output quality and enhancing storage and execution efficiency.

Bibliography

Arnold, Ken and Gosling, James, The Java Programming Language, Addison-Wesley, 1996.
Brajnik, G., S. Mizzaro, and C. Tasso "Evaluating User Interfaces to Information Retrieval Systems: A Case Study on User Support" in Proceedings of the 19th Annual ACM-SIGIR Conference, 1996, pp. 128-136.
Buckley, C., Singhal, A., Mitra, M. and Salton, G. (1996) "New Retrieval Approaches Using SMART: TREC 4," in Harman, D., editor, Proceedings of the TREC 4 Conference, National Institute of Standards and Technology Special Publication.
Byrd, R. J., Y. Ravin, and J. Prager "Lexical Assistance at the Information-Retrieval User Interface," Proceedings of the SDAIR (Symposium on Data Analysis and Information Retrieval), UNLV, 1995.
Church, K. W. "One Term or Two?" in Proceedings of the 18th Annual ACM-SIGIR Conference, 1995, pp 310-318.
Cooper, James W., The Visual Basic Programmer's Guide to Java, Ventana, 1996.
Furnas, G. W., T. K. Landauer, L. M. Gomez, and S. T. Dumais "The Vocabulary Problem in Human-System Communication" in Communications of the ACM, vol. 30, no. 11, November 1987, pp. 964-971.
Harman, D. "Relevance Feedback and Other Query Modification Techniques", in W. B. Frakes and R. Baeza-Yates, eds., Information Retrieval: Data Structures and Algorithms, Prentice-Hall, 1992.
Hearst, Marti A. and Pedersen, Jan O., "Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results." Proceedings of the 19th Annual ACM-SIGIR Conference, 1996, pp. 76-84.
Hendry, David G. and Harper, David J., "An Architecture for Implementing Extensible Information Seeking Environments," Proceedings of the 19th Annual ACM-SIGIR Conference, 1996, pp. 94-100.
Jing, Y. and W. B. Croft "An association thesaurus for information retrieval", in Proceedings of RIAO 94, 1994, pp. 146-160.
Justeson, J. S. and S. Katz "Technical terminology: some linguistic properties and an algorithm for identification in text."
Lesk, M. "Automatic Sense Disambiguation using Machine-Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone", in Proceedings of the 1986 SIGDOC conference, 1986.
Maarek, Y.S., "Software Library Construction from an IR perspective," in SIGIR Forum, fall 1991, 25:2, 8-18.
Nowell, Lucy Terry, France, Robert K, Hix, Deborah, Heath, Lenwood S., and Fox, Edward A. "Visualizing Search Results: Some Alternatives to Query-Document Similarity." Proceedings of the 19th Annual ACM-SIGIR Conference, 1996, pp. 67-75.
Ravin Y. "Disambiguating Proper Names in Text", in Proceedings of MIDDIM 96, International Seminar on Multimodal Interactive Disambiguation, Col de Porte, August 1996.
Ravin, Y. and N. Wacholder. "Extracting Names from Natural-Language Text."
Spink A, A. Goodrum, D. Robins, and M. M. Wu "Elicitations During Information Retrieval: Implications for IR System Design" in Proceedings of the 19th Annual ACM-SIGIR Conference, 1996, pp. 120-127.
Spink, A. "Term Relevance Feedback and Query Expansion: Relation to Design" in Proceedings of the 17th Annual ACM-SIGIR Conference, 1994, pp. 81-90.
Srinivasan, P. "Thesaurus Construction", in W. B. Frakes and R. Baeza-Yates, eds., Information Retrieval: Data Structures and Algorithms, Prentice-Hall, 1992.
Tunkelang, D. MIT Masters Thesis on Graphical Layout
Voorhees, E. M. "Query Expansion using Lexical-Semantic Relations" in Proceedings of the 17th Annual ACM-SIGIR Conference, 1994, pp. 61-69.
Xu, Jinxi and Croft, W. Bruce. "Query Expansion Using Local and Global Document Analysis," Proceedings of the 19th Annual ACM-SIGIR Conference, 1996, pp. 4-11.