CSLI

New Options for Search Engines pioneered by Infomap

We are constantly working towards better ways of representing meaning.
As part of this, users of our systems must be able to express the meanings they want.
This page describes some of the tools we have built to make this possible, and their implications for how concepts can be arranged and distinguished.


Negative Keywords

It can be very useful to describe a concept by saying precisely what it isn't. For example, if I were to tell you that my friend "is a doctor, but not a physician", you would correctly infer that my friend probably has a PhD. We have modelled this process by calculating exactly which meaning-coordinates to remove from an expression to make the undesired meaning irrelevant to the results.

Such options have not been successfully implemented before, because the question "how much of the unwanted meaning should be removed?" has not been correctly answered. A sure understanding of the mathematical processes involved has enabled us to remove meanings very successfully, and not just individual meanings but whole areas of meaning.
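To make the idea of removing "meaning-coordinates" concrete, here is a minimal sketch. It assumes that words are represented as vectors and that negation is done by projecting the query onto the subspace orthogonal to the unwanted meanings; this page does not spell out the exact calculation, so treat that choice, the function names and the three-dimensional toy vectors as illustrative assumptions rather than the Infomap implementation.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cosine(u, v):
    norm = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def negate(query, unwanted):
    """Remove from `query` every component lying in the span of `unwanted`.

    Assumption: negation is modelled as orthogonal projection.
    """
    # Gram-Schmidt: build an orthonormal basis for the unwanted subspace,
    # so that a whole area of meaning is removed, not just single vectors.
    basis = []
    for w in unwanted:
        w = list(w)
        for b in basis:
            c = dot(w, b)
            w = [x - c * y for x, y in zip(w, b)]
        n = math.sqrt(dot(w, w))
        if n > 1e-12:  # skip vectors already covered by the basis
            basis.append([x / n for x in w])
    # Subtract the query's component along each basis vector.
    result = list(query)
    for b in basis:
        c = dot(result, b)
        result = [x - c * y for x, y in zip(result, b)]
    return result

# Hypothetical toy vectors; the axes stand for (geology, music, other).
rock = [0.6, 0.8, 0.0]
music = [0.0, 1.0, 0.0]
band = [0.1, 0.9, 0.1]
granite = [1.0, 0.0, 0.0]

rock_not_music = negate(rock, [music])
print(cosine(rock, granite), cosine(rock_not_music, granite))
print(cosine(rock_not_music, music))
```

In this toy setting the negated query has no similarity left with "music" and moves closer to "granite". Passing several vectors, as in `negate(rock, [music, band])`, removes the whole subspace they span.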

Such a process can be particularly useful for information retrieval, where the user may be given documents all of which relate to a particularly popular aspect of a word's meaning. For example, if you want to know about geology, a query for "rock" will often return only documents about popular music. Our new method of negation enables the user to remove this unwanted intrusion and retrieve the documents they really want.

The process is being evaluated prior to publication, and commercial implementation is being considered via the Stanford Office of Technology Licensing.

Resolving Ambiguity using Contrasting Pairs

Ambiguous words often have associated meanings which are very different from one another. For example, one way to detect the ambiguity of the word "suit" is to find that "jacket" and "lawsuit" are both closely related to "suit", but virtually unrelated to one another.

We therefore define the "contrast index" of a pair of words relative to a given query to be the product of their similarities to the query, divided by their similarity to one another. So for a query q and two words a and b, we have

Contrast Index = ( Sim(q,a) x Sim(q,b) ) / Sim(a,b).

You can experiment with this option by selecting "Contrasting Pairs" instead of "Nearest Neighbors" in the pilot demo below. The results model a human's natural skill in offering disambiguating options in sentences of the form "Do you mean a or b?"
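The contrast index can be computed directly from the formula above. The sketch below assumes that Sim is cosine similarity between word vectors, and it floors the denominator at a small positive value so that unrelated pairs do not cause a division by zero; both choices, along with the function names and toy vectors, are assumptions made for illustration.

```python
import math
from itertools import combinations

def cosine(u, v):
    num = sum(x * y for x, y in zip(u, v))
    den = math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    return num / den if den else 0.0

def contrast_index(q, a, b, eps=1e-9):
    # ( Sim(q,a) x Sim(q,b) ) / Sim(a,b), with Sim(a,b) floored at eps
    # (the floor is an assumption; the formula itself does not specify it).
    return cosine(q, a) * cosine(q, b) / max(cosine(a, b), eps)

def contrasting_pairs(q, candidates):
    """Rank all pairs of candidate words by contrast index, highest first."""
    scored = [
        (contrast_index(q, va, vb), (a, b))
        for (a, va), (b, vb) in combinations(candidates.items(), 2)
    ]
    return sorted(scored, reverse=True)

# Hypothetical toy vectors; the axes stand for (clothing, law, other).
suit = [0.7, 0.7, 0.1]
neighbours = {
    "jacket": [1.0, 0.1, 0.0],
    "lawsuit": [0.1, 1.0, 0.0],
    "trousers": [0.9, 0.2, 0.1],
}

ranking = contrasting_pairs(suit, neighbours)
for score, pair in ranking:
    print(pair, round(score, 3))
```

With these toy vectors the pair ("jacket", "lawsuit") ranks first: both words are close to "suit" but far from each other, whereas "jacket" and "trousers" are too similar to one another to be a useful contrast.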

Clustering

Meanings are often represented by clusters of words which occur together. In finding these clusters, we can find the different meanings of an ambiguous word, or more subtle differences between the way a word is used in different contexts. By selecting "Clustered Results" in the pilot demo, you can view words clustered into distinct groups with different meanings. You can also choose how many results you want to see, and how many clusters you want them grouped into. This technique has enormous potential for discovering and analysing word senses, and poses some very interesting research questions.
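As an illustration of grouping a word's neighbours into a chosen number of clusters, here is a small sketch. This page does not say which clustering algorithm the demo uses, so the sketch substitutes a simple k-means on unit-length vectors with a deterministic seeding step; the function names and two-dimensional toy vectors are likewise assumptions.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n else list(v)

def cluster_words(vectors, k, iterations=10):
    """Group words into k clusters by cosine similarity (assumed method)."""
    words = list(vectors)
    unit = {w: normalize(vectors[w]) for w in words}
    # Deterministic seeding: start from the first word, then repeatedly
    # add the word least similar to every centre chosen so far.
    centres = [unit[words[0]]]
    while len(centres) < k:
        far = min(words, key=lambda w: max(dot(unit[w], c) for c in centres))
        centres.append(unit[far])
    groups = []
    for _ in range(max(1, iterations)):
        # Assign each word to its most similar centre.
        groups = [[] for _ in range(k)]
        for w in words:
            best = max(range(k), key=lambda i: dot(unit[w], centres[i]))
            groups[best].append(w)
        # Recompute each centre as the normalised sum of its members;
        # an empty group keeps its previous centre.
        centres = [
            normalize([sum(col) for col in zip(*(unit[w] for w in g))])
            if g else centres[i]
            for i, g in enumerate(groups)
        ]
    return groups

# Hypothetical neighbours of "suit"; the axes stand for (clothing, law).
neighbours = {
    "jacket": [1.0, 0.1],
    "trousers": [0.9, 0.2],
    "shirt": [0.95, 0.05],
    "lawsuit": [0.1, 1.0],
    "court": [0.2, 0.9],
    "judge": [0.05, 0.95],
}

groups = cluster_words(neighbours, k=2)
for g in groups:
    print(g)
```

On this toy data the two clusters separate the clothing words from the legal words, which mirrors the two senses of "suit". The parameter `k` plays the role of the number of clusters chosen in the demo.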

Try these options.
