New Options for Search Engines pioneered by Infomap
We are constantly working towards better ways of representing meaning.
As part of this, users of our systems must be able to generate
the meanings they want.
This page describes some of the tools we built to enable this, and their
implications for how concepts can be arranged and distinguished.
Negative Keywords
It can be very useful to describe a concept by saying precisely what it
isn't. For example, if I were to tell you that my friend "is a
doctor, but not a physician", you would correctly infer that my friend
probably has a PhD. We have modelled this process by calculating
exactly which meaning-coordinates to remove from an expression to make
the undesired meaning irrelevant to the results.
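The text above does not spell out the calculation, so the following is only a minimal sketch of one way such a removal could work, assuming words are represented as vectors and that "removing a meaning" means subtracting the component of the query that lies along the unwanted vector (orthogonal projection). The function names and the toy three-dimensional vectors are hypothetical and chosen for illustration.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)

def negate(a, b):
    """Return the part of a that is orthogonal to b ("a NOT b").

    Assumption: negation is modelled as orthogonal projection, which
    the page itself does not confirm.
    """
    bb = dot(b, b)
    if bb == 0.0:
        return list(a)  # nothing to remove
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]

# Made-up toy vectors: axis 0 ~ "music", axis 1 ~ "geology".
rock = [0.7, 0.6, 0.1]
music = [1.0, 0.0, 0.2]
geology = [0.0, 1.0, 0.1]

rock_not_music = negate(rock, music)
```

With these toy vectors, the negated query has no remaining component along "music", and its cosine similarity with "geology" is higher than that of the original "rock" vector.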
Such options have not been successfully implemented before, because the
question "how much of the unwanted meaning should be removed?" has not
been correctly answered. A sure understanding of the mathematical
processes involved has enabled us to remove meanings very successfully:
not just individual meanings, but whole areas of meaning.
Such a process can be particularly useful for information retrieval, where
the user may be given documents all of which relate to a particularly
popular aspect of a word's meaning. For example, if you want to know about
geology, a query for "rock" can often just tell you about popular music.
Our new method of negation enables the user to remove this unwanted
intrusion and retrieve the documents they really want.
The process is being evaluated prior to publication, and commercial
implementation is being considered via the Stanford Office of Technology
Licensing.
Resolving Ambiguity using Contrasting Pairs
Ambiguous words often have associated meanings which are very different
from one another. For example, one way to detect the ambiguity of the
word "suit" is to find that "jacket" and "lawsuit" are both closely
related to the word suit, but virtually unrelated to one another.
We therefore define the "contrast index" of a pair of words relative to a
given query to be the product of their similarities with the query,
divided by their similarity with one another. So for a query q and
two words a and b, we have
    Contrast Index = ( Sim(q,a) x Sim(q,b) ) / Sim(a,b).
You can experiment with this option by selecting "Contrasting Pairs"
instead of "Nearest Neighbors" in the pilot demo below. The results model
a human's natural skill in offering disambiguating options in sentences of
the form "Do you mean a or b?"
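As an illustration of the contrast index defined above, here is a minimal sketch that ranks candidate pairs for a query. It assumes Sim is cosine similarity between word vectors, which the text does not state; the small floor `eps` on the denominator is also an added assumption, to avoid dividing by zero or by a negative similarity. The vectors are made up.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    d = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return d / (nu * nv)

def contrast_index(q, a, b, eps=1e-9):
    """( Sim(q,a) x Sim(q,b) ) / Sim(a,b), with the denominator floored at eps."""
    return cosine(q, a) * cosine(q, b) / max(cosine(a, b), eps)

def contrasting_pairs(query_vec, neighbours):
    """Return all pairs of neighbour words, highest contrast index first."""
    scored = [
        (contrast_index(query_vec, neighbours[a], neighbours[b]), a, b)
        for a, b in combinations(sorted(neighbours), 2)
    ]
    return sorted(scored, reverse=True)

# Made-up toy vectors: axis 0 ~ clothing, axis 1 ~ law.
suit = [0.7, 0.7]
neighbours = {
    "jacket": [1.0, 0.1],
    "lawsuit": [0.1, 1.0],
    "trousers": [0.9, 0.2],
}

ranked = contrasting_pairs(suit, neighbours)
best_score, best_a, best_b = ranked[0]
```

On this toy data the top-ranked pair is "jacket" and "lawsuit": both are close to "suit" but far from each other, whereas "jacket" and "trousers" score low because they are nearly synonymous in this space.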
Clustering
Meanings are often represented by clusters of words which occur
together. In finding these clusters, we can find the different meanings of
an ambiguous word, or more subtle differences between the way a word is
used in different contexts.
By selecting "Clustered Results" in the pilot demo, you can view words
clustered into distinct groups with different meanings. You can also
choose how many results you want to see, and how many clusters you want
them put into. This technique has enormous potential for discovering and
analysing word senses, and poses some very interesting research
questions.
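The page does not say which clustering algorithm the demo uses. Purely as an illustration of grouping a result list into a chosen number of clusters, here is a small k-means-style sketch over word vectors using cosine similarity. The seeding rule, the function names, and the toy vectors are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    d = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return d / (nu * nv)

def cluster(words, vectors, k, iterations=10):
    """Group words into at most k clusters (k-means-style, hypothetical)."""
    k = min(k, len(words))
    if k == 0:
        return []
    # Deterministic seeding: start with the first word, then repeatedly
    # add the word least similar to every seed chosen so far.
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(words)) if i not in seeds]
        seeds.append(min(
            candidates,
            key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds),
        ))
    centroids = [list(vectors[s]) for s in seeds]
    assignment = [0] * len(words)
    for _ in range(iterations):
        # Assign each word to its most similar centroid.
        assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    groups = [[] for _ in range(k)]
    for word, a in zip(words, assignment):
        groups[a].append(word)
    return groups

# Made-up neighbours of "suit": axis 0 ~ clothing, axis 1 ~ law.
words = ["jacket", "trousers", "lawsuit", "court"]
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]

groups = cluster(words, vectors, k=2)
```

Here the clothing words and the legal words fall into separate groups, which corresponds to the two senses of "suit". Changing `k` plays the role of choosing how many clusters the results are put into.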