The theory behind Infomap

These are ambitious questions with enormous relevance for pure scholarship, scientific knowledge, and all of our daily lives.
They are questions that those of us working on the Infomap project are trying to answer in practical and robust ways.

Our main approach is to build words and meanings into mathematical spaces in such a way that the relationships between words in these spaces reflect the way words and meanings are related in documents. We design these spaces so that they can be built directly from the documents themselves, without a human intermediary saying which bit should go where.

There are two reasons for this. The first is practical: building lexical resources like dictionaries and thesauri is time-consuming and expensive. The number of available documents and the terminology used in different fields are growing so quickly that we need tools which can help with this task automatically, especially if we want to provide good-quality resources for more languages than just English. The second is theoretical. The fact that words can be learnt and used to refer to concepts is fascinating, and any abstract system that can perform aspects of this task may shed light on it.

One way of building words into an abstract space is to give each word a list of identifying "co-ordinates", which measure how important a certain feature or property is in defining that word. A good analogy is the way we use latitude and longitude for describing the location of a point on the earth's surface. One of the best things about these numbers is that two places which are close together have similar co-ordinates. In a similar way, Infomap seeks the best way to assign "meaning co-ordinates" to a word, influenced by the words nearby in a sentence or document.
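To make the idea of "meaning co-ordinates" concrete, here is a minimal sketch in Python. It is an illustration under simple assumptions, not the Infomap implementation: each word's co-ordinates are just counts of the words that appear within a small window around it, and closeness is measured with cosine similarity. The function names, the window size and the three-sentence corpus are all invented for this example.

```python
import math
from collections import Counter, defaultdict


def build_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it.

    Hypothetical helper for illustration; the counts play the role of
    "meaning co-ordinates", with one axis per neighbouring word.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (1.0 = same direction)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for this example.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the car needs fuel",
]

vectors = build_vectors(corpus)
cat_dog = cosine(vectors["cat"], vectors["dog"])
cat_car = cosine(vectors["cat"], vectors["car"])
print(f"cat vs dog: {cat_dog:.3f}")
print(f"cat vs car: {cat_car:.3f}")
```

In this toy corpus "cat" and "dog" share more neighbouring words than "cat" and "car", so they end up closer together, much as two nearby towns have similar latitude and longitude. Real models work with far larger collections and typically refine the raw counts in further steps.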

The Practical Details

Follow this link to read a simple description of how we accomplish this in practice. We are working to improve the sensitivity of our models and our variety of applications: we encourage you to visit us regularly. You can test some of the resulting models on our demonstrations, and we would welcome your response.

A more thorough description of our technique can be found in:

Yasuhiro Takayama, Raymond Flournoy, Stefan Kaufmann, and Stanley Peters: Information Mapping: Concept-based Information Retrieval based on Word Associations.

Watch this space.
