Visualizing Word Meanings

The diagrams below give a pictorial representation, or Word-Spectrum, of the words most closely associated with the terms "word", "human", "chair" and "drug".

The associations are constructed by analysing word use in the New York Times, except for "chair", which is constructed from the AP newswire.
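The page does not spell out how word use is analysed. As a rough illustration only, here is a minimal sketch of one common approach: count which words co-occur within a small window of each other, then rank neighbours by cosine similarity of those count vectors. The window size, the tokenisation and all function names are assumptions for illustration; this is not necessarily Infomap's actual procedure.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of words seen within `window` positions of it.
    Assumption: naive lowercase whitespace tokenisation."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest_neighbours(vectors, target, k=5):
    """Return the k words whose co-occurrence vectors are most similar to target's."""
    # Use .get so an unseen target does not insert a key while we iterate.
    target_vec = vectors.get(target, Counter())
    scores = [(cosine(target_vec, vec), w) for w, vec in vectors.items() if w != target]
    scores.sort(reverse=True)
    return [w for _, w in scores[:k]]
```

On a toy corpus such as `["doctor prescribed drug", "doctor prescribed medicine", "dog chased cat"]`, "medicine" comes out as the closest neighbour of "drug" because the two words share exactly the same contexts.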

The diagrams are constructed by projecting onto the plane that best fits the data in question (the plane spanned by the first two "principal components").
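Projecting onto the best-fitting plane is principal component analysis (PCA) truncated to two components. A minimal sketch, assuming the word vectors are already available as rows of a NumPy array and that they are mean-centred first (standard for PCA, though the page does not say how Infomap handles this); the function name is hypothetical:

```python
import numpy as np

def project_to_principal_plane(vectors):
    """Project row vectors onto the plane spanned by their first two principal components.

    vectors: array-like of shape (n_words, n_dims), with n_words and n_dims both >= 2.
    Returns an array of shape (n_words, 2) suitable for a scatter plot.
    """
    X = np.asarray(vectors, dtype=float)
    centred = X - X.mean(axis=0)  # the least-squares plane passes through the mean
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:2].T     # 2-D coordinates in the principal plane
```

The signs of the principal directions returned by the SVD are arbitrary, so a diagram built this way may come out mirrored; the relative distances between words in the plane are unaffected.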



Representation of "word"


Note the very clear cluster on the left, which arises because "Word" is used a great deal in the context of computer software.

To exclude this computing context, you can reformulate the query in Infomap as "word NOT Microsoft".
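The page does not describe how NOT is computed. One technique from the vector-model literature is orthogonal negation: remove from the query vector its component along the negated word's vector, so that a NOT b = a − ((a·b)/(b·b)) b, which is orthogonal to b. The sketch below assumes that interpretation; whether Infomap uses exactly this is an assumption, and the function name is hypothetical.

```python
import numpy as np

def vector_not(a, b):
    """Return a with its component along b removed ("a NOT b" as orthogonal projection).

    Assumption: NOT is modelled as projection onto the orthogonal complement of b.
    The result has zero dot product with b.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = b @ b
    if denom == 0:
        return a.copy()  # negating the zero vector leaves the query unchanged
    return a - (a @ b / denom) * b
```

With toy vectors where "word" mixes a software direction and a language direction, e.g. `vector_not([1, 1], [1, 0])`, only the language direction `[0, 1]` survives, so neighbours that were similar to "word" purely through the "Microsoft" direction drop away.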


Representation of "human"


Again, the left-hand side contains a specific cluster, which reflects the concern of Western newspapers with human rights in China. This is a good example of the way in which Infomap describes a word's actual usage rather than what the word is (pre)supposed to mean.

To remove this aspect of the meaning of "human", try reformulating the query as "human NOT rights".



Representation of "chair"


This rather bleak picture represents the use of the word "chair" in the AP newswire. On the left we have meanings to do with execution; on the right we have rooms and furniture, the more everyday associations of the word "chair".

Another noteworthy point is the distance between "row" (on the left, closely associated with crime and punishment because of "death row") and "rows" (much more to do with rows of chairs in a peaceful context). Information retrieval systems often treat words with the same root (called morphological variants) as having the same meaning, assuming that they differ only for syntactic reasons (singular vs. plural, subject vs. object, etc.). In practice, this assumption is implemented by "stemming": reducing all morphological variants to a common stem and treating them as equivalent.

This technique is useful in many contexts, especially for languages in which words are much more heavily inflected than in English (for features such as gender, case and tense). Here, however, we see a subtle but important difference in meaning between "row" and "rows", a difference which serves to disambiguate "chair" between "electric chair" and "chair for sitting and eating".

Another example is the marked difference in meaning between the sentences

  • Have you seen the lights?
and
  • Have you seen the Light?

These are very preliminary observations and would require careful research before any conclusions could be drawn from them.
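To make the stemming trade-off discussed above concrete, here is a deliberately crude suffix-stripping stemmer. It is a hypothetical illustration, not the Porter stemmer or any stemmer a real retrieval system ships; the suffix list and minimum stem length are assumptions. It shows how "row" and "rows" (and "lights" and "Light") are collapsed onto one stem, erasing exactly the distinction the "chair" diagram reveals.

```python
# Hypothetical, deliberately crude suffix list; real systems use e.g. the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(word, min_stem=3):
    """Lowercase the word and strip the first matching suffix,
    keeping a stem of at least min_stem letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

def group_by_stem(words):
    """Map each stem to the morphological variants that collapse onto it."""
    groups = {}
    for w in words:
        groups.setdefault(crude_stem(w), []).append(w)
    return groups
```

For instance, `group_by_stem(["row", "rows", "chair", "chairs"])` puts "row" and "rows" in the same bucket, so a stemmed index could no longer separate "death row" contexts from "rows of chairs".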



Representation of "drug"


"Drug" seems to be one of the few words for which both of the most common contexts (pharmaceutical and narcotic) are well represented. By contrast, for "suit" and "bank" the legal and financial meanings, respectively, are overwhelmingly preferred in the NYT corpus.

The diagram is also intriguing in that it bears a striking resemblance to the British Isles. The hearty Irish craic has been mistaken for something completely different, the Scots are very business-oriented (upjohn being Newspeak for John o' Groats), and they're having a terrible time on the London Underground again, but at least there's progress in West Yorkshire.


Last modified: Thu Oct 30 12:39:04 PST 2003