Word clouds, the images that help to see what a speech is about

About 25 years ago I read Isaac Asimov’s Foundation. For whatever reason, one of the things that impressed me the most was the passage in which Asimov portrays the power of (automatic?) mathematical analysis to discover the real meaning of some text (1):

"There is a branch of human knowledge known as symbolic logic, which can be used to prune away all sorts of clogging deadwood that clutters up human language. I applied it to the protection treaty that we signed and something like ninety percent of it boils right out of the analysis as being meaningless; what we end up with can be described as a recognition that our "ally" cannot protect us".
"But what about the ambassador's assurances of support? They seemed satisfactory"
"You know, I took the liberty of recording all his statements and sent them for analysis, also. When they succeeded in eliminating meaningless statements, vague gibberish, useless qualifications **there was nothing left**. In five days of discussion the ambassador didn't say one damned thing, and said it so you never noticed."

Today there is something similar to that analysis. It is nowhere near as powerful, but it can be used in a similar way and is very, very easy to use: word clouds.

Word clouds are computer-generated images that contain almost all the words of a given text. The more frequent a word is, the bigger it is drawn. Consequently, word clouds are an effective way to show what an article, report or speech is really about. They do this in two ways: the bigger a word is in a word cloud, the more the author of the corresponding text wanted you to keep that particular word in mind, and to associate the author with it. At the same time, and for the same reasons, the smaller a word is in a word cloud, the less the author really cared to elaborate on it (even if he or she says otherwise!).
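To make the "more frequent means bigger" idea concrete, here is a minimal sketch in Python. It is not how Wordle or WordItOut actually work: the function name `word_sizes`, the linear scaling between `min_size` and `max_size`, and the optional `stopwords` filter are all assumptions made for illustration. It only computes a font size per word; laying the words out as an image is left out.

```python
import re
from collections import Counter


def word_sizes(text, min_size=10, max_size=72, stopwords=frozenset()):
    """Map each word to a font size that grows with its frequency.

    Hypothetical helper: the linear scaling and the default sizes are
    illustrative assumptions, not taken from any real word cloud tool.
    """
    # Lowercase the text and keep runs of letters only (accented ones too).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    if not counts:
        return {}

    low, top = min(counts.values()), max(counts.values())
    span = top - low
    sizes = {}
    for word, n in counts.items():
        # The rarest word gets min_size, the most frequent gets max_size.
        # If every word is equally frequent, they all get max_size.
        scale = (n - low) / span if span else 1.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes


# Usage: print the words from biggest to smallest.
speech = "reform reform reform majority majority respect"
for word, size in sorted(word_sizes(speech).items(), key=lambda kv: -kv[1]):
    print(f"{word}: {size}pt")
```

Passing a set of very common words ("the", "and", and so on) as `stopwords` keeps them from dominating the picture, which is presumably why a word cloud shows almost all the words of a text rather than all of them.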

[Image: word cloud of Berlusconi’s speech, /img/discorso_berlusconi_29_settembre.png]

Word clouds can be generated online with free, simple interfaces like WordItOut and Wordle. As an example, and to explain why they fascinate me, I used Wordle to make a word cloud out of Silvio Berlusconi’s September 2010 speech. The biggest words include “maggioranza” (majority), “federalismo” (federalism) and “riforma” (reform). The smallest words are terms like “rispetto” (respect), “famiglie” (families) and “Giustizia” (justice).

[Image: word cloud of Obama’s Cairo speech, /img/obama_cairo_speech1.png]

Then I played with Barack Obama’s 2009 speech in Cairo, Egypt. People, World and America are some of the biggest words; change, tolerance and aspirations are small.

Interesting, isn’t it? Of course, word clouds must be taken with a big grain of salt and read with careful attention to context. Words like reform or Muslims will have the same very high frequency both in the speeches of speakers who really care about reforms or Muslims and in those of speakers who really can’t stand them. There is no “Donna” (woman) in Berlusconi’s word cloud. Placement in a word cloud also depends on how the source text is formatted, so it may or may not be relevant. This is, I think, the case with the word “Applause” in Obama’s speech, which always appears in the transcript on its own, set apart between parentheses. That said, word clouds are really fascinating and prompt at least a couple of thoughts:

  • it would be nice if all voters started to use them regularly whenever they have to decide who, or what, to vote for. No, maybe not just nice: overdue!
  • the meta-message of all these oh-so-easy-to-generate word clouds (not to mention the ads in your Gmail) is that it is just as easy to automatically find out what matters most or least to you, by scanning your blog or digital messages.

[Image: word cloud, /img/of_citizens_and_software_3.png]

For the record, the potential of this and many other digital technologies in politics is one of the topics of my Digital Citizens Basics course. To explain what the course is all about, I’ve made this word cloud of an interview about it.

(1) The version here is just a summary, to give you an idea: the complete, wonderful passage can be read in Asimov, Foundation: A long agreement that meant nothing