Monday, June 6, 2011

Featured article word cloud

The three thousand featured articles of the English Wikipedia are made up of roughly 223 thousand different words, of which 100 thousand are used only once.* For comparison, Shakespeare used 29 thousand different words in his works, of which 12 thousand occur only once.

The most frequent words represented as a cloud after the most common function words were removed:
And this is what the above cloud would look like if the function words (including the 1.1 million the's out of the 15 million words in total) were included and weighted according to their frequency:
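The difference between the two clouds comes down to whether function words are filtered out before the frequencies are ranked. Here is a minimal Python sketch of that step. It is not the script used for this post: the `FUNCTION_WORDS` set is a tiny hypothetical stand-in for a real stop-word list, and `top_words` is a name made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of function words;
# a real list would contain a few hundred entries.
FUNCTION_WORDS = {"the", "of", "and", "a", "in", "to", "is"}

def top_words(text, n=3, drop_function_words=True):
    """Return the n most frequent words, optionally skipping function words."""
    # Lowercase first so "The" and "the" are counted as one word.
    words = re.findall(r"[a-z]+", text.lower())
    if drop_function_words:
        words = [w for w in words if w not in FUNCTION_WORDS]
    return Counter(words).most_common(n)

sample = "The cat and the dog saw the cat in the garden of the house"
print(top_words(sample, n=1))                             # [('cat', 2)]
print(top_words(sample, n=2, drop_function_words=False))  # [('the', 5), ('cat', 2)]
```

The counts returned by `most_common` are what a cloud generator would use as font-size weights, which is why a single word like "the" dominates the unfiltered version.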

* Different forms of the same word are counted separately, but uppercase and lowercase forms are counted as one. E.g., "Cat" and "cat" count as one word, but "cats" is counted separately from "cat".
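As a rough illustration of the counting rule in the footnote, here is a small Python sketch. It assumes words are simply runs of letters and does no stemming; the function name `vocabulary_stats` is made up for this example, and the actual tokenization used for the numbers above may have differed.

```python
import re
from collections import Counter

def vocabulary_stats(text):
    """Return (number of different words, number of words used only once)."""
    # Lowercasing merges "Cat" and "cat"; no stemming, so "cats" stays separate.
    # The pattern [^\W\d_]+ matches runs of letters, including non-ASCII ones.
    counts = Counter(re.findall(r"[^\W\d_]+", text.lower()))
    used_once = sum(1 for c in counts.values() if c == 1)
    return len(counts), used_once

print(vocabulary_stats("Cat cat cats dog"))  # (3, 2)
```

Here "Cat" and "cat" collapse into a single entry with a count of two, while "cats" and "dog" are each counted as words used only once.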

