Friday, May 13, 2011

The readability of user warning messages

Looking at the talk pages on the English Wikipedia, I got the impression that the standard user warning messages are terribly difficult to understand. First impressions can be deceiving, though, so I decided to investigate.

The English Wikipedia catalogues 405 different warning messages (the count includes some duplicates) that can be sent to users who commit any of scores of possible transgressions. By comparison, there are only 137 so-called barnstars, which are used to congratulate users on their achievements.

To determine how readable these are, I looked at 105 of these messages and calculated their readability scores (the raw data is available here). The standard readability formulas take the length of sentences and the length of words (measured in either characters or syllables) and predict the number of years of formal education a reader would need to understand the text (this is the “grade level”, based on the US education system).

Readability is not an exact science. Different formulas give slightly different weights to the length of words and sentences, and they ignore a number of other factors that influence the comprehensibility of a text, such as the frequency of difficult words or the use of multiple negation.¹ Nevertheless, readability formulas give a comparable indication of the difficulty of different texts.

Figure: The readability of various categories of user warnings, based on the SMOG formula
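
For readers who want to try this on a message of their own, here is a minimal sketch of the SMOG calculation in Python. It uses the published SMOG formula (polysyllabic words, meaning three or more syllables, normalised to a 30-sentence sample), but the syllable counter and the sentence splitter are rough heuristics of my own choosing, and the function names are hypothetical. It is not the tool used to produce the scores in this post, so its results will differ somewhat from the figures reported here.

```python
import math
import re


def count_syllables(word):
    """Rough heuristic (an assumption, not a dictionary lookup):
    count groups of consecutive vowels, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def smog_grade(text):
    """Estimate the SMOG grade (years of education) for a piece of text."""
    # Naive sentence split on ., ! and ? (abbreviations will be miscounted).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    words = re.findall(r"[A-Za-z']+", text)
    # SMOG only looks at words of three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    # The factor 30 / len(sentences) scales the count to a 30-sentence sample.
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


if __name__ == "__main__":
    print(round(smog_grade("The cat sat. The dog ran."), 2))
    print(round(smog_grade("Arbitration enforcement decisions."), 2))
```

Note that a text without any polysyllabic words bottoms out at the formula's constant term of about 3.13, and that very short texts give unstable results because the formula was designed for samples of 30 sentences.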

The results show that on average it would take an American student 12 years of study (i.e. graduating from high school) to understand these warning messages. This level seems appropriate for an encyclopedia.²

The averages, however, mask the outliers. The least readable message in the sample was the notice people get when they are blocked to enforce a decision by the English Wikipedia’s arbitration committee: a reader would need about 18 and a half years of education to understand it on the first reading. The runners-up are some of the more commonly used templates, which warn users that their article has been nominated for deletion or that it breaches copyright.

Warning message                                                   SMOG index (years of education needed to understand text)
Block to enforce arbitration decision                             18.49
Warning that the user has added copyrighted material              17.23
Warning that the user has added a link to copyrighted material    16.86
User's article is proposed for deletion                           16.64
Final warning that the user not remove maintenance templates      16.42
User's article nominated for deletion                             15.45
User's article proposed for deletion                              15.42
User blocked for advertising or self-promoting                    15.25
User's article speedily deleted for spam                          14.75
Warning that the user not assume ownership of articles            14.62

In conclusion, the warning messages aren’t unreasonably unreadable, although the various deletion notices, especially the ones concerned with copyright, are written in a way that is too difficult for the average user to understand. At this point it is only a hunch that the most commonly used messages are among the most difficult to comprehend.

¹ Studies have confirmed that including other factors in the formula adds more work than it improves the results. [1]
² According to the UNU-Merit user survey, 88% of the users have finished secondary education. [2]
