Friday, May 13, 2011

The readability of user warning messages

Looking at the talk pages on the English Wikipedia, I got the impression that the standard user warning messages are terribly difficult to understand. First impressions can be deceiving, though, so I decided to investigate.

The English Wikipedia catalogues 405 different warning messages (there are some duplicates in that count) that can be sent to users who commit any of the scores of possible transgressions. By comparison, there are only 137 so-called barnstars used to congratulate users for their achievements.

To determine how readable these are, I looked at 105 of these messages and calculated their readability scores (the raw data is available here). The standard readability formulas take into account the length of sentences and the length of words (measured either in characters or in syllables). From these they predict the number of years of formal education one would need to understand the text (this is the “grade level”, based on the US education system).
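As an illustration of how such a formula works, here is a minimal sketch of the SMOG grade, the formula used for the scores in this post. The function names, the sentence splitting and the syllable counter are my own assumptions for the example (the syllable counter in particular is only a rough vowel-group heuristic), so this is not the tool that produced the numbers reported here and its scores will differ somewhat.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption of this sketch): count vowel groups."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" usually does not add a syllable.
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def smog_grade(text: str) -> float:
    """Estimate the SMOG grade (years of education) of an English text."""
    # Naive sentence split on ., ! and ? (abbreviations will be miscounted).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z']+", text)
    # SMOG only counts words with three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    # Standard SMOG formula, normalised to a 30-sentence sample.
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


easy = "The cat sat. The dog ran."
hard = "Your contribution constitutes copyright infringement."
print(round(smog_grade(easy), 2), round(smog_grade(hard), 2))
```

A text with short sentences and no long words gets the minimum score, while a single sentence packed with polysyllabic words scores much higher. SMOG was designed for samples of about 30 sentences, so scores for very short templates should be read with caution.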

Readability is not really an exact science: different formulas give slightly different weights to the length of words and sentences, and there are a number of other factors that influence the comprehensibility of a text (for example, the frequency of difficult words or the use of multiple negation) that the formulas don’t take into account (see note 1). Nevertheless, readability formulas give a comparable indication of the difficulty of different texts.
[Figure: The readability of various categories of user warnings, based on the SMOG formula]

The results show that on average it would take an American student 12 years of study (i.e. graduating high school) to understand these warning messages. This level seems appropriate for an encyclopedia (see note 2).

The averages, however, mask the outliers. The least readable message in the sample was the notice people get when they are blocked to enforce a decision by the English Wikipedia’s arbitration committee: it would take about 18 and a half years of education to understand on first reading. The runners-up are some of the more commonly appearing templates, which warn users that their article has been nominated for deletion or breaches copyright.

Warning message                                                   SMOG index (years of education needed to understand text)
Block to enforce arbitration decision                             18.49
Warning that the user has added copyrighted material              17.23
Warning that the user has added a link to copyrighted material    16.86
User's article is proposed for deletion                           16.64
Final warning that the user not remove maintenance templates      16.42
User's article nominated for deletion                             15.45
User's article proposed for deletion                              15.42
User blocked for advertising or self-promoting                    15.25
User's article speedily deleted for spam                          14.75
Warning that the user not assume ownership of articles            14.62

In conclusion, the warning messages aren’t unreasonably unreadable, although the various deletion notices, especially the ones concerned with copyright, are written in a way that is too difficult for the average user to understand. At this point it is only a hunch that the most commonly used messages are among the most difficult to comprehend.

1 Studies have confirmed that including other factors in the formula adds more work than the improvement in the results justifies. [1]
2 According to the UNU-Merit user survey, 88% of the users have finished secondary education. [2]

Tuesday, May 10, 2011

A bit more on user talk pages

Building on my previous post, where I looked at the tone of discussions on Wikipedia users' talk pages, especially those of new users, today I looked at a couple of other languages to see if there are any interesting trends.

I looked at the talk pages of 30 recently registered users from April on each of the Croatian, Serbian, Russian and English Wikipedias. Of course, none of these samples is very representative: the Wikipedias differ in size, and in some cases it takes days, in others only minutes, until 30 new users register. The numbers should therefore be taken with a grain of salt, though the overall trends should be about right.
[Figure: Colourful welcome message on the Russian Wikipedia. There is also a more text-heavy black and white version.]

The three smaller Wikipedias (and the previously examined Hungarian one) shared the practice of placing a welcome template message on users' pages following their first edits, even if there was no other comment (praise or correction) to offer (about 28-29 people in each sample received some form of welcome template). The welcome messages are sometimes (in 6-30% of cases) followed by warnings that are somewhat specific to a given Wikipedia.

[Figure: Serbian welcome message, with a warm invitation at the end that looks personal.]
What was interesting on the Croatian Wikipedia was the common warning (4 times in the sample) that the user should write in Croatian. Given the similarity of the Serbo-Croatian languages, I cannot judge whether the warning was justified, but it can't be a positive experience to be told that you are not speaking the right language, or not speaking the language right. In addition, 4 out of the 30 people were indefinitely blocked for unproductive editing. Without the ability to see deleted edits I cannot judge these blocks, but their harshness and, in some cases, the lack of warning were striking.
[Figure: A typical English Wikipedia talk page with a welcome and a number of deletion notices.]
When I turned to the English Wikipedia, the picture was slightly different. The talk pages suddenly looked like minefields dotted with danger signs. Only 55% of the users received a welcome message preceding a notice that their article was deleted or their contribution reverted (about 85% of the sample received some kind of warning).

Given the high proportion of users whose first feedback from Wikipedia is a warning-sign message, it might be worthwhile to consider making these messages more user-friendly. One good step would be to make them easier to understand by simply rewriting them in plain English (the grammar could be simplified, insider jargon like "tag", "under criteria A7" or "userfy" should be removed, etc.).

An interesting follow-up study would be to see what effect welcome messages, or the lack of them, have on new users' behaviour.

Saturday, May 7, 2011

Tone of talk page discussions

The Community Department at the Wikimedia Foundation has been running a number of small-scale studies on the English Wikipedia in preparation for a more in-depth study during the summer.
English Wikipedia. (CC By-Sa: Steven Walling)

One of the things they looked into was the tone of messages left on new editors' talk pages. Their findings show that the ratio of messages with a negative tone and sometimes scary imagery (usually red stop signs) has been on the increase, while messages of praise showed a stark decline around 2007.

To see if the situation is similar on the Hungarian Wikipedia, I looked at its user discussion pages. Without diving into copies of the database that contain every single historical edit, I concentrated on edits in April-May 2011.

First I looked at the 100 most recent edits on user talk pages, which included those of experienced editors; indeed, a lot of the discussion was between experienced editors. I tried to partition the edits by tone into positive, negative and neutral. Except for the negative ones this is usually quite difficult, and the line between positive and neutral is a matter of subjective judgement (as a rule of thumb, anything that included a thank you or the default welcome template went into the positive bucket).

After doing this, I realized that I should have concentrated on messages left for new editors, so I looked at the talk pages of 30 people who registered in April on the Hungarian Wikipedia.

The results weren't too exciting, as there wasn't much interaction happening with new users. The majority received only the standard welcome message on their talk page; only two of the pages showed extensive discussion (indicating that the user had already become quite active).

A good sign is that the Hungarian Wikipedia doesn't really use scary images in templates. The exceptions are the copyright violation notices and the Wikipedia puzzle piece used in warnings about articles that are too short and will therefore be deleted.

Thus, the situation seems to be better on the Hungarian Wikipedia than on the English Wikipedia. Unfortunately, this means that other explanations are needed for why the retention and "conversion rate" of new editors on the Hungarian Wikipedia are very low.