So I have created my first real website, Wikimedia Chapters Planet, a blog aggregator that collects blog posts and reports from the Wikimedia Chapters and the Wikimedia Foundation.

Setting it up was surprisingly easy thanks to the free Planet software (only a small hack was needed to get it to run). Updating it was a bit of a dilemma, as it is not trivial to run the Python Planet scripts on the free hosting I had, and I didn't want to spend too much on upgraded hosting. I solved it by simply running all the scripts on my PC and having a .bat file do the uploading every hour or so.
The translation part, which is crucial in making the site useful, was sheer luck: although Google closed its Translation API, it is still available as a free gadget for websites, and it can handle multiple languages on the same page (something Google Chrome's built-in service can't do).

All that was left was to gather the blogs and flip the switch. Currently, I collect blog posts from 19 chapters and reports from eight (out of 35), plus the Wikimedia Foundation. Hopefully, more chapters will join the list and we can all see the many things that are happening around the world.

WikiCamp 2011 takes Miskolc

Just as last year, a score and ten Wikipedians gathered for a four-day WikiCamp in the north-eastern town of Miskolc.

The campers got a chance to get to know each other while sightseeing in Eger and Miskolc (the former with a pit stop at a wine cellar), hiking and getting lost in the nearby woods, and visiting an adventure park. There were also some short presentations, among them one on how to take quality pictures for Wikipedia and Wikimedia Commons and one on the planned software changes coming to Wikipedia.

The event proved to be a success and is becoming a tradition, so we urge everyone to sign up early for the 2012 camp to be held in Veszprém.

Photo: Texaner, Wikimedia Commons, under CC BY-SA 3.0 and GFDL

Wikimedia Hungary grants

The National Civil Fund (recently renamed after Sándor Wekerle, a former prime minister), the Hungarian grant-giving arm of the European Social Fund, has granted Wikimedia Hungary 250 000 HUF (about $1,300) to cover its operating expenses between 1 July and 30 September. In particular, the grant funds the development of an online payment gateway for our bank built on CiviCRM, and the production of printed materials.

This is the third grant in a row that we have won, and the justifications of the grants show that we are getting better at it. This reflects both on our grant-writing skills and, more so, on our activities.

($1 = 190 Hungarian Forints)

Featured article word cloud

The three thousand featured articles of the English Wikipedia are made up of roughly 223 thousand different words, out of which 100 thousand are used only once.* As a comparison, Shakespeare used 29 thousand words in his works, out of which 12 thousand occurred only once.

The most frequent words represented as a cloud after the most common function words were removed:
And this is what the above cloud would look like if the function words (including the 1.1 million the's out of the 15 million words in total) were included and weighted according to their frequency:

* Different word forms of the same word are counted separately, but uppercase and lowercase forms are counted as one. E.g. "Cat" and "cat" count as one, but "cats" is counted separately from "cat".
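
The counting rule in the footnote can be sketched in a few lines of Python. This is only an illustration under my own assumptions (the `word_stats` name and the simple letters-and-apostrophes tokenizer are hypothetical), not the script used to produce the figures above.

```python
import re
from collections import Counter


def word_stats(text):
    """Return (total words, distinct words, words used only once)."""
    # Lowercasing makes "Cat" and "cat" the same word;
    # "cats" stays a separate word because no stemming is applied.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    used_once = [word for word, n in counts.items() if n == 1]
    return len(words), len(counts), len(used_once)


total, distinct, once = word_stats("The cat saw the Cat chase cats.")
print(total, distinct, once)  # 7 5 3
```

In the sample sentence "the" and "cat" each occur twice, so only "saw", "chase" and "cats" are counted as words used once.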

Readability of South African Constitutions

South Africa has had five constitutions during its history. The first one, the South Africa Act of 1909, was actually an act of the British Parliament. The 1961 Constitution was adopted during apartheid to transform the country into a republic, and the 1983 Constitution tried to reform things a bit with a tricameral parliament. The 1993 Constitution was an interim one that set out the framework for the process that created the current, democratic Constitution of 1996.

My thesis looked at the readability (and factors affecting easy comprehension) of South African Constitutions at two specific points in time, but it is just as interesting, if not more so, to look at the whole developmental sequence.

The language of two South African Constitutions

One of my two theses is now finally ready, and given that I am satisfied with the results, I thought I should share it. It was a comparison of two South African constitutions (the 1961 and the current 1996 one), to see if the freer society has manifested itself in a more accessible legal text, which, as I showed, it has. This was not only the result of modernization, but also of a conscious effort on the part of the drafters.

Here's the abstract, and if you are interested, you can read the whole thing here.

This study examined in detail the language of two South African constitutions. The Republic of South Africa Constitution Act, 1961 adopted in the era of apartheid was compared with the current constitution, the Constitution of the Republic of South Africa, 1996, to find out whether the democratization of society has resulted in a more accessible constitution. 
Based on the recommendations of the Plain Language Movement for more accessible legal language, four criteria were examined in a quantitative analysis: average sentence length, the use of passive verb forms, the use of "shall" and the use of archaic and Latin expressions.
The results showed that the 1996 Constitution compared to the 1961 Constitution has significantly shorter average sentences; passive constructions are half as frequent; the use of "shall" and difficult, archaic and Latin expressions are avoided. The results indicate that the language of the 1996 Constitution conforms better to the recommendations on accessible language. In conclusion, the democratization of society has been accompanied by a constitution that is easier to comprehend and understand, allowing the citizens to understand their rights and obligations towards the state better.
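
To give an idea of how two of the four criteria could be measured, here is a rough Python sketch for average sentence length and the number of "shall"s. It is not the procedure used in the thesis: the `sentence_stats` name is hypothetical, and splitting sentences on ".", "!" and "?" is a crude assumption that would need refining for legal texts with numbered subsections and abbreviations.

```python
import re


def sentence_stats(text):
    """Return (average words per sentence, number of occurrences of 'shall')."""
    # Naive sentence split: any run of ., ! or ? ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    shall_count = sum(1 for word in words if word.lower() == "shall")
    average_length = len(words) / len(sentences) if sentences else 0.0
    return average_length, shall_count


sample = "The President shall sign the Act. Parliament may amend it."
print(sentence_stats(sample))  # (5.0, 1)
```

Passive constructions and archaic or Latin expressions are harder to detect automatically and would need a word list or a parser, so they are left out of this sketch.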

The Mouse That Roared

The text of the declaration from The Mouse That Roared book, which is about as good as the film itself:

The readability of user warning messages

Looking at the talk pages on the English Wikipedia I got the impression that the standard user warning messages are terribly difficult to understand. First impressions can be deceiving though, so I decided to investigate.

The English Wikipedia catalogues 405 different warning messages (there are some duplicates in that count) that can be sent to users who commit any of the scores of possible transgressions. As a comparison, there are only 137 so-called barnstars used to congratulate users for their achievements.

To determine how readable these are, I looked at 105 of these messages and calculated their readability scores (the raw data is available here). The standard readability formulas take into account the length of sentences and the length of words (either as the number of characters or syllables in them) and predict the number of years of formal education one would need to understand the text (this is the “grade level” based on the US education system).

Readability is not really an exact science: different formulas give slightly different weights to the length of words and sentences, and there are a number of other factors that influence the comprehensibility of a text – for example, the frequency of difficult words, the use of multiple negation, etc. – that the formulas don’t take into account (see note 1). Nevertheless, readability formulas give a comparable indication of the difficulty of different texts.
The readability of various categories of user warnings, based on the SMOG formula
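
For the curious, here is a minimal Python sketch of how a SMOG grade can be computed. The constants are those of the published SMOG formula (grade = 1.0430 × √(polysyllabic words × 30 / sentences) + 3.1291); the function names and the vowel-group syllable counter are my own crude assumptions for illustration, so the scores will differ somewhat from those of dedicated tools, and this is not the tool that produced the numbers in this post.

```python
import math
import re


def count_syllables(word):
    """Rough syllable estimate: count runs of consecutive vowels."""
    vowel_groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_groups))


def smog_grade(text):
    """Estimate the SMOG grade (years of education) for a text."""
    # Naive sentence split: any run of ., ! or ? ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z]+", text)
    # SMOG only looks at words of three or more syllables.
    polysyllables = sum(1 for word in words if count_syllables(word) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


print(round(smog_grade("The cat sat. The dog ran."), 4))  # 3.1291
```

A text without any long words gets the formula's minimum score of about 3.13; every polysyllabic word per sentence pushes the grade up from there.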

The results show that on average it would take an American student 12 years of study (i.e. graduating high school) to understand these warning messages. This level seems appropriate for an encyclopedia (see note 2).

The averages, however, mask the outliers. The least readable message in the sample was the notice people get when they are blocked to enforce a decision by the English Wikipedia’s arbitration committee: a reader would need about 18 and a half years of education to understand it on the first reading. The runners-up are some of the more commonly appearing templates that warn users that their article is nominated for deletion or breaches copyright.

Message and its SMOG index (years of education needed to understand the text)
Block to enforce arbitration decision 18.49
Warning that the user has added copyrighted material 17.23
Warning that the user has added a link to copyrighted material 16.86
User's article is proposed for deletion 16.64
Final warning that the user should not remove maintenance templates 16.42
User's article nominated for deletion 15.45
User's article proposed for deletion 15.42
User blocked for advertising or self-promoting 15.25
User's article speedily deleted for spam 14.75
Warning that the user should not assume ownership of articles 14.62

In conclusion, the warning messages aren’t unreasonably unreadable, although the various deletion notices, especially the ones concerned with copyright, are written in a way that is too difficult for the average user to understand. At this point it is only a hunch that the most commonly used messages are among the most difficult to comprehend.

1 Studies have confirmed that including other factors in the formula adds more work than it improves the results. [1]
2 According to the UNU-Merit user survey, 88% of the users have finished secondary education. [2]

A bit more on user talk pages

Building on my previous post, where I looked at the tone of discussions on Wikipedia users' talk pages, especially those of new users, today I looked at a couple of other languages to see if there are any interesting trends.

I looked at the talk pages of 30 recently registered users from April on each of the Croatian, Serbian, Russian and English Wikipedias – of course, this means that none of the samples was very representative, as the sizes of the Wikipedias differ: in some cases it takes days, in others only minutes, until 30 new users register. Therefore, it is important to take the numbers with a grain of salt, while the overall trends should be about right.
Colourful welcome message on the Russian Wikipedia. There is also a more text-heavy black and white version.

The three smaller Wikipedias (and the previously examined Hungarian one) shared the practice of placing a welcome template message on the users' pages following their first edits, even if nobody had any other comment (praise or correction) to offer (about 28-29 people in the samples received some form of welcome template). The welcome messages are sometimes (6-30% of cases) followed by warnings that are somewhat specific to a given Wikipedia.

Serbian welcome message, with a warm invitation at the end that looks personal.
What was interesting on the Croatian Wikipedia was the common warning (4 times in the sample) that the user should write in Croatian. Given the similarity of the Serbo-Croatian languages, I cannot judge whether the warning was justified, but it can't be a positive experience to be told that you are not speaking the right language or the language right. It was also striking that 4 out of the 30 people were indefinitely blocked for unproductive editing. Without the ability to see deleted edits I cannot judge these blocks, but their harshness and, in some cases, the lack of warning stood out.
A typical English Wikipedia talk page with a welcome and a number of deletion notices.
When I turned to the English Wikipedia the picture was different. The talk pages suddenly looked like minefields dotted with danger signs. Only 55% of the users received a welcome message preceding a notice that their article was deleted or their contribution reverted (about 85% of the sample received some kind of warning).

Given the high proportion of users faced with the warning sign messages as the first feedback they get from Wikipedia, it might be worthwhile to consider making them more user-friendly. One good step would be to make them easier to understand by simply rewriting them in Plain English (the grammar could be simplified, insider jargon like "tag", "under criteria A7", "userfy" should be removed, etc.).

An interesting follow-up study would be to see what effect welcome messages, or the lack of them, have on new users' behaviour.

Tone of talk page discussions

The Community Department at the Wikimedia Foundation has been running a number of small scale studies on the English Wikipedia in preparation for a more in-depth study during the summer.
English Wikipedia. (CC By-Sa: Steven Walling)

One of the things they have looked into was the tone of messages left on new editors' talk pages. Their findings show that the ratio of messages with a negative tone and sometimes scary imagery (usually red stop signs) has been on the increase, while messages of praise showed a stark decline around 2007.

To see if the situation is similar on the Hungarian Wikipedia, I looked at its user discussion pages. Without diving into copies of the database that contain every single historical edit, I concentrated on edits in April-May 2011.

First I looked at the 100 most recent edits on user talk pages, which included experienced editors – indeed, a lot of the discussion was between experienced editors. I tried to partition the edits based on tone into positive, negative and neutral, but (except for negative) this is usually quite difficult, and the line between positive and neutral is a matter of subjective judgement (as a rule of thumb, anything that included a thank you or the default welcome template went into the positive bucket).

After doing this, I realized that I should have concentrated on messages left for new editors, so I looked at the talk pages of 30 people who registered in April on the Hungarian Wikipedia.

The results weren't too exciting as there wasn't much interaction happening with new users. The majority received only the standard welcome message on their talk page; only two of the pages showed extensive discussion (indicating that the user has become quite active already).

A good sign is that the Hungarian Wikipedia doesn't really use scary images in templates. The exceptions are the templates for copyright violations, and the Wikipedia puzzle piece used in warnings about articles that are too short and will therefore be deleted.

Thus, the situation seems to be better on the Hungarian Wikipedia than on the English Wikipedia. Unfortunately, this means that other explanations are needed for why the retention and "conversion rate" of new editors on the Hungarian Wikipedia are very low.