Tuesday, July 19, 2011

WikiCamp 2011 takes Miskolc

Just like last year, a score and ten Wikipedians gathered for a four-day WikiCamp in the north-eastern town of Miskolc.

The campers got to know each other while sightseeing in Eger and Miskolc (the former with a pit stop at a wine cellar), hiking and getting lost in the nearby woods, and visiting an adventure park. There were also some short presentations, among them one on how to take quality pictures for Wikipedia and Wikimedia Commons and one on the planned software changes coming to Wikipedia.

The event proved to be a success and is becoming a tradition, so we urge everyone to sign up early for the 2012 camp to be held in Veszprém.

---
Photo: Texaner, Wikimedia Commons, under CC BY-SA 3.0 and GFDL

Thursday, July 14, 2011

Wikimedia Hungary grants

The National Civil Fund (recently renamed after Sándor Wekerle, a former prime minister), the Hungarian grant-giving arm of the European Social Fund, has granted Wikimedia Hungary 250 000 HUF ($1300) to cover its operating expenses between 1 July and 30 September. In particular, the grant funds the development of an online payment gateway for our bank, built on CiviCRM, and the production of printed materials.

This is the third grant in a row that we have won, and the justifications of the grants show that we are getting better at it, which reflects well both on our grant-writing skills and, more so, on our activities.


($1 = 190 Hungarian Forints)

Monday, June 6, 2011

Featured article word cloud

The three thousand featured articles of the English Wikipedia are made up of roughly 223 thousand different words, out of which 100 thousand are used only once.* For comparison, Shakespeare used 29 thousand words in his works, out of which 12 thousand occurred only once.

The most frequent words represented as a cloud after the most common function words were removed:
And this is what the above cloud would look like if the function words (including the 1.1 million the's out of the 15 million words in total) were included and weighted according to their frequency:

* Different word forms of the same word are counted separately, but uppercase and lowercase forms are counted as one. E.g. "Cat" and "cat" count as one, but "cats" is counted separately from "cat".
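The counting rule in the footnote can be sketched in a few lines of Python. This is only a minimal illustration, not the script used for the figures above: the function name `vocabulary_stats` and the tokenizing regular expression are assumptions, and a different tokenizer would give somewhat different totals.

```python
import re
from collections import Counter

def vocabulary_stats(text):
    """Return (total words, distinct words, words used only once)."""
    # Lowercasing makes "Cat" and "cat" count as one word. No stemming
    # is applied, so "cats" stays separate from "cat".
    # Assumed tokenizer: runs of letters, optionally with one apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(words)
    used_once = sum(1 for n in counts.values() if n == 1)
    return len(words), len(counts), used_once

sample = "The cat saw the Cat. The cats saw nothing."
print(vocabulary_stats(sample))  # prints (9, 5, 2)
```

In the sample, "cats" and "nothing" are the two words that occur only once.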

Friday, June 3, 2011

Readability of South African Constitutions

South Africa has had five constitutions during its history. The first one, the South Africa Act of 1909, was actually an act of the British Parliament. The 1961 Constitution was adopted during apartheid to transform the country into a Republic, and the 1983 one tried to reform things a bit with a Tricameral parliament. The 1993 Constitution was an interim one that set out the framework for the process that created the current, democratic Constitution of 1996.

My thesis looked at the readability (and factors affecting easy comprehension) of South African Constitutions at two specific points in time, but it is just as interesting, if not more so, to look at the whole developmental sequence.


The language of two South African Constitutions

One of my two theses is now finally ready, and given that I am satisfied with the results, I thought I should share it. It was a comparison of two South African constitutions (the 1961 one and the current 1996 one), to see if the freer society has manifested itself in a more accessible legal text, which, as I showed, it has. This was not only the result of modernization, but also of a conscious effort on the part of the drafters.

Here's the abstract, and if you are interested, you can read the whole thing here.

This study examined in detail the language of two South African constitutions. The Republic of South Africa Constitution Act, 1961 adopted in the era of apartheid was compared with the current constitution, the Constitution of the Republic of South Africa, 1996, to find out whether the democratization of society has resulted in a more accessible constitution. 
Based on the recommendations of the Plain Language Movement for more accessible legal language, four criteria were examined in a quantitative analysis: average sentence length, the use of passive verb forms, the use of ‘shall’ and the use of archaic and Latin expressions. 
The results showed that the 1996 Constitution compared to the 1961 Constitution has significantly shorter average sentences; passive constructions are half as frequent; the use of ‘shall’ and difficult, archaic and Latin expressions are avoided. The results indicate that the language of the 1996 Constitution conforms better to the recommendations on accessible language. In conclusion, the democratization of society has been accompanied by a constitution that is easier to comprehend and understand, allowing the citizens to understand their rights and obligations towards the state better.
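Two of the four criteria in the abstract, average sentence length and the use of ‘shall’, are easy to approximate mechanically. The Python sketch below is an illustration under stated assumptions, not the method of the thesis: the function name `sentence_stats` is hypothetical, and splitting sentences on full stops, question marks and exclamation marks is a simplification that will miscount legal text with abbreviations or numbered subsections.

```python
import re

def sentence_stats(text):
    """Return (average words per sentence, number of times 'shall' occurs)."""
    # Assumption: a sentence ends at ".", "!" or "?".
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z']+", text)
    shall_count = sum(1 for w in words if w.lower() == "shall")
    return len(words) / len(sentences), shall_count

# Made-up sample sentences, not quotations from either constitution.
print(sentence_stats("The President shall sign. He must act now."))
```

Detecting passive verb forms and archaic or Latin expressions needs a grammatical parser or a word list, so those two criteria are left out of this sketch.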

Wednesday, June 1, 2011

The Mouse That Roared


The text of the declaration from The Mouse That Roared book, which is about as good as the film itself:

Friday, May 13, 2011

The readability of user warning messages

Looking at the talk pages on the English Wikipedia I got the impression that the standard user warning messages are terribly difficult to understand. First impressions can be deceiving though, so I decided to investigate.

The English Wikipedia catalogues 405 different warning messages (there are some duplicates in that count) that can be sent to users who commit any of the scores of possible transgressions. For comparison, there are only 137 so-called barnstars used to congratulate users for their achievements.

To determine how readable these are, I looked at 105 of these messages and calculated their readability scores (the raw data is available here). The standard readability formulas take into account the length of sentences and the length of words (either as the number of characters or syllables in them) and using a formula give a prediction of the number of years of formal education one would need to understand them (this is the “grade level” based on the US education system).
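As an illustration of how such a formula works, here is a minimal Python sketch of the SMOG grade, the measure reported in this post. The constants are those of the published SMOG formula; everything else is an assumption made for the example: the function names are hypothetical, the syllable counter is a crude vowel-group heuristic, and the tool actually used to produce the scores in this post may count syllables and sentences differently.

```python
import math
import re

def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Heuristic: a final "e" is usually silent ("make"), except in "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def smog_grade(text):
    """SMOG grade: estimated years of education needed to understand the text."""
    # Assumption: a sentence ends at ".", "!" or "?".
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z]+", text)
    # SMOG only looks at words of three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    # The count is scaled to a 30-sentence sample before taking the root.
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291

# Made-up sample text, not one of the actual warning templates.
print(round(smog_grade("Your article was nominated for deletion. Please read the policy."), 2))
```

Long sentences raise the score only indirectly: the fewer sentences a text has for the same number of long words, the higher the grade.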

Readability is not really an exact science: different formulas give slightly different weights to the length of words and sentences, and there are a number of other factors that influence the comprehensibility of a text (for example, the frequency of difficult words or the use of multiple negation) that the formulas don’t take into account.¹ Nevertheless, readability formulas give a comparable indication of the difficulty of different texts.
Figure: The readability of various categories of user warnings, based on the SMOG formula

The results show that on average it would take an American student 12 years of study (i.e. graduating high school) to understand these warning messages. This level seems appropriate for an encyclopedia.²


The averages, however, mask the outliers. The least readable message in the sample, the notice people get when they are blocked to enforce a decision by the English Wikipedia’s arbitration committee, would take about 18 and a half years of education to understand on first reading. The runners-up are some of the more commonly used templates that warn users that their article has been nominated for deletion or breaches copyright.



 
Purpose                                                         SMOG index (years of education needed to understand text)
Block to enforce arbitration decision                           18.49
Warning that the user has added copyrighted material            17.23
Warning that the user has added a link to copyrighted material  16.86
User's article is proposed for deletion                         16.64
Final warning that the user not remove maintenance templates    16.42
User's article nominated for deletion                           15.45
User's article proposed for deletion                            15.42
User blocked for advertising or self-promoting                  15.25
User's article speedily deleted for spam                        14.75
Warning that the user not assume ownership of articles          14.62

In conclusion, the warning messages aren’t unreasonably unreadable, although the various deletion notices, especially the ones concerned with copyright, are written in a way that is too difficult for the average user to understand. At this point it is only a hunch that the most commonly used messages are among the most difficult to comprehend.

1 Studies have confirmed that including other factors in the formula adds more work than it improves the results. [1]
2 According to the UNU-Merit user survey, 88% of the users have finished secondary education. [2]