Showing posts with label Wikipedia. Show all posts

Tuesday, July 19, 2011

WikiCamp 2011 takes Miskolc

Just as last year, a score and ten Wikipedians gathered for a four-day WikiCamp in the north-eastern town of Miskolc.

The campers got a chance to get to know each other while sightseeing in Eger and Miskolc (the former with a pit stop at a wine cellar), hiking and getting lost in the nearby woods, and visiting an adventure park. There were also some short presentations, among others on how to take quality pictures for Wikipedia and Wikimedia Commons and on the planned software changes coming to Wikipedia.

The event proved to be a success and is becoming a tradition, so we urge everyone to sign up early for the 2012 camp to be held in Veszprém.

---
Photo: Texaner, Wikimedia Commons, under CC BY-SA 3.0 and GFDL

Monday, June 6, 2011

Featured article word cloud

The three thousand featured articles of the English Wikipedia are made up of roughly 223 thousand different words, out of which 100 thousand are used only once.* As a comparison, Shakespeare used 29 thousand words in his works, out of which 12 thousand occurred only once.

The most frequent words represented as a cloud after the most common function words were removed:
And this is what the above cloud would look like if the function words (including the 1.1 million the's out of the 15 million words in total) were included and weighted according to their frequency:

* Different word forms of the same word are counted separately, but uppercase and lowercase forms are counted as one. E.g. "Cat" and "cat" count as one, but "cats" is counted separately from "cat".
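The counting rule in the footnote can be sketched in a few lines of Python. This is only an illustration of the rule, not the script used for the figures above; the function name `word_stats` and the tokenising pattern are assumptions made for the example.

```python
import re
from collections import Counter

def word_stats(text):
    """Return (total words, distinct words, words used only once)."""
    # Lowercasing merges "Cat" and "cat"; inflected forms such as "cats"
    # stay separate because no stemming is applied.
    # Assumed tokenisation: runs of letters, with an optional inner apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(words)
    used_once = sum(1 for n in counts.values() if n == 1)
    return len(words), len(counts), used_once

total, distinct, once = word_stats("The cat sat. A Cat saw the cats.")
print(total, distinct, once)
```

Removing the function words before drawing the cloud would amount to dropping a fixed list of words (the, of, and, ...) from `counts` before picking the most frequent entries.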

Friday, May 13, 2011

The readability of user warning messages

Looking at the talk pages on the English Wikipedia I got the impression that the standard user warning messages are terribly difficult to understand. First impressions can be deceiving though, so I decided to investigate.

The English Wikipedia catalogues 405 different warning messages (there are some duplicates in that count) that can be sent to users who commit any of the scores of possible transgressions. As a comparison, there are only 137 so called barnstars used to congratulate users for their achievements.

To determine how readable these are, I looked at 105 of these messages and calculated their readability scores (the raw data is available here). The standard readability formulas take into account the length of sentences and the length of words (measured either in characters or in syllables) and predict the number of years of formal education one would need to understand the text (this is the “grade level”, based on the US education system).

Readability is not really an exact science: different formulas give slightly different weights to the length of words and sentences, and there are a number of other factors that influence the comprehensibility of a text – for example, the frequency of difficult words, the use of multiple negation, etc. – that the formulas don’t take into account.¹ Nevertheless, readability formulas give a comparable indication of the difficulty of different texts.
The readability of various categories of user warnings, based on the SMOG formula
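For readers curious how such a score is computed, here is a minimal sketch of the SMOG grade in Python. The formula itself is the published one (1.0430 × √(polysyllabic words × 30 / sentences) + 3.1291), but the syllable counter is a crude vowel-group heuristic and the sentence splitting is naive; both are assumptions for illustration, and the numbers in this post were not produced with this code.

```python
import math
import re

def count_syllables(word):
    # Rough heuristic (an assumption): each run of vowels is one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog_grade(text):
    """Estimate the years of education needed to understand the text."""
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z]+", text)
    # SMOG only looks at words of three or more syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    # The count is scaled to a 30-sentence sample, as in the original formula.
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291

print(round(smog_grade("The cat sat. The dog ran."), 2))
```

A text with no long words at all bottoms out at the constant term of the formula, which is why even the simplest message scores a little over three years.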

The results show that on average it would take an American student 12 years of study (i.e. graduating high school) to understand these warning messages. This level seems appropriate for an encyclopedia.²


The averages, however, mask the outliers. The least readable message in the sample, the notice people get when they are blocked to enforce a decision by the English Wikipedia’s arbitration committee, would take about 18 and a half years of education to understand on first reading. The runners-up include some of the more commonly appearing templates, which warn users that their article has been nominated for deletion or breaches copyright.



 
| Purpose | SMOG index (years of education needed to understand text) |
|---|---|
| Block to enforce arbitration decision | 18.49 |
| Warning that the user has added copyrighted material | 17.23 |
| Warning that the user has added a link to copyrighted material | 16.86 |
| User's article is proposed for deletion | 16.64 |
| Final warning that the user not remove maintenance templates | 16.42 |
| User's article nominated for deletion | 15.45 |
| User's article proposed for deletion | 15.42 |
| User blocked for advertising or self-promoting | 15.25 |
| User's article speedily deleted for spam | 14.75 |
| Warning that the user not assume ownership of articles | 14.62 |

In conclusion, the warning messages aren’t unreasonably unreadable on average, although the various deletion notices, and especially the ones concerned with copyright, are written in a way that is too difficult for the average user to understand. At this point it is only a hunch that the most commonly used messages are among the most difficult to comprehend.

¹ Studies have confirmed that including other factors in the formula adds more work than it improves the results. [1]
² According to the UNU-Merit user survey, 88% of the users have finished secondary education. [2]

Tuesday, May 10, 2011

A bit more on user talk pages

Building on my previous post, where I looked at the tone of discussions on Wikipedia users' talk pages, especially those of new users, today I looked at a couple of other languages to see if there are any interesting trends.

I looked at the talk pages of 30 recently registered users from April on each of the Croatian, Serbian, Russian and English Wikipedias. Of course, none of the samples was very representative, as the sizes of the Wikipedias differ: on some it takes days, on others only minutes, until 30 new users register. Therefore, the numbers should be taken with a grain of salt, while the overall trends should be about right.
Colourful welcome message on the Russian Wikipedia. There is also a more text-heavy black and white version.

The three smaller Wikipedias (and the previously examined Hungarian one) had in common the practice of placing a welcome template message on users' pages following their first edits, even if nobody had any other comment (praise or correction) to offer (about 28-29 people in the samples received some form of welcome template). The welcome messages are sometimes (in 6-30% of cases) followed by warnings that are somewhat specific to a given Wikipedia.

Serbian welcome message, with a warm invitation at the end that looks personal.
What was interesting was the common warning (4 times in the sample) on the Croatian Wikipedia that the user write in Croatian (given the similarity of the Serbo-Croatian languages, I cannot judge whether the warning was justified, but it can't be a positive experience to be told that you are not speaking the right language or the language right), and that 4 out of the 30 people were indefinitely blocked for unproductive editing (without the ability to see deleted edits I cannot judge these blocks, but their harshness and, in some cases, the lack of warning were striking).
A typical English Wikipedia talk page with a welcome and a number of deletion notices.
When I turned to the English Wikipedia the image was slightly different. The talk pages suddenly look like minefields dotted with danger signs. Only 55% of the users received a welcome message preceding a notice that their article was deleted or their contribution reverted (about 85% of the sample received some kind of warning).

Given the high proportion of users faced with the warning sign messages as the first feedback they get from Wikipedia, it might be worthwhile to consider making them more user friendly. One good step would be to make them easier to understand by simply rewriting them in Plain English (the grammar could be simplified, insider jargon like "tag", "under criteria A7", "userfy" should be removed, etc.). 

An interesting follow-up study would be to see what effect welcome messages, or the lack of them, have on new users' behaviour.

Saturday, May 7, 2011

Tone of talk page discussions

The Community Department at the Wikimedia Foundation has been running a number of small scale studies on the English Wikipedia in preparation for a more in-depth study during the summer.
English Wikipedia. (CC By-Sa: Steven Walling)

One of the things they have looked into was the tone of messages left on new editors' talk pages. Their findings show that the ratio of messages with a negative tone and sometimes scary imagery (usually red stop signs) has been on the increase, while messages of praise have shown a stark decline since around 2007.

To see if the situation is similar on the Hungarian Wikipedia, I looked at its user discussion pages. Without diving into copies of the database that contain every single historical edit, I concentrated on edits in April-May 2011.


First I looked at the 100 most recent edits on user talk pages, which included experienced editors – indeed, a lot of the discussion was between experienced editors. I tried to partition the edits by tone into positive, negative and neutral, but apart from the negative ones this is usually quite difficult, and the line between positive and neutral is a matter of subjective judgement (as a rule of thumb, anything that included a thank you or the default welcome template went into the positive bucket).

After doing this, I realized that I should have concentrated on messages left for new editors, so I looked at the talk pages of 30 people who registered in April on the Hungarian Wikipedia.

The results weren't too exciting as there wasn't much interaction happening with new users. The majority received only the standard welcome message on their talk page; only two of the pages showed extensive discussion (indicating that the user has become quite active already).

A good sign is that the Hungarian Wikipedia doesn't really use scary images in templates, except in the cases of copyright violations and the Wikipedia puzzle piece in warnings about articles that are too short and that will therefore be deleted.

Thus, the situation seems to be better on the Hungarian Wikipedia than on the English Wikipedia. Unfortunately, this means that other explanations are needed for why the retention and "conversion rate" of new editors on the Hungarian Wikipedia are very low.

Sunday, May 9, 2010

Tuesday, March 16, 2010

Combating link rot on Wikipedia

One of the main principles of Wikipedia is verifiability, the idea that any fact you find in an article can also be found in a reliable external source (that's why there are so many footnotes in any given Wikipedia article). These external sources can be either offline paper products or, more often than not, online web pages. Unfortunately, web pages often change or become unavailable, a process nicknamed link rot, which undermines verifiability.

One way to combat link rot and to ensure that a reader can always find the sources used to make up a Wikipedia article is to rely on online archiving services such as the Internet Archive or WebCite. The solution to the problem is to submit each linked web page to the archives' attention to make sure they will have a copy of the referenced webpages in the eventuality that they become unavailable.

There is no automatic way to submit all links on a Wikipedia to an archive, and different projects have come up with different solutions. The English Wikipedia used to send every new link added to its articles to the WebCite archive (to the point that said archive had to increase server capacity). The French Wikipedia has devised a way to link to an archived version of linked pages at the Wikiwix search engine, but I don't know the particulars.

So far the Hungarian Wikipedia doesn't have a systematic way of eliminating dead external links. As a first step in the right direction I slightly modified a component of the Pywikipedia framework to go through every single page in the Hungarian Wikipedia and send every external link to the WebCite archive. The method was inefficient because I am not a programmer and both Python and the WebCite website often crashed. (The ideal program would have used the external links database dump that contains only the links without the irrelevant article text.)
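The general idea, stripped of the Pywikipedia specifics, can be sketched roughly as follows. This is not the modified script itself: the URL pattern is a simplification, and `submit` stands in for whatever function actually hands a URL to the archive (a hypothetical callable, since the archive's interface is not described here).

```python
import re

# Simplified pattern (an assumption): a URL runs until whitespace or a
# character that normally ends a link in wikitext.
URL_PATTERN = re.compile(r'https?://[^\s\[\]<>|}"]+')

def external_links(wikitext):
    """Return the external URLs in a page's wikitext, without duplicates."""
    seen = set()
    links = []
    for url in URL_PATTERN.findall(wikitext):
        url = url.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links

def archive_all(pages, submit):
    """Send every distinct link in pages to submit; return the failures.

    pages is an iterable of (title, wikitext) pairs; submit is a
    hypothetical callable that raises an exception when archiving fails.
    """
    done = set()
    failed = []
    for title, text in pages:
        for url in external_links(text):
            if url in done:
                continue  # already handled on an earlier page
            done.add(url)
            try:
                submit(url)
            except Exception:
                # Keep going, so one crash does not stop the whole run.
                failed.append((title, url))
    return failed
```

Collecting the failures instead of stopping at the first error makes it possible to retry only the links that did not go through.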

As a result of my efforts, the vast majority of the external web pages that were linked from the Hungarian Wikipedia and were alive at the end of 2009 can now be found in the WebCite archive. (Such as this copy of the Nobel prize website.) I will run my program periodically to include new links added to articles.

The logical extension of my work would be to add links to the archived versions next to the original links once a page dies. This could be done either manually or automatically; however, I have neither the expertise nor the time to make this happen.

Sunday, June 7, 2009

Queen's Champion

The office of the Queen's Champion is an important hereditary office in the United Kingdom that apparently dates back to 1066. The duties to be performed in exchange for the 12 km² Manor of Scrivelsby are not manifold, but all the more dangerous. Until the coronation of George IV in 1821 the champion's duty was to challenge to a duel those who would not accept the new monarch.
At the coronation banquet he would throw down his gauntlet three times and a herald would issue a challenge along the following lines:
If any person, of whatever degree soever, high or low, shall deny or gainsay our Sovereign Lord George, King of the United Kingdom of Great Britain and Ireland, Defender of the Faith, son and next heir unto our Sovereign Lord the last King deceased, to be the right heir to the imperial Crown of this realm of Great Britain and Ireland, or that he ought not to enjoy the same; here is his Champion, who saith that he lieth, and is a false traitor, being ready in person to combat with him, and in this quarrel will adventure his life against him on what day soever he shall be appointed.
The champion was loaned the second best horse in the Royal Mews and a suit of armour, which was his to keep if anyone took up the challenge and the champion won; otherwise he would get a cup from which the sovereign had drunk the champion's health.
There are no certain records that would show that anyone accepted the challenge, though there are some rumours about different Jacobites doing so.
After George IV the tradition of holding a coronation banquet in Westminster Hall (the building of the Houses of Parliament) was abandoned and thus the life of the champion became simpler, until the 20th century. In 1902 the then champion petitioned the Court of Claims -- the special court set up at every coronation to decide on who gets to perform what service at the coronation -- and since then his duty has been to carry the Royal Standard at the coronation.

Find out more on Wikipedia; the painting comes from this website. A nice way to learn about chivalric traditions and the way a proper challenge was accepted and fought out is to read the Song of Roland from the eleventh century.

Friday, May 29, 2009

Clandestine outlawries

I am a big admirer of long-kept traditions and I am always happy to see one survive or flourish. Thus, I was happy to discover a British Parliamentary tradition that has been kept for over three hundred years and has been exported to other Commonwealth countries as well.

When a new session of Parliament is opened, the Queen (or her representative) makes a speech from the throne in the upper house of Parliament. (According to tradition she is not given entry to the House of Commons.) After the speech is read, both chambers of Parliament demonstrate that the Queen is in no position to set the agenda of debate, so in defiance they introduce a bill for a first reading (which means they first read the title of the bill and then decide whether to discuss it further in committees). For the last three hundred or so years this bill has been the same in the United Kingdom: in the House of Commons it is “A Bill for the more effectual preventing clandestine Outlawries” and in the House of Lords “A bill for the better regulating of Select Vestries”.

The Outlawries Bill basically sets up measures to prevent people from declaring their fellows “outlaws” in secret and also has some extra penalties for sheriffs doing this.

The Select Vestries Bill deals with the rights of “select vestries” to administer poor law.

In Canada the bills are titled “An Act respecting the Administration of Oaths of Office” and “An Act relating to Railways”. It is worthwhile to read the actual texts of these two bills, which were printed, maybe for the first time ever, in 2009. A good indication of the serious thought behind these pro forma bills is that the text stops after a single short clause reading:

This bill asserts the right of the Senate to give precedence to matters not addressed in the Speech from the Throne.

After this pro forma bill, as far as I can tell from the Hansard records I’ve seen online, the Speaker informs the members that he has obtained the Queen’s speech “for greater accuracy” and then a member moves to present an humble address to the Queen along the lines of:

Most Gracious Sovereign—We, Your Majesty’s most dutiful and loyal subjects, the Lords Spiritual and Temporal in Parliament assembled, beg leave to thank Your Majesty for the most gracious Speech which Your Majesty has addressed to both Houses of Parliament

After some long speeches by the mover and the seconder of this address, the actual work of Parliament begins.

Thursday, October 23, 2008

New search engine for Wikipedia

The English Wikipedia has a new built-in search engine, which is purely awesome. The changes are subtle but very useful. There are some behind the scenes improvements in the quality of the results but the big change is that searches now return results from the sister projects as well.

For example, if one searches for "good offices" on Wikipedia, one discovers that there is no such article, yet immediately sees a link to Wiktionary (a dictionary) that gives the definition for this term:
The beneficial services and acts of a third party; especially when used to mediate between people in a dispute
With over 2.5 million articles it is quite difficult to find something missing from Wikipedia, but if you do find such a thing now there's a chance you won't be left unsatisfied.

Another example: if you search for something that already has an article, e.g. "Bill Clinton", you immediately receive links to some of his speeches, best quotes, and most recent news appearances.

I can hardly wait for the Hungarian Wikipedia to be migrated to this new system as this might be the very best thing that will have happened to the sister projects in a long time: they will receive greater exposure, possibly encouraging more people to contribute and the readers will have easier access to more information.

[Update]: The new system has been enabled for all Wikimedia wikis; apparently a lack of RAM was the bottleneck preventing this from happening earlier. The system could use a little more polishing: e.g. instead of displaying the meaningless "hu.wikisource.org" as the location of the alternative search results, it could simply say "Wikiforrás" ('Wikisource' in Hungarian).

Thursday, August 14, 2008

Wikipedia manifesto

The National Library of Australia has digitised and made publicly available the newspapers published in Australia that are in the public domain. The address printed in the first issue of The Sydney Gazette and New South Wales Advertiser could serve as a slogan or manifesto of what Wikipedia is all about. I reproduce it here (please change any reference to the colonial newspaper to a free encyclopaedia or similar, as appropriate):
ADDRESS.
Innumerable as the Obstacles were which
threatened to oppose our Undertaking, yet
we are happy to affirm that they were not
insurmountable, however difficult the task
before us.
The utility of a PAPER in the COLONY,
as it must open a source of solid information,
will, we hope, be universally felt and ac-
knowledged. We have courted the assistance
of the INGENIOUS and INTELLIGENT : -
We open no channel to Political Discussion, or
Personal Animadversion :--Information is
our only Purpose ; that accomplished, we
shall consider that we have done our duty, in
an exertion to merit the Approbation of the
PUBLIC, and to secure a liberal Patronage to
the SYDNEY GAZETTE.

Saturday, June 28, 2008

Illegal to die in the Houses of Parliament?

I have spent the last couple of hours trying to track down the law that states that it is illegal to die in the Houses of Parliament. I heard about it from a friend, and a search on Wikipedia didn't yield any verifiable results.

There are three mentions of this in Wikipedia. Two of them link to newspaper articles, one stating that this law has been voted the most ridiculous, the other that the practice is to record St. Thomas' Hospital as the place of death in case anybody breaks the law and dies there. Neither of these articles provides the source, or the actual name of the law that would state this, and I have not found it in either of the online UK law databases I have checked. Nor could I find the law that would say that those dying in a royal palace have to receive a state funeral; the closest thing was the Coroners Act 1988, which states that inquests into the deaths of persons lying inside one of the Queen's palaces are held by the Queen's coroner. Alas, no mention of a state funeral.

The third mention is in the article about Spencer Perceval, the only British prime minister to have been assassinated. While it seems to misquote his last words (according to the article, "I am murdered"; according to the 10 Downing Street website, "Oh, I have been murdered"), it states without providing a source that it is only illegal to die in the House of Lords.

Thus I have to think that this is probably an urban legend, though quite an interesting one nevertheless. During my quest I did find the law that states that whales belong to the King.

Friday, April 20, 2007

Are you blogging this?

It was time to upload this blog of mine as well, though it doesn't have such good standing in search results as my Hungarian one does (big fish in a small pond, or more likely the links to it from my Wikipedia userpage and NCurse's Hungarian blog help).
Anyway, I plan on updating this blog as well over time... until then enjoy this video of Web2.0 websites; to mention just one part of it, the singer totally shares my approach to the unknown: look it up on Wikipedia :) .