Tuesday, October 14, 2014

And now for something completely different...

As always in October, lectures are starting again. Like every year, I will give a lecture on Semantic Web Technologies. BTW, I have realized that I have now been giving courses on the Semantic Web for almost 10 years. It all started as a seminar at the Friedrich-Schiller Universität back in Jena and became a fully fledged lecture here at the HPI in Potsdam. As with last winter semester's lecture, almost all sessions have been recorded and are available online at either tele-Task or yovisto.

Moreover, we have also prepared two MOOCs, Semantic Web Technologies in spring 2013 and Knowledge Engineering with Semantic Web Technologies in spring 2014, both very successful with thousands of students.

This semester, I have decided not to do the very same thing all over again and to try out something completely different...

Have you ever heard of the Flipped Classroom concept? This semester, we are going to turn the lecture situation around for the students. All the lecture content has already been recorded. Thus, students can prepare for each lecture at home by watching the videos and studying the handouts as well as the course materials. Then, in the classroom, I will not present the content again, but we are going to discuss

  • everything which needs more attention according to the students,
  • everything that the students did not quite understand,
  • including all problems, errors, and additions that seem to be important.

Thus, to follow the (live) lecture, the students have to prepare accordingly. Of course, this will only work with the active participation of the students. On the other hand, it will also be more challenging for the lecturer and the tutors, because we have to be very well prepared to deal with all kinds of potential questions and problems. Of course, we will always work out solutions and answers together with the students. And it will also be the students who take the lead, under the lecturer's guidance of course.

I'm very curious whether this concept will work out well with my lecture here at HPI. Please keep your fingers crossed and I will keep you posted.

Tuesday, September 02, 2014

More intermediate Results from our Fact Ranking Challenge

Fact Ranking Challenge
Here is another update on the data gathered so far in our fact ranking experiment.

We started the original challenge about 5 weeks ago and are now able to present some more intermediate results [1]. Nevertheless, the challenge is still running. Therefore, please distribute, participate, advertise, and help us to generate a fully fledged ground truth for fact ranking [2]. All data will be made publicly available for further research.

Determining the importance of a fact is essential if you want to properly understand the content of information. Usually, there is a rich variety of possible interpretations of information. To determine the proper interpretation, you use the context, i.e. further available information. So, the question develops from "what is important?" to "what is important with regard to this specific context?".

Current Intermediate Statistics (from Aug 28):

Number of users who participated: 388

Sum of concepts done: 1736

446 unique concepts are covered (out of 541). 

Average concepts done per user: 4.47

CONCEPTS DONE:
0 concepts were done by 79 users. 
1 concepts were done by 79 users. 
2 concepts were done by 68 users. 
3 concepts were done by 41 users. 
4 concepts were done by 25 users. 
5 concepts were done by 22 users. 
6 concepts were done by 8 users. 
7 concepts were done by 7 users. 
8 concepts were done by 9 users. 
9 concepts were done by 6 users. 
10 concepts were done by 8 users. 
11 concepts were done by 4 users. 
12 concepts were done by 2 users. 
13 concepts were done by 1 users. 
14 concepts were done by 4 users. 
15 concepts were done by 3 users. 
16 concepts were done by 2 users. 
19 concepts were done by 1 users. 
20 concepts were done by 3 users. 
21 concepts were done by 2 users. 
22 concepts were done by 1 users. 
23 concepts were done by 1 users. 
24 concepts were done by 2 users. 
25 concepts were done by 1 users. 
31 concepts were done by 1 users. 
38 concepts were done by 2 users. 
40 concepts were done by 1 users. 
42 concepts were done by 1 users. 
58 concepts were done by 1 users. 
59 concepts were done by 1 users. 
62 concepts were done by 1 users. 
64 concepts were done by 1 users. 
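
As a quick cross-check, the average given above follows directly from this histogram. Here is a minimal Python sketch; the dictionary simply transcribes the list above, and the variable names are my own choice:

```python
# Histogram transcribed from the list above: concepts done -> number of users.
concepts_done = {
    0: 79, 1: 79, 2: 68, 3: 41, 4: 25, 5: 22, 6: 8, 7: 7, 8: 9, 9: 6,
    10: 8, 11: 4, 12: 2, 13: 1, 14: 4, 15: 3, 16: 2, 19: 1, 20: 3,
    21: 2, 22: 1, 23: 1, 24: 2, 25: 1, 31: 1, 38: 2, 40: 1, 42: 1,
    58: 1, 59: 1, 62: 1, 64: 1,
}

# Number of participants and total number of processed concepts.
total_users = sum(concepts_done.values())
total_concepts = sum(n * users for n, users in concepts_done.items())

# Mean number of concepts per user.
average = total_concepts / total_users

print(total_users, total_concepts, round(average, 2))  # 388 1736 4.47
```

The three numbers match the user count, the sum of concepts done, and the average reported above.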

EDUCATION:
highschool : 41 users.
phd : 57 users.
other : 22 users.
bachelors : 93 users.
masters : 174 users.

AGE:
33+ :  216 users.
19-25 :  68 users.
26-32 :  98 users.
0-18 :  5 users.

GENDER:
female :  90 users.
male :  297 users.

COUNTRY OF ORIGIN:
Angola :  1 users.
Belarus :  1 users.
Portugal :  4 users.
Philippines :  2 users.
Morocco :  3 users.
Greece :  5 users.
Ukraine :  3 users.
Indonesia :  3 users.
Sri Lanka :  1 users.
Italy :  13 users.
Iraq :  1 users.
India :  40 users.
France :  11 users.
Latvia :  1 users.
Pakistan :  3 users.
Syrian Arab Republic :  1 users.
Montenegro :  1 users.
Armenia :  1 users.
Mexico :  2 users.
Brazil :  10 users.
Venezuela :  1 users.
Croatia :  2 users.
Macedonia, The Former Yugoslav Republic of :  1 users.
Romania :  2 users.
Western Sahara :  1 users.
Algeria :  4 users.
Sweden :  2 users.
United States :  21 users.
Serbia :  8 users.
Nigeria :  2 users.
Estonia :  1 users.
Spain :  8 users.
Taiwan, Republic of China :  2 users.
Ireland :  1 users.
Israel :  1 users.
Russian Federation :  9 users.
Colombia :  3 users.
Switzerland :  1 users.
Azerbaijan :  1 users.
Kenya :  2 users.
Yemen :  1 users.
Malaysia :  2 users.
Viet Nam :  1 users.
Australia :  4 users.
Peru :  1 users.
Albania :  1 users.
South Africa :  2 users.
Netherlands :  8 users.
China :  2 users.
Somalia :  1 users.
Slovenia :  1 users.
Finland :  3 users.
Lithuania :  1 users.
Austria :  6 users.
Sudan :  1 users.
United Kingdom :  15 users.
Egypt :  2 users.
Bahamas :  1 users.
Hungary :  1 users.
Poland :  4 users.
Iran, Islamic Republic of :  2 users.
Bulgaria :  3 users.
Norway :  1 users.
Germany :  140 users.
New Zealand :  3 users.

COUNTRY OF RESIDENCE:
United Arab Emirates :  1 users.
Belarus :  1 users.
Portugal :  2 users.
Philippines :  1 users.
Morocco :  2 users.
Greece :  3 users.
Ukraine :  1 users.
Indonesia :  2 users.
Luxembourg :  1 users.
Sri Lanka :  1 users.
Italy :  11 users.
India :  31 users.
France :  13 users.
Jordan :  1 users.
Denmark :  1 users.
Latvia :  1 users.
Pakistan :  3 users.
Oman :  1 users.
Turkey :  1 users.
Czech Republic :  1 users.
Armenia :  1 users.
Canada :  3 users.
Brazil :  9 users.
Croatia :  1 users.
Romania :  3 users.
Algeria :  4 users.
Sweden :  3 users.
United States :  23 users.
Serbia :  4 users.
Nigeria :  1 users.
Saudi Arabia :  1 users.
Estonia :  2 users.
Spain :  7 users.
Taiwan, Republic of China :  1 users.
Ireland :  3 users.
Israel :  2 users.
Russian Federation :  2 users.
Colombia :  2 users.
Switzerland :  7 users.
Azerbaijan :  2 users.
Kenya :  2 users.
Norfolk Island :  1 users.
Yemen :  1 users.
Malaysia :  2 users.
Australia :  6 users.
Peru :  1 users.
Albania :  1 users.
South Africa :  1 users.
Netherlands :  12 users.
Somalia :  1 users.
Slovenia :  1 users.
Gambia :  1 users.
Finland :  3 users.
Lithuania :  1 users.
Austria :  5 users.
United Kingdom :  17 users.
Egypt :  1 users.
Bahamas :  1 users.
Belgium :  2 users.
Poland :  5 users.
Singapore :  1 users.
Iran, Islamic Republic of :  1 users.
Bulgaria :  3 users.
Norway :  1 users.
Germany :  154 users.
New Zealand :  2 users.

Overall confidence of users about the seen concepts: 2.687

NO. OF USERS PER CONCEPT:
On AVERAGE there are 2.18 users per concept.

NO. OF ANSWERS PER CONCEPT (STEP 1):
On AVERAGE there are 4.71 answers per concept.

NONSENSE STATEMENTS:

TOTAL number of nonsense sentences = 1371


Hint: You might wonder about the impressively high scores at the top of the list? Well, actually points are given exponentially, i.e. the longer you play, the more points you will score per processed concept.

References:
[1] Help Us with a Research Problem, July 30, 2014
[2] Fact Ranking Web-Application, http://s16a.org/fr/

Tuesday, August 19, 2014

The Importance of Relevance - Intermediate Results

Current Highscore List of our fact ranking challenge
In my last post, we invited you to take part in our research challenge, which was about creating a ground truth for fact ranking algorithms. Determining the importance of a fact is essential if you want to properly understand the content of information. Usually, there is a rich variety of possible interpretations of information. To determine the proper interpretation, you use the context, i.e. further available information. So, the question develops from "what is important?" to "what is important with regard to this specific context?".

We started the original challenge about 3 weeks ago and are now able to present the first intermediate results [1]. Nevertheless, the challenge is still running. Therefore, please distribute, participate, advertise, and help us to generate a fully fledged ground truth for fact ranking [2]. All data will be made publicly available for further research.

Current Intermediate Statistics:
Number of users who participated: 110 (Thanks to all of you!!!)
Number of overall processed concepts: 509
Overall 200 unique concepts are covered (out of 541).

Average concepts processed per user: 4.63

Detailed number of processed concepts per user:
0 concepts were done by 15 users.
1 concepts were done by 27 users.
2 concepts were done by 21 users.
3 concepts were done by 12 users.
4 concepts were done by 7 users.
5 concepts were done by 8 users.
6 concepts were done by 2 users.
7 concepts were done by 1 users.
8 concepts were done by 4 users.
9 concepts were done by 1 users.
10 concepts were done by 2 users.
11 concepts were done by 2 users.
14 concepts were done by 1 users.
16 concepts were done by 1 users.
20 concepts were done by 1 users.
22 concepts were done by 1 users.
25 concepts were done by 1 users.
31 concepts were done by 1 users.
53 concepts were done by 2 users.

Participant statistics:

EDUCATION:
highschool : 7 users.
bachelors : 28 users.
masters : 47 users.
phd : 23 users.
other : 5 users. 

AGE:
33+ : 43 users.
26-32 : 34 users.
19-25 : 33 users.

GENDER:
female : 25 users.
male : 85 users.

COUNTRY OF ORIGIN:
United States : 9 users.
Serbia : 6 users.
Spain : 2 users.
Ukraine : 1 users.
Russian Federation : 3 users.
Colombia : 1 users.
Italy : 6 users.
India : 2 users.
France : 3 users.
Malaysia : 1 users.
Australia : 1 users.
Albania : 1 users.
China : 1 users.
Pakistan : 1 users.
Finland : 1 users.
Austria : 2 users.
Montenegro : 1 users.
United Kingdom : 8 users.
Brazil : 3 users.
Poland : 2 users.
Iran, Islamic Republic of : 1 users.
Macedonia, The Former Yugoslav Republic of : 1 users.
Croatia : 1 users.
Germany : 49 users.
Algeria : 1 users.
New Zealand : 1 users.
Sweden : 1 users.

Overall confidence of users about the seen concepts: 2.585

NO. OF USERS PER CONCEPT:
On AVERAGE there are 1.295 users per concept.

NO. OF ANSWERS PER CONCEPT (STEP 1):
On AVERAGE there are 4.825 answers per concept.

We will keep you posted about the results.
Please distribute, participate, advertise, and help us to generate a fully fledged ground truth for fact ranking.

Hint: You might wonder about the impressively high scores at the top of the list? Well, actually points are given exponentially, i.e. the longer you play, the more points you will score per processed concept.

References:
[1] Help Us with a Research Problem, July 30, 2014
[2] Fact Ranking Web-Application, http://s16a.org/fr/

Wednesday, July 30, 2014

Help Us with a Research Problem

As you might know, we have previously tried to let the public participate in our research. Last time, we developed several games (with a purpose). This time, unfortunately, it is not a game, simply because developing a good game is really expensive. But let's get to the point: what is the task that you can help us with?

You know, my research group is working on semantic technologies. Semantics in that sense means that we are trying to (automatically) understand what information (or data) is all about and what it means. Sometimes, information is ambiguous. This makes it difficult to understand, because you have to resolve ambiguities with the help of context.

On the other hand, sometimes you have many different pieces of information about a subject. How do you determine which piece of information or fact is more important or relevant than another? Just a quick example. Let's assume we have the following two facts:

(1) Albert Einstein is a physicist.
(2) Albert Einstein is a vegetarian.

Which of the two facts is more important or relevant? Yes, this is difficult to answer, simply because the truth often lies in the eye of the beholder. For a vegetarian, maybe the second fact is more important. But, what about the most common opinion? What would the mainstream think? Probably, most people would say that fact (1) in general is more important.

So, what we are doing is developing heuristics that determine the importance of facts (relative to other facts). To get an idea of the quality of our heuristics, we have to do an evaluation, i.e. somebody has to decide whether the decision of the heuristics was right or wrong. Unfortunately, no ground truth exists for this task, which is called "fact ranking". Therefore, we are about to create a new ground truth that will later be publicly available and open to all researchers.

This ground truth is created with the little 'voting' application that you will find here [1]. You just have to register with the tool, and then the task will be explained to you in detail. We took 500 popular concepts from Wikipedia, and you have (1) to think of the most important facts about these concepts that come to your mind and then (2) to rate the (new) facts presented to you according to their relevance. There is no right or wrong answer. Just vote as seems right to you. Afterwards, we will aggregate the votes of all participants to determine the general (mainstream) relevance of the presented facts.

You may interrupt your rating of the presented facts at any time and continue later. To make it a bit more interesting, you can also score points, and of course there is a highscore list. We would really appreciate your help with this task. Please also spread the word. The more participants, the more valid our ground truth will be.

We know that this is a difficult and sometimes rather boring task. All the more, we would be really grateful for your assistance!

[1] Fact Ranking Web-Application, http://s16a.org/fr/

...in Times of War

For us Western Europeans, war always seems to be far, far away in some other country (or in some distant past). Usually, we read about it in the newspapers or see the pictures in the media. But we are not directly affected. This also includes me as a researcher. Of course, we also have students at our institute who come from countries or regions in crisis. But they are here and the crisis is there... elsewhere.

As you might know, I recently finished my OpenHPI online course 'Knowledge Engineering with Semantic Web Technologies'. The course was rather popular with a total of 4,623 enrolled students from all over the world. 611 students took part in the final course exam and 450 students have finished the course successfully (yeah!!).

Of course, the means of interaction with the students are limited in an online course. You follow the stream of discussions on the OpenHPI platform, answering a question now and then. Sometimes you also receive an email from one of the course participants...

Today, I received an email from a course participant in Gaza, Palestine. He wrote about his appreciation for the OpenHPI team offering courses like this and about the projects he carried out during his university studies. Unfortunately, as he wrote, due to the current situation in Gaza, infrastructure has been destroyed, leading to power outages as well as network failures. Of course, this makes it difficult, if not next to impossible, to continue the course (not to mention all the other major problems that arise for the people from this conflict). I am deeply impressed that in a situation like this, people still continue their efforts to invest in their education...and their future.

And yes, war has finally also knocked on the door of our small island of the fortunate...

Wednesday, July 16, 2014

New DBpedia Graph Statistics

Recently, we have been working on the DBpedia / Wikipedia page link dataset. We considered the English and the German language versions for this project. The current DBpedia 3.9 page link datasets for English and German represent 18 million and 6 million entities, respectively. But the original DBpedia only contains about 4 million and 1 million distinct entities for the English and German versions. This significant difference arises mainly because the current DBpedia page link dataset includes redirect pages as well as page links to resources that are not considered entities (e.g. thumbnails and other images). So we decided to clean up the DBpedia page link dataset before computing statistical parameters (e.g. PageRank or HITS). For the cleanup, we removed all unnecessary and redundant RDF triples from the page link dataset, i.e. all redirect pages (redirect pages are just URIs that automatically forward a user to another Wikipedia page, but do not represent entities) as well as RDF triples representing resources that do not have their own rdfs:label (as per the DBpedia documentation, every entity has an rdfs:label).
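
To make the cleanup rule a bit more concrete, here is a minimal Python sketch. It assumes that the set of redirect URIs and the set of URIs with an rdfs:label have already been extracted and fit into memory, and that a link is kept only if both its source and its target pass the check. The function name and the toy URIs are hypothetical and not taken from our actual code.

```python
def clean_page_links(page_links, redirects, labeled):
    """Keep only links whose source and target both look like real entities.

    page_links: iterable of (source_uri, target_uri) pairs
    redirects:  set of URIs that are redirect pages (assumed precomputed)
    labeled:    set of URIs that carry an rdfs:label (assumed precomputed)
    """
    def is_entity(uri):
        # An entity has a label and is not a redirect page.
        return uri in labeled and uri not in redirects

    return [(s, t) for s, t in page_links if is_entity(s) and is_entity(t)]


# Hypothetical toy data, not taken from the real dumps.
links = [
    ("dbr:Jena", "dbr:Germany"),
    ("dbr:Jena_(city)", "dbr:Jena"),      # source is a redirect page
    ("dbr:Jena", "dbr:File:Jena.jpg"),    # target has no rdfs:label
]
redirects = {"dbr:Jena_(city)"}
labeled = {"dbr:Jena", "dbr:Germany", "dbr:Jena_(city)"}

cleaned = clean_page_links(links, redirects, labeled)
print(cleaned)
```

For the real dumps with millions of triples, one would stream the files rather than hold lists in memory, but the filter condition stays the same.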

One of the benefits of the cleaned-up page link dataset is the faster computation of statistical graph measures (while not influencing the overall statistics, i.e. redirect pages usually don't have incoming links and the other removed resources (e.g. images) don't have outgoing links). Based on this dataset, we have computed PageRank, Hubs and Authorities (HITS), page inlink counts, and page outlink counts. Please find the details of the datasets here on our research group's webpage [1].

For the computation of the DBpedia graph statistics, we have used JUNG, the Java Universal Network/Graph Framework. Please find the source code for the PageRank and HITS computation here via GitHub [2].
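
For readers who just want the idea behind one of these measures, here is a minimal PageRank power iteration in plain Python. It is an illustrative sketch and not the JUNG-based code from [2]; the damping factor of 0.85, the fixed number of iterations, and the toy graph are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each node to the list of nodes it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank mass of nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in links.items():
            if targets:
                # Each outgoing link passes on an equal share of the rank.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: C is linked from A, B and D.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph, node D has no incoming links at all, which mirrors the observation above that redirect pages usually do not receive any rank from other pages.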

References and further Reading:
[1] New PageRank Computations for DBpedia 3.9 (English/German) at SemanticMultimedia
[2] Source code for DBpedia Graph Statistics

Monday, July 14, 2014

Harald's Original Miscellany - More Truth about Football - Part 4

Finally, Germany has won the 2014 Soccer World Cup. Therefore, our little statistics on soccer will also come to an end with today's post. You might ask yourself what kind of data is left for soccer players in Wikipedia and DBpedia. Well, unfortunately only a little. But we will try to make something out of it. Last time, we asked about the number of team changes and how it correlates with popularity and goals scored for soccer players. What is left, if we look at the available data?

We have the data about the years in which the football players were active or have played in their national soccer team. Let's start with the national team years [1]:

nationalyears NumPlayers
8 3
7 20
6 85
5 371
4 1070
3 2479
2 6116
1 28211

Well, as expected, most players have only one or two years in the national team. But there are exceptional players who reached as many as 8 years. Who are these long-term players? [2]:

nationalyears Player Team
8 Wojciech Łobodziński "Poland"@en
8 Wojciech Łobodziński "Poland Under 16"@en
8 Wojciech Łobodziński "Poland Under 17"@en
8 Wojciech Łobodziński "Poland Under 21"@en
8 Wojciech Łobodziński "Poland Under 18"@en
8 Santiago Cañizares "Spain Under-17"@en
8 Santiago Cañizares "Spain Under-16"@en
8 Santiago Cañizares "Spain Under-21"@en
8 Santiago Cañizares "Spain Under-18"@en
8 Santiago Cañizares "Spain Under-23"@en
8 Santiago Cañizares "Spain Under-19"@en
8 Santiago Cañizares "Spain Under-20"@en
7 Aydın Yılmaz "Turkey Under-21"@en
7 Aydın Yılmaz "Turkey"@en
7 Aydın Yılmaz "Turkey"@en
7 Aydın Yılmaz "Turkey Under-19s"@en
7 Aydın Yılmaz "Turkey Under-17"@en
7 Aydın Yılmaz "Turkey A2"@en
7 Ismael Urzaiz "Spain Under-17"@en
7 Ismael Urzaiz "Spain Under-16"@en

You may never have heard of Poland's Wojciech Łobodziński or Spain's Santiago Cañizares. Here, another flaw in the data becomes visible. There is no such thing as a single national team. We have "under 16", "under 17", "under 18", and so on... So you may start your career as early as 15, and after 8 years you would be 23 and possibly in the "real" national team.

Let's have a look at the active years of players. Unfortunately, here the data is rather messy [3]: 
person From To
http://dbpedia.org/resource/Marta_(footballer) 2000 9223372036854775807
http://dbpedia.org/resource/Carlos_Alberto_Gomes_de_Lima 2006 200720082008
http://dbpedia.org/resource/Blake_Camp 2008 200420052006
http://dbpedia.org/resource/Birgit_Prinz 1998 200320042005
http://dbpedia.org/resource/Dejan_Damjanovi%C4%87 1998 20112012
http://dbpedia.org/resource/Inka_Grings 1995 20092010
http://dbpedia.org/resource/Breno_Silva 2003 20092010
http://dbpedia.org/resource/Vin%C3%ADcius_Calamari 2007 20092010
http://dbpedia.org/resource/Samuel_Jos%C3%A9_da_Silva_Vieira 1994 20082009
http://dbpedia.org/resource/Chad_Marshall 2004 20082009
http://dbpedia.org/resource/Michael_Parkhurst 2003 20072008
http://dbpedia.org/resource/Geison_Rodrigues_Marrote 2004 20072008
http://dbpedia.org/resource/Dejan_Stankovi%C4%87 1994 20062010
http://dbpedia.org/resource/Glauber_Da_Silva 2001 20062007
http://dbpedia.org/resource/Hugo_Sarmiento 1999 20032007
http://dbpedia.org/resource/Obafemi_Martins 2000 20032004
http://dbpedia.org/resource/Heslley_Couto 2005 20032006
http://dbpedia.org/resource/Dalibor_Filipovi%C4%87 1992 20022003
http://dbpedia.org/resource/Dmytro_Zayko 2005 20022004
http://dbpedia.org/resource/Aviv_Volnerman 1998 20012004

Possibly this is because people do not only write single years into the Wikipedia infoboxes, but also time spans and other things. The list above shows only the top 20. Just remove the LIMIT from the SPARQL query, and further down you will find more valid data.
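
Many of the large "To" values above look like several four-digit years glued together into one number (e.g. 200720082008), and 9223372036854775807 is the largest signed 64-bit integer, which hints at an overflow. Assuming this reading is right, a small Python sketch could recover the individual years. The function name and the plausibility window of 1850 to 2100 are my own assumptions.

```python
INT64_MAX = 2**63 - 1  # 9223372036854775807, the value seen for Marta


def split_years(raw):
    """Split a value such as '200720082008' into [2007, 2008, 2008].

    Returns None if the value cannot be read as a run of 4-digit years.
    """
    text = str(raw).strip()
    if not text.isdigit() or len(text) % 4 != 0 or int(text) == INT64_MAX:
        return None
    years = [int(text[i:i + 4]) for i in range(0, len(text), 4)]
    # Assumed plausibility window for active years of football players.
    if all(1850 <= year <= 2100 for year in years):
        return years
    return None


print(split_years("200720082008"))
print(split_years("9223372036854775807"))
```

Taking the last (or the maximum) of the recovered years would then be one possible way to obtain a usable end year.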

OK, let's come to our last problem related to football. Is there a correlation between the height of a player (simply because we have that data) and the number of achieved goals [4]?
Height sumgoals
243.84 1
23.0 59
215.9 1
206.0 0
204.0 0
203.2 24
203.2 47
203.0 0
203.0 0
203.0 51
202.0 19
202.0 0
202.0 15
201.0 20
201.0 0
201.0 0
200.66 11
200.66 48
200.66 0
200.66 0
200.66 0
200.66 124
200.66 146
200.66 0
200.66 1
200.66 16
200.66 0
200.66 102
200.66 6
200.66 0
200.0 0
200.0 20
199.0 0
199.0 0
199.0 78
199.0 47
199.0 0
199.0 62
199.0 32
199.0 84

It is rather difficult to recognize anything in this data. You see heights and the number of goals that players of that height have scored. It is fascinating that there seems to be a significant number of players who are taller than 2 meters. I guess that the list leader with 2.43 meters is just incorrect data. Now, this is so much data that we have to visualize it to recognize something...

Correlation between height of soccer players and scored goals
In the diagram to the left you see the soccer players' height (x-axis) vs. the number of goals scored (y-axis). An interesting thing to notice is that the heights approximately follow a Gaussian distribution, i.e. most players have a "middle" height, and there are only a few at the extremes. Well, there seems to be an exception. On the far left you will notice a large group of players with a height of 1.52m and goal counts ranging from 0 to 300. This is extraordinary, because I have no idea what kind of group this is or whether there is simply an error in the data again. What I noticed is that this group seems to contain a larger fraction of female Asian soccer players. Maybe they are responsible for that large number of outliers, but this requires further investigation.
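
Besides looking at the diagram, one could also put a number on the question by computing a Pearson correlation coefficient. The following Python sketch only uses the first rows of the table above as sample input and drops implausible heights beforehand; the window of 140 to 220 cm is my own assumption. For a meaningful answer, the full query result would be needed.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# First rows of the table above as (height in cm, summed goals).
rows = [(243.84, 1), (23.0, 59), (215.9, 1), (206.0, 0), (204.0, 0),
        (203.2, 24), (203.2, 47), (203.0, 0), (203.0, 0), (203.0, 51)]

# Assumed plausibility window; this drops the 243.84 cm and 23.0 cm entries.
plausible = [(h, g) for h, g in rows if 140.0 <= h <= 220.0]

r = pearson([h for h, _ in plausible], [g for _, g in plausible])
print(len(plausible), r)
```

A value close to 0 would indicate no linear relationship between height and goals, while values close to +1 or -1 would indicate a strong one. The few sample rows here are far too small a basis for any conclusion.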

Finally, I want to add one last table. For each player, there is information on which position she or he plays. Of course, Wikipedia authors are far from any sort of agreement on how to name a player's position. Thus, there is a rather huge variety. Nevertheless, I will leave you with the table to make sense of it. Enjoy [5]:
position number avheight avgoals
midfielder 4450 176.822828644929299 18.077078651685393
defender 3429 181.37523034450506 8.058326042578011
striker 2643 180.169761559433703 65.213015512674991
goalkeeper 2086 186.186064557569709 0.174496644295302
forward 1787 178.865346269495465 46.388919977616116
centre back 786 185.7391350821381 10.651399491094148
winger 648 174.69354896725695 27.583333333333333
attacking midfielder 490 175.897877222177931 38.006122448979592
left back 461 177.661735253323698 6.995661605206074
defensive midfielder 427 179.741452037869347 11.978922716627635
right back 405 177.829456545982828 7.301234567901235
central defender 196 185.005408462213008 8.897959183673469
central midfielder 161 178.223105294363834 20.099378881987578
full back 158 174.659366173080242 6.759493670886076
centre forward 140 179.09685701642717 80.792857142857143
left winger 113 174.790176753449224 28.398230088495575
defender, midfielder 102 179.789412255380659 10.607843137254902
inside forward 96 172.094789346059145 64.083333333333333
defender/midfielder 82 179.178292995545911 11.463414634146341
defender / midfielder 79 180.188100887250282 15.050632911392405
left-back 65 175.642922504131603 7.692307692307692
right winger 62 175.247418803553424 30.096774193548387
centre-back 56 521.327500479561941 9.375
right-back 54 176.101110952871815 6.111111111111111
midfielder/forward 52 172.013075924836657 22.557692307692308
midfield 52 176.982691251314588 30.673076923076923
defender (retired) 50 181.37344055175781 14.48
second striker 48 174.673332850138346 63.854166666666667
full-back 44 175.908408771861666 2.681818181818182
striker / winger 41 178.235609845417295 49.219512195121951

Of course, I limited the table to 30 rows. Interestingly, the average height of goalkeepers is greater than that of midfielders or strikers. As expected, strikers on average score more goals than defenders or goalkeepers.
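
If you want to tame the variety of position names a little, spelling variants such as "left back" and "left-back" can be merged before counting. Here is a minimal Python sketch with a few rows from the table above; the normalization rules are my own guess at what is reasonable and are not part of the original query.

```python
import re
from collections import Counter


def normalize_position(label):
    """Map spelling variants of a position label to one canonical form."""
    text = label.lower()
    text = re.sub(r"\(.*?\)", "", text)          # drop notes such as "(retired)"
    text = text.replace("-", " ")                # "left-back" -> "left back"
    text = re.sub(r"\s*[/,]\s*", " / ", text)    # unify "a/b", "a / b", "a, b"
    return re.sub(r"\s+", " ", text).strip()


# (position, number of players) taken from the table above.
rows = [
    ("centre back", 786), ("centre-back", 56),
    ("left back", 461), ("left-back", 65),
    ("defender, midfielder", 102), ("defender/midfielder", 82),
    ("defender / midfielder", 79),
    ("defender", 3429), ("defender (retired)", 50),
]

merged = Counter()
for position, players in rows:
    merged[normalize_position(position)] += players

print(merged.most_common())
```

Merging the average heights and goals as well would require weighting each variant by its number of players. Whether labels such as "centre back" and "central defender" should also be treated as the same position is a football question that I leave to the experts.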


SPARQL Queries and original data:

Thursday, July 03, 2014

Harald's Original Miscellany - More Truth about Football - Part 3

To change the team means to earn more money... what about the football millionaires? How often do they change the team?
Are you ready for more statistics on your favorite kind of sport? Well, data is fun, and obviously Big Data means Big Fun. There are lots of interesting things to discover while exploring data, and Wikipedia (i.e. DBpedia for the insiders) provides all the necessary means.

Have you ever wondered about this kind of slave trade in professional football? Well, I wouldn't exactly call the transfer of a millionaire to a higher-paying job a 'slave trade'. But have you ever thought about the following question: Do the really good (and well-paid) players change teams more often, or is it the other way round, i.e. do teams try to get rid of players who have a bad season or are on the decline? Who knows? Let's have a look at the data:
TeamChanges NumPlayers
16 2
15 26
14 84
13 287
12 792
11 2247
10 3848
9 5109
8 6464
7 8110
6 9790
5 11264
4 11837
3 11448
2 10961
1 6515
Here, we have a table showing how many players (in Wikipedia) have changed their team how many times [1]. It looks roughly like a Gaussian distribution with a peak between 2 and 6 team switches. OK, what about the players? Where are the top players in this table? Well, David Beckham switched teams 11 times according to Wikipedia, Cristiano Ronaldo 8 times, and Thierry Henry 9 times. At least these numbers are above the typical range, which we had identified to be between 2 and 6. This seems to support our original assumption.
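
To back up the "peak between 2 and 6" reading with numbers, the most frequent value and the weighted mean can be computed directly from the table. A minimal Python sketch follows; the dictionary transcribes the table above, and the variable names are my own.

```python
# Table transcribed from above: number of team changes -> number of players.
team_changes = {
    16: 2, 15: 26, 14: 84, 13: 287, 12: 792, 11: 2247, 10: 3848,
    9: 5109, 8: 6464, 7: 8110, 6: 9790, 5: 11264, 4: 11837,
    3: 11448, 2: 10961, 1: 6515,
}

total_players = sum(team_changes.values())

# Most frequent number of team changes (the peak of the distribution).
mode = max(team_changes, key=team_changes.get)

# Weighted mean: every player contributes his or her number of changes.
mean = sum(c * n for c, n in team_changes.items()) / total_players

print(total_players, mode, round(mean, 2))
```

Both the peak and the mean lie inside the range of 2 to 6 switches, so the 8 to 11 switches of the three players mentioned above are indeed above average.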

person TeamChanges popularity
http://dbpedia.org/resource/Cristiano_Ronaldo 8 1794
http://dbpedia.org/resource/David_Beckham 11 1572
http://dbpedia.org/resource/Thierry_Henry 9 1414
http://dbpedia.org/resource/Lionel_Messi 7 1404
http://dbpedia.org/resource/Wayne_Rooney 5 1343
http://dbpedia.org/resource/Frank_Lampard 5 1188
http://dbpedia.org/resource/Pel%C3%A9 4 1111
http://dbpedia.org/resource/Didier_Drogba 8 1047
http://dbpedia.org/resource/Ronaldo 10 1037
http://dbpedia.org/resource/Michael_Owen 6 1011

But, we get a better overview, if we look at the average popularity of each switching group in the table [3]:
TeamChanges NumPlayers avgindegree
16 2 107.5
15 26 76.769230769230769
14 84 47.297619047619048
13 287 60.885017421602787
12 788 45.073604060913706
11 2235 37.206263982102908
10 3800 32.577631578947368
9 5023 30.939080230937687
8 6332 29.685881238155401
7 7935 25.499054820415879
6 9525 23.188346456692913
5 10886 18.682895462061363
4 11423 14.937844699290904
3 10842 11.131802250507286
2 10089 7.704628803647537
1 5534 5.19588001445609
As we had originally thought, on average the popularity of the players rises with the number of team changes, although even the top group with 16 changes is far from the highest individual popularity scores (e.g. 1572 for David Beckham). Hmm, maybe there is a correlation between the number of goals scored and the number of team changes? Is it more likely that a top goal scorer switches teams more often? Let's have a look [4]:
TeamChanges NumPlayers AvgGoals
16 2 22.5
15 26 28.807692307692308
14 83 37.855421686746988
13 287 38.062717770034843
12 781 39.939820742637644
11 2205 38.625850340136054
10 3784 35.646141649048626
9 4980 32.826907630522088
8 6296 29.489517153748412
7 7826 24.455788397648863
6 9314 21.937835516426884
5 10471 18.655142775284118
4 10627 16.317869577491296
3 9612 12.756450270495214
2 7193 9.911858751564021
1 4624 4.444204152249135

Looks interesting. Top goal scorers have 9 to 14 team switches. This is way above the average. Thus, it seems that the more goals you score, the more often you get the chance of being transferred (and thus earn more money). Players who don't score goals will obviously not be transferred (that often).

References:

Wednesday, June 25, 2014

Harald's Original Miscellany - The Truth about Football - Part 2

John Terry Celebration Meme, read on and you will understand...
Of course, you always wanted to know who the best football player of all time is. Sure, this is a question that real football aficionados might argue about forever. Wikipedia will not be able to give you the definitive answer either. But we can play around with the available data, and maybe we will find out something interesting about football players again ...

But first of all, I want to say thank you to Kingsley Idehen, who gave me the hint to use the parameter "qtxt=" instead of "query=" in my SPARQL query links, which enables others to see the original query and to use it for further data exploration. Thus, all SPARQL query links will be given in this form.
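
For those who want to build such links themselves, here is a minimal Python sketch. It assumes the public DBpedia endpoint at http://dbpedia.org/sparql; the helper function and the example query are hypothetical and only meant as an illustration.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # assumed: public DBpedia endpoint


def sparql_link(query, editable=True):
    """Build a query link; 'qtxt' shows the query, 'query' just runs it."""
    parameter = "qtxt" if editable else "query"
    return ENDPOINT + "?" + urlencode({parameter: query})


# Hypothetical example query, not one of the queries used in this post.
example_query = (
    "SELECT ?player WHERE { "
    "?player a <http://dbpedia.org/ontology/SoccerPlayer> "
    "} LIMIT 10"
)

link = sparql_link(example_query)
print(link)
```

The only difference between the two kinds of links is the name of the URL parameter; the URL-encoded query text itself stays the same.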

So let's start with the simplest query: select all football players and their popularity (indegree) in descending order, starting with the most popular player. We must be a little bit careful, because the class SoccerPlayer does not only contain "real persons" but also popular roles of football players, e.g. "Captain". Therefore, we filter the results for entities that have a name (via foaf:name). Here are the top 50 football players according to Wikipedia. For the entire list, please refer to the references [1].
Name Popularity
Cristiano Ronaldo 1794
David Beckham 1572
Thierry Henry 1414
Lionel Messi 1404
Wayne Rooney 1343
Frank Lampard 1188
Pelé 1111
Didier Drogba 1047
Ronaldo 1037
Michael Owen 1011
Steven Gerrard 1002
Zlatan Ibrahimović 964
Alessandro Del Piero 926
Ronaldinho 914
Raúl (footballer) 903
Ryan Giggs 894
Fernando Torres 889
Zinedine Zidane 867
Ruud van Nistelrooy 861
Robbie Keane 861
Samuel Eto'o 859
Landon Donovan 835
Andriy Shevchenko 823
Kaká 804
Francesco Totti 730
Robin van Persie 720
Paul Scholes 692
Hernán Crespo 680
David Villa 669
John Terry 669
Cesc Fàbregas 669
George Best 667
Carlos Tévez 666
Robinho 643
Gary Lineker 641
Teddy Sheringham 633
Andrew Cole 620
Dwayne De Rosario 617
Xavi 616
Jermain Defoe 613
Craig Bellamy 609
Dimitar Berbatov 587
David Trezeguet 587
Luis Suárez 581
Peter Crouch 577
Michael Ballack 572
Miroslav Klose 568
Luís Figo 567
Lee Dong-Gook 558
Filippo Inzaghi 557
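The idea behind this ranking can be sketched in a few lines of Python. The snippet counts incoming links per entity (the indegree) and keeps only entities that have a name. The tiny link list and the names dictionary are made-up toy data, and popularity_ranking is a hypothetical helper; the real numbers above come from a SPARQL query against DBpedia.

```python
from collections import Counter

# Toy data (invented): each pair means "source page links to target entity".
links = [
    ("PageA", "Player_X"),
    ("PageB", "Player_X"),
    ("PageC", "Player_Y"),
    ("PageA", "Captain_(role)"),
    ("PageB", "Captain_(role)"),
    ("PageC", "Captain_(role)"),
]

# Only real persons carry a name (the foaf:name filter described above).
names = {"Player_X": "Player X", "Player_Y": "Player Y"}


def popularity_ranking(links, names):
    """Return (name, indegree) pairs for named entities, most popular first."""
    indegree = Counter(target for _source, target in links)
    ranked = [(names[e], n) for e, n in indegree.items() if e in names]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


print(popularity_ranking(links, names))
```

In this toy data the role "Captain" has the highest indegree, but it is dropped because it has no name.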
Yes, it was obvious to everybody that names such as Ronaldo, Beckham, Henry, and Pelé occur among the most popular players. Unfortunately, I'm not enough of a football expert to comment further on that. Let's have a look at whether popularity corresponds with the number of goals scored. However, this information is not easy to extract. For some of the football players there is a property dbprop:totalGoals, while most of them have dbprop:goals. But the latter sometimes exists multiple times, for single years or periods. Thus, we have to sum up all dbprop:goals values, while keeping in mind not to count any number more than once (because an entry might be repeated in our result list for several reasons).
Name Goals Popularity
David Schofield (footballer) 76543210 9
Alcindo Sartori 5019110 160
Oh Seung-Bum 1842256 32
Marei Al Ramly 6037 11
Darío Espínola 1715 6
Kim Andersson 1537 23
Stefan Lövgren 1328 18
Nikola Karabatić 1318 84
Elias Ribeiro de Oliveira 1187 26
Mohd Amar Rohidan 1020 38
Slaviša Žungul 856 113
John Bartley (footballer) 762 1
Zoran Karić 759 11
Jimmy Greaves 748 342
Ernest Spiteri Gonzi 704 11
Pierre van Hooijdonk 670 238
Reg Date 664 3
Trevor Phillips (footballer) 655 4
Joan Linares 645 12
Domenic Mobilio 625 58
Pelé 620 1111
Harry Johnson (footballer born 1899) 610 25
Ernst Stojaspal 602 23
Max Morlock 588 66
Ernie Hine 574 72
Branko Šegota 561 38
Serhiy Koridze 557 4
Salvinu Schembri 538 6
Konstantin Yeryomenko 537 13
Tony Brown (English footballer) 498 47
Waldo Machado 497 36
Nguyen Minh Phuong 496 50
Tony Cascarino 496 141
Leônidas da Silva 484 98
Zeki Rıza Sporel 470 75
Ángeles Parejo 469 9
Tommy Dickson 457 13
Elisabetta Vignotto 454 22
Alberto Spencer 445 99
Stefan Schwoch 435 7
Peter Kitchen 429 16
Edgar Kail 427 7
Eusébio 423 433
Giorgos Sideris 415 51
Tommy Browell 414 88
Patricio Margetic 412 14
Arsénio Trindade Duarte 409 19
Uwe Seeler 406 153
Hughie Gallacher 406 131
Dragan Džajić 401 150
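The summation described above can be illustrated with a small Python sketch. It assumes that the raw query result arrives as (player, goals) rows in which the same value may be repeated, and it counts every distinct value per player only once. The rows are invented, and total_goals is a hypothetical helper, not the query that produced the table.

```python
from collections import defaultdict

# Invented rows: a join can repeat the same goals entry several times.
rows = [
    ("Player_X", 12),
    ("Player_X", 12),  # duplicate produced by the join
    ("Player_X", 30),
    ("Player_Y", 7),
]


def total_goals(rows):
    """Sum the goals per player, counting each distinct value only once."""
    distinct = defaultdict(set)
    for player, goals in rows:
        distinct[player].add(goals)
    return {player: sum(values) for player, values in distinct.items()}


print(total_goals(rows))
```

One limitation of this kind of de-duplication: two different periods with exactly the same number of goals collapse into one entry, so the sum can come out too low.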
Again we see that DBpedia data (resp. Wikipedia data) is somewhat 'noisy'. The first 3 ranks are obviously wrong concerning the number of goals. If David Schofield really had scored 76,543,210 goals, it would mean that he had scored between 4 and 5 goals per minute during all the 32 years of his entire life so far. This must be some kind of extraction error. If we look at the players with more than 1000 goals, a closer inspection reveals some handball players who either are also footballers or are wrongly declared to be footballers. In handball it is easier to score a high number of goals than in football. Trevor Phillips and John Bartley really did score more than 600 goals, but their popularity scores signal that they did not necessarily achieve this in the major league. The first prominent football player in this list is definitely Pelé with 620 goals. The only other two in this Top 50 list I have already heard of are Eusébio and Uwe Seeler, but don't take me as a reference :)
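The plausibility check for the first rank is simple arithmetic, sketched here in Python. It takes the 76,543,210 goals and the 32 years from the text and assumes an average year length of 365.25 days.

```python
GOALS = 76_543_210                    # value listed for David Schofield
YEARS = 32                            # age used in the text
MINUTES_PER_YEAR = 365.25 * 24 * 60   # assumption: average year length

minutes_alive = YEARS * MINUTES_PER_YEAR
goals_per_minute = GOALS / minutes_alive
print(round(goals_per_minute, 2))
```

Even without sleeping or eating, that rate is far beyond anything a human player could reach, which supports the suspicion of an extraction error.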

Let's order the list the other way around, by popularity, to investigate the goal scores of the most popular players:
Name Goals Popularity
Cristiano Ronaldo 227 1794
David Beckham 95 1572
Thierry Henry 265 1414
Lionel Messi 223 1404
Wayne Rooney 156 1343
Frank Lampard 163 1188
Pelé 620 1111
Didier Drogba 160 1047
Ronaldo 217 1037
Michael Owen 163 1011
Steven Gerrard 98 1002
Zlatan Ibrahimović 198 964
Alessandro Del Piero 223 926
Ronaldinho 157 914
Raúl (footballer) 280 903
Ryan Giggs 114 894
Fernando Torres 161 889
Zinedine Zidane 95 867
Ruud van Nistelrooy 249 861
Robbie Keane 179 861
Samuel Eto'o 219 859
Landon Donovan 135 835
Andriy Shevchenko 219 823
Kaká 114 804
Francesco Totti 226 730
Robin van Persie 130 720
Paul Scholes 107 692
Hernán Crespo 198 680
John Terry 30 669
David Villa 234 669
Cesc Fàbregas 50 669
George Best 238 667
Carlos Tévez 135 666
Robinho 122 643
Gary Lineker 243 641
Teddy Sheringham 289 633
Andrew Cole 226 620
Dwayne De Rosario 94 617
Xavi 57 616
Jermain Defoe 151 613
Craig Bellamy 113 609
Dimitar Berbatov 189 587
David Trezeguet 218 587
Luis Suárez 128 581
Peter Crouch 102 577
Michael Ballack 117 572
Miroslav Klose 181 568
Luís Figo 91 567
Filippo Inzaghi 184 557
Patrick Vieira 45 551
As we would expect, most of the popular football players are also good goal scorers. Well, there are a few exceptions. Take John Terry with a popularity score of 669 and only 30 goals. Why might he be so popular then? Taking a closer look at Wikipedia reveals that Terry plays at the centre back position and is the captain of Chelsea in the Premier League. Well, that's already something for popularity. But if you look even closer, you will find more: under the topic 'Controversies' you will find charges for assault and affray, a £60 fine for parking his Bentley in a disabled bay, extramarital affair allegations, as well as racial abuse allegations. But none of these is directly responsible for Terry's popularity. In fact, it's an internet meme (cf. the introductory picture of this article). John Terry was suspended for the UEFA Champions League final and had to watch his team in a suit and tie from the sidelines. He looked quite miserable as he sat there, watching his team defend for their lives and then miraculously pull out the victory. However, as soon as Chelsea had secured the victory, it was party time for Terry! He immediately threw off his suit like Superman and revealed his full Chelsea kit underneath. The internet community enjoyed his dedication to his club and to soccer so much that a popular internet meme lampooning his behaviour immediately appeared on the web, becoming one of the most popular online jokes in 2012. Terry has been pictured taking part in great moments in history and fiction. These included the fall of the Berlin Wall, the freeing of Nelson Mandela, the triumph of Rocky Balboa, as well as the first landing on the Moon [3]. Well, this should be reason for some popularity :)

Monday, June 23, 2014

Harald's Original Miscellany - The Truth about Football

The England National football team, 1893, photo: wikipedia
Well, it's the time of the World Cup 2014. Why should I bother you with the peculiarities of authors and writers when we can also have a look at football! As you might remember, we had 15,328 individuals in Wikipedia classified as authors [1]. How many footballers do you think there are compared to authors? Well, there's a huge difference: 162,597 referenced football players, i.e. more than 10 times as many as authors [2]. Maybe you now think there might be an overlap. How many football players are also listed as authors? I am sorry to disappoint you, but there is no overlap. No football player is also listed as being an author.

Fact No. 1: Footballers are no authors, and vice versa.

So you might wonder which other categories these football players are in, to get a better overview of what we are talking about. Interestingly, when looking at the most popular categories, you will soon notice the large number of expatriates among those players. The Top 5 expatriate nationalities among football players are: Brazil, Argentina, Russia, France, Serbia [3]. If you look at the bottom of the list, you will find the more or less exotic combinations, such as Hungarian expatriates in Uzbekistan, Cameroonian expatriates in Venezuela, or even German expatriates in the Netherlands ;-)

Fact No. 2: Only a few Hungarian football players emigrate to Uzbekistan.

And to which countries do they prefer to emigrate? The Top 5 destination countries for football players are: England, Germany, Spain, Italy, France [4]. At least for France the statistics seem to be somewhat balanced, while England and Germany are the leading nations in attracting foreign football players from all around the world. The bottom of the list is also very interesting: Gambia, Guam, Nepal, and Antigua and Barbuda are listed there as the "least attractive" countries.

Fact No. 3: French football players seem to be undecided whether to stay or leave the country.

But what about the categories that do not have a direct relationship to football in the first place? Let's filter out the football-related categories and have a look at what else football players are up to.

alternative professions of football players #players
http://dbpedia.org/class/yago/Director110014939 279
http://dbpedia.org/class/yago/Migrant110314952 271
http://dbpedia.org/class/yago/National109625401 250
http://dbpedia.org/class/yago/Intellectual109621545 237
http://dbpedia.org/class/yago/Alumnus109786338 234
http://dbpedia.org/class/yago/Scholar110557854 234
http://dbpedia.org/class/yago/Striker110663996 134
http://dbpedia.org/class/yago/Worker109632518 73
http://dbpedia.org/class/yago/SkilledWorker110605985 64
http://dbpedia.org/class/yago/Unfortunate109630641 56
http://dbpedia.org/class/yago/Trainer110722575 55
http://dbpedia.org/class/yago/Coach109931640 54
http://dbpedia.org/class/yago/Communicator109610660 51
http://dbpedia.org/class/yago/Serviceman110582746 51
http://dbpedia.org/class/yago/Expert109617867 43
http://dbpedia.org/class/yago/Observer110369528 33
http://dbpedia.org/class/yago/Cricketer109977326 32
http://dbpedia.org/class/yago/Adult109605289 28
http://dbpedia.org/class/yago/Victim110752093 28
http://dbpedia.org/class/yago/Rival110533013 27
http://dbpedia.org/class/yago/Official110372076 26
http://dbpedia.org/class/yago/Adjudicator109769636 26
http://dbpedia.org/class/yago/MilitaryOfficer110317007 25
http://dbpedia.org/class/yago/EnlistedPerson110058777 25
http://dbpedia.org/class/yago/Soldier110622053 25
http://dbpedia.org/class/yago/Referee110514429 25
http://dbpedia.org/class/yago/BadPerson109831962 23
http://dbpedia.org/class/yago/Wrongdoer109633969 23
http://dbpedia.org/class/yago/Writer110794014 21
http://dbpedia.org/class/yago/Professional110480253 21
http://dbpedia.org/class/yago/EnglishCricketers 20
http://dbpedia.org/class/yago/OlympicBronzeMedalistsForGermany 18
http://dbpedia.org/class/yago/Creator109614315 18
http://dbpedia.org/class/yago/AsianGamesGoldMedalistsForIran 16
http://dbpedia.org/class/yago/OlympicGoldMedalistsForTheUnitedStates 16
http://dbpedia.org/class/yago/Presenter110466387 16
http://dbpedia.org/class/yago/Survivor110681194 16
http://dbpedia.org/class/yago/Entertainer109616922 16
http://dbpedia.org/class/yago/Performer110415638 16
http://dbpedia.org/class/yago/Captain109893191 15
http://dbpedia.org/class/yago/CommissionedMilitaryOfficer109943239 15
http://dbpedia.org/class/yago/CommissionedOfficer109942970 15
http://dbpedia.org/class/yago/Artist109812338 15
http://dbpedia.org/class/yago/OlympicSilverMedalistsForParaguay 14
http://dbpedia.org/class/yago/OlympicSilverMedalistsForPoland 14
http://dbpedia.org/class/yago/Journalist110224578 14
http://dbpedia.org/class/yago/Principal110474950 13
http://dbpedia.org/class/yago/Criminal109977660 13
http://dbpedia.org/class/yago/OlympicGoldMedalistsForArgentina 12
http://dbpedia.org/class/yago/OlympicBronzeMedalistsForItaly 12
Please find the entire list in the references [5].
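The filtering step can be pictured as follows. This Python sketch counts how many players carry each class while skipping classes that are directly football-related. The player names, the class assignments, and the FOOTBALL_CLASSES set are invented for illustration; the table above was produced by a SPARQL query (see [5]).

```python
from collections import Counter

# Invented example data: classes attached to each player.
player_classes = {
    "Player_X": {"SoccerPlayer", "FootballManager", "Writer"},
    "Player_Y": {"SoccerPlayer", "Writer", "Journalist"},
    "Player_Z": {"SoccerPlayer"},
}

# Assumed list of classes regarded as directly related to football.
FOOTBALL_CLASSES = {"SoccerPlayer", "FootballManager"}


def other_professions(player_classes):
    """Count players per class, ignoring the football-related classes."""
    counts = Counter()
    for classes in player_classes.values():
        counts.update(classes - FOOTBALL_CLASSES)
    return counts.most_common()


print(other_professions(player_classes))
```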

Fact No. 4: There are more intellectuals among football players than bad persons.

Wow, 237 football players are also categorized as being intellectuals, while 23 football players are listed as "bad persons". But in these statistics we also find 21 writers(!) among the football players, as well as 15 artists, 14 journalists, and 13 criminals.
Further down the list, you will also find
  • 12 politicians,
  • 8 identical twins,
  • 8(!) heads of state,
  • 7 musicians,
  • 4 comedians (I'll bet there are more...),
  • 4 scientists,
  • 3 singers,
  • 2 mammals,
  • 2 gentleman cricketers,
  • 2 gambling addicts,
  • 2 aviators,
  • 2 painters,
  • 1 UFO conspiracy theorist,
  • 1 bank robber,
  • 1 rapper,
  • 1 plumber,

etc.

So what does this tell us about football players?

Fact No. 5: There are more politicians among football players than comedians.

Well, in general, and according to Wikipedia, football players mostly stick to their original profession. While they emigrate here and there, only a few of them actually have a second career outside of football. Please note that we did not follow categories like football manager, football trainer, football coach, etc. Anyway, we finally did find some authors among them after all....

Fact No. 6: There are writers among the football players...although they are not listed as authors.

to be continued....


Please find the full tables with all the results listed here in the References:

[1] total number of authors (?author rdf:type dbpedia-owl:Writer .)
[2] total number of football players (?player rdf:type dbpedia-owl:SoccerPlayer .)
[3] expatriate football players by home country
[4] expatriate football players by emigration country
[5] occupations of football players other than football