Saturday, January 13, 2007

...against all odds


On Wednesday I attended a talk given by Michael Strube from EML Research on "World Knowledge induced from Wikipedia - A New Prospect of Knowledge-Based NLP". He showed how the (by now famous) collaborative encyclopedia can be used for information retrieval purposes in a way similar to (more traditional) online dictionaries such as WordNet, and how - despite not being well structured - it provides results of almost equal quality.
First of all, for their work Strube and his colleague regarded each Wikipedia page as the representation of a concept (we already had some arguments about that, as you might remember...). Next, they developed a metric for the similarity of concepts w.r.t. the concept hierarchy (which is where the Wikipedia-defined 'concepts' come into play). Since 2004, Wikipedia has featured a user-defined concept hierarchy. This hierarchy of concepts can also be regarded as a folksonomy, simply because it is not a knowledge representation carefully designed by some designated domain expert, but one built collaboratively by the Wikipedia community. Unfortunately, the Wikipedia concept hierarchy suffers from exactly that fact. From my point of view it seems problematic to compare the proposed similarity measure (based on the Wikipedia concept hierarchy) with other similarity measures (based on commonly shared expert ontologies). O.k., you might argue that the Wikipedia concept hierarchy indeed IS commonly shared, because it has been developed by the Wikipedia community...but is the knowledge represented in Wikipedia really 'common'? Just think of the sheer number and variety of Star Wars characters or Star Trek episodes in Wikipedia compared with, e.g., the history of smaller European countries. As with all ontologies, the view and the knowledge of the ontology designer always have to be considered. The Wikipedia concept hierarchy - although partly really appropriate - somehow reminds me of that famous literary Chinese dictionary entry defining the term 'animal', which is quoted by Jorge Luis Borges. Another problem lies in the fact that the different language versions of Wikipedia have developed different concept hierarchies (sic!).
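To make the idea of a hierarchy-based similarity measure a bit more concrete, here is a minimal sketch. Note that this is NOT the metric presented in the talk (I do not reproduce it here); I simply assume a plain path-length measure, where two concepts are more similar the fewer category links lie between them. The toy hierarchy `TOY_PARENTS`, its category names, and the function names are all made up for illustration.

```python
from collections import deque

# Hypothetical toy category hierarchy (child -> list of parent categories).
# These names are illustrative only and not taken from any Wikipedia dump.
TOY_PARENTS = {
    "Darth Vader": ["Star Wars characters"],
    "Luke Skywalker": ["Star Wars characters"],
    "Star Wars characters": ["Fictional characters"],
    "Spock": ["Star Trek characters"],
    "Star Trek characters": ["Fictional characters"],
    "Fictional characters": ["Fiction"],
    "History of Luxembourg": ["History of Europe"],
    "History of Europe": ["History"],
}


def build_graph(parents):
    """Turn child -> parents links into an undirected adjacency map."""
    graph = {}
    for child, parent_list in parents.items():
        for parent in parent_list:
            graph.setdefault(child, set()).add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


def path_length(graph, a, b):
    """Number of links on the shortest path between a and b (None if unconnected)."""
    if a not in graph or b not in graph:
        return None
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:  # breadth-first search finds the shortest path first
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == b:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Assumed measure: 1 / (1 + shortest path length), 0.0 if unconnected."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


graph = build_graph(TOY_PARENTS)
print(path_similarity(graph, "Darth Vader", "Luke Skywalker"))
print(path_similarity(graph, "Darth Vader", "Spock"))
print(path_similarity(graph, "Darth Vader", "History of Luxembourg"))
```

Even this toy version shows the dependence on the hierarchy's designers: the score between two pages is determined entirely by how finely the community happened to categorize that corner of the encyclopedia.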

In the end, I asked how this proposed information retrieval based on Wikipedia could be improved by considering a 'Semantic Wikipedia', e.g. the Semantic MediaWiki (given that those semantic wikipedias would contain sufficient data). Instead of answering my question, Michael Strube cited Peter Norvig's argument against the Semantic Web from last year's AAAI 2006. Just to sum up: the Semantic Web will not become reality because its users are unable to provide correct semantic annotations. But hey...this guy (Strube) was talking about Wikipedia. Doesn't this argument ring a bell? Just remember the time 5 or 10 years ago. Nobody (well, almost nobody) would have believed that it would be possible to write an entire encyclopedia collaboratively on an open source basis - just because web users did not seem to be able to write 'correct' articles....