Wednesday, November 15, 2006

wikipedia to serve as a global ontology....



Today, I met Lars Zapf for a quick coffee, enjoying the rare late afternoon November sun. We were exchanging news about ISWC, WebMonday, recent projects, and stuff like that. While talking about semantic annotation, Lars pointed out that instead of using (or developing) your own ontologies for annotating (and authoring) documents, you could also use a wikipedia reference to indicate the semantic concept that you are writing about. Thus, as he already wrote in a comment, you could, e.g., use the link http://en.wikipedia.org/wiki/Rome to indicate that you are referring to the city of Rome, the capital of Italy.
Of course you might object that there are several language versions of wikipedia and thus several (different) articles that refer to the city of Rome. To use wikipedia as a 'commonly agreed and shared conceptualization' - to fulfill at least some points of Tom Gruber's ontology definition, as long as wikipedia lacks the 'formal' aspect of machine understandability - we can make use of the fact that an article in one language version can be identified with its counterparts in other language versions with the help of the language indicators at the lower left side of wikipedia's user interface. To serve as a real ontology, each wikipedia article should (at least) be connected to a formalized concept (maybe encoded in RDF or OWL). This concept does not necessarily have to reflect all the aspects that are reported in the natural language wikipedia article. For example, Semantic MediaWiki is working on a wiki extension to capture simple conceptualizations (such as classes or relationships).
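Just to make the language-version idea a bit more concrete, here is a minimal Python sketch. It assumes that the interlanguage links are already available as a small table; the table below is hand-written for illustration, and the function names (parse_article, canonical_article, same_concept) are made up for this example. A real tool would have to read the links from the articles themselves.

```python
from urllib.parse import urlparse, unquote

# Hand-written stand-in for wikipedia's interlanguage links (an assumption;
# a real tool would read these links from the articles themselves).
INTERLANGUAGE_LINKS = {
    ("en", "Rome"): {"de": "Rom", "it": "Roma"},
}


def parse_article(url):
    """Split a wikipedia article URL into (language, title)."""
    parts = urlparse(url)
    lang = parts.netloc.split(".")[0]
    prefix = "/wiki/"
    path = parts.path
    title = path[len(prefix):] if path.startswith(prefix) else path
    return lang, unquote(title)


def canonical_article(url, pivot="en"):
    """Map an article in any language version to its pivot-language article.

    Returns None if the table has no interlanguage link for the article.
    """
    lang, title = parse_article(url)
    if lang == pivot:
        return (lang, title)
    for (p_lang, p_title), others in INTERLANGUAGE_LINKS.items():
        if p_lang == pivot and others.get(lang) == title:
            return (p_lang, p_title)
    return None


def same_concept(url_a, url_b):
    """True if both URLs resolve to the same pivot-language article."""
    a = canonical_article(url_a)
    b = canonical_article(url_b)
    return a is not None and a == b
```

With this table, same_concept("http://de.wikipedia.org/wiki/Rom", "http://it.wikipedia.org/wiki/Roma") holds, because both URLs resolve to the English article on Rome. Choosing English as the pivot is arbitrary; any language version would do as long as everybody uses the same one.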
An application for authoring documents could easily be upgraded by offering links to related wikipedia articles. If the author enters the string 'Rome', the application could offer the related wikipedia link to Rome [or a selection of related links], and according to the author's choice, this link can be automatically encoded as a semantic annotation (link).
O.k., that sounds pretty simple. Are there any students out there to implement it (anybody in need of credit points??)? I would highly appreciate that...
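A rough sketch of how such an authoring helper might look, again in Python. Everything here is an assumption made for illustration: the lookup table KNOWN_ARTICLES stands in for a real search against wikipedia, the function names are invented, and encoding the annotation as an N-Triples statement with the Dublin Core 'subject' property is just one possible choice.

```python
# Hypothetical lookup table; a real tool might query wikipedia's search instead.
KNOWN_ARTICLES = {
    "rome": ["http://en.wikipedia.org/wiki/Rome"],
}

# Dublin Core 'subject' property, used here to say what a document is about.
DC_SUBJECT = "http://purl.org/dc/terms/subject"


def suggest_links(term):
    """Return candidate wikipedia links for the string the author typed."""
    return KNOWN_ARTICLES.get(term.strip().lower(), [])


def annotate(document_uri, chosen_link):
    """Encode the author's chosen link as a single N-Triples statement."""
    return f"<{document_uri}> <{DC_SUBJECT}> <{chosen_link}> ."


if __name__ == "__main__":
    candidates = suggest_links("Rome")
    if candidates:
        # The author would pick one of the candidates; here we take the first.
        print(annotate("http://example.org/my-post", candidates[0]))
```

The document URI http://example.org/my-post is a placeholder. The point is only that the author picks a link once and the application writes the annotation, so nobody has to type URIs by hand.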

4 comments:

Tom Gruber said...

I find it useful to think of ontologies as social contracts and engineered artifacts designed for some purpose. Among those purposes is communication among people - to disambiguate words or to aggregate content about a common concept.

Using the wikipedia entries is a good way to anchor a reference to an identifier so that two people (or RDF triples) could agree that they are talking about the same thing.

However, wikipedia entries are better at providing content than identity, because, among other reasons, they do not distinguish symbols (the language the word is in) from concepts. Ideally, this is the sort of thing that a tag commons might serve. In other words, there ought to be namespace services (tag spaces) and content services (wikipedia, electronic libraries, citation indices) that point to common names in the name spaces.

tom gruber

Pietro said...

What I would actually argue is that wikipedia only represents a part of humanity. And not just because many people have no access to computers, but because many people just don't find their view represented in the 'neutral' point of view. As if there were such a thing as a 'neutral' point of view. It might not seem so from the inside, but as soon as you try to edit anything that is not mainstream, you end up having to fight for every inch of territory. I had to fight for days to get the simple concept accepted that enzymes are unfolded under heat.

Can't we find a better solution than always looking for a single, universal, unchanging ontology?

hs said...

@Tom:
Thanks for your comment!
I agree with you about the shortcomings of wikipedia concerning identity (vs. content). The need for a global namespace service is pretty obvious (and, as you know, was proposed a long time ago with the concept of URIs). You also mentioned tag commons as an initiative that addresses the problem of tag identity (for collaborative tagging).
My proposal of using just wikipedia for identity and disambiguation is certainly not perfect (because, as you mentioned, wikipedia mixes up concepts with symbols...but I was only considering a 'viable' ad-hoc solution). But if you look at it the other way around: what is needed to augment current wikipedia so that it can serve the proposed purpose? If we link each article to a URI (or map it via any kind of naming service), then of course we might just use the URI (or naming service) directly...

hs said...

@Pietro:
Hi Pietro...thanks for your comment!
Of course you are right about the difficulties if you want to change or supplement an existing wikipedia article with recent research results. We have already talked about that shortcoming. As you might remember, I told you about my struggle to convince some students on wikipedia of my status as a researcher, and that they didn't want to accept anything new that had not been taught in their own lectures...
But that wasn't my point. I was just looking for an already existing namespace...and the only thing that is supposed to be commonly agreed on is identity, not content...(I know, new conflicts might arise there too...but fortunately not for the major part of articles (=identities))

I was also thinking about your 2nd comment. Do we really need a universal ontology? Well, maybe not. But then we need procedures to map different (personal/local) ontologies to each other to check their identity (or similarity). Maybe that's even more difficult....