Tuesday, June 05, 2007

ESWC 2007 - European Semantic Web Conference, Innsbruck (Day 02)


Another warm early summer morning in Innsbruck. The poster session yesterday (official end 8pm) ended, at least for me, at about 10pm. After my short walk from the train station to the Innsbruck congress center, the 2nd day of the ESWC 2007 again starts with an invited talk.

Keynote 3 - 9.00 - 10.00
Georg Gottlob from Oxford University presents a talk on 'The Lixto systems applications in Business Intelligence and Semantic Web'. He addresses the question of how knowledge about market developments and competitor activities can be extracted dynamically and automatically from semi-structured information sources on the Web. As a logical foundation, he proposes monadic second-order (MSO) logic, which captures exactly the essence of data extraction (define sets of nodes in a document, re-label tree nodes as monadic predicates, get rid of irrelevant predicates... that's what data extraction is about). Next, the extracted data can be classified, re-assigned and transformed with the aid of Semantic Web ontological domain knowledge. All these techniques are combined in the (commercial) Lixto Visual Wrapper application (for non-commercial use Lixto is freely available; just write an email to Robert Baumgartner).
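To make the node-selection view of data extraction a little more concrete, here is a toy Python sketch (my own illustration, not Lixto's actual machinery): it selects node sets in a small document tree via simple class-based predicates and re-labels the matched nodes as fields of extracted records.

```python
# Toy illustration of wrapper-based data extraction (NOT Lixto itself):
# select node sets in a document tree and re-label them as unary predicates.
import xml.etree.ElementTree as ET

page = """<html><body>
<table>
  <tr><td class="name">Acme Corp</td><td class="price">42.10</td></tr>
  <tr><td class="name">Globex</td><td class="price">13.37</td></tr>
</table>
</body></html>"""

tree = ET.fromstring(page)

# Each 'predicate' maps an output label to a node-selection rule
# (here simply a class-attribute match; invented for illustration).
predicates = {"company": "name", "price": "price"}

records = []
for row in tree.iter("tr"):
    record = {}
    for cell in row.iter("td"):
        for label, cls in predicates.items():
            if cell.get("class") == cls:
                record[label] = cell.text
    records.append(record)

print(records)
```

Real wrappers, of course, face far messier markup; the point is only that extraction reduces to defining node sets and re-labeling them.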

Natural Language and Ontologies / Applications session - 10.30 - 12.00
Again, I decided to split my time between two sessions this morning: the Applications session first, later switching to the Natural Language and Ontologies session. The Applications session starts with a talk by Sören Auer on 'What have Innsbruck and Leipzig in common? Extracting Semantics from Wiki Content'. The idea is to use those data from Wikipedia that are (really) structured. Yes, besides unstructured text, Wikipedia contains real structured data. E.g., articles on geographic entities such as cities and countries include a so-called infobox at the upper right of the article. Within the infobox (being a fixed template) you will find structured data such as population, area, etc., all laid out in a similar way and thus subject to easy data extraction. The next step was to encode the data extracted from the infoboxes (and other sources) as RDF statements (more than 8 million for the English version of Wikipedia). The application 'Query Wikipedia' (there is also a blog) is based on the RDF database extracted from Wikipedia and is available on the web. 'Query Wikipedia' is part of the community project 'DBpedia', which offers several alternative user interfaces for Wikipedia data and aims to integrate (structured as well as unstructured) Wikipedia data with other data resources on the web.
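To illustrate the kind of extraction involved, here is a toy Python sketch (my own, not the actual DBpedia code) of how infobox attribute/value pairs might be turned into RDF-style triples; the URIs are purely illustrative.

```python
# Toy sketch: turn a Wikipedia infobox (wikitext) into (s, p, o) triples.
# The infobox content and URIs below are invented for illustration.
import re

infobox = """{{Infobox City
| name = Innsbruck
| population = 117693
| area_km2 = 104.91
}}"""

subject = "http://dbpedia.org/resource/Innsbruck"  # illustrative URI

triples = []
for line in infobox.splitlines():
    m = re.match(r"\|\s*(\w+)\s*=\s*(.+)", line)
    if m:
        prop, value = m.group(1), m.group(2).strip()
        triples.append((subject, f"http://dbpedia.org/property/{prop}", value))

for t in triples:
    print(t)
```

Because the infobox template is fixed, a handful of such rules already yields millions of statements across all articles that use the template.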

Next, Tudor Groza from DERI Galway presents a talk on 'SALT - Semantically Annotated LaTeX for scientific publications'. Normally, at least for HTML or PDF data, semantic metadata are created (if they are created at all) a posteriori, i.e. after the actual writing process. Tudor proposes an authoring tool for generating semantic annotations concurrently with authoring the text (hence, concurrent annotation). SALT uses PDF as a standard container for both annotations and document content, while extending the LaTeX writing environment to support the creation of metadata for scientific publications. More information about the SALT project, including the underlying ontologies for representing document structure, annotations, as well as rhetorical structure, can be found here.

Unfortunately, I missed a fierce argument (at least as Andreas told me...) taking place in the Natural Language and Ontologies session. But now I'm listening to a talk given by Peyman Sazedj from Lisboa on 'Mining the web through verbs: a case study'. They focus on extracting relation instances, in particular verb-based relation instantiations, among annotated entities. As a case study, they extracted verb relations from IMDb biographies. Although the techniques applied were not particularly sophisticated (verb chunking and entity clustering, mapping verbs to relations from an ontology), the results on large text corpora seem promising. The IMDb sample corpus is available here.
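The core idea, verbs between annotated entities mapped to ontology relations, can be sketched in a few lines of Python (my own toy version, far simpler than the paper's pipeline; the lexicon and annotation format are invented):

```python
# Toy sketch of verb-based relation extraction between annotated entities:
# entities are pre-annotated, and the verb linking them is looked up in a
# (hypothetical) verb-to-relation lexicon derived from an ontology.
import re

verb_to_relation = {"directed": "hasDirector", "married": "spouseOf"}

sentence = "[PERSON:Orson Welles] directed [FILM:Citizen Kane]."

m = re.match(r"\[(\w+):([^\]]+)\]\s+(.+?)\s+\[(\w+):([^\]]+)\]", sentence)
subj, verb, obj = m.group(2), m.group(3), m.group(5)
relation = verb_to_relation.get(verb)  # None if the verb is unknown
triple = (subj, relation, obj)
print(triple)
```

A real system additionally has to chunk verbs robustly and cluster entity mentions, which is where most of the engineering effort goes.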

Keynote 4 - 14.00-15.00
The afternoon sessions start with a keynote given by Ron Brachman of Yahoo! Research on 'Emerging Sciences of the Internet: Some New Opportunities'. To position Yahoo! in that area of research, he started his talk with a quote from Wayne Gretzky's father: "Don't skate where the puck is, but where it will be...". Ok... now 20 minutes of the talk are over and I'm still trying to figure out what his point really is. We've seen a lot of recent buzzwords (including some old ones). I guess, taking the title into account, (at least I hope) that the talk should lead to 'critical' things to watch for in the near future (concerning the development of the web)... We'll see... Ok, now it is getting clearer. It's about a change in (computer) science. Brachman is referring to the ACM taxonomy of computer science as 'old' computer science (which somehow reminds me of 'the old Europe...'). In contrast, he defines areas of 'new' computer science, e.g. 'finding' science (beyond search), community science, algorithmic advertising, computational micro-economics, media experiences, data science... all of which, of course, could also (somehow) be subsumed in the ACM taxonomy. So it all comes down to some 'new' research challenges according to Yahoo!

Querying and Web Data Models / Ontology Engineering II / Personalization
For the 2nd afternoon session I'm trying to achieve a 'hat-trick' by following all three parallel sessions. I'm starting in the Ontology Engineering II session with Enrico Motta's presentation on 'Integrating Folksonomies with the Semantic Web'. He starts with the limitations of current tagging (differing granularity, multilinguality, spelling errors), which lead to very low search recall; giving meaning to tags is therefore an essential issue. Connecting tags with meaning (aka Semantic Web technology) will be fruitful in both directions: for categorizing tags and enabling logical reasoning over tags, as well as (in the other direction) for learning ontologies from tags.
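The first direction, giving meaning to tags, can be caricatured in a few lines of Python (my own sketch, not Motta's approach; the vocabulary and alias table are invented):

```python
# Toy sketch of grounding folksonomy tags in a reference vocabulary:
# normalize raw tags and map them to (hypothetical) ontology concepts.
concepts = {"innsbruck": "dbpedia:Innsbruck", "conference": "swc:Conference"}

# Hypothetical alias table addressing spelling errors and multilinguality.
aliases = {"insbruck": "innsbruck", "conf": "conference",
           "konferenz": "conference"}

def tag_to_concept(tag):
    t = tag.strip().lower()
    t = aliases.get(t, t)
    return concepts.get(t)  # None if the tag has no known meaning

print(tag_to_concept("Insbruck"))   # misspelled tag still resolves
print(tag_to_concept("Konferenz"))  # multilingual tag resolves too
```

Once tags are grounded in concepts like this, reasoning over tagged resources (and, conversely, learning new concepts from frequent tags) becomes possible.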

The last talk started with some delay, and in addition I had to switch levels to get to the basement 'Strassburg' hall, where the next talk of the Querying and Web Data Models session is taking place. I was a little bit late, so I couldn't figure out which of the paper's authors was presenting the paper on 'A Unified Approach to Retrieving Web Documents and Semantic Web Data' (I guess it is Krishnaprasad Thirunarayan). He distinguishes between the traditional hyperlinked web and the property-linked RDF web and discusses a unified approach which formalizes the structure and the semantics of the interconnections between them. A new hybrid query language for retrieving data and information from the unified web has also been developed and integrated in a small prototype, 'SITAR' (unfortunately not available on the web...).

Ok, up again two levels to the 'Grenoble' hall for the Personalization session. Here, I'm a little bit too early for the next interesting talk, given by Serge Linckels from HPI Potsdam on 'Semantic Composition of Lecture Subparts for a Personalized e-Learning'. Strange, but he starts his talk by advertising HPI's tele-task lecture recording system... ok, just waiting for the connection to the Semantic Web. Serge is presenting a prototype system that interconnects single learning objects (videos) with the help of a (manually crafted) domain ontology. Learning objects are also semantically annotated in a manual way. The ontology defines which topic (represented within a learning object) depends on other topics (i.e., other learning objects). On this dependency structure (encoded in description logics), a closure is computed w.r.t. a given query, and thus a sequence of interdependent learning objects is presented as a result.
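The closure computation itself is essentially a prerequisite-first traversal of the dependency graph. A minimal Python sketch (my own; the topic graph is invented, and the real system works on description-logic annotations rather than a plain dict):

```python
# Toy sketch: given topic dependencies (topic -> prerequisite topics),
# compute the closure for a queried topic and emit the corresponding
# learning objects in prerequisite-first order.
deps = {
    "RDF": [],
    "OWL": ["RDF"],
    "SPARQL": ["RDF"],
    "Reasoning": ["OWL"],
}

def closure_sequence(topic, deps, seen=None, order=None):
    if seen is None:
        seen, order = set(), []
    if topic in seen:
        return order
    seen.add(topic)
    for pre in deps.get(topic, []):   # visit prerequisites first
        closure_sequence(pre, deps, seen, order)
    order.append(topic)               # then the topic itself
    return order

print(closure_sequence("Reasoning", deps))  # -> ['RDF', 'OWL', 'Reasoning']
```

The learner querying for 'Reasoning' would thus be shown the 'RDF' and 'OWL' videos first, exactly the kind of sequencing the prototype produces.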

Finally, back to Ontology Engineering II, where Johanna Völker is presenting a talk on 'Acquisition of OWL DL Axioms from Lexical Resources'. Light-weight ontologies (which can easily be learned) are not sufficient for reasoning-based applications. Thus, Johanna discusses the question of refining these light-weight ontologies to obtain more complex class descriptions. Based on a deep syntactic analysis of natural-language definitions, they derive ontologies with expressive axioms. The feasibility of the approach is shown by generating complex class descriptions from Wikipedia definitions and from a fishery glossary of the Food and Agriculture Organization (FAO) of the United Nations.
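To give a flavor of the idea, here is a deliberately crude Python sketch (my own; the paper uses deep syntactic analysis, not a single regex) that turns one genus-differentia definition pattern into an OWL-style axiom string:

```python
# Toy sketch: map a natural-language definition of the form
# "A <X> is a <genus> that <verb> <filler>." to an OWL-style axiom.
# Pattern and example are invented for illustration.
import re

definition = "A trawler is a fishing vessel that uses a trawl net."

m = re.match(r"A (\w+) is a ([\w ]+?) that (\w+) (?:a |an )?([\w ]+)\.",
             definition)
x, genus, verb, filler = m.groups()
axiom = (f"{x.capitalize()} SubClassOf "
         f"{genus.title().replace(' ', '')} and "
         f"({verb} some {filler.title().replace(' ', '')})")
print(axiom)
```

Real definitions are of course far less regular, which is why the paper resorts to full syntactic parsing before mapping parse structures to axioms.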

This concludes the afternoon sessions (and the conference presentations). I'm looking forward to the conference dinner later this evening (8pm). Until then, there's sufficient time for a walk in Innsbruck's city center and some coffee.

[to be continued ... trying to achieve 'live'-blogging]