Friday, February 23, 2007

Internet Pioneers ... must see!


While doing some research on internet history for the 2nd edition of my WWW book, I found an impressive film that brings together many of the most important internet pioneers. The documentary is entitled "Computer Networks: The Heralds of Resource Sharing" and was produced back in 1972 (!!!) by Steven King from MIT. It features the ARPANET and many of the most important names in the history of computer networking.
You will see J.C.R. Licklider, former director of the IPTO at ARPA, who was among the first to envision a global network of computers. Larry Roberts (another former director of the IPTO), Robert Kahn (co-inventor of the TCP/IP protocol and Turing Award winner), and Donald Davies (co-inventor of packet switching) also contribute.
It's really a 'must see', simply because everything I knew about internet history had come from books, plus a few pictures of early internet hardware and some mug shots of the scientists mentioned. It's fascinating to see them in the film and to hear them talk about their great vision of internet computing, which has become reality today... more than 30 years after the film was produced...

Thursday, February 22, 2007

Conspiracy ahead...Dan Brown - Angels and Demons

Finally, another year after reading 'The Da Vinci Code', I decided to give its predecessor - 'Angels and Demons' - a try. I really liked the 'Schnitzeljagd' (scavenger hunt) of connecting seemingly unconnected facts into wild conspiracy theories as Dan Brown presented it in 'The Da Vinci Code'... although I was disappointed by its ending. But, as is often the case, it's hard to put an end to a story that is trapped in an apparently everlasting climax. Thus, I thought, maybe the predecessor would be a little better balanced.
Anyway, I had high expectations.....
So... you've heard about the Illuminati? Yes, I know. Ever since Robert Anton Wilson's Illuminatus trilogy, the Illuminati have been the subject of incredible conspiracy theories. This Enlightenment-era secret society of freethinkers is most often identified with its Bavarian branch, founded by Adam Weishaupt back in the 18th century, of which illustrious men like Goethe were reputed to be members. Many attempts have been made to connect them to the Freemasons or the Rosicrucians, and because of their close ties to the Enlightenment movement -- including their opposition to the church and Christian faith -- as well as for being a secret society (governments are always afraid of conspiracies), they were eventually banned. But there is a lot of conspiracy literature -- reputable as well as pure fiction -- where you can read all about them.
I read Wilson's Illuminatus more than 20 years ago. As a teenager at that time, I was really fascinated by the idea that there should be an (entirely different) world out there that only opens up for those who are enlightened. Everybody else would only see the surface, and only a happy few would be able to look behind the things of daily life... although the traces were so obvious.
So, the concept of 'Angels and Demons' was not so new to me. Dan Brown tries to draw connections between modern particle physics (the plot starts with a murder at CERN) and its destructive potential (antimatter and its disastrous effects), the ancient conflict between Christian faith and natural science (a.k.a. the Vatican gang against the enlightened conspirators), and the moral values of faith and Christianity in general. The story drives at an accelerating pace within this conflict, as a Harvard professor of semiotics, together with a female (...and rather sexy) CERN scientist, tries to solve the conspiracy puzzle that threatens the very foundations of the Catholic Church. For sure this scavenger hunt is rather exciting and thrilling, but also somehow frustrating.
But for me, the ending (this time I won't spoil it) was somewhat conciliatory (at least a little bit, although not everything was explained, e.g. the provenance and the story of the assassin). We learn that there is nothing miraculous about it and that we don't have to be afraid of world-threatening conspiracies.
Ok... I guess you have to read it for yourself. It's really entertaining... and you will have a lot of places to see the next time you visit Rome and the Vatican.
Before I forget: you should really read the original English version. The German translation (I've read a few pages) is rather bad. I mean, it's well translated, but the language is rather shallow. Maybe it's the same with the English version... but being a non-native speaker, maybe I just don't notice. At least it was pretty simple to read and not difficult at all.

Monday, February 19, 2007

LEARNTEC, Karlsruhe February 13-15


This year, we participated in the LEARNTEC fair in Karlsruhe (February 13-15). LEARNTEC focuses on e-learning technology and brings universities and industry together in an exhibition and a congress. As the official advisor of an ESF/BMBF-funded startup company called OSOTIS, I visited my students who took part in the exhibition. OSOTIS is also the name of the 'Academic Video Search Engine' that serves as a testbed for our research in semantic web and multimedia search technology.

The setting of OSOTIS is the following: we are dealing with lecture recordings and offer a search service over and also inside those lecture recordings. The main advantage of OSOTIS is that most of the video post-processing necessary for implementing such a search is done in a completely automated way. Many other video search systems depend on cost-intensive post-processing, such as segmenting the videos into short 'learning objects', manually annotating the video segments, etc.
OSOTIS is different:
It makes use of additional information resources such as desktop presentations (e.g., PowerPoint or PDF slides, or simply desktop recordings) that can be synchronized with the video recording in different ways. If there is only a lecture video without any additional information source, speech recognition technology can still provide keywords that can be used for the video annotation. In this way, the video can be automatically segmented and the segments can be annotated with keyword descriptors. Additionally, if there is no way to determine the content of the video, OSOTIS offers manual annotation and social tagging services to all registered users. Thus, there is always some way to search inside each lecture recording, no matter whether additional information resources are available or not.
You just enter a keyword and OSOTIS will display a list of lecture recordings that are related to that keyword. By selecting one of the results, the video will start at exactly the point in time that is related to the user's query. OSOTIS does not host the video resources on its own server, but only offers links to the original streaming servers (for streaming resources) or origin servers with podcast/videocast recordings. Thus, all kinds of video formats can be supported, e.g., RealMedia, MPEG, MP4, Flash video, and others.
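To illustrate the basic idea of searching inside a recording - this is not the actual OSOTIS implementation, just a toy sketch with made-up keywords and URLs - think of a time-coded index that maps descriptors to (video, start time) pairs and answers a query with deep links:

```python
# Hypothetical time-coded index: keyword -> [(video URL, start time in seconds)].
# In OSOTIS the entries would come from synchronized slides, speech recognition,
# or user tags; here they are simply made up.
index = {
    "dijkstra": [("rtsp://media.example.org/algorithms-07", 1260)],
    "pagerank": [("http://media.example.org/ir-lecture-03.mp4", 845),
                 ("rtsp://media.example.org/webscience-12", 300)],
}

def search(keyword):
    """Return deep links that start playback right at the matching segment."""
    hits = index.get(keyword.lower(), [])
    return [f"{url}#t={seconds}" for url, seconds in hits]

print(search("PageRank"))
```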
Up to now, most of the hosted video lectures are in German (and are thus hosted by German-speaking universities, including those in Austria and Switzerland). But the number of lecture recordings in English will increase soon.

Tuesday, February 06, 2007

Arthur Schnitzler - Traumnovelle (Dream Novel)


I really have to hurry up, because in reading I'm still 2 books ahead of my reviews. So, after 'Rouge et Noir' I decided to read Arthur Schnitzler's 'Traumnovelle', which, b.t.w., also formed the basis for Stanley Kubrick's 'Eyes Wide Shut'. I guess I read the book because of the movie, but after reading it I must confess that I like the book much more.

Basically, it's about a couple living in Vienna around the beginning of the 20th century. He's a physician, and the novel starts when the two are about to visit a ball (same as in the movie). Home again, she tells him about a dream and an incident that happened during their last holidays: there was a stranger and she was very attracted to him... and if he (the stranger) had said only one word, she would have followed him no matter the consequences (but of course this didn't happen). He (the physician) is really shocked by this revelation. Then, in the night, he is called to the house of a dying patient, but when he arrives, the patient has already passed away. When leaving the patient's house, the daughter of the deceased (with tears in her eyes, her fiancé waiting for her in the other room) tells him that she loves him. Very moved, he runs through the streets of Vienna, doesn't want to go home, still thinking of some kind of 'revenge' for the 'imaginary deed' of his wife. He follows a prostitute to her home but leaves before anything happens. In a bar, he meets an old friend who is playing the piano. The friend tells him that he is invited to play piano at some secret (private) party, where people celebrate some kind of secretive and 'orgiastic' carnival. He persuades his friend to play a trick to get him into that party, but to get in he needs a mask to disguise his identity. In the middle of the night, he goes to rent a mask at a shop (again another story, this one about the shopkeeper selling his daughter as a prostitute...). Wearing the mask, he succeeds in getting into the party, but a strange (attractive and almost nude) woman notices his presence and that he is not supposed to be there. She warns him, but he does not care. Other people realize that he is a stranger, and he is asked for a password that he cannot provide. He is about to be punished, but the strange woman takes his place and therefore also the punishment. The next day, back home, he reads about the strange killing of a noble woman that took place in a hotel that very night, and he decides to find out whether this killing and the 'punishment' of the previous night might be connected.....

I won't tell you how it ends - anyway, if you have seen 'Eyes Wide Shut' you will surely know. But... it was really quite an experience to read it, and I enjoyed it very much. Schnitzler leaves many things to your own imagination... and Kubrick surely invested a lot of his own in setting it to scene. But, as with any movie that is based on literature, the film just shows one particular reading of the book, with special emphasis on the things the director regarded as important. Kubrick did a great job... but sorry, I don't like Tom Cruise as an actor (the only movie where I really liked his performance was 'Magnolia'... and even there, Philip Seymour Hoffman's performance was much more impressive...). Thus, reading the book opens up new possibilities, new ways in which your imagination might shed some light on this strange story. I can highly recommend it (and it's rather short... you can finish it in just one day).

P.S. You might find some other works of Arthur Schnitzler at Project Gutenberg.

Saturday, January 27, 2007

Stendhal - Rouge et Noir


It seems to be time to write about the first big novel I have read this year... I'm already 2 books ahead, and otherwise I will lose track completely. As usual - and as I have read the book in German translation - I will give a short summary in English first and discuss the rest afterwards.
Stendhal, a.k.a. Henri Beyle, set the scene of "Rouge et Noir" around 1830, during the Bourbon Restoration in France, and subtitled it a chronicle of the 19th century - which was still young at his time. It was meant to be a novel taking place right now... and not in the past. Julien Sorel, the unusually intelligent son of a simple wood cutter - at least, as a designated priest, he could speak Latin and had an enormous memory, which he showed by reciting entire parts of the Bible by heart (and in Latin) - gets a job as a tutor in the family of the local mayor, M. de Renal. He seduces Mme. de Renal - not really out of love, but more because of his ego - and to avoid a scandal he is forced to leave. He joins the priest seminary - which, by the way, is one of the most impressively written parts of the book - and finally succeeds in becoming the private secretary of the Marquis de la Mole. The Marquis' daughter soon has an eye on Julien, and finally - this really takes Julien some time and also some sophisticated strategies - they plan to marry because she has become pregnant (by him...). Of course the Marquis is rather disappointed about this misalliance. Then he receives a letter written by Mme. de Renal in which she warns the Marquis de la Mole that Julien is an impostor whose only goal is to make a career out of seducing women in the families he is placed with. Julien also reads the letter and, in revenge, shoots Mme. de Renal while she is attending church. Although she recovers, Julien voluntarily accepts his conviction and is executed......

I can warmly recommend this new German translation of Stendhal's classic 'Rot und Schwarz' ('The Red and the Black') to everybody - whether you are a fan of 19th-century French literature or not. The book is written in a thoroughly gripping and entertaining way. Stendhal's at times short and concise style does without sprawling descriptions of the settings, yet never loses what is characteristic of each of them. The dialogues, in contrast, are lush, intense, and carefully considered. You live through the highs and lows of Julien Sorel's existence - even if his feelings and his motives are not always easy to understand today. The French Revolution, Napoleon's empire, and the subsequent Restoration - which was to be followed by yet another revolution - shape the social picture that Stendhal draws. The careerist and bourgeois upstart is characterized just as sharply as the old-established nobility; the eternal quarrel between Jesuits and Jansenists runs through the plot, as does the then-emerging type of the fop and the fashion-conscious dandy. And of course the women... they all seem to be in love with Julien. From the plain chambermaid, to Mme. de Renal, a coffee-house employee, a widowed general's wife, and finally the Marquis de la Mole's daughter... Julien knows how to win them all over... and how to disappoint them.
The ending, however - according to Stendhal part of the true story on which the book is based - remains a mystery to me. As already described, Julien tries to murder Mme. de Renal in church and afterwards, although she recovers from her injuries and forgives him, sees no other way out than to hand himself over to the court and to insist on his own death sentence. Of course... not exactly a 'Hollywood' ending, but told in an intense and truly entertaining way. Particularly worth mentioning in this edition are the many extras. Besides a detailed appendix with Stendhal's explanations and annotations (which you can look up as you read), the edition also offers the history of the novel's genesis and reception, as well as Stendhal's own review of the work. In short: go and read it!

Wednesday, January 24, 2007

SOFSEM - Day 4

Now we have snow... finally :) ... even a lot of it. It was snowing all day long, and roads in the Czech Republic and also in southern Germany were closed. Prague Airport was closed until the afternoon as well. But, as far as I remember, these are the more typical weather conditions for SOFSEM.
Anyway, the day started with a keynote by Tom Henzinger on 'Games, Time, and Probability: Graph Models for System Design and Analysis'. He addressed three major sources of system complexity: concurrency, real time, and uncertainty. Concurrency can be modelled as a multi-player game representing a reactive system with potential collaborators and adversaries. Real time requires the system to combine discrete state changes with continuous state evolution, while, to capture uncertainty, state changes also have to be modelled in a probabilistic way.
Unfortunately, some of the presenters of the following contributed papers did not show up, so the conference program was subject to several changes. In the afternoon, the posters of the student research forum each got a short 5-minute presentation, followed by a poster exhibition and a lot of discussion. In the end, the participants were asked to vote for the best poster presentation. My choice - which of course is completely subjective - was the poster of Henning Fernau and Daniel Raible on 'Alliances in Graphs: a Complexity-Theoretic Study'.
In the late evening I went looking for my car, which was buried under the snow in the parking lot. Due to the wind, the snow around the parking lot (and my car) had piled up to almost half a meter... which made me think about the road conditions and my plan of driving home the next day....

Tuesday, January 23, 2007

SOFSEM 2007 - Day 3

Today started with a keynote by Ricardo Baeza-Yates from Yahoo! Research on 'Mining Web Queries'. In particular, he showed how to identify categories of user queries and how to use this information to create an appropriate ranking of the search results. Besides the already established 'coarse' categories, such as queries being 'informational', 'navigational', or 'transactional' (which means that the user wants (a) information about a specified topic, (b) a starting point for further research, or (c) a homepage related to the resource for transactional purposes (e.g. shopping)...), he addressed several graphs that can be compiled from the search engine's logfile, e.g., the URL cover graph, the URL link graph, or the session graph. These graphs can be used for identifying polysemic expressions, similar or related queries, clusterings of queries, or even a (pseudo-)taxonomy of queries.
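Just to make the idea of these log-based graphs a bit more concrete, here is a minimal sketch (my own toy example with invented log entries, not Baeza-Yates' actual method): a bipartite query-URL graph built from clicks, from which related queries can be read off via shared result URLs.

```python
from collections import defaultdict

# Hypothetical search engine log entries: (query, clicked URL) pairs.
log = [
    ("rome hotels", "http://example.org/rome-hotels"),
    ("rome accommodation", "http://example.org/rome-hotels"),
    ("cern", "http://public.web.cern.ch/"),
]

def query_url_graph(entries):
    """Bipartite 'URL cover graph': which URLs were clicked for which queries."""
    graph = defaultdict(set)
    for query, url in entries:
        graph[query].add(url)
    return graph

def related_queries(graph, query):
    """Two queries are considered related if their clicked URLs overlap."""
    return {q for q, urls in graph.items() if q != query and urls & graph[query]}

g = query_url_graph(log)
print(related_queries(g, "rome hotels"))    # -> {'rome accommodation'}
```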
Besides web query mining, he mentioned some interesting numbers concerning Yahoo!, e.g. that Yahoo! administrates about 20 petabytes of data with more than 10 terabytes of data traffic per day. On the other hand, he gave an estimate of the actual world knowledge and related it to the amount of data managed by Yahoo! today: given that a person creates about 10 pages of data concerning a distinct event, and if we estimate the number of events in a lifetime at about 5,000, and if we multiply that number by the world's population... we end up with about 0.0057% of the 'world knowledge' currently being represented in Yahoo!...
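Just to make that back-of-the-envelope calculation explicit (the population and bytes-per-page figures are my own assumptions, chosen so that the quoted percentage comes out; the talk may have used different numbers):

```python
pages_per_event = 10        # pages of data a person creates per event (from the talk)
events_per_life = 5_000     # events in a lifetime (from the talk)
population      = 7e9       # rough world population (assumed)
bytes_per_page  = 1e6       # ~1 MB per 'page' (my assumption, not from the talk)
yahoo_bytes     = 20e15     # ~20 petabytes managed by Yahoo!

world_knowledge = pages_per_event * events_per_life * population * bytes_per_page
print(f"{100 * yahoo_bytes / world_knowledge:.4f} %")   # -> roughly 0.0057 %
```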

Monday, January 22, 2007

SOFSEM 2007 - Day 2


The second day of SOFSEM started with a keynote by Bertrand Meyer (maybe you remember Eiffel...) from ETH Zürich on 'Automatic Testing of Object-Oriented Software'. To enable automated testing, he referred to the concept of 'contracts' being directly embedded in the classes of the Eiffel programming language. With a contract you are able to specify the software's expected behaviour (preconditions, postconditions, and invariants), which can be monitored during execution. In automated software testing, contracts may serve as test oracles that decide whether a test case has passed or failed. He presented the 'AutoTest' unit testing framework, which uses Eiffel contracts as test oracles. AutoTest is able to exercise all classes by generating objects and routine arguments. Manual tests can be embedded as well, and regression testing for failed test cases is supported in a 'minimized' form by retaining only the relevant instructions.
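To give a rough feeling for how contracts can act as test oracles - this is not Eiffel and not the actual AutoTest tool, just a small Python sketch of the idea - the precondition filters out invalid (randomly generated) inputs, and a postcondition violation on a valid input signals a bug:

```python
import random

def contract(pre, post):
    """Attach a precondition and a postcondition to a routine."""
    def wrap(routine):
        def checked(*args):
            assert pre(*args), "precondition violated: invalid test input"
            result = routine(*args)
            assert post(result, *args), "postcondition violated: routine is faulty"
            return result
        return checked
    return wrap

@contract(pre=lambda xs: len(xs) > 0,
          post=lambda r, xs: r in xs and all(r <= x for x in xs))
def minimum(xs):
    return min(xs)

# AutoTest-style random testing: generate inputs and let the contract decide
# whether a run counts as a pass or a fail.
for _ in range(100):
    minimum([random.randint(-10, 10) for _ in range(random.randint(1, 5))])
print("all randomly generated test cases passed")
```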

For the rest of the second day, contributed (refereed) paper presentations are scheduled. I will have to chair the first session of the 'emerging web technologies' track, which will be on XML technology. If anything of interest comes up there (or in any other session I attend), you will read it right here ... :)
So... Joe Tekli from the Université de Bourgogne presented a 'Hybrid Approach on XML-Similarity', which combines the structural similarity of XML documents with 'semantic' arguments, i.e. tag names of different XML documents are compared with the help of WordNet to compute a similarity measure. Quite an interesting approach that can be built on, especially regarding the semantic similarity aspect. Maybe we can use it for our MPEG-7 based video search system (OSOTIS).

Sunday, January 21, 2007

SOFSEM 2007 - Day 1


This year, after about 7 or 8 years, I am attending the SOFSEM conference on 'Current Trends in Theory and Practice of Computer Science' again (for the 2nd time). Maybe SOFSEM is not the most important of all the computer science conferences around, but it is rather original and has quite some history (its tradition dates back more than 30 years...). SOFSEM stands for SOFtware SEMinar, and this already gives some hint about its originality. Starting as a winter lecture series with only limited international attendance, it has developed into an interesting mixture of lectures (given by invited speakers of significant reputation), presentations of reviewed research papers, and student paper presentations. By tradition, its location alternates between Slovakia and the Czech Republic, and it always takes place in winter. Unfortunately, this year winter did not really show up, and thus we are sitting here in Harrachov (a well-known winter resort) without any snow. On the other hand, the nice thing about this situation is that travelling has become much easier (because there is no snow, even in the mountain areas).
This year, I am co-chairing the track 'emerging web technologies', one of the four SOFSEM tracks. By tradition, there is always a track on 'foundations of computer science' besides three changing tracks covering hot topics of current interest, i.e. (this year) 'multi-agent systems', 'emerging web technologies', and 'dependable software and systems'.
The first day of SOFSEM, after the opening note given by Jan van Leeuwen, in which he referred to the long tradition of SOFSEM and to Czech computer science history, started with a full day of invited lectures covering all four tracks.

  • Manfred Broy from TU Munich started with a presentation on 'Interaction and Realizability'. In interactive computation - in contrast to sequential, atomic computation - input as well as output is not provided as a whole, but step by step while the computation continues. He pointed out that interactive behaviour can be modeled with Moore machines and introduced the term 'realizability', which is a fundamental issue when asking whether a behaviour corresponds to a computation. 'Realizable functions' are defined as abstractions of state machines (in a similar way as partial functions are abstractions of Turing machines) and can be used to extend the idea of computability to interactive computations.

  • Andrew Goldberg followed with a talk on 'Point-to-Point Shortest Path Algorithms with Preprocessing'. To run on even small devices while at the same time covering graphs with tens of millions of nodes (as, e.g., in road maps for navigation devices), efficient algorithms are of course required. The traditional way is to search a ball around the starting point (as in Dijkstra's algorithm); this can be sped up by biasing the search towards the intended target point (as in A* search, if additional information is available that provides a lower bound on the distance to the target) or by pruning the search graph (as in ALT algorithms, which precompute distances to preselected landmarks, or by using 'reaches'). A small, simplified sketch of the landmark lower-bound idea follows after this list.

  • Jerome Lang from IRIT (France) continued the afternoon session with a survey on 'Computational Issues in Group Decision Making', which combines 'social choice' (from economics) and AI (applications) into 'computational social choice' theory. In this new and very active discipline, concepts such as voting procedures, coalition formation, and fair division (from social choice), which are also important for multi-agent systems, are examined from the perspective of complexity analysis and algorithm design.

  • I realized that I will be the chairman of today's last session. Thus, the summary of Remco Veltkamp's (University of Utrecht, The Netherlands) talk on 'Multimedia Retrieval Algorithms' will come with a little delay....
    The presentation started by citing Marshall McLuhan's famous quote 'The medium is the message', smartly connected to the basic definitions of multimedia retrieval. The difficult thing in multimedia retrieval is properly understanding the mechanisms of human perception and, connected to that, the question of how to account for its peculiarities in information retrieval. E.g., the human visual system is famous for 'generic interpretations', i.e. sometimes we see things that are not really there, as was already described in Wertheimer's Gestalt theory back in 1923. Interestingly, some of these visual illusions also exist for audio perception. For multimedia retrieval, metrics have to be defined for computing similarities (as well as differences) of multimedia objects in an efficient way, while the algorithms dealing with multimedia retrieval have to be carefully designed according to the type of problem that is addressed (e.g., computing problem, optimization problem, decision problem, etc.). The presentation closed with a short demonstration of the music search engine Muugle, which realizes the concept of 'query-by-humming'.
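As announced above in the summary of Andrew Goldberg's talk, here is a small, simplified sketch of goal-directed search with a landmark lower bound (my own illustration on a toy undirected graph, not Goldberg's actual ALT implementation):

```python
import heapq

def dijkstra(graph, source):
    """Plain Dijkstra: distances from `source` to all nodes.
    graph: {node: [(neighbour, edge weight), ...]}, undirected here."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def alt_search(graph, s, t, landmark_dists):
    """Goal-directed A*: precomputed landmark distances give an admissible
    lower bound on the remaining distance via the triangle inequality, so
    far fewer nodes are settled than with a plain 'ball around s' search."""
    def h(v):   # lower bound on dist(v, t), valid for undirected graphs
        return max((abs(ld[t] - ld[v]) for ld in landmark_dists), default=0.0)

    dist = {s: 0.0}
    heap = [(h(s), s)]
    while heap:
        f, u = heapq.heappop(heap)
        if u == t:
            return dist[t]
        if f > dist.get(u, float("inf")) + h(u):
            continue                      # stale queue entry
        for v, w in graph.get(u, []):
            if dist[u] + w < dist.get(v, float("inf")):
                dist[v] = dist[u] + w
                heapq.heappush(heap, (dist[v] + h(v), v))
    return None

# Tiny toy 'road network'; a navigation device would face tens of millions of
# nodes, with the landmark distances precomputed offline.
roads = {
    "a": [("b", 2.0), ("c", 5.0)],
    "b": [("a", 2.0), ("c", 2.0), ("d", 4.0)],
    "c": [("a", 5.0), ("b", 2.0), ("d", 1.0)],
    "d": [("b", 4.0), ("c", 1.0)],
}
landmarks = [dijkstra(roads, "d")]               # one landmark, rooted at 'd'
print(alt_search(roads, "a", "d", landmarks))    # -> 5.0 (a - b - c - d)
```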

Saturday, January 13, 2007

...against all odds


On Wednesday I attended a talk given by Michael Strube from EML Research on "World Knowledge induced from Wikipedia - A New Prospect of Knowledge-Based NLP". He showed how the (meanwhile famous) collaborative encyclopedia can be used for information retrieval purposes in a way similar to (more traditional) online dictionaries such as WordNet and - though not as well structured - provides results of almost equal quality.
The first thing to note is that for their work, Strube and his colleague regarded each Wikipedia page as the representation of a concept (we already had some arguments about that, as you might remember...). Next, they developed a metric for the similarity of concepts w.r.t. the concept hierarchy (which is where the Wikipedia-defined 'concepts' come into play). Since 2004, Wikipedia features a user-defined concept hierarchy. This hierarchy of concepts can also be regarded as a folksonomy, simply because it is not a knowledge representation carefully designed by some designated domain expert, but one created by the Wikipedia community in a collaborative way. Unfortunately, the Wikipedia concept hierarchy suffers from exactly that fact. From my point of view it seems problematic to compare the proposed similarity measure (based on the Wikipedia concept hierarchy) with other similarity measures (based on commonly shared expert ontologies). O.k., you might argue that the Wikipedia concept hierarchy indeed IS commonly shared, because it has been developed by the Wikipedia community... but is the knowledge represented in Wikipedia really 'common'? Just remember the diversity and sheer number of Star Wars characters or Star Trek episodes in Wikipedia compared with, e.g., the history of smaller European countries. As with all ontologies, the view and the knowledge of the ontology designer always have to be considered. The Wikipedia concept hierarchy - although partly quite appropriate - reminds me somehow of the famous Chinese encyclopedia entry defining the term 'animal' that is quoted by Jorge Luis Borges. Another problem lies in the fact that the different language versions of Wikipedia have developed different concept hierarchies (sic!).
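To make the discussion a bit more tangible, here is a toy illustration (my own simplification with invented category names, not Strube's actual measure) of a path-based similarity over such a category hierarchy: the shorter the path between two concepts in the category graph, the more similar they are taken to be.

```python
from collections import deque

# Tiny invented fragment of a category hierarchy (article/category -> parent
# categories), standing in for the user-generated Wikipedia category graph.
parents = {
    "Rome": ["Capitals in Europe", "Cities in Italy"],
    "Paris": ["Capitals in Europe", "Cities in France"],
    "Cities in Italy": ["Cities in Europe"],
    "Cities in France": ["Cities in Europe"],
    "Capitals in Europe": ["Cities in Europe"],
    "Cities in Europe": ["Cities"],
}

def path_length(a, b):
    """Shortest path between two concepts, treating category links as undirected."""
    edges = {}
    for child, ps in parents.items():
        for p in ps:
            edges.setdefault(child, set()).add(p)
            edges.setdefault(p, set()).add(child)
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, d = queue.popleft()
        if node == b:
            return d
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

def similarity(a, b):
    d = path_length(a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)

print(similarity("Rome", "Paris"))   # 1/(1+2): both hang below 'Capitals in Europe'
```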

In the end, I asked how this proposed information retrieval based on Wikipedia could be improved by considering a 'Semantic Wikipedia', e.g., the Semantic MediaWiki (given that those semantic wikis would contain sufficient data). Instead of answering my question, Michael Strube cited Peter Norvig's argument against the Semantic Web from last year's AAAI 2006. To sum it up: the Semantic Web will not become reality because of the inability of its users to provide correct semantic annotations. But hey... this guy (Strube) was talking about Wikipedia. Doesn't this argument ring a bell? Just remember the time 5 or 10 years ago. Nobody (well, almost nobody) would have believed that it would be possible to write an entire encyclopedia collaboratively on an open basis - just because the web's users did not seem to be able to write 'correct' articles....

Sunday, January 07, 2007

Sam Bourne - Die Gerechten....


A bit late, but before I finish yet another book, I have to say a few words about my 'holiday reading', 'Die Gerechten' ('The Righteous Men') by Sam Bourne:
My expectations were set quite high. Made curious by a Kulturzeit feature, I absolutely wanted to read this novel, which had been announced as an unusual thriller. Will Monroe, a young journalist at the New York Times, uncovers a trail of unusual murders that only at second glance turn out to be connected. He is drawn - at least he (and the reader) is left in this belief at first - into a Jewish-conservative (sic!) world conspiracy whose goal seems to be to bring about the end of the world and the (thereby forced) appearance of the long-awaited Messiah. [ATTENTION: SPOILER WARNING] You are left in this belief almost until the very end, but 'of course' it then turns out to have been some sectarian, 'bible-believing', arch-conservative Christians, who, by the way, pursue exactly the same goal. [END OF SPOILER WARNING]
Honestly, I was disappointed. It is not only that the whole plot seems more than far-fetched. No, it was rather the author's lacking narrative skills that annoyed me. The characters are drawn stereotypically (boringly) and absolutely predictably. Any depth - be it the protagonist's father-son conflict, his relationship problems, or his dealings with his ex-girlfriend - comes across as completely flat. The characters' own trains of thought, which would bring their motives closer to the reader, are almost entirely missing. Sure, the story contains numerous cliffhangers and will therefore offer exciting reading to many readers, but the flow of the plot is almost completely linear, some questions remain unanswered, and the pseudo-scientific derivations [I'll just say: Kabbalah and GPS...] annoy the reader rather than offering an 'aha' moment.

Verdict: a new (old) conspiracy story that tries to score with end-time fantasies, superficial religious prejudices, and Kabbalah techno-babble, but whose narrative quality, in my opinion, leaves a lot to be desired...

P.S. Every new book on conspiracy theories has to measure up against the two milestones of the genre: Shea and Wilson's Illuminatus trilogy (if you do it at all, then at least do it completely over the top....), or you go straight to the grand master, Umberto Eco, and his Foucault's Pendulum, in which conspiracy theories of every kind are worked through and reduced to absurdity in a highly intelligent and readable way.

Saturday, December 30, 2006

Neal Stephenson - The Confusion (Vol. 2 of the Baroque Cycle)


I have finished 'The Confusion' ... more than a thousand pages, and what a story :)
I read the book in the German translation, so here is a short review....

Ok, so I'm through. First of all, big praise for the two translators, Nikolaus Stingl and Juliane Gräbener-Müller. The weighty 1000 pages come along with a lightness that turns this book into a true reading pleasure. The book - strictly speaking, according to the author's preface, it is actually two books - offers so many subplots, narrative descriptions and flourishes, and abstruse plot twists that you may truly call it 'baroque'. Two main plot lines are intertwined (confused): the conspirators (actually galley slaves) around Jack Shaftoe, with their hunt for Solomon's gold (which is quickly snatched away from them again) and their mercantile expedition (-> quicksilver) around the globe... and the story of Eliza (by now Duchess of Arcachon (sic!) and Qwghlm), the (historical) emergence of cashless payment and its effects on the economy and politics (confusion!) of the Europe of that time, as well as Eliza's revenge on her former (Duc d'Arcachon sr.) and current (von Hacklheber) tormentors.
The story around Daniel Waterhouse (including Newton, Leibniz, etc...), the Royal Society, and the scientific findings of that era has turned out somewhat short (in contrast to the first volume, 'Quicksilver'). Instead, this volume focuses on the economic achievements of the late 17th and early 18th centuries, such as the global operations of the Dutch East India Company, Spain's plundering of the New World, or the emergence of cashless trade.

What I like again and again about Stephenson's novels are his descriptions of completely absurd situations in which important dialogues and plot points are sometimes embedded - be it the large-scale (makeshift) production of phosphorus from urine in the Indian hinterland, or the hunt for a bat in the dining room with the help of Leibniz's rusty rapier.
In any case, I'm already looking forward to the last part of the trilogy, for whose translation we will probably have to wait another good year...

Links: -> Clearing up the Confusion in wired news

Thursday, December 21, 2006

Semantic Search ... confusion ahead


When I was attending a talk by Thilo Götz on UIMA, the conversation turned to 'Semantic Search'. Up to that point in time, I had been quite sure about the meaning of this term. But I had to realize that different people think about it in quite different ways.
As far as I have understood the term, 'Semantic Search' refers to all techniques and activities that deploy semantic web technology at any stage of the search process. Thilo Götz (and he's not alone with that) referred to 'Semantic Search' as a 'traditional' search engine that is using a semantically enriched search index (i.e. a search index that incorporates ontologies or information/relationships inferred from ontologies).

From my point of view the latter definition covers only a part of the whole process. Let's take a brief look at search engine technology: you have to consider the index generation (including the crawling processes, information retrieval for creating descriptors, inverse indexing, overall ranking, clustering, ...) as well as the user interaction (including query string evaluation, query string refinement, visualization of search results, and navigation inside the search domain), not to forget personalization (concerning a personalized ranking of the search results, including some kind of 'pre-selection' according to the personal information needs of the user, a personalized visualization, etc.) -- which will become much more important in the near future.

But, to generate a semantic search index there are several possibilities to consider:

  • Using unstructured web-data (html, etc. ...) in combination with information retrieval techniques to map the information represented in the web-data to (commonly agreed) ontologies of well-defined semantics.

  • Using semi-structured web-data that already includes links to well-defined ontologies (being commonly agreed upon or at least being mapped to some standard ontologies).


In both cases, the generation of a semantic index requires more than just a compilation of the retrieved data. Although the index might contain unstructured web-data together with ontologies of well-defined semantics, the main purpose of the index is to provide fast access to the information represented in it. To generate the answer to a query, the search engine simply does not have enough time to perform logical inferences to deduce knowledge (a.k.a. answers) online. Of course, this (logical inference) has to be deployed beforehand (i.e. offline), in a similar way as the computation of today's PageRank.
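A minimal sketch of what such offline inference might look like (a deliberately simplified example with an invented class hierarchy, not a full reasoner): the transitive closure of the subclass relation is materialized once, ahead of query time, so the index can answer 'is-a' questions by simple lookup.

```python
# Hypothetical class hierarchy extracted from some ontology.
subclass_of = {
    "Capital": {"City"},
    "City": {"PopulatedPlace"},
    "PopulatedPlace": {"Place"},
}

def materialize(hierarchy):
    """Compute all implied superclass facts offline (transitive closure)."""
    closure = {c: set(parents) for c, parents in hierarchy.items()}
    changed = True
    while changed:
        changed = False
        for c, parents in closure.items():
            for p in list(parents):
                for grandparent in closure.get(p, ()):
                    if grandparent not in parents:
                        parents.add(grandparent)
                        changed = True
    return closure

# The result is stored alongside the index; at query time 'Capital' already
# 'knows' that it is also a City, a PopulatedPlace, and a Place.
print(materialize(subclass_of)["Capital"])
```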

So, what is the use of machine-processable semantics in a search engine's index data structure? The following possibilities can be considered (the list is open for additional suggestions...):

  • to add new cross-references between single index entries (associations),

  • to find similarities between index entries ( = web data) w.r.t. their content, and

  • to discover dependencies between single index entries to enable
    • better visualization of the retrieved domain of information, and also

    • efficient navigation to better fulfill the user's information needs.


  • of course also to disambiguate and to cluster web-data for better precision and recall (this is already done with current IR techniques).


To compile a semantic index, the crawling process also has to be considered. While the primary goal of a web crawler is to gather as much web-data as possible as fast as possible (and of course to maintain its level of consistency), a 'semantic' web crawler also has to look for semantic data besides unstructured web-data, e.g., RDF and OWL files, and also for possible connections between unstructured web-data and semantic data. For crawling RDF or OWL, a traditional crawler has to be modified. While the traditional crawler just analyzes the HTML data for link tags, RDF and OWL don't contain link tags; but they often include several namespaces that determine new input for the crawler. Besides mere data gathering, the crawler should also preprocess data for inclusion in the index. This task is often implemented as a separate step (and denoted as 'information retrieval'). But it influences the crawler's direction and crawling strategy and thus also has to be considered here.
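As a small sketch of how a semantic crawler could derive new crawl targets from an RDF/OWL document (using the rdflib library; the heuristic of treating namespaces and mentioned URIs as frontier candidates is my own simplification):

```python
import rdflib

def semantic_frontier(url):
    """Parse an RDF document and propose further URLs to crawl: the namespaces
    it uses and the (document parts of the) URIs it mentions."""
    g = rdflib.Graph()
    g.parse(url, format="xml")            # assuming an RDF/XML document
    frontier = set()
    for _prefix, namespace in g.namespaces():
        frontier.add(str(namespace))
    for s, p, o in g:
        for term in (s, p, o):
            if isinstance(term, rdflib.URIRef):
                frontier.add(str(term).split("#")[0])
    return frontier

# Example (hypothetical URL): every returned URL would be handed back to the
# crawler's frontier, possibly filtered by the (semantic) domain of interest.
# print(semantic_frontier("http://example.org/ontology.rdf"))
```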
Web crawlers often are implemented in a distributed way to increase their efficiency while working in parallel. New URLs found in the web pages being crawled can be arranged according to the location of their domain (geographically). In this way, an instance of the distributed crawler receives only new URLs to be crawled that are located in the same (or a nearby) domain. The same argument holds for the resources that are to be crawled by semantic web crawlers. But, for semantic crawlers, also the (semantic) domain of the crawled data might be of importance, e.g., an instance of the distributed crawler might be responsible for crawling a specific domain (=topic) or only domains that are (semantically) closely related to the domain of interest.


For a semantic search engine, the compilation of an index from the web pages delivered by the crawler differs from the compilation process of a traditional search engine. Let us first recall the traditional index compilation process (for text-related data, i.e. this does not hold for multimedia data such as images or video clips):

  1. resource normalization, i.e. all resources that contain explicit textual content have to be transformed into text files

  2. word stemming, i.e. transform all terms of a retrieved and normalized web-document to their word stems

  3. stop word removal, i.e. cut out all terms that are not well suited for identifying the processed text file (i.e. that are not suitable as descriptors). Often only nouns are taken as descriptors (this can partly be achieved by applying POS taggers, i.e. part-of-speech taggers).

  4. black list processing, i.e. terms that for some reason do not qualify as descriptors are cut out.


This process results in a list of descriptors that describe the web-document being processed. For each descriptor, a weight according to its descriptive value for the text file has to be calculated (e.g., by term frequency - inverse document frequency (tf-idf) or some other weight function). The table resulting from combining the weighted descriptors with their underlying web-documents constitutes the index. By inverting this index, a descriptor delivers all related web-documents in combination with their pre-calculated weights (which determine how well a given descriptor is suited to describe the content of the corresponding web-document). To improve precision and recall, the general relevance of the web-documents can be computed beforehand (i.e. nothing else but the Google PageRank).
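The whole pipeline fits into a few lines; here is a minimal sketch (tiny stop list, 'stemming' reduced to mere lowercasing) of steps 1-4 plus tf-idf weighting and index inversion:

```python
import math
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "on"}   # tiny stop list

def descriptors(text):
    """Steps 1-4 in miniature: normalize to plain terms, 'stem' by lowercasing,
    and drop stop words; a real system would do much more."""
    terms = re.findall(r"[a-z]+", text.lower())
    return [t for t in terms if t not in STOP_WORDS]

def build_inverted_index(docs):
    """docs: {doc_id: text}. Returns {descriptor: [(doc_id, tf-idf weight), ...]}."""
    tf = {d: Counter(descriptors(text)) for d, text in docs.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    n = len(docs)
    index = defaultdict(list)
    for d, counts in tf.items():
        for term, freq in counts.items():
            index[term].append((d, freq * math.log(n / df[term])))   # tf * idf
    return index

index = build_inverted_index({
    "d1": "The ARPANET connected research computers.",
    "d2": "Computers on the ARPANET exchanged packets.",
})
print(index["research"])    # only d1, with a positive weight
```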

For a 'semantic index', metadata (such as, e.g., ontologies) have to be taken into account and be combined with the traditional index...

...to be continued

Thursday, November 30, 2006

Literary Confusion according to Babel


As mentioned in a previous article, I decided to also write about (i.e. to review) the books I'm currently reading. I realized that my ability to express myself in English when it comes to authoring non-scientific content is rather limited. I was also thinking about whether it makes sense to write (in English) about books (written in German)... especially if I want to address a German-speaking audience (at least concerning the non-scientific content of this blog). Alas, the new beta blogger allows the use of tags, and thus it will be possible to categorize articles (and to distinguish between content written in English or German). To make it short: from now on, book reviews of non-scientific books that are written in German will also be written in German. Everything else (including reviews of books that I have read in English) will be written in English.
To all those who are also interested in these book reviews: you might consider using Babel Fish for translation (although I have no idea how good the result would be :)...

When I attended ISWC a few weeks ago, I had already complained that my current reading was simply too 'heavy' for my carry-on luggage. Consequently, I faced the problem of what to read during the flights of more than 9 hours each (of course I also slept during them... as well as I could...). Normally I don't like reading several books in parallel, and since I wanted to avoid ending up with two half-finished books after my trip, I decided on short stories and novellas. A small volume of stories by Thomas Mann had been sitting on my bookshelf for quite a while, and until then I had never found the muse to read it. Since it met the requirements for my travel reading (at least in terms of weight), during the flight I lived through the world of 'Tonio Kröger', tasted the curse of 'Wälsungenblut' ('The Blood of the Walsungs'), and - as someone completely indifferent towards dogs - familiarized myself with the depths of the relationship between 'Herr und Hund' ('Man and Dog')....

All right, maybe I should begin by mentioning that in my German classes at school I never had to make the acquaintance of Thomas Mann's works as required reading. Many of my acquaintances groan out loud as soon as the name 'Thomas Mann' is merely mentioned... certainly because unpleasant, long-suppressed memories of the depths of the soap opera 'Buddenbrooks', interpreted and analyzed ad nauseam in school, work their way back to the surface. But not for me. I first had 'Buddenbrooks' in front of me at the 'tender age' of almost 30. I thought, "that should last me through Christmas week", but this monster of a family saga drew me in so much that after two and a half days I put it back on the shelf (sighing quietly, because it was 'already' over). However, at a recent lunch at the Frankfurt Book Fair, the conversation turned to the probably "most overrated" German authors. After I expressed to my neighbour my incomprehension of her assessment that Fontane was at the very top of her list, 'Der Zauberberg' ('The Magic Mountain') and in particular 'Doktor Faustus' came to my mind. Without going into depth here, allow me the brief remark that "Thomas Mann would certainly have kept a better reputation in my memory had he at least refrained from writing the latter of these two novels...".

Tonio Kröger offers all the highs and lows of Mann's narrative tradition in condensed form. The relatively short novella begins with the story of Tonio Kröger's childhood and adolescence in the best, most entertaining 'Buddenbrooks manner', and in its second half pours itself out into an introspection (-> see 'The Magic Mountain') of Tonio in his role as an artist (in dialogue with, and in letters to, his artist friend Lisaweta). In the final third, our hero takes a trip back to the scenes of his childhood and begins to understand that, as an artist, he is very much part of the 'society' he contemptuously rejects, and that feelings do not inhibit an artist in his work but actually drive it.... (-> for the transformation, see 'Doktor Faustus').
Verdict: a short trip through Thomas Mann's microcosm, highly recommended for everyone who wants to get a 'quick taste' of his world without having to take the risk of facing one of the aforementioned 'monsters' :)

Wälsungenblut is knit somewhat differently - even if the impressively vivid depiction of the somewhat curiously decadent banker family Aarenhold at times reminds one of the Addams Family.... Thomas Mann caricatures the bombastic pathos of Richard Wagner's operatic atmosphere - here quite specifically that of the 'Valkyrie'. The novella depicts the incestuous love of the Walsung siblings Siegmund and Sieglinde, and Mann succeeds in delivering a staging perfect in its love of detail, ranging from savouring the opera quotation with a cognac cherry to the lovers' dance on the bearskin (-> see Wagner's 'Valkyrie').
Verdict: an entertaining piece of whimsical prose in which Thomas Mann proves himself a master of pointed description and an 'anti-Wagnerian'...

Tuesday, November 28, 2006

UIMA - Unstructured Information Management Architecture

This morning, we were invited to a talk given by Thilo Götz from IBM about UIMA (Unstructured Information Management Architecture), IBM's framework for the management of unstructured information, which happened to take place at the department of computational linguistics.
UIMA represents (1) an architecture and (2) a software framework for the analysis of unstructured data (just for the record: structured data refers to data that has been formally structured, e.g. data within a relational database, while unstructured data refers, e.g., to text in natural language, speech, images, or video data). The purpose of UIMA is to provide a modular framework that enables easy integration and reuse of data analysis modules. In general, the UIMA framework distinguishes three steps in data analysis:

(1) reading data from distinguished sources
(2) (multiple) data analysis
(3) presentation of data/results to the 'consumer'

It also enables remote processing (and thus simple parallelization of analysis tasks). Unfortunately, at least up to now, there is no grid support for large-scale parallel execution.
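Just to illustrate the three-step idea - this is a schematic sketch in Python, not the actual UIMA Java API, and all names are invented - a collection reader produces documents, analysis engines add stand-off annotations, and a consumer finally uses the results, e.g. to build a small index:

```python
class Annotation:
    """A stand-off annotation, roughly what UIMA keeps in its CAS."""
    def __init__(self, begin, end, label):
        self.begin, self.end, self.label = begin, end, label

def collection_reader(texts):               # step (1): read data from a source
    for text in texts:
        yield {"text": text, "annotations": []}

def keyword_annotator(doc, keywords=("ARPANET", "UIMA")):   # step (2): analysis engine
    for kw in keywords:
        start = doc["text"].find(kw)
        if start >= 0:
            doc["annotations"].append(Annotation(start, start + len(kw), kw))
    return doc

def index_consumer(docs):                    # step (3): hand results to the consumer
    index = {}
    for i, doc in enumerate(docs):
        for ann in doc["annotations"]:
            index.setdefault(ann.label, []).append(i)
    return index

corpus = ["The ARPANET connected the first hosts.",
          "UIMA analysis engines can run remotely."]
docs = [keyword_annotator(d) for d in collection_reader(corpus)]
print(index_consumer(docs))                  # {'ARPANET': [0], 'UIMA': [1]}
```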
Simple applications of UIMA, e.g. in semantic search, were also presented (although their approach to semantic search means: do information retrieval on unstructured data and fit the resulting data into the index of the 'semantic search engine'...).
Nevertheless, we will take a closer look at UIMA. We are planning to map the workflow of our automated semantic annotation process (see [1]) onto the UIMA architecture, and I will report on our experiences here....
UIMA is available as a free SDK, and the core Java framework is also available as open source.

References:
[1] H. Sack, J. Waitelonis: Automated Annotations of Synchronized Multimedia Presentations, in Proceedings of Mastering the Gap : From Information Extraction to Semantic Representation (MTG06 / ESWC2006), Budva, Montenegro, June 12, 2006.

Tuesday, November 21, 2006

Document Retrieval vs. Fact Retrieval - In Search for a Qualified User Interface


Today, if you are looking for information on the Web, you enter a set of keywords (a query string) into a search engine, and in return you receive a list (= ordered set) of documents that are supposed to contain those keywords (or their word stems). This list of documents (therefore 'document retrieval') is ordered according to the documents' relevance with respect to the user's query string. 'Relevance' - at least for Google - refers to PageRank. To make it short, PageRank reflects the number of links referring to the document under consideration, each link weighted with the relevance of the document it comes from, divided by the total number of links starting at that document (in addition to some black magic, see U.S. Patent 6,285,999).
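For illustration, a much simplified power-iteration sketch of that idea (no dangling-page handling and certainly none of the 'black magic'):

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}. Each outgoing link passes on an equal
    share of its page's rank; the damping factor models the 'random surfer'."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```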
But is this list really what the user expects as an answer? O.k., meanwhile we - the users - have become used to this kind of search engine interface. In fact, there are books and courses about how to use search engines in order to get the information you want. The interesting fact is that it is the user who has to adapt to the search engine interface... and not vice versa.
Instead, it should be the other way around. The search engine interface should adapt to the user - and even better, to each individual user! But how, then, should a search engine interface look? In fact, there are already search engines that are able to answer simple questions ('What is the capital of Italy?'). But they still fail at more complex questions ('What was the reason for Galileo's house arrest?').

In real life - at least if you happen to have one - if you are in need of information, you have different possibilities to get it:

  1. If there is somebody you can ask, then ask.
  2. If there is nobody to ask, then look it up (e.g. in a book).
  3. If there is nobody to ask, and if there is no way to look it up, then think!

Let's consider the first two possibilities. Both also have their drawbacks. Asking somebody is only helpful if the person being asked knows the answer. (O.k., there is also the social aspect that you might get another reward just by making social contact... instead of getting the answer.) If the person does not know the answer, maybe she/he knows whom to ask or where to look it up. But we might consider this a kind of referential answer. On the other hand, even if the person does know the answer, she/he might not be able to communicate it. Maybe you speak different languages (not necessarily different languages in the sense of 'English' and 'Swahili'; consider also a philosopher answering the question of an engineer...). Sometimes you have to read between the lines to understand somebody's answer. At least, in some sense, we have to 'adapt' to the way the other person gives the answer in order to understand it.
Considering the other possibility of looking up the information, we have the same situation as when asking a web search engine. E.g., if we look up an article in an encyclopedia, we use our knowledge of how to access the encyclopedia (alphabetical order of entries, reading the article, considering links to other articles... being able to read...).
Have you noticed that in both cases we have to adapt ourselves to an interface? Even when asking somebody, we have to adapt to the way this person talks to us (her/his level of expertise, background, context, language, etc.). From this point of view, adapting to the search engine interface of Google does not seem to be such a bad thing after all....

When it comes to fact retrieval, the first thing to do is to understand the user's query. To understand an ordinary query (and not only a list of interconnected query keywords), natural language processing is the key (or even, as they say, the 'holy grail'). But even if the query phrase can be parsed correctly, we have to consider (a) the context and (b) the user's background knowledge. While the context helps to disambiguate and to find the correct meaning of the user's query, the user's background determines her or his level of expertise and the level of detail at which the answer is best suited for the user.

Thus, I propose that there is no such thing as 'the perfect user interface'. Rather, different kinds of interfaces might serve different users in different situations. No matter what the interface looks like, we - the users - will adapt (because we are used to doing that, and we learn very quickly). Of course, if the search engine is able to identify the circumstances of the user (maybe she/he is retrieving information orally via a cell phone, or the user is sitting in front of a keyboard with a huge display), the search engine may choose (according to the user's infrastructure) the suitable interface for entering the query as well as for presenting the answer...

WebMonday 2 in Jena - Aftermath


Yesterday evening the 2nd WebMonday took place in the Jena Intershop Tower. I thought that the number of participants who happened to come by last time could not be surpassed (we had almost 50 people up there), but believe it or not, I counted more than 70 people this time! Lars Zapf moderated the event, and we had 4 interesting speakers this evening.
For me, the most interesting talk was the presentation by Prof. Benno Stein from the Bauhaus University Weimar about information retrieval and current projects. He addressed the way we are using the web today for retrieving information. Most current search engines only offer 'document retrieval', i.e. after evaluating the keywords given in the user's query string, the search engine presents an ordered list of documents that the user has to read in order to get the information. Instead, the more 'common' way to get information would be to ask a question and to receive a 'real' answer (= fact retrieval). I will discuss these different types of 'user interfaces' in an upcoming post. It is worth mentioning that Weimar is quite close to Jena and that our research really seems to have some interconnections (thus, this new contact might be considered another WebMonday networking success).
After that, Matthias Leonhard gave the first part of a series of talks related to Microsoft's .NET 3.0.
Then, Ryan Orrock addressed the problem of 'localisation' and translation of applications. When translating an application into another language, simply translating all text parts is not sufficient. There are also different units of measure to consider, as well as the adaptation of the screen design if texts in different languages have different sizes.
In the last presentation, Karsten Schmidt addressed networking with openBC/Xing, an interesting social networking tool that is supposed to help make business contacts. (At least now I know that I need some other tool to store (physically) my (and other people's) business cards :) )
Even more interesting was - as always - the socializing part after the presentations. Markus Kämmerer took some photos.

Here you can find other blog articles on the 2nd WebMonday:

Wednesday, November 15, 2006

wikipedia to serve as a global ontology....



Today, I met Lars Zapf for a quick coffee, enjoying the rare late-afternoon November sun. We were exchanging news about ISWC, WebMonday, recent projects, and stuff like that. While talking about semantic annotation, Lars pointed out that instead of using (or developing) your own ontologies for annotating (and authoring) documents, you could also use a Wikipedia reference to indicate the semantic concept that you are writing about. Thus, as he already wrote in a comment, you could, e.g., use the link http://en.wikipedia.org/wiki/Rome to indicate that you are referring to the city of Rome, the capital of Italy.
Of course you might object that there are several language versions of Wikipedia and thus several (different) articles that refer to the city of Rome. To use Wikipedia as a 'commonly agreed and shared conceptualization' - fulfilling at least some points of Tom Gruber's ontology definition, as long as Wikipedia lacks the 'formal' aspect of machine understandability - we can make use of the fact that articles in Wikipedia can be identified with articles in other language versions with the help of the language indicators at the lower left side of Wikipedia's user interface. To serve as a real ontology, each Wikipedia article should (at least) be connected to a formalized concept (maybe encoded in RDF or OWL). This concept does not necessarily have to reflect all the aspects that are reported in the natural-language Wikipedia article. E.g., Semantic MediaWiki is working on a wiki extension to capture simple conceptualizations (such as classes or relationships).
An application for authoring documents could easily be upgraded by offering links to related Wikipedia articles. If the author enters the string 'Rome', the application could offer the related Wikipedia link to Rome [or a selection of related suggestions], and once the author confirms the choice, this link can be automatically encoded as a semantic annotation (link).
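A toy version of this last step (the property name dc:subject and the document URI below are just example choices, not part of Lars' proposal):

```python
def annotate(term: str, wikipedia_url: str) -> tuple:
    """Return the visible hyperlink and a Turtle-style semantic annotation."""
    html = f'<a href="{wikipedia_url}">{term}</a>'                # what the reader sees
    triple = f'<#this-document> dc:subject <{wikipedia_url}> .'   # what a machine sees
    return html, triple

html, triple = annotate("Rome", "http://en.wikipedia.org/wiki/Rome")
print(html)
print(triple)
```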
O.k., that sounds pretty simple. Are there any students out there who would implement it (anybody in need of credit points??)? I would highly appreciate that...

International Semantic Web Conference 2006 (ISWC 2006) - Aftermath


Back home again, the jetlag is almost gone while I am already travelling to Potsdam again for a talk on 'Semantic Annotation in Use'.... After all, ISWC 2006 was a very nice and interesting conference...although set up at a rather remote location (at least from my point of view). One of its highlights (as already pointed out) was the panel discussion about web 2.0 and the semantic web. Leo Sauermann raised the question why the marketing of semantic web technology is so bad. Obviously - as TBL replied - because the W3C invests all its funding into hiring scientists and not marketing people. One of the major problems is that semantic web applications don't look 'cool' and 'sexy' ... and therefore, they don't get public attention. BTW, right at the same time, the 3rd Web 2.0 conference took place in San Francisco. Why did the organizers of ISWC not try to set up a panel discussion with a live connection between the two conferences? At ISWC, of course, there were only 'Semantic Web' people and (at least as far as I could tell) nobody from the 'Web 2.0' community. Alright, you can be both SemWeb and Web 2.0. But, as long as the audience is focused on the Semantic Web, most people will share a common view of (and arguments about) Web 2.0. Another point of view would certainly have been interesting to listen to.

Closely related to the marketing question is the question of the Semantic Web killer application. Nobody knows what type of application it will be - of course ... otherwise the application would already be there. But, as with all killer applications, it will not necessarily be something 'really' useful :) Just consider that the killer application for the WWW (at least in its early beginnings back at CERN) was the 'telephone book'. Not to mention SMS and the mobile phone. Maybe the semantic web killer application will be related to rather ordinary applications, such as a dating service that is really able to find a match....

BTW, I have switched to the new beta release of blogger.com... (comments should be working now - at last! - and also keywords)...and the other guy in the picture is Ulrich Küster (also from FSU Jena) at the ISWC dinner reception...

Thursday, November 09, 2006

International Semantic Web Conference 2006 (ISWC 2006), Athens (GA), USA - Day 3


Thursday, the last day of ISWC, started with a keynote by Rudi Studer from the University of Karlsruhe on 'The Semantic Web: Suppliers and Customers'. He drew the historic connection from databases to the Semantic Web as a web of human-readable content connected to machine-interpretable semantic resources. He also pointed out the importance of interdisciplinary research for realizing the Semantic Web, while on the other hand the Semantic Web also contributes to other disciplines and communities. After that I listened to an interesting talk by Andreas Harth from DERI on 'Crawling and Indexing the Semantic Web', where he introduced an architecture for a semantic web crawler and presented some first results.
The most interesting talk of the day was the one by Ivan Herman from W3C (here you can find his foaf data) on 'Semantic Web @ W3C: Activities, Recommendations and State of Adoption'. He proposed 2007 to be the 'year of rules', because we might finally arrive at a recommendation concerning rule languages for the Semantic Web. He also mentioned the efforts of integrating RDF data into XHTML via RDFa or - vice versa - extracting RDF data out of XHTML with GRDDL.
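Just to illustrate (not from Ivan's slides) the kind of statements such a page would carry: a tiny sketch using the rdflib library (assuming it is installed), with a made-up person and homepage.

```python
from rdflib import Graph, Literal, Namespace, URIRef

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
me = URIRef("http://example.org/people/jane#me")   # hypothetical person URI
g.add((me, FOAF.name, Literal("Jane Example")))
g.add((me, FOAF.homepage, URIRef("http://example.org/")))

# The same two statements could be embedded in an XHTML page via RDFa and
# handed back to an RDF tool by a GRDDL-style extraction step.
print(g.serialize(format="turtle"))
```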
The ISWC closed with the announcement of the best paper awards and the winners of this year's Semantic Web Challenge.
If you are interested in the conference, you might have a look at the video recordings of the talks.

International Semantic Web Conference 2006 (ISWC 2006), Athens (GA), USA - Day 2


Wednesday...the 2nd day of ISWC started with a keynote by Jane E. Fountain from the University of Massachusetts in Amherst about 'The Semantic Web and Networked Governance'. From her point of view, governments have to be considered major information-processing [and knowledge-creating] entities in the world, and she tried to point out the key challenges faced by governments in a networked world (for me the topic was not that interesting...). Also, today's sessions - at least those that I attended - were not that exciting. I liked one presentation given by Natasha Noy from Stanford on 'A Framework for Ontology Evolution in Collaborative Environments' in the 'Collaboration and Cooperation' session. She presented an extension of the Protégé ontology editor for collaborative ontology development.
The most interesting session for me was the 'Web 2.0' panel in the afternoon. Among the panelists were Prof. Jürgen Angele (Ontoprise), Dave Beckett (Yahoo!), Sir Tim Berners-Lee (W3C), Prof. Benjamin Grosof (MIT Sloan School of Management), and Tom Gruber. The panel discussed the role of semantic web technology for web 2.0 applications.


Jürgen Angele pointed out that the only thing that is really new about web 2.0 is ad-hoc remixability. Everything else is nothing but 'old' technology. But, as he stated, web 2.0 could be a driving force for semantic web technology.

Dave Beckett did some advertising for Yahoo!, pointing out that Yahoo! indeed makes use of semantic web technology (at least in their new system called Yahoo!Food) and that Yahoo! is a great participation platform with more than 500 million visitors per month.

Tim Berners-Lee gave a survey of the flaws and drawbacks of web 2.0 and how semantic web technology could help. While web 2.0 is not able to provide real inter-application integration, the semantic web on the other hand does not provide such cool interfaces to data. In combination, though, they could become interesting.
All the so-called new aspects of web 2.0 have already been goals of the original web (1.0), such as easy creation of content, collaborative spaces, intercreativity, collective intelligence from designing together, creating relationships, reuse of information, and of course user-generated content. The web 2.0 architecture consists of client-side (AJAX) interaction and server-side data processing (aka the good old 'client-server' paradigm) plus mashups (one per application / each needs coding in JavaScript, each needs scraping/converting/...). Essentially, web 2.0 is fully centralized. So, why are Skype, del.icio.us, or flickr websites instead of protocols (as foaf is)? The reuse of web 2.0 data is limited to the host side. Only with the help of feeds are data able to break out of centralized sites. What will happen to all of your tags? Will they end up simply being words or will they become real (and useful) URIs?
With semantic web technology, web 2.0 enables multiple identities for you. You may have many URIs, enabling you to access different sorts of data and to fulfill different expectations concerning trust, accuracy, and persistence. In the end, web 2.0 and the semantic web, while being good separately, could be great together!

Benjamin Grosof asked where semantic web technology could help web 2.0. He focused on backend semantic integration and mediation (augmenting your information via shallow inferences), collaboration, and semantic search. Semantic search will enable a more human-centered search interface, e.g., 'Give me all recipes for cake....but I don't like any fruits' or 'I want a good recommendation from a well-reputed web site'. He sees semantic web technology piggybacking on web 2.0 interactions ('web 2.0 = search for terrestrial intelligence in the crowd' :) ). The semantic web should exploit web 2.0 to obtain knowledge.

Tom Gruber asked 'Where is the mojo in Web 2.0?'. He characterized web 2.0 as a fundamentally democratic architecture, driven by social and entertainment payoffs (universal appeal...), while the web 1.0 business model actually keeps working ('attention economy'). He discussed the way from today's 'collected intelligence' to real 'collective intelligence'. He concluded: 'don't ask what the web knows....ask what the world knows!' and 'don't make the web smart...make the world smart'.

Wednesday, November 08, 2006

International Semantic Web Conference 2006 (ISWC 2006), Athens (GA), USA - Day 1


Tuesday morning, 9 a.m. ... ISWC 2006 starts with the keynote by Tom Gruber (godfather of the computer-science definition of the term 'ontology') on 'Where the Social Web Meets the Semantic Web'. He focused on 'Collective Intelligence' as the reason that companies like Google or Amazon survived the first dot-com bubble: they were making use of their users' collective knowledge. Google uses other people's intelligence by computing a page rank from the users' links to other webpages. Amazon uses people's choices for its recommendation system, and eBay uses people's reputation. The interesting thing about that is that the notion of 'Collective Intelligence' (aka 'Social Web', aka 'Web 2.0') was already addressed by Douglas Engelbart in the late 60's. Engelbart did not only invent the mouse, the window-based user interface, and many other important things that are part of today's computing environment; his driving force - as Gruber said - was 'Collective Intelligence'....to cope with the set of growing problems that humanity is facing today. Thus - as I have also stated in another post - the semantic web, too, depends on the collaboration and participation of its users and therefore on 'Collective Intelligence' to become a success.

BTW, I prefer using the term 'Social Web' instead of 'Web 2.0'. From my point of view 'Social Web' hits exactly the point and does not suggest any new and exciting technology (but only the fact that people are using existing web technology in a collaborative way to interact with each other).

After the keynote I visited the 'Knowledge Representation' session with an interesting talk by Sören Auer on OntoWiki (a semantic wiki system .. interesting, because one of my students is also implementing a semantic wiki). In the afternoon sessions I especially liked the talks about representation and visualization (esp. the talk by Eyal Oren on 'Extending faceted navigation for RDF data', where he presented a nice server application that is able to visualize arbitrary RDF data). In the evening, a dinner buffet (including Cuban music) was combined with the poster sessions and the 'Semantic Web Challenge' exhibition, where I found a possibility for cooperation with Siegfried Handschuh from DERI (on semantic authoring and annotation....).

Oh...and I forgot to mention that there is also a flickr group with ISWC photographs...

Monday, November 06, 2006

International Semantic Web Conference 2006 (ISWC 2006), Athens (GA), USA - Day 0


The very first day here at ISWC...ok, it's the workshop and tutorial day. Officially, the conference will start tomorrow morning with Tom Gruber's keynote. I already arrived here in Athens on Saturday. The 10-hour flight from Frankfurt was really exhausting...at least I had no stop-over. Athens is about 90 minutes away from Atlanta and is famous for its university, which is the oldest publicly funded university in the US. It has a really nice historic campus (I will provide some pictures later on).
Today started with the "1st Semantic Authoring and Annotation Workshop" (SAAW 2006), where I had two papers to present...two days ago I was told (by email) that the short presentations would be 'lightning talks' of 5 minutes each. I had prepared slides for 15-minute talks :) ...and was a little bit 'pissed off' at having to throw away all the 'interesting stuff'. But at least I could raise the interest of a few people. The afternoon's workshop (on Web Content Mining with Human Language Technologies) also had some interesting topics. Especially, I liked the talk by Gijs Geleijnse about 'Instance Classification using Co-Occurrences on the Web'. It was about classifying musicians and artists (as instances) with their genre (as concepts) by finding co-occurrence relationships of terms with the help of Google.
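The idea can be boiled down to a very rough sketch like the following (hit_count() is a mere placeholder for whatever web search API you have access to, not a real library call, and the genre list is made up):

```python
GENRES = ["jazz", "heavy metal", "hip hop", "classical"]

def hit_count(query: str) -> int:
    # placeholder: return the number of search results for the given query
    raise NotImplementedError("plug in a web search API here")

def classify(artist: str) -> str:
    # score each genre by how often it co-occurs with the artist's name on the web
    scores = {genre: hit_count(f'"{artist}" "{genre}"') for genre in GENRES}
    return max(scores, key=scores.get)

# classify("Miles Davis")  # -> presumably 'jazz', once hit_count() is implemented
```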