Tuesday, May 15, 2007

Next Generation Search Engines revisited...

As a reader of my blog, you surely know that one of its main topics is (semantically enhanced) web search. Recently I reread Andrei Broder's short paper 'A taxonomy of web search' [1], in which he also refers to three generations of web search engines. His article dates back to 2002, and both the environment and the technology of the web are changing rapidly. So, what about these three generations? Do we already need a 'next generation'? And what about the discussion about 'Search 2.0'?...
But first, let's recall the three generations according to Broder:

  • First generation: search engines rely almost exclusively on on-page data such as text and formatting information to compute result ranking (1995-1997, cf. Alta Vista, Excite, etc.).

  • Second generation: search engines are using off-page, web-related data such as link analysis, anchor-texts, and click-through data (1998-..., cf. Google).

  • Third generation: search engines try to blend data from multiple, heterogeneous sources in order to answer 'the need behind the query'. The computed results are customized according to the user's information needs, taking into account the user's personal data background, context, and intention (now? - ...).

Clearly, search engines of the third generation include social networking information, tagging, user feedback, semantic analysis, recommendations, and the trustworthiness of information (according to its source).
Read/WriteWeb also addresses the topic, comparing traditional search technologies with what they call 'Search 2.0' [2]. As usual, I don't like the '2.0' label. What is discussed there corresponds to Broder's definition of the 'third generation' and adds nothing significantly new to it (besides the marketing term). Still, the article is definitely worth reading, because it references and discusses a lot of recent search engines (and also because of the interesting discussion that follows it). The authors distinguish between 'Finding' information and 'Discovering' information, and relate the latter to 'Search 2.0'.

Broder distinguishes different sorts of web search queries:

  1. Navigational: intended use is to reach a particular web page (similar to 'known item' search in classical information retrieval). Therefore, navigational queries usually have only one 'right' result.

  2. Informational: intended use is to acquire information assumed to be present on one or more web pages (as in classical information retrieval).

  3. Transactional: intended use is to find a web page, where further transactions (e.g. shopping) will take place.

If we take social bookmarking services, navigational queries can be answered simply by using the user's personomy (i.e. the set of all tags used by a distinct user). If the goal is to find a web page that has already been accessed in the past, the page can be found quickly, provided the user has registered it with the bookmarking service (which amounts to 'Finding' information). But the query might also be resolved by using other people's tags, if somebody else has tagged the page (with objectively descriptive tags).
Social bookmarking services are also useful for the other two purposes. In addition, once the page is found, the social networking information can be used to discover new, previously unknown, but related (similar) information (which amounts to 'Discovering' information). Hotho et al. present an adaptive ranking algorithm (FolkRank) for social bookmarking systems and discuss the problems that arise for tag-based search engines [3].
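
To make the 'Finding' versus 'Discovering' distinction a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the (user, tag, URL) triples, the example.org URLs, and the function names build_indexes, find, and discover are made up. Counting shared tags is only a naive stand-in for a real ranking; it does not implement FolkRank [3].

```python
from collections import defaultdict

# Hypothetical toy data: (user, tag, url) triples, roughly what a
# social bookmarking service might store. All values are invented.
POSTS = [
    ("alice", "search", "http://example.org/broder-taxonomy"),
    ("alice", "ir", "http://example.org/broder-taxonomy"),
    ("bob", "search", "http://example.org/search20"),
    ("bob", "folksonomy", "http://example.org/folkrank"),
    ("carol", "ir", "http://example.org/folkrank"),
    ("carol", "search", "http://example.org/broder-taxonomy"),
]


def build_indexes(posts):
    """Build the per-user personomy plus two global tag/url indexes."""
    personomy = defaultdict(lambda: defaultdict(set))  # user -> tag -> urls
    tag_to_urls = defaultdict(set)                     # tag -> urls (all users)
    url_to_tags = defaultdict(set)                     # url -> tags (all users)
    for user, tag, url in posts:
        personomy[user][tag].add(url)
        tag_to_urls[tag].add(url)
        url_to_tags[url].add(tag)
    return personomy, tag_to_urls, url_to_tags


def find(user, tag, personomy, tag_to_urls):
    """'Finding': look in the user's own bookmarks first,
    then fall back to pages that other people tagged the same way."""
    own = personomy.get(user, {}).get(tag, set())
    if own:
        return sorted(own)
    return sorted(tag_to_urls.get(tag, set()))


def discover(url, tag_to_urls, url_to_tags):
    """'Discovering': rank other pages by the number of tags
    they share with the given page (ties broken alphabetically)."""
    scores = defaultdict(int)
    for tag in url_to_tags.get(url, set()):
        for other in tag_to_urls[tag]:
            if other != url:
                scores[other] += 1
    return sorted(scores, key=lambda u: (-scores[u], u))


personomy, tag_to_urls, url_to_tags = build_indexes(POSTS)
print(find("alice", "search", personomy, tag_to_urls))
print(find("alice", "folksonomy", personomy, tag_to_urls))
print(discover("http://example.org/broder-taxonomy", tag_to_urls, url_to_tags))
```

The first call is answered from alice's own personomy, the second one only via other people's tags, and the third one lists pages alice may never have seen but which share tags with a page she knows.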

But to answer the 'need behind the query', as Broder puts it in his definition of third-generation search engines, further personalization is mandatory. Only if the search engine is able to determine the context of a query with respect to a given user and a given situation (even the same user might have different information needs in different situations) can it grasp the 'need behind the query'...
[to be continued]

[1] Andrei Broder: A taxonomy of web search, SIGIR Forum 36, pp. 3-10, 2002.
[2] Ebrahim Ezzy, Richard MacManus: Search 2.0 vs. Traditional Search, Read/WriteWeb, June 20th, 2006.
[3] Andreas Hotho, Robert Jäschke, Christoph Schmitz and Gerd Stumme: Information Retrieval in Folksonomies: Search and Ranking, in Proceedings of the 3rd European Semantic Web Conference, pp. 411-426, 2006.