Friday, April 13, 2007

Tag search vs. keyword search: substitution or complement?

As you know, collaborative tagging systems (CTS) have become rather popular Web 2.0 applications (although I don't like the term 'Web 2.0'... please use 'Social Web' instead). A CTS allows each registered user to maintain her own tags, which add semantic annotations to the corresponding web links. Today, tags are simple unformatted text. Tags convey meaning, i.e. semantics. Because the user is free to choose any text string (symbol) for a certain semantics (concept) related to a given resource (web page or object), two or more users have to agree on using the same symbols to denote an object in order to communicate this semantics (remember the semiotic triangle [1]).

The first difficulty is syntax: there are several ways to write a word (of course, not all of them are necessarily correct, and not all of them belong to the same language). The problem becomes even worse if one tries to combine several words into a single string (how should the words be separated? With CamelCase, underscores, blanks, ...?).
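
To make the syntax problem concrete, here is a minimal Python sketch of how a tagging system might map multi-word tags written in CamelCase, with underscores, hyphens or blanks to a single canonical form. The function name normalize_tag and the canonical form (lowercase words joined by one hyphen) are assumptions made purely for illustration; real tagging systems handle this in their own ways.

```python
import re


def normalize_tag(raw: str) -> str:
    """Map differently written multi-word tags to one canonical form.

    Hypothetical convention: lowercase words joined by a single hyphen.
    """
    # Insert a blank at CamelCase boundaries:
    # "SemanticWeb" -> "Semantic Web", "HTMLParser" -> "HTML Parser".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", raw)
    spaced = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", spaced)
    # Treat blanks, underscores and hyphens as word separators.
    words = re.split(r"[\s_\-]+", spaced.strip())
    return "-".join(w.lower() for w in words if w)


for raw in ["SemanticWeb", "semantic_web", "semantic web", "semanticweb"]:
    print(raw, "->", normalize_tag(raw))
```

With this convention, 'SemanticWeb', 'semantic_web' and 'semantic web' all end up as 'semantic-web'. An all-lowercase 'semanticweb', however, stays unchanged, since word boundaries cannot be recovered without a dictionary; and no such normalization helps with misspellings or with words from different languages.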
Next come language-dependent problems such as homonymy (including polysemy) and synonymy. With homonyms, the same symbol carries different meanings; with synonyms, it is the other way round: different symbols carry the same meaning.

Syntax and language-dependent problems alone make tag-based search more difficult to handle than traditional keyword-based approaches (by keyword-based approach we mean full-text search or keywords assigned to the resource by its author or by some designated expert). For full-text search, a query string given by the user (or at least its word stem) has to match some string that is part of the searched resource. Keywords assigned to a resource by a designated expert should meet some level of objectivity; thus, a user might be able to 'guess' the keyword while thinking of a well-suited query string. Keywords provided by the author, in contrast, reflect her specific point of view (the same holds for tagging). These 'subjective' keywords are much harder to guess for an arbitrary user, because she does not necessarily share the same context as the (tag) author.
In CTS, several distinct categories of tags can be distinguished [2]. Among others, there are two fundamentally different tag categories: descriptive tags and functional tags. Descriptive tags are the more objective ones; they describe a resource in some general manner. Functional tags, on the other hand, encode an intended functional use, especially for the tag author, and are thus more subjective. While descriptive tags are better suited for general web search, functional tags are mostly useful to their own authors, but not to other users.
To analyse the benefit of tagging for web search, we have to take into account that many users provide tags for the same resource. Looking at the frequency distribution of the tags attached to a specific resource, one can observe a power law (see also [2]): a few tags are used very often, while most of the tags attached to a resource occur only rarely. These few frequent tags can often be identified as descriptive tags, while the so-called 'long tail' of the remaining tags often belongs to the category of functional tags.
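
The head/long-tail observation can be illustrated with a small Python sketch that counts how often each tag was assigned to one resource and separates the few frequent tags from the rest. The function name split_head_tail, the cut-off rule (the smallest set of most frequent tags that together cover a given share of all tag assignments, 80% by default, an arbitrary choice) and the sample data are all hypothetical. Whether the head tags are really descriptive and the tail tags functional still has to be judged by looking at the tags themselves.

```python
from collections import Counter


def split_head_tail(assignments, coverage=0.8):
    """Split the tags of one resource into a frequent head and a long tail.

    assignments: list of tag strings, one entry per (user, tag) assignment.
    coverage: hypothetical cut-off; the head is the smallest set of most
    frequent tags that together account for at least this share of all
    assignments.
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    head, tail = [], []
    covered = 0
    # most_common() lists tags from most to least frequent.
    for tag, n in counts.most_common():
        if covered < coverage * total:
            head.append(tag)
            covered += n
        else:
            tail.append(tag)
    return head, tail


# Hypothetical tag assignments for a single web page (87 in total).
tags = (["python"] * 40 + ["programming"] * 25 + ["tutorial"] * 15
        + ["toread"] * 3 + ["work"] * 2 + ["todo", "myproject"])
head, tail = split_head_tail(tags)
print(head)  # ['python', 'programming', 'tutorial']
print(tail)  # ['toread', 'work', 'todo', 'myproject']
```

In this made-up example, the head consists of tags that describe the page's topic, while the tail contains tags such as 'toread' or 'todo' that mainly make sense to the users who assigned them.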
So, how can we make use of that fact?
In [3], the authors propose using tags for search query refinement. To this end, they distinguish between two different categories of tags (which do not necessarily correspond to descriptive and functional tags): search keywords, i.e. the most popular tags assigned to a resource, which can help to increase the hit rate when used for query refinement, and exploration keywords, which cannot. Because exploration keywords reflect the personalized search context and information need of an individual user, they are supposed to be helpful for the exploration process.
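
As a rough illustration of this distinction, the following Python sketch takes the tags assigned to the resources in a current result list, offers the most popular ones as search keywords for refining the query, and keeps the remaining ones as exploration keywords. The function name refinement_keywords, the data layout, the fixed number k of search keywords and the example URLs and tags are assumptions for illustration only; this is not the method described in [3], which should be consulted for how the two categories are actually determined.

```python
from collections import Counter


def refinement_keywords(result_tags, query_terms, k=3):
    """Split the tags of a result set into search and exploration keywords.

    result_tags: dict mapping each resource (e.g. a URL) in the current
    result list to the list of tags users assigned to it.
    query_terms: terms already in the query; they are not suggested again.
    k: hypothetical number of popular tags offered as search keywords.
    """
    # Count how often each tag was assigned across the whole result set.
    counts = Counter(
        tag
        for tags in result_tags.values()
        for tag in tags
        if tag not in query_terms
    )
    ranked = [tag for tag, _ in counts.most_common()]
    search_keywords = ranked[:k]       # popular tags: candidates for refinement
    exploration_keywords = ranked[k:]  # rarer, more personal tags
    return search_keywords, exploration_keywords


# Hypothetical result list for the query "python".
results = {
    "http://example.org/a": ["python", "tutorial", "beginner", "toread"],
    "http://example.org/b": ["python", "tutorial", "reference"],
    "http://example.org/c": ["python", "beginner", "course2007"],
}
query = ["python"]
search, explore = refinement_keywords(results, query, k=2)
print(search)   # ['tutorial', 'beginner']
print(explore)  # ['toread', 'reference', 'course2007']

# Refining the query with a search keyword (AND semantics assumed).
refined_query = query + [search[0]]
print(refined_query)  # ['python', 'tutorial']
```

Ranking by raw assignment counts is the simplest possible choice here; a real system would probably also weight tags by how many distinct users assigned them.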