Thursday, August 03, 2006

An Integrated View on Document Annotation...

Did you realize that documents - no matter what type, e.g., text documents, graphic documents, or videos - can all be subsumed under a single abstract view?

We can define a document as a string of addressable tokens. Some of these tokens are special: the so-called tags. Unlike all the other tokens, which can be regarded as the document content, tags have a special function: they mark up single tokens or groups of tokens (document units) so that these are interpreted in a particular syntactic or semantic way. Alternatively, tags can be interpreted as document annotation, i.e., the document can be regarded as an unformatted string of tokens, while the tags (annotations) define the document structure. We distinguish between different types of tags:
  • structural tags define the structural elements of a document, e.g., in a text document: sentences, paragraphs, sections, chapters, headings, annotations, etc.
  • referential tags define relationships between document units of the same (internal) or some other (external) document, e.g., in a text document: see-references or bibliographic references.
  • conceptual tags define concepts and relationships between concepts within a document, e.g., in a text document: index entries and hierarchical relationships between index entries.
Thus, within a document, we can distinguish between the logical structure (defined by the structural tags), the referential structure, and the conceptual structure, all of which depend on each other. Additionally, we distinguish between tags that are supplied by the author (e.g., structural tags) and tags provided by the user (e.g., "tags", annotations, reviews, etc.).
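
To make this abstract view a bit more concrete, here is a minimal Python sketch. It is only one possible reading of the model, not a reference implementation: the names Tag, Document, annotate, unit and structure are made up for illustration, and it assumes that tags are stored as stand-off annotations (token address ranges) rather than as special tokens inside the string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tag:
    """Hypothetical stand-off tag covering the tokens [start, end)."""
    kind: str                     # "structural", "referential", or "conceptual"
    label: str                    # e.g. "sentence", "see-reference", "index-entry"
    start: int                    # address of the first token covered (inclusive)
    end: int                      # address one past the last token covered (exclusive)
    source: str = "author"        # who supplied the tag: "author" or "user"
    target: Optional[str] = None  # referential tags only: what the unit points to


@dataclass
class Document:
    """A document as an unformatted string of addressable tokens plus its tags."""
    tokens: List[str]
    tags: List[Tag] = field(default_factory=list)

    def annotate(self, tag: Tag) -> None:
        # A tag may only mark up tokens that actually exist in the document.
        if not (0 <= tag.start < tag.end <= len(self.tokens)):
            raise ValueError("tag span lies outside the document")
        self.tags.append(tag)

    def unit(self, tag: Tag) -> List[str]:
        # The document unit (group of tokens) that a tag marks up.
        return self.tokens[tag.start:tag.end]

    def structure(self, kind: str) -> List[Tag]:
        # All tags of one kind, i.e. one of the three structures.
        return [t for t in self.tags if t.kind == kind]


# Assumption for the example: tokens are whitespace-separated words and punctuation.
doc = Document("Tags mark up tokens . See section two .".split())
doc.annotate(Tag("structural", "sentence", 0, 5))
doc.annotate(Tag("structural", "sentence", 5, 9))
doc.annotate(Tag("referential", "see-reference", 6, 8, target="section-2"))
doc.annotate(Tag("conceptual", "index-entry", 0, 1, source="user"))

for tag in doc.tags:
    print(tag.kind, tag.label, doc.unit(tag))
```

Keeping the tags apart from the tokens matches the "annotation" reading above: the token string stays unformatted, and each of the three structures is simply a filtered view of the tag list.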

We can also apply this view to other types of documents, e.g., video documents. The smallest unit - the token - of a video document is a single pixel. Considering the logical structure, pixels are grouped into blocks, which form macroblocks, which are grouped into slices, which together constitute a frame (picture). Unlike a text document, a video document also depends on time. Thus, we can identify groups of pictures (GOPs) that together form the entire video sequence. In MPEG-4, we can even identify objects within a picture and group them into a scene. With MPEG-7, conceptual and referential information can also be added to the video in the form of metadata about the author or about objects or scenes within the video.
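
The nesting of the video's logical structure can be sketched in the same way. The following Python snippet maps a single pixel (token) to its enclosing units. Note the assumptions, none of which are stated above: blocks of 8x8 pixels and macroblocks of 16x16 pixels (the usual sizes in MPEG-1/2), one slice per macroblock row (a simplification), and a fixed GOP length of 12 frames. The names PixelAddress and locate are hypothetical.

```python
from typing import NamedTuple

BLOCK = 8        # assumed block edge length in pixels
MACROBLOCK = 16  # assumed macroblock edge length in pixels


class PixelAddress(NamedTuple):
    gop: int           # which group of pictures the frame belongs to
    frame_in_gop: int  # position of the frame inside that group
    slice_index: int   # simplification: one slice per macroblock row
    macroblock: int    # macroblock position inside the slice
    block: int         # one of the four 8x8 luma blocks in the macroblock (0..3)


def locate(frame: int, x: int, y: int, gop_size: int = 12) -> PixelAddress:
    """Map one pixel of one frame to the structural units that contain it."""
    if frame < 0 or x < 0 or y < 0 or gop_size <= 0:
        raise ValueError("frame, x, y must be >= 0 and gop_size must be > 0")
    mb_row, mb_col = y // MACROBLOCK, x // MACROBLOCK
    # Blocks inside a macroblock are numbered row by row: 0 1 / 2 3.
    block = ((y % MACROBLOCK) // BLOCK) * 2 + (x % MACROBLOCK) // BLOCK
    return PixelAddress(frame // gop_size, frame % gop_size, mb_row, mb_col, block)


address = locate(frame=25, x=20, y=40)
print(address)
```

Each field of the returned address corresponds to one structural tag level: the pixel plays the role of the token, and block, macroblock, slice, frame, and GOP are the nested document units.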