Saturday, September 15, 2007

Taxonomy ... really, it's a battleground

Taxonomy as a science was founded by the 18th-century Swedish botanist Carl Linné (later ennobled as Carl von Linné, or, more fashionably, Carolus Linnaeus in Latin). He was born in 1707 (we are celebrating his 300th birthday!!) and got off to a difficult start: he was a rather sluggish student, and his disappointed father saw no other option than to apprentice him to a cobbler. Linné soon realized that academia might not be the worst choice and begged for a second chance, which was granted. He went on to study medicine in Sweden and Holland, but his passion was to become nature and all living things. He started to write catalogues of the world's plant and animal species, using a system of his own devising, which led to his great Systema Naturae and made him famous. He is reported not to have been the most modest man of his time. He even suggested that his gravestone should bear the inscription Princeps Botanicorum (similar to the title granted to Carl Friedrich Gauss, the Prince of Mathematicians).

Linné classified all living things on earth according to their physical attributes. The idea is to categorize everything hierarchically. Each species belongs to a particular genus, several genera belong to a particular family, and families are in turn grouped into orders, classes, subphyla, phyla, kingdoms, and domains. So, e.g., man is of the genus Homo and of the species sapiens. We belong to the family Hominidae, which is part of the order Primates, which belongs to the class Mammalia, which belongs to the subphylum Vertebrata, which belongs to the phylum Chordata. Furthermore, we belong to the kingdom Animalia and the domain Eukarya. This is the taxonomy we use today. In Linné's time, genus and species had already been in use for about a hundred years; the term phylum (from the Greek φῦλον [phylon], 'race' or 'stock', akin to φυλαί [phylai], the clan-based voting groups in Greek city-states) was coined by one of our local heroes (remember, I'm working at the University of Jena), Ernst Haeckel, in 1866.
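
To make the hierarchy a bit more concrete, here is a minimal Python sketch of how such a classification could be represented and queried. It is only an illustration: the names `RANKS`, `HUMAN`, `CAT`, `binomial_name`, and `lowest_shared_rank` are my own assumptions, and the domestic cat is added purely for comparison.

```python
# Ranks from most general to most specific, as listed in the text.
RANKS = ["domain", "kingdom", "phylum", "subphylum", "class",
         "order", "family", "genus", "species"]

# Two lineages as rank -> name mappings (hypothetical helper data).
HUMAN = dict(zip(RANKS, ["Eukarya", "Animalia", "Chordata", "Vertebrata",
                         "Mammalia", "Primates", "Hominidae", "Homo", "sapiens"]))
CAT = dict(zip(RANKS, ["Eukarya", "Animalia", "Chordata", "Vertebrata",
                       "Mammalia", "Carnivora", "Felidae", "Felis", "catus"]))


def binomial_name(lineage):
    """Return the two-part name: genus plus species."""
    return f"{lineage['genus']} {lineage['species']}"


def lowest_shared_rank(a, b):
    """Walk down the ranks and return the last (rank, name) both lineages share."""
    shared = None
    for rank in RANKS:
        if a[rank] != b[rank]:
            break
        shared = (rank, a[rank])
    return shared


print(binomial_name(HUMAN))            # Homo sapiens
print(lowest_shared_rank(HUMAN, CAT))  # ('class', 'Mammalia')
```

Walking from the top rank downwards until the names diverge is all it takes to see where two organisms part ways in the hierarchy; here man and cat stay together down to the class Mammalia.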

Although the system of taxonomy seems to be pretty straightforward, the trouble is deciding how many categories to distinguish at each level. Even at such a basic level as the phylum, which determines the very basic body plans of organisms, Wikipedia refers to 35 phyla, other biologists opt for a total of about thirty, and others for even fewer. The American entomologist Edward O. Wilson actually votes for 89 phyla. So where should the divisions be drawn...?
The other problem is double entries. There are, e.g., more than 5000 species of grass -- all of them looking rather similar. Some of them are reported to have been entered into the taxonomy under about twenty different names, independently, by different scientists.

So, the general trouble seems to be common agreement. Common agreement is a fundamental principle of ontologies -- and taxonomies are a form of lightweight ontology. There are millions of species populating our planet. Taxonomy is the prime tool for putting (some) order into this diversity. Linné introduced his system in the first half of the 18th century. For more than 200 years now, biologists have been trying to work out an unambiguous and consistent classification scheme and to populate it in a consistent way. They still have not finished. Besides the quarrels, there is a vast number of species that have not yet been discovered. So, the work being invested in the science of taxonomy will continue...and who knows if it will ever end....
Thinking of ontologies and the Semantic Web, I think we are in a situation similar to the one biology has experienced over the last 200 years. Computer scientists are quarrelling about top-level ontologies, and there are many independently developed (overlapping and ambiguous) domain ontologies. Ontological Engineering -- including ontology mapping, ontology alignment, and ontology merging -- tries to find some way to get along even with inconsistencies...
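
As a toy illustration of what ontology mapping has to cope with, the following Python sketch aligns two tiny, made-up ontologies by comparing normalized labels and reports where the two disagree. Everything here is an assumption for the sake of the example: the function names `normalize` and `align_by_label`, the child-to-parent dictionaries, and the sample concepts. Real alignment tools use far richer evidence than label matching.

```python
def normalize(label):
    """Lowercase a label and unify separators so that trivial variants match."""
    return label.strip().lower().replace("_", " ")


def align_by_label(onto_a, onto_b):
    """Align two child -> parent mappings by label.

    Returns (matches, conflicts): matched concept pairs, and those matches
    whose parents differ between the two ontologies.
    """
    index_b = {normalize(concept): concept for concept in onto_b}
    matches, conflicts = [], []
    for concept_a, parent_a in onto_a.items():
        concept_b = index_b.get(normalize(concept_a))
        if concept_b is None:
            continue  # no counterpart in the second ontology
        matches.append((concept_a, concept_b))
        parent_b = onto_b[concept_b]
        if normalize(parent_a) != normalize(parent_b):
            conflicts.append((concept_a, parent_a, parent_b))
    return matches, conflicts


# Two hypothetical, independently developed domain ontologies.
ONTO_A = {"Dog": "Mammal", "Whale": "Mammal", "Grass": "Plant"}
ONTO_B = {"dog": "mammal", "whale": "Fish", "stray_dog": "dog"}

matches, conflicts = align_by_label(ONTO_A, ONTO_B)
print(matches)    # [('Dog', 'dog'), ('Whale', 'whale')]
print(conflicts)  # [('Whale', 'Mammal', 'Fish')]
```

The mapping itself is the easy part; the interesting question is what to do with the conflicts, which is exactly where the quarrelling starts.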

BTW, ontology mapping should then also find a way to map Linné's taxonomy to Jorge Luis Borges's "The Analytical Language of John Wilkins," where he describes 'a certain Chinese Encyclopedia,' the Celestial Emporium of Benevolent Knowledge, in which it is written that animals are divided into:
(1) those that belong to the Emperor,
(2) embalmed ones,
(3) those that are trained,
(4) suckling pigs,
(5) mermaids,
(6) fabulous ones,
(7) stray dogs,
(8) those included in the present classification,
(9) those that tremble as if they were mad,
(10) innumerable ones,
(11) those drawn with a very fine camelhair brush,
(12) others,
(13) those that have just broken a flower vase,
(14) those that from a long way off look like flies.

More on Carl Linné, taxonomy in biology, and 'nearly everything' can be found here:
- Bill Bryson: A Short History of Nearly Everything, Doubleday, 2003.