OK, let's start with the general shortcomings of keyword-based search engines:
(1) Homonymy: The very same keyword used to query the search engine can have multiple meanings (e.g. 'jaguar' can be the animal or the car). This means that you will get many non-relevant results (bad precision), because in those results the keyword refers to another meaning.
(2) Synonymy: Other words can have the same meaning as the keyword you entered as the query string. Thus, you won't get all relevant results (bad recall). This is aggravated by the fact that flickr is an international platform and its tags come from many languages (e.g. 'Rome', 'Roma' and 'Rom' all refer to the same city).
Usually, one can try to cope with these semantic shortcomings by refining the query string: narrowing the search results by connecting additional keywords with AND (against homonymy), or broadening them by connecting synonyms with OR (against synonymy).
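To make the effect of AND and OR concrete, here is a minimal sketch over a tiny in-memory tag index. The photo ids, tags and the function names `search_and` / `search_or` are hypothetical and only illustrate the boolean logic; this is not how flickr's search is actually implemented.

```python
# Hypothetical toy tag index: photo id -> set of tags.
photos = {
    1: {"jaguar", "cat", "zoo"},
    2: {"jaguar", "car"},
    3: {"rome", "colosseum"},
    4: {"roma", "colosseum"},
}

def search_and(index, *keywords):
    """Ids of photos carrying ALL keywords: AND narrows the result set."""
    wanted = set(keywords)
    return {pid for pid, tags in index.items() if wanted <= tags}

def search_or(index, *keywords):
    """Ids of photos carrying ANY keyword: OR broadens the result set."""
    wanted = set(keywords)
    return {pid for pid, tags in index.items() if wanted & tags}

print(sorted(search_and(photos, "jaguar")))         # [1, 2]  (homonym: cat and car)
print(sorted(search_and(photos, "jaguar", "cat")))  # [1]     (AND removes the car)
print(sorted(search_or(photos, "rome")))            # [3]     (misses 'roma')
print(sorted(search_or(photos, "rome", "roma")))    # [3, 4]  (OR adds the synonym)
```

Note that both fixes put the burden on the user: they only work if you already know the extra keyword or the synonym.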
In collaborative tagging systems (CTS), there are additional problems that originate from the way people generate tags.
(3) Tagging: Different kinds of tags can be distinguished. A rough classification distinguishes between descriptive and functional tags. While descriptive tags, as the name says, describe the tagged resource in a general way (and thus might also be useful for others), functional tags are mostly relevant only to the single user who assigned them.
Classic example: a picture of an apple can be tagged 'apple' (relevant for all users) as well as 'breakfast' (relevant perhaps only for a fraction of users).
Taking all three arguments together, collaborative tagging systems have real problems when they are used as search engines, because poor precision and poor recall lead to poor results.
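Precision and recall are easy to compute once you know which results are relevant. The following sketch uses a made-up scenario (hypothetical photo ids) in which a plain tag search for 'rome' suffers from both homonymy and synonymy at once.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical scenario: the user wants photos of the city of Rome.
# Photos 1-4 show the city, but 3 and 4 are tagged only 'roma' or 'rom' (synonymy).
# Photos 5 and 6 are tagged 'rome' but show something else called Rome (homonymy).
relevant = {1, 2, 3, 4}
retrieved_by_tag_rome = {1, 2, 5, 6}

p, r = precision_recall(retrieved_by_tag_rome, relevant)
print(p, r)  # 0.5 0.5
```

Half of what comes back is noise, and half of what you wanted never shows up.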
Another point is relevance. Search engines such as Google introduce relevance weights (e.g. Google's PageRank) to capture the importance of individual search results, and the results are ordered by relevance. PageRank roughly reflects the idea that a resource can be considered important if many other resources link to it (in addition, the importance of each linking resource and its number of outgoing links are taken into account). So, let's try to adapt this concept to flickr... First, a picture doesn't link to other pictures. No big deal: flickr lets users choose 'favourites' among all the pictures. In addition, there are some statistical indicators, such as page views, an 'interestingness' indicator, and the number of comments. Thus, a relevance indicator could be built from these signals and used to rank search results.
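Such a relevance indicator could look roughly like the sketch below. This is a plain weighted score, not PageRank. The weights, the log damping and the `Photo` fields are all assumptions made for illustration; 'interestingness' is left out because flickr does not publish how it is computed.

```python
import math
from dataclasses import dataclass

@dataclass
class Photo:
    photo_id: str
    favourites: int
    views: int
    comments: int

# Hypothetical weights: favourites count most, raw page views least.
WEIGHTS = {"favourites": 5.0, "comments": 2.0, "views": 1.0}

def relevance(photo: Photo) -> float:
    """Weighted sum of log-damped popularity indicators.
    log1p keeps a few heavily viewed photos from dominating everything."""
    return (WEIGHTS["favourites"] * math.log1p(photo.favourites)
            + WEIGHTS["comments"] * math.log1p(photo.comments)
            + WEIGHTS["views"] * math.log1p(photo.views))

def rank(results):
    """Order search results by descending relevance score."""
    return sorted(results, key=relevance, reverse=True)

results = [
    Photo("blurry_snapshot", favourites=0, views=1000, comments=0),
    Photo("colosseum_at_dusk", favourites=20, views=300, comments=5),
    Photo("street_scene", favourites=2, views=50, comments=1),
]
print([p.photo_id for p in rank(results)])
# ['colosseum_at_dusk', 'street_scene', 'blurry_snapshot']
```

With these (arbitrary) weights a photo that many people marked as a favourite beats one that was merely viewed often, which is the spirit of the 'many links point here' idea behind PageRank.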
Unfortunately, if you search flickr, you get a huge number of results without any visible order or relevance. (Simply try 'Rome': you will get more than 290,000 results. Of course, there are pictures of the city of Rome among them, but also many other pictures where the tag 'Rome' was assigned for other reasons.)
One problem with flickr is certainly that it is not a 'real' CTS. It seems to focus on people tagging their 'own' resources rather than the resources of others. In addition, the tags assigned to a resource are treated as a set rather than a bag (multiset), in which you could see how many users have assigned each tag. Without these counts there is no way to tell which tags the community agrees on, so tag convergence is out of reach.
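The difference between set and bag semantics is easy to see with Python's `collections.Counter`. The users and tags below are hypothetical; the point is only what information each representation keeps.

```python
from collections import Counter

# Hypothetical tag assignments for ONE photo: (user, tag) pairs.
assignments = [
    ("alice", "rome"), ("alice", "colosseum"),
    ("bob", "rome"), ("bob", "italy"),
    ("carol", "rome"), ("carol", "colosseum"), ("carol", "holiday"),
]

# Set semantics: we only learn which tags exist for the photo.
tag_set = {tag for _, tag in assignments}

# Bag semantics: how many distinct users assigned each tag
# (set(assignments) makes sure a user is counted at most once per tag).
tag_bag = Counter(tag for _, tag in set(assignments))

print(sorted(tag_set))         # ['colosseum', 'holiday', 'italy', 'rome']
print(tag_bag.most_common(2))  # [('rome', 3), ('colosseum', 2)]
```

With the set, 'holiday' (one user, probably a functional tag) looks exactly as important as 'rome'. With the bag, the tags the users agree on stand out, and it is exactly this kind of frequency information that tag convergence builds on.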
On the other hand, there are some nice tools around. Just try this little flickr tagbrowser application. OK, it has the shortcomings already mentioned, but it's a nice gadget.