Friday, June 20, 2014

Harald's Original Miscellany - Prolificacy vs. Popularity in Literature

Oh wow, it's been quite a while since I wrote my last post here on the blog... But while preparing exercises for the OpenHPI MOOC 'Knowledge Engineering with Semantic Technologies', I began experimenting a little with SPARQL to come up with new exercises for the students of the course. To cut a long story short: our current lecture examples all deal with writers and books. Thus, for learning how to query RDF knowledge bases with the SPARQL query language, I chose DBpedia. While trying to think of some interesting toy examples, I kept playing around, and the facts I discovered by chance were so interesting that I totally forgot about my lunch break :)

So here are some interesting facts about books and authors, to be continued in later posts. All statistics presented here are based on the (English) Wikipedia (for the SPARQL queries we of course use DBpedia)... but nevertheless, it is Wikipedia knowledge.

There are currently 15,328 authors listed (i.e. resources that are members of the class dbpedia-owl:Writer). The first thing I wanted to find out was who the most prolific authors are according to Wikipedia (or, more precisely, whose works also exist as Wikipedia pages and are connected to them via dbpedia-owl:author).
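If you want to try this yourself, here is a minimal sketch in Python that only builds the request against DBpedia's public SPARQL endpoint. The query text is a hypothetical reconstruction, not necessarily the query behind the tables below; the endpoint URL `https://dbpedia.org/sparql` and the use of `rdfs:label` for the author's name are assumptions. One query counts the members of `dbpedia-owl:Writer`, the other groups the works that point to each writer via `dbpedia-owl:author`.

```python
from urllib.parse import urlencode

# Assumed public DBpedia endpoint (SPARQL 1.1 Protocol, HTTP GET).
ENDPOINT = "https://dbpedia.org/sparql"

PREFIXES = """PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
"""

# How many resources are typed as writers?
COUNT_WRITERS = PREFIXES + """
SELECT (COUNT(DISTINCT ?writer) AS ?numOfWriters)
WHERE { ?writer a dbpedia-owl:Writer . }
"""

# Writers ranked by the number of works linked to them via dbpedia-owl:author.
# Taking the English rdfs:label as the name is an assumption.
MOST_PROLIFIC = PREFIXES + """
SELECT ?name (COUNT(DISTINCT ?work) AS ?numOfWorks)
WHERE {
  ?writer a dbpedia-owl:Writer ;
          rdfs:label ?name .
  FILTER (lang(?name) = "en")
  ?work dbpedia-owl:author ?writer .
}
GROUP BY ?writer ?name
ORDER BY DESC(?numOfWorks)
LIMIT 40
"""


def query_url(query: str) -> str:
    """Build a GET URL that passes the query in the standard `query` parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Open this URL in a browser to run the query on the live endpoint.
    print(query_url(MOST_PROLIFIC))
```

Since the endpoint is live, the numbers you get today will differ from the ones in this post.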

Well, here are the Top 40 Most Prolific Writers:

name numOfWorks popularityOfWorks
"L. Sprague de Camp"@en 128 10.9
"Agatha Christie"@en 103 32.7
"Isaac Asimov"@en 75 28.3
"Stephen King"@en 75 44.0
"Philip K. Dick"@en 74 14.4
"Edgar Rice Burroughs"@en 73 18.7
"Ruth Rendell"@en 70 5.3
"Dean Koontz"@en 67 6.2
"Lin Carter"@en 64 9.0
"Terry Pratchett"@en 63 41.7
"Jules Verne"@en 63 31.2
"P. G. Wodehouse"@en 63 20.9
"Robert E. Howard"@en 62 10.5
"Gary Paulsen"@en 61 4.1
"K. A. Applegate"@en 61 18.5
"August Derleth"@en 60 5.2
"John Dickson Carr"@en 59 5.9
"H. G. Wells"@en 57 30.0
"James Patterson"@en 56 12.0
"Leslie Charteris"@en 55 8.8
"Robert A. Heinlein"@en 52 36.5
"Rex Stout"@en 52 21.4
"Arthur C. Clarke"@en 50 20.8
"Harry Turtledove"@en 49 11.3
"Ray Bradbury"@en 49 15.3
"David Weber"@en 49 30.9
"Danielle Steel"@en 48 3.3
"J. M. G. Le Clézio"@en 48 3.7
"Henry James"@en 48 18.5
"Piers Anthony"@en 48 14.0
"Clive Cussler"@en 47 10.1
"Roger Zelazny"@en 45 10.5
"Alan Dean Foster"@en 44 6.7
"Joe R. Lansdale"@en 43 4.7
"Gordon R. Dickson"@en 41 5.0
"Marion Zimmer Bradley"@en 41 7.6
"Samuel R. Delany"@en 41 9.0
"Bernard Cornwell"@en 40 16.6
"Enid Blyton"@en 40 6.8
"Dr. Seuss"@en 40 26.5
The popularity score in the third column is the average number of references (incoming links) from other Wikipedia articles to each of the author's books. Interestingly, Agatha Christie and Isaac Asimov are both rather prolific authors whose books also have above-average popularity. On the other hand, Ruth Rendell and Dean Koontz are rather prolific but not very popular (at least according to Wikipedia). The most popular authors in this list are Stephen King and Terry Pratchett.
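To make the two columns concrete, here is a small Python sketch of how numOfWorks and popularityOfWorks relate: group the works by author, count them, and average their incoming-link counts. The data below is invented toy data, not DBpedia values; on DBpedia itself the incoming links would presumably be counted via a page-link property such as `dbpedia-owl:wikiPageWikiLink`, which is an assumption here.

```python
from collections import defaultdict

# Hypothetical toy data: work -> (author, number of incoming wiki links).
# These numbers are invented for illustration only.
WORKS = {
    "Book A": ("Author X", 40),
    "Book B": ("Author X", 10),
    "Book C": ("Author X", 4),
    "Book D": ("Author Y", 300),
}


def author_stats(works):
    """Return {author: (numOfWorks, popularityOfWorks)}, where popularity is
    the mean number of incoming links per work, rounded to one decimal."""
    links_per_author = defaultdict(list)
    for author, in_links in works.values():
        links_per_author[author].append(in_links)
    return {
        author: (len(links), round(sum(links) / len(links), 1))
        for author, links in links_per_author.items()
    }


if __name__ == "__main__":
    for author, (num, pop) in author_stats(WORKS).items():
        print(author, num, pop)
```

Here Author X gets (3, 18.0) and Author Y gets (1, 300.0): a single heavily linked work is enough for a much higher average than three moderately linked ones.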

Well, let's turn it the other way around and sort the authors by the average popularity of their books...
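In SPARQL terms this is just a different ORDER BY clause (ORDER BY DESC(?popularityOfWorks) instead of DESC(?numOfWorks)). The Python sketch below mimics that on four rows copied from the first table; the helper `top_n` is a hypothetical name, and the real list that follows is computed over all writers, not just these four.

```python
from typing import NamedTuple


class AuthorRow(NamedTuple):
    name: str
    num_of_works: int
    popularity_of_works: float


# A few rows taken from the Top 40 Most Prolific Writers table.
ROWS = [
    AuthorRow("Agatha Christie", 103, 32.7),
    AuthorRow("Stephen King", 75, 44.0),
    AuthorRow("Ruth Rendell", 70, 5.3),
    AuthorRow("Terry Pratchett", 63, 41.7),
]


def top_n(rows, key, n=40):
    """Return the n rows with the largest `key`, like ORDER BY DESC(...) LIMIT n."""
    return sorted(rows, key=key, reverse=True)[:n]


by_works = [r.name for r in top_n(ROWS, key=lambda r: r.num_of_works)]
by_popularity = [r.name for r in top_n(ROWS, key=lambda r: r.popularity_of_works)]

if __name__ == "__main__":
    print("By number of works:", by_works)
    print("By popularity:     ", by_popularity)
```

Sorted by works, Agatha Christie leads; sorted by popularity, Stephen King and Terry Pratchett move to the top and Ruth Rendell drops to the bottom.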

Here is the Top 40 list of the authors with the most popular books (on average):
name numOfWorks popularityOfWorks
"John Simpson (lexicographer)"@en 1 1627.0
"John Milton"@en 1 669.0
"Kenneth Grahame"@en 1 423.0
"Emily Brontë"@en 1 387.0
"Wilhelm Grimm"@en 1 343.0
"Jacob Grimm"@en 1 343.0
"Harper Lee"@en 1 334.0
"Lewis Carroll"@en 6 319.8
"Miguel de Cervantes"@en 4 310.5
"Jonathan Swift"@en 2 303.5
"Wilbert Awdry"@en 1 296.0
"Cao Xueqin"@en 1 291.0
"Giovanni Boccaccio"@en 1 287.0
"William Shakespeare"@en 3 264.6
"Ian McFarlane"@en 1 255.0
"Antoine de Saint-Exupéry"@en 1 253.0
"Margaret Mitchell"@en 2 242.0
"Suetonius"@en 1 229.0
"Roger Hargreaves"@en 2 214.0
"Joseph O'Neill (writer)"@en 1 211.0
"T. S. Eliot"@en 2 186.0
"George Bernard Shaw"@en 2 184.5
"Harriet Beecher Stowe"@en 3 173.6
"Petronius"@en 1 162.0
"Charles Dickens"@en 30 159.5
"Johanna Spyri"@en 1 156.0
"Herman Melville"@en 7 154.7
"Jaroslav Hašek"@en 1 152.0
"Pierre Choderlos de Laclos"@en 1 152.0
"Oscar Wilde"@en 3 148.0
"Dave Arneson"@en 3 146.0
"Ngô Sĩ Liên"@en 1 145.0
"John Eric Holmes"@en 2 143.5
"Carlo Collodi"@en 1 142.0
"George Orwell"@en 11 135.0
"Erik Mona"@en 3 134.6
"Daniel Defoe"@en 5 132.8
"Monte Cook"@en 3 132.3
"Apuleius"@en 1 132.0
"Dan Brown"@en 6 126.6
Possibly you have never heard of John Simpson? But you will have heard of the Oxford English Dictionary. Well, John Simpson was its Chief Editor... that makes sense, doesn't it? What about Kenneth Grahame? Maybe you know his novel The Wind in the Willows, published in 1908...

This list seems to be more about literary excellence. Only one author with a rather prolific output shows up: Charles Dickens, with 30 listed books. But finding Dan Brown on this list as well tells me that popularity is not the same as literary excellence or quality. At least he comes last in the Top 40, behind Herman Melville, George Orwell, Daniel Defoe, Lewis Carroll and Jonathan Swift. On the other hand, John Milton did not get rich from his one-shot Paradise Lost, although it is rather popular.

Here are the links to the online queries to get the most recent and complete results:
Enjoy... I'll be back when I find something else interesting ;-)