Sunday, June 22, 2014

Harald's Original Miscellany - Prolificacy vs. Popularity - Part 3

So we were stuck with the problem that The Lord of the Rings is (of course) a notable work of J.R.R. Tolkien, but according to DBpedia it is not a "book"; instead, it consists of 3 books [1,2]. The problem then is that restricting "notable works" to "books" cuts out book series. If we don't filter the "notable works" at all, we will also get paintings, photographs, sculptures, etc. in our result list.

Let's find out about books in general in DBpedia. How many are there anyway? If we simply ask for all entities of the type "book", we currently end up with 28,128 books [3]. If we ask for all things that have an author, we end up with 63,071 [4] or 71,046 [5], depending on the way we ask. OK, not everything that is authored by somebody is also a book. There might be short stories, essays, and articles, but also series of books. OK, say we don't care and try it with all these kinds of written works. Moreover, let's also consider the overall impact of all the written works of an author (by simply summing up the indegrees (= popularity) of her works). The next table shows the Top 40 authors based on all written works of the author, ordered by the overall impact (GrandTotal) [6]:

name numOfWorks popularityOfWorks GrandTotal authDegree
"Charles Dickens"@en 8 398.7 3190 4026
"J. R. R. Tolkien"@en 3 819.2 2964 2859
"Elizabeth Sarnoff"@en 1 2457.0 2457 99
"Robin Green"@en 1 2277.0 2277 130
"Lewis Carroll"@en 2 939.0 1878 1576
"J. K. Rowling"@en 1 1829.0 1829 983
"Michael Stewart (playwright)"@en 6 233.6 1402 120
"Robert Louis Stevenson"@en 3 448.0 1344 1457
"George Orwell"@en 4 363.7 1309 1687
"Arthur Miller"@en 4 319.2 1277 1077
"Miguel de Cervantes"@en 2 611.0 1222 992
"Henrik Ibsen"@en 4 299.7 1199 1630
"Bram Stoker"@en 1 1145.0 1145 717
"Stephen King"@en 7 146.4 1105 2906
"Oscar Wilde"@en 2 486.3 1032 2324
"Samuel Beckett"@en 8 83.2 889 1414
"C. S. Lewis"@en 6 105.4 881 1530
"Naoko Takeuchi"@en 1 875.0 875 133
"Alexandre Dumas"@en 2 427.5 855 1125
"Roald Dahl"@en 16 53.8 834 855
"Tony Barwick"@en 4 202.2 809 114
"Terry Pratchett"@en 2 292.6 809 1032
"Jeremy Lloyd"@en 6 129.0 774 251
"Jimmy Perry"@en 2 382.5 765 84
"Victoria Morrow"@en 1 760.0 760 12
"Leo Tolstoy"@en 2 374.5 749 1927
"Dan Brown"@en 3 224.3 673 387
"John Steinbeck"@en 3 218.0 654 984
"Mark Twain"@en 2 333.6 615 2426
"Isaac Asimov"@en 4 123.7 607 2026
"Jonathan Swift"@en 2 303.5 607 1190
"Joseph Conrad"@en 11 59.4 603 909
"Tsugumi Ohba"@en 2 283.0 566 62
"Vladimir Nabokov"@en 4 157.1 535 1015
"Yoshihiro Togashi"@en 2 266.5 533 70
"Ryukishi07"@en 2 264.5 529 57
"Aldous Huxley"@en 3 175.3 526 968
"Dustin Lance Black"@en 2 262.5 525 159
"Rudyard Kipling"@en 3 196.2 520 1829
"Charlotte Brontë"@en 2 258.0 516 510

The column "authDegree" denotes the popularity of the author alone (measured by the indegree of the author's article in Wikipedia). We notice that the situation has changed since our last experiment. Tolkien is now reported with 3 works (indicating that The Lord of the Rings is now included in his list of notable works). Interestingly, he now leads the list together with Charles Dickens, followed by Elizabeth Sarnoff and Robin Green, who are each listed with only one notable work. Never heard of the latter two? Well, Elizabeth Sarnoff is a writer for TV series such as Lost, while Robin Green was a writer and producer of The Sopranos. By widening the category "books" we now also have screenwriters in our list, and the popularity of TV series seems to be rather significant, at least compared to literature.

The same holds for Naoko Takeuchi. Ever heard of her? Well, maybe you haven't. But then you probably know Sailor Moon, a very popular Japanese manga series. Yes, now we also include what belongs to our contemporary literary culture: TV series and comics. Naoko Takeuchi sits between C.S. Lewis (The Chronicles of Narnia) and Alexandre Dumas (The Three Musketeers). That fits well, doesn't it? If you look at the last column (authDegree), you will notice that this value is significantly lower for Elizabeth Sarnoff, Robin Green, and Naoko Takeuchi than for C.S. Lewis or Alexandre Dumas. So maybe their works are currently rather popular, but in total the established writers of the past seem to have more cultural impact.
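
To make the "GrandTotal" idea a bit more tangible, here is a minimal sketch of the aggregation, assuming that GrandTotal is nothing more than the sum of the indegrees of an author's works. The rows and the function name `rank_by_grand_total` are made up for illustration; they are not DBpedia values and not the actual query behind [6].

```python
from collections import defaultdict

# Hypothetical (author, work, indegree) rows. The numbers are invented
# for illustration and are not DBpedia values.
ROWS = [
    ("Author A", "Work 1", 120),
    ("Author A", "Work 2", 30),
    ("Author B", "Work 3", 400),
    ("Author C", "Work 4", 10),
    ("Author C", "Work 5", 15),
    ("Author C", "Work 6", 5),
]


def rank_by_grand_total(rows):
    """Return (author, num_works, grand_total) tuples, highest total first."""
    works = defaultdict(set)
    totals = defaultdict(int)
    for author, work, indegree in rows:
        if work not in works[author]:  # count each work only once
            works[author].add(work)
            totals[author] += indegree
    ranking = [(a, len(works[a]), totals[a]) for a in works]
    return sorted(ranking, key=lambda r: r[2], reverse=True)


for author, num_works, grand_total in rank_by_grand_total(ROWS):
    print(f"{author}\t{num_works}\t{grand_total}")
```

With these toy numbers the author with a single, heavily linked work ends up on top, which is the same effect that pushes the screenwriters so far up in the table.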

As a last question - and then I won't bother you with these statistics again - let's reorder the current table according to the author's general impact (authDegree) [7].

name numOfWorks popularityOfWorks GrandTotal authDegree
"Charles Dickens"@en 8 398.7 3190 4026
"Stephen King"@en 7 146.4 1105 2906
"J. R. R. Tolkien"@en 3 819.2 2964 2859
"Johann Wolfgang von Goethe"@en 5 73.8 363 2734
"Cicero"@en 1 23.0 23 2469
"Mark Twain"@en 2 333.6 615 2426
"Oscar Wilde"@en 2 486.3 1032 2324
"Isaac Asimov"@en 4 123.7 607 2026
"H. P. Lovecraft"@en 3 101.3 304 2024
"T. S. Eliot"@en 3 169.8 477 1995
"Leo Tolstoy"@en 2 374.5 749 1927
"Arthur Conan Doyle"@en 1 130.0 130 1882
"Rudyard Kipling"@en 3 196.2 520 1829
"George Orwell"@en 4 363.7 1309 1687
"Henrik Ibsen"@en 4 299.7 1199 1630
"Neil Gaiman"@en 5 89.1 424 1589
"Lewis Carroll"@en 2 939.0 1878 1576
"C. S. Lewis"@en 6 105.4 881 1530
"Rabindranath Tagore"@en 4 67.8 276 1530
"Alan Moore"@en 1 17.0 17 1480
"Robert Louis Stevenson"@en 3 448.0 1344 1457
"Samuel Beckett"@en 8 83.2 889 1414
"Alexander Pushkin"@en 2 131.0 262 1372
"Philip K. Dick"@en 4 95.0 380 1259
"Arthur C. Clarke"@en 2 72.5 145 1252
"Ray Bradbury"@en 3 129.0 387 1249
"Robert E. Howard"@en 3 35.0 90 1223
"Jonathan Swift"@en 2 303.5 607 1190
"William Wordsworth"@en 1 79.0 79 1164
"Henry James"@en 6 74.0 444 1128
"Alexandre Dumas"@en 2 427.5 855 1125
"William S. Burroughs"@en 1 159.0 159 1119
"Arthur Miller"@en 4 319.2 1277 1077
"Jack Kerouac"@en 3 123.0 369 1034
"Terry Pratchett"@en 2 292.6 809 1032
"Virginia Woolf"@en 3 89.6 269 1026
"Vladimir Nabokov"@en 4 157.1 535 1015
"Miguel de Cervantes"@en 2 611.0 1222 992
"John Steinbeck"@en 3 218.0 654 984
"J. K. Rowling"@en 1 1829.0 1829 983

Now we also see authors who didn't show up in the first list because their individual works were somehow too insignificant, although their overall impact wasn't. Goethe, Mark Twain, but also Stephen King, Cicero, or Arthur Conan Doyle are now among the top-ranked authors. But there are no writers of TV shows or manga.
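
The effect of the reordering can be reproduced with a handful of rows copied from the two tables. This is only a sketch of the sorting step, not the query behind [7]; the helper `order_by` is a hypothetical name.

```python
# (name, GrandTotal, authDegree), values copied from the tables in this post.
ROWS = [
    ("Charles Dickens", 3190, 4026),
    ("J. R. R. Tolkien", 2964, 2859),
    ("Elizabeth Sarnoff", 2457, 99),
    ("Robin Green", 2277, 130),
    ("J. K. Rowling", 1829, 983),
    ("Stephen King", 1105, 2906),
    ("Cicero", 23, 2469),
]


def order_by(rows, column):
    """Return the author names sorted by the given column, highest first."""
    return [r[0] for r in sorted(rows, key=lambda r: r[column], reverse=True)]


by_grand_total = order_by(ROWS, 1)   # impact of the works
by_auth_degree = order_by(ROWS, 2)   # impact of the author alone

print(by_grand_total)
print(by_auth_degree)
```

Ordered by GrandTotal, Sarnoff and Green come right after Dickens and Tolkien; ordered by authDegree, they drop to the bottom of this small sample, while Stephen King and Cicero move up.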

Well, you might ask why it should make sense to compile statistics like these when you get so many different and, in the end, confusing results. Which one should we trust? If you ask me, trust none of them. All data is insufficient and doesn't reflect the "whole story", especially in DBpedia (which is only a fraction of all the information available in Wikipedia, which in turn reflects only one (or some) specific viewpoints of reality). Thus the old saying is reinforced again: Don't ever trust statistics that you haven't falsified yourself.


[3] Number of all books in DBpedia -> ?book rdf:type dbpedia:Book .
[4] Number of books by counting what is authored -> ?book dbpedia-owl:author ?author .
[5] Number of books by counting what is authored, using dbpedia-owl:author or dbprop:author
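
For those who want to try the counts themselves, the triple patterns from [3], [4] and [5] could be wrapped into count queries roughly like this. It is a sketch under assumptions: the source does not say which namespace the prefix `dbpedia:` is bound to (here it is assumed to be the ontology namespace, where the Book class lives), and the use of COUNT(DISTINCT ...), the UNION for [5], the variable `?n` and the helper `count_query` are my guesses, not the original queries. The numbers you get today will differ from the ones above.

```python
# Prefix bindings. The binding of "dbpedia:" is an assumption; the source
# does not state it.
PREFIXES = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbprop: <http://dbpedia.org/property/>
"""


def count_query(pattern):
    """Wrap a graph pattern into a query counting distinct ?book bindings."""
    return f"{PREFIXES}SELECT (COUNT(DISTINCT ?book) AS ?n) WHERE {{ {pattern} }}"


QUERIES = {
    # [3] everything typed as a book
    "typed_books": count_query("?book rdf:type dbpedia:Book ."),
    # [4] everything that has an author (ontology property only)
    "owl_author": count_query("?book dbpedia-owl:author ?author ."),
    # [5] everything that has an author via either property (assumed UNION)
    "any_author": count_query(
        "{ ?book dbpedia-owl:author ?author . } "
        "UNION { ?book dbprop:author ?author . }"
    ),
}

for name, query in QUERIES.items():
    print(f"-- {name}\n{query}\n")
```

The printed queries can be pasted into any SPARQL endpoint that serves DBpedia data.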