So we were stuck with the following problem: The Lord of the Rings is (of course) a notable work of J.R.R. Tolkien, but DBpedia said that The Lord of the Rings is not a "book" but consists of 3 books [1,2]. As a consequence, filtering the "notable works" by type "book" cuts out book series. If, on the other hand, we don't filter the "notable works" at all, our result list will also contain paintings, photographs, sculptures, etc.
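The dilemma can be reproduced with a few hand-written triples. This is a toy sketch, not DBpedia data: the work titles come from the text, but the predicate names (`notableWork`, `type`), the type labels (`Book`, `BookSeries`, `Painting`) and the painter entry are hypothetical stand-ins for DBpedia's actual vocabulary.

```python
# Hypothetical toy triples (subject, predicate, object); predicate names,
# type labels, and "Some Painter"/"Some Painting" are illustrative only.
TRIPLES = [
    ("Tolkien", "notableWork", "The Hobbit"),
    ("Tolkien", "notableWork", "The Silmarillion"),
    ("Tolkien", "notableWork", "The Lord of the Rings"),
    ("The Hobbit", "type", "Book"),
    ("The Silmarillion", "type", "Book"),
    ("The Lord of the Rings", "type", "BookSeries"),  # a series, not a single book
    ("Some Painter", "notableWork", "Some Painting"),
    ("Some Painting", "type", "Painting"),
]


def notable_works(triples, allowed_types=None):
    """Return (creator, work) pairs linked by 'notableWork'.

    If allowed_types is given, keep only works whose 'type' is in that set.
    """
    work_type = {s: o for s, p, o in triples if p == "type"}
    return [
        (s, o)
        for s, p, o in triples
        if p == "notableWork"
        and (allowed_types is None or work_type.get(o) in allowed_types)
    ]


books_only = notable_works(TRIPLES, {"Book"})  # the series is filtered out
everything = notable_works(TRIPLES)            # the painting slips in
print(books_only)
print(everything)
```

Either way one answer is wrong: the type filter loses The Lord of the Rings, and dropping it lets non-literary works into the list.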
Let's find out about books in DBpedia in general. How many are there, anyway? If we simply ask for all entities of type "book", we currently end up with 28,128 books [3]. If we ask for all things that have an author, we end up with 63,071 [4] or 71,046 [5], depending on how we ask. OK, not everything that is authored by somebody is also a book. There might be short stories, essays, and articles, but also book series. Fine, let's say we don't care and include all these kinds of written works. Moreover, let's also consider the overall impact of all the written works of an author, by simply summing up the indegrees (= popularity) of her works. The following table shows the top 40 authors based on all their written works, ordered by overall impact (GrandTotal) [6]:
name | numOfWorks | popularityOfWorks | GrandTotal | authDegree |
---|---|---|---|---|
"Charles Dickens"@en | 8 | 398.7 | 3190 | 4026 |
"J. R. R. Tolkien"@en | 3 | 819.2 | 2964 | 2859 |
"Elizabeth Sarnoff"@en | 1 | 2457.0 | 2457 | 99 |
"Robin Green"@en | 1 | 2277.0 | 2277 | 130 |
"Lewis Carroll"@en | 2 | 939.0 | 1878 | 1576 |
"J. K. Rowling"@en | 1 | 1829.0 | 1829 | 983 |
"Michael Stewart (playwright)"@en | 6 | 233.6 | 1402 | 120 |
"Robert Louis Stevenson"@en | 3 | 448.0 | 1344 | 1457 |
"George Orwell"@en | 4 | 363.7 | 1309 | 1687 |
"Arthur Miller"@en | 4 | 319.2 | 1277 | 1077 |
"Miguel de Cervantes"@en | 2 | 611.0 | 1222 | 992 |
"Henrik Ibsen"@en | 4 | 299.7 | 1199 | 1630 |
"Bram Stoker"@en | 1 | 1145.0 | 1145 | 717 |
"Stephen King"@en | 7 | 146.4 | 1105 | 2906 |
"Oscar Wilde"@en | 2 | 486.3 | 1032 | 2324 |
"Samuel Beckett"@en | 8 | 83.2 | 889 | 1414 |
"C. S. Lewis"@en | 6 | 105.4 | 881 | 1530 |
"Naoko Takeuchi"@en | 1 | 875.0 | 875 | 133 |
"Alexandre Dumas"@en | 2 | 427.5 | 855 | 1125 |
"Roald Dahl"@en | 16 | 53.8 | 834 | 855 |
"Tony Barwick"@en | 4 | 202.2 | 809 | 114 |
"Terry Pratchett"@en | 2 | 292.6 | 809 | 1032 |
"Jeremy Lloyd"@en | 6 | 129.0 | 774 | 251 |
"Jimmy Perry"@en | 2 | 382.5 | 765 | 84 |
"Victoria Morrow"@en | 1 | 760.0 | 760 | 12 |
"Leo Tolstoy"@en | 2 | 374.5 | 749 | 1927 |
"Dan Brown"@en | 3 | 224.3 | 673 | 387 |
"John Steinbeck"@en | 3 | 218.0 | 654 | 984 |
"Mark Twain"@en | 2 | 333.6 | 615 | 2426 |
"Isaac Asimov"@en | 4 | 123.7 | 607 | 2026 |
"Jonathan Swift"@en | 2 | 303.5 | 607 | 1190 |
"Joseph Conrad"@en | 11 | 59.4 | 603 | 909 |
"Tsugumi Ohba"@en | 2 | 283.0 | 566 | 62 |
"Vladimir Nabokov"@en | 4 | 157.1 | 535 | 1015 |
"Yoshihiro Togashi"@en | 2 | 266.5 | 533 | 70 |
"Ryukishi07"@en | 2 | 264.5 | 529 | 57 |
"Aldous Huxley"@en | 3 | 175.3 | 526 | 968 |
"Dustin Lance Black"@en | 2 | 262.5 | 525 | 159 |
"Rudyard Kipling"@en | 3 | 196.2 | 520 | 1829 |
"Charlotte Brontë"@en | 2 | 258.0 | 516 | 510 |
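The aggregation behind such a table can be sketched as follows, assuming the query has already returned one (author, work, indegree) row per written work. The rows and the function name are made-up placeholders, not DBpedia values, and popularityOfWorks is read here as the mean indegree per work. That reading is our assumption: the published table does not always satisfy numOfWorks × popularityOfWorks = GrandTotal (for Tolkien, 3 × 819.2 ≈ 2458, not 2964), so the original query presumably counted works or rows somewhat differently.

```python
from collections import defaultdict

# Hypothetical query result: one (author, work, indegree) row per written work.
ROWS = [
    ("Author A", "Work 1", 300),
    ("Author A", "Work 2", 100),
    ("Author B", "Work 3", 500),
    ("Author C", "Work 4", 50),
    ("Author C", "Work 5", 40),
    ("Author C", "Work 6", 30),
]


def author_impact(rows):
    """Per author: (name, numOfWorks, mean work indegree, GrandTotal = summed indegree)."""
    per_author = defaultdict(list)
    for author, _work, indegree in rows:
        per_author[author].append(indegree)
    table = [
        (author, len(degs), sum(degs) / len(degs), sum(degs))
        for author, degs in per_author.items()
    ]
    # Order by overall impact (GrandTotal), highest first.
    return sorted(table, key=lambda row: row[3], reverse=True)


for row in author_impact(ROWS):
    print(row)
```

With these placeholder rows, Author B's single well-linked work outranks Author A's two works: summing indegrees rewards one very popular work just as much as several moderately popular ones.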
The column "authDegree" denotes the popularity of the author alone, measured by the indegree of the author's Wikipedia article. We notice that the situation has changed since our last experiment. Tolkien is now listed with 3 works, which means that The Lord of the Rings is now included in his list of notable works. Interestingly, he now shares the lead with Charles Dickens, followed by Elizabeth Sarnoff and Robin Green, who are each credited with only one notable work. Never heard of the latter two? Well, Elizabeth Sarnoff is a writer for TV series such as Lost, while Robin Green was a writer and producer of The Sopranos. By widening the category beyond "books", we now also have screenwriters in our list, and the popularity of TV series seems to be rather significant, at least compared to literature.
The same holds for Naoko Takeuchi. Ever heard of her? Maybe not. But you probably know Sailor Moon, a very popular Japanese manga series. Yes, we now also include what belongs to our contemporary literary culture: TV series and comics. Naoko Takeuchi ranks right between C. S. Lewis (The Chronicles of Narnia) and Alexandre Dumas (The Three Musketeers). That fits well, doesn't it? If you look at the last column (authDegree), you will notice that this value is significantly lower for Elizabeth Sarnoff, Robin Green, and Naoko Takeuchi than for C. S. Lewis or Alexandre Dumas. So their works may be quite popular at the moment, but overall the established writers of the past seem to have the greater cultural influence.
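The authDegree column is an indegree: the number of Wikipedia articles that link to the author's article. A minimal sketch over a hypothetical list of (linking article, linked article) pairs; the link list is invented for illustration, and counting each linking article only once is our assumption about how duplicates are treated.

```python
from collections import Counter

# Hypothetical page links: (linking article, linked article).
LINKS = [
    ("Great Expectations", "Charles Dickens"),
    ("Oliver Twist", "Charles Dickens"),
    ("Victorian literature", "Charles Dickens"),
    ("Lost (TV series)", "Elizabeth Sarnoff"),
    ("Oliver Twist", "Charles Dickens"),  # duplicate link, counted once
]


def indegree(links):
    """Count the distinct articles linking to each target article."""
    return Counter(target for _source, target in set(links))


degrees = indegree(LINKS)
print(degrees["Charles Dickens"], degrees["Elizabeth Sarnoff"])
```

Because `Counter` returns 0 for unseen keys, authors without any incoming link simply get an indegree of 0.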
As a last question (and then I won't bother you with these statistics again), let's rank the authors by their general impact (authDegree) instead [7].
name | numOfWorks | popularityOfWorks | GrandTotal | authDegree |
---|---|---|---|---|
"Charles Dickens"@en | 8 | 398.7 | 3190 | 4026 |
"Stephen King"@en | 7 | 146.4 | 1105 | 2906 |
"J. R. R. Tolkien"@en | 3 | 819.2 | 2964 | 2859 |
"Johann Wolfgang von Goethe"@en | 5 | 73.8 | 363 | 2734 |
"Cicero"@en | 1 | 23.0 | 23 | 2469 |
"Mark Twain"@en | 2 | 333.6 | 615 | 2426 |
"Oscar Wilde"@en | 2 | 486.3 | 1032 | 2324 |
"Isaac Asimov"@en | 4 | 123.7 | 607 | 2026 |
"H. P. Lovecraft"@en | 3 | 101.3 | 304 | 2024 |
"T. S. Eliot"@en | 3 | 169.8 | 477 | 1995 |
"Leo Tolstoy"@en | 2 | 374.5 | 749 | 1927 |
"Arthur Conan Doyle"@en | 1 | 130.0 | 130 | 1882 |
"Rudyard Kipling"@en | 3 | 196.2 | 520 | 1829 |
"George Orwell"@en | 4 | 363.7 | 1309 | 1687 |
"Henrik Ibsen"@en | 4 | 299.7 | 1199 | 1630 |
"Neil Gaiman"@en | 5 | 89.1 | 424 | 1589 |
"Lewis Carroll"@en | 2 | 939.0 | 1878 | 1576 |
"C. S. Lewis"@en | 6 | 105.4 | 881 | 1530 |
"Rabindranath Tagore"@en | 4 | 67.8 | 276 | 1530 |
"Alan Moore"@en | 1 | 17.0 | 17 | 1480 |
"Robert Louis Stevenson"@en | 3 | 448.0 | 1344 | 1457 |
"Samuel Beckett"@en | 8 | 83.2 | 889 | 1414 |
"Alexander Pushkin"@en | 2 | 131.0 | 262 | 1372 |
"Philip K. Dick"@en | 4 | 95.0 | 380 | 1259 |
"Arthur C. Clarke"@en | 2 | 72.5 | 145 | 1252 |
"Ray Bradbury"@en | 3 | 129.0 | 387 | 1249 |
"Robert E. Howard"@en | 3 | 35.0 | 90 | 1223 |
"Jonathan Swift"@en | 2 | 303.5 | 607 | 1190 |
"William Wordsworth"@en | 1 | 79.0 | 79 | 1164 |
"Henry James"@en | 6 | 74.0 | 444 | 1128 |
"Alexandre Dumas"@en | 2 | 427.5 | 855 | 1125 |
"William S. Burroughs"@en | 1 | 159.0 | 159 | 1119 |
"Arthur Miller"@en | 4 | 319.2 | 1277 | 1077 |
"Jack Kerouac"@en | 3 | 123.0 | 369 | 1034 |
"Terry Pratchett"@en | 2 | 292.6 | 809 | 1032 |
"Virginia Woolf"@en | 3 | 89.6 | 269 | 1026 |
"Vladimir Nabokov"@en | 4 | 157.1 | 535 | 1015 |
"Miguel de Cervantes"@en | 2 | 611.0 | 1222 | 992 |
"John Steinbeck"@en | 3 | 218.0 | 654 | 984 |
"J. K. Rowling"@en | 1 | 1829.0 | 1829 | 983 |
Now we also see authors who did not show up in the first list because the works DBpedia records for them score only moderately, although the authors themselves are highly influential: Goethe, Cicero, and Arthur Conan Doyle now rank among the top authors, while Stephen King climbs from 14th to 2nd place and Mark Twain from 29th to 6th. What we don't see anymore are TV writers or manga authors.
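To see why the two rankings surface different authors, one can rank the same rows by both columns and compare the top k. The sketch uses the GrandTotal and authDegree values of seven authors copied from the two tables; the cutoff k = 4 and the helper name `top_k` are arbitrary choices for illustration.

```python
# (GrandTotal, authDegree) for a few authors, copied from the tables above.
AUTHORS = {
    "Charles Dickens": (3190, 4026),
    "J. R. R. Tolkien": (2964, 2859),
    "Elizabeth Sarnoff": (2457, 99),
    "Robin Green": (2277, 130),
    "Stephen King": (1105, 2906),
    "Johann Wolfgang von Goethe": (363, 2734),
    "Cicero": (23, 2469),
}


def top_k(authors, column, k):
    """Names of the k authors with the highest value in the given column (0 or 1)."""
    ranked = sorted(authors, key=lambda name: authors[name][column], reverse=True)
    return set(ranked[:k])


by_works = top_k(AUTHORS, 0, 4)   # ranked by GrandTotal
by_author = top_k(AUTHORS, 1, 4)  # ranked by authDegree
print("only by works: ", sorted(by_works - by_author))
print("only by author:", sorted(by_author - by_works))
```

Even on this small subset, the screenwriters drop out and Stephen King and Goethe move in as soon as the ranking key changes from the works' popularity to the author's own.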
Well, you might ask why it makes sense to compile statistics like these when you get so many different and, in the end, confusing results. Which one should we trust? If you ask me, none of them. All the data is incomplete and doesn't tell the "whole story", especially in DBpedia, which is only a fraction of all the information available in Wikipedia, which in turn reflects only one (or a few) specific viewpoints of reality. Thus the old saying is confirmed once again: never trust a statistic that you haven't falsified yourself.
[3] Number of all books in DBpedia: ?book rdf:type dbpedia:Book .
[4] Number of books, counting everything that has an author: ?book dbpedia-owl:author ?author .
[5] Number of books, counting everything that has an author via either dbpedia-owl:author or dbprop:author
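Footnotes [4] and [5] differ only in which properties count as "has an author". A toy sketch shows why accepting either property can only raise the count: the distinct subjects matched by a union of properties are a superset of those matched by one property alone. The triples are invented; only the two property names come from the footnotes, and counting distinct subjects (rather than result rows) is our assumption.

```python
# Hypothetical triples; only the two property names are taken from the footnotes.
TRIPLES = [
    ("Work 1", "dbpedia-owl:author", "Author A"),
    ("Work 2", "dbpedia-owl:author", "Author B"),
    ("Work 2", "dbprop:author", "Author B"),  # same work, both properties
    ("Work 3", "dbprop:author", "Author C"),  # only dbprop:author
]


def count_authored(triples, properties):
    """Count distinct subjects that carry at least one of the given author properties."""
    return len({s for s, p, _o in triples if p in properties})


print(count_authored(TRIPLES, {"dbpedia-owl:author"}))                   # like [4]
print(count_authored(TRIPLES, {"dbpedia-owl:author", "dbprop:author"}))  # like [5]
```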