Tuesday, July 27, 2010

There are more Things in Heaven and Earth... - DBPedia Link Graph Analysis Revisited

In the course of our ongoing work with Linked Open Data, we recently analyzed the graph structure of DBPedia data. For this, we considered only the original link graph (aka 'wikilinks'), which we first cleaned up, e.g., by resolving redirects.
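The cleanup itself isn't spelled out here, so what follows is only a minimal sketch of what redirect resolution on the wikilink graph could look like. It assumes the wikilinks and redirects are read from N-Triples dumps. The predicate IRIs below are assumptions (check them against the dump you actually load). Dropping self-loops and duplicate links is an illustrative choice, not necessarily the exact cleanup we applied.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]

# Assumed predicate IRIs; verify them against the DBpedia dump you actually load.
WIKILINK = "http://dbpedia.org/property/wikilink"
REDIRECT = "http://dbpedia.org/property/redirect"

# Matches simple N-Triples lines "<s> <p> <o> ." whose terms are all IRIs.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.$")


def read_pairs(lines: Iterable[str], predicate: str) -> List[Edge]:
    """Collect (subject, object) pairs for one predicate; other lines are skipped."""
    pairs: List[Edge] = []
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match and match.group(2) == predicate:
            pairs.append((match.group(1), match.group(3)))
    return pairs


def resolve(entity: str, redirects: Dict[str, str]) -> str:
    """Follow a redirect chain to its final target, stopping if it loops."""
    seen: Set[str] = set()
    while entity in redirects and entity not in seen:
        seen.add(entity)
        entity = redirects[entity]
    return entity


def clean_links(links: Iterable[Edge], redirects: Dict[str, str]) -> List[Edge]:
    """Map both ends of each link through the redirects, then drop
    self-loops and duplicate links (illustrative cleanup rules)."""
    cleaned: Set[Edge] = set()
    for source, target in links:
        s, t = resolve(source, redirects), resolve(target, redirects)
        if s != t:
            cleaned.add((s, t))
    return sorted(cleaned)


# Toy dump: one redirect, a link to the redirect, a duplicate, and a self-loop.
R = "http://dbpedia.org/resource/"
dump = [
    f"<{R}UK> <{REDIRECT}> <{R}United_Kingdom> .",
    f"<{R}London> <{WIKILINK}> <{R}UK> .",
    f"<{R}London> <{WIKILINK}> <{R}United_Kingdom> .",
    f"<{R}UK> <{WIKILINK}> <{R}UK> .",
]
redirects = dict(read_pairs(dump, REDIRECT))
links = clean_links(read_pairs(dump, WIKILINK), redirects)
print(links)
# [('http://dbpedia.org/resource/London', 'http://dbpedia.org/resource/United_Kingdom')]
```

Using a set for the cleaned links means a page that links both to a redirect and to its target is counted only once.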

Along the way, we had to compute the in-degree and out-degree of every DBPedia entity in the wikilinks graph, and we discovered some more or less surprising facts (thanks to Nadine):
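Computing both degrees takes a single pass over the cleaned link list. Below is a minimal sketch on made-up toy data (the entity Jane_Doe and all edges are hypothetical). It assumes each distinct link appears once in the list.

```python
from collections import Counter
from typing import Iterable, Tuple


def degrees(links: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Count in-degree and out-degree for a directed list of (source, target) links."""
    in_deg: Counter = Counter()
    out_deg: Counter = Counter()
    for source, target in links:
        out_deg[source] += 1  # one outgoing link for the page that links
        in_deg[target] += 1   # one incoming link for the page linked to
    return in_deg, out_deg


# Toy, already-cleaned links (made-up data, one entry per distinct link).
links = [
    ("France", "Germany"),
    ("Germany", "France"),
    ("Berlin", "Germany"),
    ("Jane_Doe", "Living_people"),
]
in_deg, out_deg = degrees(links)
print(in_deg["Germany"], out_deg["Germany"])              # 2 1
print(in_deg["Living_people"], out_deg["Living_people"])  # 1 0 (missing keys count as 0)
```

An entity like Living_people can end up with out-degree 0 simply because it never appears as a link source. The Counter then reports 0 for it without storing an entry.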

The entity with the highest out-degree (i.e., number of outgoing links) is currently:
with 7,147 outlinks (after cleanup; remember that these are wikilinks, not DBPedia's typed links)

The entity with the highest in-degree (i.e., number of incoming links) is currently:
with 440,151 inlinks (after cleanup)

While the second one (living people) seemed fairly obvious to me, the first (Afghanistan places...) came as a bit of a surprise (as did the trilobites...). For all the explorers among us, I have included the top ten lists of entities by incoming and by outgoing wikilinks, each entity with its in-degree and out-degree:

Top Ten Incoming
in-degree  out-degree  entity
   440151           0  http://dbpedia.org/resource/Living_people
   385407         963  http://dbpedia.org/resource/United_States
   124206         759  http://dbpedia.org/resource/France
   123223        1320  http://dbpedia.org/resource/England
   121203        1152  http://dbpedia.org/resource/United_Kingdom
   114086         465  http://dbpedia.org/resource/List_of_sovereign_states
   105849         523  http://dbpedia.org/resource/Canada
   103382         889  http://dbpedia.org/resource/Germany
    98680         236  http://dbpedia.org/resource/Animal
    93555         771  http://dbpedia.org/resource/World_War_II
    90673         196  http://dbpedia.org/resource/Association_football

Top Ten Outgoing
in-degree  out-degree  entity
      917        6819  http://dbpedia.org/resource/Flora_of_New_South_Wales
        1        5503  http://dbpedia.org/resource/List_of_municipalities_of_Brazil
        4        5369  http://dbpedia.org/resource/Index_of_India-related_articles
        6        5360  http://dbpedia.org/resource/Area_codes_in_Germany
        0        5172  http://dbpedia.org/resource/IUCN_Red_List_vulnerable_species_%28Plantae%29
        6        5102  http://dbpedia.org/resource/List_of_trilobites
       24        5078  http://dbpedia.org/resource/List_of_Social_Democratic_Party_of_Germany_members
        9        5010  http://dbpedia.org/resource/List_of_French_words_of_Germanic_origin
        4        4831  http://dbpedia.org/resource/Index_of_Thailand-related_articles
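Top-k lists like the two above can be produced from the degree counters with a partial sort instead of sorting every entity. Below is a minimal sketch with made-up toy counts (not the DBPedia figures in the tables). The function name top_k and the tie-break by entity name are our own illustrative choices.

```python
import heapq
from collections import Counter
from typing import List, Tuple


def top_k(in_deg: Counter, out_deg: Counter, k: int = 10,
          by: str = "in") -> List[Tuple[str, int, int]]:
    """Return the k entities with the highest in- or out-degree as
    (entity, in-degree, out-degree), ties broken by entity name."""
    if by not in ("in", "out"):
        raise ValueError("by must be 'in' or 'out'")
    ranking = in_deg if by == "in" else out_deg
    # Negating the count turns "k largest" into "k smallest" so that ties
    # can be broken by ascending name in the same key.
    best = heapq.nsmallest(k, ranking.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(entity, in_deg[entity], out_deg[entity]) for entity, _ in best]


# Made-up toy counts, not the DBPedia figures from the tables.
in_deg = Counter({"Living_people": 3, "France": 2, "Germany": 2})
out_deg = Counter({"France": 1, "Germany": 1, "Flora_of_New_South_Wales": 4})

for entity, indeg, outdeg in top_k(in_deg, out_deg, k=2, by="in"):
    print(indeg, outdeg, entity)
# 3 0 Living_people
# 2 1 France
```

With millions of entities and k = 10, the heap keeps only the current best ten. It therefore avoids a full sort of the degree table.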

But there are more interesting things to discover ... stay tuned!

