Title Using Linked Data to Mine RDF from Wikipedia's Tables
Authors Emir Muñoz, Aidan Hogan, Alessandra Mileo
Publication date 2014
Abstract The tables embedded in Wikipedia articles contain rich,
semi-structured encyclopaedic content. However, the cumulative content of
these tables cannot be queried against. We thus propose methods to recover
the semantics of Wikipedia tables and, in particular, to extract facts from
them in the form of RDF triples. Our core method uses an existing Linked
Data knowledge-base to find pre-existing relations between entities in
Wikipedia tables, suggesting the same relations as holding for other
entities in analogous columns on different rows. We find that such an
approach extracts RDF triples from Wikipedia's tables at a raw precision of
40%. To improve the raw precision, we define a set of features for extracted
triples that are tracked during the extraction phase. Using a manually
labelled gold standard, we then test a variety of machine learning methods
for classifying correct/incorrect triples. One such method extracts 7.9
million unique and novel RDF triples from over one million Wikipedia tables
at an estimated precision of 81.5%.
Pages 533-542
Conference name ACM International Conference on Web Search and Data Mining (WSDM)
Publisher ACM Press (New York, NY, USA)
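
Illustrative sketch The core method summarised in the abstract (find relations that a knowledge base already holds between entities in two columns of a row, then suggest the same relation for the other rows) might look roughly like the following. This is a minimal sketch and not the authors' implementation: the function name `mine_table`, the in-memory set of triples standing in for the Linked Data knowledge base, the `ex:` identifiers, and the single "support ratio" score are all assumptions made for illustration. The paper tracks a richer set of features and feeds them to a trained classifier.

```python
from collections import Counter, defaultdict


def mine_table(rows, kb):
    """Suggest novel triples for one table.

    rows: list of tuples of entity identifiers (None for cells with no entity).
    kb:   set of (subject, predicate, object) triples; a hypothetical stand-in
          for the knowledge base.
    Returns a dict mapping each candidate triple to a support ratio in (0, 1].
    """
    # Index known relations by (subject, object) for quick lookup.
    relations = defaultdict(set)
    for s, p, o in kb:
        relations[(s, o)].add(p)

    # Count the rows in which the knowledge base already holds predicate p
    # between the entities in columns i and j (ordered pair).
    support = Counter()
    for row in rows:
        for i, s in enumerate(row):
            for j, o in enumerate(row):
                if i == j or s is None or o is None:
                    continue
                for p in relations.get((s, o), ()):
                    support[(i, j, p)] += 1

    # Suggest the same predicate for the remaining rows of that column pair,
    # keeping only triples the knowledge base does not already contain.
    candidates = {}
    for (i, j, p), count in support.items():
        score = count / len(rows)  # assumed feature: share of supporting rows
        for row in rows:
            if i >= len(row) or j >= len(row):
                continue
            s, o = row[i], row[j]
            if s is None or o is None:
                continue
            triple = (s, p, o)
            if triple not in kb:
                candidates[triple] = max(candidates.get(triple, 0.0), score)
    return candidates


# Hypothetical data: two rows are already related in the knowledge base,
# so the same relation is suggested for the third row.
kb = {
    ("ex:Messi", "ex:team", "ex:Barcelona"),
    ("ex:Xavi", "ex:team", "ex:Barcelona"),
}
rows = [
    ("ex:Messi", "ex:Barcelona"),
    ("ex:Xavi", "ex:Barcelona"),
    ("ex:Neymar", "ex:Barcelona"),
]
suggested = mine_table(rows, kb)
```

Here the only candidate is the triple for the third row, scored by the two of three rows that support it. In a pipeline like the one the abstract describes, such scores would be inputs to a classifier that separates correct from incorrect triples, rather than a final decision.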