Title | Merging Web Tables for Relation Extraction with Knowledge Graphs |
Authors | Jhomara Luzuriaga, Emir Muñoz, Henry Rosales-Méndez, Aidan Hogan |
Publication date | 2023 |
Abstract |
We propose methods for extracting triples from Wikipedia's HTML tables using a reference knowledge graph. Our methods use a distant-supervision approach to find existing triples in the knowledge graph for pairs of entities on the same row of a table, postulating the corresponding relation for pairs of entities from other rows in the corresponding columns, thus extracting novel candidate triples. Binary classifiers are applied on these candidates to detect correct triples and thus increase the precision of the output triples. We extend this approach with a preliminary step where we first group and merge similar tables, thereafter applying extraction on the larger merged tables. More specifically, we propose an observed schema for individual tables, which is used to group and merge tables. We compare the precision and number of triples extracted with and without table merging, where we show that with merging, we can extract a larger number of triples at a similar precision. Ultimately, from the tables of English Wikipedia, we extract 5.9 million novel and unique triples for Wikidata at an estimated precision of 0.718. |
Downloaded | 33 times |
Pages | 1803-1816 |
Volume | 35 |
Journal name | IEEE Transactions on Knowledge and Data Engineering |
Publisher | IEEE Press (Piscataway, NJ, USA) |
Reference URL |
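The distant-supervision step described in the abstract can be illustrated with a minimal sketch. It assumes that table cells have already been linked to entity identifiers and that the knowledge graph is available as an in-memory set of triples. The function name `extract_candidates`, the identifiers, and the support count are hypothetical and are not taken from the paper. The table grouping and merging step and the binary classifiers are not shown.

```python
from collections import Counter, defaultdict
from itertools import permutations


def extract_candidates(table, kg):
    """Propose novel triples for one table (illustrative sketch only).

    table: list of rows, each a list of entity IDs (None for unlinked cells).
    kg:    set of (subject, predicate, object) triples.
    Returns a dict mapping each candidate triple to the number of rows
    whose existing KG triples support that predicate for the column pair.
    """
    # Index the KG by (subject, object) so row pairs can be looked up quickly.
    relations = defaultdict(set)
    for s, p, o in kg:
        relations[(s, o)].add(p)

    # For each ordered column pair, count the rows where the KG already
    # holds a triple between the two entities on that row.
    support = defaultdict(Counter)
    for row in table:
        for i, j in permutations(range(len(row)), 2):
            s, o = row[i], row[j]
            if s is None or o is None:
                continue
            for p in relations.get((s, o), ()):
                support[(i, j)][p] += 1

    # Postulate each observed predicate for the other rows of the same
    # column pair, keeping only triples the KG does not already contain.
    candidates = {}
    for (i, j), counts in support.items():
        for p, n in counts.items():
            for row in table:
                if len(row) <= max(i, j):
                    continue
                s, o = row[i], row[j]
                if s is None or o is None or (s, p, o) in kg:
                    continue
                candidates[(s, p, o)] = max(n, candidates.get((s, p, o), 0))
    return candidates


# Made-up identifiers, for illustration only.
table = [
    ["e:Chile", "e:Santiago"],
    ["e:Ecuador", "e:Quito"],
    ["e:Peru", "e:Lima"],
]
kg = {
    ("e:Chile", "p:capital", "e:Santiago"),
    ("e:Ecuador", "p:capital", "e:Quito"),
}
print(extract_candidates(table, kg))
```

In this toy input, two rows already have a `p:capital` triple in the graph, so the same predicate is postulated for the third row. In a pipeline like the one the abstract describes, such candidates would then be passed to a classifier that filters out incorrect triples.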