| Title | QAWiki: A Knowledge Graph Question Answering & SPARQL Query Generation Dataset for Wikidata |
| Authors | Alberto Moya Loustaunau, Aidan Hogan |
| Publication date | 2025 |
| Abstract | In this resource paper, we present QAWiki: a multilingual, handcrafted, knowledge graph question answering and SPARQL query generation dataset for Wikidata. QAWiki consists of 526 questions over Wikidata, of which 518 are associated with SPARQL queries, and 8 are disambiguation questions. Each question is presented in both English and Spanish, and includes paraphrased versions of the question, as well as annotations of entity and relation mentions for Wikidata. The dataset is hosted in a Wikibase instance, which allows for collaborative editing and refinement of the dataset by the community, among other features. Further metadata include tagging questions with issues (e.g., incompleteness, imprecision, ambiguity) as well as defining relations between questions (e.g., a question whose answers are contained in another question, etc.). QAWiki can thus be used as an evaluation (and training) dataset for knowledge graph question answering & query generation systems. We provide illustrative experiments over QAWiki using GPT 4o to generate SPARQL queries over Wikidata, comparing performance with and without passing entity mentions to the model via the prompt. |
| Conference name | Wikidata Workshop |
| Publisher | CEUR Publications |
| Reference URL | |

