Title Hate Speech Detection is not as Easy as you may Think: A Closer Look at Model Validation (Extended Version)
Authors Ayme Arango, Jorge Pérez, Bárbara Poblete
Publication date 2021
Abstract Hate speech is an important problem that is seriously affecting the
dynamics and usefulness of online social communities. Large-scale social
platforms are currently investing significant resources into automatically
detecting and classifying hateful content, without much success. On the
other hand, the results reported by state-of-the-art systems indicate that
supervised approaches achieve almost perfect performance, but only within
specific datasets, most of them in English. In this work, we analyze this
apparent contradiction between the existing literature and actual
applications. We closely study the experimental methodology used in prior
work and its generalizability to other datasets. Our findings reveal
methodological issues, as well as an important dataset bias. As a
consequence, performance claims of the current state of the art have been
significantly overestimated. The problems we found are mostly related to
data overfitting and sampling issues. We discuss the implications for
current research and rerun experiments to give a more accurate picture of
current state-of-the-art methods. Moreover, we design baseline approaches
to perform cross-lingual experiments, using English and Spanish datasets.
Volume 105
Journal name Information Systems
Publisher Elsevier Science (Amsterdam, The Netherlands)
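The abstract traces the overestimated scores mainly to data overfitting and sampling issues. What follows is a generic illustration of one way a sampling step can inflate test scores. It is not the authors' code and not necessarily the specific problem they report. If minority-class posts are oversampled before the train/test split, copies of the same post can land in both partitions, so the model is partly evaluated on examples it has already seen. The function names (oversample, split, leaked_fraction) and the toy corpus are hypothetical, chosen only for this sketch.

```python
import random


def oversample(examples, seed=0):
    """Randomly duplicate minority-class examples until every class matches the largest one."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        # Draw extra copies with replacement from this class's existing examples.
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    return balanced


def split(examples, test_frac=0.2, seed=0):
    """Shuffle a copy of the examples and cut it into train and test partitions."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]


def leaked_fraction(train, test):
    """Share of test examples whose exact text also appears in the training set."""
    if not test:
        return 0.0
    train_texts = {text for text, _ in train}
    return sum(text in train_texts for text, _ in test) / len(test)


if __name__ == "__main__":
    # Hypothetical toy corpus: 90 non-hateful and 10 hateful posts, all unique strings.
    data = [(f"ok-{i}", 0) for i in range(90)] + [(f"hate-{i}", 1) for i in range(10)]

    # Flawed order: oversample first, then split, so copies of a post can end up on both sides.
    bad_train, bad_test = split(oversample(data))
    print("oversample -> split:", leaked_fraction(bad_train, bad_test))

    # Safer order: split first, then oversample only the training partition.
    train, test = split(data)
    print("split -> oversample:", leaked_fraction(oversample(train), test))
```

Running the script prints the share of test posts that also occur in the training data for each ordering. Splitting before oversampling keeps that share at 0.0 here, because the split partitions a set of unique texts and oversampling only adds copies of training posts.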