Title Multilingual Resources for Offensive Language Detection
Authors Aymé Arango, Jorge Pérez, Bárbara Poblete, Valentina Proust, Magdalena Saldaña
Publication date 2022
Abstract Most of the published approaches and resources for offensive
language and hate speech detection are tailored for the English language. In
consequence, cross-lingual and cross-cultural perspectives lack some
essential resources. The lack of diversity of the datasets in Spanish is
notable. Variations throughout Spanish-speaking countries make existing
datasets not enough to encompass the task in the different Spanish variants.
We manually annotated 9834 tweets from Chile to enrich the existing Spanish
resources with different words and new targets of hate that have not been
considered in previous studies. We conducted several cross-dataset
evaluation experiments of the models published in the literature using our
Chilean dataset and two others in English and Spanish. We propose a
comparative framework for quickly conducting comparative experiments using
different previously published models. In addition, we set up a Codalab
competition for further comparison of new models in a standard scenario,
that is, data partitions and evaluation metrics. All resources can be
accessed through a centralized repository for researchers to get a complete
picture of the progress on the multilingual hate speech and offensive
language detection task.
Pages 122-130
Conference name Workshop on Online Abuse and Harms
Publisher Association for Computational Linguistics