Title Divide and Conquer: An Extreme Multi-Label Classification Approach for Coding Diseases and Procedures in Spanish
Authors Jose Barros, Matías Rojas, Jocelyn Dunstan, Andrés Abeliuk
Publication date 2022
Abstract Clinical coding is the task of transforming medical documents
into structured codes following a standard ontology. Since these
terminologies are composed of hundreds of codes, this problem can be
considered an Extreme Multi-label Classification task. This paper proposes a
novel neural network-based architecture for clinical coding. First, we take
full advantage of the hierarchical nature of ontologies to create clusters
based on semantic relations. Then, we use a Matcher module to assign the
probability of documents belonging to each cluster. Finally, the Ranker
calculates the probability of each code considering only the documents in
the cluster. This division allows a fine-grained differentiation within the
cluster, which cannot be addressed using a single classifier. In addition,
since most of the previous work has focused on solving this task in English,
we conducted our experiments on three clinical coding corpora in Spanish.
The experimental results demonstrate the effectiveness of our model,
achieving state-of-the-art results on two of the three datasets.
Specifically, we outperformed previous models on two subtasks of the CodiEsp
shared task: CodiEsp-D (diseases) and CodiEsp-P (procedures). Automatic
coding can profoundly impact healthcare by structuring critical information
written in free text in electronic health records.
Pages 138-147
Conference name International Workshop on Health Text Mining and Information Analysis
Publisher Association for Computational Linguistics