Title NLP Modeling Recommendations for Restricted Data Availability in Clinical Settings
Authors Fabian Villena, Felipe Bravo-Marquez, Jocelyn Dunstan
Publication date March 2025
Abstract
Background.
Clinical decision-making in healthcare often relies on unstructured text
data, which can be challenging to analyze using traditional methods. Natural
Language Processing (NLP) has emerged as a promising solution, but its
application in clinical settings is hindered by restricted data availability
and the need for domain-specific knowledge.

Methods.
We conducted an experimental analysis to evaluate the performance of various
NLP modeling paradigms on multiple clinical NLP tasks in Spanish. These
tasks included referral prioritization and referral specialty
classification. We simulated three clinical settings with varying levels of
data availability and evaluated the performance of four foundation models.

Results.
Clinical-specific pre-trained language models (PLMs) achieved the highest
performance across tasks. For referral prioritization, clinical PLMs
attained an 88.85% macro F1 score when fine-tuned. In referral specialty
classification, the same models achieved a 53.79% macro F1 score,
surpassing domain-agnostic models. Continuing pre-training with
environment-specific data improved model performance, but the gains were
marginal compared to the computational resources required. Few-shot learning
with large language models (LLMs) demonstrated lower performance but showed
potential in data-scarce scenarios.

Conclusions.
Our study provides evidence-based recommendations for clinical NLP
practitioners on selecting modeling paradigms based on data availability. We
highlight the importance of considering data availability, task complexity,
and institutional maturity when designing and training clinical NLP models.
Our findings can inform the development of effective clinical NLP solutions
in real-world settings.
Pages article 116
Volume 25
Journal name BMC Medical Informatics and Decision Making
Publisher BioMed Central (London, UK)