Title Malicious Domain Names Detection with DeepDGA, a Hybrid Character and Word Embeddings Deep Learning Architecture
Authors Lucas Torrealba, Pedro Casas-Hernandez, Diego García, Javier Bustos-Jiménez, Ivana Bachmann
Publication date 2025
Abstract The rapid expansion of the Internet has enabled cybercriminal
operations at unprecedented scale. A recurring tactic is the use of
algorithmically generated domains (AGDs) created by domain generation
algorithms (DGAs) to orchestrate botnet command-and-control, host phishing
content, and distribute malware. Traditional defenses such as blocklists and
heuristic rules are brittle against new domains and evolving attacker
strategies. We present DeepDGA, a hybrid deep learning architecture that
fuses character-level and word-level representations to detect both
pseudo-random and dictionary-based DGAs. Character-level embeddings
processed by a BiLSTM capture subword patterns and entropy; word-level
embeddings derived from a dom2words tokenization and Word2Vec capture
linguistic regularities exploited by dictionary-based DGAs. Evaluations on a
public benchmark with more than 670,000 domains, including 25 DGA families
and benign top-popular domains, demonstrate the superiority of DeepDGA. The
model achieves precision and recall above 0.97 for dictionary-based DGAs,
and even higher (above 0.98) for pseudo-random DGAs, consistently
outperforming state-of-the-art methods across multiple metrics. DeepDGA's
effectiveness, particularly in detecting the more challenging
dictionary-based DGAs, highlights the benefit of combining diverse embedding
strategies into the same deep learning architecture.
Pages 1-6
Conference name International Conference on Network and Service Management
Publisher Austrian Institute of Technology
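
Illustrative sketch (not taken from the paper): the abstract describes two input views of a domain name, a character sequence fed to a BiLSTM and a word sequence produced by a dom2words tokenization. The Python below shows one way such inputs could be prepared. The character vocabulary, the names `char_ids` and `split_words`, and the fewest-pieces segmentation rule are all assumptions made for illustration; this record does not describe the actual dom2words algorithm, and `split_words` is only a stand-in for it.

```python
import string

# Assumed character vocabulary: lowercase letters, digits, and hyphen.
# Id 0 is reserved for padding; UNK_ID marks any other character.
CHARS = string.ascii_lowercase + string.digits + "-"
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(CHARS)}
UNK_ID = len(CHARS) + 1


def char_ids(label, max_len=16):
    """Map a domain label to a fixed-length list of character ids."""
    ids = [CHAR_TO_ID.get(c, UNK_ID) for c in label.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def split_words(label, vocab):
    """Split a label into the fewest pieces, where each piece is either a
    word from `vocab` or a single character (hypothetical stand-in for
    a dom2words-style tokenizer)."""
    n = len(label)
    best = [None] * (n + 1)  # best[i] = best segmentation of label[:i]
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = label[start:end]
            if piece in vocab or len(piece) == 1:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


if __name__ == "__main__":
    vocab = {"blue", "car", "house"}  # toy dictionary, for illustration only
    print(char_ids("ab-1", max_len=6))
    print(split_words("bluecarhouse", vocab))
    print(split_words("xq7", vocab))
```

Under these assumptions, a dictionary-based name such as `bluecarhouse` splits into a few whole words, while a pseudo-random name such as `xq7` falls back to single characters. That contrast is the kind of signal a word-level embedding could pick up alongside the character-level view.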