| Title | Malicious Domain Names Detection with DeepDGA, a Hybrid Character and Word Embeddings Deep Learning Architecture |
| Authors | Lucas Torrealba, Pedro Casas-Hernandez, Diego García, Javier Bustos-Jiménez, Ivana Bachmann |
| Publication date | 2025 |
| Abstract | The rapid expansion of the Internet has enabled cybercriminal operations at unprecedented scale. A recurring tactic is the use of algorithmically generated domains (AGDs) created by domain generation algorithms (DGAs) to orchestrate botnet command-and-control, host phishing content, and distribute malware. Traditional defenses such as blocklists and heuristic rules are brittle against new domains and evolving attacker strategies. We present DeepDGA, a hybrid deep learning architecture that fuses character-level and word-level representations to detect both pseudo-random and dictionary-based DGAs. Character-level embeddings processed by a BiLSTM capture subword patterns and entropy; word-level embeddings derived from a dom2words tokenization and Word2Vec capture linguistic regularities exploited by dictionary-based DGAs. Evaluations on a public benchmark with more than 670,000 domains, including 25 DGA families and benign top-popular domains, demonstrate the superiority of DeepDGA. The model achieves precision and recall above 0.97 for dictionary-based DGAs, and even higher (above 0.98) for pseudo-random DGAs, consistently outperforming state-of-the-art methods across multiple metrics. DeepDGA's effectiveness, particularly in detecting the more challenging dictionary-based DGAs, highlights the benefit of combining diverse embedding strategies within a single deep learning architecture. |
| Pages | 1-6 |
| Conference name | International Conference on Network and Service Management |
| Publisher | Austrian Institute of Technology |

