
Centre Interdisciplinaire
de Recherche et d’Innovation
en Cybersécurité et Société
1.
Ameyoud, S. Mohamed; Allili, M. Saïd
Multi-modal malware classification with hierarchical consistency and saliency-constrained adversarial training (Journal article)
In: Journal of Information Security and Applications, vol. 99, 2026, ISSN: 2214-2134.
Keywords: Adversarial training, Capability of detection, Classification (of information), Convolution, convolutional neural network, Convolutional neural networks, Detection system, Hierarchical consistency, Hierarchical systems, Malware, Malware classification, Malware classifications, Malware families, Malwares, Multi-modal, Multi-modal learning, Semantics, Vision transformer, Vision transformers
@article{mohamed_ameyoud_multi-modal_2026,
title = {Multi-modal malware classification with hierarchical consistency and saliency-constrained adversarial training},
author = {S. Mohamed Ameyoud and M. Saïd Allili},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105031186108&doi=10.1016%2Fj.jisa.2026.104429&partnerID=40&md5=2425da4ab40f9043ba4e67d223a1bdd9},
doi = {10.1016/j.jisa.2026.104429},
issn = {2214-2134},
year = {2026},
date = {2026-01-01},
journal = {Journal of Information Security and Applications},
volume = {99},
abstract = {The increasing complexity of malware, including polymorphic, obfuscated, and adversarial variants, continues to outpace the capabilities of detection systems. Here, we introduce a robust multi-modal hierarchical framework that jointly leverages visual and code-level semantics to enhance malware family and type classification. Our architecture fuses convolutional and transformer-based encoders to extract complementary representations from raw malware binaries and decompiled control-flow functions, enabling a rich, cross-modal understanding of malicious behavior. The classification pipeline follows a two-stage hierarchical protocol, where the predicted malware type informs the family-level classification. This enforces ontological consistency between type and family prediction levels. To further bolster robustness against adversarial and obfuscated malware, we integrate a novel adversarial training strategy that generates plausible perturbations guided by attention distributions. Evaluation on multiple large-scale benchmarks, including BODMAS, Malimg, Microsoft BIG 2015, and a curated set from MalwareBazaar, demonstrates that our framework consistently outperforms state-of-the-art baselines, including ResNet, Swin Transformer, and MalBERTv2, across both malware type and family prediction tasks. Notably, our model exhibits outstanding generalization to unpacked, obfuscated, and previously unseen samples, with minimal performance degradation. It achieves accuracy gains of +3-6% over leading methods and exhibits superior resilience under adversarial threat models. These results highlight the effectiveness of hierarchical conditioning, adversarial robustness, and multi-modal fusion in tackling the evolving landscape of malware. The proposed framework thus offers a scalable and generalizable approach for next-generation malware classification in real-world cybersecurity environments. © 2026 Elsevier Ltd.},
keywords = {Adversarial training, Capability of detection, Classification (of information), Convolution, convolutional neural network, Convolutional neural networks, Detection system, Hierarchical consistency, Hierarchical systems, Malware, Malware classification, Malware classifications, Malware families, Malwares, Multi-modal, Multi-modal learning, Semantics, Vision transformer, Vision transformers},
pubstate = {published},
tppubtype = {article}
}
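The abstract's two-stage hierarchical protocol, in which the predicted malware type informs the family-level classification, can be illustrated with a minimal sketch. This is not the authors' implementation: the ontology `TYPE_TO_FAMILIES`, the label names, and the function `predict_hierarchical` are all hypothetical, and hard masking of family scores is only one assumed way to enforce type/family consistency.

```python
import math

# Hypothetical type -> family ontology; labels are illustrative only.
TYPE_TO_FAMILIES = {
    "trojan": ["family_a", "family_b"],
    "worm": ["family_c"],
}


def softmax(scores):
    """Numerically stable softmax over a dict of label -> logit."""
    peak = max(scores.values())
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def predict_hierarchical(type_logits, family_logits):
    """Stage 1 picks the type; stage 2 picks a family among that type's children."""
    type_probs = softmax(type_logits)
    best_type = max(type_probs, key=type_probs.get)
    # Keep only families that belong to the predicted type (assumed hard mask).
    allowed = {f: family_logits[f] for f in TYPE_TO_FAMILIES[best_type]}
    family_probs = softmax(allowed)
    best_family = max(family_probs, key=family_probs.get)
    return best_type, best_family


type_logits = {"trojan": 2.0, "worm": 0.5}
family_logits = {"family_a": 0.1, "family_b": 0.4, "family_c": 3.0}
result = predict_hierarchical(type_logits, family_logits)
```

In this toy input, an unconstrained argmax over the family scores would select `family_c`, which belongs to a different type than the one predicted; conditioning on the type rules that combination out.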



