Multimodal RGB–Thermal Deep Learning for Red Palm Weevil Damage Classification in Date Palms with a Tree-Level Evaluation of Six Backbone Architectures

Authors

  • Jumana Mohamed Ahmed, Beni-Suef University
    Competing Interests

    The author declares that they have no competing interests.

  • Mohammed Melhi, University of Bradford
    Competing Interests

    The author declares that they have no competing interests.

  • AA Somaie, University of Calgary; October University of Modern Sciences and Arts
    Competing Interests

    The author declares that they have no competing interests.

DOI:

https://doi.org/10.66279/cv4ss337

Keywords:

Multimodal Fusion, Deep Learning, Red Palm Weevil, Damage Classification, Ensemble Learning

Abstract

The red palm weevil (Rhynchophorus ferrugineus) is the most destructive pest of date palm (Phoenix dactylifera L.): it feeds inside the trunk and destroys vascular tissue well before any external symptom becomes visible. Field surveys, acoustic sensing, and single-modality imaging each address part of this early-detection problem, but none jointly exploits the complementary visual and thermal symptoms that a weevil infestation produces. This study introduces a dual-branch deep learning framework that fuses paired RGB and thermal photographs of individual date palms for four-class health classification (non-infected, infected, badly damaged, dead). Six convolutional and transformer backbones (ResNet-50, EfficientNet-B0, ConvNeXt-Tiny, Xception, ViT-Small, and ConvNeXt V2-Tiny) are trained under a single shared protocol that combines a multi-scale feature-pyramid fusion neck, thermal-guided spatial attention, efficient channel attention, class-balanced focal loss, prototype-based contrastive regularization, curriculum-scheduled mixup and CutMix, and a stochastic weight averaging tail phase. Evaluation uses a leakage-free, tree-level split of 179 field-surveyed palms (125 training, 27 validation, and 27 test trees) with 95% bootstrap confidence intervals and paired McNemar and DeLong significance tests. The strongest individual backbone, ConvNeXt-Tiny, reaches 74.6% test accuracy and a macro-F1 of 0.764; a calibration-weighted ensemble of all six backbones reaches 76.2% accuracy and a macro-F1 of 0.756 (95% bootstrap confidence interval for accuracy: 0.65 to 0.86). The ensemble's gain is statistically significant over the two weakest backbones but not over the stronger individual models at this sample size, indicating that multimodal fusion and ensembling provide measurable, architecture-dependent gains that remain modest on a dataset of this scale.
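
The evaluation protocol named in the abstract (tree-level splitting, bootstrap confidence intervals, and a paired McNemar test) can be illustrated with a minimal standard-library sketch. This is not the authors' code. The function names, the seed handling, the choice of an exact (binomial) McNemar variant, and the decision to resample individual test items in the bootstrap are all assumptions made for illustration; the abstract does not specify these details.

```python
import random
from math import comb


def split_by_tree(tree_ids, n_val, n_test, seed=0):
    """Assign whole trees to train/val/test so that no tree spans two subsets.

    tree_ids holds one entry per image; only the unique IDs are split.
    (Hypothetical helper; a real split might also stratify by class.)
    """
    trees = sorted(set(tree_ids))
    random.Random(seed).shuffle(trees)
    test = set(trees[:n_test])
    val = set(trees[n_test:n_test + n_val])
    train = set(trees[n_test + n_val:])
    return train, val, test


def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy.

    Assumption: test items are resampled independently with replacement.
    """
    rng = random.Random(seed)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    n = len(correct)
    stats = sorted(
        sum(correct[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return sum(correct) / n, lo, hi


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value from the discordant pairs only."""
    b = sum(1 for t, a, c in zip(y_true, pred_a, pred_b) if a == t and c != t)
    c = sum(1 for t, a, c in zip(y_true, pred_a, pred_b) if a != t and c == t)
    n = b + c
    if n == 0:
        return 1.0  # the two models never disagree on correctness
    k = min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical data: 179 trees with two images each.
    ids = [f"palm_{i:03d}" for i in range(179) for _ in range(2)]
    train, val, test = split_by_tree(ids, n_val=27, n_test=27)
    print(len(train), len(val), len(test))
```

Splitting on tree identity rather than on individual images is what keeps photographs of the same palm from appearing in both training and test data. With only 27 test trees, a bootstrap interval of this kind is expected to be wide, which is consistent with the 0.65 to 0.86 range reported above.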

Author Biographies

  • Jumana Mohamed Ahmed, Beni-Suef University

    Computer Science and Artificial Intelligence, Beni-Suef University, Beni-Suef City, Egypt

  • Mohammed Melhi, University of Bradford

    PhD, University of Bradford, UK; Chief Executive Officer (CEO) of Rushd AI Company and Amatrix AI Company, Riyadh, KSA.

  • AA Somaie, University of Calgary, October University of Modern Sciences and Arts

    PhD, University of Bradford, UK; Post-Doctoral Fellow (PDF) Research Associate, University of Calgary, Canada; Software Engineering (SE) Program, Faculty of Computer Science, October University for Modern Sciences & Arts (MSA), 6 October, Giza, Egypt.

Published

30-06-2026

Data Availability Statement

The datasets generated during the current study are available at https://figshare.com/articles/dataset/A_Dataset_of_Date_Palm_Trees_Thermal_and_RGB_Images_for_Pest_Management_/25974295?file=53580344

How to Cite

Multimodal RGB–Thermal Deep Learning for Red Palm Weevil Damage Classification in Date Palms with a Tree-Level Evaluation of Six Backbone Architectures. (2026). Computational Discovery and Intelligent Systems (CDIS), 4(1), 1-19. https://doi.org/10.66279/cv4ss337
