Multimodal RGB–Thermal Deep Learning for Red Palm Weevil Damage Classification in Date Palms with a Tree-Level Evaluation of Six Backbone Architectures

Jumana Mohamed Ahmed; Mohammed Melhi; A. A. Somaie

doi:10.66279/cv4ss337

Authors

Jumana Mohamed Ahmed, Beni-Suef University

Mohammed Melhi, University of Bradford

A. A. Somaie, University of Calgary; October University for Modern Sciences and Arts

Competing Interests

The authors declare that they have no competing interests.

Keywords:

Multimodal Fusion, Deep Learning, Red Palm Weevil, Damage Classification, Ensemble Learning

Abstract

The red palm weevil (Rhynchophorus ferrugineus) is the most destructive pest of date palm (Phoenix dactylifera L.), feeding inside the trunk and destroying vascular tissue well before any external symptom becomes visible. Field surveys, acoustic sensing, and single-modality imaging each address part of this early-detection problem but none jointly exploits the complementary visual and thermal symptoms that a weevil infestation produces. This study introduces a dual-branch deep learning framework that fuses paired RGB and thermal photographs of individual date palms for four-class health classification (non-infected, infected, badly damaged, dead). Six convolutional and transformer backbones, namely ResNet-50, EfficientNet-B0, ConvNeXt-Tiny, Xception, ViT-Small, and ConvNeXt V2-Tiny, are trained under one identical protocol that combines a multi-scale feature-pyramid fusion neck, thermal-guided spatial attention, efficient channel attention, class-balanced focal loss, prototype-based contrastive regularization, curriculum-scheduled mixup and CutMix, and a stochastic weight averaging tail phase. Evaluation uses a leakage-free, tree-level split of 179 field-surveyed palms (125 train, 27 validation, 27 test trees) with 95% bootstrap confidence intervals and paired McNemar and DeLong significance testing. The strongest individual backbone, ConvNeXt-Tiny, reaches 74.6% test accuracy and a macro-F1 of 0.764; a calibration-weighted ensemble of all six backbones reaches 76.2% accuracy with a macro-F1 of 0.756 (95% bootstrap interval, 0.65 to 0.86 for accuracy). Ensembling yields a statistically significant gain over the two weakest backbones but not over the stronger individual models at this sample size, indicating that multimodal fusion and ensembling provide measurable, architecture-dependent gains that remain modest on a dataset of this scale.
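The evaluation protocol described in the abstract (a leakage-free tree-level split, bootstrap confidence intervals, and paired McNemar testing) can be illustrated with a minimal sketch. The abstract does not state which bootstrap or McNemar variant was used, so the code below assumes a percentile bootstrap and the exact binomial form of McNemar's test; the function names, the seed handling, and the idea of a per-image list of tree identifiers are hypothetical and are not taken from the paper.

```python
import random
from math import comb


def tree_level_split(tree_ids, n_train, n_val, n_test, seed=0):
    """Assign whole trees to train/val/test so no tree spans two subsets.

    tree_ids holds one tree identifier per sample (hypothetical layout).
    Returns a dict mapping subset name to a list of sample indices.
    """
    trees = sorted(set(tree_ids))
    if n_train + n_val + n_test != len(trees):
        raise ValueError("split sizes must sum to the number of unique trees")
    rng = random.Random(seed)
    rng.shuffle(trees)
    groups = {
        "train": set(trees[:n_train]),
        "val": set(trees[n_train:n_train + n_val]),
        "test": set(trees[n_train + n_val:]),
    }
    # Every sample follows its tree, which is what prevents leakage.
    return {
        name: [i for i, t in enumerate(tree_ids) if t in members]
        for name, members in groups.items()
    }


def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy (assumed variant)."""
    rng = random.Random(seed)
    n = len(y_true)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    stats = []
    for _ in range(n_boot):
        # Resample test cases with replacement and recompute accuracy.
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(resample) / n)
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return sum(correct) / n, lo, hi


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value computed from discordant pairs."""
    only_a = sum(1 for t, a, b in zip(y_true, pred_a, pred_b)
                 if a == t and b != t)
    only_b = sum(1 for t, a, b in zip(y_true, pred_a, pred_b)
                 if a != t and b == t)
    n = only_a + only_b
    if n == 0:
        return 1.0  # the two models never disagree on correctness
    k = min(only_a, only_b)
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)
```

With the split sizes reported in the abstract, a call such as `tree_level_split(tree_ids, 125, 27, 27)` would keep all images of a given palm in a single subset. On a small test set, the exact McNemar form depends only on the cases where the two models disagree, which is consistent with the abstract's observation that differences between the stronger models do not reach significance at this sample size.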

Author Biographies

Jumana Mohamed Ahmed, Beni-Suef University

Computer Science and Artificial Intelligence, Beni-Suef University, Beni-Suef City, Egypt.

Mohammed Melhi, University of Bradford

PhD, University of Bradford, UK; Chief Executive Officer (CEO) of Rushd AI Company and Amatrix AI Company, Riyadh, KSA.

A. A. Somaie, University of Calgary; October University for Modern Sciences and Arts

PhD, University of Bradford, UK; Post-Doctoral Fellow (PDF) and Research Associate, University of Calgary, Canada; Software Engineering (SE) Program, Faculty of Computer Science, October University for Modern Sciences and Arts (MSA), 6 October, Giza, Egypt.

