Multimodal RGB–Thermal Deep Learning for Red Palm Weevil Damage Classification in Date Palms with a Tree-Level Evaluation of Six Backbone Architectures
DOI: https://doi.org/10.66279/cv4ss337
Keywords: Multimodal Fusion, Deep Learning, Red Palm Weevil, Damage Classification, Ensemble Learning
Abstract
The red palm weevil (Rhynchophorus ferrugineus) is the most destructive pest of the date palm (Phoenix dactylifera L.), feeding inside the trunk and destroying vascular tissue well before any external symptom becomes visible. Field surveys, acoustic sensing, and single-modality imaging each address part of this early-detection problem, but none jointly exploits the complementary visual and thermal symptoms that a weevil infestation produces. This study introduces a dual-branch deep learning framework that fuses paired RGB and thermal photographs of individual date palms for four-class health classification (non-infected, infected, badly damaged, dead). Six convolutional and transformer backbones (ResNet-50, EfficientNet-B0, ConvNeXt-Tiny, Xception, ViT-Small, and ConvNeXt V2-Tiny) are trained under a single, identical protocol that combines a multi-scale feature-pyramid fusion neck, thermal-guided spatial attention, efficient channel attention, class-balanced focal loss, prototype-based contrastive regularization, curriculum-scheduled mixup and CutMix, and a final stochastic weight averaging phase. Evaluation uses a leakage-free, tree-level split of 179 field-surveyed palms (125 training, 27 validation, and 27 test trees), in which all images of a given tree belong to exactly one split, together with 95% bootstrap confidence intervals and paired McNemar and DeLong significance tests. The strongest individual backbone, ConvNeXt-Tiny, reaches 74.6% test accuracy and a macro-F1 of 0.764. A calibration-weighted ensemble of all six backbones reaches a higher accuracy of 76.2% (95% bootstrap confidence interval: 65% to 86%) but a slightly lower macro-F1 of 0.756. The ensemble's gain is statistically significant over the two weakest backbones but not over the stronger individual models at this sample size, indicating that multimodal fusion and ensembling provide measurable, architecture-dependent gains that remain modest on a dataset of this scale.
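The abstract lists a class-balanced focal loss [12], [13] but does not state its exact form or hyperparameters. The following is a minimal sketch of one common softmax formulation, assuming the effective-number weighting of Cui et al. with beta = 0.999 and a focal exponent gamma = 2. Both values, the function names, and the per-class image counts in the usage lines are illustrative assumptions, not the study's settings.

```python
import math
from typing import List, Sequence


def class_balanced_weights(counts: Sequence[int], beta: float = 0.999) -> List[float]:
    """Per-class weights from the effective number of samples, E_n = (1 - beta^n) / (1 - beta).

    Each weight is proportional to 1 / E_n; the weights are rescaled to sum to the
    number of classes. Every count must be positive.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cb_focal_loss(logits: Sequence[float], target: int,
                  weights: Sequence[float], gamma: float = 2.0) -> float:
    """Loss for one sample: -w_y * (1 - p_y)^gamma * log(p_y)."""
    p_y = softmax(logits)[target]
    return -weights[target] * (1.0 - p_y) ** gamma * math.log(p_y)


def mean_cb_focal_loss(batch_logits: Sequence[Sequence[float]], targets: Sequence[int],
                       weights: Sequence[float], gamma: float = 2.0) -> float:
    """Average the per-sample loss over a batch."""
    losses = [cb_focal_loss(z, y, weights, gamma) for z, y in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Hypothetical image counts for non-infected, infected, badly damaged, dead.
weights = class_balanced_weights([400, 250, 120, 60])
print([round(w, 3) for w in weights])
print(mean_cb_focal_loss([[2.0, 0.5, 0.1, -1.0], [0.2, 0.1, 0.3, 1.5]], [0, 3], weights))
```

The rarest class receives the largest weight, and the (1 - p_y)^gamma factor shrinks the loss of confidently correct samples. With gamma = 0 and equal weights the function reduces to ordinary cross-entropy, which is a convenient sanity check.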
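A tree-level split is leakage-free because images are assigned to train, validation, or test by tree rather than by image, so near-duplicate views of one palm cannot appear on both sides. The sketch below illustrates the idea under stated assumptions: the function name and seed are hypothetical, and so are the three captures per tree in the usage lines; only the 125/27/27 tree counts come from the abstract.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def tree_level_split(image_tree_ids: Sequence[str], n_train: int, n_val: int,
                     n_test: int, seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Split image indices so that every tree lands in exactly one of train/val/test."""
    by_tree: Dict[str, List[int]] = defaultdict(list)
    for idx, tree in enumerate(image_tree_ids):
        by_tree[tree].append(idx)

    trees = sorted(by_tree)  # fixed order first, so the seeded shuffle is reproducible
    if n_train + n_val + n_test != len(trees):
        raise ValueError("split sizes must add up to the number of distinct trees")
    random.Random(seed).shuffle(trees)

    cut1, cut2 = n_train, n_train + n_val
    groups = (trees[:cut1], trees[cut1:cut2], trees[cut2:])
    # Expand each group of trees back into the indices of all of its images.
    train, val, test = (sorted(i for t in g for i in by_tree[t]) for g in groups)
    return train, val, test


# Hypothetical: 179 trees, each with three paired RGB/thermal captures.
ids = [f"tree{t:03d}" for t in range(179) for _ in range(3)]
train_idx, val_idx, test_idx = tree_level_split(ids, 125, 27, 27, seed=42)
print(len(train_idx), len(val_idx), len(test_idx))
```

This version shuffles trees without regard to their health class. With only 27 test trees, a stratified variant (shuffling trees within each class before cutting) would likely keep class proportions closer across splits; whether the study stratified is not stated here.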
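The reported accuracy interval and the paired model comparisons rest on two standard procedures: a percentile bootstrap for the confidence interval and McNemar's test [28] on the discordant predictions of two models scored on the same test samples. The sketch below assumes that the bootstrap resamples individual test predictions (resampling whole trees would be the stricter tree-level analogue), that 2,000 replicates are drawn, and that the exact binomial form of McNemar's test is used; these are illustrative choices, not the study's documented settings. The DeLong AUC test [27] is not sketched here.

```python
import math
import random
from typing import List, Sequence, Tuple


def bootstrap_accuracy_ci(correct: Sequence[bool], n_boot: int = 2000,
                          alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for accuracy, resampling predictions with replacement."""
    rng = random.Random(seed)
    n = len(correct)
    stats: List[float] = []
    for _ in range(n_boot):
        hits = sum(correct[rng.randrange(n)] for _ in range(n))
        stats.append(hits / n)
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def mcnemar_exact(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for two classifiers scored on the same samples."""
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)  # A right, B wrong
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)  # B right, A wrong
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    # Under H0 the discordant pairs follow Binomial(n, 0.5); double the smaller tail.
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical per-sample correctness flags for two models on the same 14 test images.
model_a = [True] * 8 + [False] + [True] * 5
model_b = [False] * 8 + [True] + [True] * 5
print(bootstrap_accuracy_ci(model_a))
print(mcnemar_exact(model_a, model_b))
```

Only the samples on which the two models disagree enter McNemar's test, which is one reason small test sets, such as 27 trees, can leave a few-point accuracy difference non-significant.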
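Efficient channel attention (ECA) [16] reweights the channels of a feature map using a single 1-D convolution over neighbouring channel descriptors instead of a bottleneck MLP. The pure-Python sketch below follows the ECA-Net formulation on one feature map; where the study inserts ECA within its fusion neck is not stated in the abstract, and the convolution weights, which would be learned during training, are passed in here as a hypothetical argument.

```python
import math
from typing import List, Sequence


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size k = |(log2(C) + b) / gamma|, rounded up to an odd integer."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca_forward(feature_map: Sequence[Sequence[float]],
                kernel: Sequence[float]) -> List[List[float]]:
    """Apply ECA to C channels, each given as a flattened list of H*W activations."""
    k = len(kernel)
    pad = k // 2
    # 1) Squeeze: global average pooling gives one descriptor per channel.
    desc = [sum(ch) / len(ch) for ch in feature_map]
    # 2) Local cross-channel interaction: 1-D convolution over neighbouring channels
    #    (zero padding, no bias, no dimensionality reduction).
    padded = [0.0] * pad + desc + [0.0] * pad
    mixed = [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(desc))]
    # 3) Sigmoid gate, then rescale every activation of each channel by its gate.
    gates = [1.0 / (1.0 + math.exp(-z)) for z in mixed]
    return [[g * v for v in ch] for g, ch in zip(gates, feature_map)]


print([eca_kernel_size(c) for c in (64, 256, 512, 2048)])
print(eca_forward([[1.0, 2.0], [3.0, 4.0], [0.0, -1.0]], [0.2, 0.5, 0.2]))
```

The kernel grows only logarithmically with the channel count, which keeps the attention nearly free in parameters even on wide backbones such as ResNet-50.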
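For paired RGB and thermal inputs, CutMix [15] has one extra constraint: the pasted box must be identical in both modalities so that the two images stay spatially aligned. The sketch below shows that paired variant on single-channel grids, together with a linear ramp for how often mixing is applied. The ramp, its 0.5 ceiling, and all function names are hypothetical stand-ins, since the abstract says only that mixup and CutMix are curriculum-scheduled. Mixup [14] would instead blend whole images as lam * x_a + (1 - lam) * x_b and is not shown.

```python
import math
import random
from typing import List, Tuple

Image = List[List[float]]  # one H x W channel; a real pipeline would use multi-channel tensors
Box = Tuple[int, int, int, int]  # y0, y1, x0, x1 (end-exclusive)


def cutmix_box(h: int, w: int, lam: float, rng: random.Random) -> Box:
    """Random box covering roughly (1 - lam) of the image, clipped to its borders."""
    cut = math.sqrt(1.0 - lam)
    ch, cw = int(h * cut), int(w * cut)
    cy, cx = rng.randrange(h), rng.randrange(w)
    return (max(cy - ch // 2, 0), min(cy + ch // 2, h),
            max(cx - cw // 2, 0), min(cx + cw // 2, w))


def paste(dst: Image, src: Image, box: Box) -> Image:
    y0, y1, x0, x1 = box
    out = [row[:] for row in dst]  # copy so the input images stay untouched
    for y in range(y0, y1):
        out[y][x0:x1] = src[y][x0:x1]
    return out


def cutmix_paired(rgb_a: Image, th_a: Image, rgb_b: Image, th_b: Image,
                  rng: random.Random, alpha: float = 1.0) -> Tuple[Image, Image, float]:
    """CutMix one RGB/thermal pair into another, using the SAME box for both modalities."""
    lam = rng.betavariate(alpha, alpha)
    h, w = len(rgb_a), len(rgb_a[0])
    box = cutmix_box(h, w, lam, rng)
    y0, y1, x0, x1 = box
    # Recompute lam from the clipped box: the label weight kept by sample A.
    lam_adj = 1.0 - (y1 - y0) * (x1 - x0) / (h * w)
    return paste(rgb_a, rgb_b, box), paste(th_a, th_b, box), lam_adj


def mix_probability(epoch: int, warmup_epochs: int, max_prob: float = 0.5) -> float:
    """Hypothetical curriculum: ramp the chance of applying mixing linearly during warm-up."""
    return max_prob * min(1.0, epoch / warmup_epochs)


rng = random.Random(0)
zeros = [[0.0] * 8 for _ in range(8)]
ones = [[1.0] * 8 for _ in range(8)]
if rng.random() < mix_probability(epoch=8, warmup_epochs=10):
    rgb, thermal, lam = cutmix_paired(zeros, zeros, ones, ones, rng)
    print(lam, sum(map(sum, rgb)) / 64)
```

The mixed label is lam * y_a + (1 - lam) * y_b, with lam taken from the clipped box rather than the Beta draw, so the label matches the pixel area actually pasted.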
References
[1] W. Wakil, J. R. Faleiro, and T. A. Miller, Eds., “Sustainable pest management in date palm: Current status and emerging challenges,” Springer, 2015. DOI: https://doi.org/10.1007/978-3-319-24397-9
[2] H. Naveed, V. Andoh, W. Islam, L. Chen, and K. Chen, “Sustainable pest management in date palm ecosystems: Unveiling the ecological dynamics of red palm weevil (Coleoptera: Curculionidae) infestations,” Insects, vol. 14, no. 11, p. 859, 2023. DOI: https://doi.org/10.3390/insects14110859
[3] W. Boulila, A. Alzahem, A. Koubaa, B. Benjdira, and A. Ammar, “Early detection of red palm weevil infestations using deep learning classification of acoustic signals,” Computers and Electronics in Agriculture, vol. 212, p. 108154, 2023. DOI: https://doi.org/10.1016/j.compag.2023.108154
[4] I. Ashry, B. Wang, Y. Mao, M. Sait, Y. Guo, Y. Al-Fehaid, A. Al-Shawaf, T. K. Ng, and B. S. Ooi, “CNN-aided optical fiber distributed acoustic sensing for early detection of red palm weevil: A field experiment,” Sensors, vol. 22, no. 17, p. 6491, 2022. DOI: https://doi.org/10.3390/s22176491
[5] S. Delalieux, T. Hardy, M. Ferry, S. Gomez, L. Kooistra, M. Culman, and L. Tits, “Red palm weevil detection in date palm using temporal UAV imagery,” Remote Sensing, vol. 15, no. 5, p. 1380, 2023. DOI: https://doi.org/10.3390/rs15051380
[6] D. Kagan, G. F. Alpert, and M. Fire, “Automatic large scale detection of red palm weevil infestation using street view images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 182, pp. 122–133, 2021. DOI: https://doi.org/10.1016/j.isprsjprs.2021.10.004
[7] M. Ashraf, M. Z. Aslam, N. Saeed, and S. J. Hussain, “PalmNeXt: A ConvNeXt-based deep learning model for pest detection in date palm leaves,” Frontiers in Plant Science, vol. 16, p. 1738129, 2026. DOI: https://doi.org/10.3389/fpls.2025.1738129
[8] G. I. Sayed, S. Ibrahim, and A. E. Hassanien, “Early detection of red palm weevil in agricultural environment using deep learning,” Optical Memory and Neural Networks, vol. 34, no. 1, pp. 63–76, 2025. DOI: https://doi.org/10.3103/S1060992X24700899
[9] M. A. Arasi, L. Almuqren, I. Issaoui, N. S. Almalki, A. Mahmud, and M. Assiri, “Enhancing red palm weevil detection using bird swarm algorithm with deep learning model,” IEEE Access, vol. 12, pp. 1542–1551, 2023. DOI: https://doi.org/10.1109/ACCESS.2023.3348412
[10] B. Martin, S. Saranya, and P. J. Chowdary, “Smart IoT thermal imaging approach for early identification of red palm weevil (RPW) infestation on palms,” Scientific Reports, vol. 16, no. 1, p. 5392, 2026. DOI: https://doi.org/10.1038/s41598-025-32783-4
[11] A. Nadeem, M. Ashraf, A. Mehmood, K. Rizwan, and M. S. Siddiqui, “Dataset of date palm tree (Phoenix dactylifera L.) thermal images and their classification based on red palm weevil (Rhynchophorus ferrugineus) infestation,” Frontiers in Agronomy, vol. 7, p. 1604188, 2025. DOI: https://doi.org/10.3389/fagro.2025.1604188
[12] Y. Cui, M. Jia, T.-Y. Lin, Y. Song, and S. Belongie, “Class-balanced loss based on effective number of samples,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9268–9277, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00949
[13] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988, 2017. DOI: https://doi.org/10.1109/ICCV.2017.324
[14] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, “mixup: Beyond empirical risk minimization,” arXiv preprint arXiv:1710.09412, 2017.
[15] S. Yun, D. Han, S. J. Oh, S. Chun, J. Choe, and Y. Yoo, “CutMix: Regularization strategy to train strong classifiers with localizable features,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6023–6032, 2019. DOI: https://doi.org/10.1109/ICCV.2019.00612
[16] Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, “ECA-Net: Efficient channel attention for deep convolutional neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11534–11542, 2020. DOI: https://doi.org/10.1109/CVPR42600.2020.01155
[17] P. Izmailov, D. Podoprikhin, T. Garipov, D. Vetrov, and A. G. Wilson, “Averaging weights leads to wider optima and better generalization,” arXiv preprint arXiv:1803.05407, 2018.
[18] L. N. Smith and N. Topin, “Super-convergence: Very fast training of neural networks using large learning rates,” in Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, vol. 11006, pp. 369–386, SPIE, 2019. DOI: https://doi.org/10.1117/12.2520589
[19] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017.
[20] J. Snell, K. Swersky, and R. Zemel, “Prototypical networks for few-shot learning,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[21] P. Khosla, P. Teterwak, C. Wang, A. Sarna, Y. Tian, P. Isola, A. Maschinot, C. Liu, and D. Krishnan, “Supervised contrastive learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 18661–18673, 2020.
[22] A. K. Menon, S. Jayasumana, A. S. Rawat, H. Jain, A. Veit, and S. Kumar, “Long-tail learning via logit adjustment,” arXiv preprint arXiv:2007.07314, 2020.
[23] D. Hendrycks, N. Mu, E. D. Cubuk, B. Zoph, J. Gilmer, and B. Lakshminarayanan, “AugMix: A simple data processing method to improve robustness and uncertainty,” arXiv preprint arXiv:1912.02781, 2019.
[24] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626, 2017. DOI: https://doi.org/10.1109/ICCV.2017.74
[25] A. Chattopadhay, A. Sarkar, P. Howlader, and V. N. Balasubramanian, “Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks,” in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 839–847, IEEE, 2018. DOI: https://doi.org/10.1109/WACV.2018.00097
[26] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in International Conference on Machine Learning, pp. 3319–3328, PMLR, 2017.
[27] E. R. DeLong, D. M. DeLong, and D. L. Clarke-Pearson, “Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach,” Biometrics, vol. 44, no. 3, pp. 837–845, 1988. DOI: https://doi.org/10.2307/2531595
[28] Q. McNemar, “Note on the sampling error of the difference between correlated proportions or percentages,” Psychometrika, vol. 12, no. 2, pp. 153–157, 1947. DOI: https://doi.org/10.1007/BF02295996
[29] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016. DOI: https://doi.org/10.1109/CVPR.2016.90
[30] M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in International Conference on Machine Learning, pp. 6105–6114, PMLR, 2019.
[31] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A ConvNet for the 2020s,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976–11986, 2022. DOI: https://doi.org/10.1109/CVPR52688.2022.01167
[32] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258, 2017. DOI: https://doi.org/10.1109/CVPR.2017.195
[33] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
[34] S. Woo, S. Debnath, R. Hu, X. Chen, Z. Liu, I. S. Kweon, and S. Xie, “ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16133–16142, 2023. DOI: https://doi.org/10.1109/CVPR52729.2023.01548
Data Availability Statement
The datasets generated during the current study are available at https://figshare.com/articles/dataset/A_Dataset_of_Date_Palm_Trees_Thermal_and_RGB_Images_for_Pest_Management_/25974295?file=53580344
License
Copyright (c) 2026 Computational Discovery and Intelligent Systems (CDIS)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Computational Discovery and Intelligent Systems (CDIS) content is published under a Creative Commons Attribution (CC BY) license. This means that content is freely available to all readers upon publication, and content is published as soon as production is complete.



