Certainty-Aware Skin Lesion Segmentation with Post-Hoc Reliability Estimation for the Segment Anything Model

Authors

Hamdi A. Mahmoud and Ola Farid, Beni-Suef University

DOI:

https://doi.org/10.66279/hzkw5y24

Keywords:

Image Segmentation, Skin Lesion Segmentation, Pixel-wise Certainty Map, Reliability Estimation

Abstract

The Segment Anything Model (SAM) represents a major advance in zero-shot visual segmentation, yet it provides purely deterministic outputs without any measure of prediction reliability, a critical limitation for safety-conscious medical imaging applications. This paper introduces a certainty-aware segmentation framework that augments SAM-based zero-shot inference with principled, post-hoc reliability estimation. Three complementary outputs are introduced: a pixel-wise certainty map that identifies spatially localized regions of ambiguity; a global confidence score that provides a scalar measure of overall segmentation trustworthiness; and a quality-flagging mechanism that enables automated screening of unreliable predictions. The framework requires no modification to SAM's architecture and no additional training data, thereby preserving its zero-shot generalization properties. Evaluation on the ISIC 2018 Task 1 skin lesion segmentation benchmark, comprising 2,594 dermoscopic images, in a fully zero-shot setting yields a mean Dice Similarity Coefficient (DSC) of 0.820 ± 0.095 and a mean Intersection-over-Union of 0.750 ± 0.101. A strong positive correlation (Pearson r = 0.84, p < 0.001, n = 2,594) is observed between certainty scores and segmentation quality. High-quality segmentations (DSC > 0.80) are consistently associated with certainty scores above 80%, while low-quality predictions (DSC < 0.70) yield certainty scores below 50%. Stratified analysis confirms a mean DSC difference of over 0.25 between high- and low-certainty tiers (Wilcoxon p < 0.001, Cohen's d = 2.31). These results demonstrate that the proposed certainty metrics reliably track segmentation accuracy and provide a practical mechanism for risk-aware deployment of foundation models in clinical environments.
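The paper's implementation is not reproduced on this page, so the sketch below only illustrates how such post-hoc estimation can be attached to the official `segment-anything` API without touching SAM's weights. The logit-to-certainty mapping, the mean-pooled global score, and the 0.5 flagging threshold are illustrative assumptions rather than the authors' exact formulation, and `image_rgb`, `cx`, and `cy` are placeholders for a dermoscopic image and a point prompt.

```python
import numpy as np
from scipy.special import expit
from segment_anything import sam_model_registry, SamPredictor

def certainty_outputs(mask_logits: np.ndarray, flag_threshold: float = 0.5):
    """Derive the three reliability outputs from SAM's un-thresholded logits.

    Pixel certainty is taken here (an assumption) as the distance of the
    foreground probability from the 0.5 decision boundary, rescaled to [0, 1];
    the global score is the mean pixel certainty; the prediction is flagged
    when that score falls below `flag_threshold`.
    """
    prob = expit(mask_logits)                    # sigmoid -> P(foreground)
    certainty_map = 2.0 * np.abs(prob - 0.5)     # 0 = ambiguous, 1 = certain
    global_score = float(certainty_map.mean())   # scalar trustworthiness
    return certainty_map, global_score, global_score < flag_threshold

# Zero-shot inference with the public ViT-B checkpoint; no retraining needed.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)
predictor.set_image(image_rgb)                   # H x W x 3 uint8 RGB array
logits, iou_preds, _ = predictor.predict(
    point_coords=np.array([[cx, cy]]),           # placeholder lesion-center click
    point_labels=np.array([1]),                  # 1 marks a foreground point
    multimask_output=False,
    return_logits=True,                          # raw logits, not a binary mask
)
certainty_map, score, flagged = certainty_outputs(logits[0])
binary_mask = logits[0] > 0.0                    # SAM's default mask threshold
```

Because everything operates on SAM's existing outputs, the zero-shot pipeline and pretrained checkpoint remain unchanged, which is what preserves the generalization properties the abstract describes.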

Author Biographies

  • Hamdi A. Mahmoud, Beni-Suef University

    Associate Professor at the Computer Science Department, Faculty of Computers and Artificial Intelligence, Beni-Suef University

  • Ola Farid, Beni-Suef University

    Teaching Assistant at the Computer Science Department, Faculty of Science, Beni-Suef University

Published

24-04-2026

Data Availability Statement

The ISIC 2018 Task 1 dataset is publicly available from the International Skin Imaging Collaboration (ISIC) archive: https://challenge.isic-archive.com/data/#2018. The SAM pretrained checkpoint (sam_vit_b_01ec64.pth) is available at https://github.com/facebookresearch/segment-anything.
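For readers reproducing the evaluation, the sketch below shows how the statistics reported in the abstract can be computed with NumPy/SciPy once per-image certainty scores and Dice values have been collected over the 2,594 test images. The DSC and IoU definitions are standard; the tier thresholds (0.80 and 0.50) mirror the abstract, while the choice of `scipy.stats.ranksums` for the Wilcoxon comparison and the pooled-SD form of Cohen's d are assumptions about the analysis, not the authors' confirmed script.

```python
import numpy as np
from scipy import stats

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice Similarity Coefficient between binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-Union between binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return inter / (np.logical_or(pred, gt).sum() + 1e-8)

def validate(certainty: np.ndarray, dsc: np.ndarray) -> dict:
    """Correlation and tier analysis over per-image scores, both in [0, 1]."""
    r, p = stats.pearsonr(certainty, dsc)            # certainty vs. quality
    high, low = dsc[certainty >= 0.80], dsc[certainty < 0.50]
    _, p_w = stats.ranksums(high, low)               # Wilcoxon rank-sum test
    pooled_sd = np.sqrt((high.var(ddof=1) + low.var(ddof=1)) / 2.0)
    d = (high.mean() - low.mean()) / pooled_sd       # Cohen's d, pooled SD
    return {"pearson_r": r, "pearson_p": p, "wilcoxon_p": p_w, "cohens_d": d}
```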

How to Cite

Mahmoud, H. A., & Farid, O. (2026). Certainty-Aware Skin Lesion Segmentation with Post-Hoc Reliability Estimation for the Segment Anything Model. Journal of Smart Algorithms and Applications (JSAA), 3(2), 71-86. https://doi.org/10.66279/hzkw5y24
