Evidence-Grounded Vision–RAG Framework for Clinically Reliable Visual Reasoning in Chest X-Ray Analysis

Authors

Keywords:

Medical Vision-Language Models, Retrieval-Augmented Reasoning, Visual Evidence Retrieval, Chest X-ray Interpretation, Clinical Decision Support

Abstract

Vision–language models have shown promise for medical image understanding tasks such as visual question answering (VQA). However, their clinical adoption is hindered by diagnostic ambiguity, scarce supervision, and the risk of hallucinated or clinically unsafe responses. To address these challenges, this paper proposes an evidence-grounded Vision Retrieval-Augmented Generation (Vision–RAG) framework for reliable visual reasoning in chest X-ray analysis. The framework combines visual retrieval with evidence-aware language generation to support clinically grounded reasoning without task-specific supervised training. A pretrained vision encoder retrieves semantically similar chest X-ray images and their corresponding radiology reports from the MIMIC-CXR dataset, and these serve as external clinical evidence that guides the vision–language model. The retrieval index is built from the training split, and evaluation is performed on a held-out validation split so that the assessment is not biased by overlap with the index. The system is evaluated on approximately 2,000 automatically generated clinical questions. Evidence retrieval achieves a Recall@1 of 66.88%, whereas yes/no question accuracy is 56.8%, reflecting the difficulty of medical reasoning without task-specific supervision. Concept-level analysis shows clear separation between normal and infectious cases, with most ambiguity arising between overlapping conditions such as pleural effusion and consolidation. The model also predicts conservatively, with a low tendency toward false positives, a property that matters for clinical safety. These findings suggest that evidence-grounded Vision–RAG offers an interpretable and reliable paradigm for medical visual reasoning in chest X-ray analysis, one intended to support decision-making in clinical workflows rather than to replace human expertise.
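The retrieval-and-evaluation loop summarized in the abstract can be sketched roughly as follows. This is a minimal, illustrative sketch under stated assumptions, not the authors' implementation: embeddings are plain float vectors standing in for the pretrained vision encoder's outputs, retrieval is brute-force cosine-similarity nearest neighbour over a training-split index, a retrieved item is assumed to count as "relevant" for Recall@1 when it shares the query's finding label (the paper's exact relevance criterion is not given here), and the prompt template in the hypothetical `build_prompt` helper is invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical index entry: (image embedding, radiology report text, finding label).
# In the described framework the embeddings would come from a pretrained vision
# encoder applied to MIMIC-CXR training-split images; here they are toy vectors.
IndexEntry = Tuple[Sequence[float], str, str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: Sequence[float], index: List[IndexEntry],
             k: int = 1) -> List[Tuple[float, str, str]]:
    """Brute-force top-k retrieval: returns (score, report, label), best first."""
    scored = [(cosine(query, emb), report, label) for emb, report, label in index]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]


def recall_at_1(queries: List[Tuple[Sequence[float], str]],
                index: List[IndexEntry]) -> float:
    """Fraction of held-out queries whose top-1 neighbour has the same label.
    Assumption: 'relevant' means same finding label; the paper may define it differently."""
    hits = sum(1 for emb, label in queries if retrieve(emb, index, k=1)[0][2] == label)
    return hits / len(queries)


def build_prompt(question: str, evidence: List[Tuple[float, str, str]]) -> str:
    """Hypothetical template: retrieved reports are prepended as evidence for the VLM."""
    lines = [f"[{i + 1}] (similarity {s:.2f}) {report}"
             for i, (s, report, _) in enumerate(evidence)]
    return ("Retrieved reports:\n" + "\n".join(lines)
            + f"\nQuestion: {question}\nAnswer yes or no:")


def yes_no_metrics(preds: List[bool], truth: List[bool]) -> Dict[str, float]:
    """Accuracy and false-positive rate for yes/no answers (True means 'yes')."""
    correct = sum(p == t for p, t in zip(preds, truth))
    on_negatives = [p for p, t in zip(preds, truth) if not t]
    fpr = sum(on_negatives) / len(on_negatives) if on_negatives else 0.0
    return {"accuracy": correct / len(truth), "false_positive_rate": fpr}


if __name__ == "__main__":
    # Toy training-split index and held-out queries (invented data, for illustration only).
    index = [
        ([1.0, 0.0], "No acute cardiopulmonary process.", "normal"),
        ([0.0, 1.0], "Right lower lobe consolidation.", "consolidation"),
    ]
    queries = [([0.9, 0.1], "normal"), ([0.2, 0.8], "consolidation"),
               ([0.6, 0.5], "consolidation")]
    print(recall_at_1(queries, index))  # 2 of 3 top-1 hits
    print(build_prompt("Is there consolidation?", retrieve([0.2, 0.8], index)))
```

A brute-force scan keeps the sketch dependency-free; for an index the size of the MIMIC-CXR training split, an approximate nearest-neighbour library would more likely be used, and the false-positive rate is included only because the abstract reports conservative, low-false-positive behaviour.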

Published

08-02-2026

How to Cite

Evidence-Grounded Vision–RAG Framework for Clinically Reliable Visual Reasoning in Chest X-Ray Analysis. (2026). Journal of Smart Algorithms and Applications (JSAA), 2(2), 49-60. https://pub.scientificirg.com/index.php/JSAA/article/view/48
