Evidence-Grounded Vision–RAG Framework for Clinically Reliable Visual Reasoning in Chest X-Ray Analysis
Keywords:
Medical Vision-Language Models, Retrieval-Augmented Reasoning, Visual Evidence Retrieval, Chest X-ray Interpretation, Clinical Decision Support
Abstract
Vision–language models have shown potential for medical image understanding tasks such as visual question answering (VQA); however, their clinical adoption is hindered by diagnostic ambiguity, scarce supervision, and the risk of generating hallucinated or clinically unsafe responses. To address these challenges, this paper proposes an evidence-grounded Vision Retrieval-Augmented Generation (Vision–RAG) framework for reliable visual reasoning in chest X-ray analysis. The framework integrates visual retrieval with evidence-aware language generation to support clinically grounded reasoning without task-specific supervised training. A pretrained vision encoder retrieves semantically similar chest X-ray images and their corresponding radiology reports from the MIMIC-CXR dataset, and these serve as external clinical evidence to guide the vision–language model. The retrieval index is built from the training split, and evaluation is performed on a held-out validation set so that no evaluated image is also present in the index. The system is evaluated on approximately 2,000 automatically generated clinical questions. Evidence retrieval achieves a Recall@1 of 66.88%, while accuracy on yes/no questions reaches 56.8%, reflecting the inherent difficulty of medical reasoning without task-specific supervision. Concept-level analysis shows clear separation between normal and infectious cases, with most ambiguity arising between overlapping conditions such as pleural effusion and consolidation. Importantly, the model exhibits conservative prediction behavior with a low tendency toward false positives, a desirable property for clinical safety. These findings indicate that evidence-grounded Vision–RAG provides an interpretable and reliable paradigm for medical visual reasoning in chest X-ray analysis, one that supports decision-making in clinical workflows rather than replacing human expertise.
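The retrieval step and the Recall@1 metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the pretrained vision encoder has already mapped each image to a fixed-length embedding, that similarity is measured with cosine similarity, and that a retrieval counts as a hit when a top-ranked training image shares the query's finding label. The abstract does not specify the encoder, the similarity measure, or the relevance criterion, so these choices and all function names below are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (assumed preprocessing for cosine similarity)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_top_k(query: Sequence[float], index: Sequence[Sequence[float]], k: int = 1) -> List[int]:
    """Return positions of the k index embeddings most similar to the query."""
    q = l2_normalize(query)
    scores = []
    for i, item in enumerate(index):
        d = l2_normalize(item)
        scores.append((sum(a * b for a, b in zip(q, d)), i))
    # Highest cosine similarity first.
    scores.sort(key=lambda s: s[0], reverse=True)
    return [i for _, i in scores[:k]]


def recall_at_k(query_embs, query_labels, index_embs, index_labels, k: int = 1) -> float:
    """Fraction of queries with at least one same-label item among the top-k results
    (hypothetical relevance criterion: matching finding label)."""
    hits = 0
    for emb, label in zip(query_embs, query_labels):
        top = cosine_top_k(emb, index_embs, k)
        if any(index_labels[i] == label for i in top):
            hits += 1
    return hits / len(query_embs)


# Toy 2-D "embeddings" standing in for encoder outputs (illustrative only).
index_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
index_labels = ["normal", "effusion", "consolidation"]
query_embs = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]
query_labels = ["normal", "effusion", "normal"]

r1 = recall_at_k(query_embs, query_labels, index_embs, index_labels, k=1)
r3 = recall_at_k(query_embs, query_labels, index_embs, index_labels, k=3)
print(f"Recall@1 = {r1:.4f}, Recall@3 = {r3:.4f}")
```

In this toy example the third query lies closest to the "consolidation" entry, so it misses at k = 1 but is recovered at k = 3. A realistic index built from a full training split would typically use an approximate nearest-neighbor library rather than this exhaustive loop.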
License
Copyright (c) 2026 Journal of Smart Algorithms and Applications (JSAA)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Journal of Smart Algorithms and Applications (JSAA) content is published under a Creative Commons Attribution License (CC BY). This means that content is freely available to all readers upon publication, and content is published as soon as production is complete.



