Return to Issue Details A Robust Two-Stage Retrieval-Augmented Vision-Language Framework for Knowledge-Intensive Multimodal Reasoning and Alignment Download Download PDF