Mitigating Easy Option Bias in Multiple-Choice Question Answering

Source: arXiv AI Papers

The research highlights a significant flaw in multiple-choice Visual Question Answering (VQA) benchmarks such as MMStar, RealWorldQA, SEED-Bench, NExT-QA, STAR, and Video-MME: vision-language models (VLMs) exploit an Easy-Options Bias (EOB), selecting the correct answer from the visual input and the answer options alone, without ever needing the question. The root cause is an imbalance in visual relevance: the correct option aligns more closely with the image or video content than the distractors do, so models can rely on vision-option similarity instead of genuine question understanding.

To mitigate this, the authors propose GroundAttack, a toolkit that automatically generates hard negative options that are visually plausible and closely resemble the correct answer. Applying GroundAttack to the NExT-QA and MMStar datasets yields new annotations free of EOB. On these revised datasets, VLMs perform close to random chance when given only vision and options, and their accuracy also drops under the full vision-question-options setting, giving a more realistic picture of their question-answering capabilities.

This work underscores the importance of removing biases from benchmark datasets so that evaluations measure genuine comprehension and robustness. GroundAttack offers a practical way to improve the reliability and fairness of VQA evaluation by eliminating shortcuts that models would otherwise exploit, and the released code and new annotations should facilitate further research in this direction.
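To make the shortcut concrete, here is a minimal sketch (not the authors' code) of the EOB probe described above: answering a multiple-choice question by image-option similarity alone, with the question discarded entirely. It assumes a CLIP-style model from Hugging Face transformers; the model checkpoint, image path, and option strings are illustrative placeholders. If a classifier like this beats random chance on a benchmark, that benchmark exhibits the Easy-Options Bias.

```python
# Minimal sketch of an Easy-Options Bias probe: pick the option most similar
# to the image, never looking at the question. Checkpoint, image path, and
# options below are hypothetical placeholders, not from the paper.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def answer_without_question(image: Image.Image, options: list[str]) -> int:
    """Return the index of the option with the highest image-text similarity.

    The question is intentionally unused: above-chance accuracy from this
    function alone indicates the benchmark's options leak the answer.
    """
    inputs = processor(text=options, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        # logits_per_image has shape (1, num_options): one similarity score
        # per candidate option.
        logits = model(**inputs).logits_per_image
    return int(logits.argmax(dim=-1).item())

# Hypothetical example: when distractors are visually implausible, the
# correct option stands out from vision alone.
image = Image.open("example.jpg")  # placeholder path
options = ["a dog catching a frisbee", "a spreadsheet on a monitor",
           "a violin on a table", "a snowstorm at night"]
print("predicted option index:", answer_without_question(image, options))
```

Conversely, hard negatives of the kind GroundAttack generates are designed so that all options score comparably under such a similarity probe, forcing the model to actually use the question to disambiguate.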

👉 Read the original: arXiv AI Papers