Recent research highlights a significant vulnerability in Multimodal Large Language Models (MLLMs): they can generate harmful outputs from individually innocuous unimodal inputs because of implicit reasoning risks. The proposed remedy combines the Safe-Semantics-but-Unsafe-Interpretation (SSUI) dataset with the Safety-aware Reasoning Path Optimization (SRPO) training framework, which together aim to embed human safety values into the long-chain reasoning processes of MLLMs.
The SSUI dataset provides interpretable reasoning paths that directly address the challenge of cross-modal data handling, where inputs that are safe in isolation combine into unsafe interpretations. Experimental results show that MLLMs trained with the SRPO framework achieve superior performance on key safety benchmarks, including the newly proposed Reasoning Path Benchmark (RSBench). This advance not only improves model safety but also lays a foundation for further research into safeguarding AI systems against potential safety risks in diverse applications.
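The digest does not describe SRPO's training objective. Purely as a rough illustration, a minimal sketch is shown below, assuming SRPO resembles a DPO-style preference loss over paired safe/unsafe reasoning paths; the function name, arguments, and `beta` parameter are all hypothetical, not from the paper.

```python
# Hypothetical sketch only: a DPO-style preference loss over reasoning paths.
# All names here are illustrative assumptions, not the paper's actual API.
import torch
import torch.nn.functional as F

def srpo_style_loss(policy_safe_logps, policy_unsafe_logps,
                    ref_safe_logps, ref_unsafe_logps, beta=0.1):
    """Prefer safe interpretation paths over unsafe ones.

    Each tensor holds per-example sequence log-probabilities of a full
    reasoning path under the policy or a frozen reference model.
    """
    policy_margin = policy_safe_logps - policy_unsafe_logps
    ref_margin = ref_safe_logps - ref_unsafe_logps
    # Standard DPO objective: push the policy's safe-vs-unsafe margin
    # above the frozen reference model's margin.
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Toy usage with random log-probabilities for a batch of 4 path pairs.
torch.manual_seed(0)
loss = srpo_style_loss(torch.randn(4), torch.randn(4),
                       torch.randn(4), torch.randn(4))
print(loss.item())
```

The design choice illustrated here is scoring entire reasoning paths rather than final answers, which is what would let a trainer penalize an unsafe interpretation even when the surface output looks benign.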
👉 Read the original: arXiv AI Papers