Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models
Adversarial robustness and safety alignment in multilingual multimodal LLMs: cross-lingual vulnerability and 'safety-by-failure'
A systematic study evaluates the adversarial robustness and safety alignment of multimodal LLMs across 12 languages. It finds that adversarial images optimized against prompts in one language remain effective when paired with prompts in other languages (cross-lingual transferability). The paper also introduces the concept of 'safety-by-failure': low-resource languages appear safer not because the model is genuinely aligned, but because it fails to comprehend harmful instructions in those languages and therefore does not act on them. By contrast, models such as Qwen3-VL, which integrate multilingual capability throughout training rather than only at instruction tuning, actively refuse harmful requests across languages, which indicates genuine cross-lingual safety. The findings challenge the assumption that safety metrics measured on low-resource languages reflect real alignment.
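The distinction between 'safety-by-failure' and active refusal can be made concrete with a small sketch. The label names, the threshold, and the example counts below are illustrative assumptions, not the paper's evaluation protocol or data. The idea is simply that a low harmful-response rate should be split into the share explained by refusals and the share explained by incomprehension.

```python
from collections import Counter

# Hypothetical response labels; the source does not specify a labeling scheme.
REFUSAL = "refusal"                  # model understood the request and declined
HARMFUL = "harmful"                  # model complied with the harmful instruction
INCOMPREHENSION = "incomprehension"  # model failed to understand the request


def summarize_language(labels):
    """Return the rate of each label for one language's list of responses."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    total = len(labels)
    return {name: counts[name] / total
            for name in (REFUSAL, HARMFUL, INCOMPREHENSION)}


def safety_profile(labels, threshold=0.5):
    """Describe where a language's non-harmful responses come from.

    threshold is an illustrative cut-off, not a value from the study.
    """
    rates = summarize_language(labels)
    non_harmful = rates[REFUSAL] + rates[INCOMPREHENSION]
    if non_harmful == 0:
        return "unsafe"
    # Share of the apparently safe responses that are really comprehension failures.
    failure_share = rates[INCOMPREHENSION] / non_harmful
    return "safety-by-failure" if failure_share > threshold else "active refusal"


# Made-up labels for two languages, for illustration only.
high_resource = [REFUSAL] * 8 + [HARMFUL] * 2
low_resource = [REFUSAL] * 1 + [HARMFUL] * 1 + [INCOMPREHENSION] * 8

print(safety_profile(high_resource))
print(safety_profile(low_resource))
```

In this made-up example the low-resource language has the lower harmful-response rate, yet most of its non-harmful responses are comprehension failures, which is the pattern the summary describes as misleading.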