benchmark
M³Exam
active · provisional
m-exam-6a47f7ef · 1 event · first seen 9d ago · Aliases: M³Exam
Recent events (1)
M³Exam: Benchmark for Multimodal Memory in Realistic User-Agent Interactions
Researchers introduce M³Exam, a query-centric multimodal conversational memory benchmark designed to evaluate language agents on realistic user-agent interactions, including cross-modal grounding and inference of implicit information. The authors critique existing benchmarks for assuming sparse visuals and human-human interaction formats. The paper also proposes M³Proctor, a companion memory method that detects query modality bias and retrieves raw visual sources on demand, achieving a 13% accuracy improvement while reducing both index-construction time and retrieved tokens by over 70%.