benchmark

IndicContextEval

benchmark · active · provisional · indiccontexteval-5a21f60f · 1 event · first seen 2d ago

Aliases: IndicContextEval

More like this (12)

ValueEval, ParaEval, IndicBERT, ProActEval, T-Eval, BenHalluEval, CruxEval, Every Eval Ever, Codex HumanEval, InstructS2S-Eval, IndicBERT-HPA, GenEval

Recent events (1)

4 · arXiv · cs.CL · 2d ago

IndicContextEval: Benchmark for context utilisation in Audio LLMs across 8 Indic languages

Researchers introduce IndicContextEval, a 56-hour multilingual speech benchmark covering 555 speakers across 8 Indian languages and 23 professional domains, designed to test whether Audio LLMs genuinely use textual context (domain descriptions, entity lists) or rely on parametric knowledge. The benchmark employs a 7-level prompting framework that progressively introduces contextual signals including adversarial prompts with incorrect entities. Evaluation of five models reveals substantial variation in context utilisation behaviour, exposing a gap in existing ASR benchmarks that test only fixed prompting conditions.
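To make the idea of progressively introduced context concrete, here is a minimal sketch of a prompt ladder that moves from no context to a domain description, to an entity list, and finally to an adversarial list of incorrect entities. The summary does not spell out the seven levels, so the four levels shown, the prompt wording, and the names `PromptContext`, `build_prompt`, `prompt_ladder`, and `leaked_entities` are all hypothetical and are not taken from the benchmark itself.

```python
from dataclasses import dataclass, field


@dataclass
class PromptContext:
    """Hypothetical container for the textual context given to an Audio LLM."""
    domain: str = ""
    entities: list[str] = field(default_factory=list)


def build_prompt(ctx: PromptContext) -> str:
    """Assemble a transcription prompt from whichever signals are present."""
    parts = ["Transcribe the audio."]
    if ctx.domain:
        parts.append(f"Domain: {ctx.domain}.")
    if ctx.entities:
        parts.append("Entities that may occur: " + ", ".join(ctx.entities) + ".")
    return " ".join(parts)


def prompt_ladder(domain: str, true_entities: list[str],
                  wrong_entities: list[str]) -> dict[str, str]:
    """Illustrative ladder (assumed levels), from no context to adversarial context."""
    return {
        "no_context": build_prompt(PromptContext()),
        "domain_only": build_prompt(PromptContext(domain=domain)),
        "domain_and_entities": build_prompt(PromptContext(domain, true_entities)),
        "adversarial_entities": build_prompt(PromptContext(domain, wrong_entities)),
    }


def leaked_entities(transcript: str, wrong_entities: list[str]) -> list[str]:
    """Return the incorrect entities that appear in the transcript (substring match)."""
    lowered = transcript.lower()
    return [e for e in wrong_entities if e.lower() in lowered]
```

Under these assumptions, comparing transcripts across the rungs would indicate whether a model's output changes with the supplied context, and `leaked_entities` gives a crude signal of whether an adversarial list was copied into the transcript rather than ignored.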

Evaluation and Benchmarking · Multimodal Progress · IndicContextEval