
Multi-LCB

benchmark · active · provisional · multi-lcb-c94a015a · 1 event · first seen 47h ago

Aliases: Multi-LCB

Recent events (1)

5 · arXiv · cs.AI · 47h ago

Multi-LCB extends LiveCodeBench to twelve programming languages for cross-language code evaluation

Researchers introduce Multi-LCB, a benchmark that extends the widely used LiveCodeBench (LCB) to twelve programming languages by transforming its Python tasks into equivalent tasks in the other languages while preserving LCB's contamination controls. An evaluation of 24 LLMs on the benchmark reveals overfitting to Python, language-specific contamination, and large performance disparities across languages. Multi-LCB is designed to update automatically with future LCB releases, making it a living benchmark for assessing multilingual code generation.