Phun-Bench
benchmark · active · provisional
phun-bench-98dbdd0a · 1 event · first seen 9d ago
Aliases: Phun-Bench
Recent events (1)
Phun-Bench: A Chinese benchmark for evaluating LLM phonological understanding
Researchers introduce Phun-Bench, a purpose-built benchmark for evaluating LLMs on phonological understanding in Chinese across three dimensions: Homophony, Rhyme, and Phonetic Similarity. The benchmark is designed to avoid the rote-memorization shortcuts that plague existing phonological evaluations. Results show that LLMs can recall correct pronunciations but fail to apply phonological knowledge as flexibly as human speakers do. The authors also propose a hypothesis about the mechanism underlying LLM phonological "perception".