
Phun-Bench

benchmark · active · provisional · phun-bench-98dbdd0a · 1 event · first seen 9d ago

Aliases: Phun-Bench

Recent events (1)

4 · arXiv · cs.CL · 9d ago

Phun-Bench: A Chinese benchmark for evaluating LLM phonological understanding

Researchers introduce Phun-Bench, a purpose-built benchmark for evaluating LLMs on phonological understanding in Chinese across three dimensions: Homophony, Rhyme, and Phonetic Similarity. The benchmark is designed to avoid the rote-memorization shortcuts that plague existing phonological evaluations. Results show that LLMs can recall correct pronunciations but fail to apply that phonological knowledge as flexibly as human speakers do, and the authors propose a hypothesis about the underlying mechanism of LLM phonological 'perception'.