Entity · benchmark

Phun-Bench

benchmark · active · phun-bench-98dbdd0a · 1 event · first seen Jun 8, 2026

Aliases: Phun-Bench

More like this (12)

PhantomBench FoldBench FeatBench SorryBench NatureBench PseudoBench CharacterBench FuzzyBench FuzzyBench PaperBench FutureBench FinBench

Recent events (1)

4 · arXiv · cs.CL · Jun 8, 2026

Phun-Bench: A Chinese benchmark for evaluating LLM phonological understanding

Researchers introduce Phun-Bench, a purpose-built benchmark for evaluating LLMs on phonological understanding in Chinese across three dimensions: Homophony, Rhyme, and Phonetic Similarity. The benchmark is designed to avoid rote-memorization shortcuts that plague existing phonological evals. Results show LLMs can recall correct pronunciations but fail to apply phonological knowledge flexibly as human speakers do, and the authors propose a hypothesis about the underlying mechanism of LLM phonological 'perception'.

Evaluation and Benchmarking · Phun-Bench