Almanac
benchmark

National Computer Rank Examination

benchmark · active · provisional · national-computer-rank-examination-661d40cd · 1 event · first seen 7d ago

Aliases: National Computer Rank Examination

Recent events (1)

5 · arXiv · cs.CL · 7d ago

NCRE-based benchmark reveals frontier LLMs top out at 68.8% on professional Office automation tasks

Researchers introduce an evaluation suite derived from China's National Computer Rank Examination (NCRE), comprising 200 practical tasks across Word, Excel, and PowerPoint scored via 7,118 machine-gradable criteria. Seven frontier LLMs are benchmarked: single-turn models peak at 36.6% Score Rate, while a full agentic system with execution feedback and iterative repair reaches 68.8%, still well below the 95.5% community-reference score. The results demonstrate that fine-grained, long-horizon Office document automation remains a significant unsolved challenge for current LLM and agent systems despite strong code-generation capabilities.
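As a rough illustration of how a Score Rate over machine-gradable criteria might be computed, the sketch below assumes it is the share of available points earned across all criteria. The summary does not state the exact formula, so the `Criterion` type, the point weights, and the example checks are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One machine-gradable check (hypothetical structure, not from the paper)."""
    description: str
    points: float
    passed: bool


def score_rate(criteria):
    """Return earned points divided by available points (assumed definition)."""
    total = sum(c.points for c in criteria)
    if total == 0:
        raise ValueError("no points available")
    earned = sum(c.points for c in criteria if c.passed)
    return earned / total


# Illustrative criteria only; these are not taken from the benchmark.
example = [
    Criterion("Title uses the Heading 1 style", 2.0, True),
    Criterion("Table has 4 columns", 1.0, False),
    Criterion("Page margins set to 2.5 cm", 1.0, True),
]

rate = score_rate(example)
print(f"{rate:.1%}")  # prints "75.0%"
```

Under this assumed definition, a single failed criterion lowers the score only in proportion to its points, which is one way fine-grained grading can separate nearly correct documents from badly wrong ones.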