Mind the Gap: Can Frontier LLMs Pass a Standardized Office Proficiency Exam?

paper · active · provisional · mind-the-gap-can-frontier-llms-pass-a-standardized-office-proficiency-exam--a9381614 · 1 event · first seen 7d ago

Recent events (1)

arXiv · cs.CL · 7d ago

NCRE-based benchmark reveals frontier LLMs top out at 68.8% on professional Office automation tasks

Researchers introduce an evaluation suite derived from China's National Computer Rank Examination (NCRE), comprising 200 practical tasks across Word, Excel, and PowerPoint scored via 7,118 machine-gradable criteria. Seven frontier LLMs are benchmarked: single-turn models peak at 36.6% Score Rate, while a full agentic system with execution feedback and iterative repair reaches 68.8%, still well below the 95.5% community-reference score. The results demonstrate that fine-grained, long-horizon Office document automation remains a significant unsolved challenge for current LLM and agent systems despite strong code-generation capabilities.
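To make the headline numbers concrete, here is a minimal sketch of how a criterion-based Score Rate could be computed. The summary does not define Score Rate, so the points-weighted formula, the `Criterion` class, and the sample criteria are all assumptions for illustration, not the paper's grader. Only the 36.6%, 68.8%, and 95.5% figures come from the summary.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One machine-gradable check (hypothetical structure, not from the paper)."""
    description: str
    points: float
    passed: bool


def score_rate(criteria):
    """Return earned points as a percentage of available points.

    Assumption: Score Rate is a points-weighted pass fraction. The
    summary does not state the actual formula.
    """
    total = sum(c.points for c in criteria)
    if total == 0:
        raise ValueError("no points available")
    earned = sum(c.points for c in criteria if c.passed)
    return 100.0 * earned / total


# Made-up criteria for a single task, for illustration only.
criteria = [
    Criterion("Title uses the Heading 1 style", 2.0, True),
    Criterion("Table has four columns", 3.0, True),
    Criterion("Chart type is clustered column", 5.0, False),
]
rate = score_rate(criteria)

# Gaps between the reported figures, in percentage points.
agent_vs_reference = round(95.5 - 68.8, 1)    # agentic system vs. community reference
single_vs_agent = round(68.8 - 36.6, 1)       # best single-turn model vs. agentic system
```

Under these assumptions, the agentic system trails the community-reference score by 26.7 points, and execution feedback with iterative repair accounts for a 32.2-point gain over the best single-turn result.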