paper
Mind the Gap: Can Frontier LLMs Pass a Standardized Office Proficiency Exam?
paper · active · provisional
mind-the-gap-can-frontier-llms-pass-a-standardized-office-proficiency-exam--a9381614 · 1 event · first seen 7d ago
Aliases: Mind the Gap: Can Frontier LLMs Pass a Standardized Office Proficiency Exam?
Co-occurring entities
More like this (12)
frontier LLMs
Reassessing High-Performing LLMs on Polish Medical Exams: True Competence or Bias-Driven Performance?
LLM Pretraining
LLM-judged explanation score
Dep-LLM
LLM Debate Competition
SpeechLLM
Audio-LLM
LLM-as-a-Judge
The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs
Open Leaderboard for Japanese LLMs
LLM-judge scoring
Recent events (1)
NCRE-based benchmark reveals frontier LLMs top out at 68.8% on professional Office automation tasks
Researchers introduce an evaluation suite derived from China's National Computer Rank Examination (NCRE), comprising 200 practical tasks across Word, Excel, and PowerPoint scored via 7,118 machine-gradable criteria. Seven frontier LLMs are benchmarked: single-turn models peak at 36.6% Score Rate, while a full agentic system with execution feedback and iterative repair reaches 68.8%, still well below the 95.5% community-reference score. The results demonstrate that fine-grained, long-horizon Office document automation remains a significant unsolved challenge for current LLM and agent systems despite strong code-generation capabilities.
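To make the reported figures concrete, here is a minimal sketch of how a criterion-level Score Rate could be aggregated and how the gaps between the reported numbers work out. The summary does not define Score Rate precisely, so the pooled "passed criteria / total criteria" formula is an assumption, and `TaskResult`, `score_rate`, and `gap` are hypothetical names used only for illustration. Only the three percentages come from the summary above.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task grading record (not from the paper)."""
    task_id: str
    passed: int  # machine-gradable criteria satisfied
    total: int   # machine-gradable criteria checked


def score_rate(results):
    """Assumed definition: percentage of all criteria passed, pooled over tasks."""
    total = sum(r.total for r in results)
    if total == 0:
        raise ValueError("no criteria to score")
    passed = sum(r.passed for r in results)
    return 100.0 * passed / total


# Figures reported in the event summary (percent Score Rate).
REPORTED = {
    "single_turn_best": 36.6,
    "agentic": 68.8,
    "community_reference": 95.5,
}


def gap(higher, lower):
    """Difference in percentage points between two reported scores."""
    return round(REPORTED[higher] - REPORTED[lower], 1)


if __name__ == "__main__":
    # Toy data only: two made-up tasks with four criteria each.
    demo = [TaskResult("word-01", 3, 4), TaskResult("excel-01", 1, 4)]
    print(score_rate(demo))
    print(gap("agentic", "single_turn_best"))
    print(gap("community_reference", "agentic"))
```

Under these assumptions, execution feedback and iterative repair account for a 32.2-point gain over the best single-turn model, and the agentic system remains 26.7 points short of the community reference. Pooling criteria across tasks weights criterion-heavy tasks more; averaging per-task rates instead would be an equally plausible reading of the metric.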