GDPval
gdpval-bf84da34 · 3 events · first seen 28d ago
Aliases: GDPval
Recent events (3)
OpenAI Introduces GDPval: Evaluation of Model Performance on Economically Valuable Real-World Tasks
OpenAI has released GDPval, a new benchmark designed to measure AI model performance on real-world, economically valuable tasks spanning 44 occupations. The evaluation aims to move beyond traditional academic benchmarks by grounding model assessment in tasks with direct economic relevance. The release reflects OpenAI's effort to better quantify the practical utility and labor-market impact of frontier models.
Agent Benchmarks Skew Toward Software Engineering, Missing Most Economically Valuable Labor
Researchers from Carnegie Mellon University and Stanford University mapped over 10,000 examples from 43 agent benchmarks to U.S. labor statistics using O*NET occupational taxonomies, finding that current benchmarks heavily over-represent software engineering relative to its share of employment and wages. Office and administrative support (18.2M workers, $869.8B wages) and management (11M workers, $1,326.3B wages) are far less represented in these benchmarks than computer and mathematical occupations (5.2M workers, $563.6B wages), despite their larger workforces and wage totals. No single benchmark covered more than 50% of work activities, and all 43 benchmarks combined covered only 56.5% of work activities. The study identifies a systematic gap between where agentic AI is being evaluated and where the largest economic opportunity lies.
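To put the quoted employment and wage figures side by side, the short sketch below computes how each under-represented group compares with computer and mathematical occupations. It uses only the three pairs of numbers from the summary above. The dictionary layout and the function name `ratios_vs` are illustrative choices, not anything taken from the study.

```python
# Figures quoted in the summary: (workers in millions, wages in $ billions).
GROUPS = {
    "Office and administrative support": (18.2, 869.8),
    "Management": (11.0, 1326.3),
    "Computer and mathematical": (5.2, 563.6),
}


def ratios_vs(baseline: str) -> dict[str, tuple[float, float]]:
    """Return (worker ratio, wage ratio) of every other group to the baseline."""
    base_workers, base_wages = GROUPS[baseline]
    return {
        name: (workers / base_workers, wages / base_wages)
        for name, (workers, wages) in GROUPS.items()
        if name != baseline
    }


for name, (worker_ratio, wage_ratio) in ratios_vs("Computer and mathematical").items():
    print(f"{name}: {worker_ratio:.1f}x the workers, {wage_ratio:.1f}x the wages")
```

On these figures, office and administrative support has about 3.5 times the workers and 1.5 times the wages of computer and mathematical occupations, and management has about 2.1 times the workers and 2.4 times the wages.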
Data Points: GPT-5.4 Pro, Luma Uni-1, Phi-4-reasoning-vision-15B, Yuan 3.0 Ultra, OpenAI hardware chief resignation
The Batch's weekly roundup covers several significant AI developments: OpenAI released GPT-5.4 and GPT-5.4 Pro with computer-use agent capabilities, a 1M-token context window, and strong benchmark gains on GDPval and OSWorld-Verified; Luma AI released Uni-1, a unified autoregressive model for visual understanding and generation; Microsoft released Phi-4-reasoning-vision-15B, an open-weights multimodal model trained on 200B tokens; and Yuan Lab AI released Yuan 3.0 Ultra, a 1T-parameter MoE model with state-of-the-art results on document retrieval benchmarks. Additionally, OpenAI hardware chief Caitlin Kalinowski resigned over the company's Pentagon deal, citing concerns about surveillance and autonomous weapons governance.