arXiv cs.AI (Artificial Intelligence) · 6d ago

Critical review of AI exposure scores: methodological limits and research-policy coordination gaps

A new arXiv paper critically examines the 'GPTs are GPTs' occupational exposure scores (Eloundou et al., 2023), which have become a dominant empirical input to future-of-work policy debates. The authors identify two compounding gaps. The first is a mismatch between what static exposure scores can capture, given their temporal, geographic, and ontological limits, and what policy questions actually require. The second is a coordination failure between researchers and policymakers, in which outdated measures continue to be cited. The paper surveys five families of methodological responses and argues that closing the research-policy gap requires participatory methods, better data infrastructure, and a shift in emphasis from prediction to preparedness.

Evaluation and Benchmarking · Regulatory Developments · AI Exposure Scores: what they measure, what they miss, and what comes next · Eloundou et al. · GPTs are GPTs

Related guides (2)

Regulatory Developments · Topic guide

AI Regulatory Developments: From Voluntary Frameworks to Government Enforcement


Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder


Related events (8)

AI Snake Oil · 1mo ago

AI existential risk probabilities are too unreliable to inform policy

This commentary argues that numerical probability estimates for AI existential risk are epistemically unreliable and should not be used as a basis for policy decisions. The piece critiques the practice of assigning precise figures to speculative scenarios, characterizing it as pseudo-quantification that lends false credibility to uncertain claims. The author contends that such estimates are laundered speculation rather than grounded forecasting.

AI Safety Research · Regulatory Developments · Normal Tech · AI Existential Risk

OpenAI Blog · 1mo ago

GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models

OpenAI published research examining the potential labor market impacts of large language models, analyzing which occupations and tasks are most exposed to automation or augmentation by GPT-class models. The study introduces a framework for assessing LLM 'exposure' across job categories, finding that a significant share of U.S. workers could see at least 50% of their tasks affected. The paper represents an early systematic attempt to quantify economic disruption potential from frontier AI systems.

Evaluation and Benchmarking · Enterprise Deployment Patterns · Sam Manning · Tyna Eloundou · OpenAI
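The headline statistic in the summary above is a threshold count: an occupation qualifies if at least 50% of its tasks are labeled exposed, and the result is expressed as a share of workers, so it is weighted by employment. The following is a minimal sketch of that arithmetic only, assuming binary task labels and equal task weights; the `Occupation` class, the `share_of_workers_above` helper, and all occupations and headcounts are hypothetical, and the paper's actual labeling rubric and weighting are not described here.

```python
from dataclasses import dataclass


@dataclass
class Occupation:
    name: str
    workers: int
    # Hypothetical binary label per task: True if the task is judged "exposed".
    # Assumption: every task carries equal weight within its occupation.
    task_exposed: list

    def exposure_share(self) -> float:
        """Fraction of this occupation's tasks labeled exposed."""
        return sum(self.task_exposed) / len(self.task_exposed)


def share_of_workers_above(occupations, threshold=0.5):
    """Employment-weighted share of workers whose occupation has at least
    `threshold` of its tasks exposed."""
    total = sum(o.workers for o in occupations)
    exposed = sum(o.workers for o in occupations if o.exposure_share() >= threshold)
    return exposed / total


# Invented toy data, for illustration only.
occupations = [
    Occupation("hypothetical_A", 100, [True, True, False, False]),  # 2/4 = 0.50
    Occupation("hypothetical_B", 300, [True, False, False]),        # 1/3, below 0.5
    Occupation("hypothetical_C", 100, [True, True, True]),          # 3/3 = 1.00
]

print(share_of_workers_above(occupations))  # 0.4 (200 of 500 workers)
```

Because the task labels are fixed inputs, the output is a static snapshot. That is the temporal limitation the lead item in this digest raises about exposure scores.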

Hacker News · 20d ago

Retrospective on GPT-2's 'Too Dangerous to Release' decision (2019)

A blog post revisiting OpenAI's 2019 decision to initially withhold GPT-2 due to misuse concerns has surfaced on Hacker News with significant engagement (239 points, 89 comments). The post examines the historical episode where OpenAI staged the release of GPT-2, citing fears of misuse for disinformation. This retrospective is relevant as a case study in AI safety communication and the evolution of lab release policies.

Open Weights Progress · AI Safety Research · GPT-2 · OpenAI

Anthropic News · 25d ago

Anthropic publishes policy brief calling for targeted AI regulation within 18 months

Anthropic published a policy position paper arguing that governments have an 18-month window to enact narrowly-targeted AI regulation before risks in cyber and CBRN domains become acute. The post cites rapid capability gains—SWE-bench scores rising from 1.96% to 49% in a year, GPQA scores approaching human expert level—as evidence that frontier models are approaching meaningful misuse thresholds. Anthropic also reviews its Responsible Scaling Policy as a model for adaptive, proportionate risk governance and calls for similar frameworks to be adopted industry-wide and codified in law.

AI Safety Research · Regulatory Developments · Anthropic Policy · Frontier Red Team · Claude 3.5 Sonnet · UK AI Security Institute

arXiv · cs.AI · 14d ago

Taxonomy and governance gap analysis for AI contributors in open-source software

An arXiv preprint analyzes how open-source organizations handle AI-generated and agent-driven contributions, comparing policies across six major projects and foundations (SymPy, LLVM, matplotlib, OpenInfra, Apache Software Foundation, Linux Foundation). The authors develop a six-dimensional taxonomy covering disclosure, responsibility, human oversight, licensing, enforcement, and maintainer workload, and use it to score each organization's policy maturity. The paper maps documented agent incidents onto governance gaps, identifies misalignments with emerging regulatory frameworks including the EU AI Act, NIST AI RMF, and ISO/IEC 42001, and proposes a harmonized tiered framework.

AI Safety Research · Regulatory Developments · LLVM · Linux Foundation · NIST AI RMF

OpenAI Blog · 1mo ago

Measuring AI's capability to accelerate biological research

OpenAI introduces a real-world evaluation framework designed to measure how AI systems can accelerate biological research in wet lab settings. The work uses GPT-5 to optimize a molecular cloning protocol as a concrete demonstration case. The framework explicitly addresses both the potential benefits and biosecurity risks of AI-assisted experimentation, positioning this as a dual-use capability assessment.

Frontier Model Releases · Evaluation and Benchmarking · wet lab · biological research · evaluation framework · OpenAI · molecular cloning

Don't Worry About the Vase · 3d ago

Zvi Mowshowitz critiques White House ad hoc access policy for frontier AI models

Zvi Mowshowitz (Don't Worry About the Vase) analyzes a newly announced White House policy that would grant individual access to frontier AI models like GPT-5.6 on a case-by-case basis. The post frames this as a significant and problematic new standard for frontier model release governance. The commentary signals a notable regulatory development at the intersection of AI access policy and executive branch oversight.

Frontier Model Releases · Regulatory Developments · White House · OpenAI · Zvi Mowshowitz

arXiv · cs.CL · 12d ago

Benchmark gap paper: EU AI Act requires doctrinal legal reasoning evals that don't yet exist

A new arXiv preprint identifies a critical measurement gap in legal AI evaluation: existing benchmarks test paralegal and ancillary tasks rather than doctrinal legal reasoning, which is the interpretive core of legal work. The authors argue this gap is not merely methodological but legally significant, because the EU AI Act's 'appropriate accuracy' requirement for high-risk AI in the judicial domain cannot be operationalized without a doctrinal-reasoning benchmark. The paper proposes a benchmark framework aimed at filling this gap under EU AI Act compliance requirements.

Evaluation and Benchmarking · Regulatory Developments · The Measurement Gap in the Automation of EU Law: Benchmarking Doctrinal Legal Reasoning under the EU AI Act · EU AI Act