Almanac
5·arXiv cs.AI (Artificial Intelligence)·6d ago

Critical review of AI exposure scores: methodological limits and research-policy coordination gaps

A new arXiv paper critically examines the 'GPTs are GPTs' occupational exposure scores (Eloundou et al., 2023), which have become a dominant empirical input to future-of-work policy debates. The authors identify two compounding gaps: a mismatch between what static exposure scores can capture, given their temporal, geographic, and ontological limitations, and what policy questions actually require; and a coordination failure between researchers and policymakers, who continue citing outdated measures. The paper surveys five families of methodological responses and argues that closing the research-policy gap requires participatory methods, better data infrastructure, and a shift from prediction to preparedness.

Related events (8)

4·AI Snake Oil·1mo ago

AI existential risk probabilities are too unreliable to inform policy

This commentary argues that numerical probability estimates for AI existential risk are epistemically unreliable and should not be used as a basis for policy decisions. The piece critiques the practice of assigning precise figures to speculative scenarios, characterizing it as pseudo-quantification that lends false credibility to uncertain claims. The author contends that such estimates are laundered speculation rather than grounded forecasting.

7·OpenAI Blog·1mo ago

GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models

OpenAI published research examining the potential labor market impacts of large language models, analyzing which occupations and tasks are most exposed to automation or augmentation by GPT-class models. The study introduces a framework for assessing LLM 'exposure' across job categories, estimating that roughly 19% of U.S. workers could see at least 50% of their tasks affected. The paper represents an early systematic attempt to quantify economic disruption potential from frontier AI systems.

3·Hacker News·20d ago

Retrospective on GPT-2's 'Too Dangerous to Release' decision (2019)

A blog post revisiting OpenAI's 2019 decision to initially withhold GPT-2 due to misuse concerns has surfaced on Hacker News with significant engagement (239 points, 89 comments). The post examines the historical episode where OpenAI staged the release of GPT-2, citing fears of misuse for disinformation. This retrospective is relevant as a case study in AI safety communication and the evolution of lab release policies.

6·Anthropic News·25d ago

Anthropic publishes policy brief calling for targeted AI regulation within 18 months

Anthropic published a policy position paper arguing that governments have an 18-month window to enact narrowly targeted AI regulation before risks in cyber and CBRN domains become acute. The post cites rapid capability gains (SWE-bench scores rising from 1.96% to 49% in a year, GPQA scores nearing human expert level) as evidence that frontier models are approaching meaningful misuse thresholds. Anthropic also reviews its Responsible Scaling Policy as a model for adaptive, proportionate risk governance and calls for similar frameworks to be adopted industry-wide and codified in law.

5·arXiv · cs.AI·14d ago

Taxonomy and governance gap analysis for AI contributors in open-source software

An arXiv preprint analyzes how open-source organizations handle AI-generated and agent-driven contributions, comparing policies across six major projects and foundations (SymPy, LLVM, matplotlib, OpenInfra, Apache Software Foundation, Linux Foundation). The authors develop a six-dimensional taxonomy covering disclosure, responsibility, human oversight, licensing, enforcement, and maintainer workload, and score each organization's policy maturity. The paper maps documented agent incidents onto governance gaps and identifies misalignments with emerging regulatory frameworks including the EU AI Act, NIST AI RMF, and ISO/IEC 42001, proposing a harmonized tiered framework.

8·OpenAI Blog·1mo ago

Measuring AI's capability to accelerate biological research

OpenAI introduces a real-world evaluation framework designed to measure how AI systems can accelerate biological research in wet lab settings. The work uses GPT-5 to optimize a molecular cloning protocol as a concrete demonstration case. The framework explicitly addresses both the potential benefits and biosecurity risks of AI-assisted experimentation, positioning this as a dual-use capability assessment.

6·Don't Worry About the Vase·3d ago

Zvi Mowshowitz critiques White House ad hoc access policy for frontier AI models

Zvi Mowshowitz (Don't Worry About the Vase) analyzes a newly announced White House policy that would grant individual access to frontier AI models like GPT-5.6 on a case-by-case basis. The post frames this as a significant and problematic new standard for frontier model release governance. The commentary signals a notable regulatory development at the intersection of AI access policy and executive branch oversight.

5·arXiv · cs.CL·12d ago

Benchmark gap paper: EU AI Act requires doctrinal legal reasoning evals that don't yet exist

A new arXiv preprint identifies a critical measurement gap in legal AI evaluation: existing benchmarks test paralegal and ancillary tasks rather than doctrinal legal reasoning, which is the interpretive core of legal work. The authors argue this gap is not merely methodological but legally significant, because the EU AI Act's 'appropriate accuracy' requirement for high-risk AI in the judicial domain cannot be operationalized without a doctrinal-reasoning benchmark. The paper proposes a benchmark framework aimed at filling this gap under EU AI Act compliance requirements.