Almanac
5 · OpenAI Blog · 1mo ago

What Parameter Golf taught us about AI-assisted research

OpenAI's Parameter Golf competition attracted over 1,000 participants and 2,000+ submissions focused on AI-assisted ML research under strict constraints. The challenge explored coding agents, quantization techniques, and novel model design within tight parameter budgets. The event served as a structured probe into how AI tools augment human researchers tackling constrained optimization problems.

Related events (8)

8 · OpenAI Blog · 1mo ago

Measuring AI's capability to accelerate biological research

OpenAI introduces a real-world evaluation framework designed to measure how AI systems can accelerate biological research in wet lab settings. The work uses GPT-5 to optimize a molecular cloning protocol as a concrete demonstration case. The framework explicitly addresses both the potential benefits and biosecurity risks of AI-assisted experimentation, positioning this as a dual-use capability assessment.

5 · OpenAI Blog · 1mo ago

Measuring Goodhart's Law

OpenAI published a blog post examining Goodhart's Law in the context of AI training, where optimizing a proxy objective can cause it to diverge from the true underlying goal. The post addresses the challenge of measuring and optimizing objectives that are difficult or costly to evaluate directly. This is directly relevant to reward hacking, specification gaming, and alignment research at OpenAI.
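The divergence between a proxy and the true goal can be illustrated with a toy sketch. This is not taken from the OpenAI post: the functions `true_objective` and `proxy_objective` and the greedy optimizer are hypothetical, chosen only so that the proxy agrees with the true goal at first and then keeps rewarding moves that hurt it.

```python
def true_objective(x):
    """Hypothetical true goal: best at x = 1, worse on either side."""
    return -(x - 1.0) ** 2


def proxy_objective(x):
    """Hypothetical proxy: cheap to evaluate, rewards larger x without limit."""
    return x


def optimize_proxy(start, step, n_steps):
    """Greedy ascent on the proxy only; records (x, proxy, true) at each step."""
    x = start
    history = [(x, proxy_objective(x), true_objective(x))]
    for _ in range(n_steps):
        candidate = x + step
        # The optimizer never looks at the true objective.
        if proxy_objective(candidate) > proxy_objective(x):
            x = candidate
        history.append((x, proxy_objective(x), true_objective(x)))
    return history


history = optimize_proxy(start=0.0, step=0.5, n_steps=6)
for x, proxy, true in history:
    print(f"x={x:.1f}  proxy={proxy:.2f}  true={true:.2f}")
```

In this sketch the proxy score rises at every step, while the true score improves only until x = 1 and then falls below where it started. That pattern, a proxy that keeps improving while the real objective degrades, is the failure mode the summary describes.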

6 · The Batch · 19d ago

GLM-5.1 Open-Weights Model Targets Long-Running Agentic Tasks; Andrew Ng on Coding Agent Acceleration by Software Domain

Z.ai released GLM-5.1, an open-weights mixture-of-experts LLM (754B total / 40B active parameters) designed for sustained agentic coding tasks lasting up to eight hours, featuring iterative planning-execution-evaluation loops with thousands of tool calls. The model is claimed to deliver top open-weights performance on the Artificial Analysis Intelligence Index and SWE-Bench Pro, and it is available under the MIT license via Hugging Face. The accompanying editorial by Andrew Ng offers a tiered framework for how much coding agents accelerate different categories of software work (frontend most, then backend, infrastructure, and research least), with practical implications for team organization. A secondary item references data-center opposition and LLM helpfulness failure modes.

5 · GitHub Trending · 1mo ago

karpathy/autoresearch: AI Agents for Automated Single-GPU Research

Andrej Karpathy's autoresearch repository on GitHub has accumulated over 82,000 stars, with 332 new stars today. The project focuses on AI agents that autonomously run research experiments on single-GPU nanochat training setups. The high star count and trending activity suggest significant community interest in automated ML research tooling.

6 · The Batch · 19d ago

Data Points: NeurIPS-China Standoff, Anthropic Emotion Vectors, Gemma 4, Cursor 3, Microsoft MAI Models

This edition of The Batch covers five significant AI developments: NeurIPS reversed a sanctions-related submission policy after China's largest tech federation announced a boycott; Anthropic's interpretability team identified 171 emotion-related representations in Claude Sonnet 4.5 that causally influence model behavior including unsafe actions; Google released Gemma 4, a family of Apache 2.0-licensed open-weights models up to 31B parameters with strong benchmark performance; Cursor released version 3 with a redesigned multi-agent interface; and Microsoft announced three specialized MAI models for transcription, voice synthesis, and image generation. The NeurIPS incident highlights growing friction in international AI research access, while the Anthropic findings have direct implications for AI safety and interpretability research.

4 · Import AI · 1mo ago

Import AI 446: Nuclear LLMs; China's big AI benchmark; measurement and AI policy

Import AI issue 446 covers three main topics: the application of large language models to nuclear domains, a major new AI benchmark from China, and the intersection of AI measurement with policy. The newsletter synthesizes recent developments across frontier AI research and geopolitical AI competition. It also touches on speculative questions about AI psychology, such as whether AIs might experience jealousy. As a tier-2 commentary digest, it aggregates signals across multiple active research and policy threads.

4 · OpenAI Blog · 1mo ago

Democratic Inputs to AI Grant Program: Lessons Learned and Implementation Plans

OpenAI summarizes outcomes from its Democratic Inputs to AI grant program, which funded 10 international teams to develop ideas and tools for collective governance of AI systems. The update outlines key innovations and learnings from the program and signals continued investment in participatory AI governance research. OpenAI is calling for researchers and engineers to join ongoing work in this area.

7 · OpenAI Blog · 1mo ago

GPT-5 and the future of mathematical discovery

UCLA Professor Ernest Ryu collaborated with GPT-5 to solve an open problem in optimization theory, representing a concrete example of AI-assisted mathematical research. The announcement highlights GPT-5's capability in formal reasoning and scientific discovery beyond standard benchmarks. This is an OpenAI blog post showcasing a real-world research outcome involving a frontier model.