4·OpenAI Blog·1mo ago

Evolution through large models

OpenAI published 'Evolution through large models' in June 2022. The work shows that large language models trained on code can serve as mutation operators in genetic programming: because such models have seen many incremental, human-made code changes, the edits they propose tend to be more meaningful than random perturbations. Paired with the MAP-Elites quality-diversity algorithm, the approach generated hundreds of thousands of functional Python programs that output working walking robots in the Sodarace domain.

Frontier Model Releases, Open Weights Progress, OpenAI

Related guides (3)

OpenAI

OpenAI: The Lab That Made AI a Household Word

Frontier Model Releases·Topic guide

Frontier Model Releases: The Race From Language to Action

Open Weights Progress·Topic guide

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier

Related events (8)

2·OpenAI Blog·1mo ago

OpenAI: Generative Models Overview (2016)

A 2016 OpenAI blog post describing four research projects centered on generative models as a branch of unsupervised learning. The post explains what generative models are, their importance, and potential future directions. This is an archival piece predating modern large language models and diffusion systems, representing early foundational work at OpenAI.

generative models, unsupervised learning, OpenAI

9·OpenAI Blog·1mo ago

Scaling Laws for Neural Language Models

OpenAI published foundational research establishing empirical scaling laws for neural language models, showing that model performance scales predictably with compute, data, and parameters. The work demonstrated power-law relationships between these factors and loss, providing a principled framework for allocating training resources. This paper became a cornerstone of modern large language model development strategy.

Training Infrastructure, Frontier Model Releases, Jared Kaplan, Sam McCandlish, OpenAI, +3 more
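
As a rough illustration of the power-law relationship described in this entry, the sketch below fits a curve of the form L(N) = (N_c / N)^alpha to loss values by linear regression in log-log space. The function name `fit_power_law` and the constants `TRUE_ALPHA` and `TRUE_N_C` are hypothetical, and the data points are synthetic; none of them are taken from the paper.

```python
import math


def fit_power_law(sizes, losses):
    """Fit L(N) = (n_c / N) ** alpha by least squares on log-transformed data.

    Taking logs gives log L = alpha * log n_c - alpha * log N, which is a
    straight line in log N with slope -alpha.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    alpha = -slope
    n_c = math.exp(intercept / alpha)
    return alpha, n_c


# Synthetic, noise-free data generated from illustrative constants
# (assumptions for this sketch, not values reported in the paper).
TRUE_ALPHA, TRUE_N_C = 0.08, 1e14
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [(TRUE_N_C / n) ** TRUE_ALPHA for n in sizes]

alpha_hat, n_c_hat = fit_power_law(sizes, losses)
print(f"alpha ~ {alpha_hat:.4f}, n_c ~ {n_c_hat:.3e}")
```

With real training runs the points would be noisy, so the fit would only approximate the exponent; the log-log regression is one simple way to estimate it.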

8·OpenAI Blog·1mo ago

Evaluating Large Language Models Trained on Code

OpenAI published research on evaluating large language models trained on code, introducing the Codex model and the HumanEval benchmark for assessing code generation capabilities. The work established foundational methodology for measuring functional correctness of code produced by LLMs using a pass@k metric. This paper became a landmark reference for code-focused LLM evaluation and influenced subsequent code generation research across the field.

Frontier Model Releases, Evaluation and Benchmarking, GPT-3, pass@k, OpenAI, +3 more
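
The pass@k metric mentioned in this entry is usually computed with an unbiased estimator: generate n samples per problem, count the c samples that pass the unit tests, and estimate the chance that at least one of k randomly drawn samples is correct as 1 - C(n-c, k) / C(n, k). A minimal sketch follows; the function name `pass_at_k` and the example counts are illustrative assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated, c: samples that passed the tests,
    k: number of samples the metric is allowed to draw.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts: 10 samples generated, 3 of them correct.
print(pass_at_k(n=10, c=3, k=1))
print(pass_at_k(n=10, c=3, k=5))
```

A benchmark-level score is then the mean of this per-problem estimate over all problems.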

4·Hugging Face Blog·1mo ago

Very Large Language Models and How to Evaluate Them

This Hugging Face blog post from October 2022 discusses approaches to zero-shot evaluation of large language models hosted on the Hub. It covers methodologies for benchmarking LLMs without task-specific fine-tuning, addressing the practical challenges of evaluating very large models at scale. The post situates evaluation tooling within the broader ecosystem of open model hosting and assessment.

Evaluation and Benchmarking, Open Weights Progress, zero-shot evaluation, Hugging Face

5·OpenAI Blog·1mo ago

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

OpenAI published research showing that evolution strategies (ES), a decades-old optimization technique, can rival the performance of standard reinforcement learning methods on benchmarks like Atari and MuJoCo. The approach offers practical advantages over RL, including easier parallelization and less sensitivity to hyperparameters. This positions ES as a viable alternative training paradigm for policy optimization tasks.

Evaluation and Benchmarking, Alignment and RLHF, Evolution Strategies, MuJoCo, Reinforcement Learning, +2 more
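
To make the idea concrete, here is a minimal sketch of an evolution strategy on a toy objective rather than Atari or MuJoCo. It perturbs the parameters with Gaussian noise, scores each perturbation, and moves the parameters toward the better-scoring ones. The function names, the quadratic reward, and every hyperparameter value are assumptions chosen for illustration; standardizing the rewards is one common variance-reduction choice, not necessarily the one used in the research.

```python
import random


def evolution_strategy(objective, theta, sigma=0.1, lr=0.01,
                       population=50, iterations=200, seed=0):
    """Maximize `objective` with a simple Gaussian-perturbation ES."""
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(iterations):
        # Sample one noise vector per population member.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(population)]
        # Evaluate each perturbed parameter vector independently;
        # this is the step that parallelizes easily across workers.
        rewards = [objective([t + sigma * e for t, e in zip(theta, eps)])
                   for eps in noises]
        mean = sum(rewards) / population
        std = (sum((r - mean) ** 2 for r in rewards) / population) ** 0.5 + 1e-8
        advantages = [(r - mean) / std for r in rewards]
        # Gradient estimate: noise vectors weighted by standardized reward.
        for j in range(dim):
            grad_j = sum(a * eps[j] for a, eps in zip(advantages, noises))
            theta[j] += lr * grad_j / (population * sigma)
    return theta


# Toy objective (hypothetical): reward is highest at TARGET.
TARGET = [0.5, 0.1, -0.3]


def reward(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


solution = evolution_strategy(reward, [0.0, 0.0, 0.0])
print([round(p, 2) for p in solution])
```

No gradients of the objective are needed, only reward evaluations, which is why each population member can be scored on a separate worker.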

8·OpenAI Blog·1mo ago

OpenAI Releases Most Capable Open-Weights Models

OpenAI has released what it describes as its most capable open-weights models, framing the move as a major step toward broader AI accessibility. The announcement emphasizes openness, flexibility, and global reach as core motivations. This marks a significant shift in OpenAI's historically closed model distribution strategy.

Frontier Model Releases, Open Weights Progress, open-weight models, OpenAI, +2 more

8·OpenAI Blog·1mo ago

Better language models and their implications

OpenAI announced GPT-2, a large-scale unsupervised language model capable of generating coherent multi-paragraph text and achieving state-of-the-art performance on language modeling benchmarks. The model demonstrated zero-shot capability across reading comprehension, machine translation, question answering, and summarization without task-specific fine-tuning. OpenAI initially withheld the full model, citing misuse concerns, which made the announcement an early high-profile instance of a staged, responsible release policy.

Frontier Model Releases, Evaluation and Benchmarking, GPT-2, zero-shot learning, unsupervised language modeling, +3 more

5·OpenAI Blog·1mo ago

Lessons learned on language model safety and misuse

OpenAI published a post summarizing its evolving thinking on language model safety and misuse in deployed systems. The piece is intended to share lessons with other AI developers facing similar challenges. It covers OpenAI's internal approaches to mitigating harmful outputs and the misuse patterns it has observed in production.

AI Safety Research, Enterprise Deployment Patterns, OpenAI