7OpenAI Blog·1mo ago

OpenAI Releases gpt-oss-safeguard-120b and gpt-oss-safeguard-20b: Open-Weight Policy-Reasoning Safety Models

OpenAI has released two open-weight reasoning models, gpt-oss-safeguard-120b and gpt-oss-safeguard-20b, post-trained from the gpt-oss base models to perform policy-conditioned content labeling. The models are designed to reason from a provided policy document and classify content accordingly, functioning as configurable safety classifiers. A technical report accompanies the release, covering capabilities and baseline safety evaluations benchmarked against the underlying gpt-oss models.

Open Weights Progress AI Safety Research Agent and Tool Ecosystem GPT-OSS gpt-oss-safeguard OpenAI

Related guides (4)

OpenAI

OpenAI: The Lab That Made AI a Household Word

Read asBeginner In-depth

AI Safety ResearchTopic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Read asBeginner In-depth

Open Weights ProgressTopic guide

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier

Read asBeginner

Agent and Tool EcosystemTopic guide

Agent and Tool Ecosystem: How the Infrastructure Layer Around LLMs Is Consolidating

Read asIn-depth

Related events (8)

7Openai Blog·1mo ago·source ↗

Introducing gpt-oss-safeguard

OpenAI has released gpt-oss-safeguard, a set of open-weight reasoning models designed for safety classification tasks. The models are intended to help developers implement and iterate on custom content safety policies. This represents OpenAI's entry into the open-weight safety tooling space, providing infrastructure-level moderation capabilities that can be customized and deployed independently.

Open Weights Progress AI Safety Research gpt-oss-safeguard OpenAI +2 more

9Openai Blog·1mo ago·source ↗

OpenAI Releases gpt-oss-120b and gpt-oss-20b Open-Weight Reasoning Models

OpenAI has published model cards for gpt-oss-120b and gpt-oss-20b, two open-weight reasoning models released under the Apache 2.0 license alongside a dedicated gpt-oss usage policy. This marks a significant move by OpenAI into the open-weights space, offering both a large 120B parameter model and a smaller 20B variant. The release signals a strategic shift for OpenAI, which has historically kept its frontier models proprietary.

Frontier Model Releases Open Weights Progress gpt-oss usage policy Apache 2.0 GPT-OSS 120B +4 more

9Openai Blog·1mo ago·source ↗

OpenAI Releases gpt-oss-120b and gpt-oss-20b Open-Weight Models Under Apache 2.0

OpenAI is releasing two open-weight language models, gpt-oss-120b and gpt-oss-20b, under the Apache 2.0 license. The models are claimed to outperform similarly sized open models on reasoning tasks and feature strong tool use capabilities. They are optimized for efficient deployment on consumer hardware, positioning them as cost-effective alternatives in the open-weights ecosystem.

Frontier Model Releases Open Weights Progress Apache 2.0 GPT-OSS 120B OpenAI +3 more

5Openai Blog·1mo ago·source ↗

OpenAI Releases Teen Safety Policies for Developers via gpt-oss-safeguard

OpenAI has published prompt-based teen safety policies targeting developers who build on its models, specifically leveraging the gpt-oss-safeguard model to moderate age-specific risks. The release provides structured guidance and tooling for filtering or adjusting AI outputs in contexts where minors may be users. This represents an extension of OpenAI's safety infrastructure into the developer-facing layer, addressing regulatory and reputational pressure around youth-facing AI deployments.

AI Safety Research Enterprise Deployment Patterns gpt-oss-safeguard OpenAI +1 more

4Openai Blog·1mo ago·source ↗

SafetyKit scales risk agents with OpenAI's most capable models

SafetyKit, a content moderation and compliance platform, has integrated OpenAI's GPT-5 to power its risk-detection agents. The deployment targets content moderation accuracy and compliance enforcement, positioning itself as a replacement for legacy safety systems. This represents a production enterprise use case of GPT-5 in trust and safety workflows.

Enterprise Deployment Patterns Agent and Tool Ecosystem OpenAI SafetyKit GPT-5.5

8Openai Blog·1mo ago·source ↗

OpenAI Releases Most Capable Open-Weights Models

OpenAI has released what it describes as its most capable open-weights models, framing the move as a major step toward broader AI accessibility. The announcement emphasizes openness, flexibility, and global reach as core motivations. This marks a significant shift in OpenAI's historically closed model distribution strategy.

Frontier Model Releases Open Weights Progress open-weight models OpenAI +2 more

8Hugging Face Blog·1mo ago·source ↗

Welcome GPT OSS, the new open-source model family from OpenAI!

Hugging Face published a blog post welcoming OpenAI's GPT OSS, described as a new open-source model family from OpenAI. The post appears on the Hugging Face blog, signaling the models are being hosted or integrated into the Hugging Face ecosystem. This represents a notable shift in OpenAI's historically closed-weights strategy toward open-weight model releases.

Frontier Model Releases Open Weights Progress GPT-OSS Hugging Face OpenAI +1 more

8Openai Blog·1mo ago·source ↗

Estimating Worst-Case Frontier Risks of Open-Weight LLMs

OpenAI introduces a methodology called malicious fine-tuning (MFT) to assess worst-case risks of releasing open-weight models, specifically applied to their internal model gpt-oss. The study attempts to elicit maximum dangerous capabilities in biology and cybersecurity domains through targeted fine-tuning. This represents a systematic effort to quantify uplift risks before open-weight releases, informing OpenAI's open-weight release policy.

Evaluation and Benchmarking Open Weights Progress cybersecurity risk uplift biology risk uplift malicious fine-tuning +3 more