4OpenAI Blog·1mo ago

SafetyKit scales risk agents with OpenAI's most capable models

SafetyKit, a content moderation and compliance platform, has integrated OpenAI's GPT-5 to power its risk-detection agents. The deployment targets content moderation accuracy and compliance enforcement, positioning itself as a replacement for legacy safety systems. This represents a production enterprise use case of GPT-5 in trust and safety workflows.

Enterprise Deployment Patterns Agent and Tool Ecosystem OpenAI SafetyKit GPT-5.5

Related guides (4)

OpenAI

OpenAI: The Lab That Made AI a Household Word

Read asBeginner In-depth

GPT-5.5

GPT-5.5: OpenAI's Most Capable Model — and Its Most Complicated

Read asBeginner In-depth

Enterprise Deployment PatternsTopic guide

Enterprise Deployment Patterns: From LLM Demo to Production Reality

Read asIn-depth

Agent and Tool EcosystemTopic guide

Agent and Tool Ecosystem: How the Infrastructure Layer Around LLMs Is Consolidating

Read asIn-depth

Related events (8)

7Openai Blog·1mo ago·source ↗

Introducing gpt-oss-safeguard

OpenAI has released gpt-oss-safeguard, a set of open-weight reasoning models designed for safety classification tasks. The models are intended to help developers implement and iterate on custom content safety policies. This represents OpenAI's entry into the open-weight safety tooling space, providing infrastructure-level moderation capabilities that can be customized and deployed independently.

Open Weights Progress AI Safety Research gpt-oss-safeguard OpenAI +2 more

5Openai Blog·1mo ago·source ↗

OpenAI Releases Teen Safety Policies for Developers via gpt-oss-safeguard

OpenAI has published prompt-based teen safety policies targeting developers who build on its models, specifically leveraging the gpt-oss-safeguard model to moderate age-specific risks. The release provides structured guidance and tooling for filtering or adjusting AI outputs in contexts where minors may be users. This represents an extension of OpenAI's safety infrastructure into the developer-facing layer, addressing regulatory and reputational pressure around youth-facing AI deployments.

AI Safety Research Enterprise Deployment Patterns gpt-oss-safeguard OpenAI +1 more

7Openai Blog·1mo ago·source ↗

GPT-5.1-Codex-Max System Card

OpenAI has published the system card for GPT-5.1-Codex-Max, a coding-focused model variant. The card details model-level safety mitigations including specialized safety training against harmful tasks and prompt injection attacks, as well as product-level controls such as agent sandboxing and configurable network access. This represents OpenAI's formal safety documentation for an agentic coding model deployment.

Frontier Model Releases AI Safety Research prompt injection GPT-5.1-Codex-Max OpenAI +2 more

7Openai Blog·1mo ago·source ↗

OpenAI Expands Trusted Access for Cyber Defense Program with GPT-5.4-Cyber

OpenAI is expanding its Trusted Access for Cyber program, introducing a specialized model called GPT-5.4-Cyber to vetted cybersecurity defenders. The program aims to provide advanced AI capabilities to legitimate security professionals while strengthening safeguards against misuse. This represents a structured approach to deploying frontier AI in sensitive security contexts with access controls.

Frontier Model Releases AI Safety Research GPT-5.5-Cyber Trusted Access for Cyber OpenAI +2 more

7Openai Blog·1mo ago·source ↗

OpenAI Launches GPT-5.5 and GPT-5.5-Cyber with Expanded Trusted Access for Cyber Program

OpenAI is expanding its Trusted Access for Cyber program with two new models: GPT-5.5 and GPT-5.5-Cyber, a specialized variant aimed at cybersecurity applications. The program provides verified defenders with access to these models to accelerate vulnerability research and protect critical infrastructure. This represents a continuation of OpenAI's strategy of releasing domain-specialized model variants with controlled access tiers for sensitive use cases.

Frontier Model Releases AI Safety Research GPT-5.5-Cyber Trusted Access for Cyber OpenAI +2 more

7Openai Blog·1mo ago·source ↗

OpenAI Releases gpt-oss-safeguard-120b and gpt-oss-safeguard-20b: Open-Weight Policy-Reasoning Safety Models

OpenAI has released two open-weight reasoning models, gpt-oss-safeguard-120b and gpt-oss-safeguard-20b, post-trained from the gpt-oss base models to perform policy-conditioned content labeling. The models are designed to reason from a provided policy document and classify content accordingly, functioning as configurable safety classifiers. A technical report accompanies the release, covering capabilities and baseline safety evaluations benchmarked against the underlying gpt-oss models.

Open Weights Progress AI Safety Research GPT-OSS gpt-oss-safeguard OpenAI +1 more

7Openai Blog·1mo ago·source ↗

OpenAI Releases GPT-5.2 System Card Update

OpenAI has published a system card update for GPT-5.2, the latest model family in the GPT-5 series. The safety mitigation approach is described as largely consistent with the prior GPT-5 and GPT-5.1 system cards. Training data sources follow the same pattern as other OpenAI models: publicly available internet data, third-party partnerships, and user/researcher-generated content.

Frontier Model Releases AI Safety Research GPT-5.2 OpenAI GPT-5.5 System Card +1 more

5Openai Blog·1mo ago·source ↗

OpenAI Upgrades Moderation API with GPT-4o-Based Multimodal Model

OpenAI has released an updated Moderation API powered by a new model built on GPT-4o, extending content moderation capabilities to both text and images. The update aims to improve accuracy in detecting harmful content, giving developers better tools for building moderation systems. This represents an expansion of OpenAI's safety infrastructure into multimodal domains.

AI Safety Research Enterprise Deployment Patterns GPT-4o OpenAI Moderation API OpenAI +1 more