A shared playbook for trustworthy third-party evaluations
OpenAI has published guidance outlining a shared framework for conducting trustworthy third-party evaluations of frontier AI systems. The playbook covers methodology for assessing model capabilities, safeguards, and evaluation validity. This represents OpenAI's attempt to standardize and legitimize external auditing practices for frontier models.
Related events (8)
OpenAI Expands External Safety Testing Ecosystem
OpenAI published a post describing its use of independent experts to evaluate frontier AI systems through third-party testing. The initiative aims to strengthen safety validation, verify safeguards, and increase transparency around capability and risk assessments. The announcement signals a continued push toward external accountability mechanisms for frontier model evaluation.
Anthropic advocates for third-party testing regime as core AI policy infrastructure
Anthropic published a policy position paper arguing that frontier AI systems require a third-party testing and oversight regime, distinct from self-governance approaches such as its own Responsible Scaling Policy (RSP). The post outlines what such a regime should include: trusted third-party auditors, precisely scoped tests targeting only the most computationally intensive systems, and international coordination via shared standards and mutual recognition agreements. Anthropic acknowledges that its RSP is insufficient on its own because it depends on individual private-sector actors, and calls for industry-wide mandatory testing that would eventually become a legal requirement for wide deployment.
OpenAI and Anthropic Share Findings from Joint Safety Evaluation
OpenAI and Anthropic conducted a first-of-its-kind cross-lab safety evaluation, testing each other's frontier models across dimensions including misalignment, instruction following, hallucinations, and jailbreaking resistance. The collaboration represents a novel form of inter-lab safety research cooperation. The findings highlight both progress and ongoing challenges in AI safety, and the exercise offers a potential template for future cross-organizational evaluations.
OpenAI proposes federal governance blueprint for frontier AI safety and national security
OpenAI published a policy blueprint calling for a U.S. federal framework to govern frontier AI, covering safety, resilience, and national security dimensions. The proposal outlines OpenAI's vision for democratic oversight of the most capable AI systems. Because it is a tier-1 primary source from a leading lab, the blueprint is a significant public policy position that will likely influence regulatory discussions.
Improving Verifiability in AI Development: Multi-Stakeholder Report
OpenAI contributed to a multi-stakeholder report co-authored by 58 researchers across 30 organizations, including Mila, CSET, and the Schwartz Reisman Institute. The report identifies 10 mechanisms for improving the verifiability of claims about AI systems. These tools are intended to help developers demonstrate safety, security, fairness, and privacy properties, while enabling policymakers and civil society to evaluate AI development processes.
OpenAI's Frontier Governance Framework
OpenAI has published its Frontier Governance Framework, a document outlining the company's AI safety, security, and risk management practices. The framework is explicitly positioned to align with emerging regulatory requirements from the EU and California. As a tier-1 source announcement, it represents OpenAI's formal public stance on frontier model governance and regulatory compliance strategy.
Frontier AI regulation: Managing emerging risks to public safety
OpenAI published a policy position on regulating frontier AI systems, focusing on managing emerging risks to public safety. The piece outlines OpenAI's perspective on how governments and regulatory bodies should approach oversight of the most capable AI models. This represents a formal public stance from a leading AI lab on the shape of future AI governance frameworks.
OpenAI Introduces Trusted Access for Cyber Framework
OpenAI has announced Trusted Access for Cyber, a tiered trust-based framework designed to expand access to frontier AI capabilities relevant to cybersecurity while implementing stronger safeguards against misuse. The framework appears to govern how security researchers, organizations, and other actors can access more powerful cyber-relevant AI features. This represents a policy and access-control development at the intersection of AI safety and offensive/defensive cyber capabilities.


