Entity · company

HackerOne

company · active · hackerone-1c72f46c · 3 events · first seen Jun 2, 2026

Aliases: HackerOne

Co-occurring entities

Anthropic · Claude Fable 5 · Glasswing · Hiroshima AI Process · White House Voluntary AI Commitments · Constitutional Classifiers · Claude Opus 4.6 · Responsible Scaling Policy · Claude 3.7 Sonnet

More like this (12)

Hacker News · NetHack · NetHack · MiniHack · MiniHack · Hacktivate AI · mukul975/Anthropic-Cybersecurity-Skills · reward hacking · HackAgent · CyberAv3ngers · ExploitBench · OWASP IoT Top 10

Recent events (3)

7 · Anthropic News · Jul 3, 2026

Anthropic details Fable 5 cybersecurity safeguards and proposes AI jailbreak severity framework

Anthropic has re-deployed Claude Fable 5 globally and published detailed documentation of its cybersecurity safety classifiers, which categorize uses into prohibited, high-risk dual-use, low-risk dual-use, and benign tiers. The post also introduces an early-draft jailbreak severity framework developed with Glasswing partners, intended to give AI developers and governments a shared vocabulary for describing jailbreak risk levels. Anthropic is soliciting public feedback on the framework and has launched a HackerOne bug bounty program for cyber jailbreaks in Fable 5. The disclosure is notable for its specificity about classifier design trade-offs, including the deliberate 'safety margin' that accepts higher false-positive rates to reduce harmful outputs.

Frontier Model Releases · AI Safety Research · HackerOne · Claude Fable 5 · Glasswing · +2 more

6 · Anthropic News · Jun 4, 2026

Anthropic expands model safety bug bounty to target universal jailbreaks in CBRN and cybersecurity domains

Anthropic is expanding its HackerOne-partnered bug bounty program to offer up to $15,000 for novel universal jailbreak attacks against a next-generation safety mitigation system not yet publicly deployed. The program specifically targets high-risk domains including CBRN (chemical, biological, radiological, nuclear) and cybersecurity, with participants given early access to test the new safeguards before release. The initiative begins as invite-only and aligns with Anthropic's commitments under the White House Voluntary AI Commitments and G7 Hiroshima Process Code of Conduct.

AI Safety Research · Regulatory Developments · HackerOne · Hiroshima AI Process · White House Voluntary AI Commitments · +1 more

6 · Anthropic News · Jun 2, 2026

Anthropic launches bug bounty program to stress-test ASL-3 Constitutional Classifiers

Anthropic launched an invite-only bug bounty program in partnership with HackerOne to find universal jailbreaks in its Constitutional Classifiers system before public deployment, offering up to $25,000 per verified vulnerability. The program targets CBRN-related safety bypasses on Claude 3.7 Sonnet and is part of Anthropic's work to meet its AI Safety Level-3 (ASL-3) Deployment Standard under its Responsible Scaling Policy. A follow-up update extended the program to test Constitutional Classifiers on the new Claude Opus 4 model and began accepting reports of universal jailbreaks found on public platforms. The initiative reflects Anthropic's structured approach to pre-deployment safety validation for increasingly capable models.

Frontier Model Releases · AI Safety Research · Constitutional Classifiers · Claude Opus 4.6 · HackerOne · +3 more