7The Batch (DeepLearning.AI)·19d ago

US Government Prepares AI Model Vetting System; GPT-5.5 Instant, Claude Finance Agents, Pentagon AI Partnerships

The White House is preparing an executive order to create an FDA-style vetting system for new AI models, prompted partly by Anthropic's Mythos model disclosing cybersecurity risks; the Commerce Department separately expanded a voluntary testing program with Google, Microsoft, and xAI. OpenAI rolled out GPT-5.5 Instant as the default ChatGPT model, claiming 52.5% fewer hallucinations on high-stakes prompts. Anthropic released ten financial agent templates running on Claude Opus 4.7, while the Pentagon expanded AI vendor agreements to include Microsoft, Amazon, Nvidia, and Reflection AI after canceling its Anthropic contract over autonomous weapons restrictions. Major pharma companies report AI gains primarily in manufacturing optimization rather than drug discovery breakthroughs.

Related guides (5)

OpenAI

OpenAI: The Lab That Made AI a Household Word

Read asBeginner

Claude Code

Claude Code: Anthropic's Autonomous Coding Agent

Read asBeginnerfeatured

Claude Opus 4.6

Claude Opus 4.6: Anthropic's Milestone Model for Long-Context and Agentic Work

Read asBeginner

NVIDIA

NVIDIA: The Hardware Backbone of the AI Era

Read asBeginner In-depth

Microsoft

Microsoft: The AI Infrastructure Superpower Behind Every Major Lab

Read asIn-depth

Related events (8)

7The Batch·1mo ago·source ↗

U.S. Government to Pre-Release Test AI Models for National Security Risks via NIST TRAINS Task Force

NIST announced a new multi-agency task force called TRAINS (Testing Risks of AI for National Security), overseen by its Center for AI Standards and Innovation, to evaluate frontier AI models for cybersecurity, biosecurity, and chemical weapons risks before public deployment. Google, Microsoft, xAI, Anthropic, and OpenAI have voluntarily agreed to submit models with limited guardrails for evaluation. The policy shift follows Anthropic's announcement that Claude Mythos Preview can autonomously exploit software vulnerabilities, and marks a sharp reversal from the Trump Administration's earlier deregulatory stance. The White House is also considering an executive order that would make pre-release government testing mandatory.

Frontier Model Releases Evaluation and Benchmarking White House Center for AI Standards and Innovation DeepSeek V4 +11 more

6The Batch·19d ago·source ↗

Data Points: Hackers Break Into Claude Mythos; OpenAI Launches Cybersecurity Rival; Maine Data Center Moratorium; McClatchy AI Backlash

A small group of unauthorized users gained access to Anthropic's restricted Claude Mythos cybersecurity model via Discord coordination and insider knowledge, raising questions about securing high-risk AI systems. OpenAI responded to the competitive landscape by launching GPT-5.4-Cyber, a vetted-access model for defensive cybersecurity tasks. Maine passed the first U.S. state moratorium on large AI data centers over 20MW, pending the governor's signature. McClatchy's deployment of a Claude-powered content scaling agent triggered newsroom backlash over attribution, consent, and AI disclosure standards.

Training Infrastructure Frontier Model Releases GPT-5.5-Cyber Discord Claude Mythos +11 more

6The Batch·19d ago·source ↗

Data Points: Anthropic's Claude Mythos Cybersecurity Claims Face Scrutiny; OpenAI-Cerebras Deal; Meta AI CEO Avatar; Infrastructure Delays

A multi-item digest covers skepticism around Anthropic's Claude Mythos zero-day vulnerability claims (flagged as overstated by Tom's Hardware based on limited 198-case evidence), OpenAI's $20B+ deal with Cerebras for AI processors including a potential ~10% equity stake, and satellite data showing ~40% of U.S. AI data center projects are behind schedule. Additional items cover Meta developing an AI avatar of CEO Zuckerberg for internal use, Moody's flagging credit stress in AI-disrupted sectors, and Luma AI launching an AI-driven film production studio using its Uni-1 model.

Training Infrastructure Frontier Model Releases Tom's Hardware Claude Mythos Luma AI +14 more

7The Batch·1mo ago·source ↗

U.S. Government to Pre-Deployment Evaluate Frontier AI Models via NIST TRAINS Task Force

The U.S. National Institute of Standards and Technology (NIST) announced a new multi-agency task force called TRAINS (Testing Risks of AI for National Security) to assess national-security risks from frontier AI models before public deployment. Major AI companies including Google, Microsoft, xAI, Anthropic, and OpenAI have agreed to submit models—including versions with limited guardrails—for evaluation focused on cybersecurity, biosecurity, and chemical weapons risks. The White House is also considering an executive order requiring pre-deployment approval for AI models. TRAINS draws on multiple federal agencies and differs from prior NIST groups in its rapid-response design, though its specific benchmarks have not been disclosed.

Frontier Model Releases Evaluation and Benchmarking DeepSeek V4 Microsoft Google +9 more

5The Batch·19d ago·source ↗

Insurance Companies Carve Out AI Risk Exceptions; GPT-Rosalind, Claude Design, and Agentic Retail Deployments Highlighted

Major insurers including Berkshire Hathaway units, Travelers Group, and Chubb are excluding or restricting AI-related liability coverage, signaling growing concern over hard-to-model AI-driven claims. OpenAI introduced GPT-Rosalind, a domain-specific LLM fine-tuned for life sciences workflows, while Anthropic launched Claude Design for visual asset generation targeting non-designers. Additional items cover an AI-run San Francisco retail store exposing agentic system limitations, Wall Street banks cutting junior roles via AI deployment, and Anthropic's continued engagement with the Trump administration despite prior Pentagon restrictions.

Frontier Model Releases Inference Economics DeepLearning.AI Claude Mythos Chubb Limited +15 more

8The Batch·19d ago·source ↗

Anthropic Releases Claude Mythos Preview with Extraordinary Cybersecurity Capabilities, Forms Project Glasswing Consortium

Anthropic has published a 244-page model card for Claude Mythos Preview, a large language model not yet commercially available, which broadly outperforms Claude Opus 4.6 and is described as 'strikingly capable' at identifying and exploiting code vulnerabilities. To mitigate risks before potential release, Anthropic assembled Project Glasswing, a consortium including AWS, Apple, Google, Microsoft, CrowdStrike, Nvidia, and 40+ other organizations, funded with $100 million in API credits and $4 million in open-source security donations. This marks the first time Anthropic has published a model card without making the model commercially available, signaling an unusual safety-first deployment posture. The issue also includes commentary from Andrew Ng on AI's impact on software engineering jobs, arguing against an 'AI jobpocalypse' narrative.

Frontier Model Releases AI Safety Research JPMorganChase Linux Foundation Claude Opus 4.6 +14 more

7Openai Blog·1mo ago·source ↗

OpenAI Expands Trusted Access for Cyber Defense Program with GPT-5.4-Cyber

OpenAI is expanding its Trusted Access for Cyber program, introducing a specialized model called GPT-5.4-Cyber to vetted cybersecurity defenders. The program aims to provide advanced AI capabilities to legitimate security professionals while strengthening safeguards against misuse. This represents a structured approach to deploying frontier AI in sensitive security contexts with access controls.

Frontier Model Releases AI Safety Research GPT-5.5-Cyber Trusted Access for Cyber OpenAI +2 more

7The Batch·17d ago·source ↗

Data Points: GPT-5.4 Pro, Luma Uni-1, Phi-4-reasoning-vision-15B, Yuan 3.0 Ultra, OpenAI hardware chief resignation

The Batch's weekly roundup covers several significant AI developments: OpenAI released GPT-5.4 and GPT-5.4 Pro with computer-use agent capabilities, 1M token context, and strong benchmark gains on GDPval and OSWorld-Verified; Luma AI released Uni-1, a unified autoregressive model for visual understanding and generation; Microsoft released Phi-4-reasoning-vision-15B, an open-weights multimodal model trained on 200B tokens; Yuan Lab AI released Yuan 3.0 Ultra, a 1T-parameter MoE model with SOTA on document retrieval benchmarks. Additionally, OpenAI hardware chief Caitlin Kalinowski resigned over the company's Pentagon deal, citing concerns about surveillance and autonomous weapons governance.

Frontier Model Releases Open Weights Progress Black Forest Labs Layer-Adaptive Expert Pruning Caitlin Kalinowski +19 more