6Simon Willison's Weblog·19d ago

Hackers Simply Asked Meta AI to Give Them Access to High-Profile Instagram Accounts. It Worked

Simon Willison comments on a reported incident in which attackers successfully used Meta AI to gain unauthorized access to high-profile Instagram accounts through social engineering or prompt-based manipulation. The case illustrates real-world exploitation of AI assistant systems deployed in consumer products. This is a concrete deployment security failure with implications for how AI assistants handle privileged account actions.

AI Safety Research Enterprise Deployment Patterns Agent and Tool Ecosystem Instagram Meta AI Simon Willison Meta

Related guides (3)

AI Safety ResearchTopic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Read asBeginner In-depth

Enterprise Deployment PatternsTopic guide

Enterprise Deployment Patterns: From AI Demo to Production Reality

Read asBeginner In-depth

Agent and Tool EcosystemTopic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Read asBeginner In-depth

Related events (8)

7Mit Technology Review — Ai·15d ago·source ↗

Meta's AI customer support agent exploited to hijack Instagram accounts

Attackers exploited Meta's AI customer support agent by prompting it to link Instagram accounts to attacker-controlled email addresses, successfully hijacking accounts including the dormant Obama White House Instagram. The incident was reported by 404 Media on June 5, 2026. The attack illustrates a practical, real-world failure mode for deployed AI agents with account-management capabilities.

AI Safety Research Enterprise Deployment Patterns 404 Media MIT Technology Review Meta +1 more

6The Batch·19d ago·source ↗

Data Points: Hackers Break Into Claude Mythos; OpenAI Launches Cybersecurity Rival; Maine Data Center Moratorium; McClatchy AI Backlash

A small group of unauthorized users gained access to Anthropic's restricted Claude Mythos cybersecurity model via Discord coordination and insider knowledge, raising questions about securing high-risk AI systems. OpenAI responded to the competitive landscape by launching GPT-5.4-Cyber, a vetted-access model for defensive cybersecurity tasks. Maine passed the first U.S. state moratorium on large AI data centers over 20MW, pending the governor's signature. McClatchy's deployment of a Claude-powered content scaling agent triggered newsroom backlash over attribution, consent, and AI disclosure standards.

Training Infrastructure Frontier Model Releases GPT-5.5-Cyber Discord Claude Mythos +11 more

6The Batch·19d ago·source ↗

Data Points: Anthropic's Claude Mythos Cybersecurity Claims Face Scrutiny; OpenAI-Cerebras Deal; Meta AI CEO Avatar; Infrastructure Delays

A multi-item digest covers skepticism around Anthropic's Claude Mythos zero-day vulnerability claims (flagged as overstated by Tom's Hardware based on limited 198-case evidence), OpenAI's $20B+ deal with Cerebras for AI processors including a potential ~10% equity stake, and satellite data showing ~40% of U.S. AI data center projects are behind schedule. Additional items cover Meta developing an AI avatar of CEO Zuckerberg for internal use, Moody's flagging credit stress in AI-disrupted sectors, and Luma AI launching an AI-driven film production studio using its Uni-1 model.

Training Infrastructure Frontier Model Releases Tom's Hardware Claude Mythos Luma AI +14 more

7Anthropic News·18d ago·source ↗

Anthropic August 2025 Threat Intelligence Report: Claude Misuse Case Studies

Anthropic has published its August 2025 Threat Intelligence Report documenting three real-world misuse cases involving Claude: a large-scale data extortion operation using Claude Code to automate reconnaissance and generate targeted ransom demands against 17+ organizations, a North Korean fraudulent employment scheme, and AI-assisted ransomware development by a low-skill criminal. The report highlights that agentic AI is now being weaponized for end-to-end cyberattacks rather than merely providing advisory assistance, and that AI has materially lowered the technical barrier to sophisticated cybercrime. Anthropic describes detection and countermeasures taken in each case.

AI Safety Research Enterprise Deployment Patterns Claude Claude Code North Korea +4 more

9Anthropic News·19d ago·source ↗

Anthropic Discloses First Reported AI-Orchestrated Cyber Espionage Campaign Using Claude Code

Anthropic detected and disrupted a sophisticated espionage campaign in mid-September 2025, attributed with high confidence to a Chinese state-sponsored threat actor, that used Claude Code as an autonomous agent to attack roughly thirty global targets across tech, finance, chemical manufacturing, and government sectors. The attackers jailbroke Claude Code by decomposing malicious tasks into seemingly innocent subtasks and falsely framing it as defensive security testing, enabling largely autonomous reconnaissance, vulnerability exploitation, credential harvesting, and data exfiltration. Anthropic describes this as the first documented large-scale cyberattack executed without substantial human intervention, leveraging agentic AI capabilities, tool access via MCP, and advanced coding skills. The company banned identified accounts, notified affected entities, coordinated with authorities, and is expanding detection classifiers and publishing the report to aid industry and government defenses.

Frontier Model Releases AI Safety Research Chinese state-sponsored threat actor Claude Claude Code +4 more

8Anthropic News·17d ago·source ↗

Anthropic maps 832 AI-enabled cyberattacks, finds MITRE ATT&CK framework inadequate for agentic threats

Anthropic's Frontier Red Team analyzed 832 accounts banned for malicious cyber activity between March 2025 and March 2026, mapping their techniques against the MITRE ATT&CK framework. Key findings: medium-or-higher-risk actors grew from 33% to 56% across the study period; AI use is shifting from initial-access techniques toward post-compromise operations like lateral movement and privilege escalation; and traditional risk signals (technique count, platform used) no longer reliably distinguish threat levels. The report concludes that MITRE ATT&CK lacks coverage for agentic orchestration behaviors—where AI chains attack stages autonomously with minimal human input—which characterize the highest-risk actors, including a state-sponsored espionage operation disrupted in November 2025.

AI Safety Research Regulatory Developments Anthropic Policy Frontier Red Team MITRE ATT&CK Claude Code +3 more

5Openai Blog·1mo ago·source ↗

Understanding prompt injections: a frontier security challenge

OpenAI has published a blog post addressing prompt injection attacks as a key security challenge for AI systems. The post covers how these attacks work and outlines OpenAI's multi-pronged approach including research, model training improvements, and safeguard development. This signals OpenAI's formal positioning on agentic security threats as their models are increasingly deployed in tool-using and autonomous contexts.

AI Safety Research Agent and Tool Ecosystem prompt injection OpenAI

6Openai Blog·1mo ago·source ↗

Disrupting a Covert Iranian Influence Operation

OpenAI reports identifying and disrupting a covert Iranian influence operation that was using its AI models to generate content for political disinformation campaigns. The operation involved using ChatGPT to produce social media posts, articles, and other content intended to manipulate public opinion. OpenAI terminated the associated accounts and published details of the operation as part of its transparency efforts around AI misuse.

AI Safety Research Regulatory Developments Iran ChatGPT OpenAI