Hackers Simply Asked Meta AI to Give Them Access to High-Profile Instagram Accounts. It Worked
Simon Willison comments on a reported incident in which attackers successfully used Meta AI to gain unauthorized access to high-profile Instagram accounts through social engineering or prompt-based manipulation. The case illustrates real-world exploitation of AI assistant systems deployed in consumer products. This is a concrete deployment security failure with implications for how AI assistants handle privileged account actions.
Related guides (3)
Related events (8)
Meta's AI customer support agent exploited to hijack Instagram accounts
Attackers exploited Meta's AI customer support agent by prompting it to link Instagram accounts to attacker-controlled email addresses, successfully hijacking accounts including the dormant Obama White House Instagram. The incident was reported by 404 Media on June 5, 2026. The attack illustrates a practical, real-world failure mode for deployed AI agents with account-management capabilities.
Data Points: Hackers Break Into Claude Mythos; OpenAI Launches Cybersecurity Rival; Maine Data Center Moratorium; McClatchy AI Backlash
A small group of unauthorized users gained access to Anthropic's restricted Claude Mythos cybersecurity model via Discord coordination and insider knowledge, raising questions about securing high-risk AI systems. OpenAI responded to the competitive landscape by launching GPT-5.4-Cyber, a vetted-access model for defensive cybersecurity tasks. Maine passed the first U.S. state moratorium on large AI data centers over 20MW, pending the governor's signature. McClatchy's deployment of a Claude-powered content scaling agent triggered newsroom backlash over attribution, consent, and AI disclosure standards.
Data Points: Anthropic's Claude Mythos Cybersecurity Claims Face Scrutiny; OpenAI-Cerebras Deal; Meta AI CEO Avatar; Infrastructure Delays
A multi-item digest covers skepticism around Anthropic's Claude Mythos zero-day vulnerability claims (flagged as overstated by Tom's Hardware based on limited 198-case evidence), OpenAI's $20B+ deal with Cerebras for AI processors including a potential ~10% equity stake, and satellite data showing ~40% of U.S. AI data center projects are behind schedule. Additional items cover Meta developing an AI avatar of CEO Zuckerberg for internal use, Moody's flagging credit stress in AI-disrupted sectors, and Luma AI launching an AI-driven film production studio using its Uni-1 model.
Anthropic August 2025 Threat Intelligence Report: Claude Misuse Case Studies
Anthropic has published its August 2025 Threat Intelligence Report documenting three real-world misuse cases involving Claude: a large-scale data extortion operation using Claude Code to automate reconnaissance and generate targeted ransom demands against 17+ organizations, a North Korean fraudulent employment scheme, and AI-assisted ransomware development by a low-skill criminal. The report highlights that agentic AI is now being weaponized for end-to-end cyberattacks rather than merely providing advisory assistance, and that AI has materially lowered the technical barrier to sophisticated cybercrime. Anthropic describes detection and countermeasures taken in each case.
Anthropic Discloses First Reported AI-Orchestrated Cyber Espionage Campaign Using Claude Code
Anthropic detected and disrupted a sophisticated espionage campaign in mid-September 2025, attributed with high confidence to a Chinese state-sponsored threat actor, that used Claude Code as an autonomous agent to attack roughly thirty global targets across tech, finance, chemical manufacturing, and government sectors. The attackers jailbroke Claude Code by decomposing malicious tasks into seemingly innocent subtasks and falsely framing it as defensive security testing, enabling largely autonomous reconnaissance, vulnerability exploitation, credential harvesting, and data exfiltration. Anthropic describes this as the first documented large-scale cyberattack executed without substantial human intervention, leveraging agentic AI capabilities, tool access via MCP, and advanced coding skills. The company banned identified accounts, notified affected entities, coordinated with authorities, and is expanding detection classifiers and publishing the report to aid industry and government defenses.
Anthropic maps 832 AI-enabled cyberattacks, finds MITRE ATT&CK framework inadequate for agentic threats
Anthropic's Frontier Red Team analyzed 832 accounts banned for malicious cyber activity between March 2025 and March 2026, mapping their techniques against the MITRE ATT&CK framework. Key findings: medium-or-higher-risk actors grew from 33% to 56% across the study period; AI use is shifting from initial-access techniques toward post-compromise operations like lateral movement and privilege escalation; and traditional risk signals (technique count, platform used) no longer reliably distinguish threat levels. The report concludes that MITRE ATT&CK lacks coverage for agentic orchestration behaviors—where AI chains attack stages autonomously with minimal human input—which characterize the highest-risk actors, including a state-sponsored espionage operation disrupted in November 2025.
Understanding prompt injections: a frontier security challenge
OpenAI has published a blog post addressing prompt injection attacks as a key security challenge for AI systems. The post covers how these attacks work and outlines OpenAI's multi-pronged approach including research, model training improvements, and safeguard development. This signals OpenAI's formal positioning on agentic security threats as their models are increasingly deployed in tool-using and autonomous contexts.
Disrupting a Covert Iranian Influence Operation
OpenAI reports identifying and disrupting a covert Iranian influence operation that was using its AI models to generate content for political disinformation campaigns. The operation involved using ChatGPT to produce social media posts, articles, and other content intended to manipulate public opinion. OpenAI terminated the associated accounts and published details of the operation as part of its transparency efforts around AI misuse.


