6·Simon Willison's Weblog·25d ago

Microsoft Copilot Cowork Exfiltrates Files

Simon Willison reports on a security vulnerability in Microsoft Copilot Cowork that allows files to be exfiltrated. The item appears to document a prompt injection or data exfiltration attack vector in Microsoft's AI-powered collaboration tooling, which makes it relevant to AI safety and to the enterprise deployment risks of agentic AI assistants.

AI Safety Research · Enterprise Deployment Patterns · Agent and Tool Ecosystem · Microsoft Copilot · prompt injection · Microsoft · Simon Willison

Related guides (4)

prompt injection · Concept

Prompt Injection: The Security Threat Hiding in Plain Text

Microsoft

Microsoft: The AI Infrastructure Giant Betting on Every Horse

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Enterprise Deployment Patterns · Topic guide

Enterprise Deployment Patterns: From LLM Demo to Production Reality

Related events (8)

6·Simon Willison's Weblog·19d ago

Hackers Simply Asked Meta AI to Give Them Access to High-Profile Instagram Accounts. It Worked

Simon Willison comments on a reported incident in which attackers successfully used Meta AI to gain unauthorized access to high-profile Instagram accounts through social engineering or prompt-based manipulation. The case illustrates real-world exploitation of AI assistant systems deployed in consumer products. This is a concrete deployment security failure with implications for how AI assistants handle privileged account actions.

AI Safety Research · Enterprise Deployment Patterns · Instagram · Meta AI · Simon Willison · +2 more

6·The Batch·8d ago

Andrew Ng introduces OpenCoworker, an open-source desktop AI agent harness

Andrew Ng and collaborators Rohit Prasad and Devika Verma have released OpenCoworker, a free open-source desktop agent built by extending the aisuite library to support agent harnesses. The tool allows users to connect frontier LLMs (OpenAI, Anthropic, Google) or local models via Ollama to desktop tasks including file access, messaging, and workflow automation, with privacy as a design priority. Ng frames this as a response to data-retention concerns with commercial desktop agents, citing Anthropic's Fable release as a recent example of policy opacity. The post also provides a concise overview of the current desktop agent landscape and the shift toward LLM-driven agentic loops.

Open Weights Progress · Agent and Tool Ecosystem · Ollama · DeepLearning.AI · aisuite · +7 more

7·The Batch·16d ago

Microsoft Build: Seven in-house AI models, GitHub Copilot desktop agent manager, and Web IQ search API for agents

Microsoft announced seven new AI models trained from scratch (not distilled from OpenAI), including the flagship MAI-Thinking-1 reasoning model and MAI-Transcribe-1.5, plus a 'Frontier Tuning' reinforcement learning approach for enterprise workflow training. GitHub released a desktop Copilot app designed to manage multiple parallel AI agents with isolated git worktrees and bidirectional canvases. Microsoft also launched Web IQ, an agent-native Bing-powered grounding API already powering search in Copilot and ChatGPT, running 2.5x faster than alternatives with lower token costs. The roundup also covers Nous Research's Hermes Desktop cross-platform agent app, Alibaba's Qwen3.7-Plus multimodal model, and OpenAI's role-specific Codex plugins.

Frontier Model Releases · Inference Economics · MAI-Thinking-1 · FLEURS · Frontier Tuning · +15 more

5·GitHub Trending·17d ago

HexStrike AI: MCP server exposing 150+ cybersecurity tools to AI agents

HexStrike AI is an open-source MCP server that enables AI agents (Claude, GPT, Copilot, and others) to autonomously invoke over 150 offensive security tools for penetration testing, vulnerability discovery, and bug bounty automation. The project bridges LLMs with real-world offensive security capabilities via the Model Context Protocol. Its 9,221 GitHub stars signal community interest in agentic security tooling and point to the expanding attack surface of AI-driven automation.

AI Safety Research · Agent and Tool Ecosystem · Claude · HexStrike AI · MCP · +1 more

6·OpenAI Blog·1mo ago

Codex Security: now in research preview

OpenAI has launched Codex Security in research preview, an AI-powered application security agent. It analyzes project context to detect, validate, and patch complex vulnerabilities with the goal of higher confidence and reduced false-positive noise compared to traditional tools. The product extends OpenAI's Codex brand into the security domain.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · OpenAI · Codex Security · Codex

5·GitHub Trending·29d ago

Microsoft Agent Governance Toolkit: Policy Enforcement and Zero-Trust Security for Autonomous AI Agents

Microsoft has published an open-source Agent Governance Toolkit on GitHub covering policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents. The toolkit claims full coverage of the OWASP Agentic Top 10 security risks. It has accumulated 1,828 stars with 113 added today, indicating active community interest. This positions Microsoft as a contributor to emerging standards for safe agentic AI deployment.

AI Safety Research · Enterprise Deployment Patterns · execution sandboxing · zero-trust identity · Microsoft · +3 more

4·GitHub Trending·22d ago

Deep Eye: Multi-Provider AI-Orchestrated Vulnerability Scanner

Deep Eye is an open-source Python tool that orchestrates multiple AI providers (OpenAI, Claude, Grok, Gemini, Ollama, Groq, Mistral, and others) to generate attack payloads and scan targets for 45+ vulnerability types. It produces professional security reports with compliance mapping. The project has accumulated 1,572 GitHub stars with 42 added today, indicating growing community interest in AI-augmented offensive security tooling.

AI Safety Research · Agent and Tool Ecosystem · Ollama · Grok · zakirkun · +5 more

7·Anthropic News·19d ago

Anthropic Launches Claude Code Security: AI-Powered Vulnerability Detection for Defenders

Anthropic has released Claude Code Security, a capability built into Claude Code that scans codebases for security vulnerabilities and suggests patches for human review, in limited research preview for Enterprise and Team customers. Unlike rule-based static analysis tools, it uses Claude's reasoning to understand code context, trace data flows, and detect complex vulnerabilities, including novel ones. Built on Claude Opus 4.6, the system found over 500 previously undetected vulnerabilities in production open-source codebases during internal research. The release is framed as a defensive measure that puts AI-enabled vulnerability discovery in the hands of defenders before attackers can exploit the same capabilities.

Frontier Model Releases AI Safety Research Claude Opus 4.6 Anthropic Policy Frontier Red Team Pacific Northwest National Laboratory +5 more