OpenAI, Georgetown CSET, and Stanford Internet Observatory Publish LLM Disinformation Misuse Report
OpenAI researchers collaborated with Georgetown University's Center for Security and Emerging Technology (CSET) and the Stanford Internet Observatory to produce a report on how large language models could be misused to augment disinformation campaigns. The work draws on an October 2021 workshop with 30 experts across disinformation research, ML, and policy, plus more than a year of additional research. The report outlines threat models for LLM-enabled disinformation and proposes a framework for analyzing potential mitigations.
Related events (8)
OpenAI report: PRC-linked influence operations targeting U.S. AI debates
OpenAI published a report documenting PRC-linked influence operations that use AI to target U.S. technology policy debates, including narratives around data centers, tariffs, and false claims about ChatGPT. The report identifies a pattern of coordinated inauthentic behavior aimed at shaping American discourse on AI. This is notable both as a safety/threat-intelligence disclosure and as evidence of AI being weaponized in geopolitical information operations.
Lessons learned on language model safety and misuse
OpenAI published a post summarizing its evolving thinking on language model safety and misuse in deployed systems. The piece is intended to share lessons with other AI developers facing similar challenges. It covers OpenAI's internal approaches to mitigating harmful outputs and the misuse patterns observed in production.
Disrupting Malicious Uses of AI | OpenAI Threat Report February 2026
OpenAI published its latest threat report examining how malicious actors are combining AI models with websites and social platforms for harmful purposes. The report analyzes detection and defense implications of these combined attack vectors. This represents OpenAI's ongoing effort to document and counter adversarial misuse of AI systems.
Disrupting Malicious Uses of AI: OpenAI June 2025 Report
OpenAI published its June 2025 report on detecting and preventing malicious uses of its AI systems. The report features case studies of threat actors attempting to abuse OpenAI's models and the countermeasures deployed. This is part of OpenAI's ongoing transparency series on adversarial misuse.
Disagreement among frontier LLMs on real-world fact-checks
A study examines how frontier large language models diverge in their responses to real-world fact-checking queries, surfacing systematic disagreements across models on factual claims. The work appears to benchmark multiple leading models against a set of verifiable facts, revealing inconsistencies that have implications for reliability and deployment. With 475 Hacker News points and 333 comments, the piece has generated substantial community discussion. The findings are relevant to evaluation methodology, model calibration, and trust in AI-generated factual content.
Google Study Shows LLM-Generated Malware Is Getting Harder to Track and Stop
A Google security report catalogs emerging LLM-enabled cyberattack techniques including morphing malware with mutation engines, logical-flaw discovery in code, and AI-directed obfuscation networks. The report was prompted in part by a real incident where hackers used an LLM to find a zero-day in a widely used web administration tool. Separately, the UK AI Security Institute found that Claude Mythos Preview and GPT-5.5 can reliably execute attacks expected to take humans 3 hours, up from earlier 1-hour benchmarks, with performance scaling further when token limits are relaxed. The findings suggest an accelerating gap between LLM offensive capability and conventional defensive tooling.
Disrupting Malicious Uses of AI: OpenAI October 2025 Report
OpenAI published its October 2025 report on detecting and disrupting malicious uses of its AI systems. The report covers enforcement actions, policy violations, and efforts to counter real-world harms from misuse. This is part of OpenAI's ongoing transparency series documenting adversarial abuse patterns and mitigation responses.
Disrupting Malicious Uses of AI
OpenAI published a report on its efforts to detect and disrupt malicious uses of its AI systems. The post covers threat actor activity identified and terminated on OpenAI's platform, including influence operations, cyberattack assistance, and other adversarial uses. It represents OpenAI's ongoing transparency reporting on abuse cases and countermeasures.


