5·OpenAI Blog·1mo ago

OpenAI, Georgetown CSET, and Stanford Internet Observatory Publish LLM Disinformation Misuse Report

OpenAI researchers collaborated with Georgetown University's Center for Security and Emerging Technology (CSET) and Stanford Internet Observatory to produce a report on how large language models could be misused to augment disinformation campaigns. The work draws on an October 2021 workshop with 30 experts across disinformation research, ML, and policy, plus over a year of additional research. The report outlines threat models for LLM-enabled disinformation and proposes a framework for analyzing potential mitigations.

AI Safety Research Regulatory Developments large language models Stanford Internet Observatory Georgetown University Center for Security and Emerging Technology disinformation OpenAI

Related guides (3)

OpenAI

OpenAI: The Lab That Made AI a Household Word

AI Safety Research·Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Regulatory Developments·Topic guide

AI Regulatory Developments: From Voluntary Frameworks to Government Enforcement

Related events (8)

7·OpenAI Blog·10d ago

OpenAI report: PRC-linked influence operations targeting U.S. AI debates

OpenAI published a report documenting PRC-linked influence operations that use AI to target U.S. technology policy debates, including narratives around data centers, tariffs, and false claims about ChatGPT. The report identifies a pattern of coordinated inauthentic behavior aimed at shaping American discourse on AI. This is notable both as a safety/threat-intelligence disclosure and as evidence of AI being weaponized in geopolitical information operations.

AI Safety Research Regulatory Developments ChatGPT OpenAI People's Republic of China

5·OpenAI Blog·1mo ago

Lessons learned on language model safety and misuse

OpenAI published a post summarizing its evolving thinking on language model safety and misuse in deployed systems. The piece is intended to share lessons with other AI developers facing similar challenges. It covers OpenAI's internal approaches to mitigating harmful outputs and the misuse patterns it has observed in production.

AI Safety Research Enterprise Deployment Patterns OpenAI

6·OpenAI Blog·1mo ago

Disrupting Malicious Uses of AI | OpenAI Threat Report February 2026

OpenAI published its latest threat report examining how malicious actors are combining AI models with websites and social platforms for harmful purposes. The report analyzes what these combined attack vectors imply for detection and defense. It is part of OpenAI's ongoing effort to document and counter adversarial misuse of AI systems.

Evaluation and Benchmarking AI Safety Research OpenAI

5·OpenAI Blog·1mo ago

Disrupting Malicious Uses of AI: OpenAI June 2025 Report

OpenAI published its June 2025 report on detecting and preventing malicious uses of its AI systems. The report features case studies of threat actors attempting to abuse OpenAI's models and the countermeasures deployed. This is part of OpenAI's ongoing transparency series on adversarial misuse.

AI Safety Research Regulatory Developments OpenAI

5·Hacker News·23d ago

Disagreement among frontier LLMs on real-world fact-checks

A study examines how frontier large language models diverge in their responses to real-world fact-checking queries, surfacing systematic disagreements across models on factual claims. The work appears to benchmark multiple leading models against a set of verifiable facts, revealing inconsistencies that have implications for reliability and deployment. With 475 HN points and 333 comments, the piece has generated substantial community discussion. The findings are relevant to evaluation methodology, model calibration, and trust in AI-generated factual content.

Frontier Model Releases Evaluation and Benchmarking frontier LLMs lenz.io Hacker News

6·The Batch·28d ago

Google Study Shows LLM-Generated Malware Is Getting Harder to Track and Stop

A Google security report catalogs emerging LLM-enabled cyberattack techniques including morphing malware with mutation engines, logical-flaw discovery in code, and AI-directed obfuscation networks. The report was prompted in part by a real incident where hackers used an LLM to find a zero-day in a widely used web administration tool. Separately, the UK AI Security Institute found that Claude Mythos Preview and GPT-5.5 can reliably execute attacks expected to take humans 3 hours, up from earlier 1-hour benchmarks, with performance scaling further when token limits are relaxed. The findings suggest an accelerating gap between LLM offensive capability and conventional defensive tooling.

Frontier Model Releases Evaluation and Benchmarking Claude Opus 4.6 Google UK AI Security Institute +8 more

5·OpenAI Blog·1mo ago

Disrupting Malicious Uses of AI: OpenAI October 2025 Report

OpenAI published its October 2025 report on detecting and disrupting malicious uses of its AI systems. The report covers enforcement actions, policy violations, and efforts to counter real-world harms from misuse. This is part of OpenAI's ongoing transparency series documenting adversarial abuse patterns and mitigation responses.

AI Safety Research Regulatory Developments OpenAI

5·OpenAI Blog·1mo ago

Disrupting Malicious Uses of AI

OpenAI published a report on its efforts to detect and disrupt malicious uses of its AI systems. The post covers threat actor activity identified and terminated on OpenAI's platform, including influence operations, cyberattack assistance, and other adversarial uses. It represents OpenAI's ongoing transparency reporting on abuse cases and countermeasures.

AI Safety Research Regulatory Developments OpenAI