New AI classifier for indicating AI-written text
OpenAI launched a classifier designed to distinguish between AI-generated and human-written text. The tool was positioned as an aid for detecting content produced by large language models. OpenAI acknowledged limitations including unreliability on short texts and non-English content, and noted the classifier should not be used as a sole decision-making tool.
Related events (8)
SV-Detect: AI-generated text detection via steering vectors in representation space
SV-Detect proposes a method for detecting machine-generated text by extracting steering vectors from the hidden representations of a frozen language model, constructing layer-wise directions that separate human from AI-written text. A lightweight classifier trained on projection features achieves strong performance both in-distribution and under distribution shift across domains, source models, and editing attacks like polishing and rewriting. The approach reframes AI-text detection as a representation-space probing problem, with interpretation analyses showing the learned directions capture stylistic cues beyond surface features.
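The pipeline described above (per-layer directions, projection features, a lightweight classifier) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the steering vector is taken as the difference of class means per layer, the "hidden states" are synthetic Gaussian vectors standing in for a frozen model's activations, and all function names are hypothetical.

```python
import math
import random

def mean_vec(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def steering_vectors(human, ai):
    """Each sample is a list of per-layer vectors.
    Returns one unit-norm direction per layer (AI mean minus human mean).
    Assumption: difference of means; the paper's extraction may differ."""
    directions = []
    for layer in range(len(human[0])):
        mu_h = mean_vec([s[layer] for s in human])
        mu_a = mean_vec([s[layer] for s in ai])
        diff = [a - h for a, h in zip(mu_a, mu_h)]
        norm = math.sqrt(sum(d * d for d in diff)) or 1.0
        directions.append([d / norm for d in diff])
    return directions

def project(sample, directions):
    """One scalar feature per layer: hidden state dotted with that layer's direction."""
    return [sum(x * d for x, d in zip(vec, dirn))
            for vec, dirn in zip(sample, directions)]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(features, labels, lr=0.1, epochs=200):
    """Plain SGD logistic regression on the projection features."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def make_samples(rng, n, shift, num_layers=3, dim=8):
    """Synthetic stand-in for hidden states (hypothetical: 3 layers, 8 dims)."""
    return [[[rng.gauss(shift, 1.0) for _ in range(dim)]
             for _ in range(num_layers)] for _ in range(n)]

rng = random.Random(0)
human_train, ai_train = make_samples(rng, 40, 0.0), make_samples(rng, 40, 1.5)
directions = steering_vectors(human_train, ai_train)
X = [project(s, directions) for s in human_train + ai_train]
w, b = train_logreg(X, [0] * 40 + [1] * 40)

human_test, ai_test = make_samples(rng, 20, 0.0), make_samples(rng, 20, 1.5)
preds = [predict(w, b, project(s, directions)) for s in human_test + ai_test]
accuracy = sum(p == t for p, t in zip(preds, [0] * 20 + [1] * 20)) / 40
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

With real data, the per-layer vectors would come from a frozen language model's hidden states (for example, pooled over tokens), and only the small classifier on the projection features would be trained.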
OpAI-Bench: Benchmark for detecting AI text across progressive human-AI co-editing workflows
Researchers introduce OpAI-Bench, a benchmark for studying AI-text detection across progressive human-to-AI document revision workflows, covering document, sentence, token, and span granularities. Starting from human-written documents, the benchmark constructs nine sequentially revised versions per sample under five AI edit operations and varying AI coverage levels across four domains. A key finding is that mixed-authorship intermediate versions are often harder to detect than fully human or heavily AI-edited endpoints, a non-monotonic detection pattern absent from existing benchmarks. The work addresses a gap in AI-text detection research, as real-world documents increasingly result from iterative human-AI co-editing rather than pure generation.
AI-Written Critiques Help Humans Notice Flaws in Summaries
OpenAI trained critique-writing models to identify flaws in AI-generated summaries, finding that human evaluators catch significantly more errors when assisted by model-generated critiques. A key finding is that scale improves critique-writing ability more than summary-writing ability. The work is framed as a step toward using AI to assist human oversight of AI systems on difficult tasks, relevant to scalable oversight research.
Adversarial methodology improves detection of AI-generated social bot content
Researchers introduce an adversarial framework that simulates malicious actors impersonating real social media users to generate training data for AI-content detection. The approach produces a multilingual, cross-platform dataset of paired human and AI-generated messages. Models trained on this adversarial data significantly outperform existing content-based bot detection systems on out-of-distribution real-world data.
Multi-domain benchmark for detecting AI-generated text-rich images from GPT-Image-2
Researchers introduce a new benchmark of 8,602 images across six categories (commercial posters, infographics, academic posters, receipts, tables, UI screenshots) specifically for detecting AI-generated text-rich images produced by OpenAI's GPT-Image-2. Five zero-shot detectors are evaluated, revealing highly domain-dependent performance and severe sensitivity to JPEG compression even in the strongest conventional detector. A multimodal VLM is also explored as a detector, showing promise but limitations on structured formats. The work highlights a gap in existing benchmarks that focus on object-centric rather than text-layout-centric images.
SynthID Detector — a new portal to help identify AI-generated content
Google DeepMind announced SynthID Detector, a new web portal unveiled at Google I/O 2025 that allows users to check whether content was generated by AI. The tool extends the existing SynthID watermarking system, which embeds imperceptible signals into AI-generated text, images, audio, and video. The portal is intended to help people verify the provenance of online content at scale.
Image-Semantic Guided Detection of AI-Generated Modern Chinese Poetry Using MLLMs
This paper proposes a multimodal detection method for identifying AI-generated modern Chinese poetry by incorporating images that reflect poetic content alongside text. The approach uses example-driven prompting to integrate meaning, imagery, and emotional cues from images as a complement to textual analysis. A Gemini-based detector using this method achieves 85.65% Macro-F1, outperforming both plain-text LLM baselines and the traditional RoBERTa detector. The work extends AI-generated content detection research into modern Chinese poetry, a domain that prior studies had not addressed.
Introducing SynthID Text
Hugging Face published a blog post introducing SynthID Text, Google DeepMind's watermarking technique for AI-generated text. The method embeds imperceptible signals into LLM outputs by modifying token sampling distributions, enabling detection of AI-generated content without degrading text quality. The post likely covers integration with Hugging Face's transformers library, making the technique accessible to the broader ML community.
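The general idea of biasing token sampling with a secret key and later testing text for that bias can be shown with a toy example. The sketch below is a simple keyed "green-list" bias and is not SynthID Text's actual sampling scheme; the vocabulary, key, bias strength, uniform toy "model", and all function names are hypothetical.

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(50)]  # hypothetical toy vocabulary

def is_green(key, prev_token, token):
    """Keyed pseudo-random split: roughly half the vocabulary is 'green' per context."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def sample_token(rng, key, prev_token, bias):
    """Sample from a uniform toy 'model', adding `bias` to the logit of green tokens."""
    weights = [math.exp(bias) if is_green(key, prev_token, t) else 1.0 for t in VOCAB]
    return rng.choices(VOCAB, weights=weights, k=1)[0]

def generate(rng, key, length, bias):
    """Generate `length` tokens; bias=0.0 means no watermark."""
    tokens = ["<s>"]
    for _ in range(length):
        tokens.append(sample_token(rng, key, tokens[-1], bias))
    return tokens[1:]

def detect_z(key, tokens):
    """z-score of the green-token count against the ~50% expected without a watermark."""
    prev, green = "<s>", 0
    for t in tokens:
        green += is_green(key, prev, t)
        prev = t
    n = len(tokens)
    return (green - 0.5 * n) / math.sqrt(0.25 * n)

rng = random.Random(0)
watermarked = generate(rng, key="secret", length=200, bias=2.0)
plain = generate(rng, key="secret", length=200, bias=0.0)
print("watermarked z:", detect_z("secret", watermarked))
print("plain z:", detect_z("secret", plain))
```

A large positive z-score suggests the text was sampled with the keyed bias, while unwatermarked text should score near zero. Production schemes also have to preserve text quality under a real model's distribution, which this toy example does not attempt.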


