6 · arXiv cs.AI (Artificial Intelligence) · Jun 26, 2026

Large-scale study of AI nudification on 4chan finds majority of targets are now non-celebrities

A new arXiv paper presents a large-scale empirical study of AI-generated non-consensual sexually explicit imagery (SNEACI) on 4chan, identifying 24,105 items. A key finding is a demographic shift: non-celebrity individuals now constitute 55.8% of targets, up from 4.7% in prior studies, indicating the harm has expanded from public figures to people in users' personal social circles. Open-source models dominate production, with Stable Diffusion generating 42.7% of images and Wan 66.5% of videos, enabled by thousands of shared fine-tuned models and tutorials. The study characterizes the community dynamics, finding a small cohort of prolific producers drives most content and lowers barriers for new participants.

Open Weights Progress · AI Safety Research · 4chan · From Celebrities to Anyone: Characterizing AI Nudification Content, Technology, and Community Dynamics on 4chan · Stable Diffusion 3 · Wan

Related guides (2)

AI Safety Research · Topic guide

AI Safety Research: From Lab Principles to Real-World Flashpoints

Open Weights Progress · Topic guide

Open Weights Progress: How Free AI Models Caught Up to the Frontier

Related events (8)

6 · The Batch · Jun 1, 2026

Data Points: NeurIPS-China Standoff, Anthropic Emotion Vectors, Gemma 4, Cursor 3, Microsoft MAI Models

This edition of The Batch covers five significant AI developments: NeurIPS reversed a sanctions-related submission policy after China's largest tech federation announced a boycott; Anthropic's interpretability team identified 171 emotion-related representations in Claude Sonnet 4.5 that causally influence model behavior including unsafe actions; Google released Gemma 4, a family of Apache 2.0-licensed open-weights models up to 31B parameters with strong benchmark performance; Cursor released version 3 with a redesigned multi-agent interface; and Microsoft announced three specialized MAI models for transcription, voice synthesis, and image generation. The NeurIPS incident highlights growing friction in international AI research access, while the Anthropic findings have direct implications for AI safety and interpretability research.

Frontier Model Releases · Open Weights Progress · FLEURS · NeurIPS · WPP

4 · The Batch · May 18, 2026

Abeba Birhane on Bias in Web-Scraped Training Datasets

Researcher Abeba Birhane examines how large-scale web-scraped datasets used to train trillion-parameter NLP and vision models propagate bias and antisocial content. The commentary highlights that performance gains in deep neural networks come alongside inherited societal biases from web training data. Two posts from The Batch summarize her work on cleaning up web datasets and the specific mechanisms by which NLP models absorb web-sourced biases.

Evaluation and Benchmarking · AI Safety Research · DeepLearning.AI · Abeba Birhane · The Batch

6 · The Batch · May 29, 2026

Internet Traffic Driven By AI Tripled Last Year, Study Shows

Human Security's 2026 State of AI Traffic and Cyberthreat Benchmark Report, based on over 1 quadrillion internet interactions, found AI-driven traffic nearly tripled in 2025, with agentic browser-style traffic growing ~80x year-over-year (though still only 1.7% of AI-driven traffic by December). OpenAI accounted for ~69% of automated traffic, Meta 16%, and Anthropic 11%. The report also flags a 47% rise in malicious scraping and new security challenges as legitimate AI agents increasingly mimic historically suspicious bot behaviors like account creation and transaction completion.

Training Infrastructure · Inference Economics · ChatGPT · Human Security · OAI-SearchBot

5 · arXiv cs.CL · Jun 15, 2026

Study finds AI-generated stories rely on superficial cultural markers rather than holistic localization

Researchers propose a method to measure the degree of 'templated' versus 'holistic' cultural localization in AI-generated stories, finding that only 9-17% of vocabulary accounts for cross-national variation and that a shared culturally-agnostic narrative template underlies most outputs. The study evaluates five models across 125 topics and 193 nationalities. A notable finding is that cultural markers associated with 19 countries—mostly in the Global South—are rated as offensive on average, raising concerns about bias and representation in multilingual/multicultural AI content generation.

Evaluation and Benchmarking · AI Safety Research · Characterizing Cultural Localization in AI-Generated Stories

7 · arXiv cs.LG · May 18, 2026

AI-Mediated Communication Can Steer Collective Opinion via LLM Editing Biases

This paper demonstrates empirically that LLMs from multiple model families introduce directional biases when editing human-written texts on contested topics (e.g., nudging toward gun control, against atheism). The authors develop a mathematical opinion-dynamics model showing these biases are amplified through social networks, shifting collective opinion at scale. An audit of X's 'Explain this post' feature finds evidence of pro-life bias in Grok's outputs on abortion content, traced to specific design choices. The paper concludes with implications for EU legislative efforts on AI-mediated communication.

Evaluation and Benchmarking · AI Safety Research · Grok · X (Twitter) · EU AI Act

5 · arXiv cs.CL · Jun 8, 2026

Adversarial methodology improves detection of AI-generated social bot content

Researchers introduce an adversarial framework that simulates malicious actors impersonating real social media users to generate training data for AI-content detection. The approach produces a multilingual, cross-platform dataset of paired human and AI-generated messages. Models trained on this adversarial data significantly outperform existing content-based bot detection systems on out-of-distribution real-world data.

Evaluation and Benchmarking · AI Safety Research · Adversarial Creation and Detection of AI-Generated Social Bot Content

5 · Hugging Face Blog · May 19, 2026

4M Models Scanned: Protect AI + Hugging Face 6 Months In

Protect AI and Hugging Face report on six months of collaborative model security scanning, having scanned 4 million models on the Hub for malicious payloads and vulnerabilities. The partnership focuses on supply-chain security for open-weight models, detecting threats like pickle exploits and unsafe serialization formats. The post provides a retrospective on findings, scale, and tooling developed over the period.

Open Weights Progress · AI Safety Research · pickle exploit · Protect AI · Hugging Face

4 · Hugging Face Blog · May 19, 2026

Ethics and Society Newsletter #4: Bias in Text-to-Image Models

Hugging Face's Ethics and Society team publishes their fourth newsletter focusing on bias in text-to-image generative models. The piece examines how these models encode and reproduce societal biases in visual outputs, likely covering evaluation methods, documented failure modes, and mitigation approaches. As a Tier 2 commentary piece from a major ML platform, it contributes to ongoing discourse around fairness and safety in multimodal AI systems.

Evaluation and Benchmarking · AI Safety Research · Hugging Face Ethics and Society Team · text-to-image models · Hugging Face
