A new arXiv paper presents a large-scale empirical study of AI-generated non-consensual sexually explicit imagery (SNEACI) on 4chan, identifying 24,105 items. A key finding is a demographic shift: non-celebrity individuals now constitute 55.8% of targets, up from 4.7% in prior studies, indicating the harm has expanded from public figures to people in users' personal social circles. Open-source models dominate production, with Stable Diffusion generating 42.7% of images and Wan 66.5% of videos, enabled by thousands of shared fine-tuned models and tutorials. The study characterizes the community dynamics, finding a small cohort of prolific producers drives most content and lowers barriers for new participants.
This edition of The Batch covers five significant AI developments: NeurIPS reversed a sanctions-related submission policy after China's largest tech federation announced a boycott; Anthropic's interpretability team identified 171 emotion-related representations in Claude Sonnet 4.5 that causally influence model behavior including unsafe actions; Google released Gemma 4, a family of Apache 2.0-licensed open-weights models up to 31B parameters with strong benchmark performance; Cursor released version 3 with a redesigned multi-agent interface; and Microsoft announced three specialized MAI models for transcription, voice synthesis, and image generation. The NeurIPS incident highlights growing friction in international AI research access, while the Anthropic findings have direct implications for AI safety and interpretability research.
Researcher Abeba Birhane examines how large-scale web-scraped datasets used to train trillion-parameter NLP and vision models propagate bias and antisocial content. The commentary highlights that performance gains in deep neural networks come alongside inherited societal biases from web training data. Two posts from The Batch summarize her work on cleaning up web datasets and the specific mechanisms by which NLP models absorb web-sourced biases.
HUMAN Security's 2026 State of AI Traffic and Cyberthreat Benchmark Report, based on over 1 quadrillion internet interactions, found that AI-driven traffic nearly tripled in 2025, with agentic browser-style traffic growing ~80x year-over-year (though it still made up only 1.7% of AI-driven traffic by December). OpenAI accounted for ~69% of automated traffic, Meta 16%, and Anthropic 11%. The report also flags a 47% rise in malicious scraping and new security challenges as legitimate AI agents increasingly mimic historically suspicious bot behaviors such as creating accounts and completing transactions.
Researchers propose a method to measure the degree of 'templated' versus 'holistic' cultural localization in AI-generated stories, finding that only 9-17% of vocabulary accounts for cross-national variation and that a shared culturally agnostic narrative template underlies most outputs. The study evaluates five models across 125 topics and 193 nationalities. A notable finding is that cultural markers associated with 19 countries, mostly in the Global South, are rated as offensive on average, raising concerns about bias and representation in multilingual and multicultural AI content generation.
This paper demonstrates empirically that LLMs from multiple model families introduce directional biases when editing human-written texts on contested topics (e.g., nudging toward gun control, against atheism). The authors develop a mathematical opinion-dynamics model showing these biases are amplified through social networks, shifting collective opinion at scale. An audit of X's 'Explain this post' feature finds evidence of pro-life bias in Grok's outputs on abortion content, traced to specific design choices. The paper concludes with implications for EU legislative efforts on AI-mediated communication.
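The amplification claim can be illustrated with a toy simulation. This is a minimal sketch and not the paper's model: it assumes DeGroot-style averaging on a four-agent ring, opinions in [-1, 1], and a constant shift `edit_bias` applied to every message before it is read. The names `step`, `simulate`, `RING_WEIGHTS`, and `edit_bias` are hypothetical.

```python
# Toy opinion dynamics: DeGroot-style averaging with a constant edit bias.
# Assumption: every message is nudged by `edit_bias` before anyone reads it.

def step(opinions, weights, edit_bias):
    """One round: each agent averages the edited opinions it reads."""
    # Apply the (hypothetical) editing bias, keeping opinions in [-1, 1].
    edited = [min(1.0, max(-1.0, x + edit_bias)) for x in opinions]
    # Each agent's new opinion is a weighted average over its row of weights.
    return [sum(w * e for w, e in zip(row, edited)) for row in weights]


def simulate(opinions, weights, edit_bias, rounds):
    """Run `rounds` rounds of biased averaging and return the final opinions."""
    for _ in range(rounds):
        opinions = step(opinions, weights, edit_bias)
    return opinions


def mean(values):
    return sum(values) / len(values)


# Ring of four agents: weight 0.5 on self, 0.25 on each neighbor.
# Rows and columns both sum to 1, so plain averaging preserves the mean.
RING_WEIGHTS = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
START = [-0.5, -0.1, 0.1, 0.5]  # balanced starting opinions, mean 0

unbiased = simulate(START, RING_WEIGHTS, edit_bias=0.0, rounds=10)
biased = simulate(START, RING_WEIGHTS, edit_bias=0.02, rounds=10)
print(f"mean without bias: {mean(unbiased):+.3f}")
print(f"mean with bias:    {mean(biased):+.3f}")
```

With these weights, unbiased averaging leaves the population mean unchanged, while a small per-message shift accumulates round after round until clipping takes effect. The sketch shows why a slight directional bias in edits can matter at network scale; it says nothing about the magnitudes measured in the paper.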
Researchers introduce an adversarial framework that simulates malicious actors impersonating real social media users to generate training data for AI-content detection. The approach produces a multilingual, cross-platform dataset of paired human and AI-generated messages. Models trained on this adversarial data significantly outperform existing content-based bot detection systems on out-of-distribution real-world data.
Protect AI and Hugging Face report on six months of collaborative model security scanning, having scanned 4 million models on the Hub for malicious payloads and vulnerabilities. The partnership focuses on supply-chain security for open-weight models, detecting threats like pickle exploits and unsafe serialization formats. The post provides a retrospective on findings, scale, and tooling developed over the period.
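To make the pickle risk concrete, here is a minimal sketch of a static check that flags pickle opcodes able to import or call objects during loading. It is an illustration only, not the scanner Protect AI or Hugging Face uses. The names `SUSPICIOUS_OPCODES`, `suspicious_opcodes`, and `Payload` are hypothetical, and the opcode list is an assumption about what is worth flagging.

```python
import pickle
import pickletools

# Opcodes that import names or call objects while a pickle is being loaded.
# Assumption: this short list is illustrative, not an exhaustive policy.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def suspicious_opcodes(data: bytes) -> list[str]:
    """List import/call opcodes in a pickle stream without unpickling it."""
    found = []
    # genops only parses the byte stream; it never executes the payload.
    for opcode, _arg, _pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS_OPCODES:
            found.append(opcode.name)
    return found


class Payload:
    """Stand-in for a malicious object: unpickling it would call print()."""

    def __reduce__(self):
        # A real exploit would reference something like os.system here.
        return (print, ("this would run at load time",))


plain = pickle.dumps({"weights": [1.0, 2.0]})
crafted = pickle.dumps(Payload())  # dumping is safe; only loading runs the call

print(suspicious_opcodes(plain))
print(suspicious_opcodes(crafted))
```

Plain containers of numbers and strings need none of these opcodes, whereas any object that defines `__reduce__` does. Legitimate model checkpoints also use them to rebuild tensors and classes, so a practical scanner would additionally inspect which modules and names are imported rather than flagging the opcodes alone.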
Hugging Face's Ethics and Society team has published its fourth newsletter, which focuses on bias in text-to-image generative models. The piece examines how these models encode and reproduce societal biases in visual outputs, likely covering evaluation methods, documented failure modes, and mitigation approaches. As a Tier 2 commentary piece from a major ML platform, it contributes to the ongoing discourse on fairness and safety in multimodal AI systems.