DeepMind announces DiffusionGemma with 4x faster text generation
DeepMind published a blog post introducing DiffusionGemma, a diffusion-based variant of the Gemma model family that it claims generates text 4x faster. Standard autoregressive decoding produces one token at a time; DiffusionGemma instead generates text through a diffusion process. If the claims hold, this could be a meaningful inference-efficiency advance for the Gemma line.
Related events (8)
Simon Willison on DiffusionGemma
Simon Willison covers DiffusionGemma, a diffusion-based language model in the Gemma family from Google. The post appears to be commentary or a brief note on the model's release or capabilities. Diffusion-based LLMs represent an active area of research as an alternative to autoregressive generation.
Interpretability study of DiffusionGemma reveals novel diffusion-specific reasoning phenomena
Researchers investigate the reasoning transparency of DiffusionGemma, a diffusion-based language model, decomposing transparency into variable and algorithmic components. They show that mapping information through an interpretable token bottleneck reduces DiffusionGemma's opaque serial depth from 28.6x to just 1.1x that of autoregressive Gemma 4, with no performance loss. Interpretability case studies uncover diffusion-specific phenomena including non-chronological reasoning, token smearing, and intermediate-context reasoning. Monitorability tests find DiffusionGemma comparable to Gemma 4, suggesting diffusion LMs are not inherently less amenable to safety oversight.
DiffusionGemma hits 1,000+ tokens/sec; Claude Fable 5 export controls; Agents' Last Exam benchmark launch
Google introduced DiffusionGemma, an experimental 26B MoE model using diffusion-based text generation that produces 256-token blocks simultaneously, achieving over 1,000 tokens/second on H100 hardware at the cost of lower output quality versus standard Gemma 4. Separately, the US government issued an export control directive forcing Anthropic to suspend Claude Fable 5 and Claude Mythos 5 globally, while Anthropic also reversed a controversial silent-degradation safeguard on Fable 5 after researcher backlash. UC Berkeley's Center for RDI launched Agents' Last Exam (ALE), a 1,500+ task agentic benchmark using deterministic grading, where GPT-5.5 topped the leaderboard at only 24% pass rate, highlighting the difficulty gap between current models and professional-grade workflows.
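The throughput claim rests on filling many positions of a block per forward pass instead of one token per pass. The toy sketch below illustrates generic confidence-based masked-diffusion decoding and counts sequential forward passes. It is not DiffusionGemma's actual sampler, whose schedule these summaries do not describe: `toy_model` is a hypothetical stand-in for a denoiser that returns random tokens and confidences, and the step count of 32 is an assumed value chosen only for illustration.

```python
import random

MASK = None  # placeholder for a position that has not been decoded yet


def toy_model(block, rng):
    """Hypothetical stand-in for a denoiser: for every masked position,
    propose a token id and a confidence score. A real model would run one
    forward pass over the whole block here."""
    proposals = {}
    for i, tok in enumerate(block):
        if tok is MASK:
            proposals[i] = (rng.randrange(1000), rng.random())
    return proposals


def diffusion_decode(block_size, steps, seed=0):
    """Fill a block of `block_size` masked positions using roughly `steps`
    parallel passes, committing the most confident proposals each pass."""
    rng = random.Random(seed)
    block = [MASK] * block_size
    passes = 0
    per_step = -(-block_size // steps)  # ceiling division: tokens committed per pass
    while any(tok is MASK for tok in block):
        proposals = toy_model(block, rng)
        passes += 1
        # Keep the highest-confidence proposals; the rest stay masked for the next pass.
        ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for i, (tok, _conf) in ranked[:per_step]:
            block[i] = tok
    return block, passes


def autoregressive_decode(length, seed=0):
    """Baseline: one forward pass per generated token, strictly left to right."""
    rng = random.Random(seed)
    tokens = []
    passes = 0
    for _ in range(length):
        tokens.append(rng.randrange(1000))
        passes += 1
    return tokens, passes


if __name__ == "__main__":
    block, d_passes = diffusion_decode(block_size=256, steps=32)
    _, ar_passes = autoregressive_decode(length=256)
    print(f"diffusion: {d_passes} passes for {len(block)} tokens")
    print(f"autoregressive: {ar_passes} passes for 256 tokens")
```

With a 256-token block and the assumed 32 steps, the toy commits 8 tokens per pass and finishes in 32 passes, versus 256 passes for left-to-right decoding. Real throughput also depends on the cost of each pass and on how output quality degrades as fewer steps are used, which is the trade-off the summary above notes relative to standard Gemma 4.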
T5Gemma: A new collection of encoder-decoder Gemma models
DeepMind has announced T5Gemma, a new collection of encoder-decoder large language models under the Gemma family. The release extends the Gemma model line beyond its existing decoder-only architecture to include encoder-decoder variants, following the T5 paradigm. The announcement gives few further technical details, but the models represent a notable architectural expansion of the open Gemma ecosystem.
Introducing Gemma 3 270M: The compact model for hyper-efficient AI
Google DeepMind has released Gemma 3 270M, a 270-million parameter compact language model added to the Gemma 3 family. The model is positioned as a highly specialized, hyper-efficient tool for resource-constrained deployments. This extends the Gemma 3 lineup into the sub-billion parameter range, targeting edge and on-device use cases.
Introducing Gemma 3
Google DeepMind has released Gemma 3, described as the most capable model that can run on a single GPU or TPU. The announcement, published on DeepMind's official blog, marks a new generation of the open-weights Gemma model family. The announcement excerpt does not include specific capability details, parameter counts, or benchmark results.
Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models
NVIDIA's Nemotron-Labs introduces diffusion-based language models targeting extremely fast text generation in a Hugging Face blog post. The piece presents diffusion as an alternative to autoregressive generation for language modeling, with a focus on inference speed. It continues the push by NVIDIA's research arm into non-autoregressive generation paradigms.
Announcing Gemma 3n Preview: Powerful, Efficient, Mobile-First AI
Google DeepMind has released a preview of Gemma 3n, an open-weights model optimized for on-device multimodal inference. The model features a 2-in-1 architecture for flexible deployment and adds audio understanding to its multimodal capabilities. It is designed for mobile and edge environments, targeting developers building real-time interactive applications.