7·Google DeepMind Blog·10d ago

DeepMind announces DiffusionGemma with 4x faster text generation

DeepMind published a blog post introducing DiffusionGemma, a diffusion-based variant of the Gemma model family that it claims generates text 4x faster. The announcement suggests a departure from standard autoregressive decoding in favor of diffusion-based generation. If the claims hold, this could be a meaningful inference-efficiency advance for the Gemma line.

Frontier Model Releases · Inference Economics · DiffusionGemma · Gemma · Google DeepMind

Related guides (3)

Google DeepMind

Google DeepMind: The Lab Behind Gemini, AlphaFold, and Frontier AI

Frontier Model Releases·Topic guide

Frontier Model Releases: The Race From Language to Action

Inference Economics·Topic guide

Inference Economics: The Cost of Running AI in Production

Related events (8)

5·Simon Willison's Weblog·10d ago

Simon Willison on DiffusionGemma

Simon Willison covers DiffusionGemma, a diffusion-based language model in the Gemma family from Google. The post appears to be commentary or a brief note on the model's release or capabilities. Diffusion-based LLMs represent an active area of research as an alternative to autoregressive generation.

Frontier Model Releases · Open Weights Progress · DiffusionGemma · Google · Simon Willison

6·arXiv · cs.AI·46h ago

Interpretability study of DiffusionGemma reveals novel diffusion-specific reasoning phenomena

Researchers investigate the reasoning transparency of DiffusionGemma, a diffusion-based language model, decomposing transparency into variable and algorithmic components. They show that mapping information through an interpretable token bottleneck reduces DiffusionGemma's opaque serial depth from 28.6x to 1.1x that of autoregressive Gemma 4, with no performance loss. Interpretability case studies uncover diffusion-specific phenomena, including non-chronological reasoning, token smearing, and intermediate-context reasoning. Monitorability tests find DiffusionGemma comparable to Gemma 4, suggesting that diffusion LMs are not inherently less amenable to safety oversight.

Frontier Model Releases · AI Safety Research · DiffusionGemma · Google · How Transparent is DiffusionGemma? · +1 more

7·The Batch·3d ago

DiffusionGemma hits 1,000+ tokens/sec; Claude Fable 5 export controls; Agents' Last Exam benchmark launch

Google introduced DiffusionGemma, an experimental 26B MoE model using diffusion-based text generation that produces 256-token blocks simultaneously, achieving over 1,000 tokens/second on H100 hardware at the cost of lower output quality versus standard Gemma 4. Separately, the US government issued an export control directive forcing Anthropic to suspend Claude Fable 5 and Claude Mythos 5 globally, while Anthropic also reversed a controversial silent-degradation safeguard on Fable 5 after researcher backlash. UC Berkeley's Center for RDI launched Agents' Last Exam (ALE), a 1,500+ task agentic benchmark using deterministic grading, where GPT-5.5 topped the leaderboard at only 24% pass rate, highlighting the difficulty gap between current models and professional-grade workflows.
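
The block size and throughput reported above imply a simple time budget per block. The following is a minimal back-of-envelope sketch, assuming the 1,000 tokens/second figure describes a single generation stream that emits one 256-token block at a time (the summary does not say whether the number is per stream or aggregated across a batch). The function names and the 16 ms per-pass cost are hypothetical and serve only as illustration.

```python
# Back-of-envelope sketch; the function names and the 16 ms step time are
# hypothetical and are not taken from the DiffusionGemma announcement.


def block_time_budget_ms(tokens_per_second: float, block_tokens: int) -> float:
    """Wall-clock time available to finish one block at a sustained throughput."""
    return 1000.0 * block_tokens / tokens_per_second


def max_denoising_steps(budget_ms: float, step_ms: float) -> int:
    """Whole forward passes that fit in the budget, assuming a fixed per-pass cost."""
    return int(budget_ms // step_ms)


if __name__ == "__main__":
    budget = block_time_budget_ms(tokens_per_second=1000.0, block_tokens=256)
    print(f"budget per 256-token block: {budget:.0f} ms")
    print(f"passes at an assumed 16 ms each: {max_denoising_steps(budget, 16.0)}")
```

Under these assumptions, each 256-token block has to complete in roughly a quarter of a second. Since the reported throughput is "over" 1,000 tokens/second, that figure is an upper bound on the per-block time, not a measurement.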

Frontier Model Releases · Evaluation and Benchmarking · Claude Mythos · Claude Opus 4.6 · DiffusionGemma · +13 more

6·Google DeepMind Blog·1mo ago

T5Gemma: A new collection of encoder-decoder Gemma models

DeepMind has announced T5Gemma, a new collection of encoder-decoder large language models in the Gemma family. The release extends the Gemma line beyond its existing decoder-only architecture to include encoder-decoder variants that follow the T5 paradigm. Technical details are sparse in the announcement, but the models represent a notable architectural expansion of the open Gemma ecosystem.

Frontier Model Releases · Open Weights Progress · T5Gemma · Gemma · Google DeepMind · +1 more

5·Google DeepMind Blog·1mo ago

Introducing Gemma 3 270M: The compact model for hyper-efficient AI

Google DeepMind has released Gemma 3 270M, a 270-million parameter compact language model added to the Gemma 3 family. The model is positioned as a highly specialized, hyper-efficient tool for resource-constrained deployments. This extends the Gemma 3 lineup into the sub-billion parameter range, targeting edge and on-device use cases.

Open Weights Progress · Inference Economics · Gemma 3 · Google DeepMind · Gemma 3 270M · +1 more

7·Google DeepMind Blog·1mo ago

Introducing Gemma 3

Google DeepMind has released Gemma 3, described as the most capable model runnable on a single GPU or TPU. The announcement comes from DeepMind's official blog, indicating a new generation of the open-weights Gemma model family. Specific capability details, parameter counts, and benchmark results are not included in the provided body text.

Frontier Model Releases · Open Weights Progress · Gemma · Gemma 3 · Google DeepMind · +1 more

5·Hugging Face Blog·28d ago

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models

In a Hugging Face blog post, NVIDIA's Nemotron-Labs introduces diffusion-based language models targeting extremely fast text generation. The piece covers the use of diffusion processes for language modeling as an alternative to autoregressive generation, with a focus on inference speed. This represents a continued push by NVIDIA's research arm into non-autoregressive generation paradigms.

Frontier Model Releases · Inference Economics · Diffusion Language Models · NVIDIA · Hugging Face · +3 more

7·Google DeepMind Blog·1mo ago

Announcing Gemma 3n Preview: Powerful, Efficient, Mobile-First AI

Google DeepMind has released a preview of Gemma 3n, an open-weights model optimized for on-device multimodal inference. The model features a 2-in-1 architecture for flexible deployment and adds audio understanding to its multimodal capabilities. It is designed for mobile and edge environments, targeting developers building real-time interactive applications.

Open Weights Progress · Inference Economics · Gemma · Gemma 3n · Google DeepMind · +2 more