NVlabs/Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
NVIDIA Labs has released Sana, an open-source image synthesis system using a Linear Diffusion Transformer architecture designed for efficient high-resolution image generation. The repository has accumulated 6,261 stars with 472 added in a single day, indicating strong community interest. The project targets improved computational efficiency in diffusion-based image synthesis, a key challenge for scaling to higher resolutions.
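To make the efficiency argument concrete, here is a minimal sketch of linear attention next to the quadratic form it replaces. It assumes a ReLU feature map, which is one common choice; the summary above does not specify Sana's actual kernel, and the function names are hypothetical. The linear version first folds all keys and values into a fixed-size summary, so its cost grows linearly with the number of tokens rather than quadratically.

```python
def relu(x):
    return x if x > 0 else 0.0


def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized attention in O(n * d * dv); Q, K, V are lists of rows.

    Assumption: ReLU feature map (illustrative, not confirmed by the source).
    """
    d, dv = len(K[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]  # sum_j phi(k_j) v_j^T
    ksum = [0.0] * d                     # sum_j phi(k_j)
    for k, v in zip(K, V):
        pk = [relu(x) for x in k]
        for a in range(d):
            ksum[a] += pk[a]
            for b in range(dv):
                kv[a][b] += pk[a] * v[b]
    out = []
    for q in Q:
        pq = [relu(x) for x in q]
        denom = sum(pq[a] * ksum[a] for a in range(d)) + eps
        out.append([sum(pq[a] * kv[a][b] for a in range(d)) / denom
                    for b in range(dv)])
    return out


def quadratic_reference(Q, K, V, eps=1e-6):
    """Same attention computed pairwise in O(n^2), for comparison only."""
    out = []
    for q in Q:
        pq = [relu(x) for x in q]
        w = [sum(a * relu(b) for a, b in zip(pq, k)) for k in K]
        denom = sum(w) + eps
        out.append([sum(wj * v[b] for wj, v in zip(w, V)) / denom
                    for b in range(len(V[0]))])
    return out
```

Both functions return the same values up to floating-point error; only the order of the sums differs, which is where the saving at high resolution (many tokens) comes from.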
Related events (8)
RayDer: Scalable Self-Supervised Novel View Synthesis via Unified Feed-Forward Transformer
RayDer is a unified feed-forward transformer that consolidates camera estimation, scene reconstruction, and rendering into a single backbone for self-supervised novel view synthesis (NVS). By treating dynamic content as a nuisance factor absorbed by a minimal dynamic state, it enables stable training on unconstrained real-world video without requiring dynamic-scene reconstruction. The model exhibits clean power-law scaling with both data and compute across multiple model sizes, and achieves zero-shot open-set performance competitive with supervised state-of-the-art methods on multiple benchmarks.
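For readers unfamiliar with the term, "power-law scaling" means the loss follows roughly L = a * C^b (with b negative) as compute or data C grows, which appears as a straight line on log-log axes. The sketch below fits such a line with ordinary least squares; the function name is hypothetical and any inputs are illustrative, not RayDer's reported measurements.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares in log-log space; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    # Slope of the log-log line is the power-law exponent b.
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    # Intercept of the log-log line is log(a).
    a = math.exp(my - b * mx)
    return a, b
```

A "clean" power law in this sense is one where the fitted line leaves small residuals across all model sizes, so the exponent can be used to extrapolate.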
Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models
In a Hugging Face blog post, NVIDIA's Nemotron-Labs introduces diffusion-based language models aimed at extremely fast text generation. The post presents diffusion processes as an alternative to autoregressive generation for language modeling, with a focus on inference speed. It continues the push by NVIDIA's research arm into non-autoregressive generation paradigms.
Nano Banana 2: Combining Pro capabilities with lightning-fast speed
DeepMind has announced Nano Banana 2, a new image generation model described as combining Pro-level capabilities with Flash-level inference speed. The model is positioned as production-ready, featuring advanced world knowledge, subject consistency, and fast generation. The announcement appears to target developers and enterprise users seeking high-quality image generation at lower latency.
Neural Super Sampling on Arm Hardware via Hugging Face
Arm and Hugging Face announce neural super sampling, a technique that uses neural networks to upscale lower-resolution rendered frames to higher resolutions in real time. The approach targets Arm-based hardware and aims to reduce rendering workload while maintaining visual quality. This represents an application of ML inference to graphics and gaming pipelines on edge/mobile hardware.
Memory-efficient Diffusion Transformers with Quanto and Diffusers
This Hugging Face blog post describes integrating the Quanto quantization library with the Diffusers framework to reduce the memory requirements of diffusion transformer models. Applying int8 and int4 quantization to model weights makes it possible to run large image and video generation models on consumer-grade hardware. The post covers practical implementation details and benchmarks showing memory savings for diffusion transformer models such as Flux.
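As a rough illustration of why weight quantization saves memory, the sketch below applies symmetric per-tensor int8 quantization to a list of weights. This is a generic textbook scheme written in plain Python with hypothetical function names; it is not Quanto's API or its exact algorithm. Each weight is stored as one signed byte plus a shared scale, compared with 2 bytes per weight in fp16/bf16 or 4 bytes in fp32.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127].

    Returns (quantized_values, scale). Illustrative only, not Quanto's method.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]
```

The reconstruction error per weight is bounded by about half the scale, so tensors with a few large outliers lose more precision; finer-grained (per-channel or per-group) scales are a common way to reduce that loss.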
Stable Diffusion in JAX / Flax
Hugging Face published a blog post demonstrating Stable Diffusion running in JAX/Flax, enabling efficient inference on TPU hardware. The post covers the technical implementation of diffusion pipelines using Flax's functional programming model. This represents an early effort to bring high-performance image generation to Google's TPU ecosystem via the Diffusers library.
Accelerating Stable Diffusion XL Inference with JAX on Cloud TPU v5e
Hugging Face published a technical blog post detailing how to accelerate Stable Diffusion XL inference using JAX on Google Cloud TPU v5e hardware. The post covers the integration of JAX-based diffusion pipelines with TPU v5e, demonstrating performance gains from hardware-software co-optimization. This represents a practical deployment pattern for large image generation models on non-GPU accelerators.
ControlNet in 🧨 Diffusers
Hugging Face's Diffusers library added support for ControlNet, a technique that enables fine-grained spatial and structural control over diffusion model image generation. The blog post covers how ControlNet conditions image synthesis on auxiliary inputs such as edge maps, depth maps, pose skeletons, and segmentation masks. This integration makes ControlNet-based generation accessible through the standard Diffusers pipeline API.

