5 · GitHub Trending (AI/LLM filtered) · 1mo ago

NVlabs/Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

NVIDIA Labs has released Sana, an open-source image synthesis system using a Linear Diffusion Transformer architecture designed for efficient high-resolution image generation. The repository has accumulated 6,261 stars with 472 added in a single day, indicating strong community interest. The project targets improved computational efficiency in diffusion-based image synthesis, a key challenge for scaling to higher resolutions.
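The summary does not say what makes the transformer "linear". A minimal sketch, assuming the term refers to linear attention with a ReLU feature map (an assumption, not something the summary confirms): reordering the matrix products avoids building the N x N score matrix, so cost grows linearly with the number of image tokens. The function names are illustrative and are not taken from the Sana repository.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def relu(m):
    return [[max(0.0, x) for x in row] for row in m]

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention computed as phi(Q) @ (phi(K)^T @ V).

    Assumption: phi is ReLU. The (d x d_v) summary `kv` is built once,
    so no N x N matrix is ever formed.
    """
    q, k = relu(q), relu(k)
    kv = matmul(transpose(k), v)            # d x d_v, shared by all queries
    k_sum = [sum(col) for col in zip(*k)]   # length-d normalizer terms
    out = []
    for qi in q:
        num = matmul([qi], kv)[0]
        den = sum(a * b for a, b in zip(qi, k_sum)) + eps
        out.append([n / den for n in num])
    return out

def quadratic_reference(q, k, v, eps=1e-6):
    """Same kernel, evaluated the quadratic way via an explicit N x N matrix."""
    q, k = relu(q), relu(k)
    scores = matmul(q, transpose(k))        # N x N
    out = []
    for row in scores:
        den = sum(row) + eps
        out.append([sum(s * vj[c] for s, vj in zip(row, v)) / den
                    for c in range(len(v[0]))])
    return out

# Toy inputs: 3 tokens, feature dimension 2.
q = [[1.0, -2.0], [0.5, 3.0], [2.0, 1.0]]
k = [[0.2, 1.0], [-1.0, 0.5], [1.5, 0.1]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
fast = linear_attention(q, k, v)
slow = quadratic_reference(q, k, v)
```

Both functions compute the same values; only the order of multiplication differs, which is why the efficiency gain matters most at high resolutions where the token count is large.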

Inference Economics Multimodal Progress NVIDIA Labs Linear Diffusion Transformer Sana

Related guides (2)

Multimodal Progress · Topic guide

Multimodal Progress: How AI Learned to See, Hear, and Act


Inference Economics · Topic guide

Inference Economics: The Cost of Running AI in Production


Related events (8)

7 · arXiv · cs.LG · 19d ago

RayDer: Scalable Self-Supervised Novel View Synthesis via Unified Feed-Forward Transformer

RayDer is a unified feed-forward transformer that consolidates camera estimation, scene reconstruction, and rendering into a single backbone for self-supervised novel view synthesis (NVS). By treating dynamic content as a nuisance factor absorbed by a minimal dynamic state, it enables stable training on unconstrained real-world video without requiring dynamic-scene reconstruction. The model exhibits clean power-law scaling with both data and compute across multiple model sizes, and achieves zero-shot open-set performance competitive with supervised state-of-the-art methods on multiple benchmarks.

Training Infrastructure Frontier Model Releases feed-forward transformer power-law scaling CompVis +4 more

5 · Hugging Face Blog · 28d ago

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models

In a Hugging Face blog post, NVIDIA's Nemotron-Labs introduces diffusion-based language models aimed at very fast text generation. The post presents diffusion as an alternative to autoregressive generation for language modeling, with a focus on inference speed. It continues the push by NVIDIA's research arm into non-autoregressive generation paradigms.

Frontier Model Releases Inference Economics Diffusion Language Models NVIDIA Hugging Face +3 more

6 · Google DeepMind Blog · 1mo ago

Nano Banana 2: Combining Pro capabilities with lightning-fast speed

DeepMind has announced Nano Banana 2, a new image generation model described as combining Pro-level capabilities with Flash-level inference speed. The model is positioned as production-ready, featuring advanced world knowledge, subject consistency, and fast generation. The announcement appears to target developers and enterprise users seeking high-quality image generation at lower latency.

Frontier Model Releases Inference Economics Google DeepMind Nano Banana 2 +1 more

4 · Hugging Face Blog · 1mo ago

Neural Super Sampling on Arm Hardware via Hugging Face

Arm and Hugging Face announce neural super sampling, a technique that uses neural networks to upscale lower-resolution rendered frames to higher resolutions in real time. The approach targets Arm-based hardware and aims to reduce rendering workload while maintaining visual quality. This represents an application of ML inference to graphics and gaming pipelines on edge/mobile hardware.

Inference Economics Agent and Tool Ecosystem Arm Hugging Face Neural Super Sampling

5 · Hugging Face Blog · 1mo ago

Memory-efficient Diffusion Transformers with Quanto and Diffusers

This Hugging Face blog post describes integrating the Quanto quantization library with the Diffusers framework to reduce memory requirements for diffusion transformer models. The approach enables running large image/video generation models on consumer-grade hardware by applying int8 and int4 quantization to model weights. The post covers practical implementation details and benchmarks showing memory savings for models like Flux and others in the diffusion transformer family.
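To make the memory argument concrete, here is a minimal sketch of symmetric per-tensor int8 weight quantization in plain Python. It does not use Quanto's or Diffusers' API, and the helper names and the toy weights are hypothetical; it only illustrates why storing weights in 8 or 4 bits instead of 16 shrinks the weight footprint.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]

def weight_memory_gib(num_params, bits_per_weight):
    """Memory needed for the weights alone, in GiB (ignores activations and scales)."""
    return num_params * bits_per_weight / 8 / 2**30

# Toy weights (hypothetical values, chosen only for illustration).
weights = [0.5, -1.27, 0.0, 0.3]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Relative savings for a model of a given size (hypothetical parameter count).
params = 2**30
for bits in (16, 8, 4):
    print(bits, "bits:", weight_memory_gib(params, bits), "GiB")
```

The rounding error per weight is bounded by half the scale, which is the trade-off accepted in exchange for the smaller footprint. Real libraries typically use per-channel or per-group scales rather than the single per-tensor scale assumed here.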

Inference Economics Agent and Tool Ecosystem Quanto Linear Diffusion Transformer Hugging Face +3 more

4 · Hugging Face Blog · 1mo ago

Stable Diffusion in JAX / Flax

Hugging Face published a blog post demonstrating Stable Diffusion running in JAX/Flax, enabling efficient inference on TPU hardware. The post covers the technical implementation of diffusion pipelines using Flax's functional programming model. This represents an early effort to bring high-performance image generation to Google's TPU ecosystem via the Diffusers library.

Training Infrastructure Inference Economics Google TPU Stable Diffusion 3 Flax +4 more

4 · Hugging Face Blog · 1mo ago

Accelerating Stable Diffusion XL Inference with JAX on Cloud TPU v5e

Hugging Face published a technical blog post detailing how to accelerate Stable Diffusion XL inference using JAX on Google Cloud TPU v5e hardware. The post covers the integration of JAX-based diffusion pipelines with TPU v5e, demonstrating performance gains from hardware-software co-optimization. This represents a practical deployment pattern for large image generation models on non-GPU accelerators.

Training Infrastructure Inference Economics Google Cloud Stable Diffusion 3 Hugging Face +3 more

6 · Hugging Face Blog · 1mo ago

ControlNet in 🧨 Diffusers

Hugging Face's Diffusers library added support for ControlNet, a technique that enables fine-grained spatial and structural control over diffusion model image generation. The blog post covers how ControlNet conditions image synthesis on auxiliary inputs such as edge maps, depth maps, pose skeletons, and segmentation masks. This integration makes ControlNet-based generation accessible through the standard Diffusers pipeline API.

Agent and Tool Ecosystem Multimodal Progress Hugging Face Diffusers Stable Diffusion 3 Hugging Face +1 more