FLUX3D: Diffusion-aligned sparse representation for high-fidelity image-to-3D Gaussian Splatting
Researchers introduce FLUX3D, an image-to-3D Gaussian Splatting framework that targets two structural bottlenecks in sparse voxel-based 3D generation: a representation bottleneck caused by relying on discriminative 2D features, and a cross-modal correspondence bottleneck in diffusion transformers. To improve 2D-3D alignment, the system combines Diffusion-Aligned Structured Latents (DA-SLAT) with a Sparse-structure Multimodal Diffusion Transformer (SMDiT) that uses Modal-Aware Rotary Positional Embedding (MARoPE). On benchmarks, the authors claim substantial improvements in appearance fidelity over all current state-of-the-art methods for 3DGS asset generation.
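The summary does not say how MARoPE makes rotary embeddings modality-aware. For orientation only, the sketch below shows standard rotary positional embedding (RoPE), which MARoPE presumably builds on; it is not MARoPE itself. The function names and the conventional `base=10000.0` frequency constant are assumptions. The example checks the property that makes RoPE attractive: the score between a rotated query and key depends only on their relative position.

```python
import math


def rope(vec, pos, base=10000.0):
    """Apply standard (generic, not modality-aware) RoPE to an even-length vector.

    Each pair (x[2i], x[2i+1]) is rotated by angle pos * base**(-2i/d).
    """
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]
# The same relative offset (3) at different absolute positions gives the same score.
s1 = dot(rope(q, 5), rope(k, 2))
s2 = dot(rope(q, 105), rope(k, 102))
print(s1, s2)
```

Because each pair is rotated rather than shifted, RoPE also preserves vector norms; a modality-aware variant would have to decide how positions from image tokens and 3D voxel tokens are assigned, which the summary leaves unspecified.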
Related events
Introduction to 3D Gaussian Splatting
A Hugging Face blog post introduces 3D Gaussian Splatting, a technique for real-time novel view synthesis and 3D scene reconstruction. The method represents scenes as collections of 3D Gaussians rather than implicit neural fields, enabling fast rendering. The post serves as an educational overview of the technique's mechanics and applications.
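The fast rendering comes from rasterizing explicit Gaussians instead of querying a neural field. As a minimal sketch of the per-pixel step, the code below composites depth-sorted Gaussians front to back: each Gaussian i adds T_i * alpha_i * c_i, where alpha_i is its opacity times its 2D footprint at the pixel and T_i = prod over j < i of (1 - alpha_j) is the remaining transmittance. The `Splat` type, the isotropic footprint, and the early-termination threshold are simplifying assumptions; real 3DGS renderers project anisotropic covariances and process screen tiles in parallel on the GPU.

```python
import math
from dataclasses import dataclass


@dataclass
class Splat:
    """A projected Gaussian, simplified to an isotropic 2D footprint (assumption)."""
    x: float        # screen-space center
    y: float
    sigma: float    # footprint standard deviation in pixels
    depth: float    # view-space depth, used for sorting
    opacity: float  # learned opacity in [0, 1]
    color: tuple    # RGB in [0, 1]


def shade_pixel(px, py, splats, min_transmittance=1e-4):
    """Front-to-back alpha compositing of depth-sorted splats at one pixel."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for s in sorted(splats, key=lambda s: s.depth):  # nearest first
        d2 = (px - s.x) ** 2 + (py - s.y) ** 2
        alpha = s.opacity * math.exp(-0.5 * d2 / s.sigma ** 2)
        for c in range(3):
            color[c] += transmittance * alpha * s.color[c]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:  # stop once the pixel is opaque
            break
    return tuple(color), transmittance


red_front = Splat(0, 0, 1.0, depth=1.0, opacity=0.5, color=(1, 0, 0))
blue_back = Splat(0, 0, 1.0, depth=2.0, opacity=1.0, color=(0, 0, 1))
print(shade_pixel(0, 0, [blue_back, red_front]))  # ((0.5, 0.0, 0.5), 0.0)
```

The input order does not matter because the function sorts by depth; the half-transparent red splat in front lets half of the opaque blue splat show through.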
Diffusers welcomes FLUX-2
Hugging Face's Diffusers library has added support for FLUX-2, the successor to Black Forest Labs' FLUX image generation model. The blog post announces integration of the new model into the Diffusers ecosystem, enabling developers to use FLUX-2 through the standard Diffusers API. This represents a tooling and ecosystem update for one of the leading open-weights image generation model families.
OrbitForge: Text-to-3D scene generation via reconstruction-anchored video synthesis using Gaussian Splatting
OrbitForge is a new method for converting text-generated videos into 3D Gaussian Splatting scenes without task-specific fine-tuning or score-distillation optimization. The approach uses a frozen video diffusion model as a prior, performs an initial 3D reconstruction via Deformable Gaussian Splatting, detects missing viewpoints from a prescribed orbit, and completes only those views before final reconstruction. On a 300-prompt T3Bench-derived audit, OrbitForge achieves a 359-degree median orbit span and substantially improves coverage quality over a MedianGS-only baseline. The work also argues for coverage-aware evaluation metrics in text-to-3D tasks.
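The "detect missing viewpoints, then complete only those" step can be illustrated with a small coverage check. The sketch below assumes the prescribed orbit is a set of evenly spaced azimuths and that an orbit stop counts as covered if some reconstructed view lies within a tolerance of it. It also computes an orbit span as 360 degrees minus the largest angular gap between covered views. These definitions and the names `missing_viewpoints` and `orbit_span` are illustrative assumptions, not OrbitForge's published criteria.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def missing_viewpoints(observed, n_orbit=12, tol=15.0):
    """Orbit azimuths with no observed view within `tol` degrees (hypothetical rule)."""
    orbit = [i * 360.0 / n_orbit for i in range(n_orbit)]
    return [az for az in orbit
            if all(angular_distance(az, o) > tol for o in observed)]


def orbit_span(observed):
    """360 minus the largest gap between consecutive observed azimuths."""
    if not observed:
        return 0.0
    angles = sorted(a % 360.0 for a in observed)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(360.0 - angles[-1] + angles[0])  # wrap-around gap
    return 360.0 - max(gaps)


# A generated video that only sweeps the front half of the object.
observed = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180]
todo = missing_viewpoints(observed)
print("views to complete:", todo)
print("span before:", orbit_span(observed))
print("span after:", orbit_span(observed + todo))
```

Only the five uncovered stops on the back half are flagged for completion, which mirrors the idea of filling gaps rather than regenerating the whole orbit.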
Fast LoRA inference for Flux with Diffusers and PEFT
Hugging Face published a technical blog post detailing optimizations for LoRA inference speed with the Flux image generation model using the Diffusers and PEFT libraries. The post covers techniques to accelerate adapter loading and inference throughput for diffusion models. This is relevant to practitioners deploying fine-tuned image generation models in production or research settings.
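One general way to remove LoRA overhead at inference time (not necessarily the technique the post benchmarks) is to fuse the adapter into the base weights once, so each forward pass is a single matrix multiply. With base weight W (d x k), LoRA factors B (d x r) and A (r x k), and scale alpha / r, the fused weight is W' = W + (alpha / r) * B @ A. The pure-Python sketch below, with arbitrary illustrative shapes and values, checks that the fused and unfused paths agree.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def fuse_lora(W, B, A, alpha, r):
    """Return the merged weight W + (alpha / r) * B @ A."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * ba for w, ba in zip(w_row, ba_row)]
            for w_row, ba_row in zip(W, BA)]


def lora_forward(W, B, A, alpha, r, x):
    """Unfused path: base output plus the scaled low-rank correction."""
    base = matvec(W, x)
    low = matvec(B, matvec(A, x))  # two extra thin matmuls on every call
    return [b + (alpha / r) * l for b, l in zip(base, low)]


W = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]  # d=2, k=3
B = [[0.2], [-0.4]]                       # d x r, with r=1
A = [[1.0, 0.5, -0.5]]                    # r x k
x = [1.0, 2.0, 3.0]
fused = fuse_lora(W, B, A, alpha=2.0, r=1)
print(matvec(fused, x), lora_forward(W, B, A, 2.0, 1, x))
```

The trade-off is flexibility: a fused layer has to be un-merged (subtract the same scaled B @ A) before a different adapter can be applied.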
Stable Diffusion in JAX / Flax
Hugging Face published a blog post demonstrating Stable Diffusion running in JAX/Flax, enabling efficient inference on TPU hardware. The post covers the technical implementation of diffusion pipelines using Flax's functional programming model. This represents an early effort to bring high-performance image generation to Google's TPU ecosystem via the Diffusers library.
PLAID: Repurposing Protein Folding Models for Multimodal Protein Generation with Latent Diffusion
PLAID is a generative model that simultaneously produces protein 1D sequences and 3D all-atom structures by learning a diffusion model over the latent space of ESMFold, a protein folding model. It requires only sequence data for training—leveraging databases 2-4 orders of magnitude larger than structure databases—and decodes structure at inference via frozen folding model weights. The approach supports compositional prompting by function and organism, addressing practical drug-design constraints like humanization and solubility. A companion compression model, CHEAP, addresses the high-dimensionality of transformer latent spaces to make the diffusion training tractable.
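The summary says only that PLAID trains a diffusion model over (compressed) ESMFold latents. For orientation, the sketch below shows the standard DDPM forward (noising) process applied to a latent vector: z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps, where abar_t is the cumulative product of (1 - beta_s). The linear beta schedule, its endpoints, and representing the latent as a flat list are assumptions; PLAID's actual noise schedule and parameterization may differ.

```python
import math
import random


def alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    out, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(z0, t, abar, rng):
    """Sample z_t ~ q(z_t | z_0) for a latent vector z0 (list of floats)."""
    a = abar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]
    return zt, eps  # a denoiser would be trained to predict eps from (zt, t)


abar = alpha_bars()
rng = random.Random(0)
z0 = [0.5, -1.0, 2.0, 0.0]  # hypothetical stand-in for a compressed latent
z_early, _ = q_sample(z0, 10, abar, rng)
z_late, _ = q_sample(z0, 999, abar, rng)
print(abar[10], abar[999])  # near 1 early, near 0 at the final step
```

At inference the learned model runs this process in reverse to produce a clean latent, which, per the summary, is then decoded to a sequence and, through the frozen folding-model weights, to a structure.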
(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware
This Hugging Face blog post covers techniques for fine-tuning the FLUX.1-dev image generation model using LoRA (Low-Rank Adaptation) on consumer-grade hardware. The post likely addresses quantization strategies (QLoRA) to reduce memory requirements, enabling training on GPUs with limited VRAM. This is relevant to the open-weights and accessible fine-tuning ecosystem for diffusion models.
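Much of why LoRA fits on consumer GPUs is simple arithmetic: for a frozen d x k weight, LoRA trains only r * (d + k) parameters instead of d * k, and QLoRA additionally stores the frozen base weights in 4 bits. The helper below works through that arithmetic for a single hypothetical 3072 x 3072 projection at rank 16. The layer shape and rank are illustrative choices, not figures from the post, and the byte counts ignore quantization metadata such as per-block scales.

```python
def lora_params(d, k, r):
    """Trainable parameters in LoRA factors B (d x r) and A (r x k)."""
    return r * (d + k)


def weight_bytes(n_params, bits):
    """Storage for n_params weights at the given bit width (no metadata)."""
    return n_params * bits // 8


d = k = 3072  # illustrative projection size
r = 16        # illustrative LoRA rank
full = d * k
lora = lora_params(d, k, r)
print(f"full fine-tune params: {full:,}")                               # 9,437,184
print(f"LoRA params: {lora:,}")                                         # 98,304
print(f"trainable fraction: {lora / full:.2%}")                         # 1.04%
print(f"frozen base, bf16: {weight_bytes(full, 16) / 2**20:.1f} MiB")   # 18.0 MiB
print(f"frozen base, 4-bit: {weight_bytes(full, 4) / 2**20:.1f} MiB")   # 4.5 MiB
```

Summed over every adapted layer of a multi-billion-parameter model, the same ratios explain the large drop in both optimizer state and weight memory.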
Memory-efficient Diffusion Transformers with Quanto and Diffusers
This Hugging Face blog post describes integrating the Quanto quantization library with the Diffusers framework to reduce the memory requirements of diffusion transformer models. Applying int8 and int4 quantization to model weights lets large image and video generation models run on consumer-grade hardware. The post covers practical implementation details and benchmarks showing memory savings for Flux and other diffusion transformer models.
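Weight-only quantization maps each float weight to a small integer plus a shared scale. The sketch below shows symmetric per-tensor quantization at a configurable bit width (8 and 4 bits) in plain Python to make the memory/accuracy trade-off concrete. Quanto's actual scheme, including its scale granularity, is not reproduced here, and the function names are hypothetical.

```python
def quantize_symmetric(weights, bits=8):
    """Symmetric per-tensor quantization: w is approximated by scale * q."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 7 for int4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    return [scale * v for v in q]


def max_error(weights, q, scale):
    return max(abs(a - b) for a, b in zip(weights, dequantize(q, scale)))


weights = [0.02, -0.75, 0.31, 1.27, -0.004]
q8, s8 = quantize_symmetric(weights, bits=8)
q4, s4 = quantize_symmetric(weights, bits=4)
print("int8:", q8, "max error:", max_error(weights, q8, s8))
print("int4:", q4, "max error:", max_error(weights, q4, s4))
```

Each value's rounding error is at most half its step size, so int4 trades roughly 18x coarser steps here for half the storage of int8. Per-tensor scaling is the simplest choice: a single outlier inflates the scale for every weight, which is why finer granularity (per-channel or per-group scales) is common in practice.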
