5 · arXiv cs.AI (Artificial Intelligence) · 15h ago

FLUX3D: Diffusion-aligned sparse representation for high-fidelity image-to-3D Gaussian Splatting

Researchers introduce FLUX3D, an image-to-3D Gaussian Splatting (3DGS) framework that targets two structural bottlenecks in sparse voxel-based 3D generation: a representation bottleneck stemming from discriminative 2D features, and a cross-modal correspondence bottleneck in diffusion transformers. To improve 2D-3D alignment, the framework combines Diffusion-Aligned Structured Latents (DA-SLAT) with a Sparse-structure Multimodal Diffusion Transformer (SMDiT) that uses Modal-Aware Rotary Positional Embedding (MARoPE). The authors report substantial improvements in appearance fidelity over all current state-of-the-art methods for 3DGS asset generation.

Multimodal Progress Sparse-structure Multimodal Diffusion Transformer FLUX3D Diffusion-Aligned Structured Latents Modal-Aware Rotary Positional Embedding

Related guides (1)

Multimodal Progress · Topic guide

Multimodal Progress: How AI Learned to See, Hear, and Act

Related events (8)

4 · Hugging Face Blog · 1mo ago

Introduction to 3D Gaussian Splatting

A Hugging Face blog post introduces 3D Gaussian Splatting, a technique for real-time novel view synthesis and 3D scene reconstruction. The method represents scenes as collections of 3D Gaussians rather than implicit neural fields, enabling fast rendering. The post serves as an educational overview of the technique's mechanics and applications.
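
To make the "collections of 3D Gaussians" idea concrete, here is a minimal sketch of front-to-back alpha compositing along a single ray. It is a simplification, not the method from the post: it assumes isotropic Gaussians evaluated at the ray's closest point to each center, whereas real 3DGS renderers project anisotropic Gaussians to screen space and rasterize them in tiles. The names `Gaussian` and `render_ray` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    center: tuple   # (x, y, z) world-space mean
    sigma: float    # isotropic std dev (assumption: real 3DGS stores a full covariance)
    opacity: float  # peak opacity in [0, 1]
    color: tuple    # (r, g, b) in [0, 1]


def render_ray(origin, direction, gaussians):
    """Composite Gaussians front to back along one ray.

    `direction` must be a unit vector. Returns (rgb, remaining_transmittance).
    """
    hits = []
    for g in gaussians:
        rel = [c - o for c, o in zip(g.center, origin)]
        depth = sum(r * d for r, d in zip(rel, direction))  # distance along the ray
        if depth <= 0:
            continue  # behind the ray origin
        closest = [o + depth * d for o, d in zip(origin, direction)]
        dist2 = sum((c - p) ** 2 for c, p in zip(g.center, closest))
        # Opacity falls off with squared distance from the Gaussian's center.
        alpha = g.opacity * math.exp(-0.5 * dist2 / g.sigma ** 2)
        hits.append((depth, alpha, g.color))

    hits.sort(key=lambda h: h[0])  # nearest Gaussian first
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for _, alpha, rgb in hits:
        for i in range(3):
            color[i] += transmittance * alpha * rgb[i]
        transmittance *= 1.0 - alpha
    return tuple(color), transmittance
```

Because the scene is an explicit list of primitives, rendering is a sort plus a weighted sum rather than repeated queries to a neural field, which is the property the summary credits for fast rendering.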

Multimodal Progress 3D Gaussian Splatting Hugging Face NeRF

6 · Hugging Face Blog · 1mo ago

Diffusers welcomes FLUX-2

Hugging Face's Diffusers library has added support for FLUX-2, the successor to Black Forest Labs' FLUX image generation model. The blog post announces integration of the new model into the Diffusers ecosystem, enabling developers to use FLUX-2 through the standard Diffusers API. This represents a tooling and ecosystem update for one of the leading open-weights image generation model families.

Open Weights Progress Agent and Tool Ecosystem Black Forest Labs Hugging Face Diffusers FLUX-2 +3 more

4 · arXiv cs.AI · 15h ago

OrbitForge: Text-to-3D scene generation via reconstruction-anchored video synthesis using Gaussian Splatting

OrbitForge is a new method for converting text-generated videos into 3D Gaussian Splatting scenes without task-specific fine-tuning or score-distillation optimization. The approach uses a frozen video diffusion model as a prior, performs an initial 3D reconstruction via Deformable Gaussian Splatting, detects missing viewpoints from a prescribed orbit, and completes only those views before final reconstruction. On a 300-prompt T3Bench-derived audit, OrbitForge achieves a 359-degree median orbit span and substantially improves coverage quality over a MedianGS-only baseline. The work also argues for coverage-aware evaluation metrics in text-to-3D tasks.
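
The coverage idea can be illustrated with a small sketch. The summary does not give OrbitForge's actual definitions, so everything here is an assumption: orbit span is taken as 360 degrees minus the largest angular gap between covered camera azimuths, and a prescribed view counts as missing when no covered azimuth lies within a hypothetical tolerance. The function names are illustrative only.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def orbit_span(covered):
    """Assumed metric: 360 minus the largest gap between covered azimuths."""
    if not covered:
        return 0.0
    angles = sorted(a % 360.0 for a in covered)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(360.0 - angles[-1] + angles[0])  # wrap-around gap
    return 360.0 - max(gaps)


def missing_views(prescribed, covered, tolerance=15.0):
    """Prescribed azimuths with no covered azimuth within `tolerance` degrees."""
    return [
        p for p in prescribed
        if all(angular_distance(p, c) > tolerance for c in covered)
    ]
```

Under these assumptions, only the azimuths returned by `missing_views` would be sent for completion, which mirrors the summary's point that the method fills in only the absent views before the final reconstruction.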

Multimodal Progress T3Bench 3D Gaussian Splatting VideoMV +2 more

4 · Hugging Face Blog · 1mo ago

Fast LoRA inference for Flux with Diffusers and PEFT

Hugging Face published a technical blog post detailing optimizations for LoRA inference speed with the Flux image generation model using the Diffusers and PEFT libraries. The post covers techniques to accelerate adapter loading and inference throughput for diffusion models. This is relevant to practitioners deploying fine-tuned image generation models in production or research settings.

Inference Economics Agent and Tool Ecosystem PEFT LoRA Hugging Face +2 more

4 · Hugging Face Blog · 1mo ago

Stable Diffusion in JAX / Flax

Hugging Face published a blog post demonstrating Stable Diffusion running in JAX/Flax, enabling efficient inference on TPU hardware. The post covers the technical implementation of diffusion pipelines using Flax's functional programming model. This represents an early effort to bring high-performance image generation to Google's TPU ecosystem via the Diffusers library.

Training Infrastructure Inference Economics Google TPU Stable Diffusion 3 Flax +4 more

6 · Berkeley AI Research (BAIR) Blog · 1mo ago

PLAID: Repurposing Protein Folding Models for Multimodal Protein Generation with Latent Diffusion

PLAID is a generative model that simultaneously produces protein 1D sequences and 3D all-atom structures by learning a diffusion model over the latent space of ESMFold, a protein folding model. It requires only sequence data for training, which lets it draw on databases 2 to 4 orders of magnitude larger than structure databases, and it decodes structure at inference using the frozen folding model's weights. The approach supports compositional prompting by function and organism, addressing practical drug-design constraints such as humanization and solubility. A companion compression model, CHEAP, addresses the high dimensionality of transformer latent spaces to make diffusion training tractable.

Frontier Model Releases Multimodal Progress CHEAP Berkeley AI Research (BAIR) AlphaFold2 +4 more

5 · Hugging Face Blog · 1mo ago

(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware

This Hugging Face blog post covers techniques for fine-tuning the FLUX.1-dev image generation model using LoRA (Low-Rank Adaptation) on consumer-grade hardware. The post likely addresses quantization strategies (QLoRA) to reduce memory requirements, enabling training on GPUs with limited VRAM. This is relevant to the open-weights and accessible fine-tuning ecosystem for diffusion models.
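
As a rough illustration of why LoRA fits on limited VRAM, the sketch below applies a low-rank update to a frozen weight matrix and counts trainable parameters. It is plain Python rather than the PEFT API, and the layer size and rank used in the note afterwards are hypothetical, not taken from the post.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(x, W, A, B, alpha):
    """Compute y = W x + (alpha / r) * B (A x).

    W is the frozen d_out x d_in weight; A (r x d_in) and B (d_out x r)
    are the only trainable matrices. In standard LoRA, B starts at zero,
    so the adapted layer initially matches the base layer.
    """
    rank = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d_in, d_out, rank):
    """Parameters in A and B combined."""
    return rank * (d_in + d_out)
```

For a hypothetical 3072 x 3072 projection at rank 16, `trainable_params(3072, 3072, 16)` gives 98,304 trainable values against 9,437,184 in the full matrix, roughly 1 percent. Gradients and optimizer state are kept only for that small set, and that is where most of the memory saving comes from.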

Open Weights Progress Inference Economics Black Forest Labs FLUX.1-dev LoRA +3 more

5 · Hugging Face Blog · 1mo ago

Memory-efficient Diffusion Transformers with Quanto and Diffusers

This Hugging Face blog post describes integrating the Quanto quantization library with the Diffusers framework to reduce memory requirements for diffusion transformer models. The approach enables running large image/video generation models on consumer-grade hardware by applying int8 and int4 quantization to model weights. The post covers practical implementation details and benchmarks showing memory savings for models like Flux and others in the diffusion transformer family.
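
The memory saving from int8 weights can be seen in a minimal round-trip sketch. This is generic symmetric per-tensor quantization written in plain Python, offered as an assumption about the general technique; it does not use or reproduce Quanto's API, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]
```

Each weight is stored in one byte plus a single shared scale, compared with two bytes for 16-bit floats or four for 32-bit floats. The price is a rounding error of at most half the scale per weight, which is why benchmarks of output quality accompany the memory figures.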

Inference Economics Agent and Tool Ecosystem Quanto Linear Diffusion Transformer Hugging Face +3 more