Sparse-structure Multimodal Diffusion Transformer

technique · active · provisional · sparse-structure-multimodal-diffusion-transformer-6e50d00b · 1 event · first seen 15h ago

Aliases: Sparse-structure Multimodal Diffusion Transformer

Recent events (1)

5 · arXiv · cs.AI · 15h ago

FLUX3D: Diffusion-aligned sparse representation for high-fidelity image-to-3D Gaussian Splatting

Researchers introduce FLUX3D, an image-to-3D Gaussian Splatting (3DGS) framework that targets two structural bottlenecks in sparse voxel-based 3D generation: a representation bottleneck arising from discriminative 2D features, and a cross-modal correspondence bottleneck in diffusion transformers. To improve 2D-3D alignment, it combines Diffusion-Aligned Structured Latents (DA-SLAT) with a Sparse-structure Multimodal Diffusion Transformer (SMDiT) that uses Modal-Aware Rotary Positional Embedding (MARoPE). The authors report substantial improvements in appearance fidelity over all current state-of-the-art methods for 3DGS asset generation.