VAE-based identity conditioning
technique · active · provisional
vae-based-identity-conditioning-0a1b56db · 1 event · first seen 22d ago
Aliases: VAE-based identity conditioning
Recent events (1)
Squeezing Capacity from MLLMs for Subject-driven Image Generation via Dual Layer Aggregation
This paper proposes conditioning diffusion models on Multimodal Large Language Models (MLLMs) that jointly encode text and reference images, augmented with VAE-based identity conditioning to address copy-paste artifacts and identity preservation failures in subject-driven image generation. A Dual Layer Aggregation (DLA) module aggregates multi-level MLLM features, and a multi-stage denoising strategy progressively balances semantic and fine-detail identity signals during inference. Experiments show improved human preference scores on subject-driven generation benchmarks compared to prior approaches that encode text and reference images separately.
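The summary above says only that the multi-stage denoising strategy progressively balances semantic and fine-detail identity signals. It does not give the schedule. As a rough illustration, the sketch below assumes a linear ramp that favours the semantic (MLLM) conditioning at early steps and the identity (VAE) conditioning at later steps. The direction of the ramp, the linear shape, and the names `identity_weight` and `blend_conditioning` are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List


def identity_weight(step: int, num_steps: int,
                    start: float = 0.0, end: float = 1.0) -> float:
    """Hypothetical linear ramp for the identity (VAE) signal weight.

    Assumption: step 0 is the first (noisiest) denoising step and
    step num_steps - 1 is the last. The paper's real schedule may differ.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if not 0 <= step < num_steps:
        raise ValueError("step must lie in [0, num_steps)")
    if num_steps == 1:
        return end
    frac = step / (num_steps - 1)
    return start + (end - start) * frac


def blend_conditioning(semantic: List[float], identity: List[float],
                       step: int, num_steps: int) -> List[float]:
    """Convex combination of two conditioning vectors at a given step.

    `semantic` stands in for aggregated MLLM features and `identity`
    for VAE-derived features; both are plain lists here for simplicity.
    """
    if len(semantic) != len(identity):
        raise ValueError("conditioning vectors must have equal length")
    w = identity_weight(step, num_steps)
    return [(1.0 - w) * s + w * i for s, i in zip(semantic, identity)]


if __name__ == "__main__":
    sem = [1.0, 2.0]
    ide = [3.0, 4.0]
    for t in range(3):
        print(t, blend_conditioning(sem, ide, t, 3))
```

In a real pipeline the two signals would more likely be injected through separate conditioning pathways than mixed as vectors. The blend here only makes the idea of a step-dependent balance concrete.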