Subject-driven Image Generation
Recent events (1)
Squeezing Capacity from MLLMs for Subject-driven Image Generation via Dual Layer Aggregation
This paper proposes conditioning diffusion models on Multimodal Large Language Models (MLLMs) that jointly encode text and reference images, augmented with VAE-based identity conditioning to address copy-paste artifacts and identity preservation failures in subject-driven image generation. A Dual Layer Aggregation (DLA) module aggregates multi-level MLLM features, and a multi-stage denoising strategy progressively balances semantic and fine-detail identity signals during inference. Experiments show improved human preference scores on subject-driven generation benchmarks compared to prior approaches that encode text and reference images separately.
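The summary does not say how the DLA module combines layers or how the multi-stage schedule is parameterised, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that multi-level aggregation can be approximated by a softmax-weighted sum over per-layer feature vectors, and that the denoising schedule shifts linearly from the semantic (MLLM) signal to the fine-detail (VAE) signal as steps progress. The function names `aggregate_layers` and `identity_weights`, the linear ramp, and its direction are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_features, layer_scores):
    """Softmax-weighted sum of per-layer feature vectors.

    ASSUMPTION: a stand-in for multi-level feature aggregation; the
    paper's actual DLA module is not specified in the summary.
    layer_features: list of equal-length feature vectors, one per layer.
    layer_scores:   one (notionally learned) scalar score per layer.
    """
    if len(layer_features) != len(layer_scores):
        raise ValueError("need exactly one score per layer")
    weights = softmax(layer_scores)
    dim = len(layer_features[0])
    return [
        sum(w * feats[i] for w, feats in zip(weights, layer_features))
        for i in range(dim)
    ]


def identity_weights(step, num_steps):
    """Return (semantic_weight, detail_weight) for one denoising step.

    ASSUMPTION: a linear ramp from semantic-only at the first step to
    detail-only at the last step; the real schedule may differ.
    """
    if not 0 <= step < num_steps:
        raise ValueError("step must lie in [0, num_steps)")
    progress = step / max(num_steps - 1, 1)
    return (1.0 - progress, progress)


if __name__ == "__main__":
    # Two layers with equal scores contribute equally to the result.
    fused = aggregate_layers([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    print(fused)
    for step in range(5):
        print(step, identity_weights(step, 5))
```

In a real pipeline the per-layer scores would be trained parameters and the two weights would scale the respective conditioning inputs to the diffusion model at each step; both are reduced here to plain Python lists and floats to keep the sketch self-contained.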