Almanac
other

Subject-driven Image Generation

other · active · provisional · subject-driven-image-generation-5e6aec23 · 1 event · first seen 22d ago

Aliases: Subject-driven Image Generation

Recent events (1)

5 · arXiv · cs.LG · 22d ago

Squeezing Capacity from MLLMs for Subject-driven Image Generation via Dual Layer Aggregation

This paper proposes conditioning diffusion models on Multimodal Large Language Models (MLLMs) that jointly encode text and reference images. VAE-based identity conditioning is added to address copy-paste artifacts and identity-preservation failures in subject-driven image generation. A Dual Layer Aggregation (DLA) module aggregates multi-level MLLM features, and a multi-stage denoising strategy progressively balances semantic and fine-detail identity signals during inference. Experiments show improved human preference scores on subject-driven generation benchmarks compared with prior approaches that encode text and reference images separately.
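
To make the two ideas in the summary concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The summary does not specify how DLA combines layers or how the denoising stages are scheduled, so everything below is an assumption made for illustration: the function names (`aggregate_layers`, `identity_weight`, `blend_conditioning`), the softmax-weighted sum over layers, and the linear schedule that shifts from semantic features early to identity features late.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_features, layer_logits):
    """Combine feature vectors from several layers into one vector.

    ASSUMPTION: a softmax-weighted sum stands in for the paper's
    aggregation module, whose actual design is not given in the summary.
    """
    if len(layer_features) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    dim = len(layer_features[0])
    return [
        sum(w * feats[i] for w, feats in zip(weights, layer_features))
        for i in range(dim)
    ]


def identity_weight(step, total_steps):
    """Hypothetical linear schedule: 0.0 at the first step, 1.0 at the last."""
    if total_steps < 2:
        return 1.0
    return step / (total_steps - 1)


def blend_conditioning(semantic, identity, step, total_steps):
    """Mix semantic and identity conditioning vectors for one denoising step.

    ASSUMPTION: early steps lean on semantic features and later steps on
    fine-detail identity features; the paper may order or weight them
    differently.
    """
    w = identity_weight(step, total_steps)
    return [(1.0 - w) * s + w * d for s, d in zip(semantic, identity)]


if __name__ == "__main__":
    # Two toy "layers" with equal logits, so each gets weight 0.5.
    fused = aggregate_layers([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
    print(fused)
    # Conditioning at the midpoint of a 5-step schedule.
    print(blend_conditioning(fused, [0.0, 0.0], step=2, total_steps=5))
```

In a real system the per-layer logits would be learned and the vectors would be token-level tensors; the lists here only show the shape of the computation.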