Qwen-Image: 20B MMDiT Image Foundation Model with Native Text Rendering
Alibaba's Qwen team has released Qwen-Image, a 20B-parameter MMDiT (Multimodal Diffusion Transformer) image generation foundation model. The team claims significant advances in complex text rendering, including multi-line layouts, paragraph-level semantics, and fine-grained typographic detail across alphabetic and other scripts. The model also supports precise image editing and is available through Qwen Chat and as open weights on Hugging Face and ModelScope.
Related events (8)
Qwen-Image-Edit: Image Editing Model with Text Rendering and Dual Visual Control
Alibaba's Qwen team has released Qwen-Image-Edit, a 20B-parameter image editing model built on the Qwen-Image foundation. The model extends Qwen-Image's text rendering capabilities to editing tasks, enabling precise in-image text modification. It uses a dual-path architecture that feeds the input image into both Qwen2.5-VL, for semantic control, and a VAE encoder, for appearance control, so it can perform both semantic and appearance-level edits.
Qwen releases Qwen3.5-2B multimodal model on Hugging Face
Alibaba's Qwen team released Qwen3.5-2B, a 2-billion-parameter image-text-to-text model, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With nearly 2 million downloads, it has seen substantial community uptake.
Qwen releases Qwen3.5-0.8B multimodal model on Hugging Face
Alibaba's Qwen team released Qwen3.5-0.8B, a small-scale image-text-to-text model, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With over 2.7 million downloads and 562 likes, it has seen substantial community uptake for a sub-1B parameter multimodal model.
Qwen releases Qwen3.5-9B multimodal model on Hugging Face
Qwen has released Qwen3.5-9B, a 9-billion-parameter image-text-to-text model, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With over 9 million downloads and 1,500+ likes, it has seen substantial community uptake.
Qwen releases Qwen3.5-4B multimodal model on Hugging Face
Qwen has released Qwen3.5-4B, a 4-billion-parameter image-text-to-text model, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With over 10 million downloads and 604 likes, it has seen substantial community uptake.
Qwen releases Qwen3.5-27B multimodal model on Hugging Face
Qwen has released Qwen3.5-27B, a 27-billion-parameter image-text-to-text model, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With nearly 3 million downloads and 981 likes, it has seen substantial community uptake.
Qwen releases Qwen3.6-27B multimodal model on Hugging Face
Qwen published Qwen3.6-27B, a 27-billion-parameter image-text-to-text model, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With over 5.4 million downloads and 1,619 likes, it has seen substantial community uptake.
Qwen releases Qwen3.5-9B-Base multimodal model on Hugging Face
Qwen has released Qwen3.5-9B-Base, a 9-billion-parameter image-text-to-text base model, on Hugging Face. The model supports conversational use and is compatible with the transformers library and inference endpoints. With over 153,000 downloads, it has seen substantial early adoption.


