Qwen Research (via RSSHub) · 1mo ago

Qwen-Image-Edit: Image Editing Model with Text Rendering and Dual Visual Control

Alibaba's Qwen team has released Qwen-Image-Edit, an image editing model built on the 20B-parameter Qwen-Image foundation model. It extends Qwen-Image's text rendering capabilities to editing tasks, which allows precise modification of text inside images. The model uses a dual-path design: the input image is fed simultaneously into Qwen2.5-VL, which provides semantic control, and into a VAE encoder, which provides appearance control. This combination lets it handle both semantic edits and appearance-level edits.


Related events (8)

Qwen Research · 1mo ago

Qwen-Image: 20B MMDiT Image Foundation Model with Native Text Rendering

Alibaba's Qwen team has released Qwen-Image, a 20B-parameter MMDiT (Multimodal Diffusion Transformer) foundation model for image generation. The team claims significant advances in complex text rendering, including multi-line layouts, paragraph-level semantics, and fine-grained typographic detail, in both alphabetic and logographic scripts (such as English and Chinese). The model also supports precise image editing and is available through Qwen Chat and as open weights on Hugging Face and ModelScope.

Qwen · 15d ago

Qwen releases Qwen-Image-Bench, a multimodal judge/evaluation model

Qwen has released Qwen-Image-Bench on Hugging Face, an image-text-to-text model tagged as a judge model for evaluation and benchmarking. It supports English and Chinese and appears designed to evaluate text-to-image outputs. With 8,572 downloads and 50 likes shortly after release, it has attracted modest early interest.

Qwen · 15d ago

Qwen releases Qwen3.5-2B multimodal model on Hugging Face

Alibaba's Qwen team released Qwen3.5-2B, a 2-billion-parameter image-text-to-text model, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With nearly 2 million downloads, it has seen substantial community uptake.

Qwen · 15d ago

Qwen releases Qwen3.5-0.8B multimodal model on Hugging Face

Alibaba's Qwen team released Qwen3.5-0.8B, a small-scale image-text-to-text model, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With over 2.7 million downloads and 562 likes, it has seen substantial community uptake for a sub-1B-parameter multimodal model.

Qwen Research · 1mo ago

Qwen VLo: Unified Multimodal Understanding and Generation Model

Alibaba's Qwen team has announced Qwen VLo, a model that unifies multimodal understanding and image generation in a single architecture. Building on the Qwen2.5-VL lineage, it is positioned to both comprehend and generate high-quality visual content. This is a step toward unified perception-and-creation models, a direction several frontier labs are also pursuing.

Qwen · 15d ago

Qwen releases Qwen3.5-35B-A3B multimodal MoE model on Hugging Face

Qwen has published Qwen3.5-35B-A3B on Hugging Face: a 35B-parameter mixture-of-experts (MoE) image-text-to-text model with approximately 3B active parameters. The model supports conversational use and is compatible with Azure deployment endpoints. With over 2.8 million downloads and 1,400+ likes, it has seen substantial community uptake.

Qwen Research · 1mo ago

Introducing Qwen-VL-Plus and Qwen-VL-Max: Upgraded Multimodal Models from Alibaba

Alibaba's Qwen team has launched two upgraded versions of its multimodal model, Qwen-VL-Plus and Qwen-VL-Max, building on the open-source Qwen-VL released in September 2023. Key improvements include substantially stronger image reasoning, better recognition of fine detail and extraction of text from images, and support for high-definition images of more than one million pixels at various aspect ratios. Together, the upgrades improve the generalization and visual understanding of the Qwen-VL series.

Qwen · 15d ago

Qwen releases Qwen3.6-35B-A3B multimodal MoE model on Hugging Face

Qwen published Qwen3.6-35B-A3B, a 35B-parameter mixture-of-experts image-text-to-text model with 3B active parameters, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With over 5.9 million downloads and 2,000 likes, it has seen substantial community uptake.