Entity · technique

Q-Former

technique · active · q-former-ed9e31d2 · 2 events · first seen May 19, 2026

Aliases: Q-Former, ><former

Co-occurring entities

Variable-Width Transformers · Salesforce Research · Hugging Face Transformers · Hugging Face · BLIP-2

More like this (12)

OneFormer · QwQ-32B · Qwen3-4B · Qwen3 · Q8-Chat · Qwen-VL · QIMMA · QVQ-Max · QwQ-32B-Preview · Qwen 3.5 · MedQADE · Qwen

Recent events (2)

5 · arXiv · cs.CL · Jun 17, 2026

Variable-Width Transformers: X-shaped architecture outperforms uniform-width baselines with 22% fewer FLOPs

Researchers propose the ><former (X-shaped transformer), a decoder-only architecture that uses wider early and late layers with narrower middle layers, implemented via a parameter-free residual resizing mechanism. Evaluated on dense models from 200M to 2B parameters and in a 3B MoE setting, the architecture consistently outperforms parameter-matched uniform-width baselines on language modeling loss. According to fitted scaling curves, the design yields a 22% reduction in FLOPs and a 15% reduction in KV cache memory, suggesting that nonuniform width allocation is a viable path to more compute-efficient language models.
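As a rough illustration of the shape described above, the sketch below builds a per-layer width profile that is widest at both ends and narrowest in the middle, and resizes a residual vector between widths without any learned weights. The summary does not state the actual schedule or resizing rule, so the linear taper, the truncate/zero-pad rule, and the names `x_shaped_widths` and `resize_residual` are all assumptions made for illustration, not the authors' method.

```python
def x_shaped_widths(num_layers, wide, narrow):
    """Hypothetical schedule: widest at both ends, narrowest in the middle."""
    if num_layers < 2:
        return [wide] * num_layers
    mid = (num_layers - 1) / 2
    widths = []
    for i in range(num_layers):
        # frac is 1.0 at the first and last layer and 0.0 at the exact middle
        frac = abs(i - mid) / mid
        widths.append(round(narrow + frac * (wide - narrow)))
    return widths


def resize_residual(x, new_width):
    """Assumed parameter-free resize: truncate to narrow, zero-pad to widen."""
    if new_width <= len(x):
        return x[:new_width]
    return x + [0.0] * (new_width - len(x))


# Width profile for a toy 5-layer stack (sizes are illustrative only)
widths = x_shaped_widths(num_layers=5, wide=1024, narrow=512)

# Carry a residual vector through the stack, resizing at each layer boundary
residual = [1.0] * widths[0]
for width in widths[1:]:
    residual = resize_residual(residual, width)
```

Under these assumptions the resize adds no parameters, so any savings would come from the narrower middle layers themselves.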

Frontier Model Releases · Inference Economics · Q-Former · Variable-Width Transformers

5 · Hugging Face Blog · May 19, 2026

Zero-shot image-to-text generation with BLIP-2

Hugging Face published a blog post introducing BLIP-2, a multimodal model that enables zero-shot image-to-text generation by bridging frozen image encoders and large language models via a lightweight Querying Transformer (Q-Former). The post covers the model's architecture, capabilities, and how to use it via the Hugging Face Transformers library. BLIP-2 achieves strong performance on visual question answering and image captioning tasks without task-specific fine-tuning.
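The bridging idea can be sketched as a small, fixed set of learnable query vectors that cross-attend to the frozen image encoder's features, so the number of output tokens does not depend on how many image patches there are. The example below is a minimal single-head sketch in plain Python: it omits learned projections, self-attention, and feed-forward layers, uses random values in place of trained weights, and every name and size in it is hypothetical. It is not the BLIP-2 implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, image_feats):
    """Each query returns a softmax-weighted average of the image features."""
    dim = len(queries[0])
    out = []
    for q in queries:
        # Scaled dot-product score between this query and every image patch
        scores = [
            sum(a * b for a, b in zip(q, f)) / math.sqrt(dim)
            for f in image_feats
        ]
        weights = softmax(scores)
        out.append([
            sum(w * f[d] for w, f in zip(weights, image_feats))
            for d in range(dim)
        ])
    return out


rng = random.Random(0)
dim, num_queries, num_patches = 8, 4, 50  # toy sizes, chosen for illustration

# Stand-ins for the learned queries and the frozen image encoder's output
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]
image_feats = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]

# One output vector per query, regardless of the number of patches
tokens = cross_attend(queries, image_feats)
```

In a full model, these query outputs would then be projected into the language model's input embedding space and passed to the frozen language model as a visual prefix.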

Open Weights Progress · Agent and Tool Ecosystem · Q-Former · Salesforce Research · Hugging Face Transformers