model

BLIP-2

model · active · blip-2-4d17f3b4 · 1 event · first seen 28d ago

Aliases: BLIP-2

Co-occurring entities

Q-Former · Salesforce Research · Hugging Face Transformers · Hugging Face

More like this (12)

SigLIP 2 · SigLIP2 · SmolLM2 · SmolVLM2 · CLIP · Phi-2 · E2B · LM1B · FLUX-2 · LamPO · CLIPSeg · cuBLAS

Recent events (1)

5 · Hugging Face Blog · 28d ago

Zero-shot image-to-text generation with BLIP-2

Hugging Face published a blog post introducing BLIP-2, a multimodal model from Salesforce Research that enables zero-shot image-to-text generation. It bridges a frozen image encoder and a frozen large language model with a lightweight Querying Transformer (Q-Former), so only the Q-Former needs training. The post covers the model's architecture and capabilities and shows how to use it through the Hugging Face Transformers library. BLIP-2 performs strongly on visual question answering and image captioning without task-specific fine-tuning.

Open Weights Progress · Agent and Tool Ecosystem · Q-Former · Salesforce Research · Hugging Face Transformers · +3 more
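
The summary above mentions using BLIP-2 through the Hugging Face Transformers library. A minimal sketch of what that might look like follows. It assumes the `Salesforce/blip2-opt-2.7b` checkpoint (a public checkpoint name that does not appear in the summary), and the helper names `build_vqa_prompt` and `generate_text` are hypothetical, introduced only for illustration. The "Question: ... Answer:" prompt style is a common convention for BLIP-2 question answering, not something the summary specifies.

```python
from typing import Optional

# Assumption: a public BLIP-2 checkpoint on the Hugging Face Hub;
# the summary above does not name a specific checkpoint.
CHECKPOINT = "Salesforce/blip2-opt-2.7b"


def build_vqa_prompt(question: str) -> str:
    """Format a question as 'Question: ... Answer:' (hypothetical helper)."""
    return f"Question: {question.strip()} Answer:"


def generate_text(image_path: str, question: Optional[str] = None,
                  max_new_tokens: int = 20) -> str:
    """Caption an image (question=None) or answer a question about it."""
    # Heavy dependencies are imported lazily so the prompt helper
    # can be used without torch or transformers installed.
    import torch
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Half precision only on GPU; fall back to float32 on CPU.
    dtype = torch.float16 if device == "cuda" else torch.float32

    processor = Blip2Processor.from_pretrained(CHECKPOINT)
    model = Blip2ForConditionalGeneration.from_pretrained(
        CHECKPOINT, torch_dtype=dtype
    )
    model.to(device)

    image = Image.open(image_path).convert("RGB")
    # No text prompt means plain captioning; a prompt steers the answer.
    prompt = build_vqa_prompt(question) if question else None
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(
        device, dtype
    )

    generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(
        generated_ids, skip_special_tokens=True
    )[0].strip()
```

Calling `generate_text("photo.jpg")` would request a caption, while `generate_text("photo.jpg", "How many cats are there?")` would request an answer to the question; `photo.jpg` is a placeholder path. No task-specific fine-tuning is involved in either call, which is the zero-shot usage the summary describes.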