Contrastive Language-Image Pretraining (CLIP)
Aliases: Contrastive Language-Image Pretraining (CLIP), Contrastive Language-Image Pre-training
Recent events (2)
CLIP: Connecting Text and Images
OpenAI introduced CLIP (Contrastive Language-Image Pre-training), a neural network that learns visual concepts from natural language supervision. CLIP performs zero-shot visual classification: the candidate categories are supplied as natural language descriptions at inference time, so no task-specific training data is required. The approach mirrors the zero-shot transfer capabilities that GPT-2 and GPT-3 demonstrated in the language domain.
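The zero-shot classification step described above can be sketched in a few lines. This is a minimal illustration, not OpenAI's implementation: it assumes the image and the category descriptions have already been encoded into a shared embedding space, and it uses small hand-made vectors in place of real encoder outputs. The function name `zero_shot_classify`, the toy embeddings, and the `logit_scale` value of 100 are all assumptions made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings, labels, logit_scale=100.0):
    """Pick the label whose text embedding is most similar to the image embedding.

    logit_scale is an assumed temperature factor for this sketch; a trained
    model would supply its own learned value.
    """
    image = l2_normalize(image_embedding)
    similarities = []
    for text_embedding in text_embeddings:
        text = l2_normalize(text_embedding)
        similarities.append(sum(a * b for a, b in zip(image, text)))
    probabilities = softmax([logit_scale * s for s in similarities])
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], dict(zip(labels, probabilities))


# Hypothetical category descriptions and toy embeddings (not real encoder outputs).
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
toy_text_embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
]
toy_image_embedding = [0.8, 0.3, 0.1]

predicted, probabilities = zero_shot_classify(
    toy_image_embedding, toy_text_embeddings, labels
)
print(predicted)
```

Changing the set of categories only requires changing the list of descriptions and their embeddings; nothing is retrained, which is what makes the classification zero-shot.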
A Dive into Vision-Language Models
This Hugging Face blog post provides a technical overview of vision-language model (VLM) pretraining approaches, covering architectures and training strategies used to align visual and textual representations. It surveys key models and techniques in the multimodal learning space as of early 2023. The post serves as an educational reference for practitioners working with or building VLMs.