Entity · technique

Contrastive Language-Image Pretraining (CLIP)

technique · active · contrastive-language-image-pretraining-clip--186010c3 · 2 events · first seen May 19, 2026

Aliases: Contrastive Language-Image Pretraining (CLIP), Contrastive Language-Image Pre-training

Co-occurring entities

GPT-3 GPT-2 OpenAI CLIP Vision-Language Models Hugging Face

More like this (12)

Contrastive Pre-training contrastive vision-language pretraining CLIP contrastive learning Scalable Visual Pretraining for Language Intelligence Chinese CLIP Contrastive Search unCLIP CLIPSeg instruction-based multitask pretraining Unsupervised Pre-training PixelCNN

Recent events (2)

9 · OpenAI Blog · May 20, 2026

CLIP: Connecting Text and Images

OpenAI introduced CLIP (Contrastive Language-Image Pre-training), a neural network that learns visual concepts from natural language supervision. CLIP enables zero-shot visual classification by accepting natural language descriptions of categories rather than requiring task-specific training data. The approach mirrors the zero-shot transfer capabilities demonstrated by GPT-2 and GPT-3 in the language domain.

Frontier Model Releases · Evaluation and Benchmarking · GPT-3 · GPT-2 · Contrastive Language-Image Pretraining (CLIP) · +3 more
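
The zero-shot classification described in the CLIP summary above can be illustrated with a minimal sketch. It assumes the image and each natural-language category description have already been encoded into a shared embedding space. The vectors here are hand-made toy values, not outputs of any real model, and the names `zero_shot_classify` and `temperature` are hypothetical. The scale of 100 is an assumption chosen for illustration.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings, labels, temperature=100.0):
    """Pick the label whose text embedding is most similar to the image embedding.

    `temperature` is an assumed scaling factor applied to the cosine
    similarities before the softmax.
    """
    image = l2_normalize(image_embedding)
    similarities = []
    for text_embedding in text_embeddings:
        text = l2_normalize(text_embedding)
        similarities.append(sum(a * b for a, b in zip(image, text)))
    probabilities = softmax([temperature * s for s in similarities])
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], probabilities


# Toy stand-ins for encoder outputs (assumed values, for illustration only).
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text_embeddings = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_embedding = [0.9, 0.2, 0.1]

prediction, probabilities = zero_shot_classify(image_embedding, text_embeddings, labels)
print(prediction)
```

No classifier is trained for the three categories: changing the task only requires supplying different text descriptions, which is the property the summary attributes to CLIP.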

4 · Hugging Face Blog · May 19, 2026

A Dive into Vision-Language Models

This Hugging Face blog post provides a technical overview of vision-language model (VLM) pretraining approaches, covering architectures and training strategies used to align visual and textual representations. It surveys key models and techniques in the multimodal learning space as of early 2023. The post serves as an educational reference for practitioners working with or building VLMs.

Multimodal Progress · Contrastive Language-Image Pretraining (CLIP) · Vision-Language Models · Hugging Face
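
One common way to align visual and textual representations is the contrastive objective that CLIP is named for. The sketch below is an illustration under stated assumptions, not the implementation from either source: it computes a symmetric cross-entropy loss over a batch of paired embeddings, where pair k is the only correct match for item k. The function name `contrastive_loss`, the fixed `temperature` of 0.07, and the toy vectors are all assumptions. In practice the temperature is typically a learned parameter.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum for x in row]


def contrastive_loss(image_embeddings, text_embeddings, temperature=0.07):
    """Symmetric contrastive loss; row k of each input is assumed to be a matching pair."""
    images = [l2_normalize(v) for v in image_embeddings]
    texts = [l2_normalize(v) for v in text_embeddings]
    n = len(images)

    # logits[i][j] = scaled cosine similarity between image i and text j.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]

    # Image-to-text direction: each image should select its own text (row-wise).
    loss_image_to_text = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: each text should select its own image (column-wise).
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_text_to_image = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return (loss_image_to_text + loss_text_to_image) / 2


# Toy batches (assumed values): matched pairs versus deliberately swapped pairs.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
misaligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < misaligned)
```

The loss is small when each image is most similar to its own caption and large when the pairs are swapped, which is the signal that pulls matching image and text embeddings together during pretraining.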