OpenAI Blog · 1mo ago

Text and Code Embeddings by Contrastive Pre-training

OpenAI published research on generating text and code embeddings with contrastive pre-training. The approach trains models to produce dense vector representations for semantic search, classification, and code retrieval by pulling paired inputs together in embedding space and pushing apart the unpaired inputs that share a training batch. This work underpins OpenAI's embeddings API offerings and is an early public account of its embedding methodology.
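A contrastive objective with in-batch negatives can be sketched in pure Python. This is a minimal sketch, not OpenAI's training code: the symmetric (row plus column) averaging, the temperature of 0.05, and the toy 2-d vectors are illustrative assumptions, and a real setup would produce the vectors with a trained encoder over very large batches.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-softmax probability of `target`, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def in_batch_contrastive_loss(xs: List[Vector], ys: List[Vector],
                              temperature: float = 0.05) -> float:
    """Symmetric InfoNCE loss: (xs[i], ys[i]) is the positive pair and every
    other item on the opposite side of the batch serves as a negative."""
    xs = [l2_normalize(x) for x in xs]
    ys = [l2_normalize(y) for y in ys]
    n = len(xs)
    # logits[i][j] = cos(x_i, y_j) / temperature
    logits = [[sum(a * b for a, b in zip(xs[i], ys[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Rows: each x_i must pick out its paired y_i among all ys in the batch.
    row_loss = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Columns: each y_j must pick out its paired x_j among all xs in the batch.
    col_loss = sum(cross_entropy([logits[i][j] for i in range(n)], j)
                   for j in range(n)) / n
    return (row_loss + col_loss) / 2


# Hypothetical encoder outputs: correctly paired batches should score a
# lower loss than the same batch with the pairs swapped.
queries = [[1.0, 0.1], [0.1, 1.0]]
docs = [[0.9, 0.2], [0.2, 0.9]]
aligned = in_batch_contrastive_loss(queries, docs)
swapped = in_batch_contrastive_loss(queries, docs[::-1])
print(aligned < swapped)  # True
```

The loss only rewards relative ranking within the batch, which is why larger batches (more negatives per positive) tend to make the task harder and the embeddings more discriminative.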

Inference Economics · Enterprise Deployment Patterns · Contrastive Pre-training · OpenAI Embeddings API · text-embedding-ada-002 · OpenAI

Related guides (3)

OpenAI

OpenAI: The Lab That Made AI a Household Word

Enterprise Deployment Patterns · Topic guide

Enterprise Deployment Patterns: From AI Demo to Production Reality

Inference Economics · Topic guide

Inference Economics: The Cost of Running AI in Production

Related events (8)

OpenAI Blog · 1mo ago

Introducing text and code embeddings

OpenAI launched a new embeddings endpoint in its API for natural language and code tasks such as semantic search, clustering, topic modeling, and classification. The endpoint returns vector representations of text and code in which semantically similar inputs lie close together, making it easier for developers to build applications that require semantic understanding. This was a significant early step in expanding OpenAI's API beyond text generation.
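Semantic search over such vectors reduces to ranking stored embeddings by cosine similarity to a query embedding. A minimal sketch, assuming the embeddings have already been fetched from the endpoint; the 3-dimensional vectors and file names are hypothetical stand-ins for real output, which has far more dimensions.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_search(query: List[float], corpus: Dict[str, List[float]],
                    k: int = 3) -> List[Tuple[str, float]]:
    """Return the k corpus entries most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical vectors standing in for embeddings returned by the endpoint.
corpus = {
    "reset-password.md": [0.9, 0.1, 0.0],
    "billing-faq.md": [0.1, 0.9, 0.1],
    "sort_list.py": [0.0, 0.2, 0.95],
}
query = [0.85, 0.2, 0.05]  # e.g. the embedding of "how do I change my login?"
print(semantic_search(query, corpus, k=1)[0][0])  # reset-password.md
```

The same ranking function covers classification (compare against one vector per label) and clustering (group by pairwise similarity); only the set of vectors being compared changes.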

Enterprise Deployment Patterns · Agent and Tool Ecosystem · OpenAI Embeddings API · OpenAI API · OpenAI

OpenAI Blog · 1mo ago

CLIP: Connecting Text and Images

OpenAI introduced CLIP (Contrastive Language-Image Pre-training), a neural network that learns visual concepts from natural language supervision. CLIP enables zero-shot image classification: instead of training on task-specific labeled data, it embeds natural language descriptions of the candidate categories and picks the one closest to the image's embedding. The approach mirrors the zero-shot transfer capabilities that GPT-2 and GPT-3 demonstrated in the language domain.
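Zero-shot classification of this kind can be sketched as a softmax over scaled cosine similarities between one image embedding and one text embedding per category (the CLIP paper uses prompt templates such as "a photo of a {label}."). This is a minimal sketch: the vectors are hypothetical stand-ins for CLIP encoder outputs, and the logit_scale of 100 is an illustrative value for the temperature that CLIP learns during training.

```python
import math
from typing import Dict, List


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_classify(image_emb: List[float], text_embs: Dict[str, List[float]],
                       logit_scale: float = 100.0) -> Dict[str, float]:
    """Softmax over scaled cosine similarities between an image and each label prompt."""
    img = l2_normalize(image_emb)
    labels = list(text_embs)
    logits = [logit_scale * sum(a * b for a, b in zip(img, l2_normalize(text_embs[lab])))
              for lab in labels]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return {lab: e / total for lab, e in zip(labels, exps)}


# Hypothetical text-encoder outputs for "a photo of a dog", "a photo of a cat",
# "a photo of a car", and a hypothetical image-encoder output.
text_embs = {"dog": [0.1, 0.95, 0.0], "cat": [0.9, 0.1, 0.2], "car": [0.1, 0.1, 0.95]}
image_emb = [0.2, 0.9, 0.1]
probs = zero_shot_classify(image_emb, text_embs)
print(max(probs, key=probs.get))  # dog
```

Adding a new category only requires embedding one more prompt; no classifier weights are retrained.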

Frontier Model Releases · Evaluation and Benchmarking · GPT-3 · GPT-2 · Contrastive Language-Image Pretraining (CLIP)

Mistral AI News · 19d ago

Mistral AI Releases Codestral Embed: First Code-Specialized Embedding Model

Mistral AI has launched Codestral Embed (codestral-embed-2505), its first embedding model specialized for code retrieval and semantic understanding. Mistral claims the model outperforms leading competitors, including Voyage Code 3, Cohere Embed v4.0, and OpenAI's large embedding model, on benchmarks including SWE-bench, CodeSearchNet, and Text2SQL tasks. It supports variable output dimensions and precisions (including int8) for trading storage against retrieval quality, and is priced at $0.15 per million tokens via Mistral's API, with batch discounts available.
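The int8 option trades numeric precision for storage. A minimal sketch of symmetric per-vector int8 scalar quantization, assuming a scheme of this general kind; the summary does not say how Codestral Embed quantizes, and the vector, index size, and dimension counts below are hypothetical.

```python
import math
from typing import List, Tuple


def quantize_int8(v: List[float]) -> Tuple[List[int], float]:
    """Map floats onto integers in [-127, 127] using one scale per vector."""
    peak = max(abs(x) for x in v)
    if peak == 0.0:
        return [0] * len(v), 1.0
    scale = peak / 127.0
    return [max(-127, min(127, round(x / scale))) for x in v], scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Approximately recover the original floats."""
    return [x * scale for x in q]


def storage_bytes(num_vectors: int, dims: int, bytes_per_value: int) -> int:
    """Raw storage for the vectors alone, ignoring index overhead."""
    return num_vectors * dims * bytes_per_value


vec = [0.12, -0.48, 0.33, 0.05, -0.21, 0.77, -0.02, 0.09]  # hypothetical embedding
q, scale = quantize_int8(vec)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(vec, restored))
print(q, round(max_err, 4))

# Hypothetical index of 1M vectors: float32 at 1024 dims vs int8 at 256 dims.
print(storage_bytes(1_000_000, 1024, 4))  # 4096000000
print(storage_bytes(1_000_000, 256, 1))   # 256000000
```

Each reconstructed value is off by at most half a quantization step (scale / 2), so the question for a given retrieval workload is whether that error shifts rankings enough to matter against a 4x (or, combined with fewer dimensions, 16x) storage saving.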

Frontier Model Releases · Evaluation and Benchmarking · Mistral AI · Codestral Embed · CodeSearchNet

OpenAI Blog · 1mo ago

New embedding models and API updates from OpenAI

OpenAI announced two new embedding models, text-embedding-3-small and text-embedding-3-large, alongside other API updates. OpenAI reported better benchmark performance than text-embedding-ada-002, a lower price for text-embedding-3-small, and a dimensions parameter for requesting shorter embeddings. The release is part of OpenAI's ongoing effort to maintain and grow its API platform for enterprise and developer use cases.
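For the text-embedding-3 models, OpenAI's embeddings guide describes shortening a vector after the fact by keeping its leading dimensions and re-normalizing to unit length, with the API's dimensions parameter as the preferred way to request shorter embeddings directly. A minimal sketch of the manual version, assuming that description; the 8-value vector is a hypothetical stand-in for a real embedding.

```python
import math
from typing import List


def shorten_embedding(v: List[float], dims: int) -> List[float]:
    """Keep the first `dims` values, then rescale to unit L2 norm so cosine
    similarity can again be computed as a plain dot product."""
    if not 0 < dims <= len(v):
        raise ValueError("dims must be between 1 and len(v)")
    head = v[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("leading dimensions are all zero")
    return [x / norm for x in head]


full = [0.3, 0.4, 0.2, 0.1, 0.5, 0.3, 0.2, 0.6]  # hypothetical embedding
short = shorten_embedding(full, 2)
print([round(x, 6) for x in short])  # [0.6, 0.8]
```

Skipping the re-normalization step would leave shortened vectors with norms below 1, so dot-product scores would no longer be comparable to cosine similarities.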

Inference Economics · Enterprise Deployment Patterns · OpenAI Embeddings API · OpenAI

Hugging Face Blog · 1mo ago

Generating Human-level Text with Contrastive Search in Transformers

Hugging Face introduces contrastive search, a decoding strategy for autoregressive language models that aims to produce more coherent, human-like text than standard methods such as beam search or nucleus sampling. At each step the method considers the top-k candidate tokens and scores each one by its model probability minus a degeneration penalty, weighted by a parameter α: the penalty is the maximum cosine similarity between the candidate's hidden representation and those of the preceding tokens, so candidates that would repeat earlier context are discouraged. The blog post describes the integration of contrastive search into the Hugging Face Transformers library, where it is enabled through the penalty_alpha and top_k arguments of generate().
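The selection rule for a single decoding step can be sketched without the library. A minimal sketch, assuming the model has already produced the top-k candidate tokens with their probabilities and hidden states; the tokens, probabilities, and 2-d hidden vectors are hypothetical, and alpha plays the role of Transformers' penalty_alpha.

```python
import math
from typing import List, Sequence, Tuple

Candidate = Tuple[str, float, List[float]]  # (token, probability, hidden state)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two hidden-state vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def contrastive_search_step(candidates: List[Candidate],
                            context_hidden: List[List[float]],
                            alpha: float = 0.6) -> str:
    """Pick the candidate maximizing
    (1 - alpha) * p(token) - alpha * max_j cos(h_token, h_j)."""
    def score(c: Candidate) -> float:
        _, prob, hidden = c
        degeneration = max(cosine(hidden, h) for h in context_hidden)
        return (1 - alpha) * prob - alpha * degeneration
    return max(candidates, key=score)[0]


context = [[1.0, 0.0], [0.9, 0.1]]   # hidden states of the tokens generated so far
top_k = [("the", 0.5, [1.0, 0.05]),  # likely, but nearly duplicates the context
         ("cat", 0.3, [0.1, 1.0])]   # less likely, but novel
print(contrastive_search_step(top_k, context))             # cat
print(contrastive_search_step(top_k, context, alpha=0.0))  # the (greedy)
```

Setting alpha to 0 removes the penalty and recovers greedy decoding, which makes the role of the degeneration term easy to see in isolation.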

Frontier Model Releases · Agent and Tool Ecosystem · Contrastive Search · Hugging Face Transformers · Hugging Face

OpenAI Blog · 1mo ago

OpenAI Releases New and Improved Embedding Model

OpenAI announced a new embedding model, text-embedding-ada-002, described as significantly more capable, cost-effective, and simpler to use than its prior offerings. The announcement was published in December 2022 and updates OpenAI's text embedding API surface. The available body text provides no specific benchmark numbers or architectural details.

Inference Economics · Enterprise Deployment Patterns · text-embedding-ada-002 · OpenAI

Hugging Face Blog · 1mo ago

Train a Sentence Embedding Model with 1B Training Pairs

This Hugging Face blog post describes a methodology for training sentence embedding models using approximately 1 billion training pairs. The post covers data curation, model architecture choices, and training strategies for large-scale contrastive learning of sentence representations. It serves as a practical guide for practitioners building semantic search and similarity systems.
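One architecture choice in such models is how to collapse per-token encoder outputs into a single sentence vector. A minimal sketch of attention-masked mean pooling, a common choice in sentence-transformers models; whether it matches the pooling used in the post is an assumption here, and the token vectors are hypothetical.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask is 1, skipping padding tokens."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                total[i] += x
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # last row is a padding token
print(mean_pool(tokens, [1, 1, 0]))  # [2.0, 3.0]
```

Masking matters because batched inputs are padded to a common length; averaging the padding vectors in would make a sentence's embedding depend on the other sentences in its batch.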

Agent and Tool Ecosystem · contrastive learning · sentence embeddings · Hugging Face

Qwen Research · 1mo ago

Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese

Alibaba's Qwen team released Chinese CLIP, a vision-language model pretrained contrastively for Chinese multimodal representation learning. The project addresses the shortage of open-source CLIP-style models for Chinese, particularly for cross-modal retrieval. It follows the CLIP framework but adapts it to the Chinese language and cultural context.

Open Weights Progress · Multimodal Progress · contrastive vision-language pretraining · Chinese CLIP · CLIP