5 · arXiv cs.LG (Machine Learning) · 16h ago

Theoretical framework explains why contrastive embedding norms encode semantic specificity

A new arXiv preprint gives a formal explanation for why embedding magnitudes in contrastive models trained with scale-invariant losses correlate with semantic properties such as concept specificity, token frequency, and human uncertainty, even though cosine similarity discards norms entirely. The authors derive an analytic formula showing that embedding length encodes this information as a byproduct of optimization dynamics. The work suggests these norms can serve as 'free' calibration signals in retrieval tasks, putting a previously heuristic observation on theoretical footing.
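To make the scale-invariance point concrete, here is a minimal sketch with hypothetical toy vectors (not data or code from the preprint, and not the authors' analytic formula). It only illustrates that cosine similarity is unchanged when an embedding is rescaled, so the norm is left over as a separate quantity that a retrieval system could inspect.

```python
import math

def norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

# Hypothetical toy embeddings, chosen only for illustration.
query = [1.0, 2.0, 2.0]
doc = [2.0, 1.0, 2.0]
doc_scaled = [3.0 * x for x in doc]  # same direction, three times the length

# Rescaling the document embedding leaves the cosine score unchanged...
sim = cosine(query, doc)
sim_scaled = cosine(query, doc_scaled)

# ...while the norm changes, so it carries information the score ignores.
norm_doc = norm(doc)
norm_scaled = norm(doc_scaled)

print(sim, sim_scaled)
print(norm_doc, norm_scaled)
```

Whether a larger or smaller norm should be read as more or less specific depends on the model and the paper's derivation; the sketch takes no position on that.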

Evaluation and Benchmarking Optimization Dynamics Imprint Semantic Specificity in Contrastive Embedding Norms

Related guides (1)

Evaluation and Benchmarking Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

5 · arXiv cs.CL · 1mo ago

Conditional Scale Entropy: A Wavelet-Derived Tool for Mechanistic Interpretability of Metaphor Processing in Transformers

This paper introduces Conditional Scale Entropy (CSE), a wavelet-derived measure of how transformer computation engages across frequency scales at each layer, and applies it to study metaphor processing in decoder-only language models. The authors prove CSE is invariant to update magnitude, isolating structural computation patterns from intensity. Across architectures ranging from GPT-2 (124M) to LLaMA-2 7B and GPT-oss 20B, metaphorical tokens consistently produce higher spectral breadth than literal tokens in early-to-mid layers, with the effect surviving permutation correction and specificity controls. The work establishes multi-scale coordination as a consistent mechanistic signature of metaphorical language processing and positions CSE as a general interpretability tool for cross-depth structure in transformers.

Evaluation and Benchmarking AI Safety Research Conditional Scale Entropy mechanistic interpretability GPT-2 +3 more

5 · arXiv cs.CL · 21d ago

BODHI: Contrastive embedding training for causal discovery in Large Behavioural Models

Researchers identify a critical failure mode in biomedical language model embeddings: off-the-shelf encoders (BioBERT, PubMedBERT, BioM-ELECTRA) assign high cosine similarity (0.76–0.92) to causally unrelated cross-domain pairs and score 0% accuracy on cross-domain discrimination. The paper introduces BODHI, a contrastive training approach using hard negatives mined from a biomedical knowledge graph, which improves within-vs-across-domain separation from 1.05x to 2.30x and raises the discrimination gap by 0.392. The work targets Large Behavioural Models (LBMs), foundation models that reason over personal life graphs, where false embedding proximity directly produces false causal edges. Additional contributions include an OpenVINO inference optimization that cuts latency 133x (from 1367 ms to 10 ms) on Intel AMX hardware, plus a counterintuitive finding that FP16 outperforms INT8 on this silicon.

Evaluation and Benchmarking Inference Economics BIOSSES BioBERT PubMedBERT +4 more

7 · arXiv cs.AI · 1mo ago

The Matching Principle: A Geometric Theory Unifying Robustness, Domain Adaptation, and Alignment via Nuisance Covariance

This paper proposes the 'matching principle': a unified geometric framework arguing that robustness methods (CORAL, IRM, adversarial training, augmentation, metric learning, Jacobian penalties, alignment constraints) are all estimators of the same object—the covariance of label-preserving deployment nuisance—and that regularizing the encoder Jacobian along this covariance's range is the core statistical problem. The authors prove closed-form optimality results in a linear-Gaussian model, introduce the Trajectory Deviation Index (TDI) as a label-free embedding sensitivity probe, and validate predictions across 13 pre-registered experimental blocks including Qwen2.5-7B. At 7B scale, matched style-PMH improves selective honesty while standard DPO degrades Style TDI, connecting the theory to alignment safety.

Evaluation and Benchmarking AI Safety Research Invariant Risk Minimization Matching Principle Qwen2.5-7B +5 more

4 · arXiv cs.CL · 14d ago

Transformer embeddings shown to intrinsically encode Russell's circumplex model of emotion geometry

A new arXiv paper investigates whether Transformer-based text and speech encoders (RoBERTa, wav2vec 2.0) recover the geometric structure of Russell's circumplex model of affect — a valence-arousal topology from psychology. Experiments on naturalistic datasets (MSP-Podcast) and LLM-generated stimuli show that multimodal fusion achieves perfect topological alignment with Russell's primary emotion ordering, and zero-shot generic text embeddings place fine-grained emotion terms near their human-mapped coordinates. The authors argue this structure is intrinsically encoded in the representations rather than being an artifact of labeling, bridging psychological theory and representation learning.

Evaluation and Benchmarking Multimodal Progress Data-Driven Decoding of Russell's Circumplex Model of Affect RoBERTa MSP-Podcast +1 more

4 · arXiv cs.CL · 1mo ago

Chinese Sensorimotor and Embodiment Norms for 3,000 Lexicalized Concepts

Researchers present a large-scale normative database of sensorimotor and embodiment ratings for 3,000 Mandarin Chinese concepts, collected from 378 native speakers across 11 sensorimotor dimensions. A validation study identifies PSE-Sensorimotor and Minkowski-3 as the strongest composite predictors of lexical decision performance. An exploratory analysis finds that sensorimotor ratings are substantially recoverable from purely linguistic (distributional) representations via simple regression (mean Spearman r = .62), with visual and auditory dimensions recovering better than chemosensory ones. The work provides both a cognitive science resource and empirical evidence bearing on whether LLMs can acquire embodied conceptual knowledge from text alone.

Evaluation and Benchmarking Multimodal Progress Perceptual Strength of Embodiment (PSE) Representational Similarity Analysis Huang et al. 2025 +3 more

3 · arXiv cs.CL · 22d ago

Comparative study of semantic geometry in transformer embeddings vs. graph-based lexical models

A preprint from arXiv compares the geometric and topological properties of transformer-based vector embeddings (CamemBERT) against lexical co-occurrence graphs for representing semantic structure. Applied to a French civic debate corpus, the study finds similar local topology but divergent global structure between the two approaches. The authors argue graph-based models offer more interpretable semantic organization and suggest graphs could guide neural architectures toward more stable, interpretable convergence.

Evaluation and Benchmarking CamemBERT Geometry of Semantic Space: Comparative Study of Discrete and Continuous Models

4 · arXiv cs.CL · 7d ago

LLM embedding spaces partially recover expert-defined symptom structure in mental health language

A new arXiv preprint investigates whether LLM embedding geometry aligns with expert-defined symptom structure in mental health language, using 28 Reddit communities as a testbed. The authors compare pretrained and fine-tuned Qwen3 embeddings (0.6B and 4B) against an expert symptom matrix via representational similarity analysis, with controls for affective, stylistic, and topic confounds. Results show measurable but level-dependent alignment: fine-tuning strengthens it at fine-grained category levels, and larger scale improves both zero-shot alignment and fine-tuning gains. The paper argues that classification accuracy alone is insufficient to validate embedding geometry against domain knowledge.
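For readers unfamiliar with representational similarity analysis, the following is a rough sketch of the general technique, not the paper's pipeline. It assumes cosine distance for the embedding side and a Spearman rank correlation between the two dissimilarity matrices; the summary does not say which choices the authors made, and the function names and toy data are hypothetical.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """1 minus cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rsa_score(embeddings, expert_dissimilarity):
    """Spearman correlation between model and expert pairwise dissimilarities."""
    pairs = list(combinations(range(len(embeddings)), 2))  # upper triangle only
    model = [cosine_distance(embeddings[i], embeddings[j]) for i, j in pairs]
    expert = [expert_dissimilarity[i][j] for i, j in pairs]
    return pearson(average_ranks(model), average_ranks(expert))

# Hypothetical toy data: four communities forming two loose clusters.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_expert = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
score = rsa_score(toy_embeddings, toy_expert)
print(score)
```

A high score here only says the two geometries rank pairs similarly; the controls for affective, stylistic, and topic confounds described in the summary would need to be layered on top.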

Evaluation and Benchmarking Reddit Do LLM Embedding Spaces Recover Expert Structure? Qwen3

5 · OpenAI Blog · 1mo ago

Text and Code Embeddings by Contrastive Pre-training

OpenAI published research on generating text and code embeddings using contrastive pre-training. The approach trains models to produce dense vector representations useful for semantic search, classification, and code retrieval tasks. This work underpins OpenAI's embeddings API offerings and represents an early public articulation of their embedding methodology.

Inference Economics Enterprise Deployment Patterns Contrastive Pre-training OpenAI Embeddings API text-embedding-ada-002 +1 more