arXiv cs.CL (Computation and Language) · 4d ago

Transformer embeddings shown to intrinsically encode Russell's circumplex model of emotion geometry

A new arXiv paper investigates whether Transformer-based text and speech encoders (RoBERTa, wav2vec 2.0) recover the geometric structure of Russell's circumplex model of affect, a valence-arousal topology from psychology. Experiments on naturalistic data (MSP-Podcast) and on LLM-generated stimuli show that multimodal fusion achieves perfect topological alignment with Russell's ordering of primary emotions, and that zero-shot generic text embeddings place fine-grained emotion terms near their human-mapped coordinates. The authors argue that this structure is intrinsically encoded in the representations rather than being an artifact of labeling, bridging psychological theory and representation learning.

Evaluation and Benchmarking · Multimodal Progress · Data-Driven Decoding of Russell's Circumplex Model of Affect · RoBERTa · MSP-Podcast · wav2vec 2.0
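One way to make "topological alignment with Russell's ordering" concrete: project each emotion's embedding to 2D (valence, arousal) coordinates, sort the emotions by polar angle, and check whether the resulting cycle matches a reference cycle, allowing rotation and reflection because a learned projection can rotate or flip the axes. This is a minimal sketch, not the paper's evaluation procedure; the reference order and the coordinates are hypothetical placeholders.

```python
import math

def angular_order(coords):
    """Labels sorted by polar angle, computed as atan2(arousal, valence)."""
    return sorted(coords, key=lambda label: math.atan2(coords[label][1], coords[label][0]))

def same_cycle(order, reference):
    """True if `order` equals `reference` up to cyclic rotation or reversal."""
    if sorted(order) != sorted(reference):
        return False
    for candidate in (list(reference), list(reversed(reference))):
        for shift in range(len(candidate)):
            if candidate[shift:] + candidate[:shift] == list(order):
                return True
    return False

# Hypothetical reference cycle, counterclockwise starting from positive valence.
REFERENCE = ["happy", "excited", "angry", "sad", "calm"]

# Hypothetical (valence, arousal) coordinates, e.g. from projecting embeddings.
coords = {
    "happy": (0.8, 0.2),
    "excited": (0.6, 0.7),
    "angry": (-0.6, 0.7),
    "sad": (-0.7, -0.4),
    "calm": (0.6, -0.6),
}

print(angular_order(coords))
print(same_cycle(angular_order(coords), REFERENCE))
```

Comparing cyclic orders rather than raw coordinates keeps the check purely topological: any rotation or mirror image of the circle counts as a match, while swapping two neighbors does not.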

Related guides (2)

Multimodal Progress · Topic guide

Multimodal Progress: How AI Learned to See, Hear, and Act

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

Hugging Face Blog · 1mo ago

Multimodal Embedding & Reranker Models with Sentence Transformers

Hugging Face's Sentence Transformers library has added support for multimodal embedding and reranking models, enabling joint text-image (and potentially other-modality) representations within a unified framework. The update extends the library's existing text-focused embedding capabilities to cross-modal retrieval and reranking tasks. This lowers the barrier for practitioners building multimodal search and RAG pipelines with open-weights models.

Inference Economics · Agent and Tool Ecosystem · multimodal embedding · reranking · Hugging Face
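The retrieval-plus-reranking pattern mentioned above can be sketched as two stages: a cheap embedding similarity over the whole corpus, then a more expensive scorer applied only to the top-k candidates. In a real pipeline an embedding model and a reranker model would fill these roles; here both are replaced by hypothetical toy stand-ins (hand-written vectors and a score table), so this shows only the control flow, not the library's API.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def retrieve_then_rerank(query_emb, corpus_embs, rerank_fn, k=2):
    """Stage 1: top-k by embedding cosine. Stage 2: reorder those k by rerank_fn."""
    candidates = sorted(corpus_embs, key=lambda d: cosine(query_emb, corpus_embs[d]), reverse=True)[:k]
    return sorted(candidates, key=rerank_fn, reverse=True)

# Hypothetical toy embeddings for mixed image/text items.
corpus = {"img_cat": [0.9, 0.1], "img_dog": [0.8, 0.3], "txt_car": [0.0, 1.0]}
query = [1.0, 0.0]

# Hypothetical reranker scores standing in for a real reranker model.
RERANK_SCORES = {"img_cat": 0.2, "img_dog": 0.7, "txt_car": 0.99}

def toy_reranker(doc_id):
    return RERANK_SCORES[doc_id]

print(retrieve_then_rerank(query, corpus, toy_reranker, k=2))
```

Note that "txt_car" never reaches the reranker despite its high toy score: the first stage bounds what the second can recover, which is the usual cost/recall trade-off of this design.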

arXiv · cs.CL · 12d ago

Comparative study of semantic geometry in transformer embeddings vs. graph-based lexical models

A preprint from arXiv compares the geometric and topological properties of transformer-based vector embeddings (CamemBERT) against lexical co-occurrence graphs for representing semantic structure. Applied to a French civic debate corpus, the study finds similar local topology but divergent global structure between the two approaches. The authors argue graph-based models offer more interpretable semantic organization and suggest graphs could guide neural architectures toward more stable, interpretable convergence.

Evaluation and Benchmarking · CamemBERT · Geometry of Semantic Space: Comparative Study of Discrete and Continuous Models
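A simple proxy for comparing "local topology" across the two representations is k-nearest-neighbor overlap: for each word, compare its k nearest neighbors by cosine similarity in embedding space with its k strongest neighbors in the co-occurrence graph, and average the Jaccard overlap. This is a hypothetical illustration; the preprint's actual geometric and topological measures may differ, and the vectors and edge weights below are toy values.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def knn_embedding(word, vectors, k):
    """k nearest neighbors of `word` by cosine similarity, excluding the word itself."""
    others = [w for w in vectors if w != word]
    others.sort(key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)
    return set(others[:k])

def knn_graph(word, weights, k):
    """k neighbors of `word` with the largest co-occurrence edge weights."""
    nbrs = weights.get(word, {})
    return set(sorted(nbrs, key=nbrs.get, reverse=True)[:k])

def mean_knn_overlap(vectors, weights, k):
    """Average Jaccard overlap between embedding and graph neighborhoods."""
    scores = []
    for word in vectors:
        a, b = knn_embedding(word, vectors, k), knn_graph(word, weights, k)
        union = a | b
        scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores)

# Toy embeddings and co-occurrence weights (hypothetical values).
vectors = {"vote": [1.0, 0.1], "election": [0.9, 0.2], "tax": [0.1, 1.0], "budget": [0.2, 0.9]}
weights = {
    "vote": {"election": 5, "tax": 1},
    "election": {"vote": 5, "budget": 2},
    "tax": {"budget": 4, "vote": 1},
    "budget": {"election": 6, "tax": 4},
}

print(mean_knn_overlap(vectors, weights, k=1))
```

A high score at small k says the two models agree on immediate neighborhoods; it says nothing about global arrangement, which would need a separate, more global measure.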

Hugging Face Blog · 1mo ago

Introduction to Matryoshka Embedding Models

This Hugging Face blog post introduces Matryoshka Representation Learning (MRL), a technique for training embedding models that encode information at multiple granularities within a single vector. The approach allows truncating embeddings to smaller dimensions without significant loss in retrieval quality, enabling flexible trade-offs between storage/compute costs and accuracy. The post covers training, evaluation, and practical usage of Matryoshka embedding models via the Sentence Transformers library.

Inference Economics · Agent and Tool Ecosystem · MRL · Hugging Face · Sentence Transformers
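At inference time, the Matryoshka trade-off amounts to keeping a prefix of the vector and re-normalizing it before computing cosine similarity. A minimal pure-Python sketch follows; the 4-dimensional vectors are hypothetical, and the assumption that a short prefix still ranks documents well holds only for models trained with an MRL objective, not for arbitrary embeddings.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components (the Matryoshka prefix) and rescale to unit length."""
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

def best_match(query, docs, dim):
    """Name of the doc with the highest cosine similarity at truncation size `dim`."""
    q = truncate_and_normalize(query, dim)

    def score(name):
        d = truncate_and_normalize(docs[name], dim)
        return sum(a * b for a, b in zip(q, d))  # dot product of unit vectors = cosine

    return max(docs, key=score)

# Toy vectors (hypothetical values, not from a real model).
query = [0.6, 0.8, 0.05, -0.02]
docs = {"a": [0.58, 0.79, -0.10, 0.03], "b": [-0.7, 0.1, 0.6, 0.3]}

for dim in (4, 2):
    print(dim, best_match(query, docs, dim))
```

Re-normalizing matters because a truncated prefix is generally shorter than unit length, so skipping that step would mix vector length into the similarity score.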

arXiv · cs.CL · 1mo ago

Conditional Scale Entropy: A Wavelet-Derived Tool for Mechanistic Interpretability of Metaphor Processing in Transformers

This paper introduces Conditional Scale Entropy (CSE), a wavelet-derived measure of how transformer computation engages across frequency scales at each layer, and applies it to study metaphor processing in decoder-only language models. The authors prove CSE is invariant to update magnitude, isolating structural computation patterns from intensity. Across architectures ranging from GPT-2 (124M) to LLaMA-2 7B and GPT-oss 20B, metaphorical tokens consistently produce higher spectral breadth than literal tokens in early-to-mid layers, with the effect surviving permutation correction and specificity controls. The work establishes multi-scale coordination as a consistent mechanistic signature of metaphorical language processing and positions CSE as a general interpretability tool for cross-depth structure in transformers.

Evaluation and Benchmarking · AI Safety Research · Conditional Scale Entropy · mechanistic interpretability · GPT-2
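The summary does not give CSE's formula, so the following is not CSE. It is a simplified, hypothetical "scale entropy": decompose a 1D signal (treating a layer's update vector as such a signal is an assumption here) with a Haar wavelet transform, measure the energy of the detail coefficients at each scale, normalize those energies into a distribution, and take its Shannon entropy. It illustrates one way a wavelet measure can be invariant to update magnitude: multiplying the signal by c multiplies every scale energy by c², which cancels in the normalization.

```python
import math

def haar_detail_energies(signal):
    """Energy of Haar detail coefficients at each scale, finest scale first."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    energies = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        energies.append(sum(d * d for d in detail))
    return energies

def scale_entropy(signal):
    """Shannon entropy (bits) of the normalized per-scale detail energy.

    High values mean energy is spread across many scales; 0 means a single scale.
    """
    energies = haar_detail_energies(signal)
    total = sum(energies)
    if total == 0:
        return 0.0
    probs = [e / total for e in energies if e > 0]
    return sum(-p * math.log2(p) for p in probs)

alternating = [1, -1, 1, -1, 1, -1, 1, -1]  # all energy at the finest scale
mixed = [3, 1, 4, 1, 5, 9, 2, 6]            # energy spread over several scales

print(scale_entropy(alternating), scale_entropy(mixed))
print(scale_entropy([10 * v for v in mixed]))  # rescaling leaves the value unchanged
```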

Hugging Face Blog · 1mo ago

You Could Have Designed State of the Art Positional Encoding

A Hugging Face blog post walks through the design space of positional encoding for transformer models, building intuition for why modern schemes like RoPE emerged. The post takes a pedagogical approach, showing how one could derive state-of-the-art positional encoding from first principles. It covers the evolution from absolute to relative positional encodings and the properties that make certain schemes preferable for long-context generalization.

Long Context Evolution · Transformers · Rotary Position Embedding (RoPE) · Positional Encoding
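The property that makes RoPE a relative scheme can be checked numerically. The sketch below uses the standard formulation (rotate each pair of dimensions by an angle proportional to position, with frequencies base^(-2i/d) and base 10000); the blog's own derivation and notation may differ, and the query/key vectors are hypothetical toy values.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by the angle pos * base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("embedding dimension must be even")
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)  # lower-indexed pairs rotate faster
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out.extend([a * c - b * s, a * s + b * c])  # 2D rotation of the pair
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Toy query/key vectors (hypothetical values).
q = [0.3, -1.2, 0.5, 0.8]
k = [1.0, 0.4, -0.7, 0.2]

# Same offset (3) at very different absolute positions gives the same score.
s1 = dot(rope(q, 5), rope(k, 2))
s2 = dot(rope(q, 105), rope(k, 102))
print(s1, s2)
```

Because each rotation is orthogonal, norms are preserved and the attention score depends only on q, k and the offset between positions. Some implementations pair dimension i with i + d/2 instead of adjacent dimensions; the relative-position property holds either way.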

Hugging Face Blog · 1mo ago

Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers

Hugging Face published a blog post detailing how to train and finetune multimodal embedding and reranker models using the Sentence Transformers library. The post covers techniques for building models that can jointly embed text and images for retrieval and reranking tasks. This represents an extension of the Sentence Transformers ecosystem into multimodal territory, enabling practitioners to build cross-modal search and ranking systems.

Agent and Tool Ecosystem · Multimodal Progress · reranker models · multimodal embedding · Hugging Face
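Joint text-image embedding models are commonly trained with an in-batch contrastive objective: each caption should score higher against its own image than against the other images in the batch. The sketch below implements that idea in pure Python; it is not taken from the post, whose exact losses may differ. The scale (inverse temperature) of 20.0 is an assumed value, chosen to match the default of in-batch-negatives losses such as Sentence Transformers' MultipleNegativesRankingLoss.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def in_batch_contrastive_loss(text_embs, image_embs, scale=20.0):
    """Mean cross-entropy where text i must pick image i among all images in the batch."""
    texts = [normalize(t) for t in text_embs]
    images = [normalize(im) for im in image_embs]
    total = 0.0
    for i, t in enumerate(texts):
        logits = [scale * sum(a * b for a, b in zip(t, im)) for im in images]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true pair
    return total / len(texts)

# Toy batch of two caption/image pairs (hypothetical vectors).
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
shuffled = [[0.1, 0.9], [0.9, 0.1]]

print(in_batch_contrastive_loss(texts, aligned), in_batch_contrastive_loss(texts, shuffled))
```

Using the other items in the batch as negatives avoids mining negatives explicitly, which is why larger batches usually make this kind of loss more informative.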

arXiv · cs.CL · 29d ago

Multimodal Pathos Analysis in Political Speech: LLM-Based vs. Acoustic Emotion Models

Researchers compare acoustic speech emotion recognition (emotion2vec_plus_large), multimodal LLM analysis (Gemini 2.5 Flash), and a multi-agent LLM ensemble (TRUST pipeline) for detecting Pathos in a Bundestag political speech. Gemini Valence correlates strongly with TRUST-Pathos scores (rho=+0.664) while acoustic Valence does not (rho=+0.097), suggesting LLMs capture semantically defined political emotion far better than acoustic models. The study also critiques standard SER benchmark corpora (EMO-DB) for acted speech, cultural bias, and category incompatibility. Results indicate acoustic features remain useful for low-level arousal estimation but are insufficient proxies for rhetorical-emotional analysis.

Agent and Tool Ecosystem · Multimodal Progress · Gemini-2.5-Flash-Lite · Felix Banaszak · emotion2vec_plus_large
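The comparison above rests on Spearman's rank correlation (rho), which measures how monotonically two score series move together regardless of their scales. A minimal pure-Python sketch: rank each series (ties get the average rank), then take the Pearson correlation of the ranks. In practice `scipy.stats.spearmanr` does this; the score lists below are hypothetical and are not the study's data.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical valence and pathos scores for the same five units of a speech.
valence = [0.1, 0.4, 0.35, 0.8, 0.6]
pathos = [1, 3, 2, 5, 5]

print(spearman_rho(valence, pathos))
```

Because only ranks enter the computation, rho is unaffected by how each model calibrates its raw scores, which suits comparing outputs on different scales such as acoustic valence and LLM ratings.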

arXiv · cs.CL · 4d ago

RDS Fusion: Hybrid neuro-symbolic gating with compressed CoT for zero-shot irony detection

Researchers introduce the Robust Dual-Signal (RDS) Fusion framework, a hybrid neuro-symbolic architecture for irony and sarcasm detection in social media text that compresses Chain-of-Thought reasoning without supervised fine-tuning. Evaluated on TweetEval (N=734) and iSarcasm, the zero-shot system matches fine-tuned BERTweet and outperforms supervised SemEval transformer ensembles on the imbalanced iSarcasm dataset. A statistical ablation shows that only the full concurrent fusion of all three signals yields a validated improvement; no individual component provides a significant standalone gain.

Evaluation and Benchmarking · TweetEval · BERTweet · Robust Dual-Signal Fusion