Almanac
5 · arXiv cs.CL (Computation and Language) · 2d ago

Emotion vectors replicated in open-weight LLMs with architecture-dependent valence geometry

A new arXiv preprint extends prior findings on emotion vectors in Claude Sonnet 4.5 to two open-weight models, Apertus-8B-Instruct-2509 and Gemma-4-E4B-it, by extracting emotion contrast vectors across all layers. The authors recover valence geometry in both models (peak PC1-valence correlations of r=0.76 and r=0.83, near Claude's r=0.81) but find notable architectural differences: Gemma encodes valence strongly in early layers while Apertus shows the opposite pattern. Arousal encoding proves sensitive to the corpus used for extraction, suggesting uneven distribution of arousal-relevant cues across model-generated text.
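The PC1-valence correlation reported above can be illustrated with a minimal sketch. It is not the authors' pipeline: the activations are synthetic, the valence ratings are made up, and every name (`contrast_vectors`, `toy_means`, `axis`) is hypothetical. It assumes a contrast vector is an emotion's mean activation minus the mean over all emotions, and uses power iteration in place of a full PCA. In a real study the same computation would be repeated for each layer.

```python
import math
import random

def contrast_vectors(emotion_means):
    """Subtract the mean over all emotions from each emotion's mean activation."""
    vectors = list(emotion_means.values())
    grand = [sum(col) / len(vectors) for col in zip(*vectors)]
    return {name: [x - g for x, g in zip(vec, grand)]
            for name, vec in emotion_means.items()}

def first_principal_component(rows, iterations=200):
    """Approximate PC1 of already-centred rows by power iteration on X^T X."""
    dim = len(rows[0])
    direction = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, direction)) for row in rows]
        updated = [sum(s * row[j] for s, row in zip(scores, rows))
                   for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in updated))
        direction = [x / norm for x in updated]
    return direction

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def pc1_valence_correlation(emotion_means, valence):
    """Project contrast vectors onto PC1 and correlate the scores with valence."""
    contrasts = contrast_vectors(emotion_means)
    names = sorted(contrasts)
    rows = [contrasts[n] for n in names]
    pc1 = first_principal_component(rows)
    scores = [sum(a * b for a, b in zip(row, pc1)) for row in rows]
    return pearson(scores, [valence[n] for n in names])

# Synthetic stand-in for one layer: activations vary along a single
# hypothetical "valence axis" plus small Gaussian noise.
rng = random.Random(0)
valence = {"happy": 0.9, "proud": 0.7, "calm": 0.6,
           "afraid": -0.6, "sad": -0.7, "angry": -0.8}
axis = [1.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
toy_means = {name: [v * a + rng.gauss(0.0, 0.05) for a in axis]
             for name, v in valence.items()}

r = pc1_valence_correlation(toy_means, valence)
print(f"|r| between PC1 scores and valence: {abs(r):.2f}")
```

The sign of a principal component is arbitrary, so only the magnitude of `r` is meaningful here.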

Related events (8)

4 · arXiv · cs.CL · 12d ago

Transformer embeddings shown to intrinsically encode Russell's circumplex model of emotion geometry

A new arXiv paper investigates whether Transformer-based text and speech encoders (RoBERTa, wav2vec 2.0) recover the geometric structure of Russell's circumplex model of affect, a valence-arousal topology from psychology. Experiments on naturalistic datasets (MSP-Podcast) and LLM-generated stimuli show that multimodal fusion achieves perfect topological alignment with Russell's primary emotion ordering, and that zero-shot generic text embeddings place fine-grained emotion terms near their human-mapped coordinates. The authors argue that this structure is intrinsically encoded in the representations rather than being an artifact of labeling, bridging psychological theory and representation learning.

4 · arXiv · cs.CL · 1mo ago

Multimodal Pathos Analysis in Political Speech: LLM-Based vs. Acoustic Emotion Models

Researchers compare acoustic speech emotion recognition (emotion2vec_plus_large), multimodal LLM analysis (Gemini 2.5 Flash), and a multi-agent LLM ensemble (TRUST pipeline) for detecting Pathos in a Bundestag political speech. Gemini Valence correlates strongly with TRUST-Pathos scores (rho=+0.664) while acoustic Valence does not (rho=+0.097), suggesting LLMs capture semantically defined political emotion far better than acoustic models. The study also critiques standard SER benchmark corpora (EMO-DB) for acted speech, cultural bias, and category incompatibility. Results indicate acoustic features remain useful for low-level arousal estimation but are insufficient proxies for rhetorical-emotional analysis.

6 · The Batch · 26d ago

Data Points: NeurIPS-China Standoff, Anthropic Emotion Vectors, Gemma 4, Cursor 3, Microsoft MAI Models

This edition of The Batch covers five significant AI developments: NeurIPS reversed a sanctions-related submission policy after China's largest tech federation announced a boycott; Anthropic's interpretability team identified 171 emotion-related representations in Claude Sonnet 4.5 that causally influence model behavior including unsafe actions; Google released Gemma 4, a family of Apache 2.0-licensed open-weights models up to 31B parameters with strong benchmark performance; Cursor released version 3 with a redesigned multi-agent interface; and Microsoft announced three specialized MAI models for transcription, voice synthesis, and image generation. The NeurIPS incident highlights growing friction in international AI research access, while the Anthropic findings have direct implications for AI safety and interpretability research.

4 · arXiv · cs.CL · 5d ago

LLM embedding spaces partially recover expert-defined symptom structure in mental health language

A new arXiv preprint investigates whether LLM embedding geometry aligns with expert-defined symptom structure in mental health language, using 28 Reddit communities as a testbed. The authors compare pretrained and fine-tuned Qwen3 embeddings (0.6B and 4B) against an expert symptom matrix via representational similarity analysis, with controls for affective, stylistic, and topic confounds. Results show measurable but level-dependent alignment: fine-tuning strengthens it at fine-grained category levels, and larger scale improves both zero-shot alignment and fine-tuning gains. The paper argues that classification accuracy alone is insufficient to validate embedding geometry against domain knowledge.
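Representational similarity analysis, as used above, can be sketched as follows. This is a toy illustration and not the paper's code: the embeddings and the expert symptom matrix are invented, cosine distance and Spearman correlation are assumed as the dissimilarity and comparison measures, and the confound controls mentioned in the summary are omitted.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def rdm_upper_triangle(vectors):
    """Pairwise dissimilarities (i < j) of a representational dissimilarity matrix."""
    n = len(vectors)
    return [cosine_distance(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)]

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def rsa_score(embeddings, expert_rows):
    """Rank-correlate the two dissimilarity structures over the same items."""
    return spearman(rdm_upper_triangle(embeddings),
                    rdm_upper_triangle(expert_rows))

# Hypothetical data: four communities, three expert-coded symptoms each,
# and a 3-dimensional stand-in for each community's mean embedding.
expert_matrix = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
community_embeddings = [[0.9, 0.8, 0.1], [1.0, 0.1, 0.0],
                        [0.1, 0.9, 1.0], [0.0, 0.2, 0.9]]

print(f"RSA alignment: {rsa_score(community_embeddings, expert_matrix):.2f}")
```

A score near 1 means that communities the experts consider similar also sit close together in embedding space; it says nothing about classification accuracy, which is the distinction the paper draws.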

4 · arXiv · cs.CL · 18d ago

Calibrated LLM annotation and encoder transfer for measuring human values in social media text

A new arXiv preprint investigates how different LLMs, prompts, and instruction languages operationalize Schwartz's theory of basic human values when annotating non-English social media posts. The authors evaluate annotation quality beyond standard F1 metrics, examining structural alignment, error structure, and confidence-ambiguity relations, finding that iterative prompt calibration reduces misattributions. They also demonstrate that LLM annotations can be transferred to a smaller encoder model via soft-label training, preserving theory-grounded value interpretations and uncertainty information.
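Soft-label training can be illustrated with a short sketch of the loss it typically minimises. This is an assumption about the general technique, not the paper's implementation: it treats the LLM annotation as a probability distribution over three hypothetical value categories and scores an encoder's logits with cross-entropy against that distribution. A multi-label setup would use per-label binary cross-entropy instead.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_cross_entropy(logits, soft_targets):
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(soft_targets, probs) if t > 0)

# Hypothetical LLM annotation: 70% / 20% / 10% over three value categories.
llm_soft_label = [0.7, 0.2, 0.1]

# An encoder whose output matches the annotation, and one that reverses it.
matching_logits = [math.log(p) for p in llm_soft_label]
reversed_logits = list(reversed(matching_logits))

loss_match = soft_label_cross_entropy(matching_logits, llm_soft_label)
loss_reversed = soft_label_cross_entropy(reversed_logits, llm_soft_label)
print(f"matching: {loss_match:.3f}, reversed: {loss_reversed:.3f}")
```

Unlike a hard one-hot label, the soft target keeps the annotator's uncertainty: the loss is lowest when the encoder reproduces the full distribution, not just its top category.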

3 · arXiv · cs.CL · 20d ago

Comparative study of semantic geometry in transformer embeddings vs. graph-based lexical models

A preprint from arXiv compares the geometric and topological properties of transformer-based vector embeddings (CamemBERT) against lexical co-occurrence graphs for representing semantic structure. Applied to a French civic debate corpus, the study finds similar local topology but divergent global structure between the two approaches. The authors argue graph-based models offer more interpretable semantic organization and suggest graphs could guide neural architectures toward more stable, interpretable convergence.

5 · Hugging Face Blog · 1mo ago

Vision Language Models (Better, faster, stronger)

A Hugging Face blog post surveys the state of vision-language models (VLMs) in 2025, covering advances in architecture, training, efficiency, and deployment. The post reviews progress across major open and closed VLMs, highlighting trends in multimodal capability, speed improvements, and practical deployment patterns. As a tier-2 commentary piece, it synthesizes the current landscape rather than announcing new research.

4 · arXiv · cs.CL · 20d ago

Acoustic cue alignment tokens improve speech emotion recognition in audio language models

Researchers study whether instruction-following audio language models (ALMs) use explicit acoustic cues in a grounded way when raw audio is already available. They derive six interpretable acoustic concept tokens from the eGeMAPS feature set and append them to text prompts, testing on FAU-Aibo and IEMOCAP benchmarks. Aligned tokens improve unweighted average recall while shuffled or corrupted tokens degrade performance, but models don't fully collapse under perturbation, indicating partial anchoring to the audio signal. The work offers a practical probing method for interpretability and robustness in affective computing with ALMs.
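Unweighted average recall (UAR), the metric cited above, is the mean of per-class recall, so each emotion class counts equally regardless of how many samples it has. A minimal sketch with invented labels follows; the function name and data are illustrative only.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)

# Imbalanced toy set: a model that always predicts the majority class.
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)
print(accuracy, uar)  # prints 0.8 0.5
```

The gap between the two numbers is why UAR is preferred on imbalanced emotion corpora: plain accuracy rewards ignoring the minority class, while UAR does not.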