Events
OpenAI introduces Deployment Simulation to predict model behavior pre-release
OpenAI has announced Deployment Simulation, a method for predicting AI model behavior before deployment using real conversation data. By simulating production conditions ahead of release, the approach aims to make safety evaluations more accurate. This represents a methodological contribution to pre-deployment safety evaluation pipelines.
Simon Willison quotes Georgi Gerganov
Simon Willison shares a quote from Georgi Gerganov, the creator of llama.cpp. The item body is empty, so the content of the quote is unavailable. Gerganov is a significant figure in the open-weights inference ecosystem, so any substantive statement from him is potentially relevant to tracking open-source LLM tooling trends.
Zvi Mowshowitz reviews Fable and Mythos AI model welfare features
Zvi Mowshowitz (Don't Worry About the Vase) publishes a review of Fable and Mythos, two AI products or models, focusing on model welfare considerations. The products are currently unavailable following what the author calls a 'fiasco,' though he continues the review in present tense as if they were accessible. The piece is notable for engaging with model welfare as a substantive evaluation dimension.
Interconnects interviews Finbarr Timbers on frontier post-training recipes
Interconnects (Nathan Lambert) publishes interview #18 with Finbarr Timbers reviewing frontier post-training recipes. The conversation likely covers RLHF, preference optimization, and related techniques used by leading labs. Timbers is a practitioner with direct experience in post-training at frontier scale.
Google DeepMind partners with UK government on AI-accelerated housing planning prototype
Google DeepMind and the UK government have announced a partnership to build an AI-powered prototype aimed at accelerating housing planning decisions. The initiative targets the UK's housing development bottleneck by applying AI to planning workflows. This represents a notable government-lab collaboration deploying AI in a high-stakes public sector context.
Visually grounded method connects spoken words to written text without textual supervision
Researchers present a method for learning mappings between written words and spoken audio using only images and spoken captions, with no explicit text supervision. The approach uses image captioning to build a written vocabulary, then applies unsupervised word discovery to align spoken utterances to those words. The system outperforms a neural baseline on spoken word retrieval and keyword spotting tasks in English, with implications for low-resource language processing.
Semi-supervised framework scales LLM reasoning with minimal labeled data via lightweight verifier
A new arXiv preprint proposes a semi-supervised framework for training LLMs to reason with very few labeled examples, using a lightweight classifier to judge the validity of intermediate reasoning traces. An entropy-based confidence threshold filters unreliable pseudo-labels before fine-tuning. Experiments on math reasoning (Orca-Math subset) and visual QA (GQA) show accuracy comparable to using 10-15x more labeled data. The approach reduces dependence on expensive answer-level supervision by turning verification into a data-creation mechanism.
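A minimal sketch of the entropy-based filtering step, assuming the verifier outputs a probability distribution over {invalid, valid} for each reasoning trace and that entropy is measured in nats. The function names, the binary label set, and the 0.4 threshold are illustrative assumptions, not the paper's actual configuration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def filter_pseudo_labels(examples, max_entropy):
    """Keep (trace, label) pairs whose verifier distribution is confident.

    examples: list of (trace, probs), where probs is the (hypothetical)
    verifier distribution over [invalid, valid]. Low entropy means the
    verifier is confident either way; the kept label is the argmax class.
    """
    kept = []
    for trace, probs in examples:
        if entropy(probs) <= max_entropy:
            label = max(range(len(probs)), key=lambda i: probs[i])
            kept.append((trace, label))
    return kept

examples = [("trace A", [0.05, 0.95]),
            ("trace B", [0.5, 0.5]),
            ("trace C", [0.9, 0.1])]
print(filter_pseudo_labels(examples, max_entropy=0.4))
# → [('trace A', 1), ('trace C', 0)]
```

The near-uniform judgment on `trace B` (entropy ln 2 ≈ 0.693) is discarded, while confident judgments in either direction survive as pseudo-labels.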
RL-trained LLMs learn retriever-specific query formulation strategies for RAG
A new arXiv paper presents the first systematic study of using reinforcement learning to teach LLMs to adapt query formulation strategies to different retrieval backends. The authors find that different retrievers have surprisingly distinct optimal query styles (e.g., descriptive vs. question-like), making cross-retriever strategy transfer ineffective. They introduce a branching-based rollout technique to stabilize training over multi-step retrieval trajectories and show gains from retriever-specific human guidance and model scaling.
SearchGEO framework measures LLM search agent vulnerability to web content manipulation
Researchers introduce SearchGEO, a controlled evaluation framework for measuring endorsement corruption in LLM-based web-search agents, combining a manipulation pipeline, a five-mode attack taxonomy, and multiple output metrics. Evaluating 13 LLM backends on 308 cases each, they find attack success rates ranging from 0.0% on Claude-Sonnet-4.6 to 31.4% on Gemini-3-Flash, with model-family-specific vulnerability patterns. An auxiliary probe that escalates endorsement to install commands reveals a behavioral split: Claude over-rejects while GPT over-trusts. The findings argue for treating robustness to adversarial search content as a first-class safety evaluation dimension for deployed agents.
Expert Tying reduces MoE LLM memory footprint by ~2x with minimal quality loss
Researchers introduce Expert Tying, an architectural modification for Mixture-of-Experts LLMs that shares expert parameters across consecutive transformer layers while keeping routing and attention layer-independent. Evaluated on OLMoE, Qwen3, and DeepSeek-style MoE architectures, the method achieves nearly 2x memory reduction with negligible perplexity or downstream quality degradation. The approach exploits parameter redundancy in MoE pathways to improve the compute-to-memory trade-off for training and inference.
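The memory saving can be checked with back-of-the-envelope arithmetic. This sketch assumes tying in pairs of consecutive layers and uses hypothetical per-layer parameter counts (in millions) rather than figures from OLMoE, Qwen3, or DeepSeek. Because attention and router weights stay per-layer, the reduction approaches but does not reach 2x.

```python
import math

def moe_params(n_layers, expert_params, other_params, tie_group=1):
    """Total parameters when each run of `tie_group` consecutive layers shares
    one set of expert weights; attention and router weights stay per-layer."""
    n_expert_sets = math.ceil(n_layers / tie_group)
    return n_expert_sets * expert_params + n_layers * other_params

# Hypothetical model: 16 layers, 400M expert and 20M other params per layer.
untied = moe_params(16, 400, 20)
tied = moe_params(16, 400, 20, tie_group=2)
print(untied, tied, round(untied / tied, 2))  # 6720 3520 1.91
```

The larger the share of parameters that sit in the experts, the closer the ratio gets to 2.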
Systematic study of tree traversal methods in Transformer Grammars reveals trade-offs between composition and lookahead
A new arXiv preprint evaluates Depth-First, Breadth-First, and a novel hybrid Production-Rule Traversal strategy for linearizing syntactic trees in Transformer Grammars. The authors test these methods across language modeling, syntactic generalization, and summarization tasks with varying tree configurations and masking strategies. The study reveals inherent trade-offs between nested composition and global lookahead, offering design recommendations for task-aware Transformer Grammars.
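To make the two baseline orders concrete, the sketch below linearizes a toy constituency tree depth-first (with bracketed opening and closing tokens, in the spirit of Transformer Grammars) and breadth-first (level by level). The tree encoding and token format are assumptions for illustration; the paper's exact token scheme and its hybrid Production-Rule Traversal are not reproduced here.

```python
from collections import deque

# A node is (label, [children]); a leaf is a plain string (a word).

def dfs_linearize(node):
    """Depth-first: open a constituent, emit its children, then close it."""
    if isinstance(node, str):
        return [node]
    label, children = node
    tokens = ["(" + label]
    for child in children:
        tokens.extend(dfs_linearize(child))
    tokens.append(")")
    return tokens

def bfs_linearize(root):
    """Breadth-first: emit every node at depth d before any node at depth d + 1."""
    tokens, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        if isinstance(node, str):
            tokens.append(node)
        else:
            label, children = node
            tokens.append(label)
            queue.extend(children)
    return tokens

tree = ("S", [("NP", ["the", "dog"]), ("VP", ["barks"])])
print(" ".join(dfs_linearize(tree)))  # (S (NP the dog ) (VP barks ) )
print(" ".join(bfs_linearize(tree)))  # S NP VP the dog barks
```

Depth-first order completes each constituent before its next sibling begins (nested composition), while breadth-first order emits every constituent label before any word, exposing the global shape of the tree up front (lookahead).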
Transformer embeddings shown to intrinsically encode Russell's circumplex model of emotion geometry
A new arXiv paper investigates whether Transformer-based text and speech encoders (RoBERTa, wav2vec 2.0) recover the geometric structure of Russell's circumplex model of affect — a valence-arousal topology from psychology. Experiments on naturalistic datasets (MSP-Podcast) and LLM-generated stimuli show that multimodal fusion achieves perfect topological alignment with Russell's primary emotion ordering, and zero-shot generic text embeddings place fine-grained emotion terms near their human-mapped coordinates. The authors argue this structure is intrinsically encoded in the representations rather than being an artifact of labeling, bridging psychological theory and representation learning.
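One way to test whether embeddings reproduce the circumplex ordering is to project emotion centroids onto a valence-arousal plane, sort them by angle, and compare the resulting circular order with a reference order up to rotation and reflection. The coordinates and the four-emotion reference order below are hypothetical, and this comparison rule is an assumption; the paper's own alignment metric may differ.

```python
import math

def angular_order(coords):
    """Labels sorted by angle in the (valence, arousal) plane."""
    return sorted(coords, key=lambda label: math.atan2(coords[label][1], coords[label][0]))

def same_circular_order(a, b):
    """True if sequence b equals a up to rotation or reflection."""
    if len(a) != len(b) or sorted(a) != sorted(b):
        return False
    n = len(a)
    for seq in (a, a[::-1]):
        doubled = seq + seq
        if any(doubled[i:i + n] == b for i in range(n)):
            return True
    return False

# Hypothetical (valence, arousal) centroids projected from an embedding space.
centroids = {"happy": (0.8, 0.5), "angry": (-0.7, 0.7),
             "sad": (-0.7, -0.5), "calm": (0.7, -0.6)}
reference = ["happy", "angry", "sad", "calm"]  # counterclockwise from "happy"
order = angular_order(centroids)
print(order)                                  # ['sad', 'calm', 'happy', 'angry']
print(same_circular_order(reference, order))  # True
```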
RDS Fusion: Hybrid neuro-symbolic gating with compressed CoT for zero-shot irony detection
Researchers introduce the Robust Dual-Signal (RDS) Fusion framework, a hybrid neuro-symbolic architecture for irony and sarcasm detection in social media text that compresses Chain-of-Thought reasoning without supervised fine-tuning. Evaluated on TweetEval (N=734) and iSarcasm, the zero-shot system matches fine-tuned BERTweet performance and outperforms supervised SemEval transformer ensembles on the imbalanced iSarcasm dataset. A statistical ablation shows that only the full concurrent fusion of all three signals yields a validated improvement; no individual component provides a significant standalone gain.
ASRD: Training-free anchor-guided revocable decoding for diffusion LLMs improves accuracy and throughput
A new arXiv preprint introduces ASRD (Anchor Supervised Revocable Decoding), a training-free framework for improving decoding quality in diffusion large language models. The method addresses error propagation and local error reinforcement in revocable decoding by separating trusted 'anchor tokens' (identified via temporal consistency) from uncertain candidates, then applying anchor-guided generation and anchor-perturbed verification. Experiments on math and coding benchmarks show up to 6.4% accuracy improvement and 7.2× inference throughput gains over remasking baselines.
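The summary does not specify how temporal consistency is measured, so the sketch below assumes one simple rule: a position becomes an anchor once its predicted token has stayed identical over the last `window` decoding steps. The function name, the history format, and the window size are hypothetical, and the anchor-guided generation and anchor-perturbed verification stages are not sketched.

```python
def find_anchors(history, window):
    """Positions whose predicted token is unchanged across the last `window` steps.

    history: one list of predicted token ids per decoding step, all the same length.
    """
    if window < 1 or len(history) < window:
        return set()
    recent = history[-window:]
    return {
        pos for pos in range(len(recent[0]))
        if all(step[pos] == recent[0][pos] for step in recent)
    }

# Predicted token ids at 4 positions over 3 decoding steps (toy values).
history = [[5, 7, 9, 2],
           [5, 8, 9, 3],
           [5, 8, 9, 4]]
print(find_anchors(history, window=2))  # {0, 1, 2}
print(find_anchors(history, window=3))  # {0, 2}
```

A larger window yields fewer but more stable anchors; positions left out remain revocable candidates.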
Revisiting LLM systematicity in negation understanding via in-context learning
A new arXiv preprint analyzes how well large language models handle negation from two angles: behavioral systematicity (whether models correctly recognize negation expressions and scope) and representational systematicity (whether function vectors can be reliably constructed from in-context examples). Results show LLMs partially succeed at negation cue recognition via in-context learning but struggle with scope recognition, with performance varying by output format. Function vectors can be composed for cue extraction but are harder to extract for scope recognition tasks.
Dataset and analysis of scam trends and rail paths from Reddit self-disclosure narratives
Researchers build a dataset of 21,304 Reddit posts from scam-related subreddits to analyze yearly trends in scam types and multi-stage rail paths from 2023–2025. An LLM-assisted annotation method labels 1,800 posts for scam chain analysis, and a topic model examines community support behavior. The work is primarily a social science/NLP contribution to fraud detection research rather than an AI capability or infrastructure advance.
Causal DAG model for when AI systems should engage Theory of Mind in conflict scenarios
A new arXiv preprint proposes a structural causal model (formalized as a directed acyclic graph) that treats Theory of Mind as a conditionally activated mechanism rather than an always-on capacity in AI systems. The model specifies exogenous situational and agent-level conditions, five endogenous mediators, and three causal pathways (tractability, reasoning-depth, enabling-cause) leading to an epistemic accuracy outcome. The work targets human-machine teaming in conflict contexts, offering a resource-rational decision procedure for when AI should engage social reasoning. Simulation validation and ethical considerations for conflict-optimized mentalizing are discussed.
Hop-count taxonomy predicts LLM failure on clinical EHR question answering across architectures
Researchers introduce a 'hop-count' taxonomy — the number of distinct inferential steps required to answer a clinical EHR question — as a principled predictor of LLM failure, finding monotone accuracy decline with reasoning depth across Claude Sonnet, GPT-4o, and GPT-5. The pattern holds across two providers and two OpenAI generations, with odds ratios per hop of 0.58–0.80, and is not explained by EHR context truncation. Extended thinking (chain-of-thought) did not significantly flatten the accuracy-depth curve, though token usage scaled with hop count. The findings ground transformer compositionality limits in a clinically consequential domain and suggest hop count as a deployment risk-stratification tool.
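A per-hop odds ratio is most naturally read through a logistic model in which each extra inferential step multiplies the odds of a correct answer by a constant factor. The sketch applies the reported range (0.58 to 0.80) to a hypothetical 90% single-hop baseline; the baseline is an assumption, not a figure from the study.

```python
def accuracy_after_hops(base_accuracy, odds_ratio_per_hop, extra_hops):
    """Accuracy implied by a constant per-hop odds ratio (logistic model)."""
    odds = base_accuracy / (1 - base_accuracy)
    odds *= odds_ratio_per_hop ** extra_hops
    return odds / (1 + odds)

# Hypothetical 90% accuracy on 1-hop questions, extrapolated to 4 hops.
for ratio in (0.58, 0.80):
    print(ratio, round(accuracy_after_hops(0.9, ratio, extra_hops=3), 3))
# 0.58 0.637
# 0.8 0.822
```

Under these assumptions, three extra hops cut accuracy from 90% to roughly 64% at the steep end of the range and to roughly 82% at the shallow end, which illustrates why hop count could work as a risk-stratification signal.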
Causal auditing framework detects privacy disclosures in synthetic data without model access
A new arXiv preprint introduces a model-agnostic empirical framework for auditing synthetic data generated by LLMs and generative AI systems for privacy leakage. The framework distinguishes 'true disclosures' (direct reproduction of user data) from 'phantom disclosures' (incidental generation), using held-out control sets and statistical hypothesis testing without requiring model access, canary insertion, or shadow model training. It functions as a membership inference attack and provides empirical lower bounds on privacy leakage that are tighter than prior data-based auditing methods. The approach is computationally lightweight and applicable to any synthetic data generation mechanism.
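The control-set idea can be illustrated with a one-sided two-proportion z-test: if synthetic records match real training records more often than they match held-out control records drawn from the same population, the excess is evidence of true disclosure rather than phantom disclosure. The notion of a "match", the counts, and the choice of test are assumptions for illustration; the summary does not describe the paper's actual statistic.

```python
import math

def disclosure_test(train_hits, n_train, control_hits, n_control):
    """One-sided two-proportion z-test.

    H0: training and control records are matched at the same rate (phantom only).
    H1: training records are matched more often (true disclosure).
    Returns (z, p_value). Assumes the pooled match rate is strictly between 0 and 1.
    """
    p_train = train_hits / n_train
    p_control = control_hits / n_control
    pooled = (train_hits + control_hits) / (n_train + n_control)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_train + 1 / n_control))
    z = (p_train - p_control) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail P(Z >= z)
    return z, p_value

# Hypothetical audit: 60 of 500 training records vs 30 of 500 control records matched.
z, p = disclosure_test(60, 500, 30, 500)
print(round(z, 1), p < 0.001)  # 3.3 True
```

Only the synthetic output and the two real-record sets are needed, which mirrors the summary's point that no model access, canaries, or shadow models are required.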
Informath: Symbolic informalization for converting formal proofs to fluent natural language
The paper introduces Informath, a project for symbolic informalization — converting formally verified mathematics into readable natural language without loss of precision. The architecture uses Dedukti as an interlingua hub connecting proof systems (Agda, Lean, Rocq) and Grammatical Framework (GF) for multilingual natural language generation. The work is relevant to AI-assisted formal verification pipelines where autoformalization produces machine-checked proofs that need to be made human-interpretable.
Controlled ablation reveals training artifact behind low frame rate degradation in neural audio codecs
A new arXiv preprint investigates why neural audio codecs degrade sharply at low frame rates (≤6.25 Hz), a property relevant to autoregressive speech synthesis where generation cost scales with sequence length. The authors reproduce a previously reported quality cliff at 6.25 Hz and show it stems from a suboptimal training configuration—fixed clip duration starves the decoder of inter-token context at low frame rates—rather than fundamental phonemic or codebook limits. After correcting the training setup, word error rate degrades smoothly down to 1.6 Hz, suggesting low frame rate codecs are more practically accessible than prior work implied.
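The training-configuration explanation comes down to simple arithmetic: with a fixed clip duration, the number of codec frames the decoder sees per clip scales with frame rate. The 2-second clip and the 50-frame context target below are hypothetical values, not the paper's settings.

```python
def frames_per_clip(clip_seconds, frame_rate_hz):
    """Codec frames (tokens) the decoder sees in one training clip."""
    return clip_seconds * frame_rate_hz

def clip_seconds_for(target_frames, frame_rate_hz):
    """Clip duration needed to give the decoder `target_frames` of context."""
    return target_frames / frame_rate_hz

for rate in (6.25, 1.6):
    print(f"{rate:>5} Hz: {frames_per_clip(2.0, rate):5.1f} frames per 2 s clip, "
          f"{clip_seconds_for(50, rate):6.2f} s clip for 50 frames")
```

At 1.6 Hz a 2-second clip holds only about 3 frames, so matching the context available at 6.25 Hz requires clips roughly four times longer.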
Study on text-enriched matrix factorization finds limited marginal contribution of review signals
A preprint investigates three strategies for incorporating textual reviews into matrix factorization-based recommender systems, including learnable gating and cross-attention mechanisms. Experiments across multiple datasets find that collaborative signals continue to dominate performance over textual enrichment. The work is a narrow recommender-systems contribution with no direct relevance to frontier AI/ML.
Decade-long analysis of 56,800 AI conference papers finds sixfold increase in code/data sharing
A new arXiv preprint analyzes documentation and reproducibility practices across 56,800 papers from five leading AI conferences between 2014 and 2024. Code and data sharing rose nearly sixfold from 11% to 64%, with estimated reproducibility increasing from 28% to 64% over the same period. Notably, improvements in documentation practices predate the introduction of formal reproducibility checklists, suggesting the shift reflects a broader open-science movement rather than compliance with venue requirements.
Contrastive-Difference CKA reveals concept-specific structural alignment across LLM architectures
Researchers introduce CKA_Delta (contrastive-difference CKA), a training-free diagnostic that isolates concept-specific representational convergence from generic similarity across LLM architectures. The method reveals a geometric-functional universality dissociation: moderate geometric alignment coexists with near-perfect functional transfer across six concept domains and multiple architectural families. CKA_Delta also functions as an architectural outlier detector, flagging Gemma as a notable outlier (d=1.08, AUC=0.79). The work provides a practical tool for cross-architecture concept monitoring without requiring model training.
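For readers unfamiliar with CKA, the sketch below implements standard linear CKA in pure Python and then applies it to per-example activation differences between paired concept-present and concept-absent inputs. That differencing step is only one plausible reading of "contrastive-difference"; the function `contrastive_cka` and the toy activations are hypothetical, not the paper's definition of CKA_Delta.

```python
import math

def center(rows):
    """Subtract the per-feature mean (rows = examples, columns = features)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def cross(a, b):
    """A^T B for row-major matrices A (n x p) and B (n x q)."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b))
             for j in range(len(b[0]))]
            for i in range(len(a[0]))]

def frob_sq(m):
    """Squared Frobenius norm."""
    return sum(v * v for row in m for v in row)

def linear_cka(x, y):
    """Linear CKA between two representations of the same n examples."""
    x, y = center(x), center(y)
    num = frob_sq(cross(y, x))
    den = math.sqrt(frob_sq(cross(x, x)) * frob_sq(cross(y, y)))
    return num / den

def diff(pos, neg):
    """Per-example activation difference between paired contrastive inputs."""
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(pos, neg)]

def contrastive_cka(x_pos, x_neg, y_pos, y_neg):
    """Hypothetical CKA_Delta variant: linear CKA on concept-difference vectors."""
    return linear_cka(diff(x_pos, x_neg), diff(y_pos, y_neg))

x = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
print(round(linear_cka(x, [[b, a] for a, b in x]), 6))  # 1.0 (features permuted)
```

Linear CKA is invariant to feature permutations, orthogonal transforms, and isotropic scaling, and `cross` accepts matrices of different widths, which is what allows comparison across architectures with different hidden sizes.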
LOGOS: A unified autoregressive foundation model for natural science tasks across domains
Researchers introduce LOGOS (Language Of Generative Objects in Science), a generative language model that encodes heterogeneous scientific objects and spatial interactions as discrete token sequences within a single autoregressive framework, avoiding explicit coordinates or geometric neural networks. Models are trained at 1B, 3B, and 8B parameter scales and consistently match or outperform domain-specific baselines across diverse scientific tasks. The work argues that AI for Science should converge on shared architectures and training paradigms with LLMs rather than maintaining a separate technical stack. Model weights are released publicly.