5 · arXiv cs.LG (Machine Learning) · 5d ago

Paper argues Compressed Computation toy model is not computation in superposition

A new arXiv preprint challenges the Compressed Computation (CC) toy model introduced by Braun et al. (2025), which appeared to compute 100 ReLU functions using only 50 neurons. The authors show that the apparent performance gains arise from unintended input mixing via a noisy residual stream rather than from genuine superposition: the learned neuron directions concentrate in the subspace spanned by the eigenvectors belonging to the top 50 eigenvalues of the mixing matrix. A semi-non-negative matrix factorization (semi-NMF) baseline derived purely from the mixing matrix reproduces the qualitative loss profile, supporting the conclusion that CC is not a valid toy model of computation in superposition.

Evaluation and Benchmarking · AI Safety Research · superposition · Compressed Computation is (probably) not Computation in Superposition · Braun et al. 2025 · Compressed Computation
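
The eigen-subspace claim above can be made concrete with a small linear-algebra check. This is an illustrative sketch only: it assumes a symmetric 100×100 mixing matrix, which we construct hypothetically as `E^T E - I` from a random embedding `E` (the paper's exact definition may differ), takes the eigenvectors belonging to the 50 largest eigenvalues, and measures what fraction of each candidate neuron direction's squared norm lies in their span. The helper names are our own.

```python
import numpy as np


def top_k_eigen_subspace(mix: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal basis (as columns) for the eigenvectors of a symmetric matrix
    that belong to its k largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(mix)  # ascending eigenvalues, orthonormal eigenvectors
    top = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, top]


def fraction_in_subspace(directions: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """For each row of `directions`, the share of its squared norm inside span(basis)."""
    coords = directions @ basis  # coordinates w.r.t. the orthonormal basis
    return (coords**2).sum(axis=1) / (directions**2).sum(axis=1)


rng = np.random.default_rng(0)
n_features, d_resid, n_neurons = 100, 1000, 50

# Hypothetical mixing matrix: cross-talk left over when 100 features are embedded
# in a 1000-dim residual stream by a random map and read back out.
emb = rng.normal(size=(d_resid, n_features)) / np.sqrt(d_resid)
mix = emb.T @ emb - np.eye(n_features)
basis = top_k_eigen_subspace(mix, n_neurons)

# Directions built inside the top-50 subspace score 1.0; random directions
# in the 100-dim input space score about 50/100 on average.
inside = rng.normal(size=(5, n_neurons)) @ basis.T
random_dirs = rng.normal(size=(5, n_features))
print(fraction_in_subspace(inside, basis))
print(fraction_in_subspace(random_dirs, basis).mean())
```

Applied to real learned neuron directions, fractions near 1.0 across all neurons would correspond to the concentration the summary describes; the sketch does not attempt the semi-NMF baseline.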

Related guides (2)

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

3 · OpenAI Blog · 1mo ago

Nonlinear Computation in Deep Linear Networks

OpenAI published a research finding showing that deep linear networks, as implemented with floating-point arithmetic, are not exactly linear and can perform nonlinear computation. The effect comes from the rounding behavior of floating-point numbers near the underflow limit rather than from any nonlinearity in the architecture itself. This is an older research paper from 2017 that touches on foundational questions about neural network expressivity.

OpenAI
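
The 2017 OpenAI work attributes the effect to floating-point arithmetic: a map that is linear in exact arithmetic stops being linear once values are pushed into the float32 subnormal range, where low-order bits are rounded away. The stdlib-only sketch below illustrates that mechanism on a single scalar. It emulates float32 rounding with `struct`, and the helper names are hypothetical; it is not the original experiment.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (IEEE-754 binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def f32_mul(a: float, b: float) -> float:
    """Float32 multiply: the exact product of two float32 values fits in a float64
    (24 + 24 significand bits <= 53), so rounding once gives the float32 result."""
    return f32(a * b)


def scale_down_up(x: float) -> float:
    """Multiply by 2**-140, then by 2**70 twice: the identity in exact arithmetic."""
    y = f32_mul(f32(x), f32(2.0**-140))  # lands in float32's subnormal range
    y = f32_mul(y, f32(2.0**70))         # scale back in two steps; 2**140 would overflow float32
    y = f32_mul(y, f32(2.0**70))
    return y


print(scale_down_up(1.0))             # 1.0
print(scale_down_up(2.0**-20))        # 0.0, although the exact answer is 2**-20
print(scale_down_up(1.0 + 2.0**-20))  # 1.0: the low-order bit was rounded away
```

Because `scale_down_up(2**-20)` differs from `2**-20 * scale_down_up(1.0)`, the map fails homogeneity and is therefore nonlinear, even though every step is multiplication by a constant.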

6 · arXiv · cs.CL · 17d ago

Dynamic short convolutions yield 1.33–1.60× compute advantage over standard Transformers

A new arXiv preprint introduces dynamic short convolutions as an architectural primitive for Transformers, using input-dependent filters to combine locality bias with increased expressivity. Experiments across 150M–2B parameter language models show consistent perplexity improvements over standard Transformers and static convolution variants, with scaling-law fits indicating a 1.33× compute advantage when applied to key/query/value vectors and 1.60× when added after every linear layer. The technique also improves linear RNNs (Mamba-2, Gated DeltaNet) and mixture-of-experts architectures, with custom Triton kernels making training practical.

Training Infrastructure · Frontier Model Releases · Triton · Mamba-2 · Gated DeltaNet · +1 more
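
To make "input-dependent filters" concrete, here is a minimal causal dynamic short convolution in NumPy. It is a sketch under our own assumptions, not the paper's primitive: each position predicts `kernel_size` tap weights from its own input through a hypothetical projection `w_filter`, normalizes them with a softmax, and shares the same taps across all channels. The paper may use per-channel filters, a different normalization, and custom Triton kernels instead of a Python loop.

```python
import numpy as np


def dynamic_short_conv(x: np.ndarray, w_filter: np.ndarray, kernel_size: int) -> np.ndarray:
    """Causal short convolution whose taps are predicted from the current input.

    x:        (T, d) sequence of vectors (e.g. keys, queries, or values)
    w_filter: (d, kernel_size) hypothetical projection from input to filter logits
    """
    T, d = x.shape
    logits = x @ w_filter                              # (T, kernel_size): one filter per position
    logits = logits - logits.max(axis=1, keepdims=True)
    taps = np.exp(logits)
    taps /= taps.sum(axis=1, keepdims=True)            # softmax-normalized taps (an assumption)

    padded = np.vstack([np.zeros((kernel_size - 1, d)), x])  # left-pad so no future input leaks in
    out = np.empty_like(x)
    for t in range(T):
        # Rows x_t, x_{t-1}, ..., x_{t-kernel_size+1} (zeros before the sequence start).
        window = padded[t : t + kernel_size][::-1]
        out[t] = taps[t] @ window
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
w = rng.normal(size=(4, 3))
y = dynamic_short_conv(x, w, kernel_size=3)
```

With `kernel_size=1` the single tap is always 1 and the layer reduces to the identity; larger kernels add the locality bias, while the input-dependent taps supply the extra expressivity.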

5 · arXiv · cs.CL · 4d ago

Post-hoc falsification operators for frozen small code models fail to beat Best-of-N in leakage-free evaluation

A measurement study evaluates 26 post-hoc operators (selection, verification, repair, elimination, portfolios) applied to frozen small code models (≤1.5B parameters) against a Best-of-N baseline under a strict leakage-free, matched-compute protocol. None of the semantic operators improves held-out accuracy over BoN, with the failure traced to three structural mechanisms: a coverage wall, a capability scissors, and a near-empty consensus trap. Two non-semantic operators do provide value: an expression-layer recovery method (M1) lifts DeepSeek-Coder-1.3B by 12 tasks on HumanEval+ (p=2.4e-4), and an adaptive consensus early-stop saves ~19% of compute with no loss in accuracy. The paper's core lesson is that harness quality and coverage measurement should precede investment in semantic post-hoc reasoning.

Evaluation and Benchmarking · Inference Economics · Selection Without Signal, Recovery Through Expression: A Measurement Study of Post-Hoc Falsification Operators for Frozen Small Code Models · deepseek-coder · Best-of-N · +2 more
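
The adaptive consensus early-stop idea can be illustrated with a generic majority-vote loop: draw candidates one at a time and stop as soon as one answer has collected `stop_at` votes, instead of always spending the full Best-of-N budget. This is a hypothetical sketch, not the paper's operator. `sample` stands in for the frozen model, and in a code setting the answer key would presumably be some canonical signature of a program's behavior (an assumption here).

```python
from collections import Counter
from typing import Callable, Hashable, Tuple


def consensus_early_stop(
    sample: Callable[[int], Hashable], budget: int, stop_at: int
) -> Tuple[Hashable, int]:
    """Draw up to `budget` samples; return (majority answer, samples used).

    Stops early once any answer has at least `stop_at` votes.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    counts: Counter = Counter()
    for i in range(budget):
        counts[sample(i)] += 1
        answer, votes = counts.most_common(1)[0]
        if votes >= stop_at:
            return answer, i + 1                 # early stop: the rest of the budget is saved
    return counts.most_common(1)[0][0], budget  # budget exhausted: plain majority vote


# Canned outputs standing in for successive model samples.
outputs = ["a", "b", "a", "a", "c", "a", "b", "a"]
print(consensus_early_stop(lambda i: outputs[i], budget=8, stop_at=3))   # ('a', 4)
print(consensus_early_stop(lambda i: outputs[i], budget=8, stop_at=10))  # ('a', 8)
```

The threshold trades compute for risk: a low `stop_at` saves more samples but commits to an answer on thinner evidence.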

4 · Hugging Face Blog · 1mo ago

A Failed Experiment: Infini-Attention, and Why We Should Keep Trying?

A Hugging Face blog post documents an attempt to implement and validate Infini-Attention, a technique proposed to extend transformer context length by combining local and compressed global memory. The experiment reportedly failed to reproduce the claimed benefits, raising questions about the reproducibility and practical viability of the approach. The post frames the failure as instructive and argues for continued experimentation with long-context architectures.

Long Context Evolution · Evaluation and Benchmarking · Hugging Face · Infini-Attention
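
For readers unfamiliar with the technique, the compressed global memory in Infini-Attention, as described in the paper that proposed it, works like a linear-attention associative memory: each segment's keys and values are folded into a fixed-size matrix, later queries read from it, and a learned gate blends that readout with ordinary local attention. The NumPy sketch below follows that description for a single head, omits the "delta" update variant and all training details, and is not the code used in the Hugging Face experiment.

```python
import numpy as np


def elu_plus_one(x: np.ndarray) -> np.ndarray:
    """ELU(x) + 1: a positive feature map (x + 1 for x > 0, exp(x) otherwise)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


class CompressiveMemory:
    """Fixed-size memory that accumulates key/value associations across segments."""

    def __init__(self, d_key: int, d_value: int):
        self.M = np.zeros((d_key, d_value))  # associative matrix
        self.z = np.zeros(d_key)             # normalization term

    def retrieve(self, q: np.ndarray) -> np.ndarray:
        """q: (n, d_key) -> (n, d_value) readout; zeros while the memory is empty."""
        sq = elu_plus_one(q)
        denom = sq @ self.z
        return (sq @ self.M) / np.maximum(denom, 1e-6)[:, None]

    def update(self, k: np.ndarray, v: np.ndarray) -> None:
        """Fold one segment's keys (n, d_key) and values (n, d_value) into memory."""
        sk = elu_plus_one(k)
        self.M += sk.T @ v
        self.z += sk.sum(axis=0)


def gated_output(a_mem: np.ndarray, a_local: np.ndarray, beta: float) -> np.ndarray:
    """Blend memory readout with local attention output via the gate sigmoid(beta)."""
    g = 1.0 / (1.0 + np.exp(-beta))
    return g * a_mem + (1.0 - g) * a_local


rng = np.random.default_rng(0)
mem = CompressiveMemory(d_key=4, d_value=3)
empty_read = mem.retrieve(rng.normal(size=(2, 4)))  # all zeros: nothing stored yet
k1, v1 = rng.normal(size=(1, 4)), rng.normal(size=(1, 3))
mem.update(k1, v1)
single_read = mem.retrieve(k1)                      # matches v1 (up to rounding) with one stored pair
keys, values = rng.normal(size=(6, 4)), rng.normal(size=(6, 3))
mem.update(keys, values)
mixed_read = mem.retrieve(rng.normal(size=(3, 4)))  # a positively weighted average of stored values
```

Because the memory has a fixed size regardless of how many segments are folded in, older associations are blurred together in `M`, which is one place where the claimed long-context benefits could fail to materialize in practice.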

7 · arXiv · cs.AI · 22d ago

Bounding Compositional Incoherence in Multi-Component LLM Agents

This paper formalizes a failure mode in multi-component LLM agent systems where individual components are locally probabilistically coherent but their composition violates basic probability axioms. The authors introduce the 'compositional residual' (eps*) as a runtime-computable measure of this incoherence, finding it positive in 33–94% of ensemble cliques across 1,876 tested configurations on a four-LLM panel. A hierarchical Boyle-Dykstra projection is proposed as a deterministic repair, and an anytime-valid e-process enables sequential monitoring. Notably, three intuitive LLM-side mitigations—retrieval, partition-aware prompting, and aggregator-LLM—each fail or regress.

Evaluation and Benchmarking · AI Safety Research · Compositional Residual (eps*) · Proportional Allocation Rule · Multi-Component LLM Agent · +4 more
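
The repair step builds on Boyle and Dykstra's alternating-projection algorithm, which finds the closest point in an intersection of convex sets. The stdlib-only sketch below applies a flat, single-level version to one partition. It uses |sum(p) - 1| as a stand-in incoherence measure (our assumption, not the paper's eps*) and projects incoherent component-level probabilities onto the valid distributions {p >= 0, sum(p) = 1}. The paper's hierarchical variant and its e-process monitor are not modeled, and all names are hypothetical.

```python
from typing import List


def incoherence_residual(p: List[float]) -> float:
    """Stand-in measure (not the paper's eps*): distance of a partition's total from 1."""
    return abs(sum(p) - 1.0)


def project_sum_to_one(p: List[float]) -> List[float]:
    """Euclidean projection onto the hyperplane sum(p) == 1."""
    shift = (1.0 - sum(p)) / len(p)
    return [x + shift for x in p]


def project_nonnegative(p: List[float]) -> List[float]:
    """Euclidean projection onto the orthant p >= 0."""
    return [max(x, 0.0) for x in p]


def dykstra_repair(p: List[float], iters: int = 500) -> List[float]:
    """Dykstra's algorithm: nearest point to p in {sum == 1} and {p >= 0}, i.e. the simplex."""
    x = list(p)
    corr_a = [0.0] * len(p)  # correction term carried for the hyperplane projection
    corr_b = [0.0] * len(p)  # correction term carried for the orthant projection
    for _ in range(iters):
        y = project_sum_to_one([xi + ci for xi, ci in zip(x, corr_a)])
        corr_a = [xi + ci - yi for xi, ci, yi in zip(x, corr_a, y)]
        x = project_nonnegative([yi + ci for yi, ci in zip(y, corr_b)])
        corr_b = [yi + ci - xi for yi, ci, xi in zip(y, corr_b, x)]
    return x


# Three components each report one cell of a 3-way partition; together they are incoherent.
reported = [0.9, 0.6, 0.05]
print(incoherence_residual(reported))  # about 0.55
print(dykstra_repair(reported))        # about [0.65, 0.35, 0.0]
```

Unlike simple renormalization (dividing by the total), the projection changes each reported value as little as possible in Euclidean distance, which is what makes it a deterministic, minimal repair.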

6 · arXiv · cs.CL · 15d ago

Phantom specialization in circuit discovery: structural differences don't imply distinct mechanisms

A new arXiv preprint challenges a core assumption in mechanistic interpretability: that structurally different circuits discovered for the same task imply distinct computational mechanisms. Using Literal Sequence Copying across token-frequency bands in five Pythia models (70M–1.4B), the authors extract 75 circuits and show that structurally distinct circuits implement the same computation, with band-specific edges transferring broadly and a shared core recovering ≥99% of circuit performance. The paper introduces the term 'phantom specialization' for this pattern and argues that standard source-level evaluation inflates apparent faithfulness, while edge-level evaluation and cross-condition transfer tests are needed to detect the many-to-one mapping from structure to function.

Evaluation and Benchmarking · AI Safety Research · Pythia · Many Circuits, One Mechanism: Input Variation and Evaluation Granularity in Circuit Discovery
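
A claim like "a shared core recovers ≥99% of circuit performance" rests on two operations: intersecting the circuits' edge sets and scoring a subgraph against a reference. The sketch below uses a normalization common in circuit-discovery work, (subgraph score - fully ablated score) / (reference score - fully ablated score). Whether the paper uses exactly this metric is an assumption, and the edge names and scores are hypothetical.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def shared_core(circuits: Iterable[Set[Edge]]) -> Set[Edge]:
    """Edges present in every circuit (empty set if no circuits are given)."""
    circuits = list(circuits)
    return set.intersection(*circuits) if circuits else set()


def normalized_recovery(sub_score: float, ref_score: float, ablated_score: float) -> float:
    """Fraction of the reference's gain over a fully ablated baseline that the subgraph keeps."""
    return (sub_score - ablated_score) / (ref_score - ablated_score)


# Hypothetical circuits found for two token-frequency bands.
low_band = {("emb", "L0.H3"), ("L0.H3", "L2.H1"), ("L2.H1", "logits"), ("L1.H0", "L2.H1")}
high_band = {("emb", "L0.H3"), ("L0.H3", "L2.H1"), ("L2.H1", "logits"), ("L1.H5", "logits")}
core = shared_core([low_band, high_band])

# Hypothetical task scores: core-only model, band-specific circuit, everything ablated.
print(sorted(core))
print(normalized_recovery(sub_score=0.896, ref_score=0.9, ablated_score=0.1))  # about 0.995
```

Here the two circuits differ structurally (one band-specific edge each), yet the shared three-edge core keeps about 99.5% of the circuit's gain, which is the pattern the paper calls phantom specialization.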

5 · arXiv · cs.CL · 11d ago

Predictor-gated bank-wise sparsity recipe for dense-to-sparse LLM upcycling from Qwen2.5-8B

A new arXiv preprint introduces a continual training recipe to convert dense LLMs into channel-sparse models without post-hoc pruning. Starting from a Qwen2.5-8B checkpoint, the method uses a low-rank predictor to gate FFN channel routing, achieving 4x sparsity in FFN intermediate activations via a bank-wise top-k rule at 32K context. The routing module is trained on the main language modeling path, making the resulting sparsity hardware-oriented rather than approximate. The authors also identify and patch a layer-local long-context failure mode on the RULER-CWE benchmark.

Training Infrastructure · Inference Economics · Continual LLM Upcycling: A Predictor-Gated Bank-Wise Sparsity Training Recipe for Dense-to-Sparse LLMs · SwiGLU · RULER-CWE · +1 more
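
A bank-wise top-k rule can be sketched in a few lines of NumPy: split the SwiGLU intermediate channels into contiguous banks, score every channel with a cheap low-rank predictor, and keep only the top quarter of each bank to get 4× activation sparsity. Bank size, predictor rank, and all names below are hypothetical. For clarity the sketch computes the dense FFN and then masks it, whereas a hardware-oriented kernel would skip the pruned channels entirely; it is not the paper's recipe.

```python
import numpy as np


def silu(x: np.ndarray) -> np.ndarray:
    return x / (1.0 + np.exp(-x))


def bankwise_topk_mask(scores: np.ndarray, bank_size: int, keep_per_bank: int) -> np.ndarray:
    """scores: (tokens, d_ff). Keep the `keep_per_bank` highest-scoring channels in each bank."""
    n_tok, d_ff = scores.shape
    assert d_ff % bank_size == 0, "d_ff must be a multiple of bank_size"
    banks = scores.reshape(n_tok, d_ff // bank_size, bank_size)
    # argpartition on -scores puts the keep_per_bank largest scores first in each bank.
    top = np.argpartition(-banks, keep_per_bank - 1, axis=-1)[..., :keep_per_bank]
    mask = np.zeros(banks.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=-1)
    return mask.reshape(n_tok, d_ff)


def predictor_gated_swiglu(x, w_gate, w_up, w_down, p_down, p_up, bank_size, keep_per_bank):
    """SwiGLU FFN whose intermediate channels are gated by a low-rank predictor."""
    scores = (x @ p_down) @ p_up                 # (tokens, d_ff), rank <= p_down.shape[1]
    mask = bankwise_topk_mask(scores, bank_size, keep_per_bank)
    hidden = silu(x @ w_gate) * (x @ w_up)       # dense here only for readability
    return (hidden * mask) @ w_down, mask


rng = np.random.default_rng(0)
d_model, d_ff, rank, bank = 16, 64, 4, 8
x = rng.normal(size=(5, d_model))
w_gate, w_up = rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_model, d_ff))
w_down = rng.normal(size=(d_ff, d_model))
p_down, p_up = rng.normal(size=(d_model, rank)), rng.normal(size=(rank, d_ff))
y, mask = predictor_gated_swiglu(x, w_gate, w_up, w_down, p_down, p_up, bank, keep_per_bank=bank // 4)
```

Selecting per bank rather than globally guarantees the same number of active channels in every contiguous block, which is what makes the resulting sparsity pattern regular enough for hardware.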

6 · arXiv · cs.LG · 17d ago

Rosetta Neurons follow sublinear power-law scaling with model size, becoming more monosemantic at scale

A new arXiv paper investigates how neuron populations evolve with scale in both language models (up to 30B parameters) and vision models (up to 5B parameters), focusing on 'Rosetta Neurons' — neurons with similar activation patterns across independently trained models. The authors find Rosetta Neurons grow in absolute count but shrink as a fraction of total neurons, and exhibit a 'Neuron Polarization Effect' where they become increasingly monosemantic while non-Rosetta neurons remain less selective. An analytical model explains the sublinear power-law scaling, and the paper demonstrates practical utility via a targeted data-filtering case study for continued pretraining. The results extend scaling laws to neuron-level interpretability structure, linking model size to systematic changes in universality and specialization.

Evaluation and Benchmarking · AI Safety Research · Rosetta Neurons · Neuron Populations Exhibit Divergent Selectivity with Scale · Dravid et al., 2023
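
Why a sublinear power law produces both trends in the Rosetta Neuron summary at once follows from a one-line derivation. Assume, for illustration only (the paper's fitted constants are not reproduced here), that the Rosetta Neuron count follows a power law in the total neuron count N with exponent between 0 and 1:

```latex
\begin{aligned}
N_R(N) &= c\,N^{\alpha}, \qquad 0 < \alpha < 1, \\
\frac{N_R(N)}{N} &= c\,N^{\alpha - 1}, \qquad
\frac{d \log (N_R / N)}{d \log N} = \alpha - 1 < 0 .
\end{aligned}
```

The absolute count grows with N while the fraction shrinks. With a hypothetical α = 0.5, quadrupling N doubles N_R (since 4^0.5 = 2) but halves the fraction (since 4^-0.5 = 1/2).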