Entity · technique

Subspace Projection

technique · active · subspace-projection-473dcb77 · 1 event · first seen May 28, 2026

Aliases: Subspace Projection

Co-occurring entities

Gemma-3-4B-IT · Sparse Autoencoders (SAEs) · Minerva Math · Task Vectors · SAE Specificity Score

More like this (12)

low-rank subspace projection · Recovery Subspace Dimensionality · Orthogonal Residual Projection · Boyle-Dykstra Projection · Routing-Conditioned Projection · Rank-Constrained Subspace Learning (RCSL) · Multimodal Voice Activity Projection · Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders · Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning · SpatialWorld · superposition · Planar Unit Distance Problem

Recent events (1)

arXiv · cs.CL · May 28, 2026

SAEs as Stethoscopes: Interpretability-Guided Layer Selection for Task Vector Model Editing

This paper evaluates a Sparse Autoencoder (SAE)-guided model-editing pipeline for mathematical reasoning on Gemma-3-4B-IT. It finds that projecting task vectors onto SAE feature subspaces discards about 97% of the modification energy, because SAE directions live in activation space while task vectors live in weight space, and the two are geometrically misaligned. The authors therefore reframe SAEs as diagnostic tools ('stethoscopes') rather than intervention filters ('scalpels'): SAE-derived specificity scores are used to choose which layers receive the unfiltered task vectors. This approach raises Number Theory accuracy on Minerva Math from 29.6% to 39.4% (p = 0.0007); 5 of the 7 math subjects improve significantly and none degrade. The method is fully deterministic and adds no inference cost.
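The two ideas in the summary, measuring how much of a task vector survives projection onto a feature subspace and using per-layer scores only to decide where the unfiltered task vector goes, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: vectors are flat lists in one shared space, the subspace is the span of a few given feature directions (for example SAE decoder directions), "energy" is assumed to mean squared L2 norm, and task vectors are taken as given (in task arithmetic they are fine-tuned weights minus base weights). The function names, the scaling factor `alpha`, and the toy layer scores are hypothetical; how the paper computes its specificity score is not described here, so `select_layers` simply takes precomputed scores.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(directions: Sequence[Sequence[float]], eps: float = 1e-10) -> List[Vector]:
    """Modified Gram-Schmidt: orthonormal basis for span(directions).

    Directions that are (numerically) combinations of earlier ones are
    skipped, so the basis has one vector per independent direction.
    """
    basis: List[Vector] = []
    for d in directions:
        v = [float(x) for x in d]
        for q in basis:
            c = dot(v, q)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm > eps:
            basis.append([vi / norm for vi in v])
    return basis


def project(v: Sequence[float], basis: Sequence[Sequence[float]]) -> Vector:
    """Orthogonal projection of v onto span(basis); basis must be orthonormal."""
    out = [0.0] * len(v)
    for q in basis:
        c = dot(v, q)
        out = [oi + c * qi for oi, qi in zip(out, q)]
    return out


def retained_energy(v: Sequence[float], directions: Sequence[Sequence[float]]) -> float:
    """Fraction ||P v||^2 / ||v||^2 of v's squared norm kept by the projection.

    Assumption: "energy" means squared L2 norm; 1 minus this is the discarded share.
    """
    total = dot(v, v)
    if total == 0.0:
        return 0.0
    p = project(v, orthonormal_basis(directions))
    return dot(p, p) / total


def select_layers(specificity: Dict[int, float], k: int) -> List[int]:
    """Hypothetical helper: the k layers with the highest precomputed scores."""
    return sorted(specificity, key=lambda layer: specificity[layer], reverse=True)[:k]


def inject(weights: Dict[int, Vector], task_vectors: Dict[int, Vector],
           layers: Sequence[int], alpha: float = 1.0) -> Dict[int, Vector]:
    """Add alpha * task vector, unfiltered, to the chosen layers only.

    Weights are flattened per layer; all other layers are copied unchanged.
    """
    edited = {layer: list(w) for layer, w in weights.items()}
    for layer in layers:
        edited[layer] = [w + alpha * t for w, t in zip(edited[layer], task_vectors[layer])]
    return edited


if __name__ == "__main__":
    # Toy 4-d example: the vector lies mostly outside a 2-d feature subspace.
    features = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    task_vec = [0.1, 0.0, 1.0, 0.0]
    print(round(retained_energy(task_vec, features), 4))  # 0.0099

    # Toy scores, not from the paper: pick the two highest-scoring layers.
    scores = {10: 0.2, 14: 0.8, 20: 0.5}
    print(select_layers(scores, 2))  # [14, 20]
```

A retained fraction near zero is what the misalignment finding looks like numerically: most of the vector's norm lies in the orthogonal complement of the feature subspace, so filtering the edit through that subspace leaves little of it. Keeping the subspace only as a signal for layer choice, and adding the full task vector at those layers, avoids that loss.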

Evaluation and Benchmarking · AI Safety Research · Subspace Projection · Gemma-3-4B-IT · Sparse Autoencoders (SAEs)