ATWU: Token-level importance learning improves LLM unlearning via retain-conflict criterion
This paper introduces Alternating Token-Weighted Unlearning (ATWU), a framework that learns which tokens in a forget sample are most relevant to unlearning by characterizing their conflict with the retain objective. Rather than relying on auxiliary models or heuristics, ATWU jointly learns token forget-specificity and model parameters using a lightweight linear scorer over hidden states. Evaluated on TOFU and RWKU benchmarks, ATWU achieves state-of-the-art forget-retain trade-offs and produces token-level scores that align with ground-truth forget-specific spans.
Related guides (2)
Related events (8)
MAST: Mechanism-guided selective unlearning for RLVR-trained reasoning models
Researchers introduce MAST (Mechanism-Aligned Selective Targeting), a method for selectively unlearning capabilities induced by reinforcement learning from verifiable rewards (RLVR) in language models while minimizing collateral damage to retained knowledge. The approach ranks attention-projection tensors by off-principal energy and gradient coupling to identify a targeted subset for update, rather than applying full-parameter gradient ascent. Evaluated on Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, MAST achieves statistically significant forgetting on target MATH problems while preserving GSM8K performance, whereas full-parameter unlearning collapses retained capabilities. The method generalizes across seeds and unlearning objectives (NPO/SimNPO).
DelTA: Discriminative Token Credit Assignment for RLVR Training
DelTA introduces a discriminative token credit assignment method for reinforcement learning from verifiable rewards (RLVR) that addresses the problem of high-frequency formatting tokens dominating policy gradient updates. The method estimates per-token coefficients to amplify side-specific gradient directions and downweight shared or weakly discriminative ones, making the effective update direction more contrastive. On seven mathematical benchmarks, DelTA outperforms same-scale baselines by 3.26 and 2.62 average points on Qwen3-8B-Base and Qwen3-14B-Base respectively, with additional gains on code generation tasks.
STARE: Token-level advantage reweighting to prevent entropy collapse in GRPO-style RL training
Researchers introduce STARE, a method addressing policy entropy collapse in GRPO-style reinforcement learning from verifiable rewards (RLVR) for LLM post-training. Through first-order gradient analysis, they identify a token-level credit assignment mismatch and propose selectively reweighting advantages for entropy-critical tokens using batch-internal surprisal quantiles plus a closed-loop entropy gate. Evaluated across 1.5B–32B models on short/long chain-of-thought and multi-turn tool use tasks, STARE outperforms DAPO and other baselines by 4–8% on AIME24/25 while sustaining stable training over thousands of steps.
SETA: Sparse Subspace-to-Expert Sharing for Continual Learning in LLMs
Researchers introduce SETA (Mixture of Sparse Experts for Task Agnostic Continual Learning), a framework addressing catastrophic forgetting in LLMs via adaptive sparse subspace decomposition into task-specific and shared expert modules. The approach uses adaptive elastic anchoring and routing-aware regularization to protect shared knowledge at both weight and routing levels. Experiments on LLaMA-2 7B and Qwen3-4B show competitive or superior performance versus continual learning baselines, with strong retention of early-task knowledge.
Backdoor unlearning in LLMs generalizes across unknown triggers via cross-backdoor transfer
Researchers demonstrate that training an LLM to unlearn a single backdoor trigger can suppress other backdoors that were never explicitly targeted, a phenomenon they call cross-backdoor transfer. The study spans three model families with backdoors injected via pretraining or continual pretraining, and introduces a new metric called Cross Activation Shift Distance to quantify the relationship between different unlearning interventions. The finding opens a potential defensive strategy where defenders deliberately inject and then remove controlled backdoors to suppress unknown attacker-planted backdoors.
CLP: Lightweight collocation-length predictor achieves zero-loss multi-token inference speedup
Researchers propose CLP (Collocation-Length Predictor), a span-level decision layer for accelerating LLM inference via multi-token prediction without quality degradation. The key insight is 'Backbone-as-Architect': the backbone LM head always generates the first token while MTP heads handle only subsequent tokens, eliminating head-backbone competition that causes repetitive outputs in prior methods. CLP uses a single linear layer (~4.6K–7.7K parameters) versus 1M-parameter gate networks in prior work, achieving 1.14x–1.29x speedup on Qwen2.5 models with near-zero repetition ratio. The paper also establishes that shorter prediction horizons improve MTP head accuracy on larger models, offering a scaling-aware design principle.
ToaST: Tokenization with Split Trees Reduces Token Count by 11%+ Over BPE/WordPiece/UnigramLM
ToaST (Tokenization with Split Trees) is a new subword tokenization method that uses a recursive binary split-tree inference procedure and Integer Programming-based vocabulary selection to directly optimize compression. On English text, ToaST reduces token counts by more than 11% compared to BPE, WordPiece, and UnigramLM at vocabulary sizes of 40,960 and above, effectively extending context length for models using it. In 1.5B parameter LM training experiments, ToaST achieves the highest CORE benchmark score, outperforming baselines by 2.6%–7.6% across 22 tasks. The LP relaxation of the vocabulary selection IP is near-integral in practice, yielding provably near-optimal vocabularies.
Continual learning approach for disfluency-aware ASR with explicit disfluency tokens
A new arXiv preprint addresses the challenge of transcribing disfluent speech (hesitations, repetitions, fillers) in ASR systems, which typically omit such markers causing information loss. The authors introduce explicit disfluency tokens into a pretrained ASR model and apply continual learning to adapt across datasets with varying disfluency distributions while mitigating catastrophic forgetting. The work identifies a trade-off between disfluency marker learning and general ASR performance, and finds a consistent cross-attention head mechanism shared across continual learning methods.

