SETA

technique · active · provisional · seta-dae69119 · 1 event · first seen 9d ago

Aliases: SETA

Recent events (1)

4 · arXiv · cs.LG · 9d ago

SETA: Sparse Subspace-to-Expert Sharing for Continual Learning in LLMs

Researchers introduce SETA (Mixture of Sparse Experts for Task Agnostic Continual Learning), a framework that addresses catastrophic forgetting in LLMs by applying an adaptive sparse subspace decomposition that yields task-specific and shared expert modules. Adaptive elastic anchoring and routing-aware regularization protect shared knowledge at both the weight level and the routing level. In experiments on LLaMA-2 7B and Qwen3-4B, SETA performs competitively with or better than continual learning baselines and retains early-task knowledge well.
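
The summary does not give SETA's actual loss terms, so the following is only a generic sketch of what protecting knowledge "at both weight and routing levels" could look like. It assumes a quadratic pull of shared weights toward anchored values (in the style of elastic weight penalties) and a KL penalty on router drift. Every function name, parameter, and coefficient here is hypothetical and not taken from the paper.

```python
import math


def elastic_anchor_penalty(weights, anchors, importance, strength=1.0):
    """Weight-level term (assumed form): importance-weighted squared
    distance between current shared weights and their anchored values."""
    return strength * sum(
        imp * (w - a) ** 2 for w, a, imp in zip(weights, anchors, importance)
    )


def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routing_drift_penalty(old_logits, new_logits):
    """Routing-level term (assumed form): KL(old || new) between the
    router's expert distributions before and after the new task."""
    p = softmax(old_logits)
    q = softmax(new_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def regularized_loss(task_loss, weights, anchors, importance,
                     old_logits, new_logits, lam_w=0.1, lam_r=0.1):
    """Task loss plus both penalties; lam_w and lam_r are illustrative."""
    return (
        task_loss
        + lam_w * elastic_anchor_penalty(weights, anchors, importance)
        + lam_r * routing_drift_penalty(old_logits, new_logits)
    )
```

In this sketch both penalties vanish when the shared weights and the routing distribution are unchanged, and grow as a new task pushes either one away from its earlier state.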