paper
Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models
paper · active · provisional
tying-the-loop-tied-expert-layers-in-mixture-of-experts-language-models-bf517d85 · 1 event · first seen 25h ago
More like this (12)
Sparse Mixture-of-Experts
Mixture of Experts
From Observation to Intervention: A Causal Audit of Expert Importance in Mixture-of-Experts Models
Redesign Mixture-of-Experts Routers with Manifold Power Iteration
Expert Tying
Expert-Aware Causal Tracing of Factual Recall in Sparse MoE Language Models
Knowledge Editing in Masked Diffusion Language Models
From Self-Supervised Speech Models to Mixture-of-Experts for Robust Anti-Spoofing
Layer-Adaptive Expert Pruning
Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning
Latent Context Language Models
Generative Explainability for Next-Generation Networks: LLM-Augmented XAI with Mutual Feature Interactions
Recent events (1)
Expert Tying reduces MoE LLM memory footprint by ~2x with minimal quality loss
Researchers introduce Expert Tying, an architectural modification for Mixture-of-Experts (MoE) LLMs. It shares expert parameters across consecutive transformer layers, while each layer keeps its own router and attention. The method was evaluated on OLMoE, Qwen3, and DeepSeek-style MoE architectures. It reduces the memory footprint by nearly 2x, with negligible degradation in perplexity or downstream task quality. The approach exploits parameter redundancy in the MoE expert pathways to improve the compute-to-memory trade-off for both training and inference.
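The back-of-the-envelope count below illustrates why tying experts across consecutive layers yields a reduction that is "nearly" rather than exactly 2x. It is a minimal sketch, not the authors' implementation, and it rests on several assumptions. Layers are tied in pairs: layer i reads from expert bank i // 2. Each expert is a plain up/down MLP. Biases, norms, and embeddings are ignored. The sizes in `MoEConfig` are hypothetical and do not match the OLMoE, Qwen3, or DeepSeek configurations. The names `MoEConfig`, `bank_index_per_layer`, and `param_counts` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    # Hypothetical sizes chosen for illustration only; they are NOT the
    # configurations of OLMoE, Qwen3, or DeepSeek models.
    n_layers: int = 16
    d_model: int = 2048
    d_ff: int = 1024      # hidden width of each expert MLP (assumed)
    n_experts: int = 64
    tie_group: int = 1    # 1 = untied baseline; 2 = each consecutive layer pair shares one expert bank


def bank_index_per_layer(cfg: MoEConfig) -> list[int]:
    # Layer i reads its experts from bank i // tie_group, so layers 0 and 1
    # share bank 0, layers 2 and 3 share bank 1, and so on (assumed grouping).
    return [i // cfg.tie_group for i in range(cfg.n_layers)]


def param_counts(cfg: MoEConfig) -> dict[str, int]:
    # Attention: Q, K, V, O projections, kept separate in every layer.
    attn_per_layer = 4 * cfg.d_model * cfg.d_model
    # Router: one linear gate per layer, also kept separate.
    router_per_layer = cfg.d_model * cfg.n_experts
    # One expert bank: n_experts MLPs, each with an up and a down projection
    # (biases, norms, and embeddings are ignored in this sketch).
    bank = cfg.n_experts * 2 * cfg.d_model * cfg.d_ff
    # Only distinct banks occupy memory; tied layers point at the same bank.
    n_banks = len(set(bank_index_per_layer(cfg)))
    return {
        "attention": cfg.n_layers * attn_per_layer,
        "routers": cfg.n_layers * router_per_layer,
        "experts": n_banks * bank,
    }


untied = param_counts(MoEConfig(tie_group=1))
tied = param_counts(MoEConfig(tie_group=2))
ratio = sum(untied.values()) / sum(tied.values())
print(f"{ratio:.2f}x")  # 1.89x
```

In this toy configuration, pairwise tying halves the expert weights exactly. Attention and router weights stay per-layer, so the overall ratio lands just under 2x. The closer a model's parameter budget is to being dominated by experts, the closer the ratio gets to 2.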