Entity · paper

Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models

paper · active · tying-the-loop-tied-expert-layers-in-mixture-of-experts-language-models-bf517d85 · 1 event · first seen Jun 16, 2026

Aliases: Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models

Co-occurring entities

DeepSeek V4 · Expert Tying · OLMoE · Qwen3

More like this (12)

Sparse Mixture-of-Experts
Mixture of Experts
Mixture of Efficient Experts
From Observation to Intervention: A Causal Audit of Expert Importance in Mixture-of-Experts Models
Toward Calibrated Mixture-of-Experts Under Distribution Shift
Redesign Mixture-of-Experts Routers with Manifold Power Iteration
Expert Tying
Expert-Aware Causal Tracing of Factual Recall in Sparse MoE Language Models
Exposure is Optional: Learning Unlike Coordination in Language Models
Knowledge Editing in Masked Diffusion Language Models
From Self-Supervised Speech Models to Mixture-of-Experts for Robust Anti-Spoofing
Do LLM Embedding Spaces Recover Expert Structure?

Recent events (1)

6 · arXiv · cs.CL · Jun 16, 2026

Expert Tying reduces MoE LLM memory footprint by ~2x with minimal quality loss

Researchers introduce Expert Tying, an architectural modification for Mixture-of-Experts (MoE) LLMs that shares expert parameters across consecutive transformer layers while each layer keeps its own router and attention. Evaluated on OLMoE, Qwen3, and DeepSeek-style MoE architectures, the method reduces memory by nearly 2x with negligible degradation in perplexity or downstream quality. The approach exploits parameter redundancy in MoE pathways to improve the compute-to-memory trade-off for both training and inference.
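To make the memory claim concrete, here is a minimal parameter-counting sketch in Python. It is not the paper's implementation. It assumes that tying means each group of consecutive layers shares one bank of expert weights, while every layer keeps its own router and attention. The names `MoEConfig`, `tie_group`, `expert_bank_for_layer`, and `total_params`, and all model sizes, are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    # All sizes are illustrative assumptions, not figures from the paper.
    n_layers: int = 16
    d_model: int = 2048
    d_ff: int = 1024      # hidden width of one expert
    n_experts: int = 64
    tie_group: int = 2    # hypothetical: consecutive layers sharing one expert bank


def expert_params_per_bank(cfg: MoEConfig) -> int:
    # Assumes a gated feed-forward expert with three d_model x d_ff matrices.
    return cfg.n_experts * 3 * cfg.d_model * cfg.d_ff


def router_params_per_layer(cfg: MoEConfig) -> int:
    # One linear map from the hidden state to a score per expert.
    return cfg.d_model * cfg.n_experts


def attention_params_per_layer(cfg: MoEConfig) -> int:
    # Q, K, V, and output projections, ignoring biases and head grouping.
    return 4 * cfg.d_model * cfg.d_model


def expert_bank_for_layer(layer_idx: int, cfg: MoEConfig) -> int:
    # Layers 0..tie_group-1 use bank 0, the next group uses bank 1, and so on.
    return layer_idx // cfg.tie_group


def total_params(cfg: MoEConfig, tied: bool) -> int:
    # Tied: one expert bank per group of layers. Untied: one bank per layer.
    n_banks = -(-cfg.n_layers // cfg.tie_group) if tied else cfg.n_layers
    per_layer = router_params_per_layer(cfg) + attention_params_per_layer(cfg)
    return n_banks * expert_params_per_bank(cfg) + cfg.n_layers * per_layer


if __name__ == "__main__":
    cfg = MoEConfig()
    untied = total_params(cfg, tied=False)
    tied = total_params(cfg, tied=True)
    print(f"untied: {untied:,} params, tied: {tied:,} params")
    print(f"reduction: {untied / tied:.2f}x")
```

Under these assumptions the reduction stays slightly below 2x, because the per-layer routers and attention blocks are not shared. Only the expert banks, which dominate the parameter count, are halved.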

Training Infrastructure · Frontier Model Releases · DeepSeek V4 · Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models · Expert Tying