Entity · technique

Expert Tying

technique · active · expert-tying-7cee5802 · 1 event · first seen Jun 16, 2026

Aliases: Expert Tying

Co-occurring entities

DeepSeek V4 · Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models · OLMoE · Qwen3

More like this (12)

Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models · TIES-Merging · TIES · Mixture of Efficient Experts · It Takes a MAESTRO To Prune Bad Experts · Expert Token Rank · Layer-Adaptive Expert Pruning · Mixture of Experts · CARE (Confidence-Adaptive Routing of Experts) · Flow Matching · TTT-E2E · Test-Time Finetuning (TTFT)

Recent events (1)

6 · arXiv · cs.CL · Jun 16, 2026

Expert Tying reduces MoE LLM memory footprint by ~2x with minimal quality loss

Researchers introduce Expert Tying, an architectural modification for Mixture-of-Experts (MoE) LLMs that shares expert parameters across consecutive transformer layers while each layer keeps its own router and attention weights. Evaluated on OLMoE, Qwen3, and DeepSeek-style MoE architectures, the method reduces memory footprint by nearly 2x with negligible degradation in perplexity or downstream quality. The approach exploits parameter redundancy in MoE pathways to improve the compute-to-memory trade-off for training and inference.
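To see why sharing experts across consecutive layers lands at "nearly 2x" rather than exactly 2x, a rough parameter count helps. The sketch below is only an illustration under stated assumptions, not the paper's accounting: the `MoEConfig` fields, the two-matrix expert FFN, the four square attention projections, and every dimension in the example are hypothetical, and embeddings and normalization weights are ignored.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    # All fields are hypothetical illustration values, not taken from any real model.
    n_layers: int
    d_model: int
    d_ff: int            # hidden width of one expert
    n_experts: int
    tie_group: int = 1   # consecutive layers sharing one expert bank; 1 means untied


def param_counts(cfg: MoEConfig) -> dict:
    """Count parameters, assuming experts are tied in groups of consecutive layers."""
    # Per-layer weights that stay layer-specific under tying:
    # four d_model x d_model attention projections plus a linear router.
    attention = 4 * cfg.d_model * cfg.d_model
    router = cfg.d_model * cfg.n_experts
    per_layer = cfg.n_layers * (attention + router)

    # One expert bank: each expert is assumed to be an up- and a down-projection.
    bank = cfg.n_experts * 2 * cfg.d_model * cfg.d_ff
    n_banks = -(-cfg.n_layers // cfg.tie_group)  # ceiling division
    experts = n_banks * bank

    return {"per_layer": per_layer, "experts": experts, "total": per_layer + experts}


base = dict(n_layers=16, d_model=2048, d_ff=1024, n_experts=64)
untied = param_counts(MoEConfig(**base, tie_group=1))
tied = param_counts(MoEConfig(**base, tie_group=2))

print(f"expert reduction: {untied['experts'] / tied['experts']:.2f}x")
print(f"total reduction:  {untied['total'] / tied['total']:.2f}x")
```

Under these assumptions the expert weights shrink by exactly 2x when pairs of layers share a bank, while the total shrinks by somewhat less, because the per-layer routers and attention weights are not shared. The more the experts dominate the parameter budget, the closer the total gets to 2x.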

Training Infrastructure · Frontier Model Releases · DeepSeek V4 · Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models · Expert Tying