Entity · model

OLMoE

model · active · olmoe-f2d1e1ab · 1 event · first seen Jun 16, 2026

Aliases: OLMoE

Co-occurring entities

DeepSeek V4 · Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models · Expert Tying · Qwen3

More like this (12)

OLMo2 · OLMo · OLMo-3 · MobileMoE · MoE²-LoRA · OLMo-1B · OLMoE-1B-7B · OLMoE-1B-7B-0924 · Localized LoRA-MoE · LatentMoE · Stable LatentMoE · SegMoE

Recent events (1)

6 · arXiv · cs.CL · Jun 16, 2026

Expert Tying reduces MoE LLM memory footprint by ~2x with minimal quality loss

Researchers introduce Expert Tying, an architectural modification for Mixture-of-Experts (MoE) LLMs that shares expert parameters across consecutive transformer layers while keeping routing and attention layer-independent. Evaluated on OLMoE, Qwen3, and DeepSeek-style MoE architectures, the method reduces memory by nearly 2x with negligible degradation in perplexity or downstream quality. The approach exploits parameter redundancy in MoE pathways to improve the compute-to-memory trade-off for both training and inference.

Training Infrastructure · Frontier Model Releases · DeepSeek V4 · Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models · Expert Tying
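To make the Expert Tying summary above concrete, here is a minimal parameter-accounting sketch, not the paper's implementation. It assumes tying groups of two consecutive layers, SwiGLU-style experts (gate, up, and down projections), bias-free attention, and one linear router per layer. The sizes, the names `ExpertBank`, `MoELayer`, `build_stack`, `count_params`, and the `tie_group` parameter are all hypothetical. Embeddings and norms are omitted. Tied layers hold a reference to the same expert-bank object, while each layer keeps its own attention and router.

```python
from dataclasses import dataclass


@dataclass
class ExpertBank:
    # Hypothetical stand-in for the FFN weights of all experts in one bank
    # (assumed SwiGLU: gate, up, and down projections per expert).
    n_experts: int
    d_model: int
    d_ff: int

    def num_params(self) -> int:
        return self.n_experts * 3 * self.d_model * self.d_ff


@dataclass
class MoELayer:
    attn_params: int     # per-layer attention (Q, K, V, O), never shared
    router_params: int   # per-layer router (d_model x n_experts), never shared
    experts: ExpertBank  # may be the very same object as a neighbouring layer's


def build_stack(n_layers, d_model, d_ff, n_experts, tie_group=1):
    """Build layers where every `tie_group` consecutive layers share one ExpertBank.

    tie_group=1 gives the untied baseline.
    """
    layers = []
    bank = None
    for i in range(n_layers):
        if i % tie_group == 0:
            # A new expert bank starts at the first layer of each tie group.
            bank = ExpertBank(n_experts, d_model, d_ff)
        layers.append(MoELayer(4 * d_model * d_model, d_model * n_experts, bank))
    return layers


def count_params(layers) -> int:
    """Count unique parameters, counting each shared expert bank only once."""
    seen = set()
    total = 0
    for layer in layers:
        total += layer.attn_params + layer.router_params
        if id(layer.experts) not in seen:
            seen.add(id(layer.experts))
            total += layer.experts.num_params()
    return total


# Hypothetical sizes: 16 layers, d_model=2048, expert d_ff=1024, 64 experts.
untied = build_stack(16, 2048, 1024, 64, tie_group=1)
tied = build_stack(16, 2048, 1024, 64, tie_group=2)
ratio = count_params(untied) / count_params(tied)
print(f"{ratio:.2f}")  # prints 1.92
```

Because attention and routers stay per-layer, only the expert share of the parameters halves, so the ratio lands slightly below 2. This illustrates why the saving is described as nearly, rather than exactly, 2x. The more of a model's parameters sit in its experts, the closer the ratio gets to 2.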