Almanac
model

OLMoE

model · active · provisional · olmoe-f2d1e1ab · 1 event · first seen 26h ago

Aliases: OLMoE

Recent events (1)

6 · arXiv · cs.CL · 26h ago

Expert Tying reduces MoE LLM memory footprint by ~2x with minimal quality loss

Researchers introduce Expert Tying, an architectural modification for Mixture-of-Experts (MoE) LLMs that shares expert parameters across consecutive transformer layers while each layer keeps its own router and attention weights. Evaluated on OLMoE, Qwen3, and DeepSeek-style MoE architectures, the method reduces memory by nearly 2x with negligible degradation in perplexity or downstream quality. The approach exploits parameter redundancy in MoE pathways to improve the compute-to-memory trade-off for training and inference.
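
To see why sharing expert weights between consecutive layers can approach a 2x saving, the following is a rough parameter-count sketch, not the authors' implementation. Every size in it (layer count, widths, expert count), the gated-FFN expert shape, the pairing of layers into groups of two, and the names MoEConfig, param_counts, total_params, and bank_index are assumptions made for illustration. Embeddings and norms are left out.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    # Hypothetical sizes, chosen only for illustration (not from the paper).
    n_layers: int = 16
    d_model: int = 2048
    d_ff: int = 1024
    n_experts: int = 64
    tie_group: int = 1  # consecutive layers sharing one expert bank; 1 = untied


def bank_index(layer: int, tie_group: int) -> int:
    """Map a layer to the expert bank it reads from."""
    return layer // tie_group


def param_counts(cfg: MoEConfig) -> dict:
    """Count weights per component; routers and attention stay per-layer."""
    attn_per_layer = 4 * cfg.d_model * cfg.d_model    # Q, K, V, O projections
    router_per_layer = cfg.d_model * cfg.n_experts    # one linear router
    per_expert = 3 * cfg.d_model * cfg.d_ff           # gated FFN: assumed shape
    n_banks = -(-cfg.n_layers // cfg.tie_group)       # ceiling division
    return {
        "attention": cfg.n_layers * attn_per_layer,
        "router": cfg.n_layers * router_per_layer,
        "experts": n_banks * cfg.n_experts * per_expert,
    }


def total_params(cfg: MoEConfig) -> int:
    return sum(param_counts(cfg).values())


untied = MoEConfig(tie_group=1)
tied = MoEConfig(tie_group=2)  # assumption: experts shared by pairs of layers
ratio = total_params(untied) / total_params(tied)

if __name__ == "__main__":
    print(f"untied: {total_params(untied):,} weights")
    print(f"tied:   {total_params(tied):,} weights")
    print(f"reduction: {ratio:.2f}x")
```

Under these assumed sizes the expert banks dominate the weight count, so halving the number of banks brings the total close to, but not quite, half. The per-layer routers and attention are untouched, which is why the reduction stays slightly below 2x.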