technique

Mixture of Efficient Experts

technique · active · provisional · mixture-of-efficient-experts-896d8074 · 1 event · first seen 2d ago

Aliases: Mixture of Efficient Experts

Co-occurring entities

CHERRY-1.8B · CHERRY: Compressed Hierarchical Experts with Recurrent Representational Yield · Selective Ground Truth Token Training

More like this (12)

Sparse Mixture-of-Experts
Mixture of Experts Redesign
Mixture-of-Experts Routers with Manifold Power Iteration
Toward Calibrated Mixture-of-Experts Under Distribution Shift
Tying the Loop -- Tied Expert Layers in Mixture-of-Experts Language Models
From Observation to Intervention: A Causal Audit of Expert Importance in Mixture-of-Experts Models
Mixture-of-Agents
Greedy Ensemble Selection
From Self-Supervised Speech Models to Mixture-of-Experts for Robust Anti-Spoofing
Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning
Expert Tying
Layer-Adaptive Expert Pruning

Recent events (1)

5 · arXiv · cs.CL · 2d ago

CHERRY: Compressed Hierarchical Experts with Recurrent Representational Yield — three compute-efficient LM training techniques

An arXiv preprint introduces CHERRY, a suite of three complementary techniques for compute-efficient language model training. Selective Ground Truth Token Training (SGT) concentrates supervision on ~15% of semantically loaded tokens while recovering ~67% of the full-sequence loss reduction. Depth compression shrinks a 48-layer, 1B-parameter model to 6 layers (227M parameters) via layer averaging and recurrent unrolling, matching the loss of a 566M-parameter dense model. A Mixture of Efficient Experts (MoEE) assembly outperforms individual compressed models at comparable active parameter counts. The techniques are validated on CHERRY-1.8B, a Korean-language foundation model trained entirely from scratch with these methods. The authors state the scope limitations explicitly: one model family, Korean data, and loss-based metrics only.
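To make two of the ideas in this summary concrete, here is a minimal pure-Python sketch, not the preprint's implementation. It assumes that SGT amounts to averaging the loss over the top-scoring fraction of tokens, and that depth compression averages consecutive groups of layers and then applies each averaged layer once per layer it replaced. The function names, the per-token `scores`, the flat weight lists, and the `apply_layer` callback are all hypothetical; the summary does not say how tokens are scored or how the unrolling is scheduled.

```python
def selective_token_loss(token_losses, scores, keep_frac=0.15):
    """Mean loss over the top `keep_frac` of tokens ranked by `scores`.

    `scores` is a hypothetical per-token importance value; how tokens are
    actually judged "semantically loaded" is not given in the summary.
    """
    if len(token_losses) != len(scores):
        raise ValueError("token_losses and scores must have equal length")
    n = len(token_losses)
    k = max(1, round(keep_frac * n))  # number of supervised tokens
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])
    return sum(token_losses[i] for i in kept) / k, kept


def average_layers(layers, n_groups):
    """Average consecutive, equal-sized groups of layer weight vectors.

    Returns the merged layers and the group size, which is reused as the
    number of recurrent applications per merged layer (an assumption).
    """
    if n_groups <= 0 or len(layers) % n_groups != 0:
        raise ValueError("layer count must be divisible by n_groups")
    size = len(layers) // n_groups
    merged = []
    for g in range(n_groups):
        group = layers[g * size:(g + 1) * size]
        merged.append([sum(col) / size for col in zip(*group)])
    return merged, size


def unrolled_forward(x, merged, repeats, apply_layer):
    """Apply each merged layer `repeats` times in sequence."""
    for weights in merged:
        for _ in range(repeats):
            x = apply_layer(x, weights)
    return x


# Toy usage: 48 one-parameter "layers" compressed to 6, each unrolled 8 times.
toy_layers = [[float(i)] for i in range(48)]
merged_layers, repeats = average_layers(toy_layers, 6)
depth = unrolled_forward(0, merged_layers, repeats, lambda x, w: x + 1)

# Toy usage: supervise 15% of 20 tokens (3 tokens) with the highest scores.
loss, kept = selective_token_loss(
    [float(i) for i in range(20)], list(range(20))
)
```

In this toy setup the 6 stored layers still perform 48 layer applications, which is the sense in which averaging plus unrolling trades parameter count for repeated compute.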

Training Infrastructure Open Weights Progress · CHERRY-1.8B · CHERRY: Compressed Hierarchical Experts with Recurrent Representational Yield · Selective Ground Truth Token Training · +2 more