CHERRY-1.8B
CHERRY: Compressed Hierarchical Experts with Recurrent Representational Yield — three compute-efficient LM training techniques
A preprint on arXiv introduces CHERRY, a suite of three complementary techniques for compute-efficient language model training. The first is Selective Ground Truth Token Training (SGT), which concentrates supervision on the ~15% of tokens that are semantically loaded while recovering ~67% of the loss reduction achieved by full-sequence training. The second is depth compression, which shrinks a 48-layer, 1B-parameter model to 6 layers (227M parameters) via layer averaging and recurrent unrolling, matching the loss of a 566M-parameter dense model. The third is a Mixture of Efficient Experts (MoEE) assembly that outperforms the individual compressed models at a comparable number of active parameters. The techniques are validated on CHERRY-1.8B, a Korean-language foundation model trained entirely from scratch with these methods. The authors state the scope limitations explicitly: a single model family, Korean-language data, and loss-based metrics only.
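The summary does not say how SGT chooses its tokens or how the loss is normalized. The following is a minimal sketch, assuming the selection arrives as a precomputed boolean mask and that cross-entropy is averaged over the selected positions only; `selective_token_loss` and its arguments are hypothetical names, not the authors' API.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def selective_token_loss(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    keep_mask: Sequence[bool],
) -> float:
    """Mean next-token cross-entropy over positions where keep_mask is True.

    Hypothetical SGT-style loss (assumption): unselected positions contribute
    nothing, so only the selected tokens affect the loss value and, in a real
    autograd framework, its gradient.
    """
    total, count = 0.0, 0
    for row, target, keep in zip(logits, targets, keep_mask):
        if keep:
            total -= log_softmax(row)[target]
            count += 1
    return total / count if count else 0.0


# Usage: 4-token vocabulary, 3 positions, supervise positions 0 and 2 only.
logits = [[2.0, 0.1, 0.1, 0.1], [0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 3.0, 0.1]]
targets = [0, 3, 2]
print(selective_token_loss(logits, targets, [True, False, True]))
# An all-True mask reduces to ordinary full-sequence cross-entropy.
print(selective_token_loss(logits, targets, [True, True, True]))
```

Passing an all-True mask recovers the full-sequence baseline, so the same function can compute both sides of the ~67% comparison under this sketch's assumptions.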
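The summary names layer averaging and recurrent unrolling without giving details. As a hedged illustration only, the sketch below assumes that the 48 layers are split into 6 consecutive groups of 8, that each group's weights are averaged element-wise into one shared layer, and that each shared layer is then applied 8 times in a row. A one-matrix tanh layer stands in for a transformer block; `Layer`, `compress_depth`, and `forward_unrolled` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    """Toy stand-in for a transformer block: h -> tanh(W h + b)."""
    W: List[List[float]]
    b: List[float]

    def __call__(self, h: List[float]) -> List[float]:
        return [
            math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
            for row, bias in zip(self.W, self.b)
        ]


def average_layers(layers: List[Layer]) -> Layer:
    """Element-wise mean of the weights of several same-shaped layers."""
    n = len(layers)
    rows, cols = len(layers[0].W), len(layers[0].W[0])
    W = [[sum(layer.W[i][j] for layer in layers) / n for j in range(cols)] for i in range(rows)]
    b = [sum(layer.b[i] for layer in layers) / n for i in range(rows)]
    return Layer(W, b)


def compress_depth(layers: List[Layer], target_depth: int) -> Tuple[List[Layer], int]:
    """Average consecutive groups of layers (assumed grouping).

    Returns the shared layers and the per-layer unroll count (group size).
    """
    if len(layers) % target_depth != 0:
        raise ValueError("layer count must be divisible by target_depth")
    group = len(layers) // target_depth
    shared = [average_layers(layers[k * group:(k + 1) * group]) for k in range(target_depth)]
    return shared, group


def forward_unrolled(shared: List[Layer], unroll: int, h: List[float]) -> List[float]:
    """Apply each shared layer `unroll` times in sequence (assumed unrolling scheme)."""
    for layer in shared:
        for _ in range(unroll):
            h = layer(h)
    return h


# Usage: compress a toy 48-layer stack (2-dim hidden state) to 6 shared layers.
deep = [Layer([[0.1 * (k % 3), 0.0], [0.0, 0.05]], [0.01 * k, -0.01 * k]) for k in range(48)]
shared, unroll = compress_depth(deep, target_depth=6)
print(len(shared), unroll)
print(forward_unrolled(shared, unroll, [1.0, -1.0]))
```

Under these assumptions the compressed stack stores one-eighth of the per-layer weights while still running 48 layer applications per forward pass; the grouping and unroll count actually used for CHERRY may differ.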