Generalized Distillation

technique · active · provisional · generalized-distillation-77cf246f · 1 event · first seen 14d ago

Aliases: Generalized Distillation

Recent events (1)

5 · arXiv · cs.LG · 14d ago

Sleep paradigm for LLMs enables continual learning and memory consolidation via distillation and RL

A new arXiv preprint proposes a 'Sleep' paradigm for language models that enables continual learning by consolidating short-term in-context memories into long-term parameters. The framework has two stages: Knowledge Seeding (distilling a smaller model's memories into a larger network via on-policy distillation combined with RL-based imitation learning) and Dreaming (self-improvement via RL-generated synthetic curricula without human supervision). Experiments cover long-horizon tasks, continual learning, knowledge incorporation, and few-shot generalization, addressing a known weakness of current LLMs in retaining temporal knowledge across contexts.
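To make the "on-policy distillation" step in the summary more concrete, here is a minimal sketch of a generic on-policy distillation objective: the student samples its own continuation, and each sampled step is scored by the reverse KL divergence between the student's and the teacher's next-token distributions. This is a common textbook formulation, not the preprint's actual method, which the summary does not specify. All names (`on_policy_distill_loss`, `toy_teacher`, `toy_student`, `VOCAB`) are hypothetical, and the toy models are stand-ins for real networks.

```python
import math
import random

VOCAB = 4  # hypothetical toy vocabulary size


def softmax(logits):
    """Convert a list of logits to probabilities (numerically stabilized)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) for a single token position."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    return sum(s * (math.log(s) - math.log(t))
               for s, t in zip(p_s, p_t) if s > 0)


def on_policy_distill_loss(student_step, teacher_step, prompt, max_len, rng):
    """Sample a continuation from the student and average the per-step
    reverse KL against the teacher on the student's own trajectory."""
    tokens = list(prompt)
    losses = []
    for _ in range(max_len):
        s_logits = student_step(tokens)
        t_logits = teacher_step(tokens)
        losses.append(reverse_kl(s_logits, t_logits))
        # On-policy: the next token is drawn from the student, not the teacher.
        probs = softmax(s_logits)
        next_tok = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(next_tok)
    return sum(losses) / len(losses), tokens


def toy_teacher(tokens):
    """Assumed teacher: strongly prefers (last token + 1) mod VOCAB."""
    last = tokens[-1]
    return [3.0 if i == (last + 1) % VOCAB else 0.0 for i in range(VOCAB)]


def toy_student(tokens):
    """Assumed untrained student: uniform over the vocabulary."""
    return [0.0] * VOCAB


rng = random.Random(0)
loss, sampled = on_policy_distill_loss(toy_student, toy_teacher, [0], 5, rng)
print(f"mean reverse KL: {loss:.4f}, sampled tokens: {sampled}")
```

In a real training loop the loss would be backpropagated through the student's logits; the sketch only computes the scalar. The RL-based imitation component and the Dreaming stage mentioned in the summary are not modeled here.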