technique
Skill-Conditioned Gated Self-Distillation (SGSD)
technique · active · provisional
skill-conditioned-gated-self-distillation-sgsd--e5b709a5 · 1 event · first seen 20d ago
Aliases: Skill-Conditioned Gated Self-Distillation (SGSD)
More like this (12)
SGSD · Self-Distillation · on-policy self-distillation · ZEDA (Zero-Expert Self-Distillation Adaptation) · Goal-Conditioned Reinforcement Learning · Skill-RM · State-Conditioned Dynamic Steering · Learning from the Self-future: On-policy Self-distillation for dLLMs · Semantic Generative Tuning (SGT) · task-conditioned generation · Self-Flow · SkillGenBench
Recent events (1)
Skill-Conditioned Gated Self-Distillation (SGSD) for LLM Reasoning
SGSD is a new on-policy self-distillation (OPSD) method for LLM reasoning that replaces trusted privileged information (e.g., reference answers) with an experience-derived skill bank of skill-mistake pairs. It constructs a multi-teacher pool, uses a verifier to validate each teacher's contribution, and applies a gated objective that distills informative disagreements while suppressing noisy signals. On Qwen3-1.7B, SGSD outperforms GRPO by 6.2% and answer-conditioned OPSD by 1.7% on average across AIME24, AIME25, and HMMT25. Because it does not assume trusted privileged information, the method makes self-distillation more practical under weaker supervision.
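The summary does not give the exact form of the gated objective, so the following is only a minimal sketch of the general idea: keep a teacher's signal when a verifier accepts it and it disagrees meaningfully with the student, and drop it otherwise. The function names, the use of KL(teacher || student) as the disagreement measure, the `min_gap` threshold, and the simple averaging are all assumptions made for illustration, not details taken from SGSD.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    Assumes every entry of q is strictly positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def gated_distillation_loss(student, teachers, verified, min_gap=0.05):
    """Average KL(teacher || student) over the teachers that pass the gate.

    student  : student next-token distribution (list of probabilities)
    teachers : one distribution per skill-conditioned teacher (hypothetical pool)
    verified : verified[k] is True if a verifier accepted teacher k's output
    min_gap  : hypothetical threshold; smaller disagreements are ignored
    """
    kept = []
    for dist, ok in zip(teachers, verified):
        gap = kl_divergence(dist, student)
        # Gate: keep only verified teachers whose disagreement is informative.
        if ok and gap >= min_gap:
            kept.append(gap)
    # With no teacher passing the gate there is nothing to distill.
    return sum(kept) / len(kept) if kept else 0.0

student = [0.5, 0.3, 0.2]
teachers = [
    [0.8, 0.1, 0.1],  # verified and disagrees with the student: kept
    [0.1, 0.1, 0.8],  # rejected by the verifier: suppressed
    [0.5, 0.3, 0.2],  # verified but identical to the student: no signal
]
verified = [True, False, True]
loss = gated_distillation_loss(student, teachers, verified)
```

In this toy setup only the first teacher contributes, so the loss equals its KL divergence from the student. A real training loop would compute the same quantity per token from model logits and backpropagate through the student only.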