technique
Rubric-Conditioned Self-Distillation
technique · active · provisional
rubric-conditioned-self-distillation-1f31ef49 · 1 event · first seen 2d ago
Aliases: Rubric-Conditioned Self-Distillation
Co-occurring entities
More like this (12)
Skill-Conditioned Gated Self-Distillation (SGSD)
Rubric Reward
rubric-based reward shaping
on-policy self-distillation
The Role of Feedback Alignment in Self-Distillation
Self-Distillation
rubric-based rewards
self-refinement
Preference-Aware Rubric Learning
Chain-of-Thought Self-Consistency
DeepRubric
Learning from the Self-future: On-policy Self-distillation for dLLMs
Recent events (1)
Rubric-Conditioned Self-Distillation: structured feedback for reasoning model post-training
A new arXiv preprint proposes Rubric-Conditioned Self-Distillation (RCSD), a post-training framework that replaces scalar reward signals and noisy chain-of-thought annotations with structured rubrics for fine-grained credit assignment. The method conditions a teacher model on criterion-level rubrics to provide token-level guidance on the student's own sampled trajectories, avoiding reliance on a single reference rationale. Evaluated on science reasoning benchmarks, RCSD outperforms GRPO by 1.0 points and OPSD by 0.9 points on average.
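
The token-level guidance described above can be illustrated with a minimal sketch. It is not the preprint's implementation: the summary does not state the loss, so the per-token reverse KL divergence used here, the prompt layout, and all names (`build_teacher_prompt`, `token_level_kl`, `distillation_loss`) are assumptions chosen for illustration. The idea shown is that the teacher sees the rubric criteria in its context, scores the student's own sampled tokens, and each position receives its own training signal rather than one scalar reward for the whole trajectory.

```python
import math
from typing import List, Sequence


def build_teacher_prompt(question: str, rubric: Sequence[str]) -> str:
    """Hypothetical prompt layout: the teacher is conditioned on the
    criterion-level rubric, the student is not."""
    criteria = "\n".join(f"- criterion {i + 1}: {c}" for i, c in enumerate(rubric))
    return f"Question: {question}\nRubric:\n{criteria}\nAnswer:"


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_level_kl(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
) -> List[float]:
    """Per-position KL(student || teacher) along a student-sampled trajectory.

    Both inputs have shape [positions][vocab]. The divergence direction is
    an assumption; the source does not specify it.
    """
    per_token = []
    for s, t in zip(student_logits, teacher_logits):
        ls, lt = log_softmax(s), log_softmax(t)
        per_token.append(sum(math.exp(a) * (a - b) for a, b in zip(ls, lt)))
    return per_token


def distillation_loss(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
) -> float:
    """Mean of the per-token divergences; each token keeps its own credit."""
    kl = token_level_kl(student_logits, teacher_logits)
    return sum(kl) / len(kl)


# Toy usage: a 2-token trajectory over a 3-word vocabulary.
prompt = build_teacher_prompt(
    "Why does ice float on water?",
    ["mentions density", "mentions hydrogen bonding"],
)
student = [[2.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
teacher = [[0.0, 2.0, 0.0], [0.5, 0.5, 0.5]]  # rubric-conditioned scores (made up)
per_token = token_level_kl(student, teacher)
loss = distillation_loss(student, teacher)
```

In this toy case the teacher disagrees with the student only at the first position, so only that token contributes to the loss. A scalar reward for the full answer would not localize the disagreement in this way.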