Rubric-Conditioned Self-Distillation

technique · active · provisional · rubric-conditioned-self-distillation-1f31ef49 · 1 event · first seen 2d ago

Aliases: Rubric-Conditioned Self-Distillation

Recent events (1)

5 · arXiv · cs.AI · 2d ago

Rubric-Conditioned Self-Distillation: structured feedback for reasoning model post-training

A new arXiv preprint proposes Rubric-Conditioned Self-Distillation (RCSD), a post-training framework that replaces scalar reward signals and noisy chain-of-thought annotations with structured rubrics for fine-grained credit assignment. The method conditions a teacher model on criterion-level rubrics to provide token-level guidance on the student's own sampled trajectories, avoiding reliance on a single reference rationale. Evaluated on science reasoning benchmarks, RCSD outperforms GRPO by 1.0 points and OPSD by 0.9 points on average.
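The summary does not give the training objective, so the following is only a minimal sketch of what token-level guidance on the student's own trajectory could look like. It assumes a per-token reverse KL between the student's next-token distribution and that of a rubric-conditioned teacher, both scored at the same positions of a student-sampled sequence. The function names, the choice of reverse KL, and the toy logits are illustrative assumptions, not details taken from the preprint.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one vocabulary-sized list of logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def per_token_reverse_kl(student_logits, teacher_logits):
    """Return KL(student || teacher) at each position of one trajectory.

    student_logits: list of T rows, each a list of V logits, from the student
        given the prompt alone (hypothetical setup).
    teacher_logits: same shape, from a teacher that additionally sees the
        rubric and is scored on the same student-sampled tokens (assumption).
    """
    losses = []
    for s_row, t_row in zip(student_logits, teacher_logits):
        s = log_softmax(s_row)
        t = log_softmax(t_row)
        # Sum over the vocabulary of p_student * (log p_student - log p_teacher).
        losses.append(sum(math.exp(si) * (si - ti) for si, ti in zip(s, t)))
    return losses

# Toy trajectory of 2 positions over a 3-token vocabulary (made-up numbers).
student = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
teacher = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
token_losses = per_token_reverse_kl(student, teacher)
mean_loss = sum(token_losses) / len(token_losses)
```

In this toy case the first position contributes no loss because the two distributions agree, while the second contributes a positive loss. That is the sense in which a per-token signal assigns credit more finely than a single scalar reward for the whole trajectory.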