technique

Pointwise Mutual Information

technique · active · provisional · pointwise-mutual-information-02dcbd30 · 1 event · first seen 12h ago

Aliases: Pointwise Mutual Information

Co-occurring entities

on-policy self-distillation · Purified OPSD: On-Policy Self-Distillation Without Losing How to Think

More like this (12)

mutual information · Kullback-Leibler divergence · Variational Information Bottleneck · Pair Opt-dist · Jensen-Shannon divergence · Generalized Method of Moments · Pair M-dist · Marker Internal Confidence (MIC) · AIR: Adaptive Interleaved Reasoning with Code in MLLMs · difference-in-means · Probabilistic Smoothing with Ratio-Monotone Transforms · outcome indistinguishability

Recent events (1)

6 · arXiv · cs.AI · 12h ago

Purified OPSD fixes on-policy self-distillation failures in long chain-of-thought reasoning models

A new arXiv preprint identifies why on-policy self-distillation (OPSD) consistently degrades long chain-of-thought reasoning models: the teacher's supervision signal is dominated by reference-induced shortcuts rather than question-conditioned, transferable corrections. The authors propose a two-step fix using a reference-only teacher to isolate the non-transferable component and pointwise mutual information (PMI) to construct a cleaner distillation target. Experiments across four long-CoT models on two datasets show consistent improvements over both the base model and standard OPSD while preserving reflective reasoning behavior.
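
For readers unfamiliar with the quantity named above, here is a minimal sketch of the standard PMI definition, PMI(x; y) = log p(x, y) / (p(x) p(y)), together with its equivalent conditional form log p(x | y) - log p(x). This is a generic illustration only: the summary does not give the preprint's exact distillation target, so nothing here should be read as the authors' construction. The function names and the probabilities are hypothetical.

```python
import math


def pmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Standard PMI in nats: log( p(x, y) / (p(x) * p(y)) )."""
    return math.log(p_xy / (p_x * p_y))


def pmi_from_conditionals(logp_x_given_y: float, logp_x: float) -> float:
    """Equivalent form using log-probabilities: log p(x | y) - log p(x)."""
    return logp_x_given_y - logp_x


# Hypothetical probabilities, chosen only for illustration.
# When p(x, y) = p(x) * p(y), x and y are independent and PMI is about 0.
independent = pmi(p_xy=0.06, p_x=0.2, p_y=0.3)

# When x and y co-occur twice as often as independence predicts,
# PMI equals log 2.
positive = pmi(p_xy=0.12, p_x=0.2, p_y=0.3)

print(f"independent: {independent:.4f}, positive: {positive:.4f}")
```

A positive value means y makes x more likely than it is on its own, and a value near zero means y carries no information about x. The log-probability form is usually the more convenient one when working with language-model outputs, since models expose log-probabilities directly.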

Alignment and RLHF · on-policy self-distillation · Pointwise Mutual Information · Purified OPSD: On-Policy Self-Distillation Without Losing How to Think