DelTA
technique · active
delta-5c091fe8 · 1 event · first seen 26d ago
Aliases: DelTA
Recent events (1)
DelTA: Discriminative Token Credit Assignment for RLVR Training
DelTA introduces a discriminative token credit assignment method for reinforcement learning from verifiable rewards (RLVR) that addresses the problem of high-frequency formatting tokens dominating policy gradient updates. The method estimates per-token coefficients to amplify side-specific gradient directions and downweight shared or weakly discriminative ones, making the effective update direction more contrastive. On seven mathematical benchmarks, DelTA outperforms same-scale baselines by 3.26 and 2.62 average points on Qwen3-8B-Base and Qwen3-14B-Base respectively, with additional gains on code generation tasks.
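To make the idea of per-token coefficients concrete, here is a toy sketch in Python. It is not DelTA's estimator, which the summary above does not specify. As a stand-in, it scores each token by how differently it is distributed between rewarded and unrewarded responses, then scales a sequence-level advantage by that score. The function names, the frequency-contrast formula, and the example tokens are all assumptions made for illustration.

```python
from collections import Counter


def token_coefficients(positive, negative, eps=1e-8):
    """Hypothetical per-token score in [0, 1].

    Tokens that appear at similar rates in rewarded (positive) and
    unrewarded (negative) responses score near 0; tokens that appear
    on only one side score near 1. This frequency contrast is an
    illustrative stand-in, not the coefficient estimator from the paper.
    """
    pos_counts = Counter(tok for seq in positive for tok in seq)
    neg_counts = Counter(tok for seq in negative for tok in seq)
    pos_total = sum(pos_counts.values()) or 1
    neg_total = sum(neg_counts.values()) or 1

    coeffs = {}
    for tok in set(pos_counts) | set(neg_counts):
        p = pos_counts[tok] / pos_total  # relative frequency on the rewarded side
        q = neg_counts[tok] / neg_total  # relative frequency on the unrewarded side
        coeffs[tok] = abs(p - q) / (p + q + eps)
    return coeffs


def weighted_token_advantages(sequence, advantage, coeffs):
    """Scale a sequence-level advantage per token by its coefficient.

    Tokens without a coefficient keep the unscaled advantage.
    """
    return [advantage * coeffs.get(tok, 1.0) for tok in sequence]


# Two toy responses that share formatting tokens and differ in one answer token.
positive = [["The", "answer", "is", "4"]]  # verified correct
negative = [["The", "answer", "is", "5"]]  # verified incorrect

coeffs = token_coefficients(positive, negative)
weighted = weighted_token_advantages(positive[0], 1.0, coeffs)
```

In this toy case the shared tokens ("The", "answer", "is") receive a coefficient of zero, so only the answer token contributes to the update. A real implementation would operate on gradient directions rather than raw token counts, as the summary describes.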