Entity · model

Qwen3-14B-Base

model · active · qwen3-14b-base-95402e8a · 1 event · first seen May 21, 2026

Aliases: Qwen3-14B-Base

Co-occurring entities

DelTA · Qwen3-8B-Base · policy gradient · token credit assignment · Qwen · Reinforcement Learning with Verifiable Rewards

More like this (12)

Qwen3-4B-Base · Qwen3-8B-Base · Qwen3.5-2B-Base · Qwen3-30B-A3B-Base · Qwen3.5-35B-A3B-Base · Qwen3-4B · Qwen3-14B · Qwen3-1.7B-Base · Qwen3-235B · Qwen3-30B-A3B · Qwen1.5-72B · Qwen1.5-32B

Recent events (1)

arXiv · cs.CL · May 21, 2026

DelTA: Discriminative Token Credit Assignment for RLVR Training

DelTA introduces a discriminative token credit assignment method for reinforcement learning with verifiable rewards (RLVR). It targets the problem of high-frequency formatting tokens dominating policy gradient updates. The method estimates per-token coefficients that amplify side-specific gradient directions and downweight shared or weakly discriminative ones, making the effective update direction more contrastive. On seven mathematical benchmarks, DelTA outperforms same-scale baselines by 3.26 and 2.62 average points on Qwen3-8B-Base and Qwen3-14B-Base, respectively, with additional gains on code generation tasks.
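To make the idea of per-token coefficients concrete, here is a minimal toy sketch. It is not DelTA's estimator, which the summary above does not specify. It assumes a hypothetical frequency-based heuristic: a token that appears equally often in rewarded and unrewarded responses gets a coefficient near 0, and a token that appears on only one side gets 1. The function names `token_coefficients` and `weighted_pg_loss`, the example tokens, and the log-probabilities are all illustrative assumptions.

```python
from collections import Counter


def token_coefficients(positive, negative):
    """Hypothetical heuristic (not from the paper):
    coeff = |f_pos - f_neg| / (f_pos + f_neg), where f_* is the token's
    relative frequency among rewarded / unrewarded responses."""
    pos_counts = Counter(tok for resp in positive for tok in resp)
    neg_counts = Counter(tok for resp in negative for tok in resp)
    pos_total = sum(pos_counts.values()) or 1
    neg_total = sum(neg_counts.values()) or 1
    coeffs = {}
    for tok in set(pos_counts) | set(neg_counts):
        f_pos = pos_counts[tok] / pos_total
        f_neg = neg_counts[tok] / neg_total
        coeffs[tok] = abs(f_pos - f_neg) / (f_pos + f_neg)
    return coeffs


def weighted_pg_loss(tokens, logprobs, advantage, coeffs):
    """Policy-gradient surrogate loss with per-token weights.
    Tokens without a coefficient fall back to weight 1.0."""
    return -sum(
        coeffs.get(tok, 1.0) * advantage * lp
        for tok, lp in zip(tokens, logprobs)
    )


# Toy data: the two responses differ only in the answer token.
positive = [["<think>", "x", "=", "2", "</think>"]]  # verified correct
negative = [["<think>", "x", "=", "3", "</think>"]]  # verified incorrect

coeffs = token_coefficients(positive, negative)

# Made-up log-probabilities for the rewarded response.
logprobs = [-0.1, -0.2, -0.3, -0.5, -0.1]
loss = weighted_pg_loss(positive[0], logprobs, advantage=1.0, coeffs=coeffs)
```

In this toy case the shared formatting tokens such as `<think>` receive a coefficient of 0, so only the answer token contributes to the loss. A real method would presumably estimate coefficients from gradient information rather than raw token counts.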

Frontier Model Releases · Evaluation and Benchmarking · DelTA · Qwen3-8B-Base · policy gradient