Qwen3-14B
qwen3-14b-7d8fb642 · 3 events · first seen 8d ago
Aliases: Qwen3-14B
Recent events (3)
RACES framework enables recursive composition of verifiable RL environments for LLM reasoning generalization
RACES (Recursive Automated Composition for Environment Scaling) is a new framework that treats verifiable RL training environments as composable building blocks, automatically fusing them when input/output types match. The system implements 300 base environments and four composition operators (SEQUENTIAL, PARALLEL, SORT, SELECT) to generate diverse reasoning patterns at scale. Experiments show consistent gains on unseen benchmarks: averaged across six benchmarks, DeepSeek-R1-Distill-Qwen-14B improves from 48.2 to 51.3 and Qwen3-14B from 58.8 to 61.1. Notably, RACES with only 50 base environments matches the results of training on 300 individual environments, suggesting strong efficiency gains over linear environment scaling.
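To make the idea of type-matched composition concrete, here is a minimal Python sketch. It is not the RACES implementation: the `Env` class, the `sequential` and `parallel` helpers, and the toy environments are hypothetical names chosen for illustration, and the operator semantics shown are assumptions. SORT and SELECT are omitted because the summary does not describe them.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Env:
    """Toy stand-in for a verifiable environment (hypothetical, not the RACES API)."""
    name: str
    in_type: type
    out_type: type
    solve: Callable[[Any], Any]  # reference solver that yields the checkable answer


def sequential(a: Env, b: Env) -> Env:
    """Assumed SEQUENTIAL semantics: feed a's output into b if the types match."""
    if a.out_type is not b.in_type:
        raise TypeError(f"cannot chain {a.name} -> {b.name}: type mismatch")
    return Env(f"SEQ({a.name},{b.name})", a.in_type, b.out_type,
               lambda x: b.solve(a.solve(x)))


def parallel(a: Env, b: Env) -> Env:
    """Assumed PARALLEL semantics: run both on the same input, return a tuple."""
    if a.in_type is not b.in_type:
        raise TypeError(f"cannot pair {a.name} with {b.name}: input types differ")
    return Env(f"PAR({a.name},{b.name})", a.in_type, tuple,
               lambda x: (a.solve(x), b.solve(x)))


# Three toy base environments over lists of integers.
sort_list = Env("sort", list, list, sorted)
total = Env("sum", list, int, sum)
length = Env("len", list, int, len)

# A composed environment is itself an Env, so it can be composed again.
composed = sequential(sort_list, parallel(total, length))
print(composed.name)              # SEQ(sort,PAR(sum,len))
print(composed.solve([3, 1, 2]))  # (6, 3)
```

Because each composition returns another `Env`, a small set of base environments can yield many distinct composed tasks, which is one plausible reading of why 50 base environments can go a long way.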
TRACE: Tree-structured rollout budget allocation for efficient agentic RL training
TRACE (Tree Rollout Allocation for Contrastive Exploration) is a new framework for improving reinforcement learning with verifiable rewards (RLVR) in multi-turn agentic LLM settings. The method models each ReAct-style thought-action-observation turn as a distinct node, enabling budget allocation across both prompt-level and turn-level prefixes in a tree structure, rather than only at the prompt level. A shared predictor estimates conditional success probability at each anchor to guide allocation, enriching reward contrast within a fixed sampling budget. Empirically, TRACE improves Qwen3-14B multi-hop QA accuracy by 2.8 points over baselines at equal sampling cost.
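The allocation step can be illustrated with a short Python sketch. This is not TRACE's actual rule, which the summary does not specify. As a labeled assumption, it weights each anchor by the Bernoulli variance p(1 - p) of its predicted success probability, so anchors with the most mixed outcomes get the most rollouts. The function name `allocate_budget` and the anchor labels are hypothetical.

```python
def allocate_budget(success_prob: dict[str, float], budget: int) -> dict[str, int]:
    """Split a fixed rollout budget across tree anchors.

    Assumption (not from the source): weight each anchor by p * (1 - p),
    which peaks at p = 0.5 where successes and failures are most balanced.
    """
    if not success_prob:
        return {}
    weights = {k: p * (1.0 - p) for k, p in success_prob.items()}
    total = sum(weights.values())
    if total == 0:  # every anchor is predicted certain: fall back to an even split
        weights = {k: 1.0 for k in success_prob}
        total = float(len(weights))

    # Largest-remainder rounding so the integer counts sum exactly to the budget.
    raw = {k: budget * w / total for k, w in weights.items()}
    alloc = {k: int(v) for k, v in raw.items()}
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda k: raw[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


# Hypothetical anchors: the prompt itself and two turn-level prefixes.
predicted = {"prompt": 0.5, "prompt/turn1": 0.9, "prompt/turn1/turn2": 0.1}
allocation = allocate_budget(predicted, budget=16)
print(allocation)
```

In this example the prompt-level anchor, whose predicted outcome is closest to a coin flip, receives the largest share, while the near-certain turn-level anchors receive fewer rollouts.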
Three-axis uncertainty estimation framework for code generation outperforms NL-derived baselines
A new arXiv preprint argues that uncertainty estimation (UE) for code generation requires code-specific design rather than methods ported from natural language. The authors propose three orthogonal uncertainty axes—lexical (token entropy), algorithmic (pseudo-code consistency), and functional (behavioral consistency)—grounded in properties unique to code: token fragility, intent-code gap, and executability. Evaluated across five code LLMs, their ensemble improves average AUROC from 0.696 to 0.776 (+8.1 points) over the strongest NL-derived baseline, with a single-pass token entropy method on Qwen3-14B matching multi-pass baselines at 3x lower cost. The work is directly relevant to safe deployment of LLMs in agentic coding pipelines.
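The lexical axis is the simplest of the three to illustrate. The Python sketch below computes Shannon entropy over per-token probability distributions from a single generation pass. Averaging the entropy over all tokens is an assumption made here for illustration, since the summary does not say how the authors aggregate it, and the function names and toy distributions are hypothetical.

```python
import math


def token_entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_token_entropy(step_distributions: list[list[float]]) -> float:
    """Average per-token entropy over a generated sequence.

    Assumption: mean aggregation; the source only names 'token entropy'.
    """
    if not step_distributions:
        raise ValueError("need at least one token distribution")
    return sum(token_entropy(d) for d in step_distributions) / len(step_distributions)


# Toy distributions over a 4-token vocabulary (made-up numbers).
confident = [[1.0, 0.0, 0.0, 0.0], [0.97, 0.01, 0.01, 0.01]]
uncertain = [[0.25, 0.25, 0.25, 0.25], [0.4, 0.3, 0.2, 0.1]]

# Higher mean entropy signals a less certain generation.
print(mean_token_entropy(confident) < mean_token_entropy(uncertain))  # True
```

Since this needs only the probabilities already produced during decoding, it requires no extra sampling passes, which is consistent with the cost advantage reported above.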