model

GLM-Z1-9B-0414

model · active · provisional · glm-z1-9b-0414-479e4967 · 1 event · first seen 2d ago

Aliases: GLM-Z1-9B-0414

Co-occurring entities

USACO Qwen3-4B LiveCodeBench RiVER ALE-Bench

More like this (12)

GLM-4.7 GLM-5.1 OLMoE-1B-7B-0924 GLM-4.7-Flash GLM GLM-OCR GLM-4-Voice LM1B LLaDA-1.5-8B Qwen3.5-122B-A10B Apertus-8B-Instruct-2509 LLaVA-1.5-13B

Recent events (1)

6 · arXiv · cs.LG · 2d ago

RiVER framework enables RL training of LLMs on tasks without ground-truth solutions

Researchers introduce RiVER (Ranking-induced VERifiable framework), a reinforcement learning approach that trains LLMs on score-based optimization tasks using deterministic execution feedback as continuous rewards, without requiring ground-truth answers. The method addresses two failure modes in group-relative RL with continuous rewards (scale dominance and frequency dominance) via calibrated, instance-wise reward shaping. Applied to Qwen3-8B and GLM-Z1-9B-0414 on competitive programming tasks, RiVER improves ALE-Bench rating rank by ~9% and, unlike raw-score baselines, also transfers to exact-solution benchmarks (LiveCodeBench, USACO) with absolute gains of 2-4%. The result suggests score-based heuristic tasks can serve as general-purpose RL training environments for coding ability.
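The summary does not spell out RiVER's shaping rule, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a hypothetical per-instance rank transform (`rank_rewards`) and compares it with plain group-relative normalization (`group_relative_advantages`). Both function names and the tie-handling choice are illustrative assumptions. The point it shows: ranks within one instance's sample group do not depend on the magnitude of the raw scores, whereas mean/std normalization of raw scores lets a single extreme score compress the differences among the remaining samples.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards):
    """Plain group-relative normalization: (r - mean) / std over one group.

    Returns all zeros when every sample in the group has the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def rank_rewards(scores):
    """Hypothetical instance-wise shaping: map raw scores to [0, 1] by rank.

    Tied scores share the average of their rank positions. This is an
    illustrative assumption, not the shaping used by RiVER.
    """
    n = len(scores)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: scores[i])
    shaped = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of samples tied with position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2
        for k in range(i, j + 1):
            shaped[order[k]] = avg_rank / (n - 1)
        i = j + 1
    return shaped


if __name__ == "__main__":
    raw = [10, 1_000_000, 20, 20]  # one extreme score in the group
    print(group_relative_advantages(raw))
    print(group_relative_advantages(rank_rewards(raw)))
```

Because `rank_rewards` depends only on ordering, rescaling an instance's scores leaves the shaped rewards unchanged, which is one simple way to keep instances with large score ranges from dominating the update.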

Evaluation and Benchmarking · Alignment and RLHF · USACO · Qwen3-4B · LiveCodeBench