Qwen3-8B-Base
qwen3-8b-base-82d7f8b0 · 3 events · first seen 26d ago
Aliases: Qwen3-8B-Base, Qwen3.5-0.8B-Base
Recent events (3)
Qwen releases Qwen3.5-0.8B-Base multimodal model on Hugging Face
Qwen has released Qwen3.5-0.8B-Base, a 0.8B-parameter image-text-to-text base model, on Hugging Face. The model supports conversational use and is compatible with Hugging Face endpoints. Its nearly 200K downloads point to meaningful community uptake for a compact multimodal base model.
RELEX: Extrapolating LLM RLVR Training via Rank-1 Parameter Trajectories
This paper demonstrates that RLVR weight update trajectories are extremely low-rank and near-linearly predictable, with a rank-1 approximation capturing most downstream performance gains. The authors propose RELEX, a compute-efficient method that observes a short training window, estimates the rank-1 subspace, and extrapolates future checkpoints via linear regression, requiring no additional training. Evaluated on Qwen2.5-Math-1.5B, Qwen3-4B-Base, and Qwen3-8B-Base, RELEX matches or exceeds full RLVR performance using as few as 15% of training steps, and can extrapolate up to 10–20× beyond the observed prefix. The authors attribute the method's effectiveness to a denoising effect from rank-1 projection that discards stochastic optimization noise.
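The observe, project, and extrapolate loop described in this summary can be illustrated with a toy sketch. This is not the authors' implementation: it assumes parameters are flattened into a single vector, that updates are measured relative to the base weights, and that the rank-1 direction is found by power iteration. All function names (`top_direction`, `fit_line`, `extrapolate`) are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_direction(deltas, iters=50):
    """Leading right singular vector of the delta matrix via power iteration.

    Assumption: a single global direction; the paper may work per weight matrix.
    """
    # Start from the largest observed displacement (the last delta).
    v = list(deltas[-1])
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("last checkpoint equals the base weights")
    v = [x / norm for x in v]
    dim = len(v)
    for _ in range(iters):
        proj = [dot(d, v) for d in deltas]            # D v
        w = [sum(p * d[j] for p, d in zip(proj, deltas)) for j in range(dim)]  # D^T (D v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def fit_line(ts, ys):
    """Ordinary least-squares fit ys ~ slope * ts + intercept."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys)) / var_t
    return slope, mean_y - slope * mean_t


def extrapolate(theta0, checkpoints, steps, target_step):
    """Predict weights at target_step from a short observed prefix."""
    # Weight updates relative to the base model (an assumption of this sketch).
    deltas = [[c - b for c, b in zip(ckpt, theta0)] for ckpt in checkpoints]
    v = top_direction(deltas)
    # Scalar coordinate of each observed checkpoint along the rank-1 direction.
    coeffs = [dot(d, v) for d in deltas]
    slope, intercept = fit_line(steps, coeffs)
    c = slope * target_step + intercept
    return [b + c * x for b, x in zip(theta0, v)]


# Toy usage: a synthetic trajectory that moves linearly along one direction.
theta0 = [1.0, -2.0, 0.5]
u = [0.6, 0.0, 0.8]
steps = [1, 2, 3, 4, 5]
observed = [[b + 0.1 * t * x for b, x in zip(theta0, u)] for t in steps]
predicted = extrapolate(theta0, observed, steps, target_step=20)
print(predicted)
```

Projecting onto a single direction discards every component of the update orthogonal to it, which is one way to picture the denoising effect the summary mentions. On real checkpoints the trajectory is only approximately rank-1 and linear, so the prediction would carry error that this synthetic example does not show.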
DelTA: Discriminative Token Credit Assignment for RLVR Training
DelTA introduces a discriminative token credit assignment method for reinforcement learning from verifiable rewards (RLVR) that addresses the problem of high-frequency formatting tokens dominating policy gradient updates. The method estimates per-token coefficients to amplify side-specific gradient directions and downweight shared or weakly discriminative ones, making the effective update direction more contrastive. On seven mathematical benchmarks, DelTA outperforms same-scale baselines by 3.26 and 2.62 average points on Qwen3-8B-Base and Qwen3-14B-Base respectively, with additional gains on code generation tasks.