Entity · model

Qwen3-4B-Base

model · active · qwen3-4b-base-1f13b116 · 4 events · first seen May 21, 2026

Aliases: Qwen3-4B-Base, Qwen3.5-4B-Base

Co-occurring entities

LKvaluesIT · LKValues · Aya-Expanse-8B-Base · Qwen3.5-2B-Base · LKvaluesBench · RegMix · CausalMix · Qwen2.5-0.5B · Qwen · Hugging Face · RLVR · Qwen3-8B-Base · Qwen2.5-Math-PRM · Reinforcement Learning with Verifiable Rewards · rank-1 approximation · Wei Zhepei · Alibaba Qwen Team · RELEX

More like this (12)

Qwen3-8B-Base · Qwen3-14B-Base · Qwen3.5-2B-Base · Qwen3-4B · Qwen3-30B-A3B-Base · Qwen3.5-35B-A3B-Base · Qwen3-1.7B-Base · Qwen3-235B · Qwen2.5-1.5B-Base · Qwen3-30B-A3B · Qwen1.5-72B · Qwen 2.5-7B

Recent events (4)

arXiv · cs.CL · Jul 23, 2026

LKValues: First benchmark and instruction corpus for Sri Lankan societal value alignment in LLMs

Researchers introduce LKValues, a resource suite for aligning LLMs with Sri Lankan cultural values, derived from a trilingual survey of 205 respondents. The suite includes LKvaluesIT, a 150k-instance Sinhala-English instruction corpus, and LKvaluesBench, a 1,000-instance evaluation benchmark. Fine-tuning experiments on Qwen and Aya-Expanse models show that current LLMs exhibit cultural and low-resource alignment gaps, and that LKValues fine-tuning reduces invalid outputs and cross-lingual disparities. The work offers a replicable pipeline for country-specific pluralist value alignment in underrepresented languages.

Evaluation and Benchmarking · Alignment and RLHF · LKvaluesIT · LKValues · Aya-Expanse-8B-Base · +3 more

arXiv · cs.CL · Jul 2, 2026

CausalMix frames LLM data mixture optimization as causal inference to generalize across data pool shifts

CausalMix proposes treating data mixture optimization for LLM training as a causal inference problem, using Conditional Average Treatment Effect (CATE) estimation to infer optimal domain mixtures without costly retraining when the data pool changes. The method fits a causal model on 512 runs of Qwen2.5-0.5B and extrapolates the resulting mixture to train a 7B model, also generalizing to long chain-of-thought data on Qwen3-4B-Base. It outperforms RegMix and other baselines across multiple downstream tasks while providing interpretable visual analysis of mixing strategies via a CATE Interpreter. The approach addresses a practical scalability limitation in existing proxy-model-based mixture methods.

Training Infrastructure · Evaluation and Benchmarking · RegMix · CausalMix · Qwen3-4B-Base · +1 more
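The CausalMix summary above hinges on Conditional Average Treatment Effect (CATE) estimation. As a rough illustration of what a CATE estimate is, the sketch below uses a two-model "T-learner" with linear regressions on synthetic data. This is a generic textbook estimator, not the estimator from the paper; the binary treatment (say, whether one domain was upweighted), the three covariates, the function names, and all numbers are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply coefficients from fit_linear to new rows."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def t_learner_cate(X, treated, y, X_query):
    """T-learner: fit one outcome model per arm, return their difference."""
    treated = np.asarray(treated, dtype=bool)
    coef_treated = fit_linear(X[treated], y[treated])
    coef_control = fit_linear(X[~treated], y[~treated])
    return predict_linear(coef_treated, X_query) - predict_linear(coef_control, X_query)

# Synthetic stand-in for a set of proxy runs (all values are made up).
rng = np.random.default_rng(0)
n_runs = 512
X = rng.uniform(size=(n_runs, 3))                 # hypothetical data-pool covariates
treated = rng.integers(0, 2, size=n_runs).astype(bool)
true_cate = 0.5 + 2.0 * X[:, 0]                   # effect depends on covariate 0
y = (1.0 + X @ np.array([0.3, -0.2, 0.1])
     + treated * true_cate
     + 0.01 * rng.normal(size=n_runs))            # hypothetical downstream score

X_query = np.array([[0.25, 0.5, 0.5]])            # a new, unseen pool description
estimate = t_learner_cate(X, treated, y, X_query)
print(estimate)
```

The point of the conditional estimate is that the effect is a function of the covariates, so it can be queried for a pool description that was never trained on, which is the property the summary describes as avoiding retraining when the data pool changes.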

Qwen · Jun 5, 2026

Qwen releases Qwen3.5-4B-Base multimodal model on Hugging Face

Qwen has released Qwen3.5-4B-Base, a 4-billion-parameter base model supporting image-text-to-text tasks, on Hugging Face. The model is tagged as conversational and endpoints-compatible and is distributed in the safetensors format. With over 207,000 downloads, it adds a small multimodal model to the Qwen3.5 family.

Frontier Model Releases · Open Weights Progress · Qwen · Qwen3-4B-Base · Hugging Face · +1 more

arXiv · cs.CL · May 21, 2026

RELEX: Extrapolating LLM RLVR Training via Rank-1 Parameter Trajectories

This paper demonstrates that RLVR weight update trajectories are extremely low-rank and near-linearly predictable, with a rank-1 approximation capturing most downstream performance gains. The authors propose RELEX, a compute-efficient method that observes a short training window, estimates the rank-1 subspace, and extrapolates future checkpoints via linear regression, with no additional training required. Evaluated on Qwen2.5-Math-1.5B, Qwen3-4B-Base, and Qwen3-8B-Base, RELEX matches or exceeds full RLVR performance using as few as 15% of training steps, and can extrapolate up to 10–20× beyond the observed prefix. The authors attribute the method's effectiveness to a denoising effect: the rank-1 projection discards stochastic optimization noise.

Training Infrastructure · Frontier Model Releases · RLVR · Qwen3-8B-Base · Qwen3-4B-Base · +8 more
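The three steps named in the RELEX summary (observe a short window, estimate a rank-1 subspace, extrapolate by linear regression) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' implementation: checkpoints are treated as flattened vectors, the rank-1 direction is taken from an SVD of the updates relative to the first checkpoint, and the function name, noise level, and step counts are all hypothetical.

```python
import numpy as np

def extrapolate_rank1(checkpoints, steps, target_step):
    """Extrapolate flattened weights to target_step from an observed prefix."""
    thetas = np.asarray(checkpoints, dtype=np.float64)   # shape (T, P)
    steps = np.asarray(steps, dtype=np.float64)           # shape (T,)
    base = thetas[0]
    deltas = thetas - base                                # updates relative to the first checkpoint
    # The leading right singular vector is the dominant update direction.
    _, _, vt = np.linalg.svd(deltas, full_matrices=False)
    direction = vt[0]                                     # shape (P,)
    # Scalar coordinate of each checkpoint along that direction.
    coeffs = deltas @ direction                           # shape (T,)
    # Fit coefficient-vs-step with a straight line, then evaluate it later in training.
    slope, intercept = np.polyfit(steps, coeffs, deg=1)
    return base + (slope * target_step + intercept) * direction

# Synthetic trajectory: linear drift along one direction plus small noise.
rng = np.random.default_rng(0)
theta0 = rng.normal(size=64)
true_dir = rng.normal(size=64)
true_dir /= np.linalg.norm(true_dir)
steps = np.arange(10)
observed = [theta0 + 0.1 * t * true_dir + 0.001 * rng.normal(size=64) for t in steps]

predicted = extrapolate_rank1(observed, steps, target_step=100)
target = theta0 + 0.1 * 100 * true_dir
rel_err = np.linalg.norm(predicted - target) / np.linalg.norm(target - theta0)
print(rel_err)
```

Projecting onto a single direction discards the noise components orthogonal to it, which is one way to read the denoising effect mentioned in the summary. On real checkpoints the trajectory is only approximately linear, so the extrapolation horizon would need to be validated empirically.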