Entity · model

Qwen3-30B

model · active · qwen3-30b-57a1c433 · 2 events · first seen Jun 15, 2026

Aliases: Qwen3-30B

Co-occurring entities

AGC-Judge · AGC-Bench · HELM · Judge Response Theory · AudioDER · Qwen2-Audio-7B-Instruct · MMSU · MMAU-mini · MMAR

More like this (12)

Qwen3-30B-A3B · Qwen3.6-27B · Qwen3.5-35B-A3B · Qwen3-30B-A3B-Base · Qwen3-30B-A3B-Instruct · Qwen3.5-122B · Qwen3.6-35B-A3B · Qwen3-14B · Qwen3.5-122B-A10B · Qwen 3.5 27B · Qwen3.5-35B-A3B-Base · Qwen3VL-8B

Recent events (2)

6 · arXiv · cs.CL · Jul 2, 2026

AGC-Bench: A unified benchmark for measuring artificial general creativity in LLMs

Researchers introduce AGC-Bench, a comprehensive AI creativity benchmark built from a systematic review of 3,101 papers and 497 existing benchmarks, covering 78 datasets across brainstorming, STEM, narrative, figurative language, and humor. The work introduces Judge Response Theory to correct for LLM-as-judge bias and fine-tunes Qwen3-30B to produce AGC-Judge, an open-weight scoring model. Key findings include the recovery of a single creativity factor 'c' (analogous to the general intelligence 'g' factor) explaining 81.5% of variance across 83 LLMs, and evidence that top humans still outperform top LLMs on creativity tasks. The benchmark, leaderboard, and human data are released as open infrastructure.

Frontier Model Releases · Evaluation and Benchmarking · AGC-Judge · AGC-Bench · HELM · +2 more

4 · arXiv · cs.AI · Jun 15, 2026

AudioDER: Deduplication-enhanced reasoning dataset for post-training large audio-language models

Researchers introduce AudioDER, a ~191k-sample post-training dataset for Large Audio-Language Models (LALMs) built via an acoustic similarity-based deduplication pipeline to reduce redundancy and improve corpus diversity. Each sample pairs an audio clip with a multiple-choice question, answer candidates, a caption, and a chain-of-thought rationale generated by Qwen3-30B. Post-training Qwen2-Audio-7B-Instruct on AudioDER yields consistent gains on audio reasoning benchmarks including MMAU-mini, MMSU, and MMAR. The work addresses a data quality gap in audio-language training rather than proposing a new model architecture.

Evaluation and Benchmarking · Multimodal Progress · AudioDER · Qwen2-Audio-7B-Instruct · Qwen3-30B · +3 more