Entity · model

Qwen3-235B

model · active · qwen3-235b-705f820d · 6 events · first seen May 18, 2026

Aliases: Qwen3-235B, Qwen3-235B-A22B, Qwen3 232B-A22B

Merged from

Qwen3-235B-A22B

More like this (12)

Qwen3-30B-A3B · Qwen3-4B · Qwen1.5-110B · Qwen2-72B · Qwen1.5-32B · Qwen2-57B-A14B · Qwen3.5 397B A17B · Qwen3.6-35B-A3B · Qwen1.5-72B · Qwen3.5-35B-A3B · Qwen3.5-122B · Qwen3-8B-Base

Recent events (6)

5 · arXiv · cs.CL · 38h ago

MemSFT: Plug-and-play parametric memory module mitigates alignment tax in domain-specialized LLMs

Researchers propose MemSFT, a method that decouples domain specialization from backbone parameter updates by training a plug-and-play parametric memory module to imitate a non-parametric retriever over domain data. A learned router dynamically fuses the memory and backbone output distributions at each decoding step, allowing selective invocation of domain expertise. Evaluated across biology, geoscience, and law on models from Qwen3-8B to Qwen3-235B-A22B, MemSFT consistently improves domain performance with negligible general-task degradation, whereas full SFT causes severe catastrophic forgetting. The memory module is reusable across LLM sizes, offering a practical path to modular domain specialization.

Enterprise Deployment Patterns · Alignment and RLHF · MemSFT · Qwen3-4B · Qwen3-235B

5 · arXiv · cs.CL · Jul 15, 2026

EcoSpec: Cost-aware speculative decoding for MoE models reduces expert memory traffic

Researchers introduce EcoSpec, a speculative decoding framework that incorporates predicted marginal expert activation cost into draft-token selection for sparse Mixture-of-Experts LLMs. The key insight is that standard confidence-driven draft selection causes 'expert scattering'—routing draft tokens to disjoint experts increases memory traffic and undermines speculative decoding speedups. EcoSpec uses a lightweight expert predictor and dynamic expert buffer to favor draft paths that reuse already-loaded experts, achieving up to 1.62× end-to-end decoding speedup. Evaluations cover DeepSeek-V3.1 (671B), Qwen3-235B-A22B, and GPT-OSS-120B across reasoning, coding, QA, and dialogue tasks.

Training Infrastructure · Inference Economics · speculative decoding · DeepSeek V4 · GPT-OSS 120B

7 · Mistral AI News · Jun 1, 2026

Mistral AI Releases Devstral: Apache 2.0 Agentic Coding Model with SWE-Bench SOTA

Mistral AI, in collaboration with All Hands AI, has released Devstral, an agentic LLM specialized for software engineering tasks, under the Apache 2.0 license. The model achieves 46.8% on SWE-Bench Verified, surpassing the prior open-source state of the art by over 6 percentage points and outperforming larger models such as DeepSeek-V3-0324 (671B) and Qwen3-235B-A22B under the same OpenHands scaffold. Devstral is small enough to run on a single RTX 4090 or a Mac with 32GB RAM, and is available via Mistral's API at $0.1/M input tokens, as well as on Hugging Face, Ollama, and other platforms. Mistral indicates that a larger agentic coding model is in development.

Frontier Model Releases · Evaluation and Benchmarking · DeepSeek-V3-0324 · Mistral AI · GPT-4.1 mini

6 · arXiv · cs.AI · May 20, 2026

Graft: Hybrid Tree Construction for Speculative Decoding via Prune-Then-Retrieve

Graft is a training-free framework that improves speculative decoding by coupling dynamic-depth pruning with retrieval-based token compensation. Pruning reduces VRAM and compute overhead while freeing budget for retrieval, which fills topological gaps in the draft tree with near-zero additional cost. On short-context benchmarks, Graft achieves up to 5.41× speedup and improves average speedup over EAGLE-3 by up to 21.8% on Qwen3-235B. The method is evaluated across short- and long-context settings and extended to block-drafting paradigms.

Frontier Model Releases · Inference Economics · speculative decoding · DFlash · EAGLE-3

8 · Qwen Research · May 18, 2026

Qwen3 Release: Flagship 235B MoE and Full Model Family Announced

Alibaba's Qwen team has released Qwen3, a new family of large language models including the flagship Qwen3-235B-A22B mixture-of-experts model. The team reports that the flagship performs competitively with DeepSeek-R1, OpenAI o1 and o3-mini, Grok-3, and Gemini-2.5-Pro on coding, math, and general-capability benchmarks. A smaller MoE variant, Qwen3-30B-A3B, reportedly outperforms QwQ-32B despite using only one-tenth the activated parameters, and the 4B model is said to rival the much larger Qwen2.5-72B-Instruct. Models are available on Hugging Face, ModelScope, and Kaggle.

Frontier Model Releases · Evaluation and Benchmarking · Alibaba Qwen · DeepSeek V4 · Qwen3-30B-A3B

6 · arXiv · cs.LG · May 18, 2026

FORGE: Self-Evolving Agent Memory via Population Broadcast Without Weight Updates

FORGE (Failure-Optimized Reflective Graduation and Evolution) is a staged, population-based protocol that evolves prompt-injected natural-language memory for hierarchical ReAct agents without any gradient updates. It wraps a Reflexion-style inner loop where a reflection agent converts failed trajectories into textual heuristics or few-shot demonstrations, then propagates the best-performing instance's memory across a population between stages. Evaluated on CybORG CAGE-2 (a stochastic network-defense POMDP), FORGE improves average return by 1.7–7.7× over zero-shot and 29–72% over Reflexion across all 12 model-representation conditions tested with four LLM families. Notably, weaker models benefit disproportionately, suggesting the method may help close capability gaps rather than amplify already-strong models.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Reflexion · Grok-4-Fast · ReAct