Entity · model

Qwen3.6-27B

model · active · qwen3-6-27b-1edab66d · 7 events · first seen May 26, 2026

Aliases: Qwen3.6-27B, Qwen3.5-27B

More like this (12)

Qwen3.6-35B-A3B · Qwen3.5-35B-A3B · Qwen3.5-122B · Qwen3-30B · Qwen 3.5 27B · Qwen3.5-35B-A3B-Base · Qwen3-14B · Qwen2.5-14B · Qwen3.5-122B-A10B · Qwen2.5-7B · Qwen2.5-8B · Qwen2.5-3B

Recent events (7)

6 · arXiv · cs.LG · 2d ago

MindForge pipeline fine-tunes small models for whole-life-cycle software engineering via source-free program synthesis

MindForge is an automated pipeline that converts open-source command-line programs into source-free training environments exposing only compiled executables and documentation, enabling training data generation for from-scratch program synthesis. Using GLM-5.2 as a teacher agent, the authors fine-tune Qwen3.6-27B on synthesized trajectories, raising its ProgramBench pass rate from 37.98% to 49.51% and achieving gains across seven held-out benchmarks including SWE-bench Verified (+5.04) and RepoZero-C2Rust (+31.00). The work addresses a gap in coding agent training infrastructure by spanning the full software engineering life cycle rather than single-phase tasks. The result is notable for achieving frontier-comparable performance on a 27B model through targeted data curation.

Evaluation and Benchmarking · Open Weights Progress · FeatBench · MindForge · NL2Repo-Bench · +9 more

6 · arXiv · cs.CL · Jul 10, 2026

Proactive Memory Agent reduces behavioral state decay in long-horizon tasks

Researchers introduce a plug-and-play memory agent module that runs alongside an unmodified action agent, maintaining a structured memory bank and selectively injecting reminders when relevant state would otherwise be lost in long trajectories. The approach addresses 'behavioral state decay' — the failure mode where task-critical context gets buried or pushed out of the context window. Evaluated on Terminal-Bench 2.0 and τ²-Bench, the module yields +8.3 pp and +6.8 pp pass@1 gains respectively, with ablations confirming selective injection outperforms always-on or passive retrieval approaches. The authors also train an open-weight memory policy (Qwen3.5-27B) using SFT and GRPO, showing partial transfer to Terminal-Bench.

Long Context Evolution · Open Weights Progress · GRPO · Qwen3.6-27B · Remember When It Matters: Proactive Memory Agent for Long-Horizon Agents · +4 more
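The selective-injection idea in the summary above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `MemoryBank` class, its method names, and the keyword-matching relevance test are hypothetical stand-ins, not the paper's module, which uses a trained memory policy rather than string matching.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryBank:
    """Hypothetical structured memory: maps a trigger keyword to a stored fact."""
    entries: dict = field(default_factory=dict)

    def remember(self, key: str, fact: str) -> None:
        self.entries[key] = fact

    def reminders_for(self, next_action: str, visible_context: str) -> list:
        """Return facts that are relevant to the next action but no longer visible.

        Relevance is approximated by substring matching (an assumption);
        a fact is skipped if it still appears in the visible context.
        """
        reminders = []
        for key, fact in self.entries.items():
            relevant = key in next_action
            still_visible = fact in visible_context
            if relevant and not still_visible:
                reminders.append(fact)
        return reminders


bank = MemoryBank()
bank.remember("deploy", "Use the staging branch")
# The fact has scrolled out of context, so it is injected as a reminder.
print(bank.reminders_for("deploy the service", "recent build logs"))
```

An always-on variant would return every stored fact at every step. The selective variant stays silent unless a fact is both relevant and missing from the visible context, which is the contrast the ablations in the summary describe.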

5 · arXiv · cs.CL · Jun 17, 2026

Fine-tuning LLMs to passively estimate depression severity from AI mental health conversations

Researchers fine-tune a Qwen3.5-27B model with a regression head to predict PHQ-9 depression severity scores directly from AI mental health app conversation transcripts, eliminating the need for users to complete an explicit self-report. The training set of 6,283 users combines 3,111 ground-truth labels with pseudolabels generated by Claude Opus and iterative intermediate models. On a held-out test set of 842 users, the best model achieves MAE=2.6, Pearson r=0.80, and AUC=0.91 at the clinical PHQ-9≥10 threshold, with AUC>0.87 across all severity thresholds. The work demonstrates a passive, continuous symptom-monitoring approach that could reduce response bias in mental health platforms.

Enterprise Deployment Patterns · Claude Opus 4.6 · Patient Health Questionnaire-9 · Qwen3.6-27B · +1 more
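For readers unfamiliar with the three metrics reported above, the following is a minimal sketch of how MAE, Pearson r, and AUC at a score threshold are typically computed for a regression model. The function names and the four-user toy data are made up for illustration; this is not the authors' evaluation code, and the numbers have no relation to the paper's results.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def auc_at_threshold(y_true, y_pred, threshold=10):
    """AUC for detecting true score >= threshold, using predictions as the ranking.

    Computed as the fraction of (positive, negative) pairs in which the
    positive case receives the higher prediction; ties count as half.
    """
    pos = [p for t, p in zip(y_true, y_pred) if t >= threshold]
    neg = [p for t, p in zip(y_true, y_pred) if t < threshold]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): true PHQ-9 totals and model predictions for four users.
true_scores = [2, 8, 12, 20]
predicted = [3, 9, 11, 18]
print(mae(true_scores, predicted))
print(pearson_r(true_scores, predicted))
print(auc_at_threshold(true_scores, predicted, threshold=10))
```

Changing the `threshold` argument gives the AUC at other severity cut-offs, which is how a single regression output can be assessed across several thresholds.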

7 · Qwen · Jun 5, 2026

Qwen releases Qwen3.5-27B multimodal model on Hugging Face

Qwen has released Qwen3.5-27B, a 27-billion parameter image-text-to-text model, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With nearly 3 million downloads and 981 likes, it has seen substantial community uptake.

Frontier Model Releases · Open Weights Progress · Qwen3.6-27B · Qwen · Hugging Face · +1 more

6 · Qwen · Jun 5, 2026

Qwen releases Qwen3.6-27B multimodal model on Hugging Face

Qwen published Qwen3.6-27B, a 27-billion-parameter image-text-to-text model, on Hugging Face. The model supports conversational use and is compatible with Azure deployment endpoints. With over 5.4 million downloads and 1,619 likes, it has seen substantial community uptake.

Frontier Model Releases · Open Weights Progress · Qwen3.6-27B · Qwen · Hugging Face · +1 more

6 · arXiv · cs.AI · May 26, 2026

VeriTrace: Cognitive-Graph Framework with Explicit Regulatory Loops for Deep Research Agents

VeriTrace introduces a cognitive-graph framework for deep research agents that replaces implicit LLM reasoning over intermediate representations with three explicit regulatory loops: interpretive update, deviation feedback, and schema revision. The system addresses contamination and error propagation in evolving mental models during complex multi-step research tasks. Using Qwen3.5-27B backbones, VeriTrace improves over the strongest matched baseline by 4.22 pp on DeepResearch Bench Insight and by 5.9 pp in Overall win rate on DeepConsult. With Config-DeepSeek, it achieves the strongest reproducible open-source result on DeepResearch Bench.

Frontier Model Releases · Evaluation and Benchmarking · DeepSeek V4 · cognitive-graph · DeepResearch Bench · +4 more

5 · arXiv · cs.CL · May 26, 2026

Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals

This paper investigates uncertainty quantification (UQ) for activation oracles—systems that make LLM internal activations human-legible—by evaluating 6 confidence estimation methods across 6,000 samples per oracle. The authors find that bootstrap mode frequency achieves the best calibration (ECE 5.7% vs. 25.5% for log-probability baseline on Qwen3-8B), while the log-prob baseline remains useful as a cheap triage signal. Experiments vary verbalizer and context prompts across two Qwen3 model sizes. Code and a patched trainer are released publicly.

Evaluation and Benchmarking · AI Safety Research · Expected Calibration Error · Activation Oracles · Qwen3-4B · +4 more
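To make the calibration figures above concrete, here is a minimal sketch of two pieces: a confidence score taken as the frequency of the most common answer among repeated samples, and the standard binned Expected Calibration Error. The reading of "bootstrap mode frequency" as modal-answer agreement is an assumption, as are the function names and the choice of 10 equal-width bins; the paper's released code may differ.

```python
from collections import Counter


def mode_frequency(samples):
    """Return (modal answer, fraction of samples agreeing with it).

    Assumed interpretation: the agreement rate among resampled answers
    serves as the confidence for the modal answer.
    """
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / len(samples)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: size-weighted mean of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin rather than overflowing.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Four resampled answers for one query (invented data).
print(mode_frequency(["cat", "cat", "dog", "cat"]))
# Two fully confident predictions, one of them wrong (invented data).
print(expected_calibration_error([1.0, 1.0], [True, False]))
```

A well-calibrated estimator has accuracy close to its stated confidence in every bin, so its ECE approaches zero. The second call shows the opposite case, where full confidence is paired with 50% accuracy.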