Qwen3 32B
qwen3-32b-d905d69c · 3 events · first seen 15d ago
Aliases: Qwen3 32B, Qwen3-VL-32B, Qwen3-32B
Recent events (3)
Activation Capping Technique Stabilizes LLM Assistant Personas Against Drift and Jailbreaks
Researchers from MATS, Oxford, and Anthropic introduced the 'assistant axis,' a vector derived from LLM layer outputs that quantifies how closely a model adheres to its trained assistant persona. They developed 'activation capping,' an inference-time method that corrects deviations from this axis when similarity falls below a threshold. Testing on Gemma 2 27B, Qwen3 32B, and Llama 3.3 70B showed harmful response rates to jailbreak prompts dropped by roughly half (e.g., 83% to 41% for Qwen3 32B) without degrading benchmark performance. The technique targets character-based jailbreaks that bypass system prompts by manipulating a model's internal representational state.
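A minimal sketch of the capping idea, not the researchers' implementation. It assumes the "similarity" is the scalar projection of a hidden state onto a unit-length axis, and that the correction adds just enough of the axis direction to bring that projection back up to the threshold. The function name `cap_activation`, the toy vectors, and the threshold value are all hypothetical.

```python
import numpy as np

def cap_activation(hidden, axis, threshold):
    """Return `hidden` with its projection onto `axis` raised to `threshold` if it is lower.

    Assumption: similarity is measured as the scalar projection onto the
    unit-normalised axis; the source does not specify the exact measure.
    """
    unit = axis / np.linalg.norm(axis)
    proj = float(hidden @ unit)
    if proj >= threshold:
        # Already close enough to the assistant direction: leave untouched.
        return hidden.copy()
    # Add only the missing amount along the axis; orthogonal components are unchanged.
    return hidden + (threshold - proj) * unit

# Toy 3-dimensional example (real hidden states have thousands of dimensions).
axis = np.array([1.0, 0.0, 0.0])
drifted = np.array([-2.0, 0.5, 1.0])
capped = cap_activation(drifted, axis, threshold=0.5)
print(capped)
```

Because the correction is applied only below the threshold, activations that already sit on the assistant side of the axis pass through unchanged, which is consistent with the report that benchmark performance was not degraded.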
HyperTool: Unified executable MCP-style interface reduces step-wise tool call overhead for LLM agents
HyperTool introduces a unified executable interface that allows LLM agents to invoke multiple tool calls within a single code block, hiding intermediate dataflow from the main reasoning trace. This addresses an 'execution-granularity mismatch' where step-wise atomic tool calls waste context and force models to manage low-level operations. On the MCP-Universe benchmark, HyperTool more than doubles accuracy for Qwen3-32B (15.69% → 35.29%) and Qwen3-8B (9.93% → 33.33%), outperforming GPT-OSS and Kimi-k2.5.
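The granularity difference can be illustrated with a toy sketch. This is not HyperTool's interface: the tools `search` and `fetch`, the helpers `run_stepwise` and `run_block`, and the convention of reading a `result` variable are all hypothetical, chosen only to show how a single code block keeps intermediate dataflow out of the agent's trace.

```python
def search(query):
    # Hypothetical tool: returns placeholder document ids.
    return [f"{query}-doc{i}" for i in range(3)]

def fetch(doc_id):
    # Hypothetical tool: returns placeholder document contents.
    return f"contents of {doc_id}"

TOOLS = {"search": search, "fetch": fetch}

def run_stepwise(query):
    """Step-wise mode: every atomic tool result is appended to the trace."""
    trace = []
    ids = search(query)
    trace.append(("search", ids))
    for doc_id in ids:
        trace.append(("fetch", fetch(doc_id)))
    return trace

def run_block(code):
    """Block mode: run several tool calls at once and expose only `result`."""
    namespace = dict(TOOLS)
    exec(code, namespace)  # a real system would need sandboxing here
    return [("block", namespace["result"])]

BLOCK = """
ids = search("qwen3")
result = [fetch(i) for i in ids][-1]
"""

stepwise_trace = run_stepwise("qwen3")
block_trace = run_block(BLOCK)
print(len(stepwise_trace), len(block_trace))
```

In this toy case the step-wise trace carries four entries while the block trace carries one, which is the kind of context saving the summary attributes to the unified interface.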
HiViG: History-aware visually grounded critic improves computer use agents across GUI benchmarks
Researchers introduce HiViG, a test-time framework for Computer Use Agents that addresses two weaknesses in existing critic models: short-sighted decision loops and a lack of visual grounding. The system trains a multimodal critic on real GUI trajectories to maintain a compact macro-action history and to verify action coordinates against live screenshots before each action runs. Evaluated on web, mobile, and desktop benchmarks, HiViG improves average success rates by 5.8% over the strongest baseline with Qwen3-VL-32B and by 9.0% with Gemini-3-Flash, with both the history and grounding components shown to be independently necessary.