Almanac
model

Qwen 3.5

modelactiveprovisionalqwen-3-5-439e3b1b·4 events·first seen 1mo ago

Aliases: Qwen 3.5, Qwen-3.5

Co-occurring entities

More like this (12)

Recent events (4)

5Interconnects·1mo ago·source ↗

Latest open artifacts (#19): Qwen 3.5, GLM 5, MiniMax 2.5 — Chinese labs' latest push of the frontier

A Interconnects newsletter roundup covering recent open-weight model releases from Chinese AI labs, specifically Qwen 3.5, GLM 5, and MiniMax 2.5. The piece frames these as a continued frontier push from Chinese research organizations. The body content is minimal beyond the title and greeting, suggesting this is either a stub or the full content was not captured.

6arXiv · cs.CL·8d ago·source ↗

SpatialWorld benchmark evaluates interactive spatial reasoning of multimodal agents in real-world tasks

Researchers introduce SpatialWorld, a benchmark for evaluating interactive spatial understanding of multimodal agents across 760 human-annotated tasks spanning household, travel, and social domains. The benchmark integrates eight simulation backends under a shared protocol, requiring agents to operate under vision-only partial observability with egocentric inputs. Evaluation of 15 agents reveals that even the strongest model, GPT-5, achieves only 17.4% task success rate, exposing significant gaps in active exploration and long-horizon planning. The work highlights a mismatch between task success and execution efficiency as a key bottleneck for spatial agents.

7The Batch·16d ago·source ↗

Data Points: Qwen3.7-Max, OpenAI Math Proof, Gated DeltaNet-2, Trump AI Order, Microsoft Fara1.5

This edition of The Batch covers five significant AI developments: Alibaba's Qwen3.7-Max reasoning model with 1M token context and agentic capabilities ranking fifth on the Artificial Analysis Intelligence Index; an OpenAI reasoning model resolving the 80-year-old Erdős planar unit distance problem; Nvidia's Gated DeltaNet-2 outperforming Mamba-3 and other linear attention architectures; Trump pulling back a proposed AI regulation executive order; and Microsoft Research's Fara1.5 computer-use agent family beating OpenAI Operator and Google Gemini on the Online-Mind2Web benchmark.

6arXiv · cs.CL·9d ago·source ↗

LLM-guided MAP-Elites evolution improves medical decision pipelines at inference time

Researchers propose using LLM-guided MAP-Elites evolutionary search as an inference-time alternative to fine-tuning for adapting LLMs to clinical workflows, formulating triage, consultation, and image classification as evolutionary searches over executable artifacts. Across three medical settings, evolved programs substantially outperform manually designed baselines: triage accuracy improves from 77.3% to 87.1% and emergency recall from 0.60 to 0.97, with gains also shown on MIMIC-ESI, iCRAFTMD, and PneumoniaMNIST. The approach works across Llama-3, Qwen-3.5, and Gemma-4 backbones and produces interpretable program-level mechanisms rather than superficial prompt changes.