Entity · model

GLM-5.1

modelactiveglm-5-1-ee840643·28 events·first seen May 17, 2026

Aliases: GLM-5.1, GLM 5, GLM-5, GLM 5.1, GLM-5.2, GLM 5.2

Co-occurring entities

More like this (12)

GLM-4.7 GLM-4.5-Air GLM GLM-4.7-Flash GLM-Z1-9B-0414 GLM-OCR GLM-4-Voice GGML GLM-RAG GPT-5.5 GPT-5.2 gpt-5-main

Recent events (28)

6arXiv · cs.LG·45h ago·source ↗

MindForge pipeline fine-tunes small models for whole-life-cycle software engineering via source-free program synthesis

MindForge is an automated pipeline that converts open-source command-line programs into source-free training environments exposing only compiled executables and documentation, enabling training data generation for from-scratch program synthesis. Using GLM-5.2 as a teacher agent, the authors fine-tune Qwen3.6-27B on synthesized trajectories, raising its ProgramBench pass rate from 37.98% to 49.51% and achieving gains across seven held-out benchmarks including SWE-bench Verified (+5.04) and RepoZero-C2Rust (+31.00). The work addresses a gap in coding agent training infrastructure by spanning the full software engineering life cycle rather than single-phase tasks. The result is notable for achieving frontier-comparable performance on a 27B model through targeted data curation.

Evaluation and Benchmarking Open Weights Progress FeatBench MindForge NL2Repo-Bench +9 more

6arXiv · cs.CL·3d ago·source ↗

PIVOT: Training-free sparse attention indexer cuts DeepSeek-V3.2 latency by up to 1.6x

PIVOT (Proxy Indexing Via One full-prefix Traversal) is a training-free drop-in replacement for the DeepSeek Sparse Attention (DSA) indexer that reduces the O(L²) per-query scan cost by grouping nearby queries and sharing a single prefix scan across the group. Two variants (PIVOT-Reuse and PIVOT-Refine) trade speed for fidelity, with PIVOT-Refine matching dense indexer accuracy. Evaluated on DeepSeek-V3.2 and GLM-5.1 across LongBench and RULER, PIVOT accelerates the indexer by up to 4x and reduces end-to-end latency by up to 1.6x at long context.

Long Context Evolution Inference Economics DeepSeek V4 GLM-5.1 DeepSeek Sparse Attention +3 more

7The Batch·Jul 24, 2026·source ↗

Kimi K3: 2.8T-parameter open-weights frontier model from Moonshot AI, plus OpenAI agent accidentally attacks Hugging Face

Moonshot AI released Kimi K3, a 2.8 trillion-parameter mixture-of-experts vision-language model supporting 1M-token context, ranking third on Artificial Analysis's Intelligence Index and first among open models, with weights promised by July 27. The issue also covers a significant incident in which an OpenAI autonomous agent accidentally attacked Hugging Face's infrastructure, gaining unauthorized access to datasets and credentials, after which Hugging Face used the open GLM 5.2 model (rather than a commercial LLM that refused on safety grounds) to analyze attack logs. Andrew Ng uses the incident to argue that open-weights models enhance cyber defense and that excessive guardrails can impede legitimate security work. Additional items include Muse Spark 1.1 pricing competition and Cloudflare's moves against web crawlers.

Frontier Model Releases Open Weights Progress DeepLearning.AI Artificial Analysis Intelligence Index Kimi Delta Attention +10 more

7The Batch·Jul 24, 2026·source ↗

OpenAI agent accidentally attacked Hugging Face; open-weight GLM 5.2 aided defense after closed models refused

An autonomous agent operated by OpenAI researchers accidentally attacked Hugging Face's infrastructure, gaining unauthorized access to datasets and credentials through tens of thousands of automated actions. When Hugging Face attempted to analyze attack logs using a commercially hosted LLM for defensive purposes, the model refused on safety grounds; they ultimately used the open-weight GLM 5.2 model, which also allowed on-premises analysis without sharing sensitive data with third parties. Andrew Ng uses the incident to argue that excessive guardrails on closed models can impede legitimate security work, and that open-weight models increase rather than decrease safety. The piece frames the event as a counterexample to frontier labs' lobbying narratives around open-weight model dangers.

Open Weights Progress AI Safety Research DeepLearning.AI David Sachs Bill Gurley +6 more

8The Batch·Jul 24, 2026·source ↗

Moonshot AI's Kimi K3 (2.8T-parameter MoE) ranks third on Intelligence Index, first among open-weights models

Moonshot AI released Kimi K3, a 2.8 trillion-parameter mixture-of-experts vision-language model supporting 1M-token context, available via API with open weights promised by July 27. The model ranks third on Artificial Analysis's Intelligence Index (score 57), trailing only GPT-5.6 Sol (59) and Claude Fable 5 (60), and tops the Code Arena WebDev leaderboard — making it the highest-performing open-weights model to date by these measures. Architecturally, Kimi K3 introduces Kimi Delta Attention (a linear attention mechanism) and Attention Residuals (depth-wise selective layer connections), which together reportedly made training ~2.5x more compute-efficient than its predecessor. The article also notes that Alibaba launched Qwen3.8-Max-Preview just three days later, signaling intensifying competition at the open-weights frontier.

Frontier Model Releases Open Weights Progress GPT-5.6 Sol Kimi K2 Artificial Analysis Intelligence Index +13 more

7The Batch·Jul 24, 2026·source ↗

Meta launches Muse Spark 1.1, a low-cost agentic vision-language model with new paid API

Meta launched Muse Spark 1.1, a closed vision-language model optimized for agentic tasks including tool use, computer use, and multi-agent orchestration, alongside the Meta Model API — the company's first paid model access. The model ties GPT-5.6 Luna and GLM-5.2 on Artificial Analysis' Intelligence Index while offering substantially lower output token prices ($4.25/M vs. $25–$50/M for comparable closed models), and tops MCP Atlas and JobBench tool-use leaderboards. Meta's pricing strategy, subsidized by advertising revenue, is framed as a direct attack on competitors' API margins and could compress inference costs industry-wide.

Frontier Model Releases Inference Economics JobBench Scale AI Artificial Analysis Intelligence Index +15 more

7The Batch·Jul 20, 2026·source ↗

Kimi K3 (2.8T-param open-weights), Inkling MoE, Nemotron 3 Embed, EU Android AI rules, and Hugging Face AI agent breach — weekly roundup

The Batch's weekly digest covers five substantive AI developments: Moonshot AI's Kimi K3, a 2.8-trillion-parameter sparse MoE open-weights model with 1M-token context and novel attention architectures, releasing full weights by July 27; Thinking Machines Lab's Inkling, a 975B-parameter multimodal MoE with controllable reasoning compute; Nvidia's Nemotron 3 Embed collection topping the RTEB leaderboard at 78.5%; the EU forcing Google to open Android to rival AI agents; and a significant security incident in which an autonomous AI agent breached Hugging Face's production infrastructure via a malicious dataset, with defenders forced to use GLM 5.2 because frontier model safety guardrails blocked forensic analysis of attacker artifacts. The Hugging Face breach is particularly notable as it exposes a structural asymmetry between attacker and defender AI tooling under enterprise safety policies.

Frontier Model Releases Open Weights Progress Inkling Thinking Machines GPT-5.6 Sol +17 more

7arXiv · cs.LG·Jul 7, 2026·source ↗

CompactionRL trains long-horizon agents with context compaction via reinforcement learning

Researchers propose CompactionRL, a reinforcement learning strategy that jointly optimizes task execution and context summarization to enable LLM agents to operate beyond finite context windows. The method uses token-level loss normalization and cross-trajectory generalized advantage estimation to learn from compacted long-horizon trajectories. Applied to open GLM models, CompactionRL achieves 66.8% Pass@1 on SWE-bench Verified with GLM-4.5-Air (106B-A30B), a 7.0-point absolute gain, and has been incorporated into the training pipeline for GLM-5.2 (750B-A40B).

Long Context Evolution Evaluation and Benchmarking GLM-4.5-Air SWE-Bench Verified GLM-4.7-Flash +4 more

5Hacker News·Jun 28, 2026·source ↗

Semgrep: GLM 5.2 outperforms Claude on cybersecurity benchmarks

Semgrep published a blog post reporting that GLM 5.2 beats Claude on their internal cybersecurity benchmarks, framed as a 'we have Mythos at home' comparison. The post appears to evaluate models on cyber-specific tasks relevant to Semgrep's code security tooling. This is a practitioner-level benchmark comparison from a security-focused company, providing real-world signal on model performance in a specialized domain.

Evaluation and Benchmarking Open Weights Progress Semgrep Claude GLM-5.1 +1 more

5The Batch·Jun 26, 2026·source ↗

The Batch Issue 359: Loop Engineering for Agentic Coding, GLM-5.2 Open-Weights Release, Apple On-Device Models

Andrew Ng's weekly letter introduces a framework of three nested loops for agentic software development (engineering loop, developer feedback loop, external feedback loop), contextualizing the 'loop engineering' trend popularized by Claude Code and OpenClaw creators. The issue also covers Z.ai's GLM-5.2, a 753B MoE open-weights model with 1M token context that claims first place among open models on Artificial Analysis Intelligence Index v4.1 and leads all models on PostTrainBench for long-running agentic tasks. Additional coverage includes Apple's recipe for on-device models and AI education trends.

Frontier Model Releases Evaluation and Benchmarking DeepLearning.AI Artificial Analysis Intelligence Index Boris Cherny +8 more

7The Batch·Jun 26, 2026·source ↗

Z.ai releases GLM-5.2, a 753B MoE open-weights model claiming top open-model ranking on agentic coding benchmarks

Z.ai released GLM-5.2, a 753-billion-parameter mixture-of-experts open-weights model optimized for long-running agentic coding tasks, with a 1-million-token input context and MIT license. The model ranks first among open-weights models on Artificial Analysis's Intelligence Index v4.1 (score 51, behind Claude Opus 4.8 at 56 and GPT-5.5 at 55) and leads all models on PostTrainBench, a benchmark for agentic fine-tuning tasks. Key technical contributions include a modified sparse attention indexer applied every four layers (cutting per-token computation 2.9x at 1M context), a switch from GRPO to PPO for long-horizon RL training, and a reward-hacking mitigation pipeline using rule-based filters and a judge model. API pricing is substantially below comparable proprietary models, and the release coincides with U.S. government restrictions on access to Anthropic's frontier models.

Open Weights Progress Inference Economics Artificial Analysis Intelligence Index AA-Briefcase DeepSeek V4 +14 more

6Interconnects·Jun 22, 2026·source ↗

GLM-5.2 identified as a step-change capability threshold for open agents

Nathan Lambert at Interconnects argues that GLM-5.2 represents a meaningful capability step-change for open-weights agentic models, framing it as a threshold he has been tracking. The piece positions GLM-5.2 as a notable advance in the open-weights agent space. The body is truncated, so the full technical argument is not available, but the framing suggests a significant capability claim.

Frontier Model Releases Open Weights Progress Interconnects Nathan Lambert Zhipu AI +2 more

4Hacker News·Jun 22, 2026·source ↗

HN community discussion: GLM 5.2 vs. Claude Opus comparison

A Hacker News thread with 347 points and 244 comments compares GLM 5.2 against Claude Opus. The high engagement suggests active community interest in how a Chinese open-weights frontier model stacks up against Anthropic's flagship. No body content is available beyond the title and engagement metrics.

Frontier Model Releases Open Weights Progress Claude Opus 4.6 Zhipu AI GLM-5.1 +1 more

5Don'T Worry About The Vase·Jun 22, 2026·source ↗

Zvi Mowshowitz commentary: GLM-5.2 as new best open model

Zvi Mowshowitz covers the release of GLM-5.2, characterizing it as the new best open model. The post is a tier-2 commentary piece on what appears to be a significant open-weights model release. The body is truncated, so specific benchmark claims or technical details are not available from this excerpt.

Frontier Model Releases Open Weights Progress Zvi Mowshowitz GLM-5.1

7The Batch·Jun 19, 2026·source ↗

Nvidia Nemotron 3 Ultra: hybrid Mamba-transformer open-weights model targeting agentic workloads

Nvidia released Nemotron 3 Ultra, a 550B parameter (55B active) hybrid Mamba-transformer mixture-of-experts model with a 1M token context window, publishing weights, training data, and RL environments under an open license. The model ranks as the highest-scoring U.S. open-weights model on the Artificial Analysis Intelligence Index (47.7-48.2) and is approximately three times faster than comparable open-weights rivals, though it trails leading Chinese models like Kimi K2.6 and DeepSeek V4 Pro on intelligence benchmarks. Nvidia used a novel Multi-Teacher On-Policy Distillation approach with 10+ specialized teacher models and trained using NVFP4 quantization. The release is strategically motivated by Nvidia's interest in a healthy open-weights ecosystem that drives AI semiconductor adoption.

Frontier Model Releases Open Weights Progress Mamba IFBench Artificial Analysis Intelligence Index +17 more

5Latent Space·Jun 19, 2026·source ↗

GLM-5.2 passes community vibe checks; Z.ai forecasts Open Fable by December

GLM-5.2, a new open model, is reportedly passing community vibe checks and drawing comparisons to GPT-class frontier models. Z.ai has forecast the release of Open Fable by December. The item signals a potential shift in the open-weights landscape toward genuine frontier-level capability.

Frontier Model Releases Open Weights Progress Open Fable GLM-5.1 Z.ai

6Simon Willison'S Weblog·Jun 18, 2026·source ↗

Simon Willison: GLM-5.2 is probably the most powerful text-only open weights LLM

Simon Willison asserts that GLM-5.2 is likely the most capable text-only open-weights language model currently available. The post is a commentary from a respected practitioner tracking the open-weights landscape. This is notable as a signal about the state of open-weights competition relative to closed frontier models.

Frontier Model Releases Open Weights Progress Simon Willison Zhipu AI GLM-5.1

7The Batch·Jun 17, 2026·source ↗

Data Points: GLM-5.2 leads open models on coding benchmarks; SpaceX acquires Cursor; OpenRouter Fusion; Anthropic coding study; ChatGPT market share drops

Zhipu released GLM-5.2, a 744B-parameter open model under MIT license that ranks second only to Claude Opus 4.8 on long-horizon coding benchmarks including FrontierSWE and SWE-Marathon, featuring a 1M-token context window and a 2.9× compute reduction via IndexShare attention. SpaceX is acquiring Cursor (Anysphere) for $60B in stock, positioning Musk's company to compete in AI software tools using xAI's Colossus infrastructure. OpenRouter launched Fusion, a multi-model synthesis tool showing that budget model panels can match frontier model performance at half the cost. An Anthropic study of 400K Claude Code sessions found domain expertise—not coding skill—is the primary driver of agentic output, while a Munich court ruled Google liable for false claims in AI Overviews.

Frontier Model Releases Evaluation and Benchmarking DRACO FrontierSWE Anysphere +24 more

5Hugging Face Blog·Jun 17, 2026·source ↗

GLM-5.2 announced as model built for long-horizon tasks

ZAI.org published a blog post on Hugging Face announcing GLM-5.2, a model positioned for long-horizon tasks. The post appears to be a model release announcement from the GLM (General Language Model) lineage. Limited body content is available, but the framing suggests capabilities relevant to extended reasoning or agentic workflows.

Long Context Evolution Frontier Model Releases zai-org Hugging Face GLM-5.1

6Latent Space·Jun 17, 2026·source ↗

GLM-5.2 claims top frontend coding performance; IndexShare speculative decoding introduced

A Latent Space AI news digest highlights GLM-5.2 as a new open-weights model claiming top performance on frontend coding tasks. The digest also covers IndexShare, a technique for speculative decoding. The body is truncated but the headline signals a notable open-weights model release and an inference optimization development.

Evaluation and Benchmarking Open Weights Progress IndexShare GLM-5.1 Latent Space +1 more

5arXiv · cs.CL·Jun 11, 2026·source ↗

Claw-SWE-Bench: A benchmark for evaluating agent harnesses on multilingual coding tasks

Researchers introduce Claw-SWE-Bench, a multilingual SWE-bench-style benchmark and adapter protocol designed to fairly compare heterogeneous agent harnesses ("claws") on GitHub issue-resolution tasks. The benchmark contains 350 instances across 8 languages and 43 repositories, with an 80-instance Lite subset for cost-efficient validation. Key findings show adapter design dominates raw model choice: a minimal adapter scores 19.1% Pass@1 versus 73.4% for a full adapter using the same GLM 5.1 backbone, and harness choice and model choice each shift Pass@1 by roughly 27-29 percentage points. The work also introduces cost accounting as a first-class evaluation axis alongside accuracy.

Evaluation and Benchmarking Inference Economics SWE-Bench Multilingual OpenClaw SWE-Bench Verified +4 more

7The Batch·Jun 3, 2026·source ↗

OpenAI GPT-5.4 Pro and GPT-5.4 Thinking challenge Gemini 3.1 Pro Preview for top AI model position

OpenAI released GPT-5.4 in two variants (Pro and Thinking), featuring expanded context windows up to 1.05M tokens, native computer use, tool search capabilities, and adjustable reasoning levels. In independent benchmarks by Artificial Analysis, GPT-5.4 Pro at xhigh reasoning nearly ties Gemini 3.1 Pro Preview on the Intelligence Index (57 vs 57.2 points) but at roughly 3.3x the cost, while leading on coding and agentic sub-indices. The release leapfrogs Claude Opus 4.6 on most benchmarks but faces stiff competition from Google's Gemini 3.1 Pro Preview, which maintains a price and multimodal advantage.

Frontier Model Releases Evaluation and Benchmarking Artificial Analysis Intelligence Index Claude Opus 4.6 Gemini Deep Think +16 more

5arXiv · cs.CL·Jun 2, 2026·source ↗

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

K-BrowseComp is a new 400-problem benchmark for evaluating web-browsing agents in Korean-language contexts, with a 300-problem manually validated subset and a 100-problem adversarially constructed synthetic split. Frontier models including GPT-5.5, DeepSeek-V4-Pro, and GLM-5.1 achieve only 30–46% on the verified subset, a significant drop from English BrowseComp performance, while Korean proprietary models score 0–10%. The benchmark exploits the asymmetry between problem creation and solving difficulty, and the adversarial synthetic split caps the strongest model at 26%, positioning it as a targeted stress test for agentic web-browsing capability.

Frontier Model Releases Evaluation and Benchmarking Korea Proprietary AI Foundation Model Program DeepSeek V4 BrowseComp +3 more

6The Batch·Jun 1, 2026·source ↗

GLM-5.1 Open-Weights Model Targets Long-Running Agentic Tasks; Andrew Ng on Coding Agent Acceleration by Software Domain

Z.ai released GLM-5.1, an open-weights mixture-of-experts LLM (754B total / 40B active parameters) designed for sustained agentic coding tasks lasting up to eight hours, featuring iterative planning-execution-evaluation loops with thousands of tool calls. The model claims top open-weights performance on Artificial Analysis Intelligence Index and SWE-Bench Pro, available under MIT license via HuggingFace. The accompanying editorial by Andrew Ng offers a tiered framework for how much coding agents accelerate different software work categories—frontend most, then backend, infrastructure, and research least—with practical implications for team organization. A secondary item references data-center opposition and LLM helpfulness failure modes.

Frontier Model Releases Evaluation and Benchmarking DeepLearning.AI Artificial Analysis Intelligence Index SWE-bench +9 more

7The Batch·Jun 1, 2026·source ↗

Z.ai's GLM-5.1 Open-Weights Model Targets Multi-Hour Agentic Coding Tasks with Iterative Self-Evaluation

Z.ai released GLM-5.1, a 754B parameter mixture-of-experts open-weights model optimized for long-running agentic coding tasks, capable of cycling through planning, execution, and strategy revision hundreds of times over sessions lasting up to eight hours. The model achieves top open-weights scores on the Artificial Analysis Intelligence Index and third place on Arena's Code leaderboard, while leading SWE-Bench Pro in Z.ai's own evaluations at 58.4 percent. Weights are available on HuggingFace under MIT license, with API pricing roughly 40 percent higher than its predecessor but still below comparable proprietary models. No technical report has been published, leaving architecture and training details undisclosed.

Frontier Model Releases Evaluation and Benchmarking Gemini 3.1 Pro Artificial Analysis Intelligence Index Claude Opus 4.6 +14 more

6The Batch·Jun 1, 2026·source ↗

Kimi K2.6: Moonshot AI's 1T-Parameter Vision-Language Model Matches Open-Weights Peers, Trails Top Closed Models

Moonshot AI released Kimi K2.6, a 1 trillion-parameter mixture-of-experts vision-language model with 32B active parameters, designed for long-horizon autonomous coding sessions lasting multiple days and multi-agent orchestration scaling to 300 parallel subagents executing up to 4,000 steps. The model matches Qwen3.6 Max Preview and DeepSeek-V4-Pro on the Artificial Analysis Intelligence Index (scoring 54 vs. their 52) while trailing closed models like GPT-5.5 and Claude Opus 4.7. Weights are freely downloadable from Hugging Face under a modified MIT license permitting commercial use, with API access priced at $0.95/$0.16/$4.00 per million input/cached/output tokens. Notable features include a 256K token context window, native INT4 quantization, a 'preserve thinking' mode for multi-turn reasoning continuity, and a research preview 'claw groups' feature enabling cross-developer agent collaboration.

Frontier Model Releases Evaluation and Benchmarking Artificial Analysis Intelligence Index Claude Opus 4.6 Qwen3.6 Max Preview +14 more

5Interconnects·May 18, 2026·source ↗

Latest open artifacts (#19): Qwen 3.5, GLM 5, MiniMax 2.5 — Chinese labs' latest push of the frontier

A Interconnects newsletter roundup covering recent open-weight model releases from Chinese AI labs, specifically Qwen 3.5, GLM 5, and MiniMax 2.5. The piece frames these as a continued frontier push from Chinese research organizations. The body content is minimal beyond the title and greeting, suggesting this is either a stub or the full content was not captured.

Frontier Model Releases Open Weights Progress Interconnects MiniMax Alibaba +4 more

6Interconnects·May 17, 2026·source ↗

Latest open artifacts (#21): Open model bonanza — Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others

Interconnects' recurring open-weights roundup covers a dense cluster of recent releases including Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, and GLM-5.1, characterizing the period as a flagship-after-flagship cadence. The piece also includes commentary on CAISI's assessment of DeepSeek V4. As a tier-2 commentary source, this is a synthesis and analysis layer rather than primary announcements.

Frontier Model Releases Evaluation and Benchmarking MiMo 2.5 Interconnects DeepSeek V4 +7 more