Entity · benchmark

KernelBench

benchmark · active · kernelbench-098c0180 · 1 event · first seen Jun 1, 2026

Aliases: KernelBench

Co-occurring entities

Gemini 3.1 Pro · Artificial Analysis Intelligence Index · Claude Opus 4.6 · AIME 2026 · Arena Leaderboard · SWE-bench · METR · GPQA Diamond · CyberGym · GLM-5.1 · HuggingFace · Z.ai · GPT-5.5

More like this (12)

ProgramBench · RepoBench · MemBench · SorryBench · TriggerBench · CVE-Bench · MemTraceBench · MalwareBench · BixBench · CursorBench · NeuralBench · TokenBench

Recent events (1)

7 · The Batch · Jun 1, 2026

Z.ai's GLM-5.1 Open-Weights Model Targets Multi-Hour Agentic Coding Tasks with Iterative Self-Evaluation

Z.ai released GLM-5.1, a 754B-parameter mixture-of-experts open-weights model optimized for long-running agentic coding tasks. It can cycle through planning, execution, and strategy revision hundreds of times over sessions lasting up to eight hours. The model scores highest among open-weights models on the Artificial Analysis Intelligence Index and ranks third on Arena's Code leaderboard, and it leads SWE-Bench Pro at 58.4 percent in Z.ai's own evaluations. Weights are available on HuggingFace under the MIT license. API pricing is roughly 40 percent higher than its predecessor's but still below that of comparable proprietary models. No technical report has been published, so architecture and training details remain undisclosed.

Frontier Model Releases · Evaluation and Benchmarking · Gemini 3.1 Pro · Artificial Analysis Intelligence Index · Claude Opus 4.6