Entity · model

GLM-4.7-Flash

model · active · glm-4-7-flash-15c8dc0b · 2 events · first seen May 19, 2026

Aliases: GLM-4.7-Flash

Co-occurring entities

GLM-4.5-Air · SWE-Bench Verified · CompactionRL · GLM-5.1 · Terminal-Bench · Self-Distillation · ZEDA (Zero-Expert Self-Distillation Adaptation) · Qwen3-30B-A3B · Mixture of Experts

More like this (12)

GLM-4.7 · GLM-4.5-Air · GLM-5.1 · Gemini 3.5 Flash · GLM · GLM-Z1-9B-0414 · GLM-4-Voice · Gemini-2.5-Flash-Lite · Gemini 3.1 Flash Live · Gemini 3 Flash · Gemini 3.5 Flash-Lite · DeepSeek-V4-Flash

Recent events (2)

7 · arXiv · cs.LG · Jul 7, 2026 · source ↗

CompactionRL trains long-horizon agents with context compaction via reinforcement learning

Researchers propose CompactionRL, a reinforcement learning strategy that jointly optimizes task execution and context summarization so that LLM agents can operate beyond a finite context window. The method uses token-level loss normalization and cross-trajectory generalized advantage estimation to learn from compacted long-horizon trajectories. Applied to open GLM models, CompactionRL achieves 66.8% Pass@1 on SWE-bench Verified with GLM-4.5-Air (106B-A30B), a 7.0-point absolute gain, and has been incorporated into the training pipeline for GLM-5.1 (750B-A40B).

Long Context Evolution · Evaluation and Benchmarking · GLM-4.5-Air · SWE-Bench Verified · GLM-4.7-Flash · +4 more
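
The summary above mentions token-level loss normalization without giving a formula. As a rough illustration only, the sketch below contrasts a per-segment average with a token-level average over the segments that a compacted trajectory might be split into. The function names, the toy loss values, and the idea that segments are plain lists of per-token losses are all assumptions for illustration, not the paper's formulation.

```python
from typing import List


def sequence_level_loss(segments: List[List[float]]) -> float:
    """Average each segment's mean loss, so a 1-token segment weighs as much as a long one."""
    if not segments or any(len(seg) == 0 for seg in segments):
        raise ValueError("every segment must contain at least one token loss")
    per_segment = [sum(seg) / len(seg) for seg in segments]
    return sum(per_segment) / len(per_segment)


def token_level_loss(segments: List[List[float]]) -> float:
    """Sum all per-token losses and divide by the total token count across segments."""
    count = sum(len(seg) for seg in segments)
    if count == 0:
        raise ValueError("at least one token loss is required")
    total = sum(sum(seg) for seg in segments)
    return total / count


# Hypothetical per-token losses for two segments of one compacted trajectory.
segments = [[1.0, 1.0, 1.0, 1.0], [3.0]]
seq_loss = sequence_level_loss(segments)  # (1.0 + 3.0) / 2
tok_loss = token_level_loss(segments)     # (4.0 + 3.0) / 5
```

In this toy case the short second segment dominates the per-segment average, while the token-level average weights every token equally regardless of which segment it falls in.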

6 · arXiv · cs.CL · May 19, 2026 · source ↗

ZEDA: Post-Trained MoE Models Can Skip Half Their Experts via Self-Distillation

This paper introduces Zero-Expert Self-Distillation Adaptation (ZEDA), a framework that converts static post-trained Mixture-of-Experts (MoE) language models into dynamic ones without pre-training from scratch. ZEDA injects parameter-free zero-output experts into each MoE layer and uses two-stage self-distillation with the original model as a frozen teacher. Applied to Qwen3-30B-A3B and GLM-4.7-Flash across 11 benchmarks, ZEDA eliminates over 50% of expert FLOPs with marginal accuracy loss and achieves approximately 1.20× end-to-end inference speedup, outperforming the strongest dynamic MoE baseline by 4–6 points.

Training Infrastructure · Frontier Model Releases · Self-Distillation · ZEDA (Zero-Expert Self-Distillation Adaptation) · Qwen3-30B-A3B · +3 more
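
To make the idea of parameter-free zero-output experts concrete, here is a minimal sketch of a single MoE layer's forward pass in which some router slots point at experts that return nothing and are never executed. Everything here is hypothetical: the `moe_forward` name, the use of raw router scores as gate weights, and the toy experts are assumptions for illustration. It does not model ZEDA's actual router or its two-stage self-distillation.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest router scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def moe_forward(
    x: Vector, scores: Sequence[float], real_experts: List[Expert], k: int
) -> Tuple[Vector, int]:
    """Route x to the top-k slots; slots past len(real_experts) are zero-output experts.

    Returns the combined output and the number of real experts actually executed.
    """
    out = [0.0] * len(x)
    executed = 0
    for idx in top_k_indices(scores, k):
        if idx >= len(real_experts):
            continue  # zero-output expert: contributes nothing and costs no compute
        y = real_experts[idx](x)
        # Assumption: raw router scores act as gate weights (no renormalization).
        out = [o + scores[idx] * v for o, v in zip(out, y)]
        executed += 1
    return out, executed


# Two toy real experts plus one zero-output slot (index 2).
experts: List[Expert] = [lambda v: [2.0 * a for a in v], lambda v: [-a for a in v]]
output, executed = moe_forward([1.0, 2.0], [0.6, 0.1, 0.3], experts, k=2)
```

With these scores the top-2 slots are expert 0 and the zero-output slot, so only one real expert runs for this token even though k is 2.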