
MiniMax

company · active · minimax-157b6322 · 11 events · first seen May 18, 2026

Aliases: MiniMax, MiniMax M2, MiniMax M3, MiniMax M2.7, MiniMax-M3, MiniMax-M2.5, MiniMax-M2.7


More like this (12)

MiniMax 2.5 · Mini-R1 · SynMax · Qwen2.5-Max · Qwen-VL-Max · Duolingo Max · Reachy Mini · Nano Banana Pro · M2M100 · α-entmax · Nano Banana 2 · o1-mini

Recent events (11)

6 · arXiv · cs.CL · 5d ago

Byte-Prefix Marginalization enables cross-tokenizer on-policy distillation between heterogeneous LLM families

A new arXiv preprint introduces Byte-Prefix Marginalization (BPM), a technique for distilling knowledge from teacher LLMs into a student model when the two use different tokenizers. BPM re-expresses the teacher's next-token distribution in a shared byte space, preserving probability mass and producing a vocabulary-complete alignment target. Evaluated with Qwen3-32B, GLM-Z1-9B-0414, and MiniMax-M2.7 as teachers, BPM improves six-benchmark average scores by 3.7–6.6 points over the strongest cross-tokenizer baselines on math and programming tasks. The method addresses a practical bottleneck in consolidating complementary open-weight models into compact students.

Evaluation and Benchmarking · Open Weights Progress · Byte-Prefix Marginalization · MiniMax · Qwen3 32B
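As a rough illustration of what re-expressing a next-token distribution in byte space can mean, the sketch below turns a teacher's token probabilities into a next-byte distribution by summing the mass of every token whose byte string extends a given prefix. This is a simplified, hypothetical version (the function name `next_byte_distribution` and the toy vocabulary are assumptions), not the preprint's BPM algorithm; in particular it ignores tokens that end exactly at the prefix, which a full method would need to handle across token boundaries.

```python
from collections import defaultdict


def next_byte_distribution(token_probs: dict[bytes, float], prefix: bytes = b"") -> dict[int, float]:
    """Hypothetical sketch: P(next byte | bytes already emitted within this token == prefix).

    Sums probability over every token whose bytes strictly extend `prefix`,
    grouped by the byte right after the prefix, then renormalizes.
    """
    mass: dict[int, float] = defaultdict(float)
    for token_bytes, p in token_probs.items():
        # Only tokens longer than the prefix can contribute a "next byte".
        if len(token_bytes) > len(prefix) and token_bytes.startswith(prefix):
            mass[token_bytes[len(prefix)]] += p
    total = sum(mass.values())
    if total == 0.0:
        return {}
    return {byte: m / total for byte, m in mass.items()}


# Toy teacher distribution over three tokens (hypothetical values).
teacher = {b"ab": 0.5, b"ac": 0.3, b"b": 0.2}
print(next_byte_distribution(teacher))        # first byte: 'a' (97) vs 'b' (98)
print(next_byte_distribution(teacher, b"a"))  # second byte, given the token starts with 'a'
```

With an empty prefix and no empty tokens, every token contributes exactly once, so the byte-level masses add up to the teacher's full probability before renormalization. That is the kind of mass preservation a vocabulary-independent alignment target needs.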

6 · arXiv · cs.CL · Jul 10, 2026

Auditing LLM-as-Judge reliability: judge upgrades are not interchangeable across model families

A new arXiv paper investigates measurement-validity problems in LLM-as-judge evaluation and finds that swapping the evaluator model changes scores even when the candidate responses are held fixed. Across four judgment datasets, the authors compare Qwen3 dense judges (1.7B–32B) with MiniMax M2/M2.7 API releases: only the Qwen3 1.7B→4B upgrade yields robust gains over the adjacent smaller judge, while adjacent MiniMax releases do not. Stronger judges reduce but do not eliminate position and verbosity bias, and repeated-sample juries add little when their errors are correlated. The paper argues for standardized reporting requirements, including dataset slices, bias probes, error-dependence estimates, and protocol audit trails.

Evaluation and Benchmarking · AI Safety Research · When the Judge Changes, So Does the Measurement: Auditing LLM-as-Judge Reliability · MiniMax · Alibaba
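One simple kind of position-bias probe mentioned in the audit above is to ask a pairwise judge the same question twice with the two responses swapped and count how often it picks the same underlying response. The sketch below is a generic, hypothetical version of such a probe (the `Judge` signature and the two toy judges are assumptions, not the paper's protocol).

```python
from typing import Callable, Sequence

# A judge takes (prompt, response shown as A, response shown as B) and returns "A" or "B".
Judge = Callable[[str, str, str], str]


def position_consistency(judge: Judge, items: Sequence[tuple[str, str, str]]) -> float:
    """Fraction of (prompt, r1, r2) items where the judge prefers the same response in both orders."""
    if not items:
        raise ValueError("need at least one item")
    consistent = 0
    for prompt, r1, r2 in items:
        # Original order: r1 shown in slot A.
        first_pick = r1 if judge(prompt, r1, r2) == "A" else r2
        # Swapped order: r2 shown in slot A.
        second_pick = r2 if judge(prompt, r2, r1) == "A" else r1
        consistent += first_pick == second_pick
    return consistent / len(items)


def always_a(prompt: str, a: str, b: str) -> str:
    return "A"  # toy judge with maximal position bias


def prefers_longer(prompt: str, a: str, b: str) -> str:
    return "A" if len(a) >= len(b) else "B"  # toy judge: order-stable but verbosity-biased


pairs = [("q1", "short", "a much longer answer"), ("q2", "yes", "no, because it fails")]
print(position_consistency(always_a, pairs))
print(position_consistency(prefers_longer, pairs))
```

The second toy judge scores perfectly on this probe while being entirely driven by length, which is why position and verbosity need separate probes.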

6 · arXiv · cs.CL · Jun 25, 2026

OPERA: Perplexity-based RL alignment for open-ended reasoning tasks

OPERA (Objective Perplexity-based Reflective Alignment) proposes replacing LLM-as-a-judge reward models with intrinsic rewards derived from perplexity dynamics to stabilize RL training on open-ended tasks like creative writing. The method includes a cold-start data synthesis pipeline that generates 20,000 reasoning trajectories using perplexity-prioritized rollouts. Applied to Qwen3-8B, OPERA claims state-of-the-art results among open-source models on open-ended tasks, reportedly matching or exceeding Gemini 2.5 and MiniMax-M2.5 on some benchmarks.

Open Weights Progress · Alignment and RLHF · OPERA · Gemini 2.5 · MiniMax
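The summary does not spell out OPERA's reward, so the sketch below only shows the standard perplexity computation from per-token log-probabilities, plus one hypothetical intrinsic reward built on it: the drop in log-perplexity on the same target text when a reasoning trace is placed in context. The reward definition and the function names are assumptions for illustration, not the paper's formula.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_drop_reward(logprobs_without: Sequence[float], logprobs_with: Sequence[float]) -> float:
    """Hypothetical reward: log-perplexity reduction on the same target text when it is
    scored with (vs. without) the model's reasoning trace in context.
    Positive means the reasoning made the target more predictable."""
    return math.log(perplexity(logprobs_without)) - math.log(perplexity(logprobs_with))


without = [math.log(0.25)] * 4    # every target token at p = 0.25, perplexity ≈ 4
with_trace = [math.log(0.5)] * 4  # every target token at p = 0.5, perplexity ≈ 2
print(perplexity(without), perplexity(with_trace))
print(perplexity_drop_reward(without, with_trace))
```

Working in log-perplexity makes the reward equal to the difference in mean per-token negative log-likelihood, which keeps it on a scale that does not grow with sequence length.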

6 · The Batch · Jun 22, 2026

The Batch digest: U.S. chatbot adoption tops 50%, AA-Briefcase benchmark, ARD spec, North Mini Code, Fable/Mythos export controls

A weekly digest from DeepLearning.AI covers five AI developments: a Pew Research Center survey showing nearly half of U.S. adults now use AI chatbots (ChatGPT at 44% adoption); Artificial Analysis releasing AA-Briefcase, a new benchmark for complex knowledge-work tasks where Claude Opus 4.8 is a top performer; Hugging Face publishing a reference implementation of the Agentic Resource Discovery (ARD) open spec co-developed with Microsoft, Google, and others for runtime tool discovery by agents; Cohere releasing North Mini Code, a 30B-parameter open-weight MoE coding model under Apache 2.0; and over 100 cybersecurity professionals signing an open letter urging the U.S. government to reverse export controls on Anthropic's Claude Fable 5 and Claude Mythos 5. The ARD and export-control items are the highest-signal stories, touching agent infrastructure standards and AI regulatory policy respectively.

Evaluation and Benchmarking · Open Weights Progress · Artificial Analysis · DeepLearning.AI · Claude Mythos

9 · arXiv · cs.CL · Jun 12, 2026

MaxProof achieves gold-medal-level performance on IMO 2025 and USAMO 2026 via population-level test-time scaling

MiniMax introduces MaxProof, a test-time scaling framework for competition-level mathematical proof built on its MiniMax-M3 model. The system trains three capabilities (proof generation, verification, and critique-conditioned repair), then at inference time runs tournament selection over a population of candidate proofs. MaxProof scores 35/42 on IMO 2025 and 36/42 on USAMO 2026, at or above the human gold-medal threshold on both competitions.

Frontier Model Releases · Evaluation and Benchmarking · MiniMax · USAMO 2026 · MaxProof
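Tournament selection over a population of candidates, as described for MaxProof above, can be sketched as a single-elimination bracket driven by a pairwise comparator. In the sketch below, `prefer` and the numeric "verifier scores" are hypothetical stand-ins for whatever model-based comparison MaxProof actually uses; the bracket format itself is also an assumption.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def tournament_select(candidates: Sequence[T], prefer: Callable[[T, T], bool], seed: int = 0) -> T:
    """Single-elimination tournament: prefer(a, b) is True if a should beat b.
    When a round has an odd number of candidates, the last one gets a bye."""
    if not candidates:
        raise ValueError("need at least one candidate")
    pool = list(candidates)
    random.Random(seed).shuffle(pool)  # random bracket seeding
    while len(pool) > 1:
        winners = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            winners.append(a if prefer(a, b) else b)
        if len(pool) % 2 == 1:
            winners.append(pool[-1])  # bye into the next round
        pool = winners
    return pool[0]


# Toy stand-in: each "proof" carries a hypothetical verifier score; higher wins a match.
scores = {"proof_a": 0.2, "proof_b": 0.9, "proof_c": 0.5, "proof_d": 0.7, "proof_e": 0.1}
best = tournament_select(list(scores), lambda a, b: scores[a] >= scores[b])
print(best)
```

A bracket needs only n − 1 comparisons for n candidates, but with a noisy comparator a strong proof can be knocked out in an early match; repeated brackets or round-robin play trade more comparisons for robustness.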

7 · The Batch · Jun 5, 2026

Gray market API proxy network enables discounted access to U.S. AI models in China via fraud and distillation

A ChinaTalk report details an informal ecosystem of API proxy servers, account farms, identity brokers, and token resellers that gives Chinese developers access to U.S. AI models such as Claude, ChatGPT, and Gemini at steep discounts (sometimes 10% of market price), through methods ranging from terms-of-service violations to credit card fraud. Research from the CISPA Helmholtz Center found that proxied 'Gemini-2.5' access scored only 37% on MedQA versus 83.82% via Google's official API, suggesting that model substitution is common. The network also harvests API call logs as training data, feeding the kind of industrial-scale distillation that Anthropic accused DeepSeek, Moonshot, and MiniMax of conducting in February. The White House acknowledged the distillation threat in an April memo, framing it as an adversarial national security concern.

Frontier Model Releases · AI Safety Research · White House · Gemini 2.5 · DeepSeek V4
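To see why an accuracy gap like 37% versus 83.82% is hard to attribute to sampling noise, a standard pooled two-proportion z-test can be applied. The per-endpoint question count below (100) is a hypothetical placeholder, not the CISPA study's sample size, and the accuracies are rounded to whole questions.

```python
import math


def two_proportion_z(successes_a: int, n_a: int, successes_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic for H0: both endpoints have the same accuracy."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


# Hypothetical: 100 MedQA questions per endpoint, proxy at 37 correct, official API at 84.
z = two_proportion_z(37, 100, 84, 100)
print(round(z, 2))
```

Any |z| above about 1.96 rejects equal accuracy at the 5% level (two-sided). The test only says the two endpoints behave differently; it cannot by itself distinguish a substituted model from other causes such as altered prompts or decoding settings.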

6 · The Batch · Jun 2, 2026

MiniMax M2.7 proprietary reasoning model competes with Gemini and Claude Opus; roundup covers Cursor Composer 2, MAI-Image-2, Claude Code Channels, and Anthropic defense dispute

MiniMax released M2.7, a proprietary reasoning model that scored 66.6% on MLE Bench Lite (tying Gemini 3.1) and 56.22% on SWE-Pro, priced at $0.30/$1.20 per million input/output tokens. The proprietary release may mark a strategic pivot among Chinese AI labs away from open weights. Cursor released Composer 2, an agentic coding model built on a fine-tuned Kimi K2.5 (via a partnership with Moonshot), priced 86% lower than its predecessor and scoring 73.7 on SWE-bench Multilingual. Anthropic released Claude Code Channels, which routes Telegram and Discord messages into local Claude Code sessions via MCP plugins, and separately filed a court response denying that it has any backdoor or kill switch in military deployments of Claude. Microsoft announced MAI-Image-2, a text-to-image model ranking third among research labs on Arena.ai.

Frontier Model Releases · Open Weights Progress · Stitch · Claude Sonnet 4 · SWE-Pro
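For a feel of what M2.7's quoted pricing means per call, the sketch below computes request cost from token counts. It assumes the usual convention that the first figure ($0.30) is the price per million input tokens and the second ($1.20) per million output tokens; the token counts in the example are hypothetical.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float = 0.30, output_price_per_m: float = 1.20) -> float:
    """Cost of one call from per-million-token prices.
    Defaults assume M2.7's $0.30 (input) / $1.20 (output) per million tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical agentic coding call: 200k tokens of context read, 20k tokens generated.
print(f"${request_cost_usd(200_000, 20_000):.4f}")
```

For agentic workloads that re-read large contexts on every step, the input price often dominates even though output tokens cost four times as much each.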

9 · Anthropic News · Jun 1, 2026

Anthropic Identifies Industrial-Scale Distillation Attacks by DeepSeek, Moonshot, and MiniMax

Anthropic has publicly identified three Chinese AI laboratories—DeepSeek, Moonshot AI, and MiniMax—as conducting coordinated, large-scale distillation attacks against Claude, generating over 16 million exchanges through approximately 24,000 fraudulent accounts in violation of terms of service. The campaigns targeted Claude's most differentiated capabilities including agentic reasoning, tool use, coding, and chain-of-thought generation, with MiniMax alone responsible for over 13 million exchanges. Anthropic frames these attacks as a national security concern, arguing that illicitly distilled models strip out safety safeguards and undermine US export controls. The company claims high-confidence attribution via IP correlation, request metadata, and infrastructure indicators, in some cases corroborated by industry partners.

Frontier Model Releases · Open Weights Progress · knowledge distillation · Kimi · DeepSeek V4

7 · The Batch · Jun 1, 2026

Claude Opus 4.8 Launches with Improved Honesty; Anthropic Previews Mythos-Class Models and Dynamic Workflows

Anthropic released Claude Opus 4.8 with improvements in coding, reasoning, and agentic tasks, and notably better uncertainty flagging: it is approximately four times less likely than Opus 4.7 to let code flaws pass uncommented. Alongside the model, Anthropic introduced dynamic workflows in Claude Code, enabling tens to hundreds of parallel subagents for large-scale engineering tasks, as well as an effort-control slider and a 3x price cut on fast mode. Anthropic also previewed Mythos-class models, positioned above Opus in capability and currently available to a limited set of organizations for cybersecurity work pending broader safety clearance. The same digest covers MiniMax M3 (open-weight, ~60% on SWE-Bench Pro), Nvidia's RTX Spark superchip, the Cosmos 3 world model, and a GR00T/Unitree robotics partnership.

Frontier Model Releases · Evaluation and Benchmarking · Unitree · Harvey · Claude Mythos
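Claude Code's dynamic workflows are described only at a high level above. As a generic illustration of the fan-out pattern (not Anthropic's implementation or API), the sketch below runs many subagent tasks concurrently with `asyncio`, using a semaphore to cap how many are in flight at once; `fake_subagent` is a hypothetical stand-in for a model call.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def fan_out(tasks: Sequence[str],
                  run_subagent: Callable[[str], Awaitable[str]],
                  max_parallel: int = 16) -> list[str]:
    """Run one subagent per task, at most `max_parallel` at a time; results keep task order."""
    sem = asyncio.Semaphore(max_parallel)

    async def guarded(task: str) -> str:
        async with sem:  # wait for a free slot before starting this subagent
            return await run_subagent(task)

    return list(await asyncio.gather(*(guarded(t) for t in tasks)))


async def fake_subagent(task: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a model call
    return f"done: {task}"


results = asyncio.run(fan_out([f"module_{i}" for i in range(100)], fake_subagent, max_parallel=10))
print(len(results), results[0])
```

`asyncio.gather` returns results in the order the tasks were submitted, regardless of completion order, so a coordinator can map each result back to its task without extra bookkeeping.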

5 · Hugging Face Blog · May 19, 2026

Aligning to What? Rethinking Agent Generalization in MiniMax M2

MiniMax published a blog post discussing alignment and generalization challenges in their M2 agent model. The piece appears to examine how RLHF or similar alignment techniques interact with agent generalization across tasks. Published on Hugging Face's blog, it reflects MiniMax's thinking on training methodology for their M2 model.

Frontier Model Releases · Agent and Tool Ecosystem · Reinforcement Learning from Human Feedback · MiniMax

5 · Interconnects · May 18, 2026

Latest open artifacts (#19): Qwen 3.5, GLM 5, MiniMax 2.5 — Chinese labs' latest push of the frontier

An Interconnects newsletter roundup covering recent open-weight model releases from Chinese AI labs, specifically Qwen 3.5, GLM 5, and MiniMax 2.5. The piece frames these as a continued frontier push from Chinese research organizations. The body content is minimal beyond the title and greeting, suggesting either that this is a stub or that the full content was not captured.

Frontier Model Releases · Open Weights Progress · Interconnects · MiniMax · Alibaba