Entity · benchmark

MCP-Atlas

benchmark · active · mcp-atlas-abd5b051 · 4 events · first seen May 19, 2026

Aliases: MCP-Atlas, MCP Atlas

More like this (12)

MCP · MCP Apps · MCP Registry · Atlas · MCP Inspector · FastMCP · Euclid-MCP · MCP-Universe · MITRE ATLAS · mcp-use · MeCP · Activation Atlases

Recent events (4)

7 · The Batch · Jul 24, 2026

Meta launches Muse Spark 1.1, a low-cost agentic vision-language model with new paid API

Meta launched Muse Spark 1.1, a closed vision-language model optimized for agentic tasks including tool use, computer use, and multi-agent orchestration, alongside the Meta Model API — the company's first paid model access. The model ties GPT-5.6 Luna and GLM-5.2 on Artificial Analysis' Intelligence Index while offering substantially lower output token prices ($4.25/M vs. $25–$50/M for comparable closed models), and tops MCP Atlas and JobBench tool-use leaderboards. Meta's pricing strategy, subsidized by advertising revenue, is framed as a direct attack on competitors' API margins and could compress inference costs industry-wide.
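
As a rough check on the price gap quoted above, the following sketch divides the quoted output prices for comparable closed models ($25 to $50 per million tokens) by Muse Spark 1.1's quoted $4.25 per million. The function and constant names are illustrative only, and the summary gives no input-token price for Muse Spark 1.1, so only output prices are compared.

```python
# Illustrative arithmetic only: prices are the per-million output-token
# figures quoted in the summary; all names here are hypothetical.

def price_ratio(competitor_per_million: float, baseline_per_million: float) -> float:
    """Return how many times more expensive the competitor price is."""
    return competitor_per_million / baseline_per_million

MUSE_SPARK_OUTPUT_USD = 4.25          # quoted output price, USD per 1M tokens
COMPARABLE_RANGE_USD = (25.0, 50.0)   # quoted range for comparable closed models

low, high = (price_ratio(p, MUSE_SPARK_OUTPUT_USD) for p in COMPARABLE_RANGE_USD)
print(f"Comparable models cost {low:.1f}x to {high:.1f}x more per output token")
```

On the quoted figures, comparable closed models charge roughly six to twelve times as much per output token.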

Frontier Model Releases · Inference Economics · JobBench · Scale AI · Artificial Analysis Intelligence Index · +15 more

8 · The Batch · Jun 3, 2026

GPT-5.4 released with tool search, computer use, and frontier benchmark performance

OpenAI released GPT-5.4 in Thinking and Pro variants, featuring an expanded context window (up to 1.05M input tokens), native computer use, tool search capabilities, and adjustable reasoning levels. In independent testing by Artificial Analysis, GPT-5.4 Pro at xhigh reasoning achieved state-of-the-art on GDP-Val-AA, BrowseComp, Terminal-Bench-Hard, SWE-Bench-Pro, and MCP Atlas, while trailing Gemini 3.1 Pro Preview on MMMU-Pro and Humanity's Last Exam. Pricing is set at the top of the market ($30/$180 per million input/output tokens for Pro), and the release also powers Codex, OpenAI's competitor to Claude Code. The item is reported via The Batch (tier 2 commentary) and includes additional context on Andrew Ng's chub CLI tool for agent documentation sharing.
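
To make the quoted Pro pricing concrete, here is a minimal sketch of a per-request cost estimate at $30 per million input tokens and $180 per million output tokens. The function name and the example token counts (100,000 in, 10,000 out) are assumptions chosen for illustration and do not come from the source.

```python
# Illustrative cost estimate using the quoted GPT-5.4 Pro prices.
# The token counts below are hypothetical, not measured usage.

def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 30.0,    # quoted USD per 1M input tokens
    output_price_per_m: float = 180.0,  # quoted USD per 1M output tokens
) -> float:
    """Estimate the cost of one request from token counts and per-million prices."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost

cost = request_cost_usd(input_tokens=100_000, output_tokens=10_000)
print(f"Estimated cost: ${cost:.2f}")
```

In this hypothetical request, the 10,000 output tokens account for $1.80 of the estimate despite being a tenth of the input volume, which is why output price tends to dominate for long agentic runs.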

Frontier Model Releases · Inference Economics · DeepLearning.AI · Artificial Analysis Intelligence Index · Claude Opus 4.6 · +14 more

7 · The Batch · Jun 3, 2026

OpenAI GPT-5.4 Pro and GPT-5.4 Thinking challenge Gemini 3.1 Pro Preview for top AI model position

OpenAI released GPT-5.4 in two variants (Pro and Thinking), featuring expanded context windows up to 1.05M tokens, native computer use, tool search capabilities, and adjustable reasoning levels. In independent benchmarks by Artificial Analysis, GPT-5.4 Pro at xhigh reasoning nearly ties Gemini 3.1 Pro Preview on the Intelligence Index (57 vs 57.2 points) but at roughly 3.3x the cost, while leading on coding and agentic sub-indices. The release leapfrogs Claude Opus 4.6 on most benchmarks but faces stiff competition from Google's Gemini 3.1 Pro Preview, which maintains a price and multimodal advantage.

Frontier Model Releases · Evaluation and Benchmarking · Artificial Analysis Intelligence Index · Claude Opus 4.6 · Gemini Deep Think · +16 more

7 · arXiv · cs.CL · May 19, 2026

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

EnvFactory is a fully automated framework for training tool-use LLM agents via Agentic Reinforcement Learning, addressing two key bottlenecks: scalable execution environments and realistic multi-turn training data. It autonomously constructs stateful, executable tool environments from authentic resources and synthesizes natural trajectories with implicit human intents via topology-aware sampling. Using only 85 verified environments across 7 domains, it generates 2,575 SFT and RL trajectories and improves Qwen3-series models by up to +15% on BFCLv3, +8.6% on MCP-Atlas, and +6% on conversational benchmarks, outperforming prior approaches that use 5x more environments.

Training Infrastructure · Evaluation and Benchmarking · VitaBench · MCP-Atlas · BFCLv3 · +6 more