Entity · benchmark

Online-Mind2Web

benchmark · active · online-mind2web-86924349 · 3 events · first seen May 28, 2026

Aliases: Online-Mind2Web

More like this (12)

Mind2Web · Open WebUI · InMind · WebSight · WebArena · Word2World · MiniMind · OpenWebText · FineWeb-Edu · WebSwarm · WebVid · WebSockets

Recent events (3)

7 · arXiv · cs.CL · Jul 24, 2026

OpenForgeRL: Open-source framework for end-to-end RL training of harness-native AI agents

OpenForgeRL is an open-source framework for end-to-end reinforcement learning of AI agents that run inside real inference harnesses (e.g., Claude Code, Codex, OpenClaw) and diverse environments. A lightweight proxy records the harness's model calls as training data for standard RL codebases such as veRL, and a Kubernetes orchestrator scales rollouts across isolated containers. The trained agents, OpenForgeClaw and OpenForgeGUI, achieve competitive results on benchmarks including ClawEval, OSWorld-Verified, Online-Mind2Web, and WebVoyager, matching or surpassing models several times larger on GUI tasks. The work also analyzes how harness choice and RL shape agent behavior and finds meaningful variation in learnability across harnesses.

Training Infrastructure · Evaluation and Benchmarking · QwenClawBench · Online-Mind2Web · veRL · +11 more

7 · The Batch · Jun 1, 2026

Data Points: Qwen3.7-Max, OpenAI Math Proof, Gated DeltaNet-2, Trump AI Order, Microsoft Fara1.5

This edition of The Batch covers five significant AI developments: Alibaba's Qwen3.7-Max reasoning model with 1M token context and agentic capabilities ranking fifth on the Artificial Analysis Intelligence Index; an OpenAI reasoning model resolving the 80-year-old Erdős planar unit distance problem; Nvidia's Gated DeltaNet-2 outperforming Mamba-3 and other linear attention architectures; Trump pulling back a proposed AI regulation executive order; and Microsoft Research's Fara1.5 computer-use agent family beating OpenAI Operator and Google Gemini on the Online-Mind2Web benchmark.

Long Context Evolution · Frontier Model Releases · Paul Erdős · Fara1.5 · Mamba · +25 more

8 · Hacker News · May 28, 2026

Claude Opus 4.8 Released by Anthropic

Anthropic has released Claude Opus 4.8, a new frontier model in its Claude lineup. The announcement appeared on Anthropic's official news page and drew significant engagement on Hacker News, with over 1,000 points and 800+ comments. The source snippet does not give specific capability details or benchmark results.

Frontier Model Releases · Evaluation and Benchmarking · claude.ai · Claude Opus 4.6 · Databricks · +16 more