Entity · benchmark

Legal Agent Benchmark

benchmark · active · legal-agent-benchmark-fd9cdfeb · 2 events · first seen May 28, 2026

Aliases: Legal Agent Benchmark

More like this (12)

Benchmark Agent · SafeAgentBench · Progress Advantage for LLM Agents · Super-Agent benchmark · agent-to-agent evaluation protocol · LLM Bargaining Agents · ACEBench-Agent · MemoryAgentBench · AgentWorldBench · LLM Agent Classroom · Baseline Agent · MedAgentBench

Recent events (2)

7 · The Batch · Jun 1, 2026

Claude Opus 4.8 Launches with Improved Honesty; Anthropic Previews Mythos-Class Models and Dynamic Workflows

Anthropic released Claude Opus 4.8 with improvements in coding, reasoning, agentic tasks, and notably better uncertainty flagging—approximately four times less likely than Opus 4.7 to let code flaws pass uncommented. Alongside the model, Anthropic introduced dynamic workflows in Claude Code enabling tens to hundreds of parallel subagents for large-scale engineering tasks, an effort-control slider, and a 3x price cut on fast mode. Anthropic also previewed Mythos-class models, positioned above Opus in capability, currently available to a limited set of organizations for cybersecurity work pending broader safety clearance. The same digest covers MiniMax M3 (open-weights, ~60% SWE-Bench Pro), Nvidia's RTX Spark superchip, Cosmos 3 world model, and a GR00T/Unitree robotics partnership.

Frontier Model Releases · Evaluation and Benchmarking · Unitree · Harvey · Claude Mythos · and 16 more

8 · Hacker News · May 28, 2026

Claude Opus 4.8 Released by Anthropic

Anthropic has released Claude Opus 4.8, a new frontier model in its Claude lineup. The announcement appeared on Anthropic's official news page and drew significant community engagement on Hacker News, with over 1,000 points and more than 800 comments. Specific capability details and benchmark results are not available from the source snippet alone.

Frontier Model Releases · Evaluation and Benchmarking · claude.ai · Claude Opus 4.6 · Databricks · and 16 more