Entity · benchmark

Vals AI Finance Agent Benchmark

benchmark · active · vals-ai-finance-agent-benchmark-6fda166e · 4 events · first seen May 18, 2026

Aliases: Vals AI Finance Agent Benchmark, Finance Agent Benchmark

More like this (12)

Vals AI AI Reproducibility Benchmark OpAI-Bench Super-Agent benchmark Valeo AI multi-turn agent benchmarks AllenAI OpenAI Evals Arena AI Legal Agent Benchmark AgentWorldBench OpenAI Baselines

Recent events (4)

6 · Anthropic News · Jun 1, 2026

Anthropic Expands Claude for Financial Services with Excel Add-in, New Connectors, and Agent Skills

Anthropic is expanding its Claude for Financial Services offering with a beta Excel add-in (Claude for Excel), seven new real-time data connectors (including LSEG, Moody's, Aiera, and Chronograph), and six new pre-built Agent Skills covering tasks like DCF modeling, comparable company analysis, and initiating coverage reports. The updates build on Claude Sonnet 4.5's performance on the Finance Agent benchmark from Vals AI, where it scored 55.3% accuracy. Claude for Excel allows users to read, analyze, modify, and create Excel workbooks directly from a sidebar, with transparency into cell-level changes. These features are rolling out in preview to Max, Enterprise, and Teams users, with Citi cited as a notable enterprise adopter.

Frontier Model Releases · Enterprise Deployment Patterns · Vals AI Finance Agent Benchmark · Microsoft Copilot · Aiera · +16 more

7 · The Batch · Jun 1, 2026

US Government Prepares AI Model Vetting System; GPT-5.5 Instant, Claude Finance Agents, Pentagon AI Partnerships

The White House is preparing an executive order to create an FDA-style vetting system for new AI models, prompted partly by Anthropic's Mythos model disclosing cybersecurity risks; the Commerce Department separately expanded a voluntary testing program with Google, Microsoft, and xAI. OpenAI rolled out GPT-5.5 Instant as the default ChatGPT model, claiming 52.5% fewer hallucinations on high-stakes prompts. Anthropic released ten financial agent templates running on Claude Opus 4.7, while the Pentagon expanded AI vendor agreements to include Microsoft, Amazon, Nvidia, and Reflection AI after canceling its Anthropic contract over autonomous weapons restrictions. Major pharma companies report AI gains primarily in manufacturing optimization rather than drug discovery breakthroughs.

Frontier Model Releases · Evaluation and Benchmarking · Vals AI Finance Agent Benchmark · White House · Dario Amodei · +23 more

7 · Anthropic News · May 18, 2026

Anthropic Launches Claude for Financial Services with Claude 4 Models and Ecosystem Integrations

Anthropic has introduced a Financial Analysis Solution targeting finance professionals, built around Claude 4 models and pre-built MCP connectors to data providers including FactSet, S&P Global, PitchBook, Databricks, and Snowflake. Claude Opus 4 reportedly passed 5 of 7 levels of the Financial Modeling World Cup and scored 83% accuracy on complex Excel tasks when deployed by FundamentalLabs. The solution includes Claude Code with expanded usage limits, expert implementation support, and partnerships with major consultancies including Accenture, Deloitte, KPMG, and PwC. Early adopters include Bridgewater's AIA Labs, which has used Claude since 2023 for investment analyst workflows.

Frontier Model Releases · Evaluation and Benchmarking · PwC · Vals AI Finance Agent Benchmark · Palantir · +20 more

7 · Anthropic News · May 18, 2026

Anthropic Launches Ten Finance Agent Templates with Microsoft 365 Integration and Expanded Data Connectors

Anthropic is releasing ten ready-to-run agent templates targeting high-value financial services workflows including pitchbook creation, KYC screening, and month-end close, deployable as plugins in Claude Cowork/Claude Code or as autonomous Claude Managed Agents. The release includes native add-ins for Microsoft Excel, PowerPoint, Word, and Outlook with cross-application context persistence. Claude Opus 4.7 underpins the offering and leads the Vals AI Finance Agent benchmark at 64.37%, with new data connectors from partners including Dun & Bradstreet, Fiscal AI, FactSet, S&P Capital IQ, and others providing governed real-time data access.

Frontier Model Releases · Evaluation and Benchmarking · Vals AI Finance Agent Benchmark · Claude Opus 4.6 · Microsoft 365 · +14 more