Vals AI Finance Agent Benchmark
Aliases: Vals AI Finance Agent Benchmark, Finance Agent Benchmark
Recent events (4)
Anthropic Launches Ten Finance Agent Templates with Microsoft 365 Integration and Expanded Data Connectors
Anthropic is releasing ten ready-to-run agent templates targeting high-value financial services workflows including pitchbook creation, KYC screening, and month-end close, deployable as plugins in Claude Cowork/Claude Code or as autonomous Claude Managed Agents. The release includes native add-ins for Microsoft Excel, PowerPoint, Word, and Outlook with cross-application context persistence. Claude Opus 4.7 underpins the offering and leads the Vals AI Finance Agent benchmark at 64.37%, with new data connectors from partners including Dun & Bradstreet, Fiscal AI, FactSet, S&P Capital IQ, and others providing governed real-time data access.
Anthropic Launches Claude for Financial Services with Claude 4 Models and Ecosystem Integrations
Anthropic has introduced a Financial Analysis Solution targeting finance professionals, built around Claude 4 models and pre-built MCP connectors to data providers including FactSet, S&P Global, PitchBook, Databricks, and Snowflake. Claude Opus 4 reportedly passed 5 of 7 levels of the Financial Modeling World Cup and scored 83% accuracy on complex Excel tasks when deployed by FundamentalLabs. The solution includes Claude Code with expanded usage limits, expert implementation support, and partnerships with major consultancies including Accenture, Deloitte, KPMG, and PwC. Early adopters include Bridgewater's AIA Labs, which has used Claude since 2023 for investment analyst workflows.
Anthropic Expands Claude for Financial Services with Excel Add-in, New Connectors, and Agent Skills
Anthropic is expanding its Claude for Financial Services offering with a beta Excel add-in (Claude for Excel), seven new real-time data connectors (including LSEG, Moody's, Aiera, and Chronograph), and six new pre-built Agent Skills covering tasks like DCF modeling, comparable company analysis, and initiating coverage reports. The updates build on Claude Sonnet 4.5's performance on the Finance Agent benchmark from Vals AI, where it scored 55.3% accuracy. Claude for Excel allows users to read, analyze, modify, and create Excel workbooks directly from a sidebar, with transparency into cell-level changes. These features are rolling out in preview to Max, Enterprise, and Teams users, with Citi cited as a notable enterprise adopter.
US Government Prepares AI Model Vetting System; GPT-5.5 Instant, Claude Finance Agents, Pentagon AI Partnerships
The White House is preparing an executive order to create an FDA-style vetting system for new AI models, prompted partly by Anthropic's Mythos model disclosing cybersecurity risks; the Commerce Department separately expanded a voluntary testing program with Google, Microsoft, and xAI. OpenAI rolled out GPT-5.5 Instant as the default ChatGPT model, claiming 52.5% fewer hallucinations on high-stakes prompts. Anthropic released ten financial agent templates running on Claude Opus 4.7, while the Pentagon expanded AI vendor agreements to include Microsoft, Amazon, Nvidia, and Reflection AI after canceling its Anthropic contract over autonomous weapons restrictions. Major pharma companies report AI gains primarily in manufacturing optimization rather than drug discovery breakthroughs.