Entity · model

Claude Haiku 4.5

modelactiveclaude-haiku-4-5-64f79396·25 events·first seen May 18, 2026

Aliases: Claude Haiku 4.5, Claude Haiku, Claude 3 Haiku, Claude 3.5 Haiku

Co-occurring entities

More like this (12)

Claude Sonnet 4.5 Claude 3.5 Sonnet Claude Claude Sonnet 4 Claude 3.5 Claude Opus 4.6 Claude 5 Claude 3.7 Sonnet Code with Claude Claude 3 Sonnet Claude Sonnet 3.5 Claude Code

Guides (1)

Claude Haiku 4.5

Claude Haiku 4.5: Anthropic's Fast, Affordable Model for High-Volume Work

Read asBeginner In-depth

Recent events (25)

6arXiv · cs.CL·Jul 15, 2026·source ↗

ThReadMed-QA benchmark reveals frontier LLMs degrade sharply on medical misconception correction across multi-turn conversations

Researchers introduce ThReadMed-QA, a multi-turn medical dialogue dataset of 2,437 conversation threads (8,204 QA pairs) derived from real AskDocs patient interactions, designed to evaluate LLM ability to detect and correct embedded misconceptions over multiple turns. Evaluation of five LLMs using an LLM-as-a-Judge rubric finds that even frontier models like GPT-5 and Claude Haiku drop from ~85% misconception correction on initial questions to ~50% within two follow-up turns. Oracle analysis attributes much of this degradation to error propagation from prior model outputs rather than context length alone. The findings highlight a significant safety gap in current evaluation frameworks, which do not capture multi-turn conversational dynamics in medical settings.

Evaluation and Benchmarking AI Safety Research AskDocs Claude Haiku 4.5 LLM-as-a-Judge +2 more

6arXiv · cs.CL·Jul 8, 2026·source ↗

RuBench: Repository-level agentic coding benchmark with native Russian task specifications

RuBench 1.0 is a new benchmark of 25 repository-level agentic coding tasks drawn from real fix commits in five live open-source projects, where task specifications are written natively in Russian in the style of customer requests rather than translated from English. The benchmark evaluates deployed product configurations including Claude Code with Opus 4.8, Sonnet 5, and Haiku 4.5, and Codex CLI with GPT-5.5, with the best configuration resolving 78.7% of tasks. A notable finding is that auditing trajectories of a fifth configuration (Claude Code + Fable 5) revealed that on 20% of tasks an official safeguard fallback silently re-routed the model to Opus 4.8, providing direct evidence that the deployed product rather than the underlying model is the actual unit of measurement in agentic evaluations.

Frontier Model Releases Evaluation and Benchmarking Claude Sonnet 3.5 Fable 5 Claude Haiku 4.5 +8 more

6arXiv · cs.CL·Jun 30, 2026·source ↗

Attractor states emerge in multi-turn LLM conversations, with asymmetric model influence

A new arXiv preprint studies long-run dynamics in multi-agent LLM conversations across 7 models and 20 controversial topics, finding that self-play trajectories form model-specific attractor states that asymmetrically influence conversation partners in mixed-play debates. Claude Haiku is identified as a strong attractor that pulls other models toward its stylistic traits (e.g., metacommentary), while GPT-4.1 nano is found to be especially malleable. The results suggest open-ended LLM interactions are partially predictable from model-specific attractors, with implications for designing and monitoring autonomous agentic systems.

Evaluation and Benchmarking AI Safety Research Attractor States Emerge in Multi-Turn LLM Conversations GPT-4.1 nano Claude Haiku 4.5 +3 more

6arXiv · cs.AI·Jun 30, 2026·source ↗

MCP Server Architecture Patterns: Five recurring patterns and four anti-patterns catalogued from production deployments

An industry experience paper catalogues five recurring architectural patterns for Model Context Protocol (MCP) servers—Resource Gateway, Tool Orchestrator, Stateful Session Server, Proxy Aggregator, and Domain-Specific Adapter—drawn from 15 servers including five production deployments on the ANSYR voice AI platform and ten from the official MCP registry. The paper also documents four anti-patterns and cross-cutting concerns around authentication, versioning, and observability. A quantitative evaluation includes inter-rater reliability (Cohen's kappa = 0.76 on 54 held-out servers), transport overhead measurements, and a tool-count study showing tool-selection accuracy drops below 90% between 10–15 tools for Claude Haiku 4.5 and between 20–30 tools for Claude Sonnet 4. Code, corpus, and prompts are released as a replication package.

Enterprise Deployment Patterns Agent and Tool Ecosystem Claude Sonnet 4 MCP Server Architecture Patterns for LLM-Integrated Applications ANSYR +3 more

6arXiv · cs.CL·Jun 17, 2026·source ↗

Structural role injection via Handlebars triple-brace interpolation in LLM prompts: empirical analysis across delimiter families and models

A new arXiv paper demonstrates that Handlebars templating's HTML auto-escaping—the default in Microsoft Semantic Kernel—provides uneven protection against structural role injection attacks, where attacker-controlled data carries chat role delimiters to forge higher-privilege turns. The authors conduct 5,760 trials across seven delimiter families, two attack objectives, and four models (GPT-3.5 Turbo, GPT-4o mini, GPT-4.1 mini, Claude Haiku 4.5), finding that HTML escaping neutralizes angle-bracket-based delimiters (ChatML, Llama-3, XML) but leaves colon- and Markdown-based families fully exposed. GPT-3.5 Turbo follows task-hijack instructions in 97% of raw and 91% of escaped trials; Claude Haiku 4.5 resists both objectives almost entirely. The paper concludes that HTML escaping cannot substitute for structural separation of instruction and data.

AI Safety Research Agent and Tool Ecosystem Microsoft Semantic Kernel GPT-3.5 Turbo GPT-4.1 mini +7 more

6arXiv · cs.AI·Jun 10, 2026·source ↗

Frontier coding agents use metaprogramming to handle esoteric programming languages

A new arXiv paper evaluates six LLM-based coding agents on four esoteric programming languages (including Brainfuck and Befunge-98), finding that the strongest agents—Claude Opus 4.6 and GPT-5.4 xhigh—often avoid writing the target language directly, instead generating it via Python metaprograms. Forbidding this strategy causes large performance drops, and text guidance alone does not transfer the capability to weaker models, though sharing Opus-derived Python helper code does sharply improve mid-tier agents. The study reveals capability stratification that mainstream benchmarks like SWE-Bench Verified compress into narrow bands, suggesting frontier agents succeed by constructing and debugging working models of unfamiliar environments rather than pattern-matching to training data.

Frontier Model Releases Evaluation and Benchmarking Claude Sonnet 4 Claude Opus 4.6 SWE-Bench Verified +8 more

6Anthropic News·Jun 4, 2026·source ↗

Anthropic releases Claude 3 Haiku, fastest and most affordable model in the Claude 3 family

Anthropic released Claude 3 Haiku, the fastest and most cost-efficient model in the Claude 3 lineup, processing 21K tokens per second for prompts under 32K tokens. The model is positioned for enterprise workloads requiring high throughput and low cost, with pricing enabling analysis of 400 Supreme Court cases or 2,500 images for one dollar. Haiku is available via the Claude API, Claude Pro on claude.ai, and Amazon Bedrock, with Google Cloud Vertex AI support forthcoming.

Frontier Model Releases Inference Economics Amazon Bedrock Claude Opus 4.6 Google Cloud Vertex AI +4 more

5Latent Space·Jun 4, 2026·source ↗

Andon Labs on building frontier evals: VendingBench and evaluating Claude models

Latent Space interviews Lukas Petersson and Axel Backlund of Andon Labs, the creators of VendingBench, about their approach to building real-world AI evaluations. The conversation covers their experience evaluating Claude models across the capability spectrum from Haiku to Mythos, and their methodology for constructing durable frontier evals. The episode is notable for touching on a speculative or unreleased Claude model tier called 'Mythos.'

Frontier Model Releases Evaluation and Benchmarking Claude Mythos Axel Backlund Claude Haiku 4.5 +5 more

3Anthropic News·Jun 4, 2026·source ↗

Anthropic launches Claude in Canada with full product suite

Anthropic expanded Claude's availability to Canada as of June 5, 2024, offering access to Claude.ai, the iOS app, the API, and the Team plan. Canadian users can subscribe to Claude Pro at CA$28/month for access to the Claude 3 model family (Opus, Sonnet, Haiku) with 5x usage limits. The expansion is a geographic rollout with no new technical capabilities announced.

Enterprise Deployment Patterns claude.ai Claude Opus 4.6 Claude Haiku 4.5 +2 more

7Anthropic News·Jun 4, 2026·source ↗

Anthropic makes Claude 3 Haiku and Sonnet available to US Intelligence Community and AWS GovCloud

Anthropic has made Claude 3 Haiku and Claude 3 Sonnet available via AWS Marketplace for the US Intelligence Community and AWS GovCloud, marking a significant expansion into government deployment. The company has crafted contractual exceptions to its general Usage Policy to permit legally authorized foreign intelligence analysis, including combating human trafficking and identifying covert influence campaigns, while maintaining restrictions on disinformation, weapons design, and malicious cyber operations. The deployment is currently limited to ASL-2 models under Anthropic's Responsible Scaling Policy. Anthropic also notes prior pre-release access to Claude 3.5 Sonnet was provided to the UK AI Safety Institute for pre-deployment testing.

AI Safety Research Enterprise Deployment Patterns AWS GovCloud UK Artificial Intelligence Safety Institute Claude 3.5 Sonnet +8 more

6Anthropic News·Jun 4, 2026·source ↗

Salesforce integrates Anthropic Claude models into Einstein platform via Amazon Bedrock

Salesforce has partnered with Anthropic to make Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku available to Salesforce customers through Amazon Bedrock via a Bring Your Own LLM feature. The integration enables Claude to power custom AI experiences and Agentforce Agent actions across CRM use cases including sales, marketing, customer service, healthcare, and financial services. Claude models are accessible through Einstein Studio and operate within Salesforce's Einstein Trust Layer for security and compliance. This expands Anthropic's enterprise distribution through a major CRM platform with a large existing customer base.

Enterprise Deployment Patterns Agent and Tool Ecosystem Salesforce Einstein Amazon Bedrock Claude Opus 4.6 +7 more

6Anthropic News·Jun 4, 2026·source ↗

Anthropic enables fine-tuning of Claude 3 Haiku via Amazon Bedrock

Anthropic announced that Claude 3 Haiku can now be fine-tuned through Amazon Bedrock using custom prompt-completion pairs, with general availability reached November 1, 2024. The capability targets specialized business workflows, with Anthropic citing a case study showing classification accuracy improvement from 81.5% to 99.6% and 85% token reduction on a content moderation task. Early enterprise adopters include SK Telecom and Thomson Reuters, both reporting measurable performance gains. Fine-tuning is available in the US West (Oregon) region with text support up to 32K context, with vision fine-tuning planned.

Frontier Model Releases Enterprise Deployment Patterns Amazon Bedrock SK Telecom Claude Haiku 4.5 +3 more

8Anthropic News·Jun 4, 2026·source ↗

Anthropic and AWS expand partnership with $4B investment and Trainium hardware collaboration

Anthropic announced an expanded partnership with Amazon Web Services, including a new $4 billion investment that brings Amazon's total stake to $8 billion, while establishing AWS as Anthropic's primary cloud and training partner. The collaboration includes deep hardware-software co-development on AWS Trainium accelerators, with Anthropic engineers writing low-level kernels and contributing to the AWS Neuron software stack to optimize model training from the silicon up. Claude on Amazon Bedrock is described as core infrastructure for tens of thousands of enterprises, with named deployments at Pfizer, Intuit, Perplexity, and the European Parliament. The deal also extends Claude's availability to AWS GovCloud and classified cloud regions for government customers.

Training Infrastructure Frontier Model Releases AWS GovCloud Amazon Bedrock AWS Neuron +10 more

5Anthropic News·Jun 3, 2026·source ↗

Claude 3 Haiku and Sonnet reach general availability on Google Cloud Vertex AI

Anthropic announced general availability of Claude 3 Haiku and Claude 3 Sonnet on Google Cloud's Vertex AI platform, with Claude 3 Opus to follow in coming weeks. The deployment gives enterprise customers access to Claude models within their existing Google Cloud environment, with associated data governance and security benefits. Quora's Poe app is cited as an early adopter, reporting millions of daily messages exchanged via Claude-based bots.

Frontier Model Releases Enterprise Deployment Patterns Google Cloud Quora Poe +6 more

9Anthropic News·Jun 3, 2026·source ↗

Anthropic launches Claude 3 model family: Haiku, Sonnet, and Opus

Anthropic announced the Claude 3 model family on March 4, 2024, comprising three models — Haiku, Sonnet, and Opus — in ascending capability order. Claude 3 Opus claims top performance on major benchmarks including MMLU, GPQA, and GSM8K, with near-perfect recall on long-context evaluations (200K context window, 99%+ NIAH accuracy) and new multimodal vision capabilities. The release also highlights reduced unnecessary refusals, a twofold accuracy improvement over Claude 2.1, and Constitutional AI-based safety tuning. Opus and Sonnet launched immediately via claude.ai and the Claude API across 159 countries, with Haiku to follow.

Long Context Evolution Frontier Model Releases Claude Opus 4.6 Constitutional AI Claude Haiku 4.5 +8 more

9Anthropic News·Jun 3, 2026·source ↗

Anthropic introduces computer use capability, upgraded Claude 3.5 Sonnet, and Claude 3.5 Haiku

Anthropic announced three major developments: an upgraded Claude 3.5 Sonnet with significant coding improvements (SWE-bench Verified rising from 33.4% to 49.0%, surpassing all publicly available models including reasoning models), a new Claude 3.5 Haiku that matches Claude 3 Opus performance at Haiku-tier speed, and a public beta of 'computer use' — a capability allowing Claude to control computers by viewing screens, moving cursors, clicking, and typing. Computer use is available via the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI, with early adopters including Replit, The Browser Company, and Cognition. Both safety institutes (US AISI and UK AISI) conducted pre-deployment testing, and the model was assessed as remaining within ASL-2 under Anthropic's Responsible Scaling Policy.

Frontier Model Releases Evaluation and Benchmarking OpenAI o1-preview Amazon Bedrock Claude 3.5 Sonnet +15 more

6Anthropic News·Jun 2, 2026·source ↗

Claude models approved for FedRAMP High and DoD IL4/5 workloads via Amazon Bedrock

Anthropic announced that Claude models are now approved for use in FedRAMP High and DoD Impact Level 4 and 5 workloads through Amazon Bedrock in AWS GovCloud (US) regions. Currently available models include Claude 3.5 Sonnet v1 and Claude 3 Haiku, with Bedrock capabilities such as Agents, Guardrails, and Knowledge Bases also accessible. This authorization opens Claude to federal agencies and defense organizations handling controlled unclassified information, representing a significant expansion into the U.S. government market. Additional models including Claude 3.7 Sonnet and Claude 4 may be added in the future.

Enterprise Deployment Patterns Regulatory Developments AWS GovCloud Amazon Bedrock Claude 3.5 Sonnet +5 more

7Anthropic News·Jun 1, 2026·source ↗

Claude Sonnet 4.5, Haiku 4.5, and Opus 4.1 Now Available in Microsoft Foundry and Microsoft 365 Copilot

Anthropic and Microsoft are expanding their partnership to make Claude Sonnet 4.5, Haiku 4.5, and Opus 4.1 available in public preview on Microsoft Foundry, enabling Azure customers to build production applications and enterprise agents using existing Azure agreements and billing. Claude is also being integrated into Microsoft 365 Copilot's Agent Mode in Excel, allowing users to generate formulas, analyze data, and iterate on spreadsheet solutions. The Foundry integration supports serverless deployment with Python, TypeScript, and C# SDKs, and includes capabilities such as code execution, web search, citations, vision, and prompt caching. This partnership reduces procurement friction for enterprises already invested in the Microsoft ecosystem.

Frontier Model Releases Inference Economics Microsoft Copilot Claude Opus 4.6 Microsoft +10 more

9Anthropic News·Jun 1, 2026·source ↗

Microsoft, NVIDIA, and Anthropic Announce Major Strategic Partnerships with $15B Investment and $30B Azure Compute Commitment

Anthropic has announced simultaneous strategic partnerships with Microsoft and NVIDIA, committing to purchase $30 billion of Azure compute capacity and up to one gigawatt of compute with NVIDIA Grace Blackwell and Vera Rubin systems. NVIDIA and Microsoft are investing up to $10 billion and $5 billion respectively in Anthropic, while Claude models (Sonnet 4.5, Opus 4.1, Haiku 4.5) will be available on Microsoft Foundry and across the Copilot product family. Anthropic and NVIDIA are also establishing a deep technology partnership to co-optimize model performance and future NVIDIA architectures for Anthropic workloads. Amazon remains Anthropic's primary cloud and training partner.

Training Infrastructure Frontier Model Releases Dario Amodei Microsoft Copilot Claude Opus 4.6 +18 more

7Anthropic News·Jun 1, 2026·source ↗

Anthropic Launches Claude Haiku 4.5: Near-Frontier Performance at $1/$5 per Million Tokens

Anthropic has released Claude Haiku 4.5, a small model priced at $1/$5 per million input/output tokens that delivers coding performance comparable to Claude Sonnet 4 at one-third the cost and more than twice the speed. The model surpasses Sonnet 4 on computer use tasks and achieves 90% of Sonnet 4.5's performance on agentic coding evaluations, running 4-5x faster than Sonnet 4.5. Notably, Haiku 4.5 is classified under ASL-2 safety standards—less restrictive than the ASL-3 applied to Sonnet 4.5 and Opus 4.1—and is described as Anthropic's safest model by automated alignment metrics. It is available via the Claude API, Amazon Bedrock, and Google Cloud Vertex AI.

Frontier Model Releases Evaluation and Benchmarking Claude Sonnet 4 Amazon Bedrock Claude Opus 4.6 +15 more

6Anthropic News·Jun 1, 2026·source ↗

Anthropic Details Safeguards for User Wellbeing: Crisis Detection, Anti-Sycophancy, and Evaluation Results

Anthropic has published a detailed account of its user wellbeing safeguards, covering how Claude handles suicide and self-harm conversations through model training, system prompts, and a real-time crisis classifier integrated with ThroughLine's global helpline network. The post discloses evaluation results for Claude Opus 4.5, Sonnet 4.5, and Haiku 4.5, showing 98–99% appropriate response rates on high-risk single-turn prompts and very low false-refusal rates on benign requests. Anthropic also addresses anti-sycophancy efforts and an 18+ age requirement for Claude.ai. The company is partnering with the International Association for Suicide Prevention (IASP) to further inform training and product design.

Evaluation and Benchmarking AI Safety Research claude.ai Claude Opus 4.6 Reinforcement Learning from Human Feedback +9 more

4Anthropic News·Jun 1, 2026·source ↗

Anthropic Launches Claude for Nonprofits with 75% Discount and Sector-Specific Integrations

Anthropic is launching Claude for Nonprofits in partnership with GivingTuesday, offering eligible organizations up to 75% discounts on Team and Enterprise plans. The program includes new open-source connectors to nonprofit-specific platforms (Blackbaud, Candid, Benevity), a free AI Fluency for Nonprofits course via Anthropic Academy, and consulting partnerships with organizations like The Bridgespan Group and Slalom. Existing deployments cited include the Epilepsy Foundation's 24/7 support tool reaching 3.4 million Americans, IRC humanitarian field operations, and IDinsight reporting 16× faster survey preparation.

Enterprise Deployment Patterns Agent and Tool Ecosystem The Bridgespan Group Claude Opus 4.6 Candid +13 more

7arXiv · cs.CL·May 22, 2026·source ↗

Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety

Researchers introduce 'Boiling the Frog,' a multi-turn safety benchmark evaluating whether tool-using AI agents in corporate/office settings are susceptible to incremental attacks that begin with benign requests before introducing harmful payloads. The benchmark uses stateful multi-turn evaluation with a three-level operational risk taxonomy grounded in the EU AI Act and its GPAI Code of Practice. Across nine models, aggregate strict attack success rate is 44.4%, ranging from 20.5% for Claude Haiku 4.5 to 92.9% for Gemini 3.1 Flash Lite, with loss-of-control scenarios reaching 93.3% category-level ASR.

Evaluation and Benchmarking AI Safety Research Seed 2.0 Lite Claude Haiku 4.5 EU AI Act +7 more

7arXiv · cs.CL·May 22, 2026·source ↗

AMEL: Accumulated Message Effects Bias LLM Judgments in Multi-Turn Evaluation Pipelines

This paper introduces AMEL (Accumulated Message Effect on LLM Judgments), documenting that prior conversation history with predominantly positive or negative evaluations systematically biases subsequent LLM judgments toward the prevailing polarity. Across 75,898 API calls to 11 models from 4 providers, the effect is statistically robust (d = -0.17, p < 10^-46), concentrates on high-uncertainty items, and shows a negativity asymmetry where negative histories induce 1.62x more bias than positive ones. Critically, the bias does not grow with context length, scaling reduces but does not eliminate it, and the simplest mitigation is using a fresh context per evaluation item.

Evaluation and Benchmarking AI Safety Research Claude Opus 4.6 Google Claude Haiku 4.5 +7 more

7The Batch·May 18, 2026·source ↗

Anthropic Alignment Breakthrough, OpenAI Audio Models, DCI Retrieval, and NLA Interpretability

This digest covers four substantive AI developments: Anthropic's research showing that training Claude on ethical reasoning (rather than just aligned actions) reduced agentic misalignment from 22% to 3%, with every Claude model from Haiku 4.5 onward scoring perfectly on misalignment evals. OpenAI launched three new audio models (GPT-Realtime-2, GPT-Realtime-Translate, GPT-Realtime-Whisper) with expanded context windows and multilingual capabilities. Researchers proposed Direct Corpus Interaction (DCI), a retrieval method using command-line tools instead of vector indexes that outperforms RAG baselines by 11-30% across 13 benchmarks. Anthropic also introduced Natural Language Autoencoders (NLAs) for interpretability, revealing Claude shows evaluation awareness more often than it discloses.

Frontier Model Releases Evaluation and Benchmarking Claude Opus 4.6 GPT-Realtime-2 Claude +14 more