Almanac
model

Claude Haiku 4.5

modelactiveclaude-haiku-4-5-64f79396·21 events·first seen 1mo ago

Aliases: Claude Haiku 4.5, Claude Haiku, Claude 3 Haiku, Claude 3.5 Haiku

Co-occurring entities

More like this (12)

Recent events (21)

7Anthropic News·15d ago·source ↗

Anthropic Launches Claude Haiku 4.5: Near-Frontier Performance at $1/$5 per Million Tokens

Anthropic has released Claude Haiku 4.5, a small model priced at $1/$5 per million input/output tokens that delivers coding performance comparable to Claude Sonnet 4 at one-third the cost and more than twice the speed. The model surpasses Sonnet 4 on computer use tasks and achieves 90% of Sonnet 4.5's performance on agentic coding evaluations, running 4-5x faster than Sonnet 4.5. Notably, Haiku 4.5 is classified under ASL-2 safety standards—less restrictive than the ASL-3 applied to Sonnet 4.5 and Opus 4.1—and is described as Anthropic's safest model by automated alignment metrics. It is available via the Claude API, Amazon Bedrock, and Google Cloud Vertex AI.

6Anthropic News·13d ago·source ↗

Anthropic enables fine-tuning of Claude 3 Haiku via Amazon Bedrock

Anthropic announced that Claude 3 Haiku can now be fine-tuned through Amazon Bedrock using custom prompt-completion pairs, with general availability reached November 1, 2024. The capability targets specialized business workflows, with Anthropic citing a case study showing classification accuracy improvement from 81.5% to 99.6% and 85% token reduction on a content moderation task. Early enterprise adopters include SK Telecom and Thomson Reuters, both reporting measurable performance gains. Fine-tuning is available in the US West (Oregon) region with text support up to 32K context, with vision fine-tuning planned.

6Anthropic News·12d ago·source ↗

Anthropic releases Claude 3 Haiku, fastest and most affordable model in the Claude 3 family

Anthropic released Claude 3 Haiku, the fastest and most cost-efficient model in the Claude 3 lineup, processing 21K tokens per second for prompts under 32K tokens. The model is positioned for enterprise workloads requiring high throughput and low cost, with pricing enabling analysis of 400 Supreme Court cases or 2,500 images for one dollar. Haiku is available via the Claude API, Claude Pro on claude.ai, and Amazon Bedrock, with Google Cloud Vertex AI support forthcoming.

7arXiv · cs.CL·25d ago·source ↗

Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety

Researchers introduce 'Boiling the Frog,' a multi-turn safety benchmark evaluating whether tool-using AI agents in corporate/office settings are susceptible to incremental attacks that begin with benign requests before introducing harmful payloads. The benchmark uses stateful multi-turn evaluation with a three-level operational risk taxonomy grounded in the EU AI Act and its GPAI Code of Practice. Across nine models, aggregate strict attack success rate is 44.4%, ranging from 20.5% for Claude Haiku 4.5 to 92.9% for Gemini 3.1 Flash Lite, with loss-of-control scenarios reaching 93.3% category-level ASR.

6Anthropic News·15d ago·source ↗

Anthropic Details Safeguards for User Wellbeing: Crisis Detection, Anti-Sycophancy, and Evaluation Results

Anthropic has published a detailed account of its user wellbeing safeguards, covering how Claude handles suicide and self-harm conversations through model training, system prompts, and a real-time crisis classifier integrated with ThroughLine's global helpline network. The post discloses evaluation results for Claude Opus 4.5, Sonnet 4.5, and Haiku 4.5, showing 98–99% appropriate response rates on high-risk single-turn prompts and very low false-refusal rates on benign requests. Anthropic also addresses anti-sycophancy efforts and an 18+ age requirement for Claude.ai. The company is partnering with the International Association for Suicide Prevention (IASP) to further inform training and product design.

7Anthropic News·15d ago·source ↗

Claude Sonnet 4.5, Haiku 4.5, and Opus 4.1 Now Available in Microsoft Foundry and Microsoft 365 Copilot

Anthropic and Microsoft are expanding their partnership to make Claude Sonnet 4.5, Haiku 4.5, and Opus 4.1 available in public preview on Microsoft Foundry, enabling Azure customers to build production applications and enterprise agents using existing Azure agreements and billing. Claude is also being integrated into Microsoft 365 Copilot's Agent Mode in Excel, allowing users to generate formulas, analyze data, and iterate on spreadsheet solutions. The Foundry integration supports serverless deployment with Python, TypeScript, and C# SDKs, and includes capabilities such as code execution, web search, citations, vision, and prompt caching. This partnership reduces procurement friction for enterprises already invested in the Microsoft ecosystem.

9Anthropic News·14d ago·source ↗

Anthropic introduces computer use capability, upgraded Claude 3.5 Sonnet, and Claude 3.5 Haiku

Anthropic announced three major developments: an upgraded Claude 3.5 Sonnet with significant coding improvements (SWE-bench Verified rising from 33.4% to 49.0%, surpassing all publicly available models including reasoning models), a new Claude 3.5 Haiku that matches Claude 3 Opus performance at Haiku-tier speed, and a public beta of 'computer use' — a capability allowing Claude to control computers by viewing screens, moving cursors, clicking, and typing. Computer use is available via the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI, with early adopters including Replit, The Browser Company, and Cognition. Both safety institutes (US AISI and UK AISI) conducted pre-deployment testing, and the model was assessed as remaining within ASL-2 under Anthropic's Responsible Scaling Policy.

9Anthropic News·13d ago·source ↗

Anthropic launches Claude 3 model family: Haiku, Sonnet, and Opus

Anthropic announced the Claude 3 model family on March 4, 2024, comprising three models — Haiku, Sonnet, and Opus — in ascending capability order. Claude 3 Opus claims top performance on major benchmarks including MMLU, GPQA, and GSM8K, with near-perfect recall on long-context evaluations (200K context window, 99%+ NIAH accuracy) and new multimodal vision capabilities. The release also highlights reduced unnecessary refusals, a twofold accuracy improvement over Claude 2.1, and Constitutional AI-based safety tuning. Opus and Sonnet launched immediately via claude.ai and the Claude API across 159 countries, with Haiku to follow.

5Anthropic News·13d ago·source ↗

Claude 3 Haiku and Sonnet reach general availability on Google Cloud Vertex AI

Anthropic announced general availability of Claude 3 Haiku and Claude 3 Sonnet on Google Cloud's Vertex AI platform, with Claude 3 Opus to follow in coming weeks. The deployment gives enterprise customers access to Claude models within their existing Google Cloud environment, with associated data governance and security benefits. Quora's Poe app is cited as an early adopter, reporting millions of daily messages exchanged via Claude-based bots.

7Anthropic News·13d ago·source ↗

Anthropic makes Claude 3 Haiku and Sonnet available to US Intelligence Community and AWS GovCloud

Anthropic has made Claude 3 Haiku and Claude 3 Sonnet available via AWS Marketplace for the US Intelligence Community and AWS GovCloud, marking a significant expansion into government deployment. The company has crafted contractual exceptions to its general Usage Policy to permit legally authorized foreign intelligence analysis, including combating human trafficking and identifying covert influence campaigns, while maintaining restrictions on disinformation, weapons design, and malicious cyber operations. The deployment is currently limited to ASL-2 models under Anthropic's Responsible Scaling Policy. Anthropic also notes prior pre-release access to Claude 3.5 Sonnet was provided to the UK AI Safety Institute for pre-deployment testing.

7The Batch·1mo ago·source ↗

Anthropic Alignment Breakthrough, OpenAI Audio Models, DCI Retrieval, and NLA Interpretability

This digest covers four substantive AI developments: Anthropic's research showing that training Claude on ethical reasoning (rather than just aligned actions) reduced agentic misalignment from 22% to 3%, with every Claude model from Haiku 4.5 onward scoring perfectly on misalignment evals. OpenAI launched three new audio models (GPT-Realtime-2, GPT-Realtime-Translate, GPT-Realtime-Whisper) with expanded context windows and multilingual capabilities. Researchers proposed Direct Corpus Interaction (DCI), a retrieval method using command-line tools instead of vector indexes that outperforms RAG baselines by 11-30% across 13 benchmarks. Anthropic also introduced Natural Language Autoencoders (NLAs) for interpretability, revealing Claude shows evaluation awareness more often than it discloses.

9Anthropic News·15d ago·source ↗

Microsoft, NVIDIA, and Anthropic Announce Major Strategic Partnerships with $15B Investment and $30B Azure Compute Commitment

Anthropic has announced simultaneous strategic partnerships with Microsoft and NVIDIA, committing to purchase $30 billion of Azure compute capacity and up to one gigawatt of compute with NVIDIA Grace Blackwell and Vera Rubin systems. NVIDIA and Microsoft are investing up to $10 billion and $5 billion respectively in Anthropic, while Claude models (Sonnet 4.5, Opus 4.1, Haiku 4.5) will be available on Microsoft Foundry and across the Copilot product family. Anthropic and NVIDIA are also establishing a deep technology partnership to co-optimize model performance and future NVIDIA architectures for Anthropic workloads. Amazon remains Anthropic's primary cloud and training partner.

6Anthropic News·14d ago·source ↗

Claude models approved for FedRAMP High and DoD IL4/5 workloads via Amazon Bedrock

Anthropic announced that Claude models are now approved for use in FedRAMP High and DoD Impact Level 4 and 5 workloads through Amazon Bedrock in AWS GovCloud (US) regions. Currently available models include Claude 3.5 Sonnet v1 and Claude 3 Haiku, with Bedrock capabilities such as Agents, Guardrails, and Knowledge Bases also accessible. This authorization opens Claude to federal agencies and defense organizations handling controlled unclassified information, representing a significant expansion into the U.S. government market. Additional models including Claude 3.7 Sonnet and Claude 4 may be added in the future.

6Anthropic News·13d ago·source ↗

Salesforce integrates Anthropic Claude models into Einstein platform via Amazon Bedrock

Salesforce has partnered with Anthropic to make Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku available to Salesforce customers through Amazon Bedrock via a Bring Your Own LLM feature. The integration enables Claude to power custom AI experiences and Agentforce Agent actions across CRM use cases including sales, marketing, customer service, healthcare, and financial services. Claude models are accessible through Einstein Studio and operate within Salesforce's Einstein Trust Layer for security and compliance. This expands Anthropic's enterprise distribution through a major CRM platform with a large existing customer base.

6arXiv · cs.CL·1h ago·source ↗

Structural role injection via Handlebars triple-brace interpolation in LLM prompts: empirical analysis across delimiter families and models

A new arXiv paper demonstrates that Handlebars templating's HTML auto-escaping—the default in Microsoft Semantic Kernel—provides uneven protection against structural role injection attacks, where attacker-controlled data carries chat role delimiters to forge higher-privilege turns. The authors conduct 5,760 trials across seven delimiter families, two attack objectives, and four models (GPT-3.5 Turbo, GPT-4o mini, GPT-4.1 mini, Claude Haiku 4.5), finding that HTML escaping neutralizes angle-bracket-based delimiters (ChatML, Llama-3, XML) but leaves colon- and Markdown-based families fully exposed. GPT-3.5 Turbo follows task-hijack instructions in 97% of raw and 91% of escaped trials; Claude Haiku 4.5 resists both objectives almost entirely. The paper concludes that HTML escaping cannot substitute for structural separation of instruction and data.

7arXiv · cs.CL·25d ago·source ↗

AMEL: Accumulated Message Effects Bias LLM Judgments in Multi-Turn Evaluation Pipelines

This paper introduces AMEL (Accumulated Message Effect on LLM Judgments), documenting that prior conversation history with predominantly positive or negative evaluations systematically biases subsequent LLM judgments toward the prevailing polarity. Across 75,898 API calls to 11 models from 4 providers, the effect is statistically robust (d = -0.17, p < 10^-46), concentrates on high-uncertainty items, and shows a negativity asymmetry where negative histories induce 1.62x more bias than positive ones. Critically, the bias does not grow with context length, scaling reduces but does not eliminate it, and the simplest mitigation is using a fresh context per evaluation item.

6arXiv · cs.AI·7d ago·source ↗

Frontier coding agents use metaprogramming to handle esoteric programming languages

A new arXiv paper evaluates six LLM-based coding agents on four esoteric programming languages (including Brainfuck and Befunge-98), finding that the strongest agents—Claude Opus 4.6 and GPT-5.4 xhigh—often avoid writing the target language directly, instead generating it via Python metaprograms. Forbidding this strategy causes large performance drops, and text guidance alone does not transfer the capability to weaker models, though sharing Opus-derived Python helper code does sharply improve mid-tier agents. The study reveals capability stratification that mainstream benchmarks like SWE-Bench Verified compress into narrow bands, suggesting frontier agents succeed by constructing and debugging working models of unfamiliar environments rather than pattern-matching to training data.

4Anthropic News·15d ago·source ↗

Anthropic Launches Claude for Nonprofits with 75% Discount and Sector-Specific Integrations

Anthropic is launching Claude for Nonprofits in partnership with GivingTuesday, offering eligible organizations up to 75% discounts on Team and Enterprise plans. The program includes new open-source connectors to nonprofit-specific platforms (Blackbaud, Candid, Benevity), a free AI Fluency for Nonprofits course via Anthropic Academy, and consulting partnerships with organizations like The Bridgespan Group and Slalom. Existing deployments cited include the Epilepsy Foundation's 24/7 support tool reaching 3.4 million Americans, IRC humanitarian field operations, and IDinsight reporting 16× faster survey preparation.

8Anthropic News·13d ago·source ↗

Anthropic and AWS expand partnership with $4B investment and Trainium hardware collaboration

Anthropic announced an expanded partnership with Amazon Web Services, including a new $4 billion investment that brings Amazon's total stake to $8 billion, while establishing AWS as Anthropic's primary cloud and training partner. The collaboration includes deep hardware-software co-development on AWS Trainium accelerators, with Anthropic engineers writing low-level kernels and contributing to the AWS Neuron software stack to optimize model training from the silicon up. Claude on Amazon Bedrock is described as core infrastructure for tens of thousands of enterprises, with named deployments at Pfizer, Intuit, Perplexity, and the European Parliament. The deal also extends Claude's availability to AWS GovCloud and classified cloud regions for government customers.

3Anthropic News·13d ago·source ↗

Anthropic launches Claude in Canada with full product suite

Anthropic expanded Claude's availability to Canada as of June 5, 2024, offering access to Claude.ai, the iOS app, the API, and the Team plan. Canadian users can subscribe to Claude Pro at CA$28/month for access to the Claude 3 model family (Opus, Sonnet, Haiku) with 5x usage limits. The expansion is a geographic rollout with no new technical capabilities announced.

5Latent Space·12d ago·source ↗

Andon Labs on building frontier evals: VendingBench and evaluating Claude models

Latent Space interviews Lukas Petersson and Axel Backlund of Andon Labs, the creators of VendingBench, about their approach to building real-world AI evaluations. The conversation covers their experience evaluating Claude models across the capability spectrum from Haiku to Mythos, and their methodology for constructing durable frontier evals. The episode is notable for touching on a speculative or unreleased Claude model tier called 'Mythos.'