Almanac
Claude 3.5 Sonnet

model · active · provisional · claude-3-5-sonnet-eda1cf6e · 15 events · first seen 1mo ago

Aliases: Claude 3.5 Sonnet, Claude-3.5 Sonnet

Recent events (15)

8 · Anthropic News · 15d ago

Anthropic Releases Computer Use Capability for Claude 3.5 Sonnet

Anthropic has launched a public beta of computer use for Claude 3.5 Sonnet, enabling the model to control a computer by interpreting screenshots and issuing pixel-level cursor and keyboard commands. The model achieves 14.9% on the OSWorld benchmark, roughly double the next-best AI model's 7.7%, though well below human-level performance of 70-75%. Anthropic trained the model on a small set of simple software tools and found it generalized rapidly to broader computer interaction. Safety analysis confirmed the capability remains at AI Safety Level 2, with prompt injection identified as a primary near-term risk.

8 · Anthropic News · 15d ago

Introducing Claude 3.5 Sonnet

Anthropic launches Claude 3.5 Sonnet, the first model in its Claude 3.5 family, claiming it outperforms Claude 3 Opus and competitor models on the GPQA, MMLU, and HumanEval benchmarks while running at twice the speed of Claude 3 Opus at mid-tier pricing ($3 per million input tokens and $15 per million output tokens). The model features a 200K context window and improved vision capabilities, and it scores 64% on an internal agentic coding evaluation versus 38% for Claude 3 Opus. Alongside the model, Anthropic introduces Artifacts on Claude.ai, a dedicated workspace for real-time editing of AI-generated content. The model underwent pre-deployment evaluation by the UK AI Safety Institute and was assessed at ASL-2.
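To make the quoted pricing concrete, here is a minimal sketch of a per-request cost estimate. It reads the $3/$15 figure as $3 per million input tokens and $15 per million output tokens; the function name, the constants, and the example token counts are illustrative assumptions, not part of any Anthropic SDK or of the announcement.

```python
# Rates taken from the announcement summary above (USD per million tokens).
# Read as input/output rates; treat them as assumptions and check current pricing.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated cost in USD for a single request."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a prompt that fills the 200K context window plus a 4,000-token reply.
    print(f"${estimate_cost_usd(200_000, 4_000):.2f}")  # prints $0.66
```

Under these rates, input dominates the cost of long-context requests: the 200,000 input tokens account for $0.60 of the $0.66 total.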

9 · Anthropic News · 14d ago

Anthropic introduces computer use capability, upgraded Claude 3.5 Sonnet, and Claude 3.5 Haiku

Anthropic announced three major developments: an upgraded Claude 3.5 Sonnet with significant coding improvements (SWE-bench Verified rising from 33.4% to 49.0%, surpassing all publicly available models including reasoning models), a new Claude 3.5 Haiku that matches Claude 3 Opus performance at Haiku-tier speed, and a public beta of 'computer use', a capability that lets Claude control computers by viewing screens, moving cursors, clicking, and typing. Computer use is available via the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI, with early adopters including Replit, The Browser Company, and Cognition. The US and UK AI Safety Institutes both conducted pre-deployment testing, and the upgraded Claude 3.5 Sonnet was assessed as remaining within ASL-2 under Anthropic's Responsible Scaling Policy.

7 · Anthropic News · 14d ago

Claude 3.5 Sonnet begins rollout on GitHub Copilot via Amazon Bedrock

Anthropic's Claude 3.5 Sonnet is now rolling out on GitHub Copilot, available in public preview for all Copilot Chat users in Visual Studio Code and on GitHub.com. The model is reported to lead publicly available models on SWE-bench Verified and to score 93.7% on HumanEval. The integration runs via Amazon Bedrock's cross-region inference and reaches GitHub's community of over 100 million developers, a significant distribution milestone for Claude.

6 · Anthropic News · 14d ago

Claude models approved for FedRAMP High and DoD IL4/5 workloads via Amazon Bedrock

Anthropic announced that Claude models are now approved for use in FedRAMP High and DoD Impact Level 4 and 5 workloads through Amazon Bedrock in AWS GovCloud (US) regions. Currently available models include Claude 3.5 Sonnet v1 and Claude 3 Haiku, with Bedrock capabilities such as Agents, Guardrails, and Knowledge Bases also accessible. This authorization opens Claude to federal agencies and defense organizations handling controlled unclassified information, representing a significant expansion into the U.S. government market. Additional models including Claude 3.7 Sonnet and Claude 4 may be added in the future.

6 · Anthropic News · 13d ago

Salesforce integrates Anthropic Claude models into Einstein platform via Amazon Bedrock

Salesforce has partnered with Anthropic to make Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku available to Salesforce customers through Amazon Bedrock via a Bring Your Own LLM feature. The integration enables Claude to power custom AI experiences and Agentforce Agent actions across CRM use cases including sales, marketing, customer service, healthcare, and financial services. Claude models are accessible through Einstein Studio and operate within Salesforce's Einstein Trust Layer for security and compliance. This expands Anthropic's enterprise distribution through a major CRM platform with a large existing customer base.

8 · Anthropic News · 1mo ago

Anthropic Open-Sources the Model Context Protocol (MCP)

Anthropic has released the Model Context Protocol (MCP), an open standard enabling secure, two-way connections between AI assistants and external data sources such as business tools, content repositories, and development environments. The protocol introduces a client-server architecture with SDKs, local MCP server support in Claude Desktop, and a repository of pre-built connectors for systems like GitHub, Slack, Google Drive, and Postgres. Early adopters include Block and Apollo, with development tool companies Zed, Replit, Codeium, and Sourcegraph integrating MCP into their platforms. The goal is to replace fragmented, per-source integrations with a single universal protocol, improving context availability for AI agents.

6 · Anthropic News · 12d ago

Anthropic launches Projects feature for Claude.ai Pro and Team users

Anthropic introduced Projects for Claude.ai Pro and Team subscribers, allowing users to organize chats with curated knowledge bases, custom instructions, and a 200K context window per project. The feature also includes Artifacts (a side-by-side content generation and preview pane) and team-level conversation sharing via activity feeds. Projects are powered by Claude 3.5 Sonnet and include a privacy commitment that shared data will not be used for model training without explicit consent.

6 · Anthropic News · 15d ago

Anthropic Opens Tokyo Office, Signs AI Safety MoC with Japan AI Safety Institute

Anthropic has officially opened its first Asia-Pacific office in Tokyo, with CEO Dario Amodei meeting Japanese Prime Minister Takaichi and signing a Memorandum of Cooperation with the Japan AI Safety Institute to collaborate on AI evaluation methodologies. The company also joined the Hiroshima AI Process Friends Group and hosted a Builder Summit for 150+ startups. Japanese enterprise deployments of Claude are highlighted across Rakuten, Nomura Research Institute, Panasonic, and Classmethod, with Anthropic reporting 10x run-rate revenue growth in Asia-Pacific over the past year. Expansion to Seoul and Bengaluru is planned for coming months.

4 · Anthropic News · 13d ago

Claude launches in Brazil with consumer and API access

Anthropic has made Claude available in Brazil, offering access via Claude.ai, Android and iOS mobile apps, and the Anthropic API. Pricing is localized in Brazilian reais, with Pro and Team plans at R$110 and R$165 per user per month respectively. The launch extends Claude's geographic footprint into a major Latin American market.

5 · Mistral AI News · 1mo ago

Pixtral 12B: Mistral AI's First Multimodal Model (Now Deprecated)

Mistral AI released Pixtral 12B in September 2024 as its first natively multimodal model, combining a new 400M-parameter vision encoder trained from scratch with a 12B multimodal decoder based on Mistral Nemo. The model supports variable image sizes and aspect ratios and a 128K-token context window that can hold multiple images, and it achieved 52.5% on MMMU while maintaining strong text-only benchmark performance. It was released under the Apache 2.0 license. The model is now deprecated and has been replaced by newer vision and multimodal models from Mistral.

7 · Mistral AI News · 1mo ago

Pixtral Large: Mistral AI's 124B Open-Weights Multimodal Model

Mistral AI released Pixtral Large, a 124B open-weights multimodal model built on Mistral Large 2, featuring a 1B-parameter vision encoder and a 128K context window that supports at least 30 high-resolution images. Mistral claims state-of-the-art results on MathVista, DocVQA, and ChartQA, outperforming GPT-4o and Gemini-1.5 Pro on several benchmarks, and says the model leads the LMSys Vision Leaderboard among open-weights models by ~50 Elo points. Simultaneously, Mistral updated its text model to Mistral Large 24.11 with improvements in long-context understanding, function calling, and RAG/agentic workflows. Note: the model has since been deprecated and replaced by newer Mistral vision models.

6 · The Batch · 24d ago

Agent Benchmarks Skew Toward Software Engineering, Missing Most Economically Valuable Labor

Researchers from Carnegie Mellon University and Stanford University mapped over 10,000 examples from 43 agent benchmarks to U.S. labor statistics using O*NET occupational taxonomies, finding that current benchmarks heavily over-represent software engineering relative to its share of employment and wages. Office and administrative support (18.2M workers, $869.8B in wages) and management (11M workers, $1,326.3B in wages) are vastly under-represented compared to computer and mathematical occupations (5.2M workers, $563.6B in wages). No single benchmark covered more than 50% of work activities, and all 43 benchmarks combined covered only 56.5% of work activities. The study identifies a systematic gap between where agentic AI is being evaluated and where the largest economic opportunity lies.

6 · Anthropic News · 13d ago

Anthropic publishes policy brief calling for targeted AI regulation within 18 months

Anthropic published a policy position paper arguing that governments have an 18-month window to enact narrowly targeted AI regulation before risks in cyber and CBRN domains become acute. The post cites rapid capability gains (SWE-bench scores rising from 1.96% to 49% in a year, GPQA scores approaching human expert level) as evidence that frontier models are approaching meaningful misuse thresholds. Anthropic also reviews its Responsible Scaling Policy as a model for adaptive, proportionate risk governance and calls for similar frameworks to be adopted industry-wide and codified in law.

7 · Anthropic News · 13d ago

Anthropic makes Claude 3 Haiku and Sonnet available to US Intelligence Community and AWS GovCloud

Anthropic has made Claude 3 Haiku and Claude 3 Sonnet available via AWS Marketplace for the US Intelligence Community and AWS GovCloud, marking a significant expansion into government deployment. The company has crafted contractual exceptions to its general Usage Policy to permit legally authorized foreign intelligence analysis, including combating human trafficking and identifying covert influence campaigns, while maintaining restrictions on disinformation, weapons design, and malicious cyber operations. The deployment is currently limited to ASL-2 models under Anthropic's Responsible Scaling Policy. Anthropic also notes that it previously gave the UK AI Safety Institute pre-release access to Claude 3.5 Sonnet for pre-deployment testing.