Claude 3.5 Sonnet
claude-3-5-sonnet-eda1cf6e · 15 events · first seen 1mo ago
Aliases: Claude 3.5 Sonnet, Claude-3.5 Sonnet
Recent events (15)
Anthropic Releases Computer Use Capability for Claude 3.5 Sonnet
Anthropic has launched a public beta of computer use for Claude 3.5 Sonnet, enabling the model to control a computer by interpreting screenshots and issuing pixel-level cursor and keyboard commands. The model achieves 14.9% on the OSWorld benchmark, roughly double the next-best AI model's 7.7%, though well below human-level performance of 70-75%. Anthropic trained the model on a small set of simple software tools and found it generalized rapidly to broader computer interaction. Safety analysis confirmed the capability remains at AI Safety Level 2, with prompt injection identified as a primary near-term risk.
Introducing Claude 3.5 Sonnet
Anthropic launches Claude 3.5 Sonnet, the first model in its Claude 3.5 family, claiming it outperforms Claude 3 Opus and competitor models on GPQA, MMLU, and HumanEval benchmarks while operating at twice the speed of Claude 3 Opus and at mid-tier pricing ($3 per million input tokens and $15 per million output tokens). The model features a 200K context window, improved vision capabilities, and a score of 64% on an internal agentic coding evaluation, versus 38% for Claude 3 Opus. Alongside the model, Anthropic introduces Artifacts on Claude.ai, a dedicated workspace for real-time editing of AI-generated content. The UK AI Safety Institute evaluated the model before deployment, and it was assessed at ASL-2.
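To make the pricing figure concrete, here is a minimal sketch of a per-request cost estimate. It assumes the "$3/$15 per million tokens" figure means $3 per million input tokens and $15 per million output tokens; the function and constant names are hypothetical and not part of any Anthropic SDK.

```python
# Assumed rates, read from the "$3/$15 per million tokens" figure above:
# $3 per million input tokens, $15 per million output tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00
TOKENS_PER_MTOK = 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request at the assumed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens * INPUT_USD_PER_MTOK / TOKENS_PER_MTOK
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / TOKENS_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # A prompt that fills the 200K context window, with a 1,000-token reply.
    print(f"{estimate_cost_usd(200_000, 1_000):.4f}")  # prints 0.6150
```

Under these assumptions, a request that fills the full 200K context window costs about $0.60 in input tokens alone, so input volume rather than reply length tends to dominate the cost of long-context use.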
Anthropic introduces computer use capability, upgraded Claude 3.5 Sonnet, and Claude 3.5 Haiku
Anthropic announced three major developments: an upgraded Claude 3.5 Sonnet with significant coding improvements (SWE-bench Verified rising from 33.4% to 49.0%, surpassing all publicly available models including reasoning models), a new Claude 3.5 Haiku that matches Claude 3 Opus performance at Haiku-tier speed, and a public beta of 'computer use', a capability allowing Claude to control computers by viewing screens, moving cursors, clicking, and typing. Computer use is available via the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI, with early adopters including Replit, The Browser Company, and Cognition. The US AI Safety Institute and the UK AI Safety Institute both conducted pre-deployment testing, and the model was assessed as remaining within ASL-2 under Anthropic's Responsible Scaling Policy.
Claude 3.5 Sonnet begins rollout on GitHub Copilot via Amazon Bedrock
Anthropic's Claude 3.5 Sonnet is now rolling out on GitHub Copilot, available in public preview for all Copilot Chat users in Visual Studio Code and GitHub.com. Anthropic claims the model leads publicly available models on SWE-bench Verified and scores 93.7% on HumanEval. The integration runs via Amazon Bedrock's cross-region inference and reaches GitHub's community of over 100 million developers, representing a significant distribution milestone for Claude.
Claude models approved for FedRAMP High and DoD IL4/5 workloads via Amazon Bedrock
Anthropic announced that Claude models are now approved for use in FedRAMP High and DoD Impact Level 4 and 5 workloads through Amazon Bedrock in AWS GovCloud (US) regions. Currently available models include Claude 3.5 Sonnet v1 and Claude 3 Haiku, with Bedrock capabilities such as Agents, Guardrails, and Knowledge Bases also accessible. This authorization opens Claude to federal agencies and defense organizations handling controlled unclassified information, representing a significant expansion into the U.S. government market. Additional models including Claude 3.7 Sonnet and Claude 4 may be added in the future.
Salesforce integrates Anthropic Claude models into Einstein platform via Amazon Bedrock
Salesforce has partnered with Anthropic to make Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku available to Salesforce customers through Amazon Bedrock via a Bring Your Own LLM feature. The integration enables Claude to power custom AI experiences and Agentforce Agent actions across CRM use cases including sales, marketing, customer service, healthcare, and financial services. Claude models are accessible through Einstein Studio and operate within Salesforce's Einstein Trust Layer for security and compliance. This expands Anthropic's enterprise distribution through a major CRM platform with a large existing customer base.
Anthropic Open-Sources the Model Context Protocol (MCP)
Anthropic has released the Model Context Protocol (MCP), an open standard enabling secure, two-way connections between AI assistants and external data sources such as business tools, content repositories, and development environments. The protocol introduces a client-server architecture with SDKs, local MCP server support in Claude Desktop, and a repository of pre-built connectors for systems like GitHub, Slack, Google Drive, and Postgres. Early adopters include Block and Apollo, with development tool companies Zed, Replit, Codeium, and Sourcegraph integrating MCP into their platforms. The goal is to replace fragmented, per-source integrations with a single universal protocol, improving context availability for AI agents.
Anthropic launches Projects feature for Claude.ai Pro and Team users
Anthropic introduced Projects for Claude.ai Pro and Team subscribers, allowing users to organize chats with curated knowledge bases, custom instructions, and a 200K context window per project. The feature also includes Artifacts (a side-by-side content generation and preview pane) and team-level conversation sharing via activity feeds. Projects are powered by Claude 3.5 Sonnet and include a privacy commitment that shared data will not be used for model training without explicit consent.
Anthropic Opens Tokyo Office, Signs AI Safety MoC with Japan AI Safety Institute
Anthropic has officially opened its first Asia-Pacific office in Tokyo, with CEO Dario Amodei meeting Japanese Prime Minister Takaichi and signing a Memorandum of Cooperation with the Japan AI Safety Institute to collaborate on AI evaluation methodologies. The company also joined the Hiroshima AI Process Friends Group and hosted a Builder Summit for 150+ startups. Japanese enterprise deployments of Claude are highlighted across Rakuten, Nomura Research Institute, Panasonic, and Classmethod, with Anthropic reporting 10x run-rate revenue growth in Asia-Pacific over the past year. Expansion to Seoul and Bengaluru is planned for coming months.
Claude launches in Brazil with consumer and API access
Anthropic has made Claude available in Brazil, offering access via Claude.ai, Android and iOS mobile apps, and the Anthropic API. Pricing is localized in Brazilian reais, with Pro and Team plans at R$110 and R$165 per user per month respectively. The launch extends Claude's geographic footprint into a major Latin American market.
Pixtral 12B: Mistral AI's First Multimodal Model (Now Deprecated)
Mistral AI released Pixtral 12B in September 2024 as their first natively multimodal model, combining a new 400M parameter vision encoder trained from scratch with a 12B multimodal decoder based on Mistral Nemo. The model supports variable image sizes and aspect ratios, a 128K token context window for multiple images, and achieved 52.5% on MMMU while maintaining strong text-only benchmark performance. The model is now deprecated and has been replaced by newer vision and multimodal models from Mistral. It was released under Apache 2.0 license.
Pixtral Large: Mistral AI's 124B Open-Weights Multimodal Model
Mistral AI released Pixtral Large, a 124B open-weights multimodal model built on Mistral Large 2, featuring a 1B parameter vision encoder and a 128K context window supporting at least 30 high-resolution images. Mistral claims state-of-the-art results on MathVista, DocVQA, and ChartQA, with the model outperforming GPT-4o and Gemini-1.5 Pro on several benchmarks and leading the LMSys Vision Leaderboard among open-weights models by ~50 Elo points. Simultaneously, Mistral updated its text model to Mistral Large 24.11 with improvements in long-context understanding, function calling, and RAG/agentic workflows. Note: the model has since been deprecated and replaced by newer Mistral vision models.
Agent Benchmarks Skew Toward Software Engineering, Missing Most Economically Valuable Labor
Researchers from Carnegie Mellon University and Stanford University mapped over 10,000 examples from 43 agent benchmarks to U.S. labor statistics using O*NET occupational taxonomies, finding that current benchmarks heavily over-represent software engineering relative to its share of employment and wages. Office and administrative support (18.2M workers, $869.8B wages) and management (11M workers, $1,326.3B wages) are vastly under-represented compared to computer and mathematical occupations (5.2M workers, $563.6B wages). No single benchmark covered more than 50% of work activities, and all 43 benchmarks combined covered only 56.5% of work activities. The study identifies a systematic gap between where agentic AI is being evaluated and where the largest economic opportunity lies.
Anthropic publishes policy brief calling for targeted AI regulation within 18 months
Anthropic published a policy position paper arguing that governments have an 18-month window to enact narrowly targeted AI regulation before risks in cyber and CBRN domains become acute. The post cites rapid capability gains (SWE-bench scores rising from 1.96% to 49% in a year, GPQA scores approaching human expert level) as evidence that frontier models are approaching meaningful misuse thresholds. Anthropic also reviews its Responsible Scaling Policy as a model for adaptive, proportionate risk governance and calls for similar frameworks to be adopted industry-wide and codified in law.
Anthropic makes Claude 3 Haiku and Sonnet available to US Intelligence Community and AWS GovCloud
Anthropic has made Claude 3 Haiku and Claude 3 Sonnet available via AWS Marketplace for the US Intelligence Community and AWS GovCloud, marking a significant expansion into government deployment. The company has crafted contractual exceptions to its general Usage Policy to permit legally authorized foreign intelligence analysis, including combating human trafficking and identifying covert influence campaigns, while maintaining restrictions on disinformation, weapons design, and malicious cyber operations. The deployment is currently limited to ASL-2 models under Anthropic's Responsible Scaling Policy. Anthropic also notes that it previously gave the UK AI Safety Institute pre-release access to Claude 3.5 Sonnet for pre-deployment testing.