Claude Sonnet 5 is out: what it means for teams building agents

Anthropic released Claude Sonnet 5 to run AI agents more cheaply. Real deployments at Rakuten, Zapier, Zed, and Factory show how agents finish multi-step tasks in production.

Anthropic released Claude Sonnet 5 on 1 July 2026 as its new flagship model for running AI agents in production [1][2]. The model executes multi-step plans, operates a terminal, and browses the web without human intervention. It is priced at 3 dollars per million input tokens and 15 dollars per million output tokens [1].

An AI agent is a system that plans, calls tools, and completes multi-step tasks on its own. For example, an agent opens a terminal, runs code, checks the result, then moves to the next step [1].
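The run, check, and continue loop described above can be sketched in a few lines of Python. This is a toy illustration, not how Sonnet 5 or any agent framework is implemented: the fixed list of steps stands in for the model's planning, and the names run_step and run_plan are hypothetical.

```python
import subprocess
import sys


def run_step(code: str) -> subprocess.CompletedProcess:
    """Run one step as a Python snippet in a child process and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=30,
    )


def run_plan(steps: list[str]) -> list[str]:
    """Execute steps in order and stop at the first step that exits non-zero."""
    outputs = []
    for code in steps:
        result = run_step(code)
        if result.returncode != 0:
            # The check failed, so the loop does not move on to the next step.
            break
        outputs.append(result.stdout.strip())
    return outputs


# Hypothetical fixed plan; a real agent would ask the model for each next step.
plan = [
    "print(2 + 2)",
    "import sys; sys.exit(1)",  # simulated failing step
    "print('never reached')",
]
print(run_plan(plan))
```

The loop halts at the simulated failure, so only the first step's output is collected. A production agent would feed the failure back to the model instead of simply stopping.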

Benchmarks and pricing: Sonnet 5 vs its predecessors

On SWE-bench Pro, an automated engineering benchmark, Sonnet 5 scored 63.2 percent, up from 58.1 percent on Sonnet 4.6 [1]. On Terminal-Bench 2.1, which measures terminal operation ability, Sonnet 5 scored 80.4 percent, up from 67.0 percent on Sonnet 4.6 [1]. Sonnet 5's base price matches Sonnet 4.6 at 3 dollars per million input tokens and 15 dollars per million output tokens [1].

Sonnet 5: SWE-bench Pro 63.2%, Terminal-Bench 2.1 80.4%, input 3.00 dollars, output 15.00 dollars per million tokens [1].
Sonnet 4.6: SWE-bench Pro 58.1%, Terminal-Bench 2.1 67.0%, input 3.00 dollars, output 15.00 dollars per million tokens [1].
Opus 4.8: SWE-bench Pro 69.2%, Terminal-Bench 2.1 82.7%, input 5.00 dollars, output 25.00 dollars per million tokens [1].

An introductory rate through 31 August 2026 cuts Sonnet 5 to 2 dollars input and 10 dollars output per million tokens [1]. For teams running agents all day, this one-third price cut on both input and output lowers the monthly bill directly.
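To see what these prices mean for a monthly bill, the arithmetic can be sketched as follows. The per-million-token prices are the ones quoted in this article [1]; the monthly token volumes are a made-up workload chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_million: float   # US dollars per million input tokens
    output_per_million: float  # US dollars per million output tokens


# Prices as quoted in the article [1].
SONNET_5_BASE = Price(3.00, 15.00)
SONNET_5_INTRO = Price(2.00, 10.00)  # introductory rate through 31 August 2026
OPUS_4_8 = Price(5.00, 25.00)


def monthly_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Return the bill in US dollars for the given token volumes."""
    return (
        (input_tokens / 1_000_000) * price.input_per_million
        + (output_tokens / 1_000_000) * price.output_per_million
    )


# Hypothetical workload: 400 million input and 100 million output tokens a month.
INPUT_TOKENS = 400_000_000
OUTPUT_TOKENS = 100_000_000

for name, price in [
    ("Sonnet 5 base", SONNET_5_BASE),
    ("Sonnet 5 intro", SONNET_5_INTRO),
    ("Opus 4.8", OPUS_4_8),
]:
    print(f"{name}: {monthly_cost(price, INPUT_TOKENS, OUTPUT_TOKENS):,.2f} dollars")
```

Under these assumed volumes the bill comes to 2,700 dollars at the Sonnet 5 base price, 1,800 dollars at the introductory rate, and 4,500 dollars on Opus 4.8. Swap in your own token counts to estimate a real workload.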

Four real agent deployments in production

Four engineering teams reported deploying Sonnet 5 on real workflows [1].

Rakuten: production pull requests

Rakuten's engineering team tasked Sonnet 5 with processing a dozen of its hardest production pull requests. For each PR, the model ran tests and verified results before handing code to an engineer for final approval [1].

Zapier: multi-step administrative tasks

Zapier integrated Sonnet 5 into its core product for multi-step tasks. In one documented deployment, the model updated a Salesforce account tier, then drafted and sent a launch announcement to enterprise contacts. Prior-generation models often stalled mid-flow, while Sonnet 5 completed the full chain without human intervention [1].

Zed: autonomous debugging

The Zed team pointed the model at an active bug. Without step-by-step instruction, Sonnet 5 wrote a test script that reproduced the bug, applied a fix, then reverted the fix to verify the bug returned without the patch. The whole diagnosis and fix happened in one processing session [1].
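The reproduce, fix, and revert check is a general debugging pattern, and a minimal sketch of it follows. The buggy function, the fix, and every name here are hypothetical stand-ins; nothing is taken from Zed's codebase or from what the model actually wrote.

```python
from typing import Callable

MeanFn = Callable[[list[float]], float]


def buggy_mean(values: list[float]) -> float:
    # Hypothetical bug: divides by a hard-coded count instead of len(values).
    return sum(values) / 2


def fixed_mean(values: list[float]) -> float:
    return sum(values) / len(values)


def reproduces_bug(mean: MeanFn) -> bool:
    """Reproduction test: returns True when the bug is present."""
    return mean([1.0, 2.0, 3.0]) != 2.0


def fix_is_verified(
    before: MeanFn, after: MeanFn, test: Callable[[MeanFn], bool]
) -> bool:
    """A fix counts as verified only if the test shows the bug before the patch,
    passes with the patch, and shows the bug again once the patch is reverted."""
    seen_before = test(before)
    gone_after = not test(after)
    back_on_revert = test(before)  # reverting restores the original code
    return seen_before and gone_after and back_on_revert


print(fix_is_verified(buggy_mean, fixed_mean, reproduces_bug))
```

The revert step guards against a test that passes for unrelated reasons: if the bug does not come back without the patch, the patch was not what fixed it.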

Factory: long-horizon coding tasks

Factory ran Sonnet 5 on long coding tasks across a complex codebase. The team reported the model kept logic consistent across repositories and finished tasks that previously timed out or failed [1].

Safety and limits

The Sonnet 5 launch ended an 18-day operational pause triggered by a US government export directive on 12 June 2026 [1]. That directive followed a report from Amazon researchers who found a way past Fable 5 safety controls, to the point that the model identified software vulnerabilities and produced exploit code [1].

Anthropic trained automated classifiers targeting those bypass mechanisms. Internal validation data shows the classifiers blocked the reported exploit techniques on more than 99 percent of attempts [1]. When a developer request trips a classifier, the platform routes it to the older Opus 4.8 model [1].

Mozilla's test on Firefox 147 showed zero working exploits across the evaluation window [1]. The expanded classifiers also flag benign requests more often during routine development, a trade-off teams should anticipate when building debug pipelines [1].
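One way to anticipate that trade-off is to measure how often a pipeline's requests end up on the older model. The sketch below assumes your own logging records which model served each request; the log format and the labels "sonnet-5" and "opus-4.8" are hypothetical placeholders, not official API identifiers.

```python
from collections import Counter


def fallback_rate(served_models: list[str], fallback_model: str) -> float:
    """Return the share of requests that were served by the fallback model."""
    if not served_models:
        return 0.0
    return Counter(served_models)[fallback_model] / len(served_models)


# Hypothetical log of which model served each request in a debug pipeline.
served = [
    "sonnet-5", "sonnet-5", "opus-4.8", "sonnet-5",
    "sonnet-5", "opus-4.8", "sonnet-5", "sonnet-5",
]

rate = fallback_rate(served, fallback_model="opus-4.8")
print(f"Fallback rate: {rate:.0%}")
```

A rising rate on prompts you know to be benign is a signal to rephrase those prompts or to split the debug step into smaller requests.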

An industry framework for rating breaches

Anthropic, Amazon, Microsoft, and Google formed a partnership to rate model safety breaches on shared metrics. The framework records four criteria [1]:

Capability gain: how far an exploit raises ability beyond standard utility.
Breadth of capability gain: the number of distinct offensive operations the same exploit opens up.
Ease of weaponisation: the specialized engineering and prompt effort required.
Discoverability: how accessible the exploit technique is in public research.
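For teams that want to track such ratings internally, the four criteria map naturally onto a small record type. The article does not describe the scale or how the criteria are combined, so the 1 to 5 scale and the class name below are purely assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BreachRating:
    """One rating per criterion, on an assumed scale of 1 (low) to 5 (high)."""

    capability_gain: int
    breadth_of_capability_gain: int
    ease_of_weaponisation: int
    discoverability: int

    def __post_init__(self) -> None:
        # Reject values outside the assumed 1 to 5 scale.
        for field in fields(self):
            value = getattr(self, field.name)
            if not 1 <= value <= 5:
                raise ValueError(f"{field.name} must be between 1 and 5, got {value}")


# Hypothetical rating for an imagined exploit report.
rating = BreachRating(
    capability_gain=4,
    breadth_of_capability_gain=2,
    ease_of_weaponisation=3,
    discoverability=1,
)
print(rating)
```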

What it means for Indonesian teams

For engineering teams in Indonesia building agents, three things stand out. The introductory rate makes all-day agent experimentation more affordable [1]. Stronger terminal and web capability opens up automation of operational tasks that previously needed custom plumbing [1]. Tighter classifiers mean teams should design debug pipelines so that benign requests do not trip the classifiers and get routed to the older model [1].

Frequently asked questions

What is Claude Sonnet 5?

Claude Sonnet 5 is Anthropic's new language model for running AI agents, released 1 July 2026. It executes multi-step plans, operates a terminal, and browses the web without human intervention [1].

How much does Claude Sonnet 5 cost?

The base price is 3 dollars per million input tokens and 15 dollars per million output tokens. An introductory rate through 31 August 2026 cuts it to 2 dollars input and 10 dollars output per million tokens [1].

How does Sonnet 5 compare with Opus 4.8 for agents?

Opus 4.8 scores 69.2 percent on SWE-bench Pro and 82.7 percent on Terminal-Bench 2.1, above Sonnet 5 at 63.2 percent and 80.4 percent. Opus 4.8 costs more, at 5 dollars input and 25 dollars output per million tokens. Many teams combine the two, using Sonnet 5 for routine work and Opus 4.8 for the hardest tasks [1].

Is Claude Sonnet 5 safe for production?

Anthropic ships built-in, real-time classifiers that, in its internal validation, blocked the reported exploit techniques on more than 99 percent of attempts. Mozilla's test on Firefox 147 showed zero working exploits. Teams still need to design debug pipelines so that benign requests do not trip the classifiers and get routed to Opus 4.8 [1].

Sources

1. AI News, Anthropic deploys Claude Sonnet 5, Fable and Mythos restored. https://www.artificialintelligence-news.com/news/anthropic-deploys-claude-sonnet-5-fable-and-mythos-restored/
2. TechCrunch, Anthropic launches Claude Sonnet 5 as a cheaper way to run agents. https://techcrunch.com/2026/06/30/anthropic-launches-claude-sonnet-5-as-a-cheaper-way-to-run-agents/