Claude Sonnet 4.5 Announced

Anthropic has officially unveiled Claude Sonnet 4.5, its latest AI model, and is marketing it as the world’s best model for coding. According to benchmark results published by the company, Sonnet 4.5 outperforms not only Anthropic’s previous models but also rivals such as Google’s Gemini 2.5 Pro and OpenAI’s GPT-5 on several industry benchmarks.

Released less than six months after Sonnet 4 and Opus 4 arrived in May, Sonnet 4.5 builds on its predecessor with significant improvements in performance, reliability, and safety.

Record-Breaking Benchmark Performance

To back up its claims, Anthropic has shared several benchmark results. On OSWorld, which evaluates real-world computer tasks, Sonnet 4.5 achieved a record score of 61.4%, beating the more expensive Opus 4.1 by roughly 17 percentage points. Just four months ago, Sonnet 4 led the same benchmark with a score of 42.2%.
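Because OSWorld reports a success rate, a gap between two scores can be read either in percentage points or as a relative gain, and the two readings diverge sharply. The short Python sketch below is illustrative arithmetic only, using the two OSWorld scores quoted in this article (Sonnet 4 at 42.2% and Sonnet 4.5 at 61.4%); the helper names are hypothetical and do not reflect any methodology published by Anthropic.

```python
def point_change(old: float, new: float) -> float:
    """Absolute change between two percentage scores, in percentage points."""
    return round(new - old, 1)


def relative_change(old: float, new: float) -> float:
    """Change expressed as a percentage of the old score."""
    return round((new - old) / old * 100, 1)


# OSWorld scores as quoted in the article (percent of tasks completed).
sonnet_4 = 42.2
sonnet_4_5 = 61.4

print(point_change(sonnet_4, sonnet_4_5))     # 19.2
print(relative_change(sonnet_4, sonnet_4_5))  # 45.5
```

The same distinction applies to the Opus 4.1 comparison: a lead of about 17 points corresponds to a considerably larger relative gain.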

On another key benchmark, SWE-bench Verified, which measures practical software engineering performance, Anthropic reports state-of-the-art results for Sonnet 4.5. The company also claims the model can now build production-ready applications rather than just prototypes, which it presents as a leap in reliability.

According to David Hershey, an AI researcher at Anthropic, the model’s real-world performance exceeds what benchmarks can show. In early enterprise trials, the model reportedly coded autonomously for up to 30 hours straight on complex tasks.

30 Hours of Continuous Autonomy

One of Sonnet 4.5’s most notable features is its ability to handle long-running tasks. The model can work autonomously for more than 30 hours, a significant jump from Opus 4, whose longest reported autonomous runs lasted about seven hours.

This ability marks a milestone for building autonomous agent systems, a core focus for Anthropic. Hershey notes that during one 30-hour session, the model not only built an application but also set up database services, purchased a domain, and performed a SOC 2 security audit, all without human intervention.

These capabilities can be critical for businesses seeking to reduce overhead, automate repetitive tasks, and speed up operations with reliable AI agents.

Anthropic’s Safest Model Yet

Anthropic claims that Claude Sonnet 4.5 is the company’s most safety-aligned AI model to date. It has been trained extensively to reduce behaviors such as sycophancy, deception, power-seeking, and encouraging delusional thinking, all of which have recently raised concerns about competing models, including those from OpenAI.

The model also includes stronger protection against prompt injection attacks and is released under Anthropic’s AI Safety Level 3 (ASL-3) framework. This includes filters designed to detect dangerous inputs and outputs related to chemical, biological, or nuclear topics.

New Tools and Upgrades

Claude Sonnet 4.5 ships with a range of upgraded tools and capabilities:

  • Claude Code: Now features a redesigned terminal interface and a much-requested checkpoints feature, which lets users save progress and instantly roll back to a previous state when an output is undesirable.

  • File Creation: Users can now create spreadsheets, slide decks, and text documents directly within conversations.

  • Claude for Chrome: The Chrome extension is now available to Max users who had joined the waitlist.

  • Claude Agent SDK: Anthropic is making its internal infrastructure for building agents public, so developers can create their own powerful AI agents.

  • Imagine with Claude: A temporary research preview, available to Max subscribers for five days, in which Claude generates software on the fly and adapts it in real time.

Pricing & Market Position

Sonnet 4.5 keeps the same API pricing as its predecessor: $3 per million input tokens and $15 per million output tokens.
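To make the per-token rates concrete, here is a minimal Python sketch of a cost estimator built only on the two list prices quoted above. The function name and the example workload are hypothetical, and the sketch assumes plain list pricing: it ignores prompt caching, batch discounts, and any other pricing tiers Anthropic may offer.

```python
# List prices for Claude Sonnet 4.5 via the API, as quoted in the article
# (US dollars per million tokens).
INPUT_USD_PER_MTOK = 3.0
OUTPUT_USD_PER_MTOK = 15.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the list-price cost of a workload in US dollars.

    Assumption (hypothetical helper): ignores prompt caching, batch
    discounts, and other pricing tiers; it only applies the base rates.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Multiply first and divide once at the end to limit rounding error.
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
print(estimate_cost_usd(200_000, 50_000))  # 1.35
```

Because output tokens cost five times as much as input tokens, output-heavy work such as long code generation tends to dominate the bill: in the example, 50,000 output tokens ($0.75) cost more than 200,000 input tokens ($0.60). In practice, the token counts could come from the `usage` object (`input_tokens`, `output_tokens`) that the Anthropic Messages API returns with each response.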

Claude models have become increasingly popular among developers and enterprises, particularly in software engineering. Reports suggest that companies like Apple and Meta are using Claude models internally. Additionally, Anthropic has partnered with platforms like Cursor, Windsurf, and Replit to expand its API-based offerings.

A recent report found that most people use Claude for work-related productivity tasks. Coding and math account for 36% of usage globally, and about 77% of API requests follow automation patterns, with Claude carrying out tasks rather than simply offering advice.

Early Customer Feedback

  • Michael Truell, CEO of Cursor: “Claude Sonnet 4.5 performs on par with the best in the world — especially on long-running tasks.”

  • Jeff Wang, CEO of Windsurf: “Sonnet 4.5 represents a new generation of coding models.”

  • GitHub Copilot Team: “Our evaluations show major improvements in multi-step reasoning and code comprehension.”

  • Canva: “This model feels meaningfully smarter. It’s a big leap forward and helps us offer more to our 240+ million users.”

  • Devin (AI coding tool): “Sonnet 4.5 improved planning accuracy by 18% and overall eval scores by 12%. It’s the biggest jump we’ve seen since Sonnet 3.6.”

Source: anthropic.com