Google Gemini’s dominance is over — Anthropic’s new Claude is now the best AI for real work

For most of 2025, the “top chatbot” conversation felt like a tug-of-war between ChatGPT and Gemini. Earlier this year, Gemini inched ahead with Gemini 3 Flash, signaling an end to ChatGPT’s dominance.

But Claude has been quietly crushing it for a while now. And while Google has been bundling and repositioning Gemini as the center of its AI strategy, Anthropic just… got better. Again. With the release of Claude Opus 4.6, Anthropic didn’t just improve; it pulled ahead of Google’s Gemini 3 Flash in the areas that matter most for real work: reliability, reasoning depth, agentic performance and professional usefulness.

Right now, if I had to pick one best chatbot for serious thinking and real-world tasks, it wouldn’t be Gemini. It would be Claude. Here's why.

The moment the balance tipped

Let me be clear: Gemini 3 Flash is genuinely impressive. On paper, it looks like a powerhouse, with a 1 million token context window, native support for text, images, audio, video and PDFs, near real-time speed (up to 3x faster than Gemini 2.5 Pro) and “pro-class” reasoning for a flash model. For many everyday uses, it’s excellent.

But here’s the thing: being fast and multimodal is no longer enough to be the best. That’s where Claude Opus 4.6 changed the game. Anthropic didn’t just make Opus 4.6 faster — it made it fundamentally more capable at doing real work. Here’s how.

A 1M token context window used more intelligently

Both Gemini 3 Flash and Claude Opus 4.6 support a 1 million token context window. But context size alone isn’t the win. How the models use it is what really matters.

Anthropic paired its 1M token context window with a feature called compaction, meaning Claude can summarize its own context for long-running tasks, making it more reliable over time instead of slowly “losing the plot” in massive conversations. That matters for long research projects, multi-step analysis, extended coding sessions and complex document editing. Gemini can handle big inputs. Claude is better at thinking coherently across them.
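For the curious, here is a rough sketch of what compaction means in principle. This is not Anthropic's implementation. The function names, the word-count token estimate and the placeholder summarizer are all assumptions made for illustration. The idea is simply that once a conversation outgrows its budget, older messages get folded into a summary while the most recent ones stay intact.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    # Crude stand-in (assumption): one token per whitespace-separated word.
    return len(text.split())


def compact(messages: List[str], budget: int, keep_recent: int,
            summarize: Callable[[List[str]], str]) -> List[str]:
    """Fold older messages into one summary once the history exceeds the budget."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    total = sum(estimate_tokens(m) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages  # nothing to compact yet
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent


def toy_summarize(chunk: List[str]) -> str:
    # Placeholder (assumption): a real system would ask the model for a summary.
    return "[summary of %d earlier messages]" % len(chunk)


# Ten messages of 50 words each, squeezed against a 200-word budget.
history = ["word " * 50 for _ in range(10)]
compacted = compact(history, budget=200, keep_recent=2, summarize=toy_summarize)
print(len(compacted), compacted[0])
```

In this toy version, the eight oldest messages collapse into one summary line and the two newest survive untouched, which is the general shape of the trade-off: less verbatim history, more room to keep working.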

Anthropic specifically trained Opus 4.6 to plan more carefully, sustain agentic tasks for longer, work more reliably inside large real-world codebases and catch its own mistakes during debugging and code review. These improvements are what turned the AI into a chatbot that actually behaves like a junior engineer or analyst.

The numbers back this up: 65.4% on Terminal-Bench 2.0 (industry-leading for agentic coding) and 72.7% on OSWorld, making it the best computer-using model Anthropic has shipped. In plain English, Claude is now better at doing stuff, not just talking about doing stuff.

Claude wins at real-world knowledge work

This is where the dethroning really becomes clear. On GDPval-AA — a benchmark designed to measure economically valuable work in finance, legal and professional domains — Claude Opus 4.6 beat OpenAI’s GPT-5.2 by roughly 144 Elo points and beat its own predecessor (Opus 4.5) by 190 points.
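To put those Elo gaps in perspective, here is a quick back-of-the-envelope calculation. It assumes GDPval-AA uses the conventional Elo scale, where a 400-point gap corresponds to 10-to-1 odds. The function name is mine, and the result is an expected head-to-head preference rate, not a measured one.

```python
def expected_score(elo_gap: float) -> float:
    # Standard Elo expectation for the higher-rated side,
    # assuming the conventional 400-point logistic scale.
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400.0))


for gap in (144, 190):
    print(gap, round(expected_score(gap), 2))
# 144 -> 0.7
# 190 -> 0.75
```

Under that assumption, a 144-point lead means Opus 4.6 would be expected to come out ahead of GPT-5.2 in roughly 7 out of 10 comparisons, and ahead of Opus 4.5 in about 3 out of 4.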

It also outperformed every other model on BrowseComp (finding hard-to-locate information online) and led all frontier models on Humanity’s Last Exam, a brutal multidisciplinary reasoning test.

To be fair, Gemini 3 Flash still has real advantages. If your priority is raw speed, seamless multimodal handling (especially video and audio), real-time agentic loops and deep integration with Google’s ecosystem, it’s fantastic. And we can’t forget that Gemini 3 powers Nano Banana Pro for image generation (something Claude still doesn’t do).

But Claude pulls ahead on depth of reasoning, careful planning, professional-grade analysis, coding reliability, and sustained multi-step work. If the task demands getting it right rather than getting it fast, Opus 4.6 has the edge.

The features that make the difference

Alongside Opus 4.6, Anthropic rolled out some genuinely powerful upgrades that flew under the radar: agent teams in Claude Code that let multiple AI agents collaborate on tasks, adaptive thinking and new effort controls for developers, a smoother response style, major upgrades to Claude in Excel and a research preview of Claude in PowerPoint.

These tools make a real difference in productivity for people who actually use AI for work. And that’s the strategic difference. Google is still building for everyone. Anthropic is building for professionals.

Pricing, however, is the kicker. Google offers Gemini 3 Flash free to all users who have an account. To get access to Opus 4.6, users need to subscribe to at least Claude Pro, which costs $20 per month.

Bottom line

Google’s Gemini 3 is incredible. It’s partly why Sam Altman called for a “Code Red” at OpenAI. It’s just that Opus 4.6 is that much better.

If your goal is serious reasoning, complex work, reliable coding, deep analysis and long-running projects, Claude Opus 4.6 is now the best chatbot you can use, and the benchmark results show this is not just my opinion.

In the race for “best AI” in 2026, right now, that crown belongs to Claude.


Amanda Caswell
AI Editor

Amanda Caswell is an award-winning journalist, bestselling YA author, and one of today’s leading voices in AI and technology. A celebrated contributor to various news outlets, her sharp insights and relatable storytelling have earned her a loyal readership. Amanda’s work has been recognized with prestigious honors, including outstanding contribution to media.

Known for her ability to bring clarity to even the most complex topics, Amanda seamlessly blends innovation and creativity, inspiring readers to embrace the power of AI and emerging technologies. As a certified prompt engineer, she continues to push the boundaries of how humans and AI can work together.

Beyond her journalism career, Amanda is a long-distance runner and mom of three. She lives in New Jersey.
