Google drops new Gemini model and it goes straight to the top of the LLM leaderboard
It's only an experiment at this point
Google is constantly updating Gemini, releasing new versions of its AI model family every few weeks. The latest is so good it went straight to the top of the LMArena Chatbot Arena leaderboard, toppling the latest version of OpenAI's GPT-4o.
Update: Gemini under fire after telling user to die.
Previously known as the LMSys arena, it is a platform that lets AI labs pit their best models against one another in a blind head-to-head. The users vote but don't know which model is which until after they've voted.
The new model from Google DeepMind has the catchy name Gemini-Exp-1114 and has matched the latest version of GPT-4o and exceeded the capabilities of the o1-preview reasoning model from OpenAI.
The top 5 models in the arena are all versions of OpenAI or Google models. The first model on the leaderboard not made by either of those companies is xAI's Grok 2.
The success of this new model comes as Google finally releases a Gemini app for iPhone, which beat the ChatGPT app in our Gemini vs. ChatGPT 7-round face-off.
How well does the new model work?
Massive News from Chatbot Arena🔥 @GoogleDeepMind's latest Gemini (Exp 1114), tested with 6K+ community votes over the past week, now ranks joint #1 overall with an impressive 40+ score leap — matching 4o-latest in and surpassing o1-preview! It also claims #1 on Vision… https://t.co/AgfOk9WHNZ (November 14, 2024)
The latest Gemini model seems to perform particularly well at math and vision tasks, which makes sense as they are areas in which all Gemini models excel.
Gemini-Exp-1114 isn't currently available in the Gemini app or website. You can only access it by signing up for a free Google AI Studio account (the platform aimed at developers wanting to try new ideas).
I'm also not sure whether this is a version of Gemini 1.5 or whether it's an early insight into Gemini 2, expected next month. If it is the latter, then the improvement over the previous generation might not be as extreme as some expected.
However, it is doing well in technical and creative areas according to benchmarks. This would tie in with the idea that it's going to be useful for reasoning and managing agents. It ranks first in math, solving hard problems, creative writing and vision.
Unlike other benchmarks, the Chatbot Arena is based on human perceptions of performance and output quality, rather than rigid testing against data.
Whether this is just a new version of Gemini 1.5 Pro or an early insight into the capabilities of Gemini 2, it's going to be an interesting few months in AI land.

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on AI and technology speak for him than engage in this self-aggrandising exercise. As the former AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover.
When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing.










