ChatGPT finally has competition — Google Bard with Gemini just matched it with a huge upgrade
The AI war of 2024 is on
Google Bard with Gemini has just matched ChatGPT’s performance in a popular chatbot arena, taking second place on the leaderboard behind GPT-4-Turbo, OpenAI’s most advanced model.
Powered by a newly updated version of the Gemini Pro artificial intelligence model, Bard has improved markedly since the model’s release in December. In my own test of Bard against the free version of ChatGPT, Bard came out on top.
The Large Model Systems Organization (LMSYS) Chatbot Arena pits the leading AI models against one another, with humans judging and rating the output. It includes both closed-source models like GPT-4 and Gemini Pro and open-source AI like Meta’s Llama 2.
This is the first time Bard has beaten the base version of GPT-4, which powers the premium versions of both ChatGPT and Microsoft Copilot. Jeff Dean, Chief Scientist at Google DeepMind, wrote on X that the recent success was down to a new version of Gemini Pro called “scale”.
How does the chatbot arena work?
Like any battleground, the Chatbot Arena puts two models in the ring head-to-head; the difference is that you have no idea who the competitors are.
You write a prompt, it is sent to two anonymous models, and once both responses are shown you decide which of the two is better.
Only the underlying system knows which model won each round, and the collective human votes are used to build the leaderboard. So far, more than 200,000 votes have been cast.
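To illustrate how thousands of head-to-head votes can be turned into a ranked table, here is a minimal sketch of an Elo-style rating update in Python. LMSYS reports Elo-style scores for the arena, but this is not its exact pipeline: the starting rating (1000), the K-factor (4) and the 400-point scale are assumptions chosen for illustration, and the model names in the example are hypothetical.

```python
from collections import defaultdict


def elo_ratings(battles, k=4.0, base=1000.0, scale=400.0):
    """Turn (model_a, model_b, winner) votes into Elo-style ratings.

    winner is "a", "b" or "tie". Parameters are illustrative assumptions,
    not the values LMSYS necessarily uses.
    """
    ratings = defaultdict(lambda: base)  # every model starts at the base rating
    for model_a, model_b, winner in battles:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of model_a under the logistic Elo model
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / scale))
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        # Zero-sum update: whatever model_a gains, model_b loses
        ratings[model_a] = ra + k * (score_a - expected_a)
        ratings[model_b] = rb + k * ((1.0 - score_a) - (1.0 - expected_a))
    return dict(ratings)


def leaderboard(ratings):
    """Sort models from highest to lowest rating."""
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical anonymous votes: (first model, second model, which one won)
    votes = [
        ("model-x", "model-y", "a"),
        ("model-y", "model-z", "a"),
        ("model-x", "model-z", "a"),
        ("model-x", "model-y", "tie"),
    ]
    for rank, (name, rating) in enumerate(leaderboard(elo_ratings(votes)), start=1):
        print(rank, name, round(rating, 1))
```

Because each vote only nudges the two ratings involved, the order of the table can shift as new votes arrive, which is one reason leaderboard positions move over time.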
| Rank | Model | Organization | License |
|---|---|---|---|
| 1 | GPT-4 Turbo | OpenAI | Proprietary |
| 2 | Bard (Gemini Pro) | Google | Proprietary |
| 3 | GPT-4-0314 | OpenAI | Proprietary |
| 4 | GPT-4-0613 | OpenAI | Proprietary |
| 5 | Mistral Medium | Mistral | Proprietary |
| 6 | Claude-1 | Anthropic | Proprietary |
| 7 | Claude-2.0 | Anthropic | Proprietary |
| 8 | Mixtral-8x7b | Mistral | Apache 2.0 |
| 9 | Gemini Pro (Dev API) | Google | Proprietary |
| 10 | Claude 2.1 | Anthropic | Proprietary |
How does Bard compare with the competition?
Gemini Pro on its own, accessed through the API rather than through Bard with live internet access, comes in at number 9, and an earlier version of Gemini Pro sits at number 12 on the current leaderboard.
While Bard with Gemini Pro has come in strong at number two, three of the top five are versions of OpenAI's GPT-4 model, with GPT-4-Turbo taking the top spot.
Nine of the top ten models are proprietary, with French startup Mistral’s Mixtral-8x7b the only open-source large language model in the upper part of the table.
Apart from Mixtral-8x7b, Anthropic's Claude models and Mistral’s Mistral Medium, a proprietary version of the Mixtral model, are the only non-Google, non-OpenAI models in the top ten.
What is new with Google Bard?
“Bard, powered by the Gemini Pro-scale model, debuts at the #2 position on the independent lmsys leaderboard. 🔥Give it a try at https://t.co/m9D7JYUfls. Bard is much better & has many more capabilities since its debut in March, thanks to everyone on the Bard/Gemini teams!” (Jeff Dean on X, January 26, 2024)
According to Jeff Dean, Bard is now “powered by the Gemini Pro-scale model,” an update to the base version of Gemini Pro, and this change allowed it to enter at number two. It is still based on the mid-tier version of the Gemini family of LLMs.
Google is said to be releasing Gemini Ultra, its most advanced and natively multimodal model, at some point this year.
It will power a new premium version of the chatbot called Bard Advanced, and it will likely outperform even GPT-4-Turbo by a significant margin.
Dean wrote on X: “The race is heating up like never before! Super excited to see what's next for Bard + Gemini Ultra release.”
Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?
stevenstarr: I tested Google's Bard with Gemini Pro and asked it to write a Python tutorial for the Django framework. Unfortunately, it failed to generate anything beyond a few brief paragraphs. Moreover, the information it presented was either incomplete or outright incorrect. ChatGPT 3.5, on the other hand, faces no such challenges; it handles this task effortlessly, producing over 20 pages of accurate content with minimal errors, typically confined to sporadic and often unnecessary steps.