Google Bard with Gemini has just matched ChatGPT’s performance in a popular chatbot arena, placing second on the leaderboard, just behind GPT-4-Turbo, OpenAI’s most advanced model.

Powered by a newly updated version of the Gemini Pro artificial intelligence model, Bard has seen a marked increase in performance since the model’s release in December. My own test of Bard vs the free version of ChatGPT saw Bard come out on top.

The Large Model Systems Organization (LMSYS) Chatbot Arena pits the leading AI models against one another with humans judging and rating the output. It includes both closed-source models like GPT-4 or Gemini Pro, as well as open-source AI like Meta’s Llama 2.

This is the first time Bard has beaten the base version of GPT-4, which powers the premium version of both ChatGPT and Microsoft Copilot. Jeff Dean, Chief Scientist at Google DeepMind, wrote on X that the recent success was down to a new version of Gemini Pro called “scale”.

How does the chatbot arena work?

Like any battleground, the Chatbot Arena puts two models in the ring head-to-head. The difference is that you have no idea of the identity of the competitors.

You write a prompt, it gets sent to two anonymous models, and after both responses are shown you have to decide which of the two is better.

Only the underlying system knows which model won each round, and the collective human decisions are used to create the leaderboard. So far more than 200,000 votes have been cast.
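The article does not say how the votes are turned into rankings. As a rough illustration only, here is a minimal sketch that assumes an Elo-style rating, one common way to rank competitors from pairwise results. The function name, the `k` and `base_rating` values, and the sample votes are all hypothetical and are not taken from LMSYS.

```python
from collections import defaultdict


def elo_leaderboard(votes, k=32.0, base_rating=1000.0):
    """Rank models from pairwise votes using a simple Elo-style update.

    votes: iterable of (model_a, model_b, winner) tuples, where winner
    is "a", "b", or "tie". The k and base_rating defaults are assumptions.
    """
    ratings = defaultdict(lambda: base_rating)
    for model_a, model_b, winner in votes:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of model_a given the current rating gap.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        delta = k * (score_a - expected_a)
        # Whatever model_a gains, model_b loses (and vice versa).
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    # Highest rating first.
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical votes for illustration only, not real Arena data.
sample_votes = [
    ("model-x", "model-y", "a"),
    ("model-y", "model-z", "a"),
    ("model-x", "model-z", "a"),
    ("model-y", "model-x", "tie"),
]

leaderboard = elo_leaderboard(sample_votes)
for rank, (name, rating) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name}: {rating:.1f}")
```

Because each vote only shifts points between the two models involved, a model climbs the table by winning against highly rated opponents, which is why the voter never needs to know which models they are judging.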

LMSys Chatbot Arena leaderboard

| Rank | Model | Organization | License |
| --- | --- | --- | --- |
| 1 | GPT-4 Turbo | OpenAI | Proprietary |
| 2 | Bard (Gemini Pro) | Google | Proprietary |
| 3 | GPT-4-0314 | OpenAI | Proprietary |
| 4 | GPT-4-0613 | OpenAI | Proprietary |
| 5 | Mistral Medium | Mistral | Proprietary |
| 6 | Claude-1 | Anthropic | Proprietary |
| 7 | Claude-2.0 | Anthropic | Proprietary |
| 8 | Mixtral-8x7b | Mistral | Apache 2.0 |
| 9 | Gemini Pro (Dev API) | Google | Proprietary |
| 10 | Claude 2.1 | Anthropic | Proprietary |

How does Bard compare with the competition?

Gemini Pro on its own, accessed as an API rather than as part of Bard with live internet access, comes in at number 9, and an earlier version of Gemini Pro is at number 12 in the current leaderboard.

While Bard with Gemini Pro has come in strong at number two, three of the top five are versions of OpenAI's GPT-4 model, with GPT-4-Turbo taking the top spot.

Nine of the top ten models are proprietary, with French startup Mistral’s Mixtral-8x7b the only open-source large language model in the upper region of the table.

Anthropic's Claude models and Mistral’s models, including Mistral Medium, a proprietary version of the Mixtral model, are the only entries in the top ten that do not come from Google or OpenAI.

What is new with Google Bard?

Bard, powered by the Gemini Pro-scale model, debuts at the #2 position on the independent lmsys leaderboard. 🔥 Give it a try at https://t.co/m9D7JYUfls. Bard is much better & has many more capabilities since its debut in March, thanks to everyone on the Bard/Gemini teams! https://t.co/rOPTbOE4v8 (January 26, 2024)

According to Jeff Dean, Bard is now “powered by the Gemini Pro-scale model,” an update on the base version of Gemini Pro, and this change allowed it to enter at number two. This is still based on the mid-tier version of the Gemini family of LLMs.

Google is said to be releasing Gemini Ultra, its most advanced and natively multimodal model, at some point this year.

This will power a new premium version of its chatbot called Bard Advanced and, likely, this will significantly outperform even GPT-4-Turbo.

Dean wrote on X: “The race is heating up like never before! Super excited to see what's next for Bard + Gemini Ultra release.”