I test AI for a living and Google’s Gemini Pro 1.5 is a true turning point

Google Gemini 1.5 logo
(Image credit: Google)

Google dropped a surprise on the AI world on Thursday with the release of Gemini Pro 1.5, a new version of its recently launched next-generation AI model Gemini Pro.

A company the size of Google announcing a new update isn’t exactly groundbreaking, but what is significant about Gemini Pro 1.5 is just how much better it is than version 1.0.

Gemini Pro 1.5 has a significantly larger context window than any other model on the market, up to 10 million tokens compared to, for example, 128,000 tokens for GPT-4. It is also now technically more powerful than Gemini Pro 1.0 and Gemini Ultra 1.0, which powers Gemini Advanced, the paid-for version of the Gemini chatbot.


What makes Gemini Pro 1.5 so different?

Google Gemini 1.5

(Image credit: Google)

Gemini Pro 1.5 has a staggering 10 million token context length. That is the amount of content it can store in its memory for a single chat or response. 

This is enough for hours of video or multiple books within a single conversation, and Google says it can find any piece of information within that window with a high level of accuracy.

Jeff Dean, Google DeepMind’s chief scientist, wrote on X that the model also comes with advanced multimodal capabilities across code, text, image, audio and video.

He wrote that this means you can “interact in sophisticated ways with entire books, very long document collections, codebases of hundreds of thousands of lines across hundreds of files, full movies, entire podcast series, and more."

In “needle-in-a-haystack” testing, where a specific piece of information is hidden somewhere in the data stored in the context window, the model was able to retrieve it with 99.7% accuracy even with 10 million tokens of data.

What are the use cases for such a large context window?


While promoting the new model, Google showed off its video analysis ability on Buster Keaton’s 45-minute silent movie “Sherlock Jr.” 

Scanned at a single frame per second, the movie produced an input of 648,000 tokens in total. Gemini could then answer questions about the movie such as "tell me about the piece of paper removed from the person's pocket" and give the exact time code.

Gemini was able to say exactly what was written on the piece of paper and the exact moment the paper is shown in full on screen. 
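The numbers Google reported are easy to sanity-check. A minimal sketch, where the per-frame token cost is an assumption inferred from the reported total rather than a published figure:

```python
# Back-of-the-envelope check of the token count from Google's
# "Sherlock Jr." demo: a ~45-minute film sampled at 1 frame per second.
minutes = 45
frames = minutes * 60            # 1 frame per second -> 2,700 frames
reported_total = 648_000         # tokens, per Google's demo
tokens_per_frame = reported_total / frames
print(frames, tokens_per_frame)  # 2700 frames at 240.0 tokens each
```

That implied ~240 tokens per frame also suggests why even a 10-million-token window fills up after a few hours of video.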

In another example, developers made a quick stick-figure sketch of a scene, not particularly well drawn, gave it to Gemini and asked for the timestamp of that particular scene. Gemini returned the exact timestamp, accurate to the second.

Other surprising benefits of a large context

Google demonstrated Gemini 1.5 finding moments in a video

(Image credit: Google)

One other aspect that hasn’t been widely reported is the potential for preserving, learning and even creating new languages. Jim Fan, a senior researcher and AI agent expert at Nvidia, pointed out the zero-shot ability of Gemini Pro 1.5 to understand linguistics in a surprising way.

He wrote on X that “v1.5 learns to translate from English to Kalamang purely in context, following a full linguistic manual at inference time.” 

Kalamang is a language spoken by about 200 people in New Guinea, and Gemini had no information on the language during its original training — so it had nothing to draw from.

For the test it was given 500 pages of linguistic documentation, a dictionary, and about 400 parallel sentences along with the context in which they apply. It used this material to learn the language well enough to translate words and phrases between English and Kalamang in both directions.
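The mechanism here is worth spelling out: nothing is fine-tuned, the reference material is simply concatenated into one giant prompt that fits inside the context window. A hypothetical sketch of that setup (the function name, file contents and prompt layout are illustrative assumptions, not Google’s actual test harness):

```python
# Hypothetical sketch of long-context in-context learning: grammar manual,
# dictionary and parallel sentences are concatenated into a single prompt.
def build_translation_prompt(grammar: str, dictionary: str,
                             parallel_pairs: list[tuple[str, str]],
                             query: str) -> str:
    examples = "\n".join(f"English: {en}\nKalamang: {kg}"
                         for en, kg in parallel_pairs)
    return (
        "You are given reference material for the Kalamang language.\n\n"
        f"GRAMMAR:\n{grammar}\n\n"
        f"DICTIONARY:\n{dictionary}\n\n"
        f"PARALLEL SENTENCES:\n{examples}\n\n"
        f"Translate into Kalamang: {query}\n"
    )

prompt = build_translation_prompt(
    grammar="(500 pages of grammar would go here)",
    dictionary="(dictionary entries would go here)",
    parallel_pairs=[("The man is walking.", "(Kalamang sentence)")],
    query="Where is the water?",
)
print(len(prompt))
```

With a 128,000-token window, 500 pages of documentation simply would not fit; the 10-million-token window is what makes this kind of zero-shot language learning possible at all.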

When will I be able to use Gemini Pro 1.5?

Gemini Pro 1.5 is already available to some enterprise customers using Vertex AI or Google Cloud’s Generative AI Studio. At some point it will come to the Gemini chatbot, but when it does the maximum context will likely be closer to 128,000 tokens, similar to ChatGPT Plus.

This is a game-changing moment for the AI sector, on a day that also saw OpenAI launch a video model and Meta find a way to use video to teach AI about the real world.

What we are seeing with these advanced multimodal models is the intersection of the digital and the real, where AI is gaining a deeper understanding of humanity and how WE see the world.

Ryan Morrison

As the former AI Editor for Tom's Guide, Ryan wielded his vast industry experience with a mix of skepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover.
When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing.