Google flashes everyone — new Gemini Flash 1.5 takes on GPT-4o

(Image credit: Google)

Google has launched a new member of the Gemini family of artificial intelligence models. Sitting between the on-device Nano and the cloud-based Pro, Gemini Flash is designed for chat, for complex tasks that require a fast response, and for handling images, video and speech.

Unveiled at the annual Google I/O developer event, Gemini Flash 1.5 is a natively multimodal model. Like OpenAI’s recently unveiled GPT-4o, it was built for speed, which makes it useful for real-time conversations.

The new model is currently available globally for developers to use in their own applications, so we could see a number of third-party live chat apps built using Gemini Flash 1.5 soon.

We also saw an upgrade to Gemini Pro 1.5, the model first released earlier this year, along with news that it will now power the Gemini Advanced premium chatbot.

What makes Gemini Flash 1.5 different?

Gemini Flash 1.5 sits just above Nano and just below Pro in the size hierarchy. What sets it apart, not just from its siblings but from other AI models, is its combination of speed and agility.

As well as being fast and impressive in its ability to understand text, images, video and speech, Flash 1.5 is cheap, at least compared with Pro, which is 20 times more expensive.

“We know from user feedback that some applications need lower latency and a lower cost to serve,” said Google DeepMind CEO Demis Hassabis. “This inspired us to keep innovating,” he added, unveiling Flash as a “model that’s lighter-weight than 1.5 Pro, and designed to be fast and efficient to serve at scale.”

In terms of speed, at least, a good comparison is OpenAI’s recently announced GPT-4o, which is very fast, natively multimodal and designed for real-time interaction. That said, Gemini Flash 1.5 seems to be the less capable of the two when it comes to reasoning.

What about the massive context window?

(Image: Gemini 1.5 Flash tokens. Credit: Google)

Like other Gemini family models, Flash 1.5 comes with a massive one-million-token context window and the promise of actually being able to use it in full. In comparison, GPT-4o has a 128,000-token context window and Claude 3 has a 200,000-token one.

A large context window matters because it lets the model hold a massive amount of information in its memory within a single conversation. This is vital when analyzing non-text content: if an image is worth 1,000 words, a video is worth far more.

Flash 1.5 was also trained by its big brother, Gemini Pro 1.5. Hassabis said this was done “through a process called ‘distillation,’ where the most essential knowledge and skills from a larger model are transferred to a smaller, more efficient model.”

“1.5 Flash excels at summarization, chat applications, image and video captioning, data extraction from long documents and tables, and more,” as a result of this process, he said.
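Google has not published how it distilled Flash from Pro. As a rough illustration of the general idea only, the classic knowledge-distillation recipe (Hinton et al., 2015) trains a small "student" to match the softened output probabilities of a larger "teacher". The Python sketch below assumes that textbook formulation; the function names, temperature, learning rate and toy logits are all hypothetical, and for simplicity it nudges the student's logits directly rather than the weights of a real model.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2 (textbook form)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return kl * temperature ** 2


def distill_step(student_logits, teacher_logits, temperature=2.0, lr=0.5):
    """One gradient-descent step moving the student's logits toward the teacher's.

    The gradient of distillation_loss with respect to the student logits is
    T * (p_student - p_teacher). A real system would update model weights instead.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return [
        z - lr * temperature * (ps - pt)
        for z, ps, pt in zip(student_logits, p_student, p_teacher)
    ]


# Toy example: hypothetical logits over three classes for a single input.
teacher = [4.0, 1.0, 0.5]  # stands in for the large "teacher" model's output
student = [0.5, 1.0, 4.0]  # an untrained "student" that disagrees with it

initial_loss = distillation_loss(student, teacher)
for _ in range(200):
    student = distill_step(student, teacher)
final_loss = distillation_loss(student, teacher)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.6f}")
```

The temperature softens the teacher's probabilities so the student learns not just the top answer but how the teacher ranks the alternatives, which is one commonly cited reason a distilled model can punch above its size.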

As these models, including the faster but smaller ones like Flash, gain the ability to understand more than just text, that increased context window becomes even more important.


Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio (a task so disliked he outsourced it to an AI), Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?