Gemini Omni Flash can create and edit videos with your voice and it feels like the future of multimodal AI

Google has already thrown its hat into the AI image generation and editing space with its Nano Banana AI image generator powered by Gemini.

Nano Banana, now in its second iteration, has helped millions of users create professional-grade images from text descriptions and image prompts, and refine those creations with reference images that guide the process. Now Google has released another impressive piece of AI-assisted visual generation tech, one that promises to create videos from all manner of inputs, including text, images, audio and even other videos.

Generating videos through a variety of inputs and editing them through chats

Gemini Omni Flash is a multimodal AI video generation tool that promises to create videos from spoken prompts that reference images, text or audio, or from input references alone. You can use your voice to describe the visual language you want for your creation, or you can supply input references such as images of characters, scenes or drawings to apply the styles, motion or effects you want in your video clip.

Google positions Omni Flash as a highly intelligent AI video creation tool that can build incredibly lifelike scenes and independently work out what happens next in each clip it generates. Its understanding of gravity, kinetic energy and fluid dynamics helps it produce more realistic-looking scenes. Omni Flash also draws on Gemini's understanding of language, imagery and meaning to create short or lengthy explainers from your voice prompts.

On the video editing front, Omni Flash can listen to your spoken instructions and build on each one to shape your newly created video however you see fit. You can ask the AI video generator to change specific aspects of your video or overhaul it entirely. You can also tell Omni Flash to edit a video you recently shot by altering what's actually happening in it, adding new objects or characters, or transforming a specific moment into something else entirely.

Google is already rolling out Omni Flash, the first member of the Omni family, to the Gemini app, Google Flow and YouTube Shorts. As a starting point, you can generate videos with your voice using Google's AI “Avatars,” digital versions of yourself that use your voice to sound just like you.

Bottom line

Gemini Omni Flash's multimodal AI capabilities sound impressive, especially since you can take full advantage of its video generation features simply by voicing your commands. Building worlds, swapping human and animal characters in and out, switching up visual styles, altering specific details and more add up to one of the most powerful AI video generation tools we've ever seen.


Elton Jones covers AI for Tom's Guide and tests all the latest models, from ChatGPT to Gemini to Claude, to see which tools perform best and how they can improve everyday productivity.

He is also an experienced tech writer who has covered video games, mobile devices, headsets, and now artificial intelligence for over a decade. Since 2011, his work has appeared in publications including The Christian Post, Complex, TechRadar, Heavy, and ONE37pm, with a focus on clear, practical analysis.

Today, Elton focuses on making AI more accessible by breaking down complex topics into useful, easy-to-understand insights for a wide range of readers.