Sora 2 was missing from GPT-5 — but it could be the biggest jump in AI video yet

Shutterstock Sora image
(Image credit: Shutterstock)

OpenAI is set to release a new version of its flagship AI video model Sora sometime this quarter. While revolutionary at launch, Sora has since lost ground to competitors, with Google's Veo 3 now setting the gold standard for AI video generation.

I expect Sora 2 to arrive in the coming weeks, or months at most, given the rapid release of GPT-5. Like GPT-4o, GPT-5 is natively multimodal, handling any input or output type (including video) while performing complex reasoning tasks similar to the "o" series models.

Sora remains a solid platform. Its Storyboard feature breaks new ground, and ChatGPT Pro subscribers can generate clips up to 20 seconds long. But the underlying model is showing its age. The output still suffers from motion control issues, lacks sound generation, and struggles with complex physics rendering — unlike Veo 3, Kling 2.1, or MiniMax 2.

Even in the social video space, OpenAI now faces competition from virtually every AI platform, including Meta, Grok, and Midjourney. Yet OpenAI remains the world's largest AI lab with significant resources and — despite Meta's recent talent raids — a formidable engineering team. Don't count them out just yet.

What OpenAI Needs to Make Sora Competitive

To compete with Google's video model or the emerging Chinese competitors, OpenAI must leverage its multimodal capabilities while expanding Sora's feature set. Tighter ChatGPT integration wouldn't hurt either. Here are five essential improvements for Sora 2:

1. Native sound generation is non-negotiable

Google Veo 3 lasagne video (YouTube)

If OpenAI wants to compete with Veo 3, Sora 2 must handle both video and audio natively. Any model without sound generation starts at a disadvantage.

Currently, Sora produces only silent clips — a significant weakness when Veo 3 generates sound effects, ambient noise, and even dialogue as core features. This isn't just about tacking on audio as an afterthought; it's about true integration.

Veo 3 can produce lip-synced character speech in multiple languages. Sora 2 needs that same built-in audio capability — from atmospheric soundscapes to spoken dialogue.

If OpenAI delivers fully multimodal generation (video + audio) while maintaining 20-second clips or longer, it won't just catch Veo 3 — it could leap ahead entirely.

2. Physics simulation must improve dramatically

Google Veo 2

(Image credit: Google Veo 2/AI)

Visual realism extends beyond resolution — it's fundamentally about physics. Current Sora outputs often display unnatural motion or distorted physics: water that defies gravity, objects that morph unpredictably, or movement that feels fundamentally wrong.

Google clearly prioritized real-world physics with Veo 3, and the results speak for themselves. Its videos excel at simulating realistic physics and dynamic motion with minimal glitches. Meanwhile, Sora's older model produces jittery movement and inconsistent object interactions that shatter immersion.

For Sora 2 to compete, its model must better understand real-world behavior — from natural human gaits to bouncing balls, from smoke dynamics to fluid mechanics. OpenAI essentially needs to integrate a physics engine into Sora. Believable motion and interactions (no more warping limbs or melting backgrounds) would close a critical gap with competitors.

3. Conversational prompting should be the standard

Veo 3 video

(Image credit: Veo 3 / Ryan Morrison)

OpenAI's ace in the hole? ChatGPT has already trained millions to communicate conversationally with AI. Sora 2 should capitalize on this by making video creation feel like dialogue, not programming.

Rather than demanding perfect prompts or complex interface navigation, the system should support natural back-and-forth refinement. Google is already moving in this direction: its Flow tool uses Gemini to enable intuitive, everyday-language prompting.

Runway does this brilliantly with its chat mode, and now with the new Aleph tool, which lets Gen-4 expertly refine any single element of a shot. Luma's Dream Machine was built around this concept from the ground up.

Picture this workflow: Type "medieval knight on a mountain," receive a draft video, then simply say "make it sunrise and add a dragon" — and Sora instantly updates the scene. This conversational approach would lower barriers for newcomers while accelerating workflows for professionals.

The technology exists. ChatGPT already interprets follow-up requests and adjusts outputs dynamically (as demonstrated with GPT-4o's native image generation). Sora 2, fully integrated with ChatGPT, should let us talk our way to great videos. That user experience would outshine the technical prompting most competitors still require.

It would also enable an image-first workflow: generate a still natively in ChatGPT, then animate it with Sora, similar to the way Google pairs Veo 3 with Gemini, or the new Grok Imagine feature.

4. Character consistency and customization are essential

Leonardo and Sora Logo T-Shirt AI images

(Image credit: Tom's Guide)

Character and scene consistency represents another crucial improvement area. Currently, generating two clips of "a girl in a red dress" might produce two completely different people. Sora's outputs drift in style and details between generations, making coherent multi-scene stories or recurring characters nearly impossible.

Sora 2 must enable consistent characters, objects, and art styles across longer videos or clip series. Competitors already deliver this — Kling 2.1 boasts "consistent characters and cinematic lighting straight from text prompts." Google's Flow goes further, allowing custom assets (character images, specific art styles) as "ingredients" across multiple scenes.

OpenAI should provide similar capabilities: reference image uploads, style fine-tuning, or character persistence across scenes. If Sora 2 can maintain a consistent character appearance throughout a video, creators can actually tell stories instead of producing disconnected clips, especially if native audio works across clips of 20 seconds or longer.

Consistency and customization work together — whether you're an artist maintaining a signature style or a filmmaker needing character continuity, Sora 2 should deliver that control.

5. Deep ChatGPT integration and universal access

ChatGPT-5 on phone

(Image credit: Shutterstock)

Finally, OpenAI should maximize its ecosystem advantage by deeply integrating Sora 2 into ChatGPT while ensuring broad accessibility. Google's Veo connects to a broader toolkit (Gemini integration, API access, the Flow app), and Meta will inevitably embed AI video throughout its products.

OpenAI can differentiate by making Sora 2 a seamless ChatGPT feature, which would instantly give millions of ChatGPT users an AI video studio without switching apps. OpenAI could follow Google's lead with a low daily video limit and a premium plan for unlimited access, as it does now with ChatGPT Pro and Sora.

Mobile optimization is crucial. Today's creators shoot, edit, and post entirely from phones. If Sora 2 works within ChatGPT's mobile app (or a dedicated Sora app) with quick generation capabilities, it could capture the TikTok and Reels creator market. Imagine telling your phone, "Hey ChatGPT, make a 15-second video of me as a cartoon astronaut landing on Mars," and receiving share-ready content instantly.

By making Sora 2 ubiquitous — through ChatGPT, developer APIs, and mobile platforms — OpenAI can rapidly build its user base while gathering essential improvement feedback.

Platforms like Leonardo, Freepik and Higgsfield are already heavily using Google’s Veo 3 and Hailuo’s MiniMax 2 because they’re impressive, fast and available via API — OpenAI is falling behind in the creative AI space by not updating Sora.

The bottom line

OpenAI has a genuine opportunity to reclaim leadership by learning from competitors' successes. Google's Veo 3 currently sets the benchmark with native audio, realistic physics, and strong prompt adherence, while emerging models like Kling 2.1 and MiniMax 2 continue pushing boundaries.

Runway races ahead with new refinements to its Gen-4 model, which matches Sora on physics quality but offers more features. Others like Pika are focusing on the creator market, further squeezing OpenAI out of this valuable space.

Sora 2 can't be just an incremental upgrade — it needs to amaze.

The encouraging news? OpenAI has the foundational elements: a powerful language model, a first-generation video model to build upon, and ChatGPT's massive user base. If OpenAI delivers native sound generation, realistic physics, conversational ease-of-use, character consistency, and seamless product integration, Sora 2 could very well beat Veo 3, Kling, and the entire field at their own game.

When it all comes together, don't be surprised if the next viral AI video in your feed was created with Sora 2.


Ryan Morrison
AI Editor

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on AI and technology speak for him than engage in this self-aggrandising exercise. As the former AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover.
When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing.
