I finally saw a live demo of ChatGPT-4o Voice — if anything it is underhyped


OpenAI unveiled its GPT-4o model during its Spring Update event earlier this month, and the addition of live voice functionality generated a lot of hype, including from me. I've finally seen a live, in-person demo, and if anything I think it was underhyped.

An hour before I was due to go on stage to moderate a panel on AI co-workers at VivaTech, a European technology conference in Paris, OpenAI’s head of developer experience Romain Huet demonstrated all the new functionality.

During the demo, Huet used the ChatGPT Desktop app to have the AI address the audience of more than 400 people. He even had it do so more enthusiastically, and in French. The accent sounded like an American speaking French, but he said, "We're working on making it more French."

It seems we'll have to wait a few months before we all get access to these new capabilities while OpenAI puts them through further security testing. When they arrive, though, they are going to change the way we interact with technology forever, especially as GPT-4o will also be coming to Windows Copilot.

ChatGPT Voice can also watch you

One of the most impressive moments came when Huet opened the camera module in the ChatGPT Voice section of the desktop app, a feature due to arrive in the next few months.

He showed it a sketch he'd drawn of the Eiffel Tower and the Arc de Triomphe, just a rough outline on a piece of paper. ChatGPT identified both landmarks from the sketch.

Huet then showed ChatGPT a map and asked how to get to the places in his sketch from our location at Porte de Versailles. It gave a detailed train route, complete with stops and changes.

He had planned to show the features on an iPhone using the ChatGPT app, but technical difficulties at the venue meant he had to use the laptop instead. That did mean, though, that he could give an ad-hoc coding demo with ChatGPT. He is the developer experience guy, after all.

Sharing his screen with the AI, he had ChatGPT view the code he was writing, identify what it did and suggest improvements. He could then show it the output and ask for ways to change the code to make it look or work differently, all in real time.

A demonstration of Sora and Voice Engine


OpenAI seems to be entering “product mode” at the moment. While it still describes itself as a research lab with a focus on building artificial general intelligence, it is also stepping up its product game. The ChatGPT Desktop app is close to becoming a vital productivity tool.


During the demo, Huet also showed a new Sora video, made for OpenAI's developer event in Paris the previous day, featuring a multi-shot tour of the city. Because a Sora video takes about 15 minutes to generate, this was the only part of the whole demo that was prepared in advance.

I could only watch this from backstage on a small screen, so I didn't get video, but all eyes in the green room turned to that screen as the demonstration played out.

He gave the Sora video to ChatGPT and had it summarize the contents and write a voice-over script for the video. This is where we got to see another hinted-at OpenAI product in action — Voice Engine. This has been kept for internal use only due to safety concerns.

Romain Huet of OpenAI at VivaTech — (Image credit: Future)

Huet recorded a 20-second sample of his voice in real time and had Voice Engine clone it, creating a perfect copy. This was then applied to the Sora video to create a promo video. It went further, though: at the click of a button, he could switch the language from English to French to Japanese.

Sora and Voice Engine aren't publicly available yet, as OpenAI says it is "working on ways to release it safely".

The potential for the creation of deep fakes and misleading content using these tools is very real so I understand the reticence, but similar tech already exists so hopefully it will be released soon.


Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on AI and technology speak for him than engage in this self-aggrandising exercise. As the former AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover.
When not begrudgingly penning his own bio (a task so disliked he outsourced it to an AI), Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing.