The next ChatGPT could have a face and a voice

D-ID AI chatbot virtual avatars
(Image credit: D-ID)

ChatGPT might have taken the internet by storm, but it’s still very limited in how you interact with it. The chatbots of the future may not be so limited. In fact, they may even have human-like avatars you can hold a spoken conversation with, rather than forcing you to type and read messages.

I spoke to one company, Israeli firm D-ID, about this possibility over at Mobile World Congress in Barcelona. D-ID is all about creating digital people, of a sort, and right now that means adding a more human connection to AI chatbots, something it hopes to accomplish with the launch of its new API.

That API offers real-time streaming capabilities, letting you speak to the chatbot as you would a regular human. The chatbot itself can use text-to-video technology to give the impression that its digital avatar is actually speaking in a human voice. Of course, you can choose to type instead, if you prefer that to talking to a machine.

Because it’s an API, D-ID’s system can be integrated into other apps and services. The general idea is for businesses to build and offer virtual assistants that people can connect with on a more personal level, which is where the facial aspect comes into play.

According to D-ID CEO and co-founder Gil Perry, humans aren’t wired for conversations that only rely on text or audio. Instead, having a human face (or at least a representation of it) makes everything more engaging and natural — and in multiple languages to boot.

Plus, from a business perspective, having an AI person handle tasks for you saves you the cost of hiring actual people, whether that’s conversing with customers, training employees, or anything else that AIs can handle in their current state.

Right now people tend to be pretty skeptical and wary about AI, and Perry suggested the faceless nature of today’s chatbots might be part of the reason. Adding a human face could help people feel more comfortable conversing with a chatbot, even if they know there isn’t a human being at the other end of the call.

I saw a demo of various D-ID chatbots at MWC, including the new integrated streaming capabilities. It was certainly interesting to see the different kinds of avatars in action, but even the most advanced had a bit of an uncanny valley effect to them.

That was particularly true of the speaking animations. Not everything was quite in sync, and at times it looked more like random mouth flapping than an avatar trying to vocalize sound, which it obviously isn’t actually doing.

I also noticed a few obvious delays in the speech recognition and writing as the bot was interacting with people. But that doesn’t change the fact you could speak to the bot, have it understand what you’re saying and offer up a response without you having to type or read a single thing.

The only question is if and when this kind of experience will be available to the public. D-ID’s goal is to offer its API to businesses and enterprise users. But, as we’ve seen with ChatGPT’s recent explosion in popularity, there’s a good chance we may see this kind of tech show up in a more public arena, particularly given the number of AI-centric companies that seemed to be on the ground at MWC and the number of ChatGPT rivals that have been popping up in recent weeks.

Tom Pritchard
Automotive Editor

Tom is Tom's Guide's Automotive Editor, which means he can usually be found knee-deep in stats on the latest and best electric cars, or checking out some sort of driving gadget. It's a long way from his days as editor of Gizmodo UK, when pretty much everything was on the table. He’s usually found trying to squeeze another giant Lego set onto the shelf, draining very large cups of coffee, or complaining that Ikea won’t let him buy the stuff he really needs online.