Video games have become more immersive in recent years as developers try to make them feel more like the real world. Graphics have taken a huge leap forward, and now NVIDIA is working on NPCs as well, letting players converse with them like real people.
Virtual Conversations In-Game
NVIDIA just launched a custom AI model foundry service called Avatar Cloud Engine (ACE), which makes NPCs more interactive. Non-playable characters are no longer limited to scripted dialogue, as AI can generate responses to players on the fly.
Gamers can simply speak through their headset, and the NPC will answer based on the question or remark. It makes the exchange feel like talking to a stranger in real life, one who comes with an already-established backstory.
NVIDIA's demonstration, called "Kairos," shows the player character talking to an NPC named Jin in a Cyberpunk 2077-like ramen shop. Instead of giving a single canned reply per interaction, the non-playable character can elaborate further.
When asked how he was doing, Jin replied that things were "not so good." When the player followed up and asked why, Jin said he was worried about crime in the area and that his ramen shop had been caught in the crossfire.
That said, it's not quite as realistic as you might expect. The generative AI interaction with the NPC is impressive, but the character's delivery sounds more robotic than human, with a flat tone and little emotion.
Perhaps that is simply how the character is written, but for an NPC whose shop was just "caught in the crossfire," it would feel more realistic if he showed some distress or anger, both in his facial expressions and in the tone of his voice.
The Technology Behind It
The demonstration was built in partnership with Convai to showcase the technology, which can run either in the cloud or locally, as reported by Engadget. NVIDIA NeMo is used to build and customize the large language models that give each NPC a backstory.
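To make the backstory idea concrete, here is a minimal sketch of the kind of character definition that might be folded into a language model prompt. The field names and the build_prompt helper are purely illustrative assumptions, not NeMo's or Convai's actual schema or API.

```python
# Hypothetical character definition -- the structure and field names are
# illustrative assumptions, not NVIDIA's or Convai's actual format.
jin_backstory = {
    "name": "Jin",
    "role": "owner of a small ramen shop in a crime-ridden district",
    "mood": "worried",
    "facts": [
        "His shop was recently caught in the crossfire of a shootout.",
        "He has run the shop for over a decade.",
    ],
}

def build_prompt(character: dict, player_line: str) -> str:
    """Fold the backstory and the player's spoken line into a single
    prompt for a language model (illustrative only)."""
    facts = " ".join(character["facts"])
    return (
        f"You are {character['name']}, {character['role']}. "
        f"Current mood: {character['mood']}. Background: {facts}\n"
        f'Player says: "{player_line}"\n'
        f"{character['name']} replies:"
    )

print(build_prompt(jin_backstory, "How's business, Jin?"))
```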
Other technologies are also used to make the interaction as human-like as possible. For the NPC to "understand" you and talk back, ACE also uses NVIDIA Riva for speech recognition and text-to-speech, while NVIDIA's Omniverse Audio2Face matches the NPC's facial expressions to the speech.
Audio2Face uses Omniverse connectors for Unreal Engine 5, which allows developers to add facial animation directly to MetaHuman characters. And because Unreal Engine 5 can be demanding on hardware, the models are optimized for latency as well.
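Put together, the flow resembles a loop of speech recognition, text generation, speech synthesis, and facial animation. The sketch below is a self-contained stand-in for that loop: every function is a placeholder for the real components described above (Riva, a NeMo-built model, Audio2Face), and the names, signatures, and behavior are assumptions for illustration, not the actual ACE API.

```python
# Minimal sketch of one conversational turn. All components are placeholders;
# none of these names come from NVIDIA's SDKs.

def speech_to_text(audio: bytes) -> str:
    # Placeholder for speech recognition (Riva ASR in the real pipeline).
    return "How's business, Jin?"

def generate_reply(prompt: str) -> str:
    # Placeholder for the backstory-conditioned language model (NeMo).
    return "Not so good. The crime around here keeps customers away."

def text_to_speech(text: str) -> bytes:
    # Placeholder for speech synthesis (Riva TTS).
    return text.encode("utf-8")

def audio_to_face(audio: bytes) -> list[float]:
    # Placeholder for audio-driven facial animation (Audio2Face),
    # e.g. one weight per blendshape on a MetaHuman rig.
    return [0.0] * 52

def converse_once(mic_audio: bytes, character_prompt: str) -> None:
    """One NPC turn: recognize the player, reply, speak, and animate."""
    player_line = speech_to_text(mic_audio)
    reply_text = generate_reply(character_prompt + player_line)
    reply_audio = text_to_speech(reply_text)
    face_curves = audio_to_face(reply_audio)
    # Hand the audio and animation curves to the game engine here.
    print(reply_text, len(reply_audio), len(face_curves))

converse_once(b"<mic audio>", "You are Jin, a worried ramen shop owner. ")
```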
Audio2Face is already being adopted by developers. GSC Game World will use it in its upcoming game, S.T.A.L.K.E.R. 2: Heart of Chornobyl, while Charisma.ai, which creates AI-driven virtual characters, will use Audio2Face to add animation to its conversation engine.
With these technologies, players can engage in richer in-game interactions and uncover the backstories of individual NPCs. The tech could even be woven into quests or missions in open-world RPGs.