Meta Showcases 'Seamless' AI Translator for Expressive Communication

Meta has unveiled version 2 of its multimodal AI translation model, offering seamless and expressive cross-language communication and bringing the idea of a Universal Human Translator closer to reality.


The Seamless Communication suite was released publicly in time for the 10-year anniversary of Meta's Fundamental AI Research (FAIR) team, along with research papers and test data.

Meta's update to SeamlessM4T adds translation models capable of preserving expression across languages and holding "real-time conversations."

The latest improvements allow the AI to match the speaker's pitch, volume, emotional tone, speech rate, and pauses, no matter the language.

Meta said the AI needs only around two seconds of latency to respond, replicating the flow of real-life conversation.

Meta first launched SeamlessM4T in late August as a "foundational multilingual and multitask model that seamlessly translates and transcribes across speech and text."
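
For readers who want to try the model themselves, the v2 checkpoint is publicly available. Below is a minimal, illustrative sketch, not Meta's official example, assuming the Hugging Face transformers integration (version 4.37 or later) and the facebook/seamless-m4t-v2-large checkpoint.

    # Minimal sketch (assumes transformers >= 4.37 and the public
    # facebook/seamless-m4t-v2-large checkpoint; not Meta's official demo code).
    from transformers import AutoProcessor, SeamlessM4Tv2Model

    processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
    model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

    # Text-to-speech translation: English text in, French speech out.
    text_inputs = processor(text="Hello, how are you?", src_lang="eng", return_tensors="pt")
    audio = model.generate(**text_inputs, tgt_lang="fra")[0].cpu().numpy().squeeze()

    # Text-to-text translation: skip speech generation to get tokens instead.
    tokens = model.generate(**text_inputs, tgt_lang="fra", generate_speech=False)
    print(processor.decode(tokens[0].tolist()[0], skip_special_tokens=True))

The same model object handles speech-to-text, speech-to-speech, text-to-speech, and text-to-text tasks, which is what makes it "multitask" in Meta's framing.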

Earlier versions reportedly lacked the expressive signals essential to conveying emotion and intent to users.

Meta's AI Communication Mission

Seamless Communication is just the latest of Meta's advances in AI communication models, building on its past attempts to replicate human speech in a bid to attract a wider audience.

September marked the public launch of several flagship AI projects, including AI stickers and a conversational assistant using the likenesses of popular celebrities.

Meta AI is an interactive assistant that people can access in WhatsApp, Messenger, and Instagram, and soon on Ray-Ban Meta smart glasses and the Quest 3.

The AI uses the faces of "cultural icons and influencers" like Snoop Dogg, Tom Brady, and Kendall Jenner to communicate with users.

Meta AI Model Improvements

Meta's AI communicators are not the only AI products that received an overhaul.

The latest Ego-Exo4D can now provide first-person and external views simultaneously via surrounding cameras accessible to the AI.

Audiobox, the successor to Voicebox released earlier this year, will also advance AI audio generation to help with editing, sampling, and styling.

Custom audio can now be generated via prompts, including samples like "a running river and birds chirping" and "a young woman speaks with a high pitch and fast pace."
