OpenAI announced its new AI model, GPT-4o (the "o" stands for "omni"), which promises more natural human-computer interaction and is available for free.
The new model accepts any combination of text, audio, and image inputs and can generate outputs across those same media.
OpenAI Develops ChatGPT With More Natural Interaction
The AI company introduced an upgraded voice mode powered by a single new model trained end-to-end across text, vision, and audio. Previous ChatGPT models were not capable of analyzing tone or handling multiple speakers at the same time.
"Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations," the company shared.
In a demo video, OpenAI showed a human-like conversation between a person and the chatbot. GPT-4o was able to analyze a real-world environment and describe what might unfold in it. Other sample materials showed how the new model generates outputs from a combination of text, image, and audio prompts.
OpenAI Gradually Rolls Out GPT-4o Across Different Tiers
OpenAI has started publicly rolling out GPT-4o's text and image inputs and text outputs. Audio outputs will follow soon, limited to a selection of preset voices to ensure they comply with the company's safety policies.
The company also shared that it is still working on the technical infrastructure, training, and other safety measures before releasing the remaining modalities. GPT-4o is available to both free-tier and Plus users.
Plus users will get up to 5x higher message limits. Meanwhile, the improved voice mode will debut in alpha for ChatGPT Plus in the coming weeks.
Developers can now access the new model in the API as a text and vision model. GPT-4o is twice as fast and half the price compared to GPT-4 Turbo.
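As a rough sketch, a developer could call GPT-4o with combined text and vision input through OpenAI's Python SDK as shown below. The model name gpt-4o comes from the announcement; the prompt text and image URL are placeholder examples, not part of the original article.

```python
# Minimal sketch of a text + vision request to GPT-4o via OpenAI's Python SDK.
# The prompt and image URL are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                # A single message can mix text parts and image parts.
                {"type": "text", "text": "Describe what is happening in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/scene.jpg"},
                },
            ],
        }
    ],
)

# Audio output is not yet exposed here; the response comes back as text.
print(response.choices[0].message.content)
```

Because the audio modalities have not yet been released in the API, the sketch above covers only the text and vision inputs and text outputs mentioned in the announcement.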