
OpenAI releases GPT-4o multimodal model

OpenAI has presented GPT-4o, a new multimodal artificial intelligence model. According to the company, the technology is another step towards “a much more natural human-computer interaction.”

The “o” in the name stands for omni – GPT-4o is capable of accepting any combination of text, audio and images as input and outputting data in all three formats. The model is also able to recognize emotions, allows itself to be interrupted during speech, and can react as quickly as a person during a conversation.

According to the startup's CTO Mira Murati, the new algorithm provides “GPT-4 level” intelligence but has better capabilities across different modalities and environments.

“[…] For the last couple of years, we've been focused on increasing the intelligence of models. This is the first time we have taken a huge step forward when it comes to ease of use,” she noted.

During the presentation, OpenAI demonstrated GPT-4o in action. The algorithm translated live between English and Italian, helped a researcher solve a linear equation in real time on paper, and provided deep breathing recommendations to a lab manager.

Difference from predecessors

The previous “leading and most advanced” algorithm, GPT-4 Turbo, could analyze images and text to perform tasks such as extracting written text from pictures or describing their content. GPT-4o adds speech processing.

Because the new model is trained on all three data formats, input and output are processed by the same neural network. Its predecessors, GPT-3.5 and GPT-4, allowed users to ask questions by voice but first transcribed the audio into text. This stripped speech of intonation and emotion and made interaction slower.

Thanks to GPT-4o, using ChatGPT feels like talking to an assistant.

For example, a chatbot based on the new model can be interrupted while it is responding. According to OpenAI, the algorithm provides “real-time” responses and can even capture the nuances of a user's voice, generating voices in response “in a variety of emotional styles,” including singing.

Improved vision, language and speech

GPT-4o extends ChatGPT's vision capabilities. Given a photo or a desktop screen, the chatbot can now quickly answer related questions ranging from “what's going on in this code?” to “what brand of shirt is this person wearing?”

According to Murati, these features will be further developed in the future. While GPT-4o is capable of viewing a menu image in a foreign language and translating it, the model will later allow ChatGPT to, for example, “watch” a live sports game and explain its rules.

The laboratory said that the new algorithm is more multilingual – it can understand about 50 languages.

According to the company, in the OpenAI API and Microsoft's Azure OpenAI Service the new model is twice as fast, cheaper, and subject to higher rate limits than GPT-4 Turbo.

Voice support in the GPT-4o API is not yet available to all customers. Citing the risk of misuse, the company noted that it will first launch the feature to a “small group of trusted partners” in the coming weeks.

OpenAI will make the new model available to everyone, including free ChatGPT users, over the coming weeks. Holders of premium Plus and Team subscriptions will have access to it with “five times higher” message limits.

New web interface and application for ChatGPT

The lab announced the launch of an updated ChatGPT web user interface with a “more conversational” home screen and message layout.

OpenAI also introduced a desktop version of the chatbot for macOS, which paid users will have access to starting today. A Windows version will arrive later this year.

ChatGPT desktop application used in a coding task. Data: OpenAI.

In addition, free ChatGPT users will have access to the GPT Store, a library and tools for creating third-party AI chatbots. They will also get access to some previously paid ChatGPT options, such as the “memory” function.

Earlier, media reports claimed that OpenAI would present an AI-based search engine on May 13.

Source: Cryptocurrency
