
ChatGPT with GPT-4o

Originally posted on tomsguide.

OpenAI outshone Apple during last night’s spring update livestream, both in the hype before the event and in the overwhelmingly positive reaction to the products the team announced. As CEO Sam Altman said: “It feels like magic.”

The biggest announcement was the GPT-4o model, which will power ChatGPT for both paid and free users. Unlike conventional large language models, this is an omnimodal model, capable of taking in anything from text to video and outputting speech, text and even 3D files.

I’ve covered a lot of product announcements over a 20+ year career and this is the most excited I’ve ever been to try a new product. If Altman is to be believed, this is just the beginning.

Why is GPT-4o such a big deal?

GPT-4o (or, the Omni model) brings a new way to interact with information. Instead of typing, you can just have a conversation or show it a video and get a voice response without any delay.

This response won’t have the slight monotone of other assistants or the faux inflections of the previous generation of ChatGPT Voice. It is a natural-sounding voice with laughter, emotion and inflections that react in real time to your conversation.

The full multimodal features with the ability to talk naturally using speech-to-speech are still being rolled out slowly, but even the chat version — conversing in text and pictures — is faster and more responsive than its predecessors.

Altman wrote in his blog: “Talking to a computer has never felt really natural for me; now it does. As we add (optional) personalization, access to your information, the ability to take actions on your behalf, and more, I can really see an exciting future where we are able to use computers to do much more than ever before.”

What might this future look like?

One day, and probably not as far away as many people think, this technology will power robots that work with us or serve us in our homes.

These will be robots we can converse with like a friend and ask to do complex tasks, and they will both understand and respond.

Somebody will fall in love with GPT-4o.

Even in the short term, as OpenAI rolls out iPad, iPhone and laptop apps for ChatGPT with voice and vision capabilities, we’ll see it take on the role of tutor, coding assistant, financial advisor and fitness coach, and do so without judgment.

What we’re witnessing — and other companies will catch up — is the dawn of a new era in human-computer interface technology.

Omni models don’t require the AI to first convert what you say to text, analyze the text and then convert that back to speech — they understand what we say natively by analyzing the audio, the inflections in our voice and even live video feeds.

The small black dot you talk to and that talks back is as big of a paradigm shift in accessing information as the first printing press, the typewriter, the personal computer, the internet or even the smartphone.

