Google Introduces Gemini Omni, a Multimodal AI That Knows the World


Google announced its latest AI product, Gemini Omni, at its I/O conference on Tuesday. Unlike existing text-to-video products such as Veo, Omni can take in virtually any type of input and turn it into realistic video.

Built on the Gemini model architecture, Omni is a true multimodal input and output system: you can create videos from text, images and existing videos. At launch, Omni will output only video; image and text generation will arrive in a future update.

With Gemini at its core, Omni can process and interpret several types of input at once to produce a consistent, sophisticated final result. It builds on Google's existing products by integrating Gemini Intelligence.

The rise of AI-created video comes at a paradoxical time: companies such as Google are making remarkable advances with the technology, while social media feeds fill up with AI slop. Google calls Omni the "next big step" toward building AI that can model and simulate the real world. It's a world model with advanced reasoning, capable of generating videos grounded in the world as we know it, and its handling of physics helps make its output look realistic. Here's what's coming in Gemini Omni from Google I/O.

Powerful (and scary) editing capabilities

Alongside its video generation, Omni offers advanced video editing. You can feed a video you created with Omni back into the tool and make substantial changes with just a prompt, or incorporate additional media. You can even upload your own videos and change or swap out individual elements, a kind of editing that has essentially never been available before.

The ability to fully replace elements in someone's video could lead to some dark outcomes, making Omni's editing tools as alarming as they are impressive. This is a big deal, because Omni essentially lets you change how reality is perceived. Google says it has built in guardrails. Chief among them, every output from Omni automatically carries Google's SynthID watermark, marking it as generated or altered by AI.

Multiple access points

People will be able to try Gemini Omni in several ways. It's a prominent feature of the newly redesigned Gemini app, where you can apply built-in templates to media from your camera roll with a single click. You'll also be able to create a custom avatar that looks and sounds like you and add it to videos.

Starting Tuesday, Omni is also available to some paid subscribers in Google Flow and YouTube Shorts. It will roll out to developers and enterprise customers via APIs in the coming weeks, allowing for custom integrations.

Omni Flash and Omni Pro

Like most Gemini models, Omni will come in Flash and Pro versions. Omni Flash is available first; Omni Pro, a more powerful model that Google is still working on, will arrive later.
