A new AI model Meta released this week could change how machines understand the physical world, opening the door to smarter robots and more, the company said.
The new open-source model, called Video Joint Embedding Predictive Architecture 2, or V-JEPA 2, is designed to help artificial intelligence understand things like gravity and object permanence, Meta said.
"By sharing this work, we aim to give researchers and developers access to the best models and benchmarks to help accelerate research and progress," the company said in a blog post, "ultimately leading to better and more capable AI systems that will help enhance people's lives."
Current models that let AI interact with the physical world typically rely on labeled data or on mimicking video of reality. This approach instead emphasizes the underlying logic of the physical world, such as how objects move and interact, which could allow AI to grasp concepts like the fact that a ball rolling off a table will fall.
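For a rough sense of how that "predict the physics, not the pixels" idea works, here is a minimal, hypothetical sketch in PyTorch. This is not Meta's V-JEPA 2 code, and every module, size and name below is invented for illustration. What it shows is the core joint-embedding trick: the model is trained to predict the representation of hidden video patches from the visible ones, rather than reconstructing raw frames or matching human labels.

```python
# Toy sketch of a joint-embedding predictive setup (illustrative only,
# not Meta's V-JEPA 2 implementation). All sizes are arbitrary.
import torch
import torch.nn as nn

DIM = 256          # embedding size per video patch
NUM_PATCHES = 64   # patches per video clip

context_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True),
    num_layers=2,
)
predictor = nn.Linear(DIM, DIM)

# Target encoder: a frozen copy whose outputs serve as prediction targets.
target_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True),
    num_layers=2,
)
for p in target_encoder.parameters():
    p.requires_grad = False

patches = torch.randn(1, NUM_PATCHES, DIM)  # stand-in for embedded video patches
mask = torch.rand(NUM_PATCHES) < 0.5        # hide roughly half the patches

# Encode only the visible patches; zero out the hidden ones.
context = context_encoder(patches * (~mask).float().unsqueeze(-1))
with torch.no_grad():
    targets = target_encoder(patches)

# Predict the representations of the hidden patches and compare them in
# embedding space -- no pixel reconstruction, no human labels.
loss = nn.functional.l1_loss(predictor(context)[:, mask], targets[:, mask])
loss.backward()
```

The appeal of predicting in embedding space, as Meta has described it, is that the model can focus on how a scene is likely to evolve without getting bogged down in unpredictable pixel-level detail.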
Meta said the model could be useful for autonomous vehicles and robots because they wouldn't need to be trained on every possible situation. The company called it a step toward AI that can adapt the way humans do.
One persistent challenge in physical AI has been the need for large amounts of training data, which takes time, money and resources to gather. At SXSW earlier this year, experts said synthetic data -- training data created by AI -- could help prepare a more traditional learning model for unexpected situations. (In Austin, the example used was the emergence of bats from the city's famed Congress Avenue Bridge.)
Meta said its new model simplifies that process and makes it more efficient for real-world applications because it doesn't rely on as much training data.
The next steps for world models include training systems that can learn, reason and plan across different scales of time and space, making them better at breaking down complicated tasks. Multimodal models, which can use other senses like audio and touch in addition to vision, will also help future AI understand the real world.