Meet the Robot That Becomes a Domestic Pro by Watching YouTube

Home Chores Get Easier: CMU’s New Robots “Can Learn Where and How Humans Interact With Different Objects through Watching Videos.”

Researchers at Carnegie Mellon University have made significant progress in enabling robots to learn from videos. By analyzing footage of people performing everyday household tasks, robots can acquire the skills needed to carry out similar tasks in new environments.

This breakthrough holds promising implications for the integration of robots into domestic settings, as they can now assist individuals with activities such as cooking and cleaning. Through video observation, two robots successfully mastered 12 tasks, including opening drawers, oven doors, and lids, as well as retrieving pots from stovetops, answering telephones, and handling vegetables and cans of soup.

Deepak Pathak, an assistant professor in CMU’s School of Computer Science’s Robotics Institute, explains, “The robot can learn where and how humans interact with different objects through watching videos. From this knowledge, we can train a model that enables two robots to complete similar tasks in varied environments.”

Robots Learn from Videos

The current approaches for training robots often involve time-consuming and failure-prone methods such as manual task demonstrations by humans or extensive training in simulated environments. Previous research led by Pathak and his team introduced a unique approach called WHIRL (In-the-Wild Human Imitating Robot Learning), which relied on humans completing tasks in the same environment as the robot.

Building upon the success of WHIRL, Pathak’s latest work introduces the Vision-Robotics Bridge (VRB), an improved model that eliminates the need for human demonstrations and the requirement for the robot to operate in an identical environment. Like WHIRL, the robot still requires practice to master a task, but the team’s research demonstrated that it can learn a new task in as little as 25 minutes.

Shikhar Bahl, a Ph.D. student in robotics, says the team was “able to take robots around campus and do all sorts of tasks. Robots can use this model to curiously explore the world around them. Instead of just flailing its arms, a robot can be more direct with how it interacts.”

To teach the robot how to interact with objects, the team incorporated the concept of affordances. Affordances, rooted in psychology, refer to the opportunities an environment provides to an individual. In the context of VRB, affordances define where and how a robot can interact with an object based on human behavior. For instance, by observing a human opening a drawer in multiple videos, the robot identifies the contact points (the handle) and the direction of the drawer’s movement (straight out from the starting position). Consequently, the robot can learn to open any drawer.
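The drawer example above can be illustrated with a toy sketch. VRB itself learns affordances with a trained visual model; the simplified version below only captures the core idea, reducing an affordance to a contact point plus a motion direction, aggregated over several observed interactions. All names and numbers here (`Affordance`, `estimate_affordance`, the sample coordinates) are hypothetical, not from the CMU work.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates in an image frame

@dataclass
class Affordance:
    """Where and how to interact: a contact point plus a motion direction."""
    contact_point: Point
    direction: Point  # unit vector of post-contact motion

def estimate_affordance(observations: List[Tuple[Point, Point]]) -> Affordance:
    """Aggregate (contact, release) point pairs observed across videos
    into one mean contact point and one normalized motion direction."""
    n = len(observations)
    # Mean contact point across all observed interactions
    cx = sum(c[0] for c, _ in observations) / n
    cy = sum(c[1] for c, _ in observations) / n
    # Mean displacement from contact to release
    dx = sum(r[0] - c[0] for c, r in observations) / n
    dy = sum(r[1] - c[1] for c, r in observations) / n
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0  # avoid division by zero
    return Affordance((cx, cy), (dx / norm, dy / norm))

# Three hypothetical videos of a drawer being opened: the hand contacts the
# handle, then moves straight out (positive x in image coordinates).
videos = [((100.0, 50.0), (140.0, 50.0)),
          ((102.0, 51.0), (138.0, 51.0)),
          ((98.0, 49.0), (142.0, 49.0))]
aff = estimate_affordance(videos)
```

In this sketch the estimated contact point lands on the handle and the direction comes out as “straight out,” mirroring how the robot in the article identifies the handle and the drawer’s direction of travel from multiple demonstrations.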

The team leveraged large datasets such as Ego4D and Epic Kitchens for training. Ego4D contains nearly 4,000 hours of egocentric videos showcasing daily activities worldwide, with some videos collected by CMU researchers. Epic Kitchens offers similar videos depicting cooking, cleaning, and other kitchen tasks.

These datasets, originally intended for training computer vision models, are now being utilized in a novel way to enable robots to learn from the vast amount of Internet and YouTube videos available.

Image Credit: Shutterstock
