According to a new study, American engineers have created an attachment for VR headsets that hold a smartphone, which makes it possible to determine the pose of the user's body and lips, as well as the user's appearance. The attachment consists of two mirrored hemispheres mounted in front of the smartphone's camera, allowing it to act as a simple depth camera. The development was presented at the UIST 2019 conference.
Many modern virtual reality systems are equipped with motion-capture sensors, which allow a more realistic depiction of the user in the virtual world and let users interact with objects using their hands. Typically, such a system consists of several cameras on the front surface of the headset, pointed in different directions. However, in addition to full-fledged VR headsets, there are entry-level headsets that work with a smartphone inserted into them, which displays the image of the virtual world on its screen.
Since most modern smartphones are equipped with relatively high-quality cameras, engineers from Carnegie Mellon University, led by Robert Xiao, proposed using the built-in camera to track the user's body movements. In headsets that require a smartphone, however, the phone is positioned so that its cameras face away from the user, which prevents them from capturing body movements. The engineers solved this problem by placing two mirrored hemispheres in front of the camera.
The prototype created by the engineers consists of a simple headset (during development they used different models, including Google Cardboard), a wire mount, and two mirror-coated hemispheres fixed at its end. Thanks to the shape of the mirrors, the smartphone's camera can capture a significant part of the surrounding space, including the entire front of the user's body. A pair of hemispheres is needed instead of one so that the algorithms can reconstruct depth data for the frame from two views taken at different angles.
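The two hemispheres act like a stereo pair: a point seen in both mirror views appears at slightly different positions, and that disparity encodes distance. The article does not give the exact model used, but the idea can be illustrated with the standard stereo relation (all numbers below, including the 6 cm baseline and 500 px focal length, are hypothetical):

```python
import numpy as np

def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Estimate depth (metres) from the horizontal disparity between
    matching points in two views separated by a known baseline.

    Standard stereo relation (a simplification of the mirror geometry):
        depth = focal_length * baseline / disparity
    """
    disparity = np.asarray(x_left, dtype=float) - np.asarray(x_right, dtype=float)
    # Guard against zero disparity (points effectively at infinity).
    disparity = np.where(np.abs(disparity) < 1e-6, np.nan, disparity)
    return focal_px * baseline_m / disparity

# Hypothetical example: hemispheres 6 cm apart, 500 px focal length.
# A point seen at x=320 px in one view and x=300 px in the other:
d = depth_from_disparity(320, 300, focal_px=500, baseline_m=0.06)
print(d)  # 1.5 (metres)
```

The smaller the disparity between the two mirror views, the farther away the point is, which is why a single hemisphere would not suffice.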
After a frame is captured, it is sent from the smartphone to a server. First, the spherical images are "unwrapped" into rectangular ones, which is possible thanks to data on the distance from the camera to the sphere, the shooting angle, and calibration. The rectangular image is then analyzed by the OpenPose algorithm, which marks key points on the body corresponding to the positions of the joints and other body parts. To obtain data on the shape of the lips, a separate neural network is used, capable of distinguishing five lip positions.
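The "unwrapping" step can be pictured as a polar-to-rectangular remapping: each column of the output corresponds to an azimuth angle around the mirror ball, and each row to a radius in the circular source image. The sketch below is a deliberately simplified, uncalibrated version of such a mapping (the paper's actual model accounts for the mirror geometry and calibration), using nearest-neighbour sampling:

```python
import numpy as np

def unwrap_sphere_image(img, out_h=90, out_w=360):
    """Unwrap a circular mirror-ball image into a rectangular panorama.

    Simplified polar-to-rectangular mapping (hypothetical, not the
    paper's calibrated model): output column = azimuth angle,
    output row = radius in the circular source image.
    """
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    max_r = min(cx, cy)

    # Azimuth for each output column, radius for each output row.
    theta = np.arange(out_w) * (2 * np.pi / out_w)
    r = (np.arange(out_h) + 0.5) * (max_r / out_h)

    # Source pixel coordinates, clipped to the image bounds.
    src_x = np.clip((cx + r[:, None] * np.cos(theta[None, :])).astype(int), 0, w - 1)
    src_y = np.clip((cy + r[:, None] * np.sin(theta[None, :])).astype(int), 0, h - 1)
    return img[src_y, src_x]

# Tiny synthetic check: a 100x100 "mirror-ball image".
img = np.arange(100 * 100).reshape(100, 100)
pano = unwrap_sphere_image(img, out_h=45, out_w=180)
print(pano.shape)  # (45, 180)
```

Only after this rectification does a standard body-keypoint detector such as OpenPose, which expects conventional perspective images, become applicable to the frame.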
The engineers suggest using such a system to recognize the user's gestures and let them control applications with their hands. In addition, they demonstrated another application: they taught the algorithm to create a virtual avatar textured to match the user's clothing, using data from the camera.