Wednesday, December 2, 2020

Neural network recreates human hand movements from speech

Jiya Saini
American developers have created an algorithm that predicts the movement of a person's hands from their speech. Given only an audio recording of speech, it generates an animated model of the human body and then renders a realistic video from it. A paper describing the work will be presented at the CVPR 2019 conference.

Speech is the main way people convey information to those around them. Alongside it, however, we also actively use gestures in conversation, reinforcing spoken words and giving them emotional color. Indeed, according to the most widely accepted hypothesis of language evolution, human ancestors initially communicated mainly through gestures; the growing use of the hands in everyday tasks then drove the development of vocal communication, which eventually became primary. One way or another, the process of uttering words in conversation is closely tied to hand movements.

Researchers led by Jitendra Malik of the University of California, Berkeley used this connection to predict a person's gestures from the audio component of their speech. The algorithm works in two stages: first it predicts hand movements from an audio recording of speech, and then it visualizes the predicted gestures using an algorithm presented in 2018 by a related group of researchers, who taught a neural network to transfer people's movements between videos via an intermediate pose-recognition step.
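The two-stage structure described above can be sketched as a simple function composition. All names, shapes, and return values here are illustrative assumptions, not the authors' actual API:

```python
# Minimal sketch of the two-stage pipeline: audio -> poses -> video.
# Function names and data shapes are hypothetical placeholders.

def predict_poses(spectrogram):
    """Stage 1 (hypothetical): map an audio spectrogram to a sequence
    of skeletal poses, assuming one pose per spectrogram time frame."""
    keypoints_per_pose = 49  # arm, shoulder and neck keypoints (per the article)
    return [[(0.0, 0.0)] * keypoints_per_pose for _ in spectrogram]

def render_video(pose_sequence):
    """Stage 2 (hypothetical): render each pose as a video frame,
    standing in for the 2018 pose-to-video visualization model."""
    return ["frame rendered from %d keypoints" % len(p) for p in pose_sequence]

spectrogram = [[0.0] * 128 for _ in range(64)]  # dummy 64-frame spectrogram
video = render_video(predict_poses(spectrogram))
```

The point of the sketch is only the data flow: audio frames map one-to-one to poses, and poses map one-to-one to rendered frames.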

In the first stage, a UNet convolutional neural network takes a two-dimensional spectrogram of the audio recording and converts it into a one-dimensional intermediate signal. This signal is then transformed into a sequence of poses, represented as a skeletal model with 49 key points covering the arms, shoulders and neck. The pose sequence is then passed to the visualization algorithm, which turns it into a video.
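To make the "two-dimensional spectrogram to one-dimensional signal" step concrete, here is a toy stand-in: where the real UNet learns this reduction with convolutions, a simple mean over the frequency bins of each time frame illustrates the change in shape. This is an assumption-laden sketch, not the network itself:

```python
# Illustration only: collapse a 2-D spectrogram (time x frequency)
# into a 1-D per-frame signal. A real UNet learns this mapping;
# averaging over frequency bins merely mimics the dimensionality change.

def spectrogram_to_signal(spectrogram):
    """Reduce each time frame's frequency bins to a single value."""
    return [sum(frame) / len(frame) for frame in spectrogram]

spec = [[1.0, 3.0], [2.0, 4.0], [0.0, 6.0]]  # 3 time frames, 2 frequency bins
signal = spectrogram_to_signal(spec)          # one value per time frame
```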

To teach the algorithm to convert speech into movement, the researchers collected datasets of recordings totaling 144 hours. The recordings featured TV presenters, lecturers and religious preachers, chosen because long recordings of their gesticulated speech are easy to find. Using the OpenPose algorithm, the researchers annotated each frame in the dataset with a skeletal model. Trained on these speech recordings and the corresponding pose-annotated frames, the algorithm learned to create realistic videos. Notably, the chosen approach requires training a separate neural network model for each specific person.
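The training-data preparation described above amounts to pairing each stretch of audio with the skeleton detected in the aligned video frame. A hedged sketch of that pairing, with a stand-in for the OpenPose detector (all names here are hypothetical):

```python
# Sketch of assembling training pairs: each audio frame is matched with
# the 49-keypoint skeleton extracted from the corresponding video frame.
# `pose_estimator` stands in for OpenPose; names are illustrative.

def build_training_pairs(audio_frames, video_frames, pose_estimator):
    """Pair each audio frame with the skeleton detected in the
    video frame recorded at the same moment."""
    return [(a, pose_estimator(v)) for a, v in zip(audio_frames, video_frames)]

# Stand-in detector: a real one would return actual 2-D joint coordinates.
fake_pose_estimator = lambda frame: [(0.0, 0.0)] * 49

pairs = build_training_pairs(["audio0", "audio1"],
                             ["frame0", "frame1"],
                             fake_pose_estimator)
```

These (audio, skeleton) pairs serve as input and pseudo-ground-truth target, which is why no manual pose annotation is needed.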

In the video demonstrated by the researchers, some movements do not fully match the person's real movements in the original recording; for example, the algorithm often selects the correct movement but performs it with the wrong hand. This, however, reflects a fundamental limitation of the approach rather than a flawed implementation: gestures during speech are not invariant, and different gestures can accompany the same phrase spoken by the same person.
