Google and Jigsaw have compiled a dataset of more than three thousand deepfake videos created with face-swapping algorithms. The researchers added the videos, both manipulated and original, to FaceForensics++, another large project devoted to face manipulation in video. The dataset description is published on GitHub, and the data itself is available to outside researchers only after approval.
In the past few years, machine learning has made great progress in image processing. However, alongside algorithms that can, for example, recognize skin cancer, developers have also created powerful algorithms for producing fake videos. These gained widespread attention in 2017, when a Reddit user with the nickname deepfakes (which gave such videos their name) published pornographic videos in which the original faces were replaced with those of popular actresses, including Gal Gadot and Scarlett Johansson. Soon afterward, large internet platforms banned such content. However, developers keep improving the algorithms for creating deepfakes, so detecting them is becoming more difficult.
Google and Jigsaw (both companies are owned by the holding company Alphabet) decided to help improve deepfake detection algorithms by adding to the existing FaceForensics++ project, in which European developers created a dataset as well as an automated benchmark that covers several face manipulation algorithms and evaluates how well various methods detect them.
The new Deep Fake Detection Dataset is based on 363 videos that the developers shot specifically for the project. From these videos they created 3,068 new ones in which the volunteers' faces were replaced with others, using the publicly available algorithms Deepfakes, Face2Face, FaceSwap, and NeuralTextures. The developers note that they plan to expand the dataset in the future.
Recently, other large IT companies, Facebook and Microsoft, have also joined the fight against deepfakes. They announced a contest for developers of algorithms that detect face replacement in video, and promised to create a large open dataset for the task. Like Google, the companies will not use data from users of social networks or YouTube, but will hire volunteer actors.