Deepfakes are the 21st century’s answer to Photoshopping. The name comes from deep learning, the form of AI used to create them, which here often involves neural networks called autoencoders. Deepfakes can place a person’s head on someone else’s body or create entirely fictional photos from scratch. Voices can also be deepfaked through what are called “voice clones” or “voice skins,” which can imitate a public figure’s speech.
Several people (and companies) have become victims of deepfake technology. In March 2019, a UK subsidiary of a German energy firm paid $235,230 (200,000 euros) into a Hungarian bank account after a call from a fraudster who mimicked the German CEO’s voice. In May of the same year, a video of a speech by Nancy Pelosi was slowed and spliced, giving the appearance that she was drunk or had dementia. The video was retweeted by President Donald Trump and others.
Although there are techniques to recognize some types of fakes (for example, deepfake faces tend not to blink normally), the most insidious impact of deepfakes is to create a zero-trust society where people cannot or will not distinguish truth from falsehood.
How Are Deepfakes Made?
Deepfakes typically use a type of neural network called an autoencoder. To create a face-swap video, you feed thousands of face shots of two people through a shared encoder. The encoder finds and learns similarities between the two faces and reduces them to their common features, compressing the images in the process. Two decoders are then trained to recover the faces from these compressed images, one for each person. To perform the swap, you feed the encoded image of person A into the decoder trained on person B, so the “wrong” decoder reconstructs person B’s face with the expressions and orientation of person A. To create a deepfake video, this process has to be executed on every frame.
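The face-swap data flow can be illustrated with a toy sketch. It assumes the common setup of one shared encoder and one decoder per person, and it stands in tiny linear maps and made-up four-number "faces" for the deep networks and real images an actual tool would use. All names (make_faces, swap_a_to_b, POSE_DIR) and all sizes are hypothetical.

```python
import random

random.seed(0)
DIM, LATENT = 4, 2  # toy sizes; real systems use image tensors and deep networks
POSE_DIR = [0.5, -0.5, 0.5, -0.5]  # hypothetical "expression/pose" direction

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rand_matrix(rows, cols):
    """Small random weights."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def make_faces(identity, n=50):
    """Made-up 'face' vectors: a fixed identity plus a varying pose component."""
    faces = []
    for _ in range(n):
        pose = random.uniform(-1.0, 1.0)
        faces.append([identity[i] + pose * POSE_DIR[i] for i in range(DIM)])
    return faces

faces_a = make_faces([1.0, 0.0, 0.5, 0.0])
faces_b = make_faces([0.0, 1.0, 0.0, 0.5])

encoder = rand_matrix(LATENT, DIM)    # shared by both people
decoder_a = rand_matrix(DIM, LATENT)  # trained on person A only
decoder_b = rand_matrix(DIM, LATENT)  # trained on person B only

def reconstruction_loss(faces, decoder):
    """Mean squared reconstruction error over a set of faces."""
    total = 0.0
    for x in faces:
        y = matvec(decoder, matvec(encoder, x))
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(faces)

def train_step(x, decoder, lr=0.01):
    """One gradient step on 0.5 * ||decoder(encoder(x)) - x||^2."""
    z = matvec(encoder, x)
    err = [yi - xi for yi, xi in zip(matvec(decoder, z), x)]
    # Gradient with respect to the latent code, computed before any update.
    grad_z = [sum(decoder[i][j] * err[i] for i in range(DIM)) for j in range(LATENT)]
    for i in range(DIM):
        for j in range(LATENT):
            decoder[i][j] -= lr * err[i] * z[j]
    for j in range(LATENT):
        for k in range(DIM):
            encoder[j][k] -= lr * grad_z[j] * x[k]

def total_loss():
    return reconstruction_loss(faces_a, decoder_a) + reconstruction_loss(faces_b, decoder_b)

loss_before = total_loss()
for _ in range(300):
    for xa, xb in zip(faces_a, faces_b):
        train_step(xa, decoder_a)  # the encoder sees both people
        train_step(xb, decoder_b)
loss_after = total_loss()

def swap_a_to_b(face_a):
    """Encode a face of person A, then decode it with person B's decoder."""
    return matvec(decoder_b, matvec(encoder, face_a))

swapped = swap_a_to_b(faces_a[0])
```

The only step specific to face swapping is swap_a_to_b: nothing new is trained at swap time, the encoded face is simply routed to the other person's decoder.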
Another way to create deepfakes is to use a generative adversarial network (or GAN). In this case, two algorithms are pitted against each other. One, the generator, is fed random noise and converts it to an image. This synthetic image is added to a stream of real pictures of, for example, celebrities, and the other algorithm, the discriminator, tries to tell the real images from the synthetic ones. By repeating the process countless times, both the generator and discriminator improve, and the generator starts producing realistic faces of completely nonexistent celebrities. The result is rather uncanny, as these images depict faces that are strangely familiar, yet completely made up.
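The adversarial loop can be sketched in a few lines if the "images" are shrunk down to single numbers. In this hypothetical toy, real data are numbers near 4, the generator is a straight line G(z) = a * z + b, and the discriminator is a logistic classifier. The update rules are the standard GAN gradient steps (with the non-saturating generator loss), but every name and constant is an assumption chosen for illustration, and the sketch makes no claim about convergence: adversarial training is known to be unstable.

```python
import math
import random

random.seed(1)
REAL_MEAN, REAL_STD = 4.0, 0.5  # hypothetical "real data": numbers near 4

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

gen = {"a": 1.0, "b": 0.0}   # generator parameters: G(z) = a * z + b
disc = {"w": 0.1, "c": 0.0}  # discriminator parameters: D(x) = sigmoid(w * x + c)

def generate(z):
    """Turn random noise z into a synthetic sample."""
    return gen["a"] * z + gen["b"]

def discriminate(x):
    """Estimated probability that x is a real sample."""
    return sigmoid(disc["w"] * x + disc["c"])

def train(steps=2000, lr=0.01):
    for _ in range(steps):
        real = random.gauss(REAL_MEAN, REAL_STD)
        z = random.gauss(0.0, 1.0)
        fake = generate(z)

        # Discriminator step: minimize -log D(real) - log(1 - D(fake)).
        d_real, d_fake = discriminate(real), discriminate(fake)
        disc["w"] -= lr * (-(1.0 - d_real) * real + d_fake * fake)
        disc["c"] -= lr * (-(1.0 - d_real) + d_fake)

        # Generator step: minimize -log D(fake), i.e. try to fool the
        # freshly updated discriminator.
        d_fake = discriminate(fake)
        grad_fake = -(1.0 - d_fake) * disc["w"]  # d(loss) / d(fake sample)
        gen["a"] -= lr * grad_fake * z
        gen["b"] -= lr * grad_fake

start_b = gen["b"]
train()
sample = generate(random.gauss(0.0, 1.0))
realness = discriminate(sample)
```

Each side only ever sees the other through the discriminator's verdict, which is what lets the generator improve without being shown a real sample directly.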
Most deepfakes are created on consumer hardware, hence the risk of their proliferation. They do require high-end desktops with powerful graphics cards, but this type of equipment is common in, for example, gaming PCs. Processing the images can take days or weeks, although cloud computing power can reduce this to just hours. About 250 photos of a person can be enough to create a deepfake video.
The part of the process that requires the most expertise is touching up these videos to reduce flicker and remove visual defects. However, several companies, like Deepfakes Web, provide this service for as little as $2 an hour.
The technology is also widespread in consumer apps. The app Zao lets users add their faces to a list of TV and movie characters that the system has “learned.” The massively popular platform TikTok and its sister app Douyin also provide face-swapping technology, and Snapchat has had a FaceSwap option for years. Creating deepfake videos can be as easy as using an Instagram filter.
The Deepfake Business
For now, the cost of deepfakes is measured in lost credibility and emotional harm rather than dollars. The main targets are usually politicians and celebrities, although media companies that depend on reliability to maintain an audience can also be damaged by devalued stocks and boycotts. The technology is, however, particularly weaponized against women through severe violations of privacy.
Deeptrace found that, in 2019, there were 14,698 deepfake videos online, an 84% increase over the number available seven months earlier. Although companies like Twitter, Facebook, and YouTube have all recently come out with policies banning synthetic videos with “malicious intent,” a unified effort is required to identify the actors, the tech, and the reasons behind deepfakes.
This year, YouTube announced that fake videos related to the 2020 US presidential election are banned from the platform. The service was the first to remove the aforementioned Nancy Pelosi video, while sites like Facebook refused to do so.
There are thousands of incredibly popular deepfakes on YouTube. Some of them have millions of views, including Jim Carrey as Jack Nicholson in the classic thriller The Shining, Steve Buscemi talking about Real Housewives as Jennifer Lawrence, and a version of Back to the Future made with Spider-Man characters.
Poor-quality deepfakes are easy to spot. In addition to strange blinking (the majority of images fed to an algorithm like the one mentioned above show people with their eyes open), the lip-syncing can be bad and the skin tone patchy. There can also be flickering around the edges of the faces and detectable artifacts in strands of hair. Jewelry and teeth also tend to be rendered less effectively, and illumination is usually inconsistent, particularly in the reflections on the iris.
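The blinking cue is the easiest one to turn into a simple check. The sketch below is one possible heuristic, not a method described above: it uses the eye aspect ratio (EAR), a published measure of eye openness computed from six landmarks around the eye, and flags clips in which blinks are unusually rare. The landmark detector that would supply the points is assumed to exist elsewhere, and the thresholds (0.2 for a closed eye, two frames per blink, five blinks per minute) are illustrative guesses rather than calibrated values.

```python
import math

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR from six (x, y) eye landmarks: p1/p4 are the eye corners,
    p2/p3 lie on the upper lid, p6/p5 on the lower lid."""
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)

def count_blinks(ear_per_frame, closed_threshold=0.2, min_closed_frames=2):
    """Count runs of consecutive frames in which the eye looks closed."""
    blinks = 0
    run = 0
    for ear in ear_per_frame:
        if ear < closed_threshold:
            run += 1
        else:
            if run >= min_closed_frames:
                blinks += 1
            run = 0
    if run >= min_closed_frames:  # clip ended while the eye was closed
        blinks += 1
    return blinks

def blinks_per_minute(ear_per_frame, fps=30.0):
    minutes = len(ear_per_frame) / fps / 60.0
    return count_blinks(ear_per_frame) / minutes if minutes > 0 else 0.0

def looks_suspicious(ear_per_frame, fps=30.0, min_rate=5.0):
    """Flag clips whose blink rate falls below an assumed minimum."""
    return blinks_per_minute(ear_per_frame, fps) < min_rate

# Synthetic one-minute clips at 30 fps (made-up EAR values).
open_eye = eye_aspect_ratio((0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1))
natural_clip = ([0.3] * 117 + [0.1] * 3) * 15  # a 3-frame blink every 4 seconds
staring_clip = [0.3] * 1800                    # never blinks
```

A check like this only catches crude fakes; once generators are trained on images with closed eyes, the blink rate alone stops being informative.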
Research into detecting deepfakes is being funded by universities, tech firms, and governments. The Deepfake Detection Challenge dataset, for example, was launched in June 2020 and consists of 124,000 videos created with eight facial modification algorithms, allowing experts around the world to benchmark their detection models. However, as of today, good deepfakes are still hard to detect. A recent challenge run by Facebook had over a thousand entries. The winner achieved a 65% detection success rate.
The Risks of Deepfakes
Not all deepfakes are malicious. Some of them are entertaining and others are even helpful. For example, voice cloning can restore people’s voices when they lose them to disease. The same technology can also be used to improve the dubbing of films in foreign languages or, more controversially, resurrect dead actors.
The videos that have had the most impact are not deepfakes but “shallowfakes,” footage doctored with simple editing tricks. Jim Acosta, a CNN correspondent, was temporarily banned from White House press briefings after a doctored video sped up the moment he reached for a microphone held by an intern, making the move seem much more aggressive. Recently, a TV interview in the UK was doctored to make it seem that Labour MP Keir Starmer was unable to answer a question about Brexit. It’s worth mentioning that 72% of Brits interviewed in 2019 didn’t know what deepfakes were.
As the technology progresses, deepfakes could become more widespread and open the door to a whole new generation of scams. Fabricated footage could be presented as evidence in court, while fake biometric data could trick systems that rely on voice, vein, or face recognition.
Their most immediate danger, however, lies in the potential for synthetic media to erode truth and trust about specific events. In 2019, Cameroon’s minister of communication dismissed a video that showed the country’s soldiers executing civilians as “fake news.” The video had been shared by Amnesty International, which firmly believed it to be true. Prince Andrew doubted the authenticity of a photo showing him with Virginia Giuffre at Ghislaine Maxwell’s home, even though attorneys insist it’s genuine.
The Legality of Deepfakes
Deepfakes are not illegal per se, but depending on the content, they can infringe copyright, breach data protection law, or be defamatory. There’s also the specific criminal offense of sharing sexual and private images without consent. Recently, California passed AB 730, which makes it illegal to create and distribute deepfakes that feature politicians. California’s AB 602 also allows any resident to sue anyone who uses their image to create sexually explicit content.
Ironically, one of the strongest candidates for dealing with deepfakes is AI. Many detection systems today have a weakness, however: they work better for celebrities, because hours of celebrity footage are available for training.
An alternative approach would be to focus on the provenance of media, using, for example, a blockchain-based online ledger that provides tamper-evident watermarks for content and its modifications. An indelible certification at the time of creation, together with all available associated metadata, could generate a ground-truth reference file. Some companies, like Serelay and Truepic, are already adopting this type of verification.
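The core of the provenance idea, a record that cannot be altered without detection, can be sketched with a plain hash chain. This is a deliberately reduced illustration, not how Serelay, Truepic, or any particular blockchain works: there is no distributed consensus, no signing, and no watermark embedded in the media itself. The record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def record_digest(body: dict) -> str:
    """Hash a record body in a stable (sorted-key) JSON form."""
    return sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))

def append_record(ledger: list, content: bytes, note: str) -> None:
    """Register a version of the content, linked to the previous record."""
    prev_hash = ledger[-1]["record_hash"] if ledger else GENESIS
    body = {"content_hash": sha256_hex(content), "note": note, "prev_hash": prev_hash}
    ledger.append({**body, "record_hash": record_digest(body)})

def ledger_is_intact(ledger: list) -> bool:
    """Recompute every hash and check that each record links to the one before."""
    prev_hash = GENESIS
    for record in ledger:
        body = {key: record[key] for key in ("content_hash", "note", "prev_hash")}
        if record["prev_hash"] != prev_hash or record["record_hash"] != record_digest(body):
            return False
        prev_hash = record["record_hash"]
    return True

def matches_latest(ledger: list, content: bytes) -> bool:
    """Check whether some content is the most recently registered version."""
    return bool(ledger) and ledger[-1]["content_hash"] == sha256_hex(content)

# Example: register a capture and one declared edit (made-up byte strings).
ledger = []
original = b"raw video bytes"
edited = b"raw video bytes, color corrected"
append_record(ledger, original, "captured on device")
append_record(ledger, edited, "color correction")
```

Because each record's hash covers the previous record's hash, rewriting any earlier entry breaks every link after it, which is the property a shared ledger would then protect against quiet replacement.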
Not all deepfakes are evil, but all synthetic media should be made easily identifiable. It will take time for solutions to emerge, and they will have to be developed against a constantly moving target. As soon as the lack of blinking in fake videos was reported, photos of celebrities with their eyes closed were fed into the algorithms, resulting in more believable deepfakes.
Last year, the Malaria Must Die campaign launched a video featuring legendary soccer player David Beckham speaking nine different languages. Additionally, in 2018, CereProc analyzed recordings of 831 JFK speeches to synthesize the voice of President John F. Kennedy, bringing his speeches back to life decades after his assassination in 1963.
As the technology becomes more accessible, deepfakes could allow for reality to become plausibly deniable. They could also make content more accessible and give people their voices back. The times are challenging, yes, but we can still be victorious.