It’s 2 November 2020, the eve of the US presidential election, and a secretly recorded video is all over the news. Donald Trump has been filmed saying that he’s been “talking to the Russians. Believe me, I’ve been doing some incredible spying. Really incredible”. Despite Trump’s protests that he never said those words, the video proves to have a major effect on voters, and the rival Democratic candidate wins the election.
But Trump was telling the truth – he didn’t say those words. The footage of him was generated by ‘deepfake’ technology, which uses sophisticated artificial intelligence (AI) to create video and audio that impersonates real people. The technology is in use already and, if left unchecked, could lead us to start doubting everything we watch and hear online.
But it’s not just on the internet that the effects of deepfake could be felt. Suppose a criminal ‘deepfaked’ your sister’s voice and called you, asking you to transfer money because “I’m in trouble and desperately need your help”. Would you fall for it?
Thankfully, experts are becoming increasingly aware of the dangers that deepfake poses. And they’re beginning to fight back by harnessing the very techniques that make this technology so convincing.
Deepfake can be used for all kinds of trickery, but it’s most commonly used for ‘face swaps’, where one person’s face is superimposed onto another. In one demonstration, the filmmaker Jordan Peele’s mouth is transferred onto Barack Obama’s face, so Peele can get the former US president to say whatever he likes.
Deeptrace, an Amsterdam-based company that’s been set up to tackle the threat of deepfake, estimates that there are around 10,000 deepfake videos circulating online. Over 8,000 of these are pornographic clips, where a celebrity’s face is superimposed onto a porn star’s body. Experts are predicting that it’s only a matter of time before a fake video emerges of a politician purportedly saying or doing something that changes the course of an election.
The word ‘deepfake’ is a portmanteau of ‘deep learning’ and ‘fake’. Deep learning is a form of AI where algorithms inspired by the human brain, known as neural networks, learn new skills by processing vast amounts of data. At the heart of a deepfake is a form of deep learning known as ‘generative adversarial networks’, or GANs. Here, two neural networks work against each other to create realistic video and sound. One network, the generator, is the creative bit. This is fed reams of data, such as images of a celebrity’s face, and tasked with generating the same face artificially. Another network, the discriminator, is tasked with spotting whether the image it receives from the generator is fake, feeding back what’s wrong with it. When the discriminator rejects a video, the generator tries again. This back and forth continues until the generator produces something that’s almost indistinguishable from reality. A deepfake is born.
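That back and forth between generator and discriminator can be sketched in a few lines of Python. This is a toy illustration only: instead of faces, the generator here learns to imitate a simple bell curve of numbers, and every parameter and learning rate is an arbitrary choice for the demo, not anything a real deepfake system uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# The "real" data the generator must learn to imitate:
# numbers drawn from a bell curve centred on 4.0.
def real_batch(n):
    return rng.normal(4.0, 0.5, n)

# Generator g(z) = a*z + b and discriminator d(x) = sigmoid(w*x + c):
# two numbers each, just enough to show the adversarial loop.
a, b = 1.0, 0.0    # generator parameters
w, c = 0.1, 0.0    # discriminator parameters
lr, batch = 0.03, 64

for step in range(4000):
    # Discriminator step: learn to rate real data near 1 and fakes near 0.
    x_real = real_batch(batch)
    x_fake = a * rng.normal(0.0, 1.0, batch) + b
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    w += lr * np.mean((1 - d_real) * x_real - d_fake * x_fake)
    c += lr * np.mean((1 - d_real) - d_fake)

    # Generator step: adjust a and b so the discriminator rates fakes near 1.
    z = rng.normal(0.0, 1.0, batch)
    d_fake = sigmoid(w * (a * z + b) + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

# After training, the fakes should cluster near the real average of 4.0.
fakes = a * rng.normal(0.0, 1.0, 1000) + b
print(round(float(np.mean(fakes)), 2))
```

Notice that the generator never sees the real data directly: everything it learns about "realism" comes from the discriminator's corrections, which is exactly the dynamic described above.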
One of the reasons that this technique is so effective is because the discriminator is capable of something that the human mind is not: mathematically describing what’s wrong with an image or sound. “I could notice that the generator gets shadows wrong in the images it creates, but I can’t describe that mathematically,” says Carter Huffman, chief technical officer at Modulate, a company using this technology to develop artificial voices. “I can’t just write down a formula for that. But the discriminator can suggest corrections that we don’t have the ability to write down a formula for.”
Malaria No More UK created this video of David Beckham speaking nine languages
The technology behind deepfakes isn’t just put to nefarious uses, however. The team at Modulate foresees its artificial voices being used as ‘voice skins’ in gaming, enabling gamers to take on new personas. Meanwhile, teams at universities around the world are developing these techniques for other positive purposes. But algorithms for creating deepfakes have found their way onto online repositories where computer code is shared, and these can be exploited by amateur developers. All it takes to create a believable fake is a laptop with a graphics processing unit and a little software know-how. While some of the efforts have been innocent fun, the explosion of celebrity face-swap porn demonstrates the potential for misuse – and that misuse is predicted to spread into other realms soon.
“We’ve always been able to manipulate video,” says Prof Hany Farid, an expert in digital forensics at the University of California, Berkeley. “Hollywood studios have done it, individuals acting covertly on behalf of governments have done it. Now imagine a landscape where the average Reddit user can make fake videos of Theresa May – that’s a little worrisome.”
Farid is so concerned about a world leader being deepfaked that he and his team are developing a system for recognising deepfakes of specific politicians. They’re currently using automated software to analyse the head and face movements of a handful of leaders around the world, including Donald Trump, Theresa May and Angela Merkel, to identify unique patterns. A suspected fake video of one of these leaders can then be analysed to see whether it matches their real-life movements.
“World leaders tend to have distinct and predictable patterns of facial expressions and head movements,” says Farid. “It’s difficult for current deepfake systems to mimic these because they’re focused on trying to make sure that each frame of the video looks believable.”
When the individual frames come together, there’s no guarantee that the person will move in their own nuanced way, and it’s this peculiarity that Farid will use to spot deepfakes. He’s hoping to have his system operational by the end of 2019 – ahead of the 2020 US presidential elections.
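Farid's actual system isn't public, but the underlying idea can be sketched: summarise how a speaker's movement signals vary together (their personal mannerisms), then check whether a suspect clip shows the same pattern. In this toy version all the signals are simulated, and the "mannerism" is an invented coupling between head yaw and mouth movement; a real system would extract such signals from genuine footage.

```python
import numpy as np

rng = np.random.default_rng(2)

def movement_features(n_frames, coupling):
    """Simulated per-frame signals: head pitch, head yaw, mouth opening.
    `coupling` controls how strongly yaw follows the mouth, standing in
    for a speaker's personal mannerisms (purely illustrative)."""
    mouth = rng.normal(0, 1, n_frames)
    yaw = coupling * mouth + rng.normal(0, 0.3, n_frames)
    pitch = rng.normal(0, 1, n_frames)
    return np.vstack([pitch, yaw, mouth])

def correlation_profile(feats):
    """Pairwise correlations between the signals: a 'style fingerprint'."""
    return np.corrcoef(feats)

def distance(profile_a, profile_b):
    return float(np.linalg.norm(profile_a - profile_b))

# Build a reference profile from plenty of "genuine" footage of one speaker.
reference = correlation_profile(movement_features(2000, coupling=0.9))

# A genuine clip shares the coupling; a deepfake gets each individual frame
# right but loses the cross-signal mannerism (coupling near zero).
genuine = correlation_profile(movement_features(500, coupling=0.9))
fake = correlation_profile(movement_features(500, coupling=0.0))

print(distance(reference, genuine))  # small: style matches
print(distance(reference, fake))     # larger: mannerism is missing
```

The point mirrors Farid's observation: a fake can look right frame by frame while the relationships between movements over time give it away.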
Meanwhile, in Amsterdam, Deeptrace is developing a deepfake detection system that effectively turns the technology on itself. It uses a powerful discriminator algorithm (the aspect of deepfake technology that spots when a video or image has strayed from reality) to look for fakes. Whereas the discriminator in a deepfake system only has to push the generator until its output fools the average human, Deeptrace is aiming to build a discriminator sophisticated enough to spot incredibly subtle flaws in deepfakes, giving it the upper hand on the people making them.
The Deeptrace team is currently feeding its discriminator thousands of fake videos to hone the system. Just as the original technology is never explicitly programmed how to generate a realistic face, Deeptrace’s discriminator learns how to spot the fakes from the data it’s fed.
“Any algorithm that’s used to manipulate images leaves behind geometric patterns on them,” says Giorgio Patrini, chief executive officer of Deeptrace. “These artefacts would not appear in genuine images from a camera and are often not visible to the human eye.”
These patterns might not only reveal if a video is a deepfake, but also how it was created. “Families of algorithms will leave different traces, so sometimes we might get some information about the generator algorithm that was used,” says Patrini. “Tracing the algorithm could help us to identify, for example, how a human face was manipulated – was it just the facial expression or the person’s entire appearance and identity?”
Patrini and his business partner Francesco Cavalli see the main market for their technology initially being media companies. “Journalists would like proof of whether a video is real or fake,” says Patrini. But they see lots more opportunities on the horizon. “As soon as people realise that they can deepfake things like phone calls and video conferencing technology, it opens up a world of misuse,” says Patrini.
At Modulate, the team has recognised the risk of misinformation and is embedding sonic ‘watermarks’ in the sound files it creates so its work can be traced back to the company, hopefully acting as a disincentive to anyone thinking of misusing the technology. “The watermarks are slight tweaks to some of the sound frequencies, in ways that you won’t be able to hear,” says Huffman.
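Modulate hasn’t published how its watermarks work, but the simplest version of the idea can be shown in code: mix a faint tone at a chosen frequency into the audio, then detect it by checking whether that frequency stands out in the signal’s spectrum. Everything here (the frequency, the amplitude, the detection threshold) is an invented toy value; a real watermark would be shaped to stay inaudible and to survive compression.

```python
import numpy as np

SR = 16_000       # sample rate in Hz
MARK_HZ = 7_600   # hypothetical watermark frequency, near the top of the band
MARK_AMP = 0.02   # toy amplitude; a real system would keep this inaudible

def embed_watermark(audio):
    """Add a faint sinusoid at MARK_HZ: a toy stand-in for a sonic watermark."""
    t = np.arange(len(audio)) / SR
    return audio + MARK_AMP * np.sin(2 * np.pi * MARK_HZ * t)

def has_watermark(audio, threshold=10.0):
    """Check whether energy at MARK_HZ stands out against neighbouring frequencies."""
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1 / SR)
    bin_idx = int(np.argmin(np.abs(freqs - MARK_HZ)))
    neighbours = np.r_[spectrum[bin_idx - 50:bin_idx - 5],
                       spectrum[bin_idx + 5:bin_idx + 50]]
    return bool(spectrum[bin_idx] > threshold * (np.median(neighbours) + 1e-12))

rng = np.random.default_rng(1)
speech = rng.normal(0.0, 0.1, SR)  # one second of noise standing in for speech
print(has_watermark(speech))                    # False: no mark present
print(has_watermark(embed_watermark(speech)))   # True: mark detected
```

This also illustrates why Modulate’s proposed coalition matters: detection only works if you know which frequencies to check, which is knowledge the watermarking companies themselves would hold.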
Modulate hopes to form a coalition with other AI speech-generation companies and establish a self-policing system where anyone will be able to upload sound files to a dedicated website to find out whether or not they carry a company watermark.
But even those fighting deepfakes recognise that their detection systems are only part of the solution. Even if a deepfake video can be spotted as a fake, it can still be spread across social networks in a matter of minutes before anyone has had the chance to verify it, potentially changing the course of elections or destroying careers. So perhaps the emergence of deepfake calls for a shift in our mindset, where we recognise that seeing is no longer always believing. “We really need to rewire ourselves to stop believing that a video is the truth,” says Patrini. “But it will take effort in education, and potentially catastrophic events in the news, to finally get there.”