I’m watching a clip from the movie The Shining. Shelley Duvall is hiding from her crazed husband as he chops down the door with an axe. Jim Carrey sticks his head through the opening and cackles the iconic line: “Here’s Johnny!”
…Jim Carrey is not in The Shining.
What you’re seeing is not a Hollywood special effect. It wasn’t done with After Effects, green screen, or with costuming and makeup. The video is a fake created by deep learning artificial intelligence – a deepfake. And anyone with a powerful computer and enough time can make one.
You might have heard of deepfakes before, or glimpsed headlines discussing the technology. You might even have laughed at YouTube videos on channels such as Ctrl Shift Face that swap celebrities’ faces into iconic roles, with humorous and sometimes unsettling results (once you’ve seen any of the bizarre deepfakes involving Nicolas Cage, you can never un-see them).
But deepfakes, once confined to darker corners of the internet, are becoming a serious threat. In the US, particularly as the 2020 election season rapidly approaches, AI experts are warning that deepfakes could become a powerful tool for spreading misinformation and manipulating the public. With enough effort a bad actor could create a video of any political candidate saying nearly anything. And in today’s climate of social media outrage and algorithm-driven content distribution, there’s no telling how far it could spread before someone caught it.
It’s time engineers, developers, and technologists all had a serious discussion about deepfakes.
(Image source: Adobe Stock)
The Origin Of Deepfakes
No single person has taken credit for originally developing deepfakes. They exist because of a confluence of technologies, ranging from ever-more sophisticated computer vision algorithms and neural networks to increasingly powerful GPU hardware.
The first deepfakes seem to have emerged on the internet in 2017, when an anonymous Reddit user called “Deepfakes” began distributing illicit, altered videos of celebrities online. Other Reddit users followed suit, and it wasn’t long before a community had sprung up around distributing both the deepfakes themselves and the tutorials and software tools to create them.
In an interview with Vice [NSFW link], one of the first outlets to take an extensive look at deepfakes, the Reddit user outlined how comparatively easy the process is:
“I just found a clever way to do face-swap. With hundreds of face images, I can easily generate millions of distorted images to train the network. After that if I feed the network someone else’s face, the network will think it’s just another distorted image and try to make it look like the training face.”
But it wasn’t all fun and games. Far from it. When they first appeared, deepfakes had one particularly popular and disturbing use case – pornography. Much of the early deepfake content available was pornographic films created using the faces of celebrities like Gal Gadot, Scarlett Johansson, and Taylor Swift without their consent.
As the videos proliferated, there was a crackdown, with Reddit itself shutting down its deepfakes-related communities, pornographic websites removing the content, and sites like GitHub refusing to distribute deepfake software tools.
If private citizens weren’t that concerned yet, it was probably because sites got somewhat ahead of the problem. Left unchecked, it wouldn’t have been long before deepfake pornography spread from celebrities to everyday people. Anyone with enough publicly available photos or video of themselves on a platform like Facebook or Instagram could potentially become a victim of deepfake revenge porn.
In 2018, Rana Ayyub, an investigative journalist from India, fell victim to a deepfakes plot intended to discredit her as a journalist. Ayyub detailed her ordeal in an article for The Huffington Post:
“From the day the video was published, I have not been the same person. I used to be very opinionated, now I’m much more cautious about what I post online. I’ve self-censored quite a bit out of necessity.
“Now I don’t post anything on Facebook. I’m constantly thinking what if someone does something to me again. I’m someone who is very outspoken so to go from that to this person has been a big change.
“I always thought no one could harm me or intimidate me, but this incident really affected me in a way that I would never have anticipated…
“…[Deepfakes] is a very, very dangerous tool and I don’t know where we’re headed with it.”
How Deepfakes Work
On the surface, the process of creating a deepfake is fairly straightforward. First, you need enough images of your target (ideally hundreds or more), showing their face in as many orientations as possible. The more images you can get, the better the results, which is why celebrities and public figures are easy targets. If you think it might be difficult to get hundreds or thousands of images of someone, remember that a single second of video could contain 60 frames of someone’s face.
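As a rough back-of-the-envelope check of that claim, the arithmetic can be sketched in a few lines of Python. The function name and the `usable_fraction` parameter are hypothetical, introduced only for illustration: in practice not every frame shows a clear face, and many videos run at 24 or 30 frames per second rather than 60.

```python
def seconds_of_footage_needed(target_frames, fps=60.0, usable_fraction=1.0):
    """Estimate how many seconds of video yield `target_frames` face images.

    `usable_fraction` is an assumed share of frames that actually show
    a clear, usable face (1.0 means every frame counts).
    """
    if target_frames < 0:
        raise ValueError("target_frames must not be negative")
    if fps <= 0:
        raise ValueError("fps must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return target_frames / (fps * usable_fraction)


if __name__ == "__main__":
    # 1,000 images from 60 fps footage where every frame is usable.
    print(round(seconds_of_footage_needed(1000), 1))  # 16.7
    # The same goal at 30 fps, assuming only half the frames are usable.
    print(round(seconds_of_footage_needed(1000, fps=30, usable_fraction=0.5), 1))  # 66.7
```

Even under the more pessimistic assumptions, a little over a minute of footage would be enough to reach a thousand images.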
Then you need a target video. The AI can’t change skin tone or facial structure, so it helps to pick a source face and a target video whose subject has similar features. Once a deep learning algorithm is trained on a person’s facial features, additional software can then superimpose that face onto another person’s in your target video. The results can be spotty at times, as many videos online attest, but done right, and with enough attention to detail, the results can be seamless.
In an interview with Digital Trends, the anonymous owner of the Ctrl Shift Face YouTube channel (the channel responsible for the Jim Carrey/The Shining videos, among others) discussed how simple, yet time-consuming the process is:
“I’m not a coder, just a user. I don’t know the details about exactly how the software works. The workflow works like this: You add source and destination videos, then one neural network will detect and extract faces. Some data cleanup and manual extraction is needed. Next, the software analyzes and learns these faces. This step can sometimes take a few days. The more the network learns, the more detailed the result will be. In the final step, you combine these two and the result is your deepfake. There’s sometimes a bit of post-process needed as well.”
On one hand, the relative ease with which this can be done, with little to no coding experience, is certainly disconcerting. On the other hand, deepfakes are an impressive demonstration of the sophistication of AI today.
At the core of deepfakes is a neural network called an autoencoder. Put simply, an autoencoder is designed to learn the important features of a dataset so it can create a representation of it on its own. If you feed a face into an autoencoder its job is then to learn the distinguishing characteristics that make up a face and then construct a lower-dimensional representation of that face – in this case called a latent face.
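To make the idea concrete, here is a minimal sketch of an autoencoder in plain Python. It is a toy under stated assumptions, not the network any deepfake tool actually uses: the “faces” are invented six-number vectors driven by two hidden factors, the encoder and decoder are single linear layers, and training is simple gradient descent on reconstruction error. The names `TinyAutoencoder` and `make_toy_faces` are hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_toy_faces(count, seed=0):
    """Invented stand-in for face images: 6 numbers driven by 2 hidden factors."""
    rng = random.Random(seed)
    u = [1.0, 0.5, -0.5, 0.0, 0.8, -0.3]  # how factor 1 shows up in the "pixels"
    v = [0.0, -0.7, 0.6, 1.0, 0.2, 0.4]   # how factor 2 shows up in the "pixels"
    faces = []
    for _ in range(count):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        faces.append([a * ui + b * vi for ui, vi in zip(u, v)])
    return faces


class TinyAutoencoder:
    """Linear autoencoder: face -> smaller latent vector -> reconstructed face."""

    def __init__(self, n_inputs, n_latent, seed=0):
        rng = random.Random(seed)
        self.enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                    for _ in range(n_latent)]
        self.dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_latent)]
                    for _ in range(n_inputs)]

    def encode(self, face):
        """Compress a face into its lower-dimensional 'latent face'."""
        return matvec(self.enc, face)

    def decode(self, latent):
        """Rebuild a full-size face from a latent face."""
        return matvec(self.dec, latent)

    def loss(self, faces):
        """Mean squared reconstruction error over a list of faces."""
        total = 0.0
        for face in faces:
            rebuilt = self.decode(self.encode(face))
            total += sum((y - x) ** 2 for y, x in zip(rebuilt, face))
        return total / len(faces)

    def train(self, faces, epochs=300, lr=0.02):
        """Gradient descent on reconstruction error, one face at a time."""
        for _ in range(epochs):
            for face in faces:
                latent = self.encode(face)
                err = [y - x for y, x in zip(self.decode(latent), face)]
                # Error signal passed back through the decoder to the latent.
                grad_latent = [sum(self.dec[i][j] * err[i] for i in range(len(err)))
                               for j in range(len(latent))]
                for i, e in enumerate(err):
                    for j, z in enumerate(latent):
                        self.dec[i][j] -= lr * e * z
                for j, g in enumerate(grad_latent):
                    for k, x in enumerate(face):
                        self.enc[j][k] -= lr * g * x


if __name__ == "__main__":
    faces = make_toy_faces(40, seed=2)
    model = TinyAutoencoder(n_inputs=6, n_latent=2, seed=1)
    print("error before training:", model.loss(faces))
    model.train(faces)
    print("error after training: ", model.loss(faces))
    print("latent face:", model.encode(faces[0]))
```

The point of the sketch is the bottleneck: each six-number face must squeeze through a two-number latent face, so the network can only reconstruct it well by learning the factors that actually vary across the dataset.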
Deepfakes work by training a single encoder to create a generalized representation of a face, and then having two decoders share that representation. If you have two decoders – one trained on Person A’s face, the other on Person B’s – you can feed the encoder either face and transpose Person A’s face onto Person B’s (or vice versa). If the encoder is trained well enough, and the representation is generalized enough, it can handle facial expressions and orientations in a very convincing way.
Since faces in general are very similar in their overall shape and structure, a latent face created by the encoder from Person A’s face can be passed to a decoder trained on Person B’s face to good effect. The decoder rebuilds the frame as Person B’s face wearing Person A’s expression and orientation, so the result is Person A’s video with Person B’s face.
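The wiring of one shared encoder and two decoders can be sketched in plain Python. This is a toy under stated assumptions, not the code of any real deepfake tool: the “faces” are invented six-number vectors, the `RENDER_A` and `RENDER_B` matrices are made-up stand-ins for how each person’s face responds to two hidden expression factors, all layers are linear, and every function name is hypothetical. It shows only how training and swapping are connected; a toy this small will not produce a convincing swap, which in practice depends on deep convolutional networks and real images.

```python
import random

# Made-up "appearance" of each person: how 2 expression factors map to 6 pixels.
RENDER_A = [[1.0, 0.0], [0.5, -0.7], [-0.5, 0.6], [0.0, 1.0], [0.8, 0.2], [-0.3, 0.4]]
RENDER_B = [[0.9, 0.3], [-0.4, 0.8], [0.2, -0.6], [0.7, 0.1], [-0.5, 0.5], [0.3, 1.0]]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def make_faces(render, count, rng):
    """Toy faces for one person, each with a random expression."""
    return [matvec(render, [rng.uniform(-1, 1), rng.uniform(-1, 1)])
            for _ in range(count)]


def recon_loss(encoder, decoder, faces):
    """Mean squared error when faces are encoded, then decoded."""
    total = 0.0
    for face in faces:
        rebuilt = matvec(decoder, matvec(encoder, face))
        total += sum((y - x) ** 2 for y, x in zip(rebuilt, face))
    return total / len(faces)


def sgd_step(encoder, decoder, face, lr):
    """One reconstruction step; updates the encoder and one decoder in place."""
    latent = matvec(encoder, face)
    err = [y - x for y, x in zip(matvec(decoder, latent), face)]
    grad_latent = [sum(decoder[i][j] * err[i] for i in range(len(err)))
                   for j in range(len(latent))]
    for i, e in enumerate(err):
        for j, z in enumerate(latent):
            decoder[i][j] -= lr * e * z
    for j, g in enumerate(grad_latent):
        for k, x in enumerate(face):
            encoder[j][k] -= lr * g * x


def train_swap_model(faces_a, faces_b, n_latent=2, epochs=300, lr=0.02, seed=0):
    """Train one shared encoder plus a separate decoder per person."""
    rng = random.Random(seed)
    n_inputs = len(faces_a[0])
    encoder = rand_matrix(n_latent, n_inputs, rng)    # sees both people
    decoder_a = rand_matrix(n_inputs, n_latent, rng)  # only rebuilds Person A
    decoder_b = rand_matrix(n_inputs, n_latent, rng)  # only rebuilds Person B
    for _ in range(epochs):
        for face_a, face_b in zip(faces_a, faces_b):
            sgd_step(encoder, decoder_a, face_a, lr)
            sgd_step(encoder, decoder_b, face_b, lr)
    return encoder, decoder_a, decoder_b


def swap_face(encoder, other_decoder, face):
    """Encode one person's face, then decode it with the other person's decoder."""
    return matvec(other_decoder, matvec(encoder, face))


if __name__ == "__main__":
    rng = random.Random(42)
    faces_a = make_faces(RENDER_A, 40, rng)
    faces_b = make_faces(RENDER_B, 40, rng)
    encoder, decoder_a, decoder_b = train_swap_model(faces_a, faces_b)
    print("Person A reconstruction error:", recon_loss(encoder, decoder_a, faces_a))
    print("Person B reconstruction error:", recon_loss(encoder, decoder_b, faces_b))
    print("A's face through B's decoder:", swap_face(encoder, decoder_b, faces_a[0]))
```

The design choice worth noticing is that the encoder is updated by both people’s faces while each decoder is updated by only one. That asymmetry is what lets a latent face from one person be handed to the other person’s decoder at swap time.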
As long as you have two subjects similar enough and a computer with enough processing power, the rest just takes time. Faceswap – one of the more readily available deepfakes apps – can run on a Windows 10, Linux, or macOS computer and recommends a newer Nvidia GPU for processing. “Running this on your CPU means it can take weeks to train your model, compared to several hours on a GPU,” according to Faceswap’s documentation.