Synthetic Media For Good
By: Andrew Dickson
It may only be a few years old, but deepfake technology has a checkered history. Whether it’s news stories about deepfake pornography, or anxiety among US politicians that Russian intelligence could use deepfakes to sow disinformation, nerves jangle at the idea of using AI to manipulate pre-existing video or audio footage. Little wonder Facebook promised to ban deepfakes earlier this year, citing the fear that users might be misled.
Yet is the technology really as sinister as all that? Can anything good come out of it? Is there more to deepfakery than porn and Putin, in other words? Might it even hold the key to our creative future?
Omer Ben-Ami, co-founder of the Israeli technology firm Canny AI, believes the answer is yes on all counts. Collaborators on the In Event of Moon Disaster project, Ben-Ami and his colleagues were responsible for giving Richard Nixon a realistically moving face and mouth as he read out the alternative Apollo 11 speech.
Rather than talking about “deepfakes”, Ben-Ami tells me with a laugh, Canny AI prefer to use the phrase “Video Dialogue Replacement” (VDR) – which has the benefit of being both more precise and less, well, evil-sounding.
“A lot of people use ‘deepfake’ to talk about face-swapping, which has become really popular with apps,” he says. “VDR is more subtle, and it’s way harder to make convincing.”
One example is a video Canny AI made last year, which depicted world leaders including Trump, Obama and Xi Jinping lip-syncing to John Lennon’s “Imagine” (fans of irony might appreciate that Putin also made an appearance). The video was created using stock footage, which developers used as a “training set” to generate new mouth movements for each politician. A production studio then edited together these new-old clips to match the song. All told, it took just a few weeks.
An actor provides facial mapping data to make Russian President Vladimir Putin appear to be singing “Imagine” by John Lennon. (Credit: Canny AI)
Beyond one-off projects such as this, there are numerous practical and creative applications, each of which is likely to upend the way in which video is made and shaped behind the scenes. One is taking the grunt work out of dubbing video from one language into another. Instead of hiring voice actors to re-record and sync dialogue – too expensive and time-consuming for all but the biggest producers – VDR does it rapidly and cheaply, and more convincingly, too. A movie made in Mandarin can look and sound like it was filmed using a Hindi- or Spanish-speaking cast; a Harvard physics lecture in English can be translated for students in Dar es Salaam or Tokyo, and appear as authentic and engaging as the original.
“You can communicate natively in any language,” says Ben-Ami. “That’s hugely exciting, both commercially and creatively.”
There are journalistic and informational uses, too. For the recent HBO documentary Welcome to Chechnya, the visual effects whizz Ryan Laney employed face-swapping techniques to protect the identities of the persecuted gay and lesbian people the film-makers had interviewed, overlaying their real faces with synthetic features generated by machine learning.
A few months back, the BBC’s Blue Room lab collaborated with the London-based tech firm Synthesia to create a video weather bulletin in which a real but AI-enhanced “presenter” delivers a customized forecast (both weird and weirdly normal). More advanced techniques – such as the ability to type text into an app and have it produce realistic video of people saying those very words – are less than a year away, experts think.
Grant Reaber of the Ukrainian company Respeecher – who supplied Nixon’s “voice” for the In Event of Moon Disaster project using AI analysis of audio recordings – points to the film industry. Currently, sound directors are limited in how much they can clean up audio that’s been recorded on set, which often necessitates costly and complex post-production, or requires actors to traipse into a studio to redo dialogue. Synthetic sound could do away with all that, and even let directors tweak their stars’ accents or intonation. “We think of it as Photoshop for voice,” Reaber says.
There are also powerful real-world applications. Reaber is fascinated by how AI-edited audio could assist language learners by regularizing pronunciation and intonation, thus making recordings of foreign languages easier to digest. Another firm, the Massachusetts-based VocaliD, uses machine learning to create “custom voices” for medical purposes and education as well as entertainment and customer support. A particularly compelling use case is for people who have lost their voice owing to conditions such as neck cancer or Parkinson’s, and rely on a speech synthesizer to talk. Instead of having to use a generic robotic voice, they can (re)create a voice that is uniquely theirs.
“It’s hard to imagine what life would be like without your voice until you truly face it,” VocaliD’s founder, Rupal Patel, a former speech clinician, explains. “When you need to rely on an artificial voice, nothing compares to one that suits your personality and individual identity.”
Her company has even used the technology to help people remember loved ones who have since died – like looking at family photographs, but even more intimate. “Family members report using the voice as a way to cope with the pain and grief once the recipient is no longer present,” she says.
To be sure, there are ethical dilemmas and challenges in manipulating media at this level, suggests D. Fox Harrell, director of the MIT Center for Advanced Virtuality, which produced In Event of Moon Disaster. “A part of the issue is media literacy,” he says. “People haven’t yet had the opportunity to be critical consumers. What I’d like to see is more widespread critical consumption and production by broad and diverse sets of people.”
But then interesting and creative things have always happened in the space between truth and fiction. Think of Shakespeare’s history plays, which bend and twist the historical record to make satisfying dramatic arcs, or of Baroque trompe l’oeil paintings that fool our eyes and make us believe we’re seeing 3D objects instead of a flat canvas.
Even photography has always been a more duplicitous medium than we might think, Harrell points out: decades before Photoshop became commonplace, photographers were cropping, dodging and burning prints in the darkroom, using chemical processes to alter the “real” images they had captured in the camera. “Photography is not a direct conduit for some type of objective truth,” Harrell says.
Few of us would categorize this as fakery, exactly, at least these days – perhaps because we’re more interested in how well it’s done, and for what creative purposes, than in whether it’s objectively right or wrong.
Perhaps most important of all, even if a technology uses AI, it still relies on humans for the imaginative input, points out Ben-Ami: “You still need the human touch to fine-tune anything.”
And while we might roll our eyes at “virtual” Instagram influencers such as Lil Miquela (cooked up using algorithms) suddenly being able to speak, or the prospect of celebrities licensing their video avatars to make commercials while they laze on the beach, synthetic media provides far more exciting opportunities for those who have the imagination. Student film directors will soon be able to magic up live-action sequences that Hollywood VFX studios used to spend months and millions of dollars on. Visual artists can mash up pre-existing footage and dialogue in ways that make us think again about the documentary record, and sound artists will be able to experiment with creating new and diverse voices, perhaps in languages that don’t yet exist. After all, via MIT’s In Event of Moon Disaster project, deepfakery has already brought to life a real, decades-old presidential speech that was – thankfully – never delivered.
“Just like other media, computational media can be used for many purposes,” says Harrell. “It can be used to produce highly creative and socially beneficial works.”
The only certainty is that, as with all emerging technologies, the most compelling uses for synthetic media will be for things we haven’t dreamed up yet. “The possibilities are endless,” says Patel. “And some of these things are closer than you might think.”