“What kind of actor do we need to book?” we asked Dmytro at Respeecher. “Could I do this?” Fran volunteered. “No, a Brit won’t do. The actor needs to have a similar accent and ideally would be male,” he told us. The actor didn’t need to be the same age or race and, crucially, was NOT to impersonate Nixon. That would be disastrous, as they wouldn’t be able to keep it up consistently for three days.
It was a painful week in the recording studio. Our actor, Lewis D. Wheeler, was a trooper. Phrase after phrase he repeated, trying to get the rhythm and cadence right. He never tried to put on a Nixon accent except on demand when we all needed some comic relief!
As well as the Nixon fragments, we recorded 20 full takes of the contingency speech in audio and video. Canny would use these to map the mouth movements, and Respeecher would use them as the performance for the new synthetic speech.
It took three weeks to build the audio model, mainly because all the source audio used to train it was recorded 50 years ago and was therefore far from ideal. At first the synthesized voice sounded somewhat muffled, so Respeecher experimented with various techniques, including transfer learning, and made some crucial improvements. In the end, we were impressed by how lifelike the synthesized speech sounded.
We got the visuals back from Omer at Canny, and put it all together—audio and video, frame by frame—to form our “complete deepfake”.