Most of what our job consists of these days is commercials, promos and station imaging. Very infrequently are we called upon to save the day in other regards… perhaps de-noising an audio file or cutting a special remix of a popular song for air. But nothing requiring the equivalent of micro-surgery on a bit of sound. However, when the moment to pull someone's butt out of the fire comes, the skills had better be there.
Such a moment happened in early October at my day job at the Radio America Network in Arlington, VA/Washington, DC. Down the hall, the video production arm of Radio America is hopping this time of year, producing several short presentations honoring American veterans and their service and sacrifice. These productions are frequently narrated by celebrities and famous actors, and it is not uncommon for us to receive files from Morgan Freeman, Gary Sinise, Tom Hanks and a roster of others during this season.
One such file arrived honoring Chesley B. “Sully” Sullenberger – the pilot who landed his crippled plane in the Hudson River in 2009 and a real-life Air Force veteran. It was voiced by a celebrity whose name I can’t reveal, but it rhymes with “Clorge Goony”. In all respects his read was impeccable, except that rather than “Chesley,” the narration distinctly tagged him as Chelsea B. Sullenberger!
No one in the recording studio caught the error, and so it was sent off to us to lock it to picture. Our senior video editor and WW2 historian Megan Maggi caught it, but by then it was impossible to get our celeb back in the studio for a recut. And since Sully himself was to be present at the conference featuring the video, we couldn't look like dolts and actually call him Chelsea in the room.
“What's the big deal? Just flip the L and the S and you're all done,” we were told. Maybe in print, but not in sound. No, this baby required some serious laparoscopic surgery. And so, it got dropped in my lap. No pressure.
What makes repairs like this feasible is a working knowledge of phonemes: little slices of speech that, when carefully grafted together, make words. It’s how concatenative speech synthesis works, providing the near-realistic sounds of today's “robo-call” voices and NOAA weather radio broadcasts.
In this instance, nearly nothing from the original “Chelsea” soundbite was usable – the first syllable is pronounced almost as “chull” and not “chell” (try it). However, the hard “CH” affricate at the front of the word was a possibility, so it got saved.
Elsewhere in the recording, I found the name of Sully's co-pilot: Jeff Skiles. The “e” sound in “Jeff” was just what was needed: crossfading the hard CH into the soft J of “Jeff” got me started.
A random hard S sound was located elsewhere in the file, and a very fast crossfade was dropped in between a shortened F in “Jeff” to the hard S sound. So CH+Jeff+S gave me my first syllable.
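For readers who like to see the mechanics, here is a minimal sketch of what a crossfade between two clips does at the sample level. It is not the editor I used, just an illustration: the function name `crossfade`, the equal-power fade curve, and the idea of holding each clip as a plain list of samples are all assumptions made for the example.

```python
import math

def crossfade(a, b, overlap):
    """Join sample lists a and b, blending the last `overlap` samples of a
    with the first `overlap` samples of b on an equal-power curve."""
    if overlap <= 0:
        return list(a) + list(b)          # no overlap: plain butt joint
    if overlap > min(len(a), len(b)):
        raise ValueError("overlap is longer than one of the clips")
    head = list(a[:len(a) - overlap])     # untouched start of clip a
    tail = list(b[overlap:])              # untouched end of clip b
    blend = []
    for i in range(overlap):
        t = (i + 0.5) / overlap           # position within the fade, 0..1
        gain_out = math.cos(t * math.pi / 2)   # clip a fades down
        gain_in = math.sin(t * math.pi / 2)    # clip b fades up
        blend.append(a[len(a) - overlap + i] * gain_out + b[i] * gain_in)
    return head + blend + tail
```

With hypothetical clips named `ch`, `jeff` and `s`, the first syllable would be something like `crossfade(crossfade(ch, jeff, 200), s, 40)`: a longer blend into the vowel, then a very short one into the S. The overlap lengths here are invented; in practice you set them by ear.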
As the script described the evacuation of the plane, “quickly” and “calmly” came up in the recording, both ending in a nice clean “-ly”. I was hoping I had my second syllable, but both were entirely too short, and in practice, both had an unnatural stop prior to the middle initial “B”.
If you say “B” to yourself, you'll notice it doesn't always start cold with the plosive lip pop, but with a short, trapped voiced pitch prior to release – almost like mm-BEE. So what was needed was a “-ly-mm-BEE” sort of sound. None existed in the recording, other than in the original “Chelsea B” bite. So that meant grafting a tiny portion of the defective syllable “-sea-B” onto the end of “-ly”. Rather than a crossfade though, I just used a butt splice at a zero crossing to avoid a click in the edit.
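Why a zero crossing? A butt splice simply ends one clip and starts the next, so if the two waveforms sit at different levels at the joint, the jump is heard as a click. The sketch below shows the idea under a few assumptions of my own: the helper names are hypothetical, the clips are plain lists of samples, two synthetic tones stand in for real speech, and both cuts are snapped to a rising zero crossing so the waveform keeps heading the same way across the joint.

```python
import math

def tone(freq, count, rate=8000):
    """Synthetic sine tone standing in for a real clip (demo only)."""
    return [math.sin(2 * math.pi * freq * k / rate) for k in range(count)]

def nearest_rising_crossing(samples, target):
    """Index nearest `target` where the signal goes from <= 0 to > 0."""
    crossings = [i for i in range(1, len(samples))
                 if samples[i - 1] <= 0 < samples[i]]
    if not crossings:
        raise ValueError("no rising zero crossing found")
    return min(crossings, key=lambda i: abs(i - target))

def butt_splice(a, cut_a, b, cut_b):
    """Keep a up to a zero crossing near cut_a, then b from one near cut_b."""
    i = nearest_rising_crossing(a, cut_a)
    j = nearest_rising_crossing(b, cut_b)
    return list(a[:i]) + list(b[j:])      # hard joint, no fade

clip_a = tone(100.0, 800)                 # pretend this is "-ly"
clip_b = tone(150.0, 800)                 # pretend this is "-mm-BEE"
joined = butt_splice(clip_a, 400, clip_b, 200)
```

Real speech is far messier than a sine wave, so the crossing nearest your intended cut may not be the best-sounding one; treat the snapped position as a starting point and audition it.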
Was I done? Nope. Edits like these invariably don't work at first, because the pitch of the voice and the pacing in each phoneme are inconsistent with the whole. The voice bounced up and down so much inside the word “Chesley” that it sounded more like a Max Headroom revival than a valid edit. That, and the individual syllables rushed and dragged in all the wrong ways.
Here is where pitch and time stretching come to the rescue. We are so used to compressing entire spots to fit a 60-second allocation that it’s easy to forget we can apply both processes to single syllables. Even portions of syllables. So the CH+Jeff fragment got squished a bit to speed it up, and the two “E” sounds were matched in pitch, then likewise matched to the pitch of the “B” that immediately followed. With as many as five stacked and crossfaded layers contributing to only two syllables, this was as close as we were going to get.
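I matched everything by ear, but the arithmetic behind those two knobs is simple enough to sketch. The example below is an illustration only, with assumptions throughout: a crude autocorrelation estimate of the fundamental, synthetic tones standing in for the two vowels, an 8 kHz sample rate, and hypothetical function names. It tells you how many semitones to shift and what ratio to stretch by; the actual shifting and stretching are left to your editor's own tools.

```python
import math

RATE = 8000  # assumed sample rate for this toy example

def tone(freq, count, rate=RATE):
    """Synthetic sine tone standing in for a sustained vowel (demo only)."""
    return [math.sin(2 * math.pi * freq * k / rate) for k in range(count)]

def estimate_pitch(samples, rate=RATE, fmin=60.0, fmax=400.0):
    """Rough fundamental in Hz: the lag with the strongest autocorrelation
    between the periods of fmax and fmin."""
    lo = int(rate / fmax)
    hi = min(int(rate / fmin), len(samples) - 1)
    best_lag, best_score = lo, float("-inf")
    for lag in range(lo, hi + 1):
        score = sum(samples[k] * samples[k + lag]
                    for k in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return rate / best_lag

def semitones_between(f_from, f_to):
    """Pitch shift, in semitones, needed to move f_from onto f_to."""
    return 12 * math.log2(f_to / f_from)

def stretch_ratio(current_seconds, target_seconds):
    """Time-stretch factor; values below 1.0 speed the clip up."""
    return target_seconds / current_seconds

vowel_a = tone(100.0, 800)   # pretend: the "e" borrowed from "Jeff"
vowel_b = tone(125.0, 800)   # pretend: the "ee" at the end of "-ly"
shift = semitones_between(estimate_pitch(vowel_a), estimate_pitch(vowel_b))
squeeze = stretch_ratio(0.30, 0.24)   # invented durations, in seconds
```

On a real voice the estimate will wander, since speech pitch glides within a vowel and autocorrelation can lock onto the wrong octave. The numbers get you in the neighborhood; your ears make the final call.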
Megan dropped the edited clip into the video project and added some swooping patriotic music and low-level air traffic control radio chatter – not only for the drama of the production, but to mask any inconsistencies that we couldn't fully reconcile. The room heard nothing but “Chesley B. Sullenberger”.
It is said that the best edits are the ones no one notices. We nailed it.
Alan Peterson is national production director for the Radio America Network and a professor of audio technology at Montgomery College in Maryland. He can be reached at