by Steve Cunningham
Digital recording uses a stream of numbers to represent an audio signal’s frequency and amplitude. In analog recording, “frequency” is the time component of the sound, and “amplitude” is the level component. In digital recording, “sampling” is the time component and “quantization” is the level component.
SAMPLING RATES
The rate of sampling is the number of “snapshots” of an analog signal that are taken each second, with each snapshot representing the voltage level of the analog signal at that moment.
It’s analogous to what happens inside a movie camera — the camera takes 24 pictures every second, which is fast enough to let us perceive smooth motion. If the camera took 48 pictures every second, the motion on the film would appear smooth, but the camera would use twice as much film with no apparent increase in smoothness. But if the camera only took 2 pictures every second, then on film the motion would appear jerky. The film industry determined that 24 pictures per second is the right balance between smoothness and film consumption.
Decades ago a researcher at Bell Labs named Harry Nyquist discovered that if the highest frequency in an audio signal was to be digitally encoded successfully, it would have to be sampled at a rate at least twice that highest frequency. This minimum sampling rate is known as the Nyquist rate (the term “Nyquist frequency” refers to half of whatever sampling rate is in use). The audio industry settled on sampling rates of 44.1 and 48 kHz, since they’re over twice the highest audible frequency (about 20 kHz). 32 kHz sampling is deemed acceptable for broadcast, since the maximum bandwidth of most broadcast audio is 15 kHz.
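The twice-the-highest-frequency rule is easy to check with a few lines of Python. This is just a sketch of the arithmetic; the function names are made up for illustration.

```python
def min_sampling_rate_hz(highest_freq_hz):
    """Lowest sampling rate that satisfies the rule: twice the highest frequency."""
    return 2 * highest_freq_hz


def captures(sample_rate_hz, highest_freq_hz):
    """True if the sampling rate is high enough for the given bandwidth."""
    return sample_rate_hz >= min_sampling_rate_hz(highest_freq_hz)


# Compare the common rates against full-range audio (20 kHz)
# and broadcast audio (15 kHz).
for rate in (32_000, 44_100, 48_000):
    print(rate, "Hz:",
          "20 kHz audio OK" if captures(rate, 20_000) else "20 kHz audio too wide",
          "/",
          "15 kHz broadcast OK" if captures(rate, 15_000) else "15 kHz broadcast too wide")
```

Run it and 32 kHz passes for 15 kHz broadcast audio but falls short for the full 20 kHz range, while 44.1 and 48 kHz clear both.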
Making the sampling frequency at least double the highest frequency of audio also helps control “aliasing.” Aliasing occurs when the signal being sampled contains frequencies above half the sampling rate. The converter can’t represent them correctly, so they fold back down and show up as spurious tones within the audible spectrum. To prevent this, an anti-aliasing filter is applied to the analog signal ahead of the converter, removing those frequencies before they’re ever sampled.
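One common way to picture aliasing: a tone above half the sampling rate doesn't vanish when it's sampled, it reappears at a lower, folded-back frequency. Here is a rough sketch of that fold-back arithmetic for a single pure tone. The function name is hypothetical, and a real converter's filters would knock most of this down before it got recorded.

```python
def alias_frequency_hz(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling.

    The sampled tone is indistinguishable from one that sits the same
    distance from the nearest whole multiple of the sampling rate.
    """
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)


# A 10 kHz tone is below half of 44.1 kHz, so it comes through unchanged.
print(alias_frequency_hz(10_000, 44_100))

# A 30 kHz tone is above half of 44.1 kHz, so it folds back into the audible band.
print(alias_frequency_hz(30_000, 44_100))

# At 96 kHz sampling the same 30 kHz tone is captured as-is (and is inaudible anyway).
print(alias_frequency_hz(30_000, 96_000))
```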
With the introduction of DVDs, 96 kHz sampling is now a standard rate as well. 96 kHz sampling takes over twice as many snapshots as does 44.1 kHz, and further reduces aliasing by raising the fold-over point to 48 kHz, well beyond the range of hearing. However, 96 kHz sampling eats up over twice the disk space to store the sound as does 44.1 kHz sampling.
QUANTIZATION
To discuss quantization we’re going to have to get into a little math, so take a deep breath and hang in there.
As the snapshots of the audio are taken, the voltage levels of each are assigned binary values, in a process called quantization. The number of possible values available is a function of the size of the binary word used, or the number of bits available. For example, quantizing using 1-bit words gives us only two possible values to describe the incoming voltage level: 0 and 1. Using 2-bit words, there are only four possible binary values: 00, 01, 10, and 11 (that’s 0, 1, 2, and 3 in decimal). Using 4-bit words gives us only 16 possible values to describe volume levels between 0 dB and silence. So quantizing at 4 bits certainly won’t give us a very accurate representation.
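To see what "not very accurate" means in practice, here is a minimal sketch of a quantizer in Python. It assumes the incoming level has already been scaled to a fraction of full scale (0.0 is silence, 1.0 is maximum). That scaling, and the function names, are simplifications for illustration and not how any particular converter works.

```python
def quantize(level, bits):
    """Map a level between 0.0 and 1.0 to the nearest available n-bit code."""
    highest_code = 2 ** bits - 1           # e.g. 15 for 4-bit words
    level = min(max(level, 0.0), 1.0)      # clip anything outside full scale
    return round(level * highest_code)


def reconstruct(code, bits):
    """Turn an n-bit code back into a fraction of full scale."""
    return code / (2 ** bits - 1)


level = 0.37  # an arbitrary snapshot, 37% of full scale
for bits in (4, 16):
    code = quantize(level, bits)
    error = abs(reconstruct(code, bits) - level)
    print(f"{bits:>2}-bit code: {code:>5}  error: {error:.7f}")
```

The 4-bit version has to round 0.37 to the nearest of its 16 steps, so the played-back level is noticeably off. The 16-bit version lands much closer.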
Quantizing at 16 bits starts to sound good, because now we have 65,536 different values to describe the instantaneous level of the sound. This is the resolution of a CD, and sounds fine to most of us. But if we use 20-bit words, we have 1,048,576 values to assign to the range from 0 dB to silence. And using 24 bits we have a whopping 16,777,216 values to work with. It’s far more accurate.
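Those big numbers are nothing more than 2 raised to the number of bits. A quick Python sketch (the helper name is made up) reproduces them:

```python
def available_values(bits):
    """Number of distinct values an n-bit word can hold."""
    return 2 ** bits


for bits in (1, 2, 4, 16, 20, 24):
    print(f"{bits:>2}-bit words: {available_values(bits):>10,} values")
```

Each added bit doubles the count, which is why four extra bits take us from about one million values to nearly 17 million.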
But isn’t that overkill? Using the previous example, aren’t we consuming too much film? Not necessarily, at least in theory.
The quantization process itself generates a relatively constant amount of “quantization noise.” The theoretical signal-to-noise ratio of an analog-to-digital converter is roughly 6 dB for each bit used. So in a 4-bit converter the quantization noise floor will be at -24 dB (6 dB x 4 bits), and that’s pretty noisy. 16-bit converters are adequate to deal with quantization noise, since the noise floor should be around -96 dB (6 dB x 16 bits). With 20-bit words, the theoretical noise floor drops to -120 dB, and with 24-bit words the noise is down to -144 dB. We can say that the greater the digital word size, the lower the noise floor and the greater the available dynamic range.
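The 6-dB-per-bit rule of thumb can be tabulated in a few lines of Python. The sketch below also prints the figure you get from the slightly more exact 20 x log10(2), or about 6.02 dB per bit, which is where the round number comes from. The function name is hypothetical, and these are theoretical floors, not measurements of any real converter.

```python
import math


def noise_floor_db(bits, db_per_bit=6.0):
    """Theoretical quantization noise floor, relative to full scale."""
    return -db_per_bit * bits


EXACT_DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB

for bits in (4, 16, 20, 24):
    rule_of_thumb = noise_floor_db(bits)
    closer = noise_floor_db(bits, EXACT_DB_PER_BIT)
    print(f"{bits:>2} bits: {rule_of_thumb:7.1f} dB (rule of thumb), {closer:7.1f} dB (6.02 dB per bit)")
```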
However, all this is theoretical. In the Real World, we’re lucky if our best 18-bit oversampling converters give us 14 or 15 bits of usable dynamic range (85 to 90 dB). And few of us can actually hear noise that’s at -96 dB anyway, much less at -144 dB.
The Alesis ADAT XT20 and LX20 were the first widely used recorders with 20-bit quantization, and many people agree that there’s a noticeable difference between 16-bit and 20-bit digital audio. Reverb tails are smoother, and the high end is less gritty and more open. But the difference between 20 bits and 24 bits is much more subtle. Many (including yours truly) can’t hear much if any difference at all.
MARKETING BITS
Perhaps that’s why ever since the first 24-bit products were introduced, cynics have referred to those last four bits of resolution as “the marketing bits,” designed solely to get you to buy your digital gear all over again. Combine that with the fact that it takes 50% more disk space to record 24-bit words than 16-bit. Now add a 96 kHz sampling rate, and suddenly that 10 GB drive that held over 30 mono track hours of recording at 16/44.1 now stores around 10 mono track hours at 24/96. Yup, gotta buy more hard disks. What’s worse is that you can’t burn an audio CD at 24 bits, nor at 96 kHz. Is this whole 24/96 thing just a lot of marketing hoo-hah?
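If you'd like to check the disk-space math, here is a back-of-the-envelope sketch in Python. It assumes uncompressed audio with no file overhead, and that "10 GB" means 10 billion bytes. Both are my simplifications, and the function name is invented for the example.

```python
def mono_track_hours(drive_bytes, bits, sample_rate_hz):
    """Hours of one uncompressed mono track that fit on a drive."""
    bytes_per_second = sample_rate_hz * bits / 8
    return drive_bytes / bytes_per_second / 3600


DRIVE_BYTES = 10 * 10**9  # assumption: 10 GB = 10 billion bytes

hours_16_44 = mono_track_hours(DRIVE_BYTES, 16, 44_100)
hours_24_96 = mono_track_hours(DRIVE_BYTES, 24, 96_000)

print(f"16-bit / 44.1 kHz: {hours_16_44:.1f} mono track hours")
print(f"24-bit / 96 kHz:   {hours_24_96:.1f} mono track hours")
print(f"24-bit words take {24 / 16:.1f}x the space of 16-bit words")
print(f"96 kHz takes {96_000 / 44_100:.2f}x the space of 44.1 kHz")
```

The two penalties multiply: 1.5 times for the longer words and a bit over 2 times for the faster sampling, which is how a drive goes from over 30 hours to under 10.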
After some consideration and a lot of listening, I’ve decided that the answer is no. The fact is that the new 24-bit converters that I’ve heard are astonishingly better-sounding than most of the old 18- or 20-bit converters, even when using them at 16/44.1. I’m not certain why — perhaps the surrounding analog circuitry is better, perhaps the performance of the digital filtering in the converters is better, perhaps there’s less quantizing noise. I’m not sure. But I do know that the new 24/96 converters sound more open, cleaner, and more accurate than do most of the old converters.
This may not be what the manufacturers intended when they designed the new 24/96 chips. But I don’t care. They sound better, and even these old, abused ears can hear the difference.
♦