- Published Sep 30, 2013 in Gear Garage
- Read time: about 6 minutes
James McCanna takes a look at audio bit resolution and demystifies dithering.
These days, many soundcards can record at up to 24-bit resolution and a 96kHz sample rate. Many multitrack and editing software programs employ internal processing at 32 bits and more. However, the CD standard remains 16-bit/44.1kHz, so somehow the DAW user has to get that 24-bit file down to 16 bits. What are the options?
Bit resolution refers to the number of bits a soundcard can use to express the amplitude of an audio sample. Each bit resolves roughly 6dB of amplitude information, so each additional bit adds about 6dB of amplitude range. The total number of bits available is referred to as bit depth, and the total span of amplitude the system can represent is known as dynamic range.
Dynamic range is the difference between the quietest and the loudest amplitude a soundcard can record. It is determined by the number of bits the soundcard can use to resolve the amplitude of the signal. As bit depth increases, so does the dynamic range: the threshold for the quietest signal that can be recorded goes down and the threshold for the loudest goes up. A 16-bit signal has a 96dB dynamic range, a 20-bit signal 120dB, and a 24-bit signal 144dB. What does this have to do with noise? All analog systems have inherent system noise. Digital systems have no system noise of their own, but they do introduce quantization errors, which sound like noise. So, in terms of digital noise, each additional bit lowers the audible level of quantization error by about 6dB.
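The 6dB-per-bit rule of thumb is easy to check with a short calculation. A quick sketch (the exact figure is 20·log10(2), about 6.02dB per bit, which is why the computed values land just above the rounded numbers quoted above; the helper name is illustrative):

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of an n-bit system: each bit doubles
    the number of amplitude levels, adding 20*log10(2) ~= 6.02 dB."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 20, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```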
Each bit represents a quantization interval, with a discrete threshold for its amplitude range. In the analog waveform, an equivalent range of amplitudes exists between each pair of digital levels. When the analog amplitude being sampled falls between two quantization levels, the system cannot resolve it exactly and simply truncates it. The result is an abrupt, square-edged step wherever the digital device cannot reconcile the difference. These square edges leave digital artifacts that do not represent any frequency in the analog waveform. This is known as quantization error.
The amplitude of an analog waveform can be graphically represented by a continuous, smooth curve. As one moves across the waveform over time, the amplitude changes seamlessly. In digital recording, however, the soundcard can only record discrete "levels" of amplitude, and there is a distinct plateau at each level. This is because the soundcard can only resolve each bit's amplitude at a discrete point - when the signal reaches that threshold, it "toggles" the bit from off to on. The result is a stairstep representation of what was previously smooth; in between each step the amplitude is flat. If you were to overlay the digital representation on the analog shape, you would see that the digital shape only roughly approximates the analog one. Each step edge lies very near the analog waveform line, but on closer inspection you can see jagged, cut-out portions of the waveform that look like stairsteps. Any detail whose amplitude fell between the steps is either gone or reduced in level. This introduces harmonic artifacts and errors into the audio signal - digital errors that sound like noise. The resulting sound quality is often described as granular and artificial. So what is the solution?
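The stairstep effect is easy to see numerically. The sketch below samples a sine wave and snaps each sample to a deliberately coarse 4-bit grid so the plateaus stand out in the printed values (it rounds to the nearest level, a common simplification of the truncation described above; the function name is illustrative, not from the article):

```python
import math

def quantize(x, bits):
    """Snap a sample in [-1.0, 1.0] to the nearest of 2**bits levels."""
    levels = 2 ** (bits - 1)          # quantization levels per polarity
    return round(x * levels) / levels

# Sample a 1 kHz sine at 44.1 kHz; at 4 bits, successive samples often
# land on the same plateau - the "flat step" the text describes.
fs, f = 44100, 1000.0
for n in range(8):
    x = math.sin(2 * math.pi * f * n / fs)
    print(f"analog {x:+.5f} -> 4-bit {quantize(x, 4):+.5f}")
```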
One remedy is to increase the number of bits used to resolve the analog signal. The difference between steps gets correspondingly smaller as the bit depth increases: each digital amplitude threshold sits closer to the next and more accurately follows the analog waveform. Though digital noise still resides in the flat parts of each step, its relative distortion effect on the overall sound quality is significantly reduced. That is why a 24-bit file sounds better than a 16-bit file. However, our problem is improving the sound of a file reduced from 24-bit to 16-bit resolution. When we reduce a wave file from 24-bit to 16-bit, we reduce the number of discrete amplitude levels available to represent it, which results in more audible quantization error, as described above. This is where dithering comes into play.
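The per-bit improvement can be measured rather than taken on faith: quantize the same full-scale sine at different bit depths and compare the RMS error. A rough sketch under stated assumptions (a rounding quantizer, a 997 Hz test tone chosen so sample values don't repeat; all names here are illustrative):

```python
import math

def quantize(x, bits):
    """Snap a sample in [-1.0, 1.0] to the nearest of 2**bits levels."""
    levels = 2 ** (bits - 1)
    return round(x * levels) / levels

def error_rms_db(bits, n_samples=10000):
    """RMS quantization error of a full-scale sine, in dB re full scale."""
    err = 0.0
    for n in range(n_samples):
        x = math.sin(2 * math.pi * 997 * n / 44100)
        err += (quantize(x, bits) - x) ** 2
    return 20 * math.log10(math.sqrt(err / n_samples))

# Each extra bit pushes the error floor down by roughly 6 dB.
for bits in (8, 12, 16):
    print(f"{bits}-bit error floor: {error_rms_db(bits):.1f} dB")
```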
Dithering is a process that adds broadband noise to a digital signal. You may wonder why adding noise would make a signal sound better. It is really a trade-off: the introduction of noise lessens the audibility of the digital distortion that comes from the quantization errors discussed above. In essence, low-level hiss-like noise is traded for a reduction in digital distortion.
Dithering adds amplitude to all the signals in a digital sample. It forces the lower-level amplitude values up to the next threshold level, so these new, higher amplitudes represent the sum of the dither noise and the previously existing signal. The lowest bits are filled in with the dither noise and become the least significant bits (in terms of amplitude) in the 24-bit signal. Then, as the file is cut from 24-bit to 16-bit, only the lower 8 bits are truncated, leaving behind the previous signal plus some noise.
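In code, this dither-then-truncate step might look like the sketch below. The article doesn't name a particular dither shape, so this assumes TPDF (triangular) dither, a widespread choice; the function name and the quiet test signal are illustrative:

```python
import random

def reduce_24_to_16(sample24, dither=True):
    """Cut a signed 24-bit integer sample to 16 bits, optionally dithered."""
    if dither:
        # TPDF dither: the sum of two uniform randoms, each spanning half
        # a 16-bit LSB (one 16-bit LSB equals 256 in 24-bit units).
        sample24 += random.randint(-128, 127) + random.randint(-128, 127)
    # Round to the nearest 16-bit level, then drop the lower 8 bits.
    sample16 = (sample24 + 128) >> 8
    return max(-32768, min(32767, sample16))   # clamp to the 16-bit range

# A quiet 24-bit signal below one 16-bit LSB: plain truncation erases it,
# while dither preserves it as low-level toggling between adjacent values.
quiet = [100] * 10
print([reduce_24_to_16(s, dither=False) for s in quiet])  # all zeros
print([reduce_24_to_16(s) for s in quiet])                # mostly 0s and 1s
```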
Perhaps an analogy is in order - two come to mind, actually. First, imagine you are working on your bicycle wheel and have it up on a stand. When the wheel is stationary, the spokes block the view of the wall behind it. Now, imagine the wall is your audio signal. Where the spokes block the view of the wall is analogous to the truncation errors described above. However, if you spin the wheel, the spokes become blurred. They are still blocking the view of the wall but have become less perceptible, and as a result you are able to perceive all of the wall more clearly. This is what dither noise does to the audibility of your audio signal. The brain is able to separate the signal from the noise and hears only the signal. The perception is more pleasing to the ear.
The second analogy is a little cleaner (pun intended). Imagine that you have a bathtub full of bubble bath. The bubbles represent the upper 16 bits of your 24-bit audio information and tower above the underlying water, which represents the lower 8 bits of your signal. However, some audio information resides in that lower, water portion. You agitate the water, which causes that audio information to form into bubbles that rise up to the same level as the upper 16 bits. The agitation is analogous to adding dither noise: the new bubbles bring with them some of the water that was in the lower portion - some of the noise caused by the agitation - but contain primarily audio information. When you convert to 16-bit, you simply drain out the lower 8 bits (the water), and the remaining 16 bits represent your final signal. This analogy works well because as you cut out the lower 8 bits, the overall level of the bubbles drops significantly, which is analogous to the change in dynamic range between 24-bit and 16-bit. However, all the audio information is retained within the new dynamic range.
Glad to have cleaned up that controversy!
So the question may remain: why does adding noise sound better? In essence, when we add broadband dither noise, it fills in the space between the interval steps mentioned above, spreading the many quantization errors across the audio spectrum. The result is a smoother sound - the errors remain but are obscured by the dither noise. The ear can separate out the noise and hear only the audio signal.