Digital audio is audio that has been converted into a digital representation. This is usually done by an analog-to-digital converter (ADC), which measures the amplitude of the incoming analog signal at discrete points in time. These discrete measurements are taken thousands of times a second as the ADC approximates the waveform of the analog signal.
When a signal is digitized, it is translated into finite values much like the finite pixels of a bitmapped image. In a bitmapped image, one often sees "jaggies" as a result. In digitized audio, these "jaggies" (quantization error) can be heard as high-frequency harmonics or overtones. To cut down on these unwanted harmonics, one can increase the bit depth of the sample.
Bit depth refers to the resolution with which the amplitude of an audio signal may be reproduced. Digitization maps the audio signal amplitude to a finite range of values. For example, 8-bit samples provide a range of -128 to 127, or 256 finite steps, while 16-bit samples provide a much higher resolution of 65,536 finite steps in the range from -32,768 to 32,767.
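The mapping from amplitude to a finite set of steps can be sketched in a few lines of Python. This is only an illustration, assuming the input amplitude is a float normalized to the range -1.0 to 1.0; the function names quantize and level_count are made up for this example and do not come from any particular audio library.

```python
def level_count(bits):
    """Number of distinct steps available at a given bit depth."""
    return 2 ** bits

def quantize(amplitude, bits):
    """Map a normalized amplitude (assumed to lie in -1.0..1.0) to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1    # 127 for 8-bit, 32767 for 16-bit
    min_code = -(2 ** (bits - 1))     # -128 for 8-bit, -32768 for 16-bit
    code = round(amplitude * max_code)
    # Clamp anything outside the assumed input range instead of wrapping around.
    return max(min_code, min(max_code, code))

print(level_count(8))       # 256
print(level_count(16))      # 65536
print(quantize(0.25, 8))    # 32
print(quantize(0.25, 16))   # 8192
```

The same amplitude lands on a much finer grid at 16 bits than at 8 bits, which is why the rounding error (and the noise it produces) shrinks as bit depth grows.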
Increasing the sample rate increases the precision with which a signal may be reproduced. The rate, measured in samples per second, should be at least twice the highest audio frequency being digitized (the Nyquist criterion) in order to accurately reproduce the sound. Otherwise, distortions such as loss of high frequencies and aliasing may occur.
Aliasing is a side effect of sampling that distorts the sound being digitized. Since the digitization process only samples at fixed intervals, changes that occur in the audio signal between those points are lost. This effect occurs when digitizing audio frequencies that are more than one half of the sampling frequency: such frequencies are not simply dropped, but are recorded as false tones at lower frequencies.
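To make the "one half of the sampling frequency" limit concrete, the following Python sketch computes where a pure tone ends up after sampling. It assumes an idealized sampler with no anti-aliasing filter in front of it; the function names are hypothetical and chosen only for this example.

```python
import math

def nyquist_limit(sample_rate_hz):
    """Highest frequency that can be sampled without aliasing."""
    return sample_rate_hz / 2

def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling (idealized, no input filter)."""
    nearest_multiple = round(signal_hz / sample_rate_hz) * sample_rate_hz
    return abs(signal_hz - nearest_multiple)

def sample_cosine(freq_hz, sample_rate_hz, count):
    """Return `count` samples of a unit-amplitude cosine at freq_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]

print(nyquist_limit(44100))            # 22050.0
print(alias_frequency(10000, 44100))   # 10000 (below the limit, unchanged)
print(alias_frequency(30000, 44100))   # 14100 (above the limit, folded down)
```

Sampling a 30,000 Hz cosine at 44,100 samples per second yields the same sample values as sampling a 14,100 Hz cosine, so once digitized the two tones cannot be told apart.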
Digitized audio can take anywhere from about 1 MB of memory per minute of sound at low settings (8-bit mono at 22 kHz) to roughly 10 MB per minute for 16-bit stereo at 44.1 kHz. Compression methods such as Adaptive Differential Pulse Code Modulation (ADPCM) can significantly reduce the space requirements by storing just the changes in amplitude from one sample point to another, which requires fewer bits per sample. However, this will usually decrease the sound quality significantly.
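The idea of storing only the change from one sample to the next can be sketched as follows. Note that this is plain delta encoding, not real ADPCM: it is lossless and only shows why the stored numbers become smaller. Actual ADPCM codecs additionally quantize each difference to a few bits using a step size that adapts to the signal, and that quantization is where the loss in quality comes from. The function names are invented for this illustration.

```python
def delta_encode(samples):
    """Replace each sample with its difference from the previous one (first is relative to 0)."""
    deltas = []
    previous = 0
    for value in samples:
        deltas.append(value - previous)
        previous = value
    return deltas

def delta_decode(deltas):
    """Rebuild the original samples by accumulating the differences."""
    samples = []
    current = 0
    for change in deltas:
        current += change
        samples.append(current)
    return samples

samples = [0, 100, 180, 230, 250, 240]
deltas = delta_encode(samples)
print(deltas)                 # [0, 100, 80, 50, 20, -10]
print(delta_decode(deltas))   # [0, 100, 180, 230, 250, 240]
```

For a smoothly varying waveform the differences span a smaller range than the samples themselves (here at most 100 versus 250), so they can be stored in fewer bits.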
One can often maintain a reasonable sound quality while decreasing bit depth or sample rate, and therefore sample size. In general, low-frequency sounds can be reproduced clearly with a sample rate of 22 kHz, but high-frequency sounds will often require a rate of 44 kHz. While one can usually use an 8-bit depth for low-quality playback, 16-bit resolution is usually preferred in order to avoid the audible noise and distortion caused by coarse quantization. Many high-end systems these days are beginning to support as much as 24-bit sampling in order to reproduce much finer gradations in the signal.
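The trade-off between quality and size is simple arithmetic, sketched below for uncompressed audio. The sketch assumes the bit depth is a whole number of bytes (8, 16, or 24) and uses the exact rates 22,050 and 44,100 samples per second that usually stand behind the nominal "22 kHz" and "44 kHz" figures; the function name pcm_bytes is hypothetical.

```python
def pcm_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size in bytes of uncompressed audio; assumes bit_depth is a multiple of 8."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds

print(pcm_bytes(22050, 8, 1, 60))    # 1323000  (8-bit mono, one minute)
print(pcm_bytes(44100, 16, 1, 60))   # 5292000  (16-bit mono, one minute)
print(pcm_bytes(44100, 16, 2, 60))   # 10584000 (16-bit stereo, one minute)
```

Halving the sample rate or the bit depth halves the storage, which is why both are the first things to reduce when file size is a concern.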
One problem with digitizing audio is that one can encounter a DC offset in the sample. This occurs when the signal being digitized has an extra DC component added to it, which pushes the whole signal up or down from the zero line. For the most part, this does not tend to affect the quality of the signal unless the DC component varies throughout the digitizing process. It can also cause problems if the offset is very large, because it then significantly decreases the available dynamic range on one side of the sample, and can throw the sample out of balance. This is a rare problem, and as a result there are few tools available that address it. The best solution is to adjust the signal source to cut out the excess DC component.
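When the source cannot be adjusted, a constant offset can also be removed from the digitized data afterwards. The sketch below is one simple way to do that, assuming the offset really is constant over the whole recording: it subtracts the average sample value. It will not help with an offset that drifts over time, and it cannot recover dynamic range already lost to clipping. The function name is made up for this example.

```python
def remove_dc_offset(samples):
    """Center a waveform on the zero line by subtracting its mean (assumes a constant offset)."""
    if not samples:
        return []
    offset = sum(samples) / len(samples)
    return [value - offset for value in samples]

shifted = [10, 12, 8, 10]           # a small wave riding 10 units above the zero line
print(remove_dc_offset(shifted))    # [0.0, 2.0, -2.0, 0.0]
```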
Another problem one encounters with digital audio is popping. This usually occurs when the end-points of a digitized waveform do not start or end on the zero line (as shown in the figure below). As a result, when the signal is converted back to analog by a digital-to-analog converter (DAC), a spike will appear in the analog signal as the DAC attempts to recreate the waveform represented by the data. (DACs always start from a zero point and then move to the first data point.) The resulting spike in the analog signal sounds like a pop, and can, if large enough, cause serious damage to one's speaker system.
Popping can also arise when two waveforms are overlapped where their end points do not match (as shown in the figure below). Again, the DAC attempts to approximate the very sudden change from one data point to the next, causing a sudden spike in the signal -- a spike that your speakers may not survive.
The key to avoiding popping is simply to ensure smooth transitions from waveform to waveform. In the case of the beginning or end of a single waveform, a simple fade-in or fade-out will suffice. (The fade can be carried out over just a fraction of a second, on the order of twenty to forty sample points, in order to maintain the illusion of an immediate termination of the sound, but without the associated pop.) Alternatively, one can zoom in on the waveform in an editor and choose start- and end-points that match the corresponding end- or start-point of the adjoining waveform. This is just another method for smoothing the transition from one waveform to the next.
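A short fade of this kind might look like the following Python sketch. It assumes a linear ramp and a default length of 32 sample points (inside the twenty-to-forty range mentioned above); the function name apply_fades and its fade_len parameter are hypothetical, and other ramp shapes would work equally well.

```python
def apply_fades(samples, fade_len=32):
    """Linearly fade the first and last fade_len samples so both ends sit on the zero line."""
    count = len(samples)
    fade_len = min(fade_len, count // 2)   # never let the two fades overlap
    faded = list(samples)
    for i in range(fade_len):
        gain = i / fade_len                # 0.0 at the very edge, rising toward 1.0
        faded[i] = samples[i] * gain                          # fade-in
        faded[count - 1 - i] = samples[count - 1 - i] * gain  # fade-out
    return faded

loud = [1000] * 100        # a waveform that starts and ends far from zero
smooth = apply_fades(loud)
print(smooth[0], smooth[16], smooth[50], smooth[-1])   # 0.0 500.0 1000 0.0
```

Because the first and last samples are scaled to zero, the DAC no longer has to jump from the zero line to a large value, and the middle of the waveform is left untouched.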
In conclusion, you should always try to keep your options open for as long as possible. This means sampling at high rates and bit depths (44.1 kHz, 16-bit), and only reducing the sample rate or bit depth after all processing is complete, if file size is a problem. It is also a good idea to do a bit of cleaning up of samples after digitizing them -- apply fades to the ends to smooth out pops, etc.
Last modified 4/26/97.