Data Compression For Film: Digital Audio Basics, The Rationale For Data Compression In Film, And DTS® - Part 1
By Perry Sun
If you know a thing or two about movie sound, then surely you're well aware of the three formats for digital film sound currently in use: Dolby® Digital, Digital Theater Systems® (DTS®), and Sony Dynamic Digital Sound® (SDDS®). Since the early 1990s, you've come to realize the benefits this revolutionary technology has brought in terms of improved sound quality. And since the mid-90s, you've been able to appreciate this technology in the home with Dolby Digital, DTS Digital Surround, and MPEG audio.
Have you ever considered what makes one digital sound format distinct from another? For film, you might point out the fact that each format stores its audio data differently, whether on the film print or on a CD-ROM. Others will claim that one format is more robust than another, with better durability and resistance to wear and tear. Yet others claim that a format is superior to the others, because it "sounds better."
But what do they attribute this better sounding quality to? Usually, it's the particular digital sound system's technology behind transforming the original six (or eight) channels of sound into a format that can be conveniently stored and subsequently retrieved for playback. Because of physical space limitations, digital audio can't be stored for film presentation with conventional methods that are used for audio CD. Therefore, a means to somehow "pack" digital audio data into a limited space is needed, while at the same time preserving the fidelity of the sound.
All of the digital sound systems employ some form of data compression, or data reduction, in which specific algorithms are applied such that less data and therefore less storage space is required to convey the sound. The approach and degree to which data compression is applied varies for each format.
Many who are in the business of film sound, or who are otherwise avid enthusiasts, tend to favor one format over another because of the degree of data compression used, and claim to be able to audibly differentiate between formats on this basis. However, concluding that Dolby Digital, for example, is inferior to DTS and SDDS because it uses only 5-8 percent of the original data is premature, and should not be a criterion in selecting the theatre at which to see a movie.
Professional audio experts tend to agree that sonic differences between these formats are very small, to the point of being virtually imperceptible to the ears of the typical moviegoer. They also are likely to agree that the sound quality, though carried in a data stream with much less bandwidth than uncompressed digital audio, is excellent, with exemplary fidelity and dynamic range. If this is so, then one may surmise that a system that requires the least data to produce quality audio is the one that is the most sophisticated.
But before any assessments about a digital sound format's performance ability can be made, an understanding of how it works is essential. This and the subsequent article will give you an introduction to the methods for data compression that are used by each digital film sound format. Part 1 will be devoted to a review of basic digital audio concepts, an explanation of why data compression is necessary for film applications, and the principles behind the data reduction algorithm used for DTS. Part 2 will cover the techniques used for SDDS and Dolby Digital. From these articles, you will discover that the rates of data transfer, and therefore the degree of compression, vary substantially between them. At the same time, you will realize that measures to preserve audio quality at reduced data rates vary in complexity. Perhaps most importantly, these articles aim to dispel the widespread belief that sonic quality is simply commensurate with the amount of digital audio data used by a particular format, and to discourage enthusiasts from judging between the performance of digital sound formats solely on this basis.
Figure 1 Audio signal in the (a) analog and (b) digital domains
Digital Audio Basics
To have some appreciation of the processes in data compression, an understanding of digital audio concepts is essential. An in-depth tutorial is obviously beyond the scope of this publication, but for the purposes of this article a cursory knowledge of the basics is really all you'll need. The following is a brief review that should help bring everyone up to speed.
Sound can be recorded (and played back) by two means. A microphone detects fluctuations in air pressure that make up sound waves, and transduces them into a time-varying voltage waveform (see Figure 1A). The amplitude of this waveform changes in direct analogy to the variations in air pressure sensed by the microphone. This is known as the analog process of recording audio; accordingly, the voltage-time signals are said to be in the analog domain.
Sound can also be recorded in the digital domain, in which audio is conveyed as a stream of numbers (Figure 1B). At specific intervals of time, the analog voltage waveform is sampled; that is, the value of the voltage is "captured." This is then mapped to one of a discrete series of voltages, where each is expressed as a binary number. A binary number comprises a series of digits, each with a value of 0 or 1. (Contrast this with the decimal system we use, in which each digit has a value between 0 and 9. Therefore, the numbers 1, 3, and 5 are 1, 11, and 101, respectively, in binary representation.) The process of converting a sampled voltage of an analog waveform to a binary code is known as quantization.
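The sampling-and-quantization step can be illustrated with a short Python sketch. The function names and the -4V to +4V range are assumptions chosen for illustration; they are not values taken from the figures.

```python
import math

def quantize(voltage, v_min, v_max, bits):
    """Return the integer code for a sampled voltage.

    The range v_min..v_max is divided into 2**bits equal intervals; any
    voltage falling between two adjacent thresholds gets the same code.
    """
    levels = 2 ** bits
    step = (v_max - v_min) / levels
    index = math.floor((voltage - v_min) / step)
    # Clamp out-of-range voltages to the lowest or highest code.
    return max(0, min(levels - 1, index))

def level_voltage(code, v_min, v_max, bits):
    """Discrete voltage (centre of the interval) that a code stands for."""
    step = (v_max - v_min) / (2 ** bits)
    return v_min + (code + 0.5) * step

def to_binary(code, bits):
    """Write a code as a fixed-length binary word, e.g. 5 -> '101'."""
    return format(code, "0{}b".format(bits))

# A 3-bit quantizer spanning an assumed -4V to +4V range.
for sample in (-3.2, 0.0, 2.7):
    code = quantize(sample, -4.0, 4.0, 3)
    print(sample, to_binary(code, 3), level_voltage(code, -4.0, 4.0, 3))
```

Each printed line shows a sampled voltage, the binary word assigned to it, and the discrete voltage that word represents on playback.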
Figure 2 Analog audio signal (curve), and the sampled voltages converted to digital (3 bit resolution; bars)
Figure 2 shows an analog signal (curve), and the quantized voltages (bars). The horizontal spaces on the grid denote sampling intervals, and the vertical spacing the resolution of the discrete voltages. The vertical intervals are also known as the quantizing voltage step-size. The dotted lines are the thresholds for quantizing sampled voltages, so that any voltage sampled within two adjacent thresholds is quantized to the discrete voltage level in between them.
Each digit in a binary number is known as a bit, the elementary unit of storage and transmission of digital data. To adequately convey an audio signal without perceived loss of fidelity, the size of the binary number needs to be at least 16 bits long, or equivalently, have a binary word length of at least 16 bits. This means that 2^16, or 65,536, words (and therefore discrete voltages) are possible. (Professional digital audio recording also dictates the use of 18-bit to 24-bit word lengths.) The word length of digital audio samples determines the dynamic range: for 16 bits it is about 96dB, whereas 20-bit digital audio yields a dynamic range of approximately 120dB. In Figure 2, 3-bit word lengths are used for quantization, resulting in 8 possible discrete voltages. This method of converting analog audio samples to digital words is known as linear pulse code modulation (PCM).
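The link between word length, number of discrete voltages, and dynamic range can be checked with a few lines of Python. This is a sketch that assumes the usual idealization of dynamic range as the ratio between the largest and smallest representable levels, 20·log10(2^bits), or about 6dB per bit; the function names are illustrative.

```python
import math

def discrete_levels(bits):
    """Number of distinct binary words (and voltages) for a word length."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Idealized dynamic range of linear PCM, roughly 6.02 dB per bit."""
    return 20.0 * math.log10(discrete_levels(bits))

for bits in (3, 16, 20):
    print(bits, discrete_levels(bits), round(dynamic_range_db(bits), 1))
```

The 16-bit and 20-bit cases come out at about 96dB and 120dB, matching the figures quoted above.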
The frequency at which the analog signal is sampled is known as the sampling rate. In Figure 2, the sampling rate is 400Hz, or 400 samples per second (equal to the inverse of the sampling time interval, 2.5 milliseconds or 0.0025 seconds). According to the Nyquist sampling theorem, to fully resolve the spectral characteristics of the analog waveform, the sampling rate must be at least twice the highest frequency component present in the signal. In the professional audio industry, the sampling rate is 48kHz (48,000Hz), more than adequate to account for the upper limit of human hearing sensitivity (about 22kHz). (The professional audio industry is now moving towards sampling rate standards of 96kHz and even 192kHz.)
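The sampling-rate relationships above can be sketched as follows. The helper names are assumptions for illustration. The last function shows what happens when the Nyquist condition is violated: a tone above half the sampling rate "folds back" and appears at a lower frequency, an effect known as aliasing.

```python
def sampling_rate_hz(interval_s):
    """Sampling rate is the inverse of the sampling time interval."""
    return 1.0 / interval_s

def nyquist_limit_hz(rate_hz):
    """Highest frequency that a given sampling rate can fully resolve."""
    return rate_hz / 2.0

def apparent_frequency_hz(tone_hz, rate_hz):
    """Frequency at which a sampled tone shows up after sampling."""
    nearest_multiple = round(tone_hz / rate_hz) * rate_hz
    return abs(tone_hz - nearest_multiple)

print(sampling_rate_hz(0.0025))              # the 2.5 ms interval of Figure 2
print(nyquist_limit_hz(48000))               # limit for 48kHz sampling
print(apparent_frequency_hz(30000, 48000))   # a 30kHz tone aliases downward
```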
Figure 2 shows that the sampled voltages of the analog waveform deviate from the discrete binary voltage levels assigned to them. This is known as quantization error, an artifact that can be audible if substantial. To minimize quantization error, the binary codes for the sampled voltages should have the highest bit resolution possible, i.e., have the longest possible word length. The maximum amplitude for quantization error is equal to half the voltage step-size. A signal with an amplitude comparable to the quantization error is particularly susceptible to problems. To deal with this, a small amount of random noise, called dither, is added to the analog signal, which stochastically raises the low amplitude signal so that it can be more accurately quantized. In addition to minimizing quantization error through optimizing the number of bits per audio sample, the voltage range of the quantizer must be wide enough to span the voltages encompassed by the analog signal. (The quantizer is also known as the analog-to-digital converter, or ADC; the inverse quantizer, or digital-to-analog converter (DAC), performs the opposite conversion.)
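The effect of dither on a very low-level signal can be demonstrated with a small simulation. This is a minimal sketch under stated assumptions: a quantizer that rounds to the nearest multiple of the step-size, uniformly distributed dither spanning one step, and a fixed random seed for repeatability. Real converters use more refined dither schemes.

```python
import math
import random

def quantize_round(voltage, step):
    """Round a voltage to the nearest multiple of the step-size."""
    return math.floor(voltage / step + 0.5) * step

def average_output(voltage, step, n, dither, seed=0):
    """Average of n quantized readings of a constant low-level voltage."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Assumed dither: uniform noise spanning +/- half a step.
        noise = rng.uniform(-step / 2, step / 2) if dither else 0.0
        total += quantize_round(voltage + noise, step)
    return total / n

# A signal of 0.2 steps is smaller than the maximum quantization error.
print(average_output(0.2, 1.0, 20000, dither=False))  # signal is lost entirely
print(average_output(0.2, 1.0, 20000, dither=True))   # average approaches 0.2
```

Without dither the quantizer always outputs zero and the signal vanishes; with dither the individual samples are noisier, but the signal survives on average.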
Too Many Bits, Too Little Space!
Six channels of film sound, at a sampling rate of 48kHz and a typical 20-bit resolution per audio sample means that the digital data need to be transferred at a rate of 5.76 megabits (million bits) per second. Since the speed of film projection is 24 frames per second, this means that there has to be a way to accommodate 240,000 bits per frame. If the data were to fit into, for example, the space on the outer edge of the print adjacent to the sprocket holes, each bit would have to be a tiny optical square element measuring only 10 thousandths of a millimeter! This would not be practical, given the limitations of the optical technology to reliably detect these small squares, the well-known susceptibility to wear and dirt on the film, and quality control limitations in copying and developing film prints. If the data were to be retrieved from CD-ROMs (as is the case for DTS), a 2-hour film would require 9 CDs, which would be a rather cumbersome medium to carry digital data.
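The data-rate arithmetic above is easy to reproduce. The sketch below uses only the figures given in the text; the function name is illustrative.

```python
def pcm_bit_rate(channels, sample_rate_hz, bits_per_sample):
    """Uncompressed linear PCM data rate in bits per second."""
    return channels * sample_rate_hz * bits_per_sample

FRAMES_PER_SECOND = 24  # speed of film projection

rate = pcm_bit_rate(channels=6, sample_rate_hz=48000, bits_per_sample=20)
print(rate)                        # bits per second (5.76 million)
print(rate // FRAMES_PER_SECOND)   # bits that must fit alongside each frame
```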
This space limitation is the very reason why data compression measures are employed, so that digital audio for film can be accommodated, while at the same time preserving the nuances and fidelity of the sound. A straightforward approach to data compression might be to simply truncate the bits for each digital sample. However, as explained earlier, this would just increase the quantization error (which is doubled every time a bit is discarded). For playback reliability, the minimum optical element representing a single bit needs to be at least 3 times as large as the small square previously considered (Reference 8). This would mean roughly a 3 to 1 (3:1) compression ratio, which could be achieved by discarding 14 out of the 20 bits per sample (leaving 6 bits, or about 3.3:1). However, this would also result in about a 16,000-fold (2^14 = 16,384) increase in quantization error!
Another alternative to reducing the data rate could be to reduce the sampling rate by three-fold to 16kHz. However, this is not acceptable, since the upper limit of the audio's spectral range would be reduced from 24kHz to 8kHz! Even a combination of these two measures would result in a substantial degradation in dynamic and frequency range.
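The cost of the two naive approaches can be put into numbers. The following sketch assumes, as stated above, that quantization error doubles with each discarded bit and that the usable audio bandwidth is half the sampling rate; the function names are illustrative.

```python
def truncation_tradeoff(original_bits, kept_bits):
    """Compression ratio and error growth from discarding low-order bits."""
    discarded = original_bits - kept_bits
    compression_ratio = original_bits / kept_bits
    error_growth = 2 ** discarded  # error doubles per discarded bit
    return compression_ratio, error_growth

def decimation_tradeoff(sample_rate_hz, factor):
    """New sampling rate and audio bandwidth after lowering the rate."""
    new_rate = sample_rate_hz / factor
    return new_rate, new_rate / 2

print(truncation_tradeoff(20, 6))      # about 3.3:1, error grows 16384-fold
print(decimation_tradeoff(48000, 3))   # 16kHz sampling, 8kHz bandwidth
```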
Therefore, more sophisticated measures to compress digital audio data are necessary. Having acquired some basic knowledge of digital audio, we will now explore the novel approaches employed with DTS to achieve data compression while preserving sonic fidelity.
DTS: Predicting The Audio Signal
We will start with DTS, since the processes in data compression are relatively simple, compared to SDDS and Dolby Digital. The digital data for DTS is stored on and played back from a CD-ROM (typically 2 CD-ROMs are required for a full-length movie feature). A digital SMPTE 24-bit time-code, located on the film adjacent to the optical analog soundtrack, is used to synchronize the CD player on the DTS cinema processor with the projector. Although digital cinema sound predominantly utilizes 5.1 channels (5 full frequency range plus the ".1" LFE, or Low Frequency Effects), DTS encodes only 5 discrete channels, unlike SDDS and Dolby Digital. Prior to data compression, the LFE track is rolled-off above 80Hz, split into two, and combined with each of the two surround channels. After decoding during playback, frequencies below 80Hz are filtered from the surround channels, and then summed before output to the subwoofer(s).
Figure 3 – ADPCM (Adaptive Differential PCM) block diagram
DTS uses the apt-X codec (code-decode algorithm), from Audio Processing Technology (APT) Ltd., originally envisioned for transmission of high fidelity digital audio with ISDN (Integrated Services Digital Network), broadcast, and DBS (Direct Broadcast Satellite) distribution. Besides digital film sound, the algorithm is being used in a variety of professional audio applications, including post-production, telecommunications, and multimedia. It should be noted that apt-X is used only for the theatrical version of DTS; the consumer version, DTS Digital Surround, utilizes the distinctly different and more sophisticated Coherent Acoustic Coding (CAC) algorithm (which will be described in a future issue).
apt-X achieves data compression through a combination of two strategies. The first, predictive coding, analyzes the amplitude of the audio signal with time, using present and past audio samples, and predicts the amplitude of the next sample. Instead of storing and transmitting the sampled voltage, as would be the case for linear PCM, the difference between the predicted and the actual signal level is used. The anticipated voltage will not always be exact, but the difference signal will invariably have an amplitude much lower than that of the actual waveform. In essence, predictive coding acts to remove redundant binary information, which is not directly relevant to changes in the waveform with time. Therefore, a significant reduction in data transfer can be realized, by quantizing with smaller word lengths, since the dynamic range of the difference signal is substantially reduced. Furthermore, by adapting the voltage range of the quantizer according to the variance of this signal, quantization error can be minimized to levels potentially lower than the error yielded through linear PCM. These measures to reduce the digital audio data are known as ADPCM, or Adaptive Differential PCM.
A block diagram for ADPCM is shown in Figure 3. The original audio signal, in the analog domain (black arrows), is first converted to a digital signal through the quantizer Q (dotted gray arrow). Then, the digitized audio samples are converted back to analog through the inverse quantizer I (dashed black arrows). The voltage range of both the quantizer and inverse quantizer is adjusted through a voltage step-size adapter D according to the signal range of the previous audio sample. Using the present and previous audio samples (dashed gray arrow), the predictor outputs the anticipated signal (solid gray arrow). The difference between the predicted and the actual signal (dotted black arrow) is then quantized at a lower bit resolution and output from the encoder.
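The principle of ADPCM can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the apt-X algorithm: the predictor here simply assumes the next sample equals the last reconstructed one, and the step-size rule (double after a full-scale code, halve after a near-zero code) is an assumption chosen for clarity. All function names are hypothetical.

```python
import math

def _adapt_step(step, code, max_code, min_step):
    """Widen the quantizer after a large code, narrow it after a small one."""
    if abs(code) >= max_code:
        return step * 2.0
    if abs(code) <= 1:
        return max(step / 2.0, min_step)
    return step

def adpcm_encode(samples, bits=4, min_step=1.0):
    """Encode samples as short codes of the prediction error."""
    max_code = 2 ** (bits - 1) - 1
    predicted, step, codes = 0.0, min_step, []
    for sample in samples:
        difference = sample - predicted            # actual minus predicted
        code = int(round(difference / step))       # quantize the difference
        code = max(-max_code, min(max_code, code))
        codes.append(code)
        predicted += code * step                   # what the decoder will see
        step = _adapt_step(step, code, max_code, min_step)
    return codes

def adpcm_decode(codes, bits=4, min_step=1.0):
    """Rebuild the waveform by mirroring the encoder's bookkeeping."""
    max_code = 2 ** (bits - 1) - 1
    predicted, step, output = 0.0, min_step, []
    for code in codes:
        predicted += code * step
        output.append(predicted)
        step = _adapt_step(step, code, max_code, min_step)
    return output

# A tone with 16-bit-scale amplitude, carried in 4-bit codes.
tone = [1000.0 * math.sin(2 * math.pi * n / 100) for n in range(200)]
codes = adpcm_encode(tone)
decoded = adpcm_decode(codes)
```

Note that the encoder predicts from its own reconstructed output rather than from the original samples. Because the decoder performs identical calculations on the same codes, both sides adapt their step-size in lockstep without any side information being transmitted.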
Figure 4 – Predictive coding: sampled voltages quantized and then converted back to analog (bars), predicted voltages (gray line), and the difference between sampled and predicted voltages (black line)
Figure 4 shows the same series of sampled voltages as in Figure 2, first in the digital domain and then converted back to analog (bars), and the predicted sample voltage (gray line). The difference between these two series of voltages is shown as the black line, and is subsequently quantized with a reduced voltage range/step-size, and bit resolution. For example, in Figure 4, the quantizing voltage range could be reduced to -0.5 to 0.5V, the step-size reduced by two-thirds from 1V to 0.33V, and so the word length for the digital samples would be reduced from 3 to 2 bits, resulting in a 1.5:1 data compression ratio.
In addition to predictive coding, the apt-X algorithm utilizes another method of compressing digital audio signals known as sub-band coding. The rationale is that recorded sounds generally do not have uniform energy over their frequency range. PCM digital audio, however, does not take into account the varying dynamic ranges of the spectral components, and therefore in many cases, bits are being utilized that are likely to be redundant. With sub-band coding, frequency components which dominate the audio signal are coded more accurately than those which are less significant. The general approach is to divide the audio signal into a number of frequency regions or bands. Those regions which have small energy contribute little to the sound, and are thus coded with a lower bit resolution than those which have the highest amplitudes. The end result is that the sum of the word lengths from each of the bands is less than that of PCM.
With apt-X, the PCM digital audio signal is divided into 4 frequency bands: (1) 0-5.5kHz, (2) 5.5-11kHz, (3) 11-16.5kHz, and (4) 16.5-22kHz. The band division process is accomplished by what are known as QMFs (Quadrature Mirror Filters) which split the original PCM audio into the 4 spectral regions, each at one-fourth the original sampling rate (so that the output and input data rates from and to the QMFs are the same). Then each of the frequency bands undergoes predictive coding at various bit resolutions: (1) at 8 bits, (2) at 4 bits, and (3) and (4) at 2 bits. It should be clear that region (1) was determined to generally predominate the audio signal and thus is coded with the highest priority.
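The band-splitting idea can be illustrated with the simplest possible quadrature mirror filter pair: the two-tap sum/difference (Haar) filters. This is a sketch for illustration only; practical codecs such as apt-X use much longer filters with far better band separation, and the function names here are hypothetical. Applying the two-band split twice, in a tree, yields four bands, each at one-fourth the input sampling rate.

```python
import math

_S = 1.0 / math.sqrt(2.0)

def split(x):
    """Split an even-length signal into low and high bands at half rate."""
    low = [(x[i] + x[i + 1]) * _S for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * _S for i in range(0, len(x), 2)]
    return low, high

def merge(low, high):
    """Inverse of split: recombine two half-rate bands into one signal."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) * _S)
        out.append((l - h) * _S)
    return out

def split_four(x):
    """Tree-structured split into four bands, lowest frequency first."""
    low, high = split(x)
    ll, lh = split(low)
    hl, hh = split(high)
    # Halving the rate of the high band mirrors its spectrum, so in
    # ascending frequency order the upper two bands are hh, then hl.
    return [ll, lh, hh, hl]

def merge_four(bands):
    """Rebuild the full-rate signal from the four bands."""
    ll, lh, hh, hl = bands
    return merge(merge(ll, lh), merge(hl, hh))

signal = [3.0, 1.0, -2.0, 4.0, 0.5, 0.5, -1.0, 2.0]
bands = split_four(signal)
restored = merge_four(bands)
```

The four bands together hold exactly as many samples as the input, so no data reduction has happened yet. The saving comes afterwards, when each band is coded with a word length suited to its importance.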
The apt-X codec for DTS accepts as input PCM audio at 16-bit resolution and 44.1kHz sampling rate for each channel. The encoding is performed on each channel separately, and the encoded signal per channel has an overall bit resolution which is equal to the sum of the word lengths from each of the four frequency bands (8+4+2+2=16 bits). Since the sampling rate is only 11kHz, one-fourth that of PCM (due to the QMFs), apt-X achieves a 4:1 data compression ratio. Recall the earlier discussion that at least a 3:1 compression ratio was required for printing on and reading digital data from film; apt-X more than adequately meets this criterion. In order to decode the data and recover the original sonic waveform, the processes of QMF filtering, quantization, and predictive coding are simply reversed.
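The 4:1 figure follows directly from the numbers given above, as this short calculation shows. The band word lengths are those quoted in this article; the function and constant names are illustrative.

```python
BAND_BITS = (8, 4, 2, 2)  # word length per sub-band, as quoted above

def pcm_channel_rate(sample_rate_hz=44100, bits=16):
    """Bits per second for one channel of linear PCM."""
    return sample_rate_hz * bits

def subband_channel_rate(sample_rate_hz=44100, band_bits=BAND_BITS):
    """Bits per second for one channel after sub-band coding."""
    band_rate_hz = sample_rate_hz / len(band_bits)  # each band at 1/4 rate
    return sum(band_bits) * band_rate_hz

pcm = pcm_channel_rate()
coded = subband_channel_rate()
print(pcm, coded, pcm / coded)
```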
Summary
Digital audio compression (or reduction) is necessary in order to accommodate 6 channels of digital audio within the limited space on film, or on CD-ROMs. It is widely believed that the audio performance of a digital sound format is linked to the degree of data compression, or reduction. However, it is premature to make judgements on this basis, without prior knowledge of the processes involved with data compression for each format. An essential prerequisite for the understanding and appreciation of these processes is some knowledge of digital audio concepts, namely quantization, PCM, binary word length, sampling rate, and quantization error. It has been determined that at least a 3:1 compression of PCM audio is needed to allow for reliable storage and playback of multichannel digital audio on film, but simply reducing the audio sample word length and/or the sampling rate is not acceptable. Therefore, specialized measures to code digital audio at lower data rates have been used. DTS utilizes the apt-X algorithm for theatrical film release distribution, which achieves a 4:1 data compression ratio through two methods: predictive coding and sub-band coding.
In the second part of this series, we will explore the rationale and processes involved with the data compression techniques used for SDDS and Dolby Digital, which are similar to each other in that they rely on human perception of sound. In subsequent installments, we will explore multichannel data compression approaches for consumer applications, such as Meridian Lossless Packing (MLP) and Sony/Philips Direct Stream Digital (DSD).
References
1. "Terry Beard: A Conversation With The President of DTS," Widescreen Review, (Issue 2, September/October, 1993), 54-69.
2. Brandenburg, Karlheinz, and Marina Bosi. "Overview of MPEG-Audio: Current And Future Standards For Low Bit-Rate Audio Coding," 99th AES Convention, preprint 4130.
3. Holman, Tomlinson. Sound For Film And Television. Boston: Focal Press, 1997.
4. Pohlmann, Ken C. "Digital Audio 101: Back To Basics," Stereo Review, (November, 1996), 95-102.
5. Smyth, Michael, and Stephen Smyth. "APT-X100: A Low-Delay, Low Bit-Rate, Sub-Band ADPCM Coder for Broadcasting," Proceedings Of The 10th International AES Conference, 41-56.
6. Smyth, Stephen M. F. "Method And Apparatus For Electrical Signal Coding," United Kingdom Patent WO8907866.
7. Watkinson, John. The Art Of Digital Audio. 2nd edition. Oxford: Focal Press, 1994.
8. Weinberg, David J. "The Dolby Stereo Digital Film Sound System," Widescreen Review, (Issue 3, April/May, 1994), 53-63.
Perry Sun is the Movie Sound Editor for Widescreen Review, and also the editor of eFilmNetwork.com. Perry can be contacted via e-mail at perry@widescreenreview.com.