Data Compression For Home Theatre: DTS® Digital Surround - Part 3
By Perry Sun
Parts 1 and 2 of this series of articles (published in Issues 37 and 38, respectively) introduced fundamental digital audio principles, along with the inner workings of the data compression systems for the three digital sound formats in cinema. Part 3 is the beginning of a continuing series of features exploring data reduction measures with multichannel sound formats relevant to consumer applications. Linear pulse code modulation (PCM) digital audio has been, and will continue to be, a mainstay with CD. However, other current and evolving media (e.g. DTV, DVD-Audio/Video, multimedia) have exploited the use of data compression, both as a means for compact storage of audio information, and as an opportunity to exploit the inherent redundancy in PCM audio to offer improvement in the quality of the audio signal. Examples of multichannel audio formats for the former include Dolby® Digital, MPEG-1/2 Layer-3 (better known as MP3) and Advanced Audio Coding (AAC). Meridian Lossless Packing (MLP) and DTS® Digital Surround exemplify the latter. Whatever the philosophical approach to audio data compression, the general principle is the same: conserving bandwidth in order to meet a specified requirement for digital audio delivery.
In this feature, we will look at the essential aspects of the codec for DTS Digital Surround. There has been a lot of commentary regarding the comparison between DTS and Dolby Digital, both in these pages and through several other sources in print and over the Internet. The arguments in favor of, or against, one of these formats seem to have been largely based on the bit rates of the codecs. However, the numbers only tell part of the story. More important are the processes behind these numbers that achieve data reduction, especially when different formats are being compared. The inner workings of Dolby Digital were explored in Part 2. The following will hopefully provide some useful information in formulating comparative judgements between DTS Digital Surround and other formats.
The Essentials
The codec for DTS Digital Surround is known as Coherent Acoustics. Originally developed for providing high quality, multichannel surround sound for CDs and LaserDiscs, it is now of course featured on several DVD-Video titles.
Coherent Acoustics is a highly flexible audio coding scheme, allowing for a wide range of bit resolutions and sampling rates. Its operation ranges from low bit rate, lossy perceptual coding (meaning that it reduces data based on psychoacoustic principles) to high quality, lossless coding with variable bit rates. Encoded data rates of 32-4096kbps, sampling rates of 8-192kHz, resolutions of 16- to 24-bit and up to eight channels are possible (although current decoders utilize 5.1 channels). There is also a series of features inherent to Coherent Acoustics decoders, including selective channel down-mixing and dynamic range control. Additionally, special extension data is available, which allows for future audio enhancements while maintaining backward compatibility with existing decoders. The general philosophy behind Coherent Acoustics is to provide for sophistication and future refinements in encoding that can be realized using simple decoders.
For LaserDisc and CD, the bit rate for DTS Digital Surround is 1235kbps. For DVD-Video, the data is transferred at rates of 1509 or 754kbps. Many sources quote 1536kbps as opposed to 1509kbps (and 768kbps instead of 754kbps). Both sets of numbers are technically correct. DTS occupies the same bandwidth allotted for PCM audio: 1509kbps (or 754kbps) is the actual DTS throughput, while 1536kbps (or 768kbps) denotes the PCM data rate which encompasses DTS. (It should be noted that for Widescreen Review, we will use the 754 and 1509 data rates.)
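As a quick check on these figures, the short Python sketch below reproduces the 1536kbps number. It assumes that the PCM allotment in question is a two-channel, 16-bit, 48kHz stream; the article does not spell out the carrier format, so treat that as an assumption. The variable names are illustrative only.

```python
# Assumption (not stated in the article): the PCM allotment that carries DTS
# is a two-channel, 16-bit, 48kHz stream.
SAMPLE_RATE_HZ = 48_000
BITS_PER_SAMPLE = 16
CHANNELS = 2

pcm_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000

# Rates quoted in the article for the DTS payload itself.
dts_full_kbps = 1509
dts_half_kbps = 754

# Share of the PCM allotment occupied by the quoted DTS throughput.
full_share = dts_full_kbps / pcm_kbps
half_share = dts_half_kbps / (pcm_kbps / 2)

print(f"PCM allotment: {pcm_kbps:.0f} kbps")
print(f"1509 kbps is {full_share:.1%} of the full allotment")
print(f"754 kbps is {half_share:.1%} of half the allotment")
```

Under that assumption, the quoted DTS throughput sits just below the PCM rate in both cases, which is consistent with both sets of numbers describing the same stream.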
The Nuts And Bolts Of Compression
Currently, the typical input for Coherent Acoustics is 5.1 channels of 20-bit or 24-bit PCM audio, at a sampling rate of 48kHz. The .1 LFE channel is handled separately from the main channels, and is discussed later. Under these conditions, the first task is to group the PCM audio for each of the five main channels in frames of up to 2048 or 1024 samples (for output data rates of 754 or 1509kbps, respectively; the number of samples is limited in accordance with the decoder's input buffer size). Each frame is then split into 32 frequency bands of equal bandwidth (and equal number of samples). Consequently, at the 754kbps rate, each sub-band consists of 64 samples, while each band at the 1509kbps rate comprises 32 samples.
This measure allows for precision in coding, by compressing each sub-band independently, according to its respective signal condition. Also, by dividing the audio into sub-bands, advantage can be taken of the fact that for music, the spectral content usually is non-uniform, and very small at high frequencies. PCM audio weighs all frequencies equally, and therefore data usually exists that is redundant (Figure 1). This redundancy can be eliminated by selectively reducing the bit resolution for bands corresponding to high frequencies. Such an approach to data compression is known as sub-band coding.
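The frame arithmetic above can be verified with a few lines of Python. This is only a sketch of the bookkeeping, using the frame sizes, band count and 48kHz sampling rate quoted in the text; the function names are made up for illustration.

```python
SAMPLE_RATE_HZ = 48_000
NUM_SUBBANDS = 32

# Maximum frame sizes quoted above, keyed by output data rate in kbps.
FRAME_SAMPLES = {754: 2048, 1509: 1024}

def samples_per_subband(frame_len, bands=NUM_SUBBANDS):
    """Each band receives an equal share of the frame's samples."""
    return frame_len // bands

def frame_duration_ms(frame_len, rate_hz=SAMPLE_RATE_HZ):
    """Length of one frame in milliseconds."""
    return 1000.0 * frame_len / rate_hz

def subband_width_hz(rate_hz=SAMPLE_RATE_HZ, bands=NUM_SUBBANDS):
    """Equal-width bands covering 0 Hz up to half the sampling rate."""
    return (rate_hz / 2) / bands

for kbps, frame_len in FRAME_SAMPLES.items():
    print(kbps, "kbps:", samples_per_subband(frame_len), "samples per band,",
          round(frame_duration_ms(frame_len), 1), "ms per frame")
print("band width:", subband_width_hz(), "Hz")
```

At 48kHz, 32 equal bands work out to 750Hz each, and the longer frame at the lower data rate covers roughly twice the time span of the shorter one.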

Figure 1 - (a) Non-uniform spectral content, characteristic of music.
(b) Equal distribution of spectral content, which is assumed by PCM audio.
The primary data reduction method with Coherent Acoustics is known as predictive coding. The idea behind this approach is that current samples of audio can be used to predict what the next sample will be. This anticipated value will not always be exact, but the difference between this and the actual signal will invariably have an amplitude much lower than that of the actual waveform (Figure 2). By coding and transmitting the difference signal, redundant binary information which is not directly relevant to changes in the waveform with time is removed. Therefore, a significant reduction in data transfer can be realized by quantizing with smaller bit resolutions, since the dynamic range of the difference signal is substantially reduced. Furthermore, by adapting the range of the quantizer according to the variance of this signal, quantization error can be minimized to levels potentially lower than the error yielded through linear PCM. These measures to reduce the digital audio data are known as ADPCM, or Adaptive Differential PCM.

Figure 2 - Predictive coding. Blue, current audio samples; magenta, audio sample to be estimated; gray line, linear prediction; green, difference between predicted and actual amplitudes.
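To make the idea concrete, here is a minimal Python sketch of differential coding with a fixed linear predictor, in the spirit of Figure 2. It is not the Coherent Acoustics predictor: the two-point extrapolation, the 1kHz test tone and the helper names are all assumptions chosen for illustration, and the adaptive quantizer that puts the "A" in ADPCM is left out.

```python
import math

def linear_extrapolation_residual(samples):
    """Predict each sample by extending the line through the two samples
    before it, and return the prediction errors (the difference signal)."""
    residual = []
    for n in range(2, len(samples)):
        predicted = 2 * samples[n - 1] - samples[n - 2]
        residual.append(samples[n] - predicted)
    return residual

def bits_for_peak(peak):
    """Bits needed for a signed integer range that covers +/- peak."""
    return math.ceil(math.log2(peak + 1)) + 1

# Illustrative input: a 1kHz tone sampled at 48kHz, scaled near 16-bit range.
signal = [round(30000 * math.sin(2 * math.pi * 1000 * n / 48000))
          for n in range(480)]
residual = linear_extrapolation_residual(signal)

signal_peak = max(abs(s) for s in signal)
residual_peak = max(abs(r) for r in residual)

print("signal peak:", signal_peak, "->", bits_for_peak(signal_peak), "bits")
print("residual peak:", residual_peak, "->", bits_for_peak(residual_peak), "bits")
```

For a smooth signal like this one, the difference signal has a far smaller peak than the waveform itself, so it can be carried with fewer bits per sample. A real encoder would also predict from the decoder's reconstructed samples rather than the originals, so that quantization error does not accumulate.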
It might be interesting to note that the processes just described are somewhat similar to the codec used for the theatrical DTS sound format, apt-X (discussed in Part 1). The latter only uses four frequency bands for coding, however, and what will be described below are refinements exclusive to Coherent Acoustics.
Following the splitting of the PCM audio into sub-bands, linear predictive coding (LPC) analysis is performed on each frequency band, in frames (or windows) of 32 sub-band PCM samples, to determine the optimal coefficients for the mathematical prediction equation. Then, a first pass with ADPCM is performed, known as the "estimation" loop. The resulting difference signal is then compared with the actual signal to determine whether an acceptable coding gain will be realized with ADPCM, taking into account the prediction coefficients and other side information which has to be transmitted as part of the output bit stream. If so, the second ADPCM process, known as the "real" loop, actually codes and transmits this residual data. However, if insignificant or negative coding margin results during the estimation, no ADPCM is performed, and the original PCM data is used instead.
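The decision between ADPCM and plain PCM can be sketched as follows. This is a simplified stand-in, not the encoder's actual logic: it fits a single prediction coefficient by least squares, measures the prediction gain over a 32-sample window, and compares it with a threshold. The 3dB threshold, which stands in for the cost of side information, and all function names are hypothetical.

```python
import math

MIN_GAIN_DB = 3.0  # hypothetical threshold standing in for side-information cost

def fit_first_order(x):
    """Least-squares coefficient a for the predictor x[n] ~ a * x[n-1]."""
    num = sum(x[n] * x[n - 1] for n in range(1, len(x)))
    den = sum(x[n - 1] ** 2 for n in range(1, len(x)))
    return num / den if den else 0.0

def prediction_gain_db(x, a):
    """Ratio of signal energy to residual energy, in dB."""
    signal_energy = sum(v * v for v in x[1:])
    residual_energy = sum((x[n] - a * x[n - 1]) ** 2 for n in range(1, len(x)))
    if signal_energy == 0:
        return 0.0
    if residual_energy == 0:
        return float("inf")
    return 10 * math.log10(signal_energy / residual_energy)

def choose_mode(window):
    """Estimation pass: use ADPCM only if the predictor pays for itself."""
    a = fit_first_order(window)
    return "ADPCM" if prediction_gain_db(window, a) > MIN_GAIN_DB else "PCM"

# A slowly varying tone is highly predictable from the previous sample.
tonal = [math.sin(2 * math.pi * n / 32) for n in range(32)]
# A pattern with almost no sample-to-sample correlation, standing in for
# content that this first-order predictor cannot exploit.
unpredictable = [1.0, -1.0, -1.0, 1.0] * 8

print(choose_mode(tonal), choose_mode(unpredictable))
```

The tonal window yields a large prediction gain and is routed to ADPCM, while the second window yields essentially none and falls back to PCM. A higher-order predictor, as LPC analysis would normally produce, could exploit more signal types than this one-coefficient example.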
Transient Signal Analysis
Transient signals rapidly change between small and large amplitudes. They are tricky to code because the quantization error in coding them is generally larger than for relatively steady-state signals. The problem is that if a transient occurs within a given frame of samples, this increase in quantization error is spread throughout the time frame, resulting in noise prior to the onset of the transient (also known as pre-echo, see Figure 3). Dolby's AC-3® codec for Dolby Digital and several others attempt to mitigate the effects of pre-echo by using short time windows, with the aim of limiting the content of the frame to the transient.

Coherent Acoustics deals with pre-echo in a different manner. If a transient is detected within an LPC window, two scale factors are used: one for the transient and a smaller factor for the remaining content in the frame. This information, along with the location of the transient within the window, is transmitted to the "real" ADPCM loop, and is part of the output bit stream. In the absence of a detected transient, only one scale factor is used for the analysis window.
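A rough Python sketch of this two-scale-factor idea is given below. The detection rule is an assumption made for illustration: the window is cut into sub-blocks, and a transient is flagged when a sub-block's peak exceeds a multiple of everything before it. The sub-block size, ratio, floor and function name are hypothetical, and the way the actual encoder detects transients is not described in this article.

```python
def analyze_window(window, block=8, ratio=8.0, floor=1.0):
    """Return (transient_start, scale_factors) for one analysis window.

    transient_start is None when no transient is found, in which case a
    single scale factor (the window peak) covers the whole window.
    Otherwise two peaks are returned: one for the content before the
    transient and a larger one from the transient onward.
    """
    peaks = [max(abs(v) for v in window[i:i + block])
             for i in range(0, len(window), block)]
    before = peaks[0]
    for k in range(1, len(peaks)):
        if peaks[k] > ratio * max(before, floor):
            return k * block, (before, max(peaks[k:]))
        before = max(before, peaks[k])
    return None, (before,)

# Quiet content followed by a sudden loud onset halfway through the window.
quiet = [20 if n % 2 == 0 else -20 for n in range(16)]
loud = [20000 if n % 2 == 0 else -15000 for n in range(16)]
print(analyze_window(quiet + loud))

# Steady content: a single scale factor is enough.
steady = [1000 * (-1) ** n for n in range(32)]
print(analyze_window(steady))
```

Scaling the quiet portion by its own small factor keeps its quantization error proportionally small, which is the point of the scheme: the error from the loud transient is not spread back over the samples that precede it.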
Quantization Of Sub-Bands
Along with ADPCM, a global bit management routine is used to allocate bits for re-quantizing sub-bands across frequencies and all channels. At low bit rates, this bit allocation is performed using perceptual coding (to be explained below). At the data rates currently used for consumer applications, enough bits are available to obviate the need for perceptual coding, which comes into play only at rates of approximately 500kbps and lower. Thus, the higher the bit rate, the more data is available for quantizing the sub-bands, and therefore the margin increases between the perceptible noise threshold and the error introduced by the compression. At this point, the desire is often to use the available bits to achieve uniform error along the spectrum. Ultimately, with lossless coding, the bit rate is the highest, and is varied in order to attain error levels equal to those of the original PCM signal. For each sub-band, bit allocation information is transmitted as part of the output data stream.
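One simple way to picture bit allocation aimed at uniform error is a greedy loop that always gives the next bit to the band with the highest remaining noise. The Python sketch below does this under a common rule of thumb that each added bit lowers quantization noise by about 6dB. It is a generic illustration, not the Coherent Acoustics routine, and the band levels and bit budget are invented.

```python
import heapq

DB_PER_BIT = 6.02  # approximate noise reduction per added bit of resolution

def allocate_bits(band_levels_db, total_bits, max_bits=24):
    """Greedily hand out bits, one at a time, to the noisiest band."""
    bits = [0] * len(band_levels_db)
    # heapq is a min-heap, so noise levels are negated to pop the largest.
    heap = [(-level, i) for i, level in enumerate(band_levels_db)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        if not heap:
            break  # every band has reached max_bits
        neg_noise, i = heapq.heappop(heap)
        bits[i] += 1
        if bits[i] < max_bits:
            heapq.heappush(heap, (neg_noise + DB_PER_BIT, i))
    return bits

# Invented sub-band levels in dB, falling off toward high frequencies.
levels = [60.0, 48.0, 36.0, 12.0]
allocation = allocate_bits(levels, total_bits=16)
noise_after = [lvl - DB_PER_BIT * b for lvl, b in zip(levels, allocation)]

print("bits per band:", allocation)
print("noise per band (dB):", [round(n, 1) for n in noise_after])
```

Strong low-frequency bands end up with most of the bits and the weak top band with none, yet the resulting noise levels across the bands end up within a few dB of each other. With a larger bit budget the same loop simply pushes that common noise floor lower.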
Low Bit Rate Considerations
For low bit rate applications, a variety of provisions are available to accommodate coding with limited data. In such cases, perceptual coding becomes advantageous because substantial amounts of data can be removed in accordance with human patterns of hearing (also referred to as psychoacoustics). Aspects of psychoacoustics that are relevant to Coherent Acoustics (as well as several other codecs) are frequency masking, in which a signal at one frequency obscures signals at adjacent frequencies, and the threshold of auditory sensitivity, which varies with frequency. Models of masking, along with the hearing threshold, are used to calculate the maximum allowable quantization error for each sub-band, and therefore the appropriate bit allocation. Again, it should be emphasized here that at the data rates for DVD, CD and LaserDisc, perceptual coding is an insignificant factor, as is any other measure for low bit rate compression discussed below.
Another coding strategy that can be useful for low bit rate requirements is variable-length coding of differential sub-band codes (the output of ADPCM), as well as side information (prediction coefficients, scale factors, bit allocation information, etc.). This method takes advantage of the fact that small values tend to occur more frequently than large ones. Variable-length coding replaces the values having a high probability of occurrence with codes that are more compact. Longer codes are assigned to values which occur less frequently. These code assignments come from a series of statistical tables which map them to the original, fixed-length codes. Under certain circumstances, further coding gain can be achieved through adjustment of the scale factor to yield additional short-length codes.
For very low bit rate requirements, a technique called joint frequency coding can be employed. The rationale is that high frequencies are difficult for humans to localize. Therefore, high frequency sub-bands from two or more channels can be summed into a single group of high frequency sub-bands. Upon decoding, this group is then appended to each individual channel. The effect of such data compression is to retain the amplitude waveform for all of the channels, but at the expense of phase information.
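The step from "maximum allowable quantization error" to a bit count can be sketched in Python as follows. The model is deliberately simplified and is an assumption, not the codec's psychoacoustic model: noise is treated as inaudible if it lies below the higher of the masking level and the hearing threshold, and each bit is taken to buy about 6dB of signal-to-noise ratio. All names and example levels are illustrative.

```python
import math

DB_PER_BIT = 6.02  # approximate noise reduction per added bit of resolution

def allowed_noise_db(mask_db, hearing_threshold_db):
    """Noise stays inaudible below the higher of the two limits."""
    return max(mask_db, hearing_threshold_db)

def bits_needed(signal_db, mask_db, hearing_threshold_db):
    """Smallest resolution that keeps quantization noise under the limit."""
    headroom = signal_db - allowed_noise_db(mask_db, hearing_threshold_db)
    if headroom <= 0:
        return 0  # the band itself is masked or below the hearing threshold
    return math.ceil(headroom / DB_PER_BIT)

# A band 30dB above its masking level needs about five bits.
print(bits_needed(signal_db=60, mask_db=30, hearing_threshold_db=5))
# A band that lies under the masking level needs none.
print(bits_needed(signal_db=20, mask_db=30, hearing_threshold_db=5))
```

The saving comes from the second case: bands that a louder neighbor already obscures can be coded coarsely or dropped without an audible penalty.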
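Huffman coding is a familiar example of variable-length coding, and the Python sketch below uses it to show the effect. Note the difference from what the text describes: this example derives its code table from the data at hand, whereas the codec is said to draw on predefined statistical tables. The residual values are invented for illustration.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tiebreak, {symbol: code_so_far}). The integer
    # tiebreak keeps the dictionaries from ever being compared.
    heap = [(c, i, {s: ""}) for i, (s, c) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Invented differential codes: small magnitudes dominate.
residuals = [0] * 8 + [1] * 4 + [-1] * 2 + [2] + [-2]
table = huffman_code(residuals)

variable_bits = sum(len(table[r]) for r in residuals)
fixed_bits = 3 * len(residuals)  # 3 bits distinguish the 5 distinct values

print(table)
print("variable-length:", variable_bits, "bits; fixed-length:", fixed_bits, "bits")
```

The most common value receives a one-bit code and the rarest values the longest ones, so the total falls well below the fixed-length figure even for this tiny example.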
Coding The .1 LFE
Because of its limited bandwidth, the .1 LFE is handled differently from the main channels. In this case, the LFE is input as a full-bandwidth PCM bit stream, and then decimated using filters with bandwidths of either 80Hz or 150Hz. The decimated data stream is then re-quantized at 8-bit resolution. In the decoding stage, the original PCM signal is reconstructed using the same filters.
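The effect of decimating and re-quantizing the LFE can be illustrated with the Python sketch below. Several things here are assumptions: the decimation factor of 64 is picked only for illustration, block averaging stands in for a proper low-pass decimation filter, and the 20-bit input resolution is taken from the typical input mentioned earlier. The function names are made up.

```python
SAMPLE_RATE_HZ = 48_000
INPUT_BITS = 20   # typical input resolution mentioned earlier in the article
DECIMATION = 64   # hypothetical factor, chosen only for illustration

def decimate(samples, factor=DECIMATION):
    """Average each block of `factor` samples: a crude low-pass and
    downsample in one step, standing in for a real decimation filter."""
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, usable, factor)]

def quantize_8bit(samples):
    """Map floats in [-1.0, 1.0] to signed 8-bit integers."""
    return [max(-128, min(127, round(s * 127))) for s in samples]

# Illustrative input: 128 samples at a constant level of 0.25.
lfe_in = [0.25] * 128
lfe_coded = quantize_8bit(decimate(lfe_in))

decimated_rate_hz = SAMPLE_RATE_HZ / DECIMATION
input_bps = SAMPLE_RATE_HZ * INPUT_BITS
coded_bps = decimated_rate_hz * 8

print("coded samples:", lfe_coded)
print("rate after decimation:", decimated_rate_hz, "Hz")
print("input:", input_bps, "bps; coded:", coded_bps, "bps")
```

With this factor the decimated rate is 750Hz, which still leaves room for content up to 150Hz, and the LFE shrinks to a small fraction of its input data rate before any further coding.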
Extension Data
A powerful feature of the Coherent Acoustics codec is the ability to incorporate refinements in bit resolution, higher frequencies and additional channels, while maintaining backward compatibility with current-generation decoders. This is accomplished through what is known as extension data, and is already being implemented with DTS-ES Discrete 6.1 (please refer to Shane Buettner's exclusive feature in Issue 41 for more information). Another potential application of extension data is to offer 5.1-channel, 24-bit/96kHz audio; DTS presented a technical paper on this technique at the 109th AES Convention.
The extension data basically comprises information that is in excess of that needed for the 5.1-channel and up to 24-bit/48kHz configuration, compatible with current decoders (also known as the core data stream). The extension data is encoded separately and then appended to the core bit stream. The nature of this encoding is dependent on the type of extension data, whether it is higher frequency information (i.e. 48kHz-96kHz) or an extra channel, such as the back surround channel for DTS-ES Discrete 6.1. Decoders that are appropriately updated will fully utilize the enhancements in the extension data, while current-generation decoders will simply derive the "normal" audio and disregard the extension data stream.
Summary
The output of the Coherent Acoustics encoder consists of the coded sub-bands for all of the main channels, along with the coded .1 LFE and side information, all multiplexed into a single data stream. For current applications, the compression ratio ranges from approximately 3:1 to 8:1, depending on bit resolution, sampling rate and encoded data rate. The lossless coding mode offers ratios varying from about 1.5:1 to 2.5:1. A block diagram summarizing the coding processes for one channel is shown in Figure 4. P, predictive analysis, determines whether ADPCM for a given sub-band will yield significant coding gain, or if the original PCM code should be used instead. The pink arrows denote transfer of side information.

Figure 4 - Coherent Acoustics encoding. S, sub-band splitting; L, LPC analysis; E, "estimation" ADPCM; T, transient analysis/scale factors; P, prediction analysis; G, global bit allocation; R, "real" ADPCM.
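The compression ratios quoted in this summary can be sanity-checked with simple arithmetic. The Python sketch below assumes six full-rate PCM input channels at 48kHz (the five main channels plus the LFE, which the article says is input as full-bandwidth PCM) and divides the input rate by the encoded rate. Counting the LFE as a full channel, and the particular bit depths shown, are assumptions made for illustration.

```python
def pcm_kbps(channels, bits, rate_hz):
    """Uncompressed PCM data rate in kbps."""
    return channels * bits * rate_hz / 1000

def compression_ratio(bits, dts_kbps, channels=6, rate_hz=48_000):
    """Input PCM rate divided by the encoded DTS rate."""
    return pcm_kbps(channels, bits, rate_hz) / dts_kbps

for bits in (16, 20, 24):
    for dts_kbps in (1509, 754):
        ratio = compression_ratio(bits, dts_kbps)
        print(f"{bits}-bit input at {dts_kbps} kbps: {ratio:.2f}:1")
```

Under these assumptions the results run from roughly 3:1 (16-bit input at 1509kbps) to between 7:1 and 9:1 at 754kbps, broadly in line with the approximate range quoted above.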
What should be apparent from this feature is that with the current implementations of DTS Digital Surround, the data compression allows for considerable margin over what is attainable with lower bit rates, at least in theory. At these reduced rates, perceptual coding is utilized, an approach associated with other formats including Dolby Digital, ATRAC and MP3.
References
1. Smyth, Mike. "An Overview Of The Coherent Acoustics Coding System." DTS Digital Surround White Paper.
2. Smyth, S.M.F., Smith, W.P. et al. "DTS Coherent Acoustics: Delivering High Quality Multichannel Sound To The Consumer." 100th AES Convention, Preprint 4293, May 1996.
3. Watkinson, John. The Art Of Digital Audio. Second Edition, Oxford: Focal Press, 1994.
Perry Sun is the Movie Sound Editor for Widescreen Review, and also the editor of eFilmNetwork.com. Perry can be contacted via e-mail at perry@widescreenreview.com.