DTS Coherent Acoustics® The Future Of Audio Part Three:
Evaluating The Performance Of Audio Codecs
By Mike Smyth And Stephen Smyth
DTS Technology
Introduction
This third article on digital audio data reduction deals with some of the common artifacts generated by audio compression algorithms, and discusses ways of measuring and evaluating their performance. Particular emphasis is placed on objective tests that can indicate the mode of operation and potential shortcomings of any algorithm. The difficulties of setting up and running reproducible subjective listening tests are also discussed. Finally, three of the more important algorithms are described briefly.
The purpose behind the development of DTS Coherent Acoustics® was to enable mastering quality digital audio to be delivered to the home on existing and proposed new media platforms. In addition it will be suitable for discrete multichannel formats, while simultaneously having extensions to the new proposed higher quality digital audio standards.
Within the DTS Coherent Acoustics framework is a digital audio compression methodology which operates directly on the linear PCM data in order to reduce the bit rate, without affecting the fidelity of the audio signal itself. The bit rate reduction is what allows, for example, multiple channels of higher quality audio to be delivered on a standard CD.
The Sound Of Compression
The current state of audio compression algorithms is such that reasonably transparent audio requires a bit rate of at least 128 kilobits per second per channel. At higher bit rates psychoacoustic analysis becomes less important and fairly simple modeling at the encoder or decoder can be used successfully. If the bit rate falls below 128 kilobits per second per channel perceptual transparency cannot be guaranteed, and trained listeners will hear coding artifacts with standard audio test signals. Low bit rate artifacts may be caused by an inaccurate psychoacoustic analysis of the audio signal, or by "bit starvation" where the analysis is accurate but the bit demand exceeds the number of bits available.
Coding Artifacts
Most of the severe artifacts are produced by transient or sharp percussive sounds (e.g. castanets or xylophone), which by definition contain little objective redundancy, and which are difficult to analyze using the standard psychoacoustic noise-masking calculations. Time waveforms of castanets and harpsichord [Figures 1, 2] illustrate the differences between sharply and moderately transient signals, e.g. the noise floor level that the signal rises from, and the rate of decay of the signal [Figures 3, 4].
The ideal "critical" codec-stressing musical instrument would have a sharp transient onset, pure low frequency fundamental tonal components, and complex lingering high frequency harmonics. The harpsichord has some of these attributes, with a moderately sharp onset [Figure 4] and large amplitude high frequency components extending up to 20 kHz [Figures 5, 6]. This instrument does tend to cause coding artifacts due to the excessive number of bits required to adequately code the full-band signal.

Figure 1 - Original, 16 Bit Linear PCM Castanet, Left & Right Channels

Figure 2 - Original, 16 Bit Linear PCM Harpsichord, Left & Right Channels

Figure 3 - Original, 16 Bit Linear PCM Castanet, Left & Right Channels

Figure 4 - Original, 16 Bit Linear PCM Harpsichord, Left & Right Channels
Traditionally much of the critical musical material has been extracted from the EBU SQAM CD (European Broadcasting Union, Sound Quality Assessment Material), which consists primarily of short recordings of single musical instruments. Audio engineers (who are not necessarily musically talented) find that complex orchestral recordings tend to hide rather than reveal coding artifacts. On the other hand it has been noted that musicians are more accomplished at picking out coding artifacts that "color" the music and make it sound distorted, and are less likely to focus on particular artifacts such as pre-echo or noise modulation.
Pre-Echo Control
All low bit rate audio codecs operate on blocks of data rather than on individual audio samples. This causes one of the most noticeable coding artifacts, commonly called pre-echo.
Pre-echo manifests itself as a dulling of the initial attack of a transient signal, and is caused by a rise in the noise floor just before the transient begins. In extreme cases it is distinguished as a shuffling noise which accompanies each strike. Obvious musical examples would include the castanets and the xylophone, but it may also occur with the triangle and glockenspiel. Pre-echo can also occur during any filtering operation. Figure 7 shows some pre-echo introduced in 16-bit linear PCM audio, probably due to an anti-aliasing filter used after the analog-to-digital conversion.
Pre-echo is a particular problem for transform based coders, which must trade block size against coding gain. In general the smaller the block of samples used to compress the signal the less noticeable the pre-echo. However, smaller blocks are less efficient for compressing audio than larger blocks. Coders must therefore balance compression efficiency with pre-echo management.
Most psychoacoustic research has focused on steady state signals, i.e. how pure tones are heard in the presence of noise or how a tone at one frequency "masks" a tone at another frequency. As previously described the basic mathematical tool in this research is the Fourier time-to-frequency transform, and in particular the fast Fourier transform or FFT. This operates on sequential blocks of time domain audio data. The block length fundamentally defines the time resolution of the transform, i.e. the audio signal is assumed to be in a steady state condition for the entire length of the block. If the signal changes dramatically within a block of data, this information is lost when the signal is transformed into the frequency domain. Since the noise masking calculation operates on the frequency transformed blocks of data, it cannot detect transients within blocks. The end result is that the noise masking curve is used over the entire block of data, even though in the time domain the first half of the block may have contained no audio signal. This is illustrated in Figure 8 which should be compared to the original in Figure 7. Figures 9 and 10 show pre-echo occurring with the glockenspiel, using a logarithmic energy scale rather than linear amplitude.
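The spreading of quantization noise across a whole block can be reproduced with a toy experiment. The sketch below is not any real codec: it assumes a plain DFT with a uniform quantizer on the coefficients, and the names castanet_like and coding_error, the block sizes and the step size are all invented for illustration. A silent passage followed by a sharp burst is coded once as a single long block and once as several short blocks, and the noise energy landing in the silent part is compared.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) discrete Fourier transform of a real block."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    """Inverse DFT, returning the real part of each sample."""
    n = len(coeffs)
    return [(sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def quantize(c, step):
    """Uniform quantizer applied to real and imaginary parts (an assumption)."""
    return complex(round(c.real / step) * step, round(c.imag / step) * step)


def coding_error(signal, block_len, step):
    """Code `signal` block by block and return decoded minus original."""
    err = []
    for start in range(0, len(signal), block_len):
        block = signal[start:start + block_len]
        decoded = idft([quantize(c, step) for c in dft(block)])
        err.extend(d - s for d, s in zip(decoded, block))
    return err


def castanet_like(n=64, onset=52):
    """Hypothetical test signal: silence, then a decaying burst at `onset`."""
    return [0.0] * onset + [0.9 ** i * math.cos(2.0 * i) for i in range(n - onset)]


def energy(xs):
    return sum(v * v for v in xs)


signal = castanet_like()
long_err = coding_error(signal, block_len=64, step=0.5)
short_err = coding_error(signal, block_len=16, step=0.5)
print("noise energy in samples 0-47, one 64-sample block:  ", energy(long_err[:48]))
print("noise energy in samples 0-47, four 16-sample blocks:", energy(short_err[:48]))
```

With the long block, the quantization noise appears well ahead of the burst. With short blocks, the first three blocks are silent and decode to exact silence, so any noise that precedes the burst is confined to the one block that contains it.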
Pre-echo can be controlled by detecting transient signals in the time domain, before the audio is coded, followed by a more judicious selection of transform block length. This does not eliminate the pre-echo but essentially constrains the effect of the pre-echo to the shorter block size, which may not be audible. Typically, the transform block is reduced from 1024 audio samples to 512 or 256 samples. The time resolution of a 256 sample block of audio data sampled at 48 kHz is approximately 5 milliseconds, and is sufficiently small that pre-echo should not be resolved. Figures 8 and 10 show pre-echo occurring on this time-scale, indicative of a short coding block length.
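The block durations quoted above follow directly from the block length and the sampling rate, and the time domain transient check can be sketched in a few lines. The detector here is a hypothetical one, not taken from any particular codec: it compares the energy of consecutive sub-blocks, and the sub-block size and the ratio threshold are arbitrary assumptions.

```python
def block_duration_ms(block_len, sample_rate_hz=48_000):
    """Time span covered by one transform block, in milliseconds."""
    return 1000.0 * block_len / sample_rate_hz


def choose_block_length(block, long_len=1024, short_len=256,
                        sub_len=128, ratio_threshold=8.0):
    """Pick a transform length for `block` (a list of `long_len` samples).

    Hypothetical detector: split the block into sub-blocks of `sub_len`
    samples and flag a transient when one sub-block's energy exceeds the
    previous sub-block's energy by more than `ratio_threshold`.
    """
    energies = [
        sum(x * x for x in block[i:i + sub_len]) + 1e-12  # avoid divide by zero
        for i in range(0, long_len, sub_len)
    ]
    for prev, cur in zip(energies, energies[1:]):
        if cur / prev > ratio_threshold:
            return short_len   # transient found: use the short transform
    return long_len            # steady signal: keep the efficient long transform


for n in (1024, 512, 256):
    print(n, "samples =", round(block_duration_ms(n), 2), "ms")
# 1024 samples = 21.33 ms
# 512 samples = 10.67 ms
# 256 samples = 5.33 ms
```

A block of silence followed by a sudden burst returns the short length, while a block of constant level returns the long one.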
To a large extent the audibility of pre-echo will be determined by the position in time of the transient in the block of audio data to be transformed. If the transient occurs at the beginning of the block then no pre-echo will be introduced. If it occurs at the end then a maximum amount of pre-echo will be created. This obviously creates a big problem when listening tests are being conducted on codecs since pre-encoded material may not exhibit this variability.

Figure 5 - Original, 16 Bit Linear PCM Stereo Harpsichord, Left & Right Channels

Figure 6 - Original, 16 Bit Linear PCM Stereo Harpsichord, Expanded Between 5kHz & 10 kHz

Figure 7 - Original, 16 Bit Linear PCM Castanet, Left Channel

Figure 8 - Dolby AC-3®, 96 Kbits/Second/Channel, Dual Mono Castanet, Left Channel
The down side of the variable block length technique is that coding efficiency drops as the block size decreases. This may result in new coding artifacts becoming audible, i.e., the attempt to contain pre-echo creates new coding difficulties which may be more audibly annoying than the original pre-echo problem. In the example above, the 256 sample block containing the transient may require so many bits for adequate coding that the next block (which may not contain a transient) is "starved" of bits, causing audible low frequency noise modulation.
Figure 9 - Original, 16 Bit Linear PCM Glockenspiel, Left Channel Expanded Around First Transient
Figure 10 - Dolby AC-3®, 96 Kbits/Second, Dual Mono Glockenspiel, Left Channel Expanded Around Transient
Figure 11 - DTS Coherent Acoustics®, 96 Kbits/Second/Channel Stereo Music
Figure 12 - DTS Coherent Acoustics®, 64 Kbits/Second/Channel Stereo Music
Lossless And Lossy Audio Compression
An audio compression scheme that only removes redundancy from the signal is referred to as "lossless," since no real information in the signal is actually discarded and the signal can be reconstructed exactly during playback. The philosophical appeal of lossless compression is very strong to audiophiles, and the concept has begun to command more attention as new data delivery media are developed which may be more suitable for this form of audio compression. A brief explanation of lossless compression is given next in this article along with some of the practical delivery problems that its use entails.
Most commercial digital audio compression algorithms today, such as are used in DCC and MiniDisc, try to remove primarily irrelevancy, and are referred to as "lossy," since information that was in the original signal has been discarded.
The coding framework of DTS Coherent Acoustics utilizes both lossless and lossy compression techniques, and can operate in either mode. Due to the greater coding gains that can be achieved by lossy compression compared to lossless, the focus of most of the current article is on the techniques utilized by lossy compression algorithms.
Bit Starvation
In this context bit starvation refers to an inadequate supply of bits with which to code the audio signal without audible distortion. The function of the psychoacoustic calculation is to determine the minimum bit-rate required for perceptually lossless compression. If the particular audio signal requires more bits than are available then, by definition, audible distortion must occur. It is then up to the bit allocation routine to try to minimize the audible effect of the lack of bits.
Many options are available and more than one may be used simultaneously. In individual channels these include decreasing the coded bandwidth, for example from 20 kHz to 15 kHz, or increasing the quantization noise at selected frequencies. If more than one channel is being coded from a common bit pool further options include combining selected spectral regions together across two or more channels, or preferentially allocating bits to particular audio channels depending on their artistic importance. Ultimately bandwidth must be exchanged for bit-rate if distortion is to be kept inaudible, but this reduction can be accomplished dynamically and a momentary loss of bandwidth is less offensive than a momentary gain in distortion. Figures 11 and 12 illustrate the drop in bandwidth as the bit rate is reduced from 96 kilobits per second to 64 kilobits per second per channel.
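Of the options listed, the bandwidth trade is the easiest to sketch. The function below is a hypothetical illustration only: it assumes the psychoacoustic analysis has already produced a bit demand for each frequency band in a frame (ordered from low to high frequency), and it simply drops bands from the top until the demand fits the frame's bit budget. The name fit_to_budget and the example numbers are invented.

```python
def fit_to_budget(band_demands, budget_bits):
    """Zero the highest-frequency bands until the total demand fits.

    band_demands: bits requested per band, ordered low to high frequency
                  (assumed to come from a psychoacoustic model).
    Returns (allocation, coded_bands), where coded_bands is the number of
    bands that survive, i.e. the coded bandwidth for this frame.
    """
    alloc = list(band_demands)
    coded_bands = len(alloc)
    while sum(alloc) > budget_bits and coded_bands > 0:
        coded_bands -= 1
        alloc[coded_bands] = 0   # give up the top band rather than add noise
    return alloc, coded_bands


print(fit_to_budget([40, 30, 20, 10], budget_bits=100))  # ([40, 30, 20, 10], 4)
print(fit_to_budget([40, 30, 20, 10], budget_bits=95))   # ([40, 30, 20, 0], 3)
```

Because the decision is made frame by frame, the coded bandwidth can recover as soon as the bit demand falls again, which is the dynamic behavior the paragraph above describes.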
Testing Audio Codecs
The ultimate worthiness of an audio codec can only be determined subjectively over prolonged periods of listening to a wide variety of audio material. Nevertheless consumers and professionals alike are uncomfortable with this approach and still insist on assessing audio equipment by some simple objective measurements such as total harmonic distortion, frequency bandwidth or dynamic range. These measurements make up the "specs" for the audio product, and may be useful for comparing similar products.
However, it should be evident that psychoacoustic-based audio compression algorithms can easily fool almost all of the standard objective audio tests, even when operating at very low bit-rates. Single tone analysis of distortion products, dynamic range or bandwidth tend to give no useful information about codecs since they are essentially measurements of the steady state of the system, whilst the coder is intrinsically a dynamic system. Multichannel audio codecs are even more difficult to "pin down" given their ability to manipulate bits across time, frequency and channels.
A coder may therefore sound wrong with real dynamic audio signals but measure perfectly with real static audio test signals. The listener (or broadcast audio engineer) is in a quandary if there is no way to measure the problem that he or she can easily hear. This is essentially the situation today with respect to the objective testing of the sonic quality of audio codecs.
Nevertheless some objective tests can be used to show the behavior of codecs, and with care may also be useful in comparing the performance of codecs. Test signals may also be used to verify the performance of a codec for automated quality control. These test signals are designed to either stress the psychoacoustic analysis algorithm of the coder under steady state conditions, or stress the bit-allocation algorithm under transient conditions when the psychoacoustic analysis breaks down. The most widely used is a multitone test signal.
Figure 13 - Original, 16-Bit Linear PCM @ 44.1 kHz Left Tones @ 500 Hz Spacing, Right Tones @ 1000 Hz Spacing
Figure 14 - DTS Coherent Acoustics®, 96 Kbits/Second/Channel, Dual Mono Left Tones @ 500 Hz Spacing, Right Tones @ 1000 Hz Spacing
Multitone Test Signal
Evenly (or unevenly) spaced tones of equal amplitude that extend over the full audio bandwidth, [Figure 13], have been used for some time to measure the actual "operational" bandwidth of coders, as opposed to a coder's static bandwidth measured using steady state signals. Since none of the tones are perceptually masked by any other tone, the coder is forced to try to quantize all the frequency components. If there are insufficient bits then normally the audio bandwidth is reduced, and this can be easily seen by comparing the coded signal [Figures 14 and 15] to the original [Figure 13]. The multi-tone signal is a reasonable approximation of a complex audio signal [Figure 6 harpsichord], and gives a fairly accurate measure of the "real" bandwidth of any single channel of a codec. However as shown in Figures 14 and 15 this test must also be used with caution since higher frequency tones are often coded with very limited frequency resolution and/or limited amplitude resolution.
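A multitone signal of this kind is straightforward to synthesize. The sketch below generates evenly spaced, equal amplitude sine tones using the spacings and sampling rate named in the caption of Figure 13 (500 Hz on the left, 1000 Hz on the right, 44.1 kHz). The 20 kHz upper limit, the zero starting phases, the normalization and the function name multitone are assumptions made for illustration.

```python
import math


def multitone(spacing_hz, sample_rate_hz=44_100, max_hz=20_000, n_samples=4096):
    """Sum of equal-amplitude sines at spacing_hz, 2*spacing_hz, ... up to max_hz.

    Returns (freqs, samples). The sum is scaled by 1/len(freqs) so that the
    result always stays within [-1.0, 1.0].
    """
    freqs = list(range(spacing_hz, max_hz + 1, spacing_hz))
    scale = 1.0 / len(freqs)
    samples = [
        scale * sum(math.sin(2 * math.pi * f * n / sample_rate_hz) for f in freqs)
        for n in range(n_samples)
    ]
    return freqs, samples


left_freqs, left = multitone(500)     # 40 tones: 500 Hz ... 20 kHz
right_freqs, right = multitone(1000)  # 20 tones: 1 kHz ... 20 kHz
print(len(left_freqs), len(right_freqs))  # 40 20
```

Giving each channel its own tone spacing means that any tone showing up in a channel it was not fed to can be identified after decoding.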
If a multi-tone test signal is applied to an intrinsically multichannel codec, care must be taken in evaluating the results. In order for the test to be meaningful the individual channels of the codec must receive different tones in each multi-tone signal. If the same multitone signal is fed into all the channels of a multichannel codec the artificial correlation between the input signals can be used to increase the effective quantization resolution and lower the noise floor. Figures 16 and 17 illustrate this effect in a 2-channel codec. In Figure 16 the same multitone signal is fed to both channels, while in Figure 17 two different multitone signals are used. In each case the higher frequency tones of each channel above approximately 10 kHz have simply been added together to maintain the perceived bandwidth, and increase the quantization resolution below 10 kHz. This is not obvious in Figure 16 but is very apparent in Figure 17. Both can be compared to Figure 15 where the same coder is in dual mono mode and each channel is allowed to operate independently. Compared to the original signal in Figure 13, what is the actual coding bandwidth in Figure 17?
The multitone signal can also be used to evaluate the intrinsic coding efficiency of different compression algorithms, by comparing their signal-to-noise ratios at particular tones. The noise floor will normally rise at the higher frequencies, and all codecs should show a similar overall shape. However the actual level of the noise floor across all frequencies is an accurate indication of the ability of the codec to extract both redundancy and irrelevancy from a complex, steady state audio signal, similar in many ways to the harpsichord signal shown in Figure 6. The lower the noise floor (integrated over all frequencies) the greater the potential sonic performance of the coder. Figures 14 and 15 illustrate this comparison with both codecs operating at the same bit-rate and in dual mono (independent channel) mode.

Figure 15 - Dolby AC-3® 96 Kbits/Second/Channel, Dual Mono Left Tones @ 500 Hz Spacing, Right Tones @ 1000 Hz Spacing

Figure 16 - Dolby AC-3® 96 Kbits/Second/Channel, Joint Stereo Left & Right Channel Tones @ 500 Hz Spacing

Figure 17 - Dolby AC-3® 96 Kbits/Second/Channel, Joint Stereo Left Tones @ 500 Hz Spacing, Right Tones @ 1000 Hz Spacing

Figure 18 - Dolby AC-3® 96 Kbits/Second/Channel, Dual Mono Left Tones @ 500 Hz Spacing, Right No Signal
The shape of the noise floor is indicative of the bit allocation scheme, which is also very important in determining the subjective quality of the coder. Once again this measurement must be interpreted with caution if the signal is applied to a single channel of a multichannel codec, unless precautions have been taken to limit the ability of the codec to redistribute bits across the channels. A better procedure would be to inject "tonally distinct" multi-tone signals in every channel simultaneously, and then analyze each channel individually. This would indicate the presence of the summing of high frequency components of the individual channels [Figure 17], and also inhibit the ability of the codec to re-distribute bits unevenly across the channels. Figure 18 illustrates the ability of a coder to re-allocate bits from the un-used right channel into the left channel, lowering the noise floor and increasing the sonic quality in the left channel compared to the same channel in Figure 15.
The ability to dynamically re-allocate bits across the channels according to need is a very powerful coding tool, and does not imply any coding dependency among the audio channels. Rather it is a method of maximizing the overall audio quality.
However the summing together of the high frequency components of individual channels into a common channel does result in the loss of channel independence. On analysis each channel coded jointly would reveal all the high frequency components of all the other channels that have been summed with it. This is illustrated in Figures 19 and 20 which compare the differences between the left and right channels of a stereo harpsichord signal coded jointly [Figure 20] and originally [Figure 19]. Above approximately 10 kHz the difference signal in the coded version is zero, implying that the signals are identical in both left and right channels above this frequency. This high frequency joining technique is most easily seen by comparing Figure 21 (original signal) and Figure 22 (coded signal). The joining can be disabled temporarily during transient portions of the signal.
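The left minus right test can be illustrated with a deliberately simplified model. In the sketch below each channel is just a table of tone amplitudes keyed by frequency, and joining is modeled as replacing both channels with their sum above a cutoff. This is an assumption made for illustration: it ignores phase and any per-channel level information a real coder may transmit, and the function names are invented.

```python
def couple_high_band(left, right, cutoff_hz):
    """Toy model of high frequency joining.

    left, right: dicts mapping tone frequency (Hz) -> amplitude.
    Above cutoff_hz both outputs carry the summed (common) component;
    at or below it the channels pass through unchanged.
    """
    out_l, out_r = {}, {}
    for f in sorted(set(left) | set(right)):
        l, r = left.get(f, 0.0), right.get(f, 0.0)
        if f > cutoff_hz:
            out_l[f] = out_r[f] = l + r   # one common channel feeds both outputs
        else:
            out_l[f], out_r[f] = l, r
    return out_l, out_r


def difference(left, right):
    """Left minus right, tone by tone."""
    return {f: left.get(f, 0.0) - right.get(f, 0.0)
            for f in sorted(set(left) | set(right))}


left = {f: 1.0 for f in range(500, 20_001, 500)}     # tones at 500 Hz spacing
right = {f: 1.0 for f in range(1000, 20_001, 1000)}  # tones at 1000 Hz spacing
coded_l, coded_r = couple_high_band(left, right, cutoff_hz=10_000)
diff = difference(coded_l, coded_r)

print(all(v == 0.0 for f, v in diff.items() if f > 10_000))  # True
print(coded_r[10_500])  # 1.0, a left-channel tone now present in the right channel
```

In this model the difference signal vanishes above the cutoff, and tones fed only to the left channel reappear in the right, which are the two signatures described in the paragraph above.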
Figure 19 - Original, 16-Bit Linear PCM Stereo Harpsichord, Left Channel Minus Right Channel
Figure 20 - Dolby AC-3® 96 Kbits/Second/Channel, Joint Stereo, Stereo Harpsichord, Left Channel Minus Right Channel
Figure 21 - Original, 16-Bit Linear PCM @ 44.1 kHz Left Tones @ 500Hz Spacing, No Right Signal
Figure 22 - Dolby AC-3® 96 Kbits/Second/Channel, Joint Stereo Left Tones @ 500 Hz Spacing, Right No Signal
Subjective Listening Tests
Listening tests still provide the only conclusive means of evaluating low bit-rate audio codecs. While objective measurements, for example based on the multi-tone signal described above, can demonstrate differences between codecs that may be important, ultimately codecs stand or fall on their subjective performance with "real" audio signals.
However the logistics involved in setting up listening tests so as to obtain meaningful, repeatable results are formidable and expensive. For example consider the variables involved in a 5.1 channel test between two codecs. These include the number of audio channels to be listened to, the choice of the listening environment, whether speakers or headphones (or both) are used, the bit-rate(s) of the coders, the error robustness of the codec given a specific application, the choice and number of audio materials, the choice and number of expert or non-expert listeners, and the training of these listeners.
Meaningful listening tests demand that the coded signal be compared with a reference signal. A coded signal may sound poor simply because the original signal is of low quality.
Most modern tests use a "double-blind hidden reference" methodology whereby the listener compares two identical audio sequences against a third known original. One of the two "identical" pieces is the coded version and the other is the "hidden" original. If the coded version cannot be distinguished from the hidden original then on average, over many listeners, it will obtain the same score as the original. If the coded version sounds worse than the original, it will score lower. If the coded version sounds better than the original, the test or the listeners are fundamentally flawed. The hidden reference concentrates the golden-eared listener's mind wonderfully, since no expert listener wants to be seen to score a coded version of the signal better than the original.
Usually between 10 and 20 audio selections are used, which are repeated for each codec at each bit-rate, using speakers and headphones. If more than one coder is being assessed the listener should not know which codec is being tested. This is normally done by randomizing the coded samples. Since the recommended procedure calls for individuals to be tested in isolation, and each individual may require three or four hours to complete the test, the whole test normally continues over a period of weeks or months. The final result is a set of numbers which measure the differences between the coded and the original signals at a particular bit-rate. If the differences are statistically insignificant the coder is subjectively transparent at that particular bit-rate. Typically a codec will be transparent to some signals and not transparent to others. Assessing the merits of two codecs therefore may involve comparing low but overall consistent results with higher but more erratic results. If a coder is typically transparent but falls down badly on a particular signal it would be rejected in favor of one that is less transparent but more consistent.
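Whether a set of score differences is "statistically insignificant" has to be decided by some explicit test. The article does not say which one is used, so the sketch below is only one possible choice: an exact sign-flip permutation test on the per-listener differences between the coded version's score and the hidden original's score. The function name and the example scores are hypothetical and are not measurements.

```python
from itertools import product
from statistics import mean


def sign_flip_p_value(diffs):
    """Exact two-sided sign-flip permutation test on paired differences.

    diffs: one value per listener, coded score minus hidden-reference score.
    Under the hypothesis that coded and original are indistinguishable, each
    difference is equally likely to have either sign. The p-value is the
    fraction of sign assignments whose mean is at least as extreme as the
    observed mean. Enumerates 2**len(diffs) cases, so keep the panel small.
    """
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(mean(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical panels of eight listeners (made-up numbers for illustration).
print(sign_flip_p_value([-1.0] * 8))                 # 0.0078125
print(sign_flip_p_value([0.5, -0.5, 0.5, -0.5]))     # 1.0
```

A small p-value suggests the listeners could tell the coded version from the original; a large one is consistent with transparency at that bit-rate for that material.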
One of the difficulties in this test is the wide range of scores that are sometimes given to the same coded piece of music. While all scores are valid, given the subjective nature of the test, the scores should display a skewed normal distribution clustered around some average result. If the scores were clustered in two or more regions it would indicate some disagreement between the listeners on how to score subjectively, and would invalidate the test.
The listening test is also open to abuse, particularly if the test subjects are sufficiently knowledgeable and are able to distinguish which codec is being tested by picking out particular coding artifacts that act as codec "fingerprints." For example, a transform based codec may produce audible harmonic distortion while, for the same signal, a subband based coder produces noise modulation. If the listener is a proponent of a competing codec his subjectivity in this case may be suspect. In many official tests the listening panel has included the engineers who developed the codecs being assessed. These are often the only people willing to volunteer.
However, none of the above should be taken to imply that individuals cannot make meaningful comparisons on coders using home theatre audio equipment set up in their living room. The only problem is lack of access to the uncoded audio signals. Without reference material it is almost pointless to assess the quality of a codec. With reference material, almost anyone is capable of being an "expert" listener, and being able to quickly and consistently pick out coding artifacts.
Commercially Available Codecs
Many audio codecs are in use today and commercially available. The most widespread are those used in telecommunications for voice data. For the home theatre audio/video market the three most interesting codecs are ISO/MPEG, Dolby AC-3® and DTS Coherent Acoustics®.
ISO/MPEG
The ISO/MPEG (Moving Pictures Experts Group) audio compression algorithm consists of a number of related compression algorithms, Layers I, II and III, that differ in their computational complexity. The least complex is Layer I and the most complex is Layer III, each layer operating over a wide range of overlapping bit rates. A higher layer decoder must be able to decode lower layers, but not vice versa. Recently a new multichannel coder has been proposed which comes in two flavors, Backward Compatible (BC) and Non-Backward Compatible (NBC), depending on whether or not it can decode lower layer 2-channel bit-streams. In addition a new Layer IV has been proposed that seeks to have limited scalability and operate at even lower bit-rates.
All of the ISO/MPEG algorithms are based on a uniform 32-band polyphase filter bank, although Layer III increases the frequency resolution within each subband. Each subband signal is quantized using adaptive PCM. The main difference between Layers I and II is in the calculation of the noise masking threshold used to determine the quantization step-size (i.e. noise) for each subband. Layer II utilizes a high resolution FFT transform in parallel with the subband decomposition, while Layer I only uses the low resolution subband signals themselves. Since Layer I is intended for high bit-rate consumer applications, such as DCC, the encoder (i.e. recorder) must be fairly cheap and therefore simple. Layer II is intended for low to medium bit-rate broadcast use, such as satellite delivered digital video and digital radio, where the encoding is done at the point of transmission and can be made much more complex in order to retain high quality at the lower bit-rates. Layer III is even more complex at both the encoding and decoding stages, and is used primarily on digital telephone networks such as ISDN. The multichannel version of the codec has been proposed for use on DVD discs.
Dolby AC-3
The Dolby AC-3 multichannel audio compression algorithm is transform based, with the block size of the transform adapting to transients detected in the signal. The frequency coefficients are quantized adaptively according to a fairly simple psychoacoustic calculation. In contrast to more advanced coders the core bit allocation routine operates in both the encoder and decoder. The big danger with this technique is that the core bit allocation routine in the decoder could eventually become obsolete and is not upgradable. The claimed advantage is that little or no bandwidth is required to transmit explicit bit allocations calculated at the encoder. This may or may not improve the audio quality depending on the accuracy that can be achieved in the bit allocation routine running on the decoder. In other words, having more bandwidth available for the actual audio codes will only improve the audio quality if the bit allocation routine is accurate.
The algorithm uses a number of other coding techniques to maximize the quality at any given bit rate. The most significant and noticeable is a process referred to as coupling [Figure 14b, 19]. If the bit-rate is too low the AC-3 algorithm will sum audio channels into a common channel. The joining can begin from just above 3 kHz and continues up to above 23 kHz. The time varying energy of the audio signals in the original discrete channels is passed to the decoder, so that the volume of each channel on playback bears some resemblance to the original.
AC-3 also includes some useful features such as dynamic range compression, down-mixing from 5.1 to 2 channels or LT/RT and dialogue normalization.
AC-3 was used initially for the reproduction of motion picture 5.1 channel digital soundtracks in film theatres (known as Dolby Digital® or SR-D®), where the digital data was printed optically between the sprocket holes of the 35mm print and read back by a special reader mounted on the projector. Due to the severe data storage restrictions and error correction overheads imposed by recording the data onto the actual film, this version of the algorithm runs at a total bit rate of only 320 kilobits per second, and is somewhat obsolete. This slightly compromised version also appears to be the algorithm used in the 2-channel systems for broadcast use that operate at a total bit rate of 192 kilobits per second. The latest 5.1 channel version of the algorithm used for laserdiscs operates at a bit rate of 384 kilobits per second, which is also the proposed rate for 5.1 channel audio within the HDTV and DVD specification.
DTS Coherent Acoustics
The primary motivation for the development of Coherent Acoustics was to provide a multichannel audio delivery system where any audio channel could surpass the quality of 16-bit Compact Discs in terms of dynamic range and sheer fidelity. To achieve this goal a number of basic operational requirements of the Coherent Acoustics coding system were specified from the outset.
The coder must be capable of compressing 18-bit, 20-bit, 22-bit and 24-bit digital audio and retain the full dynamic range of the input signal, and operate at sampling rates above 48 kHz, e.g. 96 kHz or 192 kHz and still retain the full bandwidth of the input signal.
In order to accomplish this, the coder operates over a wide range of bit rates, but specifically provides unequivocal transparency for 20-bit signals (44.1 kHz and 48 kHz) at bit rates not exceeding 220 kilobits per second per channel, thus allowing six compressed channels to replace the 16-bit PCM data streams on CD, LD and DAT.
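The arithmetic behind the six channel claim can be checked directly. This is a minimal sketch that assumes "kilobits" means 1000 bits and ignores any framing or error correction overhead on the disc. The CD figures (44.1 kHz, 16-bit, two channels) are the standard format parameters, and the function name is illustrative.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed linear PCM, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd_stereo = pcm_bit_rate(44_100, 16, channels=2)   # 1_411_200 bit/s
six_coded = 6 * 220_000                            # 1_320_000 bit/s
print(six_coded <= cd_stereo)                      # True

# Compression ratio implied for one 20-bit channel at 220 kbit/s.
print(round(pcm_bit_rate(48_000, 20) / 220_000, 2))   # 4.36
print(round(pcm_bit_rate(44_100, 20) / 220_000, 2))   # 4.01
```

Six channels at the stated ceiling of 220 kilobits per second therefore fit within the stereo 16-bit PCM rate of a CD, at a compression ratio of a little over 4:1 per 20-bit channel.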
The coder relies primarily on mechanisms which exploit objective redundancy in the signal to achieve bit-rate reduction, since these processes are the least destructive. It also has the ability to operate efficiently in a lossless mode for bit-for-bit reconstruction applications.
The decoding algorithm is fully compatible with future encoding improvements regardless of age, and a 5.1 channel version has been implemented on a single low cost DSP (IC). This processor is capable of operating at bit rates of up to 1.5 megabits per second. A key feature of the decoding algorithm is its almost constant computational intensity and storage requirements which are independent of bit rate. This has been achieved by making the complexity of the decoding algorithm inversely proportional to the bit rate. At high bit rates, the decoding algorithm uses shorter buffers and operates in a much less complex mode than at low bit rates. This allows a universal decoder processor to operate successfully over a wide range of bit rates.
DTS Coherent Acoustics is based on a uniform 32 band polyphase filter bank, with adaptive differential PCM coding operating on each of the subband signals. Differential coding within each subband extracts objective redundancy from the signal. The algorithm can operate on up to eight channels at a total bit rate from 32 kilobits per second to 4096 kilobits per second, depending on the available bit-rate and number of coded channels. A global bit allocation routine operates across all channels in frequency and time, based on a high frequency resolution psychoacoustic analysis of the signal in each channel. Transients are detected and isolated in the time and frequency domains.
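The way differential coding removes redundancy can be shown with a much simpler scheme than the one described. The sketch below is not the Coherent Acoustics algorithm: it assumes a fixed first-order predictor (the previous reconstructed sample) and a fixed uniform step size, with no adaptation and no filter bank, and the function names are invented. It only illustrates that the differences of a slowly varying signal need far smaller code values than the samples themselves.

```python
import math


def dpcm_encode(samples, step):
    """Quantize the difference between each sample and the previous
    reconstructed sample (fixed first-order prediction, an assumption)."""
    codes, prediction = [], 0.0
    for x in samples:
        code = round((x - prediction) / step)
        codes.append(code)
        prediction += code * step   # track exactly what the decoder rebuilds
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the quantized differences."""
    out, prediction = [], 0.0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out


step = 0.01
signal = [math.sin(2 * math.pi * n / 100) for n in range(200)]  # slow sine
codes = dpcm_encode(signal, step)
decoded = dpcm_decode(codes, step)

print("largest PCM code: ", max(abs(round(x / step)) for x in signal))
print("largest DPCM code:", max(abs(c) for c in codes))
print("largest error:    ", max(abs(a - b) for a, b in zip(signal, decoded)))
```

Because the encoder predicts from the reconstructed signal rather than the original, the reconstruction error never exceeds half a quantizer step, while the code values to be transmitted shrink by more than an order of magnitude for this test signal.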
DTS Coherent Acoustics will be launched commercially at the winter Consumer Electronics Show (CES Convention) in January 1996, with encoded software on LDs and CDs and decoding hardware available.
Conclusion
Audio coding techniques are here to stay, irrespective of any perceptible coding artifacts. Discrete multichannel audio for video will become available in the near future, and will increasingly become the preferred format for music. The advantages of multiple channels of albeit compressed audio are readily acknowledged by the home theatre owner, and will become obvious to the music listener.
However, consumers must be aware of the problems that are likely to surface as use of aggressive audio compression techniques proliferates to other storage and transmission media. While coders may sound "OK" in the short term, it takes longer periods of listening, with frequent reference to the original uncoded material, to really assess coding quality. Under these conditions most listeners will readily discern coding artifacts and become sensitized to them.
In order for consumers to gain confidence in compression technology the proponents of any compression schemes must be prepared to allow the compressed audio to be easily compared to the original. DTS Technology has taken this position in order to gain acceptance from the artistic community. Consumers are the real beneficiaries.
Mike Smyth and Stephen Smyth are principals in AlgoRhythmic Technology. Stephen Smyth is the designer of the DTS algorithm both for the DTS theatrical system and the distinctly different DTS Coherent Acoustics consumer/professional system. DTS Technology is a joint venture between Digital Theater Systems, AlgoRhythmic Technology, Steven Spielberg and Universal/ MCA.