Published Articles - (Issue 14 of Widescreen Review)

DTS Coherent Acoustics® The Future Of Audio Part One: Setting The Scene
By Mike Smyth And Stephen Smyth

DTS Technology
This is the first article in an exclusive three-part series on the technology of DTS Coherent Acoustics®. DTS Coherent Acoustics is a variable and high bit rate solution for the delivery of discrete 5.1 multichannel digital audio in consumer and professional applications such as LaserDisc, DVD, CD, DAT, Digital VCR and HD platforms. DTS Coherent Acoustics competes with the codec developed by Dolby Laboratories and marketed as AC-3®, which we covered extensively during its pre-introduction period last year and continue to cover. DTS Coherent Acoustics had its first trade and consumer introduction at the Stereophile High End Hi-Fi '95 Consumer Show in Los Angeles. As is our practice, Widescreen Review presents this informative series of technology articles so that our readers can be well informed on the critical standards issues affecting the future of audio. - Gary Reber, Editor

Introduction
This series of articles is concerned with digital audio data reduction. The objectives are threefold: to explain some of the fundamentals of the generic technology; to describe in more detail particular algorithms (the series of instructions) currently in use, listing some of the advantages of each; and to give general guidelines to listeners who wish to evaluate the fidelity of the most common audio algorithms that have been introduced.

Since many of the consumer applications of digital audio data reduction involve the reproduction of pre-recorded music and motion picture soundtracks, this article presents some basic background information on the transmission of sound, the measurement, recording and reproduction of analog and numerical (more commonly called digital) representations of sound, and some of the limitations of each recording format. These recording formats are also briefly contrasted with the human perception of sound, or psychoacoustics.

With this background, the fundamental reasons for using audio compression algorithms become fairly clear. Uncompressed CD digital audio requires a lot of memory for storage and a lot of bandwidth for transmission, and these demands will only increase with new ultra hi-fi standards and multichannel "surround" sound. A good choice of audio compression algorithm should give consumers early access to these new audiophile formats using existing digital audio and home theatre equipment. Coherent Acoustics® is an audio processing framework developed by DTS Technology which will allow the economical delivery of all existing and proposed audio formats to the home.
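
To put numbers on "a lot of memory" and "a lot of bandwidth," the following Python sketch computes raw linear PCM data rates as sample rate times word length times channel count. The 5.1 figure is an assumption for illustration: six channels at 96 kHz and 24 bits (one of the higher-resolution formats discussed later in this article), with the low-frequency effects channel treated as full-bandwidth for simplicity.

```python
# Uncompressed linear PCM data rate: samples/s x bits/sample x channels.
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the raw PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def bytes_per_hour(bit_rate: int) -> int:
    """Storage needed for one hour of audio at the given bit rate."""
    return bit_rate * 3600 // 8


cd = pcm_bit_rate(44_100, 16, 2)           # stereo Compact Disc
hi_res_51 = pcm_bit_rate(96_000, 24, 6)    # assumed 5.1 at 96 kHz / 24-bit

print(f"CD stereo: {cd:,} bit/s, {bytes_per_hour(cd) / 1e6:.0f} MB per hour")
print(f"5.1 @ 96 kHz/24-bit: {hi_res_51:,} bit/s ({hi_res_51 / cd:.1f}x CD)")
```

Run as is, it reports 1,411,200 bit/s and about 635 MB per hour for the CD, and roughly 9.8 times that rate for the assumed six-channel format.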

The next article in the series (scheduled for Issue 15) begins with a brief examination of lossless audio compression in order to more clearly distinguish this technique from those of the lossy audio compression systems which predominate in today's market. The remainder of that article will show why digital audio data compression is possible, and give indications as to what its limitations are in terms of fidelity versus compression ratio.

The third and last article will delve into the specific algorithmic tools used to compress audio, and offer some advice on listening tests that should enable readers to meaningfully evaluate the various audio compression techniques competing in the commercial arena. Some of the more widespread consumer algorithms are examined in greater depth, and finally future consumer applications are discussed.

Getting The Jargon Down
It is quite easy to get confused by the technical jargon, especially when discussing audio data, since many of the commonly used terms have different meanings in different industries. However, most of the ambiguities are easily resolved by the context in which the terms are used.

This series of articles is ultimately concerned with techniques which operate on PCM representations of high quality audio (PCM being the industry-standard method of digitizing audio) in order to reduce the amount of digital data. The most popular technical term for this process is "digital audio data reduction." For brevity this is often shortened to "audio compression" or simply "compression." In this series of articles, "digital audio data reduction" and "compression" are used interchangeably, even though technically they are distinct.

To reiterate, digital audio data reduction is a technique for reducing the amount of data traditionally required to represent a digital audio signal. In most applications the reduction is "lossy" in that certain information associated with the original PCM digital samples is lost during the data compression process and can never be recovered. Well designed algorithms try to ensure that the discarded information is inaudible to humans and hence may be considered as irrelevant.

Audio data compression may also be used in a "lossless" reduction process whereby all of the audio data or information is fully recovered (i.e. not lost) following a compression/expansion cycle. This style of compression is commonly used for storing computer data on PC hard drives, where the absolute integrity of the data is vital. At first glance lossless audio compression appears to be the panacea for audiophiles, promising a reduction in data without any information loss. However, as will become clearer in the second article, the degree of data reduction in such processes varies with the information content of the audio signal, and therefore for many applications lossless audio compression is currently not deemed practical.
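
The dependence of lossless compression on information content can be seen with any general-purpose lossless coder. This Python sketch uses zlib's Deflate as a stand-in; real lossless audio coders use prediction and entropy coding tuned to audio, so the exact ratios here are not representative. The two test signals are artificial extremes chosen for illustration: a perfectly repeating tone and full-scale random noise.

```python
import math
import random
import struct
import zlib


def to_pcm_bytes(samples):
    """Pack 16-bit signed samples as little-endian bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)


def lossless_ratio(samples):
    """Compress with zlib, verify the exact round trip, return the ratio."""
    raw = to_pcm_bytes(samples)
    packed = zlib.compress(raw, 9)
    assert zlib.decompress(packed) == raw  # lossless: every bit recovered
    return len(raw) / len(packed)


n = 44_100  # one second at the CD sampling rate
# A steady 441 Hz tone: one 100-sample period repeated exactly (very predictable).
period = [round(10_000 * math.sin(2 * math.pi * k / 100)) for k in range(100)]
tone = (period * (n // 100 + 1))[:n]
# Full-scale random noise: essentially no redundancy to exploit.
rng = random.Random(1)
noise = [rng.randint(-32768, 32767) for _ in range(n)]

print(f"tone:  {lossless_ratio(tone):.1f}:1")
print(f"noise: {lossless_ratio(noise):.2f}:1")
```

Both signals come back bit for bit, but the tone shrinks dramatically while the noise does not shrink at all; real music falls somewhere in between, which is why a lossless coder cannot promise a fixed data rate.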

The term compression/expansion also has a very long history of use in analog audio systems, and refers to a decrease/increase in the dynamic range of an audio signal using variable gain amplifiers or their equivalent. Digital audio signals can be similarly dynamically compressed and expanded in an exactly equivalent manner. This is more accurately referred to as "dynamic range compression" and should not be confused with the process of digital audio data compression.
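
The difference is easy to see in code. This Python sketch applies a simple static dynamic range compressor curve (the threshold and ratio values are arbitrary assumptions, and a real compressor would smooth the gain with attack and release times). Loud samples are turned down, but the output has exactly as many samples as the input: the amount of data is unchanged.

```python
import math


def compress_level_db(level_db: float, threshold_db: float = -20.0,
                      ratio: float = 4.0) -> float:
    """Static compressor curve: above the threshold, every `ratio` dB of input
    change yields only 1 dB of output change; below it the level is unchanged."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def compress_signal(samples, threshold_db=-20.0, ratio=4.0):
    """Apply the curve sample by sample (instantaneous gain, no smoothing)."""
    out = []
    for x in samples:
        if x == 0:
            out.append(0.0)
            continue
        level = 20 * math.log10(abs(x))
        gain_db = compress_level_db(level, threshold_db, ratio) - level
        out.append(x * 10 ** (gain_db / 20))
    return out


print(compress_level_db(-10.0))          # -17.5: 10 dB over threshold becomes 2.5 dB
print(compress_signal([1.0, -0.01, 0.0]))  # three samples in, three samples out
```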

The Audio Chain
In order to discuss audio compression it is useful to summarize the fundamentals behind the processes of generation, recording, reproduction and human perception of sound. On examination this is found to be primarily an analog chain, with the exception of the recording link, which has recently adopted digital PCM as a means of improving the quality of the recorded sounds. Digital techniques are also gaining ground as a means of manipulating sounds for mixing and editing purposes, and these must be accommodated to some degree by any audio compression algorithms used. However, since the recording of sound is the primary location in the chain where audio compression is used, this article concentrates almost exclusively on this particular application. This also recognizes the commercial importance of the playback of audio in the home from pre-recorded media, where audio compression will play a major role.

The Generation And Propagation Of Sound
Sound is generated by a vibrating body that is in contact with a transmission medium. The medium may be solid, liquid or gaseous, and is normally air. In all cases the sound travels through interactions between the individual particles that make up the medium. This interaction can be viewed as an elastic collision between neighboring particles, similar to that which occurs between the balls on a pool table. This simple collision model can be used to explain many of the effects that are commonly noticed with the transmission of sound. These observations include the increase in the speed of sound when it passes from air into water, the increase in the speed of sound with air temperature, and the reflection and absorption of sound at surfaces.
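
The temperature dependence can be put into numbers with the standard dry-air approximation for the speed of sound, which treats air as an ideal gas. This Python sketch is illustrative only and ignores humidity.

```python
import math


def speed_of_sound_air(temp_c: float) -> float:
    """Approximate speed of sound in dry air in m/s (ideal-gas model):
    c = 331.3 * sqrt(1 + T / 273.15), with T in degrees Celsius."""
    return 331.3 * math.sqrt(1 + temp_c / 273.15)


for t in (0, 20, 35):
    print(f"{t:>3} °C: {speed_of_sound_air(t):.1f} m/s")
```

A warm room (20 °C) gives roughly 343 m/s, about 12 m/s faster than air at freezing point.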

Even this model is somewhat complicated. A simpler version operates in just one dimension (as opposed to three dimensions in the real world) and assumes that the air particles are held semi-rigidly. A "slinky" is a good example of this model. As the sound wave moves from left to right, each particle moves first to the right and then to the left. This results in periodic compression and rarefaction of the particles, which can be considered as periodic variations in the density of the particles that make up the medium. For air this density variation is more commonly expressed as pressure. [Figure 1]


In the real world, of course, the molecules that make up air are in constant random motion in all three dimensions and are not held semi-rigidly with respect to each other. In other words, the air pressure at any point is constantly fluctuating as molecules collide with and rebound from each other. There is therefore a background level of "noise" in air, caused by this random motion, which may be considered to define the lower limit of the sensitivity of microphones, and potentially even of the human hearing mechanism itself.

Electrical Sound
For sound traveling in air, the model discussed above indicates that measuring the air pressure at any point would allow sound waves to be detected as they pass through that point. This is the basis of the operation of the microphone. To record sound, microphones convert changes in air pressure into an electrical signal by means of a light diaphragm connected to an electromagnetic transducer. The diaphragm moves in sympathy with the oncoming compressions and rarefactions, and hence induces an electrical current. This signal can be considered an electrical representation of the changing air pressure over time as measured by the microphone, and may be stored with either an analog or a digital recorder. [Figure 1]

Electro-Mechanical Reproduction Of Sound
Electrical "sound" signals can be converted back into air pressure fluctuations by using the microphone principle in reverse. A fluctuating current, usually a heavily amplified version of the signal derived from the microphone's own transducer, is fed to the loudspeaker coil. This current induces a push-pull motion in the cone, which in turn causes air compressions and rarefactions to radiate from the cone surface in sympathy with the variations of the electrical current. Loudspeaker designs vary; for example, the cone may be replaced with a diaphragm which moves between two highly charged electrodes, as in the electrostatic loudspeaker. However, the principle of forcing air to move in sympathy with a continuously varying electrical signal is common to all popular loudspeakers.

Recording Sounds
An Analog Recorder - As its name implies an analog recording of a signal is one which tries to be analogous or similar to the signal itself. In the case of sound the easiest measurement is that of air pressure. Microphones convert the continuous air pressure changes, which result from the passage of sound, into time varying electrical signals which are then faithfully recorded. Hence these electrical recordings are an "analog" of the classical sound wave traveling through air. A unique feature of analog recordings is the infinite number of one-to-one correspondences between the recorded version of the signal and the signal itself, reflecting the real world situation in which both air pressure and time are continuous parameters. [Figure 1]

A Numerical or Digital Audio Recorder - A generalized numerical recorder tries to periodically record discrete numerical values of the measured signal as opposed to a continuous recording of the pressure changes. [Figure 1]


To begin with, the audio signal must be input to the recorder as a time varying electrical signal, typically a fluctuating voltage, whose amplitude and polarity relate to the instantaneous air pressure at the point of measurement. The amplitude of this signal is measured, or sampled, at a rate of at least twice the maximum frequency contained in the audio signal (the Nyquist rate), typically 44.1 kHz for Compact Discs or 48 kHz for professional audio applications. Each sample is assigned a binary number representing the measured voltage at that instant. In digital audio recorders it is this binary number which is stored, rather than the actual electrical signal. Obviously this representation of audio is useless for human consumption and must be converted back to a continuous time varying "analog" electrical signal prior to listening.

The most popular method of digitizing the analog signal is called linear pulse code modulation, or linear PCM, and it forms the basis of most audio analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) on the market today.
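
The two steps of linear PCM, sampling at regular instants and rounding each measurement to a binary code, can be sketched in a few lines of Python. The mapping of an input range of -1.0 to +1.0 onto codes of ±32767 is an assumed convention for illustration, and the anti-alias filter that a real converter needs ahead of the sampler is omitted.

```python
import math


def linear_pcm(signal, duration_s, sample_rate_hz=44_100, bits=16):
    """Sample a continuous-time signal (a function of t in seconds returning
    values in -1.0..+1.0) at regular intervals, then round each value to the
    nearest signed integer code of the given word length."""
    full_scale = 2 ** (bits - 1) - 1          # 32767 for 16-bit
    n_samples = int(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                # sampling: discrete instants
        v = max(-1.0, min(1.0, signal(t)))    # clip to the converter's range
        codes.append(round(v * full_scale))   # quantization: discrete levels
    return codes


def tone(t):
    """A 1 kHz sine at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 1000 * t)


codes = linear_pcm(tone, 0.001)   # 1 ms at 44.1 kHz is 44 samples
print(len(codes), codes[:6])
```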

The performance of an ideal digital audio recorder is governed by just two parameters, the sampling rate and the sample resolution. The sampling rate defines the maximum audio frequency that can be recorded, and the sample resolution defines the accuracy of the recorded voltage fluctuations. There is clearly a finite number of one-to-one correspondences between the numerical recording of the analog audio signal and the signal itself. Any discrepancy between the original signal and the recorded number either in time or magnitude is an error that cannot be removed.

For example, consider the binary number assigned to each voltage measurement in the PCM process. The resolution, or accuracy, of the measurement is limited by the size of the binary word. Typically a 16-bit number is used, giving a range of -32768 to +32767. For a 1 volt peak input analog signal, a properly scaled 16-bit number can resolve only to steps of 1/32767 of a volt. Because the 16-bit number cannot represent voltages which lie between these steps, the measurement is effectively rounded or truncated, i.e. the voltage is now discrete. This loss in precision manifests itself as a slight increase in background noise during playback. Generally speaking, shorter binary words generate higher levels of noise or distortion, whereas long word lengths, on the order of 21 to 22 bits, exhibit such low noise floors that the dynamic range of these systems can actually exceed that of the human ear. Clearly then, the finite size of the binary number used for the PCM representation determines the degree of information lost during the conversion process, which in turn determines the fidelity of the reproduced audio.
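
The link between word length and noise is usually summarized by the textbook rule that an ideal N-bit quantizer driven by a full-scale sine wave achieves a signal-to-noise ratio of about 6.02N + 1.76 dB. This Python sketch checks that rule by actually rounding a sine wave; the 997 Hz test frequency and 48 kHz rate are arbitrary choices that spread the rounding error evenly.

```python
import math


def theoretical_snr_db(bits: int) -> float:
    """Textbook SNR of an ideal N-bit quantizer for a full-scale sine wave."""
    return 6.02 * bits + 1.76


def measured_snr_db(bits: int, freq_hz=997, sample_rate_hz=48_000, n=48_000) -> float:
    """Quantize a full-scale sine to `bits` bits and measure signal-to-error power."""
    amp = 2 ** (bits - 1) - 1
    sig_pow = err_pow = 0.0
    for k in range(n):
        x = amp * math.sin(2 * math.pi * freq_hz * k / sample_rate_hz)
        e = round(x) - x                  # rounding error, within half a step
        sig_pow += x * x
        err_pow += e * e
    return 10 * math.log10(sig_pow / err_pow)


for b in (8, 16, 20, 22):
    print(f"{b:>2} bits: theory {theoretical_snr_db(b):6.2f} dB, "
          f"measured {measured_snr_db(b):6.2f} dB")
```

At 16 bits the figure is about 98 dB; at 20 to 22 bits it rises to roughly 122 to 134 dB, which is the sense in which such word lengths can exceed the dynamic range of the ear.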

Consider also the rate at which the voltage fluctuations are sampled, or measured, during the PCM process. The sampling frequency must be at least twice the maximum frequency which one wishes to capture from the original analog signal. As mentioned earlier, the Compact Disc samples at 44.1 kHz, giving a theoretical bandwidth of 22.05 kHz for the reproduced audio signal. Since the upper frequency limit of human auditory perception approaches 20 kHz, the sampling rate of PCM must clearly exceed 40 kHz if it is not to limit the audible bandwidth of the signal. Because PCM measures the signal only at discrete points in time, any information in the original analog signal at frequencies above half the sampling rate is lost forever.
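
What happens to content above half the sampling rate, if it is not filtered out first, can be shown directly. In this Python sketch a 25 kHz tone sampled at 44.1 kHz produces exactly the same sample values as an inverted 19.1 kHz tone; once sampled, the two are indistinguishable, which is why an anti-alias filter must remove such frequencies before conversion.

```python
import math


def alias_frequency(f_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a tone of f_hz appears after sampling without an
    anti-alias filter (folded into the 0 .. fs/2 band)."""
    return abs(f_hz - sample_rate_hz * round(f_hz / sample_rate_hz))


fs = 44_100
print(alias_frequency(25_000, fs))   # 19100: a 25 kHz tone masquerades as 19.1 kHz

# The sample values really are the same, apart from a sign flip.
hi = [math.sin(2 * math.pi * 25_000 * n / fs) for n in range(64)]
lo = [math.sin(2 * math.pi * 19_100 * n / fs) for n in range(64)]
print(max(abs(a + b) for a, b in zip(hi, lo)))  # tiny: floating-point rounding only
```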

The digitization of an analog signal using linear PCM is of fundamental importance to most digital audio data reduction systems, since these numerical values form the raw data on which compression systems operate. As a result, the success or failure of a compression system is measured by comparing its reproduction fidelity against that of the original PCM digital audio signal.

Operational Similarities And Differences
Analog and digital audio recorders are essentially similar in that both can record a good representation of electrical audio signals. Digital machines are in the ascendancy due to their superior performance at low cost, and this trend is expected to continue with the introduction of multichannel 20-bit resolution recorders.

The following are some of the salient points regarding the performance of analog and digital PCM audio recorders.

Both analog and digital machines depend on an accurate and stable time reference, both in recording and in playback. The analog system has a mechanical motor, defects in which may cause wow and flutter; the digital system uses a crystal-generated electronic clock, defects in which may cause wideband noise modulation.
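
Clock timing errors (usually called jitter) turn into amplitude errors in proportion to how fast the signal is changing, so their effect grows with signal frequency. A widely quoted approximation for random, uncorrelated jitter on a full-scale sine gives an upper bound on SNR of -20 log10(2π f t_j). This Python sketch evaluates it; the jitter values are illustrative assumptions, not figures from this article.

```python
import math


def jitter_limited_snr_db(freq_hz: float, jitter_rms_s: float) -> float:
    """Upper bound on SNR for a full-scale sine sampled with random clock
    jitter of the given RMS value: SNR = -20*log10(2*pi*f*t_j)."""
    return -20 * math.log10(2 * math.pi * freq_hz * jitter_rms_s)


for tj in (1e-9, 200e-12):
    print(f"{tj * 1e12:.0f} ps rms at 20 kHz: "
          f"{jitter_limited_snr_db(20_000, tj):.1f} dB")
```

With 1 ns of random jitter, a 20 kHz tone is limited to about 78 dB, well short of the 98 dB a 16-bit word can otherwise deliver.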

Analog recorders have a wideband noise floor that, at low levels, is independent of the signal. In a digital recording, by contrast, the quantization error at low levels becomes correlated with the signal, because the signal may become trapped between two binary numbers; the result is distortion rather than benign noise. Fortunately this problem can be eliminated, at the cost of a slightly higher noise floor, by adding appropriate dither (a small amount of random noise) to the signal.
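
The effect of dither is easy to demonstrate. This Python sketch quantizes a sine wave only 0.4 of a step (LSB) in amplitude; the signal level, the triangular (TPDF) dither and the random seed are assumptions chosen for illustration. Without dither every sample rounds to zero and the tone vanishes; with dither added before rounding, the tone survives on average, carried within a noise floor whose level does not depend on the signal.

```python
import math
import random


def quantize(x: float) -> int:
    """Round to the nearest integer step (1 LSB)."""
    return round(x)


def tpdf_dither(rng: random.Random) -> float:
    """Triangular-PDF dither spanning ±1 LSB (difference of two uniform values)."""
    return rng.random() - rng.random()


def recovered_amplitude(codes, freq_hz, fs):
    """Estimate the amplitude of a sine of known frequency by correlation."""
    n = len(codes)
    return 2 / n * sum(c * math.sin(2 * math.pi * freq_hz * k / fs)
                       for k, c in enumerate(codes))


fs, f, n = 48_000, 1_000, 48_000
signal = [0.4 * math.sin(2 * math.pi * f * k / fs) for k in range(n)]  # 0.4 LSB peak
rng = random.Random(0)
plain = [quantize(x) for x in signal]
dithered = [quantize(x + tpdf_dither(rng)) for x in signal]

print("undithered:", recovered_amplitude(plain, f, fs))     # 0.0: the tone is gone
print("dithered:  ", recovered_amplitude(dithered, f, fs))  # close to 0.4
```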

At high signal levels, analog recording media normally saturate in a non-linear but audibly innocuous manner. Engineers can operate analog recorders at high signal levels knowing that occasional excursions into the "red" will not be heard. Digital recorders, on the other hand, have zero tolerance for any signal that exceeds the "clipping" level, which creates large amounts of very audible distortion. For this reason digital recorders must be set so that the peak input signal never exceeds the clip level, forcing engineers to leave generous headroom and so to operate, most of the time, well below the full available resolution. For example, a 16-bit recorder may normally be using only 12 or 13 bits of resolution, at which point its signal-to-noise ratio is similar to that of a conventional analog recorder. This is a powerful reason for moving to recorders with at least 20-bit resolution.
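
The 12-to-13-bit figure follows from the roughly 6 dB-per-bit rule: every 6.02 dB by which typical peaks sit below clipping wastes about one bit. This Python sketch assumes, for illustration only, about 20 dB of headroom; the article does not give a specific headroom figure.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB


def effective_bits(word_length: int, headroom_db: float) -> float:
    """Resolution actually in use when typical peaks sit headroom_db below clipping."""
    return word_length - headroom_db / DB_PER_BIT


for wl in (16, 20):
    print(f"{wl}-bit with 20 dB headroom: about {effective_bits(wl, 20):.1f} bits in use")
```

The same headroom leaves a 20-bit recorder with more than 16 bits in use, better than a fully exercised 16-bit machine.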



Hearing Sounds
The ability of humans to detect sound is determined by both the ear and the brain. Hearing can be considered as an interaction between hardware and software, the hardware being the ear and the software running in the brain. This ear/brain interaction is commonly described as psychoacoustics, a term used extensively in audio compression literature.

Essentially, the ear converts the movement of air molecules, which make up the sound waves, into electrical impulses. However, unlike the output of a microphone, which follows pressure changes over time, these impulses are in effect in the frequency domain. The ear thus acts as an electro-mechanical filter bank, transforming time domain sound waves into electrical signals associated with many different frequencies. Humans "hear" frequencies; microphones "record" pressure changes. The ear also appears to be a very imperfect filter (in comparison to modern digital filters), introducing level-dependent harmonic distortion products even when stimulated with pure tones.

The actual "software" processing by the brain of these frequency-related electrical signals from the ear "hardware" is not well understood. Nevertheless, an important aspect of audio compression algorithms is that they try to mimic some of the basic psychoacoustic hearing processes. These include the variation of the threshold of hearing with frequency [Figure 2], the differential perception of signal levels, the masking of quiet, low-level signals by louder, higher-level signals, and critical band noise masking.
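
The variable threshold of hearing is often modeled in perceptual-coding literature with a curve fit due to Terhardt. This Python sketch evaluates that fit; it is offered only as an illustration, and we do not know whether Figure 2 or Coherent Acoustics uses this particular approximation.

```python
import math


def threshold_in_quiet_db_spl(freq_hz: float) -> float:
    """Terhardt's approximation to the absolute threshold of hearing (dB SPL)."""
    f = freq_hz / 1000.0  # kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


for hz in (50, 100, 1_000, 3_300, 10_000, 15_000):
    print(f"{hz:>6} Hz: {threshold_in_quiet_db_spl(hz):6.1f} dB SPL")
```

The curve dips below 0 dB SPL in the 3 to 4 kHz region, where the ear is most sensitive, and rises steeply at both ends of the audio band.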

It is important to realize that the representation of sound as a sum of frequencies is just as valid as the more usual representation of sound as a linear series of pressure measurements at a single point, and in many cases may be more useful. The new audio compression algorithms typically operate in the frequency domain, and are thus more easily identified with the process of hearing sound as opposed to the process of either transmitting or recording sound. [Figure 3]
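
The equivalence of the two representations can be demonstrated with a discrete Fourier transform (DFT), the textbook way of turning a block of time-domain samples into frequency components. This naive Python DFT is for illustration only; it is not the filter bank used by Coherent Acoustics or by the ear. A 64-sample frame containing two tones is reduced to exactly two strong frequency components.

```python
import cmath
import math


def dft_magnitudes(x):
    """Naive discrete Fourier transform; returns |X[k]| for k = 0 .. N/2."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


N = 64
# Time-domain "pressure samples": two tones, in bins 4 and 10 of a 64-point frame.
x = [math.sin(2 * math.pi * 4 * t / N) + 0.5 * math.sin(2 * math.pi * 10 * t / N)
     for t in range(N)]
mags = dft_magnitudes(x)
peaks = sorted(range(len(mags)), key=mags.__getitem__, reverse=True)[:2]
print(sorted(peaks), [round(mags[k], 3) for k in sorted(peaks)])
```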

Psychoacoustics And PCM Digital Audio
One fundamental drawback of current PCM digital audio recorders is that all frequencies are recorded with equal importance. This is a direct consequence of the fact that PCM operates in the time domain, i.e. the binary numbers represent a time series of sampled voltages. As a result, every frequency component of that time series is represented with the same resolution. For example, in theory the signal-to-noise ratio (SNR or S/N) of PCM digital audio at 20 kHz is identical to the SNR at 100 Hz. This uniformity may be desirable if the purpose of the recording is to archive a sound for future processing, or if the audio needs to be easily manipulated.

However, for consumer playback media, PCM ignores the frequency-dependent sensitivity of human hearing and typically gives far too much weight to high frequencies (above 12 kHz) in comparison to low frequencies (below 4 kHz). For example, during reproduction it is not necessary for a 20 kHz harmonic to have a resolution of 16 bits, but it may be important for a 3 kHz fundamental tone to have a resolution greater than 16 bits.

This problem can alternatively be viewed in terms of a concept known as rate-distortion. This theory attempts to calculate the amount of data needed to represent a signal at a predetermined level of distortion. When coupled with psychoacoustics, it shows that PCM operates at data rates far in excess of those required to maintain inaudible levels of distortion. As mentioned above, the main reason for this mismatch is that PCM digital audio operates in the time domain, whereas humans perceive distortion in the frequency domain. More importantly, this mismatch opens the door to the development of more efficient representations of digital audio which model more closely the way we hear, in order to optimize the distribution of data across the audio frequency components. The net effect of this approach is both to reduce the overall data rate compared to 16-bit PCM and to improve the quality of the audio at those frequencies to which the ear is most sensitive.
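
Distributing data across frequency according to audibility is commonly implemented as a bit-allocation loop. This Python sketch shows a generic greedy allocator of the kind described in perceptual-coding literature: each additional bit lowers a band's quantization noise by about 6.02 dB, and bits go wherever the noise is currently most audible relative to the masking threshold. The signal-to-mask ratios are invented inputs, and this is not the Coherent Acoustics allocation strategy, which this article does not describe.

```python
import math


def allocate_bits(smr_db, total_bits, max_bits=16):
    """Greedy allocation: repeatedly give one more bit to the band whose
    noise-to-mask ratio (signal-to-mask ratio minus ~6.02 dB per allocated
    bit) is highest, i.e. whose quantization noise is most audible."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        nmr = [s - 6.02 * b if b < max_bits else -math.inf
               for s, b in zip(smr_db, bits)]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] == -math.inf:   # every band is already at max_bits
            break
        bits[worst] += 1
    return bits


# Hypothetical signal-to-mask ratios (dB) for four frequency bands.
print(allocate_bits([30.0, 20.0, 5.0, -10.0], total_bits=10))  # [5, 4, 1, 0]
```

The fully masked band receives no bits at all, while the band with the least masking receives the most, the opposite of PCM's uniform treatment.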

For example, using the Coherent Acoustics algorithm developed by DTS Technology, it is now possible to replace the two channel 16-bit PCM tracks on a Compact Disc with six discrete channels of 20-bit source encoded compressed audio data, any single channel of which is measurably and audibly superior to either of the 16-bit PCM tracks. This is a clear example of the power of the rate-distortion theory.
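
Some rough arithmetic shows what this implies. Assuming, for illustration, that the six channels are sampled at the CD rate of 44.1 kHz (the article does not state the rate) and that the coded stream must fit within the data rate of the CD's two 16-bit PCM tracks, this Python sketch computes the average reduction the encoder must achieve relative to 20-bit PCM.

```python
def pcm_bit_rate(sample_rate_hz: int, bits: int, channels: int) -> int:
    """Raw linear PCM data rate in bits per second."""
    return sample_rate_hz * bits * channels


cd_payload = pcm_bit_rate(44_100, 16, 2)      # the two CD tracks
six_ch_20bit = pcm_bit_rate(44_100, 20, 6)    # assumed 44.1 kHz sampling
ratio = six_ch_20bit / cd_payload
print(f"{six_ch_20bit:,} bit/s into {cd_payload:,} bit/s: {ratio:.2f}:1")
```

Under these assumptions the answer is 3.75:1, a modest ratio by the standards of lossy audio coding.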

Digital Audio And Its Real World Limitations
Nevertheless, while rate-distortion may point to inefficiencies in PCM digital audio for storage and consumer delivery, the redundancies within PCM do allow for the convenient and easy manipulation of the audio signals. This would include editing, equalization, flanging and mixing. None of these processes are straightforward operations if the digital audio is in a compressed format.

Today's digital audio recorders exhibit obvious advantages over their analog counterparts. However, at 16-bit resolution the performance of a digital recorder is only marginally better than that of an analog machine when all aspects of the media are taken into account. For example, operational difficulties often require a digital system to work well below its full dynamic range. Furthermore, the current standard sampling frequencies are perhaps the bare minimum that should be used to reproduce the full frequency range of the human auditory system. For reasons relating to the band limiting required prior to conversion to digital in the PCM process, it is generally felt that the overall sound quality of digital audio could be improved if the sampling rates were increased.
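
One way to see the band-limiting issue is to compute how much room the anti-alias filter has to roll off between the top of the audio band and half the sampling rate. A narrower transition band demands a steeper filter. This Python sketch assumes a 20 kHz passband edge as the design target.

```python
def transition_band_hz(sample_rate_hz: float, passband_hz: float = 20_000) -> float:
    """Room for the anti-alias filter to roll off between the top of the
    audio band and the Nyquist frequency (half the sampling rate)."""
    return sample_rate_hz / 2 - passband_hz


for fs in (44_100, 48_000, 96_000):
    print(f"{fs} Hz sampling: {transition_band_hz(fs):,.0f} Hz transition band")
```

At 44.1 kHz the filter has only about 2 kHz in which to go from passing 20 kHz to rejecting everything above 22.05 kHz; at 96 kHz it has 28 kHz.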

For these reasons recent professional digital audio recorders now operate at up to 20-bit resolution, and proposals are under review to increase the sampling rate from 48 kHz to 96 kHz, (and possibly 192 kHz), with up to a 24-bit sample resolution, a specification which might end the debate of analog versus digital for the recording of high quality audio. [Figure 4] All of this places even greater demands on equipment that handles the storage and reproduction of PCM digital audio signals, while simultaneously enhancing the position of high quality compression systems such as DTS Coherent Acoustics which can deliver these signals to the consumer.

Conclusion
From this first article the reasons for the coexistence of digital and analog recording and playback systems should be clear. Analog, while under threat as the traditional recording method, still cannot be circumvented: it is the means by which sound is generated and heard in the world today, and that is unlikely to change in the near future. As a recording medium, analog has some limitations, primarily susceptibility to degradation, poor repeatability and, more acutely, poor signal-to-noise performance. A dramatic improvement in any of these areas could easily revive analog as the preferred recording format.

Digital, on the other hand, is by definition repeatable, since it is simply a string of numerical voltage measurements. Such numbers cannot age, and since no timing information is conveyed in the numbers, they can be repeatedly stored or transmitted at will without fear of degradation. This is in stark contrast to analog systems, where every generational copy introduces more and more distortion and noise.
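
A toy simulation makes the contrast concrete. This Python sketch models each analog dub, purely as an assumption, as adding fresh independent Gaussian noise at a fixed level; because the noise powers add, ten generations cost about 10 dB of SNR. A digital copy, by contrast, reproduces the numbers exactly.

```python
import math
import random


def snr_db(reference, copy):
    """Signal-to-error ratio of a copy relative to the original samples."""
    sig = sum(r * r for r in reference)
    err = sum((c - r) ** 2 for r, c in zip(reference, copy))
    return math.inf if err == 0 else 10 * math.log10(sig / err)


def analog_copy(samples, rng, noise_rms=1e-3):
    """Toy model of one analog dub: the input plus fresh Gaussian noise."""
    return [s + rng.gauss(0.0, noise_rms) for s in samples]


fs = 48_000
original = [math.sin(2 * math.pi * 1_000 * n / fs) for n in range(fs)]
rng = random.Random(42)

analog, snrs = original, {}
for gen in range(1, 11):
    analog = analog_copy(analog, rng)
    snrs[gen] = snr_db(original, analog)
print(f"analog: {snrs[1]:.1f} dB after one copy, {snrs[10]:.1f} dB after ten")

digital = original
for _ in range(10):
    digital = list(digital)  # a digital copy reproduces every number exactly
print("digital after ten copies:", snr_db(original, digital))  # inf
```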

Digital audio, however, does have one large disadvantage compared to analog. As a result of its incompatibility with the processes by which the human ear perceives distortion, PCM produces vast amounts of irrelevant data during the conversion to digital. This leads to certain difficulties when it is necessary to store or transmit the digital PCM data. High quality analog music signals have been broadcast for 20 years over FM radio, but there is still no equivalent digital music broadcasting system. Furthermore no one is considering using linear PCM for broadcasting music since it is considered very wasteful of limited broadcasting bandwidth resources.

A similar conclusion can be reached in the reproduction of music. When compared to the original analog signal, conventional consumer PCM systems require a bandwidth over 20 times greater than that required by analog systems for reproduction, rising to anywhere from 30 to 60 times for the proposed higher performance formats.

The limitations on digital audio imposed by the PCM standard have restricted the proliferation of this format much beyond CDs. The new ultra hi-fi standards and discrete multichannel formats will be very expensive for the consumer if the PCM standard is adopted. It is our contention that by a suitable choice of audio compression algorithm, such as DTS Coherent Acoustics, digital audio could become more widely available, and that these new and potentially exciting audio formats could be economically delivered to the consumer using existing technology. On the other hand, many audiophiles would contend that audio compression by definition imposes limitations on the fidelity of the audio that should not be entertained. The next article will try to allay these fears by explaining the steps involved in compressing audio, with an assessment of the likely loss of information in the process.

Mike Smyth and Stephen Smyth are principals in AlgoRhythmic Technology. Stephen Smyth is the designer of the DTS algorithm both for the DTS theatrical system and the distinctly different DTS Coherent Acoustics consumer/professional system. DTS Technology is a joint venture between Digital Theater Systems, AlgoRhythmic Technology, Steven Spielberg and Universal/MCA.

Widescreen Review® Magazine
27645 Commerce Center Drive
Temecula, CA 92590
Phone: 951 676 4914 • Fax: 951 693 2960

Copyright © 1995 - 2005 www.WidescreenReview.com
All Rights Reserved