Audible signals such as speech and music are acoustic analog waveforms, pressure changes that propagate through a medium such as air or water. The waveforms are created by a vibrating source such as a loudspeaker or musical instrument and detected by a receptor such as a microphone diaphragm or eardrum. An example of a simple waveform is a pure tone, a periodic signal that repeats many times per second. The number of repetitions per second is its frequency and is measured in Hertz (Hz). Audible tones are typically in a range from 20 to 20,000 Hz, which is referred to as the bandwidth of audible signals. The tone will create a sound pressure displacement that is related to its amplitude.
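The notion of a pure tone can be made concrete with a short sketch: sampling a sine wave of a given frequency at a fixed rate. The function name `pure_tone` and the chosen 440 Hz frequency are illustrative, not from the text.

```python
import math

def pure_tone(freq_hz, duration_s, sample_rate_hz=44100, amplitude=1.0):
    """Sample a pure tone: a sine wave repeating freq_hz times per second."""
    n = int(duration_s * sample_rate_hz)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n)]

# A 440 Hz tone lies well inside the 20 Hz - 20 kHz audible bandwidth;
# at a 44.1 kHz sampling rate one period spans about 44100/440 ≈ 100 samples.
tone = pure_tone(440, 0.01)
```

The `amplitude` parameter corresponds to the sound pressure displacement mentioned above: scaling it scales every sample of the waveform proportionally.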
Speech and Audio Quality Assessment
The quality assessment of lossy speech and audio compression algorithms is a complicated issue. Quite often we want to assess not only the quality of the compression algorithms themselves, but also the quality they deliver in a typical operating scenario, which means including other aspects of the delivery chain as well, such as the quality of the network or the quality and type of playback equipment. Because the coders use perceptual techniques, it is important to use human listeners. Even then, care has to be taken to obtain reliable and reproducible results.
Speech Coding Standards
For communication purposes, it is important to establish standards to guarantee interoperability between equipment from different vendors, or between telecommunication services in different geographic areas. Telecommunication standards are set by different standard bodies, which typically govern different fields of use.
Audio Coding Techniques
The most common high-quality audio distribution format is based on the compact disc format introduced in the early 1980s. The signal is encoded using PCM with 16 bits/sample using a 44.1-kHz sampling rate. For stereo signals this means a data rate of 44,100 × 16 × 2 = 1.41 Mb/s. More recent formats such as DVD-audio support up to 24 bits/sample, multichannel formats (e.g., 5.1), and sampling rates up to 192 kHz, resulting in even higher data rates. For most practical purposes these signals will be used as digitized source signals.
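The data rates quoted above follow directly from multiplying sampling rate, sample depth, and channel count. A small sketch (the function name `pcm_bit_rate` is illustrative) reproduces the arithmetic:

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# Compact disc: 44.1 kHz, 16 bits/sample, stereo.
cd = pcm_bit_rate(44100, 16, 2)       # 1,411,200 b/s, i.e. about 1.41 Mb/s

# A DVD-audio configuration: 192 kHz, 24 bits/sample, 5.1 (six channels).
dvd = pcm_bit_rate(192000, 24, 6)     # 27,648,000 b/s, roughly 20x the CD rate
```

The comparison makes clear why the higher-resolution formats put so much more pressure on storage and transmission, and hence on compression.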
Two popular Internet applications of speech and audio compression are telephony and streaming. In both cases, the IP data network is used to transport digitized (and compressed) audio signals. The basic protocol is IP (the Internet Protocol), which delivers packets between machines; higher protocol layers establish sessions between applications. For most data applications the next protocol layer is TCP (Transmission Control Protocol), which guarantees a reliable connection.
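The layering described above can be sketched with a minimal loopback example: TCP, running on top of IP, gives two applications a reliable byte stream over which an audio frame can be carried. The echo server, the loopback address, and the placeholder payload `b"audio-frame"` are all illustrative assumptions.

```python
import socket
import threading

# Server side: listen on a loopback TCP socket (OS assigns a free port).
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

def serve():
    # Accept one connection and echo the received bytes back.
    conn, _ = srv.accept()
    conn.sendall(conn.recv(1024))
    conn.close()

threading.Thread(target=serve, daemon=True).start()

# Client side: TCP delivers the payload reliably and in order over IP.
cli = socket.create_connection(("127.0.0.1", port))
cli.sendall(b"audio-frame")
echoed = cli.recv(1024)
cli.close()
srv.close()
```

TCP's reliability comes from retransmitting lost packets, which adds delay; that trade-off is one reason real-time telephony often favors protocols that tolerate loss instead.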
It should be clear that, especially at the lower bit rates, there is no single coder that is good for all applications, and that careful tailoring toward the application is important. Speech and audio coding is a mature field, with many available solutions. Based on our current knowledge and the various constraints that exist, it is expected that future developments in this field will focus less on compression efficiency and more on application-specific issues, such as scalability, error robustness, delay, and complexity.