Title: Media Representations Audio Fall 2005
Slide 1: CMPT 365 Multimedia Systems
Media Representations - Audio, Fall 2005
Slide 2: Outline
- Audio Signals
- Sampling
- Quantization
- Audio file format
- WAV/MIDI
- Human auditory system
Slide 3: What is Sound?
- Sound is a wave phenomenon, involving molecules of air being compressed and expanded under the action of some physical device.
- A speaker in an audio system vibrates back and forth and produces a longitudinal pressure wave that we perceive as sound.
- Since sound is a pressure wave, it takes on continuous values, as opposed to digitized ones.
- If we wish to use a digital version of sound waves, we must form digitized representations of audio information.
Slide 4: Digitization
- Digitization means conversion to a stream of numbers, and preferably these numbers should be integers for efficiency.
- Sound is 1-dimensional in nature: amplitude values depend on a 1D variable, time.
Slide 5: Digitization (cont'd)
- Digitization must be in both time and amplitude.
- Sampling means measuring the quantity we are interested in, usually at evenly spaced intervals.
- The first kind of sampling, using measurements only at evenly spaced time intervals, is simply called sampling. The rate at which it is performed is called the sampling frequency.
- For audio, typical sampling rates are from 8 kHz (8,000 samples per second) to 48 kHz. This range is determined by the Nyquist theorem, discussed later.
- Sampling in the amplitude or voltage dimension is called quantization.
Slide 6: Sampling and Quantization
Slide 7: Audio Digitization (PCM)
PCM: pulse code modulation
Slide 8: Parameters in Digitizing
- To decide how to digitize audio data, we need to answer the following questions:
- 1. What is the sampling rate?
- 2. How finely is the data to be quantized, and is quantization uniform?
- 3. How is audio data formatted? (file format)
Slide 9: Sampling Rate
- Signals can be decomposed into a sum of sinusoids.
- Weighted sinusoids can build up quite complex signals.
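As an illustration of how weighted sinusoids build up a more complex signal, here is a minimal Python sketch. It assumes the standard Fourier series of a square wave (odd harmonics with amplitude 4/(pi*k)); that choice of signal, and the function names, are illustrative and not from the slides.

```python
import math

def weighted_sinusoid_sum(t, components):
    """Evaluate sum of amplitude * sin(2*pi*frequency*t) at time t (seconds).

    components is a list of (amplitude, frequency_hz) pairs.
    """
    return sum(a * math.sin(2 * math.pi * f * t) for a, f in components)

def square_wave_components(fundamental_hz, n_terms):
    """Weights for a square wave: odd harmonics k with amplitude 4/(pi*k)."""
    return [(4 / (math.pi * k), k * fundamental_hz)
            for k in range(1, 2 * n_terms, 2)]

# Build a 1 Hz square-wave approximation from 200 weighted sinusoids and
# evaluate it a quarter of the way through its period, where the ideal
# square wave sits at +1.
components = square_wave_components(fundamental_hz=1.0, n_terms=200)
value_at_quarter_period = weighted_sinusoid_sum(0.25, components)
```

With only a handful of terms the sum looks like a rippled sine; adding more harmonics sharpens the edges, which is the sense in which simple sinusoids can represent complicated waveforms.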
Slide 10: Sampling Rate (cont'd)
- If the sampling rate just equals the actual frequency:
  - a false signal (a constant) is detected.
- If we sample at 1.5 times the actual frequency:
  - we obtain an incorrect (alias) frequency that is lower than the correct one.
  - It is half the correct one: the wavelength, from peak to peak, is double that of the actual signal.
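The two cases above can be checked numerically. The following sketch assumes a pure sine tone and uses the usual frequency-folding rule for sampled tones; the function names and the 1,000 Hz test tone are illustrative choices, not from the slides.

```python
import math

def alias_frequency(signal_hz, sampling_hz):
    """Apparent frequency of a pure tone after sampling at sampling_hz."""
    # Fold the true frequency into the range [0, sampling_hz / 2].
    remainder = signal_hz % sampling_hz
    return min(remainder, sampling_hz - remainder)

def sample_tone(signal_hz, sampling_hz, n_samples):
    """Samples of sin(2*pi*signal_hz*t) taken at t = n / sampling_hz."""
    return [math.sin(2 * math.pi * signal_hz * n / sampling_hz)
            for n in range(n_samples)]

tone_hz = 1000.0
same_rate = alias_frequency(tone_hz, 1000.0)      # sampling at the tone frequency
one_and_half = alias_frequency(tone_hz, 1500.0)   # sampling at 1.5x the tone frequency
twice = alias_frequency(tone_hz, 2000.0)          # sampling at 2x the tone frequency
```

Sampling at the tone frequency folds the tone down to 0 Hz (every sample lands on the same point of the wave), sampling at 1.5 times the tone frequency folds it to half the true frequency, and only at twice the tone frequency does the apparent frequency match the real one.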
Slide 11: Nyquist Theorem
- For correct sampling we must use a sampling rate equal to at least twice the maximum frequency content in the signal. This rate is called the Nyquist rate.
- Sampling theory: the Nyquist theorem
  - If a signal is band-limited, i.e., there is a lower limit f1 and an upper limit f2 of frequency components in the signal, then the sampling rate should be at least 2(f2 - f1).
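The two rules on this slide reduce to simple arithmetic. The sketch below uses hypothetical helper names; the 20 kHz and 4 kHz example values are the hearing and speech limits quoted later in these slides.

```python
def nyquist_rate(max_frequency_hz):
    """Minimum sampling rate for a signal whose highest component is max_frequency_hz."""
    return 2 * max_frequency_hz

def band_limited_rate(f1_hz, f2_hz):
    """Minimum sampling rate for a signal confined to the band [f1_hz, f2_hz]."""
    return 2 * (f2_hz - f1_hz)

def is_adequately_sampled(sampling_hz, max_frequency_hz):
    """True when sampling_hz meets or exceeds the Nyquist rate."""
    return sampling_hz >= nyquist_rate(max_frequency_hz)

music_rate = nyquist_rate(20_000)    # upper limit of human hearing
speech_rate = nyquist_rate(4_000)    # upper limit of telephone-quality speech
cd_ok = is_adequately_sampled(44_100, 20_000)
```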
Slide 12: Quantization (Pulse Code Modulation)
- At every time interval the sound is converted to a digital equivalent
- Using 2 bits the following sound can be digitized
- Telephone: 8 bits
- CD: 16 bits
Slide 13: Digitize audio
- Each sample is quantized, i.e., rounded
  - e.g., 2^8 = 256 possible quantized values
- Each quantized value is represented by bits
  - 8 bits for 256 values
- Example: 8,000 samples/sec, 256 quantized values --> 64,000 bps
- Receiver converts it back to an analog signal
  - some quality reduction
- Example rates
  - CD: 1.411 Mbps
  - MP3: 96, 128, 160 kbps
  - Internet telephony: 5.3 - 13 kbps
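The uncompressed rates above follow from multiplying sampling rate, bits per sample, and channel count. A minimal sketch (the function name is illustrative; the CD parameters of 44.1 kHz, 16 bits, and 2 channels are the ones given later in these slides):

```python
def pcm_bit_rate(sampling_rate_hz, bits_per_sample, channels=1):
    """Bit rate in bits per second of uncompressed PCM audio."""
    return sampling_rate_hz * bits_per_sample * channels

telephone_bps = pcm_bit_rate(8_000, 8)            # 8,000 samples/sec, 256 levels
cd_bps = pcm_bit_rate(44_100, 16, channels=2)     # stereo CD audio
```

The MP3 and Internet telephony figures are much lower than this product would suggest because those formats compress the signal rather than storing raw PCM samples.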
Slide 14: Audio Quality vs. Data Rate
Slide 15: More on Quantization
- Quantization is lossy
- Roundoff errors: quantization noise/error
Slide 16: Quantization Noise
- Quantization noise: the difference between the actual value of the analog signal, for the particular sampling time, and the nearest quantization interval value.
- At most, this error can be as much as half of the interval.
- The quality of the quantization is characterized by the Signal to Quantization Noise Ratio (SQNR).
- A special case of SNR (Signal to Noise Ratio)
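To make the half-interval bound concrete, here is a sketch of a uniform quantizer. It assumes samples normalized to the range [-1, 1] and reconstruction at the midpoint of each interval; both are illustrative assumptions, since the slides do not specify a particular quantizer.

```python
def quantize(x, n_bits, lo=-1.0, hi=1.0):
    """Uniformly quantize x in [lo, hi] to one of 2**n_bits levels.

    Returns (index, value): the integer code and the reconstructed
    amplitude at the midpoint of the chosen interval.
    """
    levels = 2 ** n_bits
    step = (hi - lo) / levels
    index = int((x - lo) / step)             # interval that x falls into
    index = max(0, min(levels - 1, index))   # clamp so x == hi stays in range
    return index, lo + (index + 0.5) * step

def quantization_error(x, n_bits):
    """Difference between the original value and its quantized value."""
    _, value = quantize(x, n_bits)
    return x - value

# With 2 bits there are 4 levels, so each interval is 0.5 wide and the
# error can never exceed 0.25.
index, value = quantize(0.3, 2)
error = quantization_error(0.3, 2)
```

Each extra bit halves the interval width, and therefore halves the worst-case error.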
Slide 17: Signal to Noise Ratio (SNR)
- Signal to Noise Ratio (SNR): the ratio of the power of the correct signal to that of the noise
- A common measure of the quality of the signal.
- SNR is usually measured in decibels (dB), where 1 dB is a tenth of a bel. The SNR value, in units of dB, is defined in terms of base-10 logarithms of squared voltages, as follows:
$$ \mathrm{SNR} = 10 \log_{10} \frac{V_{signal}^2}{V_{noise}^2} = 20 \log_{10} \frac{V_{signal}}{V_{noise}} $$
Slide 18: Signal to Noise Ratio (SNR) (cont'd)
- The actual power in a signal is proportional to the square of the voltage. For example, if the signal voltage Vsignal is 10 times the noise, then the SNR is 20 log10(10) = 20 dB.
- In terms of power, if the power from ten violins is ten times that from one violin playing, then the ratio of power is 10 dB, or 1 B.
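Both worked examples can be reproduced in a few lines. This sketch uses hypothetical function names; the factor of 20 for voltages and 10 for powers comes from power being proportional to voltage squared.

```python
import math

def snr_db_from_voltages(v_signal, v_noise):
    """SNR in dB from a voltage (amplitude) ratio."""
    return 20 * math.log10(v_signal / v_noise)

def snr_db_from_powers(p_signal, p_noise):
    """Ratio in dB from a power ratio."""
    return 10 * math.log10(p_signal / p_noise)

voltage_example_db = snr_db_from_voltages(10.0, 1.0)   # signal voltage 10x the noise
violin_example_db = snr_db_from_powers(10.0, 1.0)      # ten violins vs. one violin
```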
Slide 19: Common sound levels
Slide 20: Signal to Quantization Noise Ratio (SQNR) Revisited
- For a quantization accuracy of N bits per sample, the peak SQNR can be simply expressed as
$$ \mathrm{SQNR} = 20 \log_{10} \frac{2^{N-1}}{1/2} = 20 N \log_{10} 2 = 6.02N \ \text{(dB)} $$
- 6.02N is the worst case.
- If the input signal is sinusoidal, the quantization error is statistically independent, and its magnitude is uniformly distributed between 0 and half of the interval, then it can be shown that the expression for the SQNR becomes
$$ \mathrm{SQNR} = 6.02N + 1.76 \ \text{(dB)} $$
Derive it by yourself!
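Short of a derivation, the commonly quoted result for a full-scale sinusoid, SQNR = 6.02N + 1.76 dB, can be checked empirically. The sketch below is one way to do that under stated assumptions: a mid-point uniform quantizer on [-1, 1], a full-scale sine, and an arbitrarily chosen test frequency. None of these specifics come from the slides.

```python
import math

def quantize(x, n_bits):
    """Uniform quantizer on [-1, 1] with 2**n_bits levels, midpoint reconstruction."""
    levels = 2 ** n_bits
    step = 2.0 / levels
    index = max(0, min(levels - 1, int((x + 1.0) / step)))
    return -1.0 + (index + 0.5) * step

def measured_sqnr_db(n_bits, n_samples=100_000):
    """Quantize a full-scale sine and return signal power over noise power in dB."""
    signal_power = 0.0
    noise_power = 0.0
    for n in range(n_samples):
        # Arbitrary frequency (in cycles per sample) so the samples
        # spread over many different phases of the sine.
        x = math.sin(2 * math.pi * 0.0123456789 * n)
        error = x - quantize(x, n_bits)
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)

def predicted_sqnr_db(n_bits):
    """Textbook approximation for a full-scale sinusoidal input."""
    return 6.02 * n_bits + 1.76
```

Calling `measured_sqnr_db(8)` and `predicted_sqnr_db(8)` should give values that agree to within a fraction of a dB, and each additional bit should add roughly 6 dB to the measured figure.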
Slide 22: Audio File Format: .WAV
- Microsoft format: interleaved multi-channel samples
http://ccrma.stanford.edu/courses/422/projects/WaveFormat/
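To show what "interleaved multi-channel samples" means in practice, here is a sketch that reads a WAV file with Python's standard `wave` module and separates the channels. It assumes 16-bit PCM data; the function name is illustrative, and other sample widths are rejected rather than handled.

```python
import struct
import wave

def read_wav_channels(path):
    """Return (sample_rate, channels) for a 16-bit PCM WAV file.

    channels is a list with one list of integer samples per channel.
    """
    with wave.open(path, "rb") as wav:
        n_channels = wav.getnchannels()
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # 16-bit WAV samples are little-endian signed integers. For stereo the
    # order in the file is L0, R0, L1, R1, ... so taking every n-th sample
    # starting at offset c recovers channel c.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    channels = [list(samples[c::n_channels]) for c in range(n_channels)]
    return sample_rate, channels
```

Dividing each integer sample by 32768 would map the values into roughly the range [-1, 1], which is the kind of normalization tools such as Matlab's wavread apply.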
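Slide 23: Example
Create this figure in Matlab:
    x = wavread('horn.wav');     % read the samples, one column per channel
    plot(x(:, 1));               % plot the whole first channel
    plot(x(4000:10000, 1));      % zoom in on samples 4000 to 10000
Note: wavread() normalizes the samples to the range [-1, 1].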
Slide 24: Audio File Format: MIDI
- MIDI: Musical Instrument Digital Interface
  - A simple scripting language and hardware setup
- MIDI Overview
  - MIDI codes "events" that stand for the production of sounds. E.g., a MIDI event might include values for the pitch of a single note, its duration, and its volume.
  - MIDI is a standard adopted by the electronic music industry for controlling devices, such as synthesizers and sound cards, that produce music.
  - Supported by most sound cards
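As a rough illustration of what a MIDI event looks like on the wire, the sketch below builds the three-byte Note On and Note Off channel messages of MIDI 1.0. The helper names are hypothetical. Note that a raw MIDI message carries pitch and velocity (volume); duration is not a field in the message but is implied by the time between a Note On and the matching Note Off.

```python
NOTE_ON = 0x90    # status byte high nibble for Note On
NOTE_OFF = 0x80   # status byte high nibble for Note Off

def _check(channel, pitch, velocity):
    """Validate the ranges allowed by MIDI 1.0 channel messages."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0..15")
    if not 0 <= pitch <= 127:
        raise ValueError("pitch must be 0..127")
    if not 0 <= velocity <= 127:
        raise ValueError("velocity must be 0..127")

def note_on(channel, pitch, velocity):
    """Three-byte Note On message: status, note number, velocity."""
    _check(channel, pitch, velocity)
    return bytes([NOTE_ON | channel, pitch, velocity])

def note_off(channel, pitch, velocity=0):
    """Three-byte Note Off message for the same note number."""
    _check(channel, pitch, velocity)
    return bytes([NOTE_OFF | channel, pitch, velocity])

# Middle C (note number 60) on the first channel at moderate volume.
start = note_on(0, 60, 100)
stop = note_off(0, 60)
```

Because only these short control messages are stored, a MIDI file is far smaller than a sampled recording of the same music, but the actual sound depends on the synthesizer that plays it.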
Slide 26: Computer vs. Ear
- Multimedia signals are interpreted by humans!
  - Need to understand human perception
- Almost all original multimedia signals are analog signals
  - A/D conversion is needed for computer processing
Slide 27: Properties of Human Auditory System
- Range of human hearing: 20 Hz - 20 kHz
  - => Minimal sampling rate for music: 40 kHz (Nyquist rate)
- CD Audio
  - 44.1 kHz sampling rate
  - each sample is represented by a 16-bit signed integer
  - 2 channels are used to create a stereo system
  - 44,100 × 16 × 2 = 1,411,200 bits / second (bps)
- Speech signal: 300 Hz - 4 kHz
  - => Minimum sampling rate is 8 kHz (as in the telephone system)
Slide 28: Properties of Human Auditory System
- Hearing threshold varies dramatically at different frequencies
- Most sensitive around 2 kHz
Slide 29: Properties of Human Auditory System
- Critical Bands
  - Our brains perceive sounds through 25 distinct critical bands; the bandwidth grows logarithmically with frequency.
  - At 100 Hz, the bandwidth is about 160 Hz.
  - At 10 kHz it is about 2.5 kHz in width.
[Figure: the critical bands, numbered 1 to 25, laid out along the frequency axis]
Slide 30: Properties of Human Auditory System
- Masking effect
  - what we hear depends on what audio environment we are in
  - One strong signal can overwhelm/hide another
The masking effects in the frequency domain: a masker inhibits perception of coexisting signals below the masking threshold.
http://beradio.com/mag/radio_perceptual_audio_encoding/
Slide 31: Properties of Human Auditory System
- Masking thresholds in the time domain
Simultaneous masking: two sounds occur simultaneously and one is masked by the other.
Forward masking (post): softer sounds that occur as much as 200 milliseconds after the loud sound will also be masked.
Backward masking (pre): a softer sound that occurs prior to a loud one will be masked by the louder sound.
Slide 32: HAS: Audio Filtering
- Prior to sampling and A/D conversion, the audio signal is also usually filtered to remove unwanted frequencies.
- For speech, typically from 50 Hz to 10 kHz is retained, and other frequencies are blocked by the use of a band-pass filter that screens out lower and higher frequencies.
- An audio music signal will typically contain from about 20 Hz up to 20 kHz.
- At the D/A converter end, high frequencies may reappear in the output (why?)
  - because of sampling and then quantization, the smooth input signal is replaced by a series of step functions containing all possible frequencies.
- So at the decoder side, a low-pass filter is used after the D/A circuit.
Slide 33: HAS: Perceptual audio coding
- The HAS properties can be exploited in audio coding
- Different quantizations for different critical bands
  - Subband coding
- If you can't hear the sound, don't encode it
  - Discard the weaker signal if a stronger one exists in the same band (frequency-domain masking)
  - Discard a soft sound after a loud sound (time-domain masking)
- Stereo redundancy: at low frequencies, we can't detect where the sound is coming from. Encode it in mono.
- More on this later (MP3, APE)
Slide 34: Further Exploration
- Links for Chapter 6 in the Further Exploration section of the textbook page:
  - An extensive list of audio file formats.
  - CD audio file formats are somewhat different. The main music format is called red book audio. A good description of various CD formats is on the website.
  - A General MIDI Instrument Patch Map, along with a General MIDI Percussion Key Map.
  - A link to a good tutorial on MIDI and wave table music synthesis.
  - A link to a Java program for decoding MIDI streams.
  - A good multimedia/sound page, including a source for locating Internet sound/music materials.