Digital Audio presentation

Title: Digital Audio

1
Digital Audio

Introducing perceptual encoders
Using psychoacoustics

2
Principal features of the ear / brain response

The ear is approximately logarithmic in its subjective
response to increasing volume (the 3 dB fader in
audio control).
The response of the ear to a fixed audio spectral
distribution (e.g. an audio recording) is
subjectively different as the volume changes (the
phon curves): bass becomes more pronounced as
volume increases (loudness controls), and
richness of tone increases above about 60 dB
loudness level (i.e. harmonics are introduced by the
hearing system).
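
The logarithmic response is why levels are quoted in decibels, a log ratio of powers. A minimal sketch in Python (the function names are illustrative and do not come from the slides) shows that a 3 dB step corresponds to roughly doubling the power:

```python
import math

def power_ratio_to_db(ratio: float) -> float:
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)

def db_to_power_ratio(db: float) -> float:
    """Convert decibels back to a power ratio."""
    return 10.0 ** (db / 10.0)

# Doubling the power is a step of about 3 dB.
print(round(power_ratio_to_db(2.0), 2))  # 3.01
```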

3
Principal features of the ear / brain response
part 2

Masking occurs in both time and frequency.
In frequency, this modifies the effective
threshold of hearing.
In time, this can modify the effective threshold
of hearing before a loud signal arrives (it cuts
off the build-up) and after the signal has stopped.
The "before" deaf period can be a few msec; the
"after" deaf period can be up to 200 msec.
Equivalently, the ear behaves differently for
long- and short-duration bursts, i.e. long or short when
compared to a few hundred msec.

4
Masking holds the key to psychoacoustic
coders: it can raise significantly the effective
threshold of hearing.
MAF: minimum audible field (threshold of hearing)
5
Even with simple MAF curves

Audible dynamic range is less at 50 Hz than at
5 kHz.
Maybe we could start, then, by splitting the audio
spectrum into bands and quantising each one
differently; forget masking at this stage.

6
Sub-band coding: digital audio sampled at
48 ksamples/sec
Bit rates
16 x 48000 = 768 kbps (16-bit input)
16 x 3 x 48000 = 2304 kbps (3 bands, each still at the full rate)
16 x 3 x 16000 = 768 kbps (1/3 in each channel)
4 x 3 x 16000 = 192 kbps (reduce bits per sample)
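
The bit-rate arithmetic on this slide can be checked with a short sketch. The helper name `bit_rate_kbps` is illustrative; the figures (16 or 4 bits, 3 bands, 48000 or 16000 samples/sec) are the ones quoted on the slide.

```python
def bit_rate_kbps(bits_per_sample: int, bands: int, samples_per_sec: int) -> float:
    """Bit rate in kbps for `bands` streams of equal rate and word length."""
    return bits_per_sample * bands * samples_per_sec / 1000.0

input_rate = bit_rate_kbps(16, 1, 48000)      # 768.0, the original signal
split_rate = bit_rate_kbps(16, 3, 48000)      # 2304.0, three bands at the full rate
decimated_rate = bit_rate_kbps(16, 3, 16000)  # 768.0, each band at 1/3 of the rate
reduced_rate = bit_rate_kbps(4, 3, 16000)     # 192.0, fewer bits per sample

print(input_rate, split_rate, decimated_rate, reduced_rate)
```

Splitting into bands saves nothing by itself; the saving comes only from the last step, where the word length is cut.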
7
Reducing the bits per sample: an example using
only the standard MAF
8
Bits needed in different parts of the spectrum:
still no masking
[Figure: Sound Pressure Level (dB-SPL, axis from -30 to 80) plotted against Frequency (Hz, axis from 0 to 15000), showing the Peak Signal Level and the Threshold of Hearing. Twelve labels across the spectrum give the bits needed, from 9 bits to 12 bits.]
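
The idea behind this figure can be sketched as follows: the dynamic range worth coding in a band is the gap between the peak signal level and the threshold of hearing, and each bit of linear quantisation covers about 6.02 dB. The sketch makes several assumptions: the threshold uses a widely quoted analytic approximation of the threshold in quiet (attributed to Terhardt), which is not necessarily the MAF curve on the slide, and the 70 dB SPL peak level is a hypothetical value chosen only for illustration.

```python
import math

DB_PER_BIT = 6.02  # each extra bit of linear PCM adds about 6.02 dB of range

def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate threshold of hearing in dB SPL (assumed analytic fit)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def bits_needed(peak_db: float, floor_db: float) -> int:
    """Bits needed to span the range between the peak and the audible floor."""
    dynamic_range = max(peak_db - floor_db, 0.0)
    return math.ceil(dynamic_range / DB_PER_BIT)

PEAK_DB_SPL = 70.0  # hypothetical peak signal level, not read off the slide

for freq in (50, 500, 5000, 15000):
    floor = threshold_in_quiet_db(freq)
    print(freq, round(floor, 1), bits_needed(PEAK_DB_SPL, floor))
```

With this approximation the threshold is far higher at 50 Hz than at 5 kHz, so fewer bits are needed at the low end, which is the point of the earlier slide on MAF curves.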
9
Using masking: a psychoacoustic model
[Figure panels: Signal; Signal + Noise (SNR 24 dB); Noise]

The signal suppresses the noise.
It raises the effective threshold of hearing to the
masking threshold.
Establish a model of the new threshold of hearing
and use this to determine the resolution needed
in a particular frequency band: this is the essence of MPEG
audio coders (but at present ignore temporal
masking).
Only use the dynamic range needed, and make this
adaptive.
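
A minimal sketch of this adaptive allocation, assuming about 6.02 dB of range per bit: the audible floor in a band is whichever is higher, the threshold in quiet or the masking threshold, and only the range above that floor is coded. The function name and the example levels are hypothetical; only the 24 dB figure comes from the slide.

```python
import math

DB_PER_BIT = 6.02  # approximate dynamic range gained per bit

def bits_for_band(peak_db: float, quiet_threshold_db: float,
                  masking_threshold_db: float) -> int:
    """Bits needed in one band once masking has raised the audible floor."""
    effective_floor = max(quiet_threshold_db, masking_threshold_db)
    needed_range = max(peak_db - effective_floor, 0.0)
    return math.ceil(needed_range / DB_PER_BIT)

# Hypothetical band: 70 dB peak, 5 dB threshold in quiet.
print(bits_for_band(70.0, 5.0, -100.0))  # no masker present: 11 bits
print(bits_for_band(70.0, 5.0, 46.0))    # masker leaves 24 dB to code: 4 bits
print(bits_for_band(30.0, 5.0, 40.0))    # band fully masked: 0 bits
```

Recomputing this for every frame is what makes the allocation adaptive.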

10
Single tone masking
11
Masking on a more complex signal
12
Other information needed in addition to the coded
audio frames

The code is prepared and processed in frames of a
predetermined length.
Each frame contains mostly coded audio, but in
addition:
The peak level in each frequency sub-band
The masking level in each sub-band
The number of bits in each sample in each
sub-band
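
The frame contents listed above can be pictured as a simple data structure. This is a sketch only: the class and field names are hypothetical and it does not reproduce the bit layout of any MPEG frame.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SubBandInfo:
    peak_level_db: float     # peak level in this sub-band
    masking_level_db: float  # masking level in this sub-band
    bits_per_sample: int     # word length used for this sub-band
    samples: List[int]       # quantised audio samples

@dataclass
class Frame:
    bands: List[SubBandInfo]

    def audio_bits(self) -> int:
        """Total bits spent on coded audio, excluding the side information."""
        return sum(b.bits_per_sample * len(b.samples) for b in self.bands)

frame = Frame(bands=[
    SubBandInfo(70.0, 46.0, 4, [0] * 12),
    SubBandInfo(55.0, 20.0, 6, [0] * 12),
])
print(frame.audio_bits())  # 4*12 + 6*12 = 120
```

The decoder needs the per-band word length to unpack the samples, which is why it travels with every frame.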

13
What does the system look like? Encoding
compressed audio
[Block diagram, ENCODER. Blocks: Digital Audio In, Sub-band filter bank, Scale and Quantise, Multiplex and Data Format, Coded Audio Out; FFT, Psycho-acoustic model, Masking thresholds, Code additional Info]
Everything is done in the digital domain: the
analogue original has been digitised to high
quality before entering the coder.
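
The "Scale and Quantise" block can be illustrated with a small sketch: a block of sub-band samples is scaled by its peak value, then rounded to a signed integer of the allotted word length. This is an assumed uniform quantiser for illustration, not the quantiser defined by any MPEG layer, and the function names are hypothetical.

```python
def quantise(samples, bits):
    """Scale a block by its peak and round to signed codes (bits >= 2 assumed)."""
    peak = max(abs(s) for s in samples) or 1.0  # avoid dividing by zero on silence
    levels = 2 ** (bits - 1) - 1                # largest code magnitude
    codes = [round(s / peak * levels) for s in samples]
    return peak, codes

def dequantise(peak, codes, bits):
    """Rebuild approximate sample values from the peak and the codes."""
    levels = 2 ** (bits - 1) - 1
    return [c / levels * peak for c in codes]

block = [0.5, -0.3, 0.1, 0.0]
peak, codes = quantise(block, bits=4)
restored = dequantise(peak, codes, bits=4)
print(peak, codes)
print([round(x, 3) for x in restored])
```

The peak is the side information sent alongside the codes; the reconstruction error per sample is at most half a quantisation step, i.e. peak / (2 x levels).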
14
What does the system look like? Decoding
compressed audio
15
MPEG 1 digital audio standards

Three perceptual coders in the MPEG 1
specification:
Layers 1, 2 and 3
Layer 1 (.mp1)
Similar to the simple coder just described
32 sub-bands are used
Each frame contains 384 samples (32 x 12) lasting
about 8 msec
A version of layer 1 was used in the Digital
Compact Cassette (DCC)
Layer 2 (.mp2)
Slightly more complex but better quality than
layer 1
Frame length increased to 1152 samples (32 x 36)
lasting about 24 msec at 48 ksamples/sec
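
The frame durations follow directly from the frame length and the sample rate. A short sketch, assuming the 48 ksamples/sec rate used earlier in the presentation (the helper name is illustrative):

```python
def frame_duration_ms(samples_per_frame: int, sample_rate_hz: int = 48000) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz

layer1_ms = frame_duration_ms(32 * 12)  # 384 samples  -> 8.0 msec
layer2_ms = frame_duration_ms(32 * 36)  # 1152 samples -> 24.0 msec

print(layer1_ms, layer2_ms)
```

At lower sample rates the same frame lengths last proportionally longer.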

16
MPEG 1 digital audio standards - more

Layer 2 (continued)
Data formatting of samples and side information
is slightly more efficient
Used in Digital Audio Broadcasting (DAB)
Layer 3 (.mp3)
Significantly more complex than layers 1 or 2
Capable of reasonable quality even at very low
data rates
A combination of fixed sub-band coding and
adaptive frequency transform coding is used to
give up to 576 frequency bands (compared to 32
for layers 1 and 2)
Uses signal statistics as well as signal waveform
for coding
Huffman encoding is applied to samples (more on
these to come).
MP3 files are the most prevalent form of compressed
audio (MP3 players etc.).
Introduced in the late 1990s.
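
Huffman encoding gives short codewords to values that occur often and longer ones to rare values. The sketch below builds a code from the observed symbol counts; it illustrates the general technique only, whereas Layer 3 itself selects from predefined tables. The function name and the sample data are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return {symbol: bitstring} built from the symbol frequencies."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries are (count, tie_breaker, {symbol: code_so_far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)  # two least frequent subtrees
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]

# Quantised samples are often dominated by small values such as 0.
samples = [0, 0, 0, 0, 0, 0, 1, 1, -1, 2]
code = huffman_code(samples)
total_bits = sum(len(code[s]) for s in samples)
print(code, total_bits)
```

For these ten samples the variable-length code uses 16 bits, against 20 bits for a fixed 2-bit code over the four distinct values.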

17
From wax discs to MP3.

Original analogue records (discs, audio tape)
attempted to cope with all possible signals at
all possible times
The CD (1980) did the same and included error
correction (almost 1 GByte/hour)
Perceptual encoders eliminate what isn't relevant
to the listener and compress to 100 MByte per hour
(and less) using MP3 and related formats
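
The storage figures can be related to bit rates with a one-line conversion. This sketch counts only the 16-bit stereo audio payload of a CD at 44.1 ksamples/sec, so it comes out below the slide's figure, which includes error correction. The MP3 bit rates are example values, not taken from the slides.

```python
def mbytes_per_hour(bit_rate_kbps: float) -> float:
    """Storage needed for one hour of audio, in MByte (10**6 bytes)."""
    bits = bit_rate_kbps * 1000 * 3600
    return bits / 8 / 1e6

cd_audio_kbps = 44.1 * 16 * 2  # 16-bit stereo payload, about 1411.2 kbps

print(round(mbytes_per_hour(cd_audio_kbps), 1))  # about 635 MByte/hour
print(round(mbytes_per_hour(224), 1))            # about 100.8 MByte/hour
print(round(mbytes_per_hour(128), 1))            # about 57.6 MByte/hour
```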
