Digital Audio - PowerPoint PPT Presentation

About This Presentation
Title:

Digital Audio


Slides: 18
Provided by: BC856

Transcript and Presenter's Notes



1
Digital Audio
  • Introducing perceptual encoders
  • Using psychoacoustics

2
Principal features of the ear / brain response
  • The ear is approximately logarithmic in its subjective
    response to increasing volume (hence the dB scale on
    audio faders)
  • The ear's response to a fixed audio spectral
    distribution (e.g. an audio recording) sounds
    subjectively different as the volume changes (the
    phon, or equal-loudness, curves): bass becomes more
    pronounced as volume increases (hence loudness
    controls), and richness of tone increases above
    about 60 dB loudness level (i.e. harmonics are
    introduced by the hearing system)
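A logarithmic response is why audio levels are quoted in decibels. The minimal Python sketch below shows the standard conversions; the function names are illustrative, not taken from the slides. Doubling the power adds about 3 dB and doubling the amplitude adds about 6 dB, so equal ratios become equal steps on a fader.

```python
import math


def power_ratio_to_db(ratio: float) -> float:
    """Decibels for a power (intensity) ratio: 10 * log10(ratio)."""
    return 10.0 * math.log10(ratio)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Decibels for an amplitude (pressure or voltage) ratio: 20 * log10(ratio).

    Power goes as amplitude squared, hence the factor 20 instead of 10.
    """
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    # Equal ratios map to equal steps in dB, mirroring the ear's
    # roughly logarithmic response to increasing volume.
    for r in (2, 10, 100):
        print(f"power x{r}: {power_ratio_to_db(r):+.2f} dB, "
              f"amplitude x{r}: {amplitude_ratio_to_db(r):+.2f} dB")
```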

3
Principal features of the ear / brain response
part 2
  • Masking occurs in both time and frequency
  • In frequency, masking modifies the effective
    threshold of hearing
  • In time, masking can modify the effective threshold
    of hearing before a loud signal arrives (cutting
    off the build-up) and after the signal has stopped.
    The "deaf" period before the signal can last a few
    msec; the deaf period after it can last up to 200 msec
  • Equivalently, the ear behaves differently for
    long and short duration bursts, i.e. bursts longer
    or shorter than a few hundred msec.
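The time-domain masking windows can be sketched as a simple classifier. This is a minimal Python sketch under stated assumptions: the 200 msec post-masking limit comes from the slide, but the 5 msec pre-masking window is an assumed value for "a few msec", and `masking_region` is a hypothetical helper. A real psychoacoustic model would also grade how strongly the threshold is raised, which decays over the window, rather than giving a yes/no answer.

```python
from typing import Optional

# Assumed window lengths: the slide says "a few msec" before the masker
# (5 ms is a guess) and "up to 200 msec" after it.
PRE_MASKING_MS = 5.0
POST_MASKING_MS = 200.0


def masking_region(t_ms: float, masker_onset_ms: float,
                   masker_offset_ms: float) -> Optional[str]:
    """Classify a quiet probe sound at t_ms relative to a loud masker.

    Returns the kind of masking that may hide the probe, or None if the
    probe falls outside every masking window.
    """
    if masker_onset_ms - PRE_MASKING_MS <= t_ms < masker_onset_ms:
        return "pre-masking"
    if masker_onset_ms <= t_ms <= masker_offset_ms:
        # Masker is present: whether the probe is hidden then depends on
        # frequency (simultaneous, i.e. frequency-domain, masking).
        return "simultaneous"
    if masker_offset_ms < t_ms <= masker_offset_ms + POST_MASKING_MS:
        return "post-masking"
    return None


if __name__ == "__main__":
    # Loud masker from 100 ms to 300 ms; probe quiet sounds at various times.
    for t in (90, 97, 200, 450, 600):
        print(t, masking_region(t, 100, 300))
```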

4
Masking holds the key to psychoacoustic
coders: it can significantly raise the effective
threshold of hearing
MAF: minimum audible field (threshold of hearing)
5
Even with simple MAF curves
  • The audible dynamic range is smaller at 50 Hz than
    at 5 kHz
  • So we could start by splitting the audio
    spectrum into bands and quantising each one
    differently (forgetting masking at this stage)
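Quantising each band differently just means giving each band its own number of bits. The minimal Python sketch below assumes samples normalised to [-1, 1) and a uniform mid-rise quantiser (the slides do not specify the quantiser); the two-band allocation in the demo is hypothetical. Each extra bit halves the step size and therefore halves the worst-case quantisation error.

```python
import math


def quantise(x: float, bits: int) -> float:
    """Uniform mid-rise quantiser for x in [-1.0, 1.0) using 2**bits levels.

    Returns the reconstructed value at the centre of x's quantisation step,
    so the error is at most half a step (step = 2 / 2**bits).
    """
    levels = 2 ** bits
    step = 2.0 / levels
    index = math.floor(x / step)
    # Clamp to the available levels so out-of-range input saturates.
    index = max(-levels // 2, min(levels // 2 - 1, index))
    return (index + 0.5) * step


if __name__ == "__main__":
    # Hypothetical allocation: fewer bits where the audible dynamic range
    # is smaller (low frequencies), more where it is larger (around 5 kHz).
    allocation = {"band around 50 Hz": 6, "band around 5 kHz": 12}
    x = 0.3
    for band, bits in allocation.items():
        q = quantise(x, bits)
        print(f"{band}: {bits} bits -> {q:.6f} (error {abs(q - x):.2e})")
```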

6
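Sub-band coding: digital audio sampled at
48 ksamples/sec
Bit rates
  • Single band, 16 bits per sample:
    16 x 48000 = 768 kbps
  • Three sub-bands, each still sampled at 48 kHz:
    16 x 3 x 48000 = 2304 kbps
  • Three sub-bands, each covering 1/3 of the bandwidth
    and so sampled at 16 kHz: 16 x 3 x 16000 = 768 kbps
  • Reduce the bits per sample to 4:
    4 x 3 x 16000 = 192 kbps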
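The bit-rate arithmetic can be checked with a few lines of Python. This is a sketch; `bit_rate_kbps` is an illustrative helper, and 1 kbps is taken as 1000 bit/s, which matches the figures on the slide.

```python
def bit_rate_kbps(bits_per_sample: int, sub_bands: int, sample_rate_hz: int) -> float:
    """Total bit rate in kbps (1 kbps = 1000 bit/s) for equally coded sub-bands."""
    return bits_per_sample * sub_bands * sample_rate_hz / 1000


if __name__ == "__main__":
    cases = [
        ("single band, 16 bits", 16, 1, 48_000),
        ("3 sub-bands, no resampling", 16, 3, 48_000),
        ("3 sub-bands, each at 1/3 of the rate", 16, 3, 16_000),
        ("3 sub-bands, 4 bits per sample", 4, 3, 16_000),
    ]
    for name, bits, bands, fs in cases:
        print(f"{name}: {bit_rate_kbps(bits, bands, fs):g} kbps")
```

Splitting into sub-bands costs nothing once each band is resampled at its reduced bandwidth; the saving comes only from reducing the bits per sample.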
7
Reducing the bits per sample: an example using
only the standard MAF
8
Bits needed in different parts of the spectrum
(still no masking)
[Figure: peak signal level and threshold of hearing, plotted as Sound Pressure Level (dB-SPL, -30 to 80) against Frequency (Hz, 0 to 15000); each sub-band is labelled with the bits it needs, from 9 to 12 bits]
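A common rule of thumb, not stated on the slide, is that each bit of a uniform quantiser buys about 6.02 dB of signal-to-noise ratio. The bits a band needs are then roughly the gap between its peak level and the threshold of hearing divided by 6.02. A minimal Python sketch; the band levels in the demo are hypothetical, not read off the figure.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of dynamic range per bit


def bits_needed(peak_db_spl: float, threshold_db_spl: float) -> int:
    """Bits needed so quantisation noise stays below the threshold of hearing."""
    range_db = peak_db_spl - threshold_db_spl
    if range_db <= 0:
        return 0  # the whole band is inaudible: send nothing
    return math.ceil(range_db / DB_PER_BIT)


if __name__ == "__main__":
    # Hypothetical (peak, threshold) pairs in dB-SPL for three bands.
    bands = {"50 Hz": (70, 40), "5 kHz": (80, -5), "15 kHz": (60, 20)}
    for name, (peak, threshold) in bands.items():
        print(f"{name}: {bits_needed(peak, threshold)} bits")
```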
9
Using masking: a psychoacoustic model
[Figure: signal, noise, and signal plus noise (SNR 24 dB)]
  • The signal masks (suppresses) the noise
  • Masking raises the effective threshold of hearing to
    the masking threshold
  • Establish a model of the new threshold of hearing
    and use it to determine the resolution needed
    in each frequency band: the essence of MPEG
    audio coders (ignoring temporal masking for
    now)
  • Use only the dynamic range needed, and make this
    adaptive
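The same bit-count rule can use the masking threshold instead of the bare threshold of hearing. In this minimal Python sketch (an assumption about how the pieces combine, not the MPEG procedure itself) the effective threshold in a band is the higher of the absolute threshold and the masking threshold, and the bits follow from the resulting signal-to-mask ratio (SMR) at about 6.02 dB per bit. The levels in the demo are hypothetical.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit


def bits_with_masking(signal_db: float, absolute_threshold_db: float,
                      masking_threshold_db: float) -> int:
    """Bits for one band given its signal level and both thresholds (dB-SPL)."""
    # Masking can only raise the threshold, never lower it.
    effective_threshold = max(absolute_threshold_db, masking_threshold_db)
    smr = signal_db - effective_threshold  # signal-to-mask ratio in dB
    return 0 if smr <= 0 else math.ceil(smr / DB_PER_BIT)


if __name__ == "__main__":
    # Hypothetical band: 70 dB signal, 10 dB absolute threshold.
    no_masking = bits_with_masking(70, 10, float("-inf"))
    with_masking = bits_with_masking(70, 10, 50)  # masker lifts threshold to 50 dB
    print(f"without masking: {no_masking} bits, with masking: {with_masking} bits")
```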

10
Single tone masking
11
Masking on a more complex signal
12
Other information needed in addition to the coded
audio frames
  • The coded audio is prepared and processed in
    frames of a predetermined length
  • Each frame contains mostly coded audio but, in
    addition,
      • the peak level in each frequency sub-band
      • the masking level in each sub-band
      • the number of bits per sample in each sub-band
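The frame contents listed above can be modelled as a simple data structure. This Python sketch is hypothetical: the `Frame` class, its field names and the fixed 6-bit width for every side-information value are assumptions for illustration, not the MPEG bitstream layout. It shows how the side information adds a per-sub-band overhead on top of the coded samples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """Hypothetical frame holding coded audio plus per-sub-band side information."""
    peak_levels: List[int]       # coded peak level for each sub-band
    masking_levels: List[int]    # coded masking level for each sub-band
    bit_allocation: List[int]    # bits per sample in each sub-band
    samples: List[List[int]]     # quantised samples, one list per sub-band

    def __post_init__(self) -> None:
        n = len(self.bit_allocation)
        if not (len(self.peak_levels) == len(self.masking_levels)
                == len(self.samples) == n):
            raise ValueError("every side-information list needs one entry per sub-band")

    def audio_bits(self) -> int:
        """Bits spent on the coded samples themselves."""
        return sum(bits * len(s) for bits, s in zip(self.bit_allocation, self.samples))

    def side_info_bits(self, field_bits: int = 6) -> int:
        """Bits for the three side-information values per sub-band.

        Assumes (hypothetically) a fixed field width for every value.
        """
        return 3 * len(self.bit_allocation) * field_bits


if __name__ == "__main__":
    # 32 sub-bands x 12 samples each, every band allocated 4 bits (hypothetical).
    frame = Frame([0] * 32, [0] * 32, [4] * 32, [[0] * 12 for _ in range(32)])
    print("audio bits:", frame.audio_bits(), "side-info bits:", frame.side_info_bits())
```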

13
What does the system look like? Encoding
compressed audio
[Figure: ENCODER block diagram. Main path: Digital Audio In → Sub-band filter bank → Scale and Quantise → Multiplex and Data Format → Coded Audio Out. Side chain: FFT → Psycho-acoustic model → Masking thresholds, which control Scale and Quantise; Code additional Info feeds Multiplex and Data Format]
Everything is done in the digital domain: the
analogue original has been digitised at high
quality before entering the coder
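The "Scale and Quantise" block can be sketched for one block of sub-band samples, together with the matching decoder step. This minimal Python sketch assumes that the scale factor is the block's peak magnitude and that the quantiser is symmetric with 2**bits - 1 levels; both are illustrative choices, and allocations below 2 bits are treated here as "send nothing".

```python
from typing import List, Tuple


def scale_and_quantise(samples: List[float], bits: int) -> Tuple[float, List[int]]:
    """Normalise a block by its peak (the scale factor) and quantise to integer codes."""
    scale = max((abs(s) for s in samples), default=0.0)
    if bits < 2 or scale == 0.0:
        return scale, [0] * len(samples)
    half = 2 ** (bits - 1) - 1  # codes run from -half to +half (2**bits - 1 levels)
    return scale, [round(s / scale * half) for s in samples]


def dequantise(scale: float, codes: List[int], bits: int) -> List[float]:
    """Decoder side: map integer codes back to sample values using the scale factor."""
    if bits < 2 or scale == 0.0:
        return [0.0] * len(codes)
    half = 2 ** (bits - 1) - 1
    return [c / half * scale for c in codes]


if __name__ == "__main__":
    block = [0.5, -0.25, 0.1, 0.0]
    scale, codes = scale_and_quantise(block, 4)
    print("scale factor:", scale, "codes:", codes)
    print("decoded:", dequantise(scale, codes, 4))
```

Sending the scale factor once per block lets every band use its full set of quantiser levels, whatever its actual signal level.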
14
What does the system look like? Decoding
compressed audio
15
MPEG 1 digital audio standards
  • Three perceptual coders in the MPEG 1
    specification
  • Layers 1, 2 and 3
  • Layer 1 (.mp1)
      • Similar to the simple coder just described
      • 32 sub-bands are used
      • Each frame contains 384 samples (32 x 12),
        lasting 8 msec at a 48 kHz sampling rate
      • A version of layer 1 was used in the Digital
        Compact Cassette (DCC)
  • Layer 2 (.mp2)
      • Slightly more complex than layer 1, but better
        quality
      • Frame length increased to 1152 samples (32 x 36),
        lasting 24 msec at a 48 kHz sampling rate
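Frame duration is simply samples per frame divided by the sampling rate. A short Python check over the three MPEG 1 sampling rates (32, 44.1 and 48 kHz); `frame_duration_ms` is an illustrative helper.

```python
def frame_duration_ms(samples_per_frame: int, sample_rate_hz: int) -> float:
    """Duration of one frame in milliseconds."""
    return 1000 * samples_per_frame / sample_rate_hz


if __name__ == "__main__":
    for layer, n in (("Layer 1", 32 * 12), ("Layer 2", 32 * 36)):
        for fs in (32_000, 44_100, 48_000):
            print(f"{layer}: {n} samples at {fs} Hz -> {frame_duration_ms(n, fs):.1f} ms")
```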

16
MPEG 1 digital audio standards - more
  • Layer 2 (continued)
      • Data formatting of samples and side information
        is slightly more efficient
      • Used in Digital Audio Broadcasting (DAB)
  • Layer 3 (.mp3)
      • Significantly more complex than layers 1 or 2
      • Capable of reasonable quality even at very low
        data rates
      • A combination of fixed sub-band coding and
        adaptive frequency transform coding is used to
        give up to 576 frequency bands (compared with 32
        for layers 1 and 2)
      • Uses signal statistics as well as the signal
        waveform for coding
      • Huffman encoding is applied to samples (more on
        this to come)
      • MP3 is the most prevalent compressed audio
        format (MP3 players etc.)
      • Standardised in 1993; widely adopted from the
        late 1990s
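Huffman coding exploits signal statistics: values that occur often get short codewords. MP3 itself selects among fixed Huffman tables defined in the standard rather than building a code for each file, so the Python sketch below only illustrates the underlying idea by building a code from the symbol counts of some hypothetical quantised samples.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, Hashable, Iterable


def huffman_code(symbols: Iterable[Hashable]) -> Dict[Hashable, str]:
    """Build a prefix-free Huffman code from the symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(n, next(tie), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1.
        n1, _, codes1 = heapq.heappop(heap)
        n2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


if __name__ == "__main__":
    # Hypothetical quantised samples: small values dominate.
    samples = [0, 0, 0, 0, 1, -1, 0, 2, 0, 1, 0, -1, 0, 0, 3]
    codes = huffman_code(samples)
    total = sum(len(codes[s]) for s in samples)
    print(codes)
    print(f"{total} bits with Huffman vs {3 * len(samples)} bits at a fixed 3 bits/sample")
```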

17
From wax discs to MP3.
  • Original analogue records (discs, audio tape)
    attempted to cope with all possible signals at
    all possible times
  • The CD (1980) did the same and added error
    correction (almost 1 GByte/hour)
  • Perceptual encoders eliminate what isn't relevant
    to the listener and compress to 100 MByte per hour
    (and less) using MP3 and related formats
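The storage figures follow from the bit rates. A minimal Python sketch: CD audio is 44.1 kHz, 16 bits, stereo, which gives about 635 MByte per hour for the samples alone (the slide's larger figure also counts the error-correction data), while 128 kbps is assumed here as a typical MP3 rate for illustration.

```python
def megabytes_per_hour(bit_rate_bps: float) -> float:
    """Storage for one hour of audio at the given bit rate (1 MByte = 10**6 bytes)."""
    return bit_rate_bps * 3600 / 8 / 1e6


if __name__ == "__main__":
    cd_audio_bps = 44_100 * 16 * 2  # 44.1 kHz x 16 bits x 2 channels = 1 411 200 bit/s
    mp3_bps = 128_000               # assumed typical MP3 bit rate
    print(f"CD audio samples: {megabytes_per_hour(cd_audio_bps):.1f} MByte/hour")
    print(f"MP3 at 128 kbps: {megabytes_per_hour(mp3_bps):.1f} MByte/hour")
```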