Title: Basic Features of Audio Signals
- Jyh-Shing Roger Jang
- http://mirlab.org/jang
- MIR Lab, CSIE Dept.
- National Taiwan Univ., Taiwan
Audio Features
- Four commonly used audio features
- Volume, pitch, timbre, and zero-crossing rate
- Our goal
- We can perceive these features, more or less, subjectively.
- Our goal is to compute them quantitatively (and objectively) for further processing and recognition.
General Steps for Audio Analysis
- Frame blocking
- Frame duration of about 20 to 40 ms
- Frame-based feature extraction
- Volume, zero-crossing rate, pitch, MFCC, etc.
- Frame-based analysis
- Pitch vectors for QBSH (query by singing/humming) comparison
- MFCCs for HMM evaluation
Frame Blocking
[Figure: a waveform divided into overlapping frames, with a zoomed-in view]
Quiz candidate!
- Sample rate = 16 kHz
- Frame size = 512 samples
- Frame duration = 512/16000 = 0.032 s = 32 ms
- Overlap = 192 samples
- Hop size = frame size - overlap = 512 - 192 = 320 samples
- Frame rate = 16000/320 = 50 frames/sec
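The frame-blocking arithmetic above can be checked with a short Python sketch. The helper `frame_blocking` is hypothetical (it is not the toolbox's `enframe`), and it assumes that trailing samples that do not fill a whole frame are dropped rather than zero-padded.

```python
def frame_blocking(signal, frame_size, overlap):
    """Split a 1-D sequence into overlapping frames (hypothetical helper).

    Assumption: trailing samples that do not fill a whole frame are dropped.
    """
    hop_size = frame_size - overlap
    if hop_size <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += hop_size
    return frames


fs = 16000          # sample rate (Hz)
frame_size = 512    # samples per frame
overlap = 192       # samples shared by consecutive frames

hop_size = frame_size - overlap               # 320 samples
frame_duration_ms = frame_size * 1000 / fs    # 32.0 ms
frame_rate = fs / hop_size                    # 50.0 frames/sec

# One second of dummy samples (the values are just the sample indices).
frames = frame_blocking(list(range(fs)), frame_size, overlap)

print(hop_size, frame_duration_ms, frame_rate, len(frames))
```

Each frame starts one hop (320 samples) after the previous one, so one second of audio yields 49 complete frames here; the nominal 50 frames/sec counts frame start positions, and a toolbox that zero-pads the tail would return one or two more.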
Audio Features in Time Domain
- Three of the most prominent time-domain audio features in a frame (also known as an analysis window)
Quiz candidate!
- Pitch: fundamental period (FP)
- Volume: intensity
- Timbre: waveform within an FP
Audio Features in Frequency Domain
- Frequency-domain audio features in a frame
- Energy: sum of the power spectrum
- Pitch: distance between harmonics
- Timbre: smoothed spectrum
[Figure: spectrum of a frame, annotated with energy, pitch frequency, first formant F1, and second formant F2]
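A minimal Python sketch of the energy and pitch features, using a naive DFT on a synthetic frame (a 200 Hz tone plus two harmonics; the sample rate, frame size, and amplitudes are illustrative assumptions). It checks that the sum of the power spectrum, divided by the frame length N, equals the time-domain energy (Parseval's theorem), and reads the pitch off the spacing between harmonic peaks. The harmonics are chosen to fall exactly on DFT bins so that spectral leakage does not blur the peaks; real frames would need an FFT, a window, and more robust peak picking.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive O(N^2) DFT; returns |X[k]|^2 for k = 0..N-1."""
    n = len(frame)
    spectrum = []
    for k in range(n):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


fs = 8000   # sample rate (Hz), assumed for illustration
n = 400     # 50 ms frame, so the bin spacing is fs / n = 20 Hz
f0 = 200    # fundamental frequency of the synthetic frame (Hz)
harmonics = [(1, 1.0), (2, 0.5), (3, 0.25)]  # (harmonic number, amplitude)

frame = [
    sum(a * math.sin(2 * math.pi * h * f0 * t / fs) for h, a in harmonics)
    for t in range(n)
]
spec = power_spectrum(frame)

# Energy: time-domain sum of squares vs. (1/N) * sum of the power spectrum.
time_energy = sum(x * x for x in frame)
freq_energy = sum(spec) / n

# Pitch: spacing between harmonic peaks in the positive-frequency half.
half = range(1, n // 2)
threshold = 0.01 * max(spec[k] for k in half)
peak_bins = [k for k in half if spec[k] > threshold]
pitch_hz = (peak_bins[1] - peak_bins[0]) * fs / n

print(time_energy, freq_energy, peak_bins, pitch_hz)
```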
Frame-based Manipulation
- For simplicity, we usually pack frames into a matrix for easy manipulation in MATLAB:
    [y, fs] = audioread('file.wav');
    frameMat = enframe(y, frameSize, overlap);
[Figure: the matrix frameMat, holding Frame 1, Frame 2, ..., Frame n]
Introduction to Volume
- Loudness of audio signals
- Visual cue: amplitude of vibration
- Also known as energy or intensity
- Two major ways of computing volume
- Volume
- Log energy (in decibels)
Quiz candidate!
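As a hedged sketch, the Python below assumes two common per-frame definitions: volume as the sum of absolute sample values, and log energy as 10·log10 of the sum of squared samples. The function names `volume_abs_sum` and `volume_db` are hypothetical, and the small `eps` guards against taking the log of zero for a silent frame.

```python
import math


def volume_abs_sum(frame):
    """Volume as the sum of absolute sample values (assumed definition)."""
    return sum(abs(s) for s in frame)


def volume_db(frame, eps=1e-12):
    """Log energy in decibels: 10*log10(sum of squared samples) (assumed definition).

    eps avoids log10(0) for an all-zero frame.
    """
    return 10 * math.log10(sum(s * s for s in frame) + eps)


quiet = [0.01, -0.01] * 256   # 512-sample frame, amplitude 0.01
loud = [0.1, -0.1] * 256      # same frame shape, 10x the amplitude

print(volume_abs_sum(quiet), volume_abs_sum(loud))
print(volume_db(loud) - volume_db(quiet))   # 10x amplitude -> about +20 dB
```

The absolute sum grows linearly with amplitude, while the decibel value grows by 20 dB for every tenfold increase in amplitude, so the log scale compresses a large dynamic range into a manageable span.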
Volume Perceived and Computed
- Perceived volume is influenced by
- Frequency (example shown later)
- Timbre (example shown later)
- Computed volume is influenced by
- Microphone types
- Microphone setups
Volume Computation
- To avoid DC bias (or DC drift), center each frame around zero before computing its volume
- DC bias: the vibration is not centered around zero
- Computation
- Volume
- Log energy (in decibels)
- Theoretical background (how to prove them?)
Quiz candidate!
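To illustrate DC bias, the Python sketch below adds a constant offset to a zero-mean frame and shows that the offset inflates the log energy, while subtracting the frame's mean first (zero-justification) removes the effect. It also numerically checks two standard facts that may be what "how to prove them?" refers to (an assumption): the mean minimizes the sum of squared deviations, and the median minimizes the sum of absolute deviations. Which center to subtract for each volume definition is an assumption here, not necessarily the author's choice; `zero_justify` is a hypothetical helper.

```python
import math
import statistics


def log_energy_db(frame, eps=1e-12):
    """Log energy in decibels (assumed definition, as in a plain 10*log10 of energy)."""
    return 10 * math.log10(sum(s * s for s in frame) + eps)


def zero_justify(frame, center=statistics.mean):
    """Subtract a center value (the mean by default) so the frame oscillates around zero."""
    c = center(frame)
    return [s - c for s in frame]


clean = [math.sin(2 * math.pi * t / 32) for t in range(512)]  # 16 full periods, zero mean
biased = [s + 0.5 for s in clean]                             # same frame with DC bias 0.5

print(log_energy_db(clean), log_energy_db(biased), log_energy_db(zero_justify(biased)))

# Grid search over candidate centers c = 0.0, 0.1, ..., 10.0:
# the mean minimizes sum((s - c)^2), the median minimizes sum(|s - c|).
data = [0.0, 1.0, 2.0, 9.0, 13.0]
candidates = [x / 10 for x in range(0, 101)]
best_sq = min(candidates, key=lambda c: sum((s - c) ** 2 for s in data))
best_abs = min(candidates, key=lambda c: sum(abs(s - c) for s in data))
print(best_sq, statistics.mean(data), best_abs, statistics.median(data))
```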
Examples of Volume
- Functions for computing volume
- Example volume01
- Example volume02
- Example volume03
- Volume depends on
- Frequency
- Equal loudness test
- Timbre
- Example volume04
Zero Crossing Rate
- Zero crossing rate (ZCR)
- The number of zero crossings in a frame
- Characteristics
- ZCR is higher for noise and unvoiced sounds, and lower for voiced sounds.
- Zero-justification is required before computing ZCR.
- Usage
- For endpoint detection, especially for detecting the start and end of unvoiced sounds.
- To distinguish noise from unvoiced sounds, we usually add a shift before computing ZCR.
Quiz candidate!
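A minimal Python sketch of frame-level ZCR: the frame is zero-justified (its mean subtracted), then sign changes between consecutive samples are counted. The test frames are illustrative assumptions: a 200 Hz sinusoid stands in for a voiced sound, and a rapidly alternating signal stands in for noise or an unvoiced sound. In this sketch a sample that is exactly zero never counts as a crossing, which is one of two possible conventions.

```python
import math


def zero_crossing_rate(frame):
    """Count sign changes in a zero-justified frame.

    Convention assumed here: a pair counts only if the product of
    consecutive samples is strictly negative, so exact zeros never count.
    """
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)


fs = 16000
n = 512
# Voiced-like: a low-frequency periodic signal (small phase offset avoids an exact zero).
voiced_like = [math.sin(2 * math.pi * 200 * t / fs + 0.1) for t in range(n)]
# Noise-like: the sign flips on every sample.
noise_like = [(-1) ** t * (1 + 0.1 * math.sin(t)) for t in range(n)]

print(zero_crossing_rate(voiced_like), zero_crossing_rate(noise_like))
```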
ZCR Computations
- Two types of ZCR definitions
- If a sample with zero value is counted as a zero crossing, the ZCR is higher; otherwise, it is lower.
- The distinction diminishes at higher bit resolution.
- Other considerations
- ZCR with a shift can be used to distinguish between unvoiced sounds and silence.
- But it is hard to set the right shift amount.
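The two ZCR definitions and the shift can be contrasted in a short Python sketch. The function `zcr` and its `count_zero` and `shift` parameters are hypothetical. With `count_zero=True`, any pair of samples whose product is zero is also counted, which raises the ZCR of coarsely quantized frames that contain many exact zeros. With a nonzero `shift`, crossings are counted against the level `shift` instead of zero (an assumed way of applying the shift), so tiny fluctuations in near-silence stop registering while larger oscillations still do.

```python
def zcr(frame, count_zero=False, shift=0.0):
    """Zero-crossing count with two optional variants (hypothetical helper).

    count_zero: also count pairs whose product is exactly zero.
    shift: count crossings of the level `shift` instead of 0.
    """
    x = [s - shift for s in frame]
    count = 0
    for a, b in zip(x, x[1:]):
        p = a * b
        if p < 0 or (count_zero and p == 0):
            count += 1
    return count


# A coarsely quantized frame with several exact zeros (low bit resolution).
quantized = [3, 0, -2, 0, 0, 4, -1, 0, 2]
print(zcr(quantized), zcr(quantized, count_zero=True))

silence = [0.01, -0.01, 0.02, -0.02, 0.01, -0.01]   # near-silence, tiny fluctuations
unvoiced = [0.3, -0.3, 0.4, -0.4, 0.3, -0.3]        # larger rapid oscillation
print(zcr(silence), zcr(unvoiced))                   # equal without a shift
print(zcr(silence, shift=0.05), zcr(unvoiced, shift=0.05))
```

With finer quantization, exact zeros become rare and the two definitions converge. The example also shows why choosing the shift is delicate: it must exceed the silence fluctuations but stay below the amplitude of genuine unvoiced sounds.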
Examples of ZCR
- ZCR computation
- Example zcr01
- Example zcr02
- Using ZCR to distinguish between unvoiced sounds and environmental noise
- Example zcrWithShift
Pitch
- Definition
- Pitch is also known as fundamental frequency, which equals the number of fundamental periods within one second. Its unit is hertz (Hz).
- Unit
- More commonly, pitch is expressed in semitones, which can be converted from pitch in hertz.
Quiz candidate!
Piano roll via HTML5
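A sketch of the conversion between hertz and semitones, assuming the MIDI-style convention semitone = 69 + 12·log2(f/440), in which A4 = 440 Hz maps to semitone 69 and each doubling of frequency (one octave) adds 12 semitones. The function names are hypothetical.

```python
import math


def hz_to_semitone(freq_hz):
    """Convert a frequency in Hz to a (possibly fractional) semitone number.

    Assumed convention: 440 Hz (A4) -> 69, as in MIDI note numbers.
    """
    return 69 + 12 * math.log2(freq_hz / 440.0)


def semitone_to_hz(semitone):
    """Inverse conversion under the same assumed convention."""
    return 440.0 * 2 ** ((semitone - 69) / 12)


print(hz_to_semitone(440))            # 69.0
print(hz_to_semitone(880))            # 81.0 (one octave up)
print(round(semitone_to_hz(60), 2))   # middle C, about 261.63 Hz
```

Because the scale is logarithmic, equal musical intervals become equal differences in semitones, which is why pitch vectors are usually compared in semitones rather than hertz.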
Pitch Computation for Tuning Forks
- Pitch of tuning forks (code)
Quiz candidate!
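The code referenced on this slide may use a different method. As a swapped-in illustration, the Python sketch below estimates the pitch of a synthetic tuning-fork-like tone by finding its fundamental period with autocorrelation: it picks the lag (within an assumed search range of 50 to 1000 Hz) whose autocorrelation is largest and converts it to hertz as fs / lag. The 400 Hz test tone and 16 kHz sample rate are assumptions; 400 Hz is chosen so that the period is exactly 40 samples.

```python
import math


def estimate_pitch_acf(frame, fs, fmin=50.0, fmax=1000.0):
    """Estimate pitch (Hz) from the autocorrelation peak within [fmin, fmax].

    The fundamental period is taken as the lag with the largest
    autocorrelation among lags fs/fmax .. fs/fmin.
    """
    n = len(frame)
    min_lag = int(fs / fmax)
    max_lag = min(int(fs / fmin), n - 1)
    best_lag, best_acf = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        acf = sum(frame[t] * frame[t + lag] for t in range(n - lag))
        if acf > best_acf:
            best_lag, best_acf = lag, acf
    return fs / best_lag


fs = 16000
# A pure 400 Hz tone: its fundamental period is exactly 40 samples.
tone = [math.sin(2 * math.pi * 400 * t / fs) for t in range(1024)]
print(estimate_pitch_acf(tone, fs))
```

Since the lag is an integer number of samples, the resolution is limited: a 440 Hz fork at 16 kHz has a period of about 36.36 samples, so this sketch returns 16000/36 (about 444.4 Hz) unless the peak is refined, for example by interpolation.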
Pitch Computation for Speech
Quiz candidate!
Tones in Mandarin Chinese
- Some statistics about Mandarin Chinese
- 5401 characters; each character is associated with at least one base syllable and one tone
- 411 base syllables; most syllables have 4 tones, giving 1501 tonal syllables in total
- Syllables with 3 or fewer tones
- More examples
- Tone sandhi
Features Related to Tones
- Tone is characterized by its pitch curve
- Tone 1: high-high
- Tone 2: low-high
- Tone 3: high-low-high
- Tone 4: high-low
- (Put your hand on your throat and you can feel it.)
- Tone recognition is mostly based on features obtained from pitch and volume
Quiz candidate!
Tones in Mandarin TTS
- TTS: text to speech (demo)
- Tone sandhi: phonological change occurring in tonal languages
- 33 → 23 (when two tone-3 syllables occur in a row, the first is pronounced as tone 2)
Mandarin Tone Practice
Sentences of All Tone 3
- Tone sandhi of 33
Quiz candidate!
Pitch Change due to Fast Forward
- If audio is played at a higher sample rate:
- Pitch is higher
- Duration is shorter
- Pitch change due to a sample rate change at playback
- Sample rate: $f_s \rightarrow k f_s$ (at playback)
- Duration: $d \rightarrow d/k$
- Fundamental frequency: $f_0 \rightarrow k f_0$
- Pitch (in semitones): $p \rightarrow p + 12\log_2 k$
Quiz candidate!
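The playback formulas can be checked numerically with a tiny Python sketch. The example numbers (a 3-second clip with a 220 Hz fundamental, recorded at 16 kHz) are illustrative assumptions, and `playback_effect` is a hypothetical helper. The semitone shift follows from the logarithmic semitone scale: multiplying the frequency by k adds 12·log2(k) semitones.

```python
import math


def playback_effect(fs, duration_s, f0_hz, k):
    """Effect of playing audio recorded at fs back at k*fs (hypothetical helper)."""
    new_fs = k * fs
    new_duration = duration_s / k        # same number of samples, played faster
    new_f0 = k * f0_hz                   # every period shrinks by a factor of k
    semitone_shift = 12 * math.log2(k)   # pitch change in semitones
    return new_fs, new_duration, new_f0, semitone_shift


print(playback_effect(16000, 3.0, 220.0, 2))     # (32000, 1.5, 440.0, 12.0)
print(playback_effect(16000, 3.0, 220.0, 1.5))   # shift of about +7.02 semitones
```

Doubling the playback rate (k = 2) raises the pitch by exactly one octave (12 semitones) and halves the duration.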
Pitch Perception
- Age-related hearing loss
- As people grow older, their audible frequency bandwidth becomes narrower.
- Mosquito ringtone
- Low to high, high to low
- Applications
[Figure: hearing-test frequencies of 21 kHz, 17.4 kHz, 15 kHz, 12 kHz, and 8 kHz]
Other Things about Pitch
- Some interesting phenomena about pitch
- Beat
- Doppler effect
- Shepard tone
- An auditory illusion of a tone that continually ascends or descends in pitch
- Overtone singing
Quiz candidate!
Quiz candidate!
How to create these effects in MATLAB?
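As one example of how such an effect can be synthesized (sketched in Python rather than MATLAB), the code below creates beats by adding two sinusoids with nearby frequencies. By the sum-to-product identity sin(A) + sin(B) = 2·cos((A - B)/2)·sin((A + B)/2), the result is a tone at the average frequency whose loudness pulses |f1 - f2| times per second. The 440 Hz and 443 Hz frequencies and the 8 kHz sample rate are assumptions.

```python
import math

fs = 8000               # sample rate (Hz), assumed
f1, f2 = 440.0, 443.0   # two nearby frequencies (Hz), assumed
duration = 2.0          # seconds

n = int(fs * duration)
beats = [math.sin(2 * math.pi * f1 * t / fs) + math.sin(2 * math.pi * f2 * t / fs)
         for t in range(n)]

# The same signal written as a slow envelope times a carrier (sum-to-product identity).
envelope = [2 * math.cos(math.pi * (f1 - f2) * t / fs) for t in range(n)]
carrier = [math.sin(math.pi * (f1 + f2) * t / fs) for t in range(n)]
max_error = max(abs(b - e * c) for b, e, c in zip(beats, envelope, carrier))

beat_rate = abs(f1 - f2)   # loudness maxima per second
print(max_error, beat_rate)
```

To listen to the result, the samples can be scaled to 16-bit integers and written with Python's standard `wave` module.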
Timbre
- Timbre is represented by
- Waveform within a fundamental period
- Frame-based energy distribution over frequencies
- Power spectrum (over a single frame)
- Spectrogram (over many frames)
- Frame-based MFCC (mel-frequency cepstral
coefficients)
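A spectrogram is the sequence of power spectra of successive frames. The Python sketch below assumes a synthetic signal that switches from 500 Hz to 1000 Hz halfway through, and uses a naive DFT over non-overlapping 160-sample frames (both choices are assumptions that keep the example small and free of spectral leakage); the dominant frequency of each frame then tracks the change. A practical implementation would use an FFT, a window function, and overlapping frames.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (non-negative frequencies)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def spectrogram(signal, frame_size):
    """Power spectra of consecutive non-overlapping frames (one list per frame)."""
    return [power_spectrum(signal[i:i + frame_size])
            for i in range(0, len(signal) - frame_size + 1, frame_size)]


fs = 8000
frame_size = 160                      # 20 ms frames, 50 Hz bin spacing
signal = [math.sin(2 * math.pi * (500 if t < 800 else 1000) * t / fs)
          for t in range(1600)]       # 500 Hz for 0.1 s, then 1000 Hz for 0.1 s

spec = spectrogram(signal, frame_size)
dominant_hz = [max(range(len(s)), key=s.__getitem__) * fs / frame_size for s in spec]
print(dominant_hz)
```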
Timbre Demo: Real-time Spectrogram
- Simulink model for real-time display of a spectrogram
- dspstfft_audio (before MATLAB R2011a)
- dspstfft_audioInput (R2012a or later)
[Figure: spectrogram and spectrum displays from the demo]