Title: Audio Features
1. Audio Features: Machine Learning
2. Features for Speech Recognition and Audio Indexing
- Parametric Representations
- Short Time Energy
- Zero Crossing Rates
- Level Crossing Rates
- Short Time Spectral Envelope
- Spectral Analysis
- Filter Design
- Filter Bank Spectral Analysis Model
- Linear Predictive Coding (LPC)
3. Methods
- Vector Quantization
- Finite code book of spectral shapes
- The code book codes for typical spectral shapes
- Method for all spectral representations (e.g.,
Filter Banks, LPC, ZCR, etc.)
- Ensemble Interval Histogram (EIH) Model
- Auditory-Based Spectral Analysis Model
- More robust to noise and reverberation
- Expected to be an inherently better
representation of relevant spectral information
because it models the mechanics of the human cochlea
4. Pattern Recognition
[Block diagram: speech/audio input -> parameter measurement -> test/query pattern -> pattern comparison against reference patterns -> decision rules -> recognized speech/audio]
5. Pattern Recognition
6. Spectral Analysis Models
- Pattern Recognition Approach
- Parameter Measurement -> Pattern
- Pattern Comparison
- Decision Making
- Parameter Measurements
- Bank of Filters Model
- Linear Predictive Coding Model
7. Band Pass Filter
- Note that the band pass filter can be defined as
- a convolution with a filter response function in
the time domain,
- a multiplication with a filter response function
in the frequency domain (both views are sketched below)
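As a minimal sketch (not from the slides), the following Python snippet applies one band pass filter both ways and confirms the two outputs agree; the band edges, filter length and sampling rate are illustrative assumptions.

```python
# Sketch: a band pass filter applied two equivalent ways, as time-domain
# convolution and as frequency-domain multiplication (all parameters assumed).
import numpy as np
from scipy.signal import firwin, fftconvolve

fs = 16000                                   # sampling frequency (assumed)
t = np.arange(fs) / fs
s = np.sin(2 * np.pi * 440 * t) + 0.5 * np.random.randn(fs)   # toy "speech" signal

h = firwin(numtaps=101, cutoff=[300, 3000], pass_zero=False, fs=fs)  # FIR band pass

# 1) convolution with the filter impulse response in the time domain
y_time = fftconvolve(s, h, mode="same")

# 2) multiplication with the filter frequency response in the frequency domain
n = len(s) + len(h) - 1
Y = np.fft.rfft(s, n) * np.fft.rfft(h, n)
y_freq = np.fft.irfft(Y, n)[len(h) // 2 : len(h) // 2 + len(s)]  # align with mode="same"

print(np.max(np.abs(y_time - y_freq)))       # agrees up to numerical error
```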
8. Bank of Filters Analysis Model
9. Bank of Filters Analysis Model
- Speech signal s(n), n = 0, 1, ...
- Digital, with Fs the sampling frequency of s(n)
- Bank of q band pass filters BPF_1, ..., BPF_q
- Spanning a frequency range of, e.g., 100-3000 Hz or 100 Hz-16 kHz
- BPF_i(s(n)) = x_n(e^{jω_i}), where ω_i = 2πf_i/Fs is the normalized frequency corresponding to f_i, for i = 1, ..., q.
- x_n(e^{jω_i}) is the short time spectral representation of s(n) at time n, as seen through BPF_i with centre frequency ω_i, i = 1, ..., q.
- Note: each BPF independently processes s to produce the spectral representation x
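A hedged sketch of this idea: q independent band pass filters, each contributing one coordinate of a short time spectral vector at an analysis instant n. The band edges, number of filters, window length and analysis instant are assumptions for illustration only.

```python
# Sketch: a bank of q band pass filters, each processing s(n) independently.
import numpy as np
from scipy.signal import firwin, lfilter

fs = 8000                                   # sampling frequency Fs (assumed)
s = np.random.randn(fs)                     # stand-in for a speech signal s(n)

# q = 8 filters spanning roughly 100-3000 Hz (illustrative band edges)
edges = np.linspace(100, 3000, 9)
bank = [firwin(101, [lo, hi], pass_zero=False, fs=fs)
        for lo, hi in zip(edges[:-1], edges[1:])]

# the short time energy of each filter output around sample n is one
# coordinate of the spectral representation at time n
n, win = 4000, 200                          # analysis instant and window length
x_n = [np.sum(lfilter(h, [1.0], s)[n - win:n] ** 2) for h in bank]
print(np.round(x_n, 3))                     # q-dimensional spectral vector at time n
```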
10. Bank of Filters Front-End Processor
11. Typical Speech Waveforms
12. MFCCs
Processing chain: Speech/Audio -> Preemphasis -> Windowing -> Fast Fourier Transform -> Mel-Scale Filter Bank -> Log() -> Discrete Cosine Transform -> MFCCs (first 12 most significant coefficients)
MFCCs are calculated using the formula
  C_i = Σ_{k=1}^{N} X_k cos( i (k - 0.5) π / N ),  i = 1, ..., P
- Where
- C_i is the i-th cepstral coefficient
- P is the order (12 in our case)
- K is the number of discrete Fourier
transform magnitude coefficients
- X_k is the k-th log-energy output
from the Mel-Scale filter bank
- N is the number of filters
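A minimal sketch of the cepstral step only: it applies the cosine-transform formula above to log mel filter bank energies. The mel filter bank itself is assumed to be computed elsewhere; the placeholder input and N = 26 filters are illustrative assumptions.

```python
# Sketch: cepstral coefficients from (assumed, already computed) log mel energies.
import numpy as np

def mfcc_from_log_mel(X, P=12):
    """X: array of N log mel filter bank energies; returns C_1..C_P."""
    N = len(X)
    k = np.arange(1, N + 1)
    return np.array([np.sum(X * np.cos(i * (k - 0.5) * np.pi / N))
                     for i in range(1, P + 1)])

log_mel = np.log(np.random.rand(26) + 1e-8)   # placeholder for real filter bank outputs
print(mfcc_from_log_mel(log_mel))             # 12 MFCCs for one frame
```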
13. Linear Predictive Coding Model
14. Filter Response Functions
15. Some Examples of Ideal Band Filters
16. Perceptually Based Critical Band Scale
17. Short Time Fourier Transform
  X_n(e^{jω}) = Σ_m s(m) w(n - m) e^{-jωm}
- s(m): the signal
- w(n - m): a fixed low pass window
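A minimal sketch of this computation, assuming a Hamming window and a frame length of 500 samples (matching the 50 ms / 500-sample example on the next slides at a 10 kHz sampling rate); the hop size and the random stand-in signal are assumptions.

```python
# Sketch: short time Fourier transform, frame by frame, with a Hamming window.
import numpy as np

def stft(s, frame_len=500, hop=160):
    w = np.hamming(frame_len)
    frames = [s[m:m + frame_len] * w
              for m in range(0, len(s) - frame_len + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])   # rows: time n, cols: frequency

s = np.random.randn(10000)        # stand-in for 1 s of speech at Fs = 10 kHz (assumed)
X = stft(s)                       # X[n, k] ~ X_n(e^{j omega_k})
print(X.shape)
```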
18. Short Time Fourier Transform: Long Hamming Window, 500 samples (50 ms), Voiced Speech
19. Short Time Fourier Transform: Short Hamming Window, 50 samples (5 ms), Voiced Speech
20. Short Time Fourier Transform: Long Hamming Window, 500 samples (50 ms), Unvoiced Speech
21. Short Time Fourier Transform: Short Hamming Window, 50 samples (5 ms), Unvoiced Speech
22. Short Time Fourier Transform: Linear Filter Interpretation
23. Linear Predictive Coding (LPC) Model
- Speech signal s(n), n = 0, 1, ...
- Digital, with Fs the sampling frequency of s(n)
- Spectral analysis on blocks of speech with an all-pole modeling constraint
- LPC analysis of order p
- s(n) is blocked into frames indexed by n, m
- Again consider x_n(e^{jω}), the short time spectral representation of s(n) at time n (where ω = 2πf/Fs is the normalized frequency corresponding to f).
- Now the spectral representation x_n(e^{jω}) is constrained to be of the form σ/A(e^{jω}), where σ is a gain term and A(e^{jω}) is the p-th order polynomial with z-transform
  A(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + ... + a_p z^{-p}
- The output of the LPC parametric conversion on block n, m is the vector (a_1, ..., a_p).
- It specifies parametrically the spectrum of an all-pole model that best matches the signal spectrum over the period of time in which the frame of speech samples was accumulated (p-th order polynomial approximation of the signal), as sketched below.
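As a hedged sketch of one common way to obtain (a_1, ..., a_p), the snippet below uses the autocorrelation method and solves the Toeplitz normal equations for a single frame; the frame content, frame length and order p = 12 are assumptions.

```python
# Sketch: p-th order LPC analysis of one frame via the autocorrelation method.
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, p=12):
    """Return a_1..a_p of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p for one frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]   # autocorrelation r[0..]
    a = solve_toeplitz((r[:p], r[:p]), -r[1:p + 1])                # solve R a = -r
    return a

frame = np.hamming(240) * np.random.randn(240)   # stand-in for a 30 ms frame at 8 kHz
print(lpc(frame))                                # the LPC parameter vector (a_1, ..., a_p)
```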
24. Vector Quantization
- Data are represented as feature vectors.
- VQ uses a training set to determine a set of code words that constitute a code book.
- Code words are centroids under a similarity or distance measure d.
- The code words together with d divide the feature space into Voronoi regions.
- A query vector falls into a Voronoi region and is represented by the respective code word.
25. Vector Quantization
- Distance measures d(x,y)
- Euclidean distance
- Taxi cab distance
- Hamming distance
- etc.
26. Vector Quantization
- Clustering the training vectors
- Initialize: choose M arbitrary vectors out of the L vectors of the training set. This is the initial code book.
- Nearest-neighbour search: for each training vector, find the code word in the current code book that is closest and assign the vector to the corresponding cell.
- Centroid update: update the code word in each cell using the centroid of the training vectors assigned to that cell.
- Iteration: repeat steps 2-3 until the average distance falls below a preset threshold (a minimal sketch of this loop follows below).
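A minimal sketch of this clustering loop, assuming Euclidean distance, a random training set and a change-based stopping rule standing in for the preset threshold; all names and parameter values are illustrative.

```python
# Sketch: k-means style code book training for vector quantization.
import numpy as np

def train_codebook(train, M=16, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialize: choose M arbitrary vectors from the L training vectors
    codebook = train[rng.choice(len(train), M, replace=False)]
    prev = np.inf
    while True:
        # 2. Nearest-neighbour search: assign each vector to its closest code word
        d = np.linalg.norm(train[:, None, :] - codebook[None, :, :], axis=-1)
        cells = d.argmin(axis=1)
        # 3. Centroid update: each code word becomes the centroid of its cell
        for m in range(M):
            if np.any(cells == m):
                codebook[m] = train[cells == m].mean(axis=0)
        # 4. Iterate until the average distance stops improving (stands in for the threshold)
        avg = d[np.arange(len(train)), cells].mean()
        if prev - avg < tol:
            return codebook
        prev = avg

codebook = train_codebook(np.random.randn(1000, 12))   # e.g. 12-dim feature vectors
print(codebook.shape)
```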
27. Vector Classification
- For an M-vector code book CB with code words
- CB = {y_i | 1 <= i <= M},
- the index m of the best code book entry for a given vector v is
- m = argmin_{1 <= i <= M} d(v, y_i)
28. VQ for Classification
- A code book CB_k = {y_{ki} | 1 <= i <= M} can be used to define a class C_k.
- Example: audio classification
- Classes: crowd, car, silence, scream, explosion, etc.
- Determine the class by using a VQ code book CB_k for each of the classes (see the sketch below).
- VQ is very often used as a baseline method for classification problems.
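A hedged sketch of this baseline: pick the class whose code book quantizes the query vectors with the lowest average distortion. The class names, code book sizes and the synthetic query are assumptions for illustration.

```python
# Sketch: classify a clip by the class code book with lowest average distortion.
import numpy as np

def avg_distortion(vectors, codebook):
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=-1)
    return d.min(axis=1).mean()              # nearest code word per vector, averaged

def classify(vectors, codebooks):
    """codebooks: dict mapping class name -> (M, dim) code book array."""
    return min(codebooks, key=lambda c: avg_distortion(vectors, codebooks[c]))

rng = np.random.default_rng(1)
codebooks = {"crowd": rng.standard_normal((16, 12)),
             "car": rng.standard_normal((16, 12)) + 2.0,
             "silence": rng.standard_normal((16, 12)) - 2.0}
query = rng.standard_normal((50, 12)) + 2.0   # 50 feature vectors from one clip
print(classify(query, codebooks))             # expected: "car"
```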
29. Sound, DNA Sequences!
- DNA: a helix-shaped molecule whose constituents are two parallel strands of nucleotides
- DNA is usually represented by sequences of these four nucleotides
- This assumes only one strand is considered; the second strand is always derivable from the first by pairing As with Ts and Cs with Gs, and vice versa
- Nucleotides (bases)
- Adenine (A)
- Cytosine (C)
- Guanine (G)
- Thymine (T)
30. Biological Information: From Genes to Proteins
31. From Amino Acids to Protein Functions
DNA sequence (fragment):
CGCCAGCTGGACGGGCACACCATGAGGCTGCTGACCCTCCTGGGCCTTCTG
Amino acid sequence (fragment):
TDQAAFDTNIVTLTRFVMEQGRKARGTGEMTQLLNSLCTAVKAISTAVRKAGIAHLYGIAGSTNVTGDQVKKLDVLSNDLVINVLKSSFATCVLVTEEDKNAIIVEPEKRGKYVVCFDPLDGSSNIDCLVSIGTIFGIYRKNSTDEPSEKDALQPGRNLVAAGYALYGSATML
DNA / amino acid sequence -> 3D structure -> protein functions
DNA (gene) -> pre-RNA -> RNA -> Protein
(via RNA polymerase, the spliceosome and the ribosome, respectively)
32. Motivation for Markov Models
- There are many cases in which we would like to represent the statistical regularities of some class of sequences
- genes
- proteins in a given family
- sequences of audio features
- Markov models are well suited to this type of task
33. A Markov Chain Model
- Transition probabilities
- Pr(x_i = a | x_{i-1} = g) = 0.16
- Pr(x_i = c | x_{i-1} = g) = 0.34
- Pr(x_i = g | x_{i-1} = g) = 0.38
- Pr(x_i = t | x_{i-1} = g) = 0.12
34. Definition of a Markov Chain Model
- A Markov chain[1] model is defined by
- a set of states
- some states emit symbols
- other states (e.g., the begin state) are silent
- a set of transitions with associated probabilities
- the transitions emanating from a given state define a distribution over the possible next states
[1] Markov A. A., "Extension of the law of large numbers to quantities depending on one another", Izvestiya Fiziko-matematicheskogo obshchestva pri Kazanskom universitete, 2nd series, Vol. 15 (1906), pp. 135-156.
35. Markov Chain Models: Properties
- Given some sequence x of length L, we can ask how probable the sequence is given our model
- For any probabilistic model of sequences, we can write this probability as
  Pr(x) = Pr(x_L | x_{L-1}, ..., x_1) Pr(x_{L-1} | x_{L-2}, ..., x_1) ... Pr(x_1)
- Key property of a (1st order) Markov chain: the probability of each x_i depends only on the value of x_{i-1}, so
  Pr(x) = Pr(x_1) Pr(x_2 | x_1) ... Pr(x_L | x_{L-1})
36. The Probability of a Sequence for a Markov Chain Model
Pr(cggt) = Pr(c) Pr(g|c) Pr(g|g) Pr(t|g)
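A minimal sketch of this computation in code. Only the transition probabilities out of g are taken from the example model above; the remaining rows and the uniform initial distribution are assumed placeholder values.

```python
# Sketch: probability of a sequence under a first order Markov chain.
import numpy as np

states = "acgt"
# P[i, j] = Pr(x_t = states[j] | x_{t-1} = states[i]); only the 'g' row matches
# the example slide, the other rows are assumed placeholder values
P = np.array([[0.30, 0.20, 0.30, 0.20],
              [0.25, 0.25, 0.25, 0.25],
              [0.16, 0.34, 0.38, 0.12],
              [0.25, 0.25, 0.25, 0.25]])
init = np.full(4, 0.25)                       # assumed uniform initial distribution

def prob(seq):
    idx = [states.index(c) for c in seq]
    p = init[idx[0]]
    for a, b in zip(idx, idx[1:]):
        p *= P[a, b]
    return p

print(prob("cggt"))     # = Pr(c) * Pr(g|c) * Pr(g|g) * Pr(t|g)
```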
37. Example Application
- CpG islands
- CG di-nucleotides are rarer in eukaryotic genomes than expected given the marginal probabilities of C and G
- but the regions upstream of genes are richer in CG di-nucleotides than elsewhere: CpG islands
- useful evidence for finding genes
- Application: predict CpG islands with Markov chains
- one Markov chain to represent CpG islands
- another Markov chain to represent the rest of the genome
38. Markov Chains for Discrimination
- Suppose we want to distinguish CpG islands from other sequence regions
- Given sequences from CpG islands, and sequences from other regions, we can construct
- a model to represent CpG islands
- a null model to represent the other regions
- We can then score a test sequence x by the log-odds ratio
  score(x) = log [ Pr(x | CpG model) / Pr(x | null model) ]
(a small worked sketch follows below)
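A hedged end-to-end sketch of this scheme: two first order chains are estimated from labelled sequences and a test sequence is scored by the log-odds ratio. The training sequences are tiny synthetic placeholders, not real genomic data, and the pseudocount is an assumption.

```python
# Sketch: train a CpG model and a null model, then score a test sequence.
import numpy as np

BASES = "acgt"

def train_chain(seqs, pseudo=1.0):
    counts = np.full((4, 4), pseudo)                  # pseudocounts avoid log(0)
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[BASES.index(a), BASES.index(b)] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def log_likelihood(seq, P):
    return sum(np.log(P[BASES.index(a), BASES.index(b)]) for a, b in zip(seq, seq[1:]))

cpg_model = train_chain(["cgcgcgacgc", "gcgcggcgta"])    # placeholder CpG island data
null_model = train_chain(["atattgcatt", "ttaacgtata"])   # placeholder background data

x = "gcgcgcata"
score = log_likelihood(x, cpg_model) - log_likelihood(x, null_model)
print(score)          # > 0 suggests x looks more like a CpG island than background
```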
39. Markov Chains for Discrimination
- Why can we use this ratio?
- According to Bayes' rule,
  Pr(CpG | x) = Pr(x | CpG) Pr(CpG) / Pr(x)  and  Pr(null | x) = Pr(x | null) Pr(null) / Pr(x)
- If we do not take into account the prior probabilities Pr(CpG) and Pr(null) of the two classes, then it is clear from Bayes' rule that we just need to compare Pr(x | CpG) and Pr(x | null), as is done in our scoring function score().
40. Higher Order Markov Chains
- The Markov property specifies that the probability of a state depends only on the previous state
- But we can build more memory into our states by using a higher order Markov model
- In an n-th order Markov model
- the probability of the current state depends on the previous n states
41. Selecting the Order of a Markov Chain Model
- But the number of parameters we need to estimate grows exponentially with the order
- for modeling DNA we need on the order of 4^(n+1) parameters for an n-th order model
- The higher the order, the less reliable we can expect our parameter estimates to be
- estimating the parameters of a 2nd order Markov chain from the complete genome of E. coli (5.44 x 10^6 bases), we'd see each word about 85,000 times on average (divide by 4^3)
- estimating the parameters of a 9th order chain, we'd see each word about 5 times on average (divide by 4^10 ≈ 10^6)
42. Higher Order Markov Chains
- An n-th order Markov chain over some alphabet A is equivalent to a first order Markov chain over the alphabet A^n of n-tuples
- Example: a 2nd order Markov model for DNA can be treated as a 1st order Markov model over the alphabet
- AA, AC, AG, AT
- CA, CC, CG, CT
- GA, GC, GG, GT
- TA, TC, TG, TT
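A small sketch of this equivalence: the state at each position becomes the overlapping pair of the two most recent symbols, so a 2nd order chain over {a, c, g, t} is a 1st order chain over the 16 pairs. The example sequence is taken from the next slide.

```python
# Sketch: rewriting a DNA sequence over the 16-symbol alphabet of 2-tuples.
from itertools import product

pair_alphabet = ["".join(p) for p in product("acgt", repeat=2)]   # aa, ac, ..., tt
seq = "gctaca"
pair_seq = [seq[i:i + 2] for i in range(len(seq) - 1)]            # overlapping 2-tuples
print(len(pair_alphabet), pair_alphabet[:4], "...")               # 16 symbols
print(pair_seq)                                                   # ['gc', 'ct', 'ta', 'ac', 'ca']
```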
43. A Fifth Order Markov Chain
Pr(gctaca) = Pr(gctac) Pr(a | gctac)
44. Hidden Markov Model: A Simple HMM
[Figure: two candidate models, Model 1 and Model 2]
Given the observed sequence AGGCT, which state emits each item?
45. Tutorial on HMMs
- L.R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition",
- Proceedings of the IEEE, Vol. 77, No. 2, February 1989.
46. HMM for Hidden Coin Tossing
[Figure: an HMM whose hidden states are biased coins; example observed sequence H H T T H T H H T T H ...]
47. Hidden State
- We'll distinguish between the observed parts of a problem and the hidden parts
- In the Markov models we've considered previously, it is clear which state accounts for each part of the observed sequence
- In the model above, there are multiple states that could account for each part of the observed sequence
- this is the hidden part of the problem
48. Learning and Prediction Tasks (in general, i.e., they apply to both MMs and HMMs)
- Learning
- Given: a model and a set of training sequences
- Do: find model parameters that explain the training sequences with relatively high probability (the goal is to find a model that generalizes well to sequences we haven't seen before)
- Classification
- Given: a set of models representing different sequence classes, and a test sequence
- Do: determine which model/class best explains the sequence
- Segmentation
- Given: a model representing different sequence classes, and a test sequence
- Do: segment the sequence into subsequences, predicting the class of each subsequence
49. Algorithms for Learning and Prediction
- Learning
- correct path known for each training sequence -> simple maximum likelihood or Bayesian estimation
- correct path not known -> Forward-Backward algorithm plus ML or Bayesian estimation
- Classification
- simple Markov model -> calculate the probability of the sequence along the single path for each model
- hidden Markov model -> Forward algorithm to calculate the probability of the sequence over all paths for each model
- Segmentation
- hidden Markov model -> Viterbi algorithm to find the most probable path for the sequence
50. The Parameters of an HMM
- Transition probabilities
- a_kl = Pr(π_i = l | π_{i-1} = k): the probability of a transition from state k to state l
- Emission probabilities
- e_k(b) = Pr(x_i = b | π_i = k): the probability of emitting character b in state k
- Note: HMMs can also be formulated using an emission probability associated with a transition from state k to state l.
51. An HMM Example
Transition probabilities out of each state sum to 1: Σ_l a_kl = 1
Emission probabilities in each state sum to 1: Σ_b e_k(b) = 1
52. Three Important Questions (see also L.R. Rabiner (1989))
- How likely is a given sequence?
- the Forward algorithm
- What is the most probable path for generating a given sequence?
- the Viterbi algorithm
- How can we learn the HMM parameters given a set of sequences?
- the Forward-Backward (Baum-Welch) algorithm
53. How Likely is a Given Sequence?
- The probability that a given path π is taken and the sequence x is generated:
  Pr(x, π) = a_{0 π_1} Π_{i=1}^{L} e_{π_i}(x_i) a_{π_i π_{i+1}}   (with π_{L+1} the end state)
54. How Likely is a Given Sequence?
- The probability over all paths is
  Pr(x) = Σ_π Pr(x, π)
- but the number of paths can be exponential in the length of the sequence...
- the Forward algorithm enables us to compute this efficiently
55. The Forward Algorithm
- Define f_k(i) to be the probability of being in state k having observed the first i characters of a sequence x of length L
- To compute Pr(x): the probability of being in the end state having observed all of sequence x
- Can be defined recursively
- Compute using dynamic programming
56. The Forward Algorithm
- f_k(i): the probability of being in state k having observed the first i characters of sequence x
- Initialization
- f_0(0) = 1 for the start state; f_k(0) = 0 for the other states
- Recursion
- for emitting states (i = 1, ..., L):  f_l(i) = e_l(x_i) Σ_k f_k(i-1) a_kl
- for silent states:  f_l(i) = Σ_k f_k(i) a_kl
- Termination:  Pr(x) = f_N(L) = Σ_k f_k(L) a_{kN}  (N denotes the end state)
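A minimal sketch of this recursion for an HMM without intermediate silent states; all model parameters below are assumed toy values and are not the ones used in the worked example that follows.

```python
# Sketch: the Forward algorithm for a small HMM (toy, assumed parameters).
import numpy as np

def forward(obs, a0, A, E, a_end):
    """obs: symbol indices; a0: start->state probs; A[k, l]: transitions;
    E[k, b]: emission probs; a_end[k]: state->end probs. Returns Pr(x)."""
    f = a0 * E[:, obs[0]]                   # f_k(1) = a_0k * e_k(x_1)
    for b in obs[1:]:
        f = E[:, b] * (f @ A)               # f_l(i) = e_l(x_i) * sum_k f_k(i-1) a_kl
    return float(f @ a_end)                 # termination: sum_k f_k(L) a_k,end

# toy 2-state HMM over the alphabet {A, C, G, T}
a0 = np.array([0.5, 0.5])
A = np.array([[0.8, 0.1],
              [0.1, 0.8]])                  # rows plus the end transition sum to 1
E = np.array([[0.4, 0.1, 0.4, 0.1],         # state 1 emissions for A, C, G, T
              [0.1, 0.4, 0.1, 0.4]])        # state 2 emissions
a_end = np.array([0.1, 0.1])

x = [3, 0, 2, 0]                            # the sequence T A G A
print(forward(x, a0, A, E, a_end))
```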
57. Forward Algorithm Example
Given the sequence x = TAGA
58. Forward Algorithm Example
- Initialization
- f_0(0) = 1,  f_1(0) = ... = f_5(0) = 0
- Computing other values
- f_1(1) = e_1(T) (f_0(0) a_01 + f_1(0) a_11) = 0.3 · (1·0.5 + 0·0.2) = 0.15
- f_2(1) = 0.4 · (1·0.5 + 0·0.8)
- f_1(2) = e_1(A) (f_0(1) a_01 + f_1(1) a_11) = 0.4 · (0·0.5 + 0.15·0.2)
- ...
- Pr(TAGA) = f_5(4) = f_3(4) a_35 + f_4(4) a_45
59. Three Important Questions
- How likely is a given sequence?
- What is the most probable path for generating a given sequence?
- How can we learn the HMM parameters given a set of sequences?
60. Finding the Most Probable Path: The Viterbi Algorithm
- Define v_k(i) to be the probability of the most probable path accounting for the first i characters of x and ending in state k
- We want to compute v_N(L), the probability of the most probable path accounting for all of the sequence and ending in the end state
- Can be defined recursively
- Again we can use dynamic programming to compute v_N(L) and find the most probable path efficiently
61. Finding the Most Probable Path: The Viterbi Algorithm
- Define v_k(i) to be the probability of the most probable path π accounting for the first i characters of x and ending in state k
- The Viterbi algorithm
- Initialization (i = 0)
- v_0(0) = 1, v_k(0) = 0 for k > 0
- Recursion (i = 1, ..., L)
- v_l(i) = e_l(x_i) · max_k (v_k(i-1) a_kl)
- ptr_i(l) = argmax_k (v_k(i-1) a_kl)
- Termination
- Pr(x, π*) = max_k (v_k(L) a_k0)
- π*_L = argmax_k (v_k(L) a_k0)
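A minimal sketch of this recursion in log space, with traceback through the pointers; the toy model parameters are the same assumed values as in the Forward sketch above and are not taken from the slides.

```python
# Sketch: Viterbi decoding (initialization, recursion, termination, traceback).
import numpy as np

def viterbi(obs, a0, A, E, a_end):
    """Return (log probability of the best path, best state sequence)."""
    S, L = A.shape[0], len(obs)
    v = np.log(a0) + np.log(E[:, obs[0]])                 # v_k(1)
    ptr = np.zeros((L, S), dtype=int)
    for i in range(1, L):
        scores = v[:, None] + np.log(A)                   # v_k(i-1) + log a_kl
        ptr[i] = scores.argmax(axis=0)                    # best predecessor for each l
        v = scores.max(axis=0) + np.log(E[:, obs[i]])     # v_l(i)
    last = int(np.argmax(v + np.log(a_end)))              # termination
    path = [last]
    for i in range(L - 1, 0, -1):                         # traceback through ptr
        path.append(int(ptr[i, path[-1]]))
    return float(np.max(v + np.log(a_end))), path[::-1]

a0 = np.array([0.5, 0.5])
A = np.array([[0.8, 0.1], [0.1, 0.8]])
E = np.array([[0.4, 0.1, 0.4, 0.1], [0.1, 0.4, 0.1, 0.4]])
a_end = np.array([0.1, 0.1])
print(viterbi([3, 0, 2, 0], a0, A, E, a_end))             # best path for T A G A
```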
62. Three Important Questions
- How likely is a given sequence?
- What is the most probable path for generating a given sequence?
- How can we learn the HMM parameters given a set of sequences?
63. Learning Without Hidden State
- Learning is simple if we know the correct path for each sequence in our training set
- estimate parameters by counting the number of times each parameter is used across the training set
64. Learning With Hidden State
- If we don't know the correct path for each sequence in our training set, consider all possible paths for the sequence
- estimate parameters through a procedure that counts the expected number of times each parameter is used across the training set
65. Learning Parameters: The Baum-Welch Algorithm
- Also known as the Forward-Backward algorithm
- An Expectation Maximization (EM) algorithm
- EM is a family of algorithms for learning probabilistic models in problems that involve hidden state
- In this context, the hidden state is the path that best explains each training sequence
66. Learning Parameters: The Baum-Welch Algorithm
- Algorithm sketch
- initialize the parameters of the model
- iterate until convergence
- calculate the expected number of times each transition or emission is used
- adjust the parameters to maximize the likelihood of these expected values
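As a hedged illustration of the expectation step, written in the notation used above (Forward variables f_k(i), Backward variables b_k(i), a single training sequence x of length L assumed), the expected counts can be written as:

```latex
% Expected number of k -> l transitions and of emissions of symbol b in state k:
A_{kl} = \frac{1}{\Pr(x)} \sum_{i=1}^{L-1} f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1),
\qquad
E_k(b) = \frac{1}{\Pr(x)} \sum_{i:\, x_i = b} f_k(i)\, b_k(i)
```

The maximization step then renormalizes these counts, e.g. setting a_kl = A_kl / Σ_l' A_kl' and e_k(b) = E_k(b) / Σ_b' E_k(b'); with several training sequences the counts are summed over the sequences before renormalizing.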
67. Computational Complexity of HMM Algorithms
- Given an HMM with S states and a sequence of length L, the complexity of the Forward, Backward and Viterbi algorithms is O(S^2 L)
- This assumes that the states are densely interconnected
- Given M sequences of length L, the complexity of Baum-Welch on each iteration is O(M S^2 L)
68. Markov Models: Summary
- We considered models that vary in terms of order and hidden state
- Three DP-based algorithms for HMMs: Forward, Backward and Viterbi
- We discussed three key tasks: learning, classification and segmentation
- The algorithms used for each task depend on whether there is hidden state in the problem or not (i.e., whether the correct path is known)
69. Summary
- Markov chains and hidden Markov models are probabilistic models in which the probability of a state depends only on the previous state
- Given a sequence of symbols x, the Forward algorithm finds the probability of obtaining x under the model
- The Viterbi algorithm finds the most probable path (corresponding to x) through the model
- The Baum-Welch algorithm learns or adjusts the model parameters (transition and emission probabilities) to best explain a set of training sequences.