Dynamic Time Warping and Minimum Distance Paths for Speech Recognition - PowerPoint PPT Presentation

Slides: 17
Provided by: mto80
1
Dynamic Time Warping and Minimum Distance Paths
for Speech Recognition
  • Isolated word recognition
  • Task
  • Want to build an isolated word recogniser, e.g.
    voice dialling on mobile phones
  • Method
  • Record, parameterise and store vocabulary of
    reference words
  • Record test word to be recognised and
    parameterise
  • Measure distance between test word and each
    reference word
  • Choose reference word closest to test word

2
Words are parameterised on a frame-by-frame basis.
Choose a frame length over which speech remains reasonably stationary.
Overlap frames, e.g. 40ms frames, 10ms frame shift.
(Figure: overlapping frames, labelled 40ms and 20ms)
We want to compare frames of test and reference words, i.e. calculate distances between them.
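The framing step above can be sketched as follows. This is a minimal illustration only: the function name, the 8 kHz sampling rate and the all-zero input signal are assumptions, and the parameterisation of each frame is not shown.

```python
def split_into_frames(samples, sample_rate, frame_ms=40, shift_ms=10):
    """Split a sequence of samples into overlapping fixed-length frames."""
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    shift = int(sample_rate * shift_ms / 1000)      # samples between frame starts
    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += shift
    return frames


# Hypothetical input: 100 ms of silence at an assumed 8 kHz sampling rate.
signal = [0.0] * 800
frames = split_into_frames(signal, sample_rate=8000)
print(len(frames), len(frames[0]))
```

With a 10ms shift and 40ms frames, consecutive frames share 30ms of signal.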
3
Calculating Distances
  • Easy
  • Sum differences between corresponding frames
  • Problem
  • Number of frames won't always correspond

4
  • Solution 1: Linear Time Warping
  • Stretch shorter sound
  • Problem?
  • Some sounds stretch more than others
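
As a rough sketch of linear time warping, the shorter sequence can be stretched uniformly to the length of the longer one before summing frame differences. The function name and the use of absolute difference are assumptions; the sample values are the one-dimensional Test and Reference sequences used elsewhere in the deck.

```python
def linear_warp_distance(test, reference):
    """Stretch the shorter sequence uniformly, then sum absolute differences."""
    if len(test) >= len(reference):
        longer, shorter = test, reference
    else:
        longer, shorter = reference, test
    total = 0
    for i, value in enumerate(longer):
        # Map position i in the longer sequence to a position in the shorter one.
        j = i * len(shorter) // len(longer)
        total += abs(value - shorter[j])
    return total


test = [5, 3, 9, 7, 3]
reference = [4, 7, 4]
print(linear_warp_distance(test, reference))
```

Every frame is stretched by the same factor here, which is exactly the weakness noted above: real sounds do not all stretch equally.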

5
  • Solution 2
  • Dynamic Time Warping (DTW)

Test: 5 3 9 7 3
Reference: 4 7 4
Using a dynamic alignment, make the most similar frames correspond.
Find the distance between the two utterances using these corresponding frames.
6
Digression: Dynamic Programming
  • The shortest route from Dublin to Limerick goes
    through
  • Kildare
  • Monasterevin
  • Portlaoise
  • Mountrath
  • Roscrea
  • Nenagh
  • Now consider the shortest route from Dublin to
    Nenagh
  • What towns does the route go through?
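
The point of the digression is that any leading part of a shortest route is itself a shortest route, so the best route to Nenagh is simply the Limerick route cut short at Nenagh. The sketch below illustrates this under loud assumptions: every link cost is invented (not a real road distance), the "Detour" node is hypothetical, and Dijkstra's algorithm is swapped in as a convenient shortest-path routine since the slides do not name one.

```python
import heapq

# Hypothetical link costs between the towns named above (invented for
# illustration only). "Detour" is a made-up alternative route.
ROADS = {
    "Dublin": {"Kildare": 5},
    "Kildare": {"Monasterevin": 1, "Detour": 5},
    "Monasterevin": {"Portlaoise": 2},
    "Portlaoise": {"Mountrath": 1},
    "Mountrath": {"Roscrea": 3},
    "Detour": {"Roscrea": 4},
    "Roscrea": {"Nenagh": 3},
    "Nenagh": {"Limerick": 4},
    "Limerick": {},
}


def shortest_route(roads, start, goal):
    """Return the cheapest list of towns from start to goal."""
    best = {start: 0}   # cheapest known cost to reach each town
    previous = {}       # town visited just before each town on its best route
    queue = [(0, start)]
    while queue:
        cost, town = heapq.heappop(queue)
        if cost > best.get(town, float("inf")):
            continue  # stale queue entry
        for neighbour, step in roads[town].items():
            new_cost = cost + step
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = town
                heapq.heappush(queue, (new_cost, neighbour))
    # Walk backwards from the goal to recover the route.
    route = [goal]
    while route[-1] != start:
        route.append(previous[route[-1]])
    return route[::-1]


to_limerick = shortest_route(ROADS, "Dublin", "Limerick")
to_nenagh = shortest_route(ROADS, "Dublin", "Nenagh")
print(to_limerick)
print(to_nenagh)
```

This is the property DTW relies on: the minimum distance to a cell can be built from the minimum distances to its neighbouring cells.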

7
Intercity Example
9
Place the distance between frame r of Test and frame c of Reference in cell(r,c) of the distance matrix

Distance matrix (rows: Test frames r, columns: Reference frames c)

  Test   r |  c=1  c=2  c=3
     3   5 |    1    4    1
     7   4 |    3    0    3
     9   3 |    5    2    5
     3   2 |    1    4    1
     5   1 |    1    2    1
  Reference:    4    7    4

Compute the minimum distance to each point and place it in the mindist matrix, e.g.
mindist(5,3) = min{1 + mindist(5,2), 1 + mindist(4,2), 1 + mindist(4,3)}

mindist matrix (rows: Test frames r, columns: Reference frames c)

  Test   r |  c=1  c=2  c=3
     3   5 |   11    8    5
     7   4 |   10    4    7
     9   3 |    7    4    9
     3   2 |    2    5    4
     5   1 |    1    3    4
  Reference:    4    7    4

We can also find the path through the grid that minimizes the total cost of the path
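The two matrices on this slide can be reproduced with a short sketch. The function names are assumptions, frames are single numbers compared by absolute difference as in the slide's example, and the tie-breaking rule used when tracing the path back is an arbitrary choice.

```python
def dtw_matrices(test, reference):
    """Build the distance matrix and the mindist matrix (0-based indices)."""
    rows, cols = len(test), len(reference)
    dist = [[abs(t - r) for r in reference] for t in test]
    mindist = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if r == 0 and c == 0:
                best_prev = 0  # the path starts here
            else:
                candidates = []
                if r > 0:
                    candidates.append(mindist[r - 1][c])      # from below
                if c > 0:
                    candidates.append(mindist[r][c - 1])      # from the left
                if r > 0 and c > 0:
                    candidates.append(mindist[r - 1][c - 1])  # diagonal
                best_prev = min(candidates)
            mindist[r][c] = dist[r][c] + best_prev
    return dist, mindist


def best_path(mindist):
    """Trace the minimum-cost path back from the top-right cell (1-based cells)."""
    r, c = len(mindist) - 1, len(mindist[0]) - 1
    path = [(r + 1, c + 1)]
    while (r, c) != (0, 0):
        steps = []
        if r > 0 and c > 0:
            steps.append((mindist[r - 1][c - 1], r - 1, c - 1))
        if r > 0:
            steps.append((mindist[r - 1][c], r - 1, c))
        if c > 0:
            steps.append((mindist[r][c - 1], r, c - 1))
        _, r, c = min(steps)  # cheapest predecessor
        path.append((r + 1, c + 1))
    return path[::-1]


test = [5, 3, 9, 7, 3]
reference = [4, 7, 4]
dist, mindist = dtw_matrices(test, reference)
path = best_path(mindist)
print(mindist[-1][-1], path)
```

The lists are stored with row 1 first, so they read bottom to top relative to the grids above.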
10
Examples so far are uni-dimensional.
Speech is multi-dimensional, e.g. two dimensions, using points (4,3) and (5,2).
(Figure: the points (4,3) and (5,2) plotted on axes running from 1 to 5)
Distance equation for 2 dimensions
Distance equation for multi-dimensional
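The equations themselves did not survive in this transcript. A plausible reading, and it is only an assumption, is the Euclidean distance: the square root of the summed squared differences over all dimensions. The sketch below applies it to the two points named on the slide; the function name is hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two frames of equal dimension (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


d = frame_distance((4, 3), (5, 2))
print(d)
```

The same function covers the two-dimensional and the multi-dimensional case, since it sums over however many components each frame has.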
11
Constraints
  • Global
  • Endpoint detection
  • Path should be close to diagonal
  • Local
  • Must always travel upwards or eastwards
  • No jumps
  • Slope weighting
  • Consecutive moves upwards/eastwards

12
Global Constraints
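Only the slide title survives here, so the exact shape of the allowed region is unknown. One common way to keep the path close to the diagonal is a fixed-width band around it, in the style of the Sakoe-Chiba band; the sketch below assumes that choice, and the function name and `band` parameter are hypothetical.

```python
import math


def dtw_with_band(test, reference, band):
    """DTW cost where cells further than `band` columns from the diagonal are forbidden."""
    rows, cols = len(test), len(reference)
    inf = math.inf
    mindist = [[inf] * cols for _ in range(rows)]
    for r in range(rows):
        # Column where the diagonal crosses this row, scaled for unequal lengths.
        centre = r * (cols - 1) / (rows - 1) if rows > 1 else 0
        for c in range(cols):
            if abs(c - centre) > band:
                continue  # outside the allowed region
            d = abs(test[r] - reference[c])
            if r == 0 and c == 0:
                mindist[r][c] = d
                continue
            prev = min(
                mindist[r - 1][c] if r > 0 else inf,
                mindist[r][c - 1] if c > 0 else inf,
                mindist[r - 1][c - 1] if r > 0 and c > 0 else inf,
            )
            mindist[r][c] = d + prev
    return mindist[-1][-1]


test = [5, 3, 9, 7, 3]
reference = [4, 7, 4]
print(dtw_with_band(test, reference, band=1))
print(dtw_with_band(test, reference, band=0))
```

A band that is too narrow can leave no valid path at all, in which case the function returns infinity. Skipping the forbidden cells also reduces the amount of computation.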
13
Local Constraints
(Figure: cell mindist(r,c) reached from mindist(r,c-1), mindist(r-1,c) and mindist(r-1,c-1), with weights 1, 1 and 2 marked on the moves)
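A slope-weighted version of the recurrence might look like the sketch below. The transcript lists the weights 1, 1 and 2 but does not say clearly which move each belongs to; placing the 2 on the diagonal move is an assumption, as are the function and parameter names and the choice to leave the starting cell unweighted.

```python
import math


def weighted_dtw(test, reference, w_up=1, w_east=1, w_diag=2):
    """DTW cost where each move multiplies the local distance by a weight."""
    rows, cols = len(test), len(reference)
    mindist = [[math.inf] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            d = abs(test[r] - reference[c])
            if r == 0 and c == 0:
                mindist[r][c] = d  # starting cell, taken unweighted here
                continue
            options = []
            if r > 0:
                options.append(mindist[r - 1][c] + w_up * d)        # move upwards
            if c > 0:
                options.append(mindist[r][c - 1] + w_east * d)      # move eastwards
            if r > 0 and c > 0:
                options.append(mindist[r - 1][c - 1] + w_diag * d)  # diagonal move
            mindist[r][c] = min(options)
    return mindist[-1][-1]


test = [5, 3, 9, 7, 3]
reference = [4, 7, 4]
print(weighted_dtw(test, reference))
print(weighted_dtw(test, reference, w_diag=1))
```

With all three weights set to 1 this reduces to the unweighted recurrence used in the earlier worked example.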
14
Points to Note
  • DTW really only suitable for small vocabularies
    and/or speaker dependent recognition
  • Should normalise for reference length
  • Can use multiple utterances and cluster them
  • Poor performance if recording environment changes
  • High computation cost
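
Normalising for reference length and choosing the closest reference word can be sketched as below. Dividing the DTW cost by the number of reference frames is one reading of the note above, not a confirmed method, and the vocabulary contents and function names are invented for illustration.

```python
def dtw_distance(test, reference):
    """Unweighted DTW cost between two one-dimensional sequences."""
    inf = float("inf")
    prev_row = None
    for t in test:
        row = []
        for c, ref in enumerate(reference):
            d = abs(t - ref)
            if prev_row is None and c == 0:
                best = 0  # starting cell
            else:
                best = min(
                    prev_row[c] if prev_row is not None else inf,
                    row[c - 1] if c > 0 else inf,
                    prev_row[c - 1] if prev_row is not None and c > 0 else inf,
                )
            row.append(d + best)
        prev_row = row
    return prev_row[-1]


def recognise(test, references):
    """Return the closest reference label and all length-normalised scores."""
    scores = {
        label: dtw_distance(test, ref) / len(ref)  # normalise by reference length
        for label, ref in references.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical one-dimensional templates; labels and values are invented.
references = {"yes": [4, 7, 4], "no": [1, 1, 2, 1, 1, 2]}
label, scores = recognise([5, 3, 9, 7, 3], references)
print(label, scores)
```

Without normalisation, longer references tend to accumulate larger totals simply because their paths pass through more cells.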

15
Evaluation
  • Performance of designs only comparable by
    evaluation
  • Use a test set
  • For single word recognition we can simply quote
    accuracy

In error analysis, it can be helpful to use a
confusion matrix
16
Confusion Matrix
                 test tokens
references       yes     no
yes              24      2
no               3       21
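
Accuracy can be read straight off the confusion matrix as the diagonal total divided by the grand total. The sketch below uses the counts from the table; the dictionary layout (outer key for the reference word, inner key for the test token) is an assumed reading of the axes, and the result does not depend on it.

```python
# Counts copied from the table above. Outer key: reference word,
# inner key: test token (assumed reading of the axes).
confusion = {
    "yes": {"yes": 24, "no": 2},
    "no": {"yes": 3, "no": 21},
}


def accuracy(confusion):
    """Fraction of tokens on the diagonal of the confusion matrix."""
    correct = sum(confusion[word][word] for word in confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total


acc = accuracy(confusion)
print(acc)
```

The off-diagonal cells show which words are mistaken for which, which a single accuracy figure hides.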