HMM Toolkit HTK - PowerPoint PPT Presentation

1
HMM Toolkit (HTK)
  • Presentation by
  • Daniel Whiteley
  • AME department

2
What is HTK?
  • The Hidden Markov Model Toolkit (HTK) is a
    portable toolkit for building and manipulating
    hidden Markov models. HTK is primarily used for
    speech recognition research although it has been
    used for numerous other applications including
    research into speech synthesis, character
    recognition and DNA sequencing. HTK is in use at
    hundreds of sites worldwide.

3
What is HTK?
  • HTK consists of a set of library modules and
    tools available in C source form. The tools
    provide sophisticated facilities for speech
    analysis, HMM training, testing and results
    analysis. The software supports HMMs using both
    continuous density mixture Gaussians and discrete
    distributions and can be used to build complex
    HMM systems.

4
Basic HTK command format
  • HTK commands follow a basic command-line
    format:
  • HCommand [options] files
  • Options are indicated by a dash followed by the
    option letter. Options common to all tools use
    capital letters.
  • HTK does not rely on file extensions; it reads
    each file's header to determine its format.

5
Configuration files
  • You can also configure HTK modules using config
    files. A config file is loaded with the -C
    option, or applied globally by setting the
    HCONFIG environment variable, e.g. setenv HCONFIG
    myconfig, where myconfig is your own config
    file.
  • All configuration variables are listed in
    chapter 18 of the HTK manual. For most of our
    purposes, a config file with these two lines is
    enough:
  • SOURCEKIND = USER     # the user-defined file
    format (not sound)
  • TARGETKIND = ANON_D   # keep the source's base
    kind (the _D qualifier appends delta coefficients)

6
Using HTK
  • Parts of HMM modeling
  • Data Preparation
  • Model Training
  • Pattern Recognition
  • Model Analysis

7
Data Preparation
  • One small problem
  • HTK was tailored for speech recognition.
    Therefore, most of the data preparation tools are
    for audio.
  • Because of this, we need to adapt our data to the
    HTK parameterized data file format ourselves.
  • HTK parameter files consist of a sequence of
    samples preceded by a header. The samples are
    simply data vectors, whose components are 2-byte
    integers or 4-byte floating point numbers.
  • For us, these vectors will be a sequence of joint
    angles received from a motion capture session.

8
HTK file format
  • The file begins with a 12-byte header containing
    the following fields:
  • nSamples (4-byte int): number of samples
  • samplePeriod (4-byte int): sample period, in
    units of 100 ns
  • sampleSize (2-byte int): number of bytes per
    vector
  • parameterKind (2-byte int): code defining the
    type of data
  • For our purposes, the parameter kind is either
    USER, the user-defined kind (base code 9), or
    DISCRETE, the discrete case (base code 10).
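The header layout above can be sketched in a few lines of Python. This is a minimal sketch, not HTK's own code: the function names are hypothetical, the big-endian byte order assumes HTK's default file byte order, and the value 9 for the USER kind is an assumption taken from the base-kind table in the HTK Book.

```python
import struct

USER = 9  # assumed base parameter-kind code for USER (from the HTK Book)

def pack_htk(vectors, sample_period_100ns, parm_kind=USER):
    """Return the bytes of an HTK-style parameter file of 4-byte float vectors."""
    n_samples = len(vectors)
    dim = len(vectors[0])
    sample_size = 4 * dim  # bytes per vector
    # Header: 4-byte int, 4-byte int, 2-byte int, 2-byte int = 12 bytes,
    # written big-endian (assumed default byte order).
    header = struct.pack(">iihh", n_samples, sample_period_100ns,
                         sample_size, parm_kind)
    body = b"".join(struct.pack(">%df" % dim, *vec) for vec in vectors)
    return header + body

def unpack_header(data):
    """Read back (nSamples, samplePeriod, sampleSize, parameterKind)."""
    return struct.unpack(">iihh", data[:12])
```

For motion capture sampled at 100 Hz, the sample period would be 100000 (10 ms expressed in 100 ns units), and each vector would hold the joint angles of one frame.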

9
HMM model creation
  • In order to model the motion capture sequence, we
    need to create a prototype of the HMM. In this
    prototype, the values of B and π are arbitrary.
    The same is true for the transition matrix A,
    except that any transition probability you set
    to zero will remain zero.
  • Models are defined in a markup-style definition
    language that resembles HTML.
  • Models in HTK also have a beginning and an
    ending state, both of which are non-emitting.
    These states are not defined in the script.

10
HMM Model Example
  • ~h "prototype"
  • <BeginHMM>
  • <VectorSize> 4 <USER>
  • <NumStates> 5
  • <State> 2 <NumMixes> 3
  • <Mixture> 1 0.3
  • <Mean> 4
  • 0.0 0.0 0.0 0.0
  • <Variance> 4
  • 1.0 1.0 1.0 1.0
  • <Mixture> 2 0.4 ...
  • <State> 3 ...
  • ...
  • <TransP> 5
  • 0.0 0.4 0.3 0.3 0.0
  • 0.0 0.2 0.5 0.3 0.0
  • 0.0 0.2 0.2 0.4 0.2
  • 0.0 0.1 0.2 0.3 0.4
  • 0.0 0.0 0.0 0.0 0.0

~h "prototype": name of the file
<VectorSize> 4: sample size
<NumStates> 5: number of states
<NumMixes> 3: number of Gaussian distributions
<Mixture> 1 0.3: the distribution's ID and weight
<Mean>: mean observation vector
<Variance>: covariance matrix diagonal
<TransP>: transition matrix A
Last row of <TransP>: all the transition probabilities for the
ending state are always zero
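Writing a prototype by hand gets tedious for larger models, so it can help to generate it. The following is a rough sketch under stated assumptions: make_prototype is a hypothetical helper, it emits one Gaussian per state (so it omits <NumMixes>), and it fills in a simple left-to-right transition matrix rather than the one shown in the example.

```python
def make_prototype(name, vec_size, num_states, kind="USER"):
    """Build a single-Gaussian, left-to-right prototype definition as text."""
    lines = ['~h "%s"' % name,
             "<BeginHMM>",
             "<VectorSize> %d <%s>" % (vec_size, kind),
             "<NumStates> %d" % num_states]
    # Emitting states are 2 .. num_states-1; the first and last are non-emitting.
    for state in range(2, num_states):
        lines.append("<State> %d" % state)
        lines.append("<Mean> %d" % vec_size)
        lines.append(" ".join(["0.0"] * vec_size))
        lines.append("<Variance> %d" % vec_size)
        lines.append(" ".join(["1.0"] * vec_size))
    lines.append("<TransP> %d" % num_states)
    for i in range(num_states):
        row = [0.0] * num_states
        if i == 0:
            row[1] = 1.0          # entry state moves to the first emitting state
        elif i < num_states - 1:
            row[i] = 0.5          # stay in the same state
            row[i + 1] = 0.5      # advance to the next state
        # the row of the ending state stays all zero
        lines.append(" ".join("%.1f" % p for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"
```

Because zero transitions stay zero during training, the shape chosen here (left-to-right only) would be preserved; a more ergodic model needs non-zero entries wherever a transition should be allowed.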
11
Vector Quantization
  • To reduce computation, we can make the HMM
    discrete.
  • To use a discrete HMM, we must first quantize
    the data into a set of standard vectors.
  • Warning: quantizing the data inherently
    introduces error.
  • Before quantizing the data, we must first have a
    standard set of vectors, or a vector codebook.
    This is made with HQuant.

12
HQuant
  • HQuant takes the training data and uses a K-means
    algorithm to partition the data and find the
    centroids of these partitions, which become our
    quantization vectors (QVs).
  • A sample command:
  • HQuant -C config -n 1 64 -S train.scp vqcook
  • To reduce quantization time, a codebook that is
    searched with a binary tree algorithm can be made
    using the -t option.

-C config: use the configuration variables found in config
-n 1 64: number of QVs for a certain data stream
-S train.scp: you can use a script to list all of your training
files
vqcook: our codebook will be written to this file
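To make the idea of K-means centroids concrete, here is a minimal pure-Python sketch. It is not HQuant's implementation; the function names and the choice to pass in the initial centroids explicitly are assumptions made for illustration.

```python
def nearest(vec, centroids):
    """Index of the centroid closest to vec (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, c)) for c in centroids]
    return dists.index(min(dists))

def kmeans(data, initial_centroids, iterations=10):
    """Refine the centroids by alternating assignment and averaging."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for vec in data:
            clusters[nearest(vec, centroids)].append(vec)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids
```

The returned centroids play the role of the quantization vectors: each training vector belongs to the partition of its nearest centroid.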
13
Converting to Discrete
  • Data files are converted using the HCopy
    command. To quantize our data, we run:
  • HCopy -C quantize rawdata qvdata
  • Here rawdata is our original data, qvdata is our
    quantized data, and quantize is a config file
    containing these settings:
  • SOURCEKIND = USER       # we start with our
    original data
  • TARGETKIND = DISCRETE   # convert it into
    discrete data
  • SAVEASVQ = T            # we throw away the
    continuous data
  • VQTABLE = vqcook        # we use our previously
    made codebook to quantize the data
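What the conversion does to each vector can be sketched as a nearest-codeword lookup. This is only an illustration of the idea, not what HCopy runs internally; quantize is a hypothetical helper, and numbering the symbols from 1 is an assumption. The average distortion it returns is the error that quantization introduces.

```python
def quantize(sequence, codebook):
    """Map each vector to its nearest codeword; also report mean squared error."""
    symbols = []
    total_error = 0.0
    for vec in sequence:
        dists = [sum((a - b) ** 2 for a, b in zip(vec, c)) for c in codebook]
        best = dists.index(min(dists))
        symbols.append(best + 1)      # symbols numbered from 1 (assumption)
        total_error += dists[best]    # squared distance lost by quantizing
    return symbols, total_error / len(sequence)
```

A larger codebook lowers the average distortion but increases the number of discrete symbols the HMM has to model.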

14
Discrete HMM
  • Discrete HMMs are very similar to their
    continuous counterparts, save for a few changes.
  • Discrete probabilities are stored in logarithmic
    form, where
  • P(v) = exp(-d(v)/2371.8)
  • ~o <Discrete> <StreamInfo> 1 1
  • ~h "dhmm"
  • <BeginHMM>
  • <NumStates> 5
  • <State> 2 <NumMixes> 10
  • <DProb> 5461*10
  • ....
  • <EndHMM>

<NumMixes> 10: number of discrete symbols
5461*10: duplicate function (the value 5461 repeated 10 times)
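The stored value d(v) follows directly from the formula for P(v). A small sketch of the conversion in both directions is given here; the function names are hypothetical, and rounding to the nearest integer is an assumption about how the stored value is obtained.

```python
import math

SCALE = 2371.8  # scaling constant from P(v) = exp(-d(v)/2371.8)

def prob_to_dprob(p):
    """Convert a probability into its scaled negative log form d(v)."""
    return int(round(-SCALE * math.log(p)))

def dprob_to_prob(d):
    """Convert a stored d(v) back into a probability."""
    return math.exp(-d / SCALE)
```

With ten equally likely symbols, each probability is 0.1, and prob_to_dprob(0.1) gives 5461, which matches the value repeated ten times in the example.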
15
Model Training (token HMM)
  • The prototype can be initialized using HInit:
  • HInit [options] hmm data1 data2 data3 ...
  • HInit is used mainly for left-right HMMs. More
    ergodic HMMs can instead be initialized with a
    flat start, which sets all means and variances
    to their global counterparts using HCompV:
  • HCompV -m -S trainlist hmm

hmm: the HMM being trained
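The global mean and variance behind a flat start are easy to compute by hand. This sketch only illustrates the statistic; it is not HCompV's code, global_mean_var is a hypothetical name, and dividing by N (rather than N-1) is an assumption.

```python
def global_mean_var(sequences):
    """Per-dimension mean and variance over every frame of every sequence."""
    frames = [vec for seq in sequences for vec in seq]
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n
           for d in range(dim)]
    return mean, var
```

In a flat start, every state of the model would begin with this same mean and variance, and the later re-estimation passes pull the states apart.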
16
Retraining
  • The model is then re-estimated using the
    Baum-Welch algorithm implemented in HRest:
  • HRest -w 1.0 -v 0.0001 -S trainlist hmm
  • The -w and -v options set floors for the mixture
    weights and the variances, respectively. The
    value given to -w is a multiplier of 10^-5.
  • This can be repeated as many times as needed to
    achieve the desired results.
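The arithmetic behind the two floors can be sketched as follows. This is an illustration of the idea only, not HRest's internal logic; both function names are hypothetical.

```python
MIN_MIX = 1.0e-5  # base unit that the -w value multiplies

def mix_weight_floor(w):
    """Mixture weight floor implied by a given -w value."""
    return w * MIN_MIX

def floor_variances(variances, v):
    """Raise any variance below the floor v up to v."""
    return [max(x, v) for x in variances]
```

With the sample command, -w 1.0 gives a weight floor of 10^-5 and -v 0.0001 keeps every variance at or above 0.0001, which guards against a Gaussian collapsing onto a few training vectors.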

17
Dictionary Creation
  • In order to create a recognition program or
    script, we must first create a dictionary.
  • A dictionary in HTK gives each word and its
    pronunciation. For our purposes, each
    pronunciation is just the token HMM that we
    trained.
  • RUNNING run
  • WALKING walk
  • JUMPING [SKIPPING] jump

First column: the word
Bracketed column: displayed output (if not specified, the word
is displayed)
Last column: tokens used to form the word
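The three columns of a dictionary line can be pulled apart with a short parser. This is a sketch under the assumption that the optional displayed output is written in square brackets, as in the HTK dictionary format; parse_dict_line is a hypothetical helper.

```python
def parse_dict_line(line):
    """Split a dictionary line into (word, displayed output, tokens)."""
    parts = line.split()
    word = parts[0]
    if len(parts) > 1 and parts[1].startswith("[") and parts[1].endswith("]"):
        output = parts[1][1:-1]   # explicit displayed output
        tokens = parts[2:]
    else:
        output = word             # default: the word itself is displayed
        tokens = parts[1:]
    return word, output, tokens
```

For example, a line mapping the word JUMPING to the token jump with a bracketed SKIPPING would be recognized via the jump model but reported as SKIPPING.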
18
Label Files
  • Label files contain a transcription of what is
    going on in the data sequence.
  • 000000 100000 walk
  • 100001 200000 run
  • 200001 300000 jump

First column: start of the segment, in units of 100 ns
Second column: end of the segment, in units of 100 ns
Third column: token found in that segment
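When the segment boundaries are known as frame numbers, they have to be converted before they are written to a label file. The sketch below assumes HTK's convention that label times are expressed in 100 ns units, the same unit as the sample period in the file header; make_label_lines is a hypothetical helper.

```python
def make_label_lines(segments, sample_period):
    """Turn (first_frame, last_frame, token) segments into label lines.

    sample_period is the frame spacing in 100 ns units.
    """
    lines = []
    for first_frame, last_frame, token in segments:
        start = first_frame * sample_period
        end = (last_frame + 1) * sample_period  # end of the last frame
        lines.append("%d %d %s" % (start, end, token))
    return lines
```

For a capture at 100 Hz (sample period 100000), frames 0 to 9 labeled walk would span the times 0 to 1000000.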
19
Master Label Files (MLFs)
  • During training and recognition, we may have many
    test files and their accompanying label files.
    The label files can be condensed into one file
    called a master label file, or MLF.
  • #!MLF!#
  • "*/a.lab"
  • 000000 100000 walk
  • 100001 200000 run
  • 200001 300000 jump
  • .
  • "*/b.lab"
  • run
  • .
  • "*/jump*.lab"
  • jump
  • .

"*/a.lab": same as an original label file
"*/b.lab": if the entire file is one token, it can be
labeled with just the token
"*/jump*.lab": the wildcard operator can be used to label
multiple files at once
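The structure of an MLF (a header line, then a file pattern, its labels, and a closing period for each entry) can be read with a short parser. This is a simplified sketch, not HTK's reader: parse_mlf is a hypothetical helper, it assumes the header line is #!MLF!# and that patterns are quoted, and it does not expand wildcards.

```python
def parse_mlf(text):
    """Return {pattern: [(start, end, token), ...]} from MLF text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "#!MLF!#":
        raise ValueError("missing #!MLF!# header")
    entries = {}
    pattern = None
    for ln in lines[1:]:
        if ln == ".":                 # a lone period closes the entry
            pattern = None
        elif pattern is None:         # first line of an entry is the pattern
            pattern = ln.strip('"')
            entries[pattern] = []
        else:
            parts = ln.split()
            if len(parts) >= 3:       # start, end, token
                entries[pattern].append(
                    (int(parts[0]), int(parts[1]), parts[2]))
            else:                     # token only, no times
                entries[pattern].append((None, None, parts[0]))
    return entries
```

Entries that contain just a token are stored with None for both times, mirroring the shorthand for files that consist of a single token.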
20
Pattern Recognition
  • The recognition of a motion sequence is done
    using HVite.
  • To receive a transcription of the recognition
    data in MLF format, we use
  • HVite -a -i results -o SWT -H hmmlist \
    -I transcripts.mlf -S testfiles

-a: create word network from given transcriptions
-i results: output transcription file in MLF format
-o SWT: throws away unnecessary data in the label files
-H hmmlist: text file containing a list of HMMs used
-I transcripts.mlf: MLF file that has the test files' transcriptions
-S testfiles: motion capture data to be recognized
21
Model Analysis
  • The analysis of the recognition results is done
    by HResults.
  • HResults -I transcripts.mlf -H hmmlist results
  • Note: the reference labels and the result
    labels must have different file extensions.

-I transcripts.mlf: MLF containing the reference labels
-H hmmlist: list of HMMs used
results: MLF containing result labels
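The core of the analysis is a comparison between reference and recognized labels. The sketch below covers only the simplest case, assuming every file carries exactly one token so that no alignment is needed; HResults itself aligns whole label sequences, and percent_correct is a hypothetical helper rather than part of HTK.

```python
def percent_correct(reference, recognized):
    """Percentage of files whose recognized token matches the reference.

    Both arguments map a file name to a single token.
    """
    total = len(reference)
    hits = sum(1 for name, token in reference.items()
               if recognized.get(name) == token)
    return 100.0 * hits / total
```

A file missing from the recognized labels counts as an error, since recognized.get(name) returns None for it.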