Nonlinear Dynamical Invariants for Speech Recognition - Presentation Transcript
1
Nonlinear Dynamical Invariants for Speech Recognition
S. Prasad, S. Srinivasan, M. Pannuri, G. Lazarou and J. Picone
Department of Electrical and Computer Engineering, Mississippi State University
URL: http://www.ece.msstate.edu/research/isip/publications/conferences/interspeech/2006/dynamical_invariants/
2
  • Motivation
  • State-of-the-art speech recognition systems relying on linear acoustic models suffer from robustness problems.
  • Our goal: to study and use new features for speech recognition that do not rely on traditional measures of the first- and second-order moments of the signal.
  • Why nonlinear features?
  • Nonlinear dynamical invariants may be more robust (invariant) to noise.
  • Speech signals have both periodic-like and noise-like segments, similar to chaotic signals arising from nonlinear systems.
  • The motivation behind studying such invariants is to capture the relevant nonlinear dynamical information in the time series, something that is ignored in conventional spectral analysis.

3
  • Attractors for Dynamical Systems
  • System attractor: trajectories approach a limit with increasing time, irrespective of the initial conditions within a region.
  • Basin of attraction: the set of initial conditions converging to a particular attractor.
  • Attractors are non-chaotic (point, limit cycle, or torus) or chaotic (strange attractors).
  • Example: point and limit-cycle attractors of the logistic map (a discrete nonlinear map that becomes chaotic for some parameter values).
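
The logistic-map example above can be reproduced in a few lines. The parameter values below (r = 2.5 for a point attractor, r = 3.2 for a period-2 limit cycle) are illustrative choices, not taken from the slides:

```python
def logistic_orbit(r, x0=0.2, n_transient=1000, n_keep=6):
    """Iterate the logistic map x -> r*x*(1-x), discard the transient,
    and return the next n_keep points of the orbit."""
    x = x0
    for _ in range(n_transient):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(n_keep):
        x = r * x * (1.0 - x)
        orbit.append(round(x, 6))
    return orbit

# r = 2.5: the orbit collapses to the point attractor x* = 1 - 1/r = 0.6
print(logistic_orbit(2.5))
# r = 3.2: the orbit settles onto a period-2 limit cycle (two values repeat)
print(logistic_orbit(3.2))
```

The basin of attraction shows up here too: any x0 in (0, 1) converges to the same attractor for these parameter values.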

4
  • Strange Attractors
  • Strange attractors: attractors whose shapes are neither points nor limit cycles. They typically have a fractal structure (i.e., their dimensions are fractional rather than integer).
  • Example: a Lorenz system.

5
  • Characterizing Chaos
  • Exploit geometrical aspects (self-similar structure) of an attractor, or its temporal evolution, for system characterization.
  • Geometry of a Strange Attractor
  • Most strange attractors show a similar structure at various scales, i.e., parts are similar to the whole.
  • Fractal dimensions can be used to quantify this self-similarity, e.g., the Hausdorff and correlation dimensions.
  • Temporal Aspect of Chaos
  • Characteristic exponents, or Lyapunov exponents (LEs), capture the rate of divergence (or convergence) of nearby trajectories.
  • The correlation entropy captures similar information.
  • Any such characterization presupposes that the phase space is available.
  • What if only one scalar time-series measurement of the system (and not its actual phase space) is available?

6
  • Reconstructed Phase Space (RPS): Embedding
  • Embedding: a mapping from a one-dimensional signal to an m-dimensional signal.
  • Takens' theorem: a phase space equivalent to the original phase space can be reconstructed by embedding with m ≥ 2d + 1, where d is the system dimension.
  • The embedding dimension m ≥ 2d + 1 is a theoretically sufficient bound; in practice, embedding with a smaller dimension is adequate.
  • Equivalence means the system invariants characterizing the attractor are the same; it does not mean the reconstructed phase space (RPS) is exactly the same as the original phase space.
  • RPS construction techniques include differential embedding, integral embedding, time-delay embedding, and SVD embedding.

7
  • Reconstructed Phase Space (RPS): Time-Delay Embedding
  • Uses delayed copies of the original time series x(n) as components of the RPS to form a matrix whose rows are x_n = [x(n), x(n + τ), ..., x(n + (m - 1)τ)], where m is the embedding dimension and τ is the delay parameter.
  • Each row of the matrix is a point in the RPS.
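
The delay-matrix construction above can be sketched directly; the helper below is illustrative, not the authors' code:

```python
import numpy as np

def time_delay_embed(x, m, tau):
    """Form the RPS matrix: row i is [x[i], x[i+tau], ..., x[i+(m-1)*tau]]."""
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (m - 1) * tau
    if n_points <= 0:
        raise ValueError("time series too short for this (m, tau)")
    # Column k holds the series delayed by k*tau samples
    return np.column_stack([x[k * tau : k * tau + n_points] for k in range(m)])

# Embed a short series with m = 3 and tau = 2
X = time_delay_embed(np.arange(10.0), m=3, tau=2)
print(X.shape)   # (6, 3); the first row is [0., 2., 4.]
```

Each of the 6 rows is one point in the 3-dimensional reconstructed phase space.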

8
  • Reconstructed Phase Space (RPS)

Time-delay embedding of a Lorenz time series.
9
  • Lyapunov Exponents
  • Quantify the separation in time between trajectories, assuming the rate of growth (or decay) is exponential in time; the exponents are obtained from the long-time product of Jacobian matrices J evaluated at points p along the trajectory.
  • Capture sensitivity to initial conditions.
  • Analyze the separation in time of two trajectories with close initial points x_0 and x_0 + δ: |f^n(x_0 + δ) - f^n(x_0)| ≈ |δ| e^{nλ}, where f is the system's evolution function and λ is the Lyapunov exponent.
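
When the evolution function and its derivative are known analytically, the largest Lyapunov exponent is simply the orbit average of ln |f'(x)|. The toy sketch below uses the logistic map at r = 4 (an illustrative choice, not from the slides), where the exponent is known to equal ln 2; for measured speech data one would instead track the divergence of nearby trajectories in the RPS:

```python
import math

def lyapunov_logistic(r, x0=0.3, n=100_000, n_transient=1000):
    """Average log |f'(x)| = log |r*(1 - 2x)| along an orbit of the
    logistic map f(x) = r*x*(1-x): the largest Lyapunov exponent."""
    x = x0
    for _ in range(n_transient):
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n):
        fx = abs(r * (1.0 - 2.0 * x))
        total += math.log(max(fx, 1e-300))  # guard the measure-zero point x = 0.5
        x = r * x * (1.0 - x)
    return total / n

lam = lyapunov_logistic(4.0)
print(round(lam, 2))   # positive (chaotic), near ln 2 ~ 0.69
```

A positive exponent is the signature of chaos: nearby orbits separate exponentially fast.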

10
  • Correlation Integral
  • Measures the fraction of pairs of points within a neighborhood of radius ε, averaged over the entire attractor:
    C(ε) = lim_{N→∞} (2 / (N(N - 1))) Σ_{i<j} Θ(ε - ||x_i - x_j||),
    where the x_i are points on the attractor (which has N such points) and Θ is the Heaviside step function.
  • Theiler's correction: excludes pairs of points that are close in time, to prevent temporal correlations in the time series from producing an underestimated dimension.
  • The correlation integral is used in the computation of both the correlation dimension and the Kolmogorov entropy.
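
The correlation integral with a Theiler window can be sketched straight from its definition (a hypothetical helper, not the authors' implementation): count the pairs closer than ε, skipping pairs that are temporal neighbors:

```python
import numpy as np

def correlation_integral(points, eps, theiler=10):
    """Fraction of point pairs within distance eps, excluding pairs whose
    time indices differ by theiler or fewer samples (Theiler correction)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    count, n_pairs = 0, 0
    for i in range(n):
        for j in range(i + theiler + 1, n):   # skip temporal neighbors
            n_pairs += 1
            if np.linalg.norm(points[i] - points[j]) < eps:
                count += 1
    return count / n_pairs

# Toy "attractor": points sampled along a circle
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(t), np.sin(t)])
print(correlation_integral(circle, eps=0.5))   # grows as eps grows
```

On a circle, consecutive samples are close in both time and space, which is exactly the situation the Theiler window guards against.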

11
  • Fractal Dimension
  • Fractals: objects which are self-similar at various resolutions.
  • The correlation dimension is a popular choice for numerically estimating the fractal dimension of an attractor.
  • It captures the power-law relation between the correlation integral of the attractor and the neighborhood radius of the analysis hypersphere:
    D_2 = lim_{ε→0} log C(ε) / log ε,
    where C(ε) is the correlation integral.
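
The power law above suggests a simple estimator: compute C(ε) at a few radii inside the scaling region and fit the slope of log C(ε) against log ε. The sketch below is illustrative, not the authors' code; the small Theiler window and the radii are choices tuned to this toy example:

```python
import numpy as np

def corr_dim_estimate(points, eps_values, theiler=1):
    """Correlation dimension D2: slope of log C(eps) versus log eps."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # All pairwise distances, keeping only pairs separated in time
    # by more than the Theiler window
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    i, j = np.triu_indices(n, k=theiler + 1)
    dists = d[i, j]
    c = np.array([np.mean(dists < e) for e in eps_values])
    slope, _ = np.polyfit(np.log(eps_values), np.log(c), 1)
    return slope

# Points on a circle form a one-dimensional set, so D2 should be near 1
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
circle = np.column_stack([np.cos(t), np.sin(t)])
print(corr_dim_estimate(circle, eps_values=[0.1, 0.2, 0.4]))
```

Picking radii inside a clear scaling region matters: as the slides note later, fricatives show no clear scaling region, so this fit becomes unreliable there.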

12
  • Kolmogorov-Sinai Entropy
  • Entropy: a well-known measure used to quantify the amount of disorder in a system.
  • Numerically, the Kolmogorov entropy can be estimated as the second-order Rényi entropy (K_2) and can be related to the correlation integral of the reconstructed attractor:
    C_d(ε) ∝ ε^D e^{-d τ K_2},
    where D is the fractal dimension of the system's attractor, d is the embedding dimension, and τ is the time delay used for attractor reconstruction.
  • This leads to the relation K_2 ≈ (1/τ) ln [ C_d(ε) / C_{d+1}(ε) ].
  • In a practical situation, the values of ε and d are restricted by the resolution of the attractor and the length of the time series.

13
  • Kullback-Leibler Divergence for Invariants
  • Measures the discrimination information between two statistical models.
  • We measured invariants for each phoneme using a sliding window, and built an accumulated statistical model over each such utterance.
  • The discrimination information between a pair of models p and q is given by D(p || q) = ∫ p(x) ln [ p(x) / q(x) ] dx.
  • The symmetric form J = D(p || q) + D(q || p) provides a symmetric divergence measure between two populations from an information-theoretic perspective.
  • We use J as the metric for quantifying the amount of discrimination information across dynamical invariants extracted from different broad phonetic classes.
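
The slides do not specify the form of the accumulated statistical models; assuming, purely for illustration, a univariate Gaussian per invariant and class, the symmetric divergence J has a closed form:

```python
import math

def kl_gauss(mu1, s1, mu2, s2):
    """D(p || q) for univariate Gaussians p = N(mu1, s1^2), q = N(mu2, s2^2)."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2.0 * s2**2) - 0.5

def j_divergence(mu1, s1, mu2, s2):
    """Symmetric divergence J = D(p || q) + D(q || p)."""
    return kl_gauss(mu1, s1, mu2, s2) + kl_gauss(mu2, s2, mu1, s1)

# Identical models carry no discrimination information ...
print(j_divergence(0.0, 1.0, 0.0, 1.0))   # 0.0
# ... while well-separated means yield a large J
print(j_divergence(0.0, 1.0, 3.0, 1.0))   # 9.0
```

A large J between, say, the LE models of vowels and fricatives is exactly what the results slides report as "higher" discrimination information.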

14
  • Experimental Setup
  • Collected artificially elongated pronunciations of several vowels and consonants from 4 male and 3 female speakers.
  • Each speaker produced sustained sounds (4 seconds long) for three vowels (/aa/, /ae/, /eh/), two nasals (/m/, /n/) and three fricatives (/f/, /sh/, /z/).
  • The data was sampled at 22,050 Hz.
  • For this preliminary study, we wanted to avoid artifacts introduced by coarticulation.
  • Acoustic data was mapped to a reconstructed phase space using time-delay embedding with a delay of 10 samples. (This delay was selected as the first local minimum of the auto-mutual-information vs. delay curve, averaged across all phones.)
  • Window size: 1500 samples.
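
The first-local-minimum rule for picking the delay can be sketched with a histogram estimate of the auto-mutual information; the signal and bin count below are illustrative, not the paper's data:

```python
import numpy as np

def auto_mutual_info(x, tau, bins=16):
    """Histogram estimate of the mutual information between x(n) and x(n+tau)."""
    a, b = x[:-tau], x[tau:]
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal of x(n)
    py = p.sum(axis=0, keepdims=True)   # marginal of x(n+tau)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (px @ py)[mask])))

# A noisy sine: AMI is high at small delays and drops toward a
# first local minimum near a quarter of the period
rng = np.random.default_rng(0)
x = np.sin(0.05 * np.arange(5000)) + 0.05 * rng.standard_normal(5000)
ami = [auto_mutual_info(x, tau) for tau in range(1, 40)]
print(ami[0] > ami[-1])   # True: AMI decays as the delay grows
```

The delay at the first local minimum of this curve makes successive embedding coordinates maximally informative about each other without being redundant.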

15
  • Experimental Setup (Tuning Algorithmic Parameters)
  • Experiments were performed to optimize the parameters of each estimation algorithm, by varying the parameters and choosing the value at which the estimate converges.
  • Embedding dimension for LE and correlation dimension: 5
  • For the Lyapunov exponent:
  • number of nearest neighbors: 30
  • evolution step size: 5
  • number of sub-groups of neighbors: 15
  • For the Kolmogorov entropy:
  • embedding dimension: 15

16
  • Tuning Results: Lyapunov Exponents

(Plots of the Lyapunov exponent estimate for the vowel /ae/, the nasal /m/, and the fricative /sh/.)
  • In all three cases, the positive LE stabilizes at an embedding dimension of 5.
  • The positive LE is much higher for the fricative than for the nasals and vowels.

17
  • Tuning Results: Kolmogorov Entropy

(Plots of the Kolmogorov entropy estimate for the vowel /ae/, the nasal /m/, and the fricative /sh/.)
  • For vowels and nasals: stable behavior at embedding dimensions around 12-15.
  • For fricatives: the entropy estimate consistently increases with the embedding dimension.

18
  • Tuning Results: Correlation Dimension

(Plots of the correlation dimension estimate for the vowel /ae/, the nasal /m/, and the fricative /sh/.)
  • For vowels and nasals: a clear scaling region at ε ≈ 0.75; less sensitive to variations in the embedding dimension from 5-8.
  • For fricatives: no clear scaling region; more sensitive to variations in the embedding dimension.

19
  • Experimental Results: KL Divergence - LE
  • Discrimination information:
  • vowels-fricatives: higher
  • nasals-fricatives: higher
  • vowels-nasals: lower

20
  • Experimental Results: KL Divergence - Kolmogorov Entropy
  • Discrimination information:
  • vowels-fricatives: higher
  • nasals-fricatives: higher
  • vowels-nasals: lower

21
  • Experimental Results: KL Divergence - Correlation Dimension
  • Discrimination information:
  • vowels-fricatives: higher
  • nasals-fricatives: higher
  • vowels-nasals: lower

22
  • Summary and Future Work
  • Conclusions
  • Reconstructed the phase space from speech data using time-delay embedding.
  • Extracted three nonlinear dynamical invariants (LE, Kolmogorov entropy, and correlation dimension) from the embedded speech data.
  • Demonstrated the between-class separation of these invariants across 8 phonetic sounds.
  • The results are encouraging for speech recognition applications.
  • Future Work
  • Study speaker variability, with the hope that variations in the vocal tract response across speakers will result in different attractor structures.
  • Add these invariants as features for speech and speaker recognition.

23
  • Resources

24
  • References
  • Kumar, A. and Mullick, S.K., "Nonlinear Dynamical Analysis of Speech," Journal of the Acoustical Society of America, vol. 100, no. 1, pp. 615-629, July 1996.
  • Banbrook, M., "Nonlinear Analysis of Speech from a Synthesis Perspective," PhD Thesis, The University of Edinburgh, Edinburgh, UK, 1996.
  • Kokkinos, I. and Maragos, P., "Nonlinear Speech Analysis Using Models for Chaotic Systems," IEEE Transactions on Speech and Audio Processing, pp. 1098-1109, Nov. 2005.
  • Eckmann, J.P. and Ruelle, D., "Ergodic Theory of Chaos and Strange Attractors," Reviews of Modern Physics, vol. 57, pp. 617-656, July 1985.
  • Kantz, H. and Schreiber, T., Nonlinear Time Series Analysis, Cambridge University Press, UK, 2003.
  • Campbell, J.P., "Speaker Recognition: A Tutorial," Proceedings of the IEEE, vol. 85, no. 9, pp. 1437-1462, Sept. 1997.