Kenntnisbasierte ASR (Knowledge-based ASR)

1
(No Transcript)
2
The definition of sound segments in phonetics and speech technology
Intensive course
Centre of Information Society Technologies, Sofia University St. Kliment Ohridski
18-22 February 2002
Jacques Koreman (jkoreman@coli.uni-sb.de)
Bistra Andreeva (andreeva@coli.uni-sb.de)
Trajan Iliev (t_iliev@fmi.uni-sofia.bg)
Institute of Phonetics, University of the Saarland, P.O. Box 15 11 50, D-66041 Saarbrücken, Germany
3
Overview of the course
  • Monday: introduction and discussion of student projects
  • Tuesday: 9-11 ASR techniques: hidden Markov modelling (JK); 11-13 A formal description of Bulgarian sound segments (BA)
  • Wednesday: 9-11 ASR techniques: neural networks (JK); 11-13 The acoustic description of speech signals (BA)
  • Thursday: 9-11 The segmentation of speech sounds using NNs (TI); 11-13 Sound segments and their boundaries (BA)
  • Friday: discussion of student projects

4
(No Transcript)
5
The definition of sound segments in phonetics
and speech technology
  • Preamble

6
Goal of ASR systems
  • Automatic speech recognition (ASR) systems take
    the microphone signal as their input and
    recognise utterances as a sequence of words.
  • In order to achieve this, speech sounds must be
    recognised from the signal and matched to (a
    sequence of) words in the lexicon. By doing this,
    sound sequences which do not constitute a
    sequence of words are excluded from the search.

7
Goal of ASR systems
  • Since the microphone signal (reflecting the
    variations in air pressure which constitute the
    signal) is not a very informative representation,
    we first derive some sort of spectral
    representation from the signal. The spectrum does
    contain all the information we need to identify
    speech sounds.

8
Microphone signal spectrogram
(Figure: microphone signal and spectrogram of an utterance, segmented and labelled with phone symbols.)
9
Finding sounds in the spectrum
  • Two problems must be solved to find sounds in the speech signal:
  • segmentation: slicing the signal into sounds
  • identification: determining which sounds were spoken

10
Segmentation
  • Segmentation is difficult, because speech is
    produced in a single flow, i.e. there are no
    pauses between the words and the articulators are
    constantly moving from one position to the next.
  • In some sound types the movement is intrinsic to the sound: glides and diphthongs. Only when people speak slowly do we find so-called steady states in the signal. The interpretation of the movement depends on context, accentuation and speaking rate.

Have you ever tried to identify the word or phone boundaries in a language you do not know?!
11
Identification
  • Identification is also difficult, because no sound is produced the same way twice. This is due to differences between speakers (and even the same speaker will never produce a sound exactly the same way twice), context, accent and dialect, accentuation, situation (e.g. formal or informal), etc.
  • Example: pan - span - ban
  • Conclusion: a sound cannot always be identified on the basis of fixed cues!

12
The history of ASR systems
  • The first ASR systems were knowledge-based, i.e. they used phonetic knowledge about the realisation of speech sounds to identify them in the signal. The best results were achieved when broad sound classes were detected in a first step and a set of matching word candidates was selected, on the basis of which a fine search for distinguishing phonetic properties was then carried out.

13
The history of ASR systems
  • Although a few people still work with
    knowledge-based systems, most of the systems used
    nowadays are stochastic, i.e. they do not attempt
    to find specific phonetic characteristics of
    speech sounds in an all-or-none approach, but use
    probabilities of general spectral properties to
    compute a model for each speech sound.
  • For this reason, we shall only discuss stochastic
    modelling techniques in this course.

14
References knowledge-based ASR
  • Broad, D. and Shoup, J. (1975). Concepts for acoustic phonetic recognition. In D. Reddy, Speech Recognition. New York: Academic Press.
  • Zue, V. (1990). The use of speech knowledge in automatic speech recognition. In A. Waibel and K.-F. Lee, Readings in Speech Recognition, pp. 200-213. San Mateo: Morgan Kaufmann Publishers, Inc.

15
References knowledge-based ASR
  • Stevens, K. (2000). From acoustic cues to segments, features and words, Proc. Int. Conf. on Spoken Lang. Proc. (ICSLP2000), Beijing.
  • Reetz (1999). Converting speech signals to phonological features. Proc. of the XIVth Conf. of Phonetic Sciences (ICPhS99), San Francisco, 1733-1736.
  • Lahiri (2000). Underspecified recognition. Proc. of the Conf. on Laboratory Phonology (LabPhon2000).

16
(No Transcript)
17
The definition of sound segments in phonetics
and speech technology
  • ASR techniques: hidden Markov modelling

18
Hidden Markov modelling
Hidden Markov modelling is a stochastic
technique, which means that it models variation
(the variation in the signal) by using
probabilities. Usually, each sound (sometimes
word or word sequence) is represented by a hidden
Markov model (HMM). Whole utterances are then
modelled as a sequence of words, each of which
constitutes a sequence of sounds. The sequence of
words with the highest probability is
recognised.
19
Hidden Markov modelling
The a priori probability of the words (lexicon) and of word sequences (language model) plays an important role in computing the most likely sequence of HMMs to have generated an acoustic signal. I shall come back to this later. But let us first look at how the match between an acoustic signal and a sequence of HMMs is determined.
20
Markov modelling
  • Markov models consist of states which are connected by transitions.
  • When the automaton is in a specific state, it emits a symbol (e.g. an acoustic vector).
  • Each transition between two states has a probability associated with it.
  • Let's first look at a simple example, in which the states are represented by containers with coloured balls.

stochastic modelling
21
MMs: a simple example
  • We start in state S, which does not emit a
    symbol. From there, we go to state 1 with
    probability 1.
  • There we take a black ball from the container.

22
MMs: a simple example
  • Then we either continue to the 2nd state (p = 0.4) and take a red ball, or we go to state 1
    again (self-loop) and take another black ball
    from the container.
  • We continue until we get to state E and have
    collected a sequence of coloured balls.

23
MMs: a simple example
  • We can now compute the probability that the model displayed below generated the shown sequence of observations as:

(in fact, we should have written ×1 for each ball taken from the container)
1 × 0.6 × 0.4 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × 0.7 × 0.7 × 0.7 × 0.3
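A minimal sketch (not part of the original slides) of how this path probability can be computed: multiply the transition probabilities along the path; in a plain Markov model each emission probability is 1, so it leaves the product unchanged. The numbers are the ones from the product shown above.

```python
# Sketch: probability of one path through the (non-hidden) Markov model above.
# The transition probabilities are taken from the product on the slide;
# the emission probability of each ball is 1, so it does not change the result.
transition_probs = [1.0, 0.6, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.7, 0.7, 0.7, 0.3]

path_probability = 1.0
for p in transition_probs:
    path_probability *= p

print(path_probability)
```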
24
Hidden Markov modelling
  • Hidden Markov models (HMMs) differ from Markov models in that an emitted symbol cannot be attributed to one particular state.
  • In our example this would be the case, if all
    three (emitting) containers are filled with red,
    black and yellow balls.
  • The percentage of balls of the different colours
    can be different for the three containers, so
    that the colour emissions have different
    probabilities for each of the three states.

25
HMMs: a simple example
  • We start in state S, which does not emit a
    symbol. From there, we go to state 1 with
    probability 1.
  • There we take a ball from the container, which
    can now be red, black or yellow.

26
HMMs: a simple example
  • Then we go on to the 2nd state (p = 0.4) and take
    a ball from the container or we go to state 1
    again and take another ball from that container.
  • We continue until we get to state E and have
    collected a sequence of coloured balls.

27
HMMs: hidden states
  • When we only see a sequence of coloured balls, it is impossible to tell with certainty in which state (from which container) each ball was taken. The states are hidden; that is why we speak of hidden Markov modelling. Possible state sequences include:
    1 1 1 1 1 2 2 2 2 3 3 3
    1 1 1 2 2 2 2 2 3 3 3 3
    etc.

28
HMMs: speech recognition
  • The sequence of coloured balls corresponds to a sequence of acoustic frames (parameter vectors).
  • It is the task of the ASR system to identify the
    sequence of states which is most likely to have
    generated/emitted the frame sequence representing
    an utterance. This is dependent on the transition
    and emission probabilities of the states.

29
HMMs transitions
  • In ASR left-to-right models (as in the previous
    graphical representations) are used, because the
    acoustic events are ordered in time. Vowels, for
    instance, are often thought of as a sequence of
    onset transition, steady state and offset
    transition.
  • If a model is trained for pauses, transitions are
    allowed from each state to any other state
    (including self-loops), because the sequence of
    acoustic events is random (ergodic model).

30
HMMs: emissions
  • Emissions can be described using:
  • a vector codebook: a fixed number of vectors is used to represent the acoustic space. They are related to states by state-specific emission probabilities.
  • Gaussian mixtures: the variation in the acoustic realisation in each state is described by a mixture of normal distributions (see the sketch below).
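As a rough illustration of the second option, here is a minimal sketch (not part of the original slides) of a single-Gaussian emission density for one state; `mean` and `var` are hypothetical per-state parameters that would be estimated during training, and only the one-dimensional case is shown.

```python
import math

def gaussian_emission(frame, mean, var):
    """One-dimensional Gaussian emission density of an acoustic parameter
    under a state's normal distribution (illustration only; real systems
    use multivariate Gaussian mixtures over whole parameter vectors)."""
    return math.exp(-0.5 * (frame - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

# Hypothetical state parameters and one observed parameter value:
print(gaussian_emission(frame=1.2, mean=1.0, var=0.25))
```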

31
HMMs: complex models
  • More complex models are also used:
  • parallel states and multiple mixtures can capture the variation in the realisation of speech sounds (speaker, dialect, context, etc.) more effectively.
  • Generalised triphones describe a speech sound in different contexts. The contexts are grouped (e.g. according to place of articulation or on the basis of data-driven clustering techniques). The grouping reduces the requirements in terms of the size of the training corpus.

32
HMMs: speech recognition
  • There are several state sequences which can generate the same signal (frame sequence). The state sequence with the highest probability is found using the Viterbi algorithm (a minimal sketch follows after this slide).
  • This is done for all HMMs. The HMM leading to the
    highest probability is recognised.
  • Since we do not usually want to recognise a
    single speech sound, but a sequence of speech
    sounds, the optimisation is performed over
    sequences of HMMs.
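A minimal Viterbi sketch under simplifying assumptions (discrete emission symbols instead of acoustic vectors, probabilities invented for illustration); it returns the most probable state sequence and its probability for one model.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable state sequence for an observation sequence (discrete HMM)."""
    # best[t][s]: highest probability of any path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (best[t - 1][r] * trans_p[r][s] * emit_p[s][observations[t]], r)
                for r in states
            )
            best[t][s] = prob
            back[t][s] = prev
    prob, state = max((best[-1][s], s) for s in states)  # best final state
    path = [state]
    for t in range(len(observations) - 1, 0, -1):         # backtrack
        path.insert(0, back[t][path[0]])
    return path, prob

# Invented toy example: two states, symbolic "frames" a and b
states = ["s1", "s2"]
start_p = {"s1": 1.0, "s2": 0.0}
trans_p = {"s1": {"s1": 0.6, "s2": 0.4}, "s2": {"s1": 0.0, "s2": 1.0}}
emit_p = {"s1": {"a": 0.7, "b": 0.3}, "s2": {"a": 0.2, "b": 0.8}}
print(viterbi(["a", "a", "b", "b"], states, start_p, trans_p, emit_p))
```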

33
HMMs: lexicon and language model
  • HMMs are now used for continuous, spontaneous speech recognition. Besides acoustic (hidden Markov) models, we also need a lexicon and a language model.
  • In the lexicon, all the words (or morphemes)
    which the system must be able to recognise are
    listed with their pronunciation.
  • In the language model, all the possible
    combinations of lexical entries are described.

34
HMMs lexicon
  • The lexicon entries consist of an orthographic
    word and its realisation in terms of a sequence
    of HMMs for speech sounds.
  • In order to better cope with variation in the
    pronunciation of words, pronunciation variants
    are sometimes added to the lexicon which include
    reductions, epentheses and assimilations.

35
HMMs lexicon
  • Phonological processes can lead to a change in
    the identity of a speech sound
  • deletion
  • insertion
  • assimilation

36
Phonological variation
HMMs lexicon
  • deletion
  • A speech sound which is present in the so-called canonical form (lexicon form) is not realised.
  • ..., isn't it?
  • (G.) Fährst du mit dem Bus?

37
Phonological variation
HMMs lexicon
  • insertion
  • A speech sound which is not there in the
    canonical form (lexicon form) is inserted.
  • tense - tents
  • (G.) Gans - Ganz

38
Phonological variation
HMMs lexicon
39
HMMs lexicon
  • assimilation
  • The phonological identity of a speech sound changes under the influence of the (segmental or prosodic) context in which it occurs.
  • Example: input (cf. the spelling of immediate)
  • but not: sometimes, some guys

40
HMMs lexicon
  • Pronunciation variants in the lexicon reduce the
    distance between an acoustic realisation and the
    lexical entry.
  • At the same time, however, the distance between
    lexical entries becomes smaller, which can lead
    to the misrecognition of words. For this reason,
    we often only add the most frequent pronunciation
    variants, e.g. for function words, to improve
    recognition.

41
HMMs language model
  • The language model can be implemented as a:
  • rule system: these have the advantage that they can lead to a better understanding of the linguistic properties of utterances.
  • probabilistic system: n-gram probabilities are computed for word sequences. They generalise less and need a lot of training data. If the test condition matches the training well (text type, lexical domain, etc.), they describe the observed speech behaviour very well (see the sketch below).
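As an illustration of the probabilistic option, a minimal bigram sketch (not part of the original slides) with a tiny invented corpus; real language models are trained on large corpora and use smoothing for unseen word pairs.

```python
from collections import Counter

# Invented toy corpus; a real n-gram model needs far more training text.
corpus = ["recognise speech now", "recognise speech well", "wreck a nice beach"]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def bigram_prob(w1, w2):
    """P(w2 | w1) estimated by relative frequency (no smoothing)."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

print(bigram_prob("recognise", "speech"))  # 1.0 in this toy corpus
```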

42
HMMs language model
  • Orthographically distinguishable utterances can have identical acoustic realisations. The language model can choose one of two possible readings dependent on the probabilities of the word sequences.
  • Example:

Recognise speech
Wreck a nice beach
43
HMMs language model
  • Orthographically distinguishable utterances can have identical acoustic realisations. The language model can choose one of two possible readings dependent on the probabilities of the word sequences.
  • Example:

Get up at eight o'clock
Get a potato clock
44
HMMs applications
  • HMM systems are used in
  • information systems (travel information)
  • hands-free telephony
  • spoken input, e.g. in navigation systems
  • aids for the handicapped
  • dictation systems, e.g. NaturallySpeaking
    (Dragon), ViaVoice (IBM), FreeSpeech (Philips)

45
References
  • Van Alphen, P. and D. van Bergem (1989). Markov models and their application in speech recognition, Proceedings Institute of Phonetic Sciences, University of Amsterdam 13, 1-26.
  • Holmes, J. (1988). Speech Synthesis and Recognition (Chap. 8). Wokingham (Berks.): Van Nostrand Reinhold, 129-152.

46
References
  • Cox, S. (1988). Hidden Markov models for automatic speech recognition: theory and application, Br. Telecom Techn. Journal 6(2), 105-115.
  • Lee, K.-F. (1989). Hidden Markov modelling: past, present, future, Proc. Eurospeech 1989, vol. 1, 148-155.

47
(No Transcript)
48
The definition of sound segments in phonetics
and speech technology
  • ASR techniques: neural networks

49
Neural networks
  • Artificial neural networks are particularly
    suited for the classification of input signals,
    e.g. to recognise to what sound an acoustic frame
    belongs.
  • They are not suited to integrate information over
    time.

50
NNs biological basis
  • The building blocks of a neural net are based on
    the functionality of biological nerve cells, as
    they are found in the brain (about 10^10 neurons).
  • As a simplifying statement we could say that a
    nerve cell does no more (nor any less) than
    compute a weighted sum of its inputs and create
    an output dependent on this weighted sum.

51
NNs biological neuron
(Figure: a biological neuron with dendrites, cell body (soma), axon and synapses on the cell body; a membrane surrounds the neuron; activation propagates along the axon.)
52
NNs biological neuron
  • An axon carries the signal. It is long and can
    split itself several times. The ends of an axon
    are connected to dendrites or to the cell body of
    another nerve cell by means of synapses.
  • Once the threshold for the electric potential of
    a synapse is exceeded, the impulse propagates
    across the synapse.
  • The threshold of a synapse changes, if it is
    rarely/frequently activated.

53
NNs artificial neuron
  • Like a biological neuron, an artificial neuron
    has one or more inputs (cf. dendrites and axons
    directly connected to the cell body).
  • The function of a synapse is described by the
    activation function.
  • The output is generated on the basis of the
    activation.

54
NNs artificial neuron
input vector x = (x1, x2, ..., xn)
weight vector w = (w1, w2, ..., wn)
activation z = F(x, w)
output value y = f(z)
(Figure: diagram of an artificial neuron with inputs x1...xn, weights w1...wn, activation z and output y = f(z).)
55
NNs artificial neuron
Even a single neuron can distinguish two categories. A simple NN consisting of one single neuron can determine, for instance, whether water is contaminated or not: this would be the case when specific measurements exceed a threshold (see the sketch below).
y = output
z = activation, dependent on inputs x and weights w
f = function: linear, threshold, sine, etc.
(Figure: output y = f(z) as a function of the activation z.)
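A minimal sketch of such a single-neuron classifier (not part of the original slides, with invented weights and threshold): the activation z is a weighted sum of the inputs, and a threshold function f(z) assigns one of two categories.

```python
def neuron(x, w, threshold=0.0):
    """Single artificial neuron: weighted sum of the inputs (activation),
    followed by a threshold output function that separates two categories."""
    z = sum(xi * wi for xi, wi in zip(x, w))   # activation z = F(x, w)
    return 1 if z > threshold else 0           # output y = f(z)

# Hypothetical water-quality measurements and weights:
measurements = [0.8, 0.1, 0.4]
weights = [0.5, 1.0, 0.7]
print(neuron(measurements, weights, threshold=0.6))  # 1 = "contaminated"
```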
56
NNs 2 main types
A NN is built up from single neurons. Two main
types of neural networks are distinguished, which
we shall discuss hereafter
  • multi-layer perceptrons (MLPs)
  • Kohonen networks

57
NNs: MLP
By combining many neurons, the NN can learn very complex relationships. A standard MLP consists of the following layers (see the sketch below):
  • Input layer: the number of input units equals the number of signal parameters
  • Hidden layer (or layers): usually configured so that all units are connected to the units in the input layer and to each other
  • Output layer: the number of units equals the number of categories to be distinguished.
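A minimal forward-pass sketch of such an MLP (not part of the original slides, with random weights and a sigmoid activation); the layer sizes are invented, with the number of input units standing in for the number of signal parameters and the number of output units for the number of categories. Training (backpropagation) is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, layers):
    """Propagate an input vector through fully connected (weights, bias) layers."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

# Hypothetical sizes: 12 signal parameters, two hidden layers, 5 sound classes.
sizes = [12, 20, 20, 5]
layers = [(rng.normal(size=(m, n)), np.zeros(m)) for n, m in zip(sizes, sizes[1:])]

frame = rng.normal(size=12)        # one acoustic frame (invented values)
print(mlp_forward(frame, layers))  # one activation per output category
```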

58
MLPs: graphic display
(Figure: an MLP with an input layer, hidden layers 1 and 2, and an output layer.)
59
MLPs connections
The connections between the units are learnt in
the training. Learning rules determine how the
(initially random) weights are optimised
dependent on the distance to the required output
(supervised learning). Because of the connections
between the units, the computation with NNs is
also called connectionism. Since the
information is processed by many units in
parallel, the expression parallel distributed
processing (PDP) is also used.
60
MLPs: time
NNs are very suitable for categorising single input frames. Change over time is not handled so well. Several solutions to this problem have been suggested:
  • input frame plus several context frames
  • time-delayed NNs

61
NNs Kohonen networks
  • In MLPs the output computed by the NN for each input is compared with the required output (supervised). On the basis of the difference between computed and required output, the weights of the connections between the units are adapted (usually by backpropagation).
  • Kohonen networks, on the other hand, are unsupervised. For this reason they are also called self-organising.

62
NNs Kohonen networks
  • Kohonen networks only consist of an input and an
    output layer (competitive layer).
  • The units in the input layer are connected to all
    the units in the output layer.
  • All connections are weighted.

63
Kohonen networks training
  • At the start the weights of each unit are initialised with a vector of small random values (the vector size equals the number of input parameters).
  • In the training, the unit whose weights are closest (Euclidean distance) to the input vector wins.
  • The connection weights of the winning neuron are adapted in the direction of the input vector, without using information about the required output (unsupervised learning); a minimal sketch follows below.
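A minimal sketch of one such training step (not part of the original slides, with an invented map size and a Gaussian neighbourhood in place of the Mexican-hat function mentioned on the next slide): find the winning unit by Euclidean distance and move its weights, and those of nearby units on the map, towards the input vector.

```python
import numpy as np

rng = np.random.default_rng(0)

n_params = 12                                   # hypothetical number of input parameters
map_side = 10                                   # hypothetical 10 x 10 map
weights = rng.uniform(-0.1, 0.1, size=(map_side * map_side, n_params))
positions = np.array([(i, j) for i in range(map_side) for j in range(map_side)], float)

def train_step(x, weights, positions, lr=0.1, radius=2.0):
    """One unsupervised update: the winner and its map neighbours move towards x."""
    distances = np.linalg.norm(weights - x, axis=1)          # Euclidean distance to the input
    winner = int(np.argmin(distances))                       # winning unit
    map_dist = np.linalg.norm(positions - positions[winner], axis=1)
    neighbourhood = np.exp(-(map_dist ** 2) / (2 * radius ** 2))
    weights += lr * neighbourhood[:, None] * (x - weights)   # pull weights towards the input
    return winner

x = rng.normal(size=n_params)                   # one input vector (invented values)
print(train_step(x, weights, positions))
```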

64
Kohonen networks weights
  • The unit's weights are adapted so that they better predict the input vector. A similar, but smaller adaptation is made for units which are close to the winning neuron in the self-organising map, while units which are farther away are inhibited (Mexican hat function).
  • In this way, clusters build up in the network, which organise information in a topographical/phonotopic way.

65
Kohonen networks calibration
  • At the end of the training the Kohonen network is calibrated: to each input vector (e.g. of acoustic parameters) the required output (e.g. speech sound) is attached.
  • For each neuron, a list is created of the speech sounds by which it has been activated, together with the number of times each speech sound activated the neuron.
  • At the end of the calibration, a probability is computed for each neuron that it was activated by each speech sound.

66
Kohonen nets graphic display
67
Kohonen nets graphic display
Phonotopic map calibrated with speech sounds
(part)
68
Kohonen nets graphic display
Phonotopic map calibrated with speech sounds
(part)
69
Hybrid system
  • As we said at the beginning, neural nets are good
    at discriminating, but bad at modelling time.
  • One way of overcoming this problem is by using a
    hybrid system, in which the output of the neural
    net is used as input to hidden Markov modelling.

70
Hybrid systems
(Figure: two hybrid architectures; the spectral parameters are fed into a Kohonen network or an MLP, whose output (phone labels/probabilities) is then passed on to hidden Markov modelling.)
71
References
  • Lippmann, R.P. (1989). Review of neural networks for speech recognition, in A. Waibel and K.-F. Lee, Readings in Speech Recognition, 374-392. San Mateo: Morgan Kaufmann.
  • Ritter, H., T. Martinez and K. Schulten (1992). Neural Computation and Self-Organizing Maps, Chap. 2-4. Bonn: Addison-Wesley.
  • Kohonen, T. (1988). The neural phonetic typewriter, in A. Waibel and K.-F. Lee, Readings in Speech Recognition, 413-424. San Mateo: Morgan Kaufmann.

72
(No Transcript)
73
Einführung in die automatische Spracherkennung (Introduction to automatic speech recognition)
  • Dynamic Time Warping
  • Summer semester 2001
  • Jacques Koreman

74
Dynamic time warping (DTW)
  • Because finding invariant acoustic cues for sounds in the signal has proved very difficult, many systems use general pattern matching instead.
  • The first pattern-matching systems used dynamic time warping (DTW), also called dynamic programming.
  • The recognition unit was usually the word (i.e. no continuous speech recognition!).

75
The principle
  • For each word that is to be recognised, a reference pattern (template) is stored, with which all input signals are compared. The reference pattern most similar to the input signal is recognised.
  • Advantage: coarticulatory effects which cause variation in the realisation of sounds (a problem for knowledge-based speech recognition) are modelled as well.

76
The reference pattern
  • Since we know that the raw sound pressure wave that can be recorded with the microphone may differ strongly from realisation to realisation, a different representation is used for the reference pattern. It consists of parameters which describe the change in the distribution of energy over the frequency spectrum (cf. the spectrogram).

77
The reference pattern: sampling rate
  • In the past, the parameters were derived once every 10 or 20 ms because of storage requirements. Nowadays storage capacity is usually no longer a problem and a vector of parameters is computed every 5 ms (sometimes even once per millisecond).

78
The reference pattern: parameters
  • The parametrisation of the signal can consist of:
  • filter-bank parameters (linear: purely acoustic description)
  • filter-bank parameters (logarithmic: perceptual modelling)
  • LPC analysis parameters (modelling of the production system)

79
The reference pattern: example
Output of a 9-channel filter bank for one realisation of the word "three" and two of the word "eight". As expected, the two realisations of "eight" are more similar (Holmes, p. 104).
80
The comparison
Each reference pattern is compared with the input signal. There can be differences in loudness, intonation and temporal realisation. DTW is above all suited to modelling temporal differences.
  • The effect of differences in loudness (which in most cases are not relevant for the distinction between words) becomes smaller when the energy is represented logarithmically.

81
Der Vergleich
  • Die Tonhöhe ist für den Unterschied zwischen
    Wörtern normalerweise nicht relevant (bis auf
    mikroprosodische Effekte). Sie wird dadurch
    ge-glättet, daß man breite Frequenzbänder wählt
    und ein längeres Zeitfenster benutzt, wodurch die
    einzelnen Perioden gepooled werden.
  • Zeitliche Änderungen entstehen durch
    unter-schiedliche Sprechgeschwindigkeiten, wobei
    auch lokale Änderungen (innerhalb des Wortes)
    auftreten können.

82
The comparison
Differences in speaking rate can be compensated for by stretching the input pattern (if it was spoken fast) or compressing it (if it was spoken slowly) before the distance measure is computed. Possible matchings:
  • none
  • linear
  • non-linear

83
The comparison: visualisation
Acoustic patterns and comparison measure: (a), (b) acoustic patterns x and y, (c) no temporal alignment, (d) linear and (e) non-linear alignment (Huang et al., p. 72).
84
The comparison: the diagonal
If the test word is plotted on the x-axis and the reference pattern on the y-axis, the diagonal represents the perfect path, since it means that the two word realisations are exactly alike. Every deviation from the diagonal is penalised. The sum of these deviations (for the energy in the filter bands, often Euclidean distances between reference frame and input frame) yields a distance measure between the input word and the reference pattern (a minimal sketch follows below).
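A minimal DTW sketch (not part of the original slides, using per-frame Euclidean distances as described above); it accumulates the cheapest non-negative-slope alignment between a test and a reference frame sequence and returns the resulting distance measure. The frame data are invented.

```python
import numpy as np

def dtw_distance(test, ref):
    """Distance between two sequences of parameter vectors by dynamic time warping."""
    n, m = len(test), len(ref)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(test[i - 1] - ref[j - 1])  # Euclidean frame distance
            # only non-negative slopes: come from the left, from below, or diagonally
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Invented frames standing in for 9-channel filter-bank outputs:
rng = np.random.default_rng(0)
test_word = rng.normal(size=(30, 9))   # 30 frames of the test word
reference = rng.normal(size=(25, 9))   # 25 frames of the reference pattern
print(dtw_distance(test_word, reference))
```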
85
The comparison: visualisation
DTW path for a test and a reference word (Holmes, p. 116)
86
The comparison: constraints
  • Negative slopes are not allowed (the order of realisation is always left to right).
  • Side paths whose accumulated distance is too large are cut off (pruning).

87
Probleme
  • Die Wahl des Referenzmusters für ein Wort kann
    die Erkennung stark beeinflussen.
  • Die Annahme, daß Anfangs- und Endpunkt des Wortes
    korrekt gefunden werden, ist nicht immer gegeben.
  • DTW eignet sich nicht sehr gut für die Erkennung
    kontinuierlich gesprochener Sprache, da jedes
    Frame der Anfang eines neuen Wortes darstellen
    kann, so daß die Zahl der Vergleiche
    explosionsartig zunimmt (z.B. six teenagers
    versus sixteen ages)..

88
References
  • Holmes, J. (1988). Speech Synthesis and Recognition (Chap. 7). Wokingham (Berks.): Van Nostrand Reinhold.
  • Holmes, J. (1991). Spracherkennung und Sprachsynthese (Chap. 7). München: Oldenburg.
  • Huang, X.-D., Ariki, Y. and Jack, M. (1990). Hidden Markov Models for Speech Recognition, pp. 70-78. Edinburgh: Edinburgh University Press.

89
(No Transcript)