Title: Computer-implemented neural model of speech production and speech acquisition
1. Computer-implemented neural model of speech production and speech acquisition
- Bernd J. Kröger
- Department of Phoniatrics, Pedaudiology, and Communication Disorders, Medical Faculty of the Aachen University, RWTH Aachen, Germany
For more literature see http://www.speechtrainer.eu
2. Aachen University, Germany
Bernd J. Kröger (bkroeger_at_ukaachen.de); download of references: www.speechtrainer.eu
3. Aachen University (Hospital): 3-Tesla MR scanner (Philips)
4. Overview
- Introduction
- The 3D model
- The neural control component
- Modeling speech acquisition
- Speech perception experiments
- Results and further work
6. Speech Production
[Diagram: speech production separated into two parts]
- the controller (technical term) = the control module: the central nervous system
- the controlled system / plant = the articulatory and acoustic module: the articulators and cavities
7. Note (Birkholz model, 2006)
- We have a lot of knowledge concerning the plant:
  - articulatory geometries
  - speech acoustics
- We have much less knowledge concerning the neural control of speech articulation.
9. Neural computer-implemented models of production and acquisition, including articulation and acoustics
- Neural models of speech production
  - e.g. Levelt, Dell → neurolinguistic models
  - e.g. Guenther (2006) → neural model of the sensorimotor processes of speech production (articulation)
  - newer models: see this session (including my talk)
- Neural models of speech acquisition
  - focusing on pre-linguistic phases (babbling), i.e. the time interval before the start of the vocabulary spurt (Guenther 1995, Guenther et al. 2006, Bailly 1997)
- This talk
  - I will present results of my work on neural modeling of speech production and acquisition, which is related to the Guenther approach.
10. Overview
- Introduction
- The 3D vocal tract model as front-end part of a neural model of speech production
- The neural control component
- Modeling speech acquisition
- Speech perception experiments
- Results and further work
11. 3D Articulatory Model (Birkholz et al. 2006)
- 11 wireframe meshes representing the
  - upper cover (palate, velum, pharynx wall, …) (4 meshes)
  - lower cover (mandible, pharynx, …) (3 meshes)
  - upper and lower teeth, lips, tongue (4 meshes)
- belongs to the group of geometrical models, in comparison to statistical or biomechanical vocal tract models
12. 3D Articulatory Model (Birkholz et al. 2006)
- complete model:
  - upper and lower cover (light gray)
  - tongue and lips (dark gray)
  - upper and lower row of teeth (black)
- The model is based on 3D static and 2D dynamic MRI data of one speaker of Standard German (JD, ZAS, Berlin).
See http://www.vocaltractlab.de
13. High-Quality Acoustic Model (Birkholz et al. 2006)
- comprises modeling of the
  - subglottal tract (13 tube sections)
  - glottis (2 tube sections)
  - pharyngeal and oral tract (40 tube sections with individual lengths, not necessarily equidistant cylinder tubes)
  - nasal tract (19 tube sections) and 4 sections for the paranasal sinuses
- implemented as an equivalent electrical circuit
14. Overview
- Introduction
- The 3D model
- The neural control component
- Modeling speech acquisition
- Results
- Further work
15. The Neural Control Component: Some Basics
- Ensembles / groups of neurons representing motor, sensory, or other states constitute the central levels of the neural control component → neural maps
16. Motor states and sensory states are each based on a set of parameters
- somatosensory information: proprioceptive, tactile
- auditory information
17. Articulatory or lower-level motor parameters
10 parameters (defined a priori in a geometrical articulatory model)
- control positions of articulators relative to the positions of other articulators
- joint coordinates (vs. spatial coordinates)
18. JAA = lower jaw angle
JAA influences the position of
- the lower jaw
- the tongue body
- the tongue tip
- the lower lip
20. Proprioceptive parameters
Flesh-point locations: absolute positions of end articulators with respect to the hard palate (or cranial system / skull) → tract variables
→ spatial coordinates (vs. joint coordinates)
These lie on the border between sensory and higher-level motor representations.
21. Tactile parameters
[Figure: contact pattern for a velar closure [g].]
Contact area of movable articulators (tongue body, tongue tip, lips) with the vocal tract walls (regions: lower pharyngeal, upper pharyngeal, velar, palatal, postalveolar, alveolar).
23. Auditory parameters
[Figure: the 3D model yields an area function and a perimeter function, from which the transfer function and the formants F1, F2, F3 are computed.]
Auditory parameters: Bark values of F1, F2, F3.
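The deck does not state which Hz-to-Bark formula the model uses; Traunmüller's (1990) approximation is one common choice and can serve as a sketch:

```python
def hz_to_bark(f_hz: float) -> float:
    """Convert a frequency in Hz to the Bark scale using Traunmüller's
    approximation (an assumption; the deck does not name a formula)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# Rough formant values for an [a]-like vowel (illustrative only)
for name, f in [("F1", 700.0), ("F2", 1200.0), ("F3", 2600.0)]:
    print(name, round(hz_to_bark(f), 2))
```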
24. Parameter values are directly coded by neural activations in neural maps
1) Sub-groups of neurons are defined for each parameter.
2) All sub-groups of neurons for all parameters define the whole neural map.
A specific activation of the 4 neurons of a sub-group occurs for each parameter value.
[Example: motor parameter map with the sub-groups LIH, LIP, TTA, TTL, TBA, TBL, HLH, HLV, JAA, VEH.]
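The slides only say that 4 neurons per parameter carry the value; the exact coding scheme is not given. One plausible sketch is graded (Gaussian) tuning over the parameter range, with the widths and parameter values below chosen purely for illustration:

```python
import numpy as np

def code_parameter(value: float, n_neurons: int = 4, width: float = 0.2) -> np.ndarray:
    """Code one normalized parameter value (0..1) as a graded activation
    pattern over its sub-group of neurons (Gaussian tuning curves are an
    assumption; the slides only specify 4 neurons per parameter)."""
    preferred = np.linspace(0.0, 1.0, n_neurons)  # each neuron's preferred value
    return np.exp(-((value - preferred) ** 2) / (2.0 * width ** 2))

# A whole motor map is the concatenation of all parameter sub-groups
params = {"JAA": 0.3, "LIH": 0.8}                 # hypothetical normalized values
motor_map = np.concatenate([code_parameter(v) for v in params.values()])
print(motor_map.round(2))
```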
25. The Neural Control Component: Some Basics
- Ensembles / groups of neurons representing motor, sensory, or other states constitute the central levels of the neural control component → neural maps
- Neural networks connect these neural maps → neural mappings
The organization of the whole control component:
26. The whole neural control component
[Diagram after Guenther et al. (2006): the cortical level (cerebral cortex, cortico-cortical mappings) above the subcortical and peripheral level.]
27. The neural control component
[Diagram: on the cortical level, a sound map / syllable map / word map is linked via a sound-to-motor mapping to the motor map (joint coordinates), and via sound-to-sensory mappings to the somatosensory map (proprioceptive and tactile maps) and the auditory map. On the subcortical and peripheral level, the motor state passes through neuro-muscular processing to the articulatory state, the articulatory signal, and the acoustic signal; somatosensory and auditory processing return the current state (efference copy, error signal), with processing delays of roughly 0 to 30 ms marked along the path → feedback subsystem.]
28. The neural control component
[Same diagram as on slide 27.] → Later on during speech acquisition, the feed-forward control subsystem becomes more and more active.
29. Again: differentiation of maps vs. mappings (a sample neural network)
- map: ensemble of neurons representing different (sensory or motor) states (neural representation)
- mapping: association of appropriate or related states in different maps (has to be trained / learned → adjustment of link weights)
[Figure: map 1 (e.g. sensory) → mapping from map 1 to map 2 → map 2 (e.g. motor)]
Two types of neural networks were used:
30. One-layer feed-forward networks (unidirectional)
- input neuron layer (map): sensory parameters (proprioceptive parameters TTx/TTy, HYx/HYy, TBx/TBy, JAy, LId, VEx, ULx)
- links, each neuron with every other neuron (mapping)
- output neuron layer (map): motor parameters (joint coordinates LIH, LIP, TTA, TTL, TBA, TBL, HLH, HLV, JAA, VEH)
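A one-layer feed-forward mapping of this kind can be sketched as a single weight matrix trained with the delta rule; the dimensions below are toy stand-ins (the actual net described later maps 2518 somatosensory neurons to 40 motor neurons), and the linear "plant" is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out, n_patterns = 20, 10, 200            # toy sizes, not the model's

# Synthetic training set: sensory input patterns and the motor
# patterns they should be associated with (a linear toy relation)
W_true = rng.normal(size=(n_in, n_out))
X = rng.uniform(size=(n_patterns, n_in))         # sensory states
Y = X @ W_true                                   # associated motor states

# One-layer feed-forward net: one link from each input neuron to each
# output neuron, trained with batch gradient descent (delta rule)
W = np.zeros((n_in, n_out))
for _ in range(5000):                            # batch training cycles
    error = X @ W - Y
    W -= 0.1 * X.T @ error / n_patterns

print(float(np.abs(X @ W - Y).mean()))           # residual training error
```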
31. Self-organizing maps (SOMs)
- SOM neurons form the central part of the self-organizing map (the central layer of neurons, the SOM layer: a 2D map representing an ensemble of neurons of the cerebral cortex)
- input neurons (in terms of Kohonen) → map 1 and map 2 (in our terms): neurons representing e.g. sensory and motor states
- links and link weights → part of the mapping
[Figure: map 1 and map 2 feed into the SOM layer.]
32. Self-organizing maps (SOMs)
- SOM neurons form the self-organizing map (the central layer of neurons, the SOM layer: a cortical 2D map) → part of the mapping in our approach
- input neurons (in terms of Kohonen) → map 1 and map 2 (in our terms): neurons representing e.g. sensory and motor states
- link weights → part of the mapping
Both maps (input neurons in terms of Kohonen) can be interpreted as input or output maps in our approach.
[Figure: map 1 and map 2 feed into the SOM layer.]
Note: this type of network is not unidirectional but bi-/multi-directional.
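A minimal sketch of such a bidirectional SOM, reusing the training settings quoted later in the deck (random initial weights in [0.4, 0.6], rectangular neighborhood, decaying radius and learning rate) on toy data; the data and dimensions here are stand-ins, not the model's real training set:

```python
import numpy as np

rng = np.random.default_rng(1)

side = 10                                 # 10x10 SOM layer
dim_sensory, dim_motor = 2, 9             # e.g. (F1, F2) and 9 joint coordinates
dim = dim_sensory + dim_motor

# Link weights from every SOM neuron to both maps (sensory + motor),
# initialized randomly within [0.4, 0.6] as stated on the slides
W = rng.uniform(0.4, 0.6, size=(side, side, dim))
coords = np.dstack(np.meshgrid(np.arange(side), np.arange(side), indexing="ij"))

def train_step(x, lr, radius):
    """One Kohonen step: find the best-matching unit (winner takes all)
    for the joint sensory+motor pattern x, then pull the winner and its
    rectangular neighborhood towards x."""
    dist = np.linalg.norm(W - x, axis=2)
    bmu = np.unravel_index(np.argmin(dist), dist.shape)
    nb = np.max(np.abs(coords - np.array(bmu)), axis=2) <= radius
    W[nb] += lr * (x - W[nb])

# Toy stand-in for the 540 proto-vocalic training states
training_set = rng.uniform(size=(540, dim))
lr, radius = 0.1, 5.0
for epoch in range(5):
    for x in training_set:
        train_step(x, lr, radius)
        radius *= 0.999                   # radius decay (slide value)
    lr *= 0.99                            # learning-rate decay (slide value)

def recall_motor(sensory):
    """Bidirectional use: match only on the sensory dimensions and read
    the motor part of the winner's weight vector."""
    d = np.linalg.norm(W[:, :, :dim_sensory] - sensory, axis=2)
    bmu = np.unravel_index(np.argmin(d), d.shape)
    return W[bmu][dim_sensory:]

print(recall_motor(np.array([0.5, 0.5])).shape)
```

Because the weight vector of each SOM neuron spans both maps, the same trained network can be read out in either direction (sensory → motor or motor → sensory), which is the bi-/multi-directionality noted on the slide.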
33. The Neural Control Component: Some Basics
- Ensembles of neurons representing motor, sensory, or other states constitute the central levels of the neural control component → neural maps
- Neural networks connect these neural maps → neural mappings
- Neural networks must be trained, i.e. they have to learn something during speech acquisition (and later on).
34. Example for learning / training: How does it work, e.g. during babbling? (Babbling = exploring your vocal tract)
Sensory information (somatosensory: proprioceptive, tactile; auditory) can be generated by the 3D model for each articulatory or motor state.
35-39. Learning during babbling
Produce a lot of random motor states. For each articulatory or motor state, the 3D model generates the related sensory information (somatosensory: proprioceptive, tactile; auditory), which is associated with the motor state.
40. Learning during babbling
Sensory information (somatosensory: proprioceptive, tactile; auditory representations) can be generated by the 3D model for each articulatory or motor state → this forms a training set (e.g. 4,000 states) of related motor and sensory states, from which one can be predicted from the other.
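The babbling procedure on these slides (random motor states, sensory consequences generated by the model, pairs collected as a training set) can be outlined as follows; `vocal_tract_model` below is a toy placeholder, not the real 3D articulatory/acoustic model:

```python
import numpy as np

rng = np.random.default_rng(2)

def vocal_tract_model(motor_state):
    """Placeholder for the 3D model: maps a motor state to
    (somatosensory, auditory) information. Here just a fixed nonlinear
    toy mapping; the real model computes geometry and acoustics."""
    A = np.sin(np.outer(np.arange(1, 7), np.arange(1, len(motor_state) + 1)))
    sensory = np.tanh(A @ motor_state)
    return sensory[:4], sensory[4:]      # (somatosensory, auditory) parts

# Babbling: produce a lot of random motor states and let the model
# generate the related sensory information -> motor/sensory training set
training_set = []
for _ in range(4000):                    # e.g. 4,000 states, as on the slide
    motor = rng.uniform(size=10)         # the 10 motor parameters
    somato, auditory = vocal_tract_model(motor)
    training_set.append((motor, somato, auditory))

print(len(training_set))
```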
41. Overview
- Introduction
- The 3D model
- The neural control component
- Modeling speech acquisition
- Speech perception experiments
- Results and further work
42. Modeling speech acquisition: a simple approach
- Babbling (not language-specific)
  - Mouthing
  - Proto-vocalic articulation
  - Proto-gestures
- Imitation
  - Vowels
All these phases overlap more or less in time.
43. Modeling speech acquisition: first phase of babbling → Mouthing (silent mouthing)
44. The neural control component
[Diagram as before.] A silent mouthing training set was designed for training the somatosensory-to-motor mapping.
45. Training set: silent mouthing
- combination of min, (mid,) and max values of all 10 motor parameters (Kröger et al. 2006c, DAGA Braunschweig)
- double closures and non-physiological articulations are avoided (separate subsets for lips and tongue)
→ 4608 patterns of training data
46. Training
- Design of the net: one-layer feed-forward; 2518 input neurons (somatosensory), 40 output neurons (motor) → ca. 2000 links
- Set of 4608 patterns of training data → min-max combination training set, silent mouthing
- 5,000 cycles of batch training
- → mean error of ca. 10% for the prediction of a motor state from its somatosensory state (Kröger et al. 2006b, ISSP, Ubatuba, Brazil)
- Software: Java version of SNNS (Stuttgart Neural Network Simulator), http://www-ra.informatik.uni-tuebingen.de/SNNS/
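The min-max combination idea behind the silent-mouthing set can be sketched with `itertools.product`; the closure filter below is a deliberately simplified stand-in for the deck's actual constraints (which, together with the lip/tongue subsets, yield the 4608 patterns), so the resulting count differs:

```python
from itertools import product

PARAMS = ["LIH", "LIP", "TTA", "TTL", "TBA", "TBL", "HLH", "HLV", "JAA", "VEH"]
LEVELS = (0.0, 0.5, 1.0)                 # min, mid, max (normalized)

def has_double_closure(state):
    """Simplified stand-in for the deck's constraints: treat a closure
    parameter at its maximum as a full closure and discard states that
    close the tract at more than one place at once."""
    return sum(state[p] == 1.0 for p in ("LIP", "TTA", "TBA")) > 1

states = []
for combo in product(LEVELS, repeat=len(PARAMS)):
    state = dict(zip(PARAMS, combo))
    if not has_double_closure(state):
        states.append(state)

print(len(states))                       # all min/mid/max combos minus filtered ones
```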
47. Training results: some features of motor equivalence (despite the prediction error of ca. 10%)
[Figure: labial, apical, and dorsal closures, each produced with a low and a high position of the lower jaw.]
In each column the somatosensory values are the same (except for the jaw parameter) → acoustically relevant closures are kept despite strong jaw perturbation.
49. Modeling speech acquisition: next phase of babbling → Proto-vocalic articulation
50. The neural control component
[Diagram as before.] A proto-vocalic training set was designed for training the auditory-to-motor mapping.
51. Proto-vocalic training set (→ 540 states)
- start with a front-high [i], a back-high [u], and a low [a] gesture
- use articulatory constraints: only two degrees of freedom remain, giving the 2D plane
- → the training set covers the whole (continuous, language-independent) vowel space
52. Proto-vocalic training set
Continuous articulatory space for proto-vocalic articulation; 540 states in three subsets:
- from [u] to [a] and [i]
- from [a] to [i] and [u]
- from [i] to [u] and [a]
53. Using SOMs for the auditory-to-motor mapping
- 10×10 = 100 neurons form the self-organizing map layer
- topology: rectangular ordering (not hexagonal)
- 40 input neurons → 4000 link weights
  - 36 neurons representing the 9 joint-coordinate parameters (motor representation without velum)
  - 4 neurons representing F1 and F2 (auditory representation)
- 200 cycles × 524 training patterns = 104,800 training steps
- → mean error of 0.6% for the prediction of an articulatory state (Kröger et al. 2006b)
- → a very precise mapping for predicting proto-vocalic motor states from auditory states (F1, F2)!
- standard SOM learning algorithm (Kohonen 1995 and 2001)
- initialization: random distribution of link weights within the interval [0.4, 0.6]
- SOM update radius: 5 neurons, rectangular neighborhood function; radius decay: 0.999
- learning-rate factor: 0.1; learning-rate decay factor: 0.99
- mode: winner takes all
54. Training results for the auditory-to-motor mapping: node plot of SOM link weights
[Plot: display of the auditory link-weight values (F1, F2, normalized 0.0-1.0) for each neuron; the SOM neurons form a cortical map with [i], [a], and [u] at the extremes.]
- The whole range of training data is covered.
- The topology is preserved (not many doublings/folds).
55. Training results for the auditory-to-motor mapping: bar plot of the same SOM link weights
[Plot: simply a different display of the auditory link-weight values, one bar chart per neuron of the cortical map; the [i], [a], and [u] regions are visible.]
56. Training results for the auditory-to-motor mapping: bar plot of SOM weights
1) continuous F1-F2 transitions for the [i]-[a] and [u]-[a] paths
2) ordering of proto-vocalic states with respect to phonetic criteria (high-low, front-back): phonetotopy (cp. the fMRI study of Obleser et al. 2007)
57. The neural control component
[Diagram as before.] The mappings trained so far handle static articulation.
58. Modeling speech acquisition
- Babbling
  - Mouthing (static)
  - Proto-vocalic articulation (static)
  - Proto-gestures (dynamic)
- Imitation
  - Vowels
59. The auditory-to-motor mapping for vocal-tract-closing gestures
Each training pattern is a whole gesture! Starting from different proto-vocalic positions, the complete closing gesture (motor state, sampled at t0 = 0 msec … t4 = 20 msec) is associated with the whole formant pattern (auditory state; frequency axis in Bark, 0-20).
60. The auditory-to-motor mapping for closing gestures: 10×10 SOM
- 10×10 = 100 neurons form the SOM neuron layer
- 59 input neurons → 5800 link weights
  - 56 neurons representing the formant transition (F1, F2, F3, their time deviation, and length)
  - 3 neurons representing the closure-related articulator (labial / apical / dorsal) and 4 neurons representing the starting vowel (TBx, TBy)
- set of 22 × 9 = 198 training patterns
  - closing-gestures training set based on 22 proto-vocalic articulatory states
  - production of 9 closures: 3× labial, 3× apical, 3× dorsal
- 500 cycles × 198 training patterns = 99,000 training steps
- → mean error of 0.8% for the prediction of the motor state from its formant pattern
- standard SOM learning algorithm (Kohonen 1995 and 2001)
- initialization: random distribution of link weights within the interval [0.4, 0.6]
- SOM update radius: 3 neurons, rectangular neighborhood function; radius decay: 0.999
- learning-rate factor: 0.1; learning-rate decay factor: 0.99
- mode: winner takes all
61. Bar plot of the SOM for closing / proto-VC gestures
[10×10 SOM; columns 1-3 show the SOM link weights for the motor parameter "closure-related articulator".]
62. The SOM is capable of learning different types of formant transitions for different starting vowels
[10×10 SOM; columns 1-3: SOM link weights for the motor parameter "closure-related articulator"; formant transitions: auditory SOM link weights.]
- clear separation of the motor states with respect to the closure-related articulator
63. SOM training 2 (instance 2)
Columns 1-3: closure-related articulator. Columns 4-5: SOM link weights for the motor parameter "initial vocalic state" (TBx, TBy): a: TBy → 0; i: TBx, TBy → 1; u: TBx → 0, TBy → 1.
65. SOM training 2 (brain 2)
Columns 1-3: constriction-forming articulator. Columns 4-5: SOM link weights for the motor parameter "vowel" (TBx, TBy): a: TBy → 0; i: TBx, TBy → 1; u: TBx → 0, TBy → 1.
[Map regions are labeled ba, bi, bu, da, di, du, ga, gi, gu.]
→ phonetotopic ordering with respect to the initial proto-vocalic state
66. Modeling speech acquisition
Babbling (mouthing, proto-vocalic articulation, proto-gestures) was non-language-specific learning of proto-vowels and proto-consonantal gestures. The feedback loop is now trained to a certain degree, which allows imitation → language-specific training (imitation: vowels).
67. The neural control component
[Diagram as before; the feedback loop is highlighted.] Imitation: language-specific V and VC training.
68. Training set: mother's vowels
Hypothetical language with 7 vocalic phonemes /i/, /e/, /ɛ/, /a/, /ɔ/, /o/, /u/; 100 items per vowel phoneme.
[F1-F2 plot of the phoneme clouds.]
69. Auditory mapping (F1, F2) for babbling and imitation (brain 2)
15×15 SOM; input: 4 F1/F2 neurons; 500 cycles × 540 training patterns = 270,000 training steps.
[F1-F2 plot:] the nodes of the SOM are continuously distributed over the whole vowel space (corners [i], [a], [u]).
70. Auditory mapping (F1, F2) for babbling and imitation
15×15 SOM; input: 4 F1/F2 neurons; 500 cycles × 540 training patterns = 270,000 training steps.
[F1-F2 plot with the 7 phoneme regions:] the nodes of the SOM are continuously distributed over the whole vowel space, with a shift towards the phonemic regions (perceptual magnet effect?) → concentration of net nodes at the phonemic regions.
71. 2nd example: 5-vowel system (auditory link weights)
Hypothetical language with 5 vocalic phonemes /i/, /e/, /a/, /o/, /u/.
[F1-F2 plot.] → concentration of net nodes at the phonemic regions
72. Vocalic training data
Hypothetical language with 5 vocalic phonemes /i/, /e/, /a/, /o/, /u/; broader phoneme clouds.
[F1-F2 plot.]
73. Bar plot for the same V-SOM (phonemic link weights)
Hypothetical language with 5 vocalic phonemes /i/, /e/, /a/, /o/, /u/ → clear separation of the phonemic regions.
By the way: the phonetic ordering of vowel phonemes within the map is given; it results from the ordering of nodes in the F1-F2 plane trained during babbling → phonetotopy!
74. Results of our speech acquisition modeling
- Self-organizing maps are useful for modeling the higher-level mappings: sensory-to-motor, phonemic-to-sensory (and -to-motor).
- But: the mappings shown thus far within the schematic diagram of our model are unidirectional.
75. The neural control component
[Diagram as before.] It should be modified with respect to the fact that we are using SOMs.
76. Implications for the model from doing speech acquisition
- Self-organizing maps are very successful for modeling the higher-level mappings: sensory-to-motor, lexical-to-sensory (and -to-motor).
- But:
  - The mappings shown thus far within the schematic diagram of our model are unidirectional.
  - The SOM neuron layers themselves are not represented within the schematic diagram.
77. The neural control component
[Diagram as before.] It should be modified with respect to the SOMs: the mappings are unidirectional, and the SOM layers are not represented here!
80. The neural control component (revised)
[Diagram: a phonological map (sound map / syllable map / word map) is now connected to a central phonetic map with sound- or syllable-specific layers (V, C, CV, VC, CVC) representing the SOM layers; this central phonetic map links to the motor map (joint coordinates), the somatosensory map (tactile, proprioceptive), and the auditory map. Subcortical and peripheral processing as before.]
81. The neural control component: introduce multidirectional mappings
[Diagram, task V (static): activation within the central phonetic map (sound- or syllable-specific: V, C, CV, VC, CVC) leads to co-activation of all levels: phonemic, sensory, motor. The auditory, somatosensory, and motor states are static.]
82. The neural control component
[Diagram, task VC (dynamic): activation within the central phonetic map leads to co-activation of all levels; the auditory, somatosensory, and motor states are dynamic.]
83. Now:
How is the production of a sound or syllable processed within this (new) model (after speech acquisition)?
84. The neural control component
[Diagram, task V: co-activation of all levels provides an internal idea of how the phoneme should sound and feel. The internal auditory and somatosensory states are compared with the external (fed-back) states, and corrections are made if needed.]
85. Overview
- Introduction
- The 3D model
- Motor and sensory representations
- The neural control component
- Modeling speech acquisition
- Speech perception experiments using the production model
- Results and further work
86. Idea
Due to the multidirectional mappings introduced, this production model can also be used as a perception model.
87. The neural control component
[Diagram, task V: an external acoustic signal enters via auditory processing and activates the central phonetic map.] Note: no co-activation of motor states is needed, but it may occur.
88. Question
Do we get typical effects of vowel and consonant perception using this path within the production model?
Start with a consonant perception experiment: does categorical perception occur?
89. Experiment: consonantal categorical perception
1) Generate a stimulus continuum [ab] - [ad] - [ag].
2) Train 20 different instances of the model (using different initial link-weight settings and a different ordering of the training stimuli) → 20 virtual listeners, doing identification and discrimination.
90. The neural control component
[Diagram as before.] Identification and discrimination are done on the level of the phonetic map (SOM).
91. Example: phonetic map for VC (brain 2)
Identification: the neuron with the highest degree of activation. Discrimination: the city-block distance of the neurons activated for stimuli A and B.
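The two measures can be sketched directly on a toy SOM grid (random weights stand in for a trained phonetic map; stimulus dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

GRID = 10                                 # 10x10 phonetic map (SOM)
W = rng.uniform(size=(GRID, GRID, 4))     # toy link weights to the auditory map

def activated_neuron(stimulus):
    """Identification: the neuron with the highest degree of activation,
    i.e. the best-matching unit for the auditory stimulus."""
    dist = np.linalg.norm(W - stimulus, axis=2)
    return np.unravel_index(np.argmin(dist), dist.shape)

def discrimination(stim_a, stim_b):
    """Discrimination: city-block (Manhattan) distance between the map
    positions of the neurons activated for stimuli A and B."""
    (ra, ca), (rb, cb) = activated_neuron(stim_a), activated_neuron(stim_b)
    return abs(int(ra) - int(rb)) + abs(int(ca) - int(cb))

a, b = rng.uniform(size=4), rng.uniform(size=4)
print(discrimination(a, a), discrimination(a, b))
```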
92. Identification and discrimination scores: consonants
[Plot: percentage of identification / discrimination (0-100%) along the /b/-/d/-/g/ stimulus continuum; 20 listeners; calculated discrimination on the basis of (phonological) identification.]
CP: lower discrimination scores within the phoneme regions in comparison to the phoneme boundaries.
93. VC-SOM (brain 1)
[Map with apical, labial, and dorsal regions.] Are different instances of the model really different?
94. VC-SOM (brain 2)
[Map with dorsal, labial, and apical regions, arranged differently.] Are different instances of the model really different?
95. Vowels: categorical perception(?)
Stimulus continuum [i] - [e] - [a] → 13 stimuli (green dots in the F1-F2 plane). Training of 20 instances of the model (→ 20 listeners). Identification: the neuron with the highest degree of activation. Discrimination: the city-block distance of the neurons activated for stimuli A and B.
96. Example: phonetic map for V
Identification: the neuron with the highest degree of activation. Discrimination: the city-block distance of the neurons activated for stimuli A and B.
97. Identification and discrimination scores: vowels
[Plot: percentage of identification / discrimination (0-100%) along the /i/-/e/-/a/ stimulus continuum; 20 listeners.]
Only a slightly higher discrimination score at one phoneme boundary → CP??
And: a much higher percentage of measured than calculated discrimination within all vocalic phoneme regions, in comparison to the consonants.
98. Identification and discrimination scores: consonants
[The /b/-/d/-/g/ plot from slide 92, repeated for comparison.]
99. Calculated discrimination
- "Calculated" discrimination is discrimination based exclusively on differences in (phonemic) identification.
- If calculated ≈ really perceived (measured) discrimination → the whole acoustic content of a speech item is used for phonemic processing.
- If measured (perceived) > calculated discrimination → these sounds include additional acoustic information (e.g. vowels: phonemic and phonetic vowel quality).
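One simple way to operationalize "calculated" discrimination (the deck does not spell out its exact formula, so this is an assumption): the probability that two stimuli receive different phoneme labels, given their identification probability distributions:

```python
def calculated_discrimination(p_ident_a, p_ident_b):
    """Discrimination predicted from identification alone: the probability
    that stimuli A and B are assigned different phoneme labels, given their
    identification probability distributions (one possible formalization;
    the talk does not state its exact formula)."""
    p_same = sum(pa * pb for pa, pb in zip(p_ident_a, p_ident_b))
    return 1.0 - p_same

# Within a phoneme region: both stimuli almost surely labeled /b/
print(calculated_discrimination([0.9, 0.1, 0.0], [0.9, 0.1, 0.0]))
# Across a phoneme boundary: /b/ vs. /d/
print(calculated_discrimination([0.9, 0.1, 0.0], [0.1, 0.9, 0.0]))
```

Under this formalization, calculated discrimination is low inside phoneme regions and high at boundaries, matching the consonant pattern on the plots; vowels exceeding it indicates extra, non-phonemic acoustic information.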
100. Results: modeling perception using the production model
- The production model yields typical results of categorical perception.
- Consonants [b], [d], [g] are strongly perceived in a categorical way (i.e. strongly encoded).
- Vowels are less categorical.
101. Overview
- Introduction
- The 3D model
- Motor and sensory representations
- The neural control component
- Modeling speech acquisition
- Speech perception experiments
- Results and further work
102. Results
- A computer-implemented neural model of speech production and speech acquisition based on the Guenther (2006) approach has been introduced.
- Training of vowel and consonant production (voiced plosives in CV syllables) has been illustrated.
- This production model shows effects of speech perception straightforwardly.
- Using SOMs for cortical mappings straightforwardly leads to phonetotopy (cp. the results of the imaging experiments by Obleser et al. 2007).
103. Further work: general problems with models
- Their power is limited:
  - if the model is successful → real human processes have to be similar? (not necessarily)
  - if the model fails → the natural process has to be different? (not necessarily)
- So: the model should be validated by data (e.g. imaging studies).
- On the other hand: a model can be taken as a starting point for developing hypotheses for experiments!
104. Further work
- Include VC syllables, CVC, …
- Include other types of consonants: fricatives, nasals, …
- Include canonical babbling
- Include comprehension and imitation of first words
- Very important:
  - separation of higher-level and lower-level motor states
  - more realistic types of neural mappings (but the principle of self-organization is important!)
  - solving the normalization problem: the difference between the caretaker's and the toddler's vocal tract
105. Neural model of speech production (Kröger 2007)
[Diagram: from the mental lexicon and syllabification, a phonological plan is formed (cortical; link to comprehension). Frequent syllables are retrieved directly as motor plans (premotor, frontal lobe); infrequent syllables are assembled via auditory-phonetic processing (auditory map; temporal lobe, high-order and primary areas); prosody is added; somatosensory processing involves the parietal lobe (high-order and primary areas). Motor execution (control and corrections) involves the cerebellum, basal ganglia, and thalamus (subcortical) and the primary motor cortex (motor state). Subcortical and peripheral: auditory and somatosensory receptors and preprocessing (skin, ears, and sensory pathways); muscles and articulators (tongue, lips, jaw, velum) produce the articulatory state, the articulatory signal, and the acoustic signal.]
106. Acknowledgments
Many thanks to …
- Peter Birkholz, Department of Computer Science, University of Rostock, for developing and implementing the 3D articulatory model (PhD thesis)
- Jim Kannampuzha, student of Computer Science at RWTH Aachen, for implementing the neural control model
- the German Research Council for supporting this work (grants KR 1439/10-1 and KR 1439/13-1)
- Georg Heike and Christiane Neuschaefer-Rube for always supporting my work
107. Thanks for your attention!
Bernd J. Kröger, email: bkroeger_at_ukaachen.de, homepage and literature: http://www.speechtrainer.eu
Dona nobis pacem
108. References (see also http://www.speechtrainer.eu)
- Birkholz P, Jackel D, Kröger BJ (2006) Development and control of a 3D vocal tract model. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006), Toulouse, France, pp. 873-876
- Bullock D, Grossberg S, Guenther FH (1993) A self-organizing neural model of motor equivalent reaching and tool use by a multijoint arm. Journal of Cognitive Neuroscience 5: 408-435
- Guenther FH, Gjaja MN (1996) The perceptual magnet effect as an emergent property of neural map formation. Journal of the Acoustical Society of America 100: 1111-1121
- Guenther FH, Ghosh SS, Tourville JA (2006) Neural modeling and imaging of the cortical interactions underlying syllable production. Brain and Language 96: 280-301
- Kandel ER, Schwartz JH, Jessell TM (2000) Principles of neural science. McGraw-Hill, New York
- Kohonen T (2001) Self-organizing maps. Springer, Berlin, 3rd edition
- Kröger BJ, Birkholz P, Kannampuzha J, Neuschaefer-Rube C (2006a) Modeling sensory-to-motor mappings using neural nets and a 3D articulatory speech synthesizer. Proceedings of INTERSPEECH 2006, Pittsburgh, Pennsylvania
- Kröger BJ, Birkholz P, Kannampuzha J, Neuschaefer-Rube C (2006b) Proceedings of the International Seminar on Speech Production (ISSP), Ubatuba, Brazil
- Kröger BJ, Birkholz P, Kannampuzha J, Neuschaefer-Rube C (2006c) Spatial-to-joint coordinate mapping in a neural model of speech production. Proceedings of the annual meeting of the German Acoustical Society (DAGA), Braunschweig, Germany (see also http://www.speechtrainer.eu)
- Oller DK, Eilers RE, Neal AR, Schwartz HK (1999) Precursors to speech in infancy: the prediction of speech and language disorders. Journal of Communication Disorders 32: 223-245
- Saltzman EL, Munhall KG (1989) A dynamic approach to gestural patterning in speech production. Ecological Psychology 1: 333-382