Computer-implemented neural model of speech production and speech acquisition

Transcript and Presenter's Notes
1
Computer-implemented neural model of speech
production and speech acquisition
  • Bernd J. Kröger
  • Department of Phoniatrics, Pedaudiology,
  • and Communication Disorders,
  • Medical Faculty of the Aachen University,
  • RWTH Aachen, Germany

For more literature see http://www.speechtrainer.eu
2
Aachen University, Germany
Bernd J. Kröger, bkroeger@ukaachen.de; download of refs: www.speechtrainer.eu
3
Aachen University (Hospital)
3 Tesla MR-Scanner (Philips)
4
Overview
  • Introduction
  • The 3D model
  • The neural control component
  • Modeling speech acquisition
  • Speech perception experiments
  • Results and further work

6
Speech Production
[Diagram: two separate components; the controller (technical term; = control module = central nervous system) and the controlled system / plant (= articulatory and acoustic module = articulators and cavities)]
7
Note
Birkholz model (2006)
  • We have a lot of knowledge concerning the plant
  • articulatory geometries
  • speech acoustics

We have much less knowledge concerning
neural control of speech articulation
9
Neural computer-implemented models of production
and acquisition including articulation and
acoustics
  • Neural models of speech production
  • e.g. Levelt, Dell → neurolinguistic models
  • e.g. Guenther (2006): neural model of the sensorimotor processes of speech production (articulation)
  • newer models: see this session (including my talk)
  • Neural models of speech acquisition
  • focusing on pre-linguistic phases (babbling)
  • focusing on 0;0 to 1;6, i.e. the time interval before the start of the vocabulary spurt (Guenther 1995, Guenther et al. 2006, prelim. Bailly 1997)
  • This talk:
  • I will present results of my work on neural modeling of speech production and acquisition, which is related to the Guenther approach

10
Overview
  • Introduction
  • The 3D vocal tract model as front-end part of a
    neural model of speech production
  • The neural control component
  • Modeling speech acquisition
  • Speech perception experiments
  • Results and further work

11
3D Articulatory Model
Birkholz et al. (2006)
  • 11 wireframe meshes representing the
  • upper cover (palate, velum, pharynx wall, …) (4 meshes)
  • lower cover (mandible, pharynx, …) (3 meshes)
  • upper and lower teeth, lips, tongue (4 meshes)
  • belongs to the group of geometrical models, in comparison to statistical or biomechanical vocal tract models

12
3D Articulatory Model
Birkholz et al. (2006)
  • complete model
  • upper and lower cover (light gray)
  • tongue and lips (dark gray)
  • upper and lower row of teeth (black)
  • The model is based on 3D static and 2D dynamic MRI data of one speaker of Standard German (JD, ZAS, Berlin)

see http://www.vocaltractlab.de
13
High Quality Acoustic Model
Birkholz et al. (2006)
  • comprises modeling the
  • subglottal tract (13 tube sections)
  • glottis (2 tube sections)
  • pharyngeal and oral tract (40 tube sections with individual lengths, i.e. not necessarily equidistant cylinder tubes)
  • nasal tract (19 tube sections) and 4 sections for the paranasal sinuses

[Figure: equivalent electrical circuit]
14
Overview
  • Introduction
  • The 3D model
  • The neural control component
  • Modeling speech acquisition
  • Results
  • Further work

15
The Neural Control Component: some basics
  • Ensembles / groups of neurons, representing motor, sensory or other states, constitute the central levels of the neural control component
  • → neural maps

16
Motor states and sensory states are based on a set of parameters each
  • somatosensory information: proprioceptive, tactile
  • auditory information

17
Articulatory or lower level motor parameters
10 parameters (defined a priori in a
geometrical articulatory model)
  • control positions of articulators relative to
    position of other articulators
  • joint coordinates (vs. spatial coordinates)

18
JAA = lower jaw angle
  • JAA influences the position of
  • lower jaw
  • tongue body
  • tongue tip
  • lower lip


20
Proprioceptive parameters
Flesh-point locations: absolute positions of end articulators with respect to the hard palate (or cranial system / skull); cf. tract variables
→ spatial coordinates (vs. joint coordinates)
→ on the border between sensory and higher-level motor representations
21
Tactile parameters
[Figure: contact pattern for a velar closure, g]
contact area of movable articulators (tongue body, tongue tip, lips) with the vocal tract walls (regions: lower pharyngeal, upper pharyngeal, velar, palatal, postalveolar, alveolar)
23
Auditory parameters
[Diagram: 3D model → area function (and perimeter function) → transfer function → formants F1, F2, F3]
Auditory parameters = Bark values of F1, F2, F3
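The slides do not state which Hz-to-Bark conversion is used for the formant values; Traunmüller's (1990) approximation is one common choice:

$$ z = \frac{26.81\,f}{1960 + f} - 0.53 \qquad (f\ \text{in Hz},\ z\ \text{in Bark}) $$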
24
Parameter values are directly coded by neural activations in neural maps
1) Sub-groups of neurons are defined for each parameter
2) All sub-groups of neurons for all parameters define the whole neural map
A specific activation of the 4 neurons occurs for each parameter value
[Figure, example motor parameter map: neuron sub-groups for the parameters LIH, LIP, TTA, TTL, TBA, TBL, HLH, HLV, JAA, VEH; further labels TTy, HYy, TBx, JAy, LId, TTx, VEx, ULx, TBy, HYx]
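The slide states only that each parameter value produces a specific activation of its 4-neuron sub-group; below is a minimal sketch of one possible population code (the Gaussian tuning curves and the normalized [0, 1] parameter scale are assumptions, not the model's documented coding):

```python
import numpy as np

def encode_parameter(value, n_neurons=4, sigma=0.2):
    """Encode a scalar parameter value in [0, 1] as an activation pattern
    over n_neurons with evenly spaced Gaussian tuning curves (assumed
    coding; the slide only states that each value yields a specific
    activation of the 4 neurons of its sub-group)."""
    centers = np.linspace(0.0, 1.0, n_neurons)
    return np.exp(-0.5 * ((value - centers) / sigma) ** 2)

# Concatenating the sub-groups of all 10 motor parameters yields the
# whole motor parameter map (10 x 4 = 40 map neurons).
motor_state = np.random.rand(10)   # placeholder values for LIH ... VEH
motor_map = np.concatenate([encode_parameter(v) for v in motor_state])
print(motor_map.shape)             # (40,)
```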
25
The Neural Control Component: some basics
  • Ensembles / groups of neurons, representing motor, sensory or other states, constitute the central levels of the neural control component → neural maps
  • Neural networks connect these neural maps → neural mappings

The organization of the whole control component
26
The whole neural control component
[Diagram after Guenther et al. (2006): cortico-cortical mappings within the cerebral cortex; cortical vs. subcortical and peripheral levels]
27
The neural control component
[Diagram, the feedback control subsystem. Cortical: a sound-map / syllable-map / word-map is linked via sound-to-sensory mappings and a sound-to-motor mapping to the auditory map (auditory state), the somatosensory map (proprioceptive map, tactile map; somatosensory state) and the motor map (joint coordinates; motor state). Subcortical and peripheral: neuro-muscular processing → articulatory state → articulatory signal and acoustic signal; somatosensory and auditory processing return the current state as efference copy / error signal → feedback subsystem. Delays noted in the diagram: 0 ms, < 5 ms, 5 ms, 10 ms, 12 ms, 30 ms]
28
The neural control component
[Same diagram as on the previous slide]
→ later on during speech acquisition the feed-forward control subsystem becomes more and more active
29
Again: differentiation of maps vs. mappings (a sample neural network)
map = ensemble of neurons representing different (sensory or motor) states (neural representation)
mapping = association of appropriate or related states in different maps (has to be trained / learned → adjustment of link weights)
[Diagram: map 1 (e.g. sensory) → mapping from map 1 to map 2 → map 2 (e.g. motor)]
Two types of neural networks were used
30
One-layer feed-forward networks (unidirectional)
[Diagram: input neuron layer (map) with sensory parameters (proprioceptive parameters TTy, HYy, TBx, JAy, LId, TTx, VEx, ULx, TBy, HYx); links from each input neuron to each output neuron (mapping); output neuron layer (map) with motor parameters (joint coordinates LIH, LIP, TTA, TTL, TBA, TBL, HLH, HLV, JAA, VEH)]
31
Self-organizing maps (SOMs)
  • SOM neurons form the central part of the self-organizing map (central layer of neurons = SOM layer = 2D map, representing an ensemble of neurons of the cerebral cortex)
  • input neurons (in terms of Kohonen) → map 1 and map 2 (in our terms)
  • neurons representing e.g. sensory and motor states
  • links and link weights → part of the mapping
[Diagram: map 1 and map 2 connected via the SOM layer]
32
Self-organizing maps (SOMs)
  • SOM neurons form the self-organizing map (central layer of neurons = SOM layer = cortical 2D map) → part of the mapping in our approach
  • input neurons (in terms of Kohonen) → map 1 and map 2 (in our terms)
  • neurons representing e.g. sensory and motor states
  • link weights → part of the mapping
Both maps (input neurons in terms of Kohonen) can be interpreted as input or output maps in our approach
[Diagram: map 1 and map 2 connected via the SOM layer]
Note: this type of network is not unidirectional but bi-/multi-directional
33
The Neural Control Component: some basics
  • Ensembles of neurons, representing motor, sensory or other states, constitute the central levels of the neural control component → neural maps
  • Neural networks connect these neural maps → neural mappings
  • Neural networks must be trained, i.e. they have to learn something during speech acquisition (and later on)

34
Example for learning / training: How does it work, e.g. during babbling? (Babbling = exploring your vocal tract)
Sensory information can be generated by the 3D model for each articulatory or motor state:
  • somatosensory information: proprioceptive, tactile
  • auditory information

35
Learning during babbling
[Slides 35-39 build one diagram: produce a large number of random motor states; for each motor state the 3D model generates the related sensory information (somatosensory: proprioceptive, tactile; auditory); the random motor states are associated with their related sensory states]
40
Learning during babbling
Sensory information can be generated by the 3D model for each articulatory or motor state → forms a training set
[Diagram: random motor states are associated with the related sensory states (somatosensory representation: proprioceptive, tactile; auditory representation), so that the sensory states can be predicted from the motor states]
training set (e.g. 4,000 states)
41
Overview
  • Introduction
  • The 3D model
  • The neural control component
  • Modeling speech acquisition
  • Speech perception experiments
  • Results and further work

42
Modeling speech acquisition
A simple approach:
  • Babbling
  • Mouthing
  • Proto-vocalic articulation
  • Proto-gestures
  • Imitation
  • Vowels
(not language-specific at first; all these phases overlap more or less in time)
43
Modeling speech acquisition
  • Babbling
  • Mouthing (silent mouthing)
  • Proto-vocalic articulation
  • Proto-gestures
  • Imitation
  • Vowels

44
The neural control component
A silent mouthing training set was designed for training the somatosensory-to-motor mapping
[Diagram as before; the somatosensory-to-motor mapping is highlighted]
45
Training set: silent mouthing
  • combination of min, (mid,) and max values of all 10 motor parameters (Kröger et al. 2006c, DAGA Braunschweig)
  • double closures and non-physiological articulations are avoided
[Tables: subset for lips, subset for tongue]
→ 4608 patterns of training data
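A sketch of how such a min/(mid)/max combination set could be assembled; the normalized levels, the parameter semantics and the physiological filter below are assumptions, since the slide gives only the design principle (the actual filtered set contains 4608 patterns):

```python
from itertools import product

# 10 motor parameters, each sampled at min / mid / max
# (normalized levels are assumed; the slide gives only the principle).
PARAMS = ["LIH", "LIP", "TTA", "TTL", "TBA", "TBL", "HLH", "HLV", "JAA", "VEH"]
LEVELS = [0.0, 0.5, 1.0]

def is_physiological(state):
    # Hypothetical filter: e.g. avoid double closures such as a
    # simultaneous full labial, apical or dorsal closure (the real
    # constraints of Kröger et al. 2006c are not given on the slide).
    closures = [state["LIP"] == 1.0, state["TTA"] == 1.0, state["TBA"] == 1.0]
    return sum(closures) <= 1

training_set = [
    dict(zip(PARAMS, combo))
    for combo in product(LEVELS, repeat=len(PARAMS))
    if is_physiological(dict(zip(PARAMS, combo)))
]
print(len(training_set))   # the slide reports 4608 patterns after filtering
```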
46
Training
  • Design of the net: one-layer feed-forward,
  • 2518 input neurons (somatosensory), 40 output neurons (motor) → ca. 2000 links
  • Set of 4608 patterns of training data → min-max combination training set, silent mouthing
  • 5,000 cycles of batch training
  • → mean error ca. 10% for prediction of a motor state from its somatosensory state (Kröger et al. 2006b, ISSP, Ubatuba, Brazil)
  • Software: Java version of SNNS (Stuttgart Neural Network Simulator), http://www-ra.informatik.uni-tuebingen.de/SNNS/
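The net itself was trained with the Java version of SNNS; purely as an illustration of the setup described above, here is a NumPy sketch of one-layer feed-forward batch training with the layer sizes and cycle count from the slide (sigmoid output units, the learning rate and the placeholder data are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, n_patterns = 2518, 40, 4608   # sizes as stated on the slide

X = rng.random((n_patterns, n_in))    # somatosensory states (placeholder data)
Y = rng.random((n_patterns, n_out))   # motor states (placeholder data)

W = rng.normal(0.0, 0.01, (n_in, n_out))
b = np.zeros(n_out)
lr = 0.1                              # assumed learning rate

for cycle in range(5000):             # 5,000 cycles of batch training
    pred = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # sigmoid units (assumed)
    err = pred - Y
    delta = err * pred * (1.0 - pred)            # delta rule for sigmoid units
    W -= lr * X.T @ delta / n_patterns
    b -= lr * delta.mean(axis=0)

print(np.abs(err).mean())             # mean absolute prediction error
```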

47
Training results: some features of motor equivalence (despite the ca. 10% prediction error)
[Figure: vocal tract shapes for labial, apical and dorsal closure, each with low and with high position of the lower jaw]
In each column the somatosensory values are the same (except for the jaw parameter) → acoustically relevant closures are kept despite strong jaw perturbation
48
The neural control component
[Same diagram as on slide 44; the somatosensory-to-motor mapping has now been trained]
49
Modeling speech acquisition
  • Babbling
  • Mouthing
  • Proto-vocalic articulation
  • Proto-gestures
  • Imitation
  • Vowels

50
The neural control component
A proto-vocalic training set was designed for training the auditory-to-motor mapping
[Diagram as before; the auditory-to-motor mapping is highlighted]
51
Proto-vocalic training set
start with a front-high, a back-high and a low gesture
  • use articulatory constraints
  • there remain only two degrees of freedom, giving the 2D plane
  • → the training set covers the whole (continuous, language-independent) vowel space
→ 540 states
[Figure: vowel space with corners i, u, a]
52
Proto-vocalic training set
[Figure: continuous articulatory space for proto-vocalic articulation, corners i, u, a]
540 states, in three subsets: from u towards a and i; from a towards i and u; from i towards u and a
53
Using SOMs for the auditory-to-motor mapping
  • 10x10 = 100 neurons form the self-organizing map layer
  • topology: rectangular ordering (not hexagonal)
  • 40 input neurons → 4000 link weights
  • 36 neurons representing the 9 joint-coordinate parameters (motor representation without velum)
  • 4 neurons representing F1 and F2 (auditory representation)
  • 200 cycles × 524 training patterns = 104,800 training steps
  • → mean error 0.6% for prediction of an articulatory state (Kröger et al. 2006b)
  • → very precise mapping for predicting proto-vocalic motor states from auditory states (F1, F2)!
  • using the standard SOM learning algorithm (Kohonen 1995 and 2001)
  • initialization: random distribution of link weights within the interval [0.4, 0.6]
  • SOM update radius: 5 neurons, rectangular neighborhood function
  • SOM update radius decay: 0.999
  • learning rate factor: 0.1; learning rate decay factor: 0.99
  • mode: winner takes all
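A compact sketch of the standard Kohonen training loop with the hyperparameters listed above; the placeholder data, and the choice to decay the radius per training step and the learning rate per cycle, are assumptions (the slide does not fix the decay schedule; "winner takes all" refers to selecting the single best-matching unit):

```python
import numpy as np

rng = np.random.default_rng(1)
GRID, DIM = 10, 40                   # 10x10 SOM layer, 40 input neurons

# Link weights initialized uniformly within [0.4, 0.6], as on the slide.
W = rng.uniform(0.4, 0.6, (GRID, GRID, DIM))
coords = np.stack(np.meshgrid(np.arange(GRID), np.arange(GRID),
                              indexing="ij"), axis=-1)

radius, radius_decay = 5.0, 0.999    # update radius 5, decay 0.999
lr, lr_decay = 0.1, 0.99             # learning rate 0.1, decay 0.99

X = rng.random((524, DIM))           # placeholder training patterns

for cycle in range(200):             # 200 cycles x 524 patterns
    for x in X:
        # winner takes all: find the best-matching unit (BMU)
        d = np.linalg.norm(W - x, axis=-1)
        bmu = np.unravel_index(d.argmin(), d.shape)
        # rectangular neighborhood: all units within the current radius
        hood = np.abs(coords - np.array(bmu)).max(axis=-1) <= radius
        W[hood] += lr * (x - W[hood])
        radius *= radius_decay
    lr *= lr_decay
```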

54
Training results for the auditory-to-motor mapping: node plot of SOM link weights
[Figure: display of the auditory link weight values for each neuron in the normalized F1-F2 plane (0.0 to 1.0); SOM neurons as cortical map; regions labeled i, a, u]
  • The whole range of training data is covered
  • The topology is preserved (not many doublings/folds)
55
Training results for the auditory-to-motor mapping: bar plot of the same SOM link weights
[Figure: cortical map, one bar plot per neuron; regions labeled u, i, a]
Simply a different display of the auditory link weight values for each neuron
56
Training results for the auditory-to-motor mapping: bar plot of SOM weights
  • 1) continuous F1-F2 transitions for the i-a and u-a paths
  • 2) ordering of the proto-vocalic states with respect to phonetic criteria (high-low, front-back): phonetotopy (cp. the fMRI study of Obleser et al. 2007)
[Figure: map regions labeled u, a, i]
57
The neural control component
[Diagram as before, annotated: static articulation]
58
Modeling speech acquisition
  • Babbling
  • Mouthing
  • Proto-vocalic articulation (static)
  • Proto-gestures (dynamic)
  • Imitation
  • Vowels
59
The auditory-to-motor mapping for vocal-tract-closing gestures
Each training pattern is a whole gesture!
[Figure: articulatory states at time steps t0 … t4 (t0 = 0 ms, t4 = 20 ms), starting from different proto-vocalic positions; formant pattern with frequency in Bark (0 to 20) over time in ms]
motor state (the complete closing gesture) ↔ auditory state (the whole formant pattern): association
60
The auditory-to-motor mapping for closing gestures: 10x10 SOM
  • 10x10 = 100 neurons form the SOM neuron layer
  • 59 input neurons → 5800 link weights
  • 56 neurons representing the formant transition (F1, F2, F3, its time deviation and length)
  • 3 neurons representing the closure-related articulator (labial / apical / dorsal) and 4 neurons representing the starting vowel (TBx, TBy)
  • Set of 22 × 9 = 198 training patterns
  • closing-gestures training set based on 22 proto-vocalic articulatory states
  • production of 9 closures: 3× labial, 3× apical, and 3× dorsal
  • 500 cycles × 198 training patterns → 99,000 training steps
  • → mean error 0.8% for prediction of the motor state from its formant pattern
  • standard SOM learning algorithm (Kohonen 1995 and 2001)
  • initialization: random distribution of link weights within the interval [0.4, 0.6]
  • SOM update radius: 3 neurons, rectangular neighborhood function
  • SOM update radius decay: 0.999
  • learning rate factor: 0.1; learning rate decay factor: 0.99
  • mode: winner takes all
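Each training pattern for this SOM packs a whole gesture into one input vector. A sketch of one possible layout following the neuron counts above (56 formant-transition values, 3 articulator neurons, 4 vowel neurons); how values are distributed over the neurons is an assumption:

```python
import numpy as np

def gesture_pattern(formant_track, articulator, tbx, tby):
    """Pack one closing-gesture training pattern: 56 values for the
    formant transition (sampled F1/F2/F3 plus timing), a 3-neuron
    one-hot code for the closure-related articulator, and 4 neurons
    for the starting vowel (TBx, TBy); the exact layout is assumed."""
    f = np.asarray(formant_track, dtype=float).ravel()
    assert f.size == 56, "56 formant-transition neurons, as on the slide"
    onehot = {"labial": [1.0, 0.0, 0.0],
              "apical": [0.0, 1.0, 0.0],
              "dorsal": [0.0, 0.0, 1.0]}[articulator]
    vowel = [tbx, 1.0 - tbx, tby, 1.0 - tby]   # assumed 4-neuron coding
    return np.concatenate([f, onehot, vowel])

# Example: a dorsal closing gesture starting from an u-like vowel
# (u: TBx -> 0, TBy -> 1, as on the following slides).
x = gesture_pattern(np.zeros(56), "dorsal", tbx=0.0, tby=1.0)
print(x.size)
```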

61
Bar plot of the SOM for closing / proto-VC gestures
[Figure: 10x10 SOM; columns 1-3 show the SOM link weights for the motor parameter closure-related articulator]
62
The SOM is capable of learning different types of formant transitions for different starting vowels
[Figure: 10x10 SOM; columns 1-3 show the SOM link weights for the motor parameter closure-related articulator; the formant transitions are the auditory SOM link weights]
  • clear separation of the motor states with respect to the closure-related articulator
63
SOM training 2 (instance 2)
[Figure: columns 1-3 show the closure-related articulator; columns 4, 5 show the SOM link weights for the motor parameter initial vocalic state (TBx, TBy); a: TBy → 0; i: TBx, TBy → 1; u: TBx → 0, TBy → 1]
65
SOM training 2 (brain 2)
[Figure: columns 1-3 show the constriction-forming articulator; columns 4, 5 show the SOM link weights for the motor parameter vowel (TBx, TBy); a: TBy → 0; i: TBx, TBy → 1; u: TBx → 0, TBy → 1; map regions labeled da, gu, di, bu, ba, du, bi, gi, ga]
→ phonetotopic ordering with respect to the initial proto-vocalic state
66
Modeling speech acquisition
  • Babbling
  • Mouthing
  • Proto-vocalic articulation
  • Proto-gestures
  • Imitation
  • Vowels

This was non-language-specific learning of proto-vowels and proto-consonantal gestures.
The feedback loop is now trained to a certain degree, which allows imitation → language-specific training
67
The neural control component
[Diagram as before, annotated: language-specific V and VC items are now learned via the feedback loop]
68
Training set: mother's vowels
hypothetical language with 7 vocalic phonemes /i/, /e/, /ɛ/, /a/, /ɔ/, /o/, /u/; 100 items per vowel phoneme
[Figure: F1-F2 plane with the phoneme regions]
69
Auditory mapping (F1, F2) for babbling and imitation (brain 2)
15x15 SOM; input: 4 F1/F2 neurons; 500 cycles × 540 training patterns = 270,000 training steps
[Figure: F1-F2 plane with corners i, a, u]
the nodes of the SOM are continuously distributed over the whole vowel space
70
Auditory mapping (F1, F2) for babbling and imitation
15x15 SOM; input: 4 F1/F2 neurons; 500 cycles × 540 training patterns = 270,000 training steps
[Figure: F1-F2 plane with phoneme regions /i/, /e/, /ɛ/, /a/, /ɔ/, /o/, /u/]
the nodes of the SOM are continuously distributed over the whole vowel space, with a shift towards the phonemic regions (perceptual magnet effect?)
→ concentration of net nodes at the phonemic regions
71
2nd example: 5-vowel system, auditory link weights
hypothetical language with 5 vocalic phonemes /i/, /e/, /a/, /o/, /u/
[Figure: F1-F2 plane with the phoneme regions]
  • concentration of net nodes at the phonemic regions

72
Vocalic training data
hypothetical language with 5 vocalic phonemes /i/, /e/, /a/, /o/, /u/; broader phoneme clouds
[Figure: F1-F2 plane with the phoneme clouds]
73
Bar plot for the same V-SOM: phonemic link weights
hypothetical language with 5 vocalic phonemes /i/, /e/, /a/, /o/, /u/
→ clear separation of phonemic regions
By the way: the phonetic ordering of vowel phonemes within the map is given; it results from the ordering of nodes in the F1-F2 plane trained during babbling! Phonetotopy!
74
Results of our speech acquisition modeling
  • Self-organizing maps are useful for modeling the
    higher level mappings sensory-to-motor,
    phonemic-to-sensory (and -to-motor)
  • But
  • The mappings shown thus far within the
    schematic diagram of our model are
    unidirectional.

75
The neural control component
[Diagram as before]
→ should be modified with respect to the fact that we are using SOMs
76
Implications for the model from doing speech
acquisition
  • Self-organizing maps are very successful for
    modeling the higher level mappings
    sensory-to-motor, lexical-to-sensory (and
    -to-motor)
  • But
  • The mappings shown thus far within the
    schematic diagram of our model are
    unidirectional.
  • The SOM neuron layers themselves are not
    represented within the schematic diagram

77
The neural control component
[Diagram as before]
→ should be modified with respect to the SOMs: the mappings are unidirectional, and the SOM layers are not represented here!
79
The neural control component
[Diagram as before, now without the unidirectional sound-to-sensory and sound-to-motor mappings]
80
The neural control component
[Revised diagram: the phonological map (sound-map / syllable-map / word-map) connects to a central phonetic map with sound- or syllable-specific layers (V, C, CV, VC, CVC) representing the SOM layers, which in turn connects to the auditory map, the somatosensory map (tactile, proprioceptive) and the motor map (joint coordinates); subcortical and peripheral processing as before]
81
The neural control component: introduce multidirectional mappings
[Diagram, static task V: via the central phonetic map (sound- or syllable-specific: V, C, CV, VC, CVC) all levels are co-activated: phonological map, static auditory state, static somatosensory state, static motor state (joint coordinates)]
co-activation of all levels: phonemic, sensory, motor
82
The neural control component
[Diagram, dynamic task VC: via the central phonetic map all levels are co-activated: phonological map, dynamic auditory state, dynamic somatosensory state, dynamic motor state]
83
Now
  • How is the production of a sound or syllable
    processed within this (new) model (after speech
    acquisition)?

84
The neural control component
[Diagram, production of a task V after acquisition: co-activation of all levels via the central phonetic map → an internal idea of how the phoneme should sound and feel; the produced articulatory and acoustic signal is fed back through auditory and somatosensory processing; internal and external states are compared → corrections if needed]
85
Overview
  • Introduction
  • The 3D model
  • Motor and sensory representations
  • The neural control component
  • Modeling speech acquisition
  • Speech perception experiments using the
    production model
  • Results and further work

86
Idea
  • Due to the multidirectional mappings introduced,
    this production model can be used as a perception
    model

87
The neural control component
[Diagram, perception: an external acoustic signal enters via auditory processing → auditory map → central phonetic map → phonological map]
Note: no co-activation of motor states is needed, but it may occur
88
Question
  • Do we get typical effects of vowel and consonant
    perception using this path within the production
    model?

Start: consonant perception experiment. Does categorical perception occur?
89
Experiment: consonantal categorical perception
  • Start: 1) generate a stimulus continuum ab, ad, ag
  • Then: 2) train 20 different instances of the model (using different initial link-weight settings and different ordering of the training stimuli) → 20 virtual listeners, doing identification and discrimination
90
The neural control component
[Diagram as before, with the external acoustic signal as input]
Identification and discrimination are done on the level of the phonetic map (SOM)
91
Example: phonetic map for VC (brain 2)
Identification: neuron with highest degree of activation. Discrimination: city-block distance of the neurons activated for stimuli A and B
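A small sketch of the two measures as defined on this slide; how the city-block distance is converted into a percent-discrimination score (e.g. by thresholding and averaging over the 20 virtual listeners) is not specified here and would be an additional assumption:

```python
import numpy as np

def identify(activation):
    """Identification: grid position of the most active map neuron."""
    return np.unravel_index(np.argmax(activation), activation.shape)

def discriminate(act_a, act_b):
    """Discrimination measure: city-block (L1) distance between the
    winner neurons activated for stimulus A and stimulus B."""
    (ra, ca), (rb, cb) = identify(act_a), identify(act_b)
    return abs(ra - rb) + abs(ca - cb)

# Example with placeholder 10x10 activation patterns for two stimuli:
rng = np.random.default_rng(2)
act_a, act_b = rng.random((10, 10)), rng.random((10, 10))
print(identify(act_a), discriminate(act_a, act_b))
```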
92
Identification and discrimination scores: consonants
[Plot: percentage of identification / discrimination (0-100%) over the stimulus continuum; phoneme regions /b/, /d/, /g/; 20 listeners; discrimination calculated on the basis of (phonological) identification]
CP: lower discrimination scores within phoneme regions in comparison to phoneme boundaries
93
VC-SOM (brain 1)
Are different instances of the model really different?
[Figure: map regions labeled apical, labial, dorsal]
94
VC-SOM (brain 2)
Are different instances of the model really different?
[Figure: map regions labeled dorsal, labial, apical; the layout differs from brain 1]
95
Vowels: categorical perception (?)
Stimulus continuum i, e, a → 13 green dots. Training of 20 instances of the model (→ 20 listeners). Identification: neuron with highest degree of activation. Discrimination: city-block distance of the neurons activated for stimuli A and B
[Figure: F1-F2 plane with the stimulus continuum]
96
Example: phonetic map for V
Identification: neuron with highest degree of activation. Discrimination: city-block distance of the neurons activated for stimuli A and B
97
Identification and discrimination scores: vowels
[Plot: percentage of identification / discrimination (0-100%) over the stimulus continuum; phoneme regions /i/, /e/, /a/; 20 listeners]
Only a slightly higher discrimination score at one phoneme boundary → CP??
And: a much higher percentage of measured than calculated discrimination within all vocalic phoneme regions, in comparison to consonants
98
Identification and discrimination scores: consonants
[Same plot as before: phoneme regions /b/, /d/, /g/; 20 listeners; shown again for comparison]
99
Calculated discrimination
  • Calculated discrimination is discrimination based exclusively on differences in (phonemic) identification
  • If the rate of calculated = really perceived (measured) discrimination
  • → the whole acoustic content of a speech item is needed for phonemic processing
  • If measured (perceived) > calculated discrimination
  • → these sounds include additional acoustic information (e.g. vowels: phonemic and phonetic vowel quality)
100
Results: modeling perception using the production model
  • The production model gives typical results of
    categorical perception
  • Consonants b, d, g are strongly perceived in a
    categorical way (i.e. strongly encoded)
  • vowels are less categorical

101
Overview
  • Introduction
  • The 3D model
  • Motor and sensory representations
  • The neural control component
  • Modeling speech acquisition
  • Speech perception experiments
  • Results and further work

102
Results
  • A computer-implemented neural model of speech
    production and speech acquisition based on the
    Guenther 2006 approach has been introduced
  • Training of vowel and consonant production
    (voiced plosives in CV-syllables) has been
    illustrated.
  • This production model shows effects of speech perception straightforwardly
  • Using SOMs for cortical mappings straightforwardly leads to phonetotopy (cp. results of imaging experiments given by Obleser et al. 2007)

103
Further work: general problems with models
  • Their limited power:
  • if the model is successful
  • → real human processes have to be similar? (not necessarily!)
  • if the model fails
  • → the natural process has to be different? (not necessarily!)
  • So:
  • the model should be validated by data (e.g. imaging studies)
  • On the other hand: a model can be taken as a starting point for developing hypotheses for experiments!
104
Further work
  • Include VC syllables, CVC, …
  • Include other types of consonants: fricatives, nasals, …
  • Include canonical babbling
  • Include comprehension and imitation of first words
  • Very important:
  • separation of higher-level and lower-level motor states
  • more realistic types of neural mappings; but the principle of self-organization is important!
  • solve the normalization problem: the difference between the caretaker's and the toddler's vocal tract

105
[Diagram: neural model of speech production (Kröger 2007). Cortical: from the mental lexicon and syllabification → phonological plan; frequent syllables take a premotor route, infrequent syllables are assembled via auditory-phonetic processing; temporal lobe: high-order and primary auditory maps (auditory state, link to comprehension); frontal lobe: motor plan; parietal lobe: high-order and primary somatosensory maps; prosody. Subcortical: motor execution (control and corrections) via cerebellum, basal ganglia, thalamus → primary motor cortex → motor state. Peripheral: muscles and articulators (tongue, lips, jaw, velum) → articulatory state → articulatory signal and acoustic signal; auditory and somatosensory receptors and preprocessing (skin, ears and sensory pathways)]
106
Acknowledgments
  • Many thanks to …
  • Peter Birkholz, Department of Computer Science, University of Rostock, for developing and implementing the 3D articulatory model (PhD thesis)
  • Jim Kannampuzha, student of Computer Science at RWTH Aachen, for implementing the neural control model
  • German Research Council for supporting this work (Grants KR 1439/10-1 and KR 1439/13-1)
  • Georg Heike and Christiane Neuschaefer-Rube for always supporting my work

107
Thanks for your attention!
Bernd J. Kröger, email: bkroeger@ukaachen.de, homepage and literature: http://www.speechtrainer.eu
Dona nobis pacem
108
References (see also http://www.speechtrainer.eu)
  • Birkholz P, Jackel D, Kröger BJ (2006) Development and control of a 3D vocal tract model. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006), Toulouse, France, pp. 873-876
  • Bullock D, Grossberg S, Guenther FH (1993) A self-organizing neural model of motor equivalent reaching and tool use by a multijoint arm. Journal of Cognitive Neuroscience 5: 408-435
  • Guenther FH, Gjaja MN (1996) The perceptual magnet effect as an emergent property of neural map formation. Journal of the Acoustical Society of America 100: 1111-1121
  • Guenther FH, Ghosh SS, Tourville JA (2006) Neural modeling and imaging of the cortical interactions underlying syllable production. Brain and Language 96: 280-301
  • Kandel ER, Schwartz JH, Jessell TM (2000) Principles of neural science. McGraw-Hill, New York
  • Kohonen T (2001) Self-organizing maps. Springer, Berlin, 3rd edition
  • Kröger BJ, Birkholz P, Kannampuzha J, Neuschaefer-Rube C (2006a) Modeling sensory-to-motor mappings using neural nets and a 3D articulatory speech synthesizer. Proceedings of INTERSPEECH 2006, Pittsburgh, Pennsylvania
  • Kröger BJ, Birkholz P, Kannampuzha J, Neuschaefer-Rube C (2006b) Proceedings of the International Seminar on Speech Production (ISSP 2006), Ubatuba, Brazil
  • Kröger BJ, Birkholz P, Kannampuzha J, Neuschaefer-Rube C (2006c) Spatial-to-joint coordinate mapping in a neural model of speech production. DAGA Proceedings of the annual meeting of the German Acoustical Society, Braunschweig, Germany (see also http://www.speechtrainer.eu)
  • Oller DK, Eilers RE, Neal AR, Schwartz HK (1999) Precursors to speech in infancy: the prediction of speech and language disorders. Journal of Communication Disorders 32: 223-245
  • Saltzman EL, Munhall KG (1989) A dynamic approach to gestural patterning in speech production. Ecological Psychology 1: 333-382