Title: Speech, Language and Human-Computer Interaction
William Marslen-Wilson
Steve Young
Johanna Moore, Martin Pickering, Mark Steedman
Contents
- Background and motivation
- State of the Art
  - Speech recognition and understanding
  - Cognitive neuroscience
  - Computational models of interaction
- The Grand Challenge
- Research Themes
Spoken language and human interaction will be an essential feature of truly intelligent systems. For example, Turing made it the basis of his famous test to answer the question "Can machines think?" ("Computing Machinery and Intelligence", Mind, 1950).
Spoken language is the natural mode of
communication and truly ubiquitous computing will
rely on it.
The Vision: Apple's Knowledge Navigator
The Reality: a currently deployed flight enquiry demo
... but we are not quite there yet!
Current situation
[Diagram: the Human Language System is observed, and data collected from it, by two complementary strands. Cognitive Sciences: development of neuro-biologically and psycholinguistically plausible accounts of human language processes (comprehension and production). Computational Language Use: symbolic and statistical models of human language processing (e.g., via parsing, semantics, generation, discourse analysis), feeding into Engineering Systems.]
State of the Art: Speech Recognition
Goal is to convert acoustic signal to words
[Diagram: acoustic observations Y (e.g., the utterance "He bought it") are mapped to a word sequence W.]
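In the standard formulation (not spelled out on the slide, but the usual basis for this approach), the recogniser searches for the word sequence W that is most probable given the acoustics Y; Bayes' rule splits the problem into an acoustic model and a language model:

$$
\hat{W} \;=\; \arg\max_{W} P(W \mid Y) \;=\; \arg\max_{W}\; p(Y \mid W)\, P(W)
$$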
General Approach: Hierarchy of Markov Chains
Model Building
[Diagram: about 100 hours of transcribed speech are used to train the Acoustic Models, and about 500 million words of text are used to train the Language Model (example material shown: "He said that ...", "Speaking from the White House, the president said today that the nation would stand firm against the ....").]
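As a rough illustration of the language-model half of this step (a minimal maximum-likelihood bigram sketch in Python, not the n-gram toolkit that would actually be used at this scale):

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Maximum-likelihood bigram model P(w_i | w_{i-1}) from tokenised
    sentences; an illustrative sketch only (no smoothing, no toolkit)."""
    context_counts, bigram_counts = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        context_counts.update(padded[:-1])
        bigram_counts.update(zip(padded[:-1], padded[1:]))
    return {(w1, w2): c / context_counts[w1]
            for (w1, w2), c in bigram_counts.items()}

# Toy usage; in practice the counts come from hundreds of millions of words.
lm = train_bigram_lm([["he", "said", "that", "it", "works"],
                      ["he", "said", "it"]])
print(lm[("he", "said")])   # P(said | he) = 1.0 in this toy corpus
```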
Recognising
Progress in Automatic Speech Recognition
[Chart: word error rate over time for benchmark tasks ranging from easy to hard.]
Current Research in Acoustic Modelling
Hidden Markov Models: the quasi-stationary assumption (each state treats its stretch of the signal as statistically stationary) is a major weakness.
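For orientation, a minimal sketch of the HMM likelihood computation (the forward recursion), assuming discrete time frames and per-state log emission scores; each state has a single fixed output distribution, which is exactly the quasi-stationary assumption criticised above:

```python
import numpy as np
from scipy.special import logsumexp

def forward_log_likelihood(log_pi, log_A, log_B):
    """Log-likelihood of an observation sequence under a discrete-time HMM.

    log_pi : (S,)   log initial state probabilities
    log_A  : (S, S) log transition probabilities (rows index the source state)
    log_B  : (T, S) log emission score of frame t under state s
             (one fixed output distribution per state: the quasi-stationary assumption)
    """
    alpha = log_pi + log_B[0]                         # initialise with frame 0
    for t in range(1, log_B.shape[0]):                # recurse over frames
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[t]
    return logsumexp(alpha)                           # log p(Y), summed over final states

# Toy usage: 3 states, 5 frames of random emission scores.
S, T = 3, 5
rng = np.random.default_rng(0)
log_pi = np.log(np.full(S, 1.0 / S))
log_A = np.log(np.full((S, S), 1.0 / S))
log_B = rng.normal(size=(T, S))
print(forward_log_likelihood(log_pi, log_A, log_B))
```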
State of the Art: Cognitive Neuroscience of Speech and Language
- Scientific understanding of human speech and language is in a state of rapid transformation and development
- Rooted in cognitive/psycholinguistic accounts of the functional structure of the language system
- Primary drivers now coming from neurobiology and new neuroscience techniques
Neurobiology of homologous brain systems: primate neuroanatomy and neurophysiology (Rauschecker & Tian, PNAS, 2000)
- Provides a potential template for investigating the human system
- Illustrates the level of neural and functional specificity that is achievable
- Points to an explanation in terms of multiple parallel processing pathways, hierarchically organised
Speech and language processing in the human brain
- Requires an interdisciplinary combination of neurobiology, psycho-acoustics, acoustic-phonetics, neuro-imaging, and psycholinguistics
- Starting to deliver results with a high degree of functional and neural specificity
- Ingredients for a future neuroscience of speech and language
Hierarchical organisation of processes in primary auditory cortex (belt, parabelt) (from Patterson, Uppenkamp, Johnsrude & Griffiths, Neuron, 2002)
Hierarchical organisation of processing streams
Activation as a function of intelligibility for sentences heard in different types of noise (Davis & Johnsrude, J. Neurosci., 2003). The colour scale plots intelligibility-responsive regions that were sensitive to the acoustic-phonetic properties of the speech distortion (orange to red), contrasted with regions (green to blue) whose response was independent of lower-level acoustic differences.
- Essential to image brain activity in time as well as in space
- EEG and MEG offer excellent temporal resolution and improving spatial resolution
- This allows dynamic tracking of the spatio-temporal properties of language processing in the brain
- Demonstration (Pulvermüller et al.) using MEG to track cortical activity related to spoken word recognition
[Figure: MEG demonstration of cortical activity related to spoken word recognition (700 ms).]
- Glimpse of future directions in the cognitive neuroscience of language
- Importance of understanding the functional properties of the domain
- Neuroscience methods for revealing the spatio-temporal properties of the underlying systems in the brain
State of the Art: Computational Language Systems
- Modelling interaction requires solutions for:
  - Parsing and interpretation
  - Generation and synthesis
  - Dialogue management
- Integration of component theories and technologies
Parsing and Interpretation
- The goal is to convert a string of words into an interpretable structure
- Marks bought Brooks
  (TOP (S (NP-SBJ Marks)
          (VP (VBD bought)
              (NP Brooks))
          (. .)))
- Translate the treebank into a grammar and statistical model (a minimal sketch follows below)
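As a hedged sketch of the simplest version of that translation (reading off a relative-frequency PCFG from bracketed trees; real lexicalised parsers such as Collins's use much richer statistical models), assuming Penn-Treebank-style bracketings:

```python
from collections import Counter

def parse_bracketed(s):
    """Parse a Penn-Treebank-style bracketing into (label, children) tuples;
    leaves are plain strings. Assumes well-formed input."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()

    def node(i):                                   # tokens[i] must be "("
        label, children, i = tokens[i + 1], [], i + 2
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = node(i)
            else:
                child, i = tokens[i], i + 1
            children.append(child)
        return (label, children), i + 1

    tree, _ = node(0)
    return tree

def count_rules(tree, counts):
    """Count one context-free rule per internal node: label -> child labels/words."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for c in children:
        if isinstance(c, tuple):
            count_rules(c, counts)

# Toy "treebank" of one tree (the example above); a real treebank has ~1M words.
counts = Counter()
count_rules(parse_bracketed(
    "(TOP (S (NP-SBJ Marks) (VP (VBD bought) (NP Brooks)) (. .)))"), counts)

lhs_totals = Counter()
for (lhs, _), c in counts.items():
    lhs_totals[lhs] += c
pcfg = {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}
print(pcfg[("S", ("NP-SBJ", "VP", "."))])          # relative-frequency estimate: 1.0
```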
Parser Performance
- Improvement in performance in recent years over an unlexicalized baseline of 80% in ParsEval (LP and LR defined below):
  - Magerman 1995: 84.3 LP / 84.0 LR
  - Collins 1997: 88.3 LP / 88.1 LR
  - Charniak 2000: 89.5 LP / 89.6 LR
  - Bod 2001: 89.7 LP / 89.7 LR
- Interpretation is beginning to follow (Collins 1999: 90.9% unlabelled dependency recovery)
- However, there are signs of an asymptote
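For reference, LP and LR are the standard PARSEVAL labelled precision and recall; the definitions below are the usual ones rather than anything given on the slide:

$$
\mathrm{LP} = \frac{\#\ \text{correct labelled constituents}}{\#\ \text{constituents proposed by the parser}},
\qquad
\mathrm{LR} = \frac{\#\ \text{correct labelled constituents}}{\#\ \text{constituents in the gold-standard treebank}}
$$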
Generation and Synthesis
- Spoken dialogue systems use:
  - Pre-recorded prompts
    - Natural-sounding speech, but practically limited flexibility
  - Text-to-speech
    - Provides more flexibility, but lacks adequate theories of how timing, intonation, etc. convey discourse information
  - Natural language (text) generation
    - Discourse planners to select content from data and knowledge bases and organise it into semantic representations
    - Broad-coverage grammars to realise semantic representations in language
Spoken Dialogue Systems
- Implemented as hierarchical finite-state machines or VoiceXML (a minimal finite-state sketch follows below)
- Can:
  - Effectively handle simple tasks in real time
    - automated call routing
    - travel and entertainment information and booking
  - Be robust in the face of barge-in
    - e.g., cancel or help
  - Take action to get the dialogue back on track
  - Generate prompts sensitive to the task context
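As a hedged illustration of the finite-state style of dialogue manager mentioned above (the states, prompts and flight-enquiry domain are hypothetical, not a deployed design):

```python
# Minimal finite-state dialogue manager sketch (hypothetical flight-enquiry domain).
# Each state has a prompt and a successor; user input fills the corresponding slot.
STATES = {
    "ask_origin":      ("Where are you flying from?",      "ask_destination"),
    "ask_destination": ("Where are you flying to?",        "ask_date"),
    "ask_date":        ("What day do you want to travel?", "confirm"),
    "confirm":         ("Shall I look up that flight?",    None),
}

def run_dialogue(get_user_input=input):
    state, slots = "ask_origin", {}
    while state is not None:
        prompt, next_state = STATES[state]
        reply = get_user_input(prompt + " ")
        if reply.strip().lower() in {"cancel", "help"}:    # barge-in style commands
            print("Returning to the start of the enquiry.")
            state, slots = "ask_origin", {}
            continue
        slots[state] = reply                               # fill the slot for this state
        state = next_state
    return slots

# run_dialogue() would drive a console version of the enquiry; a VoiceXML or
# hierarchical FSM system elaborates the same idea with speech I/O and sub-dialogues.
```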
Limitations of Current Approaches
- Design and maintenance are labour-intensive, domain-specific, and error-prone
  - Must specify all plausible dialogues and content
  - Mix task knowledge and dialogue knowledge
- Difficult to:
  - Generate responses sensitive to linguistic context
  - Handle user interruptions, user-initiated task switches, or abandonment
  - Provide personalised advice or feedback
  - Build systems for new domains
What Humans Do that Today's Systems Don't
- Use context to interpret and respond to questions
- Ask for clarification
- Relate new information to what's already been said
- Avoid repetition
- Use linguistic and prosodic cues to convey meaning
- Distinguish what's new or interesting
- Signal misunderstanding, lack of agreement, rejection
- Adapt to their conversational partners
- Manage the conversational turn
- Learn from experience
Current directions
- 1M words of labelled data is not nearly enough
- Current emphasis is on lexical smoothing and semi-supervised methods for training parser models
- Separation of dialogue-management knowledge from domain knowledge
- Integration of modern (reactive) planning technology with dialogue managers
- Reinforcement learning of dialogue policies (see the sketch below)
- Anytime algorithms for language generation
- Stochastic generation of responses
- Concept-to-speech synthesis
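As a hedged sketch of what reinforcement learning of dialogue policies means in its simplest tabular form (the states, actions, rewards and user simulator below are toy assumptions, not a real system):

```python
import random
from collections import defaultdict

# Toy dialogue MDP: the state is how many slots have been filled (0-3),
# and the reward arrives only when the dialogue ends.
ACTIONS = ["ask_slot", "confirm", "hang_up"]

def simulate_step(state, action):
    """Very crude user simulator: asking fills a slot; confirming too early fails."""
    if action == "ask_slot" and state < 3:
        return state + 1, 0.0, False                  # next state, reward, done?
    if action == "confirm":
        return state, (1.0 if state == 3 else -1.0), True
    return state, -1.0, True                           # hang_up, or a useless ask

def q_learning(n_episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)
    for _ in range(n_episodes):
        state, done = 0, False
        while not done:
            action = (random.choice(ACTIONS) if random.random() < epsilon
                      else max(ACTIONS, key=lambda a: Q[(state, a)]))
            nxt, reward, done = simulate_step(state, action)
            target = reward + (0.0 if done else gamma * max(Q[(nxt, a)] for a in ACTIONS))
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = nxt
    return Q

policy = q_learning()
print(max(ACTIONS, key=lambda a: policy[(0, a)]))     # learned first action: "ask_slot"
```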
Grand Challenge
To understand and emulate human capability for
robust communication and interaction.
Research Programme
Three inter-related themes:
- Exploration of language function in the human brain
- Computational modelling of human language use
- Analysis and modelling of human interaction
The development of all three themes will aim at a strong integration of neuroscience and computational approaches.
Theme 1: Exploration of language function in the human brain
- Development of an integrated cognitive neuroscience account
  - precise neuro-functional mapping of speech analysis system
  - identification/analysis of different cortical processing streams
  - improved (multi-modal) neuro-imaging methods for capturing spatio-temporal patterning of brain activity supporting language function
  - linkage to theories of language learning/brain plasticity
- Expansion of neurophysiological/cross-species comparisons
  - research into homologous/analogous systems in primates/birds
  - development of cross-species neuro-imaging to allow close integration with human data
- Research across languages and modalities (speech and sign)
  - contrasts in language systems across different language families
  - cognitive and neural implications of spoken vs sign languages
Theme 2: Computational modelling of human language function
- Auditory modelling and human speech recognition
  - learn from the human auditory system, especially the use of time synchrony and vocal tract normalisation
  - move away from the quasi-stationary assumption and develop effective continuous-state models
- Data-driven language acquisition and learning
  - extend the successful speech recognition and parsing paradigm to semantics, generation and dialogue processing
  - apply results as filters to improve speech and syntactic recognition beyond the current asymptote
  - develop methods for learning from large quantities of unannotated data
- Neural networks for speech and language processing
  - develop kernel-based machine learning techniques such as SVMs to work in the continuous time domain
  - understand and learn from human neural processing
Theme 3: Analysis and modelling of human interaction
- Develop the psychology, linguistics, and neuroscience of interactive language
  - Integrate psycholinguistic models with context to produce situated models
  - Study biological mechanisms for interaction
- Controlled scientific investigation of natural interaction using hybrid methods
  - Integration of eye tracking with neuro-imaging methods
- Computational modelling
  - Tractable computational models of situated interaction, e.g., Joint Action, interactive alignment, obligations, SharedPlans
- Integration across levels
  - in interpretation: integrate planning, discourse obligations, and semantics into language models
  - in production:
    - semantics of intonation
    - speech synthesizers that allow control of intonation and timing
Summary of Benefits
Grand Challenge
To understand and emulate human capability for
robust communication and interaction.
- Greater scientific understanding of human cognition and communication
- Significant advances in noise-robust speech recognition, understanding, and generation technology
- Dialogue systems capable of adapting to their users and learning on-line
- Improved treatment and rehabilitation of disorders in language function; novel language prostheses