Title: Speech, Language and Human-Computer Interaction
William Marslen-Wilson
Steve Young
Johanna Moore, Martin Pickering, Mark Steedman
Contents
- Background and motivation
- State of the Art
  - Speech recognition and understanding
  - Cognitive neuroscience
  - Computational models of interaction
- The Grand Challenge
- Research Themes
Spoken language and human interaction will be an essential feature of truly intelligent systems. For example, Turing made it the basis of his famous test to answer the question "Can machines think?" ("Computing Machinery and Intelligence", Mind, 1950).
Spoken language is the natural mode of
communication and truly ubiquitous computing will
rely on it.
The Vision: Apple's Knowledge Navigator
The Reality: a currently deployed flight enquiry demo
... but we are not quite there yet!
Current situation
[Diagram: the Human Language System is observed, and data collected from it, by two complementary strands. Cognitive Sciences: development of neuro-biologically and psycholinguistically plausible accounts of human language processes (comprehension and production). Computational Language Use: symbolic and statistical models of human language processing (e.g., via parsing, semantics, generation, discourse analysis), feeding into Engineering Systems.]
State of the Art: Speech Recognition
Goal is to convert acoustic signal to words
[Diagram: acoustic observations Y (e.g., the utterance "He bought it") are mapped to a word sequence W.]
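In the standard formulation (not spelled out on the slide, but the usual basis for this approach), the recogniser searches for the word sequence W that is most probable given the acoustics Y; Bayes' rule splits the problem into an acoustic model and a language model:

$$
\hat{W} \;=\; \arg\max_{W} P(W \mid Y) \;=\; \arg\max_{W}\; p(Y \mid W)\, P(W)
$$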
General Approach: Hierarchy of Markov Chains
Model Building
[Diagram: about 100 hours of transcribed speech are used to train the Acoustic Models, and about 500 million words of text are used to train the Language Model (example material shown: "He said that ...", "Speaking from the White House, the president said today that the nation would stand firm against the ....").]
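As a rough illustration of the language-model half of this step (a minimal maximum-likelihood bigram sketch in Python, not the n-gram toolkit that would actually be used at this scale):

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Maximum-likelihood bigram model P(w_i | w_{i-1}) from tokenised
    sentences; an illustrative sketch only (no smoothing, no toolkit)."""
    context_counts, bigram_counts = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        context_counts.update(padded[:-1])
        bigram_counts.update(zip(padded[:-1], padded[1:]))
    return {(w1, w2): c / context_counts[w1]
            for (w1, w2), c in bigram_counts.items()}

# Toy usage; in practice the counts come from hundreds of millions of words.
lm = train_bigram_lm([["he", "said", "that", "it", "works"],
                      ["he", "said", "it"]])
print(lm[("he", "said")])   # P(said | he) = 1.0 in this toy corpus
```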
Recognising
Progress in Automatic Speech Recognition
[Chart: word error rate over time for benchmark tasks ranging from easy to hard.]
Current Research in Acoustic Modelling
Hidden Markov Models: the quasi-stationary assumption (each state treats its stretch of the signal as statistically stationary) is a major weakness.
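For orientation, a minimal sketch of the HMM likelihood computation (the forward recursion), assuming discrete time frames and per-state log emission scores; each state has a single fixed output distribution, which is exactly the quasi-stationary assumption criticised above:

```python
import numpy as np
from scipy.special import logsumexp

def forward_log_likelihood(log_pi, log_A, log_B):
    """Log-likelihood of an observation sequence under a discrete-time HMM.

    log_pi : (S,)   log initial state probabilities
    log_A  : (S, S) log transition probabilities (rows index the source state)
    log_B  : (T, S) log emission score of frame t under state s
             (one fixed output distribution per state: the quasi-stationary assumption)
    """
    alpha = log_pi + log_B[0]                         # initialise with frame 0
    for t in range(1, log_B.shape[0]):                # recurse over frames
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[t]
    return logsumexp(alpha)                           # log p(Y), summed over final states

# Toy usage: 3 states, 5 frames of random emission scores.
S, T = 3, 5
rng = np.random.default_rng(0)
log_pi = np.log(np.full(S, 1.0 / S))
log_A = np.log(np.full((S, S), 1.0 / S))
log_B = rng.normal(size=(T, S))
print(forward_log_likelihood(log_pi, log_A, log_B))
```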
State of the Art: Cognitive Neuroscience of Speech and Language
- Scientific understanding of human speech and language is in a state of rapid transformation and development
- Rooted in cognitive/psycholinguistic accounts of the functional structure of the language system
- Primary drivers now coming from neurobiology and new neuroscience techniques
Neurobiology of homologous brain systems: primate neuroanatomy and neurophysiology (Rauschecker & Tian, PNAS, 2000)
- Provides a potential template for investigating the human system
- Illustrates the level of neural and functional specificity that is achievable
- Points to an explanation in terms of multiple parallel processing pathways, hierarchically organised
Speech and language processing in the human brain
- Requires an interdisciplinary combination of neurobiology, psycho-acoustics, acoustic-phonetics, neuro-imaging, and psycholinguistics
- Starting to deliver results with a high degree of functional and neural specificity
- Ingredients for a future neuroscience of speech and language
Hierarchical organisation of processes in primary auditory cortex (belt, parabelt) (from Patterson, Uppenkamp, Johnsrude & Griffiths, Neuron, 2002)
Hierarchical organisation of processing streams
Activation as a function of intelligibility for sentences heard in different types of noise (Davis & Johnsrude, J. Neurosci., 2003). The colour scale plots intelligibility-responsive regions that were sensitive to the acoustic-phonetic properties of the speech distortion (orange to red), contrasted with regions (green to blue) whose response was independent of lower-level acoustic differences.
- Essential to image brain activity in time as well as in space
- EEG and MEG offer excellent temporal resolution and improving spatial resolution
- This allows dynamic tracking of the spatio-temporal properties of language processing in the brain
- Demonstration (Pulvermüller et al.) using MEG to track cortical activity related to spoken word recognition
[Figure: MEG demonstration of cortical activity related to spoken word recognition (700 ms).]
- Glimpse of future directions in the cognitive neuroscience of language
- Importance of understanding the functional properties of the domain
- Neuroscience methods for revealing the spatio-temporal properties of the underlying systems in the brain
State of the Art: Computational Language Systems
- Modelling interaction requires solutions for:
  - Parsing and interpretation
  - Generation and synthesis
  - Dialogue management
- Integration of component theories and technologies
Parsing and Interpretation
- The goal is to convert a string of words into an interpretable structure
- Marks bought Brooks
  (TOP (S (NP-SBJ Marks)
          (VP (VBD bought)
              (NP Brooks))
          (. .)))
- Translate the treebank into a grammar and statistical model (a minimal sketch follows below)
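As a hedged sketch of the simplest version of that translation (reading off a relative-frequency PCFG from bracketed trees; real lexicalised parsers such as Collins's use much richer statistical models), assuming Penn-Treebank-style bracketings:

```python
from collections import Counter

def parse_bracketed(s):
    """Parse a Penn-Treebank-style bracketing into (label, children) tuples;
    leaves are plain strings. Assumes well-formed input."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()

    def node(i):                                   # tokens[i] must be "("
        label, children, i = tokens[i + 1], [], i + 2
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = node(i)
            else:
                child, i = tokens[i], i + 1
            children.append(child)
        return (label, children), i + 1

    tree, _ = node(0)
    return tree

def count_rules(tree, counts):
    """Count one context-free rule per internal node: label -> child labels/words."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for c in children:
        if isinstance(c, tuple):
            count_rules(c, counts)

# Toy "treebank" of one tree (the example above); a real treebank has ~1M words.
counts = Counter()
count_rules(parse_bracketed(
    "(TOP (S (NP-SBJ Marks) (VP (VBD bought) (NP Brooks)) (. .)))"), counts)

lhs_totals = Counter()
for (lhs, _), c in counts.items():
    lhs_totals[lhs] += c
pcfg = {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}
print(pcfg[("S", ("NP-SBJ", "VP", "."))])          # relative-frequency estimate: 1.0
```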
Parser Performance
- Improvement in performance in recent years over an unlexicalized baseline of 80% in ParsEval (LP and LR defined below):
  - Magerman 1995: 84.3 LP / 84.0 LR
  - Collins 1997: 88.3 LP / 88.1 LR
  - Charniak 2000: 89.5 LP / 89.6 LR
  - Bod 2001: 89.7 LP / 89.7 LR
- Interpretation is beginning to follow (Collins 1999: 90.9% unlabelled dependency recovery)
- However, there are signs of an asymptote
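For reference, LP and LR are the standard PARSEVAL labelled precision and recall; the definitions below are the usual ones rather than anything given on the slide:

$$
\mathrm{LP} = \frac{\#\ \text{correct labelled constituents}}{\#\ \text{constituents proposed by the parser}},
\qquad
\mathrm{LR} = \frac{\#\ \text{correct labelled constituents}}{\#\ \text{constituents in the gold-standard treebank}}
$$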
Generation and Synthesis
- Spoken dialogue systems use:
  - Pre-recorded prompts
    - Natural-sounding speech, but practically limited flexibility
  - Text-to-speech
    - Provides more flexibility, but lacks adequate theories of how timing, intonation, etc. convey discourse information
  - Natural language (text) generation
    - Discourse planners to select content from data and knowledge bases and organise it into semantic representations
    - Broad-coverage grammars to realise semantic representations in language
Spoken Dialogue Systems
- Implemented as hierarchical finite-state machines or VoiceXML (a minimal finite-state sketch follows below)
- Can:
  - Effectively handle simple tasks in real time
    - automated call routing
    - travel and entertainment information and booking
  - Be robust in the face of barge-in
    - e.g., cancel or help
  - Take action to get the dialogue back on track
  - Generate prompts sensitive to the task context
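As a hedged illustration of the finite-state style of dialogue manager mentioned above (the states, prompts and flight-enquiry domain are hypothetical, not a deployed design):

```python
# Minimal finite-state dialogue manager sketch (hypothetical flight-enquiry domain).
# Each state has a prompt and a successor; user input fills the corresponding slot.
STATES = {
    "ask_origin":      ("Where are you flying from?",      "ask_destination"),
    "ask_destination": ("Where are you flying to?",        "ask_date"),
    "ask_date":        ("What day do you want to travel?", "confirm"),
    "confirm":         ("Shall I look up that flight?",    None),
}

def run_dialogue(get_user_input=input):
    state, slots = "ask_origin", {}
    while state is not None:
        prompt, next_state = STATES[state]
        reply = get_user_input(prompt + " ")
        if reply.strip().lower() in {"cancel", "help"}:    # barge-in style commands
            print("Returning to the start of the enquiry.")
            state, slots = "ask_origin", {}
            continue
        slots[state] = reply                               # fill the slot for this state
        state = next_state
    return slots

# run_dialogue() would drive a console version of the enquiry; a VoiceXML or
# hierarchical FSM system elaborates the same idea with speech I/O and sub-dialogues.
```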
Limitations of Current Approaches
- Design and maintenance are labour-intensive, domain-specific, and error-prone
  - Must specify all plausible dialogues and content
  - Mix task knowledge and dialogue knowledge
- Difficult to:
  - Generate responses sensitive to linguistic context
  - Handle user interruptions, user-initiated task switches, or abandonment
  - Provide personalised advice or feedback
  - Build systems for new domains
What Humans Do that Today's Systems Don't
- Use context to interpret and respond to questions
- Ask for clarification
- Relate new information to what's already been said
- Avoid repetition
- Use linguistic and prosodic cues to convey meaning
- Distinguish what's new or interesting
- Signal misunderstanding, lack of agreement, rejection
- Adapt to their conversational partners
- Manage the conversational turn
- Learn from experience
Current directions
- 1M words of labelled data is not nearly enough
- Current emphasis is on lexical smoothing and semi-supervised methods for training parser models
- Separation of dialogue-management knowledge from domain knowledge
- Integration of modern (reactive) planning technology with dialogue managers
- Reinforcement learning of dialogue policies (see the sketch below)
- Anytime algorithms for language generation
- Stochastic generation of responses
- Concept-to-speech synthesis
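As a hedged sketch of what reinforcement learning of dialogue policies means in its simplest tabular form (the states, actions, rewards and user simulator below are toy assumptions, not a real system):

```python
import random
from collections import defaultdict

# Toy dialogue MDP: the state is how many slots have been filled (0-3),
# and the reward arrives only when the dialogue ends.
ACTIONS = ["ask_slot", "confirm", "hang_up"]

def simulate_step(state, action):
    """Very crude user simulator: asking fills a slot; confirming too early fails."""
    if action == "ask_slot" and state < 3:
        return state + 1, 0.0, False                  # next state, reward, done?
    if action == "confirm":
        return state, (1.0 if state == 3 else -1.0), True
    return state, -1.0, True                           # hang_up, or a useless ask

def q_learning(n_episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)
    for _ in range(n_episodes):
        state, done = 0, False
        while not done:
            action = (random.choice(ACTIONS) if random.random() < epsilon
                      else max(ACTIONS, key=lambda a: Q[(state, a)]))
            nxt, reward, done = simulate_step(state, action)
            target = reward + (0.0 if done else gamma * max(Q[(nxt, a)] for a in ACTIONS))
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = nxt
    return Q

policy = q_learning()
print(max(ACTIONS, key=lambda a: policy[(0, a)]))     # learned first action: "ask_slot"
```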
Grand Challenge
To understand and emulate human capability for
robust communication and interaction.
Research Programme
Three inter-related themes:
- Exploration of language function in the human brain
- Computational modelling of human language use
- Analysis and modelling of human interaction
The development of all three themes will aim at a strong integration of neuroscience and computational approaches.
Theme 1: Exploration of language function in the human brain
- Development of an integrated cognitive neuroscience account
  - precise neuro-functional mapping of speech analysis system
  - identification/analysis of different cortical processing streams
  - improved (multi-modal) neuro-imaging methods for capturing spatio-temporal patterning of brain activity supporting language function
  - linkage to theories of language learning/brain plasticity
- Expansion of neurophysiological/cross-species comparisons
  - research into homologous/analogous systems in primates/birds
  - development of cross-species neuro-imaging to allow close integration with human data
- Research across languages and modalities (speech and sign)
  - contrasts in language systems across different language families
  - cognitive and neural implications of spoken vs sign languages
Theme 2: Computational modelling of human language function
- Auditory modelling and human speech recognition
  - learn from the human auditory system, especially the use of time synchrony and vocal tract normalisation
  - move away from the quasi-stationary assumption and develop effective continuous-state models
- Data-driven language acquisition and learning
  - extend the successful speech recognition and parsing paradigm to semantics, generation and dialogue processing
  - apply results as filters to improve speech and syntactic recognition beyond the current asymptote
  - develop methods for learning from large quantities of unannotated data
- Neural networks for speech and language processing
  - develop kernel-based machine learning techniques such as SVMs to work in the continuous time domain
  - understand and learn from human neural processing
Theme 3: Analysis and modelling of human interaction
- Develop the psychology, linguistics, and neuroscience of interactive language
  - Integrate psycholinguistic models with context to produce situated models
  - Study biological mechanisms for interaction
- Controlled scientific investigation of natural interaction using hybrid methods
  - Integration of eye tracking with neuro-imaging methods
- Computational modelling
  - Tractable computational models of situated interaction, e.g., Joint Action, interactive alignment, obligations, SharedPlans
- Integration across levels
  - in interpretation: integrate planning, discourse obligations, and semantics into language models
  - in production:
    - semantics of intonation
    - speech synthesizers that allow control of intonation and timing
Summary of Benefits
Grand Challenge
To understand and emulate human capability for
robust communication and interaction.
- Greater scientific understanding of human cognition and communication
- Significant advances in noise-robust speech recognition, understanding, and generation technology
- Dialogue systems capable of adapting to their users and learning on-line
- Improved treatment and rehabilitation of disorders in language function; novel language prostheses