1

Experiences from large NLP Projects
Jan Alexandersson
German Research Center for Artificial Intelligence GmbH
Stuhlsatzenhausweg 3, Geb. 43.1
66123 Saarbrücken
Tel. (0681) 302-5347
Email: janal_at_dfki.de
www.dfki.de/janal
2
Overview

Introduction
What was VerbMobil
What is SmartKom
Scaling
Experiences from VerbMobil
Conclusion

3
What was...
http://verbmobil.dfki.de
?
4
VerbMobil - What was it?

Speech-to-speech translation system
Robust processing of spontaneous dialogs
Speaker independent (adaptive)
Languages: English, German, Japanese
Domains: appointment scheduling, travel planning and hotel reservation, remote PC maintenance
Summary of the dialogue automatically generated by the system
The system mediates between two humans; it does not play an active role
There is no control of the ongoing dialog by the system

5
The Verbmobil Partners
6
The Verbmobil Partners
7
Facts About the Project

23 participating institutions (in Verbmobil II),
from Germany and the USA
Over 900 full-time employees and students
involved over the whole duration
Funded by the German Ministry for Education and
Science and the participating companies

8
Project Organization
German Federal Ministry for Research and Education
Scientific Management
Group of Module Managers
Scientific Head: W. Wahlster
Deputy Scientific Head: A. Waibel
Module Coordinator: N. Reithinger
Head of Project Management Group: R. Karger
Head of System Integration Group: A. Klüter
Verbmobil Advisory Board
9
Challenges for Language Engineering
10
Classification of Machine Translation Methods
[Diagram: translation pyramid from Source Language to Target Language, levels from top to bottom]
Interlingua
Semantic Transfer: Semantic Analysis, Semantic Structure to Semantic Structure, Semantic Generation
Syntactic Transfer: Syntactic Analysis, Syntactic Structure to Syntactic Structure, Syntactic Generation
Direct Translation: Morphologic Analysis, Word Structure to Word Structure, Morphologic Generation
11
The VerbMobil Case
[Diagram: translation pyramid from Source Language to Target Language, extended at the base by the speech signal]
Interlingua
Semantic Transfer: Semantic Analysis, Semantic Structure to Semantic Structure, Semantic Generation
Syntactic Transfer: Syntactic Analysis, Syntactic Structure to Syntactic Structure, Syntactic Generation
Direct Translation: Morphologic Analysis, Word Structure to Word Structure, Morphologic Generation
Speech level: Prosodic Analysis of the source Speech Signal, Prosodic Annotation for the target Speech Signal
12
The Graphical User Interface
13
Focuses of Speech Recognition in Verbmobil
DaimlerChrysler
University of Karlsruhe
Multilinguality
Robustness
Large Vocabulary
RWTH Aachen
14
General Speech Recognition Task
Audio Signal
Recognizers
Word Hypotheses Graph
interface between acoustic and linguistic
processing
15
Open Microphone Approach
Microphone open
Microphone 1
Synchronization
Speech output
Microphone 1
16
What Linguistic Analysis Really Needs

Syntactic Boundaries
He saw ? the man ? with the telescope
Prosody cannot help
Dialog Act Boundaries
No, I have no time at all on Thursday. D
But how about on Friday?
Dialog acts are pragmatic units that chunk the input into units which can be processed alone.
Prosodic Syntactic Boundaries
Of course ? not ? on Saturday
Syntactic boundaries that correlate to the acoustic-phonetic reality help during analysis within one chunk/dialog act.
Important in spontaneous speech with elliptical utterances.

17
Prosody in Verbmobil
18
Facts about Repairs in the Verbmobil Corpus

21% of all turns in the Verbmobil corpus (79,562 turns) contain at least one self-correction
The syntactic category is preserved in most cases (for example, out of a sample of 266 verb replacements, 224, about 84%, are again mapped to verbs)
Repairs take place in a restricted context (in 98% of cases the reparandum consists of fewer than 5 words)
Repair sequences follow certain regularities

19
The Understanding of Spontaneous Speech Repairs
I need a car next Tuesday oops Monday
Editing Phase
Repair Phase
Original Utterance
Editing Term
Reparans
Reparandum
Recognition of Substitutions
Transformation of the Word Hypotheses Graph
I need a car next Monday
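The substitution shown on this slide can be sketched in a few lines. This is only a minimal illustration, assuming that the reparandum and the reparans are one word each and that editing terms come from a small hypothetical set (EDITING_TERMS); the actual Verbmobil repair module operated on the word hypotheses graph and is not reproduced here.

```python
from typing import List

# Hypothetical inventory of editing terms; the slides do not list the real one.
EDITING_TERMS = {"oops"}

def apply_simple_repair(words: List[str]) -> List[str]:
    """Replace a one-word reparandum by a one-word reparans.

    Pattern handled: ... <reparandum> <editing term> <reparans> ...
    """
    result: List[str] = []
    i = 0
    while i < len(words):
        is_editing_term = words[i].lower() in EDITING_TERMS
        if is_editing_term and result and i + 1 < len(words):
            result[-1] = words[i + 1]  # the reparans overwrites the reparandum
            i += 2                     # skip the editing term and the reparans
        else:
            result.append(words[i])
            i += 1
    return result

repaired = apply_simple_repair("I need a car next Tuesday oops Monday".split())
print(" ".join(repaired))
```

Longer repairs, such as a multi-word reparans, would need the span detection that this sketch deliberately leaves out.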
20
Architecture of Repair Processing
On Thursday I cannot no I can meet äh
after one
21
Multiple Approaches

Mono-cultural approaches are dangerous
humans vs. viruses -> diversity
Microsoft vs. ILOVEYOU and copycats -> alternative software solutions
Some sources of errors in a speech translation system
external
spontaneous speech: not well formed, hesitations, repairs
bad acoustic conditions
human dialog behavior
internal
knowledge gaps in modules
software errors
probabilistic processing
-> Use multiple engines, varying approaches at various stages of processing

22
Multiple Approaches in Verbmobil

Exclusive alternatives: three different 16 kHz German speech recognizers with various capabilities
Competing approaches
three parsers: HPSG, chunk, statistical
five translation tracks: case-based, dialog-act based, statistical, substring-based, linguistic (deep) semantic translation
Needed: selection and combination of results from competing tracks
parsers: combination of partial analyses in the semantic processing modules
translation: pre-selection module
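Selecting among competing translation tracks can be illustrated with a small sketch. It assumes that every track returns a translation together with a confidence value and that these values are directly comparable across tracks, which is a strong simplification; the class TrackResult, the function select_translation and the sample scores are hypothetical and not taken from Verbmobil.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TrackResult:
    track: str          # e.g. "statistical" or "dialog-act based"
    translation: str
    confidence: float   # assumed comparable across tracks (a simplification)

def select_translation(results: List[TrackResult]) -> Optional[TrackResult]:
    """Return the result with the highest confidence, or None if no track answered."""
    if not results:
        return None
    return max(results, key=lambda r: r.confidence)

# Invented sample outputs for one utterance.
candidates = [
    TrackResult("case-based", "How about Friday?", 0.55),
    TrackResult("statistical", "What about on Friday?", 0.80),
    TrackResult("deep", "But how about on Friday?", 0.72),
]
best = select_translation(candidates)
print(best.track, "->", best.translation)
```

A real selection component would likely have to normalize or learn how far each track's confidence can be trusted before comparing them.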

23
Multiple Translation Tracks - Approaches and Advantages

Case-based
Approach: uses examples from the aligned bilingual Verbmobil corpus
Advantage: good translation if input matches example in corpus
Dialog-act based
Approach: extract core intention (dialog act) and content
Advantage: robust with respect to recognition errors
Statistical
Approach: use statistical language and translation models
Advantage: guaranteed translation with high approximate correctness
Substring-based
Approach: combines statistical word alignment with precomputation of translation chunks and contextual clustering
Advantage: guaranteed translation with high approximate correctness
Linguistic (deep) semantic translation
Approach: classic approach using semantic transfer
Advantage: high quality translation in case of success

24
Example Based Translation

Result: Translation and a confidence value
Benefit: Improving Verbmobil's translation capabilities through an additional translation path
Responsible: DFKI, Kaiserslautern

Task: Providing a translation based on translation templates and partial linguistic analysis
Input: WHGs (word hypotheses graphs) or best hypothesis
Method: Definite Clause Grammar (DCG), graph matching algorithms

25
Dialog-Act Based Translation

Result: Translation and a confidence value, additionally content descriptions for the dialog module
Benefit: Robust translation and content extraction even when the recognition is erroneous
Responsible: DFKI, Saarbrücken

Task: Robustly provide a translation of core intentions and contents of the domain
Input: Prosodically annotated best hypothesis (flat WHG)
Method: Statistical dialog-act classifier and Finite State Transducers

26
Statistical Translation

Result: Translation and a confidence value
Benefit: Approximately correct translation for spontaneous speech
Responsible: RWTH Aachen

Task: Provide approximately correct translations
Input: Prosodically annotated best hypothesis (flat WHG)
Method: Use statistical language and translation models

27
Deep Translation

Result: Translation containing content information, suited for high quality speech synthesis
Benefit: Delivers the highest quality, but is sensitive to recognition errors and spontaneous speech phenomena
Responsible: Siemens AG, DFKI Saarbrücken, Universität Tübingen, Universität des Saarlandes, Universität Stuttgart, TU Berlin, CSLI Stanford

Task: Provide high quality translations
Input: Prosodically annotated WHG and contextual information
Method: Use syntactic and semantic approaches to analysis, transfer, and generation

28
Modules Involved

Deep Analysis: HPSG parser
Dialog Semantics: combination of parsing results and semantic resolution
Transfer: VIT-to-VIT transfer (VIT: Verbmobil Interface Term)
Generation: TAG generation from VITs
Dialog Context: provides contextual information

Integrated Processing comprises
search through the WHG
statistical parser
chunk parser
Semantic Construction: provides VITs from statistical and chunk parser output

29
The Multi-Parser Approach

Verbmobil uses three different syntactic parsers: an HPSG parser, a chunk parser, and a probabilistic LR parser.
Each parser offers a different level of parsing accuracy, depth of syntactic analysis, and robustness of the analysis process.
Chunk parser: most robust but least accurate analysis
HPSG parser: most accurate but least robust analysis
Probabilistic parser: level of accuracy and robustness between HPSG and chunk parser

30
Integrated Processing

Gets WHGs for the English, German, or Japanese speech input and dispatches WHG information to the three parsers
Provides an A* search algorithm that allows any connected parser to find the best-scored path using
acoustic score of the speech recognizer
Verbmobil trigram language model
Parsers analyze the same utterance simultaneously
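A best-path search over a word hypotheses graph that combines an acoustic score with a trigram language model score might look roughly as follows. This is a sketch under stated assumptions: scores are treated as additive non-negative costs (for instance negative log probabilities), the heuristic defaults to zero so that A* degenerates to uniform-cost search, and the graph, the function names and the toy language model are invented for illustration.

```python
import heapq
from typing import Callable, Dict, List, Optional, Set, Tuple

Edge = Tuple[int, str, float]  # (target node, word, acoustic cost)

def best_path(
    graph: Dict[int, List[Edge]],
    start: int,
    goal: int,
    lm_cost: Callable[[str, str, str], float],
    heuristic: Callable[[int], float] = lambda node: 0.0,
) -> Optional[Tuple[List[str], float]]:
    """A* over (node, last two words); needs non-negative costs and a consistent heuristic."""
    frontier = [(heuristic(start), 0.0, start, ("<s>", "<s>"), ())]
    settled: Set[Tuple[int, Tuple[str, str]]] = set()
    while frontier:
        _, cost, node, history, words = heapq.heappop(frontier)
        if node == goal:
            return list(words), cost
        state = (node, history)
        if state in settled:
            continue
        settled.add(state)
        for target, word, acoustic in graph.get(node, []):
            new_cost = cost + acoustic + lm_cost(history[0], history[1], word)
            entry = (new_cost + heuristic(target), new_cost, target,
                     (history[1], word), words + (word,))
            heapq.heappush(frontier, entry)
    return None

# Toy stand-in for a trigram model: cheap for known trigrams, expensive otherwise.
GOOD_TRIGRAMS = {("<s>", "<s>", "I"), ("<s>", "I", "need"), ("I", "need", "a")}

def toy_lm_cost(w1: str, w2: str, w3: str) -> float:
    return 0.1 if (w1, w2, w3) in GOOD_TRIGRAMS else 2.0

# Toy graph with competing word hypotheses between the same nodes.
whg = {
    0: [(1, "I", 1.0), (1, "eye", 0.8)],
    1: [(2, "need", 0.5), (2, "knead", 0.4)],
    2: [(3, "a", 0.2)],
}
words, cost = best_path(whg, 0, 3, toy_lm_cost)
print(words, round(cost, 2))
```

In the toy graph the acoustically cheaper hypotheses "eye" and "knead" lose because the language model penalizes them, which is the point of combining both scores.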

31
HPSG Processing

Result: Source language VITs
Benefit: Delivers the highest quality, but is sensitive to recognition errors and spontaneous speech phenomena
Responsible: DFKI Saarbrücken, CSLI Stanford

Task: Thorough syntactic analysis
Input: Word chains from integrated processing
Method: Apply HPSG analysis

32
The Result is a Syntactic Tree
Alright, and that should get us there about nine
in the evening.
33
... but analysis is not always spanning
The train arise at seven thirty. We could take a
cab it to the hotel problem train station.
34
Semantic Construction

Result: VITs
Benefit: Providing results of shallow parsers to the deep analysis track
Responsible: Universität Stuttgart (IMS)

Task: Convert and extend syntax trees to VITs
Input: Syntax tree from statistical and chunk parsers
Method: Compositional construction using semantic lexicon

35
Schematic Processing
Input
Syntactic tree
Lexicon access and interpretation of the grammatical roles
Intermediate representation
Application Tree
Compositional semantic construction
Intermediate representation
VIT
Non compositional semantic construction using
transfer rule engine
Intermediate representation
Resulting VIT
36
Dialog Semantics

Result: VIT ready for transfer
Benefit: Enhances robustness of deep analysis and provides vital information for transfer
Responsible: Universität des Saarlandes, Saarbrücken

Task: Combining results from various parsers, reinterpreting and correcting VITs, and resolving non-local ambiguities
Input: VITs from different parsers
Method: VIT models and rule based approaches

37
Combining Analyses from Various Parsers

Parsers deliver VITs for segments of a turn
May be spanning analyses or just partial fragments
Combination is necessary, both of analyses from one parser and of analyses from various parsers
Combination criteria
HPSG parser is better than statistical parser, which is better than chunk parser
Integrated results are better than fragments
Longer results are better than short ones
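The three combination criteria can be turned into a simple preference ordering. The sketch below is an assumption-laden illustration: the slide does not say how the criteria are weighted against each other, so the priority used here (integrated before length before parser rank) is a guess, and the Analysis class and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Parser preference from the slide: HPSG > statistical > chunk.
PARSER_RANK = {"hpsg": 2, "statistical": 1, "chunk": 0}

@dataclass
class Analysis:
    parser: str       # "hpsg", "statistical" or "chunk"
    start: int        # index of the first covered word
    end: int          # index after the last covered word
    integrated: bool  # True for an integrated result, False for a fragment

def preference(a: Analysis) -> Tuple[bool, int, int]:
    # Assumed priority among the criteria; the slide does not state one.
    return (a.integrated, a.end - a.start, PARSER_RANK[a.parser])

def pick_best(candidates: List[Analysis]) -> Analysis:
    """Return the most preferred analysis under the assumed ordering."""
    return max(candidates, key=preference)

candidates = [
    Analysis("chunk", 0, 8, True),
    Analysis("statistical", 0, 8, True),
    Analysis("hpsg", 0, 3, False),
]
print(pick_best(candidates).parser)
```

With a different priority, for example parser rank first, the HPSG fragment would win instead, so the ordering is a design decision rather than a given.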

38
Semantic Based Transfer

Result: VITs for generation
Benefit: Translate VITs inside the deep translation path
Responsible: Universität Stuttgart (IMS)

Task: Transfer VITs from the source to the target language
Input: VITs
Method: Rule based transfer

39
Context Evaluation

Result: disambiguated transfer requests
Benefit: Higher quality of transfer results
Responsible: Technical University (TU) Berlin

Task: Resolving ambiguities in the dialog context during semantic transfer
Input: Requests from transfer
Method: Using world knowledge and rules

40
Dialog Processing

Result: context information, dialog summaries, and minutes
Benefit: Verbmobil knows what happens throughout the dialog and can present it
Responsible: DFKI, Saarbrücken

Task: Provides dialog context for all tracks and computes main information for dialog summaries
Input: Data from many modules
Method: Frame-like topic structuring and rules

41
Dialog Information in Semantic Transfer
42
The Intentional Structure
[Tree diagram, one level per line]
Dialogue Level: VM_Dialogue
Phase Level: PH_Greet, PH_Nego
Game Level: G_Greet, G_Nego, G_Nego
Move Level: M_Greet, M_Tr_Init, M_Init, M_Resp, M_Greet
DA Level: Greet, Pol_Form, Request, Suggest, Reject, Feedback, Introduce
Speaker: A, A, B, B
43
Collaboration for a New Functionality: Summaries

Provide the users with a summary of the topics that were agreed
Two benefits
have a piece of information to use in calendars etc.
control the translation
Approach: exploit already existing modules for
content extraction
dialog interpretation
planning the summary
generation
transfer

44
Summaries

Dialog module keeps track of the dialog: dialog model, context extraction, translations, dialog history
Three types of documents
Minutes: relevant exchanges
Summary: dialog results
Scripts: complete dialog script

45
Multilingual Summaries

Multilinguality: Integration of transfer module

Context
Syndialog
Dialog
VITs
VITs
VM-PROTO
Transfer (G -> E)
VM-PROTO
GENGER
GENENG
Document structure
German Summary (HTML)
English Summary (HTML)
46
Result Summary
47
Generation

Result: Strings, enriched with concept-to-speech (CTS) information to support synthesis
Benefit: Output from the semantic transfer track
Responsible: DFKI, Saarbrücken

Task: Robustly generate the output of the semantic transfer in German, English, or Japanese
Input: VITs from transfer
Method: Constraint system for micro-planning, TAG grammar (reusing HPSG grammars) for syntactic realization

48
Multiple Translation Tracks: Approx. correct translation
[Bar chart; underlying figures by word accuracy (WA) threshold]
Track                    WA > 50%   WA > 75%   WA > 80%
Case based                   37         44         46
Statistical                  69         79         81
DA based                     40         45         46
Sem. based                   40         47         49
Substring                    65         75         79
Selection (Automatic)        57         66         68
Selection (Learning)         78         83         85
Selection (Manual)           88         95         97
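The figures on this slide allow a quick comparison between the best single track and the selection strategies. The sketch below assumes the numbers pair up with the track names in the order they appear on the slide and that they are percentages of approximately correct translations; both points are a reading of the flattened chart, not something the slide states explicitly.

```python
# Figures per word-accuracy (WA) threshold, paired with labels in slide order (an assumption).
THRESHOLDS = ("WA > 50%", "WA > 75%", "WA > 80%")
TRACKS = {
    "case based": (37, 44, 46),
    "statistical": (69, 79, 81),
    "DA based": (40, 45, 46),
    "Sem. based": (40, 47, 49),
    "Substring": (65, 75, 79),
}
SELECTION = {
    "Selection (Automatic)": (57, 66, 68),
    "Selection (Learning)": (78, 83, 85),
    "Selection (Manual)": (88, 95, 97),
}

def best_single_track(column: int):
    """Return (name, score) of the strongest individual track for one threshold."""
    name = max(TRACKS, key=lambda t: TRACKS[t][column])
    return name, TRACKS[name][column]

for col, label in enumerate(THRESHOLDS):
    name, score = best_single_track(col)
    gain = SELECTION["Selection (Learning)"][col] - score
    print(f"{label}: best single track {name} ({score}), learned selection differs by {gain:+d}")
```

Read this way, learned selection sits above every individual track, while manual selection marks how much a perfect choice among the tracks could still add.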
49
Verbmobil: The Book
There are over 600 refereed papers on the various aspects of and achievements in Verbmobil.
Wolfgang Wahlster (ed.): "Verbmobil: Foundations of Speech-to-Speech Translation". Springer-Verlag, Berlin Heidelberg New York. 679 pages. ISBN 3-540-67783-6
50
What is...
http://smartkom.dfki.de
?
51
Reference Architecture for Multimodal Systems
2 Nov. 2001, Dagstuhl Seminar "Fusion and Coordination in Multimodal Interaction", edited by M. Maybury
[Architecture diagram; recoverable labels listed without their layout]
Components: User(s), User ID, Mode Analysis, Mode Coordination, Mode Design, Language, Graphics, Gesture, Sound, Biometrics, Multimodal Fusion, Interaction Management, Discourse Management, Context Management, Expectation Management, Reference Resolution, Multimodal Reference Resolution, Intention Recognition, Action Planning, User Modeling, Presentation Design, Select Content, Design, Allocate, Coordinate, Layout, Animated Presentation Agent, Application Interface, Initiate, Terminate, Request, Respond, Integrate
Information, Applications, People
Models: Domain Model, Task Model, User Model, Discourse Model, Media Models, Application Models, Context Model
Representation and Inference, States and Histories
52
Situated Delegation-oriented Dialog Paradigm
Collaborative Problem Solving
IT Services
Service 1
Personalized Interaction Agent
User
specifies goal
delegates task
Service 2
cooperate on problems
asks questions
Service 3
presents results
53
The Main Modules on the Control GUI
54
More About the System

Modules realized as independent processes
Not all must be there (critical path: speech or graphic input to speech or graphic output)
(Mostly) independent from display size
Pool Communication Architecture (PCA) based on
PVM for Linux and NT
Modules know only about their I/O pools
Literature
Andreas Klüter, Alassane Ndiaye, Heinz Kirchmann: Verbmobil From a Software Engineering Point of View: System Design and Software Integration. In: Wolfgang Wahlster (ed.): Verbmobil: Foundations of Speech-to-Speech Translation. Springer, 2000.
Data exchanged using M3L documents
All modules and pools are visualized here ...

55
The Real Story
56
The Glue - M3L: XML-based Multimodal Markup Language
Frame Languages: object-oriented modeling primitives
NL/MM-Semantics: more formal semantics, subsumption, inferences
W3C Standards: XML Schema/DTDs
M3L
NL/MM Representation
Domain Knowledge
XML schema
XML schema
XML schema
Pool
Pool
Pool
...
57
Validation of Dialogue Systems

Project ValDia (DFKI and DaimlerChrysler Ulm)
Tool for validation of Dialogue Models/Managers
(DM)

Automatic
Analysis
ASR
Database
DM
Generator
Synthesis
Dialogue model
Manual
58
Validation of DM

Even slight changes can make test suites for DM invalid (but not for parser, recognizer, ...)
Put persons in front of the complete system
We will eventually find errors
It is time consuming
For some scenarios impossible to exhaustively validate a DM
What module failed to perform its task?
Combination of errors?
The whole system has to be put together

59
Validation of DM

ValDia approach: Replace test person and I/O modules with ValDia

Database
DM
Dialogue model
60
Experiences

ValDia detects errors
Logical
Combination of greet and request leads to goal conflict in DM: DM hangs!
Technical
After about 500 dialogues DM crashed due to erroneous memory handling
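The idea of replacing the test person with an automatic driver can be illustrated with a toy example. This is not ValDia's actual method, which the slides do not describe in detail; the dialogue model, the user acts and all function names below are hypothetical, and the sketch simply feeds every act sequence up to a bounded length into a small state machine and reports the sequences on which it gets stuck.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

USER_ACTS = ("greet", "request", "confirm")

# Toy dialogue model: (state, user act) -> next state. Entirely hypothetical.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("start", "greet"): "greeted",
    ("start", "request"): "task",
    ("greeted", "request"): "task",
    ("task", "confirm"): "done",
}

def run_dialogue(acts: Sequence[str]) -> Optional[str]:
    """Return the final state, or None if the model has no transition (it 'hangs')."""
    state: Optional[str] = "start"
    for act in acts:
        state = TRANSITIONS.get((state, act))
        if state is None:
            return None
    return state

def find_failures(max_len: int) -> List[Tuple[str, ...]]:
    """Try all act sequences up to max_len and collect those that hang."""
    failures = []
    for n in range(1, max_len + 1):
        for acts in product(USER_ACTS, repeat=n):
            if run_dialogue(acts) is None:
                failures.append(acts)
    return failures

print(len(find_failures(2)), "failing sequences of length <= 2")
```

The number of sequences grows exponentially with the length bound, which matches the remark that exhaustive validation is impossible for some scenarios.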

61
What is
Scalability
?
62
What is Scale (-able)?

WordNet (1.6)
Noun scaling has 3 senses
(grading) the act of arranging in a graduated
series
act of measuring, arranging or adjusting
according to a scale
ascent by or as if by a ladder
Verb scale has 8 senses
measure by or as if by a scale "This bike scales only 25 pounds"
pattern, make, ... or estimate according to some
rate or standard
take by attacking with scaling ladders
(surmount) -- reach the highest point of
climb up by means of a ladder
scale, descale -- remove the scales from "scale
fish"
measure with or as if with scales "scale the
gold"
size or measure according to a scale

63
Scaling what/how?
Cheaper
Better
More robust
Multilinguality
Depth
Faster
Bigger
Precision
Coverage
64
Coverage
SIZE
Speed
Robustness
Depth
65
Who are we scaling for?

EU
NSF
BMBF
Industry
...

Basic research -> Research Prototypes
Applied research / Product development
Real Systems
66
Experiences: VerbMobil

Many people have said
"With 15-20 persons on one spot I would make a VerbMobil of my own. But muuuuuch better/cheaper/..."
This is not true!
Software engineering
Example: Speech recognition
-93
Single word recognition
Push-to-talk
-00
Open microphone
Spontaneous Speech

67
The VerbMobil Corpus

3,200 dialogs (G 1,454, E 726, J 1,020)
1,658 speakers (G 1,022, E 202, J 434)
79,562 turns (G 41,512, E 16,104, J 21,946)
1,520,000 running words (G 670,000, E 270,000,
J 580,000)
181.6 hours were recorded (G 96.1, E 37.9, J 47.7)
were recorded using
a close microphone,
a room microphone and
a telephone
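The per-language corpus figures can be cross-checked against the totals with a few lines of arithmetic. The sketch assumes that G, E and J stand for German, English and Japanese, the three languages named earlier in the talk; the dictionary layout and function name are illustrative only.

```python
# Per-language figures (assumed: G = German, E = English, J = Japanese) as given on the slide.
CORPUS = {
    "dialogs":  {"total": 3200,  "G": 1454,  "E": 726,   "J": 1020},
    "speakers": {"total": 1658,  "G": 1022,  "E": 202,   "J": 434},
    "turns":    {"total": 79562, "G": 41512, "E": 16104, "J": 21946},
}

def parts_match_total(row: dict) -> bool:
    """Check that the three language counts add up to the stated total."""
    return row["G"] + row["E"] + row["J"] == row["total"]

for name, row in CORPUS.items():
    print(name, "consistent:", parts_match_total(row))

turns_per_dialog = CORPUS["turns"]["total"] / CORPUS["dialogs"]["total"]
print(f"average turns per dialog: {turns_per_dialog:.1f}")
```

The recorded hours are left out of the check because the stated per-language values are rounded and add up to slightly more than the stated total.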

68
The VerbMobil Corpus

transcribed and distributed on
56 CDs (21.5 GB)
Analyzing the corpus
206,000 instances of articulatory background
noise,
85,000 instances of breathing and
35,000 hesitations (voiced 19,000, nasal 2,500, vocalic-nasalized 13,500)
The Verbmobil data are distributed to research
or commercial users via the Bavarian Archive of
Speech Signals (BAS)

69
Experiences from WOZ
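GER142: danach könnten wir gemeinsam Abendessen gehen [afterwards we could go for dinner together]
SIM143: Bitte wiederholen Sie Ihre Äußerung. Es ist ein Fehler in der semantischen Verarbeitung aufgetreten [Please repeat your utterance. An error occurred in semantic processing]
GER144: oh, danach könnten wir gemeinsam abendessen [oh, afterwards we could have dinner together]
SIM145: Bitte wiederholen Sie Ihre Äußerung mit anderen Wörtern. Die semantische Verarbeitung war nicht erfolgreich [Please repeat your utterance using other words. Semantic processing was not successful]
GER146: äh, okay [uh, okay]
ENG147: maybe a bit louder?
GER148: yes, I invite you for the dinner.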
70
Development HPSG

Starting point: HPSG for written G/E
Goal: 10,000 lexical entries for spontaneous spoken G/E
Schema: 20-40

[Chart: vertical axis 0 to 12,000; horizontal axis -93, -96 (V1.0), -98, -00]
71
Development HPSG

What factors contributed to progress?
Getting to know the challenge
Spontaneous/spoken vs. written language
Finding a Suitable Formalism
Tools
Interface
Verbmobil Interface Term (VIT)
Compilation Techniques
Test Suites
Corpora

72
Well Defined Interfaces

Speech Recognition to Linguistic Modules
Word Hypothesis Graph (WHG)
Between (deep) Linguistic Modules
VerbMobil Interface Term (VIT)
Linguistic Modules to Synthesizer
Annotated String (Concept-to-Speech)

73
Verbmobil From a Software Engineering Point of
View

System Design and Software Integration

74
Software Technology Challenges

The goal
Build an integrated system
The situation
Researchers do research
Using different programming languages
Researchers don't want to be bothered with technical details
The solution
Introducing the System Group
Maximal technical support for the
researchers/developers

75
The System Architecture
Verbmobil I: Multi-Agent Architecture
[Diagram: modules M1 to M6 connected directly to each other]
Modules know all communication partners
Direct communication between modules
Reconfiguration difficult
Software: ICE and ICE Master
Basic Platform: PVM
Verbmobil II: Multi-Blackboard Architecture
[Diagram: modules M1 to M6 connected through blackboards BB 1, BB 2, BB 3]
Modules know their I/O data pools
No direct communication between modules: 198 blackboards vs. 2380 direct comm. paths
Reconfiguration easy
Several instances of one module/functionality
Software: PCA and Module Manager
Basic Platform: PVM
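The difference between direct module-to-module communication and communication through data pools can be sketched with a minimal publish/subscribe broker. This is a toy stand-in, not the PCA or PVM API: the class PoolBroker, the pool names "whg" and "vit" and the two pretend modules are all invented for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

class PoolBroker:
    """Toy stand-in for a pool (blackboard) architecture; not the PCA API."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, pool: str, handler: Callable[[str], None]) -> None:
        self._subscribers[pool].append(handler)

    def publish(self, pool: str, data: str) -> None:
        # The writer only names the pool; it does not know who reads from it.
        for handler in list(self._subscribers[pool]):
            handler(data)

broker = PoolBroker()
log: List[str] = []

# Hypothetical modules: a parser reads pool "whg" and writes pool "vit";
# a logger stands in for whatever module consumes "vit".
broker.subscribe("whg", lambda data: broker.publish("vit", f"parsed({data})"))
broker.subscribe("vit", log.append)

# A pretend recognizer writes into its output pool without knowing the readers.
broker.publish("whg", "word graph")
print(log)
```

Because modules are wired to pools rather than to each other, a second parser could subscribe to "whg" without any change to the writer, which is the reconfiguration benefit the slide points to.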
76
Sample Pool Structure
77
Distributed Execution Supports Distributed
Development
server 2
controlling terminal
server 1
Pool Communication Architecture
User 1
User 2
78
Support from the System Group (1)

Integration framework (Testbed) with
common communication mechanism for all used programming languages (C, C++, Lisp, Prolog, Java, Fortran, Tcl/Tk)
Narrow interface for all used programming
languages
Overall system control infrastructure
Standards on various levels
Installation
Compilation
Communication formats between modules
...
Toolbox for recording, replaying, testing,
inspecting data exchanged between modules, ...

79
The Testbed is the Integration Framework for the
Verbmobil System
80
The GUI: Visualization and Debug Tool
.... and much more
81
Support from the System Group (2): Regular Integration Cycles
Assure high system stability and robustness in
connection with large-scale testing
audio modules, testbed
82
Support from the System Group (3): The FTP Server