Title: PowerPoint-Pr
1 Experiences from large NLP Projects
Jan Alexandersson
German Research Center for Artificial Intelligence GmbH
Stuhlsatzenhausweg 3, Geb. 43.1
66123 Saarbrücken
Tel. (0681) 302-5347
Email janal_at_dfki.de
www.dfki.de/janal
2 Overview
- Introduction
- What was VerbMobil
- What is SmartKom
- Scaling
- Experiences from VerbMobil
- Conclusion
3 What was VerbMobil?
http://verbmobil.dfki.de
4 VerbMobil - What was it?
- Speech-to-speech translation system
- Robust processing of spontaneous dialogs
- Speaker independent (adaptive)
- Languages: English, German, Japanese
- Domains: appointment scheduling, travel planning and hotel reservation, remote PC maintenance
- Summary of the dialogue automatically generated by the system
- The system mediates between two humans; it does not play an active role
- There is no control of the ongoing dialog by the system
5 The Verbmobil Partners
6 The Verbmobil Partners
7 Facts About the Project
- 23 participating institutions (in Verbmobil II), from Germany and the USA
- Over 900 full-time employees and students involved over the whole duration
- Funded by the German Ministry for Education and Science and the participating companies
8 Project Organization
German Federal Ministry for Research and Education
Scientific Management
Group of Module Managers
Scientific Head: W. Wahlster
Deputy Scientific Head: A. Waibel
Module Coordinator: N. Reithinger
Head of Project Management Group: R. Karger
Head of System Integration Group: A. Klüter
Verbmobil Advisory Board
9 Challenges for Language Engineering
10 Classification of Machine Translation Methods
[Figure: translation pyramid from source language to target language. From the shallowest to the deepest level: direct translation between word structures (morphologic analysis and generation), syntactic transfer between syntactic structures (syntactic analysis and generation), semantic transfer between semantic structures (semantic analysis and generation), and interlingua at the top.]
11 The VerbMobil Case
[Figure: the same translation pyramid (direct translation, syntactic transfer, semantic transfer, interlingua), extended at the bottom by the speech signal on both the source language and the target language side, with prosodic analysis and prosodic annotation.]
12 The Graphical User Interface
13 Focuses of Speech Recognition in Verbmobil
[Figure: the speech recognition partners (DaimlerChrysler, University of Karlsruhe, RWTH Aachen) and the focuses multilinguality, robustness and large vocabulary.]
14 General Speech Recognition Task
- Audio signal -> recognizers -> Word Hypotheses Graph (WHG)
- The WHG is the interface between acoustic and linguistic processing
15 Open Microphone Approach
[Figure: timeline of the open microphone approach with the labels microphone open, microphone 1, synchronization and speech output.]
16 What Linguistic Analysis Really Needs
- Syntactic Boundaries
- He saw ? the man ? with the telescope
- Prosody cannot help
- Dialog Act Boundaries
- No, I have no time at all on Thursday. D But how about on Friday?
- Dialog acts are pragmatic units that chunk the input into units which can be processed alone.
- Prosodic Syntactic Boundaries
- Of course ? not ? on Saturday
- Syntactic boundaries that correlate to the acoustic-phonetic reality help during analysis within one chunk/dialog act.
- Important in spontaneous speech with elliptical utterances.
17 Prosody in Verbmobil
18 Facts about Repairs in the Verbmobil Corpus
- 21% of all turns in the Verbmobil corpus (79,562 turns) contain at least one self-correction
- The syntactic category is preserved in most cases (for example, out of a sample of 266 verb replacements, 224 are again mapped to verbs)
- Repairs take place in a restricted context (in 98% of cases the reparandum consists of fewer than 5 words)
- Repair sequences follow certain regularities
19 The Understanding of Spontaneous Speech Repairs
Original utterance: I need a car next Tuesday oops Monday
- Reparandum: Tuesday
- Editing term (editing phase): oops
- Reparans (repair phase): Monday
- Processing: recognition of substitutions, then transformation of the Word Hypotheses Graph
Result: I need a car next Monday
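The repair example above can be illustrated with a small sketch. It is not the Verbmobil repair module, which works on the Word Hypotheses Graph; it only shows the substitution idea on a flat word list. The editing-term list, the `Repair` class and both function names are hypothetical, and the sketch assumes a one-word reparandum and a one-word reparans.

```python
from dataclasses import dataclass

# Hypothetical list of editing terms; the slide only shows "oops".
EDITING_TERMS = {"oops", "äh"}


@dataclass
class Repair:
    reparandum: tuple[int, int]    # token span [start, end) that gets replaced
    editing_term: tuple[int, int]  # token span of the editing term
    reparans: tuple[int, int]      # token span of the replacement


def find_single_word_repair(tokens):
    """Find the first 'X <editing term> Y' pattern.

    Assumption: reparandum and reparans are exactly one word each.
    """
    for i in range(1, len(tokens) - 1):
        if tokens[i].lower() in EDITING_TERMS:
            return Repair((i - 1, i), (i, i + 1), (i + 1, i + 2))
    return None


def apply_repair(tokens, repair):
    """Drop reparandum and editing term, keep the reparans in their place."""
    start, _ = repair.reparandum
    rep_start, rep_end = repair.reparans
    return tokens[:start] + tokens[rep_start:rep_end] + tokens[rep_end:]


utterance = "I need a car next Tuesday oops Monday".split()
repair = find_single_word_repair(utterance)
repaired = apply_repair(utterance, repair)
print(" ".join(repaired))  # I need a car next Monday
```

A real system would also have to check that reparandum and reparans are compatible, for instance that the syntactic category is preserved, before accepting the substitution.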
20 Architecture of Repair Processing
On Thursday I cannot no I can meet äh
after one
21 Multiple Approaches
- Mono-cultural approaches are dangerous
- humans vs. viruses -> diversity
- Microsoft vs. ILOVEYOU and copycats -> alternative software solutions
- Some sources of errors in a speech translation system
- external
- spontaneous speech: not well formed, hesitations, repairs
- bad acoustic conditions
- human dialog behavior
- internal
- knowledge gaps in modules
- software errors
- probabilistic processing
- -> Use multiple engines, varying approaches on various stages of processing
22 Multiple Approaches in Verbmobil
- Exclusive alternatives: three different 16 kHz German speech recognizers with various capabilities
- Competing approaches
- three parsers: HPSG, chunk, statistical
- five translation tracks: case-based, dialog-act based, statistical, substring-based, linguistic (deep) semantic translation
- Needed: selection and combination of results from competing tracks
- parsers: combination of partial analyses in the semantic processing modules
- translation: pre-selection module
23 Multiple Translation Tracks - Approaches and Advantages
- Case-based
- Approach: uses examples from the aligned bilingual Verbmobil corpus
- Advantage: good translation if input matches example in corpus
- Dialog-act based
- Approach: extract core intention (dialog act) and content
- Advantage: robust with respect to recognition errors
- Statistical
- Approach: use statistical language and translation models
- Advantage: guaranteed translation with high approximate correctness
- Substring-based
- Approach: combines statistical word alignment with precomputation of translation chunks and contextual clustering
- Advantage: guaranteed translation with high approximate correctness
- Linguistic (deep) semantic translation
- Approach: classic approach using semantic transfer
- Advantage: high quality translation in case of success
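Since the competing tracks deliver a translation together with a confidence value, the simplest conceivable selection step is to pick the most confident result. The sketch below shows only that baseline; it assumes that confidence values are comparable across tracks, which is not a given in practice, and it is not the Verbmobil selection module. The class name, the function name and the example outputs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrackResult:
    track: str
    translation: str
    confidence: float  # assumption: normalized to [0, 1] and comparable across tracks


def select_translation(results):
    """Return the result with the highest confidence value."""
    if not results:
        raise ValueError("no track delivered a translation")
    return max(results, key=lambda r: r.confidence)


# Illustrative, made-up outputs of three tracks for one utterance.
results = [
    TrackResult("statistical", "how about on Friday", 0.71),
    TrackResult("dialog-act based", "what about Friday", 0.64),
    TrackResult("deep", "but how about on Friday?", 0.83),
]
best = select_translation(results)
print(best.track, "->", best.translation)
```

If the confidence values are not calibrated against each other, a learned mapping per track would be needed before such a comparison makes sense.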
24 Example Based Translation
- Result: translation and a confidence value
- Benefit: improving Verbmobil's translation capabilities through an additional translation path
- Responsible: DFKI, Kaiserslautern
- Task: providing a translation based on translation templates and partial linguistic analysis
- Input: WHGs or best hypothesis
- Method: Definite Clause Grammar (DCG), graph matching algorithms
25 Dialog-Act Based Translation
- Result: translation and a confidence value, additionally content descriptions for the dialog module
- Benefit: robust translation and content extraction even when the recognition is erroneous
- Responsible: DFKI, Saarbrücken
- Task: robustly provide a translation of core intentions and contents of the domain
- Input: prosodically annotated best hypothesis (flat WHG)
- Method: statistical dialog-act classifier and finite state transducers
26 Statistical Translation
- Result: translation and a confidence value
- Benefit: approximately correct translation for spontaneous speech
- Responsible: RWTH Aachen
- Task: provide approximately correct translations
- Input: prosodically annotated best hypothesis (flat WHG)
- Method: use statistical language and translation models
27 Deep Translation
- Result: translation containing content information, suited for high quality speech synthesis
- Benefit: delivers the highest quality, but is sensitive to recognition errors and spontaneous speech phenomena
- Responsible: Siemens AG, DFKI Saarbrücken, Universität Tübingen, Universität des Saarlandes, Universität Stuttgart, TU Berlin, CSLI Stanford
- Task: provide high quality translations
- Input: prosodically annotated WHG and contextual information
- Method: use syntactic and semantic approaches to analysis, transfer, and generation
28 Modules Involved
- Deep Analysis: HPSG parser
- Dialog Semantics: combination of parsing results, and semantic resolution
- Transfer: VIT to VIT transfer
- Generation: TAG generation from VITs
- DialogContext: provides contextual information
- Integrated processing comprises
- search through the WHG
- statistical parser
- chunk parser
- Semantic Construction: provides VITs from statistical and chunk parser output
29 The Multi-Parser Approach
- Verbmobil uses three different syntactic parsers: an HPSG parser, a chunk parser, and a probabilistic LR parser.
- Every parser implements a different level of parsing accuracy, depth of syntactic analysis, and robustness of the analyzing process.
- Chunk parser: most robust but least accurate analysis
- HPSG parser: most accurate but least robust analysis
- Probabilistic parser: level of accuracy and robustness between HPSG and chunk parser
30 Integrated Processing
- Gets WHGs for the English, German, or Japanese speech input and dispatches WHG information to the three parsers
- Provides an A* search algorithm that allows any connected parser to find the best scored path using
- acoustic score of the speech recognizer
- Verbmobil trigram language model
- Parsers analyze the same utterance simultaneously
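The best-path search over a word hypotheses graph can be sketched as follows. This is a toy illustration, not the Verbmobil implementation: the graph, the trigram table, the back-off cost and all function names are invented for the example. Scores are treated as costs (negative log probabilities, lower is better), and the A* heuristic is the cheapest remaining acoustic cost, which never overestimates as long as language model costs are non-negative.

```python
import heapq
from collections import defaultdict

# Toy WHG: (start_node, end_node, word, acoustic_cost). Nodes are numbered in time order.
EDGES = [
    (0, 1, "I", 1.0), (0, 1, "eye", 1.2),
    (1, 2, "need", 1.0), (1, 2, "knee", 0.9),
    (2, 3, "a", 0.5),
    (3, 4, "car", 1.0), (3, 4, "cart", 1.1),
]
START, END = 0, 4

# Hypothetical trigram costs; unseen trigrams get a flat back-off cost.
TRIGRAM_COST = {
    ("<s>", "<s>", "I"): 0.2, ("<s>", "I", "need"): 0.3,
    ("I", "need", "a"): 0.2, ("need", "a", "car"): 0.3,
}
DEFAULT_LM_COST = 2.0


def lm_cost(w1, w2, w3):
    return TRIGRAM_COST.get((w1, w2, w3), DEFAULT_LM_COST)


def remaining_acoustic_cost(edges, end):
    """Cheapest acoustic cost from every node to the end node (the heuristic)."""
    best = defaultdict(lambda: float("inf"))
    best[end] = 0.0
    # Edges run forward in time, so later start nodes are final before earlier ones.
    for start, stop, _, cost in sorted(edges, key=lambda e: -e[0]):
        best[start] = min(best[start], cost + best[stop])
    return best


def best_path(edges, start, end):
    """A* search; a state is a node plus the last two words (trigram history)."""
    outgoing = defaultdict(list)
    for s, e, word, cost in edges:
        outgoing[s].append((e, word, cost))
    h = remaining_acoustic_cost(edges, end)
    queue = [(h[start], 0.0, start, ("<s>", "<s>"), ())]
    closed = set()
    while queue:
        _, cost, node, history, words = heapq.heappop(queue)
        if node == end:
            return list(words), cost
        if (node, history) in closed:
            continue
        closed.add((node, history))
        for nxt, word, acoustic in outgoing[node]:
            new_cost = cost + acoustic + lm_cost(history[0], history[1], word)
            heapq.heappush(
                queue,
                (new_cost + h[nxt], new_cost, nxt, (history[1], word), words + (word,)),
            )
    return None, float("inf")


words, total_cost = best_path(EDGES, START, END)
print(" ".join(words))
```

On this toy graph the language model outweighs the slightly cheaper acoustic hypothesis "knee", so the search returns "I need a car".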
31 HPSG Processing
- Result: source language VITs
- Benefit: delivers the highest quality, but is sensitive to recognition errors and spontaneous speech phenomena
- Responsible: DFKI Saarbrücken, CSLI Stanford
- Task: thorough syntactic analysis
- Input: word chains from integrated processing
- Method: apply HPSG analysis
32 The Result is a Syntactic Tree
Alright, and that should get us there about nine
in the evening.
33 ... but analysis is not always spanning
The train arise at seven thirty. We could take a
cab it to the hotel problem train station.
34 Semantic Construction
- Result: VITs
- Benefit: providing results of the shallow parsers to the deep analysis track
- Responsible: Universität Stuttgart (IMS)
- Task: convert and extend syntax trees to VITs
- Input: syntax tree from statistical and chunk parsers
- Method: compositional construction using a semantic lexicon
35 Schematic Processing
- Input: syntactic tree
- Step: lexicon access and interpretation of the grammatical roles
- Intermediate representation: application tree
- Step: compositional semantic construction
- Intermediate representation: VIT
- Step: non-compositional semantic construction using the transfer rule engine
- Output: resulting VIT
36 Dialog Semantics
- Result: VIT ready for transfer
- Benefit: enhances robustness of deep analysis and provides vital information for transfer
- Responsible: Universität des Saarlandes, Saarbrücken
- Task: combining results from various parsers, reinterpret and correct VITs, and resolve non-local ambiguities
- Input: VITs from different parsers
- Method: VIT models and rule based approaches
37 Combining Analyses from Various Parsers
- Parsers deliver VITs for segments of a turn
- May be spanning analyses or just partial fragments
- Combination necessary, both of analyses from one parser and of analyses from various parsers
- Combination criteria
- HPSG is better than the statistical parser, which is better than the chunk parser
- Integrated results are better than fragments
- Longer results are better than short ones
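The combination criteria above can be turned into a preference order and a greedy cover of the turn. The slide does not say how the three criteria are weighted against each other; the sketch below simply applies them in the order listed, which is an assumption, and all class and function names are hypothetical.

```python
from dataclasses import dataclass

# Parser preference from the slide: HPSG before statistical before chunk.
PARSER_RANK = {"hpsg": 0, "statistical": 1, "chunk": 2}


@dataclass(frozen=True)
class Analysis:
    parser: str
    start: int       # first word position covered
    end: int         # one past the last word position covered
    spanning: bool   # True for an integrated result, False for a fragment


def preference_key(a):
    # Assumed priority: parser rank, then integrated before fragment, then longer first.
    return (PARSER_RANK[a.parser], not a.spanning, -(a.end - a.start))


def overlaps(a, b):
    return a.start < b.end and b.start < a.end


def combine(analyses):
    """Greedily pick preferred analyses that do not overlap already chosen ones."""
    chosen = []
    for candidate in sorted(analyses, key=preference_key):
        if not any(overlaps(candidate, c) for c in chosen):
            chosen.append(candidate)
    return sorted(chosen, key=lambda a: a.start)


analyses = [
    Analysis("chunk", 0, 6, spanning=True),
    Analysis("statistical", 3, 6, spanning=False),
    Analysis("hpsg", 0, 3, spanning=False),
]
combined = combine(analyses)
for a in combined:
    print(a.parser, a.start, a.end)
```

With this ordering an HPSG fragment wins over a spanning chunk analysis; swapping the first two elements of the key would prefer integrated results instead.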
38 Semantic Based Transfer
- Result: VITs for generation
- Benefit: translate VITs inside the deep translation path
- Responsible: Universität Stuttgart (IMS)
- Task: transfer VITs from the source to the target language
- Input: VITs
- Method: rule based transfer
39 Context Evaluation
- Result: disambiguated transfer requests
- Benefit: higher quality of transfer results
- Responsible: Technical University (TU) Berlin
- Task: resolving ambiguities in the dialog context during semantic transfer
- Input: requests from transfer
- Method: using world knowledge and rules
40 Dialog Processing
- Result: context information and dialog summaries and minutes
- Benefit: Verbmobil knows what happens throughout the dialog and can present it
- Responsible: DFKI, Saarbrücken
- Task: provides dialog context for all tracks and computes main information for dialog summaries
- Input: data from many modules
- Method: frame-like topic structuring and rules
41 Dialog Information in Semantic Transfer
42 The Intentional Structure
[Figure: tree of the intentional structure of a dialogue between speakers A and B, with the following levels.]
- Dialogue level: VM_Dialogue
- Phase level: PH_Greet, PH_Nego
- Game level: G_Greet, G_Nego, G_Nego
- Move level: M_Greet, M_Tr_Init, M_Init, M_Resp, M_Greet
- DA level: Greet, Pol_Form, Request, Suggest, Reject, Feedback, Introduce
43 Collaboration for a New Functionality: Summaries
- Provide the users with a summary of the topics that were agreed
- Two benefits
- have a piece of information to use in calendars etc.
- control the translation
- Approach: exploit already existing modules for
- content extraction
- dialog interpretation
- planning the summary
- generation
- transfer
44 Summaries
- Dialog module keeps track of the dialog: dialog model, context extraction, translations, dialog history
- Three types of documents
- Minutes: relevant exchanges
- Summary: dialog results
- Scripts: complete dialog script
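45 Multilingual Summaries
- Multilinguality: integration of transfer module
[Figure: pipeline for multilingual summaries. Components shown: Context, Syndialog and Dialog (exchanging VITs), VM-PROTO, Transfer between German and English, the generators GENGER and GENENG, and the document structure. Outputs: German summary (HTML) and English summary (HTML).]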
46 Result Summary
47 Generation
- Result: strings, enriched with concept-to-speech (CTS) information to support synthesis
- Benefit: output from the semantic transfer track
- Responsible: DFKI, Saarbrücken
- Task: robustly generate the output of the semantic transfer in German, English, or Japanese
- Input: VITs from transfer
- Method: constraint system for micro-planning, TAG grammar (reusing HPSG grammars) for syntactic realization
48 Multiple Translation Tracks: Approx. correct translation
[Figure: bar chart of approximately correct translations per track, grouped by word accuracy (WA) threshold; the values are tabulated below.]

Track                  WA > 50   WA > 75   WA > 80
case based                  37        44        46
statistical                 69        79        81
DA based                    40        45        46
Sem. based                  40        47        49
Substring                   65        75        79
Selection (Automatic)       57        66        68
Selection (Learning)        78        83        85
Selection (Manual)          88        95        97
49 Verbmobil: The Book
There are over 600 refereed papers on the various aspects of and achievements in Verbmobil.
Wolfgang Wahlster (ed.): "Verbmobil: Foundations of Speech-to-Speech Translation". Springer-Verlag Berlin Heidelberg New York. 679 pages. ISBN 3-540-67783-6
50 What is SmartKom?
http://smartkom.dfki.de
51 Reference Architecture for Multimodal Systems
2 Nov. 2001, Dagstuhl Seminar "Fusion and Coordination in Multimodal Interaction", edited by M. Maybury
[Figure: reference architecture connecting the user(s) with information, applications and people. Components named in the figure:]
- Modes on the input and output side: language, graphics, gesture, sound, biometrics
- Mode analysis, mode coordination, multimodal fusion, reference resolution and multimodal reference resolution
- Interaction management: discourse management, context management, expectation management, intention recognition, action planning, user modeling, user ID
- Presentation design and mode design: select content, design, allocate, coordinate, layout, animated presentation agent
- Application interface: initiate, terminate, request, respond, integrate
- Models: domain model, task model, user model, discourse model, media models, application models, context model
- Representation and inference, states and histories
52 Situated Delegation-oriented Dialog Paradigm
[Figure: collaborative problem solving between a user, a personalized interaction agent and IT services (Service 1, Service 2, Service 3). Labels on the interaction: specifies goal, delegates task, cooperate on problems, asks questions, presents results.]
53 The Main Modules on the Control GUI
54 More About the System
- Modules realized as independent processes
- Not all must be there (critical path: speech or graphic input to speech or graphic output)
- (Mostly) independent from display size
- Pool Communication Architecture (PCA) based on PVM for Linux and NT
- Modules know only about their I/O pools
- Literature
- Andreas Klüter, Alassane Ndiaye, Heinz Kirchmann: Verbmobil From a Software Engineering Point of View: System Design and Software Integration. In: Wolfgang Wahlster (ed.): Verbmobil: Foundations of Speech-to-Speech Translation. Springer, 2000.
- Data exchanged using M3L documents
- All modules and pools are visualized here ...
55 The Real Story
56 The Glue - M3L: XML-based Multimodal Markup Language
[Figure: M3L draws on frame languages (object-oriented modeling primitives), NL/MM semantics (more formal semantics: subsumption, inferences) and W3C standards (XML Schema/DTDs). M3L covers NL/MM representation and domain knowledge; each pool is associated with an XML schema.]
57 Validation of Dialogue Systems
- Project ValDia (DFKI, DaimlerChrysler Ulm)
- Tool for validation of Dialogue Models/Managers (DM)
[Figure: dialogue system pipeline with ASR, Analysis, DM (with dialogue model and database), Generator and Synthesis; validation is labelled as manual or automatic.]
58 Validation of DM
- Even slight changes can make test suites for DM invalid (but not for parser, recognizer, ...)
- Put persons in front of the complete system
- We will eventually find errors
- It is time consuming
- For some scenarios impossible to exhaustively validate a DM
- What module failed to perform its task?
- Combination of errors?
- the whole system has to be put together
59 Validation of DM
- ValDia approach: replace test person and I/O modules with ValDia
[Figure: ValDia connected directly to the DM, its dialogue model and the database.]
60 Experiences
- ValDia detects errors
- Logical
- Combination of greet and request leads to a goal conflict in the DM: the DM hangs!
- Technical
- After about 500 dialogues the DM crashed due to erroneous memory handling
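The idea of replacing the test person and the I/O modules by an automatic driver can be sketched with a toy dialogue manager. Nothing here reflects the actual ValDia tool or its interfaces: the `ToyDialogueManager`, its seeded bug, the act inventory and the `validate` function are all hypothetical and only mimic the kind of logical error reported above.

```python
import itertools


class GoalConflict(Exception):
    """Raised by the toy DM when it cannot reconcile its goals."""


class ToyDialogueManager:
    """Hypothetical DM with a seeded bug, standing in for the component under test."""

    def __init__(self):
        self.goals = []

    def process_turn(self, acts):
        # Seeded bug: a greeting combined with a request in one turn is not handled.
        if "greet" in acts and "request" in acts:
            raise GoalConflict(f"cannot handle {acts}")
        self.goals.extend(acts)
        return "ok"


ACTS = ["greet", "request", "suggest"]


def user_turns(acts):
    """Every non-empty combination of dialogue acts that may occur in one turn."""
    for n in range(1, len(acts) + 1):
        yield from itertools.combinations(acts, n)


def validate(make_dm, acts, max_turns=2):
    """Drive a fresh DM with every dialogue of up to max_turns turns; collect failures."""
    failures = []
    turns = list(user_turns(acts))
    for length in range(1, max_turns + 1):
        for dialogue in itertools.product(turns, repeat=length):
            dm = make_dm()
            try:
                for turn in dialogue:
                    dm.process_turn(turn)
            except Exception as exc:  # a hang would need a timeout instead
                failures.append((dialogue, type(exc).__name__))
    return failures


failures = validate(ToyDialogueManager, ACTS)
print(len(failures), "failing dialogues, shortest:", failures[0][0])
```

Exhaustive enumeration grows exponentially with the number of turns, so a realistic driver would have to sample dialogues or be guided by the dialogue model.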
61 What is Scalability?
62 What is Scale (-able)?
- WordNet (1.6)
- Noun scaling has 3 senses
- (grading) the act of arranging in a graduated series
- act of measuring, arranging or adjusting according to a scale
- ascent by or as if by a ladder
- Verb scale has 8 senses
- measure by or as if by a scale: "This bike scales only 25 pounds"
- pattern, make, ... or estimate according to some rate or standard
- take by attacking with scaling ladders
- (surmount) -- reach the highest point of
- climb up by means of a ladder
- scale, descale -- remove the scales from: "scale fish"
- measure with or as if with scales: "scale the gold"
- size or measure according to a scale
63 Scaling what/how?
Cheaper
Better
More robust
Multilinguality
Depth
Faster
Bigger
Precision
Coverage
64 Coverage
SIZE
Speed
Robustness
Depth
65 Who are we scaling for?
Basic research -> Research Prototypes
Applied research / Product development
Real Systems
66 Experiences: VerbMobil
- Many people have said
- "With 15-20 persons on one spot I would make a VerbMobil of my own. But muuuuuch better/cheaper/..."
- This is not true!
- Software engineering
- Example: speech recognition
- -93
- Single word recognition
- Push-to-talk
- -00
- Open microphone
- Spontaneous speech
67 The VerbMobil Corpus
- 3,200 dialogs (G 1,454, E 726, J 1,020)
- 1,658 speakers (G 1,022, E 202, J 434)
- 79,562 turns (G 41,512, E 16,104, J 21,946)
- 1,520,000 running words (G 670,000, E 270,000, J 580,000)
- 181.6 hours were recorded (G 96.1, E 37.9, J 47.7)
- Recordings were made using
- a close microphone,
- a room microphone and
- a telephone
68 The VerbMobil Corpus
- transcribed and distributed on
- 56 CDs (21.5 GB)
- Analyzing the corpus
- 206,000 instances of articulatory background noise,
- 85,000 instances of breathing and
- 35,000 hesitations
- voiced: 19,000,
- nasal: 2,500,
- vocalic-nasalized: 13,500
- The Verbmobil data are distributed to research or commercial users via the Bavarian Archive of Speech Signals (BAS)
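69 Experiences from WOZ
GER142: danach könnten wir gemeinsam Abendessen gehen (afterwards we could go for dinner together)
SIM143: Bitte wiederholen Sie Ihre Äußerung. Es ist ein Fehler in der semantischen Verarbeitung aufgetreten (Please repeat your utterance. An error occurred in semantic processing)
GER144: oh, danach könnten wir gemeinsam abendessen (oh, afterwards we could have dinner together)
SIM145: Bitte wiederholen Sie Ihre Äußerung mit anderen Wörtern. Die semantische Verarbeitung war nicht erfolgreich (Please repeat your utterance in other words. Semantic processing was not successful)
GER146: äh, okay
ENG147: maybe a bit louder ?
GER148: yes , I invite you for the dinner.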
70 Development HPSG
- Starting point: HPSG for written G/E
- Goal: 10,000 lexical entries for spont. spoken G/E
- Schema: 20-40
[Figure: chart with a vertical axis from 0 to 12,000 and the time points -93, -96 (V1.0), -98 and -00.]
71 Development HPSG
- What factors contributed to progress?
- Getting to know the challenge
- Spontaneous/Spoken vs
- Written Language
- Finding a Suitable Formalism
- Tools
- Interface
- Verbmobil Interface Term (VIT)
- Compilation Techniques
- Test Suites
- Corpora
72 Well Defined Interfaces
- Speech Recognition -> Linguistic Modules
- Word Hypothesis Graph (WHG)
- Between (deep) Linguistic Modules
- VerbMobil Interface Term (VIT)
- Linguistic Modules -> Synthesizer
- Annotated String (Concept-to-Speech)
73 Verbmobil: From a Software Engineering Point of View
- System Design and Software Integration
74 Software Technology Challenges
- The goal
- Build an integrated system
- The situation
- Researchers do research
- Using different programming languages
- Researchers don't want to be bothered with technical details
- The solution
- Introducing the System Group
- Maximal technical support for the researchers/developers
75 The System Architecture
Verbmobil I: Multi-Agent Architecture
[Figure: modules M1 to M6 connected directly to each other.]
- Modules know all communication partners
- Direct communication between modules
- Reconfiguration difficult
- Software: ICE and ICE Master
- Basic platform: PVM
Verbmobil II: Multi-Blackboard Architecture
[Figure: modules M1 to M6 communicating through the blackboards BB 1, BB 2 and BB 3.]
- Modules know their I/O data pools
- No direct communication between modules: 198 blackboards vs. 2380 direct comm. paths
- Reconfiguration easy
- Several instances of one module/functionality
- Software: PCA and Module Manager
- Basic platform: PVM
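The difference between the two architectures can be made concrete with a minimal in-process sketch of the blackboard idea: modules only name their input and output pools and never address each other. This is not PCA, which distributes processes on top of PVM; the `PoolBus` and `Module` classes, the pool names and the stand-in transformations are all hypothetical.

```python
from collections import defaultdict


class PoolBus:
    """Minimal stand-in for a set of data pools (blackboards)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, pool, callback):
        self._subscribers[pool].append(callback)

    def publish(self, pool, data):
        # Deliver to every module that reads from this pool.
        for callback in list(self._subscribers[pool]):
            callback(data)


class Module:
    """A module knows only its input pool and its output pool."""

    def __init__(self, name, bus, in_pool, out_pool, transform):
        self.name = name
        self.bus = bus
        self.out_pool = out_pool
        self.transform = transform
        bus.subscribe(in_pool, self._on_data)

    def _on_data(self, data):
        self.bus.publish(self.out_pool, self.transform(data))


bus = PoolBus()
# Stand-in transformations; real modules would do recognition and translation.
Module("recognizer", bus, "audio", "words", lambda audio: audio.split())
Module("translator", bus, "words", "translation",
       lambda words: " ".join(w.upper() for w in words))

collected = []
bus.subscribe("translation", collected.append)
bus.publish("audio", "guten tag")
print(collected)
```

Adding a second module that reads from the "words" pool requires no change to the recognizer, which is the reconfiguration benefit named above.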
76 Sample Pool Structure
77 Distributed Execution Supports Distributed Development
[Figure: distributed setup with server 1, server 2, a controlling terminal, User 1 and User 2, connected through the Pool Communication Architecture.]
78 Support from the System Group (1)
- Integration framework (Testbed) with
- common communication mechanism for all used programming languages (C, C++, Lisp, Prolog, Java, Fortran, Tcl/Tk)
- Narrow interface for all used programming languages
- Overall system control infrastructure
- Standards on various levels
- Installation
- Compilation
- Communication formats between modules
- ...
- Toolbox for recording, replaying, testing, inspecting data exchanged between modules, ...
79 The Testbed is the Integration Framework for the Verbmobil System
80 The GUI: Visualization and Debug Tool
.... and much more
81 Support from the System Group (2): Regular Integration Cycles
Assure high system stability and robustness in connection with large-scale testing
[Figure: integration cycle; labels include audio modules and testbed.]
82 Support from the System Group (3): The FTP Server
- Development at different sites
- Communication via Email and FTP Server
- UPLOAD
- Software for integration
- EXCHANGE
- Exchanging software between developers
- ALPHA Service
- New integrated complete system
83 What contributed to the success of VerbMobil?
84 Important Contributions
- Multiple approaches
- Management
- Meetings
- Project meetings, Work Shops, ...
- Corpus collection
- Massive amounts of data for
- Testing, Linguistic Phenomena, Annotation
- System Group
- Test bed, Integration Cycles, ...
- Time
- The Internet
- ...
85 Conclusion
- We still need
- a lot of manpower
- Researchers
- Software engineers
- Management
- a lot of data
- annotate
- learn from
- All this costs a lot of money
- The Holy Grail of NLP (too?): self-learning systems
86 Thank you very much for your attention!