1
 
Experiences from large NLP Projects
Jan Alexandersson
German Research Center for Artificial Intelligence
(DFKI) GmbH
Stuhlsatzenhausweg 3, Geb. 43.1, 66123 Saarbrücken
Tel. (0681) 302-5347
Email: janal_at_dfki.de
www.dfki.de/janal
2
Overview
  • Introduction
  • What was VerbMobil
  • What is SmartKom
  • Scaling
  • Experiences from VerbMobil
  • Conclusion

3
What was...
http://verbmobil.dfki.de
?
4
VerbMobil - What was it?
  • Speech-to-speech translation system
  • Robust processing of spontaneous dialogs
  • Speaker independent (adaptive)
  • Languages: English, German, Japanese
  • Domains: appointment scheduling, travel planning
    and hotel reservation, remote PC maintenance
  • Summary of the dialogue automatically generated
    by the system
  • The system mediates between two humans; it does
    not play an active role
  • The system does not control the ongoing dialog

5
The Verbmobil Partners
6
The Verbmobil Partners
7
Facts About the Project
  • 23 participating institutions (in Verbmobil II),
    from Germany and the USA
  • Over 900 full-time employees and students
    involved over the whole duration
  • Funded by the German Ministry for Education and
    Science and the participating companies

8
Project Organization
German Federal Ministry for Research and Education
Scientific Management
Group of Module Managers
Scientific Head: W. Wahlster
Deputy Scientific Head: A. Waibel
Module Coordinator: N. Reithinger
Head of Project Management Group: R. Karger
Head of System Integration Group: A. Klüter
Verbmobil Advisory Board
9
Challenges for Language Engineering
10
Classification of Machine Translation Methods
[Figure: translation pyramid between source language
and target language. Levels from top to bottom:]
  • Interlingua
  • Semantic Transfer between semantic structures
    (semantic analysis, semantic generation)
  • Syntactic Transfer between syntactic structures
    (syntactic analysis, syntactic generation)
  • Direct Translation between word structures
    (morphologic analysis, morphologic generation)
11
The VerbMobil Case
[Figure: translation pyramid between source language
and target language. Levels from top to bottom:]
  • Interlingua
  • Semantic Transfer between semantic structures
    (semantic analysis, semantic generation)
  • Syntactic Transfer between syntactic structures
    (syntactic analysis, syntactic generation)
  • Direct Translation between word structures
    (morphologic analysis, morphologic generation)
  • Speech signal on both sides (prosodic analysis,
    prosodic annotation)
12
The Graphical User Interface
13
Focuses of Speech Recognition in Verbmobil
DaimlerChrysler
University of Karlsruhe
Multilinguality
Robustness
Large Vocabulary
RWTH Aachen
14
General Speech Recognition Task
Audio Signal
Recognizers
Word Hypotheses Graph
interface between acoustic and linguistic
processing
15
Open Microphone Approach
Microphone open
Microphone 1
Synchronization
Speech output
Microphone 1
16
What Linguistic Analysis Really Needs
  • Syntactic boundaries: "He saw ? the man ? with
    the telescope". Prosody cannot help.
  • Dialog act boundaries: "No, I have no time at all
    on Thursday. D But how about on Friday?"
  • Dialog acts are pragmatic units that chunk the
    input into units which can be processed alone.
  • Prosodic syntactic boundaries: "Of course ? not ?
    on Saturday"
  • Syntactic boundaries that correlate to the
    acoustic-phonetic reality help during analysis
    within one chunk/dialog act.
  • Important in spontaneous speech with elliptical
    utterances.

17
Prosody in Verbmobil
18
Facts about Repairs in the Verbmobil Corpus
  • 21% of all turns in the Verbmobil corpus (79,562
    turns) contain at least one self-correction
  • The syntactic category is preserved in most
    cases (for example, out of a sample of 266 verb
    replacements, 224 are again mapped to verbs)
  • Repairs take place in a restricted context (in
    98% of cases the reparandum consists of fewer
    than 5 words)
  • Repair sequences follow certain regularities

19
The Understanding of Spontaneous Speech Repairs
I need a car next Tuesday oops Monday
Editing Phase
Repair Phase
Original Utterance
Editing Term
Reparans
Reparandum
Recognition of Substitutions
Transformation of the Word Hypotheses Graph
I need a car next Monday
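The substitution step on this slide can be illustrated with a minimal Python sketch. It works on a flat word list instead of a word hypotheses graph, takes the editing term from a small hypothetical list (`EDITING_TERMS`), and assumes that reparandum and reparans have the same length. None of these simplifications are taken from the Verbmobil repair module.

```python
# Hypothetical editing terms; a real system would detect them from data.
EDITING_TERMS = {"oops", "sorry"}

def apply_repair(words, length=1):
    """Drop the reparandum and the editing term, keep the reparans.

    Assumes the reparandum is the `length` words before the editing term
    and the reparans is the `length` words after it.
    """
    for i, word in enumerate(words):
        if word not in EDITING_TERMS:
            continue
        reparandum_start = i - length
        reparans_end = i + 1 + length
        if reparandum_start < 0 or reparans_end > len(words):
            continue  # not enough context for a repair of this length
        return words[:reparandum_start] + words[i + 1:]
    return words  # no editing term found: utterance is left unchanged

utterance = "I need a car next Tuesday oops Monday".split()
repaired = " ".join(apply_repair(utterance))
print(repaired)
```

With the example from the slide, the reparandum "Tuesday" and the editing term "oops" are removed and the reparans "Monday" takes their place.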
20
Architecture of Repair Processing
On Thursday I cannot no I can meet äh
after one
21
Multiple Approaches
  • Mono-cultural approaches are dangerous
  • humans vs. viruses → diversity
  • Microsoft vs. ILOVEYOU and copycats → alternative
    software solutions
  • Some sources of errors in a speech translation
    system:
  • external:
  • spontaneous speech: not well formed, hesitations,
    repairs
  • bad acoustic conditions
  • human dialog behavior
  • internal:
  • knowledge gaps in modules
  • software errors
  • probabilistic processing
  • → Use multiple engines, varying approaches at
    various stages of processing

22
Multiple Approaches in Verbmobil
  • Exclusive alternatives: three different 16 kHz
    German speech recognizers with various
    capabilities
  • Competing approaches:
  • three parsers: HPSG, chunk, statistical
  • five translation tracks: case-based, dialog-act
    based, statistical, substring-based, linguistic
    (deep) semantic translation
  • Needed: selection and combination of results from
    competing tracks
  • parsers: combination of partial analyses in the
    semantic processing modules
  • translation: pre-selection module

23
Multiple Translation Tracks - Approaches and
Advantages
  • Case-based
  • Approach: uses examples from the aligned
    bilingual Verbmobil corpus
  • Advantage: good translation if input matches
    example in corpus
  • Dialog-act based
  • Approach: extract core intention (dialog act) and
    content
  • Advantage: robust with respect to recognition
    errors
  • Statistical
  • Approach: use statistical language and
    translation models
  • Advantage: guaranteed translation with high
    approximate correctness
  • Substring-based
  • Approach: combines statistical word alignment
    with precomputation of translation chunks and
    contextual clustering
  • Advantage: guaranteed translation with high
    approximate correctness
  • Linguistic (deep) semantic translation
  • Approach: classic approach using semantic
    transfer
  • Advantage: high quality translation in case of
    success

24
Example Based Translation
  • Result: Translation and a confidence value
  • Benefit: Improving Verbmobil's translation
    capabilities through an additional translation
    path
  • Responsible: DFKI, Kaiserslautern
  • Task: Providing a translation based on
    translation templates and partial linguistic
    analysis
  • Input: WHGs or best hypothesis
  • Method: Definite Clause Grammar (DCG), graph
    matching algorithms

25
Dialog-Act Based Translation
  • Result: Translation and a confidence value,
    additionally content descriptions for the dialog
    module
  • Benefit: Robust translation and content
    extraction even when the recognition is erroneous
  • Responsible: DFKI, Saarbrücken
  • Task: Robustly provide a translation of core
    intentions and contents of the domain
  • Input: Prosodically annotated best hypothesis
    (flat WHG)
  • Method: Statistical dialog-act classifier and
    Finite State Transducers

26
Statistical Translation
  • Result: Translation and a confidence value
  • Benefit: Approximately correct translation for
    spontaneous speech
  • Responsible: RWTH Aachen
  • Task: Provide approximately correct translations
  • Input: Prosodically annotated best hypothesis
    (flat WHG)
  • Method: Use statistical language and translation
    models
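Because each of the tracks above returns a translation together with a confidence value, a selection step can compare their results. The following Python sketch only illustrates that idea: the `TrackResult` type, the example translations and scores, and the assumption that confidence values are directly comparable across tracks are hypothetical and not taken from the Verbmobil selection module.

```python
from dataclasses import dataclass

@dataclass
class TrackResult:
    track: str         # name of the translation track
    translation: str   # proposed target-language string
    confidence: float  # assumed here to be normalized to [0, 1]

def select_translation(results):
    """Return the result with the highest confidence, or None if empty."""
    return max(results, key=lambda r: r.confidence, default=None)

# Invented example values, for illustration only.
candidates = [
    TrackResult("case-based", "How about Friday?", 0.40),
    TrackResult("dialog-act based", "Friday is suggested.", 0.55),
    TrackResult("statistical", "But how about on Friday?", 0.80),
]
best = select_translation(candidates)
print(best.track, "->", best.translation)
```

In practice, confidence values produced by different engines would probably need to be rescaled before such a comparison is meaningful.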

27
Deep Translation
  • Result: Translation containing content
    information, suited for high quality speech
    synthesis
  • Benefit: Delivers the highest quality, but is
    sensitive to recognition errors and spontaneous
    speech phenomena
  • Responsible: Siemens AG, DFKI Saarbrücken,
    Universität Tübingen, Universität des Saarlandes,
    Universität Stuttgart, TU Berlin, CSLI Stanford
  • Task: Provide high quality translations
  • Input: Prosodically annotated WHG and contextual
    information
  • Method: Use syntactic and semantic approaches to
    analysis, transfer, and generation

28
Modules Involved
  • Deep Analysis: HPSG parser
  • Dialog Semantics: combination of parsing
    results, and semantic resolution
  • Transfer: VIT-to-VIT transfer
  • Generation: TAG generation from VITs
  • Dialog/Context: provides contextual information
  • Integrated processing comprises:
  • search through the WHG
  • statistical parser
  • chunk parser
  • Semantic Construction: provides VITs from
    statistical and chunk parser output

29
The Multi-Parser Approach
  • Verbmobil uses three different syntactic parsers:
    an HPSG parser, a chunk parser, and a
    probabilistic LR parser.
  • Each parser offers a different level of parsing
    accuracy, depth of syntactic analysis, and
    robustness of the analysis process.
  • Chunk parser: most robust but least accurate
    analysis
  • HPSG parser: most accurate but least robust
    analysis
  • Probabilistic parser: level of accuracy and
    robustness between HPSG and chunk parser

30
Integrated Processing
  • Gets WHGs for the English, German, or Japanese
    speech input and dispatches WHG information to
    the three parsers
  • Provides an A* search algorithm that allows any
    connected parser to find the best-scored path
    using
  • the acoustic score of the speech recognizer
  • the Verbmobil trigram language model
  • Parsers analyze the same utterance simultaneously
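Reading the search algorithm on this slide as A*, the best-path search through a word hypotheses graph can be sketched in Python as follows. The edge format, the use of non-negative costs (for example negative log probabilities), the sentence-start symbol `<s>` and the toy trigram model are assumptions made for illustration; they are not the Verbmobil data structures.

```python
import heapq
from collections import defaultdict

def best_path(edges, start, goal, lm_cost):
    """A* search for the cheapest word sequence through a word graph.

    edges: (from_node, to_node, word, acoustic_cost) tuples with costs >= 0;
           nodes are integers and edges go from lower to higher numbers.
    lm_cost: function (w1, w2, w3) -> cost >= 0 of w3 following w1 w2.
    """
    outgoing = defaultdict(list)
    for u, v, word, cost in edges:
        outgoing[u].append((v, word, cost))

    # Heuristic: cheapest remaining acoustic cost to the goal. It ignores
    # the language model (whose costs are >= 0), so it never overestimates.
    remaining = defaultdict(lambda: float("inf"))
    remaining[goal] = 0.0
    for u in sorted(outgoing, reverse=True):
        for v, _, cost in outgoing[u]:
            remaining[u] = min(remaining[u], cost + remaining[v])

    # Search state: graph node plus the last two words (trigram context).
    frontier = [(remaining[start], 0.0, start, ("<s>", "<s>"), [])]
    closed = set()
    while frontier:
        _, g, node, context, words = heapq.heappop(frontier)
        if node == goal:
            return words, g
        if (node, context) in closed:
            continue
        closed.add((node, context))
        for nxt, word, acoustic in outgoing[node]:
            g2 = g + acoustic + lm_cost(context[0], context[1], word)
            heapq.heappush(
                frontier,
                (g2 + remaining[nxt], g2, nxt, (context[1], word), words + [word]),
            )
    return None, float("inf")

def toy_lm(w1, w2, w3):
    """Toy trigram model: two trigrams are cheap, everything else costs 1."""
    likely = {("<s>", "we", "meet"), ("we", "meet", "monday")}
    return 0.1 if (w1, w2, w3) in likely else 1.0

graph = [
    (0, 1, "we", 1.0), (0, 1, "he", 0.8),
    (1, 2, "meet", 1.0), (1, 2, "meat", 0.9),
    (2, 3, "monday", 0.5),
]
words, cost = best_path(graph, 0, 3, toy_lm)
print(words)
```

In this toy graph the acoustically cheapest path is "he meat monday", but the combined acoustic and language model score prefers "we meet monday".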

31
HPSG Processing
  • Result: Source language VITs
  • Benefit: Delivers the highest quality, but is
    sensitive to recognition errors and spontaneous
    speech phenomena
  • Responsible: DFKI Saarbrücken, CSLI Stanford
  • Task: Thorough syntactic analysis
  • Input: Word chains from integrated processing
  • Method: Apply HPSG analysis

32
The Result is a Syntactic Tree
Alright, and that should get us there about nine
in the evening.
33
... but analysis is not always spanning
The train arise at seven thirty. We could take a
cab it to the hotel problem train station.
34
Semantic Construction
  • Result: VITs
  • Benefit: Provides the results of the shallow
    parsers to the deep analysis track
  • Responsible: Universität Stuttgart (IMS)
  • Task: Convert and extend syntax trees to VITs
  • Input: Syntax trees from statistical and chunk
    parsers
  • Method: Compositional construction using
    semantic lexicon

35
Schematic Processing
Input
Syntactic tree
Lexicon access and interpretation of the
grammatical roles
Intermediate representation
Application Tree
Compositional semantic construction
Intermediate representation
VIT
Non-compositional semantic construction using
transfer rule engine
Intermediate representation
Resulting VIT
36
Dialog Semantics
  • Result: VIT ready for transfer
  • Benefit: Enhances robustness of deep analysis
    and provides vital information for transfer
  • Responsible: Universität des Saarlandes,
    Saarbrücken
  • Task: Combining results from various parsers,
    reinterpret and correct VITs, and resolve
    non-local ambiguities
  • Input: VITs from different parsers
  • Method: VIT models and rule based approaches

37
Combining Analyses from Various Parsers
  • Parsers deliver VITs for segments of a turn
  • May be spanning analyses or just partial
    fragments
  • Combination is necessary, both of analyses from
    one parser and of analyses from various parsers
  • Combination criteria:
  • HPSG is better than the statistical parser, which
    is better than the chunk parser
  • Integrated results are better than fragments
  • Integrated results are better than fragments
  • Longer results are better than short ones
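The three combination criteria can be expressed as a sort key. The Python sketch below is only an illustration: the `Analysis` type and its fields are hypothetical, and the lexicographic order of the criteria (parser type first, then integrated versus fragment, then length) is an assumption, since the slide does not say how the criteria are weighted against each other.

```python
from dataclasses import dataclass

# Lower rank is preferred: HPSG > statistical > chunk.
PARSER_RANK = {"hpsg": 0, "statistical": 1, "chunk": 2}

@dataclass
class Analysis:
    parser: str     # "hpsg", "statistical" or "chunk"
    start: int      # index of the first covered word
    end: int        # index after the last covered word
    spanning: bool  # True for an integrated result, False for a fragment

def preference_key(a):
    """Smaller keys are better; criteria applied in the assumed order."""
    return (PARSER_RANK[a.parser], not a.spanning, -(a.end - a.start))

def pick_best(analyses):
    return min(analyses, key=preference_key)

candidates = [
    Analysis("chunk", 0, 6, True),
    Analysis("hpsg", 0, 3, False),
    Analysis("statistical", 0, 6, True),
    Analysis("hpsg", 0, 6, True),
]
best = pick_best(candidates)
print(best)
```

A different ordering of the criteria, or a weighted score, would be just as easy to plug in by changing `preference_key`.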

38
Semantic Based Transfer
  • Result: VITs for generation
  • Benefit: Translate VITs inside the deep
    translation path
  • Responsible: Universität Stuttgart (IMS)
  • Task: Transfer VITs from the source to the target
    language
  • Input: VITs
  • Method: Rule based transfer

39
Context Evaluation
  • Result: disambiguated transfer requests
  • Benefit: Higher quality of transfer results
  • Responsible: Technical University (TU) Berlin
  • Task: Resolving ambiguities in the dialog context
    during semantic transfer
  • Input: Requests from transfer
  • Method: Using world knowledge and rules

40
Dialog Processing
  • Result: context information and dialog summaries
    and minutes
  • Benefit: Verbmobil knows what happens throughout
    the dialog and can present it
  • Responsible: DFKI, Saarbrücken
  • Task: Provides dialog context for all tracks and
    computes main information for dialog summaries
  • Input: Data from many modules
  • Method: Frame-like topic structuring and rules

41
Dialog Information in Semantic Transfer
42
The Intentional Structure
[Figure: tree over a dialogue, one level per line]
Dialogue Level: VM_Dialogue
Phase Level: PH_Greet, PH_Nego
Game Level: G_Greet, G_Nego, G_Nego
Move Level: M_Greet, M_Tr_Init, M_Init, M_Resp,
M_Greet
DA Level: Greet, Pol_Form, Request, Suggest, Reject,
Feedback, Introduce
Speaker: A, A, B, B
43
Collaboration for a New Functionality Summaries
  • Provide the users with a summary of the topics
    that were agreed
  • Two benefits:
  • have a piece of information to use in calendars
    etc.
  • control the translation
  • Approach: exploit already existing modules for
  • content extraction
  • dialog interpretation
  • planning the summary
  • generation
  • transfer

44
Summaries
  • Dialog module keeps track of the dialog: dialog
    model, context extraction, translations, dialog
    history
  • Three types of documents:
  • Minutes: relevant exchanges
  • Summary: dialog results
  • Scripts: complete dialog script

45
Multilingual Summaries
  • Multilinguality: integration of transfer module

Context
Syndialog
Dialog
VITs
VITs
VM-PROTO
Transfer (G→E)
VM-PROTO
GENGER
GENENG
Document structure
German Summary (HTML)
English Summary (HTML)
46
Result Summary
47
Generation
  • Result: Strings, enriched with concept-to-speech
    (CTS) information to support synthesis
  • Benefit: Output from the semantic transfer track
  • Responsible: DFKI, Saarbrücken
  • Task: Robustly generate the output of the
    semantic transfer in German, English, or Japanese
  • Input: VITs from transfer
  • Method: Constraint system for micro-planning,
    TAG grammar (reusing HPSG grammars) for syntactic
    realization

48
Multiple Translation Tracks: Approx. correct
translation
[Bar chart; values in percent]

Track                   WA > 50   WA > 75   WA > 80
case based                   37        44        46
statistical                  69        79        81
DA based                     40        45        46
Sem. based                   40        47        49
Substring                    65        75        79
Selection (Automatic)        57        66        68
Selection (Learning)         78        83        85
Selection (Manual)           88        95        97
Verbmobil The Book
There are over 600 refereed papers on the various
aspects of and achievements in Verbmobil.
Wolfgang Wahlster (ed.): "Verbmobil: Foundations of
Speech-to-Speech Translation". Springer-Verlag,
Berlin Heidelberg New York. 679 pages. ISBN
3-540-67783-6
50
What is...
http://smartkom.dfki.de
?
51
Reference Architecture for Multimodal Systems
2 Nov. 2001, Dagstuhl Seminar "Fusion and
Coordination in Multimodal Interaction", edited by
M. Maybury
[Figure: architecture diagram. Labels that survive
from the figure:]
  • Blocks: Mode Analysis, Multimodal Fusion,
    Interaction Management, Mode Coordination,
    Presentation Design, Mode Design, Application
    Interface
  • Modes: Language, Graphics, Gesture, Sound,
    Biometrics, Animated Presentation Agent
  • Processes: Discourse Management, Context
    Management, Expectation Management, Intention
    Recognition, Action Planning, User Modeling,
    Reference Resolution, Multimodal Reference
    Resolution
  • Operations: Initiate, Terminate, Request,
    Respond, Integrate, Select Content, Design,
    Allocate, Coordinate, Layout
  • Actors and resources: User(s), User ID,
    Information, Applications, People
  • Models: Domain Model, Task Model, User Model,
    Discourse Model, Media Models, Application
    Models, Context Model
  • Representation and Inference, States and
    Histories
52
Situated Delegation-oriented Dialog Paradigm
Collaborative Problem Solving
IT Services
Service 1
Personalized Interaction Agent
User
specifies goal
delegates task
Service 2
cooperate on problems
asks questions
Service 3
presents results
53
The Main Modules on the Control GUI
54
More About the System
  • Modules realized as independent processes
  • Not all must be there (critical path: speech or
    graphic input to speech or graphic output)
  • (Mostly) independent of display size
  • Pool Communication Architecture (PCA) based on
    PVM for Linux and NT
  • Modules know only about their I/O pools
  • Literature:
  • Andreas Klüter, Alassane Ndiaye, Heinz Kirchmann:
    "Verbmobil From a Software Engineering Point of
    View: System Design and Software Integration". In
    Wolfgang Wahlster (ed.): "Verbmobil: Foundations
    of Speech-to-Speech Translation". Springer, 2000.
  • Data exchanged using M3L documents
  • All modules and pools are visualized here ...
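The idea that modules know only about their I/O pools can be sketched in a few lines of Python. This is an in-process stand-in, not the PCA: the `PoolRegistry` class, the pool names and the two toy modules are hypothetical, and the real system runs modules as independent processes on top of PVM.

```python
from collections import defaultdict

class PoolRegistry:
    """In-process stand-in for a set of named data pools."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, pool, callback):
        """Register a module callback as a reader of the given pool."""
        self._subscribers[pool].append(callback)

    def publish(self, pool, data):
        """Write data to a pool; every reader of that pool receives it."""
        for callback in list(self._subscribers[pool]):
            callback(data)

pools = PoolRegistry()
received = []

# Hypothetical "translator" module: reads pool "words", writes "translation".
pools.subscribe("words", lambda text: pools.publish("translation", text.upper()))
# Hypothetical "output" module: reads pool "translation".
pools.subscribe("translation", received.append)

# Hypothetical "recognizer" module writes to "words" without knowing who reads it.
pools.publish("words", "guten tag")
print(received)
```

Adding a second reader of the "words" pool requires no change to the writer, which is the property that makes reconfiguration simple in a pool-based design.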

55
The Real Story
56
The Glue - M3L: XML-based Multimodal Markup
Language
Frame Languages: Object-oriented Modeling
Primitives
NL/MM-Semantics: More formal Semantics,
Subsumption, Inferences
W3C Standards: XML Schema/DTDs
M3L
NL/MM Representation
Domain Knowledge
XML schema
XML schema
XML schema
Pool
Pool
Pool
. ... .
57
Validation of Dialogue Systems
  • Project ValDia (DFKI, DaimlerChrysler Ulm)
  • Tool for validation of Dialogue Models/Managers
    (DM)

Automatic
Analysis
ASR
Database
DM
Generator
Synthesis
Dialogue model
Manual
58
Validation of DM
  • Even slight changes can make test suites for DM
    invalid (but not for parser, recognizer, ...)
  • Put persons in front of the complete system
  • We will eventually find errors
  • It is time consuming
  • For some scenarios impossible to exhaustively
    validate a DM
  • What module failed to perform its task?
  • Combination of errors?
  • the whole system has to be put together

59
Validation of DM
  • ValDia approach Replace test person and I/O
    modules with ValDia

Database
DM
Dialogue model
60
Experiences
  • ValDia detects errors
  • Logical:
  • Combination of greet and request leads to goal
    conflict in DM → DM hangs!
  • Technical:
  • After about 500 dialogues the DM crashed due to
    erroneous memory handling
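The kind of automated validation described here can be illustrated with a small Python sketch: a test driver replaces the human user, feeds every sequence of input dialogue acts up to a given length to a dialogue manager, and records the sequences on which it does not settle. Everything in the sketch is hypothetical: `ToyDM` is a toy with a deliberately planted greet/request conflict, a "hang" is simulated by a step budget, and none of this reflects the actual ValDia tool.

```python
from itertools import product

ACTS = ["greet", "request", "suggest", "reject"]  # hypothetical input acts

class ToyDM:
    """Toy dialogue manager with a deliberate bug, for illustration only."""

    def __init__(self):
        self.pending = []  # goals that have not been worked off yet

    def handle(self, act, max_steps=100):
        self.pending.append(act)
        if act == "greet":
            return  # the greeting goal is kept until the next input arrives
        for _ in range(max_steps):
            if not self.pending:
                return
            if "greet" in self.pending and "request" in self.pending:
                continue  # planted bug: the two goals block each other
            self.pending.pop()
        raise TimeoutError("DM did not settle within the step budget")

def find_failures(max_len=2):
    """Run a fresh DM on every act sequence up to max_len; collect hangs."""
    failures = []
    for n in range(1, max_len + 1):
        for sequence in product(ACTS, repeat=n):
            dm = ToyDM()
            try:
                for act in sequence:
                    dm.handle(act)
            except TimeoutError:
                failures.append(sequence)
    return failures

failures = find_failures(max_len=2)
print(failures)
```

The number of sequences grows exponentially with their length, which is one reason exhaustive validation is not feasible for every scenario.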

61
What is
Scalability
?
62
What is Scale (-able)?
  • WordNet (1.6)
  • Noun "scaling" has 3 senses:
  • (grading) the act of arranging in a graduated
    series
  • act of measuring, arranging or adjusting
    according to a scale
  • ascent by or as if by a ladder
  • Verb "scale" has 8 senses:
  • measure by or as if by a scale: "This bike scales
    only 25 pounds"
  • pattern, make, ... or estimate according to some
    rate or standard
  • take by attacking with scaling ladders
  • (surmount) -- reach the highest point of
  • climb up by means of a ladder
  • scale, descale -- remove the scales from: "scale
    fish"
  • measure with or as if with scales: "scale the
    gold"
  • size or measure according to a scale

63
Scaling what/how?
Cheaper
Better
More robust
Multilinguality
Depth
Faster
Bigger
Precision
Coverage
64
Coverage
SIZE
Speed
Robustness
Depth
65
Who are we scaling for?
  • EU
  • NSF
  • BMBF
  • Industry
  • ...

Basic research → Research Prototypes
Applied research / Product development
Real Systems
66
Experiences VerbMobil
  • Many people have said:
  • "With 15-20 persons in one place I would make a
    VerbMobil of my own. But muuuuuch
    better/cheaper/..."
  • This is not true!
  • Software engineering
  • Example: speech recognition
  • -93:
  • Single word recognition
  • Push-to-talk
  • -00:
  • Open microphone
  • Spontaneous speech

67
The VerbMobil Corpus
  • 3,200 dialogs (G 1,454, E 726, J 1,020)
  • 1,658 speakers (G 1,022, E 202, J 434)
  • 79,562 turns (G 41,512, E 16,104, J 21,946)
  • 1,520,000 running words (G 670,000, E 270,000,
    J 580,000)
  • 181.6 hours were recorded (G 96.1, E 37.9, J
    47.7)
  • Recordings were made using
  • a close microphone,
  • a room microphone and
  • a telephone
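The corpus figures on this slide can be cross-checked with a few lines of Python. The sketch below only restates the numbers from the slide (G, E and J as on the slide) and checks that the per-language counts add up to the totals; the dictionary layout is an arbitrary choice.

```python
# Figures copied from the slide; G, E, J are the three corpus languages.
corpus = {
    "dialogs":       {"total": 3_200,     "G": 1_454,   "E": 726,     "J": 1_020},
    "speakers":      {"total": 1_658,     "G": 1_022,   "E": 202,     "J": 434},
    "turns":         {"total": 79_562,    "G": 41_512,  "E": 16_104,  "J": 21_946},
    "running words": {"total": 1_520_000, "G": 670_000, "E": 270_000, "J": 580_000},
}
hours = {"total": 181.6, "G": 96.1, "E": 37.9, "J": 47.7}

# Integer counts should add up exactly.
mismatches = [
    name for name, row in corpus.items()
    if row["G"] + row["E"] + row["J"] != row["total"]
]

# Hours are rounded values, so compare with a tolerance of 0.1 hours.
hours_sum = round(hours["G"] + hours["E"] + hours["J"], 1)
hours_gap = abs(hours_sum - hours["total"])

words_per_turn = corpus["running words"]["total"] / corpus["turns"]["total"]

print("count mismatches:", mismatches)
print("sum of hours:", hours_sum, "gap to stated total:", round(hours_gap, 1))
print("average words per turn:", round(words_per_turn, 1))
```

All integer counts add up exactly. The per-language hours sum to 0.1 hours more than the stated total, presumably a rounding effect, and the average turn is about 19 words long.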

68
The VerbMobil Corpus
  • transcribed and distributed on
  • 56 CDs (21.5 GB)
  • Analyzing the corpus:
  • 206,000 instances of articulatory background
    noise,
  • 85,000 instances of breathing and
  • 35,000 hesitations:
  • voiced: 19,000,
  • nasal: 2,500,
  • vocalic-nasalized: 13,500
  • The Verbmobil data are distributed to research
    or commercial users via the Bavarian Archive of
    Speech Signals (BAS)

69
Experiences from WOZ
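GER142: danach könnten wir gemeinsam Abendessen
gehen [afterwards we could go to dinner together]
SIM143: Bitte wiederholen Sie Ihre Äußerung.
Es ist ein Fehler in der semantischen
Verarbeitung aufgetreten [Please repeat your
utterance. An error occurred in semantic
processing]
GER144: oh, danach könnten wir gemeinsam
abendessen [oh, afterwards we could have dinner
together]
SIM145: Bitte wiederholen Sie Ihre Äußerung mit
anderen Wörtern. Die semantische Verarbeitung
war nicht erfolgreich [Please repeat your
utterance in other words. Semantic processing was
not successful]
GER146: äh, okay [uh, okay]
ENG147: maybe a bit louder?
GER148: yes, I invite you for the dinner.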
70
Development HPSG
  • Starting point: HPSG for written G/E
  • Goal: 10,000 lexical entries for spontaneous
    spoken G/E
  • Schema: 20-40

[Chart: vertical axis from 0 to 12,000; horizontal
axis -93, -96 (V1.0), -98, -00]
71
Development HPSG
  • What factors contributed to progress?
  • Getting to know the challenge
  • Spontaneous/spoken vs. written language
  • Finding a Suitable Formalism
  • Tools
  • Interface
  • Verbmobil Interface Term (VIT)
  • Compilation Techniques
  • Test Suites
  • Corpora

72
Well Defined Interfaces
  • Speech Recognition → Linguistic Modules:
  • Word Hypothesis Graph (WHG)
  • Between (deep) Linguistic Modules:
  • VerbMobil Interface Term (VIT)
  • Linguistic Modules → Synthesizer:
  • Annotated String (Concept-to-Speech)

73
Verbmobil From a Software Engineering Point of
View
  • System Design and Software Integration

74
Software Technology Challenges
  • The goal
  • Build an integrated system
  • The situation
  • Researchers do research
  • Using different programming languages
  • Researchers don't want to be bothered with
    technical details
  • The solution
  • Introducing the System Group
  • Maximal technical support for the
    researchers/developers

75
The System Architecture
Verbmobil I: Multi-Agent Architecture
[Figure: modules M1 to M6 connected directly to each
other]
  • Modules know all communication partners
  • Direct communication between modules
  • Reconfiguration difficult
  • Software: ICE and ICE Master
  • Basic platform: PVM
Verbmobil II: Multi-Blackboard Architecture
[Figure: modules M1 to M6 connected via blackboards
BB 1 to BB 3]
  • Modules know their I/O data pools
  • No direct communication between modules
  • 198 blackboards vs. 2380 direct comm. paths
  • Reconfiguration easy
  • Several instances of one module/functionality
  • Software: PCA and Module Manager
  • Basic platform: PVM
76
Sample Pool Structure
77
Distributed Execution Supports Distributed
Development
server 2
controlling terminal
server 1
Pool Communication Architecture
User 1
User 2
78
Support from the System Group (1)
  • Integration framework (Testbed) with
  • common communication mechanism for all used
    programming languages (C, C++, Lisp, Prolog,
    Java, Fortran, Tcl/Tk)
  • Narrow interface for all used programming
    languages
  • Overall system control infrastructure
  • Standards on various levels
  • Installation
  • Compilation
  • Communication formats between modules
  • ...
  • Toolbox for recording, replaying, testing,
    inspecting data exchanged between modules, ...

79
The Testbed is the Integration Framework for the
Verbmobil System
80
The GUI: Visualization and Debug Tool
.... and much more
81
Support from the System Group (2): Regular
Integration Cycles
Assure high system stability and robustness in
connection with large-scale testing
audio modules,testbed
82
Support from the System Group (3): The FTP Server
  • Development at different sites
  • Communication via Email and FTP Server
  • UPLOAD
  • Software for integration
  • EXCHANGE
  • Exchanging software between developers
  • ALPHA Service
  • New integrated complete system

83
What contributed to the success of VerbMobil?
84
Important Contributions
  • Multiple approaches
  • Management
  • Meetings
  • Project meetings, workshops, ...
  • Corpus collection - Massive amounts of data for
  • Testing, Linguistic Phenomena, Annotation
  • System Group
  • Test bed, Integration Cycles, ...
  • Time
  • The Internet
  • ...

85
Conclusion
  • We still need
  • a lot of manpower:
  • Researchers
  • Software engineers
  • Management
  • a lot of data to
  • annotate
  • learn from
  • All this costs a lot of money
  • The Holy Grail of NLP (too?): self-learning
    systems
86
Thank you very much for your attention!