1Linguistic and Logical Tools for an Advanced
Interactive Speech System in Spanish
J. Álvarez, V. Arranz, N. Castell, M. Civit
TALP Research Centre, UPC, Barcelona
2Contents
- Introduction
- Corpora Construction
- System Architecture
- Understanding Module
- Input Problems and Solutions Adopted
- Language Processing
- Morphology
- Syntax
- Semantic Extraction
- Dialogue Manager
- Conclusions
3Introduction
- Increasing need for more natural HMI
- Development of a dialogue system
- spontaneous speech
- restricted domain: railway information
- rather user-friendly communication exchange
- language of application: Spanish
- Other related systems
- ATIS, TRAINS, LIMSI ARISE, TRINDI, ...
4Corpora Construction
- Project objective: none available in Spanish
- Two different corpora developed
- human-human
- human-machine (Wizard of Oz technique)
- 150 different situations
- an open scenario
- total of 227 dialogues
5System Architecture
6Understanding Module
7Input Problems (1)
- Recognition errors
- Excess of information
- U: sábado treinta de octubre (Saturday, October 30)
- R: un tren que o sábado treinta de octubre (a train that or...)
- Erroneous recognition
- U: gracias (thank you)
- R: sí pero ellos (yes but they)
8Input Problems (2)
- Grammar errors
- Lack of prep+det contractions: de el → del
- Wrong use of indefinite determiner: un de octubre → uno de octubre (1 October)
- Wrong orthographical transcriptions: qué/que, a/ha, ... (what/that, to/has, ...)
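The two rewrites named on this slide (de el to del, un de octubre to uno de octubre) can be illustrated with a small normalisation pass over the recogniser output. This is a minimal sketch under stated assumptions, not the system's actual component: the function name `normalise_transcription`, the rule list and the lower-case input are all hypothetical, and the orthographic confusions (qué/que, a/ha) are not handled because they need context to resolve.

```python
import re

# Month names used to recognise date contexts (assumption: lower-case input).
_MONTHS = ("enero|febrero|marzo|abril|mayo|junio|julio|agosto|"
           "septiembre|octubre|noviembre|diciembre")

# Hypothetical rewrite rules covering only the two cases listed on the slide.
_RULES = [
    # Missing prep+det contraction: "de el" -> "del".
    (re.compile(r"\bde el\b"), "del"),
    # Indefinite determiner instead of the numeral before a month name:
    # "un de octubre" -> "uno de octubre".
    (re.compile(rf"\bun de (?=(?:{_MONTHS})\b)"), "uno de "),
]


def normalise_transcription(text: str) -> str:
    """Apply each rewrite rule in order and return the normalised text."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text
```

For instance, `normalise_transcription("trenes de el lunes")` yields `"trenes del lunes"`, while a sentence such as `"un tren de madrid"` is left unchanged because neither pattern applies.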
9Input Problems (3)
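- Problems caused by spontaneous speech
- Syntactic disfluencies
- U: a ver los horarios de los trenes que van de
Teruel a Barcelona el este próximo viernes y que
vayan de Barcelona a Teruel el próximo que
vuelvan de Barcelona a Teruel el próximo domingo
- (let's see the timetables of the trains that go
from Teruel to Barcelona the this next Friday and
that go from Barcelona to Teruel the next that
return from Barcelona to Teruel next Sunday)
- Lexical disfluencies, pauses, noises, ...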
10Solutions
- Adapting the recogniser to the domain
- Adapting the recogniser to spontaneous speech
- Adapting the understanding module
- Closing the entry channel
11Tools
- MACO: Morphological Analyzer Corpus Oriented (Carmona et al., 98)
- RELAX: Relaxation Labelling Based Tagger (Padró, 97)
- TACAT: Tagged Corpus Analyzer Tool (Castellón et al., 98)
- PRE: Production Rule Environment (Turmo, 99)
12Example
- User turn
- Me gustaría información sobre trenes de
Guadalajara a Cáceres para la primera semana de
agosto
- (I would like some information about trains
from Guadalajara to Caceres for the first week of
August)
13Language Processing
Transcription
14Morphology (1)
- MACO
- contains knowledge organised into classes and inflection paradigms
- uses a task/domain lexicon: less ambiguity and better execution time
- provides all possible labels per word
- RELAX
- disambiguates obtained labels
- is constraint based with relaxation labelling
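To make the idea of relaxation labelling concrete, here is a toy sketch of the classic iterative update, in which each candidate label's weight is multiplied by (1 + support) and then renormalised. It is not RELAX itself: the slide does not describe its constraints or support function, so the function name `relax`, the bigram compatibility table and its values are assumptions chosen only for illustration.

```python
from typing import Dict, List, Tuple


def relax(candidates: List[List[str]],
          compat: Dict[Tuple[str, str], float],
          iterations: int = 10) -> List[str]:
    """Pick one label per word by iterating a simple relaxation-labelling update.

    candidates[i] lists the possible labels of word i (assumed non-empty).
    compat[(left, right)] is a hypothetical compatibility score for two
    adjacent labels; missing pairs count as 0.
    """
    # Start from uniform weights over each word's candidate labels.
    weights = [{t: 1.0 / len(c) for t in c} for c in candidates]
    for _ in range(iterations):
        new_weights = []
        for i, current in enumerate(weights):
            updated = {}
            for tag, p in current.items():
                support = 0.0
                if i > 0:  # support contributed by the left neighbour
                    support += sum(compat.get((lt, tag), 0.0) * lp
                                   for lt, lp in weights[i - 1].items())
                if i + 1 < len(weights):  # support from the right neighbour
                    support += sum(compat.get((tag, rt), 0.0) * rp
                                   for rt, rp in weights[i + 1].items())
                support = max(-1.0, min(1.0, support))  # keep 1 + support >= 0
                updated[tag] = p * (1.0 + support)
            total = sum(updated.values())
            if total > 0:
                new_weights.append({t: v / total for t, v in updated.items()})
            else:  # every label was driven to zero: keep the previous weights
                new_weights.append(dict(current))
        weights = new_weights
    return [max(w, key=w.get) for w in weights]
```

With the invented table `{("NC", "SP"): 0.4, ("SP", "NC"): 0.4, ("NC", "NC"): -0.4}`, the ambiguous middle word in `[["NC"], ["SP", "NC"], ["NC"]]` (as in "información sobre trenes", where "sobre" could be a preposition or a noun) settles on the preposition label.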
15Morphology (2)
me yo PP1CSO00
gustaría gustar VMCP1S0
información información NCFS000
sobre sobre SPS00
trenes tren NCMP000
de de SPS00
Guadalajara guadalajara NP000C0
a a SPS00
Cáceres cáceres NP000C0
para para SPS00
la la TDFS0
primera primero MOFS00
semana semana NCFS000
de de SPS00
agosto agosto NCMS000
. . Fp
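The tagged output above is a sequence of form, lemma and tag triples. As a minimal sketch of how a downstream step might consume it, the following Python reads the triples into records and picks out the tokens whose tag starts with NP, which in this example are the two city names. The names `Token` and `parse_tagged` are hypothetical and not part of MACO or RELAX.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    form: str   # word as it appears in the sentence
    lemma: str  # dictionary form
    tag: str    # morphological label chosen by the tagger


def parse_tagged(text: str) -> List[Token]:
    """Split whitespace-separated form/lemma/tag triples into Token records."""
    fields = text.split()
    if len(fields) % 3 != 0:
        raise ValueError("expected a sequence of form/lemma/tag triples")
    return [Token(*fields[i:i + 3]) for i in range(0, len(fields), 3)]


# The tagged example sentence from the slide.
TAGGED = """
me yo PP1CSO00 gustaría gustar VMCP1S0 información información NCFS000
sobre sobre SPS00 trenes tren NCMP000 de de SPS00
Guadalajara guadalajara NP000C0 a a SPS00 Cáceres cáceres NP000C0
para para SPS00 la la TDFS0 primera primero MOFS00
semana semana NCFS000 de de SPS00 agosto agosto NCMS000 . . Fp
"""

tokens = parse_tagged(TAGGED)
# Tokens tagged NP... in this sentence are the two place names.
proper_nouns = [t.form for t in tokens if t.tag.startswith("NP")]
```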
16Syntax (1)
- TACAT
- shallow parser
- context-free grammar adapted for the domain
- rules re-written for dates, timetables and proper names
- bottom-up strategy
- this adaptation helps semantic searches
17Syntax (2)
pos=>S pos=>patons pos=>pp1cso00, forma=>"Me", lema=>"yo"
pos=>grup-verb pos=>vmcp3s0, forma=>"gustaría", lema=>"gustar"
pos=>sn pos=>ncfs000, forma=>"información", lema=>"información"
pos=>grup-sp pos=>sps00, forma=>"sobre", lema=>"sobre"
pos=>sn pos=>ncmp000, forma=>"trenes", lema=>"tren"
pos=>grup-sp pos=>sps00, forma=>"de", lema=>"de"
pos=>sn pos=>np000c0, forma=>"Guadalajara", lema=>"Guadalajara"
pos=>grup-sp pos=>sps00, forma=>"a", lema=>"a"
pos=>sn pos=>np000c0, forma=>"Cáceres", lema=>"Cáceres"
pos=>grup-sp
.........
18Semantic Extraction (1)
Aim: generation of semantic frames
19Semantic Extraction (2)
- System implemented in PRE
- PRE
- production rule environment
- very flexible and robust
- rule conditions contain syntactic patterns and lexical items to search for
- priority, score and control specify rule application, location of the concept to extract, ...
20Semantic Extraction (3)
(rule CiudadOrigen3
ruleset CiudadOrigen
priority 10
score 0,_,1,0
control forever
ending Postrule
(InputSentence tree <a> tree_matching(
pos=>grup-sp
lema=> de|desde
pos=> np000c0, forma=>?forma
))
->
(?_ Print(CiudadOrigen,?forma))
(?_ REM(CiudadOrigen,X,a)))
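One reading of the rule above is: find a preposition with lemma de or desde followed by a proper noun (np000c0), and report that noun's form as CiudadOrigen (origin city). The sketch below approximates that reading in Python over a flat token list rather than a parse tree. It is an illustration only, not PRE syntax or PRE's matching algorithm, and the function name `ciudad_origen` and the token layout are assumptions.

```python
from typing import List, Optional, Tuple

# (form, lemma, tag), with lower-case tags as they appear in the parse output.
Token = Tuple[str, str, str]


def ciudad_origen(tokens: List[Token]) -> Optional[str]:
    """Return the first proper noun directly preceded by 'de' or 'desde'."""
    for (_, lemma, tag), (next_form, _, next_tag) in zip(tokens, tokens[1:]):
        if tag == "sps00" and lemma in {"de", "desde"} and next_tag == "np000c0":
            return next_form
    return None  # no origin city mentioned in this turn


# Fragment of the example sentence: "trenes de Guadalajara a Cáceres".
example = [
    ("trenes", "tren", "ncmp000"),
    ("de", "de", "sps00"),
    ("Guadalajara", "guadalajara", "np000c0"),
    ("a", "a", "sps00"),
    ("Cáceres", "cáceres", "np000c0"),
]
origin = ciudad_origen(example)
```

Because the proper-noun condition is part of the pattern, a phrase such as "semana de agosto" does not trigger the rule even though it contains the same preposition.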
21Understanding Module
22Dialogue Manager (1)
- Implemented using YAYA (Álvarez, 00)
- Reasoning engine combines
- frames from the understanding module, with
- facts from the dialogue history, and with
- axioms
- in order to generate
- reaction facts from the system
- Output based on frames
- for the natural language generator (content)
- for the recogniser (Speech Act prediction)
23Dialogue Manager (2)
Output Frame
Sentence to generate: De Guadalajara a Cáceres
qué día desea viajar? (From Guadalajara to
Caceres, when do you wish to travel?)
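The step from an incoming frame to the system's next question can be sketched as follows. This is a deliberately simplified illustration and not how YAYA works internally: the slot names (`origin`, `destination`, `date`), the fixed slot order and the functions `react` and `generate` are all hypothetical, and only the date question shown on this slide is generated.

```python
from typing import Dict, Optional

# Hypothetical slots a railway query needs, in the order they are asked for.
REQUIRED_SLOTS = ("origin", "destination", "date")


def react(frame: Dict[str, str], history: Dict[str, str]) -> Dict[str, object]:
    """Merge the new frame with the dialogue history and pick what to ask next."""
    known = {**history, **frame}  # values from the new frame take precedence
    missing = [slot for slot in REQUIRED_SLOTS if slot not in known]
    ask: Optional[str] = missing[0] if missing else None
    return {"known": known, "ask": ask}


def generate(output: Dict[str, object]) -> str:
    """Turn the output frame into a sentence (only the date question is sketched)."""
    known = output["known"]
    if output["ask"] == "date":
        # "date" is asked last, so origin and destination are already known here.
        return f"De {known['origin']} a {known['destination']} qué día desea viajar?"
    raise NotImplementedError("only the date question from the slide is sketched")


turn = react({"origin": "Guadalajara", "destination": "Cáceres"}, history={})
sentence = generate(turn)
```

Keeping the decision (`react`) separate from the wording (`generate`) mirrors the slide's point that the output frame feeds both the natural language generator and the recogniser.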
24Conclusions
- Corpus development: valuable resource
- Adaptation of general NLP tools for
- domain
- spontaneous speech dialogue
- Development of new tools
- semantic extraction (use of PRE): flexible, robust
- dialogue manager (use of YAYA): fast to develop, easy to modify
- Challenge: processing in real time