Title: RRL: A Rich Representation Language for the Description of Agent Behaviour in NECA
- Paul Piwek, ITRI, Brighton
- Brigitte Krenn, OFAI, Vienna
- Marc Schröder, DFKI, Saarbrücken
- Martine Grice, IPUS, Saarbrücken
- Stefan Baumann, IPUS, Saarbrücken
- Hannes Pirker, OFAI, Vienna
3 NECA
- Duration: 2.5 years
- Start: October 2001
- A new generation of mixed multi-user / multi-agent virtual spaces for the internet
- Populated by affective conversational agents
4 Affective Conversational Agents
- Express themselves through emotional speech and synchronised non-verbal expression
5 Application Scenarios
The NECA Platform will be evaluated in two concrete application scenarios:
- Socialite: a multi-user web application in the social domain
- eShowRoom: a novel approach to the presentation of products in e-commerce applications
6 Socialite
8 NECA's Architecture
User Input
Affective Reasoner (AR)
Scene Generator
Scene Description
13 NECA's Architecture
User Input
Affective Reasoner (AR)
Scene Generator
RRL: Scene Description
Multi-modal Natural Language Generator (M-NLG)
RRL: Multi-modal Output
Text/Concept to Speech Synthesis (CTS)
Emotional Speech
RRL: Phonetic/Prosodic Information
Gesture Assignment Module (GA)
RRL: Animation directives
Player-Specific Rendering
Animation Control Sequence
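The architecture can be read as a chain of modules that each hand an RRL document to the next one. The sketch below is only an illustration of that reading: `RRLDocument`, `enriched_by` and `run_pipeline` are hypothetical names, and the assumption that every module appends one layer to a shared document is ours, not something the slides state.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RRLDocument:
    """Hypothetical stand-in for an RRL document; not the real RRL schema."""
    layers: Tuple[Tuple[str, str], ...] = ()

    def enriched_by(self, module: str, layer: str) -> "RRLDocument":
        # Return a new document with one more information layer appended.
        return RRLDocument(self.layers + ((module, layer),))


# Module name and the RRL layer it emits, in the order shown on the slide.
STAGES: List[Tuple[str, str]] = [
    ("Scene Generator", "Scene Description"),
    ("M-NLG", "Multi-modal Output"),
    ("CTS", "Phonetic/Prosodic Information"),
    ("GA", "Animation directives"),
]


def run_pipeline(stages: List[Tuple[str, str]]) -> RRLDocument:
    """Pass one document through every stage in order."""
    doc = RRLDocument()
    for module, layer in stages:
        doc = doc.enriched_by(module, layer)
    return doc


if __name__ == "__main__":
    for module, layer in run_pipeline(STAGES).layers:
        print(f"{module} -> RRL: {layer}")
```

The document is immutable here so that each module's output can be inspected separately, which is one possible way to serve the predictability and locality requirements.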
14 Requirements for RRL
- Application Domain
- Represent combinations of different types of information
- Expressivity
- Processing Modules
- Ease of manipulation/search (incremental/fast)
- Developers (Maintainability)
- Predictability
- Locality
- Conciseness
- Intelligibility
15 Scene Description
SG
What is a Scene? I Theatr. 1 A subdivision of
(an act of) a play, in which the time is
continuous and the setting fixed, the action
and dialogue comprised in any one of these
subdivisions. (New Shorter Oxford English
Dictionary, 1996)
M-NLG
TTS/CTS
GA
16 Scene Descriptions in a Nutshell
- Network representations
- Flat, uniform
- Use the Description Logic T-box and A-box distinction: the T-box defines types, subtypes, attributes and constants
- Can emulate CFGs, so we can include, e.g., semantic representation languages such as Discourse Representation Theory (Kamp & Reyle, 1994)
- Reification of expressions in the network provides useful handles for interleaving different types of information
- Lends itself well to graphical representation
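As a rough illustration of the T-box/A-box split and of reified nodes acting as handles, consider the sketch below. All class, type and attribute names (`TBox`, `ABox`, `inform`, `dialogue_act`, `about`, and so on) are hypothetical and are not taken from the RRL specification; the A-box is assumed to hold the instance-level network, as in standard Description Logic usage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class TBox:
    """Type level: subtype links and the attributes each type may carry."""
    subtype_of: Dict[str, str] = field(default_factory=dict)
    attributes: Dict[str, Set[str]] = field(default_factory=dict)

    def is_a(self, type_name: Optional[str], ancestor: str) -> bool:
        # Walk up the subtype chain until the ancestor or a root is reached.
        while type_name is not None:
            if type_name == ancestor:
                return True
            type_name = self.subtype_of.get(type_name)
        return False

    def allowed_attributes(self, type_name: Optional[str]) -> Set[str]:
        # Attributes are inherited from all supertypes.
        allowed: Set[str] = set()
        while type_name is not None:
            allowed |= self.attributes.get(type_name, set())
            type_name = self.subtype_of.get(type_name)
        return allowed


@dataclass
class ABox:
    """Instance level: a flat network of typed nodes and labelled edges."""
    tbox: TBox
    nodes: Dict[str, str] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, node_type: str) -> None:
        self.nodes[node_id] = node_type

    def add_edge(self, source: str, attribute: str, target: str) -> None:
        # Reject attributes that the T-box does not license for the source type.
        if attribute not in self.tbox.allowed_attributes(self.nodes[source]):
            raise ValueError(f"{attribute!r} not allowed on {self.nodes[source]!r}")
        self.edges.append((source, attribute, target))


tbox = TBox(
    subtype_of={"inform": "dialogue_act"},
    attributes={"dialogue_act": {"speaker", "content"}, "emotion": {"about"}},
)
abox = ABox(tbox)
abox.add_node("a1", "inform")      # reified dialogue act
abox.add_node("d1", "drs")         # reified semantic content
abox.add_node("e1", "emotion")     # emotion annotation
abox.add_node("agent_1", "agent")
abox.add_edge("a1", "speaker", "agent_1")
abox.add_edge("a1", "content", "d1")
abox.add_edge("e1", "about", "a1")  # the id "a1" is the handle being reused
```

Because the dialogue act is a node with its own identifier, the emotion annotation can point at it without being nested inside the semantic representation, which is the interleaving role the slide attributes to handles.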
17 Scene Descriptions in a Nutshell
- Further Features of (RRL) Scene Descriptions
- For communication between modules: XML syntax
- Temporal relations are explicitly represented.
- Meta-conditions used in DRT for WH-questions, Topics and Bridging Anaphora
18 eShowRoom Example
19 eShowRoom Example
20 eShowRoom Example
21 eShowRoom Example
22 Multimodal Output
- Multimodal Natural Language Generation (M-NLG) supplies:
- Information on emotional state
- Conceptually rich input for Speech Synthesis
- Initial specification of gestures and facial expressions for later use in Gesture Assignment
23 NECA's Speech Synthesis: Emotions
- Not restricted to prosody (pitch, duration)
- Several voice databases
- Diphone inventories for different voice qualities (modal, loud, soft)
- Emotive interjections
- Gradual emotional states
- Shades of emotion / changing over time
24 NECA's Speech Synthesis: Concept-to-Speech
- Concept-to-Speech instead of Text-to-Speech approach
- Part of Speech tags
- Syntactic structure
- Information status (given/new)
- Information structure (theme/rheme)
25 CTS-specific information
<sentence>
  <text>This car has leather seats.</text>
  <gesture modality="voice" meaning="beautiful"/>
</sentence>
26 CTS-specific information
<sentence>
  <text>This car has leather seats.</text>
  <gesture modality="voice" meaning="beautiful"/>
  <word text="This" pos="PDAT"/>
  <word text="car" pos="NN"/>
  <word text="has" pos="VAFIN"/>
  <word text="leather seats" pos="NN"/>
  <punct text="." pos="."/>
</sentence>
27 CTS-specific information
<sentence>
  <text>This car has leather seats.</text>
  <gesture modality="voice" meaning="beautiful"/>
  <synPhrase category="NP" function="SB">
    <word text="This" pos="PDAT"/>
    <word text="car" pos="NN"/>
  </synPhrase>
  <synPhrase phrase="VP" function="PD">
    <word text="has" pos="VAFIN"/>
    <synPhrase phrase="NP" function="OA">
      <word text="leather seats" pos="NN"/>
    </synPhrase>
    <punct text="." pos="."/>
  </synPhrase>
</sentence>
28 CTS-specific information
<sentence>
  <text>This car has leather seats.</text>
  <gesture modality="voice" meaning="beautiful"/>
  <synPhrase category="NP" function="SB">
    <word text="This" pos="PDAT"/>
    <infoStatus type="referent-given">
      <word text="car" pos="NN"/>
    </infoStatus>
  </synPhrase>
  <synPhrase phrase="VP" function="PD">
    <word text="has" pos="VAFIN"/>
    <synPhrase phrase="NP" function="OA">
      <word text="leather seats" pos="NN"/>
    </synPhrase>
    <punct text="." pos="."/>
  </synPhrase>
</sentence>
29 CTS-specific information
<sentence>
  <text>This car has leather seats.</text>
  <gesture modality="voice" meaning="beautiful"/>
  <infoStruct part="theme">
    <synPhrase category="NP" function="SB">
      <word text="This" pos="PDAT"/>
      <infoStatus type="referent-given">
        <word text="car" pos="NN"/>
      </infoStatus>
    </synPhrase>
  </infoStruct>
  <infoStruct part="rheme">
    <synPhrase phrase="VP" function="PD">
      <word text="has" pos="VAFIN"/>
      <synPhrase phrase="NP" function="OA">
        <word text="leather seats" pos="NN"/>
      </synPhrase>
      <punct text="." pos="."/>
    </synPhrase>
  </infoStruct>
</sentence>
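To show how a downstream module might read this layered markup, here is a minimal sketch that walks the fully annotated sentence with Python's standard `xml.etree.ElementTree` and collects, for each word, its part-of-speech tag, its theme/rheme part and its information status. The function name `annotate_words` is hypothetical, and the XML is assumed to be well-formed, with each `infoStruct` closed before the next one opens.

```python
import xml.etree.ElementTree as ET

RRL_SENTENCE = """
<sentence>
  <text>This car has leather seats.</text>
  <gesture modality="voice" meaning="beautiful"/>
  <infoStruct part="theme">
    <synPhrase category="NP" function="SB">
      <word text="This" pos="PDAT"/>
      <infoStatus type="referent-given">
        <word text="car" pos="NN"/>
      </infoStatus>
    </synPhrase>
  </infoStruct>
  <infoStruct part="rheme">
    <synPhrase phrase="VP" function="PD">
      <word text="has" pos="VAFIN"/>
      <synPhrase phrase="NP" function="OA">
        <word text="leather seats" pos="NN"/>
      </synPhrase>
      <punct text="." pos="."/>
    </synPhrase>
  </infoStruct>
</sentence>
"""


def annotate_words(xml_text):
    """Return (text, pos, info-structure part, info status) for every word."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(node, part, status):
        # Carry the enclosing infoStruct / infoStatus values down the tree,
        # since ElementTree elements have no parent pointers.
        if node.tag == "infoStruct":
            part = node.get("part")
        elif node.tag == "infoStatus":
            status = node.get("type")
        elif node.tag == "word":
            rows.append((node.get("text"), node.get("pos"), part, status))
        for child in node:
            walk(child, part, status)

    walk(root, None, None)
    return rows


if __name__ == "__main__":
    for row in annotate_words(RRL_SENTENCE):
        print(row)
```

A speech synthesiser could use such rows to decide, for instance, that a given referent in the theme needs less prosodic prominence than new material in the rheme; that decision logic is outside this sketch.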
30 Prosodic/Phonetic Information for GA
- Phonetics: exact timing of speech sounds, pauses and interjections
- Prosody: boundary locations for syllables, words and prosodic phrases
31 Prosodic/Phonetic Information for GA
- Information on:
- syllables bearing word stress
- position and type of sentence accents
- position and type of prosodic boundaries
32 Animation directives
- Phonetic information (phonemes) used for specifying visemes and breathing
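One way to picture the step from timed phonemes to visemes is the sketch below. The `Phone` class, the phone symbols, the viseme labels and the `PHONE_TO_VISEME` table are all made up for illustration; NECA's actual inventories are not given in the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Phone:
    symbol: str
    start_ms: int
    end_ms: int


# Hypothetical mapping; "_" stands for a pause.
PHONE_TO_VISEME = {
    "p": "closed", "b": "closed", "m": "closed",
    "a": "open", "i": "spread", "u": "rounded",
    "_": "rest",
}


def viseme_track(phones: List[Phone]) -> List[Tuple[str, int, int]]:
    """Map timed phones to (viseme, start_ms, end_ms), merging adjacent repeats."""
    track: List[Tuple[str, int, int]] = []
    for phone in phones:
        viseme = PHONE_TO_VISEME.get(phone.symbol, "neutral")
        if track and track[-1][0] == viseme and track[-1][2] == phone.start_ms:
            # Same mouth shape continues without a gap: extend the last segment.
            track[-1] = (viseme, track[-1][1], phone.end_ms)
        else:
            track.append((viseme, phone.start_ms, phone.end_ms))
    return track


phones = [
    Phone("m", 0, 80),
    Phone("a", 80, 200),
    Phone("m", 200, 260),
    Phone("b", 260, 330),
    Phone("_", 330, 500),
]
print(viseme_track(phones))
```

Pauses are kept as their own `rest` segments, so a later step could choose them as candidate slots for breathing animation.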
33 Animation directives
- Prosodic information (stress, accents, phrasing) used for specifying synchronization of gestures with speech, eye-blinking and gaze
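The following sketch illustrates one possible use of prosodic timing for gesture synchronisation: starting a gesture early enough that its stroke coincides with the first accented syllable. The `Syllable` class, the accent label `"H*"`, the timings and the `stroke_offset_ms` parameter are assumptions made for the example, not part of the NECA specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Syllable:
    text: str
    start_ms: int
    end_ms: int
    accent: Optional[str] = None  # sentence accent type, None if unaccented


def schedule_gesture(syllables: List[Syllable], stroke_offset_ms: int) -> Optional[int]:
    """Return the gesture start time that puts the stroke on the first accented syllable.

    stroke_offset_ms is the time from gesture onset to its stroke.
    Returns None when no syllable carries an accent.
    """
    for syllable in syllables:
        if syllable.accent is not None:
            # Never schedule before the start of the utterance.
            return max(0, syllable.start_ms - stroke_offset_ms)
    return None


# Invented timings for "This car has leather seats."
syllables = [
    Syllable("this", 0, 180),
    Syllable("car", 180, 450, accent="H*"),
    Syllable("has", 450, 600),
    Syllable("lea", 600, 780),
    Syllable("ther", 780, 930),
    Syllable("seats", 930, 1300),
]
print(schedule_gesture(syllables, stroke_offset_ms=150))
```

Eye-blinks and gaze shifts could be scheduled in a similar way against prosodic phrase boundaries, although the slides do not say which anchor points are used.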
34 Conclusions
- RRL is a representation language for the wide range of expert knowledge required at the interfaces of NECA modules.
- Scene Descriptions: uniform representation/integration of different types of information (illustrated with integration of DRT) using handles
- Speech Synthesis: conceptually rich input as opposed to text
- Gesture Assignment: access to exact timing of speech