Getting started - PowerPoint PPT Presentation


Slides: 29
Provided by: Anna170

Transcript and Presenter's Notes

1
Preparing Future multiSensorial inTerAction
Research
2
Outline
  • Introduction
  • Project objectives
  • Participants
  • Objectives, Scenarios, Approaches for each WP
  • Expected results
  • Main steps

3
Introduction

  • Speech to speech translation
  • Multilingual and Multisensorial Communication (MMC)
  • Detection and expression of emotional states
  • Core speech technologies for children
PFSTAR intends to help put future activities in
the field of MMC on firmer bases by providing
technological baselines, comparative evaluations,
and assessments of the prospects of core
technologies, on which future research and
development efforts can build.
4
Project objectives
  • The project builds on years of research
    already conducted in several national and
    international research projects (NESPOLE!,
    C-STAR, Verbmobil, SmartKom). PFSTAR wants to
    improve, refine, stabilise, and align current
    achievements to turn them into true technological
    baselines, along with careful assessments and
    evaluations.

5
Project objectives
  • The goal of this project is to help advance
    research and lay the foundations for future
    efforts on the topic of Multilingual and
    Multisensorial Communication.

6
Participants
  • Istituto Trentino di Cultura, Centro per la
    Ricerca Scientifica e Tecnologica (ITC-irst)
  • Interactive Systems Laboratories at Universitaet
    Karlsruhe (UKA)
  • Institute for Pattern Recognition of
    Friedrich-Alexander Universitaet
    Erlangen-Nürnberg (UERLN)
  • Department of Electronic, Electrical & Computing
    Engineering of the University of Birmingham (UB)
  • Kungl Tekniska Högskolan (KTH)
  • RWTH Computer Science Department
  • Istituto di Scienze e Tecnologie della
    Cognizione, Sezione di Padova "Fonetica e
    Dialettologia", CNR
7
WP2 Technologies for speech translation
Objectives
  • Comparative evaluation and integration of
    different technological baselines for speech to
    speech translation over a range of application
    scenarios.

8
WP2 Technologies for speech translation
Scenarios
  • Human to human interaction (tourism and
    travelling domains)
  • Document translation (open domain)
  • Cross-language information retrieval (open
    domain)
9
WP2 Technologies for speech translation
  • Approaches
  • Interlingua-based approaches
  • Direct translation approaches (statistical
    models)

10
WP3-WP4 Technologies for emotions
  • Common Objectives
  • Consideration of the emotional state of both
    partners in computer-mediated human-human
    communication, to enhance the quality of the
    exchange.
  • Understand how the machine can support
    emotionally more adequate exchanges in
    Human-Computer Interaction.
  • Extend attention to observable paralinguistic and
    extra-linguistic markers, besides the linguistic
    ones.

11
WP3-WP4 Technologies for emotions
  • Approach
  • Two workpackages
  • WP3 focusing on speech (analysis/recognition and
    synthesis)
  • WP4 focusing on synthetic faces

12
WP3 Technologies for emotions: speech
  • Objectives
  • Identification, extraction and assessment of
    prosodic and other linguistic cues correlated
    with, and indicating the expression of emotional
    states in speech.
  • Definition and assessment, in conjunction with
    WP4, of a technological baseline for believable
    expressive agents (talking heads), capable of
    communicating emotions through speech and facial
    gestures.

13
WP3 Technologies for emotions: speech
Scenario
  • Analysis: automatic dialogue systems,
    interaction with entertainment robots, and
    human-machine, telephone-based communication.
  • Synthesis: a broad scenario in which
    communication is mediated by expressive agents.

14
WP3 Technologies for emotions: speech
  • Approaches: analysis
  • Use of a large feature vector modelling the
    chosen prosodic parameters: fundamental frequency
    (F0), energy, duration, and pauses.
  • For each relevant emotional phenomenon, a separate
    classifier will be used, whose output will be a
    probability rating.
  • All probabilities will be weighted (using
    automatic optimisation methods), yielding a
    single probability for each emotional state.
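
The weighting step above can be illustrated with a minimal sketch. It assumes each classifier returns a probability rating per emotional state and that the weights are already known; the slide says they would be found by automatic optimisation, which is not shown here. The function name, the state labels, and all numbers are hypothetical.

```python
from typing import Dict, List


def fuse_probabilities(
    classifier_outputs: List[Dict[str, float]],
    weights: List[float],
) -> Dict[str, float]:
    """Combine per-classifier ratings into one probability per emotional state.

    Uses a weighted average, which is only one possible fusion rule.
    """
    if not classifier_outputs:
        raise ValueError("need at least one classifier output")
    if len(classifier_outputs) != len(weights):
        raise ValueError("one weight per classifier is required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = {}
    for state in classifier_outputs[0]:
        weighted_sum = sum(
            w * output[state] for w, output in zip(weights, classifier_outputs)
        )
        fused[state] = weighted_sum / total_weight
    return fused


# Hypothetical ratings from two classifiers (for example one driven by
# F0 features and one by duration features).
outputs = [
    {"neutral": 0.6, "angry": 0.4},
    {"neutral": 0.2, "angry": 0.8},
]
fused = fuse_probabilities(outputs, weights=[1.0, 3.0])
print(fused)
```

Because the weights are normalised by their sum, the fused values still add up to one whenever each classifier's ratings do.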

15
WP3 Technologies for emotions: speech
  • Approaches: synthesis
  • Correlate syntactic and pragmatic/semantic
    parameters to prosodic ones, finding the
    combinations that yield better predictions.
  • These will then be used to build a prosodic model
    for each emotional state.
  • Use of different classifiers, like classification
    and regression trees, linear regression
    techniques, neural networks, etc., comparing and
    integrating their results.
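
The search for parameter combinations that yield better predictions can be sketched as follows. This is a toy illustration only: a simple lookup-table predictor stands in for the classification and regression trees, linear regression, and neural networks named above, and the parameter names and F0 values are invented.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy data: two categorical predictors and a target F0 value (Hz).
ROWS = [
    {"sentence_type": "question", "focus": "yes", "f0": 220.0},
    {"sentence_type": "question", "focus": "yes", "f0": 220.0},
    {"sentence_type": "question", "focus": "no", "f0": 220.0},
    {"sentence_type": "question", "focus": "no", "f0": 220.0},
    {"sentence_type": "statement", "focus": "yes", "f0": 180.0},
    {"sentence_type": "statement", "focus": "yes", "f0": 180.0},
    {"sentence_type": "statement", "focus": "no", "f0": 180.0},
    {"sentence_type": "statement", "focus": "no", "f0": 180.0},
]


def loo_error(rows, features, target="f0"):
    """Leave-one-out mean absolute error of a lookup-table predictor."""
    errors = []
    for i, held_out in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        key = tuple(held_out[f] for f in features)
        matches = [
            r[target] for r in train if tuple(r[f] for f in features) == key
        ]
        # Fall back to the overall training mean for unseen combinations.
        prediction = mean(matches) if matches else mean(r[target] for r in train)
        errors.append(abs(prediction - held_out[target]))
    return mean(errors)


def rank_feature_combinations(rows, candidates, target="f0"):
    """Return (error, combination) pairs sorted from best to worst."""
    results = []
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            results.append((loo_error(rows, combo, target), combo))
    return sorted(results)


for error, combo in rank_feature_combinations(ROWS, ["sentence_type", "focus"]):
    print(combo, round(error, 2))
```

In this invented data set F0 depends only on the sentence type, so the combination containing that parameter alone already predicts the held-out rows, while the focus parameter on its own does not.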

16
WP4 Technologies for emotions: synthetic faces
  • Objectives
  • Definition and assessment of a technological
    baseline for believable virtual agents in the
    form of talking heads, which can communicate
    emotions by using both the speech synthesis to
    be developed in WP3 and facial gestures.
17
WP4 Technologies for emotions: synthetic faces
Scenario
  • Human-computer interactive communication
  • Spoken dialogue systems
  • 3D animated agents
18
WP4 Technologies for emotions: synthetic faces
  • Approaches
  • Development of a model of predefined prototypical
    facial gestures for the relevant subset of basic
    emotions.
  • Based on available and collected data, the
    generation models will be augmented to handle the
    complex interaction/integration of the linguistic
    and extralinguistic signals. The result will be a
    set of gesture libraries for controlling the
    facial expression of emotions.

19
WP5 Speech technologies for children
Objectives
Establish ASR (automatic speech
recognition) baselines for children's speech in
English, German, Italian and Swedish.
20
WP5 Speech technologies for children
Scenario
  • Reading tutor
  • Interactive learning tools
  • Conversational interfaces for children

21
WP5 Speech technologies for children
  • Approaches
  • Acoustic feature extraction
  • Inter-speaker acoustic variability reduction
    through vocal tract length normalization
  • Acoustic modeling
  • Recognition of spontaneous speech spoken by
    children
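
Vocal tract length normalization is commonly implemented as a warping of the frequency axis by a per-speaker factor. The sketch below shows one widely used piecewise-linear form; it is not taken from the project itself, and the function name, the warping factor `alpha`, the Nyquist frequency, and the breakpoint ratio are all assumptions chosen for illustration. How `alpha` is estimated for each speaker is outside this sketch.

```python
def warp_frequency(
    freq_hz: float,
    alpha: float,
    nyquist_hz: float = 8000.0,
    breakpoint_ratio: float = 0.85,
) -> float:
    """Piecewise-linear frequency warp for vocal tract length normalization.

    Frequencies below a breakpoint are scaled by alpha; above it, a second
    linear segment is used so that the Nyquist frequency maps to itself.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= freq_hz <= nyquist_hz:
        raise ValueError("frequency must lie between 0 and the Nyquist frequency")

    # Keep alpha * breakpoint below the Nyquist frequency when alpha > 1.
    breakpoint_hz = breakpoint_ratio * nyquist_hz * min(1.0, 1.0 / alpha)
    if freq_hz <= breakpoint_hz:
        return alpha * freq_hz

    warped_breakpoint = alpha * breakpoint_hz
    slope = (nyquist_hz - warped_breakpoint) / (nyquist_hz - breakpoint_hz)
    return warped_breakpoint + slope * (freq_hz - breakpoint_hz)


# Hypothetical warping factor for one speaker.
for f in (500.0, 2000.0, 7000.0, 8000.0):
    print(f, warp_frequency(f, alpha=1.1))
```

The second segment keeps the warped axis within the original bandwidth, so the same filterbank range can be used for every speaker.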

22
Expected Results
PFSTAR intends to provide the European R&D
community with the technological baselines for
future research and development efforts, with a
strong focus on achieving the common goal of
bringing a solution to Multilingual and
Multisensorial Communication.
23
Expected results WP2
Speech translation technologies
  • Improvement on current baselines
  • Comparison across various application scenarios
    of different approaches to contribute to the
    definition of new research directions and
    specific target applications for each approach.

24
Expected results WP3
Technologies for emotions: speech (1)
  • Baseline results for different parameters
  • Recommendations for where to put more intensive
    research (classification technology, prosodic
    features, linguistic features, and units to be
    classified) based on results from realistic data
    rather than predefined sentences.

25
Expected results WP3
Technologies for emotions: speech (2)
  • A classification of the different emotion classes
    which will be tunable according to a cost
    function, so that the overall system performance,
    rather than the pure recognition rate, can be
    optimised
  • Assessment of the interplay of different
    linguistic parameters in synthesis.
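
One standard way to make a classification tunable by a cost function is to pick the decision with the lowest expected cost rather than the most probable class. The sketch below assumes this minimum-expected-cost rule; the slide does not say which rule the project would use, and the function name, emotion labels, probabilities, and cost values are hypothetical.

```python
from typing import Dict


def min_cost_decision(
    posteriors: Dict[str, float],
    cost: Dict[str, Dict[str, float]],
) -> str:
    """Return the decision with the lowest expected cost.

    cost[decision][true_state] is the penalty for making `decision`
    when the speaker is actually in `true_state`.
    """
    def expected_cost(decision: str) -> float:
        return sum(p * cost[decision][state] for state, p in posteriors.items())

    return min(cost, key=expected_cost)


# Hypothetical classifier output for one utterance.
posteriors = {"neutral": 0.7, "angry": 0.3}

# Symmetric costs reproduce the usual "most probable class" decision.
symmetric = {
    "neutral": {"neutral": 0.0, "angry": 1.0},
    "angry": {"neutral": 1.0, "angry": 0.0},
}

# Missing an angry user is assumed to hurt the overall system far more
# than a false alarm.
asymmetric = {
    "neutral": {"neutral": 0.0, "angry": 5.0},
    "angry": {"neutral": 1.0, "angry": 0.0},
}

print(min_cost_decision(posteriors, symmetric))
print(min_cost_decision(posteriors, asymmetric))
```

With the asymmetric table the expected cost of deciding "neutral" is 0.3 × 5 = 1.5 against 0.7 × 1 = 0.7 for "angry", so the decision changes even though the probabilities do not.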

26
Expected results WP4
Technologies for emotions: synthetic faces
  • Definition and assessment of a technological
    baseline for believable virtual agents in the
    form of talking heads
  • Collection and annotation of a relatively small
    but varied database of audiovisual emotional
    speech in dialogue situations in the target
    languages, Italian and Swedish

27
Expected results WP5
Speech technologies for children
  • Baselines for the involved languages (English,
    Italian, Swedish and German), with a significant
    increase in recognition rate.
  • An understanding of the extent of inter-speaker
    variability and of intra-speaker variability with
    respect to adults.
  • An assessment of the importance of
    children-specific pronunciation dictionaries and
    children-specific language models.

28
PFSTAR main steps
  • M1: Specifications for technological baselines
    and assessment procedures
  • M2: First set of results
  • M3: Final set of results
  • M4: Final Workshop open to external participation
Time line: m1, m16, m24, after m24