Research Directions in Multimodal/Multimedia Systems for Children - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Research Directions in Multimodal/Multimedia Systems for Children


1
Research Directions in Multimodal/Multimedia
Systems for Children
  • Alexandros Potamianos
  • Dept. of Electronic and Computer Engineering
  • Technical University of Crete, Greece
  • September 2004

2
Outline
  • Motivation and Goals
  • Recent Work
  • Acoustic Analysis
  • Acoustic Modeling
  • Linguistic Analysis/Modeling
  • Pragmatics/Dialogue Analysis/Modeling
  • HCI and human factors
  • Speech/Multimodal Interfaces
  • Dialogue/Multimodal Systems
  • Research Directions

3
Motivation and Goals
  • The dynamics of man-machine interaction differ between children and adults
  • Spontaneous children's speech exhibits a greater degree of acoustic and linguistic variability.
  • Problem solving skills and approaches differ with
    age.
  • Current spoken language technology is not robust enough to handle spontaneous children's speech (open research issues).
  • Little work exists in
  • Analysis and modeling of conversational user interfaces for children.
  • Investigating multiple modalities of
    child-machine interaction.

4
Previous Work
  • Acoustic and linguistic analysis (Eguchi and Hirsh 1969, Kent 1976, Goldstein 1980)
  • Babbling and initial language acquisition (Wexler
    and Culicover 1980)
  • Adults speaking to children (Fernald and Mazzie 1991)
  • Speech disorders (JSHR)
  • Educational systems using speech recognition (Strommen and Frome 1993, Mostow et al. 1995)

5
Recent Work
  • Acoustic Analysis
  • Acoustic Modeling
  • Linguistic Analysis/Modeling
  • Pragmatics/Dialogue Analysis/Modeling
  • HCI and human factors
  • Speech/Multimodal Interfaces
  • Dialogue/Multimodal Systems

6
Acoustic Analysis
  • What has been done
  • Ages 6-18
  • American English
  • Pitch
  • Formant Frequencies
  • Duration
  • Spectral Variability
  • Other work
  • Language acquisition
  • Speech pathologies

7
(No Transcript)
8
(No Transcript)
9
Acoustic Analysis Results
  • Mean and variance of acoustic correlates reach the adult range around 13-14 years
  • Children younger than 10 years show greater within-subject variability than adults
  • Formant values scale linearly with age, especially for males (see the sketch below)
  • Variability may reach its minimum around 14-16 years
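
A minimal sketch of the linear-trend fit behind the observation that formant values scale with age; the (age, F1) values below are purely hypothetical illustrations, not measurements from this work.

```python
# Minimal sketch: fit a linear trend of formant frequency vs. age.
# The (age, F1) pairs are hypothetical, not data from this study.
import numpy as np

ages = np.array([6, 8, 10, 12, 14, 16, 18], dtype=float)
f1_hz = np.array([950, 900, 860, 820, 790, 770, 760], dtype=float)  # hypothetical F1 values

slope, intercept = np.polyfit(ages, f1_hz, deg=1)
print(f"F1 ~ {slope:.1f} Hz/year * age + {intercept:.1f} Hz")
# A negative slope reflects formants dropping toward adult values as the vocal tract grows.
```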

10
Acoustic Analysis
  • What could be done short term
  • Investigate non-vowel phonemes for American
    English
  • Investigate other languages
  • Investigate English as a second language (non-native speech)
  • What could be done long term
  • Ages 3-6
  • Other work
  • Language acquisition
  • Speech pathologies

11
Acoustic Modeling
  • What has been done
  • ASR baseline per age
  • Matched conditions (train and test on children)
  • Mismatched conditions (train on adults, test on children)
  • Vocal Tract Length Normalization (VTLN)
  • Global warping factor
  • Per-utterance warping factor (see the sketch below)
  • Adaptation techniques
  • Bias removal, MLLR, MAP, other
  • Acoustic modeling: age-dependent models
  • Combinations of VTLN, adaptation, and acoustic modeling
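
As a rough illustration of the per-utterance warping-factor idea above, here is a minimal Python sketch of grid-search VTLN: try a set of warping factors and keep the one that maximizes the acoustic-model likelihood. The `warp_features` and `log_likelihood` functions are hypothetical placeholders, not the actual front-end or models used in this work.

```python
# Minimal sketch of per-utterance VTLN via grid search over warping factors.
import numpy as np

WARP_FACTORS = np.arange(0.80, 1.21, 0.02)  # typical VTLN search grid

def warp_features(utterance_audio, alpha):
    """Hypothetical placeholder: extract features with the frequency axis warped by alpha."""
    raise NotImplementedError

def log_likelihood(features, acoustic_model):
    """Hypothetical placeholder: score the features against the acoustic model."""
    raise NotImplementedError

def best_warp_factor(utterance_audio, acoustic_model):
    """Return the warping factor whose warped features score highest."""
    scores = [log_likelihood(warp_features(utterance_audio, a), acoustic_model)
              for a in WARP_FACTORS]
    return float(WARP_FACTORS[int(np.argmax(scores))])
```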

12
(No Transcript)
13
Acoustic Modeling Results
  • For matched conditions (train and test on children)
  • for children aged 7 (?) and older, performance is similar to adults
  • For mismatched conditions (train on adults, test on children)
  • Performance is significantly lower (2-5 times the adult error rate)
  • Varies with age
  • VTLN + adaptation reduces the adult-child performance gap by about 60% (arithmetic sketched below)
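
The 60% figure is a relative reduction of the adult-child error gap; a minimal sketch of that arithmetic, using purely hypothetical WER numbers, follows.

```python
# Minimal sketch of the gap-reduction arithmetic; the WER values are hypothetical.
adult_wer = 10.0           # adults tested on adult-trained models
child_wer_baseline = 35.0  # children on adult-trained models, no compensation
child_wer_vtln = 20.0      # children after VTLN + adaptation

gap_before = child_wer_baseline - adult_wer
gap_after = child_wer_vtln - adult_wer
reduction = 100.0 * (gap_before - gap_after) / gap_before
print(f"Adult-child performance gap reduced by {reduction:.0f}%")  # 60% in this example
```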

14
Acoustic Modeling
  • What could be done
  • Front-end, features
  • Enhanced VTLN algorithms
  • Other adaptation algorithms
  • Better acoustic modeling

15
Linguistic Analysis/Modeling
  • What has been done
  • Spontaneous speech
  • Length of Utterances, Pause Time, Duration,
    Filled Pauses
  • Linguistic exploration, linguistic perplexity (see the perplexity sketch below).
  • Extraneous Speech, Variability Across Similar
    Utterances
  • Effects of Task Experience, Age and Gender on
    language usage
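
One simple way to quantify the "linguistic perplexity" mentioned above is per-word perplexity under a smoothed unigram model; a minimal Python sketch, with hypothetical example transcripts, follows.

```python
# Minimal sketch: per-word perplexity under an add-one-smoothed unigram model.
# The training and test utterances are hypothetical examples.
import math
from collections import Counter

def unigram_perplexity(train_utts, test_utts):
    counts = Counter(w for utt in train_utts for w in utt.split())
    total, vocab_size = sum(counts.values()), len(counts)
    log_prob, n_words = 0.0, 0
    for utt in test_utts:
        for w in utt.split():
            p = (counts.get(w, 0) + 1) / (total + vocab_size + 1)  # add-one smoothing
            log_prob += math.log(p)
            n_words += 1
    return math.exp(-log_prob / n_words)

train = ["go left", "go right", "arrest the suspect"]
test = ["go left", "where did the suspect go"]
print(unigram_perplexity(train, test))
```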

16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
Linguistic Analysis Results
  • No major age- or gender-related differences for the 8-14 age group in
  • Linguistic perplexity
  • Length of utterances
  • Linguistic exploration
  • Girls aged 11-13 display a larger vocabulary, more exploration, and somewhat longer utterances than boys
  • Disfluencies and hesitations as a function of age and gender (rates tallied as in the sketch below)
  • Frequency of false starts (2% of utterances) and mispronunciations (2% of utterances) is greater for the younger children than the older ones.
  • Breathing noise (4% of utterances) is 60% more common in younger children.
  • Frequency of filled pauses (8% of utterances) for older children is twice that of the younger ones.
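
A minimal sketch of the bookkeeping behind such disfluency rates: tag each utterance, then tally tagged utterances per age group. The tags and records below are hypothetical.

```python
# Minimal sketch: disfluency rates (as % of utterances) per age group.
from collections import defaultdict

# Each record: (age, set of disfluency tags observed in the utterance) -- hypothetical data
utterances = [
    (8, {"false_start"}), (9, set()), (9, {"breath"}),
    (12, {"filled_pause"}), (13, set()), (14, {"filled_pause", "breath"}),
]

def disfluency_rate(records, tag, split_age=11):
    counts = defaultdict(lambda: [0, 0])  # group -> [tagged utterances, total utterances]
    for age, tags in records:
        group = "younger (8-10)" if age < split_age else "older (11-14)"
        counts[group][0] += tag in tags
        counts[group][1] += 1
    return {g: 100.0 * tagged / total for g, (tagged, total) in counts.items()}

print(disfluency_rate(utterances, "filled_pause"))
```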

21
Linguistic Analysis/Modeling
  • What could be done
  • Pronunciation modeling
  • Linguistic analysis of non-native speech
  • Linguistic analysis/modeling of spontaneous
    speech
  • More tasks (simpler/more complex)
  • More data
  • Better age coverage (challenge)

22
Pragmatics/Dialog Analysis
  • What has been done
  • Dialog strategies for problem solving
  • Determine factors related to task completion,
    time to completion, skipping dialog states,
    multiple requests
  • Determine the role of age, gender, and experience
    level on dialog interaction
  • Stereotypical dialogue modeling

23
Game Screen
24
Dialog-Tagging Tool
25
Dialog States
  • Navigate - Moving within a state (i.e., left or right)
  • Talk2Him - To stop a person for questioning
  • WhereDid - Ask the question "Where did the suspect go?"
  • Arrest - Arrest the suspect
  • TellAbout - Ask the question "Tell me about the suspect"
  • Goodbye - Tell the person "Thank you, goodbye"
  • Merged State - Miscellaneous
  • Cluebook - Go to the Magnifying Glass where clues are written
  • CloseDatabase - Get out of the Atlas
  • Enterfeature - Enter a clue about a suspect's appearance
  • ActionState - Go to a state in the United States
  • Warrant - Obtain a warrant for a suspect's arrest
  • Find - Look up a location
  • Database - Go to the Atlas to look up a location (the full state set is enumerated in the sketch below)
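
For concreteness, the dialog-state inventory above can be written down as a small Python enumeration that tagged utterances are mapped onto; this is an illustration only, not the tagging tool used in the study.

```python
# Minimal sketch: the dialog states of the game FSM as a Python Enum.
from enum import Enum, auto

class DialogState(Enum):
    NAVIGATE = auto()        # moving within a state (left or right)
    TALK2HIM = auto()        # stop a person for questioning
    WHERE_DID = auto()       # "Where did the suspect go?"
    ARREST = auto()          # arrest the suspect
    TELL_ABOUT = auto()      # "Tell me about the suspect"
    GOODBYE = auto()         # "Thank you, goodbye"
    MERGED_STATE = auto()    # miscellaneous
    CLUEBOOK = auto()        # magnifying glass where clues are written
    CLOSE_DATABASE = auto()  # leave the Atlas
    ENTER_FEATURE = auto()   # enter a clue about a suspect's appearance
    ACTION_STATE = auto()    # go to a state in the United States
    WARRANT = auto()         # obtain a warrant for a suspect's arrest
    FIND = auto()            # look up a location
    DATABASE = auto()        # go to the Atlas to look up a location

print(len(DialogState))  # 14 states
```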

26
Navigation Graph
Double queries are about 7 times more common than
single queries
27
Dialog Diagrams
Cluebook Diagram
Database Diagram
Single and multiple feature entries were equally
common
Single-question queries comprised about 60%
28
Dialog Data Analysis
  • Speech utterances were assigned to dialog states based on the actions triggered (manual tagging), i.e., the dialog FSM is a superset of the game FSM.
  • Talk2Him - commands asking for a character's attention.
  • TellMeAbout - queries about a suspect's whereabouts and physical characteristics.
  • Dialog state transitions analyzed as a function of age (8-10 vs. 11-14 year olds), gender, and experience levels (see the sketch below).
  • Extraneous speech patterns in dialog flow modeled.
  • Dialog flow differences between voice and keyboard/mouse modalities identified.
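
A minimal sketch of the transition analysis referenced above: count dialog-state transitions separately for the two age groups. The session data shown is hypothetical.

```python
# Minimal sketch: dialog-state transition counts per age group.
from collections import Counter, defaultdict

# Each session: (age, ordered list of dialog-state names) -- hypothetical data
sessions = [
    (9,  ["Navigate", "Talk2Him", "WhereDid", "Goodbye"]),
    (13, ["Navigate", "Database", "Find", "CloseDatabase", "Warrant"]),
]

def transition_counts(sessions, split_age=11):
    counts = defaultdict(Counter)  # age group -> Counter of (from_state, to_state) pairs
    for age, states in sessions:
        group = "8-10" if age < split_age else "11-14"
        for src, dst in zip(states, states[1:]):
            counts[group][(src, dst)] += 1
    return counts

for group, trans in transition_counts(sessions).items():
    print(group, trans.most_common(3))
```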

29
Dialog Modeling Results
  • Dialog flow patterns of male and female children are very similar.
  • Dialog flow structure is age-dependent due to differences in game-playing skills; older children
  • complete the game faster (take fewer turns).
  • spend less time in database search (more knowledgeable).
  • attempt multiple sub-tasks simultaneously (double queries).
  • Extraneous speech utterances are age-dependent, speaker-dependent, and dialog-context dependent; for younger children
  • there were twice as many extraneous utterances as for the older ones.
  • on average 5% of all utterances were extraneous.
  • the number of extraneous utterances ranged between 0-25% among individuals (7% variance).

30
Dialog Modeling
  • What could be done
  • Dialogue modeling as a function of age
  • More tasks (simpler/harder)
  • Better age coverage (challenge)

31
Speech Interface Human Factors
  • What has been done
  • Determine preference for voice versus
    conventional interfaces
  • Consider effects of
  • Age and gender
  • Task success (win/loss of game)
  • Level of experience
  • Investigate other human factor issues
  • Multi-modal interfaces for children (prototypes)

32
Population Statistics and Solicitation
  • Permission was obtained from superintendents and flyers were distributed to the Summit and Berkeley Heights School Districts
  • 15% response rate
  • Consent forms to collect and analyze children's speech were signed by each participant's parent or guardian

33
Exit Interview
  • Example Questions
  • What did you like about using voice activation?
  • What did you like/dislike about the game?
  • What other things would you like to see become
    voice activated?
  • Would you like to use voice with a keyboard and a
    mouse?
  • Rate each item on a scale of 1-5 (5 = high); ratings summarized in the sketch below
  • Voice interface
  • Game
  • Use of headset
  • Database search
  • Error messages (TTS)
  • Multi-modal interface
  • Previous computer usage
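
A minimal sketch of how such 1-5 ratings can be summarized per item (mean rating and the share of children rating an item 4 or higher); the ratings listed are hypothetical.

```python
# Minimal sketch: summarize exit-interview ratings per item; data is hypothetical.
from collections import defaultdict

ratings = [  # (item, rating on a 1-5 scale)
    ("voice interface", 5), ("voice interface", 4), ("voice interface", 3),
    ("game", 4), ("game", 5), ("game", 2),
]

by_item = defaultdict(list)
for item, score in ratings:
    by_item[item].append(score)

for item, scores in by_item.items():
    high = 100.0 * sum(s >= 4 for s in scores) / len(scores)
    print(f"{item}: mean {sum(scores) / len(scores):.1f}, rated >=4 by {high:.0f}%")
```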

34
Game and Voice Enjoyment
Children's Response to Voice
  • 93% of children rated using their voice at least 4 out of 5, while only 81% rated the game a 4 or 5.
  • Enthusiasm for voice peaked in the 11-12 age range.

35
Effect of Subjects' Success
  • Game and voice enjoyment increased with number of
    games won

36
Effect of Gender
  • Gender had a negligible effect on subjects' ratings

37
Multi-modal interface
  • Dislike of having to spell decreased with age
  • 2/3 of children preferred a multi-modal interface
    to voice only

38
Other Effects
  • Dislike of error messages (TTS) decreased with age
  • Enjoyment of headset roughly correlated with
    enjoyment of game

39
Human Factor Results
  • Age-dependent factors had mostly to do with the game rather than the interface
  • Exception: text-to-speech synthesis (younger kids did not like it)
  • Kids liked interacting with the computer using voice
  • Kids prefer combining interface modalities
  • Recognition accuracy and speed are crucial to the success of the application

40
Multiple Modalities
  • Voice vs. keyboard and mouse, based on data from 12 children
  • Total number of commands roughly the same for the navigation/query and database entry sub-tasks.
  • 50% more actions in database search with keyboard and mouse.
  • Greetings ("Thank you", "Goodbye") reduced by a factor of 3 with keyboard and mouse.

Although voice might not always be the most
efficient modality, it is the most natural one.
41
Interfaces: Future Work
  • Analysis of uni-modal and multi-modal interface
    usage as a function of age
  • Adaptive interfaces
  • Adapt to age and experience of child
  • Educational systems and interfaces
  • Interfaces for children with disabilities
  • Interfaces with intelligence and personality
  • Multi-modal interfaces

42
Systems
  • Prototypes
  • Toys
  • Desktop
  • Educational
  • Children with special needs

43
Future Directions Summary
  • Acoustic Analysis
  • Investigate non-vowel phonemes for American
    English
  • Investigate other languages
  • Investigate English as a second language (non-native speech)
  • Ages 3-6
  • Language acquisition
  • Speech pathologies
  • Acoustic Modeling
  • Front-end, features
  • Enhanced VTLN algorithms
  • Other adaptation algorithms
  • Better acoustic modeling

44
Future Directions Summary
  • Linguistic Analysis/Modeling
  • Pronunciation modeling
  • Linguistic analysis of non-native speech
  • Linguistic analysis/modeling of spontaneous
    speech
  • More tasks, more data, better age coverage
  • Dialogue Analysis/Modeling
  • Dialogue modeling as a function of age
  • More tasks, better age coverage (challenge)

45
Future Directions Summary
  • Interfaces/Human Factors
  • Analysis of interface usage as a function of age
    (challenge)
  • Adaptive interfaces: adapt to the age and experience of the child
  • Educational systems and interfaces
  • Interfaces for children with disabilities
  • Interfaces with intelligence and personality
  • Multi-modal interfaces
  • Systems
  • Toys
  • Desktop
  • Education
  • Children with special needs

46
Conclusions
  • Good progress on acoustic analysis and acoustic
    modeling
  • Technology works!
  • Less work on language/dialogue/interface aspects
  • Interesting prototype systems built (including
    multi-modal)
  • Not many products using speech recognition for
    children