1
System-Level Evaluation of ECAs
  • Dagstuhl EECA Symposium
  • March 15-19, 2004

2
Participants
  • W. Lewis Johnson, moderator
  • Tim Barker
  • Niels Ole Bernsen
  • Gautam Biswas
  • Marc Cavazza
  • Noor Christoph
  • Michael Gerhard
  • Anton Nijholt
  • Helmut Prendinger

3
Topics
  • Contextual factors that influence evaluation
  • Methodologies and methods for system-level
    evaluation
  • Relation to development methodologies
  • Evaluating user interaction
  • Evaluating user experience
  • Evaluating system effectiveness
  • Evaluating ECA characteristics in context

4
System-Level Evaluation Perspective
  • Rather than focus on evaluation of ECAs out of
    context, we considered:
  • Evaluation of user-ECA interaction
  • E.g., how fluent and successful is it?
  • Evaluation of the user experience
  • What is the user's perception of interacting
    with the ECA?
  • Evaluation of effectiveness
  • Does the ECA help to achieve the desired
    outcomes?
  • System-level evaluation requires
  • A functional ECA capable of interaction
  • A realistic usage context

5
Contextual Factors
  • A number of contextual factors can influence
    system-level evaluation

6
Contextual Factors
  • Target user group
  • Novice users
  • Young people
  • Disabled users
  • Low-literacy users
  • Users with motivational difficulties

7
Contextual Factors
  • Situational context of use
  • In the schools
  • In the living room
  • In the field
  • In the office
  • What is relevant here?
  • Public / private space
  • Noisy, distractive / quiet environments
  • Portable vs. fixed systems
  • Safety-critical vs. safe work environments

8
Contextual Factors
  • Context of activity
  • Isolated activity (e.g., games, current Web sales
    agents)
  • Integrated activity (e.g., within curriculum,
    work activity, ongoing interaction history etc.)
  • Impact
  • Prior user expectations, which in turn
    influence interaction and user experience
  • Overall effectiveness
  • Related factor: in-experience activity context

9
Contextual Factors
  • Stance of user toward environment
  • First person (Andersen, MRE)
  • Third person (Tactical Language)
  • Identification with avatar character
  • God/director (Carmen's Bright IDEAS)
  • Passive audience member
  • Number of users
  • Single
  • Many (Virtual Art Gallery)
  • Number of agents
  • Single
  • Many
  • Stance of agent toward environment
  • Resident
  • Migrating (e.g., ATR Agent Salon)

10
Contextual Factors
  • Application domains
  • Entertainment
  • Education
  • Training
  • Health interventions
  • Virtual community support
  • Information retrieval

11
Contextual Factors
  • Role in the application (major vs. minor)
  • Essential
  • Virtual dramas
  • Agents conveying a unique type of information,
    e.g., affect, uncertainty
  • Supporting / supplementary
  • Virtual guides, e.g., Clippy
  • Facilitators for information retrieval
  • Elective
  • Customization of interface
  • Temporary / Phased
  • Support novices, then fade away as users gain
    skill

12
Contextual Factors
  • Social role of the ECA
  • Tutors
  • Advisors
  • Autonomous representatives
  • Antagonists / foils
  • Personal assistants
  • Companions

13
Contextual Factors
  • Life cycle of interaction
  • A single episodic interaction
  • Museum kiosks
  • Repeated episodic interactions
  • Web site hosts
  • Over extended episodes
  • Dramas, training systems
  • Over a series of extended episodes
  • Training systems, serialized dramas
  • Unending / lifelong
  • Ambient agents

14
Evaluation and System Development
  • ECA development requires an iterative spiral
    approach
  • Prototypes are essential for requirements
    elicitation, more so than for better-understood
    interfaces
  • Example approach: accretive development
  • Evaluations needed to take advantage of
    fortuitous surprises in user-ECA interaction
  • Evaluation protocol should be iterative too
  • Particularly at early stages, extended
    observations and interviews of representative
    users are useful
  • Issue: what is representative? Can that be done
    fairly?
  • Additional agendas: theory development
  • May require a different iteration cycle

15
Evaluation Methodologies
  • ECA evaluations must include both user and
    system perspectives
  • At early stages, lightweight methods like case
    studies; later, more quantitative methods
  • Choice of subjects
  • Self-selected representatives
  • Stratified samples
  • Protocol might involve non-system interactions,
    for comparison and transfer

16
Evaluation Methods
  • Extended observations, interviews,
    questionnaires
  • Logging interaction data
  • Physiological data
  • Videotapes
  • Coding methods are critical
  • Often developed for the needs of the application
  • Manual, or built into the logging process
  • How to do it systematically? Training raters,
    etc. (see the agreement sketch below)
  • Gaps in coding scheme
  • Stimulated recall
  • Triangulate multiple methods
  • Need valid methods for qualitative evaluation
    and assessment
  • Common methods
  • Semi-structured interviews
  • Ratings by validated raters
  • Validated questionnaire instruments
  • E.g., adopt valid instruments from social
    psychology
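
As a concrete illustration of training and checking raters (a sketch of a
standard technique, not a method named on the slide), two raters' codes for
the same interaction data can be compared with Cohen's kappa before the
coding scheme is trusted:

    from collections import Counter

    def cohens_kappa(codes_a, codes_b):
        """Inter-rater agreement between two raters' codes for the
        same items, corrected for chance agreement."""
        assert len(codes_a) == len(codes_b)
        n = len(codes_a)
        # Observed agreement: fraction of items coded identically.
        p_obs = sum(a == b for a, b in zip(codes_a, codes_b)) / n
        # Chance agreement, from each rater's marginal code frequencies.
        freq_a, freq_b = Counter(codes_a), Counter(codes_b)
        p_exp = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
        return (p_obs - p_exp) / (1 - p_exp)

    # Example: two raters coding ten dialog turns (labels hypothetical).
    rater1 = ["coherent"] * 6 + ["repair"] * 3 + ["metacomm"]
    rater2 = ["coherent"] * 5 + ["repair"] * 4 + ["metacomm"]
    print(f"kappa = {cohens_kappa(rater1, rater2):.2f}")  # kappa = 0.82

Values near 1 suggest the coding scheme can be applied consistently by
trained raters; low values point to gaps or ambiguities in the scheme.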

17
Prototypical Examples
  • Andersen
  • Tactical Language
  • SILA
  • Virtual Art Gallery
  • Betty's Brain
  • Virtual Interactive Presenter

18
Andersen
  • Contextual factors
  • User group: 10-18 year olds
  • Isolated episodic interaction
  • Noisy environment
  • Strong user expectations for character
  • Spiral evaluation
  • Multi-person criticism
  • Formal theory-based evaluation

19
Andersen
  • Evaluating user-agent interaction
  • Characteristics
  • Domain-oriented conversation success
  • Variables
  • Initiative (driving the dialog, volunteered
    information)
  • Expertise symmetry
  • Measures
  • With respect to a coding scheme for interaction
    (iteratively developed)
  • Conversational coherence
  • Metacommunication
  • Dialog repairs

20
Andersen
  • Evaluating user experience
  • Characteristics
  • Ease of use of interface
  • Impression of interaction
  • Satisfaction with the interaction
  • Fun
  • Measures
  • Scripted interview

21
Andersen
  • ECA properties
  • Nonverbal behavior
  • Only beginning to evaluate
  • Believability (meaning likeness to the real
    Andersen)
  • Instance of personality
  • Measures
  • All from structured interview
  • Cursory assessments of coded interactions so far

22
TLTS
  • Overall approach: tight spiral evaluation
  • Informal formative evaluations at USC and the
    US Military Academy
  • Short-term tests with many users
  • Extended sessions with representative users
  • Additional tests planned
  • Evaluating user-agent interaction
  • Characteristics
  • Interaction fluency
  • Sufficiently rapid response?
  • Frequency of dialog dysfluency
  • Repair of dysfluencies
  • Appropriateness of feedback
  • E.g., pronunciation feedback
  • Log the learner's actions
  • Log the system's responses (see the logging
    sketch below)
  • Native speakers rate the learner's speech
  • Instructor rates agent response
  • As needed to inform iterative design
  • Pattern of user interaction
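
A minimal sketch of the kind of turn-by-turn logging described above
(field names are illustrative assumptions, not the actual TLTS log
format):

    import json
    import time

    class InteractionLogger:
        """Append one JSON record per user-ECA turn, so raters
        (native speakers, instructors) can review sessions later."""

        def __init__(self, path):
            self.path = path

        def log_turn(self, speaker, utterance, **annotations):
            record = {
                "timestamp": time.time(),
                "speaker": speaker,      # "learner" or "agent"
                "utterance": utterance,
                **annotations,           # e.g., pronunciation_score
            }
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(json.dumps(record) + "\n")

    # Example usage: one learner action and the system's response.
    log = InteractionLogger("session01.jsonl")
    log.log_turn("learner", "as-salaamu alaykum", pronunciation_score=0.72)
    log.log_turn("agent", "wa alaykum as-salaam", dialog_act="greeting")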

23
TLTS
  • User experience
  • Structured interviews
  • At the end, and at an intermediate stage

24
TLTS
  • Effectiveness
  • Performance in the game
  • How much scaffolding is required
  • Transfer to post-test (see the gain sketch
    below)
  • Transfer to real conversation
  • Ablation / comparison studies planned
  • Tutoring vs. game vs. tutoring + game
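
The slides do not name a specific transfer metric; one common choice for
pre/post comparisons, sketched here only as a possibility, is the
normalized gain:

    def normalized_gain(pre, post, max_score=100.0):
        """Fraction of the possible improvement actually achieved
        between pre-test and post-test (Hake's normalized gain)."""
        return (post - pre) / (max_score - pre)

    # Example: 40/100 before, 70/100 after means half the possible
    # improvement was realized.
    print(normalized_gain(40, 70))  # 0.5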

25
SILA
  • Spiral evaluation model
  • Study students in a natural setting
  • Wizard-of-Oz (colleagues, students)
  • Pilot study (with colleagues)
  • Main evaluation (students)

26
SILA
  • User-agent interaction
  • Characteristics of interaction
  • Equal participation between agent and human
  • Positive and negative collaborations
  • Measures
  • Categorize moves in the logs
  • Observational notes
  • Interviews, semi-structured
  • Comparing videos against logs

27
SILA
  • User experience
  • Characteristics
  • Perception of personality of the agent
  • Perception of helpfulness of the agent
  • Perception of ease of use
  • Method
  • Semi-structured interview
  • Triangulation with video

28
SILA
  • Effectiveness
  • Characteristics
  • Improvement of summarization skills after
    working with SILA
  • Variables
  • Perceived personality of the agent
  • Methods
  • Independent markers mark summaries written with
    and without SILA

29
Virtual Art Gallery
  • Overall method: non-spiral approach
  • Field study (without ECA)
  • Controlled experiment with and without the
    agent (30 subjects)

30
Virtual Art Gallery
  • User-agent interaction
  • Characteristics
  • Similarity to human dialog conventions
  • Relevance of agent answers to dialog and
    environment
  • Methods
  • Log files, transcripts
  • User-perceived problems

31
Virtual Art Gallery
  • User experience
  • Characteristics
  • Presence
  • Awareness
  • Involvement
  • Immersion
  • Methods / measures
  • Questionnaire, pre and post
  • Validated questionnaire

32
Betty's Brain
  • Evaluation steps
  • Present mockup video to students, simulated
    walkthrough
  • Initial versions tested on Vanderbilt undergrads
  • Three large studies
  • Testing components of learning-by-teaching
  • Self-assessment
  • Pure tutoring system without agent
  • Real face vs. animated face

33
Betty's Brain
  • User interaction
  • How well the student was able to teach Betty
  • Ease of use of concept map structure
  • Students' understanding of Betty's answers (and
    visualizations)
  • Effect of Betty's emotions on interaction
  • Methods
  • Currently semi-structured exit interviews

34
Betty's Brain
  • User experience
  • Characteristics
  • Motivation to learn through interaction with
    Betty
  • Measure
  • Exit interviews

35
Betty's Brain
  • Effectiveness
  • Of the learning by teaching process
  • Quality of the concept maps
  • Measures
  • Raters assess the quality of maps (see the
    scoring sketch below)
  • Study logs to assess the process
  • Far-transfer evaluation
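
One way to make the map-quality rating concrete (a hypothetical scoring
scheme, not the study's actual rubric) is to score a student's concept
map against an expert reference map by link overlap:

    def map_quality(student_links, expert_links):
        """Precision/recall of a student map's causal links
        against an expert reference map."""
        student, expert = set(student_links), set(expert_links)
        correct = student & expert
        precision = len(correct) / len(student) if student else 0.0
        recall = len(correct) / len(expert) if expert else 0.0
        return precision, recall

    # Links are (source, relation, target) triples.
    expert = {("algae", "produce", "dissolved oxygen"),
              ("fish", "consume", "dissolved oxygen"),
              ("bacteria", "consume", "dissolved oxygen")}
    student = {("algae", "produce", "dissolved oxygen"),
               ("fish", "produce", "dissolved oxygen")}
    print(map_quality(student, expert))  # (0.5, 0.333...)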

36
VIP
  • Evaluation steps
  • Component-based evaluation, e.g., speech
  • Integrated evaluation (with colleagues)

37
VIP
  • User interaction
  • Characteristics
  • Dialog success (acceptance of the broadcast)
  • Speech act success
  • Degree of engagement in conversation (not just
    query)
  • Variable
  • Comparison with non-ECA-based spoken dialog (not
    explicitly done yet)
  • Measure
  • Acceptance or not of the broadcast
  • Speech act recognition error (see the sketch
    below)
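
A minimal sketch of the speech act recognition error measure, assuming
each logged turn carries the recognized act and a hand-labeled reference
act (both field names are hypothetical):

    def speech_act_error_rate(turns):
        """Fraction of turns where the recognized speech act
        differs from the hand-labeled reference act."""
        errors = sum(t["recognized"] != t["reference"] for t in turns)
        return errors / len(turns)

    turns = [
        {"recognized": "query", "reference": "query"},
        {"recognized": "accept", "reference": "reject"},
        {"recognized": "query", "reference": "query"},
        {"recognized": "accept", "reference": "accept"},
    ]
    print(speech_act_error_rate(turns))  # 0.25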

38
Example: Evaluating Believability
  • Speech and gesture should be perceived as
    believable, lifelike
  • Relevant insofar as it contributes to
    system-level characteristics
  • Effectiveness in application
  • Felicity in interaction
  • User experience of rapport, empathy
  • Avoidance of infelicities that introduce
    disbelief
  • Tends to fall out of spiral evaluation approach
  • Depends on
  • User population
  • Seasoned game players (expect good graphics)
  • Kids (similarity to cartoons, etc.)
  • Domain experts (physicians)
  • Task transfer requirements (but not uniformly)
  • ECA life cycle (avoiding repetitiveness)
  • Background context (e.g., isolated agent vs.
    multi-agent scene)
  • Cognitive load, focus of attention
  • Situational expectations (story), role (mentor,
    proxies)
  • Style of drawing
  • Display size

39
Example: Evaluating Social Relationship
  • Relevant characteristics of reciprocal social
    relationship
  • Social distance, power
  • Empathy
  • Types of relationships
  • User-agent
  • Agent-agent
  • Relevance
  • User-agent -> user experience, interaction
  • Agent-agent -> user experience
  • Can be evaluated as part of experience and
    interaction evaluation
  • Via characteristics of the dialog history, or
    the agent logging its own model of the social
    relationship
  • Depends on
  • Presentation of social role
  • User population (role vis-à-vis agents role)
  • ECA life cycle

40
Where are the Micro-Evaluations?
  • During design, in component testing
  • In analyzing causes of system-level performance
  • But there is a big gap between micro- and
    macro-evaluation
  • Difficult to use micro-level analysis to predict
    system-level performance
  • Situated evaluations are more predictive of
    system characteristics
  • E.g., using accretive development
  • Iterative development precludes heavyweight
    analyses
  • Difficult to infer clean micro-level results from
    situated analyses
  • Should we try to bridge this gap? How?