Title: Micro-evaluation
1Micro-evaluation
Dagstuhl working group 3
Final preliminary report
- Jonas Beskow
- Justine Cassell
- Dirk Heijlen
- Han Noot
- Patrick Olivier
- Danielle Pele
- Emiel Krahmer
- Andrew Marriott
- Dominic Massaro
EECA, Dagstuhl, March 15-19, 2004
2Plan
- What micro-evaluation is.
- The state of the art.
- Discussion.
3What is micro-evaluation?
- A method to test whether the designer's model, as implemented in an ECA, is understood by subjects in the intended way.
- One important motivation: the added value of ECAs in applications cannot be proven without being sure that the underlying models are correct.
4Micro-evaluation paradigm
5Topics that were discussed
- Audio-visual speech.
- Non-verbal behaviour.
- Natural language content.
- Dialogue control and interaction.
- Personality, emotion, mood, culture.
7About models
- No lack of models (if you know where to look).
  - Phonetics
  - Conversational analysis
  - Cognitive science
  - Social psychology, etc.
- Main complication
  - Many of these models are incomplete and typically lack ECA-relevant information.
8So, we need to collect data
- Standard research methodology applies.
- Social sciences: any research methodology textbook.
- Facial analysis: Ekman et al. (1972, 1982), Wagner (1993).
- Talking heads / AV speech: Massaro, Perceiving Talking Faces (ch. 13).
- ECAs: Ruttkay and Pelachaud (2004).
9Elicitation studies
- Record people.
- Paraphrasing Ekman et al. (1972/1982):
  - Elicitation circumstance must be representative.
  - There must be an independent criterion.
  - Data sampling must be representative.
- Issues
  - One speaker vs. group of speakers.
  - Naturalistic vs. experimental.
  - Ecological vs. functional validity.
10Data validation studies
- Annotation.
  - Multiple judges
  - Good coding scheme
  - Kappa statistics
- Coverage of the model.
  - Training vs. testing
  - Accuracy, precision, recall, F-measure
- Perception studies: see judgement studies.
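The agreement and coverage measures named above can be sketched in a few lines. This is only an illustration, assuming two judges who each assign one label per item and a single "positive" class for precision and recall; the function names and the example labels are hypothetical and do not come from the working group.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Proportion of items on which the two judges agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each judge's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 of `predicted` against `gold` for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical annotations of six video fragments by two judges.
    judge_a = ["smile", "smile", "frown", "neutral", "smile", "frown"]
    judge_b = ["smile", "neutral", "frown", "neutral", "smile", "smile"]
    print("kappa:", cohen_kappa(judge_a, judge_b))
    print("P/R/F1 for 'smile':", precision_recall_f1(judge_a, judge_b, "smile"))
```

With more than two judges, a multi-rater statistic such as Fleiss' kappa would be the natural replacement; the sketch covers only the two-judge case.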
11Judgement studies
- Implement model or data in ECA and test with human subjects.
- If possible, compare to a no-ECA baseline and a human top-line / gold standard.
- Task and Data Analysis
  - Choose appropriate tasks / scenarios
  - Choose behavioral measures / metrics
  - Choose appropriate analyses
- Formative Evaluation
  - Apply to next generation or different ECA
  - Repeat evaluation paradigm
12But
- The devil is in the details.
- It may be difficult to find the right task or scenario to test your model.
- Never ask directly "Does my ECA have property x?"
- Look for a specific paradigm which forces subjects to make functional use of the ECA's behavior.
- This is the creative part which makes micro-evaluation fun!
13Case study: Cassell (in prep)
- How to show that gestures actually support a user's understanding of the information presented by an ECA?
- Let the ECA tell a story about houses with/without gestures.
- Forced-choice selection paradigm.
  - Which house was described?
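One possible way to analyse such a forced-choice experiment is to compare the number of correct house selections in the two conditions with a one-sided Fisher exact test. The sketch below is an assumption about the analysis, not the method used in the case study, and the counts in the usage example are invented purely for illustration.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are conditions (e.g. with / without gestures), columns are
    correct / incorrect choices. Returns the probability, under the null
    hypothesis of no association and with fixed margins, of seeing at
    least `a` correct choices in the first row.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    # Hypergeometric tail: sum over all tables at least as extreme as observed.
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 20 subjects per condition.
    with_gestures = (18, 2)      # correct, incorrect
    without_gestures = (11, 9)   # correct, incorrect
    p = fisher_exact_greater(*with_gestures, *without_gestures)
    print("one-sided p-value:", p)
```

An exact test is chosen here because micro-evaluation studies often have few subjects per condition; with larger samples a chi-square test would serve equally well.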
15One: relation to other working groups
- Group 1
  - Micro-evaluation methods give an operationalization of the collection and use of corpora for ECA design.
- Group 2
  - Micro-evaluation methods apply to realism and hyperrealism alike and offer a mechanism to verify empirical issues there.
16
- Group 4
  - Macro-evaluation involves micro-evaluation methodology.
  - Micro-evaluation should precede macro-evaluation.
- Group 5
  - Criteria/methods for micro-evaluation should be used for the ECA contest.
  - Good methodology helps for sharing resources (i.e., experimental findings).
17Two: model status
- Many more problems arose when discussing micro-evaluation of emotion and personality than with audio-visual speech and non-verbal communication.
- Why?
  - Conscious versus unconscious?
  - Displaying versus feeling / being?
- More real, underlying work is needed to fill in the more complicated models.
18Three: Micro- vs. macro-evaluation
- How to make sure micro-evaluation results stay valid in a macro setting?
- Introduce cognitive load as a factor in the micro-evaluation methods.
  - E.g., noise in audio-visual speech.
  - E.g., using a secondary task.
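Treating cognitive load as a factor amounts to crossing it with the factor under test in a factorial design. The sketch below assumes a between-subjects design with balanced random assignment; the factor names, levels and participant identifiers are hypothetical and serve only as an illustration.

```python
import random
from itertools import product

# Hypothetical factors: the model under test crossed with cognitive load.
FACTORS = {
    "gestures": ["with", "without"],
    "load": ["none", "secondary_task"],
}


def assign_conditions(participants, factors, seed=0):
    """Randomly assign participants to the cells of a full factorial design.

    Participants are shuffled and then dealt out to the cells in turn, so
    cell sizes differ by at most one.
    """
    names = list(factors)
    cells = list(product(*factors.values()))  # every combination of levels
    rng = random.Random(seed)                 # seeded for reproducibility
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {
        person: dict(zip(names, cells[i % len(cells)]))
        for i, person in enumerate(shuffled)
    }


if __name__ == "__main__":
    plan = assign_conditions([f"P{i:02d}" for i in range(8)], FACTORS)
    for person in sorted(plan):
        print(person, plan[person])
```

If an effect found without load also holds in the loaded cells, that is some evidence it may survive in a macro setting; an interaction between the two factors would suggest the opposite.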
19Four: How to get this all done?
- Doing all this work is a lifetime research project, enough to fill many PhDs.
- Try to engage researchers from outside the ECA community by raising different kinds of questions.
- Try to initiate more multi-disciplinary research.
- The experimental results are also relevant beyond the ECA community (e.g., better understanding of human cognition).
20Five: Where do we go from here?
- We will be doing better micro-evaluation studies from now on.
- Try to compile our notes into a coherent and readable whole with references, methodological best practices, etc.