Cost of Misunderstandings

Provided by: danb7
Learn more at: http://www.cs.cmu.edu

1
Cost of Misunderstandings
  • Modeling the Cost of Misunderstanding Errors in
    the CMU Communicator Dialog System
  • Presented by Dan Bohus (dbohus_at_cs.cmu.edu)
  • Work by Dan Bohus, Alex Rudnicky
  • Carnegie Mellon University, 2001

2
Outline
  • Quick overview of previous utterance-level
    confidence annotation work.
  • Modeling the cost of misunderstandings in spoken
    dialog systems.
  • Experimental results.
  • Further analysis.
  • Summary, further work, conclusion

3
Utterance-Level Confidence Annotation Overview
  • Confidence annotation: data-driven
    classification
  • Corpus: 2 months, 131 dialogs, 4550 utterances.
  • Features: 12 features from decoder, parsing,
    dialog management levels.
  • Classifiers: Decision Tree, ANN, BayesNet,
    AdaBoost, NaiveBayes, SVM, plus a logistic
    regression model (later on).

4
Confidence annotator performance
  • Baseline error rate: 32%
  • Garble baseline: 25%
  • Classifiers' performance: 16%
  • Differences between classifiers are statistically
    insignificant except for Naïve Bayes
  • On a soft metric, the logistic regression model
    clearly outperformed the others
  • But is this the right way to evaluate performance?

5
Judging Performance
  • Classification Error Rate (FP + FN).
  • Implicitly assumes that FP and FN errors have the
    same cost
  • But the cost of misunderstanding in dialog systems
    is presumably different for FPs and FNs.
  • Build an error function that takes these costs
    into account, and optimize for that.
  • Cost also depends on:
  • domain/system: not a problem
  • dialog state
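
The point about unequal costs can be illustrated with a minimal sketch. The error counts and the per-error costs below are hypothetical, chosen only to show that two annotators with the same FP + FN can differ once FPs and FNs are weighted differently.

```python
def error_count(fp, fn):
    """Plain classification error: every FP and FN counts the same."""
    return fp + fn


def weighted_cost(fp, fn, cost_fp, cost_fn):
    """Error function that weights FPs and FNs by separate costs."""
    return cost_fp * fp + cost_fn * fn


# Hypothetical annotators with identical FP + FN but a different mix.
annotator_a = {"fp": 10, "fn": 30}
annotator_b = {"fp": 30, "fn": 10}

# Hypothetical costs: a false acceptance hurts twice as much as a false rejection.
COST_FP, COST_FN = 2.0, 1.0

for name, counts in (("A", annotator_a), ("B", annotator_b)):
    plain = error_count(counts["fp"], counts["fn"])
    weighted = weighted_cost(counts["fp"], counts["fn"], COST_FP, COST_FN)
    print(name, plain, weighted)
```

Under these assumed costs the two annotators tie on error count, but annotator A has the lower weighted cost.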

6
Problem Formulation
  • (1) Develop a cost model which allows us to
    quantitatively assess the costs of FP and FN
    errors.
  • (2) Use the costs to pick the optimal tradeoff
    point on the classifier ROC.

7
The Cost Model
  • Model the impact of the FPs and FNs on the system
    performance
  • Identify a suitable performance metric P
  • Build a statistical regression model at the
    dialog session level
  • P = f(FPs, FNs)
  • P = k + Cost_FP · FP + Cost_FN · FN (linear regression)
  • Then we can plot f, and implicitly optimize for P
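
A minimal sketch of the session-level linear regression, assuming ordinary least squares on per-session FP and FN counts. The session data and the coefficients used to generate it are synthetic, and `fit_linear` is a hypothetical helper, not the tool used in the original work.

```python
def fit_linear(rows, targets):
    """Ordinary least squares for targets ~ k + sum(coef_i * x_i).

    Returns [k, coef_1, coef_2, ...].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    n = len(X[0])
    # Normal equations (X^T X) beta = X^T y, stored as an augmented matrix.
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Hypothetical per-session (FP, FN) counts.
sessions = [(0, 1), (1, 0), (2, 1), (1, 3), (3, 2), (0, 0)]
# Synthetic performance values generated from known coefficients.
performance = [0.9 - 0.10 * fp - 0.05 * fn for fp, fn in sessions]

k, cost_fp, cost_fn = fit_linear(sessions, performance)
print(round(k, 3), round(cost_fp, 3), round(cost_fn, 3))
```

Because the synthetic data is noise-free, the fit recovers the generating coefficients; on real sessions the estimates would carry error, and the fit quality would need to be checked.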

8
Measuring Performance
  • User Satisfaction (e.g., 5-point scale)
  • Hard to get
  • Very subjective: hard to make it consistent
    across users
  • Concept transfer efficiency
  • CTC: correctly transferred concepts per turn
  • ITC: incorrectly transferred concepts per turn
  • Completion

9
Detour: The Dataset
  • 134 dialogs (2561 utterances), collected using 4
    scenarios
  • Satisfaction scores only for 35 dialogs
  • Corpus manually labeled at the concept level
  • 4 labels: OK / RBAD / PBAD / OOD
  • Aggregate utterance labels generated
  • Confidence annotator decisions logged
  • Computed counts of FPs, FNs, CTCs, ITCs for each
    session

10
Example
  • U: I want to fly from Pittsburgh to Boston
  • S: I want to fly from Pittsburgh to Austin
  • C: I_want/OK Depart_Loc/OK
    Arrive_Loc/RBAD
  • Only 2 relevantly expressed concepts
  • If Accept: CTC = 1, ITC = 1
  • If Reject: CTC = 0, ITC = 0
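
The counting in this example can be sketched as follows. The function name, the set of non-relevant concept names, and the rule that every non-OK label counts as an incorrect transfer are assumptions made for illustration; the slides do not spell out these details.

```python
def transfer_counts(labeled_concepts, accepted, non_relevant=frozenset({"I_want"})):
    """Return (CTC, ITC) for one utterance.

    labeled_concepts: list of (concept_name, label) pairs.
    accepted: True if the confidence annotator accepted the utterance.
    non_relevant: concept names that do not count as relevantly expressed
                  (hypothetical; here only I_want).
    """
    if not accepted:
        # A rejected utterance transfers no concepts at all.
        return 0, 0
    relevant = [(name, label) for name, label in labeled_concepts
                if name not in non_relevant]
    ctc = sum(1 for _, label in relevant if label == "OK")
    # Assumption: any label other than OK is an incorrect transfer.
    itc = sum(1 for _, label in relevant if label != "OK")
    return ctc, itc


utterance = [("I_want", "OK"), ("Depart_Loc", "OK"), ("Arrive_Loc", "RBAD")]
print(transfer_counts(utterance, accepted=True))
print(transfer_counts(utterance, accepted=False))
```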

11
Targeting Efficiency: Model 1
  • 3 successively refined models
  • CTC = FP + FN + TN + k
  • CTC: correctly transferred concepts / turn
  • TN: true negatives

12
Targeting Efficiency: Model 2
  • CTC - ITC = (REC +) FP + FN + TN + k
  • ITC: incorrectly transferred concepts / turn
  • REC: relevantly expressed concepts

13
Targeting Efficiency: Model 3
  • CTC - ITC = REC + FPC + FPNC + FN + TN + k
  • 2 types of FPs:
  • With concepts: FPC
  • Without concepts: FPNC

14
Model 3 - Results
  • CTC - ITC = REC + FPC + FPNC + FN + TN + k

15
Other models
  • Completion (binary)
  • Logistic regression model
  • Estimated model does not indicate a good fit
  • User satisfaction (5-point scale)
  • Based on only 35 dialogs
  • R² = 0.61 (similar to the literature, Walker et al.)
  • Explanation: subjectivity of metric, limited
    dataset

16
Problem Formulation
  • (1) Develop a cost model which allows us to
    quantitatively assess the costs of FP and FN
    errors.
  • (2) Use the costs to pick the optimal tradeoff
    point on the classifier ROC.

17
Tuning the Confidence Annotator
  • Using Model 3
  • CTC - ITC = REC + FPNC + FPC + FN + TN + k
  • Drop k and REC, plug in the values
  • Cost = 0.48·FPNC + 2.12·FPC + 1.33·FN + 0.56·TN
  • Minimize Cost instead of Classification Error
    Rate (FP + FN), and we'll implicitly maximize
    concept transfer efficiency.
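
A minimal sketch of tuning the annotator's threshold against this cost function. The four coefficients are the ones on the slide; the utterance data, the tuple layout, the candidate thresholds, and the "accept if score >= threshold" rule are assumptions made for illustration.

```python
# Coefficients from the cost function on this slide.
COST = {"FPNC": 0.48, "FPC": 2.12, "FN": 1.33, "TN": 0.56}


def total_cost(utterances, threshold):
    """Cost of accepting every utterance whose score is >= threshold.

    utterances: (confidence_score, is_correct, has_concepts) tuples.
    """
    counts = dict.fromkeys(COST, 0)
    for score, correct, has_concepts in utterances:
        accepted = score >= threshold
        if accepted and not correct:
            # False acceptance, split by whether concepts were captured.
            counts["FPC" if has_concepts else "FPNC"] += 1
        elif not accepted and correct:
            counts["FN"] += 1  # good utterance wrongly rejected
        elif not accepted and not correct:
            counts["TN"] += 1  # bad utterance correctly rejected
    return sum(COST[key] * counts[key] for key in COST)


def best_threshold(utterances, candidates):
    """Candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(utterances, t))


# Hypothetical scored utterances.
utterances = [
    (0.95, True, True), (0.80, True, True), (0.70, False, True),
    (0.60, True, True), (0.40, False, False), (0.30, False, True),
    (0.20, True, True),
]
candidates = [0.0, 0.25, 0.5, 0.75, 1.01]

for t in candidates:
    print(t, round(total_cost(utterances, t), 2))
print("best:", best_threshold(utterances, candidates))
```

A threshold of 0.0 corresponds to trusting everything, and 1.01 to rejecting everything, so the sweep covers both ends of the operating characteristic.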

18
Operating Characteristic

19
Further Analysis
  • Is CTC-ITC really modeling dialog performance?
  • Mean: 0.71, Std. Dev.: 0.28
  • Mean for completed dialogs: 0.82
  • Mean for uncompleted dialogs: 0.57
  • Difference between means is significant at a very
    high level of confidence
  • P-value = 7.23 × 10^-9 (t-test)
  • So, it looks like CTC-ITC is okay, right?
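
The comparison of means can be sketched with a two-sample t statistic. This assumes Welch's unequal-variance form, since the slide does not say which t-test variant was used, and the per-dialog values below are hypothetical, not the corpus data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two sample means."""
    var_a = variance(sample_a) / len(sample_a)
    var_b = variance(sample_b) / len(sample_b)
    return (mean(sample_a) - mean(sample_b)) / sqrt(var_a + var_b)


# Hypothetical per-dialog CTC-ITC values.
completed = [0.90, 0.80, 0.85, 0.75]
uncompleted = [0.60, 0.50, 0.55, 0.65]

print(welch_t(completed, uncompleted))
```

A large positive statistic means completed dialogs score higher on CTC-ITC; turning it into a p-value additionally requires the t distribution with the appropriate degrees of freedom.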

20
Further Analysis (contd)
  • Can we reliably extrapolate to other areas of the
    operating characteristic?

21
Further Analysis (contd)
  • Can we reliably extrapolate to other areas of the
    operating characteristic?
  • Yes, look at the distribution of the FP and FN
    ratios across dialogs.

22
Further Analysis (contd)
  • Impact of baseline error rate?
  • Compared models constructed based on high and low
    error rates
  • For a low error rate, the curve becomes
    monotonically increasing
  • This clearly indicates that "trust everything /
    have no confidence" is the way to go in this
    setting

23
Our explanation so far
  • Ability to easily overwrite incorrectly captured
    information in the CMU Communicator
  • Relatively low error rates
  • Likelihood of repeated misrecognition is low

24
Conclusion
  • Data-driven approach to quantitatively assess the
    costs of various types of misunderstandings.
  • Models based on efficiency fit the data well;
    the obtained costs confirm intuition.
  • For CMU Communicator, model predicts that total
    cost stays the same across a large range of the
    operating characteristic of the classifier.

25
Further Experiments
  • But, of course, we can verify predictions
    experimentally
  • Collect new data with the system running with a
    very low threshold.
  • 55 dialogs collected so far.
  • Thanks to those who have participated in these
    experiments.
  • To the others: help if you have the time
  • www.cs.cmu.edu/dbohus/scenarios.htm
  • Re-estimate models, verify predictions

26
Confusion Matrix
  • FP: False acceptance
  • FN: False detection/rejection
  • Fallout = FP/(FP + TN) = FP/NBAD
  • CDR = 1 - Fallout = 1 - (FP/NBAD)
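
These two quantities can be computed directly from the confusion-matrix counts. A minimal sketch, assuming NBAD is the number of utterances that should have been rejected (FP + TN), as the fallout formula implies; the counts are hypothetical.

```python
def fallout(fp, tn):
    """Fraction of bad utterances that were falsely accepted: FP / NBAD."""
    n_bad = fp + tn
    return fp / n_bad


def cdr(fp, tn):
    """CDR as defined on the slide: 1 - Fallout."""
    return 1.0 - fallout(fp, tn)


# Hypothetical counts: 5 bad utterances accepted, 15 correctly rejected.
print(fallout(5, 15), cdr(5, 15))
```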