Title: a principled approach for rejection threshold optimization
1a principled approach for rejection threshold optimization
- Dan Bohus www.cs.cmu.edu/dbohus
- Alexander I. Rudnicky www.cs.cmu.edu/air
- Computer Science Department
- Carnegie Mellon University
- Pittsburgh, PA, 15217
2understanding errors and rejection
- systems often misunderstand
- use confidence scores
- common design pattern
- compare input confidence against a threshold
- reject utterance if confidence is too low
- may lead to false rejections
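The design pattern above can be sketched in a few lines. This is a minimal illustration, not the system's actual code: the function name `should_reject`, the sample turns, and the threshold value 0.5 are all assumptions made for the example.

```python
def should_reject(confidence: float, threshold: float) -> bool:
    """Reject the utterance when its confidence falls below the threshold."""
    return confidence < threshold


# Hypothetical labeled turns: (confidence in [0, 1], was the recognition correct?)
turns = [(0.92, True), (0.35, False), (0.48, True), (0.15, False)]
threshold = 0.5  # illustrative value only, not taken from the talk

# A correct input that gets rejected is a false rejection.
false_rejections = sum(
    1 for conf, correct in turns if should_reject(conf, threshold) and correct
)
# An incorrect input that gets accepted is a misunderstanding.
misunderstandings = sum(
    1 for conf, correct in turns if not should_reject(conf, threshold) and not correct
)
print(false_rejections, misunderstandings)
```

Raising the threshold in this toy data removes misunderstandings but turns the correctly recognized 0.48 turn into a false rejection, which is the tradeoff the next slides describe.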
3rejection tradeoff
- misunderstandings vs. false rejections
[figure: false rejections plotted against misunderstandings]
4rejection tradeoff
- misunderstandings vs. false rejections
- correctly vs. incorrectly transferred concepts
[figure: correctly transferred concepts / turn plotted against incorrectly transferred concepts]
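One way to trace this tradeoff from labeled data is to sweep the threshold and count, per turn, the concepts that are transferred correctly (CTC) and incorrectly (ITC). The sketch below assumes each turn carries a confidence score and manually labeled concept counts; the helper name `transfer_rates` and the sample turns are hypothetical.

```python
# Hypothetical turns: (confidence, correct concepts, incorrect concepts)
turns = [(0.9, 2, 0), (0.6, 1, 1), (0.3, 0, 1), (0.2, 1, 0)]


def transfer_rates(turns, threshold):
    """Return (CTC, ITC) per turn when turns below the threshold are rejected."""
    accepted = [t for t in turns if t[0] >= threshold]
    n_turns = len(turns)
    ctc = sum(t[1] for t in accepted) / n_turns
    itc = sum(t[2] for t in accepted) / n_turns
    return ctc, itc


# Sweep a few thresholds to see both quantities fall as rejection gets stricter.
for th in (0.0, 0.5, 1.0):
    print(th, transfer_rates(turns, th))
```

Rejected turns transfer no concepts at all, so a higher threshold lowers ITC but also lowers CTC.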
5question
- given this trade-off, how can we optimize the rejection threshold in a principled fashion?
6outline
- current solutions
- proposed approach
- data
- results
- conclusion
7current solutions
- follow ASR manual [Nuance documentation]
- acknowledge the tradeoff, postulate costs
  - misunderstandings are X times more costly than false rejections [Raymond et al., 2004; Kawahara et al., 2000; Cuayahuitl et al., 2002]
- costs are likely to differ
  - across domains / systems
  - across dialog states within a system
8proposed approach
- derive costs in a principled fashion
2. choose a dialog performance metric: task completion (binary, kappa), TC
3. build a regression model: $\text{logit}(TC) \approx C_0 + C_{CTC} \cdot CTC + C_{ITC} \cdot ITC$
4. optimize threshold to maximize performance: $th^{*} = \arg\max_{th} \left( C_{CTC} \cdot CTC + C_{ITC} \cdot ITC \right)$
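The threshold choice in the last step can be sketched as a grid search over candidate thresholds. The coefficient values `c_ctc` and `c_itc` below are made up for illustration (in the approach above they would come from the fitted regression), and the sample turns and candidate grid are likewise assumptions.

```python
# Hypothetical turns: (confidence, correct concepts, incorrect concepts)
turns = [(0.9, 2, 0), (0.6, 1, 1), (0.3, 0, 1), (0.2, 1, 0)]

# Hypothetical regression coefficients: correct concepts help task
# completion, incorrect ones hurt it twice as much.
c_ctc, c_itc = 1.0, -2.0

candidates = [0.0, 0.25, 0.5, 0.75, 1.0]


def utility(threshold):
    """C_CTC * CTC + C_ITC * ITC, with CTC and ITC measured per turn."""
    accepted = [t for t in turns if t[0] >= threshold]
    ctc = sum(t[1] for t in accepted) / len(turns)
    itc = sum(t[2] for t in accepted) / len(turns)
    return c_ctc * ctc + c_itc * itc


# Pick the candidate threshold with the highest utility.
best_threshold = max(candidates, key=utility)
print(best_threshold, utility(best_threshold))
```

The intercept C0 is left out of the utility because it does not depend on the threshold and so cannot change the argmax.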
9state-specific costs
- costs are different in different dialog states
- CTC and ITC on a per-state basis
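- $\text{logit}(TC) \approx C_0$
  $+ C_{CTC,state_1} \cdot CTC_{state_1} + C_{ITC,state_1} \cdot ITC_{state_1}$
  $+ C_{CTC,state_2} \cdot CTC_{state_2} + C_{ITC,state_2} \cdot ITC_{state_2}$
  $+ C_{CTC,state_3} \cdot CTC_{state_3} + C_{ITC,state_3} \cdot ITC_{state_3}$
  $+ \dots$
- optimize separate threshold for each state
- $th^{*}_{state_x} = \arg\max_{th} \left( C_{CTC,state_x} \cdot CTC_{state_x} + C_{ITC,state_x} \cdot ITC_{state_x} \right)$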
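The per-state optimization can be sketched by grouping turns by dialog state and running the same argmax once per state. Everything concrete here is an assumption for illustration: the sample turns, the per-state coefficients in `costs`, and the candidate grid. Concept counts are left unnormalized, since dividing by a positive constant within a state does not change that state's argmax.

```python
from collections import defaultdict

# Hypothetical turns: (dialog state, confidence, correct concepts, incorrect concepts)
turns = [
    ("open-request", 0.8, 2, 0),
    ("open-request", 0.5, 1, 1),
    ("open-request", 0.2, 0, 1),
    ("request(bool)", 0.7, 1, 0),
    ("request(bool)", 0.4, 1, 1),
    ("request(bool)", 0.1, 1, 0),
]

# Hypothetical per-state coefficients (C_CTC, C_ITC); not fitted values.
costs = {"open-request": (1.0, -3.0), "request(bool)": (1.0, -0.5)}
candidates = [0.0, 0.3, 0.6, 0.9]


def state_utility(state_turns, threshold, c_ctc, c_itc):
    """Utility of one state's accepted turns at the given threshold."""
    accepted = [t for t in state_turns if t[0] >= threshold]
    ctc = sum(t[1] for t in accepted)
    itc = sum(t[2] for t in accepted)
    return c_ctc * ctc + c_itc * itc


def best_threshold(state_turns, c_ctc, c_itc):
    """Candidate threshold that maximizes the state's utility."""
    return max(candidates, key=lambda th: state_utility(state_turns, th, c_ctc, c_itc))


# Group turns by state, then optimize each state independently.
by_state = defaultdict(list)
for state, conf, n_ok, n_bad in turns:
    by_state[state].append((conf, n_ok, n_bad))

best_thresholds = {s: best_threshold(ts, *costs[s]) for s, ts in by_state.items()}
print(best_thresholds)
```

With these made-up costs, the state where incorrect concepts are expensive ends up with a strict threshold, while the state where they are cheap accepts everything.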
10outline
- current solutions
- proposed approach
- data
- results
- conclusion
11data
- collected using RoomLine
  - phone-based, mixed-initiative spoken dialog system
  - conference room reservations
  - Sphinx-2
  - utterance-level confidence annotator, scores in [0, 1]
- 46 participants (first-time users)
- 10 scenario-driven interactions
- corpus
  - 449 dialog sessions
  - 8278 user turns
  - manually labeled decoded concept correctness
12roomline states
- 71 dialog states total
- clustered into 3 classes
- open-request
- How may I help you?
- request(bool)
- Would you like a reservation for this room?
- Would you like a room with a projector?
- request(non-bool)
- For what time would you like to reserve the room?
13results task success model
- model predicting binary task success
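A model of this kind can be sketched as a logistic regression from per-session CTC and ITC to binary task success. The sketch below uses plain gradient ascent on the log-likelihood rather than any particular statistics package, and the session data, learning rate, and step count are all invented for illustration; it says nothing about the coefficients obtained in the actual study.

```python
import math

# Hypothetical sessions: (CTC per turn, ITC per turn, task success 0/1)
sessions = [
    (1.2, 0.1, 1), (1.0, 0.2, 1), (0.9, 0.1, 1), (1.1, 0.3, 0),
    (0.6, 0.5, 0), (0.5, 0.6, 0), (0.7, 0.4, 1), (0.4, 0.3, 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(weights, sessions):
    """Log-likelihood of the data under logit(TC) = C0 + C_CTC*CTC + C_ITC*ITC."""
    total = 0.0
    for ctc, itc, success in sessions:
        p = sigmoid(weights[0] + weights[1] * ctc + weights[2] * itc)
        total += math.log(p) if success else math.log(1.0 - p)
    return total


def fit_logistic(sessions, learning_rate=0.5, steps=2000):
    """Fit [C0, C_CTC, C_ITC] by gradient ascent on the mean log-likelihood."""
    weights = [0.0, 0.0, 0.0]
    n = len(sessions)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for ctc, itc, success in sessions:
            p = sigmoid(weights[0] + weights[1] * ctc + weights[2] * itc)
            err = success - p  # gradient of the log-likelihood w.r.t. the logit
            grad[0] += err
            grad[1] += err * ctc
            grad[2] += err * itc
        weights = [w + learning_rate * g / n for w, g in zip(weights, grad)]
    return weights


weights = fit_logistic(sessions)
print(weights, log_likelihood(weights, sessions))
```

The fitted `weights[1]` and `weights[2]` play the role of the costs that feed the threshold optimization.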
14results threshold optimization
[figure: open-request panel; axes run from 0 to 1]
15results threshold optimization
- utility profiles are different across the three states
- task duration models lead to similar results
16conclusion
- principled method for optimizing rejection threshold
- determine costs for various types of understanding errors
- data-driven approach
- can derive state-specific costs
- bridge mismatches between off-the-shelf confidence annotators and domain
17thank you
18fit for task success model
19expected changes in task success
Remains to be seen
20task duration model
21Model 2: resulting fit and coefficients
$R^2 = 0.56$