USNA experiments FebMar - PowerPoint PPT Presentation

Slides: 45
Provided by: godelSt

1
USNA experiments (Feb/Mar)
  • Approx. 210 USNA midshipmen
  • 12 sections of about 18 midshipmen each
  • 6 GB of data and 15 GB of video arrived back at
    Stanford on March 7!
    • Simulator databases
    • Tutor logfiles
    • Speech wavefiles to simulator and tutor
    • Webcam footage (video only) of some subjects
    • User questionnaires: background, satisfaction,
      comments
    • Pre- and post-tests

2
USNA Location
  • 1-hour 50-minute lab
  • Time pressure
  • Computer classroom: 32 dual P4 2.4 GHz machines
    with 512 MB RAM, 17" flat screens, nVidia GeForce
    FX 5200
  • Subjects seated very close together

3
Noise-Canceling Headphones
  • Protravelgear.com
  • Every 3 dB of active noise reduction (ANR)
    approximately doubles the noise reduction
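The 3 dB rule of thumb follows from the decibel definition for power ratios, dB = 10·log10(P_before/P_after), so 3 dB corresponds to a power ratio of about 2. A minimal sketch of that conversion; the function name is illustrative and not from the original slides.

```python
def power_ratio(db: float) -> float:
    """Factor by which noise power is reduced for a given attenuation in dB."""
    return 10 ** (db / 10)

for extra_db in (3, 6, 9):
    # Each additional 3 dB roughly doubles the reduction again: ~2x, ~4x, ~8x.
    print(f"{extra_db} dB -> noise power reduced about {power_ratio(extra_db):.2f}x")
```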

4
ANR for speaking and listening
  • Speech: Andrea ANC-700 microphones
  • Listening: Protravelgear PlaneQuiet headsets
  • Subjects wore both at once
  • Impressionistically, Stanford students in the dry
    run didn't mind this, but USNA students seemed to
    be bothered by the headsets

5
Graphic interaction
  • What tutor and student can interact with:
    • Compartments
    • Bulkheads
    • Regions
    • Labels
    • Compartment groups
  • Methods:
    • Single-click
    • Click and drag
    • Circling

6
Active vs. Passive Tutoring
7
Tutoring vs. No Tutoring
  • No tutoring at all (USNA, winter 2005)
    • Students played computer solitaire between
      simulator sessions
  • Tutoring by knowledge area (Stanford, spring 2004)
    • Boundaries, jurisdiction, sequencing
    • Students were tutored on a single knowledge area
      in each of 3 tutoring sessions

8
Distribution of USNA subjects per condition
9
Tutoring Beats No Tutoring
  • Compare all the active and passive tutoring
    conditions against the single no-tutoring condition
    • Being tutored is positively correlated with
      improvement in test score: r = .241, p = .005
      (2-tailed), N = 132. The correlation is
      significant at the 0.01 level (2-tailed).
    • Being tutored is also positively correlated
      with the proportion of the student's actions that
      were correct: r = .245, p = .018, N = 92. The
      correlation is significant at the 0.05 level
      (2-tailed).
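The reported p-values can be sanity-checked from r and N alone using the standard t test for a Pearson correlation, t = r·sqrt((N - 2)/(1 - r²)) with N - 2 degrees of freedom. This is a stdlib-only check, not the analysis actually run; it approximates the t distribution with a normal distribution, which for 90 to 130 degrees of freedom gives p-values slightly smaller than the exact ones (an exact value needs a t-distribution CDF from a statistics package).

```python
import math

def t_from_r(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def approx_two_tailed_p(t: float) -> float:
    """Two-tailed p-value from a normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))

# (label, r, N) as reported on the slide.
results = [
    ("tutored vs. test score improvement", 0.241, 132),
    ("tutored vs. proportion of correct actions", 0.245, 92),
]
for label, r, n in results:
    t = t_from_r(r, n)
    print(f"{label}: t = {t:.2f}, approx. p = {approx_two_tailed_p(t):.3f}")
```

For r = .241 and N = 132 this gives t of about 2.83, consistent with the reported p = .005.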

10
More Test Score Improvement for Tutored Subjects
11
Tutored Subjects improve more in Ordering Correct
Actions
12
Passive Tutoring Better for Sequencing Test Score
Improvement
  • Take out the no-tutoring condition and compare
    active vs. passive tutoring
    • Active tutoring is negatively correlated with
      improvement in test score: r = -.239, p = .009
      (2-tailed), N = 119. The correlation is
      significant at the 0.01 level (2-tailed).
  • Break down the test score improvement by
    knowledge area
    • Active tutoring is negatively correlated with
      improvement in the sequencing test score:
      r = -.198, p = .031, N = 119. The correlation is
      significant at the 0.05 level (2-tailed).
    • Active tutoring is not significantly correlated
      with improvement in the boundaries/jurisdiction
      test score.
  • Across all performance areas, there is no
    significant correlation between active vs.
    passive tutoring and the performance metrics.
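Correlating a two-valued condition with a score, as on this slide, amounts to a point-biserial correlation: Pearson's r with the condition coded 0/1. With active tutoring coded 1, a negative r means actively tutored subjects improved less. A minimal sketch of the per-area breakdown; the 0/1 coding is an assumption about how the analysis was set up, and the data are made up purely to show one negative and one null correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation; with 0/1 xs this is the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical subjects: 1 = active tutoring, 0 = passive tutoring.
active = [1, 1, 0, 0]
# Hypothetical per-area test-score improvements (made-up numbers).
improvement = {
    "sequencing": [1.0, 2.0, 3.0, 4.0],
    "boundaries/jurisdiction": [2.0, 3.0, 2.0, 3.0],
}
for area, gains in improvement.items():
    print(area, round(pearson_r(active, gains), 3))
```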

13
Passive vs. Active Test Scores
14
Passive Tutoring Improves Sequencing Test Scores
More
15
Completion of Tutoring Content
  • Almost no active tutoring subjects completed a
    tutoring session
  • Almost all passive tutoring subjects completed
    the tutoring session
  • Does the material covered make the difference?

16
Test Score by Time Taken
17
Mean Time for Pre Post Tests
18
Pre-Test Time Taken Score
19
No correlations with graphic input or output
  • Active and passive tutoring sessions together,
    graphic input vs. no graphic input:
    • No correlation with test score improvement or
      with improvement in any performance metric
  • Same for graphic output, with active and passive
    tutoring sessions together
  • Active tutoring only, graphic input vs. no
    graphic input:
    • No correlation with test score improvement or
      with improvement in any performance metric
  • Same for graphic output

20
Test Scores vs. Performance Metrics
  • no correlation between improvement in test score
    on sequencing and improvement in performance of
    sequencing actions
  • (either correctness of action, or amount of
    pending expert actions performed).
  • no correlation between improvement in test score
    on jurisdiction/boundaries and improvement in
    performance of jurisdiction or boundary actions
  • (either correctness of action, or amount of
    pending expert actions performed).
  • no correlation between the post test scores and
    the simulator performance statistics by area.

21
Test Score Improvement by Condition
22
Performance Improvement by Area
23
Post-Test Score by Condition
24
Satisfaction with Tutor (preliminary)
25
Tutor Satisfaction
  • Highest rating: tutor being accurate (mean
    5.07, std. dev. 1.67)
  • Lowest rating: tutor understanding the student
    (mean 3.85, std. dev. 1.81)

26
Speech Synthesis
  • Festival (University of Edinburgh)
    • Concatenative synthesis
    • Can use many different standard voices
    • Allows customized limited domain voices (FestVox)
    • Allows markup for emphasis, phrase types
  • FestVox limited domain voices allow templates and
    slots, e.g.:
    • Let's first discuss the fire in the access trunk.
    • Let's discuss the flood in the cleaning gear
      locker.
    • Let's discuss the smoke in the fan room.
  • We recorded 1,764 utterances, many of them
    duplicates
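A template with slots expands into every sentence the limited domain voice must be able to say. A minimal sketch in plain Python (not FestVox tooling); the template string and slot value lists are hypothetical, modeled on the example prompts above.

```python
from itertools import product

# Hypothetical template and slot values, modeled on the example prompts.
TEMPLATE = "Let's discuss the {crisis} in the {compartment}."
CRISES = ["fire", "flood", "smoke"]
COMPARTMENTS = ["access trunk", "cleaning gear locker", "fan room"]

def expand(template: str, crises: list, compartments: list) -> list:
    """Fill the template with every combination of slot values."""
    sentences = [
        template.format(crisis=c, compartment=m)
        for c, m in product(crises, compartments)
    ]
    # dict.fromkeys drops duplicate sentences while keeping their order.
    return list(dict.fromkeys(sentences))

prompts = expand(TEMPLATE, CRISES, COMPARTMENTS)
print(len(prompts))   # 9
print(prompts[0])     # Let's discuss the fire in the access trunk.
```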

27
FestVox limited domain voice
  • Issues of scale
    • Synthesis becomes noticeably slower as the
      recorded voice corpus approaches 2,000 utterances
    • May need to split into several related voices
    • We cached many utterances in advance, so this was
      not a big problem
  • Could study benefits of limited domain voice vs.
    general diphone voice synthesis
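Caching in advance moves the slow synthesis step to before the session instead of while the student waits. A minimal sketch of that idea; `PromptCache` and the stand-in `fake_synthesize` are hypothetical and do not call Festival.

```python
from typing import Callable, Dict, Iterable

class PromptCache:
    """Pre-synthesize known prompts; synthesize anything else on demand."""

    def __init__(self, synthesize: Callable[[str], bytes]):
        self._synthesize = synthesize
        self._audio: Dict[str, bytes] = {}

    def warm(self, texts: Iterable[str]) -> None:
        # Run before the session so the student never waits on these prompts.
        for text in texts:
            if text not in self._audio:
                self._audio[text] = self._synthesize(text)

    def get(self, text: str) -> bytes:
        if text not in self._audio:  # cache miss: slow path at run time
            self._audio[text] = self._synthesize(text)
        return self._audio[text]

# Hypothetical stand-in for a real synthesizer; records each call.
calls = []
def fake_synthesize(text: str) -> bytes:
    calls.append(text)
    return text.encode("utf-8")

cache = PromptCache(fake_synthesize)
cache.warm(["Let's discuss the smoke in the fan room."])
cache.get("Let's discuss the smoke in the fan room.")  # served from cache
cache.get("Let's discuss the flood in the cleaning gear locker.")  # synthesized now
print(len(calls))  # 2
```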

28
Open-Ended Questions
  • Definition Questions
    • First of all can you tell me what primary
      boundaries are?
    • And now can you tell me what secondary boundaries
      are?
    • What did we define primary boundaries as earlier?
    • What did we define secondary boundaries as
      earlier?
  • Why Questions
    • Why is it necessary to investigate after the
      alarm sounds?
    • Why is it necessary to isolate when you have a
      report of fire?
    • Why is it necessary to set fire boundaries when
      you have a report of fire?
    • Why is it necessary to set flood boundaries when
      you have a report of flood?
    • Why is it necessary to set smoke boundaries when
      you have a report of smoke?

29
Some Sample Answers to Open-Ended Questions
  • System: Why is it necessary to investigate after
    the alarm sounds?
  • Student: to see if it's a false fire
  • System: Why is it necessary to set smoke
    boundaries when you have a report of smoke?
  • Student: prevent smoke from spreading to other
    compartments
  • System: First of all can you tell me what primary
    boundaries are?
  • Student: first two bulkheads around the crisis

30
Answers to Open-Ended questions
  • Did students produce longer, more complex
    answers?
  • Did their speech recognition experience earlier
    in the session influence their answer length and
    complexity?

31
Statistical Language Model
  • Benefits
    • Smaller process size
    • Quicker development cycle
    • More robust coverage
  • Issues
    • A class-based LM generalizes beyond the specific
      training corpus
    • The tagging grammar that produces the classes can
      obscure distinctions, e.g., "repair two, three,
      five" vs. compartment numbers
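The trade-off shows up even in a tiny class bigram model, where P(w | w_prev) = P(c | c_prev) · P(w | c). A minimal sketch with a hypothetical tagging rule that maps all number words to one NUM class and a made-up three-sentence corpus; it is not the project's SLM. Because the repair-party numbers and the frame number share a class, the model scores the never-seen "repair forty" exactly like the attested "repair two".

```python
from collections import Counter

# Hypothetical tagging rule: every number word maps to one class, NUM.
NUMBER_WORDS = {"two", "three", "five", "forty"}

def tag(word: str) -> str:
    return "NUM" if word in NUMBER_WORDS else word

# Made-up training utterances: repair-party numbers vs. a frame number.
corpus = ["send repair two", "send repair five", "set boundaries at frame forty"]

bigram_counts = Counter()   # (class of previous word, class of word)
history_counts = Counter()  # how often each class occurs as a history
word_counts = Counter()     # how often each word occurs
class_totals = Counter()    # how many tokens fall in each class
for sentence in corpus:
    words = sentence.split()
    classes = [tag(w) for w in words]
    word_counts.update(words)
    class_totals.update(classes)
    for prev, cur in zip(classes, classes[1:]):
        bigram_counts[(prev, cur)] += 1
        history_counts[prev] += 1

def prob(prev_word: str, word: str) -> float:
    """Class bigram estimate P(word | prev_word) = P(c | c_prev) * P(word | c).

    Assumes prev_word's class was seen as a history in training (no smoothing).
    """
    c_prev, c = tag(prev_word), tag(word)
    return (bigram_counts[(c_prev, c)] / history_counts[c_prev]) * (
        word_counts[word] / class_totals[c]
    )

# "repair forty" never occurs, yet it scores the same as "repair two".
print(prob("repair", "two"), prob("repair", "forty"))
```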

32
SLM data
  • 8,585 utterances to SCoT tutor
  • 10,880 utterances to DC-Train simulator
  • Data sources
    • San Diego Fleet Training Center
    • Spring 2004 experiment
    • Summer 2004 experiment
    • Fictional data for new tutoring areas

33
Gemini NL Grammar Particulars (slightly outdated)
  • 170 grammar rules
  • 755 one-word lexical entries
  • 1,769 multi-word lexical entries
  • Including
    • 48 action verbs (some synonymous)
    • 33 lexical items for ship personnel
    • 391 compartment names
    • 1,053 frame numbers (for compartments, bulkheads,
      valves, etc.) to reduce speech recognition
      errors
    • 13 synonyms for "yes"

34
NL Interpretation
  • First try Gemini, aiming for a logical form
    interpretation
  • If the utterance doesn't parse, use Nuance slots
    as a backoff robustness strategy
  • Being idiosyncratic isn't so bad for the backoff
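The two-stage strategy can be sketched as a function that prefers the full parse and falls back to slots only when parsing fails. The parser and slot filler here are hypothetical toy stand-ins, not the Gemini or Nuance APIs; only the control flow is the point.

```python
from typing import Callable, Dict, Optional

def interpret(
    utterance: str,
    parse_lf: Callable[[str], Optional[str]],
    fill_slots: Callable[[str], Dict[str, str]],
) -> dict:
    """Try a full logical-form parse first; back off to slot filling on failure."""
    lf = parse_lf(utterance)
    if lf is not None:
        return {"source": "grammar", "lf": lf}
    return {"source": "slots", "slots": fill_slots(utterance)}

# Hypothetical stand-ins for the Gemini parser and the Nuance slot grammar.
def toy_parser(text: str) -> Optional[str]:
    if text == "set fire boundaries forward and aft":
        return "set_boundaries(fire, forward_and_aft)"
    return None  # anything else "doesn't parse"

def toy_slots(text: str) -> Dict[str, str]:
    return {"position-adjective": "forward-and-aft"} if "forward and aft" in text else {}

print(interpret("set fire boundaries forward and aft", toy_parser, toy_slots))
print(interpret("uh forward and aft I guess", toy_parser, toy_slots))
```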

35
Nuance NL interpretation rules
Forward_and_Aft
  ( [(forward and aft) (front and back) (before and after) forward aft]
    (of the [crisis fire smoke flood compartment casualty]) )
  {<position-adjective forward-and-aft>}
Either_Side
  (on either side of the [crisis fire smoke flood compartment casualty])
  {<position-adjective either-side>}
36
Nuance slots vs. Gemini LFs
  • Nuance slots
    • Allow for a quick development cycle, because they
      are close to the domain representation used by
      the dialogue manager
    • Make it easy to include very particular,
      idiosyncratic patterns
    • Only one instance of a slot is filled per
      sentence, so multiple slots must be defined for
      boundaries
    • Allow some structure in NL representation
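The one-instance-per-slot limitation means a sentence naming two boundaries cannot put both into a single `boundary` slot. A minimal sketch of the workaround of defining numbered slots; the slot names and frame numbers are hypothetical, and a Python dict stands in for the Nuance slot structure.

```python
def fill(slots: dict, name: str, value: str) -> None:
    """Mimic one-value-per-slot: refuse to overwrite a slot already filled."""
    if name in slots:
        raise ValueError(f"slot {name!r} is already filled")
    slots[name] = value

def boundary_slots(frames: list) -> dict:
    """Spread several boundaries over separately named slots (boundary1, boundary2, ...)."""
    slots = {}
    for i, frame in enumerate(frames, start=1):
        fill(slots, f"boundary{i}", frame)
    return slots

print(boundary_slots(["frame 110", "frame 142"]))
# {'boundary1': 'frame 110', 'boundary2': 'frame 142'}
```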

37
Damage Control Symbology
38
Highlight Action of Interest
39
Highlight Generalizations
40
Symbology and Coaching
  • Contextualize discussion
  • Shorten/eliminate spoken descriptions
  • Focus attention on relevant parts
  • Succinct comparison with expert actions
  • Hint about actions

41
Indicators of Uncertainty
  • Response latency
  • Pauses
  • Um/uh
  • Hedges: "I guess", "I think"
  • Do subjects produce them?
  • Can we detect them?
  • What can we do with them?
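In transcribed speech, the lexical indicators (um/uh and the hedges "I guess", "I think") can be counted with simple patterns. A minimal sketch over text transcripts, assuming the transcription or recognizer output contains these tokens as words; it does not address detecting them from audio, and the patterns are illustrative rather than exhaustive.

```python
import re

FILLED_PAUSES = re.compile(r"\b(um+|uh+)\b", re.IGNORECASE)
HEDGES = re.compile(r"\bI (guess|think)\b", re.IGNORECASE)

def uncertainty_markers(transcript: str) -> dict:
    """Count lexical uncertainty cues in one transcribed answer."""
    return {
        "filled_pauses": len(FILLED_PAUSES.findall(transcript)),
        "hedges": len(HEDGES.findall(transcript)),
    }

print(uncertainty_markers("um I think uh the first two bulkheads"))
# {'filled_pauses': 2, 'hedges': 1}
```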

42
Pauses within Utterances
  • Transcribers marked noticeable pauses as "pause"
  • 35 instances in about 100 tutoring sessions
  • Used Sphinx2 to perform forced alignment of the
    transcriptions with the wavefiles
  • The alignments appear to line up reasonably well
    with the pauses marked by transcribers
  • We have not yet calculated mean pause length or
    analyzed variation between speakers
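Mean pause length could be computed from the forced alignments by measuring the gap between the end of one aligned word and the start of the next. A minimal sketch, assuming the Sphinx2 alignment output has already been converted into (word, start, end) triples in seconds; that format and the 0.3 s minimum gap are assumptions, not values from the experiments.

```python
from statistics import mean
from typing import List, Tuple

Segment = Tuple[str, float, float]  # (word, start_s, end_s); hypothetical format

def pauses(segments: List[Segment], min_gap: float = 0.3) -> List[float]:
    """Gaps of at least min_gap seconds between consecutive aligned words."""
    gaps = []
    for (_, _, prev_end), (_, start, _) in zip(segments, segments[1:]):
        gap = start - prev_end
        if gap >= min_gap:
            gaps.append(gap)
    return gaps

segs = [("first", 0.00, 0.35), ("two", 0.35, 0.60), ("bulkheads", 1.40, 2.00)]
found = pauses(segs)
print(len(found), round(mean(found), 2))  # 1 0.8
```

Computing this per speaker, rather than pooled, would support the planned analysis of variation between speakers.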

43
Sphinx Forced Alignment
44
Uncertainty indicators
  • Response latency
    • We record the milliseconds between the end of the
      system utterance and the start of user speech
  • Um/uh
    • They do occur in speech to the system
    • Detecting them is possible, but depends on the
      accuracy of speech recognition
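Given the recorded timestamps, latency is a subtraction; comparing it with each speaker's own typical latency is one way to allow for variation between speakers. A minimal sketch; the (system_end_ms, user_start_ms) log format and the 2x-median threshold are hypothetical choices, not the project's method.

```python
from statistics import median

def latencies_ms(turns):
    """turns: (system_end_ms, user_start_ms) pairs for one speaker (hypothetical format)."""
    return [user_start - system_end for system_end, user_start in turns]

def slow_responses(turns, factor=2.0):
    """Indices of turns whose latency exceeds factor x this speaker's median latency."""
    lat = latencies_ms(turns)
    cutoff = factor * median(lat)
    return [i for i, ms in enumerate(lat) if ms > cutoff]

turns = [(1000, 1800), (5000, 5700), (9000, 12000)]
print(latencies_ms(turns))    # [800, 700, 3000]
print(slow_responses(turns))  # [2]
```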