Title: Eye-Based Interaction in Graphical Systems: Theory & Practice
1 Eye-Based Interaction in Graphical Systems: Theory & Practice
- Andrew Duchowski
- Computer Science
- Clemson University
- andrewd_at_cs.clemson.edu
- Roel Vertegaal
- Computing and Information Science
- Queen's University
- roel_at_acm.org
2 Overview
- Basics of visual attention, human vision, eye movements, and signal analysis
- Eye tracking hardware specifications
- Video eye tracker integration
- Principal eye tracking system modes
- interactive or diagnostic
- Example systems and potential applications
3Course Schedule Part I
- Introduction to the Human Visual System (HVS)
- Neurological substrate of the HVS
- Physiological and functional descriptions
- Visual Perception
- Spatial, temporal, color vision
- Eye Movements
- Saccades, fixations, pursuits, nystagmus
4Course Schedule Part II
- Part II Eye tracking systems
- The eye tracker
- early developments
- video-based eye trackers
- system use
- Integration issues
- application design
- calibration
- data collection / analysis
5 Course Schedule: Part III / Demo
- Part III: Potential applications
- VR, Human Factors
- Collaborative systems, Advertising
- Psychophysics, Displays
- Demonstration: GAZE Groupware System
6 Eye-Based Interaction in Graphical Systems: Theory & Practice
- Part I
- Introduction to the Human Visual System
7 A Visual Attention
- "When the things are apprehended by the senses, the number of them that can be attended to at once is small, 'Pluribus intentus, minor est ad singula sensus'" (William James)
- Latin translation: "Many filtered into few for perception"
- Visual scene inspection is performed minutatim (piecemeal), not in toto
8 A.1 Visual Attention: chronological review
- Qualitative historical background: a dichotomous theory of attention, the "what" and "where" of (visual) attention
- Von Helmholtz (ca. 1900): mainly concerned with eye movements to spatial locations, the "where", i.e., attention as an overt mechanism (eye movements)
- James (ca. 1900): defined attention mainly in terms of the "what", i.e., attention as a more internally covert mechanism
9 A.1 Visual Attention: chronological review (cont'd)
- Broadbent (ca. 1950): defined attention as a selective filter, based on auditory experiments; generally agrees with Von Helmholtz's "where"
- Deutsch and Deutsch (ca. 1960): rejected the selective filter in favor of importance weightings, generally corresponding to James' "what"
- Treisman (ca. 1960): proposed a unified theory of attention, an attenuation filter (the "where") followed by dictionary units (the "what")
10 A.1 Visual Attention: chronological review (cont'd)
- Main debate at this point: is attention parallel (the "where") or serial (the "what") in nature?
- Gestalt view: recognition is a holistic process (e.g., the Kanizsa figure)
- Theories advanced through early recordings of eye movements
11 A.1 Visual Attention: chronological review (cont'd)
- Yarbus (ca. 1967): demonstrated sequential, but variable, viewing patterns over particular image regions (akin to the "what")
- Noton and Stark (ca. 1970): showed that subjects tend to fixate identifiable regions of interest containing informative details; coined the term "scanpath" to describe eye movement patterns
- Scanpaths helped cast doubt on the Gestalt hypothesis
12 A.1 Visual Attention: chronological review (cont'd)
- Fig. 2: Yarbus' early scanpath recordings
- trace 1: examine at will
- trace 2: estimate wealth
- trace 3: estimate ages
- trace 4: guess previous activity
- trace 5: remember clothing
- trace 6: remember position
- trace 7: time since last visit
13 A.1 Visual Attention: chronological review (cont'd)
- Posner (ca. 1980): proposed the attentional "spotlight", a covert mechanism independent of eye movements (akin to the "where")
- Treisman (ca. 1986): once again unified the "what" and "where" dichotomy by proposing Feature Integration Theory (FIT), describing attention as a "glue" which integrates features at particular locations to allow holistic perception
14 A.1 Visual Attention: chronological review (cont'd)
- Summary: the "what" and "where" dichotomy provides an intuitive sense of the attentional, foveo-peripheral visual mechanism
- Caution: the what/where account is probably overly simplistic and is but one theory of visual attention
15 B Neurological Substrate of the Human Visual System (HVS)
- Any theory of visual attention must address the fundamental properties of early visual mechanisms
- Examination of the neurological substrate provides evidence of the limited information capacity of the visual system, a physiological reason for an attentional mechanism
16 B.1 The Eye
- Fig. 3: The eye, "the world's worst camera"
- suffers from numerous optical imperfections...
- ...yet is endowed with several compensatory mechanisms
17B.1 The Eye (contd)
18 B.1 The Eye (cont'd)
- Imperfections
- spherical aberrations
- chromatic aberrations
- curvature of field
- Compensations
- iris: acts as a stop
- focal lens: sharp focus
- curved retina: matches the curvature of field
19 B.2 The Retina
- Retinal photoreceptors constitute the first stage of visual perception
- Photoreceptors are transducers, converting light energy to electrical impulses (neural signals)
- Photoreceptors are functionally classified into two types: rods and cones
20 B.2 The Retina: rods and cones
- Rods: sensitive to dim and achromatic light (night vision)
- Cones: respond to brighter, chromatic light (day vision)
- Retinal construction: about 120M rods and 7M cones, arranged concentrically
21 B.2 The Retina: cellular makeup
- The retina is composed of 3 main layers of different cell types (a 3-layer sandwich)
- Surprising fact: the retina is inverted; photoreceptors are found in the bottom layer (furthest away from incoming light)
- Connection bundles between layers are called plexiform or synaptic layers
22 B.2 The Retina: cellular makeup (cont'd)
- Fig. 5: The retinocellular layers (w.r.t. incoming light)
- ganglion layer
- inner synaptic (plexiform) layer
- inner nuclear layer
- outer synaptic (plexiform) layer
- outer layer
23 B.2 The Retina: cellular makeup (cont'd)
- Fig. 5 (cont'd): The neuron
- all retinal cells are types of neurons
- certain neurons mimic a digital gate, firing when the activation level exceeds a threshold
- rods and cones are specific types of dendrites
24 B.2 The Retina: retinogeniculate organization (from the outside in, w.r.t. the cortex)
- Outer layer: rods and cones
- Inner layer: horizontal cells, laterally connected to photoreceptors
- Ganglion layer: ganglion cells, connected (indirectly) to horizontal cells; project via the myelinated pathways to the Lateral Geniculate Nuclei (LGN) in the thalamus
25 B.2 The Retina: receptive fields
- Receptive fields: collections of interconnected cells within the inner and ganglion layers
- Field organization determines the impulse signature of cells, based on cell types
- Cells may depolarize due to light increments (+) or decrements (-)
26 B.2 The Retina: receptive fields (cont'd)
- Fig. 6: Receptive fields
- signal profile resembles a Mexican hat
- receptive field sizes vary concentrically
- color-opposing fields also exist
27 B.3 Visual Pathways
- Retinal ganglion cells project to the LGN along two major pathways, distinguished by morphological cell types: α and β cells
- α cells project to the magnocellular (M-) layers
- β cells project to the parvocellular (P-) layers
- Ganglion cells are functionally classified into three types: X, Y, and W cells
28 B.3 Visual Pathways: functional response of ganglion cells
- X cells: sustained stimulus, location, and fine detail
- innervate along both M- and P-projections
- Y cells: transient stimulus, coarse features, and motion
- innervate along only the M-projection
- W cells: coarse features and motion
- project to the Superior Colliculus (SC)
29B.3 Visual Pathways (contd)
- Fig.7 Optic tract and radiations (visual
pathways) - The LGN is of particular clinical importance
- M- and P-cellular projections are clearly visible
under microscope - Axons from M- and P-layers of the LGN terminate
in area V1
30B.3 Visual Pathways (contd)
- Table.1 Functional characteristics of ganglionic
projections
31B.4 The Occipital Cortex and Beyond
- Fig.8 The brain and visual pathways
- the cerebral cortex is composed of numerous
regions classified by their function
32B.4 The Occipital Cortex and Beyond (contd)
- M- and P- pathways terminate in distinct layers
of cortical area V1 - Cortical cells (unlike center-surround ganglion
receptive fields) respond to orientation-specific
stimulus - Pathways emanating from V1 joining multiple
cortical areas involved in vision are called
streams
33 B.4 The Occipital Cortex and Beyond: directional selectivity
- Cortical Directional Selectivity (CDS) of cells
in V1 contributes to motion perception and
control of eye movements - CDS cells establish a motion pathway from V1
projecting to areas V2 and MT (V5) - In contrast, Retinal Directional Selectivity
(RDS) may not contribute to motion perception,
but is involved in eye movements
34 B.4 The Occipital Cortex and Beyond: cortical cells
- Two consequences of the visual system's motion-sensitive, single-cell organization
- due to motion sensitivity, the eyes are never perfectly still (instead a tiny jitter, termed microsaccades, is observed); if the eyes were artificially stabilized, the image would fade!
- due to single-cell organization, the representation of natural images is quite abstract: there is no retinal buffer
35 B.4 The Occipital Cortex and Beyond: two attentional streams
- Dorsal stream
- V1, V2, MT (V5), MST, Posterior Parietal Cortex
- sensorimotor (motion, location) processing
- the attentional where?
- Ventral (temporal) stream
- V1, V2, V4, Inferotemporal Cortex
- cognitive processing
- the attentional what?
36 B.4 The Occipital Cortex and Beyond: three attentional regions
- Posterior Parietal Cortex (dorsal stream)
- disengages attention
- Superior Colliculus (midbrain)
- relocates attention
- Pulvinar (thalamus colocated with LGN)
- engages, or enhances, attention
37C Visual Perception (with emphasis on
foveo-peripheral distinction)
- Measurable performance parameters may often (but
not always!) fall within ranges predicted by
known limitations of the neurological substrate - Example visual acuity may be estimated by
knowledge of density and distribution of the
retinal photoreceptors - In general, performance parameters are obtained
empirically
38 C.1 Spatial Vision
- Main parameters sought: visual acuity, contrast sensitivity
- Dimensions of retinal features are measured in terms of the scene projected onto the retina, in units of degrees visual angle: A = 2 arctan(S / 2D), where S is the object size and D is the distance to the object (see the sketch below)
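- A minimal sketch of this computation (hypothetical helper; S and D must be in the same units):
    #include <math.h>
    /* Visual angle (in degrees) subtended by an object of size S at distance D. */
    double visual_angle_deg(double S, double D)
    {
        return 2.0 * atan(S / (2.0 * D)) * 180.0 / M_PI;
    }
    /* e.g., a 1 cm target viewed at about 57 cm subtends roughly 1 degree */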
39 C.1 Spatial Vision: visual angle
40 C.1 Spatial Vision: common visual angles
- Table 2: Common visual angles
41 C.1 Spatial Vision: retinal regions
- Visual field: roughly 180° horizontal × 130° vertical
- Fovea centralis (foveola): highest acuity
- 1.3° visual angle, 25,000 cones
- Fovea: high acuity (at 5°, acuity drops to 50%)
- 5° visual angle, 100,000 cones
- Macula: within the useful acuity region (to about 30°)
- 16.7° visual angle, 650,000 cones
- Hardly any rods in the foveal region
42 C.1 Spatial Vision: visual angle and receptor distribution
- Fig. 10: Retinotopic receptor distribution
43 C.1 Spatial Vision: visual acuity
- Fig. 11: Visual acuity at various eccentricities and light levels
- at photopic (day) light levels, acuity is fairly constant within the central 2°
- acuity drops off linearly out to about 5°, and sharply (exponentially) beyond that
- at scotopic (night) light levels, acuity is poor at all eccentricities
44 C.1 Spatial Vision: measuring visual acuity
- Acuity roughly corresponds to receptor distribution in the fovea, but not necessarily in the periphery
- Due to various contributing factors (synaptic organization and later-stage neural elements), effective relative visual acuity is generally measured by psychophysical experimentation
45 C.2 Temporal Vision
- Visual response to motion is characterized by two distinct facts: persistence of vision (POV) and the phi phenomenon
- POV essentially describes the human temporal sampling rate
- Phi describes the threshold above which humans detect apparent movement
- Both facts are exploited in media to elicit motion perception
46 C.2 Temporal Vision: persistence of vision
- Fig. 12: Critical Fusion Frequency (CFF)
- a stimulus flashing at about 50-60 Hz appears steady
- CFF explains why flicker is not seen when viewing a sequence of still images
- cinema: 24 fps × 3 = 72 Hz, due to a 3-bladed shutter
- TV: 60 fields/s, interlaced
47 C.2 Temporal Vision: phi phenomenon
- The phi phenomenon explains why motion is perceived in cinema, TV, and graphics
- Besides the necessary flicker rate (60 Hz), the illusion of apparent, or stroboscopic, motion must be maintained
- Similar to old-fashioned neon signs with stationary bulbs
- Minimum rate: about 16 frames per second
48 C.2 Temporal Vision: peripheral motion perception
- Motion perception is not homogeneous across
visual field - Sensitivity to target motion decreases with
retinal eccentricity for slow motion... - higher rate of target motion (e.g., spinning
disk) is needed to match apparent velocity in
fovea - but, motion is more salient in periphery than in
fovea (easier to detect moving targets than
stationary ones)
49 C.2 Temporal Vision: peripheral sensitivity to direction of motion
- Fig.13 Threshold isograms for peripheral rotary
movement - periphery is twice as sensitive to
horizontal-axis movement as to vertical-axis
movement - (numbers in diagram are rates of pointer movement
in rev./min.)
50 C.3 Color Vision: cone types
- foveal color vision is facilitated by three types of cone photoreceptors
- a good deal is known about foveal color vision; relatively little is known about peripheral color vision
- of the roughly 7,000,000 cones, most are packed tightly into the central 30° region
- Fig. 14: Spectral sensitivity curves of the cone photoreceptors
51 C.3 Color Vision: peripheral color perception fields
- blue and yellow fields are larger than red and green fields
- most sensitive to blue (up to 83°); red up to 76°, green up to 74°
- chromatic fields do not have definite borders; sensitivity gradually and irregularly drops off over a 15-30° range
- Fig. 15: Visual fields for monocular color vision (right eye)
52C.4 Implications for Design of Attentional
Displays
- Need to consider distinct characteristics of
foveal and peripheral vision, in particular - spatial resolution
- temporal resolution
- luminance / chrominance
- Furthermore, gaze-contingent systems must match
dynamics of human eye movement
53D Taxonomy and Models of Eye Movements
- Eye movements are mainly used to reposition the
fovea - Five main classes of eye movements
- saccadic
- smooth pursuit
- vergence
- vestibular
- physiological nystagmus
- (fixations)
- Other types of movements are non-positional
(adaptation, accommodation)
54D.1 Extra-Ocular Muscles
- Fig.16 Extrinsic muscles of the eyes
- in general, eyes move within 6 degrees of freedom
(6 muscles)
55D.1 Oculomotor Plant
- Fig.17 Oculomotor system
- eye movement signals emanate from three main
distinct regions - occipital cortex (areas 17, 18, 19, 22)
- superior colliculus (SC)
- semicircular canals (SCC)
56D.1 Oculomotor Plant (contd)
- Two pertinent observations
- eye movement system is, to a large extent, a
feedback circuit - controlling cortical regions can be functionally
characterized as - voluntary (occipital cortexareas 17, 18, 19, 22)
- involuntary (superior colliculus, SC)
- reflexive (semicircular canals, SCC)
57D.2 Saccades
- Rapid eye movements used to reposition fovea
- Voluntary and reflexive
- Range in duration from 10ms - 100ms
- Effectively blind during transition
- Deemed ballistic (pre-programmed) and stereotyped
(reproducible)
58 D.2 Saccades: modeling
- Fig. 18: Linear moving average filter model, x_t = sum_k g_k s_(t-k)
- s_t: input (pulse), x_t: output (step), g_k: filter coefficients
- e.g., Haar filter {1, -1} (a sketch follows below)
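- A minimal sketch of the moving average model under the notation above (hypothetical arrays; with the Haar filter g = {1, -1}, the output is simply the sample-to-sample difference, i.e., a crude velocity estimate):
    /* x[t] = sum_k g[k] * s[t-k] */
    void moving_average_filter(const double *s, double *x, int n, const double *g, int m)
    {
        for (int t = 0; t < n; t++) {
            x[t] = 0.0;
            for (int k = 0; k < m && t - k >= 0; k++)
                x[t] += g[k] * s[t - k];
        }
    }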
59D.3 Smooth Pursuits
- Involved when visually tracking a moving target
- Depending on range of target motion, eyes are
capable of matching target velocity - Pursuit movements are an example of a control
system with built-in negative feedback
60 D.3 Smooth Pursuits: modeling
- Fig. 19: Linear, time-invariant filter model
- s_t: target position, x_t: (desired) eye position, h: filter
- retinal receptors give an additive velocity error
61 D.4 Nystagmus
- Conjugate eye movements characterized by a sawtooth-like time course pattern (pursuits interspersed with saccades)
- Two types (virtually indistinguishable)
- Optokinetic: compensates for retinal movement of the target
- Vestibular: compensates for head movement
- May be possible to model with a combination of saccade/pursuit filters
62 D.5 Fixations
- Possibly the most important type of eye movement for attentional applications
- 90% of viewing time is devoted to fixations
- duration: 150 ms to 600 ms
- Not technically eye movements in their own right; rather, characterized by miniature eye movements
- tremor, drift, microsaccades
63D.6 Eye Movement Analysis
- Two significant observations
- only three types of eye movements are mainly
needed to gain insight into overt localization of
visual attention - fixations
- saccades
- smooth pursuits (to a lesser extent)
- all three signals may be approximated by linear,
time-invariant (LTI) filter systems
64 D.6 Eye Movement Analysis: assumptions
- Important point: it is assumed that observed eye movements disclose evidence of overt visual attention
- it is possible to attend to objects covertly (without moving the eyes)
- Linearity: although practical, this assumption is an operational oversimplification of neuronal (non-linear) systems
65 D.6 Eye Movement Analysis: goals
- goal of analysis is to locate regions where
signal average changes abruptly - fixation end, saccade start
- saccade end, fixation start
- two main approaches
- summation-based
- differentiation-based
- both approaches rely on empirical thresholds
Fig.20 Hypothetical eye movement signal
66 D.6 Eye Movement Analysis: denoising
- Fig. 21: Signal denoising; reduce noise due to
- eye instability (jitter), or worse, blinks
- removal is possible based on device characteristics (e.g., a blink registers as (0,0))
67 D.6 Eye Movement Analysis: summation-based
- Dwell-time fixation detection depends on
- identification of a stationary signal (fixation), and
- the size of the time window specifying the range of duration (and hence a temporal threshold)
- Example: position-variance method (see the sketch below)
- determine whether M of N points lie within a certain distance D of the mean (μ) of the signal
- the values M, N, and D are determined empirically
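- A sketch of the position-variance test (M, N, D are the empirical parameters; x[], y[] hold the N mapped POR samples of the current window; names are illustrative):
    #include <math.h>
    /* Returns 1 if at least M of the N samples lie within distance D of the window mean. */
    int is_fixation(const double *x, const double *y, int N, int M, double D)
    {
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < N; i++) { mx += x[i]; my += y[i]; }
        mx /= N; my /= N;
        int count = 0;
        for (int i = 0; i < N; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            if (sqrt(dx * dx + dy * dy) <= D) count++;
        }
        return count >= M;
    }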
68 D.6 Eye Movement Analysis: differentiation-based
- Velocity-based saccade/fixation detection
- the velocity calculated over a signal window is compared to a threshold
- if velocity > threshold, then saccade; else fixation
- Example: velocity detection method (see the sketch below)
- use short Finite Impulse Response (FIR) filters to detect saccades (may be possible in real time)
- assuming a symmetrical velocity profile, this can be extended to velocity-based prediction
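- A sketch of the velocity-threshold test (simple finite difference over the sample period dt; the threshold is an empirical value in the same units as the computed velocity):
    #include <math.h>
    /* Label each sample: 1 = saccade, 0 = fixation. */
    void detect_saccades(const double *x, const double *y, int n, double dt, double vthresh, int *label)
    {
        label[0] = 0;
        for (int i = 1; i < n; i++) {
            double vx = (x[i] - x[i-1]) / dt;
            double vy = (y[i] - y[i-1]) / dt;
            label[i] = (sqrt(vx * vx + vy * vy) > vthresh) ? 1 : 0;
        }
    }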
69D.6 Eye Movement Analysis (contd)
(a) position-variance
(b) velocity-detection
- Fig.22 Saccade/fixation detection
70 D.6 Eye Movement Analysis: example
- Fig. 23: FIR filter velocity-detection method based on idealized saccade detection
- 4 conditions on measured acceleration
- acceleration > threshold A
- acceleration > threshold B
- sign change
- duration threshold
- thresholds derived from empirical values
71 D.6 Eye Movement Analysis: example (cont'd)
- Amplitude thresholds A, B derived from expected peak saccade velocities of about 600°/s
- Duration thresholds Tmin, Tmax derived from expected saccade durations of 120 ms to 300 ms
- Fig. 24: FIR filters for saccade detection
72 Eye-Based Interaction in Graphical Systems: Theory & Practice
- Part II
- Eye Tracking Systems
73 E The Eye Tracker
- Two broad applications of eye movement monitoring techniques
- measuring the position of the eye relative to the head
- measuring the orientation of the eye in space, or the point of regard (POR), used to identify fixated elements in a visual scene
- Arguably, the most widely used apparatus for measuring the POR is the video-based corneal reflection eye tracker
74E.1 Brief Survey of Eye Tracking Techniques
- Four broad categories of eye movement
methodologies - electro-oculography (EOG)
- scleral contact lens/search coil
- photo-oculography (POG) or video-oculography
(VOG) - video-based combined pupil and corneal reflection
75E.1 Brief Survey of Eye Tracking Techniques
(contd)
- First method for objective eye movement
measurements using corneal reflection reported in
1901 - Techniques using contact lenses to improve
accuracy developed in 1950s (invasive) - Remote (non-invasive) trackers rely on visible
features of the eye (e.g., pupil) - Fast image processing techniques have facilitated
real-time video-based systems
76 E.1 Brief Survey of Eye Tracking Techniques: EOG
- the most widely used method some 20 years ago (still used today)
- similar to electro-mechanical motion capture
- measures eye movements relative to head position
- not generally suitable for POR measurement (unless the head is also tracked)
- Fig. 25: EOG measurement
- relies on measurement of the skin's potential differences, using electrodes placed around the eye
77 E.1 Brief Survey of Eye Tracking Techniques: Scleral Contact Lens / Search Coil
- Fig.26 Scleral coil
- search coil embedded in contact lens and
electromagnetic field frames
- possibly most precise
- similar to electromagnetic position/orientation
trackers used in motion-capture
78 E.1 Brief Survey of Eye Tracking Techniques: Scleral Contact Lens / Search Coil (cont'd)
- highly accurate, but limited measurement range (about 5°)
- measures eye movements relative to head position
- not generally suitable for POR measurement (unless the head is also tracked)
- Fig. 27: Example of scleral suction ring insertion
- most intrusive method
- insertion of the lens requires care
- wearing the lens causes discomfort
79 E.1 Brief Survey of Eye Tracking Techniques: POG / VOG
- Fig. 28: Example of POG / VOG methods and devices
- a wide variety of techniques based on measurement of distinguishable ocular features (similar to optical motion capture)
- pupil: apparent shape
- limbus: position of the iris-sclera boundary
- infra-red: corneal reflection of a directed light source
80 E.1 Brief Survey of Eye Tracking Techniques: Video-Based Combined Pupil and Corneal Reflection
- Fig.29 Table-mounted (remote) video-based eye
tracker - compute POR, usually in real-time
- utilize relatively cheap video cameras and image
processing hardware - can also allow limited head movement
81 E.1 Brief Survey of Eye Tracking Techniques: Video-Based Combined Pupil and Corneal Reflection (cont'd)
- Fig.30 Head-mounted video-based eye tracker
- essentially identical to table-mounted systems,
but with miniature optics
- most suitable for (graphical) interactive
systems, e.g., VR - binocular systems also available
82 E.1 Brief Survey of Eye Tracking Techniques: Corneal Reflection
- Two points of reference on the eye are needed to
separate eye movements from head movements, e.g., - pupil center
- corneal reflection of nearby, directed light
source (IR) - Positional difference between pupil center and
corneal reflection changes with eye rotation, but
remains relatively constant with minor head
movements
83 E.1 Brief Survey of Eye Tracking Techniques: Corneal Reflection (cont'd)
- Fig.31 Purkinje images
- corneal reflections are known as the Purkinje
images, or reflections - front surface of cornea
- rear surface of cornea
- front surface of lens
- rear surface of lens
- video-based trackers typically locate the first
Purkinje image
84 E.1 Brief Survey of Eye Tracking Techniques: Corneal Reflection (cont'd)
- Purkinje images appear as small white dots in close proximity to the (dark) pupil
- tracker calibration is achieved by measuring the user gazing at properly positioned grid points (usually 5 or 9)
- the tracker interpolates the POR on a perpendicular screen in front of the user
- Fig. 32: Pupil and Purkinje images as seen by the eye tracker's camera
85 E.1 Brief Survey of Eye Tracking Techniques: Corneal Reflection (cont'd)
- DPI trackers measure rotational and translational
eye movements - 1st and 4th reflections move together through
same distance upon eye translation, but separate
upon eye rotation - highly precise
- used to be expensive and difficult to set up
- Fig.33 Dual-Purkinje image (DPI) eye tracker
- so-called generation-V trackers measure the 1st
and 4th Purkinje images
86 F Integration Issues and Requirements
- Integration of an eye tracker into a graphics system chiefly depends on
- delivery of the proper graphics video stream to the tracker
- subsequent reception of the tracker's 2D gaze data
- Gaze data (x- and y-coordinates) are typically either stored by the tracker or sent to the graphics host via serial cable
- Discussion focuses on video-based eye trackers
87 F Integration Issues and Requirements (cont'd)
- The video-based tracker's main advantages over other systems
- relatively non-invasive
- fairly accurate (to about 1° over a 30° field of view)
- for the most part, not difficult to integrate
- Main limitation: sampling frequency, typically limited to the video frame rate, about 60 Hz
88F Integration Issues and Requirements (contd)
- Fig.34 Virtual Reality Eye Tracking (VRET) Lab
at Clemson - integration description based on VRET lab
equipment - two systems described
- table-mounted, monocular system
- HMD-fitted, binocular system
89 F Integration Issues and Requirements: VRET lab equipment
- SGI Onyx2 InfiniteReality graphics host
- dual-rack, dual-pipe, 8 MIPS R10000 CPUs
- 3 GB RAM, 0.5 GB texture memory
- ISCAN eye tracker
- table-mounted pan/tilt camera monocular unit
- HMD-fitted binocular unit
- Virtual Research V8 HMD
- Ascension 6 Degree-Of-Freedom (DOF) Flock Of
Birds (FOB) d.c. electromagnetic head tracker
90 F Integration Issues and Requirements: preliminaries
- Primary requirements
- knowledge of the video format required by the tracker (e.g., NTSC, VGA)
- knowledge of the data format returned by the tracker (e.g., byte order, codes)
- Secondary requirements: tracker capabilities
- fine-grained cursor control and readout?
- transmission of the tracker's operating mode along with gaze data?
91 F Integration Issues and Requirements: objectives
- Scene alignment
- required for calibration, display, and data mapping
- use the tracker's fine cursor to measure the graphics display dimensions; it is crucial that graphically displayed calibration points are aligned with those displayed by the eye tracker
- Host/tracker synchronization
- required for generation of the proper graphics display, i.e., calibration or stimulus
- use the tracker's operating mode data
92 F.1 System Installation
- Primary wiring considerations
- video cables: imperative that the graphics host generate a video signal in the format expected by the eye tracker
- example problem: graphics host generates a VGA signal (e.g., as required by the HMD), eye tracker expects NTSC
- serial line: a comparatively simple serial driver, typically facilitated by data specifications provided by the eye tracker vendor
93F.1 System Installation (contd)
- HMD driven by VGA
- switchbox controls video between monitors and HMD
- 2 VGA-NTSC converters
- TV driven by NTSC
- Fig.35 Video signal wiring diagram for the VRET
lab at Clemson
94 F.1 System Installation: lessons learned at Clemson
- Various video switches were needed to control
graphics video and eye camera video - Custom VGA cables (13W3-HD15) were needed to feed
monitors, HMD, and tracker - Host VGA signal had to be re-configured
(horizontal sync not sync-on-green) - Switchbox had to be re-wired (missing two lines
for pins 13 and 14!)
95F.2 Application Program Requirements
- Two example applications
- 2D image-viewing program (monocular)
- VR gaze-contingent environment (binocular)
- Most important common requirement
- mapping eye tracker coordinates to the application program's reference frame
- Extra requirements for VR
- head tracker coordinate mapping
- gaze vector calculation
96 F.2.1 Eye Tracker Screen Coordinate Mapping: general
- The eye tracker returns the user's POR relative to the tracker's screen reference frame, e.g., a 512×512 pixel plane
- Tracker data must be mapped to the dimensions of the application screen
- In general, to map x' in [a, b] to the range [c, d]: x = c + (x' - a)(d - c)/(b - a)
97 F.2.1 Eye Tracker Screen Coordinate Mapping: to the 3D viewing frustum
- Fig. 36: Eye tracker to VR mapping
- note the eye tracker origin is at the top-left
98 F.2.1 Eye Tracker Screen Coordinate Mapping: to the 3D viewing frustum (cont'd)
- to convert eye tracker coordinates (x', y') to graphics coordinates (x, y) for a window of width w and height h (with tracker coordinates in [0, 512]): x = x' (w/512), y = (512 - y')(h/512)
- the term (512 - y') handles the y-coordinate flip, so that the eye tracker screen is mapped to the bottom-left origin of the viewing frustum
- if the dimensions of the graphics window are static, e.g., 640×480, the above equation can be hardcoded
99 F.2.1 Eye Tracker Screen Coordinate Mapping: to the 2D image plane
- Conversion of eye tracker coordinates (x', y') to 2D image plane coordinates (x, y) is handled similarly
- For example, if the viewable image plane has dimensions 600×450: x = x' (600/512), y = (512 - y')(450/512)
- Note the above mapping assumes eye tracker coordinates are in the range [0, 512]
- In practice, usable coordinates depend on the location of the application window on the eye tracking screen
100 F.2.1 Eye Tracker Screen Coordinate Mapping: to the 2D image plane (cont'd)
- use the eye tracker's fine cursor movement to measure the application window's extents
- calculate the mapping from the measured extents, e.g., for a 600×450 window (see the sketch below)
- Fig. 37: Application window measurement
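- A sketch of the resulting mapping, assuming window extents (wleft, wtop, wright, wbottom) measured in the tracker's 512×512 coordinates and a 600×450 application window (names are illustrative):
    /* Map a tracker POR (xp, yp) into the 600x450 window; the y-axis is flipped
       because the tracker origin is at the top-left. */
    void map_por_to_window(double xp, double yp,
                           double wleft, double wtop, double wright, double wbottom,
                           double *x, double *y)
    {
        *x = (xp - wleft) * 600.0 / (wright - wleft);
        *y = (wbottom - yp) * 450.0 / (wbottom - wtop);
    }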
101F.2.2 Mapping Flock Of Birds Tracker Coordinates
- For VR applications, position and orientation of
the head is required (obtained from head tracker,
e.g., FOB) - The tracker reports 6 Degree-Of-Freedom (DOF)
information regarding sensor position and
orientation - Orientation is given in terms of Euler angles
102F.2.2 Mapping Flock Of Birds Tracker Coordinates
(contd)
Table 3 Euler angle names
- Euler angles roll, pitch, and yaw are represented
by R, E, A, respectively - each describes rotation angle about one axis
103 F.2.2 Mapping Flock Of Birds Tracker Coordinates (cont'd)
- Euler angles are described by the familiar homogeneous rotation matrices
- the composite 4×4 matrix contains all rotations in one
104 F.2.2 Mapping Flock Of Birds Tracker Coordinates (cont'd)
- in VR, the composite transformation matrix returned by the head tracker is used to transform an arbitrary directional vector w = [x y z 1] to align it with the current sensor (head) orientation
- this formulation is used to align the initial view vector, up vector, and eventually the gaze vector with the current head-centric reference frame
- note that the FOB matrix may be shifted by 1
105 F.2.2 Mapping Flock Of Birds Tracker Coordinates (cont'd)
- e.g., transforming the initial view vector v = [0 0 -1 1]
- e.g., transforming the initial up vector u = [0 1 0 1]
- the gaze vector is transformed similarly (see the sketch below)
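- A sketch of one possible implementation of such a transform (e.g., the transformVectorToHead call in the VR main loop shown later), using successive axis rotations; the exact axis order and signs follow the FOB documentation, and the tracker's own composite matrix can be used directly when available:
    #include <math.h>
    static void rot_x(double a, double v[3]) { double c = cos(a), s = sin(a);
        double y = v[1]*c - v[2]*s, z = v[1]*s + v[2]*c; v[1] = y; v[2] = z; }
    static void rot_y(double a, double v[3]) { double c = cos(a), s = sin(a);
        double x = v[0]*c + v[2]*s, z = -v[0]*s + v[2]*c; v[0] = x; v[2] = z; }
    static void rot_z(double a, double v[3]) { double c = cos(a), s = sin(a);
        double x = v[0]*c - v[1]*s, y = v[0]*s + v[1]*c; v[0] = x; v[1] = y; }
    /* Align a direction (e.g., the initial view vector {0,0,-1}) with the sensor orientation. */
    void transform_vector_to_head(double yaw, double pitch, double roll, double v[3])
    {
        rot_y(yaw, v);    /* azimuth   */
        rot_x(pitch, v);  /* elevation */
        rot_z(roll, v);   /* roll      */
    }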
106F.2.3 3D Gaze Point and Vector Calculation
- The gaze point calculation in 3-space depends on
only the relative positions of the two eyes in
the horizontal axis - Parameters of interest here are the 3D virtual
(world) coordinates of the gaze point, (xg, yg,
zg) - These coordinates can be determined from
traditional stereo geometry
107F.2.3 3D Gaze Point and Vector Calculation
(contd)
- Fig.39 Basic binocular geometry
- helmet position is the origin, (xh, yh, zh)
- helmet view vector is the optical (viewer-local)
z-axis - helmet up vector is the (viewer-local) y-axis
- eye tracker provides instantaneous viewer-local
gaze coordinates (mapped to viewing frustum)
108 F.2.3 3D Gaze Point and Vector Calculation (cont'd)
- given instantaneous binocular gaze coordinates (xl, yl) and (xr, yr) at focal distance f along the viewer-local z-axis, the gaze point (xg, yg, zg) can be derived parametrically:
- xg = (1 - s) xh + s (xl + xr)/2
- yg = (1 - s) yh + s (yl + yr)/2
- zg = (1 - s) zh + s f
- where the interpolant s is given by s = b/(xl - xr + b), with b the interocular (baseline) distance
109 F.2.3 3D Gaze Point and Vector Calculation (cont'd)
- the gaze point can be expressed parametrically as a point on a ray with origin (xh, yh, zh), the helmet position, emanating along a vector scaled by the parameter s
- in vector notation, g = h + s v, where h is the head position, v is the central view vector, and s is the scale parameter defined previously
- note the view vector here is not the view vector given by the head tracker
110 F.2.3 3D Gaze Point and Vector Calculation (cont'd)
- the view vector related to the gaze vector is obtained by subtracting the helmet position from the midpoint of the eye-tracked coordinates at the focal distance to the near view plane: v = m - h = ((xl + xr)/2 - xh, (yl + yr)/2 - yh, f - zh)
- where m denotes the left and right eye coordinate midpoint
111 F.2.3 3D Gaze Point and Vector Calculation (cont'd)
- to transform the vector v to the proper (instantaneous) head orientation, this vector should be normalized and then transformed by the orientation matrix returned by the head tracker
- the transformed vector v gives the gaze direction (ray)
- using the helmet position h and gaze direction v, we can express the gaze vector via a parametric representation of a ray with linear interpolant t: p(t) = h + t v
112F.2.4 Virtual Fixation Coordinates
- The gaze vector can be used in VR to calculate
virtual fixation coordinates - Fixation coordinates are obtained via traditional
ray/polygon intersection calculations, as used in
ray tracing - The fixated object of interest (polygon) is the
one closest to the viewer which intersects the ray
113 F.2.4 Virtual Fixation Coordinates: ray/plane intersection
- The intersection of the ray with the polygons in the scene is computed via a parametric representation of the ray: r(t) = ro + t rd
- where ro defines the ray's origin (a point) and rd defines the ray direction (a vector)
- For gaze, use ro = h, the head position, and rd = v, the gaze direction vector
114 F.2.4 Virtual Fixation Coordinates: ray/plane intersection (cont'd)
- Recall the plane equation Ax + By + Cz + D = 0, where A² + B² + C² = 1, i.e., (A, B, C) defines the unit plane normal N
- Calculate the ray/plane intersection: t = -(N · ro + D)/(N · rd)
- Find the closest ray/plane intersection to the viewer, where t > 0
115 F.2.4 Virtual Fixation Coordinates: ray/plane intersection (cont'd)
- possible divide-by-zero: need to check for this (if the denominator is close to 0, the ray and plane don't intersect)
- if the dot product is greater than 0, the surface is hidden from the viewer (use this to speed up the code)
- Fig. 40: Ray/plane geometry
- N is actually -N, to calculate the angle between the ray and the face normal
116 F.2.4 Virtual Fixation Coordinates: ray/plane intersection (cont'd)
- the parameter t defines the point of intersection along the ray at the plane defined by N
- if t > 0, the point of intersection p is given by p = ro + t rd
- this only gives the intersection of the ray and the (infinite!) plane
- need to test whether p lies within the confines of the polygonal face
- Fig. 41: Ray/plane intersection algorithm (a sketch follows below)
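- A minimal sketch of the intersection test described above (hypothetical types; N = (A, B, C) is the unit plane normal and D the plane offset):
    #include <math.h>
    typedef struct { double x, y, z; } Vec3;
    static double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
    /* Returns 1 and writes t and the hit point p if the ray ro + t*rd (t > 0) meets the plane. */
    int ray_plane(Vec3 ro, Vec3 rd, Vec3 N, double D, double *t, Vec3 *p)
    {
        double denom = dot(N, rd);
        if (fabs(denom) < 1e-9) return 0;   /* ray (nearly) parallel to plane */
        *t = -(dot(N, ro) + D) / denom;
        if (*t <= 0.0) return 0;            /* intersection behind the viewer */
        p->x = ro.x + *t * rd.x;
        p->y = ro.y + *t * rd.y;
        p->z = ro.z + *t * rd.z;
        return 1;
    }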
117 F.2.4 Virtual Fixation Coordinates: point-in-polygon
- for each edge
- calculate the plane perpendicular to the polygon, passing through the edge's two vertices: N' = N × (B - A)
- calculate the new plane's equation
- test point p to see if it lies above or below the new plane
- if p is above all such planes, p is inside the polygon (see the sketch below)
- Fig. 42: Point-in-polygon problem
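- A sketch of the edge-plane test, assuming a convex planar polygon with vertices in consistent counter-clockwise order (Vec3 and dot as in the previous sketch):
    static Vec3 sub(Vec3 a, Vec3 b) { Vec3 c = { a.x-b.x, a.y-b.y, a.z-b.z }; return c; }
    static Vec3 cross(Vec3 a, Vec3 b) {
        Vec3 c = { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x }; return c; }
    /* Returns 1 if p lies inside the polygon (vertices verts[0..n-1], face normal N). */
    int point_in_polygon(Vec3 p, const Vec3 *verts, int n, Vec3 N)
    {
        for (int i = 0; i < n; i++) {
            Vec3 A = verts[i], B = verts[(i + 1) % n];
            Vec3 Nprime = cross(N, sub(B, A));   /* plane through edge AB, perpendicular to polygon */
            if (dot(Nprime, sub(p, A)) < 0.0)    /* p falls below this edge plane: outside */
                return 0;
        }
        return 1;
    }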
118F.3 System Calibration and Usage
- Most video-based eye trackers require calibration
- Usually composed of simple stimuli (dots,
crosses, etc.) displayed sequentially at far
extents of viewing window - Application program displaying stimulus must be
able to draw calibration stimulus at appropriate
locations and at appropriate time
119F.3 System Calibration and Usage (contd)
- Fig.43 Usual graphics draw routine augmented by
mode-sensitive eye tracking code
120 F.3 System Calibration and Usage (cont'd)
- the calibration stimulus is displayed in both RESET and CALIBRATE states; this facilitates initial alignment of the application window (the default calibration dot is at the center)
- the stimulus scene (e.g., image or VE) is only displayed if the display condition is satisfied; this can be used to limit the duration of display
- for VR applications, the draw routine may be preceded by left or right viewport calls (for stereoscopic displays)
- the main program loop is responsible for 1) reading eye (and head) tracker data, 2) mapping coordinates, 3) starting/stopping timers, and 4) recording or acting on gaze coordinates
121F.3 System Calibration and Usage (contd)
while(1) {
    getEyeTrackerData(x,y); mapEyeTrackerData(x,y);
    switch(eye tracker state) {
    case RUN:
        if(!starting) start timer;
        displayStimulus = 1;
        if(timer() > DURATION) displayStimulus = 0;
        else storeData(x,y);
        break;
    case RESET: case CALIBRATE:
        starting = 0;
    }
    redraw();
}
- Fig.44 Main loop (2D imaging application)
122F.3 System Calibration and Usage (contd)
while(1) {
    getHeadTrackerData(eye,dir,upv);
    getEyeTrackerData(xl,yl,xr,yr);
    mapEyeTrackerData(xl,yl,xr,yr);
    s = b/(xl - xr + b);                            // linear gaze interpolant
    h = (eyex, eyey, eyez);                         // head position
    v = ((xl+xr)/2 - xh, (yl+yr)/2 - yh, f - zh);   // central view vector
    transformVectorToHead(v);                       // multiply v by FOB matrix
    g = h + s*v;                                    // calculate gaze point
    switch(eye tracker state) { ... }
    redraw();
}
- Fig.45 Main loop (VR application)
123 F.3 System Calibration and Usage (cont'd)
- once the application program has been developed, the system is ready for use; the general manner of usage requires the following steps
- move the application window to align it with the eye tracker's default (central) calibration dot
- adjust the eye tracker's pupil and corneal reflection thresholds
- calibrate the eye tracker
- reset the eye tracker and run (the program displays the stimulus and stores data)
- save the recorded data
- optionally re-calibrate
124 F.4 Data Collection and Analysis
- Data collection is fairly straightforward: store point-of-regard info along with a timestamp
- Use a linked list, since the number of samples may be large
- Fig. 46: 2D imaging POR data structure (a sketch follows below)
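- A minimal sketch of such a record (a singly linked list node; field names are illustrative):
    /* One point-of-regard sample. */
    typedef struct PORNode {
        double x, y;            /* mapped gaze coordinates (add z and head pose for VR) */
        double t;               /* timestamp, e.g., ms since stimulus onset */
        struct PORNode *next;
    } PORNode;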
125 F.4 Data Collection and Analysis (cont'd)
- For VR applications, the data structure is similar, but will require a z-component
- May also store head position
- Analysis follows the eye movement analysis models presented previously
- Goals: 1) eliminate noise, 2) identify fixations
- Final point: label stored data appropriately; with many subjects, experiments tend to generate LOTS of data
126F.4 Data Collection and Analysis (contd)
- Fig.47 Example of 3D gaze point in VR
- calculated gaze point of user in art gallery
environment - raw data, blinks removed
127 Eye-Based Interaction in Graphical Systems: Theory & Practice
- Part III
- Potential Gaze-Contingent Applications
128 G Applications: Introduction
- A wide variety of eye tracking applications exist, each class increasingly relying on advanced graphical techniques
- Psychophysics, Human Factors
- Advertising, Displays
- Virtual Reality, HCI, Collaborative Systems
- Two broad categories: diagnostic or interactive
129 H Psychology, Psychophysics, and Neuroscience
- Applications range from basic research in vision science to investigation of visual exploration in aesthetics (e.g., perception of art)
- Examples
- psychophysics: spatial acuity, contrast sensitivity, ...
- perception: reading, natural scenery, ...
- neuroscience: cognitive loads, with fMRI, ...
130H Psychology, Psychophysics, and Neuroscience
(contd)
(a) aesthetic group
(b) semantic group
- Fig.48 Perception of art
- small but visible differences in scanpaths
- similar sets of fixated image features
131 I Ergonomics and Human Factors
- Applications range from usability studies to testing the effectiveness of cockpit displays
- Examples
- evaluation of tool icon groupings
- comparison of gaze-based and mouse interaction
- organization of click-down menus
- testing electronic layout of pilots' visual flight rules
- testing simulators for training effectiveness
132I Ergonomics and Human Factors (contd)
- Fig.49 Virtual aircraft cargo-bay environment
- examination of visual search patterns of experts
during aircraft inspection tasks - 3D scanpaths gaze/wall intersection points
133J Marketing / Advertising
- Applications range from assessing ad
effectiveness (copy testing) in various media
(print, images, video, etc.) to disclosure
research (visibility of fine print) - Examples
- eye movements over print media (e.g., yellow
pages) - eye movements over TV ads, magazines, ...
134J Marketing / Advertising (contd)
- Fig.50 Scanpaths over magazine ads
135K Displays
- Applications range from perceptually-based image
and video display design to estimation of
corrective display functions (e.g., gamma, color
spaces, etc.) - Examples
- JPEG/MPEG (no eye tracking per se, but
perceptually based, e.g., JPDs) - gaze-contingent displays (e.g., video-telephony,
) - computer (active) vision
136K Displays (contd)
(a) Haar HVS reconstruction
(b) wavelet acuity mapping
- Fig.51 Gaze-based foveo-peripheral image coding
- 2 Regions Of Interest (ROIs)
- smooth degradation (wavelet interpolation)
137L Graphics and Virtual Reality
- Applications range from eye-slaved foveal Region
Of Interest (ROI) VR simulators to
gaze-contingent geometric modeling - Examples
- flight simulators (peripheral display
degradation) - driving simulators (driver testing)
- gaze-based dynamic Level Of Detail modeling
- virtual terrains
138L Graphics and Virtual Reality (contd)
- Fig.52 Gaze-contingent Martian terrain
- subdivided quad mesh
- per-block LOD
- resolution level based on viewing direction and
distance
139L Graphics and Virtual Reality (contd)
140L Graphics and Virtual Reality (contd)
141L Graphics and Virtual Reality (contd)
142L Graphics and Virtual Reality (contd)
143M Human-Computer Interaction and Collaborative
Systems
- Applications range from eye-based interactive
systems to collaboration - Examples
- intelligent gaze-based informational displays (text scroll window synchronized to gaze)
- self-disclosing display where digital characters responded to the user's gaze (e.g., blushing)
- multiparty VRML environments
144M Human-Computer Interaction and Collaborative
Systems (contd)
- multiparty tele-conferencing and document sharing system
- images rotate to show gaze direction (who is talking to whom)
- document lightspot (a deictic "look at this" reference)
- Fig.53 GAZE Groupware display
Fig.54 GAZE Groupware interface
145 Eye-Based Interaction in Graphical Systems: Theory & Practice
- For further information
- http://www.vr.clemson.edu/eyetracking
- SIGGRAPH course notes
- Eye Tracking Research & Applications Symposium
146 Don't forget to attend
- Eye Tracking Research & Applications Symposium 2000
- November 6th-8th, 2000, Palm Beach Gardens, FL, USA
- Sponsored by ...
- With corporate sponsorship from ...
- http://www.vr.clemson.edu/eyetracking/et-conf/
147 Eye-Based Interaction in Graphical Systems: Theory & Practice
- Demonstration
- GAZE Groupware System