Computer Vision in the Interface

Provided by: hughessu
Transcript and Presenter's Notes

1
Computer Vision in the Interface
  • Communications of the ACM, January 2004, Vol. 47,
    No. 1
  • By Matthew Turk

2
Agenda
  • Definition of Computer Vision
  • Research
  • Computer Vision Functionality
  • Face Detection and Face Recognition
  • Successes
  • Technical Challenges
  • Continued Investment in Research
  • KidsRoom Project
  • Visual information is clearly important as
    people converse and interact with one another.

3
What is computer vision?
  • Computer vision technology can be used to build
    machines that look at people and automatically
    perceive relevant visual information. Through
    the modality of vision, we can determine a number
    of salient facts and features about others,
    including their location, identity, approximate
    age, focus of attention, facial expression,
    posture, gestures and general activity. Visual
    cues affect the content and flow of conversation,
    and they impart contextual information different
    from, but related to, speech. For example, a
    gesture or facial expression may be a key signal,
    or the direction of gaze may disambiguate the
    object referred to in speech as "this" or the
    direction "over there." Vision and speech are
    co-expressive and complementary channels in
    human-human interaction.

4
Research
  • Computer vision research has traditionally been
    motivated by application areas such as biological
    vision modeling, robot navigation and
    manipulation, surveillance, medical imaging, and
    various inspection, detection and recognition
    tasks.
  • In the interface, the primary aim is to use
    vision as an effective input modality in
    human-computer interaction. This includes
    integrating multiple perceptual modalities, such
    as vision, speech and sound processing, and
    haptic I/O, into the user interface.

5
Research
  • Focus is on modeling, recognizing and
    interpreting human behavior to convey things such
    as identity, location, and movement. To fully
    support visual aspects of interaction, several
    tasks need to be addressed.

6
Computer Vision Functionality
  • Face detection and location: How many people are
    in the scene and where are they?
  • Face recognition: Who is it?
  • Head and face tracking: Where is the user's
    head, and what is the specific position and
    orientation of the face?
  • Facial expression analysis: Is the user smiling,
    laughing, frowning, speaking, sleepy?
  • Audiovisual speech recognition: Using
    lip-reading and face-reading along with speech
    processing, what is the user saying?
  • Eye-gaze tracking: Specifically where are the
    user's eyes looking?
  • Hand tracking: Where are the user's hands, in 2D
    or 3D? What are the specific hand
    configurations?
  • Body tracking: Where is the user's body and what
    is its articulation?
  • Gait recognition: Whose style of walking/running
    is this?
  • Recognition of postures, gestures and activity:
    What is this person doing?

7
Face detection and face recognition
  • Face detection and face recognition have received
    the most attention and have seen the most
    progress.
  • The first computer programs to recognize human
    faces appeared in the late 1960s and 1970s, but
    none were fast enough to support recognition
    close to real time.

8
Face Recognition
  • Based on feature locations, face shape, face
    texture
  • Computational models: principal component
    analysis, linear discriminant analysis, Gabor
    wavelet networks, Active Appearance Models
  • Technology companies that develop and market face
    recognition for access, security and surveillance
    applications: Identix, Viisage Technology,
    Cognitec Systems
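
To make the component-analysis idea on this slide concrete, here is a minimal sketch of principal component analysis (PCA) used for recognition, in the spirit of eigenface-style methods. It is an illustration under loud assumptions, not the method of any system or vendor named above: the "images" are hypothetical 4-pixel vectors, only one principal direction is kept, and the names GALLERY, top_component and identify are invented for this example.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def subtract(vector, mean):
    return [v - m for v, m in zip(vector, mean)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_component(centered, iterations=50):
    """Estimate the leading principal direction by power iteration.

    The (unnormalised) covariance matrix C is applied implicitly:
    C w = sum over samples x of x * (x . w).
    """
    dim = len(centered[0])
    # Non-uniform start vector; a start orthogonal to the answer would stall.
    w = [float(i + 1) for i in range(dim)]
    for _ in range(iterations):
        updated = [0.0] * dim
        for x in centered:
            weight = dot(x, w)
            for i in range(dim):
                updated[i] += x[i] * weight
        norm = math.sqrt(dot(updated, updated))
        if norm == 0.0:
            break
        w = [u / norm for u in updated]
    return w

def identify(gallery, probe):
    """Return the label of the gallery image nearest to probe in PCA space."""
    images = [image for _, image in gallery]
    mean = mean_vector(images)
    component = top_component([subtract(img, mean) for img in images])
    probe_coord = dot(subtract(probe, mean), component)
    best_label, _ = min(
        gallery,
        key=lambda item: abs(dot(subtract(item[1], mean), component) - probe_coord),
    )
    return best_label

# Hypothetical toy data: two "people", two 4-pixel images each.
GALLERY = [
    ("alice", [9, 8, 1, 2]),
    ("alice", [8, 9, 2, 1]),
    ("bob", [1, 2, 9, 8]),
    ("bob", [2, 1, 8, 9]),
]

print(identify(GALLERY, [9, 9, 1, 1]))  # alice
```

A practical system would keep many components and work on aligned, normalized face images; the sketch only shows the project-then-compare structure.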

9
Face Detection
  • To locate all faces in a scene at various scales
    and orientations
  • Works well in constrained environments
  • Some promising prototypes, such as a
    feature-based action unit recognition system for
    facial expression analysis
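
The goal of locating faces "at various scales" is often approached by scanning windows of several sizes across the image and asking a classifier about each one. The sketch below assumes that approach; the slides do not say which detector the article has in mind. The classifier all_bright is a hypothetical stand-in (a real one would be trained on face images), the image is a toy grid of 0/1 pixels, and orientation is not handled at all.

```python
def sliding_windows(width, height, min_size=2, scale=2.0, stride_frac=0.5):
    """Yield (x, y, size) for square windows at progressively larger scales."""
    size = min_size
    while size <= min(width, height):
        step = max(1, int(size * stride_frac))
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                yield (x, y, size)
        # Always grow by at least one pixel so the loop terminates.
        size = max(size + 1, int(size * scale))

def crop(image, x, y, size):
    """Cut a size-by-size patch whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in image[y:y + size]]

def detect(image, looks_like_face, **window_options):
    """Return every window that the supplied classifier accepts."""
    height, width = len(image), len(image[0])
    return [
        (x, y, size)
        for x, y, size in sliding_windows(width, height, **window_options)
        if looks_like_face(crop(image, x, y, size))
    ]

def all_bright(patch):
    """Hypothetical stand-in classifier: accepts patches made only of 1s."""
    return all(pixel == 1 for row in patch for pixel in row)

# Toy 8x8 image with a bright 4x4 block whose top-left corner is (2, 2).
IMAGE = [
    [1 if 2 <= x <= 5 and 2 <= y <= 5 else 0 for x in range(8)]
    for y in range(8)
]

DETECTIONS = detect(IMAGE, all_bright)
```

Note that the same object is reported by several overlapping windows; real detectors typically merge such overlaps into one detection per face.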

10
Successes
  • Eye-gaze tracking: active sensing, directing an
    infrared light source toward the user's eye to
    use as a reference direction
  • Pfinder: produced a contour representation of
    the body's silhouette
  • Tracking hand positions in 2D and 3D
  • Moore's Law improvements in hardware
  • Advances in camera technology
  • A rapid increase in digital video installation
  • Availability of software tools such as Intel's
    OpenCV library (small, flexible and
    affordable)
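
As a rough illustration of what a "contour representation of the body's silhouette" can mean, the sketch below separates a person from a static scene by simple background subtraction and then keeps the boundary pixels of the resulting mask. This is a deliberately simplified stand-in and not Pfinder's actual algorithm, which the slide does not describe. The frames are hypothetical grayscale grids, and the threshold of 30 is an arbitrary assumption.

```python
def silhouette_mask(background, frame, threshold=30):
    """Mark pixels that differ from the background by more than threshold."""
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

def bounding_box(mask):
    """Return (min_x, min_y, max_x, max_y) of the foreground, or None."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def contour(mask):
    """Foreground pixels with at least one 4-connected background neighbour."""
    height, width = len(mask), len(mask[0])

    def is_background(x, y):
        # Pixels outside the image count as background.
        return not (0 <= x < width and 0 <= y < height) or mask[y][x] == 0

    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return [
        (x, y)
        for y in range(height)
        for x in range(width)
        if mask[y][x] and any(is_background(x + dx, y + dy) for dx, dy in neighbours)
    ]

# Toy data: a dark 5x5 background and a frame with a bright 3x3 "body".
BACKGROUND = [[10] * 5 for _ in range(5)]
FRAME = [
    [200 if 1 <= x <= 3 and 1 <= y <= 3 else 10 for x in range(5)]
    for y in range(5)
]

MASK = silhouette_mask(BACKGROUND, FRAME)
```

Calling bounding_box(MASK) gives a coarse location for the person, while contour(MASK) gives the outline that later stages could use for posture or gesture analysis.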

11
Technical Challenges
  • Robustness
  • Speed
  • Initialization
  • Usability
  • Contextual integration

12
Continued investment in research
  • Face Recognition Technology Program, 1993 to
    1997
  • Face Recognition Vendor Tests of 2000 and 2002
  • Together these have provided performance measures
    for assessing the capabilities of both research
    and commercial face recognition systems
  • Human Identification at a Distance Program:
    pursues multimodal fusion techniques, including
    gait recognition, to identify people at 25-150
    feet
  • Video Surveillance and Monitoring Program: aims
    to recognize activity of interest for future
    surveillance applications
  • Grants from the National Science Foundation
  • Microsoft, IBM and Intel

13
KidsRoom Project
  • KidsRoom project at the MIT Media Lab -
    fully-automated, interactive narrative playspace
    for children. Using images, lighting, sound, and
    computer vision action recognition technology, a
    child's bedroom was transformed into an unusual
    world for fantasy play. Objects in the room
    became characters in an adventure, and the room
    itself actively participated in the story,
    guiding and reacting to the children's choices
    and actions. Through voice, sound, and image the
    KidsRoom entertained and provoked the mind of the
    child.
  • Provided an interactive, narrative play space for
    children
  • Using computer vision to recognize users'
    locations and their actions helped deliver a
    compelling interactive experience for the
    participants
  • www.kidsroom.com