Introduction to Biometrics - PowerPoint PPT Presentation

1
Introduction to Biometrics
  • Dr. Bhavani Thuraisingham
  • The University of Texas at Dallas
  • Lecture 12
  • Biometric Technologies: Voice Scan
  • October 3, 2005

2
Outline
  • Introduction
  • How does it work
  • Components
  • Voice Scan Process
  • Template generation and matching
  • Market and Applications
  • Strengths and Weaknesses
  • Research Directions
  • Summary
  • Appendix: Banking Application

3
References
  • Course Text Book, Chapter 7
  • http://www.biometricsinfo.org/voicerecognition.htm

4
Introduction
  • Voice Recognition is a technology which allows a
    user to use his/her voice as input.
  • Voice recognition may be used to dictate text
    into the computer or to give commands to the
    computer (such as opening application programs,
    pulling down menus, or saving work).
  • Older voice recognition applications require each
    word to be separated by a distinct pause.
  • This allows the machine to determine where one
    word ends and the next begins.
  • These kinds of voice recognition applications are
    still used to navigate the computer's system and
    to operate applications such as web browsers or
    spreadsheets.

5
Introduction (Continued)
  • Newer voice recognition applications allow a user
    to dictate text fluently into the computer.
  • These new applications can recognize speech at up
    to 160 words per minute.
  • Voice recognition uses a neural net to "learn" to
    recognize voice.
  • As you speak, the voice recognition software
    remembers the way you say each word.
  • This customization allows voice recognition, even
    though everyone speaks with varying accents and
    inflection.
  • In addition to learning how you pronounce words,
    voice recognition also uses grammatical context
    and frequency of use to predict the word you wish
    to input.
  • While the accuracy of voice recognition has
    improved, users still experience problems with
    accuracy, either because of the way they speak or
    because of the nature of their voice.

6
How does it work?
  • Voice recognition technology utilizes the
    distinctive aspects of the voice to verify the
    identity of individuals.
  • Voice recognition is occasionally confused with
    speech recognition, a technology which translates
    what a user is saying (a process unrelated to
    authentication).
  • Voice recognition technology, by contrast,
    verifies the identity of the individual who is
    speaking.
  • The two technologies are often bundled: speech
    recognition is used to translate the spoken word
    into an account number, and voice recognition
    verifies the vocal characteristics against those
    associated with this account.

7
How does it work? (Concluded)
  • Voice recognition can utilize any audio capture
    device, including mobile and land telephones and
    PC microphones.
  • The performance of voice recognition systems can
    vary according to the quality of the audio signal
    as well as variation between enrollment and
    verification devices.
  • During enrollment, an individual is prompted to
    select a passphrase or to repeat a sequence of
    numbers.
  • The passphrases selected should be approximately
    1-1.5 seconds in length: very short passphrases
    lack enough identifying data, and long passphrases
    have too much, both resulting in reduced
    accuracy.
  • The individual is generally prompted to repeat
    the passphrase or number set a handful of times,
    making the enrollment process somewhat longer
    than most other biometrics. 

8
Components of the Voice Scan System
  • The user's spoken phrase is converted from analog
    to digital format and transmitted to a local or
    central PC
  • For desktop verification applications, the engine
    that provides template-based functions may
    reside on the local or central PC
  • For telephone-based applications, the software may
    reside at the institution that users are
    interacting with
  • Voice scan comparisons are tied directly to
    existing authentication systems
  • May be web-enabled

9
Process
  • Data Acquisition
      • Audio capture devices include mobile and land
        telephones and PC microphones
      • The individual selects a passphrase and repeats
        it, or repeats a sequence of numbers
      • The passphrase should be long enough
      • Speech should be neither too loud nor too soft
      • Capture is more difficult with PCs and mobile
        phones than with land telephones
  • Data Processing
      • The data is processed before template creation
      • Processing eliminates gaps and performs filtering
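
The data-processing step above can be sketched roughly as follows. This is only an illustration, not the method of any particular voice-scan product: the energy threshold, the frame length, and the choice of a first-order pre-emphasis filter are all assumptions, and the function names `remove_gaps` and `pre_emphasize` are hypothetical.

```python
def remove_gaps(samples, frame_len=4, threshold=0.01):
    """Drop frames whose mean energy is below a threshold (treated as gaps).

    frame_len and threshold are illustrative values, not tuned settings.
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= threshold:
            kept.extend(frame)
    return kept


def pre_emphasize(samples, alpha=0.95):
    """Assumed filtering step: y[n] = x[n] - alpha * x[n-1] (high-pass)."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


# Toy signal: silence, a short burst of speech-like samples, near-silence.
signal = [0.0, 0.0, 0.0, 0.0, 0.5, -0.4, 0.6, -0.5, 0.0, 0.001, 0.0, 0.0]
voiced = remove_gaps(signal)      # only the middle frame survives
filtered = pre_emphasize(voiced)  # same length as voiced
```

Real systems work on much longer frames (tens of milliseconds of audio) and typically use more robust speech detection than a fixed energy threshold.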

10
Distinctive Features
  • Measures vocal qualities not detectable by humans
  • Pitch and frequency are key features measured
  • Voice scan algorithms also measure:
      • gain or intensity
      • short-time spectrum of speech
      • formant frequencies
      • linear prediction coefficients
      • cepstral coefficients
      • spectrograms
      • nasal coarticulation
  • Replicable only by human voice and therefore more
    secure

11
Template Creation/Generation
  • Based on a statistical pattern-matching technique
    called Hidden Markov Models (HMMs)
  • HMMs are generalized profiles that are formed
    through the comparison of multiple samples to
    find characteristically repeating patterns
  • During enrollment, template generation relies on
    the capture of multiple voice samples, which are
    analyzed to determine the qualities that can be
    relied upon for later recognition

12
Template Matching
  • Production voice scan technologies are not
    capable of one-to-many identification
  • They operate in one-to-one authentication mode
  • When a user attempts verification, the system
    compares the live submission with the stored
    profile and returns a statistical rating
  • A user's speech may differ between enrollment
    and verification, so matching is not very reliable
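
A minimal sketch of how a one-to-one decision might be made from the statistical rating described above, assuming the rating is a score where higher means a closer match. The threshold and every score below are invented toy values, used only to show how a threshold trades false rejections against false acceptances; `verify` and `error_rates` are hypothetical names.

```python
def verify(score, threshold):
    """One-to-one decision: accept the claimed identity if the score is high enough."""
    return score >= threshold


def error_rates(genuine_scores, imposter_scores, threshold):
    """Return (false reject rate, false accept rate) at the given threshold."""
    false_rejects = sum(1 for s in genuine_scores if not verify(s, threshold))
    false_accepts = sum(1 for s in imposter_scores if verify(s, threshold))
    return (false_rejects / len(genuine_scores),
            false_accepts / len(imposter_scores))


# Invented example scores (not measurements from any real system).
genuine = [0.91, 0.84, 0.77, 0.62]   # same speaker, different sessions
imposter = [0.35, 0.48, 0.66, 0.20]  # other speakers claiming the identity
frr, far = error_rates(genuine, imposter, threshold=0.65)
```

Raising the threshold lowers the false accept rate but rejects more genuine users whose voice has changed since enrollment.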

13
Applications
  • Voice recognition is a strong solution for
    implementations in which vocal interaction is
    already present.
  • It is not a strong solution when speech is
    introduced as a new process.
  • Telephony is the primary growth area for voice
    recognition, and will likely be by far the most
    common area of implementation for the technology.
  • Telephony-based applications for voice
    recognition include account access for financial
    services and customer authentication for service
    calls.
  • These solutions route callers through enrollment
    and verification subroutines, using
    vendor-specific hardware and software integrated
    with an institution's existing infrastructure.
  • Voice recognition has also been implemented in
    physical access solutions for border crossing.

14
Deployment
  • NYC Department of Corrections (NY DOC)
      • Used to check the location of juvenile offenders
      • The offender is called and has to call back.
        The voice is verified and the caller ID is
        also checked
  • Pilot projects in banking
      • Ireland, Belgium, South Africa
  • Technology from T-NETIX and Buytel

15
Market
  • Though revenues from the technology are
    relatively small today, voice recognition will
    likely draw substantially greater revenues
    through 2007.
  • Most likely to be deployed in telephony-based
    environments (such as account access for
    financial services and customer authentication
    for service calls).
  • Voice recognition revenues are projected to grow
    from $12.2m in 2002 to $142.1m in 2007.
  • Voice recognition revenues are expected to
    comprise approximately 4% of the entire biometric
    market.
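
As a quick check on what the projection above implies, the following sketch computes the overall growth multiple and the compound annual growth rate from the two figures on the slide (12.2m in 2002 and 142.1m in 2007). The formula is the standard compound-growth one; nothing beyond the two slide figures is assumed.

```python
start_revenue = 12.2   # millions, 2002 (from the slide)
end_revenue = 142.1    # millions, 2007 (from the slide)
years = 2007 - 2002

growth_multiple = end_revenue / start_revenue
# Compound annual growth rate: the constant yearly rate that turns
# start_revenue into end_revenue over the given number of years.
cagr = growth_multiple ** (1 / years) - 1

print(f"{growth_multiple:.1f}x overall, about {cagr:.0%} per year")
```

The projection therefore assumes revenues growing by more than half every year for five years.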

16
Strengths of Voice Scan
  • One of the challenges facing large-scale
    implementations of biometrics is the need to
    deploy new hardware to employees, customers and
    users.
  • One strength of telephony-based voice recognition
    implementations is that they are able to
    circumvent this problem, especially when they are
    implemented in call center and account access
    applications.
  • The ability to use existing telephones means that
    voice recognition vendors have hundreds of
    millions of authentication devices available for
    transactional usage today. 
  • Resistant to imposters
      • An imposter is unlikely to guess the correct
        passphrases and account numbers

17
Weaknesses of Voice Scan
  • Background noise may be captured along with the
    voice
  • Low accuracy, as the enrollment voice may differ
    from the verification voice for the same user
  • Large template size
  • Does not work well with PC-based capture

18
Research Directions
  • Improve accuracy
      • Model variations of voice for the same speaker
  • Improve performance
      • Better PC-based methods
  • Better models
      • HMM, neural networks

19
Technology Comparison
  Method             | Coded Pattern                                    | Misidentification rate | Security
  Iris Recognition   | Iris pattern                                     | 1/1,200,000            |
  Fingerprinting     | Fingerprints                                     | 1/1,000                |
  Hand Shape         | Size, length and thickness of hands              | 1/700                  |
  Facial Recognition | Outline, shape and distribution of eyes and nose | 1/100                  |
  Signature          | Shape of letters, writing order, pen pressure    | 1/100                  |
  Voiceprinting      | Voice characteristics                            | 1/30                   |

20
Summary
  • Can be widely used
      • Telephones available
  • Low accuracy
      • People can change voices
  • Many applications
      • E.g., banking, telephony

21
Introduction to Biometrics
  • Dr. Bhavani Thuraisingham
  • The University of Texas at Dallas
  • Telephone Banking Application of Voice Scan
  • October 3, 2005

22
The Problem
  • Telephone banking is increasingly popular with
    customers, and will be increasingly attractive to
    banks and other financial institutions as they
    start to implement highly cost effective
    automated speech recognition technology to handle
    routine transactions (the subject of another
    "financial futures" web page).
  • But the procedures for verifying customers over
    the telephone are unsatisfactory, both in terms
    of customer convenience and also, increasingly,
    from a security point of view.
  • The usual approach to verifying
    customers - proving that they are who they claim
    to be - is to use some sort of PIN or password.
    To avoid the customer having to say the password
    out loud, they are usually prompted for, say, the
    second and fourth letters in the password.

23
The Problem (Concluded)
  • There are several problems with this approach:
  • Firstly, passwords and PINs are difficult to
    remember and unwieldy for customers to use in
    this manner.
  • Secondly, it takes time - identification and
    verification of the caller is often the
    lengthiest component of a transaction and this
    translates directly to the bottom line.
  • Many customers write down their passwords or
    reveal them to the operator (in extreme cases
    they may self-select the same PIN that they use
    for ATM withdrawals).
  • Many call centers prompt the caller for
    additional 'secret' items such as their mother's
    maiden name, but this only exacerbates the other
    two problems.

24
The Solution
  • Voice verification technology now
    exists which enables individuals to be reliably,
    rapidly and cost-effectively verified on the
    basis of the physical characteristics of their
    voice.
  • Vendors now supply commercial voice verification
    technology.
  • A good example is Nuance Communications, based in
    California, using essentially the same technology
    which underlies their speaker independent speech
    recognition software.
  • But in this case recognition is speaker dependent
    - the customer is only allowed to use the system
    if their individual voiceprint matches their
    identity (normally established through an account
    number).

25
The Solution (Concluded)
  • A new customer automatically enrolls in the
    system over the telephone by repeating about 10
    four digit numbers or reading a short piece of
    text.
  • The software extracts from this a number of
    physical characteristics which are unique to that
    voice.
  • In all subsequent transactions, the caller, once
    identified, is asked to repeat a couple of
    randomly generated PINs or, for example, names of
    cities (this is to prevent imposters
    tape-recording a customer saying their password
    or PIN).
  • If the voiceprint matches the one stored against
    the account number, the transaction proceeds; if
    not, the customer is referred to a supervisor.
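
The verification flow above can be sketched as a simple challenge-response check. This is an assumed outline, not Nuance's or any vendor's actual API: `make_challenge` and `handle_call` are hypothetical names, the recognized digits and the voice score are taken as given inputs, and the 0.65 threshold is an arbitrary illustrative value.

```python
import secrets


def make_challenge(count=2, digits=4):
    """Generate fresh random digit strings for the caller to repeat.

    Because the prompt changes on every call, a tape recording of an
    earlier session does not contain the right words.
    """
    return ["".join(secrets.choice("0123456789") for _ in range(digits))
            for _ in range(count)]


def handle_call(voice_score, spoken_digits, challenge, threshold=0.65):
    """Proceed only if the prompted digits were spoken AND the voice matches."""
    said_challenge = spoken_digits == challenge
    voice_matches = voice_score >= threshold
    if said_challenge and voice_matches:
        return "proceed"
    return "refer to supervisor"


# Fixed challenge here so the example is repeatable; in use it would
# come from make_challenge().
challenge = ["4821", "0937"]
genuine_result = handle_call(0.90, ["4821", "0937"], challenge)
replay_result = handle_call(0.90, ["1111", "2222"], challenge)  # old recording
```

The replayed recording may match the voiceprint perfectly, but it fails because it does not contain the freshly prompted digits.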

26
Details
  • The speaker-specific characteristics of speech
    are due to differences in physiological and
    behavioral aspects of the speech production
    system in humans.
  • The main physiological aspect of the human speech
    production system is the vocal tract shape.
  • The vocal tract is generally considered as the
    speech production organ above the vocal folds,
    which consists of the following:
  • (i) laryngeal pharynx
  • (ii) oral pharynx
  • (iii) oral cavity
  • (iv) nasal pharynx
  • (v) nasal cavity

27
Details (Continued)
  • The vocal tract modifies the spectral content of
    an acoustic wave as it passes through it, thereby
    producing speech.
  • Hence, it is common in speaker verification
    systems to make use of features derived only from
    the vocal tract.
  • In order to characterize the features of the
    vocal tract, the human speech production
    mechanism is represented as a discrete-time
    system.
  • The acoustic wave is produced when the airflow
    from the lungs is carried by the trachea through
    the vocal folds.
  • This source of excitation can be characterized as
    phonation, whispering, frication, compression,
    vibration, or a combination of these.

28
Details (Continued)
  • Phonated excitation occurs when the airflow is
    modulated by the vocal folds.
  • Whispered excitation is produced by airflow
    rushing through a small triangular opening
    between the arytenoid cartilage at the rear of
    the nearly closed vocal folds.
  • Frication excitation is produced by constrictions
    in the vocal tract.
  • Compression excitation results from releasing a
    completely closed and pressurized vocal tract.
  • Vibration excitation is caused by air being
    forced through a closure other than the vocal
    folds, especially at the tongue.

29
Details (Continued)
  • Speech produced by phonated excitation is called
    voiced.
  • Speech produced by phonated excitation plus
    frication is called mixed voiced.
  • Speech produced by other types of excitation is
    called unvoiced.
  • It is possible to represent the vocal-tract in a
    parametric form as the transfer function H(z).
  • In order to estimate the parameters of H(z) from
    the observed speech waveform, it is necessary to
    assume some form for H(z).
  • Ideally, the transfer function should contain
    poles as well as zeros.
  • However, if only the voiced regions of speech are
    used then an all-pole model for H(z) is
    sufficient.
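
One common way to estimate the parameters of an all-pole model is linear prediction with the autocorrelation method and the Levinson-Durbin recursion. The slides do not say which estimation method is used, so the sketch below is an assumption. It takes the all-pole form H(z) = G / (1 - a_1 z^-1 - ... - a_p z^-p), in which each sample is predicted as a weighted sum of the previous p samples; the function names are hypothetical.

```python
def autocorrelation(samples, max_lag):
    """r[k] = sum over n of s[n] * s[n + k], for k = 0..max_lag."""
    n = len(samples)
    return [sum(samples[t] * samples[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for predictor coefficients a_1..a_p.

    Returns (coefficients, residual_error), where the prediction is
    s[n] ~ a_1*s[n-1] + ... + a_p*s[n-p].
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[1..order] hold the coefficients
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error               # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= (1.0 - k * k)        # prediction error shrinks at each order
    return a[1:], error


# Autocorrelation of an ideal first-order process with coefficient 0.5:
# a second-order fit should recover a_1 = 0.5 and a_2 = 0.
coeffs, residual = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

In practice the autocorrelation would come from `autocorrelation(frame, order)` applied to a short windowed frame of voiced speech, with an order of roughly 10 to 16.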

30
Details (Concluded)

31
Choice of Features
  • The LPC (linear predictive coding) features were
    very popular in the early speech-recognition and
    speaker-verification systems.
  • However, comparison of two LPC feature vectors
    requires the use of computationally expensive
    similarity measures.
  • Hence, LPC features are unsuitable for use in
    real-time systems.
  • The cepstrum, defined as the inverse Fourier
    transform of the logarithm of the magnitude
    spectrum, has been suggested for use in
    speech-recognition applications.
  • The use of the cepstrum allows for the similarity
    between two cepstral feature vectors to be
    computed as a simple Euclidean distance.
  • It has been demonstrated that the cepstrum
    derived from the LPC features results in the best
    performance.
  • Consequently, the LPC-derived cepstrum is
    generally used in speaker verification systems.
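
A sketch of how cepstral coefficients can be derived from LPC coefficients and then compared, assuming the all-pole model H(z) = G / (1 - a_1 z^-1 - ... - a_p z^-p) and using the standard LPC-to-cepstrum recursion. The gain term c_0 is omitted, and the function names are hypothetical.

```python
import math


def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients a_1..a_p into cepstral coefficients c_1..c_n.

    Recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k},
    with a_n taken as 0 for n > p. The list `a` is 0-indexed, so a[k-1] is a_k.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused (gain term omitted)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def euclidean_distance(u, v):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# For a single pole at 0.5 the cepstrum is known in closed form:
# c_n = 0.5**n / n, which gives a check on the recursion.
ceps = lpc_to_cepstrum([0.5], n_ceps=3)
distance = euclidean_distance(ceps, lpc_to_cepstrum([0.4], n_ceps=3))
```

Two frames with similar vocal-tract shapes give nearby cepstral vectors, so a small distance suggests a closer match.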

32
Speaker Modeling
  • Using cepstral analysis, an utterance may be
    represented as a sequence of feature vectors.
  • Utterances spoken by the same person but at
    different times result in similar yet different
    sequences of feature vectors.
  • The purpose of voice modeling is to build a model
    that captures these variations in the extracted
    set of features.
  • There are two types of models that have been used
    extensively in speaker verification and speech
    recognition systems: stochastic models and
    template models.

33
Speaker Modeling (Continued)
  • The stochastic model treats the speech production
    process as a parametric random process and
    assumes that the parameters of the underlying
    stochastic process can be estimated in a precise,
    well defined manner.
  • The template model attempts to model the speech
    production process in a non-parametric manner by
    retaining a number of sequences of feature
    vectors derived from multiple utterances of the
    same word by the same person.
  • Template models dominated early work in speaker
    verification and speech recognition because the
    template model is intuitively more reasonable.
  • However, recent work in stochastic models has
    demonstrated that these models are more flexible
    and hence allow for better modeling of the speech
    production process.

34
Speaker Modeling (Concluded)
  • A very popular stochastic model for modeling the
    speech production process is the Hidden Markov
    Model (HMM).
  • HMMs are extensions of conventional Markov
    models in which the observations are a
    probabilistic function of the state.
  • The model is a doubly embedded stochastic process
    where the underlying stochastic process is not
    directly observable (it is hidden).
  • The HMM can only be viewed through another set of
    stochastic processes that produce the sequence of
    observations.
  • Thus, the HMM is a finite-state machine, where a
    probability density function p(x | s_i) is
    associated with each state s_i. The states are
    connected by a transition network, where the
    state transition probabilities are
    a_ij = p(s_i | s_j).

35
Pattern Matching
  • The pattern matching process involves the
    comparison of a given set of input feature
    vectors against the speaker model for the claimed
    identity and computing a matching score. For the
    Hidden Markov models the matching score is the
    probability that a given set of feature vectors
    was generated by the model.
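
The matching score described above, the probability that a sequence of observations was generated by a given HMM, is usually computed with the forward algorithm. The sketch below makes simplifying assumptions: observations are discrete symbols rather than continuous feature vectors, probabilities are not rescaled (long sequences would underflow), and the two-state model uses invented toy numbers. Here `transition[j][i]` is the probability of moving from state j to state i.

```python
def forward_probability(observations, initial, transition, emission):
    """Probability that the HMM generates the given observation sequence.

    initial[s]        : probability of starting in state s
    transition[j][i]  : probability of moving from state j to state i
    emission[s][o]    : probability of observing symbol o while in state s
    """
    n_states = len(initial)
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[prev] * transition[prev][s] for prev in range(n_states))
            * emission[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)


# Invented toy speaker model: 2 hidden states, 2 observation symbols.
initial = [0.6, 0.4]
transition = [[0.7, 0.3],
              [0.4, 0.6]]
emission = [[0.9, 0.1],
            [0.2, 0.8]]

score = forward_probability([0, 1], initial, transition, emission)
```

In a verification setting this score would be computed against the model for the claimed identity and then compared with a decision threshold.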