Title: A Hardware/Software Co-design Approach for Face Recognition
1A Hardware/Software Co-design Approach for Face
Recognition
- By Shawki Areibi
- University of Guelph
- School of Engineering
- Engineering Systems Computing
- Guelph, Ontario, Canada
2Outline
- Introduction
- Face Recognition
- Applications
- Background
- Face Recognition Methodologies
- Artificial Neural Networks
- H/S Co-Design Approach
- MicroBlaze Embedded System (software)
- MicroBlaze with dedicated hardware module (H/S)
- Results
- Conclusions and Future Work
3Introduction
- Face Recognition: Given still or video images of a scene, identify or verify one or more persons in the scene using a stored database of faces
[Figure: a target image is compared against a database of four images (Image 1 to Image 4); Image 3 matches the target image]
4Typical Applications of Face Recognition
- Entertainment
  - Video games, virtual reality, human-robot interaction
- Smart Cards
  - Driver's licenses, immigration, national ID, passports, voter registration, welfare fraud
- Information Security
  - Internet access, medical records, secure trading terminals
- Law Enforcement and Surveillance
  - Advanced video surveillance, shoplifting, suspect tracking and investigation
5Face Recognition Methodologies
- Statistical Methods
  - Template matching compares the image with a single template using a distance metric.
  - Projection-based methods (Principal Component Analysis)
- ANN based approach
  - Geometrical local feature based ANN (fractal codes, i.e. eyes, nose, eyebrows)
  - Holistic-based ANN (applying BP on the intensity values of the face image)
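Template matching as described above can be sketched in C. This is a minimal illustration, assuming one stored template per person, images held as flat arrays of 8-bit pixels, and squared Euclidean distance as the metric; the function names `squared_distance` and `best_match` are hypothetical and do not come from the presentation.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of squared pixel differences between two images of n pixels. */
unsigned long squared_distance(const unsigned char *a,
                               const unsigned char *b, size_t n)
{
    unsigned long sum = 0;
    for (size_t k = 0; k < n; k++) {
        long diff = (long)a[k] - (long)b[k];
        sum += (unsigned long)(diff * diff);
    }
    return sum;
}

/* Return the index of the template closest to `image`.
 * `templates` holds `count` images of `n` pixels each, stored back to back
 * (an assumed layout, chosen only for this sketch). */
size_t best_match(const unsigned char *image,
                  const unsigned char *templates,
                  size_t count, size_t n)
{
    assert(count > 0 && n > 0);
    size_t best = 0;
    unsigned long best_dist = squared_distance(image, templates, n);
    for (size_t t = 1; t < count; t++) {
        unsigned long dist = squared_distance(image, templates + t * n, n);
        if (dist < best_dist) {
            best_dist = dist;
            best = t;
        }
    }
    return best;
}
```

A caller would pass the probe image and the template database and treat the returned index as the identity of the closest stored face.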
6Artificial Neural Networks
[Figure: an artificial neuron i, with inputs, activation f(net), and an output]
An Artificial Neural Network (ANN): a set of processing elements (PEs) with adjustable strength (weights)
[Figure: a three-layer perceptron structure, with inputs and an output]
7Functionality of ANNs
- Training
  - Present data to the ANN
  - ANN computes an output
  - Compare computed output with desired output
  - Modify ANN weights to reduce error
[Figure: adaptive system block diagram with Input, Output, Desired, Cost, Error, Training Algorithm, and Change Parameters]
- Testing
  - Present new data to the ANN
  - ANN computes an output based on its training
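The training and testing steps above can be illustrated with a single processing element. The sketch below is an assumption-laden toy: it uses a hard-threshold unit and the classic perceptron rule rather than the sigmoid and backpropagation used later in the deck, integer weights for exact arithmetic, and the logical AND function as training data. The names `Perceptron`, `perceptron_output`, `perceptron_train_step`, and `train_and` are hypothetical.

```c
#include <assert.h>

#define N_IN 2

/* One processing element: adjustable weights plus a threshold term. */
typedef struct {
    int w[N_IN];
    int theta;
} Perceptron;

/* Compute the output: 1 if the weighted sum is positive, else 0. */
int perceptron_output(const Perceptron *p, const int *x)
{
    int net = p->theta;
    for (int k = 0; k < N_IN; k++)
        net += p->w[k] * x[k];
    return net > 0 ? 1 : 0;
}

/* One training step: present x, compare with the desired output,
 * and modify the weights to reduce the error. Returns the error. */
int perceptron_train_step(Perceptron *p, const int *x, int desired)
{
    assert(desired == 0 || desired == 1);
    int error = desired - perceptron_output(p, x);
    for (int k = 0; k < N_IN; k++)
        p->w[k] += error * x[k];
    p->theta += error;
    return error;
}

/* Train on the four patterns of logical AND for a fixed number of epochs. */
void train_and(Perceptron *p, int epochs)
{
    static const int x[4][N_IN] = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    static const int t[4] = {0, 0, 0, 1};
    for (int e = 0; e < epochs; e++)
        for (int pat = 0; pat < 4; pat++)
            perceptron_train_step(p, x[pat], t[pat]);
}
```

Testing then amounts to calling `perceptron_output` on new inputs after `train_and` has run; AND is linearly separable, so the perceptron rule settles on a correct set of weights after a few epochs.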
8ANN Typical Applications
- Function Approximation
  - Process Modeling, Process Control, Data Modeling, Machine Diagnostics
- Time Series Prediction
  - Financial Forecasting, Bankruptcy Prediction, Sales Forecasting, Dynamic System Modeling
- Data Mining
  - Clustering, Data Visualization, Data Extraction
- Classification
  - Medical Diagnosis, Target Recognition, Character Recognition, Face Recognition, Speech Recognition, Fraud Detection, etc.
9Solving face recognition by ANN
[Figure: three-layer network fed from a training image list, with input nodes h, hidden nodes i, and output nodes j]
- a_h: input node value
- b_i, e_i: computed hidden node value and computed hidden node error
- c_j, d_j: computed output node value and computed output node error
- Target output for output node j
- v_hi: adjustable weights between input layer and hidden layer
- w_ij: adjustable weights between hidden layer and output layer
10 Methodology
TO MAP ANNs ONTO A PLATFORM WHERE PERFORMANCE AND
FLEXIBILITY CAN BE BALANCED
Slow training
Lack of clear methodology to determine the
network topology
The function is hard to change once designed
TARGETING FPGAs
Time-consuming to get a hardware implementation
Results
11Xilinx Multimedia Board
- Xilinx Virtex-II XC2V2000 FPGA
- 512×36-bit 130 MHz ZBT (Zero Bus Turnaround) RAM
- 16M Flash memory and RS232 port
- Push buttons
- SVGA output
- Onboard network connection, 10/100 Ethernet
- Audio CODEC compliant with AC97 and stereo amplifier with 18-bit sigma-delta A/Ds and D/As
- Supports a single channel of real-time PAL or NTSC video input
- Headphone and microphone
12MicroBlaze
- Full Harvard, RISC pipelined architecture with 32-bit data and instruction word
- Supports Virtex, Virtex-E, Virtex-II, Virtex-II Pro, Spartan-II, and Spartan-IIE devices
- 102 Dhrystone MIPS (D-MIPS) on Virtex-II Pro device at 150 MHz
- Minimum logic requirements: 900 logic cells
- 32-bit pipelined RISC architecture
- 32 × 32-bit general purpose registers
- Supports Local Memory Bus (LMB) for fast access of on-chip BRAMs
- Supports IBM CoreConnect On-chip Peripheral Bus (OPB) for accessing peripherals
- Processor peripherals compatible with PowerPC on Virtex-II Pro
- Complete hardware and software development tool and debug solution
13A Pure MicroBlaze System
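[Figure: block diagram of the pure MicroBlaze system. Components: MicroBlaze (mblaze); instruction local memory bus (i_lmb) and data local memory bus (d_lmb) with lmb_bram_cntlr and bram; OPB bus with myjtaguart, mytimer (OPB Timer/Counter), myuart (HyperTerminal), and mygpio (Switches)]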
14A Pure MicroBlaze System
Face images are available at http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-8/faceimages/faces/
(Professor Tom M. Mitchell's Machine Learning Course, Carnegie Mellon University)
- 20 individuals, each with 32 images varying in expression, the direction they face, and whether or not their eyes are open
- In total, 624 grayscale images in PGM format
- Each image has a resolution of 120×128 pixels
- Each image pixel is described by a grayscale intensity value between 0 (black) and 255 (white)
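One way such pixel data could feed the input layer is sketched below. It assumes one input node per pixel and a linear scaling of the 0 to 255 intensity into the range 0 to 1; the presentation does not state how pixels were mapped to input nodes, so the constants `IMG_ROWS`, `IMG_COLS`, `NUM_INPUT_NODES` and the function `load_input_nodes` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define IMG_ROWS 120
#define IMG_COLS 128
#define NUM_INPUT_NODES (IMG_ROWS * IMG_COLS)  /* assumed: one node per pixel */

/* Map 8-bit grayscale pixels (0 = black, 255 = white) to input node
 * values in the range [0, 1]. */
void load_input_nodes(const unsigned char *pixels, double *a, size_t n)
{
    assert(pixels != NULL && a != NULL);
    for (size_t h = 0; h < n; h++)
        a[h] = pixels[h] / 255.0;
}
```

Scaling keeps the weighted sums entering the sigmoid in a moderate range, which is a common motivation for normalizing inputs.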
15A Pure MicroBlaze System
Start_timer();
Randomization();
while (i < epoch_num) {
    for (j = 0; j < number_of_training_images; j++) {
        /* Initialize input nodes to the values of image j */
        /* Set target value for image j */
        forward();   /* implemented in C */
        backward();  /* implemented in C */
        update();    /* implemented in C */
    }
    i++;
}
Stop_timer();

/* Testing process */
/* Initialize input nodes to the values of a certain testing image */
forward();
/* Print and interpret the result */

void Randomization()
{
    /* Give each weight a random value in (-0.5, 0.5) */
    for (h = 0; h < NODE_NUM_I; h++)
        for (i = 0; i < NODE_NUM_H; i++)
            if (rand() % 2 == 1)
                v[h][i] = (rand() % 5000) / 10000.0;
            else
                v[h][i] = -(rand() % 5000) / 10000.0;
    ...
}
16A Pure MicroBlaze System
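void backward()
{
    /* Output node error: d_j = c_j (1 - c_j) (target_j - c_j) */
    for (j = 0; j < NODE_NUM_O; j++)
        d[j] = (1 - c[j]) * c[j] * (target[j] - c[j]);
    /* Hidden node error: e_i = b_i (1 - b_i) * sum over j of w_ij d_j */
    for (i = 0; i < NODE_NUM_H; i++) {
        temp = 0.0;
        for (j = 0; j < NODE_NUM_O; j++)
            temp = temp + (1 - b[i]) * b[i] * w[i][j] * d[j];
        e[i] = temp;
    }
}

void forward()
{
    /* Hidden node values */
    for (i = 0; i < NODE_NUM_H; i++) {
        temp = 0.0;
        for (h = 0; h < NODE_NUM_I; h++)
            temp = temp + a[h] * v[h][i];
        b[i] = 1.0 / (1.0 + exp(-(threh[i] + temp)));
    }
    /* Output node values */
    for (j = 0; j < NODE_NUM_O; j++) {
        temp = 0.0;
        for (i = 0; i < NODE_NUM_H; i++)
            temp = temp + b[i] * w[i][j];
        c[j] = 1.0 / (1.0 + exp(-(threo[j] + temp)));
    }
}

void update()
{
    /* eta is the learning rate */
    for (j = 0; j < NODE_NUM_O; j++) {
        for (i = 0; i < NODE_NUM_H; i++)
            w[i][j] = w[i][j] + eta * b[i] * d[j];
        threo[j] = threo[j] + eta * d[j];
    }
    for (i = 0; i < NODE_NUM_H; i++) {
        for (h = 0; h < NODE_NUM_I; h++)
            v[h][i] = v[h][i] + eta * a[h] * e[i];
        threh[i] = threh[i] + eta * e[i];
    }
}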
17A Pure MicroBlaze System
18A Pure MicroBlaze System
Result on pure MicroBlaze system
Prof. Tom Mitchell's C code
19A H/S System
[Figure: block diagram of the H/S system. Components: MicroBlaze (mblaze); instruction local memory bus (i_lmb) and data local memory bus (d_lmb) with lmb_bram_cntlr and bram; OPB bus with myjtaguart (OPB JTAG_UART), mytimer (OPB Timer/Counter), OPB Block RAM Controller and OPB Block RAM, myuart (OPB UART Lite, HyperTerminal), and mygpio (OPB GPIO, Switches); myhum, the Hardware Update Module (HUM), attached through FSL0 and FSL1]
20A H/S System
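[Figure: FSL interface of the HUM. FSL0 slave signals: FSL0_S_Clk, FSL0_S_Data, FSL0_S_Control, FSL0_S_Read, FSL0_S_Exists. FSL1 master signals: FSL1_M_Clk, FSL1_M_Data, FSL1_M_Control, FSL1_M_Write, FSL1_M_Full. Also shown: SYS_CLK and Counter1]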
21A H/S System
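void update()
{
    for (i = 0; i < NODE_NUM_H; i++) {
        for (j = 0; j < NODE_NUM_O; j++) {
            /* Send the old weight, eta, b_i, and d_j to the HUM over FSL0 */
            microblaze_nbwrite_datafsl(w[i][j], 0);
            microblaze_nbwrite_datafsl(eta, 0);
            microblaze_nbwrite_datafsl(b[i], 0);
            microblaze_nbwrite_datafsl(d[j], 0);
        }
        /* Read the updated weights back over FSL1 */
        for (j = 0; j < NODE_NUM_O; j++)
            microblaze_nbread_datafsl(w[i][j], 1);
    }
}
- From MicroBlaze to FSL0 and from FSL1 to MicroBlaze
- To update weight w_ij: the inputs are old w_ij, η, b_i, and d_j; new w_ij = old w_ij + η·b_i·d_j
- To update threshold threo_j: the inputs are old threo_j, η, 1, and d_j; new threo_j = old threo_j + η·d_j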
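The arithmetic performed by the hardware for each update can be modeled in software. The sketch below assumes the module computes old + η·x·d for four incoming values, with x = b_i for a weight and x = 1 for a threshold, as the slide indicates; it uses `double` for readability, whereas the number format of the real module is not stated on this slide. The function names are hypothetical.

```c
#include <assert.h>

/* Software model of one update: new = old + eta * x * d. */
double hum_update(double old_value, double eta, double x, double d)
{
    assert(eta >= 0.0);  /* the learning rate is assumed non-negative */
    return old_value + eta * x * d;
}

/* Weight update: the third operand is the hidden node value b_i. */
double update_weight(double w_ij, double eta, double b_i, double d_j)
{
    return hum_update(w_ij, eta, b_i, d_j);
}

/* Threshold update: the third operand is the constant 1. */
double update_threshold(double threo_j, double eta, double d_j)
{
    return hum_update(threo_j, eta, 1.0, d_j);
}
```

Feeding a constant 1 in place of b_i lets the same datapath serve both weights and thresholds, which is presumably why the slide lists 1 among the threshold inputs.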
22Hardware Update Module (HUM)
- Input signals
  - Ready_cal (from counter1)
  - Ready_out (from UPDATE_UNITs)
  - Done_out (from counter2)
- Output signals
  - Start_cal (to start UPDATE_UNITs)
  - Start_out (to start counter2)
[Figure: controller state machine with states Waiting (00), Calculating (10), and Sending (01) and transition conditions 1XX, X1X, and XX1; each UPDATE UNIT takes w_ij, η, b_i, d_j, and READY and produces RESULT and DONE]
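The controller can be modeled as a three-state machine. The sketch below rests on one reading of the diagram, which is an assumption: the condition bits are ordered (Ready_cal, Ready_out, Done_out), so 1XX moves Waiting to Calculating, X1X moves Calculating to Sending, and XX1 moves Sending back to Waiting; the state codes are read as (Start_cal, Start_out). The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { HUM_WAITING, HUM_CALCULATING, HUM_SENDING } HumState;

typedef struct {
    bool start_cal;  /* starts the UPDATE_UNITs */
    bool start_out;  /* starts counter2 */
} HumOutputs;

/* Next-state function under the assumed transition conditions. */
HumState hum_next_state(HumState s, bool ready_cal, bool ready_out,
                        bool done_out)
{
    switch (s) {
    case HUM_WAITING:     return ready_cal ? HUM_CALCULATING : HUM_WAITING;
    case HUM_CALCULATING: return ready_out ? HUM_SENDING : HUM_CALCULATING;
    case HUM_SENDING:     return done_out ? HUM_WAITING : HUM_SENDING;
    }
    assert(!"invalid state");
    return HUM_WAITING;
}

/* Outputs depend only on the state: Waiting = 00, Calculating = 10,
 * Sending = 01. */
HumOutputs hum_outputs(HumState s)
{
    HumOutputs out = { s == HUM_CALCULATING, s == HUM_SENDING };
    return out;
}
```

Stepping the model with one input asserted at a time walks it through Waiting, Calculating, Sending, and back to Waiting.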
23A H/S System
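\[
\text{Speedup}_{\text{overall}}
= \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}}
= \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
= 1.69
\]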
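The speedup relation on this slide is Amdahl's law: overall speedup = 1 / ((1 - fraction enhanced) + fraction enhanced / speedup enhanced). A small helper makes it easy to explore. The function name `overall_speedup` is hypothetical, and the numbers in the usage note are illustrative, not the measurements behind the 1.69 figure.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction of the execution time
 * is accelerated by a given factor. */
double overall_speedup(double fraction_enhanced, double speedup_enhanced)
{
    assert(fraction_enhanced >= 0.0 && fraction_enhanced <= 1.0);
    assert(speedup_enhanced > 0.0);
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced);
}
```

For example, if half of the run time were accelerated by a factor of 2, `overall_speedup(0.5, 2.0)` would give about 1.33, which shows how the unaccelerated part limits the overall gain.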
24Conclusion and Future Work
Conclusion
- A pure software implementation gives you flexibility
- A pure hardware implementation gives you performance
- A H/S system balances flexibility and performance
Future Work
- Improvement to hardware implementation of adder, multiplier and sigmoid function
- More MicroBlazes with dedicated hardware
25Thank you!
The presentation and code are available at http://www.uoguelph.ca/sareibi
Email: sareibi_at_uoguelph.ca
26A H/S System
[Figure: FSL0 read timing diagram with CLK, FSL0_S_Exists, FSL0_S_Read, and data words D0, D1, ..., D13, D14, D15; D0 is stored in Register P00, D1 in Register P01, ..., D15 in Register P33]
27Backpropagation algorithm
A H/S Co-design Approach
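- Error
  - $b_i = f\left(\sum_h v_{hi}\, a_h + \theta_i\right)$, $c_j = f\left(\sum_i w_{ij}\, b_i + \theta_j\right)$, where $f(x) = (1 + e^{-x})^{-1}$.
  - $E = \frac{1}{2}\left(c_j^k - c_j\right)^2$, with $c_j^k$ the target output for training pattern $k$
- From hidden to output layer
  - $\Delta w_{ij} = -\eta\, \frac{\partial E}{\partial w_{ij}} = \eta\, (c_j^k - c_j)\, f'(c_j)\, b_i = \eta\, c_j (1 - c_j)(c_j^k - c_j)\, b_i = \eta\, d_j\, b_i$
  - $\Delta \theta_j = \eta\, d_j$
- From input to hidden layer
  - $\Delta v_{hi} = \eta\, a_h\, e_i$, $e_i = b_i (1 - b_i) \sum_j w_{ij}\, d_j$
  - $\Delta \theta_i = \eta\, e_i$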
28A Single Virtex-II FPGA Slice
Appendix
- 4-input LUT
- 16-bit distributed SelectRAM
- 16-bit shift register
- D flip-flop
29OPB Timer/Counter
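[Figure: OPB Timer/Counter block diagram. Two timers on the OPB bus with registers TCSR0, TLR0, TCR0 and TCSR1, TLR1, TCR1; inputs Capture Trig0 and Capture Trig1; outputs GenerateOut0, GenerateOut1, and TC_Interrupt]
void Start_timer()
{
    XIo_Out32(XPAR_OPB_TIMER_TLR0, 0x00000000);   /* load value of 0 */
    XIo_Out32(XPAR_OPB_TIMER_TCSR0, 0x00000020);  /* load TLR0 into the counter */
    XIo_Out32(XPAR_OPB_TIMER_TCSR0, 0x00000080);  /* enable timer 0 */
}
void Stop_timer()
{
    cycles = XIo_In32(XPAR_OPB_TIMER_TCR0);       /* read the elapsed cycle count */
    XIo_Out32(XPAR_OPB_TIMER_TCSR0, 0x00000000);  /* disable timer 0 */
}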
30Introduction
A biological neuron system
The Dendrites (Greek dendr/o, "tree") of a neuron are its many short, branching fibers extending from the cell body, or soma. These fibers increase the surface area available for receiving incoming information.
The Synapse (Greek syn, "union, association") is the point of connection between two neurons or between a neuron and a muscle or gland. Electrochemical communication between neurons takes place at these junctions. The synapse consists of three elements: 1) the presynaptic membrane, which is formed by the terminal button of an axon; 2) the postsynaptic membrane, which is composed of a segment of dendrite or cell body; and 3) the space between these two structures, which is called the synaptic cleft. Some cells in the nervous system have as many as two hundred thousand synaptic connections.
The Axon is a single fiber that carries information away from the soma to the synaptic sites of other neurons (dendrites and somas), muscles, or glands.
31Background
Floating-point format
Arithmetic formats for implementing MLP-BP
32Background
Fixed-point format
Arithmetic formats for implementing MLP-BP
33Background
Precision and Range
Arithmetic formats for implementing MLP-BP
If we only have 4 bits (?2) to represent a
positive real number
[Figure: number lines for the 4-bit codes 0000, 0001, ... in each format. Floating-point (exponent and mantissa fields) spans 0 to 7; fixed-point spans 0 to 3.75]
- Floating-point format has a large dynamic range but varied precision
- Fixed-point format has a limited range but constant precision
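The fixed-point half of this comparison can be made concrete with a 4-bit encoder and decoder. The sketch assumes 2 integer bits and 2 fraction bits, which is consistent with the 3.75 maximum shown on the slide but is not stated there; the names `FRAC_BITS`, `fixed4_encode`, and `fixed4_decode` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 2    /* assumed split: 2 integer bits, 2 fraction bits */
#define MAX_CODE  15u  /* largest 4-bit code */

/* Encode a non-negative real as a 4-bit fixed-point code, rounding to
 * the nearest step and saturating at the largest code. */
uint8_t fixed4_encode(double x)
{
    assert(x >= 0.0);
    double scaled = x * (1 << FRAC_BITS) + 0.5;
    if (scaled >= MAX_CODE + 1.0)
        return (uint8_t)MAX_CODE;
    return (uint8_t)scaled;
}

/* Decode a 4-bit code back to a real value. */
double fixed4_decode(uint8_t code)
{
    assert(code <= MAX_CODE);
    return (double)code / (1 << FRAC_BITS);
}
```

Every adjacent pair of codes differs by the same step of 0.25, which is the constant precision the slide refers to, and anything above 3.75 saturates, which is the limited range.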
34RC implementations
Floating-point adder
Arithmetic formats for implementing MLP-BP
[Figure: floating-point adder block diagram. Inputs OP1, OP2, READY, and EXCEPTION_IN; operand fields s1, e1, f1 and s2, e2, f2; blocks PARAMETERIZED_COMPARATOR, SWAP, SHIFT_ADJUST, ADD_SUB, and CORRECTION, with clear, enable, and exception signals; result fields s, e, f; outputs RESULT, DONE, and EXCEPTION_OUT]
Worked example: 3.15×10^3 + 2.14×10^5 → 2.14×10^5 + 3.15×10^3 → 2.14×10^5 + 0.0315×10^5 → 2.1715×10^5 → 2.17×10^5
- Re-arrangement of input operands (SWAP)
- Pre-shift for mantissa alignment (SHIFT_ADJUST)
- Mantissa addition/subtraction (ADD_SUB)
- Post-shift of mantissa and increment of exponent for result correction (CORRECTION)
Pavle Belanovic, "Library of Parameterized Hardware Modules for Floating-Point Arithmetic with An Example Application," M.S. Thesis, Dept. of Electrical and Computer Engineering, Northeastern University, June 2002
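The four stages can be traced with a toy decimal model that mirrors the slide's example of adding 3.15×10^3 and 2.14×10^5. This is only a sketch under stated assumptions: positive operands, a three-digit decimal mantissa stored as an integer, and truncation during alignment (the hardware works in binary and handles signs and exceptions). The type `Dec3` and the function `dec3_add` are hypothetical.

```c
#include <assert.h>

/* Toy decimal float: value = (mant / 100) * 10^exp, so 3.15e3 is {315, 3}.
 * Normalized mantissas lie in [100, 999]. */
typedef struct {
    int mant;
    int exp;
} Dec3;

Dec3 dec3_add(Dec3 x, Dec3 y)
{
    assert(x.mant >= 100 && x.mant <= 999);
    assert(y.mant >= 100 && y.mant <= 999);

    /* SWAP: make x the operand with the larger exponent. */
    if (y.exp > x.exp) {
        Dec3 t = x;
        x = y;
        y = t;
    }

    /* SHIFT_ADJUST: shift the smaller operand right to align exponents. */
    for (int k = x.exp - y.exp; k > 0 && y.mant != 0; k--)
        y.mant /= 10;

    /* ADD_SUB: add the aligned mantissas. */
    Dec3 r = { x.mant + y.mant, x.exp };

    /* CORRECTION: on mantissa overflow, shift right and bump the exponent. */
    if (r.mant > 999) {
        r.mant /= 10;
        r.exp += 1;
    }
    return r;
}
```

Calling `dec3_add` on {315, 3} and {214, 5} yields {217, 5}, that is 2.17×10^5, matching the final value in the worked example.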
35RC implementations
Floating-point multiplier
Arithmetic formats for implementing MLP-BP
[Figure: floating-point multiplier block diagram. Inputs OP1, OP2, READY, and EXCEPTION_IN; operand fields s1, e1, f1 and s2, e2, f2; internal signals cout, exp_bit1, and BIAS-1; outputs RESULT, DONE, and EXCEPTION_OUT]
Worked example: (2.14×10^5) × (3.15×10^3) = 6.741×10^8
Pavle Belanovic, "Library of Parameterized Hardware Modules for Floating-Point Arithmetic with An Example Application," M.S. Thesis, Dept. of Electrical and Computer Engineering, Northeastern University, June 2002
36RC implementations
Fixed-point adder
Arithmetic formats for implementing MLP-BP
[Figure: fixed-point adder with two m-bit operands OP1 and OP2 and an (m+1)-bit result SUM(m) ... SUM(0), shown in two implementations: a ripple carry adder with a carry-in of 0, and a carry lookahead adder whose carries come from CLA logic]
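A bit-level software model of the ripple carry variant is sketched below. It assumes a chain of m full adders with the first carry-in tied to 0 and the final carry-out used as SUM(m); the function name `ripple_carry_add` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Add two m-bit operands one bit at a time, as a chain of full adders.
 * The result has m+1 bits: SUM(m) ... SUM(0). Operand bits above bit
 * m-1 are ignored. */
uint32_t ripple_carry_add(uint32_t op1, uint32_t op2, unsigned m)
{
    assert(m >= 1 && m <= 31);
    uint32_t sum = 0;
    uint32_t carry = 0;  /* carry-in of stage 0 is 0 */
    for (unsigned k = 0; k < m; k++) {
        uint32_t a = (op1 >> k) & 1u;
        uint32_t b = (op2 >> k) & 1u;
        sum |= (a ^ b ^ carry) << k;          /* full-adder sum bit */
        carry = (a & b) | (carry & (a ^ b));  /* full-adder carry out */
    }
    return sum | (carry << m);  /* the final carry becomes SUM(m) */
}
```

Each loop iteration depends on the carry from the previous one, which is the serial delay that carry lookahead logic is meant to shorten.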
37RC implementations
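Fixed-point multiplier
Arithmetic formats for implementing MLP-BP
[Figure: fixed-point multiplier block diagram. Inputs OP1 and OP2 (m bits each), READY, and EXCEPTION_IN; an unsigned multiplier with inputs A and B and a 2×m-bit PRODUCT; outputs RESULT, DONE, and EXCEPTION_OUT]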
38RC implementations
Fixed-point unsigned multiplier
Arithmetic formats for implementing MLP-BP
[Figure: unsigned multiplier block diagram. Inputs A, B, and READY; an extender for each operand (extends to double the length of the input); a shifter (shifts the LSB out at each cycle when start/stop is 1); a controller driving start/stop; output PRODUCT]
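The extender, shifter, and controller suggest a sequential shift-and-add multiplier. The sketch below is a software model under that assumption, with 16-bit operands chosen only for illustration; the function name `shift_add_multiply` is hypothetical, and the final assertion is a self-check against the built-in multiply rather than part of the technique.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two unsigned 16-bit operands by shift-and-add. */
uint32_t shift_add_multiply(uint16_t a, uint16_t b)
{
    uint32_t multiplicand = a;  /* extender: widen to double length */
    uint32_t multiplier = b;
    uint32_t product = 0;

    while (multiplier != 0) {   /* controller: stop when no bits remain */
        if (multiplier & 1u)
            product += multiplicand;  /* add when the bit shifted out is 1 */
        multiplicand <<= 1;
        multiplier >>= 1;       /* shifter: drop the LSB each cycle */
    }

    assert(product == (uint32_t)a * b);  /* self-check for this sketch */
    return product;
}
```

The loop runs at most once per multiplier bit, so an m-bit hardware version would need up to m cycles per product.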
39Results and Discussion
Tested formats and implementation details
Arithmetic formats for implementing MLP-BP
40Results and Discussion
Comparison of various formats' space requirements
Arithmetic formats for implementing MLP-BP
41Results and Discussion
Comparison of various formats' space requirements
Arithmetic formats for implementing MLP-BP
Targeting Spartan-IIE FPGA
Targeting Virtex-II FPGA
42Results and Discussion
A Pure Hardware XOR ANN
Arithmetic formats for implementing MLP-BP
Can't be separated by a line! Not classifiable by a single perceptron
[Figure: the XOR patterns plotted on two axes, each running from 0 to 1]
43Contributions
A H/S Co-design Approach
- Investigation of different arithmetic formats for implementing ANNs
- Construction of a pure MicroBlaze system for the face recognition problem using the ANN technique
- Construction of a MicroBlaze with dedicated hardware system (H/S) for the face recognition problem using the ANN technique
- Submission of a journal paper to the Canadian Journal of Electrical and Computer Engineering (CJECE)