Title: Object Tracking
Tracking
- Tracking is the task of estimating the trajectory of an object in the image plane as it moves around a scene
- It is important in the fields of computer vision and artificial intelligence
- Interest in tracking has grown because of high-powered computers, high-quality and inexpensive video cameras, and the need for automated video analysis
Applications of Tracking
- Motion-based recognition
- Automated surveillance
- Video indexing
- Human-computer interaction
- Traffic monitoring
- Vehicle navigation
- Automatic target recognition in the military domain
- Visual inspection in industry
Classification
- Tracking is broadly divided into point tracking, kernel tracking and silhouette tracking
Difficulties Faced
- Tracking is highly computation-intensive
- Algorithms involve multiple correlations, convolutions and other complex operations
- These operations are slow to perform on a microprocessor
- Microprocessors are serial in nature, but most of these operations are inherently parallel
Possible Solutions
- Implement the parallel operations in hardware using ASICs
  - Not flexible
  - Cannot change the template object
  - Difficult to modify parameters such as the size and shape of the object being tracked
- Implement using FPGAs and reconfigurable computing
  - Provides a trade-off between the speed of hardware and the flexibility of software
Classical Image Tracking System
- Detection is a decision-making process
- Tracking involves associating discrete detections over time to form a track path
- Recognition uses the results of detection and tracking to classify the object
Template Matching
- It is an object detection technique used to find an object in a search image
- The template image is compared with the scene image at every position, and the location where the mismatch is minimum (equivalently, where the similarity is maximum) is taken as the match
- Software solutions are flexible but too slow
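The exhaustive search described above can be sketched in a few lines of Python. This is a minimal illustration, not code from either paper: it assumes grey-scale images stored as lists of rows, uses the sum of absolute differences (SAD) as the mismatch score, and the function name `sad_match` is hypothetical. Every candidate position requires a loop over all template pixels, so the cost grows with the number of positions times the template size, which is why a purely software approach is slow.

```python
def sad_match(image, template):
    """Exhaustive template matching with the sum of absolute differences.

    image and template are lists of rows of grey values. Returns the
    (row, col) of the top-left corner of the best match and its SAD.
    """
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    best_pos, best_sad = None, None
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            # Mismatch between the template and the window at (y, x).
            sad = sum(
                abs(image[y + i][x + j] - template[i][j])
                for i in range(h)
                for j in range(w)
            )
            if best_sad is None or sad < best_sad:
                best_pos, best_sad = (y, x), sad
    return best_pos, best_sad


if __name__ == "__main__":
    scene = [
        [0, 0, 0, 0, 0],
        [0, 9, 8, 0, 0],
        [0, 7, 9, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    patch = [[9, 8], [7, 9]]
    print(sad_match(scene, patch))  # → ((1, 1), 0)
```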
Paper 1: Reconfigurable Shape-Adaptive Template Matching
- Jörn Gause, Peter Y. K. Cheung, Wayne Luk
- Imperial College, London
- IEEE Symposium on Field-Programmable Custom Computing Machines, 2002
Objective
- Reconfigurable strategies for shape-adaptive template matching (SA-TM) to detect arbitrarily shaped objects in images/video frames
- Static design
  - The template is stored in off-chip memory
- Partially dynamic design
  - The template is stored in on-chip memory, allowing reconfiguration
- Dynamic design
  - The configuration data is completely adapted to the shape and size of the template
Purpose
- The algorithm is truly object-oriented, i.e., it depends only on the template used
- Software solutions may provide the flexibility but are too slow for real-time video processing
- An ASIC implementation is not practical because of the unlimited number of possible template sizes
- Thus, a reconfigurable architecture is proposed to implement a fast and flexible SA-TM design
Prior Work and Their Conclusions
- Work has been performed using dynamic reconfiguration, leading to acceleration and more effective use of logic capacity
- Computer vision algorithms
  - Not shape-adaptive
  - Applied to small images (512 × 512)
- Automatic target recognition
  - Binary templates (16 × 16) and small images (128 × 128)
- Run-time reconfiguration decreases the area-execution time product if the search image is large enough
- Dynamic reconfiguration is suitable for shape-adaptive algorithms if the reconfiguration overhead is small
Shape Adaptive Template Matching
- Aim
  - To find a template object of arbitrary shape and size within a search image or video frame of any size using a reconfigurable computing architecture
- The search image consists of W × H pixels
- The template consists of p opaque pixels and can have any shape; it is bounded by a box of size w × h
- In the bounding box, each pixel has one mask bit: 1 if the pixel belongs to the object and 0 otherwise
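Shape Adaptive Template Matching
- The template is shifted over the image to (W - w + 1)(H - h + 1) locations
- The sum of absolute differences (SAD) over luminance pixel values is chosen as the comparison metric: $SAD(y,x) = \sum_{i=0}^{h-1} \sum_{j=0}^{w-1} M(i,j)\,\lvert I(y+i,x+j) - T(i,j) \rvert$, where M(i,j) is the mask bit of template pixel (i,j)
- A match is found where SAD(y,x) is minimum and smaller than a certain threshold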
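A direct software model of the masked SAD search may help make these definitions concrete. The sketch below rests on stated assumptions: the mask is a list of rows of 0/1 values, only the p opaque pixels contribute to SAD(y, x), the (W - w + 1)(H - h + 1) positions are visited in raster order, and the hypothetical function `sa_tm` returns the best position only if its SAD is below the threshold.

```python
def sa_tm(image, template, mask, threshold):
    """Shape-adaptive template matching (software sketch).

    Only template pixels whose mask bit is 1 contribute to SAD(y, x).
    Returns ((y, x), sad) for the minimum SAD if it is below threshold,
    otherwise None.
    """
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    # The p opaque pixels as (i, j, T(i, j)); transparent pixels are skipped.
    opaque = [(i, j, template[i][j])
              for i in range(h) for j in range(w) if mask[i][j]]
    best_pos, best_sad = None, None
    for y in range(H - h + 1):          # (H - h + 1) rows of positions
        for x in range(W - w + 1):      # (W - w + 1) columns of positions
            sad = sum(abs(image[y + i][x + j] - t) for i, j, t in opaque)
            if best_sad is None or sad < best_sad:
                best_pos, best_sad = (y, x), sad
    if best_sad is not None and best_sad < threshold:
        return best_pos, best_sad
    return None


if __name__ == "__main__":
    img = [[1, 1, 1],
           [1, 5, 9],
           [1, 5, 5]]
    tpl = [[5, 0], [5, 5]]
    msk = [[1, 0], [1, 1]]               # L-shaped object, p = 3
    print(sa_tm(img, tpl, msk, 1))       # → ((1, 1), 0)
```

The pixel value 9 under the transparent template pixel at (1, 2) does not affect the result, which is the point of the mask.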
Systolic Array for SA-TM
- Pixel data arrives in horizontal raster-scan order
- The AD computations for one template position, the clock cycle in which each is performed, and the resulting SAD are shown
Systolic Array for SA-TM
- A signal flow graph (SFG) representing the previous example is shown
- The node <i,j> represents |I(y,x) - T(i,j)|
- The pixel values I(y,x) are broadcast sequentially to all PEs, and all computations are performed in parallel
- Since one pixel is broadcast per clock cycle, all 20 SAD values of the example (W = 7, H = 6, w = h = 3) are computed at the end of 42 clock cycles
Structure of PE for SA-TM
- The following general systolic array, adapted to the shape of the template object, is presented
- Each pixel belonging to the template is represented by a PE
- The template pixel value is stored in a ROM within the PE
Structure of PE for SA-TM
- The sizes of Sum_in and Sum_out depend on the position of the PE in the SFG
- N = max(m, c)
- The maximum absolute difference in one PE is $2^c - 1$ for c-bit pixels
- The maximum intermediate sum of k ADs is $(2^c - 1)k$
- This requires $\lceil \log_2((2^c - 1)k + 1) \rceil$ bits
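The word-length argument can be checked numerically. The helper below is a hypothetical sketch assuming c-bit unsigned pixels; it uses Python's exact `int.bit_length()` rather than a floating-point logarithm, since for a value M ≥ 1 the number of bits needed, ⌈log2(M + 1)⌉, equals `M.bit_length()`.

```python
def sum_width(k: int, c: int) -> int:
    """Bits needed for a partial sum of k absolute differences.

    Each AD of two c-bit pixels is at most 2**c - 1, so the sum of k ADs
    is at most (2**c - 1) * k, which needs ceil(log2((2**c - 1) * k + 1))
    bits, i.e. the bit length of that maximum value.
    """
    if k < 1 or c < 1:
        raise ValueError("k and c must be positive")
    return ((2 ** c - 1) * k).bit_length()


if __name__ == "__main__":
    # Word lengths of the first eight partial sums for 8-bit pixels (c = 8).
    print([sum_width(k, 8) for k in range(1, 9)])
    # → [8, 9, 10, 10, 11, 11, 11, 11]
```

The widths grow only logarithmically with k, which is why sizing each adder individually, rather than using the worst-case width everywhere, saves area.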
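Area Calculations
- The area of a PE contains a constant part (the AD) and a variable part that grows with the word length n, i.e. $A_{PE} = a + b\,n$, where a and b are constants
- The total area to implement p PEs is given by $A = \sum_{k=1}^{p} (a + b\,n_k)$, where $n_k$ is the word length of the kth PE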
Area Calculations
- The area then simplifies
to - where
Further Area Calculations
- Registers are required to delay the intermediate sums
- PEs and registers are arranged according to the mask of the template
- After each line of PEs, W - w shift registers are needed
Further Area Calculations
- The wh - p transparent pixels require registers
- The W - w positions after each template row (except the last) require shift registers
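The register bookkeeping can be written as a small function. It is a hypothetical helper that only counts registers (it ignores their differing word lengths), using the counts stated for this structure: wh - p gap registers plus W - w line registers after every template row except the last. For the small example with w = h = 3, p = 8 and W = 7 it gives 1 + 8 = 9 registers.

```python
def register_count(w: int, h: int, p: int, W: int) -> dict:
    """Count PEs and delay registers of the shape-adaptive systolic array."""
    if not 0 < p <= w * h or W < w:
        raise ValueError("need 0 < p <= w*h and W >= w")
    gap = w * h - p              # registers standing in for transparent pixels
    line = (W - w) * (h - 1)     # shift registers after each row but the last
    return {"pes": p, "gap_registers": gap, "line_registers": line,
            "registers": gap + line}


if __name__ == "__main__":
    print(register_count(3, 3, 8, 7))
    # → {'pes': 8, 'gap_registers': 1, 'line_registers': 8, 'registers': 9}
```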
Summary of Structure
- p PEs are required for the AD computations and the summation of intermediate results
- The PEs are arranged in the same way as the pixels of the template
- The wh - p gaps represent transparent pixels and are filled with registers
- W - w shift registers are required in each row except the last to store intermediate sums
- The size of the kth adder, where 1 ≤ k ≤ p, is given by $n_k = \lceil \log_2((2^c - 1)k + 1) \rceil$
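To see how the PEs, gap registers and line shift registers cooperate, the structure can be simulated cycle by cycle. The sketch below is built on stated assumptions and is not the authors' implementation: it flattens the template into a single pipeline (a PE for each opaque pixel, a plain register for each transparent one, and W - w extra registers after each row but the last), broadcasts one pixel per clock in raster order, and reads a finished SAD from the last stage. The names `build_chain` and `systolic_sad` are hypothetical. With the small example's dimensions (W = 7, H = 6, w = h = 3, p = 8) it yields 20 SAD values in 42 cycles, each equal to a direct masked SAD.

```python
def build_chain(template, mask, W):
    """Flatten the template into the linear pipeline of the systolic array.

    Each template row contributes w stages: a PE ("pe", T(i, j)) for an
    opaque pixel or a plain delay register ("reg", None) for a transparent
    one. After every row except the last, W - w extra shift registers delay
    the partial sum until the next image line arrives.
    """
    h, w = len(template), len(template[0])
    chain = []
    for i in range(h):
        for j in range(w):
            chain.append(("pe", template[i][j]) if mask[i][j] else ("reg", None))
        if i < h - 1:
            chain.extend([("reg", None)] * (W - w))
    return chain


def systolic_sad(image, template, mask):
    """Cycle-level model: one image pixel is broadcast to all stages per clock.

    Returns a dict {(y, x): SAD} for every valid window and the cycle count.
    """
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    chain = build_chain(template, mask, W)
    depth = len(chain)                    # (h - 1) * W + w stages
    regs = [0] * depth                    # output register of every stage
    sads = {}
    for t in range(W * H):                # raster-scan broadcast
        pixel = image[t // W][t % W]
        # All stages update simultaneously from the previous register values.
        regs = [
            (regs[k - 1] if k > 0 else 0)
            + (abs(pixel - tval) if kind == "pe" else 0)
            for k, (kind, tval) in enumerate(chain)
        ]
        # The last stage now holds the SAD of the window whose top-left
        # pixel was broadcast depth - 1 cycles ago.
        t0 = t - (depth - 1)
        if t0 >= 0:
            y, x = divmod(t0, W)
            if x <= W - w and y <= H - h:  # windows with x > W - w wrap around
                sads[(y, x)] = regs[-1]
    return sads, W * H


if __name__ == "__main__":
    img = [[(3 * y + 5 * x) % 11 for x in range(7)] for y in range(6)]
    tpl = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    msk = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]   # p = 8 opaque pixels
    sads, cycles = systolic_sad(img, tpl, msk)
    print(len(sads), cycles)  # → 20 42
```

The pipeline depth (h - 1)W + w is exactly the number of raster-scan cycles between the first and last pixel of a window, which is why the line registers are needed.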
Reconfigurable Design Strategies: Dynamic Design
- The design is reconfigured for every possible template size and shape and search frame size
- The template is part of the configuration data, so word lengths can be optimized
- Since one input to the AD module is constant, the module can be replaced by a look-up table that stores the AD value for each I(y,x)
- p PEs and wh - p + (W - w)(h - 1) registers are required
Reconfigurable Design Strategies: Dynamic Design
- The area required is
- The total execution time T_D consists of the computation time, the reconfiguration time and the compilation time
- The execution time for N frames is
Reconfigurable Design Strategies: Static Design
- The dynamic design is useful when the template is searched for in a large number of video frames of the same size
- In the static design, the FPGA configuration is not changed when a new template is used
- Since the number of search frame sizes and template shapes and sizes is unlimited, only a subset of all solutions can be implemented
Reconfigurable Design Strategies: Static Design
- If the search frame size is fixed, the following PE structure is used
- Template pixel values T(i,j) come from external memory, and a multiplexer determines whether an addition or a delay is performed
Reconfigurable Design Strategies: Static Design
- The area for the static design is
- The execution time is given by
- Advantages: no recompilation of the design code or reconfiguration of the device
- Disadvantages:
  - The large external RAM, which stores template pixels and mask bits, makes the design slower
  - For large frame sizes, the number of I/O pins required is extremely large
Reconfigurable Design Strategies: Partially Dynamic Design
- Combines the advantages of both the static and the dynamic design
- Template pixels and mask bits are stored in on-chip memory
- To change the template, only the memory parts have to be reconfigured
Reconfigurable Design Strategies: Partially Dynamic Design
- where $t_{bit}$ is the time needed to reconfigure 1 bit
FPGA Implementation and Results
- The PEs for the three reconfigurable designs have been implemented on a Xilinx Virtex XCV1000E for different values of the output word length n and c = 8
- Using these results, the constants a and b are determined for each design
Results for a Small Example
- For a template with w = h = 3 and p = 8 and a search image with W = 7 and H = 6, the following results are obtained
- The first two rows show that the calculated and the measured values are almost equal
Results for HDTV Format
- W = 1920, H = 1080, frame rate 30 Hz
- The area of the dynamic design is 34% smaller than that of the static design and 16% smaller than that of the partially dynamic design
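Because the systolic array consumes one pixel per clock cycle (42 cycles for the 42-pixel example), the HDTV parameters translate directly into a minimum clock rate for real-time operation. The helper below is a hypothetical back-of-the-envelope check, not a figure reported in the paper; it ignores pipeline fill latency and blanking intervals.

```python
def required_clock_hz(W: int, H: int, fps: float, pixels_per_cycle: int = 1) -> float:
    """Minimum clock rate so that W * H * fps pixels/s are consumed in real time."""
    return W * H * fps / pixels_per_cycle


if __name__ == "__main__":
    f = required_clock_hz(1920, 1080, 30)
    print(f"{f / 1e6:.3f} MHz")  # → 62.208 MHz
```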
Results for HDTV Format
- The area requirement of the dynamic design is compared for the same p but different shapes (w/h)
Results for HDTV Format
- Total execution times T are compared for different numbers of frames and different techniques
Speed-up Achieved
- Comparison with software (1.4 GHz Pentium 4 PC) for the HDTV frame format, 1 frame
Conclusion
- The number of logic cells required for the static and partially dynamic designs is constant for a given frame size
- The dynamic design leads to significant savings in area
- The static design is suitable for operation on one or only a few frames
- The partially and fully dynamic designs perform well if matching is done on a large number of frames
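The few-frames versus many-frames conclusion follows from trading a one-off setup cost against a per-frame cost. The sketch below assumes a simple linear model that is not taken from the paper: the static design has no setup cost but is slower per frame (because of its external RAM), while the dynamic design pays a one-off compile-and-reconfigure cost and is then faster per frame. All timings are hypothetical integers in milliseconds, chosen so the arithmetic stays exact; the function names are also hypothetical.

```python
def total_time(n_frames: int, setup: int, per_frame: int) -> int:
    """Total execution time under a linear setup + per-frame cost model."""
    return setup + n_frames * per_frame


def break_even_frames(static_per_frame: int, dyn_setup: int, dyn_per_frame: int):
    """Smallest N for which the dynamic design finishes strictly before the
    static one, or None if the dynamic design is never faster per frame."""
    gain = static_per_frame - dyn_per_frame
    if gain <= 0:
        return None
    # setup + N * dyn < N * static  <=>  N > setup / gain
    return dyn_setup // gain + 1


if __name__ == "__main__":
    # Hypothetical: 50 ms vs 40 ms per frame, 10 minutes of compile + reconfigure.
    n = break_even_frames(50, 600_000, 40)
    print(n)  # → 60001
    print(total_time(n, 600_000, 40) < total_time(n, 0, 50))  # → True
```

Below the break-even point the static design wins because it never pays the setup cost; above it, the dynamic design's per-frame advantage dominates.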
Paper 2: FPGA-Based Template Matching Using Distance Transforms
- S. Hezel, A. Kugel, R. Männer, D. M. Gavrila
- IEEE Symposium on Field-Programmable Custom Computing Machines, 2002
Objective
- A high-performance FPGA solution for generic shape-based object detection
- To present a step-by-step implementation of the components of object detection systems
- Template matching is performed with distance transforms
- The method is robust to missing or partially incorrect data
- By employing highly parallel pipelines, a high speed-up can be achieved compared with sequential machines
- Matching is done for many binary templates concurrently using several distance-transformed images
Method Followed
- The target object is represented by binary templates containing positional and edge information
- The scene image is preprocessed by edge segmentation, edge cleaning and distance transforms
- Matching involves correlating the templates with the distance-transformed scene image
- Locations where the mismatch is below a user-defined threshold give the object location
Hardware Used
- The FPGA implementation targets PCI-based FPGA coprocessors
- The final implementation was carried out on a RACE-1 coprocessor:
  - Xilinx Virtex-II FPGA (XC2V3000)
  - Four 36-bit-wide 133 MHz SRAM banks
  - 64-bit, 66 MHz PCI
Matching Algorithm using Distance Transforms
- The distance transform converts a binary image consisting of feature and non-feature pixels into an image where each pixel denotes the distance to the nearest feature pixel
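The definition can be stated directly in code. This brute-force version only pins down what each DT pixel contains; it assumes Euclidean distance and a binary image stored as a list of rows, and it costs O(pixels × features), which is why efficient implementations use a two-pass chamfer approximation instead. The name `distance_transform` is hypothetical.

```python
import math


def distance_transform(binary):
    """Brute-force Euclidean distance transform.

    binary: list of rows with 1 for feature (edge) pixels and 0 otherwise.
    Each output pixel holds the distance to the nearest feature pixel.
    """
    features = [(y, x) for y, row in enumerate(binary)
                for x, v in enumerate(row) if v]
    if not features:
        raise ValueError("image contains no feature pixels")
    return [[min(math.hypot(y - fy, x - fx) for fy, fx in features)
             for x in range(len(binary[0]))]
            for y in range(len(binary))]


if __name__ == "__main__":
    for row in distance_transform([[0, 0, 0], [0, 1, 0], [0, 0, 0]]):
        print([round(v, 3) for v in row])
    # → [1.414, 1.0, 1.414]
    # → [1.0, 0.0, 1.0]
    # → [1.414, 1.0, 1.414]
```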
Matching with Distance Transforms
- It involves two binary images:
  - Segmented/feature template T
  - Segmented/feature image I
- On and off pixels denote the presence and absence of a feature
- The type of feature does not matter to the method; here, only edge points are used
- The feature template is given offline
- The feature image is derived by feature extraction
Matching with Distance Transforms
- The template T is translated and positioned over the DT image of I
- The measure D(T,I) is determined by the pixel values of the DT image that lie under the on pixels of the template
- The lower the distance, the better the match
- One measure for distance is the chamfer distance, $D_{chamfer}(T,I) = \frac{1}{|T|} \sum_{t \in T} d_I(t)$
- where |T| is the number of features in T and $d_I(t)$ is the DT value at the position of feature t
Matching using Distance Transforms
- A template is considered matched at locations where $D(T,I) < \theta$, with θ a user-defined threshold
- The advantage of matching a template with the DT image is that it provides a smoother similarity measure
Figure: a) Original image, b) Template, c) Edge image, d) DT image
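The chamfer measure and the threshold test can be sketched as follows. The function name `chamfer_matches`, the representation of the template as a list of non-negative (row, column) offsets of its on pixels, and the example values are assumptions for illustration only.

```python
def chamfer_matches(dt, template_points, threshold):
    """Return ((y, x), D) for every position where D(T, I) < threshold.

    dt:              distance-transformed image I (list of rows).
    template_points: (i, j) offsets of the on (edge) pixels of template T.
    D(T, I) is the mean DT value under the template's on pixels.
    """
    H, W = len(dt), len(dt[0])
    h = max(i for i, _ in template_points) + 1
    w = max(j for _, j in template_points) + 1
    n = len(template_points)             # |T|, the number of template features
    matches = []
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            d = sum(dt[y + i][x + j] for i, j in template_points) / n
            if d < threshold:
                matches.append(((y, x), d))
    return matches


if __name__ == "__main__":
    # DT of an image whose only edge is the vertical line x = 2.
    dt = [[abs(x - 2) for x in range(5)] for _ in range(4)]
    vertical_bar = [(0, 0), (1, 0), (2, 0)]
    print(chamfer_matches(dt, vertical_bar, 0.5))
    # → [((0, 2), 0.0), ((1, 2), 0.0)]
```

Shifting the bar one column off the edge raises D to 1.0 rather than dropping a binary overlap to zero, which illustrates the smoother similarity measure mentioned above.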
Matching Algorithm Components
- The matching algorithm contains the following components:
  - Edge detection
  - Edge noise removal
  - Computation of the distance transform
  - Correlation between the template and the DT image
- The calculation of the distance transform is a two-stage process
Preprocessing Architecture: Edge Detection
- Sobel operators are used for edge detection
- The mask is fixed, and the image is shifted under the mask line by line
- Two lines of the original image are copied into the FPGA RAM
- Calculations are done in parallel using two pipelined AUs
- A new pixel is fed into the shift register every clock cycle
- If $|S_X| + |S_Y| >$ threshold, the pixel is a feature
- Discrete orientations are determined in parallel
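A software model of this edge detector may help. It is a sketch, not the paper's pipeline: it processes a whole image held in memory rather than streaming pixels through line buffers and a shift register, it uses |S_X| + |S_Y| as the gradient magnitude, and it assumes eight orientation bins of 45° over the full circle, matching the eight directional images used later. The name `sobel_edges` is hypothetical.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_edges(image, threshold, n_orient=8):
    """Binary edge map and quantised gradient orientation per edge pixel.

    Border pixels are left as non-edges. Orientation bins are
    2*pi / n_orient wide, with bin 0 centred on the +x direction.
    """
    H, W = len(image), len(image[0])
    step = 2 * math.pi / n_orient
    edges = [[0] * W for _ in range(H)]
    orient = [[None] * W for _ in range(H)]
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            win = [row[x - 1:x + 2] for row in image[y - 1:y + 2]]  # 3x3 window
            sx = sum(SOBEL_X[i][j] * win[i][j] for i in range(3) for j in range(3))
            sy = sum(SOBEL_Y[i][j] * win[i][j] for i in range(3) for j in range(3))
            if abs(sx) + abs(sy) > threshold:   # |S_X| + |S_Y| magnitude
                edges[y][x] = 1
                angle = math.atan2(sy, sx) % (2 * math.pi)
                orient[y][x] = round(angle / step) % n_orient
    return edges, orient


if __name__ == "__main__":
    step_image = [[0, 0, 10, 10]] * 3       # vertical step edge
    e, o = sobel_edges(step_image, 20)
    print(e[1], o[1])  # → [0, 1, 1, 0] [None, 0, 0, None]
```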
Morphological Cleaning
- The aim is to remove noise in the binary edge image
- Groups of three or fewer connected pixels are considered noise
- The cleaning module is built as a pipeline with a logic unit (LU) that has parallel access to all relevant pixels
- The LU detects in parallel all possible combinations of three or fewer connected pixels
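The cleaning rule can be modelled in software with connected-component labelling. This is a different technique from the paper's logic unit, which inspects a local window in parallel; the sketch below assumes 8-connectivity and simply deletes every connected group of at most three edge pixels. The name `remove_small_components` is hypothetical.

```python
from collections import deque


def remove_small_components(edges, max_noise=3):
    """Delete 8-connected groups of max_noise or fewer edge pixels."""
    H, W = len(edges), len(edges[0])
    out = [row[:] for row in edges]
    seen = [[False] * W for _ in range(H)]
    for sy in range(H):
        for sx in range(W):
            if not edges[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first search collects one connected component.
            comp, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < H and 0 <= nx < W
                                and edges[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(comp) <= max_noise:        # treat small groups as noise
                for y, x in comp:
                    out[y][x] = 0
    return out


if __name__ == "__main__":
    noisy = [[1, 0, 0, 0, 0, 0],
             [0, 0, 1, 1, 1, 1],
             [0, 0, 0, 0, 0, 0]]
    for row in remove_small_components(noisy):
        print(row)
    # → [0, 0, 0, 0, 0, 0]
    # → [0, 0, 1, 1, 1, 1]
    # → [0, 0, 0, 0, 0, 0]
```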
Distance Transformation
- The chamfer metric is used for distance
- Two-step process
  - 1st step: edge detection, morphological cleaning and the forward distance transformation
- Non-symmetric forward and backward masks are used to calculate the distance
- The image is translated under these masks, first in the forward and then in the backward direction
- All 8 directional images are processed in parallel
- Results are clipped to 4 bits so that all 8 directions of a pixel fit in a single memory word (8 × 4 = 32 bits)
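The two-pass scheme can be sketched as below. The slides do not give the mask weights, so the common 3-4 chamfer weights are assumed here (3 for horizontal and vertical steps, 4 for diagonal steps). The clip value 15 is the largest 4-bit number, and `pack_nibbles` shows how eight clipped directional values fit into one 32-bit word. All names are hypothetical.

```python
INF = 10 ** 9  # stands in for "no feature seen yet"

# (dy, dx, weight) neighbours already visited in each pass.
FORWARD = ((-1, -1, 4), (-1, 0, 3), (-1, 1, 4), (0, -1, 3))
BACKWARD = ((1, 1, 4), (1, 0, 3), (1, -1, 4), (0, 1, 3))


def _relax(d, y, x, mask):
    """Update d[y][x] from the neighbours listed in mask."""
    H, W = len(d), len(d[0])
    for dy, dx, weight in mask:
        ny, nx = y + dy, x + dx
        if 0 <= ny < H and 0 <= nx < W:
            d[y][x] = min(d[y][x], d[ny][nx] + weight)


def chamfer_dt(binary, clip=15):
    """Two-pass 3-4 chamfer distance transform, clipped to 4 bits."""
    H, W = len(binary), len(binary[0])
    d = [[0 if binary[y][x] else INF for x in range(W)] for y in range(H)]
    for y in range(H):                      # forward pass: top-left first
        for x in range(W):
            _relax(d, y, x, FORWARD)
    for y in range(H - 1, -1, -1):          # backward pass: bottom-right first
        for x in range(W - 1, -1, -1):
            _relax(d, y, x, BACKWARD)
    return [[min(v, clip) for v in row] for row in d]


def pack_nibbles(values):
    """Pack eight 4-bit values (one per direction) into one 32-bit word."""
    if len(values) != 8 or not all(0 <= v <= 15 for v in values):
        raise ValueError("expected eight values in 0..15")
    word = 0
    for k, v in enumerate(values):
        word |= v << (4 * k)
    return word


if __name__ == "__main__":
    print(chamfer_dt([[0, 0, 0], [0, 1, 0], [0, 0, 0]]))
    # → [[4, 3, 4], [3, 0, 3], [4, 3, 4]]
    print(hex(pack_nibbles([1, 2, 3, 4, 5, 6, 7, 8])))  # → 0x87654321
```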
Control and Resources
Figure: pipeline with forward transformation
Figure: pipeline with backward transformation
Resource Requirements
- The resource utilization of the two pipelines for images of size 512 × 512 and 8-bit input data is given
Architecture of Template Matching
- A pipelined parallel approach is used
- Relevant data of all templates are stored in shift register arrays (SRAs), and the correlations of all templates are carried out simultaneously
- Depending on the number, size and shape of the templates, varying amounts of FPGA resources are used
Parallel Pipelined Matching
- To calculate the correlation of one template, the following summations must be performed:
  - The pixels of one DT image corresponding to the template pixels have to be added
  - This is done 8 times, once for each DT image
  - The sum of these 8 intermediate sums is calculated
- For N templates, this has to be done N times
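The summation order described above (one sum per DT image, then a sum over the eight partial sums, repeated for every template) can be sketched as follows. The data layout, a template as a mapping from orientation index to the (row, column) offsets of its edge pixels with that orientation, and the names `template_score` and `score_all` are assumptions; the hardware computes the N templates concurrently, whereas this sketch loops over them.

```python
def template_score(dt_images, template, y, x):
    """Correlation of one template at position (y, x).

    dt_images: eight DT images (lists of rows), one per edge orientation.
    template:  dict orientation -> list of (i, j) offsets of the template's
               edge pixels with that orientation.
    """
    partial = [0] * len(dt_images)
    for o, points in template.items():
        dt = dt_images[o]
        # Sum of the DT pixels of image o lying under the template pixels.
        partial[o] = sum(dt[y + i][x + j] for i, j in points)
    return sum(partial)  # final sum of the per-orientation partial sums


def score_all(dt_images, templates, y, x):
    """Scores of N templates at one position (computed in parallel in hardware)."""
    return [template_score(dt_images, t, y, x) for t in templates]


if __name__ == "__main__":
    # Toy DT images: every pixel of image o holds the value o.
    dts = [[[o] * 3 for _ in range(3)] for o in range(8)]
    t1 = {0: [(0, 0)], 2: [(0, 1), (1, 1)]}
    t2 = {7: [(0, 0)]}
    print(score_all(dts, [t1, t2], 0, 0))  # → [4, 7]
```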
Parallel Pipelined Matching
- The correlations of all templates are carried out simultaneously
- Each DT image has its own SRA
- 8 SRAs are required for the 8 DT images
- Each template is assigned one adder tree that has access to all SRAs
- The calculation strategy is similar to that of the Sobel calculations
Parallel Pipelined Matching
- The SRAs differ in their extent, depending on the shapes of the templates
Control
- The SRAs can be filled with DT pixels such that each SRA receives one input datum every clock cycle
- The data is re-sorted before being stored in the SRA
Control
- Filling the pipeline:
  - Fill the SRA that has the largest extent
  - The other SRAs are filled simultaneously
  - Each SRA is filled with the correct DT pixels
- The pipeline is never stalled, and all registers can be clock-enabled
- After the pipeline is filled, the results of possibly matched templates are stored
- The verification of the results is conducted on the PC
Results
- Results of placement and routing for preprocessing (PP) and template matching (TM) are shown
- A speed-up of up to 200 was achieved compared with a software implementation on a 500 MHz Pentium III processor