Image Processing With FPGAs - PowerPoint PPT Presentation

1
Image Processing With FPGAs
  • Zach Fuchs
  • Sarit Patel
  • EEL6935
  • 14 April 2008

2
FPGA-Based Configurable Systolic Architecture for
Window-Based Image Processing
  • Authors
  • César Torres-Huitzil
  • Miguel Arias-Estrada

3
Introduction
  • Image processing is a fundamental step in modern
    machine vision systems.
  • Many complex algorithms use lower-level results
    to pursue higher-level goals.
  • e.g., edge detection used to identify objects
  • Real-time performance is usually required in
    video applications.

4
Difficulty Building Systems
  • Most computer vision applications are
    computationally intensive
  • The sequential nature of conventional processors
    limits performance
  • The variety of computations involved limits
    parallelization
  • Real-time performance is required

5
Sample Applications
  • Robotics
  • Multimedia
  • Virtual reality
  • Industrial inspection
  • Medical engineering
  • Autonomous navigation

6
Goals of Paper
  • Design a 2D systolic architecture for window-based
    image processing
  • Consider design issues
  • Flexibility
  • Silicon area
  • Power consumption
  • Performance

7
Window-Based Image Processing
  • Large number of repetitive neighborhood operations
    over image data
  • A w x w area of pixels is extracted from the image
  • Transformed according to a window mask and
    mathematical functions
  • Produces a single new output value according to
    the transform

8
Window-Based Image Processing

9
Window-Based Operators
  • The same scalar function is applied on a
    pixel-by-pixel basis
  • Scalar functions
  • e.g., relational, arithmetic, logical, lookup
    tables
  • Reduction functions
  • Reduce the window of scalar-function results to
    one output
  • e.g., accumulation, maximum, absolute value
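The split into a scalar function and a reduction function can be sketched as one generic routine. This is a minimal Python sketch under our own assumptions: the name `window_op`, the choice to skip border pixels, and the two example operators are illustrative and are not taken from the paper.

```python
from typing import Callable, List

Image = List[List[int]]


def window_op(image: Image, mask: Image,
              scalar: Callable[[int, int], int],
              reduction: Callable[[List[int]], int]) -> Image:
    """Apply a generic w x w window operator at every position where the
    window fits entirely inside the image (borders are skipped here)."""
    w = len(mask)
    rows, cols = len(image), len(image[0])
    out: Image = []
    for r in range(rows - w + 1):
        out_row = []
        for c in range(cols - w + 1):
            # Scalar stage: combine each pixel with its mask coefficient.
            partial = [scalar(image[r + i][c + j], mask[i][j])
                       for i in range(w) for j in range(w)]
            # Reduction stage: collapse the w*w partial results to one output.
            out_row.append(reduction(partial))
        out.append(out_row)
    return out


img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]

# Multiply + accumulate: a 2 x 2 box filter (correlation form, mask not flipped).
box = window_op(img, [[1, 1], [1, 1]], lambda p, k: p * k, sum)
# Add + maximum: grayscale dilation with a flat 2 x 2 structuring element.
dilated = window_op(img, [[0, 0], [0, 0]], lambda p, k: p + k, max)
print(box)      # [[14, 18, 22], [30, 34, 38]]
print(dilated)  # [[6, 7, 8], [10, 11, 12]]
```

Swapping only the two function arguments turns the same loop structure into a different operator, which is the kind of flexibility a configurable hardware window processor aims for.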

10
Computational Requirements
  • Window-based operations are computationally
    expensive
  • This work focuses on convolution
  • Convolution: the amount of overlap between f and
    a reversed and translated version of g
  • In general, complexity is O(w² x M x N)
  • w x w window mask
  • M x N image
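The O(w² x M x N) cost follows from the four nested loops of a direct implementation: every one of the M x N output pixels visits all w x w mask coefficients. The sketch below is a plain Python illustration (the function name, zero padding, and odd-w assumption are ours) that also counts the multiply-accumulate slots.

```python
from typing import List, Tuple

Image = List[List[int]]


def convolve2d(image: Image, mask: Image) -> Tuple[Image, int]:
    """Direct 2D convolution with zero padding.

    Returns the M x N output and the number of multiply-accumulate (MAC)
    slots used, which is exactly w * w * M * N."""
    M, N = len(image), len(image[0])
    w = len(mask)
    h = w // 2  # assumes an odd w so the mask has a center coefficient
    out = [[0] * N for _ in range(M)]
    macs = 0
    for y in range(M):
        for x in range(N):
            acc = 0
            for i in range(w):
                for j in range(w):
                    # Reversed (flipped) mask indexing, as in the definition of convolution.
                    yy, xx = y + h - i, x + h - j
                    if 0 <= yy < M and 0 <= xx < N:  # zero padding outside the image
                        acc += image[yy][xx] * mask[i][j]
                    macs += 1  # one MAC slot per coefficient per output pixel
            out[y][x] = acc
    return out, macs


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
corner = [[1, 0, 0],
          [0, 0, 0],
          [0, 0, 0]]
out, macs = convolve2d(img, corner)
print(out)   # [[5, 6, 0], [8, 9, 0], [0, 0, 0]]
print(macs)  # 81 = 3*3 (mask) x 3*3 (image)
```

Because the mask is flipped, a coefficient in the top-left corner shifts the image up and to the left rather than down and to the right.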

11
Data Transfer Rate
  • Must transfer data between image acquisition
    module, memory, and processor
  • Input Data Transfer Rate
  • Output Data Transfer Rate
  • b: number of bits per pixel
  • f_F: processing rate in images per second
  • Requires efficient use of communication bandwidth
    and parallel processing
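The rate formulas themselves were images on the original slide. Assuming the usual form, where every pixel of an M x N frame (b bits each) is moved f_F times per second, the bandwidth can be estimated as below. The formula shape, the function name, and the 8-bit / 30 frames-per-second example values are our assumptions, not figures from the paper.

```python
def transfer_rate_bps(M: int, N: int, b: int, f_F: float) -> float:
    """Assumed form: M x N pixels per frame, b bits per pixel, f_F frames/s."""
    return M * N * b * f_F


# Hypothetical example: 512 x 512 frames, 8-bit pixels, 30 frames per second.
rate = transfer_rate_bps(512, 512, 8, 30)
print(rate)                 # 62914560 bits/s
print(rate / 8 / 2**20)     # 7.5 MiB/s
```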

12
Implementation Technology: FPGA
  • Provides massively parallel structures and high
    density for logic and arithmetic
  • Tasks are implemented spatially rather than
    temporally
  • Bit-level control makes it possible to build
    specialized data paths
  • Offers more raw computational power than
    conventional processors
  • Shorter design cycles than ASICs
  • Well suited for implementing parallel
    architectures

13
Memory Accesses
  • Gap between processor speed and memory access
    speed
  • Memory access overhead is a critical issue
  • Window-based operations are memory intensive:
    each step requires a new pixel
  • High potential for parallelism since independent
    operations are applied to large regions of image
    arrays

14
Memory Accesses
  • Pixels might not be stored as neighboring
    elements
  • Parallelism is hidden
  • Windows usually overlap with neighboring windows
  • Must create vectors of data elements and process
    them using parallel vectorization techniques.

15
Overlapping Windows
  • Three windows are shown; the shaded box indicates
    overlapping data.

16
Overlapping Windows
  • Some pixels can be used in computation of all
    three windows
  • Reduce memory accesses for those pixels by a
    factor of 3
  • Large number of windows means less overlap
  • Must compromise between data overlap and window
    count

17
Data Parallelism
  • Can be combined with loop unrolling to reduce the
    number of sequential memory accesses
  • Process one window, then slide to the right and
    process the next
  • Unroll this loop so more windows are computed in
    parallel
  • The authors use vertical unrolling
  • Horizontal unrolling applies equally

18
Data Parallelism
  • The number of pixels read per column depends
    directly on the number of rows processed in parallel
  • Number of pixels read per column: w + NR - 1
  • w: window mask length/width
  • NR: number of rows processed in parallel
  • Number of memory accesses (M x N image)
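The slide's memory-access formula did not survive in the transcript. The sketch below is our own approximation built from the w + NR - 1 pixels-per-column figure: for each group of NR output rows, every one of the N columns is read once. Borders are ignored and M is assumed to be a multiple of NR; the function name and example sizes are illustrative.

```python
def memory_accesses(M: int, N: int, w: int, NR: int) -> int:
    """Approximate pixel reads for an M x N image when NR rows of windows are
    processed in parallel: each column read supplies w + NR - 1 pixels.
    Border handling is ignored and M is assumed to be a multiple of NR."""
    if M % NR != 0:
        raise ValueError("this sketch assumes NR divides M")
    pixels_per_column = w + NR - 1
    row_groups = M // NR
    return row_groups * N * pixels_per_column


M = N = 512
w = 7
no_reuse = w * w * M * N  # every window fetches all w*w of its pixels
for NR in (1, 8, 16):
    print(NR, memory_accesses(M, N, w, NR))
# 1 1835008
# 8 458752
# 16 360448
print(no_reuse)  # 12845056
```

Going from NR = 1 to NR = 8 cuts reads by a factor of 4 here, while the gain from NR = 8 to NR = 16 is much smaller, which illustrates why the number of parallel rows is a trade-off against hardware cost.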

19
Data Parallelism
20
Systolic Architecture
  • Configurable Window Processor (CWP)
  • Processing element in the systolic architecture
  • The architecture reads data from input memory
  • P: image pixel
  • W: window mask coefficients
  • Both are transmitted to the array of processing
    elements for computation

21
Array of CWPs
  • LDC: Local Data Collector
  • Collects the results of the CWPs
  • CWP
  • Computes a window operator on the same column of
    the input image
  • D: delay line / shift register
  • Used for synchronization purposes

22
Architecture Flow
  • Each pixel is broadcast to all CWPs
  • At each clock cycle
  • Each CWP receives a different window coefficient
  • A new image pixel is sent to all processing
    elements
  • Each CWP multiplies and accumulates values until
    all pixels in a window are processed
  • After a short latency, the LDC collects the
    results and sends them to output memory
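The broadcast-and-accumulate flow can be modeled behaviorally in software. This is a simplified sketch under our own assumptions, not a cycle-accurate description of the authors' VHDL: one loop iteration stands in for one broadcast pixel, and the per-CWP coefficient lookup stands in for the delay lines. The function name `systolic_block` and its parameters are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]


def systolic_block(image: Image, mask: Image, r0: int, c0: int,
                   nr: int) -> Tuple[List[int], int]:
    """Behavioural (not cycle-accurate) model of nr CWPs computing the nr
    vertically stacked windows whose top-left corners are (r0 + k, c0).

    The w + nr - 1 pixels under each window column are read once and broadcast
    to every CWP; CWP k pairs the pixel with the coefficient for its own row
    offset, which is the role the delay lines play in the hardware."""
    w = len(mask)
    acc = [0] * nr  # one accumulator (LRM) per CWP
    reads = 0
    for j in range(w):                          # window columns
        for r in range(r0, r0 + w + nr - 1):    # w + nr - 1 pixels per column
            p = image[r][c0 + j]                # single memory read ...
            reads += 1
            for k in range(nr):                 # ... broadcast to all CWPs
                i = r - (r0 + k)                # mask row this CWP needs now
                if 0 <= i < w:                  # otherwise the CWP is idle
                    acc[k] += p * mask[i][j]    # AP multiplies, LRM accumulates
    return acc, reads


img = [[r * 10 + c for c in range(5)] for r in range(6)]
mask = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
results, reads = systolic_block(img, mask, r0=0, c0=1, nr=4)
# Reference: each window computed independently (correlation form).
direct = [sum(img[k + i][1 + j] * mask[i][j] for i in range(3) for j in range(3))
          for k in range(4)]
print(results == direct, reads)  # True 18
```

Four windows are produced from 18 pixel reads instead of the 36 needed if each 3 x 3 window fetched its own pixels. The idle branch also hints at why processing-element utilization drops as more CWPs are added.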

23
CWP
  • AP: Arithmetic Processor (ALU)
  • Multiplies
  • LRM: Local Reduction Module
  • Accumulator
  • Pc: result of the window operation
  • Wd: delayed window coefficient

24
Systolic Architecture
25
Processing Time
  • Latency
  • Time required to start pipeline operation
  • Measured from activation of the first CWP to
    activation of the last CWP
  • Parallel processing time
  • Time during which all CWPs work in parallel
  • Sum of the times needed to process each set of
    rows
  • Performance trades off against the number of rows
    processed in parallel
  • The number of rows directly reflects the silicon
    resources allocated to the architecture

26
Throughput
  • Number of elemental operations the system can
    perform per second
  • Only the scalar function and the local reduction
    function are counted

27
Implementation
  • Fully parameterizable VHDL description
  • Uses generics to make the design flexible
  • Structural description using only elementary logic
    operations
  • Design is platform, version, technology, and
    tool independent
  • Targets a Xilinx XCV2000E-6 Virtex-E FPGA with
    2 million gates

28
FPGA Technical Data
29
Performance Results
  • I/O time is not included in the results
  • 512 x 512 image with a 7 x 7 window mask

30
Performance Results
  • Processing time with a 7 x 7 window mask is
    8.35 ms
  • Leaves enough time for image acquisition
  • Real-time constraints require processing within
    about 30 ms per frame
  • Post-processing is also possible in the remaining
    time
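The reported figures allow a quick sanity check of the real-time margin and of the average operation rate they imply. This is our own back-of-the-envelope arithmetic, not the paper's throughput metric: it assumes one multiply (scalar function) and one accumulate (reduction function) per mask coefficient per output pixel and ignores image borders.

```python
M = N = 512          # image size from the slides
w = 7                # window mask size from the slides
t_frame = 8.35e-3    # reported processing time in seconds (I/O excluded)
budget = 30e-3       # real-time budget per frame quoted on the slide

# Assumption: one multiply (scalar function) and one accumulate (reduction
# function) per mask coefficient per output pixel; borders are ignored.
ops = 2 * w * w * M * N
gops = ops / t_frame / 1e9
slack = budget - t_frame

print(ops)                    # 25690112
print(round(gops, 2))         # 3.08 (implied average GOPS)
print(round(slack * 1e3, 2))  # 21.65 ms left for acquisition and post-processing
```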

31
Performance Results
  • Throughput increases with the number of processing
    elements
  • However, the utilization and activity efficiency of
    the processing elements decrease

32
Improving Performance
  • Optimize the design mapped onto the FPGA
  • Apply timing constraints to increase clock speed
  • Use a faster FPGA
  • Note that the real-time performance requirement is
    already met with the lower-end FPGA

33
Comparisons to Other Architectures
34
Area/Performance Tradeoffs
  • Low resource utilization allows implementation in
    compact mobile applications
  • High computational density due to small area
    usage
  • Hardware or clock frequency can be reduced
  • Reduces power
  • Still meets timing requirements

35
Reconfigurability
  • Flexible enough to support different window-based
    image operators
  • Allows different image processing applications on
    a single SoC

36
Conclusion
  • SIMD parallelism is easy to exploit in image
    processing
  • FPGAs allow reconfigurability and flexibility
  • Real-time constraints can be met with high
    performance and low area usage
  • All images and graphs from:
  • Torres-Huitzil, César, and Miguel Arias-Estrada.
    "FPGA-Based Configurable Systolic Architecture
    for Window-Based Image Processing." EURASIP
    Journal on Applied Signal Processing, vol. 2005,
    no. 7, pp. 1024-1034.

37
Hardware, Design and Implementation Issues on a
FPGA-Based Smart Camera
  • Fabio Dias, Francois Berry, Jocelyn Serot,
    Francois Marmoiton

38
Summary of Paper
  • Describe the hardware architecture of an
    FPGA-based smart camera research platform and
    some of the hardware design issues.
  • Propose an architectural design methodology based
    on pre-programmed processing elements.
  • Provide a low-level image processing example.
  • Present an embedded tracking application to
    demonstrate the camera's use.

39
What is a Smart Camera?
  • Smart cameras use embedded processing to relieve
    the interfacing system of some of the low-level
    computational burden.
  • Reduce communication flow and overhead.
  • Processing resources include FPGA devices,
    media/streaming processors, DSPs, etc.

40
Why FPGA devices?
  • Reconfigurability
  • Allows the camera to adapt to a wide range of
    applications.
  • Parallelism
  • Exploit the independence of many computational
    tasks to meet timing constraints.
  • Hardware Flexibility
  • Capable of interfacing with a wide range of
    external devices such as memory or ASICs.

41
Smart Camera Hardware Architecture
  • Altera Stratix EP1S60F1020C7 FPGA
  • 4Mpixels LUPA-400 image sensor
  • (2) 2D accelerometers
  • (3) gyroscopes
  • 10Mb SRAM
  • 64Mb SDRAM

42
Smart Camera Hardware Architecture
43
Design Methodology
  • Centered on reconfiguration of the FPGA.
  • A set of pre-designed, configurable data
    processing elements (PEs).
  • Programmable Control Module
  • System supervisor, communicating with the PEs
    through registers and handshake signals
  • Configures and synchronizes the different PEs

44
Design Methodology
Schematic of a SoPC architecture illustrating the
proposed methodological approach.
45
Generic Window-Based Processing Element
  • Applied over a small, defined portion of the
    input image.
  • Must handle large amounts of data because they are
    often applied over the entire image.
  • Examples
  • Convolution
  • Correlation estimation
  • Morphological transformations

46
Generic Window-Based Processing Element
47
Smart Camera Application
  • Template Tracking System
  • VGA images are sent to the host computer for
    display.
  • The user selects the region of interest to track.
  • A search window is acquired and stored in
    memory.
  • A sliding-window SAD (sum of absolute differences)
    algorithm is applied.
  • The position with the best correlation score is
    taken as the new template location.
  • A null-acceleration (constant-velocity) model is
    used to predict the displacement in the next frame.
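The tracking loop can be sketched in a few lines: slide the template over the search window, keep the position with the lowest SAD (the best match), then extrapolate the next position assuming zero acceleration. This is a minimal Python illustration; the function names, the coordinate convention (positions relative to the search window), and the toy data are our own assumptions, not the authors' hardware implementation.

```python
from typing import List, Tuple

Image = List[List[int]]


def sad(window: Image, template: Image, top: int, left: int) -> int:
    """Sum of absolute differences between the template and one candidate position."""
    return sum(abs(window[top + i][left + j] - template[i][j])
               for i in range(len(template)) for j in range(len(template[0])))


def best_match(window: Image, template: Image) -> Tuple[int, int]:
    """Slide the template over the search window; the lowest SAD is the best match."""
    th, tw = len(template), len(template[0])
    best = None
    for top in range(len(window) - th + 1):
        for left in range(len(window[0]) - tw + 1):
            score = sad(window, template, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best[1], best[2]


def predict_next(prev: Tuple[int, int], curr: Tuple[int, int]) -> Tuple[int, int]:
    """Null-acceleration model: assume the last displacement repeats."""
    return (2 * curr[0] - prev[0], 2 * curr[1] - prev[1])


search = [[0] * 5 for _ in range(5)]
for i in range(2):
    for j in range(2):
        search[2 + i][3 + j] = 9       # target placed at (row 2, column 3)
template = [[9, 9], [9, 9]]
pos = best_match(search, template)
print(pos)                          # (2, 3)
print(predict_next((1, 1), pos))    # (3, 5)
```

Each SAD evaluation is itself a window operation (absolute difference as the scalar function, accumulation as the reduction), which is why the generic window-based processing element can be reused for this application.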

48
Smart Camera Application
Implemented embedded tracking architecture
49
Experimental Results
50
Conclusion
  • A generic window-based processing element was
    successfully implemented in an FPGA.
  • An image tracking algorithm using the described
    design methodology was successfully implemented
    with adequate performance.
  • A flexible FPGA-based smart camera research
    platform was created for future research.
  • All images and graphs from:
  • Dias, Fabio, Francois Berry, Jocelyn Serot, and
    Francois Marmoiton. "Hardware, Design and
    Implementation Issues on a FPGA-Based Smart
    Camera." IEEE 1-4244-1354-0/07, 2007, pp. 20-26.