Re-configurable Parallel Stream Processor with self-assembling and self-restorable micro-architecture - PowerPoint PPT Presentation

Slides: 24
Provided by: valeriki
Transcript and Presenter's Notes

1
Re-configurable Parallel Stream Processor with self-assembling and self-restorable micro-architecture
Lev Kirischian, Irina Terterian, Pil Woo Chun and Vadim Geurkov
Embedded and Re-configurable Systems Lab, RYERSON University, CANADA
2
Example of Multi-task Data-Flow workload where each task can run in different modes
[Figure: chart of Tasks 1 to 4 (vertical axis: Tasks) against Time (horizontal axis). Labels shown: Task 1 with Modes 1, 2 and 3; Task 2 with Modes 1 and 2; Task 3; Task 4 with Modes 1, 3, 4 and 7.]
3
Usual Approach: Conventional Processors with Software-to-Task Optimization (Compilers, OS)
  • Software-to-task optimization allows the use of
    conventional computing platforms with a fixed
    architecture (superscalar, VLIW, etc.) coupled
    with software compilers and an OS.
  • Limitations of conventional processors:
  • If tasks are executed on a sequential computing
    system, the processing time often cannot meet
    the specification requirements.
  • If tasks are executed on a parallel computing
    system with a fixed architecture, the
    cost-effectiveness of that system depends
    strongly on the task algorithm or data
    structure.

4
Alternative Approach: Application Specific Processors (ASP) with Static Hardware-to-Task Optimization
An ASP can reach the required cost-performance
parameters because its architecture is optimized
for the data-flow graph and the data structure of
the task.
Limitations of Application Specific Processors:
  1. Performance decreases if the task algorithm or
    data structure changes
  2. Limited possibility for further modernization
  3. High cost for multi-task or multi-mode custom
    computing systems

5
Proposed Approach: Reconfigurable Processor with Dynamic Architecture-to-Task Optimization
A high-performance computing system for multi-task
data-flow applications should contain two major
components:
  1. A Dynamically Re-configurable Computing Platform
    based on partially reconfigurable FPGA devices,
    to provide the maximum possible hardware
    flexibility.
  2. A library of Application Specific Virtual
    Processor (ASVP) configuration bit-streams, used
    to program on-chip Application Specific Processor
    circuitry for the period of time during which
    the application (task) is active.
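
The second component can be pictured as a lookup table from a task and its mode to a bit-stream. This is only a minimal sketch: the class `AsvpLibrary`, its `register` and `lookup` methods, the task names, and the placeholder byte strings are all hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class AsvpLibrary:
    """Hypothetical library mapping (task, mode) to a configuration bit-stream."""
    _store: dict = field(default_factory=dict)

    def register(self, task: str, mode: int, bitstream: bytes) -> None:
        # One pre-built ASVP bit-stream per task and mode.
        self._store[(task, mode)] = bitstream

    def lookup(self, task: str, mode: int) -> bytes:
        # A KeyError signals that no ASVP was prepared for this task/mode.
        return self._store[(task, mode)]


library = AsvpLibrary()
library.register("task1", 1, b"\x00\x01")  # placeholder bytes, not a real bit-stream
library.register("task1", 2, b"\x00\x02")
```

When a task switches mode, only the bit-stream for the new (task, mode) pair would need to be fetched and downloaded.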
6
Architecture of Partially Reconfigurable FPGA devices (Xilinx Virtex Family)
[Figure: block diagram. Blocks shown: Configuration Data Files; Internal Configuration SRAM; In; Out; two I/O Frames; CLBs Frame 1, CLBs Frame i, CLBs Frame N; two Block RAMs; Internal (Virtual) BUS.]
CLB - Configurable Logic Block - Uniform Logic Element of a Frame, smallest individually configurable component in the FPGA
7
Concept of Application Specific Virtual Processor (ASVP)
Application Specific Virtual Processor (ASVP):
a group of logic resources dedicated and
optimally configured to reflect the algorithm and
data structure of the task. An ASVP is represented
as a configuration data file (configuration
bit-stream) that is downloaded into the FPGA when
the task is to be activated.
8
Life-cycle of Application Specific Virtual Processor
  1. The ASVP core is downloaded to the reconfigurable
    platform before task activation.
  2. The ASVP performs the task data processing for as
    long as necessary, without interruption and
    without time-sharing its dedicated logic
    resources with any other task.
  3. After task completion, all resources included in
    the ASVP can be re-configured for any other task.
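
The three life-cycle steps can be read as a small state machine. The sketch below is an illustration only: the state names and the `advance` helper are assumptions, not part of the presentation. Because the running state can only move to the released state, the sketch reflects the rule that an active ASVP is never interrupted or time-shared.

```python
from enum import Enum


class AsvpState(Enum):
    IN_LIBRARY = "in library"
    CONFIGURED = "configured"
    RUNNING = "running"
    RELEASED = "released"


# Allowed transitions, following the three life-cycle steps on the slide.
TRANSITIONS = {
    AsvpState.IN_LIBRARY: {AsvpState.CONFIGURED},  # step 1: download before activation
    AsvpState.CONFIGURED: {AsvpState.RUNNING},     # step 2: task starts processing
    AsvpState.RUNNING: {AsvpState.RELEASED},       # step 3: task completes
    AsvpState.RELEASED: set(),                     # resources are free for another task
}


def advance(current: AsvpState, target: AsvpState) -> AsvpState:
    """Return the target state, or raise if the life-cycle forbids the move."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```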
9
ASVP Architecture-to-Task Optimization in Partially Reconfigurable FPGA
[Figure: a Data-Flow Graph of XOR nodes mapped onto FPGA Slots 1, 2, 3, ... as Virtual Hardware Components (XOR). Other labels: Data In, Input, Output, Data Out, Internal (Virtual) BUS.]
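
The mapping of a data-flow graph onto FPGA slots can be mimicked in software. In this sketch, which is an assumption-laden illustration and not the authors' design, each Virtual Hardware Component is modelled as a function that XORs a word with a fixed key, the components are placed in consecutive slots, and data passes through the slots in order as it would over the virtual bus. The function names and key values are hypothetical.

```python
from functools import reduce


def make_xor_component(key: int):
    """Return a stage that XORs each input word with a fixed key (hypothetical)."""
    return lambda word: word ^ key


def assemble_asvp(components):
    """Place components in consecutive slots, starting at slot 1."""
    return {slot: comp for slot, comp in enumerate(components, start=1)}


def run_stream(asvp, words):
    """Pass each word through the slots in order, as if over the virtual bus."""
    stages = [asvp[slot] for slot in sorted(asvp)]
    return [reduce(lambda w, stage: stage(w), stages, word) for word in words]


asvp = assemble_asvp([make_xor_component(0x0F), make_xor_component(0xF0)])
result = run_stream(asvp, [0x00, 0xAA])  # [0xFF, 0x55]
```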
10
Micro-architecture of a Virtual Hardware Component
11
Virtual Hardware Component Virtual Bus Interconnection
[Figure labels: Virtual Bus; Virtual Hardware Component Boundary]
12
Micro-architecture of Application Specific Virtual Processor (ASVP)
The micro-architecture of an ASVP is based on Virtual
Hardware Components interconnected via Virtual
Bus lines.
13
Parallel Task Processing on the Dynamically Re-configurable Stream Processor (DRSP)
[Figure: ASVP1 for Task 1, ASVP 2 and ASVP 3. Other labels: Data in 1, Data in 2; Data out 1, Data out 2, Data out 3; I/O 1 to I/O 4; FU 1 to FU 4; RIM 1 to RIM 4; Virtual Bus.]
14
DRSP System Level Architecture
[Figure: block diagram. Blocks shown: Host PC; Data Stream Source; Task Memory (Task 1 ... Task h, each with Afix and Amodes); PRCP-base; Reconfigurable Functional Unit (Afix i); Cache Memory (Amodes i); PCI-Bus; PCI-Interface Module; Configuration Data Bus; RT-HOS; Data Out.]
15
Architecture of Reconfigurable Computing Module
[Figure: block diagram. Blocks shown:]
  • SPI; 2 x 3.43 Gbit/s (12 bit x 300 MHz) Input LVDS ports
  • 8.12 Gbit/s LVTTL BUS (64 bit x 133 MHz)
  • Real-Time Hardware Operating System based on XCV50E Virtex FPGA
  • Reconfig. Functional Unit RFM 0111-002
  • PCI Interface, 800 Mbit/s
  • Config. Files / Data Cache (4 x 512 KB)
  • SPI; 2 x 3.43 Gbit/s (12 bit x 300 MHz) Output LVDS ports
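
The port and bus rates on this slide follow from bus width multiplied by clock frequency. The sketch below reads the LVDS figure as 12 bit x 300 MHz and assumes a unit convention that the slide does not state: the quoted values are reproduced if one "Mbit" is taken as 2**20 bits and one "Gbit" as 1000 such Mbit. The function names are hypothetical.

```python
def raw_rate_bits_per_s(width_bits: int, clock_hz: float) -> float:
    """Raw transfer rate: one word of width_bits per clock cycle."""
    return width_bits * clock_hz


def slide_gbit(rate_bits_per_s: float) -> float:
    # Assumed convention (not stated on the slide):
    # 1 "Mbit" = 2**20 bits, 1 "Gbit" = 1000 such Mbit.
    return rate_bits_per_s / (2**20 * 1000)


lvds_port = slide_gbit(raw_rate_bits_per_s(12, 300e6))  # about 3.43
lvttl_bus = slide_gbit(raw_rate_bits_per_s(64, 133e6))  # about 8.12
```

In plain decimal units the same products are 3.6 Gbit/s per LVDS port and 8.512 Gbit/s for the LVTTL bus.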
16
Reconfigurable Computing Module based on Xilinx
Virtex-E family of FPGA Devices
17
Restoration of ASVP using spare CLB-column
If a hardware fault occurs, the damaged Virtual
Hardware Component can be relocated to the
reserved CLB-column.
[Figure: Columns 1, 2, 3, ... holding XOR components. Other labels: AP i, Input, Output, Communication Field.]
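
The relocation idea can be sketched as a change to a placement table. This is an illustrative model only: the function `relocate_on_fault`, the dictionary layout, and the component names are assumptions, and the actual mechanism (partial reconfiguration of the spare column) is not modelled.

```python
def relocate_on_fault(placement, faulty_column, spare_columns):
    """Move the component in the faulty column to the first spare column.

    placement maps a CLB-column number to a component name.
    Returns a new placement; the input dictionary is left unchanged.
    """
    if faulty_column not in placement:
        return dict(placement)  # nothing was placed in that column
    if not spare_columns:
        raise RuntimeError("no spare CLB-column left")
    new_placement = dict(placement)
    component = new_placement.pop(faulty_column)
    new_placement[spare_columns[0]] = component
    return new_placement


placement = {1: "XOR_a", 2: "XOR_b"}  # hypothetical component names
restored = relocate_on_fault(placement, faulty_column=2, spare_columns=[3])
```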
18
When is the proposed technology most beneficial?
  • The workload consists of many tasks, where each
    task can run in different modes.
  • Each task requires high-speed data-stream
    processing.
  • Task algorithms may be modified within the life
    cycle of the system.
  • Active tasks must run in parallel and must not
    be interrupted when one of the tasks switches
    its mode or terminates.
  • The system can be restored remotely or can
    restore itself even if a hardware fault occurs.

19
DRSP Application for Networked Intelligent Manufacturing Systems
High-performance parallel data-stream
processing (up to thousands of billions of
operations per second) of large volumes of data
(up to hundreds of gigabits) for:
  a) Complex image processing and image recognition
  b) Spectrum analysis and digital signal processing
  c) Data transmission via LAN with data
    compression / decompression and encryption /
    decryption
  d) Control of high-performance manufacturing
    equipment and robotic systems
20
Acceleration of Task / Mode Switching
Compared with a system that reconfigures the
entire FPGA, the acceleration of task or mode
switching is greatest when the number of
CLB-columns in the ASVP is minimal; switching can
then be more than 20 times faster.
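
One way to see why fewer CLB-columns mean faster switching is a simple proportional model. The sketch assumes that reconfiguration time scales linearly with the number of CLB-columns rewritten, which the slide does not state; the function name and the column counts used are hypothetical.

```python
def switching_speedup(total_columns: int, asvp_columns: int) -> float:
    """Speedup of rewriting only the ASVP columns versus the whole device.

    Assumes reconfiguration time is proportional to the number of
    CLB-columns rewritten (an illustrative model, not a measured result).
    """
    if not 0 < asvp_columns <= total_columns:
        raise ValueError("asvp_columns must be between 1 and total_columns")
    return total_columns / asvp_columns


# Hypothetical device with 24 CLB-columns.
one_column = switching_speedup(24, 1)    # 24.0
half_device = switching_speedup(24, 12)  # 2.0
```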
21
Minimization of Hardware Resources
Minimization of logic resources in the DRSP approach
compared with entire-FPGA-based systems

Modes Tasks    2       4       8       16
4              2.8     4.4     7.6     14
8              5.6     8.8     15.2    28
16             11.2    17.6    30.4    56

As the number of tasks and task modes in a workload
increases, the cost-effectiveness of DRSP increases
accordingly.
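
The closing claim can be checked directly against the table values. The sketch below only transcribes the numbers from the slide and tests that they grow along both axes; the variable names are hypothetical, and no assumption is made about which axis is tasks and which is modes.

```python
COLUMN_LABELS = [2, 4, 8, 16]
ROW_LABELS = [4, 8, 16]
FACTORS = [
    [2.8, 4.4, 7.6, 14],
    [5.6, 8.8, 15.2, 28],
    [11.2, 17.6, 30.4, 56],
]


def strictly_increasing(values) -> bool:
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Growth along each row (increasing column label).
rows_increase = all(strictly_increasing(row) for row in FACTORS)
# Growth down each column (increasing row label).
cols_increase = all(strictly_increasing(col) for col in zip(*FACTORS))
```

Both checks hold for the values shown, so the factor rises whenever either label grows.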
22
SUMMARY: DRSP Compared with Conventional CPU, DSP or ASP Platforms

DRSP Conv. CPU DSP
ASP
Performance Flexibility Reliability
Lower than DRSP Much lower than DRSP
Much lower than DRSP
Somewhat higher
None, or very little
Lower than DRSP
Much lower than DRSP
Much lower than DRSP
Lower than DRSP
23
Thank you