Title: FPGA Partial Reconfiguration
1. FPGA Partial Reconfiguration
- Presented by Abelardo Jara-Berrocal
- HCS Research Laboratory
- College of Engineering
- University of Florida
April 10th, 2009
2. Outline
- Introduction
- Partial Reconfiguration (PR) Overview
- Proposed Design Methodologies
- Framework analysis
- F4 Virtual Architecture for Partial Reconfiguration and Design Automation for PR Design
3. Introduction: Fully reconfigurable systems
[Figure: fully reconfigurable system. Labels: battery, FPGA, system controller, configuration lines, general-purpose I/O, shared memory, external I/O, bitstream storage holding Config 1 to Config 3 (one enabled, the others disabled), Config 1 and Config 2 requests, required design, design station.]
1. Device too small for complex designs
2. Large full bitstreams (long reconfiguration time)
3. Complete system operation is halted prior to reconfiguration
4. Introduction: Modular Reconfiguration
- Types of Modular Dynamic Reconfiguration
- Static Partial Reconfiguration: reconfiguring a portion of the device (changing its functionality) while the device is inactive, without affecting other areas of the device
- Dynamic Partial Reconfiguration (DPR, also written PDR): reconfiguring a portion of the device while the remaining design stays active and operating, without affecting the rest of the device
- Virtex-4 and Virtex-5 devices support DPR
[Figure: Reconfigurable region 1 and Reconfigurable region 2.]
5. Partial Reconfiguration
- Partial Reconfiguration is useful for systems with multiple functions that can time-share the same FPGA resources.
- Terminology
- Partially Reconfigurable Region (PRR)
- Partially Reconfigurable Module (PRM)
- Static Logic
- Bus Macro
- Partial Bitstream
- Merged Bitstream
6. Introduction: A sample PR architecture
[Figure: sample PR architecture. Labels: battery, FPGA with a static area and a reconfigurable area, base system configuration, bitstream storage, JTAG, external I/O, Module A request.]
1. System controller does not need to be placed in an external device
2. Access to the fast Internal Configuration Access Port (ICAP: 32 bits, 100 MHz)
3. Smaller partial bitstreams
4. No need to halt the complete system when reconfiguring a module
5. Time multiplexing of FPGA resources: load and unload HW modules on demand
7. Medium for Partial Reconfiguration
- External: JTAG, UART (RS-232)
- Internal: ICAP
- ICAP (Internal Configuration Access Port)
- Self-reconfiguration controlled by a soft processor
- Internal read and write access to configuration logic
- Faster than the external interfaces
- HWICAP (provided by Xilinx)
- Wraps the ICAP with additional logic to read and write frames to BRAM
- Slave on the PLB (Processor Local Bus)
- 100 MHz, 32 bits
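The ICAP figures above (32 bits, 100 MHz) bound how quickly a partial bitstream can be loaded. The sketch below works out that ideal bound, assuming one full word is written every clock cycle and ignoring software, bus, and storage overheads, which the slides do not quantify. The function names and the example bitstream sizes are hypothetical.

```python
def icap_peak_throughput_mb_s(width_bits=32, clock_hz=100e6):
    """Peak ICAP bandwidth in MB/s (1 MB = 10**6 bytes), one word per clock."""
    return (width_bits / 8) * clock_hz / 1e6


def icap_transfer_time_ms(bitstream_bytes, width_bits=32, clock_hz=100e6):
    """Ideal time in milliseconds to write a bitstream through the ICAP.

    Assumes one word per clock cycle and no controller or memory stalls,
    so real reconfiguration times will be longer.
    """
    bytes_per_word = width_bits // 8
    words = -(-bitstream_bytes // bytes_per_word)  # ceiling division
    return words / clock_hz * 1e3


if __name__ == "__main__":
    print(f"peak: {icap_peak_throughput_mb_s():.0f} MB/s")
    # Hypothetical partial bitstream sizes in bytes, for illustration only.
    for size in (50_000, 200_000, 800_000):
        print(f"{size} bytes -> {icap_transfer_time_ms(size):.2f} ms")
```

Because the bound scales linearly with bitstream size, it gives a quick way to compare a small partial bitstream against a full one before any hardware measurement.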
8. Additional considerations
- General benefits of partial dynamic reconfiguration (PDR)
- Saves space on the FPGA
- Takes less time, since only part of the design changes
- Reduces power dissipation by storing functionality in external memory
- Smaller FPGAs can be used to run an application
- Architecture adaptation
- Architecture adaptability
- Main advantage: the system can modify its internal modules based on two schemes
- Data-driven: characteristics of the input data change at runtime
- Examples: artificial intelligence, evolutionary architectures, adaptive signal processing
- Situation-driven: the system loads/unloads modules to adapt to environmental conditions
- Examples: adaptive fault tolerance, intelligent management of system resources
9. Bus Macros
- Bus macros: the means of communication between PRMs and the static design
- All connections between PRMs and the static design must pass through a bus macro, with the exception of clock signals
- Types of bus macros
- Tri-state buffer (TBUF) based bus macros
- Slice-based (or LUT-based) bus macros
- Advantage of slice-based bus macros
- No signal lines should cross the region border in partial reconfiguration
- TBUF-based macros ignore the boundaries
- Slice-based macros keep signals from crossing the boundaries
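Because every non-clock signal between a PRM and the static design passes through a bus macro, the interface width determines how many macros must be instantiated. The following sketch counts them, assuming each macro carries a fixed number of signals in one direction. The default of 8 signals per macro, the signal list, and all names are assumptions for illustration, not values from the slides.

```python
import math
from collections import defaultdict

SIGNALS_PER_MACRO = 8  # assumption: number of signals one bus macro carries


def bus_macros_needed(signals, per_macro=SIGNALS_PER_MACRO):
    """Count bus macros per direction for a PRR interface.

    signals: iterable of (name, direction, width) tuples, where direction
    is "in", "out", or "clock". Clock signals are skipped because they do
    not pass through bus macros.
    """
    bits = defaultdict(int)
    for _name, direction, width in signals:
        if direction == "clock":
            continue
        bits[direction] += width
    return {d: math.ceil(n / per_macro) for d, n in bits.items()}


if __name__ == "__main__":
    # Hypothetical PRM interface.
    interface = [
        ("clk", "clock", 1),
        ("data_in", "in", 32),
        ("ctrl", "in", 3),
        ("data_out", "out", 32),
    ]
    print(bus_macros_needed(interface))
```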
10. LUT-based Slice Macros
11. Introduction: Current PR Design Flow
- Steps
- Partition the system into modules
- Define static modules and reconfigurable modules
- Decide the number of PR regions (PRRs)
- Decide PRR sizes, shapes and locations
- Map modules to PRRs
- Define PRR interfaces, instantiate slice macros for PRR interfaces
- Many manual steps
- Design partitioning
- Number of PRRs
- PRR sizes, shapes and locations
- Mapping PRMs to PRRs
- Type and placement of PRR interfaces
[Figure: design partitioning separates static modules from reconfigurable modules (PRMs); design floorplanning and budgeting places them on the FPGA as a static region plus PRRs 1, 2, ... (number of PRRs?).]
12. Introduction: Early Access PR Design Flow
- Introduced by Xilinx at FPL'06
- Major improvements
- Automatic implementation scripts
- Rectangular regions (not full column reconfiguration)
- Static nets can cross reconfigurable regions
- Slice-based macros replace TBUF-based bus macros
- Partitioning and floorplanning steps are manually executed
- Design guidelines for these steps are not provided
[Figure: Early Access PR flow. Design partitioning and design floorplanning and budgeting (manual) produce the reconfigurable design specifications and the placement and PRR constraints; the Xilinx PR implementation flow (automatic) produces the full initial bitstream and the PRM bitstreams.]
Potential for development of automatic CAD tools
13. Introduction: Current PR design tools limitations
- PR design is a very specialized task
- Only a physical level of support is provided
- Architectural knowledge of the target device is a must
- Not very flexible, many design constraints
- Partitioning and floorplanning steps are manually executed
- No performance-sensitive design guidelines are provided
- No automatic, heuristics-based design flow is available either
- Lack of abstraction from low-level details
14. PR Overview: Taxonomy of PR systems design flows
PR designs: special purpose and multipurpose
Special purpose
- Highly specialized systems design
- All PRMs that will exist on the system are known at design time
- Each PRR is independently optimized (size, shape, location, interface) based on the PRMs that will be mapped to it
- Output is
- Floorplan defining a static region and a set of optimized PRRs
- The set of PRMs that can be placed in each PRR (PRMs to PRRs mapping)
Multipurpose
- Not optimized for a specific application
- PRMs required by the application are not known when designing the base system
- Goal is to design a flexible and reusable base design that can be used for several different PR systems
- Base system designer defines a set of PRRs with fixed shapes, sizes, locations and interfaces
- Generated floorplan is used as input template for the PRMs implementation
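In the special-purpose flow, each PRR is sized for the PRMs mapped to it. One simple way to sketch that, assuming the PRR must offer at least the largest demand for every resource type among its PRMs, is an element-wise maximum. The slice and RAMB16 counts below are the ones reported for the three test cores later in this deck; the function name and dictionary layout are hypothetical, and real floorplanning must also respect the device's physical resource layout.

```python
def prr_requirements(prms):
    """Return the per-resource maximum over all PRMs mapped to one PRR."""
    required = {}
    for resources in prms.values():
        for kind, count in resources.items():
            required[kind] = max(required.get(kind, 0), count)
    return required


# Slice and RAMB16 counts as listed for the test cores in this deck.
PRMS = {
    "beamforming": {"slices": 5022, "ramb16": 17},
    "cfar": {"slices": 2610, "ramb16": 34},
    "aes": {"slices": 3634, "ramb16": 4},
}

if __name__ == "__main__":
    print(prr_requirements(PRMS))
```

Note that the sizing is driven by different PRMs for different resources, which is why a shared PRR can end up larger than any single module needs.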
15. PRR Geometries
- PR system design flows require
- Proper metrics for PRR performance analysis
- Design guidelines for efficient PRR floorplanning
- Study of the effects of varying PRR shape on
- Maximum Clock Frequency
- Partial Bitstream Size
- Five separate test cores
- Beamforming (DSP/slice)
- CFAR (slice/memory)
- AES (register)
- Performed on the Virtex-4 SX55 (V4SX55) thus far
Aspect ratio = PRR height / PRR width
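To study shape effects, candidate PRRs with the same capacity but different aspect ratios have to be enumerated. The sketch below does this on an abstract grid of equal cells, using the aspect ratio definition above (height divided by width). The grid model, function name, and example numbers are assumptions; a real device would add resource-column and clock-region constraints.

```python
import math


def candidate_prr_shapes(min_cells, max_height, max_width):
    """List (height, width, aspect_ratio) rectangles holding min_cells cells.

    For each height, the narrowest width that still provides min_cells
    is chosen; shapes wider than max_width are dropped.
    """
    shapes = []
    for height in range(1, max_height + 1):
        width = math.ceil(min_cells / height)
        if width <= max_width:
            shapes.append((height, width, height / width))
    return shapes


if __name__ == "__main__":
    # Hypothetical requirement: 12 cells on a grid at most 4 tall and 6 wide.
    for height, width, ratio in candidate_prr_shapes(12, 4, 6):
        print(f"{height} x {width}: aspect ratio {ratio:.2f}")
```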
16. Framework analysis: Beamforming (125 MHz, 40)
- 5022 slices
- 16 DSP48s
- 17 RAMB16s
- Baseline, non-PR performance: 1614 kB, 127.845 MHz
[Figure: two plots, clock frequency (MHz) vs. aspect ratio and bitstream size (kB) vs. aspect ratio.]
17. Framework analysis: CFAR (100 MHz, 16)
- 2610 slices
- 2 DSP48s
- 34 RAMB16s
- Baseline, non-PR performance: 1001 kB, 103.616 MHz
[Figure: two plots, clock frequency (MHz) vs. aspect ratio and bitstream size (kB) vs. aspect ratio.]
18. Framework analysis: AES (80 MHz, 13.75)
- 3634 slices
- 3943 registers
- 4 RAMB16s
- Baseline, non-PR performance: 1393 kB, 80.483 MHz
[Figure: two plots, bitstream size (kB) vs. aspect ratio and clock frequency (MHz) vs. aspect ratio.]
19. F4 Virtual Architecture and Design Automation for Partial Reconfiguration
- Dr. Ann Gordon-Ross
- Dr. Alan D. George
- UF ECE Faculty
- Abelardo Jara, Shaon Yousuf, Rohit Kumar, Terence Frederick
- CHREC Students
20. Approach
- Task 3: Bitstream Relocation
- Port Bit Reloc to MicroBlaze
- Context save and restore for PRMs
PR for Application Designers
- Task 2: PR Design Flow Automation
- Framework to model and design PR systems
- Identification of points in the Xilinx PR design flow amenable to automation
- Software tools (C/C++ programs/scripts) for automatable steps
- Task 1: VA for PR Adaptive Embedded Systems
- SCORES: inter-module communication architecture
- VAPRES: multipurpose base embedded platform
- Initial research on fast algorithms for online PRM placement and scheduling
21. Background: VA for Adaptive PR Embedded Systems
- Multi-purpose base system platform to build runtime-adaptive HW processing embedded systems
- Architectural support for on-demand HW module loading/unloading
- HW modules can offer better performance than SW modules
- Exploit increased parallelism
- Main bottleneck
- Inter-module communication flows through a centralized controller
- Can be alleviated by adding a custom inter-module communication architecture
- VA benefits
- Adaptive base system platform
- Response to environmental changes
- HW/SW partitioned applications
- Time-shared virtual resources enable a larger available area for system operations
- Improved system resource utilization
- Case study application: PR for Mobile Agents
[Figure: mobile-agents case study. Labels: Target A, Target B, Type A target, Type B target, adaptive embedded system at each processing node (VAPRES), external memory, controller and peripherals, SCORES, two Type A modules, one Type B module, free slot.]
22. VAPRES
- (Virtual Architecture for Partially Reconfigurable Adaptive Embedded Systems)
[Figure: VAPRES block diagram. Labels: MicroBlaze, PLB bus, UART, USB, flash controller, shared memory, ICAP, Fast Simplex Link (FSL), PRR1 to PRR4, PRM A, BUFR, four interfaces, switch, network-on-chip (SCORES), network (other VAPRES nodes).]
- VAPRES Motivations/Benefits
- Embedded base architecture for multi-purpose PR systems
- Facilitates dynamic HW module placement and scheduling
- Provides dynamic module frequency scaling
- Computing power can be distributed among VAPRES-based nodes
- VAPRES Architectural Components
- Partially Reconfigurable Regions (PRRs)
- Independently clocked using BUFRs
- PR modules (PRMs) can span multiple PRRs
- Controlling agent (Microblaze)
- Dynamic module placement and scheduling
- Module control and context save/restore
- Partial reconfiguration through ICAP
- Communication with other VAPRES nodes
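Dynamic placement with PRMs that may span several PRRs can be pictured as slot allocation. The sketch below uses a first-fit policy over a row of PRRs and assumes a multi-PRR module needs adjacent regions. The policy, the adjacency requirement, and all names are illustrative assumptions, not the algorithm used in VAPRES.

```python
def place_prm(slots, name, span=1):
    """Place a PRM in the first run of `span` adjacent free PRRs.

    slots: list with one entry per PRR, None meaning free.
    Returns the index of the first PRR used, or None if nothing fits.
    """
    for start in range(len(slots) - span + 1):
        if all(s is None for s in slots[start:start + span]):
            slots[start:start + span] = [name] * span
            return start
    return None


def remove_prm(slots, name):
    """Free every PRR currently occupied by the named PRM."""
    for i, occupant in enumerate(slots):
        if occupant == name:
            slots[i] = None


if __name__ == "__main__":
    prrs = [None] * 4  # four PRRs, as in the VAPRES diagram
    place_prm(prrs, "A")
    place_prm(prrs, "B", span=2)
    print(prrs)
```

A first-fit policy is easy to run on a soft processor, but it can leave free PRRs that are not adjacent, so a request spanning two regions may fail even when two regions are free.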
23. Background: Current Application PR Design Flow
- PR is a very powerful feature of Xilinx FPGAs, but requires specialized skills
- Manual steps
- Partition the application into modules
- Define static modules and partially reconfigurable modules (PRMs)
- Determine the number of PR regions (PRRs)
- Determine PRR sizes, shapes, and locations (resource allocation)
- Map PRMs to PRRs
- Define PRR interfaces and instantiate slice macros for PRR interfaces
- Automatable points and optimization problems (design-time)
- Design partitioning
- Number of PRRs
- PRR sizes, shapes, and locations
- Mapping PRMs to PRRs
- Type and placement of PRR interfaces
- Reconfiguration schedule
[Figure: design partitioning separates static modules from reconfigurable modules (PRMs); design floorplanning and budgeting places them on the FPGA as a static region plus PRRs 1, 2, ... (number of PRRs?).]
Potential for automation through C/C++ programs or scripts
24. Questions