Concurrent Engineering - PowerPoint PPT Presentation

1
Hardware/Software Codesign of Embedded Systems
Reconfigurable Computing
Voicu Groza
SITE Hall, Room 5017
562 5800 ext. 2159
Groza_at_SITE.uOttawa.ca
2
Outline
  1. Introduction
  2. Enabling Technologies
  3. Fixed, configurable, reconfigurable ...
  4. Reconfigurable Architectures
  5. Run-Time-Reconfigurable System-on-Chip
  6. Conclusion and Future Work
  7. References

3
1. Introduction
  • Reconfigurable computing: definition
  • Why reconfigurable computing?

4
Reconfigurable Computing - Definition
  • Reconfigurable Computing (RC): the presence of
    hardware (HW) that can be reconfigured
    (reconfigware, RW)
  • 1960: Gerald Estrin, the UCLA Fixed-Plus-Variable
    (F+V) Structure Computer
  • DeHon and Wawrzynek: computing via a
    postfabrication and spatially programmed
    connection of processing elements.
    • The architecture used in the computation is
      determined postfabrication and can therefore
      adapt to the characteristics of the executed
      algorithms.
    • The computation is spatial, in contrast to the
      more temporal style associated with
      microprocessors.

5
Re-inventing the wheel...
  • wire your own computer

6
Why reconfigurable computing?
  • Is your belt long enough?
  • Embedded hand-held devices need to reduce
    • power consumption,
    • packaging and manufacturing costs,
    • time-to-market
  • High-performance computing
    • Today's computationally intensive applications
      require more processing power:
      • streaming video,
      • image recognition and processing,
      • highly interactive services,
      • telecommunications,
      • genes
    • Cray's entry-level XD1 supercomputer combines
      AMD Opteron processors with FPGAs for compute
      acceleration in a Linux environment.

7
Why reconfigurable computing? (Cont.)
High-performance microprocessors
  • PRO
    • Versatile (SW)
    • Off-the-shelf solution
  • CON
    • For some applications might not be fast enough
    • Power consumption (>100 W/gigaFLOP)
    • Cost (k$s)
Reconfigurable Computing Systems
  • PRO
    • Versatile (SW + HW)
    • Computing structure matches the application
    • A given fabric can implement numerous
      functional units
    • Built out of off-the-shelf components, which
      reduces design time
  • CON
    • Wires are slow
    • Big bit-slices are costly to interconnect ->
      large silicon area
    • Performance overhead
    • Devices must store the configuration on the chip
Application-Specific Integrated Circuits (ASIC)
  • PRO
    • Does not suffer from the serial (and often slow
      and power-hungry) instruction fetch, decode and
      execute cycle that is at the heart of all
      microprocessors
    • Consumes less power
  • CON
    • Fixed structure
    • The cost of producing an ASIC (the masks cost
      $1 M)
    • The time to develop a custom integrated circuit
8
2. Enabling Technologies
  • Programmable ICs: CPLD and FPGA (Xilinx, 1984)
  • HW abstractions
    • Fine-grained: reconfiguration is at the gate and
      register level.
      • By reconfiguration of registers, gates, and
        their interconnections, the internal structure
        of functional units is changed.
      • 2 major technologies:
        • Complex Programmable Logic Devices (CPLD):
          EEPROM-based
        • Field-Programmable Gate Arrays (FPGA):
          SRAM-based
    • Coarse-grained: reconfiguration is based on a set
      of fixed blocks, like functional units, processor
      cores, and memory tiles.
      • The reconfiguration is merely the reprogramming
        of the interconnections between the fixed
        blocks.

9
Complex Programmable Logic Devices (CPLD)
  • Supplied with no predetermined logic function.
  • Programmed by user to implement any digital logic
    function.
  • Requires specialized computer software for design
    and programming.
  • Complex PLD (CPLD): a PLD that has several
    programmable sections with internal
    interconnections between the sections.
  • The basic building block of a CPLD is a macrocell,
    which implements a logic function that is
    synthesized into sum-of-products equations,
    followed by a D-type register.
  • Macrocells are grouped into logic blocks which
    are connected via a centralized interconnect
    array.

10
Altera MAX 7000 macrocell
11
Field-Programmable Gate Array (FPGA)
  • Reconfigurable functional units
    • coarse-grained: ALUs and storage
    • fine-grained: small look-up tables
  • Interconnection network
  • Universal gates and/or storage elements
  • Switches
12
Basic ingredient: Look-Up Table (LUT)
  • Universal gate: look-up table memory
  • Logic cell
  • Memory elements: SRAM
[Figure: look-up table storing the data 0 0 0 1, addressed by inputs a0, a1, a2]
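A LUT can be modelled as a small memory whose address lines are the logic inputs. The sketch below is a minimal Python illustration, not vendor code: the helper name `make_lut` and the bit ordering (a0 as the least significant address bit) are assumptions made for the example. With the stored data 0 0 0 1 a two-input LUT behaves as an AND gate; writing different data into the same cell yields a different gate.

```python
def make_lut(contents):
    """Return a function that reads `contents` at the address formed by its inputs."""
    size = len(contents)
    n_inputs = size.bit_length() - 1
    if size < 2 or size != 1 << n_inputs:
        raise ValueError("LUT contents must have a power-of-two length >= 2")

    def lut(*inputs):
        if len(inputs) != n_inputs:
            raise ValueError(f"expected {n_inputs} inputs, got {len(inputs)}")
        # Assumed ordering: a0 is the least significant address bit.
        address = sum(bit << position for position, bit in enumerate(inputs))
        return contents[address]

    return lut


and_gate = make_lut([0, 0, 0, 1])  # the stored data shown on the slide
xor_gate = make_lut([0, 1, 1, 0])  # same cell, different configuration

# Enumerate the AND truth table as (a0, a1, output) triples.
truth_table = [(a0, a1, and_gate(a0, a1)) for a1 in (0, 1) for a0 in (0, 1)]
```

Reconfiguring the device amounts to rewriting `contents`; the lookup logic itself never changes.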
13
Configurable Logic Blocks (CLB, Xilinx) / Logic
Array Block (LAB, Altera)
[Figure: Xilinx Spartan-II CLB]
HW abstractions:
  • 2 logic cells = 1 slice (Xilinx) or 1 Adaptive
    Logic Module (ALM, Altera)
  • 2 slices = 1 Configurable Logic Block (CLB,
    Xilinx)
14
Xilinx - Spartan II Architecture
  • IOBs provide the interface between the package
    pins and the internal logic
  • CLBs provide the functional elements for
    constructing most logic
  • Dedicated block RAM memories (4096 bits each)
  • Clock DLLs for clock distribution delay
    compensation and clock domain control
  • Versatile multi-level interconnect structure

15
Xilinx Virtex FPGA Model
[Figure: Virtex FPGA model showing a logic block (CLB), IO mux, switch matrices, line segments, and programmable interconnect points (PIPs)]
16
Virtex-II Architecture Overview
  • 1 CLB = 4 slices
  • 1 slice contains 2 function generators (F and G),
    which are configurable as
    • 4-input look-up tables (LUTs), or
    • 16-bit shift registers, or
    • 16-bit distributed SelectRAM memory.
  • DCM: Digital Clock Manager
  • Block SelectRAM: 18 Kbit (2K x 9 bit) of
    dual-port RAM
  • Multiplier blocks: 18-bit x 18-bit

Device: XC4VLX200
  CLB array (rows x columns):  192 x 116
  Logic cells:                 200,448
  Slices:                      89,088
  Distributed RAM (Kb):        1,392
  DSP:                         96
  18 Kbit block RAMs:          336
  Block RAM (Kb):              6,048
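The figures in the XC4VLX200 row can be cross-checked with a little arithmetic. The sketch below is only a sanity check on the numbers in the table: it reads 336 as a count of 18 Kbit blocks, and it assumes that only half of the slices can serve as distributed RAM, an assumption chosen because it reproduces the tabulated 1,392 Kb.

```python
# Figures copied from the XC4VLX200 row of the table.
CLB_ROWS, CLB_COLS = 192, 116
SLICES = 89_088
BLOCK_RAMS = 336          # read here as a count of 18 Kbit blocks

LUTS_PER_SLICE = 2        # function generators F and G
BITS_PER_LUT = 16         # a 4-input LUT stores 2**4 bits

clbs = CLB_ROWS * CLB_COLS
slices_per_clb = SLICES // clbs

# Assumption: only half of the slices are usable as distributed RAM.
ram_capable_slices = SLICES // 2
distributed_kbit = ram_capable_slices * LUTS_PER_SLICE * BITS_PER_LUT // 1024

block_ram_total_kbit = BLOCK_RAMS * 18
```

The slice count divides evenly by the CLB array size, giving four slices per CLB, and 336 blocks of 18 Kbit account for the 6,048 Kb of block RAM.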
17
3. Fixed, configurable, reconfigurable ...
  • A simple classification
  • Non-configurable computing
  • Configurable computing
  • Reconfigurable computing
  • Each has its own characteristics, (dis)advantages
    and applications

18
3.1. Non-Configurable Computing
  • Uses fixed hardware such as ASICs or custom VLSI
    circuits (e.g., microprocessors like x86, SPARC,
    DEC, PowerPC, etc.)
  • Long product turnaround time, usually around 3-6
    months
  • Optimized for performance
  • Can be quite costly
  • Hardwired, thus no room for error, rework, or
    improvement

19
3.2. Configurable Computing
[Figure: a configuring host sends a bitstream to the FPGA]
  • The configuring host supervises the
    reconfiguration of the FPGA with a new bitstream
  • A bitstream is a sequence of bits which
    represents the burn-in configuration of the
    Hardware Block (HB), e.g., a synthesized, placed
    and routed design

20
3.2. Configurable Computing (Cont.)
  • Advantages
    • Uses configurable hardware such as FPGA or CPLD
    • PLDs are soft-wired for re-use of static
      hardware resources
    • Cost-effective
    • Quick turnaround time
    • Flexible and easy design process
  • Disadvantages
    • Inefficient use of hardware resources: idle FPGA
      area cannot be used during run-time
    • Slow reconfiguration, because the entire FPGA is
      reconfigured for a single Hardware Block (HB)
    • Thus, execution must stop while a new Hardware
      Block is being configured

21
3.3. Reconfigurable Computing
[Figure: a configuring host sends several partial bitstreams to the FPGA, one per Hardware Block]
We could also use a placement algorithm to
possibly fit all requested HBs into the FPGA
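As an illustration of what such a placement step might look like, the sketch below places HBs with a first-fit policy over FPGA columns. The slides do not say which algorithm was used, so first-fit, the one-dimensional column model, and the names `first_fit` and `place_all` are all assumptions made for this example.

```python
def first_fit(free, width):
    """Return the leftmost start column of `width` free columns, or None."""
    if width < 1:
        raise ValueError("an HB must occupy at least one column")
    run = 0
    for col, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == width:
            return col - width + 1
    return None


def place_all(total_columns, requests):
    """Place (name, width) requests in order; unplaceable HBs map to None."""
    free = [True] * total_columns
    placement = {}
    for name, width in requests:
        start = first_fit(free, width)
        placement[name] = start
        if start is not None:
            for col in range(start, start + width):
                free[col] = False
    return placement


# Hypothetical HBs on a 10-column device.
placement = place_all(10, [("fir", 4), ("fft", 5), ("crc", 2)])
```

Here "fir" and "fft" fit side by side, while "crc" finds only one free column and is rejected; a real system could then swap out an idle HB or defragment the array.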
22
3.3. Reconfigurable Computing (Cont.)
  • Advantages
    • Same as Configurable Computing
    • No need to completely stop execution while
      reconfiguring the FPGA with a new HB
    • Efficient use of static hardware resources: HBs
      can be swapped out or moved around to fit new
      HBs on the FPGA, so there is no need for a
      larger FPGA or a second one
    • Fast reconfiguration times
    • Run-time reconfiguration on the fly
    • Lower power consumption, as HBs can be swapped
      out
  • Disadvantages
    • Routing HBs can be a heavy overhead for the
      configuring host, especially if HBs are too
      large or when defragmentation is necessary

23
What is Run-Time Reconfiguration (RTR)?
  • On-the-fly flexibility
  • Combines characteristics of co-processors with
    those of reconfigurable computing
  • Introduces overhead to reconfigure the
    co-processor, but offsets it by increasing
    execution speed (faster in H/W!)
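The trade-off between reconfiguration overhead and faster hardware execution can be made concrete with a simple break-even estimate. This is a rough sketch under a simplified cost model (one reconfiguration, then repeated calls of the same function); the function names and the timing values are hypothetical and do not come from the slides.

```python
import math


def rtr_pays_off(t_software, t_hardware, t_reconfig, calls=1):
    """True when one reconfiguration plus HW execution beats SW execution."""
    return t_reconfig + calls * t_hardware < calls * t_software


def break_even_calls(t_software, t_hardware, t_reconfig):
    """Smallest number of calls for which reconfiguring wins, or None."""
    gain_per_call = t_software - t_hardware
    if gain_per_call <= 0:
        return None  # hardware is not faster, so the overhead is never repaid
    return math.floor(t_reconfig / gain_per_call) + 1


# Hypothetical timings in microseconds.
calls_needed = break_even_calls(t_software=10, t_hardware=2, t_reconfig=40)
```

With these numbers each call saves 8 time units, so the 40-unit reconfiguration cost is repaid only from the sixth call onward.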

24
4. Reconfigurable Architectures
  • External stand-alone processing unit
  • Attached processing unit
  • Reconfigurable functional unit
  • Co-processor
  • Processor embedded in a reconfigurable fabric
  • (Compton and Hauck)

25
External stand-alone processing unit
RPU coupled to the I/O system bus
  • The RECON System
  • John Reid Hauser, John Wawrzynek, Randy H. Katz
  • University of California, Berkeley
  • Consists of a SUN SparcStation host and a
    reconfigurable coprocessor board (the board uses
    an XC4010 FPGA as the reconfigurable processing
    unit).
26
Attached processing unit
RPU coupled to the local bus
  • TKDM
  • Marco Platzner
  • ETH Zurich
  • FPGA module that uses the DIMM (dual inline
    memory module) bus for high-bandwidth
    communication with the host CPU.
  • It is integrated with the Linux host OS and
    offers functions for data communication and FPGA
    reconfiguration.

27
Attached processing unit (Cont.)
  • MorphoSys
  • Nader Bagherzadeh
  • University of California, Irvine
  • Coarse grain: MorphoSys operates on 8/16-bit
    data.
  • Configuration: the RC array is configured by
    context words, which specify an instruction
    opcode for RC.
  • Depth of programmability: the Context Memory can
    store up to 32 planes of configuration.
  • Dynamic reconfiguration: contexts are loaded into
    the Context Memory without interrupting RC
    operation.
  • Local/host processor: the control processor (Tiny
    RISC) and the RC Array are resident on the same
    chip.
  • Fast memory interface: through a DMA controller.
  • Consists of a combination of a RISC processor
    core with an array of coarse-grain reconfigurable
    cells
  • It utilizes a DMA controller in order to load the
    configuration data (context) into the Context
    Memory

28
Reconfigurable functional unit
RPU integrated in the CPU
  • Chimaera
  • S. Hauck
  • University of Washington, Seattle
  • System treats the reconfigurable logic as a cache
    for RPU instructions.
  • Those instructions that have recently been
    executed, or that we can otherwise predict might
    be needed soon, are kept in the reconfigurable
    logic.
  • If another instruction is required, it is brought
    into the RPU by overwriting one or more of the
    currently loaded instructions.
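The caching behaviour described above can be sketched in a few lines. The example below is an illustration only: it assumes a least-recently-used replacement policy and one slot per instruction, neither of which is stated on the slide, and the class name `RpuInstructionCache` and the instruction names are hypothetical.

```python
from collections import OrderedDict


class RpuInstructionCache:
    """Toy model of reconfigurable logic used as a cache of RPU instructions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.loaded = OrderedDict()   # least recently used entry comes first
        self.reconfigurations = 0

    def execute(self, instruction):
        if instruction in self.loaded:
            self.loaded.move_to_end(instruction)   # mark as recently used
            return "hit"
        if len(self.loaded) >= self.capacity:
            self.loaded.popitem(last=False)        # overwrite the oldest entry
        self.loaded[instruction] = True
        self.reconfigurations += 1                 # a miss costs a reconfiguration
        return "miss"


cache = RpuInstructionCache(capacity=2)
trace = [cache.execute(op) for op in ["mac", "sad", "mac", "crc", "sad"]]
```

Every miss stands for a partial reconfiguration, so the hit rate directly determines how much reconfiguration overhead the processor sees.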

29
Co-processor
RPU coupled to the CPU
  • GARP
  • Hauser and Wawrzynek
  • University of California, Berkeley
  • A reconfigurable architecture that combines
    reconfigurable hardware with a standard MIPS
    processor on the same die for better
    performance.
  • Two configurations can never be active at the
    same time on its reconfigurable array, which can
    significantly reduce the overall performance of
    the system.

30
5. RTR-SoC System Architecture
[Figure: RTR-SoC system architecture. Callouts: execution unit of HBs; allows dedicated OMA-RPU access; stores program and data code; IBM OPB; runs software instructions; stores HB bitstreams]
31
Application and Reconfiguration Flows
  • While the application flow runs on AE, RE sends
    RTR_PREP_HB to the ICAP controller, to start the
    loading of the first HB bitstream onto the RPU.
  • Once this HB is ready in the RPU, the ICAP sends
    back an RTR_ACK to the RE.
  • The newly implemented HB on the RPU starts to
    work as soon as it is ENABLEd by the
    reconfiguration flow on RE.
  • Upon completion, HB sets flag RTR_DONE to make
    the application flow aware that it is ready for
    use.
  • Once the application flow on AE has prepared data
    that HB needs, AE asserts the flag DATA_READY.
  • HB asserts EXE_DONE when it finishes its task and
    has prepared the results to be read by the
    application flow on AE.
  • When the application flow needs these results, it
    checks the flag EXE_DONE, and waits if it is not
    yet set.
  • The application flow gets the results and then
    asserts DATA_ACK to acknowledge to HB that it got
    data.
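The handshake above can be simulated with one thread per participant and one event per flag. This is a minimal sketch, not the authors' implementation: the function names, the placeholder computation (a sum), and the choice to make the application flow wait for RTR_DONE before asserting DATA_READY are assumptions made for the example.

```python
import threading

FLAGS = ["RTR_PREP_HB", "RTR_ACK", "ENABLE", "RTR_DONE",
         "DATA_READY", "EXE_DONE", "DATA_ACK"]


def run_handshake(data):
    """Run one reconfigure-then-execute cycle; return (result, flag order)."""
    flags = {name: threading.Event() for name in FLAGS}
    order = []     # flags in the order they were asserted
    shared = {}    # stands in for the registers shared by AE and HB

    def assert_flag(name):
        order.append(name)          # recorded before any waiter is released
        flags[name].set()

    def reconfiguration_flow():     # runs on RE
        assert_flag("RTR_PREP_HB")
        flags["RTR_ACK"].wait()
        assert_flag("ENABLE")

    def icap_controller():
        flags["RTR_PREP_HB"].wait()
        # The HB bitstream would be loaded onto the RPU here.
        assert_flag("RTR_ACK")

    def hardware_block():
        flags["ENABLE"].wait()
        assert_flag("RTR_DONE")
        flags["DATA_READY"].wait()
        shared["result"] = sum(shared["input"])   # placeholder computation
        assert_flag("EXE_DONE")
        flags["DATA_ACK"].wait()

    def application_flow():         # runs on AE
        shared["input"] = data
        flags["RTR_DONE"].wait()    # assumption: AE waits until the HB is ready
        assert_flag("DATA_READY")
        flags["EXE_DONE"].wait()
        shared["output"] = shared["result"]
        assert_flag("DATA_ACK")

    workers = [threading.Thread(target=task) for task in
               (application_flow, hardware_block,
                icap_controller, reconfiguration_flow)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return shared["output"], order


result, order = run_handshake([1, 2, 3])
```

Because each flag is asserted only after the previous one has been observed, the recorded order is the same on every run regardless of how the threads are scheduled.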

32
Final system architecture
[Figure: final system architecture, showing RE and AE]
33
Tasks running on AE and RE
34
Physical Layer Overview
  • A physical layer has already been developed in
    JBits in order to evaluate RTR on a Xilinx Virtex
    device
  • The physical layer has 3 main functions:
    • modeling the FPGA resources,
    • running a placement algorithm for the different
      Hardware Blocks, and
    • managing the physical resources of the FPGA and
      any on-board peripherals.
  • RTR execution model
    • Bitstream(s) are read by the JBits application
    • The JBits application configures the Virtex RC
      HW located in the PCI slot using the XHWIF API.
    • XHWIF (Xilinx HardWare InterFace standard): a
      Java interface for communicating with FPGA-based
      boards.
    • This enables run-time reconfiguration of the
      Virtex device.

JBits is a set of Java APIs and classes that
provide a high-level language approach to developing
reconfigurable systems, including run-time
reconfiguration.
35
Hardware Block (HB) Architecture
  • An HB is a functional hardware module that
    contains its own configuration (i.e. the
    bitstream), and state information (e.g. status
    and control registers) that define its current
    state.
  • It is divided into two major components:
    • The HB Dependent Unit (HBDU)
      • Encompasses several components that vary in
        functionality and magnitude depending on the
        functions supported by a particular HB.
    • The HB Independent Unit (HBIU)
      • Designed as a core and hence follows a
        standardized implementation scheme for all
        HBs.

36
Hardware Block Reconfiguration
  • The HBs are partially reconfigured by the
    aforementioned Reconfigurable Processing Unit
    (RPU).
  • The reconfiguration process is enabled by means
    of a Self-Reconfiguration Platform (SRP).
  • It enables the FPGA to be dynamically
    reconfigured under the control of an embedded
    microprocessor.
  • It is divided into an H/W component and an S/W
    component.
  • The H/W component consists of four primary
    components: the Internal Configuration Access
    Port (ICAP), some control logic, a small
    configuration cache in Block RAM (BRAM), and an
    embedded processor.
  • The S/W component implements an API that defines
    methods for accessing configuration logic through
    the ICAP port.

37
PR Methodology: Xilinx Virtex-II Architecture
  • The Virtex-II FPGA fabric is composed of
    • an array of Configurable Logic Blocks (CLBs),
    • Block RAMs (BRAMs),
    • Input/Output Blocks (IOBs),
    • special function blocks such as multipliers,
      PLLs, etc.
  • Each CLB contains four slices.
  • Each slice contains two 4-input look-up tables
    and two D-type flip-flops to implement
    combinational and sequential circuits.

38
PR Methodology
  • Bus Macros (BMs) are required between active and
    static modules of the design.
  • The size and location of the reconfigurable
    (active) module are always fixed.
  • The reconfigurable module is always the full
    height of the device.
  • All logic resources located within the width of
    the module are considered part of the
    reconfigurable module's bitstream frame. This
    includes slices, tri-state buffers (TBUFs), block
    RAMs (BRAMs), multipliers, input/output blocks
    (IOBs), and all routing resources.

39
PR Methodology
Bus Macro block Diagram
  • Bus Macros (BMs) are predefined physical routing
    bridges that connect the active module to the
    static one.
  • Any connection from active to static logic should
    always go through a bus macro.
  • We chose the slice-based bus macros (over the
    TBUF-based ones) as they give a higher
    concentration of communication bits per CLB.
  • A bus macro allows data to move in only one
    direction: either left-to-right or right-to-left.

40
PR Methodology
Final Design Layout
Design contains only one active module. All other
logic components are on the static module.
41
PR Methodology
Xilinx Internal Configuration Access Port (ICAP)
  • Provides a configuration interface to the FPGA
    fabric.
  • Cache BRAM to hold at least one frame.
  • Control logic for the OPB bus interface.
  • API calls to allow SW to read/write configuration
    memory.

42
PR Methodology
  • A partial bitstream is generated for the active
    (dynamic) part of the FPGA
  • The device remains in full operation while the
    new partial bitstream is downloaded
  • The full bitstream configuration must already be
    programmed into the device before downloading the
    partial bitstream.
  • Multiple bitstreams can be generated, one for
    every variation of a partially reconfigurable
    module
  • The partial bitstream must be generated with the
    -g ActiveReconfig:Yes option; failing to use this
    option will assert the global set/reset (GSR)
    during configuration, resetting the entire design

43
PR Methodology
  • Virtex-II configuration memory is arranged in
    vertical frames that are one bit wide and stretch
    from the top edge of the device to the bottom.
  • These frames are the smallest addressable
    segments of the Virtex-II configuration memory
    space; therefore, all operations must act on
    whole configuration frames.
  • The length of a Virtex-II frame is not fixed and
    depends on the size of the device.
  • The number of frames per column type is constant
    for all devices.
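Because the frame is the smallest addressable unit, changing even a single configuration bit means reading a whole frame, modifying it, and writing it back. The sketch below illustrates that read-modify-write pattern on a toy memory model; `ConfigMemory`, `set_bit`, and the frame sizes are hypothetical and are not the ICAP API.

```python
class ConfigMemory:
    """Toy configuration memory made of fixed-length frames."""

    def __init__(self, n_frames, frame_bits):
        self.frame_bits = frame_bits
        self.frames = [[0] * frame_bits for _ in range(n_frames)]
        self.frame_writes = 0

    def read_frame(self, index):
        return list(self.frames[index])      # return a copy of the frame

    def write_frame(self, index, bits):
        if len(bits) != self.frame_bits:
            raise ValueError("a frame can only be written as a whole")
        self.frames[index] = list(bits)
        self.frame_writes += 1


def set_bit(memory, frame, offset, value):
    """Change one bit by reading, modifying, and rewriting its whole frame."""
    cached = memory.read_frame(frame)        # plays the role of the frame cache
    cached[offset] = value
    memory.write_frame(frame, cached)


memory = ConfigMemory(n_frames=4, frame_bits=8)
set_bit(memory, frame=2, offset=5, value=1)
```

The local copy in `set_bit` mirrors the reason a cache large enough for at least one frame is needed next to the configuration port.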

44
Reconfigurable Processing Unit
The RPU high-level block diagram
45
Preliminary Results
  • Xilinx Virtex-II Platform FPGAs were used to
    implement this system.
  • Preliminary results were generated using ModelSim
    SE 5.7f.

Simulation results for the HB interface (I/F). They
illustrate how the I/F is used to enable proper
synchronization between the reconfiguration flow
and the application flow.
46
6. Conclusion and Future Work
  • A novel architecture of an RTR SoC is introduced
  • RPU and HBs are designed
  • This design targets adaptive embedded systems,
    DSP-related and low-power applications
  • These functions are implemented as HBs and can be
    exploited in a multi-purpose environment. For
    example, the RTR SoC may execute various tasks to
    perform DSP-related functions, and subsequently
    be reconfigured into a high-performance
    measurement processing system
  • Future designs would allow the user more
    flexibility by auto-reconfiguring the RPU
    depending on the computational and functional
    needs of its respective applications
  • Real-time applications are our future target, as
    idle HBs are swapped out of the RPU to save
    power or to allow for updates to the HBs

47
References
  • M. Platzner, Reconfigurable Computer
    Architectures, e&i Elektrotechnik und
    Informationstechnik, 115(3):143-148, 1998.
    Springer.
  • Y. Li, T. Callahan, E. Darnell, R. Harr, U.
    Kurkure, and J. Stockwood, Hardware-Software
    Co-Design of Embedded Reconfigurable
    Architectures, Proceedings of the 37th Design
    Automation Conference (DAC), pp. 507-512,
    June 5-9, 2000.
  • J. P. Heron, R. Woods, S. Sezer, and R. H.
    Turner, Development of a Run-Time
    Reconfiguration System with Low Reconfiguration
    Overhead, Journal of VLSI Signal Processing,
    28(1/2):97-113, May 2001.
  • Xilinx MicroBlaze Soft Processor Core,
    http://www.xilinx.com/ise/embedded/edk6_2docs/mb
    ref_guide.pdf, last accessed on October 19, 2004.
  • G. Aggarwal, N. Thaper, K. Aggarwal, M.
    Balakrishnan, and S. Kumar, A Novel
    Reconfigurable Co-Processor Architecture,
    Proceedings of the Tenth International Conference
    on VLSI Design, pp. 370-375, January 1997.
  • G. Haug and W. Rosenstiel, Reconfigurable
    Hardware as Shared Resource in Multipurpose
    Computers, in Reiner W. Hartenstein and Andres
    Keevallik, editors, Field-Programmable Logic:
    From FPGAs to Computing Paradigm,
    Springer-Verlag, pp. 149-158, Berlin,
    August/September 1998.
  • Xilinx Virtex-II Platform FPGAs: Complete Data
    Sheet, DS031, 14 Oct. 2003.
  • D. Wo and K. Forward, Compiling to the Gate
    Level for a Reconfigurable Co-Processor,
    Proceedings of FPGAs for Custom Computing
    Machines, pp. 147-154, 1994.
  • V. Groza, R. Abielmona, M. El-Kadri, N. Sakr, and
    M. Elbadri, A Reconfigurable Co-Processor for
    Adaptive Embedded Systems, Workshop on
    Intelligent Solutions in Embedded Systems, Graz,
    Austria, June 2004.
  • IBM On-Chip Peripheral Bus,
    http://www-306.ibm.com/chips/techlib/techlib.nsf/techdocs/
    9A7AFA74DAD200D087256AB30005F0C8/file/OpbBus.pdf,
    last accessed on October 19, 2004.
  • R. Abielmona, V. Groza, N. Sakr, and J. Ho,
    Low-Level Run-Time Reconfiguration of FPGAs for
    Dynamic Environments, IEEE Canadian Conference
    on Electrical and Computer Engineering, CCECE
    2004, Niagara Falls, May 2004.
  • B. Blodget, P. James-Roxby, E. Keller, S.
    McMillan, and P. Sundararajan, A
    Self-reconfiguring Platform, Proceedings of the
    International Conference on Field Programmable
    Logic, Lisbon, Portugal, Sept. 2003.