Memory Support Design for LU Decomposition on the Starbridge Hypercomputer

1
Memory Support Design for LU Decomposition on the
Starbridge Hypercomputer
  • Seth Young, Arvind Sudarsanam, Aravind Dasu, and
    Thomas Hauser
  • Utah State University
  • Presented by Yi-Gang Tai

2
  • Data transfer hardware which supports an
    implementation of a block-based LU algorithm on a
    multi-FPGA system

3
Introduction
  • LU decomposition splits a matrix into the product
    of a lower triangular matrix and an upper
    triangular matrix
  • Block-based LU
      • An FPGA does not have enough local memory to
        hold the whole matrix
      • The matrix is broken down into smaller matrices
        (blocks)
      • Inter-node communication is eliminated

4
Block-based LU Decomposition
  • Block partitioning of A into A11, A12, A21, A22
    (figure not reproduced in this transcript)
  • Four steps of the block-based LU
      1. Normal LU factorization of A11
      2. Create L21 using L11, U11, A21
      3. Create U12 using L11, U11, A12
      4. Create A' using L21, U12, A22
  • Repeat iteratively with A' as the new A
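
The four steps above can be illustrated with a small software sketch. This is not the authors' hardware design: it is a plain-Python illustration that assumes no pivoting, a unit-diagonal L, and "packed" storage in which L (below the diagonal) and U (on and above it) overwrite the input. The names `block_lu`, `lu_unblocked`, `unpack_lu`, and `matmul` are hypothetical helpers introduced here. In this formulation L21 is obtained by solving against U11 and U12 by solving against L11; the updated trailing block is A22 minus the product L21 U12.

```python
def lu_unblocked(a, lo, hi):
    """Step 1: in-place LU (no pivoting) of the diagonal block a[lo:hi][lo:hi]."""
    for k in range(lo, hi):
        for i in range(k + 1, hi):
            a[i][k] /= a[k][k]                      # multiplier, stored as L
            for j in range(k + 1, hi):
                a[i][j] -= a[i][k] * a[k][j]        # eliminate below the pivot


def block_lu(a, b):
    """In-place block LU of square matrix `a` (list of lists), block size `b`."""
    n = len(a)
    for lo in range(0, n, b):
        hi = min(lo + b, n)
        lu_unblocked(a, lo, hi)                     # Step 1: A11 -> L11, U11
        # Step 2: L21 solves L21 * U11 = A21 (row by row, U11 upper triangular).
        for i in range(hi, n):
            for k in range(lo, hi):
                for p in range(lo, k):
                    a[i][k] -= a[i][p] * a[p][k]
                a[i][k] /= a[k][k]
        # Step 3: U12 solves L11 * U12 = A12 (column by column, unit lower L11).
        for j in range(hi, n):
            for k in range(lo, hi):
                for p in range(lo, k):
                    a[k][j] -= a[k][p] * a[p][j]
        # Step 4: trailing update, A22 - L21 * U12 becomes the new A.
        for i in range(hi, n):
            for j in range(hi, n):
                for p in range(lo, hi):
                    a[i][j] -= a[i][p] * a[p][j]
    return a


def unpack_lu(packed):
    """Split packed storage into explicit L (unit diagonal) and U."""
    n = len(packed)
    lower = [[packed[i][j] if j < i else (1.0 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    upper = [[packed[i][j] if j >= i else 0.0 for j in range(n)]
             for i in range(n)]
    return lower, upper


def matmul(x, y):
    """Dense matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]
```

A quick check is to factor a small matrix with `block_lu(copy_of_a, 2)`, split the result with `unpack_lu`, and confirm that `matmul(lower, upper)` reproduces the original matrix up to rounding. Because the arithmetic uses only `+`, `-`, `*`, and `/`, the same code also accepts Python `complex` entries.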

5
Platform Overview
  • A Hypercomputer by Starbridge Systems
  • A PC as the main controller
  • Xilinx Virtex-II 6000 FPGA
  • 2GB DRAM for each PE (FPGA + DRAM)
  • FPGA hardware design using the Viva toolset by
    Starbridge

6
Platform Overview (cont.)

7
Hardware Platform Limitations
  • PC to FPGA bandwidth
      • 64-bit 66MHz PCI bus
      • 128-bit complex double precision floating-point
        elements
  • DRAM size
      • 2GB DRAM of a PE holds 512K 16x16 blocks
  • FPGA BRAM size
      • Determines the size of blocks
      • 324KB of BRAM fits 79 blocks
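
The DRAM and PCI figures above can be reproduced with a few lines of arithmetic. The sketch below assumes that 2GB means 2 × 2^30 bytes, that the PCI clock is exactly 66 MHz, and that the bus moves one 64-bit word per cycle with no protocol overhead, so the bandwidth is a theoretical peak rather than a measured value. All function names are hypothetical.

```python
BYTES_PER_ELEMENT = 128 // 8        # one 128-bit complex double
BLOCK_DIM = 16                      # 16x16 blocks, as on the slide
DRAM_BYTES = 2 * 2**30              # assumption: 2GB = 2 * 2^30 bytes
PCI_WIDTH_BITS = 64
PCI_CLOCK_HZ = 66_000_000           # assumption: exactly 66 MHz


def block_bytes(dim=BLOCK_DIM):
    """Storage for one dim x dim block of complex doubles."""
    return dim * dim * BYTES_PER_ELEMENT


def blocks_in_dram(dim=BLOCK_DIM):
    """How many whole blocks fit in one PE's DRAM."""
    return DRAM_BYTES // block_bytes(dim)


def pci_peak_bytes_per_second():
    """Theoretical peak: one bus-width word per clock, no overhead."""
    return (PCI_WIDTH_BITS // 8) * PCI_CLOCK_HZ


def ideal_dram_fill_seconds():
    """Lower bound on the time to fill the whole DRAM over PCI."""
    return DRAM_BYTES / pci_peak_bytes_per_second()
```

Under these assumptions a 16x16 block occupies 4096 bytes, so the DRAM holds 524,288 (512K) blocks, which matches the slide. The peak PCI rate is 528,000,000 bytes per second, or 33 million matrix elements per second, so filling the DRAM takes at least about four seconds.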

8
Memory Transfer H/W Design
  • Top level PE block diagram

9
Data Flow Steps Between PC and FPGA
  • Four steps of data flow process
      1. Raw data sent from PC to PE and stored in DRAM
      2. Data moved from DRAM to FPGA in blocks
      3. Processed data transferred back to DRAM
         (Steps 2 and 3 alternate until all data is
         processed)
      4. Processed data moved back to PC
  • Different LU decomposition steps have the same data
    flow steps but different data organization
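
As a rough software analogy for the four data flow steps, the sketch below models the DRAM as a Python list and the FPGA computation as a caller-supplied function. The names `run_data_flow`, `block_len`, and `process_block` are hypothetical, and the sketch assumes that each processed block is the same length as its input and is written back to the same DRAM location, which the slides do not state.

```python
def run_data_flow(pc_data, block_len, process_block):
    """Model the PC -> DRAM -> FPGA -> DRAM -> PC round trip."""
    # Step 1: raw data is sent from the PC and stored in the PE's DRAM.
    dram = list(pc_data)

    # Steps 2 and 3 alternate until every block has been processed.
    for start in range(0, len(dram), block_len):
        block = dram[start:start + block_len]        # Step 2: DRAM -> FPGA
        result = process_block(block)                # computation on the FPGA
        if len(result) != len(block):
            raise ValueError("processed block must keep its length")
        dram[start:start + block_len] = result       # Step 3: FPGA -> DRAM

    # Step 4: the processed data is moved back to the PC.
    return list(dram)
```

For example, `run_data_flow([1, 2, 3, 4, 5], 2, lambda b: [2 * x for x in b])` streams the data through in blocks of two (the last block holds one element) and returns the doubled values. Changing the data organization for a different LU step would correspond to changing how `pc_data` is ordered, not the loop itself.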

10
Data Ordering for LU Steps 2 & 3
[Figure: data layout in memory, from Addr 0 to Addr N]

11
Data Ordering for LU Step 4

12
PC to DRAM Data Transfer H/W
  • Governed by a 3-state FSM

13
Block Diagram of State 1

14
Structure of Sequence Detector

15
H/W Interaction of State 3

16
Block Diagram of Data to PC Controller

17
DRAM to FPGA H/W Implementation for LU Steps 2 & 3
  • Implemented in Viva
  • Memory control module
      • Interfaces with the Viva DRAM controller
      • 2-state state machine: wait and load
  • Block control module
      • Controls the memory control module
      • 2-state state machine: read and write
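
The slides name the two states of the memory control module (wait and load) but not the conditions that move it between them. The sketch below is therefore only one plausible reading, written as a software model rather than Viva hardware: it assumes the module idles in the wait state until it receives a request for a run of consecutive words, loads one word per clock tick, and returns to wait when the run is complete. `MemoryControl`, `request`, and `tick` are hypothetical names, and the block control module that would issue the requests is not modeled.

```python
from enum import Enum


class McState(Enum):
    WAIT = "wait"
    LOAD = "load"


class MemoryControl:
    """Software model of a 2-state (wait/load) memory control FSM."""

    def __init__(self, dram):
        self.dram = dram            # stand-in for the DRAM controller
        self.state = McState.WAIT
        self.addr = 0               # next address to load
        self.remaining = 0          # words left in the current request
        self.loaded = []            # words delivered so far

    def request(self, addr, count):
        """Ask for `count` words from `addr`; accepted only while waiting."""
        if self.state is not McState.WAIT or count <= 0:
            return False
        self.addr, self.remaining = addr, count
        self.state = McState.LOAD
        return True

    def tick(self):
        """Advance one clock: load a word if in LOAD, otherwise do nothing."""
        if self.state is McState.LOAD:
            self.loaded.append(self.dram[self.addr])
            self.addr += 1
            self.remaining -= 1
            if self.remaining == 0:
                self.state = McState.WAIT   # run finished, go idle again
```

In this model a controller would call `request(addr, count)` and then call `tick()` once per clock until the state returns to `McState.WAIT`; a second request issued while a load is in progress is rejected.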

18
DRAM to FPGA H/W Implementation for LU Step 4

19
State Diagram of State MC Module

20
Resource Utilization Results

21
Data Transfer Times
  • Ideal
  • Actual

22
Block Size vs. Times
  • 64 is the most efficient block size (?)