Title: Reconfigurable Computing RC Group
1. Reconfigurable Computing (RC) Group
- Reconfigurable Architectures, Networks, and Services for COTS-based Cluster Computing Systems
- Dr. Alan D. George, Director
- HCS Research Laboratory
- University of Florida
2. Outline
- RC Group Overview
- RC Design Space
- Motivation
- Background
- Programming Model Framework
- CARMA Framework
- Preliminary Results
- Conclusions
3. RC Group Overview
- Group Members (Summer 2003)
- Matt Ashoff, B.S. student in ECE
- Gall Gotfried, B.S. student in CISE
- Alex Hoyos, M.S. student in ECE
- Aju Jacob, M.S. student in ECE
- Matt Radlinski, Ph.D. candidate in ECE
- Alex Shye, M.S. student in ECE
- Steven Theriault, M.S. student in ECE
- Tavaris Thomas, Ph.D. candidate in CS
- Ian Troxel, Ph.D. student in ECE, group leader
- Sponsors
- Department of Defense
- Honeywell (pending)
- Xilinx (hardware and tools)
- Numerous other lab sponsors for cluster resources
4. RC Design Space
[Figure: RC design space organized by sponsor areas, including DoD, Honeywell (pending), and MLD]
5. Motivation
- Key missing pieces in RC clusters for HPC
- Dynamic RC fabric discovery and management
- Coherent multitasking, multi-user environment
- Robust job scheduling and management
- Fault tolerance and scalability
- Performance monitoring down into the RC fabric
- Automated application mapping into management tool
- The HCS Lab's proposed Cluster-based Approach to Reconfigurable Management Architecture (CARMA) attempts to unify existing technologies as well as fill in the missing pieces
6. Background
- Evaluation of cluster mgt. tools is ongoing
- External extensions to traditional tools
- Work at GWU/GMU with LSF
- Good introduction to features needed in a management tool for RC clusters
7. Background
- University College Cork (DRMC)
- Condensed graph algorithm mapping and scheduling
- Instruction-level threads
- UC Berkeley (SCORE)
- Dynamic flow graph for PC-RC scheduling
8. Background
- University of Glasgow (RAGE)
- FPGA as virtual hardware
- Transform and configuration managers
- Limited, but useful ideas
- Imperial College (IGOL)
- Hierarchical programming framework
- Focus on application mapping
9. Background
- Virginia Tech
- ACS API
- Network and device abstraction
- Static resource management
- MPI interprocess communication
- Janus RTR
- Object-oriented design
- Platform independent
- Java-based
10. Programming Model Framework
- User's view
- Graphical user interface
- Submit jobs and receive results
- Assert execution parameters
- Query job and system status
- Applications
- C and VHDL programming model
- Test cases under development
- Block ciphers
- DES cryptanalysis
- Blowfish encrypt/decrypt
- Stream ciphers
- RSA encrypt/decrypt
- Sonar Beamforming
- Hyperspectral Imaging (c/o LANL)
- General set of benchmarks
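To make the user's view concrete, the sketch below shows what submitting a job from host C code might look like under this programming model. The carma_job_t structure, the carma_submit and carma_query_status calls, and their stub bodies are hypothetical placeholders for illustration only, not an existing CARMA interface.

```c
#include <stdio.h>

/* Hypothetical job description: a compiled C host program plus a
 * synthesized VHDL bitstream, with requested cluster resources. */
typedef struct {
    const char *name;            /* job name shown in the GUI           */
    const char *host_binary;     /* compiled C portion of the job       */
    const char *bitstream;       /* VHDL portion, synthesized to a .bit */
    int         num_nodes;       /* requested cluster nodes             */
    int         fpgas_per_node;  /* requested RC units per node         */
} carma_job_t;

/* Stubs standing in for a hypothetical CARMA client library. */
static int carma_submit(const carma_job_t *job)
{
    (void)job;
    return 42;                              /* pretend job id */
}

static int carma_query_status(int job_id, char *buf, size_t len)
{
    (void)job_id;
    snprintf(buf, len, "QUEUED");           /* pretend scheduler reply */
    return 0;
}

int main(void)
{
    carma_job_t job = {
        .name           = "blowfish_encrypt",
        .host_binary    = "./blowfish_host",
        .bitstream      = "./blowfish_core.bit",
        .num_nodes      = 4,
        .fpgas_per_node = 1,
    };

    int id = carma_submit(&job);                    /* hand job to the scheduler  */
    char status[128];
    carma_query_status(id, status, sizeof status);  /* query job and system state */
    printf("job %d (%s): %s\n", id, job.name, status);
    return 0;
}
```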
11. Programming Model Framework
- Application code mapped to specific hardware
- RC acceleration decisions
- Parallelization
- Computation graph produced
- Numerous design choices
- Automatic mapping tools
- e.g. Handel-C, Viva, StreamsC (LANL), CHAMPION
(UT), PipeRENCH (CMU) - Middleware for HPC
- MPI and UPC
- High-speed network communication
- Multi-user access of RC fabric
- Partial reconfiguration
- Unified control structures
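The middleware bullets above can be pictured with a minimal SPMD sketch: each MPI rank offloads its portion of a kernel to a local RC unit through a board API, and partial results are then gathered over the cluster interconnect. The MPI calls are standard; the fpga_* functions are hypothetical stand-ins for a vendor/board-specific API and are stubbed here so the example compiles and runs.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* --- hypothetical board API, stubbed for illustration ------------------ */
static int fpga_load_bitstream(const char *path) { (void)path; return 0; }
static int fpga_run_kernel(const void *in, void *out, int n)
{
    memcpy(out, in, (size_t)n);   /* stub: real kernel runs in RC fabric  */
    return 0;
}
/* ------------------------------------------------------------------------ */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char in[64] = {0}, out[64] = {0};
    snprintf(in, sizeof in, "block %d of %d", rank, size);

    fpga_load_bitstream("des_crack.bit");    /* configure the local RC unit  */
    fpga_run_kernel(in, out, sizeof in);     /* accelerate the local portion */

    /* Rank 0 gathers per-node results over the cluster interconnect. */
    char *all = NULL;
    if (rank == 0)
        all = malloc((size_t)size * 64);
    MPI_Gather(out, 64, MPI_CHAR, all, 64, MPI_CHAR, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        printf("collected %d partial results\n", size);
        free(all);
    }

    MPI_Finalize();
    return 0;
}
```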
12. Programming Model Framework
- Job Scheduler (JS)
- Distributed, fault-tolerant yet scalable
- Parallel and serial jobs
- Graph-based scheduling e.g.
- Condensed Graphs (U. College Cork)
- DAGMan (Condor, U. of Wisconsin)
- Execution Manager (ExMan)
- Task-based execution (TaskMan)
- Checkpointing and rollback
- Configuration Manager (ConfigMan)
- Registration of all RC resources
- Discovery, prefetching and transport
- Caching and defragmentation
- Multi-task environment on each FPGA
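A minimal sketch of the graph-based scheduling idea referenced above, in the spirit of DAGMan-style job graphs: each task tracks how many parents remain unfinished and is dispatched once that count reaches zero. The structures and the tiny dispatch loop are illustrative assumptions, not CARMA's actual Job Scheduler.

```c
#include <stdio.h>

#define MAX_TASKS 8

typedef struct {
    const char *name;
    int unmet_parents;          /* parents not yet finished   */
    int children[MAX_TASKS];    /* indices of dependent tasks */
    int num_children;
    int done;
} task_t;

/* Mark a task finished and release children whose parents are all done. */
static void complete(task_t *t, int idx)
{
    t[idx].done = 1;
    for (int i = 0; i < t[idx].num_children; i++)
        t[t[idx].children[i]].unmet_parents--;
}

int main(void)
{
    /* A -> B, A -> C, {B, C} -> D  (e.g. a map stage, two cipher kernels,
     * and a reduce stage) */
    task_t t[4] = {
        { "A", 0, {1, 2}, 2, 0 },
        { "B", 1, {3},    1, 0 },
        { "C", 1, {3},    1, 0 },
        { "D", 2, {0},    0, 0 },
    };

    int remaining = 4;
    while (remaining > 0) {
        for (int i = 0; i < 4; i++) {
            if (!t[i].done && t[i].unmet_parents == 0) {
                printf("dispatching task %s\n", t[i].name); /* hand to ExMan */
                complete(t, i);
                remaining--;
            }
        }
    }
    return 0;
}
```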
13. Programming Model Framework
Early prototype of the skeleton working with simple jobs by 8/9/03
- Components and interactions for cluster management tool
- TCP sockets for reliable message delivery
- Independent service daemons for a more robust environment
- Spectrum of options, from master-worker to peer-to-peer
- Scalability achieved through hierarchical design
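A minimal sketch of one such service daemon, assuming plain POSIX sockets: it listens on a TCP port, receives a message from a peer daemon, and acknowledges it. The port number and the message handling are illustrative only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int srv = socket(AF_INET, SOCK_STREAM, 0);     /* TCP: reliable delivery */

    struct sockaddr_in addr = {0};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(5150);            /* arbitrary example port */

    bind(srv, (struct sockaddr *)&addr, sizeof addr);
    listen(srv, 16);

    for (;;) {                                     /* daemon service loop    */
        int conn = accept(srv, NULL, NULL);
        if (conn < 0)
            continue;

        char msg[256];
        ssize_t n = recv(conn, msg, sizeof msg - 1, 0);
        if (n > 0) {
            msg[n] = '\0';
            printf("received: %s\n", msg);         /* e.g. a job request     */
            send(conn, "ACK", 3, 0);               /* acknowledge delivery   */
        }
        close(conn);
    }
}
```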
14. Management Schemes
[Figure: four candidate management schemes, each built from an application (APP) and application mapper (APP MAP), a global job scheduler (GJS), a global resource manager (GRMAN), and local resource monitors (LRMON) on the local systems, exchanging tasks/states downward and results/statistics upward over the network]
- Master-Worker: jobs submitted centrally; global view of the system at all times
- Client-Server: server houses configurations; global view of the system at all times
- Client-Broker: server brokers configurations; global view of the system at all times
- Peer-to-Peer
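The diagrams imply a simple message protocol: tasks and state flow down from the global job scheduler and resource manager to each LRMON, while results and statistics flow back up. The sketch below captures those categories in a hypothetical wire header; every name and field here is an assumption for illustration, not a defined CARMA format.

```c
#include <stdio.h>

/* Hypothetical message categories between GJS/GRMAN and LRMON daemons. */
typedef enum {
    MSG_TASK,        /* GJS   -> LRMON: task to execute             */
    MSG_STATE,       /* GRMAN -> LRMON: configuration/system state  */
    MSG_RESULT,      /* LRMON -> GJS  : completed task output       */
    MSG_STATISTICS   /* LRMON -> GRMAN: resource statistics         */
} carma_msg_type_t;

/* Hypothetical header preceding each payload on the wire. */
typedef struct {
    carma_msg_type_t type;
    int              job_id;
    int              node_id;
    unsigned         payload_len;   /* payload bytes follow */
} carma_msg_hdr_t;

int main(void)
{
    carma_msg_hdr_t hdr = { MSG_TASK, 7, 2, 128 };
    printf("msg type=%d job=%d node=%d len=%u\n",
           hdr.type, hdr.job_id, hdr.node_id, hdr.payload_len);
    return 0;
}
```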
15. Programming Model Framework
- Processor monitoring
- CPU utilization
- Memory utilization
- Network monitoring
- Bandwidth
- Latency
- Congestion, loss rate
- RC fabric monitoring
- Area utilization
- On-chip memory utilization
- On-board memory utilization
- GEMS extended / integrated
GEMS is the Gossip-Enabled Monitoring Service developed by the HCS Lab for robust, scalable, multilevel monitoring of resource health and performance; for more information see http://www.hcs.ufl.edu/prj/ftgroup/teamHome.php.
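A sketch of the per-node record such multilevel monitoring might produce, covering the processor, network, and RC-fabric metrics listed above. The field names, units, and the idea of gossiping the record through GEMS are illustrative assumptions.

```c
#include <stdio.h>

/* Hypothetical per-node monitoring record spanning all three levels. */
typedef struct {
    /* processor */
    double cpu_util;          /* 0..1                              */
    double mem_util;          /* 0..1                              */
    /* network */
    double bandwidth_mbps;
    double latency_us;
    double loss_rate;
    /* RC fabric */
    double area_util;         /* fraction of slices/LUTs in use    */
    double onchip_mem_util;   /* e.g. block RAM usage              */
    double onboard_mem_util;  /* e.g. SRAM/SDRAM on the RC board   */
} node_stats_t;

int main(void)
{
    node_stats_t s = { 0.42, 0.61, 940.0, 55.0, 0.0, 0.73, 0.35, 0.12 };

    /* An LRMON-style daemon might periodically gossip such a record to a
     * few random peers so every node converges on a recent global view. */
    printf("cpu=%.0f%% mem=%.0f%% fpga-area=%.0f%%\n",
           100 * s.cpu_util, 100 * s.mem_util, 100 * s.area_util);
    return 0;
}
```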
16. Programming Model Framework
- Middleware API
- Provide abstraction of network and RC fabric
- Support high-speed networks
- SCI, Myrinet, GigE, InfiniBand
- Support parallel pgm. models
- MPI, UPC, others?
- Support multiple boards
- e.g. Alpha Data, Celoxica, Nallatech, Starbridge, Annapolis Microsystems, ACT, SRC, SLAAC, Honeywell, others?
- Propose to leverage key concepts in ACS API developed at VPI
- RC Fabric API
- Board-specific drivers and API
- Vendor-specific APIs used
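One way to picture the middleware abstraction is a common table of operations that each board-specific driver fills in on top of its vendor API, as sketched below. This is an illustrative assumption only; it is not the ACS API or any vendor interface. The benefit of a function-pointer table is that scheduler and execution code can call the same operations regardless of which RC board is installed.

```c
#include <stdio.h>

/* Hypothetical board-abstraction layer: operations every driver provides. */
typedef struct {
    const char *board_name;
    int (*configure)(const char *bitstream);
    int (*write_mem)(unsigned addr, const void *buf, unsigned len);
    int (*read_mem)(unsigned addr, void *buf, unsigned len);
} rc_board_ops_t;

/* Dummy "driver" standing in for a vendor-specific implementation. */
static int dummy_configure(const char *bs) { printf("load %s\n", bs); return 0; }
static int dummy_write(unsigned a, const void *b, unsigned n)
{ (void)a; (void)b; (void)n; return 0; }
static int dummy_read(unsigned a, void *b, unsigned n)
{ (void)a; (void)b; (void)n; return 0; }

static const rc_board_ops_t dummy_board = {
    "example-board", dummy_configure, dummy_write, dummy_read
};

int main(void)
{
    const rc_board_ops_t *rc = &dummy_board;   /* chosen at registration time */
    rc->configure("rsa_core.bit");             /* same call for any board     */
    return 0;
}
```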
17. CARMA Model Framework
- CARMA seeks to integrate
- Graphical user interface
- COTS application mapper
- Candidates include Handel-C, Viva, StreamsC, CHAMPION, others?
- Graph-based job description
- Condensed Graphs, DAGMan, others?
- Robust management tool
- Distributed, scalable job scheduling
- Checkpointing, rollback and recovery for both host and RC units
- Multilevel monitoring service (GEMS)
- Clusters, networks, hosts, RC fabric
- Tradeoff issues down to RC level
- Flexible middleware API (adapt ACS?)
- Multiple types of RC boards
- Multiple high-speed networks
- SCI, Myrinet, GigE, InfiniBand
18. Preliminary Results
- Prototyping of initial mechanisms for CARMA framework
- ExMan, ConfigMan, TaskMan, and simple JS over ADM-XRC API
- Evaluation of cluster management tools
- Many commercial tools identified and evaluated (ref. GWU/GMU)
- Further evaluations are continuing with other noteworthy options
- Ideal goal is to easily support almost any existing general-purpose RMS
- Simulations of multilevel RC performance monitoring
- Gossip-Enabled Monitoring Service (GEMS)
- Several algorithms under development as test cases
- DES, Blowfish, RSA, Sonar Beamforming, Hyperspectral Imaging
- These and others form a set of benchmarks for RC cluster computing
- Other work of note by RC group members at Florida
- Modeling and simulation of RC-enhanced network processors
- Part of design team for Honeywell Reconfigurable Space Computer (HRSC)
19. Conclusions
- Broad coverage of RC design space
- With focus on COTS-based RC clusters for HPC
- Builds on lab strength in cluster computing and communications
- Key missing pieces in RC cluster design identified
- Initial framework for CARMA developed
- Design options being refined
- Prototyping of preliminary mechanisms almost complete
- Several test applications under development as RC cluster benchmarks
- Pursuing options for academic collaboration
- VPI, GWU, USC, UCB, FAMU-FSU, UC Cork, others?
- Commercial technologies under evaluation
- Hopeful of significant industry collaboration