CARMA: A Comprehensive Management Framework for High-Performance Reconfigurable Computing

1
CARMA: A Comprehensive Management Framework for
High-Performance Reconfigurable Computing
  • Ian A. Troxel, Aju M. Jacob, Alan D. George,
  • Raj Subramaniyan, and Matthew A. Radlinski
  • High-performance Computing and Simulation (HCS)
    Research Laboratory
  • Department of Electrical and Computer Engineering
  • University of Florida
  • Gainesville, FL

2
CARMA Motivation
  • Key missing pieces in RC for HPC
  • Dynamic RC fabric discovery and management
  • Coherent multitasking, multi-user environment
  • Robust job scheduling and management
  • Design for fault tolerance and scalability
  • Heterogeneous system support
  • Device independent programming model
  • Debug and system health monitoring
  • System performance monitoring into the RC fabric
  • Increased RC device and system usability
  • Our proposed Comprehensive Approach to
    Reconfigurable Management Architecture (CARMA)
    attempts to unify existing technologies as well
    as fill in missing pieces

CARMA
(Holy Fire by Alex Grey)
3
CARMA Framework Overview
  • CARMA seeks to integrate
  • Graphical user interface
  • Flexible programming model
  • COTS application mapper(s)
  • Handel-C, Impulse-C, Viva, System Generator, etc.
  • Graph-based job description
  • DAGMan, Condensed Graphs, etc.
  • Robust management tool
  • Distributed, scalable job scheduling
  • Checkpointing, rollback and recovery
  • Distributed configuration management
  • Multilevel monitoring service (GEMS)
  • Networks, hosts, and boards
  • Monitoring down into RC Fabric
  • Device independent middleware API
  • Multiple types of RC boards
  • PCI (many), network-attached, Pilchard
  • Multiple high-speed networks
  • SCI, Myrinet, GigE, InfiniBand, etc.

4
Application Mapper Evaluation
  • Evaluating on basis of ease of use, performance,
    hardware device independence, programming model,
    parallelization support, resource targeting,
    network support, stand-alone mapping, etc.
  • C-Based tools
  • Celoxica - SDK (Handel-C)
  • Provides access to in-house boards: ADM-XRC (x1),
    Tarari (x4), RC1000 (x4)
  • Good deal of success after lessons learned
  • Hardware design focused
  • Impulse Accelerated Technologies - Impulse-C
  • Provides an option for hardware independence
  • Built upon open source Streams-C from LANL
  • Supports ANSI standard C
  • Graphical tools
  • StarBridge Systems - Viva
  • Nallatech - Fuse / DIMEtalk
  • Annapolis Micro Systems - CoreFire
  • Xilinx - ISE compulsory
  • Evaluating the role of JBits, System Generator,
    and XHWIF

Streams-C c/o LANL
5
CARMA Interface
  • Simple graphical user interface
  • Preliminary basis for graphical user interface
    via the Simple Web Interface Link Library (SWILL)
    from the University of Chicago
  • User view for authentication and job
    submission/status
  • Administration view for system status and
    maintenance
  • Applications supported
  • Single or multiple tasks per job (via CARMA
    DAGs)
  • CARMA registered (via CARMA API and DAGs) or not
  • Provides security, fault tolerance
  • Sequential and parallel (hand-coded or via MPI)
  • C-based application mappers supported
  • CARMA middleware API provides architecture
    independence
  • Any code that can link to the CARMA API library
    can be executed (Handel-C and ADM-XRC API tested
    to date)
  • Bit files must be registered with the CARMA
    Configuration Manager (CM)
  • All other mappers can use the "not CARMA
    registered" mode
  • Plans for linking Streams/Impulse-C, System
    Generator, et al.

http://systems.cs.uchicago.edu/swill/
Similar to Condor DAGs
6
CARMA User Interface
7
CARMA Job Manager (JM)
CARMA DAG Example
  • Prototyping effort (CARMA interoperability)
  • Completed first version of CARMA JM
  • Task-based execution via Condor-like DAGs
  • Separate processes and message queues for
    fault-tolerance
  • Checkpointing enabled with rollback in progress
  • Links to all other CARMA components
  • Fully distributed multi-node operation with
    job/task migration
  • Links to CARMA monitor and GEMS to make
    scheduling decisions
  • Tradeoff studies and analyses underway
  • External extensions to COTS tools (COTS plug and
    play)
  • Expand upon preliminary work at GWU/GMU
  • Striving for plug and play approach to JM
  • CARMA Monitor provides board info. (via ELIM)
  • Working to link to CARMA CM
  • Tradeoff studies and analysis underway
  • Integration of other CARMA components in progress
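
A minimal sketch of task-based execution over a Condor-like DAG, written in Python. The task names, the run_dag helper, and the dictionary layout are illustrative assumptions, not the CARMA JM's actual job format. The sketch only shows the core idea: a task is released once all of its parent tasks have finished.

```python
from collections import deque

def run_dag(parents, run_task):
    """Run tasks in dependency order (Kahn's algorithm).

    parents maps each task name to the list of tasks it depends on.
    run_task is called once per task, after all of its parents are done.
    Returns the order in which tasks were released.
    """
    # Count unfinished parents per task and build the child lists.
    pending = {task: len(deps) for task, deps in parents.items()}
    children = {task: [] for task in parents}
    for task, deps in parents.items():
        for dep in deps:
            children[dep].append(task)

    # Tasks with no parents can start immediately.
    ready = deque(sorted(t for t, n in pending.items() if n == 0))
    order = []
    while ready:
        task = ready.popleft()
        run_task(task)
        order.append(task)
        for child in children[task]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(parents):
        raise ValueError("cycle detected: not a DAG")
    return order

# Hypothetical job: two independent tasks feed a final merge task.
example_job = {
    "add_cpu": [],
    "add_one_rc": [],
    "merge": ["add_cpu", "add_one_rc"],
}
```

Calling run_dag(example_job, print) would release the two independent tasks first and the merge task last. A real job manager would dispatch ready tasks to remote nodes instead of calling a local function.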

c/o GWU/GMU
Kris Gaj, Tarek El-Ghazawi, et al., "Effective
Utilization and Reconfiguration of Distributed
Hardware Resources Using Job Management Systems,"
Reconfigurable Architectures Workshop 2003, Nice,
France, April 2003.
8
CARMA CM Design
  • Builds upon previous design concepts
  • Execution Manager (EM)
  • Forks tasks from JM and returns results to JM
  • Requests and releases configurations
  • Configuration Manager (CM)
  • Manages configuration transport and caching
  • Loads, unloads configurations via BIM
  • Board Interface Module (BIM)
  • Provides board independence
  • Allows for configuration temporal locality
    benefits
  • Communication Module
  • Handles all inter-node communication
  • Board Interface Module (BIM)
  • Configures and interfaces with diverse set of RC
    boards
  • Numerous PCI-based boards
  • Various interfaces for network attached RC
  • Instantiated at startup
  • Provides hardware independence to higher layers
  • Separate BIM for each supported board
  • Simple standard interface to boards for remote
    nodes
  • Enhances security by authenticating data and
    configurations
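
A minimal Python sketch of how a configuration manager might cache bit files and load them through a per-board interface module. The class names (BoardInterface, FakeBoard, ConfigCache), the fetch callback, and the least-recently-used eviction policy are all assumptions made for illustration; the slides do not specify the CM's interfaces or its caching policy.

```python
from collections import OrderedDict

class BoardInterface:
    """Hypothetical BIM-style interface: one subclass per board type."""
    def load(self, name, bitstream):
        raise NotImplementedError

class FakeBoard(BoardInterface):
    """Stand-in board that only records which configurations were loaded."""
    def __init__(self):
        self.loaded = []

    def load(self, name, bitstream):
        self.loaded.append(name)

class ConfigCache:
    """Keeps recently used configurations local (LRU is an assumption)."""
    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch              # callable that retrieves a bit file
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, name):
        if name in self.entries:
            self.hits += 1
            self.entries.move_to_end(name)        # mark as most recently used
            return self.entries[name]
        self.misses += 1
        bitstream = self.fetch(name)
        self.entries[name] = bitstream
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)      # evict least recently used
        return bitstream

def configure(board, cache, name):
    """Load a configuration onto a board through its interface module."""
    board.load(name, cache.get(name))
```

Higher layers call configure without knowing the board type, which is the hardware independence the BIM is meant to provide. Repeated requests for the same configuration are served from the cache, which is where the temporal locality benefit comes from.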

U. of Glasgow (Rage), Imperial College in UK,
U. Washington, among others
9
Distributed CM Management Schemes
[Figure: distributed CM management schemes. Diagram labels: jobs submitted centrally; global view of the system at all times; APP, APP MAP, GJM, GRMAN, LRMON, Local Sys; results and statistics; tasks and states; network.]
  • Client-Server (CS)
  • Master-Worker (MW)
  • Client-Broker (CB)
  • Simple Peer-to-Peer (SPP)
Note: More in-depth results for distributed CM
appeared at ERSA'04
10
CM System Recommendations
  • Scalability projected up to 4096 nodes
  • Performed analytic scalability analysis based on
    16-node experimental results
  • Dual 2.4 GHz Xeons and a Tarari CPX2100 HPC board
    in a 64-bit/66 MHz PCI slot
  • Gigabit Ethernet and 5.3 Gbps Scalable Coherent
    Interface (SCI) control and data networks
    respectively
  • Flat system of 4096 has very high completion
    times (5 minutes for SPP and 83 hrs for CS)
  • Layered hierarchy needed for reasonable
    completion times (2.5 sec for SPP over SPP at
    4096 nodes)
  • CS reduces network traffic by sacrificing
    response time and SPP improves response time by
    increasing network utilization
  • Conclusions
  • CARMA CM design imposes very little overhead on
    the system
  • Hierarchical scheme needed to scale to systems of
    thousands of nodes (traditional MW will not work)
  • Multiple servers for CS scheme do not reduce the
    server bottleneck for system sizes greater than
    32
  • SPP over CS (group size 8) best overall
    performance for systems larger than 512 nodes

Schemes with completion latency values greater
than 5 seconds excluded
11
CARMA Monitoring Services
  • Monitoring service
  • Statistics Collector
  • Gathers local and remote information
  • Updates GEMS and local values
  • Query Processor
  • Processes task scheduling requests from JM
  • Maintains local information
  • Round-Robin Database
  • Compact way to store performance logs
  • Supports simple query interface
  • CARMA Diagnostic
  • System watchdog alerts based on defined
    heuristics of failure conditions
  • Provides system monitoring and debug
  • Initial monitor version is complete
  • Studying FPGA monitoring options
  • Increasing the scheduling options
  • Tradeoff studies and analyses underway
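
A minimal Python sketch of the round-robin idea behind compact performance logs: storage is fixed in size, and each new sample overwrites the oldest one. The RoundRobinLog class and its query methods are hypothetical names for illustration and are not the CARMA monitor's actual interface.

```python
from collections import deque

class RoundRobinLog:
    """Fixed-size log: the newest sample overwrites the oldest."""
    def __init__(self, slots):
        # A deque with maxlen silently drops the oldest entry when full.
        self.samples = deque(maxlen=slots)

    def record(self, timestamp, value):
        self.samples.append((timestamp, value))

    def query(self, start, end):
        """Return values whose timestamps fall in [start, end]."""
        return [v for t, v in self.samples if start <= t <= end]

    def average(self, start, end):
        """Mean of the values in [start, end], or None if there are none."""
        values = self.query(start, end)
        return sum(values) / len(values) if values else None
```

Because the log never grows past its slot count, memory use per monitored metric stays constant no matter how long the system runs.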

Gossip-Enabled Monitoring Service (GEMS)
developed by HCS Lab for robust, scalable,
multilevel monitoring of resource health and
performance. For more info. see
http://www.hcs.ufl.edu/gems
12
CARMA End-to-End Service Description
CARMA Execution Stages
  • Functionality demonstrated to date
  • Graphical user interface
  • Job/task scheduling based on board requirements
    and configuration temporal locality
  • Parallel and serial jobs
  • CARMA registered and non-registered tasks
  • Remote execution and result retrieval
  • Configuration caching and management
  • Mixed RC and CPU-only tasks
  • Heterogeneous board execution (3 types thus far)
  • System and RC device monitoring
  • Inter-node communication via SCI or TCP/IP/GigE
  • Fault-tolerant design
  • Processes can be restarted while running
  • Virtually no system impact from CARMA overhead
    despite use of unoptimized code
  • Less than 5MB RAM per node
  • Less than 0.1 processor utilization on a 2.4 GHz
    Xeon server
  • Less than 200 Kbps network utilization
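
A minimal Python sketch of scheduling on board requirements and configuration temporal locality. The node dictionaries, the pick_node helper, and the tie-break on queue length are assumptions for illustration; the slides state the two scheduling criteria but not the exact algorithm.

```python
def pick_node(nodes, board_type, config):
    """Choose a node for a task.

    nodes is a list of dicts with keys "name", "board", "cached" (the set
    of configuration names already on the node) and "load" (queued tasks).
    Returns the chosen node name, or None if no node has the board.
    """
    # Board requirement: only nodes with the right board are eligible.
    candidates = [n for n in nodes if n["board"] == board_type]
    if not candidates:
        return None
    # Prefer a node that already holds the configuration (temporal
    # locality), then the least loaded node, then name for determinism.
    best = min(
        candidates,
        key=lambda n: (config not in n["cached"], n["load"], n["name"]),
    )
    return best["name"]
```

With this ordering, a busier node that already holds the configuration is chosen over an idle node that would need to fetch and load it first. Whether that tradeoff is right depends on configuration size and load time.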

13
CARMA Framework Verification
  • Several test jobs executed concurrently
  • Parallel Add Test composed of
  • ADD.exe, a CPU-only task to add two numbers
  • AddOne.bit, an RC task to increment input value
  • Parallel N-Queens Test composed of
  • ADD.exe, a CPU-only task to add two numbers
  • NQueens.bit, an RC1000 task to calculate a subset
    of the total number of solutions for an N×N board
  • 4 RC1000s and 4 Tararis communicating via MPI
  • Parallel Sieve of Eratosthenes (on Tarari)
  • Parallel Monte Carlo Pi Generator (on Tarari)
  • Blowfish encrypt/decrypt (on ADM-XRC)
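
A minimal Python sketch of how the N-Queens solution count can be split into independent subsets that are summed afterwards. Partitioning by the column of the first-row queen is an assumption; the slides do not say how NQueens.bit divides the work. The sketch runs in software only and does not model the RC1000 hardware or the MPI communication.

```python
def count_from_first_column(n, first_col):
    """Count N-Queens solutions with the row-0 queen fixed in first_col."""
    def place(row, cols, diag1, diag2):
        if row == n:
            return 1
        total = 0
        for col in range(n):
            d1, d2 = row - col, row + col
            # Skip squares attacked along a column or either diagonal.
            if col in cols or d1 in diag1 or d2 in diag2:
                continue
            total += place(row + 1, cols | {col}, diag1 | {d1}, diag2 | {d2})
        return total

    return place(1, {first_col}, {-first_col}, {first_col})

def partition_columns(n, workers):
    """Deal first-row columns out to workers round-robin."""
    return [list(range(w, n, workers)) for w in range(workers)]

def count_subset(n, columns):
    """Work for one worker: solutions for its share of first-row columns."""
    return sum(count_from_first_column(n, c) for c in columns)
```

Each worker calls count_subset on its own column list, and the partial counts are added to give the total. Since the subsets share no state, they can run on separate boards without coordination until the final sum.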

N-Queens Test
Par. Add Test
Example System Setup
These simple applications were used to test CARMA's
functionality, while CARMA's services have wider
applicability to problems of greater size and
complexity.
14
Conclusions
  • First working version of CARMA complete and tested
  • Numerous features supported
  • Simple GUI front-end interface
  • Coherent multitasking, multi-user environment
  • Dynamic RC fabric discovery and management
  • Robust job scheduling and management
  • Fault-tolerant and scalable services by design
  • Performance monitoring down into the RC fabric
  • Heterogeneous board support with hardware
    independence
  • Linking to COTS job management service
  • Initial testing shows the framework to be sound
    with very little overhead imposed upon the system

15
Future Work and Acknowledgements
  • Continue to fill in additional CARMA features
  • Include support for other boards, application
    mappers, and languages
  • Complete JM rollback feature and finish linkage
    to LSF
  • Include broker and caching mechanisms for the
    peer-to-peer distributed CM scheme
  • Include more intelligent scheduling algorithms
    (e.g. Last Release Time)
  • Expand RC device monitoring and include debug and
    opt. mechanisms
  • Enhance security including secure data transfer
    and authentication
  • Deploy on a large-scale test facility
  • Develop CARMA instantiations for other RC domains
  • Distributed shared-memory machines with RC (e.g.
    SGI Altix)
  • Embedded RC systems (e.g. satellite/aircraft
    systems, munitions)
  • We wish to thank the following for supporting
    this research
  • Department of Defense
  • Xilinx
  • Celoxica
  • Alpha Data
  • Tarari