1
Hackystat and the DARPA High Productivity
Computing Systems Program
Philip Johnson
University of Hawaii
2
Overview of HPCS
3
High Productivity Computing Systems
  • Goal
  • Provide a new generation of economically viable
    high productivity computing systems for the
    national security and industrial user community
    (2007-2010)
  • Impact
  • Performance (time-to-solution): speed up critical
    national security applications by a factor of 10X
    to 40X
  • Programmability (time-for-idea-to-first-solution):
    reduce the cost and time of developing application
    solutions
  • Portability (transparency): insulate research and
    operational application software from the system
  • Robustness (reliability): apply all known
    techniques to protect against outside attacks,
    hardware faults, and programming errors

HPCS Program Focus Areas
  • Applications
  • Intelligence/surveillance, reconnaissance,
    cryptanalysis, weapons analysis, airborne
    contaminant modeling and biotechnology

Fill the critical technology and capability gap:
from today (late-80s HPC technology) to the future
(quantum/bio computing)
4
Vision: Focus on the Lost Dimension of HPC -
User/System Efficiency and Productivity
[Diagram: evolution from 1980s technology (vector
and parallel vector systems) through commodity HPCs
and tightly coupled parallel systems to 2010
high-end computing solutions. Moore's Law doubled
raw performance every 18 months; the new goal is to
double value every 18 months and fill the high-end
computing technology and capability gap for
critical national security missions.]
5
HPCS Technical Considerations
[Diagram: architecture types and communication
programming models, from custom vector systems
(vector supercomputers, parallel vector,
shared-memory multi-processing, symmetric
multiprocessors / distributed shared memory) to
commodity HPC (massively parallel processors,
commodity clusters, grids, distributed-memory
multi-computing with MPI) and scalable vector
systems built on microprocessors. HPCS focus:
tailorable, balanced solutions.]
Single-point design solutions are no longer
acceptable.
6
HPCS Program Phases I - III
[Timeline diagram, fiscal years 2002 through 2010:
Phase I (industry concept study), Phase II (R&D),
and Phase III (full-scale development), with concept
reviews, system design review, PDR, DDR, Phase II
and Phase III readiness reviews, and industry
procurements as critical program milestones.
Academia contributes early metrics and benchmarks,
research software, and pilot platforms; industry
delivers HPCS capability or products. Supporting
activities: requirements and metrics, application
analysis / performance assessment, research
prototypes / pilot systems, and technology
assessments.]
7
Application Analysis / Performance Assessment
Activity Flow (DARPA HPCS Program Motivation)
[Flow diagram. Inputs: DDR&E IHEC mission analysis;
HPCS applications (1. cryptanalysis, 2. signal and
image processing, 3. operational weather, 4.
nuclear stockpile stewardship, 5. etc.); mission
partners (DoD, DOE/NNSA, NSA, NRO). Application
analysis defines system requirements and
characteristics and yields common critical kernels,
compact applications, and full applications.
Benchmarks and metrics: productivity as the ratio
of utility to cost, development time (cost),
execution time (cost), and implicit factors.
Impacts: HPCS technology drivers for the
participants (Cray, IBM, Sun) and improved mission
capability for the mission partners.]
8
Workflow Priorities and Goals
  • Implicit Productivity Factors
  • Workflow: Perf. / Prog. / Port. / Robust.
  • Researcher: High
  • Enterprise: High / High / High / High
  • Production: High / High

Mission Needs
System Requirements
  • Workflows define scope of customer priorities
  • Activity and Purpose benchmarks will be used to
    measure Productivity
  • HPCS Goal is to add value to each workflow
  • Increase productivity while increasing problem
    size

9
Productivity Framework Overview
[Diagram: Phase I defines the framework and scopes
petascale requirements; Phase II implements the
framework and performs design assessments; Phase
III transitions it to HPC procurement as a quality
framework with acceptance-level tests. Workflows
(production, enterprise, researcher) and activity /
purpose benchmarks feed value metrics for execution
and development. HPCS vendors, FFRDC and government
R&D partners, and mission agencies run evaluation
experiments against preliminary and final
multilevel system models, prototypes, and SN001,
under a commercial or nonprofit productivity
sponsor.]
HPCS needs to develop a procurement-quality
assessment methodology that will be the basis of
2010 HPC procurements.
10
HPCS Phase II Teams
  • Industry
  • PIs: Elnozahy, Rulifson, Smith
  • Goal
  • Provide a new generation of economically viable
    high productivity computing systems for the
    national security and industrial user community
    (2007-2010)
  • Productivity Team (Lincoln lead)
  • MIT Lincoln Laboratory, LCS, Ohio State
  • PIs: Lucas; Benson, Snavely; Kepner; Basili;
    Koester; Vetter, Lusk, Post, Bailey; Gilbert,
    Edelman, Ahalt, Mitchell
  • Goal
  • Develop a procurement-quality assessment
    methodology that will be the basis of 2010 HPC
    procurements
11
Motivation: Metrics Drive Designs
You get what you measure
  • Execution Time (Example)
  • Current metrics favor caches and pipelines
  • Systems ill-suited to applications with
  • Low spatial locality
  • Low temporal locality
  • Development Time (Example)
  • No metrics widely used
  • Least common denominator standards
  • Difficult to use
  • Difficult to optimize

[Two tradeoff charts. One plots applications by
spatial vs. temporal locality: Table Toy / GUPS
(intelligence), large FFTs (reconnaissance),
adaptive multi-physics (weapons design, vehicle
design, weather), Stream Add, and Top500 Linpack
Rmax, with HPCS targeting the full space. The other
plots language expressiveness vs. language
performance: Assembly/VHDL, SIMD/DMA, C/Fortran
with MPI/OpenMP, UPC/CAF, Matlab/Python, and
high-performance high-level languages, with HPCS
tradeoffs aiming at both high expressiveness and
high performance.]
12
Phase 1 Productivity Framework
[Framework diagram. Work flows and activity /
purpose benchmarks drive an actual system or model
through a common modeling interface. Example system
parameters: BW bytes/flop (balance), memory
latency, memory size, processor flop/cycle,
processor integer op/cycle, bisection BW, size
(ft3), power/rack, facility operation. Outputs:
execution time (cost) and development time (cost),
the latter including code size, restart time
(reliability), and code optimization time, feeding
productivity metrics, with productivity defined as
the ratio of utility to cost.]
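The "ratio of utility to cost" above can be written out explicitly. The following is only a hedged sketch of the generic form, splitting cost into the two framework outputs; it is not any specific HPCS productivity model:

\[
\text{Productivity} \;=\; \frac{\text{Utility}}{\text{Cost}} \;\approx\; \frac{U}{C_{\text{development}} + C_{\text{execution}}}
\]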
13
Phase 2 Implementation
[Same framework as Phase 1, annotated with the
implementing organizations: Mitre, ISI, LBL,
Lincoln, HPCMO, LANL, and mission partners
(activity / purpose benchmarks); Lincoln, OSU, and
CodeSourcery; ISI, LLNL, and UCSD (performance
analysis); Lincoln, UMD, and mission partners
(metrics analysis of current and new codes); MIT,
UCSB, UCSD, UMD, and USC (university experiments);
and the ANL Pmodels group (common modeling
interface). Example system parameters (BW
bytes/flop, memory latency, memory size, processor
flop/cycle, processor integer op/cycle, bisection
BW, size, power/rack, facility operation) feed the
execution (Exe) and development (Dev) interfaces,
producing execution time (cost) and development
time (cost), including code size, restart time
(reliability), and code optimization time, and
yielding productivity metrics (ratio of utility to
cost).]
Contains proprietary information - for Government
use only
14
HPCS Mission Work Flows
[Diagram of overall and development cycles for the
three workflows. Researcher: an overall cycle of
development and execution, with turnaround goals
moving from days to hours and from hours to
minutes. Enterprise: port legacy software, with
goals moving from months to days. Production:
initial product development (design, code, test,
then port, scale, optimize), with initial
development moving from years to months and
response time from hours to minutes.]
HPCS productivity factors (Performance,
Programmability, Portability, and Robustness) are
very closely coupled with each work flow.
15
HPC Workflow SW Technologies
  • Production Workflow
  • Many technologies targeting specific pieces of
    workflow
  • Need to quantify workflows (stages and time
    spent)
  • Need to measure technology impact on stages

[Diagram: the production workflow runs from spec
through algorithm development and design / code /
test on a workstation to port / scale / optimize
and run on a supercomputer. Supporting software
technologies (operating systems, compilers,
libraries, tools, problem-solving environments)
include Linux, RT Linux, C, F90, Matlab, UPC,
Co-array Fortran, Java, OpenMP, ATLAS, BLAS, FFTW,
PETE, PAPI, VSIPL, VSIPL++, MPI, CORBA, DRI, UML,
Globus, TotalView, POOMA, CCA, PVL, and ESMF,
spanning HPC software and mainstream software.]
16
Prototype Productivity Models
Efficiency and Power (Kennedy, Koelbel, Schreiber)
Special Model with Work Estimator (Sterling)
Utility (Snir)
Productivity Factor Based (Kepner)
Least Action (Numrich)
COCOMO II (software engineering community)
Time-To-Solution (Kogge)
HPCS has triggered groundbreaking activity in
understanding HPC productivity; the community is
focused on quantifiable productivity (potential for
broad impact).
17
Example: Existing Code Analysis
Analysis of existing codes is used to test metrics
and to identify important trends in productivity
and performance.
18
Example Experiment Results (N=1)
  • Same application (image filtering)
  • Same programmer
  • Different langs/libs
  • Matlab
  • BLAS
  • BLAS/OpenMP
  • BLAS/MPI
  • PVL/BLAS/MPI
  • MatlabMPI
  • pMatlab

[Chart: performance (speedup x efficiency) vs.
development time (lines of code) for each
implementation. Single-processor: Matlab and BLAS
(current practice). Shared memory: BLAS/OpenMP.
Distributed memory: BLAS/MPI, PVL/BLAS/MPI,
MatlabMPI, and pMatlab (research); one point is an
estimate.]
Controlled experiments can potentially measure
the impact of different technologies and quantify
development time and execution time tradeoffs
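As a minimal illustration of the plotted quantities (not the experiment's actual analysis code), the following sketch computes the two chart axes from made-up placeholder numbers:

```python
# Minimal sketch of the two chart axes; all numbers are made-up placeholders.

def speedup(t_serial, t_parallel):
    """Classic speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: speedup per processor."""
    return speedup(t_serial, t_parallel) / n_procs

# Hypothetical measurements for one implementation variant.
t_serial, t_parallel, n_procs = 120.0, 10.0, 16   # seconds, seconds, processors
lines_of_code = 250                               # development-time proxy

perf = speedup(t_serial, t_parallel) * efficiency(t_serial, t_parallel, n_procs)
print(f"performance axis (speedup x efficiency): {perf:.1f}")
print(f"development-time axis (lines of code): {lines_of_code}")
```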
19
Summary
  • Goal is to develop an acquisition quality
    framework for HPC systems that includes
  • Development time
  • Execution time
  • Have assembled a team that will develop models,
    analyze existing HPC codes, develop tools and
    conduct HPC development time and execution time
    experiments
  • Measures of success
  • Acceptance by users, vendors and acquisition
    community
  • Quantitatively explain HPC rules of thumb
  • "OpenMP is easier than MPI, but doesnt scale a
    high
  • "UPC/CAF is easier than OpenMP
  • "Matlab is easier the Fortran, but isnt as fast
  • Predict impact of new technologies

20
Example Development Time Experiment
  • Goal: Quantify development time vs. execution
    time tradeoffs of different parallel programming
    models
  • Message passing (MPI)
  • Threaded (OpenMP)
  • Array (UPC, Co-Array Fortran)
  • Setting: Senior / 1st-year grad class in parallel
    computing (MIT/BU, Berkeley/NERSC, CMU/PSC,
    UMD/?, )
  • Timeline
  • Month 1: Intro to parallel programming
  • Month 2: Implement serial version of compact app
  • Month 3: Implement parallel version
  • Metrics
  • Development time (from logs, as sketched below),
    SLOCS, function points,
  • Execution time, scalability, comp/comm, speedup,
  • Analysis
  • Development time vs. Execution time of different
    models
  • Performance relative to expert implementation
  • Size relative to expert implementation

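For the "development time (from logs)" metric, one common approach is to sum the gaps between consecutive timestamped events while ignoring long idle gaps. The sketch below illustrates that idea only; the cutoff and event format are assumptions, not the study's actual procedure:

```python
# Sketch: estimate active development time from timestamped tool events by
# summing gaps between consecutive events that fall under an idle cutoff.
# The 15-minute cutoff is an arbitrary assumption, not the experiment's rule.
from datetime import datetime, timedelta

def active_time(timestamps, idle_cutoff=timedelta(minutes=15)):
    """timestamps: ISO-8601 strings from editor/compile/test logs."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    total = timedelta()
    for prev, cur in zip(times, times[1:]):
        gap = cur - prev
        if gap <= idle_cutoff:       # count only plausibly active intervals
            total += gap
    return total

events = ["2006-03-01T09:00:00", "2006-03-01T09:10:00",
          "2006-03-01T11:00:00", "2006-03-01T11:05:00"]
print(active_time(events))           # 0:15:00 with these placeholder events
```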
21
Hackystat in HPCS
22
About Hackystat
  • Five years old
  • I wrote the first LOC during the first week of
    May, 2001.
  • Current size: 320,562 LOC (not all mine)
  • 5 active developers
  • Open source, GPL
  • General application areas
  • Education: teaching measurement in SE
  • Research: Test-Driven Design, Software Project
    Telemetry, HPCS
  • Industry: project management
  • Has inspired a startup: 6th Sense Analytics

23
Goals for Hackystat-HPCS
  • Support automated collection of useful low-level
    data for a wide variety of platforms,
    organizations, and application areas.
  • Make Hackystat low-level data accessible in a
    standard XML format for analysis by other tools
    (a sketch of such a record follows below).
  • Provide workflow and other analyses over
    low-level data collected by Hackystat and other
    tools to support
  • discovery of developmental bottlenecks
  • insight into impact of tool/language/library
    choice for specific applications/organizations.

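To make the "standard XML format" goal concrete, here is a minimal sketch of emitting and re-parsing one low-level sensor event. The element and attribute names (SensorData, Event, tool, type, resource, timestamp) are assumptions for illustration, not Hackystat's actual schema:

```python
# Sketch: write one low-level sensor event as XML and read it back.
# Element and attribute names are hypothetical, not Hackystat's real schema.
import xml.etree.ElementTree as ET

def write_event(path, tool, event_type, resource, timestamp):
    root = ET.Element("SensorData")
    ET.SubElement(root, "Event", tool=tool, type=event_type,
                  resource=resource, timestamp=timestamp)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

def read_events(path):
    return [e.attrib for e in ET.parse(path).getroot().iter("Event")]

write_event("sensordata.xml", "Emacs", "Edit",
            "gauss_seidel.c", "2006-03-01T10:15:00")
print(read_events("sensordata.xml"))
```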
24
Pilot Study, Spring 2006
  • Goal: Explore issues involved in workflow
    analysis using Hackystat and students.
  • Experimental conditions (were challenging)
  • Undergraduate HPC seminar
  • 6 students total; 3 did the assignment, 1
    collected data.
  • 1 week duration
  • Gauss-Seidel iteration problem, written in C
    using the PThreads library, on a cluster
  • As a pilot study, it was successful.

25
Data Collection Sensors
  • Sensors for Emacs and Vim captured editing
    activities.
  • Sensor for CUTest captured testing activities.
  • Sensor for Shell captured command line
    activities.
  • Custom makefile with compilation, testing, and
    execution targets, each instrumented with sensors
    (a wrapper sketch follows below).

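One plausible shape for the instrumented targets is a small wrapper that records an event and then runs the real command; the makefile target invokes the wrapper instead of the tool directly. This is a sketch only; the script name (sensor_wrap.py), event format, and log file are assumptions, not the pilot study's actual instrumentation:

```python
#!/usr/bin/env python
# Sketch of a wrapper a makefile target might call, for example:
#   run: ; python sensor_wrap.py Execute ./gauss_seidel
# The script name, event format, and log file are hypothetical.
import subprocess
import sys
import time

def record(event_type, command):
    """Append one timestamped event to a local log before running the command."""
    with open("build_events.log", "a") as log:
        log.write(f"{time.strftime('%Y-%m-%dT%H:%M:%S')} {event_type} {command}\n")

if __name__ == "__main__":
    event_type, cmd = sys.argv[1], sys.argv[2:]   # e.g. "Compile", ["gcc", "-o", ...]
    record(event_type, " ".join(cmd))
    sys.exit(subprocess.call(cmd))                # hand off to the real command
```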
26
Example data: Editor activities
27
Example data: Testing
28
Example data: File Metrics
29
Example data: Shell Logger
30
Data Analysis: Workflow States
  • Our goal was to see if we could automatically
    infer the following developer workflow states
  • Serial coding
  • Parallel coding
  • Validation/Verification
  • Debugging
  • Optimization

31
Workflow State Detection: Serial Coding
  • We defined the "serial coding" state as the
    editing of a file not containing any parallel
    constructs, such as MPI, OpenMP, or PThread
    calls.
  • We determine this through the MakeFile, which
    runs SCLC over the program at compile time and
    collects Hackystat FileMetric data that provides
    counts of parallel constructs (a classification
    sketch follows below).
  • We were able to identify the Serial Coding state
    if the MakeFile was used consistently.

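A minimal sketch of this inference step: count parallel constructs in the edited file and label the activity accordingly. The construct patterns are illustrative stand-ins for the counts SCLC reports, and the file name is hypothetical:

```python
# Sketch: label an edited file as "serial coding" or "parallel coding" by
# counting parallel constructs. The regexes are illustrative stand-ins for
# SCLC's construct counts; they are not SCLC's real counters.
import re

PARALLEL_PATTERNS = [r"\bMPI_\w+", r"#pragma\s+omp", r"\bpthread_\w+"]

def coding_state(source_text):
    hits = sum(len(re.findall(p, source_text)) for p in PARALLEL_PATTERNS)
    return "parallel coding" if hits > 0 else "serial coding"

with open("gauss_seidel.c") as f:     # hypothetical file from the assignment
    print(coding_state(f.read()))
```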
32
Workflow State Detection: Parallel Coding
  • We defined the "parallel coding" state as the
    editing of a file containing a parallel construct
    (MPI, OpenMP, PThread call).
  • Similarly to serial coding, we get the data
    required to infer this phase using a MakeFile
    that runs SCLC and collects FileMetric data.
  • We were able to identify the parallel coding
    state if the MakeFile was used consistently.

33
Workflow State Detection: Testing
  • We defined the "testing" state as the invocation
    of unit tests to determine the functional
    correctness of the program.
  • Students were provided with test cases and with
    CUTest to test their programs.
  • We were able to infer the Testing state if CUTest
    was used consistently.

34
Workflow State Detection: Debugging
  • We have not yet been able to generate
    satisfactory heuristics to infer the "debugging"
    state from our data.
  • Students did not use a debugging tool that would
    have allowed instrumentation with a sensor.
  • UMD heuristics, such as the presence of "printf"
    statements, were not collected by SCLC.
  • Debugging is entwined with Testing.

35
Workflow State Detection: Optimization
  • We have not yet been able to generate
    satisfactory heuristics to infer the
    "optimization" state from our data.
  • Students did not use a performance analysis tool
    that would have allowed instrumentation with a
    sensor.
  • Repeated command line invocation of the program
    could potentially identify the activity as
    "optimization" (see the heuristic sketch below).

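The repeated-invocation idea can be stated as a simple heuristic over shell-logger events. A sketch under assumed event fields; the 30-minute window and three-run threshold are arbitrary, unvalidated choices:

```python
# Sketch: tag a burst of repeated runs of the same program as "optimization".
# Event fields, window size, and repetition threshold are all assumptions.
from datetime import datetime, timedelta

def looks_like_optimization(events, window_minutes=30, min_runs=3):
    """events: (timestamp_str, command_str) pairs from a shell logger."""
    runs = sorted(datetime.fromisoformat(ts) for ts, cmd in events
                  if cmd.startswith("./gauss_seidel"))
    window = timedelta(minutes=window_minutes)
    return any(sum(1 for t in runs if start <= t <= start + window) >= min_runs
               for start in runs)

events = [("2006-03-01T14:00:00", "./gauss_seidel 1000"),
          ("2006-03-01T14:05:00", "./gauss_seidel 1000 -t 4"),
          ("2006-03-01T14:12:00", "./gauss_seidel 1000 -t 8")]
print(looks_like_optimization(events))   # True for these placeholder events
```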
36
Insights from the pilot study, 1
  • Automatic inference of these workflow states in a
    student setting requires
  • Consistent use of MakeFile (or some other
    mechanism to invoke SCLC consistently) to infer
    serial coding and parallel coding workflow
    states.
  • Consistent use of an instrumented debugging tool
    to infer the debugging workflow state.
  • Consistent use of an "execute" MakeFile target
    (and/or an instrumented performance analysis
    tool) to infer the optimization workflow state.

37
Insights from the pilot study, 2
  • Ironically, it may be easier to infer workflow
    states from industrial settings than from
    classroom settings!
  • Industrial settings are more likely to use a
    wider variety of tools, which could be
    instrumented to provide better insight into
    development activities.
  • Large scale programming leads inexorably to
    consistent use of MakeFiles (or similar scripts)
    that should simplify state inference.

38
Insights from the pilot study, 3
  • Are we defining the right set of workflow states?
  • For example, the "debugging" phase seems
    difficult to distinguish as a distinct state.
  • Do we really need to infer "debugging" as a
    distinct activity?
  • Workflow inference heuristics appear to be highly
    contextual, depending upon the language, toolset,
    organization, and application. (This is not a
    bug; this is just reality. We will probably need
    to enable each mission partner to develop
    heuristics that work for them.)

39
Next steps
  • Graduate HPC classes at UH.
  • The instructor (Henri Casanova) has agreed to
    participate with UMD and UH/Hackystat in data
    collection and analysis.
  • Bigger assignments, more sophisticated students,
    hopefully a larger class!
  • Workflow Inference System for Hackystat (WISH)
  • Support export of raw data to other tools.
  • Support import of raw data from other tools.
  • Provide a high-level, rule-based inference
    mechanism to support organization-specific
    heuristics for workflow state identification (a
    sketch follows below).
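One possible shape for such a rule-based mechanism, sketched with made-up event fields and rule structures to show how organization-specific heuristics could be plugged in (not WISH's actual design):

```python
# Sketch of an organization-specific, rule-based workflow-state classifier.
# Rule order, event fields, and state names are illustrative assumptions,
# not WISH's actual design.

def classify(event, rules):
    """Return the first workflow state whose predicate matches the event."""
    for state, predicate in rules:
        if predicate(event):
            return state
    return "unknown"

# Rules one organization might register (all field names hypothetical).
rules = [
    ("testing",         lambda e: e.get("tool") == "CUTest"),
    ("parallel coding", lambda e: e.get("type") == "Edit"
                                  and e.get("parallel_constructs", 0) > 0),
    ("serial coding",   lambda e: e.get("type") == "Edit"),
    ("optimization",    lambda e: e.get("type") == "Execute"
                                  and e.get("repeated", False)),
]

print(classify({"tool": "Emacs", "type": "Edit", "parallel_constructs": 4},
               rules))                  # -> "parallel coding"
```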