Title: Hackystat and the DARPA High Productivity Computing Systems Program. Philip Johnson, University of Hawaii

1. Hackystat and the DARPA High Productivity Computing Systems Program
Philip Johnson, University of Hawaii
2. Overview of HPCS
3. High Productivity Computing Systems
- Goal
- Provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2007 to 2010)
- Impact
- Performance (time-to-solution): speed up critical national security applications by a factor of 10X to 40X
- Programmability (time-for-idea-to-first-solution): reduce the cost and time of developing application solutions
- Portability (transparency): insulate research and operational application software from the system
- Robustness (reliability): apply all known techniques to protect against outside attacks, hardware faults, and programming errors
HPCS Program Focus Areas
- Applications
- Intelligence/surveillance, reconnaissance, cryptanalysis, weapons analysis, airborne contaminant modeling, and biotechnology
- Fill the critical technology and capability gap: from today (late 1980s HPC technology) to the future (quantum/bio computing)
4. Vision: Focus on the Lost Dimension of HPC (User/System Efficiency and Productivity)
[Figure: Evolution from 1980s technology (vector and parallel vector systems) through commodity HPCs and tightly coupled parallel systems to 2010 high-end computing solutions. Moore's Law doubled raw performance every 18 months; the new goal is to double value every 18 months, filling the high-end computing technology and capability gap for critical national security missions.]
5. HPCS Technical Considerations
[Figure: Architecture types and communication programming models: symmetric multiprocessors and distributed shared memory (shared-memory multiprocessing); massively parallel processors, commodity clusters, and grids (distributed-memory multi-computing, MPI); custom vector, parallel vector, and scalable vector supercomputers; commodity HPC built on microprocessors. HPCS focus: tailorable, balanced solutions.]
- Single point design solutions are no longer acceptable.
6. HPCS Program Phases I-III
[Figure: Program timeline across fiscal years 2002-2010, with academia and industry participants. Phase I (Industry Concept Study): application analysis, performance assessment, requirements and metrics, technology assessments, and concept reviews. Phase II (R&D): early metrics and benchmarks, early software tools, research prototypes, and pilot platforms; system design review, PDR, DDR, and Phase II readiness reviews. Phase III (Full Scale Development): HPCS capability or products, following the Phase III readiness review and industry procurements. Reviews, procurements, and critical program milestones are marked along the timeline.]
7. Application Analysis / Performance Assessment Activity Flow
[Figure: Inputs include the DDR&E IHEC mission analysis and mission partners (DOD, DOE/NNSA, NSA, NRO). Application analysis of HPCS applications (1. cryptanalysis, 2. signal and image processing, 3. operational weather, 4. nuclear stockpile stewardship, 5. etc.) defines system requirements and characteristics as HPCS technology drivers. Benchmarks range from common critical kernels through compact applications to full applications. Metrics define productivity as the ratio of utility to cost, measured via development time (cost), execution time (cost), and implicit factors. Impact: improved mission capability for the mission partners. Industry participants: Cray, IBM, Sun.]
DARPA HPCS Program Motivation
8. Workflow Priorities and Goals
- Implicit Productivity Factors

  Workflow     Perf.   Prog.   Port.   Robust.
  Researcher           High
  Enterprise   High    High    High    High
  Production   High                    High

- Mission needs drive system requirements
- Workflows define the scope of customer priorities
- Activity and Purpose benchmarks will be used to measure productivity
- The HPCS goal is to add value to each workflow: increase productivity while increasing problem size
9. Productivity Framework Overview
[Figure: Phase I defines the framework and scopes petascale requirements. Phase II implements the framework and performs design assessments: preliminary and final multilevel system models, prototypes, and evaluation experiments on SN001. Phase III transitions to an HPC procurement quality framework with acceptance-level tests. Value metrics: execution and development. Participants: HPCS vendors, FFRDC and government R&D partners, mission agencies, and a commercial or nonprofit productivity sponsor. Workflows: production, enterprise, researcher. Benchmarks: activity and purpose.]
- HPCS needs to develop a procurement quality assessment methodology that will be the basis of 2010 HPC procurements
10. HPCS Phase II Teams
- Industry: PI Elnozahy, PI Rulifson, PI Smith
- Goal: Provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2007 to 2010)
- Productivity Team (Lincoln lead): MIT Lincoln Laboratory (PI Kepner); PI Lucas; PIs Benson and Snavely; PI Basili; PI Koester; PIs Vetter, Lusk, Post, and Bailey; PIs Gilbert, Edelman, Ahalt, and Mitchell; LCS; Ohio State
- Goal: Develop a procurement quality assessment methodology that will be the basis of 2010 HPC procurements
11. Motivation: Metrics Drive Designs
"You get what you measure"
- Execution Time (example)
- Current metrics favor caches and pipelines
- Systems are ill-suited to applications with low spatial locality and low temporal locality
- Development Time (example)
- No metrics are widely used
- Least common denominator standards: difficult to use, difficult to optimize
[Figure: Two quadrant charts of HPCS tradeoffs. One plots spatial vs. temporal locality, placing Top500 Linpack Rmax at high spatial and temporal locality, Table Toy (GUPS, intelligence) at low/low, and large FFTs (reconnaissance), Streams Add, SIMD/DMA, and adaptive multi-physics applications (weapons design, vehicle design, weather) in between. The other plots language expressiveness vs. language performance, ranging from Assembly/VHDL (low expressiveness, high performance) through C/Fortran with MPI/OpenMP and UPC/CAF to high performance high-level languages and Matlab/Python (high expressiveness, lower performance). HPCS targets both high expressiveness and high performance.]
12. Phase 1 Productivity Framework
[Figure: Activity and Purpose benchmarks and work flows feed an actual system or model through a common modeling interface. System parameters (examples): BW bytes/flop (balance), memory latency, memory size; processor flop/cycle, processor integer op/cycle, bisection BW; size (ft^3), power/rack, facility operation. Outputs are execution time (cost) and development time (cost), the latter including code size, restart time (reliability), and code optimization time; these combine into productivity metrics (ratio of utility to cost).]
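The framework's central quantity is productivity as a ratio of utility to cost, with development and execution time as the cost drivers. A minimal sketch of that ratio follows; the utility value and the way the two costs are combined are illustrative assumptions, not the framework's actual model.

```python
# Illustrative sketch of productivity as a ratio of utility to cost.
# Combining development and execution cost by simple addition is an
# assumption; the HPCS framework leaves this as a modeling choice.

def productivity(utility, development_cost, execution_cost):
    """Productivity = utility / (development cost + execution cost)."""
    total_cost = development_cost + execution_cost
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return utility / total_cost

# Hypothetical numbers: a result worth 1000 utility units that took
# 400 units of development cost and 100 units of execution cost.
print(productivity(1000, 400, 100))  # 2.0
```

Any model plugged in behind the common modeling interface would refine both the utility function and the cost accounting.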
13. Phase 2 Implementation
[Figure: The Phase 1 framework instrumented with teams. Activity and Purpose benchmarks (Mitre, ISI, LBL, Lincoln, HPCMO, LANL, mission partners); benchmark implementation (Lincoln, OSU, CodeSourcery); performance analysis (ISI, LLNL, UCSD); execution and development interfaces feeding the actual system or model through a common modeling interface (ANL Pmodels group); metrics analysis of current and new codes (Lincoln, UMD, mission partners); university experiments (MIT, UCSB, UCSD, UMD, USC). System parameters, execution time (cost), development time (cost), and productivity metrics are as in the Phase 1 framework.]
Contains proprietary information. For government use only.
14. HPCS Mission Work Flows
[Figure: Overall and development cycles for three workflows. Researcher: port legacy software and initial development, then execution, with cycles of days to hours and hours to minutes. Enterprise: port legacy software; design, code, test; port, scale, optimize; cycles of months to days. Production: initial product development over years to months, with hours-to-minutes response time in execution.]
- HPCS productivity factors (performance, programmability, portability, and robustness) are very closely coupled with each work flow
15. HPC Workflow SW Technologies
- Production workflow
- Many technologies target specific pieces of the workflow
- Need to quantify workflows (stages and time spent)
- Need to measure technology impact on stages
[Figure: Workflow stages (spec; algorithm development; design, code, test on a workstation; port, scale, optimize; run on a supercomputer) mapped against software technologies spanning HPC software to mainstream software: operating systems (Linux, RT Linux), compilers and languages (C, F90, Matlab, UPC, Co-Array Fortran, Java, OpenMP), libraries (ATLAS, BLAS, FFTW, PETE, PAPI, VSIPL, VSIPL++, MPI, CORBA, DRI), and tools/problem-solving environments (UML, Globus, TotalView, POOMA, CCA, PVL, ESMF).]
16. Prototype Productivity Models
- Efficiency and Power (Kennedy, Koelbel, Schreiber)
- Special Model with Work Estimator (Sterling)
- Utility (Snir)
- Productivity Factor Based (Kepner)
- Least Action (Numrich)
- CoCoMo II (software engineering community)
- Time-To-Solution (Kogge)
- HPCS has triggered groundbreaking activity in understanding HPC productivity; the community is focused on quantifiable productivity (potential for broad impact)
17. Example: Existing Code Analysis
- Analysis of existing codes is used to test metrics and to identify important trends in productivity and performance
18. Example Experiment Results (N=1)
- Same application (image filtering)
- Same programmer
- Different languages/libraries: Matlab, BLAS, BLAS/OpenMP, BLAS/MPI, PVL/BLAS/MPI, MatlabMPI, pMatlab
[Figure: Performance (speedup x efficiency) vs. development time (lines of code) for seven implementations: single processor (1. Matlab, 2. BLAS), shared memory (3. BLAS/OpenMP), and distributed memory (4. BLAS/MPI, 5. PVL/BLAS/MPI, 6. MatlabMPI, 7. pMatlab), spanning current practice and research; the pMatlab point is an estimate.]
- Controlled experiments can potentially measure the impact of different technologies and quantify development time and execution time tradeoffs
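The performance axis in the chart above is speedup multiplied by efficiency. A small sketch of that metric (the timings and processor count below are hypothetical, not data from the experiment):

```python
# Sketch of the chart's performance metric: speedup x efficiency.
# All numbers below are hypothetical illustrations.

def speedup(serial_time, parallel_time):
    """How many times faster the parallel run is than the serial run."""
    return serial_time / parallel_time

def performance_metric(serial_time, parallel_time, num_procs):
    """Performance = speedup * efficiency, where efficiency = speedup / P."""
    s = speedup(serial_time, parallel_time)
    return s * (s / num_procs)

# Hypothetical: a 100 s serial run reduced to 10 s on 16 processors.
# Speedup = 10, efficiency = 0.625, so the metric is 6.25.
print(performance_metric(100.0, 10.0, 16))  # 6.25
```

Multiplying by efficiency penalizes implementations that buy speedup with many poorly utilized processors, which is why the metric separates the shared and distributed memory points on the chart.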
19. Summary
- The goal is to develop an acquisition quality framework for HPC systems that includes
- Development time
- Execution time
- We have assembled a team that will develop models, analyze existing HPC codes, develop tools, and conduct HPC development time and execution time experiments
- Measures of success
- Acceptance by users, vendors, and the acquisition community
- Quantitatively explain HPC rules of thumb:
- "OpenMP is easier than MPI, but doesn't scale as high"
- "UPC/CAF is easier than OpenMP"
- "Matlab is easier than Fortran, but isn't as fast"
- Predict the impact of new technologies
20. Example Development Time Experiment
- Goal: Quantify development time vs. execution time tradeoffs of different parallel programming models
- Message passing (MPI)
- Threaded (OpenMP)
- Array (UPC, Co-Array Fortran)
- Setting: Senior/1st-year grad class in parallel computing (MIT/BU, Berkeley/NERSC, CMU/PSC, UMD/?, ...)
- Timeline
- Month 1: Intro to parallel programming
- Month 2: Implement serial version of a compact app
- Month 3: Implement parallel version
- Metrics
- Development time (from logs), SLOC, function points, ...
- Execution time, scalability, comp/comm ratio, speedup, ...
- Analysis
- Development time vs. execution time of the different models
- Performance relative to an expert implementation
- Size relative to an expert implementation
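The "development time (from logs)" metric above implies reconstructing active work time from timestamped sensor events. A minimal sketch of one common approach (summing inter-event gaps below an idle cutoff; the 15-minute cutoff is an assumption for illustration, not a rule from the experiment):

```python
# Sketch of estimating development time from timestamped log events:
# sum the gaps between consecutive events, discarding gaps longer
# than an idle cutoff. The 900 s cutoff is an assumed parameter.

def development_time(timestamps, idle_cutoff_secs=900):
    """Estimate active development seconds from event timestamps."""
    ts = sorted(timestamps)
    total = 0
    for earlier, later in zip(ts, ts[1:]):
        gap = later - earlier
        if gap <= idle_cutoff_secs:  # treat longer gaps as breaks
            total += gap
    return total

# Hypothetical log: events at 0, 300, and 700 s, then a long break,
# then a single event at 8000 s. Only the first two gaps count.
print(development_time([0, 300, 700, 8000]))  # 700
```

The estimate is sensitive to the cutoff choice, which is one reason controlled experiments pair log-based measures with self-reported effort.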
21. Hackystat in HPCS
22. About Hackystat
- Five years old
- I wrote the first LOC during the first week of May 2001.
- Current size: 320,562 LOC (not all mine)
- 5 active developers
- Open source, GPL
- General application areas
- Education: teaching measurement in SE
- Research: Test-Driven Design, Software Project Telemetry, HPCS
- Industry: project management
- Has inspired a startup: 6th Sense Analytics
23. Goals for Hackystat-HPCS
- Support automated collection of useful low-level data for a wide variety of platforms, organizations, and application areas.
- Make Hackystat low-level data accessible in a standard XML format for analysis by other tools.
- Provide workflow and other analyses over low-level data collected by Hackystat and other tools to support
- discovery of developmental bottlenecks
- insight into the impact of tool/language/library choice for specific applications/organizations.
24. Pilot Study, Spring 2006
- Goal: Explore issues involved in workflow analysis using Hackystat and students.
- Experimental conditions (were challenging)
- Undergraduate HPC seminar
- 6 students total; 3 did the assignment, 1 collected data.
- 1 week duration
- Gauss-Seidel iteration problem, written in C, using the PThreads library, on a cluster
- As a pilot study, it was successful.
25. Data Collection Sensors
- Sensors for Emacs and Vim captured editing activities.
- A sensor for CUTest captured testing activities.
- A sensor for the shell captured command line activities.
- A custom makefile provided compilation, testing, and execution targets, each instrumented with sensors.
26. Example data: Editor activities
27. Example data: Testing
28. Example data: File Metrics
29. Example data: Shell Logger
30. Data Analysis: Workflow States
- Our goal was to see if we could automatically infer the following developer workflow states:
- Serial coding
- Parallel coding
- Validation/verification
- Debugging
- Optimization
31. Workflow State Detection: Serial Coding
- We defined the "serial coding" state as the editing of a file not containing any parallel constructs, such as MPI, OpenMP, or PThread calls.
- We determine this through the MakeFile, which runs SCLC over the program at compile time and collects Hackystat FileMetric data that provides counts of parallel constructs.
- We were able to identify the serial coding state if the MakeFile was used consistently.
32. Workflow State Detection: Parallel Coding
- We defined the "parallel coding" state as the editing of a file containing a parallel construct (MPI, OpenMP, or PThread call).
- As with serial coding, we get the data required to infer this state using a MakeFile that runs SCLC and collects FileMetric data.
- We were able to identify the parallel coding state if the MakeFile was used consistently.
33. Workflow State Detection: Testing
- We defined the "testing" state as the invocation of unit tests to determine the functional correctness of the program.
- Students were provided with test cases and CUTest to test their programs.
- We were able to infer the testing state if CUTest was used consistently.
34. Workflow State Detection: Debugging
- We have not yet been able to generate satisfactory heuristics to infer the "debugging" state from our data.
- Students did not use a debugging tool that would have allowed instrumentation with a sensor.
- UMD heuristics, such as the presence of "printf" statements, were not collected by SCLC.
- Debugging is entwined with testing.
35. Workflow State Detection: Optimization
- We have not yet been able to generate satisfactory heuristics to infer the "optimization" state from our data.
- Students did not use a performance analysis tool that would have allowed instrumentation with a sensor.
- Repeated command line invocation of the program could potentially identify the activity as "optimization".
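The "repeated invocation" idea in the last bullet can be sketched against the shell logger data: many runs of the same program within a short window suggest a tuning loop. The thresholds and the event format below are illustrative assumptions, not heuristics the pilot study actually used.

```python
# Sketch of the repeated-invocation heuristic for the optimization
# state. The min_runs and window thresholds are assumed parameters.

def looks_like_optimization(events, program, min_runs=3, window_secs=3600):
    """events: list of (timestamp_secs, command) pairs from a shell log."""
    runs = sorted(t for t, cmd in events if cmd.startswith(program))
    if len(runs) < min_runs:
        return False
    # True if any min_runs consecutive invocations fit in one window.
    for i in range(len(runs) - min_runs + 1):
        if runs[i + min_runs - 1] - runs[i] <= window_secs:
            return True
    return False

# Hypothetical shell log: three runs interleaved with an edit, all
# within one hour.
log = [(0, "./gauss_seidel 1000"), (600, "vim solver.c"),
       (900, "./gauss_seidel 1000"), (1500, "./gauss_seidel 2000")]
print(looks_like_optimization(log, "./gauss_seidel"))  # True
```

A real heuristic would also want evidence that performance, not correctness, was the concern, e.g. unchanged test results between runs.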
36. Insights from the pilot study, 1
- Automatic inference of these workflow states in a student setting requires:
- Consistent use of the MakeFile (or some other mechanism that invokes SCLC consistently) to infer the serial coding and parallel coding workflow states.
- Consistent use of an instrumented debugging tool to infer the debugging workflow state.
- Consistent use of an "execute" MakeFile target (and/or an instrumented performance analysis tool) to infer the optimization workflow state.
37. Insights from the pilot study, 2
- Ironically, it may be easier to infer workflow states in industrial settings than in classroom settings!
- Industrial settings are more likely to use a wider variety of tools, which could be instrumented to provide better insight into development activities.
- Large scale programming leads inexorably to consistent use of MakeFiles (or similar scripts), which should simplify state inference.
38. Insights from the pilot study, 3
- Are we defining the right set of workflow states?
- For example, the "debugging" phase seems difficult to distinguish as a distinct state.
- Do we really need to infer "debugging" as a distinct activity?
- Workflow inference heuristics appear to be highly contextual, depending upon the language, toolset, organization, and application. (This is not a bug; this is just reality. We will probably need to enable each MP to develop heuristics that work for them.)
39. Next steps
- Graduate HPC classes at UH
- The instructor (Henri Casanova) has agreed to participate with UMD and UH/Hackystat in data collection and analysis.
- Bigger assignments, more sophisticated students, and hopefully a larger class!
- Workflow Inference System for Hackystat (WISH)
- Support export of raw data to other tools.
- Support import of raw data from other tools.
- Provide a high-level rule-based inference mechanism to support organization-specific heuristics for workflow state identification.
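The rule-based inference mechanism in the last bullet could take the form of an ordered list of organization-specific rules applied to each sensor event. A minimal sketch follows; the rule shape, event fields, and example rules are assumptions about what WISH might support, not its actual design.

```python
# Sketch of an organization-specific, rule-based workflow state
# classifier. Each rule is a (predicate, state) pair tried in order;
# the first matching rule wins. Event fields are assumed.

def make_classifier(rules):
    """Build a classifier from an ordered list of (predicate, state)."""
    def classify(event):
        for predicate, state in rules:
            if predicate(event):
                return state
        return "unknown"
    return classify

# Hypothetical rules mirroring the pilot study's heuristics.
rules = [
    (lambda e: e["tool"] == "CUTest",                   "testing"),
    (lambda e: e["tool"] == "editor" and e["parallel"], "parallel coding"),
    (lambda e: e["tool"] == "editor",                   "serial coding"),
]
classify = make_classifier(rules)

print(classify({"tool": "CUTest", "parallel": False}))  # testing
print(classify({"tool": "editor", "parallel": True}))   # parallel coding
```

Because the rules are data rather than code, each mission partner could maintain its own rule set, matching the observation on slide 38 that inference heuristics are highly contextual.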