1. Student opportunities at RENCI
Rob Fowler, Chief Domain Scientist, HPC
(Joint work with Allan Porterfield and Todd Gamblin)
Renaissance Computing Institute
Aug 18, 2008
2. What we are
- Renaissance Computing Institute
  - Founded 2004
- Stakeholders
  - Triangle.edu: UNC-CH, Duke, NCSU
  - Statewide.edu: ECU, UNC-Charlotte, UNC-Asheville, ASU
  - NC.gov, counties
  - Federal agencies: NSF, DOE, DoD, NOAA, FEMA
- Mission
  - Enhance the capabilities of our stakeholders.
  - Solve important problems.
- Strategies
  - Direct effort, technology transfer, collaborative engagement.
3. Where we are
Engagement sites: UNC-CH, UNC Med. Lib, Duke, NCSU, ECU, UNC Charlotte, UNC Asheville
RENCI Anchor
RENCI-UNC (ITS Manning)
4. Current Research Opportunities
- Application Areas
  - Biomedical visual analytics.
  - Health delivery.
  - Climate and environmental modeling.
  - Emergency response.
- Core Computer Science
  - Performance monitoring and analysis on million-thread systems. Application of data mining methods.
  - Resource-centric monitoring and analysis for multi- and many-core systems.
5. RENCI's Disaster Studies Group
- Use technology to solve problems in North Carolina
  - Environmental modeling
  - Collaborative workspaces for emergency managers
  - Environmental sensing
6. Opportunities
- Model coupling
  - Linking weather, hydrology, and storm surge models together. Data assimilation (from sensors).
- Workflow
  - Management of processes, recovery from failure of one element
- Mashups
  - Support situational awareness during disasters
  - Asset and people tracking
  - Information flow control
  - Weather and communication tools for emergency management community
7. Environmental Model Coupling
(Figure: floodplain maps and storm surge forecasts)
8. Environmental Modeling Workflows
(Diagram labels: WRF Preprocessing System (WPS), Var3D, Graphics Post-processing, NC EcoNet, RADAR, Brunswick Sensors, MRR, Consumers)
9. Mashups for NC Emergency Managers
10. Performance Monitoring and Analysis
- Emerging technologies → Challenges
- On-chip parallelism
  - Prodigious concurrent computation (cores)
  - Limited shared resources (L3, memory, I/O)
- High node counts (100K to millions)
  - Very, very high degree of parallelism.
  - Limited budget to spend on I/O, interconnect
- New system balance issues at all levels
  - Dealing with Amdahl's Law writ large.
  - Conserving scarce shared resources.
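The Amdahl's Law point can be made concrete with a short calculation. This is a minimal sketch: the 99% parallel fraction and the worker counts are illustrative assumptions, not figures from the talk.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup when only `parallel_fraction` of the work scales with `workers`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumed example: 99% of the work parallelizes perfectly.
for workers in (100, 100_000, 1_000_000):
    print(workers, round(amdahl_speedup(0.99, workers), 2))
```

With 1% serial work the speedup stays below 100 however many cores are added, which is why serialization matters so much at node counts in the hundreds of thousands.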
11. RENCI activities
- Resource-centric, on-node measurement
  - Interaction of threads at shared resources
  - Limited budget for monitoring and analysis
  - On-chip filtering/introspection/feedback
  - Hardware bottlenecks first, software later
- Adaptive application runtime
  - Bottlenecks → power and performance adaptation
- Tools at full scale
  - Limited communication/I/O budget
  - In situ measurement/analysis/diagnosis
  - Focus on scalability issues: balance, serialization
  - Very large, long-running, adaptive apps.
12. Why is performance not obvious?
- Hardware complexity
  - Keeping up with Moore's law with one thread.
  - Instruction-level parallelism.
    - Deeply pipelined, out-of-order, superscalar, threaded.
  - Memory-system parallelism
    - Parallel processor-cache interface, limited resources.
    - Need at least k concurrent memory accesses in flight.
- Software complexity
  - Program size, languages, styles
  - Competition/cooperation with other threads
  - Dependence on (dynamic) libraries.
  - Compilers
13. Each core needs 2 to 6 ops in flight to hide latencies and get decent bandwidth.
Implications for DDRn memory architecture?
(John McCalpin, AMD, July 2007)
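The "ops in flight" requirement follows from Little's law: the data that must be in flight equals latency times bandwidth. The sketch below is only an illustration. The 80 ns latency, 4 GB/s per-core bandwidth and 64-byte cache line are assumed values, not numbers from the talk.

```python
import math


def lines_in_flight(latency_ns, bandwidth_gb_s, line_bytes=64):
    """Cache lines that must be outstanding to sustain a bandwidth at a given latency."""
    # ns * GB/s = bytes, because the 1e-9 and 1e9 factors cancel.
    bytes_in_flight = latency_ns * bandwidth_gb_s
    return math.ceil(bytes_in_flight / line_bytes)


# Assumed example values; prints 5.
print(lines_in_flight(latency_ns=80, bandwidth_gb_s=4.0))
```

If the core cannot keep that many memory accesses outstanding, it cannot reach the target bandwidth regardless of what the memory system could deliver.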
14. System Balance: Multicore Economics 101
8 cores/chip, 8 threads per core.
8 FBDIMM chains per system x 4 sticks per chain.
Announcement: Niagara 2 chip will be available for < $1000.
32 DIMMs @ $100 = $3200.
Niagara 2 chip is nominally a 95 watt part.
Micron dual rank FBDIMM: 15 W; single rank: 10.5 W; vs 5.5 W/rank for DDR2.
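The balance argument is simple arithmetic on the figures quoted on this slide. The sketch below just multiplies them out. It assumes the prices are US dollars and takes the "under 1000" chip price as an upper bound.

```python
def system_balance():
    """Compare memory cost and power with the processor, using the slide's figures."""
    chains_per_system = 8
    dimms_per_chain = 4
    dimm_price = 100          # per DIMM; assumed to be US dollars
    chip_price_ceiling = 1000  # chip quoted as costing less than this
    chip_watts = 95
    single_rank_watts = 10.5
    dual_rank_watts = 15

    dimms = chains_per_system * dimms_per_chain
    return {
        "dimms": dimms,
        "memory_cost": dimms * dimm_price,
        "memory_to_chip_cost": dimms * dimm_price / chip_price_ceiling,
        "memory_watts_single_rank": dimms * single_rank_watts,
        "memory_watts_dual_rank": dimms * dual_rank_watts,
        "chip_watts": chip_watts,
    }


for name, value in system_balance().items():
    print(f"{name}: {value}")
```

On these numbers a fully populated memory system costs more than three times the processor and draws several times its power.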
15. Resource-Centric Tools
- Utilization and serialization at shared resources will dominate performance.
  - Hardware: memory hierarchy, channels.
  - Software: synchronization, scheduling.
- Tools need to focus on these issues.
- It will be (too) easy to over-provision a chip with cores relative to all else.
- Memory effects are the obvious first target
  - Contention for shared cache: big footprints
  - Bus/memory utilization.
  - DRAM page contention: too many streams
- Reflection: make data available for introspective adaptation.
16. Cores vs Nest: Issues for HPM Software
- Performance sensors in every block.
- Nest counters extend the processor model.
- Current models
  - Process/thread-centric monitoring
    - Virtual counters follow threads. Expensive, tricky.
  - Node wide, but now (corePID centric)
- Inadequate monitoring of core-nest-core interaction.
- No monitoring of fine-grain thread-thread interactions (on-core resource contention).
- No monitoring of concurrency resources.
(Diagram labels: Nest, Cores)
17. HPM on a Multicore Chip: Who can measure what
Counters within a core can measure events in that core, or in the nest.
(Diagram: four cores, Core 0 to Core 3, each with CTRS, FPU, L1 and L2; a shared nest containing L3, Mem-CTL, NIC, DDR-A, DDR-B, DDR-C and three links labeled HT-1. Legend: Sensor, Counter.)
18. RCRTool Strategy
- One core (0) measures nest events. The others monitor core events.
- Core 0 processes the event logs of all cores and runs on-node analysis.
- MAESTRO: all other jitter-producing OS activity confined to core 0.
(Diagram: four cores, Core 0 to Core 3, each with CTRS, FPU, L1 and L2; a shared nest containing L3, Mem-CTL, NIC, DDR-A, DDR-B, DDR-C and three links labeled HT-1. Legend: Sensor, Counter.)
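The division of labor in the RCRTool strategy can be sketched as a role assignment plus a log merge on core 0. This is a toy illustration, not RCRTool code. The function names, the tuple layout `(timestamp, core, event)` and the event names are all hypothetical.

```python
import heapq


def assign_roles(num_cores):
    """Core 0 watches nest events and runs analysis; other cores watch their own events."""
    return {core: ("nest+analysis" if core == 0 else "core") for core in range(num_cores)}


def merge_logs(per_core_logs):
    """Merge per-core logs, each already sorted by timestamp, into one ordered stream."""
    return list(heapq.merge(*per_core_logs))


# Hypothetical event logs: (timestamp, core, event name).
logs = [
    [(1, 0, "L3_MISS"), (4, 0, "DRAM_PAGE_CONFLICT")],
    [(2, 1, "L2_MISS")],
    [(3, 2, "L2_MISS"), (5, 2, "FP_OP")],
]
print(assign_roles(4))
for event in merge_logs(logs):
    print(event)
```

Keeping the merge and analysis on core 0 mirrors the slide's intent of confining monitoring overhead and OS jitter to a single core.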
19. RCRTool Architecture
Similar to, but extends, DCPI, oprofile, pfmon.
Histograms, conflict graphs, on-line summaries and off-line reports.
(Diagram labels: Analysis Daemon; HW Events; SW Events; Context switches; HPM driver (Core 0) with EBS events and IBS events; three further HPM drivers, each with EBS events and IBS events; Locks, queues, etc.; Power monitors; Kernel space log)
20. The need for scalable tools (Todd Gamblin's dissertation work)
- Fastest machines are increasingly concurrent
  - Exponential growth in concurrency
  - BlueGene/L now has 212,992 processors
  - Million-core systems soon.
- Need tools that can collect and analyze data from this many cores
(Figure: Concurrency levels in the Top 100, http://www.top500.org)
21. Challenges for Performance Tools
- Problem 2: Analysis
  - Even if data could be stored to disks offline, this would only delay the problem
  - Performance characterization must be manageable by humans
    - Implies a concise description
    - Eliminate redundancy, e.g. in SPMD codes, most processes are similar
  - Traditional scripts and data mining techniques won't cut it
    - Need processing power in proportion to the data
    - Need to perform the analysis online, in situ
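The redundancy point can be illustrated with a small grouping pass: if most SPMD processes behave alike, one representative per group plus a member list describes the whole run. This is a minimal greedy sketch under assumed inputs. It is not the method from the dissertation work mentioned earlier, and the metric vectors and 5% tolerance are hypothetical.

```python
def summarize(profiles, tol=0.05):
    """Group ranks whose metric vectors match a group's representative within `tol`."""
    groups = []  # list of (representative vector, [member ranks])
    for rank, vec in enumerate(profiles):
        for rep, members in groups:
            # Relative comparison on every metric against the group's first member.
            if all(abs(a - b) <= tol * max(abs(b), 1e-12) for a, b in zip(vec, rep)):
                members.append(rank)
                break
        else:
            groups.append((vec, [rank]))
    return groups


# Hypothetical per-rank metrics: (seconds in compute, seconds in communication).
profiles = [(10.0, 2.0), (10.1, 2.0), (30.0, 2.0), (9.9, 2.05)]
for rep, members in summarize(profiles):
    print(rep, members)
```

Here four ranks collapse to two groups, and the outlier rank stands out immediately. At scale the same idea would have to run in parallel on the nodes themselves, in line with the "online, in situ" requirement.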
22. Motherboard Power Monitor
Prototype: slightly wider than a disk bay.
Estimated materials costs for a build of 100: $45; smaller redesign: $35.
23. Contact Information
Rob Fowler: rjf@renci.org, rjf@unc.edu, 919 445 9670
RENCI: http://www.renci.org/