
1
Student opportunities at RENCI
Rob Fowler, Chief Domain Scientist, HPC
Renaissance Computing Institute
(Joint work with Allan Porterfield and Todd Gamblin)
Aug 18, 2008
2
What we are

Renaissance Computing Institute
Founded 2004
Stakeholders
Triangle.edu: UNC-CH, Duke, NCSU
Statewide.edu: ECU, UNC-Charlotte, UNC-Asheville, ASU
NC.gov, counties
Federal agencies: NSF, DOE, DoD, NOAA, FEMA
Mission
Enhance the capabilities of our stakeholders.
Solve important problems.
Strategies
Direct effort, technology transfer, collaborative engagement.

3
Where we are.
Engagement sites: UNC-CH, UNC Med. Lib., Duke, NCSU, ECU, UNC Charlotte, UNC Asheville
RENCI Anchor
RENCI-UNC (ITS Manning)
4
Current Research Opportunities

Application Areas
Biomedical Visual analytics.
Health delivery.
Climate and Environmental Modeling.
Emergency Response
Core Computer Science
Performance monitoring and analysis on million-thread systems. Application of data mining methods.
Resource-centric monitoring and analysis for multi- and many-core systems.

5
RENCI's Disaster Studies Group

Use technology to solve problems in North Carolina
Environmental modeling
Collaborative workspaces for emergency managers
Environmental sensing

6
Opportunities

Model coupling
Linking weather, hydrology, and storm surge models together. Data assimilation (from sensors).
Workflow
Management of processes, recovery from failure of one element
Mashups
Support situational awareness during disasters
Asset and people tracking
Information flow control
Weather and communication tools for the emergency management community

7
Environmental Model Coupling
[Figure: environmental model-coupling diagram; labels include SWAN, ADCIRC, Floodplain Maps, and Storm Surge Forecasts.]
8
Environmental Modeling Workflows
[Figure: workflow diagram; labels: WRF Preprocessing System (WPS), Var3D, Graphics Post-processing, NC EcoNet, RADAR, Brunswick Sensors, MRR, Consumers.]
9
Mashups for NC Emergency Managers
10
Performance Monitoring and Analysis

Emerging technologies → Challenges
On-chip parallelism
Prodigious concurrent computation (cores)
Limited shared resources (L3, memory, I/O)
High node counts (100K to millions)
Very, very high degree of parallelism.
Limited budget to spend on I/O, interconnect
New system balance issues at all levels
Dealing with Amdahl's Law writ large.
Conserving scarce shared resources.
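To make "Amdahl's Law writ large" concrete: if a fraction p of a single-thread run time parallelizes perfectly over N cores and the rest stays serial, the ideal speedup is S(N) = 1 / ((1 - p) + p / N), which can never exceed 1 / (1 - p). The minimal Python sketch below evaluates that textbook formula; the 99.99% parallel fraction and the core counts are illustrative assumptions, not numbers from the talk.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` processors (Amdahl's law).

    `parallel_fraction` is the share of single-thread run time that
    parallelizes perfectly; the remainder runs serially.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    p = 0.9999  # assumed: 99.99% of the work is parallel
    print(f"upper bound 1/(1-p) = {1 / (1 - p):.0f}")
    for n in (1_000, 100_000, 1_000_000):  # assumed core counts
        print(f"{n:>9,} cores -> speedup {amdahl_speedup(p, n):,.0f}")
```

Even at 99.99% parallel, a million cores yield a speedup below 10,000, i.e. under 1% parallel efficiency, which is why high node counts bring system balance and serialization to the fore.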

11
RENCI activities.

Resource-centric, on-node measurement
Interaction of threads at shared resources
Limited budget for monitoring and analysis
On-chip filtering/introspection/feedback
Hardware bottlenecks first, software later
Adaptive application runtime
Bottlenecks → power and performance adaptation
Tools at full scale.
Limited communication/I/O budget
In situ measurement/analysis/diagnosis
Focus on scalability issues: balance, serialization
Very large, long-running, adaptive apps.

12
Why is performance not obvious?

Hardware complexity
Keeping up with Moore's law with one thread.
Instruction-level parallelism.
Deeply pipelined, out-of-order, superscalar, threaded.
Memory-system parallelism
Parallel processor-cache interface, limited resources.
Need at least k concurrent memory accesses in flight.
Software complexity
Program size, languages, styles
Competition/cooperation with other threads
Dependence on (dynamic) libraries.
Compilers

13
Each core needs 2 to 6 ops in flight to hide latencies and get decent bandwidth.
Implications for DDRn memory architecture?
(John McCalpin, AMD, July 2007)
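The "2 to 6 ops in flight" figure, like the earlier "k concurrent memory accesses", follows from Little's law: the number of outstanding requests equals the request rate times the latency. A minimal Python sketch, assuming each access moves one 64-byte cache line; the 3.2 GB/s per-core bandwidth target and 100 ns latency are illustrative assumptions, not values from the talk.

```python
def accesses_in_flight(bandwidth_bytes_per_s: float,
                       latency_s: float,
                       bytes_per_access: int = 64) -> float:
    """Little's law: concurrency = throughput x latency.

    Throughput is counted in accesses per second, i.e. bandwidth divided by
    the bytes moved per access (assumed to be one 64-byte cache line).
    """
    accesses_per_s = bandwidth_bytes_per_s / bytes_per_access
    return accesses_per_s * latency_s


if __name__ == "__main__":
    # Assumed per-core target: 3.2 GB/s with 100 ns memory latency.
    k = accesses_in_flight(3.2e9, 100e-9)
    print(f"{k:.1f} cache-line misses must be outstanding per core")
```

Doubling either the latency or the bandwidth target doubles the required concurrency; conversely, a core that can keep only a few misses outstanding cannot reach more than that many lines per latency period of bandwidth.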
14
System Balance: Multicore Economics 101
8 cores/chip × 8 threads/core; 8 FBDIMM chains per system × 4 sticks per chain
Announcement: Niagara 2 chip will be available for < $1000; 32 DIMMs @ $100 = $3200
Niagara 2 chip is nominally a 95 W part. Micron dual-rank FBDIMM: 15 W; single-rank: 10.5 W; vs. 5.5 W/rank for DDR2
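The slide's arithmetic can be spelled out. The sketch below only multiplies numbers already on the slide, reading "< $1000" as an upper bound on the chip price (so the memory-to-chip cost ratio is a lower bound); the constant names are illustrative.

```python
# Figures from the slide; everything returned is derived from them.
CORES_PER_CHIP = 8
THREADS_PER_CORE = 8
FBDIMM_CHAINS = 8
DIMMS_PER_CHAIN = 4
CHIP_PRICE_USD = 1000         # announced as "< $1000": an upper bound
DIMM_PRICE_USD = 100
CHIP_POWER_W = 95
FBDIMM_DUAL_RANK_W = 15.0
FBDIMM_SINGLE_RANK_W = 10.5


def balance_summary() -> dict:
    """Thread count, DIMM count, memory cost and memory power per system."""
    dimms = FBDIMM_CHAINS * DIMMS_PER_CHAIN
    memory_cost = dimms * DIMM_PRICE_USD
    return {
        "hardware_threads": CORES_PER_CHIP * THREADS_PER_CORE,
        "dimms": dimms,
        "memory_cost_usd": memory_cost,
        # lower bound, because the chip costs less than CHIP_PRICE_USD
        "memory_to_chip_cost_ratio": memory_cost / CHIP_PRICE_USD,
        "memory_power_dual_rank_w": dimms * FBDIMM_DUAL_RANK_W,
        "memory_power_single_rank_w": dimms * FBDIMM_SINGLE_RANK_W,
        "chip_power_w": CHIP_POWER_W,
    }


if __name__ == "__main__":
    for name, value in balance_summary().items():
        print(f"{name}: {value}")
```

With 32 dual-rank FBDIMMs the memory draws about 480 W against the chip's nominal 95 W and costs at least 3.2 times as much as the chip: the balance problem in the slide's title.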
15
Resource-Centric Tools

Utilization and serialization at shared resources will dominate performance.
Hardware: memory hierarchy, channels.
Software: synchronization, scheduling.
Tools need to focus on these issues.
It will be (too) easy to over-provision a chip with cores relative to all else.
Memory effects are the obvious first target
Contention for shared cache: big footprints
Bus/memory utilization.
DRAM page contention: too many streams
Reflection: make data available for introspective adaptation.

16
Cores vs Nest Issues for HPM Software

Performance sensors in every block.
Nest counters extend the processor model.
Current models
Process/thread-centric monitoring
Virtual counters follow threads. Expensive, tricky.
Node-wide, but now (core + PID)-centric
Inadequate monitoring of core-nest-core interaction.
No monitoring of fine-grain thread-thread interactions (on-core resource contention).
No monitoring of concurrency resources.

[Figure: the cores and the shared nest.]
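"Virtual counters follow threads" refers to keeping a separate count per thread even though the hardware counter runs per core: in the usual scheme, at every context switch the counter is read and the difference is charged to the thread being switched out. The hypothetical Python model below simulates only that bookkeeping with plain integers; no real driver or kernel interface is used. It also hints at why the slide calls the approach expensive and tricky: every context switch adds a counter read and an update, and a per-thread total says nothing about how the thread interacted with other cores through the shared nest.

```python
from collections import defaultdict


class VirtualCounters:
    """Hypothetical model of per-thread virtual counters on one core."""

    def __init__(self) -> None:
        self.totals = defaultdict(int)  # thread id -> events charged so far
        self._current = None            # (thread id, counter value at switch-in)

    def switch_in(self, tid: int, hw_counter: int) -> None:
        """Thread `tid` starts running; remember the hardware counter value."""
        if self._current is not None:
            raise RuntimeError("previous thread was never switched out")
        self._current = (tid, hw_counter)

    def switch_out(self, hw_counter: int) -> None:
        """Charge the events counted since switch-in to the outgoing thread."""
        if self._current is None:
            raise RuntimeError("no thread is running")
        tid, start = self._current
        self.totals[tid] += hw_counter - start
        self._current = None


if __name__ == "__main__":
    vc = VirtualCounters()
    # One free-running counter on a core; threads 1 and 2 take turns.
    vc.switch_in(1, 0)
    vc.switch_out(1000)
    vc.switch_in(2, 1000)
    vc.switch_out(1500)
    vc.switch_in(1, 1500)
    vc.switch_out(1800)
    print(dict(vc.totals))  # {1: 1300, 2: 500}
```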
17
HPM on a Multicore Chip: Who Can Measure What
Counters within a core can measure events in that core, or in the nest.
[Figure: four cores (Core 0 to Core 3), each with its own counters (CTRS), FPU, L1, and L2; the shared nest holds the L3, memory controller, NIC, DDR channels A to C, and HT links; legend: sensor, counter.]
18
RCRTool Strategy
One core (core 0) measures nest events; the others monitor core events. Core 0 processes the event logs of all cores and runs the on-node analysis. MAESTRO: all other jitter-producing OS activity is confined to core 0.
[Figure: the same four-core chip diagram as the previous slide.]
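The division of labour in this strategy (core 0 watches the nest and drains every other core's event log; cores 1 to 3 only record their own events) can be simulated in a few lines. This is a hypothetical Python sketch, not RCRTool's implementation: threads stand in for cores and are not pinned (on Linux, `os.sched_setaffinity` could do that), per-core queues stand in for the event logs, and the event names are invented for illustration.

```python
import queue
import threading
from collections import Counter


def record_core_events(core_id: int, log: queue.Queue, events: list) -> None:
    """Cores 1..3: append core-local events to this core's own log."""
    for event in events:
        log.put((core_id, event))
    log.put(None)  # sentinel: this core has finished logging


def core0_analysis(logs: list, nest_events: list) -> Counter:
    """Core 0: count nest events itself, then drain every other core's log."""
    histogram = Counter((0, event) for event in nest_events)
    for log in logs:
        while (item := log.get()) is not None:  # blocks until data arrives
            histogram[item] += 1
    return histogram


def run_demo() -> Counter:
    # Hypothetical event streams for cores 1-3.
    core_events = {
        1: ["l1_miss", "l1_miss", "fp_op"],
        2: ["l2_miss"],
        3: ["fp_op", "fp_op"],
    }
    logs = [queue.Queue() for _ in core_events]
    workers = [
        threading.Thread(target=record_core_events, args=(cid, log, evs))
        for (cid, evs), log in zip(core_events.items(), logs)
    ]
    for w in workers:
        w.start()
    histogram = core0_analysis(logs, nest_events=["l3_miss", "dram_page_conflict"])
    for w in workers:
        w.join()
    return histogram


if __name__ == "__main__":
    for (core, event), count in sorted(run_demo().items()):
        print(f"core {core}: {event} x{count}")
```

Keeping all aggregation on core 0 mirrors the slide's intent: the analysis work and its jitter land on one core, and cores 1 to 3 pay only the cost of logging.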
19
RCRTool Architecture.
Similar to, but extends, DCPI, OProfile, and pfmon.
Histograms, conflict graphs, online summaries, and offline reports.
[Figure: architecture diagram; labels: analysis daemon; HW events; SW events (context switches; locks, queues, etc.); per-core HPM drivers (core 0 to core 3) with EBS and IBS events; power monitors; kernel-space log.]
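The architecture lists conflict graphs among its outputs without defining them. One plausible reading, assumed here and not taken from the talk: nodes are cores, and an edge between two cores, labelled with a shared resource, is weighted by the number of time windows in which both were sampled using that resource. The `(window, core, resource)` sample format is a hypothetical invention for this sketch.

```python
from collections import defaultdict
from itertools import combinations


def conflict_graph(samples):
    """Build a weighted conflict graph from (window, core, resource) samples.

    Edge (a, b, resource) counts the time windows in which both core a and
    core b were sampled using that shared resource.
    """
    users = defaultdict(set)  # (window, resource) -> cores seen using it
    for window, core, resource in samples:
        users[(window, resource)].add(core)

    edges = defaultdict(int)
    for (_, resource), cores in users.items():
        for a, b in combinations(sorted(cores), 2):
            edges[(a, b, resource)] += 1
    return dict(edges)


if __name__ == "__main__":
    samples = [
        (0, 1, "L3"), (0, 2, "L3"), (0, 3, "DDR-A"),
        (1, 1, "L3"), (1, 2, "L3"), (1, 3, "L3"),
    ]
    for (a, b, res), weight in sorted(conflict_graph(samples).items()):
        print(f"core {a} <-> core {b} on {res}: {weight}")
```

Heavy edges point at pairs of cores that repeatedly compete for the same resource, which is the kind of core-nest-core interaction that per-thread counters miss.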
20
The need for scalable tools (Todd Gamblin's dissertation work)

Fastest machines are increasingly concurrent
Exponential growth in concurrency
BlueGene/L now has 212,992 processors
Million-core systems soon.
Need tools that can collect and analyze data from this many cores

[Figure: concurrency levels in the Top 100; source: http://www.top500.org]
21
Challenges for Performance Tools

Problem 2: Analysis
Even if data could be stored to disks offline, this would only delay the problem
Performance characterization must be manageable by humans
Implies a concise description
Eliminate redundancy
e.g., in SPMD codes, most processes are similar
Traditional scripts and data mining techniques won't cut it
Need processing power in proportion to the data
Need to perform the analysis online, in situ
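As an illustration of "eliminate redundancy", the sketch below groups processes whose performance vectors lie within a distance threshold and keeps one representative per group, so a report grows with the number of distinct behaviours rather than the number of processes. Greedy leader clustering is swapped in here for brevity; it is not the technique from the dissertation work, and the two-element vectors (for example, seconds in computation and seconds in communication) are hypothetical. It also runs serially in one process, whereas the slide's point is that such analysis must itself be distributed and run in situ.

```python
import math


def group_similar(profiles, threshold):
    """Greedy leader clustering of per-process performance vectors.

    Each process joins the first representative within `threshold`
    (Euclidean distance); otherwise it becomes a new representative.
    Returns {representative rank: [member ranks]}.
    """
    leaders = {}  # representative rank -> its vector
    groups = {}   # representative rank -> member ranks
    for rank, vec in enumerate(profiles):
        for leader_rank, leader_vec in leaders.items():
            if math.dist(vec, leader_vec) <= threshold:
                groups[leader_rank].append(rank)
                break
        else:  # no representative was close enough
            leaders[rank] = vec
            groups[rank] = [rank]
    return groups


if __name__ == "__main__":
    # Hypothetical (compute s, communication s) per MPI rank.
    profiles = [(1.0, 10.0), (1.1, 10.2), (0.9, 9.9),
                (5.0, 2.0), (1.0, 10.1), (5.1, 2.1)]
    for leader, members in group_similar(profiles, threshold=0.5).items():
        print(f"rank {leader} represents ranks {members}")
```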

22
Motherboard Power Monitor
Prototype is slightly wider than a disk bay.
Estimated materials cost for a build of 100: $45; smaller redesign: $35
23
Contact Information
Rob Fowler: rjf@renci.org, rjf@unc.edu, 919-445-9670
RENCI: http://www.renci.org/
