SCEC/CME Project - How Earthquake Simulations Drive Middleware Requirements
1
SCEC/CME Project - How Earthquake Simulations
Drive Middleware Requirements
  • Philip Maechling
  • SCEC IT Architect
  • 24 June 2005

2
Southern California Earthquake Center
  • Consortium of 15 core institutions and 39 other
    participating organizations, founded as an NSF
    STC in 1991
  • Co-funded by NSF and USGS under the National
    Earthquake Hazards Reduction Program (NEHRP)
  • Mission
  • Gather data on earthquakes in Southern California
  • Integrate information into a comprehensive,
    physics-based understanding of earthquake
    phenomena
  • Communicate understanding to end-users and the
    general public to increase earthquake awareness
    and reduce earthquake risk

Core Institutions
  University of Southern California (lead)
  California Institute of Technology
  Columbia University
  Harvard University
  Massachusetts Institute of Technology
  San Diego State University
  Stanford University
  U.S. Geological Survey (3 offices)
  University of California, Los Angeles
  University of California, San Diego
  University of California, Santa Barbara
  University of Nevada, Reno
Participating Institutions
  39 national and international universities and research organizations
http://www.scec.org
3
Recent Earthquakes In California
4
Observed Areas of Strong Ground Motion
5
Simulations Supplement Observed Data
6
SCEC/CME Project
Goal: To develop a cyberinfrastructure that can support system-level earthquake science - the SCEC Community Modeling Environment (CME)
Support: 5-year project funded by the NSF/ITR program under the CISE and Geoscience Directorates
Start date: Oct 1, 2001
[Partner diagram: the SCEC/ITR Project spans Information Science (NSF CISE, ISI, SDSC) and Earth Science (NSF GEO, USGS, IRIS, SCEC institutions)]
www.scec.org/cme
7
SCEC/CME Scientific Workflow Construction
A major SCEC/CME objective is the ability to construct and run complex scientific workflows for SHA.
[Workflow diagram (Pathway 1 example): Define Scenario Earthquake (ERF definition) → Calculate Hazard Curves (gridded region definition, IMR definition; 9000 hazard curve files, 9000 x 0.5 MB ≈ 4.5 GB) → Extract IMR Value (probability of exceedance and IMR definition; lat/long/amp xyz file with 3000 data points, ~100 KB) → Plot Hazard Map (GMT map configuration parameters)]
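A minimal sketch of this Pathway 1 pipeline, assuming illustrative placeholder functions and toy numbers (these names are not the actual SCEC/CME components, and the toy curve calculation stands in for a real ERF + IMR computation):

# Illustrative sketch of the Pathway 1 hazard-map workflow; names and the toy
# curve calculation are placeholders, not actual SCEC/CME components.
import math

def calculate_hazard_curve(site, erf_name, imr_name):
    """Toy stand-in for an ERF + IMR hazard-curve calculation at one site."""
    imls = [0.01 * i for i in range(1, 101)]              # intensity-measure levels
    return [(iml, math.exp(-5.0 * iml)) for iml in imls]  # fake exceedance probabilities

def extract_imr_value(curve, target_prob):
    """Largest intensity-measure level whose exceedance probability >= target."""
    return max((iml for iml, prob in curve if prob >= target_prob), default=0.0)

def pathway1_hazard_map(sites, erf_name, imr_name, target_prob):
    # One independent hazard-curve calculation per grid site (~9000 for a map).
    xyz = []
    for lat, lon in sites:
        curve = calculate_hazard_curve((lat, lon), erf_name, imr_name)
        xyz.append((lon, lat, extract_imr_value(curve, target_prob)))
    return xyz   # the lat/long/amp "xyz" rows handed to GMT for plotting

rows = pathway1_hazard_map([(34.05, -118.24)], "Frankel-02", "Field", 0.1)
print(rows)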
8
SCEC/CME Scientific Workflow System
9
SCEC/CME SRB-based Digital Library
  • SRB-based Digital Library
  • More than 100 Terabytes of tape archive
  • 4 Terabytes of on-line disk
  • 5 Terabytes of disk cache for derivations

10
INTEGRATED WORKFLOW ARCHITECTURE
[Architecture diagram: a Workflow Template Editor (CAT) queries the Component Library and Domain Ontology for components and produces a Workflow Template (WT), stored in a Workflow Library. Data Selection queries for data given metadata through a Conceptual Data Query Engine (DataFinder) and Metadata Catalog, yielding a Workflow Instance (WI) with execution requirements. Workflow Mapping (Pegasus) then uses grid information services to turn the WI into an Executable Workflow on the Grid. Components carry I/O data descriptions; tools and an engineer interact at several points. Contributors: J. Zechar @ USC (Teamwork: Geo & CS), D. Okaya @ USC, L. Hearn @ UBC, K. Olsen @ SDSU.]
11
SCEC/CME HPC Allocations
  • SCEC/CME researchers need, and have access to, significant High Performance Computing capabilities
  • TeraGrid Allocations (April 2005 - March 2006)
  • TG-MCA03S012 (Olsen): 1,020,000 SUs
  • TG-BCS050002S (Okaya): 145,000 SUs
  • USC HPCC Allocations
  • CME Group Allocations (Maechling): 100,000 SUs
  • Investigator Allocations (Li, Jordan): 300,000 SUs
  • SCEC Cluster
  • Dedicated 16-processor Pentium 4 cluster (102 GFlops)

12
SCEC/CME TeraGrid Support
  • TeraGrid Strategic Application Collaboration
    (SAC) greatly improved our AWM run-time on
    TeraGrid
  • Advanced TeraGrid Support (ATS) for TeraShake 2
    and CyberShake simulations
  • SDSC Visualization Services support for SCEC
    simulations.

13
Three Types of Simulations
  • SCEC/CME supports widely varying types of earthquake simulations
  • Each simulation type creates its own set of middleware requirements
  • We will describe three examples and comment on their middleware implications and computational system requirements:
  • Probabilistic Seismic Hazard Maps
  • 3D Waveform Propagation Simulations
  • 3D Waveform-based Intensity Measure Relationship

14
Probabilistic Seismic Hazard Maps
15
Example Hazard Curve
Site: USC; ERF: Frankel-02; IMR: Field; IMT: Peak Velocity; Time Period: 50 Years
16
Probabilistic Hazard Map Calculations
17
Characteristics of PSHA Simulations
  • 10k independent hazard curve calculations for each map calculation
  • A high-throughput, not high-performance, computing problem
  • 10k resulting files per map
  • Metadata saved for each file
  • Short run time for each calculation
  • Overhead of starting up each job is expensive
  • Would like to offer map calculations as a service to SCEC users (who may not have an allocation)

18
Middleware Implications
  • High-throughput scheduling
  • Well suited to a Condor pool
  • Bundling of short run-time jobs will reduce job startup overhead (a sketch follows this list)
  • Bundling of jobs is also useful for cluster execution
  • Metadata tracking with an RDBMS-based catalog system (e.g. Metadata Catalog Service (MCS) and Replica Location Service (RLS))
  • Databases present installation and operational problems at every site where we request them
  • Software support for interpreted languages on computational clusters
  • The calculations are implemented in an interpreted programming language
  • On-demand execution by non-allocated users
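A minimal sketch of the bundling idea, assuming a hypothetical per-site executable named calc_hazard_curve (not an actual SCEC tool): rather than submitting ~10k single-site jobs, group sites into batches and let each scheduled job work through one batch, so scheduler and startup overhead is paid per batch instead of per site.

# Hypothetical sketch: bundle many short hazard-curve calculations into a few
# batch jobs. "calc_hazard_curve" is an illustrative placeholder executable.
import subprocess

def chunks(items, size):
    """Split the site list into fixed-size batches."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def run_batch(batch):
    """Body of one scheduled job: compute every curve in the batch serially."""
    for lat, lon in batch:
        subprocess.run(["calc_hazard_curve", "--lat", str(lat), "--lon", str(lon)],
                       check=True)

if __name__ == "__main__":
    # ~10,000 grid sites per hazard map -> 100 batches of 100 sites each;
    # each batch would be submitted as one Condor (or cluster) job.
    sites = [(33.0 + 0.02 * i, -118.5 + 0.02 * j)
             for i in range(100) for j in range(100)]
    for batch in chunks(sites, 100):
        run_batch(batch)   # in practice, submit the batch rather than run in-process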

19
3D Wave Propagation Simulations
20
Characteristics of 3D Wave Propagation Simulations
  • More physically realistic than existing PSHA, but more computationally expensive
  • High-performance computing, cluster-based codes
  • 4D data calculations (time-varying volumetric data)
  • Output large volumetric data sets
  • Physics limited by the resolution of the grid: higher ground-motion frequencies require a denser grid, and doubling the grid density increases storage by a factor of 8 (a worked example follows this list)
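A back-of-envelope illustration of that factor of 8, with an assumed mesh size and 3-component, 4-byte velocity values (illustrative numbers, not a specific SCEC run): halving the grid spacing doubles the point count along each of the three axes, so each saved volume snapshot grows by 2^3 = 8, and the smaller time step adds further compute cost on top of that.

# Back-of-envelope grid-density scaling; mesh size and value size are assumptions.
nx, ny, nz = 2200, 2200, 325          # illustrative mesh at some spacing dx
bytes_per_point = 3 * 4               # 3 velocity components, 4-byte floats

coarse = nx * ny * nz * bytes_per_point
fine = (2 * nx) * (2 * ny) * (2 * nz) * bytes_per_point   # halve dx -> double each axis

print(f"snapshot at dx:   {coarse / 1e9:.1f} GB")
print(f"snapshot at dx/2: {fine / 1e9:.1f} GB ({fine // coarse}x larger)")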

21
Example TeraShake Simulation
  • Magnitude 7.7 earthquake on the southern San Andreas
  • Mesh of 2 billion cubes, dx = 200 m
  • 0.011 sec time step, 20,000 time steps: a ~3-minute simulation
  • Kinematic source (from Denali) from Cajon Creek to Bombay Beach
  • 60 sec source duration
  • 18,886 point sources, each 6,800 time steps in duration
  • 240 processors on the San Diego Supercomputer Center DataStar
  • 20,000 CPU hours, approximately 5 days wall clock (a back-of-envelope check follows this list)
  • 50 TB of output
  • On-the-fly graphics during execution (attempt aborted!)
  • Metadata capture and storage in the SCEC digital library
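A back-of-envelope check using only the figures quoted on this slide (a sketch, not additional measured data):

# Quick arithmetic from the TeraShake figures above.
time_step_s = 0.011
n_steps = 20_000
cpu_hours = 20_000
processors = 240
output_tb = 50
wallclock_days = 5

print(f"simulated time: {time_step_s * n_steps:.0f} s")              # ~220 s of shaking
print(f"CPU hours per processor: {cpu_hours / processors:.0f}")      # ~83 h of raw compute
print(f"avg output rate: {output_tb * 1024 / (wallclock_days * 24):.0f} GB/hour")
# the sustained output rate is the data stream the middleware has to absorb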

22
Domain Decomposition For TeraShake Simulations
23
Simulations Supplement Observed Data
24
Peak Velocity
NW-SE Rupture
SE-NW rupture
25
Peak velocities at selected sites:

  Site           SE-NW rupture   NW-SE rupture
  Montebello     337 cm/s        8 cm/s
  Downtown       52 cm/s         4 cm/s
  Long Beach     48 cm/s         9 cm/s
  San Diego      8 cm/s          6 cm/s
  Palm Springs   36 cm/s         23 cm/s
26
Break-down of output
  Full volume velocities (every 10th time step):  43.2 TB
  Full surface velocities (every time step):       1.1 TB
  Checkpoints / restarts (every 1,000 steps):      3.0 TB
  Input files, etc.:                               0.1 TB
27
Middleware Implications for 3D Wave Propagation
Simulations
  • Multi-day high-performance runs
  • Checkpoint/restart support needed
  • Schedule reservations on clusters
  • Reservations and special queues are often arranged
  • Large file and data movement
  • Terabyte transfers require highly reliable, long-duration data transfers
  • Ability to stop and restart
  • Can we move a restart from one system to another?
  • Draining of temporary storage during runs (a sketch follows this list)
  • Storage required for the full output often exceeds the capacity of scratch, so output files must be moved off during the simulation
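A minimal sketch of the draining pattern, assuming a hypothetical transfer_to_archive command and a solver that writes one completed snapshot file per output step; the real runs used SRB / GridFTP tooling rather than this placeholder.

# Hypothetical sketch: move completed output files off scratch while the
# simulation is still running so scratch never fills. "transfer_to_archive"
# is an illustrative placeholder, not an actual SCEC command.
import shutil
import subprocess
import time
from pathlib import Path

SCRATCH = Path("/scratch/run/output")          # illustrative path
MIN_FREE_BYTES = 500 * 1024**3                 # keep at least 500 GB free

def completed_snapshots():
    # Assume the solver writes *.tmp and renames to *.bin when a step is done.
    return sorted(SCRATCH.glob("vel_*.bin"))

def drain_once():
    for snapshot in completed_snapshots():
        subprocess.run(["transfer_to_archive", str(snapshot)], check=True)
        snapshot.unlink()                      # free space only after a good copy

while True:
    if completed_snapshots() or shutil.disk_usage(SCRATCH).free < MIN_FREE_BYTES:
        drain_once()
    time.sleep(60)                             # poll once a minute during the run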

28
Middleware Implications for 3D Wave Propagation
Simulations
  • On-the-fly visualization for rapid validation of results
  • Verify results before the full simulation completes
  • Standard protocols for data transfers and metadata registration into SRB-based storage

29
Waveform-based Intensity Measure Relationship
(CyberShake)
30
Various IMR types (subclasses)
[Diagram: an Intensity-Measure Relationship takes a Rupture, Site(s), and an IMT with IML(s), and exposes a list of supported IMTs and site-related independent parameters. Subclasses:]
  • Attenuation Relationships: a Gaussian distribution is assumed; mean and std. dev. come from various parameters
  • Multi-Site IMRs: compute the joint prob. of exceeding IML(s) at multiple sites (e.g., Wesson & Perkins, 2002)
  • Vector IMRs: compute the joint prob. of exceeding multiple IMTs (Bazzurro & Cornell, 2002)
  • Simulation IMRs: exceedance prob. computed using a suite of synthetic seismograms
31
CyberShake Simulations Push Macro and Micro
Computing
  • CyberShake requires large forward wave propagation simulations and volumetric data storage
  • CyberShake requires ~100k seismogram synthesis computations using multi-terabyte volumetric data sets; during synthesis processing, this data needs to be disk-based
  • ~100k data files, plus their metadata, to be managed
  • High-throughput requirements are driving the implementation toward a TeraGrid-wide computing approach
  • High-throughput requirements are driving integration of non-TeraGrid grids with TeraGrid

32
Example CyberShake Region (200km x 200km)
USC: 34.05, -118.24; minLat = 31.889, minLon = -120.60, maxLat = 36.1858, maxLon = -115.70
33
CyberShake Strain Green Tensor AWM
  • Large (TeraShake-scale) forward calculations for each site
  • SHA typically ignores ruptures more than 200 km from the site, so this is used as the cutoff distance
  • A 20 km buffer is used around the edges of the volume to reduce edge effects
  • 65 km depth to support the frequencies of interest
  • Volume is 440 km x 440 km x 65 km at 200 m spacing
  • 1.573 billion mesh points (an arithmetic check follows this list)
  • Simulation time: 240 seconds
  • Volumetric data saved for 2 horizontal simulations
  • Estimated storage per site is 7 TB (4.5 TB data + 2.5 TB checkpoint files)
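A quick arithmetic check of the mesh-point count quoted above (using integer metres to avoid rounding):

# Mesh-point count for the 440 km x 440 km x 65 km volume at 200 m spacing.
dx_m = 200
nx = 440_000 // dx_m        # 2200 points along each horizontal edge
ny = 440_000 // dx_m
nz = 65_000 // dx_m         # 325 points in depth
total = nx * ny * nz
print(f"{nx} x {ny} x {nz} = {total:,} mesh points (~{total / 1e9:.3f} billion)")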

34
Ruptures in ERF within 200 km of USC
43,227 ruptures in the Frankel-02 ERF with M 5.0 or larger within 200 km of USC
35
CyberShake Computational Elements
36
CyberShake Seismogram Synthesis
  • Requires calculation of ~100,000 seismograms for each site
  • Estimated rupture variations scale with magnitude (a quick sum follows this list):

      Mw 5.0 x 1:      20,450
      Mw 6.0 x 10:    216,990
      Mw 7.0 x 100:   106,900
      Mw 8.0 x 1000:    9,000
      -------------------------
      Total:          353,340 ruptures (x 2 components)

  • Current estimated number of seismogram files per site is ~43,000 (due to combining components and variations into a single file per rupture)
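A quick sum of those per-magnitude estimates, just checking the quoted total:

# Sum of the estimated rupture variations listed above.
variations = {"Mw 5.0": 20_450, "Mw 6.0": 216_990, "Mw 7.0": 106_900, "Mw 8.0": 9_000}
total = sum(variations.values())
print(f"total rupture variations: {total:,}")   # -> 353,340 (each with 2 components)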

37
CyberShake Seismogram Synthesis
  • The seismogram synthesis stage requires disk-based storage of the large volumetric data sets, so a tape-based archive of the volumetric data does not work
  • To distribute seismogram synthesis across TeraGrid, we need to either duplicate terabytes of data or have globally visible disk systems

38
Example Hazard Curve
Site: USC; ERF: Frankel-02; IMR: Field; IMT: Peak Velocity; Time Period: 50 Years
39
Workflows Run Using Grid VDS Workflow Tools
40
Example Hazard Map Region (50 km x 50 km at 2 km grid spacing = 625 sites)
OpenSHA, SA 1.0, Frankel 2002 ERF and Sadigh IMR, with 10% POE in 50 years
41
Summary of SCEC Experiences
  • As soon as we develop a computational capability, the geophysicists develop applications that push the technology
  • Compute technology, data management technology, and resource-sharing technology are all applied
  • In many ways, the IT capabilities required for geophysical problems exceed what is currently possible, and this limits the state of knowledge in geophysics and public safety
  • For example, higher-frequency simulations are of significant interest but exceed the computational and storage capabilities currently available

42
Major Middleware-Related Issues for SCEC/CME: Security and Allocation Management
  • The lack of a widely accepted CA makes adding organizations to the SCEC grid problematic
  • Ability to run under group allocations for on-demand requests (Community Allocation?)

43
Major Middleware-Related Issues for SCEC/CME: Software Installation and Maintenance
  • The middleware software stack, even on supercomputer systems, should include support for "micro jobs", e.g. Java
  • Database management support for database-oriented tools such as metadata catalogs is important (backup, recovery, cleanup, performance, modifications)
  • Guidelines for tools in the middleware software stack should describe when local installations are required and when remote installations are acceptable for tools such as RLS and MCS

44
Major Middleware-Related Issues for SCEC/CME: Supercomputing and Storage
  • Globally (TeraGrid-wide) visible disk storage
  • Well-supported, reliable file transfers, with monitoring and restart of transfer jobs that have problems, are essential (a sketch follows this list)
  • Interoperability between grid tools and data management tools such as SRB must include data, metadata, and metadata search
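A minimal sketch of such a monitored, restartable transfer, assuming a hypothetical grid_copy command that can resume a partial copy; the real deployments used tools such as GridFTP and SRB clients, not this placeholder.

# Hypothetical sketch: retry a large transfer until it succeeds, logging each
# attempt so stalled transfers are visible to an operator or monitor.
# "grid_copy" and the URLs below are illustrative placeholders only.
import logging
import subprocess
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

def reliable_transfer(src, dest, max_attempts=10, backoff_s=300):
    for attempt in range(1, max_attempts + 1):
        logging.info("attempt %d: %s -> %s", attempt, src, dest)
        result = subprocess.run(["grid_copy", "--resume", src, dest])
        if result.returncode == 0:
            logging.info("transfer complete: %s", dest)
            return True
        logging.warning("attempt %d failed (rc=%d); retrying in %d s",
                        attempt, result.returncode, backoff_s)
        time.sleep(backoff_s)
    logging.error("giving up on %s after %d attempts", src, max_attempts)
    return False

reliable_transfer("gsiftp://some-teragrid-site/scratch/vel_0001.bin",
                  "srb://scec-digital-library/terashake/vel_0001.bin")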

45
Major Middleware-Related Issues for SCEC/CME: Scheduling Issues
  • Support for Reservation-based scheduling
  • Partial run and restart capability
  • Failure detection and alerting

46
Major Middleware-Related Issues for SCEC/CME: Usability and Monitoring
  • Monitoring tools that include the status of available storage resources
  • On-the-fly visualizations for run-time validation of results
  • Interfaces to workflow systems are complex, developer-oriented interfaces; easier-to-use interfaces are needed