1
Project Athena: Technical Issues
  • Larry Marx and the Project Athena Team

2
Outline
  • Project Athena Resources
  • Models and Machine Usage
  • Experiments
  • Running Models
  • Initial and Boundary Data Preparation
  • Post Processing, Data Selection and Compression
  • Data Management

3
Project Athena Resources
  • Athena 4,512 nodes @ 4 cores, 2 GB memory (dedicated, Oct09–Mar10, 79 million core-hours)
  • Kraken 8,256 nodes @ 12 cores, 16 GB memory (shared, Oct09–Mar10, 5 million core-hours)
  • Verne 5 nodes @ 32 cores, 128 GB memory (dedicated, Oct09–Mar10, post-processing)
  • Read-only scratch 78 TB (Lustre)
  • homes 8 TB (NFS)
  • nakji 360 TB (Lustre)
  • 800 TB HPSS tape archive
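
A rough arithmetic check of the dedicated allocation, assuming the full 18,048 cores were available for the roughly 182-day October-through-March period (an illustrative sketch, not a figure from the slides):

    # Rough check of the 79 million dedicated core-hours on Athena
    cores = 4512 * 4            # 18,048 cores (4,512 nodes @ 4 cores)
    days = 182                  # ~1 Oct 2009 through 31 Mar 2010
    core_hours = cores * days * 24
    print(f"{core_hours:,}")    # 78,833,664, i.e. about 79 million core-hours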
4
Models and Machine Usage
  • NICAM was initially the primary focus of implementation
  • Limited flexibility in scaling, due to the icosahedral grid
  • Limited testing on multicore/cache processor architectures; production had been primarily on the vector-parallel (NEC SX) Earth Simulator
  • Step 1: Port a low-resolution version with simple physics to Athena
  • Step 2: Determine the highest resolution possible on Athena and the minimum and maximum number of cores to be used
  • Unique solution: G-level 10, i.e. 10,485,762 cells (7-km spacing), using exactly 2,560 cores (the arithmetic is sketched after this list)
  • Step 3: Initially, NICAM jobs failed frequently due to improper namelist settings. During a visit by U. Tokyo and JAMSTEC scientists to COLA, new settings were determined that generally ran with little trouble. However, the 2003 case could never be stabilized and was abandoned.
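
A small check of the G-level arithmetic, assuming the standard icosahedral relation of 10 × 4^g + 2 cells at grid-division level g; the ~7-km spacing follows from sharing the Earth's surface area evenly among the cells:

    import math

    def icosahedral_cells(glevel):
        """Cells on a recursively divided icosahedral grid: 10 * 4**g + 2."""
        return 10 * 4 ** glevel + 2

    n = icosahedral_cells(10)                 # 10,485,762 cells at G-level 10
    earth_area = 4 * math.pi * 6371.0 ** 2    # km^2
    spacing = math.sqrt(earth_area / n)       # ~7.0 km mean cell spacing
    print(n, round(spacing, 1))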

5
Models and Machine Usage (contd)
  • IFS's flexible scalability sustains good performance for the higher-resolution configurations (T1279 and T2047) using 2,560 processor cores
  • We defined one slot as 2,560 cores and managed a mix of NICAM and IFS jobs at 1 job per slot, giving maximally efficient use of the resource (see the sketch after this list)
  • Having equal-size slots for both models permits either model to be queued and run in the event of a job failure.
  • Selected jobs were given higher priority so that they continue to run ahead of others.
  • Machine partition: 7 slots of 2,560 cores = 17,920 cores out of 18,048
  • 99% machine utilization
  • 128 processors reserved for pre- and post-processing and as spares (to postpone reboots)
  • Lower-resolution IFS experiments (T159 and T511) were run on Kraken
  • IFS runs were initially made by COLA. Once the ECMWF SMS model management system was installed, runs could be made by either COLA or ECMWF.
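
A minimal sketch of the slot bookkeeping described above; the numbers come from the slides, while the code itself is only illustrative:

    TOTAL_CORES = 18048          # Athena: 4,512 nodes x 4 cores
    SLOT_CORES = 2560            # one slot fits either an IFS or a NICAM job

    slots = TOTAL_CORES // SLOT_CORES        # 7 slots
    in_slots = slots * SLOT_CORES            # 17,920 cores running models
    spares = TOTAL_CORES - in_slots          # 128 cores for pre/post-processing and spares
    utilization = in_slots / TOTAL_CORES     # ~0.993 -> the quoted 99% utilization
    print(slots, in_slots, spares, f"{utilization:.1%}")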

6
Project Athena Experiments
7
Initial and Boundary Data Preparation
  • IFS
  • Most input data were prepared by ECMWF; large files were shipped on removable disk.
  • Time Slice experiment input data were prepared by COLA.
  • NICAM
  • Initial data from GDAS 1° files, available for all dates.
  • Boundary files other than SST are included with NICAM.
  • SST from the ¼° NCDC OI daily analysis (version 2). Data starting 1 June 2002 include in situ, AVHRR (IR), and AMSR-E (microwave) observations; earlier data do not include AMSR-E.
  • All data interpolated to the icosahedral grid (a sketch follows this list).
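
NICAM's own preprocessing tools handle the remapping; purely as an illustration of the idea, a nearest-neighbor remap from the regular ¼° SST grid onto hypothetical icosahedral cell-center coordinates could look like this:

    import numpy as np
    from scipy.spatial import cKDTree

    def to_unit_xyz(lat_deg, lon_deg):
        """Convert lat/lon in degrees to unit vectors so distances respect the sphere."""
        lat, lon = np.radians(lat_deg), np.radians(lon_deg)
        return np.column_stack((np.cos(lat) * np.cos(lon),
                                np.cos(lat) * np.sin(lon),
                                np.sin(lat)))

    def remap_to_icosahedral(src_lat, src_lon, src_field, cell_lat, cell_lon):
        """Nearest-neighbor remap from a regular lat-lon grid to icosahedral cell centers.

        src_lat, src_lon, src_field are 2-D arrays of the same shape;
        cell_lat, cell_lon are 1-D arrays of cell-center coordinates.
        """
        tree = cKDTree(to_unit_xyz(src_lat.ravel(), src_lon.ravel()))
        _, nearest = tree.query(to_unit_xyz(cell_lat, cell_lon))
        return src_field.ravel()[nearest]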

8
Post Processing, Data Selection and Compression
  • All IFS (GRIB-1) data were interpolated (coarsened) to the N80 reduced grid for common comparison among the resolutions and with the ERA-40 data. All IFS spectral data were truncated to T159 coefficients and transformed to the N80 full grid.
  • Key fields at full model resolution were processed, including transforming spectral coefficients to grids and compression to NetCDF-4 via GrADS (a compression sketch follows this list).
  • Processing was accomplished on Kraken, because Athena lacks sufficient memory and computing power on each node.
  • All the common-comparison and selected high-resolution data were electronically transferred to COLA via bbcp (up to 40 MB/s sustained).
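
The compression itself was done through GrADS; as an illustration only, the same kind of deflate-compressed NetCDF-4 output can be written with the netCDF4-python library (the file and variable names here are hypothetical):

    import numpy as np
    from netCDF4 import Dataset

    nlat, nlon = 160, 320                       # an N80-like regular lat-lon grid
    field = np.random.rand(nlat, nlon).astype(np.float32)

    with Dataset("t2m_n80.nc", "w", format="NETCDF4") as nc:
        nc.createDimension("lat", nlat)
        nc.createDimension("lon", nlon)
        # zlib deflate provides the internal NetCDF-4 compression referred to above
        var = nc.createVariable("t2m", "f4", ("lat", "lon"),
                                zlib=True, complevel=4, shuffle=True)
        var[:] = field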

9
Post Processing, Data Selection and Compression
(contd)
  • Nearly all (91) NICAM diagnostic variables were saved. Each variable was saved in 2,560 separate files, one per model domain, resulting in over 230,000 files. The number of files quickly saturated the Lustre file system.
  • The original program to interpolate data to a regular lat-lon grid had to be revised to use less I/O and to multithread, thereby eliminating a processing backlog.
  • Selected 3-D fields were interpolated from z-coordinate to p-coordinate levels (see the sketch after this list).
  • Selected 2-D and 3-D fields were compressed (NetCDF-4) and electronically transferred to COLA.
  • All selected fields were coarsened to the N80 full grid.
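
The z-to-p step was done with the project's own processing programs; a minimal single-column sketch of the idea (variable names hypothetical), interpolating linearly in log-pressure:

    import numpy as np

    def column_z_to_p(field_z, p_z, p_targets):
        """Interpolate one column from model levels to fixed pressure levels (hPa).

        field_z: values on model levels, ordered surface -> top
        p_z:     pressure on those same levels (decreasing with height)
        """
        # np.interp needs an increasing abscissa, so flip to top -> surface
        # and interpolate linearly in log-pressure.
        logp = np.log(p_z[::-1])
        vals = field_z[::-1]
        return np.interp(np.log(np.asarray(p_targets, dtype=float)), logp, vals)

    p_model = np.array([1000.0, 850.0, 700.0, 500.0, 300.0, 200.0, 100.0])
    t_model = np.array([288.0, 281.0, 273.0, 253.0, 229.0, 218.0, 210.0])
    print(column_z_to_p(t_model, p_model, [925.0, 600.0, 250.0]))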

10
Data Management: NICS
  • All data archived to HPSS, approaching 1 PB
  • The workflow required complex data movement
  • All high-resolution model runs were done on Athena
  • Model output was stored on scratch or nakji, and all of it was copied to tape on HPSS
  • IFS data interpolation/truncation was done directly from retrieved HPSS files
  • NICAM data were processed using Verne and nakji (more capable CPUs and larger memory)

11
Data Management: COLA
  • Project Athena was allocated 50 TB (26%) on COLA disk servers.
  • Considerable discussion and judgment were required to down-select variables from IFS and NICAM, based on factors including scientific use and data compressibility.
  • A large directory structure was needed to organize the data, particularly for IFS, with its many resolutions, sub-resolutions, data forms and ensemble members.

12
Data Management: Future
  • New machines at COLA and NICS will permit further analysis that is not currently possible due to lack of memory and compute power.
  • Some or all of the data will eventually be made publicly available once the long-term disposition is determined.
  • TeraGrid Science Portal??
  • Earth System Grid??

13
Summary
  • A large, international team of climate and computer scientists, using dedicated and shared resources, introduces many challenges for production computing, data analysis and data management
  • The sheer volume and complexity of the data break everything:
  • Disk capacity
  • File name space
  • Bandwidth connecting systems within NICS
  • HPSS tape capacity
  • Bandwidth to remote sites for collaborating groups
  • Software for analysis and display of results (GrADS modifications)
  • COLA overcame these difficulties as they were encountered, in 24/7 production mode, preventing the dedicated computer from sitting idle.