1
Toward Parallel Space Radiation Analysis
Dr. Liwen Shih, Thomas K. Gederberg, Karthik Katikaneni, Ahmed Khan, Sergio J. Larrondo, Susan Strausser, Travis Gilbert, Victor Shum, Romeo Chua
University of Houston-Clear Lake
2
This project continues the space radiation research work performed last year by Dr. Liwen Shih's students to investigate HZETRN code optimization options. This semester we will analyze the HZETRN code using standard static analysis tools and runtime analysis tools. In addition, we will examine code parallelization options for the most frequently called numerical method in the source code, the PHI function.
3
What is Space Radiation?
  • Two major sources:
  • galactic cosmic rays (GCR)
  • solar energetic particles (SEP)
  • GCR are ever-present and more energetic, so they can penetrate much thicker materials than SEP.
  • To evaluate the space radiation risk and design spacecraft and habitats for better radiation protection, space radiation transport codes, which depend on the input physics of nuclear interactions, have been developed.

4
Space Radiation and the Earth
This image shows how the Earth's magnetic field causes electrons to drift one way around the Earth, while protons drift in the opposite direction. (Original clips provided courtesy of Professor Patricia Reiff, Rice University, Connections Program.)
Animation: Earth protected from space radiation. Source: Rice University, Connections Program.
5
What about Galactic Cosmic Radiation (GCR)?
A typical high-energy particle of radiation found in the space environment is itself ionized, and as it passes through material such as human tissue it disrupts the electron clouds of the constituent molecules and leaves a path of ionization in its wake. These particles are either singly charged protons or more highly charged nuclei called "HZE" particles.
6
HZETRN - Space Radiation Nuclear Transport Code
The three included source code files are:
  • NUCFRAG.FOR - generates nuclear absorption and reaction cross sections.
  • GEOMAG.FOR - defines the GCR transmission coefficient cutoff effects within the magnetosphere.
  • HZETRN.FOR - propagates the user-defined GCR environments through two layers of user-supplied materials. The current version is set up to propagate through aluminum, tissue (H2O), CH2, and LH2.
HZETRN - High Charge and Energy Nuclear Transport Code
  • Language: FORTRAN-77
  • Written: 1992
  • Environment: VAX mainframe
Code metrics:
  • Files: 3
  • Lines: 9,665
  • Code lines: 6,803
  • Comment lines: 2,859
  • Declarative statements: 780
  • Executable statements: 6,563
  • Comment/code ratio: 0.42
7
HZETRN Numerical Method
8
HZETRN Calculates
  • Radiation fluence of HZE particles
  • the time-integrated flux of HZE particles per unit area.
  • Energy absorbed per gram
  • determined by first measuring the amount of energy left behind by the radiation in question, and then the amount and type of material.
  • Dose equivalent
  • the amount of any type of radiation absorbed in biological tissue, expressed as a standardized value (standard definitions follow below).
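For reference, the standard definitions behind the three quantities above (not spelled out on the slide): fluence Φ is the time integral of the flux, Φ = ∫ φ(t) dt; absorbed dose D is the energy deposited per unit mass, D = dE/dm; and dose equivalent H weights the absorbed dose by a quality factor Q for the radiation type, H = Q · D.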

9
HZETRN Algorithm
10
HZETRN used for Mars Mission
NASA has a new vision for space exploration in the 21st century encompassing a broad range of human and robotic missions, including missions to the Moon, Mars, and beyond. As a result, there is a focus on long-duration space missions. NASA, as much as ever, is committed to the safety of the missions and the crew. Exposure to the hazards of severe space radiation on long-duration deep-space missions is the show-stopper.
Thus, protection from the hazards of severe space radiation is of paramount importance for the new vision. There is an overwhelming emphasis on reliability issues for the mission and the habitat. Accurate risk assessments critically depend on the accuracy of the input information about the interaction of ions with materials, electronics, and tissues.
11
Martian Radiation Climate Modeling Using HZETRN
Code
  • Calculations of the skin dose equivalent for
    astronauts on the surface of Mars near solar
    minimum.
  • The variation in the dose with respect to
    altitude is shown.
  • Higher altitudes (such as Olympus Mons) offer
    less shielding.

Mars Radiation Environment (Source: Wilson et al., http://marie.jsc.nasa.gov)
12
HZETRN Model vs. Actual Mars Radiation Climate
HZETRN underestimates!
Dose rate measured by MARIE during the transit period from April 2001 to August 2001, compared with HZETRN-calculated doses. The spike in May is due to an SPE. Differences between the observed (red) and predicted (black) doses vary by a factor of 1 to 3.
Partly because of code inefficiency, the dosage data are underestimated.
Graph source: Alenia Spazio / European Space Agency Report, 2004
13
Project Goal: Speedup of runtime via analysis and modification of the HZETRN code's numerical algorithm - the PHI interpolation function
The major space radiation code bottleneck lies inside the call to the PHI interpolation function.
14
Code Optimization Options
Excerpt of the PHI function (HZETRN.FOR, lines 4028-4046):
      FUNCTION PHI(R0,N,R,P,X)
C
C     FUNCTION PHI INTERPOLATES IN P(N) ARRAY DEFINED OVER R(N) ARRAY
C     ASSUMES P IS LIKE A POWER OF R OVER SUBINTERVALS
C
      DIMENSION R(N),P(N)
C
      SAVE
C
      XT=X
      PHI=P(1)
      INC=((R(2)-R(1))/ABS(R(2)-R(1)))*1.01
      IF(X.LE.R(1).AND.R(1).LT.R(2))RETURN
C
      DO 1 I=3,N-1
      IL=I
      IF(XT*INC.LT.R(I)*INC)GO TO 2
  1. Fix inefficient code
  2. Fix/remove unnecessary function calls (TEXP), SAVE, and dummy arguments
  3. Use the optimized ALOG function (LOG)
  4. Use a lookup table instead (see the sketch after this list)
  5. Investigate parallelization of the interpolation statements
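As one hedged sketch of option 4 (not HZETRN code; the module and the names PHI_TABLES, RLOG, PRECOMP_LOGS are illustrative), the logarithms of a fixed interpolation grid can be computed once and stored, so the hot path no longer calls ALOG on the abscissas every time PHI runs:

MODULE PHI_TABLES
  IMPLICIT NONE
  REAL, ALLOCATABLE :: RLOG(:)          ! precomputed LOG of the grid points
CONTAINS
  SUBROUTINE PRECOMP_LOGS(R, N)
    INTEGER, INTENT(IN) :: N
    REAL,    INTENT(IN) :: R(N)
    IF (ALLOCATED(RLOG)) DEALLOCATE(RLOG)
    ALLOCATE(RLOG(N))
    RLOG = LOG(R)                       ! done once; reused on every interpolation
  END SUBROUTINE PRECOMP_LOGS
END MODULE PHI_TABLES

This only pays off if the same R grid is passed to PHI repeatedly, which the heavy call count reported on slide 19 makes plausible.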

Link to HZETRN
15
Code Optimization
Improve Code Structure
Use the faster ALOG function (LOG)
Remove extraneous function calls
16
Steps toward a faster HZETRN
1. Review Algorithm
   Purpose: Understand the underlying numerical algorithm.
   Result: The HZETRN algorithm is complex and needs further review; the overall functions of the code are understood.
2. Analyze Source Code and Data Files
   Purpose: Understand code structure and function.
   Result: Review of the code and data files reveals that much of the code is inefficient, with redundant elements and archaic structure; the data files contain sparse matrices amenable to performance improvement.
3. Portability Study
   Purpose: Attempt to port the HZETRN code to various HPC platforms and compilers.
   Result: The portability study revealed problems with the code and additional requirements for optimization.
4. Static Analysis
   Purpose: Develop an understanding of program structure; document the code for optimization and reporting.
   Result: We generated a detailed HTML report documenting HZETRN source code functions and the structure of subroutine calls.
5. Runtime Analysis
   Purpose: Target runtime bottlenecks and determine the most called functions/subroutines.
   Result: Revealed that the PHI interpolation function is the major bottleneck; the natural logarithm intrinsic function is also a performance issue.
6. Serial Optimization of Code
   Purpose: Starting with the PHI function, remove extraneous function calls and clean up messy code.
   Result: Runtime performance improvement (initially about a 10% overall improvement).
17
Parallel Space Radiation Analysis
  • The goal of the project was to speed up the execution of the HZETRN code using parallel processing.
  • The Message Passing Interface (MPI) standard library was to be used to perform the parallel processing across a cluster with distributed memory (a minimal skeleton follows below).
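A minimal MPI skeleton of the kind implied above, using only standard MPI calls; the division of work itself is a placeholder, not taken from the project code:

PROGRAM HZETRN_MPI_SKELETON
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  INTEGER :: IERR, RANK, NPROCS
  CALL MPI_INIT(IERR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROCS, IERR)
  ! each rank would work on its share of the independent HZETRN computations here
  PRINT *, 'Rank', RANK, 'of', NPROCS, 'ready'
  CALL MPI_FINALIZE(IERR)
END PROGRAM HZETRN_MPI_SKELETON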

18
Computing Resources Used
  • Itanium 2 cluster (Atlantis) - Texas Learning and Computation Center (TLC2) at the University of Houston.
  • Atlantis is a cluster of 152 dual-Itanium2 (1.3 GHz) compute nodes networked via a Myrinet 2000 interconnect. Atlantis runs Red Hat Linux version 5.1.
  • The Intel Fortran compiler (version 10.0) and Open MPI (an open-source MPI-2 implementation) are being used.
  • In addition, a home PC running Linux (Ubuntu 7.10) with the Sun Studio 12 Fortran 90 compiler and MPICH2 was used.
  • Use of TeraGrid has just begun.

19
PHI Routine (Lagrangian Interpolation)
  • Figure showing the HZETRN runtime profile.
  • Most time is spent in function PHI - 3rd-order Lagrangian interpolation.
  • The PHI function is heavily called by the propagation and integration routines - typically 229,380 times at each depth.
  • Early focus - optimizing the PHI routine.
  • The PHI routine takes the natural log of the input ordinate and abscissas prior to performing the Lagrangian interpolation and returns the exponential of the interpolated ordinate (see the sketch after this list).

(Source: Shih, Larrondo, et al., "High-Performance Martian Space Radiation Mapping," NASA/UHCL/UH-ISSO, pp. 121-122)
  • Removing the calls to the natural log and exponential functions resulted in a 21% (Atlantis) to 45% (home) speedup, but had a negative impact on numerical results (see next page) since the functions being interpolated are logarithmic.
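A hedged sketch of what that bullet describes - this is not the HZETRN source, just a 3rd-order (4-point) Lagrangian interpolation carried out on the logarithms of the abscissas and ordinates, with the exponential taken at the end. The driver values are illustrative (P = R**2, so the log-log interpolation is exact):

PROGRAM PHI_SKETCH
  IMPLICIT NONE
  REAL :: RG(4) = (/1.0, 2.0, 4.0, 8.0/)
  REAL :: PG(4) = (/1.0, 4.0, 16.0, 64.0/)
  PRINT *, 'Interpolated value at X = 3:', PHI_LOGLOG(3.0, RG, PG)   ! expect 9.0
CONTAINS
  REAL FUNCTION PHI_LOGLOG(X, R, P)
    REAL, INTENT(IN) :: X, R(4), P(4)   ! four grid points bracketing X
    REAL :: XL, RL(4), PL(4), ACC, TERM
    INTEGER :: I, J
    XL = LOG(X)                         ! interpolate in log-log space ...
    RL = LOG(R)
    PL = LOG(P)
    ACC = 0.0
    DO I = 1, 4                         ! 3rd-order Lagrange polynomial
      TERM = PL(I)
      DO J = 1, 4
        IF (J /= I) TERM = TERM * (XL - RL(J)) / (RL(I) - RL(J))
      END DO
      ACC = ACC + TERM
    END DO
    PHI_LOGLOG = EXP(ACC)               ! ... and exponentiate the result
  END FUNCTION PHI_LOGLOG
END PROGRAM PHI_SKETCH

Dropping the LOG/EXP pair is what produced the 21-45% speedup quoted above, and the next slide shows why that shortcut is numerically unacceptable.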

20
PHI Routine - Needs LOG/TEXP
Significantly different results when comparing runs with and without the calls to LOG/TEXP.
21
PHI Routine Optimization
  • Because the bottleneck PHI routine is called so heavily, the message-passing overhead to parallelize it would be prohibitive.
  • Simple code optimizations of the PHI routine resulted in:
  • 11.4% speedup on the home PC running Linux, compiled with the Sun Studio 12 Fortran compiler.
  • 3.85% speedup on an Atlantis node using the Intel Fortran compiler.
  • The reduced speedup on Atlantis may be because the Intel compiler was already generating more optimized code.

22
PHI Routine FPGA Prototype
  • Implementing the bottleneck routines - the PHI routine and/or the logarithm/exponential routines - in an FPGA could result in a significant speedup.
  • A reduced-precision floating-point FPGA prototype was developed, for an estimated 325-times-faster PHI computation in hardware.

23
HZETRN Main Program Flow
  • Basic flow of HZETRN (a structural sketch follows below):
  • Step 1: Call MATTER to obtain the material properties (density, atomic weight, and atomic number of each element) of the shield.
  • Step 2: Generate the energy grid.
  • Step 3: Dosimetry and propagation in the shield material
  • Call DMETRIC to compute dosimetric quantities at the current depth.
  • Call PRPGT to propagate the GCRs to the next depth.
  • Repeat step 3 until the target material is reached.
  • Step 4: Dosimetry and propagation in the target material
  • Call DMETRIC to compute dosimetric quantities at the current depth.
  • Call PRPGT to propagate the GCRs to the next depth.
  • Repeat step 4 until the required depth is reached.
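A compilable structural sketch of that flow; the STANDIN routines are empty placeholders (the real MATTER, DMETRIC, and PRPGT live in HZETRN.FOR), and the step counts are arbitrary:

PROGRAM HZETRN_FLOW_SKETCH
  IMPLICIT NONE
  INTEGER :: ISTEP
  INTEGER, PARAMETER :: NSHIELD_STEPS = 5, NTARGET_STEPS = 5
  CALL MATTER_STANDIN()                 ! step 1: shield material properties
  ! step 2: generate the energy grid (omitted here)
  DO ISTEP = 1, NSHIELD_STEPS           ! step 3: dosimetry + propagation, shield
    CALL DMETRIC_STANDIN()
    CALL PRPGT_STANDIN()
  END DO
  DO ISTEP = 1, NTARGET_STEPS           ! step 4: dosimetry + propagation, target
    CALL DMETRIC_STANDIN()
    CALL PRPGT_STANDIN()
  END DO
CONTAINS
  SUBROUTINE MATTER_STANDIN()
  END SUBROUTINE MATTER_STANDIN
  SUBROUTINE DMETRIC_STANDIN()
  END SUBROUTINE DMETRIC_STANDIN
  SUBROUTINE PRPGT_STANDIN()
  END SUBROUTINE PRPGT_STANDIN
END PROGRAM HZETRN_FLOW_SKETCH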

24
DMETRIC Routine
  • The subroutine DMETRIC is called by the main program at each user-specified depth in the shield and target to compute dosimetric quantities.
  • There are 6 main do-loops in the routine. Approximately 60% of DMETRIC's processing time is spent in loop 2 and 39% of DMETRIC's processing time is spent in loop 5.
  • To check whether the above loop could be done in parallel, the order of the loop was reversed to test for data dependency (a sketch of this test follows below).
  • The results were identical → there is no data dependency between the dosimetric calculations for each isotope.
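A hedged sketch of that reversal test; the loop body is a stand-in, not the DMETRIC dosimetry code - the point is the technique of running the same loop forward and backward and comparing the output:

PROGRAM LOOP_REVERSAL_TEST
  IMPLICIT NONE
  INTEGER, PARAMETER :: NISO = 59       ! one iteration per isotope
  REAL :: A(NISO), FWD(NISO), REV(NISO)
  INTEGER :: I
  CALL RANDOM_NUMBER(A)
  DO I = 1, NISO                        ! original loop order
    FWD(I) = 2.0*A(I) + 1.0
  END DO
  DO I = NISO, 1, -1                    ! reversed loop order
    REV(I) = 2.0*A(I) + 1.0
  END DO
  IF (ALL(FWD == REV)) THEN
    PRINT *, 'Identical results: no loop-carried dependence detected'
  ELSE
    PRINT *, 'Results differ: iterations are order-dependent'
  END IF
END PROGRAM LOOP_REVERSAL_TEST

Identical output is necessary but not sufficient evidence of independence; it is the same quick check the following slides apply to DMETRIC and PRPGT.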

25
DMETRIC Routine - Dependent?
  • To determine if loop 5 is parallelizable, the outer loop was first changed to decrement from II to 1 rather than increment from 1 to II. The results were identical → the outer loop of loop 5 should be parallelizable.
  • Next, the inner loop was changed to decrement from IJ to 2 rather than increment from 2 to IJ. Differences appear in the last significant digit (see next page) → these differences are due to floating-point rounding differences during the four summations.

26
DMETRIC Routine - Not Dependent
  • Minor differences in results when changing the order of the inner loop of loop 5.

27
Parallel DMETRIC Routine
  • Since there is no data dependency in the dosimetric calculations for each of the 59 isotopes, these computations could be done in parallel.
  • Statements (using MPI's wall-time function MPI_WTIME) were inserted to measure the amount of time spent in each subroutine (a timing sketch follows below).
  • Approximately 17% of the processing time is spent in subroutine DMETRIC, about 82% is spent in subroutine PRPGT, and less than 1% is spent in the remainder of the program.
  • Assuming infinite parallelization of DMETRIC, the maximum speedup obtained would be up to 17% (by Amdahl's law, a factor of roughly 1/(1 - 0.17) ≈ 1.2).
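A minimal sketch of that instrumentation, using only the standard MPI_WTIME call; the timed routine here is a stand-in for DMETRIC or PRPGT:

PROGRAM TIME_A_ROUTINE
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  INTEGER :: IERR
  DOUBLE PRECISION :: T0, T1
  CALL MPI_INIT(IERR)
  T0 = MPI_WTIME()                      ! wall-clock time before the call
  CALL WORK_STANDIN()
  T1 = MPI_WTIME()                      ! wall-clock time after the call
  PRINT *, 'Wall time spent in routine (s):', T1 - T0
  CALL MPI_FINALIZE(IERR)
CONTAINS
  SUBROUTINE WORK_STANDIN()             ! placeholder for DMETRIC / PRPGT
    INTEGER :: I
    REAL :: S
    S = 0.0
    DO I = 1, 1000000
      S = S + SIN(REAL(I))
    END DO
    PRINT *, 'checksum:', S
  END SUBROUTINE WORK_STANDIN
END PROGRAM TIME_A_ROUTINE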

28
PRPGT Routine
  • PRPGT propagates the GCRs through the shielding and the target.
  • 82% of HZETRN processing time is spent in PRPGT or the routines it calls.
  • At each propagation step from one depth to the next in the shield or target, the propagation for each of the 59 isotopes is performed in two stages:
  • The first stage computes the energy shift due to propagation.
  • The second stage computes the attenuation and the secondary-particle production due to collisions.

To test whether the propagation for each of the 59 ions could be done in parallel, the loop was broken into four pieces (a J loop from 20 to 30, from 1 to 19, from 41 to 59, and from 31 to 40). If the loop can be performed in parallel, then the results from these four loops should be the same as the single loop from 1 to 59 (a sketch of this test follows below).
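A hedged sketch of that split-loop test; the loop body is a stand-in for the per-ion propagation - if there is no dependence across J, the four reordered pieces reproduce the single 1-to-59 loop exactly:

PROGRAM SPLIT_LOOP_TEST
  IMPLICIT NONE
  INTEGER, PARAMETER :: NION = 59
  REAL :: BASE(NION), SINGLE(NION), PIECES(NION)
  INTEGER :: J
  CALL RANDOM_NUMBER(BASE)
  SINGLE = BASE
  PIECES = BASE
  DO J = 1, NION                        ! reference: one loop over all 59 ions
    SINGLE(J) = 1.1*SINGLE(J) + 0.5
  END DO
  DO J = 20, 30                         ! same body, run as four reordered pieces
    PIECES(J) = 1.1*PIECES(J) + 0.5
  END DO
  DO J = 1, 19
    PIECES(J) = 1.1*PIECES(J) + 0.5
  END DO
  DO J = 41, NION
    PIECES(J) = 1.1*PIECES(J) + 0.5
  END DO
  DO J = 31, 40
    PIECES(J) = 1.1*PIECES(J) + 0.5
  END DO
  PRINT *, 'Pieces match the single loop: ', ALL(PIECES == SINGLE)
END PROGRAM SPLIT_LOOP_TEST

For HZETRN's actual propagation loop the pieces did not match (next slide), which is what rules out parallelizing across the 59 ions.
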
29
PRPGT Routine - Check Dependency
  • The following compares the results of breaking the main loop into four loops (on the left) with the original results.
  • The significantly different results demonstrate that the propagation cannot be parallelized across the 59 ions.
30
PRPGT Routine - Data Dependent
  • Reversing the inner 1st- and 2nd-stage I loops gave results identical to the original → it may be possible to parallelize the 1st or 2nd stage.
  • However, to test data dependence from the 1st stage to the 2nd stage, the main J loop was divided into two loops (one for the 1st stage and one for the 2nd stage).
  • The results changed → the 2nd stage is dependent on the 1st stage.
  • A barrier would be needed to prevent execution of the 2nd stage until the 1st stage completes.
  • 24% of HZETRN processing time is spent on the 1st stage while less than 2% is spent on the 2nd stage. Therefore, parallel processing of both stages does not appear worthwhile.

31
Parallel PRPLI Routine
  • PRPLI is called by PRPGT after the 1st- and 2nd-stage propagation has been completed for each of the 59 isotopes.
  • PRPLI performs the propagation of the six light ions (ions with Z < 5).
  • 53% of total HZETRN time is spent on light-ion propagation.
  • PRPLI propagates a 45 x 6 fluence (number of particles intersecting a unit area) matrix named PSI (45 energy points for each of the 6 light ions).
  • Analysis of the routine has shown that there is no data dependency among the energy grid points.
  • It should, therefore, be possible to parallelize the PRPLI code across the 45 energy grid points (a decomposition sketch follows below).
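A hedged sketch of how that decomposition might look with MPI; the PSI update is a placeholder rather than the PRPLI source, and the block partition plus the SUM-reduction used to recombine the rows are illustrative choices:

PROGRAM PRPLI_DECOMP_SKETCH
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  INTEGER, PARAMETER :: NE = 45, NLIGHT = 6
  REAL :: PSI(NE, NLIGHT)
  INTEGER :: IERR, RANK, NPROCS, ILO, IHI, I
  CALL MPI_INIT(IERR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROCS, IERR)
  PSI = 0.0                             ! each rank fills only its own rows
  ILO = RANK*NE/NPROCS + 1              ! this rank's block of energy points
  IHI = (RANK + 1)*NE/NPROCS
  DO I = ILO, IHI
    PSI(I, :) = REAL(I)                 ! placeholder for the per-energy-point update
  END DO
  ! combine the disjoint blocks so every rank ends up with the full PSI
  CALL MPI_ALLREDUCE(MPI_IN_PLACE, PSI, NE*NLIGHT, MPI_REAL, MPI_SUM, &
                     MPI_COMM_WORLD, IERR)
  IF (RANK == 0) PRINT *, 'PSI(1,1), PSI(NE,NLIGHT):', PSI(1,1), PSI(NE,NLIGHT)
  CALL MPI_FINALIZE(IERR)
END PROGRAM PRPLI_DECOMP_SKETCH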

32
General HZETRN Recommendations
  • Arrays in Fortran are stored in column order → it is more efficient to access them in column order rather than row order.
  • HZETRN uses the old Fortran technique of alternate entry points → the use of alternate entry points is discouraged.
  • HZETRN uses COMMON blocks for global memory → Fortran-90 MODULEs should be used instead (an illustration follows below).
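A small illustration of two of those recommendations; the module name GRID_MOD and the array FLUX are made up for this sketch, not taken from HZETRN. It shows a Fortran-90 module in place of a COMMON block, and a loop nest that runs the first (column) index innermost so memory is touched contiguously:

MODULE GRID_MOD                          ! would have been a COMMON block in F77
  IMPLICIT NONE
  INTEGER, PARAMETER :: NE = 45, NISO = 59
  REAL :: FLUX(NE, NISO)
END MODULE GRID_MOD

PROGRAM COLUMN_ORDER_DEMO
  USE GRID_MOD
  IMPLICIT NONE
  INTEGER :: I, J
  DO J = 1, NISO                         ! second index in the outer loop
    DO I = 1, NE                         ! first index innermost: column order
      FLUX(I, J) = 0.0
    END DO
  END DO
  PRINT *, 'Initialized', NE*NISO, 'elements in column order'
END PROGRAM COLUMN_ORDER_DEMO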

33
Conclusions / Future Work
  • The performance of HZETRN, written in Fortran 77 in the early 1990s, can be improved via simple code optimizations and parallel processing using MPI.
  • A maximum speedup of about 50% is expected with the current HZETRN.
  • Additional performance improvements could be obtained by implementing the 3rd-order Lagrangian interpolation routine (PHI), or the natural log (LOG) and exponential (TEXP) functions, on an FPGA.

34
References
  • J.W. Wilson, F.F. Badavi, F.A. Cucinotta, J.L. Shinn, G.D. Badhwar, R. Silberberg, C.H. Tsao, L.W. Townsend, R.K. Tripathi, "HZETRN: Description of a Free-Space Ion and Nucleon Transport and Shielding Computer Program," NASA Technical Paper 3495, May 1995.
  • J.W. Wilson, J.L. Shinn, R.C. Singleterry, H. Tai, S.A. Thibeault, L.C. Simmons, "Improved Spacecraft Materials for Radiation Shielding," NASA Langley Research Center. spacescience.spaceref.com/colloquia/mmsm/wilson_pos.pdf
  • "NASA Facts: Understanding Space Radiation," FS-2002-10-080-JSC, October 2002.
  • P.S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann Publishers Inc., San Francisco, 1997.
  • S.J. Chapman, Fortran 90/95 for Scientists and Engineers, 2nd edition, McGraw-Hill, New York, 2004.
  • L. Shih, S. Larrondo, K. Katikaneni, A. Khan, T. Gilbert, S. Kodali, A. Kadari, "High-Performance Martian Space Radiation Mapping," NASA/UHCL/UH-ISSO, pp. 121-122.
  • L. Shih, "Efficient Space Radiation Computation with Parallel FPGA," Y2006 ISSO Annual Report, pp. 56-61.
  • T. Gilbert and L. Shih, "High-Performance Martian Space Radiation Mapping," IEEE/ACM/UHCL Computer Application Conference, University of Houston-Clear Lake, Houston, TX, April 29, 2005.
  • A. Kadari, S. Kodali, T. Gilbert, and L. Shih, "Space Radiation Analysis with FPGA," IEEE/ACM/UHCL Computer Application Conference, University of Houston-Clear Lake, Houston, TX, April 29, 2005.
  • F.A. Cucinotta, "Space Radiation Biology," NASA-M.D. Anderson Cancer Center Mini-Retreat, Jan. 25, 2002, <http://advtech.jsc.nasa.gov/presentation_portal.shtm>.
  • Space Radiation Health Project, May 3, 2005, NASA-JSC, March 7, 2005, <http://srhp.jsc.nasa.gov/>.

35
Acknowledgements
  • NASA LaRC - Robert C. Singleterry Jr., PhD
  • NASA JSC/CARR, PVAM - Premkumar B. Saganti, PhD
  • TeraGrid, TACC
  • TLC2 - Mark Huang, Erik Engquist
  • Texas Space Grant Consortium, ISSO

Thank You! Shih@UHCL.edu