Title: Toward Parallel Space Radiation Analysis
1. Toward Parallel Space Radiation Analysis
Dr. Liwen Shih, Thomas K. Gederberg, Karthik Katikaneni, Ahmed Khan, Sergio J. Larrondo, Susan Strausser, Travis Gilbert, Victor Shum, Romeo Chua
University of Houston-Clear Lake
2. This project continues the Space Radiation Research work performed last year by Dr. Liwen Shih's students to investigate HZETRN code optimization options. This semester we will analyze the HZETRN code using standard static analysis tools and runtime analysis tools. In addition, we will examine code parallelization options for the most frequently called numerical method in the source code, the PHI function.
3. What is Space Radiation?
- Two major sources:
  - galactic cosmic rays (GCR)
  - solar energetic particles (SEP)
- GCR are ever-present and more energetic, so they are able to penetrate much thicker materials than SEP.
- To evaluate space radiation risk and design spacecraft and habitats for better radiation protection, space radiation transport codes, which depend on the input physics of nuclear interactions, have been developed.
4. Space Radiation and the Earth
This image shows how the Earth's magnetic field causes electrons to drift one way around the Earth, while protons drift in the opposite direction. (Original clips provided courtesy of Professor Patricia Reiff, Rice University, Connections Program.)
Animation: Earth protected from space radiation. Source: Rice University, Connections Program.
5. What about Galactic Cosmic Radiation (GCR)?
A typical high-energy particle of radiation found in the space environment is itself ionized; as it passes through material such as human tissue, it disrupts the electron clouds of the constituent molecules and leaves a path of ionization in its wake. These particles are either singly charged protons or more highly charged nuclei called "HZE" particles.
6. HZETRN - Space Radiation Nuclear Transport Code
The three included source code files are:
1. NUCFRAG.FOR - generates nuclear absorption and reaction cross sections.
2. GEOMAG.FOR - defines the GCR transmission coefficient cutoff effects within the magnetosphere.
3. HZETRN.FOR - propagates the user-defined GCR environments through two layers of user-supplied materials. The current version is set up to propagate through aluminum, tissue (H2O), CH2, and LH2.
HZETRN: High Charge and Energy Nuclear Transport Code
- Language: FORTRAN-77, written 1992
- Environment: VAX mainframe
- Code metrics: 3 files; 9,665 lines (6,803 code lines, 2,859 comment lines); 780 declarative statements; 6,563 executable statements; comment/code ratio 0.42
7. HZETRN Numerical Method
8. HZETRN Calculates
- Radiation fluence of HZE particles: the time-integrated flux of HZE particles per unit area.
- Energy absorbed per gram: found by first measuring the amount of energy left behind by the radiation in question, then the amount and type of material.
- Dose equivalent: a unit of dose equivalent expresses the amount of any type of radiation absorbed in biological tissue as a standardized value.
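The dose-equivalent bullet can be made concrete with a small sketch. This is illustrative only, not HZETRN code; the function name and the example quality factor are assumptions for demonstration:

```python
# Illustrative sketch, not HZETRN code: dose equivalent H (sieverts) is
# the absorbed dose D (grays) weighted by a dimensionless quality factor Q
# that standardizes the biological effect of the radiation type.
def dose_equivalent(absorbed_dose_gy, quality_factor):
    """Return the dose equivalent in sieverts: H = Q * D."""
    return quality_factor * absorbed_dose_gy

# Example: 0.01 Gy absorbed from heavy ions with an assumed Q of 20
h = dose_equivalent(0.01, 20.0)
print(round(h, 3))  # 0.2 (Sv)
```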
9. HZETRN Algorithm
10. HZETRN Used for Mars Mission
NASA has a new vision for space exploration in the 21st century, encompassing a broad range of human and robotic missions, including missions to the Moon, Mars, and beyond. As a result, there is a focus on long-duration space missions. NASA, as much as ever, is committed to the safety of the missions and the crew. Exposure to the hazards of severe space radiation on long-duration deep-space missions is the showstopper.
Thus, protection from the hazards of severe space radiation is of paramount importance for the new vision. There is an overwhelming emphasis on reliability issues for the mission and the habitat. Accurate risk assessments critically depend on the accuracy of the input information about the interaction of ions with materials, electronics, and tissues.
11. Martian Radiation Climate Modeling Using the HZETRN Code
- Calculations of the skin dose equivalent for astronauts on the surface of Mars near solar minimum.
- The variation in the dose with respect to altitude is shown.
- Higher altitudes (such as Olympus Mons) offer less shielding.
Mars Radiation Environment (Source: Wilson et al., http://marie.jsc.nasa.gov)
12. HZETRN Model vs. Actual Mars Radiation Climate
HZETRN underestimates!
Dose rate measured by MARIE during the transit period from April 2001 to August 2001, compared with HZETRN-calculated doses. The spike in May is due to an SPE. Differences between the observed (red) and predicted (black) doses vary by a factor of 1 to 3.
Partly because of code inefficiency, the dosage data are underestimated.
Graph source: Alenia Spazio / European Space Agency Report, 2004.
13. Project Goal: Speed Up Runtime via Analysis and Modification of the HZETRN Code's Numerical Algorithm (PHI Interpolation Function)
The major space radiation code bottleneck lies inside the call to the PHI interpolation function.
14. Code Optimization Options
4028 C
4029 C
4030       FUNCTION PHI(R0,N,R,P,X)
4031 C
4032 C     FUNCTION PHI INTERPOLATES IN P(N) ARRAY DEFINED OVER R(N) ARRAY
4033 C     ASSUMES P IS LIKE A POWER OF R OVER SUBINTERVALS
4034 C
4035       DIMENSION R(N),P(N)
4036 C
4037       SAVE
4038 C
4039       XT=X
4040       PHI=P(1)
4041       INC=((R(2)-R(1))/ABS(R(2)-R(1)))*1.01
4042       IF(X.LE.R(1).AND.R(1).LT.R(2))RETURN
4043 C
4044       DO 1 I=3,N-1
4045       IL=I
4046       IF(XT*INC.LT.R(I)*INC)GO TO 2
- Fix inefficient code
- Fix/remove unnecessary function calls (TEXP), SAVE, and dummy arguments
- Use the optimized ALOG function
- Use a lookup table instead
- Investigate parallelization of interpolation statements
15. Code Optimization
- Improve code structure
- Use the faster ALOG function (LOG)
- Remove extraneous function calls
16. Steps toward a Faster HZETRN
1. Review algorithm. Purpose: understand the underlying numerical algorithm. Result: the HZETRN algorithm is complex and needs further review, but the overall functions of the code are understood.
2. Analyze source code and data files. Purpose: understand code structure and function. Result: review of the code and data files reveals that much of the code is inefficient, with redundant elements and archaic structure; the data files contain sparse matrices amenable to performance improvement.
3. Portability study. Purpose: attempt to port the HZETRN code to various HPC platforms and compilers. Result: the portability study revealed problems with the code and additional requirements for optimization.
4. Static analysis. Purpose: develop an understanding of program structure; document the code for optimization and reporting. Result: we generated a detailed HTML report documenting HZETRN source code functions and the structure of subroutine calls.
5. Runtime analysis. Purpose: target runtime bottlenecks and determine the most frequently called functions/subroutines. Result: revealed that the PHI interpolation function is the major bottleneck; the natural logarithm intrinsic function is also a performance issue.
6. Serial optimization of code. Purpose: starting with the PHI function, remove extraneous function calls and clean up messy code. Result: runtime performance improvement (initially a 10% overall gain).
17. Parallel Space Radiation Analysis
- The goal of the project was to speed up the execution of the HZETRN code using parallel processing.
- The Message Passing Interface (MPI) standard library was to be used to perform the parallel processing across a cluster with distributed memory.
18. Computing Resources Used
- Itanium 2 cluster (Atlantis) - Texas Learning & Computation Center (TLC2) at the University of Houston.
- Atlantis is a cluster of 152 dual-Itanium2 (1.3 GHz) compute nodes networked via a Myrinet 2000 interconnect. Atlantis runs Red Hat Linux version 5.1.
- The Intel Fortran compiler (version 10.0) and OpenMPI (an open-source MPI-2 implementation) are being used.
- In addition, a home PC running Linux (Ubuntu 7.10) with the Sun Studio 12 Fortran 90 compiler and MPICH2 was used.
- Use of TeraGrid has just begun.
19. PHI Routine (Lagrangian Interpolation)
- Figure: HZETRN runtime profile.
- Most time is spent in function PHI - 3rd-order Lagrangian interpolation.
- The PHI function is heavily called by the propagation and integration routines - typically 229,380 times at each depth.
- Early focus: optimizing the PHI routine.
- The PHI routine takes the natural log of the input ordinate and abscissas prior to performing the Lagrangian interpolation, and returns the exponential of the interpolated ordinate. (Source: Shih, Larrondo, et al., "High-Performance Martian Space Radiation Mapping," NASA/UHCL/UH-ISSO, pp. 121-122.)
- Removing the calls to the natural log and exponential functions resulted in a 21% (Atlantis) to 45% (home) speedup, but had a negative impact on the numerical results (see next page), since the functions being interpolated are logarithmic.
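The log-log interpolation scheme described above can be sketched as follows. This is a simplified Python stand-in, not the FORTRAN PHI itself; the function name, stencil selection, and example data are invented for illustration. Because the interpolation is done in log space, power-law data are reproduced exactly, which is why removing the LOG/TEXP calls degraded the results:

```python
import bisect
import math

def phi_loglog(r, p, x):
    """Sketch (not the FORTRAN PHI): 3rd-order Lagrangian interpolation
    of p(r) at x, performed in log-log space so that power-law data
    p ~ r**a is interpolated exactly."""
    lr = [math.log(v) for v in r]   # LOG step: abscissas to log space
    lp = [math.log(v) for v in p]   # LOG step: ordinates to log space
    lx = math.log(x)
    # pick a 4-point stencil bracketing x, clamped to the table ends
    i = bisect.bisect_left(r, x)
    i = max(1, min(i, len(r) - 3))
    idx = range(i - 1, i + 3)
    # classic Lagrange basis sum over the 4 stencil points
    ly = 0.0
    for j in idx:
        w = 1.0
        for k in idx:
            if k != j:
                w *= (lx - lr[k]) / (lr[j] - lr[k])
        ly += w * lp[j]
    return math.exp(ly)  # TEXP counterpart: back to linear space

# power-law data p = r**2 is reproduced exactly by the log-log scheme
r = [1.0, 2.0, 4.0, 8.0, 16.0]
p = [v ** 2 for v in r]
print(phi_loglog(r, p, 3.0))  # ~9.0
```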
20. PHI Routine - Needs LOG/TEXP
Significantly different results when comparing runs with and without the calls to LOG/TEXP.
21. PHI Routine Optimization
- Because the bottleneck PHI routine is called so heavily, the message-passing overhead of parallelizing it would be prohibitive.
- Simple code optimizations of the PHI routine resulted in:
  - an 11.4% speedup on the home PC running Linux, compiled with the Sun Studio 12 Fortran compiler;
  - a 3.85% speedup on an Atlantis node using the Intel Fortran compiler.
- The reduced speedup on Atlantis may be because the Intel compiler was already generating more optimized code.
22. PHI Routine FPGA Prototype
- Implementing the bottleneck routines (the PHI routine and/or the logarithm/exponential routines) in an FPGA could result in a significant speedup.
- A reduced-precision floating-point FPGA prototype was developed, for an estimated 325-times-faster PHI computation in hardware.
23. HZETRN Main Program Flow
Basic flow of HZETRN:
- Step 1: Call MATTER to obtain the material properties (density, atomic weight, and atomic number of each element) of the shield.
- Step 2: Generate the energy grid.
- Step 3: Dosimetry and propagation in the shield material:
  - Call DMETRIC to compute dosimetric quantities at the current depth.
  - Call PRPGT to propagate the GCRs to the next depth.
  - Repeat Step 3 until the target material is reached.
- Step 4: Dosimetry and propagation in the target material:
  - Call DMETRIC to compute dosimetric quantities at the current depth.
  - Call PRPGT to propagate the GCRs to the next depth.
  - Repeat Step 4 until the required depth is reached.
24. DMETRIC Routine
- The subroutine DMETRIC is called by the main program at each user-specified depth in the shield and target to compute dosimetric quantities.
- There are 6 main do-loops in the routine. Approximately 60% of DMETRIC's processing time is spent in loop 2, and 39% is spent in loop 5.
- To check whether these loops could be done in parallel, the order of each loop was reversed to test for data dependency.
- The results were identical: there was no data dependency between the dosimetric calculations for each isotope.
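The loop-reversal test described above can be sketched generically. The loop body here is a hypothetical stand-in, not the actual DMETRIC computation; the idea is simply that if forward and reversed traversal give identical results, iterations do not feed one another and are candidates for parallel execution:

```python
# Sketch of the dependency test: run a loop forward and reversed; if the
# results match, the iterations are independent (hypothetical per-isotope
# work below, not the real dosimetric calculation).
def forward(dose):
    out = [0.0] * len(dose)
    for i in range(len(dose)):               # 1 .. II in Fortran terms
        out[i] = dose[i] * 2.0               # independent per-isotope work
    return out

def reversed_order(dose):
    out = [0.0] * len(dose)
    for i in range(len(dose) - 1, -1, -1):   # II .. 1
        out[i] = dose[i] * 2.0
    return out

dose = [0.1, 0.2, 0.3, 0.4]
print(forward(dose) == reversed_order(dose))  # True: no data dependency
```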
25. DMETRIC Routine - Dependent?
- To determine whether loop 5 is parallelizable, the outer loop was first changed to decrement from II to 1 rather than increment from 1 to II. The results were identical, so the outer loop of loop 5 should be parallelizable.
- Next, the inner loop was changed to decrement from IJ to 2 rather than increment from 2 to IJ. Differences appear in the last significant digit (see next page). These differences are due to floating-point rounding differences during the four summations.
26. DMETRIC Routine - Not Dependent
- Minor difference in results when changing the order of the inner loop of loop 5.
27. Parallel DMETRIC Routine
- Since there is no data dependency in the dosimetric calculations for each of the 59 isotopes, these computations could be done in parallel.
- Statements (using MPI's wall-time function MPI_WTIME) were inserted to measure the amount of time spent in each subroutine.
- Approximately 17% of the processing time is spent in subroutine DMETRIC, about 82% is spent in subroutine PRPGT, and less than 1% is spent in the remainder of the program.
- Assuming infinite parallelization of DMETRIC, the maximum speedup obtained would be up to 17%.
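The 17% bound quoted above is an instance of Amdahl's law: even if DMETRIC's share of the runtime is parallelized perfectly, the remaining serial portion limits the overall gain. A minimal sketch (the function name is ours, not the project's):

```python
# Amdahl's-law sketch: with a fraction f of the runtime parallelized
# across `workers` processors, the overall speedup is bounded by
# 1 / ((1 - f) + f / workers).
def amdahl_speedup(f, workers=float("inf")):
    return 1.0 / ((1.0 - f) + f / workers)

s = amdahl_speedup(0.17)           # DMETRIC's ~17% share, infinite workers
print(round(s, 2))                 # 1.2x overall at best
print(round(1.0 - 1.0 / s, 2))     # 0.17: runtime cut by at most ~17%
```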
28. PRPGT Routine
- PRPGT propagates GCRs through the shielding and the target.
- 82% of HZETRN processing time is spent in PRPGT or the routines it calls.
- At each propagation step from one depth to the next in the shield or target, the propagation for each of the 59 isotopes is performed in two stages:
  - The first stage computes the energy shift due to propagation.
  - The second stage computes the attenuation and the secondary-particle production due to collisions.
To test whether the propagation for each of the 59 ions could be done in parallel, the loop was broken up into four pieces (a J loop from 20 to 30, from 1 to 19, from 41 to 59, and from 31 to 40). If the loop can be performed in parallel, then the results from these four loops should be the same as those from the single loop from 1 to 59.
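The split-loop test above can be illustrated with a hypothetical loop-carried dependency (this is a toy body, not the actual PRPGT physics): when iteration j reads the result of iteration j-1, computing sub-ranges out of order changes the answer, which is exactly the symptom the test looks for:

```python
# Toy dependent loop: iteration j reads state[j-1], so executing the
# index range in out-of-order pieces produces different results.
def run(order):
    state = [1.0] * 6
    for j in order:
        if j > 0:
            state[j] = state[j] + 0.5 * state[j - 1]  # depends on j-1
    return state

whole = run(range(6))                             # single loop, in order
split = run(list(range(3, 6)) + list(range(3)))   # pieces out of order
print(whole == split)  # False: the loop carries a dependency
```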
29. PRPGT Routine - Check Dependency
- The following compares the results of breaking the main loop into four loops (on the left) with the original results.
- Significantly different results demonstrate that the propagation cannot be parallelized across the 59 ions.
30. PRPGT Routine - Data Dependent
- Reversing the inner 1st- and 2nd-stage I loops gave results identical to the original, so it should be possible to parallelize the 1st or 2nd stage.
- However, to test data dependence from the 1st stage to the 2nd stage, the main J loop was divided into two loops (one for the 1st stage and one for the 2nd stage).
- The results changed: the 2nd stage is dependent on the 1st stage.
- A barrier is needed to prevent execution of the 2nd stage until the 1st stage completes.
- 24% of the HZETRN processing time is spent on the 1st stage, while less than 2% is spent on the 2nd stage. Therefore, parallel processing of both stages does not appear worthwhile.
31. Parallel PRPLI Routine
- PRPLI is called by PRPGT after the 1st- and 2nd-stage propagation has been completed for each of the 59 isotopes.
- PRPLI performs the propagation of the six light ions (ions with Z < 5).
- 53% of total HZETRN time is spent on light-ion propagation.
- PRPLI propagates a 45 x 6 fluence matrix (the number of particles intersecting a unit area; 45 energy points for each of the 6 light ions) named PSI.
- Analysis has shown that there is no data dependency among the energy grid points.
- It should, therefore, be possible to parallelize the PRPLI code across the 45 energy grid points.
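One natural MPI-style decomposition of those 45 independent energy grid points is a contiguous block per rank. The helper below is a hypothetical sketch of the index arithmetic only (not the project's actual MPI code, which would scatter PSI columns with MPI calls):

```python
# Sketch of an MPI-style block decomposition of PRPLI's 45 energy grid
# points: each rank owns a contiguous range, with the remainder spread
# over the lowest ranks (hypothetical helper, not the project's code).
def block_range(n_points, rank, n_ranks):
    """Return the [lo, hi) index range of grid points owned by `rank`."""
    base, extra = divmod(n_points, n_ranks)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

# 45 energy points split across 4 ranks: 12, 11, 11, 11 points each
ranges = [block_range(45, r, 4) for r in range(4)]
print(ranges)  # [(0, 12), (12, 23), (23, 34), (34, 45)]
```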
32. General HZETRN Recommendations
- Arrays in Fortran are stored in column order, so it is more efficient to access them in column order rather than row order.
- HZETRN uses the old Fortran technique of alternate entry points; the use of alternate entry points is discouraged.
- HZETRN uses COMMON blocks for global memory; Fortran-90 MODULEs should be used instead.
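The column-order recommendation comes down to memory layout, sketched here in Python index arithmetic (the array shape is an arbitrary example): a Fortran array A(NROW, NCOL) is stored column-major, so element (i, j) (0-based here) lives at flat index i + j*NROW. With the row index i in the inner loop the traversal is unit-stride and cache-friendly; with j innermost each access jumps NROW elements:

```python
# Column-major layout: element (i, j) of A(NROW, NCOL) sits at flat
# index i + j*NROW, so varying i fastest walks consecutive memory.
NROW, NCOL = 4, 3

column_order = [i + j * NROW for j in range(NCOL) for i in range(NROW)]
row_order = [i + j * NROW for i in range(NROW) for j in range(NCOL)]

print(column_order == list(range(NROW * NCOL)))  # True: unit stride
print(row_order[:4])  # [0, 4, 8, 1]: jumps of NROW between accesses
```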
33. Conclusions / Future Work
- The performance of HZETRN, written in Fortran 77 in the early 1990s, can be improved via simple code optimizations and parallel processing using MPI.
- A maximum 50% speedup is expected with the current HZETRN.
- Additional performance improvements could be obtained by implementing the 3rd-order Lagrangian interpolation routine (PHI), or the natural log (LOG) and exponential (TEXP) functions, on an FPGA.
34. References
- J.W. Wilson, F.F. Badavi, F.A. Cucinotta, J.L. Shinn, G.D. Badhwar, R. Silberberg, C.H. Tsao, L.W. Townsend, R.K. Tripathi, "HZETRN: Description of a Free-Space Ion and Nucleon Transport Shielding Computer Program," NASA Technical Paper 3495, May 1995.
- J.W. Wilson, J.L. Shinn, R.C. Singleterry, H. Tai, S.A. Thibeault, L.C. Simmons, "Improved Spacecraft Materials for Radiation Shielding," NASA Langley Research Center. spacescience.spaceref.com/colloquia/mmsm/wilson_pos.pdf
- "NASA Facts: Understanding Space Radiation," FS-2002-10-080-JSC, October 2002.
- P.S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann Publishers Inc., San Francisco, 1997.
- S.J. Chapman, Fortran 90/95 for Scientists and Engineers, 2nd ed., McGraw-Hill, New York, 2004.
- L. Shih, S. Larrondo, K. Katikaneni, A. Khan, T. Gilbert, S. Kodali, A. Kadari, "High-Performance Martian Space Radiation Mapping," NASA/UHCL/UH-ISSO, pp. 121-122.
- L. Shih, "Efficient Space Radiation Computation with Parallel FPGA," Y2006 ISSO Annual Report, pp. 56-61.
- T. Gilbert and L. Shih, "High-Performance Martian Space Radiation Mapping," IEEE/ACM/UHCL Computer Application Conference, University of Houston-Clear Lake, Houston, TX, April 29, 2005.
- A. Kadari, S. Kodali, T. Gilbert, and L. Shih, "Space Radiation Analysis with FPGA," IEEE/ACM/UHCL Computer Application Conference, University of Houston-Clear Lake, Houston, TX, April 29, 2005.
- F.A. Cucinotta, "Space Radiation Biology," NASA-M.D. Anderson Cancer Center Mini-Retreat, Jan. 25, 2002. <http://advtech.jsc.nasa.gov/presentation_portal.shtm>
- "Space Radiation Health Project," NASA-JSC, March 7, 2005. <http://srhp.jsc.nasa.gov/>
35. Acknowledgements
- NASA LaRC - Robert C. Singleterry Jr., PhD
- NASA JSC/CARR, PVAM - Premkumar B. Saganti, PhD
- TeraGrid, TACC
- TLC2 - Mark Huang, Erik Engquist
- Texas Space Grant Consortium, ISSO
Thank You! Shih@UHCL.edu