Title: Nuclear Physics Greenbook Presentation (Astro, Theory, Expt)
Slide 1: Nuclear Physics Greenbook Presentation (Astro, Theory, Expt)
- Doug Olson, LBNL
- NUG Business Meeting
- 25 June 2004
- Berkeley
Slide 2: Reminding you what Nuclear Physics is
- The mission of the Nuclear Physics (NP) program is to advance our knowledge of the properties and interactions of atomic nuclei and nuclear matter and the fundamental forces and particles of nature.
- The program seeks to understand how quarks bind together to form nucleons and nuclei, to create and study the quark-gluon plasma that is thought to have been the primordial state of the early universe, and to understand energy production and element synthesis in stars and stellar explosions.
Slide 4: Contents
- Questions
- Answers
- Some Observations
Slide 5: Questions about current needs and 3-4 years out
- 1. What are your most important processing needs? (e.g. single CPU speed, number of parallel CPUs, memory, ...)
- 2. What are your most important storage needs? (e.g. I/O bandwidth to disk, disk space, HPSS bandwidth, space, ...)
- 3. What are your most important network needs? (e.g. wide-area bandwidth, bandwidth between NERSC resources, ...)
- 4. What are your most important remote access needs? (e.g. remote login, remote visualization, data transfer, single sign-on to automate work across multiple sites, ...)
- 5. What are your most important user services needs? (e.g. general helpdesk questions, tutorials, debugging help, ...)
- 6. Do you have special software requirements, if so what?
- 7. Do you have special visualization requirements, if so what?
- 8. Is automating your work across multiple sites important? Called distributed workflow. Sites could be other large centers, a cluster, your desktop, etc.
- 9. Anything else important to your project?
- Asked of all PIs with NP awards
- 9 responders, a good cross section
Slide 6: Responses
- Astronomy
- Swesty, SUNY SB, TSI Collaboration
- Nugent, LBNL
- Nuclear Theory
- Ji, NCSU, QCD
- Pieper, ANL, Light nuclei
- Dean, ORNL, Nuclear many body problem
- Vary, Iowa State, Nuclear reactions
- Lee, Kentucky, Lattice QCD
- Experiment
- Klein, LBNL, IceCube
- Olson, LBNL, STAR
Slide 7: 1. What are your most important processing needs? (e.g. single CPU speed, number of parallel CPUs, memory, ...)
- Theory
- Small cluster of nodes (1-16 nodes) and long run duration (12-24h or more)
- Support for executing single-processor tasks would be welcomed.
- Faster processors are always a good thing!
- Up to 2048 parallel CPUs (for SMMC)
- Need faster interprocessor bandwidth (for two other codes)
- E.g., a 256-CPU Altix is 7X faster than Seaborg
- Total number of cycles obtained over a range of processors (50 to 500); generally 0.5-2 GB/processor are needed on processors with 1-4x Seaborg speed.
- Memory per CPU is by far the most important to our project (saves I/O to disk and/or cuts down on inter-node communication)
Slide 8: 1. What are your most important processing needs? (continued)
- Astro
- Getting a little more CPU speed is OK, but BY FAR we need faster bandwidth between processors and between nodes.
- 1024-2048 processors now
- Lower latency communications
- Expt
- Not parallel algorithms; compute at PDSF and other Linux clusters
Slide 9: 2. What are your most important storage needs? (e.g. I/O bandwidth to disk, disk space, HPSS bandwidth, space, ...)
- Theory
- Bandwidth to disk and GPFS disk space (increase disk space by 100X).
- Inode limit is a persistent pain, but can be lived with.
- HPSS bandwidth and space
- Single file system across machines (Seaborg, Newton)
- Astro
- Fine now, may change as we move more to 3-D.
- Improved parallel I/O throughput to disk for >1024-processor jobs
- Increased scratch disk capacity
- Expt
- Database; MySQL in use now
- Disk: scalable size and I/O performance, >100 TB, >1 GB/sec
- HPSS size and I/O bandwidth
- Automated caching, replication, and I/O load balancing
Slide 10: 3. What are your most important network needs? (e.g. wide-area bandwidth, bandwidth between NERSC resources, ...)
- Theory
- Moving 20 GB datasets today, ORNL-NERSC-MSU
- Moving 0.5 TB datasets in 2 years, ORNL-NERSC-MSU-LLNL-PNNL
- Bandwidth between NERSC resources
- Astro
- Improved throughput between NERSC, ORNL, and Stony Brook
- Expt
- WAN bandwidth end-to-end (meaning the endpoints or other LAN effects are often the problem), labs and universities
Slide 11: 4. What are your most important remote access needs? (e.g. remote login, remote visualization, data transfer, single sign-on to automate work across multiple sites, ...)
- Theory
- X-windowed system
- Data transfer is becoming an increasingly important need.
- ssh/scp with authorized keys is fine; one-time passwords would severely handicap my use of a local emacs and TRAMP to edit, view, and transfer files. (A brief TRAMP illustration follows at the end of this slide.)
- Astro
- Single sign-on to allow process automation is very important right now. Of CRITICAL importance is avoidance of one-time authentication methods, which would kill any hopes of scientific workflow automation.
- Some remote viz.
- Expt
- Data transfer
- Single sign-on across sites for automated workflow
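To make the TRAMP point above concrete: with ssh authorized keys in place, Emacs can open and save remote files transparently through a path such as the one below (the username, host, and file path here are hypothetical).

    C-x C-f /ssh:user@seaborg.nersc.gov:/project/code/solver.f90

Each open and save then goes over ssh/scp non-interactively, which is why an authentication scheme demanding a fresh one-time password on every connection would break this way of working.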
Slide 12: 5. What are your most important user services needs? (e.g. general helpdesk questions, tutorials, debugging help, ...)
- Theory
- Support/online help for Windows-based X servers
- Online references for programming languages, or links to such references, would be great.
- General helpdesk and sometimes tutorials.
- Online tutorials (stored and indexed)
- Astro
- Biggest problems are dealing with new compiler bugs.
- Performance optimization requires help from people who have access to the IBM compiler group to get code kernels tuned.
- Expt
- General user support and collaboration software installation are very good.
- Need troubleshooting across sites and the WAN
Slide 13: 6. Do you have special software requirements, if so what?
- Theory
- Part of my plans involve solving a large sparse eigenvalue problem. Software like Aztec is going to be useful for this. (A minimal sketch of this kind of computation follows after this slide.)
- Astro
- We continue to rely on the availability of HDF5 v1.4.5 for our I/O needs on Seaborg. HDF5 1.6.x will not suffice, as we have uncovered show-stopping bugs in that release.
- Expt
- Community collaboration software (CERN, ROOT, ...)
- Current install/maintenance procedures work well
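To make the Theory item above concrete, the sketch below shows the shape of such a computation: an iterative Lanczos-type solver that touches the matrix only through sparse matrix-vector products, which is what parallel libraries like Aztec accelerate. This is a minimal single-node illustration in Python/SciPy, not Aztec's actual API, and the random symmetric matrix is a placeholder rather than a physics Hamiltonian.

    import scipy.sparse as sp
    from scipy.sparse.linalg import eigsh

    # Placeholder sparse symmetric matrix; a real nuclear-structure
    # Hamiltonian would be far larger and assembled in parallel.
    n = 200_000
    A = sp.random(n, n, density=5e-6, format="csr", random_state=0)
    H = (A + A.T) / 2  # symmetrize

    # Lanczos-type iteration for the few lowest eigenpairs (ground and
    # low-lying states); only H @ v products are needed, so memory and
    # matrix-vector bandwidth dominate the cost.
    vals, vecs = eigsh(H, k=5, which="SA")
    print("lowest eigenvalues:", vals)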
Slide 14: 7. Do you have special visualization requirements, if so what?
- Theory
- We would welcome the introduction of visual debugging tools for Fortran/C, especially for MPI or HPF programs, if possible of course.
- Astro
- Some, but most have been covered by the viz group.
- We continue to rely heavily on the NERSC viz group to help us address our viz needs.
Slide 15: 8. Is automating your work across multiple sites important? Called distributed workflow. Sites could be other large centers, a cluster, your desktop, etc.
- Theory
- Yes. We are considering how to develop common component software for nuclear physics problems. The low-energy nuclear theory community will increasingly move toward integrated code environments. This includes data movement and workflow across several sites. (We do this now with NERSC/ORNL/MSU.)
- I do a fair amount of post-processing with Speakeasy on my workstation. This involves mixing results from NERSC, Argonne's parallel machines, and Los Alamos' qmc at present.
Slide 16: 8. Is automating your work across multiple sites important? (continued)
- Astro
- Yes! We are currently working with the SPA (Scientific Process Automation) team from the SciDAC Scientific Data Management ISIC on automating our workflow between NERSC and our home computing site at Stony Brook.
- Expt
- Yes. Experiment collaboration computing is spread across large and small sites and desktops. Need more integration with security tools for a more seamless environment. (A sketch of this automation pattern follows below.)
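As a concrete illustration of the distributed-workflow pattern described on these two slides, the sketch below stages input to a compute center, submits a batch job, and later pulls the results back for local post-processing. The hostnames, paths, and job file are hypothetical, and it assumes non-interactive ssh via authorized keys; this is exactly the kind of unattended automation that one-time-password schemes would prevent.

    import subprocess

    # Hypothetical endpoints; assumes ssh authorized keys so no step
    # stops to prompt for a password.
    REMOTE = "user@seaborg.nersc.gov"
    RUNDIR = "/scratch/user/run042"

    def sh(cmd):
        # Run one workflow step, failing loudly if it does not succeed.
        subprocess.run(cmd, check=True)

    # 1. Stage input data to the compute center and submit the batch job
    #    (llsubmit is LoadLeveler's submit command on IBM SP systems).
    sh(["scp", "input.dat", REMOTE + ":" + RUNDIR + "/"])
    sh(["ssh", REMOTE, "cd " + RUNDIR + " && llsubmit job.ll"])

    # 2. Once the job completes, fetch results to the home site for
    #    post-processing.
    sh(["scp", REMOTE + ":" + RUNDIR + "/output.dat", "."])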
Slide 17: 9. Anything else important to your project?
- Theory
- NERSC is a great help; keep up the great work!
- My biggest concern with NERSC at the present time is that it has fallen behind the curve on the machine front. While I still consider NERSC a valuable resource for my research, I have diversified significantly during this FY.
- Any performance tools, such as POE, that help diagnose the bottlenecks in a code and suggest routes to improvements.
- Astro
- Memory bandwidth and latency.
- Expt
- User management across site and national boundaries. Separate user registrations and accounts across many sites will become too burdensome. Think single sign-on and seamless!
Slide 18: Observations
- A strong need for greater inter-processor bandwidth
- Faster processors, more memory
- Single file system view across NERSC
- Greater parallel FS performance (>1024 processors)
- More space
- More/better data management tools
- Single sign-on across sites
- Help with inter-site (WAN) issues
- Much scientific computing now has workflow across several sites