Title: Chief Architect Redux Paul Messina
1 Chief Architect Redux Paul Messina
2 Redux, an archaic term
- Redux: brought back, resurgent
- Example: the Victorian era redux
- Says something about my age?
3 On the role of the Chief Architect
- As chief architect of NPACI, I plan to work with
the partnership to review and evolve our visions,
goals, and objectives and integrate them into
achievable plans
- in the form of recommendations to NPACI
management
- just as an architect spends many hours with
clients asking them about their vision for the
new or remodeled building
4 An Ambitious Plan
- One of my chief concerns is whether I will be
able to devote enough time and energy to this
- but I intend to try
5 The Proposal for the Distributed Terascale
Facility
- In reassuming the role of Chief Architect, my
first major task is to participate in preparing
our response to the DTF solicitation (jointly
with the Alliance)
- I hope the lessons I learned in the DOE ASCI
program will be relevant for the DTF process and
its design
- The DTF solicitation is largely motivated by the
new technologies and scientific research models
that Fran mentioned in her talk
6 Highlights of the NSF RFP for a Distributed
Terascale Facility
- NSF seeks to fund an advanced "distributed
facility" that will demonstrate
- both single-site and "Grid-enabled" capabilities
for
- both simulation and data exploration
- beyond what is available at current PACI sites
7 Examples of DTF Resources given in the NSF RFP
- One computing system capable of five or more
teraflops (peak) performance located at a single
site
- Another large, but not necessarily comparably
configured, system at another site, coupled with
the first to test large-scale distributed
computing across the DTF and other resources
8 Forward Looking
- The proposed distributed facility will add to the
already existing capabilities provided by NSF and
form the foundation of a distributed
computational infrastructure that will
- meet the growing demands for modeling and
simulation as well as
- anticipate the current and future needs of the
scientific and engineering communities dealing
with exceptionally large, data-intensive
information management applications
9 Emphasis on Big Data
- The distributed facility will include substantial
support for accessing, analyzing, processing,
transmitting, and visualizing multi-terabyte data
collections of current and future interest to the
U.S. research community
- This will require the DTF to have
- terabytes to petabytes of online and archival
storage available for user access and
- multi-gigabit-per-second network connectivity
10 Our System Architecture Approach
- Based on usage models, derived by determining
needs and approaches of relevant user communities
and projects
- large-scale simulations
- Brain data
- PDB
- LIGO
- NVO
- LHC
- GriPhyN
- etc.
11 The DTF proposal as a springboard for our renewal
- As you can see, the DTF that is envisioned
involves more than getting bigger computing
capability -- it is meant to enable new modes of
research
- in addition to enhancing the infrastructure for
existing applications
- Our development activities will need to focus
much more on creating usable grids with
distributed teraflops and terabyte resources
- and grid software will have to be supported for
production use
12 What I learned in the last two years
- So how was my sabbatical in Washington DC?
- A lot of dedicated hard-working people and some
not so
- bimodal distribution
- Did it indeed give me insights on how government
works and on how to advance science and
technology?
- Ostensibly that is a reason for institutions to
send people to DC
- in addition to public service, which was my
motivation
13 Lessons of scale
- How much it costs to do development of
infrastructure
- system software and tools
- How much it costs to develop multi-disciplinary
codes
- large teams with diverse talents
- code control
14 For many tasks, dedicated people are better than
fractional people
- I certainly observed this in some previous
projects that involved development, notably the
Scalable I/O initiative
- one postdoc who had no other responsibilities did
much more than 1 FTE spread among many people
- But the same is true in management -- it requires
dedicated time and great focus
- a big challenge is finding people with the right
skills and the time to devote to this
15 Development projects need to be managed
differently from research
- Project planning and monitoring really pays off
- milestones at various levels
- Gantt charts
- technology roadmaps
- program plans
- implementation plans
- But do we have enough people with the time and
inclination?
16 Communication among the participants is key to
successful distributed projects
- OK, so this is not a deep new insight, but it shapes
my thinking
- It is possible to work with distributed teams
whose normal instinct and tradition is to
compete, not collaborate
- requires regular and frequent communication
17 (Some) Users will adopt new technologies
- Even conservative code developers will use new
tools -- if they actually work
- visualization facilities
- powerwalls
- I-desks
- debuggers
18 Pound wise and penny foolish (sic)
- Sometimes it's best to introduce new capabilities
with expensive but mature tools
- use of SGI Origin 2000s with 16 or even 20 IR
engines
- shockingly expensive, but they worked, at scale
- my initial misgivings were erased when we saw
that the new viz capabilities were instrumental
in achieving some serious application milestones
- the cost of developing the apps is high
- the cost of failure to meet milestones is very
high
19 Pushing the envelope entails risks
- Having more than one approach is valuable risk
mitigation, when one is working on ambitious
objectives on a tight schedule
- only one team achieved each of the first two
nuclear weapons application milestones on
schedule
- a different team in each case
20 So how does this relate to NPACI?
- Well, based on those recent experiences, you can
expect that I will be striving to ensure that
- we allocate adequate resources to our development
tasks, even if it means undertaking fewer
projects
- we (not just me) spend more time planning the
interplay between the projects, and tracking
progress
21 It's good to be back
- And I look forward to working with many of you
over the coming months