Title: Current Fault Management (FM) Trends in NASA
1. Current Fault Management (FM) Trends in NASA's Planetary Spacecraft
FSW-08
- Lorraine Fesq
- Jet Propulsion Laboratory,
- California Institute of Technology
- 11/13/08
- JHU/APL, MD
This material is the culmination of many people's work. See the Acknowledgements slide.
2. The Vision
Jim Adams, Deputy Director, Planetary Science Division, NASA SMD
Ken Ledbetter, Science Mission Directorate Chief Engineer, NASA OCE
- Problem: There exist schedule, cost and predictability challenges on deep space and planetary robotic missions when testing and operating Fault Management (FM) systems.
- Objective: Identify issues plaguing FM systems in unmanned, autonomous spacecraft today and make recommendations for future missions.
- Approach: Assemble key players in the spacecraft fault management field across NASA, industry and other organizations, to:
  - Capture current state of FM
  - Identify challenges associated with engineering/operating FM systems
  - Characterize issues underlying the challenges and propose steps to mitigate them
  - Discuss and document best practices and lessons learned in FM
  - Explore promising state-of-the-art technology and methodology solutions to identify potential investment targets
- All attendees are participants
- Begin to form a community
- Deliverable: White Paper for future programs and technology development
  - Lessons Learned / Best Practices
  - Opportunities for Investment
3. Overview of FM Workshop
- Steering Committee: John McDougal - MSFC, Chris Jones - JPL, Steve Scott/Ray Whitley - GSFC, George Cancro & Dave Watson - JHU/APL
- Technical Coordinator: Lorraine Fesq - JPL
- Held April 14-16, 2008 in New Orleans, LA
- >100 attendees from 31 organizations -- government, industry and academia
4. FM Workshop Goals
- Identify issues plaguing Fault Management in unmanned, autonomous spacecraft today
- Provide guidance for future programs and technology development
  - Lessons Learned
  - Best Practices
  - Opportunities for Investment
- Target Audience for White Paper: Current and future practitioners (FM, SE, SWE, V&V, etc.), proposal evaluators to assess viability of proposals, reviewers and program managers to evaluate credibility of program plans
- Not looking to produce a recipe or a single approach. Instead, identify expectations associated with different approaches
- Rise above institutional preferences!
This initial workshop was intended to start the discussions. The next workshop should focus on solutions to the issues.
5. FAULT MANAGEMENT WORKSHOP: Agenda
Identify and characterize cardinal issues via
Breakout Sessions
View Future Directions via Poster Sessions and
Invited Speakers
Expose current state of FM through Case Study
presentations
6. General Observations
- Three key concepts emerged during the workshop:
  - FM architectures are surprisingly similar across organizations: Monitors/Alarms and Responses
  - FM in current missions is not limited by technology, but by a lack of engineering and programmatic discipline.
  - In-flight performance of the FM systems on participating projects has been successful. However, getting there wasn't pretty.
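As an illustration of the Monitors/Alarms and Responses pattern named above, here is a minimal Python sketch. It is not taken from any flight system: the `Monitor` class, the `step` function, the persistence counter and the bus-voltage example are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Monitor:
    """One monitor/response pair (hypothetical structure for illustration)."""
    name: str
    is_tripped: Callable[[Dict[str, float]], bool]  # checks one telemetry sample
    response: Callable[[], str]                      # action to take on a fault
    persistence: int = 1                             # consecutive trips required
    _count: int = 0                                  # current consecutive trips


def step(monitors: List[Monitor], telemetry: Dict[str, float]) -> List[str]:
    """Evaluate every monitor once and return the responses that fired."""
    actions = []
    for m in monitors:
        # Count consecutive out-of-limit samples; reset on a good sample.
        m._count = m._count + 1 if m.is_tripped(telemetry) else 0
        if m._count >= m.persistence:
            actions.append(m.response())
            m._count = 0  # re-arm the monitor after responding
    return actions


# Assumed example: respond after two consecutive low bus-voltage samples.
monitors = [
    Monitor(
        name="bus_undervoltage",
        is_tripped=lambda t: t["bus_v"] < 24.0,
        response=lambda: "enter_safe_mode",
        persistence=2,
    )
]
first = step(monitors, {"bus_v": 23.0})   # first trip, below persistence
second = step(monitors, {"bus_v": 22.5})  # second trip, response fires
```

The persistence count is one common way to filter transient readings; real systems may use quite different filtering and response arbitration.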
7. FM WORKSHOP RESULTS
The Dirty Dozen: Key Findings and Recommendations
8. FM WORKSHOP RESULTS
The Dirty Dozen: Key Findings and Recommendations (cont.)
9. FM WORKSHOP RESULTS
The Dirty Dozen: Key Findings and Recommendations (cont.)
10. Finding 1: Avoid the downstream testing crunch
- Finding 1: Unexpected cost and schedule growth during final system integration and test is a result of underestimated Verification and Validation (V&V) complexity combined with late resource availability and staffing.
Management's solution: Add more people to get caught up!
Lesson learned: Minimize downstream testing complexity
Recommendation 1a: Allocate FM resources and staffing early, with appropriate schedule, resource scoping, allocation, and prioritizing. Schedule V&V time to capitalize on learning opportunity.
Recommendation 1b: Establish hardware / software / sequences / operations function allocations within an architecture early.
Recommendation 1c: Engrain FM into the system architecture. FM should be "dyed into" the design rather than "painted on".
11. Finding 2: Find a home for FM within Project organization
- Finding 2: Responsibility for FM currently is diffused throughout multiple organizations; unclear ownership leads to gaps, overlap and inconsistencies in FM design, implementation and validation.
  - Inefficiencies in defining the task
  - Fault Mission Set and failure modes defined by SE, FM or SMA?
  - Systems Engineer's responsibility -- or not?
  - Weak FM definition and requirements delay testing
- Recommendation 2: Establish clear roles and responsibilities for FM engineering. Rotate engineers through different project roles. Find a home for FM.
12. Finding 3: Standardize FM Terminology
- Finding 3: There is a lack of standard terminology for FM systems that causes problems in reviews and discussions.
  - Different interpretations of the same term, e.g., Single Fault Tolerance (may or may not include SEUs and operator errors), Fault, Autonomy
  - Multiple terms that mean the same thing, e.g., Fault Protection, Redundancy Management, FDIR
  - Many papers/presentations begin with definitions of terms -> immature discipline
- Recommendation 3 and O for I: Standardize FM terminology to avoid confusion and to provide a common vocabulary that can be used to design, implement and review FM systems.
  - Low-hanging fruit
  - Performed across larger community to achieve consensus and acceptance
  - Tailor for individual missions
Opportunity for Investment (O for I): Develop a Practitioner's Handbook that will provide practitioners with standard terminology to use throughout and across projects.
13. Finding 4: Identify FM representation techniques and FM design guidelines
- Finding 4: There is insufficient formality in the documentation of FM designs and architectures, as well as a lack of principles to guide the processes.
  - Difficult to visualize FM systems and to assess changes
  - Causes lack of reviewability
  - Little effort to date devoted to applying rigorous architectural specs and representations
- Recommendation 4a and O for I: Identify representation techniques to improve the design, implementation and review of FM systems.
  - Representation suggestions: SysML? Statecharts?
- Recommendation 4b and O for I: Establish a set of design guidelines to aid in FM design.
  - Establish and document FM design principles, e.g., provide a safety net
  - Software FMEA/FTA is a relatively new area for aerospace, but represents a potentially important approach to overall FM design
O for I: Develop a Practitioner's Handbook that will provide practitioners with FM representation techniques and FM design guidelines/principles.
14. Finding 5: Establish FM Metrics
- Finding 5: Metrics have not been established to evaluate the appropriateness or measure the progress of FM systems.
  - Functional properties (e.g., diagnostic coverage of the fault space, timing and responsiveness of the fault responses, and determinism) and non-functional properties (e.g., testability, usability, and maintainability)
  - Performance measures and progress metrics
- Recommendation 5a: Identify FM as a standard element of the system development process (e.g., separate WBS) to promote innovative solutions and realistic estimates of complexity, cost, schedule.
- Recommendation 5b and O for I: Establish evaluation criteria, metrics and process specifications with milestones that will allow proposal evaluators and project teams to assess the FM relevance, merits and progress.
Note: We do not recommend a standard architecture for all missions; on the contrary, risks, constraints, and requirements across different mission classes imply fundamentally different approaches to FM.
O for I: Develop a Practitioner's Handbook that will provide practitioners with FM metrics and process specifications.
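One of the functional properties listed in Finding 5, diagnostic coverage of the fault space, could be computed along the following lines. This is only a sketch under stated assumptions: the workshop did not define the metric, so the definition used here (fraction of enumerated faults detected by at least one monitor) is an assumption, as are the function name and every fault and monitor name.

```python
from typing import Dict, Iterable, List, Set, Tuple


def diagnostic_coverage(
    fault_space: Iterable[str],
    detects: Dict[str, Set[str]],
) -> Tuple[float, List[str]]:
    """Return (fraction of faults covered, sorted list of uncovered faults).

    fault_space: every fault the project has enumerated (hypothetical names).
    detects: maps each monitor name to the set of faults it can detect.
    """
    faults = set(fault_space)
    if not faults:
        raise ValueError("fault_space must contain at least one fault")
    # A fault counts as covered if any monitor claims to detect it.
    detected = set().union(*detects.values()) if detects else set()
    covered = faults & detected
    return len(covered) / len(faults), sorted(faults - covered)


# Assumed example data, for illustration only.
fault_space = [
    "star_tracker_dropout",
    "bus_undervoltage",
    "thruster_stuck_open",
    "seu_in_memory",
]
detects = {
    "attitude_error_monitor": {"star_tracker_dropout", "thruster_stuck_open"},
    "bus_voltage_monitor": {"bus_undervoltage"},
}
coverage, uncovered = diagnostic_coverage(fault_space, detects)
print(coverage, uncovered)  # 0.75 ['seu_in_memory']
```

Listing the uncovered faults alongside the ratio gives a reviewer something concrete to question, which a single percentage does not.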
15. Finding 6: Apply CPI to FM
- Finding 6a: Practices, processes, and tools for FM have not kept pace with the increasing complexity of mission requirements and spacecraft systems.
  - More capable, complex missions -> growth in number of potential fault scenarios
- Finding 6b: Indications of potential spacecraft anomalies exist in test data, but are not always observed or not adjudicated.
  - Test programs are becoming overwhelmed with data
- Recommendation 6a: Design for testability. Architectures should enable post-launch and post-test diagnosis.
  - Establish sensor placement for diagnosability -- recommendations from FM?
- Recommendation 6b: Examine all observed, unexpected behavior.
  - All anomalies in V&V should be adjudicated
- Recommendation 6c: Implement CPI for FM lifecycle.
  - Emphasis should be on learning and applying from the V&V process
- Recommendation 6d and O for I: Catalog and integrate existing FM analysis and development tools to identify capability gaps in the current generation of tools, and to facilitate technology development to address these gaps.
O for I: Develop tools to fill the capability gaps, e.g., complexity analysis tools for concept development and requirements definition, evolvable system models.
16. Finding 7: Assess mission-level requirements on FM complexity
- Finding 7: The impact of mission-level requirements on FM architecture complexity and V&V is not fully recognized.
  - Fail Operational
  - Number of interactions ~ (number of monitors/responses)^2
- Recommendation 7: Review and understand the impacts of mission-level requirements on FM complexity. FM designers should not suffer in silence, but should assess and elevate impacts to the appropriate levels of management.
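Finding 7 relates the number of interactions to the square of the number of monitors/responses. A short Python sketch shows that growth, assuming (as a simplification not stated in the slides) that only pairwise interactions are counted, which gives n(n-1)/2 pairs for n monitors/responses. The function name is hypothetical.

```python
from itertools import combinations


def pairwise_interactions(n: int) -> int:
    """Number of distinct pairs among n monitors/responses: n*(n-1)/2."""
    return n * (n - 1) // 2


for n in (10, 20, 40):
    # Cross-check the closed form by enumerating the pairs explicitly.
    enumerated = sum(1 for _ in combinations(range(n), 2))
    assert enumerated == pairwise_interactions(n)
    print(n, pairwise_interactions(n))
# prints:
# 10 45
# 20 190
# 40 780
```

Doubling the number of monitors/responses roughly quadruples the pairs to consider in V&V, which is why a mission-level requirement that adds monitors has a disproportionate effect on test scope.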
17. Finding 8: Assess if FM architecture is appropriate for Mission
- Finding 8a: FM architectures often contain complexity beyond what is defined by project-specific definitions of faults and required fault tolerance.
  - Programming model useful for other purposes. Flexibility: friend or foe?
  - "Junk drawer" concept, e.g., perform resource management
- Finding 8b: Increased FM architecture complexity leads to increased challenges during I&T and mission operations.
  - Emergent behavior
  - Variations in initial conditions cause differing results; "fly as you test" operational constraints
- Recommendation 8: Assess the appropriateness of the FM architecture w.r.t. the scale and complexity of the mission, and the scope of the autonomy functions to be implemented within the architecture.
O for I: Create a trade space of existing (and future) FM architectures showing how each handles complexity, flexibility, growth, risk, testability, etc.
18. Finding 9: Establish and maintain risk tolerance
- Finding 9: FM architecture development is subject to changing priorities toward cost and risk over the course of system development.
  - Early phases -> cost
  - Late phases -> risk
  - FM flexibility used as a response to accommodate buying down risk
- Recommendation 9: Define and establish risk tolerance as a mission-level requirement.
O for I: Define Risk Posture for each NASA Mission Class, and identify the resulting impacts to FM testing.
19. Finding 10: Be skeptical of inheritance claims
- Finding 10a: The bulk of existing FM systems (e.g., mission-specific monitors and responses) is not inheritable. Heritage, similarity and inheritance assumptions tend to underestimate budgeting for necessary V&V activities and review milestones.
  - Timing issues, environmental differences
- Finding 10b: Current FM systems do not support significant re-use.
  - Cultural differences
- Recommendation 10: Examine claims of FM inheritance during proposal evaluation phase to assess the impacts of mission differences.
20. Finding 11: Provide adequate testbed resources
- Finding 11: Inadequate testbed resources are a significant schedule driver during V&V.
  - FM V&V often delayed until other subsystems are tested
  - Lack of simulation fidelity causes incorrect test results
- Recommendation 11: Develop high-fidelity simulations and hardware testbeds to comprehensively exercise the FM system prior to spacecraft-level testing.
21. Finding 12: Capture and understand FM cultural differences among aerospace organizations
- Finding 12: Organizations have different and sometimes conflicting institutional goals and risk postures that drive designs, architectures and V&V plans in different directions, causing friction between customers and contractors.
  - The "why" often is missing -> lack of documentation
  - Differences in culture -- institutional fears, probable faults vs. possible faults
- Recommendation 12: Collect and coordinate FM assumptions, drivers, and implementation decisions into a single location that is available across NASA, APL and industry. Utilize this information to establish / foster dedicated education programs in FM.
O for I: Develop course material and identify courses that can be augmented with FM training.
O for I: Develop a Practitioner's Handbook that will provide practitioners with necessary background to make informed decisions.
O for I: Develop a Textbook containing the background and historical perspective that will be used to train the next generation of FM practitioners.
22. FM Roadmap: Opportunities for Investment
[Roadmap figure: timeline with axis marks at 2009, 2012, 2015 and 2018. Activities are grouped under Methodology & Standards, Technology, and Education & Training. The timeline bars and the exact group boundaries are not recoverable from this extraction. Activities are listed in their original order.]
- Standardize FM Terminology
- Establish FM Performance Metrics
- Create trade space of FM architectures
- Define & establish FM evaluation criteria
- Establish FM as WBS element
- Identify FM deliverables by Mission Phase
- Standardize Process
- Establish FM Process specification
- Define Risk posture for each Mission Class
- Estimate Metrics for I&T scheduling
- Establish NASA FM Working Group
- Perform Tool Surveys (e.g., V&V, design)
- Develop situation awareness tools for operation
- Design Tools/Representation Methods to Specify & Review FM designs
- Design Tools for Formal V&V methods
- Establish cost/risk estimation techniques
- Identify/develop tools for complexity analysis
- Identify/develop tools for complexity management
- Share findings with larger community through presentations, publications, workshops
- Develop training material for current NASA courses
- Develop text for university programs
- Develop university programs
- Develop FM Practitioner's Handbook
Sample Target Missions
- IXO (2014)
- Mars Astrobiology Field Lab (2016)
- SAFIR (2018)
- Solar Probe Plus
- Lunar Sample Return
- JWST (2013)
- GRAIL (2011)
- MSL (2011)
- JUNO (2011)
23. Current/Future Plans
- White Paper to be published this month, capturing Lessons Learned, Best Practices and Opportunities for Investment
- Make use of the findings
- Keep the dialog going
- Explore standardization opportunities -- are there areas that NASA should standardize? (e.g., terminology and taxonomy)
- Training/Education -- within our community and students for pipeline
- Collect schedule and cost metrics to form bases of estimates
- Opportunities for in-flight Technology Demonstrations (e.g., via extended missions)
- Show the business case (return on investment possibilities) in terms of reducing technical and programmatic risk by utilizing some of these techniques
- Propose a 2nd Planetary Spacecraft FM Workshop to identify solutions/approaches to dealing with the issues
24. Acknowledgements
- Program Chair and Sponsor: Jim Adams, Deputy Director, Planetary Science Division, Science Mission Directorate, NASA HQ
- Program Host: Paul Gilbert, Manager, Discovery and New Frontiers Office, NASA MSFC
- Workshop Director: John M. McDougal, MSFC
- Steering Committee:
  - George Cancro, JHU/APL
  - Chris Jones, JPL
  - John McDougal, MSFC
  - Steven Scott / Ray Whitley, GSFC
- Venue Organizer: Pauline Burgess, NRESS
- Workshop Technical Coordinator and Point of Contact: Lorraine Fesq, JPL, 818-393-7224, Lorraine.M.Fesq_at_jpl.nasa.gov
- Workshop Organizers and White Paper Authors:
  - Mitch Ingham, JPL
  - Jesse Leitner, GSFC
  - Marilyn Newhouse, CSC
  - Eric Rice, JPL
  - David Watson, JHU/APL
  - Julie Wertz, JPL