Title: Mars and Beyond: NASA's Software Challenges in the 21st Century
1. Mars and Beyond: NASA's Software Challenges in the 21st Century
- Dr. Michael R. Lowry
- NASA Ames Research Center
2. Outline
- NASA's Mission
- Role of Software within NASA's Mission
- The Challenge: Enable Dependable SW-based Systems
- Technical Challenges
  - Scaling!
  - System-software barrier
  - Software is opaque and brittle in the large
- Reasons for Optimism
3. NASA's Vision
- To improve life here
- To extend life to there
- To find life beyond
NASA's Mission
- To understand and protect our home planet
- To explore the universe and search for life
- To inspire the next generation of explorers
- as only NASA can
5 Strategic Enterprises, One NASA:
- Space Science
- Earth Science
- Aerospace Technology
- Biological and Physical Research
- HEDS (Human Exploration and Development of Space)
4. Software Growth in Aerospace Missions: Software Enables NASA's Missions

[Chart: flight-software size, in instructions (equivalent memory locations in K), versus year from 1960 to 1995, for piloted and unpiloted systems ranging from Mercury, Gemini, Apollo, Surveyor, Mariner, Viking, Voyager, and Galileo to the Shuttle and military aircraft such as the B-1B, F-16 C/D, F-15E, B-2, C-17 (projected), and F-22 (projected). Software size doubles every 3 or 4 years. Source: AF Software Technology Support Center.]
5. The Challenge: Software Risk Factors
6. Mars Climate Orbiter
- Launched
  - 11 Dec 1998
- Mission
  - interplanetary weather satellite
  - communications relay for Mars Polar Lander
- Fate
  - arrived 23 Sept 1999
  - no signal received after initial orbit insertion
- Cause
  - faulty navigation data caused by failure to convert imperial to metric units
7. MCO Events
- Locus of error
  - Ground software file called "Small Forces" gives thruster performance data
  - This data is used to process telemetry from the spacecraft
  - The spacecraft signals each Angular Momentum Desaturation (AMD) maneuver
  - Small Forces data is used to compute the effect on the trajectory
  - The software underestimated the effect by a factor of 4.45
- Cause of error
  - Small Forces data was given in pound-force seconds (lbf-s)
  - The specification called for newton-seconds (N-s); see the conversion sketch after this list
- Result of error
  - As the spacecraft approached orbit insertion, the trajectory was corrected
  - Aimed for a periapse of 226 km on the first orbit
  - Estimates were adjusted as the spacecraft approached orbit insertion
  - 1 week prior: first periapse estimated at 150-170 km
  - 1 hour prior: this was down to 110 km
  - The minimum periapse considered survivable is 80 km
  - MCO entered Mars occultation 49 seconds earlier than predicted
  - The signal was never regained after the predicted 21-minute occultation
  - Subsequent analysis estimates a first periapse of 57 km
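To make the unit mismatch concrete, here is a minimal Java sketch (class and method names are invented for illustration, not taken from the mission software) contrasting the impulse value as delivered in pound-force seconds with the value converted to newton-seconds as the interface specification required; the ratio is the 4.45 underestimate cited above.

// Hypothetical illustration of the MCO unit mismatch: the Small Forces file
// reported AMD impulse in pound-force seconds (lbf-s), while the trajectory
// software expected newton-seconds (N-s).
public class AmdImpulse {
    static final double NEWTONS_PER_POUND_FORCE = 4.44822; // 1 lbf-s = 4.44822 N-s

    // What the ground software effectively did: treat the lbf-s number as if it were N-s.
    static double impulseAsDelivered(double impulseLbfSec) {
        return impulseLbfSec; // units silently wrong
    }

    // What the Software Interface Specification required.
    static double impulseAsSpecified(double impulseLbfSec) {
        return impulseLbfSec * NEWTONS_PER_POUND_FORCE; // convert to N-s
    }

    public static void main(String[] args) {
        double reported = 1.0; // one lbf-s of AMD thruster impulse
        double ratio = impulseAsSpecified(reported) / impulseAsDelivered(reported);
        System.out.printf("Trajectory effect underestimated by a factor of %.2f%n", ratio);
    }
}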
8. Contributing Factors
- For the first 4 months, AMD data was unusable due to file format errors
  - Navigators calculated the data by hand
  - The file format was fixed by April 1999
- Anomalies in the computed trajectory became apparent almost immediately
- Limited ability to investigate the anomalies
  - Thrust effects are measured along the Earth-spacecraft line of sight using Doppler shift
  - AMD thrusts are mainly perpendicular to that line of sight
- Failure to communicate between teams
  - E.g. the issue tracking system was not properly used by the navigation team
  - Anomalies were not properly investigated
- Inadequate staffing
  - The operations team was monitoring three missions simultaneously (MGS, MCO and MPL)
- Operations navigation team unfamiliar with the spacecraft
  - Different team from the development and test team
  - This team did not fully understand the significance of the anomalies
  - Assumed familiarity with the previous mission (Global Surveyor) was sufficient
  - Did not understand why AMD was performed 10-14 times more often
    (MCO had asymmetric solar panels, whereas MGS had symmetric panels)
- Inadequate testing
  - The Software Interface Specification was not used during unit testing of the Small Forces software
  - An end-to-end test of the ground software was never completed
  - The ground software was not considered mission critical, so it didn't have independent V&V
- Inadequate reviews
  - Key personnel were missing from critical design reviews
9. Analysis
- Software size, S, is increasing exponentially (doubling every three or four years)
- Errors, cost over-runs, and schedule slips are due primarily to non-local dependencies during integration, which grow as S^N with N < 2; the best calibration is N ≈ 1.2

Source: Professor Barry Boehm, author of software cost modeling
10. Predicted Errors as LOC Grows, with Current SW Practices/Technology

[Chart: predicted residual errors versus software size under current practice, with Cassini and MPL marked for comparison.]

Errors ≈ e · S^N, where S is the number of modules (LOC/M) and the error rate e ≈ 1/10,000
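The model above can be read as a small calculation. The sketch below evaluates Errors ≈ e · S^N over a range of sizes (the module counts are illustrative only, not actual mission figures) to show how predicted errors grow faster than linearly even with a fixed per-module error rate.

// Hedged sketch of the error-growth model quoted above: Errors ≈ e * S^N,
// with error rate e ≈ 1/10,000 per module and calibration exponent N ≈ 1.2.
// The module counts below are illustrative, not actual mission figures.
public class ErrorGrowth {
    static double predictedErrors(double modules, double errorRate, double exponent) {
        return errorRate * Math.pow(modules, exponent);
    }

    public static void main(String[] args) {
        double e = 1.0 / 10_000;
        double n = 1.2;
        for (double s : new double[] {10_000, 20_000, 40_000, 80_000}) {
            System.out.printf("S = %6.0f modules -> about %.1f predicted errors%n",
                              s, predictedErrors(s, e, n));
        }
        // Because N > 1, doubling S more than doubles the predicted errors.
    }
}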
11. Future Mars Exploration: MSL and MSR
12. Beyond Mars: JIMO and TPF
13. Technical Challenges and Opportunities
- System-software barrier
  - (Verification is easy, validation is hard)
- Software is transparent and malleable in the small
  - But opaque and brittle in the large
- General-purpose software dependability tools work well in the small
  - But fail to scale to systems in the large
- But there is reason for optimism
  - Align software architectures with system analysis
  - Success of formal methods in the related field of digital hardware
  - Scaling through specialization
  - Divide and conquer: compositional reasoning
  - Beyond correctness: exploiting the lattice between true and false for software understanding
  - Providing the research community with realistic experimental testbeds at scale
14. Scaling through Specialization: Practical Static Analysis

[Chart: specialized analyzers compared with general-purpose analyzers.]
15. Explaining the Cause of an Error
- A model checker can automatically find a trace that shows the error appearing
- Hard to show the error: testing cannot reliably show the error appearing, since it may require specific environment actions (inputs) or scheduling (for concurrency errors)
- Hard to find the cause of the error: once we know a way to show the error, it is difficult to localize its root cause
- Now we can automatically find an explanation for the error from the error trace produced by the model checker and the original program
- The algorithm uses model checking to first find similar traces that also cause the error (negatives) and traces that do not cause the error (positives); see the sketch after this list
  - Set of positives: traces that don't show the error
  - Set of negatives: traces that show different versions of the error
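A toy sketch of the trace-comparison idea, assuming traces are plain lists of step labels (the representation, step names, and method names are invented for illustration, not the actual tool's API): steps that occur in every error trace but in no passing trace are reported as candidate causes.

import java.util.*;

// Illustrative only: compare error traces ("negatives") against similar
// non-error traces ("positives") and report the steps implicated.
public class TraceDiff {
    static Set<String> candidateCauses(List<List<String>> negatives,
                                       List<List<String>> positives) {
        Set<String> implicated = new HashSet<>(negatives.get(0));
        for (List<String> trace : negatives) implicated.retainAll(trace); // common to every error trace
        for (List<String> trace : positives) implicated.removeAll(trace); // absent from passing traces
        return implicated;
    }

    public static void main(String[] args) {
        List<List<String>> negatives = List.of(
            List.of("init", "A:write", "B:write-unlocked", "assertFail"),
            List.of("init", "B:write-unlocked", "A:write", "assertFail"));
        List<List<String>> positives = List.of(
            List.of("init", "A:write", "B:lock", "B:write-locked", "B:unlock", "ok"));
        System.out.println("Steps implicated: " + candidateCauses(negatives, positives));
    }
}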
16. Generalized Symbolic Execution for Model Checking and Testing
- Future mission software is
  - concurrent
  - built on complex, dynamically allocated data structures (e.g., lists or trees)
  - highly interactive, with complex inputs
  - embedded in a large environment
  - and should be extremely reliable

[Diagram: an executive receives an input plan and executes actions against the environment/rover status; the challenges are the complex input structure, the large environment data, and concurrency over dynamic data (lists, trees).]

- Current practice in checking complex software
  - Testing: requires manual input; typically done for a few nominal input cases; not good at finding concurrency bugs; not good at dealing with complex data structures
  - Model checking: automatic and good at finding concurrency bugs, but not good at dealing with complex data structures, and feasible only with a small environment and a small set of input values
- Our novel symbolic execution framework
  - extends model checking to programs that have complex inputs with unbounded (very large) data
  - automates test input generation
  - simulates the code using symbolic values instead of program data and enumerates the input structures lazily

[Diagram: analysis of deleteFirst with the framework, under the precondition that the input is an acyclic list. The symbolic first element e0 splits the search into path conditions such as e0 < 10 and its negation; contradictory conditions (e.g., e0 ≥ 10 ∧ e0 < 0) are pruned as FALSE, structurally "unknown yet" fields are enumerated lazily, and numeric constraints are discharged by decision procedures.]

- Modular architecture: different model checkers/decision procedures can be used (a toy sketch of the symbolic-execution idea follows)
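The sketch below illustrates the symbolic-execution idea only in spirit (class names and constraints are invented): a branch on the symbolic first element e0 forks the path condition instead of following one concrete value, and a decision procedure would prune contradictory path conditions.

import java.util.ArrayList;
import java.util.List;

// Toy illustration of forking path conditions on a symbolic value.
public class SymbolicSketch {
    record PathCondition(List<String> constraints) {
        PathCondition and(String constraint) {
            List<String> next = new ArrayList<>(constraints);
            next.add(constraint);
            return new PathCondition(next);
        }
    }

    // Symbolically "execute" a branch on the first element e0 of the input list:
    // instead of one concrete run, enumerate both outcomes with their constraints.
    static void explore(PathCondition pc) {
        PathCondition thenBranch = pc.and("e0 < 10");  // true branch
        PathCondition elseBranch = pc.and("e0 >= 10"); // false branch
        System.out.println("feasible path: " + thenBranch.constraints());
        System.out.println("feasible path: " + elseBranch.constraints());
        // A decision procedure would prune contradictory path conditions,
        // e.g. {e0 >= 10, e0 < 0} is unsatisfiable and is never explored further.
    }

    public static void main(String[] args) {
        explore(new PathCondition(List.of("acyclic(list)"))); // precondition
    }
}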
17. System-Level Verification
- Check (system-level) integration properties based on module specifications
  - module hierarchy and interfaces used for incremental abstraction
  - architectural patterns potentially reusable
  - generate module/environment assumptions
- Check implementation modules against their design specifications
- Monitor properties that cannot be verified
- Monitor environment assumptions (a sketch of such a runtime monitor follows)
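As one illustration of monitoring an assumption that was not discharged statically, the sketch below checks an environment assumption at run time and reports violations; the specific assumption (commands arrive at least 100 ms apart) is invented for the example.

// Hypothetical runtime monitor for an environment assumption that could not
// be verified statically: commands are assumed to arrive at least minGapMillis apart.
public class AssumptionMonitor {
    private final long minGapMillis;
    private long lastCommandAt = Long.MIN_VALUE;

    AssumptionMonitor(long minGapMillis) {
        this.minGapMillis = minGapMillis;
    }

    // Called on every command; flags a violation instead of failing silently.
    boolean onCommand(long timestampMillis) {
        boolean ok = lastCommandAt == Long.MIN_VALUE
                  || timestampMillis - lastCommandAt >= minGapMillis;
        lastCommandAt = timestampMillis;
        if (!ok) System.err.println("environment assumption violated at t=" + timestampMillis);
        return ok;
    }

    public static void main(String[] args) {
        AssumptionMonitor monitor = new AssumptionMonitor(100);
        monitor.onCommand(0);
        monitor.onCommand(50); // violates the assumed 100 ms spacing
    }
}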
18. Module Verification
- Modules may require context information to satisfy a property
- Assumption + Module ⊨ Property (assume-guarantee reasoning)

[Diagram: a module with ports a, b, c, the property it must guarantee, and the assumption it places on its environment.]
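For reference, a standard form of the assume-guarantee rule alluded to above, written in LaTeX (this is the textbook rule, not necessarily the exact formulation used here): if module M satisfies property P under assumption A, and the environment E discharges A, then the composed system satisfies P.

\[
\frac{\langle A \rangle\, M\, \langle P \rangle \qquad \langle \mathit{true} \rangle\, E\, \langle A \rangle}
     {\langle \mathit{true} \rangle\, E \parallel M\, \langle P \rangle}
\]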
19. Mission Manager Viewpoint
- Asking the right questions: When can we stop testing? What process should we use? What is the value of formal methods?
- Qualitative correlative model: peer review is superior to testing for incorrect specifications; model checking for uncertain environments
- Quantitative predictive model
  - Mission trade studies: how much cost for acceptable risk
  - Development: optimize the use of assurance technologies
  - Mission: increase the use of CPU cycles for software monitoring
20. HDCP Goals
- The overall mission of the HDCP project is to increase the ability of NASA to engineer highly dependable software systems
- Method
  - Science of dependability
    - Develop better ways to measure and predict software dependability
    - What are the potential measurables for the various attributes?
    - How can we move past the present surrogates and approach the artifact more directly?
  - Empirical evaluation
    - of NASA and NASA-contractor dependability problems
    - of technologies and engineering principles to address the problems
  - Testbeds
    - Development of realistic testbeds for empirical evaluation of technologies and attributes
  - Intervention technologies
21. Active MDS Testbed Projects
- Golden Gate Project
  - Demonstrate that RT-Java is suitable for mission systems
  - Drive the MDS/RTSJ rover at JavaOne
  - Collaborators: Champlin, Giovannoni
- SCRover Project
  - Develop a rover testbed
  - Collect defect and process data for the experience base
  - Collaborators: Boehm, Madachy, Medvidovic, Port
- Dependability cases
  - Develop dependability cases for time management and software architectures
  - Collaborators: Goodenough, Weinstock, Maxion, Hudak
- Analysis of MDS architectural style
  - Analysis based on MDS's use of architectural component types
  - Collaborators: Garlan
- Process improvement
  - Data collection from mainline MDS and SCRover development efforts
  - Collaborators: Johnson, Port
22. MDS in 1 Minute
- Problem domain
  - Mission information, control, and operations of physical systems
  - Developed for unmanned space science missions
  - Scope includes flight, ground, and simulation/test
  - Applicable to robots that operate autonomously to achieve goals specified by humans
  - Architecturally suited for complex systems where everything affects everything
- Approach
  - Product-line practice to exploit commonalities across missions
  - An information and control architecture to which missions/products conform
  - A systems engineering process that is analytical, disciplined, and methodical
  - Reusable and adaptable framework software
23. Managing Interactions
A unified approach to managing interactions is essential:
- Complex interactions make software difficult
- Elements that work separately often fail to work together
- The combinatorics of interaction is staggering, so it's not easy to get right
- This is a major source of unreliability
- There are two approaches to this in MDS
24. MDS is a State-Based Architecture
- State variables hold state values, including the degree of uncertainty
- A goal is a constraint on the value of a state variable over a time interval
- Estimators interpret measurement and command evidence to estimate state
- Models express mission-specific relations among states, commands, and measurements
- Controllers issue commands, striving to achieve goals
- Hardware proxies provide access to hardware busses, devices, and instruments
- Key features (see the interface sketch after this list)
  - Systems analysis/design organized around states and models
  - State control architecturally separated from state determination
  - System operated via specifications of intent: goals on state
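A minimal interface sketch of this pattern, with invented Java type names (not the actual MDS framework API), to make the separation between state determination (estimators) and state control (controllers) concrete:

import java.util.Optional;

// Illustrative-only interfaces for the state-based pattern described above.
interface StateVariable<T> {
    Optional<T> currentEstimate();              // state value, possibly still unknown
    void update(T estimate, double confidence); // degree of uncertainty carried with the value
}

// A goal constrains the value of a state variable over a time interval.
record Goal<T>(StateVariable<T> target,
               java.util.function.Predicate<T> constraint,
               long startMillis, long endMillis) {
    boolean satisfiedNow() {
        return target.currentEstimate().map(constraint::test).orElse(false);
    }
}

interface Estimator<T> {
    // Interpret measurement and command evidence to refresh the state estimate.
    void estimate(StateVariable<T> stateVariable);
}

interface Controller<T> {
    // Issue commands that drive the state toward satisfying the goal;
    // control is architecturally separate from state determination.
    void pursue(Goal<T> goal);
}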
25. From Theory to Flight... JPL Transition Path
- Mars Smart Lander (MSL) technology infusion
  - Scheduled launch: 2009
  - MSL has baselined MDS technology
    - System engineering
    - Software frameworks
- MSL technology gates
  - PMSR: August 2004
  - Integrated demo: June 2005
  - PDR: February 2006
- MSL sample technology categories
  - Software architecture with infused technologies
  - Verification and validation tools and methodologies
  - Processes and supporting tools
  - Cost modeling for system engineering, software adaptation, and autonomy validation
- MDS-compatible technologies are directly relevant to MSL
26. Conclusions
- System-software barrier
  - (Verification is easy, validation is hard)
- Software is transparent and malleable in the small
  - But opaque and brittle in the large
- General-purpose software dependability tools work well in the small
  - But fail to scale to systems in the large
- But there is reason for optimism
  - Align software architectures with system analysis
  - Success of formal methods in the related field of digital hardware
  - Scaling through specialization
  - Divide and conquer: compositional reasoning
  - Beyond correctness: exploiting the lattice between true and false for software understanding
  - Providing the research community with realistic experimental testbeds at scale