Title: STEREO Flight Software Diagnostic and Health Monitoring Capabilities
1. STEREO Flight Software Diagnostic and Health Monitoring Capabilities
- Kevin Balon
- JHU/APL
- FSW-07 Workshop
- Laurel, MD
- 5/6-Nov-2007
2. STEREO Architecture Overview
[Figure labels: spacecraft A and B, DSN, Mission Ops Control]
- 2 Spacecraft, each Single-String
  - Redundant IMUs
- 2 Flight Processors
  - CDH-cpu
    - Runs 1 of 2 Flight Applications
      - Safing EA-application (Earth Acquisition), or
      - Normal CDH-application
    - Manages 1-GB Solid-State Recorder (SSR)
  - GC-cpu
    - Single Flight Application
      - GC-application
    - Not needed for EA
3. STEREO Block Diagram
[Figure: block diagram of the Ground Segment (Cmd/Tlm Console Application, Event/Anomaly Application) and the Spacecraft. Labeled spacecraft elements: IEM, GC cpu RAD6000 (GC app), CDH/EA cpu RAD6000 (CDH or EA app), 1 Gbyte SSR, Interface Board / CCD, PCI, 1553 bus (BC and RTs), I2C, IDPUs (3), Star Tracker, IMU, Sun Sensors, Wheels (4), HGARA, Propulsion, PDU, PSE, TRIOs (7), Transponder, Serial Relay Commands, Serial Uplink, Downlink Data]
4. STEREO FSW Diagnostic System
- Event and Anomaly System
  - Integrated system between Flight Software and Ground
- ICCR
  - Inter-CSCI Communication Region
  - CSCI: Computer Software Configuration Items
  - Diagnostic RAM preserved across resets
- Telemetry and Solid-State Recorder (SSR)
- Automated daily reports to FSW-team
  - Summary reports of any anomaly activity
5. Evolution of Onboard Diagnostics
- STEREO's diagnostics evolved from earlier APL missions
  - NEAR
  - TIMED
  - CONTOUR
  - MESSENGER
- Concepts
  - Routine dumping of onboard data structures
  - Command History Logs
  - RAM preserved across resets
- Desire for more detailed data to aid in diagnostics
6. Key Definitions
- Event
  - Indicates something has occurred on the spacecraft, usually the result of a commanded action
- Anomaly
  - Indicates something erroneous has occurred
- Note: An anomaly is an event, but an event is not necessarily an anomaly.
- Examples
  - Events
    - EVNT_CMD_EXECUTE_SUCCESS
    - EVNT_PPT_BATT_PRESS_LIMIT
  - Anomalies
    - ANOM_CMD_CHECK_FAILURE
    - ANOM_CMD_EXECUTE_FAILURE
    - ANOM_PPT_ALGOR_PDU_ADC_ERR
    - ANOM_TLM_INGEST_DAC_ERR
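The relationship in the note above (every anomaly is an event, but not the reverse) can be sketched as follows. This is only an illustration: the mnemonics are the examples from the slide, but classifying by the `EVNT_`/`ANOM_` prefix and the function names `is_event` and `is_anomaly` are assumptions, not the flight software's actual mechanism.

```python
# Example mnemonics copied from the slide.
EXAMPLE_MNEMONICS = [
    "EVNT_CMD_EXECUTE_SUCCESS",
    "EVNT_PPT_BATT_PRESS_LIMIT",
    "ANOM_CMD_CHECK_FAILURE",
    "ANOM_CMD_EXECUTE_FAILURE",
    "ANOM_PPT_ALGOR_PDU_ADC_ERR",
    "ANOM_TLM_INGEST_DAC_ERR",
]


def is_anomaly(mnemonic: str) -> bool:
    """Assumed rule: anomalies carry the ANOM_ prefix."""
    return mnemonic.startswith("ANOM_")


def is_event(mnemonic: str) -> bool:
    """An anomaly is also an event, so both prefixes qualify."""
    return mnemonic.startswith(("EVNT_", "ANOM_"))


# Split the example list the way the slide does.
anomalies = [m for m in EXAMPLE_MNEMONICS if is_anomaly(m)]
events_only = [m for m in EXAMPLE_MNEMONICS if is_event(m) and not is_anomaly(m)]
```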
7. Event and Anomaly System
- Spacecraft
  - Used on each Flight Processor (2 per S/C)
  - Common code used in all 3 Flight Applications (CDH, EA and GC)
  - Provides monitoring and packing into telemetry of
    - Events
    - Anomalies
    - Ancillary Data for each
  - Events and Anomalies are unique
    - Deliberate uniqueness and individuality of Events and Anomalies by developers
      - 50 Unique Events
      - 900 Unique Anomalies
    - Enables ease of localizing errors (via code grep)
  - Events and Anomalies are relayed via
    - Real-time packets
    - Playback Telemetry from SSR
- Ground-System
  - Two applications
    - The standard Cmd/Tlm system used by operators (EPOCH)
    - A special application for Event/Anomaly monitoring
8. Event and Anomaly Application Sample Output
[Figure: sample output of the Event/Anomaly application, with callouts for:]
- Command Source (i.e., realtime, onboard macro, etc.)
- Spacecraft Time
- Flight Software Task Context
- Event/Anom Mnemonic
- Event/Anom Hexcode
- Ancillary Data
- Which CPU (CDH or GC)
- Anomalies (interspersed within Commanding Event context)
9. ICCR: Inter-CSCI Communication Region
- Shared RAM
  - 26 kbytes on each CPU
  - Preserved across resets
- Maintains
  - Event and Anomaly arrays
    - Latest 12 Events, Latest 12 Anomalies
  - Other diagnostic info
    - (cpu registers, exceptions, SSR pointers, GC's IMU and wheel info)
- (up to) 4 copies
  - Primary: most recent reset
  - Backup_1: diag info recorded BEFORE the most recent reset
  - Backup_2: diag info recorded AFTER the first reset (typically the 1st reset after the power-on)
  - Backup_3: diag info recorded BEFORE the first reset (typically the power-on reset)
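A minimal sketch of how the "latest 12" arrays and the four copies could behave is given below. The slides state only what each copy holds, so the rotation rule in `on_reset` is one possible reading rather than the flight implementation, and all class and method names (`DiagRecord`, `Iccr`, `log_event`, `log_anomaly`, `on_reset`) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

LOG_DEPTH = 12  # from the slide: latest 12 events, latest 12 anomalies


@dataclass
class DiagRecord:
    """One copy of the diagnostic data (other diag info omitted)."""
    events: deque = field(default_factory=lambda: deque(maxlen=LOG_DEPTH))
    anomalies: deque = field(default_factory=lambda: deque(maxlen=LOG_DEPTH))


class Iccr:
    """Hypothetical model of the region that survives a processor reset."""

    def __init__(self):
        self.primary = DiagRecord()
        self.backup_1 = None  # info recorded before the most recent reset
        self.backup_2 = None  # info recorded after the first reset
        self.backup_3 = None  # info recorded before the first reset
        self.reset_count = 0

    def log_event(self, code):
        self.primary.events.append(code)  # oldest entry drops off at 12

    def log_anomaly(self, code):
        self.primary.anomalies.append(code)

    def on_reset(self):
        # Assumed rotation: the first two snapshots are frozen, and
        # Backup_1 always follows the most recent reset.
        self.reset_count += 1
        snapshot = self.primary
        if self.reset_count == 1:
            self.backup_3 = snapshot
        elif self.reset_count == 2:
            self.backup_2 = snapshot
        self.backup_1 = snapshot
        self.primary = DiagRecord()
```

Under this reading, the data surrounding the first reset stays available however many resets follow, while Backup_1 always describes the run-up to the latest one.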
10. Anomaly System: Other Attributes
- Anomaly Suppression
  - Allows mission operators to mask ongoing nuisance anomalies, enabling increased visibility into other possible anomalies.
- Anomaly Buffers
  - Anomaly History Buffer
    - One bit per anomaly code
    - Latched summary of ALL (900) anomalies
    - Provides a quick snapshot of S/C health to Mission Operators upon each initial contact.
    - Buffer cleared just prior to loss of contact.
  - Anomaly Persistence Buffer
    - Reflects Anomaly Start and Anomaly Stop, i.e. edge triggering
  - Anomaly Activity Buffer
    - Similar to Anomaly History Buffer, but is periodically cleared by software at a programmable interval.
- Event and Anomaly Dump
  - Buffered events and anomalies may be flushed upon command
- Event and Anomaly Counters
  - Latest Anomaly Count
  - Suppressed Anomaly Count
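The buffers above can be sketched as bitmaps with one bit per anomaly code; 900 codes need 113 bytes. The sketch below is illustrative only. The class and method names are hypothetical, and two behaviors are assumptions the slides do not confirm: that a suppressed anomaly is counted but not latched, and that the counters increment once per report.

```python
NUM_ANOMALIES = 900  # from the slide


class AnomalyBuffers:
    """Hypothetical bitmap model of the anomaly buffers."""

    def __init__(self, num_codes=NUM_ANOMALIES):
        nbytes = (num_codes + 7) // 8        # one bit per anomaly code
        self.history = bytearray(nbytes)      # latched until cleared
        self.activity = bytearray(nbytes)     # cleared periodically
        self.suppressed = bytearray(nbytes)   # operator mask
        self.condition = bytearray(nbytes)    # current state, for edges
        self.latest_count = 0
        self.suppressed_count = 0

    @staticmethod
    def _set(buf, code, value=True):
        if value:
            buf[code >> 3] |= 1 << (code & 7)
        else:
            buf[code >> 3] &= ~(1 << (code & 7)) & 0xFF

    @staticmethod
    def _get(buf, code):
        return bool(buf[code >> 3] & (1 << (code & 7)))

    def suppress(self, code):
        self._set(self.suppressed, code)

    def report(self, code):
        """Latch an anomaly unless masked (assumed behavior)."""
        if self._get(self.suppressed, code):
            self.suppressed_count += 1
            return False
        self._set(self.history, code)
        self._set(self.activity, code)
        self.latest_count += 1
        return True

    def update_condition(self, code, active):
        """Edge triggering: report only start and stop transitions."""
        was_active = self._get(self.condition, code)
        self._set(self.condition, code, active)
        if active and not was_active:
            return "start"
        if was_active and not active:
            return "stop"
        return None

    def is_latched(self, code):
        return self._get(self.history, code)

    def is_active(self, code):
        return self._get(self.activity, code)

    def clear_activity(self):
        # Run by software at the programmable interval.
        self.activity[:] = bytes(len(self.activity))

    def clear_history(self):
        # Run just prior to loss of contact.
        self.history[:] = bytes(len(self.history))
```

A bitmap keeps the full 900-code summary small enough to downlink at the start of every contact, which fits the stated purpose of the history buffer.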
11. Solid State Recorder
- Solid State Recorder stores
  - Primary Science data from the instruments
  - Samples of S/C engineering data
  - Flight-software Event and Anomaly Packets
- SSR Black-box Telemetry Recording
  - Records Tlm Packets in reconfigurable partitions
    - Block-on-fill
    - Overwrite-on-fill
  - Managed by Mission Operations Team
  - Records telemetry packets while S/C is out-of-view
  - Playback of telemetry packets when S/C is in contact
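The two partition modes can be illustrated with a small sketch. It assumes that block-on-fill rejects new packets once the partition is full and that overwrite-on-fill discards the oldest packet to make room; the slides name the modes without defining them. The `Partition` class, its packet-count capacity, and the `dropped` counter are hypothetical.

```python
from collections import deque


class Partition:
    """Hypothetical SSR partition holding a fixed number of packets."""

    def __init__(self, capacity, overwrite_on_fill):
        self.capacity = capacity
        self.overwrite_on_fill = overwrite_on_fill
        self.packets = deque()
        self.dropped = 0  # packets rejected while full

    def record(self, packet):
        """Store a packet; return False if it was rejected."""
        if len(self.packets) >= self.capacity:
            if not self.overwrite_on_fill:
                self.dropped += 1       # block-on-fill: keep the oldest data
                return False
            self.packets.popleft()      # overwrite-on-fill: lose the oldest
        self.packets.append(packet)
        return True

    def playback(self):
        """Return recorded packets oldest first and empty the partition."""
        out = list(self.packets)
        self.packets.clear()
        return out
```

With a capacity of 3 and packets 0 through 4 recorded while out of view, the block-on-fill partition plays back 0, 1, 2 and the overwrite-on-fill partition plays back 2, 3, 4.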
12. Other Diagnostic Facilities
- H/K Telemetry
- (Aforementioned) Events and Anomalies
- Visible Counters (Registers)
- Success and Failure Counters
- Telltales
- Data from Hardware
- Memory-Objects: Parameters and Structures
- Flight-Software Task Health Monitoring
- Memory Diagnostics
- 1553 Bus Diagnostics
- Solid-State Recorder (SSR) Diagnostics
- Extensive set of Flight Software Documentation
  - User Manual
  - Command/Telemetry Document
  - Event/Anomaly Document
  - Memory Object Document
  - STEREO FSW Design Document
13. Spacecraft Monitoring
- Primarily performed by the MOPs Team
  - Monitoring of Spacecraft H/K Telemetry
    - Makes use of the Event/Anomaly System
  - Monitoring of the Event/Anomaly Application
  - Playback of SSR Black-box telemetry
- Daily emails to FSW-Team
  - Anomalies are automatically sent to the FSW-team (via Cron-job)
  - Helps keep FSW-team informed on S/C ops and S/C Health
  - Increased situational awareness
14. Summary: STEREO Diagnostics
- STEREO provides extensive on-board diagnostics
  - Event and Anomaly System
    - Both spacecraft and ground-based software components
    - Provides visibility of both nominal and anomalous conditions
    - Provides an operational context
    - Provides unique identifiers
    - Common to all flight applications and all flight CPUs
  - Shared RAM preserved across resets
  - Black box housekeeping on the SSR
  - Daily automated reporting system
    - Email to FSW-developers and other interested parties.
- Results
  - Increased operational situational awareness for both Mission Operations and Developer teams.
  - Useful during development and I&T
15. Backup Material
16. Event and Anomaly Data
- 12 Events (or Anomalies) fit into one 272-byte STEREO Telemetry Packet
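As a rough check on the figure above, 272 bytes divided among 12 records leaves at most 22 bytes per record, with 8 bytes to spare. The sketch below packs 12 records of exactly that size. The record layout (time, code, CPU, task, ancillary data) is entirely hypothetical; the slides do not give the real field widths or say whether the 272 bytes include a packet header.

```python
import struct

PACKET_BYTES = 272        # from the slide
RECORDS_PER_PACKET = 12   # from the slide

# Hypothetical 22-byte record, big-endian with no padding:
#   uint32 spacecraft time, uint16 event/anomaly code,
#   uint8 cpu id, uint8 task id, 14 bytes of ancillary data
RECORD = struct.Struct(">IHBB14s")


def pack_records(records):
    """Pack up to 12 (time, code, cpu, task, ancillary) tuples."""
    if len(records) > RECORDS_PER_PACKET:
        raise ValueError("too many records for one packet")
    body = b"".join(RECORD.pack(*rec) for rec in records)
    # Zero-fill the unused bytes so every packet has the same length.
    return body.ljust(PACKET_BYTES, b"\x00")


def unpack_records(packet):
    """Inverse of pack_records; always returns 12 tuples."""
    return [
        RECORD.unpack_from(packet, i * RECORD.size)
        for i in range(RECORDS_PER_PACKET)
    ]
```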
17. Anomaly Code Example
[Figure: anomaly code example, with callout "Anomaly Recorded"]
18. ICCR Contents (CDH-cpu)
19. Abstract
- The Solar TErrestrial RElations Observatory (STEREO) consists of two (nearly) identical spacecraft, each with two single-string flight CPUs running specific flight applications. The STEREO flight software system was developed with an extensive set of diagnostic capabilities. Many of these diagnostic capabilities are common to the three onboard flight applications, Command and Data Handling (CDH), Earth Acquisition (EA) and Guidance and Control (GC), resulting in a uniform set of diagnostic facilities. These diagnostic facilities have proven to be of great utility during development and flight operations. Specific health monitoring capabilities were a result of STEREO's single-string CPU architecture. This presentation provides an overview of STEREO's diagnostic and health monitoring capabilities.