Title: Design of Memory Systems for Spaceborne Computers
1. Design of Memory Systems for Spaceborne Computers
- Richard B. Katz
- NASA Office of Logic Design
- Flight Software Workshop 2007 (FSW-07)
- November 5-6, 2007
- Laurel, MD
2. Memory Classification
- While normally associated with computers, some of the concepts in this paper also apply to the configuration memory of FPGAs.
- Fixed
  - The contents of the memory are physically fixed by the structure of the memory element.
  - Examples: core rope memories (wire wound through or around a core), fusible link PROMs, and antifuse-based PROMs.
- Erasable
  - The contents of the memory are non-volatile, like the fixed memories, but the contents can be changed. In many cases this involves an erase operation and then a write.
  - Examples: core, plated wire, electrically erasable programmable read only memories (EEPROM), erasable programmable read only memories (EPROM), ferroelectric memories, and flash. The ROM in EPROM and EEPROM is a poor part of the name as it implies permanence, which is incorrect. Devices such as EEPROM may need refreshing over long missions, as many are rated with a 10 year storage lifetime, giving them dynamic characteristics.
- Volatile
  - The contents of the memory are volatile: they do not retain contents either after the cycling of power or during brownout conditions. This class is subdivided into two subclasses: static, which will retain state indefinitely while powered, and dynamic, where the memory must be read and subsequently refreshed.
  - Examples include SRAM, DRAM, and SDRAM.
3. Requirement: Design Against Any Credible Off-Nominal Event
- These Events Are Considered Both Credible and Likely
  - Power Transitions and Disruptions
    - Power Up Transient
    - Power Down Transient
    - Glitches or brownouts on power lines
  - Software Faults
  - Cell and Device Failure
  - Asynchronous Reset
- Some observations
  - Difficult to design against, and many current designs do not properly protect against brownouts and the power down transient.
  - FPGA-based control signals
4. Software Faults
- Consider the likelihood of a software fault to be 100%.
- Device Protection
  - Many erasable devices implement software write protection to guard against inadvertent writes to the memory.
  - JEDEC has published a standard on this type of protection.
  - Do not keep the keys to unlock the memory on-board unless absolutely necessary.
- Subsystem Protection
  - System level write protection limits, implemented in hardware, to protect against software faults.
  - Some systems implement this in software, which is risky; see bullet 1 above.
  - Use an external hardware discrete command as an additional barrier to prevent inadvertent writes.
5. Cell and Device Failure: General Guidelines to be Tailored for Each Mission and Application
- High-reliability, radiation-hardened CMOS RAM and PROM is available.
- Designing against cell and device failure should be consistent with mission rules on single point failures.
- Examine the "radiation-hardened" label carefully, as some devices marked as such are in fact SEU soft.
- Commercial off the shelf (COTS) and Single Event Upset (SEU) soft devices should have parity for error detection or error detection and correction (EDAC) circuits, as required for the application.
- Analyze and test devices for lockup states. These can occur in many memory types from illegal loads into command registers, poor signal integrity, poor power quality, or an SEU. Some device lockup states require power cycling to clear.
- Consider the likelihood of an EEPROM or flash device fault to be 100%. There are enough failures in the industry to justify such an approach.
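The parity protection recommended above can be illustrated with a minimal sketch. The helper names (`parity_bit`, `check`) and the 8-bit sample word are assumptions made for illustration; they are not taken from any flight design. The sketch shows why a single parity bit detects a single-bit upset but cannot detect a double-bit error, which is one reason EDAC may be required instead.

```python
def parity_bit(word: int) -> int:
    """Return 1 if the word has an odd number of 1 bits, else 0 (even parity)."""
    return bin(word).count("1") & 1


def check(word: int, stored_parity: int) -> bool:
    """Return True if the word read back agrees with its stored parity bit."""
    return parity_bit(word) == stored_parity


# Hypothetical stored word and the parity bit written alongside it.
word = 0b1011_0010
stored = parity_bit(word)

single_upset = word ^ (1 << 3)   # one bit flipped, e.g. by an SEU
double_upset = word ^ 0b11       # two bits flipped

print(check(word, stored))          # True: clean read
print(check(single_upset, stored))  # False: single-bit error detected
print(check(double_upset, stored))  # True: double-bit error goes undetected
```

Parity only detects; it cannot say which bit changed, so recovery must come from elsewhere (a redundant copy or an EDAC code).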
6. Some Component Considerations: Non-volatile Memory Lockup
SEFI data for the R1701L PROM. This "stuck at" mode, not necessarily 0, requires power cycling of this serial device to clear [5]. See also [6] and other reports for similar results.
SEE test results for the AT28C010 (EEPROM) [4]. Types I and II are Single Event Functional Interrupts (SEFI) and required power cycling to restore functionality. Errors can be multi-bit, defeating SEC/DED EDAC schemes.
Some, but not all, non-volatile memory components can enter lockup states and become stuck, requiring the cycling of power to restore functionality. Careful system consideration is needed for the use of such devices, with regard to error detection and clearing, protection of device I/O pins, and loss of system functionality and propagation of errors until recovery is achieved.
7. Some Component Considerations: Synchronous DRAM (SDRAM) Lockup
BURST LENGTH
A2 A1 A0 | M3 = 0    | M3 = 1
 0  0  0 | 1         | 1
 0  0  1 | 2         | 2
 0  1  0 | 4         | 4
 0  1  1 | 8         | 8
 1  0  0 | RESERVED  | RESERVED
 1  0  1 | RESERVED  | RESERVED
 1  1  0 | RESERVED  | RESERVED
 1  1  1 | FULL PAGE | RESERVED
Loss of functionality for the Hyundai 256M SDRAM (Auto Refresh Operation Mode) [7].
Examination of a command field, Burst Length, for a Load Mode Register command for one SDRAM type. SDRAMs contain finite state machines and some models may lock up, requiring the cycling of power, if RESERVED commands are loaded. For some models, this can result in potential damage to a device. Other ways of entering illegal and potentially damaging states are an SEU, as shown in the chart on the right, an error in the controlling device, poor signal integrity, or poor power quality. Careful system consideration is needed for the use of such devices, with regard to error detection and clearing, spare replacement devices in the event of damage, and loss of system functionality and propagation of errors until recovery is achieved.
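One way a memory controller could guard against loading a RESERVED encoding is to validate the Burst Length field before issuing Load Mode Register. The following is a minimal sketch built from the Burst Length table above. It assumes the field sits in bits 2..0 of the mode word and that M3 is bit 3; the function name `burst_length` is hypothetical and other SDRAM types may differ.

```python
# Burst-length encodings keyed by the A2..A0 field.
# Each value is (result for M3 = 0, result for M3 = 1); None marks RESERVED.
BURST_LENGTH = {
    0b000: (1, 1),
    0b001: (2, 2),
    0b010: (4, 4),
    0b011: (8, 8),
    0b100: (None, None),
    0b101: (None, None),
    0b110: (None, None),
    0b111: ("FULL PAGE", None),
}


def burst_length(mode_word: int):
    """Decode the burst length, refusing any RESERVED encoding.

    Assumes bits 2..0 hold A2..A0 and bit 3 holds M3 (an assumption
    made for this sketch).
    """
    field = mode_word & 0b111
    m3 = (mode_word >> 3) & 1
    value = BURST_LENGTH[field][m3]
    if value is None:
        raise ValueError(f"RESERVED burst length: A2..A0={field:03b}, M3={m3}")
    return value


print(burst_length(0b0010))  # 4
print(burst_length(0b0111))  # FULL PAGE
```

A check like this only covers commands generated by the controller; it does not protect against an SEU inside the SDRAM itself, which still calls for error detection and a recovery path.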
8. Asynchronous Reset
- Consider the system effects on the memory subsystem from an asynchronous reset.
  - Power disruptions as discussed above, which are included here.
  - Reset either from another on-board computer or a ground command, perhaps in an attempt to clear a fault.
- Will write cycles be aborted while being set up or in process, leaving a non-volatile memory in an undefined state, or altering RAM contents so that a warm boot is no longer valid?
  - Hardware memory controllers
  - Flight software, which in some systems is involved in generating sequences and timing for non-volatile memories.
- Will hardware operations be given time and energy to complete on-going operations? Many non-volatile memory devices take on the order of 10 ms to complete a write.
9. Some Recommendations
- Boot and Safe-Hold Code
  - High-reliability, radiation-hardened, fixed memories should normally be employed for boot and safe-hold functions.
  - For applications such as instruments, DMA functions, properly implemented, can load memories with boot code. In this case, the instrument should be safed by hardware logic.
  - DMA functions should not require any operational software. A hardware discrete command to clamp a processor into reset is also recommended.
- Hardware discrete commands should be used for switching critical memory banks, not software.
- Systems should require the minimum of resources to function to enhance the probability of survival in the presence of either faults or off-nominal events.
10. Saturn V Launch Vehicle Duplex Memory
Each of the two core memory units was accessed in
parallel and each contained parity. If an error
was detected in the memory unit currently
designated as prime, then data from the secondary
unit was used with the secondary unit now given
the prime designation. Hardware automatically
wrote corrected data upon the detection of an
error.
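The duplex scheme described above can be sketched as a small behavioral model. This is an illustration only, not the Saturn V hardware design: the class names are hypothetical, and the behavior when both units show a parity error (raising an exception) is an assumption, since the slide does not describe that case.

```python
def parity(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of 1 bits."""
    return bin(word).count("1") & 1


class MemoryUnit:
    """One memory unit storing each word together with a parity bit."""

    def __init__(self, size: int):
        self.words = [0] * size
        self.parity = [0] * size

    def write(self, addr: int, word: int) -> None:
        self.words[addr] = word
        self.parity[addr] = parity(word)

    def read(self, addr: int):
        """Return (word, parity_ok)."""
        word = self.words[addr]
        return word, parity(word) == self.parity[addr]


class DuplexMemory:
    """Two units written and read together; one is designated prime."""

    def __init__(self, size: int):
        self.units = [MemoryUnit(size), MemoryUnit(size)]
        self.prime = 0

    def write(self, addr: int, word: int) -> None:
        for unit in self.units:
            unit.write(addr, word)

    def read(self, addr: int) -> int:
        prime = self.units[self.prime]
        secondary = self.units[1 - self.prime]
        word, ok = prime.read(addr)
        if ok:
            return word
        word, ok = secondary.read(addr)
        if not ok:
            # Assumption: the slide does not say what happens here.
            raise RuntimeError("parity error in both memory units")
        prime.write(addr, word)        # write corrected data back
        self.prime = 1 - self.prime    # secondary is now prime
        return word


mem = DuplexMemory(8)
mem.write(3, 0x155)
mem.units[0].words[3] ^= 1             # inject a single-bit fault in the prime unit
print(hex(mem.read(3)), mem.prime)     # 0x155 1
```

The point of the arrangement is that detection (parity) and recovery (the second copy) are both done in hardware, with no software in the loop.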
11. Apollo Guidance Computer
The advantages of the ropes are numerous. The program, once wired in, cannot be electrically altered, a substantial asset for mission reliability. [2]
The permanent memory requires very few active components and very little power to operate. It also has properties that make it indestructible short of mechanical damage, that is, there is no inflight failure of any kind that can destroy this part of the memory.
In case of inflight failure that destroys the information in this erasable memory, the computation can be restarted by reading in only a very few words. [3]
Memories in the AGC were single string; each memory used a parity bit for error detection. Fixed storage was core rope, a permanent memory technology, with coincident current core implementing erasable memory. Involuntary instructions, which operated as an interrupt and not under program control, could shift data into specific words of memory. Data could also be entered via the astronaut's keyboard and the "PACE" digital command system before launch. [3]
12. Galileo Attitude Control Computer
[Block diagram; each label appears twice: RTG Power for Keep-Alive, CMOS Memory Array, ROM, GSE/DMA, Arbiter/Controller, CDH/DMA.]
Memory units were accessed one at a time. There
was no parity and RAM contents were protected by
write protect registers and monitored by
checksums in the background. Primary and
secondary memory designs were switched via a
discrete command. ROM contents implemented
safe-hold mode. DMA was functional either with
the processor clamped in reset or executing
flight software. A heartbeat was sent to the
CDH via DMA.
13. Single String Computer A
Single Board Computer, conceptual diagram.
[Diagram labels: Command to the flight software; µP; Logic Device; EEPROM Modules 1, 2, and 3, each holding Boot Code.]
Code redundantly stored in three EEPROM modules. Switching between copies is implemented in software and all software must be running to be able to accept and process the command to switch images. The critical boot code and interrupt vectors can not be made fault tolerant in this software-centric architecture.
Simplified software-centric architecture. Switching between critical boot sections is done by software, leaving single point failures in this architecture. There is no parity or EDAC.
14. Single String Computer B
These two computers are based on the same base SBC but reflect different engineering approaches.
Single Board Computer, conceptual diagram.
[Diagram labels: µP; Hardware command selects between one of two spare modules; Hardware command for either on- or off-board boot code selection; EEPROM Modules 1, 2, and 3, each holding Boot Code.]
Code redundantly stored in three EEPROM modules. Switching between copies is implemented in hardware by an external discrete command.
Simplified hardware-centric architecture. Switching between critical boot sections is done by hardware discretes, eliminating the EEPROM as a single point failure. Common mode EEPROM failure modes do remain.
15. LOLA Memory
- LOLA is the Lunar Orbiter Laser Altimeter which
will fly on the Lunar Reconnaissance Orbiter.
16. LOLA Memory Breakdown
[Memory map diagram: EEPROM (128 kbytes), SRAM (128 kbytes), and PROM (32 kbytes), drawn as blocks labeled Boot, Code, Data, Margin, Redundant, and Spare.]
- Notes
  - Each block 16 kbytes
  - Red blocks are redundant, access controlled by a discrete command bit
  - BAE Rad-hard SRAM
    - Continuously read SRAM by "data drip" in telemetry
  - Aeroflex PROM
  - Hitachi commercial EEPROM
  - All memories readable by DMA.
  - EEPROM and SRAM writable by DMA.
17. Boot Control and Memory Architectures
- Multiple Sources for Initial Memory Load
  - Each page is 32 kbytes
    - The upper 32 kbytes of the address space is loaded with an illegal instruction to force a trap if the program loses control and executes undefined memory locations.
  - One page of PROM (32k x 8 device)
  - Four pages of EEPROM (128k x 8 device)
- Can load into one of two pages of SRAM (128k x 8 device)
- Source of boot information comes from a programmable register.
18. LOLA Memory Philosophy
- Ground computer can peek and poke memory locations.
- DMA model for the instrument
  - Other than setting/reading discrete bits.
- Single address field for entire instrument
- Give the ground computers maximum control of instrument for "push button" control.
- Science telemetry double buffered.
  - Transparent to the spacecraft, which retrieves packets from the same address each 1 s major frame period
  - DMA can access telemetry in raw mode, directly reading each byte of telemetry.
19. LOLA Memory Philosophy
---------------------------------------------------------------------------
--
-- LOLA MEMORY ORGANIZATION
--
-- The memory in the LOLA instrument will be organized as an array of
-- bytes with a 24-bit address field.  Memory is to be accessed through
-- MIL-STD-1553B interface and employ direct memory access via the
-- "RodChip."  The uppermost nibble of this 24-bit address will denote
-- the memory to be accessed as in the table below.
--
--   0  PROM
--   1  EEPROM
--   2  SRAM
--   3
--   4  CT (exclusive of 0-3 above) and RMU
--   5
--   6  TELEMETRY
--
-- Address 0-5 are for physical memory and are provided for engineering
-- use.
--
---------------------------------------------------------------------------
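The address map above can be decoded with a few lines of code. This is a minimal sketch written for illustration, assuming the "uppermost nibble" is bits 23..20 of the 24-bit address; the function name `decode_address` is hypothetical. Select values with no name in the table (3, 5, and anything above 6) are returned as `None` rather than guessed.

```python
# Memory select values from the table above; unnamed entries are omitted.
MEMORY_SELECT = {
    0x0: "PROM",
    0x1: "EEPROM",
    0x2: "SRAM",
    0x4: "CT and RMU",
    0x6: "TELEMETRY",
}


def decode_address(address: int):
    """Split a 24-bit address into (memory name or None, 20-bit offset)."""
    if not 0 <= address < (1 << 24):
        raise ValueError("address must fit in 24 bits")
    select = address >> 20          # uppermost nibble
    offset = address & 0xFFFFF      # remaining 20 bits
    return MEMORY_SELECT.get(select), offset


print(decode_address(0x100010))  # ('EEPROM', 16)
print(decode_address(0x600000))  # ('TELEMETRY', 0)
print(decode_address(0x300000))  # (None, 0)
```

A single flat address field like this is what lets a ground computer peek and poke any memory in the instrument with the same DMA mechanism.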
20. LOLA EEPROM Protection
- Discrete bit in MIL-STD-1553B output register must be set
  - Logically ANDed with write signal to inhibit writes
- Power-on-reset signal
  - Releases late (power up)
  - Applied early (power down)
  - Controls device's reset pin
  - Disables device and prevents inadvertent writes during power transitions.
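The two hardware barriers listed above combine into simple gating logic, shown here as a truth-table sketch. It is a behavioral illustration only: signal polarities and the function name `eeprom_write_allowed` are assumptions, and the real design is implemented in hardware, not software.

```python
from itertools import product


def eeprom_write_allowed(write_signal: bool,
                         enable_bit: bool,
                         por_asserted: bool) -> bool:
    """A write reaches the EEPROM only if:
    - the controller is requesting a write,
    - the discrete enable bit in the 1553 output register is set, and
    - power-on reset is not holding the device disabled.
    """
    return write_signal and enable_bit and not por_asserted


# Print the full truth table.
for write_signal, enable_bit, por in product([False, True], repeat=3):
    allowed = eeprom_write_allowed(write_signal, enable_bit, por)
    print(write_signal, enable_bit, por, "->", allowed)
```

Only one of the eight input combinations permits a write, and during power transitions the reset term blocks writes regardless of what the other signals are doing.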
21. LOLA EEPROM Protection
- All write cycles use the sequence to unlock/lock the non-volatile memory device
  - "Software Data Protection" continually enabled and never disabled
  - This sequence is a preamble to a write cycle
- µP/software cannot access EEPROM
  - All EEPROM access is done by hardware
    - Boot controller (reads)
    - DMA controller (reads and writes)
22. KEY EEPROM CHARACTERISTICS
Keys
tBLC (Byte Load Cycle): min 1.0 µs, max 30 µs.
Key Fact: MIL-STD-1553B word rate is 20 µs per word.
23. LOLA EEPROM Protection
- EEPROM Keys for enabling writing
  - Are not stored on board in any form
  - Part of the MIL-STD-1553B RECEIVE command
  - Key (data part) is uploaded from ground in 3 byte preamble and presented via DMA controller.
  - Are discarded immediately after use
- Writing may only be done by DMA
  - DMA supplies Address and Data
  - Except for Address during preamble.
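The idea of keys that arrive with each command and are never kept on board can be sketched as a toy model. Everything named here is hypothetical: the classes, the 4-byte command layout (3 key bytes followed by 1 data byte), and the key values are illustrative assumptions, not the actual device unlock sequence or the LOLA command format. The preamble addresses supplied by hardware are not modeled.

```python
class ToyEeprom:
    """Toy device: accepts a byte write only after the correct 3-byte preamble."""

    def __init__(self, unlock_sequence):
        # The unlock sequence is a fixed property of the device itself,
        # not data held by the flight electronics.
        self._unlock = bytes(unlock_sequence)
        self.cells = {}

    def write(self, preamble, address: int, value: int) -> bool:
        if bytes(preamble) != self._unlock:
            return False            # protection stays engaged, write ignored
        self.cells[address] = value
        return True


def dma_write(eeprom: ToyEeprom, receive_data: bytes, address: int) -> bool:
    """Hypothetical DMA write: 3 key bytes from the ground, then 1 data byte."""
    if len(receive_data) != 4:
        raise ValueError("expected 3 key bytes followed by 1 data byte")
    key, value = receive_data[:3], receive_data[3]
    # The key is used once and not retained after this call returns.
    return eeprom.write(key, address, value)


device = ToyEeprom(b"\x11\x22\x33")                            # made-up key values
print(dma_write(device, b"\x11\x22\x33\x5a", address=0x10))    # True
print(dma_write(device, b"\x00\x00\x00\x5a", address=0x11))    # False
print(device.cells)                                            # {16: 90}
```

Because the on-board logic never holds the key, a runaway processor or faulty controller has nothing it can replay to unlock the device.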
24. LOLA Flight Software
25. LOLA Flight Software Overview
- Produce 3 sets of outputs every 35.7 ms.
- Very small and focused.
- Interrupt driven software
  - but only one task runs at a time, making the software deterministic.
  - Role of software: an interrupt calls a subroutine which executes and then stops (executes HLT instruction)
- Provision made for software telemetry for each minor frame.
- Software is loaded by DMA, no need to configure itself.
- DMA is completely independent of processor, which is held in reset during DMA operations.
26. Key Flight Software Characteristics
- Size of Flight Code - 12,450 bytes
- Size of Tables - 3,472 bytes
- Size of Memory for Data - 3,592 bytes
- Execution time
  - nominal: 22 ms
  - last shot in major frame: 32 ms
- Note: a minor frame rate of 28 Hz gives a 35.7 ms minor frame; the major frame rate is 1 Hz.
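The figures above can be checked with a few lines of arithmetic. This sketch only restates the numbers on the slide; the variable and function names are introduced here for illustration.

```python
MINOR_FRAME_RATE_HZ = 28
NOMINAL_MS = 22.0        # nominal execution time
WORST_CASE_MS = 32.0     # last shot in major frame

minor_frame_ms = 1000.0 / MINOR_FRAME_RATE_HZ


def margin_ms(execution_ms: float) -> float:
    """Time left in the minor frame after the software finishes."""
    return minor_frame_ms - execution_ms


def utilization(execution_ms: float) -> float:
    """Fraction of the minor frame consumed by the software."""
    return execution_ms / minor_frame_ms


total_bytes = 12450 + 3472 + 3592   # code + tables + data

print(f"minor frame: {minor_frame_ms:.1f} ms")                        # 35.7 ms
print(f"nominal margin: {margin_ms(NOMINAL_MS):.1f} ms, "
      f"{utilization(NOMINAL_MS):.1%} used")                          # 13.7 ms, 61.6% used
print(f"worst-case margin: {margin_ms(WORST_CASE_MS):.1f} ms, "
      f"{utilization(WORST_CASE_MS):.1%} used")                       # 3.7 ms, 89.6% used
print(f"total memory: {total_bytes} bytes")                           # 19514 bytes
```

The worst-case shot leaves under 4 ms of slack in the minor frame, which is why a per-frame execution time monitor is useful.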
27. Algorithm Engine Timer
The signal that is input to the CT from the RMU that triggers the algorithm engine is called RUPT, which is a 200 ns pulse asserted at 28 Hz (every 35.7 ms).
TELEMETRY: One byte per minor frame (35.7 ms) is to be put into the telemetry. This is the Algorithm Engine Timer.
- 00: Software did not run properly and terminated early.
- FF: Software did not complete on time (cycle slip).
- Other values represent time of execution for flight software for one minor cycle.
The Algorithm Engine Timer is implemented by a counter with the following properties:
- Effectively an 8-bit saturating counter
- LSB = 200 microseconds
- Set to "0000000" by RUPT
- Stopped by an 80K85 I/O write cycle to a particular address when the software is done processing a minor frame.
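The counter and its telemetry byte can be modeled in a short sketch. This is a behavioral illustration under stated assumptions: the class and function names are hypothetical, `tick()` stands for one 200 microsecond LSB period, and the model does not attempt to show how the real hardware arrives at FF on a cycle slip, which the slide does not describe.

```python
LSB_US = 200  # counter resolution in microseconds


class AlgorithmEngineTimer:
    """8-bit saturating counter: cleared by RUPT, stopped by a software I/O write."""

    def __init__(self):
        self.count = 0
        self.running = False

    def rupt(self) -> None:
        """RUPT pulse: clear the counter and start timing a new minor frame."""
        self.count = 0
        self.running = True

    def tick(self) -> None:
        """Advance by one LSB period; saturate at 0xFF instead of wrapping."""
        if self.running and self.count < 0xFF:
            self.count += 1

    def stop(self) -> None:
        """Software signals completion with an I/O write."""
        self.running = False


def decode_timer(byte: int):
    """Interpret the telemetry byte on the ground."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("telemetry value must be one byte")
    if byte == 0x00:
        return "terminated early"
    if byte == 0xFF:
        return "cycle slip"
    return byte * LSB_US / 1000.0   # execution time in ms


timer = AlgorithmEngineTimer()
timer.rupt()
for _ in range(110):                # 110 x 200 us = 22 ms nominal execution
    timer.tick()
timer.stop()
print(hex(timer.count), decode_timer(timer.count))   # 0x6e 22.0
```

Reserving 00 and FF as status codes means a single byte per minor frame reports both software health and execution time.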
28. Thank you.
29. References
[1] Space Vehicle Design Criteria (Guidance and Control), Spaceborne Digital Computer Systems, NASA SP-8070, March 1971, National Aeronautics and Space Administration.
[2] The Apollo Guidance Computer, Ramon L. Alonso and Albert L. Hopkins, R-416, August 1963.
[3] General Design Characteristics of the Apollo Guidance Computer, Eldon C. Hall, R-410, May 1963.
[4] Single Event Functional Interrupt (SEFI) Sensitivity in EEPROMs, R. Koga, 1998 MAPLD International Conference, September 1998, Greenbelt, MD.
[5] Single-Event Upset Test Results for the Xilinx R1701L PROM, S. M. Guertin, JPL Report, August 24, 2000.
[6] SEE and TID Extension Testing of the Xilinx XQR18V04 4Mbit Radiation Hardened Configuration PROM, Carl Carmichael, Joe Fabula, Candice Yui, and Gary Swift, 2002 MAPLD International Conference, September 10-12, 2002, Laurel, MD.
[7] "Permanent Single Event Functional Interrupts (SEFIs) in 128- and 256-megabit Synchronous Dynamic Random Access Memories (SDRAMs)," R. Koga, P. Yu, K.B. Crawford, S.H. Crain, and V.T. Tran, 2001 IEEE Radiation Effects Data Workshop.