Title: Solid State Storage (SSS) System Error Recovery
1Solid State Storage (SSS) System Error Recovery
For NASA Langley Research Center
2Background
- NASA Langley Research Center is building a system to record streaming video and other data when the Space Shuttle docks with the Space Station.
- This data will be used to develop algorithms that will enable the next generation of the space station to perform autonomous docking.
- Due to the harsh environment in space, the data will be stored in a RAID array of solid state SATA drives with the capability of recovering data even if two drives fail.
- This Solid State Storage (SSS) system is being developed at VCU.
- We will look at the portion of the system that deals with drive error recovery.
3Proposed SSS system Overview
[Figure: proposed SSS system overview, including a connection labeled "To data recorder".]
4SSS Data Recovery
- The Solid State Storage (SSS) system will consist of six solid state data drives. The discussion will be directed to this specific configuration.
- The data will be sector striped across these six drives.
- A modified RAID 6 system capable of recovering data from two corrupted sectors in a stripe is proposed.
- Optimized for long single-thread transfers that are multiples of the entire stripe.
5RAID 5
- To illustrate the concepts and their implications, consider a RAID 5 implementation.
- RAID 5 uses a striped array with rotating parity.
- Optimized for short, multithreaded transfers.
- Capable of recovering from a single drive failure.
6RAID 5 system consisting of three data drives and
rotating parity. Four stripes for sectors A, B,
C, and D are shown.
7Rotating Parity
- Why rotating parity?
- The following steps are necessary to update a single data sector in a stripe:
- The old data sector and the parity sector for the stripe must be read.
- Compute the new parity using the new data sector, old data sector, and old parity.
- Write the new data sector and the new parity sector.
- Thus, to write to a data sector, both the data sector and the parity sector must be read and written.
- Since there are many data drives, a fixed parity drive would be accessed much more frequently than a data drive.
- This excessive access of a single parity drive is avoided by rotating parity across all drives.
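The read-modify-write sequence described on this slide can be sketched in Python as follows. This is only an illustration: the 4-byte "sectors", the three-drive stripe, and the function names are hypothetical and not taken from the SSS design.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length sectors."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """New parity after one data sector changes: P_new = P_old ^ D_old ^ D_new."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# Hypothetical stripe: three data drives with 4-byte "sectors".
stripe = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]
parity = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])

# Updating one sector costs two reads (old data, old parity) and two writes.
new_sector = bytes([0xAA, 0xBB, 0xCC, 0xDD])
parity = small_write_parity(stripe[1], parity, new_sector)
stripe[1] = new_sector
```

Every small write touches the parity sector, which is why a single fixed parity drive would become the busiest drive under this access pattern.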
8Rotating parity not needed in SSS
- The SSS is required to store long data streams, not random sectors.
- Make the size of these streams a multiple of the stripe size.
- An entire stripe with parity will be buffered.
- The entire stripe with parity will be simultaneously written to all drives.
- It is not necessary to first read the drives.
- The SSS will always read and write entire stripes.
- Easier to implement.
- Faster access.
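A minimal sketch of the full-stripe approach, assuming 512-byte sectors (the sector size is not stated in these slides) and using hypothetical function names. Parity is computed from the buffered stripe alone, so no drive has to be read before writing.

```python
from functools import reduce

SECTOR_SIZE = 512   # assumed sector size in bytes
DATA_DRIVES = 6     # the SSS configuration discussed here


def stripe_parity(sectors):
    """XOR all data sectors of one stripe together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))


def split_into_stripes(stream: bytes):
    """Yield (data_sectors, parity) for each full stripe in the stream."""
    stripe_bytes = SECTOR_SIZE * DATA_DRIVES
    if len(stream) % stripe_bytes != 0:
        raise ValueError("stream length must be a multiple of the stripe size")
    for start in range(0, len(stream), stripe_bytes):
        chunk = stream[start:start + stripe_bytes]
        sectors = [chunk[i:i + SECTOR_SIZE] for i in range(0, stripe_bytes, SECTOR_SIZE)]
        yield sectors, stripe_parity(sectors)
```

Each yielded pair holds everything needed for one simultaneous write: six data sectors and their parity sector.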
9Parity
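- Parity encoding is given by
$$P = D_0 \oplus D_1 \oplus D_2 \oplus D_3 \oplus D_4 \oplus D_5$$
- where $D_i$ represents a data byte in a sector on drive $i$.
- If both sides of the above equation are exclusive-ORed with $P$, then
$$0 = P \oplus D_0 \oplus D_1 \oplus D_2 \oplus D_3 \oplus D_4 \oplus D_5$$
- $D_5$, for example, can be recovered by
$$D_5 = P \oplus D_0 \oplus D_1 \oplus D_2 \oplus D_3 \oplus D_4$$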
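As a small illustration of recovery from parity, the sketch below rebuilds the sector of one drive that is known to be bad by XORing the parity with the five surviving data sectors. The 8-byte test sectors, the zero-based drive numbering, and the names `xor_all` and `recover_sector` are assumptions made for the example.

```python
from functools import reduce


def xor_all(sectors):
    """Byte-wise XOR of any number of equal-length sectors."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))


def recover_sector(parity: bytes, surviving):
    """Rebuild the single missing data sector from parity and the survivors."""
    return xor_all([parity] + list(surviving))


# Hypothetical stripe: six data drives with 8-byte "sectors".
data = [bytes([16 * i + j for j in range(8)]) for i in range(6)]
P = xor_all(data)

# Pretend the last drive is known to be bad and rebuild its sector.
recovered = recover_sector(P, data[:5])
```

The same call recovers any one drive, provided we already know which drive to leave out.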
10Parity problem
- Using parity, it is easy to recover data on a single drive if we know that drive is bad.
- We may have data corruption on a drive without the entire drive failing.
- Undetectable based on parity alone.
- Propose to include a 32-bit CRC in each sector.
- Simple to implement.
- Less than 1% overhead.
- In RAID 6, this will ensure that as long as a stripe has no more than two bad sectors, the data in that stripe can be recovered.
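One way the per-sector CRC could be used to locate bad sectors is sketched below. The slides do not specify the CRC polynomial or the sector layout, so this example assumes the CRC-32 provided by Python's `zlib` module and a 512-byte sector whose last 4 bytes hold the CRC (4/512 is about 0.8% overhead). The function names are hypothetical.

```python
import zlib

SECTOR_SIZE = 512              # assumed sector size in bytes
PAYLOAD_SIZE = SECTOR_SIZE - 4  # assumed layout: payload followed by a 4-byte CRC


def seal_sector(payload: bytes) -> bytes:
    """Append a 32-bit CRC to a payload to form one stored sector."""
    if len(payload) != PAYLOAD_SIZE:
        raise ValueError("payload must fill the sector exactly")
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def sector_is_good(sector: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the stored value."""
    payload, stored = sector[:-4], sector[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == stored


def bad_sector_indices(stripe):
    """Return the positions in a stripe whose sectors fail the CRC check."""
    return [i for i, sector in enumerate(stripe) if not sector_is_good(sector)]
```

Once the bad positions are known, they can be treated like failed drives when recovering the stripe.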
11Key Conclusions
- Write data as entire stripes.
- Use a fixed parity drive.
- Include sector CRC.
12RAID 6 (modified)
- Use two fixed parity drives (P and Q).
- Data can be recovered if two sectors in a stripe are corrupted.
- P parity is the same as RAID 5 (simple XOR).
- Easy to encode and easy to recover data.
- Q parity is more complicated.
13Q parity encoding
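The Q parity is a Reed-Solomon code given by
$$Q = (g_0 \otimes D_0) \oplus (g_1 \otimes D_1) \oplus \dots \oplus (g_{n-1} \otimes D_{n-1})$$
where $\otimes$ is Galois Field (GF) multiplication, $g_i$ is a constant, and $n$ is the number of data drives. For $i < 8$ it turns out that $g_i = 2^i$. For larger $i$ it is not as simple; for example, $g_8 = 29$. But for the SSS application, with six data drives, $Q$ simplifies to
$$Q = D_0 \oplus (2 \otimes D_1) \oplus (2^2 \otimes D_2) \oplus (2^3 \otimes D_3) \oplus (2^4 \otimes D_4) \oplus (2^5 \otimes D_5)$$
The problem is how to compute the GF multiplication.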
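The sketch below computes Q for one byte position across six data drives. It uses a bit-serial shift-and-add multiplier rather than lookup tables, and it assumes the primitive polynomial $x^8 + x^4 + x^3 + x^2 + 1$ (0x11D), which is consistent with $g_8 = 29$ quoted above. The function names are hypothetical.

```python
PRIMITIVE_POLY = 0x11D  # assumed: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add with polynomial reduction."""
    product = 0
    while b:
        if b & 1:
            product ^= a          # GF addition is XOR
        a <<= 1                   # multiply a by 2 ...
        if a & 0x100:
            a ^= PRIMITIVE_POLY   # ... and reduce when it overflows 8 bits
        b >>= 1
    return product


def q_parity(data_bytes) -> int:
    """Q for one byte position: XOR of (g_i (x) D_i), with g_0 = 1 and g_{i+1} = 2 (x) g_i."""
    q = 0
    g = 1
    for d in data_bytes:
        q ^= gf_mul(g, d)
        g = gf_mul(g, 2)
    return q
```

For example, `q_parity([0, 0, 0, 0, 0, 1])` returns 32, the constant that multiplies the last of the six data bytes.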
14GF multiplication
- In ordinary arithmetic, multiplication can be accomplished by summing the logs and taking the inverse log.
- GF multiplication is typically accomplished using lookup tables to find the GF log and inverse log. The addition is modulo 255.
- See Xilinx application note XAPP731 Hardware Accelerator for RAID 6 Parity Generation / Data Recovery Controller.
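A minimal sketch of the table-based method, with the tables generated in software rather than copied from XAPP731. It assumes the primitive polynomial 0x11D and base 2 for the logs; the table and function names are hypothetical.

```python
PRIMITIVE_POLY = 0x11D  # assumed: x^8 + x^4 + x^3 + x^2 + 1

# Build the tables by repeatedly multiplying by 2 in GF(2^8).
GF_EXP = [0] * 255   # inverse log: GF_EXP[k] is 2 raised to the power k
GF_LOG = [0] * 256   # GF_LOG[0] is never used, because zero has no log
value = 1
for k in range(255):
    GF_EXP[k] = value
    GF_LOG[value] = k
    value <<= 1
    if value & 0x100:
        value ^= PRIMITIVE_POLY


def gf_mul_log(a: int, b: int) -> int:
    """GF(2^8) multiply: add the logs modulo 255, then take the inverse log."""
    if a == 0 or b == 0:
        return 0   # zero must be handled separately
    return GF_EXP[(GF_LOG[a] + GF_LOG[b]) % 255]
```

Two table reads, one modulo-255 addition, and one more table read replace the bit-serial multiplication.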
17Examples
18Examples
Note: $A \otimes B = 0$ if $A = 0$ or $B = 0$. This is a special case and cannot be computed using logs. It is also worth noting that $A \otimes 1 = A$. This does follow from using logs since $\log_{GF}(\text{0x01}) = 0$.
19Elaboration on Galois Field Mathematics
- Évariste Galois (1832)
- Established many of the ideas of group theory.
- Left only sixty pages of mathematical writings.
- Mortally wounded in a duel at age 20.
- Most of his major contributions stem from a letter written the night before the duel.
- His work has had great impact.
- Provides a powerful tool for investigating fundamental mathematical problems.
- Roots of algebraic equations.
- GF theory provides a simple proof that an angle cannot be trisected using only a compass and unmarked straightedge.
- This had baffled mathematicians since the time of Euclid.
- Recently applied to computer design and data-communication systems.
20Galois Field Mathematics
- A Galois Field is an algebraic structure $\langle G, \oplus, \otimes \rangle$ where $G$ is a set consisting of $2^n$ elements, $\oplus$ is addition mod 2 (bitwise XOR), and $\otimes$ is GF multiplication. The math is similar to ordinary arithmetic.
- $\oplus$ and $\otimes$ are commutative and associative.
- Distributive such that $A \otimes (B \oplus C) = (A \otimes B) \oplus (A \otimes C)$.
- We are only concerned with $GF(2^8)$, where the set $G$ has 256 elements. We will use a hex byte to specify the elements.
- Then $A \oplus A = \text{0x00}$, $A \otimes \text{0x00} = \text{0x00}$, and $A \otimes \text{0x01} = A$.
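The properties listed on this slide can be spot-checked numerically. The sketch below assumes the primitive polynomial 0x11D for the multiplication and tests the identities over a sample of byte values; `check_field_properties` is a hypothetical helper, not part of the SSS design.

```python
PRIMITIVE_POLY = 0x11D  # assumed: x^8 + x^4 + x^3 + x^2 + 1


def gf_add(a: int, b: int) -> int:
    """GF(2^8) addition is bitwise XOR."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """GF(2^8) multiplication by shift-and-add with polynomial reduction."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIMITIVE_POLY
        b >>= 1
    return product


def check_field_properties(samples) -> bool:
    """Return True if the slide's identities hold for every combination of samples."""
    samples = list(samples)
    for a in samples:
        if gf_add(a, a) != 0x00 or gf_mul(a, 0x00) != 0x00 or gf_mul(a, 0x01) != a:
            return False
        for b in samples:
            if gf_mul(a, b) != gf_mul(b, a):          # commutative
                return False
            for c in samples:                          # distributive
                if gf_mul(a, gf_add(b, c)) != gf_add(gf_mul(a, b), gf_mul(a, c)):
                    return False
    return True
```

Calling `check_field_properties(range(0, 256, 5))` exercises about 140,000 triples and runs in well under a minute on an ordinary machine.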
21GF(2^8)
- The GF log lookup tables are generated based on what in GF theory is called a primitive polynomial. Primitive polynomials have certain properties that lead to the error correction techniques.
- $GF(2^8)$ is generated using the primitive polynomial $x^8 + x^4 + x^3 + x^2 + 1$.
- This is the same primitive polynomial used to determine the feedback path for an 8-bit maximum-count linear feedback shift register (LFBSR).
- The LFBSR can be used to perform GF multiplication.
22The 8 bit LFBSR
[Figure: 8-bit LFBSR with stages Q0 through Q7.]
Or, reversing the order so that the most significant bit is at the left.
A shift has the same effect as $\otimes\, 2$. In VHDL:
Q <= Q(6) & Q(5) & Q(4) & (Q(3) XOR Q(7)) & (Q(2) XOR Q(7)) & (Q(1) XOR Q(7)) & Q(0) & Q(7);
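To illustrate the claim that one shift equals multiplication by 2, the following Python sketch models the shift register bit by bit, following the VHDL assignment on this slide, and compares it with multiply-by-2 reduced by the assumed polynomial 0x11D. The names `lfsr_shift` and `times_two` are hypothetical.

```python
def lfsr_shift(q: int) -> int:
    """One shift of the 8-bit LFBSR, following the VHDL bit assignments."""
    b = [(q >> i) & 1 for i in range(8)]   # b[i] models Q(i)
    new_bits = [
        b[7],          # new Q(0) = Q(7)
        b[0],          # new Q(1) = Q(0)
        b[1] ^ b[7],   # new Q(2) = Q(1) XOR Q(7)
        b[2] ^ b[7],   # new Q(3) = Q(2) XOR Q(7)
        b[3] ^ b[7],   # new Q(4) = Q(3) XOR Q(7)
        b[4],          # new Q(5) = Q(4)
        b[5],          # new Q(6) = Q(5)
        b[6],          # new Q(7) = Q(6)
    ]
    return sum(bit << i for i, bit in enumerate(new_bits))


def times_two(q: int) -> int:
    """Multiply by 2 in GF(2^8), reducing by the assumed polynomial 0x11D."""
    q <<= 1
    return (q ^ 0x11D) if q & 0x100 else q


def lfsr_period(seed: int = 1) -> int:
    """Count the shifts needed for the register to return to its seed."""
    state, steps = lfsr_shift(seed), 1
    while state != seed:
        state, steps = lfsr_shift(state), steps + 1
    return steps
```

Starting from 0x01, the register visits every nonzero byte before repeating, which is what "maximum count" refers to.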
25Galois Field Division