Failure Prediction in Hardware Systems

Provided by: neilal

Learn more at: https://cseweb.ucsd.edu

1
Failure Prediction in Hardware Systems

Douglas Turnbull
Neil Alldrin
CSE 221 Operating Systems Final Project
Fall 2003

2
Background

Using sensors from a high-end server, can we
predict system board failures?

If we can predict failures, we can take
preventative action to avoid costly failures.

System Specifications
18 hot-swappable system boards
4 processors per board
18 sensors per board, measuring various temperatures and voltages

3
Sensor Logs

Each board has an associated sensor log.
About once a minute, the sensors are sampled and
the measurements are stored in the sensor log.
System board failures are also recorded in the
sensor log.

We need to extract a data set from these logs to
represent failure events (positive examples) and
normal operating conditions (negative examples).
We accomplish this using a Windowing
Abstraction.
4
Windowing Abstraction
Sensor Window: adjacent entries in the sensor log
that are used to predict failures.
Potential Failure Window: an example is labeled
positive if a failure occurs in the potential
failure window, and negative otherwise.
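The windowing step can be sketched in Python as follows. This is a minimal illustration under assumptions the slides do not state: the log is represented as hypothetical parallel lists `samples` (one row of sensor readings per minute) and `failed` (whether a failure was recorded that minute), and the potential failure window is assumed to begin immediately after the sensor window. The function name `make_examples` and its parameters are illustrative only.

```python
from typing import List, Sequence, Tuple


def make_examples(
    samples: Sequence[Sequence[float]],
    failed: Sequence[bool],
    sensor_window: int,
    failure_window: int,
) -> List[Tuple[List[List[float]], int]]:
    """Slide a sensor window over one board's log and label each position.

    Hypothetical log format: samples[t] holds the sensor readings taken at
    minute t, and failed[t] is True if a failure was recorded at minute t.
    Assumption: the potential failure window starts right after the sensor
    window. Label 1 (positive) if any failure falls inside it, else 0.
    """
    if len(samples) != len(failed):
        raise ValueError("samples and failed must have the same length")
    examples = []
    last_start = len(samples) - sensor_window - failure_window
    for start in range(last_start + 1):
        end = start + sensor_window  # exclusive end of the sensor window
        window = [list(row) for row in samples[start:end]]
        label = int(any(failed[end:end + failure_window]))
        examples.append((window, label))
    return examples
```

Each returned pair is one example: the raw readings in the sensor window and its positive or negative label.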
5
Feature Vectors
Feature vectors are created from the data in a
sensor window. There are two types of feature vectors:
Raw Feature Vectors: a vector of all the sensor
measurements in a sensor window.
Summary Feature Vectors: the mean, standard deviation,
range, and slope of each sensor in a sensor window.
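Both kinds of feature vector can be computed from a single sensor window using only the standard library. In this sketch the window is a list of rows (one per sample, one column per sensor). Using the population standard deviation and a least-squares slope against the sample index are assumptions, since the slides only name the statistics.

```python
import statistics
from typing import List, Sequence


def raw_features(window: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate every sensor measurement in the window, sample by sample."""
    return [value for row in window for value in row]


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against sample index 0, 1, 2, ...

    Assumption: "slope" means a linear least-squares fit; 0.0 for one sample.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = statistics.fmean(values)
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den if den else 0.0


def summary_features(window: Sequence[Sequence[float]]) -> List[float]:
    """Mean, standard deviation, range and slope of each sensor over the window."""
    features: List[float] = []
    for sensor in zip(*window):  # transpose: one tuple of readings per sensor
        features.append(statistics.fmean(sensor))
        features.append(statistics.pstdev(sensor))  # population std (assumption)
        features.append(max(sensor) - min(sensor))
        features.append(slope(sensor))
    return features
```

With 18 sensors, a summary vector always has 72 entries regardless of window length, while a raw vector grows with the window; that is one practical difference between the two representations.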
6
Classification
A classifier assigns labels (positive or
negative) to novel feature vectors after it has
been trained using a set of feature vectors with
known labels. Many classifiers can be used, such
as SVMs, Bayesian mixture models, and neural
networks. We use a Radial Basis Function (RBF)
network, a special form of neural network,
because it is computationally efficient.
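The slides do not say how the RBF network was trained, so the following is only one common way to build such a classifier, not the authors' implementation: Gaussian basis functions around fixed centers chosen by the caller (for example, a subset of the training vectors), plus a bias unit, with the linear output weights fit by ridge-regularized least squares. The class name `RBFNetwork` and the `sigma`, `ridge`, and `threshold` parameters are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class RBFNetwork:
    """Hypothetical Gaussian RBF classifier: fixed centers, ridge-fit output weights."""

    def __init__(self, centers: Sequence[Vector], sigma: float = 1.0, ridge: float = 1e-6):
        self.centers = [list(c) for c in centers]
        self.sigma = sigma
        self.ridge = ridge
        self.weights: List[float] = []

    def _hidden(self, x: Vector) -> List[float]:
        # One Gaussian activation per center, plus a constant bias unit.
        acts = [
            math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * self.sigma ** 2))
            for c in self.centers
        ]
        return acts + [1.0]

    def fit(self, xs: Sequence[Vector], ys: Sequence[int]) -> "RBFNetwork":
        phi = [self._hidden(x) for x in xs]
        k = len(phi[0])
        # Normal equations: (Phi^T Phi + ridge * I) w = Phi^T y.
        a = [[sum(p[i] * p[j] for p in phi) + (self.ridge if i == j else 0.0)
              for j in range(k)] for i in range(k)]
        b = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(k)]
        self.weights = _solve(a, b)
        return self

    def score(self, x: Vector) -> float:
        return sum(w * h for w, h in zip(self.weights, self._hidden(x)))

    def predict(self, x: Vector, threshold: float = 0.5) -> int:
        """1 = predicted failure (positive), 0 = predicted non-failure (negative)."""
        return int(self.score(x) >= threshold)
```

Training needs only one small linear solve (one row and column per center plus the bias), which is the sense in which RBF networks are cheap to fit. Lowering `threshold` flags more windows as failures, raising the true positive rate at the price of more false positives.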
7
Evaluating Predictions
We must consider two rates when evaluating our
prediction system.
True Positive Rate (tpr): a measure of our ability to correctly predict true failures.
$\mathrm{tpr} = \frac{\text{correctly predicted failures}}{\text{total number of true failures}}$
False Positive Rate (fpr): a measure of how often non-failures are mispredicted as failures.
$\mathrm{fpr} = \frac{\text{incorrectly predicted failures}}{\text{total number of non-failures}}$

                         Ground Truth
                         Failure             Non-failure
Prediction  Failure      True Positives      False Positives
            Non-failure  False Negatives     True Negatives
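Given predicted and true labels (1 = failure, 0 = non-failure), the four cells of the confusion table and the two rates can be computed directly. A minimal sketch; the function names are illustrative.

```python
from typing import Sequence, Tuple


def confusion_counts(predicted: Sequence[int], actual: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN), with 1 = failure and 0 = non-failure."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    return tp, fp, fn, tn


def rates(predicted: Sequence[int], actual: Sequence[int]) -> Tuple[float, float]:
    """True and false positive rates; raises ZeroDivisionError if a class is absent."""
    tp, fp, fn, tn = confusion_counts(predicted, actual)
    tpr = tp / (tp + fn)  # correctly predicted failures / all true failures
    fpr = fp / (fp + tn)  # incorrectly predicted failures / all non-failures
    return tpr, fpr
```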
8
Preliminary Results

Observations
1. Summary feature vectors have lower false positive
rates than raw feature vectors.
2. Window size does not seem to matter.
How can we improve these results?

9
Feature Subset Selection
We can further improve prediction accuracy (and
reduce computation) by reducing the number of
features used by our classifier. Features are
selected automatically using Forward Stepwise
Selection.
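Forward stepwise selection is a greedy search: start with no features, repeatedly add the single feature whose addition most improves a score, and stop when no remaining feature helps. The sketch below assumes a caller-supplied `score(subset)` function (hypothetical; here it might be, for instance, validation accuracy of the classifier trained on those feature columns) and an optional `max_features` cap.

```python
from typing import Callable, List, Optional, Sequence


def forward_stepwise_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    max_features: Optional[int] = None,
) -> List[int]:
    """Greedily add the feature that most improves score(selected).

    Stops when no remaining feature raises the score or when the cap is hit.
    Returns feature indices in the order they were added.
    """
    limit = n_features if max_features is None else min(max_features, n_features)
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < limit:
        candidates = [f for f in range(n_features) if f not in selected]
        # Score each one-feature extension of the current subset; keep the best.
        cand_score, cand = max((score(selected + [f]), f) for f in candidates)
        if cand_score <= best_score:
            break  # no remaining feature improves the score
        selected.append(cand)
        best_score = cand_score
    return selected
```

Each round trains and scores one model per remaining feature, so the search costs far fewer evaluations than trying every subset, at the price of possibly missing feature combinations that only help together.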
10
Results
11
Best Results
We find the best prediction results with summary
feature vectors using 2/3 of the summary features:
0.87 True Positive Rate (tpr)
0.10 False Positive Rate (fpr)
Our data set assumes that a failure is as likely as a
non-failure. Because failures are very rare in most
hardware systems, even a low false positive rate will
produce many false positives.
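The effect of rare failures follows from Bayes' rule. The sketch below computes the fraction of predicted failures that are real (precision) from the slide's tpr and fpr; the failure rate of 1 in 1,000 windows is a hypothetical figure chosen only for illustration.

```python
def precision(tpr: float, fpr: float, failure_rate: float) -> float:
    """Fraction of predicted failures that are real failures (Bayes' rule)."""
    true_alarms = tpr * failure_rate
    false_alarms = fpr * (1.0 - failure_rate)
    return true_alarms / (true_alarms + false_alarms)


# tpr and fpr are the slide's best results; the failure rates are assumptions.
balanced = precision(0.87, 0.10, 0.5)  # equal numbers of failures and non-failures
rare = precision(0.87, 0.10, 0.001)    # hypothetical: 1 failure per 1,000 windows
print(f"{balanced:.3f} {rare:.4f}")    # prints: 0.897 0.0086
```

At that assumed rate, about 0.87 true alarms would be raised per 1,000 windows against about 99.9 false ones, so fewer than 1% of alarms would be real failures.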
12
Future Work

Implement other classifiers: SVMs, Bayesian mixture models
Develop a larger data set with more examples of failures
Apply the framework to other hardware systems, such as personal computers
Modify the operating system to take advantage of failure prediction:
  Migrate processes to other system boards
  Run diagnostic tests
  Turn off suspect system boards
  Back up data

13
The End
Questions?
14
RBF Network
15
Value of a prediction system
The value of a prediction system can be
summarized as
$\text{Value} = (\text{benefit of predicted failure}) \cdot \mathrm{tpr} - (\text{cost of mispredicted failure}) \cdot \mathrm{fpr}$
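The value formula can be used to choose an operating point (for example, a classifier threshold) once the benefit and cost are known. In this sketch the benefit and cost values are hypothetical; the first (tpr, fpr) pair is the result from the Best Results slide, and the other two pairs are invented for illustration only.

```python
from typing import Sequence, Tuple


def value(tpr: float, fpr: float, benefit: float, cost: float) -> float:
    """Value = benefit * tpr - cost * fpr."""
    return benefit * tpr - cost * fpr


def best_operating_point(
    points: Sequence[Tuple[float, float]], benefit: float, cost: float
) -> Tuple[float, float]:
    """Return the (tpr, fpr) pair with the highest value."""
    return max(points, key=lambda p: value(p[0], p[1], benefit, cost))


# (0.87, 0.10) is the reported best result; the other two pairs are hypothetical.
points = [(0.87, 0.10), (0.95, 0.30), (0.60, 0.02)]
print(best_operating_point(points, benefit=10.0, cost=1.0))  # prints: (0.95, 0.3)
print(best_operating_point(points, benefit=1.0, cost=10.0))  # prints: (0.6, 0.02)
```

When a predicted failure is worth much more than a false alarm costs, the most sensitive operating point wins; when false alarms are expensive, the most conservative one does.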
