Failure Prediction in Hardware Systems

Slides: 17
Provided by: neilal
Learn more at: https://cseweb.ucsd.edu
Transcript and Presenter's Notes

1
Failure Prediction in Hardware Systems
  • Douglas Turnbull
  • Neil Alldrin
  • CSE 221 Operating System Final Project
  • Fall 2003

2
Background
  • Using sensor data from a high-end server, can we
    predict system board failures?
  • If we can predict a failure, we can take
    preventative action to avoid costly failures.
  • System Specifications
    • 18 hot-swappable system boards
    • 4 processors per board
    • 18 sensors per board, measuring various
      temperatures and voltages

3
Sensor Logs
  • Each board has an associated Sensor Log
  • About every minute, the sensors are sampled and
    the measurements are stored in the sensor log.
  • System board failures are also recorded in the
    sensor log.

We need to extract a data set from these logs to
represent failure events (positive examples) and
normal operating conditions (negative examples).
We accomplish this using a Windowing
Abstraction.
4
Windowing Abstraction
Sensor Window: Adjacent entries in the sensor
log that are used to predict failures.
Potential Failure Window: An example is labeled
positive or negative depending on whether a
failure occurs in the potential failure window.
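
The windowing abstraction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the log is assumed to be a time-ordered list of (readings, failed) pairs, the potential failure window is assumed to start immediately after the sensor window, and windows are assumed to slide one entry at a time. The names `make_examples`, `sensor_len`, and `failure_len` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed log entry shape: (sensor readings, was a failure recorded here?)
Entry = Tuple[Sequence[float], bool]


def make_examples(log: List[Entry], sensor_len: int, failure_len: int):
    """Slide a sensor window over the log and label each window.

    A window is labeled positive (True) if any entry in the potential
    failure window that follows it records a failure.
    """
    examples = []
    last_start = len(log) - sensor_len - failure_len
    for start in range(last_start + 1):
        split = start + sensor_len
        sensor_window = [list(readings) for readings, _ in log[start:split]]
        failure_window = log[split:split + failure_len]
        label = any(failed for _, failed in failure_window)
        examples.append((sensor_window, label))
    return examples


# Toy log with one sensor per entry and a failure at index 4.
toy_log = [([1.0], False), ([2.0], False), ([3.0], False),
           ([4.0], False), ([5.0], True), ([6.0], False)]
examples = make_examples(toy_log, sensor_len=2, failure_len=2)
labels = [label for _, label in examples]
```

In the toy log, only the first window has no failure in its potential failure window, so it is the only negative example.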
5
Feature Vectors
Feature Vectors are created from the data in a
sensor window. There are two types of feature
vectors:
  • Raw Feature Vectors: a vector of all the
    sensor measurements in a sensor window.
  • Summary Feature Vectors: the mean, standard
    deviation, range and slope for each of the
    sensors in a sensor window.
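
As a rough sketch of the two feature vector types, the Python below builds both from one sensor window. Several details are assumptions, since the slides do not give them: the window is a list of rows (one row per sample, one column per sensor), the standard deviation is the population form, and the slope is a least-squares fit against the sample index. The function names are hypothetical.

```python
from statistics import mean, pstdev


def raw_features(window):
    """Concatenate every measurement in the window into one flat vector."""
    return [value for row in window for value in row]


def slope(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    denom = sum((x - x_mean) ** 2 for x in range(n))
    if denom == 0:  # a single sample has no trend
        return 0.0
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    return num / denom


def summary_features(window):
    """Mean, standard deviation, range and slope for each sensor."""
    features = []
    for series in zip(*window):  # one time series per sensor
        features += [mean(series), pstdev(series),
                     max(series) - min(series), slope(series)]
    return features


# Three samples from two sensors: one rising, one constant.
window = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
raw = raw_features(window)          # 6 values
summary = summary_features(window)  # 4 values per sensor, 8 in total
```

Note that the raw vector grows with the window size, while the summary vector always has four entries per sensor.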
6
Classification
A classifier assigns labels (positive or
negative) to novel feature vectors after it has
been trained using a set of feature vectors with
known labels. Many classifiers can be used, such
as SVMs, Bayesian mixture models, and neural
networks. We use a Radial Basis Function (RBF)
network, a special form of a neural network,
because it is computationally efficient.
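
To make the idea of an RBF network concrete, here is a minimal sketch, not the network used in the project. It assumes Gaussian basis functions with a shared width `gamma`, centers chosen by the caller, and output weights fitted with a simple delta-rule update; the slides do not say how the authors chose centers or trained the weights, and all names are hypothetical.

```python
import math


def rbf_activations(x, centers, gamma):
    """Gaussian response of each hidden unit to the input x."""
    return [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, c)))
            for c in centers]


def train_rbf(X, y, centers, gamma, lr=0.5, epochs=200):
    """Fit the linear output layer with the delta rule (assumed trainer)."""
    weights = [0.0] * len(centers)
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            phi = rbf_activations(x, centers, gamma)
            output = bias + sum(w * p for w, p in zip(weights, phi))
            error = target - output
            weights = [w + lr * error * p for w, p in zip(weights, phi)]
            bias += lr * error
    return weights, bias


def predict(x, centers, gamma, weights, bias):
    """Label x positive when the network output exceeds 0.5."""
    phi = rbf_activations(x, centers, gamma)
    return bias + sum(w * p for w, p in zip(weights, phi)) > 0.5


# Toy problem (XOR) that a purely linear classifier cannot separate.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0.0, 1.0, 1.0, 0.0]
centers = X  # assumption: one hidden unit per training point
weights, bias = train_rbf(X, y, centers, gamma=4.0)
predictions = [predict(x, centers, 4.0, weights, bias) for x in X]
```

Only the output layer is trained here, which is one reason RBF networks are cheap to fit compared with networks trained end to end.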
7
Evaluating Predictions
We must consider two rates when evaluating our
prediction system.

True Positive Rate (tpr): A measure of our
ability to correctly predict true failures.
tpr = Correctly Predicted Failures /
      Total Number of True Failures

False Positive Rate (fpr): A measure of the
number of mispredictions.
fpr = Incorrectly Predicted Failures /
      Total Number of Non-Failures

                           Ground Truth
                           Failure           Non-failure
Prediction   Failure       True Positives    False Positives
             Non-failure   False Negatives   True Negatives
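
The two rates follow directly from the four confusion-matrix counts. The sketch below assumes predictions and ground truth are given as parallel lists of booleans (True meaning failure); the function name and the toy data are hypothetical.

```python
def prediction_rates(predicted, actual):
    """Return (tpr, fpr) from parallel lists of boolean labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one failure and one non-failure")
    tpr = tp / (tp + fn)  # correctly predicted failures / true failures
    fpr = fp / (fp + tn)  # incorrectly predicted failures / non-failures
    return tpr, fpr


# Toy data: 4 true failures followed by 4 non-failures.
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
tpr, fpr = prediction_rates(predicted, actual)
```

Here three of the four failures are caught and one of the four non-failures is flagged, giving tpr = 0.75 and fpr = 0.25.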
8
Preliminary Results
  • Observations
    1. Summary feature vectors have lower false
       positive rates than raw feature vectors.
    2. Window size does not seem to matter.
  • How can we improve these results?

9
Feature Subset Selection
We can further improve prediction accuracy (and
reduce computation) by reducing the number of
features used by our classifier. Features are
selected automatically using Forward Stepwise
Selection.
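
Forward stepwise selection is a greedy search: start with no features, repeatedly add the single feature that improves a score the most, and stop when nothing helps. The sketch below assumes a caller-supplied `score` function (for example, validation accuracy of a classifier trained on the chosen subset); that function, the stopping rule, and the toy scores are hypothetical stand-ins, since the slides do not describe the authors' criterion.

```python
def forward_stepwise_selection(n_features, score, max_features=None):
    """Greedily grow a feature subset while the score keeps improving."""
    selected = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    limit = n_features if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score every subset formed by adding one more feature.
        candidates = {f: score(selected + [f]) for f in sorted(remaining)}
        best_feature = max(candidates, key=candidates.get)
        if candidates[best_feature] <= best_score:
            break  # no remaining feature improves the score
        selected.append(best_feature)
        remaining.remove(best_feature)
        best_score = candidates[best_feature]
    return selected, best_score


# Toy score: features 1 and 3 help, every other feature hurts slightly.
useful = {1: 0.3, 3: 0.2}


def toy_score(subset):
    return sum(useful.get(f, -0.05) for f in subset)


selected, best = forward_stepwise_selection(5, toy_score)
```

With the toy score, the search picks feature 1, then feature 3, and stops because every further addition lowers the score.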
10
Results
11
Best Results
We find the best prediction results with Summary
Feature Vectors using 2/3 of the summary
features:
  • 0.87 True Positive Rate (tpr)
  • 0.10 False Positive Rate (fpr)
Our data set assumes that we are equally likely
to find a failure as a non-failure. Because
failures are rare in most hardware systems, even
a low false positive rate will produce many false
positives.
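
The effect of rare failures can be made concrete with Bayes' rule: the fraction of failure predictions that are real depends on how common failures are. The sketch below uses the reported tpr and fpr; the 0.001 failure rate is a made-up illustration, not a figure from the project.

```python
def fraction_of_alarms_that_are_real(tpr, fpr, failure_rate):
    """P(failure | predicted failure) by Bayes' rule."""
    true_alarms = tpr * failure_rate
    false_alarms = fpr * (1.0 - failure_rate)
    return true_alarms / (true_alarms + false_alarms)


TPR, FPR = 0.87, 0.10  # best results reported on the slide

# Balanced data set, as assumed when the results were measured.
balanced = fraction_of_alarms_that_are_real(TPR, FPR, 0.5)

# Hypothetical deployment where 1 window in 1000 precedes a failure.
rare = fraction_of_alarms_that_are_real(TPR, FPR, 0.001)
```

On balanced data roughly nine in ten predicted failures are real, but at the hypothetical 1-in-1000 rate fewer than one in a hundred are.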
12
Future Work
  • Implement other classifiers: SVMs, Bayesian
    Mixture Models
  • Develop a larger data set with more examples of
    failures
  • Apply the framework to other hardware systems,
    such as personal computers
  • Modify the operating system to take advantage
    of failure prediction
    • Migrate processes to other system boards
    • Run diagnostic tests
    • Turn off suspect system boards
    • Back up data

13
The End
Questions?
14
RBF Network
15
Value of a prediction system
The value of a prediction system can be
summarized as:
Value = (benefit of predicted failure) × tpr
        - (cost of mispredicted failure) × fpr
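
A small worked example, reading the slide's formula as benefit times tpr minus cost times fpr (the operators are missing from the transcript, so this reading is an assumption). The benefit and cost figures are invented purely for illustration.

```python
def prediction_value(benefit, cost, tpr, fpr):
    """Benefit earned on predicted failures minus cost of mispredictions."""
    return benefit * tpr - cost * fpr


# Hypothetical units: a caught failure saves 100, a false alarm costs 20.
value = prediction_value(benefit=100.0, cost=20.0, tpr=0.87, fpr=0.10)
```

Under these made-up numbers the value is positive; a system whose false alarms are expensive relative to the benefit of a caught failure would need a much lower fpr to be worthwhile.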
16
Template