System Models for Problem Determination

1
System Models for Problem Determination

Michael Jiang, Mohammad A. Munawar, Kevin Quan,
Paul A.S. Ward

Department of Electrical and Computer
Engineering University of Waterloo
April 26, 2006
2
Outline

Introduction
Context
Background
Related Work
The Problem
Challenges
Contributions
Prototype Adaptive Monitoring Tool
What Comes Next?
Summary

3
Introduction

Enterprise information systems have high reliability requirements
These systems are getting larger and more complex
Defects cannot be completely eliminated; failures can be very costly
Managing these systems is hard
Componentization and availability of information
More human resources needed, but the ones with required abilities are in short supply
Many duties of systems administrators depend on systems monitoring
State of the art in systems monitoring: manual; tools help
Slow response: impact on availability
Error-prone
Monitor everything all the time?
Impractical and unnecessary
Overhead involved: measurement, storage, communication, computation

4
Introduction

Intelligent Monitoring
Automatically adapt monitoring to match prevailing conditions
Motivation
Reduce human involvement in monitoring
Collect only that which is needed
Reduce impact on performance and other overheads
Importance
Software systems will inevitably get larger and more complex; self-managed systems require intelligent monitoring
Benefits
Human resources are freed for more important, more complex tasks
Less pertinent information lost
System can perform as close as possible to its unmonitored version

5
Background

Component-based software systems
Made of re-usable/pluggable parts; well-defined boundaries; for our purposes, components' internal implementation not known
Examples: COM/DCOM, CORBA, J2EE, .NET

6
Background

Software systems based on Java 2 Platform,
Enterprise Edition (J2EE)

7
Background

Monitoring a J2EE-based system

8
Related Work

Monitoring large-scale systems
Academic research: NetLogger, Astrolabe, Ganglia
Goals: efficiency, scalability, robustness, flexibility
Techniques used: binary formats, gossip, multicast, mobile code
Solutions from industry: Tivoli, OpenView, eBay SuperCall
Intelligent summaries and visual support, end-to-end monitoring, storage and analysis of data, threshold-based triggers
Modeling
Black-box: know nothing about internals
e.g., time-series modeling, statistical learning, machine learning
Non-black-box: know something about internals
Application emulators, queuing models, Petri nets
black-box

9
Related Work

Applying modeling for problem determination in enterprise systems
Using access logs
Based on page hit counts and errors
Learn and use normal access patterns with chi-square tests and naïve Bayes models
Using traces (request paths)
Model application execution flow using PCFGs
Model component interactions and test using the chi-square test
Diagnosis using clustering and decision trees
Using aggregate metrics
Use application structure, with averages under normal conditions as anomaly thresholds
Correlating low-level metrics with SLO violations using Bayesian networks
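
As an illustration of the access-log approach above, the following is a minimal sketch of a chi-square goodness-of-fit check on page hit counts. The page names, the "normal" hit shares, and the sample windows are invented for the example; they do not come from the cited work. Only the critical value (7.815 for 3 degrees of freedom at the 5% level) is a standard table entry.

```python
# Hypothetical "normal" share of hits per page, as might be learned from past logs.
NORMAL_SHARE = {"/home": 0.50, "/search": 0.30, "/cart": 0.15, "/checkout": 0.05}

# Chi-square critical value for 3 degrees of freedom (4 pages - 1) at the 5% level.
CRITICAL_DF3_P05 = 7.815


def chi_square_statistic(observed, expected_share):
    """Pearson chi-square statistic of observed hit counts against expected shares."""
    total = sum(observed.values())
    stat = 0.0
    for page, share in expected_share.items():
        expected = share * total
        stat += (observed.get(page, 0) - expected) ** 2 / expected
    return stat


def is_anomalous(observed):
    """Flag a window whose hit distribution deviates from the normal pattern."""
    return chi_square_statistic(observed, NORMAL_SHARE) > CRITICAL_DF3_P05


# Two made-up observation windows of 1000 hits each.
normal_window = {"/home": 510, "/search": 290, "/cart": 152, "/checkout": 48}
faulty_window = {"/home": 700, "/search": 280, "/cart": 15, "/checkout": 5}

print(is_anomalous(normal_window))  # False
print(is_anomalous(faulty_window))  # True
```

In the second window, hits on the cart and checkout pages collapse, which is the kind of shift in access pattern such a test is meant to surface.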

10
Related Work

Adaptive monitoring
Extensible OS
Database table statistics
JFluid: dynamic instrumentation of call graphs in Java programs
Moss: adaptive performance monitoring
Measurement overhead ignored
No attempt at characterizing normal behaviour; need to set thresholds
Active probing
Incrementally find the smallest set of probes (tests) that can determine the system's state

11
Related Work
[Figure: related work arranged along two axes, "What is monitored?" and "How is it monitored?", on a scale from low-level to high-level. Labels in the figure, in extraction order: Bayesian models and information theory; Diagnosis; Decision trees; Data clustering; Visualization tools; Manual diagnosis; Monitoring; Bayesian models (naïve Bayes); overhead; Statistical learning (e.g., chi-square); Anomaly Detection; Mean and variance; Manually-set thresholds.]
12
The Problem

Need to adapt monitoring
Need a framework to do so
How do we model the system for a generic framework?
What is a good model for problem determination?

13
Challenges

Characterizing normal behaviour
What?
Monitoring information
System itself (changes)
When?
Initially
Incremental updating
How?
What techniques to use
Leverage work already done
Seek new applications of available techniques
Emergent behaviour
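
One way to read the "Initially" and "Incremental updating" items above is as a running baseline that is refined with every new sample. The sketch below uses Welford's online algorithm for mean and variance with a k-sigma rule; this is a swapped-in, generic technique chosen for illustration and is not claimed to be the authors' method. The class name, the threshold k = 3, and the sample response times are assumptions.

```python
import math


class RunningBaseline:
    """Incrementally tracked mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new observation into the baseline."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        """Sample standard deviation; 0.0 until two samples are seen."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_anomalous(self, x, k=3.0):
        """True when x lies more than k standard deviations from the mean."""
        if self.n < 2 or self.std == 0.0:
            return False
        return abs(x - self.mean) > k * self.std


# Made-up response times (ms) observed under normal conditions.
baseline = RunningBaseline()
for rt in [100, 102, 98, 101, 99]:
    baseline.update(rt)

print(baseline.is_anomalous(101))  # False
print(baseline.is_anomalous(150))  # True
```

Because the baseline is updated sample by sample, no history needs to be stored, which keeps the storage overhead of characterizing normal behaviour small.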

14
Challenges

Minimum Monitoring
Most useful, lowest-cost information
Adapting Monitoring
What warrants collection of more data?
Detecting anomalies in the collected data
Fusing anomaly-related information
When should we stop collecting more data?

15
Challenges

Basis for selecting information sources
Information gain
Reference characterization
Overhead involved
Prior knowledge
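
The criteria above can be combined in many ways. As a rough sketch, the code below ranks candidate information sources by information gain per unit of overhead and picks greedily within an overhead budget. The source names, the gain and overhead numbers, and the budget are all hypothetical, and the greedy ratio rule is an illustrative choice rather than the selection scheme of the framework.

```python
def select_sources(candidates, budget):
    """Greedily pick sources with the best gain/overhead ratio within a budget.

    candidates: list of dicts with "name", "gain", and "overhead" keys.
    budget: maximum total overhead that monitoring may add.
    """
    ranked = sorted(candidates, key=lambda c: c["gain"] / c["overhead"], reverse=True)
    chosen, spent = [], 0.0
    for c in ranked:
        if spent + c["overhead"] <= budget:
            chosen.append(c["name"])
            spent += c["overhead"]
    return chosen


# Hypothetical sources; gain and overhead are in arbitrary units.
candidates = [
    {"name": "servlet_response_time", "gain": 0.9, "overhead": 1.0},
    {"name": "ejb_method_timing", "gain": 0.8, "overhead": 4.0},
    {"name": "jdbc_pool_usage", "gain": 0.5, "overhead": 1.0},
    {"name": "full_request_trace", "gain": 1.0, "overhead": 10.0},
]

print(select_sources(candidates, budget=3.0))
# ['servlet_response_time', 'jdbc_pool_usage']
```

Prior knowledge and reference characterization could enter such a scheme by adjusting the gain estimates before ranking.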

16
Challenges

Algorithms for adaptation: what drives adaptation of monitoring?
Dependency structure: top-down approach
Explicit
Implicit (Inferred)
No dependency structure
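
A top-down approach over an explicit dependency structure might look like the sketch below: start at a user-facing component and switch on monitoring for the dependencies of any component that looks anomalous. The component names, the dependency graph, and the set of anomalous components are invented for illustration.

```python
# Hypothetical explicit dependency structure: component -> components it depends on.
DEPENDS_ON = {
    "servlet": ["session_ejb"],
    "session_ejb": ["entity_ejb", "jms_queue"],
    "entity_ejb": ["jdbc_pool"],
    "jms_queue": [],
    "jdbc_pool": ["database"],
    "database": [],
}


def drill_down(root, is_anomalous):
    """Return the components monitored, in the order monitoring is enabled.

    Descends from root, following dependencies only below anomalous components.
    """
    visited, enabled = set(), []
    stack = [root]
    while stack:
        comp = stack.pop()
        if comp in visited:
            continue
        visited.add(comp)
        enabled.append(comp)  # monitoring is switched on for this component
        if is_anomalous(comp):
            # Push in reverse so dependencies are explored in listed order.
            stack.extend(reversed(DEPENDS_ON.get(comp, [])))
    return enabled


# Assumed anomaly verdicts, e.g. produced by a per-component detector.
anomalous = {"servlet", "session_ejb", "entity_ejb"}
print(drill_down("servlet", lambda c: c in anomalous))
# ['servlet', 'session_ejb', 'entity_ejb', 'jdbc_pool', 'jms_queue']
```

The database is never instrumented in this run because the JDBC pool above it looks healthy, so the overhead of monitoring it is avoided.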

17
Contributions

Design and implementation of an adaptive monitoring framework
Study of how to perform adaptation of monitoring
Basis, algorithms, system modeling and anomaly detection
Research new ways of applying existing modeling techniques
Validation by applying it to a J2EE-based software system
Demonstration of the generality of the framework
What's new?
Impact-aware monitoring
Integrated monitoring
More comprehensive: performance is only one, albeit important, aspect
Most previous work has considered monitoring using a fixed set of information sources

18
Prototype

Applied the framework to a J2EE-based testbed
Testbed: IBM WebSphere App. Server, DB2 UDB, custom workload generators
Use benchmarking J2EE applications such as TPC-W, SPECjAppServer2004, Trade, RUBiS
Use synthetic workload
Simulate anomalies
Software defects
Delay in servlets and EJBs
Exceptions in same
External faults
Drop connections/data between WAS and DB2
CPU hogs

19
Prototype

Correlation-based approach
Modeling single variables is difficult
Drift
No black-box connection to problems
Why not consider pairs?
Takes out some non-linearity
Allows black-box problem determination
Initially
Collect all metrics
Find correlated pairs from known subsystem-level relationships
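
The initial step above can be sketched as follows: given collected metric series and candidate pairs drawn from known subsystem relationships, keep the pairs whose Pearson correlation is strong. The metric names, the sample data, and the 0.9 threshold are assumptions made for the example, and Pearson correlation is used here as one plausible measure rather than as the prototype's confirmed choice.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def correlated_pairs(metrics, candidate_pairs, threshold=0.9):
    """Keep candidate pairs whose absolute correlation meets the threshold."""
    kept = []
    for a, b in candidate_pairs:
        if abs(pearson(metrics[a], metrics[b])) >= threshold:
            kept.append((a, b))
    return kept


# Made-up samples collected while all metrics are monitored.
metrics = {
    "servlet_requests": [10, 20, 30, 40, 50],
    "jdbc_queries": [21, 39, 61, 80, 99],
    "heap_used": [5, 3, 6, 2, 4],
}
# Hypothetical candidate pairs taken from known subsystem relationships.
candidates = [("servlet_requests", "jdbc_queries"), ("servlet_requests", "heap_used")]

print(correlated_pairs(metrics, candidates))
# [('servlet_requests', 'jdbc_queries')]
```

A pair that is strongly correlated under normal operation can later serve as a check: if the relationship stops holding, one of the two metrics, or the component behind it, becomes a suspect.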

20
Prototype

Example of a correlated metric pair

21
Prototype

Operation
Monitor a minimal vector of metrics
Response time & errors in user-accessible servlets
When an anomaly is detected, increase monitoring level so as to have pairs to analyze
Stop when the source of the problem is found or the monitoring level reaches its maximum; otherwise, increase the number of pairs.
Problems
Error propagation
Determination of thresholds
Knowledge decay
Anomaly corroboration
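
The operation described on this slide can be summarized as a loop over monitoring levels. The sketch below assumes that each level adds a set of metric pairs and that a helper reports the first pair whose learned relationship no longer holds; the function names, the level layout, and the broken pair are hypothetical.

```python
def adaptive_monitor(anomaly_detected, levels, find_source):
    """Raise the monitoring level until a problem source is found.

    anomaly_detected: verdict from the minimal metric vector (level 0).
    levels: list of lists of metric pairs; level k enables levels[0..k-1].
    find_source: callable taking the enabled pairs, returning a suspect or None.
    Returns (level_reached, suspect).
    """
    if not anomaly_detected:
        return 0, None  # stay at minimal monitoring
    for level in range(1, len(levels) + 1):
        enabled = [pair for lvl in levels[:level] for pair in lvl]
        suspect = find_source(enabled)
        if suspect is not None:
            return level, suspect  # source of the problem found
    return len(levels), None  # maximum monitoring level reached


# Hypothetical pairs added at each monitoring level.
levels = [
    [("servlet_rt", "ejb_calls")],
    [("ejb_calls", "jdbc_queries")],
    [("jdbc_queries", "db_cpu")],
]
# Assumed: the pair whose normal correlation is currently violated.
broken = {("ejb_calls", "jdbc_queries")}


def first_broken_pair(pairs):
    """Return the first enabled pair that is in the broken set, else None."""
    return next((p for p in pairs if p in broken), None)


print(adaptive_monitor(True, levels, first_broken_pair))
# (2, ('ejb_calls', 'jdbc_queries'))
```

The listed problems show up directly in such a loop: a badly chosen threshold or a stale model inside `find_source` would either stop the escalation too early or push monitoring to its maximum level without need.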

22
Prototype