Title: Causal Models, Learning Algorithms and their Application to Performance Modeling
1. Causal Models, Learning Algorithms and their Application to Performance Modeling
- Jan Lemeire
- Parallel Systems lab
- November 15th 2006
2. Overview
- I. Causal Models
- II. Learning Algorithms
- III. Performance Modeling
- IV. Extensions
3. I. Multivariate Analysis
Probabilistic model of joint distribution? Relational information? A priori unknown relations
4. A. Representation of distributions
- Factorization
- Reduction of factorization complexity
- Bayesian Network
[Figure: Bayesian networks for Ordering 1 and Ordering 2]
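For reference, the factorization and its reduction can be written out as below. This is the standard chain-rule form; the symbols X_1, ..., X_n are generic variable names assumed here, not taken from the slide.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})
\quad\longrightarrow\quad
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{parents}(X_i)),
\qquad \mathrm{parents}(X_i) \subseteq \{X_1, \dots, X_{i-1}\}
```

The parent sets depend on the chosen variable ordering, which is why two orderings can yield two different networks for the same distribution.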
5. B. Representation of Independencies
- Conditional independence
- Qualitative property: P(rain | quality of speech) = P(rain)?
- Markov condition in graph
- Variable becomes independent from all its non-descendants by conditioning on its direct parents.
- graphical d-separation criterion
6. Faithfulness
- Independence-map
- All independencies in the Bayesian network appear in the distribution
- Faithfulness
- Joint Distribution ⇔ Directed Acyclic Graph
- Conditional independencies ⇔ d-separation
- Theorem
- if a faithful graph exists, it is the minimal factorization.
7. C. Representation of Causal Mechanisms
Model of the underlying physical mechanisms
- Definition through interventions
- causal model + Conditional Probability Distributions
- Causal Markov Condition → Bayesian network
8. Reductionism
- Causal modeling: reductionism
- Canonical representation: unique, minimal, independent
- Building block P(Xi | parents(Xi))
- Whole theory is based on this modularity
- Intervention
- change of block
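The idea that an intervention is a change of one block can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical two-variable model rain -> wet with made-up probabilities; the names `blocks`, `joint`, `intervene` and `marginal` are invented for this example.

```python
from itertools import product

# Hypothetical two-variable causal model: rain -> wet.
# Each block is one conditional distribution P(X_i | parents(X_i)).
blocks = {
    "rain": {(): {True: 0.2, False: 0.8}},
    "wet": {
        (True,): {True: 0.9, False: 0.1},
        (False,): {True: 0.1, False: 0.9},
    },
}
parents = {"rain": (), "wet": ("rain",)}
order = ["rain", "wet"]


def joint(blocks):
    """Multiply the blocks to get the joint distribution over all variables."""
    table = {}
    for values in product([True, False], repeat=len(order)):
        world = dict(zip(order, values))
        p = 1.0
        for var in order:
            key = tuple(world[q] for q in parents[var])
            p *= blocks[var][key][world[var]]
        table[values] = p
    return table


def intervene(blocks, var, value):
    """Return a copy of the model in which only the block of `var` is replaced."""
    changed = dict(blocks)
    changed[var] = {key: {True: float(value), False: float(not value)}
                    for key in blocks[var]}
    return changed


def marginal(table, var, value):
    """Sum the joint over all worlds in which `var` takes `value`."""
    idx = order.index(var)
    return sum(p for world, p in table.items() if world[idx] == value)


observed = joint(blocks)
forced_wet = joint(intervene(blocks, "wet", True))
```

Forcing `wet` replaces only its own block, so `marginal(forced_wet, "rain", True)` stays at the original probability of rain: all other blocks are reused unchanged, which is the modularity the slide refers to.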
9. Ultimate motivation for causality
- If causal mechanisms are unrelated
- model is faithful
- Model: canonical representation able to explain all qualitative properties (independencies)
- close to reality
10. II. Learning Algorithms
- Two types
- Constraint-based
- based on the independencies
- Scoring-based
- searches the set of all models and gives each a score for how well it represents the distribution
11. Step 1: Adjacency search
- Property
- adjacent nodes do not become independent, whatever set of other variables is conditioned on
- Algorithm
- start with the fully connected graph
- check for marginal independencies
- check for conditional independencies
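The adjacency search above can be sketched as follows. This is a simplified illustration in the style of constraint-based algorithms, not the exact procedure of any particular tool. The `independent(x, y, cond)` callable is an assumed independence oracle; with real data it would be a statistical test. The chain A -> B -> C used as ground truth is hypothetical.

```python
from itertools import combinations


def adjacency_search(nodes, independent):
    """Remove edges between nodes that some conditioning set makes independent.

    `independent(x, y, cond)` is an assumed independence oracle; in practice
    it would be a statistical test on data.
    """
    adjacent = {n: set(nodes) - {n} for n in nodes}   # fully connected graph
    sepset = {}
    size = 0                                          # size 0 = marginal tests
    while any(len(adjacent[x]) - 1 >= size for x in nodes):
        for x in nodes:
            for y in sorted(adjacent[x]):
                others = sorted(adjacent[x] - {y})
                for cond in combinations(others, size):
                    if independent(x, y, set(cond)):
                        adjacent[x].discard(y)
                        adjacent[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
        size += 1                                     # larger conditioning sets
    edges = {frozenset((x, y)) for x in nodes for y in adjacent[x]}
    return edges, sepset


def chain_oracle(x, y, cond):
    # Hypothetical ground truth A -> B -> C: A and C are independent given B.
    return {x, y} == {"A", "C"} and "B" in cond


edges, sepset = adjacency_search(["A", "B", "C"], chain_oracle)
```

The separating sets recorded in `sepset` are kept because the orientation step needs them.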
12. Step 2: Orientation
- Property
- V-structure can be recognized
- Algorithm
- look for v-structures
- derived rules
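Recognizing v-structures can be sketched as below, assuming the undirected edges and separating sets from an adjacency search are available. The function name `find_v_structures` and the collider example are hypothetical, and the derived orientation rules are not covered.

```python
from itertools import combinations


def find_v_structures(nodes, edges, sepset):
    """Return the set of directed edges (cause, effect) implied by v-structures.

    x -> z <- y is oriented when x and y are both adjacent to z, not adjacent
    to each other, and z is absent from the set that separated x and y.
    """
    directed = set()
    for z in nodes:
        neighbours = sorted(n for n in nodes if frozenset((n, z)) in edges)
        for x, y in combinations(neighbours, 2):
            pair = frozenset((x, y))
            if pair not in edges and z not in sepset.get(pair, set()):
                directed.add((x, z))
                directed.add((y, z))
    return directed


# Hypothetical skeleton of a collider A -> C <- B, where A and B were found
# marginally independent (empty separating set).
nodes = ["A", "B", "C"]
collider_edges = {frozenset(("A", "C")), frozenset(("B", "C"))}
collider_sepset = {frozenset(("A", "B")): set()}
arrows = find_v_structures(nodes, collider_edges, collider_sepset)

# Hypothetical chain A - B - C, where A and C were separated by B:
# the middle node is in the separating set, so nothing is oriented.
chain_edges = {frozenset(("A", "B")), frozenset(("B", "C"))}
chain_sepset = {frozenset(("A", "C")): {"B"}}
no_arrows = find_v_structures(nodes, chain_edges, chain_sepset)
```

Only the collider is oriented; for the chain the three possible orientations imply the same independencies, so the data alone cannot choose between them.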
13. Assumptions
- General statistical assumptions
- No selection bias
- Random sample
- Sufficient data for correctness of statistical tests
- Underlying network is faithful
- Causal sufficiency
- No unknown common causes
14. Criticism
- Definition of causality?
- About predicting the effect of changes to the system
- Faithfulness assumption
- E.g. accidental cancellation
- Causal Markov Condition
- All relations are causal
- Learning algorithms are not robust
- Statistical tests make mistakes
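Accidental cancellation, the example given against faithfulness, can be illustrated with a small simulation. This is a sketch under assumed conditions: a hypothetical linear model X -> Y -> Z with an extra direct edge X -> Z, Gaussian noise, and weights chosen by hand so that the two paths cancel.

```python
import random


def correlation(xs, ys):
    """Pearson correlation of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(direct_effect, n=20000, seed=0):
    """Sample X -> Y -> Z plus a direct edge X -> Z with the given weight."""
    rng = random.Random(seed)
    xs, zs = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 2.0 * x + rng.gauss(0, 1)             # X -> Y with weight 2
        z = 0.5 * y + direct_effect * x + rng.gauss(0, 1)
        xs.append(x)
        zs.append(z)
    return correlation(xs, zs)


cancelled = simulate(direct_effect=-1.0)   # 0.5 * 2 + (-1) = 0: paths cancel
generic = simulate(direct_effect=0.3)      # paths do not cancel
```

With the cancelling weight, X and Z look uncorrelated although X influences Z along two paths, so an independence-based learner would wrongly drop the edge. Any other weight, such as the one in `generic`, restores the dependency.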
15. Part III: Performance Analysis
- High-Performance computing
[Figure: parallel system versus 1 processor]
- Reasons for bad performance?
16. PhD??
- Causal modeling (cf. COMO lab, VUB)
- Representation form
- Close to reality
- Learning algorithms
- TETRAD tool (open-source, Java)
17. Performance Models
- Aim of performance analysis
- Support software developer
- High-performance applications
- Expected properties
- offer insight into causes of performance degradation
- prediction
- estimate effect of optimizations
- reusable submodels
- separate application and system-dependency
- reason under uncertainty
- causal models
18. Integrated in statistical analysis
- Statistical characteristics
- Regression analysis
- Probability table compression
- Outlier detection
- Iterative process
- 1. Perform additional experiments
- 2. Extract additional characteristics
- 3. Indicate exceptions
- 4. Analyze the divergences of the data points with the current hypotheses
19. A. Model construction
- Model of computation time of LU decomposition algorithm
- elementsize (redundant variable) is sufficient for influence datatype -> cache misses
- regression analysis on submodels X = f(parents)
- analysis of parameters
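Regression on a single submodel X = f(parents) can be sketched as below for the simplest case of one parent and a linear f. The form of f is an assumption made for illustration, and the values are synthetic, generated from a known line rather than measured.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


# Synthetic values generated from a known line, not measurements.
parent = [1.0, 2.0, 4.0, 8.0]
child = [3.0 + 2.0 * p for p in parent]
intercept, slope = fit_line(parent, child)
```

Because each variable is fitted only against its direct parents, the fitted parameters of one submodel can be analyzed or reused without refitting the rest of the model.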
20. B. Detection of unexpected dependencies
- Point-to-point communication performance
- background communication
21. C. Finding explanations for outliers
Exceptional data in communication performance measurements
Probability table compression
-> derived variable Interesting features
22. IV. Complexity of Performance Data
- Mixture of discrete and continuous variables
- Mutual Information and Kernel Density Estimation
- Non-linear relations
- Mutual Information and Kernel Density Estimation
- Deterministic relations
- Augmented models and Complexity criterion
- Context variables
- Work in progress
- Context-specific independencies
- Work in progress
23. A. Information-theoretic Dependency
- Entropy of random variable X
- Discretized entropy for continuous variable
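The formulas on this slide did not survive extraction. As a stand-in, the sketch below uses the textbook definitions H(X) = -sum p(x) log2 p(x) and I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from counts, with a simple equal-width binning for continuous values. The function names and the binning scheme are assumptions, not necessarily what the talk used.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Entropy H(X) = -sum p(x) log2 p(x), with p estimated from counts."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), using the empirical distribution."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def discretize(values, width):
    """Map continuous values to bin indices of the given width."""
    return [int(v // width) for v in values]


# Tiny hand-made samples: a copy shares one full bit, a balanced
# independent pairing shares none.
xs = [0, 0, 1, 1]
dependent = mutual_information(xs, xs)
independent = mutual_information(xs, [0, 1, 0, 1])
bins = discretize([0.1, 0.4, 1.2], width=0.5)
```

Mutual information makes no assumption about the form of the relation, so it also picks up non-linear dependencies that a correlation coefficient would miss.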
24. B. Kernel Density Estimation
- See applets
- Trade-off maximal entropy <-> typicalness
- Conclusions
- Limited number of data points needed
- Discretization of continuous data justified
- Form-free dependency measure
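A kernel density estimate can be sketched in a few lines. This minimal version assumes Gaussian kernels and a hand-picked bandwidth; the sample is synthetic, and reading the bandwidth as the knob behind the entropy versus typicalness trade-off is an interpretation rather than something the slide states.

```python
from math import exp, pi, sqrt


def gaussian_kde(points, bandwidth):
    """Return a density estimate: the average of Gaussian bumps on the data."""
    norm = len(points) * bandwidth * sqrt(2 * pi)

    def density(x):
        return sum(exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density


# Synthetic sample, chosen only for illustration.
sample = [1.0, 1.2, 1.9, 2.1, 4.0]
smooth = gaussian_kde(sample, bandwidth=2.0)    # wide kernels: flat estimate
spiky = gaussian_kde(sample, bandwidth=0.05)    # narrow kernels: hugs the data
```

A wide bandwidth spreads the probability mass, while a narrow one concentrates it on the observed points and leaves almost none in between, for example at x = 3.0.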
25. C. Deterministic relations
- Y becomes independent from Z conditioned on X
- violation of the intersection condition (Pearl 88)
- Not faithfully describable
Solution: augmented causal model
- add regularity to model
- adapt inference algorithms
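The violation of the intersection condition can be checked on a toy case. The sketch assumes a hypothetical set of four equally likely cases in which Y is a one-to-one function of X and Z depends on X; the conditional mutual information is computed from the standard identity I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C).

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Entropy in bits of the empirical distribution of `samples`."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())


def cond_mutual_information(a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy(list(zip(a, c))) + entropy(list(zip(b, c)))
            - entropy(list(zip(a, b, c))) - entropy(c))


# Hypothetical equally likely cases: Y is determined by X (and X by Y),
# and Z depends on X.
x = [0, 1, 2, 3]
y = [10 * v for v in x]
z = [v % 2 for v in x]
nothing = [0] * len(x)                      # constant: no conditioning

y_z_given_x = cond_mutual_information(y, z, x)
x_z_given_y = cond_mutual_information(x, z, y)
xy_z = cond_mutual_information(list(zip(x, y)), z, nothing)
```

Both conditional dependencies vanish, yet X and Y together still carry one bit about Z. The intersection condition would conclude that Z is independent of both, which is why such a relation cannot be described faithfully by an ordinary graph.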
26. The Complexity Criterion
- X and Y contain equivalent information about Z
27. Augmented causal model
- Restrict conditional independencies
- Generalize d-separation
- Reestablish faithfulness
- Consistent models under the Complexity Increase assumption
28. Theory works!
[Figure labels: Deterministic, B, Probabilistic, A]
29. Conclusions
- Benefit of the integration of statistical techniques
- Causal modeling is a challenge
- wants to know the inner from the outer
- More information
- http://parallel.vub.ac.be
- http://parallel.vub.ac.be/jan