Title: VISUALIZATION TECHNIQUES UTILIZING THE SENSITIVITY ANALYSIS OF MODELS
1VISUALIZATION TECHNIQUES UTILIZING THE
SENSITIVITY ANALYSIS OF MODELS
- Ivo Kondapaneni, Pavel Kordík, Pavel Slavík
- Department of Computer Science and Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic
- Presenting author: Pavel Kordík (kordikp_at_fel.cvut.cz)
2Overview
- Motivation
- Data mining models
- Visualization based on sensitivity analysis
- Classification problems
- Regression problems
- Definition of interesting plots
- Genetic search for 2D and 3D plots
3Motivation
- Data mining: extracting new, potentially useful information from data
- DM models are automatically generated
- Are models always credible?
- Are models comprehensible?
- How to extract information from models?
Visualization
4Data mining models
- Often black-box models generated from data
- E.g. Neural networks
- What is inside?
Input variables -> Data mining black box model -> Output variable(s)
5Inductive model
- Estimates output from inputs
- Generated automatically
- Evolved by niching GA
- Grows from minimal form
- Contains hybrid units
- Several training methods
- Ensemble of models
6Example Housing data
Input variables: CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA
- CRIM: Per capita crime rate by town
- AGE: Proportion of owner-occupied units built prior to 1940
- DIS: Weighted distances to five Boston employment centres
Output variable: MEDV
- MEDV: Median value of owner-occupied homes in 1000's
7Housing data records
Input variables (CRIM to LSTA) and output variable (MEDV):
CRIM     ZN  INDUS  NOX   RM     AGE   DIS     RAD  TAX  PTRATIO  B      LSTA  MEDV
0.00632  18  2.31   53.8  6.575  65.2  4.09    1    296  15.3     396.9  4.98  24
0.02731  0   7.07   46.9  6.421  78.9  4.9671  2    242  17.8     396.9  9.14  21.6
8Housing data inductive model
Input variables: CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA
Niching genetic algorithm evolves units in first layer
- sigmoid unit, Error 0.13: MEDV = 1/(1-exp(-5.724CRIM + 1.126))
- sigmoid unit, Error 0.21: MEDV = 1/(1-exp(-5.861AGE + 2.111))
Output variable: MEDV
9Housing data inductive model
Input variables: CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA
First layer units: sigmoid (Error 0.13), sigmoid (Error 0.21), linear (Error 0.26), sigmoid (Error 0.24)
Niching genetic algorithm evolves units in second layer
- polynomial unit, Error 0.10: MEDV = 0.747(1/(1-exp(-5.724CRIM + 1.126))) + 0.582(1/(1-exp(-5.861AGE + 2.111)))^2 + 0.016
Output variable: MEDV
10Housing data inductive model
Input variables: CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA
Units: sigmoid, sigmoid, sigmoid, linear, polynomial, polynomial, linear, exponential
Constructed model has very low validation error!
Error 0.08
Output variable: MEDV
11Housing data inductive model
Input variables: CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA
MEDV = (exp((0.038 3.451(1/(1-exp(-5.724CRIM 1.126)))(1/(1-exp(2.413DIS-2.581)))(1/(1-exp(2.413DIS-2.581)))0.429(1/(1-exp(-5.861AGE 2.111)))0.024(1/(1-exp(2.413DIS-2.581)))0.036 0.0380.350(1/(1-exp(-3.613RAD-0.088))) 0.999( 0.747(1/(1-exp(-5.724CRIM 1.126)))0.582(1/(1-exp(-5.861AGE 2.111)))(1/(1-exp(-5.861AGE 2.111)))0.016)-0.046(1/(1-exp(-5.724CRIM 1.126)))-0.079 0.002INDUS-0.001LSTA 0.150)0.860)13.072)-14.874
The math equation is no longer comprehensible; we have to treat it as a black box model!
Units: S S S L P P L E (S sigmoid, L linear, P polynomial, E exponential)
Error 0.08
Output variable: MEDV
12Visualization based on sensitivity analysis
(Figure: GAME models)
13Sensitivity analysis of inductive model of MEDV
House no. 189
House no. 164
What happens to the value of a house when crime in the area decreases or increases?
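The question above can be explored with a one-input sweep. The following is a minimal sketch, not the GAME implementation: `toy_model`, the two-input record and the sweep range are hypothetical stand-ins for a trained model and a real house record.

```python
import numpy as np

def toy_model(x):
    """Hypothetical stand-in for a trained model: x = [CRIM, AGE], returns a MEDV-like value."""
    crim, age = x
    return 30.0 - 8.0 * np.tanh(crim) - 0.05 * age

def sensitivity_curve(model, record, index, lo, hi, steps=50):
    """Sweep input `index` from lo to hi while all other inputs stay at the record's values."""
    grid = np.linspace(lo, hi, steps)
    outputs = []
    for value in grid:
        x = np.array(record, dtype=float)  # fresh copy of the record
        x[index] = value                   # change only the studied input
        outputs.append(model(x))
    return grid, np.array(outputs)

house = [0.5, 65.2]  # made-up record: CRIM, AGE
crim_grid, medv = sensitivity_curve(toy_model, house, index=0, lo=0.0, hi=5.0)
```

Plotting `medv` against `crim_grid` gives one sensitivity curve for this record; repeating the sweep for another record gives the curve for another house.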
14Ensemble of inductive models
- Random initialization
- Developing on the same training set
- Training affects only well-defined areas of the input space
- Each model: unique architecture, similar complexity, similar transfer functions
- Similar behavior in well-defined areas
- Different behavior in under-defined areas
(Figure: responses yk-1, yk, yk+1 of GAME models plotted over input x2 between min and max)
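A small sketch of this behavior, under loud assumptions: polynomial fits of different degree stand in for models with different architectures (they are not GAME models), and the training data are a made-up sine curve covering only the interval [0, 3].

```python
import numpy as np

# Training data cover only the interval [0, 3] of the input.
x_train = np.linspace(0.0, 3.0, 40)
y_train = np.sin(x_train)

# Three models of different structure fitted to the same training set
# (polynomial degrees stand in for the differing architectures).
ensemble = [np.polynomial.Polynomial.fit(x_train, y_train, deg) for deg in (3, 4, 5)]

def spread(x):
    """Difference between the largest and smallest model response at x."""
    responses = np.array([model(x) for model in ensemble])
    return responses.max() - responses.min()

inside = spread(1.5)   # well-defined area: covered by training data
outside = spread(6.0)  # under-defined area: far from any training data
```

The members nearly coincide inside the training range and drift apart outside it, which is the property the ensemble view relies on.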
15Credibility of models Artificial data set
Advantages
- No need for the training data set,
- Success of the modeling method is considered,
- Importance of inputs is considered.
Credibility: the criterion is the dispersion of the models' responses.
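As a rough illustration, the dispersion criterion can be evaluated on randomly generated input points. Everything concrete here is an assumption: the three toy models, the input ranges, the use of standard deviation as the dispersion measure, and the threshold 0.05.

```python
import numpy as np

def dispersion(models, x):
    """Standard deviation of the ensemble responses at input vector x."""
    return float(np.std([m(x) for m in models]))

# Hypothetical ensemble: the members agree for x[0] <= 2 and drift apart beyond it.
models = [lambda x, c=c: np.sin(x[0]) + x[1] + c * max(0.0, x[0] - 2.0) ** 2
          for c in (-1.0, 0.0, 1.0)]

rng = np.random.default_rng(0)
# Artificial data set: random points drawn from the input ranges, no training records needed.
artificial = rng.uniform(low=[0.0, 0.0], high=[4.0, 1.0], size=(200, 2))
disp = np.array([dispersion(models, x) for x in artificial])
credible = artificial[disp < 0.05]  # points where the models agree (threshold is an assumption)
```

Points with low dispersion mark the region where the ensemble can be trusted; high dispersion flags areas the training did not define well.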
16Example Models of hot water consumption
17Cold water consumption, increasing humidity
18When is a plot interesting for us?
(Figure: model responses plotted over input xi on the interval that starts at xistart and has width xisize)
19Definition of interesting plot
- Minimal volume of the envelope: p -> min
- Maximal sensitivity of the output to the change of input variable xi: ysize -> max
- Maximal size of the area: xisize -> max
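One possible reading of these three criteria, sketched for a single swept input. The interpretation is an assumption: p is approximated as the area between the outermost ensemble responses, ysize as the range of the mean response, and the toy ensemble is hypothetical.

```python
import numpy as np

def plot_criteria(models, xi_start, xi_size, steps=100):
    """Return (p, y_size, xi_size) for the interval [xi_start, xi_start + xi_size]."""
    grid = np.linspace(xi_start, xi_start + xi_size, steps)
    responses = np.array([[m(x) for x in grid] for m in models])  # shape (models, steps)
    envelope = responses.max(axis=0) - responses.min(axis=0)
    p = float(envelope.mean() * xi_size)  # approximate area between the outermost responses
    mean_response = responses.mean(axis=0)
    y_size = float(mean_response.max() - mean_response.min())
    return p, y_size, xi_size

def make_model(c):
    # Hypothetical ensemble member: members agree for xi <= 2 and diverge beyond.
    return lambda x: np.sin(x) + c * max(0.0, x - 2.0) ** 2

models = [make_model(c) for c in (-1.0, 0.0, 1.0)]
p_a, y_a, size_a = plot_criteria(models, xi_start=0.0, xi_size=1.5)
p_b, y_b, size_b = plot_criteria(models, xi_start=2.5, xi_size=1.5)
```

Here the first interval has a zero-width envelope and a clear output change, so it would count as the more interesting plot; the second has the same width but a wide envelope.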
20Multiobjective optimization
- Interestingness
- Unknown variables:
- x1, x2, ..., xi-1, xi+1, ..., xn, xistart, xisize
- We will use a niching genetic algorithm
Chromosome: x1 x2 ... xi-1 xi+1 ... xn xistart xisize
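To make the encoding concrete, the sketch below decodes such a chromosome into the input vectors of one plot. The function name `decode`, the zero-based index `i` and the number of steps are illustrative assumptions.

```python
import numpy as np

def decode(chromosome, i, steps=5):
    """Turn a chromosome into the input vectors of one plot.

    chromosome = [x1, ..., x(i-1), x(i+1), ..., xn, xistart, xisize]
    i is the zero-based position of the swept input xi.
    """
    *others, xi_start, xi_size = chromosome
    sweep = np.linspace(xi_start, xi_start + xi_size, steps)
    # Every other input stays constant along the plot.
    fixed = np.tile(np.array(others, dtype=float), (steps, 1))
    # Insert the swept values as column i.
    return np.insert(fixed, i, sweep, axis=1)

points = decode([10.0, 30.0, 0.0, 2.0], i=1, steps=3)
```

Each row of `points` is one model input; feeding the rows to the model yields the curve whose interestingness the search evaluates.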
22Niching GA also locates local optima
- Three subpopulations (niches) of individuals survived
23Automated retrieval of plots showing interesting behavior
Genetic Algorithm
A genetic algorithm with a special fitness function is used to adjust all other inputs (dimensions).
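A minimal sketch of such a search, with clearly stated substitutions: this is a plain real-valued GA with elitism, not the niching GA used in the slides, and `toy_fitness` is a placeholder for an interestingness measure. Operator choices and parameters are assumptions.

```python
import numpy as np

def genetic_search(fitness, low, high, pop_size=30, generations=40, seed=0):
    """Plain real-valued GA with elitism (not the niching GA of the slides)."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(low, dtype=float), np.asarray(high, dtype=float)
    pop = rng.uniform(low, high, size=(pop_size, low.size))
    history = []
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        history.append(scores.max())
        children = [pop[scores.argmax()].copy()]  # elitism: the best individual survives
        while len(children) < pop_size:
            # Binary tournament selection of two parents.
            a, b = rng.integers(pop_size, size=2)
            p1 = pop[a] if scores[a] >= scores[b] else pop[b]
            c, d = rng.integers(pop_size, size=2)
            p2 = pop[c] if scores[c] >= scores[d] else pop[d]
            mask = rng.random(low.size) < 0.5  # uniform crossover
            child = np.where(mask, p1, p2)
            child = child + rng.normal(0.0, 0.05, low.size) * (high - low)  # Gaussian mutation
            children.append(np.clip(child, low, high))
        pop = np.array(children)
    scores = np.array([fitness(ind) for ind in pop])
    history.append(scores.max())
    return pop[scores.argmax()], history

def toy_fitness(ind):
    # Placeholder for an interestingness measure; peaks at (0.3, 0.7).
    return -((ind[0] - 0.3) ** 2 + (ind[1] - 0.7) ** 2)

best, history = genetic_search(toy_fitness, low=[0.0, 0.0], high=[1.0, 1.0])
```

Because the best individual is copied unchanged into each new generation, the best fitness in `history` never decreases. A niching variant would additionally keep several subpopulations alive so that local optima are retained.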