Title: Classification by search partition analysis: an alternative to tree methods in medical problems.
1 Classification by search partition analysis: an
alternative to tree methods in medical problems.
- Roger Marshall
- School of Population Health
- University of Auckland
- New Zealand
- rj.marshall_at_auckland.ac.nz
2 Why classification? Uses:
- to develop diagnostic/prognostic decision and classification rules
- to discover homogeneous subgroups, e.g. groups at risk in epidemiology, or groups who respond to treatment
3 Methods
- Regression methods (including neural networks): model based
- Trees: no model (unless the hierarchical tree is considered as such)
- Empirical density methods: smoothers of parameter space
- Support vector machines: find margins of maximum separation
- Boolean classifiers (including SPAN, rough sets): based on logical structures
4 Attractions of trees (in medicine)
- regression models perceived as unrealistic
- regression models based on arithmetic scores
- trees demarcate individuals with clusters of characteristics
- closer affinity to clinical reasoning
5 Feinstein (circa 1971): "Clinicians don't think in terms of weighted averages of clinical variables; they think about demarcated subgroups of people who possess combinations of clinical attributes." Suggested trees for prognostic stratification.
6 [Tree diagram with node labels A∩O, A∩O∩S, A∩O∩H, A∩O∩H∩G]
7 High risk if (H∩C) ∪ (¬H∩P∩U) ∪ (¬H∩P∩BV).
8 Trees: history
- 1960s-70s: AID (Sonquist and Morgan), CHAID
- 1980s: CART, C4.5, machine learning
- 1990s--: new ideas: bagging, boosting, Bayes, random forests
- Software: CART, S, KnowledgeSeeker, C4.5 (C5?), SPSS AnswerTree, RECAMP, QUEST, CAL5, SAS macros, SAS Enterprise Miner, R rpart
9 Measure of best split: goodness of split
- e.g. statistical-test-based measures: chi-square (categorical y), t- and F-statistics (continuous y), log-rank for survival trees
- e.g. decrease in impurity by splitting (CART)
- e.g. likelihood (deviance) statistics (Ripley, S)
10 CART ideas on impurity
Binary outcome classes D1 and D2. Define an impurity measure i(t) of node t, with p = proportion of D1s at node t.
- e.g. Gini diversity impurity: i(t) = p(1 - p)
- e.g. entropy measure: i(t) = -p log p - (1 - p) log(1 - p)
Change in impurity from splitting node t into left (L) and right (R) nodes:
Δi = i(t) - pL i(tL) - pR i(tR)
11 e.g. entropy deems a node with p = 0.25 more impure than Gini does (relative to the maximum impurity at p = 0.5)
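A minimal Python sketch of these impurity measures and the split gain, illustrating the comparison at p = 0.25 (natural logs assumed; the base does not affect the comparison):

```python
import math

def gini(p):
    # Gini diversity impurity: i(t) = p(1 - p)
    return p * (1.0 - p)

def entropy(p):
    # Entropy impurity: i(t) = -p log p - (1 - p) log(1 - p)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def split_gain(impurity, p, p_left, p_right, w_left):
    # Decrease in impurity by splitting node t:
    # i(t) - pL * i(tL) - pR * i(tR), with w_left the fraction
    # of cases sent to the left daughter node
    return (impurity(p)
            - w_left * impurity(p_left)
            - (1.0 - w_left) * impurity(p_right))

# Relative to the maximum impurity at p = 0.5:
print(gini(0.25) / gini(0.5))        # 0.75
print(entropy(0.25) / entropy(0.5))  # about 0.81: entropy deems p = 0.25 more impure
```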
12 Right-sized trees
1. Stopping rules (subgroup sample size, P-values): pre-pruning
2. Grow a big tree and prune: post-pruning, e.g. MCCP (minimum cost-complexity pruning), with complexity c = number of terminal nodes; use cross-validation to estimate prediction error
Many other methods.
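A sketch of the cost-complexity selection step; the list of candidate subtrees and the penalty values below are hypothetical illustrations, not numbers from the talk:

```python
def cost_complexity(error, n_leaves, alpha):
    # Penalised cost R_alpha(T) = R(T) + alpha * |T|, where |T| is the
    # number of terminal nodes (the complexity c above)
    return error + alpha * n_leaves

# Hypothetical pruning sequence: (cross-validated error, terminal nodes)
subtrees = [(0.30, 1), (0.22, 3), (0.18, 6), (0.17, 12), (0.165, 25)]

def best_subtree(subtrees, alpha):
    return min(subtrees, key=lambda t: cost_complexity(t[0], t[1], alpha))

print(best_subtree(subtrees, 0.01))  # a mid-sized tree wins
print(best_subtree(subtrees, 0.10))  # heavy penalty: the root alone wins
```

The choice of alpha is itself usually made by cross-validation.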
13 New methods/extensions
- Multivariate trees (multivariate splits)
- Multivariate y
- Survival trees (Segal, LeBlanc)
- Bagging/boosting (Breiman)
- Bayes trees (Buntine, Chipman)
- Forests of trees (Breiman)
14 Some troubles with trees
- Net (main) effects not evaluated
- Misleading decision rules
- Rules hard to interpret
- Simple rules probably missed
- The tree itself as a model
15 [Tree diagram with node labels A∩O, A∩O∩S, A∩O∩H, A∩O∩H∩G]
16 Misleading classification rules
High risk of diabetes: (A∩O) ∪ (¬A∩O∩S) ∪ (A∩¬O∩H) ∪ (A∩¬O∩¬H∩G), which is the same as (A∩O) ∪ (O∩S) ∪ (A∩H) ∪ (A∩G), i.e. the complemented terms are redundant: e.g. the age condition is not needed in conjunction with O and S.
17 Simple rules may require complex trees (the replication problem)
- e.g. (A∩B) ∪ (C∩D), with A = age under 18, B = black, C = cigarette smoker, D = drinks alcohol: needs a tree with 7 nodes
- e.g. (A∩B∩C) ∪ (D∩E∩F) ∪ (G∩H∩K) needs a tree with 80 terminal nodes!
18 [Tree diagram for (A∩B) ∪ (C∩D), with the C-D subtree replicated on the branches where A∩B fails]
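The replication problem can be checked mechanically: written as nested ifs, the tree must duplicate the (C and D) test on two branches, while the flat regular rule states it once. A small sketch:

```python
from itertools import product

def flat_rule(a, b, c, d):
    # The simple regular rule (A and B) or (C and D)
    return (a and b) or (c and d)

def tree_rule(a, b, c, d):
    # The same rule as a hierarchical tree: the (C and D) subtree
    # must be replicated on both the A-but-not-B and not-A branches
    if a:
        if b:
            return True
        return c and d   # replicated subtree
    return c and d       # replicated subtree

# the two forms agree on all 16 truth assignments
assert all(flat_rule(*x) == tree_rule(*x)
           for x in product([False, True], repeat=4))
```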
19 Positive attributes
Usually (in medicine at least) an attribute can be considered in advance to be either positively or negatively associated with y:
- e.g. obese, sedentary, hypertensive are positive for diabetes
- e.g. smoking, old age, high cholesterol are positive for ischaemia
- e.g. presence of an adverse gene
20 Regular classification rules
Combinations of positive attributes only to define high risk. Tree rules are not usually regular (though occasionally they may reduce to a regular rule, as in the diabetes example).
21 High risk if (H∩C) ∪ (¬H∩P∩U) ∪ (¬H∩P∩BV): e.g. in (¬H∩P∩U), ¬H = not regularly hospitalized, so the rule is not regular.
22 The tree model
Is the hierarchical tree model sensible? Probably not. Even if it is, does the process of recursive subdivision estimate the best tree? Maybe.
23 These considerations suggest: why not consider non-hierarchical procedures, and why not focus on regular combinations directly? SPAN attempts to do both.
24 SPAN (Search Partition Analysis)
Generates regular decision rules of the form A = K1 or/and K2 or/and ... or/and Kq, where Ki is a conjunction (or disjunction) of pi attributes. Forms binary partitions of the predictor space into A and its complement. Non-hierarchical.
25 Example SPAN rule for detecting malignant cells (bcw data) from cell characteristics
26 SPAN
Carries out a search to find the best possible combinations of attributes. Unless the search is somehow limited, it becomes impossibly large: there are 2^(2^m - 1) - 1 ways to form a binary partition for m attributes, e.g. 2147 million for m = 5.
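The count is easy to check numerically:

```python
def n_binary_partitions(m):
    # Number of ways to form a binary partition of the predictor space
    # spanned by m binary attributes: 2^(2^m - 1) - 1
    return 2 ** (2 ** m - 1) - 1

print(n_binary_partitions(5))  # 2147483647, i.e. about 2147 million
```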
27 How SPAN limits the extent of the search
- By restricting to a set of m attributes Tm = {X1, ..., Xm}; typically m < 15. These may be the m best attributes.
- By not allowing mixed combinations of attributes of those in Tm.
- By restricting the complexity of the Boolean expressions, i.e. the pi and q parameters.
28 Attribute set Tm
If the set of m attributes Tm = {X1, ..., Xm} consists of attributes labelled positive, SPAN will generate only regular partitions. It is natural to consider the best-ranked attributes.
29 Ranked plot of attributes: GI cancer and tumour markers
30 Extent of search for different parameters: based on combinatoric formulae of the "lock and key" algorithm for generating partitions.
31 Iterated search procedure
1. Set j = 1, T = Tm.
2. Search over T; the best partition Aj defines a new attribute aj.
3. Set T = Tm + {aj}, j = j + 1; repeat.
Continue until Aj = Aj+1. Produces a sequence of new attributes with increasingly better discrimination (no proof of this assertion!).
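A toy sketch of the iterated procedure, with a deliberately small candidate space (single attributes and two-attribute conjunctions/disjunctions) standing in for SPAN's real search; the data, scoring and stopping test are illustrative only:

```python
from itertools import combinations

def score(part, rows, y):
    # fraction of cases the binary partition classifies correctly
    return sum(int(bool(part(r))) == yi for r, yi in zip(rows, y)) / len(y)

def search(T, rows, y):
    # toy stand-in for SPAN's search: best single attribute, or
    # conjunction/disjunction of a pair of attributes drawn from T
    cands = [(str(i), lambda r, i=i: r[i]) for i in T]
    for i, j in combinations(T, 2):
        cands.append((f"{i}&{j}", lambda r, i=i, j=j: r[i] and r[j]))
        cands.append((f"{i}|{j}", lambda r, i=i, j=j: r[i] or r[j]))
    return max(cands, key=lambda c: score(c[1], rows, y))

def iterated_search(attrs, rows, y, max_iter=10):
    # the best partition A_j is coded as a new attribute a_j, T is reset
    # to Tm plus a_j, and the search repeats until no improvement
    T, best = list(attrs), ("", None, -1.0)
    for j in range(max_iter):
        name, part = search(T, rows, y)
        s = score(part, rows, y)
        if s <= best[2]:
            break
        best = (name, part, s)
        key = f"a{j}"
        for r in rows:
            r[key] = part(r)      # a_j becomes a derived attribute
        T = list(attrs) + [key]
    return best[0], best[2]

# Illustrative sample in which y = (A and B) or (C and D), attributes 0..3
data = ([(1, 1, 0, 0)] * 4 + [(0, 0, 1, 1)] + [(1, 0, 0, 0)] * 2 +
        [(0, 1, 0, 0)] * 2 + [(0, 0, 0, 0)] * 2 + [(1, 0, 1, 0)])
y = [1] * 5 + [0] * 7
rows = [dict(enumerate(t)) for t in data]
name, s = iterated_search([0, 1, 2, 3], rows, y)
# first pass finds 0&1 (A and B); second pass combines the derived
# attribute with attribute 3, fitting this small sample perfectly
```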
32 SPAN rank plot: a_1-a_5 are partition attributes on 5 iterations (hea data)
33 Complexity penalising
To avoid overfitting, penalise complex Boolean expressions. How to measure complexity? By the number of subgroups (minus 1): e.g. 3 subgroups in A and 2 in its complement gives complexity c = 3 + 2 - 1 = 4. Penalise the measure G (e.g. entropy) by G - bc.
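As a worked example of the penalty (the weight b = 0.01 is an arbitrary choice for illustration, not a value from the talk):

```python
def complexity(n_in_A, n_in_comp):
    # complexity c = total number of subgroups minus 1
    return n_in_A + n_in_comp - 1

def penalised(G, c, b=0.01):
    # penalise the goodness measure G (e.g. entropy gain) by G - b*c;
    # b = 0.01 is an arbitrary illustrative weight
    return G - b * c

c = complexity(3, 2)  # the slide's example: 3 + 2 - 1
print(c)              # 4
```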
34Visualising subgroups
35 Extension to >2 ordinal states
e.g. categories 0, 1, 2. Can find a binary partition A2 of {0,1} v {2}, and also A1 of {0} v {1,2}. Need to ensure A2 is a subset of A1: constrain the search.
36 e.g. diabetes: 0 = none, 1 = impaired glucose tolerance, 2 = diabetes.
A2 = (F∩U) ∪ (F∩E∩T)
A1 = (F∩U) ∪ (F∩T) ∪ (F∩H)
F, T and U denote positive fructosamine, triglyceride and urinary albumin tests; E is ethnic Polynesian. It can be shown that A2 is a subset of A1.
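The claimed nesting can be verified by brute force over all attribute combinations (H is carried through as on the slide, though it is not defined there):

```python
from itertools import product

# F = positive fructosamine, T = triglyceride, U = urinary albumin,
# E = ethnic Polynesian; H as on the slide (not defined there)
def A2(F, U, T, H, E):
    # partition of {0, 1} v {2}: (F ∩ U) ∪ (F ∩ E ∩ T)
    return (F and U) or (F and E and T)

def A1(F, U, T, H, E):
    # partition of {0} v {1, 2}: (F ∩ U) ∪ (F ∩ T) ∪ (F ∩ H)
    return (F and U) or (F and T) or (F and H)

# A2 ⊆ A1, checked over all 32 attribute combinations
assert all(not A2(*x) or A1(*x) for x in product([False, True], repeat=5))
```

The subset relation also follows term by term: F∩U appears in both, and F∩E∩T implies F∩T.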
37 Comparisons of SPAN and other methods
- Lim, Loh and Shih (Machine Learning) compared 33 methods on 32 data sets.
- Methods: 22 tree, 9 statistical, 2 neural networks.
- 16 data sets (plus 16 with added noise).
- Seems to provide benchmarks for other methods.
- I tried SPAN on the 24 2-state and 3-state classification data sets.
38 Data  classes  SPAN error  LLS range (33 methods)
bcw 2 0.035 0.03-0.09
bcw 0.035 0.03-0.08
bld 2 0.365 0.28-0.43
bld 0.373 0.29-0.44
bos 3 0.236 0.221-0.314
bos 0.236 0.225-0.422
cmc 3 0.449 0.43-0.60
cmc 0.444 0.43-0.58
dna 3 0.075 0.05-0.38
dna 0.075 0.04-0.38
hea 2 0.170 0.14-0.34
hea 0.170 0.15-0.31
39 Data  classes  SPAN error  LLS range (33 methods)
pid 2 0.251 0.22-0.31
pid 0.252 0.22-0.32
smo 3 0.305 (0.44) 0.30-0.45
smo 0.305(0.44) 0.31-0.45
tae 3 0.510 0.325-0.693
tae 0.701 0.445-0.696
thy 3 0.0134 0.005-0.89
thy 0.0134 0.01-0.88
vot 2 0.044 0.04-0.06
vot 0.044 0.04-0.07
wav 3 0.266 0.151-0.477
wav 0.266 0.160-0.446
40 Data POL SPAN QI0 LOG LDA IC0 RBF ST0 (ranks among the 33 methods)
bcw 19 7 2.5 5.5 12 23 5.5 30
bcw 11 8 1.5 8 9 27 4 20
bld 3 26 7 9 18.5 20 24 10
bld 1 25 26 17 15 7 18 6
bos 2.5 6 28 11.5 14.5 18.5 2.5 22.5
bos 3 2 22.5 13 20 11 28 17.5
cmc 1 5 14 19 22 7.5 10 9
cmc 1 4 17 18.5 22.5 5 15.5 9
dna 2 21 18 16 12 10 31 10
dna 3 20 19 17 13.5 6 31 6
pid 16.5 31 5 11 1.5 16.5 11 16.5
pid 1 30 2.5 7 4 13.5 15 30
hea 9 7.5 4.5 6 1 18 11.5 30
hea 16 6 6 9 2 16 8 23
41 Data POL SPAN QI0 LOG LDA IC0 RBF ST0 (ranks among the 33 methods)
smo 9.5 9.5(32) 9.5 9.5 9.5 26 18.5 9.5
smo 7.5 9.5(32) 7.5 16.5 21 7.5 23.3 7.5
tae 20 19 10 11 6 3 13 30.5
tae 16 11 9 4 2 20 10 32
thy 14.5 14.5 17 26 28 8 23 5.5
thy 15 15 17.5 25 27 10 29 7.5
vot 25.5 9 1 21 15 17.5 25.5 21
vot 16 5 21 5 16 26 21 19
wav 5 21 8.5 2 10.5 29 1 26
wav 6 21 9 3.5 7.5 27.5 31 26
Mean Rank 9.3 11.1 (12.9) 11.8 12.1 12.9 15.6 17.1 17.7
Mean error 0.210 0.200 (0.230) 0.219 0.215 0.216 0.223 0.249 0.247
42 Limitations/criticisms
- Multi-class problems difficult
- Data dredging
- Loss of information by cutpoints of continuous variables
- Complexity penalising somewhat ad hoc
- Computationally intensive, unless the search is sensibly restricted
- Not black-box: requires user judgements
- Needs the (temperamental!) SPAN software; no R algorithms
43 Conclusion
Despite their popularity, trees have weaknesses that stem from their hierarchical structure. SPAN offers a non-hierarchical alternative that generally performs as well as or better than trees, and gives decision rules that are generally easy to understand.