Title: Descriptive data mining: some current issues
1. Descriptive data mining: some current issues
- Peter A. Flach
- Department of Computer Science
- University of Bristol
- www.cs.bris.ac.uk/flach/
2. Outline of the talk
- Introduction
- First-order rule discovery
- Subgroup discovery
- Concluding remarks
3. Contact lens dataset
4Contact lense decision list
IF Tear-production reduced THEN Lenses none
(12) ELSE / Tear-production normal / IF
Astigmatic no THEN Lenses soft (6/1)
ELSE / Astigmatic yes / IF
Spectacles myope THEN Lenses hard (3)
ELSE / Spectacles hypermetrope /
Lenses none (3/1) Confusion Matrix
a b c lt-- classified as 5 0 0 a
soft 0 3 1 b hard 1 0 14 c none
5. Contact lens association rules

1.  (1.00) Tear-production=reduced 12 => Lenses=none 12
2.  (1.00) Astigmatic=yes Tear-production=reduced 6 => Lenses=none 6
3.  (1.00) Astigmatic=no Tear-production=reduced 6 => Lenses=none 6
4.  (1.00) Spectacles=hypermetrope Tear-production=reduced 6 => Lenses=none 6
5.  (1.00) Spectacles=myope Tear-production=reduced 6 => Lenses=none 6
6.  (1.00) Lenses=soft 5 => Astigmatic=no Tear-production=normal 5
7.  (1.00) Astigmatic=no Lenses=soft 5 => Tear-production=normal 5
8.  (1.00) Tear-production=normal Lenses=soft 5 => Astigmatic=no 5
9.  (1.00) Lenses=soft 5 => Tear-production=normal 5
10. (1.00) Lenses=soft 5 => Astigmatic=no 5
11. (0.86) Astigmatic=no Lenses=none 7 => Tear-production=reduced 6
12. (0.86) Spectacles=myope Lenses=none 7 => Tear-production=reduced 6
13. (0.83) Astigmatic=no Tear-production=normal 6 => Lenses=soft 5
14. (0.83) Spectacles=hypermetrope Astigmatic=yes 6 => Lenses=none 5
15. (0.80) Lenses=none 15 => Tear-production=reduced 12
16. (0.75) Astigmatic=yes Lenses=none 8 => Tear-production=reduced 6
17. (0.75) Spectacles=hypermetrope Lenses=none 8 => Tear-production=reduced 6
18. (0.75) Age=presbyopic 8 => Lenses=none 6
Contingency table for rule 1 (H = Lenses=none, B = Tear-production=reduced):

          B   ¬B
   H     12    3   15
   ¬H     0    9    9
         12   12   24

Contingency table for rule 18 (H = Lenses=none, B = Age=presbyopic):

          B   ¬B
   H      6    9   15
   ¬H     2    7    9
          8   16   24
6. Clauses sorted by confirmation

Each clause is preceded by its confirmation and its fraction of counter-instances:

1.  (.76 .00) Tear-prod=reduced => Lenses=none
2.  (.76 .12) Lenses=none => Tear-prod=reduced
3.  (.67 .04) Lenses=none => Age=presb or Tear-prod=reduced
4.  (.63 .04) Astigm=no and Tear-prod=normal => Lenses=soft
5.  (.54 .00) Astigm=no and Tear-prod=normal => Age=presb or Lenses=soft
6.  (.50 .08) Astigm=yes and Tear-prod=normal => Lenses=hard
7.  (.50 .08) Lenses=none => Age=pre-presb or Tear-prod=reduced
8.  (.47 .04) Lenses=none => Specs=hmetr or Tear-prod=reduced
9.  (.47 .04) Lenses=none => Astigm=yes or Tear-prod=reduced
10. (.47 .00) Lenses=soft => Astigm=no
11. (.47 .00) Lenses=soft => Tear-prod=normal
12. (.47 .00) Specs=myope and Astigm=yes and Tear-prod=normal => Lenses=hard
13. (.47 .00) Lenses=none => Age=presb or Specs=hmetr or Tear-prod=reduced
14. (.47 .00) Lenses=none => Age=presb or Astigm=yes or Tear-prod=reduced
15. (.45 .00) Specs=hmetr and Astigm=no and Tear-prod=normal => Lenses=soft
16. (.44 .29) Astigm=no => Lenses=soft
17. (.44 .29) Tear-prod=normal => Lenses=soft
Contingency table for clause 1 (H = Lenses=none, B = Tear-prod=reduced):

          B   ¬B
   H     12    3   15
   ¬H     0    9    9
         12   12   24

Contingency table for clauses 16 and 17, which have identical tables (H = Lenses=soft):

          B   ¬B
   H      5    0    5
   ¬H     7   12   19
         12   12   24
7. A toy example
8-9. East-West trains (flattened)

- Example:
  eastbound(t1).
- Background knowledge:
  hasCar(t1,c1). hasCar(t1,c2). hasCar(t1,c3). hasCar(t1,c4).
  cshape(c1,rect). cshape(c2,rect). cshape(c3,rect). cshape(c4,rect).
  clength(c1,short). clength(c2,long). clength(c3,short). clength(c4,long).
  croof(c1,none). croof(c2,none). croof(c3,peak). croof(c4,none).
  cwheels(c1,2). cwheels(c2,3). cwheels(c3,2). cwheels(c4,2).
  hasLoad(c1,l1). hasLoad(c2,l2). hasLoad(c3,l3). hasLoad(c4,l4).
  lshape(l1,circ). lshape(l2,hexa). lshape(l3,tria). lshape(l4,rect).
  lnumber(l1,1). lnumber(l2,1). lnumber(l3,1). lnumber(l4,3).
- Hypothesis:
  eastbound(T) :- hasCar(T,C), clength(C,short), not croof(C,none).
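To see the hypothesis at work, here is an illustrative query trace (not on the slides) against the background facts above:

  % With the facts and the hypothesis clause loaded:
  ?- eastbound(t1).
  % succeeds with C = c3: clength(c3,short) holds, and
  % croof(c3,peak) makes the negated literal "not croof(c3,none)" true.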
10-11. East-West trains (terms)

- Example (the train is a list of car terms):
  eastbound([car(rect,short,none,2,load(circ,1)),
             car(rect,long,none,3,load(hexa,1)),
             car(rect,short,peak,2,load(tria,1)),
             car(rect,long,none,2,load(rect,3))]).
- Background knowledge: member/2, arg/3
- Hypothesis:
  eastbound(T) :- member(C,T), arg(2,C,short), not arg(3,C,none).
12. ER diagram for East-West trains
13Train-as-set database
SELECT DISTINCT TRAIN_TABLE.TRAIN FROM
TRAIN_TABLE, CAR_TABLE WHERE TRAIN_TABLE.TRAIN
CAR_TABLE.TRAIN AND CAR_TABLE.SHAPE 'short'
AND CAR_TABLE.ROOF ! 'none'
14. Individual-centred representations

- ER diagram is a tree (approximately)
  - root denotes the individual
  - looking downwards from the root, only one-to-one or one-to-many relations are allowed
  - one-to-one cycles are allowed
- Database can be partitioned according to individual
- Alternative: all information about a single individual packed together in a term
  - tuples, lists, trees, sets, multisets, graphs, ...
15. Mutagenesis
16. Complexity of learning problems

- Simplest case: single table with primary key
  - attribute-value or propositional learning
  - example corresponds to a tuple of constants
- Next: single table without primary key
  - multi-instance problem
  - example corresponds to a set of tuples of constants
- Complexity resides in many-to-one foreign keys
  - non-determinate variables
  - lists, trees, sets, multisets, graphs, ...
17. Subgroup discovery

- An interesting subgroup has a class distribution that differs significantly from the overall distribution
- This can be modelled as classification with profits (for true positives/negatives) and costs (for false positives/negatives)
- Requires different heuristics and/or a trade-off between accuracy and generality
18. Evaluation metrics
19. Outline of the talk
- Introduction
- First-order rule discovery
- Subgroup discovery
- Concluding remarks
20. The Tertius approach

- Suppose the formula to be evaluated is an implication H ← B
- Determine the truth values of H and B for each example, and organise the observed frequencies in a contingency table:

              B         ¬B
    H         n_HB      n_H¬B      n_H
    ¬H        n_¬HB     n_¬H¬B     n_¬H
              n_B       n_¬B       N
21. The Tertius approach

- Suppose the formula to be evaluated is an implication H ← B
- Obtain expected frequencies from the marginals under some null-hypothesis of independence
  - e.g. μ_¬HB = n_¬H · n_B / N

              B                ¬B
    H         n_HB (μ_HB)      n_H¬B (μ_H¬B)      n_H
    ¬H        n_¬HB (μ_¬HB)    n_¬H¬B (μ_¬H¬B)    n_¬H
              n_B              n_¬B               N
22. The Tertius approach

- Suppose the formula to be evaluated is an implication H ← B
- Define confirmation in terms of the difference between the expected frequency of counter-instances μ_¬HB and the observed frequency n_¬HB

              B                ¬B
    H         n_HB (μ_HB)      n_H¬B (μ_H¬B)      n_H
    ¬H        n_¬HB (μ_¬HB)    n_¬H¬B (μ_¬H¬B)    n_¬H
              n_B              n_¬B               N
23. Motivation

- Many existing measures take only part of the contingency table into account
  - Precision = n_HB / n_B = p(H|B)
  - Recall or Sensitivity = n_HB / n_H = p(B|H)
  - Specificity = n_¬H¬B / n_¬H = p(¬B|¬H)
- In knowledge discovery we need to consider the whole table, without introducing false symmetries
  - compare rules with different heads and bodies
24. Novelty and satisfaction

- Novelty is defined as the relative decrease in counter-instances from expected to observed:
  - novelty(H ← B) = (μ_¬HB − n_¬HB) / N, with −0.25 ≤ novelty ≤ 0.25
  - equivalently, novelty = p(HB) − p(H) · p(B)
- Satisfaction is defined as the proportion of expected counter-instances that are not observed:
  - satisfaction(H ← B) = (μ_¬HB − n_¬HB) / μ_¬HB, with 0 ≤ satisfaction ≤ 1; satisfaction = 1 iff n_¬HB = 0
  - equivalently, satisfaction = (p(H|B) − p(H)) / (1 − p(H))
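Both measures are easily computed from the contingency table. A minimal sketch in Prolog (the predicate names are illustrative, not part of Tertius):

  % novelty and satisfaction of H ← B from the table counts:
  % NotHB = n_¬HB (observed counter-instances), NotH = n_¬H,
  % NB = n_B, N = total number of examples.
  novelty(NotHB, NotH, NB, N, Nov) :-
      Mu is NotH * NB / N,           % expected counter-instances μ_¬HB
      Nov is (Mu - NotHB) / N.

  satisfaction(NotHB, NotH, NB, N, Sat) :-
      Mu is NotH * NB / N,
      Mu > 0,
      Sat is (Mu - NotHB) / Mu.

For clause 1 on slide 6 (n_¬HB = 0, n_¬H = 9, n_B = 12, N = 24) this gives μ_¬HB = 4.5, novelty = 0.1875 and satisfaction = 1.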
25. Confirmation

- Confirmation trades off novelty and satisfaction:
  - novelty · satisfaction = (μ_¬HB − n_¬HB)² / (N · μ_¬HB), which is the contribution of the ¬HB cell to χ²/N
- Theorem: conf(H ← B) is the lowest value of √χ² given n_¬HB, μ_¬HB and N
26-28. Trains again

(.81 .00) eastbound(A) :- hasCar(A,B), clength(B,short), not croof(B,none).
(.62 .20) eastbound(A) :- hasCar(A,B), cshape(B,rect), clength(B,short).
(.61 .05) eastbound(A) :- hasCar(A,B), hasLoad(B,C), not cshape(B,u_sh), lshape(C,tria).
(.55 .00) eastbound(A) :- hasCar(A,B), clength(B,short), croof(B,flat).
(.53 .25) eastbound(A) :- hasCar(A,B), clength(B,short), not cshape(B,u_sh).
(.51 .05) eastbound(A) :- hasCar(A,B), hasLoad(B,C), cshape(B,rect), lshape(C,tria).
(.51 .20) eastbound(A) :- hasCar(A,B), hasLoad(B,C), not croof(B,none), not lshape(C,rect).
(.51 .20) eastbound(A) :- hasCar(A,B), hasLoad(B,C), not croof(B,none), not lnumber(C,3).
Contingency table for the first clause, (.81 .00), with expected frequencies in parentheses:

                     car(A,B), short(B), not none(B)    ¬(...)
    eastbound(A)      8 (4)                              2 (6)    10
    ¬eastbound(A)     0 (4)                             10 (6)    10
                      8                                 12        20
Contingency table for the second clause, (.62 .20), with expected frequencies in parentheses:

                     car(A,B), rect(B), short(B)    ¬(...)
    eastbound(A)     10 (7)                          0 (3)    10
    ¬eastbound(A)     4 (7)                          6 (3)    10
                     14                              6        20
29. Confirmation as search heuristic

- Theorem: if H' ← B' is an admissible specialisation of H ← B, then conf(H' ← B') ≤ (N − n_H'¬B') / (N − n_H¬B)
Original table (expected frequencies in parentheses):

            B        ¬B
    H       2 (3)    4 (3)    6
    ¬H      3 (2)    1 (2)    4
            5        5        10

Specialised table (expected frequencies in parentheses):

            B'          ¬B'
    H'      3 (2.1)     4 (4.9)    7
    ¬H'     0 (0.9)     3 (2.1)    3
            3           7          10
30. Upgrading to first-order logic

- Use function-free Prolog as representation language
  - normal-form logic, simple syntax
  - specialisation well understood
- For rule evaluation, generate all grounding substitutions
  - specialisation may increase the sample size
  - if problematic, use first-order features and count only over global variables
31. First-order features

- Features concern interactions of local variables
- The following rule has one boolean feature, "has a short closed car":
  eastbound(T) :- hasCar(T,C), clength(C,short), not croof(C,none).
- The following rule has two boolean features, "has a short car" and "has a closed car":
  eastbound(T) :- hasCar(T,C1), clength(C1,short), hasCar(T,C2), not croof(C2,none).
32. Propositionalising rules

- Equivalently (see the query sketch below):
  eastbound(T) :- hasShortCar(T), hasClosedCar(T).
  hasShortCar(T) :- hasCar(T,C1), clength(C1,short).
  hasClosedCar(T) :- hasCar(T,C2), not croof(C2,none).
- Given a way to construct and select first-order features, rule construction is semi-propositional
  - head and body literals have the same global variable(s)
  - corresponds to a single table, one row per example
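As an illustration (not on the slides), evaluating these feature definitions against the background facts for train t1 on slides 8-9:

  % With the t1 facts and the three clauses above loaded:
  ?- hasShortCar(t1).     % true via C1 = c1 (or c3)
  ?- hasClosedCar(t1).    % true via C2 = c3, since croof(c3,peak)
  ?- eastbound(t1).       % hence true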
33. First-order feature bias in Tertius

- Flattened representation, but derived from a strongly-typed term representation
  - one free global variable
  - each (binary) structural predicate introduces a new existential local variable, and uses either the global variable or a local variable introduced by another structural predicate
  - utility predicates only use variables
  - all variables are used
- NB: features can be non-boolean
  - if all structural predicates are one-to-one
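For instance, the following feature of trains (an illustrative one, not from the slides) conforms to this bias:

  % "has a car carrying a triangular load": T is the global variable;
  % the structural predicates hasCar/2 and hasLoad/2 introduce the
  % local variables C and L, and the utility predicate lshape/2 uses L.
  hasTriaLoadCar(T) :- hasCar(T,C), hasLoad(C,L), lshape(L,tria).

On the t1 facts of slides 8-9 it succeeds via car c3 and its load l3.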
34. The Tertius system

- A*, anytime top-down search algorithm
  - optimal refinement operator
- 7500 lines of GNU C
- Propositional Weka plug-in available
- P.A. Flach & N. Lachiche (2001). Confirmation-guided discovery of first-order rules with Tertius. Machine Learning 42(1/2):61-95.
- www.cs.bris.ac.uk/Research/MachineLearning/Tertius/
35. Outline of the talk
- Introduction
- First-order rule discovery
- Subgroup discovery
- Concluding remarks
36. Subgroups vs. classifiers

- Classification rules aim at pure subgroups
- Subgroups aim at a significantly higher (or different) proportion of positives
  - essentially the same as cost-sensitive classification
  - instead of a false-negative cost we have a true-positive profit
37. ROC space

- True positive rate = true positives / positives
  - TP1 = 40/50 = 80%
  - TP2 = 30/50 = 60%
- False positive rate = false positives / negatives
  - FP1 = 10/50 = 20%
  - FP2 = 0/50 = 0%
- ROC space has FP rate on the X axis and TP rate on the Y axis
38-39. The ROC convex hull
40-41. Choosing a classifier
42. Weighted Relative Accuracy

WRAcc(Class ← Condition) = p(Condition) · (p(Class|Condition) − p(Class)) ∝ TPrate − FPrate
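The proportionality to TPrate − FPrate follows by rewriting (with H = Class, B = Condition; a standard derivation, spelled out here for completeness):

  \begin{align*}
  \mathrm{WRAcc} &= p(B)\bigl(p(H \mid B) - p(H)\bigr) = p(HB) - p(H)\,p(B) \\
   &= \frac{TP}{N} - \frac{Pos}{N}\cdot\frac{TP+FP}{N}
    = \frac{TP\cdot Neg - FP\cdot Pos}{N^{2}} \\
   &= \frac{Pos\cdot Neg}{N^{2}}\Bigl(\frac{TP}{Pos} - \frac{FP}{Neg}\Bigr)
    = p(H)\,p(\neg H)\,(\mathit{TPrate} - \mathit{FPrate})
  \end{align*}

Note that p(HB) − p(H)p(B) is exactly the novelty measure of slide 24.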
43. Subgroup discovery with CN2-SD

- Weighted covering algorithm (weight update sketched below)
  - a covered example is not removed, but its weight is decreased
  - probability estimates take weights into account
- Weighted relative accuracy as search heuristic
  - trades off rule precision and generality
  - results in fewer, more general, overlapping rules
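A minimal sketch of a multiplicative weight update for weighted covering (the predicate name and the factor Gamma are illustrative; CN2-SD also supports an additive weighting scheme):

  :- dynamic weight/2.        % weight(Example, W), initially W = 1

  % After a new rule covers Example, multiply its weight by a fixed
  % Gamma in (0,1), so that later rules focus on uncovered examples.
  decrease_weight(Example, Gamma) :-
      retract(weight(Example, W0)),
      W is Gamma * W0,
      assertz(weight(Example, W)).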
44. Subgroup evaluation method 1
Each point corresponds to a subgroup description
45. Subgroup evaluation method 2

- Use all discovered rules together as a probabilistic classifier
- Order all test instances by decreasing predicted probability of being positive
- Draw a ROC curve as follows (sketched below):
  - start in (0,0)
  - if the next instance is positive move up, else move right
  - until (1,1) is reached
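A minimal sketch of this curve construction in Prolog (illustrative predicate names; the input is the list of true class labels, pos or neg, already sorted by decreasing predicted probability):

  % roc_curve(+Labels, -Points): Points are FP-TP count pairs tracing
  % the curve from 0-0; dividing by the totals of negatives and
  % positives normalises the curve to end in (1,1).
  roc_curve(Labels, [0-0|Points]) :-
      roc_points(Labels, 0, 0, Points).

  roc_points([], _, _, []).
  roc_points([pos|Rest], FP, TP0, [FP-TP|Points]) :-
      TP is TP0 + 1,                      % positive instance: move up
      roc_points(Rest, FP, TP, Points).
  roc_points([neg|Rest], FP0, TP, [FP-TP|Points]) :-
      FP is FP0 + 1,                      % negative instance: move right
      roc_points(Rest, FP, TP, Points).

For example, roc_curve([pos,pos,neg,pos], Ps) gives Ps = [0-0, 0-1, 0-2, 1-2, 1-3].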
46. Subgroup evaluation method 2
Each point on the curve corresponds to a
probability threshold
47. Example: Australian (UCI)
48. Concluding remarks

- Confirmation-guided rule discovery
  - new heuristic especially suited for knowledge discovery
  - optimal A* search implemented in Tertius
- ROC analysis is very natural for subgroup discovery
  - here used for evaluation
  - can also be used as a search heuristic
- Joint work with Nicolas Lachiche, Nada Lavrač, John Lloyd, and others
49. Questions?