Title: Descriptive data mining: some current issues
1. Descriptive data mining: some current issues
- Peter A. Flach
- Department of Computer Science
- University of Bristol
- www.cs.bris.ac.uk/flach/
2. Outline of the talk
- Introduction
- First-order rule discovery
- Subgroup discovery
- Concluding remarks
3. Contact lens dataset
4Contact lense decision list
IF Tear-production reduced THEN Lenses none
(12) ELSE / Tear-production normal / IF
Astigmatic no THEN Lenses soft (6/1)
ELSE / Astigmatic yes / IF
Spectacles myope THEN Lenses hard (3)
ELSE / Spectacles hypermetrope /
Lenses none (3/1) Confusion Matrix
a b c lt-- classified as 5 0 0 a
soft 0 3 1 b hard 1 0 14 c none
5. Contact lens association rules

1.  (1.00) Tear-production=reduced 12 => Lenses=none 12
2.  (1.00) Astigmatic=yes Tear-production=reduced 6 => Lenses=none 6
3.  (1.00) Astigmatic=no Tear-production=reduced 6 => Lenses=none 6
4.  (1.00) Spectacles=hypermetrope Tear-production=reduced 6 => Lenses=none 6
5.  (1.00) Spectacles=myope Tear-production=reduced 6 => Lenses=none 6
6.  (1.00) Lenses=soft 5 => Astigmatic=no Tear-production=normal 5
7.  (1.00) Astigmatic=no Lenses=soft 5 => Tear-production=normal 5
8.  (1.00) Tear-production=normal Lenses=soft 5 => Astigmatic=no 5
9.  (1.00) Lenses=soft 5 => Tear-production=normal 5
10. (1.00) Lenses=soft 5 => Astigmatic=no 5
11. (0.86) Astigmatic=no Lenses=none 7 => Tear-production=reduced 6
12. (0.86) Spectacles=myope Lenses=none 7 => Tear-production=reduced 6
13. (0.83) Astigmatic=no Tear-production=normal 6 => Lenses=soft 5
14. (0.83) Spectacles=hypermetrope Astigmatic=yes 6 => Lenses=none 5
15. (0.80) Lenses=none 15 => Tear-production=reduced 12
16. (0.75) Astigmatic=yes Lenses=none 8 => Tear-production=reduced 6
17. (0.75) Spectacles=hypermetrope Lenses=none 8 => Tear-production=reduced 6
18. (0.75) Age=presbyopic 8 => Lenses=none 6
Contingency table for rule 1 (H = Lenses=none, B = Tear-production=reduced):

          B   ¬B
   H     12    3   15
   ¬H     0    9    9
         12   12   24

Contingency table for rule 18 (H = Lenses=none, B = Age=presbyopic):

          B   ¬B
   H      6    9   15
   ¬H     2    7    9
          8   16   24
6. Clauses sorted by confirmation

Each clause is preceded by its confirmation and its fraction of counter-instances:

1.  (.76 .00) Tear-prod=reduced => Lenses=none
2.  (.76 .12) Lenses=none => Tear-prod=reduced
3.  (.67 .04) Lenses=none => Age=presb or Tear-prod=reduced
4.  (.63 .04) Astigm=no and Tear-prod=normal => Lenses=soft
5.  (.54 .00) Astigm=no and Tear-prod=normal => Age=presb or Lenses=soft
6.  (.50 .08) Astigm=yes and Tear-prod=normal => Lenses=hard
7.  (.50 .08) Lenses=none => Age=pre-presb or Tear-prod=reduced
8.  (.47 .04) Lenses=none => Specs=hmetr or Tear-prod=reduced
9.  (.47 .04) Lenses=none => Astigm=yes or Tear-prod=reduced
10. (.47 .00) Lenses=soft => Astigm=no
11. (.47 .00) Lenses=soft => Tear-prod=normal
12. (.47 .00) Specs=myope and Astigm=yes and Tear-prod=normal => Lenses=hard
13. (.47 .00) Lenses=none => Age=presb or Specs=hmetr or Tear-prod=reduced
14. (.47 .00) Lenses=none => Age=presb or Astigm=yes or Tear-prod=reduced
15. (.45 .00) Specs=hmetr and Astigm=no and Tear-prod=normal => Lenses=soft
16. (.44 .29) Astigm=no => Lenses=soft
17. (.44 .29) Tear-prod=normal => Lenses=soft
Contingency table for clause 1 (H = Lenses=none, B = Tear-prod=reduced):

          B   ¬B
   H     12    3   15
   ¬H     0    9    9
         12   12   24

Contingency table for clauses 16 and 17, which have identical tables (H = Lenses=soft):

          B   ¬B
   H      5    0    5
   ¬H     7   12   19
         12   12   24
7. A toy example
8-9. East-West trains (flattened)

- Example:
  eastbound(t1).
- Background knowledge:
  hasCar(t1,c1). hasCar(t1,c2). hasCar(t1,c3). hasCar(t1,c4).
  cshape(c1,rect). cshape(c2,rect). cshape(c3,rect). cshape(c4,rect).
  clength(c1,short). clength(c2,long). clength(c3,short). clength(c4,long).
  croof(c1,none). croof(c2,none). croof(c3,peak). croof(c4,none).
  cwheels(c1,2). cwheels(c2,3). cwheels(c3,2). cwheels(c4,2).
  hasLoad(c1,l1). hasLoad(c2,l2). hasLoad(c3,l3). hasLoad(c4,l4).
  lshape(l1,circ). lshape(l2,hexa). lshape(l3,tria). lshape(l4,rect).
  lnumber(l1,1). lnumber(l2,1). lnumber(l3,1). lnumber(l4,3).
- Hypothesis:
  eastbound(T) :- hasCar(T,C), clength(C,short), not croof(C,none).
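To see the hypothesis at work, here is an illustrative query trace (not on the slides) against the background facts above:

  % With the facts and the hypothesis clause loaded:
  ?- eastbound(t1).
  % succeeds with C = c3: clength(c3,short) holds, and
  % croof(c3,peak) makes the negated literal "not croof(c3,none)" true.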
10-11. East-West trains (terms)

- Example (the train is a list of car terms):
  eastbound([car(rect,short,none,2,load(circ,1)),
             car(rect,long,none,3,load(hexa,1)),
             car(rect,short,peak,2,load(tria,1)),
             car(rect,long,none,2,load(rect,3))]).
- Background knowledge: member/2, arg/3
- Hypothesis:
  eastbound(T) :- member(C,T), arg(2,C,short), not arg(3,C,none).
12. ER diagram for East-West trains
13Train-as-set database
SELECT DISTINCT TRAIN_TABLE.TRAIN FROM
TRAIN_TABLE, CAR_TABLE WHERE TRAIN_TABLE.TRAIN
CAR_TABLE.TRAIN AND CAR_TABLE.SHAPE 'short'
AND CAR_TABLE.ROOF ! 'none'
14. Individual-centred representations

- ER diagram is a tree (approximately)
  - root denotes the individual
  - looking downwards from the root, only one-to-one or one-to-many relations are allowed
  - one-to-one cycles are allowed
- Database can be partitioned according to individual
- Alternative: all information about a single individual packed together in a term
  - tuples, lists, trees, sets, multisets, graphs, ...
15. Mutagenesis
16. Complexity of learning problems

- Simplest case: single table with primary key
  - attribute-value or propositional learning
  - example corresponds to a tuple of constants
- Next: single table without primary key
  - multi-instance problem
  - example corresponds to a set of tuples of constants
- Complexity resides in many-to-one foreign keys
  - non-determinate variables
  - lists, trees, sets, multisets, graphs, ...
17. Subgroup discovery

- An interesting subgroup has a class distribution that differs significantly from the overall distribution
- This can be modelled as classification with profits (for true positives/negatives) and costs (for false positives/negatives)
- Requires different heuristics and/or a trade-off between accuracy and generality
18. Evaluation metrics
19. Outline of the talk
- Introduction
- First-order rule discovery
- Subgroup discovery
- Concluding remarks
20. The Tertius approach

- Suppose the formula to be evaluated is an implication H ← B
- Determine the truth values of H and B for each example, and organise the observed frequencies in a contingency table:

              B         ¬B
    H         n_HB      n_H¬B      n_H
    ¬H        n_¬HB     n_¬H¬B     n_¬H
              n_B       n_¬B       N
21. The Tertius approach

- Suppose the formula to be evaluated is an implication H ← B
- Obtain expected frequencies from the marginals under some null-hypothesis of independence
  - e.g. μ_¬HB = n_¬H · n_B / N

              B                ¬B
    H         n_HB (μ_HB)      n_H¬B (μ_H¬B)      n_H
    ¬H        n_¬HB (μ_¬HB)    n_¬H¬B (μ_¬H¬B)    n_¬H
              n_B              n_¬B               N
22. The Tertius approach

- Suppose the formula to be evaluated is an implication H ← B
- Define confirmation in terms of the difference between the expected frequency of counter-instances μ_¬HB and the observed frequency n_¬HB

              B                ¬B
    H         n_HB (μ_HB)      n_H¬B (μ_H¬B)      n_H
    ¬H        n_¬HB (μ_¬HB)    n_¬H¬B (μ_¬H¬B)    n_¬H
              n_B              n_¬B               N
23. Motivation

- Many existing measures take only part of the contingency table into account
  - Precision = n_HB / n_B = p(H|B)
  - Recall or Sensitivity = n_HB / n_H = p(B|H)
  - Specificity = n_¬H¬B / n_¬H = p(¬B|¬H)
- In knowledge discovery we need to consider the whole table, without introducing false symmetries
  - compare rules with different heads and bodies
24. Novelty and satisfaction

- Novelty is defined as the relative decrease in counter-instances from expected to observed:
  - novelty(H ← B) = (μ_¬HB − n_¬HB) / N, with −0.25 ≤ novelty ≤ 0.25
  - equivalently, novelty = p(HB) − p(H) · p(B)
- Satisfaction is defined as the proportion of expected counter-instances that are not observed:
  - satisfaction(H ← B) = (μ_¬HB − n_¬HB) / μ_¬HB, with 0 ≤ satisfaction ≤ 1; satisfaction = 1 iff n_¬HB = 0
  - equivalently, satisfaction = (p(H|B) − p(H)) / (1 − p(H))
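Both measures are easily computed from the contingency table. A minimal sketch in Prolog (the predicate names are illustrative, not part of Tertius):

  % novelty and satisfaction of H ← B from the table counts:
  % NotHB = n_¬HB (observed counter-instances), NotH = n_¬H,
  % NB = n_B, N = total number of examples.
  novelty(NotHB, NotH, NB, N, Nov) :-
      Mu is NotH * NB / N,           % expected counter-instances μ_¬HB
      Nov is (Mu - NotHB) / N.

  satisfaction(NotHB, NotH, NB, N, Sat) :-
      Mu is NotH * NB / N,
      Mu > 0,
      Sat is (Mu - NotHB) / Mu.

For clause 1 on slide 6 (n_¬HB = 0, n_¬H = 9, n_B = 12, N = 24) this gives μ_¬HB = 4.5, novelty = 0.1875 and satisfaction = 1.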
25. Confirmation

- Confirmation trades off novelty and satisfaction:
  - novelty · satisfaction = (μ_¬HB − n_¬HB)² / (N · μ_¬HB), which is the contribution of the ¬HB cell to χ²/N
- Theorem: conf(H ← B) is the lowest value of √χ² given n_¬HB, μ_¬HB and N
26-28. Trains again

(.81 .00) eastbound(A) :- hasCar(A,B), clength(B,short), not croof(B,none).
(.62 .20) eastbound(A) :- hasCar(A,B), cshape(B,rect), clength(B,short).
(.61 .05) eastbound(A) :- hasCar(A,B), hasLoad(B,C), not cshape(B,u_sh), lshape(C,tria).
(.55 .00) eastbound(A) :- hasCar(A,B), clength(B,short), croof(B,flat).
(.53 .25) eastbound(A) :- hasCar(A,B), clength(B,short), not cshape(B,u_sh).
(.51 .05) eastbound(A) :- hasCar(A,B), hasLoad(B,C), cshape(B,rect), lshape(C,tria).
(.51 .20) eastbound(A) :- hasCar(A,B), hasLoad(B,C), not croof(B,none), not lshape(C,rect).
(.51 .20) eastbound(A) :- hasCar(A,B), hasLoad(B,C), not croof(B,none), not lnumber(C,3).
Contingency table for the first clause, (.81 .00), with expected frequencies in parentheses:

                     car(A,B), short(B), not none(B)    ¬(...)
    eastbound(A)      8 (4)                              2 (6)    10
    ¬eastbound(A)     0 (4)                             10 (6)    10
                      8                                 12        20
Contingency table for the second clause, (.62 .20), with expected frequencies in parentheses:

                     car(A,B), rect(B), short(B)    ¬(...)
    eastbound(A)     10 (7)                          0 (3)    10
    ¬eastbound(A)     4 (7)                          6 (3)    10
                     14                              6        20
29. Confirmation as search heuristic

- Theorem: if H' ← B' is an admissible specialisation of H ← B, then conf(H' ← B') ≤ (N − n_H'¬B') / (N − n_H¬B)
Original table (expected frequencies in parentheses):

            B        ¬B
    H       2 (3)    4 (3)    6
    ¬H      3 (2)    1 (2)    4
            5        5        10

Specialised table (expected frequencies in parentheses):

            B'          ¬B'
    H'      3 (2.1)     4 (4.9)    7
    ¬H'     0 (0.9)     3 (2.1)    3
            3           7          10
30. Upgrading to first-order logic

- Use function-free Prolog as representation language
  - normal-form logic, simple syntax
  - specialisation well understood
- For rule evaluation, generate all grounding substitutions
  - specialisation may increase the sample size
  - if problematic, use first-order features and count only over global variables
31. First-order features

- Features concern interactions of local variables
- The following rule has one boolean feature, "has a short closed car":
  eastbound(T) :- hasCar(T,C), clength(C,short), not croof(C,none).
- The following rule has two boolean features, "has a short car" and "has a closed car":
  eastbound(T) :- hasCar(T,C1), clength(C1,short), hasCar(T,C2), not croof(C2,none).
32. Propositionalising rules

- Equivalently (see the query sketch below):
  eastbound(T) :- hasShortCar(T), hasClosedCar(T).
  hasShortCar(T) :- hasCar(T,C1), clength(C1,short).
  hasClosedCar(T) :- hasCar(T,C2), not croof(C2,none).
- Given a way to construct and select first-order features, rule construction is semi-propositional
  - head and body literals have the same global variable(s)
  - corresponds to a single table, one row per example
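As an illustration (not on the slides), evaluating these feature definitions against the background facts for train t1 on slides 8-9:

  % With the t1 facts and the three clauses above loaded:
  ?- hasShortCar(t1).     % true via C1 = c1 (or c3)
  ?- hasClosedCar(t1).    % true via C2 = c3, since croof(c3,peak)
  ?- eastbound(t1).       % hence true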
33. First-order feature bias in Tertius

- Flattened representation, but derived from a strongly-typed term representation
  - one free global variable
  - each (binary) structural predicate introduces a new existential local variable, and uses either the global variable or a local variable introduced by another structural predicate
  - utility predicates only use variables
  - all variables are used
- NB: features can be non-boolean
  - if all structural predicates are one-to-one
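For instance, the following feature of trains (an illustrative one, not from the slides) conforms to this bias:

  % "has a car carrying a triangular load": T is the global variable;
  % the structural predicates hasCar/2 and hasLoad/2 introduce the
  % local variables C and L, and the utility predicate lshape/2 uses L.
  hasTriaLoadCar(T) :- hasCar(T,C), hasLoad(C,L), lshape(L,tria).

On the t1 facts of slides 8-9 it succeeds via car c3 and its load l3.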
34. The Tertius system

- A*, anytime top-down search algorithm
  - optimal refinement operator
- 7500 lines of GNU C
- Propositional Weka plug-in available
- P.A. Flach & N. Lachiche (2001). Confirmation-guided discovery of first-order rules with Tertius. Machine Learning 42(1/2):61-95.
- www.cs.bris.ac.uk/Research/MachineLearning/Tertius/
35. Outline of the talk
- Introduction
- First-order rule discovery
- Subgroup discovery
- Concluding remarks
36. Subgroups vs. classifiers

- Classification rules aim at pure subgroups
- Subgroups aim at a significantly higher (or different) proportion of positives
  - essentially the same as cost-sensitive classification
  - instead of a false-negative cost we have a true-positive profit
37. ROC space

- True positive rate = true positives / positives
  - TP1 = 40/50 = 80%
  - TP2 = 30/50 = 60%
- False positive rate = false positives / negatives
  - FP1 = 10/50 = 20%
  - FP2 = 0/50 = 0%
- ROC space has FP rate on the X axis and TP rate on the Y axis
38-39. The ROC convex hull
40-41. Choosing a classifier
42. Weighted Relative Accuracy

WRAcc(Class ← Condition) = p(Condition) · (p(Class|Condition) − p(Class)) ∝ TPrate − FPrate
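The proportionality to TPrate − FPrate follows by rewriting (with H = Class, B = Condition; a standard derivation, spelled out here for completeness):

  \begin{align*}
  \mathrm{WRAcc} &= p(B)\bigl(p(H \mid B) - p(H)\bigr) = p(HB) - p(H)\,p(B) \\
   &= \frac{TP}{N} - \frac{Pos}{N}\cdot\frac{TP+FP}{N}
    = \frac{TP\cdot Neg - FP\cdot Pos}{N^{2}} \\
   &= \frac{Pos\cdot Neg}{N^{2}}\Bigl(\frac{TP}{Pos} - \frac{FP}{Neg}\Bigr)
    = p(H)\,p(\neg H)\,(\mathit{TPrate} - \mathit{FPrate})
  \end{align*}

Note that p(HB) − p(H)p(B) is exactly the novelty measure of slide 24.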
43. Subgroup discovery with CN2-SD

- Weighted covering algorithm (weight update sketched below)
  - a covered example is not removed, but its weight is decreased
  - probability estimates take weights into account
- Weighted relative accuracy as search heuristic
  - trades off rule precision and generality
  - results in fewer, more general, overlapping rules
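A minimal sketch of a multiplicative weight update for weighted covering (the predicate name and the factor Gamma are illustrative; CN2-SD also supports an additive weighting scheme):

  :- dynamic weight/2.        % weight(Example, W), initially W = 1

  % After a new rule covers Example, multiply its weight by a fixed
  % Gamma in (0,1), so that later rules focus on uncovered examples.
  decrease_weight(Example, Gamma) :-
      retract(weight(Example, W0)),
      W is Gamma * W0,
      assertz(weight(Example, W)).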
44. Subgroup evaluation method 1
Each point corresponds to a subgroup description
45. Subgroup evaluation method 2

- Use all discovered rules together as a probabilistic classifier
- Order all test instances by decreasing predicted probability of being positive
- Draw a ROC curve as follows (sketched below):
  - start in (0,0)
  - if the next instance is positive move up, else move right
  - until (1,1) is reached
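A minimal sketch of this curve construction in Prolog (illustrative predicate names; the input is the list of true class labels, pos or neg, already sorted by decreasing predicted probability):

  % roc_curve(+Labels, -Points): Points are FP-TP count pairs tracing
  % the curve from 0-0; dividing by the totals of negatives and
  % positives normalises the curve to end in (1,1).
  roc_curve(Labels, [0-0|Points]) :-
      roc_points(Labels, 0, 0, Points).

  roc_points([], _, _, []).
  roc_points([pos|Rest], FP, TP0, [FP-TP|Points]) :-
      TP is TP0 + 1,                      % positive instance: move up
      roc_points(Rest, FP, TP, Points).
  roc_points([neg|Rest], FP0, TP, [FP-TP|Points]) :-
      FP is FP0 + 1,                      % negative instance: move right
      roc_points(Rest, FP, TP, Points).

For example, roc_curve([pos,pos,neg,pos], Ps) gives Ps = [0-0, 0-1, 0-2, 1-2, 1-3].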
46. Subgroup evaluation method 2
Each point on the curve corresponds to a
probability threshold
47. Example: Australian (UCI)
48. Concluding remarks

- Confirmation-guided rule discovery
  - new heuristic especially suited for knowledge discovery
  - optimal A* search implemented in Tertius
- ROC analysis is very natural for subgroup discovery
  - here used for evaluation
  - can also be used as a search heuristic
- Joint work with Nicolas Lachiche, Nada Lavrač, John Lloyd, and others
49. Questions?