Transcript and Presenter's Notes

Title: Descriptive data mining: some current issues


1
Descriptive data mining: some current issues
  • Peter A. Flach
  • Department of Computer Science
  • University of Bristol
  • www.cs.bris.ac.uk/flach/

2
Outline of the talk
  • Introduction
  • First-order rule discovery
  • Subgroup discovery
  • Concluding remarks

3
Contact lens dataset
4
Contact lens decision list
IF Tear-production = reduced THEN Lenses = none (12)
ELSE /* Tear-production = normal */
  IF Astigmatic = no THEN Lenses = soft (6/1)
  ELSE /* Astigmatic = yes */
    IF Spectacles = myope THEN Lenses = hard (3)
    ELSE /* Spectacles = hypermetrope */ Lenses = none (3/1)

Confusion matrix:
  a  b  c   <-- classified as
  5  0  0 |  a = soft
  0  3  1 |  b = hard
  1  0 14 |  c = none
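
A minimal sketch (Python; the function name and value encoding are illustrative) of how this decision list classifies a single example:

def classify(tear_production, astigmatic, spectacles):
    # Contact lens decision list from this slide, written as nested tests.
    if tear_production == "reduced":
        return "none"                     # covers 12 examples
    # tear_production == "normal"
    if astigmatic == "no":
        return "soft"                     # covers 6 examples, 1 error
    # astigmatic == "yes"
    if spectacles == "myope":
        return "hard"                     # covers 3 examples
    # spectacles == "hypermetrope"
    return "none"                         # covers 3 examples, 1 error

print(classify("normal", "no", "myope"))  # -> soft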
5
Contact lens association rules
 1. (1.00) Tear-production=reduced 12 ==> Lenses=none 12
 2. (1.00) Astigmatic=yes Tear-production=reduced 6 ==> Lenses=none 6
 3. (1.00) Astigmatic=no Tear-production=reduced 6 ==> Lenses=none 6
 4. (1.00) Spectacles=hypermetrope Tear-production=reduced 6 ==> Lenses=none 6
 5. (1.00) Spectacles=myope Tear-production=reduced 6 ==> Lenses=none 6
 6. (1.00) Lenses=soft 5 ==> Astigmatic=no Tear-production=normal 5
 7. (1.00) Astigmatic=no Lenses=soft 5 ==> Tear-production=normal 5
 8. (1.00) Tear-production=normal Lenses=soft 5 ==> Astigmatic=no 5
 9. (1.00) Lenses=soft 5 ==> Tear-production=normal 5
10. (1.00) Lenses=soft 5 ==> Astigmatic=no 5
11. (0.86) Astigmatic=no Lenses=none 7 ==> Tear-production=reduced 6
12. (0.86) Spectacles=myope Lenses=none 7 ==> Tear-production=reduced 6
13. (0.83) Astigmatic=no Tear-production=normal 6 ==> Lenses=soft 5
14. (0.83) Spectacles=hypermetrope Astigmatic=yes 6 ==> Lenses=none 5
15. (0.80) Lenses=none 15 ==> Tear-production=reduced 12
16. (0.75) Astigmatic=yes Lenses=none 8 ==> Tear-production=reduced 6
17. (0.75) Spectacles=hypermetrope Lenses=none 8 ==> Tear-production=reduced 6
18. (0.75) Age=presbyopic 8 ==> Lenses=none 6

        B    ¬B
 H     12     3   15
 ¬H     0     9    9
       12    12   24

        B    ¬B
 H      6     9   15
 ¬H     2     7    9
        8    16   24
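
The value in brackets is each rule's confidence: the fraction of examples matching the body that also match the head. A small check (Python), using the counts of rule 11:

# Confidence = support(body and head) / support(body).
# Rule 11: 7 examples with Astigmatic=no and Lenses=none,
# of which 6 also have Tear-production=reduced.
n_body = 7
n_body_and_head = 6
print(round(n_body_and_head / n_body, 2))   # 0.86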
6
Clauses sorted by confirmation
 1. (.76 .00) Tear-prod=reduced ==> Lenses=none
 2. (.76 .12) Lenses=none ==> Tear-prod=reduced
 3. (.67 .04) Lenses=none ==> Age=presb or Tear-prod=reduced
 4. (.63 .04) Astigm=no and Tear-prod=normal ==> Lenses=soft
 5. (.54 .00) Astigm=no and Tear-prod=normal ==> Age=presb or Lenses=soft
 6. (.50 .08) Astigm=yes and Tear-prod=normal ==> Lenses=hard
 7. (.50 .08) Lenses=none ==> Age=pre-presb or Tear-prod=reduced
 8. (.47 .04) Lenses=none ==> Specs=hmetr or Tear-prod=reduced
 9. (.47 .04) Lenses=none ==> Astigm=yes or Tear-prod=reduced
10. (.47 .00) Lenses=soft ==> Astigm=no
11. (.47 .00) Lenses=soft ==> Tear-prod=normal
12. (.47 .00) Specs=myope and Astigm=yes and Tear-prod=normal ==> Lenses=hard
13. (.47 .00) Lenses=none ==> Age=presb or Specs=hmetr or Tear-prod=reduced
14. (.47 .00) Lenses=none ==> Age=presb or Astigm=yes or Tear-prod=reduced
15. (.45 .00) Specs=hmetr and Astigm=no and Tear-prod=normal ==> Lenses=soft
16. (.44 .29) Astigm=no ==> Lenses=soft
17. (.44 .29) Tear-prod=normal ==> Lenses=soft

        B    ¬B
 H     12     3   15
 ¬H     0     9    9
       12    12   24

        B    ¬B
 H      5     0    5
 ¬H     7    12   19
       12    12   24
7
A toy example
8
East-West trains (flattened)
  • Example: eastbound(t1).
  • Background knowledge:
    hasCar(t1,c1). hasCar(t1,c2). hasCar(t1,c3). hasCar(t1,c4).
    cshape(c1,rect). cshape(c2,rect). cshape(c3,rect). cshape(c4,rect).
    clength(c1,short). clength(c2,long). clength(c3,short). clength(c4,long).
    croof(c1,none). croof(c2,none). croof(c3,peak). croof(c4,none).
    cwheels(c1,2). cwheels(c2,3). cwheels(c3,2). cwheels(c4,2).
    hasLoad(c1,l1). hasLoad(c2,l2). hasLoad(c3,l3). hasLoad(c4,l4).
    lshape(l1,circ). lshape(l2,hexa). lshape(l3,tria). lshape(l4,rect).
    lnumber(l1,1). lnumber(l2,1). lnumber(l3,1). lnumber(l4,3).
  • Hypothesis: eastbound(T) :- hasCar(T,C), clength(C,short), not croof(C,none).

9
East-West trains (flattened)
  • Example: eastbound(t1).
  • Background knowledge:
    hasCar(t1,c1). hasCar(t1,c2). hasCar(t1,c3). hasCar(t1,c4).
    cshape(c1,rect). cshape(c2,rect). cshape(c3,rect). cshape(c4,rect).
    clength(c1,short). clength(c2,long). clength(c3,short). clength(c4,long).
    croof(c1,none). croof(c2,none). croof(c3,peak). croof(c4,none).
    cwheels(c1,2). cwheels(c2,3). cwheels(c3,2). cwheels(c4,2).
    hasLoad(c1,l1). hasLoad(c2,l2). hasLoad(c3,l3). hasLoad(c4,l4).
    lshape(l1,circ). lshape(l2,hexa). lshape(l3,tria). lshape(l4,rect).
    lnumber(l1,1). lnumber(l2,1). lnumber(l3,1). lnumber(l4,3).
  • Hypothesis: eastbound(T) :- hasCar(T,C), clength(C,short), not croof(C,none).

10
East-West trains (terms)
  • Example: eastbound([car(rect,short,none,2,load(circ,1)), car(rect,long,none,3,load(hexa,1)), car(rect,short,peak,2,load(tria,1)), car(rect,long,none,2,load(rect,3))]).
  • Background knowledge: member/2, arg/3
  • Hypothesis: eastbound(T) :- member(C,T), arg(2,C,short), not arg(3,C,none).

11
East-West trains (terms)
  • Example: eastbound([car(rect,short,none,2,load(circ,1)), car(rect,long,none,3,load(hexa,1)), car(rect,short,peak,2,load(tria,1)), car(rect,long,none,2,load(rect,3))]).
  • Background knowledge: member/2, arg/3
  • Hypothesis: eastbound(T) :- member(C,T), arg(2,C,short), not arg(3,C,none).

12
ER diagram for East-West trains
13
Train-as-set database
SELECT DISTINCT TRAIN_TABLE.TRAIN
FROM TRAIN_TABLE, CAR_TABLE
WHERE TRAIN_TABLE.TRAIN = CAR_TABLE.TRAIN
  AND CAR_TABLE.SHAPE = 'short'
  AND CAR_TABLE.ROOF != 'none'
14
Individual-centred representations
  • ER diagram is a tree (approximately)
  • root denotes individual
  • looking downwards from the root, only one-to-one
    or one-to-many relations are allowed
  • one-to-one cycles are allowed
  • Database can be partitioned according to
    individual
  • Alternative: all information about a single individual packed together in a term
  • tuples, lists, trees, sets, multisets, graphs, ...

15
Mutagenesis
16
Complexity of learning problems
  • Simplest case: single table with primary key
  • attribute-value or propositional learning
  • example corresponds to a tuple of constants
  • Next: single table without primary key
  • multi-instance problem
  • example corresponds to a set of tuples of constants
  • Complexity resides in many-to-one foreign keys
  • non-determinate variables
  • lists, trees, sets, multisets, graphs, ...

17
Subgroup discovery
  • An interesting subgroup has a class distribution
    which differs significantly from overall
    distribution
  • This can be modelled as classification with
    profits (for true pos/neg) and costs (for false
    pos/neg)
  • Requires different heuristics and/or trade-off
    between accuracy and generality

18
Evaluation metrics
19
Outline of the talk
  • Introduction
  • First-order rule discovery
  • Subgroup discovery
  • Concluding remarks

20
The Tertius approach
  • Suppose the formula to be evaluated is an implication H ← B
  • determine the truth values of H and B for each example, and organise the observed frequencies in a contingency table

         B         ¬B
 H      n_HB      n_H¬B      n_H
 ¬H     n_¬HB     n_¬H¬B     n_¬H
         n_B       n_¬B       N
21
The Tertius approach
  • Suppose the formula to be evaluated is an implication H ← B
  • obtain expected frequencies from the marginals under some null-hypothesis of independence
  • e.g. μ_¬HB = n_¬H n_B / N

         B                 ¬B
 H      n_HB (μ_HB)       n_H¬B (μ_H¬B)       n_H
 ¬H     n_¬HB (μ_¬HB)     n_¬H¬B (μ_¬H¬B)     n_¬H
         n_B               n_¬B                N
22
The Tertius approach
  • Suppose the formula to be evaluated is an implication H ← B
  • define confirmation in terms of the difference between the expected frequency of counter-instances μ_¬HB and the observed frequency n_¬HB

         B                 ¬B
 H      n_HB (μ_HB)       n_H¬B (μ_H¬B)       n_H
 ¬H     n_¬HB (μ_¬HB)     n_¬H¬B (μ_¬H¬B)     n_¬H
         n_B               n_¬B                N
23
Motivation
  • Many existing measures only take part of the contingency table into account
  • Precision: n_HB / n_B = p(H | B)
  • Recall or sensitivity: n_HB / n_H = p(B | H)
  • Specificity: n_¬H¬B / n_¬H = p(¬B | ¬H) (see the sketch below)
  • In knowledge discovery we need to consider the whole table without introducing false symmetries
  • compare rules with different heads and bodies
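
A small sketch (Python) of these three measures, computed from the four cell counts of the contingency table; the example counts are those of the first table on slide 6.

def precision(n_hb, n_hnb, n_nhb, n_nhnb):
    return n_hb / (n_hb + n_nhb)              # p(H | B)

def recall(n_hb, n_hnb, n_nhb, n_nhnb):
    return n_hb / (n_hb + n_hnb)              # p(B | H)

def specificity(n_hb, n_hnb, n_nhb, n_nhnb):
    return n_nhnb / (n_nhb + n_nhnb)          # p(not-B | not-H)

# First table on the "Clauses sorted by confirmation" slide:
# n_HB=12, n_H¬B=3, n_¬HB=0, n_¬H¬B=9
print(precision(12, 3, 0, 9), recall(12, 3, 0, 9), specificity(12, 3, 0, 9))
# -> 1.0 0.8 1.0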

24
Novelty and satisfaction
  • Novelty is defined as the relative decrease in counter-instances from expected to observed
  • novelty(H ← B) = (μ_¬HB - n_¬HB) / N,   -.25 ≤ novelty ≤ .25
  • = p(HB) - p(H) p(B)
  • Satisfaction is defined as the ratio of expected but non-observed counter-instances
  • satisfaction(H ← B) = (μ_¬HB - n_¬HB) / μ_¬HB,   0 ≤ satisfaction ≤ 1 when μ_¬HB ≠ 0
  • = (p(H | B) - p(H)) / (1 - p(H))

25
Confirmation
  • Confirmation trades off novelty and satisfaction
  • confirmation(H ← B) = novelty × satisfaction = (μ_¬HB - n_¬HB)² / (N μ_¬HB), which is the contribution of the ¬HB cell to χ²/N
  • Theorem
  • conf(H ← B) is the lowest value of χ²/N attainable given n_¬HB, μ_¬HB and N (a computational sketch of these measures follows below)
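
A minimal sketch (Python) following the definitions of the last few slides, using the counts of the first table on slide 6; note that the confirmation values printed in the Tertius listings above may be normalised differently.

def tertius_measures(n_hb, n_hnb, n_nhb, n_nhnb):
    # Expected counter-instances, novelty, satisfaction and confirmation for a
    # clause H <- B, computed from the four observed cell counts.
    N = n_hb + n_hnb + n_nhb + n_nhnb
    n_b = n_hb + n_nhb                      # marginal of B
    n_nh = n_nhb + n_nhnb                   # marginal of not-H
    mu_nhb = n_nh * n_b / N                 # expected counter-instances
    novelty = (mu_nhb - n_nhb) / N
    satisfaction = (mu_nhb - n_nhb) / mu_nhb if mu_nhb > 0 else 0.0
    confirmation = novelty * satisfaction   # = (mu - n)^2 / (N * mu)
    return mu_nhb, novelty, satisfaction, confirmation

# Counts of the first table on slide 6 (H: Lenses=none, B: Tear-prod=reduced):
print(tertius_measures(12, 3, 0, 9))        # (4.5, 0.1875, 1.0, 0.1875)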

26
Trains again
(.81 .00) eastbound(A) :- hasCar(A,B), clength(B,short), not croof(B,none).
(.62 .20) eastbound(A) :- hasCar(A,B), cshape(B,rect), clength(B,short).
(.61 .05) eastbound(A) :- hasCar(A,B), hasLoad(B,C), not cshape(B,u_sh), lshape(C,tria).
(.55 .00) eastbound(A) :- hasCar(A,B), clength(B,short), croof(B,flat).
(.53 .25) eastbound(A) :- hasCar(A,B), clength(B,short), not cshape(B,u_sh).
(.51 .05) eastbound(A) :- hasCar(A,B), hasLoad(B,C), cshape(B,rect), lshape(C,tria).
(.51 .20) eastbound(A) :- hasCar(A,B), hasLoad(B,C), not croof(B,none), not lshape(C,rect).
(.51 .20) eastbound(A) :- hasCar(A,B), hasLoad(B,C), not croof(B,none), not lnumber(C,3).
27
Trains again
(.81 .00) eastbound(A) :- hasCar(A,B), clength(B,short), not croof(B,none).
(.62 .20) eastbound(A) :- hasCar(A,B), cshape(B,rect), clength(B,short).
(.61 .05) eastbound(A) :- hasCar(A,B), hasLoad(B,C), not cshape(B,u_sh), lshape(C,tria).
(.55 .00) eastbound(A) :- hasCar(A,B), clength(B,short), croof(B,flat).
(.53 .25) eastbound(A) :- hasCar(A,B), clength(B,short), not cshape(B,u_sh).
(.51 .05) eastbound(A) :- hasCar(A,B), hasLoad(B,C), cshape(B,rect), lshape(C,tria).
(.51 .20) eastbound(A) :- hasCar(A,B), hasLoad(B,C), not croof(B,none), not lshape(C,rect).
(.51 .20) eastbound(A) :- hasCar(A,B), hasLoad(B,C), not croof(B,none), not lnumber(C,3).

                  car(A,B), short(B), not none(B)    ¬(...)
 eastbound(A)                8 (4)                    2 (6)    10
 ¬eastbound(A)               0 (4)                   10 (6)    10
                             8                       12        20
28
Trains again
(.81 .00) eastbound(A) :- hasCar(A,B), clength(B,short), not croof(B,none).
(.62 .20) eastbound(A) :- hasCar(A,B), cshape(B,rect), clength(B,short).
(.61 .05) eastbound(A) :- hasCar(A,B), hasLoad(B,C), not cshape(B,u_sh), lshape(C,tria).
(.55 .00) eastbound(A) :- hasCar(A,B), clength(B,short), croof(B,flat).
(.53 .25) eastbound(A) :- hasCar(A,B), clength(B,short), not cshape(B,u_sh).
(.51 .05) eastbound(A) :- hasCar(A,B), hasLoad(B,C), cshape(B,rect), lshape(C,tria).
(.51 .20) eastbound(A) :- hasCar(A,B), hasLoad(B,C), not croof(B,none), not lshape(C,rect).
(.51 .20) eastbound(A) :- hasCar(A,B), hasLoad(B,C), not croof(B,none), not lnumber(C,3).

                  car(A,B), rect(B), short(B)    ¬(...)
 eastbound(A)               10 (7)                0 (3)    10
 ¬eastbound(A)               4 (7)                6 (3)    10
                            14                    6        20
29
Confirmation as search heuristic
  • Theorem: if H' ← B' is an admissible specialisation of H ← B, then conf(H' ← B') ≤ (N - n_H'¬B') / (N - n_H¬B) (see the sketch after the tables below for how such a bound can prune the search)

        B          ¬B
 H     2 (3)      4 (3)      6
 ¬H    3 (2)      1 (2)      4
       5          5          10

        B          ¬B
 H     3 (2.1)    4 (4.9)    7
 ¬H    0 (0.9)    3 (2.1)    3
       3          7          10
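
A generic sketch (Python) of how an optimistic (upper-bound) estimate of confirmation can prune an anytime best-first search over clause refinements; the functions confirmation, optimistic_estimate and refinements are caller-supplied placeholders, not Tertius's actual interface.

import heapq, itertools

def best_first(root, k, confirmation, optimistic_estimate, refinements):
    # Anytime best-first search: always expand the clause with the best
    # optimistic estimate, and skip any clause whose optimistic estimate
    # cannot beat the k-th best confirmation found so far.
    tie = itertools.count()                              # heap tie-breaker
    best = []                                            # k best (conf, clause)
    frontier = [(-optimistic_estimate(root), next(tie), root)]
    while frontier:
        neg_opt, _, clause = heapq.heappop(frontier)
        if len(best) == k and -neg_opt <= best[-1][0]:
            continue                                     # cannot enter the top k
        best.append((confirmation(clause), clause))
        best.sort(key=lambda pair: -pair[0])
        best = best[:k]
        for child in refinements(clause):
            heapq.heappush(frontier, (-optimistic_estimate(child), next(tie), child))
    return best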
30
Upgrading to first-order logic
  • Use function-free Prolog as representation
    language
  • normal-form logic, simple syntax
  • specialisation well understood
  • For rule evaluation, generate all grounding
    substitutions
  • specialisation may increase sample size
  • if problematic, use first-order features and
    count only over global variables

31
First-order features
  • Features concern interactions of local variables
  • The following rule has one boolean feature: has a short closed car
  • eastbound(T) :- hasCar(T,C), clength(C,short), not croof(C,none).
  • The following rule has two boolean features: has a short car and has a closed car
  • eastbound(T) :- hasCar(T,C1), clength(C1,short), hasCar(T,C2), not croof(C2,none).

32
Propositionalising rules
  • Equivalently:
  • eastbound(T) :- hasShortCar(T), hasClosedCar(T).
  • hasShortCar(T) :- hasCar(T,C1), clength(C1,short).
  • hasClosedCar(T) :- hasCar(T,C2), not croof(C2,none).
  • Given a way to construct and select first-order features, rule construction is semi-propositional
  • head and body literals have the same global variable(s)
  • corresponds to a single table, one row per example (see the sketch below)
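
A minimal sketch (Python) of the resulting single-table view: one row per train, one boolean column per first-order feature. The car data are those of train t1 from the flattened slide; the dictionary layout and helper names are illustrative.

# Build the propositionalised table: one boolean row per train.
cars = {
    "t1": [
        {"length": "short", "roof": "none"},
        {"length": "long",  "roof": "none"},
        {"length": "short", "roof": "peak"},
        {"length": "long",  "roof": "none"},
    ],
}

def has_short_car(train):
    return any(c["length"] == "short" for c in cars[train])

def has_closed_car(train):
    return any(c["roof"] != "none" for c in cars[train])

table = {t: (has_short_car(t), has_closed_car(t)) for t in cars}
print(table)   # {'t1': (True, True)}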

33
First-order feature bias in Tertius
  • Flattened representation, but derived from
    strongly-typed term representation
  • one free global variable
  • each (binary) structural predicate introduces a new existential local variable, and uses either the global variable or a local variable introduced by another structural predicate
  • utility predicates only use variables
  • all variables are used
  • NB. features can be non-boolean
  • if all structural predicates are one-to-one

34
The Tertius system
  • A*, anytime top-down search algorithm
  • optimal refinement operator
  • 7500 lines of GNU C
  • propositional Weka plug-in available
  • P.A. Flach and N. Lachiche (2001). Confirmation-guided discovery of first-order rules with Tertius. Machine Learning 42(1/2):61-95
  • www.cs.bris.ac.uk/Research/MachineLearning/Tertius/

35
Outline of the talk
  • Introduction
  • First-order rule discovery
  • Subgroup discovery
  • Concluding remarks

36
Subgroups vs. classifiers
  • Classification rules aim at pure subgroups
  • Subgroups aim at significantly higher (or
    different) proportion of positives
  • essentially the same as cost-sensitive
    classification
  • instead of an FN cost we have a TP profit

37
ROC space
  • True positive rate = true pos. / pos.
  • TP1 = 40/50 = 80%
  • TP2 = 30/50 = 60%
  • False positive rate = false pos. / neg.
  • FP1 = 10/50 = 20%
  • FP2 = 0/50 = 0%
  • ROC space has the FP rate on the X axis and the TP rate on the Y axis (see the sketch below)
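
The same arithmetic as a tiny sketch (Python); the two classifiers are assumed to be evaluated on 50 positives and 50 negatives, as the fractions above imply.

def tp_rate(tp, pos):
    return tp / pos

def fp_rate(fp, neg):
    return fp / neg

print(tp_rate(40, 50), fp_rate(10, 50))   # classifier 1: 0.8 0.2
print(tp_rate(30, 50), fp_rate(0, 50))    # classifier 2: 0.6 0.0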

38
The ROC convex hull
39
The ROC convex hull
40
Choosing a classifier
41
Choosing a classifier
42
Weighted Relative Accuracy
WRAcc(Class ← Condition) = p(Condition) (p(Class | Condition) - p(Class)), which is proportional to TPrate - FPrate
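
A small sketch (Python) computing WRAcc from hypothetical counts and checking the proportionality to TPrate - FPrate:

def wracc(tp, fp, pos, neg):
    # WRAcc(Class <- Condition) = p(Cond) * (p(Class | Cond) - p(Class))
    n = pos + neg
    p_cond = (tp + fp) / n
    p_class_given_cond = tp / (tp + fp)
    p_class = pos / n
    return p_cond * (p_class_given_cond - p_class)

# Hypothetical subgroup covering 40 of 50 positives and 10 of 50 negatives:
tp, fp, pos, neg = 40, 10, 50, 50
print(round(wracc(tp, fp, pos, neg), 4))                      # 0.15
print(round((pos / 100) * (neg / 100) * (tp / pos - fp / neg), 4))
# 0.15 as well, i.e. p(+) p(-) (TPrate - FPrate)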
43
Subgroup discovery with CN2-SD
  • Weighted covering algorithm
  • a covered example is not removed, but its weight is decreased
  • probability estimates take the weights into account
  • Weighted relative accuracy as search heuristic
  • trades off rule precision and generality
  • results in fewer, more general, overlapping rules (a sketch of the covering loop follows below)
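
A hedged sketch (Python) of the weighted covering loop; find_best_rule, the rule's covers method and the multiplicative factor gamma are illustrative assumptions, since the slide only states that covered examples keep a decreased weight.

def weighted_covering(examples, find_best_rule, gamma=0.5, max_rules=10):
    # Covered examples are not removed; their weight is multiplied by gamma,
    # so later rules focus on examples not yet well covered.
    weights = {id(e): 1.0 for e in examples}
    rules = []
    for _ in range(max_rules):
        rule = find_best_rule(examples, weights)   # e.g. maximise weighted WRAcc
        if rule is None:
            break
        rules.append(rule)
        for e in examples:
            if rule.covers(e):
                weights[id(e)] *= gamma            # decrease, don't remove
    return rules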

44
Subgroup evaluation method 1
Each point corresponds to a subgroup description
45
Subgroup evaluation method 2
  • Use all discovered rules together as a probabilistic classifier
  • Order all test instances by decreasing predicted probability of being positive
  • Draw a ROC curve as follows (see the sketch below):
  • start in (0,0)
  • if the next instance is positive, move up, else move right
  • until (1,1) is reached
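
A small sketch (Python) of this curve construction; the input is simply the test-set labels ordered by decreasing predicted probability of being positive.

def roc_points(labels_sorted_by_score):
    # labels_sorted_by_score: True = positive, ordered by decreasing
    # predicted probability. Returns the list of ROC points.
    pos = sum(labels_sorted_by_score)
    neg = len(labels_sorted_by_score) - pos
    x, y, points = 0.0, 0.0, [(0.0, 0.0)]
    for is_positive in labels_sorted_by_score:
        if is_positive:
            y += 1 / pos          # move up
        else:
            x += 1 / neg          # move right
        points.append((x, y))
    return points                 # ends at (1.0, 1.0)

print(roc_points([True, True, False, True, False]))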

46
Subgroup evaluation method 2
Each point on the curve corresponds to a
probability threshold
47
Example: Australian (UCI)
48
Concluding remarks
  • Confirmation-guided rule discovery
  • new heuristic especially suited for knowledge
    discovery
  • optimal A* search implemented in Tertius
  • ROC analysis very natural for subgroup discovery
  • here used for evaluation
  • also possible to use as search heuristic
  • Joint work with Nicolas Lachiche, Nada Lavrač, John Lloyd, and others

49
Questions?