Data mining - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Data mining
  • Decision trees and rule induction

2
Decision Trees
  • Decision trees are used for classification

3
Classification
  • Input: a data record
  • Output: the class to which this record belongs
  • Example:
  • (young, bright, polite, hard working) -> Student KE

4
Decision Tree
  • Every internal node corresponds to a predictor
    field
  • Every arc corresponds to a value of that field
  • Every leaf corresponds to a class, a value of the
    prediction field

5
Idea behind ID3
  • At each level, in each branch, choose the
    predictor field which is not yet on the path to
    the root and is most informative about the
    prediction.
  • Information is measured by entropy.

Idea behind C4.5
  • Be intelligent about missing values, continuous
    values, pruning, rule induction, ...

6
What is entropy?
  • Say we choose a set M = (m1, m2, ..., mn) of
    messages to exchange information.
  • Then there are n different messages, and we need
    at least log n bits to distinguish between them,
    and hence to exchange information.

7
What is entropy? (cont.)
  • The messages are being exchanged with certain
    relative frequencies.
  • For i = 1..n, these relative frequencies
    (p1, p2, ..., pn) can be interpreted as the
    probability pi that an exchanged message is
    message mi.

8
What is entropy? (cont.)
  • The entropy I(P) of a probability distribution
    P measures the information being exchanged as
    follows:
  • I(P) = -(p1 log p1
  • + p2 log p2
  • + ...
  • + pn log pn)
  • where (of course) 0 log 0 = 0.

9
Example: pi = 1/n for each i
  • P = (1/n, 1/n, ..., 1/n).
  • I(P) = -(1/n log 1/n + 1/n log 1/n + ...)
  • = -log 1/n
  • = log n.

10
More examples
  • P = (1) -> entropy(P) = 0.
  • P = (0.5, 0.5) -> entropy(P) = 1.
  • P = (0.67, 0.33) -> entropy(P) = 0.92.
  • P = (1, 0) -> entropy(P) = 0.
  • For a given value of n, the entropy increases as
    the differences between the probabilities
    decrease.
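The values above can be checked with a short Python sketch (the function name entropy is ours, not from the slides; base-2 logarithms throughout):

```python
import math

def entropy(probs):
    """I(P) = -(p1 log p1 + ... + pn log pn), with 0 log 0 = 0."""
    # skipping p == 0 terms implements the 0 log 0 = 0 convention
    return 0.0 - sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0]))        # 0.0
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([2/3, 1/3]))   # about 0.918, which the slide rounds to 0.92
print(entropy([1.0, 0.0]))   # 0.0
```

Note that the slide's 0.92 comes from the exact probabilities 2/3 and 1/3; the rounded pair (0.67, 0.33) gives roughly 0.915.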

11
ID3
  • Construct the decision tree such that the
    decrease in entropy is maximized.
  • (it is a greedy approach)
  • The intent is that each node increases the
    certainty about the class of the record as much
    as possible.

12
Entropy in Classification
  • Denote by |T| the number of records, and assume
    there are k classes C1, ..., Ck with corresponding
    prediction values c1, ..., ck.
  • Thus, k is in fact the number of different values
    for prediction field C. Now, define the probability
    distribution
  • P = (|C1|/|T|, |C2|/|T|, ..., |Ck|/|T|).

13
Entropy in Classification (cont.)
  • Now consider predictor D with values d1, ..., dl,
    and define
  • pj(D) = (|C1 ∩ Dj|/|Dj|,
  • |C2 ∩ Dj|/|Dj|,
  • ... ,
  • |Ck ∩ Dj|/|Dj|),
  • the relative frequencies under the condition
    that D has value dj.
  • Subsequently compute the entropy I(pj(D)).

14
Entropy in Classification (cont.)
  • Next define
  • Info(D,T) = |D1|/|T| I(p1(D))
  • + |D2|/|T| I(p2(D))
  • + ...
  • + |Dl|/|T| I(pl(D)),
  • and
  • Gain(D,T) = I(T) - Info(D,T).
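A sketch of Info(D,T) and Gain(D,T) in Python; the function names and the record layout (a list of (fields_dict, class_label) pairs) are our assumptions, not from the slides:

```python
from collections import Counter
import math

def class_entropy(labels):
    """I(P) for the class distribution of a list of labels (0 log 0 = 0)."""
    n = len(labels)
    return 0.0 - sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info(records, d):
    """Info(D,T): weighted entropy remaining after splitting T on predictor d."""
    n = len(records)
    groups = {}
    for fields, label in records:
        groups.setdefault(fields[d], []).append(label)   # partition T by d's value
    return sum(len(g) / n * class_entropy(g) for g in groups.values())

def gain(records, d):
    """Gain(D,T) = I(T) - Info(D,T)."""
    return class_entropy([label for _, label in records]) - info(records, d)
```

A predictor that splits the classes perfectly has Info(D,T) = 0, so its gain equals the full entropy I(T).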

15
The ID3 algorithm
  • ID3(T, R, C)
  • Input:
  • training set T,
  • set of predictor fields R,
  • prediction field C.

16
ID3 (cont.)
  • While T is not empty:
  • If R is empty, return a single node with, as
    prediction value, the value ci which maximizes
    |Ci|.
  • If R is not empty, let D in R be the predictor
    which maximizes Gain(D,T).
  • Return a tree with root node D, and a branch for
    every j = 1..l, which leads to the subtree
    ID3(Dj, R\{D}, C).
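The steps above might be sketched recursively in Python as follows (a minimal sketch: id3, classify, and the record layout are our naming; ties in the gain and values unseen in training are not handled):

```python
from collections import Counter
import math

def _entropy(labels):
    n = len(labels)
    return 0.0 - sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def _gain(records, d):
    """Gain(D,T) = I(T) - Info(D,T); records are (fields_dict, label) pairs."""
    n = len(records)
    groups = {}
    for fields, label in records:
        groups.setdefault(fields[d], []).append(label)
    split_info = sum(len(g) / n * _entropy(g) for g in groups.values())
    return _entropy([lab for _, lab in records]) - split_info

def id3(records, predictors):
    """Return a leaf (a class label) or a node (predictor, {value: subtree})."""
    labels = [lab for _, lab in records]
    if len(set(labels)) == 1:                 # pure node: one class left
        return labels[0]
    if not predictors:                        # R empty: take the majority class
        return Counter(labels).most_common(1)[0][0]
    d = max(predictors, key=lambda p: _gain(records, p))   # greedy choice
    rest = [p for p in predictors if p != d]               # R \ {D}
    return (d, {v: id3([r for r in records if r[0][d] == v], rest)
                for v in {fields[d] for fields, _ in records}})

def classify(tree, fields):
    while isinstance(tree, tuple):            # follow branches down to a leaf
        d, branches = tree
        tree = branches[fields[d]]
    return tree
```

The greedy choice of D is exactly the "maximize Gain(D,T)" step; nothing is ever reconsidered once a split is made.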

17
Rule Induction
  • The sequence of branches from the root can be
    viewed as a classification rule.
  • (If the root field has value X, and the next
    field has value Y, and ..., then the record is in
    class Ci.)
  • Induction: fact-based reasoning.

18
Association rules
  • Example:
  • (Saturday, beer, chips) -> (diapers)
  • BIS applications of association rules:
  • - identifying prospects
  • - identifying customers who will cause trouble

19
Definitions
  • Antecedent: set A of records, with certain values
    for a set of predictor fields.
  • Consequence: set C of records, with certain
    values for a set of prediction fields.
  • (Saturday, beer, chips) -> (diapers)

20
Definitions
  • Support: percentage of the records which belong
    to both A and C.
  • Lift: percentage of records of C which also
    belong to A:
  • p(A and C)/p(C).
  • Confidence: percentage of records of A which also
    belong to C:
  • p(A and C)/p(A).
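A sketch of the three measures over a list of transactions (each a set of items); the function names are ours, and lift here follows the slide's definition p(A and C)/p(C), not the more common p(A and C)/(p(A) p(C)):

```python
def support(transactions, items):
    """Fraction of transactions containing every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def lift(transactions, a, c):
    """Slide's definition: p(A and C) / p(C)."""
    return support(transactions, set(a) | set(c)) / support(transactions, c)

def confidence(transactions, a, c):
    """p(A and C) / p(A)."""
    return support(transactions, set(a) | set(c)) / support(transactions, a)
```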

21
Rule Induction
  • Generate all rules with support at least s, and
    confidence at least c.

22
Brute-force algorithm
  • Generate all rules and check whether they satisfy
    the support and confidence requirements.
  • Complexity?

23
Intelligent algorithm
  • Identify the set M of all (maximal) sets V with
    support at least s as follows:
  • - First check all sets with cardinality 1.
  • - Then check all sets V with cardinality 2:
    {1,2} can only qualify if {1} and {2} qualify
    (this condition is necessary but not sufficient).
  • - Et cetera: {1,2,...,n} can only qualify if
    {1,2,...,n-1}, {1,3,...,n}, ..., {2,3,...,n}
    qualify.

24
Intelligent algorithm
  • For every set l in M, check whether there is a
    nonempty subset Q of l such that
  • Q -> l\Q has sufficient confidence.
  • The confidence of this rule equals
    support(l)/support(Q).
  • Notice that if Q -> l\Q doesn't have sufficient
    confidence, the same holds for Q' -> l\Q' for
    every subset Q' of Q.
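Reading the rule as Q -> l\Q for a nonempty proper subset Q of a frequent set l, with confidence support(l)/support(Q), this second phase might be sketched as (the function name is ours; the subset-pruning shortcut noted above is omitted for brevity):

```python
from itertools import combinations

def rules_from(frequent_sets, transactions, min_conf):
    """For each frequent set l and nonempty proper subset Q of l,
    keep the rule Q -> l\\Q when support(l)/support(Q) >= min_conf."""
    n = len(transactions)

    def sup(items):
        return sum(items <= t for t in transactions) / n

    rules = []
    for l in frequent_sets:
        for r in range(1, len(l)):                   # all proper subset sizes
            for q in map(frozenset, combinations(l, r)):
                conf = sup(l) / sup(q)               # confidence of Q -> l\Q
                if conf >= min_conf:
                    rules.append((q, l - q, conf))
    return rules
```

Since l is frequent and Q is a subset of l, sup(q) is never zero, so the division is safe.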

25
Intelligent algorithm
  • Complexity: only sets with sufficient support are
    being generated, and subsequently only the rules
    with sufficient confidence.
  • Complexity?

26
Exercise: soccer
27
Questions
  • Goal: predict nationality (NL)
  • 1. Show that ID3 might start with voorkeursbeen
    (preferred foot).
  • Show that ID3 might continue with haarkleur
    (hair colour).
  • Complete the ID3 tree.
  • Is there a better tree?
  • What is the ratio between the number of nodes in
    both trees?
  • What is the ratio between the number of levels?
  • Can you think of an even worse example?