Learning Decision Trees - PowerPoint PPT Presentation

Slides: 12
Provided by: MichaelW173

Transcript and Presenter's Notes

Title: Learning Decision Trees


1
Learning Decision Trees
  • Brief tutorial by M Werner

2
Medical Diagnosis Example
  • Goal: Diagnose a disease from a blood test
  • Clinical use:
  • A blood sample is obtained from a patient
  • The blood is tested to measure the current expression of
    various proteins, say by using a DNA microarray
  • The data are analyzed to produce a Yes or No answer

3
Data Analysis
  • Use a decision tree such as:

[Figure: decision tree. The root node tests P1 > K1; the internal nodes below it test P2 > K2, P3 > K3 and P4 > K4; each branch is labeled Y or N, and each leaf gives a Yes or No diagnosis.]

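
A tree of this kind can be represented and evaluated in a few lines. The following is a minimal sketch, not the presenter's code: the `Node` and `Leaf` classes, the `diagnose` function, the shape of the example tree, and the threshold values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    diagnosis: str  # "Yes" or "No"


@dataclass
class Node:
    protein: str                 # name of the protein level to test, e.g. "P1"
    threshold: float             # the cutoff K for that protein
    if_yes: Union["Node", Leaf]  # branch taken when level > threshold
    if_no: Union["Node", Leaf]   # branch taken otherwise


def diagnose(tree, levels):
    """Walk from the root to a leaf and return that leaf's diagnosis."""
    while isinstance(tree, Node):
        tree = tree.if_yes if levels[tree.protein] > tree.threshold else tree.if_no
    return tree.diagnosis


# Hypothetical two-level tree; the thresholds are invented for illustration.
tree = Node("P1", 1.0,
            if_yes=Node("P2", 2.0, Leaf("Yes"), Leaf("No")),
            if_no=Leaf("No"))

print(diagnose(tree, {"P1": 1.5, "P2": 2.5}))  # Yes
```

Each internal node holds one test of the form "protein level > threshold", so a diagnosis costs at most one comparison per level of the tree.
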
4
How to Build the Decision Tree
  • Start with samples of blood from patients known
    either to have the disease or not (the training set).
  • Suppose there are 20 patients: 10 are known to
    have the disease and 10 are known not to
  • From the training set, get expression levels for
    all proteins of interest
  • E.g., if there are 20 patients and 50 proteins, we
    get a 50 × 20 array of real numbers
  • Rows are proteins
  • Columns are patients
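
The layout of the training set can be sketched as follows. This is only an illustration of the shapes involved: the names `expression` and `labels` are assumptions, and random numbers stand in for real measurements.

```python
import random

random.seed(0)  # reproducible placeholder data

N_PROTEINS, N_PATIENTS = 50, 20

# labels[p] is True if patient p is known to have the disease.
# Here the first 10 patients are diseased and the last 10 are not.
labels = [p < 10 for p in range(N_PATIENTS)]

# expression[r][p] is the level of protein r in patient p
# (rows are proteins, columns are patients).
expression = [[random.random() for _ in range(N_PATIENTS)]
              for _ in range(N_PROTEINS)]

print(len(expression), len(expression[0]))  # 50 20
print(sum(labels))                          # 10
```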

5
Choosing the decision nodes
  • We would like the tree to be as short as possible
  • Start with all 20 patients in one group
  • Choose a protein and a level that gains the most
    information

[Figure: two candidate splits of the root group. Counts are written as diseased/not diseased; the root is 10/10 (10 have the disease, 10 don't).]
  • Possible splitting condition Px > Kx: subgroups 9/3 (mostly diseased) and 1/7 (mostly not diseased)
  • Alternative splitting condition Py > Ky: subgroups 7/7 and 3/3

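
Applying a candidate condition to the training data amounts to counting outcomes on each side of the threshold. A minimal sketch, assuming one protein's levels and the known outcomes are held in two parallel lists; the function name `split_counts` and the sample values are made up for illustration.

```python
def split_counts(levels, labels, threshold):
    """Return (diseased, not diseased) counts above and at-or-below the threshold."""
    above = [0, 0]
    below = [0, 0]
    for level, diseased in zip(levels, labels):
        group = above if level > threshold else below
        group[0 if diseased else 1] += 1
    return tuple(above), tuple(below)


# Invented levels of one protein in 8 patients, with their known outcomes.
levels = [5.1, 4.8, 6.0, 2.2, 1.9, 5.5, 2.5, 1.0]
labels = [True, True, True, False, False, False, True, False]

print(split_counts(levels, labels, 4.0))  # ((3, 1), (1, 3))
```
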
6
How to determine information gain
  • Purity: a measure of the degree to which the patients in a
    group share the same outcome.
  • A group that splits 1/7 is fairly pure: most
    patients don't have the disease
  • 0/8 is even purer
  • 4/4 is the opposite of pure. This group is said
    to have high entropy. Knowing that a patient is
    in this group does not make her more or less
    likely to have the disease.
  • The decision tree should reduce entropy as test
    conditions are evaluated

7
Measuring Purity (Entropy)
  • Let f(i,j) = Prob(Outcome = j in node i)
  • E.g., if node 2 has a 9/3 split:
  • f(2,0) = 9/12 = 0.75
  • f(2,1) = 3/12 = 0.25
  • Gini impurity: $I_G(i) = 1 - \sum_j f(i,j)^2$
  • Entropy: $I_E(i) = -\sum_j f(i,j) \log_2 f(i,j)$
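
Both measures can be computed directly from a node's outcome counts. The sketch below uses the standard definitions (Gini impurity as one minus the sum of the squared outcome fractions, entropy as the negative sum of f·log2 f) and applies them to the 9/3 node; the function names are assumptions.

```python
from math import log2


def fractions(counts):
    """Turn outcome counts such as (9, 3) into fractions f(i, j)."""
    total = sum(counts)
    return [c / total for c in counts]


def gini(counts):
    return 1.0 - sum(f * f for f in fractions(counts))


def entropy(counts):
    # Outcomes with f == 0 are skipped: f * log2(f) tends to 0 as f tends to 0.
    return -sum(f * log2(f) for f in fractions(counts) if f > 0)


print(round(gini((9, 3)), 3))     # 0.375
print(round(entropy((9, 3)), 3))  # 0.811
```

With two outcomes, entropy ranges from 0 for a pure group such as 0/8 up to 1 bit for an evenly mixed group such as 4/4.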

8
Computing Entropy

9
The goal is to use the test that best reduces the total
entropy in the subgroups
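
One common way to quantify this reduction is information gain: the parent's entropy minus the size-weighted average entropy of the subgroups. The sketch below assumes that definition and applies it to the two candidate splits from slide 5; the function names are illustrative.

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a group given its outcome counts, e.g. (9, 3)."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the subgroups."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Counts are (diseased, not diseased); the root group is 10/10.
print(round(information_gain((10, 10), [(9, 3), (1, 7)]), 3))  # 0.296
print(round(information_gain((10, 10), [(7, 7), (3, 3)]), 3))  # 0.0
```

The first condition removes about 0.3 bits of uncertainty, while the second leaves both subgroups as mixed as the root, so it gains nothing.
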
10
Building the Tree
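
The greedy procedure from the preceding slides (pick the test with the highest gain, split the group, repeat on each subgroup) can be sketched recursively. This is one possible reading under stated assumptions, not the presenter's algorithm: candidate thresholds are taken from the observed levels, recursion stops when a group is pure or no test gains information, and a leaf stores the majority outcome. All function names are hypothetical.

```python
from math import log2


def entropy(labels):
    """Entropy in bits of a list of True/False outcomes."""
    n = len(labels)
    counts = (sum(labels), n - sum(labels))
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)


def best_split(rows, labels):
    """Return (gain, protein_index, threshold) of the best test, or None."""
    best = None
    n = len(labels)
    for r, levels in enumerate(rows):
        # Dropping the largest value keeps both subgroups non-empty.
        for k in sorted(set(levels))[:-1]:
            hi = [lab for lev, lab in zip(levels, labels) if lev > k]
            lo = [lab for lev, lab in zip(levels, labels) if lev <= k]
            gain = entropy(labels) - (len(hi) / n * entropy(hi)
                                      + len(lo) / n * entropy(lo))
            if best is None or gain > best[0]:
                best = (gain, r, k)
    return best


def build(rows, labels):
    """rows[r][p] is the level of protein r in patient p; labels[p] is the outcome.

    Returns a leaf (True/False majority outcome) or a tuple
    (protein_index, threshold, subtree_if_above, subtree_otherwise).
    """
    split = best_split(rows, labels)
    if entropy(labels) == 0 or split is None or split[0] <= 0:
        return sum(labels) * 2 >= len(labels)  # leaf: majority outcome
    _, r, k = split
    is_above = [lev > k for lev in rows[r]]

    def subset(keep):
        cols = [p for p in range(len(labels)) if is_above[p] == keep]
        return [[row[p] for p in cols] for row in rows], [labels[p] for p in cols]

    return (r, k, build(*subset(True)), build(*subset(False)))


# Tiny invented training set: 2 proteins (rows), 4 patients (columns).
rows = [[1, 2, 3, 4],
        [5, 1, 5, 1]]
labels = [True, True, False, False]
print(build(rows, labels))  # (0, 2, False, True)
```

In this toy example the result reads: test protein 0 against threshold 2; patients above it are predicted not diseased, the others diseased.
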
11
Links
  • http://www.ece.msstate.edu/research/isip/publications/courses/ece_8463/lectures/current/lecture_27/lecture_27.pdf
  • Decision Trees Data Mining
  • Andrew Moore Tutorial