1
Decision Trees
3
General Learning Task
  • DEFINE
  • Set X of instances (of n-tuples x = <x1, ..., xn>)
  • E.g., days described by attributes (or features):
  • Sky, Temp, Humidity, Wind, Water, Forecast
  • Target function y, e.g.:
  • EnjoySport: X → Y = {0, 1} (an example of concept learning)
  • WhichSport: X → Y = {Tennis, Soccer, Volleyball}
  • InchesOfRain: X → Y = [0, 10]
  • GIVEN
  • Training examples D
  • positive and negative examples of the target function <x, y(x)>
  • FIND
  • A hypothesis h such that h(x) approximates y(x) (see the sketch below).
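A minimal Python sketch of this setup, loosely following the EnjoySport example above (the data and type names are illustrative, not part of the slides):

  from typing import Callable, Tuple

  # An instance x is an n-tuple of attribute values:
  # (Sky, Temp, Humidity, Wind, Water, Forecast)
  Instance = Tuple[str, str, str, str, str, str]

  # Training examples D: pairs <x, y(x)> for EnjoySport, with Y = {0, 1}.
  D = [
      (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
      (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), 0),
  ]

  # A hypothesis h maps instances to labels; the goal is h(x) ≈ y(x).
  Hypothesis = Callable[[Instance], int]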

4
Hypothesis Spaces
  • Hypothesis space H is a subset of all y: X → Y, e.g.:
  • MC2: conjunctions of literals, e.g. <Sunny, ?, ?, Strong, ?, Same>
  • Decision trees: any function
  • 2-level decision trees: any function of two attributes, some of three
  • Candidate-Elimination Algorithm
  • Search H for a hypothesis that matches the training data
  • Exploits the general-to-specific ordering of hypotheses
  • Decision Trees
  • Incrementally grow the tree by splitting training examples on attribute values (a split sketch follows below)
  • Can be thought of as looping for i = 1, ..., n:
  • Search Hi = {i-level trees} for a hypothesis h that matches the data
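A hedged sketch of the splitting step: partition the training examples by one attribute's values, then grow one subtree per partition (toy data and names are illustrative):

  from collections import defaultdict

  def split(examples, attr_index):
      """Partition (x, y) pairs by the value of attribute attr_index."""
      parts = defaultdict(list)
      for x, y in examples:
          parts[x[attr_index]].append((x, y))
      return dict(parts)

  # Toy examples over (Sky, Temp); splitting on Sky (index 0) gives one
  # branch per observed value, each grown recursively on its own subset.
  examples = [(("Sunny", "Warm"), 1), (("Rainy", "Cold"), 0), (("Sunny", "Cold"), 1)]
  print(split(examples, 0))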

17
Decision Trees represent disjunctions of conjunctions:
(Sunny ∧ Normal) ∨ Overcast ∨ (Rain ∧ Weak)
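Each root-to-Yes path is one conjunction; the tree as a whole is their disjunction. A sketch of the expression above as a predicate (the function name and argument order are illustrative):

  def h(outlook, humidity, wind):
      # One disjunct per Yes-leaf of the tree.
      return ((outlook == "Sunny" and humidity == "Normal")
              or outlook == "Overcast"
              or (outlook == "Rain" and wind == "Weak"))

  print(h("Sunny", "Normal", "Strong"))  # True: first conjunction fires
  print(h("Rain", "High", "Strong"))     # False: no conjunction fires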
18
Decision Trees vs. MC2
MC2 can't represent (Sunny ∨ Cloudy): an MC2 hypothesis must constrain each attribute to a single value, if at all. Decision trees can represent it.
(tree figure: a split with leaves Yes / Yes / No)
20
Learning Parity with D-Trees
  • How to solve 2-bit parity
  • Two step look-ahead
  • Split on pairs of attributes at once
  • For k attributes, why not just do k-step look
    ahead? Or split on k attribute values?
  • gtParity functions are the victims of the
    decision trees inductive bias.
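A small sketch of why: for 2-bit parity, every single-attribute split leaves each branch with a 50/50 label mix, while splitting on the pair of attributes separates the classes perfectly (illustrative code):

  from itertools import product

  # 2-bit parity: y = x1 XOR x2.
  data = [((a, b), a ^ b) for a, b in product([0, 1], repeat=2)]

  def branch_labels(data, idxs):
      """Labels landing in each branch after splitting on attributes idxs."""
      branches = {}
      for x, y in data:
          branches.setdefault(tuple(x[i] for i in idxs), []).append(y)
      return branches

  print(branch_labels(data, (0,)))    # {(0,): [0, 1], (1,): [1, 0]} - still mixed
  print(branch_labels(data, (0, 1)))  # four singleton branches - all pure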

22
I(Y; xi)
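I(Y; xi) denotes the information gain of attribute xi: the entropy of the labels minus the expected entropy after splitting on xi. A minimal sketch of computing it (toy data and names are illustrative):

  from math import log2
  from collections import Counter

  def entropy(labels):
      n = len(labels)
      return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

  def info_gain(examples, i):
      """I(Y; xi): label entropy minus expected entropy after splitting on xi."""
      g = entropy([y for _, y in examples])
      for v in {x[i] for x, _ in examples}:
          sub = [y for x, y in examples if x[i] == v]
          g -= len(sub) / len(examples) * entropy(sub)
      return g

  # Toy data: (Sky, Wind) -> EnjoySport. Sky predicts the label; Wind doesn't.
  examples = [(("Sunny", "Weak"), 1), (("Sunny", "Strong"), 1),
              (("Rainy", "Weak"), 0), (("Rainy", "Strong"), 0)]
  print(info_gain(examples, 0), info_gain(examples, 1))  # 1.0 0.0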
26
Overfitting is due to noise
  • Sources of noise
  • Erroneous training data
  • concept variable incorrect (annotator error)
  • Attributes mis-measured
  • Much more significant
  • Irrelevant attributes
  • Target function not deterministic in attributes

27
Irrelevant attributes
  • If many attributes are noisy, information gains
    can be spurious, e.g.
  • 20 noisy attributes
  • 10 training examples
  • gtExpected of depth-3 trees that split the
    training data perfectly using only noisy
    attributes 13.4
  • Potential solution statistical significance
    tests (e.g., chi-square)
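A hedged sketch of such a test using scipy's chi-square test of independence; the counts are invented for illustration:

  from scipy.stats import chi2_contingency

  # Contingency table for a candidate split: one row per attribute value,
  # columns = (Yes, No) label counts in that branch. Counts are made up.
  observed = [[4, 1],
              [2, 3]]

  chi2, p, dof, expected = chi2_contingency(observed)
  # If the split's association with the label is not significant, the
  # apparent information gain may be spurious, so do not split.
  print(f"chi2 = {chi2:.2f}, p = {p:.2f}; split only if p is small")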

28
Non-determinism
  • In general
  • We cant measure all the variables we need to do
    perfect prediction.
  • gt Target function is not uniquely determined by
    attribute values

29
Non-determinism Example
Decent hypothesis:
  Humidity > 0.70 → No
  otherwise → Yes
Overfit hypothesis:
  Humidity > 0.89 → No
  Humidity > 0.80 ∧ Humidity < 0.89 → Yes
  Humidity > 0.70 ∧ Humidity < 0.80 → No
  Humidity < 0.70 → Yes
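The two hypotheses as code (a sketch; thresholds are from the slide):

  def decent(humidity):
      return "No" if humidity > 0.70 else "Yes"

  def overfit(humidity):
      # The extra thresholds carve out intervals that fit noise in the sample.
      if humidity > 0.89: return "No"
      if humidity > 0.80: return "Yes"
      if humidity > 0.70: return "No"
      return "Yes"

  print(decent(0.85), overfit(0.85))  # No Yes - they disagree on (0.80, 0.89)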
30
Rule 2 of Machine Learning
  • The best hypothesis almost never achieves 100% accuracy on the training data.
  • (Rule 1 was: you can't learn anything without inductive bias.)

39
Hypothesis Space Comparisons
Task: concept learning with k binary attributes (a back-of-envelope sketch follows below).
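A sketch of the sizes being compared, under the usual counting argument (the MC2 count assumes each attribute is constrained to 0, to 1, or left as a "?" wildcard, plus the empty hypothesis; this counting is an assumption, not from the slide):

  # Rough hypothesis-space sizes over k binary attributes.
  k = 6
  all_functions = 2 ** (2 ** k)   # every y: X -> {0, 1}; decision trees reach them all
  mc2_conjunctions = 3 ** k + 1   # each attribute: 0, 1, or "?", plus the empty hypothesis
  print(all_functions)            # 18446744073709551616, about 1.8e19
  print(mc2_conjunctions)         # 730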
40
Decision Trees: Strengths
  • Very popular technique
  • Fast
  • Useful when:
  • Instances are attribute-value pairs
  • Target function is discrete
  • Concepts are likely to be disjunctions
  • Attributes may be noisy

41
Decision Trees: Weaknesses
  • Less useful for continuous outputs
  • Can have difficulty with continuous input features as well
  • E.g., what if your target concept is a circle in the (x1, x2) plane?
  • Hard to represent with decision trees
  • Very simple with instance-based methods we'll discuss later (see the sketch below)
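For contrast, the circle concept is a one-line distance test, while a tree can only approximate its boundary with axis-aligned boxes (a sketch; the center and radius are made up):

  # Circle concept in the (x1, x2) plane: trivial as a distance test.
  def in_circle(x1, x2, cx=0.0, cy=0.0, r=1.0):  # illustrative parameters
      return (x1 - cx) ** 2 + (x2 - cy) ** 2 <= r ** 2

  print(in_circle(0.5, 0.5))  # True
  print(in_circle(1.0, 1.0))  # False: squared distance is 2 > 1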

43
A decision tree learning algorithm along the lines of ID3 (sketched below).
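A compact, hedged sketch of such an algorithm: greedy recursive splitting on the highest-information-gain attribute, with pure-node and majority-vote leaves. Toy data only; no pruning or significance testing:

  from math import log2
  from collections import Counter

  def entropy(labels):
      n = len(labels)
      return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

  def best_attribute(examples, attrs):
      """Pick the attribute index with the highest information gain."""
      def gain(i):
          g = entropy([y for _, y in examples])
          for v in {x[i] for x, _ in examples}:
              sub = [y for x, y in examples if x[i] == v]
              g -= len(sub) / len(examples) * entropy(sub)
          return g
      return max(attrs, key=gain)

  def id3(examples, attrs):
      ys = [y for _, y in examples]
      if len(set(ys)) == 1:             # pure node -> leaf
          return ys[0]
      if not attrs:                     # no attributes left -> majority leaf
          return Counter(ys).most_common(1)[0][0]
      a = best_attribute(examples, attrs)
      rest = [i for i in attrs if i != a]
      return (a, {v: id3([(x, y) for x, y in examples if x[a] == v], rest)
                  for v in {x[a] for x, _ in examples}})

  # Toy run: learn EnjoySport from two attributes (Sky, Wind).
  data = [(("Sunny", "Weak"), 1), (("Sunny", "Strong"), 1),
          (("Rainy", "Weak"), 0), (("Rainy", "Strong"), 0)]
  print(id3(data, [0, 1]))  # (0, {'Sunny': 1, 'Rainy': 0}): splits on Sky only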