1
Decision Trees
3
General Learning Task
  • DEFINE
  • Set X of instances (of n-tuples x = <x1, ..., xn>)
  • E.g., days described by attributes (or features):
  • Sky, Temp, Humidity, Wind, Water, Forecast
  • Target function y, e.g.:
  • EnjoySport: X → Y = {0, 1} (an example of concept learning)
  • WhichSport: X → Y = {Tennis, Soccer, Volleyball}
  • InchesOfRain: X → Y = [0, 10]
  • GIVEN
  • Training examples D
  • positive and negative examples of the target function <x, y(x)>
  • FIND
  • A hypothesis h such that h(x) approximates y(x) (see the sketch below).
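A minimal Python sketch of this setup, loosely following the EnjoySport example above (the data and type names are illustrative, not part of the slides):

  from typing import Callable, Tuple

  # An instance x is an n-tuple of attribute values:
  # (Sky, Temp, Humidity, Wind, Water, Forecast)
  Instance = Tuple[str, str, str, str, str, str]

  # Training examples D: pairs <x, y(x)> for EnjoySport, with Y = {0, 1}.
  D = [
      (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
      (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), 0),
  ]

  # A hypothesis h maps instances to labels; the goal is h(x) ≈ y(x).
  Hypothesis = Callable[[Instance], int]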

4
Hypothesis Spaces
  • Hypothesis space H is a subset of all y: X → Y, e.g.:
  • MC2: conjunctions of literals, e.g. <Sunny, ?, ?, Strong, ?, Same>
  • Decision trees: any function
  • 2-level decision trees: any function of two attributes, some of three
  • Candidate-Elimination Algorithm
  • Search H for a hypothesis that matches the training data
  • Exploits the general-to-specific ordering of hypotheses
  • Decision Trees
  • Incrementally grow the tree by splitting training examples on attribute values (a split sketch follows below)
  • Can be thought of as looping for i = 1, ..., n:
  • Search Hi = {i-level trees} for a hypothesis h that matches the data
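A hedged sketch of the splitting step: partition the training examples by one attribute's values, then grow one subtree per partition (toy data and names are illustrative):

  from collections import defaultdict

  def split(examples, attr_index):
      """Partition (x, y) pairs by the value of attribute attr_index."""
      parts = defaultdict(list)
      for x, y in examples:
          parts[x[attr_index]].append((x, y))
      return dict(parts)

  # Toy examples over (Sky, Temp); splitting on Sky (index 0) gives one
  # branch per observed value, each grown recursively on its own subset.
  examples = [(("Sunny", "Warm"), 1), (("Rainy", "Cold"), 0), (("Sunny", "Cold"), 1)]
  print(split(examples, 0))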

17
Decision Trees represent disjunctions of conjunctions:
(Sunny ∧ Normal) ∨ Overcast ∨ (Rain ∧ Weak)
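Each root-to-Yes path is one conjunction; the tree as a whole is their disjunction. A sketch of the expression above as a predicate (the function name and argument order are illustrative):

  def h(outlook, humidity, wind):
      # One disjunct per Yes-leaf of the tree.
      return ((outlook == "Sunny" and humidity == "Normal")
              or outlook == "Overcast"
              or (outlook == "Rain" and wind == "Weak"))

  print(h("Sunny", "Normal", "Strong"))  # True: first conjunction fires
  print(h("Rain", "High", "Strong"))     # False: no conjunction fires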
18
Decision Trees vs. MC2
MC2 can't represent (Sunny ∨ Cloudy): an MC2 hypothesis must constrain each attribute to a single value, if at all. Decision trees can represent it.
(tree figure: a split with leaves Yes / Yes / No)
20
Learning Parity with D-Trees
  • How to solve 2-bit parity
  • Two step look-ahead
  • Split on pairs of attributes at once
  • For k attributes, why not just do k-step look
    ahead? Or split on k attribute values?
  • gtParity functions are the victims of the
    decision trees inductive bias.
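A small sketch of why: for 2-bit parity, every single-attribute split leaves each branch with a 50/50 label mix, while splitting on the pair of attributes separates the classes perfectly (illustrative code):

  from itertools import product

  # 2-bit parity: y = x1 XOR x2.
  data = [((a, b), a ^ b) for a, b in product([0, 1], repeat=2)]

  def branch_labels(data, idxs):
      """Labels landing in each branch after splitting on attributes idxs."""
      branches = {}
      for x, y in data:
          branches.setdefault(tuple(x[i] for i in idxs), []).append(y)
      return branches

  print(branch_labels(data, (0,)))    # {(0,): [0, 1], (1,): [1, 0]} - still mixed
  print(branch_labels(data, (0, 1)))  # four singleton branches - all pure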

22
I(Y; xi)
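I(Y; xi) denotes the information gain of attribute xi: the entropy of the labels minus the expected entropy after splitting on xi. A minimal sketch of computing it (toy data and names are illustrative):

  from math import log2
  from collections import Counter

  def entropy(labels):
      n = len(labels)
      return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

  def info_gain(examples, i):
      """I(Y; xi): label entropy minus expected entropy after splitting on xi."""
      g = entropy([y for _, y in examples])
      for v in {x[i] for x, _ in examples}:
          sub = [y for x, y in examples if x[i] == v]
          g -= len(sub) / len(examples) * entropy(sub)
      return g

  # Toy data: (Sky, Wind) -> EnjoySport. Sky predicts the label; Wind doesn't.
  examples = [(("Sunny", "Weak"), 1), (("Sunny", "Strong"), 1),
              (("Rainy", "Weak"), 0), (("Rainy", "Strong"), 0)]
  print(info_gain(examples, 0), info_gain(examples, 1))  # 1.0 0.0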
26
Overfitting is due to noise
  • Sources of noise
  • Erroneous training data
  • concept variable incorrect (annotator error)
  • Attributes mis-measured
  • Much more significant
  • Irrelevant attributes
  • Target function not deterministic in attributes

27
Irrelevant attributes
  • If many attributes are noisy, information gains
    can be spurious, e.g.
  • 20 noisy attributes
  • 10 training examples
  • gtExpected of depth-3 trees that split the
    training data perfectly using only noisy
    attributes 13.4
  • Potential solution statistical significance
    tests (e.g., chi-square)
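A hedged sketch of such a test using scipy's chi-square test of independence; the counts are invented for illustration:

  from scipy.stats import chi2_contingency

  # Contingency table for a candidate split: one row per attribute value,
  # columns = (Yes, No) label counts in that branch. Counts are made up.
  observed = [[4, 1],
              [2, 3]]

  chi2, p, dof, expected = chi2_contingency(observed)
  # If the split's association with the label is not significant, the
  # apparent information gain may be spurious, so do not split.
  print(f"chi2 = {chi2:.2f}, p = {p:.2f}; split only if p is small")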

28
Non-determinism
  • In general
  • We cant measure all the variables we need to do
    perfect prediction.
  • gt Target function is not uniquely determined by
    attribute values

29
Non-determinism Example
Decent hypothesis:
  Humidity > 0.70 → No
  otherwise → Yes
Overfit hypothesis:
  Humidity > 0.89 → No
  Humidity > 0.80 ∧ Humidity < 0.89 → Yes
  Humidity > 0.70 ∧ Humidity < 0.80 → No
  Humidity < 0.70 → Yes
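The two hypotheses as code (a sketch; thresholds are from the slide):

  def decent(humidity):
      return "No" if humidity > 0.70 else "Yes"

  def overfit(humidity):
      # The extra thresholds carve out intervals that fit noise in the sample.
      if humidity > 0.89: return "No"
      if humidity > 0.80: return "Yes"
      if humidity > 0.70: return "No"
      return "Yes"

  print(decent(0.85), overfit(0.85))  # No Yes - they disagree on (0.80, 0.89)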
30
Rule 2 of Machine Learning
  • The best hypothesis almost never achieves 100% accuracy on the training data.
  • (Rule 1 was: you can't learn anything without inductive bias.)

39
Hypothesis Space Comparisons
Task: concept learning with k binary attributes (a back-of-envelope sketch follows below).
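A sketch of the sizes being compared, under the usual counting argument (the MC2 count assumes each attribute is constrained to 0, to 1, or left as a "?" wildcard, plus the empty hypothesis; this counting is an assumption, not from the slide):

  # Rough hypothesis-space sizes over k binary attributes.
  k = 6
  all_functions = 2 ** (2 ** k)   # every y: X -> {0, 1}; decision trees reach them all
  mc2_conjunctions = 3 ** k + 1   # each attribute: 0, 1, or "?", plus the empty hypothesis
  print(all_functions)            # 18446744073709551616, about 1.8e19
  print(mc2_conjunctions)         # 730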
40
Decision Trees: Strengths
  • Very popular technique
  • Fast
  • Useful when:
  • Instances are attribute-value pairs
  • Target function is discrete
  • Concepts are likely to be disjunctions
  • Attributes may be noisy

41
Decision Trees: Weaknesses
  • Less useful for continuous outputs
  • Can have difficulty with continuous input features as well
  • E.g., what if your target concept is a circle in the (x1, x2) plane?
  • Hard to represent with decision trees
  • Very simple with instance-based methods we'll discuss later (see the sketch below)
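For contrast, the circle concept is a one-line distance test, while a tree can only approximate its boundary with axis-aligned boxes (a sketch; the center and radius are made up):

  # Circle concept in the (x1, x2) plane: trivial as a distance test.
  def in_circle(x1, x2, cx=0.0, cy=0.0, r=1.0):  # illustrative parameters
      return (x1 - cx) ** 2 + (x2 - cy) ** 2 <= r ** 2

  print(in_circle(0.5, 0.5))  # True
  print(in_circle(1.0, 1.0))  # False: squared distance is 2 > 1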

43
A decision tree learning algorithm along the lines of ID3 (sketched below).
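A compact, hedged sketch of such an algorithm: greedy recursive splitting on the highest-information-gain attribute, with pure-node and majority-vote leaves. Toy data only; no pruning or significance testing:

  from math import log2
  from collections import Counter

  def entropy(labels):
      n = len(labels)
      return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

  def best_attribute(examples, attrs):
      """Pick the attribute index with the highest information gain."""
      def gain(i):
          g = entropy([y for _, y in examples])
          for v in {x[i] for x, _ in examples}:
              sub = [y for x, y in examples if x[i] == v]
              g -= len(sub) / len(examples) * entropy(sub)
          return g
      return max(attrs, key=gain)

  def id3(examples, attrs):
      ys = [y for _, y in examples]
      if len(set(ys)) == 1:             # pure node -> leaf
          return ys[0]
      if not attrs:                     # no attributes left -> majority leaf
          return Counter(ys).most_common(1)[0][0]
      a = best_attribute(examples, attrs)
      rest = [i for i in attrs if i != a]
      return (a, {v: id3([(x, y) for x, y in examples if x[a] == v], rest)
                  for v in {x[a] for x, _ in examples}})

  # Toy run: learn EnjoySport from two attributes (Sky, Wind).
  data = [(("Sunny", "Weak"), 1), (("Sunny", "Strong"), 1),
          (("Rainy", "Weak"), 0), (("Rainy", "Strong"), 0)]
  print(id3(data, [0, 1]))  # (0, {'Sunny': 1, 'Rainy': 0}): splits on Sky only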