1
Machine Learning
  • Reading Chapter 18

2
Choosing the Best Attribute Information Gain
  • Information gain (from attribute test)
    difference between the original information
    requirement and new requirement
  • Gain(A) = H(p/(p+n), n/(p+n)) - Remainder(A)
  • H = entropy
  • Highest when the set is equally divided between
    positive (p) and negative (n) examples (.5,.5)
    (value of 1)
  • Lower as the set becomes more unbalanced (e.g.,
    (.9, .1) )

3
Information based on attributes
p + n = 10 with p = n, so H(1/2, 1/2) = 1 bit
Remainder(A) = Σ_i ((p_i + n_i)/(p + n)) H(p_i/(p_i + n_i), n_i/(p_i + n_i)), the expected entropy remaining after testing attribute A (see the sketch below)
4
Text Classification
  • Is text_i a finance news article?

(Diagram: example documents labeled Positive and Negative)
5
Example
6
(Diagram: two candidate splits on word-occurrence counts, with branch labels ≥10 and 5-10. Splitting on "stock" puts documents {1, 8, 9, 10} in one branch and {2, 3, 4, 5, 6, 7} in the other; splitting on "rolling" puts {9, 10}, {1, 5, 6, 8}, and {2, 3, 4, 7} in its branches.)
Gain(stock) = 1 - (4/10) H(1/4, 3/4) - (6/10) H(4/6, 2/6)
= 1 - .4((-.25)(-2) + (-.75)(-.42)) - .6((-.67)(-.58) + (-.33)(-1.6))
= 1 - .33 - .55 = .12
Gain(rolling) = 1 - (4/10) H(1/2, 1/2) - (4/10) H(1/2, 1/2) - (2/10) H(1/2, 1/2) = 0
7
John's Question
  • Does the algorithm help us determine the values?
    How do we know to divide at 10?

8
The algorithm as specified so far is designed for
binary classification and attributes with discrete
values
  • Attributes
  • Outlook: sunny, overcast, rain
  • Temperature: hot, mild, cool
  • Humidity: normal, high
  • Wind: weak, strong
  • Classification
  • PlayTennis? Yes, No

9
(No Transcript)
10
(Diagram: the 14 PlayTennis examples, 9/14 yes, have entropy E = .940; each candidate attribute splits them as follows.)
Humidity (E = .940): High branch E = .985, Normal branch E = .592
Wind (E = .940): Weak branch E = .811, Strong branch E = 1.0
Gain(humidity) = .940 - (7/14)(.985) - (7/14)(.592)
Gain(wind) = .940 - (8/14)(.811) - (6/14)(1.0)
Temperature (E = .940): Hot, Mild, Cool branches
Outlook (E = .940): Sunny, Overcast, Rain branches
Gain(S, Outlook) = .246, Gain(S, Humidity) = .151,
Gain(S, Wind) = .048, Gain(S, Temperature) = .029
Outlook is selected because it has the highest gain (checked in the sketch below)
11
(No Transcript)
12
(No Transcript)
13
Extending the algorithm for continuous-valued
attributes
  • Dynamically define new discrete-valued attributes
    that partition the continuous attribute into a
    discrete set of intervals
  • For continuous A, create a boolean attribute A_c that is true if A < c, false otherwise
  • How to select the best value for threshold c?
  • Sort examples by continuous attribute
  • Identify adjacent examples that differ in target
    classification
  • Generate a set of candidate thresholds midway
    between corresponding values of A
  • Choose the threshold c that maximizes information
    gain (see the sketch below)

14
Example temperature as continuous value
  • Two candidate thresholds
  • (48 + 60)/2 = 54
  • (80 + 90)/2 = 85
  • Information gain greater for Temperature > 54 than
    for Temperature > 85 (checked below)

15
Other cases
  • What if the class is discrete-valued, not binary?
  • What if an attribute has many values (e.g., 1 per
    instance)?

16
Training vs. Testing
  • A learning algorithm is good if it uses its
    learned hypothesis to make accurate predictions
    on unseen data
  • Collect a large set of examples (with
    classifications)
  • Divide into two disjoint sets: the training set
    and the test set
  • Apply the learning algorithm to the training set,
    generating hypothesis h
  • Measure the percentage of examples in the test
    set that are correctly classified by h
  • Repeat for different sizes of training sets and
    different randomly selected training sets of each
    size (as in the sketch below)

17
(No Transcript)
18
Overfitting
  • Learning algorithms may use irrelevant attributes
    to make decisions
  • For news: the day published and the newspaper
  • When else can overfitting occur?
  • Solution 1: decision tree pruning
  • Prune away attributes with low information gain
  • Use statistical significance to test whether gain
    is meaningful

19
K-fold Cross Validation
  • Solution 2 to reduce overfitting:
  • Run k experiments
  • Use a different 1/k of data for testing each time
  • Average the results
  • 5-fold, 10-fold, leave-one-out (see the sketch below)

20
Cross-Validation
(Diagram: 1566 labeled examples are split into 10 folds; each round trains a model on 9 folds (approx. 1409 examples) and evaluates it on the remaining fold (approx. 157 examples). Lather, rinse, repeat 10 times, then report the average.)
21
Example
22
Division into 3 sets
  • Avoid inadvertent peeking
  • Parameters that must be learned (e.g., how to
    split values)
  • Generate different hypotheses for different
    parameter values on training data
  • Choose values that perform best on validation
    data
  • Test the final algorithm on the testing data (see
    the sketch below)
  • Why do we need to do this for selecting best
    attributes?

23
Ensemble Learning
  • Learn from a collection of hypotheses
  • Majority voting
  • Enlarges the hypothesis space

24
Boosting
  • Uses a weighted training set
  • Each example has an associated weight w_j ≥ 0
  • Higher weighted examples have higher importance
  • Initially, w_j = 1 for all examples; generate
    hypothesis h1 from this set
  • Next round: increase the weights of misclassified
    examples, decrease the other weights
  • From the new weighted set, generate hypothesis h2
  • Continue until M hypotheses generated
  • Final ensemble hypothesis weighted-majority
    combination of all M hypotheses
  • Weight each hypothesis according to how well it
    did on the training data (see the sketch below)

25
AdaBoost
  • If the input learning algorithm L is a weak
    learning algorithm
  • i.e., L always returns a hypothesis whose weighted
    error on the training data is slightly better than
    random
  • then AdaBoost returns a hypothesis that classifies
    the training data perfectly for large enough M
  • i.e., it boosts the accuracy of the original
    learning algorithm on the training data

26
Tutorial on using C4.5