Title: CS236501 Introduction to AI
1. CS 236501 Introduction to AI
2. Learning
[Diagram: a Training Set is fed to a Learner, which produces a Classifier; the Classifier maps an unlabeled example to a label]
We aim to produce an accurate classifier
3. Example: Play Tennis
- We want to learn the concept:
- "A good day to play tennis"
- Examples to be used for learning:

Outlook   Temperature  Humidity  Wind    PlayTennis (label)
Sunny     Cold         High      Weak    YES
Rain      Hot          High      Strong  NO
Sunny     Hot          High      Strong  NO
4. Decision Trees
- A node represents an attribute
- Edges represent the possible attribute values
- Leaves contain classifications
[Diagram: root node Outlook with branches Sunny, Overcast, Rain; Sunny leads to a Humidity node (High -> NO, Normal -> YES); Overcast leads to YES; Rain leads to a Wind node (Strong -> NO, Weak -> YES)]
(Outlook = Sunny, Temperature = High, Humidity = High, Wind = Weak) -> PlayTennis = NO
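The tree on this slide can be sketched in code. A minimal illustration, where the nested-dict encoding is just one possible representation (the attribute and value names follow the example above):

```python
# The decision tree from this slide, encoded as nested dicts:
# an inner node maps an attribute name to {value: subtree};
# a leaf is a classification string ("YES" / "NO").
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "NO", "Normal": "YES"}},
        "Overcast": "YES",
        "Rain": {"Wind": {"Strong": "NO", "Weak": "YES"}},
    }
}

def classify(node, example):
    """Walk the tree: at each inner node, follow the branch matching the
    example's value for that node's attribute, until a leaf is reached."""
    while isinstance(node, dict):
        attribute = next(iter(node))          # the attribute tested at this node
        node = node[attribute][example[attribute]]
    return node

example = {"Outlook": "Sunny", "Temperature": "High",
           "Humidity": "High", "Wind": "Weak"}
print(classify(tree, example))  # -> NO
```

Note that Temperature is never consulted: the tree simply does not test it on the path this example takes.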
5. Building a Decision Tree
- Building a decision tree, given a group of labeled examples (training set):
- Choose an attribute A
- Split the examples according to the values of A
- Build trees for the sons recursively; stop splitting when all examples at a node have the same label
[Diagram: a node testing A splits into one son for each value A = a1, A = a2, A = a3]
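The recursion above can be sketched as follows. `choose_attribute` is a placeholder stand-in here (the actual selection criterion, information gain, comes later in the lecture), and the (attribute_dict, label) data layout is an illustrative assumption:

```python
from collections import Counter

def choose_attribute(examples, attributes):
    # Placeholder criterion: take the first attribute.
    # ID3 would instead pick the attribute with the highest information gain.
    return attributes[0]

def build_tree(examples, attributes):
    """examples: list of (attribute_dict, label) pairs.
    Returns a label (leaf) or {attribute: {value: subtree}} (inner node)."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:            # all examples share a label: stop, make a leaf
        return labels[0]
    if not attributes:                   # nothing left to split on: majority label
        return Counter(labels).most_common(1)[0][0]
    a = choose_attribute(examples, attributes)          # choose an attribute A
    remaining = [x for x in attributes if x != a]       # a discrete attribute is used once
    sons = {}
    for value in {ex[a] for ex, _ in examples}:         # split by the values of A
        subset = [(ex, lab) for ex, lab in examples if ex[a] == value]
        sons[value] = build_tree(subset, remaining)     # build sons recursively
    return {a: sons}
```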
6. ID3
- ID3 is an algorithm for building decision trees
- ID3 uses information gain to select the best attribute for splitting
7. Decision Trees and ID3
[Diagram: ID3 plays the role of the Learner: it takes a Training Set and produces a Decision Tree as the Classifier]
8. Information Gain
[Diagram: splitting a set with high uncertainty; a good split produces sons with low uncertainty]
9. Information Gain
- Gain(A) = I(p, n) - Σ_{i=1..Va} (Ei / (p + n)) · I(pi, ni)
- Where:
- p, n: the number of positive/negative examples at a node
- I(p, n): the uncertainty given p and n
- Va: the number of possible values for attribute A
- Ei: the number of examples at son i
- ID3 chooses the attribute with the highest gain for splitting
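These quantities can be sketched in code, taking binary entropy as the uncertainty measure I(p, n); the (attribute_dict, label) data layout is an illustrative assumption:

```python
from math import log2

def I(p, n):
    """Uncertainty (binary entropy) of a node with p positive and n negative examples."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:
            q = count / total
            result -= q * log2(q)
    return result

def gain(examples, attribute):
    """Gain(A) = I(p, n) - sum_i (Ei / (p + n)) * I(pi, ni),
    summing over the sons created by splitting on `attribute`.
    examples: list of (attribute_dict, label) pairs with YES/NO labels."""
    def counts(exs):
        p = sum(1 for _, lab in exs if lab == "YES")
        return p, len(exs) - p
    p, n = counts(examples)
    g = I(p, n)
    for value in {ex[attribute] for ex, _ in examples}:
        son = [(ex, lab) for ex, lab in examples if ex[attribute] == value]
        pi, ni = counts(son)
        g -= len(son) / (p + n) * I(pi, ni)   # weight each son by Ei / (p + n)
    return g
```

A perfectly separating attribute removes all uncertainty, so its gain equals I(p, n); an attribute whose sons mirror the parent's label mix has gain 0.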
10. Attribute Types: An Attribute with Discrete Values
- The domain of attribute A is discrete
- Domain(A) = {blue, green, yellow}
- Splitting is simple: create a son for each possible value of A
11. Attribute Types: An Attribute with Continuous Values
- The domain of attribute A is continuous
- Domain(A) = [1, 100]
- How to split?
- Suggestion: make the domain discrete
- Domain(A) = {1-30, 30-40, 40-100}
- Problems:
- Which discretization is good?
- We will not be able to distinguish between examples in the same range
- Example: if A represents grades, there will be no difference between students with grades within the range 40-100
12. An Attribute with Continuous Values
- A solution: dynamic split
- Sort the examples according to the values of attribute A
- For each possible value xi ∈ Domain(A):
- Try to split into 2 sons: <= xi and > xi
- Measure the information gain of the split
- An example: a good temperature for playing tennis is between 20 and 28 degrees
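The dynamic-split procedure can be sketched like this; the function names and the sample temperatures are illustrative:

```python
from math import log2

def entropy(labels):
    """Uncertainty of a list of YES/NO labels."""
    total = len(labels)
    result = 0.0
    for lab in set(labels):
        q = labels.count(lab) / total
        result -= q * log2(q)
    return result

def best_threshold(values, labels):
    """Sort the examples by the continuous attribute's value, then try each
    observed value x as the split '<= x' vs '> x' and return the
    (threshold, gain) pair with the highest information gain."""
    pairs = sorted(zip(values, labels))          # sort by attribute value
    best = (None, -1.0)
    for x, _ in pairs[:-1]:                      # the largest value leaves one son empty
        left = [lab for v, lab in pairs if v <= x]
        right = [lab for v, lab in pairs if v > x]
        g = entropy(labels) \
            - (len(left) / len(labels)) * entropy(left) \
            - (len(right) / len(labels)) * entropy(right)
        if g > best[1]:
            best = (x, g)
    return best

# e.g. temperatures labelled by whether tennis was played;
# the best split here separates the cold days: <= 18 vs > 18
threshold, g = best_threshold([10, 18, 22, 26, 31], ["NO", "NO", "YES", "YES", "NO"])
```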
13. Attribute Types: An Important Note
- Let ATTRIB = {A1, A2, ..., An} be the group of attributes available for splitting at the current node
- Let Ai be the attribute chosen for the split
- If the domain of Ai is discrete:
- We will choose from ATTRIB \ {Ai} for splitting at the sons
- If the domain of Ai is continuous:
- We will choose from ATTRIB for splitting at the sons
14. The Accuracy of a Classifier
- We aim to produce an accurate classifier
- How can we measure the accuracy of a classifier that was produced by our algorithm?
- We could know the true accuracy of the classifier only by testing it on all possible examples
- This is usually impossible
- We can get an estimate of the classifier's accuracy by testing it on a subset of all possible examples
15. Estimating the Accuracy of a Classifier
- Assume that we have a labeled set of examples T
- We can split T:
- Use k% of T as a training set
- Use the rest ((100 - k)%) of T for testing
- The accuracy of the classifier on the test set provides an estimate of the true accuracy
- Note: it is important that the training set and the test set do not overlap; otherwise, the accuracy estimate can be too optimistic
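The split above can be sketched as follows (k = 70 and the fixed seed are illustrative choices, not from the lecture):

```python
import random

def holdout_split(examples, k=70, seed=0):
    """Split a labeled set T into a training set (k% of T) and a disjoint
    test set (the remaining (100 - k)%). Shuffling first makes the split
    random; slicing guarantees the two sets never overlap."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * k // 100
    return shuffled[:cut], shuffled[cut:]

train, test = holdout_split(list(range(10)), k=70)
print(len(train), len(test))  # -> 7 3
```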
16. Cross Validation
- A common method for estimating the accuracy of a classifier by splitting the labeled data into non-overlapping training and testing sets
- N-fold cross validation:
- Split the labeled data into N distinct groups
- Run N experiments; in each:
- Use N - 1 groups of examples for learning (training set)
- Use one group for testing
- Average the results of the N experiments; this is the accuracy estimate
17. An Example: 5-Fold Cross Validation
Labeled Data: Group 1 | Group 2 | Group 3 | Group 4 | Group 5

Run  Training Set       Test Set  Classifier Accuracy
1    Groups 2, 3, 4, 5  Group 1   X1
2    Groups 1, 3, 4, 5  Group 2   X2
3    Groups 1, 2, 4, 5  Group 3   X3
4    Groups 1, 2, 3, 5  Group 4   X4
5    Groups 1, 2, 3, 4  Group 5   X5

Classifier accuracy estimate = average of X1..X5
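The procedure in the table can be sketched as follows; `learn` and `accuracy` are placeholders for the learning algorithm and its evaluation, supplied by the caller:

```python
def cross_validation(examples, n_folds, learn, accuracy):
    """N-fold cross validation: split the data into N distinct groups,
    train on N - 1 of them, test on the held-out group, and average
    the N accuracies."""
    folds = [examples[i::n_folds] for i in range(n_folds)]   # N disjoint groups
    scores = []
    for i, test_fold in enumerate(folds):
        # training set = all groups except the one held out for testing
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        classifier = learn(training)
        scores.append(accuracy(classifier, test_fold))       # Xi for run i
    return sum(scores) / n_folds                             # average of X1..XN
```

With dummy `learn`/`accuracy` functions that always score 0.8, the estimate is of course 0.8; in practice they would wrap, e.g., ID3 and a test-set accuracy count.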
18. Learning Curves
- Show the accuracy of the produced classifier as a function of the training set size
- In simple words: show how classification accuracy behaves when learning with more and more examples
- Note that the accuracies should be measured on the same test set, which does not overlap with any of the training sets
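A sketch of how such a curve could be computed; the majority-label learner is a toy stand-in used only so the example runs, not part of the lecture:

```python
from collections import Counter

def majority_learner(training):
    """Toy learner: always predicts the most common label in its training set."""
    majority = Counter(lab for _, lab in training).most_common(1)[0][0]
    return lambda example: majority

def learning_curve(training_pool, test_set, sizes, learn):
    """Measure accuracy on one fixed test set (disjoint from every training
    set) while training on larger and larger prefixes of the pool."""
    curve = []
    for size in sizes:
        classifier = learn(training_pool[:size])
        correct = sum(1 for ex, lab in test_set if classifier(ex) == lab)
        curve.append((size, correct / len(test_set)))
    return curve
```

Keeping the test set fixed across all sizes is exactly the note on this slide: otherwise the points of the curve would not be comparable.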