List of Figures
Figure 1.1 Rules for the contact lens data.
Figure 1.2 Decision tree for the contact lens data.
Figure 1.3 Decision trees for the labor negotiations data.
Figure 2.1 A family tree and two ways of expressing the sister-of relation.
Figure 2.2 ARFF file for the weather data.
Figure 3.1 Decision tree for a simple disjunction.
Figure 3.2 The exclusive-or problem.
If x = 1 and y = 0 then class a
If x = 0 and y = 1 then class a
If x = 0 and y = 0 then class b
If x = 1 and y = 1 then class b
Figure 3.3 Decision tree with a replicated subtree.
If x = 1 and y = 1 then class a
If z = 1 and w = 1 then class a
Otherwise class b
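The two rule sets above (Figures 3.2 and 3.3) are small enough to check exhaustively. A minimal sketch in Java, with binary attributes encoded as booleans; the class and method names are illustrative, not from the book:

// Minimal sketch: the rule sets of Figures 3.2 and 3.3 as Java predicates.
// Attribute and method names are illustrative assumptions.
public class ReplicatedSubtreeDemo {

    // Exclusive-or problem (Figure 3.2): class a exactly when x and y differ.
    static char xorClass(boolean x, boolean y) {
        return (x != y) ? 'a' : 'b';
    }

    // Replicated-subtree rules (Figure 3.3): class a when x=1 and y=1,
    // or when z=1 and w=1; otherwise class b.
    static char replicatedClass(boolean x, boolean y, boolean z, boolean w) {
        return ((x && y) || (z && w)) ? 'a' : 'b';
    }

    public static void main(String[] args) {
        System.out.println(xorClass(true, false));                      // a
        System.out.println(xorClass(true, true));                       // b
        System.out.println(replicatedClass(true, true, false, false));  // a
        System.out.println(replicatedClass(false, true, true, true));   // a
        System.out.println(replicatedClass(false, true, true, false));  // b
    }
}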
Figure 3.4 Rules for the Iris data.
Default: Iris-setosa
except if petal-length >= 2.45 and petal-length < 5.355
          and petal-width < 1.75
       then Iris-versicolor
            except if petal-length >= 4.95 and petal-width < 1.55
                   then Iris-virginica
                   else if sepal-length < 4.95 and sepal-width >= 2.45
                        then Iris-virginica
       else if petal-length >= 3.35
            then Iris-virginica
                 except if petal-length < 4.85 and sepal-length < 5.95
                        then Iris-versicolor
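Rules with exceptions translate directly into nested conditionals. A minimal sketch of the rule set above; the class and method names are illustrative, and measurements are assumed to be in centimeters:

// Minimal sketch: the rules-with-exceptions of Figure 3.4 as nested conditionals.
// Class and method names are illustrative assumptions.
public class IrisRules {

    static String classify(double sepalLength, double sepalWidth,
                           double petalLength, double petalWidth) {
        String result = "Iris-setosa";                        // default rule
        if (petalLength >= 2.45 && petalLength < 5.355 && petalWidth < 1.75) {
            result = "Iris-versicolor";                       // first exception
            if (petalLength >= 4.95 && petalWidth < 1.55) {
                result = "Iris-virginica";                    // exception to the exception
            } else if (sepalLength < 4.95 && sepalWidth >= 2.45) {
                result = "Iris-virginica";
            }
        } else if (petalLength >= 3.35) {
            result = "Iris-virginica";                        // second exception
            if (petalLength < 4.85 && sepalLength < 5.95) {
                result = "Iris-versicolor";
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classify(5.1, 3.5, 1.4, 0.2)); // typical setosa measurements
        System.out.println(classify(5.9, 3.0, 4.2, 1.5)); // typical versicolor measurements
        System.out.println(classify(6.5, 3.0, 5.8, 2.2)); // typical virginica measurements
    }
}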
Figure 3.5 The shapes problem (shaded: standing; unshaded: lying).
Figure 3.6(a) Models for the CPU performance data: linear regression.
PRP = -56.1 + 0.049 MYCT + 0.015 MMIN + 0.006 MMAX + 0.630 CACH - 0.270 CHMIN + 1.46 CHMAX
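Evaluating the linear model above is a single weighted sum. A minimal sketch; the class, method, and example attribute values are illustrative:

// Minimal sketch: evaluate the linear regression model of Figure 3.6(a).
// Method and parameter names are illustrative; coefficients are taken from the formula above.
public class CpuPerformance {

    static double predictPRP(double myct, double mmin, double mmax,
                             double cach, double chmin, double chmax) {
        return -56.1
                + 0.049 * myct
                + 0.015 * mmin
                + 0.006 * mmax
                + 0.630 * cach
                - 0.270 * chmin
                + 1.46 * chmax;
    }

    public static void main(String[] args) {
        // Hypothetical machine: cycle time 125, min/max memory 256/6000, 16 KB cache, 4-16 channels.
        System.out.println(predictPRP(125, 256, 6000, 16, 4, 16));
    }
}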
Figure 3.6(b) Models for the CPU performance data: regression tree.
Figure 3.6(c) Models for the CPU performance data: model tree.
Figure 3.7 Different ways of partitioning the instance space.
Figure 3.8 Different ways of representing clusters.
Cluster membership probabilities shown in the figure:
        1     2     3
   a   0.4   0.1   0.5
   b   0.1   0.8   0.1
   c   0.3   0.3   0.4
   d   0.1   0.1   0.8
   e   0.4   0.2   0.4
   f   0.1   0.4   0.5
   g   0.7   0.2   0.1
   h   0.5   0.4   0.1
Figure 4.1 Pseudo-code for 1R.
Figure 4.2 Tree stumps for the weather data.
Figure 4.3 Expanded tree stumps for the weather data.
Figure 4.4 Decision tree for the weather data.
Figure 4.5 Tree stump for the ID code attribute.
Figure 4.6 (a) Operation of a covering algorithm; (b) decision tree for the same problem.
Figure 4.7 The instance space during operation of a covering algorithm.
Figure 4.8 Pseudo-code for a basic rule learner.
Figure 5.1 A hypothetical lift chart.
Figure 5.2 A sample ROC curve.
Figure 5.3 ROC curves for two learning schemes.
Figure 6.1 Example of subtree raising, where node C is raised to subsume node B.
Figure 6.2 Pruning the labor negotiations decision tree.
Figure 6.3 Generating rules using a probability measure.
Figure 6.4 Definitions for deriving the probability measure.
p = number of instances of that class that the rule selects
t = total number of instances that the rule selects
P = total number of instances of that class in the dataset
T = total number of instances in the dataset
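One way to turn these four counts into a measure of rule quality (an assumption of this sketch, not a formula quoted from the figure) is the hypergeometric tail probability: the chance that a rule selecting t instances at random would cover p or more instances of the class purely by luck. The class and method names are illustrative:

// Minimal sketch: a hypergeometric-tail probability measure built from the
// counts p, t, P, T defined in Figure 6.4. Treating the measure as this tail
// probability is an assumption of the sketch. Counts are assumed consistent
// (p <= t, p <= P, t - p <= T - P, t <= T).
public class RuleProbabilityMeasure {

    // Logarithm of the binomial coefficient C(n, k).
    static double logChoose(int n, int k) {
        double s = 0;
        for (int i = 1; i <= k; i++) s += Math.log(n - k + i) - Math.log(i);
        return s;
    }

    // Pr[at least p of t randomly selected instances belong to the class],
    // when the dataset holds P instances of the class out of T in total.
    static double measure(int p, int t, int P, int T) {
        double prob = 0;
        for (int i = p; i <= Math.min(t, P); i++) {
            prob += Math.exp(logChoose(P, i) + logChoose(T - P, t - i) - logChoose(T, t));
        }
        return prob;
    }

    public static void main(String[] args) {
        // A rule selecting 10 instances, 8 of them in a class covering 40 of 100 instances:
        System.out.println(measure(8, 10, 40, 100)); // small value: unlikely to be chance
    }
}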
Figure 6.5 Algorithm for forming rules by incremental reduced error pruning.
Figure 6.6 Algorithm for expanding examples into a partial tree.
Figure 6.7 Example of building a partial tree.
Figure 6.7 (continued) Example of building a partial tree.
Figure 6.8 Rules with exceptions for the Iris data. Exceptions are represented as dotted paths, alternatives as solid ones.
Figure 6.9 A maximum margin hyperplane.
Figure 6.10 A boundary between two rectangular classes.
Figure 6.11 Pseudo-code for model tree induction.
Figure 6.12 Model tree for a dataset with nominal attributes.
Figure 6.13 Clustering the weather data.
Figure 6.13 (continued) Clustering the weather data.
Figure 6.13 (continued) Clustering the weather data.
Figure 6.14 Hierarchical clusterings of the Iris data.
Figure 6.14 (continued) Hierarchical clusterings of the Iris data.
Figure 6.15 A two-class mixture model.
data:
A 51  A 43  B 62  B 64  A 45  A 42  A 46  A 45  A 45
B 62  A 47  A 52  B 64  A 51  B 65  A 48  A 49  A 46
B 64  A 51  A 52  B 62  A 49  A 48  B 62  A 43  A 40
A 48  B 64  A 51  B 63  A 43  B 65  B 66  B 65  A 46
A 39  B 62  B 64  A 52  B 63  B 64  A 48  B 64  A 48
A 51  A 48  B 64  A 42  A 48  A 41
model:
muA = 50, sigmaA = 5, pA = 0.6    muB = 65, sigmaB = 2, pB = 0.4
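Given the model parameters above, the probability that a value belongs to cluster A follows from Bayes' rule with the two normal densities. A minimal sketch; the class and method names are illustrative:

// Minimal sketch: class membership probabilities under the two-class mixture
// model of Figure 6.15 (normal densities, parameters as listed above).
// Class and method names are illustrative.
public class TwoClassMixture {

    static double normalDensity(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
    }

    // Posterior probability that value x was generated by cluster A.
    static double probA(double x) {
        double a = 0.6 * normalDensity(x, 50, 5);   // pA * f(x; muA, sigmaA)
        double b = 0.4 * normalDensity(x, 65, 2);   // pB * f(x; muB, sigmaB)
        return a / (a + b);
    }

    public static void main(String[] args) {
        System.out.println(probA(48));  // close to 1: clearly cluster A
        System.out.println(probA(64));  // close to 0: clearly cluster B
        System.out.println(probA(60));  // in between
    }
}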
Figure 7.1 Attribute space for the weather dataset.
Figure 7.2 Discretizing temperature using the entropy method.
Figure 7.3 The result of discretizing temperature.
(Figure content: the sorted temperature values 64 to 85 with their yes/no class labels, cut points at 66.5, 70.5, 73.5, 77.5, 80.5, and 84, and the labels A to F.)
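Entropy-based discretization scores each candidate cut point by the information gain of splitting the sorted values there. A minimal sketch on a small hypothetical label sequence (not the actual weather instances); class and method names are illustrative:

import java.util.*;

// Minimal sketch: score candidate cut points for entropy-based discretization.
// The labels here are a hypothetical example, not the actual weather data.
public class EntropyDiscretization {

    // Entropy (in bits) of a yes/no label collection.
    static double entropy(List<Boolean> labels) {
        if (labels.isEmpty()) return 0;
        long yes = labels.stream().filter(b -> b).count();
        double pYes = (double) yes / labels.size(), pNo = 1 - pYes;
        double e = 0;
        if (pYes > 0) e -= pYes * Math.log(pYes) / Math.log(2);
        if (pNo > 0) e -= pNo * Math.log(pNo) / Math.log(2);
        return e;
    }

    // Information gain of cutting the (sorted) values at the given threshold.
    static double gain(double[] values, boolean[] labels, double cut) {
        List<Boolean> all = new ArrayList<>(), below = new ArrayList<>(), above = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            all.add(labels[i]);
            (values[i] < cut ? below : above).add(labels[i]);
        }
        double n = values.length;
        return entropy(all)
                - (below.size() / n) * entropy(below)
                - (above.size() / n) * entropy(above);
    }

    public static void main(String[] args) {
        double[] temps = {64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85};
        boolean[] play = {true, false, true, true, true, false,
                          false, true, false, true, true, false}; // hypothetical labels
        // Candidate cut points (the same thresholds that appear in the figure):
        for (double cut : new double[]{66.5, 70.5, 73.5, 77.5, 80.5, 84}) {
            System.out.printf("cut %.1f gain %.3f%n", cut, gain(temps, play, cut));
        }
    }
}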
Figure 7.4 Class distribution for a two-class, two-attribute problem.
Figure 7.5 Number of international phone calls from Belgium, 1950-1973.
Figure 7.6 Algorithm for bagging.

model generation
  Let n be the number of instances in the training data.
  For each of t iterations:
    Sample n instances with replacement from training data.
    Apply the learning algorithm to the sample.
    Store the resulting model.

classification
  For each of the t models:
    Predict class of instance using model.
  Return class that has been predicted most often.
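A runnable sketch of this procedure, assuming a toy single-attribute threshold stump as the base learner; the data, class name, and method names are illustrative, not from the book:

import java.util.*;

// Minimal sketch of bagging with a toy one-attribute "decision stump" base learner.
public class BaggingSketch {
    interface Classifier { int classify(double x); }

    // Toy base learner: pick the threshold and orientation with the fewest errors
    // on the (resampled) training data.
    static Classifier learnStump(double[] x, int[] y) {
        double bestT = x[0]; int bestErr = Integer.MAX_VALUE; int bestAbove = 1;
        for (double t : x) {
            for (int above : new int[]{0, 1}) {
                int err = 0;
                for (int i = 0; i < x.length; i++) {
                    int pred = x[i] >= t ? above : 1 - above;
                    if (pred != y[i]) err++;
                }
                if (err < bestErr) { bestErr = err; bestT = t; bestAbove = above; }
            }
        }
        final double T = bestT; final int A = bestAbove;
        return v -> v >= T ? A : 1 - A;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] y =    {0, 0, 0, 0, 1, 1, 1, 1};
        int iterations = 10, n = x.length;
        Random rng = new Random(1);
        List<Classifier> models = new ArrayList<>();

        // Model generation: resample n instances with replacement, learn, store the model.
        for (int t = 0; t < iterations; t++) {
            double[] xs = new double[n]; int[] ys = new int[n];
            for (int i = 0; i < n; i++) { int j = rng.nextInt(n); xs[i] = x[j]; ys[i] = y[j]; }
            models.add(learnStump(xs, ys));
        }

        // Classification: predict with each model, return the majority class.
        double test = 5.5;
        int[] votes = new int[2];
        for (Classifier m : models) votes[m.classify(test)]++;
        System.out.println("Predicted class: " + (votes[1] > votes[0] ? 1 : 0));
    }
}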
Figure 7.7 Algorithm for boosting.

model generation
  Assign equal weight to each training instance.
  For each of t iterations:
    Apply learning algorithm to weighted dataset and store resulting model.
    Compute error e of model on weighted dataset and store error.
    If e equal to zero, or e greater or equal to 0.5:
      Terminate model generation.
    For each instance in dataset:
      If instance classified correctly by model:
        Multiply weight of instance by e / (1 - e).
    Normalize weight of all instances.

classification
  Assign weight of zero to all classes.
  For each of the t (or less) models:
    Add -log(e / (1 - e)) to weight of class predicted by model.
  Return class with highest weight.
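A runnable sketch of this procedure, again with a toy one-attribute stump, this time trained on the weighted dataset. The data and names are illustrative; the weight update follows the pseudocode above (multiply correctly classified instances by e / (1 - e), vote with -log(e / (1 - e))):

import java.util.*;

// Minimal sketch of the boosting procedure above with a toy weighted stump learner.
public class BoostingSketch {

    static double[] x = {1, 2, 3, 4, 5, 6, 7, 8};
    static int[] y =    {0, 0, 1, 0, 1, 1, 0, 1};   // two classes, 0 and 1 (illustrative data)

    // A learned stump: threshold plus the class predicted above it.
    record Stump(double threshold, int classAbove) {
        int classify(double v) { return v >= threshold ? classAbove : 1 - classAbove; }
    }

    // Choose the stump with the smallest *weighted* error.
    static Stump learnStump(double[] w) {
        Stump best = null; double bestErr = Double.MAX_VALUE;
        for (double t : x) {
            for (int above : new int[]{0, 1}) {
                Stump s = new Stump(t, above);
                double err = 0;
                for (int i = 0; i < x.length; i++) if (s.classify(x[i]) != y[i]) err += w[i];
                if (err < bestErr) { bestErr = err; best = s; }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int iterations = 5, n = x.length;
        double[] w = new double[n];
        Arrays.fill(w, 1.0 / n);                       // equal initial weights
        List<Stump> models = new ArrayList<>();
        List<Double> votes = new ArrayList<>();        // -log(e / (1 - e)) for each model

        for (int t = 0; t < iterations; t++) {
            Stump s = learnStump(w);
            double e = 0, total = 0;
            for (int i = 0; i < n; i++) { total += w[i]; if (s.classify(x[i]) != y[i]) e += w[i]; }
            e /= total;
            if (e == 0 || e >= 0.5) break;             // terminate model generation
            models.add(s);
            votes.add(-Math.log(e / (1 - e)));
            for (int i = 0; i < n; i++)                // down-weight correctly classified instances
                if (s.classify(x[i]) == y[i]) w[i] *= e / (1 - e);
            double sum = Arrays.stream(w).sum();       // normalize weights
            for (int i = 0; i < n; i++) w[i] /= sum;
        }

        // Classification: weighted vote of the stored models.
        double test = 6.5;
        double[] classWeight = new double[2];
        for (int m = 0; m < models.size(); m++)
            classWeight[models.get(m).classify(test)] += votes.get(m);
        System.out.println("Predicted class: " + (classWeight[1] > classWeight[0] ? 1 : 0));
    }
}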
Figure 8.1 Weather data: (a) in spreadsheet; (b) comma-separated.
Figure 8.1 Weather data: (c) in ARFF format.
Figure 8.2 Output from the J4.8 decision tree learner.
Figure 8.3 Using Javadoc: (a) the front page; (b) the weka.core package.
Figure 8.4 A class of the weka.classifiers package.
Figure 8.5 Output from the M5 program for numeric prediction.
Figure 8.6 Output from J4.8 with cost-sensitive classification.
Figure 8.7 Effect of AttributeFilter on the weather dataset.
Figure 8.8 Output from the APRIORI association rule learner.
Figure 8.9 Output from the EM clustering scheme.
Figure 8.10 Source code for the message classifier.
Figure 8.10 (continued)
Figure 8.10 (continued)
Figure 8.10 (continued)
Figure 8.11 Source code for the ID3 decision tree learner.
Figure 8.11 (continued)
Figure 8.11 (continued)
Figure 8.11 (continued)
Figure 8.12 Source code for a filter that replaces the missing values in a dataset.
Figure 8.12 (continued)
Figure 9.1 Representation of Iris data: (a) one dimension.
Figure 9.1 Representation of Iris data: (b) two dimensions.
Figure 9.2 Visualization of classification tree for grasses data.