Title: Lecture 1&2, v3.0
1. Lecture 1&2: Introduction to Intelligent Systems
- Dr Martin Brown
- Room E1k
- Email: martin.brown_at_manchester.ac.uk
- Telephone: 0161 306 4672
- http://www.eee.manchester.ac.uk/intranet/pg/course material/
2. Lecture 1&2: Outline
- Introduction to machine learning and intelligent systems
  - Intelligence and learning
  - Definition of machine learning
- Application areas
  - Face detection (classification)
  - Energy prediction (prediction/system identification)
- The machine learning process
  - Steps necessary to solve a machine learning application
- Aspects of machine learning
  - Performance, generalisation and parameter estimation
  - Simple regression, classification and clustering examples
3. Lecture 1&2: Resources
- This set of slides is largely self-contained; however, it is based on the following texts, which provide useful background/supplementary information:
  - Chapters 1-3, Machine Learning (MIT OpenCourseWare), T. Jaakkola, http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-867Machine-LearningFall2002/CourseHome/index.htm
  - An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, N. Cristianini and J. Shawe-Taylor, Cambridge University Press, 2000
  - Machine Learning, T. Mitchell, McGraw Hill, 1997
  - Machine Learning, Neural and Statistical Classification, D. Michie, D.J. Spiegelhalter and C.C. Taylor, 1994 (out of print, but available from http://www.amsta.leeds.ac.uk/~charles/statlog/)
4. Examples of Machine Learning
- The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience
- Systems for credit card transaction approval: the system is designed by showing it examples of good and bad (fraudulent) transactions and letting it learn the difference
- Learning to play backgammon: the world's best computer backgammon players are based on machine learning algorithms
- Systems for spam filtering: decisions about whether or not a mail is regarded as junk can be built and refined during actual usage
- Note that recursive model estimation can also be regarded as a type of machine learning
- This is a broad definition, covering everything from statistical regression and system identification to genetic algorithms
5. Definition of Machine Learning
- A computer program M is said to learn from experience E with respect to some tasks T and performance measure P, if M's performance at tasks in T, as measured by P, improves with experience E
- The task T must be described adequately in terms of measurable signals (inputs and outputs) and success criteria
- The computer program M could be a physical model whose parameters require tuning, a statistical distribution, or a set of rules
- The experience E may occur during design (off-line) or during actual operation (on-line)
- The actual, on-line performance P may be difficult to estimate if learning takes place off-line
- The experience set E needs to be rich enough that M is sufficiently exercised, should be balanced to reflect the actual usage, and should contain enough examples to estimate M sufficiently accurately
6. Predict/Learn Cycle
- Task T: specify measurable inputs and outputs
- Experience E is a data set D = {X, y} of measurements and desired targets
- Model M, which is parameterized by the unknown vector θ
- Prediction: ŷ = m(x, θ)
- Performance P compares M's predictions ŷ against the targets y
- Learning: Δθ = f(ŷ, y)
- This is the most commonly applied view of machine learning: supervised learning
- It's also worthwhile remembering the statistician's view that "all models are wrong, just some are more useful than others"
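Concretely, the cycle above can be sketched in Matlab for a linear model with a quadratic performance function and a simple gradient-based learning step. Everything here (the data, the "true" parameters, the learning rate eta) is an illustrative assumption, not from the lecture:

    % Experience E: a hypothetical data set D = {X, y}
    X = [ones(25,1) rand(25,1)];       % measurable inputs, with a bias column
    y = X*[62; 3.5] + randn(25,1);     % targets from an assumed true model, plus noise
    % Model M: yhat = X*theta, parameterized by the unknown vector theta
    theta = zeros(2,1);
    eta = 0.5;                         % assumed learning rate
    for k = 1:500
        yhat = X*theta;                % Prediction: yhat = m(x, theta)
        e = y - yhat;                  % Performance P: compare predictions with targets
        theta = theta + eta*(X'*e)/25; % Learning: delta theta = f(yhat, y)
    end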
7. Simple, Single Regression Example
- Consider trying to build a simple, single-variable linear predictor of the relationship between job (lot) size and work hours for the Toluca company
- The data set is of the form:
  - X = [ones(25,1) work(:,1)]
  - y = work(:,2)
- Linear prediction model: ŷ = Xθ
- Quadratic performance function: f(θ) = ½ Σᵢ (yᵢ − ŷᵢ)²
- Parameters can be learnt/estimated as θ̂ = (XᵀX)⁻¹Xᵀy
- This is the basis for today's laboratory
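A hedged Matlab sketch of this least-squares fit, assuming work.dat holds the Toluca data with Lot Size in column 1 and Work Hours in column 2 (as described in the laboratory on slide 32):

    work = load('work.dat');        % 25 rows: [Lot Size, Work Hours]
    X = [ones(25,1) work(:,1)];     % design matrix with a bias column
    y = work(:,2);
    theta = (X'*X)\(X'*y);          % least-squares minimiser of the quadratic performance
    yhat = X*theta;                 % fitted Work Hours
    res_std = std(y - yhat);        % standard deviation of the prediction error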
8. History of Machine Learning
- Rosenblatt 1956: Perceptron
- Minsky and Papert 1969: Critique of basic Perceptron theory
- Holland 1975: Genetic algorithms
- Barto & Sutton 1985: Proposed basic reinforcement learning algorithms
- Rumelhart 1986: Development of back-propagation, a gradient descent procedure for multi-layer perceptrons
- MacKay 1992: Bayesian MLPs (energy prediction winner)
- Tesauro 1992: Backgammon application, world class
- Heckerman 1994: Bayesian networks (medical, Win95 applications)
- Vapnik 1995: Support vector machines
- Neal 1996: Gaussian processes
- Jordan 1997: Variational learning and Bayesian nets
9. Rationale for Machine Learning
- Poor understanding of the optimal model that maximises the performance P. The optimal model may be the true physical model or a statistical model with known parameters
- In practice, there is always some uncertainty about effects not included in the model, or about the structure of the model itself; hence the need to learn/adapt during design
- Similarly, the model may change during operation, or training information about the optimal behaviour may only be weakly available; hence the need to learn during operation
10. Relationship to System Identification
- This is obviously closely related to system identification, so it's worthwhile considering what the differences are
- System identification is largely concerned with linear, dynamical prediction based on real-valued, time-delayed variables
  - The models are described by their structures and their unknown parameter vectors, which are tuned to best fit the data
- Intelligent systems is largely concerned with non-linear classification and prediction problems, where the variables may be real-valued or categorical
  - Again, the models are described by their structure and their unknown parameter vectors, which are tuned to best fit the data
- In practice, there is a reasonable amount of overlap between the two areas
11. Application 1: Face Detection
- Task: given an arbitrary image, which could be a digitized video signal or a scanned photograph, classify image regions according to whether they contain human faces
- Data: a representative sample of images containing examples of faces and non-faces. Input features are derived from the raw pixel values; binary class labels are assigned by experts
- Rationale for machine learning: most pixel-based detection problems are hard, due to significant pattern variations that are difficult to parameterise analytically
- Reference: Support Vector Machines: Training and Applications, Osuna, Freund and Girosi, MIT AI Memo 1602, 1997
12. Faces: Feature Extraction
The aim is, given a 19×19 pixel window (283 features), to determine whether the normalized image window contains a face (+1) or not (-1). The pipeline is:
- Rescale the image several times (scale invariance)
- Cut 19×19 window patterns (location detection)
- Preprocessing: light correction, histogram equalization (brightness invariance)
- Classify using an SVM
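Schematically, the pipeline might look like the following Matlab sketch. Here svm_classify is a hypothetical placeholder for the trained SVM, imresize assumes the Image Processing Toolbox, and the scales, stride and file name are illustrative choices:

    img = double(imread('photo.png'));     % hypothetical greyscale input image
    for scale = [1.0 0.8 0.64 0.5]         % rescale the image several times (scale invariance)
        im = imresize(img, scale);
        for r = 1:4:size(im,1)-18          % slide a 19x19 window (location detection)
            for c = 1:4:size(im,2)-18
                w = im(r:r+18, c:c+18);
                w = (w - mean(w(:)))/std(w(:));  % crude brightness normalisation
                label = svm_classify(w(:));      % placeholder classifier: +1 face, -1 not
            end
        end
    end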
13. Faces: Classification Problem
- Determine whether the 19×19 window contains a face
- Support vector machines only use the examples closest to the decision boundary; the rest are ignored
- The training set contained 2,000 examples
[Figure: two-dimensional illustration of the SVM decision boundary, with axes x1 and x2]
14. Faces: Results and Comments
- Use a support vector machine classifier with polynomial kernels of degree 2
- Number of parameters in the polynomial: ~40,000!
- Test on two data sets:
  - A: 313 images, 313 faces (one per image; mug shots), 4.5M windows
  - B: 23 images, 155 faces, 5.5M windows
15. Faces: Example Classifications
16. Application 2: Building Energy Prediction
- Task: create (time series) prediction model(s) which can predict the energy load from a series of environmental factors
- Data: four months of actual energy usage was recorded and used to estimate a predictive model
- Rationale: a physical model of energy usage is too complex to estimate; often the best guide is prior experience, or matching to similar buildings already in existence
- Reference: Bayesian Non-linear Modelling for the Prediction Competition, MacKay, Technical Report, Cambridge University, 1994
17. Energy: Task Details and Data
- The data set consisted of hourly measurements from 1/9/89 to 31/12/89 of four environmental (input) variables:
  - Temperature
  - Humidity
  - Solar flux
  - Wind
- as well as three dependent targets:
  - A1: Electricity
  - A2: Cooling water
  - A3: Heating water
- There were a total of 2,926 training data points, and the aim was to predict hourly usage of the three targets over the next 54 days (1,282 test data points)
18. Energy: Data Preparation
[Figure: recorded electricity usage plotted against hours]
- Data preparation involved determining the time delay associated with the environmental values, as well as deciding how best to represent them (raw values, rotated projections, exponential averaging); see the sketch below
- Outlier rejection: some of the data actually included a burst pipe, and the standard model should not be trained on these points
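As an example of one such representation, a minimal Matlab sketch of exponential averaging applied to an environmental input; the file name and smoothing factor alpha are assumptions, not values from the report:

    temp = load('temperature.dat');   % hypothetical hourly temperature series
    alpha = 0.9;                      % assumed smoothing factor
    ema = zeros(size(temp));
    ema(1) = temp(1);
    for t = 2:length(temp)
        % exponentially weighted average: smooths noise, tracks slow trends
        ema(t) = alpha*ema(t-1) + (1 - alpha)*temp(t);
    end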
19. Energy: Modelling Approach
- The inputs included the four environmental factors at:
  - the current time and at 2.5, 24 and 72 hour delays (see the sketch below)
- Categorical inputs denoting:
  - day, week, holiday, year
- The modelling algorithm was a Bayesian multi-layer perceptron. Many models were trained and averaged to get robust performance, with some indication of the expected error
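A sketch of how such delayed inputs might be stacked into a design matrix in Matlab; the 2.5 hour delay is rounded to 3 samples here for simplicity, and all file and variable names are illustrative:

    env = load('environment.dat');    % hypothetical T-by-4 matrix of hourly factors
    lags = [0 3 24 72];               % current value plus approx. 2.5, 24 and 72 hour delays
    T = size(env,1);
    X = [];
    for d = lags
        % shift each factor back by d hours; early rows repeat the first measurement
        X = [X env(max(1, (1:T)' - d), :)];
    end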
20. Energy: Prediction and Uncertainty
[Figure: predicted cooling water usage plotted against hours, with error bars]
- As the designer is not certain about the model structure or the type of features, there will be some discrepancy between the predictions made by different models
- These can be averaged to get a mean prediction, and their standard deviation can be used to give the prediction error bars
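A minimal Matlab sketch of this committee idea, where preds is assumed to be a T-by-K matrix whose columns are the predictions of K independently trained models (randomly generated here for illustration):

    preds = randn(100, 10) + 5;       % hypothetical: 100 time steps, 10 trained models
    mean_pred = mean(preds, 2);       % committee (mean) prediction
    err_bar = std(preds, 0, 2);       % model-to-model spread gives the error bars
    upper = mean_pred + 2*err_bar;    % approximate 95% band, assuming a Gaussian spread
    lower = mean_pred - 2*err_bar;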
21. Energy: Results and Conclusions
- The winning entry in this competition was created using the following data modelling philosophy: "use huge flexible models, including all possibilities that you can imagine might be appropriate; control the flexibility of these models using sophisticated priors; and use Bayes as a helmsman to guide the search through model space"
- A physical model would have been very difficult to produce, whereas an empirical regression model could be estimated directly from the recorded data
- Independent test data:
  - A1: σ(ŷ − y) = 65
  - A2: σ(ŷ − y) = 0.64
  - A3: σ(ŷ − y) = 0.53
22. Data Modelling Methodology
1. Define task: specify goal and ROI to guide accuracy
2. Data model: define relevant data sources and attributes
3. Transform: consolidate, clean and transform data
4. Exploratory data analysis: analyse ranges, distributions and simple relationships (correlations)
5. Model building: association, classification, clustering and prediction
6. Validation: assess model accuracy, find important relationships, try alternative models
23. Step 2: Data Models
- Decide which informative, measurable variables will be used in this project and select their source(s):
  - Selecting which attributes/variables to use
  - Deriving a table that contains those attributes/variables
  - Defining the structure of the x -> y mapping
24. Step 4: Exploratory Data Analysis
- Understand the distributions of each variable, and simple relationships (correlations) between variables, during off-line data exploration
- Visualise the distributions associated with each variable in the system:
  - Continuous variable: distribution
  - Categorical variable: histogram
- Very important for things like classification, when one of the class labels is rare (e.g. fraudulent transactions)
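A small Matlab sketch of these checks on hypothetical data, using hist for both the continuous distributions and the categorical counts:

    data = randn(1000, 3);                 % hypothetical continuous variables
    labels = (rand(1000,1) < 0.02) + 1;    % hypothetical labels with a rare class (~2%)
    for j = 1:size(data,2)
        figure; hist(data(:,j), 30);       % continuous variable: distribution
    end
    figure; hist(labels, 1:2);             % categorical variable: histogram
    fprintf('rare class frequency: %.1f%%\n', 100*mean(labels == 2));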
25. Step 5: Machine Learning Techniques
- Summarise a data set, using a model, in order to extract useful relationships:
- Associations
  - Identify items that occur together within a transaction
  - Supermarket goods, patent authors, viewed web pages
- Classification
  - Predict whether an image window contains a face
  - Awarding credit; will a user complete a web transaction?
- Clustering
  - Identify a small number of groups that contain similar records
  - Customer segmentation, web page grouping
- Regression
  - Predict a real value
  - Time series analysis, lifetime value prediction
26. Step 6: Model Analysis & Validation
- When building a model to describe relationships within the data, you need to perform model analysis and validation in order to estimate its performance when applied to unseen data
- Models are not 100% accurate!
- This is due to limited:
  - Features (cannot measure everything)
  - Data examples (cannot sample every case)
- The ROI of the data mining project will depend on the accuracy of the classification/prediction models; therefore, this needs to be carefully estimated:
  - Error analysis (e.g. 95% accurate)
  - Confusion matrices
  - Lift charts
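A minimal Matlab sketch of one of these checks, a two-class confusion matrix and accuracy estimate on held-out data; the label vectors are hypothetical:

    ytrue = [ 1  1 -1 -1  1 -1 -1 -1]';   % hypothetical test labels (+1/-1)
    ypred = [ 1 -1 -1 -1  1 -1  1 -1]';   % hypothetical model predictions
    tp = sum(ytrue ==  1 & ypred ==  1);
    fn = sum(ytrue ==  1 & ypred == -1);
    fp = sum(ytrue == -1 & ypred ==  1);
    tn = sum(ytrue == -1 & ypred == -1);
    C = [tp fn; fp tn];                   % confusion matrix: rows = truth, columns = prediction
    accuracy = (tp + tn)/numel(ytrue);    % overall accuracy estimate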
27. Aspects of Machine Learning Theory
- Aspects of machine learning theory:
  - Optimal estimator properties
  - Parameter estimation and prediction uncertainty
  - Generalisation: bias/variance (overfitting)
  - Model hypothesis space
- In machine learning, we're always aiming for the best "second best" solution
- The model needs to be guided by, but not overfit, the exemplar training data
- The model needs to represent its prediction uncertainty
  - Due to the effects of having a finite training set
28. Modelling as Data Compression
- D = {X, y}: 25 degrees of freedom (data points)
- ŷ = xᵀθ: 2 degrees of freedom (parameters)
- Typically, the model represents the underlying signal in the data, which has been corrupted by noise
- Assume the data is generated by yᵢ = f(xᵢ) + nᵢ, where E(n) = 0 and E(n²) = σ²
- Aim: produce an optimal model ŷᵢ = E(y|xᵢ) = m(xᵢ), where m(·) ≈ f(·)
29. Parameter Estimation
- A key aspect of machine learning is parameter estimation
- Once the model structure and data set are fixed, learning can be as simple as finding the minimum point of the performance function
- Machine learning approaches often use iterative schemes, as sketched below
[Figure: performance function f plotted against θ, with iterative updates Δθ converging to the turning point where ∂f/∂θ = 0]
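A minimal Matlab sketch of such an iterative scheme: gradient descent on an illustrative one-dimensional quadratic performance function (the function, start point and step size are all assumptions):

    f = @(q) 0.5*(q - 3).^2;     % illustrative quadratic performance function
    df = @(q) q - 3;             % its derivative
    q = 0;                       % initial parameter guess
    eta = 0.2;                   % assumed step size
    for k = 1:50
        q = q - eta*df(q);       % update: delta q = -eta * df/dq
    end
    % at the turning point df/dq = 0, so q converges towards the minimiser 3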
30. Bias/Variance Dilemma (Over-Modelling)
- There is no such thing as a free lunch
- In machine learning, whenever we select a certain class of models, we want one that closely approximates f (low bias) while being insensitive to the particular choice of D (low variance)
- Typically, this is only achieved with lots of data, but sensible tricks can be used to achieve a "free pudding"
- The expected error decomposes into:
  - the (squared) bias of the optimal predictor in the class
  - the parameter estimation variance due to the choice of D
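The dilemma can be illustrated with a Matlab Monte Carlo sketch that refits a deliberately simple model class on many random draws of D; the target function, noise level and sample sizes are illustrative assumptions:

    f = @(x) sin(2*pi*x);                % illustrative true function
    x0 = 0.3;                            % test point for the bias/variance estimates
    preds = zeros(200,1);
    for trial = 1:200
        x = rand(25,1);                  % a fresh data set D of 25 points
        y = f(x) + 0.2*randn(25,1);      % targets corrupted by noise
        theta = [ones(25,1) x] \ y;      % fit a straight line (a biased model class)
        preds(trial) = [1 x0]*theta;     % its prediction at the test point
    end
    bias2 = (mean(preds) - f(x0))^2;     % (squared) bias of the model class
    variance = var(preds);               % variance due to the choice of D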
31. Lecture 1&2: Summary
- Many system design tasks require a machine learning approach, ranging from parameter estimation to model selection
- In machine learning applications, key aspects include data preparation, transformation and model validation
- This course focusses on the actual modelling/machine learning, key aspects of which include:
  - Modelling as data compression
  - Optimal parameter estimation (optimization)
  - Prediction/parameter uncertainty (statistics)
  - Model specification and selection
32. Lecture 1&2: Laboratory
- Load the data set work.dat into Matlab. This has two columns, the Lot Size and the Work Hours, and the aim is to build a linear, least squares regression model for the Toluca company to estimate the time taken to produce lots of a particular size (taken from Applied Linear Statistical Models, Neter et al)
- With regard to the definition of machine learning:
  - Clearly state the task, environment, model and performance
  - Is this off-line or on-line learning?
- For a linear model, create two Matlab functions (see the skeletons below) that allow you to:
  - Train/estimate the parameters from the training data
  - Predict/estimate new data
- Test the routines on the given data set and comment on:
  - What are the optimal parameters (bias and gain) for the linear model?
  - Plot the linear input-output relationship as well as the original training data
  - What is the standard deviation of the prediction error, and how does it relate to the range (standard deviation) of the target data? How does the error's standard deviation relate to the performance function on slide 7?
  - How does this model compare to simply predicting the average Work Hours, irrespective of any information about the Lot Size, i.e. a model that just contains a bias term?
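One possible shape for the two requested routines, sketched as Matlab function skeletons (one per file, as usual); the names are suggestions only, and this is not the complete laboratory solution:

    function theta = train_linear(X, y)
    % TRAIN_LINEAR  Estimate the linear model parameters by least squares.
    theta = (X'*X)\(X'*y);

    function yhat = predict_linear(X, theta)
    % PREDICT_LINEAR  Evaluate the linear model on (new) input data.
    yhat = X*theta;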