1
Instance Based Learning

Ata Kaban
The University of Birmingham

Today we learn
K-Nearest Neighbours
Case-based reasoning
Lazy and eager learning

3
Instance-based learning

One way of solving tasks of approximating discrete or real-valued target functions
We have training examples $(x_n, f(x_n))$, $n = 1, \dots, N$.
Key idea:
just store the training examples
when a test example is given, find the closest matches

1-Nearest neighbour
Given a query instance $x_q$,
first locate the nearest training example $x_n$
then estimate $\hat{f}(x_q) \leftarrow f(x_n)$
K-Nearest neighbour
Given a query instance $x_q$,
first locate the k nearest training examples
if the target function is discrete-valued, take a vote among its k nearest neighbours
else if the target function is real-valued, take the mean of the f values of the k nearest neighbours
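
The two cases above (vote for a discrete target, mean for a real-valued target) can be sketched as follows. This is a minimal illustration, not the presenter's code: the function names, the toy data, and the use of Euclidean distance via `math.dist` are assumptions made for the example.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k training pairs (x, f(x)) closest to the query.

    Assumption: Euclidean distance; every query scans all training examples.
    """
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]

def knn_classify(train, query, k):
    """Discrete-valued target: majority vote among the k nearest neighbours.

    Ties are broken in favour of the label seen first among the neighbours.
    """
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k):
    """Real-valued target: mean of f over the k nearest neighbours."""
    neighbours = k_nearest(train, query, k)
    return sum(value for _, value in neighbours) / len(neighbours)

# Hypothetical toy data, invented for this sketch.
labelled = [((1.0, 1.0), "yes"), ((1.5, 2.0), "yes"),
            ((5.0, 5.0), "no"), ((6.0, 5.5), "no"), ((5.5, 6.5), "no")]
numeric = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0), ((10.0,), 20.0)]

print(knn_classify(labelled, (2.0, 2.0), k=1))
print(knn_classify(labelled, (2.0, 2.0), k=5))
print(knn_regress(numeric, (2.1,), k=3))
```

With this toy data the answer changes with k: the single nearest neighbour votes "yes", while all five examples together vote "no".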

5
The distance between examples

We need a measure of distance in order to know which examples are the neighbours
Assume that we have T attributes for the learning problem. Then one example point x has elements $x_t \in \mathbb{R}$, $t = 1, \dots, T$.
The distance between two points $x_i$ and $x_j$ is often defined as the Euclidean distance
$$d(x_i, x_j) = \sqrt{\sum_{t=1}^{T} (x_{it} - x_{jt})^2}$$
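
As a small worked sketch of the Euclidean distance over T attributes, the function below sums the squared attribute differences and takes the square root. The function name and the example points are illustrative assumptions.

```python
import math

def euclidean(xi, xj):
    """Euclidean distance between two points with the same T attributes."""
    if len(xi) != len(xj):
        raise ValueError("both points must have the same number of attributes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

# Attribute differences 3 and 4 give sqrt(9 + 16).
print(euclidean((0.0, 0.0), (3.0, 4.0)))
```

The standard library's `math.dist` (Python 3.8 and later) computes the same quantity.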

6
Voronoi Diagram
7
Characteristics of Instance-Based Learning

An instance-based learner is a lazy learner: it does all the work when the test example is presented. This is opposed to so-called eager learners, which build a compact parameterised model of the target.
It produces a local approximation to the target function (a different one for each test instance)
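
The lazy/eager contrast can be illustrated with a rough sketch. Both classes and the data are hypothetical: the lazy learner is a 1-nearest-neighbour rule on a single attribute, and the eager learner is an ordinary least-squares line, chosen here only as a simple example of a compact parameterised model.

```python
class LazyNearestNeighbour:
    """Lazy: training only stores the examples; the work happens per query."""

    def fit(self, xs, ys):
        self.examples = list(zip(xs, ys))
        return self

    def predict(self, xq):
        # Local approximation: the answer depends on the neighbourhood of xq.
        _, y = min(self.examples, key=lambda ex: abs(ex[0] - xq))
        return y


class EagerLine:
    """Eager: training compresses the data into two parameters."""

    def fit(self, xs, ys):
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self  # the training data is no longer needed after this point

    def predict(self, xq):
        # Global model: the same two parameters are used for every query.
        return self.slope * xq + self.intercept


# Hypothetical data following y = x^2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]
lazy = LazyNearestNeighbour().fit(xs, ys)
eager = EagerLine().fit(xs, ys)
print(lazy.predict(2.9), eager.predict(2.9))
```

The lazy learner keeps every example and answers from the nearest one, while the eager learner keeps only a slope and an intercept.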

8
When to consider Nearest Neighbour algorithms?

Instances map to points in $\mathbb{R}^n$
No more than, say, 20 attributes per instance
Lots of training data
Advantages
Training is very fast
Can learn complex target functions
Don't lose information
Disadvantages
? (will see them shortly)

10
Training data
Test instance
11
Keep data in normalised form
One way to normalise the data $a_r(x)$ to $a'_r(x)$ is
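
The formula that completed this sentence is not preserved in the transcript. A common choice, assumed here rather than taken from the slide, is z-score standardisation: subtract the mean of attribute r and divide by its standard deviation, both computed on the training data. The function name and the numbers below are hypothetical.

```python
import statistics

def standardise_columns(rows):
    """Z-score each attribute using training-set statistics.

    Returns the normalised rows and a function that applies the same
    transformation to a new (test) instance.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]

    def transform(row):
        # A constant attribute (std 0) carries no information; map it to 0.
        return tuple((v - m) / s if s > 0 else 0.0
                     for v, m, s in zip(row, means, stds))

    return [transform(row) for row in rows], transform

# Hypothetical training data: two attributes on very different scales.
training = [(20.0, 1000.0), (30.0, 3000.0), (40.0, 5000.0)]
normalised, transform = standardise_columns(training)
print(normalised)
print(transform((30.0, 5000.0)))  # test instance, same scaling as training
```

After this step both attributes contribute on a comparable scale to the distance, and the test instance is scaled with the training statistics rather than its own.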
12
Normalised training data
Test instance
13
Distances of test instance from training data
Classification
1-NN: Yes
3-NN: Yes
5-NN: No
7-NN: No
14
What if the target function is real valued?

The k-nearest neighbour algorithm would just calculate the mean of the target values of the k nearest neighbours
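
Written out, and assuming $x_1, \dots, x_k$ denote the k training examples nearest to the query $x_q$ (an indexing introduced here for illustration), the estimate is:

```latex
\hat{f}(x_q) = \frac{1}{k} \sum_{i=1}^{k} f(x_i)
```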

15
Variant of kNN: Distance-Weighted kNN

We might want to weight nearer neighbours more heavily
Then it makes sense to use all training examples instead of just k (Shepard's method)
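
A minimal sketch of distance weighting for a real-valued target follows. The transcript does not preserve the weighting formula, so the inverse-square weight $w_i = 1 / d(x_q, x_i)^2$ used here is an assumption (it is a common textbook choice); the function name and data are hypothetical. Passing `k=None` uses all training examples, as in Shepard's method.

```python
import math

def weighted_knn_regress(train, query, k=None):
    """Distance-weighted mean of f over the k nearest (or all) examples.

    Assumption: weights are 1 / distance^2. If the query coincides with a
    training example, that example's value is returned directly.
    """
    ranked = sorted(train, key=lambda pair: math.dist(pair[0], query))
    neighbours = ranked if k is None else ranked[:k]
    weighted_sum = 0.0
    weight_total = 0.0
    for x, value in neighbours:
        d = math.dist(x, query)
        if d == 0.0:
            return value  # exact match: avoid dividing by zero
        w = 1.0 / d ** 2
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical data: the middle example is twice as close to the query.
train = [((0.0,), 0.0), ((1.0,), 10.0), ((4.0,), 40.0)]
print(weighted_knn_regress(train, (2.0,)))        # all examples, weighted
print(sum(v for _, v in train) / len(train))      # plain mean, for contrast
```

The weighted estimate is pulled towards the value of the closest example, whereas the plain mean treats all three equally.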

16
Difficulties with k-nearest neighbour algorithms

Have to calculate the distance of the test case from all training cases
There may be irrelevant attributes amongst the attributes: the curse of dimensionality
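
Both difficulties show up in a small sketch. The data are invented for illustration: the first attribute is assumed to determine the label and the second is assumed irrelevant. Note that `nearest_label` scans every training case for each query.

```python
import math

def nearest_label(train, query):
    """1-NN by brute force: one distance computation per training case."""
    return min(train, key=lambda pair: math.dist(pair[0], query))[1]

# Only the relevant attribute: the query at 1.5 is closest to the "yes" case.
relevant_only = [((1.0,), "yes"), ((5.0,), "no")]
print(nearest_label(relevant_only, (1.5,)))

# Same cases with an irrelevant second attribute added. The large difference
# in the irrelevant attribute now dominates the Euclidean distance.
with_irrelevant = [((1.0, 9.0), "yes"), ((5.0, 1.0), "no")]
print(nearest_label(with_irrelevant, (1.5, 1.0)))
```

The relevant attribute of the query has not changed, yet the nearest neighbour, and with it the predicted label, is different once the irrelevant attribute is included.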

17
Case-based reasoning (CBR)

CBR is an advanced form of instance-based learning applied to more complex instance objects
Objects may include complex structural descriptions of cases and adaptation rules

CBR cannot use Euclidean distance measures
Must define distance measures for those complex objects instead (e.g. semantic nets)
CBR tries to model human problem-solving
uses past experience (cases) to solve new problems
retains solutions to new problems
CBR is an ongoing area of machine learning research with many applications
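
A toy sketch of the retrieve, reuse and retain steps described above. Everything in it is a simplifying assumption made for illustration: cases are described by sets of symbolic features, Jaccard distance stands in for a domain-specific distance measure, and the retrieved solution is reused unchanged, whereas a real CBR system would typically adapt it.

```python
def jaccard_distance(a, b):
    """Distance between two feature sets: 1 - |intersection| / |union|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

class CaseBase:
    """Hypothetical case base storing (problem features, solution) pairs."""

    def __init__(self):
        self.cases = []

    def retain(self, problem, solution):
        self.cases.append((frozenset(problem), solution))

    def solve(self, problem):
        if not self.cases:
            raise ValueError("no past cases to reason from")
        problem = frozenset(problem)
        # Retrieve the most similar past case and reuse its solution.
        _, solution = min(self.cases,
                          key=lambda case: jaccard_distance(case[0], problem))
        # Retain the new problem with its solution for future queries.
        self.retain(problem, solution)
        return solution

# Hypothetical fault-diagnosis cases.
base = CaseBase()
base.retain({"no power", "fan silent"}, "replace power supply")
base.retain({"no display", "fan spinning"}, "reseat graphics card")
answer = base.solve({"no display", "fan spinning", "beeps"})
print(answer, len(base.cases))
```

The new problem shares two of its three features with the second stored case, so that case's solution is reused and the case base grows by one.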

19
Applications of CBR