Title: K-Nearest Neighbours and Instance-based learning
1 K-Nearest Neighbours and Instance-based learning
- Ata Kaban
- The University of Birmingham
2 - Today we learn
- K-Nearest Neighbours
- Case-based reasoning
- Lazy and eager learning
3 Instance-based learning
- One way of solving tasks of approximating
discrete- or real-valued target functions
- Have training examples (x_n, f(x_n)), n = 1..N.
- Key idea
- just store the training examples
- when a test example is given, find the
closest matches
4 - 1-Nearest neighbour
- Given a query instance x_q,
- first locate the nearest training example x_n
- then estimate f(x_q) as f(x_n)
- K-Nearest neighbour
- Given a query instance x_q,
- first locate the k nearest training examples
- if the target function is discrete-valued, take a
majority vote among its k nearest neighbours; if it is
real-valued, take the mean of the f values of the
k nearest neighbours
5 The distance between examples
- We need a measure of distance in order to know
who the neighbours are
- Assume that we have T attributes for the learning
problem. Then one example point x has elements
x_t ∈ ℝ, t = 1, ..., T.
- The distance between two points x_i, x_j is often
defined as the Euclidean distance:
d(x_i, x_j) = √( Σ_{t=1..T} (x_{i,t} - x_{j,t})² )
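To make the rule on the previous two slides concrete, here is a minimal Python sketch of k-NN. The function names (euclidean, knn_predict) and the list-of-pairs training format are my own choices for illustration, not part of the slides.

```python
import math
from collections import Counter

def euclidean(xi, xj):
    # d(xi, xj) = sqrt( sum over the T attributes of (xi_t - xj_t)^2 )
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def knn_predict(train, xq, k, discrete=True):
    # train: list of (x, f(x)) pairs; xq: the query instance
    nearest = sorted(train, key=lambda pair: euclidean(pair[0], xq))[:k]
    values = [fx for _, fx in nearest]
    if discrete:
        # discrete-valued target: majority vote among the k nearest
        return Counter(values).most_common(1)[0][0]
    # real-valued target: mean of the f values of the k nearest
    return sum(values) / k
```

With k = 1 this reduces to the 1-nearest neighbour rule.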
6 Voronoi Diagram
7 Voronoi Diagram
8 Characteristics of instance-based learning
- An instance-based learner is a lazy learner and
does all the work when the test example is
presented. This is opposed to so-called
eager learners, which build a parameterised
compact model of the target.
- It produces a local approximation to the target
function (different for each test instance)
9 When to consider Nearest Neighbour algorithms?
- Instances map to points in ℝ^T
- Not more than, say, 20 attributes per instance
- Lots of training data
- Advantages
- Training is very fast
- Can learn complex target functions
- Don't lose information
- Disadvantages
- ? (we will see them shortly)
11 Training data
Test instance
12 Keep data in normalised form
- One way to normalise the data a_r(x) to a'_r(x) is:
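The slide's own formula did not survive transcription. One common choice, shown here purely as an assumption, is z-score normalisation: each attribute is rescaled by its mean and standard deviation over the training data.

```python
def zscore_normalise(data):
    # Assumed normalisation (the slide's formula is not reproduced here):
    # a'_r(x) = (a_r(x) - mean_r) / std_r, per attribute r.
    # Assumes every attribute varies in the training data (std_r > 0).
    T = len(data[0])
    n = len(data)
    means = [sum(x[t] for x in data) / n for t in range(T)]
    stds = [(sum((x[t] - means[t]) ** 2 for x in data) / n) ** 0.5
            for t in range(T)]
    return [[(x[t] - means[t]) / stds[t] for t in range(T)] for x in data]
```

Normalising matters because Euclidean distance otherwise lets attributes with large numeric ranges dominate the neighbour ranking.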
13 Normalised training data
Test instance
14 Distances of test instance from training data
Classification: 1-NN: Yes; 3-NN: Yes; 5-NN: No; 7-NN: No
15 What if the target function is real valued?
- The k-nearest neighbour algorithm would then just
calculate the mean of the f values of the k nearest neighbours
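Concretely, in terms of the earlier knn_predict sketch, this is just the discrete=False branch (the numbers below are made up):

```python
train = [((1.0, 2.0), 10.0), ((2.0, 1.0), 14.0), ((5.0, 5.0), 30.0)]
# mean of the f values of the 2 nearest neighbours of (1.5, 1.5)
estimate = knn_predict(train, (1.5, 1.5), k=2, discrete=False)  # 12.0
```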
16 Variant of kNN: Distance-Weighted kNN
- We might want to weight nearer neighbours more
heavily, e.g. with weights w_i = 1 / d(x_q, x_i)²
- Then it makes sense to use all training examples
instead of just k (Shepard's method)
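A sketch of the distance-weighted idea for a real-valued target, reusing the euclidean helper from the earlier sketch. The inverse-square weight is the usual textbook choice (e.g. in Mitchell), not something these slides pin down.

```python
def distance_weighted_predict(train, xq):
    # Shepard's method: use all training examples, each weighted by
    # w_i = 1 / d(xq, xi)^2 (a standard choice of weight).
    num = den = 0.0
    for xi, fxi in train:
        d = euclidean(xi, xq)
        if d == 0.0:
            return fxi  # query coincides with a training point
        w = 1.0 / d ** 2
        num += w * fxi
        den += w
    return num / den  # weighted mean of the f values
```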
17 Difficulties with k-nearest neighbour algorithms
- Have to calculate the distance of the test case
from all training cases
- There may be irrelevant attributes amongst the
attributes (the curse of dimensionality)
18 Case-based reasoning (CBR)
- CBR is an advanced form of instance-based learning,
applied to more complex instance objects
- Objects may include complex structural
descriptions of cases and adaptation rules
19 - CBR cannot use Euclidean distance measures
- Must define distance measures for those complex
objects instead (e.g. semantic nets)
- CBR tries to model human problem-solving
- uses past experience (cases) to solve new
problems
- retains solutions to new problems
- CBR is an ongoing area of machine learning
research with many applications
20 - We only touch upon the area of Case-Based
Reasoning.
- If you are interested in finding out more, a
good place to start is the second part of the
chapter on Instance-Based Learning in the
textbook of Tom Mitchell.
21 - The remaining slides in this file are optional
material, and not examinable.
22 Applications of CBR
- Design
- landscape, building, mechanical, conceptual
design of aircraft sub-systems
- Planning
- repair schedules
- Diagnosis
- medical
- Adversarial reasoning
- legal
23 CBR process
New Case
24 CBR example: Property pricing
Test instance
25 How rules are generated
- There is no unique way of doing it. Here is one
possibility:
- Examine cases and look for ones that are almost
identical
- case 1 and case 2:
- R1: If recep-rooms changes from 2 to 1, then
reduce price by 5,000
- case 3 and case 4:
- R2: If Type changes from semi to terraced, then
reduce price by 7,000
26 Matching
- Comparing the test instance (case 5) with the stored cases:
- matches(5,1) = 3
- matches(5,2) = 3
- matches(5,3) = 2
- matches(5,4) = 1
- Estimated price of case 5 is 25,000
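The case table itself is not reproduced in this transcript, so the following is only a hypothetical sketch of what such a matches count could look like: cases as attribute dictionaries, scored by the number of attribute values they share with the test case. All attribute names and values below are made up for illustration.

```python
def matches(case_a, case_b, ignore=("price",)):
    # Count attributes (excluding the target attribute, assumed here
    # to be "price") on which two cases take the same value.
    return sum(1 for attr in case_a
               if attr not in ignore and case_a[attr] == case_b.get(attr))

# Hypothetical usage: score the test case against each stored case and
# start from the price of the best match.
case_base = [
    {"type": "semi", "recep-rooms": 2, "location": 8, "price": 30000},
    {"type": "semi", "recep-rooms": 1, "location": 8, "price": 25000},
]
test_case = {"type": "terraced", "recep-rooms": 1, "location": 8}
best = max(case_base, key=lambda c: matches(test_case, c))
estimate = best["price"]  # initial estimate, before any adaptation
```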
27 Adapting
- Reverse rule 2:
- if Type changes from terraced to semi, then
increase price by 7,000
- Apply reversed rule 2:
- new estimate of the price of property 5 is 32,000
28 Learning
- So far we have a new case and an estimated price
- nothing is added yet to the case base
- If later we find the house sold for 35,000, then the
case would be added
- we could also add a new rule:
- if location changes from 8 to 7, increase price by
3,000
29 Problems with CBR
- How should cases be represented?
- How should cases be indexed for fast retrieval?
- How can good adaptation heuristics be developed?
- When should old cases be removed?
30 Advantages
- A local approximation is found for each test case
- Knowledge is in a form understandable to human
beings
- Fast to train
31 Summary
- K-Nearest Neighbours
- Case-based reasoning
- Lazy and eager learning
32 Lazy and Eager Learning
- Lazy: wait for the query before generalizing
- k-Nearest Neighbour, Case-based reasoning
- Eager: generalize before seeing the query
- Radial Basis Function Networks, ID3,
- Does it matter?
- An eager learner must create a global approximation
- A lazy learner can create many local approximations