Title: Machine Learning
1. Machine Learning
- Foundations of Artificial Intelligence
2. Learning
- What is Learning?
- Learning in AI is also called machine learning or pattern recognition.
- The basic objective is to allow an intelligent agent to discover knowledge autonomously from experience.
- Let's examine the definition more closely:
- "an intelligent agent": The ability to learn requires a prior level of intelligence and knowledge. Learning has to start from an existing level of capability.
- "to discover autonomously": Learning is fundamentally about an agent recognizing new facts for its own use and acquiring new abilities that reinforce its existing abilities. Literal programming, i.e. rote learning from instruction, is not useful.
- "knowledge": Whatever is learned has to be represented in some way that the agent can use. "If you can't represent it, you can't learn it" is a corollary of the slogan "Knowledge is power."
- "from experience": Experience is typically a set of so-called training examples. Examples may be categorized or not; they may be random or selected by a teacher; they may include explanations or not.
3. Learning Agent
[Diagram: learning agent architecture. Percepts feed the Critic and the Problem solver; the Critic informs the Learning element, which updates the KB used by the Problem solver to produce Actions.]
4. Learning Element
- Design of a learning element is affected by:
- Which components of the performance element are to be learned
- What feedback is available to learn these components
- What representation is used for the components
- Type of feedback:
- Supervised learning: correct answers for each training example
- Unsupervised learning: correct answers not given
- Reinforcement learning: occasional rewards/feedback
5. Inductive Learning
- Inductive learning involves learning generalized rules from specific examples (can think of this as the inverse of deduction).
- Main task: given a set of examples, each classified as positive or negative, produce a concept description that matches exactly the positive examples.
- Some notes:
- The examples are coded in some representation language, e.g. they are coded by a finite set of real-valued features.
- The concept description is in a certain language that is presumably a superset of the language of possible example encodings.
- A correct concept description is one that classifies correctly ALL possible examples, not just those given in the training set.
- Fundamental difficulties with induction:
- Can't generalize with perfect certainty.
- Examples and concepts are NOT available directly; they are only available through representations, which may be more or less adequate to capture them.
- Some examples may be classified as both positive and negative.
- The features supplied may not be sufficient to discriminate between positive and negative examples.
6. Inductive Learning Frameworks
- Function-learning formulation
- Logic-inference formulation
7. Inductive Learning
- Simplest form: learn a function from examples
- f is the target function
- An example is a pair (x, f(x))
- Problem: find a hypothesis h
- such that h ≈ f
- given a training set of examples
- This is a highly simplified model of real learning:
- Ignores prior knowledge
- Assumes examples are given
8. Inductive Learning
- Construct/adjust h to agree with f on the training set
- h is consistent if it agrees with f on all examples
- E.g., curve fitting
13. Inductive Learning
- Construct/adjust h to agree with f on the training set
- h is consistent if it agrees with f on all examples
- E.g., curve fitting
- Ockham's razor: prefer the simplest hypothesis consistent with the data
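The curve-fitting idea above can be sketched in code: fit the simplest hypothesis class (a line) to example pairs (x, f(x)) and check that the hypothesis is consistent with the training set. The data points below are illustrative.

```python
# Sketch: curve fitting as inductive learning (pure Python, illustrative data).
# Training set: pairs (x, f(x)) sampled from an unknown target function f.
points = [(0, 1), (1, 3), (2, 5), (3, 7)]  # secretly f(x) = 2x + 1

def fit_line(pts):
    """Least-squares line h(x) = a*x + b -- the simplest hypothesis class."""
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

a, b = fit_line(points)
h = lambda x: a * x + b
# h is *consistent*: it agrees with f on every training example,
# and by Ockham's razor we prefer it over higher-degree fits.
consistent = all(abs(h(x) - y) < 1e-9 for x, y in points)
```

A degree-3 polynomial would also pass through these four points, but the line is the simpler consistent hypothesis.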
14. Logic-Inference Formulation
- Background knowledge KB
- Training set D (observed knowledge) that is not logically implied by KB
- Inductive inference: find h (inductive hypothesis) such that KB ∧ h implies D
- h = D is a trivial but uninteresting solution (data caching)
- Usually not a sound inference
15. Rewarded Card Example
- Deck of cards, with each card designated by [r,s], its rank and suit, and some cards "rewarded"
- Background knowledge KB:
  ((r=1) ∨ … ∨ (r=10)) ⇒ NUM(r)
  ((r=J) ∨ (r=Q) ∨ (r=K)) ⇒ FACE(r)
  ((s=S) ∨ (s=C)) ⇒ BLACK(s)
  ((s=D) ∨ (s=H)) ⇒ RED(s)
- Training set D:
  REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧ ¬REWARD([5,H]) ∧ ¬REWARD([J,S])
- Possible inductive hypothesis:
  h ≡ (NUM(r) ∧ BLACK(s) ⇒ REWARD([r,s]))
- Note: there are several possible inductive hypotheses
16. Learning a Predicate
- Set E of objects (e.g., cards)
- Goal predicate CONCEPT(x), where x is an object in E, takes the value True or False (e.g., REWARD)
- Observable predicates A(x), B(x), …
- e.g., NUM, RED
- Training set:
- values of CONCEPT for some combinations of values of the observable predicates
17. A Possible Training Set
Ex. A B C D E CONCEPT
1 True True False True False False
2 True False False False False True
3 False False True True True False
4 True True True False True True
5 False True True False False False
6 True True False True True False
7 False False True False True False
8 True False True False True True
9 False False False True True False
10 True True True True False True
Note that the training set does not say whether an observable predicate A, …, E is pertinent or not.
18. Learning a Predicate
- Set E of objects (e.g., cards)
- Goal predicate CONCEPT(x), where x is an object in E, takes the value True or False (e.g., REWARD)
- Observable predicates A(x), B(x), …
- e.g., NUM, RED
- Training set:
- values of CONCEPT for some combinations of values of the observable predicates
- Find a representation of CONCEPT in the form
  CONCEPT(x) ⇔ S(A,B,…)
  where S(A,B,…) is a sentence built with the observable predicates, e.g.
  CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x))
19. Example Set
- An example consists of the values of CONCEPT and the observable predicates for some object x
- An example is positive if CONCEPT is True, else it is negative
- The set X of all examples is the example set
- The training set is a subset of X
20. Hypothesis Space
- A hypothesis is any sentence h of the form
  CONCEPT(x) ⇔ S(A,B,…)
  where S(A,B,…) is a sentence built with the observable predicates
- The set of all hypotheses is called the hypothesis space H
- A hypothesis h agrees with an example if it gives the correct value of CONCEPT
21. Inductive Learning Scheme
22. Size of Hypothesis Space
- n observable predicates
- 2^n entries in the truth table
- In the absence of any restriction (bias), there are 2^(2^n) hypotheses to choose from
- n = 6 ⇒ about 2×10^19 hypotheses!
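The count works out as follows: each hypothesis assigns True or False to each of the 2^n truth-table rows, giving 2^(2^n) distinct hypotheses.

```python
# Sketch: size of the unrestricted hypothesis space.
# n observable predicates -> 2**n truth-table rows; a hypothesis picks
# True/False for every row, so there are 2**(2**n) hypotheses in total.
def hypothesis_space_size(n):
    return 2 ** (2 ** n)

size = hypothesis_space_size(6)  # 2**64, roughly 1.8 * 10**19
```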
23. Multiple Inductive Hypotheses
Rewarded Card Example (continued). All of the following agree with all the examples in the training set:
  h1 ≡ NUM(x) ∧ BLACK(x) ⇒ REWARD(x)
  h2 ≡ BLACK([r,s]) ∧ ¬(r=J) ⇒ REWARD([r,s])
  h3 ≡ ([r,s]=[4,C]) ∨ ([r,s]=[7,C]) ∨ ([r,s]=[2,S]) ⇒ REWARD([r,s])
  h4 ≡ ¬([r,s]=[5,H]) ∧ ¬([r,s]=[J,S]) ⇒ REWARD([r,s])
24. Inductive Bias
- Need for a system of preferences, called a bias, to compare possible hypotheses
- Keep-It-Simple (KIS) bias:
- If a hypothesis is too complex it may not be worth learning it
- There are far fewer simple hypotheses than complex ones, hence the hypothesis space is smaller
- Examples:
- Use far fewer observable predicates than suggested by the training set
- Constrain the learnt predicate, e.g., to use only high-level observable predicates such as NUM, FACE, BLACK, and RED and/or to have simple syntax (e.g., conjunction of literals)
- If the bias allows only sentences S that are conjunctions of k << n predicates picked from the n observable predicates, then the size of H is O(n^k)
25. Version Spaces
- Idea: assume you are looking for a CONJUNCTIVE CONCEPT
- e.g., spade A, club 7, club 9: yes; club 8, heart 5: no
- concept: "odd and black"
- Now notice that the set of conjunctive concepts is partially ordered by specificity
  [Specificity lattice, most general to most specific: any card > black > odd black / spade > odd spade > 3 of spades]
- At any point, keep the most specific and least specific conjuncts consistent with the data
- Most specific:
- anything more specific misses some positive instances
- always exists -- conjoin all OK conjunctions
- Least specific:
- anything less specific admits some negative instances
- may not be unique -- imagine all you know is "club": "4" not OK, "odd black" OK, "spade" OK, "black" not OK
- The idea is to gradually merge least and most specific as data comes in.
26. Version Spaces Example
- Step 0: the most specific concept (msc) is the empty set; the least specific concept (lsc) is the set of all cards.
- Step 1: A-spade is found to be in the target set
- msc: A-spade
- lsc: set of all cards
- Step 2: 7-club is found to be in the target set
- msc: odd black cards
- lsc: set of all cards
- Step 3: 8-heart is not in the target set
- msc: odd black cards
- lsc: all odd cards OR all black cards
- . . .
- The training examples are obtained incrementally.
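The most-specific-boundary updates in steps 0-2 can be sketched as a Find-S-style fragment. The two-feature card description and the helper names are illustrative assumptions, not from the slides.

```python
# Sketch of the most-specific-boundary update for conjunctive card concepts.
# A card is a (rank, suit) pair; a hypothesis is a dict of required feature
# values (the conjunction); the empty dict {} means "any card".
def describe(card):
    rank, suit = card
    return {
        "color": "black" if suit in ("S", "C") else "red",
        "parity": "odd" if rank in (1, 3, 5, 7, 9, 11, 13) else "even",
    }

def generalize(msc, card):
    """Drop every conjunct that the new positive example violates."""
    feats = describe(card)
    if msc is None:                       # first positive example:
        return dict(feats)                # msc = its full description
    return {k: v for k, v in msc.items() if feats.get(k) == v}

msc = None
msc = generalize(msc, (1, "S"))   # A-spade -> {color: black, parity: odd}
msc = generalize(msc, (7, "C"))   # 7-club is also odd and black: msc unchanged
```

A full version-space learner would also maintain the least-specific boundary and specialize it on negative examples; only the msc half is shown here.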
27. Predicate as a Decision Tree
The predicate CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x)) can be represented by a decision tree.
- Example: a mushroom is poisonous iff it is yellow and small, or yellow, big, and spotted
- x is a mushroom
- CONCEPT = POISONOUS
- A = YELLOW
- B = BIG
- C = SPOTTED
28. Decision Trees
- What is a decision tree?
- It takes as input the description of a situation as a set of attributes (features) and outputs a yes/no decision (so it represents a Boolean function)
- Each leaf is labeled "positive" or "negative", each node is labeled with an attribute (or feature), and each edge is labeled with a value for the feature of its parent node
- Attribute-value language for examples:
- In many inductive tasks, especially learning decision trees, we need a representation language for examples
- Each example is a finite feature vector
- A concept is a decision tree where nodes are features
29. Decision Trees
- Example: is it a good day to play golf?
- A set of attributes and their possible values:
- outlook: sunny, overcast, rain
- temperature: cool, mild, hot
- humidity: high, normal
- windy: true, false
- A particular instance in the training set might be ⟨overcast, hot, normal, false⟩: play
- In this case, the target class is a binary attribute, so each instance represents a positive or a negative example.
30. Using Decision Trees for Classification
- Examples can be classified as follows:
- 1. look at the example's value for the feature specified
- 2. move along the edge labeled with this value
- 3. if you reach a leaf, return the label of the leaf
- 4. otherwise, repeat from step 1
- Example: a decision tree for deciding whether to go play golf, with "outlook" at the root (edges: sunny, overcast, rain), a "humidity" test under sunny (edges: high, normal), and a "windy" test under rain (edges: true, false).
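The four-step procedure can be sketched over the golf tree. The nested-tuple encoding and the leaf labels ("play" / "don't play") are illustrative assumptions; the original slide's figure does not show its leaf labels.

```python
# Sketch: the golf decision tree plus the classification loop (steps 1-4).
# An interior node is (feature_name, {edge_value: subtree}); a leaf is a label.
golf_tree = ("outlook", {
    "sunny":    ("humidity", {"high": "don't play", "normal": "play"}),
    "overcast": "play",
    "rain":     ("windy", {"true": "don't play", "false": "play"}),
})

def classify(tree, example):
    while isinstance(tree, tuple):          # step 1: at an interior node,
        feature, branches = tree            # look at the specified feature
        tree = branches[example[feature]]   # step 2: follow the matching edge
    return tree                             # step 3: leaf reached, return label

example = {"outlook": "overcast", "temperature": "hot",
           "humidity": "normal", "windy": "false"}
result = classify(golf_tree, example)       # overcast -> "play"
```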
31. Classification: A 3-Step Process
- 1. Model construction (learning)
- Each record (instance) is assumed to belong to a predefined class, as determined by one of the attributes, called the class label
- The set of records used for construction of the model is called the training set
- The model is usually represented in the form of classification rules (IF-THEN statements) or decision trees
- 2. Model evaluation (accuracy)
- Estimate the accuracy rate of the model based on a test set
- The known label of each test sample is compared to the classified result from the model
- Accuracy rate: percentage of test-set samples correctly classified by the model
- The test set must be independent of the training set, otherwise over-fitting will occur
- 3. Model use (classification)
- The model is used to classify unseen instances (assigning class labels)
- Predict the value of an actual attribute
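Step 2's accuracy rate can be sketched directly; the model and test set below are toy placeholders.

```python
# Sketch: accuracy rate on a held-out test set (step 2 of the process).
def accuracy(model, test_set):
    """Fraction of test instances whose predicted label matches the known one."""
    correct = sum(1 for features, label in test_set if model(features) == label)
    return correct / len(test_set)

# Toy model and test set (illustrative values only):
model = lambda x: x >= 5                       # predicts True for x >= 5
test_set = [(3, False), (7, True), (5, True), (1, True)]
acc = accuracy(model, test_set)                # 3 of 4 correct -> 0.75
```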
32. Memory-Based Reasoning
- Basic idea: classify new instances based on their similarity to instances we have seen before
- Also called instance-based learning
- Simplest form of MBR: rote learning
- learning by memorization
- save all previously encountered instances; given a new instance, find one from the memorized set that most closely resembles the new one; assign the new instance to the same class as its nearest neighbor
- more general methods try to find the k nearest neighbors rather than just one
- but how do we define "resembles"?
- MBR is "lazy":
- it defers all of the real work until a new instance is obtained; no attempt is made to learn a generalized model from the training set
- less data preprocessing and model evaluation, but more work has to be done at classification time
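A minimal k-nearest-neighbor sketch of MBR, assuming "resembles" means Euclidean distance on feature vectors; the memory contents are illustrative.

```python
# Sketch: k-nearest-neighbor classification, the general form of MBR.
from collections import Counter
import math

def knn_classify(memory, new_instance, k=3):
    """memory: list of (feature_vector, label) pairs already seen.
    Returns the majority label among the k stored instances closest
    to new_instance (Euclidean distance)."""
    nearest = sorted(memory, key=lambda m: math.dist(m[0], new_instance))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# All work happens here, at classification time -- nothing was "trained".
memory = [((0, 0), "neg"), ((0, 1), "neg"),
          ((5, 5), "pos"), ((6, 5), "pos"), ((5, 6), "pos")]
```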
33. MBR: Collaborative Filtering
- Collaborative filtering, or "social" learning:
- the idea is to give recommendations to a user based on the ratings of objects by other users
- usually assumes that features in the data are similar objects (e.g., Web pages, music, movies, etc.)
- usually requires explicit ratings of objects by users based on a rating scale
- there have been some attempts to obtain ratings implicitly based on user behavior (mixed results; the problem is that implicit ratings are often binary)
- Nearest-neighbor strategy:
- find similar users and predict a (weighted) average of their ratings
- we can use any distance or similarity measure to compute similarity among users (user ratings on items viewed as a vector)
- in the case of ratings, the Pearson r correlation is often used
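The Pearson r computation mentioned above can be sketched as follows; the user dictionaries and item names are illustrative.

```python
# Sketch: Pearson correlation between two users' ratings on co-rated items,
# the user-similarity measure used in collaborative filtering.
import math

def pearson(u, v):
    """u, v: dicts mapping item -> rating. Correlation over common items."""
    common = [i for i in u if i in v]
    n = len(common)
    if n == 0:
        return 0.0
    mu = sum(u[i] for i in common) / n
    mv = sum(v[i] for i in common) / n
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = math.sqrt(sum((u[i] - mu) ** 2 for i in common) *
                    sum((v[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

sally = {"m1": 7, "m2": 6, "m3": 2}
karen = {"m1": 7, "m2": 6, "m3": 2}   # identical ratings -> r = 1.0
```

Users with high correlation to Karen would then contribute most weight to her predicted rating for an unseen movie.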
34. MBR: Collaborative Filtering
- Collaborative filtering example:
- a movie rating system
- ratings scale: 1 = detest; 7 = love it
- the historical DB of users includes ratings of movies by Sally, Bob, Chris, and Lynn
- Karen is a new user who has rated 3 movies but has not yet seen Independence Day; should we recommend it to her?
- Will Karen like Independence Day?
35. Clustering
Clustering is the process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters. It helps users understand the natural grouping or structure in a data set.
- Cluster:
- a collection of data objects that are similar to one another and thus can be treated collectively as one group
- but, as a collection, sufficiently different from other groups
- Clustering:
- unsupervised classification
- no predefined classes
36. Distance or Similarity Measures
- Measuring distance:
- In order to group similar items, we need a way to measure the distance between objects (e.g., records)
- Note: distance is the inverse of similarity
- Often based on the representation of objects as feature vectors
- Examples: term frequencies for documents; an employee DB (tables omitted)
37. Distance or Similarity Measures
- Common distance measures:
- Manhattan distance
- Euclidean distance
- Cosine similarity
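Minimal sketches of the three measures on numeric feature vectors:

```python
# Sketch: common distance/similarity measures on feature vectors.
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance (L2 distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between vectors (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

a, b = (1, 2, 3), (4, 6, 3)
# manhattan(a, b) = 3 + 4 + 0 = 7; euclidean(a, b) = sqrt(9 + 16) = 5
```

Note the direction of each scale: the two distances are 0 for identical vectors, while cosine similarity is 1 for vectors pointing the same way.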
38. What Is Good Clustering?
- A good clustering will produce high-quality clusters in which:
- the intra-class (that is, intra-cluster) similarity is high
- the inter-class similarity is low
- The quality of a clustering result also depends on both the similarity measure used by the method and its implementation
- The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns
- The quality of a clustering result also depends on the definition and representation of cluster chosen
39. Applications of Clustering
- Clustering has wide applications in pattern recognition
- Spatial data analysis:
- create thematic maps in GIS by clustering feature spaces
- detect spatial clusters and explain them in spatial data mining
- Image processing
- Market research
- Information retrieval:
- document or term categorization
- information visualization and IR interfaces
- Web mining:
- cluster Web usage data to discover groups of similar access patterns
- Web personalization
40. Learning by Discovery
- One example: AM, by Doug Lenat at Stanford
- a mathematical discovery system
- inputs: set theory (union, intersection, etc.) and heuristics for how to do mathematics (based on a book by Polya), e.g., if f is an interesting function of two arguments, then f(x,x) is an interesting function of one, etc.
- speculated about what was interesting and made conjectures, etc.
- What AM discovered:
- integers (as an equivalence relation on cardinality of sets)
- addition (using disjoint union of sets)
- multiplication
- primes: "1" was interesting, the function returning the cardinality of the set of divisors was interesting, etc.
- Goldbach's conjecture: all even numbers are the sum of two prime numbers (note that AM did not prove it, just discovered that it was interesting)
- Why was AM so successful?
- connection between LISP and mathematics (mutations of small bits of LISP code are likely to be interesting)
- doesn't extend to other domains
- Lessons from EURISKO (fleet game)
41. Explanation-Based Learning
- Explanation-based learning (EBL) systems try to explain why each training instance belongs to the target concept.
- The resulting proof is then generalized and saved.
- If a new instance can be explained in the same manner as a previous instance, then it is also assumed to be a member of the target concept.
- Like macro-operators, EBL systems never learn to solve a problem that they couldn't solve before (in principle).
- However, they can become much more efficient at problem-solving by reorganizing the search space.
- One of the strengths of EBL is that the resulting explanations are typically easy to understand.
- One of the weaknesses of EBL is that it relies on a domain theory to generate the explanations.
42. Case-Based Learning
- Case-based reasoning (CBR) systems keep track of previously seen instances and apply them directly to new ones.
- In general, a CBR system simply stores each case that it experiences in a "case base", which represents its memory of previous episodes.
- To reason about a new instance, the system consults its case base and finds the most similar case that it's seen before. The old case is then adapted and applied to the new situation.
- CBR is similar to reasoning by analogy. Many people believe that much of human learning is case-based in nature.
43. Connectionist Algorithms
- Connectionist models (also called neural networks) are inspired by the interconnectivity of the brain.
- Connectionist networks typically consist of many nodes that are highly interconnected. When a node is activated, it sends signals to other nodes so that they are activated in turn.
- Using layers of nodes allows connectionist models to learn fairly complex functions.
- Neural networks are loosely modeled after the biological processes involved in cognition:
- 1. Information processing involves many simple elements called neurons.
- 2. Signals are transmitted between neurons using connecting links.
- 3. Each link has a weight that controls the strength of its signal.
- 4. Each neuron applies an activation function to the input that it receives from other neurons. This function determines its output.
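Points 1-4 above can be sketched as a single artificial neuron; the sigmoid is chosen here as an illustrative activation function.

```python
# Sketch: a single artificial neuron, following points 1-4 above.
import math

def neuron(inputs, weights, bias=0.0):
    """Each link weight controls the strength of its input signal (point 3);
    the neuron sums the weighted signals and applies an activation
    function to produce its output (point 4)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid activation

out = neuron([1.0, 0.0], [2.0, -3.0])  # sigmoid(2.0), roughly 0.88
```

Layering such units, with each layer's outputs feeding the next layer's inputs, is what lets connectionist models represent fairly complex functions.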