Intelligent Systems

1
Intelligent Systems
  • Machine learning
  • Steven de Jong
  • With contributions by K. Tuyls, E. Postma,
    I. Sprinkhuizen-Kuyper, and the EvoNet Flying Circus

2
This lecture
  • Goal
  • Quickly revisiting material on machine learning
    from courses you already had
  • Giving a preview of material that will be
    discussed in the M.Sc. program
  • Focussing on design and applications
  • Interaction!

3
This lecture
  • Content
  • Why use machine learning? A tutorial!
  • Artificial neural networks
  • Evolutionary computation (genetic algorithms)
  • Reinforcement learning
  • Not the content
  • Lots of theory: I will provide some references
  • Two hours of talking: lots of slides, though

4
1. Machine learning
  • Tutorial based on your assignment

5
Q: What is the most powerful problem solver
in the Universe?
  • The (human) brain
    that created the wheel, New York, wars
    and so on (after Douglas Adams)
  • The evolution mechanism
    that created the human brain (after Darwin et
    al.)

6
Building problem solvers by looking at and
mimicking
  • brains → neurocomputing
  • evolution → evolutionary computing

7
Taxonomy
8
Why use machine learning?
  • Speed
  • Robustness
  • Flexibility
  • Adaptivity
  • Context sensitivity

9
ML and the assignment
  • Create a robot controller that uses planning
    and other techniques to navigate a physical
    robot in a maze
  • So, why opt for machine learning?

10
Example 1: recognize crossing
  • Sensors
  • 300mm
  • 1000mm
  • 980mm
  • 290mm
  • 6000mm
  • 760mm
  • 780mm
  • 5600mm

11
Example 1: recognize crossing
  • Sensors
  • 300mm
  • 1000mm
  • 980mm
  • 290mm
  • 6000mm
  • 760mm
  • 780mm
  • 5600mm
  • Rule-based
  • IF 250 < s1 < 350 AND ...
  • Lots of work
  • Crappy performance!
  • Sensors are noisy
  • What exactly defines a crossing?
  • Is it not a T-joint?

12
The machine learning perspective
  • What kind of task is this?
  • What input data do we have?
  • What output data do we want?
  • Supervised or unsupervised learning?
  • → Which method is suitable?

13
The machine learning perspective
  • What kind of task is this?
  • Classification
  • What input data do we have?
  • Eight sonar sensor values
  • What output data do we want?
  • Probability that values represent crossing,
    T-joint, corridor
  • Supervised or unsupervised learning?
  • Probably supervised

14
The machine learning perspective
  • Supervised learning: classification
  • Make a large maze, mark areas as being crossings,
    T-joints, corridors
  • Place robots at random locations and in random
    orientations
  • Train your ML method until all locations
    correctly classified
  • Problem: classification depends on orientation!

15
The machine learning perspective
  • Orientation: the maze is seen relative to the robot!!!

16
Example 2: keep your lane
  • Robot follows a hallway
  • Possibly with angles!
  • Problem
  • Noise on actuators (motors, 3rd wheel, dust on
    the floor)
  • The worse the robot's position, the worse the
    performance on classification

17
Example 2: keep your lane
  • Idea
  • Monitor distance from the walls
  • If the robot is significantly off-center, perform
    a correction
  • Problems
  • Distance monitoring
  • Some lanes are really short
  • What if the robot is already badly aligned?

18
The machine learning perspective
  • What kind of task is this?
  • Control
  • What input data do we have?
  • Eight sonar sensor values (and camera)
  • What output data do we want?
  • A robot that keeps itself aligned
  • Supervised or unsupervised learning?
  • Probably unsupervised

19
The machine learning perspective
  • Unsupervised learning: control
  • Develop a large maze
  • Develop tasks: move from crossing A to crossing E
    (adjacent)
  • Couple sensory information to motors with some ML
    method
  • Quality
  • Short time needed to reach destination
  • Low number of collisions with the wall

20
Sensory-motor coordination
  • Idea
  • Enhance information obtained by sensors by
    actively using motors
  • For example, for
  • aligning the robot, or
  • being more sure about the classification,
  • you might stop forward movement and start
    rotating the robot in a scanning fashion

21
2. Artificial Neural Networks
  • (parts by E. Postma)

22
Artificial neural networks
  • You have seen these often in the past
  • I will provide only a quick overview
  • Slides will be put online

23
Recommended literature
  • Russell & Norvig, Ch. 19, pp. 563-587
  • and many more


24
A peek into the neural computer
  • Is it possible to develop a computer model after
    the natural example (the human brain)?
  • Brain-inspired models
  • Models that possess a limited number of
    structural and functional properties of the
    neural computer

25
Neurons, the building blocks of the brain
26
Neural activity
(Figure: neuron input vs. output activity.)
27
Synapses, the basis of learning and memory
28
Hebbian learning (Donald Hebb)
Δw(1,2) ∝ a(1) · a(2)
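As a minimal illustration (not part of the original slides), the Hebbian rule can be written as a one-line weight update in Python; the learning rate eta is an assumed parameter:

def hebbian_update(w, a1, a2, eta=0.01):
    # Hebbian learning: the weight change is proportional to the
    # product of the activities of the two connected neurons.
    return w + eta * a1 * a2

# Two repeatedly co-active neurons strengthen their connection.
w = 0.0
for _ in range(10):
    w = hebbian_update(w, a1=1.0, a2=0.8)
print(w)   # about 0.08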
29
(Artificial) Neural Networks
  • Neurons
  • Activity
  • Non-linear transfer function (!)
  • Connections
  • Adaptive weights
  • Learning
  • Supervised
  • Unsupervised

30
Artificial Neurons
  • input (vectors)
  • summation (excitation)
  • output (activation)

(Diagram: inputs i1, i2, i3 are summed into the
excitation e; the output activation is a = f(e).)
31
Transfer function
  • Non-linear function (sigmoid)

(Plot: sigmoid transfer function a = f(x), saturating
near 0 for large negative x and near 1 for large
positive x.)
32
Artificial connections (Synapses)
  • wAB
  • The weight of the connection from neuron A to
    neuron B

33
The Perceptron
34
Learning in the Perceptron
  • Delta learning rule (supervised)
  • The difference between the target t and the actual
    output o, given input x: Δw_i = η (t − o) x_i
  • Global error E
  • is a function of the differences between the
    targets and actual outputs over the patterns to be
    learnt: E = ½ Σ_d (t_d − o_d)²
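A minimal Python sketch of the delta rule, assuming a simple threshold output unit, a learning rate eta, and a toy AND dataset (all placeholders, not material from the slides):

import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=50):
    # Delta rule: adjust each weight by eta * (target - output) * input,
    # which drives down the global error E over the training patterns.
    w = np.zeros(X.shape[1] + 1)                       # weights plus bias
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = 1.0 if w[0] + w[1:] @ x > 0 else 0.0   # thresholded output
            w[1:] += eta * (target - o) * x            # delta rule
            w[0]  += eta * (target - o)                # bias update
    return w

# Toy example: learn the logical AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)
print(train_perceptron(X, t))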

35
Gradient descent
36
Decision boundaries: linear!
37
The multilayer perceptron
(Diagram: input layer → hidden layer → output layer.)
38
Learning in the MLP
39
Sigmoid function (logistic)
  • Alternative: tanh (range (−1, 1) instead of (0, 1))
  • Derivative: f'(x) = f(x) (1 − f(x))
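A small sketch of the logistic transfer function and the derivative quoted above (plain NumPy, added for illustration):

import numpy as np

def sigmoid(x):
    # Logistic transfer function, output in (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # Derivative expressed through the function value: f'(x) = f(x) (1 - f(x)).
    fx = sigmoid(x)
    return fx * (1.0 - fx)

print(sigmoid(0.0), sigmoid_prime(0.0))   # 0.5 0.25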

40
Updating the hidden-to-output weights
41
Updating the input-to-hidden weights
42
Forward & Backward Propagation
43
Implementation
  • Use ADT for graphs
  • Or just use matrices and vectors
  • Vectors for input and output
  • Matrices for each transition / layer (wij)
  • Learning
  • Supervised: e.g., Backpropagation (see the sketch
    below)
  • Unsupervised: e.g., Evolutionary Algorithms
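A minimal sketch of the matrices-and-vectors implementation suggested above: one hidden layer trained with backpropagation. The layer sizes, learning rate, and XOR toy data are assumptions for illustration, not values from the lecture:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One weight matrix per layer transition (input->hidden, hidden->output).
W1 = rng.normal(scale=0.5, size=(2, 3))   # 2 inputs, 3 hidden units
W2 = rng.normal(scale=0.5, size=(3, 1))   # 3 hidden units, 1 output

def forward(x):
    h = sigmoid(x @ W1)        # hidden activations
    y = sigmoid(h @ W2)        # output activation
    return h, y

def backprop_step(x, target, eta=0.5):
    # One forward pass plus one backward pass updating both weight matrices.
    global W1, W2
    h, y = forward(x)
    delta_out = (y - target) * y * (1 - y)          # output error term
    delta_hid = (delta_out @ W2.T) * h * (1 - h)    # hidden error term
    W2 -= eta * np.outer(h, delta_out)              # hidden-to-output update
    W1 -= eta * np.outer(x, delta_hid)              # input-to-hidden update

# Toy example: learn XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
for _ in range(5000):
    for x, t in zip(X, T):
        backprop_step(x, t)
print(np.round(forward(X)[1], 2))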

44
Break
  • See you in 10 minutes

45
3. (a) Introduction to Evolutionary Computation
  • (Ida Sprinkhuizen-Kuyper and EvoNet Flying Circus)

46
Recommended literature
  • EvoNet site
  • http://www.dcs.napier.ac.uk/evonet
  • Slides, demos
  • T. M. Mitchell, Machine Learning, 1997
  • http://www.cs.cmu.edu/~tom/ (slides ch. 9)
  • Other literature
  • Goldberg (1989)
  • Michalewicz (1996)
  • Bäck (1996)

47
History
  • L. Fogel, 1962 (San Diego, CA): Evolutionary
    Programming
  • J. Holland, 1962 (Ann Arbor, MI): Genetic
    Algorithms
  • I. Rechenberg & H.-P. Schwefel, 1965 (Berlin,
    Germany): Evolution Strategies
  • J. Koza, 1989 (Palo Alto, CA): Genetic Programming

48
The Metaphor
  • EVOLUTION ↔ PROBLEM SOLVING
  • Individual ↔ Candidate Solution
  • Fitness ↔ Quality
  • Environment ↔ Problem

49
The Ingredients
(Diagram: the population at generation t produces, via
selection and reproduction, the population at
generation t+1.)
50
The Evolution Mechanism
  • Increasing diversity by genetic operators
  • Mutation: local search
  • Recombination (crossover): global search
  • Decreasing diversity by selection
  • Of parents
  • Of survivors

51
The Evolutionary Cycle
Selection
Recombination
Mutation
Replacement
52
Main Streams
  • Genetic Algorithms
  • Evolution Strategies
  • Evolutionary Programming
  • Genetic Programming

53
Domains of Application
  • Numerical, Combinatorial Optimisation
  • System Modeling and Identification
  • Planning and Control
  • Engineering Design
  • Data Mining
  • Machine Learning
  • Artificial Life
  • Evolving neural networks

54
Performance
  • Acceptable performance at acceptable costs on a
    wide range of problems
  • Intrinsic parallelism (robustness, fault
    tolerance)
  • Superior to other techniques on complex problems
    with
  • lots of data, many free parameters
  • complex relationships between parameters
  • many (local) optima

55
Advantages
  • No presumptions w.r.t. problem space
  • Widely applicable
  • Low development & application costs
  • Easy to incorporate other methods
  • Solutions are interpretable (unlike NN)
  • Can be run interactively, accommodate
    user-proposed solutions
  • Provide many alternative solutions

56
Disadvantages
  • No guarantee for optimal solution within finite
    time
  • Weak theoretical basis
  • May need parameter tuning
  • Often computationally expensive, i.e. slow

57
3. (b) How to Build an Evolutionary Algorithm
  • (Ida Sprinkhuizen-Kuyper and EvoNet Flying Circus)

58
Evolutionary algorithms
  • Evolutionary algorithms: quick implementation
    guide
  • Evolving artificial neural networks

59
  • GA(fitness, threshold, p, c, m)
  • fitness: a function computing the fitness of an
    individual in the gene pool
  • threshold: either the fitness to reach or the
    number of generations
  • p: the population size
  • c: the crossover probability, in [0, 1]
  • m: the mutation probability, in [0, 1]
  • Initialize: P ← p random individuals
  • Evaluate: for each i in P, compute fitness(i)
  • While max_i fitness(i) < threshold and
    generation < threshold:
  • Select: probabilistically select (1 − c)·p
    individuals from P and add them to Ps
  • Crossover: probabilistically select (c/2)·p pairs
    of individuals ⟨i1, i2⟩ from P. For each pair,
    produce two offspring by applying the crossover
    operator. Add the offspring to Ps too.
  • Mutate: apply the mutation operator to m·p random
    members of Ps
  • Update: P ← Ps
  • Evaluate: for each h in P, compute fitness(h)
  • Shift generation: generation ← generation + 1
  • Return the individual from P that has the highest
    fitness
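A compact Python sketch of the GA(fitness, threshold, p, c, m) loop above. The concrete operators are assumptions, since the pseudocode leaves them open: fitness-proportionate selection, one-point crossover, and single-gene bit-flip mutation; the example fitness simply counts ones:

import random

def ga(fitness, threshold, p=20, c=0.6, m=0.05, length=20, max_gen=100):
    # Initialize: P <- p random bit-string individuals.
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(p)]
    for _ in range(max_gen):
        scores = [fitness(ind) for ind in pop]
        if max(scores) >= threshold:
            break
        # Select: (1 - c) * p individuals, fitness-proportionately.
        new_pop = [ind[:] for ind in
                   random.choices(pop, weights=scores, k=int((1 - c) * p))]
        # Crossover: (c / 2) * p pairs, each producing two offspring.
        for _ in range(int(c / 2 * p)):
            p1, p2 = random.choices(pop, weights=scores, k=2)
            cut = random.randrange(1, length)
            new_pop += [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
        # Mutate: flip one random gene in m * p random members.
        for ind in random.sample(new_pop, int(m * p)):
            g = random.randrange(length)
            ind[g] = 1 - ind[g]
        pop = new_pop
    # Return the fittest individual of the final population.
    return max(pop, key=fitness)

# Example: maximise the number of ones in a 20-bit string.
print(ga(fitness=sum, threshold=20))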

60
The Steps
  • In order to build an evolutionary algorithm
    there are a number of steps that we have to
    perform
  • Design a representation
  • Decide how to initialise a population
  • Design a way of mapping a genotype to a phenotype
  • Design a way of evaluating an individual

61
Further Steps
  • Design suitable mutation operator(s)
  • Design suitable recombination operator(s)
  • Decide how to manage our population
  • Decide how to select individuals to be parents
  • Decide how to select individuals to be replaced
  • Decide when to stop the algorithm

62
Designing a Representation
  • We have to come up with a method of representing
    an individual as a genotype.
  • There are many ways to do this and the way we
    choose must be relevant to the problem that we
    are solving.
  • When choosing a representation, we have to bear
    in mind how the genotypes will be evaluated and
    what the genetic operators might be.

63
Example: Discrete Representation
  • Representation of an individual can be using
    discrete values (binary, integer, or any other
    system with a discrete set of values).
  • Following is an example of binary representation.

(Figure: a chromosome is a string of values, e.g. a bit
string; each position in the string is a gene.)
64
Example: Discrete Representation
  • 8-bit genotype

Phenotype
  • Integer
  • Real Number
  • Schedule

65
Example: Discrete Representation
  • Phenotype could be integer numbers

Genotype: 1 0 1 0 0 0 1 1
Phenotype: 1·2^7 + 0·2^6 + 1·2^5 + 0·2^4 + 0·2^3 + 0·2^2 + 1·2^1 + 1·2^0
         = 128 + 32 + 2 + 1 = 163
66
Example: Discrete Representation
  • Phenotype could be real numbers
  • e.g. a number between 2.5 and 20.5 using 8 binary
    digits

Genotype: 1 0 1 0 0 0 1 1 (= 163, as on the previous slide)
Phenotype: 2.5 + (163 / 256) · (20.5 − 2.5) = 13.9609
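A small sketch of the two decodings above: a bit string read as an unsigned integer, then rescaled onto the interval [2.5, 20.5] (the example's interval; any bounds would do):

def bits_to_int(bits):
    # Interpret the bit string as an unsigned integer (most significant bit first).
    return sum(b << i for i, b in enumerate(reversed(bits)))

def bits_to_real(bits, low=2.5, high=20.5):
    # Map an n-bit genotype onto the interval [low, high).
    return low + bits_to_int(bits) / 2 ** len(bits) * (high - low)

genotype = [1, 0, 1, 0, 0, 0, 1, 1]
print(bits_to_int(genotype))    # 163
print(bits_to_real(genotype))   # 13.9609375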
67
Example: Real-valued representation
  • A very natural encoding: if the solution we are
    looking for is a list of real-valued numbers, then
    encode it as a list of real-valued numbers!
    (i.e., not as a string of 1s and 0s)
  • Lots of applications, e.g. parameter optimisation
    (ANNs!)

68
Example: Real-valued representation
  • Individuals are represented as a tuple of n
    real-valued numbers
  • The fitness function maps tuples of real numbers
    to a single real number

69
Genotype to Phenotype
  • Sometimes producing the phenotype from the
    genotype is a simple and obvious process.
  • Other times the genotype might be a set of
    parameters to some algorithm, which works on the
    problem data to produce the phenotype

(Diagram: genotype + problem data → growth function →
phenotype.)
70
Evaluating an Individual
  • This is by far the most costly step for real
    applications
  • do not re-evaluate unmodified individuals
  • It might be a subroutine, a black-box simulator,
    or any external process
  • (e.g. robot experiment)
  • You could use approximate fitness - but not for
    too long

71
More on Evaluation
  • Constraint handling - what if the phenotype
    breaks some constraint of the problem
  • penalize the fitness
  • specific evolutionary methods
  • Multi-objective evolutionary optimization
    gives a set of compromise solutions

72
Mutation Operators
  • We might have one or more mutation operators for
    our representation.
  • Some important points are
  • At least one mutation operator should allow every
    part of the search space to be reached
  • The size of mutation is important and should be
    controllable
  • Mutation should produce valid chromosomes

73
Example
(Figure: a bit string, e.g. 1 1 1 1 1 1 1, before
mutation; after mutation one gene has been flipped.)
Mutation usually happens with probability p_m for
each gene
74
Example: Real-valued mutation
  • Perturb values by adding some random noise
  • Often, a Gaussian/normal distribution N(0, σ) is
    used, where
  • 0 is the mean value
  • σ is the standard deviation
  • and
  • x'_i = x_i + N(0, σ_i)
  • for each parameter
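A short sketch of the Gaussian perturbation above; the per-parameter standard deviations are assumed to be given:

import random

def gaussian_mutation(x, sigmas):
    # Perturb each parameter x_i by noise drawn from N(0, sigma_i).
    return [xi + random.gauss(0.0, si) for xi, si in zip(x, sigmas)]

print(gaussian_mutation([1.0, -2.0, 0.5], sigmas=[0.1, 0.1, 0.05]))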

75
Recombination (crossover)
  • We might have one or more recombination
    operators for our representation.
  • Some important points are
  • The child should inherit something from each
    parent. If this is not the case then the operator
    is a mutation operator.
  • The recombination operator should be designed in
    conjunction with the representation so that
    recombination is not always catastrophic
  • Recombination should produce valid chromosomes.

76
Example: Recombination for Discrete Representation
(Figure: whole population; each chromosome is cut into
n pieces which are recombined to produce offspring.
Example for n = 1.)
77
Example: Recombination for real-valued
representation
Discrete recombination (uniform crossover): given
two parents, one child is created by copying each
parameter from one of the two parents, chosen at
random.
78
Example: Recombination for real-valued
representation
Intermediate recombination (arithmetic crossover):
given two parents, one child is created as
z_i = α · x_i + (1 − α) · y_i, with α in [0, 1].
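A sketch of both recombination operators for real-valued representations described above; drawing alpha uniformly at random is one common choice, assumed here:

import random

def discrete_recombination(x, y):
    # Uniform crossover: each child parameter is copied from one of the parents.
    return [xi if random.random() < 0.5 else yi for xi, yi in zip(x, y)]

def intermediate_recombination(x, y, alpha=None):
    # Arithmetic crossover: z_i = alpha * x_i + (1 - alpha) * y_i.
    if alpha is None:
        alpha = random.random()
    return [alpha * xi + (1 - alpha) * yi for xi, yi in zip(x, y)]

p1, p2 = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(discrete_recombination(p1, p2))
print(intermediate_recombination(p1, p2, alpha=0.5))   # [2.5, 3.5, 4.5]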
79
Selection Strategy
  • We want to have some way to ensure that better
    individuals have a better chance of being
    parents than less good individuals.
  • This will give us selection pressure which will
    drive the population forward.
  • We have to be careful to give less good
    individuals at least some chance of being parents
    - they may include some useful genetic material.

80
Example: Fitness proportionate selection
  • The expected number of times individual i is
    selected for mating is its fitness divided by the
    average fitness of the population
  • Better (fitter) individuals have
  • more space (a larger slice of the roulette wheel)
  • more chances to be selected

(Roulette-wheel figure: slice sizes range from best to
worst.)
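A minimal roulette-wheel sketch of fitness-proportionate selection, assuming non-negative fitness values:

import random

def roulette_select(population, fitnesses, k):
    # Each individual is picked with probability f_i / sum_j f_j.
    return random.choices(population, weights=fitnesses, k=k)

pop = ["A", "B", "C", "D"]
fit = [4.0, 2.0, 1.0, 1.0]
print(roulette_select(pop, fit, k=2))   # "A" is twice as likely as "B"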
81
Example: Fitness proportionate selection
  • Disadvantages
  • Danger of premature convergence because
    outstanding individuals take over the entire
    population very quickly
  • Low selection pressure when fitness values are
    near each other
  • Behaves differently on transposed versions of the
    same function

82
Example: Fitness proportionate selection
  • Fitness scaling: a cure for FPS
  • Start with the raw fitness function f.
  • Standardise to ensure
  • Lower fitness is better fitness.
  • Optimal fitness equals 0.
  • Adjust to ensure
  • Fitness ranges from 0 to 1.
  • Normalise to ensure
  • The sum of the fitness values equals 1.

83
Example: Tournament selection
  • Select k random individuals, without replacement
  • Take the best
  • k is called the size of the tournament
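A short sketch of tournament selection as described above (k individuals drawn without replacement, the best one wins); the bit-string population and sum-of-ones fitness are placeholder assumptions:

import random

def tournament_select(population, fitness, k=3):
    # Pick k random individuals without replacement and return the best one.
    contestants = random.sample(population, k)
    return max(contestants, key=fitness)

pop = [[random.randint(0, 1) for _ in range(10)] for _ in range(20)]
winner = tournament_select(pop, fitness=sum, k=3)
print(winner, sum(winner))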

84
Example: Rank-based selection
  • Individuals are sorted on their fitness value
    from best to worst. The place in this sorted list
    is called the rank.
  • Instead of using the fitness value of an
    individual, the rank is used by a function to
    select individuals from this sorted list. The
    function is biased towards individuals with a
    high rank (= good fitness).

85
Replacement Strategy
  • The selection pressure is also affected by the
    way in which we decide which members of the
    population to kill in order to make way for our
    new individuals.
  • We can use the stochastic selection methods in
    reverse, or there are some deterministic
    replacement strategies.
  • We can decide never to replace the best in the
    population: elitism.

86
Recombination vs Mutation
  • Recombination
  • modifications depend on the whole population
  • decreasing effects with convergence
  • exploitation operator
  • Mutation
  • mandatory to escape local optima
  • strong causality principle
  • exploration operator

87
Stopping criterion
  • The optimum is reached!
  • Limit on CPU resources:
    maximum number of fitness evaluations
  • Limit on the user's patience:
    after some generations without improvement

88
Algorithm performance
  • Never draw any conclusion from a single run
  • use statistical measures (averages, medians)
  • from a sufficient number of independent runs
  • From the application point of view
  • design perspective
  • find a very good solution at least once
  • production perspective
  • find a good solution at almost every run

89
Algorithm Performance (2)
  • Remember the WYTIWYG principle:
  • What you test is what you get - don't tune
    algorithm performance on toy data and expect it
    to work with real data.

90
Key issues
  • Genetic diversity
  • differences of genetic characteristics in the
    population
  • loss of genetic diversity: all individuals in
    the population look alike
  • snowball effect
  • convergence to the nearest local optimum
  • in practice, it is irreversible

91
Key issues (2)
  • Exploration vs Exploitation
  • Exploration: sample unknown regions
  • Too much exploration: random search, no
    convergence
  • Exploitation: try to improve the best-so-far
    individuals
  • Too much exploitation: local search only,
    convergence to a local optimum

92
4. Reinforcement learning
  • (I. Sprinkhuizen-Kuyper, K. Tuyls)

93
Recommended literature
  • Sutton, R.S. and A.G. Barto (1998), Reinforcement
    Learning: An Introduction, MIT Press.
    http://www.cs.ualberta.ca/~sutton/book/the-book.html
  • Mitchell, T. (1997). Machine Learning. McGraw Hill.
  • RL repository at MSU (http://web.cps.msu.edu/rlr)

94
Reinforcement Learning
  • Roots of reinforcement learning (RL)
  • Preliminaries (need to know!)
  • The setting
  • Properties
  • The Markov Property
  • Markov Decision Processes (MDP)

95
Roots of Reinforcement Learning
  • Origins from
  • Mathematical psychology (early 1910s)
  • Control theory (early 1950s)
  • Mathematical psychology
  • Edward Thorndike: research on animals via puzzle
    boxes
  • Bush & Mosteller: developed one of the first
    models of learning behavior
  • Control theory
  • Richard Bellman: stability theory of differential
    equations; how to design an optimal controller?
  • Inventor of Dynamic Programming: solving optimal
    control problems by solving the Bellman equations!

96
Preliminaries Setting of Reinforcement Learning
  • What is it?
  • Learning from interaction
  • Learning about, from, and while interacting with
    an external environment
  • Learning what to do (how to map situations to
    actions) so as to maximize a numerical reward
    signal

97
Preliminaries Setting of Reinforcement Learning
  • Key features?
  • Learner is not told which actions to take
  • Trial-and-Error search
  • Possibility of delayed reward
  • Sacrifice short-term gains for greater long-term
    gains
  • The need to explore and exploit
  • Considers the whole problem of a goal-directed
    agent interacting with an uncertain environment

98
Preliminaries: properties of RL. Supervised versus
unsupervised learning
  • Supervised learning vs. unsupervised learning

(Diagram comparing the two settings:)
Supervised learning system: training info = desired
(target) outputs; maps inputs to outputs;
error = (target output − actual output).
Reinforcement learning system: training info =
evaluations (rewards / penalties); maps states
(inputs) to actions (outputs); objective = get as
much reward as possible.
99
Preliminaries: properties of RL. The
Agent-Environment Interface
100
Preliminaries: properties of RL. Learning how to
behave
  • Reinforcement learning methods specify how the
    agent changes its policy as a result of
    experience.
  • Roughly, the agent's goal is to get as much
    reward as it can over the long run.

101
Preliminaries: properties of RL. Abstraction
  • Getting the Degree of Abstraction Right
  • Time steps need not refer to fixed intervals of
    real time.
  • Actions can be low level (voltages to motors), or
    high level ("accept job offer"), or mental (shift
    focus of attention), etc.
  • States can be low-level sensations, or
    abstract, symbolic, based on memory, or
    subjective ("surprised" or "lost").
  • The environment is not necessarily unknown to the
    agent, only incompletely controllable.

102
Preliminaries: properties of RL. Goals and Rewards
  • Is a scalar reward signal an adequate notion of a
    goal? Maybe not, but it is surprisingly flexible.
  • A goal should specify what we want to achieve,
    not how we want to achieve it.
  • A goal must be outside the agent's direct
    control, thus outside the agent.
  • The agent must be able to measure success
  • explicitly
  • frequently during its lifespan.

103
Preliminaries: properties of RL. What's the
objective?
Episodic tasks: interaction breaks naturally into
episodes, e.g., plays of a game, trips through a
maze.
The return adds up the immediate and long-term
rewards of an episode:
R_t = r_{t+1} + r_{t+2} + ... + r_T,
where T is the final time step of the episode.
104
Preliminaries: properties of RL. Returns for
continuing tasks
Continuing tasks: interaction does not have
natural episodes.
Discounted return:
R_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ...
    = Σ_{k=0..∞} γ^k r_{t+k+1},
where γ (0 ≤ γ ≤ 1) is the discount rate.
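A tiny sketch computing the discounted return from a finite reward sequence; gamma = 0.9 is an arbitrary example value:

def discounted_return(rewards, gamma=0.9):
    # R_t = r_{t+1} + gamma * r_{t+2} + gamma^2 * r_{t+3} + ...
    return sum(gamma ** k * r for k, r in enumerate(rewards))

print(discounted_return([1.0, 1.0, 1.0, 1.0]))   # 1 + 0.9 + 0.81 + 0.729 = 3.439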
105
An Example
Avoid failure: the pole falling beyond a critical
angle, or the cart hitting the end of the track.
As an episodic task, where the episode ends upon
failure: reward = +1 for each step before failure,
so return = number of steps before failure.
As a continuing task with discounted return:
reward = −1 upon failure, 0 otherwise, so return =
−γ^K for K steps before failure.
In either case, the return is maximized by avoiding
failure for as long as possible.
106
Another Example
Get to the top of the hill as quickly as
possible.
Return is maximized by minimizing the number of
steps to reach the top of the hill.
107
Preliminaries: properties of RL. A Unified
Notation
  • Think of each episode as ending in an absorbing
    state that always produces a reward of zero
  • We can cover all cases by writing
    R_t = Σ_{k=0..∞} γ^k r_{t+k+1},
    where γ can be 1 only if a zero-reward absorbing
    state is always reached.

108
The Markov Property
  • The state at step t means whatever information
    is available to the agent at step t about its
    environment.
  • The state can include immediate sensations,
    highly processed sensations, and structures built
    up over time from sequences of sensations.
  • A state should summarize past sensations so as to
    retain all essential information, i.e., it
    should have the Markov Property
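In symbols (the standard textbook formulation, e.g. Sutton & Barto), the Markov property says that the one-step dynamics conditioned on the current state and action equal the dynamics conditioned on the entire history:

\Pr\{ s_{t+1} = s',\, r_{t+1} = r \mid s_t, a_t \}
  = \Pr\{ s_{t+1} = s',\, r_{t+1} = r \mid s_t, a_t, r_t, s_{t-1}, a_{t-1}, \ldots, r_1, s_0, a_0 \}
for all s', r, and all possible histories.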

109
Markov Decision Processes
  • If a reinforcement learning task has the Markov
    Property, it is basically a Markov Decision
    Process (MDP).
  • If state and action sets are finite, it is a
    finite MDP.
  • To define a finite MDP, you need to give
  • state and action sets
  • one-step dynamics defined by transition
    probabilities
  • reward probabilities

110
An Example Finite MDP
  • At each step, the robot has to decide whether it
    should (1) actively search for a can, (2) wait
    for someone to bring it a can, or (3) go to home
    base and recharge.
  • Searching is better, but runs down the battery;
    if the robot runs out of power while searching,
    it has to be rescued (which is bad).
  • Decisions are made on the basis of the current
    energy level: high, low.
  • Reward = number of cans collected
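A hedged sketch of how this finite MDP could be written down as data: states, actions, and one-step dynamics as (probability, next state, reward) triples. The probabilities alpha and beta and the reward values are symbolic placeholders, not numbers from the lecture:

# States: battery level; actions available in each state.
states = ["high", "low"]
actions = {"high": ["search", "wait"], "low": ["search", "wait", "recharge"]}

# Placeholder parameters (assumptions for illustration only).
alpha, beta = 0.8, 0.6          # chance the battery level is unchanged while searching
r_search, r_wait = 2.0, 1.0     # expected cans collected per step
r_rescue = -3.0                 # penalty for running flat and being rescued

# One-step dynamics: (state, action) -> list of (probability, next_state, reward).
dynamics = {
    ("high", "search"):  [(alpha, "high", r_search), (1 - alpha, "low", r_search)],
    ("high", "wait"):    [(1.0, "high", r_wait)],
    ("low", "search"):   [(beta, "low", r_search), (1 - beta, "high", r_rescue)],
    ("low", "wait"):     [(1.0, "low", r_wait)],
    ("low", "recharge"): [(1.0, "high", 0.0)],
}

# Sanity check: transition probabilities sum to 1 for every (state, action) pair.
assert all(abs(sum(p for p, _, _ in outs) - 1.0) < 1e-9 for outs in dynamics.values())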

111
Recycling Robot MDP
112
Reinforcement procedure
  • Bellman equations
  • Policy evaluation and improvement
  • Policy iteration (value functions)

113
Reinforcement methods
  • Dynamic programming
  • Monte Carlo methods
  • Temporal Difference (TD) learning

114
Wrapping up
  • Any questions, remarks?