Title: Intelligent Systems
 1Intelligent Systems
- Machine learning 
- Steven de Jong 
- With contributions by K. Tuyls, E. Postma, I.
 Sprinkhuizen-Kuyper and EvoNet Flying Circus
2This lecture
- Goal 
- Quickly revisiting material on machine learning 
 from courses you already had
- Giving a preview of material that will be 
 discussed in the M.Sc. program
- Focussing on design and applications 
- Interaction!
3This lecture
- Content 
- Why use machine learning? A tutorial! 
- Artificial neural networks 
- Evolutionary computation (genetic algorithms) 
- Reinforcement learning 
- Not the content 
- Lots of theory (I will provide some references)
- Two hours of talking (lots of slides, though)
41. Machine learning
- Tutorial based on your assignment
5Q: What is the most powerful problem solver
in the Universe?
- The (human) brain 
 that created the wheel, New York, wars
 and so on (after Douglas Adams)
- The evolution mechanism 
 that created the human brain (after Darwin et
 al.)
6Building problem solvers by looking at and 
mimicking
- brains → neurocomputing
- evolution → evolutionary computing
7Taxonomy 
 8Why use machine learning?
- Speed 
- Robustness 
- Flexibility 
- Adaptivity 
- Context sensitivity
9ML and the assignment
- Create a robot controller that uses planning 
 and other techniques to navigate a physical
 robot in a maze
- So, why opt for machine learning?
10Example 1: recognize crossing
- Sensors 
- 300mm 
- 1000mm 
- 980mm 
- 290mm 
- 6000mm 
- 760mm 
- 780mm 
- 5600mm
 11Example 1: recognize crossing
- Sensors 
- 300mm 
- 1000mm 
- 980mm 
- 290mm 
- 6000mm 
- 760mm 
- 780mm 
- 5600mm
- Rule-based 
- IF 250 < s1 < 350 AND ...
- Lots of work 
- Crappy performance! 
- Sensors are noisy 
- What exactly defines a crossing? 
- Is it not a T-joint?
12The machine learning perspective
- What kind of task is this? 
- What input data do we have? 
- What output data do we want? 
- Supervised or unsupervised learning? 
- → Which method is suitable?
13The machine learning perspective
- What kind of task is this? 
- Classification 
- What input data do we have? 
- Eight sonar sensor values 
- What output data do we want? 
- Probability that values represent crossing, 
 T-joint, corridor
- Supervised or unsupervised learning? 
- Probably supervised 
14The machine learning perspective
- Supervised learning: classification
- Make a large maze, mark areas as being crossings, 
 T-joints, corridors
- Place robots at random locations and in random 
 orientations
- Train your ML method until all locations 
 correctly classified
- Problem: classification depends on orientation!
15The machine learning perspective
- Orientation: the maze as seen by the robot!
16Example 2 keep your lane
- Robot follows a hallway 
- Possibly with angles! 
- Problem 
- Noise on actuators (motors, 3rd wheel, dust on 
 the floor)
- The worse the robot's position, the worse the
 classification performance
17Example 2 keep your lane
- Idea 
- Monitor distance from the walls 
- If the robot is significantly off-center, perform 
 a correction
- Problems 
- Distance monitoring 
- Some lanes are really short 
- What if the robot is already badly aligned?
18The machine learning perspective
- What kind of task is this? 
- Control 
- What input data do we have? 
- Eight sonar sensor values (and camera) 
- What output data do we want? 
- A robot that keeps itself aligned 
- Supervised or unsupervised learning? 
- Probably unsupervised 
19The machine learning perspective
- Unsupervised learning: control
- Develop a large maze
- Develop tasks: move from crossing A to crossing E
 (adjacent)
- Couple sensory information to motors with some ML 
 method
- Quality 
- Short time needed to reach destination 
- Low number of collisions with the wall
20Sensory-motor coordination
- Idea 
- Enhance information obtained by sensors by 
 actively using motors
- For example, for
- aligning the robot, or
- being more sure about classification,
- you might stop forward movement and start
 rotating the robot in a scanning fashion
212. Artificial Neural Networks
  22Artificial neural networks
- You have seen these often in the past 
- I will provide only a quick overview 
- Slides will be put online
23Recommended literature
- Russell & Norvig, Ch. 19, pp. 563-587
-  and many more 
24A peek into the neural computer
- Is it possible to develop a computer model after 
 the natural example (the human brain)?
- Brain-inspired models 
- Models that possess a limited number of 
 structural and functional properties of the
 neural computer
25Neurons, the building blocks of the brain 
 26Neural activity
[Figure: neural activity, with the labels "in" and "out"]
 27Synapses, the basis of learning and memory 
 28Hebbian learning (Donald Hebb)
$\Delta w(1,2) \propto a(1)\, a(2)$
 29(Artificial) Neural Networks
- Neurons 
- Activity 
- Non-linear transfer function (!) 
- Connections 
- Adaptive weights 
- Learning 
- Supervised 
- Unsupervised
30Artificial Neurons
- input (vectors) 
- summation (excitation) 
- output (activation)
[Figure: artificial neuron with inputs i1, i2, i3, excitation e, and activation a = f(e)]
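The three steps listed on this slide (input, summation, activation) can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the function names, the example weights, and the optional bias term are assumptions, and the logistic function stands in for the transfer function f.

```python
import math

def sigmoid(e):
    """Logistic transfer function, mapping any excitation to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-e))

def neuron_output(inputs, weights, bias=0.0):
    """Sum the weighted inputs (excitation e), then return a = f(e)."""
    # The bias term is an assumption; the slide only shows weighted inputs.
    e = sum(w * i for w, i in zip(weights, inputs)) + bias
    return sigmoid(e)

# Hypothetical inputs i1..i3 and weights, chosen so that e = 0.
a = neuron_output([1.0, 0.0, 1.0], [0.5, -0.3, -0.5])
print(a)  # 0.5, because the excitation is exactly 0
```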
 31Transfer function
- Non-linear function (sigmoid)
[Figure: sigmoid f(x) plotted against x, with annotations "a to 0" and "a to infinity"]
 32Artificial connections (Synapses)
- $w_{AB}$
- The weight of the connection from neuron A to 
 neuron B
33The Perceptron 
 34Learning in the Perceptron
- Delta learning rule (supervised) 
- The difference between the target t and actual
 output o, given input x
- Global error E 
- Is a function of the differences between the 
 target and actual output of the patterns to be
 learnt
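As a rough sketch of the delta rule described above, the update for each weight can be written as w_i ← w_i + η (t − o) x_i. The code below assumes a threshold transfer function, a bias weight, a learning rate η of 1.0, and the logical AND as training set; none of these choices come from the slides.

```python
def step(e):
    """Threshold transfer function: output 1 if the excitation is positive."""
    return 1 if e > 0 else 0

def predict(weights, bias, x):
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(patterns, lr=1.0, max_epochs=100):
    """Delta rule: change each weight by lr * (target - output) * input."""
    n = len(patterns[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, t in patterns:
            delta = t - predict(weights, bias, x)  # target minus actual output
            if delta != 0:
                errors += 1
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
        if errors == 0:  # every pattern is classified correctly
            break
    return weights, bias

# Assumed toy data set: the linearly separable AND function.
AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(AND)
print([predict(weights, bias, x) for x, _ in AND])  # [0, 0, 0, 1]
```

Because AND is linearly separable, a single perceptron is enough here; a function such as XOR would not be learnable this way.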
35Gradient descent 
 36Decision boundaries: linear!
 37The multilayer perceptron
input
hidden
output 
 38Learning in the MLP 
 39Sigmoid function (logistic)
- Alternative: tanh (range <-1,1> instead of <0,1>)
- Derivative: $f'(x) = f(x)\,(1 - f(x))$
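For reference, the derivative of the logistic function can be worked out in a few steps. The derivation below is standard calculus and is added here only as a reminder; the tanh line covers the alternative transfer function mentioned above.

```latex
\begin{align*}
f(x)  &= \frac{1}{1 + e^{-x}} \\
f'(x) &= \frac{e^{-x}}{(1 + e^{-x})^{2}}
       = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
       = f(x)\,\bigl(1 - f(x)\bigr) \\
\frac{d}{dx}\tanh(x) &= 1 - \tanh^{2}(x)
\end{align*}
```

Both derivatives are expressed in terms of the function value itself, so the backward pass can reuse the activations computed in the forward pass.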
40Updating the hidden-to-output weights 
 41Updating the input-to-hidden weights 
 42Forward & Backward Propagation
 43Implementation
- Use ADT for graphs 
- Or just use matrices and vectors 
- Vectors for input and output 
- Matrices for each transition / layer (wij) 
- Learning 
- Supervised, e.g., backpropagation
- Unsupervised, e.g., evolutionary algorithms
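The "matrices and vectors" option can be sketched with plain Python lists: one weight matrix per layer transition, one vector per layer. The example below is an illustration under assumptions, not the lecture's implementation: the network size (2-3-1), the XOR training set, the learning rate, the bias weights, and all function names are chosen for the sketch.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_output):
    """Forward pass; the last weight in each row acts on a constant bias input."""
    xb = x + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = hidden + [1.0]
    output = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_output]
    return hidden, output

def train_pattern(x, target, w_hidden, w_output, lr):
    """One backpropagation step for a single pattern (weights change in place)."""
    hidden, output = forward(x, w_hidden, w_output)
    # Output deltas: error times the sigmoid derivative o * (1 - o).
    d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
    # Hidden deltas: output deltas propagated back through the current weights.
    d_hid = [h * (1.0 - h) * sum(d * w_output[k][j] for k, d in enumerate(d_out))
             for j, h in enumerate(hidden)]
    for k, d in enumerate(d_out):        # hidden-to-output weights
        for j, v in enumerate(hidden + [1.0]):
            w_output[k][j] += lr * d * v
    for j, d in enumerate(d_hid):        # input-to-hidden weights
        for i, v in enumerate(x + [1.0]):
            w_hidden[j][i] += lr * d * v

def total_error(data, w_hidden, w_output):
    """Global error: half the summed squared difference between target and output."""
    return sum(0.5 * (t - o) ** 2
               for x, target in data
               for t, o in zip(target, forward(x, w_hidden, w_output)[1]))

random.seed(0)
n_in, n_hid, n_out = 2, 3, 1  # assumed network size
w_hidden = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]
w_output = [[random.uniform(-1, 1) for _ in range(n_hid + 1)] for _ in range(n_out)]
xor = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
       ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]

initial_error = total_error(xor, w_hidden, w_output)
for _ in range(5000):
    for x, target in xor:
        train_pattern(x, target, w_hidden, w_output, lr=0.5)
final_error = total_error(xor, w_hidden, w_output)
print(initial_error, final_error)
```

The same forward function could be reused with weights found by an evolutionary algorithm instead of backpropagation; only the training loop would change.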
44Break
  453. (a) Introduction to Evolutionary Computation
- (Ida Sprinkhuizen-Kuyper and EvoNet Flying Circus)
46Recommended literature
- Evonet site 
- http://www.dcs.napier.ac.uk/evonet
- Slides, demos
- T. M. Mitchell, Machine Learning, 1997
- http://www.cs.cmu.edu/~tom/ (slides ch. 9)
- Other literature 
- Goldberg (1989) 
- Michalewicz (1996) 
- Bäck (1996)
47History
- L. Fogel, 1962 (San Diego, CA): Evolutionary
 Programming
- J. Holland, 1962 (Ann Arbor, MI): Genetic
 Algorithms
- I. Rechenberg & H.-P. Schwefel, 1965 (Berlin,
 Germany): Evolution Strategies
- J. Koza, 1989 (Palo Alto, CA): Genetic Programming
48The Metaphor
- EVOLUTION ↔ PROBLEM SOLVING
- Individual ↔ Candidate Solution
- Fitness ↔ Quality
- Environment ↔ Problem
49The Ingredients
[Figure: the population at time t is transformed by reproduction and selection into the population at time t + 1]
 50The Evolution Mechanism
- Increasing diversity by genetic operators 
- Mutation: local search
- Recombination (crossover): global search
- Decreasing diversity by selection 
- Of parents 
- Of survivors
51The Evolutionary Cycle
[Figure: the cycle Selection → Recombination → Mutation → Replacement]
 52Main Streams
- Genetic Algorithms 
- Evolution Strategies 
- Evolutionary Programming 
- Genetic Programming
53Domains of Application
- Numerical, Combinatorial Optimisation 
- System Modeling and Identification 
- Planning and Control 
- Engineering Design 
- Data Mining 
- Machine Learning 
- Artificial Life 
- Evolving neural networks
54Performance
- Acceptable performance at acceptable costs on a 
 wide range of problems
- Intrinsic parallelism (robustness, fault 
 tolerance)
- Superior to other techniques on complex problems 
 with
-  lots of data, many free parameters 
-  complex relationships between parameters 
-  many (local) optima 
55Advantages
- No presumptions w.r.t. problem space 
- Widely applicable 
- Low development & application costs
- Easy to incorporate other methods 
- Solutions are interpretable (unlike NN) 
- Can be run interactively, accommodate user 
 proposed solutions
- Provide many alternative solutions
56Disadvantages
- No guarantee for optimal solution within finite 
 time
- Weak theoretical basis 
- May need parameter tuning 
- Often computationally expensive, i.e. slow
573. (b) How to Build an Evolutionary Algorithm
- (Ida Sprinkhuizen-Kuyper and EvoNet Flying Circus)
58Evolutionary algorithms
- Evolutionary algorithms: quick implementation
 guide
- Evolving artificial neural networks
59- GA(fitness, threshold, p, c, m)
- fitness is a function calculating the fitness of an
 individual in the gene pool
- threshold is either the fitness to reach or the
 number of generations
- p is the population size
- c is the crossover probability in [0,1]
- m is the mutation probability in [0,1]
- Initialize: P ← p random individuals
- Evaluate: for each i in P, compute fitness(i)
- While max_i fitness(i) < threshold or
 generation < threshold (whichever stopping rule is used)
- Select: probabilistically select (1-c)·p
 individuals out of P to add to Ps
- Crossover: probabilistically select (c·p)/2 pairs
 of individuals from P. For each pair <i1, i2>,
 produce two offspring by applying the crossover
 operator. Add the offspring to Ps too.
- Mutate: apply the mutation operator to m·p random
 members of Ps
- Update: P ← Ps
- Evaluate: for each h in P, compute fitness(h)
- Shift generation: generation ← generation + 1
- Return the individual from P that has the highest
 fitness
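The GA outline above can be turned into a short runnable sketch. Several details are assumptions made for the illustration and are not specified on the slide: individuals are bit lists of a given length, selection is fitness-proportionate, crossover is one-point, mutation flips one random bit, a separate max_generations cap is used next to the fitness threshold, and the toy fitness function simply counts ones.

```python
import random

def ga(fitness, threshold, p, c, m, length, max_generations=200, rng=None):
    """Sketch of GA(fitness, threshold, p, c, m) for bit-string individuals."""
    rng = rng or random.Random(0)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(p)]
    for _ in range(max_generations):
        scores = [fitness(ind) for ind in population]
        if max(scores) >= threshold:
            break
        # Small offset keeps the selection weights valid if all scores are 0.
        weights = [s + 1e-9 for s in scores]
        # Select: (1 - c) * p individuals are copied into the next population.
        n_keep = round((1 - c) * p)
        next_pop = [ind[:] for ind in rng.choices(population, weights=weights, k=n_keep)]
        # Crossover: fill the remaining places with offspring of selected pairs.
        while len(next_pop) < p:
            p1, p2 = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, length)
            next_pop.append(p1[:cut] + p2[cut:])
            if len(next_pop) < p:
                next_pop.append(p2[:cut] + p1[cut:])
        # Mutate: flip one random bit in m * p random members.
        for ind in rng.sample(next_pop, round(m * p)):
            pos = rng.randrange(length)
            ind[pos] = 1 - ind[pos]
        population = next_pop  # Update: P <- Ps
    return max(population, key=fitness)

# Toy problem (an assumption): maximise the number of ones in a 20-bit string.
best = ga(fitness=sum, threshold=20, p=30, c=0.6, m=0.1, length=20)
print(best, sum(best))
```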
60The Steps 
-  In order to build an evolutionary algorithm 
 there are a number of steps that we have to
 perform
- Design a representation 
- Decide how to initialise a population 
- Design a way of mapping a genotype to a phenotype 
- Design a way of evaluating an individual 
61Further Steps
- Design suitable mutation operator(s) 
- Design suitable recombination operator(s) 
- Decide how to manage our population 
- Decide how to select individuals to be parents 
- Decide how to select individuals to be replaced 
- Decide when to stop the algorithm 
62Designing a Representation
-  We have to come up with a method of representing 
 an individual as a genotype.
-  There are many ways to do this and the way we 
 choose must be relevant to the problem that we
 are solving.
-  When choosing a representation, we have to bear 
 in mind how the genotypes will be evaluated and
 what the genetic operators might be.
63Example: Discrete Representation
-  An individual can be represented using discrete
 values (binary, integer, or any other system with
 a discrete set of values).
-  The following is an example of a binary
 representation.
[Figure: a binary CHROMOSOME with one bit marked as a GENE]
 64Example: Discrete Representation
Phenotype
  65Example: Discrete Representation
- Phenotype could be integer numbers
Genotype: 1 0 1 0 0 0 1 1
Phenotype: 163
$1\cdot2^7 + 0\cdot2^6 + 1\cdot2^5 + 0\cdot2^4 + 0\cdot2^3 + 0\cdot2^2 + 1\cdot2^1 + 1\cdot2^0 = 128 + 32 + 2 + 1 = 163$
 66Example: Discrete Representation
- Phenotype could be Real Numbers 
- e.g. a number between 2.5 and 20.5 using 8 binary 
 digits
[Figure: an 8-bit genotype mapped to the phenotype 13.9609]
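Both discrete-representation examples can be reproduced with a small decoder. The sketch assumes the genotype is the bit string 1 0 1 0 0 0 1 1 (the coefficients of the binary expansion of 163) and that the real-valued mapping scales the integer by 2^8 = 256; with that assumption the result matches the 13.9609 shown on the slide. The function names are hypothetical.

```python
def bits_to_int(bits):
    """Read a list of bits, most significant bit first, as an integer."""
    value = 0
    for b in bits:
        value = value * 2 + b
    return value

def bits_to_real(bits, low, high):
    """Map the bit string linearly into the interval starting at low."""
    # Assumption: divide by 2**n (not 2**n - 1), which reproduces 13.9609.
    return low + bits_to_int(bits) / 2 ** len(bits) * (high - low)

genotype = [1, 0, 1, 0, 0, 0, 1, 1]
print(bits_to_int(genotype))              # 163
print(bits_to_real(genotype, 2.5, 20.5))  # 13.9609375
```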
 67Example: Real-valued representation
- A very natural encoding: if the solution we are
 looking for is a list of real-valued numbers,
 then encode it as a list of real-valued numbers!
 (i.e., not as a string of 1s and 0s)
- Lots of applications, e.g. parameter optimisation 
 (ANNs!)
68Example: Real-valued representation
- Individuals are represented as a tuple of n 
 real-valued numbers
- The fitness function maps tuples of real numbers 
 to a single real number
69Genotype to Phenotype
- Sometimes producing the phenotype from the 
 genotype is a simple and obvious process.
- Other times the genotype might be a set of 
 parameters to some algorithm, which works on the
 problem data to produce the phenotype
[Figure: the Genotype and the Problem Data are fed into a Growth Function, which produces the Phenotype]
 70Evaluating an Individual
- This is by far the most costly step for real 
 applications
- do not re-evaluate unmodified individuals 
- It might be a subroutine, a black-box simulator, 
 or any external process
- (e.g. robot experiment) 
- You could use approximate fitness - but not for 
 too long
71More on Evaluation
- Constraint handling - what if the phenotype 
 breaks some constraint of the problem
- penalize the fitness 
- specific evolutionary methods 
- Multi-objective evolutionary optimization 
 gives a set of compromise solutions
72Mutation Operators
-  We might have one or more mutation operators for 
 our representation.
-  Some important points are 
- At least one mutation operator should allow every 
 part of the search space to be reached
- The size of mutation is important and should be 
 controllable
- Mutation should produce valid chromosomes
73Example
[Figure: bit string 1 1 1 1 1 1 1 before mutation, with the mutated gene marked]
Mutation usually happens with probability $p_m$ for
each gene
 74Example: Real-valued mutation
- Perturb values by adding some random noise
- Often, a Gaussian/normal distribution $N(0, \sigma)$ is
 used, where
-  0 is the mean value
-  $\sigma$ is the standard deviation
- and
- $x'_i = x_i + N(0, \sigma_i)$
- for each parameter
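The two mutation examples (bit flips with a per-gene probability, and Gaussian noise on real-valued genes) might be coded as follows. This is a sketch; the function names and the explicit random-number-generator argument are assumptions made so the example is reproducible.

```python
import random

def bit_flip_mutation(bits, pm, rng):
    """Flip each gene independently with probability pm."""
    return [1 - b if rng.random() < pm else b for b in bits]

def gaussian_mutation(values, sigmas, rng):
    """Add zero-mean Gaussian noise with its own standard deviation to each gene."""
    return [x + rng.gauss(0.0, s) for x, s in zip(values, sigmas)]

rng = random.Random(42)
print(bit_flip_mutation([1, 1, 1, 1, 1, 1, 1], pm=1 / 7, rng=rng))
print(gaussian_mutation([0.5, 2.0, -1.0], sigmas=[0.1, 0.1, 0.5], rng=rng))
```

Keeping the mutation size (pm or the sigmas) as an explicit parameter is one simple way to make it controllable, as the slide on mutation operators asks.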
75Recombination (crossover)
-  We might have one or more recombination 
 operators for our representation.
-  Some important points are 
- The child should inherit something from each 
 parent. If this is not the case then the operator
 is a mutation operator.
- The recombination operator should be designed in 
 conjunction with the representation so that
 recombination is not always catastrophic
- Recombination should produce valid chromosomes. 
76Example: Recombination for Discrete Representation
[Figure: parents taken from the whole population are cut and recombined into offspring]
Each chromosome is cut into n pieces which are
recombined. (Example for n = 1)
 77Example: Recombination for real-valued
representation
Discrete recombination (uniform crossover): given
two parents, one child is created by taking each
gene from either parent at random.
 78Example: Recombination for real-valued
representation
Intermediate recombination (arithmetic
crossover): given two parents, one child is
created by taking, for each gene, a weighted
average of the two parental values.
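The recombination operators named on the last three slides can be sketched as below. The exact formulas from the slides are not preserved here, so these are the common textbook forms and should be read as assumptions: a single cut point for the discrete case, a fair coin per gene for uniform crossover, and a weighted average with a hypothetical weight alpha for arithmetic crossover.

```python
import random

def one_point_crossover(p1, p2, rng):
    """Cut both parents at the same random point and swap the tails."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def uniform_crossover(p1, p2, rng):
    """Discrete recombination: each gene comes from either parent at random."""
    return [rng.choice(pair) for pair in zip(p1, p2)]

def arithmetic_crossover(p1, p2, alpha=0.5):
    """Intermediate recombination: each gene is a weighted average of the parents."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]

rng = random.Random(0)
print(one_point_crossover([0, 0, 0, 0, 0], [1, 1, 1, 1, 1], rng))
print(uniform_crossover([0.1, 0.2, 0.3], [0.7, 0.8, 0.9], rng))
print(arithmetic_crossover([0.0, 2.0], [4.0, 6.0]))  # [2.0, 4.0]
```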
 79Selection Strategy
-  We want to have some way to ensure that better 
 individuals have a better chance of being
 parents than less good individuals.
-  This will give us selection pressure which will 
 drive the population forward.
-  We have to be careful to give less good 
 individuals at least some chance of being parents
 - they may include some useful genetic material.
80Example: Fitness proportionate selection
- Expected number of times individual i, with fitness
 $f_i$, is selected for mating is $f_i / \bar{f}$,
 where $\bar{f}$ is the average fitness
- Better (fitter) individuals have 
- more space 
- more chances to be selected
[Figure: pie chart with slices ranging from Best (largest) to Worst (smallest)]
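Fitness proportionate selection is often implemented as a "roulette wheel", which is one way to read the "more space, more chances" picture on this slide. The sketch below assumes non-negative fitness values and uses the common textbook expectation of f_i divided by the average fitness; the function names are hypothetical.

```python
import random

def expected_copies(fitnesses):
    """Expected number of selections per individual when drawing len(fitnesses) parents."""
    mean = sum(fitnesses) / len(fitnesses)
    return [f / mean for f in fitnesses]

def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(fitnesses))
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point rounding

print(expected_copies([1.0, 3.0]))  # [0.5, 1.5]
rng = random.Random(0)
print([roulette_select(["worst", "best"], [1.0, 3.0], rng) for _ in range(8)])
```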
 81Example: Fitness proportionate selection
- Disadvantages 
- Danger of premature convergence because 
 outstanding individuals take over the entire
 population very quickly
- Low selection pressure when fitness values are 
 near each other
- Behaves differently on transposed versions of the 
 same function
82Example: Fitness proportionate selection
- Fitness scaling: a cure for FPS
- Start with the raw fitness function f.
- Standardise to ensure:
- Lower fitness is better fitness.
- Optimal fitness equals 0.
- Adjust to ensure:
- Fitness ranges from 0 to 1.
- Normalise to ensure:
- The sum of the fitness values equals 1.
83Example: Tournament selection
- Select k random individuals, without replacement 
- Take the best 
- k is called the size of the tournament
84Example: Rank-based selection
- Individuals are sorted on their fitness value
 from best to worst. The place in this sorted list
 is called the rank.
- Instead of using the fitness value of an
 individual, the rank is used by a function to
 select individuals from this sorted list. The
 function is biased towards individuals with a
 high rank (= good fitness).
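Tournament selection and rank-based selection are both short to write down. The sketch below assumes that higher fitness is better and, for the rank-based variant, uses a simple linear bias (the best of n individuals gets weight n, the worst gets weight 1); the slides do not say which ranking function is meant, so that choice is an assumption.

```python
import random

def tournament_select(population, fitness, k, rng):
    """Draw k individuals without replacement and return the best of them."""
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)

def rank_select(population, fitness, rng):
    """Select by rank rather than raw fitness, biased towards the best ranks."""
    ranked = sorted(population, key=fitness, reverse=True)  # index 0 = best
    n = len(ranked)
    weights = [n - position for position in range(n)]  # assumed linear bias
    return rng.choices(ranked, weights=weights, k=1)[0]

rng = random.Random(1)
population = [3, 1, 4, 1, 5, 9, 2, 6]  # toy individuals; fitness is the value itself
print(tournament_select(population, fitness=lambda x: x, k=3, rng=rng))
print(rank_select(population, fitness=lambda x: x, rng=rng))
```

A larger tournament size k means stronger selection pressure; with k equal to the population size the best individual always wins.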
85Replacement Strategy
-  The selection pressure is also affected by the 
 way in which we decide which members of the
 population to kill in order to make way for our
 new individuals.
-  We can use the stochastic selection methods in 
 reverse, or there are some deterministic
 replacement strategies.
-  We can decide never to replace the best in the
 population: elitism.
86Recombination vs Mutation
- Recombination 
- modifications depend on the whole population 
- decreasing effects with convergence 
- exploitation operator 
- Mutation 
- mandatory to escape local optima 
- strong causality principle 
- exploration operator 
87Stopping criterion
- The optimum is reached!
- Limit on CPU resources:
 maximum number of fitness
 evaluations
-  Limit on the user's patience:
 after some generations without
 improvement
88Algorithm performance
-  Never draw any conclusion from a single run 
-  use statistical measures (averages, medians) 
-  from a sufficient number of independent runs 
-  From the application point of view 
-  design perspective 
- find a very good solution at least once 
-  production perspective 
- find a good solution at almost every run
89Algorithm Performance (2)
-  Remember the WYTIWYG principle
- What you test is what you get - don't tune
 algorithm performance on toy data and expect it
 to work with real data.
90Key issues
- Genetic diversity 
- differences of genetic characteristics in the 
 population
- loss of genetic diversity = all individuals in
 the population look alike
- snowball effect 
- convergence to the nearest local optimum 
- in practice, it is irreversible 
91Key issues (2)
- Exploration vs Exploitation
- Exploration = sample unknown regions
- Too much exploration = random search, no
 convergence
- Exploitation = try to improve the best-so-far
 individuals
- Too much exploitation = local search only:
 convergence to a local optimum
924. Reinforcement learning
- (I. Sprinkhuizen-Kuyper, K. Tuyls)
93Recommended literature
- Sutton, R.S. and A.G. Barto (1998), Reinforcement
 Learning: An Introduction, MIT Press.
 http://www.cs.ualberta.ca/~sutton/book/the-book.html
- Mitchell, T. (1997). Machine Learning. McGraw
 Hill.
- RL repository at MSU (http://web.cps.msu.edu/rlr)
94Reinforcement Learning
- Roots of reinforcement learning (RL) 
- Preliminaries (need to know!) 
- The setting 
- Properties 
- The Markov Property 
- Markov Decision Processes (MDP)
95Roots of Reinforcement Learning
- Origins from
- Mathematical psychology (early 1910s)
- Control theory (early 1950s)
- Mathematical psychology
- Edward Thorndike: research on animals via puzzle
 boxes
- Bush & Mosteller developed one of the first
 models of learning behavior
- Control theory
- Richard Bellman: stability theory of differential
 equations; how to design an optimal controller?
- Inventor of dynamic programming: solving optimal
 control problems by solving the Bellman equations!
96Preliminaries Setting of Reinforcement Learning
- What is it? 
- Learning from interaction 
- Learning about, from, and while interacting with 
 an external environment
- Learning what to do (how to map situations to
 actions) so as to maximize a numerical reward
 signal
97Preliminaries Setting of Reinforcement Learning
- Key features? 
- Learner is not told which actions to take 
- Trial-and-Error search 
- Possibility of delayed reward 
- Sacrifice short-term gains for greater long-term 
 gains
- The need to explore and exploit 
- Considers the whole problem of a goal-directed 
 agent interacting with an uncertain environment
98Preliminaries: properties of RL: Supervised versus
Unsupervised
- Supervised learning
Training info = desired (target) outputs
Inputs → Supervised Learning System → Outputs
Error = (target output - actual output)
- Unsupervised learning (reinforcement learning)
Training info = evaluations (rewards /
penalties)
Inputs (states) → Reinforcement Learning System → Outputs (actions)
Objective: get as much reward as possible
 99 Preliminaries properties of RL The 
Agent-Environment Interface 
 100Preliminaries properties of RL Learning how to 
behave
- Reinforcement learning methods specify how the 
 agent changes its policy as a result of
 experience.
- Roughly, the agent's goal is to get as much
 reward as it can over the long run.
101Preliminaries properties of RL Abstraction
- Getting the Degree of Abstraction Right 
- Time steps need not refer to fixed intervals of 
 real time.
- Actions can be low level (voltages to motors), or 
 high level (accept job offer), mental (shift
 focus of attention), etc.
- States can be low-level sensations, or 
 abstract, symbolic, based on memory, or
 subjective (surprised or lost).
- The environment is not necessarily unknown to the 
 agent, only incompletely controllable.
102Preliminaries Properties of RL Goals and Rewards
- Is a scalar reward signal an adequate notion of a
 goal? Maybe not, but it is surprisingly flexible.
- A goal should specify what we want to achieve,
 not how we want to achieve it.
- A goal must be outside the agent's direct
 control, thus outside the agent.
- The agent must be able to measure success 
- explicitly 
- frequently during its lifespan.
103Preliminaries: Properties of RL: What's the
objective?
Episodic tasks: interaction breaks naturally into
episodes, e.g., plays of a game, trips through a
maze.
Immediate reward
Long term reward 
 104Preliminaries Properties of RL Returns for 
Continuing Tasks
Continuing tasks: interaction does not have
natural episodes.
Discounted return 
 105An Example
Avoid failure: the pole falling beyond a critical
angle or the cart hitting the end of the track.
As an episodic task, where the episode ends upon
failure
As a continuing task with discounted return
In either case, return is maximized by avoiding 
failure for as long as possible. 
 106Another Example
Get to the top of the hill as quickly as 
possible. 
Return is maximized by minimizing the number of
steps to reach the top of the hill.
 107Preliminaries properties of RL A Unified 
Notation
- Think of each episode as ending in an absorbing 
 state that always produces reward of zero
- We can cover all cases by writing
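The formulas for the returns are not preserved in this transcript. In the notation of Sutton and Barto's book, which the recommended literature points to, the unified return is usually written as $R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$, where the discount rate $\gamma$ may equal 1 only if a zero-reward absorbing state is always reached. The sketch below computes this sum for a finite list of rewards; the function name and the example rewards are assumptions.

```python
def discounted_return(rewards, gamma=1.0):
    """Return r_1 + gamma * r_2 + gamma**2 * r_3 + ... for a finite reward list."""
    total = 0.0
    # Work backwards so that each step is: reward now + gamma * return afterwards.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Episodic task, no discounting: the return is the plain sum of rewards.
print(discounted_return([1, 1, 1]))             # 3.0
# Delayed reward with discounting: a reward two steps away counts gamma**2 times.
print(discounted_return([0, 0, 1], gamma=0.5))  # 0.25
```

Padding an episode with zero rewards (the absorbing state) leaves the result unchanged, which is why one formula can cover both episodic and continuing tasks.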
108The Markov Property
- The state at step t means whatever information
 is available to the agent at step t about its
 environment.
- The state can include immediate sensations, 
 highly processed sensations, and structures built
 up over time from sequences of sensations.
- A state should summarize past sensations so as to 
 retain all essential information, i.e., it
 should have the Markov Property
109Markov Decision Processes
- If a reinforcement learning task has the Markov 
 Property, it is basically a Markov Decision
 Process (MDP).
- If state and action sets are finite, it is a 
 finite MDP.
- To define a finite MDP, you need to give 
- state and action sets 
- one-step dynamics defined by transition 
 probabilities
- reward probabilities 
110An Example Finite MDP
- At each step, robot has to decide whether it 
 should (1) actively search for a can, (2) wait
 for someone to bring it a can, or (3) go to home
 base and recharge.
- Searching is better but runs down the battery; if
 the robot runs out of power while searching, it has
 to be rescued (which is bad).
- Decisions are made on the basis of the current
 energy level: high, low.
- Reward = number of cans collected
111Recycling Robot MDP 
 112Reinforcement procedure
- Bellman equations 
- Policy evaluation and improvement 
- Policy iteration (value functions) 
113Reinforcement methods
- Dynamic programming 
- Monte Carlo methods 
- Temporal Difference (TD) learning
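As a small illustration of the dynamic-programming route, the recycling robot from the earlier example can be written as a finite MDP and solved with value iteration, which repeatedly applies the Bellman optimality equation. The structure follows the version of this example in Sutton and Barto's book (recharging is only offered in the low state). All numbers are assumptions made for the sketch: the probabilities ALPHA and BETA, the rewards, and the discount rate GAMMA.

```python
# Assumed parameters; none of these values come from the slides.
ALPHA, BETA = 0.9, 0.4                       # chance the energy level is unchanged after searching
R_SEARCH, R_WAIT, R_RESCUE = 2.0, 1.0, -3.0  # expected cans per step, penalty for a rescue
GAMMA = 0.9                                  # discount rate

# One-step dynamics: state -> action -> list of (probability, next state, reward).
MDP = {
    "high": {
        "search": [(ALPHA, "high", R_SEARCH), (1 - ALPHA, "low", R_SEARCH)],
        "wait": [(1.0, "high", R_WAIT)],
    },
    "low": {
        "search": [(BETA, "low", R_SEARCH), (1 - BETA, "high", R_RESCUE)],
        "wait": [(1.0, "low", R_WAIT)],
        "recharge": [(1.0, "high", 0.0)],
    },
}

def q_value(mdp, values, state, action, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in mdp[state][action])

def value_iteration(mdp, gamma, tol=1e-9):
    """Apply the Bellman optimality backup until the values stop changing."""
    values = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            best = max(q_value(mdp, values, s, a, gamma) for a in mdp[s])
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Policy improvement: act greedily with respect to the final values.
    policy = {s: max(mdp[s], key=lambda a: q_value(mdp, values, s, a, gamma))
              for s in mdp}
    return values, policy

values, policy = value_iteration(MDP, GAMMA)
print(values)
print(policy)  # with these assumed numbers: search when high, recharge when low
```

Monte Carlo and TD methods aim at the same value functions but estimate them from sampled experience instead of from known transition probabilities.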
114Wrapping up