Intelligent Systems

1
Intelligent Systems
  • Machine learning
  • Steven de Jong
  • With contributions by K. Tuyls, E. Postma,
    I. Sprinkhuizen-Kuyper, and the EvoNet Flying Circus

2
This lecture
  • Goal
  • Quickly revisiting material on machine learning
    from courses you already had
  • Giving a preview of material that will be
    discussed in the M.Sc. program
  • Focussing on design and applications
  • Interaction!

3
This lecture
  • Content
  • Why use machine learning? A tutorial!
  • Artificial neural networks
  • Evolutionary computation (genetic algorithms)
  • Reinforcement learning
  • Not the content
  • Lots of theory: I will provide some references
  • Two hours of talking: lots of slides, though

4
1. Machine learning
  • Tutorial based on your assignment

5
Q: What is the most powerful problem solver
in the Universe?
  • The (human) brain
    that created the wheel, New York, wars
    and so on (after Douglas Adams)
  • The evolution mechanism
    that created the human brain (after Darwin et
    al.)

6
Building problem solvers by looking at and
mimicking
  • brains → neurocomputing
  • evolution → evolutionary computing

7
Taxonomy
8
Why use machine learning?
  • Speed
  • Robustness
  • Flexibility
  • Adaptivity
  • Context sensitivity

9
ML and the assignment
  • Create a robot controller that uses planning
    and other techniques to navigate a physical
    robot in a maze
  • So, why opt for machine learning?

10
Example 1: recognize crossing
  • Sensors
  • 300mm
  • 1000mm
  • 980mm
  • 290mm
  • 6000mm
  • 760mm
  • 780mm
  • 5600mm

11
Example 1: recognize crossing
  • Sensors
  • 300mm
  • 1000mm
  • 980mm
  • 290mm
  • 6000mm
  • 760mm
  • 780mm
  • 5600mm
  • Rule-based
  • IF 250 < s1 < 350 AND ...
  • Lots of work
  • Crappy performance!
  • Sensors are noisy
  • What exactly defines a crossing?
  • Is it not a T-joint?

12
The machine learning perspective
  • What kind of task is this?
  • What input data do we have?
  • What output data do we want?
  • Supervised or unsupervised learning?
  • → Which method is suitable?

13
The machine learning perspective
  • What kind of task is this?
  • Classification
  • What input data do we have?
  • Eight sonar sensor values
  • What output data do we want?
  • Probability that values represent crossing,
    T-joint, corridor
  • Supervised or unsupervised learning?
  • Probably supervised

14
The machine learning perspective
  • Supervised learning: classification
  • Make a large maze, mark areas as being crossings,
    T-joints, corridors
  • Place robots at random locations and in random
    orientations
  • Train your ML method until all locations
    correctly classified
  • Problem: classification depends on orientation!

15
The machine learning perspective
  • Orientation: the maze is seen relative to the robot!!!

16
Example 2: keep your lane
  • Robot follows a hallway
  • Possibly with angles!
  • Problem
  • Noise on actuators (motors, 3rd wheel, dust on
    the floor)
  • The worse the robot's position, the worse the
    performance on classification

17
Example 2: keep your lane
  • Idea
  • Monitor distance from the walls
  • If the robot is significantly off-center, perform
    a correction
  • Problems
  • Distance monitoring
  • Some lanes are really short
  • What if the robot is already badly aligned?

18
The machine learning perspective
  • What kind of task is this?
  • Control
  • What input data do we have?
  • Eight sonar sensor values (and camera)
  • What output data do we want?
  • A robot that keeps itself aligned
  • Supervised or unsupervised learning?
  • Probably unsupervised

19
The machine learning perspective
  • Unsupervised learning: control
  • Develop a large maze
  • Develop tasks: move from crossing A to crossing E
    (adjacent)
  • Couple sensory information to motors with some ML
    method
  • Quality
  • Short time needed to reach destination
  • Low number of collisions with the wall

20
Sensory-motor coordination
  • Idea
  • Enhance information obtained by sensors by
    actively using motors
  • For example, for
  • aligning the robot, or
  • being more sure about the classification,
  • you might stop forward movement and start
    rotating the robot in a scanning fashion

21
2. Artificial Neural Networks
  • (parts by E. Postma)

22
Artificial neural networks
  • You have seen these often in the past
  • I will provide only a quick overview
  • Slides will be put online

23
Recommended literature
  • Russell & Norvig, Ch. 19, pp. 563-587
  • and many more


24
A peek into the neural computer
  • Is it possible to develop a computer model after
    the natural example (the human brain)?
  • Brain-inspired models
  • Models that possess a limited number of
    structural and functional properties of the
    neural computer

25
Neurons, the building blocks of the brain
26
Neural activity
(Figure: neuron input vs. output activity.)
27
Synapses, the basis of learning and memory
28
Hebbian learning (Donald Hebb)
Δw(1,2) ∝ a(1) · a(2)
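As a minimal illustration (not part of the original slides), the Hebbian rule can be written as a one-line weight update in Python; the learning rate eta is an assumed parameter:

def hebbian_update(w, a1, a2, eta=0.01):
    # Hebbian learning: the weight change is proportional to the
    # product of the activities of the two connected neurons.
    return w + eta * a1 * a2

# Two repeatedly co-active neurons strengthen their connection.
w = 0.0
for _ in range(10):
    w = hebbian_update(w, a1=1.0, a2=0.8)
print(w)   # about 0.08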
29
(Artificial) Neural Networks
  • Neurons
  • Activity
  • Non-linear transfer function (!)
  • Connections
  • Adaptive weights
  • Learning
  • Supervised
  • Unsupervised

30
Artificial Neurons
  • input (vectors)
  • summation (excitation)
  • output (activation)

(Diagram: inputs i1, i2, i3 are summed into the
excitation e; the output activation is a = f(e).)
31
Transfer function
  • Non-linear function (sigmoid)

(Plot: sigmoid transfer function a = f(x), saturating
near 0 for large negative x and near 1 for large
positive x.)
32
Artificial connections (Synapses)
  • wAB
  • The weight of the connection from neuron A to
    neuron B

33
The Perceptron
34
Learning in the Perceptron
  • Delta learning rule (supervised)
  • The difference between the target t and the actual
    output o, given input x: Δw_i = η (t − o) x_i
  • Global error E
  • is a function of the differences between the
    targets and actual outputs over the patterns to be
    learnt: E = ½ Σ_d (t_d − o_d)²
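A minimal Python sketch of the delta rule, assuming a simple threshold output unit, a learning rate eta, and a toy AND dataset (all placeholders, not material from the slides):

import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=50):
    # Delta rule: adjust each weight by eta * (target - output) * input,
    # which drives down the global error E over the training patterns.
    w = np.zeros(X.shape[1] + 1)                       # weights plus bias
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = 1.0 if w[0] + w[1:] @ x > 0 else 0.0   # thresholded output
            w[1:] += eta * (target - o) * x            # delta rule
            w[0]  += eta * (target - o)                # bias update
    return w

# Toy example: learn the logical AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)
print(train_perceptron(X, t))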

35
Gradient descent
36
Decision boundaries: linear!
37
The multilayer perceptron
(Diagram: input layer → hidden layer → output layer.)
38
Learning in the MLP
39
Sigmoid function (logistic)
  • Alternative: tanh (range (−1, 1) instead of (0, 1))
  • Derivative: f'(x) = f(x) (1 − f(x))
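A small sketch of the logistic transfer function and the derivative quoted above (plain NumPy, added for illustration):

import numpy as np

def sigmoid(x):
    # Logistic transfer function, output in (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # Derivative expressed through the function value: f'(x) = f(x) (1 - f(x)).
    fx = sigmoid(x)
    return fx * (1.0 - fx)

print(sigmoid(0.0), sigmoid_prime(0.0))   # 0.5 0.25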

40
Updating the hidden-to-output weights
41
Updating the input-to-hidden weights
42
Forward & Backward Propagation
43
Implementation
  • Use ADT for graphs
  • Or just use matrices and vectors
  • Vectors for input and output
  • Matrices for each transition / layer (wij)
  • Learning
  • Supervised: e.g., Backpropagation (see the sketch
    below)
  • Unsupervised: e.g., Evolutionary Algorithms
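A minimal sketch of the matrices-and-vectors implementation suggested above: one hidden layer trained with backpropagation. The layer sizes, learning rate, and XOR toy data are assumptions for illustration, not values from the lecture:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One weight matrix per layer transition (input->hidden, hidden->output).
W1 = rng.normal(scale=0.5, size=(2, 3))   # 2 inputs, 3 hidden units
W2 = rng.normal(scale=0.5, size=(3, 1))   # 3 hidden units, 1 output

def forward(x):
    h = sigmoid(x @ W1)        # hidden activations
    y = sigmoid(h @ W2)        # output activation
    return h, y

def backprop_step(x, target, eta=0.5):
    # One forward pass plus one backward pass updating both weight matrices.
    global W1, W2
    h, y = forward(x)
    delta_out = (y - target) * y * (1 - y)          # output error term
    delta_hid = (delta_out @ W2.T) * h * (1 - h)    # hidden error term
    W2 -= eta * np.outer(h, delta_out)              # hidden-to-output update
    W1 -= eta * np.outer(x, delta_hid)              # input-to-hidden update

# Toy example: learn XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
for _ in range(5000):
    for x, t in zip(X, T):
        backprop_step(x, t)
print(np.round(forward(X)[1], 2))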

44
Break
  • See you in 10 minutes

45
3. (a) Introduction to Evolutionary Computation
  • (Ida Sprinkhuizen-Kuyper and EvoNet Flying Circus)

46
Recommended literature
  • EvoNet site
  • http://www.dcs.napier.ac.uk/evonet
  • Slides, demos
  • T. M. Mitchell, Machine Learning, 1997
  • http://www.cs.cmu.edu/~tom/ (slides ch. 9)
  • Other literature
  • Goldberg (1989)
  • Michalewicz (1996)
  • Bäck (1996)

47
History
  • L. Fogel, 1962 (San Diego, CA): Evolutionary
    Programming
  • J. Holland, 1962 (Ann Arbor, MI): Genetic
    Algorithms
  • I. Rechenberg & H.-P. Schwefel, 1965 (Berlin,
    Germany): Evolution Strategies
  • J. Koza, 1989 (Palo Alto, CA): Genetic Programming

48
The Metaphor
  • EVOLUTION ↔ PROBLEM SOLVING
  • Individual ↔ Candidate Solution
  • Fitness ↔ Quality
  • Environment ↔ Problem

49
The Ingredients
(Diagram: the population at generation t produces, via
selection and reproduction, the population at
generation t+1.)
50
The Evolution Mechanism
  • Increasing diversity by genetic operators
  • Mutation: local search
  • Recombination (crossover): global search
  • Decreasing diversity by selection
  • Of parents
  • Of survivors

51
The Evolutionary Cycle
Selection
Recombination
Mutation
Replacement
52
Main Streams
  • Genetic Algorithms
  • Evolution Strategies
  • Evolutionary Programming
  • Genetic Programming

53
Domains of Application
  • Numerical, Combinatorial Optimisation
  • System Modeling and Identification
  • Planning and Control
  • Engineering Design
  • Data Mining
  • Machine Learning
  • Artificial Life
  • Evolving neural networks

54
Performance
  • Acceptable performance at acceptable costs on a
    wide range of problems
  • Intrinsic parallelism (robustness, fault
    tolerance)
  • Superior to other techniques on complex problems
    with
  • lots of data, many free parameters
  • complex relationships between parameters
  • many (local) optima

55
Advantages
  • No presumptions w.r.t. problem space
  • Widely applicable
  • Low development & application costs
  • Easy to incorporate other methods
  • Solutions are interpretable (unlike NN)
  • Can be run interactively, accommodate
    user-proposed solutions
  • Provide many alternative solutions

56
Disadvantages
  • No guarantee for optimal solution within finite
    time
  • Weak theoretical basis
  • May need parameter tuning
  • Often computationally expensive, i.e. slow

57
3. (b) How to Build an Evolutionary Algorithm
  • (Ida Sprinkhuizen-Kuyper and EvoNet Flying Circus)

58
Evolutionary algorithms
  • Evolutionary algorithms: quick implementation
    guide
  • Evolving artificial neural networks

59
  • GA(fitness, threshold, p, c, m)
  • fitness: a function computing the fitness of an
    individual in the gene pool
  • threshold: either the fitness to reach or the
    number of generations
  • p: the population size
  • c: the crossover probability, in [0, 1]
  • m: the mutation probability, in [0, 1]
  • Initialize: P ← p random individuals
  • Evaluate: for each i in P, compute fitness(i)
  • While max_i fitness(i) < threshold and
    generation < threshold:
  • Select: probabilistically select (1 − c)·p
    individuals from P and add them to Ps
  • Crossover: probabilistically select (c/2)·p pairs
    of individuals ⟨i1, i2⟩ from P. For each pair,
    produce two offspring by applying the crossover
    operator. Add the offspring to Ps too.
  • Mutate: apply the mutation operator to m·p random
    members of Ps
  • Update: P ← Ps
  • Evaluate: for each h in P, compute fitness(h)
  • Shift generation: generation ← generation + 1
  • Return the individual from P that has the highest
    fitness
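A compact Python sketch of the GA(fitness, threshold, p, c, m) loop above. The concrete operators are assumptions, since the pseudocode leaves them open: fitness-proportionate selection, one-point crossover, and single-gene bit-flip mutation; the example fitness simply counts ones:

import random

def ga(fitness, threshold, p=20, c=0.6, m=0.05, length=20, max_gen=100):
    # Initialize: P <- p random bit-string individuals.
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(p)]
    for _ in range(max_gen):
        scores = [fitness(ind) for ind in pop]
        if max(scores) >= threshold:
            break
        # Select: (1 - c) * p individuals, fitness-proportionately.
        new_pop = [ind[:] for ind in
                   random.choices(pop, weights=scores, k=int((1 - c) * p))]
        # Crossover: (c / 2) * p pairs, each producing two offspring.
        for _ in range(int(c / 2 * p)):
            p1, p2 = random.choices(pop, weights=scores, k=2)
            cut = random.randrange(1, length)
            new_pop += [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
        # Mutate: flip one random gene in m * p random members.
        for ind in random.sample(new_pop, int(m * p)):
            g = random.randrange(length)
            ind[g] = 1 - ind[g]
        pop = new_pop
    # Return the fittest individual of the final population.
    return max(pop, key=fitness)

# Example: maximise the number of ones in a 20-bit string.
print(ga(fitness=sum, threshold=20))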

60
The Steps
  • In order to build an evolutionary algorithm
    there are a number of steps that we have to
    perform
  • Design a representation
  • Decide how to initialise a population
  • Design a way of mapping a genotype to a phenotype
  • Design a way of evaluating an individual

61
Further Steps
  • Design suitable mutation operator(s)
  • Design suitable recombination operator(s)
  • Decide how to manage our population
  • Decide how to select individuals to be parents
  • Decide how to select individuals to be replaced
  • Decide when to stop the algorithm

62
Designing a Representation
  • We have to come up with a method of representing
    an individual as a genotype.
  • There are many ways to do this and the way we
    choose must be relevant to the problem that we
    are solving.
  • When choosing a representation, we have to bear
    in mind how the genotypes will be evaluated and
    what the genetic operators might be.

63
Example: Discrete Representation
  • Representation of an individual can be using
    discrete values (binary, integer, or any other
    system with a discrete set of values).
  • Following is an example of binary representation.

(Figure: a chromosome is a string of values, e.g. a bit
string; each position in the string is a gene.)
64
Example: Discrete Representation
  • 8-bit genotype

Phenotype
  • Integer
  • Real Number
  • Schedule

65
Example: Discrete Representation
  • Phenotype could be integer numbers

Genotype: 1 0 1 0 0 0 1 1
Phenotype: 1·2^7 + 0·2^6 + 1·2^5 + 0·2^4 + 0·2^3 + 0·2^2 + 1·2^1 + 1·2^0
         = 128 + 32 + 2 + 1 = 163
66
Example: Discrete Representation
  • Phenotype could be real numbers
  • e.g. a number between 2.5 and 20.5 using 8 binary
    digits

Genotype: 1 0 1 0 0 0 1 1 (= 163, as on the previous slide)
Phenotype: 2.5 + (163 / 256) · (20.5 − 2.5) = 13.9609
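A small sketch of the two decodings above: a bit string read as an unsigned integer, then rescaled onto the interval [2.5, 20.5] (the example's interval; any bounds would do):

def bits_to_int(bits):
    # Interpret the bit string as an unsigned integer (most significant bit first).
    return sum(b << i for i, b in enumerate(reversed(bits)))

def bits_to_real(bits, low=2.5, high=20.5):
    # Map an n-bit genotype onto the interval [low, high).
    return low + bits_to_int(bits) / 2 ** len(bits) * (high - low)

genotype = [1, 0, 1, 0, 0, 0, 1, 1]
print(bits_to_int(genotype))    # 163
print(bits_to_real(genotype))   # 13.9609375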
67
Example: Real-valued representation
  • A very natural encoding: if the solution we are
    looking for is a list of real-valued numbers, then
    encode it as a list of real-valued numbers!
    (i.e., not as a string of 1s and 0s)
  • Lots of applications, e.g. parameter optimisation
    (ANNs!)

68
Example: Real-valued representation
  • Individuals are represented as a tuple of n
    real-valued numbers
  • The fitness function maps tuples of real numbers
    to a single real number

69
Genotype to Phenotype
  • Sometimes producing the phenotype from the
    genotype is a simple and obvious process.
  • Other times the genotype might be a set of
    parameters to some algorithm, which works on the
    problem data to produce the phenotype

(Diagram: genotype + problem data → growth function →
phenotype.)
70
Evaluating an Individual
  • This is by far the most costly step for real
    applications
  • do not re-evaluate unmodified individuals
  • It might be a subroutine, a black-box simulator,
    or any external process
  • (e.g. robot experiment)
  • You could use approximate fitness - but not for
    too long

71
More on Evaluation
  • Constraint handling - what if the phenotype
    breaks some constraint of the problem
  • penalize the fitness
  • specific evolutionary methods
  • Multi-objective evolutionary optimization
    gives a set of compromise solutions

72
Mutation Operators
  • We might have one or more mutation operators for
    our representation.
  • Some important points are
  • At least one mutation operator should allow every
    part of the search space to be reached
  • The size of mutation is important and should be
    controllable
  • Mutation should produce valid chromosomes

73
Example
(Figure: a bit string, e.g. 1 1 1 1 1 1 1, before
mutation; after mutation one gene has been flipped.)
Mutation usually happens with probability p_m for
each gene
74
Example: Real-valued mutation
  • Perturb values by adding some random noise
  • Often, a Gaussian/normal distribution N(0, σ) is
    used, where
  • 0 is the mean value
  • σ is the standard deviation
  • and
  • x'_i = x_i + N(0, σ_i)
  • for each parameter
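A short sketch of the Gaussian perturbation above; the per-parameter standard deviations are assumed to be given:

import random

def gaussian_mutation(x, sigmas):
    # Perturb each parameter x_i by noise drawn from N(0, sigma_i).
    return [xi + random.gauss(0.0, si) for xi, si in zip(x, sigmas)]

print(gaussian_mutation([1.0, -2.0, 0.5], sigmas=[0.1, 0.1, 0.05]))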

75
Recombination (crossover)
  • We might have one or more recombination
    operators for our representation.
  • Some important points are
  • The child should inherit something from each
    parent. If this is not the case then the operator
    is a mutation operator.
  • The recombination operator should be designed in
    conjunction with the representation so that
    recombination is not always catastrophic
  • Recombination should produce valid chromosomes.

76
Example: Recombination for Discrete Representation
(Figure: whole population; each chromosome is cut into
n pieces which are recombined to produce offspring.
Example for n = 1.)
77
Example: Recombination for real-valued
representation
Discrete recombination (uniform crossover): given
two parents, one child is created by copying each
parameter from one of the two parents, chosen at
random.
78
Example: Recombination for real-valued
representation
Intermediate recombination (arithmetic crossover):
given two parents, one child is created as
z_i = α · x_i + (1 − α) · y_i, with α in [0, 1].
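A sketch of both recombination operators for real-valued representations described above; drawing alpha uniformly at random is one common choice, assumed here:

import random

def discrete_recombination(x, y):
    # Uniform crossover: each child parameter is copied from one of the parents.
    return [xi if random.random() < 0.5 else yi for xi, yi in zip(x, y)]

def intermediate_recombination(x, y, alpha=None):
    # Arithmetic crossover: z_i = alpha * x_i + (1 - alpha) * y_i.
    if alpha is None:
        alpha = random.random()
    return [alpha * xi + (1 - alpha) * yi for xi, yi in zip(x, y)]

p1, p2 = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(discrete_recombination(p1, p2))
print(intermediate_recombination(p1, p2, alpha=0.5))   # [2.5, 3.5, 4.5]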
79
Selection Strategy
  • We want to have some way to ensure that better
    individuals have a better chance of being
    parents than less good individuals.
  • This will give us selection pressure which will
    drive the population forward.
  • We have to be careful to give less good
    individuals at least some chance of being parents
    - they may include some useful genetic material.

80
Example: Fitness proportionate selection
  • The expected number of times individual i is
    selected for mating is its fitness divided by the
    average fitness of the population
  • Better (fitter) individuals have
  • more space (a larger slice of the roulette wheel)
  • more chances to be selected

(Roulette-wheel figure: slice sizes range from best to
worst.)
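A minimal roulette-wheel sketch of fitness-proportionate selection, assuming non-negative fitness values:

import random

def roulette_select(population, fitnesses, k):
    # Each individual is picked with probability f_i / sum_j f_j.
    return random.choices(population, weights=fitnesses, k=k)

pop = ["A", "B", "C", "D"]
fit = [4.0, 2.0, 1.0, 1.0]
print(roulette_select(pop, fit, k=2))   # "A" is twice as likely as "B"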
81
Example: Fitness proportionate selection
  • Disadvantages
  • Danger of premature convergence because
    outstanding individuals take over the entire
    population very quickly
  • Low selection pressure when fitness values are
    near each other
  • Behaves differently on transposed versions of the
    same function

82
Example: Fitness proportionate selection
  • Fitness scaling: a cure for FPS
  • Start with the raw fitness function f.
  • Standardise to ensure
  • Lower fitness is better fitness.
  • Optimal fitness equals 0.
  • Adjust to ensure
  • Fitness ranges from 0 to 1.
  • Normalise to ensure
  • The sum of the fitness values equals 1.

83
Example: Tournament selection
  • Select k random individuals, without replacement
  • Take the best
  • k is called the size of the tournament
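A short sketch of tournament selection as described above (k individuals drawn without replacement, the best one wins); the bit-string population and sum-of-ones fitness are placeholder assumptions:

import random

def tournament_select(population, fitness, k=3):
    # Pick k random individuals without replacement and return the best one.
    contestants = random.sample(population, k)
    return max(contestants, key=fitness)

pop = [[random.randint(0, 1) for _ in range(10)] for _ in range(20)]
winner = tournament_select(pop, fitness=sum, k=3)
print(winner, sum(winner))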

84
Example: Rank-based selection
  • Individuals are sorted on their fitness value
    from best to worst. The place in this sorted list
    is called the rank.
  • Instead of using the fitness value of an
    individual, the rank is used by a function to
    select individuals from this sorted list. The
    function is biased towards individuals with a
    high rank (= good fitness).

85
Replacement Strategy
  • The selection pressure is also affected by the
    way in which we decide which members of the
    population to kill in order to make way for our
    new individuals.
  • We can use the stochastic selection methods in
    reverse, or there are some deterministic
    replacement strategies.
  • We can decide never to replace the best in the
    population: elitism.

86
Recombination vs Mutation
  • Recombination
  • modifications depend on the whole population
  • decreasing effects with convergence
  • exploitation operator
  • Mutation
  • mandatory to escape local optima
  • strong causality principle
  • exploration operator

87
Stopping criterion
  • The optimum is reached!
  • Limit on CPU resources:
    maximum number of fitness evaluations
  • Limit on the user's patience:
    after some generations without improvement

88
Algorithm performance
  • Never draw any conclusion from a single run
  • use statistical measures (averages, medians)
  • from a sufficient number of independent runs
  • From the application point of view
  • design perspective
  • find a very good solution at least once
  • production perspective
  • find a good solution at almost every run

89
Algorithm Performance (2)
  • Remember the WYTIWYG principle:
  • What you test is what you get - don't tune
    algorithm performance on toy data and expect it
    to work with real data.

90
Key issues
  • Genetic diversity
  • differences of genetic characteristics in the
    population
  • loss of genetic diversity: all individuals in
    the population look alike
  • snowball effect
  • convergence to the nearest local optimum
  • in practice, it is irreversible

91
Key issues (2)
  • Exploration vs Exploitation
  • Exploration: sample unknown regions
  • Too much exploration: random search, no
    convergence
  • Exploitation: try to improve the best-so-far
    individuals
  • Too much exploitation: local search only,
    convergence to a local optimum

92
4. Reinforcement learning
  • (I. Sprinkhuizen-Kuyper, K. Tuyls)

93
Recommended literature
  • Sutton, R.S. and A.G. Barto (1998), Reinforcement
    Learning: An Introduction, MIT Press.
    http://www.cs.ualberta.ca/~sutton/book/the-book.html
  • Mitchell, T. (1997). Machine Learning. McGraw Hill.
  • RL repository at MSU (http://web.cps.msu.edu/rlr)

94
Reinforcement Learning
  • Roots of reinforcement learning (RL)
  • Preliminaries (need to know!)
  • The setting
  • Properties
  • The Markov Property
  • Markov Decision Processes (MDP)

95
Roots of Reinforcement Learning
  • Origins from
  • Mathematical psychology (early 1910s)
  • Control theory (early 1950s)
  • Mathematical psychology
  • Edward Thorndike: research on animals via puzzle
    boxes
  • Bush & Mosteller: developed one of the first
    models of learning behavior
  • Control theory
  • Richard Bellman: stability theory of differential
    equations; how to design an optimal controller?
  • Inventor of Dynamic Programming: solving optimal
    control problems by solving the Bellman equations!

96
Preliminaries Setting of Reinforcement Learning
  • What is it?
  • Learning from interaction
  • Learning about, from, and while interacting with
    an external environment
  • Learning what to do (how to map situations to
    actions) so as to maximize a numerical reward
    signal

97
Preliminaries Setting of Reinforcement Learning
  • Key features?
  • Learner is not told which actions to take
  • Trial-and-Error search
  • Possibility of delayed reward
  • Sacrifice short-term gains for greater long-term
    gains
  • The need to explore and exploit
  • Considers the whole problem of a goal-directed
    agent interacting with an uncertain environment

98
Preliminaries: properties of RL. Supervised versus
unsupervised learning
  • Supervised learning vs. unsupervised learning

(Diagram comparing the two settings:)
Supervised learning system: training info = desired
(target) outputs; maps inputs to outputs;
error = (target output − actual output).
Reinforcement learning system: training info =
evaluations (rewards / penalties); maps states
(inputs) to actions (outputs); objective = get as
much reward as possible.
99
Preliminaries: properties of RL. The
Agent-Environment Interface
100
Preliminaries: properties of RL. Learning how to
behave
  • Reinforcement learning methods specify how the
    agent changes its policy as a result of
    experience.
  • Roughly, the agent's goal is to get as much
    reward as it can over the long run.

101
Preliminaries: properties of RL. Abstraction
  • Getting the Degree of Abstraction Right
  • Time steps need not refer to fixed intervals of
    real time.
  • Actions can be low level (voltages to motors), or
    high level ("accept job offer"), or mental (shift
    focus of attention), etc.
  • States can be low-level sensations, or
    abstract, symbolic, based on memory, or
    subjective ("surprised" or "lost").
  • The environment is not necessarily unknown to the
    agent, only incompletely controllable.

102
Preliminaries: properties of RL. Goals and Rewards
  • Is a scalar reward signal an adequate notion of a
    goal? Maybe not, but it is surprisingly flexible.
  • A goal should specify what we want to achieve,
    not how we want to achieve it.
  • A goal must be outside the agent's direct
    control, thus outside the agent.
  • The agent must be able to measure success
  • explicitly
  • frequently during its lifespan.

103
Preliminaries: properties of RL. What's the
objective?
Episodic tasks: interaction breaks naturally into
episodes, e.g., plays of a game, trips through a
maze.
The return adds up the immediate and long-term
rewards of an episode:
R_t = r_{t+1} + r_{t+2} + ... + r_T,
where T is the final time step of the episode.
104
Preliminaries: properties of RL. Returns for
continuing tasks
Continuing tasks: interaction does not have
natural episodes.
Discounted return:
R_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ...
    = Σ_{k=0..∞} γ^k r_{t+k+1},
where γ (0 ≤ γ ≤ 1) is the discount rate.
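A tiny sketch computing the discounted return from a finite reward sequence; gamma = 0.9 is an arbitrary example value:

def discounted_return(rewards, gamma=0.9):
    # R_t = r_{t+1} + gamma * r_{t+2} + gamma^2 * r_{t+3} + ...
    return sum(gamma ** k * r for k, r in enumerate(rewards))

print(discounted_return([1.0, 1.0, 1.0, 1.0]))   # 1 + 0.9 + 0.81 + 0.729 = 3.439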
105
An Example
Avoid failure: the pole falling beyond a critical
angle, or the cart hitting the end of the track.
As an episodic task, where the episode ends upon
failure: reward = +1 for each step before failure,
so return = number of steps before failure.
As a continuing task with discounted return:
reward = −1 upon failure, 0 otherwise, so return =
−γ^K for K steps before failure.
In either case, the return is maximized by avoiding
failure for as long as possible.
106
Another Example
Get to the top of the hill as quickly as
possible.
Return is maximized by minimizing the number of
steps to reach the top of the hill.
107
Preliminaries: properties of RL. A Unified
Notation
  • Think of each episode as ending in an absorbing
    state that always produces a reward of zero
  • We can cover all cases by writing
    R_t = Σ_{k=0..∞} γ^k r_{t+k+1},
    where γ can be 1 only if a zero-reward absorbing
    state is always reached.

108
The Markov Property
  • The state at step t means whatever information
    is available to the agent at step t about its
    environment.
  • The state can include immediate sensations,
    highly processed sensations, and structures built
    up over time from sequences of sensations.
  • A state should summarize past sensations so as to
    retain all essential information, i.e., it
    should have the Markov Property
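In symbols (the standard textbook formulation, e.g. Sutton & Barto), the Markov property says that the one-step dynamics conditioned on the current state and action equal the dynamics conditioned on the entire history:

\Pr\{ s_{t+1} = s',\, r_{t+1} = r \mid s_t, a_t \}
  = \Pr\{ s_{t+1} = s',\, r_{t+1} = r \mid s_t, a_t, r_t, s_{t-1}, a_{t-1}, \ldots, r_1, s_0, a_0 \}
for all s', r, and all possible histories.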

109
Markov Decision Processes
  • If a reinforcement learning task has the Markov
    Property, it is basically a Markov Decision
    Process (MDP).
  • If state and action sets are finite, it is a
    finite MDP.
  • To define a finite MDP, you need to give
  • state and action sets
  • one-step dynamics defined by transition
    probabilities
  • reward probabilities

110
An Example Finite MDP
  • At each step, the robot has to decide whether it
    should (1) actively search for a can, (2) wait
    for someone to bring it a can, or (3) go to home
    base and recharge.
  • Searching is better, but runs down the battery;
    if the robot runs out of power while searching,
    it has to be rescued (which is bad).
  • Decisions are made on the basis of the current
    energy level: high, low.
  • Reward = number of cans collected
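A hedged sketch of how this finite MDP could be written down as data: states, actions, and one-step dynamics as (probability, next state, reward) triples. The probabilities alpha and beta and the reward values are symbolic placeholders, not numbers from the lecture:

# States: battery level; actions available in each state.
states = ["high", "low"]
actions = {"high": ["search", "wait"], "low": ["search", "wait", "recharge"]}

# Placeholder parameters (assumptions for illustration only).
alpha, beta = 0.8, 0.6          # chance the battery level is unchanged while searching
r_search, r_wait = 2.0, 1.0     # expected cans collected per step
r_rescue = -3.0                 # penalty for running flat and being rescued

# One-step dynamics: (state, action) -> list of (probability, next_state, reward).
dynamics = {
    ("high", "search"):  [(alpha, "high", r_search), (1 - alpha, "low", r_search)],
    ("high", "wait"):    [(1.0, "high", r_wait)],
    ("low", "search"):   [(beta, "low", r_search), (1 - beta, "high", r_rescue)],
    ("low", "wait"):     [(1.0, "low", r_wait)],
    ("low", "recharge"): [(1.0, "high", 0.0)],
}

# Sanity check: transition probabilities sum to 1 for every (state, action) pair.
assert all(abs(sum(p for p, _, _ in outs) - 1.0) < 1e-9 for outs in dynamics.values())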

111
Recycling Robot MDP
112
Reinforcement procedure
  • Bellman equations
  • Policy evaluation and improvement
  • Policy iteration (value functions)

113
Reinforcement methods
  • Dynamic programming
  • Monte Carlo methods
  • Temporal Difference (TD) learning

114
Wrapping up
  • Any questions, remarks?