Title: Lecture VI: Adaptive Systems
1 Lecture VI: Adaptive Systems
- Zhixin Liu
- Complex Systems Research Center,
- Academy of Mathematics and Systems Sciences, CAS
2 In the last lecture, we talked about
- Game Theory: an embodiment of the complex interactions among individuals
- Nash equilibrium
- Evolutionarily stable strategy
3In this lecture, we will talk about
4 Adaptation
- To adapt: to change oneself to conform to a new or changed circumstance.
- What do we know from the new circumstance?
  - Adaptive estimation, learning, identification
- How do we make the corresponding response?
  - Control / decision making
5 Why Adaptation?
- Uncertainties always exist in the modeling of practical systems.
- Adaptation can reduce the uncertainties by using the system information.
- Adaptation is an important embodiment of human intelligence.
6Framework of Adaptive Systems
[Figure: an adaptive system in a feedback loop with its environment]
7 Two Levels of Adaptation
- Individual level: learn and adapt
- Population level:
  - death of old individuals
  - creation of new individuals
  - hierarchy
8 Some Examples
- Adaptive control systems: adaptation in a single agent
- Iterated Prisoner's Dilemma: adaptation among agents
10 Adaptation in a Single Agent
[Figure: a dynamical system with control input ut, output yt, and disturbance wt, in feedback with its environment]
11 Information
- Information: prior and posterior
  - I0: prior knowledge about the system
  - I1: posterior knowledge about the system, i.e., the observations u0, u1, ..., ut, y0, y1, ..., yt
- The posterior information can be used to reduce the uncertainties of the system.
12 Uncertainty
- External uncertainty: noise / disturbance
- Internal uncertainty:
  - parameter uncertainty
  - signal uncertainty
  - functional uncertainty
13 Adaptation
- To adapt: to change oneself to conform to a new or changed circumstance.
- What do we know from the new circumstance?
  - Adaptive estimation, learning, identification
- How do we make the corresponding response?
  - Control / decision making
15 Adaptive Estimation
- Adaptive estimation: a parameter or structure estimator that can be updated based on the online observations.
[Figure: block diagram — the system input ut and output yt feed an adaptive estimator, which is driven by the prediction error e between yt and its prediction]
- Example: in the parametric case, the parameter estimator can be obtained by minimizing a certain prediction error.
16 Adaptive Estimation
- Parameter estimation: consider the following linear regression model
  y_{t+1} = θᵀφ_t + w_{t+1},
  where θ is the unknown parameter vector, φ_t is the regression vector, and {w_t} is the noise sequence.
- Remark:
  - A "linear" regression model may describe a nonlinear system (the regression vector may contain nonlinear functions of the data).
  - A linear system can be translated into a linear regression model.
17 Least Squares (LS) Algorithm
- 1795, Gauss: the least squares algorithm
  - The number of equations is greater than the number of unknown parameters.
  - The data contain noise.
- Minimize the following prediction error:
  θ̂_t = argmin_θ Σ_{k=0}^{t-1} (y_{k+1} − θᵀφ_k)²
18 Recursive Form of LS
θ̂_{t+1} = θ̂_t + P_t φ_t (y_{t+1} − φ_tᵀ θ̂_t) / (1 + φ_tᵀ P_t φ_t)
P_{t+1} = P_t − P_t φ_t φ_tᵀ P_t / (1 + φ_tᵀ P_t φ_t)
where P_t is the estimation covariance matrix.
A basic problem: does θ̂_t converge to the true parameter θ?
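As a minimal sketch of the standard recursive LS update described on this slide (the true parameter, noise level, and all other numbers here are assumed for the demo, not from the lecture):

```python
import numpy as np

# Recursive least squares (RLS) for y_{t+1} = theta^T phi_t + w_{t+1}.
def rls_step(theta, P, phi, y_next):
    """One RLS update of the estimate theta and 'covariance' matrix P."""
    denom = 1.0 + phi @ P @ phi
    K = P @ phi / denom                            # gain vector
    theta = theta + K * (y_next - phi @ theta)     # prediction-error correction
    P = P - np.outer(P @ phi, phi @ P) / denom     # covariance update
    return theta, P

rng = np.random.default_rng(0)
theta_true = np.array([0.8, -0.5])                 # assumed true parameter
theta, P = np.zeros(2), 100.0 * np.eye(2)
for t in range(2000):
    phi = rng.standard_normal(2)                   # regression vector
    y_next = theta_true @ phi + 0.1 * rng.standard_normal()
    theta, P = rls_step(theta, P, phi, y_next)
print(theta)  # close to theta_true = [0.8, -0.5]
```

Each step costs only O(d²) operations, which is what makes the recursive form usable online.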
19 Recursive Form of LS
Theorem (T.L. Lai & C.Z. Wei)
Under the above assumption, if the following condition holds:
  λ_min(t) → ∞ and log λ_max(t) / λ_min(t) → 0 a.s.,
where λ_max(t) and λ_min(t) are the largest and smallest eigenvalues of Σ_{k=0}^{t} φ_k φ_kᵀ, then the LS estimate has strong consistency.
20 Weighted Least Squares
- Minimize the following weighted prediction error:
  θ̂_t = argmin_θ Σ_{k=0}^{t-1} a_k (y_{k+1} − θᵀφ_k)²,
  where {a_k} is a sequence of weights.
21 Self-Convergence of WLS
- Take the weights as
  a_k = 1 / (r_k (log r_k)^{1+δ}), with r_k = 1 + Σ_{i=0}^{k} ‖φ_i‖²,
  with δ > 0.
Theorem: Under Assumption 1, for any initial value and any regression vector, the WLS estimate will converge to some vector almost surely.
(Lei Guo, 1996, IEEE TAC)
22 Adaptation
- To adapt: to change oneself to conform to a new or changed circumstance.
- What do we know from the new circumstance?
  - Adaptive estimation, learning, identification
- How do we make the corresponding response?
  - Control / decision making
24 Adaptive Control
Adaptive control: a controller with adjustable parameters (or structures), together with a mechanism for adjusting them.
[Figure: block diagram — the reference r enters the adaptive controller, whose output u drives the plant; the plant output y is fed back to an adaptive estimator that tunes the controller]
25 Robust Control
[Figure: a ball of radius r around the nominal model, representing the set of admissible models]
Robust control cannot reduce uncertainty!
26 Adaptive Control
- An example: consider the following linear regression model
  y_{t+1} = a y_t + b u_t + w_{t+1},
  where a and b are unknown parameters, and y_t, u_t, and w_t are the output, input, and white-noise sequences.
- Objective: design a control law to minimize the average tracking error
  limsup_{T→∞} (1/T) Σ_{t=1}^{T} (y_t − r_t)²,
  where {r_t} is the reference signal.
27 Adaptive Control
- If (a, b) is known, we can get the optimal controller
  u_t = (r_{t+1} − a y_t) / b.
- Certainty Equivalence Principle: replace the unknown parameters in a non-adaptive controller by their online estimates.
- If (a, b) is unknown, the adaptive controller can be taken as follows.
28 Adaptive Control
- If (a, b) is unknown, the adaptive controller can be taken as
  u_t = (r_{t+1} − a_t y_t) / b_t,
  where (a_t, b_t) can be obtained by LS.
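A minimal simulation sketch of this certainty-equivalence idea for the first-order model y_{t+1} = a y_t + b u_t + w_{t+1} (the true parameters, reference signal, and safeguard against dividing by a tiny b_t are all illustrative assumptions, not from the lecture):

```python
import numpy as np

# Self-tuning tracker: LS estimates (a_t, b_t) replace (a, b) in the
# known-parameter control u_t = (r_{t+1} - a*y_t)/b.
rng = np.random.default_rng(1)
a_true, b_true = 0.7, 1.0                # assumed true parameters
theta = np.array([0.0, 0.5])             # initial guess for (a, b)
P = 100.0 * np.eye(2)
y, err2 = 0.0, []
for t in range(5000):
    r_next = np.sin(0.01 * t)            # bounded deterministic reference
    a_t, b_t = theta
    b_safe = b_t if abs(b_t) > 1e-3 else 1e-3   # avoid division by ~0
    u = (r_next - a_t * y) / b_safe
    y_next = a_true * y + b_true * u + 0.1 * rng.standard_normal()
    phi = np.array([y, u])               # regression vector
    denom = 1.0 + phi @ P @ phi          # recursive LS update
    theta = theta + P @ phi * (y_next - phi @ theta) / denom
    P = P - np.outer(P @ phi, phi @ P) / denom
    err2.append((y_next - r_next) ** 2)
    y = y_next
print(np.mean(err2[-1000:]))  # close to the noise variance 0.01
```

The estimates are updated in closed loop with the controller, which is exactly what makes the theoretical analysis on the following slides hard.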
29Adaptive Control
30 Theoretical Problems
a) Stability
b) Optimality
32 Theoretical Obstacles
1) The closed-loop system is a very complicated nonlinear stochastic dynamical system.
2) No useful statistical properties, such as stationarity or independence of the system signals, are available.
3) No properties of (a_t, b_t) are known a priori.
33 Theorem
Assumptions:
1) The noise sequence {w_t} is a martingale difference sequence, and there exists a constant β > 2 such that sup_t E[|w_{t+1}|^β | F_t] < ∞ a.s.
2) The regression vector φ_t is an adapted sequence, i.e., φ_t is F_t-measurable.
3) {r_t} is a deterministic bounded signal.
- Theorem: Under the above assumptions, the closed-loop system is stable and optimal.
(Lei Guo, Automatica, 1995)
34 Some Examples
- Adaptive control systems: adaptation in a single agent
- Iterated Prisoner's Dilemma: adaptation among agents
35 Prisoner's Dilemma
- The story of the Prisoner's Dilemma
- Players: two prisoners
- Actions: Cooperate (C), Defect (D)
- Payoff matrix (Prisoner A's payoff listed first):

                 Prisoner B
                  C       D
  Prisoner A  C  (3,3)   (0,5)
              D  (5,0)   (1,1)
36 Prisoner's Dilemma
- No matter what the other does, the best choice is D.
- (D, D) is a Nash equilibrium.
- But if both choose D, both do worse than if both had chosen C.

                 Prisoner B
                  C       D
  Prisoner A  C  (3,3)   (0,5)
              D  (5,0)   (1,1)
37 Iterated Prisoner's Dilemma
- The individuals:
  - meet many times;
  - can recognize a previous interactant;
  - remember the prior outcome.
- Strategy: specifies the probabilities of cooperation and defection based on the history:
  - P(C) = f1(History)
  - P(D) = f2(History)
38 Strategies
- Tit For Tat: cooperate on the first move, then repeat the opponent's last choice.
  - Player A: C D D C C C C C D D D D C
  - Player B: D D C C C C C D D D D C
- Tit For Tat and Random: repeat the opponent's last choice, skewed by a random setting.
- Tit For Two Tats and Random: like Tit For Tat, except that the opponent must make the same choice twice in a row before it is reciprocated; the choice is skewed by a random setting.
- Tit For Two Tats: like Tit For Tat, except that the opponent must make the same choice twice in a row before it is reciprocated.
- Naive Prober (Tit For Tat with Random Defection): repeat the opponent's last choice (i.e., Tit For Tat), but sometimes probe by defecting in lieu of cooperating.
- Remorseful Prober (Tit For Tat with Random Defection): like Naive Prober, but if the opponent defects in response to probing, show remorse by cooperating once.
- Naive Peace Maker (Tit For Tat with Random Co-operation): repeat the opponent's last choice (i.e., Tit For Tat), but sometimes make peace by co-operating in lieu of defecting.
- True Peace Maker (hybrid of Tit For Tat and Tit For Two Tats with Random Co-operation): cooperate unless the opponent defects twice in a row, then defect once; sometimes make peace by co-operating in lieu of defecting.
- Random: always set at 50% probability.
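A tiny iterated-game harness makes these strategy descriptions concrete (151 moves per game, as in Axelrod's tournaments; the two strategies shown are the simplest ones from the list, and the function signatures are an illustrative choice):

```python
# Payoffs follow the matrix from the slides: (own payoff, opponent's payoff).
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(my_hist, opp_hist):
    return 'C' if not opp_hist else opp_hist[-1]   # cooperate first, then copy

def always_defect(my_hist, opp_hist):
    return 'D'

def play(s1, s2, rounds=151):
    h1, h2, sc1, sc2 = [], [], 0, 0
    for _ in range(rounds):
        a, b = s1(h1, h2), s2(h2, h1)
        p, q = PAYOFF[(a, b)]
        sc1 += p; sc2 += q
        h1.append(a); h2.append(b)
    return sc1, sc2

print(play(tit_for_tat, tit_for_tat))    # (453, 453): stable mutual cooperation
print(play(tit_for_tat, always_defect))  # (150, 155): TFT loses only round one
```

Against itself, Tit For Tat locks into mutual cooperation; against Always Defect it concedes only the first round and then matches defection.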
40 Strategies
- Always Defect
- Always Cooperate
- Grudger (co-operate, but only be a sucker once): cooperate until the opponent defects, then always defect unforgivingly.
- Pavlov (repeat last choice if good outcome): if 5 or 3 points were scored in the last round, repeat the last choice.
- Pavlov / Random (repeat last choice if good outcome, and Random): if 5 or 3 points were scored in the last round, repeat the last choice, but sometimes make random choices.
- Adaptive: starts with c,c,c,c,c,c,d,d,d,d,d, then takes the choice that has given the best average score, re-calculated after every move.
- Gradual: cooperates until the opponent defects; in that case defects the total number of times the opponent has defected during the game, followed by two co-operations.
- Suspicious Tit For Tat: as Tit For Tat, except begins by defecting.
- Soft Grudger: cooperates until the opponent defects; in that case the opponent is punished with d,d,d,d,c,c.
- Customised strategy 1: default setting is T1, P1, R1, S0, B1; always co-operate unless a sucker (i.e., 0 points scored).
- Customised strategy 2: default setting is T1, P1, R0, S0, B0; always play alternating defect/cooperate.
41 Iterated Prisoner's Dilemma
- Which strategy can thrive? What is a good strategy?
- Robert Axelrod, 1980s: a computer round-robin tournament
  - The first round
  - The second round
Axelrod, R., 1987. The evolution of strategies in the iterated Prisoner's Dilemma. In L. Davis (ed.), Genetic Algorithms and Simulated Annealing. Morgan Kaufmann, Los Altos, CA.
42 Characteristics of Good Strategies
- Goodness: never defect first.
  - First round: the top eight strategies were good.
  - Second round: fourteen of the top fifteen strategies were good.
- Forgiveness: may take revenge, but the memory is short.
  - Grudger is not a forgiving strategy.
- Goodness and forgiveness are a kind of collective behavior.
  - For a single agent, defection is the best strategy.
43Evolution of the Strategies
- Evolve good strategies by genetic algorithm
(GA)
44 Some Notations in GA
- String: the individual; it represents the chromosome in genetics.
- Population: the set of individuals.
- Population size: the number of individuals.
- Gene: an element of the string; e.g., S = 1011, where 1, 0, 1, 1 are called genes.
- Fitness: the adaptation of the agent to the circumstance.
(From Jing Han's PPT)
45 How Does GA Work?
- Represent a solution of the problem by a chromosome, i.e., the string.
- Randomly generate some chromosomes as the initial solutions.
- According to the principle of Survival of the Fittest, chromosomes with high fitness reproduce; crossover and mutation then generate the new generation.
- The chromosome with the highest fitness may be the solution of the problem.
(From Jing Han's PPT)
46 GA
- choose an initial population
- determine the fitness of each individual
- perform selection
- repeat
  - perform crossover
  - perform mutation
  - determine the fitness of each individual
  - perform selection
- until some stopping criterion applies
(From Jing Han's PPT)
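The loop above can be sketched in code. A toy fitness function (counting 1-bits) stands in for the tournament score used later in the lecture, all parameter values are assumed, and elitism (keeping the current best individual) is a common variant added here so progress is never lost:

```python
import random

L, POP, GENS, MUT = 20, 30, 60, 0.01     # assumed parameter values
rng = random.Random(0)

def fitness(s):                          # toy objective: number of 1-bits
    return sum(s)

def select(pop):                         # fitness-proportional selection
    return rng.choices(pop, weights=[fitness(s) + 1 for s in pop], k=1)[0]

def crossover(p1, p2):                   # one-point crossover
    cut = rng.randrange(1, L)
    return p1[:cut] + p2[cut:]

def mutate(s):                           # flip each gene with prob MUT
    return [b ^ 1 if rng.random() < MUT else b for b in s]

pop = [[rng.randint(0, 1) for _ in range(L)] for _ in range(POP)]
for _ in range(GENS):
    elite = max(pop, key=fitness)        # elitism: keep the current best
    pop = [elite] + [mutate(crossover(select(pop), select(pop)))
                     for _ in range(POP - 1)]
best = max(pop, key=fitness)
print(fitness(best))  # typically at or near the optimum L = 20
```

Swapping in a different `fitness` (e.g., average score in the iterated Prisoner's Dilemma) turns this same loop into the strategy-evolution experiment described below.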
47 Some Remarks on GA
- GA searches for the optimal solution from a set of solutions, rather than from a single solution.
- The search space is large: {0,1}^L.
- GA is a random algorithm, since selection, crossover, and mutation are all random operations.
- GA is suitable for the following situations:
  - there is structure in the search space, but it is not well understood;
  - the inputs are non-stationary (i.e., the environment is changing);
  - the goal is not global optimization, but finding a reasonably good solution quickly.
48 Evolution of Strategies by GA
- Each chromosome represents one strategy.
- The strategy is deterministic and is determined by the previous moves.
- E.g., if the strategy is determined by a one-step history, there are four possible histories:
  - Player I:  C D D C
  - Player II: D D C C
- The number of possible strategies is 2×2×2×2 = 16.
  - TFT: F(CC)=C, F(CD)=D, F(DC)=C, F(DD)=D
  - Always cooperate: F(CC)=F(CD)=F(DC)=F(DD)=C
  - Always defect: F(CC)=F(CD)=F(DC)=F(DD)=D
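The one-step-memory encoding can be enumerated directly (a small sketch; representing a history as a tuple of the two players' last moves is an illustrative choice):

```python
from itertools import product

# A one-step-memory strategy: a lookup table from the previous joint move
# (own move, opponent's move) to the next move.
HISTORIES = [('C', 'C'), ('C', 'D'), ('D', 'C'), ('D', 'D')]

all_strategies = [dict(zip(HISTORIES, moves))
                  for moves in product('CD', repeat=4)]
print(len(all_strategies))  # 2^4 = 16

tft   = {h: h[1] for h in HISTORIES}   # copy opponent's last move
all_c = {h: 'C' for h in HISTORIES}    # always cooperate
all_d = {h: 'D' for h in HISTORIES}    # always defect
for s in (tft, all_c, all_d):
    assert s in all_strategies         # all three appear among the 16
```

The same enumeration idea scales to the three-step memory on the next slide, where the table has 64 entries and the strategy space explodes to 2^70.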
49 Evolution of the Strategies
- Strategies use the outcomes of the three previous moves to determine the current move.
- The possible number of histories is 4×4×4 = 64.
- The initial premise is three hypothetical moves.
- The length of the chromosome is 70.
- The total number of strategies is 2^70 ≈ 10^21.
50 Evolution of Good Strategies
- Five steps of evolving good strategies by GA:
  - An initial population is chosen.
  - Each individual is run in the current environment to determine its effectiveness.
  - The relatively successful individuals are selected to have more offspring.
  - The successful individuals are randomly paired off to produce two offspring per mating.
    - Crossover: a way of constructing the chromosomes of the two offspring from the chromosomes of the two parents.
    - Mutation: randomly changing a very small proportion of the Cs to Ds and vice versa.
  - A new population is generated.
51 Evolution of the Strategies
- Some parameters:
  - The population size in each generation is 20.
  - Each game consists of 151 moves.
  - Each individual meets eight representatives, which makes about 24,000 moves per generation.
  - A run consists of 50 generations.
  - Forty runs were conducted.
52 Results
- The median member is as successful as TFT.
- Most of the strategies resemble TFT.
- Some of them show the same behavioral patterns as TFT:
  - Don't rock the boat: continue to cooperate after mutual cooperation.
  - Be provocable: defect when the other player defects out of the blue.
  - Accept an apology: continue to cooperate after cooperation has been restored.
  - Forget: cooperate when mutual cooperation has been restored after an exploitation.
  - Accept a rut: defect after three mutual defections.
53 What Is a Good Strategy?
- Is TFT a good strategy?
- Tit For Two Tats may be the best strategy in the first round, but it is not a good strategy in the second round.
- A good strategy depends on the other strategies, i.e., on the environment.
→ Evolutionarily stable strategy
54 Evolutionarily Stable Strategy (ESS)
- Introduced by John Maynard Smith and George R. Price in 1973.
- An ESS is a strategy such that, if all members of the population adopt it, then no mutant strategy could invade the population under the influence of natural selection.
- An ESS is robust under evolution: it cannot be invaded by mutation.
(John Maynard Smith, Evolution and the Theory of Games)
55 Definition of ESS
- A strategy x is an ESS if for all y ≠ x,
  E[x, εy + (1 − ε)x] > E[y, εy + (1 − ε)x]
  holds for all sufficiently small positive ε, where E[s, t] denotes the payoff of playing s against t.
56 ESS in IPD
- Tit For Tat cannot be invaded by wily strategies, such as Always Defect.
- TFT can be invaded by good strategies, such as Always Cooperate, Tit For Two Tats, and Suspicious Tit For Tat.
- Tit For Tat is not a strict ESS.
- Always Cooperate can be invaded by Always Defect.
- Always Defect is an ESS.
57 Other Adaptive Systems
- Complex adaptive systems
  - John Holland, Hidden Order, 1996
  - Examples: stock market, social insects, ant colonies, biosphere, brain, immune system, cell, developing embryo, ...
- Evolutionary algorithms
  - genetic algorithms, neural networks, ...
58 References
- Lei Guo, Self-convergence of weighted least-squares with applications to stochastic adaptive control, IEEE Trans. Automat. Contr., 1996, 41(1): 79-89.
- Lei Guo, Convergence and logarithm laws of self-tuning regulators, Automatica, 1995, 31(3): 435-450.
- Lei Guo, Adaptive systems theory: some basic concepts, methods and results, Journal of Systems Science and Complexity, 16(3): 293-306.
- Drew Fudenberg and Jean Tirole, Game Theory, The MIT Press, 1991.
- Axelrod, R., 1987. The evolution of strategies in the iterated Prisoner's Dilemma. In L. Davis (ed.), Genetic Algorithms and Simulated Annealing. Morgan Kaufmann, Los Altos, CA.
- Richard Dawkins, The Selfish Gene, Oxford University Press.
- John Holland, Hidden Order, 1996.
59Adaptation in a single agent
60 Summary
In these six lectures, we have talked about:
- Complex Networks
- Collective Behavior of MAS
- Game Theory
- Adaptive Systems
61 Summary
In these six lectures, we have talked about:
- Complex Networks: Topology
- Collective Behavior of MAS
- Game Theory
- Adaptive Systems
62 Three Concepts
- Average path length l = ⟨d_ij⟩, where d_ij is the shortest distance between nodes i and j.
- Clustering coefficient C = ⟨C(i)⟩, where C(i) is the fraction of pairs of node i's neighbors that are themselves connected.
- Degree distribution P(k): the probability that a randomly chosen node i has exactly k neighbors.
Characteristic properties: short average path length, large clustering coefficient, power-law degree distribution.
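All three concepts can be computed directly on a toy graph (a sketch; the five-node adjacency list is made up for illustration):

```python
import itertools
from collections import Counter

# Toy undirected graph as an adjacency-set dict.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
n = len(adj)

def bfs_dist(src):                       # shortest path lengths d_ij via BFS
    dist, frontier = {src: 0}, [src]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return dist

# Average path length <l>: mean of d_ij over all node pairs
pairs = list(itertools.combinations(adj, 2))
l_avg = sum(bfs_dist(i)[j] for i, j in pairs) / len(pairs)

# Clustering coefficient C = <C(i)>: fraction of i's neighbor pairs linked
def C(i):
    nbrs = adj[i]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in itertools.combinations(nbrs, 2) if v in adj[u])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

C_avg = sum(C(i) for i in adj) / n

# Degree distribution P(k): fraction of nodes with exactly k neighbors
P = {k: c / n for k, c in Counter(len(adj[i]) for i in adj).items()}
print(l_avg, C_avg, P)
```

On this toy graph the average path length is 1.5, the clustering coefficient is 8/15, and three of the five nodes have degree 3.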
63 Regular Graphs
- Regular graphs: graphs where each vertex has the same number of neighbors.
- Examples
64 Random Graphs
- ER random graph model G(N, p):
  - given N nodes,
  - add an edge between each pair of nodes independently with probability p.
- Homogeneous nature: each node has roughly the same number of edges.
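A minimal generator for G(N, p) following this definition (the parameter values are illustrative):

```python
import random

# ER model G(N, p): each of the N(N-1)/2 possible edges is present
# independently with probability p.
def er_graph(N, p, seed=0):
    rng = random.Random(seed)
    adj = {i: set() for i in range(N)}
    for i in range(N):
        for j in range(i + 1, N):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

g = er_graph(1000, 0.01)
mean_deg = sum(len(g[i]) for i in g) / len(g)
print(mean_deg)  # close to (N-1)p ≈ 10: degrees concentrate around the mean
```

The binomial degree distribution concentrates tightly around (N−1)p, which is the homogeneity noted above and the key contrast with the scale-free networks two slides later.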
65 Small-World Networks
Introduce pNK/2 long-range edges.
A few long-range links are sufficient to decrease l, but do not significantly change C.
66 Scale-Free Networks
- Some observations
- A breakthrough: Barabási & Albert, 1999, Science
- Generating process of the BA model:
  1) Start with a network with m0 nodes.
  2) Growth: at each step, add a new node with m (≤ m0) edges that link the new node to m different nodes already present in the network.
  3) Preferential attachment: when choosing the nodes to which the new node connects, assume that the probability Π that the new node is connected to node i depends on the degree k_i of node i, such that
     Π(k_i) = k_i / Σ_j k_j.
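A sketch of this growth-plus-preferential-attachment process (the repeated-node-list trick for degree-proportional sampling is an implementation choice, not from the slides, and m0 = m + 1 is an assumed initialization):

```python
import random

# BA model: new nodes attach m edges, targets chosen with P(i) ~ k_i.
def ba_graph(n, m=2, seed=0):
    rng = random.Random(seed)
    m0 = m + 1
    adj = {i: set(range(m0)) - {i} for i in range(m0)}   # initial clique
    pool = [i for i in adj for _ in adj[i]]              # node i repeated deg(i) times
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))                # degree-proportional pick
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            pool += [t, new]                             # update degree counts
    return adj

g = ba_graph(2000, m=2)
degs = sorted((len(g[i]) for i in g), reverse=True)
print(degs[:5])  # a few large hubs: the heavy tail of the power law
```

Unlike the ER graph, the degree sequence here has a small minimum (every late node starts with degree m) but a handful of hubs whose degree grows with the network, the signature of the power-law distribution.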
67 Summary
In these six lectures, we have talked about:
- Complex Networks: Topology
- Collective Behavior of MAS: More is different
- Game Theory
- Adaptive Systems
68 Multi-Agent Systems (MAS)
- MAS:
  - many agents;
  - local interactions between agents;
  - collective behavior at the population level.
- "More is different." --- Philip Anderson, 1972
  - e.g., phase transition, coordination, synchronization, consensus, clustering, aggregation, ...
- Examples:
  - physical systems
  - biological systems
  - social and economic systems
  - engineering systems
  - ...
69 Vicsek Model
[Figure: an agent and its neighbors within the interaction radius]
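The model itself is not spelled out on the slide; as a hedged sketch of the standard noise-free version (each agent moves at constant speed and resets its heading to the average heading of all agents within radius r, itself included; all parameter values are assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, v, L = 50, 0.4, 0.03, 1.0          # agents, radius, speed, box size
pos = rng.random((n, 2)) * L
theta = rng.uniform(-np.pi, np.pi, n)    # initial headings
for _ in range(300):
    d = np.linalg.norm(pos[:, None] - pos[None, :], axis=2)
    nbr = d < r                           # neighbor mask (includes self)
    # average heading via unit vectors, avoiding angle wrap-around
    theta = np.arctan2(nbr @ np.sin(theta), nbr @ np.cos(theta))
    pos = (pos + v * np.c_[np.cos(theta), np.sin(theta)]) % L   # periodic box
order = np.hypot(np.mean(np.cos(theta)), np.mean(np.sin(theta)))
print(order)  # near 1 once headings have synchronized
```

The order parameter (the length of the mean heading vector) is the usual way to quantify the synchronization that the following theorems address.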
70 Theorem 2 (Jadbabaie et al., 2003)
Joint connectivity of the neighbor graphs on each time interval [th, (t+1)h], with h > 0, implies synchronization of the linearized Vicsek model.
Related result: J.N. Tsitsiklis et al., IEEE TAC, 1984.
71 Theorem 7: High Density Implies Synchronization
For any given system parameters, when the number of agents n is large, the Vicsek model will synchronize almost surely.
This theorem is consistent with the simulation results.
72 Theorem 8: High Density with Short-Distance Interaction
Let ... and the velocity satisfy ... Then, for a large population, the MAS will synchronize almost surely.
73 Soft Control
- Key points:
  - Different from the distributed control approach: intervention in a distributed system.
  - Do not change the local rules of the existing agents.
  - Add one (or a few) special agent(s), called shill(s), based on the system state information, to intervene in the collective behavior.
  - The shill is controlled by us, but is treated as an ordinary agent by all other agents.
  - A shill is not a leader; this is not the leader-follower type.
  - Feedback intervention by shill(s).
This page is very important!
(From Jing Han's PPT)
74 Leader-Follower Model
- Key points:
  - Do not change the local rules of the existing agents.
  - Add some (usually not very few) information agents, called leaders, to control or intervene in the MAS; the existing agents treat them as ordinary agents.
  - The proportion of leaders is controlled by us (if the number of leaders is small, connectivity may not be guaranteed).
  - Open-loop intervention by leaders.
75 Summary
In these six lectures, we have talked about:
- Complex Networks: Topology
- Collective Behavior of MAS: More is different
- Game Theory: Interactions
- Adaptive Systems
76 Definition of Nash Equilibrium
- Nash Equilibrium (NE): a solution concept of a game.
  - (N, S, u): a game
  - S_i: strategy set for player i
  - S = S_1 × ... × S_N: the set of strategy profiles
  - u = (u_1, ..., u_N): the payoff functions
  - s_{-i}: strategy profile of all players except player i
- A strategy profile s* is called a Nash equilibrium if
  u_i(s*_i, s*_{-i}) ≥ u_i(s_i, s*_{-i}) for every player i,
  where s_i is any pure strategy of player i.
77 Definition of ESS
- A strategy x is an ESS if for all y ≠ x,
  E[x, εy + (1 − ε)x] > E[y, εy + (1 − ε)x]
  holds for all sufficiently small positive ε, where E[s, t] denotes the payoff of playing s against t.
78 Summary
In these six lectures, we have talked about:
- Complex Networks: Topology
- Collective Behavior of MAS: More is different
- Game Theory: Interactions
- Adaptive Systems: Adaptation
79 Other Topics
- Self-organized criticality: earthquakes, fires, the sand-pile model, the Bak-Sneppen model
- Nonlinear dynamics: chaos, bifurcation, ...
- Artificial life: the Tierra model, gene pool, the Game of Life, ...
- Evolutionary dynamics: genetic algorithms, neural networks, ...
80Complex systems
- Not a mature subject
- No unified framework or universal methods
81THE END