Title: Adaptive Dynamic Programming
1. Adaptive Dynamic Programming
A New Tool for Learning Control
- Derong Liu
- Department of Electrical and Computer Engineering
- University of Illinois at Chicago
- dliu_at_ece.uic.edu
2. Outline of the Presentation
- Artificial Neural Networks
- Dynamic Programming
- Adaptive Dynamic Programming
- Approximate Optimal Control
- Adaptive Critic Designs
- Applications
3. Neural Networks
- Multilayer Feedforward Neural Networks
- Universal approximators
- Backpropagation training algorithm
- Gradient-descent search in the weight space (see the sketch after this list)
- Radial Basis Function Networks
- Recurrent Neural Networks
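As a concrete illustration of backpropagation as gradient-descent search in weight space, here is one training step for a small feedforward network, written in PyTorch (an assumed framework; the network size, data, and learning rate are illustrative, not from the presentation):

```python
import torch
import torch.nn as nn

# A small multilayer feedforward network (in principle a universal approximator).
net = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 1))

x = torch.randn(16, 2)                        # a batch of inputs (placeholder data)
y = torch.sin(x.sum(dim=1, keepdim=True))     # targets the network should approximate

loss = nn.functional.mse_loss(net(x), y)      # squared output error
loss.backward()                               # backpropagation: gradients w.r.t. every weight

lr = 0.05
with torch.no_grad():
    for w in net.parameters():                # one gradient-descent step in weight space
        w -= lr * w.grad
```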
4. Dynamic Programming
- Consider
- Performance index (or cost)
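In a standard discrete-time formulation (a hedged reconstruction of (1) and (2), consistent with how they are referenced on the later slides), the system and cost are

    x(k+1) = F[x(k), u(k), k]                                        (1)

    J[x(i), i] = \sum_{k=i}^{\infty} \gamma^{k-i} U[x(k), u(k), k]   (2)

where F describes the system dynamics, U is the utility function, and 0 < \gamma \le 1 is a discount factor.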
5. Dynamic Programming
- The utility function U indicates the performance of the overall system.
- J[x(i), i] is the cost-to-go of the initial state x(i).
- The objective is to choose the control sequence u(k), k = i, i+1, ..., so as to minimize the function J in (2).
6. Bellman's Principle of Optimality
- An optimal control policy has the property that, no matter what the previous decisions have been, the remaining decisions must constitute an optimal policy with regard to the state resulting from those previous decisions.
7. Dynamic Programming
8. Dynamic Programming
- If one applies an arbitrary control u(t) at time t and then uses the known optimal control sequence from t+1 on, the resulting cost is given below, where x(t) is the state at time t and x(t+1) is determined by (1).
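In the reconstructed notation of (1) and (2) (again an assumption, not the slide's own expression), this cost is

    J[x(t)] = U[x(t), u(t)] + \gamma J^{*}[x(t+1)]

where J^{*} denotes the optimal cost-to-go.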
9. Dynamic Programming
- According to Bellman's principle of optimality, the optimal cost from time t on is equal to the expression shown below.
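Again in the reconstructed notation, the Bellman optimality equation referred to here reads

    J^{*}[x(t)] = \min_{u(t)} \{ U[x(t), u(t)] + \gamma J^{*}[x(t+1)] \}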
10. Dynamic Programming
- Principle of optimality:
- It can be applied to nonlinear systems with constraints on the control and state variables.
- It allows us to optimize over only one control vector at a time by working backward in time (see the sketch below).
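To make "working backward in time" concrete, here is a minimal sketch of finite-horizon dynamic programming on a discretized problem (purely illustrative; the dynamics, utility, grids, and horizon are assumptions, not from the slides):

```python
import numpy as np

N = 10                                    # horizon length
states = np.linspace(-1.0, 1.0, 21)       # discretized state grid
controls = np.linspace(-0.5, 0.5, 11)     # discretized control grid

def f(x, u):                              # assumed system dynamics x(k+1) = f(x(k), u(k))
    return 0.9 * x + u

def U(x, u):                              # assumed utility (local cost)
    return x**2 + u**2

J = np.zeros(len(states))                 # terminal cost-to-go J[x(N)] = 0
policy = np.zeros((N, len(states)))       # minimizing control at each stage and state

for k in range(N - 1, -1, -1):            # work backward in time
    J_new = np.empty_like(J)
    for i, x in enumerate(states):
        # Optimize over only one control vector at a time,
        # using the already-computed cost-to-go at stage k+1.
        costs = [U(x, u) + J[np.abs(states - f(x, u)).argmin()] for u in controls]
        J_new[i] = min(costs)
        policy[k, i] = controls[int(np.argmin(costs))]
    J = J_new
```

Even in this toy example the cost table has 21 entries per stage; with n state variables each discretized into m levels it would have m^n, which is the curse of dimensionality discussed on the next slide.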
11. Curse of Dimensionality
- Dynamic programming is applicable to problems that minimize/maximize a cost.
- Applicable to many engineering problems.
- Backward numerical process.
- Backward in time.
- The function J is unknown.
- Computational complexity increases exponentially with the number of state variables (discretizing each of n state variables into m levels gives m^n grid points).
- In practice it is therefore suitable only for small problems.
12. Adaptive Dynamic Programming
- Approximate dynamic programming.
- HDP: Heuristic dynamic programming.
- DHP: Dual heuristic dynamic programming.
- GDHP: Globalized DHP.
- AD: Action-dependent.
- Use a neural network to approximate the cost function J.
- Critic network.
13. Adaptive Critic Designs (ADHDP)
The two modules (the action network and the critic network) in a typical action-dependent ACD.
14. The Critic Network
- Its output will approximate the cost function J.
- The function J is unknown.
- How to train such a network?
- The target values are not known!
15. The Critic Network
16. The Critic Network
17. The Critic Network
- When the squared error function in (3) is minimized, the neural network is trained so that its output becomes an estimate of the cost function defined in (2) for i = t+1.
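The error function referred to as (3) is not shown on this slide; one common ADHDP form, consistent with the statement above (an assumption, since the exact expression is unavailable here), is

    e_c(t) = \gamma \hat{J}(t) - [\hat{J}(t-1) - U(t)],   E_c(t) = \tfrac{1}{2} e_c^2(t)   (3)

where \hat{J}(t) is the critic output at time t. Minimizing E_c(t) drives \hat{J}(t-1) toward U(t) + \gamma \hat{J}(t), so \hat{J}(t) converges toward the discounted sum of utilities from time t+1 onward, i.e., the cost in (2) with i = t+1.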
18. Critic Network Training
19. Critic Network Training
20. Action Network Training
21. Adaptive Critic Designs
- The critic network is trained by minimizing a prediction-error criterion (sketched after this list).
- The action network is trained by minimizing a criterion based on the critic's output (sketched after this list).
- The same approach appears in the literature under several names:
- Approximate Dynamic Programming (ADP)
- Adaptive Dynamic Programming
- Asymptotic Dynamic Programming
- Neurodynamic Programming (NDP)
- Adaptive Critic Designs (ACD)
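A common form of the two training criteria in ADHDP (a hedged sketch, not the slides' exact expressions): the critic minimizes the squared prediction error from (3), and the action network minimizes the critic's cost-to-go estimate relative to a desired ultimate objective U_c (typically U_c = 0),

    E_c(t) = \tfrac{1}{2} \big( \gamma \hat{J}(t) - [\hat{J}(t-1) - U(t)] \big)^2

    E_a(t) = \tfrac{1}{2} \big( \hat{J}(t) - U_c \big)^2

Gradients of E_c with respect to the critic weights, and of E_a with respect to the action weights (backpropagated through the critic), give the respective weight updates.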
22. Adaptive Critic Designs
23. Comparison with Conventional Approach
24. Reinforcement Learning
- Q-learning is very close to ADHDP.
- TD learning is a special case of HDP.
- Look-up tables are used in RL (see the sketch after this list).
- Finite or discrete sets of states.
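For contrast with the neural-network critic above, a minimal tabular Q-learning update (illustrative; the state/action counts, learning rate, discount factor, and helper name are assumptions):

```python
import numpy as np

# Q-learning with a look-up table: works only for a finite, discrete set of
# states and actions, which is the limitation the neural-network critic removes.
n_states, n_actions = 10, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.95                  # learning rate and discount factor

def q_update(s, a, r, s_next):
    """One Q-learning step for the observed transition (s, a, r, s_next)."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```

ADHDP plays a similar role, but replaces the table Q with a critic network evaluated at a state-action input, so continuous states can be handled.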
25. Applications
- Aircraft autolander (NSF, NASA).
- Spacecraft attitude control.
- Ship autosteering.
- Traffic control in communication networks (CDMA wireless networks; NSF).
26. The Pole Balancing Problem
27. The Pole Balancing Problem
28. The Pole Balancing Problem
29. The Pole Balancing Problem
- The critic network is 5-7-1 (see the sketch after this list).
- Hidden layer: tanh
- Output layer: linear
- The action network is 4-6-1.
- Hidden layer: tanh
- Output layer: tanh
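A minimal sketch of the two networks above and one ADHDP-style training step, written in PyTorch (an assumed framework; the discount factor, step size, utility value, and the plain-gradient update scheme are illustrative, not the slides' exact scheme):

```python
import torch
import torch.nn as nn

# Critic network, 5-7-1: the 4 pole/cart states plus the control as inputs,
# 7 tanh hidden units, linear output approximating the cost-to-go J.
critic = nn.Sequential(nn.Linear(5, 7), nn.Tanh(), nn.Linear(7, 1))

# Action network, 4-6-1: 4 state inputs, 6 tanh hidden units,
# tanh output giving a bounded control signal.
action = nn.Sequential(nn.Linear(4, 6), nn.Tanh(), nn.Linear(6, 1), nn.Tanh())

gamma, lr = 0.95, 0.01                       # assumed discount factor and step size
x_prev, x = torch.randn(4), torch.randn(4)   # previous and current states (placeholder data)
U_t = torch.tensor(1.0)                      # assumed utility observed at time t

# Critic update: plain gradient descent on E_c = 0.5 * e_c^2,
# with e_c(t) = gamma*J(t) - [J(t-1) - U(t)] and J(t-1) held as a fixed target.
J_prev = critic(torch.cat([x_prev, action(x_prev)])).detach()
J = critic(torch.cat([x, action(x)]))
E_c = 0.5 * (gamma * J - (J_prev - U_t)).pow(2).sum()
critic.zero_grad()
E_c.backward()
with torch.no_grad():
    for p in critic.parameters():
        p -= lr * p.grad

# Action update: minimize the critic's cost-to-go estimate (desired value 0),
# backpropagating through the (now fixed) critic into the action weights.
J = critic(torch.cat([x, action(x)]))
E_a = 0.5 * J.pow(2).sum()
action.zero_grad()
E_a.backward()
with torch.no_grad():
    for p in action.parameters():
        p -= lr * p.grad
```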
30. Simulation Results
- Results:
- 26 trials on average to balance the pole.
- Neural network training uses a simple gradient-descent approach.
31. The Pole Balancing Problem
32. Concluding Remarks
- Adaptive critic design is a very robust learning-control approach.
- Theoretical development was started in 1977 by Werbos.
- Practical applications with significant economic impact are expected in the next few years.
33. Where to Get More Information
- dliu_at_ece.uic.edu
- http://liu.ece.uic.edu
- (312) 355-4475