Title: Adaptive Dynamic Programming
1. Adaptive Dynamic Programming
A New Tool for Learning Control
- Derong Liu
- Department of Electrical and Computer Engineering
- University of Illinois at Chicago
- dliu_at_ece.uic.edu
2. Outline of the Presentation
- Artificial Neural Networks
- Dynamic Programming
- Adaptive Dynamic Programming
- Approximate Optimal Control
- Adaptive Critic Designs
- Applications
3. Neural Networks
- Multilayer Feedforward Neural Networks
- Universal approximators
- Backpropagation training algorithm
- Gradient-descent search in the weight space (see the sketch after this list)
- Radial Basis Function Networks
- Recurrent Neural Networks
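As a concrete illustration of backpropagation as gradient-descent search in weight space, here is one training step for a small feedforward network, written in PyTorch (an assumed framework; the network size, data, and learning rate are illustrative, not from the presentation):

```python
import torch
import torch.nn as nn

# A small multilayer feedforward network (in principle a universal approximator).
net = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 1))

x = torch.randn(16, 2)                        # a batch of inputs (placeholder data)
y = torch.sin(x.sum(dim=1, keepdim=True))     # targets the network should approximate

loss = nn.functional.mse_loss(net(x), y)      # squared output error
loss.backward()                               # backpropagation: gradients w.r.t. every weight

lr = 0.05
with torch.no_grad():
    for w in net.parameters():                # one gradient-descent step in weight space
        w -= lr * w.grad
```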
4. Dynamic Programming
- Consider
- Performance index (or cost)
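In a standard discrete-time formulation (a hedged reconstruction of (1) and (2), consistent with how they are referenced on the later slides), the system and cost are

    x(k+1) = F[x(k), u(k), k]                                        (1)

    J[x(i), i] = \sum_{k=i}^{\infty} \gamma^{k-i} U[x(k), u(k), k]   (2)

where F describes the system dynamics, U is the utility function, and 0 < \gamma \le 1 is a discount factor.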
5. Dynamic Programming
- The utility function U indicates the performance of the overall system.
- J[x(i), i] is the cost-to-go of the initial state x(i).
- The objective is to choose the control sequence u(k), k = i, i+1, ..., so as to minimize the function J in (2).
6. Bellman's Principle of Optimality
- An optimal control policy has the property that, no matter what the previous decisions have been, the remaining decisions must constitute an optimal policy with regard to the state resulting from those previous decisions.
7. Dynamic Programming
8. Dynamic Programming
- If one applies an arbitrary control u(t) at time t and then uses the known optimal control sequence from t+1 on, the resulting cost is given below, where x(t) is the state at time t and x(t+1) is determined by (1).
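In the reconstructed notation of (1) and (2) (again an assumption, not the slide's own expression), this cost is

    J[x(t)] = U[x(t), u(t)] + \gamma J^{*}[x(t+1)]

where J^{*} denotes the optimal cost-to-go.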
9. Dynamic Programming
- According to Bellman's principle of optimality, the optimal cost from time t on is equal to the expression shown below.
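Again in the reconstructed notation, the Bellman optimality equation referred to here reads

    J^{*}[x(t)] = \min_{u(t)} \{ U[x(t), u(t)] + \gamma J^{*}[x(t+1)] \}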
10. Dynamic Programming
- Principle of optimality:
- It can be applied to nonlinear systems with constraints on the control and state variables.
- It allows us to optimize over only one control vector at a time by working backward in time (see the sketch below).
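To make "working backward in time" concrete, here is a minimal sketch of finite-horizon dynamic programming on a discretized problem (purely illustrative; the dynamics, utility, grids, and horizon are assumptions, not from the slides):

```python
import numpy as np

N = 10                                    # horizon length
states = np.linspace(-1.0, 1.0, 21)       # discretized state grid
controls = np.linspace(-0.5, 0.5, 11)     # discretized control grid

def f(x, u):                              # assumed system dynamics x(k+1) = f(x(k), u(k))
    return 0.9 * x + u

def U(x, u):                              # assumed utility (local cost)
    return x**2 + u**2

J = np.zeros(len(states))                 # terminal cost-to-go J[x(N)] = 0
policy = np.zeros((N, len(states)))       # minimizing control at each stage and state

for k in range(N - 1, -1, -1):            # work backward in time
    J_new = np.empty_like(J)
    for i, x in enumerate(states):
        # Optimize over only one control vector at a time,
        # using the already-computed cost-to-go at stage k+1.
        costs = [U(x, u) + J[np.abs(states - f(x, u)).argmin()] for u in controls]
        J_new[i] = min(costs)
        policy[k, i] = controls[int(np.argmin(costs))]
    J = J_new
```

Even in this toy example the cost table has 21 entries per stage; with n state variables each discretized into m levels it would have m^n, which is the curse of dimensionality discussed on the next slide.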
11. Curse of Dimensionality
- Dynamic programming is applicable to problems that minimize/maximize a cost.
- Applicable to many engineering problems.
- Backward numerical process.
- Backward in time.
- The function J is unknown.
- Computational complexity increases exponentially with the number of state variables (discretizing each of n state variables into m levels gives m^n grid points).
- In practice it is therefore suitable only for small problems.
12. Adaptive Dynamic Programming
- Approximate dynamic programming.
- HDP: Heuristic dynamic programming.
- DHP: Dual heuristic dynamic programming.
- GDHP: Globalized DHP.
- AD: Action-dependent.
- Use a neural network to approximate the cost function J.
- Critic network.
13. Adaptive Critic Designs (ADHDP)
The two modules (the action network and the critic network) in a typical action-dependent ACD.
14. The Critic Network
- Its output will approximate the cost function J.
- The function J is unknown.
- How to train such a network?
- The target values are not known!
15. The Critic Network
16. The Critic Network
17. The Critic Network
- When the squared error function in (3) is minimized, the neural network is trained so that its output becomes an estimate of the cost function defined in (2) for i = t+1.
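The error function referred to as (3) is not shown on this slide; one common ADHDP form, consistent with the statement above (an assumption, since the exact expression is unavailable here), is

    e_c(t) = \gamma \hat{J}(t) - [\hat{J}(t-1) - U(t)],   E_c(t) = \tfrac{1}{2} e_c^2(t)   (3)

where \hat{J}(t) is the critic output at time t. Minimizing E_c(t) drives \hat{J}(t-1) toward U(t) + \gamma \hat{J}(t), so \hat{J}(t) converges toward the discounted sum of utilities from time t+1 onward, i.e., the cost in (2) with i = t+1.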
18. Critic Network Training
19. Critic Network Training
20. Action Network Training
21. Adaptive Critic Designs
- The critic network is trained by minimizing a prediction-error criterion (sketched after this list).
- The action network is trained by minimizing a criterion based on the critic's output (sketched after this list).
- The same approach appears in the literature under several names:
- Approximate Dynamic Programming (ADP)
- Adaptive Dynamic Programming
- Asymptotic Dynamic Programming
- Neurodynamic Programming (NDP)
- Adaptive Critic Designs (ACD)
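A common form of the two training criteria in ADHDP (a hedged sketch, not the slides' exact expressions): the critic minimizes the squared prediction error from (3), and the action network minimizes the critic's cost-to-go estimate relative to a desired ultimate objective U_c (typically U_c = 0),

    E_c(t) = \tfrac{1}{2} \big( \gamma \hat{J}(t) - [\hat{J}(t-1) - U(t)] \big)^2

    E_a(t) = \tfrac{1}{2} \big( \hat{J}(t) - U_c \big)^2

Gradients of E_c with respect to the critic weights, and of E_a with respect to the action weights (backpropagated through the critic), give the respective weight updates.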
22. Adaptive Critic Designs
23. Comparison with Conventional Approach
24. Reinforcement Learning
- Q-learning is very close to ADHDP.
- TD learning is a special case of HDP.
- Look-up tables are used in RL (see the sketch after this list).
- Finite or discrete sets of states.
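For contrast with the neural-network critic above, a minimal tabular Q-learning update (illustrative; the state/action counts, learning rate, discount factor, and helper name are assumptions):

```python
import numpy as np

# Q-learning with a look-up table: works only for a finite, discrete set of
# states and actions, which is the limitation the neural-network critic removes.
n_states, n_actions = 10, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.95                  # learning rate and discount factor

def q_update(s, a, r, s_next):
    """One Q-learning step for the observed transition (s, a, r, s_next)."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```

ADHDP plays a similar role, but replaces the table Q with a critic network evaluated at a state-action input, so continuous states can be handled.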
25. Applications
- Aircraft autolander (NSF, NASA).
- Spacecraft attitude control.
- Ship autosteering.
- Traffic control in communication networks (CDMA wireless networks; NSF).
26. The Pole Balancing Problem
27. The Pole Balancing Problem
28. The Pole Balancing Problem
29. The Pole Balancing Problem
- The critic network is 5-7-1 (see the sketch after this list).
- Hidden layer: tanh
- Output layer: linear
- The action network is 4-6-1.
- Hidden layer: tanh
- Output layer: tanh
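A minimal sketch of the two networks above and one ADHDP-style training step, written in PyTorch (an assumed framework; the discount factor, step size, utility value, and the plain-gradient update scheme are illustrative, not the slides' exact scheme):

```python
import torch
import torch.nn as nn

# Critic network, 5-7-1: the 4 pole/cart states plus the control as inputs,
# 7 tanh hidden units, linear output approximating the cost-to-go J.
critic = nn.Sequential(nn.Linear(5, 7), nn.Tanh(), nn.Linear(7, 1))

# Action network, 4-6-1: 4 state inputs, 6 tanh hidden units,
# tanh output giving a bounded control signal.
action = nn.Sequential(nn.Linear(4, 6), nn.Tanh(), nn.Linear(6, 1), nn.Tanh())

gamma, lr = 0.95, 0.01                       # assumed discount factor and step size
x_prev, x = torch.randn(4), torch.randn(4)   # previous and current states (placeholder data)
U_t = torch.tensor(1.0)                      # assumed utility observed at time t

# Critic update: plain gradient descent on E_c = 0.5 * e_c^2,
# with e_c(t) = gamma*J(t) - [J(t-1) - U(t)] and J(t-1) held as a fixed target.
J_prev = critic(torch.cat([x_prev, action(x_prev)])).detach()
J = critic(torch.cat([x, action(x)]))
E_c = 0.5 * (gamma * J - (J_prev - U_t)).pow(2).sum()
critic.zero_grad()
E_c.backward()
with torch.no_grad():
    for p in critic.parameters():
        p -= lr * p.grad

# Action update: minimize the critic's cost-to-go estimate (desired value 0),
# backpropagating through the (now fixed) critic into the action weights.
J = critic(torch.cat([x, action(x)]))
E_a = 0.5 * J.pow(2).sum()
action.zero_grad()
E_a.backward()
with torch.no_grad():
    for p in action.parameters():
        p -= lr * p.grad
```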
30. Simulation Results
- Results:
- 26 trials on average to balance the pole.
- Neural network training uses a simple gradient-descent approach.
31. The Pole Balancing Problem
32. Concluding Remarks
- Adaptive critic design is a very robust learning-control approach.
- Theoretical development was started in 1977 by Werbos.
- Practical applications with significant economic impact are expected in the next few years.
33. Where to Get More Information
- dliu_at_ece.uic.edu
- http://liu.ece.uic.edu
- (312) 355-4475