Adaptive Dynamic Programming - PowerPoint PPT Presentation

Slides: 34
Provided by: Der64

Transcript and Presenter's Notes

Title: Adaptive Dynamic Programming


1
Adaptive Dynamic Programming
A New Tool for Learning Control
  • Derong Liu
  • Department of Electrical and Computer Engineering
  • University of Illinois at Chicago
  • dliu@ece.uic.edu

2
Outline of the Presentation
  • Artificial Neural Networks
  • Dynamic Programming
  • Adaptive Dynamic Programming
  • Approximate Optimal Control
  • Adaptive Critic Designs
  • Applications

3
Neural Networks
  • Multilayer Feedforward Neural Networks
  • Universal approximators
  • Backpropagation training algorithm
  • Gradient descent search in the weight space
  • Radial Basis Function Networks
  • Recurrent Neural Networks

4
Dynamic Programming
  • Consider
  • Performance index (or cost)
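
The two equations on this slide are missing from the transcript. A standard discrete-time formulation that agrees with the later references to (1), (2), U, and J[x(i), i] is sketched below. The system function F and the discount factor γ are assumed notation and were not recovered from the slides.

```latex
\begin{align}
x(k+1) &= F[x(k), u(k), k], \qquad k = 0, 1, 2, \dots \tag{1} \\
J[x(i), i] &= \sum_{k=i}^{\infty} \gamma^{\,k-i}\, U[x(k), u(k), k], \qquad 0 < \gamma \le 1 \tag{2}
\end{align}
```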

5
Dynamic Programming
  • The utility function U indicates the performance
    of the overall system.
  • J[x(i), i] is the cost-to-go from the initial state
    x(i).
  • The objective is to choose the control sequence u(k),
    k = i, i+1, ..., to minimize the function J in (2).

6
Bellman's Principle of Optimality
  • An optimal control policy has the property
    that no matter what previous decisions have been,
    the remaining decisions must constitute an
    optimal policy with regard to the state resulting
    from those previous decisions.

7
Dynamic Programming
8
Dynamic Programming
  • If one applies an arbitrary control u(t) at time
    t and then uses the known optimal control
    sequence from t+1 on, the resulting cost will be
  • where x(t) is the state at time t and x(t+1)
    is determined by (1).
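
The cost expression referred to on this slide is missing from the transcript. Assuming a discounted cost with factor γ and writing J* for the optimal cost-to-go (both are assumed notation, not taken from the slides), it would plausibly read:

```latex
\begin{equation*}
J[x(t), t] = U[x(t), u(t), t] + \gamma\, J^{*}[x(t+1), t+1]
\end{equation*}
```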

9
Dynamic Programming
  • According to Bellman's principle of optimality,
    the optimal cost from time t on is equal to
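
The right-hand side of this statement is missing from the transcript. In the usual notation (J* for the optimal cost and γ for a discount factor, both assumed here), Bellman's equation and the corresponding optimal control can be written as:

```latex
\begin{align*}
J^{*}[x(t), t] &= \min_{u(t)} \Bigl( U[x(t), u(t), t] + \gamma\, J^{*}[x(t+1), t+1] \Bigr) \\
u^{*}(t) &= \arg\min_{u(t)} \Bigl( U[x(t), u(t), t] + \gamma\, J^{*}[x(t+1), t+1] \Bigr)
\end{align*}
```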

10
Dynamic Programming
  • Principle of optimality
  • It can be applied to nonlinear systems with
    constraints on the control and state variables.
  • Allows us to optimize over only one control
    vector at a time by working backward in time.
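
The backward recursion can be illustrated with a minimal sketch on a hypothetical finite-horizon problem. The state grid, control set, stage cost, and horizon below are all illustrative assumptions and do not come from the slides. The point is that each stage optimizes over a single control, with the control constraint handled simply by restricting the candidate set.

```python
# Backward dynamic programming on a toy finite-horizon problem.
# The problem data and all names are illustrative assumptions.

STATES = range(-3, 4)        # discrete state grid
CONTROLS = (-1, 0, 1)        # constrained control set
HORIZON = 5                  # number of decision stages
GAMMA = 1.0                  # discount factor (no discounting here)


def step(x, u):
    """System equation: next state, clipped so it stays on the grid."""
    return max(-3, min(3, x + u))


def utility(x, u):
    """Stage cost U(x, u): penalize distance from 0 and control effort."""
    return x * x + abs(u)


def backward_dp():
    # Terminal condition: the cost-to-go after the last stage is zero.
    cost = {x: 0.0 for x in STATES}
    policy = []
    for _ in range(HORIZON):
        new_cost, stage_policy = {}, {}
        for x in STATES:
            # Optimize over one control only, given the cost-to-go of the
            # stage that follows (already computed).
            best_u = min(
                CONTROLS,
                key=lambda u: utility(x, u) + GAMMA * cost[step(x, u)],
            )
            stage_policy[x] = best_u
            new_cost[x] = utility(x, best_u) + GAMMA * cost[step(x, best_u)]
        cost = new_cost
        policy.insert(0, stage_policy)   # earlier stages go to the front
    return cost, policy


cost_to_go, policy = backward_dp()
```

Here `cost_to_go[x]` is the optimal cost from state `x` at the first stage, and `policy[t][x]` is the control chosen at stage `t`.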

11
Curse of Dimensionality
  • Dynamic programming is applicable to problems
    that minimize/maximize a cost.
  • Applicable to many engineering problems.
  • Backward numerical process.
  • Backward in time.
  • Unknown function J.
  • Computational complexity increases exponentially
    with the number of variables.
  • Suitable only for small problems in practice.
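
To make the exponential growth concrete, suppose each variable is discretized into a fixed number of levels (10 here, an arbitrary choice). The table that backward dynamic programming has to fill then grows as follows. The function name is hypothetical.

```python
def grid_points(levels_per_variable, num_variables):
    """Number of table entries when every variable is discretized."""
    return levels_per_variable ** num_variables


for n in (1, 2, 4, 8):
    print(n, "variables:", grid_points(10, n), "grid points")
```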

12
Adaptive Dynamic Programming
  • Approximate dynamic programming.
  • HDP: Heuristic dynamic programming.
  • DHP: Dual heuristic dynamic programming.
  • GDHP: Globalized DHP.
  • AD: Action-dependent.
  • Use a neural network to approximate the cost
    function J.
  • Critic network.

13
Adaptive Critic Designs (ADHDP)
The two modules in a typical action-dependent
ACD.
14
The Critic Network
  • Its output will approximate the cost function J.
  • The function J is unknown.
  • How to train such a network?
  • The target values are not known!

15
The Critic Network
  • Define an error function

16
The Critic Network
17
The Critic Network
  • When minimizing the square of the error function
    in (3),
  • we have a neural network trained so that its
    output becomes an estimate of the cost function
    defined in (2) for i = t+1.
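
The error function (3) itself is missing from the transcript. One form that is consistent with the statement above, given here as an assumption, uses the critic output Q(t), the utility U(t), and a discount factor γ. If the error is driven to zero for all t, unrolling the recursion (and assuming the term γ^n Q(t+n) vanishes as n grows) gives a discounted sum of future utilities starting at time t+1, which is the discounted cost evaluated at i = t+1.

```latex
\begin{align*}
E &= \tfrac{1}{2} \sum_{t} \bigl[\, Q(t-1) - U(t) - \gamma\, Q(t) \,\bigr]^{2} \\
E = 0 \;\Longrightarrow\; Q(t) &= U(t+1) + \gamma\, Q(t+1)
  = \sum_{k=t+1}^{\infty} \gamma^{\,k-t-1}\, U(k)
\end{align*}
```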

18
Critic Network Training
19
Critic Network Training
20
Action Network Training
21
Adaptive Critic Designs
  • The critic network is trained by minimizing
  • The action network is trained by minimizing
  • Approximate Dynamic Programming (ADP)
  • Adaptive Dynamic Programming
  • Asymptotic Dynamic Programming
  • Neurodynamic Programming (NDP)
  • Adaptive Critic Designs (ACD)

22
Adaptive Critic Designs
23
Comparison with Conventional Approach

[Figure: block diagram; surviving labels are NN 1, NN 2, and C]

24
Reinforcement Learning
  • Q-learning is very close to ADHDP.
  • TD learning is a special case of HDP.
  • Look-up tables are used in RL.
  • Finite or discrete set of states.
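
As a point of reference for the look-up-table approach, here is a minimal tabular Q-learning sketch on a hypothetical five-state chain. The environment, reward, and hyperparameters are illustrative assumptions. The sketch maximizes reward, which is the usual RL convention, whereas the slides are phrased in terms of minimizing cost. In ADHDP the table `q` would be replaced by a critic network that takes the state and action as inputs.

```python
import random

N_STATES = 5                 # states 0..4; state 4 is terminal
ACTIONS = (-1, +1)           # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2


def env_step(state, action):
    """Deterministic chain: reward 1 on reaching the last state."""
    nxt = max(0, min(N_STATES - 1, state + action))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # look-up table Q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                a = rng.randrange(2)             # explore
            else:
                best = max(q[state])             # exploit, random tie-break
                a = rng.choice([i for i in (0, 1) if q[state][i] == best])
            nxt, reward, done = env_step(state, ACTIONS[a])
            # Temporal-difference target: one-step reward plus the
            # discounted best value of the next state.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
    return q


q_table = train()
greedy = [max((0, 1), key=lambda i: q_table[s][i]) for s in range(N_STATES - 1)]
```

After training, `greedy` holds the index of the preferred action in each non-terminal state.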

25
Applications
  • Aircraft autolander
  • NSF, NASA.
  • Spacecraft attitude control.
  • Ship autosteering.
  • Traffic control in communication networks (CDMA
    wireless networks).
  • NSF.

26
The Pole Balancing Problem
27
The Pole Balancing Problem
28
The Pole Balancing Problem
29
The Pole Balancing Problem
  • The critic network is 5-7-1.
  • Hidden layer: tanh
  • Output layer: linear
  • The action network is 4-6-1.
  • Hidden layer: tanh
  • Output layer: tanh
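
The layer sizes and activations above can be sketched as a plain forward pass. Several things here are assumptions rather than content from the slides: the critic's five inputs are taken to be the four state variables plus the one control output (in line with an action-dependent design), biases are omitted, the weights are random rather than trained, and the example state vector is hypothetical.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weight matrix with n_out rows and n_in columns (no biases)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def linear(v):
    return v


def forward(layers, activations, inputs):
    """Propagate the inputs through each layer and its activation."""
    signal = list(inputs)
    for weights, act in zip(layers, activations):
        signal = [act(sum(w * s for w, s in zip(row, signal))) for row in weights]
    return signal


rng = random.Random(0)
action_net = [make_layer(4, 6, rng), make_layer(6, 1, rng)]   # 4-6-1
critic_net = [make_layer(5, 7, rng), make_layer(7, 1, rng)]   # 5-7-1

state = [0.1, 0.0, -0.05, 0.0]   # hypothetical four-variable state
# Action network: tanh hidden layer, tanh output, so the control is bounded.
u = forward(action_net, (math.tanh, math.tanh), state)
# Critic network: tanh hidden layer, linear output, fed with state and control.
q = forward(critic_net, (math.tanh, linear), state + u)
```

The tanh output layer keeps the control in (-1, 1), while the linear critic output leaves the cost estimate unbounded.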

30
Simulation Results
  • Initial states
  • Results
  • 26 trials on average to balance the pole
  • Neural network training uses the simple gradient
    approach

31
The Pole Balancing Problem
32
Concluding Remarks
  • Adaptive critic design is a very robust learning
    control approach.
  • Theoretical development was started by Werbos
    in 1977.
  • Practical applications with significant economic
    impact are expected in the next few years.

33
Where to Get More Information
  • dliu@ece.uic.edu
  • http://liu.ece.uic.edu
  • (312) 355-4475