1
Application of Reinforcement Learning in Network
Routing
  • By
  • Chaopin Zhu

2
Machine Learning
  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning

3
Supervised Learning
  • Feature
  • Learning with a teacher
  • Phases
  • Training phase
  • Testing phase
  • Application
  • Pattern recognition
  • Function approximation

4
Unsupervised Learning
  • Feature
  • Learning without a teacher
  • Application
  • Feature extraction
  • Other preprocessing

5
Reinforcement Learning
  • Feature
  • Learning with a critic
  • Application
  • Optimization
  • Function approximation

6
Elements of Reinforcement Learning
  • Agent
  • Environment
  • Policy
  • Reward function
  • Value function
  • Model of environment (optional)

7
Reinforcement Learning Problem
8
Markov Decision Process (MDP)
  • Definition
  • A reinforcement learning task that satisfies
    the Markov property
  • Transition probabilities
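The formulas for this slide appear to have been lost in extraction. A standard statement of the Markov property and the transition probabilities, in the notation used elsewhere in the deck (states x, actions a):

```latex
% Markov property: the next state and reward depend only on the current state and action
\Pr\{x_{t+1}=x',\, r_{t+1}=r \mid x_t, a_t, x_{t-1}, a_{t-1}, \ldots, x_0, a_0\}
  = \Pr\{x_{t+1}=x',\, r_{t+1}=r \mid x_t, a_t\}

% Transition probabilities and expected rewards
P^{a}_{xx'} = \Pr\{x_{t+1}=x' \mid x_t = x,\, a_t = a\}, \qquad
R^{a}_{xx'} = E\{r_{t+1} \mid x_t = x,\, a_t = a,\, x_{t+1} = x'\}
```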

9
An Example of MDP
10
Markov Decision Process (cont.)
  • Parameters
  • Value functions

11
Elementary Methods for Reinforcement Learning
Problem
  • Dynamic programming
  • Monte Carlo Methods
  • Temporal-Difference Learning

12
Bellman's Equations
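The equations on this slide appear to have been lost in extraction. The standard Bellman equations for the state-value and action-value functions under a policy π, in the deck's notation, are:

```latex
V^{\pi}(x) = \sum_{a} \pi(x,a) \sum_{x'} P^{a}_{xx'}
             \left[ R^{a}_{xx'} + \gamma\, V^{\pi}(x') \right]

Q^{\pi}(x,a) = \sum_{x'} P^{a}_{xx'}
             \left[ R^{a}_{xx'} + \gamma \sum_{a'} \pi(x',a')\, Q^{\pi}(x',a') \right]
```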
13
Dynamic Programming Methods
  • Policy evaluation
  • Policy improvement

14
Dynamic Programming (cont.)
  • E ---- policy evaluation
  • I ---- policy improvement
  • Policy Iteration
  • Value Iteration

15
Monte Carlo Methods
  • Feature
  • Learning from experience
  • Do not need complete transition probabilities
  • Idea
  • Partition experience into episodes
  • Average sample return
  • Update on an episode-by-episode basis
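A minimal first-visit Monte Carlo sketch of these ideas in Python. The five-state random-walk environment, start state, and reward scheme are illustrative assumptions, not from the slides:

```python
import random

# First-visit Monte Carlo evaluation of a random policy on a toy 1-D walk.
# States 0..4; episodes start in state 2; states 0 and 4 are terminal and
# reaching state 4 yields reward 1 (all other rewards are 0).

def generate_episode():
    """Follow the random policy; return a list of (state, reward) steps."""
    x, steps = 2, []
    while x not in (0, 4):
        x_next = x + random.choice((-1, 1))
        r = 1.0 if x_next == 4 else 0.0
        steps.append((x, r))
        x = x_next
    return steps

def mc_evaluate(num_episodes=5000, gamma=1.0):
    """Average first-visit sample returns, updating episode by episode."""
    returns = {x: [] for x in (1, 2, 3)}
    for _ in range(num_episodes):
        steps = generate_episode()
        first_visit = {}                  # state -> index of first occurrence
        for t, (x, _) in enumerate(steps):
            first_visit.setdefault(x, t)
        G = 0.0
        for t in reversed(range(len(steps))):
            x, r = steps[t]
            G = gamma * G + r             # sample return from step t onward
            if first_visit[x] == t:       # record only the first visit
                returns[x].append(G)
    return {x: sum(g) / len(g) for x, g in returns.items() if g}
```

For this symmetric walk the true values are 0.25, 0.5, and 0.75 for states 1, 2, and 3, and the sample averages approach them with no knowledge of the transition probabilities.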

16
Temporal-Difference Learning
  • Features
  • (Combination of Monte Carlo and DP ideas)
  • Learn from experience (Monte Carlo)
  • Update estimates based in part on other learned
    estimates (DP)
  • TD(λ) algorithm seamlessly integrates TD and
    Monte Carlo methods

17
TD(0) Learning
  • Initialize V(x) arbitrarily, and set π to the
    policy to be evaluated
  • Repeat (for each episode)
  • Initialize x
  • Repeat (for each step of episode)
  • a ← action given by π for x
  • Take action a; observe reward r and next state x′
  • V(x) ← V(x) + α[r + γV(x′) − V(x)]
  • x ← x′
  • until x is terminal
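The loop above can be sketched in Python on a toy five-state random walk (the environment, start state, reward, and step size α are illustrative assumptions, not from the slides):

```python
import random

# TD(0) evaluation of a fixed random policy on a toy 1-D walk.
# States 0..4; states 0 and 4 are terminal; reaching state 4 yields reward 1.

def td0(num_episodes=5000, alpha=0.1, gamma=1.0):
    V = {x: 0.0 for x in range(5)}        # initialize V(x) arbitrarily (zeros)
    for _ in range(num_episodes):         # repeat for each episode
        x = 2                             # initialize x
        while x not in (0, 4):            # repeat for each step of the episode
            x_next = x + random.choice((-1, 1))   # action given by the policy
            r = 1.0 if x_next == 4 else 0.0       # observe reward r and state x'
            # TD(0) update: V(x) <- V(x) + alpha [ r + gamma V(x') - V(x) ]
            V[x] += alpha * (r + gamma * V[x_next] - V[x])
            x = x_next
    return V
```

Unlike Monte Carlo, each V(x) is updated within the episode from the reward plus the current estimate V(x′), which is the "bootstrapping" idea borrowed from DP.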

18
Q-Learning
  • Initialize Q(x,a) arbitrarily
  • Repeat (for each episode)
  • Initialize x
  • Repeat (for each step of episode)
  • Choose a from x using a policy derived from Q
    (e.g., ε-greedy)
  • Take action a; observe r and next state x′
  • Q(x,a) ← Q(x,a) + α[r + γ max_a′ Q(x′,a′) − Q(x,a)]
  • x ← x′
  • until x is terminal
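A runnable sketch of the same loop in Python, again on an assumed toy chain (states, actions, rewards, and hyperparameters are illustrative, not from the slides):

```python
import random

# Q-learning on a toy chain: states 0..4, actions -1 (left) and +1 (right),
# reward 1 for reaching state 4, 0 otherwise; epsilon-greedy exploration.

def q_learning(num_episodes=2000, alpha=0.1, gamma=0.9, eps=0.1):
    actions = (-1, +1)
    Q = {(x, a): 0.0 for x in range(5) for a in actions}  # init Q arbitrarily
    for _ in range(num_episodes):
        x = 2
        while x not in (0, 4):
            # choose a from x using an epsilon-greedy policy derived from Q
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(x, a_)])
            x_next = x + a                      # take action a, observe r and x'
            r = 1.0 if x_next == 4 else 0.0
            best_next = 0.0 if x_next in (0, 4) else \
                max(Q[(x_next, a_)] for a_ in actions)
            # Q(x,a) <- Q(x,a) + alpha [ r + gamma max_a' Q(x',a') - Q(x,a) ]
            Q[(x, a)] += alpha * (r + gamma * best_next - Q[(x, a)])
            x = x_next
    return Q
```

Because the update uses max over next actions rather than the action actually taken, Q-learning learns the optimal action values regardless of the exploration policy.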

19
Q-Routing
  • Q_x(y,d) ---- estimated time that a packet would
    take to reach the destination node d from current
    node x via x's neighbor node y
  • T_y(d) ------ y's estimate for the time remaining
    in the trip
  • q_y --------- queuing time in node y
  • T_xy -------- transmission time between x and y

20
Algorithm of Q-Routing
  1. Set initial Q-values for each node
  2. Get the first packet from the packet queue of
     node x
  3. Choose the best neighbor node y (the one with the
     smallest Q_x(y,d)) and forward the packet to
     node y
  4. Get the estimated value T_y(d) from node y
  5. Update Q_x(y,d)
  6. Go to step 2.
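The neighbor choice and update step above can be sketched as follows, using the slide's notation. The update rule shown in the comment is the standard Q-routing rule (Boyan and Littman), which nudges Q_x(y,d) toward q_y + T_xy + T_y(d); the nested-dict table layout and the learning rate η are illustrative assumptions:

```python
# Q is a nested dict: Q[node][neighbor][dest] = estimated delivery time.

def best_neighbor(Q, x, d):
    """Choose the neighbor y minimizing Q_x(y, d)."""
    return min(Q[x], key=lambda y: Q[x][y][d])

def q_routing_update(Q, x, y, d, q_y, T_xy, eta=0.5):
    """After forwarding to y, fold y's estimate back into Q_x(y, d)."""
    # T_y(d) is y's best estimate of the time remaining in the trip.
    T_y_d = 0.0 if y == d else min(Q[y][z][d] for z in Q[y])
    # Q_x(y,d) <- Q_x(y,d) + eta * (q_y + T_xy + T_y(d) - Q_x(y,d))
    Q[x][y][d] += eta * (q_y + T_xy + T_y_d - Q[x][y][d])
```

Each node thus learns delivery-time estimates from its neighbors' estimates alone, with no global view of the network.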

21
Dual Reinforcement Q-Routing
22
Network Model
23
Network Model (cont.)
24
Node Model
25
Routing Controller
26
Initialization / Termination Procedures
  • Initialization
  • Initialize and/or register global variables
  • Initialize routing table
  • Termination
  • Destroy routing table
  • Release memory

27
Arrival Procedure
  • Data packet arrival
  • Update routing table
  • Forward it, together with control information, or
    destroy the packet if it has reached its destination
  • Control information packet arrival
  • Update routing table
  • Destroy the packet

28
Departure Procedure
  • Set all fields of the packet
  • Get the shortest route
  • Send the packet according to the route

29
References
  • [1] Richard S. Sutton and Andrew G. Barto,
    Reinforcement Learning: An Introduction
  • [2] Chengan Guo, Applications of Reinforcement
    Learning in Sequence Detection and Network
    Routing
  • [3] Simon Haykin, Neural Networks: A
    Comprehensive Foundation