1
Application of Reinforcement Learning in Network
Routing
  • By
  • Chaopin Zhu

2
Machine Learning
  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning

3
Supervised Learning
  • Feature
  • Learning with a teacher
  • Phases
  • Training phase
  • Testing phase
  • Application
  • Pattern recognition
  • Function approximation

4
Unsupervised Learning
  • Feature
  • Learning without a teacher
  • Application
  • Feature extraction
  • Other preprocessing

5
Reinforcement Learning
  • Feature
  • Learning with a critic
  • Application
  • Optimization
  • Function approximation

6
Elements of Reinforcement Learning
  • Agent
  • Environment
  • Policy
  • Reward function
  • Value function
  • Model of environment (optional)

7
Reinforcement Learning Problem
8
Markov Decision Process (MDP)
  • Definition
  • A reinforcement learning task that satisfies
    the Markov property
  • Transition probabilities
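The formulas for this slide appear to have been lost in extraction. A standard statement of the Markov property and the transition probabilities, in the notation used elsewhere in the deck (states x, actions a):

```latex
% Markov property: the next state and reward depend only on the current state and action
\Pr\{x_{t+1}=x',\, r_{t+1}=r \mid x_t, a_t, x_{t-1}, a_{t-1}, \ldots, x_0, a_0\}
  = \Pr\{x_{t+1}=x',\, r_{t+1}=r \mid x_t, a_t\}

% Transition probabilities and expected rewards
P^{a}_{xx'} = \Pr\{x_{t+1}=x' \mid x_t = x,\, a_t = a\}, \qquad
R^{a}_{xx'} = E\{r_{t+1} \mid x_t = x,\, a_t = a,\, x_{t+1} = x'\}
```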

9
An Example of MDP
10
Markov Decision Process (cont.)
  • Parameters
  • Value functions

11
Elementary Methods for Reinforcement Learning
Problem
  • Dynamic programming
  • Monte Carlo Methods
  • Temporal-Difference Learning

12
Bellman's Equations
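The equations on this slide appear to have been lost in extraction. The standard Bellman equations for the state-value and action-value functions under a policy π, in the deck's notation, are:

```latex
V^{\pi}(x) = \sum_{a} \pi(x,a) \sum_{x'} P^{a}_{xx'}
             \left[ R^{a}_{xx'} + \gamma\, V^{\pi}(x') \right]

Q^{\pi}(x,a) = \sum_{x'} P^{a}_{xx'}
             \left[ R^{a}_{xx'} + \gamma \sum_{a'} \pi(x',a')\, Q^{\pi}(x',a') \right]
```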
13
Dynamic Programming Methods
  • Policy evaluation
  • Policy improvement

14
Dynamic Programming (cont.)
  • E ---- policy evaluation
  • I ---- policy improvement
  • Policy Iteration
  • Value Iteration

15
Monte Carlo Methods
  • Feature
  • Learning from experience
  • Do not need complete transition probabilities
  • Idea
  • Partition experience into episodes
  • Average sample return
  • Update on an episode-by-episode basis
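A minimal first-visit Monte Carlo sketch of these ideas in Python. The five-state random-walk environment, start state, and reward scheme are illustrative assumptions, not from the slides:

```python
import random

# First-visit Monte Carlo evaluation of a random policy on a toy 1-D walk.
# States 0..4; episodes start in state 2; states 0 and 4 are terminal and
# reaching state 4 yields reward 1 (all other rewards are 0).

def generate_episode():
    """Follow the random policy; return a list of (state, reward) steps."""
    x, steps = 2, []
    while x not in (0, 4):
        x_next = x + random.choice((-1, 1))
        r = 1.0 if x_next == 4 else 0.0
        steps.append((x, r))
        x = x_next
    return steps

def mc_evaluate(num_episodes=5000, gamma=1.0):
    """Average first-visit sample returns, updating episode by episode."""
    returns = {x: [] for x in (1, 2, 3)}
    for _ in range(num_episodes):
        steps = generate_episode()
        first_visit = {}                  # state -> index of first occurrence
        for t, (x, _) in enumerate(steps):
            first_visit.setdefault(x, t)
        G = 0.0
        for t in reversed(range(len(steps))):
            x, r = steps[t]
            G = gamma * G + r             # sample return from step t onward
            if first_visit[x] == t:       # record only the first visit
                returns[x].append(G)
    return {x: sum(g) / len(g) for x, g in returns.items() if g}
```

For this symmetric walk the true values are 0.25, 0.5, and 0.75 for states 1, 2, and 3, and the sample averages approach them with no knowledge of the transition probabilities.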

16
Temporal-Difference Learning
  • Features
  • (Combination of Monte Carlo and DP ideas)
  • Learn from experience (Monte Carlo)
  • Update estimates based in part on other learned
    estimates (DP)
  • TD(λ) algorithm seamlessly integrates TD and
    Monte Carlo methods

17
TD(0) Learning
  • Initialize V(x) arbitrarily, and set π to the
    policy to be evaluated
  • Repeat (for each episode)
  • Initialize x
  • Repeat (for each step of episode)
  • a ← action given by π for x
  • Take action a; observe reward r and next state x′
  • V(x) ← V(x) + α[r + γV(x′) − V(x)]
  • x ← x′
  • until x is terminal
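The loop above can be sketched in Python on a toy five-state random walk (the environment, start state, reward, and step size α are illustrative assumptions, not from the slides):

```python
import random

# TD(0) evaluation of a fixed random policy on a toy 1-D walk.
# States 0..4; states 0 and 4 are terminal; reaching state 4 yields reward 1.

def td0(num_episodes=5000, alpha=0.1, gamma=1.0):
    V = {x: 0.0 for x in range(5)}        # initialize V(x) arbitrarily (zeros)
    for _ in range(num_episodes):         # repeat for each episode
        x = 2                             # initialize x
        while x not in (0, 4):            # repeat for each step of the episode
            x_next = x + random.choice((-1, 1))   # action given by the policy
            r = 1.0 if x_next == 4 else 0.0       # observe reward r and state x'
            # TD(0) update: V(x) <- V(x) + alpha [ r + gamma V(x') - V(x) ]
            V[x] += alpha * (r + gamma * V[x_next] - V[x])
            x = x_next
    return V
```

Unlike Monte Carlo, each V(x) is updated within the episode from the reward plus the current estimate V(x′), which is the "bootstrapping" idea borrowed from DP.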

18
Q-Learning
  • Initialize Q(x,a) arbitrarily
  • Repeat (for each episode)
  • Initialize x
  • Repeat (for each step of episode)
  • Choose a from x using a policy derived from Q
    (e.g., ε-greedy)
  • Take action a; observe r and next state x′
  • Q(x,a) ← Q(x,a) + α[r + γ max_a′ Q(x′,a′) − Q(x,a)]
  • x ← x′
  • until x is terminal
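A runnable sketch of the same loop in Python, again on an assumed toy chain (states, actions, rewards, and hyperparameters are illustrative, not from the slides):

```python
import random

# Q-learning on a toy chain: states 0..4, actions -1 (left) and +1 (right),
# reward 1 for reaching state 4, 0 otherwise; epsilon-greedy exploration.

def q_learning(num_episodes=2000, alpha=0.1, gamma=0.9, eps=0.1):
    actions = (-1, +1)
    Q = {(x, a): 0.0 for x in range(5) for a in actions}  # init Q arbitrarily
    for _ in range(num_episodes):
        x = 2
        while x not in (0, 4):
            # choose a from x using an epsilon-greedy policy derived from Q
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(x, a_)])
            x_next = x + a                      # take action a, observe r and x'
            r = 1.0 if x_next == 4 else 0.0
            best_next = 0.0 if x_next in (0, 4) else \
                max(Q[(x_next, a_)] for a_ in actions)
            # Q(x,a) <- Q(x,a) + alpha [ r + gamma max_a' Q(x',a') - Q(x,a) ]
            Q[(x, a)] += alpha * (r + gamma * best_next - Q[(x, a)])
            x = x_next
    return Q
```

Because the update uses max over next actions rather than the action actually taken, Q-learning learns the optimal action values regardless of the exploration policy.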

19
Q-Routing
  • Q_x(y,d) ---- estimated time that a packet would
    take to reach the destination node d from current
    node x via x's neighbor node y
  • T_y(d) ------ y's estimate for the time remaining
    in the trip
  • q_y --------- queuing time in node y
  • T_xy -------- transmission time between x and y

20
Algorithm of Q-Routing
  1. Set initial Q-values for each node
  2. Get the first packet from the packet queue of
     node x
  3. Choose the best neighbor node y (the one with the
     smallest Q_x(y,d)) and forward the packet to
     node y
  4. Get the estimated value T_y(d) from node y
  5. Update Q_x(y,d)
  6. Go to step 2.
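The neighbor choice and update step above can be sketched as follows, using the slide's notation. The update rule shown in the comment is the standard Q-routing rule (Boyan and Littman), which nudges Q_x(y,d) toward q_y + T_xy + T_y(d); the nested-dict table layout and the learning rate η are illustrative assumptions:

```python
# Q is a nested dict: Q[node][neighbor][dest] = estimated delivery time.

def best_neighbor(Q, x, d):
    """Choose the neighbor y minimizing Q_x(y, d)."""
    return min(Q[x], key=lambda y: Q[x][y][d])

def q_routing_update(Q, x, y, d, q_y, T_xy, eta=0.5):
    """After forwarding to y, fold y's estimate back into Q_x(y, d)."""
    # T_y(d) is y's best estimate of the time remaining in the trip.
    T_y_d = 0.0 if y == d else min(Q[y][z][d] for z in Q[y])
    # Q_x(y,d) <- Q_x(y,d) + eta * (q_y + T_xy + T_y(d) - Q_x(y,d))
    Q[x][y][d] += eta * (q_y + T_xy + T_y_d - Q[x][y][d])
```

Each node thus learns delivery-time estimates from its neighbors' estimates alone, with no global view of the network.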

21
Dual Reinforcement Q-Routing
22
Network Model
23
Network Model (cont.)
24
Node Model
25
Routing Controller
26
Initialization / Termination Procedures
  • Initialization
  • Initialize and/or register global variables
  • Initialize routing table
  • Termination
  • Destroy routing table
  • Release memory

27
Arrival Procedure
  • Data packet arrival
  • Update routing table
  • Forward it, together with control information, or
    destroy the packet if it has reached its destination
  • Control information packet arrival
  • Update routing table
  • Destroy the packet

28
Departure Procedure
  • Set all fields of the packet
  • Get the shortest route
  • Send the packet according to the route

29
References
  • [1] Richard S. Sutton and Andrew G. Barto,
    Reinforcement Learning: An Introduction
  • [2] Chengan Guo, Applications of Reinforcement
    Learning in Sequence Detection and Network
    Routing
  • [3] Simon Haykin, Neural Networks: A
    Comprehensive Foundation