Title: Choosing a New Tool for Adaptive Control Strategy Design


1
Choosing a New Tool for Adaptive Control Strategy Design
University St. Kliment Ohridski, Faculty of
Technical Sciences, Department of Traffic/Transport,
Bitola, Republic of Macedonia
  • Supervisor: Prof. Kristi Bombol, PhD
  • PhD Student: Kostandina Veljanovska, MSc

2
Overview of the presentation
  • The Idea/Motivation
  • Research Issues & Objectives
  • State-of-the-Art Control Strategies
  • Problem Identification
  • Why Use AI Techniques?
  • New Tool Proposal
  • Discussion/Conclusions

3
Motivation
  • Proper motorway entry access control can decrease
    the total travel time spent in the system by up
    to 30% and increase the safety of the merging
    operation
  • The majority of implemented traffic-responsive
    motorway entry access systems are of the local,
    regulatory type, and are not truly adaptive

4
Research Objective
  • To develop an adaptive closed-loop optimal
    control strategy for multiple motorway entry
    access using the AI technique known as
    reinforcement learning (RL)

5
Motorway Entry Access (MEA) Control
  • Aim: limiting access to the motorway mainstream
    so as to achieve and maintain capacity flow and
    avoid or reduce congestion
  • Affects the drivers' route-choice behavior and
    may be employed as a dynamic assignment tool to
    encourage use of corridor networks

6
Categories of MEA Control
  • Fixed Time Control
    - mainly off-line
    - not responsive to traffic dynamics
  • Traffic Responsive Metering
    - directly influenced by the mainline and entry
      access traffic conditions
    - Regulator Approach
      - SISO Regulator (e.g. ALINEA)
      - MIMO Regulator (e.g. METALINE)
    - Optimal Control (e.g. AMOC)
  • Integrated Systems Control
    - multiple motorway entry access control, signal
      timing, VMS

7
Reinforcement Learning (RL)
  • Machine learning technique without supervision
  • Goal-directed learning from interaction with an
    environment
  • Learns how to map situations to actions in order
    to maximize a numerical reward signal
  • Rooted in the empirical law of effect from animal
    learning
  • Applied when there is incomplete knowledge of the
    environment and learning is sequential
  • The most important features: trial-and-error
    search and delayed reward
  • Dualistic character of decision making

8
Elements of RL
  • An Agent
  • A Policy
  • A Reward Function
  • A Value Function
  • A Model of the Environment

9
Elements of RL
  • Agent
    - the decision maker
    - hardware or software
    - capable of observing the state of the
      environment and performing actions to alter the
      current state of the environment
  • Model of the Environment
    - the environment is everything the agent
      interacts with
    - a model is not necessary for RL
    - a simulated environment eases later
      implementation in the field

10
Elements of RL
  • Policy
    - the learning agent's way of behaving at a given
      time
  • Reward Function
    - the goal in the reinforcement learning problem
    - maps each perceived state (or state-action
      pair) of the environment to a single number
      indicating the desirability of that state
  • Value Function
    - what is good in the long run
    - indicates the long-term desirability of states,
      taking into account the states that are likely
      to follow and the rewards available in those
      states (see the definition below)
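
In standard notation (a textbook definition, not shown on the slide), the value of a state s under policy π is the expected discounted sum of future rewards, with discount factor γ:

```latex
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \;\middle|\; s_{0}=s \right],
\qquad 0 \le \gamma < 1
```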

11
RL Framework
  • The agent receives some representation of the
    environment's state and on that basis selects an
    action; one step later it receives a reward
  • The aim is to optimize a long-term performance
    measure (e.g. cumulative reward); the loop is
    sketched below
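
A minimal sketch of this interaction loop, assuming a generic environment with reset/step methods and an agent with select_action/update methods (all names are illustrative, not from the presentation):

```python
def run_episode(env, agent, max_steps=100):
    """One episode of the agent-environment interaction loop."""
    state = env.reset()                      # initial state representation
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.select_action(state)  # policy maps state -> action
        next_state, reward, done = env.step(action)
        agent.update(state, action, reward, next_state)  # learn from feedback
        total_reward += reward               # cumulative long-term measure
        state = next_state
        if done:
            break
    return total_reward
```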



12
Q-learning
  • Originates from the concepts and principles of
    dynamic programming (DP)
  • Integrates planning and learning
  • Markov property
    - the probability distributions of the reward and
      of the transition from one state to another
      depend only on the current state s and action
      a, not on previous states or actions
  • Handles non-deterministic MDPs

13
Q-Learning Algorithm
  • The learning rule accommodates the
    non-deterministic environment (see below)
  • Maintains an estimate of the Q-value
  • After an infinite number of visits to each
    state-action pair, the estimated Q-value
    converges to its true value Q*
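
The learning rule referred to here is, in one standard formulation for non-deterministic MDPs (assumed, since the slide's formula is not reproduced in the text), a Q-learning update whose learning rate decays with the visit count:

```latex
\hat{Q}_{n}(s,a) \;=\; (1-\alpha_{n})\,\hat{Q}_{n-1}(s,a)
  \;+\; \alpha_{n}\Big[\, r \;+\; \gamma \max_{a'} \hat{Q}_{n-1}(s',a') \,\Big],
\qquad \alpha_{n} \;=\; \frac{1}{1 + \mathit{visits}_{n}(s,a)}
```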

14
Q-Learning Example
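
To make the mechanics concrete, here is a small tabular Q-learning run on a toy two-state problem (all states, rewards, and parameters are illustrative, not taken from the presentation):

```python
import random

# Toy deterministic MDP: two states, two actions. Action 1 in state 0
# moves to state 1 (reward 0); action 1 in state 1 reaches the goal and
# returns to state 0 with reward 10; any other action keeps the state.
def step(state, action):
    if state == 0 and action == 1:
        return 1, 0.0
    if state == 1 and action == 1:
        return 0, 10.0
    return state, 0.0

alpha, gamma = 0.5, 0.9
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

state = 0
for _ in range(500):
    action = random.choice((0, 1))   # pure exploration, for the demo
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in (0, 1))
    # standard tabular Q-learning update
    Q[(state, action)] += alpha * (reward + gamma * best_next
                                   - Q[(state, action)])
    state = next_state

# Q[(1, 1)] approaches its fixed point 10 / (1 - gamma**2) ~= 52.6
print(Q)
```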
15
New Control Tool: Why Use It
  • Model of the environment not required
  • Learns relations between states, actions and
    rewards
  • Truly adaptive
    - capable of responding not only to dynamic
      sensory inputs from the environment, but also
      to a dynamically changing environment, through
      ongoing learning and adaptation
    - the control policy changes itself in response
      to changes in inherent system characteristics
  • Supervision is not required
  • Learns through dynamic trial-and-error
    exploration of alternative actions and
    observation of the relative rewards received
  • Exploits actions found to perform well

16
Control Tool Description
  • Tools used: the VISSIM traffic micro-simulator by
    PTV Vision, and VBA
  • Simulation time: peak hour
  • Network: one segment of a motorway with three
    lanes in each direction and three motorway
    entries
  • Detectors

17
RL agent
  • Signal
    - only two phases: green and red
    - the green phase allows one car per cycle; the
      red phase duration is changeable in order to
      vary the metering rate (see the sketch after
      this list)
    - default signal timing was created using VISSIM
    - the signal is controlled by an agent that takes
      an action every 5 min
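
With exactly one car released per cycle, the metering rate follows directly from the cycle length; a minimal sketch of that relation (the durations are illustrative, not from the presentation):

```python
def metering_rate(green_s: float, red_s: float) -> float:
    """Vehicles per hour when exactly one car is released per cycle."""
    cycle_s = green_s + red_s
    return 3600.0 / cycle_s

# Lengthening the red phase lowers the metering rate:
print(metering_rate(2.0, 4.0))   # 600.0 veh/h
print(metering_rate(2.0, 10.0))  # 300.0 veh/h
```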

18
RL agent
  • Action selection: ε-greedy policy
    - the best action is exploited with probability
      1 − ε
    - an exploratory action is chosen randomly with
      probability ε (sketched below)
  • Reward: total travel time experienced by all the
    vehicles
  • Effectiveness of the algorithm: assessed by
    several performance measures
    - the most important is the travel time spent by
      all the vehicles in the network
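
A minimal sketch of the ε-greedy rule just described (the Q-table keyed by state-action pairs is an assumed interface, not from the presentation):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Exploit the best-known action with probability 1 - epsilon;
    explore a random action with probability epsilon."""
    if random.random() < epsilon:
        return random.choice(actions)                          # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))  # exploit
```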

19
Discussion
  • The savings in total travel time spent in the
    system are expected to increase with the number
    of iterations, as the Q-learning algorithm
    converges to the true optimal Q-values
  • Communication among agents can expand each
    agent's perceptual horizon and enable
    cooperation towards a globally optimal policy
  • But excessive information can increase the
    dimensionality of the problem and the
    computational effort, and reduce robustness

20
Conclusions
  • Research is in its infancy
  • Encouraging preliminary results
  • RL is feasible for optimal adaptive control for
    motorway entry access
  • Q-learning agent can adapt to the changing
    environment of traffic circumstances
  • Multi-agent behavior testing is currently
    underway
  • Next step: comparison with existing
    state-of-the-art strategies

21
Contacts
  • E-mail: kostandina@rocketmail.com
  • kristi.bombol@uklo.edu.mk

22
THANKS FOR YOUR ATTENTION!