Title: Choosing a New Tool for Adaptive Control Strategy Design


1
Choosing a New Tool for Adaptive Control Strategy Design
University St. Kliment Ohridski, Faculty of
Technical Sciences, Department of Traffic/Transport,
Bitola, Republic of Macedonia
  • Supervisor: Prof. Kristi Bombol, PhD
  • PhD Student: Kostandina Veljanovska, MSc

2
Overview of the presentation
  • The Idea/Motivation
  • Research Issues & Objectives
  • State-of-the-Art Control Strategies
  • Problem Identification
  • Why Use AI Techniques?
  • New Tool Proposal
  • Discussion/Conclusions

3
Motivation
  • Proper motorway entry access control can decrease
    the total travel time spent in the system by up
    to 30% and increase the safety of the merging
    operation
  • The majority of implemented traffic-responsive
    motorway entry access systems are of the local,
    regulatory type, and are not truly adaptive

4
Research Objective
  • To develop an adaptive closed-loop optimal
    control strategy for multiple motorway entry
    access using the AI technique known as
    reinforcement learning (RL)

5
Motorway Entry Access (MEA) Control
  • Aim: limiting access to the motorway mainstream
    so as to achieve and maintain capacity flow and
    avoid or reduce congestion
  • Affects the drivers' route-choice behavior and
    may be employed as a dynamic assignment tool to
    encourage use of corridor networks

6
Categories of MEA Control
  • Fixed Time Control
    - mainly off-line
    - not responsive to traffic dynamics
  • Traffic Responsive Metering
    - directly influenced by the mainline and entry
      access traffic conditions
    - Regulator Approach
      - SISO Regulator (e.g. ALINEA)
      - MIMO Regulator (e.g. METALINE)
    - Optimal Control (e.g. AMOC)
  • Integrated Systems Control
    - multiple motorway entry access control, signal
      timing, VMS

7
Reinforcement Learning (RL)
  • Machine learning technique without supervision
  • Goal-directed learning from interaction with an
    environment
  • Learns how to map situations to actions in order
    to maximize a numerical reward signal
  • Rooted in the empirical law of effect from animal
    learning
  • Applied when there is incomplete knowledge of the
    environment and learning is sequential
  • The most important features: trial-and-error
    search and delayed reward
  • Dualistic character of decision making

8
Elements of RL
  • An Agent
  • A Policy
  • A Reward Function
  • A Value Function
  • A Model of the Environment

9
Elements of RL
  • Agent
    - the decision maker
    - hardware or software
    - capable of observing the state of the
      environment and performing actions to alter the
      current state of the environment
  • Model of the Environment
    - the environment is everything the agent
      interacts with
    - a model is not necessary for RL
    - a simulated environment eases later
      implementation in the field

10
Elements of RL
  • Policy
    - the learning agent's way of behaving at a given
      time
  • Reward Function
    - the goal in the reinforcement learning problem
    - maps each perceived state (or state-action
      pair) of the environment to a single number
      indicating the desirability of that state
  • Value Function
    - what is good in the long run
    - indicates the long-term desirability of states,
      taking into account the states that are likely
      to follow and the rewards available in those
      states (see the definition below)
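
In standard notation (a textbook definition, not shown on the slide), the value of a state s under policy π is the expected discounted sum of future rewards, with discount factor γ:

```latex
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \;\middle|\; s_{0}=s \right],
\qquad 0 \le \gamma < 1
```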

11
RL Framework
  • The agent receives some representation of the
    environment's state and on that basis selects an
    action; one step later it receives a reward
  • The aim is to optimize a long-term performance
    measure (e.g. cumulative reward); the loop is
    sketched below
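
A minimal sketch of this interaction loop, assuming a generic environment with reset/step methods and an agent with select_action/update methods (all names are illustrative, not from the presentation):

```python
def run_episode(env, agent, max_steps=100):
    """One episode of the agent-environment interaction loop."""
    state = env.reset()                      # initial state representation
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.select_action(state)  # policy maps state -> action
        next_state, reward, done = env.step(action)
        agent.update(state, action, reward, next_state)  # learn from feedback
        total_reward += reward               # cumulative long-term measure
        state = next_state
        if done:
            break
    return total_reward
```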



12
Q-learning
  • Originates from the concepts and principles of
    dynamic programming (DP)
  • Integrates planning and learning
  • Markov property
    - the probability distributions of the reward and
      of the transition from one state to another
      depend only on the current state s and action
      a, not on previous states or actions
  • Handles non-deterministic MDPs

13
Q-Learning Algorithm
  • The learning rule accommodates the
    non-deterministic environment (see below)
  • Maintains an estimate of the Q-value
  • After an infinite number of visits to each
    state-action pair, the estimated Q-value
    converges to its true value Q*
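
The learning rule referred to here is, in one standard formulation for non-deterministic MDPs (assumed, since the slide's formula is not reproduced in the text), a Q-learning update whose learning rate decays with the visit count:

```latex
\hat{Q}_{n}(s,a) \;=\; (1-\alpha_{n})\,\hat{Q}_{n-1}(s,a)
  \;+\; \alpha_{n}\Big[\, r \;+\; \gamma \max_{a'} \hat{Q}_{n-1}(s',a') \,\Big],
\qquad \alpha_{n} \;=\; \frac{1}{1 + \mathit{visits}_{n}(s,a)}
```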

14
Q-Learning Example
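
To make the mechanics concrete, here is a small tabular Q-learning run on a toy two-state problem (all states, rewards, and parameters are illustrative, not taken from the presentation):

```python
import random

# Toy deterministic MDP: two states, two actions. Action 1 in state 0
# moves to state 1 (reward 0); action 1 in state 1 reaches the goal and
# returns to state 0 with reward 10; any other action keeps the state.
def step(state, action):
    if state == 0 and action == 1:
        return 1, 0.0
    if state == 1 and action == 1:
        return 0, 10.0
    return state, 0.0

alpha, gamma = 0.5, 0.9
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

state = 0
for _ in range(500):
    action = random.choice((0, 1))   # pure exploration, for the demo
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in (0, 1))
    # standard tabular Q-learning update
    Q[(state, action)] += alpha * (reward + gamma * best_next
                                   - Q[(state, action)])
    state = next_state

# Q[(1, 1)] approaches its fixed point 10 / (1 - gamma**2) ~= 52.6
print(Q)
```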
15
New Control Tool: Why Use It
  • Model of the environment not required
  • Learns relations between states, actions and
    rewards
  • Truly adaptive
    - capable of responding not only to dynamic
      sensory inputs from the environment, but also
      to a dynamically changing environment, through
      ongoing learning and adaptation
    - the control policy changes itself in response
      to changes in inherent system characteristics
  • Supervision is not required
  • Learns through dynamic trial-and-error
    exploration of alternative actions and
    observation of the relative rewards received
  • Exploits actions found to perform well

16
Control Tool Description
  • Tools used: the VISSIM traffic micro-simulator by
    PTV Vision, and VBA
  • Simulation time: peak hour
  • Network: one segment of a motorway with three
    lanes in each direction and three motorway
    entries
  • Detectors

17
RL agent
  • Signal
    - only two phases: green and red
    - the green phase allows one car per cycle; the
      red phase duration is changeable in order to
      vary the metering rate (see the sketch after
      this list)
    - default signal timing was created using VISSIM
    - the signal is controlled by an agent that takes
      an action every 5 min
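
With exactly one car released per cycle, the metering rate follows directly from the cycle length; a minimal sketch of that relation (the durations are illustrative, not from the presentation):

```python
def metering_rate(green_s: float, red_s: float) -> float:
    """Vehicles per hour when exactly one car is released per cycle."""
    cycle_s = green_s + red_s
    return 3600.0 / cycle_s

# Lengthening the red phase lowers the metering rate:
print(metering_rate(2.0, 4.0))   # 600.0 veh/h
print(metering_rate(2.0, 10.0))  # 300.0 veh/h
```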

18
RL agent
  • Action selection: ε-greedy policy
    - the best action is exploited with probability
      1 − ε
    - an exploratory action is chosen randomly with
      probability ε (sketched below)
  • Reward: total travel time experienced by all the
    vehicles
  • Effectiveness of the algorithm: assessed by
    several performance measures
    - the most important is the travel time spent by
      all the vehicles in the network
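
A minimal sketch of the ε-greedy rule just described (the Q-table keyed by state-action pairs is an assumed interface, not from the presentation):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Exploit the best-known action with probability 1 - epsilon;
    explore a random action with probability epsilon."""
    if random.random() < epsilon:
        return random.choice(actions)                          # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))  # exploit
```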

19
Discussion
  • The savings in total travel time spent in the
    system are expected to increase with the number
    of iterations, as the Q-learning algorithm
    converges to the true optimal Q-values
  • Communication among agents can expand each
    agent's perceptual horizon and enable
    cooperation towards a globally optimal policy
  • But excessive information can increase the
    dimensionality of the problem and the
    computational effort, and reduce robustness

20
Conclusions
  • Research is in its infancy
  • Encouraging preliminary results
  • RL is feasible for optimal adaptive control for
    motorway entry access
  • Q-learning agent can adapt to the changing
    environment of traffic circumstances
  • Multi-agent behavior testing is currently
    underway
  • Next step: comparison with existing
    state-of-the-art strategies

21
Contacts
  • E-mail: kostandina@rocketmail.com
  • kristi.bombol@uklo.edu.mk

22
THANKS FOR YOUR ATTENTION!