Title: Fuzzy Inference System Learning By Reinforcement
1 Fuzzy Inference System Learning By Reinforcement
2 A Comparison of Fuzzy and Classical Controllers
- Fuzzy Controller: expert systems based on if-then rules, where premises and conclusions are expressed by means of linguistic terms.
- Rules are close to natural language.
- A priori knowledge is used.
- Classical Controller: needs an analytical model of the task.
3 Design Problems of FC
- A priori knowledge extraction is not easy.
- Disagreement between experts.
- A great number of variables is necessary to solve the control task.
4 Self-Tuning FIS
- A direct teacher is based on an input-output set of training data.
- A distal teacher does not give the correct actions, but the desired effect on the process; a performance measure is used.
- A critic gives rewards and punishments with respect to the state reached by the learner (reinforcement learning methods).
- There are no more than two fuzzy sets activated for an input value.
5 Goal
- To overcome the limitations of classical reinforcement learning methods: discrete state perception and discrete actions.
- NOTE: In this paper a MISO FIS is used.
6 A MIMO FIS
A FIS is made of N rules of the following form:

R_i: if S_1 is L_1^i and ... and S_n is L_n^i then Y_1 is O_1^i and ... and Y_{N_O} is O_{N_O}^i

where
- R_i: the i-th rule of the rule base
- S_1, ..., S_n: the input variables
- L_j^i: linguistic term of input variable S_j, with membership function μ_{L_j^i}
- Y_1, ..., Y_{N_O}: the output variables
- O_j^i: linguistic term of output variable Y_j
7 Rule Preconditions
- Membership functions are triangles and trapezoids (although not differentiable):
- Because they are simple.
- They are sufficient in a number of applications.
- A strong fuzzy partition is used:
- All values activate at least one fuzzy set; the input universe is completely covered.
8 Strong Fuzzy Partition Example
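As an illustration of such a partition, here is a minimal sketch (illustrative Python, not from the paper; the three label centers are assumed values). With a strong partition the membership degrees sum to 1 everywhere, so at most two fuzzy sets are activated by any input value.

def strong_partition(centers):
    """Build triangular membership functions from a sorted list of centers."""
    def memberships(x):
        degrees = [0.0] * len(centers)
        if x <= centers[0]:
            degrees[0] = 1.0
        elif x >= centers[-1]:
            degrees[-1] = 1.0
        else:
            for i in range(len(centers) - 1):
                left, right = centers[i], centers[i + 1]
                if left <= x <= right:
                    # Linear interpolation between adjacent centers:
                    # the two active degrees always sum to 1.
                    degrees[i + 1] = (x - left) / (right - left)
                    degrees[i] = 1.0 - degrees[i + 1]
                    break
        return degrees
    return memberships

mu = strong_partition([-1.0, 0.0, 1.0])   # three labels, e.g. Negative, Zero, Positive
print(mu(0.25))                           # [0.0, 0.75, 0.25] -- degrees sum to 1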
9 Rule Conclusions
- Each rule R_i has N_O corresponding conclusions.
- For each rule, the truth value with respect to the input vector S is computed as

  α_{R_i}(S) = ∏_{j=1}^{n} μ_{L_j^i}(S_j)

  where the T-norm is implemented by a product.
- The FIS outputs are the activation-weighted combinations of the rule conclusions (a sketch of these computations follows below).
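A minimal sketch of these two computations (illustrative Python; the membership degrees, rule count and crisp conclusions o_i are made-up values, not the paper's):

def rule_truth(membership_degrees):
    """Product T-norm over the degrees of a rule's preconditions."""
    alpha = 1.0
    for mu in membership_degrees:
        alpha *= mu
    return alpha

def fis_output(alphas, conclusions):
    """Activation-weighted average of the (crisp) rule conclusions."""
    num = sum(a * o for a, o in zip(alphas, conclusions))
    den = sum(alphas)
    return num / den if den > 0.0 else 0.0

# Two rules, two inputs: the degrees mu_{L_j^i}(S_j) are assumed to be given.
alphas = [rule_truth([0.8, 0.5]), rule_truth([0.2, 0.5])]
print(fis_output(alphas, conclusions=[1.0, -1.0]))   # 0.6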
10 Learning
- The number and positions of the input fuzzy labels are set using a priori knowledge.
- Structural learning consists in tuning the number of rules.
- FACL and FQL are reinforcement learning methods that deal only with the conclusion part.
11 Reinforcement Learning
NOTE: state observability is total.
12 Markovian Decision Problem
- S: a finite discrete state set
- U: a finite discrete action set
- R: primary reinforcements, R: S × U → ℝ
- P: transition probabilities, P: S × U × S → [0, 1]
- State evaluation function (recalled below)
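For reference, the evaluation function takes the usual discounted form (the exact expression of the original slide is not reproduced here; γ denotes the discount factor):

  V^π(s) = E_π [ Σ_{k=0}^{∞} γ^k r_{t+k+1} | s_t = s ],   0 ≤ γ < 1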
13 The Curse of Dimensionality
- Some form of generalization must be incorporated in the state representation. Various function approximators are used:
- CMAC
- Neural networks
- FIS: the state-space encoding is based on a vector corresponding to the current state.
14 Adaptive Heuristic Critic
- AHC is made of two components:
- Adaptive Critic Element: a critic developed in an adaptive way from primary reinforcements; it represents an evaluation function more informative than the one given by the environment through rewards and punishments (the V(S) values).
- Associative Search Element: selects the actions which lead to better critic values.
15 FACL Scheme
16 The Critic
At time step t, the critic value is computed with the conclusion vector v_t and the rule truth values:

  V_t(S_t) = Σ_{i=1}^{N} α_{R_i}(S_t) · v_t[i]

The TD error is given by

  ε̃_{t+1} = r_{t+1} + γ V_t(S_{t+1}) − V_t(S_t)

The conclusion vector v is adjusted with a TD-learning update rule driven by this error (a sketch follows below).
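A minimal sketch of one critic step (illustrative Python; the constants β and γ and the omission of eligibility traces are simplifying assumptions, not the paper's exact settings):

import numpy as np

def critic_td_step(v, phi_t, phi_t1, r_t1, beta=0.05, gamma=0.95):
    """One simplified TD(0) step on the critic conclusion vector v.

    phi_t, phi_t1 : rule truth-value vectors for S_t and S_{t+1}
    r_t1          : primary reinforcement received at time t+1
    """
    v_st = float(np.dot(phi_t, v))     # V_t(S_t)
    v_st1 = float(np.dot(phi_t1, v))   # V_t(S_{t+1})
    td_error = r_t1 + gamma * v_st1 - v_st
    v = v + beta * td_error * phi_t    # adjust only the activated rules
    return v, td_error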
17 The Actor
- When the rule R_i is activated, one of the local actions of R_i is elected to participate in the global action, based on its quality. The global action triggered combines the elected local actions, weighted by the rule truth values.
- ε-greedy is the function implementing the mixed exploration-exploitation strategy used for this election.
18 Tuning Vector w
- The TD error is used as the improvement measure: except at the beginning of learning, the critic is a good approximator of the optimal evaluation function. The actor learning rule reinforces the elected local actions in proportion to the TD error (see the sketch below).
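A minimal sketch of the actor side (illustrative Python; the quality table w[i, k], the ε value and the omission of eligibility traces are assumptions made for the sketch):

import numpy as np

def elect_local_actions(w, epsilon=0.1):
    """w[i, k]: quality of local action k in rule i; returns one elected index per rule."""
    n_rules, n_actions = w.shape
    elected = np.argmax(w, axis=1)
    explore = np.random.rand(n_rules) < epsilon
    elected[explore] = np.random.randint(n_actions, size=int(explore.sum()))
    return elected

def global_action(phi, elected, action_values):
    """U(S): truth-value-weighted combination of the elected local actions."""
    return float(np.dot(phi, action_values[elected]))

def actor_update(w, phi, elected, td_error, eta=0.05):
    """Reinforce the elected local actions in proportion to the TD error."""
    w = w.copy()
    w[np.arange(len(elected)), elected] += eta * td_error * phi
    return w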
19 Meta-Learning Rule
- Update strategy for the learning rates:
- Every parameter should have its own learning rate (η_1, ..., η_n).
- Every learning rate should be allowed to vary over time (in order for the V values to converge).
- When the derivative of a parameter keeps the same sign for several consecutive time steps, its learning rate should be increased.
- When the derivative sign of a parameter alternates over several consecutive time steps, its learning rate should be decreased.
- This is the Delta-Bar-Delta rule (sketched below).
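A minimal sketch of the Delta-Bar-Delta idea (illustrative Python; the constants kappa, decay and theta are placeholders, not the paper's values):

import numpy as np

def delta_bar_delta(rates, delta_bar, delta, kappa=0.01, decay=0.5, theta=0.7):
    """Per-parameter learning-rate update.

    rates     : current learning rates eta_1 .. eta_n
    delta_bar : exponentially averaged past derivatives
    delta     : current derivative of each parameter
    """
    same_sign = delta_bar * delta > 0
    opposite = delta_bar * delta < 0
    rates = rates + kappa * same_sign                  # grow additively while the sign persists
    rates = rates * np.where(opposite, decay, 1.0)     # shrink multiplicatively when it alternates
    delta_bar = (1.0 - theta) * delta + theta * delta_bar
    return rates, delta_bar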
20 Execution Procedure
- Estimation of the evaluation function corresponding to the current state.
- Computation of the TD error.
- Tuning of the parameter vectors v and w.
- Estimation of the new evaluation function for the current state with the new conclusion vector v_{t+1}.
- Learning-rate update with the Delta-Bar-Delta rule.
- For each activated rule, election of the local action; computation and triggering of the global action U_{t+1} (one such step is sketched below).
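Tying the earlier sketches together, one learning step could look roughly as follows (illustrative Python; truth_values, GAMMA and ACTIONS are hypothetical names, and actor_update, delta_bar_delta, elect_local_actions and global_action are the helpers sketched on the previous slides):

import numpy as np

def facl_step(mem, s_t1, r_t1):
    """One simplified FACL step; mem holds v, w, rates, dbar, phi_t, elected, V_t."""
    phi_t1 = truth_values(s_t1)                           # rule activations for S_{t+1}
    v_old = float(np.dot(phi_t1, mem["v"]))               # step 1: estimate V_t(S_{t+1})
    td_error = r_t1 + GAMMA * v_old - mem["V_t"]          # step 2: TD error
    mem["v"] = mem["v"] + mem["rates"] * td_error * mem["phi_t"]               # step 3: tune v ...
    mem["w"] = actor_update(mem["w"], mem["phi_t"], mem["elected"], td_error)  # ... and w
    mem["V_t"] = float(np.dot(phi_t1, mem["v"]))          # step 4: re-estimate with v_{t+1}
    mem["rates"], mem["dbar"] = delta_bar_delta(          # step 5: Delta-Bar-Delta update
        mem["rates"], mem["dbar"], td_error * mem["phi_t"])
    mem["elected"] = elect_local_actions(mem["w"])        # step 6: elect the local actions
    mem["phi_t"] = phi_t1
    return global_action(phi_t1, mem["elected"], ACTIONS) # trigger the global action U_{t+1}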
21 Example
22 Example (Cont.)
- The number of rules is twenty-five.
- For the sake of simplicity, the discrete actions available are the same for all rules.
- The discrete action set
- The reinforcement function
23 Results
- Performance measure for distance
- Results