Title: Combining fuzzy and statistical uncertainty: probabilistic fuzzy systems and their applications
1 Combining fuzzy and statistical uncertainty: probabilistic fuzzy systems and their applications
- prof.dr.ir. Jan van den Berg
- - TUDelft Faculties of TPM and EWI
- - Cyber Security Academy The Hague
- j.vandenberg_at_tudelft.nl
- http://tbm.tudelft.nl/index.php?id=30084&L=1
2 Summary of the talk
- Two complementary conceptualizations of uncertainty will be discussed: statistical and fuzzy uncertainty.
- These uncertainties can be combined into one theory of probabilistic fuzzy events. Using this theory, classical fuzzy systems can be generalized to probabilistic fuzzy systems (PFS).
- PFS can be induced using both expert knowledge and data, enabling both interpretability and accuracy (although an accuracy-interpretability dilemma remains that we have to deal with in practice).
- Finally, one or two examples of PFSs we developed will be shown (next time).
3 Agenda
- Probabilistic/statistical uncertainty
- Fuzzy uncertainty
- Fuzzy systems
- Probabilistic fuzzy theory
- Probabilistic fuzzy systems
- Applications (next time)
- Conclusions
4 Probabilistic/statistical uncertainty
- Probabilistic/statistical uncertainty is a well-known notion
- Crisp events occur with a certain probability
- These probabilities can be assessed statistically
- E.g., the frequentist approach: repeating experiments
- A lot of theory exists on unbiased estimation, maximum likelihood (ML) estimation, etc.
- In continuous outcome spaces, probability distributions are used
- Mathematical statistics: descriptive and inferential statistics (the latter concerns drawing conclusions from data using some model for the data)
6 Crisp sets
- Collection of definite, well-definable objects (elements) forming a whole, with crisp boundaries
- Representations of sets:
- list of all elements: $A = \{x_1, \ldots, x_n\}$, $x_j \in X$
- elements with property P: $A = \{x \mid x \text{ satisfies } P\}$, $x \in X$
- Venn diagram
- characteristic function: $f_A : X \to \{0, 1\}$, with $f_A(x) = 1 \Leftrightarrow x \in A$ and $f_A(x) = 0 \Leftrightarrow x \notin A$
7 Fuzzy sets
- Sets with fuzzy, gradual boundaries (Zadeh 1965)
- A fuzzy set A in X is characterized by its membership function $\mu_A : X \to [0, 1]$
- A fuzzy set A is completely determined by the set of ordered pairs $A = \{(x, \mu_A(x)) \mid x \in X\}$
- X is called the domain or universe of discourse
8 Fuzzy sets on discrete universes
- Fuzzy set C = "desirable city to live in"
- X = {SF, Boston, LA} (discrete and non-ordered)
- C = {(SF, 0.9), (Boston, 0.8), (LA, 0.6)}
- Fuzzy set A = "sensible number of children"
- X = {0, 1, 2, 3, 4, 5, 6} (discrete universe)
- A = {(0, 0.1), (1, 0.3), (2, 0.7), (3, 1), (4, 0.6), (5, 0.2), (6, 0.1)}
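As a minimal sketch, the two discrete fuzzy sets above can be stored as mappings from element to membership degree. The helper names `membership` and `is_valid_fuzzy_set` are illustrative assumptions, not part of the talk.

```python
# Fuzzy sets on discrete universes, stored as {element: membership degree}.
desirable_city = {"SF": 0.9, "Boston": 0.8, "LA": 0.6}
sensible_children = {0: 0.1, 1: 0.3, 2: 0.7, 3: 1.0, 4: 0.6, 5: 0.2, 6: 0.1}


def membership(fuzzy_set, x):
    """Degree to which x belongs to the fuzzy set (0.0 if x is not listed)."""
    return fuzzy_set.get(x, 0.0)


def is_valid_fuzzy_set(fuzzy_set):
    """Check that every membership degree lies in the interval [0, 1]."""
    return all(0.0 <= mu <= 1.0 for mu in fuzzy_set.values())


print(membership(desirable_city, "Boston"))
print(membership(sensible_children, 3))
print(is_valid_fuzzy_set(sensible_children))
```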
9 Fuzzy sets on continuous universes
- Fuzzy set B = "about 50 years old"
- X = set of positive real numbers (continuous)
- $B = \{(x, \mu_B(x)) \mid x \in X\}$
10 Fuzzy partition
- Fuzzy partition formed by the linguistic values "young", "middle aged", and "old"
- For any age, the membership values sum to 1
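A fuzzy partition of Age can be sketched with piecewise-linear membership functions. The breakpoints (20, 45, 70) and the function name `age_partition` are assumptions chosen only for illustration; the talk does not specify them.

```python
def age_partition(age, peaks=(20.0, 45.0, 70.0)):
    """Memberships of `age` in (young, middle aged, old).

    The three values always sum to 1, which is what makes this a
    fuzzy partition. The peak positions are hypothetical.
    """
    lo, mid, hi = peaks
    if age <= lo:
        return (1.0, 0.0, 0.0)
    if age >= hi:
        return (0.0, 0.0, 1.0)
    if age <= mid:
        w = (age - lo) / (mid - lo)      # linear transition young -> middle aged
        return (1.0 - w, w, 0.0)
    w = (age - mid) / (hi - mid)         # linear transition middle aged -> old
    return (0.0, 1.0 - w, w)


for age in (10, 30, 45, 60, 80):
    print(age, age_partition(age))
```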
11 Operations with fuzzy sets
Note the multiple definitions!
12 Set theoretic operations, examples
- Intersection: minimum
- Union: maximum
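The minimum and maximum operators can be sketched on discrete fuzzy sets as follows. The example sets `small` and `medium` are hypothetical, and the product-based intersection is shown only to illustrate that alternative definitions exist.

```python
def fuzzy_intersection(a, b):
    """Intersection via the minimum operator."""
    return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in set(a) | set(b)}


def fuzzy_union(a, b):
    """Union via the maximum operator."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in set(a) | set(b)}


def product_intersection(a, b):
    """An alternative intersection: the algebraic product."""
    return {x: a.get(x, 0.0) * b.get(x, 0.0) for x in set(a) | set(b)}


# Hypothetical fuzzy sets on the universe {1, 2, 3, 4}.
small = {1: 1.0, 2: 0.7, 3: 0.3, 4: 0.0}
medium = {1: 0.0, 2: 0.5, 3: 1.0, 4: 0.5}

print(sorted(fuzzy_intersection(small, medium).items()))
print(sorted(fuzzy_union(small, medium).items()))
print(sorted(product_intersection(small, medium).items()))
```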
14 Fuzzy modeling (expert-driven)
- A fuzzy model (FM) can be defined based on expert knowledge
- Many human concepts (big, long, high, very much, adequate, satisfactory, ...) are defined in a (context-dependent) quantitative way, describing the state of nature
- Examples of facts about the world:
- The president is middle-aged
- The water supply is insufficient
- Birth weight of Romanian children is quite low
- Need for formalization: the linguistic variable
15 Linguistic variable
- A numerical variable takes numerical values
- Age = 65 (defines a crisp event)
- A linguistic variable takes linguistic values
- Age is old (defines a fuzzy event)
- A linguistic value is defined by a fuzzy set (enabling a characterization with gradual transitions)
- Example: a fuzzy partition of the linguistic variable Age, formed with the linguistic values young, middle aged, and old
16 Expert-driven fuzzy modeling, cont.
- Experts can express their knowledge in
- Facts about the world (see above), and
- Fuzzy IF-THEN rules
- Example fuzzy rules:
- IF Nutrition state is poor AND Birth weight is medium AND Respiration disease is absent THEN Child mortality rate is medium
- IF Nutrition state is medium AND Birth weight is not too low AND Respiration disease is absent THEN Child mortality rate is rather low
- Need for a reasoning/inference mechanism
17 Fuzzy (Inference) System (FS)
- Fuzzifier (interface from crisp to fuzzy)
- Rule base (enabling interpretability/transparency)
- Inference engine (implements fuzzy reasoning)
- Defuzzifier (interface from fuzzy to crisp)
- Note: the fuzzifier or defuzzifier may be absent
[Figure: block diagram of the fuzzy system, from input to output]
18 Example fuzzy system: Mamdani model
- Five major steps:
- Fuzzification
- Degree of fulfillment
- Inference
- Aggregation
- Defuzzification
- Computations according to Mamdani reasoning apply a generalized form of classical modus ponens:
- Given "x is A'" and "If x is A, then y is B",
- Conclude "y is B'"
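The five Mamdani steps can be sketched for a single crisp input and a two-rule base ("If income is Low then tax is Low", "If income is High then tax is High"). The linear membership functions, the universes [0, 100] and [0, 50], and all function names are assumptions made for illustration; min is used for inference, max for aggregation, and the centroid for defuzzification.

```python
def clip01(v):
    return max(0.0, min(1.0, v))


# Hypothetical membership functions on income in [0, 100] and tax in [0, 50].
def income_low(x):
    return clip01(1.0 - x / 100.0)


def income_high(x):
    return clip01(x / 100.0)


def tax_low(y):
    return clip01(1.0 - y / 50.0)


def tax_high(y):
    return clip01(y / 50.0)


RULES = [(income_low, tax_low), (income_high, tax_high)]


def mamdani(x, n_points=501, y_max=50.0):
    """Map a crisp income x to a crisp tax via Mamdani reasoning."""
    ys = [y_max * i / (n_points - 1) for i in range(n_points)]
    # Steps 1-2: fuzzification and degree of fulfillment of each rule.
    betas = [antecedent(x) for antecedent, _ in RULES]
    # Step 3 (inference): clip each consequent at its rule's degree (min).
    # Step 4 (aggregation): combine the clipped consequents (max).
    aggregated = [
        max(min(beta, consequent(y)) for beta, (_, consequent) in zip(betas, RULES))
        for y in ys
    ]
    # Step 5 (defuzzification): centroid of the aggregated fuzzy set.
    return sum(y * mu for y, mu in zip(ys, aggregated)) / sum(aggregated)


for income in (20, 50, 80):
    print(income, round(mamdani(income), 2))
```

Sweeping `income` over its range produces a smooth, monotone, non-linear input-output curve rather than a step function.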
19 Mamdani reasoning: example
20 Resulting FS: a smooth non-linear mapping
21 Inducing a fuzzy model from data
- If income is Low then tax is Low
- If income is High then tax is High
22 Bias-variance dilemma (!) Interpretability-accuracy dilemma (!)
- Algorithms exist to gradually induce more and more rules from data
- This raises the bias-variance dilemma: finding models of the right complexity
- The interpretability-accuracy dilemma is another key issue (of data mining)
24 Probability of crisp and fuzzy events
- Crisp
- Fuzzy (Zadeh, 1968)
-
Satisfies P(AA) 1
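For reference, the standard definitions can be written as follows. This is a sketch in generic notation: $\chi_A$ denotes the characteristic function of a crisp event, $\mu_A$ the membership function of a fuzzy event, and $f$ a probability density on $X$; these symbols are assumptions, since the slide's own formulas are not legible here. The fuzzy case is Zadeh's definition of the probability of a fuzzy event as the expected membership value.

```latex
% Crisp event A: probability is the expectation of the characteristic function
P(A) = \int_X \chi_A(x)\, f(x)\, dx = E[\chi_A]

% Fuzzy event A (Zadeh, 1968): expectation of the membership function
P(A) = \int_X \mu_A(x)\, f(x)\, dx = E[\mu_A]

% Discrete domain with outcome probabilities p(x_i)
P(A) = \sum_i \mu_A(x_i)\, p(x_i)
```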
25 An example, continuous domain
- Answering the question
- What is the probability that a randomly
selected Indian woman is tall?
26 Probabilistic fuzzy events, discrete case
- Two discrete probabilistic fuzzy events:
- $A_1 = (\mu_{A_1}(x_1), \mu_{A_1}(x_2)) = (m, 1-n)$ and $A_2 = (\mu_{A_2}(x_1), \mu_{A_2}(x_2)) = (1-m, n)$
- If $x_1$ occurs, then $A_1$ occurs with degree $m$ and $A_2$ occurs with degree $1-m$
- Similarly, if $x_2$ occurs
- E.g., $A_1$ means "tall" and $A_2$ "small", where $x_1$ and $x_2$ are two values of the variable $x$ = "length"
- With $\Pr(x_1) = p$ and $\Pr(x_2) = 1-p$:
- $\Pr(A_1) = mp + (1-n)(1-p)$, and
- $\Pr(A_2) = (1-m)p + n(1-p)$
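A small numeric check of the two probabilities, under the assumption that $x_1$ occurs with probability $p$ and $x_2$ with probability $1-p$. The values of `m`, `n`, and `p` are hypothetical.

```python
def fuzzy_event_probability(memberships, probabilities):
    """Probability of a fuzzy event on a discrete domain: sum of mu(x_i) * Pr(x_i)."""
    return sum(mu * pr for mu, pr in zip(memberships, probabilities))


m, n, p = 0.9, 0.8, 0.3            # hypothetical values
outcome_probs = (p, 1 - p)         # Pr(x1), Pr(x2)

pr_a1 = fuzzy_event_probability((m, 1 - n), outcome_probs)   # m*p + (1-n)*(1-p)
pr_a2 = fuzzy_event_probability((1 - m, n), outcome_probs)   # (1-m)*p + n*(1-p)

print(round(pr_a1, 4), round(pr_a2, 4), round(pr_a1 + pr_a2, 4))
```

Because the memberships of $A_1$ and $A_2$ sum to 1 at each outcome, the two event probabilities also sum to 1.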
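27 Simple estimation of probabilities
- Let $x_1, \ldots, x_n$ be a random sample on a domain X
- The probability of a crisp event A can be estimated by $\hat{\Pr}(A) = \frac{1}{n}\sum_{k=1}^{n} \chi_A(x_k)$, with $\chi_A$ the characteristic function of A
- The probability of a fuzzy event $A_i$ can be estimated by $\hat{\Pr}(A_i) = \frac{1}{n}\sum_{k=1}^{n} \mu_{A_i}(x_k)$, assuming that X is well-formed (fuzzy partition), i.e. $\sum_i \mu_{A_i}(x) = 1$ for all $x \in X$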
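Both estimators amount to averaging over the sample: the crisp one averages a 0/1 indicator, the fuzzy one averages membership degrees. The sketch below assumes a hypothetical sample of body heights and hypothetical membership functions `tall` and `small` that form a fuzzy partition.

```python
def estimate_crisp(sample, indicator):
    """Relative frequency of the crisp event described by `indicator`."""
    return sum(1 for x in sample if indicator(x)) / len(sample)


def estimate_fuzzy(sample, mu):
    """Average membership degree of the sample in the fuzzy event `mu`."""
    return sum(mu(x) for x in sample) / len(sample)


def tall(height_cm):
    # Hypothetical membership function: 0 below 150 cm, 1 above 180 cm.
    return max(0.0, min(1.0, (height_cm - 150.0) / 30.0))


def small(height_cm):
    return 1.0 - tall(height_cm)


heights = [150, 158, 163, 170, 176]          # hypothetical sample

pr_crisp = estimate_crisp(heights, lambda h: h >= 165)
pr_tall = estimate_fuzzy(heights, tall)
pr_small = estimate_fuzzy(heights, small)
print(pr_crisp, round(pr_tall, 4), round(pr_small, 4))
```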
28 Deterministic and probabilistic rules
Example: If current returns are large, then future returns will be large
Linguistic vagueness
Probabilistic uncertainty
29 Probabilistic fuzzy rules
30 A crazy theoretical sidestep
- Probabilistic Fuzzy Entropy
- To be used to induce fuzzy decision trees
31 Definition of PF entropy, discrete source
- Given a (well-formed) sample space with a fuzzy partition of fuzzy events $A_1, \ldots, A_C$, defined by membership functions $\mu_{A_1}, \ldots, \mu_{A_C}$ and occurring with probabilities $\Pr(A_1), \ldots, \Pr(A_C)$, the probabilistic fuzzy entropy (PFE) is defined as $\mathrm{PFE} = -\sum_{c=1}^{C} \Pr(A_c) \log_2 \Pr(A_c)$
- So PFE is a probabilistic type of entropy defined in a fuzzily partitioned (sample) space!
32 A very special information source
- Consider two discrete, strictly complementary statistical fuzzy events
- $A_1 = (\mu_{A_1}(x_1), \mu_{A_1}(x_2)) = (m, 1-m)$, $A_2 = (\mu_{A_2}(x_1), \mu_{A_2}(x_2)) = (1-m, m)$
33 A very special information source, cont.
- Since $A_1 = (m, 1-m)$, $A_2 = (1-m, m)$, $\Pr(x_1) = p$, and $\Pr(x_2) = 1-p$, it follows that
- $\Pr(A_1) = mp + (1-m)(1-p) = 2mp + 1 - m - p = 1 - Q$   (1)
- $\Pr(A_2) = p(1-m) + (1-p)m = m + p - 2mp = Q$   (2)
34 Entropy of an information source generating two strictly complementary statistical fuzzy events
- Using the definition of PFE (sheet 31) and equations (1) and (2) from the previous sheet, it follows that
- $H_{sf}(m,p) = -Q \log_2 Q - (1-Q) \log_2 (1-Q)$, where
- $Q(m,p) = m + p - 2mp$
- Q (like 1 - Q) relates to the combined uncertainty of the probabilistic fuzzy events, based on their fuzziness m and the probability of occurrence p
35 Further interpreting Q
- $Q(m,p) = m + p - 2mp$
- $Q = 0$ or $Q = 1$ $\Rightarrow$ no uncertainty; $Q = 0.5$ $\Rightarrow$ highest uncertainty
- Illustration and interpretation:
- $m = p = 0$ or $m = p = 1$ $\Rightarrow$ $Q = 0$: two crisp events, one of which occurs with probability 1
- $m = 0, p = 1$ or $m = 1, p = 0$ $\Rightarrow$ $Q = 1$: same explanation!
- $p = 0.5$ or $m = 0.5$ $\Rightarrow$ $Q = 0.5$: two fuzzy events having equal probability, or two non-distinguishable fuzzy events!
36 Interpretation of $H_{pf}(m,p)$
- $H_{pf}(m,p) = -Q \log_2 Q - (1-Q) \log_2 (1-Q)$
- PFE quantifies the combined uncertainty
- Illustration:
- $Q = 0$ or $Q = 1$: no uncertainty, $H(m,p) = 0$, in the 4 corners
- $p = 0.5$ or $m = 0.5$ $\Rightarrow$ $Q = 0.5$: two fuzzy events having equal probability, or two non-distinguishable fuzzy events; highest uncertainty, $H(m,p) = 1$
- if $m = 0$ or $m = 1$: classical crisp entropy
- if $p = 0$ or $p = 1$: fuzzy entropy only
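The behaviour of the entropy in the corners and along the lines m = 0.5 and p = 0.5 can be checked numerically. This is a minimal sketch using $Q(m,p) = m + p - 2mp$ and the binary entropy of $Q$; the function names are illustrative.

```python
import math


def q_value(m, p):
    """Q(m, p) = m + p - 2mp, the probability of the fuzzy event A2."""
    return m + p - 2.0 * m * p


def binary_entropy(q):
    """-q log2 q - (1 - q) log2 (1 - q), with the convention 0 log2 0 = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def pf_entropy(m, p):
    """Probabilistic fuzzy entropy of the two complementary fuzzy events."""
    return binary_entropy(q_value(m, p))


# The four corners carry no uncertainty.
print([pf_entropy(m, p) for m in (0.0, 1.0) for p in (0.0, 1.0)])
# m = 0.5 or p = 0.5 gives the maximum of 1 bit, whatever the other argument is.
print(round(pf_entropy(0.5, 0.3), 6), round(pf_entropy(0.9, 0.5), 6))
# m = 0 reduces to the classical (crisp) entropy of p.
print(round(pf_entropy(0.0, 0.25), 6), round(binary_entropy(0.25), 6))
```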
38 Probabilistic fuzzy systems with discrete probability distribution
- Consists of a set of rules whose antecedents are fuzzy conditions and whose consequents are probability distributions
[Rule formula (3) is not recoverable from this transcript; its annotated parts are the antecedent fuzzy set and the consequent]
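39 Deterministic vs. probabilistic FS
- Deterministic: If x is A4 then y is B2
- Probabilistic: If x is A4 then y is B1 with probability p(B1 | A4), and y is B2 with probability p(B2 | A4), and y is B3 with probability p(B3 | A4).
[Figure: antecedent fuzzy sets A1 to A5 on the X axis, consequent fuzzy sets B1 to B3 on the Y axis]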
40 Probabilistic fuzzy system with continuous probability distribution
Additive reasoning
41 Probability distribution characterization
- In general, different characterizations can be used for the conditional probability density in the rule consequents
- This characterization could be an approximation with a histogram or an explicit model for the density, e.g., a normal or other distribution
- In PFS, we can select a fuzzy histogram characterization
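42 Histograms, classical crisp case
- Let $x_1, \ldots, x_n$ be a random sample from a univariate distribution with pdf $f(x)$
- Let the characteristic functions $\chi_i(x)$ (defining crisp bins/intervals $A_i$) constitute a crisp partitioning
- A histogram estimates $f(x)$ (from data $x_k$) as follows: $\hat{f}(x) = \sum_i \frac{\hat{p}_i}{|A_i|} \chi_i(x)$ with $\hat{p}_i = \frac{1}{n}\sum_{k=1}^{n} \chi_i(x_k)$, where $|A_i|$ is the width of bin $A_i$
43 Fuzzy histograms
- Let $x_1, \ldots, x_n$ be a random sample of size n from a (univariate) distribution with pdf $f(x)$
- Let the membership functions $\mu_i(x)$ (defining fuzzy bins $A_i$) constitute a fuzzy partitioning
- A fuzzy histogram estimates $f(x)$ (from data $x_k$) as follows: $\hat{f}(x) = \sum_i \frac{\hat{p}_i \, \mu_i(x)}{\int \mu_i(u)\, du}$ with $\hat{p}_i = \frac{1}{n}\sum_{k=1}^{n} \mu_i(x_k)$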
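A fuzzy histogram can be sketched with triangular membership functions as fuzzy bins. The sketch assumes the estimator form "bin weight = average membership of the sample; density = sum of weight times membership divided by the area under that membership function". The bin centers, the sample, and all function names are hypothetical, and the sample is assumed to lie within the range of the centers.

```python
def triangular_partition(centers):
    """Triangular fuzzy bins on sorted `centers`.

    On [centers[0], centers[-1]] the memberships sum to 1 (a fuzzy partition).
    Returns a list of (membership function, area under it) pairs.
    """
    def make(i):
        c = centers[i]
        left = centers[i - 1] if i > 0 else None
        right = centers[i + 1] if i + 1 < len(centers) else None

        def mu(x):
            if x == c:
                return 1.0
            if left is not None and left < x < c:
                return (x - left) / (c - left)
            if right is not None and c < x < right:
                return (right - x) / (right - c)
            return 0.0

        # Area under the membership function, used to normalise the density.
        area = 0.0
        if left is not None:
            area += (c - left) / 2.0
        if right is not None:
            area += (right - c) / 2.0
        return mu, area

    return [make(i) for i in range(len(centers))]


def fuzzy_histogram(sample, centers):
    """Return (bin weights, density estimate) for the given sample."""
    bins = triangular_partition(centers)
    n = len(sample)
    # Weight of bin i: average membership of the sample in fuzzy bin i.
    weights = [sum(mu(x) for x in sample) / n for mu, _ in bins]

    def density(x):
        return sum(p * mu(x) / area for p, (mu, area) in zip(weights, bins))

    return weights, density


sample = [0.2, 0.9, 1.1, 1.5, 2.0, 2.4, 3.1, 3.8]      # hypothetical data
weights, density = fuzzy_histogram(sample, centers=[0.0, 1.0, 2.0, 3.0, 4.0])
print([round(w, 3) for w in weights])
print(round(density(1.5), 3))
```

Unlike a crisp histogram, each observation contributes to two neighbouring bins, so the resulting density estimate is continuous rather than a step function.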
44 Crisp vs. fuzzy histogram
45 PFS fuzzy histogram model
Fuzz IEEE 2013, Hyderabad India
46 Fuzzy histogram model
47 Probabilistic Mamdani systems
Reasoning
Centroid of fuzzy consequent set Cj
48 Probabilistic fuzzy output model
49 Probabilistic TS systems
- Zero-order probabilistic Takagi-Sugeno
50 Relation to deterministic FSs
- Zero-order Takagi-Sugeno system
Takagi-Sugeno reasoning
51 Probabilistic fuzzy systems: summary
- Essentially a fuzzy system that estimates a probability density function (p.d.f.), i.e. the fuzzy system approximates a p.d.f.
- Usually the p.d.f. is conditional on the input
- Linguistic information is coded in fuzzy rules
- Linguistic uncertainty is combined with probabilistic uncertainty
- Different types of fuzzy systems can be extended to their PFS equivalent (e.g. Mamdani fuzzy systems, Takagi-Sugeno fuzzy systems)
52 PFS design
- Identifying mental world vs. observed world (van den Eijkel 1999)
- Mental world: linguistic descriptions, fuzzy conceptualization, experts' knowledge
- Observed world: data measurements, probability density functions, optimal consequent parameters
- Optimal design given a mental world: application of conditional probability measures for fuzzy events
- Optimal design given an observed world: nonlinear optimization techniques
53 Parameter determination
54 Sequential method
- Part 1: Finding the membership function (MF) parameters
- $A_j$ is a fuzzy set defined by a membership function
- E.g. Gaussian MF parameters:
- $v$: center of MF
- $\sigma$: width of MF
- Fuzzy c-means (FCM) clustering
55 MF determination
- For the first part of the sequential method, well-known techniques from fuzzy modeling can be applied:
- Fuzzy clustering in input-output product space
- Fuzzy clustering in input and output space
- Expert-driven design
- Similarity-based rule-base simplification
- Feature selection
- Heuristic approaches
- Etc.
56 Sequential method
- Part 2: Finding the probability parameters $\Pr(\omega_c \mid A_j)$
- Set the parameters $\Pr(\omega_c \mid A_j)$ equal to estimates of the conditional probabilities (conditional probability estimation)
57 Estimation of probability parameters
- Conditional probabilities $\Pr(C_j \mid A_q)$ can be assessed directly by using the definition of the probability of joint events
- This method does not provide maximum likelihood estimates of the probability parameters.
58 Maximum likelihood method
- Part 2: Optimization of $v_j$, $\sigma_j$ and $\Pr(\omega_c \mid A_j)$
- Likelihood of a data set
- Minimization of the negative log-likelihood
- Optimize the parameters $v_j$, $\sigma_j$ and $\Pr(\omega_c \mid A_j)$ so that they minimize the error function
- Constrained optimization problem (the probability parameters $\Pr(\omega_c \mid A_j)$ must satisfy summation conditions)
59 Maximum likelihood method
- The constrained optimization problem is converted into an unconstrained optimization problem using auxiliary parameters $u_{jc}$
- Unconstrained minimization with respect to $v_j$, $\sigma_j$ and $u_{jc}$
- An online gradient descent algorithm is used to minimize the objective function, i.e. the available classification examples are processed one by one and updates are performed after each sample
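One common way to remove the summation constraint is a softmax reparametrization, $\Pr(\omega_c \mid A_j) = \exp(u_{jc}) / \sum_{c'} \exp(u_{jc'})$; whether the talk uses exactly this form is an assumption. The sketch below performs one per-sample gradient step on the negative log-likelihood with respect to $u_{jc}$ only. The normalized rule activations `beta` are taken as given, the membership parameters are held fixed for brevity, and all names are illustrative.

```python
import math


def softmax(u_row):
    """Map unconstrained parameters to probabilities that sum to 1."""
    top = max(u_row)
    exps = [math.exp(u - top) for u in u_row]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(beta, u):
    """p(c | x) = sum_j beta_j(x) * Pr(class c | A_j), with Pr(. | A_j) = softmax(u[j])."""
    probs = [softmax(row) for row in u]
    return [sum(b * pr[c] for b, pr in zip(beta, probs)) for c in range(len(u[0]))]


def sgd_step(beta, label, u, lr=0.5):
    """One gradient step on -log p(label | x) w.r.t. u, updating u in place."""
    probs = [softmax(row) for row in u]
    p_label = sum(b * pr[label] for b, pr in zip(beta, probs))
    for j, (b, pr) in enumerate(zip(beta, probs)):
        for c in range(len(pr)):
            indicator = 1.0 if c == label else 0.0
            # d(-log p_label) / d u[j][c]
            grad = -b * pr[label] * (indicator - pr[c]) / p_label
            u[j][c] -= lr * grad


# Hypothetical example: two rules, two classes, one training sample of class 1.
u = [[0.0, 0.0], [0.0, 0.0]]
beta = [0.7, 0.3]                 # normalized rule activations for this sample
before = class_probabilities(beta, u)
sgd_step(beta, label=1, u=u)
after = class_probabilities(beta, u)
print([round(v, 4) for v in before], [round(v, 4) for v in after])
```

Because the probabilities are always produced by `softmax`, they remain valid after every update without any explicit constraint handling.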
60 Experimental comparison (1)
- Use Gaussian membership functions
- The centers $c_{ql}$ are determined using fuzzy c-means clustering
- The widths $\sigma_{ql}$ are set equal to $\sigma_{ql} = \min_{q' \neq q} \lVert c_q - c_{q'} \rVert$
61 Experimental comparison (2)
- Misclassification rates
- Calculated using ten-fold cross-validation
- Standard deviations reported within parentheses
Method               Wisconsin breast cancer   Wine
Sequential method    0.261 (0.036)             0.034 (0.048)
Maximum likelihood   0.029 (0.021)             0.023 (0.041)
62 Future research directions
- New estimation methods for the model parameters
- Joint estimation
- Information-theory based techniques
- Better optimization methods
- Interaction of linguistic knowledge and data-driven estimation
- Optimizing model complexity, model simplification
- Interpretability of probabilistic fuzzy models
- Linguistic descriptions of probability density functions
- Equivalence to other systems, e.g. fuzzy Markov models
- Density estimation using more complex models as rule consequents, e.g. fuzzy GARCH models
- New applications
64 Applications
66 Concluding remarks
- Probabilistic fuzzy systems combine linguistic uncertainty and probabilistic uncertainty
- Very useful in applications where a probabilistic model (pdf estimation) has to be conditioned (or constrained) by linguistic information
- Good parameter estimation methods exist, and the added value of these models has been demonstrated in various applications
67 Conclusions
- Fuzzy models usually show smooth non-linear behavior
- If certain measures are taken, fuzzy models are interpretable
- Fuzzy models can be induced from
- expert knowledge (grid selection and definition of fuzzy rules); this expert-driven approach was very successful in control theory
- a data set: the data-driven approach
- The accuracy-interpretability dilemma is prominent!
- The bias-variance dilemma is prominent!