Title: Data Mining II: The Fuzzy Way
1. Data Mining II: The Fuzzy Way
- Wlodzislaw Duch
- Dept. of Informatics, Nicholas Copernicus University, Torun, Poland
- http://www.phys.uni.torun.pl/duch
ISEP Porto, 8-12 July 2002
2. Basic ideas
- Complex problems cannot be analyzed precisely.
- The knowledge of an expert may be approximated using imprecise concepts: "If the weather is nice and the place is attractive, then not many participants stay at the school."
- Fuzzy logic/systems include:
  - Mathematics of fuzzy sets/systems and fuzzy logics.
  - Fuzzy knowledge representation for clustering, classification and regression.
  - Extraction of fuzzy concepts and rules from data.
  - Fuzzy control theory.
3. Types of uncertainty
- Stochastic uncertainty: rolling dice, accidents, insurance risk - probability theory.
- Measurement uncertainty: about 3 cm, about 20 degrees - statistics.
- Information uncertainty: trustworthy client, known constraints - data mining.
- Linguistic uncertainty: small, fast, low price - fuzzy logic.
4. Crisp sets
young = { x ∈ M : age(x) ≤ 20 }
μ_young(x) = 1 if age(x) ≤ 20, 0 if age(x) > 20
Membership function:
[Figure: crisp membership function A_young - a step function equal to 1 up to x = 20 years and 0 above.]
5. Fuzzy sets
- X - the universe, a space of elements x ∈ X.
- A - a linguistic variable, concept, fuzzy set.
- μ_A - a Membership Function (MF) determining the degree to which x belongs to A.
Linguistic variables and concepts are sums of fuzzy sets; they act as logical predicate functions with continuous values. A membership value is different from a probability: μ(bald) = 0.8 does not mean that a person is bald with probability 0.8. Probabilities are normalized to 1, MFs are not. Fuzzy concepts are subjective and context-dependent.
6. Fuzzy examples
- Crisp and fuzzy concept "young men":
[Figure: two membership functions A_young over x (years) - a crisp step function dropping from 1 to 0 at x = 20, and a fuzzy function that still gives μ ≈ 0.8 at x = 23.]
- Boiling temperature has a value of around 100 degrees (depending on pressure and chemistry).
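As a minimal MATLAB sketch of the crisp vs. fuzzy contrast above; the sigmoidal MF and its centre/slope are my own assumptions, chosen only so that membership is still about 0.8 at 23 years, as in the figure:

% Crisp vs. fuzzy membership for the concept "young" (illustrative only).
age = 0:0.5:40;                          % age axis in years

mu_crisp = double(age <= 20);            % crisp set: 1 up to 20, 0 above
% Assumed fuzzy MF: a decreasing sigmoid (hypothetical parameters).
c = 26; s = 2.2;                         % assumed centre and slope
mu_fuzzy = 1 ./ (1 + exp((age - c) / s));

fprintf('crisp membership at 23: %.2f\n', double(23 <= 20));
fprintf('fuzzy membership at 23: %.2f\n', 1 / (1 + exp((23 - c) / s)));

plot(age, mu_crisp, age, mu_fuzzy);      % compare the two shapes
xlabel('x (years)'); ylabel('\mu_{young}(x)'); legend('crisp', 'fuzzy');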
7. A few definitions
- Support of a fuzzy set A: supp(A) = { x ∈ X : μ_A(x) > 0 }
- Core of a fuzzy set A: core(A) = { x ∈ X : μ_A(x) = 1 }
- α-cut of a fuzzy set A: A_α = { x ∈ X : μ_A(x) > α }
- Height: max_x μ_A(x) ≤ 1
- Normal fuzzy set: sup_{x ∈ X} μ_A(x) = 1
8. Definitions illustrated
[Figure: a membership function with its core (μ = 1), crossover points (μ = 0.5), an α-cut at level α, and the support marked on the X axis.]
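These definitions are easy to check numerically. A small MATLAB sketch; the trapezoidal MF <2,4,6,8> and the sampling grid are my own assumptions, not from the slide:

% Support, core, alpha-cut and height of a sampled fuzzy set.
x  = linspace(0, 10, 1001);                              % discretized universe X
mu = max(min(min((x - 2) / 2, (8 - x) / 2), 1), 0);      % trapezoidal MF <2,4,6,8>

alpha   = 0.5;
support = x(mu > 0);                        % supp(A) = {x : mu_A(x) > 0}
core    = x(mu >= 1 - 1e-12);               % core(A) = {x : mu_A(x) = 1}
acut    = x(mu > alpha);                    % A_alpha = {x : mu_A(x) > alpha}
height  = max(mu);                          % height(A) <= 1; = 1 for a normal set

fprintf('support: [%.2f, %.2f]\n', min(support), max(support));
fprintf('core:    [%.2f, %.2f]\n', min(core), max(core));
fprintf('%.1f-cut: [%.2f, %.2f]\n', alpha, min(acut), max(acut));
fprintf('height:  %.2f\n', height);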
9. Types of MF
- Trapezoidal <a,b,c,d>: μ(x) rises linearly from 0 at a to 1 at b, stays at 1 up to c, and falls linearly to 0 at d.
- Gaussian/bell N(c,s): μ(x) = exp(-(x-c)²/(2s²)), centered at c with width s.
[Figure: the trapezoidal and Gaussian membership functions plotted against x.]
10. MF examples
- Singletons: (a, 1) and (b, 0.5) - nonzero membership only at single points.
- Triangular <a,b,c>: μ(x) rises linearly from 0 at a to 1 at b and falls back to 0 at c.
[Figure: singleton and triangular membership functions plotted against x.]
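A minimal sketch of these MF families as plain MATLAB anonymous functions; they mirror the standard trapezoidal, triangular and Gaussian shapes but are hand-rolled here (the trailing underscores avoid any clash with toolbox names), and the parameter values are arbitrary:

% Parameterized membership functions: trapezoid <a,b,c,d>, triangle <a,b,c>,
% Gaussian N(c,s). All return values in [0,1] and work element-wise.
trapmf_ = @(x,a,b,c,d) max(min(min((x - a) ./ (b - a), (d - x) ./ (d - c)), 1), 0);
trimf_  = @(x,a,b,c)   max(min((x - a) ./ (b - a), (c - x) ./ (c - b)), 0);
gaussmf_= @(x,c,s)     exp(-(x - c).^2 ./ (2 * s.^2));

x = linspace(0, 10, 501);
plot(x, trapmf_(x, 1, 3, 5, 7), ...
     x, trimf_(x, 4, 6, 8), ...
     x, gaussmf_(x, 5, 1));
legend('trapezoid <1,3,5,7>', 'triangle <4,6,8>', 'Gaussian N(5,1)');
xlabel('x'); ylabel('\mu(x)');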
11. Linguistic variables
- Age = 20 ⇒ Age is young: a linguistic variable takes a linguistic value.
- Linguistic variable: temperature; its terms (fuzzy sets): cold, warm, hot.
[Figure: membership functions μ_cold, μ_warm, μ_hot over temperature x in °C, with crossovers around 20 and 40 °C.]
12. Fuzzy numbers
- MFs are usually convex, with a single maximum.
- MFs of similar numbers overlap.
- A number's core is a single point: ∃x μ(x) = 1, and μ decreases monotonically on both sides of the core.
- Typically triangular functions <a,b,c> or singletons are used.
13. Fuzzy rules
- Commonsense knowledge may sometimes be captured in a natural way using fuzzy rules:
IF L-variable-1 = term-1 AND L-variable-2 = term-2 THEN L-variable-3 = term-3
IF Temperature = hot AND air-conditioning price = low THEN cooling = strong
- What does the fuzzy rule "IF x is A THEN y is B" actually mean?
14. Fuzzy implication
- If "⇒" means correlation, a T-norm T(A,B) is sufficient.
- A ⇒ B has many possible realizations.
15. Interpretation of implication
- "If x is A then y is B" can be read as implication or as correlation:
- A ⇒ B ≡ not A or B (A entails B)
- A ⇒ B ≡ A and B (correlation)
16. Types of rules
- FIR, Fuzzy Implication Rules: a logic of implications between fuzzy facts.
- FMR, Fuzzy Mapping Rules: functional dependencies, fuzzy graphs, approximation problems.
  - Mamdani type: IF MF_A(x) = high THEN MF_B(y) = medium.
  - Takagi-Sugeno type: IF MF_A(x) = high THEN y = f_A(x); linear f_A(x) gives a first-order Sugeno type.
- FIS, Fuzzy Inference Systems: combine fuzzy rules to calculate final decisions.
17. Fuzzy approximation
- Fuzzy systems F: R^n → R^p use m rules to map a vector x to the output F(x), a vector or a scalar.
- Singleton model: R_i: IF x is A_i THEN y is b_i
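A sketch of the singleton model under the usual interpretation F(x) = Σ_i μ_Ai(x)·b_i / Σ_i μ_Ai(x); the antecedent MFs and the conclusions b_i below are invented for illustration:

% Singleton-model fuzzy approximator: m rules "IF x is A_i THEN y = b_i",
% combined as a normalized weighted sum of the singleton conclusions b_i.
trimf_ = @(x,a,b,c) max(min((x - a) ./ (b - a), (c - x) ./ (c - b)), 0);

A = [0 2 4; 2 4 6; 4 6 8];       % triangular antecedents A_i = <a,b,c> (assumed)
b = [1.0 3.0 2.0];               % singleton conclusions b_i (assumed values)

x = linspace(0.5, 7.5, 200);
num = zeros(size(x)); den = zeros(size(x));
for i = 1:numel(b)
    w   = trimf_(x, A(i,1), A(i,2), A(i,3));   % rule activation mu_Ai(x)
    num = num + w * b(i);
    den = den + w;
end
F = num ./ max(den, eps);        % F(x): piecewise interpolation between the b_i
plot(x, F); xlabel('x'); ylabel('F(x)');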
18. Rule base
Heating level as a function of Temperature and Price:
Price \ Temperature   freezing   cold     chilly
cheap                 full       full     medium
so-so                 full       medium   weak
expensive             medium     weak     no
IF Temperature = freezing AND Heating-price = cheap THEN heating = full
19. Fuzzification (step 1)
- Fuzzification: from measured values to MF values. Determine the membership degrees in all fuzzy sets (terms of the linguistic variables).
- Temperature T = 15 °C, heating price p = 48 Euro/MBtu:
  μ_chilly(T) = 0.5, μ_cheap(p) = 0.3
[Figure: the two membership functions with the measured values marked - μ_chilly equals 0.5 at 15 °C and μ_cheap equals 0.3 at 48 Euro/MBtu.]
IF Temperature = chilly AND Heating-price = cheap ...
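A sketch of this fuzzification step; the MF shapes and break points below are my own assumptions, chosen only so that the slide's values μ_chilly(15) = 0.5 and μ_cheap(48) = 0.3 come out:

% Fuzzification: map measured T and p to membership degrees in the terms.
% Assumed MFs (not given on the slide): "chilly" falls linearly from 1 at
% 10 C to 0 at 20 C; "cheap" falls linearly from 1 at 41 to 0 at 51 Euro/MBtu.
mu_chilly = @(t) max(min((20 - t) / 10, 1), 0);
mu_cheap  = @(p) max(min((51 - p) / 10, 1), 0);

T = 15;   % measured temperature [C]
p = 48;   % measured heating price [Euro/MBtu]

fprintf('mu_chilly(T) = %.2f\n', mu_chilly(T));   % -> 0.50
fprintf('mu_cheap(p)  = %.2f\n', mu_cheap(p));    % -> 0.30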
20. Term composition (step 2)
- Calculate the degree of rule fulfillment for all conditions, combining the terms with a fuzzy AND, e.g. the MIN operator:
μ_A(X) = μ_A1(X1) ∧ μ_A2(X2) ∧ ... ∧ μ_AN(XN) for rule R_A
μ_all(X) = min{ μ_chilly(T), μ_cheap(p) } = min{0.5, 0.3} = 0.3
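The same step in code; MIN is used as the T-norm as on the slide, with the algebraic product shown for comparison (both are standard choices; the 0.5 and 0.3 come from the fuzzification step above):

% Fuzzy AND of the rule conditions via two common T-norms.
mu_chilly_T = 0.5;            % from fuzzification
mu_cheap_p  = 0.3;

mu_rule_min  = min(mu_chilly_T, mu_cheap_p);    % MIN T-norm      -> 0.3
mu_rule_prod = mu_chilly_T * mu_cheap_p;        % product T-norm  -> 0.15

fprintf('degree of fulfillment (MIN):     %.2f\n', mu_rule_min);
fprintf('degree of fulfillment (product): %.2f\n', mu_rule_prod);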
21. Inference (step 3)
- Calculate the degree of truth of the rule conclusion: use a T-norm such as MIN or product to combine the degree of fulfillment of the conditions with the MF of the conclusion.
- MIN inference: μ_concl = min{ μ_cond, μ_full }; with μ_cond = 0.3 the conclusion MF is clipped at 0.3.
- Product inference: μ_concl = μ_cond · μ_full; the conclusion MF is rescaled by 0.3.
[Figure: the membership function μ_full(h) of the conclusion "THEN Heating = full", either clipped (MIN) or rescaled (product) at the level μ_cond = 0.3.]
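In code, the two inference variants differ only in how the conclusion MF μ_full(h) is combined with μ_cond = 0.3; the shape of μ_full below is an assumption:

% Inference: clip (MIN) or scale (product) the conclusion MF by mu_cond.
h       = linspace(0, 100, 501);                       % heating output axis
mu_full = max(min((h - 60) / 20, 1), 0);               % assumed MF of "full"
mu_cond = 0.3;                                         % from term composition

mu_concl_min  = min(mu_cond, mu_full);                 % MIN inference: clipping
mu_concl_prod = mu_cond .* mu_full;                    % product inference: scaling

plot(h, mu_full, h, mu_concl_min, h, mu_concl_prod);
legend('\mu_{full}', 'MIN (clipped)', 'product (scaled)');
xlabel('h'); ylabel('\mu(h)');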
22. Aggregation (step 4)
- Aggregate the conclusions of all rules ("THEN Heating = full", "THEN Heating = medium", "THEN Heating = no") using the MAX operator to calculate their sum (union).
[Figure: the clipped conclusion MFs combined into a single aggregated membership function over h.]
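Aggregation in code: the clipped conclusions of all rules are combined point-wise with MAX; the three output MFs and their activation levels below are assumed for illustration:

% MAX aggregation of clipped rule conclusions over the output variable h.
h         = linspace(0, 100, 501);
mu_full   = max(min((h - 60) / 20, 1), 0);                   % assumed output MFs
mu_medium = max(min(min((h - 20)/20, (80 - h)/20), 1), 0);
mu_no     = max(min((40 - h) / 20, 1), 0);

act = [0.3 0.5 0.0];                                         % assumed rule activations
mu_agg = max(max(min(act(1), mu_full),  ...
                 min(act(2), mu_medium)), ...
                 min(act(3), mu_no));                        % point-wise MAX

plot(h, mu_agg); xlabel('h'); ylabel('\mu_{concl}(h)');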
23. Defuzzification (step 5)
- Calculate a crisp value/decision using, for example, the Center of Gravity (COG) method.
[Figure: the aggregated conclusion MF μ_concl(h) with its center of gravity marked at h = 73.]
- For discrete sets the COG is the center of the singletons; for continuous membership functions:
  h* = Σ_i m_i A_i c_i / Σ_i m_i A_i
  where m_i is the degree of membership in set i, A_i is the area under the MF of set i, and c_i is the center of gravity of set i.
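A sketch of COG defuzzification over a sampled aggregated MF, using the continuous centroid h* = ∫ h·μ(h) dh / ∫ μ(h) dh; the aggregated MF is the assumed one from the aggregation sketch, so the result here will differ from the slide's 73:

% Center-of-Gravity (COG) defuzzification of an aggregated membership function.
h         = linspace(0, 100, 501);
mu_full   = max(min((h - 60) / 20, 1), 0);
mu_medium = max(min(min((h - 20)/20, (80 - h)/20), 1), 0);
mu_agg    = max(min(0.3, mu_full), min(0.5, mu_medium));   % assumed aggregate

h_star = trapz(h, h .* mu_agg) / trapz(h, mu_agg);         % continuous COG
fprintf('defuzzified heating level: %.1f\n', h_star);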
24. FIS for heating
- A complete FIS for a heating valve: fuzzification of the measured temperature, inference over the rule base, and defuzzification of the output that controls the valve position.
- Rule base:
  IF temp = freezing THEN valve = open
  IF temp = cold THEN valve = half open
  IF temp = warm THEN valve = closed
[Figure: the measured temperature gives μ_freeze = 0.7, μ_cold = 0.2, μ_warm = 0.0; these activations clip the output MFs (full/open, half, closed), and the aggregated output is defuzzified into the valve position v.]
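Putting the five steps together, a self-contained sketch of a Mamdani-style FIS for the valve example; only the rule base comes from the slide, while all MF shapes and the measured temperature are assumptions (chosen so that the activation pattern roughly matches the figure):

% Minimal Mamdani FIS: fuzzify temperature, fire three rules, aggregate
% with MAX, defuzzify the valve position with the center of gravity.
trimf_ = @(x,a,b,c) max(min((x - a) ./ (b - a), (c - x) ./ (c - b)), 0);

T = -2;                                   % assumed measured temperature [C]
v = linspace(0, 1, 201);                  % valve position axis (0=closed, 1=open)

% Input terms (assumed shapes): freezing, cold, warm.
w_freeze = trimf_(T, -15, -5, 5);
w_cold   = trimf_(T,  -5,  5, 15);
w_warm   = trimf_(T,   5, 15, 25);

% Output terms (assumed shapes): closed, half open, open.
mu_closed = trimf_(v, -0.5, 0.0, 0.5);
mu_half   = trimf_(v,  0.0, 0.5, 1.0);
mu_open   = trimf_(v,  0.5, 1.0, 1.5);

% Rule base from the slide, MIN inference + MAX aggregation.
mu_agg = max(max(min(w_freeze, mu_open), ...
                 min(w_cold,   mu_half)), ...
                 min(w_warm,   mu_closed));

valve = trapz(v, v .* mu_agg) / trapz(v, mu_agg);   % COG defuzzification
fprintf('T = %g C -> valve position %.2f\n', T, valve);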
25. Takagi-Sugeno rules
- Mamdani rules conclude with a fuzzy set:
  IF X1 = A1 AND X2 = A2 AND ... AND Xn = An THEN Y = B
- TS rules conclude with a functional dependence f(x_i):
  IF X1 = A1 AND X2 = A2 AND ... AND Xn = An THEN Y = f(x1, x2, ..., xn)
- TS rules are usually based on piecewise-linear functions (equivalent to linear spline approximation):
  IF X1 = A1 AND X2 = A2 AND ... AND Xn = An THEN Y = a0 + a1 x1 + ... + an xn
26. Fuzzy system in Matlab
% fis built earlier (newfis/addvar/addmf). Each row of rulelist:
% [temperature oilprice heating weight operator], operator 1 = AND, 2 = OR;
% the indices refer to the order in which the terms were added to fis.
rulelist = [ 1 2 3 1 1
             1 3 2 1 1
             2 1 3 1 1
             2 2 2 1 1
             1 1 3 1 1
             2 3 1 1 1
             3 1 2 1 1
             3 2 1 1 1
             3 3 1 1 1 ];
fis = addrule(fis, rulelist);
showrule(fis)
gensurf(fis)
surfview(fis)

showrule(fis) lists the rules:
1. If (temperature is cold) and (oilprice is normal) then (heating is high) (1)
2. If (temperature is cold) and (oilprice is expensive) then (heating is medium) (1)
3. If (temperature is warm) and (oilprice is cheap) then (heating is high) (1)
4. If (temperature is warm) and (oilprice is normal) then (heating is medium) (1)
5. If (temperature is cold) and (oilprice is cheap) then (heating is high) (1)
6. If (temperature is warm) and (oilprice is expensive) then (heating is low) (1)
7. If (temperature is hot) and (oilprice is cheap) then (heating is medium) (1)
8. If (temperature is hot) and (oilprice is normal) then (heating is low) (1)
9. If (temperature is hot) and (oilprice is expensive) then (heating is low) (1)
27. Fuzzy Inference System (FIS)
- Rules:
  IF speed is slow THEN brake = 2
  IF speed is medium THEN brake = 4 · speed
  IF speed is high THEN brake = 8 · speed
[Figure: membership functions slow, medium, high over speed; at speed = 2 they give the firing strengths used below.]
- R1: w1 = 0.3, r1 = 2; R2: w2 = 0.8, r2 = 4·2 = 8; R3: w3 = 0.1, r3 = 8·2 = 16
- Brake = Σ(wi · ri) / Σwi = 8.6 / 1.2 ≈ 7.17
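The weighted average can be checked directly; this just re-does the slide's arithmetic with the stated firing strengths:

% Sugeno-style combination for the brake example at speed = 2.
speed = 2;
w = [0.3 0.8 0.1];                        % firing strengths of the three rules
r = [2, 4 * speed, 8 * speed];            % rule outputs: 2, 4*speed, 8*speed

brake = sum(w .* r) / sum(w);             % weighted average of rule outputs
fprintf('brake = %.2f\n', brake);         % -> 7.17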
28. First-order TS FIS
- Rules:
  IF X is A1 and Y is B1 THEN Z = p1·x + q1·y + r1
  IF X is A2 and Y is B2 THEN Z = p2·x + q2·y + r2
[Figure: for the input x = 3, y = 2 the firing strengths w1, w2 are read from the MFs A1, B1 and A2, B2; each rule computes z1 = p1·x + q1·y + r1 and z2 = p2·x + q2·y + r2.]
- The output is the weighted average: z = (w1·z1 + w2·z2) / (w1 + w2)
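A sketch of this two-rule first-order TS system; the antecedent MF parameters and the coefficients p, q, r are placeholders, the point is the combination formula z = (w1 z1 + w2 z2)/(w1 + w2):

% First-order Takagi-Sugeno FIS with two rules and two inputs (x, y).
trimf_ = @(u,a,b,c) max(min((u - a) ./ (b - a), (c - u) ./ (c - b)), 0);

x = 3; y = 2;                                   % crisp inputs from the slide

% Assumed antecedent MFs A1, B1 (rule 1) and A2, B2 (rule 2).
w1 = min(trimf_(x, 0, 2, 6), trimf_(y, 0, 1, 4));
w2 = min(trimf_(x, 2, 6, 9), trimf_(y, 1, 4, 7));

% Assumed linear consequents z_i = p_i*x + q_i*y + r_i.
p = [1.0 0.5]; q = [2.0 1.0]; r = [0.0 3.0];
z1 = p(1)*x + q(1)*y + r(1);
z2 = p(2)*x + q(2)*y + r(2);

z = (w1*z1 + w2*z2) / (w1 + w2);                % weighted average output
fprintf('w1 = %.2f, w2 = %.2f, z = %.2f\n', w1, w2, z);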
29. Induction of fuzzy rules
- All this may be presented in the form of networks.
- Choices/adaptive parameters in fuzzy rules:
  - The number of rules (nodes).
  - The number of terms for each attribute.
  - Position of each membership function (MF).
  - MF shape for each attribute/term.
  - Type of rules (conclusions).
  - Type of inference and composition operators.
  - Induction algorithm: incremental or by refinement.
  - Type of learning procedure.
30. Feature space partition
[Figure: two ways of partitioning the feature space - a regular grid of MFs vs. independently placed functions.]
31. MFs on a grid
- Advantage: the simplest approach.
  - Regular grid: divide each dimension into a fixed number of MFs and assign to each region the average value of the samples that fall into it.
  - Irregular grid: find the region with the largest error and split it in two, adding a new MF.
  - Mixed method: start from a regular grid and adapt the parameters later.
- Disadvantages: for k dimensions and N MFs per dimension, N^k regions are created! Poor quality of approximation.
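A sketch of the regular-grid approach for two features: each dimension is split into N bins and each of the N² cells gets the average target of the training samples that fall into it (the toy data and N are assumptions):

% Regular-grid rule induction: one "rule" per grid cell, with the cell's
% conclusion set to the mean target value of the samples inside it.
rng(0);
X = rand(500, 2);                          % toy 2-D feature vectors in [0,1]^2
y = sin(2*pi*X(:,1)) + X(:,2);             % toy target values

N = 4;                                     % number of MFs (bins) per dimension
edges = linspace(0, 1, N + 1);
cellmean = nan(N, N);                      % N^k cells for k = 2 dimensions

for i = 1:N
    for j = 1:N
        in = X(:,1) >= edges(i) & X(:,1) < edges(i+1) & ...
             X(:,2) >= edges(j) & X(:,2) < edges(j+1);
        if any(in), cellmean(i, j) = mean(y(in)); end
    end
end
disp(cellmean)                             % conclusions of the 16 grid rules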
32. Optimized MFs
- Advantages: higher accuracy, better approximation, fewer functions, context-dependent MFs.
- Optimized MFs may come from:
  - Neurofuzzy systems, equivalent to RBF networks with Gaussian functions (several proofs); FSM models with triangular or trapezoidal functions; modified MLP networks with bicentral functions, etc.
  - Decision trees, fuzzy decision trees.
  - Fuzzy machine learning inductive systems.
- Disadvantages: extraction of rules is hard, optimized MFs are more difficult to create.
33. Improving sets of rules
- How can known sets of rules be improved?
  - Use minimization methods to improve the parameters of fuzzy rules; usually non-gradient methods are used, most often genetic algorithms.
  - Change the rules into a neural network, train the network, and convert it back into rules.
  - Use heuristic methods for local adaptation of the parameters of individual rules.
- Fuzzy logic is good for modeling imprecise knowledge, but...
  - What do the decision borders of a FIS look like? Is it worthwhile to make the input fuzzy and the output crisp?
  - Is it the best approximation method?
34. Fuzzy rules and data uncertainty
- Data has been measured with unknown error. Assume a Gaussian error distribution: x becomes a fuzzy number with a Gaussian membership function G_x.
- A set of logical rules R is applied to fuzzy input vectors: Monte Carlo simulations for an arbitrary system give p(Ci|X); an analytical evaluation of p(C|X) is based on the cumulative distribution (error function).
- The resulting error function is practically identical to the logistic function (difference < 0.02).
35. Fuzzification of crisp rules
- A rule R_a(x): x > a is fulfilled by a Gaussian-blurred input G_x with probability
  p(R_a | G_x) = (1/2) [1 + erf((x - a) / (s√2))]
- The error function is approximated by a logistic function, assuming the error distribution σ(x)(1 - σ(x)); for s² = 1.7 this approximates a Gaussian to better than 3.5%.
- A rule R_ab(x): b > x > a is fulfilled by G_x with probability
  p(R_ab | G_x) = (1/2) [erf((b - x) / (s√2)) - erf((a - x) / (s√2))]
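The claimed closeness of the Gaussian cumulative distribution and the logistic function can be checked numerically; the slope rescaling below is found by a simple search, not taken from the slide:

% Compare P(x > a) for a Gaussian-blurred input with a logistic approximation.
s = 1;                                   % assumed measurement uncertainty
d = linspace(-5, 5, 1001);               % d = x - a, distance from the threshold

p_gauss = 0.5 * (1 + erf(d ./ (s * sqrt(2))));   % exact cumulative Gaussian

% Logistic approximation sigma(d/T); pick the slope T minimizing the gap.
T_grid  = linspace(0.3, 1.5, 400);
gap     = arrayfun(@(T) max(abs(p_gauss - 1 ./ (1 + exp(-d ./ T)))), T_grid);
[best_gap, k] = min(gap);
fprintf('best slope T = %.3f, max |difference| = %.4f\n', T_grid(k), best_gap);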
36. Soft trapezoids and NN
- The difference of two sigmoids, σ(x) - σ(x - b), makes a soft trapezoidal membership function.
- Conclusion: fuzzy logic with σ(x) - σ(x - b) membership functions is equivalent to crisp logic with Gaussian input uncertainty.
- Gaussian classifiers (RBF) are equivalent to fuzzy systems with Gaussian membership functions.
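A quick illustration of the soft-trapezoid claim: the difference of two logistic sigmoids gives a smooth, trapezoid-like membership function (the slope and the offset b are arbitrary here):

% Soft trapezoidal MF as a difference of two sigmoids: sigma(x) - sigma(x - b).
sigma = @(x) 1 ./ (1 + exp(-x));
b = 6;                                    % assumed width of the plateau
x = linspace(-6, 12, 500);

mu_soft = sigma(x) - sigma(x - b);        % rises near 0, flat top, falls near b
plot(x, mu_soft); xlabel('x'); ylabel('\mu(x)');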
37. Optimization of rules
- Fuzzy approach: large receptive fields, rough estimations.
- G_x approach: uncertainty of inputs, small receptive fields.
- Minimizing the number of errors is a difficult, non-gradient problem, but now Monte Carlo or analytical p(C|X;M) can be used.
- Gradient optimization works for a large number of parameters.
- The uncertainties s_x are known for some features; use them as optimization parameters for the others!
- Probabilities are obtained instead of 0/1 rule outcomes.
- Vectors that were not classified by the crisp rules now have non-zero probabilities.
38. Summary
- Fuzzy sets/logic are a useful form of knowledge representation, allowing approximate but natural expression of some types of knowledge.
- An alternative is to include the uncertainty of the input data while using crisp logic rules.
- Adaptation of fuzzy rule parameters leads to neurofuzzy systems; the simplest are RBF networks and Separable Function Networks (SFN), equivalent to any fuzzy inference system.
- Results may sometimes be better than with other systems, since it is easier to include a priori knowledge in fuzzy systems.
39. Disclaimer
- A few slides/figures were taken from various presentations found on the Internet; unfortunately I cannot identify the original authors at the moment, since these slides went through different iterations. One source seems to be J.-S. Roger Jang from NTHU, Taiwan.
- I have to apologize for that.