Title: Neuro-Fuzzy and Soft Computing
1 Neuro-Fuzzy and Soft Computing
Instructor: Kao-Shing Hwang, RM321
Office hours: walk-in
- Textbooks
- Neuro-Fuzzy and Soft Computing by J.-S. R. Jang, C.-T. Sun, and E. Mizutani
- My own notes at http://www.eis.ee.ccu.edu.tw/content/coursedata.htm
- Username: student
- Password: sccu
2 Neuro-Fuzzy and Soft Computing
- Prerequisites: Calculus, Linear Algebra, and basic knowledge of Probability
- Required computer skills: DOS/UNIX/MS-Windows, MATLAB
- Grading policy
- 4-5 homework assignments: 20%
- Mid-term and final exams: 20% each
- Mid-term and final projects: 20% each
3 Neuro-Fuzzy and Soft Computing
4 Outline
- Soft computing
- Fuzzy logic and fuzzy inference systems
- Neural networks
- Neuro-fuzzy integration (ANFIS)
- Derivative-free optimization
- Genetic algorithms
- Simulated annealing
- Random search
- Examples and demos
5
- Neural networks (NN) recognize patterns and adapt themselves to cope with changing environments
- Fuzzy inference systems (FIS) incorporate human knowledge and perform inference and decision making
(Diagram: adaptivity (NN) + expertise (FIS) combine into neuro-fuzzy (NF) systems within soft computing (SC))
6 SC Constituents and Conventional AI
- SC is an emerging approach to computing which parallels the remarkable ability of the human mind to reason and learn in an environment of uncertainty and imprecision (Lotfi A. Zadeh, 1992)
- SC consists of several computing paradigms, including
- NN
- Fuzzy set theory
- Approximate reasoning
- Derivative-free optimization methods such as genetic algorithms (GA) and simulated annealing (SA)
7 SC constituents and conventional AI
8 The core of SC
- In general, SC does not perform much symbolic manipulation
- In this sense, SC complements conventional AI approaches
9 Computational Intelligence
- Conventional AI manipulates symbols on the assumption that human intelligent behavior can be stored in symbolically structured knowledge bases; this is known as the physical symbol system hypothesis
- The knowledge-based system (or expert system) is an example of the most successful conventional AI product
10-11 Neural Networks (NN)
- Imitation of the natural intelligence of the brain
- Parallel processing with incomplete information
- Nerve cells function about 10^6 times slower than electronic circuit gates, but human brains process visual and auditory information much faster than modern computers
- The brain is modeled as a continuous-time nonlinear dynamic system in connectionist architectures
- Connectionism replaced symbolically structured representations
- Distributed representation in the form of weights between a massive set of interconnected neurons
12
- Human brains interpret imprecise and incomplete sensory information provided by perceptive organs
- Fuzzy set theory provides a systematic calculus to deal with such information linguistically
- It performs numerical computation by using linguistic labels stipulated by membership functions
- It lacks the adaptability to deal with changing external environments -> incorporate NN learning concepts in fuzzy inference systems (NF modeling)
13 Evolutionary computation
- Natural intelligence is the product of millions of years of biological evolution
- Simulation of complex biological evolutionary processes
- GA is one computing technique that uses evolution based on natural selection
- Immune modeling and artificial life are similar disciplines based on chemical and physical laws
- GA and SA: population-based, systematic random search techniques
14 NF and SC characteristics
- With NF modeling as a backbone, SC can be characterized by
- Human expertise (fuzzy if-then rules)
- Biologically inspired computing models (NN)
- New optimization techniques (GA, SA, random search)
- Numerical computation (no symbolic AI so far, only numerical)
15 Neuro-Fuzzy and Soft Computing
Soft Computing
16 Fuzzy Sets
- Sets with fuzzy boundaries
- Example: A = set of tall people
17 Membership Functions (MFs)
- About MFs
- Subjective measures
- Not probability functions
(Plot: membership grades of "tall" (e.g. 0.1, 0.5, 0.8) as a function of height in cm, with 180 cm marked on the axis)
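As an illustrative sketch (not from the slides), a membership function for "tall" could be written as a sigmoid of height; the crossover point and slope below are assumptions:

```python
import numpy as np

def mu_tall(height_cm, crossover=175.0, slope=0.1):
    """Sigmoidal membership function for the fuzzy set 'tall'.

    crossover and slope are illustrative values, not from the slides.
    Returns a membership grade in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-slope * (height_cm - crossover)))

print(mu_tall(180))  # roughly 0.6 with these assumed parameters
```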
18 Fuzzy If-Then Rules
- Mamdani style
- If pressure is high then volume is small
- Sugeno style
- If speed is medium then resistance = 5*speed
19 Fuzzy Inference System (FIS)
- Rule 1: If speed is low then resistance = 2
- Rule 2: If speed is medium then resistance = 4*speed
- Rule 3: If speed is high then resistance = 8*speed
(Plot: MFs for low, medium, and high speed evaluated at speed = 2)
- Firing strengths and rule outputs at speed = 2:
- Rule 1: w1 = .3, r1 = 2
- Rule 2: w2 = .8, r2 = 4*2 = 8
- Rule 3: w3 = .1, r3 = 8*2 = 16
- Resistance = Σ(wi*ri) / Σwi = (.3*2 + .8*8 + .1*16) / (.3 + .8 + .1) ≈ 7.17
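A minimal sketch reproducing the weighted-average inference above; the membership grades at speed = 2 are taken from the slide, everything else is a direct transcription:

```python
# Sugeno-style FIS from the speed/resistance example (slide 19).
def fis_resistance(speed, w_low, w_medium, w_high):
    """Weighted-average defuzzification: sum(w_i * r_i) / sum(w_i)."""
    rules = [
        (w_low,    2.0),          # If speed is low    then resistance = 2
        (w_medium, 4.0 * speed),  # If speed is medium then resistance = 4*speed
        (w_high,   8.0 * speed),  # If speed is high   then resistance = 8*speed
    ]
    num = sum(w * r for w, r in rules)
    den = sum(w for w, _ in rules)
    return num / den

print(fis_resistance(2.0, 0.3, 0.8, 0.1))  # about 7.17
```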
20 First-Order Sugeno FIS
- Rule base
- If X is A1 and Y is B1 then Z = p1*x + q1*y + r1
- If X is A2 and Y is B2 then Z = p2*x + q2*y + r2
21 Fuzzy Inference Systems (FIS)
- Also known as
- Fuzzy models
- Fuzzy associative memories (FAM)
- Fuzzy controllers
(Block diagram: input -> fuzzy reasoning -> output, driven by a rule base (fuzzy rules) and a data base (MFs))
22 Neural Networks
- Supervised Learning
- Multilayer perceptrons
- Radial basis function networks
- Modular neural networks
- LVQ (learning vector quantization)
- Unsupervised Learning
- Competitive learning networks
- Kohonen self-organizing networks
- ART (adaptive resonance theory)
- Others
- Hopfield networks
23 The Basics of Neural Networks
- Neural networks are typically organized in layers.
- Layers are made up of a number of interconnected 'nodes', each containing an 'activation function'.
- Patterns are presented to the network through
- input layer
- hidden layers
- weighted connections
- output layer
24 Structure of ANN
25 Learning
- Most ANNs contain some form of 'learning rule' which modifies the weights of the connections according to the input patterns the network is presented with.
- In a sense, ANNs learn by example, as do their biological counterparts: a child learns to recognize dogs from examples of dogs.
26 Learning Rules
- The delta rule is often utilized by the most common class of ANNs, called 'backpropagation neural networks' (BPNNs).
- Backpropagation is an abbreviation for the backwards propagation of error.
27 Supervised Learning
- Learning is a supervised process that occurs with each cycle or 'epoch' through a forward activation flow of outputs and the backwards error propagation of weight adjustments.
- When a neural network is initially presented with a pattern, it makes a random 'guess' as to what it might be. It then sees how far its answer was from the actual one and makes an appropriate adjustment to its connection weights.
28 Supervised Learning
29 ANN vs. Optimal Search
- Note that within each hidden-layer node is a sigmoidal activation function which polarizes network activity and helps stabilize it.
- Backpropagation performs a gradient descent within the solution's vector space towards a 'global minimum' along the steepest vector of the error surface.
- The global minimum is the theoretical solution with the lowest possible error.
30 Search in Error Space
31 Convergence
- Neural network analysis often requires a large number of individual runs to determine the best solution.
- Most learning rules have built-in mathematical terms to assist in this process, which control the 'speed' (beta coefficient) and the 'momentum' of the learning.
32 Convergence
- The speed of learning is actually the rate of
convergence between the current solution and the
global minimum. - Momentum helps the network to overcome obstacles
(local minima) in the error surface and settle
down at or near the global minimum.
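A rough sketch, not from the slides, of how the 'speed' (learning rate) and momentum terms enter a gradient-descent weight update; the toy error surface and all constants are assumptions for illustration:

```python
def momentum_update(w, grad, velocity, lr=0.1, momentum=0.9):
    """One gradient-descent step with a momentum term.

    lr       -- the 'speed' of learning (step size along the negative gradient)
    momentum -- fraction of the previous step retained, which helps the search
                roll over small local minima in the error surface
    """
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Illustrative use on a toy quadratic error surface E(w) = w^2 (minimum at 0).
w, v = 5.0, 0.0
for _ in range(200):
    w, v = momentum_update(w, grad=2.0 * w, velocity=v)
print(w)  # approximately 0, the minimum of the toy error surface
```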
33 Single-Layer Perceptrons
(Diagram: inputs x1, x2, x3 with weights w1, w2, w3 and bias w0 feeding a single output node)
y = signum(Σ wi*xi + w0)
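A minimal sketch of the perceptron output y = signum(Σ wi*xi + w0); the weights, bias, and input below are arbitrary illustrative values:

```python
import numpy as np

def perceptron(x, w, w0):
    """Single-layer perceptron: y = signum(sum_i w_i * x_i + w0)."""
    return np.sign(np.dot(w, x) + w0)

# Illustrative values only.
print(perceptron(x=np.array([1.0, -2.0, 0.5]),
                 w=np.array([0.4, 0.1, -0.7]),
                 w0=0.2))
```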
34 Single-Layer Perceptrons
- Example: gender classification
35 Multilayer Perceptrons (MLPs)
- Learning rule
- Steepest descent (Backprop)
- Conjugate gradient method
- All optimization methods using first derivative
- Derivative-free optimization
36 Multilayer Perceptrons (MLPs)
Example: XOR problem
Training data:
x1  x2  y
 0   0  0
 0   1  1
 1   0  1
 1   1  0
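A small sketch of an MLP trained on the XOR data above with plain backpropagation (steepest descent); the 2-4-1 network size, learning rate, and iteration count are assumptions, not values from the slides:

```python
import numpy as np

# XOR training data from the slide.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # 2-4-1 network (assumed size)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

lr = 0.5                                        # assumed learning rate
for _ in range(20000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: gradients of the squared error via the chain rule.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Steepest-descent updates.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # typically approaches [0, 1, 1, 0]
```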
37 MLP Decision Boundaries
38 Adaptive Networks
(Diagram: an adaptive network mapping inputs x, y to output z)
- Architecture
- Feedforward networks with different node functions
- Square nodes: with parameters
- Circle nodes: without parameters
- Goal
- To achieve an I/O mapping specified by training data
- Basic training method
- Backpropagation or steepest descent
39 Derivative-Based Optimization
- Based on first derivatives
- Steepest descent
- Conjugate gradient method
- Gauss-Newton method
- Levenberg-Marquardt method
- And many others
- Based on second derivatives
- Newton method
- And many others
40 Fuzzy Modeling
(Diagram: an unknown target system and a fuzzy inference system both map x to y; the FIS is built to mimic the target)
- Given desired i/o pairs (training data set) of the form (x1, ..., xn; y), construct a FIS to match the i/o pairs
- Two steps in fuzzy modeling
- structure identification --- input selection, MF numbers
- parameter identification --- optimal parameters
41 Neuro-Fuzzy Modeling
(Diagram: adaptive networks specialize to both neural networks and fuzzy inference systems, with generalization/specialization arrows between the levels; ANFIS sits at the intersection of NN and FIS)
42 Parameter ID
(ANFIS diagram: premise MFs A1, A2 on input x and B1, B2 on input y give firing strengths w1 ... w4; the rule outputs zi are combined as z = Σ wi*zi / Σ wi; the MF parameters are nonlinear, the consequent coefficients are linear)
- Hybrid learning rule:

                            forward pass      backward pass
  MF param. (nonlinear)     fixed             steepest descent
  Coef. param. (linear)     least-squares     fixed
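A minimal sketch of the forward-pass least-squares step of this hybrid rule, assuming a one-input, two-rule first-order Sugeno model with fixed Gaussian MFs; the MF parameters, target function, and data are assumptions for illustration:

```python
import numpy as np

# Fixed premise MFs (nonlinear parameters held constant during the forward pass).
def gaussian(x, c, s):
    return np.exp(-0.5 * ((x - c) / s) ** 2)

def normalized_strengths(x):
    w1 = gaussian(x, c=-1.0, s=1.0)   # "A1" (assumed parameters)
    w2 = gaussian(x, c=+1.0, s=1.0)   # "A2" (assumed parameters)
    total = w1 + w2
    return w1 / total, w2 / total

# Training data from some target function to be modeled (assumed).
x = np.linspace(-2, 2, 50)
y = np.sin(x)

# Each rule output is z_i = p_i*x + r_i, so the model output
#   y_hat = wb1*(p1*x + r1) + wb2*(p2*x + r2)
# is linear in (p1, r1, p2, r2): solve for them by least squares.
wb1, wb2 = normalized_strengths(x)
A = np.column_stack([wb1 * x, wb1, wb2 * x, wb2])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(theta)   # [p1, r1, p2, r2]
```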
43 Parameter ID: Gauss-Newton Method
- Synonyms
- linearization method
- extended Kalman filter method
- Concept
- general nonlinear model: y = f(x, θ)
- linearization at θ = θ_now: y ≈ f(x, θ_now) + a1(θ1 - θ1,now) + a2(θ2 - θ2,now) + ...
- LSE solution
- θ_next = θ_now + η (A^T A)^(-1) A^T B
44 Param. ID: Levenberg-Marquardt
- Formula
- θ_next = θ_now + η (A^T A + λI)^(-1) A^T B
- Effects of λ
- λ small: Gauss-Newton method
- λ big: steepest descent
- How to update λ
- greedy policy: make λ small
- cautious policy: make λ large
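A minimal sketch of one Levenberg-Marquardt update as written above, assuming the matrix A and residual vector B for the current θ are already available; the η and λ values are illustrative:

```python
import numpy as np

def lm_step(theta, A, B, eta=1.0, lam=0.01):
    """One update: theta_next = theta_now + eta * (A^T A + lam*I)^(-1) A^T B.

    lam small -> behaves like Gauss-Newton; lam large -> like steepest descent.
    """
    n = A.shape[1]
    delta = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ B)
    return theta + eta * delta
```

In practice the greedy policy shrinks λ after a successful step (towards Gauss-Newton), while the cautious policy enlarges it after a failed step (towards steepest descent).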
45 Param. ID: Comparisons
- Steepest descent (SD)
- treats all parameters as nonlinear
- Hybrid learning (SD + LSE)
- distinguishes between linear and nonlinear parameters
- Gauss-Newton (GN)
- linearizes and treats all parameters as linear
- Levenberg-Marquardt (LM)
- switches smoothly between SD and GN
46 Structure ID
- Input selection: to select relevant inputs for efficient modeling
- Input space partitioning
- Grid partitioning
- Tree partitioning
- Scatter partitioning (see the sketch below), e.g.
- C-means clustering
- mountain method
- hyperplane clustering
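A rough sketch of scatter partitioning by hard c-means (k-means-style) clustering, one of the options above; fuzzy c-means would additionally maintain a membership matrix, and the data and cluster count here are assumptions:

```python
import numpy as np

def hard_c_means(data, c=3, iters=20, seed=0):
    """Place c cluster centers in the input space; each center can later
    anchor a membership function (scatter partitioning)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=c, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers.
        d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(c)])
    return centers

# Illustrative 2-D input data.
X = np.random.default_rng(1).normal(size=(200, 2))
print(hard_c_means(X))
```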
47 Derivative-Free Optimization
- Genetic algorithms (GAs)
- Simulated annealing (SA)
- Random search
- Downhill simplex search
- Tabu search
48 Genetic Algorithms
- Motivation
- Look at what evolution brings us:
- Vision
- Hearing
- Smelling
- Taste
- Touch
- Learning and reasoning
- Can we emulate the evolutionary process with
today's fast computers?
49 Genetic Algorithms
- Terminology
- Fitness function
- Population
- Encoding schemes
- Selection
- Crossover
- Mutation
- Elitism
50 Genetic Algorithms
- Chromosome encoding: (11, 6, 9) -> 1011 0110 1001 (each gene is a 4-bit binary string)
- Crossover (one-point, crossover point after bit 4):
  parents   1001 1110  and  1011 0010
  offspring 1001 0010  and  1011 1110
- Mutation (flip one randomly chosen bit):
  1001 1110 -> 1001 1010
51 Genetic Algorithms
(Diagram: the current generation of bit-string chromosomes, e.g. 10010110, 01100010, 10100100, ..., is transformed into the next generation by elitism, selection, crossover, and mutation; a sketch of this cycle follows)
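A minimal sketch of one generation of this cycle on bit-string chromosomes; the fitness function (one-max), population size, and rates are assumptions for illustration:

```python
import random

def next_generation(pop, fitness, p_cross=0.8, p_mut=0.01, elite=1):
    """Produce the next generation via elitism, roulette-wheel selection,
    one-point crossover, and bit-flip mutation."""
    scores = [fitness(c) for c in pop]
    # Elitism: carry the best chromosomes over unchanged.
    ranked = [c for _, c in sorted(zip(scores, pop), reverse=True)]
    new_pop = ranked[:elite]
    while len(new_pop) < len(pop):
        # Selection (fitness-proportionate).
        p1, p2 = random.choices(pop, weights=scores, k=2)
        # Crossover at a random point.
        if random.random() < p_cross:
            cut = random.randrange(1, len(p1))
            p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        # Mutation: flip each bit with a small probability.
        new_pop += [''.join(b if random.random() > p_mut else '10'[int(b)]
                            for b in c) for c in (p1, p2)]
    return new_pop[:len(pop)]

# Illustrative fitness: number of 1 bits ("one-max" problem, assumed).
pop = [''.join(random.choice('01') for _ in range(8)) for _ in range(20)]
for _ in range(30):
    pop = next_generation(pop, fitness=lambda c: c.count('1'))
print(max(pop, key=lambda c: c.count('1')))  # tends towards '11111111'
```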
52 Genetic Algorithms
- Example: find the max. of the peaks function
- z = f(x, y) = 3(1-x)^2 exp(-x^2 - (y+1)^2) - 10(x/5 - x^3 - y^5) exp(-x^2 - y^2) - 1/3 exp(-(x+1)^2 - y^2)
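A direct transcription of the peaks function above (the same surface as MATLAB's built-in peaks), as a quick sketch:

```python
import numpy as np

def peaks(x, y):
    """The peaks function z = f(x, y) from the slide."""
    return (3 * (1 - x) ** 2 * np.exp(-x ** 2 - (y + 1) ** 2)
            - 10 * (x / 5 - x ** 3 - y ** 5) * np.exp(-x ** 2 - y ** 2)
            - 1 / 3 * np.exp(-(x + 1) ** 2 - y ** 2))

print(peaks(0.0, 1.6))  # near the global maximum of roughly 8.1
```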
53 Genetic Algorithms
- Derivatives of the peaks function
- dz/dx = -6(1-x)exp(-x^2-(y+1)^2) - 6(1-x)^2 x exp(-x^2-(y+1)^2) - 10(1/5 - 3x^2)exp(-x^2-y^2) + 20(x/5 - x^3 - y^5) x exp(-x^2-y^2) - 1/3(-2x-2)exp(-(x+1)^2-y^2)
- dz/dy = 3(1-x)^2(-2y-2)exp(-x^2-(y+1)^2) + 50y^4 exp(-x^2-y^2) + 20(x/5 - x^3 - y^5) y exp(-x^2-y^2) + 2/3 y exp(-(x+1)^2-y^2)
- d(dz/dx)/dx = 36x exp(-x^2-(y+1)^2) - 18x^2 exp(-x^2-(y+1)^2) - 24x^3 exp(-x^2-(y+1)^2) + 12x^4 exp(-x^2-(y+1)^2) + 72x exp(-x^2-y^2) - 148x^3 exp(-x^2-y^2) - 20y^5 exp(-x^2-y^2) + 40x^5 exp(-x^2-y^2) + 40x^2 y^5 exp(-x^2-y^2) - 2/3 exp(-(x+1)^2-y^2) - 4/3 x^2 exp(-(x+1)^2-y^2) - 8/3 x exp(-(x+1)^2-y^2)
- d(dz/dy)/dy = -6(1-x)^2 exp(-x^2-(y+1)^2) + 3(1-x)^2(-2y-2)^2 exp(-x^2-(y+1)^2) + 200y^3 exp(-x^2-y^2) - 200y^5 exp(-x^2-y^2) + 20(x/5 - x^3 - y^5)exp(-x^2-y^2) - 40(x/5 - x^3 - y^5) y^2 exp(-x^2-y^2) + 2/3 exp(-(x+1)^2-y^2) - 4/3 y^2 exp(-(x+1)^2-y^2)
54 Genetic Algorithms
(Plots: the population on the peaks surface at the initial generation, the 5th generation, and the 10th generation)
55 Genetic Algorithms
56 Simulated Annealing
57 Simulated Annealing
- Terminology
- Objective function E(x): function to be optimized
- Move set: set of next points to explore
- Generating function: to select the next point
- Acceptance function h(ΔE, T): to determine whether the selected point should be accepted or not. Usually h(ΔE, T) = 1/(1 + exp(ΔE/(cT))).
- Annealing (cooling) schedule: schedule for reducing the temperature T
58 Simulated Annealing
- Select a new point x_new in the move set via the generating function
- Compute the objective function E(x_new)
- Set x to x_new with a probability determined by h(ΔE, T)
- Reduce the temperature T
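A minimal sketch of this loop using the acceptance function h(ΔE, T) = 1/(1 + exp(ΔE/(cT))) from the previous slide; the objective function, Gaussian move set, and cooling schedule are assumptions for illustration:

```python
import math, random

def simulated_annealing(E, x0, T0=10.0, cooling=0.99, c=1.0, iters=2000):
    """Minimize E(x) following the slide's loop: generate, evaluate, accept, cool."""
    x, T = x0, T0
    for _ in range(iters):
        x_new = x + random.gauss(0.0, 1.0)                  # generating function (assumed)
        dE = E(x_new) - E(x)
        h = 1.0 / (1.0 + math.exp(min(dE / (c * T), 700)))  # acceptance function
        if random.random() < h:                             # accept with probability h
            x = x_new
        T *= cooling                                        # annealing schedule (assumed)
    return x

# Illustrative objective with several local minima; global minimum near x = -0.3.
print(simulated_annealing(lambda x: x * x + 3 * math.sin(5 * x), x0=8.0))
```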
59 Simulated Annealing
- Example: Traveling Salesperson Problem (TSP)
- How to traverse n cities once and only once with a minimal total distance?
60 Simulated Annealing
(Diagram: move sets for the TSP on a 12-city tour)
- Inversion: reverse a segment of the tour, e.g. 1-2-3-4-5-6-7-8-9-10-11-12 -> 1-2-3-4-5-9-8-7-6-10-11-12
- Switching: swap two cities, e.g. 1-2-11-4-8-7-5-9-6-10-3-12 -> 1-2-3-4-8-7-5-9-6-10-11-12
- Translation: move a segment of the tour to another position
(A sketch of these moves follows.)
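A rough sketch of these move types on a tour stored as a Python list; the indices in the demo calls are chosen to reproduce the inversion and switching examples above:

```python
def inversion(tour, i, j):
    """Reverse the segment tour[i:j]."""
    return tour[:i] + tour[i:j][::-1] + tour[j:]

def switching(tour, i, j):
    """Swap the cities at positions i and j."""
    t = tour[:]
    t[i], t[j] = t[j], t[i]
    return t

def translation(tour, i, j, k):
    """Remove the segment tour[i:j] and reinsert it at position k."""
    seg, rest = tour[i:j], tour[:i] + tour[j:]
    return rest[:k] + seg + rest[k:]

tour = list(range(1, 13))
print(inversion(tour, 5, 9))   # 1-2-3-4-5-9-8-7-6-10-11-12, as on the slide
print(switching([1, 2, 11, 4, 8, 7, 5, 9, 6, 10, 3, 12], 2, 10))
                               # 1-2-3-4-8-7-5-9-6-10-11-12, as on the slide
```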
61 Simulated Annealing
(Plots: the initial random path, an intermediate path during the SA process, and the final path)
62 Simulated Annealing
- 100-city TSP with penalties when crossing the circle
(Plots: resulting tours for penalty = 0, penalty = 0.5, and penalty = -0.3)
63 Conclusions
- Contributing factors to successful applications
of neuro-fuzzy and soft computing - Sensor technologies
- Cheap fast microprocessors
- Modern fast computers
64 References and WWW Resources
- References
- Neuro-Fuzzy and Soft Computing, J.-S. R. Jang, C.-T. Sun, and E. Mizutani, Prentice Hall, 1996
- Neuro-Fuzzy Modeling and Control, J.-S. R. Jang and C.-T. Sun, Proceedings of the IEEE, March 1995
- ANFIS: Adaptive-Network-based Fuzzy Inference Systems, J.-S. R. Jang, IEEE Trans. on Systems, Man, and Cybernetics, May 1993
- Internet resources
- This set of slides is available at http://www.cs.nthu.edu.tw/jang/publication.htm
- WWW resources about neuro-fuzzy and soft computing: http://www.cs.nthu.edu.tw/jang/nfsc.htm