Title: Industrial Applications of Neuro-Fuzzy Networks
1. Industrial Applications of Neuro-Fuzzy Networks
2. Example: Continuously Adapting Gear Shift Schedule in the VW New Beetle
3. Continuously Adapting Gear Shift Schedule: Technical Details
- Mamdani controller with 7 rules
- Optimized program
- 24 byte RAM on Digimat
- 702 byte ROM
- Runtime 80 ms: 12 times per second a new sport factor is assigned
- How to generate knowledge automatically from data?
AG4
4. Learning from Examples (Observations, Databases)
- Statistics: parameter fitting, structure identification, inference methods, model selection
- Machine Learning: computational learning (PAC learning), inductive learning, learning decision trees, concept learning, ...
- Neural Networks: learning from data
- Cluster Analysis: unsupervised classification
⇒ The learning problem is transformed into an optimization problem.
⇒ How can these methods be used in fuzzy systems?
5. Function Approximation with Fuzzy Rules
6. How to Derive a Fuzzy Controller Automatically from Observed Process Data
- Perform a fuzzy cluster analysis of the input-output data (FCM, GK, GG, ...)
- Project the clusters
- Obtain fuzzy rules of the kind "If x is small then y is medium"
7. Fuzzy Cluster Analysis
- Classification of a given data set X = {x1, ..., xn} ⊂ ℝ^p into c clusters.
- Membership degree of datum x_k to cluster i is u_ik.
- Representation of cluster i by a prototype v_i ∈ ℝ^p.
- Formal: minimisation of the functional
  J(X; U, V) = Σ_{i=1..c} Σ_{k=1..n} u_ik^m d²(v_i, x_k)
- under the constraints Σ_{i=1..c} u_ik = 1 for all k, and u_ik ≥ 0.
8. Simplest Algorithm: Fuzzy c-Means (FCM)
Iterative procedure (with random initialisation of the prototypes v_i):
  u_ik = 1 / Σ_{j=1..c} ( d(v_i, x_k) / d(v_j, x_k) )^{2/(m-1)}
and
  v_i = Σ_{k=1..n} u_ik^m x_k / Σ_{k=1..n} u_ik^m
FCM searches for equally large clusters in the form of (hyper-)balls.
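The alternating update scheme above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the tools mentioned later; the function name, the convergence threshold, and the random initialisation from data points are assumptions.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy c-means: alternate membership and prototype updates."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # random initialisation of prototypes with c distinct data points
    V = X[rng.choice(n, size=c, replace=False)].astype(float)
    for _ in range(max_iter):
        # squared Euclidean distances d2[i, k] = ||x_k - v_i||^2
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)              # avoid division by zero
        # membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
        # prototype update: weighted mean with weights u_ik^m
        Um = U ** m
        V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)
        if np.abs(V_new - V).max() < eps:
            V = V_new
            break
        V = V_new
    return U, V
```

For every datum the memberships over the c clusters sum to 1, which is exactly the constraint of the functional on the previous slide.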
9. Examples
10. Fuzzy Cluster Analysis
- Fuzzy c-Means: simple, looks for spherical clusters of the same size, uses the Euclidean distance
- Gustafson-Kessel: looks for hyper-ellipsoidal clusters of the same size, distance via matrices
- Gath-Geva: looks for hyper-ellipsoidal clusters of arbitrary size, distance via matrices
- Axis-parallel variants exist that use diagonal matrices (computationally less expensive, and less loss of information when rules are created)
11. Fuzzy Cluster Analysis with DataEngine
12. Construct Fuzzy Sets by Cluster Projection
Projecting a cluster means projecting the membership degrees of the data onto the single dimensions; histograms are obtained.
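The projection step can be sketched as follows: for one cluster, accumulate the membership mass of the data per bin along a single dimension, yielding a histogram that can then be approximated by a fuzzy set. The function name, the bin count, and the normalisation to [0, 1] are illustrative assumptions.

```python
import numpy as np

def project_cluster(X, u_i, dim, bins=10):
    """Project the memberships u_i of cluster i onto dimension `dim`:
    sum membership degrees per bin, then normalise to [0, 1]."""
    lo, hi = X[:, dim].min(), X[:, dim].max()
    edges = np.linspace(lo, hi, bins + 1)
    idx = np.clip(np.digitize(X[:, dim], edges) - 1, 0, bins - 1)
    hist = np.zeros(bins)
    for k, b in enumerate(idx):
        hist[b] += u_i[k]           # accumulate membership mass per bin
    return edges, hist / hist.max()
```

A triangular or trapezoidal fuzzy set fitted to such a histogram gives the linguistic term ("small", "medium", ...) used in the projected rule.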
13. FCLUSTER: A Tool for Fuzzy Cluster Analysis
14. Introduction
- Building a fuzzy system requires
  - prior knowledge (fuzzy rules, fuzzy sets)
  - manual tuning, which is time-consuming and error-prone
- Therefore: support this process by learning
  - learning fuzzy rules (structure learning)
  - learning fuzzy sets (parameter learning)
Approaches from neural networks can be used.
15. Learning Fuzzy Sets: Problems in Control
- Reinforcement learning must be used to compute an error value (note: the correct output is unknown)
- After an error has been computed, any fuzzy set learning procedure can be used
- Example GARIC (Berenji/Khedkar 1992): online approximation to gradient descent
- Example NEFCON (Nauck/Kruse 1993): online heuristic fuzzy set learning using a rule-based fuzzy error measure
17. Neuro-Fuzzy Systems in Data Analysis
- Fuzzy system
  - System of linguistic rules (fuzzy rules).
  - Not rules in a logical sense, but function approximation.
  - A fuzzy rule is a vague prototype / sample.
- Neuro-fuzzy system
  - Adds a learning algorithm inspired by neural networks.
  - Feature: local adaptation of parameters.
18. Example: Prognosis of the Daily Proportional Changes of the DAX at the Frankfurt Stock Exchange (Siemens)
- Database: time series from 1986-1997
19. Fuzzy Rules in Finance
- Trend rule: IF DAX decreasing AND US-$ decreasing THEN DAX prediction decrease WITH high certainty
- Turning point rule: IF DAX decreasing AND US-$ increasing THEN DAX prediction increase WITH low certainty
- Delay rule: IF DAX stable AND US-$ decreasing THEN DAX prediction decrease WITH very high certainty
- In general: IF x1 is m1 AND x2 is m2 THEN y = h WITH weight k
20. Classical Probabilistic Expert Opinion Pooling Method
- The decision maker (DM) analyzes each source (human expert, data forecasting model) in terms of (1) statistical accuracy and (2) informativeness, by asking the source to assess quantities (quantile assessment)
- DM obtains a weight for each source
- DM eliminates bad sources
- DM determines the weighted sum of the source outputs
- Determination of the return on investment
21.
- E experts, R quantiles for N quantities
- ⇒ each expert has to assess R·N values
- statistical accuracy
- information score
- weight for expert e
- output
- ROI (return on investment)
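The final pooling step (a weighted sum of the source outputs, normalised by the weight sum) can be sketched as follows; the function name and the normalisation are illustrative assumptions, since the slide's formulas were lost.

```python
def pooled_output(weights, outputs):
    """DM's pooled forecast: normalised weighted sum of source outputs.
    `weights` are the per-source weights from the accuracy/informativeness
    analysis; `outputs` are the sources' forecasts."""
    s = sum(weights)
    return sum(w * o for w, o in zip(weights, outputs)) / s
```

Eliminating a bad source simply corresponds to setting its weight to zero before pooling.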
22. Formal Analysis
- Sources of information: R1 = rule set given by expert 1, R2 = rule set given by expert 2, D = data set (time series)
- Operator schema: fuse(R1, R2): fuse two rule sets; induce(D): induce a rule set from D; revise(R, D): revise a rule set R by D
23. Formal Analysis
- Strategies
  - fuse(fuse(R1, R2), induce(D))
  - revise(fuse(R1, R2), D)
  - fuse(revise(R1, D), revise(R2, D))
- Technique: neuro-fuzzy systems
- Nauck, Klawonn, Kruse: Foundations of Neuro-Fuzzy Systems, Wiley 1997
- SENN (commercial neural network environment, Siemens)
24. From Rules to Neural Networks
1. Evaluation of membership degrees
2. Evaluation of rules (rule activity)
3. Accumulation of rule outputs and normalization
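The three evaluation steps above, applied to weighted rules of the form "IF x1 is m1 AND x2 is m2 THEN y = h WITH weight k" from the finance slide, can be sketched as a small forward pass. The min t-norm for conjunction and the triangular membership functions are assumptions for the sketch.

```python
def tri(a, b, c):
    """Triangular membership function with support [a, c] and peak b."""
    def mf(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mf

def infer(x1, x2, rules):
    """rules: list of ((mf1, mf2), h, k).
    Step 1: membership degrees; step 2: rule activity (min t-norm,
    scaled by the rule weight k); step 3: weighted accumulation of the
    rule outputs h and normalisation."""
    num = den = 0.0
    for (mf1, mf2), h, k in rules:
        act = k * min(mf1(x1), mf2(x2))
        num += act * h
        den += act
    return num / den if den > 0 else 0.0
```

Each step maps to one layer of the 3-layer network view described later: fuzzy sets as input weights, rule activity as hidden units, and the normalised sum as the output unit.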
25. Neuro-Fuzzy Architecture
26. The Semantics-Preserving Learning Algorithm
Reduction of the dimension of the weight space:
1. Membership functions of different inputs share their parameters.
2. Membership functions of the same input variable are not allowed to pass each other; they must keep their original order.
Benefits:
⇒ the optimized rule base can still be interpreted
⇒ the number of free parameters is reduced
27. Return-on-Investment Curves of the Different Models
Validation data from March 1, 1994 until April 1997
28. A Neuro-Fuzzy System
- is a fuzzy system trained by heuristic learning techniques derived from neural networks
- can be viewed as a 3-layer neural network with fuzzy weights and special activation functions
- is always interpretable as a fuzzy system
- uses constrained learning procedures
- is a function approximator (classifier, controller)
29. Learning Fuzzy Rules
- Cluster-oriented approaches: find clusters in the data; each cluster is a rule
- Hyperbox-oriented approaches: find clusters in the form of hyperboxes
- Structure-oriented approaches: use predefined fuzzy sets to structure the data space, pick rules from grid cells
30. Hyperbox-Oriented Rule Learning
Search for hyperboxes in the data space. Create fuzzy rules by projecting the hyperboxes. Fuzzy rules and fuzzy sets are created at the same time. Usually very fast.
31. Hyperbox-Oriented Rule Learning
- Detect hyperboxes in the data (example: XOR function)
- Advantages over fuzzy cluster analysis:
  - No loss of information when hyperboxes are represented as fuzzy rules
  - Not all variables need to be used; "don't care" variables can be discovered
- Disadvantage: each fuzzy rule uses individual fuzzy sets, i.e. the rule base is complex.
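The lossless projection claimed above can be sketched as follows: each axis interval of the hyperbox becomes the core of a trapezoidal fuzzy set, so the box is recovered exactly as the core of the rule's antecedent. The function name and the fixed support margin are illustrative assumptions.

```python
def hyperbox_to_rule(box_min, box_max, margin=0.5):
    """Project a hyperbox onto each axis as a trapezoid (a, b, c, d):
    the core [b, c] is exactly the box edge (no information loss);
    the support is widened by `margin` to make the set fuzzy."""
    return [(lo - margin, lo, hi, hi + margin)
            for lo, hi in zip(box_min, box_max)]
```

Because every rule gets its own trapezoids this way, the fuzzy sets are not shared between rules, which is precisely the complexity disadvantage noted above.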
32. Structure-Oriented Rule Learning
Provide initial fuzzy sets for all variables. The data space is partitioned by a fuzzy grid. Detect all grid cells that contain data (approach by Wang/Mendel 1992). Compute the best consequents and select the best rules (extension by Nauck/Kruse 1995, NEFCLASS model).
33. Structure-Oriented Rule Learning
- Simple: a rule base is available after two cycles through the training data
  - 1st cycle: discover all antecedents
  - 2nd cycle: determine the best consequents
- Missing values can be handled
- Numeric and symbolic attributes can be processed at the same time (mixed fuzzy rules)
- Advantage: all rules share the same fuzzy sets
- Disadvantage: fuzzy sets must be given
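The two-cycle idea can be sketched in the spirit of the Wang/Mendel procedure. This is a simplified illustration, not the NEFCLASS implementation: the function name, the min t-norm for the degree of fulfilment, and choosing the consequent by accumulated degree per class are assumptions.

```python
from collections import defaultdict

def induce_rules(X, y, fuzzy_sets):
    """fuzzy_sets: per variable, a list of (label, membership_fn) pairs
    forming the predefined fuzzy grid.
    Cycle 1: per pattern, find the best-matching antecedent (grid cell).
    Cycle 2: per antecedent, pick the consequent class with the highest
    accumulated degree of fulfilment."""
    votes = defaultdict(lambda: defaultdict(float))
    for x, cls in zip(X, y):
        # best label per variable -> the grid cell the pattern falls into
        best = [max(sets, key=lambda s: s[1](v))
                for v, sets in zip(x, fuzzy_sets)]
        antecedent = tuple(label for label, _ in best)
        degree = min(mf(v) for (_, mf), v in zip(best, x))
        votes[antecedent][cls] += degree
    return {a: max(c, key=c.get) for a, c in votes.items()}
```

All rules index into the same shared fuzzy sets, which is the advantage over the hyperbox approach noted above.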
34. Learning Fuzzy Sets
- Gradient descent procedures: only applicable if differentiation is possible, e.g. for Sugeno-type fuzzy systems.
- Special heuristic procedures that do not use gradient information.
- The learning algorithms are based on the idea of backpropagation.
35. Learning Fuzzy Sets: Constraints
- Mandatory constraints
  - Fuzzy sets must stay normal and convex
  - Fuzzy sets must not exchange their relative positions (they must not pass each other)
  - Fuzzy sets must always overlap
- Optional constraints
  - Fuzzy sets must stay symmetric
  - Degrees of membership must add up to 1.0
- The learning algorithm must enforce these constraints.
36. Different Neuro-Fuzzy Approaches
- ANFIS (Jang, 1993): no rule learning, gradient descent fuzzy set learning, function approximator
- GARIC (Berenji/Khedkar, 1992): no rule learning, gradient descent fuzzy set learning, controller
- NEFCON (Nauck/Kruse, 1993): structure-oriented rule learning, heuristic fuzzy set learning, controller
- FuNe (Halgamuge/Glesner, 1994): combinatorial rule learning, gradient descent fuzzy set learning, classifier
- Fuzzy RuleNet (Tschichold-Gürman, 1995): hyperbox-oriented rule learning, no fuzzy set learning, classifier
- NEFCLASS (Nauck/Kruse, 1995): structure-oriented rule learning, heuristic fuzzy set learning, classifier
- Learning Fuzzy Graphs (Berthold/Huber, 1997): hyperbox-oriented rule learning, no fuzzy set learning, function approximator
- NEFPROX (Nauck/Kruse, 1997): structure-oriented rule learning, heuristic fuzzy set learning, function approximator
37. Example: Medical Diagnosis
- Results from patients tested for breast cancer (Wisconsin Breast Cancer Data).
- Decision support: do the data indicate a malignant or a benign case?
- A surgeon must be able to check the classification for plausibility.
- We are looking for a simple and interpretable classifier ⇒ knowledge discovery.
38. Example: WBC Data Set
- 699 cases (16 cases have missing values).
- 2 classes: benign (458), malignant (241).
- 9 attributes with values from {1, ..., 10} (ordinal scale, but usually interpreted as a numerical scale).
- Experiment: x3 and x6 are interpreted as nominal attributes.
- x3 and x6 are usually seen as important attributes.
39. Applying NEFCLASS-J
- Tool for developing neuro-fuzzy classifiers
- Written in Java
- Free version for research available
- Project started at the Neuro-Fuzzy Group of the University of Magdeburg, Germany
40. NEFCLASS: Neuro-Fuzzy Classifier
41. NEFCLASS Features
- Automatic induction of a fuzzy rule base from data
- Training of several forms of fuzzy sets
- Processing of numeric and symbolic attributes
- Treatment of missing values (no imputation)
- Automatic pruning strategies
- Fusion of expert knowledge and knowledge obtained from data
42. Representation of Fuzzy Rules
Example with 2 rules:
R1: if x is large and y is small, then class is c1.
R2: if x is large and y is large, then class is c2.
The connections x → R1 and x → R2 are linked. The fuzzy set "large" is a shared weight, i.e. the term "large" always has the same meaning in both rules.
43. 1st Training Step: Initialisation
Specify initial fuzzy partitions for all input variables.
44. 2nd Training Step: Rule Base
Algorithm:
for (all patterns p) do
  find antecedent A such that A(p) is maximal
  if (A ∉ L) then add A to L
end
for (all antecedents A ∈ L) do
  find the best consequent C for A
  create rule base candidate R = (A, C)
  determine the performance of R
  add R to B
end
Select a rule base from B.
Variations: fuzzy rule bases can also be created by using prior knowledge, fuzzy cluster analysis, fuzzy decision trees, genetic algorithms, ...
45. Selection of a Rule Base
- Order rules by performance.
- Either select the best r rules or the best r/m rules per class.
- r is either given or determined automatically such that all patterns are covered.
46. Rule Base Induction
NEFCLASS uses a modified Wang-Mendel procedure.
47. Computing the Error Signal
Fuzzy error (jth output)
Rule error
48. 3rd Training Step: Fuzzy Sets
Example: triangular membership function. Parameter updates for an antecedent fuzzy set.
49. Training of Fuzzy Sets
Heuristics: a fuzzy set is moved away from x (or towards x) and its support is reduced (or enlarged), in order to reduce (or enlarge) the degree of membership of x.
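For a triangular fuzzy set (a, b, c) this heuristic can be sketched as below. The exact update rule of NEFCLASS differs in detail; the signed error convention (positive error means the membership of x should grow), the learning rate, and the symmetric widening are assumptions of this sketch.

```python
def update_triangle(a, b, c, x, err, lr=0.1):
    """Heuristic update of a triangular fuzzy set (a, b, c).
    err > 0: membership of x should grow -> shift the peak towards x
             and enlarge the support.
    err < 0: membership of x should shrink -> shift the peak away
             from x and reduce the support."""
    shift = lr * err * (x - b)           # sign of err flips the direction
    widen = lr * err * (c - a) / 2.0     # scale the support
    return a + shift - widen, b + shift, c + shift + widen
```

After each such step the constraints of the next slide (order, overlap, validity) still have to be enforced.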
50. Training of Fuzzy Sets
- Variations
  - Adaptive learning rate
  - Online / batch learning
  - Optimistic learning (n-step look-ahead)
- Observing the error on a validation set
51. Constraints for Training Fuzzy Sets
- Valid parameter values
- Non-empty intersection of adjacent fuzzy sets
- Keep relative positions
- Maintain symmetry
- Complete coverage (degrees of membership add up to 1 for each element)
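A constrained learning procedure can enforce the mandatory constraints by repairing the parameters after every update step. The following sketch assumes triangular fuzzy sets (a, b, c) given in peak order; the repair strategy (clamping rather than rejecting the step) is an assumption.

```python
def enforce_constraints(sets):
    """Repair a list of triangular fuzzy sets (a, b, c), sorted by peak:
    1. valid parameters (a <= b <= c),
    2. peaks must not pass each other (keep relative positions),
    3. adjacent sets must keep a non-empty overlap."""
    fixed = [[min(a, b), b, max(b, c)] for a, b, c in sets]
    for i in range(1, len(fixed)):
        if fixed[i][1] < fixed[i - 1][1]:      # peaks must not pass
            fixed[i][1] = fixed[i - 1][1]
        if fixed[i][0] > fixed[i - 1][2]:      # keep overlap: left edge
            fixed[i][0] = fixed[i - 1][2]      # meets predecessor's right
    return [tuple(s) for s in fixed]
```

Clamping keeps the semantics of the rule base intact ("small" stays left of "large"), which is exactly why the conclusions later stress constraint observance.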
52. 4th Training Step: Pruning
Goal: remove variables, rules and fuzzy sets in order to improve interpretability and generalisation.
53. Pruning
Algorithm:
repeat
  select pruning method
  repeat
    execute pruning step
    train fuzzy sets
    if (no improvement) then undo step
  until (no improvement)
until (no further method)
Pruning methods:
1. Remove variables (use correlations, information gain, etc.)
2. Remove rules (use rule performance)
3. Remove terms (use degree of fulfilment)
4. Remove fuzzy sets (use fuzziness)
54. WBC Learning Result: Fuzzy Rules
R1: if uniformity of cell size is small and bare nuclei is fuzzy0 then benign
R2: if uniformity of cell size is large then malignant
55. WBC Learning Result: Classification Performance
Estimated performance on unseen data (cross-validation):
- NEFCLASS-J: 95.42 %
- Discriminant Analysis: 96.05 %
- C 4.5: 95.10 %
- NEFCLASS-J (numeric): 94.14 %
- Multilayer Perceptron: 94.82 %
- C 4.5 Rules: 95.40 %
56. WBC Learning Result: Fuzzy Sets
57. NEFCLASS-J
58. Resources
Detlef Nauck, Frank Klawonn, Rudolf Kruse: Foundations of Neuro-Fuzzy Systems. Wiley, Chichester, 1997, ISBN 0-471-97151-0
Neuro-fuzzy software (NEFCLASS, NEFCON, NEFPROX): http://www.neuro-fuzzy.de
Beta version of NEFCLASS-J: http://www.neuro-fuzzy.de/nefclass/nefclassj
59. Conclusions
- Neuro-fuzzy systems can be useful for knowledge discovery.
- Interpretability enables plausibility checks and improves acceptance.
- (Neuro-)fuzzy systems exploit the tolerance for sub-optimal solutions.
- Neuro-fuzzy learning algorithms must observe constraints in order not to jeopardise the semantics of the model.
- They are not automatic model creators; the user must work with the tool.
- Simple learning techniques support explorative data analysis.
60. Download NEFCLASS-J
Download the free version of NEFCLASS-J at http://fuzzy.cs.uni-magdeburg.de
61. Fuzzy Methods in Information Mining: Examples
- Here: exploiting quantitative and qualitative information
- Fuzzy data analysis (projects with Siemens)
- Information fusion (EC project)
- Dependency analysis (project with Daimler/Chrysler)
62. Analysis of a Daimler/Chrysler Database
- Database: 18,500 passenger cars, > 100 attributes per car
- Analysis of dependencies between special equipment and faults.
- Results are used as a starting point for technical experts looking for causes.
63. Learning Graphical Models
Local models
64. The Learning Problem
65. Possibility Theory
- A fuzzy set induces a possibility distribution
66. General Structure of (Most) Learning Algorithms for Graphical Models
- Use a criterion to measure the degree to which a network structure fits the data and the prior knowledge (model selection, goodness of the hypergraph)
- Use a search algorithm to find a model that receives a high score by the criterion (optimal spanning tree, K2 greedy selection of parents, ...)
67. Measuring the Deviation from an Independent Distribution
Probability- and information-based measures:
- information gain (identical with mutual information)
- information gain ratio
- g-function (Cooper and Herskovits)
- minimum description length
- Gini index
Possibilistic measures:
- expected nonspecificity
- specificity gain
- specificity gain ratio
(Some of these measures, e.g. information gain, the gain ratio, and the Gini index, originated from decision tree learning.)
68. Data Mining Tool: Clementine
69. Analysis of the Daimler/Chrysler Database
(Dependency graph with nodes: electrical roof top, air conditioning, type of engine, type of tyres, slippage control, faulty battery, faulty compressor, faulty brakes.)
Fictitious example: there are significantly more faulty batteries if both air conditioning and an electrical roof top are built into the car.