Literal and ProRulext: Algorithms for Rule Extraction of ANNs

1
Literal and ProRulext: Algorithms for Rule
Extraction of ANNs

Paulemir G. Campos, Teresa B. Ludermir
E-mail: {pgc, tbl}@cin.ufpe.br

2
Presentation Summary

1. Introduction
2. Literal and ProRulext
3. Experiments
4. Results
5. Discussions
6. Conclusions
Acknowledgements
References

3
1. Introduction

Main Features of Artificial Neural Networks
(ANNs)
Excellent capacity for generalization
They have been applied successfully to solve
several real-world problems
They represent domain knowledge in their topology,
weight values and biases
However, a clear explanation of their answers is
not readily available (main deficiency).

4
1. Introduction

Usually this deficiency can be minimized through
IF/THEN rule extraction from the trained
network (ANN Rule Extraction).
However, there are other hybrid models for this
purpose, such as Evolutionary Algorithms and
Neuro-Fuzzy Systems.

5
1. Introduction

This paper presents two algorithms for extraction
of rules from trained networks: Literal and
ProRulext.
Literal's distinguishing feature is its
portability.
ProRulext has a relatively low computational
cost when extracting rules from feedforward MLP
networks with one hidden layer.

6
2. Literal and ProRulext

Literal
A very simple algorithm proposed for the
extraction of IF-THEN propositional rules from
trained networks applied to pattern
classification and time series forecasting
problems
The rules are extracted through a literal mapping
of the network inputs and outputs
This approach is a Pedagogical Technique (Andrews
et al. [2] taxonomy).

7
2. Literal and ProRulext

Overview of the Literal Algorithm
1. Discretize the network inputs and outputs
into intervals of equal width
2. Normalize the patterns of the network training
set to values within [0, 1] or [-1, 1]
3. Present each of these normalized input
patterns to the trained network, obtaining the
respective rule consequents

8
2. Literal and ProRulext

Overview of the Literal Algorithm (continued)
4. De-normalize the rule antecedents and
consequents previously obtained back to the
original values of the database
5. Store the new rules created in the previous
steps in a file
6. For each rule conclusion, select the input
attribute with the most frequent values

9
2. Literal and ProRulext

Overview of the Literal Algorithm (continued)
7. Eliminate the other attributes of each one of
these rules, guaranteeing more general rules
8. Eliminate the redundant rules that can be
obtained after the execution of steps 6 and 7

10
2. Literal and ProRulext

Overview of the Literal Algorithm (continued)
9. Calculate, per conclusion, the coverage of the
training set by each resulting rule, based on
the number of activations of these rules
10. Exclude the rules with 0 coverage of the
patterns used to train the network from
which the rules were originally extracted.
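
The mapping part of the Literal overview (steps 1 to 5) can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: the function names, the `network` callable (assumed to take a list of inputs normalized to [0, 1] and return a single output in [0, 1]), and the representation of a rule as a tuple of interval indices are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical rule format: one interval index per input attribute,
# plus the interval index of the network output.
Rule = Tuple[Tuple[int, ...], int]


def to_interval(value: float, low: float, high: float, n_bins: int) -> int:
    """Index of the equal-width interval of [low, high] that contains value."""
    if high == low:
        return 0
    index = int((value - low) / (high - low) * n_bins)
    # Clamp so that value == high falls into the last interval.
    return min(max(index, 0), n_bins - 1)


def literal_rules(
    patterns: Sequence[Sequence[float]],
    in_ranges: Sequence[Tuple[float, float]],
    out_range: Tuple[float, float],
    network: Callable[[List[float]], float],
    n_bins: int = 2,
) -> List[Rule]:
    """Map each training pattern to an IF-THEN rule by querying the network."""
    out_low, out_high = out_range
    rules: List[Rule] = []
    for pattern in patterns:
        # Step 2: normalize the inputs to [0, 1].
        scaled = [
            (v - lo) / (hi - lo) if hi != lo else 0.0
            for v, (lo, hi) in zip(pattern, in_ranges)
        ]
        # Step 3: the network answer becomes the rule consequent.
        y = network(scaled)
        # Step 4: express antecedent and consequent in original units,
        # then (step 1) as equal-width interval indices.
        y_original = out_low + y * (out_high - out_low)
        antecedent = tuple(
            to_interval(v, lo, hi, n_bins)
            for v, (lo, hi) in zip(pattern, in_ranges)
        )
        consequent = to_interval(y_original, out_low, out_high, n_bins)
        # Step 5: store the rule (in memory here, skipping exact repeats).
        rule = (antecedent, consequent)
        if rule not in rules:
            rules.append(rule)
    return rules
```

Because the sketch only calls `network` as a black box, it does not depend on the network architecture, which is in line with the pedagogical classification of Literal.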

11
2. Literal and ProRulext

ProRulext
The other algorithm proposed in this paper for
the extraction of IF-THEN propositional rules
from MLP networks with one hidden layer trained
for pattern classification and time series
forecasting

12
2. Literal and ProRulext

ProRulext (continued)
The rules are extracted by using a
decompositional method to obtain their antecedents
and by applying a pedagogical approach to
determine the consequents
This approach is an Eclectic Technique (Andrews et
al. [2] taxonomy).

13
2. Literal and ProRulext

Overview of the ProRulext Algorithm
1. Discretize the network inputs and
outputs into intervals of equal width
2. Normalize the network input and output
patterns of the training set to values within
[0, 1] or [-1, 1]
3. Present each one of these input patterns to
the trained network

14
2. Literal and ProRulext

Overview of the ProRulext Algorithm
(continued)
4. Build the AND/OR graph of the trained network
considering only its positive weights
5. Determine the antecedents of the rules through
the decompositional method
6. Apply a pedagogical approach to find the
consequents of these rules

15
2. Literal and ProRulext

Overview of the ProRulext Algorithm
(continued)
7. De-normalize the rule antecedents and
consequents previously obtained back to the
original values of the database
8. Store the new rules created in the previous
step in a file
9. For each rule conclusion, select the input
attribute with the most frequent values

16
2. Literal and ProRulext

Overview of the ProRulext Algorithm
(continued)
10. Eliminate the other attributes of each one of
these rules, guaranteeing more general rules
11. Eliminate the redundant rules which can be
obtained through the execution of steps 9 and 10

17
2. Literal and ProRulext

Overview of the ProRulext Algorithm
(continued)
12. Calculate, per conclusion, the coverage of the
training set by each resulting rule, based on
the number of activations of these rules
13. Remove the rules with 0 coverage of the
patterns used to train the network from
which the rules were originally extracted.
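
Step 4 of the ProRulext overview keeps only the positive weights when building the graph of the trained network. The slides do not describe the graph construction in detail, so the following is only a minimal sketch of that filtering idea for an MLP with one hidden layer. The weight layout (`w_in_hidden[j][i]` for input i to hidden unit j, `w_hidden_out[k][j]` for hidden unit j to output k) and both function names are assumptions, and the actual AND/OR semantics used by ProRulext are not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple


def positive_weight_graph(
    w_in_hidden: Sequence[Sequence[float]],
    w_hidden_out: Sequence[Sequence[float]],
) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Keep only connections with strictly positive weight.

    Returns two adjacency maps:
      hidden unit index -> input indices feeding it with positive weight
      output unit index -> hidden indices feeding it with positive weight
    """
    hidden_parents = {
        j: [i for i, w in enumerate(row) if w > 0]
        for j, row in enumerate(w_in_hidden)
    }
    output_parents = {
        k: [j for j, w in enumerate(row) if w > 0]
        for k, row in enumerate(w_hidden_out)
    }
    return hidden_parents, output_parents


def candidate_inputs(
    output: int,
    hidden_parents: Dict[int, List[int]],
    output_parents: Dict[int, List[int]],
) -> List[int]:
    """Inputs reachable from an output through positive weights only."""
    reachable = set()
    for hidden in output_parents[output]:
        reachable.update(hidden_parents[hidden])
    return sorted(reachable)
```

In this sketch, `candidate_inputs` simply lists which input attributes could appear in the antecedent of a rule for a given output; how they are combined into antecedents is left to the decompositional method of step 5.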

18
2. Literal and ProRulext

It is worth emphasizing that both algorithms
presented have rule simplification stages (the
last five steps of Literal and ProRulext).
This ensures that concise and legible rules are
obtained from networks trained for pattern
classification and time series forecasting.
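
The shared simplification stages can be sketched as follows. The slides state the steps only briefly, so this is one possible reading rather than the authors' procedure: the attribute kept for a conclusion is assumed to be the one whose single value occurs most often among the rules with that conclusion, and coverage is assumed to be the fraction of training patterns that match both the antecedent and the conclusion of a rule. The data layout and the name `simplify_rules` are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Antecedent = Dict[str, Hashable]                 # attribute -> discretized value
Rule = Tuple[Antecedent, Hashable]               # (antecedent, conclusion)
Pattern = Tuple[Dict[str, Hashable], Hashable]   # (discretized inputs, conclusion)
SimpleRule = Tuple[Tuple[str, Hashable], Hashable, float]


def simplify_rules(rules: Sequence[Rule], patterns: Sequence[Pattern]) -> List[SimpleRule]:
    # Select, per conclusion, the attribute with the most frequent value
    # (assumed interpretation of the attribute selection step).
    best_attr: Dict[Hashable, str] = {}
    for conclusion in {c for _, c in rules}:
        counts = Counter(
            (attr, value)
            for antecedent, c in rules
            if c == conclusion
            for attr, value in antecedent.items()
        )
        (attr, _value), _count = counts.most_common(1)[0]
        best_attr[conclusion] = attr

    # Eliminate the other attributes from every rule.
    reduced = [((best_attr[c], ant[best_attr[c]]), c) for ant, c in rules]

    # Eliminate the redundant rules produced by the reduction, keeping order.
    unique = list(dict.fromkeys(reduced))

    # Compute coverage on the training patterns and drop rules with 0 coverage.
    simplified: List[SimpleRule] = []
    for (attr, value), conclusion in unique:
        hits = sum(
            1
            for inputs, target in patterns
            if inputs.get(attr) == value and target == conclusion
        )
        coverage = hits / len(patterns) if patterns else 0.0
        if coverage > 0:
            simplified.append(((attr, value), conclusion, coverage))
    return simplified
```

With two discretization intervals per attribute, as used in the experiments, each simplified rule in this sketch reduces to a single condition of the form "IF attribute is in interval v THEN conclusion".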

19
3. Experiments

The trained networks and the respective sets of
rules have been generated through the AHES
(Applied Hybrid Expert System) version 1.2.1.5
[4].

20
3. Experiments

The models implemented in the AHES are
feedforward MLP networks with one hidden layer
and the rule extraction techniques BIO-RE [11],
Geometrical [7], NeuroLinear [10], Literal [5]
and ProRulext [4].

21
3. Experiments - Databases

For the pattern classification problem, a
Breast Cancer database from the Proben1
repository [6] is used.
This database contains 699 cases, of which 458 are
related to benign Breast Cancer and 241 to
malignant Breast Cancer, each one with 10
attributes plus the Breast Cancer class.

22
3. Experiments - Databases

For the time series forecasting problem, a
database with the São Paulo Stock Exchange Index
(IBOVESPA) [6] is used.
The series predicted in this work is the series
of minimum values, with a total of 584 patterns.

23
3. Experiments - Databases

Before the experiments, these databases went
through pre-processing stages [6].
As a result, the Breast Cancer database was
reduced to 457 cases, 219 benign and 238 malignant.
For the IBOVESPA database, the time window size
was set to two and the number of patterns
became 582.

24
3. Experiments - Databases

Furthermore, the databases have been normalized
to values in the interval [0, 1] or
[-1, 1] (depending on the activation function
used) before the training and rule
extraction stages for each trained network.
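
A minimal sketch of this kind of normalization and its inverse, assuming plain min-max scaling (the slides do not name the scaling method). The function names and the `target` parameter are illustrative; presumably [0, 1] goes with the sigmoid and [-1, 1] with the hyperbolic tangent, matching the output range of each function.

```python
from typing import Tuple

Interval = Tuple[float, float]


def normalize(value: float, low: float, high: float, target: Interval = (0.0, 1.0)) -> float:
    """Min-max scale value from [low, high] into the target interval."""
    t_low, t_high = target
    if high == low:
        return t_low  # degenerate attribute: every value maps to one point
    return t_low + (value - low) * (t_high - t_low) / (high - low)


def denormalize(scaled: float, low: float, high: float, target: Interval = (0.0, 1.0)) -> float:
    """Inverse of normalize: map a scaled value back to the original units."""
    t_low, t_high = target
    return low + (scaled - t_low) * (high - low) / (t_high - t_low)
```

The inverse mapping is what the de-normalization step of both algorithms needs in order to express rule antecedents and consequents in the original units of the database.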

25
3. Experiments - The Trained Networks

The MLP networks have been trained according to
the Holdout methodology.
Thus, each training set contains 2/3 of the total
normalized input and output patterns, and each
test set has the remaining 1/3 of the patterns.
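
A holdout split in these proportions can be sketched as below. The slides do not say whether the patterns were shuffled or how a non-integer 2/3 boundary was rounded, so both choices here are assumptions: order is preserved unless a seed is given, and the training size is rounded to the nearest integer. The name `holdout_split` is hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(
    patterns: Sequence[T],
    train_fraction: float = 2 / 3,
    seed: Optional[int] = None,
) -> Tuple[List[T], List[T]]:
    """Split patterns into a training part and a test part."""
    items = list(patterns)
    if seed is not None:
        # Optional shuffling; for a time series, keeping the original
        # order (seed=None) is usually the safer default.
        random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_fraction)
    return items[:n_train], items[n_train:]
```

For example, `holdout_split(list(range(9)))` puts the first six items in the training set and the last three in the test set.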

26
3. Experiments - The Trained Networks

Fixed parameters during the training stage of the
networks obtained with the Breast Cancer
database
Weight adjustment per epoch (batch)
Fixed initial weights chosen among values
within the interval [-0.1, 0.1]
Momentum term equal to 0.1, number of epochs
equal to 100 and desired maximum output error
equal to 0.01.

27
3. Experiments - The Trained Networks

Fixed parameters during the training stage of the
networks obtained with the IBOVESPA database
Weight adjustment per pattern (on-line)
Fixed initial weights chosen among values
within the interval [-0.1, 0.1]
No momentum term, number of epochs equal to
100 and desired maximum output error equal to
0.01.

28
3. Experiments - The Trained Networks

Variable parameters during the training stage of
the networks obtained with the Breast Cancer and
IBOVESPA databases
Number of units in the hidden layer (1, 3 and 5)
Learning rate (0.1, 0.5 and 0.9)
Use of bias or not
Type of non-linear activation function
(sigmoid and hyperbolic tangent).
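
If every combination of these variable parameters is tried, the search space has 3 x 3 x 2 x 2 = 36 configurations per database. The sketch below only enumerates that grid; the slides do not state that all combinations were trained, and the dictionary keys and function name are hypothetical.

```python
from itertools import product
from typing import Dict, List, Union

HIDDEN_UNITS = (1, 3, 5)
LEARNING_RATES = (0.1, 0.5, 0.9)
USE_BIAS = (True, False)
ACTIVATIONS = ("sigmoid", "tanh")

Config = Dict[str, Union[int, float, bool, str]]


def training_configurations() -> List[Config]:
    """Enumerate every combination of the variable training parameters."""
    return [
        {
            "hidden_units": hidden,
            "learning_rate": rate,
            "bias": bias,
            "activation": activation,
        }
        for hidden, rate, bias, activation in product(
            HIDDEN_UNITS, LEARNING_RATES, USE_BIAS, ACTIVATIONS
        )
    ]
```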

29
3. Experiments - The Trained Networks

Trained networks selected using the Breast Cancer
database

where CM1 = Network CM_Tan_NE9_Bias_4,
CM2 = Network CM_Sig_NE9_Bias_1.

30
3. Experiments - The Trained Networks

Trained networks selected using the IBOVESPA
database

where IB1 = Network IBOVESPA_Sig_Bias_2,
IB2 = Network IBOVESPA_Tan_4,
MAE = Mean Absolute Error.

31
3. Experiments - Extracting Rules

ProRulext algorithm
Limits of the IF part using the two databases:
0.1, 0.5 and 0.9
Limits of the THEN part using the Breast Cancer
database: 0.1, 0.5 and 0.9
Limits of the THEN part using the IBOVESPA
database: 0.1, 0.5 and 0.8, because no rule
was obtained with 0.9.

32
3. Experiments - Extracting Rules

Literal and ProRulext Algorithms
Number of intervals used to discretize the
numerical input and output attributes of the two
databases: 2 (two)
This was chosen to obtain rule sets as compact as
possible.

33
3. Experiments - Extracting Rules

Examples of rules extracted by Literal from the
CM2 network (Breast Cancer)

34
3. Experiments - Extracting Rules

Examples of rules extracted by ProRulext from the
IB1 network (IBOVESPA)

35
3. Experiments - Extracting Rules

Rule sets were also obtained with the
BIO-RE (Bio) [11], Geometrical (Geo) [7] and
NeuroLinear (Neuro) [10] techniques.
This was done to compare the results
obtained with these techniques with those
of Literal and ProRulext.

36
4. Results

The best results of the rule sets extracted from
networks trained with the Breast Cancer database

where Sig = Sigmoid, Tan = Hyperbolic Tangent,
Irr = not relevant (Sig or Tan)

37
4. Results

The best results of the rule sets extracted from
networks trained with the IBOVESPA database

where Sig = Sigmoid, Tan = Hyperbolic Tangent,
Irr = not relevant (Sig or Tan)

38
5. Discussions

The results using the Breast Cancer database
indicate that the BIO-RE technique [11] obtained
the most concise, comprehensible and faithful
rule sets, because the antecedents of the rules
extracted by the Geometrical approach [7] are
hidden units, which harms their legibility.

39
5. Discussions

The Literal and ProRulext algorithms achieved
performance comparable to that of the
NeuroLinear technique, which is mainly known
for extracting very faithful, compact
and legible rules.

40
5. Discussions

However, NeuroLinear was the most computationally
expensive method.
Also, the BIO-RE and Literal techniques were not
affected by the type of activation function
used in the network training.

41
5. Discussions

Analyzing the results obtained with the IBOVESPA
database, it can be concluded that all the
investigated approaches, except the Geometrical
technique, yielded sets of rules that are
very concise, legible and faithful to the
networks from which they were obtained.

42
5. Discussions

It is important to mention that Literal and
ProRulext do not have the disadvantages presented
by the other methods investigated.
Besides, the algorithms proposed in this paper
extract very expressive rules, as already
illustrated.

43
6. Conclusions

It has been observed that the Literal and ProRulext
algorithms presented performance similar to
NeuroLinear, obtaining sets of rules that are
concise, legible and faithful to the networks
from which they were extracted, while also having
a lower computational cost and being applicable
to networks trained for pattern classification
and time series forecasting.

44
6. Conclusions

BIO-RE has obtained optimal rule sets, but it is
only applicable to binary data or when the
conversion to this type does not significantly
affect the network performance [11].

45
6. Conclusions

Thus, since Literal and ProRulext do not have that
limitation, these new approaches appear as
efficient alternatives for rule extraction
from trained networks to justify the inferred
outputs.

46
Acknowledgements

The authors thank CNPQ and CAPES (Brazilian
Government Research Institutes) for financial
support for the development of this research.

47
References

[1] R. Andrews and S. Geva, Rule Extraction from
Local Cluster Neural Nets, Neurocomputing, vol.
47, 2002, pp. 1-20.
[2] R. Andrews, A. B. Tickle and J. Diederich, A
Survey and Critique of Techniques for Extracting
Rules from Trained Artificial Neural Networks,
Knowledge-Based Systems, vol. 8, n. 6, 1995, pp.
373-389.
[3] B. Baesens, R. Setiono, C. Mues and J.
Vanthienen, Using Neural Network Rule Extraction
and Decision Tables for Credit-Risk Evaluation,
Management Science, vol. 49, 2003, pp. 312-329.

48
References

[4] P. G. Campos, Explanatory Mechanisms for
ANNs as Extraction of Knowledge, Master Thesis,
Federal University of Pernambuco, Brazil, 2005
(In Portuguese).
[5] P. G. Campos and T. B. Ludermir, Literal: A
Pedagogical Technique for Rules Extraction of
ANNs, V ENIA - Brazilian Conference of
Artificial Intelligence, São Leopoldo-RS, 2005,
pp. 1138-1141 (In Portuguese).
[6] P. G. Campos, E. M. J. Oliveira, T. B.
Ludermir and A. F. R. Araújo, MLP Networks for
Classification and Prediction with Rule
Extraction Mechanism, Proceedings of the
International Joint Conference on Neural
Networks, Budapest, 2004, pp. 1387-1392.

49
References

[7] Y. M. Fan and C. J. Li, Diagnostic Rule
Extraction from Trained Feedforward Neural
Networks, Mechanical Systems and Signal
Processing, vol. 16, n. 6, 2002, pp. 1073-1081.
[8] Y. Hayashi, R. Setiono and K. Yoshida, A
Comparison Between Two Neural Network Rule
Extraction Techniques for the Diagnosis of
Hepatobiliary Disorders, Artificial Intelligence
in Medicine, vol. 20, n. 3, 2000, pp. 205-216.
[9] T. B. Ludermir, A. C. P. L. F. Carvalho, A.
P. Braga et al., Hybrid Intelligent Systems, In
S. O. Rezende (Organizer), Intelligent Systems:
Foundations and Applications, Manole, Barueri,
2003, pp. 249-268 (In Portuguese).

50
References

[10] R. Setiono and H. Liu, NeuroLinear: From
Neural Networks to Oblique Decision Rules,
Neurocomputing, vol. 17, 1997, pp. 1-24.
[11] I. A. Taha and J. Ghosh, Symbolic
Interpretation of Artificial Neural Networks,
IEEE Transactions on Knowledge and Data
Engineering, vol. 11, n. 3, 1999, pp. 448-463.
