Title: Literal and ProRulext: Algorithms for Rule Extraction of ANNs
- Paulemir G. Campos, Teresa B. Ludermir
- E-mail: {pgc, tbl}_at_cin.ufpe.br
Presentation Summary
- 1. Introduction
- 2. Literal and ProRulext
- 3. Experiments
- 4. Results
- 5. Discussions
- 6. Conclusions
- Acknowledgements
- References
1. Introduction
- Main features of Artificial Neural Networks (ANNs):
- Excellent capacity for generalization.
- They have been applied successfully to several real-world problems.
- They represent domain knowledge in their topology, weight values and biases.
- However, a clear explanation of their answers is not readily available (their main deficiency).
- This deficiency can usually be reduced by extracting IF/THEN rules from the trained network (ANN rule extraction).
- However, other hybrid models exist for this purpose, such as Evolutionary Algorithms and Neuro-Fuzzy Systems.
- This paper presents two algorithms for extracting rules from trained networks: Literal and ProRulext.
- Literal's distinguishing feature is that it is portable.
- ProRulext has a relatively low computational cost when extracting rules from feedforward MLP networks with one hidden layer.
2. Literal and ProRulext
- Literal
- It is a very simple algorithm proposed for extracting IF-THEN propositional rules from trained networks applied to pattern classification and time series forecasting problems.
- The rules are extracted through a literal mapping of the network inputs and outputs.
- This approach is a pedagogical technique (taxonomy of Andrews et al. [2]).
- Overview of the Literal Algorithm
- 1. Discretize the network inputs and outputs into intervals of equal width.
- 2. Normalize the patterns of the network's training set to values within [0, 1] or [-1, 1].
- 3. Present each of these normalized input patterns to the trained network, obtaining the respective rule consequents.
- 4. De-normalize the rule antecedents and consequents obtained so far back to the original values of the database.
- 5. Store the rules created in the previous steps in a file.
- 6. For each rule conclusion, select the input attribute whose contents occur most frequently.
- 7. Eliminate the other attributes from each of these rules, which yields more general rules.
- 8. Eliminate the redundant rules that may result from steps 6 and 7.
- 9. Calculate, per conclusion, the coverage of the training set by each resulting rule, based on the number of activations of these rules.
- 10. Exclude the rules with zero coverage of the patterns used to train the network from which the rules were originally extracted.
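The Literal steps can be illustrated with a minimal Python sketch. It is one possible reading of the overview, not the authors' implementation: the function names (`discretize`, `extract_literal_rules`, `simplify`) are hypothetical, `predict` is assumed to wrap the trained network together with its normalization and de-normalization, and coverage is assumed to be the fraction of training patterns whose inputs match a rule's antecedent.

```python
from collections import Counter


def discretize(value, low, high, n_bins=2):
    """Map a value in [low, high] to the index of its equal-width interval."""
    if high == low:
        return 0
    idx = int((value - low) / (high - low) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def extract_literal_rules(patterns, predict, in_ranges, out_range, n_bins=2):
    """Steps 1-5 (sketch): one rule per training pattern, mapping the
    discretized inputs (antecedent) to the discretized output (consequent)."""
    rules = set()
    for x in patterns:
        antecedent = tuple(
            discretize(v, lo, hi, n_bins) for v, (lo, hi) in zip(x, in_ranges)
        )
        consequent = discretize(predict(x), out_range[0], out_range[1], n_bins)
        rules.add((antecedent, consequent))
    return rules


def simplify(rules, patterns, in_ranges, n_bins=2):
    """Steps 6-10 (assumed reading): per conclusion, keep only the most
    frequent (attribute, interval) pair, then drop zero-coverage rules."""
    simplified = set()  # a set removes redundant rules (step 8)
    for conclusion in {c for _, c in rules}:
        counts = Counter(
            (i, b)
            for ante, c in rules
            if c == conclusion
            for i, b in enumerate(ante)
        )
        (attr, interval), _ = counts.most_common(1)[0]
        simplified.add((attr, interval, conclusion))

    result = {}
    for attr, interval, conclusion in simplified:
        lo, hi = in_ranges[attr]
        fired = sum(
            1 for x in patterns if discretize(x[attr], lo, hi, n_bins) == interval
        )
        coverage = fired / len(patterns)
        if coverage > 0:  # step 10: exclude rules that never fire
            result[(attr, interval, conclusion)] = coverage
    return result
```

Each simplified rule is a triple (attribute index, interval index, conclusion interval) mapped to its coverage. Because the sketch only calls `predict`, it does not depend on the network architecture, which is what makes a pedagogical technique of this kind portable.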
- ProRulext
- It is the other algorithm proposed in this paper, for extracting IF-THEN propositional rules from MLP networks with one hidden layer trained for pattern classification and time series forecasting.
- The rules are extracted by using a decompositional method to obtain their antecedents and a pedagogical approach to determine their consequents.
- This approach is an eclectic technique (taxonomy of Andrews et al. [2]).
- Overview of the ProRulext Algorithm
- 1. Discretize the network inputs and outputs into intervals of equal width.
- 2. Normalize the input and output patterns of the network's training set to values within [0, 1] or [-1, 1].
- 3. Present each of these input patterns to the trained network.
- 4. Build the AND/OR graph of the trained network, considering only its positive weights.
- 5. Determine the antecedents of the rules through the decompositional method.
- 6. Apply a pedagogical approach to find the consequents of these rules.
- 7. De-normalize the rule antecedents and consequents obtained so far back to the original values of the database.
- 8. Store the rules created in the previous step in a file.
- 9. For each rule conclusion, select the input attribute whose contents occur most frequently.
- 10. Eliminate the other attributes from each of these rules, which yields more general rules.
- 11. Eliminate the redundant rules that may result from steps 9 and 10.
- 12. Calculate, per conclusion, the coverage of the training set by each resulting rule, based on the number of activations of these rules.
- 13. Erase the rules with zero coverage of the patterns used to train the network from which the rules were originally extracted.
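Step 4 of ProRulext can be sketched as follows. The slides do not specify how AND and OR nodes are assigned, so this hypothetical helper (`positive_weight_graph`) only collects the positive-weight connections of a one-hidden-layer MLP. The weight layout is an assumption: one row per hidden unit in `w_input_hidden` and one row per output unit in `w_hidden_output`.

```python
def positive_weight_graph(w_input_hidden, w_hidden_output):
    """For each output unit, list the hidden units reached through a positive
    weight and, for each of those, the inputs feeding it through a positive
    weight. Negative and zero weights are ignored."""
    graph = {}
    for o, output_row in enumerate(w_hidden_output):
        hidden = {}
        for h, w_ho in enumerate(output_row):
            if w_ho > 0:
                hidden[h] = [
                    i for i, w_ih in enumerate(w_input_hidden[h]) if w_ih > 0
                ]
        graph[o] = hidden
    return graph


# Example: 2 inputs, 2 hidden units, 1 output (illustrative weights only).
w_ih = [[0.5, -0.2], [-0.1, 0.3]]
w_ho = [[0.7, -0.4]]
graph = positive_weight_graph(w_ih, w_ho)
```

In this example only hidden unit 0 has a positive weight to the output, and only input 0 has a positive weight to that hidden unit, so the candidate antecedents for output 0 would involve input 0 alone.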
- It is worth emphasizing that both algorithms have rule simplification stages (the last five steps of Literal and of ProRulext).
- These stages ensure that concise and legible rules are obtained from networks trained for pattern classification and time series forecasting.
3. Experiments
- The trained networks and the respective rule sets were generated with AHES (Applied Hybrid Expert System), version 1.2.1.5 [4].
- The models implemented in AHES are feedforward MLP networks with one hidden layer and the rule extraction techniques BIO-RE [11], Geometrical [7], NeuroLinear [10], Literal [5] and ProRulext [4].
3. Experiments - Databases
- For the pattern classification problem, a Breast Cancer database from the Proben1 repository [6] is used.
- This database contains 699 cases, of which 458 relate to benign and 241 to malignant Breast Cancer, each with 10 attributes plus the Breast Cancer class.
- For the time series forecasting problem, a database with the São Paulo Stock Market Index (IBOVESPA) [6] is used.
- The series predicted in this work is the series of minimum values, with a total of 584 patterns.
- Before the experiments, these databases went through pre-processing stages [6].
- As a result, the Breast Cancer database was left with 457 cases, 219 benign and 238 malignant.
- For the IBOVESPA database, the time window size was set to two, and the number of patterns became 582.
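The drop from 584 to 582 patterns is consistent with a sliding window of size two, in which the first two values of the series have no complete window of predecessors. A minimal sketch, assuming each pattern uses the two previous values as inputs and the next value as target (the helper name `make_windows` is hypothetical):

```python
def make_windows(series, window=2):
    """Turn a series into (inputs, target) patterns built from `window`
    consecutive past values followed by the value to be predicted."""
    return [
        (tuple(series[t - window:t]), series[t])
        for t in range(window, len(series))
    ]


# A series of 584 values yields 584 - 2 = 582 patterns.
patterns = make_windows(list(range(584)), window=2)
```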
- Furthermore, the databases were normalized to values in the interval [0, 1] or [-1, 1] (depending on the activation function used) before the training and rule extraction stages of each trained network.
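A common way to perform this normalization is min-max scaling. The sketch below assumes that method, which the slides do not name. The target interval would be [0, 1] for a sigmoid output and [-1, 1] for a hyperbolic tangent output, matching the ranges of those functions. `denormalize` is the inverse mapping used when rules are converted back to the original values.

```python
def normalize(values, low=0.0, high=1.0):
    """Min-max scale `values` linearly into the interval [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [low for _ in values]  # constant attribute: no spread to scale
    scale = (high - low) / (v_max - v_min)
    return [low + (v - v_min) * scale for v in values]


def denormalize(norm_values, v_min, v_max, low=0.0, high=1.0):
    """Map values from [low, high] back to the original range [v_min, v_max]."""
    scale = (v_max - v_min) / (high - low)
    return [v_min + (n - low) * scale for n in norm_values]


sigmoid_scaled = normalize([0, 5, 10])          # into [0, 1]
tanh_scaled = normalize([0, 5, 10], -1.0, 1.0)  # into [-1, 1]
```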
3. Experiments - The Trained Networks
- The MLP networks were trained according to the holdout methodology.
- Thus, each training set contains 2/3 of the normalized input and output patterns, and each test set contains the remaining 1/3.
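The holdout split can be sketched as below. The slides do not say whether patterns were shuffled before splitting, so this hypothetical helper keeps the original order, which is the safer assumption for a time series.

```python
def holdout_split(patterns, train_fraction=2 / 3):
    """Split patterns into a training set (first part) and a test set (rest),
    preserving the original order."""
    cut = round(len(patterns) * train_fraction)
    return patterns[:cut], patterns[cut:]


train, test = holdout_split(list(range(9)))
```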
- Fixed parameters during the training stage of the networks obtained with the Breast Cancer database:
- Weight adjustment per epoch (batch).
- Fixed initial weights chosen within the interval [-0.1, 0.1].
- Momentum term equal to 0.1, number of epochs equal to 100 and maximum desired output error equal to 0.01.
- Fixed parameters during the training stage of the networks obtained with the IBOVESPA database:
- Weight adjustment per pattern (on-line).
- Fixed initial weights chosen within the interval [-0.1, 0.1].
- No momentum term, number of epochs equal to 100 and maximum desired output error equal to 0.01.
- Variable parameters during the training stage of the networks obtained with the Breast Cancer and IBOVESPA databases:
- Number of units in the hidden layer (1, 3 and 5).
- Learning rate (0.1, 0.5 and 0.9).
- Use or not of bias.
- Type of non-linear activation function (sigmoid and hyperbolic tangent).
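If every combination of these variable parameters were tried (the slides do not state this explicitly), there would be 3 x 3 x 2 x 2 = 36 configurations per database. A short sketch of that enumeration, with hypothetical dictionary keys:

```python
from itertools import product

hidden_units = (1, 3, 5)
learning_rates = (0.1, 0.5, 0.9)
use_bias = (True, False)
activations = ("sigmoid", "tanh")

# One dictionary per candidate network configuration.
configurations = [
    {"hidden_units": h, "learning_rate": lr, "bias": b, "activation": a}
    for h, lr, b, a in product(hidden_units, learning_rates, use_bias, activations)
]
```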
- [Table: trained networks selected using the Breast Cancer database, where CM1 = network CM_Tan_NE9_Bias_4 and CM2 = network CM_Sig_NE9_Bias_1]
- [Table: trained networks selected using the IBOVESPA database, where IB1 = network IBOVESPA_Sig_Bias_2, IB2 = network IBOVESPA_Tan_4 and MAE = Mean Absolute Error]
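For reference, the Mean Absolute Error is the average of the absolute differences between target and predicted values. A minimal sketch (the function name is hypothetical, and the numbers in the usage line are illustrative, not results from the experiments):

```python
def mean_absolute_error(targets, predictions):
    """Average absolute difference between targets and predictions."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


mae = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])  # (0.5 + 0 + 1) / 3
```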
3. Experiments - Extracting Rules
- ProRulext algorithm:
- Limits of the IF part using the two databases: 0.1, 0.5 and 0.9.
- Limits of the THEN part using the Breast Cancer database: 0.1, 0.5 and 0.9.
- Limits of the THEN part using the IBOVESPA database: 0.1, 0.5 and 0.8, because with 0.9 no rule was obtained.
- Literal and ProRulext algorithms:
- Number of intervals used to discretize the numerical input and output attributes of the two databases: 2 (two).
- This value was chosen to obtain rule sets that are as compact as possible.
- [Figure: examples of rules extracted by Literal from the CM2 network (Breast Cancer)]
- [Figure: examples of rules extracted by ProRulext from the IB1 network (IBOVESPA)]
- Rule sets were also obtained with the BIO-RE (Bio) [11], Geometrical (Geo) [7] and NeuroLinear (Neuro) [10] techniques.
- This was done to compare the results of these techniques with those of Literal and ProRulext.
4. Results
- [Table: best results of the rule sets extracted from networks trained with the Breast Cancer database, where Sig = sigmoid, Tan = hyperbolic tangent and Irr = irrelevant (Sig or Tan)]
- [Table: best results of the rule sets extracted from networks trained with the IBOVESPA database, where Sig = sigmoid, Tan = hyperbolic tangent and Irr = irrelevant (Sig or Tan)]
5. Discussions
- The results for the Breast Cancer database indicate that the BIO-RE technique [11] obtained the most concise, comprehensible and faithful rule sets, while the antecedents of the rules extracted by the Geometrical approach [7] are hidden units, which harms their legibility.
- The Literal and ProRulext algorithms showed performance comparable to that of the NeuroLinear technique, which is known mainly for extracting very faithful, compact and legible rules.
- However, NeuroLinear was the most computationally expensive method.
- The BIO-RE and Literal techniques were not affected by the type of activation function used in network training.
- Analysis of the results obtained with the IBOVESPA database shows that all the investigated approaches, except the Geometrical technique, yielded rule sets that are very concise, legible and faithful to the networks from which they were obtained.
- It is important to note that Literal and ProRulext do not have the disadvantages of the other methods investigated.
- In addition, the algorithms proposed in this paper extract very expressive rules, as already illustrated.
6. Conclusions
- The Literal and ProRulext algorithms performed similarly to NeuroLinear, obtaining rule sets that are concise, legible and faithful to the networks from which they were extracted, at a lower computational cost. They are also applicable to networks trained for pattern classification and time series forecasting.
- BIO-RE obtained optimal rule sets, but it is only applicable to binary data, or when conversion to binary data does not significantly affect network performance [11].
- Since Literal and ProRulext do not have that limitation, these new approaches are efficient alternatives for extracting rules from trained networks to justify the inferred outputs.
Acknowledgements
- The authors thank CNPQ and CAPES (Brazilian Government Research Institutes) for the financial support given to this research.
References
- [1] R. Andrews and S. Geva, "Rule Extraction from Local Cluster Neural Nets", Neurocomputing, vol. 47, 2002, pp. 1-20.
- [2] R. Andrews, A. B. Tickle and J. Diederich, "A Survey and Critique of Techniques for Extracting Rules from Trained Artificial Neural Networks", Knowledge-Based Systems, vol. 8, n. 6, 1995, pp. 373-389.
- [3] B. Baesens, R. Setiono, C. Mues and J. Vanthienen, "Using Neural Network Rule Extraction and Decision Tables for Credit-Risk Evaluation", Management Science, vol. 49, 2003, pp. 312-329.
- [4] P. G. Campos, "Explanatory Mechanisms for ANNs as Extraction of Knowledge", Master Thesis, Federal University of Pernambuco, Brazil, 2005 (in Portuguese).
- [5] P. G. Campos and T. B. Ludermir, "Literal: A Pedagogical Technique for Rules Extraction of ANNs", V ENIA (Brazilian Conference of Artificial Intelligence), São Leopoldo-RS, 2005, pp. 1138-1141 (in Portuguese).
- [6] P. G. Campos, E. M. J. Oliveira, T. B. Ludermir and A. F. R. Araújo, "MLP Networks for Classification and Prediction with Rule Extraction Mechanism", Proceedings of the International Joint Conference on Neural Networks, Budapest, 2004, pp. 1387-1392.
- [7] Y. M. Fan and C. J. Li, "Diagnostic Rule Extraction from Trained Feedforward Neural Networks", Mechanical Systems and Signal Processing, vol. 16, n. 6, 2002, pp. 1073-1081.
- [8] Y. Hayashi, R. Setiono and K. Yoshida, "A Comparison Between Two Neural Network Rule Extraction Techniques for the Diagnosis of Hepatobiliary Disorders", Artificial Intelligence in Medicine, vol. 20, n. 3, 2000, pp. 205-216.
- [9] T. B. Ludermir, A. C. P. L. F. Carvalho, A. P. Braga et al., "Hybrid Intelligent Systems", in S. O. Rezende (Organizer), Intelligent Systems: Foundations and Applications, Manole, Barueri, 2003, pp. 249-268 (in Portuguese).
- [10] R. Setiono and H. Liu, "NeuroLinear: From Neural Networks to Oblique Decision Rules", Neurocomputing, vol. 17, 1997, pp. 1-24.
- [11] I. A. Taha and J. Ghosh, "Symbolic Interpretation of Artificial Neural Networks", IEEE Transactions on Knowledge and Data Engineering, vol. 11, n. 3, 1999, pp. 448-463.