Literal and ProRulext: Algorithms for Rule Extraction of ANNs

1
Literal and ProRulext Algorithms for Rule
Extraction of ANNs
  • Paulemir G. Campos, Teresa B. Ludermir
  • E-mail: {pgc, tbl}@cin.ufpe.br

2
Presentation Summary
  • 1. Introduction
  • 2. Literal and ProRulext
  • 3. Experiments
  • 4. Results
  • 5. Discussions
  • 6. Conclusions
  • Acknowledgements
  • References

3
1. Introduction
  • Main features of Artificial Neural Networks
    (ANNs):
  • Excellent generalization capacity
  • They have been successfully applied to many
    real-world problems
  • They represent domain knowledge in their
    topology, weight values and biases
  • However, they cannot readily explain their
    answers (their main deficiency).

4
1. Introduction
  • This deficiency can usually be mitigated by
    extracting IF-THEN rules from the trained
    network (ANN rule extraction).
  • However, other hybrid models exist for this
    purpose, such as Evolutionary Algorithms and
    Neuro-Fuzzy Systems.

5
1. Introduction
  • This paper presents two algorithms for
    extracting rules from trained networks: Literal
    and ProRulext.
  • Literal's distinguishing feature is its
    portability.
  • ProRulext extracts rules from feedforward MLP
    networks with one hidden layer at a relatively
    low computational cost.

6
2. Literal and ProRulext
  • Literal
  • A very simple algorithm proposed for extracting
    IF-THEN propositional rules from trained
    networks applied to pattern classification and
    time series forecasting problems
  • The rules are extracted through a literal
    mapping of the network inputs and outputs
  • This approach is a Pedagogical technique in the
    taxonomy of Andrews et al. [2].

7
2. Literal and ProRulext
  • Overview of the Literal algorithm:
  • 1. Discretize the network inputs and outputs
    into intervals of equal width
  • 2. Normalize the patterns of the network's
    training set to values within [0, 1] or [-1, 1]
  • 3. Present each of these normalized input
    patterns to the trained network, obtaining the
    respective rule consequents

8
2. Literal and ProRulext
  • Overview of the Literal algorithm (continued):
  • 4. De-normalize the previously obtained rule
    antecedents and consequents back to the
    original values of the database
  • 5. Store the rules created in the previous
    steps in a file
  • 6. Select, per rule conclusion, the input
    attribute with the most frequent contents

9
2. Literal and ProRulext
  • Overview of the Literal algorithm (continued):
  • 7. Eliminate the other attributes from each of
    these rules, guaranteeing more general rules
  • 8. Eliminate the redundant rules that may
    result from steps 6 and 7

10
2. Literal and ProRulext
  • Overview of the Literal algorithm (continued):
  • 9. Calculate each resulting rule's coverage of
    the training set, per conclusion, based on the
    number of activations of these rules
  • 10. Exclude rules with 0% coverage of the
    patterns used to train the network from which
    the rules were originally extracted (a sketch
    of the core loop follows below).
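
A minimal sketch of the pedagogical core of this loop (steps 1-3 and 5), assuming a single-output network exposed as a generic predict function and two equal-width intervals per attribute, the setting used later in the experiments; all names here are illustrative, not the paper's implementation:

```python
import numpy as np

def bin_index(value, lo, hi, n_bins=2):
    """Step 1: equal-width discretization - map a raw value to its interval index."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the range maximum falls into the last interval

def literal_extract(predict, X, n_bins=2):
    """Steps 2, 3 and 5: derive one raw IF-THEN rule per training pattern."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    X_norm = (X - lo) / np.where(hi > lo, hi - lo, 1.0)  # step 2: scale to [0, 1]
    rules = set()
    for x_norm, x_raw in zip(X_norm, X):
        y = float(predict(x_norm))                   # step 3: query the trained network
        antecedent = tuple(bin_index(v, l, h, n_bins)
                           for v, l, h in zip(x_raw, lo, hi))
        consequent = bin_index(y, 0.0, 1.0, n_bins)  # sigmoid output assumed in [0, 1]
        rules.add((antecedent, consequent))          # step 5: store; duplicates merge
    return rules  # steps 4 and 6-10 (de-normalization, simplification) come next
```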

11
2. Literal and ProRulext
  • ProRulext
  • The other algorithm proposed in this paper for
    extracting IF-THEN propositional rules from MLP
    networks with one hidden layer trained for
    pattern classification and time series
    forecasting

12
2. Literal and ProRulext
  • ProRulext (continued)
  • The rules are extracted by using a
    decompositional method to obtain their
    antecedents and by applying a pedagogical
    approach to determine the consequents
  • This approach is an Eclectic technique in the
    taxonomy of Andrews et al. [2].

13
2. Literal and ProRulext
  • Overview of the ProRulext algorithm:
  • 1. Discretize the network inputs and outputs
    into intervals of equal width
  • 2. Normalize the network input and output
    patterns of the training set to values within
    [0, 1] or [-1, 1]
  • 3. Present each of these input patterns to the
    trained network

14
2. Literal and ProRulext
  • Overview of the ProRulext algorithm
    (continued):
  • 4. Build the AND/OR graph of the trained
    network, considering only its positive weights
  • 5. Determine the antecedents of the rules
    through the decompositional method
  • 6. Apply a pedagogical approach to find the
    consequents of these rules

15
2. Literal and ProRulext
  • Overview of the ProRulext algorithm
    (continued):
  • 7. De-normalize the previously obtained rule
    antecedents and consequents back to the
    original values of the database
  • 8. Store the rules created in the previous
    steps in a file
  • 9. Select, per rule conclusion, the input
    attribute with the most frequent contents

16
2. Literal and ProRulext
  • Overview of the ProRulext algorithm
    (continued):
  • 10. Eliminate the other attributes from each of
    these rules, guaranteeing more general rules
  • 11. Eliminate the redundant rules that may
    result from steps 9 and 10

17
2. Literal and ProRulext
  • Overview of the ProRulext algorithm
    (continued):
  • 12. Calculate each resulting rule's coverage of
    the training set, per conclusion, based on the
    number of activations of these rules
  • 13. Erase rules with 0% coverage of the
    patterns used to train the network from which
    the rules were originally extracted (a sketch
    of step 4 follows below).
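
A minimal sketch of step 4 under stated assumptions: the network's weights are given as two matrices (input-to-hidden and hidden-to-output), and the AND/OR graph is kept as plain adjacency lists; the paper does not prescribe this data structure:

```python
import numpy as np

def positive_weight_graph(W_ih, W_ho):
    """Step 4: build the AND/OR graph of a one-hidden-layer MLP,
    keeping only connections whose weight is positive.

    W_ih: (n_hidden, n_inputs) input-to-hidden weight matrix
    W_ho: (n_outputs, n_hidden) hidden-to-output weight matrix
    """
    hidden_from_inputs = {h: [i for i in range(W_ih.shape[1]) if W_ih[h, i] > 0]
                          for h in range(W_ih.shape[0])}
    output_from_hidden = {o: [h for h in range(W_ho.shape[1]) if W_ho[o, h] > 0]
                          for o in range(W_ho.shape[0])}
    return hidden_from_inputs, output_from_hidden
```

Tracing this graph from each output unit back to the inputs yields the candidate antecedents of step 5, while the consequents of step 6 are still obtained pedagogically, by querying the trained network as in Literal.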

18
2. Literal and ProRulext
  • It is worth emphasizing that both algorithms
    include rule simplification stages (the last
    five steps of Literal and of ProRulext; see the
    sketch below).
  • This ensures that concise and legible rules are
    obtained from networks trained for pattern
    classification and time series forecasting.
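
A minimal sketch of that shared simplification stage, assuming rules are (antecedent, conclusion) pairs with dict antecedents, and reading "most frequent contents" as the attribute appearing most often among a conclusion's rules; both readings are assumptions, not the paper's code:

```python
from collections import Counter

def simplify(rules, X_bins, y_bins):
    """Shared simplification stage of Literal (steps 6-10) and
    ProRulext (steps 9-13).

    rules:  list of ({attribute: interval}, conclusion) pairs
    X_bins: discretized training inputs, one dict per pattern
    y_bins: discretized training outputs (conclusions)
    """
    # Select, per conclusion, the attribute occurring most often
    # among that conclusion's rules.
    best = {}
    for concl in {c for _, c in rules}:
        counts = Counter(a for ante, c in rules if c == concl for a in ante)
        best[concl] = counts.most_common(1)[0][0]

    # Keep only that attribute, which both generalizes the rules and
    # makes the now-redundant ones collapse into the same pair.
    pruned = {((best[c], ante[best[c]]), c)
              for ante, c in rules if best[c] in ante}

    # Keep only rules with non-zero coverage of the training patterns.
    def fires(rule, x, y):
        (attr, interval), concl = rule
        return x.get(attr) == interval and y == concl

    return [r for r in pruned
            if any(fires(r, x, y) for x, y in zip(X_bins, y_bins))]
```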

19
3. Experiments
  • The trained networks and the respective rule
    sets were generated with the AHES (Applied
    Hybrid Expert System), version 1.2.1.5 [4].

20
3. Experiments
  • The models implemented in the AHES are
    feedforward MLP networks with one hidden layer
    and the rule extraction techniques BIO-RE [11],
    Geometrical [7], NeuroLinear [10], Literal [5]
    and ProRulext [4].

21
3. Experiments - Databases
  • For the pattern classification problem, a
    Breast Cancer database from the Proben1
    repository [6] is used.
  • This base contains 699 cases, of which 458
    correspond to benign Breast Cancer and 241 to
    malignant Breast Cancer, each with 10
    attributes plus the Breast Cancer class.

22
3. Experiments - Databases
  • For the time series forecasting problem, a
    database with the São Paulo Stock Market Index
    (IBOVESPA) [6] is used.
  • The series predicted in this work is the series
    of minimum values, with a total of 584
    patterns.

23
3. Experiments - Databases
  • Before the experiments, these bases were
    submitted to pre-processing stages [6].
  • Thus, the Breast Cancer database was left with
    457 cases, 219 benign and 238 malignant.
  • For the IBOVESPA database, the indicated time
    window size equals two, and the number of
    patterns became 582.

24
3. Experiments - Databases
  • Furthermore, the databases were normalized to
    values in the interval [0, 1] or [-1, 1]
    (depending on the activation function used)
    before the training and rule extraction stages
    of each trained network (see the sketch below).
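
A minimal min-max normalization sketch for the two target intervals; the function name and the per-column scaling are assumptions:

```python
import numpy as np

def normalize(X, activation="sigmoid"):
    """Min-max scale each column to [0, 1] for the sigmoid or
    to [-1, 1] for the hyperbolic tangent activation."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    X01 = (X - lo) / span
    return X01 if activation == "sigmoid" else 2.0 * X01 - 1.0
```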

25
3. Experiments - The Trained Networks
  • The MLP networks were trained according to the
    holdout methodology.
  • Thus, each training set contains 2/3 of the
    total normalized input and output patterns,
    while each test set holds the remaining 1/3 of
    the patterns (see the sketch below).
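
A minimal holdout split sketch; the random shuffling and the seed are assumptions, since the slides do not state the sampling details:

```python
import numpy as np

def holdout_split(X, y, seed=0):
    """Holdout: 2/3 of the patterns for training, the remaining 1/3 for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = (2 * len(X)) // 3
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]
```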

26
3. Experiments - The Trained Networks
  • Fixed parameters during the training stage of
    the networks obtained with the Breast Cancer
    database:
  • Weight adjustment per epoch, i.e., in batch
    mode
  • Fixed initial weights chosen from values within
    the interval [-0.1, 0.1]
  • Momentum term equal to 0.1, number of epochs
    equal to 100, and maximum desired output error
    equal to 0.01.

27
3. Experiments - The Trained Networks
  • Fixed parameters during the training stage of
    the networks obtained with the IBOVESPA
    database:
  • Weight adjustment per pattern, i.e., on-line
  • Fixed initial weights chosen from values within
    the interval [-0.1, 0.1]
  • No momentum term; number of epochs equal to 100
    and maximum desired output error equal to 0.01.

28
3. Experiments - The Trained Networks
  • Variable parameters during the training stage
    of the networks obtained with the Breast Cancer
    and IBOVESPA databases:
  • Number of hidden layer units (1, 3 and 5)
  • Learning rate (0.1, 0.5, 0.9)
  • Use of bias or not
  • Kind of non-linear activation function (sigmoid
    and hyperbolic tangent); the grid is enumerated
    below.
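
These choices amount to 3 x 3 x 2 x 2 = 36 training configurations per database, which a simple grid enumerates (a sketch; the parameter names are illustrative):

```python
from itertools import product

hidden_units   = (1, 3, 5)
learning_rates = (0.1, 0.5, 0.9)
use_bias       = (True, False)
activations    = ("sigmoid", "tanh")

# 3 * 3 * 2 * 2 = 36 training configurations per database.
configs = [dict(n_hidden=h, lr=lr, bias=b, activation=act)
           for h, lr, b, act in product(hidden_units, learning_rates,
                                        use_bias, activations)]
assert len(configs) == 36
```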

29
3. Experiments - The Trained Networks
  • Trained networks selected using the Breast
    Cancer database

[Selection table not reproduced in this transcript]
where CM1 = network CM_Tan_NE9_Bias_4 and
CM2 = network CM_Sig_NE9_Bias_1.
30
3. Experiments - The Trained Networks
  • Trained networks selected using the IBOVESPA
    database

[Selection table not reproduced in this transcript]
where IB1 = network IBOVESPA_Sig_Bias_2,
IB2 = network IBOVESPA_Tan_4, and
MAE = Mean Absolute Error.
31
3. Experiments - Extracting Rules
  • ProRulext algorithm:
  • Thresholds for the IF part, using both
    databases: 0.1, 0.5 and 0.9
  • Thresholds for the THEN part, using the Breast
    Cancer database: 0.1, 0.5 and 0.9
  • Thresholds for the THEN part, using the
    IBOVESPA database: 0.1, 0.5 and 0.8, because
    with 0.9 no rule was obtained (a reading of
    these thresholds is sketched below).
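
One plausible reading of these limits, offered only as an assumption since the slides do not spell it out: a unit's activation is compared against the threshold to decide whether its literal enters the IF or THEN part of a rule:

```python
def literal_holds(activation, limit):
    """Assumed reading: a unit contributes its literal to the rule
    when its activation reaches the threshold (direction assumed >=)."""
    return activation >= limit

if_limits = (0.1, 0.5, 0.9)            # IF part, both databases
then_limits_cancer = (0.1, 0.5, 0.9)   # THEN part, Breast Cancer
then_limits_ibovespa = (0.1, 0.5, 0.8) # THEN part, IBOVESPA (0.9 yields no rules)
```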

32
3. Experiments - Extracting Rules
  • Literal and ProRulext algorithms:
  • Number of intervals used to discretize the
    numerical input and output attributes of both
    databases: 2 (two)
  • This was done to obtain rule sets as compact as
    possible.

33
3. Experiments - Extracting Rules
  • Examples of rules extracted by Literal from the
    CM2 network (Breast Cancer)

[Example rules not reproduced in this transcript]

34
3. Experiments - Extracting Rules
  • Examples of rules extracted by ProRulext from
    the IB1 network (IBOVESPA)

[Example rules not reproduced in this transcript]

35
3. Experiments - Extracting Rules
  • Rule sets were also obtained with the BIO-RE
    (Bio) [11], Geometrical (Geo) [7] and
    NeuroLinear (Neuro) [10] techniques.
  • This was done to compare the results obtained
    with these techniques against those presented
    by Literal and ProRulext.

36
4. Results
  • The best results among the rule sets extracted
    from the networks trained on the Breast Cancer
    database

[Results table not reproduced in this transcript]
where Sig = Sigmoid, Tan = Hyperbolic Tangent, and
Irr = irrelevant (Sig or Tan).
37
4. Results
  • The best results among the rule sets extracted
    from the networks trained on the IBOVESPA
    database

[Results table not reproduced in this transcript]
where Sig = Sigmoid, Tan = Hyperbolic Tangent, and
Irr = irrelevant (Sig or Tan).
38
5. Discussions
  • The results on the Breast Cancer database
    indicate that the BIO-RE technique [11]
    obtained the most concise, comprehensible and
    faithful rule sets; by contrast, the
    antecedents of the rules extracted by the
    Geometrical approach [7] are hidden units,
    which harms their legibility.

39
5. Discussions
  • The Literal and ProRulext algorithms presented
    performance compatible with that of the
    NeuroLinear technique, which is widely
    recognized for extracting very faithful,
    compact and legible rules.

40
5. Discussions
  • However, NeuroLinear was the most
    computationally expensive method.
  • Also, the BIO-RE and Literal techniques were
    not affected by the kind of activation function
    used in network training.

41
5. Discussions
  • Analyzing the results obtained with the
    IBOVESPA database, it can be concluded that all
    investigated approaches, except the Geometrical
    technique, yielded rule sets that are very
    concise, legible and faithful to the networks
    from which they were obtained.

42
5. Discussions
  • It is important to mention that Literal and
    ProRulext do not share the disadvantages
    presented by the other methods investigated.
  • Moreover, the algorithms proposed in this paper
    extract very expressive rules, as illustrated
    earlier.

43
6. Conclusions
  • It has been observed that the Literal and
    ProRulext algorithms presented performance
    similar to NeuroLinear, obtaining rule sets
    that are concise, legible and faithful to the
    networks from which they were extracted, at a
    lower computational cost, while remaining
    applicable to networks trained for pattern
    classification and time series forecasting.

44
6. Conclusions
  • BIO-RE obtained optimal rule sets, but it is
    only applicable to binary data, or when the
    conversion to this type does not significantly
    affect network performance [11].

45
6. Conclusions
  • Thus, as Literal and ProRulext do not have that
    limitation, these new approaches stand as
    efficient alternatives for extracting rules
    from trained networks in order to justify their
    inferred outputs.

46
Acknowledgements
  • The authors thank CNPq and CAPES (Brazilian
    government research agencies) for the financial
    support for the development of this research.

47
References
  • [1] R. Andrews and S. Geva, "Rule Extraction
    from Local Cluster Neural Nets",
    Neurocomputing, vol. 47, 2002, pp. 1-20.
  • [2] R. Andrews, A. B. Tickle and J. Diederich,
    "A Survey and Critique of Techniques for
    Extracting Rules from Trained Artificial Neural
    Networks", Knowledge-Based Systems, vol. 8,
    n. 6, 1995, pp. 373-389.
  • [3] B. Baesens, R. Setiono, C. Mues and J.
    Vanthienen, "Using Neural Network Rule
    Extraction and Decision Tables for Credit-Risk
    Evaluation", Management Science, vol. 49, 2003,
    pp. 312-329.

48
References
  • [4] P. G. Campos, "Explanatory Mechanisms for
    ANNs as Extraction of Knowledge", Master
    Thesis, Federal University of Pernambuco,
    Brazil, 2005 (in Portuguese).
  • [5] P. G. Campos and T. B. Ludermir, "Literal:
    A Pedagogical Technique for Rules Extraction of
    ANNs", V ENIA - Brazilian Conference of
    Artificial Intelligence, São Leopoldo-RS, 2005,
    pp. 1138-1141 (in Portuguese).
  • [6] P. G. Campos, E. M. J. Oliveira, T. B.
    Ludermir and A. F. R. Araújo, "MLP Networks for
    Classification and Prediction with Rule
    Extraction Mechanism", Proceedings of the
    International Joint Conference on Neural
    Networks, Budapest, 2004, pp. 1387-1392.

49
References
  • [7] Y. M. Fan and C. J. Li, "Diagnostic Rule
    Extraction from Trained Feedforward Neural
    Networks", Mechanical Systems and Signal
    Processing, vol. 16, n. 6, 2002, pp. 1073-1081.
  • [8] Y. Hayashi, R. Setiono and K. Yoshida, "A
    Comparison Between Two Neural Network Rule
    Extraction Techniques for the Diagnosis of
    Hepatobiliary Disorders", Artificial
    Intelligence in Medicine, vol. 20, n. 3, 2000,
    pp. 205-216.
  • [9] T. B. Ludermir, A. C. P. L. F. Carvalho,
    A. P. Braga et al., "Hybrid Intelligent
    Systems", in S. O. Rezende (Organizer),
    Intelligent Systems: Foundations and
    Applications, Manole, Barueri, 2003,
    pp. 249-268 (in Portuguese).

50
References
  • [10] R. Setiono and H. Liu, "NeuroLinear: From
    Neural Networks to Oblique Decision Rules",
    Neurocomputing, vol. 17, 1997, pp. 1-24.
  • [11] I. A. Taha and J. Ghosh, "Symbolic
    Interpretation of Artificial Neural Networks",
    IEEE Transactions on Knowledge and Data
    Engineering, vol. 11, n. 3, 1999, pp. 448-463.