1
Interpretations of Association Rules by Granular
Computing
  • Yuefeng Li
  • Ning Zhong

2
Data Mining
  • Data mining, also referred to as knowledge
    discovery in databases, is the nontrivial
    extraction of implicit, previously unknown,
    and potentially useful information (patterns)
    from data in databases
  • Typical approaches
  • Data classification
  • Data clustering
  • Association rules mining

3
Association Rules
  • The objective of mining association rules
  • to discover all rules that have support and
    confidence greater than the user-specified
    minimum support and minimum confidence
  • The form of a rule is
  • A1 ∧ A2 ∧ ... ∧ Am → B1 ∧ B2 ∧ ... ∧ Bm,
  • where Ai and Bj are sets of attribute values
    from the relevant datasets in a database.
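
Both measures are simple relative frequencies. As an illustration only (the transactions below are invented for this sketch, not taken from the paper), a minimal Python version:

    # Support: fraction of transactions containing an itemset.
    # Confidence: support of A and B together, relative to support of A.
    def support(itemset, transactions):
        return sum(1 for t in transactions if itemset <= t) / len(transactions)

    def confidence(A, B, transactions):
        return support(A | B, transactions) / support(A, transactions)

    transactions = [{'foggy', 'icy', 'night'}, {'foggy', 'icy', 'night'},
                    {'misty', 'icy', 'evening'}, {'sunny', 'dry', 'noon'}]
    A, B = {'foggy', 'icy'}, {'night'}
    print(support(A | B, transactions))    # 0.5
    print(confidence(A, B, transactions))  # 1.0

A rule A → B is kept when both values exceed the user-specified minima.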

4
Association Rules cont.
  • A → B is an interesting rule iff P(B|A) - P(B) is
    greater than a suitable constant.
  • Criteria
  • Frequency of occurrence is a well-accepted
    criterion.
  • The rules should reflect real-world phenomena;
    that is, data mining aims to find interesting,
    real-world patterns.
  • It is desirable to use some mathematical models
    to interpret association rules in order to obtain
    useful patterns.
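
A worked illustration with invented numbers: if P(B) = 0.3 over the whole database but P(B|A) = 0.8 among the records satisfying A, then P(B|A) - P(B) = 0.5, so A → B would be reported as interesting under any threshold constant below 0.5.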

5
Meaning of Association Rules
  • Patterns
  • ABC, ABD, AEF, BCD
  • How to use these patterns for reasoning in a
    system?

[Diagram: items A-F linked according to the patterns ABC, ABD, AEF, BCD]
6
Compress a Database to a Decision Table
Table 1. A Decision Table
7
An Example cont.
  • Attributes: driver, vehicle type, weather,
    road, time, accident
  • Condition attributes: weather, road
  • Decision attributes: time, accident
  • Decision rules, e.g.,
  • 'if the weather is foggy and the road is icy,
  • then the accident occurred at night' in
    140 cases.

8
Formalization Rough Sets
  • S = (U, A) -- an information system
  • U, a database, a set of records.
  • A, a set of attributes, and
  • there is a function a : U → Va for every
    attribute a ∈ A, where Va is the set of all
    values of a. We call Va the domain of a.

9
Formalization Rough Sets cont.
  • B-granule
  • Let B be a subset of A. B determines a binary
    relation I(B) on U such that (x, y) ∈ I(B) if and
    only if a(x) = a(y) for every a ∈ B, where a(x)
    denotes the value of attribute a for element x ∈ U.
  • I(B) is an equivalence relation; it determines
    the family of all equivalence classes of I(B).
  • The partition determined by B is denoted by U/B.
  • The classes in U/B are referred to as B-granules.
  • The class which contains x is called the B-granule
    induced by x, and is denoted by B(x).
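
A minimal Python sketch of this construction (the toy records are invented for illustration): grouping records by their values on B yields exactly the B-granules of U/B.

    # Partition U/B: records x, y land in the same granule
    # iff a(x) = a(y) for every attribute a in B.
    from collections import defaultdict

    def partition(records, B):
        granules = defaultdict(list)
        for x in records:
            granules[tuple(x[a] for a in sorted(B))].append(x)
        return list(granules.values())

    records = [{'weather': 'foggy', 'road': 'icy', 'time': 'night'},
               {'weather': 'foggy', 'road': 'icy', 'time': 'evening'},
               {'weather': 'misty', 'road': 'dry', 'time': 'night'}]
    print(len(partition(records, {'weather', 'road'})))  # 2 granules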

10
Formalization Rough Sets cont.
  • (U, C, D) is called a decision table of (U, A),
    iff
  • C ∪ D ⊆ A, where C, the condition attributes, and
    D, the decision attributes, are disjoint subsets
    of A.
  • C(x) and D(x) denote the condition granule and
    the decision granule induced by x, respectively.
  • L is a language defined using attributes of A; an
    atomic formula is given by a = v, where a ∈ A and
    v ∈ Va.
  • Formulas can also be formed by logical negation,
    conjunction and disjunction.
  • A formula is called a basic formula in this paper
    if it is an atomic formula or is formed only by
    conjunction.

11
Formalization Rough Sets cont.
  • In Table 1, if C = {weather, road} and
  • D = {time, accident},
  • then we have
  • U/C = {{1, 7}, {2, 5}, {3, 6}, {4}} = {c1, c2,
    c3, c4}, the set of condition granules,
  • U/D = {{1}, {2, 3, 7}, {4}, {5, 6}} = {d1, d2,
    d3, d4}, the set of decision granules.
  • (U, C, D) is a decision table of (U, A), where U
    is a database which includes 1000 records.

12
Pawlak's Interpretation
  • Assumption - each fact in the decision table is a
    subset of U in which all elements have the same
    values for all attributes.
  • Every class f determines a rule f(C) → f(D).
  • The strength of the decision rule f(C) → f(D)
    is defined as |C(f) ∩ D(f)| / |U|, and
  • the certainty factor of the decision rule is
    defined as |C(f) ∩ D(f)| / |C(f)|.

Z. Pawlak, In pursuit of patterns in data
reasoning from data, the rough set way, 3rd
International Conference on Rough Sets and
Current Trends in Computing, USA, 2002, 1-9.
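
Reading the seven facts' case counts off the fractions given on slide 15 (80, 20, 140, 20, 40, 200, 500 analogous cases), a minimal Python sketch of the two measures:

    # Each fact f: (condition granule, decision granule, N = case count).
    # Counts are taken from the fractions listed on slide 15.
    facts = [('c1', 'd1',  80), ('c1', 'd2',  20), ('c2', 'd2', 140),
             ('c2', 'd4',  20), ('c3', 'd2',  40), ('c3', 'd4', 200),
             ('c4', 'd3', 500)]
    U = sum(n for _, _, n in facts)                 # |U| = 1000
    csize = {}
    for c, _, n in facts:
        csize[c] = csize.get(c, 0) + n              # |C(f)|
    for c, d, n in facts:
        print(f'{c} -> {d}: strength {n / U}, certainty {n / csize[c]}')

For instance the rule c1 → d1 gets strength 80/1000 = 0.08 and certainty factor 80/100 = 0.8.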
13
Pawlak's Interpretation cont.
[Diagram: condition granules c1 = {1, 7}, c2 = {2, 5}, c3 = {3, 6}, c4 = {4} mapped to decision granules d1 = {1}, d2 = {2, 3, 7}, d3 = {4}, d4 = {5, 6}]
14
Pawlak's Interpretation cont.
Table 2. Strengths and certainty factors of
decision rules
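(Values derivable from the fractions on slide 15: c1 → d1 has strength 0.08 and certainty 0.80; c1 → d2: 0.02, 0.20; c2 → d2: 0.14, 0.875; c2 → d4: 0.02, 0.125; c3 → d2: 0.04, about 0.167; c3 → d4: 0.20, about 0.833; c4 → d3: 0.50, 1.00.)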
15
Extended Random Sets
  • The relationships between the premises and
    the conclusions of decision rules.
  • c1 → {(d1, 80/100), (d2, 20/100)}
  • c2 → {(d2, 140/160), (d4, 20/160)}
  • c3 → {(d2, 40/240), (d4, 200/240)}
  • c4 → {(d3, 500/500)}

Y. Li, Extended random sets for knowledge
discovery in information systems, in Proc. of the
9th International Conference on Rough Sets, Fuzzy
Sets, Data Mining and Granular Computing, China,
2003, 524-532.
16
Extended Random Sets cont.
  • We use a mapping to formalize the relationship:

    Γ : U/C → 2^(U/D × [0, 1]),
    Γ(ci) = {(di,1, sndi,1), ..., (di,mi, sndi,mi)},

    and Σj sndi,j = 1 for all ci ∈ U/C.
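
One possible in-code representation of Γ (an assumption for illustration, not the paper's own data structure) is a dictionary from condition granules to weighted decision granules:

    # Gamma: condition granule -> [(decision granule, snd weight), ...]
    Gamma = {'c1': [('d1',  80/100), ('d2',  20/100)],
             'c2': [('d2', 140/160), ('d4',  20/160)],
             'c3': [('d2',  40/240), ('d4', 200/240)],
             'c4': [('d3', 500/500)]}
    # Each weight list sums to 1, matching the condition above.
    assert all(abs(sum(w for _, w in v) - 1) < 1e-9 for v in Gamma.values())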
17
Extended Random Sets cont.
  • Use the frequency in the decision table as the
    support degree of each condition granule. We
    have

    support(ci) = Σ {Nx : x ∈ U, C(x) = ci}

    for every condition granule ci, where Nx is the
    number of analogous cases of fact x. By
    normalizing, we can get a probability function P
    on U/C such that

    P(ci) = support(ci) / Σ {support(cj) : cj ∈ U/C}.
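
Worked through on the running example: support(c1) = 80 + 20 = 100, support(c2) = 160, support(c3) = 240, support(c4) = 500, and the total is 1000, so P(c1) = 0.10, P(c2) = 0.16, P(c3) = 0.24, P(c4) = 0.50.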
18
Extended Random Sets cont.
  • We call the pair (Γ, P) an extended random set.
  • For a given condition granule ci, we assume

    Γ(ci) = {(di,1, sndi,1), ..., (di,mi, sndi,mi)};

    we can obtain the following decision rules:

    ci → di,j,  j = 1, ..., mi.
19
Extended Random Sets cont.
  • We define the strengths of the decision rules as

    strength(ci → di,j) = P(ci) × sndi,j,

    and the corresponding certainty factors are

    CF(ci → di,j) = sndi,j.
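
For instance, strength(c1 → d1) = 0.10 × 0.8 = 0.08 and CF(c1 → d1) = 0.8, which agree with the values obtained under Pawlak's interpretation.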
20
Extended Random Sets cont.
  • A decision rule ci → di,j is an interesting rule
    if

    sndi,j - pr(di,j)

    is greater than a suitable constant.
21
Extended Random Sets cont.
  • where

    pr(d) = Σ {P(ci) × snd : ci ∈ U/C, (d, snd) ∈ Γ(ci)}.

    We can prove that pr is a probability function on
    U/D.
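
On the running example this gives pr(d1) = 0.10 × 0.8 = 0.08, pr(d2) = 0.10 × 0.2 + 0.16 × 0.875 + 0.24 × (40/240) = 0.20, pr(d3) = 0.50 × 1 = 0.50, and pr(d4) = 0.16 × 0.125 + 0.24 × (200/240) = 0.22; the four values sum to 1, as a probability function on U/D must.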
22
Extended Random Sets cont.
  • Example of an extended random set

23
Extended Random Sets cont.
Table 3. Probability function on the set of
decision granules
24
Extended Random Sets cont.
Table 4. Interesting rules
25
Interpretation of Extended Random Sets
  • A very interesting phenomenon from Table 3:
  • only some descriptions on the set of decision
    granules are meaningful for a given information
    system if we use 'or' to combine decision
    granules.
  • e.g.,
  • d1 or d2 -- (accident = yes)
  • d2 or d3 -- ?
  • The concept of 'meaningful':
  • a description X on the set of decision granules
    of decision table (U, C, D) is meaningful if
    there is a decision table (U, E, F) such that
    E ⊇ C and X ∈ U/F.

26
Interpretation of Extended Random Sets cont.
  • The random set (Γ′, P) derived from the extended
    random set (Γ, P):

    Γ′(ci) = {d : (d, snd) ∈ Γ(ci)}.

    It determines a Dempster-Shafer mass function m
    on 2^(U/D) such that

    m(X) = Σ {P(ci) : ci ∈ U/C, Γ′(ci) = X}.
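
On the running example, Γ′(c1) = {d1, d2}, Γ′(c2) = Γ′(c3) = {d2, d4}, and Γ′(c4) = {d3}, so m({d1, d2}) = 0.10, m({d2, d4}) = 0.16 + 0.24 = 0.40, and m({d3}) = 0.50; the masses sum to 1 as required.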
27
Interpretation of Extended Random Sets cont.
Table 5. Uncertain measures on the set of
decision granules
28
Interpretation of Extended Random Sets cont.
29
Algorithm 1 from Pawlak's Method
  • let UN = 0
  • for (i = 1 to n)  // n is the number of classes
  •   UN = UN + Ni
  • for (i = 1 to n)
  •   strength(i) = Ni / UN; CN = Ni
  •   for (j = 1 to n)
  •     if ((j ≠ i) and (fj(C) = fi(C)))
  •       CN = CN + Nj
  •   certainty_factor(i) = Ni / CN

30
Algorithm 2 from extended random sets
  • let UN = 0, U/C = ∅
  • for (i = 1 to n)
  •   UN = UN + Ni
  • for (i = 1 to n)  // create the data structure
  •   if (fi(C) ∈ U/C)
  •     insert (fi(D), Ni) into Γ(fi(C))
  •   else
  •     add fi(C) into U/C, and set Γ(fi(C)) = {(fi(D), Ni)}
  • for (i = 1 to |U/C|)
  •   P(ci) = (1/UN) × Σj sndi,j
  • for (i = 1 to |U/C|)  // normalization
  •   temp = 0
  •   for (j = 1 to |Γ(ci)|)
  •     temp = temp + sndi,j
  •   for (j = 1 to |Γ(ci)|)
  •     sndi,j = sndi,j / temp
  • for (i = 1 to |U/C|)  // calculate rule strengths
  •   for (j = 1 to |Γ(ci)|)
  •     strength(ci → fsti,j) = P(ci) × sndi,j
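
A runnable Python rendering of Algorithm 2 (a sketch; the fact list reuses the running example, and Gamma stands for Γ):

    # Facts: (f_i(C), f_i(D), N_i), as in the running example.
    facts = [('c1', 'd1',  80), ('c1', 'd2',  20), ('c2', 'd2', 140),
             ('c2', 'd4',  20), ('c3', 'd2',  40), ('c3', 'd4', 200),
             ('c4', 'd3', 500)]
    UN = sum(n for _, _, n in facts)
    Gamma = {}                                   # ci -> [[di_j, snd_i_j], ...]
    for c, d, n in facts:                        # create the data structure
        Gamma.setdefault(c, []).append([d, n])
    P = {c: sum(n for _, n in v) / UN for c, v in Gamma.items()}
    for v in Gamma.values():                     # normalization
        temp = sum(n for _, n in v)
        for pair in v:
            pair[1] /= temp
    for c, v in Gamma.items():                   # calculate rule strengths
        for d, snd in v:
            print(f'strength({c} -> {d}) = {P[c] * snd}')

With hashed lookups the data-structure pass here is effectively linear; the O(n × |U/C|) bound on the next slide corresponds to scanning U/C once per fact.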

31
Algorithm Analysis
  • Algorithm 1
  • time complexity is O(n²), where n is the number
    of classes in the decision table.
  • Algorithm 2
  • the time complexity is O(n × |U/C|).
  • Since |U/C| ≤ n, Algorithm 2 is better than
    Algorithm 1 in time complexity.

32
Summary
  • The advantages of our approach can be summarized
    as follows:
  • It provides a new algorithm to calculate decision
    rules, which is faster than Pawlak's algorithm.
  • In addition to the well-accepted frequency
    criterion, the extended random sets can easily
    include other criteria when determining
    association rules.
  • The extended random sets can provide more than
    one measure for dealing with uncertainties in
    the association rules, a significant
    characteristic distinguishing them from other
    methods.