BIDE: Efficient Mining of Frequent Closed Sequences

1
BIDE: Efficient Mining of Frequent Closed Sequences
  • Jianyong Wang and Jiawei Han

2
outline
  • Frequent closed sequences
  • BIDE
  • Save space
  • Speed up
  • Conclusion

3
Frequent closed sequences
Sequence identifier    Sequence
1                      CAABC
2                      ABCB
3                      CABC
4                      ABBCA
  • Frequent sequences (written sequence:support, minimum support 2)
  • A:4, AA:2, AB:4, ABB:2, ABC:4, AC:4, B:4, BB:2, BC:4, C:4,
    CA:3, CAB:2, CABC:2, CAC:2, CB:3, CBC:2, CC:2
  • Frequent closed sequences
  • AA:2, ABB:2, ABC:4, CA:3, CABC:2, CB:3
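
The two lists above can be reproduced by brute force on the running example. The sketch below is not BIDE itself: it enumerates every subsequence, counts supports, and then keeps the sequences that have no proper super-sequence with the same support. The names `SDB`, `frequent_sequences` and `closed_sequences` are illustrative, and the minimum support of 2 is an assumption that matches the supports listed on the slide.

```python
from itertools import combinations

# Running example; min_sup = 2 is assumed from the supports shown on the slide.
SDB = ["CAABC", "ABCB", "CABC", "ABBCA"]

def subsequences(seq):
    """All distinct non-empty subsequences of seq (order kept, gaps allowed)."""
    return {
        "".join(seq[i] for i in idx)
        for r in range(1, len(seq) + 1)
        for idx in combinations(range(len(seq)), r)
    }

def is_subsequence(small, big):
    """True if small can be obtained from big by deleting items."""
    it = iter(big)
    return all(item in it for item in small)

def frequent_sequences(db, min_sup):
    """Map every sequence contained in at least min_sup inputs to its support."""
    support = {}
    for seq in db:
        for sub in subsequences(seq):
            support[sub] = support.get(sub, 0) + 1
    return {s: c for s, c in support.items() if c >= min_sup}

def closed_sequences(frequent):
    """Keep the sequences that have no proper super-sequence of equal support."""
    return {
        s: c for s, c in frequent.items()
        if not any(c == c2 and s != t and is_subsequence(s, t)
                   for t, c2 in frequent.items())
    }

frequent = frequent_sequences(SDB, 2)
closed = closed_sequences(frequent)
print(sorted(closed.items()))
# prints [('AA', 2), ('ABB', 2), ('ABC', 4), ('CA', 3), ('CABC', 2), ('CB', 3)]
```

This approach is exponential in the sequence length and keeps every frequent sequence in memory, which is exactly the cost the rest of the presentation sets out to avoid.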

4
(No Transcript)
5
space usage
  • Most of the frequent closed pattern mining
    algorithms need to maintain the set of already
    mined frequent closed patterns (or just
    candidates) in memory
  • subpattern checking
  • super-pattern checking

6
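Mining frequent closed sequences
  • Closed frequent patterns
  • more compact than the full set of frequent patterns
  • more efficient to mine
  • Maintaining the set of already mined closed patterns is
    costly in both runtime and space usage
  • BI-Directional Extension (BIDE)
  • mining frequent closed sequences without
    candidate maintenance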

7
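Sequence closure checking scheme
  • An n-sequence $S = e_1 e_2 \ldots e_n$ is non-closed if there
    is an event $e'$ for which one of the following holds
  • $e'$ is a forward-extension event
  • $S' = e_1 e_2 \ldots e_n e'$ and $sup^{SDB}(S') = sup^{SDB}(S)$
  • $e'$ is a backward-extension event
  • $\exists i$ ($1 \le i < n$), $S' = e_1 e_2 \ldots e_i e' e_{i+1} \ldots e_n$
    and $sup^{SDB}(S') = sup^{SDB}(S)$, or
  • $S' = e' e_1 e_2 \ldots e_n$ and $sup^{SDB}(S') = sup^{SDB}(S)$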
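
The definition can be checked directly, if inefficiently, by inserting every item at every position and comparing supports. The sketch below does that on the running example; `extension_events` and the other names are illustrative, and the pattern is assumed to occur in the database at least once.

```python
SDB = ["CAABC", "ABCB", "CABC", "ABBCA"]

def contains(seq, pattern):
    """True if pattern is a subsequence of seq."""
    it = iter(seq)
    return all(item in it for item in pattern)

def support(pattern, db=SDB):
    """Number of input sequences that contain pattern."""
    return sum(contains(seq, pattern) for seq in db)

def extension_events(pattern, db=SDB):
    """Return (forward, backward) item sets whose insertion keeps the support."""
    alphabet = sorted(set("".join(db)))
    base = support(pattern, db)
    # Forward extension: append the item after the last event.
    forward = {e for e in alphabet if support(pattern + e, db) == base}
    # Backward extension: insert the item before position i (0 <= i < n).
    backward = {
        e for e in alphabet for i in range(len(pattern))
        if support(pattern[:i] + e + pattern[i:], db) == base
    }
    return forward, backward

print(extension_events("AC"))   # B can be inserted between A and C
print(extension_events("ABC"))  # no extension of either kind
```

A sequence is closed exactly when both returned sets are empty, so `AC` is non-closed and `ABC` is closed in this example.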

8
Theorem
  • BI-Directional Extension closure checking
  • If there exists neither a forward-extension event nor
    a backward-extension event w.r.t. a prefix sequence
    $S_p$, then $S_p$ must be a closed sequence; otherwise,
    $S_p$ must be non-closed.

9
Lemma
  • Forward-extension event checking
  • For a prefix sequence $S_p$, its complete set of
    forward-extension events is equivalent to the set
    of its locally frequent items whose supports are
    equal to $sup^{SDB}(S_p)$

10
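Definition
  • Projected sequence of a prefix sequence
  • the part of a sequence that remains after the first
    instance of the prefix is removed
  • the projected sequence of prefix sequence AB in
    sequence ABBCA is BCA
  • Projected database of a prefix sequence
  • the set of projected sequences of the prefix over all
    input sequences that contain it
  • the projected database of prefix sequence AB in
    our running example is {C, CB, C, BCA}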
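
As a minimal sketch of these definitions and of the forward-extension lemma, the code below builds the projected database of a prefix and then reports the items whose local support equals the support of the prefix. Function names are illustrative; sequences are plain strings of single-character items.

```python
from collections import Counter

SDB = ["CAABC", "ABCB", "CABC", "ABBCA"]

def projected_sequence(seq, prefix):
    """Part of seq after the first instance of prefix, or None if prefix is absent."""
    pos = -1
    for item in prefix:
        pos = seq.find(item, pos + 1)  # greedy left-to-right match
        if pos < 0:
            return None
    return seq[pos + 1:]

def projected_database(db, prefix):
    """Projected sequences of all input sequences that contain prefix."""
    projections = (projected_sequence(seq, prefix) for seq in db)
    return [p for p in projections if p is not None]

def forward_extension_items(db, prefix):
    """Items that occur in every projected sequence (local support = sup(prefix))."""
    pdb = projected_database(db, prefix)
    local = Counter(item for p in pdb for item in set(p))
    return {item for item, count in local.items() if count == len(pdb)}

print(projected_database(SDB, "AB"))       # ['C', 'CB', 'C', 'BCA']
print(forward_extension_items(SDB, "AB"))  # {'C'}
```

Because C occurs in all four projected sequences of AB, it is a forward-extension item, so AB:4 cannot be closed.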

11
Lemma
  • Backward-extension event checking
  • First, assume $S_p$ = AC:4. Item B appears in each of
    the 2nd maximum periods of $S_p$, so B is a
    backward-extension item and AC:4 is not closed.
  • Now let $S_p$ = ABC:4. We cannot find any
    backward-extension item for it, and there is no
    forward-extension item for it either; therefore ABC:4
    is a frequent closed sequence.
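
The two cases above can be checked with a short sketch that computes the maximum periods of a prefix in every sequence and intersects them position by position. It assumes the usual reading of the definitions that follow in this presentation (first instance matched greedily from the left, last-in-last appearances found from the right); all function names are illustrative.

```python
SDB = ["CAABC", "ABCB", "CABC", "ABBCA"]

def maximum_periods(seq, prefix):
    """1st..n-th maximum periods of prefix in seq, or None if prefix is absent."""
    ends, pos = [], -1               # ends[i]: end of the first instance of prefix[:i+1]
    for item in prefix:
        pos = seq.find(item, pos + 1)
        if pos < 0:
            return None
        ends.append(pos)
    last, limit = [], len(seq)       # last-in-last appearances, found right to left
    for item in reversed(prefix):
        limit = seq.rfind(item, 0, limit)
        last.append(limit)
    last.reverse()
    starts = [0] + [e + 1 for e in ends[:-1]]
    return [seq[s:l] for s, l in zip(starts, last)]

def backward_extension_items(db, prefix):
    """Items that appear in every i-th maximum period for some i."""
    periods = [p for p in (maximum_periods(s, prefix) for s in db) if p is not None]
    if not periods:
        return set()
    found = set()
    for i in range(len(prefix)):
        found |= set.intersection(*(set(p[i]) for p in periods))
    return found

print(backward_extension_items(SDB, "AC"))   # {'B'}
print(backward_extension_items(SDB, "ABC"))  # set()
```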

12
Definition
  • First instance of a prefix sequence
  • the shortest leading part of S that contains the
    prefix, matching its items from left to right
  • the first instance of the prefix sequence AB in
    sequence CAABC is CAAB
  • Last instance of a prefix sequence
  • the subsequence from the beginning of S to the
    last appearance of item $e_i$ in S, where $e_i$ is the
    last item of the prefix
  • the last instance of the prefix sequence AB in
    sequence ABBCA is ABB
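
A small sketch of both definitions, assuming sequences are strings of single-character items and the prefix is non-empty. The helper names are illustrative.

```python
def first_instance(seq, prefix):
    """Shortest leading part of seq that contains prefix, or None if absent."""
    pos = -1
    for item in prefix:
        pos = seq.find(item, pos + 1)  # earliest match after the previous item
        if pos < 0:
            return None
    return seq[:pos + 1]

def last_instance(seq, prefix):
    """Leading part of seq up to the last appearance of the final prefix item."""
    if first_instance(seq, prefix) is None:
        return None
    return seq[:seq.rfind(prefix[-1]) + 1]

print(first_instance("CAABC", "AB"))  # CAAB
print(last_instance("ABBCA", "AB"))   # ABB
```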

13
Definition
  • Given an input sequence S which contains a prefix
    n-sequence $S_p = e_1 e_2 \ldots e_n$
  • The i-th last-in-last appearance w.r.t. a prefix
    sequence
  • it is the last appearance of $e_i$ in the last
    instance of the prefix $S_p$ in S; for $i < n$ it must
    also come before the (i+1)-th last-in-last appearance
  • For S = CAABC and $S_p$ = AB, the 1st last-in-last
    appearance w.r.t. prefix $S_p$ in S is the second A
    in S

14
Definition
  • Given an input sequence S which contains a prefix
    n-sequence $S_p = e_1 e_2 \ldots e_n$
  • The i-th maximum period of a prefix sequence
  • if $1 < i \le n$, it is the piece of sequence between
    the end of the first instance of prefix
    $e_1 e_2 \ldots e_{i-1}$ in S and the i-th last-in-last
    appearance w.r.t. prefix $S_p$
  • if $i = 1$, it is the piece of sequence in S
    located before the 1st last-in-last appearance
    w.r.t. prefix $S_p$
  • For S = ABCB and the prefix sequence $S_p$ = AB
  • the 2nd maximum period of prefix $S_p$ in S is BC
  • the 1st maximum period of prefix $S_p$ in S is $\emptyset$

15
EXAMPLE
  • S = CAABCBC and $S_p$ = AB
  • First instance of the prefix sequence
  • CAAB
  • Last instance of the prefix sequence
  • CAABCB
  • The i-th last-in-last appearance w.r.t. the prefix
    sequence
  • 1st: the 2nd A
  • 2nd: the 2nd B
  • The i-th maximum period of the prefix sequence
  • 1st: CA
  • 2nd: ABC
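
The example can be reproduced with the sketch below. It assumes the prefix occurs in the sequence, uses 0-based string positions, and finds the last-in-last appearances from right to left so that each one precedes the next. The names are illustrative.

```python
def first_instance_ends(seq, prefix):
    """End positions of the first instances of prefix[:1], prefix[:2], ..."""
    ends, pos = [], -1
    for item in prefix:
        pos = seq.find(item, pos + 1)
        ends.append(pos)
    return ends

def last_in_last(seq, prefix):
    """0-based positions of the 1st..n-th last-in-last appearances."""
    positions, limit = [], len(seq)
    for item in reversed(prefix):
        limit = seq.rfind(item, 0, limit)  # last appearance before the next one
        positions.append(limit)
    return positions[::-1]

def maximum_periods(seq, prefix):
    """1st..n-th maximum periods of prefix in seq."""
    starts = [0] + [e + 1 for e in first_instance_ends(seq, prefix)[:-1]]
    return [seq[s:l] for s, l in zip(starts, last_in_last(seq, prefix))]

print(last_in_last("CAABCBC", "AB"))     # [2, 5]: the 2nd A and the 2nd B
print(maximum_periods("CAABCBC", "AB"))  # ['CA', 'ABC']
print(maximum_periods("ABCB", "AB"))     # ['', 'BC']
```

An empty string stands for an empty period.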

16
(No Transcript)
17
Definition
  • Given an input sequence S which contains a prefix
    n-sequence $S_p = e_1 e_2 \ldots e_n$
  • The i-th last-in-first appearance w.r.t. a prefix
    sequence
  • it is the last appearance of $e_i$ in the first
    instance of the prefix $S_p$ in S; for $i < n$ it must
    also come before the (i+1)-th last-in-first appearance
  • For S = CAABC and $S_p$ = CA
  • the 2nd last-in-first appearance w.r.t. prefix $S_p$ in
    S is the first A in S

18
Definition
  • Given an input sequence S which contains a prefix
    n-sequence $S_p = e_1 e_2 \ldots e_n$
  • The i-th semi-maximum period of a prefix sequence
  • if $1 < i \le n$, it is the piece of sequence between
    the end of the first instance of prefix
    $e_1 e_2 \ldots e_{i-1}$ in S and the i-th last-in-first
    appearance w.r.t. prefix $S_p$
  • if $i = 1$, it is the piece of sequence in S
    located before the 1st last-in-first appearance
    w.r.t. prefix $S_p$
  • For S = ABCB and the prefix sequence $S_p$ = AC
  • the 2nd semi-maximum period of prefix AC in S is B
  • the 1st semi-maximum period of prefix AC in S is $\emptyset$
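
Semi-maximum periods differ from maximum periods only in where the backward search starts: inside the first instance of the prefix rather than the last one. The sketch below shows this under the same assumptions as before (string sequences, prefix present, 0-based positions); the names are illustrative.

```python
def first_instance_ends(seq, prefix):
    """End positions of the first instances of prefix[:1], prefix[:2], ..."""
    ends, pos = [], -1
    for item in prefix:
        pos = seq.find(item, pos + 1)
        ends.append(pos)
    return ends

def last_in_first(seq, prefix):
    """0-based positions of the 1st..n-th last-in-first appearances."""
    limit = first_instance_ends(seq, prefix)[-1] + 1  # search inside the first instance
    positions = []
    for item in reversed(prefix):
        limit = seq.rfind(item, 0, limit)
        positions.append(limit)
    return positions[::-1]

def semi_maximum_periods(seq, prefix):
    """1st..n-th semi-maximum periods of prefix in seq."""
    starts = [0] + [e + 1 for e in first_instance_ends(seq, prefix)[:-1]]
    return [seq[s:l] for s, l in zip(starts, last_in_first(seq, prefix))]

print(last_in_first("CAABC", "CA"))        # [0, 1]: the 2nd one is the first A
print(semi_maximum_periods("ABCB", "AC"))  # ['', 'B']
```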

19
Theorem
  • BackScan search space pruning
  • Let the prefix sequence be an n-sequence,
    $S_p = e_1 e_2 \ldots e_n$. If $\exists i$ ($1 \le i \le n$) and there
    exists an item $e'$ which appears in each of the
    i-th semi-maximum periods of the prefix $S_p$ in
    SDB, we can safely stop growing prefix $S_p$
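
A minimal sketch of the BackScan test on the running example: compute the semi-maximum periods of the prefix in every sequence that contains it, and report whether some position i has an item common to all of them. `can_prune` and the helper are illustrative names, not the authors' code.

```python
SDB = ["CAABC", "ABCB", "CABC", "ABBCA"]

def semi_maximum_periods(seq, prefix):
    """1st..n-th semi-maximum periods of prefix in seq, or None if prefix is absent."""
    ends, pos = [], -1
    for item in prefix:
        pos = seq.find(item, pos + 1)
        if pos < 0:
            return None
        ends.append(pos)
    last, limit = [], ends[-1] + 1   # last-in-first: search inside the first instance
    for item in reversed(prefix):
        limit = seq.rfind(item, 0, limit)
        last.append(limit)
    last.reverse()
    starts = [0] + [e + 1 for e in ends[:-1]]
    return [seq[s:l] for s, l in zip(starts, last)]

def can_prune(db, prefix):
    """True if some item appears in every i-th semi-maximum period for some i."""
    periods = [p for p in (semi_maximum_periods(s, prefix) for s in db) if p is not None]
    if not periods:
        return False
    return any(
        set.intersection(*(set(p[i]) for p in periods))
        for i in range(len(prefix))
    )

for prefix in ["A", "B", "C", "AC", "CB"]:
    print(prefix, can_prune(SDB, prefix))
```

In this example the prefix B can be dropped immediately, since A precedes the first B in every sequence; no closed sequence in the running example starts with B.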

20
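ScanSkip
  • The ScanSkip optimization technique
  • the closure checking scheme needs to scan backward a
    set of maximum periods w.r.t. a certain prefix; once
    the periods scanned so far share no item, the
    remaining ones can be skipped
  • $S_p$ = ABC:4
  • 3rd maximum periods are $\emptyset$, $\emptyset$, $\emptyset$, B $\Rightarrow$ skip the
    last three
  • 2nd maximum periods are A, $\emptyset$, $\emptyset$, B $\Rightarrow$ skip the
    last two
  • 1st maximum periods are CA, $\emptyset$, C, $\emptyset$ $\Rightarrow$ skip the
    last two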
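
One way to read the three cases above is as an early exit while intersecting the item sets of the periods. The sketch below follows that reading, which is an interpretation of the slide rather than the authors' exact procedure; `common_items_scanskip` is an illustrative name and empty strings stand for empty periods.

```python
def common_items_scanskip(periods):
    """Intersect the item sets of the periods, stopping once nothing is shared.

    Returns (common_items, number_of_periods_scanned).
    """
    common = None
    for scanned, period in enumerate(periods, start=1):
        items = set(period)
        common = items if common is None else common & items
        if not common:
            return set(), scanned  # remaining periods cannot change the answer
    return (common or set()), len(periods)

# Maximum periods of ABC in the running example, as listed on the slide.
print(common_items_scanskip(["", "", "", "B"]))   # (set(), 1): skip the last three
print(common_items_scanskip(["A", "", "", "B"]))  # (set(), 2): skip the last two
print(common_items_scanskip(["CA", "", "C", ""])) # (set(), 2): skip the last two
```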

21
BIDE
  • scan the database once to find the frequent
    1-sequences
  • build the projected database for each frequent
    1-sequence
  • for each prefix $S_p$, apply the BackScan pruning method;
    if $S_p$ can be pruned, stop growing it
  • otherwise check for backward-extension items and
    forward-extension items
  • if there are none, output $S_p$ as a frequent closed
    sequence
  • grow $S_p$ with each locally frequent item to get a new
    prefix
  • build the projected database for the new prefix and
    repeat
  • BackScan and backward-extension checking use ScanSkip
    to speed up the mining process
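
The steps above can be put together in a compact sketch. It is a simplified reading of the slide, not the authors' implementation: sequences are strings of single-character items, the projected database is recomputed from the full database at each step instead of being maintained incrementally, and all function names are illustrative.

```python
from collections import Counter

def first_instance_ends(seq, prefix):
    """End positions of the first instances of prefix[:1], prefix[:2], ...; None if absent."""
    ends, pos = [], -1
    for item in prefix:
        pos = seq.find(item, pos + 1)
        if pos < 0:
            return None
        ends.append(pos)
    return ends

def periods(seq, prefix, ends, semi):
    """Semi-maximum periods (semi=True) or maximum periods (semi=False)."""
    limit = ends[-1] + 1 if semi else len(seq)
    last = []
    for item in reversed(prefix):
        limit = seq.rfind(item, 0, limit)
        last.append(limit)
    last.reverse()
    starts = [0] + [e + 1 for e in ends[:-1]]
    return [seq[s:l] for s, l in zip(starts, last)]

def has_common_item(prefix, hits, semi):
    """True if, for some i, one item appears in every i-th period (ScanSkip early exit)."""
    all_periods = [periods(seq, prefix, ends, semi) for seq, ends in hits]
    for i in range(len(prefix)):
        common = None
        for p in all_periods:
            items = set(p[i])
            common = items if common is None else common & items
            if not common:
                break
        if common:
            return True
    return False

def bide(db, min_sup):
    """Return {closed sequence: support} without storing candidate patterns."""
    closed = {}

    def grow(prefix):
        hits = [(s, first_instance_ends(s, prefix)) for s in db]
        hits = [(s, e) for s, e in hits if e is not None]
        if has_common_item(prefix, hits, semi=True):      # BackScan pruning
            return
        local = Counter()                                 # locally frequent items
        for s, e in hits:
            local.update(set(s[e[-1] + 1:]))
        forward = any(c == len(hits) for c in local.values())
        if not forward and not has_common_item(prefix, hits, semi=False):
            closed[prefix] = len(hits)                    # no extension in either direction
        for item, count in sorted(local.items()):
            if count >= min_sup:
                grow(prefix + item)

    items = Counter()
    for s in db:
        items.update(set(s))
    for item, count in sorted(items.items()):
        if count >= min_sup:
            grow(item)
    return closed

SDB = ["CAABC", "ABCB", "CABC", "ABBCA"]
print(bide(SDB, 2))
```

On the running example with a minimum support of 2 this returns the six closed sequences listed earlier: AA:2, ABB:2, ABC:4, CA:3, CABC:2 and CB:3. The only pattern state kept during the search is the current prefix, which is the point of the approach.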

22
(No Transcript)
23
Conclusion
  • BIDE
  • closed pattern mining
  • Memory space usage
  • It does not need to maintain the set of historic
    closed patterns
  • Time cost
  • BackScan
  • ScanSkip