Permu-pattern: discovery of mutable permutation patterns with proximity constraints (PowerPoint presentation)

1
Permu-pattern: discovery of mutable permutation
patterns with proximity constraints
  • Meng Hu, Jiong Yang, Wei Su
  • KDD 2008

2
Outline
  • Introduction
  • Problem definition (mutable set and permu-pattern)
  • The permu-pattern algorithm
  • Experimental results
  • Conclusions

3
Introduction
  • In some applications, different symbols may be
    considered the same due to their similarity
  • Ex: in text mining, a set of synonymous words is
    a group of distinct symbols that should be
    treated as the same
  • Mutable permutation patterns: the order of
    symbols in a pattern may be altered in
    sequences, and some symbols are interchangeable
    with other symbols
  • Ex: "Google bought Youtube"
  • "Youtube was acquired by Google"
  • Pattern: {Google, {buy, acquire}, Youtube}
  • Advantage: it can capture not only the total
    order of symbols but also permuted orders of
    symbols
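The Google/Youtube example can be sketched as an order-insensitive match in which {buy, acquire} is one mutable set. This is a minimal illustration, not the paper's matching procedure: the lemma table `LEMMAS`, the stop list `STOPWORDS`, and the helpers `normalize` and `matches` are hypothetical, and gaps are ignored here.

```python
# Hypothetical normalization tables; the slides do not say how words are stemmed.
LEMMAS = {"bought": "buy", "acquired": "acquire"}
STOPWORDS = {"was", "by"}

# A permu-pattern as a list of mutable sets: symbols in one set are interchangeable.
PATTERN = [{"Google"}, {"buy", "acquire"}, {"Youtube"}]


def normalize(sentence):
    """Map words through the toy lemma table and drop stop words."""
    words = (LEMMAS.get(w, w) for w in sentence.split())
    return [w for w in words if w not in STOPWORDS]


def matches(tokens, pattern):
    """True if every mutable set has at least one member among the tokens.

    Token order is ignored, which is the point of a permutation pattern;
    gap constraints are left out of this illustration.
    """
    present = set(tokens)
    return all(present & group for group in pattern)


for s in ["Google bought Youtube", "Youtube was acquired by Google"]:
    print(s, "->", normalize(s), matches(normalize(s), PATTERN))
```

Both sentences normalize to the same three symbols in different orders, so both match.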

4
Problem definition
  • Sequence data
  • <s1, g1, s2, g2, ..., sn>: si is a symbol, gi is a gap
  • In text, si is a keyword and gi is the gap between two
    consecutive keywords
  • This paper focuses on cases where
  • (1) the order of the symbols may not be important
  • (2) the gap between two symbols is important
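One way to hold a sequence <s1, g1, s2, ..., sn> in code is to turn the gaps into absolute offsets, so that the distance between any two symbols is the sum of the gaps between them (consistent with the distance of 600 = 100 + 500 in the slide 7 example). The flat-list input format and the name `to_positions` are assumptions for illustration.

```python
from typing import List, Tuple, Union

# A sequence <s1, g1, s2, g2, ..., sn> written as a flat list alternating
# symbols (str) and gaps (int), e.g. ["s1", 100, "s5", 500, "s4"].
Item = Union[str, int]


def to_positions(seq: List[Item]) -> List[Tuple[str, int]]:
    """Convert the alternating form into (symbol, absolute offset) pairs.

    The offset of s_i is g_1 + ... + g_(i-1), so the distance between two
    symbols is the sum of the gaps that separate them.
    """
    symbols = seq[0::2]
    gaps = seq[1::2]
    if len(gaps) != len(symbols) - 1:
        raise ValueError("expected n symbols and n-1 gaps")
    positions, offset = [], 0
    for i, sym in enumerate(symbols):
        positions.append((sym, offset))
        if i < len(gaps):
            offset += gaps[i]
    return positions
```

For example, `to_positions(["s1", 100, "s5", 500, "s4", 300, "s3"])` returns `[("s1", 0), ("s5", 100), ("s4", 600), ("s3", 900)]`.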

5
Cont.
6
(No Transcript)
7
Cont.
  • Example
  • PP: s1, s2, s3, s4; Tgap = 500
  • S1 = (s1, 100, s2, 250, s4, 300, s3)
  • S1 contains (s1, s4, s3) as a subsequence because the gaps
    between s1 and s4 (350) and between s4 and s3 (300) are both below 500
  • Therefore S1 supports PP
  • S2 = (s1, 100, s5, 500, s4, 300, s3)
  • The distance between s1 and s4 is 600 > Tgap
  • Therefore S2 does not support PP
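The support test on this slide can be sketched as a brute-force search. The slide does not show how PP's symbols are grouped into mutable sets, so the sketch assumes, hypothetically, that s1 and s2 form one mutable set, which is one grouping consistent with the instance (s1, s4, s3). It also assumes mutable sets are disjoint and that the gap limit is inclusive (≤ Tgap); `supports` and `to_positions` are illustrative names, not the paper's.

```python
from itertools import combinations


def to_positions(seq):
    """Flat <s1, g1, ..., sn> list -> [(symbol, absolute offset), ...]."""
    out, offset = [], 0
    for i in range(0, len(seq), 2):
        out.append((seq[i], offset))
        if i + 1 < len(seq):
            offset += seq[i + 1]
    return out


def supports(seq, pattern, tgap):
    """True if seq contains an instance of the permu-pattern.

    pattern: list of disjoint mutable sets. An instance takes one symbol
    from each set, in any order, as a subsequence of seq whose consecutive
    symbols are at most tgap apart (assumed inclusive; the slide says "below").
    """
    group_of = {sym: g for g, group in enumerate(pattern) for sym in group}
    k = len(pattern)
    # combinations() keeps sequence order, so combo[i] precedes combo[i + 1].
    for combo in combinations(to_positions(seq), k):
        groups = [group_of.get(sym) for sym, _ in combo]
        if None in groups or sorted(groups) != list(range(k)):
            continue  # each mutable set must be used exactly once
        if all(b[1] - a[1] <= tgap for a, b in zip(combo, combo[1:])):
            return True
    return False


# Hypothetical grouping: s1 and s2 are interchangeable.
PP = [{"s1", "s2"}, {"s3"}, {"s4"}]
S1 = ["s1", 100, "s2", 250, "s4", 300, "s3"]
S2 = ["s1", 100, "s5", 500, "s4", 300, "s3"]
print(supports(S1, PP, 500), supports(S2, PP, 500))  # True False
```

The exhaustive search over position combinations grows combinatorially with sequence and pattern length, so it is only suitable for checking small examples like this one.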

8
Cont.
9
The permu-pattern algorithm
  • Apriori-based algorithms cannot be used to
    discover frequent mutable permutation patterns
  • Ex: S1 = (A, 100, C, 100, B), S2 = (B, 100, C, 100, A)
  • Tsup = 2, Tgap = 150
  • {A, B} is not a frequent mutable
    permutation pattern (A and B are 200 apart in both sequences)
  • {A, B, C} is a frequent mutable
    permutation pattern (consecutive symbols are 100 apart)
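The counterexample can be checked directly. Because C lies between A and B, adding C shrinks the largest consecutive distance from 200 to 100, so support is not anti-monotone under the gap constraint: Apriori would discard {A, B} and never generate its superset {A, B, C}. The sketch below treats each symbol as its own mutable set; `contains` and `support` are hypothetical helpers with an inclusive gap test.

```python
from itertools import combinations


def positions(seq):
    """Flat <s1, g1, ..., sn> list -> [(symbol, offset), ...]."""
    out, offset = [], 0
    for i in range(0, len(seq), 2):
        out.append((seq[i], offset))
        offset += seq[i + 1] if i + 1 < len(seq) else 0
    return out


def contains(seq, symbols, tgap):
    """Any-order occurrence of all symbols with consecutive distance <= tgap."""
    for combo in combinations(positions(seq), len(symbols)):
        if {s for s, _ in combo} == set(symbols) and all(
            b[1] - a[1] <= tgap for a, b in zip(combo, combo[1:])
        ):
            return True
    return False


def support(db, symbols, tgap):
    """Number of sequences in db that contain the pattern."""
    return sum(contains(seq, symbols, tgap) for seq in db)


DB = [["A", 100, "C", 100, "B"], ["B", 100, "C", 100, "A"]]
TSUP, TGAP = 2, 150
for pat in (("A", "B"), ("A", "B", "C")):
    s = support(DB, pat, TGAP)
    print(pat, s, "frequent" if s >= TSUP else "not frequent")
# ('A', 'B') 0 not frequent
# ('A', 'B', 'C') 2 frequent
```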

10
Cont.
  • Reachable
  • Ex: S1 = (s2, 100, s1, 500, s4, 100, s6, 100, s7, 800, s3)

[Figure: intermediate set]
11
Cont.
  • S1 = (s2, 100, s1, 500, s4, 100, s6, 100, s7, 800, s3)
  • S2 = (s1, 50, s4, 100, s2, 300, s7, 150, s4, 150, s6)
  • S3 = (s2, 100, s1, 150, s4, 500, s5, 100, s6, 100, s7)
  • {s3, s7} is a mutable set; Tsup = 3, Tgap = 200

Support of each mutable set
Mutable set | s1 | s2 | {s3, s7} | s4 | s5 | s6
Support     | 3  | 3  | 3        | 3  | 1  | 3
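The support table can be reproduced by counting, for each mutable set, the sequences that contain at least one of its members (s4 occurs twice in S2 but counts once). Dropping a single mutable set whose support is below Tsup is safe, because every sequence that supports a pattern must contain a member of each of its mutable sets; with Tsup = 3 this removes s5. The transcript does not show whether the paper prunes exactly this way, and the names below are illustrative.

```python
# Slide 11 data as flat <symbol, gap, symbol, ...> lists.
S1 = ["s2", 100, "s1", 500, "s4", 100, "s6", 100, "s7", 800, "s3"]
S2 = ["s1", 50, "s4", 100, "s2", 300, "s7", 150, "s4", 150, "s6"]
S3 = ["s2", 100, "s1", 150, "s4", 500, "s5", 100, "s6", 100, "s7"]
DB = [S1, S2, S3]

# Every symbol is its own mutable set except the given {s3, s7}.
MUTABLE_SETS = [{"s1"}, {"s2"}, {"s3", "s7"}, {"s4"}, {"s5"}, {"s6"}]
TSUP = 3


def set_support(db, mset):
    """Number of sequences containing at least one member of mset."""
    # seq[0::2] keeps only the symbols, skipping the gaps.
    return sum(any(item in mset for item in seq[0::2]) for seq in db)


supports = {",".join(sorted(m)): set_support(DB, m) for m in MUTABLE_SETS}
frequent = [name for name, sup in supports.items() if sup >= TSUP]
print(supports)  # {'s1': 3, 's2': 3, 's3,s7': 3, 's4': 3, 's5': 1, 's6': 3}
print(frequent)  # ['s1', 's2', 's3,s7', 's4', 's6']
```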
12
Pruning strategies
13
(No Transcript)
14
Data structure for reachable case
[Figure: starting mutable set, ending mutable set, intermediate mutable set]
15
Experimental results
16
Cont.
17
Conclusions
  • It can be used in many applications, such as
    biological sequence analysis and text mining
  • The gap threshold can be avoided through the
    following methods
  • 1. Remove stop words from the documents when
    transforming a document into a sequence
  • 2. Treat certain types of punctuation as a large
    gap
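The two preprocessing steps can be sketched as follows. One reading is that, once stop words are removed, keywords in the same clause end up a gap of 1 apart while breaking punctuation adds a large gap, so any threshold below that large gap simply means "same clause". The stop list, the punctuation set, and `PUNCT_GAP = 1000` are hypothetical values; the slides give none.

```python
import re

# Hypothetical settings: the slides specify neither a stop list nor a gap value.
STOPWORDS = {"a", "an", "the", "was", "by", "of", "and"}
BREAKING = {".", ";", "!", "?"}
PUNCT_GAP = 1000  # the "large gap" added at breaking punctuation


def text_to_sequence(text):
    """Return a flat <s1, g1, s2, ..., sn> list for a document."""
    seq, last, pos = [], None, 0
    for tok in re.findall(r"\w+|[^\w\s]", text.lower()):
        if tok in BREAKING:
            pos += PUNCT_GAP  # clause or sentence boundary -> large gap
        elif tok.isalnum() and tok not in STOPWORDS:
            pos += 1  # only kept keywords advance the position
            if last is not None:
                seq.append(pos - last)
            seq.append(tok)
            last = pos
        # commas, quotes, and stop words are skipped entirely
    return seq


print(text_to_sequence("Google bought Youtube. Youtube was acquired by Google."))
# ['google', 1, 'bought', 1, 'youtube', 1001, 'youtube', 1, 'acquired', 1, 'google']
```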