Association Analysis 5 Mining Word Associations - PowerPoint PPT Presentation

Slides: 7
Provided by: alext8

Transcript and Presenter's Notes



1
Association Analysis (5) (Mining Word Associations)
2
Mining word associations (on the Web)
Document-term matrix: frequency of words in a
document
  • An itemset here is a collection of words.
  • Transactions are the documents.
  • Example: W1 and W2 tend to appear together in the
    same documents.
  • Potential solution for mining frequent itemsets:
    convert into a 0/1 matrix and then apply existing
    algorithms.
  • OK, but this loses word frequency information.
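
A minimal sketch of the 0/1 conversion, to show what is lost. The term counts below are hypothetical (the original matrix is not part of this transcript), and the names binarize and binary_support are illustrative only.

```python
# Hypothetical term counts: rows are documents D1..D5, columns are W1 and W2.
# These numbers are an assumption, not data from the slides.
counts = [
    [2, 2],
    [0, 0],
    [2, 3],
    [0, 0],
    [1, 1],
]


def binarize(matrix):
    """Map every positive count to 1 and every zero count to 0."""
    return [[1 if c > 0 else 0 for c in row] for row in matrix]


def binary_support(matrix, cols):
    """Fraction of documents that contain every word in cols."""
    hits = sum(1 for row in matrix if all(row[c] for c in cols))
    return hits / len(matrix)


binary = binarize(counts)
print(binary)
print(binary_support(binary, [0, 1]))
```

After binarization, the count 3 for W2 in D3 and the count 1 for W2 in D5 both become 1, so the two documents are indistinguishable to a standard frequent-itemset algorithm.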

3
Normalize First
  • How to determine the support of a word?
  • First, normalize the word vectors.
  • After normalization, each word has a support equal
    to 1.0.
  • Reason for normalization: ensure that the data is on
    the same scale, so that sets of words that vary in
    the same way have similar support values.
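
The normalization step might be sketched as follows. The raw counts are hypothetical, chosen so that the normalized W1 and W2 vectors match the values quoted on the later slides; the function name normalize is an assumption.

```python
def normalize(word_counts):
    """Scale one word's counts across documents so that they sum to 1."""
    total = sum(word_counts)
    if total == 0:
        raise ValueError("word does not occur in any document")
    return [c / total for c in word_counts]


# Hypothetical counts of W1 and W2 in documents D1..D5 (an assumption).
w1_counts = [2, 0, 2, 0, 1]
w2_counts = [2, 0, 3, 0, 1]

w1 = normalize(w1_counts)
w2 = normalize(w2_counts)

print([round(x, 2) for x in w1])
print([round(x, 2) for x in w2])
```

Each normalized vector sums to 1, which is the per-word support of 1.0 mentioned above.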

4
Association between words
  • E.g., how to compute a meaningful normalized
    support for {W1, W2}?
  • One might think to sum up the averages of the
    normalized supports of W1 and W2 in each document:
  • s({W1, W2}) = (0.4 + 0.33)/2 + (0.4 + 0.5)/2 + (0.2 + 0.17)/2 = 1
  • This result is by no means an accident. Why? Each
    normalized word vector sums to 1, so the average
    of any set of them also sums to 1.
  • Averaging is useless here.
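
A short check of why averaging carries no information. W1 and W2 use the normalized values from the slides; the vector w_other is a hypothetical word that occurs in a single document only, added here purely for contrast.

```python
# Normalized vectors over documents D1..D5 (W1 and W2 as quoted on the slides).
w1 = [0.4, 0.0, 0.4, 0.0, 0.2]
w2 = [1 / 3, 0.0, 0.5, 0.0, 1 / 6]
# Hypothetical word that appears only in D2 (an assumption, not from the slides).
w_other = [0.0, 1.0, 0.0, 0.0, 0.0]


def average_support(*vectors):
    """Sum over documents of the mean normalized frequency of the words."""
    k = len(vectors)
    return sum(sum(vals) / k for vals in zip(*vectors))


print(average_support(w1, w2))
print(average_support(w1, w_other))
```

Both calls give 1 up to floating-point rounding, even though W1 and w_other never share a document, so the averaged value cannot separate associated words from unrelated ones.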

5
Min-APRIORI
  • Instead, use the minimum of the normalized supports
    (frequencies) in each document, summed over all
    documents.

Example:
s({W1, W2}) = min{0.4, 0.33} + min{0.4, 0.5} + min{0.2, 0.17} = 0.9
s({W1, W2, W3}) = 0 + 0 + 0 + 0 + 0.17 = 0.17
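
The min-based support can be sketched in a few lines. W1 and W2 are the normalized values from the slides. The W3 vector does not appear in this transcript, so the values used here are an assumption chosen to be consistent with the result 0.17. The name min_support is illustrative.

```python
# Normalized vectors over documents D1..D5.
w1 = [0.4, 0.0, 0.4, 0.0, 0.2]
w2 = [1 / 3, 0.0, 0.5, 0.0, 1 / 6]
# Assumed values for W3; the transcript does not list them.
w3 = [0.0, 1 / 3, 0.0, 1 / 3, 1 / 3]


def min_support(*vectors):
    """Sum over documents of the smallest normalized frequency in the itemset."""
    return sum(min(vals) for vals in zip(*vectors))


print(round(min_support(w1, w2), 2))
print(round(min_support(w1, w2, w3), 2))
```

Taking the minimum means a document contributes only as much as its least frequent word from the itemset, so a word that is absent from a document zeroes out that document's contribution.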
6
Anti-monotone property of Support
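Example:
s({W1}) = 0.4 + 0 + 0.4 + 0 + 0.2 = 1
s({W1, W2}) = 0.33 + 0 + 0.4 + 0 + 0.17 = 0.9
s({W1, W2, W3}) = 0 + 0 + 0 + 0 + 0.17 = 0.17
Support never increases as words are added to an itemset, so the standard APRIORI algorithm can be applied.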
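
As a rough illustration, a level-wise search with min-based support could look like the sketch below. It is a simplified Apriori-style loop under stated assumptions, not the presenter's implementation: the W3 values are assumed (they are not in this transcript), and the names words, min_support and min_apriori are hypothetical.

```python
from itertools import combinations

# Normalized word vectors over documents D1..D5.
words = {
    "W1": [0.4, 0.0, 0.4, 0.0, 0.2],
    "W2": [1 / 3, 0.0, 0.5, 0.0, 1 / 6],
    "W3": [0.0, 1 / 3, 0.0, 1 / 3, 1 / 3],  # assumed values
}


def min_support(itemset):
    """Sum over documents of the smallest normalized frequency in the itemset."""
    return sum(min(vals) for vals in zip(*(words[w] for w in itemset)))


def min_apriori(minsup):
    """Return {itemset: support} for all itemsets with support >= minsup."""
    frequent = {}
    level = [(w,) for w in sorted(words)]
    while level:
        kept = [s for s in level if min_support(s) >= minsup]
        for s in kept:
            frequent[s] = min_support(s)
        kept_set = set(kept)
        next_level = []
        # Join itemsets that share all but their last word.
        for a, b in combinations(kept, 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                # Anti-monotone pruning: every subset must be frequent.
                subsets = combinations(cand, len(cand) - 1)
                if all(sub in kept_set for sub in subsets):
                    next_level.append(cand)
        level = next_level
    return frequent


for itemset, support in min_apriori(0.5).items():
    print(itemset, round(support, 2))
```

With a threshold of 0.5, the pairs {W1, W3} and {W2, W3} fall below the threshold, so {W1, W2, W3} is never generated as a candidate; this is the pruning that the anti-monotone property allows.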