Mining surprising patterns using temporal description length

Transcript and Presenter's Notes

Title: Mining surprising patterns using temporal description length


1
Mining surprising patterns using temporal
description length
  • Soumen Chakrabarti (IIT Bombay), Sunita Sarawagi
    (IIT Bombay), Byron Dom (IBM Almaden)

2
Market basket mining algorithms
  • Find prevalent rules that hold over large
    fractions of data
  • Useful for promotions and store arrangement
  • Intensively researched

1990: "Milk and cereal sell together!"
3
Prevalent ≠ Interesting
  • Analysts already know about prevalent rules
  • Interesting rules are those that deviate from
    prior expectation
  • Mining's payoff is in finding surprising phenomena

1995: "Milk and cereal sell together!"
4
What makes a rule surprising?
  • Does not match prior expectation
  • Correlation between milk and cereal remains
    roughly constant over time
  • Cannot be trivially derived from simpler rules
  • Milk 10%, cereal 10%
  • Milk and cereal 10%: surprising (expected 1%)
  • Eggs 10%
  • Milk, cereal and eggs 0.1%: surprising!
  • Expected 1%
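
The arithmetic behind this slide can be checked with a few lines of Python. This is only an illustrative sketch: it reads the figures on the slide as percentages, and the helper names `expected_support` and `surprise_ratio` are made up for the example.

```python
from math import prod

def expected_support(parts):
    """Support predicted if the given parts occur independently."""
    return prod(parts)

def surprise_ratio(observed, parts):
    """Observed support divided by the support independence predicts."""
    return observed / expected_support(parts)

# Milk 10%, cereal 10%: independence predicts 1%, but 10% is observed.
pair_ratio = surprise_ratio(0.10, [0.10, 0.10])      # about 10

# {milk, cereal} 10%, eggs 10%: independence predicts 1%, but 0.1% is observed.
triple_ratio = surprise_ratio(0.001, [0.10, 0.10])   # about 0.1
```

A ratio far from 1 in either direction means the itemset cannot be derived from its simpler parts.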

5
Two views on data mining
[Diagram labels: Data; Model of Analyst's Knowledge of the Data; Mining Program; Discovery; Analyst]
6
Our contributions
  • A new notion of surprising patterns
  • Detect changes in correlation along time
  • Filter out steady, uninteresting correlations
  • Algorithms to mine for surprising patterns
  • Encode data into bit streams using two models
  • Surprise = difference in number of bits needed
  • Experimental results
  • Demonstrate superiority over prevalent patterns

7
A simpler problem: one item
  • Milk-buying habits modeled by biased coin
  • Customer tosses this coin to decide whether to
    buy milk
  • Head or 1 denotes basket contains milk
  • Coin bias is Pr[milk]
  • Analyst wants to study Pr[milk] along time
  • Single coin with fixed bias is not interesting
  • Changes in bias are interesting

8
The coin segmentation problem
  • Players A and B
  • A has a set of coins with different biases
  • A repeatedly:
  • Picks arbitrary coin
  • Tosses it arbitrary number of times
  • B observes H/T
  • Guesses transition points and biases

[Diagram labels: A (Pick, Toss, Return); B]
9
How to explain the data
  • Given n head/tail observations
  • Can assume n different coins with bias 0 or 1
  • Data fits perfectly (with probability one)
  • Many coins needed
  • Or assume one coin
  • May fit data poorly
  • Best explanation is a compromise

[Figure labels: coin biases 5/7, 1/3, 1/4]
10
Coding examples
  • Sequence of k zeroes
  • Naïve encoding takes k bits
  • Run length takes about log k bits
  • 1000 bits, 10 randomly placed 1s, rest 0s
  • Posit a coin with bias 0.01
  • Data encoding cost is (Shannon's theorem)
    -10 log2(0.01) - 990 log2(0.99) ≈ 81 bits
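
The Shannon cost for the 1000-bit example can be worked out directly. The sketch below assumes an ideal code that spends -log2 of the model probability on each toss; the function name `data_cost_bits` is illustrative only.

```python
from math import log2

def data_cost_bits(heads, tails, p):
    """Bits to encode the tosses using a coin with bias p = Pr[head].

    p must lie strictly between 0 and 1 for any symbol that actually occurs.
    """
    cost = 0.0
    if heads:
        cost -= heads * log2(p)
    if tails:
        cost -= tails * log2(1.0 - p)
    return cost

naive_bits = 1000                               # one bit per toss
model_bits = data_cost_bits(10, 990, 0.01)      # about 80.8 bits
```

The posited coin therefore encodes the same data in well under a tenth of the naive cost, before the cost of describing the coin itself is added.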

11
How to find optimal segments
  • Sequence of 17 tosses
  • Derived graph with 18 nodes
  • Data cost for Pr[head] = 5/7: 5 heads, 2 tails
  • Edge cost = model cost + data cost
  • Model cost = one node ID + one Pr[head]
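
A minimal sketch of the shortest-path formulation follows. Nodes 0..T sit between tosses, and the edge (i, j) stands for one coin covering tosses i..j-1. The bit widths are assumptions made for the example, not values from the slides: a node ID is charged log2(T+1) bits and one Pr[head] is charged a fixed `param_bits`.

```python
from math import log2

def segment_cost(heads, n, param_bits, n_tosses):
    """Edge cost = model cost + data cost for one segment of n tosses."""
    model = log2(n_tosses + 1) + param_bits    # one node ID + one Pr[head]
    data = 0.0
    if 0 < heads < n:                          # a pure run costs 0 data bits
        p = heads / n                          # best-fitting bias for the segment
        data = -(heads * log2(p) + (n - heads) * log2(1 - p))
    return model + data

def optimal_segments(tosses, param_bits=8.0):
    """Exact shortest path from node 0 to node T; returns (cost, cut points)."""
    T = len(tosses)
    prefix = [0]                               # prefix[i] = heads among first i tosses
    for t in tosses:
        prefix.append(prefix[-1] + t)
    best = [0.0] + [float("inf")] * T          # best[j] = cheapest path 0 -> j
    back = [0] * (T + 1)
    for j in range(1, T + 1):
        for i in range(j):
            c = best[i] + segment_cost(prefix[j] - prefix[i], j - i, param_bits, T)
            if c < best[j]:
                best[j], back[j] = c, i
    cuts, j = [], T
    while j > 0:                               # walk back-pointers to recover the path
        cuts.append(j)
        j = back[j]
    cuts.append(0)
    return best[T], cuts[::-1]

cost, cuts = optimal_segments([1] * 20 + [0] * 20)
```

On this input the path cuts once, at toss 20, where the bias changes. The double loop examines every edge, so the running time is quadratic in T.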
12
Approximate shortest path
  • Suppose there are T tosses
  • Make T^(1-ε) chunks, each with T^ε nodes (tune ε)
  • Find shortest paths within chunks
  • Some nodes are chosen in each chunk
  • Solve a shortest path with all chosen nodes
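
One way to read these steps is sketched below. It is an interpretation under stated assumptions, not the authors' implementation: the chunk length is taken as the ceiling of T raised to a tunable exponent `eps`, the same per-segment model cost is used inside chunks and in the final pass, and the bit widths are the same illustrative ones as before. The input is assumed to contain at least one toss.

```python
from math import log2, ceil

def edge_cost(prefix, i, j, model_bits):
    """Model cost plus data cost of one coin covering tosses i..j-1."""
    n, h = j - i, prefix[j] - prefix[i]
    data = 0.0
    if 0 < h < n:
        p = h / n
        data = -(h * log2(p) + (n - h) * log2(1 - p))
    return model_bits + data

def shortest_path(nodes, prefix, model_bits):
    """Exact shortest path restricted to a sorted list of candidate nodes."""
    best, back = {nodes[0]: 0.0}, {}
    for b, j in enumerate(nodes[1:], start=1):
        best[j] = float("inf")
        for i in nodes[:b]:
            c = best[i] + edge_cost(prefix, i, j, model_bits)
            if c < best[j]:
                best[j], back[j] = c, i
    path, j = [nodes[-1]], nodes[-1]
    while j != nodes[0]:
        j = back[j]
        path.append(j)
    return best[nodes[-1]], path[::-1]

def approx_segments(tosses, eps=0.5, param_bits=8.0):
    T = len(tosses)
    prefix = [0]
    for t in tosses:
        prefix.append(prefix[-1] + t)
    model_bits = log2(T + 1) + param_bits      # assumed: node ID + one Pr[head]
    chunk = max(1, ceil(T ** eps))             # about T^eps nodes per chunk
    chosen = set()
    for start in range(0, T, chunk):           # shortest path within each chunk
        end = min(start + chunk, T)
        _, path = shortest_path(list(range(start, end + 1)), prefix, model_bits)
        chosen.update(path)                    # keep the nodes each chunk selected
    return shortest_path(sorted(chosen), prefix, model_bits)

tosses = [1] * 50 + [0] * 50 + [1] * 50
approx_cost, approx_path = approx_segments(tosses)
```

Because the final pass only sees the chosen nodes, its cost can never beat the exact shortest path; the exponent trades running time against how close it gets.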

13
Two or more items
  • Unconstrained segmentation
  • k items induce a 2^k-sided coin
  • milk and cereal = 11; milk, not cereal = 10;
    neither = 00; etc.
  • Shortest path finds significant shift in any of
    the coin face probabilities
  • Problem: some of these shifts may be completely
    explained by lower-order marginals
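
The mapping from baskets to coin faces can be illustrated as follows. The basket contents and the function names `face` and `face_probabilities` are made up for the example.

```python
from collections import Counter

def face(basket, items):
    """Map a basket to a k-bit face: bit i is 1 iff items[i] is in the basket."""
    return "".join("1" if item in basket else "0" for item in items)

def face_probabilities(baskets, items):
    """Empirical probability of each face that actually occurs."""
    counts = Counter(face(b, items) for b in baskets)
    return {f: c / len(baskets) for f, c in counts.items()}

items = ("milk", "cereal")
baskets = [{"milk", "cereal"}, {"milk"}, {"bread"}, {"cereal", "eggs"}]
faces = [face(b, items) for b in baskets]     # ['11', '10', '00', '01']
probs = face_probabilities(baskets, items)
```

Items outside the chosen itemset (bread, eggs) are ignored, so each basket becomes one toss of a four-sided coin.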

14
Example
  • Drop in joint sale of milk and cereal is
    completely explained by drop in sale of milk
  • Pr[milk, cereal] / (Pr[milk] Pr[cereal]) remains
    constant over time
  • Call this ratio θ
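
A small numeric illustration, with made-up supports, of a drop in joint sales that the ratio does not register:

```python
def ratio(p_joint, p_a, p_b):
    """Pr[a, b] / (Pr[a] Pr[b]): joint support relative to independence."""
    return p_joint / (p_a * p_b)

# Hypothetical supports: milk sales halve, cereal sales stay put,
# and the joint support halves along with milk.
before = ratio(p_joint=0.08, p_a=0.20, p_b=0.20)   # about 2
after = ratio(p_joint=0.04, p_a=0.10, p_b=0.20)    # about 2
```

The joint support fell from 8% to 4%, yet the ratio is unchanged, so the shift carries no information beyond what the milk marginal already shows.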

15
Constant-θ segmentation
θ = observed support / support expected under independence
  • Compute global θ over all time
  • All coins must have this common value of θ
  • Segment by constrained optimization
  • Compare with unconstrained coding cost
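
The comparison of coding costs can be sketched for a single segment of a two-item itemset. This is a simplification under stated assumptions: the segment's marginals are estimated from its own counts rather than by a true constrained optimization, the global ratio is passed in as `theta`, and the face counts are invented.

```python
from math import log2

def code_length(counts, probs):
    """Bits to encode the face counts under the given face probabilities."""
    return -sum(c * log2(probs[f]) for f, c in counts.items() if c)

def unconstrained_probs(counts):
    """Best-fitting four-sided coin for this segment."""
    n = sum(counts.values())
    return {f: c / n for f, c in counts.items()}

def constant_ratio_probs(counts, theta):
    """Four-sided coin whose joint probability is forced to theta * p_a * p_b."""
    n = sum(counts.values())
    p_a = (counts["11"] + counts["10"]) / n    # marginal of the first item
    p_b = (counts["11"] + counts["01"]) / n    # marginal of the second item
    p11 = theta * p_a * p_b
    return {"11": p11, "10": p_a - p11, "01": p_b - p11,
            "00": 1.0 - p_a - p_b + p11}

def extra_bits(counts, theta):
    """Additional bits the constrained coin needs over the free one."""
    free = code_length(counts, unconstrained_probs(counts))
    tied = code_length(counts, constant_ratio_probs(counts, theta))
    return tied - free

# Hypothetical segments, with a global ratio of 1 (independence).
steady = {"11": 16, "10": 24, "01": 24, "00": 36}    # ratio 1 within the segment
shifted = {"11": 30, "10": 10, "01": 10, "00": 50}   # joint sales well above it
steady_extra = extra_bits(steady, theta=1.0)
shifted_extra = extra_bits(shifted, theta=1.0)
```

A segment that agrees with the global ratio costs almost nothing extra, while one that departs from it costs tens of bits more under the constrained coin.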

16
Is all this really needed?
  • Simpler alternative
  • Aggregate data into suitable time windows
  • Compute support, correlation, θ, etc. in each
    window
  • Use variance threshold to choose itemsets
  • Pitfalls
  • Choices: windows, thresholds
  • May miss fine detail
  • Over-sensitive to outliers
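
For comparison, the windowed alternative might look like the sketch below. The window length, the skipping of windows where the ratio is undefined, and the use of population variance are all choices made for the example, which is exactly the kind of tuning the pitfalls refer to.

```python
from statistics import pvariance

def windowed_ratios(baskets, a, b, window):
    """Pr[a, b] / (Pr[a] Pr[b]) in consecutive, non-overlapping windows."""
    ratios = []
    for start in range(0, len(baskets) - window + 1, window):
        w = baskets[start:start + window]
        p_a = sum(a in x for x in w) / window
        p_b = sum(b in x for x in w) / window
        p_ab = sum(a in x and b in x for x in w) / window
        if p_a and p_b:                        # skip windows where it is undefined
            ratios.append(p_ab / (p_a * p_b))
    return ratios

# Hypothetical data: the items co-occur in the first window only.
baskets = [{"milk", "cereal"}, {"milk", "cereal"}, set(), set(),
           {"milk"}, {"cereal"}, {"milk"}, {"cereal"}]
ratios = windowed_ratios(baskets, "milk", "cereal", window=4)   # [2.0, 0.0]
spread = pvariance(ratios)                                      # 1.0
```

An itemset would be reported when `spread` exceeds a chosen threshold, so a single unusual window can dominate the score.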

17
but no simpler
"Smoothing leads to an estimated trend that is
descriptive rather than analytic or explanatory.
Because it is not based on an explicit probabilistic
model, the method cannot be treated rigorously in
terms of mathematical statistics."
T. W. Anderson, The Statistical Analysis of Time Series
18
Experiments
  • 2.8 million baskets over 7 years, 1987-93
  • 15800 items, average 2.62 items per basket
  • Two algorithms
  • Complete MDL approach
  • MDL segmentation + statistical tests (MStat)
  • Anecdotes
  • MDL effective at penalizing obvious itemsets

19
Quality of approximation
20
Little agreement in itemset ranks
  • Simpler methods do not approximate MDL

21
MDL has high selectivity
  • Scores of the best itemsets stand out from the
    rest using MDL

22
Three anecdotes
  • θ against time
  • High MStat score
  • Small marginals
  • Polo shirt and shorts
  • High correlation
  • Small variation
  • Bedsheets and pillow cases
  • High MDL score
  • Significant gradual drift
  • Men's and women's shorts

23
Conclusion
  • New notion of surprising patterns based on:
  • Joint support expected from marginals
  • Variation of joint support along time
  • Robust MDL formulation
  • Efficient algorithms
  • Near-optimal segmentation using shortest path
  • Pruning criteria
  • Successful application to real data