Title: Mining surprising patterns using temporal description length
1. Mining surprising patterns using temporal description length
- Soumen Chakrabarti (IIT Bombay), Sunita Sarawagi (IIT Bombay), Byron Dom (IBM Almaden)
2. Market basket mining algorithms
- Find prevalent rules that hold over large fractions of the data
- Useful for promotions and store arrangement
- Intensively researched
[Figure, 1990: a mining program announces "Milk and cereal sell together!"]
3. Prevalent ≠ Interesting
- Analysts already know about prevalent rules
- Interesting rules are those that deviate from prior expectation
- Mining's payoff is in finding surprising phenomena
[Figure, 1995: the same discovery, "Milk and cereal sell together!", is announced again and is no longer news]
4. What makes a rule surprising?
- Does not match prior expectation
- Correlation between milk and cereal remains roughly constant over time
- Cannot be trivially derived from simpler rules
- Milk: 10%, cereal: 10%
- Milk and cereal: 10% is surprising (independence predicts 1%)
- Eggs: 10%
- Milk, cereal and eggs: 0.1% is surprising! (expected 1%)
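The independence arithmetic above can be sketched as follows (a minimal illustration; the function name is mine, not from the talk):

```python
def independence_expectation(marginals):
    """Expected joint support if the items sold independently."""
    expected = 1.0
    for p in marginals:
        expected *= p
    return expected

# Milk and cereal each appear in 10% of baskets; independence
# predicts a joint support of 1%, so an observed 10% is surprising.
pair = independence_expectation([0.10, 0.10])   # about 0.01, i.e. 1%

# With eggs also at 10%, using the *observed* pair support of 10%
# predicts 1% for the triple; an observed 0.1% is surprisingly low.
triple = 0.10 * 0.10   # observed Pr[milk & cereal] x Pr[eggs]
```

Surprise runs in both directions: the pair is 10x above its expectation, the triple 10x below.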
5. Two views on data mining
[Diagram: two pipelines. In the first, Data feeds a Mining Program, which produces a Discovery. In the second, Data plus a Model of the Analyst's Knowledge of the Data feed the Mining Program, and the Discovery goes back to the Analyst.]
6. Our contributions
- A new notion of surprising patterns
- Detect changes in correlation along time
- Filter out steady, uninteresting correlations
- Algorithms to mine for surprising patterns
- Encode data into bit streams using two models
- Surprise = difference in the number of bits needed
- Experimental results
- Demonstrate superiority over prevalent patterns
7. A simpler problem: one item
- Milk-buying habits modeled by a biased coin
- Customer tosses this coin to decide whether to buy milk
- Head (1) denotes that the basket contains milk
- Coin bias is Pr[milk]
- Analyst wants to study Pr[milk] along time
- A single coin with fixed bias is not interesting
- Changes in bias are interesting
8. The coin segmentation problem
- Players A and B
- A has a set of coins with different biases
- A repeatedly:
- Picks an arbitrary coin
- Tosses it an arbitrary number of times
- B observes the head/tail sequence and
- Guesses the transition points and biases
9. How to explain the data
- Given n head/tail observations
- Can assume n different coins with bias 0 or 1
- Data fits perfectly (with probability one)
- Many coins needed
- Or assume one coin
- May fit data poorly
- Best explanation is a compromise
[Figure: the toss sequence split into segments with biases 5/7, 1/3 and 1/4]
10. Coding examples
- Sequence of k zeroes
- Naïve encoding takes k bits
- Run-length encoding takes about log k bits
- 1000 bits, ten randomly placed 1s, the rest 0s
- Posit a coin with bias 0.01
- Data encoding cost (Shannon's theorem) is about 1000 × H(0.01) ≈ 81 bits
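The Shannon bound for the 1000-bit example can be checked numerically; a minimal sketch (helper name is mine):

```python
import math

def entropy_bits(p):
    """Binary entropy H(p): bits per symbol for a coin of bias p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# 1000 bits with ten randomly placed 1s, encoded with a bias-0.01 coin:
cost = 1000 * entropy_bits(0.01)
print(round(cost))   # prints 81: far below the naive 1000 bits
```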
11. How to find optimal segments
- Sequence of 17 tosses
- Derived graph with 18 nodes; an edge from node i to node j is a candidate segment
- Data cost for Pr[head] = 5/7: segment with 5 heads and 2 tails
- Edge cost = model cost + data cost
- Model cost = one node ID + one Pr[head]
12. Approximate shortest path
- Suppose there are T tosses
- Make T^(1-ε) chunks, each with T^ε nodes (tune ε)
- Find shortest paths within chunks
- Some nodes are chosen in each chunk
- Solve a shortest path problem over all chosen nodes
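An exact (quadratic-in-T) version of the shortest-path segmentation can be sketched as below. The per-segment model charge is a stand-in constant and all names are illustrative, not the talk's implementation:

```python
import math

def data_cost(heads, tails):
    """Shannon cost in bits of a segment under its MLE bias."""
    n = heads + tails
    cost = 0.0
    for k in (heads, tails):
        if k:
            cost -= k * math.log2(k / n)
    return cost

def segment(tosses, model_cost=8.0):
    """Shortest path on the derived graph: the edge (i, j) costs
    model_cost plus the data cost of encoding tosses[i:j]."""
    T = len(tosses)
    best = [math.inf] * (T + 1)   # best[j]: cheapest encoding of prefix j
    best[0] = 0.0
    cut = [0] * (T + 1)
    for j in range(1, T + 1):
        for i in range(j):
            h = sum(tosses[i:j])
            c = best[i] + model_cost + data_cost(h, (j - i) - h)
            if c < best[j]:
                best[j], cut[j] = c, i
    bounds, j = [], T             # walk back to recover the segments
    while j > 0:
        bounds.append((cut[j], j))
        j = cut[j]
    return best[T], bounds[::-1]

# A clear bias change: 20 heads then 20 tails splits into two segments.
cost, segs = segment([1] * 20 + [0] * 20)
```

The chunked approximation on this slide trades the O(T^2) edge set of this sketch for shortest paths inside chunks plus one path over the chosen nodes.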
13. Two or more items
- Unconstrained segmentation
- k items induce a 2^k-sided coin
- Faces: milk and cereal = 11; milk but not cereal = 10; neither = 00; etc.
- Shortest path finds a significant shift in any of the coin face probabilities
- Problem: some of these shifts may be completely explained by lower-order marginals
14. Example
- A drop in the joint sale of milk and cereal is completely explained by a drop in the sale of milk
- Pr[milk ∧ cereal] / (Pr[milk] × Pr[cereal]) remains constant over time
- Call this ratio ρ
15. Constant-ρ segmentation
- ρ = observed support ÷ support expected under independence
- Compute the global ρ over all time
- All coins must share this common value of ρ
- Segment by constrained optimization
- Compare with the unconstrained coding cost
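A minimal sketch of the ratio the constrained model holds fixed (names and counts are illustrative, not the paper's code):

```python
def rho(joint, a, b, n):
    """Observed joint support divided by the independence estimate."""
    return (joint / n) / ((a / n) * (b / n))

# Globally: milk in 1000 of 10000 baskets, cereal in 1000, both in 300.
global_rho = rho(300, 1000, 1000, 10000)    # 3.0: strong correlation

# A segment whose local rho matches the global one is steady and
# uninteresting; a segment where it diverges is the surprise.
seg_rho = rho(30, 100, 100, 1000)           # also 3.0: no surprise here
```

The constrained segmentation forces every segment's coin to satisfy this global ρ; a large gap between the constrained and unconstrained coding costs signals that ρ actually varied.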
16. Is all this really needed?
- A simpler alternative
- Aggregate data into suitable time windows
- Compute support, correlation, ρ, etc. in each window
- Use a variance threshold to choose itemsets
- Pitfalls
- Arbitrary choices of windows and thresholds
- May miss fine detail
- Over-sensitive to outliers
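For concreteness, the windowed baseline might look like this sketch (illustrative names; the pitfalls above are exactly the window and threshold choices):

```python
def windowed_rho_variance(a_flags, b_flags, window):
    """Compute the support ratio rho in fixed windows, then its variance.
    The answer depends heavily on `window`, and a single outlier window
    can dominate the variance -- the pitfalls noted above."""
    ratios = []
    for s in range(0, len(a_flags), window):
        a, b = a_flags[s:s + window], b_flags[s:s + window]
        n = len(a)
        pa, pb = sum(a) / n, sum(b) / n
        pab = sum(x & y for x, y in zip(a, b)) / n
        if pa > 0 and pb > 0:
            ratios.append(pab / (pa * pb))
    mean = sum(ratios) / len(ratios)
    return sum((r - mean) ** 2 for r in ratios) / len(ratios)

# Perfectly steady correlation: the variance is zero, so this baseline
# (correctly) skips the itemset -- but one noisy window would fool it.
v = windowed_rho_variance([1, 0] * 50, [1, 0] * 50, window=10)
```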
17. ...but no simpler
- "Smoothing leads to an estimated trend that is descriptive rather than analytic or explanatory. Because it is not based on an explicit probabilistic model, the method cannot be treated rigorously in terms of mathematical statistics." (T. W. Anderson, The Statistical Analysis of Time Series)
18. Experiments
- 2.8 million baskets over 7 years (1987-93)
- 15,800 items, average 2.62 items per basket
- Two algorithms
- The complete MDL approach
- MDL segmentation plus statistical tests (MStat)
- Anecdotes
- MDL is effective at penalizing obvious itemsets
19. Quality of approximation
20. Little agreement in itemset ranks
- Simpler methods do not approximate MDL
21. MDL has high selectivity
- Scores of the best itemsets stand out from the rest under MDL
22. Three anecdotes
- Charts: ρ against time for each itemset pair
- Polo shirt and shorts: high MStat score, small marginals
- Bedsheets and pillowcases: high correlation, small variation
- Men's and women's shorts: high MDL score, significant gradual drift
23. Conclusion
- New notion of surprising patterns based on
- Joint support expected from marginals
- Variation of joint support along time
- Robust MDL formulation
- Efficient algorithms
- Near-optimal segmentation using shortest path
- Pruning criteria
- Successful application to real data