Title: Mining frequency counts from sensor set data
1 Mining frequency counts from sensor set data
- Loo Kin Kong
- 25th June 2003
2 Outline
- Motivation
- Sensor set data
- Finding frequency counts of itemsets from sensor set data
- Future work
3 Stock quotes
- Closing prices of some HK stocks
4 Stock quotes
- Intra-day stock price of TVB (0511) on 23rd June 2003 (Source: quamnet.com)
5 Motivation
- Fluctuation of the price of a stock may be related to that of another stock or to other conditions
- Online analysis tools can help to give more insight into such variations
- The case of the stock market can be generalized...
- We use sensors to monitor some conditions, for example
- We monitor the prices of stocks by getting quotations from a finance website
- We monitor the weather by measuring temperature, humidity, air pressure, wind, etc.
6 Sensors
- Properties of a sensor include
- A sensor reports values, either spontaneously or by request, reflecting the state of the condition being monitored
- Once a sensor reports a value, the value remains valid until the sensor reports again
- The lifespan of a value is defined as the length of time for which the value remains valid
- The value reported must be one of the possible states of the condition
- The set of all possible states of a sensor is its state set
7 Sensor set data
- A set of sensors (say, n of them) is called a sensor set
- At any time, we can obtain an n-tuple composed of the values of the n sensors, attached with a time stamp: (t, v1, v2, ..., vn), where t is the time when the n-tuple is obtained and vx is the value of the x-th sensor
- If the n sensors have the same state set, we call the sensor set homogeneous
8 Mining association rules from sensor set data
- An association rule is a rule, satisfying certain support and confidence restrictions, of the form X → Y, where X and Y are two disjoint itemsets
- We redefine the support to reflect the time factor in sensor set data: supp(X) = lifespan(X) / length of history (see the sketch below)
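A minimal sketch of this time-weighted support in Python, assuming the history is a list of timestamped tuples already encoded as itemsets; the function and variable names are illustrative, not from the original work:

```python
# Time-weighted support: supp(X) = lifespan(X) / length of history.
# "stream" is a list of (timestamp, itemset) pairs sorted by timestamp; a tuple's
# values stay valid until the next tuple is reported (or the end of the history).

def lifespan_support(stream, X, end_of_history):
    lifespan = 0.0
    for (t, items), (t_next, _) in zip(stream, stream[1:] + [(end_of_history, None)]):
        if X <= items:                 # itemset X is contained in this tuple
            lifespan += t_next - t     # add the time this tuple remains valid
    return lifespan / (end_of_history - stream[0][0])

# Example with two sensors whose states are encoded as (sensor, state) items
stream = [
    (0, {("z1", "up"), ("z2", "up")}),
    (5, {("z1", "up"), ("z2", "down")}),
    (8, {("z1", "down"), ("z2", "down")}),
]
print(lifespan_support(stream, {("z1", "up")}, end_of_history=10))  # 0.8
```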
9 Transformations of sensor-set data
- The n-tuples need transformation for finding frequent itemsets
- Transformation 1
- Each (zx, sy) pair, where zx is a sensor and sy a state for zx, is treated as an item in traditional association rule mining
- Hence, the i-th n-tuple is transformed as (ti, (z1, v1), (z2, v2), ..., (zn, vn)), where ti is the timestamp of the i-th n-tuple (sketched below)
- Thus, association rules of the form (z1, v1), (z2, v2), ..., (zn, vn) → (zx, vx) can be obtained
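As an illustrative sketch (names are my own), Transformation 1 simply pairs each sensor with its reported value:

```python
# Transformation 1: each (sensor, state) pair becomes an item, so a timestamped
# n-tuple maps directly to a timestamped transaction of n items.

def transform1(timestamp, sensors, values):
    return timestamp, {(z, v) for z, v in zip(sensors, values)}

print(transform1(42, ["z1", "z2", "z3"], ["up", "down", "up"]))
# e.g. (42, {('z1', 'up'), ('z2', 'down'), ('z3', 'up')})  (set order may vary)
```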
10 Transformations of sensor-set data
- Transformation 2
- Assuming a homogeneous sensor set, each s in the state set is treated as an item in traditional association rule mining
- The i-th n-tuple is transformed as (ti, (e1, s1), (e2, s2), ..., (em, sm)), where ti is the timestamp of the i-th n-tuple and ex is a boolean value showing whether the state sx exists in the tuple (sketched below)
- Thus, association rules of the form s1, s2, ..., sj → sk can be obtained
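A corresponding sketch for Transformation 2 (again with illustrative names), assuming every sensor draws its value from the same state set:

```python
# Transformation 2: for a homogeneous sensor set, each possible state becomes an
# item, flagged by whether any sensor currently reports it.

def transform2(timestamp, values, state_set):
    reported = set(values)
    return timestamp, [(s in reported, s) for s in state_set]

print(transform2(42, ["up", "up", "down"], ["up", "down", "steady"]))
# (42, [(True, 'up'), (True, 'down'), (False, 'steady')])
```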
11 The Lossy Counting (LC) Algorithm for items
- User specifies the support threshold s and error tolerance ε
- Transactions of single items are conceptually kept in buckets of size ⌈1/ε⌉
- At the end of each bucket, counts smaller than the error tolerance are discarded
- Counts of items, kept in a data structure D, are of the form (e, f, Δ), where
- e is the item
- f is the frequency of e since the entry was inserted into D
- Δ is the maximum possible count of e before the entry was added to D
12 The Lossy Counting (LC) Algorithm for items
D: the set of all counts; N: current length of the stream; e: transaction (of an item); w: bucket width; b: current bucket id
1. D ← ∅; N ← 0
2. w ← ⌈1/ε⌉; b ← 1
3. e ← next transaction; N ← N + 1
4. if (e, f, Δ) exists in D do
     f ← f + 1
   else do
     insert (e, 1, b-1) into D
   endif
5. if N mod w = 0 do
     prune(D, b); b ← b + 1
   endif
6. Goto 3
13 The Lossy Counting (LC) Algorithm for items
function prune(D, b)
  for each entry (e, f, Δ) in D do
    if f + Δ ≤ b do
      remove the entry from D
    endif
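Putting slides 12 and 13 together, a runnable Python sketch of the item-level algorithm might look as follows; the dictionary layout and names are mine, but the logic follows the pseudocode above:

```python
import math

def lossy_count(stream, epsilon, support):
    """Lossy Counting over a stream of single items, per the pseudocode above."""
    D = {}                              # e -> (f, delta)
    N = 0
    w = math.ceil(1 / epsilon)          # bucket width  w = ceil(1/epsilon)
    b = 1                               # current bucket id
    for e in stream:
        N += 1
        if e in D:
            f, delta = D[e]
            D[e] = (f + 1, delta)
        else:
            D[e] = (1, b - 1)           # delta = b-1 bounds the count missed so far
        if N % w == 0:                  # end of a bucket: prune small counts
            D = {e: (f, d) for e, (f, d) in D.items() if f + d > b}
            b += 1
    # report items whose estimated count clears the (support - epsilon) threshold
    return {e: f for e, (f, d) in D.items() if f >= (support - epsilon) * N}

print(lossy_count("aababcabcd" * 100, epsilon=0.01, support=0.05))
```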
14 The Lossy Counting (LC) Algorithm for itemsets
- Transactions are kept in buckets
- Multiple (say m) buckets are processed at a time. The value m depends on the amount of memory available
- For each transaction E, essentially, every subset of E is enumerated and treated as if it were an item in the LC algorithm for items (see the sketch below)
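An illustrative sketch of the subset enumeration step (helper name is mine): every non-empty subset of a transaction is counted as if it were a single item:

```python
from itertools import combinations

def subsets(E):
    """Yield every non-empty subset of transaction E as a frozenset."""
    items = sorted(E)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

print([set(s) for s in subsets({"x", "y", "z"})])
# [{'x'}, {'y'}, {'z'}, {'x', 'y'}, {'x', 'z'}, {'y', 'z'}, {'x', 'y', 'z'}]
```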
15 Extending the LC Algorithm for sensor-set data
- We can extend the LC Algorithm to find approximate frequency counts of itemsets for SSD
- Instead of using a fixed-size bucket, whose size is determined by ε, we can use a bucket which can hold an arbitrary number of transactions
- During the i-th bucket, when a count is inserted into D, we set Δ ← εT1,i-1, where Ti,j denotes the total time elapsed from bucket i up to bucket j
- At the end of the i-th bucket, we prune D by removing the counts such that f + Δ ≤ εT1,i
16Extending the LC Algorithm for sensor-set data
D The set of all counts N Curr. len. of
stream E Transaction (of itemset) w Bucket
width b Current bucket id
- D ? ? N ? 0
- w ? (user defined value) b ? 1
- E ? next transaction N ? N 1
- foreach subset e of E
- if (e,f,?) exists in D do
- f ? f 1
- else do
- insert (e,1, ? T1,b-1) to D
- endif
- if N mod w 0 do
- prune(D, T1,b) b ? b 1
- endif
- Goto 3
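Under that reading, a Python sketch of the extended loop might look like the following; T(1, b) is taken as the time elapsed since the start of the history, the per-entry bookkeeping mirrors the item-level sketch earlier, and all names are mine rather than the author's:

```python
from itertools import chain, combinations

def subsets(E):
    """All non-empty subsets of transaction E, as frozensets."""
    items = sorted(E)
    return map(frozenset, chain.from_iterable(
        combinations(items, r) for r in range(1, len(items) + 1)))

def extended_lc(stream, epsilon, w, history_start):
    """Extended LC over (timestamp, itemset) pairs; a bucket holds w transactions."""
    D = {}                               # itemset -> (f, delta)
    N, b = 0, 1
    prev_bucket_end = history_start      # end time of bucket b-1, giving T(1, b-1)
    for t, E in stream:
        N += 1
        delta_new = epsilon * (prev_bucket_end - history_start)  # eps * T(1, b-1)
        for e in subsets(E):
            if e in D:
                f, d = D[e]
                D[e] = (f + 1, d)
            else:
                D[e] = (1, delta_new)
        if N % w == 0:                   # end of bucket b: prune, then advance
            threshold = epsilon * (t - history_start)             # eps * T(1, b)
            D = {e: (f, d) for e, (f, d) in D.items() if f + d > threshold}
            b += 1
            prev_bucket_end = t
    return D

# Example: a short stream of timestamped itemsets with a bucket width of 2
stream = [(1, {"a", "b"}), (3, {"a"}), (7, {"a", "c"}), (9, {"b", "c"})]
print(extended_lc(stream, epsilon=0.1, w=2, history_start=0))
```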
17 Observations
- The choice of w can affect the efficiency of the algorithm
- A small w may cause the pruning procedure to be invoked too frequently
- A big w may cause many transactions to be kept in memory
- It may be possible to derive a good w w.r.t. the mean lifespan of the transactions
- If the lifespans of the transactions are short, we potentially need to prune D frequently
- The difference between adjacent transactions may be small
18 Future work
- Evaluate the efficiency of the LC Algorithm for sensor-set data
- Investigate how to exploit the observation that adjacent transactions may be very similar
19 Q & A