1
Mining frequency counts from sensor set data
  • Loo Kin Kong
  • 25th June 2003

2
Outline
  • Motivation
  • Sensor set data
  • Finding frequency counts of itemsets from sensor
    set data
  • Future work

3
Stock quotes
  • Closing prices of some HK stocks

4
Stock quotes
  • Intra-day stock price of TVB (0511) on 23rd June
    2003 (Source: quamnet.com)

5
Motivation
  • Fluctuations in the price of one stock may be related to those of another stock, or to other conditions
  • Online analysis tools can help give more insight into such variations
  • The case of the stock market can be generalized...
  • We use sensors to monitor some conditions, for example:
  • We monitor the prices of stocks by getting quotations from a finance website
  • We monitor the weather by measuring temperature, humidity, air pressure, wind, etc.

6
Sensors
  • Properties of a sensor include:
  • A sensor reports values, either spontaneously or
    by request, reflecting the state of the condition
    being monitored
  • Once a sensor reports a value, the value remains
    valid until the sensor reports again
  • The lifespan of a value is defined as the length of time during which the value is valid
  • The value reported must be one of the possible
    states of the condition
  • The set of all possible states of a sensor is its
    state set

7
Sensor set data
  • A set of sensors (say, n of them) is called a
    sensor set
  • At any time, we can obtain an n-tuple, composed of the values of the n sensors and attached with a time stamp: (t, v1, v2, ..., vn), where t is the time when the n-tuple is obtained and vx is the value of the x-th sensor
  • If the n sensors have the same state set, we call
    the sensor set homogeneous

8
Mining association rules from sensor set data
  • An association rule is a rule, satisfying certain support and confidence restrictions, of the form X → Y, where X and Y are two disjoint itemsets
  • We redefine the support to reflect the time factor in sensor set data: supp(X) = lifespan(X) / length of history
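
A minimal Python sketch of this time-weighted support, assuming a time-sorted list of (timestamp, n-tuple) readings; the names time_weighted_support, holds, and end_of_history are illustrative, not from the slides:

def time_weighted_support(readings, holds, end_of_history):
    # supp(X) = lifespan(X) / length of history (this slide)
    lifespan = 0.0
    # Each reported n-tuple stays valid until the next report (slide 6).
    for (t, values), (t_next, _) in zip(readings, readings[1:] + [(end_of_history, None)]):
        if holds(values):          # itemset X holds in this n-tuple
            lifespan += t_next - t
    return lifespan / (end_of_history - readings[0][0])

# Example: fraction of time the first sensor reported "up".
readings = [(0, ("up", "flat")), (3, ("down", "flat")), (7, ("up", "up"))]
print(time_weighted_support(readings, lambda v: v[0] == "up", 10))  # 0.6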

9
Transformations of sensor-set data
  • The n-tuples need to be transformed before frequent itemsets can be found
  • Transformation 1
  • Each (zx, sy) pair, where zx is a sensor and sy a state for zx, is treated as an item in traditional association rule mining
  • Hence, the i-th n-tuple is transformed as (ti, (z1, v1), (z2, v2), ..., (zn, vn)), where ti is the timestamp of the i-th n-tuple
  • Thus, association rules of the form (z1, v1), (z2, v2), ..., (zj, vj) → (zx, vx) can be obtained

10
Transformations of sensor-set data
  • Transformation 2
  • Assuming a homogeneous sensor set, each state s in the state set is treated as an item in traditional association rule mining
  • The i-th n-tuple is transformed as (ti, (e1, s1), (e2, s2), ..., (em, sm)), where ti is the timestamp of the i-th n-tuple and ex is a boolean value showing whether the state sx exists in the tuple
  • Thus, association rules of the form s1, s2, ..., sj → sk can be obtained
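
To make the two transformations concrete, here is a small Python sketch; the sensor labels z1..zn and the sample values are invented for illustration:

def transform1(t, values):
    # Transformation 1: each (sensor, value) pair becomes one item.
    return (t, [("z%d" % (x + 1), v) for x, v in enumerate(values)])

def transform2(t, values, state_set):
    # Transformation 2 (homogeneous sensor set): each state becomes an
    # item, flagged by a boolean saying whether it occurs in the tuple.
    present = set(values)
    return (t, [(s in present, s) for s in state_set])

t, tup = 0, ("up", "flat", "up")
print(transform1(t, tup))
# (0, [('z1', 'up'), ('z2', 'flat'), ('z3', 'up')])
print(transform2(t, tup, ["up", "down", "flat"]))
# (0, [(True, 'up'), (False, 'down'), (True, 'flat')])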

11
The Lossy Counting (LC) Algorithm for items
  • User specifies the support threshold s and error tolerance ε
  • Transactions of single items are conceptually kept in buckets of size ⌈1/ε⌉
  • At the end of each bucket, counts smaller than the error tolerance are discarded
  • Counts of items, kept in a data structure D, are of the form (e, f, Δ), where
  • e is the item
  • f is the frequency of e since the entry was inserted into D
  • Δ is the maximum possible count of e before the entry was added to D

12
The Lossy Counting (LC) Algorithm for items
D: the set of all counts
N: current length of stream
e: transaction (of item)
w: bucket width
b: current bucket id
  1. D ← ∅; N ← 0
  2. w ← ⌈1/ε⌉; b ← 1
  3. e ← next transaction; N ← N + 1
  4. if (e, f, Δ) exists in D do
  5.     f ← f + 1
  6. else do
  7.     insert (e, 1, b − 1) into D
  8. endif
  9. if N mod w = 0 do
  10.    prune(D, b); b ← b + 1
  11. endif
  12. Goto 3

13
The Lossy Counting (LC) Algorithm for items
  • function prune(D, b)
  • for each entry (e, f, Δ) in D do
  • if f + Δ ≤ b do
  • remove the entry from D
  • endif
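
The pseudocode on the last two slides translates directly into a runnable sketch; Python and the iterable stream interface are assumptions for illustration, not the authors' implementation:

import math

def lossy_count(stream, epsilon):
    D = {}                       # e -> [f, delta]
    w = math.ceil(1 / epsilon)   # bucket width ⌈1/ε⌉
    b, N = 1, 0                  # current bucket id, stream length
    for e in stream:
        N += 1
        if e in D:
            D[e][0] += 1         # f <- f + 1
        else:
            D[e] = [1, b - 1]    # insert (e, 1, b - 1) into D
        if N % w == 0:           # end of bucket: prune(D, b)
            for item, (f, delta) in list(D.items()):
                if f + delta <= b:
                    del D[item]
            b += 1
    return D, N

# Items with true frequency at least s*N are guaranteed to be reported
# when the kept counts are thresholded at (s - epsilon) * N.
D, N = lossy_count("abracadabra" * 100, epsilon=0.01)
print({e for e, (f, _) in D.items() if f >= (0.1 - 0.01) * N})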

14
The Lossy Counting (LC) Algorithm for itemsets
  • Transactions are kept in buckets
  • Multiple buckets (say m of them) are processed at a time; the value of m depends on the amount of memory available
  • For each transaction E, essentially every subset of E is enumerated and treated as if it were an item in the LC algorithm for items
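
A sketch of the subset-enumeration step in Python (the buffering of m buckets is omitted for brevity):

from itertools import combinations

def nonempty_subsets(E):
    # Every non-empty subset of transaction E is counted as if it were
    # a single item; note this is exponential in |E|.
    items = sorted(E)
    for r in range(1, len(items) + 1):
        yield from combinations(items, r)

print(list(nonempty_subsets({"x", "y"})))  # [('x',), ('y',), ('x', 'y')]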

15
Extending the LC Algorithm for sensor-set data
  • We can extend the LC Algorithm to find approximate frequency counts of itemsets in sensor-set data (SSD)
  • Instead of using a fixed-size bucket, whose size is determined by ε, we can use a bucket that holds an arbitrary number of transactions
  • During the i-th bucket, when a count is inserted into D, we set Δ ← T1,i-1, where Ti,j denotes the total time elapsed from bucket i up to bucket j
  • At the end of the i-th bucket, we prune D by removing the counts such that f + Δ ≤ ε · T1,i

16
Extending the LC Algorithm for sensor-set data
D: the set of all counts
N: current length of stream
E: transaction (of itemset)
w: bucket width
b: current bucket id
  1. D ← ∅; N ← 0
  2. w ← (user-defined value); b ← 1
  3. E ← next transaction; N ← N + 1
  4. foreach subset e of E
  5.     if (e, f, Δ) exists in D do
  6.         f ← f + 1
  7.     else do
  8.         insert (e, 1, T1,b-1) into D
  9.     endif
  10. if N mod w = 0 do
  11.    prune(D, T1,b); b ← b + 1
  12. endif
  13. Goto 3
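
A hedged Python sketch of this extension follows. One interpretive choice: the pseudocode increments f by 1 per transaction, but to stay consistent with the lifespan-based support of slide 8, this sketch adds each transaction's lifespan to f instead; that weighting, and the interface of parallel transaction and lifespan lists, are assumptions rather than anything stated on the slides:

from itertools import combinations

def lossy_count_ssd(transactions, lifespans, epsilon, w):
    D = {}               # subset -> [f, delta]
    b, N = 1, 0
    elapsed = 0.0        # running T1,b
    elapsed_prev = 0.0   # T1,b-1, frozen at each bucket boundary
    for E, life in zip(transactions, lifespans):
        N += 1
        elapsed += life
        items = sorted(E)
        for r in range(1, len(items) + 1):       # every subset of E
            for e in combinations(items, r):
                if e in D:
                    D[e][0] += life              # lifespan-weighted count (assumption)
                else:
                    D[e] = [life, elapsed_prev]  # delta <- T1,b-1
        if N % w == 0:                           # end of bucket: prune(D, T1,b)
            for e, (f, delta) in list(D.items()):
                if f + delta <= epsilon * elapsed:
                    del D[e]
            b += 1
            elapsed_prev = elapsed
    return D

As the observations on the next slide note, the choice of w trades how often this pruning loop runs against how many counts accumulate in D between prunes.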

17
Observations
  • The choice of w can affect the efficiency of the algorithm
  • A small w may cause the pruning procedure to be invoked too frequently
  • A big w may cause many transactions to be kept in memory
  • It may be possible to derive a good w w.r.t. the mean lifespan of the transactions
  • If the lifespans of the transactions are short, we potentially need to prune D frequently
  • The difference between adjacent transactions may be small

18
Future work
  • Evaluate the efficiency of the LC Algorithm for
    sensor-set data
  • Investigate how to exploit the observation that
    adjacent transactions may be very similar

19
Q & A