Title: Mining frequency counts from sensor set data
1 Mining frequency counts from sensor set data
- Loo Kin Kong
- 25th June 2003
2 Outline
- Motivation
- Sensor set data
- Finding frequency counts of itemsets from sensor set data
- Future work
3 Stock quotes
- Closing prices of some HK stocks
4 Stock quotes
- Intra-day stock price of TVB (0511) on 23rd June 2003 (Source: quamnet.com)
5 Motivation
- Fluctuation of the price of a stock may be related to that of another stock or to other conditions
- Online analysis tools can help to give more insight into such variations
- The case of the stock market can be generalized...
- We use sensors to monitor some conditions, for example
- We monitor the prices of stocks by getting quotations from a finance website
- We monitor the weather by measuring temperature, humidity, air pressure, wind, etc.
6 Sensors
- Properties of a sensor include
- A sensor reports values, either spontaneously or by request, reflecting the state of the condition being monitored
- Once a sensor reports a value, the value remains valid until the sensor reports again
- The lifespan of a value is defined as the length of time for which the value remains valid
- The value reported must be one of the possible states of the condition
- The set of all possible states of a sensor is its state set
7 Sensor set data
- A set of sensors (say, n of them) is called a sensor set
- At any time, we can obtain an n-tuple composed of the values of the n sensors, attached with a time stamp: (t, v1, v2, ..., vn), where t is the time when the n-tuple is obtained and vx is the value of the x-th sensor
- If the n sensors have the same state set, we call the sensor set homogeneous
8 Mining association rules from sensor set data
- An association rule is a rule, satisfying certain support and confidence restrictions, of the form X → Y, where X and Y are two disjoint itemsets
- We redefine the support to reflect the time factor in sensor set data: supp(X) = lifespan(X) / length of history (see the sketch below)
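A minimal sketch of this time-weighted support in Python, assuming the history is a list of timestamped tuples already encoded as itemsets; the function and variable names are illustrative, not from the original work:

```python
# Time-weighted support: supp(X) = lifespan(X) / length of history.
# "stream" is a list of (timestamp, itemset) pairs sorted by timestamp; a tuple's
# values stay valid until the next tuple is reported (or the end of the history).

def lifespan_support(stream, X, end_of_history):
    lifespan = 0.0
    for (t, items), (t_next, _) in zip(stream, stream[1:] + [(end_of_history, None)]):
        if X <= items:                 # itemset X is contained in this tuple
            lifespan += t_next - t     # add the time this tuple remains valid
    return lifespan / (end_of_history - stream[0][0])

# Example with two sensors whose states are encoded as (sensor, state) items
stream = [
    (0, {("z1", "up"), ("z2", "up")}),
    (5, {("z1", "up"), ("z2", "down")}),
    (8, {("z1", "down"), ("z2", "down")}),
]
print(lifespan_support(stream, {("z1", "up")}, end_of_history=10))  # 0.8
```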
9 Transformations of sensor-set data
- The n-tuples need transformation for finding frequent itemsets
- Transformation 1
- Each (zx, sy) pair, where zx is a sensor and sy a state for zx, is treated as an item in traditional association rule mining
- Hence, the i-th n-tuple is transformed as (ti, (z1, v1), (z2, v2), ..., (zn, vn)), where ti is the timestamp of the i-th n-tuple (sketched below)
- Thus, association rules of the form (z1, v1), (z2, v2), ..., (zn, vn) → (zx, vx) can be obtained
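As an illustrative sketch (names are my own), Transformation 1 simply pairs each sensor with its reported value:

```python
# Transformation 1: each (sensor, state) pair becomes an item, so a timestamped
# n-tuple maps directly to a timestamped transaction of n items.

def transform1(timestamp, sensors, values):
    return timestamp, {(z, v) for z, v in zip(sensors, values)}

print(transform1(42, ["z1", "z2", "z3"], ["up", "down", "up"]))
# e.g. (42, {('z1', 'up'), ('z2', 'down'), ('z3', 'up')})  (set order may vary)
```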
10 Transformations of sensor-set data
- Transformation 2
- Assuming a homogeneous sensor set, each s in the state set is treated as an item in traditional association rule mining
- The i-th n-tuple is transformed as (ti, (e1, s1), (e2, s2), ..., (em, sm)), where ti is the timestamp of the i-th n-tuple and ex is a boolean value showing whether the state sx exists in the tuple (sketched below)
- Thus, association rules of the form s1, s2, ..., sj → sk can be obtained
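A corresponding sketch for Transformation 2 (again with illustrative names), assuming every sensor draws its value from the same state set:

```python
# Transformation 2: for a homogeneous sensor set, each possible state becomes an
# item, flagged by whether any sensor currently reports it.

def transform2(timestamp, values, state_set):
    reported = set(values)
    return timestamp, [(s in reported, s) for s in state_set]

print(transform2(42, ["up", "up", "down"], ["up", "down", "steady"]))
# (42, [(True, 'up'), (True, 'down'), (False, 'steady')])
```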
11 The Lossy Counting (LC) Algorithm for items
- User specifies the support threshold s and error tolerance ε
- Transactions of single items are conceptually kept in buckets of size ⌈1/ε⌉
- At the end of each bucket, counts smaller than the error tolerance are discarded
- Counts of items, kept in a data structure D, are of the form (e, f, Δ), where
- e is the item
- f is the frequency of e since the entry was inserted into D
- Δ is the maximum possible count of e before the entry was added to D
12 The Lossy Counting (LC) Algorithm for items
D: the set of all counts; N: current length of the stream; e: transaction (of an item); w: bucket width; b: current bucket id
1. D ← ∅; N ← 0
2. w ← ⌈1/ε⌉; b ← 1
3. e ← next transaction; N ← N + 1
4. if (e, f, Δ) exists in D do
     f ← f + 1
   else do
     insert (e, 1, b-1) into D
   endif
5. if N mod w = 0 do
     prune(D, b); b ← b + 1
   endif
6. Goto 3
13 The Lossy Counting (LC) Algorithm for items
function prune(D, b)
  for each entry (e, f, Δ) in D do
    if f + Δ ≤ b do
      remove the entry from D
    endif
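Putting slides 12 and 13 together, a runnable Python sketch of the item-level algorithm might look as follows; the dictionary layout and names are mine, but the logic follows the pseudocode above:

```python
import math

def lossy_count(stream, epsilon, support):
    """Lossy Counting over a stream of single items, per the pseudocode above."""
    D = {}                              # e -> (f, delta)
    N = 0
    w = math.ceil(1 / epsilon)          # bucket width  w = ceil(1/epsilon)
    b = 1                               # current bucket id
    for e in stream:
        N += 1
        if e in D:
            f, delta = D[e]
            D[e] = (f + 1, delta)
        else:
            D[e] = (1, b - 1)           # delta = b-1 bounds the count missed so far
        if N % w == 0:                  # end of a bucket: prune small counts
            D = {e: (f, d) for e, (f, d) in D.items() if f + d > b}
            b += 1
    # report items whose estimated count clears the (support - epsilon) threshold
    return {e: f for e, (f, d) in D.items() if f >= (support - epsilon) * N}

print(lossy_count("aababcabcd" * 100, epsilon=0.01, support=0.05))
```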
14 The Lossy Counting (LC) Algorithm for itemsets
- Transactions are kept in buckets
- Multiple (say m) buckets are processed at a time. The value m depends on the amount of memory available
- For each transaction E, essentially, every subset of E is enumerated and treated as if it were an item in the LC algorithm for items (see the sketch below)
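An illustrative sketch of the subset enumeration step (helper name is mine): every non-empty subset of a transaction is counted as if it were a single item:

```python
from itertools import combinations

def subsets(E):
    """Yield every non-empty subset of transaction E as a frozenset."""
    items = sorted(E)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

print([set(s) for s in subsets({"x", "y", "z"})])
# [{'x'}, {'y'}, {'z'}, {'x', 'y'}, {'x', 'z'}, {'y', 'z'}, {'x', 'y', 'z'}]
```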
15 Extending the LC Algorithm for sensor-set data
- We can extend the LC Algorithm to find approximate frequency counts of itemsets for SSD
- Instead of using a fixed-size bucket, whose size is determined by ε, we can use a bucket which can hold an arbitrary number of transactions
- During the i-th bucket, when a count is inserted into D, we set Δ ← εT1,i-1, where Ti,j denotes the total time elapsed from bucket i up to bucket j
- At the end of the i-th bucket, we prune D by removing the counts such that f + Δ ≤ εT1,i
16Extending the LC Algorithm for sensor-set data
D The set of all counts N Curr. len. of
stream E Transaction (of itemset) w Bucket
width b Current bucket id
- D ? ? N ? 0
- w ? (user defined value) b ? 1
- E ? next transaction N ? N 1
- foreach subset e of E
- if (e,f,?) exists in D do
- f ? f 1
- else do
- insert (e,1, ? T1,b-1) to D
- endif
- if N mod w 0 do
- prune(D, T1,b) b ? b 1
- endif
- Goto 3
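Under that reading, a Python sketch of the extended loop might look like the following; T(1, b) is taken as the time elapsed since the start of the history, the per-entry bookkeeping mirrors the item-level sketch earlier, and all names are mine rather than the author's:

```python
from itertools import chain, combinations

def subsets(E):
    """All non-empty subsets of transaction E, as frozensets."""
    items = sorted(E)
    return map(frozenset, chain.from_iterable(
        combinations(items, r) for r in range(1, len(items) + 1)))

def extended_lc(stream, epsilon, w, history_start):
    """Extended LC over (timestamp, itemset) pairs; a bucket holds w transactions."""
    D = {}                               # itemset -> (f, delta)
    N, b = 0, 1
    prev_bucket_end = history_start      # end time of bucket b-1, giving T(1, b-1)
    for t, E in stream:
        N += 1
        delta_new = epsilon * (prev_bucket_end - history_start)  # eps * T(1, b-1)
        for e in subsets(E):
            if e in D:
                f, d = D[e]
                D[e] = (f + 1, d)
            else:
                D[e] = (1, delta_new)
        if N % w == 0:                   # end of bucket b: prune, then advance
            threshold = epsilon * (t - history_start)             # eps * T(1, b)
            D = {e: (f, d) for e, (f, d) in D.items() if f + d > threshold}
            b += 1
            prev_bucket_end = t
    return D

# Example: a short stream of timestamped itemsets with a bucket width of 2
stream = [(1, {"a", "b"}), (3, {"a"}), (7, {"a", "c"}), (9, {"b", "c"})]
print(extended_lc(stream, epsilon=0.1, w=2, history_start=0))
```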
17 Observations
- The choice of w can affect the efficiency of the algorithm
- A small w may cause the pruning procedure to be invoked too frequently
- A big w may cause many transactions to be kept in memory
- It may be possible to derive a good w w.r.t. the mean lifespan of the transactions
- If the lifespans of the transactions are short, we potentially need to prune D frequently
- The difference between adjacent transactions may be small
18 Future work
- Evaluate the efficiency of the LC Algorithm for sensor-set data
- Investigate how to exploit the observation that adjacent transactions may be very similar
19 Q & A