Analytical Data Mining for Stream Data Analysis

About This Presentation

Slides: 21

Provided by: alfaDi

Transcript and Presenter's Notes

1
Department of Informatics, University of Minho
Braga, 22 February 2006
Analytical Data Mining for Stream Data Analysis
Ronnie Alves, Orlando Belo
{ronnie,obelo}@di.uminho.pt
http://alfa.di.uminho.pt/ronnie
Department of Informatics, University of Minho, PORTUGAL
2
outline

motivation
analytical data mining
cube -> lattice of cuboids
main issues
first efforts (in 2005)
current work
final discussion
research agenda

3
motivation

emerging applications such as sensor networks,
telecommunications, web, power supply, and stock
exchange have to handle various data streams
data stream characteristics: continuous, ordered,
changing, fast, huge amount

4
motivation

most stream data are at a fairly low level or
multi-dimensional in nature, and need multi-level
and multi-dimensional processing
analysts want to see changes, trends, and unusual
patterns at reasonable levels of detail

5
motivation

stream data analysis
query approximations
data mining
on-line analytical processing (cube-based)
keywords
multi-dimensional, trends, unusual patterns

6
analytical data mining

analytical data mining combines ideas from
cube-based algorithms with data mining functions
we want to provide a set of analytical data
mining methods to reveal exceptional and trend
patterns over data streams
cubing while mining or mining while cubing

7
cube -> lattice of cuboids

0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
4-D (base) cuboid: time,item,location,supplier
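
The lattice on this slide can be enumerated mechanically: a cuboid is one subset of the dimensions, so four dimensions give 2^4 = 16 cuboids. The following is a minimal Python sketch of that enumeration; the function name `cuboid_lattice` is an illustrative assumption, and only the four dimension names come from the slide.

```python
from itertools import combinations

# Dimension names taken from the slide.
DIMENSIONS = ("time", "item", "location", "supplier")


def cuboid_lattice(dimensions):
    """Return a dict mapping level k to the list of k-D cuboids.

    Each cuboid is the tuple of dimensions it groups by; the empty
    tuple is the apex cuboid ("all").
    """
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }


lattice = cuboid_lattice(DIMENSIONS)
for level, cuboids in lattice.items():
    names = [",".join(c) if c else "all" for c in cuboids]
    print(f"{level}-D cuboids ({len(cuboids)}): {names}")

# One cuboid per subset of the dimensions.
total = sum(len(cuboids) for cuboids in lattice.values())
print("total cuboids:", total)
```

Each additional dimension doubles the number of cuboids, which is one way to read the "curse of dimensionality" raised under the main issues.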
8
main issues

(issue 1) given such characteristics of stream
data, is it feasible to compute such a data cube,
since its size is usually much bigger than the
original data set, and its construction may take
multiple database scans? (curse of
dimensionality)
Online Analytical Processing Stream Data: Is It
Feasible? (DMKD02)
(issue 2) how to detect abnormal changes of
cuboids cells, since on-line mining of the
changes is one of the core issues in stream data
analysis?

9
main issues

compared to the history
(issue 3) what are the distinct features of the
current status?
(issue 4) what are the relatively stable factors
over time?
on-line mining of changes (SIGMOD03)

10
first efforts (in 2005)

itemset mining is a core problem in many data
mining tasks
it can be used as a building block for more
complex data mining processes and also for
computing data cubes

11
first efforts (in 2005)

pattern-growth SQL extensions (one dimension)
(EPIA05)
inter-transactional rules (two dimensions,
distance measures) (JISBD05)
cube-based mining method for multi-dimensional
association rules (n dimensions, incremental and
multi-level) (miuda project)
industrial projects in telecommunications and
retail (real testbed)

12
current work

iceberg data cubing: computing only cuboids cells
above a minimum support threshold (curse of
dimensionality remains)
closed data cubing: computing cuboids cells
consisting of only closed cells (on dense
databases, cuboids will be too large)
maximal data cubing: computing cuboids cells
consisting of only maximal cells (pure maximal,
loses aggregate info)
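
A minimal Python sketch of the iceberg condition, assuming count as the measure. The five-row table, the `*` wildcard for an aggregated dimension, and the function names `full_cube` and `iceberg_cube` are illustrative assumptions, not taken from the presentation. The brute-force enumeration only shows which cells the threshold keeps; it is not the authors' algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical base table with four dimensions (A, B, C, D).
ROWS = [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b1", "c1", "d2"),
    ("a1", "b1", "c1", "d3"),
    ("a1", "b3", "c3", "d5"),
    ("a2", "b2", "c2", "d4"),
]


def full_cube(rows):
    """Count every cell of every cuboid; '*' marks an aggregated dimension."""
    n = len(rows[0])
    counts = Counter()
    for row in rows:
        # Each row contributes to one cell in each of the 2^n cuboids.
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                cell = tuple(row[i] if i in kept else "*" for i in range(n))
                counts[cell] += 1
    return counts


def iceberg_cube(rows, min_count):
    """Keep only the cells whose count reaches the minimum support."""
    return {cell: c for cell, c in full_cube(rows).items() if c >= min_count}


cube = full_cube(ROWS)
iceberg = iceberg_cube(ROWS, min_count=2)
print(len(cube), "cells in the full cube")
print(len(iceberg), "cells in the iceberg cube")
for cell, count in sorted(iceberg.items()):
    print(cell, count)
```

Even on five rows the full cube holds many more cells than the base table has tuples, while the threshold keeps only a handful of them.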

13
current work

real data applications have dense and correlated
databases
can we develop an algorithm which captures
maximal correlated cuboids cells on dense
databases?
we propose m3c-cubing

14
current work

Let the measure be count, the iceberg condition be count >= 2,
and the correlated value threshold be 3CV >= 0.85.
Then c1 = (a1, b1, c1, * : 3) and c2 = (a1, *, *, * : 4) are closed cells;
c1 is a maximal cell;
c3 = (a1, b1, *, * : 3) and c4 = (*, b1, c1, * : 3) are covered by c1,
but c4 has a correlated exception (3CV = 1);
c5 = (a2, b2, c2, d4 : 1) does not satisfy the iceberg constraint.
Therefore, c1 and c4 are maximal correlated cuboids cells:

c1 = (a1, b1, c1, * : 3)
c4 = (*, b1, c1, * : 3)
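
The closed and maximal tests in the example above can be checked with a small Python sketch. It assumes the usual definitions by analogy with closed and maximal frequent itemsets: a cell is closed if no more specific cell has the same count, and maximal if no more specific cell satisfies the iceberg condition. The base table is hypothetical, chosen only so that the counts of the four cells named in the example come out the same; the 3CV measure is not computed because the slides do not define it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical base table (dimensions A, B, C, D), not from the slides.
ROWS = [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b1", "c1", "d2"),
    ("a1", "b1", "c1", "d3"),
    ("a1", "b3", "c3", "d5"),
    ("a2", "b2", "c2", "d4"),
]
MIN_COUNT = 2  # iceberg condition: count >= 2


def cube_counts(rows):
    """Count every cell of every cuboid; '*' marks an aggregated dimension."""
    n = len(rows[0])
    counts = Counter()
    for row in rows:
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                counts[tuple(row[i] if i in kept else "*" for i in range(n))] += 1
    return counts


def is_descendant(child, parent):
    """True if child is strictly more specific than parent."""
    return child != parent and all(
        p == "*" or p == c for c, p in zip(child, parent)
    )


counts = cube_counts(ROWS)
iceberg = {cell: n for cell, n in counts.items() if n >= MIN_COUNT}

# Closed: no more specific cell carries the same count.
closed = {
    cell
    for cell, n in iceberg.items()
    if not any(is_descendant(d, cell) and m == n for d, m in counts.items())
}

# Maximal: no more specific cell passes the iceberg condition.
maximal = {
    cell
    for cell in iceberg
    if not any(is_descendant(d, cell) for d in iceberg)
}

print("closed: ", sorted(closed))
print("maximal:", sorted(maximal))
```

On this table (a1, b1, *, *) and (*, b1, c1, *) are not closed because (a1, b1, c1, *) is more specific and has the same count, which is what "covered by c1" expresses.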
15
current work

we provide an interestingness measure which discloses
true correlation (also dependence) relationships
among cuboids cells (inspired by all_confidence)
the computation of cuboids is guided by an m3cTree
(based on a set-enumeration tree)
the m3cTree is traversed in pure
depth-first order
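
For reference, all_confidence, the measure the slide cites as inspiration, divides the support of a pattern by the largest support of any single item in it. The Python sketch below applies that standard definition to cube cells, treating each non-`*` dimension value as an item. It is not the authors' own measure, whose formula the slides do not give, and the base table and function names are illustrative assumptions.

```python
# Hypothetical base table (dimensions A, B, C, D), not from the slides.
ROWS = [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b1", "c1", "d2"),
    ("a1", "b1", "c1", "d3"),
    ("a1", "b3", "c3", "d5"),
    ("a2", "b2", "c2", "d4"),
]


def support(rows, cell):
    """Number of rows matching the cell; '*' matches any value."""
    return sum(
        all(v == "*" or v == r for r, v in zip(row, cell)) for row in rows
    )


def all_confidence(rows, cell):
    """support(cell) / max support of any single dimension value in the cell."""
    n = len(cell)
    # One single-value cell per instantiated dimension of the input cell.
    singles = [
        tuple(v if i == j else "*" for j in range(n))
        for i, v in enumerate(cell)
        if v != "*"
    ]
    if not singles:
        raise ValueError("the apex cell has no dimension values")
    return support(rows, cell) / max(support(rows, s) for s in singles)


for cell in [("a1", "b1", "c1", "*"), ("*", "b1", "c1", "*")]:
    print(cell, round(all_confidence(ROWS, cell), 2))
```

On this table (*, b1, c1, *) scores 1.0 because b1 and c1 always occur together, while (a1, b1, c1, *) scores 0.75 because a1 also occurs once without them.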

16
current work

several pruning strategies are proposed for
reducing the search space
data cube computing is optimized by performing
shared partitions and caching intermediate
aggregations
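
A depth-first walk of a set-enumeration tree with support pruning can be sketched in Python as follows. This is a generic illustration under stated assumptions, not the m3cTree itself, whose details the slides do not give: the base table and the function name `dfs_iceberg_cells` are hypothetical. The only pruning rule shown is the count threshold, and each child reuses the partition of rows already selected by its parent.

```python
# Hypothetical base table (dimensions A, B, C, D), not from the slides.
ROWS = [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b1", "c1", "d2"),
    ("a1", "b1", "c1", "d3"),
    ("a1", "b3", "c3", "d5"),
    ("a2", "b2", "c2", "d4"),
]


def dfs_iceberg_cells(rows, min_count):
    """Visit iceberg cells depth-first; return {cell: count} in visiting order."""
    n = len(rows[0])
    visited = {}

    def expand(cell, next_dim, partition):
        visited[cell] = len(partition)
        # Children instantiate one more dimension, always to the right of
        # the last one fixed, so every cell is generated exactly once.
        for dim in range(next_dim, n):
            groups = {}
            for row in partition:
                groups.setdefault(row[dim], []).append(row)
            for value in sorted(groups):
                if len(groups[value]) < min_count:
                    continue  # prune: no descendant can reach min_count
                child = cell[:dim] + (value,) + cell[dim + 1:]
                expand(child, dim + 1, groups[value])

    if len(rows) >= min_count:
        expand(("*",) * n, 0, list(rows))
    return visited


visited = dfs_iceberg_cells(ROWS, min_count=2)
for cell, count in visited.items():
    print(cell, count)
```

Because count can only shrink as more dimensions are fixed, a pruned branch cannot contain any cell that satisfies the threshold, so the walk returns the same cells as a full enumeration followed by filtering.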

17
final discussion

cubing tradeoff between size complexity and
efficient computation
high performance data cube computing is critical
to analytical data mining

18
final discussion

the challenge could be how to share computation
and explore optimization
further studies must deal with statistical
aspects, proper constraints, data mining and data
cubing functions, and the tilted time window frame

19
research agenda
[Figure: Gantt chart of the research agenda by quarter (1st to 4th), 2005 to 2008, with activities marked 1 to 6; legend: past, future]

activities
area background
cube-based mining
exceptional patterns
on-line changes
analytical data mining
thesis writing