Title: Multidimensional Databases
Multidimensional Databases
- Challenge: representation for efficient storage, indexing, and querying
- Examples (time-series, images)
- New multidimensional data sets and approaches
  - Graphs (e.g., road networks)
  - Immersidata (e.g., haptic)
  - User profiles (aggregation/clustering)
Challenges
- Storing multidimensional data (matrix vs. relations)
- Indexing multidimensional data (R-tree)
- Queries
  - Search for similar objects, i.e., similarity search (ICDE00, ICME00)
  - Spatial and temporal queries (IDEAS00, ACM-GIS01, KAIS02)
- Multidimensional data mining
  - Aggregation (EDBT02, PODS02)
  - Clustering (ACM-MMj02)
  - Classification (INFORMS02)
  - Finding outliers (SSDBM01)
Stock Prices
[Figure: stock price time series S1, ..., Sn]
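A minimal sketch of similarity search over price series such as S1, ..., Sn, assuming equal-length series compared by Euclidean distance with a full linear scan. The slides do not name the distance measure or the index; a multidimensional index such as the R-tree listed under Challenges would normally replace the scan. The function names and toy prices are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query, series, k=1):
    """Names of the k series closest to `query`, found by a full linear scan."""
    ranked = sorted(series, key=lambda name: euclidean(query, series[name]))
    return ranked[:k]


# Hypothetical toy closing prices, not real data.
prices = {
    "S1": [10.0, 10.5, 11.0, 11.2],
    "S2": [20.0, 19.0, 18.5, 18.0],
    "S3": [10.1, 10.4, 11.1, 11.0],
}
print(most_similar([10.0, 10.6, 11.0, 11.1], prices, k=2))  # ['S1', 'S3']
```

The scan costs one distance computation per stored series, which is what indexing is meant to avoid as the number of series grows.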
More Similarity Search and Clustering
[Figure: similarity search and clustering of shapes (ICDE99, ICME00)]
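The slides do not say which clustering algorithm is applied to shapes. As a generic stand-in, the sketch below runs plain k-means (Lloyd's algorithm), assuming each shape has already been reduced to a fixed-length feature vector; `kmeans` and the feature values are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) on fixed-length feature vectors."""
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


# Hypothetical 2-D shape descriptors (two made-up features per shape).
shapes = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
labels, centroids = kmeans(shapes, k=2)
print(labels)  # [0, 1, 0, 1]
```

Initializing from the first k points keeps the output deterministic, at the cost of making the result sensitive to input order.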
On-Line Analytical Processing (OLAP)
- Multidimensional data sets
  - Dimension attributes (e.g., Store, Product, Date)
  - Measure attributes (e.g., Sale, Price)
- Range-sum queries
  - Average sale of shoes in CA in 2001
  - Number of jackets sold in Seattle in Sep. 2001
- Tougher queries
  - Covariance of sale and price of jackets in CA in 2001 (correlation)
  - Variance of price of jackets in 2001 in Seattle

Market-Relation
Store Location | Product | Date    | Sale   | Price
LA             | Shoes   | Jan. 01 | 21,500 | 85.99
NY             | Jacket  | June 01 | 28,700 | 45.99
...            | ...     | ...     | ...    | ...
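Evaluating the first range-sum query as a relational plan over Market-Relation is too slow:
$$\mathrm{Avg}_{\mathit{sale}}\Big(\sigma_{d \in 2001}\big(\sigma_{s \in \mathit{CA}}(\sigma_{p = \mathit{shoe}}(\text{Market-Relation}))\big)\Big)$$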
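The sketch below illustrates why the "tougher" queries are still range-sum problems: over any selected range, the average, variance, and covariance can all be derived from a handful of sums (count, sum of sale, sum of price, sum of sale squared, sum of price squared, sum of sale times price). It still answers by scanning the relation, which is the slow plan above; a pre-computed structure returning these sums quickly would avoid the scan. The column names beyond sale and price, and all rows, are hypothetical; variance and covariance are the population (divide-by-n) versions.

```python
def range_sums(rows, predicate):
    """Single scan collecting the sums that avg, variance and covariance need."""
    n = s_sum = p_sum = ss_sum = pp_sum = sp_sum = 0.0
    for row in rows:
        if predicate(row):
            s, p = row["sale"], row["price"]
            n += 1
            s_sum += s
            p_sum += p
            ss_sum += s * s
            pp_sum += p * p
            sp_sum += s * p
    return n, s_sum, p_sum, ss_sum, pp_sum, sp_sum


def range_stats(rows, predicate):
    """Derive range aggregates from the sums (population variance/covariance)."""
    n, s_sum, p_sum, ss_sum, pp_sum, sp_sum = range_sums(rows, predicate)
    if n == 0:
        return None  # empty range: nothing to aggregate
    mean_s, mean_p = s_sum / n, p_sum / n
    return {
        "avg_sale": mean_s,
        "var_sale": ss_sum / n - mean_s ** 2,
        "var_price": pp_sum / n - mean_p ** 2,
        "cov_sale_price": sp_sum / n - mean_s * mean_p,
    }


# Hypothetical rows; "city", "product" and "year" are illustrative columns.
rows = [
    {"city": "Seattle", "product": "jacket", "year": 2001, "sale": 10.0, "price": 2.0},
    {"city": "Seattle", "product": "jacket", "year": 2001, "sale": 20.0, "price": 4.0},
    {"city": "LA", "product": "shoes", "year": 2001, "sale": 30.0, "price": 6.0},
]


def seattle_jackets_2001(row):
    return row["city"] == "Seattle" and row["product"] == "jacket" and row["year"] == 2001


print(range_stats(rows, seattle_jackets_2001))
# {'avg_sale': 15.0, 'var_sale': 25.0, 'var_price': 1.0, 'cov_sale_price': 5.0}
```

Correlation follows from the same values as cov / sqrt(var_sale * var_price).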
Example Solution (Pre-computation): Prefix-Sum
Agrawal et al., 1997
[Figure: prefix-sum array over Age (x-axis) and Salary (y-axis); the result of a range query is combined from four pre-computed prefix sums, labeled I, II, III, and IV]
- Issues
  - The measure attribute must be pre-selected
  - The aggregation function must be pre-selected (sum or count)
  - Updates are expensive (they require re-computation)
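A minimal 2-D sketch of the prefix-sum idea, assuming the measure has already been aggregated into a grid of age buckets (rows) by salary buckets (columns); the bucket layout and values are hypothetical. Any axis-aligned range sum then costs four lookups combined by inclusion-exclusion, regardless of how large the range is.

```python
def build_prefix(grid):
    """P[i][j] = sum of grid[0..i-1][0..j-1]; an extra zero row/column avoids edge cases."""
    rows, cols = len(grid), len(grid[0])
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = grid[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def range_sum(P, r1, c1, r2, c2):
    """Sum of grid[r1..r2][c1..c2] (inclusive) from four prefix-sum lookups."""
    return P[r2 + 1][c2 + 1] - P[r1][c2 + 1] - P[r2 + 1][c1] + P[r1][c1]


# Hypothetical counts: rows = age buckets, columns = salary buckets.
grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
P = build_prefix(grid)
print(range_sum(P, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

The sketch also makes the listed issues concrete: P stores sums of one pre-selected measure, and changing grid[0][0] alters every non-border entry of P, so a single update can force re-computing the whole array.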
Spatial and Temporal Data: Complex Queries (ACM-GIS01, VLDB01)
- Data types
  - A point: <latitude, longitude, altitude> or <x, y, z>
  - A line-segment: <x1, y1, x2, y2>
  - A line: a sequence of line-segments
  - A region: a closed set of lines
  - Moving point: <x, y, t> (e.g., car, train, ...)
  - Changing region: <region, value, t> (e.g., changing temperature of a county)
- Queries
  - Rivers <intersect> Countries
  - Hospitals <in> Cities
  - Taxi <within> 5km of Home <in the next> 10 min
  - Experiments <overlap> BrainR (Visual99)
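The query "Taxi <within> 5km of Home <in the next> 10 min" has a closed-form answer if, as an assumption not stated in the slides, the taxi moves in a straight line at constant velocity in planar coordinates (km and minutes here). The squared distance to Home is a quadratic in t, minimized at t* = -(d0 · v) / (v · v), where d0 is the taxi's current offset from Home; clamping t* to the query window gives the closest approach. `within_in_next` and its parameters are hypothetical.

```python
import math


def within_in_next(p0, v, home, radius, horizon):
    """True if a point moving linearly from p0 with velocity v comes within
    `radius` of `home` at some time t in [0, horizon]."""
    dx, dy = p0[0] - home[0], p0[1] - home[1]  # current offset from home
    vx, vy = v
    vv = vx * vx + vy * vy
    if vv == 0:
        t = 0.0  # stationary object: only the current position matters
    else:
        t = -(dx * vx + dy * vy) / vv  # time of closest approach
        t = max(0.0, min(horizon, t))  # clamp to the query window
    cx, cy = dx + vx * t, dy + vy * t
    return math.hypot(cx, cy) <= radius


# Taxi 20 km east of Home, driving west at 1.5 km/min (hypothetical values).
print(within_in_next((20.0, 0.0), (-1.5, 0.0), (0.0, 0.0), 5.0, 10.0))  # True
print(within_in_next((20.0, 0.0), (-1.5, 0.0), (0.0, 0.0), 5.0, 5.0))   # False
```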
Spatial and Temporal Data: Queries
- Data types
  - A point: <latitude, longitude, altitude> or <x, y, z>
  - A line-segment: <x1, y1, x2, y2>
  - A line: a sequence of line-segments
  - A region: a closed set of lines
  - Moving point: <x, y, t> (e.g., objects, car, train, ...)
- Queries
  - Molecules <intersect> Microbes
  - Train-stations <in> Cities
  - Round objects <within> 5cm of Hand <in the next> 10 s
  - Number of distractions in the <south-east> of the subject
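For containment queries such as "Train-stations <in> Cities" or "Hospitals <in> Cities", one common approach, assumed here, treats each region as a simple polygon given by its vertex list, applies the even-odd ray-casting test, and pairs points with regions through a nested-loop join. Points lying exactly on a boundary may be classified either way. All names and coordinates are hypothetical, and a spatial index would normally prune candidate pairs before the exact test.

```python
def point_in_polygon(pt, polygon):
    """Even-odd ray casting: cast a ray to the right of pt and count edge crossings."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the region
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y-coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contained_in(points, regions):
    """Nested-loop containment join: all (point name, region name) pairs."""
    return [(p_name, r_name)
            for p_name, pt in points.items()
            for r_name, poly in regions.items()
            if point_in_polygon(pt, poly)]


# Hypothetical station coordinates and a square "city" boundary.
cities = {"CityA": [(0, 0), (4, 0), (4, 4), (0, 4)]}
stations = {"Central": (2, 2), "Outlying": (5, 2)}
print(contained_in(stations, cities))  # [('Central', 'CityA')]
```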
Spatial and Temporal Data: Queries
- K Nearest Neighbor queries: find the k nearest objects to a query point (e.g., the 5 closest hospitals to my car)
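A baseline k-nearest-neighbor sketch, assuming planar coordinates and Euclidean distance: it scans every object and keeps the k smallest distances with `heapq.nsmallest`. An index-based search (e.g., over an R-tree) would avoid examining every object; the hospital names and coordinates are hypothetical.

```python
import heapq
import math


def k_nearest(query, objects, k):
    """Names of the k objects closest to `query` (Euclidean), by linear scan."""
    nearest = heapq.nsmallest(k, objects.items(),
                              key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in nearest]


# Hypothetical hospital locations and car position in planar km coordinates.
hospitals = {"H1": (0.0, 1.0), "H2": (5.0, 5.0), "H3": (1.0, 1.0), "H4": (-2.0, 0.0)}
print(k_nearest((0.0, 0.0), hospitals, k=2))  # ['H1', 'H3']
```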
Immersidata and Mining Queries (CIKM01, UACHI01)
Immersidata and Mining Queries
[Figure: a dynamic sign, e.g., the ASL sign "colors" (Subject-1)]
User Profiles Clustering: Offline Processes
PPED Similarity Measure and Clustering
Favorite Features (Rock: High, Classical: Low, Pop: Low, Rap: High)
Voting
User Profiles Clustering: Online Processes
Current User's Profile