Adapting to Source Properties in Processing Data Integration Queries - PowerPoint PPT Presentation

1 / 11
About This Presentation
Title:

Adapting to Source Properties in Processing Data Integration Queries

Description:

Computer Science Faculty, System Architecture, Database Technology ... Model-driven developement of ... Y. Halevy, Daniel S. Weld. SIGMOD 2004. 9/14/09 ... – PowerPoint PPT presentation

Number of Views:47
Avg rating:3.0/5.0
Slides: 12
Provided by: rainerg
Category:

less

Transcript and Presenter's Notes

Title: Adapting to Source Properties in Processing Data Integration Queries


1
Adapting to Source Properties in Processing Data
Integration Queries
Computer Science Faculty, System Architecture,
Database Technology Group
Oberseminar 09.05.2007 Matthias Böhm
2
Gliederung
  • Current Work
  • Project "DIPBench"
  • (Data Intensive Integration Process Benchmark)
  • Project "GCIP"
  • (Generation of Complex Integration Processes)
  • Model-driven developement of integration
    processes
  • Inter-System optimization of integration
    processes
  • Intra-System optimization of integration
    processes
  • Papersession
  • "Adapting to Source Properties in Processing Data
    Integration Queries "
  • Zachary G. Ives, Alon Y. Halevy, Daniel S. Weld
  • SIGMOD 2004

3
Project "DIPBench"
  • DWH scenario
  • Schema definitions
  • Integration process types
  • Execution schedule
  • scale factors
  • datasize
  • time
  • Execution tools
  • Initializer, Client, Monitor
  • Reference implementationexperiments
  • VDBMS
  • EAI-Server

4
Project "GignoMDA-GCIP"
  • Model-driven developement of integration
    processes
  • MDA for EII and EAI
  • Inter-System Process Optimization
  • Selection of optimal integration system for a set
    of integration processes
  • Intra-System Process Optimization
  • adaptive optimization of A-PSM (MTM)
    representations
  • rule-based process rewriting
  • workload-based process rewriting
  • using generic representations for workload,
    statistics and hints

5
Papersession 7 Related Work
  • Pipelined Hash Join (aka Double-Pipelined Join)
  • symmetric
  • non-blocking
  • memory-consuming
  • foreach arrivingtupel (P1 P2)
  • 1) Probing
  • 2) Building

Joinfidflight
P1
P2
6
Papersession 7 Related Work
  • Query Scrambling 27
  • addresses minimization of delays, while accessing
    remote data
  • response-time optimization

Normal Execution
Sources delayed?
  • Query Scrambling
  • Phase 1 Rescheduling
  • Phase 2 Operator Synthesis
  • Pair
  • Include Delayed (IN)
  • Estimated Delay (ED)

Sources still delayed?
Sources responded?
7
Papersession 7 Related Work
  • Bushy-Tree Enumeration
  • addresses the parallel computation of joins
  • exploit the network delay, ...
  • enumeration algorithm determines the join-order

Sequenced-Tree Enumeration(Left- /
Right-Deep-Tree)
Bushy-Tree Enumeration
Thread 3
D
Thread 1
Thread 2
C
B
A
B
C
D
A
8
Papersession 7 Related Work
  • Dynamic Rescheduling 25
  • Rate-based Pipeline Scheduling
  • Importance-based Tupel Regulation (SIT Selective
    Input Processing)
  • Plan Partitioning Techniques
  • Choose Nodes 6
  • Dynamic plans
  • Materialization points / Choose-Plan operator
  • Mid-Query Optimization 19
  • query plans annotated with statistics
  • Collection of statistics / query plan
    modification during runtime
  • Inter-Query-Adaptive Techniques
  • LEO DB2 Learning Optimizer 23
  • Adaptive Selectivity Estimation 5
  • SIT Statistics on query expressions 3

9
Papersession 2 Adaptive Data Partitioning
  • Algebraic key property
  • distribute relational UNION through PSJ
    operations
  • If R1 R11 ? R12, (vertical partitioning)
  • R2 R21 ? R22
  • then R1 ? R2 (R11 ? R12) ? (R21 ? R22)
  • (R11 ? R21)
  • ? (R12 ? R22)
  • ? (R11 ? R22)
  • ? (R12 ? R21)

Union over all join combinations of partitions
R11
R21
R1
R2
R12
R22
10
Papersession 5 Exploiting Order
  • Complementary Join Pair

Stichup-phase
Joinfidflight, Sorted-Merge Join
Joinfidflight, Pipelined Hash Join
no
no
yes
yes
Is join-attribute in sequence?
11
Papersession 6 Pre-Aggregation
  • Adjustable-window pre-aggregation operator
  • pipelined pre-aggregation
  • potential paralism
  • adapting join selectivities
  • if current window is effective, increase next
    window size
  • if current window is not effective, reduce next
    window size (until 1)
  • used before
  • Join
  • Final-Aggregation

Joinflight fk_flight
PRE-AGGfk_flight, windowsize
...
Write a Comment
User Comments (0)
About PowerShow.com