Optimal Distributed Declustering using Replication - PowerPoint PPT Presentation

Description:

Declustering data over multiple disks to improve performance for range queries ... Golden Ratio Sequences (GRS) [Bhatia et al, 2000] ICDT 2005. 6. Other schemes ... – PowerPoint PPT presentation

Transcript and Presenter's Notes

1
Optimal Distributed Declustering using Replication

Keith Frikken
Purdue University
Jan 5, 2005

2
Declustering Data

Declustering data over multiple disks to improve
performance for range queries has been well
studied
Applications include
Spatio-temporal databases
Image and video data
Scientific simulation datasets

3
Goal

Divide data uniformly along dimensions to create
tiles
Put records contained in each tile on different
disks so that I/O can be parallelized
Assumptions
Data can be tiled in such a way
Disks have constant retrieval times
Assigning tiles to disks is similar to a coloring
problem (disks are colors)
A range query can be answered optimally if the
number of I/O retrievals for any specific disk is at most
$\lceil (\text{number of tiles}) / (\text{number of disks}) \rceil$
Two approaches
Coloring schemes
Replication

4
Notations

k is number of disks
m is number of tiles in queries
r is the level of replication (here it is 2)
Q is the set of all range queries
ret(q) is the actual retrieval time of q
Optimal retrieval time for a query q is
$o_q = \lceil m/k \rceil$
Additive error: $e = \max_{q \in Q} (\mathrm{ret}(q) - o_q)$
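
To make the notation concrete, here is a minimal Python sketch that computes ret(q) and the additive error for a single-copy coloring scheme on a small two-dimensional grid. The helper names (`retrieval_time`, `additive_error`, `disk_modulo`) and the brute-force enumeration of all range queries are illustrative assumptions, not part of the presentation; Disk Modulo is used only as a familiar example coloring.

```python
from itertools import product
from math import ceil

def retrieval_time(color, k, x0, y0, w, h):
    """ret(q): the largest number of tiles any one disk reads for the
    w-by-h range query whose corner is at (x0, y0)."""
    counts = [0] * k
    for x, y in product(range(x0, x0 + w), range(y0, y0 + h)):
        counts[color(x, y)] += 1
    return max(counts)

def additive_error(color, k, n):
    """e = max over all range queries q in an n-by-n grid of ret(q) - ceil(m/k)."""
    worst = 0
    for x0, y0 in product(range(n), repeat=2):
        for w, h in product(range(1, n - x0 + 1), range(1, n - y0 + 1)):
            m = w * h  # number of tiles in the query
            ret = retrieval_time(color, k, x0, y0, w, h)
            worst = max(worst, ret - ceil(m / k))
    return worst

def disk_modulo(k):
    """Disk Modulo coloring: tile (x, y) goes to disk (x + y) mod k."""
    return lambda x, y: (x + y) % k

if __name__ == "__main__":
    dm = disk_modulo(5)
    print(retrieval_time(dm, 5, 0, 0, 2, 2))  # a 2x2 query under Disk Modulo
    print(additive_error(dm, 5, 5))           # worst case over a 5x5 grid
```

The enumeration is only practical for small grids; it is meant to illustrate the definitions, not to evaluate real schemes.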

5
Coloring schemes

Disk Modulo (DM) [Du and Sobolewski, 1982]
Fieldwise XOR (FX) [Kim and Pramanik, 1988]
Cyclic Schemes (RPHM, GFIB, EXH) [Prabhakar et al, 1998]
Golden Ratio Sequences (GRS) [Bhatia et al, 2000]

6
Other schemes

[Atallah and Prabhakar, 2000] developed a scheme
for two-dimensional grids with $k = 2^n$ disks that has
additive error of $O(\log k)$
[Sinha et al, 2001] proved lower bounds on the
additive error of $\Omega(\log k)$ and $\Omega(\log^{(d-1)/2} k)$
for 2 dimensions and $d$ ($> 2$) dimensions
respectively
[Chen and Cheng, 2002] showed that an additive
error of $O(\log^{d-1} k)$ is achievable for any number of
dimensions $d$ ($> 2$)

7
Replication

Placing records on multiple disks can further
improve performance of declustering schemes
Two Problems
How to schedule a query (i.e., what tiles are
retrieved from each disk)
How to use replication to balance load
Approaches
Chained Declustering [Hsiao and DeWitt, 1990]
Random Duplication Allocation (RDA) [Sanders et al,
2000], [Sanders, 2001], and [Czumaj and
Scheideler, 2003]

8
Replication Results

Chained Declustering
Fast scheduling algorithm: O(mk) time to test if
a specific retrieval time is possible [Aerts et
al, 2000]
RDA
If $m \ge ck(\log k)$ then optimal with high probability
[Czumaj and Scheideler, 2003]
Fast scheduling algorithm: O(?kO(1)) time
[Czumaj and Scheideler, 2003]
Hybrid techniques [Chen and Cheng, 2002]
Use GRS with second random disk

9
Our Results

We define a new class of schemes called the shift
schemes
Deterministic
Any query with at least k(k-1)e tiles can be
answered in an optimal fashion
Queries can be scheduled in O(mk(log e)) time
If a single disk fails, then any query with at
least k(k-1)e tiles can be answered optimally
Experimental performance similar to RDA (better
for many cases)

10
Shift Scheme Definition

Use any strong coloring scheme
Use a modified chain declustering
Defined by shift value s (where $\gcd(s,k) = 1$)
Base scheme is defined by function f(x,y)
Second color is $(f(x,y) + s) \bmod k$
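
The definition can be sketched in a few lines of Python, reading the second color as (f(x,y) + s) mod k, which is what the 5-disk example grid in this deck shows. The base coloring `base(x, y) = (x + 2y) mod 5` and the helper name `shift_scheme` are assumptions chosen so that the output matches that grid; the presentation itself allows any strong coloring scheme as the base.

```python
from math import gcd

def shift_scheme(f, k, s):
    """Return a function mapping a tile (x, y) to its (first, second) disk.

    f is the base coloring, k the number of disks, s the shift value.
    """
    if gcd(s, k) != 1:
        raise ValueError("shift value s must satisfy gcd(s, k) = 1")

    def colors(x, y):
        first = f(x, y) % k
        return first, (first + s) % k

    return colors

def base(x, y):
    # Assumed base coloring, picked to reproduce the 5-disk example grid.
    return (x + 2 * y) % 5

colors = shift_scheme(base, k=5, s=3)

if __name__ == "__main__":
    for y in range(5):
        row = (colors(x, y) for x in range(5))
        print(" ".join(f"{a},{b}" for a, b in row))
```

Because gcd(s, k) = 1, following the shift from disk to disk visits every disk once before returning to the start, so the disks form a single chain as in chained declustering.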

11
Shift Scheme Definition

Example with k = 5 disks and shift value s = 3
(each tile lists its first color, then its second color)

0,3 1,4 2,0 3,1 4,2
2,0 3,1 4,2 0,3 1,4
4,2 0,3 1,4 2,0 3,1
1,4 2,0 3,1 4,2 0,3
3,1 4,2 0,3 1,4 2,0

12
Scheduling

Can use modification of chain declustering
scheduling algorithm to schedule queries in
O(mk(log e)) time
Essentially, use previous algorithm to test if a
specific load is possible and do a binary search
on the possible loads
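
The "test a load, then binary search" idea can be sketched as follows. This is a simple brute-force feasibility check written for illustration under stated assumptions, not the chained-declustering algorithm the slide refers to: `t[i]` is the number of query tiles whose first copy is on disk i, each tile's second copy is on disk (i + s) mod k, and gcd(s, k) = 1 so the disks form one cycle. The function names are hypothetical.

```python
from math import ceil

def feasible(t, s, load):
    """Can the query be scheduled so that no disk reads more than `load` tiles?"""
    k = len(t)
    order = [(p * s) % k for p in range(k)]  # the single cycle 0 -> s -> 2s -> ...
    start = order[0]
    for d0 in range(t[start] + 1):           # tiles the start disk hands to its successor
        incoming, ok = d0, True
        for disk in order[1:]:
            if incoming > load:              # received tiles cannot be passed on again
                ok = False
                break
            # Shift the fewest tiles that bring this disk down to `load`.
            incoming = max(0, t[disk] + incoming - load)
        if ok and t[start] - d0 + incoming <= load:
            return True
    return False

def min_max_load(t, s):
    """Binary search for the smallest feasible maximum load."""
    lo, hi = ceil(sum(t) / len(t)), max(t)   # the optimum lies in [ceil(m/k), max t_i]
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(t, s, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

if __name__ == "__main__":
    print(min_max_load([4, 0, 0, 0, 0], s=3))  # prints 2
```

The feasibility test tries every possible shift from the starting disk and propagates the minimum required shifts around the cycle, which costs O(mk) in the worst case. Feasibility is monotone in the load, so a binary search over candidate loads finds the minimum.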

13
Bound(1)

There are k disks ($D_0, \ldots, D_{k-1}$)
Disk $D_i$ has $t_i$ tiles initially (as the primary
disk)
The number of tiles is $m = t_0 + \cdots + t_{k-1}$
$D_i$ shifts $d_i$ tiles to $D_{i+1}$ (indices mod $k$)
$d_i \le t_i$
The goal is to minimize the most tiles at a disk,
i.e., $\max_{0 \le i \le k-1} (d_{i-1} + t_i - d_i)$

14
Bound(2)

Recall,
$o = \lceil m/k \rceil$
$\max_{0 \le i \le k-1} t_i \le o + e$
Suppose $m \ge k(k-1)e$
Then,
$o \ge (k-1)e$
Surplus is bounded by
$(k-1)e$
$\max_{0 \le i \le k-1} d_i \le (k-1)e \le o$
Two cases
If disk has a surplus
If disk has a shortage
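
One way to read the first steps of this bound is written out below. It assumes that "surplus" means the total number of tiles by which disks exceed $o$; this is an interpretation, not something the slide states. The numeric instance ($k = 5$, $e = 1$) is illustrative only.

```latex
\begin{align*}
m \ge k(k-1)e \;&\Rightarrow\; o = \lceil m/k \rceil \ge (k-1)e \\
\text{surplus} = \sum_{i:\, t_i > o} (t_i - o) \;&\le\; (k-1)e
  && \text{each term is at most } e \text{, and at most } k-1 \text{ disks exceed } o \\
  &&& \text{because } \textstyle\sum_i t_i = m \le k \cdot o \\
\text{e.g. } k = 5,\ e = 1:\quad m \ge 20 \;&\Rightarrow\; o \ge 4,
  \qquad \text{surplus} \le 4 \le o
\end{align*}
```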

15
32 disks

16
64 disks

17
128 disks

18
32 disks, 3 dimensions

Generalizations

Permutations
Higher levels of replication
Survivability
If the level of replication is r, can handle any
r-1 failures
When r = 2 and a single disk fails, then
Fast scheduling still possible
Large queries still optimal

20
Summary

Shift schemes are a new class of schemes
Optimal for large enough queries
Efficient scheduling algorithm
Resilient to disk failures
Future Work
Better analysis of scheme
Choosing shift values
