1
A Survey of External Memory Algorithms and Data
Structures
  • Presented by Reynold Cheng
  • Feb 10, 2003

2
Internal vs. External Memory
  • Internal memory (RAM) not sufficient to store
    data sets in large applications
  • External memory (e.g., disks) used to store data
    for the algorithm
  • The I/O between fast internal memory and slow
    external memory is a performance bottleneck.

3
Virtual Memory and I/O Performance
  • Virtual Memory in OS
  • Provides one uniform address space
  • Principle of Locality
  • Caching and prefetching improve performance
  • Designed to be general-purpose
  • Doesn't help if computations are inherently nonlocal
  • Results in large amounts of I/O and poor performance

4
EM Algorithms and Data Structures
  • Incorporate locality directly into algorithm
    design
  • Bypass the virtual memory system
  • EM Algorithms and Data Structures
  • explicitly manage data placement and movement across the memory hierarchy
  • I/O communication between internal memory (RAM)
    and external memory (magnetic disk)

5
Two Problem Categories
  • Batched Problems
  • No preprocessing is done
  • Process entire file of data items
  • Stream data through internal memory in one or more passes
  • Online Problems
  • Immediate response to queries
  • Only small portion of data is examined for each
    query
  • Usually organize data items in hierarchical index
  • Data can be static or dynamic

6
Talk Outline
  • Modeling Parallel Disks
  • Design Goal of EM Algorithms
  • Striping and Load Balancing
  • Batched Problems
  • Algorithms for External Sorting
  • Online Problems
  • Hash tables and B-trees
  • Conclusions

7
Parallel Disk Model (PDM)
  • Parameters: N = problem size, M = internal memory size, B = block size, D = number of independent disks (all measured in items)
  • Lowercase letters denote sizes in blocks: n = N/B, m = M/B, z = Z/B
8
Disk Block Size B
  • Rotational latency and seek time are long
  • Improve the average access time per item by transferring data in blocks
  • Track size ≈ 50-200 KB
  • Batched applications: B a little larger than the track size (≈ 100 KB)
  • Online applications (pointer-based indexes): B ≈ 8 KB
  • Performance can be further improved with parallel disks

9
Data placement and disk Striping
  • We assume the input data are striped across the D disks (see the sketch below)
  • Example: D = 5, B = 2
  • Items 12 and 13 are stored in the 2nd block (stripe 1) of disk D1
  • N items can be read/written in O(N/(DB)) = O(n/D) I/Os, which is optimal
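
A minimal Python sketch (not from the slides) of the round-robin layout this striping implies; the helper name and the 0-based disk numbering are assumptions:

    # Round-robin striping of logical blocks across D disks.
    # Maps an item index to (disk, stripe, offset within the block).
    def locate(item_index, D=5, B=2):
        block = item_index // B      # logical block number
        disk = block % D             # blocks go round-robin across the disks
        stripe = block // D          # a stripe is one block on each disk
        offset = item_index % B      # position inside the block
        return disk, stripe, offset

    # Matches the slide's example (D = 5, B = 2): items 12 and 13 form
    # logical block 6, which lands on disk 1, stripe 1.
    print(locate(12), locate(13))    # (1, 1, 0) (1, 1, 1)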

10
Performance measures in PDM
  • Number of I/O operations
  • Disk space used (utilization)
  • Ideally, data structures should use linear space, i.e., O(N/B) = O(n) disk blocks
  • Internal (sequential/parallel) computation time
  • We focus only on (1) and (2)
  • Most algorithms described run in optimal CPU time

11
Fundamental I/O Bounds (1)
  • I/O performance is often expressed in terms of the bounds for
  • Scan(N): sequentially reading/writing N items
  • Sort(N): sorting N items
  • Search(N): searching among N sorted items
  • Output(Z): outputting the Z answers to a query in a blocked, output-sensitive fashion

12
Fundamental I/O Bounds (2)
  • Scan(N) and Sort(N) apply to batched problems
  • Search(N) and Output(Z) apply to online problems
  • Typically combined in the form Search(N) + Output(Z)

13
Optimal I/O Bounds
  • Scan(N) = Θ(n/D)
  • Sort(N) = Θ((n/D) log_m n)
  • Search(N) = Θ(log_DB N)
  • Output(Z) = Θ(max(1, z/D))
14
Comments on I/O Bounds (1)
  • Scan(N) = O(n/D), i.e., a linear number of I/Os
  • Nontrivial batched problems such as permuting, matrix transposition, and list ranking require a nonlinear number of I/Os,
  • even though they can be solved in linear time in internal memory

15
Comments on I/O Bounds (2)
  • Multiple-disk bounds for Scan(N), Sort(N), and Output(Z) are D times smaller than the single-disk bounds
  • For Search(N), the speedup is less significant:
  • Θ(log_B N) for D = 1 becomes Θ(log_DB N) for D > 1
  • Speedup = Θ((log_B N) / (log_DB N)) = Θ((log DB) / (log B)) = Θ(1 + (log D) / (log B)) < 2 (see the check below)
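
A quick numeric check of this limited speedup, with illustrative values of N, B, and D (not taken from the slides):

    import math

    # Search costs ~log_B N I/Os on one disk and ~log_{DB} N with striping.
    N, B, D = 10**9, 1024, 8
    single_disk = math.log(N, B)          # ~ log_B N
    striped = math.log(N, D * B)          # ~ log_{DB} N
    print(single_disk / striped)          # 1 + log(D)/log(B) = 1.3, well below 2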

16
Locality and I/O Performance
  • Key to efficient I/O: access data with a high degree of locality
  • Single disk: each I/O transfers a block of B items; optimal when all B items are needed
  • Multiple disks: each I/O transfers D blocks (a stripe); optimal when all DB items are needed

17
Single Disk Locality
  • Many batched problems, such as sorting, require O(N log N) CPU time
  • If the data set doesn't fit into RAM, relying on virtual memory may cost O(N log n) I/Os!
  • Goal: incorporate locality into the algorithm design to achieve O(n log_m n) I/Os

18
Design Goal of EM Algorithm
  • For batched problems,
  • the N and Z terms in the I/O bounds of the naïve algorithms are replaced by n and z
  • the base of the logarithm changes from 2 to m
  • For online problems,
  • the base of the logarithm changes from 2 to B
  • Z is replaced by z
  • The speedup is significant; e.g., for batched problems, (N log n) / (n log_m n) = B log m

19
Multiple Disk Locality
  • Disk striping: I/Os are permitted only on entire stripes, one stripe at a time

20
Single-Disk to Multiple-Disk (1)
  • Net effect of disk striping: the D disks behave as a single disk with logical block size DB
  • The disk striping paradigm automatically converts
  • an algorithm for a single disk with block size DB
  • → an algorithm for D disks with block size B

21
Single-Disk to Multiple-Disk (2)
  • Example: a single-disk algorithm for searching requires Θ(log_B N) I/Os
  • Using striping, we obtain a multiple-disk algorithm by substituting DB for B:
  • Θ(log_DB N) I/Os
  • Disk striping is very powerful: it can be used to obtain optimal multiple-disk algorithms from optimal single-disk algorithms for
  • streaming, online search, and output reporting

22
Disk Striping and Sorting
  • Disk striping is not optimal for sorting!
  • I/O bound with disk striping (logical block size DB): O((n/D) log_{m/D} (n/D))
  • Optimal I/O bound: Θ((n/D) log_m n)
  • Striping exceeds the optimal bound by a factor of roughly (log m) / (log (m/D)), which is significant when D is close to m (see the check below)
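
A quick check of that gap under the reconstructed bounds above, with illustrative parameter values only:

    import math

    # Ratio of striped sorting cost to the optimal cost ~ log(m) / log(m/D).
    m, D = 1024, 128                       # assumed: 1024 memory blocks, 128 disks
    print(math.log(m) / math.log(m / D))   # ~3.3x more I/Os with striping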

23
Sorting with Multiple Disks
  • To attain optimal sorting bound, we need to
    forsake disk striping
  • Control disks independently
  • Average/Worst case cost
  • Algorithms are based on distribution and merge
    paradigms
  • Online load balancing: distribute the data evenly across the D disks for access

24
Distribution Sort with D Disks
[Figure: S = 7 buckets formed using 6 partitioning elements]
  • Uses S-1 partitioning elements to partition the items into S disjoint buckets (see the sketch after this list)
  • Items in bucket i ≤ items in bucket j for i < j
  • Sort the buckets recursively
  • Concatenate the sorted buckets
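
A minimal in-memory Python sketch of the distribution paradigm; the real EM algorithm streams blocks from disk and writes each bucket out in Θ(n/D) I/Os per level, whereas this only shows the bucketing logic (the function name and the crude pivot choice are illustrative):

    import bisect

    def distribution_sort(items, S=4):
        distinct = sorted(set(items))
        if len(items) <= S or len(distinct) <= S:
            return sorted(items)                 # small input: sort "in memory"
        # S-1 partitioning elements chosen so bucket sizes are roughly equal
        step = len(distinct) // S
        pivots = [distinct[i * step] for i in range(1, S)]
        buckets = [[] for _ in range(S)]         # S disjoint buckets
        for x in items:                          # one streaming pass over the items
            buckets[bisect.bisect_right(pivots, x)].append(x)
        # sort the buckets recursively and concatenate
        return [y for b in buckets for y in distribution_sort(b, S)]

    print(distribution_sort([7, 2, 9, 4, 1, 8, 3, 6, 5, 0]))   # [0, 1, ..., 9]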

25
Partitioning Elements
  • Choose the S-1 partitioning elements so that the bucket sizes are roughly equal, using Θ(n/D) I/Os
  • Bucket sizes decrease by a factor of Θ(S), giving O(log_S n) levels of recursion
  • The maximum S is Θ(m), so the minimum number of recursion levels is O(log_m n)
  • Deterministic methods exist for choosing about √m partitioning elements

26
I/O bound for Distribution Sort
  • If each level of recursion uses Θ(n/D) I/Os, the total number of I/Os is O((n/D) log_S n) = O((n/D) log_m n)
  • Each set of items being partitioned is itself a bucket formed at the previous recursion level
  • The blocks of a bucket should therefore be spread evenly across the disks, so that
  • all D disks are involved when that bucket is read at the next level

27
Randomized Distribution Sort [VS94]
  • Goal: with high probability, each bucket is well balanced across the D disks
  • If N is so large that the number of blocks per bucket, Θ(n/S), is ω(D log D), then write each group of D blocks in an independent random order to a disk stripe

28
Classical Occupancy Problem
  • b balls are inserted independently and uniformly at random into d bins
  • If b/d grows faster than log d, the largest bin contains ≈ b/d balls on average → an even distribution (simulated below)
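
A tiny Python simulation of this bound (parameter values are illustrative only):

    import random

    # b balls thrown uniformly at random into d bins; when b/d >> log d,
    # the fullest bin holds close to the average b/d balls.
    def max_load(b, d):
        bins = [0] * d
        for _ in range(b):
            bins[random.randrange(d)] += 1
        return max(bins), b / d

    print(max_load(100_000, 16))   # max load is typically within a few % of 6250.0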

29
Distribution Sort for the Case Θ(n/S) = O(D log D)
  • The previous technique breaks down when Θ(n/S) = O(D log D)
  • A typical memoryload contains ≫ S log S blocks → it is well balanced among the S buckets
  • 1st pass: read the file in one pass, one memoryload at a time; permute the items and write them back to disk
  • 2nd pass: extract a part from each of several memoryloads to form a typical memoryload
  • Output each memoryload with a round-robin placement of its S buckets on the D disks

30
Merge Sort with D Disks
  • Orthogonal to the distribution paradigm
  • Run formation: scan the n blocks of data, one memoryload (m blocks) at a time; sort each load and output it on stripes
  • This yields N/M = n/m sorted runs
  • Merging: scan and merge groups of R = Θ(m) runs at a time (sketched below)
  • Number of merge passes = log_R (n/m) ≈ log_m n - 1
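
A minimal single-disk Python sketch of this paradigm, with the file modeled as a list and M, R as assumed small parameters (a real implementation would read and write blocks of the runs on disk):

    import heapq

    def external_merge_sort(data, M=4, R=3):
        # Run formation: sort one "memoryload" of M items at a time.
        runs = [sorted(data[i:i + M]) for i in range(0, len(data), M)]
        # Merge passes: repeatedly merge groups of R runs until one run remains.
        while len(runs) > 1:
            runs = [list(heapq.merge(*runs[i:i + R]))
                    for i in range(0, len(runs), R)]
        return runs[0] if runs else []

    print(external_merge_sort([9, 3, 7, 1, 8, 2, 6, 5, 4, 0]))   # [0, 1, ..., 9]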

31
I/O bound for Merge Sort
  • If each pass uses Θ(n/D) I/Os, the total number of I/Os is O((n/D) log_m n)
  • Must bring in the next Θ(D) blocks needed for merging, on average, at each step
  • Ensure that the blocks currently being merged reside on different disks

32
Simple Randomized Mergesort [BV99]
  • Each run is striped starting at a randomly chosen
    disk.
  • At any time, the disk containing the leading
    block of any run is uniformly random.

[Figure: D = 4 disks, R = 8 runs]
33
Cyclic Occupancy Problem
  • Conjecture: the expected maximum bin size is at most that of the classical occupancy problem

34
External Hashing for Online Dictionary Search
  • Advantage of hashing: the expected number of probes per operation is constant, independent of N
  • Goal: develop dynamic EM structures that adapt smoothly to widely varying values of N
  • Directory method: extendible hashing (sketched below)
  • 2 I/Os per probe (directory + data access)
  • Directory-less method: linear hashing
  • 1 I/O only, but may require overflow lists
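
A toy Python sketch of the directory idea behind extendible hashing (class and variable names are illustrative, and bucket splitting with directory doubling is omitted): one access follows the directory entry, a second reads the bucket block.

    class ExtendibleDirectory:
        def __init__(self, d, buckets):
            self.d = d                           # global depth
            self.buckets = buckets               # 2**d bucket blocks

        def lookup(self, key):
            slot = hash(key) & (2**self.d - 1)   # d hash bits pick a directory slot
            bucket = self.buckets[slot]          # "I/O" 1: follow the directory entry
            return key in bucket                 # "I/O" 2: read and scan the bucket

    # Toy usage: depth 2, four buckets filled by the low 2 hash bits of each key.
    d, keys = 2, ["apple", "pear", "plum", "fig"]
    buckets = [[] for _ in range(2**d)]
    for k in keys:
        buckets[hash(k) & (2**d - 1)].append(k)
    table = ExtendibleDirectory(d, buckets)
    print(table.lookup("plum"), table.lookup("kiwi"))   # True False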

35
Multiway Tree
  • Items in the tree are sorted → the tree can be used for 1D range search
  • To find the items in [x, y]: search for x, then do an inorder traversal from x to y
  • Multiway trees arise naturally in online settings: updates and queries are processed immediately
  • To exploit block transfers, the balanced multiway B-tree was proposed; it is the most widely used nontrivial EM data structure

36
B-Trees and B+-Trees
  • In a B+-tree, all items are stored in the leaves
  • Internal nodes store only key values and pointers → higher branching factor
  • Leaves are linked together sequentially to facilitate range queries or sequential access (see the sketch below)
  • The B+-tree is the most popular variant of the B-tree
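
A toy Python sketch of a range query [x, y] over linked leaves (the leaf class and its linkage are illustrative stand-ins; the descent to the leaf containing x is assumed to have been done by an ordinary search in log_B N I/Os):

    class Leaf:
        def __init__(self, keys, next_leaf=None):
            self.keys = sorted(keys)      # items live only in the leaves
            self.next = next_leaf         # sequential link to the next leaf block

    def range_query(leaf_containing_x, x, y):
        leaf, out = leaf_containing_x, []
        while leaf is not None:           # each step reads one leaf block
            for k in leaf.keys:
                if k > y:
                    return out            # passed the end of the range
                if k >= x:
                    out.append(k)
            leaf = leaf.next              # follow the leaf link
        return out

    # Three linked leaves holding 1..9
    l3 = Leaf([7, 8, 9]); l2 = Leaf([4, 5, 6], l3); l1 = Leaf([1, 2, 3], l2)
    print(range_query(l1, 2, 8))          # [2, 3, 4, 5, 6, 7, 8]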

37
B*-Trees
  • Postpone splitting by sharing a node's data with one of its adjacent siblings
  • Split a node only when the sibling is also full, and evenly redistribute the data among the node and its siblings
  • Reduces the number of times new nodes are created
  • Higher storage utilization, lower search I/O costs
  • Average node utilization rises from ≈ 69% to ≈ 81%

38
The Buffer Tree [ARGE95]
  • Each insertion into a B-tree takes O(log_B N) I/Os
  • Building a B-tree by repeated insertion → O(N log_B N) I/Os!
  • We can take advantage of blocking and obtain O(n log_m n) I/Os
  • Buffer tree [ARGE95]

39
Main Idea of Buffer Tree
  • Logically group nodes together and add buffers
  • Balanced tree with degree Θ(m) rather than Θ(B)
  • Each node has a buffer for storing Θ(M) items (Θ(m) blocks)
  • Insertions are done lazily: items are inserted into buffers
  • When a buffer is full, its items are pushed one level down (sketched below)
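
A highly simplified Python sketch of lazy insertion with buffer emptying (all names are illustrative; the splitting, rebalancing, and deletion handling of a real buffer tree are omitted):

    class Node:
        def __init__(self, pivots=None, children=None, capacity=4):
            self.pivots = pivots or []      # routing keys (Theta(m) of them in a real tree)
            self.children = children or []  # empty at a leaf
            self.buffer = []                # lazily collected items
            self.cap = capacity             # stands in for the Theta(M)-item buffer

        def insert(self, item):
            self.buffer.append(item)        # leaves simply collect items in this sketch
            if len(self.buffer) >= self.cap and self.children:
                self.empty_buffer()

        def empty_buffer(self):
            # One scan distributes the whole buffer to the children (O(m) I/Os),
            # amortizing to O(1/B) I/Os per item per level.
            for item in self.buffer:
                i = sum(1 for p in self.pivots if item >= p)
                self.children[i].insert(item)
            self.buffer = []

    # Toy usage: routing keys 10 and 20; the 4th insert triggers a buffer emptying.
    root = Node(pivots=[10, 20], children=[Node(), Node(), Node()])
    for x in [5, 15, 25, 12]:
        root.insert(x)
    print([c.buffer for c in root.children])   # [[5], [15, 12], [25]]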

40
I/O Cost of Buffer Trees
  • Buffer emptying takes O(m) I/Os
  • This amortizes the cost of distributing M items among Θ(m) children
  • Each item incurs an amortized cost of O(m/M) = O(1/B) I/Os per level
  • The resulting amortized cost per update/query is only O((1/B) log_m n) I/Os
  • Cost of inserting N = nB items:
  • nB · O((1/B) log_m n) = O(n log_m n) I/Os

41
Conclusions (1)
  • Internal memory algorithms are not readily adapted to external memory
  • EM algorithms are developed to handle I/O communication more efficiently
  • The most fundamental I/O bounds are those for scanning, sorting, searching, and outputting
  • The goal of EM algorithms is to achieve these optimal bounds

42
Conclusions (2)
  • Striping is optimal for scanning, searching, and outputting, but not for sorting
  • Randomized distribution sort and merge sort can achieve the optimal sorting I/O bound
  • Hash tables are best for online dictionary search
  • B-trees are best for 1D online range search
  • Buffer trees improve the speed of B-tree construction

43
Other Interesting Topics
  • Handling duplicates during sorting
  • Permutation, Fast Fourier Transform (FFT),
    computational geometry and graphs
  • Multi-dimensional data, R-trees and range queries
  • Dynamic data structures, moving objects, strings
  • EM algorithm development tools (e.g., TPIE)

44
References
  • [JSV2001] J. S. Vitter. External Memory Algorithms and Data Structures: Dealing with Massive Data. ACM Computing Surveys 33(2), June 2001, 209-271.
  • [JSV1998] J. S. Vitter. External Memory Algorithms. Invited tutorial in Proceedings of the 17th Annual ACM Symposium on Principles of Database Systems (PODS '98), Seattle, WA, June 1998.
  • [VS94] J. S. Vitter and E. A. M. Shriver. Algorithms for parallel memory I: Two-level memories. Algorithmica 12(2-3), 1994, 110-147.

45
References
  • [BV99] R. D. Barve and J. S. Vitter. A simple and efficient parallel disk mergesort. In Proceedings of the 11th ACM Symposium on Parallel Algorithms and Architectures (SPAA '99), St. Malo, France, June 1999, 232-241.
  • [BY96] R. Baeza-Yates. Bounded disorder: The effect of the index. Theoretical Computer Science 168, 1996, 21-38.
  • [ARGE95] L. Arge. The buffer tree: A new technique for optimal I/O-algorithms. In Proceedings of the Workshop on Algorithms and Data Structures (WADS '95), Lecture Notes in Computer Science Vol. 955, Springer-Verlag, 1995, 334-345.