CS257 Presentations
1
CS257 Presentations
  • Submitted By
  • Vikas Vittal Rao
  • Class ID 227/124
  • Student ID 005885269
  • CS257
  • Dr. T.Y. Lin

2
Sections 13.1 – 13.3
Secondary Storage Management
  • Sanuja Dabade, Eilbroun Benjamin, CS 257, Dr. T.Y. Lin

3
13.1.1 Memory Hierarchy
  • Data storage capacities vary for different devices
  • Cost per byte to store data also varies
  • Devices with the smallest capacity offer the fastest speed at the highest cost per bit

4
Memory Hierarchy Diagram
(Figure: the hierarchy from fastest/smallest to slowest/largest)
  • Cache
  • Main Memory (programs, main-memory DBMSs)
  • Disk (file system; used as virtual memory)
  • Tertiary Storage
5
13.1.1 Memory Hierarchy
  • Cache
  • Lowest level of the hierarchy
  • Data items are copies of certain locations of
    main memory
  • Sometimes, values in cache are changed and
    corresponding changes to main memory are delayed
  • Machine looks for instructions as well as data
    for those instructions in the cache
  • Holds a limited amount of data
  • In a single-processor computer, there is no need to update the data in main memory immediately
  • With multiple processors, data is written to main memory immediately; this policy is called write-through

6
Main Memory
  • Refers to physical memory that is internal to the
    computer. The word main is used to distinguish it
    from external mass storage devices such as disk
    drives.
  • Everything that happens in the computer (instruction execution, data manipulation) works on information resident in main memory
  • Main memory is random access: one can obtain any byte in the same amount of time

7
Secondary storage
  • Used to store data and programs when they are not
    being processed
  • More permanent than main memory, as data and
    programs are retained when the power is turned
    off
  • A personal computer might only require 20,000
    bytes of secondary storage
  • E.g. magnetic disks, hard disks

8
Tertiary Storage
  • Consists of anywhere from one to several storage drives
  • A comprehensive computer storage system that is usually very slow, so it is usually used to archive data that is not accessed frequently
  • Holds data volumes in terabytes
  • Used for databases much larger than what can be
    stored on disk

9
13.1.2 Transfer of Data Between levels
  • Data moves between adjacent levels of the
    hierarchy
  • At the secondary or tertiary levels accessing the
    desired data or finding the desired place to
    store the data takes a lot of time
  • Disk is organized into blocks
  • Entire blocks are moved to and from a region of main memory called a buffer
  • A key technique for speeding up database operations is to arrange the data so that when one item on a block is needed, it is likely that other data on the same block will be needed at the same time
  • Same idea applies to other hierarchy levels

10
13.1.3 Volatile and Non Volatile Storage
  • A volatile device forgets the data stored on it when the power goes off
  • A non-volatile device holds data even when the device is turned off
  • Secondary and tertiary storage devices are non-volatile
  • Main memory is volatile

11
13.1.4 Virtual Memory
  • A technique that gives an application program the impression that it has contiguous working memory (an address space), while in fact the memory may be physically fragmented and may even overflow onto disk storage
  • The technique makes programming of large applications easier and uses real physical memory (e.g., RAM) more efficiently
  • Typical software executes in virtual memory
  • The address space is typically 32 bits, i.e., 2^32 bytes or 4 GB
  • Transfer between memory and disk is in units of blocks

12
13.2.1 Mechanism of Disk
  • Use of secondary storage is one of the important characteristics of a DBMS
  • A disk consists of two moving assemblies
  • 1. the disk assembly
  • 2. the head assembly
  • The disk assembly consists of one or more platters
  • Platters rotate around a central spindle
  • Bits are stored on the upper and lower surfaces of the platters

13
13.2.1 Mechanism of Disk
  • Each disk surface is organized into tracks
  • The tracks at a fixed radius from the center, one on each surface, form one cylinder
  • Tracks are organized into sectors
  • Sectors are segments of a track's circle, separated by gaps

14
15
13.2.2 Disk Controller
  • One or more disks are controlled by a disk controller
  • Disk controllers are capable of
  • Controlling the mechanical actuator that moves the head assembly
  • Selecting the sector from among all those in the cylinder at which the heads are positioned
  • Transferring bits between the desired sector and main memory
  • Possibly buffering an entire track

16
13.2.3 Disk Access Characteristics
  • Accessing (reading/writing) a block requires three steps
  • The disk controller positions the head assembly at the cylinder containing the track on which the block is located. This is the seek time.
  • The disk controller waits while the first sector of the block moves under the head. This is the rotational latency.
  • All the sectors and the gaps between them pass under the head while the disk controller reads or writes the data in these sectors. This is the transfer time.
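The three delays above can be combined into a simple estimate. This is a minimal sketch, not a full drive model: `access_time_ms` is a hypothetical helper, gaps are ignored in the transfer term, and the 256 sectors/track figure below is an assumption (6.46 ms average seek and an 8.33 ms rotation, i.e., 7200 RPM, are the figures used in the examples that follow).

```python
def access_time_ms(avg_seek_ms, rpm, sectors_per_track, block_sectors):
    """Estimated block access time = seek + rotational latency + transfer.

    Average rotational latency is half a rotation; transfer time is the
    fraction of one rotation occupied by the block's sectors (inter-sector
    gaps are ignored for simplicity).
    """
    rotation_ms = 60_000 / rpm                     # one full rotation
    latency_ms = rotation_ms / 2                   # half a turn on average
    transfer_ms = rotation_ms * block_sectors / sectors_per_track
    return avg_seek_ms + latency_ms + transfer_ms
```

With these assumed parameters and a block occupying 16 of 256 sectors on a track, the estimate is about 11.15 ms, in line with the 11 ms quoted later for reading a 16K block.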

17
13.3 Accelerating Access to Secondary Storage
  • Secondary storage definition
  • Several approaches for more efficiently accessing data in secondary storage
  • Place blocks that are accessed together in the same cylinder
  • Divide the data among multiple disks
  • Mirror disks
  • Use disk-scheduling algorithms
  • Prefetch blocks into main memory
  • Scheduling latency – the added delay in accessing data caused by a disk scheduling algorithm
  • Throughput – the number of disk accesses per second that the system can accommodate

18
13.3.1 The I/O Model of Computation
  • The number of block accesses (disk I/Os) is a good approximation of an algorithm's running time
  • Disk I/Os are proportional to the time taken, so they should be minimized
  • Ex 13.3: You want an index on R that identifies the block on which the desired tuple appears, but not where on the block it resides
  • For the Megatron 747 (M747) example, it takes 11 ms to read a 16K block
  • The delay in searching for the desired tuple within the block is negligible

19
13.3.2 Organizing Data by Cylinders
  • The first seek time and the first rotational latency can never be neglected
  • Ex 13.4: We request 1024 blocks of the M747
  • If the data is randomly distributed, the average latency is 10.76 ms by Ex 13.2, making the total latency 11 s
  • If all blocks are stored consecutively on one cylinder:
  • 6.46 ms + (8.33 ms × 16) ≈ 139.7 ms
  • (1 average seek) + (time per rotation) × (# of rotations)

20
13.3.3 Using Multiple Disks
  • The factor by which performance improves is proportional to the number of disks
  • Striping – distributing a relation across multiple disks following this pattern
  • Data on disk 1: R1, R(1+n), R(1+2n), ...
  • Data on disk 2: R2, R(2+n), R(2+2n), ...
  • Data on disk n: Rn, R2n, R3n, ...
  • Ex 13.5: We request 1024 blocks with n = 4
  • 6.46 ms + (8.33 ms × (16/4)) ≈ 39.8 ms
  • (1 average seek) + (time per rotation) × (# of rotations)
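The striping pattern can be sketched as a simple modulo mapping. `disk_for_block` is a hypothetical helper, assuming 0-based block and disk numbering:

```python
def disk_for_block(i, n_disks):
    """With striping, block i of the relation (0-based) lives on disk
    i mod n, so disk d holds blocks d, d+n, d+2n, ..."""
    return i % n_disks

# 1024 consecutive blocks over n = 4 disks: the load splits evenly,
# so each disk serves 256 of the requested blocks in parallel.
counts = [0] * 4
for i in range(1024):
    counts[disk_for_block(i, 4)] += 1
```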

21
13.3.4 Mirroring Disks
  • Mirroring disks – having two or more disks hold identical copies of the data
  • Benefit 1: If n disks are mirrors of each other, the system can survive the crash of up to n−1 disks
  • Benefit 2: With n disks, read performance increases by a factor of n
  • Performance increase → increased efficiency

22
13.3.5 Disk Scheduling and the Elevator
Problem
  • Disk controller will run this algorithm to select
    which of several requests to process first.
  • Pseudo code:
  • requests // list of all pending data requests
  • upon receiving a new data request:
  •   requests.add(new request)
  • while requests is not empty:
  •   move head to next location
  •   if head is at a cylinder in requests:
  •     retrieve the data
  •     remove that request from requests
  •   if head reaches the end:
  •     reverse head direction
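The pseudo code above can be sketched in Python. This is a simplified, static version of the elevator algorithm: all requests are known up front and arrival times are ignored, unlike the running example on the next slide; `elevator_schedule` is an illustrative name.

```python
def elevator_schedule(start, cylinders):
    """Service cylinder requests in sweep order (the elevator algorithm).

    The head starts at `start` moving toward higher cylinders, serves
    every pending request it passes, and reverses direction only when
    no requests remain ahead of it. Returns the service order.
    """
    pending = list(cylinders)
    pos, direction, order = start, +1, []
    while pending:
        # requests at or beyond the current position, in travel direction
        ahead = [c for c in pending if (c - pos) * direction >= 0]
        if not ahead:
            direction = -direction        # end of sweep: reverse
            continue
        nxt = min(ahead) if direction > 0 else max(ahead)
        order.append(nxt)
        pending.remove(nxt)
        pos = nxt
    return order

# Head at cylinder 32000: sweep up to 40000, then back down.
# elevator_schedule(32000, [8000, 40000, 16000]) -> [40000, 16000, 8000]
```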

23
13.3.5 Disk Scheduling and the Elevator
Problem (cont)
Events (the head sweeps upward from its starting point, then reverses):
  • Requests arrive for cylinders 8000, 24000, and 56000
  • Get data at 8000 (time 4.3)
  • Request arrives for 16000
  • Get data at 24000 (time 13.6)
  • Request arrives for 64000
  • Get data at 56000 (time 26.9)
  • Request arrives for 40000
  • Get data at 64000 (time 34.2)
  • Get data at 40000 (time 45.5)
  • Get data at 16000 (time 56.8)

Requests serviced (elevator order):
data      time
8000 ..   4.3
24000 ..  13.6
56000 ..  26.9
64000 ..  34.2
40000 ..  45.5
16000 ..  56.8
24
13.3.5 Disk Scheduling and the Elevator
Problem (cont)
Elevator Algorithm          FIFO Algorithm
data      time              data      time
8000 ..   4.3               8000 ..   4.3
24000 ..  13.6              24000 ..  13.6
56000 ..  26.9              56000 ..  26.9
64000 ..  34.2              16000 ..  42.2
40000 ..  45.5              64000 ..  59.5
16000 ..  56.8              40000 ..  70.8
25
13.3.6 Prefetching and Large-Scale Buffering
  • If, at the application level, we can predict the order in which blocks will be requested, we can load them into main memory before they are needed
  • This reduces cost and saves time

26
13.4 Disk Failures
  • Intermittent error – a read or write is unsuccessful, but repeated attempts succeed
  • If we try to read a sector and the correct contents are not delivered to the disk controller, the read is retried. A read operation tells us whether a sector is good or bad; to check that a write is correct, a read of the written sector is performed.
  • Checksums – each sector has some additional bits, called the checksum, that are set depending on the values of the data bits stored in that sector. With checksums, the probability of accepting a bad read is small. If the data bits hold an odd number of 1s, the parity bit is 1; if an even number, the parity bit is 0. So the total number of 1s, including the parity bit, is always even.

27
  • Example
  • 1. Sequence 01101000 → odd number of 1s
  •    parity bit 1 → 011010001
  • 2. Sequence 11101110 → even number of 1s
  •    parity bit 0 → 111011100
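The parity computation can be sketched as follows. `parity_bit` and `with_parity` are hypothetical helper names, and bit sequences are represented as strings:

```python
def parity_bit(bits):
    """Parity bit that makes the total number of 1s (data + parity) even:
    1 if the data has an odd number of 1s, else 0."""
    return bits.count("1") % 2

def with_parity(bits):
    """Append the parity bit to a bit string."""
    return bits + str(parity_bit(bits))
```

For instance, `with_parity("01101000")` returns `"011010001"`, while an 8-bit sequence with an even number of 1s such as `"11101110"` gets parity bit 0.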
  • Stable-Storage Writing Policy
  • Used to recover from the disk failure known as media decay, in which an overwritten file is not read back correctly. Sectors are paired, and each pair, called X, has a left copy Xl and a right copy Xr. A write checks the parity of what was written to Xl and Xr, retrying (possibly on a spare sector) until a good value is returned.

28
  • The term used for these strategies is RAID, or Redundant Arrays of Independent Disks
  • Mirroring
  • The mirroring scheme is referred to as RAID level 1 protection against data loss. In this scheme we mirror each disk: one disk is called the data disk and the other the redundant disk. The only way data can then be lost is if a second disk crashes while the first crash is being repaired.
  • Parity Blocks
  • The RAID level 4 scheme uses only one redundant disk no matter how many data disks there are. On the redundant disk, the ith block holds the parity checks for the ith blocks of all the data disks: for each j, the jth bits of the ith blocks of the data disks and the redundant disk must have an even number of 1s, and the redundant disk's bit is used to make this condition true.
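The modulo-2 sum underlying RAID 4 is a bytewise XOR, and the same operation recovers a crashed disk's block. A minimal sketch, with `raid4_parity` and `recover_block` as illustrative names:

```python
def raid4_parity(blocks):
    """Redundant block = bytewise modulo-2 sum (XOR) of the
    corresponding bytes of all data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def recover_block(surviving_blocks, parity_block):
    """A crashed disk's block is the XOR of the parity block with the
    corresponding blocks of all surviving data disks."""
    return raid4_parity(list(surviving_blocks) + [parity_block])
```

For example, three one-byte "blocks" 0b1011, 0b0110, 0b1100 give parity 0b0001, and XORing the parity with any two of them reproduces the third.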

29
  • Failures: If one of Xl and Xr fails, X can be read from the other; only if both fail is X unreadable, and the probability of that is very small
  • Write failure: if the power fails during a write,
  • 1. while writing Xl, then Xr remains good and X can be read from Xr
  • 2. after writing Xl, then X can be read from Xl, since Xr may or may not have the correct copy of X
  • Recovery from Disk Crashes
  • To reduce data loss from disk crashes, schemes involving redundancy, extending the idea of parity checks or duplicated sectors, can be applied

30
  • Parity Block – Writing
  • When we write a new block on a data disk, we must change the corresponding block of the redundant disk as well
  • One approach is to read the corresponding blocks of all the data disks, compute the modulo-2 sum, and write it to the redundant disk
  • This approach requires n−1 reads of data blocks, one write of a data block, and one write of the redundant-disk block
  • Total: n+1 disk I/Os
  • RAID 6
  • RAID 4 is effective in preserving data unless there are two simultaneous disk crashes
  • The theory of error-correcting codes known as Hamming codes leads to RAID level 6
  • With this strategy, two simultaneous crashes are correctable
  • For example, the bits of disk 5 can be the modulo-2 sum of the corresponding bits of disks 1, 2, and 3
  • The bits of disk 6 can be the modulo-2 sum of the corresponding bits of disks 1, 2, and 4

31
  • Whatever scheme we use for updating the disks, we
    need to read and write the redundant disk's
    block. If there are n data disks, then the number
    of disk writes to the redundant disk will be n
    times the average number of writes to any one
    data disk.
  • However we do not have to treat one disk as the
    redundant disk and the others as data disks.
    Rather, we could treat each disk as the redundant
    disk for some of the blocks. This improvement is
    often called RAID level 5.

32
13.5 Arranging data on disk
  • Data elements are represented as records, which are stored in consecutive bytes of the same disk block
  • Basic layout techniques for storing data
  • Fixed-Length Records
  • Allocation criterion – data should start at a word boundary
  • A fixed-length record header contains
  • 1. A pointer to the record schema
  • 2. The length of the record
  • 3. Timestamps indicating the last modification or last read

33
  • Example
  • CREATE TABLE employee(
  • name CHAR(30) PRIMARY KEY,
  • address VARCHAR(255),
  • gender CHAR(1),
  • birthdate DATE
  • )
  • The record starts at a word boundary and contains a header and four fields: name, address, gender, and birthdate
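The layout of such a fixed-length record can be sketched as follows. The 12-byte header, the 4-byte word size, and treating DATE as 10 bytes are illustrative assumptions; `layout` is a hypothetical helper:

```python
# Hypothetical fixed-length layout for the employee record: a 12-byte
# header (schema pointer, record length, timestamp), then each field
# padded so the next one starts at a 4-byte word boundary.
FIELDS = [("name", 30), ("address", 255), ("gender", 1), ("birthdate", 10)]

def layout(fields, header_len=12, word=4):
    """Return {field: (offset, padded_size)} and the total record length."""
    offsets, pos = {}, header_len
    for fname, size in fields:
        padded = -(-size // word) * word   # round size up to a word multiple
        offsets[fname] = (pos, padded)
        pos += padded
    return offsets, pos
```

Under these assumptions, name starts at offset 12 (30 bytes padded to 32), address at offset 44, and the whole record occupies 316 bytes.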

34
  • Packing Fixed-Length Records into Blocks
  • Records are stored in blocks on the disk and are moved into main memory when we need to access or update them
  • A block header is written first, followed by a series of records
  • The block header contains the following information
  • Links to one or more other blocks that are part of a network of blocks
  • Information about the role played by this block in such a network
  • Information about the relation to which the tuples in this block belong

35
  • A "directory" giving the offset of each record in the block
  • Timestamp(s) indicating the time of the block's last modification and/or access
  • Along with the header, we pack as many records as fit in one block, as shown in the figure, and the remaining space is left unused

36
13.6 Representing Block and Record Addresses
  • Addresses of a block and of a record
  • In main memory
  • The address of the block is the virtual-memory address of its first byte
  • The address of a record within the block is the virtual-memory address of the record's first byte
  • In secondary memory, a sequence of bytes describes the location of the block in the overall system
  • The sequence of bytes gives the block's location on the device: the device ID of the disk, the cylinder number, etc.

37
  • Addresses in Client-Server Systems
  • The addresses in an address space are represented in two ways
  • Physical addresses – byte strings that determine the place within the secondary storage system where the record can be found
  • Logical addresses – arbitrary strings of bytes of some fixed length
  • Physical-address bits are used to indicate
  • The host to which the storage is attached
  • An identifier for the disk
  • The number of the cylinder
  • The number of the track
  • The offset of the beginning of the record

38
  • Map table – relates logical addresses to physical addresses

(Figure: map table with Logical and Physical address columns)
39
  • Logical and Structured Addresses
  • What is the purpose of a logical address?
  • It gives more flexibility when we
  • Move the record around within the block
  • Move the record to another block
  • It also gives us options in deciding what to do when a record is deleted
  • Pointer Swizzling
  • Having pointers is common in object-relational database systems
  • It is important to learn about the management of pointers
  • Every data item (block, record, etc.) has two addresses
  • Its database address – the address on disk
  • Its memory address, if the item is currently in virtual memory

40
  • Translation table – maps database addresses to memory addresses
  • All addressable items in the database have entries in the map table, while only those items currently in memory are mentioned in the translation table

(Figure: translation table with Dbaddr and Mem-addr columns)
41
  • A pointer consists of the following two fields
  • A bit indicating the type of address
  • The database or memory address itself
  • Example 13.17

(Figure: Block 1 in memory holds a swizzled pointer to a record in Block 1 and an unswizzled pointer to Block 2 on disk)
42
  • Example 13.7
  • Block 1 has a record with pointers to a second record on the same block and to a record on another block
  • If Block 1 is copied to memory
  • The first pointer, which points within Block 1, can be swizzled so it points directly to the memory address of the target record
  • Since Block 2 is not in memory, we cannot swizzle the second pointer
  • Three types of swizzling
  • Automatic Swizzling
  • As soon as block is brought into memory, swizzle
    all relevant pointers.

43
  • Swizzling on Demand
  • Only swizzle a pointer if and when it is actually followed
  • No Swizzling
  • Pointers are never swizzled; items are accessed using their database addresses
  • Unswizzling
  • When a block is moved from memory back to disk, all pointers must be converted back to database (disk) addresses
  • The translation table is used again
  • It is important to have an efficient data structure for the translation table
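The translation table and swizzling can be sketched with two dictionaries. Everything here (the class name and the `("db", addr)` / `("mem", addr)` tagged-pointer representation) is a hypothetical illustration of the idea, not an actual DBMS structure:

```python
class TranslationTable:
    """Maps database (disk) addresses of memory-resident items to memory
    addresses and back, so pointers can be swizzled and unswizzled.
    A pointer is a tag plus an address: ("db", addr) or ("mem", addr)."""

    def __init__(self):
        self.db_to_mem = {}
        self.mem_to_db = {}

    def load(self, db_addr, mem_addr):
        """Record that the item at db_addr is now in memory at mem_addr."""
        self.db_to_mem[db_addr] = mem_addr
        self.mem_to_db[mem_addr] = db_addr

    def swizzle(self, ptr):
        """Turn a database pointer into a memory pointer if the target is
        in memory; otherwise leave it unswizzled."""
        tag, addr = ptr
        if tag == "db" and addr in self.db_to_mem:
            return ("mem", self.db_to_mem[addr])
        return ptr

    def unswizzle(self, ptr):
        """Convert a memory pointer back to its database address (used
        when the holding block is written back to disk)."""
        tag, addr = ptr
        if tag == "mem":
            return ("db", self.mem_to_db[addr])
        return ptr
```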

44
  • Pinned Records and Blocks
  • A block in memory is said to be pinned if it cannot be written back to disk safely
  • If block B1 has a swizzled pointer to an item in block B2, then B2 is pinned
  • To unpin a block, we must unswizzle any pointers to it
  • Keep in the translation table the places in memory holding swizzled pointers to each item
  • Unswizzle those pointers (use the translation table to replace the memory addresses with database (disk) addresses)

45
13.7 Records With Variable-Length Fields
  • A simple but effective scheme is to put all fixed-length fields ahead of the variable-length fields. We then place in the record header
  • 1. The length of the record
  • 2. Pointers to (i.e., offsets of) the beginnings of all the variable-length fields. However, if the variable-length fields always appear in the same order, then the first of them needs no pointer; we know it immediately follows the fixed-length fields.

46
  • Records With Repeating Fields
  • A similar situation occurs if a record contains a variable number of occurrences of a field F, but the field itself is of fixed length. It is sufficient to group all occurrences of field F together and put in the record header a pointer to the first.
  • We can locate all the occurrences of field F as follows. Let the number of bytes devoted to one instance of field F be L. We then add to the offset for field F all integer multiples of L, starting at 0, then L, 2L, 3L, and so on. Eventually we reach the offset of the field following F, whereupon we stop.
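The offset arithmetic above can be written out directly (`repeating_field_offsets` is a hypothetical helper):

```python
def repeating_field_offsets(f_offset, instance_len, next_field_offset):
    """Offsets of every occurrence of repeating field F: start at the
    offset stored in the record header and step by L (the per-instance
    length) until the offset of the field following F is reached."""
    return list(range(f_offset, next_field_offset, instance_len))

# If F starts at byte 40, each instance is L = 8 bytes, and the next
# field begins at byte 64, the occurrences sit at offsets 40, 48, 56.
```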

47
  • An alternative representation is to keep the
    record of fixed length, and put the variable
    length portion - be it fields of variable length
    or fields that repeat an indefinite number of
    times - on a separate block. In the record itself
    we keep
  • 1. Pointers to the place where each repeating
    field begins, and
  • 2. Either how many repetitions there are, or
    where the repetitions end.

48
  • Variable-Format Records
  • The simplest representation of variable-format
    records is a sequence of tagged fields, each of
    which consists of
  • 1. Information about the role of this field, such
    as
  • (a) The attribute or field name,
  • (b) The type of the field, if it is not
    apparent from the field name and some readily
    available schema information, and
  • (c) The length of the field, if it is not
    apparent from the type.
  • 2. The value of the field.
  • There are at least two reasons why tagged fields
    would make sense.

49
  • 1. Information-integration applications – Sometimes a relation has been constructed from several earlier sources, and these sources record different kinds of information. For instance, our movie-star information may have come from several sources, one of which records birthdates, some give addresses, others do not, and so on. If there are not too many fields, we are probably best off leaving NULL those values we do not know.
  • 2. Records with a very flexible schema – If many fields of a record can repeat and/or not appear at all, then even if we know the schema, tagged fields may be useful. For instance, medical records may contain information about many tests, but there are thousands of possible tests, and each patient has results for relatively few of them.

50
  • These large values have a variable length, but even if the length is fixed for all values of the type, we need special techniques to represent them. In this section we consider a technique called "spanned records" that can be used to manage records larger than blocks.
  • Spanned records are also useful when records are smaller than blocks, but packing whole records into blocks wastes significant amounts of space.
  • For both these reasons it is sometimes desirable to allow records to be split across two or more blocks. The portion of a record that appears in one block is called a record fragment.
  • If records can be spanned, then every record and record fragment requires some extra header information

51
  • 1. Each record or fragment header must contain a
    bit telling whether or not it is a fragment.
  • 2. If it is a fragment, then it needs bits
    telling whether it is the first or last fragment
    for its record.
  • 3. If there is a next and/or previous fragment
    for the same record, then the fragment needs
    pointers to these other fragments.
  • Storing spanned records across blocks

52
  • BLOBs
  • Binary Large OBjects (BLOBs)
  • BLOBs can be images, movies, audio files, and other very large values that can be stored in files
  • Storing BLOBs
  • Stored in several blocks
  • Preferably stored consecutively on a cylinder, or striped across multiple disks, for efficient retrieval
  • Retrieving BLOBs
  • A client retrieving a 2-hour movie may not want it all at once
  • Retrieving a specific part of the large data requires an index structure to make it efficient (example: an index by seconds on a movie BLOB)

53
  • Column Stores
  • An alternative to storing tuples as records is to store each column as a record. Since an entire column of a relation may occupy far more than a single block, these records may span many blocks, much as long files do. If we keep the values in each column in the same order, we can reconstruct the relation from the column records.
54
13.8 Record Modifications
  • Insertion
  • Insertion of records without order
  • Records can be placed in any block with empty space, or in a new block
  • Insertion of records in a fixed order
  • Either space is available in the block, or no space is available in the block (handled outside the block)
  • Structured address
  • A pointer to a record from outside the block
  • Insertion in fixed order with space available within the block
  • Use an offset table in the header of each block, with pointers to the location of each record in the block

55
  • The records are slid within the block and the pointers in the offset table are adjusted
  • If no space is available within the block (handled outside the block)
  • Find space on a nearby block
  • If no space is available on that block, look at the following block in the sorted order of blocks
  • If space is available there, move the highest records of block 1 to block 2 and slide the records around on both blocks
  • Create an overflow block
  • Records can be stored in an overflow block
  • Each block has a place in its header for a pointer to an overflow block; the overflow block can point to a second overflow block, and so on

56
  • Deletion
  • Recovering space after deletion
  • When using an offset table, the records can be slid around the block so there is one unused region, in the center, that can be reclaimed
  • If we cannot slide records, an available-space list can be maintained in the block header
  • The list head goes in the block header and the available regions themselves hold the links of the list
  • Use of tombstones
  • A tombstone replaces the deleted record, so that pointers to the deleted record do not end up pointing to new records
  • The tombstone is permanent until the entire database is reconstructed

57
  • If pointers go to fixed locations from which the location of the record is found, then we put the tombstone in that fixed location (see examples)
  • Where a tombstone is placed depends on the nature of the record pointers
  • A map table is used to translate a logical record address to a physical address
  • Updating Records
  • For fixed-length records, an update has no effect on the storage system
  • For variable-length records
  • If the length increases, slide the records, as for insertion
  • If the length decreases, update the space-available list and recover the space / eliminate the overflow blocks, as for deletion

58
Query Execution
Chapter 15, Section 15.1
Presented by Khadke, Suvarna
CS 257 (Section II), ID 213
59
Agenda
  • What is query processing?
  • Query Processor and major parts of Query
    processor
  • Physical-Query-Plan Operators
  • Scanning Tables
  • Basic approaches to locate the tuples of a
    relation R
  • Sorting While Scanning Tables
  • Computation Model for Physical Operator
  • I/O Cost for Scan Operators
  • Iterators

60
What is query processing?
A given SQL query is translated by the query processor into a low-level execution plan.
  • An execution plan is a program in a functional language: the physical relational algebra, specific to each DBMS
  • The physical relational algebra extends the relational algebra with primitives to search through the internal data structures of the DBMS
61
What is a Query Processor?
  • A group of components of a DBMS that converts user queries and data-modification commands into a sequence of database operations
  • It also executes those operations
  • It must supply detail regarding how the query is to be executed

62
Major parts of the Query Processor
  • Query Execution
  • The algorithms that manipulate the data of the database
  • Focus is on the operations of the extended relational algebra

63
Query Processing Steps
SQL Query
  → PARSER (parsing and semantic checking, as in any compiler)
Parse tree (a tree structure representing a relational calculus expression)
  → OPTIMIZER (very advanced)
Execution plan (an annotated relational algebra expression)
  → EXECUTOR (execution plan interpreter)
DBMS kernel / data structures
64
Outline of Query Compilation
  • Query compilation
  • Parsing – a parse tree for the query is constructed
  • Query rewrite – the parse tree is converted to an initial query plan and transformed into a logical query plan (less time)
  • Physical plan generation – the logical query plan is converted into a physical query plan by selecting algorithms and an order of execution for these operators

65
Basic Steps in Query Processing
66
Physical-Query-Plan Operators
  • Physical operators are implementations of the operators of relational algebra
  • There are also physical operators for other tasks, such as scan, which scans a table, that is, brings each tuple of some relation into main memory

67
Basic Steps in Query Processing
  • 1. Parsing and translation
  • 2. Optimization
  • 3. Evaluation

68
Basic Steps in Query Processing (cont.)
  • Parsing and translation
  • Translate the query into its internal form; this is then translated into relational algebra
  • The parser checks syntax and verifies relations
  • Evaluation
  • The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query

69
Scanning Tables
  • One of the basic things we can do in a physical query plan is read the entire contents of a relation R
  • A variation of this operator involves a simple predicate: read only those tuples of relation R that satisfy the predicate

70
Scanning Tables
  • Basic approaches to locate the tuples of a relation R
  • Table-scan
  • Relation R is stored in secondary memory with its tuples arranged in blocks
  • The blocks can be retrieved one by one
  • Index-scan
  • If there is an index on any attribute of relation R, we can use the index to get all the tuples of R

71
Sorting While Scanning Tables
  • There are a number of reasons to sort a relation
  • The query could include an ORDER BY clause, requiring that a relation be sorted
  • Algorithms that implement relational-algebra operations may require one or both arguments to be sorted relations
  • The physical-query-plan operator sort-scan takes a relation R and the attributes on which the sort is to be made, and produces R in that sorted order

72
Computation Model for Physical Operators
  • Physical-plan operators must be selected wisely; this is essential for a good query processor
  • The cost of each operator is estimated by the number of disk I/Os for the operation
  • The total cost of an operation depends on the size of the answer and includes the final write-back cost in the total cost of the query

73
Parameters for Measuring Costs
  • Parameters that affect the performance of a query
  • Buffer-space availability in main memory at the time the query executes
  • The size of the input and the size of the output generated
  • The size of a block on disk and the size of main memory also affect performance

74
Parameters for Measuring Costs
  • B – the number of blocks needed to hold all the tuples of relation R
  • Also denoted B(R)
  • T – the number of tuples in relation R
  • Also denoted T(R)
  • V – the number of distinct values that appear in a column of relation R
  • V(R, a) is the number of distinct values of column a in relation R

75
I/O Cost for Scan Operators
  • If relation R is clustered, then the number of disk I/Os for the table-scan operator is approximately B disk I/Os
  • If relation R is not clustered, then the number of required disk I/Os is generally much higher
  • An index on relation R occupies many fewer than B(R) blocks
  • That means a scan of the entire relation R, which takes at least B disk I/Os, requires more I/Os than reading the entire index

76
Iterators for Implementation of Physical Operators
  • Many physical operators can be implemented as an
    Iterator.
  • Three methods forming the iterator for an
    operation are
  • 1. Open( )
  • This method starts the process of getting tuples
  • It initializes any data structures needed to
    perform the operation

76
77
Iterators for Implementation of Physical Operators
  • 2. GetNext( )
  • Returns the next tuple in the result
  • If there are no more tuples to return, GetNext
    returns a special value NotFound
  • 3. Close( )
  • Ends the iteration after all tuples have been
    returned
  • It calls Close on any arguments of the operator
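The three-method interface can be sketched in Python. This is a minimal illustration, not the book's code: TableScan is a hypothetical operator over a relation stored as a list of blocks, and NotFound is modeled as a sentinel object.

```python
NotFound = object()  # sentinel returned when the iteration is exhausted

class TableScan:
    """Iterator over the tuples of a relation stored as a list of blocks."""

    def __init__(self, blocks):
        self.blocks = blocks          # e.g. [[t1, t2], [t3, t4], ...]
        self.block_idx = 0
        self.tuple_idx = 0

    def open(self):
        # Initialize the data structures needed to start producing tuples.
        self.block_idx = 0
        self.tuple_idx = 0

    def get_next(self):
        # Return the next tuple, or NotFound when the relation is exhausted.
        while self.block_idx < len(self.blocks):
            block = self.blocks[self.block_idx]
            if self.tuple_idx < len(block):
                t = block[self.tuple_idx]
                self.tuple_idx += 1
                return t
            self.block_idx += 1       # advance to the next block
            self.tuple_idx = 0
        return NotFound

    def close(self):
        pass                          # nothing to release in this sketch
```

A consumer repeatedly calls get_next until it receives NotFound, then calls close.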

77
78
Reference
  • GARCIA-MOLINA, H., ULLMAN, J. D., and WIDOM, J.,
    Database Systems: The Complete Book, 2nd Edition, 2008.

78
79
Thank You
79
80
Query Execution
One-Pass Algorithms for Database Operations
(15.2)
Presented by Ronak Shah (214), April 22, 2009
80
81
Introduction
  • The choice of an algorithm for each operator is
    an essential part of the process of transforming
    a logical query plan into a physical query plan.
  • Main classes of Algorithms
  • Sorting-based methods
  • Hash-based methods
  • Index-based methods
  • Division based on degree of difficulty and cost
  • 1-pass algorithms
  • 2-pass algorithms
  • 3 or more pass algorithms

81
82
Categorizing Algorithms
  • By general technique
  • sorting-based
  • hash-based
  • index-based
  • By the number of times data is read from disk
  • one-pass
  • two-pass
  • multi-pass (more than 2)
  • By what the operators work on
  • tuple-at-a-time, unary
  • full-relation, unary
  • full-relation, binary

83
One-Pass Algorithm Methods
  • Tuple-at-a-time, unary operations (selection,
    projection)
  • Full-relation, unary operations
  • Full-relation, binary operations (set and bag
    versions of union, etc.)

83
84
One-Pass, Tuple-at-a-Time
  • These are for SELECT and PROJECT
  • Algorithm
  • read the blocks of R sequentially into an input
    buffer
  • perform the operation
  • move the selected/projected tuples to an output
    buffer
  • Requires only M ≥ 1 buffer
  • I/O cost is that of a scan (either B or T,
    depending on whether R is clustered or not)
  • Exception! Selecting tuples that satisfy some
    condition on an indexed attribute can be done
    faster!

85
One-Pass, Unary, Full-Relation
  • duplicate elimination (δ)
  • Algorithm:
  • keep a main-memory search data structure D (a
    search tree or hash table) to store one copy of
    each tuple
  • read in each block of R one at a time (use scan)
  • for each tuple, check if it appears in D
  • if not, then add it to D and to the output buffer
  • Requires 1 buffer to hold the current block of R;
    the remaining M-1 buffers must be able to hold D
  • I/O cost is just that of the scan

87
One-Pass Algorithms for Tuple-at-a-Time Operations
  • Tuple-at-a-time operations are selection and
    projection
  • read the blocks of R one at a time into an input
    buffer
  • perform the operation on each tuple
  • move the selected tuples or the projected tuples
    to the output buffer
  • The disk I/O requirement for this process depends
    only on how the argument relation R is provided.
  • If R is initially on disk, then the cost is
    whatever it takes to perform a table-scan or
    index-scan of R.

87
88
A selection or projection being performed on a
relation R
88
89
One-Pass Algorithms for Unary, Full-Relation
Operations
  • Duplicate Elimination
  • To eliminate duplicates, we can read each block
    of R one at a time, but for each tuple we need to
    make a decision as to whether
  • It is the first time we have seen this tuple, in
    which case we copy it to the output, or
  • We have seen the tuple before, in which case we
    must not output this tuple.
  • One memory buffer holds one block of R's tuples,
    and the remaining M - 1 buffers can be used to
    hold a single copy of every tuple.

89
90
Managing memory for a one-pass duplicate-elimination
90
91
Duplicate Elimination
  • When a new tuple from R is considered, we compare
    it with all tuples seen so far
  • if it is not equal to any of them, we copy it to
    the output and add it to the in-memory list of
    tuples we have seen.
  • if there are n tuples in main memory, each new
    tuple takes processor time proportional to n, so
    the complete operation takes processor time
    proportional to n².
  • We need a main-memory structure that allows each
    of these operations:
  • Add a new tuple, and
  • Tell whether a given tuple is already there

91
92
Duplicate Elimination (contd.)
  • The different structures that can be used for
    such main memory structures are
  • Hash table
  • Balanced binary search tree
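As a sketch, the algorithm above maps naturally onto Python's built-in set, which stands in for the hash-table structure D; reading "one block at a time" is modeled as iterating over a list of blocks. The function name and block layout are illustrative, not the book's.

```python
def one_pass_distinct(blocks):
    """Yield each distinct tuple of R exactly once (the delta operator)."""
    seen = set()                      # the in-memory search structure D
    for block in blocks:              # read R one block at a time
        for t in block:
            if t not in seen:         # first time we have seen this tuple
                seen.add(t)
                yield t               # copy it to the output
            # otherwise: seen before, do not output it again
```

Membership tests and inserts on a hash set take near-constant time, avoiding the n² behavior of a linear scan over the tuples seen so far.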

92
93
One-Pass Algorithms for Unary, Full-Relation
Operations
  • Grouping
  • The grouping operation gives us zero or more
    grouping attributes and presumably one or more
    aggregated attributes
  • If we create in main memory one entry for each
    group then we can scan the tuples of R, one block
    at a time.
  • The entry for a group consists of values for the
    grouping attributes and an accumulated value or
    values for each aggregation.

93
94
Grouping
  • The accumulated value is
  • For MIN(a) or MAX(a) aggregate, record minimum
    /maximum value, respectively.
  • For any COUNT aggregation, add 1 for each tuple
    of group.
  • For SUM(a), add value of attribute a to the
    accumulated sum for its group.
  • AVG(a) is a hard case. We must maintain two
    accumulations: the count of the number of tuples
    in the group, and the sum of the a-values of
    these tuples. Each is computed as we would for a
    COUNT or SUM aggregation, respectively. After all
    tuples of R are seen, take the quotient sum/count
    to obtain the average.
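A minimal sketch of these accumulators, assuming tuples are (group_key, value) pairs (a layout chosen for illustration, not from the book): per group we keep a COUNT and a SUM, and AVG is their quotient once all tuples have been seen.

```python
def one_pass_group_avg(blocks):
    """Return {group_key: average of values} computed in a single scan."""
    acc = {}                                  # group -> [count, sum]
    for block in blocks:                      # scan R one block at a time
        for key, value in block:
            count_sum = acc.setdefault(key, [0, 0])
            count_sum[0] += 1                 # COUNT: add 1 per tuple
            count_sum[1] += value             # SUM: add the a-value
    # After all tuples of R are seen, take the quotient sum/count per group.
    return {key: s / c for key, (c, s) in acc.items()}
```

MIN or MAX would be handled the same way, with a single accumulated value per group updated by comparison instead of addition.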

94
95
One-Pass Algorithms for Binary Operations
  • Binary operations include
  • Union
  • Intersection
  • Difference
  • Product
  • Join

95
96
Set Union
  • We read S into M - 1 buffers of main memory and
    build a search structure where the search key is
    the entire tuple.
  • All these tuples are also copied to the output.
  • Read each block of R into the Mth buffer, one at
    a time.
  • For each tuple t of R, see if t is in S, and if
    not, we copy t to the output. If t is also in S,
    we skip t.
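The steps above can be sketched in Python, with a built-in set standing in for the M - 1 buffers plus the search structure over S (the function name is illustrative):

```python
def one_pass_set_union(r_blocks, s_tuples):
    """Yield the set union of R and S, with S the smaller relation."""
    in_memory = set(s_tuples)        # build the search structure over S
    for t in in_memory:              # all tuples of S are copied to output
        yield t
    for block in r_blocks:           # read R into the one remaining buffer
        for t in block:
            if t not in in_memory:   # copy t only if it is not in S
                yield t
```

Set intersection and set difference follow the same shape, differing only in the test applied to each tuple t of R.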

96
97
Set Intersection
  • Read S into M - 1 buffers and build a search
    structure with full tuples as the search key.
  • Read each block of R, and for each tuple t of R,
    see if t is also in S. If so, copy t to the
    output, and if not, ignore t.

97
98
Set Difference
  • Read S into M - 1 buffers and build a search
    structure with full tuples as the search key. 
  • To compute R -S S, read each block of R and
    examine each tuple t on that block. If t is in S,
    then ignore t; if it is not in S, then copy t to
    the output.
  • To compute S -S R, read the blocks of R and
    examine each tuple t in turn. If t is in S, then
    delete t from the copy of S in main memory; if t
    is not in S, do nothing.
  • After considering each tuple of R, copy to the
    output those tuples of S that remain.

98
99
Bag Intersection
  • Read S into M - 1 buffers.
  • Multiple copies of a tuple t are not stored
    individually. Rather, store one copy of t and
    associate with it a count equal to the number of
    times t occurs.
  • Next, read each block of R; for each tuple t of
    R, see whether t occurs in S. If not, ignore t;
    it cannot appear in the intersection. If t
    appears in S and the count associated with t is
    still positive, then output t and decrement the
    count by 1. If t appears in S but the count has
    reached 0, then do not output t; we have already
    produced as many copies of t in the output as
    there were copies in S.

99
100
Bag Difference
  • To compute S -B R, read the tuples of S into main
    memory and count the number of occurrences of
    each distinct tuple.
  • Then read R and check each tuple t to see whether
    t occurs in S; if so, decrement its associated
    count. At the end, copy to the output each tuple
    in main memory whose count is positive; the
    number of times we copy it equals that count.
  • To compute R -B S, read the tuples of S into main
    memory and count the number of occurrences of
    distinct tuples.

100
101
Bag Difference (contd.)?
  • Think of a tuple t with a count of c as c reasons
    not to copy t to the output as we read tuples of
    R.
  • Read a tuple t of R and check whether t occurs in
    S. If not, then copy t to the output. If t does
    occur in S, then look at the current count c
    associated with t. If c = 0, then copy t to the
    output. If c > 0, do not copy t to the output,
    but decrement c by 1.
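The "c reasons not to copy t" bookkeeping above can be sketched with Python's collections.Counter (the function name is illustrative):

```python
from collections import Counter

def bag_difference(r_blocks, s_tuples):
    """Yield the bag difference R -B S: each copy in S cancels one copy in R."""
    reasons = Counter(s_tuples)       # count of each tuple of S
    for block in r_blocks:            # stream R one block at a time
        for t in block:
            if reasons[t] > 0:        # c > 0: suppress t, decrement c
                reasons[t] -= 1
            else:                     # c = 0 (or t not in S): copy t to output
                yield t
```

Counter conveniently returns 0 for tuples that never appeared in S, so the "t does not occur in S" case needs no separate branch.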

101
102
Product
  • Read S into M - 1 buffers of main memory
  • Then read each block of R, and for each tuple t
    of R concatenate t with each tuple of S in main
    memory.
  • Output each concatenated tuple as it is formed.
  • This algorithm may take a considerable amount of
    processor time per tuple of R, because each such
    tuple must be matched with M - 1 blocks full of
    tuples. However, the output size is also large,
    so the time per output tuple is small.

102
103
Natural Join
  • Convention R(X, Y) is being joined with S(Y, Z),
    where Y represents all the attributes that R and
    S have in common, X is all attributes of R that
    are not in the schema of S, Z is all attributes
    of S that are not in the schema of R. Assume that
    S is the smaller relation.
  • To compute the natural join, do the following 
  • Read all tuples of S form them into a
    main-memory search structure.
  • A hash table or a balanced tree are good examples
    of such structures. Use M - 1 blocks of memory
    for this purpose.

103
104
Natural Join
  • Read each block of R into 1 remaining main-memory
    buffer.
  • For each tuple t of R, find tuples of S that
    agree with t on all attributes of Y, using the
    search structure.
  • For each matching tuple of S, form a tuple by
    joining it with t, move resulting tuple to
    output.
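The whole procedure can be sketched with a Python dict keyed on the Y-attributes serving as the main-memory search structure over S; representing tuples as dicts is an illustrative layout, not the book's.

```python
def one_pass_natural_join(r_blocks, s_tuples, y_attrs):
    """Yield the natural join of R and S on the shared attributes y_attrs."""
    index = {}                                     # Y-value -> tuples of S
    for s in s_tuples:                             # read all of S into memory
        key = tuple(s[a] for a in y_attrs)
        index.setdefault(key, []).append(s)
    for block in r_blocks:                         # one block of R at a time
        for r in block:
            key = tuple(r[a] for a in y_attrs)
            for s in index.get(key, []):           # S-tuples agreeing on Y
                yield {**r, **s}                   # join r with the match
```

This is the classic in-memory hash join: S (the smaller relation) is indexed once, and each tuple of R probes the index.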

104
105
Thank you
105
106
QUERY EXECUTION
  • 15.3
  • Nested-Loop Joins

By Saloni Tamotia (215)
107
Introduction to Nested-Loop Joins
  • Can be used for relations of any size.
  • It is not necessary that either relation fits in
    main memory.
  • Uses a one-and-a-half pass method, in which for
    each variation:
  • One argument is read just once.
  • The other argument is read repeatedly.
  • Two kinds
  • Tuple-Based Nested Loop Join
  • Block-Based Nested Loop Join

108
ADVANTAGES OF NESTED-LOOP JOIN
  • Fits in the iterator framework.
  • Allows us to avoid storing intermediate relation
    on disk.

109
Tuple-Based Nested-Loop Join
  • Simplest variation of the nested-loop join
  • Loop ranges over individual tuples

110
Tuple-Based Nested-Loop Join
  • Algorithm to compute the join R(X,Y) ⋈ S(Y,Z):
  • FOR each tuple s in S DO
  • FOR each tuple r in R DO
  • IF r and s join to make tuple t THEN
  • output t
  • R and S are two relations with r and s as tuples.
  • Carelessness in buffering of blocks can cause the
    use of T(R)·T(S) disk I/Os

111
IMPROVEMENTS AND MODIFICATIONS
  • To decrease the cost:
  • Method 1: Use an algorithm for index-based joins
  • We find the tuples of R that match a given tuple
    of S
  • We need not read the entire relation R
  • Method 2: Use an algorithm for block-based joins
  • Tuples of R and S are divided into blocks
  • Uses enough memory to store blocks in order to
    reduce the number of disk I/Os

112
An Iterator for Tuple-Based Nested-Loop
Join
  • Open() {
  •     R.Open();
  •     S.Open();
  •     s := S.GetNext();
  • }
  • GetNext() {
  •     REPEAT {
  •         r := R.GetNext();
  •         IF (r == NotFound) { /* R is exhausted for
  •             the current s */
  •             R.Close();
  •             s := S.GetNext();
  •             IF (s == NotFound) RETURN NotFound;
  •             /* both R and S are exhausted */
  •             R.Open();
  •             r := R.GetNext();
  •         }
  •     } UNTIL (r and s join);
  •     RETURN the join of r and s;
  • }
  • Close() {
  •     R.Close(); S.Close();
  • }
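Stripped of the Open/GetNext/Close plumbing, the same loop can be sketched as a Python generator. join_pred and combine are illustrative parameters (any join condition and tuple constructor), not part of the book's formulation.

```python
def tuple_nested_loop_join(r_tuples, s_tuples, join_pred, combine):
    """Yield combine(r, s) for every pair of tuples with join_pred(r, s) true."""
    for s in s_tuples:            # outer loop over the tuples of S
        for r in r_tuples:        # R is scanned afresh for each tuple s
            if join_pred(r, s):
                yield combine(r, s)
```

For a natural join, join_pred would compare the shared Y-attributes and combine would concatenate the X, Y, and Z components.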

113
Block-Based Nested-Loop Join Algorithm
  • Access to arguments is organized by block.
  • While reading the tuples of the inner relation,
    we use fewer disk I/Os.
  • Enough space in main memory is used to store the
    tuples of the outer-loop relation.
  • This allows each tuple of the inner relation to
    be joined with as many tuples as possible.

114
  • FOR each chunk of M-1 blocks of S DO BEGIN
  • read these blocks into main-memory buffers
  • organize their tuples into a search structure
    whose
  • search key is the common attributes of R and S
  • FOR each block b of R DO BEGIN
  • read b into main memory
  • FOR each tuple t of b DO BEGIN
  • find the tuples of S in main memory that
  • join with t
  • output the join of t with each of these tuples
  • END
  • END
  • END

115
Block-Based Nested-Loop Join Algorithm
  • ALGORITHM
  • FOR each chunk of M-1 blocks of S DO
  • FOR each block b of R DO
  • FOR each tuple t of b DO
  • find the tuples of S in memory that join
    with t
  • output the join of t with each of these
    tuples

116
Block-Based Nested-Loop Join Algorithm
  • Assumptions:
  • B(S) ≤ B(R)
  • B(S) > M
  • This means that neither relation fits entirely in
    main memory.

117
Analysis of Nested-Loop Join
  • Number of disk I/Os:
  • ⌈B(S)/(M-1)⌉ · (M-1 + B(R))
  • or
  • B(S) + B(S)·B(R)/(M-1)
  • or approximately B(S)·B(R)/M
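The first formula can be checked with a small helper; the numbers below are made up for illustration.

```python
from math import ceil

def nested_loop_cost(b_s, b_r, m):
    """Disk I/Os for the block-based nested-loop join: each chunk of M-1
    blocks of S is read once, followed by all B(R) blocks of R, and this
    repeats ceil(B(S)/(M-1)) times."""
    chunks = ceil(b_s / (m - 1))
    return chunks * ((m - 1) + b_r)

# With B(S)=100, B(R)=1000, M=101: one chunk, so 100 + 1000 = 1100 I/Os.
# With the same relations but M=11: 10 chunks, so 10 * 1010 = 10100 I/Os.
```

The example shows how strongly the cost depends on M, which motivates giving joins as much memory as possible.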

118
QUESTIONS
119
Two-Pass Algorithms Based on Sorting
  • SECTION 15.4
  • Rupinder Singh

120
Two-Pass Algorithms Based on Sorting
  • Two-pass algorithms: data from the operand
    relations is read into main memory, processed,
    written out to disk, and then re-read from disk
    to complete the operation.
  • In this section, we consider sorting as a tool
    for implementing relational operations. The basic
    idea is as follows: if we have a large relation
    R, where B(R) is larger than M, the number of
    memory buffers we have available, then we can
    repeatedly fill the buffers with blocks of R,
    sort them, and write the sorted sublists back to
    disk.
121
Basic idea
  • Step 1: Read M blocks of R into main memory.
  • Step 2: Sort these M blocks in main memory, using
    an efficient main-memory sorting algorithm, so we
    expect that the time to sort will not exceed the
    disk I/O time for step (1).
  • Step 3: Write the sorted list into M blocks of
    disk.
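Both passes can be sketched in Python, with a list of blocks standing in for disk, phase 1 producing the sorted runs, and heapq.merge performing the k-way merge (using the same 17 integers as the example on the slides that follow). The function name is illustrative.

```python
import heapq

def two_phase_sort(blocks, m):
    """Return all tuples in sorted order, given M in-memory block buffers."""
    runs = []
    for i in range(0, len(blocks), m):              # phase 1: fill M buffers,
        chunk = [t for b in blocks[i:i + m] for t in b]
        runs.append(sorted(chunk))                  # sort, "write" a run out
    return list(heapq.merge(*runs))                 # phase 2: k-way merge
```

With M = 3 and two tuples per block, the 17-tuple relation from the example yields three sorted runs that the merge phase combines with one buffer each.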

122
Duplicate Elimination Using Sorting: δ(R)
  • To perform the δ(R) operation in two passes, we
    sort the tuples of R into sorted sublists. Then
    we use the available memory to hold one block
    from each sorted sublist, and repeatedly copy one
    tuple to the output while ignoring all tuples
    identical to it.
  • The total cost of this algorithm is 3B(R) disk
    I/Os.
  • This algorithm requires only √B(R) blocks of main
    memory, rather than the B(R) blocks of the
    one-pass algorithm.

123
Example
  • Suppose that tuples are integers, and only two
    tuples fit on a block. Also, M = 3 and the
    relation R consists of 17 tuples:
  • 2,5,2,1,2,2,4,5,4,3,4,2,1,5,2,1,3
  • After the first pass:

Sublist   Elements
R1        1,2,2,2,2,5
R2        2,3,4,4,4,5
R3        1,1,2,3,5
124
Example
  • Second pass:

Sublist   In memory   Waiting on disk
R1        1,2         2,2,2,5
R2        2,3         4,4,4,5
R3        1,1         2,3,5

  • After processing tuple 1:
  • Output: 1

Sublist   In memory   Waiting on disk
R1        2           2,2,2,5
R2        2,3         4,4,4,5
R3        2,3         5

  • Continue the same process with the next tuple.
125
Grouping and Aggregation Using Sorting: γ(R)
  • The two-pass algorithm for grouping and
    aggregation is quite similar to the previous
    algorithm.
  • Step 1: Read the tuples of R into memory, M
    blocks at a time. Sort each group of M blocks,
    using the grouping attributes of L as the sort
    key. Write each sorted sublist to disk.
  • Step 2: Use one main-memory buffer for each
    sublist, and initially load the first block of
    each sublist into its buffer.
  • Step 3: Repeatedly find the least value of the
    sort key (grouping attributes) present among the
    first available tuples in the buffers.
  • This algorithm takes 3B(R) disk I/Os, and will
    work as long as B(R) ≤ M².

126
A Sort-Based Union Algorithm
  • For bag union, the one-pass algorithm is used.
  • For set union:
  • Step 1: Repeatedly bring M blocks of R into main
    memory, sort their tuples, and write the
    resulting sorted sublist back to disk.
  • Step 2: Do the same for S, to create sorted
    sublists for relation S.
  • Step 3: Use one main-memory buffer for each
    sublist of R and S. Initialize each with the
    first block from the corresponding sublist.
  • Step 4: Repeatedly find the first remaining tuple
    t among all the buffers. Copy t to the output,
    and remove from the buffers all copies of t (if R
    and S are sets, there should be at most two
    copies).
  • This algorithm takes 3(B(R) + B(S)) disk I/Os,
    and will work as long as B(R) + B(S) ≤ M².

127
Sort-Based Intersection and Difference
  • For both the set version and the bag version, the
    algorithm is the same as that of set union,
    except for the way we handle the copies of a
    tuple t at the fronts of the sorted sublists.
  • For set intersection, output t if it appears in
    both R and S.
  • For bag intersection, output t the minimum of the
    number of times it appears in R and in S.
  • For set difference, R-S, output t if and only if
    it appears in R but not in S.
  • For bag difference, R-S, output t the number of
    times it appears in R minus the number of times
    it appears in S.

128
A Simple Sort-Based Join Algorithm
  • Given relation R(x,y) and S(y,z) to join, and
    given M blocks of main memory for buffers,
  • 1. Sort R, using a two phase, multiway merge
    sort, with y as the sort key.
  • 2. Sort S similarly
  • 3. Merge the sorted R and S. Generally we use
    only two buffers, one for the current block of R
    and the other for current block of S. The
    following steps are done repeatedly.
  • a. Find the least value y of the join attributes
    Y that is currently at the front of the blocks
    for R and S.
  • b. If y doesn't appear at the front of the other
    relation, then remove the tuples with sort key y.
  • c. Otherwise identify all the tuples from both
    relation having sort key y
  • d. Output all tuples that can be formed by
    joining tuples from R and S with a common Y value
    y.
  • e. If either relation has no more unconsidered
    tuples in main memory reload buffer for the
    relation.

129
A More Efficient Sort-Based Join
  • If we do not have to worry about very large
    numbers of tuples with a common value for the
    join attribute(s), then we can save two disk
    I/Os per block by combining the second phase of
    the sorts with the join itself.
  • To compute R(X, Y) ⋈ S(Y, Z) using M main-memory
    buffers:
  • Create sorted sublists of size M, using Y as the
    sort key, for both R and S.
  • Bring the first block of each sublist into a
    buffer
  • Repeatedly find the least Y-value y among the
    first available tuples of all the sublists.
    Identify all the tuples of both relations that
    have Y-value y. Output the join of all tuples
    from R with all tuples from S that share this
    common Y-value
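The merge step above can be sketched over two already-sorted inputs, with the Y-value taken as element [0] of each tuple (an illustrative layout; buffer management is elided, so this shows only the matching logic).

```python
def merge_join(r_sorted, s_sorted):
    """Yield (y, r_rest, s_rest) for each joinable pair; inputs sorted on y."""
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        ry, sy = r_sorted[i][0], s_sorted[j][0]
        if ry < sy:
            i += 1                       # y absent from S: skip R's tuples
        elif ry > sy:
            j += 1                       # y absent from R: skip S's tuples
        else:
            # identify all tuples of both relations with this Y-value
            i2 = i
            while i2 < len(r_sorted) and r_sorted[i2][0] == ry:
                i2 += 1
            j2 = j
            while j2 < len(s_sorted) and s_sorted[j2][0] == ry:
                j2 += 1
            for r in r_sorted[i:i2]:     # output the join of every pair
                for s in s_sorted[j:j2]:
                    yield (ry, r[1], s[1])
            i, j = i2, j2
```

Since each input is consumed front to back exactly once, the merge itself costs B(R) + B(S) I/Os, giving the 3(B(R) + B(S)) total when the sublist-creation pass is included.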

130
A More Efficient Sort-Based Join
  • The number of disk I/Os is 3(B(R) + B(S))
  • It requires B(R) + B(S) ≤ M² to work

131
Summary of Sort-Based Algorithms
Operators            Approximate M required   Disk I/O
γ, δ                 √B                       3B
∪, ∩, −              √(B(R) + B(S))           3(B(R) + B(S))
⋈                    √(max(B(R), B(S)))       5(B(R) + B(S))
⋈ (more efficient)   √(B(R) + B(S))           3(B(R) + B(S))