Physical Database - PowerPoint PPT Presentation

About This Presentation
Title:

Physical Database

Description:

Title: Physical Database Author: thu Last modified by: thu Created Date: 4/11/2003 12:18:38 AM Document presentation format: On-screen Show (4:3) Company – PowerPoint PPT presentation

Number of Views:64
Avg rating:3.0/5.0
Slides: 86
Provided by: Thu78
Learn more at: https://cci.drexel.edu
Category:

less

Transcript and Presenter's Notes

Title: Physical Database


1
Physical Database
2
Physical Design Activities
  • Minimize the number of disk accesses
  • Minimize the amount of space used to store the
    data
  • Provide for data integrity and protect the data
    from loss

3
Disk Storage Devices
  • Disk units are referred to as direct access
    storage device or DASD ( daz_dee)
  • Offer high storage capacity and low cost
  • Data stored as magnetized areas on magnetic disk
    surfaces.
  • A disk pack contains several magnetic disks
    connected to a rotating spindle.
  • For disk packs, the tracks with the same
    diameter on the various surfaces are called a
    cylinder. (a set of tracks that can be accessed
    w/o moving arms)

4
Cont
  • Disks are divided into concentric circular tracks
    on each disk surface. Track capacity vary
    typically from 4 to 50 Kbytes. (about 800 tracks)
  • Each concentric circle stores the same amount
    of information, so bits are packed more densely
    on the smaller-diameter tracks.
  • A track is divided into sectors or blocks
    (or data page). The division of a track into
    equal-sized blocks is set by the OS during disk
    formatting. So, The block size B is fixed for
    each system.
  • A block is a unit of space read/written by
    the disk head. Typical block sizes range from
    512 bytes to 4096 bytes. A whole block is
    transferred between disk and main memory for
    processing. (Typical Unix sys 2 KB mainframe
    4KB 1.44 floppy 512B)
  • This means the bigger the block size, the faster
    the data read/write time. However, the storage
    could also be wasted.  

5
Cont
  • In Oracle, the overhead to manage a block is
    approximately 90 bytes for a table, and 161 bytes
    for an index (about 80 are used for data).
  • Blocks are separated by fixed-size interblock
    gaps.
  • A section of memory set aside to receive a
    block of data is called a buffer. Typically, a
    buffer size is the same as the block size. There
    are many buffers in commercial systems.
    (500-3000).
  • A read-write head moves to the track that
    contains the block to be transferred (seek time).
    Disk rotation moves the block under the
    read-write head for reading or writing
    (rotational delay or latency).

6
Cont
  • Reading or writing a disk block is time-consuming
    because of the seek time s and rotational delay
    (latency) rd.
  • Thus, the access time (the average time
    required to access a single data record) will be
  • Average_Seek_time Avg_Latency
    Block_Transfer_time
  • where Block_Transfer_time
    Transfer_Rate x Block_Size
  • Typical access time ranges from 10-60
    milliseconds
  • The seek time gtgt rotational delay gtgt BTT.
  • So, it is common to transfer several consecutive
    blocks on the same track or cylinder. (to
    eliminate ST and RD)

7
Cont
  • Locating data on a disk is a major bottle neck in
    db applications.
  • A physical disk block address consists of a
    surface number, track number (within surface),
    and block number (within track).

8
Cont
  • Mainframe systems call each disk unit a volume,
    and each volume has a volume table of content
    (VTOC).
  •  
  • In the MS-DOS, the data entry info. is stored
    into a file allocation table (FAT).
  •  
  • Typically, relative addressing is used. Each
    record has a relative record number. The first
    record is 0 or certain displacement, etc.
  •  
  • To speed up processing, double buffering can
    be used. Once one buffer is full, the CPU can
    start processing at the same time, the I/O
    processor can be reading and transferring the
    next block into a different buffer.

9
Files of Records
  • A file is a sequence of records, where each
    record is a collection of data values (or data
    items).
  • A file descriptor (or file header) includes
    info. that describes the file, such as field
    names and their data types, and the address of
    the file blocks on disk.
  • Records are stored on disk blocks. The blocking
    factor bfr for a file is the average number of
    records stored in a disk block.

10
Cont
  • A file can have fixed-length records or
    variable-length records.
  • File records can be unspanned (no record can
    span two blocks) or spanned (a record can be
    stored in more than one block).
  • In a file of fixed-length records, all
    records have the same format. Usually, unspanned
    blocking is used with such files.
  • Files of variable-length records require
    additional info. to be stored in each record,
    such as separator characters and field types.
    Usually spanned blocking is used with such files.
  • The physical disk blocks that are allocated to
    hold the records of a file can be contiguous,
    linked, or indexed

11
Typical Operations on Files
  • OPEN, FIND, FIND NEXT, READ, INSERT, DELETE,
    MODIFY, CLOSE, REORGANIZE, READ_ORDERED

12
Primary File Organization
  • Unordered Files
  • Ordered Files
  • Hashed Files

13
Unordered Files
  • Also called a heap or a pile file, new records
    are inserted at the end of the file.
  • To search for a record, a linear search
    through the file records is necessary. This
    requires reading and searching half the file
    blocks on the average, and is hence quite
    expensive.
  • Record insertion is quite efficient.
  • Reading the records in order of a particular
    field requires sorting the file records.

14
Ordered Files
  • Also called a sequential ( ordered) file.
  • File records are kept sorted by the values of
    an ordering field.
  • Insertion is expensive records must be
    inserted in the correct order. It is common to
    keep a separate unordered overflow (or
    transaction) file for new records to improve
    insertion efficiency this is periodically
    merged with the main ordered file.
  • A binary search can be used to search for a
    record on its ordering field value. This
    requires reading and searching log2 (b) blocks on
    the average, an improvement over linear search.
  • Reading the records in order of the ordering
    field is efficient

15
Hashed (Direct) Files
  • Static External Hashing
  • Hashing refers to the process of converting
    attribute values directly into addresses.
  • Hashing for disk files is called external
    hashing
  • The file blocks are divided into M equal-sized
    buckets, numbered bucket0, bucket1 , ....,
    bucketM-1. Typically, a bucket corresponds to
    one (or a fixed number of) disk blocks.
  • One of the file fields is designated to be
    the hash key of the file.

16
Cont
  • The record with hash key value K is stored in
    bucket i, where i h(K), and h is the hash
    function. 
  • Collisions occur when a new record hashes to a
    bucket that is already full. An overflow file is
    kept for storing such records. Overflow records
    that hash to each bucket can be linked together.
  • To reduce overflow records, a hash file is
    typically kept 70-80 full. If we expect to have
    r records to store in the table, we should choose
    M locations for the address space such that (r/M)
    is between 0.7 and 0.9.

17
Cont
  • Typically, M is a prime number since it
    distributes randomly via MOD fn.
  • The hash fn h should distribute the records
    uniformly among the buckets otherwise search
    time will be increased because many overflow
    records will exist.
  • Main advantages search is efficient in most
    cases, needs a single block access to retrieve
    that record (fastest possible access, given the
    hash value)

18
Disadvantage
  • No ordering of records
  • No search other than hash field
  • Fixed buckets M is a problem if of records
    grows or shrinks

19
Cont
  • Only OpenIngres and ORACLE support hashing.
  • Oracle supports hashing in terms of table
    clustering

20
Index Structures for Files
  • 1. Iindexes as Access Paths
  • 2. Types of Single-level Indexe
  • Primary Indexes, Clustering Indexes,
    Secondary Index
  • 3. Multi-level Indexes
  • 4. Using B-Trees and B-Trees as Dynamic
    Multi-level Indexes
  • 5. Indexes and Performance Optimization

21
Indexes as Access Paths
  • A single-level index is an auxiliary file that
    makes it more efficient to search for a record
    in the data file.
  • The index is usually specified on one field of
    the file (could be multiple fields)
  • - Oracle allows up to 16 attributes (or max
    1000-2000 bytes)
  •  
  • One form of an index is a file of entries
    ltfield_value, pointer_to_recordgt, which is
    ordered by the field value so that a binary
    search can be used.
  •  

22
Cont
  • The index is called an access path on the field.
  •  
  • Indexes speed up retrieval of data.
  • - Most beneficial for read-intensive and large
    files
  • - Write/update-intensive (volatile) files
    produce overheads

23
Types of Indexes
  • Primary Index
  • Clustering Index
  • Secondary Index
  • Multi-Level Indexes

24
Primary Index
  • Defined on an ordered data file whose records are
    fixed length with 2 fields
  • (ordering key field pointer to a disk
    block)
  • The data file is physically ordered on a Primary
    Key or an ordering field
  •  
  • Includes one index entry (or index record) for
    each block in the data file
  • (Not every ordering field value appears as an
    index entry)

25
Cont
  • The index entry has the ordering field value for
    the first record in the block, which is called
    the block anchor
  • A similar scheme can use the last record in a
    block

26
Cont
  • A primary index is an example of a nondense
    (sparse) index
  • A Nondense index has an entry for each disk
    block of the data file
  • A Dense index has an entry for each record

27
Cont
  • Major problems insertion and deletion of records
  • -  Use either unordered overflow file or a
    linked list of overflow records for each block in
    the data file.

28
Example
  • Suppose we have an ordered file EMP (Name, SSN,
    Address, Job, Sal, ....), record size R 100
    bytes (fixed and unspanned), block size B 2048
    bytes, records r 30,000 records, Blocking
    factor Bfr B/R CEIL(2048/100) 20
    records/block
  • CEIL(X) returns a next higher integer value after
    truncation.
  • E.g., CEIL(5.6) returns 6 or CEIL(12.01) returns
    13.
  • The number of file blocks b CEIL(r/Bfr)
    CEIL(30000/20) 1500 blocks

29
Cont
  • No index use linear search
  • An average linear search cost of (b/2)
    (1500)/2 750 block accesses
  • (b) No index use binary search
  • A binary search on the data file would need
    log2 b log2 (1500) 11 block accesses

30
Cont
  • (c) Index use binary search
  • Suppose Length(SSN) 9 bytes and a block
    pointer P 6 bytes
  • Index entry size IE_Size (9 6) 15 bytes
  • The Bfr for the index is (B/IE_size)
    BOTTOM(2048/15)
  • BOTTOM(136.53) 136 entries/block
  • The total of index entries 1500
  • The number of index blocks CEIL(1500/136) 12
    blocks
  • A binary search on the index file would need
    CEIL(log2 b)
  • CEIL(log2 (12)) 4 block accesses
  • To search for a record using the index, needs one
    additional block access. So, a total of 4 1 5
    block accesses is needed

31
Cont
  • Comparison
  • (Linear_Search No_Index__BS Index__BS)
    (750 11 5)
  • BS Binary Search

32
Clustering Index
  • Defined on a data file physically ordered on a
    non-key field that does not have a distinct
    value for each record.
  •  
  • A relation can have at most one clustering
    index.
  •  
  • The field is called the clustering field.
  •  
  • Includes one index entry (index record) for
    each distinct value of the field
  •  
  • The index entry points to the first data block
    that contains records w/ that indexed field value

33
Secondary Index
  • Defined on an unordered data file
  • Can be defined on a key field or a non-key field
  • Includes one index entry for each record in the
    data file dense index.
  •  
  • A block anchor cannot be used in the secondary
    index because the records of the data file are
    not physically ordered by values of the secondary
    key field.

34
Cont
  • A secondary index for a key field is a dense
    index (one entry/record)
  • A secondary index for a non-key field could be
    densed or nondensed.
  •  
  • A secondary index provides a logical ordering on
    the records by the indexing field.
  •  
  • There can be many secondary indexes for a data
    file 
  • A secondary index is called an inverted index

35
Multi-Level Indexes
  • Because a single-level index is an ordered file,
    we can create a primary index to the index
    itself.
  •  
  • We call the original index the first-level
    index and the index to the index the second-level
    index
  •  
  • We can repeat this process, creating a 3rd,
    4th, ..., top level until all entries of the top
    level fit in one disk block
  •  

36
Cont
  • A multi-level index can be created for any type
    of first-level index (primary, clustering,
    secondary) as long as the first-level index
    consists of more than one disk block
  •  
  • Such a multi-level index is a form of search
    tree however, insertion and deletion of new
    index entries is a severe problem because every
    level of the index is an ordered file
  •  
  • Hence, most multi-level indexes use B-tree or
    B tree data structures, which leave space in
    each tree node (disk block) to allow for new
    index entries

37
Example of Multi-level Tree Index
  • a)  A table with 1 M records, Index value 8
    bytes and Pointer to row ID 6 bytes, So, 14
    bytes/index entry
  • b)  1 block 2 KB, Assume the block header
    overhead is 150 bytes and the fill factor is
    0.80., Actual 1 block space (block size - block
    header overhead) fill_factor (2 KB 150)
    0.80 (2048-150)0.8 1518 bytes, Index
    Blocking Factor, BF BOTTOM(1518/14) 108 index
    records

38
Cont
  • c) disk blocks necessary for storing 1 M records
  • CEIL (1,000,000/108) 9,260 blocks (disk
    blocks)
  •  
  • Therefore, we have 9,260 blocks at the leaf
    level and 9,260 node pointers at the directory
    level to the leaf level

39
Cont
  • d) The space for directory entries at a higher
    level CEIL (9,260/108) 86 blocks
  •  
  • e) Now we can create a root of the index tree
    with a single block. Thus, the index tree for
    this example consists of 3 levels
  • Root 1 block
  • 2nd level 86 blocks
  • 3rd level 9,260 blocks

40
Cont
  • f) Finding the index entry needs only 3 block
    accesses in this example.
  • log108 (1 million) lt 3
  • This is more efficient than disk binary search
  • log2 (1 million) 20 block access
  • In general, for a tree of depth K with the
    blocking factor BF,
  • The total leaf-level entries (BF)k
  • The max. records for 3 level index tree with BF
    108 is
  • (108)3 1,259,712 records

41
Cont
  • The access to the data requires
  • ( access to the index) ( 1 additional disk
    access to the data file ) (3 disk accesses to
    the index ) ( 1 to the data file) 4 disk
    accesses.
  • This means over 1 M records can be searched by 4
    disk accesses. If we assume that 1 disk access
    0.025 sec, 4 disk access 0.1 sec a few
    calculations 0.1 seconds , Note that we used a
    simplified method for estimating the access time

42
B Trees and B Trees as Dynamic Multi-level
Indexes
  • A tree data structure
  • A node in a search tree of order P
  • - Each node contain at most P-1 search values
    and P pointers
  • - Two constraints
  • 1. Within each node, K1 ltK2lt .... lt Kq-1
  • 2. Subtree nodes has larger values than the
    LHS search value and smaller values than the RHS
    search value
  • - Example search tree of order P 3

43
Cont
  • A tree is balanced if all of its leaf nodes are
    at the same level
  • - Ensures that no nodes will be at very high
    levels (many block accesses)
  • - Deletion may leave some nodes nearly empty
    (storage waste)
  •  
  • B Tree and B Tree structures are variations of
    search trees that allow efficient insertion and
    deletion of new search values

44
Cont
  • In B Tree and B Tree structures, each node
    corresponds to a disk block (or a disk block
    offset)
  •  Each node is kept between half-full and
    completely full
  •   An insertion into a node that is not full is
    quite efficient if a node is full, the insertion
    causes a split into two nodes

45
Cont
  • Splitting may propagate to other tree levels
  • A deletion is quite efficient if a node does not
    become less than half full
  •  If a deletion causes a node to become less than
    half full, it must be merged with neighboring
    nodes

46
B tree
  • B trees and its variations were designed to
    reside on disk, partially memory resident (the
    root node)
  • It may contain an entry for every record  
  • In a B tree, pointers to data records exist at
    all levels of the tree
  • B tree insertion and deletion algorithm
    guarantees that the tree will always be balanced
    (performance vs. complexity).

47
B tree
  • In a B tree, all pointers to data records exists
    only at the leaf-level nodes
  •  The leaf nodes have an entry for every search
    value w/ a pointer to the record
  •  The leaf nodes of the B trees are usually
    linked together to provide ordered access on the
    search field
  •  Some search field values are repeated in the
    internal nodes, Causes duplicate key values and
    more maintenance cost
  •   Faster sequential processing

48
B tree
  • A B tree whose nodes are at least two-thirds
    filled
  •  Increase storage utilization and speed-up in
    search
  •  Increase update cost (node full occurs more
    frequently)
  •  Used in Oracle

49
Discussion
  • Most systems allow users to choose a fill
    factor b/n 0.5 to 1
  •  
  • The PK constraint is enforced by creating an
    index for the primary key
  •  
  • A fully inverted file is a file with a
    secondary index for every field
  •  
  • The most popular index organization is B trees
    or B trees
  •  
  • An ordered file with multi-level primary index
    on its ordering key field is called an Indexed
    Sequential File

50
Cont
  • IBM's ISAM (Indexed Sequential Access Method) has
    a 2-level index
  • - Cylinder index (the key value of an anchor
    record for each cylinder of a disk pack and a
    pointer to the track index)
  • - Track index (the key value of an anchor
    record for each track and a pointer to the track)
  • IBM's VSAM (Virtual Storage Access Method) is
    similar to B tree structure

51
Recommendation for Physical Design
  • Do not make physical design decisions too soon.
  • Do not tie the logical design to physical
    constraints that may well change
  • Optimize the physical design for top 20 of
    queries and transaction types that are most
    frequently executed.

52
Recommendation for Indexing
  • - Create indexes for PK, FK, CK, frequently-used
    attributes in queries (i.e., name), The
    uniqueness of values (PK, CK)
  • Do not create an index for FK which is not used
    for selection condition. eg Student(SSN, Name,
    Grade, Bdate, Zip) and ZIP(Zip, City, State)
    Find address of all seniors (don't create an
    index for ZIP in Student) , Find all names living
    in Wayne (create an index for ZIP in Student)

53
Cont
  • For multi-attribute indexes
  • .Attribute with most variety first
    (selectivity)
  • .If equal variety, attribute with most often
    accessed first
  • .Weak entity "main-attr minor-attr" such
    as owner PK partial key (In this case, do not
    create another index on owner PK in the WE table)
  • . When two attributes frequently occur at the
    same time in a query
  • . A composite index will be used only when the
    leading column appears in WHERE clause.

54
Cont
  • Do not create indexes for attributes with small
    values
  • . Boolean value ----- poor choice, slow down
    performance, (e.g., gender each value will have
    50 of table size big overhead)
  • . Phone number ----- good choice
  • . Area code ---------- marginal
  • . Small tables (less than 30 or 100 records)
  • (Better left unindexed except to enforce
    uniqueness in the PK)
  • . Small of blocks (less than 8 data blocks
    in Oracle)

55
Heuristics
  • . A frequent retrieval with less than 1
    retrieval is a good candidate.
  • . A frequent retrieval with 1-5 retrieval should
    be considered.
  • . An occasional retrieval with over 2 retrieval
    is not a good choice.
  • . The more static a file, the more attractive
  • . The larger the record, the greater the value of
    indexing
  • . Don't create more than 3 indexes for a volatile
    file in general
  • . Create as many indexes as needed for static
    files

56
When to create indexes
  • When the table is initially loaded with data
  • This method will create the index much faster for
    the sort algorithm will sort all the index values
    at once and then insert them into a B tree.

57
NULL and Indexes
  • .Columns that are NULL will not appear in an
    index (Except in cluster index)
  • . Indexes with more than one column will have an
    entry if any of the columns are not NULL
  • . Since an index does not include NULL value,
    some queries will be faster to run if you leave a
    numeric column NULL, rather than zero
  • SELECT Name, Commission
  • FROM Commission
  • WHERE Commission gt 0

58
Guidelines for Not Using Indexes
  • - If the index is never used by the optimizer
  • If more than 10-20 of the rows are to be
    retrieved
  • - If the column contains only one, two or three
    unique values.
  • - If the column to be indexed is too long (more
    than 20 bytes)
  • - If the overhead of maintaining the index is
    greater than the benefit.

59
LOGICAL STRUCTURES
  • Schema Objects
  • Base table, view, sequences, indexes, clusters,
    database links, synonyms, procedures, packages
  •  

60
LOGICAL STORAGE UNITS
  • TableSpaces
  • - A tablespace is a logical data organization
    unit that contains one or more db objects (DD,
    tables, indexes, or clusters).
  • - A db consists of one or more tablespaces.
  • - Every db has at least one tablespace SYSTEM
    that holds the DDs, the names and locations of
    all the tablespaces, tables, indexes, and
    clusters for this db
  • - Physically, a tablespace has one or more data
    files.
  •  

61
Cont
  • Example
  • CREATE TABLESPACE talbot DATAFILE 'HOME.ONE SIZE
    1000K DEFAULT STORAGE (INITIAL 25K NEXT 10K
    MINEXTENTS 1 MAXEXTENTS 100 PCTINCREASE 50) 
  • / Declaring a table within a tablespace /
  • CREATE TABLE ledger (ActionDate DATE, ........)
  • TABLESPACE talbot

62
Segments
  • A segment is a logical storage space for a db
    object.
  • - A tabelspace contains many segments.
  • - Types of segments in Oracle
  • Table, Index, Rollback, Temporary, Partition,
    Cluster

63
Extents
  • - A segment consists of one or many extents.
  • - An extent is a set of contiguous data blocks
    used to store a particular type of information.
  • - Each segment has an initial disk space set
    aside called the initial extent.
  • - The server allocates an additional extent to a
    segment whose space is full.
  • - Extents cannot span datafiles.

64
Blocks
  • A block is the smallest logical (also physical)
    unit of storage.
  • - Several blocks comprise an extent.
  • A block contains
  • . fixed block header (block address, segment
    type, the table and rows that use the block, etc
    about 113 bytes.),
  • . variable block header
  • . table directory (stores info about the
    tables in the cluster)
  • . the actual data with 2 bytes of row
    overhead, and
  • . free space (PCTFREE) percent of the block
    space reserved for update
  • - DB_FILE_MULTIBLOCK_READ_COUNT in INIT.ORA shows
    the of multi-blocks that can be read into buffer
    cache by a single read request during full table
    scan.

65
Controlling Space Usage at BLOCK Level
  • - Identifying the block size
  • SQLDBAgt SHOW PARAMETERS db_block_size
  • SQLgt SELECT SEGMENT_NAME, BYTES, BLOCKS,
    BYTES/BLOCKS Block_Size
  • FROM USER_SEGMENT

66
Cont
  • - Computing the number of blocks
  • ANALYZE TABLE mytable COMPUTE STATISTICS
  • SELECT BLOCKS FROM USER_TABLES
  • WHERE Table_Name MYTABLE

67
  • - CREATE TABLE product ( ....... ) TABLESPACE
    users
  • PCTFREE 20 PCTUSED 60
  • STOARGE (INITIAL 10M NEXT 5K
  • PCTINCREASE 0) 
  • - PCTFREE Parameter
  • . "Percent free" means what percentage of a block
    to keep free for changes to rows in the block
  • . The purpose of PCTFREE is to minimize disk I/O
    by reducing row chaining and row migration
  • . Row migration moving an entire row to a new
    block after an update of a row (VARCHAR2 or
    NULL) occurs when PCTFREE is too small.
  • . Row chaining the splitting of a row across
    multiple blocks (LONG or LONG RAW)
  • . The default 10
  • . Used for both tables and indexes.

68
Cont
  • PCTFREE and Tables
  • . The free space must be large enough to handle
    updates.
  • . Lower PCTFREE can cause row migration.
  • . The more volatile your data, the larger PCTFREE
    should be.
  • . For tables with no updates, PCTFREE should be
    zero.

69
PCTFREE and Indexes
  • . Specifies the free space that should be left
    within the leaf blocks
  • Example Employee index on last_name
  • If we have 10,000 employees and expect 2000 more,
    PCTFREE might be 25
  • This avoids the split of index leaf blocks.
  • If we build an index whose index key is higher
    than all current values, PCTFREE could be zero

70
DISPLAYING STORAGE INFORMATION with DD VIEWS
  • USER_EXTENT
  • Extents of segments belonging to the user
  • USER_FREE_SPACE
  • Free extents in a tablespace accessible to
    the user
  • USER_SEGMENTS
  • Storage allocation for segments belonging to
    the user
  • USER_TABLESPACES
  • Description of accessible tablespaces
  • USER_USERS
  • Information about the current user
  • USER_TS_QUOTAS
  • Tablespace quotas for the user

71
PCTUSED Parameter
  • . When the percentage of a data block falls down
    to the value of PCTUSED, the block becomes
    available for inserts
  • . Used only for tables, not for indexes.
  • . The default value of PCTUSED 40
  • . The higher the PCTUSED, the fuller the data
    block
  • . The smaller the gap between PCTUSED and
    PCTFREE, the more number of blocks on the free
    list even if none of them have enough space to
    accommodate most inserts.
  • . To avoid the bottleneck, leave some slack space
    b/n PCTFREE and PCTUSED

72
Indexing in Oracle
  • There are some differences between indexing
    theory we discussed in this chapter and those
    implemented in Oracle.
  • Oracle does not have Primary index as we
    discussed. The index created by the primary key
    clause is a dense secondary index. The reason
    is Oracle does not sort the rows based on the
    value of primary key even though you create the
    primary key. Rows are just appended as you
    insert into a table.

73
  • Oracle automatically creates a dense secondary
    index when PRIMARY KEY or UNIQUE clause is used
  • . CREATE TABLE employee (SSN CHAR(9) PRIMARY
    KEY ..)
  • . ALTER TABLE employee ADD CONSTRAINT emp_PK
    PRIMARY KEY (SSN).

74
  • - If you drop the primary key, the unique index
    is automatically dropped.
  •  
  • -  However, if you sort the data by the primary
    key using CTAS (CREATE TABLE xyz AS SELECT ..),
    the index on the primary key will be a primary
    index as discussed in the book. Note that a
    primary index will speed up your query processing
    based on the primary key.

75
  • -  Indexes in Oracle are stored in a B tree with
    doubly-linked leaf blocks.
  • -   An index may have max. 16 column values in a
    composite index.
  • -   Hashing is supported only in the clustering
    tables.
  • -   Recent version of Oracle also supports Bitmap
    indexes for OLAP or data warehousing
    applications. Bitmap indexes are only for
    low-cardinality attributes (that has a small
    number of unique values) such as SEX, REGION,
    STATE, or CODE-like attributes.

76
Index Usage
  • - Indexes speed up query processing on larger
    tables, but could cause overhead for volatile
    files
  • - Small tables ( rows lt1000 or less than 8
    blocks) may not be beneficial with indexes
  • Users do not have to specify indexes in queries
  • Indexes are typically automatically used when
    mentioned in WHERE clause
  • Indexes are stored in data dictionary called
    USER_INDEXES (IND).
  • Indexes are automatically updated when the table
    is updated
  • - Indexes are stored in data dictionary called
    USER_INDEXES (IND)
  • - Indexes can be created for any data type except
    LONG

77
When Indexes are Ignored?
  • -         When there is no WHERE clause
  • -         When WHERE clause contains IS NOT NULL
    or IS NULL
  • -         When the comparison is ltgt (or !)
  • When an indexed column is modified by a function
    such as SUBSTR or

78
Storage Allocation in Oracle
  • - Oracle only allocates whole blocks, not parts
    of blocks
  • -  Oracle allocates sets of blocks, usually I
    multiple of five blocks
  • -  Oracle may allocate larger or smaller sets of
    blocks depending on the available free space in
    the tablespace.
  • -   Oracle uses 7 bytes for a DATE values and 3
    bytes for a NUMBER. IN storage estimation, use 8
    bytes for a DATE and 4 bytes for a NUMBER
    including column overhead.

79
INDEX Command in Oracle
  • CREATE UNIQUE BITMAP INDEX index-name ON
    table-name (attribute ASC DESC list)
    CLUSTER cluster_name
  • TABLESPACE lttablespacegt
  • INITRANS ltintegergt
  • MAXTRANS ltintegergt
  • PCTFREE ltintegergt
  • PCTUSED ltintegergt
  • STORAGE ltstorage clausegt

80
  • ASC DESC
  • - ASC and DESC means ascending and descending
  • - This is only for DB2 compatibility, but has no
    effect
  • - Indexes in Oracle are always ascending
  •  
  • TABLESPACE
  • - Specifies the table space in which the index is
    created
  • - Store the table and index in different
    tablespaces
  • - This allow concurrent access to the table and
    the index
  •  
  • INITRANS
  • - Number of transaction entries allocated to each
    block of the index
  • - Every transaction that modifies the block
    requires a transaction entry
  • - The minimum and default value is 2.
  • - 23 INITRANS bytes are used as a block header.

81
  • MAXTRANS
  • - Max. number of transaction entries allocated to
    each block of the index
  • - Max number of concurrent transactions that can
    update an index block
  • - Max and default is 255.
  •  
  • PCTFREE value1
  • - The percentage of free space left for updates
    on the index page
  • - The default value is 10 (i.e., each index page
    is 90 full)
  • - Use higher value for a volatile file
  • - Use a lower value for a stable file
  •  
  • PCTUSED
  • - The percentage of a block that Oracle attempts
    to fill before it allocates another block.
  • - The default is 40.
  •  
  • STORAGE clause
  • - If the index will occupy less than 25 blocks,
    then one extent for the segment will be allocated
  • Extents are contiguous blocks

82
  • - INITIAL
  • -- First extent in bytes. K or M can be used
  • -- The default value is 5 data blocks. Minimum
    is 2.
  • - NEXT
  • -- The size of the next extent that will be
    allocated to the index.
  • - PCTINCREASE
  • -- The percent by which each extent grows over
    the previous extent.
  • - MINEXTENTS
  • -- The total number of extents that are
    created when a segment is created
  • One segment corresponds to database objects
  • - MAXEXTENTS
  • -- The total number of extents that can be
    created for the segment.

83
EXAMPLE
  • CUST (Cust_num, Fname, Lname, Address, City,
    State, Zip, Phone)
  • Creating an index on a single attribute
  • CREATE INDEX Cust_Lname_IDX ON Cust (Lname)
  • / Cust_Lname_IDX is a user-defined index file
    name /
  • / Index name can be up to 30 char long /
  • / Good naming convention TableName_AttributeName
    /
  • / Oracle creates a B tree index using Lname. /

84
  • Creating an index on multi-attribues
  • CREATE INDEX Cust_Lname_Zip_IDX ON Cust
    (Cname, Zip ASC)
  • / You can have up to 16 attributes in one index
    /
  • / You can have columns in only one table /
  • / Entries must not exceed 1/3 of the data block
    size /
  • / Put the most restrictive column first /
  • / Put the most frequently queried column first
    /
  •  
  • Creating a unique index to enforce a unique
    constraint
  • CREATE UNIQUE INDEX Cust_num_IDX ON Cust
    (Cust_Num)
  • / Declaring an attribute as PRIMARY KEY or
    UNIQUE automatically
  • create a unique index /

85
  • Creating an index with parameters
  • CREATE INDEX Cust_Lname_IDX ON Cust (Lname)
    TABLESPACE cust_data_space
  • PCTFREE 20
  • PCTUSED 50
  • STORAGE ( INITIAL 5K
  • NEXT 5K
  • MINEXTENTS 1 MAXEXTENTS 50
  • PCTINCREASE 0)
  • Dropping an index
  • DROP INDEX Cust_Lname_Zip_IDX
Write a Comment
User Comments (0)
About PowerShow.com