I/O Trap - PowerPoint PPT Presentation

1 / 34
About This Presentation
Title:

I/O Trap

Description:

Access method (table scan / indexes...) Contention between ... 3 - Search begins either in full scan or with index until data found. Designing DB for reading ... – PowerPoint PPT presentation

Number of Views:38
Avg rating:3.0/5.0
Slides: 35
Provided by: freder4
Category:
Tags: scan | trap

less

Transcript and Presenter's Notes

Title: I/O Trap


1
I/O Trap
  • Reading existing data
  • Changing existing data
  • Update existing records
  • Adding new records
  • Deleting records
  • All these involve going to disk gt slowest device
    by considerable margin
  • Minimise / reschedule physical I/O

2
Issues in Database Performance
  • Performance in Read / write are hardware issues
    gt throw money at it
  • Performance of DB ability of engine to locate
    data
  • Factors affecting speed of retrieval
  • Cache (sizing of objects)
  • Access method (table scan / indexes)
  • Contention between processes
  • Indirect processes (roll back, archiving)

3
Reading data in Oracle
  • 1 determine how to access block where data is
    located
  • read data dictionary stored in separate part of
    DB
  • DD may be loaded up in cache (aka row cache) to
    limit I/O activity
  • 2 - DD indicates preferred access method for
    block
  • B tree index, partitioning, hash clusters etc
  • 3 - Search begins either in full scan or with
    index until data found

4
Designing DB for reading
  • Supply methods for high precision access to data
  • But some queries will defeat the strategies
  • E.g. credit cards transactions monthly report
    of scattered items
  • No solution take off-line

5
Changing data
  • Oracle makes hard work of changes
  • Rollback data (immediate)
  • Log files (long term)
  • Changed blocks read and updated in buffer
  • Released to disk as buffer is cleared
  • But rollback info generate most I/O operations
  • In sensitive environments, simultaneous archiving
    makes it worse (ARCHIVELOG mode)

6
Indexing Problems
  • Super-fast indexes need updating as data is
    changed gt DB slows down.
  • More complex index more complex update
    mechanism more rollback .
  • DB physical structure degrades, so does index (eg
    split blocks)
  • Performance decreases over time
  • Rebuild needed (which interferes with operations)

INDEX_STATS tells you how big the index has grown
7
Side effects
  • Definition of DB consistency
  • A reading process may need an older version of
    the data
  • Need to create a private version of the data
  • Cl processes that should be Read only require
    writes
  • Attempt to describe all required operations in
    finishing to execute query on old data

8
Effects of read consistency
  • Must find an older version of data
  • Must apply roll back
  • Must find old roll back block (I/O)
  • Roll back index (I/O)
  • Find data (I/O)
  • Roll back data
  • Read old data
  • Reverse all changes
  • Significant I/O implications buffer full of old
    stuff

9
Conclusions
  • Writers and readers DO interfere with each other
  • The mechanisms used by Oracle to bypass locking
    have performance side effects
  • Performance come with minimising I/O
  • i.e. with good access techniques
  • Precision of data location
  • Physical proximity of related data (cf caching)
  • However Techniques to reduce I/O numbers tend to
    reduce the speed of access!

10
Indexing example (see figure)
  • Alphabetic search using B tree index for name
    Oscar Smith
  • Split the table in sections and read until find
    start overshoot letter S Smith, N is the one
  • Then look for page with Smith, N in header
  • Then scan for actual entry in index
  • Read pointer
  • Move to table
  • Read

11
About Btree indexes
  • Root and Branch blocks approx 2 of index
    (small)
  • In frequent hit situations, both blocks loaded in
    Data Buffer all the time
  • Then only 2 I/O may be required
  • One to read leaf block
  • One to read the table
  • In practice, read in index and in table may
    require reading several blocks
  • E.g Several similar names
  • Index spans two (or more) leaf blocks
  • Each record is in a different data block

12
Creating and Using Indexes
  • Important for live access, but even more for
    querying with multiple tables
  • Value matching is costly process in RDB
  • No pointers
  • Connection purely on comparison basis only
  • One row with all other rows
  • If link between 2 huge tables, perf is low
  • All RDBs use some form of indexing
  • Some complexity involved as index can reduce
    physical I/O while increasing logical I/O (ie CPU
    time)

13
Btree indexing
  • Creating an index means creating a table with X1
    columns
  • X number of columns in index
  • Rowid (added field) table block row
  • Index is then copied into consecutive blocks
  • PCTFREE function leaves space for growth of data
    (but high value will generate many leaf blocks)
  • Pointer is added to previous and next leaf blocks
    in header of block

14
Btree indexing (2)
  • Then branch layer is built
  • If index gt one block
  • Collect all first entries block address of each
    leaf block
  • Write down into the first level branch block
    (packed)
  • If branch block is full, initiates second level
    of branch blocks etc.
  • Room is saved in branch blocks
  • No forward and backward pointer in branch blocks
  • Entries are trimmed to the bare minimum
  • First entries are omitted
  • See figure 6.1

15
Syntax
  • CREATE INDEX name ON table name (field1, field2
    )
  • PCTFREE 80
  • Oracle has many utility programmes to assess the
    performance of indexes use INDEX_STATS (see
    handout)
  • Practical problem is it easy to create a new
    index for a large table? NO!

16
Updating indexes
  • Index entries are NEVER changed
  • Marked as deleted and re-inserted
  • Space made available cannot be used until after
    index is re-built
  • Inserts that dont fit split the block (rarely
    50/50!)
  • If a blocks becomes empty, it is marked as free,
    but is never removed
  • Also, blocks never merge automatically

17
Some problems
  • Some situations cannot be addressed with indexes
  • e.g. In a FIFO processing situation (e.g. a
    queue), indexes will prove counterproductive
  • Index may grow to stupid proportions even with
    small error rate (unsuccessful processing of
    data)
  • Every time a transaction is added or processed
    (deleted) the index must change

18
Alternative Bitmap indexing
  • Imagine following query in huge table
  • Find customers living in London, with 2 cars and
    3 children occupying a 4 bed house
  • Index not useful why?
  • Too big
  • If query changes in any way gtnew index needed
  • Maintaining a set of indexes for each query would
    just be too costly
  • Use a bitmap (see table 6.1, 6.2 and 6.3)

19
Bitmap indexes (2)
  • Special for data warehouse type DBs
  • Build one bitmap for each relevant parameter
  • Combine bitmaps using the and SQL keyword
  • Also possible to use noting of bitmap (see
    table 6.4, 6.5)

20
Key points to remember
  • What is the key advantage of a bitmap index?
  • What situation does it best suit ?
  • Bitmaps can also be packed by Oracle compression
    features
  • But size is unpredictable why?

21
Example
  • Table with 1,000,000 rows
  • Bitmap on one column that can contain one of 8
    different values (e.g. city names)
  • Data is such that all same city together 125,000
    times
  • Write the bitmap
  • Imagine what compression can be achieved
  • Data is such that cities are in random order, but
    same number of each
  • Same questions

22
solution
  • First scenario
  • Bitmap for first city 125,000 ones and 875,000
    zeros trimmed off
  • Size 125,000 bits or approx 18Kbytes
  • Full bitmap 156 Kbytes
  • Second scenario
  • Bitmap for first city sequences of 1s and 7
    zeros, repeated 125,000 times
  • Size 1,000,000 bits or approx 140 Kbytes
  • Full bitmap 1.12 Mbytes
  • But BTree index for such data would be around
    12MB

23
Conclusion
  • Bitmap indexes work best when combined
  • They are very quick to build
  • Up to a million rows for 10 seconds
  • Work best when limited number of values when
    high repetitions
  • Best way to deal with huge volumes gt make
    drastic selection of interesting rows before
    reading the table
  • Warning one entry in a Bitmap hundreds of
    records gt locking can be crazy (OLTP systems)
  • gt for datawarehouse type applications (no
    contention)

24
What oracle says about it
  • Use index for queries with low hit ratio or when
    queries access lt 2 - 4 of data
  • Index maintenance is costly so index just in
    case is silly
  • Must analyse the type of data when deciding what
    kind of index
  • Do NOT use columns with loads of changes in an
    index
  • Use indexed fields in where statement
  • Can also write queries with NO_INDEX

25
Administering indexes
  • Indexes degrade over time
  • Should stabilise around 75 efficiency, but dont
  • Run stats
  • Analyse index NAME validate structure
  • Analyse index NAME compute statistics
  • Analyse index NAME estimate statistics sample 1
    percent

See table 6.6
26
Partitioned tables
  • Partitions / partitioning / partitioned tables
  • For very large tables
  • Improve querying
  • Easier admin
  • Backup and recovery easier
  • Optimiser knows when partitioning used
  • Can use in SQL also

27
Creating a PT
  • Create table FRED (
  • ID number
  • name varchar2(25)
  • age number
  • constraint fred_pk primary key (ID)
  • )
  • partition by range (age)
  • (partition PART1 values less than (21)
  • partition PART2 values less than (40)
  • partition PART3 values less than (maxvalue)

28
Warning
  • Specification of partition is exclusive
  • E.g. partition by range (name)
  • (partition part1 values less than (F) implies
    that f is excluded
  • Maxvalue is a general term to pick up anything
    that failed so far
  • Works for text as well as number

29
Hash partition
  • Only in Oracle 8i and above
  • Uses a numerical algorithm based on partition key
    to determine where to place data
  • Range partition consecutive values together
  • Hash consecutive values may be in different
    partitions
  • Also gives more partitions reduces the risk of
    contention

30
What is Hash?
  • Imagine 8GB table split in 8 / 1 GB
  • No intuitively clever way to split data
  • Or obvious way is totally imbalanced
  • 1 partition 7BG 7 140MB
  • Huge variations in performance
  • Randomise breakdown of data so objects of similar
    size
  • Select one column
  • Select number of chuncks
  • Oracle does the rest!

31
Mechanics of hashing
  • Each record is allocated into a bucket based on
    key value e.g. Name Joe
  • Applying the hashing function to the value Joe
    uniquely returns the bucket number where the
    record is located
  • E.g. using prime number
  • divide KEY by a prime number
  • If text, translation into numeric value using
    ASCII code
  • use remainder of the division address on the
    disk
  • if record already at same address - pointer to
    overflow area.

32
Hash partition - SQL
  • Create table FRED (
  • Name varchar2(25) primary key,
  • Age number,
  • Years abroad number
  • )
  • Partition by hash (age)
  • Partitions 2
  • Store in (Part1_fred, Part2_fred)

(Not compulsory)
33
Sub-partitions
  • Create table FRED (
  • Name varchar2(25) primary key,
  • Age number,
  • Years abroad number
  • )
  • Partition by range (years abroad)
  • Subpartition by hash (name)
  • Subpartitions 5
  • (partition Part1 values less than (1)
  • partition Part2 values less than (3)
  • partition Part3 values less than (6)
  • partition Part4 values less than (MAXVALUE))

34
Indexing partitions
  • Performance requirements may mean Partitioned
    tables should be indexed (separate issue)
  • Create index FRED_NAME on FRED (name)
  • Local
  • Partitions (Part1, Part2, Part3, Part4)
  • Local means create separate index for each
    partition of the table
  • Alternative is to create a global index with
    values from different partitions
  • Global indexes cannot be created for Hash
    partitions
Write a Comment
User Comments (0)
About PowerShow.com