Chapter 3. Tuning Indexes - PowerPoint PPT Presentation

1 / 25

About This Presentation

Title:

Chapter 3. Tuning Indexes

Description:

A point query returns at most one record based on an equality selection ... Give them a face lift: Eliminate overflow chains on index or table ... – PowerPoint PPT presentation

Number of Views:50

Avg rating:3.0/5.0

Slides: 26

Provided by: sangh3

Category:

more less

Transcript and Presenter's Notes

Title: Chapter 3. Tuning Indexes

1
Chapter 3. Tuning Indexes

May 2002
Prof. Sang Ho Lee
School of Computing, Soongsil University
shlee_at_computing.soongsil.ac.kr

2
Type of Queries (1)

A point query returns at most one record based on
an equality selection
Select name from employee where id 8478
A multipoint query returns several records based
on an equality selection
Select name from employee where salary 40000
A range query returns a set of records whose
values lie in an interval
Select name from employee where salary gt 50000
and salary lt 60000
A prefix match query on an attribute or sequence
of attributes X Specifies only a prefix of X
last name Gates and first name like G

3
Type of Queries (2)

An extremal query obtains a set of records whose
value on some attribute is a minimum or maximum
Select name from employee where salary
MAX(select salary from employee)
An ordering query displays a set of records in
order of the value of some attribute(s)
Select from employee order by salary
A grouping query partitions the results of a
query into groups
Select dept, AVG(salary) from employee group by
dept
A join query links two or more tables
Select employee.ssnum from employee,
studentwhere employee.ssum student.ssnum

4
Key Types

Two kinds of keys with respect to a table T
A sequential key keys value is monotonic with
insertion order
A non-sequential key keys value is unrelated to
the insertion order to table

5
Review of Index Terminology

Index data structure (Btree, Hash, ISAM)
pointers to data
ISAM structure Balanced tree structure with a
predetermined number of levels. Interior nodes
of an ISAM structure never change, but there may
be overflow chain at the leaves. Use ISAM when
range queries are important and there are few
updates
Sparse index one pointer per data page
Dense index one pointer per data record
Clustering index may be sparse or dense and
implies a data organization
e.g. Btree on A, data records sorted on A
Non-clustering index enforces nothing about
data organization, so must be dense

6
ISAM (Index Sequential Access Method)

These structures are similar to B-trees but are
optimized for infrequently changing data
Each leaf page maps to a track and sibling leaves
are on the same or neighboring tracks
If an insertion occurs to a full page, use
overflow chaining

In ISAM structures, tree is fixed but leaves may
overflow, forming chains
7
Advantages/Disadvantages in ISAM

Advantages
Support range queries even better than B-trees,
because the utilization of each page often near
100, reducing the number of pages that need to
be retrieved
Reduce locking overhead, because nonleaf nodes of
the structure are never locked
Disadvantages
Sequential keys cause concurrency bottleneck on
ISAM structures when there are many inserts
Whether inserts are sequential or not, ISAM
structures will overflow under heavy insert
traffic
This structures also suffer from large key sizes

8
Clustering vs. Non-clustering

Clustering index may be sparse (up to the DBMS
implementer)
Good for range (e.g., R.A between 5 and 15) and
prefix queries (e.g., R.Name like Sm)
Near key values in data structures correspond to
near tuples
Good for concurrency Usually

9
Clustering Indexes and Concurrency Control

If no clustering index, then insertions occur at
end of a heap (file organized by time of
insertion)Concurrent insertions will then
conflict on last page of heap
If there is a clustering index, e.g. on
SocSecNum, then consecutive insertions will
likely to far apart.Low contention.
However, if key value proportional to time of
insertion, then clustering index based on B-tree
causes a problem.WHY ???Would hashing make any
difference???

10
Clustering Index Bad News

Inserts tend to be placed in the middle of the
table. This can cause overflows, destroying the
benefits of clustering
Similarly for updates
So, may be a good idea to use fairly low page
utilization when using a clustering index

11
Non-clustering Index

Non-clustering index data structure but no
imposition on structure of table. May have
several per table
Dense, so some queries can be answered without
access to table. For example, assume a
non-clustering index on attributes A, B, and C of
RSelect B, C from R where A 5
Good for point queries and for selective
multi-point, and for extremal queries. May or
may not be useful for join queries.

12
Non-clustering Indexes and Selectivity Example 1

Pages are 4Kbytes
Attribute A takes on 20 different values
Query is multi-point on A
There is a non-clustering index on A
If record is 50 bytes long, there are 80 records
per page. Nearly every page will have a matching
record
Dont create non-clustering index

13
Non-clustering Indexes and Selectivity Example 2

Pages are 4Kbytes
Attribute A takes on 20 different values
Query is multi-point on A
There is a non-clustering index on A
If record is 2000 bytes long, there are 2 records
per page. Only 1 in ten pages will have a
matching record.
Create non-clustering index

14
Quantitative Conclusions

F number of records per page
D number of different values of attribute
P number of pages prefetched when performing a
scan
If D lt F P, then no non-clustering index
Otherwise, create non-clustering index provided
multipoint queries on attribute are frequent
Practice Derive this.

15
Composite Indexes (1)

An index based on more than one attribute
Clustering or non-clustering
The poor performance in inverse order to
composite index
City(name, latitude, longitude, population)
clustering composite index (latitude, longitude)

select name from city where population gt
10000 and latitude 22 and longitude gt 5 and
longitude lt 15
select name from city where population gt
10000 and longitude 22 and latitude gt 5 and
latitude lt 15
16
Composite Indexes (2)

Benefits
A dense composite index can something answer a
query completely
A query on all attributes of a composite index
will likely return far fewer records than a query
on only some of those attributes
A composite index is an efficient way to support
the uniqueness of multiple attributes
A composite index can support kinds of
geographical queries
Disadvantages
A large key size
An update to any of its attribute will cause the
index to be modified

17
Index Types Offered (1)

DEC RDB
Clustering dense B-tree, sparse hash
Non-clustering B-tree
IBM DB2, SQL/DS, OS/2
Clustering dense B-tree
Non-clustering B-tree
Oracle
Clustering dense B-tree, sparse hash
Non-clustering B-tree

18
Index Types Offered (2)

Ingres
Clustering dense B-tree, sparse hash, sparse
ISAM
Non-clustering B-tree, hash, ISAM
Sybase
Clustering sparse B-tree
Non-clustering B-tree

19
General Care and Feeding

Here are some maintenance tips on indexes
Give them a face lift Eliminate overflow chains
on index or table
Drop indexes when they hurt performance.(Rebuild
clustering indexes after sorting first)Example
During a batch insertion, you may be better off
having no secondary indexes
Check query plan
Run the catalog statistics update package
regularly

20
Reasons System Might Not Use an Index

Catalog may not be up to dateOptimizer may
believe table is too small
Query may be badly specified. For example, in
some systems (e.g. Oracle v6 and earlier)
Select from employee where salary/12 gt
4000would not use salary index whereas
following wouldSelect from employee where
salary gt 4000 12

21
Reasons System Might Not Use an Index (2)

The use of a string functionSelect from
employeewhere substr(name, 1, 1) G
A comparison with NULLSelect from employee
where salary is null
The use of a bind variable
Apply a different type to corresponded tables
attribute type

22
Scenario 1

Employee(ssnum, name, dept, manager, salary)
There are no updates. Here are queries. Which
indexes should be established?
Count all the employees that have a certain
salary (frequent)
Find the employees that have the maximum (or
minimum) salary within a particular department
(rare)
Find the employee with a certain social security
number (frequent)

23
Scenario 1 Action

Non-clustering index on salary, since the first
query can be answered solely based on the
non-clustering index on salary
Sparse clustering index on social security number
if the employee tuples are small, because a
sparse index may be a level shorter than a dense
one
Non-clustering composite index on (dept, salary)
using a B-tree for second query, should it become
more important

24
Scenario 2

The Employee table has no clustering indexes
Performs poorly when there are bursts of inserts
Locking is page-based

25
Scenario 2 Action

Employee is organized as a heap, so all inserts
to employee occur on the last page, making that
page a locking hot spot
Find some way to smooth out the bursts of inserts
Use record-level locking
Create a clustering index based on hashing on
customer number or social security number.
Alternatively, create a B-tree-based clustering
index on social security number.

Write a Comment

User Comments (0)