Title: Chapter 3. Tuning Indexes
1Chapter 3. Tuning Indexes
- May 2002
- Prof. Sang Ho Lee
- School of Computing, Soongsil University
- shlee_at_computing.soongsil.ac.kr
2Type of Queries (1)
- A point query returns at most one record based on
an equality selection - Select name from employee where id 8478
- A multipoint query returns several records based
on an equality selection - Select name from employee where salary 40000
- A range query returns a set of records whose
values lie in an interval - Select name from employee where salary gt 50000
and salary lt 60000 - A prefix match query on an attribute or sequence
of attributes X Specifies only a prefix of X - last name Gates and first name like G
3Type of Queries (2)
- An extremal query obtains a set of records whose
value on some attribute is a minimum or maximum - Select name from employee where salary
MAX(select salary from employee) - An ordering query displays a set of records in
order of the value of some attribute(s) - Select from employee order by salary
- A grouping query partitions the results of a
query into groups - Select dept, AVG(salary) from employee group by
dept - A join query links two or more tables
- Select employee.ssnum from employee,
studentwhere employee.ssum student.ssnum
4Key Types
- Two kinds of keys with respect to a table T
- A sequential key keys value is monotonic with
insertion order - A non-sequential key keys value is unrelated to
the insertion order to table
5Review of Index Terminology
- Index data structure (Btree, Hash, ISAM)
pointers to data - ISAM structure Balanced tree structure with a
predetermined number of levels. Interior nodes
of an ISAM structure never change, but there may
be overflow chain at the leaves. Use ISAM when
range queries are important and there are few
updates - Sparse index one pointer per data page
- Dense index one pointer per data record
- Clustering index may be sparse or dense and
implies a data organization - e.g. Btree on A, data records sorted on A
- Non-clustering index enforces nothing about
data organization, so must be dense
6ISAM (Index Sequential Access Method)
- These structures are similar to B-trees but are
optimized for infrequently changing data - Each leaf page maps to a track and sibling leaves
are on the same or neighboring tracks - If an insertion occurs to a full page, use
overflow chaining
In ISAM structures, tree is fixed but leaves may
overflow, forming chains
7Advantages/Disadvantages in ISAM
- Advantages
- Support range queries even better than B-trees,
because the utilization of each page often near
100, reducing the number of pages that need to
be retrieved - Reduce locking overhead, because nonleaf nodes of
the structure are never locked - Disadvantages
- Sequential keys cause concurrency bottleneck on
ISAM structures when there are many inserts - Whether inserts are sequential or not, ISAM
structures will overflow under heavy insert
traffic - This structures also suffer from large key sizes
8Clustering vs. Non-clustering
- Clustering index may be sparse (up to the DBMS
implementer) - Good for range (e.g., R.A between 5 and 15) and
prefix queries (e.g., R.Name like Sm) - Near key values in data structures correspond to
near tuples - Good for concurrency Usually
9Clustering Indexes and Concurrency Control
- If no clustering index, then insertions occur at
end of a heap (file organized by time of
insertion)Concurrent insertions will then
conflict on last page of heap - If there is a clustering index, e.g. on
SocSecNum, then consecutive insertions will
likely to far apart.Low contention. - However, if key value proportional to time of
insertion, then clustering index based on B-tree
causes a problem.WHY ???Would hashing make any
difference???
10Clustering Index Bad News
- Inserts tend to be placed in the middle of the
table. This can cause overflows, destroying the
benefits of clustering - Similarly for updates
- So, may be a good idea to use fairly low page
utilization when using a clustering index
11Non-clustering Index
- Non-clustering index data structure but no
imposition on structure of table. May have
several per table - Dense, so some queries can be answered without
access to table. For example, assume a
non-clustering index on attributes A, B, and C of
RSelect B, C from R where A 5 - Good for point queries and for selective
multi-point, and for extremal queries. May or
may not be useful for join queries.
12Non-clustering Indexes and Selectivity Example 1
- Pages are 4Kbytes
- Attribute A takes on 20 different values
- Query is multi-point on A
- There is a non-clustering index on A
- If record is 50 bytes long, there are 80 records
per page. Nearly every page will have a matching
record - Dont create non-clustering index
13Non-clustering Indexes and Selectivity Example 2
- Pages are 4Kbytes
- Attribute A takes on 20 different values
- Query is multi-point on A
- There is a non-clustering index on A
- If record is 2000 bytes long, there are 2 records
per page. Only 1 in ten pages will have a
matching record. - Create non-clustering index
14Quantitative Conclusions
- F number of records per page
- D number of different values of attribute
- P number of pages prefetched when performing a
scan - If D lt F P, then no non-clustering index
- Otherwise, create non-clustering index provided
multipoint queries on attribute are frequent - Practice Derive this.
15Composite Indexes (1)
- An index based on more than one attribute
- Clustering or non-clustering
- The poor performance in inverse order to
composite index - City(name, latitude, longitude, population)
- clustering composite index (latitude, longitude)
select name from city where population gt
10000 and latitude 22 and longitude gt 5 and
longitude lt 15
select name from city where population gt
10000 and longitude 22 and latitude gt 5 and
latitude lt 15
16Composite Indexes (2)
- Benefits
- A dense composite index can something answer a
query completely - A query on all attributes of a composite index
will likely return far fewer records than a query
on only some of those attributes - A composite index is an efficient way to support
the uniqueness of multiple attributes - A composite index can support kinds of
geographical queries - Disadvantages
- A large key size
- An update to any of its attribute will cause the
index to be modified
17Index Types Offered (1)
- DEC RDB
- Clustering dense B-tree, sparse hash
- Non-clustering B-tree
- IBM DB2, SQL/DS, OS/2
- Clustering dense B-tree
- Non-clustering B-tree
- Oracle
- Clustering dense B-tree, sparse hash
- Non-clustering B-tree
18Index Types Offered (2)
- Ingres
- Clustering dense B-tree, sparse hash, sparse
ISAM - Non-clustering B-tree, hash, ISAM
- Sybase
- Clustering sparse B-tree
- Non-clustering B-tree
19General Care and Feeding
- Here are some maintenance tips on indexes
- Give them a face lift Eliminate overflow chains
on index or table - Drop indexes when they hurt performance.(Rebuild
clustering indexes after sorting first)Example
During a batch insertion, you may be better off
having no secondary indexes - Check query plan
- Run the catalog statistics update package
regularly
20Reasons System Might Not Use an Index
- Catalog may not be up to dateOptimizer may
believe table is too small - Query may be badly specified. For example, in
some systems (e.g. Oracle v6 and earlier) - Select from employee where salary/12 gt
4000would not use salary index whereas
following wouldSelect from employee where
salary gt 4000 12
21Reasons System Might Not Use an Index (2)
- The use of a string functionSelect from
employeewhere substr(name, 1, 1) G - A comparison with NULLSelect from employee
where salary is null - The use of a bind variable
- Apply a different type to corresponded tables
attribute type
22Scenario 1
- Employee(ssnum, name, dept, manager, salary)
- There are no updates. Here are queries. Which
indexes should be established? - Count all the employees that have a certain
salary (frequent) - Find the employees that have the maximum (or
minimum) salary within a particular department
(rare) - Find the employee with a certain social security
number (frequent)
23Scenario 1 Action
- Non-clustering index on salary, since the first
query can be answered solely based on the
non-clustering index on salary - Sparse clustering index on social security number
if the employee tuples are small, because a
sparse index may be a level shorter than a dense
one - Non-clustering composite index on (dept, salary)
using a B-tree for second query, should it become
more important
24Scenario 2
- The Employee table has no clustering indexes
- Performs poorly when there are bursts of inserts
- Locking is page-based
25Scenario 2 Action
- Employee is organized as a heap, so all inserts
to employee occur on the last page, making that
page a locking hot spot - Find some way to smooth out the bursts of inserts
- Use record-level locking
- Create a clustering index based on hashing on
customer number or social security number.
Alternatively, create a B-tree-based clustering
index on social security number.