Title: Overview of Indexing
1Overview of Indexing
- Chapter 8 Part II.
- Introduction to indexing
- First glimpse at indices and workloads
2Understanding the Workload
- For each query in workload
- Which relations does it access?
- Which attributes are retrieved?
- Which attributes are involved in selection/join
conditions? - How selective are these conditions likely to be?
- For each update in workload
- Which attributes are involved in selection/join
conditions? - How selective are these conditions likely to be?
- The type of update (INSERT/DELETE/UPDATE), and
the attributes that are affected.
3Choice of Indexes
- What indexes should we create?
- Which relations should have indexes?
- What field(s) should be the search key?
- Should we build several indexes?
- For each index, what kind of an index should it
be? - Clustered vs. unclustered? Hash vs. tree?
- Clustering must be used sparingly and only when
justified by frequent queries that benefit from
clustering. - At most one index can be clustered. (Why?)
- Consider utilizing index-only evaluation. (e.g.,
avg(age))
4Choice of Indexes One Approach
- Consider most important queries in turn.
- Consider best plan using current indexes, and
see if a better plan possible with additional
index. If so, create it. - Consider impact on updates in workload!
- Trade-off Indexes can make queries go faster,
updates slower. Require disk space, too. - Obviously, we must understand how DBMS evaluates
queries and creates query evaluation plans
5Choice of Indexes Simple Approach
- For now, we discuss simple 1-table queries.
6Index Selection Guidelines
- Attributes in WHERE clause are candidates for
index keys. - Exact match condition suggests hash index.
- Range query suggests tree index.
- Clustering is especially useful for range queries
- Clustering can also help equality queries if
there are many duplicates.
7Index Selection Guidelines
- Multi-attribute search keys considered when WHERE
clause contains several conditions. - Order of attributes is important for range
queries. - Such indexes can sometimes enable index-only
strategies for important queries. - Question For index-only strategies, is
clustering important ?
8Index Selection Guidelines
- Try to choose indexes that benefit as many
queries as possible. - Since only one index can be clustered per
relation, choose it based on important queries
that would benefit the most from clustering.
9Examples of Clustered Indexes
SELECT E.dno FROM Emp E WHERE E.agegt40
- B tree index on E.dno?
- B tree index on E.age ?
- Trade-offs
- How selective is the condition? (all gt 40?) or
(only some gt 40) - Is the index clustered?
10Examples of Clustered Indexes
SELECT E.dno, COUNT () FROM Emp E WHERE
E.agegt10 GROUP BY E.dno
- Consider the GROUP BY query.
- Index on E.age ? E.dno ?
- Issues
- Use Index on E.age ?
- If many tuples have E.age gt 10, using E.age index
and sorting the retrieved tuples may be costly. - Use Index on E.dno ?
- Clustered E.dno index may be better!
- What about without WHERE condition?
11Examples of Clustered Indexes
SELECT E.dno FROM Emp E WHERE E.hobbyStamps
- B tree index on E.hobby?
- NOTE is equality query.
- NOTE may contain many duplicates.
- Clustered or Unclustered index ?
- CONCLUDE Clustering on E.hobby helps!
- QUESTION what if index is unclustered ?
- CONCLUDE may prefer to do a full scan.
12Indexes with Composite Search Keys
Composite Search Keys Search on combination of
fields (sal and age).
11,80
11
12
12,10
name
age
sal
12,20
12
bob
10
12
13,75
13
cal
80
11
ltage, salgt
ltagegt
joe
12
20
sue
13
75
10,12
10
20
20,12
Data records sorted by name
75,13
75
80,11
80
ltsal, agegt
ltsalgt
Data entries in index sorted by ltsal,agegt
Data entries sorted by ltsalgt
13Equality and Composite Search Keys
- Equality query Every field value is equal to a
constant value. - Examples
- age20
- sal 75
- age20 and sal 75
- sal 75 and age20
11,80
11
12
12,10
name
age
sal
12,20
12
bob
10
12
13,75
13
cal
80
11
ltage, salgt
ltagegt
joe
12
20
sue
13
75
10,12
10
20
20,12
Data records sorted by name
75,13
75
80,11
80
ltsal, agegt
ltsalgt
Data entries in index sorted by ltsal,agegt
Data entries sorted by ltsalgt
14Composite Search Keys
- If retrieve Emp records with age30 AND sal4000
- Index on ltage,salgt would be better than an index
on age or an index on sal. - Choice of index key orthogonal to clustering
15Ranges and Composite Search Keys
Examples of composite key indexes using
lexicographic order.
- Range query Some field value is not a constant
but a range. - Examples
- age12 and sal gt 10
- age 12
11,80
11
12
12,10
name
age
sal
12,20
12
bob
10
12
13,75
13
cal
80
11
ltage, salgt
ltagegt
joe
12
20
sue
13
75
10,12
10
20
20,12
Data records sorted by name
75,13
75
80,11
80
ltsal, agegt
ltsalgt
Data entries in index sorted by ltsal,agegt
Data entries sorted by ltsalgt
16Composite Search Keys
- If condition is 20ltagelt30 AND 3000ltsallt5000
- Clustered tree index on ltage,salgt or ltsal,agegt is
best. - If condition is age30 AND 3000ltsallt5000
- Clustered ltage,salgt index much better than
ltsal,agegt index! - Composite indexes are larger, updated more often.
17Index-Only Plans
- Answer a query without retrieving actual tuples
- Is that possible ?
- If index with suitable information is available.
- Why is it a good idea ?
18Index-Only Plans
SELECT D.mgr FROM Dept D, Emp E WHERE
D.dnoE.dno
ltE.dnogt
SELECT D.mgr, E.eid FROM Dept D, Emp E WHERE
D.dnoE.dno
- A number of queries can be answered without
retrieving any tuples from one or more of the
relations involved if a suitable index is
available.
ltE.dno,E.eidgt
Tree index!
SELECT E.dno, COUNT() FROM Emp E GROUP BY
E.dno
ltE.dnogt
SELECT E.dno, MIN(E.sal) FROM Emp E GROUP BY
E.dno
ltE.dno,E.salgt
Tree index!
ltE. age,E.salgt or ltE.sal, E.agegt
SELECT AVG(E.sal) FROM Emp E WHERE E.age25
AND E.sal BETWEEN 3000 AND 5000
Tree!
19Index-Only Plans
Does index-only evaluation make sense?
SELECT E.dno, COUNT() FROM Emp E GROUP BY
E.dno
ltE.dnogt ?
ltE.dnogt ? ltE.salgt ? ltE.dno,E.salgt ?
SELECT E.dno, MIN(E.sal) FROM Emp E GROUP BY
E.dno
SELECT AVG(E.sal) FROM Emp E WHERE E.age25
AND E.sal BETWEEN 3000 AND 5000
ltE. age,E.salgt or ltE.sal, E.agegt?
Tree index!
20Index-Only Plans Multi-Key Index
- PROS
- The chance for index-only evaluation is
increased. - CONS
- - Index size larger.
- - Update response for any field.
21Index-Only Plans
- Tree index on ltdno,agegt, or on
- ltage,dnogt
- Which is better?
SELECT E.dno, COUNT () FROM Emp E WHERE
E.age30 GROUP BY E.dno
22Index-Only Plans
- Tree index on ltdno,agegt, or on
- ltage,dnogt
- Which is better?
SELECT E.dno, COUNT () FROM Emp E WHERE
E.age30 GROUP BY E.dno
SELECT E.dno, COUNT () FROM Emp E WHERE
E.agegt30 GROUP BY E.dno
What if we consider the second query?
23Index-Only Plans Multiple Relations
ltE.dnogt Or ltD.dnogt Or ltD.mgrgt
SELECT DISTINCT ( D.mgr ) FROM Dept D, Emp
E WHERE D.dnoE.dno
24Index-Only Plans
ltE.dnogt Or ltD.dnogt Or ltD.mgrgt
SELECT D.mgr FROM Dept D, Emp E WHERE
D.dnoE.dno
SELECT D.mgr, E.eid FROM Dept D, Emp E WHERE
D.dnoE.dno
ltE.dno,E.eidgt
Tree index!
25Summary
- Understanding nature of workload for application,
and the performance goals is essential to
developing a good design. - What are the important queries and updates?
- What attributes/relations are involved?
26More Summary
- Indexes must be chosen to speed up important
queries - Index maintenance overhead on updates to key
fields. - Choose indexes that can help many queries, if
possible. - Build indexes to support index-only strategies.
- Clustering is an important decision only one
index on a given relation can be clustered! - Order of fields in composite index key can be
important.