Query processing and optimization - PowerPoint PPT Presentation

1 / 45

About This Presentation

Title:

Query processing and optimization

Description:

SC(courseid,takes) = ceil( Ntakes/V(courseid, takes) ) = ceil(700/9) = 78 ... B = ceil( 78/64) = 2 pages. Selection s (1/2) Linear search ... – PowerPoint PPT presentation

Number of Views:36

Avg rating:3.0/5.0

Slides: 46

Provided by: nikosriz

Category:

more less

Transcript and Presenter's Notes

Title: Query processing and optimization

1
Query processing and optimization
2
Definitions

Query processing
translation of query into low-level activities
evaluation of query
data extraction
Query optimization
selecting the most efficient query evaluation

3
Query Processing (1/2)

SELECT FROM student WHERE namePaul
Parse query and translate
check syntax, verify names, etc
translate into relational algebra (RDBMS)
create evaluation plans
Find best plan (optimization)
Execute plan

4
Query Processing (2/2)
parser and translator
relational algebra expression
query
optimizer
evaluation engine
evaluation plan
output
data statistics
data
data
5
Relational Algebra (1/2)

Query language
Operations
select s
project p
union ?
difference -
product x
join

6
Relational Algebra (2/2)

SELECT FROM student WHERE namePaul
snamePaul(student)
pname( scidgt00112235(student) )
pname(scoursenameAdvanced DBs((student cid
takes) courseid course) )

7
Why Optimize?

Many alternative options to evaluate a query
pname(scoursenameAdvanced DBs((student cid
takes) courseid course) )
pname((student cid takes) courseid
scoursenameAdvanced DBs(course)) )
Several options to evaluate a single operation
snamePaul(student)
scan file
use secondary index on student.name
Multiple access paths
access path how can records be accessed

8
Evaluation plans

Specify which access path to follow
Specify which algorithm to use to evaluate
operator
Specify how operators interleave
Optimization
estimate the cost of each plan (not all plans)
select plan with lowest estimated cost

9
Estimating Cost

What needs to be considered
Disk I/Os
sequential
random
CPU time
Network communication
What are we going to consider
Disk I/Os
page reads/writes
Ignoring cost of writing final output

10
Operations and Costs
11
Operations and Costs (1/2)

Operations s, p, ?, ?, -, x,
Costs
NR number of records in R
LR size of record in R
FR blocking factor
number of records in page
BR number of pages to store relation R
V(A,R) number of distinct values of attribute A
in R
SC(A,R) selection cardinality of A in R
A key S(A,R)1
A nonkey S(A,R) NR / V(A,R)
HTi number of levels in index I
rounding up fractions and logarithms

12
Operations and Costs (2/2)

relation takes
700 tuples
student cid 8 bytes
course id 4 bytes
9 courses
100 students
page size 512 bytes
output size (in pages) of query which students
take the Advanced DBs course?
Ntakes 700
V(courseid, takes) 9
SC(courseid,takes) ceil( Ntakes/V(courseid,
takes) ) ceil(700/9) 78
f floor( 512/8 ) 64
B ceil( 78/64) 2 pages

13
Selection s (1/2)

Linear search
read all pages, find records that match (assuming
equality search)
average cost
nonkey BR, key 0.5BR
Binary search
on ordered field
average cost
m additional pages to be read
m ceil( SC(A,R)/FR ) - 1
Primary/Clustered Index
average cost
single record HTi 1
multiple records HTi ceil( SC(A,R)/FR )

14
Selection s (2/2)

Secondary Index
average cost
key field HTi 1
nonkey field
worst case HTi SC(A,R)
linear search more desirable if many matching
records

15
Complex selection sexpr

conjunctive selections
perform simple selection using ?i with the lowest
evaluation cost
e.g. using an index corresponding to ?i
apply remaining conditions ? on the resulting
records
cost the cost of the simple selection on
selected ?
multiple indices
select indices that correspond to ?is
scan indices and return RIDs
answer intersection of RIDs
cost the sum of costs record retrieval
disjunctive selections
multiple indices
union of RIDs
linear search

16
Projection and set operations

SELECT DISTINCT cid FROM takes
p requires duplicate elimination
sorting
set operations require duplicate elimination
R ? S
R ? S
sorting

17
Sorting

efficient evaluation for many operations
required by query
SELECT cid,name FROM student ORDER BY name
implementations
internal sorting (if records fit in memory)
external sorting

18
External Sort-Merge Algorithm (1/3)

Sort stage create sorted runs
i0
repeat
read M pages of relation R into memory
sort the M pages
write them into file Ri
increment i
until no more pages
N i // number of runs

19
External Sort-Merge Algorithm (2/3)

Merge stage merge sorted runs
//assuming N lt M
allocate a page for each run file Ri // N pages
allocated
read a page Pi of each Ri
repeat
choose first record (in sort order) among N
pages, say from page Pj
write record to output and delete from page Pj
if page is empty read next page Pj from Rj
until all pages are empty

20
External Sort-Merge Algorithm (3/3)

Merge stage merge sorted runs
What if N gt M ?
perform multiple passes
each pass merges M-1 runs until relation is
processed
in next pass number of runs is reduced
final pass generated sorted output

21
Sort-Merge Example
run
pass
R1
22
Sort-Merge cost

BR the number of pages of R
Sort stage 2 BR
read/write relation
Merge stage
initially runs to be merged
each pass M-1 runs sorted
thus, total number of passes
at each pass 2 BR pages are read
read/write relation
apart from final write
Total cost
2 BR 2 BR - BR

23
Projection

p?1,?2 (R)
remove unwanted attributes
scan and drop attributes
remove duplicate records
sort resulting records using all attributes as
sort order
scan sorted result, eliminate duplicates
(adjucent)
cost
initial scan sorting final scan

24
Join

pname(scoursenameAdvanced DBs((student cid
takes) courseid course) )
implementations
nested loop join
block-nested loop join
indexed nested loop join
sort-merge join
hash join

25
Nested loop join (1/2)

R S
for each tuple tR of R
for each tS of S
if (tR tS match) output tR.tS
end
end
Works for any join condition
S inner relation
R outer relation

26
Nested loop join (2/2)

Costs
best case when smaller relation fits in memory
use it as inner relation
BRBS
worst case when memory holds one page of each
relation
S scanned for each tuple in R
NR Bs BR

27
Block nested loop join (1/2)

for each page XR of R
foreach page XS of S
for each tuple tR in XR
for each tS in XS
if (tR tS match) output tR.tS
end
end
end
end

28
Block nested loop join (2/2)

Costs
best case when smaller relation fits in memory
use it as inner relation
BRBS
worst case when memory holds one page of each
relation
S scanned for each page in R
BR Bs BR

29
Indexed nested loop join

R S
Index on inner relation (S)
for each tuple in outer relation (R) probe index
of inner relation
Costs
BR NR c
c the cost of index-based selection of inner
relation
relation with fewer records as outer relation

30
Sort-merge join

R S
Relations sorted on the join attribute
Merge sorted relations
pointers to first record in each relation
read in a group of records of S with the same
values in the join attribute
read records of R and process
Relations in sorted order to be read once
Cost
cost of sorting BS BR

31
Hash join

R S
use h1 on joining attribute to map records to
partitions that fit in memory
records of R are partitioned into R0 Rn-1
records of S are partitioned into S0 Sn-1
join records in corresponding partitions
using a hash-based indexed block nested loop join
Cost 2(BRBS) (BRBS)

32
Exercise joins

R S
NR215
BR 100
NS26
BS 30
B index on S
order 4
full nodes
nested loop join best case - worst case
block nested loop join best case - worst case
indexed nested loop join

33
Evaluation

evaluate multiple operations in a plan
materialization
pipelining

34
Materialization

create and read temporary relations
create implies writing to disk
more page writes

35
Pipelining (1/2)

creating a pipeline of operations
reduces number of read-write operations
implementations
demand-driven - data pull
producer-driven - data push

36
Pipelining (2/2)

can pipelining always be used?
any algorithm?
cost of R S
materialization and hash join BR 3(BRBS)
pipelining and indexed nested loop join NR HTi

courseid
pipelined
materialized
R
S
scoursenameAdvanced DBs
cid
student
takes
course
37
Query Optimization
38
Choosing evaluation plans

cost based optimization
enumeration of plans
R S T, 12 possible orders
cost estimation of each plan
overall cost
cannot optimize operation independently

39
Cost estimation

operation (s, p, )
implementation
size of inputs
size of outputs
sorting

40
Size Estimation (1/2)

SC(A,R)
multiplying probabilities
probability that a record satisfy none of ?

41
Size Estimation (2/2)

R x S
NR NS
R S
R ? S ? NR NS
R ? S key for R maximum output size is Ns
R ? S foreign key for R NS
R ? S A, neither key of R nor S
NRNS / V(A,S)
NSNR / V(A,R)

42
Expression Equivalence

conjunctive selection decomposition
commutativity of selection
combining selection with join and product
s?1(R x S) R ?1 S
commutativity of joins
R ?1 S S ?1 R
distribution of selection over join
s?1?2(R S) s?1(R) s?2 (S)
distribution of projection over join
pA1,A2(R S) pA1(R) pA2 (S)
associativity of joins R (S T) (R S)
T

43
Cost Optimizer (1/2)

transforms expressions
equivalent expressions
heuristics, rules of thumb
perform selections early
perform projections early
replace products followed by selection s (R x S)
with joins R S
start with joins, selections with smallest result
create left-deep join trees

44
Cost Optimizer (2/2)
pname
ccourseid index-nested loop
scoursenam Advanced DBs
cid hash join
student
takes
course
45
Cost Evaluation Exercise

pname(scoursenameAdvanced DBs((student cid
takes) courseid course) )
R student cid takes
S course
NS 10 records
assume that on average there are 50 students
taking each course
blocking factor 2 records/page
what is the cost of scoursenameAdvanced DBs (R
courseid S)
what is the cost of R scoursenameAdvanced
DBsS
assume relations can fit in memory

46
Summary

Estimating the cost of a single operation
Estimating the cost of a query plan
Optimization
choose the most efficient plan

Write a Comment

User Comments (0)

About PowerShow.com

Recommended Relevance Latest Highest Rated Most Viewed

Sort by:

Related More from user

CrystalGraphics Presentations

Introducing-PowerShowcom PowerPoint PPT Presentation

Introducing-PowerShowcom - Introducing-PowerShowcom (Without Music)

CrystalGraphics 3D Character Slides for PowerPoint PowerPoint PPT Presentation

CrystalGraphics 3D Character Slides for PowerPoint - CrystalGraphics 3D Character Slides for PowerPoint

Chart and Diagram Slides for PowerPoint PowerPoint PPT Presentation

Chart and Diagram Slides for PowerPoint - Beautifully designed chart and diagram s for PowerPoint with visually stunning graphics and animation effects. Our new CrystalGraphics Chart and Diagram Slides for PowerPoint is a collection of over 1000 impressively designed data-driven chart and editable diagram s guaranteed to impress any audience. They are all artistically enhanced with visually stunning color, shadow and lighting effects. Many of them are also animated. And they’re ready for you to use in your PowerPoint presentations the moment you need them. – PowerPoint PPT presentation

Related Presentations

Lecture 6: Query Processing; Hurry up! PowerPoint PPT Presentation

Lecture 6: Query Processing; Hurry up! - Lecture 6: Query Processing; Hurry up! Join Algorithms (ctd.) Sort-Merge External Sorting Costs and Complexities Mechanics Parsing Optimization Overview | PowerPoint PPT presentation | free to view

Query processing and optimization PowerPoint PPT Presentation

Query processing and optimization - Query processing and optimization Definitions Query processing translation of query into low-level activities evaluation of query data extraction Query optimization ... | PowerPoint PPT presentation | free to view

Multi-Query Optimization and Applications PowerPoint PPT Presentation

Multi-Query Optimization and Applications - Multi-Query Optimization and Applications Prasan Roy Indian Institute of Technology - Bombay Motivation Queries often involve repeated computation Queries on ... | PowerPoint PPT presentation | free to view

Chapter 15 Algorithms for Query Processing and Optimization PowerPoint PPT Presentation

Chapter 15 Algorithms for Query Processing and Optimization - Typical steps in processing a high-level query. Query in a high-level query language like SQL ... WHERE PNAME= Aquarius' AND PNUMBER=PNO AND ESSN=SSN AND BDATE. ... | PowerPoint PPT presentation | free to view

Top-k Query Processing PowerPoint PPT Presentation

Top-k Query Processing - Top-k Query Processing Optimal aggregation algorithms for middleware Ronald Fagin, Amnon Lotem, and Moni Naor + Sushruth P. + Arjun Dasgupta | PowerPoint PPT presentation | free to view

Query Optimization In Compressed Database Systems PowerPoint PPT Presentation

Query Optimization In Compressed Database Systems - Query Optimization In Compressed Database Systems Zhiyuan Chen and Johannes Gehrke Cornell University Flip Korn AT&T Labs Why Compression? CPU speed outpaces Disk ... | PowerPoint PPT presentation | free to view

Adaptive Query Processing PowerPoint PPT Presentation

Adaptive Query Processing - Actuate. Measure: monitor cardinalities, resource utilization, etc ... Actuate: switch to a new query plan, hopefully without wasting too much work ... | PowerPoint PPT presentation | free to view

Adaptive Query Processing in the Looking Glass PowerPoint PPT Presentation

Adaptive Query Processing in the Looking Glass - ... Challenging meta problems in AQP for continuous queries need to be addressed Larger and more complex plan ... queries need to ... Correlated & skewed data ... | PowerPoint PPT presentation | free to view

Distributed Query Processing PowerPoint PPT Presentation

Distributed Query Processing - Ship tuples in block-wise fashion (batch); smooth burstiness. Query Execution Techniques for DD ... optimizers do not scale for 10,000s of sites ... | PowerPoint PPT presentation | free to view

Query Optimization Over Web Services PowerPoint PPT Presentation

Query Optimization Over Web Services - Query Optimization Over ... /Ki* is minimized Profiling combined with query processing for trying out various chunk ... Let WS1, . . . , WSn be a plan with a ... | PowerPoint PPT presentation | free to view

The State of the Art in Distributed Query Processing PowerPoint PPT Presentation

The State of the Art in Distributed Query Processing - The State of the Art in Distributed Query Processing. by Donald Kossmann ... Client-server distributed DB models. Heterogeneous distributed DB models ... | PowerPoint PPT presentation | free to view

Query Processing and Networking Infrastructures PowerPoint PPT Presentation

Query Processing and Networking Infrastructures - Day 2: Seed some cross-fertilized research. Especially with networking ... Day 2: Research Synergies w/Networking. Queries as indirection, revisited ... | PowerPoint PPT presentation | free to view

Eddies: Continuously Adaptive Query processing PowerPoint PPT Presentation

Eddies: Continuously Adaptive Query processing - Eddies: Continuously Adaptive Query processing R. Avnur, J.M. Hellerstein UCB ACM Sigmod 2000 Problem Statement Context: large federated and shared-nothing databases ... | PowerPoint PPT presentation | free to view

Top-k Query Processing - Top-k Query Processing Optimal aggregation algorithms for middleware Ronald Fagin, Amnon Lotem, and Moni Naor + Sushruth P. + Arjun Dasgupta NRA What if we also want ... | PowerPoint PPT presentation | free to view

Spatial Database Query Processing using Indices PowerPoint PPT Presentation

Spatial Database Query Processing using Indices - Spatial Database Query Processing using Indices Donghui Zhang CCIS Ph.D. Seminar CCIS, Northeastern University Content Spatial Database Selection Query R-tree k-d-B ... | PowerPoint PPT presentation | free to view

Chapter 14 Query Optimization PowerPoint PPT Presentation

Chapter 14 Query Optimization - Chapter 14 Query Optimization Chapter 14: Query Optimization Introduction Catalog Information for Cost Estimation Estimation of Statistics Transformation of ... | PowerPoint PPT presentation | free to view

Understanding MySQL Slow Query Logs for Database Performance PowerPoint PPT Presentation

Understanding MySQL Slow Query Logs for Database Performance - In this PPT, we will look into what the MySQL slow query logs show us. We will also look into configuring it and ways to use it. | PowerPoint PPT presentation | free to view

benefits of social media optimization PowerPoint PPT Presentation

benefits of social media optimization - Social Media Optimization has been creating a buzz in the market since its inception. We provide Social Media Optimization service as well as training program to students and marketing professional in Mumbai, India. Social Media Optimization is the latest tool that has been proven for bringing in referral traffic to a website using various social media tools. | PowerPoint PPT presentation | free to view

What is (SEO) Search Engine Optimization Introduction PowerPoint PPT Presentation

What is (SEO) Search Engine Optimization Introduction - SEO(Search engine optimization) is a Way which helps search engines find and rank your site higher than the Ultimate millions of other sites in response to a search query. http://www.seopitara.com/what-is-search-engine-optimization-seo/ | PowerPoint PPT presentation | free to view

Openxcell conducts a successful webinar on Magento Optimization PowerPoint PPT Presentation

Openxcell conducts a successful webinar on Magento Optimization - Magento optimization is an important aspect of ecommerce. Read more to find out Openxcell’s recent webinar on magento optimization. Read more! | PowerPoint PPT presentation | free to view

Database Performance Tuning and Query Optimization PowerPoint PPT Presentation

Database Performance Tuning and Query Optimization - Chapter 11 Database Performance Tuning and Query Optimization Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel | PowerPoint PPT presentation | free to view

Oracle10g SQL Optimization PowerPoint PPT Presentation

Oracle10g SQL Optimization - Dinesh Das Server Technologies Oracle Corporation Agenda Optimizer Enhancements More comprehensive cost model: includes CPU and I/O cost, dynamic sampling Query ... | PowerPoint PPT presentation | free to view

Complete Magento Store Performance Optimization Process PowerPoint PPT Presentation

Complete Magento Store Performance Optimization Process - Complete Magento 2 Speed Optimization Process to Speed up your eCommerce Store. Our Magento 2 Developers follow each complex strategy to get the best results. read more at: https://www.vihadigitalcommerce.com/speed-up-magento-store-with-performance-optimization-techniques-part-1/ | PowerPoint PPT presentation | free to view

Implementation and Research Issues in Query Processing for Wireless Sensor Networks PowerPoint PPT Presentation

Implementation and Research Issues in Query Processing for Wireless Sensor Networks - 1. Implementation and Research Issues in Query Processing for ... Under the hood: Intelligent query processing: query optimization, power efficient execution ... | PowerPoint PPT presentation | free to view

Identifying SQL Query Slow Performance: MySQL and PostgreSQL PowerPoint PPT Presentation

Identifying SQL Query Slow Performance: MySQL and PostgreSQL - One of the leading factors for bad database performance is slow SQL queries. Slow queries directly impact database performance, which also causes many types of application performance issues. As DBAs, it is critical to monitor, find, and rectify slow queries. Addressing SQL query slow performance is one of the top tasks performed as a part of the database optimization process. In this PPT, we will talk about slow queries and how to handle them for some databases. | PowerPoint PPT presentation | free to view

Speculative Query Processing PowerPoint PPT Presentation

Speculative Query Processing - Train on recent actions of the user. Large learning rate. Probability of joins ... Web queries. Multi-user database. Client collaboration. Use speculation in ... | PowerPoint PPT presentation | free to view