Title: Erhan Erdin
1 Computer Architecture Support for Database
Applications
2 Outline
- Introduction
- Methodology of the Experiment
- Analysis of OLTP workloads
- Analysis of DSS workloads
- Conclusion
3 Introduction
- Today, database workloads alone motivate the sale
of vast quantities of symmetric multiprocessor
(SMP) machines.
4 Introduction
- Unfortunately, due to some challenges,
commercial applications are often ignored in
favor of technical benchmarks, such as
SPEC (Standard Performance Evaluation Corporation).
- Reasons
- Complex standardized benchmarks.
- Large hardware requirements for full scale.
- Numerous configuration parameters.
- Lack of useful proprietary information.
5 What is SMP
- A method of work management that treats all
processors equally.
- Threads can run concurrently on any
available processor.
- Improves the total throughput of the system.
- Requires applications that can take advantage of
multi-threaded parallelism.
6 SMP ARCHITECTURE
7 SMP (Continued)
- Advantages of SMP
- High performance
- Simplicity to program
- Easier load balancing
- Disadvantages of SMP
- Low availability
- Low scalability
8 Database Workloads
- OLTP (Online Transaction Processing)
- Ex: Airline reservation systems
- DSS (Decision Support Systems)
- Ex: Data warehouse systems
9 Characteristics of OLTP and DSS
- OLTP
- Uses short, moderately complex queries that read
and/or modify a relatively small portion of the
overall database.
- Has a high degree of multiprogramming.
- DSS
- Uses typically long-running, moderately to very
complex queries that scan large portions of the
database in a read-mostly fashion.
- The multiprogramming level in DSS systems is
typically much lower than that of OLTP systems.
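The contrast between the two query styles can be sketched with Python's built-in sqlite3 module. The table name `accounts`, its columns, and the row count are hypothetical, chosen only for illustration; a real OLTP or DSS system would be far larger and run many such queries concurrently.

```python
import sqlite3

def build_db(n_rows=1000):
    """Create an in-memory table with n_rows hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, region TEXT, balance INTEGER)"
    )
    rows = [(i, "east" if i % 2 == 0 else "west", 100) for i in range(n_rows)]
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def oltp_transfer(conn, src, dst, amount):
    """OLTP-style: a short transaction that touches two rows by primary key."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )

def dss_report(conn):
    """DSS-style: a read-only aggregate that scans the whole table."""
    cur = conn.execute(
        "SELECT region, SUM(balance), COUNT(*) FROM accounts "
        "GROUP BY region ORDER BY region"
    )
    return cur.fetchall()

conn = build_db()
oltp_transfer(conn, src=0, dst=1, amount=25)
report = dss_report(conn)
```

The transfer reads and modifies two rows, while the report reads every row and modifies none, which is the access-pattern difference the slide describes.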
10 Motivation
- Since SPEC evaluations don't hold for DBMS,
the architectural behavior of two standard database
workloads will be investigated in terms of
- cycles per instruction (CPI) decomposition,
- cache miss rates,
- branch behavior,
- superscalar issue and retire,
- out-of-order execution.
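As a rough illustration of what a CPI decomposition computes, the sketch below divides total cycles by retired instructions and attributes part of the result to stall events. All counter values and per-event penalties are hypothetical, not measurements from the study, and the event names are illustrative rather than real hardware counter names. Because an out-of-order processor can overlap stalls with useful work, per-event attributions of this kind are best read as upper bounds.

```python
def cpi_breakdown(cycles, instructions, events, penalties):
    """Return (total CPI, per-component CPI dict).

    events:    event name -> number of occurrences
    penalties: event name -> assumed stall cycles per occurrence
    Cycles not attributed to a listed event are reported as 'other'.
    """
    total = cycles / instructions
    parts = {
        name: events[name] * penalties[name] / instructions for name in events
    }
    parts["other"] = total - sum(parts.values())
    return total, parts

# Hypothetical counter readings and penalties, for illustration only.
total, parts = cpi_breakdown(
    cycles=3_000_000,
    instructions=1_000_000,
    events={"l1i_miss": 50_000, "l2_miss": 10_000, "branch_mispredict": 20_000},
    penalties={"l1i_miss": 4, "l2_miss": 60, "branch_mispredict": 15},
)
```

The components sum to the total CPI by construction, so the breakdown shows which stall source would be most worth reducing.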
11 Methodology: Experimental Platform
- A commodity four-processor Intel-based SMP
server running Windows NT is chosen.
13 I/O System Configurations (OLTP)
14 I/O System Configurations (DSS)
15 Software Architecture (OLTP)
- Transaction Processing Council's TPC-C benchmark
16 Software Architecture (OLTP)
17 Software Architecture (DSS)
- Transaction Processing Council's TPC-D benchmark
- Models the activity of a wholesale supplier doing
complex business analysis.
- Analyses cover pricing and promotions, market share
study, shipping management, supply and demand
management, profit and revenue management, and
customer satisfaction study.
- 17 read-only queries and 2 update queries.
18 Software Architecture (DSS)
19 Pentium Pro Processor Architecture
20 Potential sources of stalls
- misses to the L1 instruction cache
- a branch misprediction
- the instruction mix of the workload
- the out-of-order execution engine
21 Measurement Methodology
- NT performance monitor
- Pentium Pro hardware counters.
- Intel tool called emon
22 Analysis of OLTP Workloads
- OLTP does short, moderately complex transactions.
- Small, random I/O operations.
- Large number of concurrent users, a high degree
of multiprogramming.
- The database implements locking and logging.
- The combination of these tasks results in
- a large instruction working set,
- a larger data footprint.
23 Experimental Results: CPI
24 Experimental Results: Memory System Behavior
- How do OLTP cache miss rates vary with L2 cache
size?
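The kind of measurement behind this question can be imitated with a small trace-driven simulation. The sketch below models a fully associative LRU cache, which is a simplification (the hardware's L2 is set-associative), and the reference trace is synthetic: a hypothetical mix of a small hot set and a large cold set, not an OLTP trace.

```python
import random
from collections import OrderedDict

def lru_miss_rate(trace, capacity):
    """Miss rate of a fully associative LRU cache holding `capacity` blocks."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return misses / len(trace)

rng = random.Random(0)
# Hypothetical trace: 80% of references hit 64 hot blocks,
# the rest are spread over 4096 cold blocks.
trace = [
    rng.randrange(64) if rng.random() < 0.8 else 64 + rng.randrange(4096)
    for _ in range(20_000)
]
sizes = (16, 64, 256, 1024, 8192)
rates = {size: lru_miss_rate(trace, size) for size in sizes}
```

With LRU replacement the miss rate never rises as capacity grows, and once the cache holds every distinct block only compulsory (first-touch) misses remain.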
25 Experimental Results: Memory System
- What effects do larger caches have on OLTP
throughput and stall cycles?
26 Experimental Results: Processor Issues
- How useful is superscalar issue and retire for
OLTP?
27 Experimental Results: Processor Issues
- How effective is branch prediction for OLTP?
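To show how a misprediction rate is obtained, the sketch below simulates a textbook table of 2-bit saturating counters. This is deliberately simpler than the Pentium Pro's actual predictor and is not meant to model it; the branch addresses and outcome patterns are made up for illustration.

```python
def mispredict_rate(outcomes, table_bits=4):
    """Simulate 2-bit saturating counters indexed by low branch-address bits.

    outcomes: iterable of (branch_address, taken) pairs.
    """
    table = [1] * (1 << table_bits)  # every counter starts at 'weakly not taken'
    mask = (1 << table_bits) - 1
    wrong = total = 0
    for addr, taken in outcomes:
        idx = addr & mask
        predicted_taken = table[idx] >= 2
        wrong += predicted_taken != taken
        total += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        table[idx] = min(3, table[idx] + 1) if taken else max(0, table[idx] - 1)
    return wrong / total

# A loop branch: taken 9 times, then not taken, repeated 100 times.
loop = [(0x40, i % 10 != 9) for i in range(1000)]
# A branch that strictly alternates between taken and not taken.
alternating = [(0x44, i % 2 == 0) for i in range(1000)]

loop_rate = mispredict_rate(loop)
alternating_rate = mispredict_rate(alternating)
```

The loop branch is mispredicted only at warm-up and at each loop exit, while the alternating branch defeats this scheme on every execution, which is why the regularity of a workload's branches matters so much.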
28 Experimental Results: Processor Issues
- Is out-of-order execution successful at hiding
stalls for OLTP?
29 Experimental Results: Multiprocessor Scaling
Issues
- How well does OLTP performance scale as the
number of processors increases?
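Scaling questions like this one are usually answered with speedup (throughput relative to the one-processor run) and efficiency (speedup divided by processor count). The sketch below computes both; the throughput figures are hypothetical and are not results from the study.

```python
def scaling(throughput_by_cpus):
    """Map processor count -> (speedup, efficiency) relative to the 1-CPU run."""
    base = throughput_by_cpus[1]
    return {
        n: (tps / base, tps / base / n)
        for n, tps in sorted(throughput_by_cpus.items())
    }

# Hypothetical transactions-per-second figures, for illustration only.
result = scaling({1: 100.0, 2: 190.0, 4: 340.0})
```

An efficiency below 1.0 means each added processor contributes less than the first one did, typically because of contention for shared resources such as the memory system.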
30 Experimental Results: Multiprocessor Scaling
Issues
- How do OLTP CPI components change as the number
of processors is scaled?
31 Experimental Results: Multiprocessor Scaling
Issues
- How prevalent are cache misses to dirty data in
other processors' caches for OLTP?
32 Experimental Results: Multiprocessor Scaling
Issues
- Is the four-state (MESI) invalidation-based cache
coherence protocol worthwhile for OLTP?
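The difference between the four-state and three-state protocols comes down to the Exclusive state, which lets a processor write a line it alone has read without a further bus transaction. The sketch below is a simplified model of a single cache line that only counts bus transactions; it ignores data transfer, write-backs, and timing, and the function name is illustrative.

```python
def bus_transactions(ops, n_cpus, use_exclusive):
    """Count bus transactions for one cache line under MSI or MESI.

    ops: sequence of (cpu, 'r' or 'w'). Line states: 'M', 'E', 'S', 'I'.
    use_exclusive=True models MESI; False models MSI.
    """
    state = ["I"] * n_cpus
    bus = 0
    for cpu, op in ops:
        others = [c for c in range(n_cpus) if c != cpu]
        if op == "r":
            if state[cpu] == "I":
                bus += 1  # read miss goes to the bus
                shared = any(state[c] != "I" for c in others)
                for c in others:
                    if state[c] in ("M", "E"):
                        state[c] = "S"  # previous owner downgrades to shared
                state[cpu] = "E" if (use_exclusive and not shared) else "S"
        else:
            if state[cpu] in ("I", "S"):
                bus += 1  # write miss or upgrade: invalidate other copies
                for c in others:
                    state[c] = "I"
            # A write to a line in E or M needs no bus transaction.
            state[cpu] = "M"
    return bus

private = [(0, "r"), (0, "w")]            # one CPU reads, then writes
shared = [(0, "r"), (1, "r"), (0, "w")]   # a second CPU reads in between

private_mesi = bus_transactions(private, n_cpus=4, use_exclusive=True)
private_msi = bus_transactions(private, n_cpus=4, use_exclusive=False)
shared_mesi = bus_transactions(shared, n_cpus=4, use_exclusive=True)
shared_msi = bus_transactions(shared, n_cpus=4, use_exclusive=False)
```

In this model the Exclusive state saves a transaction only for data that one processor reads and then writes with no intervening sharer, so its value depends on how common that pattern is in the workload.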
33 Experimental Results: Multiprocessor Scaling
Issues
- How does OLTP memory system performance scale
with increasing cache sizes and increasing
processor count?
34 Analysis of Decision Support Workloads
- DSS queries are typically long-running,
moderately to very complex queries.
- Scan large portions of the database in a
read-mostly fashion.
- Large sequential disk I/O read operations.
- The multiprogramming level in DSS systems is
typically lower than that of OLTP systems.
35 DSS Workload
36 Experimental Results: Memory System Behavior
- How do DSS cache miss rates vary with L2 cache
size?
37 Experimental Results: Memory System Behavior
- What impact do larger L2 caches have on DSS
database performance and stall cycles?
38 Experimental Results: Memory System Behavior
- How prevalent are cache misses to dirty data in
other processors' caches in DSS?
39 Experimental Results: Memory System Behavior
- Is the four-state (MESI) invalidation-based cache
coherence protocol worthwhile for DSS?
40 Experimental Results: Memory System Behavior
- How does DSS memory system performance scale with
increasing cache sizes?
41 Experimental Results: Processor Issues
- How useful is superscalar issue and retire for
DSS?
- Behaves like OLTP.
42 Experimental Results: Processor Issues
- How effective is branch prediction for DSS?
43 Experimental Results: Processor Issues
- Is out-of-order execution successful at hiding
stalls for DSS?
44 Conclusions for OLTP
- Out-of-order execution is only somewhat effective
for this database workload.
- Increased superscalar width for the out-of-order
engine may be helpful.
- Innovation is needed in branch prediction algorithms
and hardware structures to better support
database workloads.
- Caches are effective at reducing the processor
traffic to memory.
- A three-state (MSI) cache coherence protocol would
be better.
- The amount of time when the memory system is
unavailable decreases with larger caches and
increases with the number of processors.
45 Conclusions for DSS
- Out-of-order execution provides potentially more
benefit for DSS than OLTP.
- DSS performance is less sensitive to L2 cache
size than OLTP performance.
- Existing branch prediction schemes are more
effective for this workload.
- Increasing the micro-operation retire width in
the Pentium Pro's out-of-order RISC core may
provide performance improvements.
- Dirty misses are less prevalent for DSS than
OLTP.