Title: Admission Control and Request Scheduling in Dynamic E-Commerce Web Sites
1Admission Control and Request Scheduling in
Dynamic E-Commerce Web Sites
- Sameh Elnikety, Erich Nahum,
- John Tracey, Willy Zwaenepoel
C.S. Dept., EPFL
IBM T.J. Watson Research Center
2Dynamic Content
3Increasing Online Commerce
- $11B in 3rd quarter 2002 (up 37%)
- $11B in last 2 months of 2002 (up 40%)
(Source: News.com)
4Two Key Problems
- Overloaded Web Sites
- The Slashdot Effect
- Unanticipated load causes site to crash
- Unresponsive Web Sites
- The Abandoned Shopping Cart
- Unacceptable delays lead to reduced usage
- Reduced usage leads to reduced revenue
How can we address these problems for dynamic
sites?
5Generating Dynamic Content
[Diagram: http request -> Web Server -> Dynamic Content Generator -> Database Server]
- Consists of 3 components
- Web Server: static content
- Dynamic Content Generator: Java servlets
- DB Server: state of the business
6Outline
- Motivation & Background
- The Gatekeeper Proxy
- Admission Control
- Request Scheduling
- Experimental Environment
- Results
- Summary and Conclusions
7Admission Control
- To prevent overload, perform admission control
- Notion of capacity in the system
- Identify ahead of time the amount of work a job generates
- Only let jobs in if they won't overload the system
- Once you reach full capacity
- Make jobs wait
- Drop jobs
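The admission rule above can be sketched in a few lines. This is a minimal illustration, not the Gatekeeper implementation: the class name `AdmissionController`, the `drop_when_full` flag, and the work units are all assumptions made for the example.

```python
from collections import deque


class AdmissionController:
    """Admit a job only if its estimated work fits under the capacity."""

    def __init__(self, capacity, drop_when_full=False):
        self.capacity = capacity              # max work units before overload
        self.load = 0.0                       # work units currently admitted
        self.drop_when_full = drop_when_full  # choose "drop jobs" over "make jobs wait"
        self.waiting = deque()                # jobs held back while at capacity

    def submit(self, job_id, work):
        """Return 'admitted', 'queued', or 'dropped' for a new job."""
        if self.load + work <= self.capacity:
            self.load += work
            return "admitted"
        if self.drop_when_full:
            return "dropped"
        self.waiting.append((job_id, work))
        return "queued"

    def complete(self, work):
        """Release a finished job's work, then admit waiting jobs that now fit."""
        self.load -= work
        admitted = []
        while self.waiting and self.load + self.waiting[0][1] <= self.capacity:
            job_id, queued_work = self.waiting.popleft()
            self.load += queued_work
            admitted.append(job_id)
        return admitted
```

Note that in this sketch a queued job whose work exceeds the total capacity would wait forever; a real system would need to cap or reject such jobs.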
8The Gatekeeper Transparent Proxy
[Diagram: http request -> Web Server -> Dynamic Content Generator -> Gatekeeper -> Database Server]
- Transparently intercepts DB requests
- connections to the DB via the JDBC interface
- Maintains several measurement-based estimates
- Total capacity of the database
- Current estimate of DB load
- Work generated by each query type
9Estimating Work by Query Type
- Key Observations
- Queries of the same type take (roughly) the same time
- Different queries differ greatly in execution time
- Any web site has a finite number of query types
- Gatekeeper maintains per-query work estimates
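One way to keep per-query-type estimates is to collapse each SQL string into a template and track a running average of its measured execution time. The sketch below assumes an exponentially weighted moving average and a regex-based `query_type` helper; both are illustrative choices, and the slides do not say how Gatekeeper computes its estimates.

```python
import re


def query_type(sql):
    """Collapse literals so queries that differ only in constants share a type."""
    sql = re.sub(r"'[^']*'", "?", sql)            # string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)    # numeric literals
    return " ".join(sql.split()).lower()


class WorkEstimator:
    """Running estimate of execution time (ms) per query type."""

    def __init__(self, alpha=0.2, default_ms=100.0):
        self.alpha = alpha            # weight of the newest measurement (assumed)
        self.default_ms = default_ms  # guess for a type never seen before (assumed)
        self.estimates = {}

    def estimate(self, sql):
        return self.estimates.get(query_type(sql), self.default_ms)

    def record(self, sql, elapsed_ms):
        key = query_type(sql)
        old = self.estimates.get(key)
        if old is None:
            self.estimates[key] = elapsed_ms
        else:
            self.estimates[key] = (1 - self.alpha) * old + self.alpha * elapsed_ms
```

Because a site has a finite number of query types, the `estimates` table stays small.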
10Service Time Distributions
11Service Time Distributions
12TPC-W Execution Times
(note: times are on a log scale)
13Estimating System Capacity
- Query execution time = load, or work units, of a job
- Database capacity = max work units before overload
- Rough approximation
- Unit approximates resource usage
- Use binary search to determine capacity
- More elaborate methods (adaptive, control-theoretic, etc.)
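A binary search over the capacity setting might look like the sketch below. It assumes a hypothetical probe `is_overloaded(capacity)` that runs the workload with the given limit and reports whether the database became overloaded, and it assumes that probe is monotone (once a limit overloads the database, every larger limit does too). The slides do not describe the actual probe.

```python
def find_capacity(is_overloaded, low, high, tolerance=1.0):
    """Largest capacity in [low, high] that does not overload, within tolerance.

    Assumes is_overloaded(low) is False, is_overloaded(high) is True,
    and is_overloaded is monotone in its argument.
    """
    while high - low > tolerance:
        mid = (low + high) / 2
        if is_overloaded(mid):
            high = mid   # too much work admitted: search lower half
        else:
            low = mid    # still healthy: search upper half
    return low
```

For example, `find_capacity(lambda c: c > 730, 0, 4096)` converges to a value just below the (made-up) true limit of 730 work units after about a dozen probes.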
14Admission Control - Example
15Scheduling Theory and Practice
- Theory: SRPT scheduling is best
- SRPT = shortest remaining processing time
- Proven to have minimum mean response time (Schrage 68)
- Assumes perfect prediction of work costs
- Assumes pre-emption has zero overhead and does not affect service time
- Practice: not so simple
- Pre-emption isn't free (context switch costs, cache affinity)
- Priorities and inheritance
- Deadlock (e.g., Q1 is holding a lock when pre-empted)
- Gatekeeper
- Use shortest job first (SJF) policy
- Once a job (query) is admitted, it is never pre-empted
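A non-pre-emptive SJF queue can be sketched with a binary heap keyed on the estimated cost. The class name `SJFQueue` and the job labels are illustrative assumptions; the point is only that jobs are ordered while waiting and never interrupted once popped.

```python
import heapq
import itertools


class SJFQueue:
    """Waiting queue that releases the job with the smallest estimated cost first."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO among equal estimates

    def push(self, job, estimated_ms):
        heapq.heappush(self._heap, (estimated_ms, next(self._seq), job))

    def pop(self):
        """Remove and return the shortest waiting job; it then runs to completion."""
        _, _, job = heapq.heappop(self._heap)
        return job

    def __len__(self):
        return len(self._heap)
```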
16Request Scheduling - Example
- Two jobs with service times 500 and 10
- Long job first: (0 + 500) + (500 + 10) = 1010, average 505
- Short job first: (0 + 10) + (10 + 500) = 520, average 260
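The arithmetic of the two-job example (service times 500 and 10) can be checked with a short script. The function name `mean_response_time` is an illustrative choice; it assumes both jobs are present at time 0 and run back to back on a single server.

```python
def mean_response_time(service_times):
    """Mean response time when jobs run back to back in the given order."""
    clock = 0
    total = 0
    for service in service_times:
        clock += service   # this job finishes after all earlier jobs plus itself
        total += clock     # response time = waiting time + service time
    return total / len(service_times)


fifo = mean_response_time([500, 10])          # long job happens to be first
sjf = mean_response_time(sorted([500, 10]))   # shortest job runs first
print(fifo, sjf)  # prints: 505.0 260.0
```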
17Outline
- Motivation & Background
- The Gatekeeper Proxy
- Experimental Environment
- Software & Hardware
- Metrics & Methodology
- Results
- Summary and Conclusions
18Workload Generation
- Workload generators are typically used for experimental server performance evaluation
- Many available for use with static content
- WebStone, SPECweb, SURGE, httperf, WaspClient
- Only 1 available for e-commerce: TPC-W
19TPC-W
- Transaction Processing Performance Council (TPC) benchmark: TPC-W
- TPC is better known for database workloads like TPC-D
- Provides specification, not source
- Use the implementation from the Dynaserver project at Rice
- Models a large e-commerce site such as Amazon
- Web serving, searching, browsing, shopping carts
- Secure purchasing (SSL), best sellers, new products
- Customer registration, administrative updates
- Persistent data
- Static images on Web Server
- All others on back-end database
20TPC-W Snapshot
[Screenshot of a TPC-W page with callouts: Image, Promo, Shopping Cart, Next Interaction]
21TPC-W Interactions
- 14 Interactions, e.g.
- Home (read-only query)
- Best sellers (complex)
- Secure payment (SSL)
- Shopping cart (update query)
- Workload Mixes
- Browsing (95% read-only)
- Shopping (80% read-only)
- Ordering (50% read-only)
22TPC-W Queries
- Simple query (3 ms):
  SELECT c_uname FROM customer WHERE c_id = 10
- Complex query (4000 ms):
  SELECT i_id, i_title, a_fname, a_lname
  FROM item, author, order_line
  WHERE item.i_id = order_line.ol_i_id
  AND item.i_a_id = author.a_id
  AND order_line.ol_o_id >
  (SELECT MAX(o_id)-3333 FROM orders)
  AND item.i_subject = 'ARTS'
  GROUP BY i_id, i_title, a_fname, a_lname
  ORDER BY SUM(ol_qty) DESC
  FETCH FIRST 50 ROWS ONLY
23TPC-W Frequencies
24Software
[Diagram: http request -> Web Server -> Dynamic Content Generator -> Database Server]
25Hardware
[Diagram: http -> Apache / Tomcat -> sql -> MySQL / DB2]
26Emulated Clients
[Diagram: Emulated Clients -> http -> Apache / Tomcat -> sql -> MySQL / DB2]
- Remote Browser Emulator
- Session duration
- Think time
- Markov model
- Load is a function of the number of clients
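A remote browser emulator of this kind can be sketched as a random walk over page types with a random pause between clicks. Everything concrete below is an assumption for illustration: the `TRANSITIONS` table covers only four made-up states with made-up probabilities (the real TPC-W matrix covers all 14 interactions), think times are drawn from an exponential distribution, and server response time is ignored.

```python
import random

# Hypothetical transition probabilities; NOT the TPC-W matrix.
TRANSITIONS = {
    "home": [("search", 0.6), ("best_sellers", 0.3), ("home", 0.1)],
    "search": [("home", 0.5), ("shopping_cart", 0.5)],
    "best_sellers": [("home", 0.7), ("shopping_cart", 0.3)],
    "shopping_cart": [("home", 1.0)],
}


def emulate_session(rng, session_s, mean_think_s, start="home"):
    """Return [(time_s, interaction), ...] for one emulated browser session."""
    now, state, trace = 0.0, start, []
    while now < session_s:
        trace.append((now, state))                    # issue the request
        now += rng.expovariate(1.0 / mean_think_s)    # think before the next click
        pages, weights = zip(*TRANSITIONS[state])
        state = rng.choices(pages, weights=weights)[0]  # Markov step
    return trace


trace = emulate_session(random.Random(0), session_s=600, mean_think_s=7.0)
```

Running many such sessions concurrently is what makes the offered load a function of the number of clients.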
27Experiments
- Performance Metrics
- Throughput (interactions/minute)
- Response time (msec, submission to completion)
- Examine each as a function of load (# of clients)
- Examine two locking approaches
- Locking in the database (slower, more general)
- Locking in the application server (faster, less general)
- Methodology
- Average of 5 runs
- Each run lasts 600 seconds
- Measurement starts after 100-second warm-up
- 90% confidence intervals
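For five runs, a 90% confidence interval on the mean is commonly computed with the Student-t distribution. The sketch below assumes that method (the slides do not say which one was used) and uses made-up throughput numbers; 2.132 is the two-sided 90% critical value for 4 degrees of freedom.

```python
import math
import statistics

T_CRIT_90_DF4 = 2.132  # two-sided 90% Student-t critical value, 4 degrees of freedom


def mean_with_ci90(samples):
    """Return (mean, half_width) of the 90% confidence interval for 5 samples."""
    if len(samples) != 5:
        raise ValueError("critical value above is only valid for 5 samples")
    mean = statistics.mean(samples)
    half_width = T_CRIT_90_DF4 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width


# Hypothetical throughput of five runs, in interactions/minute.
mean, half_width = mean_with_ci90([100, 102, 98, 101, 99])
```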
28Outline
- Motivation & Background
- The Gatekeeper Proxy
- Experimental Environment
- Results
- Admission Control
- Request Scheduling
- Summary and Conclusions
29Admission Control - Throughput
30Admission Control - Throughput
31Admission Control - Explanation
(Captured using the sysstat utilities on Linux)
32Admission Control - Explanation
- Memory Pressure
- Clients: from 200 to 300
- Captured using Rabbit (Athlon performance counters)
- L1 data cache misses increase 24%
- L1 DTLB miss, L2 DTLB hit: increases 25%
- L1 DTLB miss, L2 DTLB miss: increases 23%
- Database Processes
- Kernel: linear and logarithmic overhead
- (e.g., maintain the ready queue)
- Database: logarithmic overhead
- (e.g., list operations, sorting, searching)
33Throughput - DB Lock Contention
34Throughput - DB2
35Outline
- Motivation & Background
- The Gatekeeper Proxy
- Experimental Environment
- Results
- Admission Control
- Request Scheduling
- Summary and Conclusions
36Request Scheduling - Response Time
37Response Time - DB Lock Contention
38Request Scheduling - Analysis
- Same throughput, lower response time
- Response time = Waiting time + Execution (service) time
- Fairness
- FIFO: all wait for the same amount of time
- SJF: favors short requests
Q: How much are long jobs penalized?
39Request Scheduling - Explanation
- Short job: Exec Search
- Response time breakdown
- Service time unchanged
- 400 ms
- Waiting time reduced
- 8000 ms -> 100 ms
- 80x difference!
40Request Scheduling - Explanation
- Long job: Admin Response
- Response time breakdown
- Service time unchanged
- 4800 ms
- Waiting time increases
- 12890 ms -> 15621 ms
- Wait time increases 21%
- Response time increases 13%
41Request Scheduling - Explanation
- Average over all requests
- Response time breakdown
- Service time unchanged
- 428 ms
- Waiting time decreases
- 8856 ms -> 225 ms
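The overall improvement follows from response time = waiting time + service time. The short script below applies that breakdown to the averages quoted on this slide; the variable names are illustrative.

```python
def response_ms(wait_ms, service_ms):
    """Response time is waiting time plus execution (service) time."""
    return wait_ms + service_ms


service = 428                        # average service time, unchanged by scheduling
fifo = response_ms(8856, service)    # average response time before SJF
sjf = response_ms(225, service)      # average response time with SJF
speedup = fifo / sjf                 # roughly 14x
```

Here `fifo` is 9284 ms and `sjf` is 653 ms, so the ratio is about 14, even though the service time itself did not change.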
42Preventing Starvation
Aging mechanism, locking in App Server
43Preventing Starvation
Aging mechanism, locking in DB
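The slides do not describe the aging mechanism itself. One common form, shown here purely as an assumption, credits each waiting job in proportion to how long it has waited, so a long job eventually overtakes newly arrived short ones. Because the credit grows at the same rate for every job, ordering by `estimated_ms + aging_rate * arrival_ms` is equivalent and lets a plain heap be used.

```python
import heapq
import itertools


class AgingSJFQueue:
    """SJF queue in which waiting jobs gain priority over time (hypothetical sketch)."""

    def __init__(self, aging_rate):
        self.aging_rate = aging_rate   # ms of priority credit per ms waited (assumed)
        self._heap = []
        self._seq = itertools.count()  # tie-breaker for equal keys

    def push(self, job, estimated_ms, arrival_ms):
        # effective cost at time t = estimated_ms - aging_rate * (t - arrival_ms);
        # the "- aging_rate * t" term is common to all jobs and can be dropped.
        key = estimated_ms + self.aging_rate * arrival_ms
        heapq.heappush(self._heap, (key, next(self._seq), job))

    def pop(self):
        _, _, job = heapq.heappop(self._heap)
        return job
```

With `aging_rate=0` this degenerates to plain SJF; larger rates move the policy toward FIFO.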
44Related Work
- Admission Control/QoS for Static Content Web Servers
- Bhatti99, Li00, Voigt01, Abdelzaher02, Pradhan02, Voigt02
- Identify content via IP addr, URL, cookie
- Provide throughput/resp. time/BW guarantees
- Request Scheduling
- Crovella99, Bansal01, Schroeder02
- Use SRPT scheduling for static content servers
- Better response time, reasonable fairness, better overload protection
- Dynamic Content
- Dynaserver project at Rice/EPFL
- Iyengar97, Challenger00: fragments, dependency graphs, caching
- Akamai: Edge Side Includes
45Summary
- Presented the Gatekeeper Proxy
- Transparent, DB-independent
- Admission Control
- Consistent performance during overload
- Improves throughput 10%
- Request Scheduling using SJF
- Improves response time 14 times
- Penalizes long jobs only 13%
46Future Work
- Workloads where application server is bottleneck
- Place Gatekeeper in front of application server
- Workload characterization
- Get dynamic site traces from IGS
- See if TPC-W is representative
- System support for dynamic content
- Use Linux profiling support to identify bottlenecks
- Implement and evaluate improvements
- Scaling issues in multiple-tiered Web sites
- Content-aware back-end redirection
47Thank You!
48TPC-W Queries
49TPC-W Resources (Shopping Mix)
Conclusion: Bottleneck is DB lock contention
50Limit Connections
- Bottleneck is Database
- N = max connections
- At most N complex queries
- Prevent overloading
- At most N simple queries
- Underutilization
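The weakness of a fixed connection limit can be seen with a little arithmetic: the same N admits very different amounts of work depending on the query mix. The sketch below reuses the 3 ms and 4000 ms query times from the TPC-W Queries slide; the capacity of 8000 work units and N = 20 are made-up values for illustration.

```python
def utilization(n_connections, work_per_query_ms, capacity_ms):
    """Fraction of database capacity used when every connection runs one query."""
    return n_connections * work_per_query_ms / capacity_ms


CAPACITY_MS = 8000   # hypothetical database capacity in work units
N = 20               # hypothetical connection limit

all_complex = utilization(N, 4000, CAPACITY_MS)  # 10.0: ten times over capacity
all_simple = utilization(N, 3, CAPACITY_MS)      # 0.0075: database nearly idle
```

A limit based on estimated work per query, rather than on connection count, avoids both extremes.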