Title: Optimizing Data Aggregation for Cluster-based Internet Services
- Lingkun Chu, Hong Tang, Tao Yang
- University of California, Santa Barbara
- Kai Shen
- University of Rochester
Internet Services and Data Aggregation
- Large-scale service clusters: AOL, Yahoo!, MSN, Google, Teoma/Ask Jeeves.
  - 24x7 availability.
  - Scalability (large data sets, high traffic).
  - Efficient resource management.
- Programming support for reliable and scalable network services is very important.
- This talk focuses on programming and runtime support for data aggregation in cluster-based services.
Example of Internet Services: Search Engine
[Figure: search engine cluster with a firewall/traffic switch, web servers/query handlers, query caches, a local-area network, index servers (partitions 1-3), and doc servers (partitions 1-2).]
Neptune: Programming and Runtime Support for Cluster-based Services
- Programming support
  - Component-oriented.
  - High-level primitives for service aggregation/replication in clusters.
- Runtime support
  - Service discovery, service invocation, load balancing, failover management, service differentiation, and replica consistency.
- Applications
  - Discussion groups, online auctions, persistent cache, BLAST-based protein sequence matching.
  - Teoma/Ask Jeeves search.
Outline
- Background on Internet Services.
- Data Aggregation Semantics and API.
- Runtime System Design and Implementation.
- Experimental Evaluation.
Data Aggregation: Introduction
- Internet services often partition data into multiple groups for data parallelism and simpler management.
- Aggregation combines partial results from multiple data partitions.
- Achieving high performance and availability in aggregation is hard.
  - It needs explicit programming support and efficient runtime system design.
Design Objectives for Scalable Data Aggregation
- Programming primitive
  - Easy to use.
  - General and flexible.
- Runtime support
  - Scalable to a large number of partitions.
  - Low response time and high throughput.
- All must be achieved in a cluster environment!
  - Component failures.
  - Platform heterogeneity among partitions due to hardware/application irregularity.
Data Aggregation Call (DAC): The Basic Semantics
DAC(P, op_proc, op_reduce)
- Requirement: reduce() must be commutative and associative.
[Figure: a DAC over partitions 1-4.]
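The basic semantics can be sketched sequentially. This is a minimal illustration, not Neptune's API: the function name `dac`, reading P as a list of partition identifiers, and the top-3 search example are all assumptions. Each partition runs the processing operation (op_proc here), and the partial results are folded with the reduction operation (op_reduce). Because reduce() must be commutative and associative, the fold can happen in any order or tree shape without changing the answer, and that freedom lets the runtime choose the reduction tree.

```python
from functools import reduce
from typing import Callable, Sequence, TypeVar

Part = TypeVar("Part")      # whatever identifies a data partition (hypothetical)
Result = TypeVar("Result")  # partial and aggregated result type

def dac(partitions: Sequence[Part],
        op_proc: Callable[[Part], Result],
        op_reduce: Callable[[Result, Result], Result]) -> Result:
    """Sequential sketch of DAC(P, op_proc, op_reduce).

    op_proc runs once per partition; op_reduce folds two partial results
    into one. Because op_reduce must be commutative and associative, the
    partial results may be combined in any order or tree shape.
    """
    partials = [op_proc(p) for p in partitions]
    return reduce(op_reduce, partials)

# Hypothetical search example: each partition returns (score, doc_id) hits,
# and op_reduce keeps the global top 3 (commutative and associative).
index = {1: [(0.9, "d1"), (0.2, "d2")], 2: [(0.7, "d3")], 3: [(0.8, "d4"), (0.1, "d5")]}

def top3(a, b):
    return sorted(a + b, reverse=True)[:3]

print(dac([1, 2, 3], lambda p: index[p], top3))
# [(0.9, 'd1'), (0.8, 'd4'), (0.7, 'd3')]
```

Visiting the partitions in a different order, e.g. `[3, 1, 2]`, yields the same top 3, which is exactly what the commutativity and associativity requirement guarantees.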
Adding Quality Control to DAC
- What if a server fails or is very slow?
- Aggregation quality guarantee
  - Partial aggregation results may still be useful.
  - Aggregation quality: percentage of partitions that contributed to the aggregation result.
- Soft deadline guarantee
  - Better to return partial results promptly than to wait too long.
DAC(P, op_proc, op_reduce, q, t), where q is the required aggregation quality and t is the soft deadline.
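One way to read the quality-controlled call is sketched below with Python's `concurrent.futures`. The idea is to run the processing operation on every partition in parallel, stop waiting at the soft deadline t, and aggregate whatever succeeded if at least a fraction q of the partitions contributed. The name `dac_qt`, expressing q as a fraction rather than a percentage, and raising `TimeoutError` when the quality target is missed are assumptions, not the talk's stated semantics.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from functools import reduce

def dac_qt(partitions, op_proc, op_reduce, q, t):
    """Hypothetical sketch of DAC(P, op_proc, op_reduce, q, t).

    q: minimum fraction (0..1) of partitions that must contribute (assumed).
    t: soft deadline in seconds; partitions slower than t are left out.
    Returns (result, quality) or raises TimeoutError if quality < q.
    """
    if not partitions:
        raise ValueError("need at least one partition")
    pool = ThreadPoolExecutor(max_workers=len(partitions))
    futures = [pool.submit(op_proc, p) for p in partitions]
    done, _ = wait(futures, timeout=t)  # stop waiting at the soft deadline
    # Do not block on stragglers (cancel_futures needs Python 3.9+).
    pool.shutdown(wait=False, cancel_futures=True)
    # Partitions that raised do not count toward quality. The order of `done`
    # is arbitrary, which is fine because op_reduce is commutative/associative.
    partials = [f.result() for f in done if f.exception() is None]
    quality = len(partials) / len(partitions)
    if not partials or quality < q:
        raise TimeoutError(f"aggregation quality {quality:.2f} is below {q}")
    return reduce(op_reduce, partials), quality

def slow_on_4(p):
    if p == 4:
        time.sleep(1.0)  # simulates an overloaded partition
    return p * 10

result, quality = dac_qt([1, 2, 3, 4], slow_on_4, lambda a, b: a + b, q=0.75, t=0.3)
print(result, quality)  # 60 0.75
```

Here three of four partitions answer before the deadline, so the caller gets a partial sum with quality 0.75 instead of waiting a full second for the slow partition.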
Summary of Key Design Ideas
- Load-adaptive tree reduction
  - Minimizes response time.
  - Sustains throughput.
  - Tolerates faults/unresponsiveness.
- A hybrid thread/event-driven node architecture.
- Staged timeout that proactively prunes slow or unresponsive servers from a reduction tree.
Design Choices for Aggregation
- Three reduction schemes
  - Base: without programming support.
  - Flat: random delegated roots.
  - Hierarchical: dynamic, load-aware.
[Figure: the three reduction schemes over a set of service providers.]
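The flat and hierarchical schemes differ mainly in where merging work piles up. The sketch below illustrates this under stated assumptions and is not the system's code: partial results sit in a heap-shaped tree in which node i reports to node (i - 1) // degree, and the function `tree_reduce` is hypothetical. Setting degree to n - 1 sends every result straight to node 0, like a flat delegated root. A small degree spreads merges over several internal nodes at the cost of a deeper tree.

```python
import operator

def tree_reduce(partials, op_reduce, degree):
    """Reduce partial results bottom-up over a heap-shaped degree-ary tree.

    Node i (i > 0) sends its accumulated value to parent (i - 1) // degree.
    Returns (result, merges), where merges[i] counts children merged at node i.
    """
    if degree < 1 or not partials:
        raise ValueError("need degree >= 1 and at least one partial result")
    acc = list(partials)
    merges = [0] * len(acc)
    # Children always have larger indices than their parent, so visiting
    # indices in descending order finishes every subtree before its parent.
    for i in range(len(acc) - 1, 0, -1):
        parent = (i - 1) // degree
        acc[parent] = op_reduce(acc[parent], acc[i])
        merges[parent] += 1
    return acc[0], merges

partials = list(range(1, 9))  # one toy partial result per partition
print(tree_reduce(partials, operator.add, degree=7))  # (36, [7, 0, 0, 0, 0, 0, 0, 0])
print(tree_reduce(partials, operator.add, degree=2))  # (36, [2, 2, 2, 1, 0, 0, 0, 0])
```

Both shapes give the same answer, but the flat root performs all seven merges itself. In a real system, the mapping from tree positions to servers is where a random or load-aware choice would come in.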
Optimization in Tree-based Aggregation
- Form a reduction tree dynamically for each request.
  - Load changes from one request to the next; dynamic trees can help balance load.
  - The tree must tolerate node slowness and failures.
- Optimization issues in tree formation
  - Optimize the tree shape.
    - A high outgoing degree implies a high aggregation cost at that node, which unbalances load.
    - Tree depth affects latency (long paths).
  - Machine assignment
    - Assign slow machines to leaf nodes.
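The shape trade-off can be made concrete with a deliberately crude cost model, assumed here for illustration only: each tree level adds one network hop plus `degree` merges at its busiest node. The functions `tree_depth` and `toy_latency` and the cost values are hypothetical. With the example costs below, neither the flat tree nor the deep binary tree is cheapest.

```python
def tree_depth(n, degree):
    """Levels below the root in a complete degree-ary tree holding n nodes."""
    if n < 1 or degree < 1:
        raise ValueError("need n >= 1 and degree >= 1")
    depth, capacity, level_size = 0, 1, 1
    while capacity < n:
        level_size *= degree
        capacity += level_size
        depth += 1
    return depth

def toy_latency(n, degree, merge_cost, hop_cost):
    """Hypothetical cost model: every level costs one hop plus `degree` merges."""
    return tree_depth(n, degree) * (degree * merge_cost + hop_cost)

n = 28  # e.g., the 28 index partitions used in the evaluation
best = min(range(2, n), key=lambda d: toy_latency(n, d, merge_cost=1, hop_cost=2))
print(best, toy_latency(n, best, 1, 2), toy_latency(n, n - 1, 1, 2))  # 5 14 29
```

The optimum depends entirely on the assumed merge and hop costs, and under real load those costs shift from request to request, which is one argument for forming the tree per request.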
Load-adaptive Tree Formation (LAT)
[Figure: an example reduction tree formed by LAT over nodes A-H.]
LAT Summary
- Steps
  - Collecting server load information.
  - Assigning operations to servers.
  - Constructing the reduction tree.
  - Adjusting the tree shape.
- Time complexity: O(n log n).
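The sketch below simplifies the first three LAT steps. It assumes that "load" is a single number per server (say, queue length) and that the tree uses a fixed-degree heap layout. Servers are sorted by load, which is the O(n log n) step, so the least-loaded servers land in internal positions and the most-loaded ones end up on the leaves. The function name and the fixed `degree` are hypothetical, and the final shape-adjustment step is omitted.

```python
def build_lat_tree(loads, degree=2):
    """Form a reduction tree from per-server load (hypothetical LAT sketch).

    loads: dict mapping server name -> current load (e.g., queue length).
    Returns (root, parent), where parent[s] is s's parent (None for the root).
    Lightly loaded servers land near the root; heavily loaded ones on leaves.
    """
    if degree < 1 or not loads:
        raise ValueError("need degree >= 1 and at least one server")
    order = sorted(loads, key=loads.get)  # O(n log n): lightest load first
    parent = {}
    for i, server in enumerate(order):  # heap layout: position i -> (i - 1) // degree
        parent[server] = None if i == 0 else order[(i - 1) // degree]
    return order[0], parent

root, parent = build_lat_tree({"A": 3, "B": 0, "C": 7, "D": 1, "E": 5})
print(root, parent)
# B {'B': None, 'D': 'B', 'A': 'B', 'E': 'D', 'C': 'D'}
```

In this example, the idle server B becomes the root, and the busiest server C is a leaf, which matches the rule of assigning slow machines to leaf nodes.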
Runtime System Architecture
[Figure: runtime system architecture showing a service consumer, a client module, DAC calls, and a request.]
Handling Failures and Unresponsiveness
- Cases
  - Server stopped: no heartbeat packets.
  - Server unresponsive: very long queue.
- Solutions
  - Exclude stopped servers from the reduction tree.
  - Slow nodes are already placed on leaves.
  - Use staged timeout to eagerly prune unresponsive servers.
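The slides do not give the exact staged-timeout rule, so the following hypothetical sketch shows one plausible reading. Before the tree is built, stopped servers are filtered out by heartbeat age. Each tree level then gets a timeout `margin` shorter than its parent's, so a node abandons an unresponsive child early enough for its own partial result to reach the parent in time. All names and numbers are assumptions.

```python
# Hypothetical sketch; not the Neptune runtime's actual rule.

def live_servers(last_heartbeat, now, heartbeat_timeout):
    """Treat servers with no recent heartbeat as stopped and exclude them."""
    return sorted(s for s, ts in last_heartbeat.items() if now - ts <= heartbeat_timeout)

def staged_timeouts(deadline, depth, margin):
    """Per-level timeouts: level 0 (root) waits the full deadline, and each
    deeper level gives up `margin` earlier so its partial result can still
    be forwarded to its parent before the parent's own timeout."""
    if deadline - depth * margin <= 0:
        raise ValueError("deadline too short for this tree depth and margin")
    return [deadline - level * margin for level in range(depth + 1)]

def prune_children(child_latency, timeout):
    """Keep children that answered within this node's timeout (None = no answer)."""
    return sorted(c for c, lat in child_latency.items() if lat is not None and lat <= timeout)

# Times in milliseconds (hypothetical values).
print(live_servers({"A": 9_900, "B": 6_000, "C": 9_500}, now=10_000, heartbeat_timeout=1_000))  # ['A', 'C']
print(staged_timeouts(1_000, depth=3, margin=200))  # [1000, 800, 600, 400]
print(prune_children({"E": 120, "F": None, "G": 700}, timeout=600))  # ['E']
```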
Evaluation
- Application deployments: index search server, NCBI's BLAST protein sequence matcher, online facial recognizer.
- Hardware: a cluster of Linux servers
  - 30 dual-CPU (400 MHz P-II), 512 MB memory.
  - 4 quad-CPU (500 MHz P-II), 1 GB memory.
- Benchmark I: search engine index server
  - Dataset: 28 partitions, 1-1.2 GB each.
  - Workload: trace-driven (one-week trace from Ask.com).
- Benchmark II: CPU-spinning microbenchmark
  - Workload: synthetic.
Ease of Use
- Applications: index server, NCBI's BLAST protein sequence matcher, online facial recognizer.
- First implemented without DAC.
- A graduate student then modified them to use DAC.
Comparison of Three Aggregation Approaches
- 24 dual-CPU nodes, index server benchmark.
[Figure: comparison of the three aggregation approaches.]
Scalability (simulation)
[Figure: (A) response time (s) and (B) throughput (req/sec) versus number of server partitions (100 to 500), at demand levels 60, 80, 90, and 95.]
Handling Server Failures without Replication
- LAT with staged timeout (ST).
- Event-driven request scheduling (ED).
- Three versions: no optimization, ED-only, and ED+ST.
Related Work
- Clustering middleware and distributed systems.
  - TACC [SOSP97], MultiSpace [USENIX99], Ninja [USENIX02], Neptune [OSDI02].
- MPI tree reduction.
  - MagPIe [PPoPP99], Karonis [IPDPS00].
- Database aggregation based on SQL queries.
  - TAG [OSDI02], Shatdal [SIGMOD95], ...
- Application-specific aggregation in Google, Inktomi, Teoma/Ask Jeeves.
Contributions and Summary
- New programming primitive for data aggregation.
- Efficient runtime system design with
  - LAT tree formation.
  - Hybrid thread/event-driven scheduling.
  - Staged timeout.
- Achieves low response time, high throughput, and high availability together.
- http://www.cs.ucsb.edu/projects/neptune