1
Memory Management for Scalable Web Data Servers
  • S. Venkataraman, M. Livny, J. F. Naughton

2
Motivation
  • Popular Web sites have heavy traffic
  • Two main bottlenecks:
  • CPU overhead of servicing TCP/IP connections
  • I/O of fetching the many pages served
  • CPU solution: a cluster of servers connected to one
    file server
  • If each node served only its own subset of the web
    site, the cluster would be susceptible to skew.

3
Motivation (continued)
  • Paper's goal: develop buffer management techniques
    that best utilize the aggregate memory of the
    machines in the cluster to reduce the I/O
    bottleneck.

4
Outline
  • Web Server Architecture
  • 3 Memory Management Techniques
  • Client-Server
  • Duplicate Elimination
  • Hybrid
  • Performance Evaluation
  • Discussion

5
Web Server Architecture
  • Cluster of cooperating servers connected by a
    fast network
  • Each node has own disks and memory
  • Each node runs the same copy of the server
  • A round-robin router distributes requests across
    the nodes

6
Web Server Architecture Part Deux
  • Primary server: the server where the client request
    is serviced.
  • Owner: the server that manages a persistent data
    item.
  • Owners maintain a directory of the copies of the
    data pages they own in global memory.
  • The paper considers algorithms for read-only
    workloads.

7
Memory Management
  • Memory hierarchy:
  • Primary server memory
  • Owner memory
  • Memory at other servers
  • Disk
  • Each request is broken into page-sized units.
  • If the primary has the page in memory, we are done.
    Otherwise, it asks the owner for it (sketched
    below).
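
A minimal Python sketch of this request path, assuming
hypothetical node objects with dict-based memory and disk;
none of these names come from the paper:

    class Node:
        def __init__(self, disk):
            self.cache = {}    # page_id -> page, this node's memory
            self.disk = disk   # page_id -> page, this node's disk

        def request_page(self, page_id):
            """Owner-side lookup: memory, then disk (slide 8
            refines this with forwarding and hated pages)."""
            if page_id not in self.cache:
                self.cache[page_id] = self.disk[page_id]  # disk I/O
            return self.cache[page_id]

    def service_request(page_id, primary, owner):
        """Primary-side lookup: local memory first, else ask owner."""
        if page_id in primary.cache:          # 1. primary memory
            return primary.cache[page_id]
        page = owner.request_page(page_id)    # 2./4. owner memory/disk
        primary.cache[page_id] = page         # primary keeps a copy
        return page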

8
More on Memory Management
  • The owner gets another node to forward the page if
    possible.
  • Otherwise, the owner reads the page from disk and
    keeps a copy in memory.
  • That copy in the owner's memory is labeled as hated.
  • Eviction policy: hated pages are evicted first.
  • When the primary server receives a page, it must
    choose another page to evict. Three algorithms
    (assuming no hated pages remain):
  • Client-Server
  • Duplicate Elimination
  • Hybrid
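
A hypothetical sketch of the owner-side logic above,
reusing the Node objects from the slide 7 sketch and
assuming each node additionally carries a hated set; the
directory structure is an assumption for illustration:

    def owner_service(page_id, owner, directory):
        """directory: page_id -> set of nodes caching the page."""
        if page_id in owner.cache:
            return owner.cache[page_id]
        for node in directory.get(page_id, ()):
            if node is not owner:
                return node.cache[page_id]   # forwarded from memory
        page = owner.disk[page_id]           # fall back to disk I/O
        owner.cache[page_id] = page          # keep a copy in memory,
        owner.hated.add(page_id)             # but mark it hated
        return page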

9
Client-Server
  • An LRU list of pages is kept.
  • Very simple.
  • Increases local hits.
  • Lots of duplication possible.
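
One way to realize the single LRU list, using Python's
OrderedDict as the recency queue (an implementation choice,
not the paper's):

    from collections import OrderedDict

    class ClientServerCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.pages = OrderedDict()   # oldest entry = LRU victim

        def access(self, page_id, fetch):
            if page_id in self.pages:
                self.pages.move_to_end(page_id)     # hit: most recent
            else:
                if len(self.pages) >= self.capacity:
                    self.pages.popitem(last=False)  # evict LRU page
                self.pages[page_id] = fetch(page_id)
            return self.pages[page_id]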

10
Duplicate Elimination
  • Considers the cost difference between evicting a
    sole-copy page (singlet) and a duplicate page.
  • Duplicate pages are eliminated first since they are
    cheap to re-fetch from remote memory.
  • Two LRU lists: singlets and duplicates (sketched
    below).
  • Increases the percentage of the database held in
    memory.
  • Main drawback: a hot duplicate page can be replaced
    before a cold singlet.
  • But how do we keep track of duplicates?
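
A sketch of the two-list eviction, assuming each page is
already tagged as singlet or duplicate (slide 11 explains
how that tag is maintained):

    from collections import OrderedDict

    class DupElimCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.singlets = OrderedDict()   # LRU list of sole copies
            self.dups = OrderedDict()       # LRU list of duplicates

        def insert(self, page_id, page, is_duplicate):
            if len(self.singlets) + len(self.dups) >= self.capacity:
                # Duplicates go first: they are cheap to re-fetch.
                victims = self.dups if self.dups else self.singlets
                victims.popitem(last=False)  # owner is notified later
            target = self.dups if is_duplicate else self.singlets
            target[page_id] = page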

11
Keeping Track of Duplicates
  • When a page goes from singlet to duplicate:
  • This happens during a forward, so it is trivial (no
    additional messages).
  • When a page goes from duplicate to singlet:
  • The owner receives a message that a copy was
    evicted.
  • If only one copy remains, the owner sends a message
    to that server.
  • The message can be piggybacked (no additional
    messages).
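
Owner-side bookkeeping for these transitions might look
like the following; nodes are identified by name, and the
message helper and names are assumptions for illustration:

    INBOX = {}   # node_name -> pending messages

    def send(node_name, message):
        """Stub for a network message (piggybacked in practice)."""
        INBOX.setdefault(node_name, []).append(message)

    def on_forward(directory, page_id, to_node):
        """A forward creates a copy: the page may now be a duplicate."""
        directory.setdefault(page_id, set()).add(to_node)

    def on_evict_notice(directory, page_id, from_node):
        """The owner learned that from_node evicted its copy."""
        holders = directory.get(page_id, set())
        holders.discard(from_node)
        if len(holders) == 1:                     # one copy left:
            (last,) = holders                     # it is a singlet
            send(last, ("now_singlet", page_id))  # again; notify it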

12
Hybrids are Our Friends
  • Estimate the performance impact of eviction on the
    next reference to the page.
  • Consider both the likelihood of reference and the
    cost of re-access.
  • Latency of fetching page p back into memory: C(p)
  • Captures the cost of going to disk vs. the cost of
    going to remote memory.
  • Likelihood that page p will be accessed next: W(p)
  • W(p) = 1 / (elapsed time since last reference)
  • Expected cost: E(p) = W(p) × C(p)

13
More on Hybrid Algorithm
  • Two LRU lists are maintained, just as in Duplicate
    Elimination.
  • At eviction, the heads of the two lists are
    compared.
  • The page with the lower expected cost is replaced
    (see the sketch below).
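
A sketch of that choice, with illustrative re-fetch costs:
a remote memory copy is assumed far cheaper to recover than
a disk read. The numbers and structure are assumptions;
only E(p) = W(p) × C(p) is from the slides:

    DISK_COST = 10.0    # assumed re-fetch latency: singlet (ms)
    REMOTE_COST = 1.0   # assumed re-fetch latency: duplicate (ms)

    def expected_cost(last_ref, cost, now):
        w = 1.0 / max(now - last_ref, 1e-9)  # W(p), recency proxy
        return w * cost                      # E(p) = W(p) * C(p)

    def choose_victim(singlets, dups, now):
        """Each list maps page_id -> last reference time, LRU
        first. Returns (victim page_id, the list it came from)."""
        if not singlets:
            return next(iter(dups)), dups
        if not dups:
            return next(iter(singlets)), singlets
        s_id, s_t = next(iter(singlets.items()))
        d_id, d_t = next(iter(dups.items()))
        e_s = expected_cost(s_t, DISK_COST, now)
        e_d = expected_cost(d_t, REMOTE_COST, now)
        if e_s < e_d:
            return s_id, singlets
        return d_id, dups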

14
Simulation Model and Workload
  • 8 nodes with 64 MB of memory each.
  • Message cost: 1 ms/page.
  • Link bandwidth: 15 MB/sec.
  • All files are the same size.
  • Access frequency: Zipfian distribution.
  • The Zipf parameter controls a wide range of skews;
    a parameter of zero means access frequencies are
    uniform (sketched below).
  • Pages of files are declustered across all servers.
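
A sketch of such a workload generator: file i is requested
with probability proportional to 1 / i^theta, so theta = 0
is uniform and larger theta is more skewed (function and
parameter names are assumptions):

    import random

    def zipf_weights(n_files, theta):
        raw = [1.0 / (rank ** theta) for rank in range(1, n_files + 1)]
        total = sum(raw)
        return [w / total for w in raw]

    def sample_requests(n_files, theta, n_requests):
        probs = zipf_weights(n_files, theta)
        return random.choices(range(n_files), weights=probs,
                              k=n_requests)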

15
And the survey says . . .
  • Duplication is good at high skews, but bad at low
    skews.
  • At low skews (uniform access frequencies):
  • Duplicate Elimination gives good global memory
    utilization.
  • At high skews (Zipf parameter over 1):
  • Client-Server keeps hot pages in memory at all
    nodes, so duplication is good.
  • Hybrid nears the performance of the better choice
    in both scenarios.
  • Varying the database size, using diverse file
    sizes, and adding more nodes gave similar results.

16
Discussion
  • Web sites have predictable hit rates; can that be
    exploited somehow?
  • Can we recycle evicted pages?