1
Memory Management for Scalable Web Data Servers
  • S. Venkataraman, M. Livny, J. F. Naughton

2
Motivation
  • Popular Web sites have heavy traffic
  • Two main bottlenecks:
  • CPU overhead of servicing TCP/IP connections
  • I/O of fetching the many pages served
  • CPU solution: a cluster of servers connected to one
    file server
  • If each node served only its own subset of the web
    site, the cluster would be susceptible to skew.

3
Motivation (continued)
  • Paper's goal: develop buffer management techniques
    that best utilize the aggregate memory of the
    machines in the cluster to reduce the I/O
    bottleneck.

4
Outline
  • Web Server Architecture
  • 3 Memory Management Techniques
  • Client-Server
  • Duplicate Elimination
  • Hybrid
  • Performance Evaluation
  • Discussion

5
Web Server Architecture
  • Cluster of cooperating servers connected by a
    fast network
  • Each node has own disks and memory
  • Each node runs the same copy of the server
  • A round-robin router distributes requests across
    the nodes

6
Web Server Architecture Part Deux
  • Primary server: the server where the client request
    is serviced.
  • Owner: the server that manages a persistent data
    item.
  • Owners maintain a directory of the copies of the
    data pages they own in global memory.
  • The paper considers algorithms for read-only
    workloads.

7
Memory Management
  • Memory hierarchy:
  • Primary server memory
  • Owner memory
  • Memory at other servers
  • Disk
  • Each request is broken into page-sized units.
  • If the primary has the page in memory, we are done.
    Otherwise, it asks the owner for it (sketched
    below).
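
A minimal Python sketch of this request path, assuming
hypothetical node objects with dict-based memory and disk;
none of these names come from the paper:

    class Node:
        def __init__(self, disk):
            self.cache = {}    # page_id -> page, this node's memory
            self.disk = disk   # page_id -> page, this node's disk

        def request_page(self, page_id):
            """Owner-side lookup: memory, then disk (slide 8
            refines this with forwarding and hated pages)."""
            if page_id not in self.cache:
                self.cache[page_id] = self.disk[page_id]  # disk I/O
            return self.cache[page_id]

    def service_request(page_id, primary, owner):
        """Primary-side lookup: local memory first, else ask owner."""
        if page_id in primary.cache:          # 1. primary memory
            return primary.cache[page_id]
        page = owner.request_page(page_id)    # 2./4. owner memory/disk
        primary.cache[page_id] = page         # primary keeps a copy
        return page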

8
More on Memory Management
  • The owner gets another node to forward the page if
    possible.
  • Otherwise, the owner reads the page from disk and
    keeps a copy in memory.
  • That copy in the owner's memory is labeled as hated.
  • Eviction policy: hated pages are evicted first.
  • When the primary server receives a page, it must
    choose another page to evict. Three algorithms
    (assuming no hated pages remain):
  • Client-Server
  • Duplicate Elimination
  • Hybrid
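
A hypothetical sketch of the owner-side logic above,
reusing the Node objects from the slide 7 sketch and
assuming each node additionally carries a hated set; the
directory structure is an assumption for illustration:

    def owner_service(page_id, owner, directory):
        """directory: page_id -> set of nodes caching the page."""
        if page_id in owner.cache:
            return owner.cache[page_id]
        for node in directory.get(page_id, ()):
            if node is not owner:
                return node.cache[page_id]   # forwarded from memory
        page = owner.disk[page_id]           # fall back to disk I/O
        owner.cache[page_id] = page          # keep a copy in memory,
        owner.hated.add(page_id)             # but mark it hated
        return page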

9
Client-Server
  • An LRU list of pages is kept.
  • Very simple.
  • Increases local hits.
  • Lots of duplication possible.
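
One way to realize the single LRU list, using Python's
OrderedDict as the recency queue (an implementation choice,
not the paper's):

    from collections import OrderedDict

    class ClientServerCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.pages = OrderedDict()   # oldest entry = LRU victim

        def access(self, page_id, fetch):
            if page_id in self.pages:
                self.pages.move_to_end(page_id)     # hit: most recent
            else:
                if len(self.pages) >= self.capacity:
                    self.pages.popitem(last=False)  # evict LRU page
                self.pages[page_id] = fetch(page_id)
            return self.pages[page_id]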

10
Duplicate Elimination
  • Considers the cost difference between evicting a
    sole-copy page (singlet) and a duplicate page.
  • Duplicate pages are eliminated first since they are
    cheap to re-fetch from remote memory.
  • Two LRU lists: singlets and duplicates (sketched
    below).
  • Increases the percentage of the database held in
    memory.
  • Main drawback: a hot duplicate page can be replaced
    before a cold singlet.
  • But how do we keep track of duplicates?
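
A sketch of the two-list eviction, assuming each page is
already tagged as singlet or duplicate (slide 11 explains
how that tag is maintained):

    from collections import OrderedDict

    class DupElimCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.singlets = OrderedDict()   # LRU list of sole copies
            self.dups = OrderedDict()       # LRU list of duplicates

        def insert(self, page_id, page, is_duplicate):
            if len(self.singlets) + len(self.dups) >= self.capacity:
                # Duplicates go first: they are cheap to re-fetch.
                victims = self.dups if self.dups else self.singlets
                victims.popitem(last=False)  # owner is notified later
            target = self.dups if is_duplicate else self.singlets
            target[page_id] = page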

11
Keeping Track of Duplicates
  • When a page goes from singlet to duplicate:
  • This happens during a forward, so it is trivial (no
    additional messages).
  • When a page goes from duplicate to singlet:
  • The owner receives a message that a copy was
    evicted.
  • If only one copy remains, the owner sends a message
    to that server.
  • The message can be piggybacked (no additional
    messages).
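
Owner-side bookkeeping for these transitions might look
like the following; nodes are identified by name, and the
message helper and names are assumptions for illustration:

    INBOX = {}   # node_name -> pending messages

    def send(node_name, message):
        """Stub for a network message (piggybacked in practice)."""
        INBOX.setdefault(node_name, []).append(message)

    def on_forward(directory, page_id, to_node):
        """A forward creates a copy: the page may now be a duplicate."""
        directory.setdefault(page_id, set()).add(to_node)

    def on_evict_notice(directory, page_id, from_node):
        """The owner learned that from_node evicted its copy."""
        holders = directory.get(page_id, set())
        holders.discard(from_node)
        if len(holders) == 1:                     # one copy left:
            (last,) = holders                     # it is a singlet
            send(last, ("now_singlet", page_id))  # again; notify it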

12
Hybrids are Our Friends
  • Estimate the performance impact of eviction on the
    next reference to the page.
  • Consider both the likelihood of reference and the
    cost of re-access.
  • Latency of fetching page p back into memory: C(p)
  • Captures the cost of going to disk vs. the cost of
    going to remote memory.
  • Likelihood that page p will be accessed next: W(p)
  • W(p) = 1 / (elapsed time since last reference)
  • Expected cost: E(p) = W(p) × C(p)

13
More on Hybrid Algorithm
  • Two LRU lists are maintained, just as in Duplicate
    Elimination.
  • At eviction, the heads of the two lists are
    compared.
  • The page with the lower expected cost is replaced
    (see the sketch below).
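
A sketch of that choice, with illustrative re-fetch costs:
a remote memory copy is assumed far cheaper to recover than
a disk read. The numbers and structure are assumptions;
only E(p) = W(p) × C(p) is from the slides:

    DISK_COST = 10.0    # assumed re-fetch latency: singlet (ms)
    REMOTE_COST = 1.0   # assumed re-fetch latency: duplicate (ms)

    def expected_cost(last_ref, cost, now):
        w = 1.0 / max(now - last_ref, 1e-9)  # W(p), recency proxy
        return w * cost                      # E(p) = W(p) * C(p)

    def choose_victim(singlets, dups, now):
        """Each list maps page_id -> last reference time, LRU
        first. Returns (victim page_id, the list it came from)."""
        if not singlets:
            return next(iter(dups)), dups
        if not dups:
            return next(iter(singlets)), singlets
        s_id, s_t = next(iter(singlets.items()))
        d_id, d_t = next(iter(dups.items()))
        e_s = expected_cost(s_t, DISK_COST, now)
        e_d = expected_cost(d_t, REMOTE_COST, now)
        if e_s < e_d:
            return s_id, singlets
        return d_id, dups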

14
Simulation Model and Workload
  • 8 nodes with 64 MB of memory each.
  • Message cost: 1 ms/page.
  • Link bandwidth: 15 MB/sec.
  • All files are the same size.
  • Access frequency: Zipfian distribution.
  • The Zipf parameter controls a wide range of skews;
    a parameter of zero means access frequencies are
    uniform (sketched below).
  • Pages of files are declustered across all servers.
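
A sketch of such a workload generator: file i is requested
with probability proportional to 1 / i^theta, so theta = 0
is uniform and larger theta is more skewed (function and
parameter names are assumptions):

    import random

    def zipf_weights(n_files, theta):
        raw = [1.0 / (rank ** theta) for rank in range(1, n_files + 1)]
        total = sum(raw)
        return [w / total for w in raw]

    def sample_requests(n_files, theta, n_requests):
        probs = zipf_weights(n_files, theta)
        return random.choices(range(n_files), weights=probs,
                              k=n_requests)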

15
And the survey says . . .
  • Duplication is good at high skews, but bad at low
    skews.
  • At low skews (uniform access frequencies):
  • Duplicate Elimination gives good global memory
    utilization.
  • At high skews (Zipf parameter over 1):
  • Client-Server keeps hot pages in memory at all
    nodes, so duplication is good.
  • Hybrid nears the performance of the better choice
    in both scenarios.
  • Varying the database size, using diverse file
    sizes, and adding more nodes gave similar results.

16
Discussion
  • Web sites have predictable hit rates; can that be
    exploited somehow?
  • Can we recycle evicted pages?