Title: Cuckoo: Replication Policies
1 Cuckoo: Replication Policies and Load Shedding Algorithms
Andrew Klosterman, Greg Ganger
Parallel Data Laboratory, Carnegie Mellon University
2 Motivation: An Overloaded NFS Server
- NFS Servers export filesystems
- Clients mount exported filesystems
3 Motivation: An Overloaded NFS Server
- Common solutions to relieve load
- Buy a better server
- Move some of the data it is responsible for to another server
4 Motivation: What's a Cuckoo to do?
- It should dynamically identify heavily accessed data
- Automatically copy to other servers to allow load-shedding
- A replication policy is responsible for this operation
- It should automatically identify the most work-intensive ops
- Offload to other servers with replicas when busy
- A load-shedding algorithm is responsible for this operation
Objective: Identify simple replication policies and load-shedding algorithms that will allow us to offload large amounts of work from an NFS server
5 Replication in Cuckoo
- Cuckoo must determine what to replicate from
- The replication policy in use
- Access traces received from NFS servers
- Replication is performed periodically
- Assumed replication period of one day for analysis
- Short-lived objects: created between replications
- Long-lived objects: exist at the beginning of replication
- What objects should the planner replicate?
- Replicate only the most utilized objects
- Replicate objects whose replicas can be synced easily
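The short-lived versus long-lived distinction above can be sketched as a simple timestamp comparison. This is only an illustration: the `TraceObject` record, its field names, and the `classify` helper are hypothetical and do not come from the slides, which define the two classes only in words.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # the one-day replication period assumed for analysis


@dataclass
class TraceObject:
    # Hypothetical record; these field names are assumptions.
    handle: str
    created_at: float  # seconds since some epoch


def classify(obj: TraceObject, period_start: float) -> str:
    """Long-lived: already existed when the replication period began.
    Short-lived: created after that point, i.e. between replications."""
    return "long-lived" if obj.created_at <= period_start else "short-lived"


period_start = 10 * DAY
objs = [
    TraceObject("fh-a", 3 * DAY),                # created days earlier
    TraceObject("fh-b", period_start + 3600.0),  # created one hour into the period
]
labels = {o.handle: classify(o, period_start) for o in objs}
print(labels)
```

Only long-lived objects exist when the planner runs, so under this reading they are the only ones a periodic replication pass can copy.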
6 Replication Policies: Most Popular RO Policy
- Simple and presents no consistency issues
- Define popularity by number of ops seen
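As a rough sketch of this policy, the snippet below counts operations per object in a trace, discards any object that sees a modifying operation, and keeps the top `k` of what remains. The trace layout (`(handle, op)` pairs), the `WRITE_OPS` set, and the function name are all assumptions made for illustration, not details taken from Cuckoo.

```python
from collections import Counter

# Assumed set of operations that disqualify an object from being read-only.
WRITE_OPS = {"write", "setattr"}


def most_popular_ro(trace, k):
    """Return the k read-only objects with the most ops, as (handle, count)."""
    ops = Counter()
    written = set()
    for handle, op in trace:
        ops[handle] += 1
        if op in WRITE_OPS:
            written.add(handle)
    # most_common() sorts by descending op count.
    ro_ranked = [(h, n) for h, n in ops.most_common() if h not in written]
    return ro_ranked[:k]


trace = [
    ("a", "read"), ("a", "getattr"), ("a", "read"),
    ("b", "read"), ("b", "write"), ("b", "read"), ("b", "read"),
    ("c", "lookup"),
]
top = most_popular_ro(trace, 1)
offloadable_fraction = sum(n for _, n in top) / len(trace)
print(top, offloadable_fraction)
```

Object `b` is the busiest in this toy trace but is excluded because it is written, which is why a policy restricted to read-only objects can miss much of the load.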
7 Replication Policies: Most Popular RO Policy
- This is not a good replication policy
- Can only offload up to 12% of all operations
- Can only offload about 1% of all bytes transferred
- Most operations accounted for are metadata ops
- Offloading metadata ops is not important
- Work-intensive operations are probably not metadata ops
- Lessons learned
- Tune replication policies for data objects
- Operation-level granularity is not sufficient
- Number of ops seen is not a good ranking scheme
8 Replication Policies: Data Objects
- Many long-lived objects have file data that is mostly RO
- Visible if viewing location of unique bytes read/written
- Considerable data is read from these objects
[Figure: a 100% RO object (RO portion only) beside a mostly RO object (RW portion and RO portion). Green area sees no write ops; red area sees read/write ops.]
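One way to picture the RO and RW portions of an object is to mark which unique bytes are ever written and which are only read. The sketch below does this with per-byte sets for clarity. The access tuple layout `(op, offset, length)` and the function name are assumptions, and a real trace analysis would likely use interval arithmetic rather than one set entry per byte.

```python
def split_ro_rw(accesses):
    """Return (ro_bytes, rw_bytes) for one object.

    ro_bytes: unique bytes that were read and never written.
    rw_bytes: unique bytes that saw at least one write.
    """
    read_bytes, written_bytes = set(), set()
    for op, offset, length in accesses:
        span = range(offset, offset + length)
        if op == "write":
            written_bytes.update(span)
        else:
            read_bytes.update(span)
    return len(read_bytes - written_bytes), len(written_bytes)


accesses = [
    ("read", 0, 100),   # bytes 0..99
    ("write", 80, 40),  # bytes 80..119
    ("read", 90, 20),   # bytes 90..109
]
ro_bytes, rw_bytes = split_ro_rw(accesses)
print(ro_bytes, rw_bytes)
```

Here bytes 0 to 79 form the RO portion and bytes 80 to 119 form the RW portion, so the object is "mostly RO" even though it is not read-only as a whole.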
10 Replication Policies: ROF/RWP Policy
- Mostly RO objects are the sweet spot for Cuckoo
- They are used in the ROF/RWP replication policy by
- Ranking data objects by bytes read
- Replicating data objects that have
- Large fraction of total bytes read from RO sections (ROF)
- High proportion of bytes read vs. bytes written in RW sections (RWP)
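The ROF/RWP selection can be sketched as a ranking followed by two threshold tests. The slides do not give formulas or cutoffs, so the definitions here are assumptions: ROF is taken as bytes read from RO sections divided by all bytes read, RWP as bytes read divided by bytes read plus bytes written within RW sections, and an object must pass both tests. `ObjectStats`, `select_replicas`, and the threshold values are hypothetical names and numbers.

```python
from dataclasses import dataclass


@dataclass
class ObjectStats:
    # Hypothetical per-object totals derived from an access trace.
    handle: str
    ro_bytes_read: int     # bytes read from sections that see no writes
    rw_bytes_read: int     # bytes read from sections that also see writes
    rw_bytes_written: int  # bytes written (by definition, in RW sections)

    @property
    def bytes_read(self) -> int:
        return self.ro_bytes_read + self.rw_bytes_read

    @property
    def rof(self) -> float:
        # Assumed definition: share of all bytes read that came from RO sections.
        return self.ro_bytes_read / self.bytes_read if self.bytes_read else 0.0

    @property
    def rwp(self) -> float:
        # Assumed definition: read share of traffic in RW sections.
        # An object with no RW traffic is treated as passing (1.0).
        total = self.rw_bytes_read + self.rw_bytes_written
        return self.rw_bytes_read / total if total else 1.0


def select_replicas(stats, min_rof, min_rwp, limit):
    """Rank by bytes read, keep objects passing both thresholds, cap at limit."""
    ranked = sorted(stats, key=lambda s: s.bytes_read, reverse=True)
    chosen = [s for s in ranked if s.rof >= min_rof and s.rwp >= min_rwp]
    return [s.handle for s in chosen[:limit]]


stats = [
    ObjectStats("mbox", 100, 900, 800),  # rof 0.10, rwp about 0.53
    ObjectStats("libc", 5000, 0, 0),     # rof 1.00, rwp 1.00
    ObjectStats("db", 1800, 200, 50),    # rof 0.90, rwp 0.80
]
picked = select_replicas(stats, min_rof=0.8, min_rwp=0.75, limit=1000)
print(picked)
```

Ranking by bytes read rather than op count reflects the earlier lesson that the number of ops seen is not a good ranking scheme.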
11 Replication Policies: ROF/RWP Policy
[Chart: Bytes Offloadable, 1K Objects Replicated, DEAS 02/02/03]
- Bytes Written: 37.2%
- Offloadable Bytes Read: 62.2%
- Unoffloadable Bytes Read: 0.6%
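A short calculation makes the chart easier to read. It assumes the three values are percentages of all bytes transferred, which the slide does not state outright but which is consistent with their summing to 100.

```python
# Chart values, read as percentages of all bytes transferred (an assumption).
bytes_written = 37.2
offloadable_read = 62.2
unoffloadable_read = 0.6

total = bytes_written + offloadable_read + unoffloadable_read
total_read = offloadable_read + unoffloadable_read

# Share of read traffic that the replicated objects could absorb.
share_of_reads = offloadable_read / total_read

print(f"total = {total:.1f}")
print(f"bytes read = {total_read:.1f}")
print(f"offloadable share of reads = {share_of_reads:.1%}")
```

Under that reading, nearly all bytes read on this trace day come from the replicated objects, while writes still have to go to the original server.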
12 Summary
- We have identified a replication policy that
- Is capable of offloading a small fraction of all ops
- Can offload a large amount of data read
- We would like more traces
- That are busier
- That show accesses to more than one server
- Come see our posters!
13 Related Work/Products
- Rainfinity RainStorage. http://www.rainfinity.com
- Scott Baker and John H. Hartmann. The Mirage NFS Router. University of Arizona Computer Science Technical Report TR-02-04. November 2002.
- Daniel Ellard, Jonathan Ledlie, and Margo Seltzer. Passive NFS Tracing of Email and Research Workloads. USENIX File and Storage Technologies Conference, pages 203-216, San Francisco, California. March 2003.
- Daniel Ellard, Michael Mesnier, Eno Thereska, Gregory R. Ganger, and Margo Seltzer. Attribute-Based Prediction of File Properties. Harvard Computer Science Technical Report TR-16-03. December 2003.