Title: Grid Computing
1Grid Computing
- Session 2
- David Yates
- January 23, 2004
2Acknowledgments
- Most of these slides were written by the authors
of the papers - Most of the editorial comments are mine
- Session 2 covers additional papers on data
management (metadata and replication), papers on
the interaction between data management and
scheduling, and storage and file systems for Grid
data
3Session 2 Papers
- A. Chervenak, E. Deelman, I. Foster, L. Guy, W. Hoschek, A. Iamnitchi, C. Kesselman, P. Kunszt, M. Ripeanu, B. Schwartzkopf, H. Stockinger, K. Stockinger, B. Tierney, "Giggle: A Framework for Constructing Scalable Replica Location Services." In Supercomputing 2002, November 2002. http://www.globus.org/research/papers/giggle.pdf
- G. Singh, S. Bharathi, A. Chervenak, E. Deelman, C. Kesselman, M. Manohar, S. Patil, L. Pearlman, "A Metadata Catalog Service for Data Intensive Applications." In Supercomputing 2003, Phoenix, AZ, November 2003. http://www.globus.org/research/papers/mcs_sc2003.pdf
- Matei Ripeanu and Ian Foster, "A Decentralized, Adaptive, Replica Location Service." In Eleventh IEEE International Symposium on High Performance Distributed Computing, Edinburgh, Scotland, July 2002. http://people.cs.uchicago.edu/matei/PAPERS/hpdc-02.pdf
- Kavitha Ranganathan and Ian Foster, "Computation Scheduling and Data Replication Algorithms for Data Grids." Chapter 22 in Grid Resource Management: State of the Art and Future Trends, Jarek Nabrzyski, Jennifer M. Schopf, and Jan Weglarz, editors, Kluwer Academic Publishers, 2003. http://www-unix.mcs.anl.gov/schopf/BookFinal.pdf
4Session 2 Papers, Continued
- George Kola, Tevfik Kosar and Miron Livny, "Run-time Adaptation of Grid Data-placement Jobs." To appear in Parallel and Distributed Computing Practices, 2004. http://www.cs.wisc.edu/condor/stork/papers/runtime_adaptation-pdcp2004.pdf
- Renato J. Figueiredo, Nirav H. Kapadia and Jose A. B. Fortes, "The PUNCH Virtual File System: Seamless Access to Decentralized Storage Services in a Computational Grid." In Tenth IEEE International Symposium on High Performance Distributed Computing, San Francisco, CA, August 2001. http://punch.purdue.edu/HubInfo/publications/2001/hpdc-renato.pdf
- John Bent, Venkateshwaran Venkataramani, Nick LeRoy, Alain Roy, Joseph Stanley, Andrea Arpaci-Dusseau, Remzi H. Arpaci-Dusseau and Miron Livny, "Flexibility, Manageability, and Performance in a Grid Storage Appliance." In Eleventh IEEE Symposium on High Performance Distributed Computing, Edinburgh, Scotland, July 2002. http://www.cs.wisc.edu/condor/nest/papers/nest-hpdc-02.pdf
- Sean Rhea, Patrick Eaton, Dennis Geels, Hakim Weatherspoon, Ben Zhao and John Kubiatowicz, "Pond: the OceanStore Prototype." In Second USENIX Conference on File and Storage Technologies, March 2003. http://oceanstore.cs.berkeley.edu/publications/papers/pdf/fast2003-pond.pdf
5Giggle Overview
- Giggle = GIGa-scale Global Location Engine
- A framework for constructing scalable Replica Location Services
- Data-intensive applications replicate data at multiple locations
- A Replica Location Service (RLS) is a distributed registry service that records the locations of data copies and allows discovery of replicas
- Maintains mappings between logical identifiers and target names
- Physical targets: map to exact locations of replicated data
- Logical targets: map to another layer of logical names, allowing storage systems to move data without informing the RLS
- Issues
- Locating replicas of desired files
- Creating new replicas
- Scalability
- Reliability
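To make the logical-to-target mapping concrete, here is a minimal sketch, assuming a simple in-memory dictionary, of what a Local Replica Catalog stores; it is illustrative only (the class and example names are not from the paper or the Globus code).

```python
# Minimal, illustrative sketch of an RLS Local Replica Catalog:
# a mapping from logical names to target names (physical or logical targets).
from collections import defaultdict

class LocalReplicaCatalog:
    def __init__(self):
        self._mappings = defaultdict(set)      # logical name -> set of target names

    def add(self, logical_name, target_name):
        """Register a replica; the target may be a physical or a logical target."""
        self._mappings[logical_name].add(target_name)

    def delete(self, logical_name, target_name):
        self._mappings[logical_name].discard(target_name)

    def lookup(self, logical_name):
        """Return all known targets (replica locations) for a logical name."""
        return set(self._mappings.get(logical_name, ()))

# Hypothetical example names:
lrc = LocalReplicaCatalog()
lrc.add("lfn://cms/run42/event.dat", "gsiftp://se01.example.org/data/event.dat")
print(lrc.lookup("lfn://cms/run42/event.dat"))
```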
6Giggle Architecture
[Diagram: a layer of Replica Location Index (RLI) nodes, each aggregating state from several Local Replica Catalogs (LRCs)]
- LRCs contain consistent information about logical-to-target mappings at a site
- RLI nodes aggregate information about LRCs
- Arbitrary levels of RLI hierarchy (see paper for example)
7Giggle: A Flexible Replica Location Service Framework
- Allows users to make tradeoffs among
- consistency
- space overhead
- reliability
- update costs
- query costs
- By combining five essential elements in different ways, the framework supports a variety of RLS designs. The five elements:
- 1. Consistent Local State
- 2. Global State with relaxed consistency
- 3. Soft state mechanisms for maintaining global
state - 4. Compression of state updates
- 5. Membership and Partitioning information
maintenance
8Components of RLS Implementation
- Front-end Server
- Multi-threaded
- Supports Globus Grid Security Infrastructure
(GSI) authentication - Common implementation for LRC and RLI
- Back-end Server
- mySQL relational database (or PostgreSQL database)
- Holds logical name to target name mappings
- Client APIs: C and Java
- Client command-line tool
9Implementation Features
- Two types of soft state updates from LRCs to RLIs
- Complete list of logical names registered in the LRC
- Bloom-filter-compressed summaries of the LRC
- Immediate mode
- When active, send updates after 30 seconds (configurable) or after a fixed number (100 by default) of updates
- Send full updates at a reduced rate
- User-defined attributes
- May be associated with logical or target names
- Partitioning (without Bloom filters)
- Divide LRC soft state updates among RLI index nodes using pattern matching of logical names
- Membership service
- Static configuration only
- Eventually use OGSA registration techniques
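The immediate-mode rule above (send after a 30-second timeout, or once a fixed number of changes have accumulated) can be pictured with a small sketch. This is an illustration only, not the RLS implementation; the rli_send callback and the thresholds are assumed names.

```python
# Illustrative sketch of immediate-mode soft-state updates from an LRC to an RLI:
# ship pending changes when a timeout expires or when enough changes accumulate.
import time

class ImmediateModeUpdater:
    def __init__(self, rli_send, timeout_s=30, max_pending=100):
        self.rli_send = rli_send               # callback that ships an update to the RLI
        self.timeout_s = timeout_s
        self.max_pending = max_pending
        self.pending = set()                   # logical names changed since the last update
        self.last_sent = time.monotonic()

    def record_change(self, logical_name):
        self.pending.add(logical_name)
        self._maybe_send()

    def _maybe_send(self):
        expired = time.monotonic() - self.last_sent >= self.timeout_s
        if self.pending and (expired or len(self.pending) >= self.max_pending):
            self.rli_send(sorted(self.pending))    # incremental soft-state update
            self.pending.clear()
            self.last_sent = time.monotonic()

# Example: with max_pending=3, the third change triggers a send.
updater = ImmediateModeUpdater(rli_send=lambda names: print("update ->", names),
                               timeout_s=30, max_pending=3)
for lfn in ("lfn://a", "lfn://b", "lfn://c"):
    updater.record_change(lfn)
```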
10Wide Area Complete Soft State Update Performance
- LRCs in Geneva and Pisa updating RLI at Glasgow
- Full soft state updates quite slow for large
databases, dominated by update costs on RLI
database
- Performance does not scale as LRCs grow; soft state updates need compression
11Performance of an LRC Server Updating an RLI Server
- Number of SQL operations generated at single RLI
and LRC servers for complete and incremental
updates - Servers need to be configured (statically or
dynamically) to use update scheme that is most
appropriate for expected rate of updates to LRC
12Future Work
- Continued development of RLS as part of the Globus Toolkit
- http://www.globus.org/rls
- http://cern.ch/grid-data-management
- Reliable replication service
- Replicate data objects and register them in RLS
- Provide fault tolerance
- Consistency services
- Versioning
- Subscription
- RLS will become an OGSA grid service
- Replica location grid service specification will
be standardized through Global Grid Forum
13Metadata Catalog Service for Data Intensive
Applications
- Metadata is information that describes data
objects - Application-specific
- Temperature, longitude, latitude, depth
- Time, duration, sensor
- Application-independent
- Creator, logical name, time created, access
control - Collections of data objects e.g., data
collected during an experiment - Logical views of data objects allow users to
group data objects according to their interests
14Types Of Metadata
- Physical metadata
- Depends on location of data object and
characteristics of the storage system - Logical metadata
- a) How data objects were created or modified
- By whom, when, using what equipment or
computational engine - By what process experimental output,
simulation or analysis results - With what input conditions or parameters
- b) Description of what the data represent
- Precipitation over Africa for December 1998
- Particle collisions in the LHC for period of 1
second - We restrict the Metadata Catalog Service (MCS)
schema to logical metadata
15Why is a Metadata Catalog Service Needed?
- Essential for scientists and applications to
- Record information about the creation,
transformation, meaning and quality of data items - Query for data items based on these descriptive
attributes - Identifying data items correctly is essential for
correct analysis of experimental and simulation
results - Traditionally, scientists have used ad hoc
methods to keep track of what data items
represent - Descriptive file names, datasets, directories,
lab notebooks, memory - These methods do not scale to terabyte and
petabyte data sets consisting of millions of data
items
16An Example MCS Usage Scenario
17Data Model
[Diagram: logical data items (files) are grouped into logical collections and logical views]
18MCS Prototype Schema
- Logical file metadata
- logical file name
- data type
- version number
- master copy location
- container information
- information about the creator
- last modifier of the data
- Logical collection metadata
- collection name
- description
- set of files in a collection
- annotations on the collection
- information about the creator and modifier(s)
- collection hierarchy information (parent
collection id) - Logical view metadata
- view name, attributes, description, creator /
modifier(s)
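For illustration only, the prototype schema above could be written down as Python dataclasses; the field names follow the bullets on this slide, while the types and defaults are assumptions rather than the actual MCS table definitions.

```python
# Sketch of the MCS prototype schema; types and defaults are assumptions.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LogicalFile:
    logical_file_name: str
    data_type: str
    version_number: int
    master_copy_location: str
    container_info: Optional[str]
    creator: str
    last_modifier: str

@dataclass
class LogicalCollection:
    collection_name: str
    description: str
    files: List[str]                                    # logical files in the collection
    annotations: List[str] = field(default_factory=list)
    creator: str = ""
    modifiers: List[str] = field(default_factory=list)
    parent_collection_id: Optional[str] = None          # collection hierarchy

@dataclass
class LogicalView:
    view_name: str
    attributes: dict
    description: str
    creator: str
    modifiers: List[str] = field(default_factory=list)
```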
19Prototype Design
- Initial Prototype
- Simple, centralized Metadata Service
- Based on open source web service (Apache / Axis)
and relational database technology (mySQL)
20- Web interface is expensive (incurs 80% overhead)
- Adding to the metadata catalog scales well with database size
21- Web interface even more expensive (95% overhead)
- Querying is 2x-20x faster than adding
- Querying the metadata catalog also scales with database size
22- Complex queries are about 8x-12x slower than simple queries
- Overhead of the web interface and increase in database size both carry a performance penalty
23Status and Future Work
- Evaluated alternative back end technologies
- Evaluated methods to reduce web interface
overhead - Initial prototype is relational (mySQL)
- Requires shredding and reconstructing XML data
- Difficult tradeoffs between complexity of storing
XML metadata and query efficiency
- XML metadata is not a very natural fit for MCS's relational database back end
- But native XML databases have poor query
performance - Evaluate use of native XML databases (Xindice,
commercial XML databases) - New implementation will be based on OGSA Database
Access and Integration (DAI) Service - Being standardized through Global Grid Forum
- Reference implementation involving IBM, Oracle,
UK eScience researchers, academic institutions - Provides both relational and native XML back ends
- Provides a grid service front end with grid
security - Provides a general pass-through SQL query
interface - Testing OGSA DAI services with ESG metadata
24Future Work, Continued
- Re-evaluate MCS schema
- How can we better support (multiple)
domain-specific schema? - ESG makes extensive use of user-defined
attributes to support domain-specific metadata
schemas - Key requirement for metadata services - easily
extensible - Need rich, efficient mechanisms for adding
user-defined attributes - Reconsider usefulness of pre-defined attributes
- How useful are pre-defined attributes?
- ESG is not using many of MCS's pre-defined attributes
- Will we use more of these as we integrate further
with other grid tools for workflow management,
provenance, etc.? - Support for provenance information (describes
data transformations) - Unify MCS schema with Chimera data catalog schema
- Distribution and federation of heterogeneous
metadata services - Want to federate multiple metadata catalogs
(e.g., THREDDS) - Current work assumes strict consistency is a
requirement
- Explore relaxed consistency models: heterogeneous metadata services export discovery information to aggregating index nodes
25A Decentralized, Adaptive Replica Location Service
- Replica location problem
- Replication often used to improve reliability,
access latency or availability - Need efficient mechanism to locate replicas
- Map logical ID to replica location(s)
- Common to cooperative proxy caches and distributed object systems
- In Grids, a client presents an LFN (logical file name) and asks for one, many, or all PFNs (physical file names)
26End-to-end Argument
- Impossible to provide a completely consistent
view of the system in a distributed, asynchronous
environment. - Giggle presents a framework for building replica
location services. - We argue that the performance of the overall
system benefits from relaxed consistency
semantics at lower system levels. - Interesting tradeoffs between inconsistency
levels and operational costs.
27Example Application Requirements
- Requirements for data intensive, scientific
applications (GriPhyN project) - Scale 1 billion replicas by 2006, 10 times
larger by 2010 - Decentralization sites able to operate
independently (100s of sites) - Replica lookup rates are order(s) of magnitude
higher than update rates - Efficient queries for ad-hoc sets of files
28Lossy Data Compression: Bloom Filters
- Probabilistic technique for compressed set representation
- Good compression ratios at the cost of a low false-positive rate
29Bloom Filters, Continued
- Simple mathematical model to design filters
- Accuracy/space (bandwidth) tradeoffs can be adjusted on the fly
[Chart: false-positive rate as a function of the number of hash functions]
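A minimal Bloom filter sketch, assuming hash positions derived from SHA-1; it is not the authors' code, but it shows the add/lookup operations and why lookups can return false positives but never false negatives, with m (bits) and k (hash functions) setting the accuracy/space tradeoff.

```python
# Minimal Bloom filter: k hash positions over an m-bit array.
import hashlib

class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k positions by salting the item with the hash index.
        for i in range(self.k):
            digest = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

bf = BloomFilter(num_bits=8 * 1024, num_hashes=5)
bf.add("lfn://cms/run42/event.dat")
assert "lfn://cms/run42/event.dat" in bf      # added items are always found
print("lfn://something-else" in bf)           # usually False; rarely True (a false positive)
```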
30Overlay Networks
- Generally used to provide functionality not
available at lower network levels (e.g.,
multicast and security) - Why do we use overlays?
- The Resilient Overlay Networks (MIT) project improves network availability between Internet-connected end-points by more than one order of magnitude
- Work well: file-sharing P2P systems have scaled to more than 100k nodes (e.g., Gnutella, KaZaA)
- Easy to adapt to heterogeneity in available
resources
31Soft-state Mechanisms
- Producer sends state to receiver(s) over a
(lossy) channel - Receivers keep state and associated timeouts
- Advantages
- Decouples state producer and consumer: no explicit failure detection and state removal messages
- Eventual full state
- Adaptive: traditionally fixed, empirically determined update rates; however, state producers can obey more complex rules
- Work well in practice: RSVP, RIP, or MDS-2
32Assembling the Pieces
- Replica add/delete
- Digest dissemination
- Replica lookup
- Nodes cache responses
- to benefit from locality in request flow
- Storage sites and Replica Location Nodes (RLNs) join and leave
- Typically one or more RLNs per administrative domain
33Resource Requirements Estimate
- Compact Muon Solenoid (CMS) high-energy physics
experiment requirements for 2006 - 0.5G replicas overall (avg. 10 replicas/file)
- 100 sites (replica location nodes)
- overall 10,000 lookups/sec, 10 updates/sec on
average, update propagation delay 30 sec - Translates into
- Each RLN needs 1 GB of memory (<0.05 false-positive rate)
- Generated traffic: <200 Mbps per overlay link
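As a rough sanity check on estimates of this kind, the standard Bloom filter sizing formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2 can be evaluated for numbers of this order; the snippet below is only illustrative and does not reproduce the authors' exact assumptions.

```python
# Back-of-the-envelope Bloom filter sizing with the standard formulas.
import math

def bloom_size(n_items, false_positive_rate):
    m_bits = -n_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    k_hashes = (m_bits / n_items) * math.log(2)
    return m_bits, k_hashes

n = 500_000_000      # ~0.5G replicas overall (the slide's 2006 CMS estimate)
p = 0.05             # target false-positive rate
m, k = bloom_size(n, p)
print(f"~{m / 8 / 2**30:.2f} GiB of filter state, ~{round(k)} hash functions")
```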
34Prototype Implementation
- Python code for fast prototyping
- Bloom filters
- False positive rates match theoretical results
- Fast lookup, add, delete operations
35Prototype Performance
- Replica Location Node
- Lookup rates: 645 req/sec and 7700 req/sec
- Add, delete: about half of lookup performance
- Overall performance
- Tested with 24 nodes on a LAN
- 50M replicas (about 2M per node)
- 3 simulated clients per node
- Peaks at 2000 lookups/sec concurrently with 1200
updates/sec - Propagates update in 30 sec
36Future Work
- Improve prototype performance
- Enhance overlay organization mechanisms to
reflect various goodness criteria - Match infrastructure (reduce generated traffic
overhead) - Match user behavior (file sharing ? overlay
topology)Small-World File-Sharing Communities
in Infocom 2004 - Reduce latency
- Maximize availability
- Emulation environment to be able to perform
controlled large scale experiments - Test on wide area deployments
37COMPUTATION SCHEDULING AND DATA REPLICATION
ALGORITHMS FOR DATA GRIDS
- Scheduling algorithms for large-scale data
intensive problems in Grids - e.g. High Energy Physics experiments like CMS (at
CERN) which will generate petabytes of data per
year - Challenge
- multiple, potentially independent sources of jobs
- large number of storage, compute, and network
resources - huge amounts of input / output data
- Decentralized solutions for simplicity and
feasibility
- Jobs are data-intensive → important to take data location into account while scheduling
- Replication of data to reduce latency caused by
remote data access
38Contributions
- A general and extensible scheduling framework for
computational grids - A wide variety of scheduling algorithms can be
implemented using this framework
- Simulator: ChicagoSim uses the framework to explore the effectiveness of different scheduling approaches / algorithms
- Paradigm for scheduling: integrated job scheduling and data replication
39System Model
- Model a Grid as a collection of sites - each site
has - Certain number of processors
- Limited Storage
- Users associated with the local site
- Set of files initially at site
- Users generate jobs - each job
- Needs certain input files before it can execute
- Executes on a single processor
- Has access to all files at its local site
40Scheduling Framework
[Diagram: N users submit jobs (J) to External Schedulers (ES); jobs are dispatched to Local Schedulers (LS) at S sites, each with computers and storage; Dataset Schedulers (DS) monitor dataset (D) popularity, migrate data, and request remote data; local schedulers run jobs on idle nodes]
Different mappings between Users and External
Schedulers lead to different architectures
41Job and Data Scheduling Algorithms
- Two distinct functionalities: External Scheduler and Dataset Scheduler
- Job scheduling algorithms
- The External Scheduler runs a job at:
- Random: a randomly selected site
- LeastLoaded: the site that currently has the least load
- RandLeastLoaded: a site randomly selected from the n least-loaded sites
- DataPresent: the least loaded site that already has the required data
- Local: the site where the job originated
- Local scheduling is performed FIFO
42Data Scheduling Algorithms
- Dataset scheduling algorithms (a sketch of the integrated approach follows this slide)
- Datasets for jobs are replicated as follows:
- Caching: no active replication takes place; datasets are cached and managed LRU
- DataRandom: replicate popular datasets at a random site when the local site's load exceeds a threshold
- DataLeastLoaded: replicate popular datasets at the least loaded site when the local site's load exceeds a threshold
- DataRandLeastLoaded: replicate popular datasets at a random site picked from the n least loaded sites when the local site's load exceeds a threshold
- Datasets are also cached, and storage at each site is managed LRU
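The sketch below illustrates the integrated approach built from the algorithms named above: DataPresent job placement plus DataLeastLoaded-style replication triggered by load. The Site class, the load metric, and the threshold are invented for illustration; this is not ChicagoSim code.

```python
# Illustrative integrated scheduling: DataPresent + load-triggered replication.
import random

class Site:
    def __init__(self, name):
        self.name = name
        self.datasets = set()
        self.queued_jobs = 0                   # simple load metric (assumption)

def data_present(sites, required_dataset):
    """Run the job at the least-loaded site that already holds the data."""
    candidates = [s for s in sites if required_dataset in s.datasets]
    if not candidates:                         # fall back if no replica exists yet
        return random.choice(sites)
    return min(candidates, key=lambda s: s.queued_jobs)

def data_least_loaded(sites, local_site, popular_dataset, load_threshold):
    """Replicate a popular dataset at the least-loaded site when the local
    site's load exceeds a threshold (storage itself would be managed LRU)."""
    if local_site.queued_jobs > load_threshold:
        target = min(sites, key=lambda s: s.queued_jobs)
        target.datasets.add(popular_dataset)

sites = [Site("A"), Site("B"), Site("C")]
sites[0].datasets.add("dsX")
sites[0].queued_jobs = 5
print(data_present(sites, "dsX").name)         # "A": the only site holding dsX
```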
43Simulation Parameters
Dataset popularity is modeled by picking input
datasets from a geometric distribution
44- Performance from simulation results varies widely
(6x or more)
- Integrated approaches (DataPresent + selective replication) perform best
- The data-driven approach without selective replication (DataPresent + Caching) performs worse than baseline policies (Random and Local)
- Adding randomization to least loaded job
scheduling yields significant gain
45- Data-driven scheduling approaches (DataPresent + any replication policy) perform best
- Caching always reduces data transferred (no data is transferred with DataPresent + Caching)
46- Integrated approaches (DataPresent + selective replication) perform best
- Load-based replication, like load-based scheduling, is a good idea
47Summary and Future Work
- Important to address both job scheduling and data
replication and impact of one on the other - An integrated approach performs best among the
strategies considered - data-driven job scheduling
- proactive selective dataset replication
- Future Work
- Workloads from Fermi Lab user access patterns and
CMS workload generator - Visualization tool for ChicagoSim
- Experiments gauging sensitivities to
- Bandwidth, storage / cache size, CPU speed
- Heterogeneity in Grid
- (user location, storage, compute elements)
- Network topology / contention
- File popularity / job popularity
- Validate simulation results on real Grid testbeds
- Explore adaptive algorithms that select
algorithms dynamically depending on current Grid
conditions
48Run-time Adaptation of Grid Data Placement Jobs
- Grid presents a continuously changing environment
- Data intensive applications are being run on the
grid - Data intensive applications have two parts
- Data placement part
- Computation part
49Data Placement
A Data Intensive Application
Stage in data
Data placement
Compute
Stage out data
Data placement encompasses data transfer,
staging, replication, data positioning, space
allocation and de-allocation
50Current Approach
- FedEx
- Hand Tuning
- Network Weather Service
- Not useful for high-bandwidth, high-latency
networks - TCP Auto-tuning
- 16-bit window size and window scale option limitations
51Our Approach
- Full automation
- Continuously monitor environment characteristics
- Perform tuning whenever characteristics change
- Ability to dynamically and automatically choose
an appropriate protocol - Ability to switch to alternate protocol in case
of failure
52The Big Picture
53Profilers
- Memory Profiler
- Optimal memory block-size and incremental
block-size - Disk Profiler
- Optimal disk block-size and incremental
block-size - Network Profiler
- Determines bandwidth, latency and the number of
hops between a given pair of hosts - Uses pathrate, traceroute and diskrouter
bandwidth test tool
54Parameter Tuner
- Generates optimal parameters for data transfer
between a given pair of hosts - Calculates TCP buffer size as the bandwidth-delay
product - Calculates the optimal disk buffer size based on
TCP buffer size - Uses a heuristic to calculate the number of TCP
streams (see the sketch after this slide)
- Number of streams = 1 + number of hops with latency > 10 ms
- Rounded to an even number
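The heuristics on this slide can be written down directly. The function below is only a sketch; the profiler inputs (path bandwidth, round-trip time, per-hop latencies), their units, and the way the disk buffer is derived from the TCP buffer are assumptions.

```python
# Sketch of the parameter tuner's heuristics.
def tune_transfer(bandwidth_bytes_per_s, rtt_s, hop_latencies_s):
    # TCP buffer size = bandwidth-delay product
    tcp_buffer = int(bandwidth_bytes_per_s * rtt_s)
    # Disk buffer derived from the TCP buffer (here simply matched to it; assumption)
    disk_buffer = tcp_buffer
    # Number of streams = 1 + number of hops with latency > 10 ms, rounded to an even number
    streams = 1 + sum(1 for lat in hop_latencies_s if lat > 0.010)
    if streams % 2:
        streams += 1
    return {"tcp_buffer": tcp_buffer, "disk_buffer": disk_buffer, "streams": streams}

# Example: 100 MB/s path, 60 ms RTT, three hops slower than 10 ms.
print(tune_transfer(100e6, 0.060, [0.002, 0.015, 0.020, 0.012]))
```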
55Data Placement Scheduler
- Data placement is a real job
- A meta-scheduler (e.g. DAGMan in Condor) is used
to coordinate data placement and computation - Sample data placement job
- dap_type = transfer
- src_url = diskrouter://slic04.sdsc.edu/s/s1
- dest_url = diskrouter://quest2.ncsa.uiuc.edu/d/d1
56Data Placement Scheduler
- Used Stork, a prototype data placement scheduler
- Tuned parameters are fed to Stork
- Stork uses the tuned parameters to adapt data
placement jobs
57Coordinating DAG
58Scalability
- There is no centralized server
- Parameter tuner can be run on any computation
resource - Profiler data is 100s of bytes per host
- There can be multiple data placement schedulers
59Real World Experiment
- DPOSS data had to be transferred from SDSC, located in San Diego, to NCSA, located in Chicago
60Management Site (skywalker.cs.wisc.edu)
SDSC (slic04.sdsc.edu)
NCSA (quest2.ncsa.uiuc.edu)
StarLight (ncdm13.sl.startap.net)
61Data Transfer from SDSC to NCSA using Run-time
Protocol Auto-tuning
[Graph: transfer rate (MB/s) over time, showing a network outage and the point where auto-tuning is turned on]
62Parameter Tuning
Network parameters for GridFTP before and after the auto-tuning feature of Stork was turned on
63Alternate Protocol Failover
- dap_type = transfer
- src_url = diskrouter://slic04.sdsc.edu/s/data1
- dest_url = diskrouter://quest2.ncsa.uiuc.edu/d/data1
- alt_protocols = nest-nest, gsiftp-gsiftp
- In case of DiskRouter failure, Stork will switch to the other protocols in the order specified
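Conceptually, the failover behavior is a loop over the protocol list from the job description. The sketch below is illustrative only and is not Stork's implementation; the transfer function is a stand-in.

```python
# Illustrative alternate-protocol failover loop.
class TransferFailed(Exception):
    pass

def transfer(protocol, src, dest):
    """Stand-in for a real transfer; assumed to raise TransferFailed on error."""
    print(f"trying {protocol}: {src} -> {dest}")
    raise TransferFailed(protocol)

def transfer_with_failover(src, dest, protocols=("diskrouter", "nest", "gsiftp")):
    for protocol in protocols:                 # preferred protocol first, then alternates
        try:
            transfer(protocol, src, dest)
            return protocol                    # success
        except TransferFailed:
            continue                           # fall through to the next protocol
    raise TransferFailed("all protocols failed")

# transfer_with_failover("diskrouter://slic04.sdsc.edu/s/data1",
#                        "diskrouter://quest2.ncsa.uiuc.edu/d/data1")
```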
64Testing Alternate Protocol Failover
[Graph: transfer rate (MB/s) over time, annotated where the DiskRouter server was killed and later restarted]
65Conclusion
- Run-time adaptation has a significant impact (20
times improvement in our test case) - Profiling data has the potential to be used for
network management and data mining - Network misconfigurations
- Network outages
- Dynamic protocol selection and alternate protocol
failover increase resilience and improve overall
throughput
66Future Work
- Enhance dynamic protocol selection in Stork to
select best protocol - Performance (support for different requirements)
- Security ?
- Reliability ?
- Dynamically select which route to use in
transfers - Dynamically deploy diskrouters at Grid nodes
- Combine route selection and diskrouters to make
the best use of network bandwidth
67The PUNCH Virtual File System (PVFS)
- Seamless Access to Decentralized Storage Services
in a Computational Grid
- Goal: computational grids that distribute and deliver computing services to users anytime, anywhere
- Challenge: data management
68PUNCH
[Diagram: PUNCH (punch.purdue.edu) layers: web enabling, applications, data, virtual file system, resource management, compute servers]
69Logical User Accounts
- Problems with traditional user accounts
- No support for dynamic access policies
- Cannot cross administrative domains
- Complicates resource management
- Logical user accounts provide a capability that
allows users to check out accounts dynamically
via a resource management system
- Shadow accounts: allocated to users on demand at the compute server
- File accounts: store data for one or more users at the file server
70Traditional vs. Logical user accounts
71PUNCH Virtual File System (PVFS) Goals
- Unmodified applications
- Unmodified O/S clients, servers
- Heterogeneous platforms
- Block-based data transfers
- ⇒ De-facto standard: NFS
72NFS-based Virtual File System
- Additional functionality is required
- shadow-file account multiplexing, uid mapping
- Possible solutions
- Enhanced NFS clients and/or servers
- NFS call forwarding via middle tier proxies
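Conceptually, a call-forwarding proxy rewrites credentials and paths between the user's shadow account on the compute server and the file account on the file server. The sketch below is a conceptual illustration only, not PVFS code; the uid and directory mappings are invented.

```python
# Conceptual sketch of shadow-account to file-account remapping in a proxy.
import posixpath

class CallForwardingProxy:
    def __init__(self, uid_map, root_map):
        self.uid_map = uid_map          # shadow uid -> file-account uid
        self.root_map = root_map        # shadow uid -> directory owned by the file account

    def forward(self, shadow_uid, path):
        if shadow_uid not in self.uid_map:
            raise PermissionError("unknown shadow account")
        root = self.root_map[shadow_uid]
        full = posixpath.normpath(posixpath.join(root, path.lstrip("/")))
        if full != root and not full.startswith(root + "/"):   # basic access control
            raise PermissionError("path escapes the user's file space")
        return self.uid_map[shadow_uid], full   # identity and path used at the real server

proxy = CallForwardingProxy(uid_map={5001: 900},
                            root_map={5001: "/fileacct/users/alice"})
print(proxy.forward(5001, "/project/data.txt"))
```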
73Network File System (NFS)
75Multiplexing and access control
[Diagram: clients A and B reach file servers C and D through a file system gateway that multiplexes accounts and enforces access control]
76Performance Results
Andrew File System Benchmark on PVFS
Note: the client machine was a 4-CPU, 480 MHz UltraSPARC connected to a 2-CPU, 400 MHz server via 100 Mb/s switched Ethernet. Data shown is the average across 200 samples.
77User workload characteristics
Andrew: > 100 transactions/s
78Related Work
- Explicit file transfers: Globus (RFT / GridFTP), Portable Batch System, others
- Implicit transfers
- Condor: custom libraries
- Legion: custom NFS servers
- PUNCH v0.5: standard NFS clients/servers
- SFS: proxy-based, but no account multiplexing
79Future Work
- Coarse-grain locality
- Placement
- Migration
- Fine-grain locality
- Middleware-driven consistency
- Proxy caching / prefetching
80Flexibility, Manageability and Performance in a
Grid Storage Appliance
- Two Trends
- Data sets
- Performance
- Storage appliances address both trends
81Storage Appliances and -
- Storage appliances: great for basic file service
- Easy to manage: plug in and it works
- Good performance: specialized just for I/O
- Reliable and available too
- Storage appliances for the Grid: mismatch?
- Inflexible: few, specific protocols (e.g., NFS)
- Costly: 10x the cost of a PC + a few disks
- Difficult to integrate: just one piece of the puzzle
82A Solution: NeST
- NeST: A Storage Appliance for the Grid
- Flexible: multiple simultaneous protocols
- Virtual protocol layer
- Low-cost: use commodity machines
- Dynamic adaptation
- Grid-aware: integrate w/ higher-level systems
- Designed specifically for the Grid
83NeST Protocol Layer
- Virtualizes different protocols
- Mediates access to the network
[Architecture diagram: NeST's protocol layer, Dispatcher, Storage Mgr, and Transfer Mgr sit between the physical network layer and the physical storage layer]
84NeST Dispatcher
- Mediates interaction between other components
- Gathers information, advertises
85NeST Storage Manager
- Space management
- Access control
- Virtualizes physical storage
86NeST Transfer Manager
- Implements scheduling policies
- Chooses concurrency model
87Flexibility: Multiple Protocols
- Problem: how to support multiple protocols?
- One approach: Just a Bunch of Servers (JBOS)
- Problems with JBOS
- Lack of control (scheduling)
- Painful administration
- No shared code
- Larger memory footprint
[Diagram: a JBOS server running separate wu-ftpd, nfsd, and httpd daemons]
88NeST Flexibility By Design
- NeST: integrate protocols and gain advantage
- Implementation like VFS
- Integration introduces new challenges
- Different protocols allow different auth models
- More expensive to add a new protocol
- Less fault isolation
89NeST vs JBOS
[Chart: server bandwidth (MB/s) for NFS, Chirp, HTTP, GridFTP, and total, comparing NeST against a JBOS configuration (linux nfsd, Apache, wu-ftpd). Setup: Linux cluster, dual PIII, 1 GB RAM, Linux 2.2.19; 4 clients per protocol; 10 MB files]
- For each protocol, NeST is comparable to JBOS
server
90Exerting Scheduling Control
- Different scheduling policies
- FCFS
- Cache-aware ("Exploiting Gray-Box Knowledge of Buffer-Cache Management", USENIX 2002)
- Proportional share
- Proportional share scheduling
- Allows administrators to set protocol proportions
- e.g., favor NFS
- Very difficult in JBOS
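One simple way to realize proportional share among protocol queues is lottery-style selection weighted by the configured shares; the sketch below is illustrative only and is not NeST's actual scheduler.

```python
# Illustrative proportional-share selection among protocol request queues.
import random

def serve_next(queues, shares):
    """Pick a protocol with probability proportional to its share, then serve
    the oldest pending request of that protocol."""
    ready = [p for p, q in queues.items() if q]      # protocols with pending requests
    if not ready:
        return None
    protocol = random.choices(ready, weights=[shares[p] for p in ready], k=1)[0]
    return protocol, queues[protocol].pop(0)

queues = {"NFS": ["r1", "r2"], "HTTP": ["r3"], "GridFTP": ["r4"], "Chirp": ["r5"]}
shares = {"NFS": 2, "HTTP": 1, "GridFTP": 1, "Chirp": 1}     # e.g., favor NFS 2:1:1:1
print(serve_next(queues, shares))
```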
91Proportional Share
[Chart: server bandwidth (MB/s) per protocol under different scheduling configurations: FCFS, 1:1:1:1, 1:2:1:1, 1:1:1:4. Setup: Linux cluster, dual PIII, 1 GB RAM, Linux 2.2.19; 4 clients per protocol; 10 MB files]
- In most cases, achieves Jain's fairness metric > 0.98 (1 is fair)
92Grid-Aware Mechanisms
- Basic functionality
- Users and groups: dynamic creation / deletion
- Does not need administrative intervention
- Access control: generic AFS-style ACLs
- Advanced functionality
- QoS: preferential scheduling
- Advertises into global scheduling systems
- Flexible protocol and authentication mechanisms
- Self-cleaning storage guarantees: Lots
93Storage Guarantees: Lots
- Characteristics of Lots (see the sketch after this slide)
- Capacity: total amount of data a lot can store
- Duration: time for which data is guaranteed to exist
- Set of files: multiple files may co-exist within a lot
- Self-cleaning
- Expired lots become best-effort lots
- Lot management
- Either a default set created by the administrator, OR use the resource management protocol to create lots before usage
- Implementation: file system quotas
- Advantage: integrates cleanly with local access methods
- Disadvantage: performance hit for large writes
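A lot can be pictured as a capacity plus an expiry time, after which the guarantee degrades to best effort. The sketch below is illustrative only (field names and units are assumptions); the actual implementation uses file system quotas, as noted above.

```python
# Illustrative sketch of a self-cleaning storage lot.
import time

class Lot:
    def __init__(self, capacity_bytes, duration_s):
        self.capacity = capacity_bytes               # total data the lot may hold
        self.expires_at = time.time() + duration_s   # guarantee duration
        self.files = {}                              # filename -> size

    @property
    def guaranteed(self):
        return time.time() < self.expires_at         # expired lots become best-effort

    def store(self, name, size):
        used = sum(self.files.values())
        if used + size > self.capacity:
            raise IOError("lot capacity exceeded")
        self.files[name] = size

lot = Lot(capacity_bytes=10 * 2**30, duration_s=3600)    # e.g., 10 GiB for one hour
lot.store("results.dat", 2 * 2**30)
print("guaranteed" if lot.guaranteed else "best-effort")
```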
94Conclusions and Future Work
- NeST: a storage appliance for the Grid
- Gain manageability
- Without sacrificing performance
- Design goals
- Flexibility: virtual protocol architecture
- Low-cost: adaptation mechanisms
- Grid-aware: space management
- Current status: release 0.9 available
- Future work
- Hot-deployable NeSTs, lot management extensions
95Pond: The OceanStore Prototype
The OceanStore Vision
96The Challenges
- Maintenance
- Many components, many administrative domains
- Constant change
- Must be self-organizing
- Must be self-maintaining
- All resources virtualized; no physical names
- Security
- High availability is a hacker's target-rich environment
- Must have end-to-end encryption
- Must not place too much trust in any one host
97The Technologies: Tapestry
- Tapestry performs
- Distributed Object Location and Routing
- From any host, find a nearby
- replica of a data object
- Efficient
- O(log N) location time, where N = number of hosts in the system
- Self-organizing, self-maintaining
98The Technologies: Tapestry (cont.)
99The Technologies: Erasure Codes
- More durable than replication for the same space
- The technique: encode an object into n fragments such that any m of them suffice to reconstruct it (see the sketch below)
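The durability claim can be checked with elementary probability: if each stored piece is lost independently with probability p, compare n full replicas against an m-of-n erasure code at the same total storage overhead. The numbers below are purely illustrative.

```python
# Compare object-loss probability: replication vs. m-of-n erasure coding.
from math import comb

def loss_prob_replication(copies, p):
    return p ** copies                   # lost only if every copy fails

def loss_prob_erasure(n_fragments, m_needed, p):
    # Lost if fewer than m_needed fragments survive.
    return sum(comb(n_fragments, k) * (1 - p) ** k * p ** (n_fragments - k)
               for k in range(m_needed))

p = 0.1                                  # per-piece loss probability (illustrative)
# Both schemes below use 4x storage: 4 full replicas, or 32 fragments each 1/8 the size.
print(loss_prob_replication(4, p))       # ~1e-4
print(loss_prob_erasure(32, 8, p))       # astronomically smaller for the same space
```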
100The Technologies: Byzantine Agreement
- Guarantees all non-faulty replicas agree
- Given N = 3f + 1 replicas, up to f may be faulty / corrupt
- Expensive
- Requires O(N^2) communication
- Combine with primary-copy replication
- Small number participate in Byzantine agreement
- Multicast results of decisions to remainder
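The arithmetic behind these bullets is simple enough to write down: the helper below just evaluates f = (N - 1) / 3 and an O(N^2) message count for a few replica-group sizes (the exact message count depends on the protocol; this is only an order-of-magnitude illustration).

```python
# N = 3f + 1: how many Byzantine faults a replica group can tolerate.
def byzantine_capacity(n_replicas):
    f = (n_replicas - 1) // 3
    messages_per_round = n_replicas * (n_replicas - 1)   # all-to-all: O(N^2)
    return f, messages_per_round

for n in (4, 7, 10):
    f, msgs = byzantine_capacity(n)
    print(f"N={n}: tolerates f={f} faulty replicas, ~{msgs} messages per round")
```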
101Putting it all together: The Path of a Write
- All major subsystems operational
- Self-organizing Tapestry base
- Primary replicas use Byzantine agreement
- Secondary replicas self-organize into multicast
tree - Erasure-coding archive
- Application interfaces: NFS, IMAP/SMTP, HTTP
- Event-driven architecture
- Built on SEDA
- 280K lines of Java (J2SE v1.3)
- JNI libraries for cryptography, erasure coding
103Deployment on PlanetLab
- http://www.planet-lab.org
- 100 hosts, 40 sites
- Shared .ssh/authorized_keys file
- Pond: up to 1000 virtual nodes
- Using custom Perl scripts
- 5 minute startup
- Gives global scale for free
104Performance Results: Andrew Benchmark
- Built a loopback file server in Linux
- Translates kernel NFS calls into OceanStore API
- Lets us run the Andrew File System Benchmark
105Performance Results: Andrew Benchmark
- Ran Andrew on Pond
- Primary replicas at UCB, UW, Stanford, Intel
Berkeley - Client at UCB
- Control: NFS server at UW
106Closer Look: Write Cost
- Byzantine algorithm adapted from Castro and Liskov
- Gives fault tolerance, security against
compromise - Fast version uses symmetric cryptography
- Pond uses threshold signatures instead
- Signature proves that f + 1 primary replicas agreed
- Can be shared among secondary replicas
- Can also change primaries w/o changing public key
- Big plus for maintenance costs
- Results good for all time once signed
- Replace faulty/compromised servers transparently
107Closer Look: Write Cost
- Small writes
- Signature dominates
- Threshold sigs. slow!
- Takes 70 ms to sign
- Compare to 5 ms for regular sigs.
(times in milliseconds)
108Closer Look: Write Cost
(run on cluster)
109Closer Look: Write Cost
- Throughput in the wide area
- Wide Area Throughput
- Not limited by signatures
- Not limited by archive
- Not limited by Byzantine process bandwidth use
- Limited by client-to-primary replicas bandwidth
110Closer look: Dissemination Tree
111Closer look: Dissemination Tree
- Self-organizing application-level multicast tree
- Connects all secondary replicas to primary ones
- Shields primary replicas from request load
- Save bandwidth on consistency traffic
- Tree joining heuristic (first-order solution)
- Connect to closest replica using Tapestry
- Take advantage of Tapestry's locality properties
- Should minimize use of long-distance links
- A sort of poor man's CDN
112Performance Results: Stream Benchmark
- Goal: measure efficiency of the dissemination tree
- Multicast tree between secondary replicas
- Ran 500 virtual nodes on PlanetLab
- Primary replicas in SF Bay Area
- Other replicas clustered in 7 largest PlanetLab
sites - Streams writes to all replicas
- One content creator repeatedly appends to one
object - Other replicas read new versions as they arrive
- Measure network resource consumption
113Performance Results: Stream Benchmark
- Dissemination tree uses network resources
efficiently - Most bytes sent across local links as second tier
grows - Acceptable latency increase over broadcast (33)
114Related Work
- Distributed Storage
- Traditional: AFS, CODA, Bayou
- Peer-to-peer: PAST, CFS, Ivy
- Byzantine fault tolerant storage
- Castro-Liskov, COCA, Fleet
- Threshold signatures
- COCA, Fleet
- Erasure codes
- Intermemory, Pasis, Mnemosyne, Free Haven
- Others
- Publius, Freenet, Eternity Service, SUNDR
115Conclusion and Future Work
- OceanStore designed as a global-scale file system
- Design meets primary challenges
- End-to-end encryption for privacy
- Limited trust in any one host for integrity
- Self-organizing and maintaining to increase
usability - Pond prototype functional
- Threshold signatures more expensive than expected (address in future work)
- Generating erasure-encoded fragments is expensive
(address in future work) - Simple dissemination tree fairly effective
- A good base for testing new ideas
116Future Work (cont.)
- Assess and improve storage cost of virtualization
- Make more aspects of the system self-maintaining
- Algorithms for predictive replica placement
- Efficient detection and repair of lost data
- Increased stability and fault-tolerance
- Behavior of Pond / Tapestry when network is
partitioned