Title: The Storage Fabric of the Grid: The Network Storage Stack
1 The Storage Fabric of the Grid: The Network Storage Stack
James S. Plank, Director
Logistical Computing and Internetworking (LoCI) Laboratory
Department of Computer Science, University of Tennessee
Cluster and Computational Grids for Scientific Computing
September 12, 2002, Le Chateau de Faverges de la Tour, France
2 Grid Research: The Fabric Layer
Application
Middleware
The Fabric Layer
Resources
3 What is the Fabric Layer?
- Networking: TCP/IP
- Storage: Files in a file system
- Computation: Processes managed by an OS
4 What is the Fabric Layer?
- Networking: TCP/IP
- Storage: Files in a file system
- Computation: Processes managed by an OS
Most Grid research accepts these as givens. (Examples: MPI, GridFTP)
5 LoCI's Research Agenda
Redefine the fabric layer based on End-to-End Principles
[Figure: three parallel stacks, each listed top to bottom]
- Communication: Application, Transport, Network, Data / Link / Physical
- Storage: Application, LoRS, exNode, IBP Depot, Access / Physical
- Computation: Application, LoRS, exProc, IBP NFU, Access / Physical
6 What Should This Get You?
- Scalability
- Flexibility
- Fault-tolerance
- Composability
I.e., better Grids
7 LoCI Lab Personnel
- Directors: Jim Plank, Micah Beck
- Exec Director: Terry Moore
- Grad Students: Erika Fuentes, Sharmila Kancherla, Xiang Li, Linzhen Xuan
- Research Staff: Scott Atchley, Alexander Bassi, Ying Ding, Hunter Hagewood, Jeremy Millar, Stephen Soltesz, Yong Zheng
- Undergrad Students: Isaac Charles, Rebecca Collins, Kent Galbraith, Dustin Parr
8 Collaborators
- Jack Dongarra (UT - NetSolve, Linear Algebra)
- Rich Wolski (UCSB - Network Weather Service)
- Fran Berman (UCSD/NPACI - Scheduling)
- Henri Casanova (UCSD/NPACI - Scheduling)
- Laurent LeFevre (INRIA/ENS - Multicast, Active Networking)
9 The Network Storage Stack
- A Fundamental Organizing Principle
- Like the IP Stack
- Each level encapsulates details from the lower levels, while still exposing details to higher levels
[Figure: the stack, top to bottom: Applications, Logistical File System, Logistical Tools, L-Bone, exNode, IBP, Local Access, Physical.]
11 The Network Storage Stack
- LoRS (the Logistical Runtime System): aggregation tools and methodologies
- The L-Bone: resource discovery and proximity queries
- The exNode: a data structure for aggregation
- IBP (Internet Backplane Protocol): allocating and managing network storage
12 IBP: The Internet Backplane Protocol
Low-level primitives and software for:
- Managing and using state in the network.
- Inserting storage in the network so that:
  - Applications may use it advantageously.
  - Storage owners do not lose control of their resources.
  - The whole system is truly scalable and fault-tolerant.
13 The Byte Array: IBP's Unit of Storage
- You can think of it as a buffer.
- You can think of it as a file.
- Append-only semantics.
- Transience built in.
14 The IBP Client API
- Can be used by anyone who can talk to the server.
- Seven procedure calls in three categories:
  - Allocation (1)
  - Data transfer (4)
  - Management (2)
  - not really, but close...
15 Client API: Allocation
- IBP_allocate(char *host, int maxsize, IBP_attributes attr)
- Like a network malloc()
- Returns a trio of capabilities.
  - Read / Write / Manage
  - ASCII strings (obfuscated)
- No user-defined file names
- Big flat name space.
- No registration required to pass capabilities.
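
To make the "network malloc" idea concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the IBP client library: `ToyDepot`, `CapSet`, `rights_of`, and the `toy://` capability format are all hypothetical names invented for illustration. It shows only the shape of the idea from the slide: the depot chooses opaque names, and one allocation yields separate read, write, and manage capabilities.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class CapSet:
    """The trio of capabilities returned by one allocation (toy model)."""
    read: str
    write: str
    manage: str


class ToyDepot:
    """Hypothetical in-memory stand-in for an IBP depot."""

    def __init__(self, host):
        self.host = host
        self._by_cap = {}   # capability string -> (allocation id, right)
        self._arrays = {}   # allocation id -> {"maxsize": int, "data": bytearray}

    def allocate(self, maxsize):
        """Reserve up to maxsize bytes and hand back three capabilities."""
        alloc_id = len(self._arrays)
        self._arrays[alloc_id] = {"maxsize": maxsize, "data": bytearray()}
        caps = {}
        for right in ("read", "write", "manage"):
            # The depot, not the user, picks the name: an opaque random string.
            cap = f"toy://{self.host}/{secrets.token_hex(8)}/{right.upper()}"
            self._by_cap[cap] = (alloc_id, right)
            caps[right] = cap
        return CapSet(**caps)

    def rights_of(self, cap):
        """Return the right a capability grants, or None if it is unknown."""
        entry = self._by_cap.get(cap)
        return entry[1] if entry else None


depot = ToyDepot("depot.example.org")
caps = depot.allocate(maxsize=1024)
```

Because a capability is just a string, it can be handed to another party (for example, passing only `caps.read`) without registering anything with the depot.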
16 Allocation Attributes
- Time-Limited or Permanent
- Soft or Hard
- Read/Write semantics:
  - Byte Array
  - Pipe
  - Circular Queue
17 Client API: Data Transfer
2-party:
- IBP_store(write-cap, bytes, size, ...)
- IBP_deliver(read-cap, pointer, size, ...)
3-party:
- IBP_copy(read-cap, write-cap, size, ...)
N-party/other things:
- IBP_mcopy(...)
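
The difference between 2-party and 3-party transfers can be sketched with a small Python model. This is an illustration under stated assumptions, not the real IBP calls: `ToyByteArray`, `store`, `deliver`, and `copy` are hypothetical helpers, plain objects stand in for capabilities, and the explicit `offset` argument is an assumption of the sketch. The append-only behavior follows the byte-array semantics described earlier in the deck.

```python
class ToyByteArray:
    """Hypothetical append-only byte array with a fixed maximum size."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = bytearray()


def store(dst, payload):
    """2-party, client -> depot: append bytes; nothing is overwritten."""
    if len(dst.data) + len(payload) > dst.maxsize:
        raise ValueError("allocation is full")
    dst.data.extend(payload)
    return len(payload)


def deliver(src, offset, size):
    """2-party, depot -> client: read size bytes starting at offset."""
    return bytes(src.data[offset:offset + size])


def copy(src, dst, offset, size):
    """3-party, depot -> depot: the client only issues the request."""
    return store(dst, deliver(src, offset, size))


a = ToyByteArray(maxsize=16)  # stands in for an allocation on one depot
b = ToyByteArray(maxsize=16)  # stands in for an allocation on another depot
store(a, b"hello ")
store(a, b"world")
copy(a, b, offset=6, size=5)
```

In a real deployment the point of the 3-party call is that the bytes travel directly between the two depots rather than through the client; the toy model cannot show that, only the calling pattern.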
18 IBP Client API: Management
- IBP_manage()/IBP_status()
- Allows for resizing byte arrays.
- Allows for extending/shortening the time limit on time-limited allocations.
- Manages reference counts on the read/write capabilities.
- State probing.
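
The management operations listed above can be pictured with a toy Python model. Everything here is an assumption made for illustration: `ToyAllocation`, its method names, and the rule that an allocation stops being usable once its time limit passes or its read reference count reaches zero are not taken from the IBP API. Time is passed in explicitly as `now` so the example stays deterministic.

```python
class ToyAllocation:
    """Hypothetical time-limited allocation; not the real IBP data model."""

    def __init__(self, maxsize, expires_at):
        self.maxsize = maxsize
        self.expires_at = expires_at  # absolute time, in seconds
        self.read_refs = 1
        self.write_refs = 1

    def change(self, maxsize=None, expires_at=None):
        """Resize the byte array and/or extend or shorten the time limit."""
        if maxsize is not None:
            self.maxsize = maxsize
        if expires_at is not None:
            self.expires_at = expires_at

    def adjust_read_refs(self, delta):
        """Increment or decrement the read reference count (never below 0)."""
        self.read_refs = max(0, self.read_refs + delta)

    def alive(self, now):
        """Toy rule: usable until the limit passes or read refs hit zero."""
        return now < self.expires_at and self.read_refs > 0

    def probe(self, now):
        """State probing: a snapshot of the allocation's current settings."""
        return {
            "maxsize": self.maxsize,
            "seconds_left": max(0, self.expires_at - now),
            "read_refs": self.read_refs,
            "write_refs": self.write_refs,
        }


alloc = ToyAllocation(maxsize=100, expires_at=60)
alloc.change(expires_at=120)   # extend the time limit
status = alloc.probe(now=30)
```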
19 IBP Servers
- Daemons that serve local disk or memory.
- Root access not required.
- Can specify sliding time limits or revocability.
- Encourages resource sharing.
20 Typical IBP usage scenario
21 Logistical Networking Strategies
[Figure: four numbered configurations (1 to 4) for moving data from a Sender to a Receiver across the Network, with IBP depots placed at different points between them.]
22 XSuffrage on MCell/APST
[Figure: testbed diagram. Labels: (NetSolve + IBP), (GRAM + GASS), (NetSolve + NFS); NWS; Tokyo Institute of Technology; APST Daemon; APST Client.]
23 MCell/APST Experimental Results
- Experimental setting:
  - MCell simulation with 1,200 tasks, composed of 6 Monte-Carlo simulations
  - Input files: 1, 20, 100 MB
- 4 scenarios. Initially:
  - (a) all input files are only in Japan
  - (b) 100 MB files staged in California
  - (c) in addition, one 100 MB file staged in Tennessee
  - (d) all input files replicated everywhere
24 The Network Storage Stack
- LoRS (the Logistical Runtime System): aggregation tools and methodologies
- The L-Bone: resource discovery and proximity queries
- The exNode: a data structure for aggregation
- IBP: allocating and managing network storage (like a network malloc)
25 The Logistical Backbone (L-Bone)
- LDAP-based storage resource discovery.
- Query by capacity, network proximity, geographical proximity, stability, etc.
- Periodic monitoring of depots.
- Uses the Network Weather Service (NWS) for live measurements and forecasting.
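
A query by capacity and proximity can be sketched as a filter followed by a sort. The Python below is a simplified stand-in, not the L-Bone's LDAP interface: `DepotRecord`, `find_depots`, the field names, and the sample hosts and numbers are all made up for illustration, and the latency values merely stand in for the kind of measurement NWS would supply.

```python
from dataclasses import dataclass


@dataclass
class DepotRecord:
    host: str
    free_mb: int        # advertised free capacity
    latency_ms: float   # stand-in for a measured or forecast proximity value
    up: bool            # result of the most recent periodic check


def find_depots(records, min_free_mb, count):
    """Return the hosts of the closest live depots with enough free space."""
    candidates = [r for r in records if r.up and r.free_mb >= min_free_mb]
    candidates.sort(key=lambda r: r.latency_ms)
    return [r.host for r in candidates[:count]]


# Made-up directory contents.
directory = [
    DepotRecord("a.example.org", 500, 80.0, True),
    DepotRecord("b.example.org", 50, 5.0, True),     # too small
    DepotRecord("c.example.org", 900, 20.0, True),
    DepotRecord("d.example.org", 900, 1.0, False),   # currently down
]
nearest = find_depots(directory, min_free_mb=100, count=2)
```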
26 Snapshot: August, 2002
Approximately 1.6 TB of publicly accessible storage (scaling to a petabyte someday)
27 The Network Storage Stack
- LoRS (the Logistical Runtime System): aggregation tools and methodologies
- The L-Bone: resource discovery and proximity queries
- The exNode: a data structure for aggregation
- IBP: allocating and managing network storage (like a network malloc)
28 The exNode
- The Network File Pointer.
- Analogous to the Unix inode.
- Maps byte-extents to IBP buffers (or other allocations).
- XML-based data structure/serialization.
- Allows for replication, flexible decomposition of data.
- Also allows for end-to-end services.
- Arbitrary metadata.
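
As a rough picture of "mapping byte-extents to IBP buffers", the Python sketch below models an exNode as a list of mappings plus free-form metadata. The class and field names (`Mapping`, `ExNode`, `covering`, `read_cap`) and the sample extents are hypothetical; the real exNode is an XML structure whose schema is not shown in these slides. Overlapping mappings are how the sketch represents replication.

```python
from dataclasses import dataclass, field


@dataclass
class Mapping:
    offset: int     # logical offset of the extent within the file
    length: int     # number of bytes in the extent
    read_cap: str   # placeholder for an IBP read capability


@dataclass
class ExNode:
    mappings: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def covering(self, offset):
        """Capabilities of every mapping that holds the byte at offset."""
        return [m.read_cap for m in self.mappings
                if m.offset <= offset < m.offset + m.length]

    def size(self):
        """Logical file size implied by the mappings."""
        return max((m.offset + m.length for m in self.mappings), default=0)


# Made-up layout: a 300-byte file spread over three allocations.
ex = ExNode(metadata={"filename": "example.dat"})
ex.mappings += [
    Mapping(0, 200, "cap-A"),    # bytes 0..199
    Mapping(200, 100, "cap-B"),  # bytes 200..299
    Mapping(100, 200, "cap-C"),  # bytes 100..299, overlapping both
]
```

A downloader can call `covering()` for each region it needs and pick any of the returned replicas, which is what makes replication and flexible decomposition possible with one structure.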
29 The exNode (XML-based)
[Figure: exNode extents at byte offsets 0, 100, 200, and 300, mapped across the Network to IBP Depots; labels A, B, and C.]
30 The Network Storage Stack
- LoRS (the Logistical Runtime System): aggregation tools and methodologies
- The L-Bone: resource discovery and proximity queries
- The exNode: a data structure for aggregation
- IBP: allocating and managing network storage (like a network malloc)
31 Logistical Runtime System
- Aggregation for:
  - Capacity
  - Performance (striping)
  - More performance (caching)
  - Reliability (replication)
  - More reliability (ECC)
  - Logistical purposes (routing)
32 Logistical Runtime System
- Basic primitives:
  - Upload: create a network file from local data.
  - Download: get bytes from a network file.
  - Augment: add more replicas to a network file.
  - Trim: remove replicas from a network file.
  - Stat: get information about the network file.
  - Refresh: alter the time limits of the IBP buffers.
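
Four of these primitives (Upload, Augment, Download, Trim) can be sketched in a few lines of Python. This is a toy model under loud assumptions: depots are plain dictionaries, string keys stand in for capabilities, and the function names are hypothetical rather than the LoRS tool names. Stat and Refresh are left out to keep the sketch short.

```python
import itertools

_next_id = itertools.count()  # source of unique toy "capability" names


def _put(depot, block):
    """Store a block on a depot and return its toy capability."""
    key = f"cap-{next(_next_id)}"
    depot[key] = block
    return key


def _fetch(extent):
    """Return the block from the first replica that still answers."""
    for depot, key in extent["replicas"]:
        if key in depot:
            return depot[key]
    raise IOError(f"no live replica for offset {extent['offset']}")


def upload(data, depot, block_size):
    """Create a network file: split data into blocks stored on one depot."""
    exnode = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        exnode.append({"offset": offset, "replicas": [(depot, _put(depot, block))]})
    return exnode


def augment(exnode, depot):
    """Add one more replica of every block on another depot."""
    for extent in exnode:
        extent["replicas"].append((depot, _put(depot, _fetch(extent))))


def download(exnode):
    """Reassemble the file, falling back to other replicas on failure."""
    return b"".join(_fetch(e) for e in sorted(exnode, key=lambda e: e["offset"]))


def trim(exnode):
    """Drop replicas whose capability no longer resolves."""
    for extent in exnode:
        extent["replicas"] = [(d, k) for d, k in extent["replicas"] if k in d]


utk, ucsb = {}, {}
ex = upload(b"logistical networking", utk, block_size=8)
augment(ex, ucsb)
utk.clear()               # simulate a depot failure
recovered = download(ex)  # served entirely from the second depot
trim(ex)                  # forget the dead replicas
```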
33 Upload
34 Augment to Tennessee
35 Augment to Santa Barbara
36 Stat (ls)
37 Failures do happen.
38 Download
39 Trimming (dead capability removal)
40 End-To-End Services
- MD5 checksums stored per exNode block to detect corruption.
- Encryption is a per-block option.
- Compression is a per-block option.
- Parity/Coding is in the design.
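
The per-block checksum idea can be illustrated with Python's standard `hashlib`. The block layout, the dictionary fields, and the helper names below are assumptions made for the sketch; only the technique (store an MD5 digest alongside each block, recompute it after retrieval, compare) comes from the slide.

```python
import hashlib


def make_blocks(data, block_size):
    """Split data into blocks and record an MD5 digest for each one."""
    blocks = []
    for offset in range(0, len(data), block_size):
        payload = data[offset:offset + block_size]
        blocks.append({
            "offset": offset,
            "payload": payload,
            "md5": hashlib.md5(payload).hexdigest(),
        })
    return blocks


def corrupted_offsets(blocks):
    """Offsets of blocks whose payload no longer matches the stored digest."""
    return [b["offset"] for b in blocks
            if hashlib.md5(b["payload"]).hexdigest() != b["md5"]]


blocks = make_blocks(b"end-to-end services", block_size=8)
clean = corrupted_offsets(blocks)     # nothing has been tampered with yet
blocks[1]["payload"] = b"XXXXXXXX"    # simulate corruption in storage or transit
damaged = corrupted_offsets(blocks)
```

Because the digest lives in the exNode rather than on the depot, the check is end-to-end: the client detects a bad block without having to trust the depot that stored it.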
41 Parity / Coding
[Figure: an exNode with coding, whose blocks are stored in IBP buffers across the Network.]
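
The slides do not say which code is intended, so the following Python sketch uses the simplest possible one, single XOR parity, purely as an illustration of how a coding block lets a lost data block be rebuilt. The function name and the sample blocks are hypothetical, and this should not be read as the scheme LoRS uses.

```python
def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)  # one extra block stored alongside the data

# Suppose the middle block is lost: XOR the survivors with the parity block.
rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])
```

Compared with full replication, a scheme like this tolerates the loss of any one block while storing only one extra block per group, at the cost of extra work during recovery.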
42 Scalability
- No bottlenecks
- Really hard problems left unsolved, but for the most part, the lower levels shouldn't need changing:
  - Naming
  - Good scheduling
  - Consistency / File System semantics
  - Computation
43 Status
- IBP/L-Bone/exNode/Tools all supported.
- Apps: Mail, IBP-ster, Video IBP-ster, IBPvo; demo at SC-02
- Other institutions (see L-Bone)
[Figure: the Network Storage Stack, top to bottom: Applications, Logistical File System, Logistical Tools, L-Bone, exNode, IBP, Local Access, Physical.]
44 What's Coming Up?
- More nodes on the L-Bone
- More collaboration with applications groups
- Research on performance and scheduling
- Logistical File System
- A Computation Stack
- Code / Information at loci.cs.utk.edu
46 Replication Experiment 1
[Figure: a 3 MB file replicated across sites at Harvard, UCSB, UTK, UNC, UCSD, TAMU, Turin (IT), and Stuttgart (DE).]
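47 Replication Experiment 1
[Figure: three panels titled Depot Availability at UTK, Depot Availability at UCSD, and Depot Availability at Harvard.]
Download results, in the order they appear on the slide:
- 860 download attempts, 100% success
- 857 download attempts, 100% success
- 751 download attempts, 100% success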
48 Most Frequent Download Path
[Figure: three panels: From UTK, From Harvard, From UCSD.]
49 Replication Experiment 2
- Deleted 12 of the 21 IBP allocations
- Downloaded from UTK
- 1,225 attempts, 93.88% success
- 3 MB file