Title: Peer-to-Peer Supported Cache System for File Transfer
- 2003.8.28
- Joonbok Lee
- KAIST
- jblee_at_cosmos.kaist.ac.kr
Contents
- Motivation
- Problem Statement
- Related Work
- Approach
- Simulation
- Conclusion
- Reference
1. Motivation
- KAIST Netflow Measurement (2002.10.4)
- Analyze the flow data of KAIST Border Router.
Fig 1. The byte ratio in terms of protocols
Fig 2. Cumulative distribution function of the files transferred by FTP and HTTP
- Some Findings
- The amount of bandwidth consumed by FTP is similar to that consumed by HTTP.
- 78% of the FTP traffic is due to large files, i.e. files larger than 10MB.
2. Problem Statement
- Non-negligible access to large multimedia data [Jung00].
- FTP Traffic
- 17% of total traffic.
- 78% of it comes from files larger than 10MB.
- 11% of them failed during transfer.
- The large files transferred by FTP generate much traffic, and many of them take a long time to transfer.
- To solve this problem, we propose an HTTP/FTP proxy cache which is scalable in terms of bandwidth and storage.
3. Related Work
- Research that addresses large file transfer.
- RepliCache: A New Approach to Scalable Networking Storage System for Large Objects [Jung97]
- Proactive Web Caching with Cumulative Prefetching [Jung00]
- Research that proposes a scalable architecture.
- Squirrel: A decentralized peer-to-peer web cache [Iyer02]
- Peer-to-Peer Caching Schemes to Address Flash Crowds [Stading02]
4. Approach
- 4.1 Motivation
- 4.2 Cache with Peer-to-Peer Storage
- 4.3 Model
- 4.4 Detail Design
4.1 Motivation
- Peer-to-Peer Architecture as a Cache
- Scalability (bandwidth, computing power and storage)
- Cost
- Overhead (to find objects and to maintain the system)
- The Latency
- One of the important metrics of cache performance.
- Latency = lookup time + delivery time
- Delivery time depends on the file size.
- Small files: the lookup time dominates. Large files: the delivery time dominates.
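The latency split above can be illustrated with a small worked example. This is only a sketch: the 50 ms lookup time and the 100 Mbps link are assumed values chosen for illustration, and `transfer_latency` and `lookup_share` are hypothetical helper names, not part of the proposed system.

```python
def transfer_latency(size_bytes: float, lookup_s: float, bandwidth_bps: float) -> float:
    """Latency = lookup time + delivery time, where delivery time = size / bandwidth."""
    delivery_s = (size_bytes * 8) / bandwidth_bps
    return lookup_s + delivery_s


def lookup_share(size_bytes: float, lookup_s: float, bandwidth_bps: float) -> float:
    """Fraction of the total latency that is spent on lookup."""
    return lookup_s / transfer_latency(size_bytes, lookup_s, bandwidth_bps)


LOOKUP_S = 0.05        # assumed lookup time (hypothetical value)
BANDWIDTH_BPS = 100e6  # assumed 100 Mbps link

small = transfer_latency(100e3, LOOKUP_S, BANDWIDTH_BPS)  # 100 KB file
large = transfer_latency(100e6, LOOKUP_S, BANDWIDTH_BPS)  # 100 MB file

print(f"100 KB: {small:.3f} s, lookup share {lookup_share(100e3, LOOKUP_S, BANDWIDTH_BPS):.0%}")
print(f"100 MB: {large:.3f} s, lookup share {lookup_share(100e6, LOOKUP_S, BANDWIDTH_BPS):.0%}")
```

Under these assumptions the lookup accounts for most of the latency of the 100 KB file and for less than 1% of the latency of the 100 MB file, which is why a slower lookup is tolerable for large files.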
4.2 Cache with Peer-to-Peer Storage
- Hybrid Approach
- Scalability: peer-to-peer storage.
- Lookup and control: central cache.
- Peer-to-Peer two-layer storage
- The storage in the central cache
- Expected to be always available, with low latency.
- Stores small files.
- The second-tier storage
- Can be unavailable.
- Stores large files.
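The placement rule of the two-layer storage can be sketched as follows. The 10MB threshold is an assumption borrowed from the measurement section, not a value the design states, and storing a large file on the requesting peer is also an assumption. `TwoLayerStore` and `place` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict

LARGE_FILE_THRESHOLD = 10 * 1024 * 1024  # assumed cut-off between small and large files


@dataclass
class TwoLayerStore:
    threshold: int = LARGE_FILE_THRESHOLD
    central: Dict[str, int] = field(default_factory=dict)           # url -> size
    peers: Dict[str, Dict[str, int]] = field(default_factory=dict)  # peer id -> {url: size}

    def place(self, url: str, size: int, peer_id: str) -> str:
        """Store small files in the central cache and large files on a peer."""
        if size < self.threshold:
            self.central[url] = size
            return "central"
        # Assumption: the peer that requested the file keeps the copy.
        self.peers.setdefault(peer_id, {})[url] = size
        return f"peer:{peer_id}"


store = TwoLayerStore()
print(store.place("ftp://ftp.example.org/readme.txt", 4096, "peer1"))
print(store.place("ftp://ftp.example.org/big.iso", 700 * 1024 * 1024, "peer1"))
```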
4.3 Model
Fig 3. Cache with two-layer storage
(Components shown: the connectivity cloud; the web proxy cache with FTP supporting module, holding small objects OS1 and OS2; the local area network; and the peer-to-peer storage of Peer 1, Peer 2, ..., Peer n, holding large objects OL1 and OL2.)
4.4 Detail Design
- Two new components to support FTP and large files.
- They preserve the transparency of file location.
- FTP Cache Daemon
- Stores the state of the FTP connection.
- Builds the URL of files transferred by FTP.
- Checks consistency.
- P2P Storage Manager
- Controls its own storage.
- Managed through the object table in the central cache.
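One way the FTP Cache Daemon could build a URL from the state of the control connection is sketched below. It follows the RFC 959 commands `CWD`, `CDUP` and `RETR`. The class name `FtpSessionState` is hypothetical, and the sketch assumes the session starts in the root directory, which a real server does not guarantee.

```python
import posixpath
from typing import Optional


class FtpSessionState:
    """Tracks the working directory of one FTP control connection."""

    def __init__(self, host: str):
        self.host = host
        self.cwd = "/"  # assumption: the login directory is the root

    def on_command(self, line: str) -> Optional[str]:
        """Update the state; return the file URL when the command is RETR."""
        verb, _, arg = line.strip().partition(" ")
        verb = verb.upper()
        if verb == "CWD":
            self.cwd = posixpath.normpath(posixpath.join(self.cwd, arg))
        elif verb == "CDUP":
            self.cwd = posixpath.normpath(posixpath.join(self.cwd, ".."))
        elif verb == "RETR":
            path = posixpath.normpath(posixpath.join(self.cwd, arg))
            return f"ftp://{self.host}{path}"
        return None


session = FtpSessionState("ftp.example.org")
for command in ["CWD pub", "CWD linux", "CDUP", "RETR big.iso"]:
    url = session.on_command(command)
    if url:
        print(url)
```

The resulting URL can then serve as the key of the object table, so that FTP and HTTP objects are looked up in the same way.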
Fig 4. Control and data connections between components
(Components shown: FTP/HTTP Server; HTTP Cache Daemon; FTP Cache Daemon; Storage Manager; Object Table; and FTP/HTTP Clients, each with a P2P Storage Manager.)
5. Simulation
- 5.1 Simulation Environment
- 5.2 Simulation Result
5.1 Simulation Environment
- Trace
- Requested FTP file list
- Gathered the FTP control (port 21) packets and produced the trace.
- 2002.10.23 to 2002.11.5 (two weeks)
- 76,880 file requests (783GB).
- 417 clients
- Assumption
- Local Network: 100Mbps
- Simulated Caches
- Cache A: 100GB Storage, 100Mbps
- Cache B: Infinite Storage, 100Mbps
- Cache C: Infinite Storage, Infinite Bandwidth
- Cache D: Cache with Peer-to-Peer Storage
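A trace-driven hit-ratio simulation of a bounded cache (like Cache A) and an unbounded one (like Cache B) can be sketched as below. The replacement policy is not stated in the slides, so LRU is used here purely as an assumption, and the four-request trace is made up for illustration. `simulate_hit_ratio` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Iterable, Optional, Tuple


def simulate_hit_ratio(
    trace: Iterable[Tuple[str, int]], capacity_bytes: Optional[int] = None
) -> Tuple[float, float]:
    """Replay (url, size) requests; return (request hit ratio, byte hit ratio).

    capacity_bytes=None models infinite storage. Eviction is LRU (an assumption).
    """
    cache: "OrderedDict[str, int]" = OrderedDict()
    used = hits = hit_bytes = total = total_bytes = 0
    for url, size in trace:
        total += 1
        total_bytes += size
        if url in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(url)  # mark as most recently used
            continue
        if capacity_bytes is not None and size > capacity_bytes:
            continue  # object can never fit, so it is not cached
        cache[url] = size
        used += size
        while capacity_bytes is not None and used > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict least recently used
            used -= evicted_size
    if total == 0 or total_bytes == 0:
        return 0.0, 0.0
    return hits / total, hit_bytes / total_bytes


toy_trace = [("a", 60), ("b", 60), ("a", 60), ("b", 60)]
print(simulate_hit_ratio(toy_trace, capacity_bytes=100))  # bounded storage
print(simulate_hit_ratio(toy_trace))                      # infinite storage
```

The bytes that miss the cache correspond to outbound traffic, so the byte hit ratio is the more relevant figure when most of the traffic comes from large files.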
5.2 Simulation Result: Hit Ratio
Fig 5. Cache Hit Ratio
Fig 6. Outbound traffic
- No strict storage control
- Some peers may have the same files in their storage.
- Even though some peers have available storage, other peers can remove a file from their cache as a victim.
- This degrades the performance of the storage, but not by much.
5.2 Simulation Result: Latency
Fig 7. Average latency of 95~105MB files
Fig 8. Average latency of 95~105KB files
We can reduce the latency of large files without increasing the latency of small files.
5.2 Simulation Result: Cache Hit Ratio Degradation by Peer Failure
Fig 9. Cache hit ratio degradation by peer failure
6. Conclusion
- The measurement shows that a large amount of traffic is produced by FTP, and that 78% of it is caused by files larger than 10MB.
- We propose a cache system with two-layer storage using a peer-to-peer architecture. It is transparent to the location of files.
- Trace-driven simulation shows that the two-layer storage performs well for large files as well as small files.
- Caching with our system can reduce outbound traffic and latency.
- Other issues
- Collaboration between proposed systems.
- Load balancing between peers.
- Security problem.
7. Reference
- Jaeyeon Jung, "RepliCache: Enhancing Web Caching Architecture with Replication of Large Objects".
- Jaeyeon Jung, Dongman Lee and Kilnam Chon, "Proactive Web Caching with Cumulative Prefetching for Large Multimedia Data", Computer Networks 33 (2000), pp. 645-655.
- Sitaram Iyer, Ant Rowstron and Peter Druschel, "Squirrel: A decentralized peer-to-peer web cache", In Proceedings of PODC 02, Monterey, CA.
- Tyron Stading, Petros Maniatis and Mary Baker, "Peer-to-Peer Caching Schemes to Address Flash Crowds", In Proceedings of IPTPS 02, MA, USA.
- Hyun-chul Kim, Joonbock Lee, Jungwon Suh and Kilnam Chon, "Measurements of File-Systems Deployed on High-Performance Research and Education Networks", Technical Report.
- I. Stoica, R. Morris, D. Karger, F. Kaashoek and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup service for Internet applications", In Proceedings of the ACM SIGCOMM 2001 Technical Conference, San Diego, CA, USA, August 2001.
- S. Ratnasamy, P. Francis, M. Handley, R. Karp and S. Shenker, "A scalable content-addressable network", In Proceedings of the ACM SIGCOMM 2001 Technical Conference, San Diego, CA, USA, August 2001.
- A. Rowstron and P. Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, pages 329-350, November 2001.
- Ian Clarke, Theodore W. Hong, Scott G. Miller, Oskar Sandberg and Brandon Wiley, "Protecting Free Expression Online with Freenet", IEEE Internet Computing 6(1), January/February 2002.
- William J. Bolosky, John R. Douceur, David Ely and Marvin Theimer, "Feasibility of a Serverless Distributed File System Deployed on an Existing Set of Desktop PCs", In Proceedings of SIGMETRICS 2000.
- Internet RFC 959: File Transfer Protocol.
Appendix A
- Request File: check the protocol.
- HTTP: handle the request like a web proxy cache.
- FTP: look up the object table.
- Not cached, or cached but inconsistent: check the file size.
- Small file: the server opens a data connection to the central cache, and the central cache opens a data connection to the client. Transfer file.
- Large file: open a data connection between the server and the client. Transfer file.
- Cached and consistent: check the cached location.
- Central server: the central cache opens a data connection to the client. Transfer file.
- Peer: open FTP control connections to both the peer which has the file and the peer which requests it, then make FTP data connections between the two peers. Transfer file.
- Update Object Table after the transfer.
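The request handling flow of this appendix can be read as the decision procedure sketched below. This is one interpretation of the flowchart, not the author's implementation: `Entry`, `plan_request`, the step strings, and the 10MB threshold are all hypothetical. The sketch updates the object table only on misses, which is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    location: str     # "central" or the id of the peer that holds the file
    consistent: bool  # outcome of the consistency check (assumed to be known here)


def plan_request(protocol: str, url: str, size: int, table: Dict[str, Entry],
                 requester: str, threshold: int = 10 * 1024 * 1024) -> List[str]:
    """Return the data connections to open for one request."""
    if protocol.upper() == "HTTP":
        return ["handle like a web proxy cache"]
    entry = table.get(url)
    if entry is None or not entry.consistent:
        # Miss, or stale copy: fetch from the origin server, choosing the path by size.
        if size < threshold:
            table[url] = Entry("central", True)
            return ["server -> central cache", "central cache -> client"]
        table[url] = Entry(requester, True)  # assumption: the requester keeps the copy
        return ["server -> client"]
    if entry.location == "central":
        return ["central cache -> client"]
    # Hit on a peer: the central cache sets up the transfer between the two peers.
    return [f"control connections to {entry.location} and {requester}",
            f"data connection {entry.location} -> {requester}"]


table: Dict[str, Entry] = {}
url = "ftp://ftp.example.org/pub/big.iso"
print(plan_request("FTP", url, 700 * 1024 * 1024, table, "peer1"))  # miss, large file
print(plan_request("FTP", url, 700 * 1024 * 1024, table, "peer2"))  # hit on peer1
```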