Advanced Operating Systems

About This Presentation

Title:

Advanced Operating Systems

Description:

Advanced Operating Systems Lecture 9: Distributed Systems Architecture University of Tehran Dept. of EE and Computer Engineering By: Dr. Nasser Yazdani – PowerPoint PPT presentation

1
Advanced Operating Systems
Lecture 9 Distributed Systems Architecture

University of Tehran
Dept. of EE and Computer Engineering
By
Dr. Nasser Yazdani

2
Covered topic

Distributed Systems Architectures
References
Chapter 2 of the text book
Anatomy of Grid

3
Outline

Distributed Systems Architecture
Client-server
Grid computing
Peer to peer Computing
Cloud Computing

4
Architectural Models

Concerned with
The placement of the components across a network
of computers
The interrelationships between the components
Common Architectures
Client server, Web
Grid
Peer to peer
Cloud

5
Clients and Servers

General interaction between a client and a server.

1.25
6
Processing Level

The general organization of an Internet search
engine into three different layers

1-28
7
Multitiered Architectures (1)

Alternative client-server organizations (a)-(e).

1-29
8
Multitiered Architectures (2)

An example of a server acting as a client.

1-30
9
Client-Server

Creating, for example, a service like Hotmail: what are the
options?
One server?
Several servers?

10
Multiple Servers
11
HTTP Basics (Review)

HTTP layered over bidirectional byte stream
Almost always TCP
Interaction
Client sends request to server, followed by
response from server to client
Requests/responses are encoded in text
Stateless
Server maintains no information about past client
requests

12
How to Mark End of Message? (Review)

Size of message -> Content-Length
Must know size of transfer in advance
Delimiter -> MIME-style Content-Type
Server must escape delimiter in content
Close connection
Only server can do this
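
The Content-Length option can be sketched as follows. This is a minimal illustration, not a full HTTP parser: the function name `split_http_message` is hypothetical, and a missing Content-Length is simply treated as a zero-length body (an assumption made here for brevity).

```python
def split_http_message(buffer: bytes):
    """Return (headers, body, rest) once a full message is buffered, else None."""
    head, sep, tail = buffer.partition(b"\r\n\r\n")
    if not sep:
        return None  # the blank line that ends the headers has not arrived yet
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the request/status line
        name, _, value = line.partition(b":")
        headers[name.strip().lower().decode("ascii")] = value.strip().decode("ascii")
    # Assumption: no Content-Length header means an empty body.
    length = int(headers.get("content-length", "0"))
    if len(tail) < length:
        return None  # body is still incomplete
    # Bytes past the declared length belong to the next message on the connection.
    return headers, tail[:length], tail[length:]
```

Because the receiver knows exactly where the body ends, the connection can stay open for the next message, which is what persistent connections rely on.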

13
HTTP Request (review)

Request line
Method
GET: return URI
HEAD: return headers only of GET response
POST: send data to the server (forms, etc.)
URL (relative)
E.g., /index.html
HTTP version

14
HTTP Request (cont.) (review)

Request headers
Authorization: authentication info
Acceptable document types/encodings
From: user email
If-Modified-Since
Referer: what caused this page to be requested
User-Agent: client software
Blank-line
Body

15
HTTP Request (review)
16
HTTP Request Example (review)

GET / HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Host: www.intel-iris.net
Connection: Keep-Alive

17
HTTP Response (review)

Status-line
HTTP version
3 digit response code
1XX informational
2XX success
200 OK
3XX redirection
301 Moved Permanently
302 Moved Temporarily
304 Not Modified
4XX client error
404 Not Found
5XX server error
505 HTTP Version Not Supported
Reason phrase

18
HTTP Response (cont.) (review)

Headers
Location: for redirection
Server: server software
WWW-Authenticate: request for authentication
Allow: list of methods supported (GET, HEAD,
etc.)
Content-Encoding: e.g., x-gzip
Content-Length
Content-Type
Expires
Last-Modified
Blank-line
Body

19
HTTP Response Example (review)

HTTP/1.1 200 OK
Date: Tue, 27 Mar 2001 03:49:38 GMT
Server: Apache/1.3.14 (Unix) (Red-Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24
Last-Modified: Mon, 29 Jan 2001 17:54:18 GMT
ETag: "7a11f-10ed-3a75ae4a"
Accept-Ranges: bytes
Content-Length: 4333
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html
...

20
Typical Workload (Web Pages)

Multiple (typically small) objects per page
File sizes
Heavy-tailed
Pareto distribution for tail
Lognormal for body of distribution
-- For reference/interest only --
Embedded references
Number of embedded objects
Pareto: $p(x) = a k^a x^{-(a+1)}$

21
HTTP 0.9/1.0 (mostly review)

One request/response per TCP connection
Simple to implement
Disadvantages
Multiple connection setups -> three-way handshake
each time
Several extra round trips added to transfer
Multiple slow starts

22
Single Transfer Example

[Figure: client/server timeline for fetching an HTML page and one image
over separate TCP connections.
0 RTT: client opens TCP connection (SYN, SYN).
1 RTT: client sends HTTP request for HTML (ACK, DAT); server reads from
disk and replies (ACK, DAT, FIN).
2 RTT: client parses HTML and opens a second TCP connection (SYN, SYN).
3 RTT: client sends HTTP request for image (ACK, DAT); server reads from
disk.
4 RTT: image begins to arrive (DAT).]
23
More Problems

Short transfers are hard on TCP
Stuck in slow start
Loss recovery is poor when windows are small
Lots of extra connections
Increases server state/processing
Server also forced to keep TIME_WAIT connection
state
-- Things to think about --
Why must server keep these?
Tends to be an order of magnitude greater than the number
of active connections. Why?

24
Persistent Connection Solution (review)

Multiplex multiple transfers onto one TCP
connection
How to identify requests/responses
Delimiter -> server must examine response for
delimiter string
Content-length and delimiter -> must know size of
transfer in advance
Block-based transmission -> send in multiple
length-delimited blocks
Store-and-forward -> wait for entire response and
then use content-length
Solution -> use existing methods and close
connection otherwise

25
Persistent Connection Example (review)

[Figure: client/server timeline over one persistent TCP connection.
0 RTT: client sends HTTP request for HTML (DAT); server reads from disk
(ACK, DAT).
1 RTT: client parses HTML and sends HTTP request for image (ACK, DAT);
server reads from disk (ACK, DAT).
2 RTT: image begins to arrive.]
26
Persistent HTTP (review)

Nonpersistent HTTP issues
Requires 2 RTTs per object
OS must work and allocate host resources for each
TCP connection
But browsers often open parallel TCP connections
to fetch referenced objects
Persistent HTTP
Server leaves connection open after sending
response
Subsequent HTTP messages between same
client/server are sent over connection

Persistent without pipelining
Client issues new request only when previous
response has been received
One RTT for each referenced object
Persistent with pipelining
Default in HTTP/1.1
Client sends requests as soon as it encounters a
referenced object
As little as one RTT for all the referenced
objects
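
The round-trip counts on this slide can be turned into a small back-of-the-envelope model. This is a sketch under simplifying assumptions that are not in the slides: one base HTML page plus `num_objects` referenced objects, no parallel connections, and no transmission time, slow start, or loss. The function names are hypothetical.

```python
def rtts_nonpersistent(num_objects: int) -> int:
    # Every fetch (the HTML page and each object) pays 1 RTT for TCP setup
    # plus 1 RTT for the request/response.
    return 2 * (1 + num_objects)

def rtts_persistent(num_objects: int) -> int:
    # One TCP setup, one RTT for the HTML, then one RTT per referenced object.
    return 1 + 1 + num_objects

def rtts_pipelined(num_objects: int) -> int:
    # One TCP setup, one RTT for the HTML, then all object requests are sent
    # back to back and answered within roughly one more RTT.
    return 1 + 1 + (1 if num_objects > 0 else 0)

# A page with 10 embedded objects:
print(rtts_nonpersistent(10), rtts_persistent(10), rtts_pipelined(10))  # 22 12 3
```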

27
HTTP Caching

Clients often cache documents
Challenge: update of documents
If-Modified-Since requests to check
HTTP 0.9/1.0 used just date
HTTP 1.1 has an opaque entity tag (could be a
file signature, etc.) as well
When/how often should the original be checked for
changes?
Check every time?
Check each session? Day? Etc?
Use Expires header
If no Expires, often use Last-Modified as estimate
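
A server-side view of the cache check might look like the sketch below. It is an illustration only: the function name `validate` and the rule that the entity tag takes priority over the date are assumptions made here, and real servers handle more cases (weak tags, multiple tags, malformed dates).

```python
from email.utils import parsedate_to_datetime

def validate(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200."""
    client_tag = request_headers.get("If-None-Match")
    if client_tag is not None:
        # The opaque entity tag is compared for equality, nothing more.
        return 304 if client_tag == etag else 200
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        # Date-only validation, as in HTTP 0.9/1.0.
        if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(since):
            return 304
    return 200  # no validator supplied: send the full document
```

For example, a request carrying `If-Modified-Since: Mon, 29 Jan 2001 17:54:18 GMT` for a document last modified at that same instant yields 304, so only headers cross the network.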

28
Example Cache Check Request

GET / HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
If-Modified-Since: Mon, 29 Jan 2001 17:54:18 GMT
If-None-Match: "7a11f-10ed-3a75ae4a"
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Host: www.intel-iris.net
Connection: Keep-Alive

29
Ways to cache

Client-directed caching
Web Proxies
Server-directed caching
Content Delivery Networks (CDNs)

30
Web Proxy Caches

User configures browser: Web accesses via cache
Browser sends all HTTP requests to cache
If object in cache: cache returns object
Else: cache requests object from origin server,
then returns object to client

[Figure: two clients send HTTP requests to the proxy server and receive
HTTP responses from it; the proxy server in turn exchanges HTTP
requests and responses with the origin servers.]
31
Caching Example (1)

Assumptions
Average object size = 100,000 bits
Avg. request rate from institution's browsers to
origin servers = 15/sec
Delay from institutional router to any origin
server and back to router = 2 sec
Consequences
Utilization on LAN = 15%
Utilization on access link = 100%
Total delay = Internet delay + access delay +
LAN delay
= 2 sec + minutes + milliseconds

[Figure: institutional network (10 Mbps LAN) connected to origin servers
on the public Internet through a 1.5 Mbps access link.]
32
Caching Example (2)

Possible solution
Increase bandwidth of access link to, say, 10
Mbps
Often a costly upgrade
Consequences
Utilization on LAN = 15%
Utilization on access link = 15%
Total delay = Internet delay + access delay +
LAN delay
= 2 sec + msecs + msecs

[Figure: institutional network (10 Mbps LAN) connected to origin servers
on the public Internet through a 10 Mbps access link.]
33
Caching Example (3)

Install cache
Suppose hit rate is 0.4
Consequence
40% of requests will be satisfied almost immediately
(say 10 msec)
60% of requests satisfied by origin server
Utilization of access link reduced to 60%,
resulting in negligible delays
Weighted average of delays
= 0.6 * 2 sec + 0.4 * 10 msecs < 1.3 secs

[Figure: institutional network (10 Mbps LAN) with an institutional cache,
connected to origin servers on the public Internet through a 1.5 Mbps
access link.]
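
The figures quoted in the three caching examples can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated on the slides; the helper name `utilization` is hypothetical, and the weighted average ignores the (now negligible) access-link delay, as the slide does.

```python
def utilization(rate_per_sec, object_bits, link_bps):
    """Fraction of a link's capacity consumed by the offered load."""
    return rate_per_sec * object_bits / link_bps

OBJECT_BITS = 100_000   # average object size
RATE = 15               # requests per second

lan = utilization(RATE, OBJECT_BITS, 10_000_000)            # about 0.15
access = utilization(RATE, OBJECT_BITS, 1_500_000)          # about 1.0 (saturated)
upgraded = utilization(RATE, OBJECT_BITS, 10_000_000)       # about 0.15

hit_rate = 0.4
access_with_cache = utilization(RATE * (1 - hit_rate), OBJECT_BITS, 1_500_000)  # about 0.6
# Misses pay the 2 s Internet delay; hits are served locally in about 10 ms.
avg_delay = (1 - hit_rate) * 2.0 + hit_rate * 0.010         # about 1.2 s

print(round(lan, 2), round(access, 2), round(access_with_cache, 2), round(avg_delay, 3))
```

The cache brings the access link from saturation down to about 60% utilization without the costly upgrade.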
34
Problems

Over 50% of all HTTP objects are uncacheable.
Why?
Not easily solvable
Dynamic data -> stock prices, scores, web cams
CGI scripts -> results based on passed parameters
Obvious fixes
SSL -> encrypted data is not cacheable
Most web clients don't handle mixed pages well
-> many generic objects transferred with SSL
Cookies -> results may be based on passed data
Hit metering -> owner wants to measure # of hits
for revenue, etc.
What will be the end result?

35
Content Distribution Networks (CDNs)

The content providers are the CDN customers.
Content replication
CDN company installs hundreds of CDN servers
throughout Internet
Close to users
CDN replicates its customers' content in CDN
servers. When provider updates content, CDN
updates servers

origin server in North America
CDN distribution node
CDN server in S. America
CDN server in Asia
CDN server in Europe
36
Content Distribution Networks Server Selection

Replicate content on many servers
Challenges
How to replicate content
Where to replicate content
How to find replicated content
How to choose among known replicas
How to direct clients towards replica

37
Server Selection

Which server?
Lowest load -> to balance load on servers
Best performance -> to improve client performance
Based on Geography? RTT? Throughput? Load?
Any alive node -> to provide fault tolerance
How to direct clients to a particular server?
As part of routing -> anycast, cluster load
balancing
As part of application -> HTTP redirect
As part of naming -> DNS

38
Application Based

HTTP supports simple way to indicate that Web
page has moved (30X responses)
Server receives Get request from client
Decides which server is best suited for
particular client and object
Returns HTTP redirect to that server
Can make informed application specific decision
May introduce additional overhead -> multiple
connection setup, name lookups, etc.
OK solution in general, but
HTTP Redirect has some flaws especially with
current browsers
Incurs many delays, which operators may really
care about

39
Naming Based

Client does DNS name lookup for service
Name server chooses appropriate server address
A-record returned is best one for the client
What information can name server base decision
on?
Server load/location -> must be collected
Information in the name lookup request
Name service client -> typically the local name
server for client

40
How Akamai Works

Clients fetch html document from primary server
E.g. fetch index.html from cnn.com
URLs for replicated content are replaced in html
E.g. <img src="http://cnn.com/af/x.gif"> replaced
with <img src="http://a73.g.akamaitech.net/7/23/cnn.com/af/x.gif">
Client is forced to resolve aXYZ.g.akamaitech.net
hostname

41
How Akamai Works

How is content replicated?
Akamai only replicates static content (*)
Modified name contains original file name
Akamai server is asked for content
First checks local cache
If not in cache, requests file from primary
server and caches file
(* At least, the version we're talking about
today. Akamai actually lets sites write code
that can run on Akamai's servers, but that's a
pretty different beast.)

42
How Akamai Works

Root server gives NS record for akamai.net
Akamai.net name server returns NS record for
g.akamaitech.net
Name server chosen to be in region of client's
name server
TTL is large
G.akamaitech.net nameserver chooses server in
region
Should try to choose server that has file in cache
- How to choose?
Uses aXYZ name and hash
TTL is small -> why?

43
Simple Hashing

Given document XYZ, we need to choose a server to
use
Suppose we use modulo
Number servers from 1..n
Place document XYZ on server (XYZ mod n)
What happens when a server fails? n -> n-1
Same if different people have different measures
of n
Why might this be bad?
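
One way to see why this is bad is to count how many documents change servers when n shrinks by one. The sketch below is illustrative: integer document ids stand in for hash values, servers are indexed from 0 rather than 1, and the function name `moved_fraction` is hypothetical.

```python
def moved_fraction(num_docs: int, n_before: int, n_after: int) -> float:
    """Fraction of documents whose (id mod n) placement changes."""
    moved = sum(1 for doc in range(num_docs) if doc % n_before != doc % n_after)
    return moved / num_docs

# Losing one server out of ten remaps the large majority of documents,
# so almost every cached copy ends up on the "wrong" server.
print(moved_fraction(10_000, 10, 9))
```

Ideally only about 1/n of the documents (those on the failed server) would need to move.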

44
Consistent Hash

View: subset of all hash buckets that are
visible
Desired features
Balanced: in any one view, load is equal across
buckets
Smoothness: little impact on hash bucket
contents when buckets are added/removed
Spread: small set of hash buckets that may hold
an object regardless of views
Load: across all views, # of objects assigned to
hash bucket is small

45
Consistent Hash Example

Construction
Assign each of C hash buckets to random points on
mod $2^n$ circle, where hash key size = n.
Map object to random position on circle
Hash of object = closest clockwise bucket

[Figure: circle with positions 0, 4, 8, 12, and 14 marked; buckets sit at
points on the circle.]

Smoothness -> addition of bucket does not cause
movement between existing buckets
Spread & Load -> small set of buckets that lie
near object
Balance -> no bucket is responsible for large
number of objects
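
The construction above can be sketched with a sorted list of bucket positions. This is a minimal illustration, not Akamai's implementation: the `Ring` class, the use of SHA-1 to pick "random" points, the 32-bit circle, and the bucket names are all assumptions made here.

```python
import bisect
import hashlib

def point(name: str, bits: int = 32) -> int:
    """Map a name to a pseudo-random position on the mod 2^bits circle."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

class Ring:
    def __init__(self, buckets):
        self.points = sorted((point(b), b) for b in buckets)

    def add(self, bucket):
        bisect.insort(self.points, (point(bucket), bucket))

    def lookup(self, obj):
        # Closest bucket clockwise from the object's position, wrapping at the top.
        i = bisect.bisect_left(self.points, (point(obj), ""))
        return self.points[i % len(self.points)][1]

ring = Ring([f"cache-{i}" for i in range(8)])
objects = [f"obj-{i}" for i in range(2000)]
before = {o: ring.lookup(o) for o in objects}
ring.add("cache-new")
moved = [o for o in objects if ring.lookup(o) != before[o]]
# Smoothness: every object that moved went to the new bucket, none moved
# between existing buckets.
print(len(moved), all(ring.lookup(o) == "cache-new" for o in moved))
```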

46
How Akamai Works
[Figure: numbered request flow (steps 1-12) among the end-user, cnn.com
(content provider), the DNS root server, the Akamai high-level DNS
server, the Akamai low-level DNS server, and a nearby matching Akamai
server. Labeled requests: Get index.html, Get /cnn.com/foo.jpg,
Get foo.jpg.]
47
Akamai Subsequent Requests
[Figure: the same participants as in the previous figure, with only steps
1, 2, and 7-10 remaining. Labeled requests: Get index.html,
Get /cnn.com/foo.jpg.]
48
Impact on DNS Usage

DNS is used for server selection more and more
What are reasonable DNS TTLs for this type of use
Typically want to adapt to load changes
Low TTL for A-records -> what about NS records?
How does this affect caching?
What do the first and subsequent lookup do?

49
HTTP (Summary)

Simple text-based file exchange protocol
Support for status/error responses,
authentication, client-side state maintenance,
cache maintenance
Workloads
Typical documents structure, popularity
Server workload
Interactions with TCP
Connection setup, reliability, state maintenance
Persistent connections
How to improve performance
Persistent connections
Caching
Replication

50
Grid

What is Grid?
Grid Projects & Applications
Grid Technologies
Globus
CompGrid

51
(No Transcript)
52
(No Transcript)
53
Definition

A type of parallel and distributed system that
enables the sharing, selection, and aggregation of
geographically distributed resources:
Computers: PCs, workstations, clusters,
supercomputers, laptops, notebooks, mobile
devices, PDAs, etc.
Software: e.g., ASPs renting expensive special
purpose applications on demand
Catalogued data and databases: e.g., transparent
access to human genome database
Special devices/instruments: e.g., radio
telescope; SETI@Home searching for life in
galaxy
People/collaborators
depending on their availability, capability,
cost, and user QoS requirements,
for solving large-scale problems/applications,
thus enabling the creation of virtual
organizations (VOs)

54
Resources assets, capabilities, and knowledge

Capabilities (e.g. application codes, analysis
tools)
Compute Grids (PC cycles, commodity clusters,
HPC)
Data Grids
Experimental Instruments
Knowledge Services
Virtual Organisations
Utility Services

55
Why go Grid?

Hot subject
Try it, experience it to learn the potential
Will enable true ubiquitous computing in future
Today, proven in some areas: intraGrids
But still long way to World Wide Grid
State of art techniques, tools are difficult
Short term goals? Use another technology
Does your system have Grid characteristics?
Distributed users, large scale and heterogeneous
resources, across domains

56
Grid's main idea

To treat CPU cycles and software like
commodities.
Enable the coordinated use of geographically
distributed resources in the absence of central
control and existing trust relationships.
Computing power is produced much like utilities
such as power and water are produced for
consumers.
Users will have access to power on demand
"When the network is as fast as the computer's
internal links, the machine disintegrates across
the Net into a set of special purpose appliances"
(Gilder Technology Report, June 2000)

57
Computational Grids and Electric Power Grids
58
(No Transcript)
59
(No Transcript)
60
(No Transcript)
61
(No Transcript)
62
(No Transcript)
63
What do users want ?

Grid Consumers
Execute jobs for solving varying problem size and
complexity
Benefit by selecting and aggregating resources
wisely
Tradeoff: timeframe and cost
Grid Providers
Contribute (idle) resource for executing
consumer jobs
Benefit by maximizing resource utilisation
Tradeoff: local requirements & market opportunity

64
(No Transcript)
65
(No Transcript)
66
Grid Applications

Distributed HPC (Supercomputing)
Computational science.
High-Capacity/Throughput Computing
Large scale simulation/chip design parameter
studies.
Content Sharing (free or paid)
Sharing digital contents among peers (e.g.,
Napster)
Remote software access/renting services
Application service providers (ASPs) & Web
services.
Data-intensive computing
Drug Design, Particle Physics, Stock
Prediction...
On-demand, real-time computing
Medical instrumentation & mission critical.
Collaborative Computing
Collaborative design, Data exploration,
education.
Service Oriented Computing (SOC)
Towards economic-based Utility Computing New
paradigm, new applications, new industries, and
new business.

67
Grid Projects

Australia
Nimrod-G
Gridbus
GridSim
Virtual Lab
DISCWorld
GrangeNet
..new coming up
Europe
UNICORE
Cactus
UK eScience
EU Data Grid
EuroGrid
MetaMPI
XtremeWeb
and many more.
India
I-Grid

USA
Globus
Legion
OGSA
Sun Grid Engine
AppLeS
NASA IPG
Condor-G
Jxta
NetSolve
AccessGrid
and many more...
Cycle Stealing & .com Initiatives
Distributed.net
SETI@Home, ...
Entropia, UD, Parabon, ...
Public Forums
Global Grid Forum
Australian Grid Forum

68
Grid Requirements

Identity authentication
Authorization policy
Resource discovery
Resource characterization
Resource allocation
(Co-)reservation, workflow
Distributed algorithms
Remote data access
High-speed data transfer

Performance guarantees
Monitoring & adaptation
Intrusion detection
Resource management
Accounting & payment
Fault management
System evolution
Etc.

69
Resource Management Problem

Enabling secure, controlled remote access to
computational resources and management of remote
computation
Authentication and authorization
Resource discovery & characterization
Reservation and allocation
Computation monitoring and control

70
Grid-based Computation Challenges

Locate suitable computers
Authenticate with appropriate sites
Allocate resources on those computers
Initiate computation on those computers
Configure those computations
Select appropriate communication methods
Compute with suitable algorithms
Access data files, return output
Respond appropriately to resource changes

71
Leading Grid Middleware Developments

Globus Toolkit (mainly developed at ANL and USC)
Service-oriented toolkit from the Globus
project, to be used in Grid applications, not
targeted at end users
Services for resource selection and allocation,
authentication, file system access and file
transfer,
Largest user-base in projects worldwide
Open-source software, commercial support by IBM
and Platform Computing

72
The Globus Alliance

Globus Project, since 1996
Ian Foster (Argonne National Lab),
Carl Kesselman (University of Southern
California's Information Sciences Institute)
Develop protocols, middleware and tools for Grid
computing
Globus Alliance, since Sept 2003
International scope
University of Edinburgh's EPCC
Swedish Center for Parallel Computers (PDC)
Advisory council of Academic Affiliates from
Asia-Pacific, Europe, US

73
Globus Toolkit

GT2 (2.4 released in 2002) reference
implementation of Grid fabric protocols
GRAM for job submissions
MDS for resource discovery
GridFTP for data transfer
GSI security
GT3 (3.0 released July 2003) redesign
OGSI based
Grid services, built on SOAP and XML
GT3.2 released March 31, 2004

74
Globus Toolkit Services

Job submission and management (GRAM)
Uniform Job Submission
Security (GSI)
PKI-based Security (Authentication) Service
Information services (MDS)
LDAP-based Information Service
Remote file management (GASS) and transfer
(GridFTP)
Remote Storage Access Service
Remote Data Catalogue and Management Tools
Support by Globus 2.0 released in 2002
Resource selection and allocation (GIIS, GRIS)

75
Resource Specification Language

Common notation for exchange of information
between components
Syntax similar to MDS/LDAP filters
RSL provides two types of information
Resource requirements Machine type, number of
nodes, memory, etc.
Job configuration Directory, executable, args,
environment
API provided for manipulating RSL

76
Some Useful Definitions

Network Protocol
A formal description of message formats and a set
of rules for exchange of messages
Rules define sequences of message exchange, and
potentially resulting behavior
Protocol may define state-change in endpoint
Network Enabled Services
Defines a set of capabilities
Protocol defines interaction with service
All services require protocols, although not all
protocols are to services

77
More definitions

Resource
Entity that is to be shared
Provides some capabilities, that can be accessed
via interface (API) or protocol
Application Programmer Interface (API)
Software Development Kit (SDK)
Package that enables application development,
consisting of one or more APIs, and programming
tools

78
Protocols Make the Grid

Protocols and APIs
Protocols enable interoperability
APIs enable portability
Sharing is about interoperability, so
Grid architecture should be about protocols

79
Grid Services Architecture Previous Perspective
a rich variety of applications ...
Applns
Appln Toolkits
Remote data toolkit
Remote sensors toolkit
Async. collab. toolkit
Remote viz toolkit
Remote comp. toolkit
...
Protocols, authentication, policy, resource
management, instrumentation, discovery, etc.,
etc.
Grid Services
Grid Fabric
Grid-enabled archives, networks, computers,
display devices, etc. associated local services

80
Characteristics of Grid Services Architecture

Identifies separation of concerns
Isolates Grids from languages and specific
programming environments
Makes provisions for generic and application
specific functionality
Protocols not explicit in architecture
fails to make clear distinction between language,
service and networking issues

81
Layered Grid Protocol Architecture
Application
User
Grid
Resource
Connectivity
Fabric
82
Important Points

Being Grid-enabled requires speaking appropriate
protocols
Protocol only requirement, not reachability
Protocols can be used to bridge local resources
or local Grids
Intergrid as analog to Internet
Built on Internet protocols
Independent of language and implementation
Focus on interaction over network
Services exist at each level

83
Protocols, services and interfaces
Applications
Languages/Frameworks
Connectivity APIs
Connectivity Protocols
Local Access APIs and protocols
Fabric Layer
84
How does Globus fit in?

Defines connectivity and resource protocols
Enables definition of grid and user protocols
Globus provides some of these, others defined by
other groups
Defines range of APIs and SDKs that leverage
Resource, Grid and User protocols

85
Fabric

Local access to logical resource
May be real component, e.g. CPU, software module,
filesystem
May be logical component, e.g. Condor pool
Protocol or API mediated
Fabric elements include
SSP, ASP, peer-to-peer, Entropia-like, and
enterprise level solutions

86
Connectivity Protocols

Two classes of connectivity protocols underlie
all other components
Internet communication
Application, transport and internet layer
protocols
I.e., transport, routing, DNS, etc.
Security
Authentication and delegation
Discussed below

87
Security

Protocols
TLS with delegation
Services
K5ssl, Globus Authorization Service
APIs
GSS-API, GAA, SASL, gss_assist
SDKs
GlobusIO

88
Resource Protocols

Resource management,
Storage system access
Network quality of service
Data movement
Resource information

89
Resource Management

Protocols
GRAM + GARA (on HTTP)
Resource services
Gatekeeper, JobManager, SlotManager
APIs and SDKs
GRAM API, JavaCog Client, DUROC

90
Data Transport

Protocols
Grid FTP, LDAP for replica catalog
Services
FTP, LDAP replica catalog
APIs and SDKs
GridFTP client library, copy URL API, replica
catalog access, replica selection

91
Resource Information

Protocol
LDAP V3, Registration/Discovery protocol
Service
GRIS
APIs & SDKs
C API, JNDI, PerlLDAP, ...

92
Grid Protocols

Grid Information Index Services
LDAP and Service registration protocol,
GIIS service
LDAP APIs and specialized information API
Co-allocation and brokering
GRAM (HTTP + RSL)
DUROC service
DUROC client API, end-to-end reservation API

93
Grid Protocols (cont)

Online authentication, authorization services
HTTP
MyProxy, Group policy servers
Myproxy API, GAA API,
Many others (e.g.)
Resource discovery (Matchmaker)
Fault recovery

94
User Protocols

In general, there are many of these; they tend to
be one-off and not well defined
Examples
Portal toolkits (e.g. Hotpage)
Netsolve
Cactus framework

95
Why Study Peer to peer systems?

To understand how they work
To build your own peer to peer system
To understand the techniques and principles
within them
To modify, adapt, reuse these techniques and
principles in other related areas
Cloud computing
Sensor networks
To grow the body of knowledge about distributed
systems

96
Searching & Fetching

H: "I want to watch that great 80s cult
classic 'Better Off Dead'"
Search: "better off dead" -> better_off_dead.mov
or -> 0x539fba83ajdeadbeef
Locate sources of better_off_dead.mov
Download the file from them

97
Searching
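[Figure: nodes N1 through N6 connected through the Internet. A publisher
holds an item (Key = title, Value = MP3 data); a client issues
Lookup(title) and must find out which node has it.]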
98
Search Approaches

Centralized
Flooding
A hybrid: flooding between supernodes
Structured

99
Different types of searches

Needles vs. Haystacks
Searching for top 40, or an obscure punk track
from 1981 that nobody's heard of?
Search expressiveness
Whole word? Regular expressions? File names?
Attributes? Whole-text search?
(e.g., p2p gnutella or p2p google?)

100
Framework

Common Primitives
Join: how do I begin participating?
Publish: how do I advertise my file?
Search: how do I find a file?
Fetch: how do I retrieve a file?

101
Centralized

Centralized Database
Join: on startup, client contacts central server
Publish: reports list of files to central server
Search: query the server -> return node(s) that
store the requested file
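
The centralized scheme boils down to one dictionary on the server. The sketch below is an illustration with hypothetical names (`CentralIndex`, `publish`, `search`); the actual fetch then happens directly between peers and is not shown.

```python
from collections import defaultdict

class CentralIndex:
    def __init__(self):
        # file name -> addresses of the peers that reported storing it
        self.holders = defaultdict(set)

    def publish(self, peer, files):
        for name in files:
            self.holders[name].add(peer)

    def search(self, name):
        # One lookup in one index, regardless of the number of peers.
        return set(self.holders.get(name, ()))

index = CentralIndex()
index.publish("123.2.21.23", ["X", "Y", "Z"])
print(index.search("X"))   # {'123.2.21.23'}
print(index.search("A"))   # set()
```

The server's state grows with the number of published files, and every query depends on that one server being reachable.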

102
Napster Example Publish
I have X, Y, and Z!
123.2.21.23
103
Napster Search
123.2.0.18
Where is file A?
104
Napster Discussion

Pros
Simple
Search scope is O(1) for even complex searches
(one index, etc.)
Controllable (pro or con?)
Cons
Server maintains O(N) State
Server does all processing
Single point of failure
Technical failures + legal (Napster shut down
2001)

105
Query Flooding

Join: must join a flooding network
Usually, establish peering with a few existing
nodes
Publish: no need, just reply
Search: ask neighbors, who ask their neighbors,
and so on... when/if found, reply to sender.
TTL limits propagation
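
A TTL-limited flood is essentially a bounded breadth-first search over the overlay. The sketch below simulates it in one process; the function name `flood_search`, the adjacency-dictionary representation, and the duplicate suppression via a `seen` set are assumptions made for illustration (real nodes suppress duplicates by remembering query ids).

```python
from collections import deque

def flood_search(neighbors, files, start, wanted, ttl):
    """Return the nodes holding `wanted` that a flood from `start` reaches."""
    seen = {start}
    queue = deque([(start, ttl)])
    hits = set()
    while queue:
        node, remaining = queue.popleft()
        if wanted in files.get(node, ()):
            hits.add(node)              # this node would reply to the sender
        if remaining == 0:
            continue                    # TTL exhausted: do not forward further
        for peer in neighbors.get(node, ()):
            if peer not in seen:
                seen.add(peer)
                queue.append((peer, remaining - 1))
    return hits

overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
stored = {"D": {"fileA"}}
print(flood_search(overlay, stored, "A", "fileA", ttl=2))  # set(): D is 3 hops away
print(flood_search(overlay, stored, "A", "fileA", ttl=3))  # {'D'}
```

The first query shows the cost of limiting scope: a file that exists in the network is not found, and the query has to be re-issued with a larger TTL.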

106
Example Gnutella
Where is file A?
107
Flooding Discussion

Pros
Fully de-centralized
Search cost distributed
Processing @ each node permits powerful search
semantics
Cons
Search scope is O(N)
Search time is O(???)
Nodes leave often, network unstable
TTL-limited search works well for haystacks.
For scalability, does NOT search every node. May
have to re-issue query later

108
Supernode Flooding

Join: on startup, client contacts a supernode
... may at some point become one itself
Publish: send list of files to supernode
Search: send query to supernode, supernodes flood
query amongst themselves.
Supernode network just like prior flooding net

109
Supernode Network Design
110
Supernode File Insert
I have X!
123.2.21.23
111
Supernode File Search
Where is file A?
112
Supernode Which nodes?

Often, bias towards nodes with good
Bandwidth
Computational Resources
Availability!

113
Stability and Superpeers

Why superpeers?
Query consolidation
Many connected nodes may have only a few files
Propagating a query to a sub-node would take more
b/w than answering it yourself
Caching effect
Requires network stability
Superpeer selection is time-based
How long you've been on is a good predictor of
how long you'll be around.

114
Superpeer results

Basically, just better than flood to all
Gets an order of magnitude or two better scaling
But still fundamentally O(search) * O(per-node
storage) = O(N)
Central: O(1) search, O(N) storage
Flood: O(N) search, O(1) storage
Superpeer: can trade between the two

115
Structured Search: Distributed Hash Tables

Academic answer to p2p
Goals
Guaranteed lookup success
Provable bounds on search time
Provable scalability
Makes some things harder
Fuzzy queries / full-text search / etc.
Read-write, not read-only
Hot Topic in networking since introduction in
2000/2001

116
Searching Wrap-Up
Type        O(search)   Storage     Fuzzy?
Central     O(1)        O(N)        Yes
Flood       O(N)        O(1)        Yes
Super       < O(N)      > O(1)      Yes
Structured  O(log N)    O(log N)    Not really
117
DHT Overview

Abstraction: a distributed "hash-table" (DHT)
data structure
put(id, item)
item = get(id)
Implementation: nodes in system form a
distributed data structure
Can be Ring, Tree, Hypercube, Skip List,
Butterfly Network, ...

118
DHT Overview (2)

Structured Overlay Routing
Join: On startup, contact a bootstrap node and
integrate yourself into the distributed data
structure; get a node id
Publish: Route publication for file id toward a
close node id along the data structure
Search: Route a query for file id toward a close
node id. Data structure guarantees that query
will meet the publication.
Important difference: get(key) is for an exact
match on key!
search("spars") will not find file("briney
spars")
We can exploit this to be more efficient

119
DHT Example - Chord

Associate to each node and file a unique id in a
uni-dimensional space (a ring)
E.g., pick from the range $0 \ldots 2^m$
Usually the hash of the file or IP address
Properties
Routing table size is O(log N) , where N is the
total number of nodes
Guarantees that a file is found in O(log N) hops

from MIT in 2001
120
DHT Consistent Hashing
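[Figure: circular ID space with nodes N32, N90, and N105 and keys K5, K20,
and K80 placed on the ring ("Key 5" is written K5, "Node 105" is
written N105).]
A key is stored at its successor: node with next
higher ID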
121
DHT Chord Basic Lookup
[Figure: ring with nodes N10, N32, N60, N90, N105, and N120 and key K80.
One node asks "Where is key 80?"; the answer is "N90 has K80".]
122
DHT Chord Finger Table
[Figure: the fingers of node N80 span 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and
1/128 of the ring.]

Entry i in the finger table of node n is the
first node that succeeds or equals $n + 2^i$
In other words, the ith finger points $1/2^{n-i}$ way
around the ring
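
The finger-table rule can be written down directly. The sketch below is an illustration with hypothetical helper names; it assumes a small 3-bit identifier space (m = 3) with nodes 0, 1, 2, and 6, and it computes the table from global knowledge of all nodes, which a real Chord node does not have.

```python
def successor(nodes, ident, m):
    """First node whose id is >= ident, going clockwise on the mod 2^m ring."""
    ident %= 1 << m
    ordered = sorted(nodes)
    for node in ordered:
        if node >= ident:
            return node
    return ordered[0]  # wrapped past the highest id

def finger_table(nodes, n, m):
    """Rows of (i, n + 2^i mod 2^m, first node that succeeds or equals it)."""
    return [(i, (n + 2**i) % 2**m, successor(nodes, n + 2**i, m)) for i in range(m)]

nodes = {0, 1, 2, 6}
print(finger_table(nodes, 1, 3))  # [(0, 2, 2), (1, 3, 6), (2, 5, 6)]
print(finger_table(nodes, 6, 3))  # [(0, 7, 0), (1, 0, 0), (2, 2, 2)]
```

Each node keeps only m entries, which is where the O(log N) routing-table size comes from.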

123
Node Join

Compute ID
Use an existing node to route to that ID in the
ring.
Finds s = successor(id)
Ask s for its predecessor, p
Splice self into ring just like a linked list
p->successor = me
me->successor = s
me->predecessor = p
s->predecessor = me
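
The splice is the same pointer update as inserting into a doubly linked circular list. The sketch below is a single-process illustration with a hypothetical `Node` class; it ignores finger-table updates, key transfer, and concurrent joins, all of which a real implementation has to handle.

```python
class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self      # a lone node is its own successor
        self.predecessor = self    # ... and its own predecessor

def join(me, s):
    """Splice `me` into the ring just before its successor `s`."""
    p = s.predecessor
    p.successor = me
    me.successor = s
    me.predecessor = p
    s.predecessor = me

n1, n6, n2 = Node(1), Node(6), Node(2)
join(n6, n1)   # ring: 1 -> 6 -> 1
join(n2, n6)   # successor(2) is node 6, so node 2 goes between 1 and 6
print(n1.successor.id, n2.successor.id, n6.successor.id)  # 2 6 1
```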

124
DHT Chord Join

Assume an identifier space [0..8)
Node n1 joins

Succ. Table (i, $id + 2^i$, succ)
Node 1: (0, 2, 1), (1, 3, 1), (2, 5, 1)

[Figure: ring with identifiers 0 through 7; node 1 is the only node.]
125
DHT Chord Join

Node n2 joins

Succ. Tables (i, $id + 2^i$, succ)
Node 1: (0, 2, 2), (1, 3, 1), (2, 5, 1)
Node 2: (0, 3, 1), (1, 4, 1), (2, 6, 1)

[Figure: ring with identifiers 0 through 7; nodes 1 and 2 are present.]
126
DHT Chord Join
Nodes n0, n6 join

Succ. Tables (i, $id + 2^i$, succ)
Node 0: (0, 1, 1), (1, 2, 2), (2, 4, 6)
Node 1: (0, 2, 2), (1, 3, 6), (2, 5, 6)
Node 2: (0, 3, 6), (1, 4, 6), (2, 6, 6)
Node 6: (0, 7, 0), (1, 0, 0), (2, 2, 2)

[Figure: ring with identifiers 0 through 7; nodes 0, 1, 2, and 6 are
present.]
127
DHT Chord Join
Nodes: n1, n2, n0, n6
Items: f7, f2

Succ. Tables (i, $id + 2^i$, succ) and Items
Node 0: (0, 1, 1), (1, 2, 2), (2, 4, 6); Items: 7
Node 1: (0, 2, 2), (1, 3, 6), (2, 5, 6); Items: 1
Node 2: (0, 3, 6), (1, 4, 6), (2, 6, 6)
Node 6: (0, 7, 0), (1, 0, 0), (2, 2, 2)

[Figure: ring with identifiers 0 through 7; nodes 0, 1, 2, and 6 are
present.]
128
DHT Chord Routing
Upon receiving a query for item id, a node
Checks whether it stores the item locally
If not, forwards the query to the largest node in
its successor table that does not exceed id
(measured clockwise around the ring)

Succ. Table (n0)   Items: 7
i  id+2^i  succ
0  1       1
1  2       2
2  4       6

Succ. Table (n1)   Items: 1
i  id+2^i  succ
0  2       2
1  3       6
2  5       6

Succ. Table (n6)
i  id+2^i  succ
0  7       0
1  0       0
2  2       2

Succ. Table (n2)
i  id+2^i  succ
0  3       6
1  4       6
2  6       6

[Figure: ring with positions 0 to 7; query(7) shown next to n1]
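
A sketch of this forwarding rule on the four-node example (ids 0, 1, 2, 6 in a 3-bit space). It assumes query(7) starts at n1. "Does not exceed id" is read as clockwise distance from the current node. When no table entry qualifies, the query goes to the immediate successor, as in the standard Chord lookup; that fallback is an assumption, since the slide does not spell it out. All function names are hypothetical, and the global successor() call only simulates the local "do I store this item?" check.

```python
from bisect import bisect_left

M = 3                  # id bits, so ids run from 0 to 7
NODES = [0, 1, 2, 6]   # sorted node ids on the ring


def successor(ident):
    """First node whose id is >= ident, wrapping around the ring."""
    idx = bisect_left(NODES, ident % 2 ** M)
    return NODES[idx % len(NODES)]


def fingers(n):
    """Successor table of node n: successor(n + 2**i) for i = 0 .. M-1."""
    return [successor(n + 2 ** i) for i in range(M)]


def dist(a, b):
    """Clockwise distance from a to b."""
    return (b - a) % 2 ** M


def route(start, item_id):
    """Return the list of nodes a query visits, ending at the item's owner."""
    current, path = start, [start]
    while successor(item_id) != current:  # item is not stored locally
        table = fingers(current)
        # Entries that do not overshoot the item, measured clockwise.
        ok = [f for f in table if 0 < dist(current, f) <= dist(current, item_id)]
        if ok:
            current = max(ok, key=lambda f: dist(current, f))
        else:
            current = table[0]  # immediate successor owns the item
        path.append(current)
    return path


print(route(1, 7))  # [1, 6, 0]
```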
129
DHT Chord Summary

Routing table size?
log N fingers
Routing time?
Each hop is expected to halve the distance to the
desired id => expect O(log N) hops.

130
DHT Discussion

Pros
Guaranteed Lookup
O(log N) per node state and search scope
Cons
This line used to say "not used". But DHTs are now being
used in a few apps, including BitTorrent.
Supporting non-exact match search is (quite!) hard

131
The limits of search: A Peer-to-peer Google?

Complex intersection queries ("the who")
Billions of hits for each term alone
Sophisticated ranking
Must compare many results before returning a
subset to user
Very, very hard for a DHT / p2p system
Need high inter-node bandwidth
(This is exactly what Google does - massive
clusters)
But maybe many file sharing queries are okay...

132
Fetching Data

Once we know which node(s) have the data we
want...
Option 1: Fetch from a single peer
Problem: Have to fetch from a peer who has the whole
file.
Peers are not useful sources until they have downloaded the whole file
At which point they probably log off. :)
How can we fix this?

133
Chunk Fetching

More than one node may have the file.
How to tell?
Must be able to distinguish identical files
Not necessarily same filename
Same filename not necessarily same file...
Use hash of file
Common: MD5, SHA-1, etc.
How to fetch?
Get bytes 0..8000 from A, 8001..16000 from B
Alternative: Erasure Codes
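
A sketch of the two ideas on this slide: identify a file by a hash of its contents, then fetch different byte ranges from different peers. Peers are simulated as in-memory byte strings, the round-robin assignment is an arbitrary choice, and the ranges are half-open [start, end) rather than the inclusive ranges written on the slide. All names are hypothetical.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Identify a file by the SHA-1 hash of its contents, not by its name."""
    return hashlib.sha1(data).hexdigest()


def chunk_ranges(size: int, chunk_size: int):
    """Half-open byte ranges [start, end) that cover the whole file."""
    return [(s, min(s + chunk_size, size)) for s in range(0, size, chunk_size)]


def fetch(peers: dict, size: int, chunk_size: int) -> bytes:
    """Fetch each range from a different peer in round-robin order."""
    names = sorted(peers)
    parts = []
    for i, (start, end) in enumerate(chunk_ranges(size, chunk_size)):
        source = peers[names[i % len(names)]]
        parts.append(source[start:end])
    return b"".join(parts)


original = bytes(range(256)) * 100          # a 25,600-byte stand-in file
peers = {"A": original, "B": original}      # both peers hold identical copies
copy = fetch(peers, len(original), 8000)
print(content_id(copy) == content_id(original))  # True
```

Comparing the hash of the reassembled copy with the advertised hash is also how a downloader can detect a peer that served a different file under the same name.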

134
BitTorrent Overview

Swarming
Join: contact centralized tracker server, get a
list of peers.
Publish: Run a tracker server.
Search: Out-of-band. E.g., use Google to find a
tracker for the file you want.
Fetch: Download chunks of the file from your
peers. Upload chunks you have to them.
Big differences from Napster
Chunk-based downloading (sound familiar? :)
Focus on a few large files
Anti-freeloading mechanisms

135
BitTorrent

Periodically get list of peers from tracker
More often:
Ask each peer for what chunks it has
(Or have them update you)
Request chunks from several peers at a time
Peers will start downloading from you
BT has some machinery to try to bias towards
helping those who help you
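
A sketch of the "request chunks from several peers at a time" step, assuming each peer has already reported which chunks it holds. The least-loaded-peer rule used here is a simplification chosen for illustration. It is not the actual BitTorrent piece-selection or incentive machinery, and all names are hypothetical.

```python
def plan_requests(have, peer_chunks):
    """Assign each missing chunk to the least-loaded peer that holds it.

    have: set of chunk indices we already hold
    peer_chunks: dict mapping peer name -> set of chunk indices it reported
    """
    load = {peer: 0 for peer in peer_chunks}
    wanted = set().union(*peer_chunks.values()) - have
    plan = {}
    for chunk in sorted(wanted):
        holders = [p for p, chunks in peer_chunks.items() if chunk in chunks]
        peer = min(holders, key=lambda p: (load[p], p))  # ties broken by name
        plan[chunk] = peer
        load[peer] += 1
    return plan


plan = plan_requests(
    have={0},
    peer_chunks={"A": {0, 1, 2}, "B": {1, 2, 3}, "C": {3}},
)
print(plan)  # {1: 'A', 2: 'B', 3: 'C'}
```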

136
BitTorrent Publish/Join
Tracker
137
BitTorrent Fetch
138
BitTorrent Summary

Pros
Works reasonably well in practice
Gives peers an incentive to share resources; avoids
freeloaders
Cons
Central tracker server needed to bootstrap swarm
(Tracker is a design choice, not a requirement,
as you know from your projects. Modern
BitTorrent can also use a DHT to locate peers.
But approach still needs a search mechanism)

139
Writable, persistent p2p

Do you trust your data to 100,000 monkeys?
Node availability hurts
Ex: Store 5 copies of data on different nodes
When someone goes away, you must replicate the
data they held
Hard drives are huge, but cable modem upload
bandwidth is tiny - perhaps 10 Gbytes/day
Takes many days (about 20 at that rate) to upload the
contents of a 200 GB hard drive. Very expensive
leave/replication situation!
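
The arithmetic behind "many days" can be checked in a few lines, using the slide's figures and assuming 1 GB = 10^9 bytes. The variable names are illustrative.

```python
disk_gb = 200             # data held by the departing node (from the slide)
upload_gb_per_day = 10    # cable modem upload budget (from the slide)

days_to_reupload = disk_gb / upload_gb_per_day

# The same upload budget expressed as a sustained link rate.
seconds_per_day = 24 * 60 * 60
upload_mbit_per_s = upload_gb_per_day * 1e9 * 8 / seconds_per_day / 1e6

print(days_to_reupload)             # 20.0
print(round(upload_mbit_per_s, 2))  # 0.93
```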

140
What's out there?
             Central     Flood     Super-node flood           Route
Whole File   Napster     Gnutella  -                          Freenet
Chunk Based  BitTorrent  -         KaZaA (bytes, not chunks)  DHTs, eDonkey2000
141
P2P Summary

Many different styles; remember pros and cons of
each
Centralized, flooding, swarming, unstructured and
structured routing
Lessons learned
Single points of failure are bad
Flooding messages to everyone is bad
Underlying network topology is important
Not all nodes are equal
Need incentives to discourage freeloading
Privacy and security are important
Structure can provide theoretical bounds and
guarantees

142
Some Questions

Why do people get together?
to share information
to share and exchange resources they have
books, class notes, experiences, videos, music CDs
How can computers help people?
find information
find resources
exchange and share resources

143
Cloud Computing Infrastructure: Take a seat,
prepare to fly

Anh M. Nguyen
CS525, UIUC, Spring 2009

144
What is cloud computing?

"I don't understand what we would do differently
in the light of Cloud Computing other than
change the wordings of some of our ads"
- Larry Ellison, Oracle's CEO
"I have not heard two people say the same thing
about it [cloud]. There are multiple definitions
out there of the cloud"
- Andy Isherwood, HP's Vice President of European
Software Sales
"It's stupidity. It's worse than stupidity: it's a
marketing hype campaign."
- Richard Stallman, Free Software Foundation founder