Title: Advanced Operating Systems
1Advanced Operating Systems
Lecture 9 Distributed Systems Architecture
- University of Tehran
- Dept. of EE and Computer Engineering
- By
- Dr. Nasser Yazdani
2Covered topic
- Distributed Systems Architectures
- References
- Chapter 2 of the text book
- Anatomy of Grid
3Outline
- Distributed Systems Architecture
- Client-server
- Grid computing
- Peer to peer Computing
- Cloud Computing
4Architectural Models
- Concerned with
- The placement of the components across a network of computers
- The interrelationships between the components
- Common Architectures
- Client server, Web
- Grid
- Peer to peer
- Cloud
5Clients and Servers
- General interaction between a client and a server.
1.25
6Processing Level
- The general organization of an Internet search
engine into three different layers
1-28
7Multitiered Architectures (1)
- Alternative client-server organizations (a)-(e).
1-29
8Multitiered Architectures (2)
- An example of a server acting as a client.
1-30
9Client-Server
- Building, for example, a Hotmail-like mail service: what are the options?
- One server?
- Several servers?
10Multiple Servers
11HTTP Basics (Review)
- HTTP layered over bidirectional byte stream
- Almost always TCP
- Interaction
- Client sends request to server, followed by response from server to client
- Requests/responses are encoded in text
- Stateless
- Server maintains no information about past client
requests
12How to Mark End of Message? (Review)
- Size of message -> Content-Length
- Must know size of transfer in advance
- Delimiter -> MIME-style Content-Type
- Server must escape delimiter in content
- Close connection
- Only server can do this
13HTTP Request (review)
- Request line
- Method
- GET: return the resource named by the URI
- HEAD: return only the headers of the GET response
- POST: send data to the server (forms, etc.)
- URL (relative)
- E.g., /index.html
- HTTP version
14HTTP Request (cont.) (review)
- Request headers
- Authorization: authentication info
- Accept / Accept-Encoding: acceptable document types/encodings
- From: user email
- If-Modified-Since
- Referer: what caused this page to be requested
- User-Agent: client software
- Blank-line
- Body
15HTTP Request (review)
16HTTP Request Example (review)
- GET / HTTP/1.1
- Accept: */*
- Accept-Language: en-us
- Accept-Encoding: gzip, deflate
- User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
- Host: www.intel-iris.net
- Connection: Keep-Alive
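The request layout above (request line, headers, blank line, body) can be sketched as a small parser. This is a minimal illustration, not a complete HTTP implementation: the function name `parse_request` is hypothetical, and the sketch assumes CRLF line endings and one header per line.

```python
def parse_request(raw: str):
    """Split a raw HTTP/1.x request into (method, target, version, headers, body)."""
    # The first blank line (CRLF CRLF) separates the header block from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Request line: method, relative URL, HTTP version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


raw = (
    "GET / HTTP/1.1\r\n"
    "Accept: */*\r\n"
    "Host: www.intel-iris.net\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
)
method, target, version, headers, body = parse_request(raw)
```

A GET request normally has an empty body, so `body` is the empty string here.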
17HTTP Response (review)
- Status-line
- HTTP version
- 3 digit response code
- 1XX informational
- 2XX success
- 200 OK
- 3XX redirection
- 301 Moved Permanently
- 302 Moved Temporarily
- 304 Not Modified
- 4XX client error
- 404 Not Found
- 5XX server error
- 505 HTTP Version Not Supported
- Reason phrase
18HTTP Response (cont.) (review)
- Headers
- Location: for redirection
- Server: server software
- WWW-Authenticate: request for authentication
- Allow: list of methods supported (GET, HEAD, etc.)
- Content-Encoding: e.g., x-gzip
- Content-Length
- Content-Type
- Expires
- Last-Modified
- Blank-line
- Body
19HTTP Response Example (review)
- HTTP/1.1 200 OK
- Date: Tue, 27 Mar 2001 03:49:38 GMT
- Server: Apache/1.3.14 (Unix) (Red-Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24
- Last-Modified: Mon, 29 Jan 2001 17:54:18 GMT
- ETag: "7a11f-10ed-3a75ae4a"
- Accept-Ranges: bytes
- Content-Length: 4333
- Keep-Alive: timeout=15, max=100
- Connection: Keep-Alive
- Content-Type: text/html
- ..
20Typical Workload (Web Pages)
- Multiple (typically small) objects per page
- File sizes
- Heavy-tailed
- Pareto distribution for tail
- Lognormal for body of distribution
- -- For reference/interest only --
- Embedded references
- Number of embedded objects
- Pareto: $p(x) = a k^{a} x^{-(a+1)}$
21HTTP 0.9/1.0 (mostly review)
- One request/response per TCP connection
- Simple to implement
- Disadvantages
- Multiple connection setups -> three-way handshake each time
- Several extra round trips added to transfer
- Multiple slow starts
22Single Transfer Example
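(Figure: client-server timeline for a non-persistent transfer of an HTML page and one image)
- 0 RTT: client opens TCP connection (SYN; server answers SYN)
- 1 RTT: client sends ACK and the HTTP request for the HTML (DAT); server reads from disk, then returns ACK, DAT, FIN
- 2 RTT: client ACKs and parses the HTML; the first connection closes (FIN, ACK) and the client opens a second TCP connection (SYN, SYN)
- 3 RTT: client sends ACK and the HTTP request for the image (DAT); server reads from disk and returns ACK
- 4 RTT: DAT; image begins to arrive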
23More Problems
- Short transfers are hard on TCP
- Stuck in slow start
- Loss recovery is poor when windows are small
- Lots of extra connections
- Increases server state/processing
- Server also forced to keep TIME_WAIT connection state
- -- Things to think about --
- Why must server keep these?
- Tends to be an order of magnitude greater than # of active connections, why?
24Persistent Connection Solution (review)
- Multiplex multiple transfers onto one TCP connection
- How to identify requests/responses
- Delimiter -> server must examine response for delimiter string
- Content-Length and delimiter -> must know size of transfer in advance
- Block-based transmission -> send in multiple length-delimited blocks
- Store-and-forward -> wait for entire response and then use Content-Length
- Solution -> use existing methods and close connection otherwise
25Persistent Connection Example (review)
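(Figure: client-server timeline over one persistent connection)
- 0 RTT: client sends HTTP request for HTML (DAT); server reads from disk, then returns ACK, DAT
- 1 RTT: client ACKs, parses the HTML, and sends the HTTP request for the image (DAT); server reads from disk, then returns ACK, DAT
- 2 RTT: image begins to arrive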
26Persistent HTTP (review)
- Nonpersistent HTTP issues
- Requires 2 RTTs per object
- OS must work and allocate host resources for each TCP connection
- But browsers often open parallel TCP connections to fetch referenced objects
- Persistent HTTP
- Server leaves connection open after sending response
- Subsequent HTTP messages between same client/server are sent over connection
- Persistent without pipelining
- Client issues new request only when previous response has been received
- One RTT for each referenced object
- Persistent with pipelining
- Default in HTTP/1.1
- Client sends requests as soon as it encounters a referenced object
- As little as one RTT for all the referenced objects
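The RTT counts quoted above can be tallied with a back-of-the-envelope model. This sketch makes simplifying assumptions that are not in the slides: transmission time and slow start are ignored, connections are opened one after another (no parallel connections), and the function names are hypothetical.

```python
def rtts_nonpersistent(n_objects: int) -> int:
    """One TCP setup RTT plus one request/response RTT for the page and for each object."""
    return 2 * (1 + n_objects)


def rtts_persistent(n_objects: int, pipelined: bool) -> int:
    """A single TCP setup, then the HTML, then the referenced objects."""
    setup = 1
    html = 1
    if n_objects == 0:
        return setup + html
    # Pipelining requests all referenced objects back to back: about one RTT in total.
    objects = 1 if pipelined else n_objects
    return setup + html + objects


page_objects = 10
nonpersistent = rtts_nonpersistent(page_objects)
persistent = rtts_persistent(page_objects, pipelined=False)
pipelined = rtts_persistent(page_objects, pipelined=True)
```

Under these assumptions a page with 10 embedded objects costs 22 RTTs without persistence, 12 with a persistent connection, and 3 with pipelining.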
27HTTP Caching
- Clients often cache documents
- Challenge: update of documents
- If-Modified-Since requests to check
- HTTP 0.9/1.0 used just date
- HTTP 1.1 has an opaque entity tag (could be a file signature, etc.) as well
- When/how often should the original be checked for changes?
- Check every time?
- Check each session? Day? Etc?
- Use Expires header
- If no Expires, often use Last-Modified as estimate
28Example Cache Check Request
- GET / HTTP/1.1
- Accept: */*
- Accept-Language: en-us
- Accept-Encoding: gzip, deflate
- If-Modified-Since: Mon, 29 Jan 2001 17:54:18 GMT
- If-None-Match: "7a11f-10ed-3a75ae4a"
- User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
- Host: www.intel-iris.net
- Connection: Keep-Alive
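How a server might answer such a cache check can be sketched as follows. This is an illustrative simplification, not the full validation algorithm of the HTTP specification: the function name `cache_check` is hypothetical, the entity tag is compared as an exact string, and the entity tag takes precedence over the date when both are present.

```python
from email.utils import parsedate_to_datetime


def cache_check(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # HTTP/1.1: an opaque entity tag identifies the exact version.
        return 304 if if_none_match == etag else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # HTTP/1.0 style: compare modification dates only.
        if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since):
            return 304
    return 200


ETAG = '"7a11f-10ed-3a75ae4a"'
LAST_MODIFIED = "Mon, 29 Jan 2001 17:54:18 GMT"

fresh = cache_check({"If-None-Match": ETAG}, ETAG, LAST_MODIFIED)
stale = cache_check({"If-Modified-Since": "Sun, 28 Jan 2001 17:54:18 GMT"}, ETAG, LAST_MODIFIED)
```

Here `fresh` is 304 (Not Modified) and `stale` is 200, because the document changed after the date the client reported.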
29Ways to cache
- Client-directed caching
- Web Proxies
- Server-directed caching
- Content Delivery Networks (CDNs)
30Web Proxy Caches
- User configures browser: Web accesses via cache
- Browser sends all HTTP requests to cache
- Object in cache: cache returns object
- Else cache requests object from origin server, then returns object to client
(Figure: clients send HTTP requests to the proxy server; the proxy server exchanges HTTP requests/responses with the origin servers and returns HTTP responses to the clients)
31Caching Example (1)
- Assumptions
- Average object size = 100,000 bits
- Avg. request rate from institution's browsers to origin servers = 15/sec
- Delay from institutional router to any origin server and back to router = 2 sec
- Consequences
- Utilization on LAN = 15%
- Utilization on access link = 100%
- Total delay = Internet delay + access delay + LAN delay = 2 sec + minutes + milliseconds
(Figure: origin servers, public Internet, 1.5 Mbps access link, institutional network with 10 Mbps LAN)
32Caching Example (2)
- Possible solution
- Increase bandwidth of access link to, say, 10 Mbps
- Often a costly upgrade
- Consequences
- Utilization on LAN = 15%
- Utilization on access link = 15%
- Total delay = Internet delay + access delay + LAN delay = 2 sec + msecs + msecs
(Figure: origin servers, public Internet, 10 Mbps access link, institutional network with 10 Mbps LAN)
33Caching Example (3)
- Install cache
- Suppose hit rate is 0.4
- Consequence
- 40% of requests will be satisfied almost immediately (say 10 msec)
- 60% of requests satisfied by origin server
- Utilization of access link reduced to 60%, resulting in negligible delays
- Weighted average of delays = 0.6 × 2 sec + 0.4 × 10 msec < 1.3 sec
(Figure: origin servers, public Internet, 1.5 Mbps access link, institutional network with 10 Mbps LAN and institutional cache)
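The arithmetic behind the three caching examples can be reproduced in a few lines. The constants come from the stated assumptions; the function names are hypothetical, and the delay model ignores queueing on the access link, which is only reasonable once utilization is well below 100%.

```python
OBJECT_BITS = 100_000     # average object size, in bits
REQUESTS_PER_SEC = 15     # request rate toward origin servers
INTERNET_DELAY = 2.0      # seconds, router to origin server and back
CACHE_DELAY = 0.010       # seconds, assumed delay of a cache hit


def utilization(link_bps: float, hit_rate: float = 0.0) -> float:
    """Fraction of the link consumed by requests that miss the cache."""
    offered_bps = REQUESTS_PER_SEC * OBJECT_BITS * (1.0 - hit_rate)
    return offered_bps / link_bps


def average_delay(hit_rate: float) -> float:
    """Weighted average of origin-server delay and cache-hit delay."""
    return (1.0 - hit_rate) * INTERNET_DELAY + hit_rate * CACHE_DELAY


lan = utilization(10e6)                             # 10 Mbps LAN
access_no_cache = utilization(1.5e6)                # 1.5 Mbps access link
access_with_cache = utilization(1.5e6, hit_rate=0.4)
delay_with_cache = average_delay(0.4)
```

The offered load is 15 × 100,000 = 1.5 Mbps, which saturates the 1.5 Mbps access link; a 0.4 hit rate cuts it to 60% and gives an average delay of about 1.2 seconds.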
34Problems
- Over 50% of all HTTP objects are uncacheable. Why?
- Not easily solvable
- Dynamic data -> stock prices, scores, web cams
- CGI scripts -> results based on passed parameters
- Obvious fixes
- SSL -> encrypted data is not cacheable
- Most web clients don't handle mixed pages well -> many generic objects transferred with SSL
- Cookies -> results may be based on passed data
- Hit metering -> owner wants to measure # of hits for revenue, etc.
- What will be the end result?
35Content Distribution Networks (CDNs)
- The content providers are the CDN customers.
- Content replication
- CDN company installs hundreds of CDN servers throughout Internet
- Close to users
- CDN replicates its customers' content in CDN servers. When provider updates content, CDN updates servers
(Figure: origin server in North America, CDN distribution node, CDN servers in S. America, Asia, and Europe)
36Content Distribution Networks Server Selection
- Replicate content on many servers
- Challenges
- How to replicate content
- Where to replicate content
- How to find replicated content
- How to choose among known replicas
- How to direct clients towards replica
37Server Selection
- Which server?
- Lowest load -> to balance load on servers
- Best performance -> to improve client performance
- Based on Geography? RTT? Throughput? Load?
- Any alive node -> to provide fault tolerance
- How to direct clients to a particular server?
- As part of routing -> anycast, cluster load balancing
- As part of application -> HTTP redirect
- As part of naming -> DNS
38Application Based
- HTTP supports simple way to indicate that Web page has moved (30X responses)
- Server receives GET request from client
- Decides which server is best suited for particular client and object
- Returns HTTP redirect to that server
- Can make informed application-specific decision
- May introduce additional overhead -> multiple connection setup, name lookups, etc.
- OK solution in general, but
- HTTP redirect has some flaws, especially with current browsers
- Incurs many delays, which operators may really care about
39Naming Based
- Client does DNS name lookup for service
- Name server chooses appropriate server address
- A-record returned is best one for the client
- What information can name server base decision on?
- Server load/location -> must be collected
- Information in the name lookup request
- Name service client -> typically the local name server for client
40How Akamai Works
- Clients fetch HTML document from primary server
- E.g. fetch index.html from cnn.com
- URLs for replicated content are replaced in HTML
- E.g. <img src=http://cnn.com/af/x.gif> replaced with <img src=http://a73.g.akamaitech.net/7/23/cnn.com/af/x.gif>
- Client is forced to resolve aXYZ.g.akamaitech.net hostname
41How Akamai Works
- How is content replicated?
- Akamai only replicates static content (*)
- Modified name contains original file name
- Akamai server is asked for content
- First checks local cache
- If not in cache, requests file from primary server and caches file
- (*) At least, the version we're talking about today. Akamai actually lets sites write code that can run on Akamai's servers, but that's a pretty different beast
42How Akamai Works
- Root server gives NS record for akamai.net
- akamai.net name server returns NS record for g.akamaitech.net
- Name server chosen to be in region of client's name server
- TTL is large
- g.akamaitech.net name server chooses server in region
- Should try to choose server that has file in cache
- How to choose?
- Uses aXYZ name and hash
- TTL is small -> why?
43Simple Hashing
- Given document XYZ, we need to choose a server to use
- Suppose we use modulo
- Number servers from 1..n
- Place document XYZ on server (XYZ mod n)
- What happens when a server fails? n -> n-1
- Same if different people have different measures of n
- Why might this be bad?
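One way to see why modulo placement is fragile is to count how many documents change servers when n drops from 10 to 9. This is a small experiment under assumptions of my own: documents are named `doc0`, `doc1`, ..., SHA-1 stands in for the document hash, and servers are numbered 0..n-1 (the natural range of `mod n`).

```python
import hashlib


def doc_hash(name: str) -> int:
    """Map a document name to a large integer (SHA-1 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:8], "big")


def server_for(name: str, n: int) -> int:
    """Simple hashing: place the document on server (hash mod n)."""
    return doc_hash(name) % n


docs = [f"doc{i}" for i in range(10_000)]
# Count documents whose server changes when one of 10 servers fails.
moved = sum(server_for(d, 10) != server_for(d, 9) for d in docs)
fraction_moved = moved / len(docs)
```

With a well-mixed hash, roughly nine out of ten documents land on a different server after the failure, so almost every cached copy becomes useless at once.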
44Consistent Hash
- View: subset of all hash buckets that are visible
- Desired features
- Balance: in any one view, load is equal across buckets
- Smoothness: little impact on hash bucket contents when buckets are added/removed
- Spread: small set of hash buckets that may hold an object regardless of views
- Load: across all views, # of objects assigned to hash bucket is small
45Consistent Hash Example
- Construction
- Assign each of C hash buckets to random points on mod $2^n$ circle, where hash key size = n
- Map object to random position on circle
- Hash of object = closest clockwise bucket
(Figure: circle with positions 0, 4, 8, 12, 14 marked and buckets placed on it)
- Smoothness -> addition of bucket does not cause movement between existing buckets
- Spread & Load -> small set of buckets that lie near object
- Balance -> no bucket is responsible for large number of objects
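The construction above can be sketched with a sorted list of bucket positions and a binary search for the closest clockwise bucket. This is an illustration, not Akamai's implementation: the class name `Ring`, the server names, the 32-bit circle, and the use of SHA-1 to pick "random" points are all assumptions.

```python
import bisect
import hashlib


def point(key: str, bits: int = 32) -> int:
    """Position of a key on the mod 2**bits circle."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % (1 << bits)


class Ring:
    def __init__(self, buckets):
        # Keep (position, bucket) pairs sorted by position on the circle.
        self.points = sorted((point(b), b) for b in buckets)

    def add(self, bucket: str) -> None:
        bisect.insort(self.points, (point(bucket), bucket))

    def lookup(self, obj: str) -> str:
        # First bucket at or after the object's position, wrapping past zero.
        i = bisect.bisect_left(self.points, (point(obj), ""))
        return self.points[i % len(self.points)][1]


ring = Ring([f"server{i}" for i in range(9)])
objects = [f"obj{i}" for i in range(2000)]
before = {o: ring.lookup(o) for o in objects}
ring.add("server9")
after = {o: ring.lookup(o) for o in objects}
moved = [o for o in objects if before[o] != after[o]]
```

Every object in `moved` now belongs to `server9`: adding a bucket only takes objects from its clockwise neighbor and never shuffles objects between existing buckets, which is the smoothness property.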
46How Akamai Works
(Figure: numbered message flow, steps 1-12, among cnn.com (content provider), the DNS root server, the Akamai high-level DNS server, the Akamai low-level DNS server, an Akamai server, and a nearby matching Akamai server. Labeled requests: "Get index.html", "Get /cnn.com/foo.jpg", and "Get foo.jpg".)
47Akamai Subsequent Requests
(Figure: the same participants as on the previous slide, showing only steps 1, 2, and 7-10 for a subsequent request. Labeled requests: "Get index.html" and "Get /cnn.com/foo.jpg".)
48Impact on DNS Usage
- DNS is used for server selection more and more
- What are reasonable DNS TTLs for this type of use?
- Typically want to adapt to load changes
- Low TTL for A-records -> what about NS records?
- How does this affect caching?
- What do the first and subsequent lookups do?
49HTTP (Summary)
- Simple text-based file exchange protocol
- Support for status/error responses, authentication, client-side state maintenance, cache maintenance
- Workloads
- Typical documents structure, popularity
- Server workload
- Interactions with TCP
- Connection setup, reliability, state maintenance
- Persistent connections
- How to improve performance
- Persistent connections
- Caching
- Replication
50Grid
- What is Grid?
- Grid Projects & Applications
- Grid Technologies
- Globus
- CompGrid
53Definition
- A type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed resources
- Computers: PCs, workstations, clusters, supercomputers, laptops, notebooks, mobile devices, PDAs, etc.
- Software: e.g., ASPs renting expensive special-purpose applications on demand
- Catalogued data and databases: e.g., transparent access to human genome database
- Special devices/instruments: e.g., radio telescope (SETI@Home searching for life in the galaxy)
- People/collaborators
- depending on their availability, capability, cost, and user QoS requirements
- for solving large-scale problems/applications
- thus enabling the creation of virtual organizations (VOs)
54Resources: assets, capabilities, and knowledge
- Capabilities (e.g. application codes, analysis tools)
- Compute Grids (PC cycles, commodity clusters, HPC)
- Data Grids
- Experimental Instruments
- Knowledge Services
- Virtual Organisations
- Utility Services
55Why go Grid?
- Hot subject
- Try it, experience it to learn the potential
- Will enable true ubiquitous computing in future
- Today, proven in some areas: intraGrids
- But still a long way to a World Wide Grid
- State-of-the-art techniques and tools are difficult
- Short-term goals? Use another technology
- Does your system have Grid characteristics?
- Distributed users, large scale and heterogeneous
resources, across domains
56Grid's main idea
- To treat CPU cycles and software like commodities.
- Enable the coordinated use of geographically distributed resources in the absence of central control and existing trust relationships.
- Computing power is produced much like utilities such as power and water are produced for consumers.
- Users will have access to power on demand
- "When the network is as fast as the computer's internal links, the machine disintegrates across the Net into a set of special purpose appliances" (Gilder Technology Report, June 2000)
57Computational Grids and Electric Power Grids
63What do users want ?
- Grid Consumers
- Execute jobs for solving problems of varying size and complexity
- Benefit by selecting and aggregating resources wisely
- Tradeoff: timeframe and cost
- Grid Providers
- Contribute (idle) resources for executing consumer jobs
- Benefit by maximizing resource utilisation
- Tradeoff: local requirements and market opportunity
66Grid Applications
- Distributed HPC (Supercomputing)
- Computational science.
- High-Capacity/Throughput Computing
- Large scale simulation/chip design and parameter studies.
- Content Sharing (free or paid)
- Sharing digital contents among peers (e.g., Napster)
- Remote software access/renting services
- Application service providers (ASPs) and Web services.
- Data-intensive computing
- Drug Design, Particle Physics, Stock Prediction...
- On-demand, real-time computing
- Medical instrumentation and Mission Critical.
- Collaborative Computing
- Collaborative design, Data exploration, education.
- Service Oriented Computing (SOC)
- Towards economic-based Utility Computing: new paradigm, new applications, new industries, and new business.
67Grid Projects
- Australia
- Nimrod-G
- Gridbus
- GridSim
- Virtual Lab
- DISCWorld
- GrangeNet
- ..new coming up
- Europe
- UNICORE
- Cactus
- UK eScience
- EU Data Grid
- EuroGrid
- MetaMPI
- XtremeWeb
- and many more.
- India
- I-Grid
- USA
- Globus
- Legion
- OGSA
- Sun Grid Engine
- AppLeS
- NASA IPG
- Condor-G
- Jxta
- NetSolve
- AccessGrid
- and many more...
- Cycle Stealing & .com Initiatives
- Distributed.net
- SETI@Home, ...
- Entropia, UD, Parabon, ...
- Public Forums
- Global Grid Forum
- Australian Grid Forum
68Grid Requirements
- Identity authentication
- Authorization policy
- Resource discovery
- Resource characterization
- Resource allocation
- (Co-)reservation, workflow
- Distributed algorithms
- Remote data access
- High-speed data transfer
- Performance guarantees
- Monitoring & Adaptation
- Intrusion detection
- Resource management
- Accounting & payment
- Fault management
- System evolution
- Etc.
69Resource Management Problem
- Enabling secure, controlled remote access to computational resources and management of remote computation
- Authentication and authorization
- Resource discovery & characterization
- Reservation and allocation
- Computation monitoring and control
70Grid-based Computation Challenges
- Locate suitable computers
- Authenticate with appropriate sites
- Allocate resources on those computers
- Initiate computation on those computers
- Configure those computations
- Select appropriate communication methods
- Compute with suitable algorithms
- Access data files, return output
- Respond appropriately to resource changes
71Leading Grid Middleware Developments
- Globus Toolkit (mainly developed at ANL and USC)
- Service-oriented toolkit from the Globus project, to be used in Grid applications, not targeted at end users
- Services for resource selection and allocation, authentication, file system access and file transfer
- Largest user base in projects worldwide
- Open-source software, commercial support by IBM and Platform Computing
72The Globus Alliance
- Globus Project, since 1996
- Ian Foster (Argonne National Lab),
- Carl Kesselman (University of Southern California's Information Sciences Institute)
- Develop protocols, middleware and tools for Grid computing
- Globus Alliance, since Sept 2003
- International scope
- University of Edinburgh's EPCC
- Swedish Center for Parallel Computers (PDC)
- Advisory council of Academic Affiliates from Asia-Pacific, Europe, US
73Globus Toolkit
- GT2 (2.4 released in 2002): reference implementation of Grid fabric protocols
- GRAM for job submissions
- MDS for resource discovery
- GridFTP for data transfer
- GSI security
- GT3 (3.0 released July 2003): redesign
- OGSI based
- Grid services, built on SOAP and XML
- GT3.2 released March 31, 2004
74Globus Toolkit Services
- Job submission and management (GRAM)
- Uniform Job Submission
- Security (GSI)
- PKI-based Security (Authentication) Service
- Information services (MDS)
- LDAP-based Information Service
- Remote file management (GASS) and transfer (GridFTP)
- Remote Storage Access Service
- Remote Data Catalogue and Management Tools
- Supported by Globus 2.0, released in 2002
- Resource selection and allocation (GIIS, GRIS)
75Resource Specification Language
- Common notation for exchange of information between components
- Syntax similar to MDS/LDAP filters
- RSL provides two types of information
- Resource requirements: machine type, number of nodes, memory, etc.
- Job configuration: directory, executable, args, environment
- API provided for manipulating RSL
76Some Useful Definitions
- Network Protocol
- A formal description of message formats and a set of rules for exchange of messages
- Rules define sequences of message exchange, and potentially resulting behavior
- Protocol may define state-change in endpoint
- Network Enabled Services
- Defines a set of capabilities
- Protocol defines interaction with service
- All services require protocols, although not all protocols are used to provide services
77More definitions
- Resource
- Entity that is to be shared
- Provides some capabilities that can be accessed via interface (API) or protocol
- Application Programmer Interface (API)
- Software Development Kit (SDK)
- Package that enables application development, consisting of one or more APIs and programming tools
78Protocols Make the Grid
- Protocols and APIs
- Protocols enable interoperability
- APIs enable portability
- Sharing is about interoperability, so
- Grid architecture should be about protocols
79Grid Services Architecture Previous Perspective
(Figure: layered architecture, top to bottom)
- Applns: a rich variety of applications ...
- Appln Toolkits: remote data toolkit, remote sensors toolkit, async. collab. toolkit, remote viz toolkit, remote comp. toolkit, ...
- Grid Services: protocols, authentication, policy, resource management, instrumentation, discovery, etc.
- Grid Fabric: Grid-enabled archives, networks, computers, display devices, etc.; associated local services
80Characteristics of Grid Services Architecture
- Identifies separation of concerns
- Isolates Grids from languages and specific programming environments
- Makes provisions for generic and application-specific functionality
- Protocols not explicit in architecture
- Fails to make clear distinction between language, service and networking issues
81Layered Grid Protocol Architecture
Application
User
Grid
Resource
Connectivity
Fabric
82Important Points
- Being Grid-enabled requires speaking appropriate protocols
- Protocol is the only requirement, not reachability
- Protocols can be used to bridge local resources or local Grids
- Intergrid as analog to Internet
- Built on Internet protocols
- Independent of language and implementation
- Focus on interaction over network
- Services exist at each level
83Protocols, services and interfaces
Applications
Languages/Frameworks
Connectivity APIs
Connectivity Protocols
Local Access APIs and protocols
Fabric Layer
84How does Globus fit in?
- Defines connectivity and resource protocols
- Enables definition of grid and user protocols
- Globus provides some of these, others defined by other groups
- Defines range of APIs and SDKs that leverage Resource, Grid and User protocols
85Fabric
- Local access to logical resource
- May be real component, e.g. CPU, software module, filesystem
- May be logical component, e.g. Condor pool
- Protocol or API mediated
- Fabric elements include
- SSP, ASP, peer-to-peer, Entropia-like, and enterprise level solutions
86Connectivity Protocols
- Two classes of connectivity protocols underlie all other components
- Internet communication
- Application, transport and internet layer protocols
- I.e., transport, routing, DNS, etc.
- Security
- Authentication and delegation
- Discussed below
87Security
- Protocols
- TLS with delegation
- Services
- K5ssl, Globus Authorization Service
- APIs
- GSS-API, GAA, SASL, gss_assist
- SDKs
- GlobusIO
88Resource Protocols
- Resource management,
- Storage system access
- Network quality of service
- Data movement
- Resource information
89Resource Management
- Protocols
- GRAM+GARA (on HTTP)
- Resource services
- Gatekeeper, JobManager, SlotManager
- APIs and SDKs
- GRAM API, JavaCog Client, DUROC
90Data Transport
- Protocols
- Grid FTP, LDAP for replica catalog
- Services
- FTP, LDAP replica catalog
- APIs and SDKs
- GridFTP client library, copy URL API, replica
catalog access, replica selection
91Resource Information
- Protocol
- LDAP V3, Registration/Discovery protocol
- Service
- GRIS
- APIs & SDKs
- C API, JNDI, PerlLDAP, ...
92Grid Protocols
- Grid Information Index Services
- LDAP and Service registration protocol,
- GIIS service
- LDAP APIs and specialized information API
- Co-allocation and brokering
- GRAM (HTTP+RSL)
- DUROC service
- DUROC client API, end-to-end reservation API
93Grid Protocols (cont)
- Online authentication, authorization services
- HTTP
- MyProxy, Group policy servers
- Myproxy API, GAA API,
- Many others (e.g.)
- Resource discovery (Matchmaker)
- Fault recovery
94User Protocols
- In general, there are many of these; they tend to be one-off and not well defined
- Examples
- Portal toolkits (e.g. Hotpage)
- Netsolve
- Cactus framework
95Why Study Peer to peer systems?
- To understand how they work
- To build your own peer to peer system
- To understand the techniques and principles within them
- To modify, adapt, reuse these techniques and principles in other related areas
- Cloud computing
- Sensor networks
- To grow the body of knowledge about distributed systems
96Searching & Fetching
- Human: "I want to watch that great 80s cult classic 'Better Off Dead'"
- Search: "better off dead" -> better_off_dead.mov or -> 0x539fba83ajdeadbeef
- Locate sources of better_off_dead.mov
- Download the file from them
97Searching
N2
N1
N3
Internet
Key="title", Value=MP3 data
?
Client
Publisher
Lookup(title)
N6
N4
N5
98Search Approaches
- Centralized
- Flooding
- A hybrid: flooding between supernodes
- Structured
99Different types of searches
- Needles vs. Haystacks
- Searching for top 40, or an obscure punk track from 1981 that nobody's heard of?
- Search expressiveness
- Whole word? Regular expressions? File names? Attributes? Whole-text search?
- (e.g., p2p gnutella or p2p google?)
100Framework
- Common Primitives
- Join: how do I begin participating?
- Publish: how do I advertise my file?
- Search: how do I find a file?
- Fetch: how do I retrieve a file?
101Centralized
- Centralized Database
- Join: on startup, client contacts central server
- Publish: reports list of files to central server
- Search: query the server => return node(s) that store the requested file
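The centralized design can be sketched as a single in-memory index. This is a toy model, not Napster's protocol: the class name `CentralIndex`, its method names, and the second peer address are hypothetical, and "join" is folded into the first `publish` call.

```python
from collections import defaultdict


class CentralIndex:
    """Central server state: which peers hold which titles (O(N) state on one machine)."""

    def __init__(self):
        self.peers_by_title = defaultdict(set)

    def publish(self, peer: str, titles) -> None:
        # A client reports its list of files to the central server.
        for title in titles:
            self.peers_by_title[title].add(peer)

    def leave(self, peer: str) -> None:
        # Remove a departing peer from every entry.
        for peers in self.peers_by_title.values():
            peers.discard(peer)

    def search(self, title: str):
        # One lookup on one index: search scope is O(1).
        return sorted(self.peers_by_title.get(title, ()))


index = CentralIndex()
index.publish("123.2.21.23", ["X", "Y", "Z"])
index.publish("123.2.0.18", ["X"])
holders_of_x = index.search("X")
holders_of_a = index.search("A")
```

The fetch itself happens directly between peers; the server only answers the "who has it?" question, which is also why it is a single point of failure.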
102Napster Example Publish
I have X, Y, and Z!
123.2.21.23
103Napster Search
123.2.0.18
Where is file A?
104Napster Discussion
- Pros
- Simple
- Search scope is O(1) for even complex searches (one index, etc.)
- Controllable (pro or con?)
- Cons
- Server maintains O(N) state
- Server does all processing
- Single point of failure
- Technical failures + legal (Napster shut down 2001)
105Query Flooding
- Join: must join a flooding network
- Usually, establish peering with a few existing nodes
- Publish: no need, just reply
- Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to sender.
- TTL limits propagation
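TTL-limited flooding can be sketched as a breadth-first traversal that stops forwarding once the TTL reaches zero. This is a simulation on an in-memory graph rather than a network protocol; the function name `flood_search`, the four-node topology, and the file placement are made up for illustration.

```python
from collections import deque


def flood_search(graph, files, start, title, ttl):
    """Return (nodes holding title, number of nodes the query reached)."""
    seen = {start}                    # nodes that already received the query
    queue = deque([(start, ttl)])
    hits = []
    while queue:
        node, remaining = queue.popleft()
        if title in files.get(node, ()):
            hits.append(node)         # this node would reply to the sender
        if remaining == 0:
            continue                  # TTL exhausted: do not forward further
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, len(seen)


# A chain N1 - N2 - N3 - N4 where only N4 stores file "A".
graph = {"N1": ["N2"], "N2": ["N1", "N3"], "N3": ["N2", "N4"], "N4": ["N3"]}
files = {"N4": {"A"}}
short_ttl = flood_search(graph, files, "N1", "A", ttl=2)
long_ttl = flood_search(graph, files, "N1", "A", ttl=3)
```

With TTL 2 the query reaches three nodes and finds nothing; with TTL 3 it reaches N4. The lookup can fail even though the file exists, so the query may have to be re-issued with a larger TTL.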
106Example Gnutella
Where is file A?
107Flooding Discussion
- Pros
- Fully de-centralized
- Search cost distributed
- Processing @ each node permits powerful search semantics
- Cons
- Search scope is O(N)
- Search time is O(???)
- Nodes leave often, network unstable
- TTL-limited search works well for haystacks.
- For scalability, does NOT search every node. May have to re-issue query later
108Supernode Flooding
- Join: on startup, client contacts a supernode ... may at some point become one itself
- Publish: send list of files to supernode
- Search: send query to supernode, supernodes flood query amongst themselves.
- Supernode network just like prior flooding net
109Supernode Network Design
110Supernode File Insert
I have X!
123.2.21.23
111Supernode File Search
Where is file A?
112Supernode Which nodes?
- Often, bias towards nodes with good
- Bandwidth
- Computational Resources
- Availability!
113Stability and Superpeers
- Why superpeers?
- Query consolidation
- Many connected nodes may have only a few files
- Propagating a query to a sub-node would take more b/w than answering it yourself
- Caching effect
- Requires network stability
- Superpeer selection is time-based
- How long you've been on is a good predictor of how long you'll be around.
114Superpeer results
- Basically, just better than flood to all
- Gets an order of magnitude or two better scaling
- But still fundamentally: O(search) × O(per-node storage) = O(N)
- central: O(1) search, O(N) storage
- flood: O(N) search, O(1) storage
- Superpeer: can trade between
115Structured SearchDistributed Hash Tables
- Academic answer to p2p
- Goals
- Guaranteed lookup success
- Provable bounds on search time
- Provable scalability
- Makes some things harder
- Fuzzy queries / full-text search / etc.
- Read-write, not read-only
- Hot Topic in networking since introduction in
2000/2001
116Searching Wrap-Up
Type        O(search)   storage     Fuzzy?
Central     O(1)        O(N)        Yes
Flood       O(N)        O(1)        Yes
Super       < O(N)      > O(1)      Yes
Structured  O(log N)    O(log N)    not really
117DHT Overview
- Abstraction: a distributed hash-table (DHT) data structure
- put(id, item)
- item = get(id)
- Implementation: nodes in system form a distributed data structure
- Can be Ring, Tree, Hypercube, Skip List, Butterfly Network, ...
118DHT Overview (2)
- Structured Overlay Routing
- Join: on startup, contact a bootstrap node and integrate yourself into the distributed data structure; get a node id
- Publish: route publication for file id toward a close node id along the data structure
- Search: route a query for file id toward a close node id. Data structure guarantees that query will meet the publication.
- Important difference: get(key) is for an exact match on key!
- search("spars") will not find file("briney spars")
- We can exploit this to be more efficient
119DHT Example - Chord
- Associate to each node and file a unique id in a uni-dimensional space (a ring)
- E.g., pick from the range $0 \ldots 2^m$
- Usually the hash of the file or IP address
- Properties
- Routing table size is O(log N), where N is the total number of nodes
- Guarantees that a file is found in O(log N) hops
- From MIT in 2001
120DHT Consistent Hashing
(Figure: circular ID space with nodes N32, N90, N105 and keys K5, K20, K80; "K5" denotes key 5 and "N105" denotes node 105)
- A key is stored at its successor: the node with the next higher ID
121DHT Chord Basic Lookup
(Figure: ring with nodes N10, N32, N60, N90, N105, N120. The query "Where is key 80?" travels around the ring until the answer "N90 has K80" is returned.)
122DHT Chord Finger Table
(Figure: the fingers of node N80 span 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring)
- Entry i in the finger table of node n is the first node that succeeds or equals $n + 2^i$
- In other words, the ith finger points $1/2^{m-i}$ of the way around the ring, where ids have m bits
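The finger-table rule can be checked with a few lines of code. This is a sketch with global knowledge of all node ids, which a real Chord node does not have; the function names and the example ring (3-bit ids, nodes 0, 1, 2, 6) are chosen for illustration.

```python
def successor(ident: int, nodes, m: int) -> int:
    """First node whose id is >= ident, wrapping around the 2**m ring."""
    ident %= 2 ** m
    ring = sorted(nodes)
    return next((n for n in ring if n >= ident), ring[0])


def finger_table(n: int, nodes, m: int):
    """Entry i is (n + 2**i mod 2**m, first node that succeeds or equals it)."""
    return [((n + 2 ** i) % 2 ** m, successor(n + 2 ** i, nodes, m)) for i in range(m)]


nodes = {0, 1, 2, 6}
table_n1 = finger_table(1, nodes, m=3)
table_n6 = finger_table(6, nodes, m=3)
```

For node 1 the targets are 2, 3, and 5, whose successors are nodes 2, 6, and 6; for node 6 the targets wrap around to 7, 0, and 2.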
123Node Join
- Compute ID
- Use an existing node to route to that ID in the ring.
- Finds s = successor(id)
- Ask s for its predecessor, p
- Splice self into ring just like a linked list
- p->successor = me
- me->successor = s
- me->predecessor = p
- s->predecessor = me
124DHT Chord Join
- Assume an identifier space 0..7 (3-bit ids, arithmetic mod 8)
- Node n1 joins
Succ. Table (node 1)
i   id+2^i   succ
0   2        1
1   3        1
2   5        1
(Figure: ring with identifier positions 0-7; node 1 is the only node)
125DHT Chord Join
Succ. Table (node 1)
i   id+2^i   succ
0   2        2
1   3        1
2   5        1
Succ. Table (node 2)
i   id+2^i   succ
0   3        1
1   4        1
2   6        1
(Figure: ring with identifier positions 0-7; nodes 1 and 2 present)
126DHT Chord Join
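Succ. Table (node 0)
i   id+2^i   succ
0   1        1
1   2        2
2   4        6
Succ. Table (node 1)
i   id+2^i   succ
0   2        2
1   3        6
2   5        6
Succ. Table (node 6)
i   id+2^i   succ
0   7        0
1   0        0
2   2        2
Succ. Table (node 2)
i   id+2^i   succ
0   3        6
1   4        6
2   6        6
(Figure: ring with identifier positions 0-7; nodes 0, 1, 2, and 6 present)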
127DHT Chord Join
- Nodes: n1, n2, n0, n6
- Items: f7, f2
Succ. Table (node 0), Items: 7
i  id+2^i  succ
0  1       1
1  2       2
2  4       6
Succ. Table (node 1), Items: 1
i  id+2^i  succ
0  2       2
1  3       6
2  5       6
Succ. Table (node 6)
i  id+2^i  succ
0  7       0
1  0       0
2  2       2
Succ. Table (node 2)
i  id+2^i  succ
0  3       6
1  4       6
2  6       6
[Figure: ring of identifiers 0 to 7 with nodes 0, 1, 2 and 6]
128DHT Chord Routing
- Upon receiving a query for item id, a node
- Checks whether it stores the item locally
- If not, forwards the query to the largest node in its successor table that does not exceed id
Succ. Table (node 0), Items: 7
i  id+2^i  succ
0  1       1
1  2       2
2  4       6
Succ. Table (node 1), Items: 1
i  id+2^i  succ
0  2       2
1  3       6
2  5       6
Succ. Table (node 6)
i  id+2^i  succ
0  7       0
1  0       0
2  2       2
Succ. Table (node 2)
i  id+2^i  succ
0  3       6
1  4       6
2  6       6
[Figure: ring of identifiers 0 to 7 with nodes 0, 1, 2 and 6; query(7) is shown at node 1]
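The forwarding rule can be traced with a small sketch over the four-node ring. Two things here are assumptions rather than slide content: "does not exceed id" is read in clockwise ring distance so that wrap-around works, and a node with no qualifying entry hands the query to its first table entry (its immediate successor). Node 0's entry for id 4 is taken as 6, the first node at or after 4, and item placement follows the "Items" labels in the figure.

```python
RING_SIZE = 8  # identifiers 0..7

# Successor tables (entries for i = 0, 1, 2) and stored items per node.
SUCC_TABLE = {0: [1, 2, 6], 1: [2, 6, 6], 2: [6, 6, 6], 6: [0, 0, 2]}
ITEMS = {0: {7}, 1: {1}, 2: set(), 6: set()}


def clockwise(a, b):
    """Distance travelled going clockwise from a to b."""
    return (b - a) % RING_SIZE


def next_hop(node, item_id):
    """Farthest table entry that does not pass item_id, else the immediate successor."""
    reach = clockwise(node, item_id)
    candidates = [s for s in SUCC_TABLE[node] if 0 < clockwise(node, s) <= reach]
    if candidates:
        return max(candidates, key=lambda s: clockwise(node, s))
    return SUCC_TABLE[node][0]


def route(start, item_id, max_hops=RING_SIZE):
    """Return the list of nodes visited, ending where the item is stored."""
    path, node = [start], start
    while item_id not in ITEMS[node] and len(path) <= max_hops:
        node = next_hop(node, item_id)
        path.append(node)
    return path


print(route(1, 7))
```

With these tables, query(7) issued at node 1 is forwarded to node 6 and then to node 0, which holds item 7.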
129DHT Chord Summary
- Routing table size?
- Log N fingers
- Routing time?
- Each hop is expected to halve the distance to the desired id => expect O(log N) hops.
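The halving argument can be checked on a small static ring. The sketch below is an idealized lookup, assuming every node has a perfect finger table and nothing joins or leaves. The ring contents, m = 7 and the function names are illustrative choices.

```python
import bisect


def successor(ordered_nodes, ident):
    """First node ID that succeeds or equals ident, wrapping past the top."""
    index = bisect.bisect_left(ordered_nodes, ident)
    return ordered_nodes[index % len(ordered_nodes)]


def lookup(start, key, ordered_nodes, m):
    """Return (node responsible for key, hops taken) starting at node `start`."""
    size = 2 ** m
    node, hops = start, 0
    while True:
        to_key = (key - node) % size
        if to_key == 0:
            return node, hops                     # key equals this node's ID
        succ = successor(ordered_nodes, (node + 1) % size)
        to_succ = (succ - node - 1) % size + 1    # a lone node spans the whole ring
        if to_key <= to_succ:
            return succ, hops                     # key lies in (node, succ]
        # Jump to the farthest finger that still precedes the key.
        for i in reversed(range(m)):
            finger = successor(ordered_nodes, (node + 2 ** i) % size)
            if 0 < (finger - node) % size < to_key:
                node, hops = finger, hops + 1
                break


nodes = [10, 32, 60, 90, 105, 120]   # assumed ring, 7-bit IDs
owner, hops = lookup(10, 80, nodes, m=7)
print(owner, hops)
```

In this idealized setting the hop count never exceeds m, the number of ID bits. The O(log N) figure on the slide is the expected count when node IDs are spread evenly by hashing.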
130DHT Discussion
- Pros
- Guaranteed Lookup
- O(log N) per node state and search scope
- Cons
- This line used to say "not used." But: now being used in a few apps, including BitTorrent.
- Supporting non-exact match search is (quite!) hard
131The limits of search: A Peer-to-peer Google?
- Complex intersection queries ("the" + "who")
- Billions of hits for each term alone
- Sophisticated ranking
- Must compare many results before returning a subset to user
- Very, very hard for a DHT / p2p system
- Need high inter-node bandwidth
- (This is exactly what Google does - massive clusters)
- But maybe many file sharing queries are okay...
132Fetching Data
- Once we know which node(s) have the data we want...
- Option 1: Fetch from a single peer
- Problem: Have to fetch from a peer who has the whole file.
- Peers are not useful sources until they download the whole file
- At which point they probably log off. :)
- How can we fix this?
133Chunk Fetching
- More than one node may have the file.
- How to tell?
- Must be able to distinguish identical files
- Not necessarily same filename
- Same filename not necessarily same file...
- Use hash of file
- Common: MD5, SHA-1, etc.
- How to fetch?
- Get bytes 0..8000 from A, 8001...16000 from B
- Alternative: Erasure Codes
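A minimal sketch of both ideas: identifying a file by the hash of its contents, and splitting a download into byte ranges spread over several peers. The 8000-byte chunk size echoes the slide; SHA-1, the round-robin assignment, the peer names "A" and "B", and the in-memory "file" are assumptions for illustration.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Identify a file by its contents rather than its name."""
    return hashlib.sha1(data).hexdigest()


def chunk_ranges(size, chunk_size):
    """Inclusive (first_byte, last_byte) ranges covering a file of `size` bytes."""
    return [(start, min(start + chunk_size, size) - 1)
            for start in range(0, size, chunk_size)]


def plan_fetch(size, chunk_size, peers):
    """Assign each byte range to a peer in round-robin order."""
    return [(peers[i % len(peers)], first, last)
            for i, (first, last) in enumerate(chunk_ranges(size, chunk_size))]


original = bytes(range(256)) * 100          # stand-in for a 25,600-byte file
plan = plan_fetch(len(original), 8000, ["A", "B"])

# Simulate the fetch by slicing the original, then verify the result by hash.
rebuilt = b"".join(original[first:last + 1] for _, first, last in plan)
print(plan)
print(file_digest(rebuilt) == file_digest(original))
```

Comparing the digest of the reassembled bytes against the advertised digest is what lets a downloader trust data gathered from several untrusted peers.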
134BitTorrent Overview
- Swarming
- Join: contact centralized tracker server, get a list of peers.
- Publish: Run a tracker server.
- Search: Out-of-band. E.g., use Google to find a tracker for the file you want.
- Fetch: Download chunks of the file from your peers. Upload chunks you have to them.
- Big differences from Napster
- Chunk based downloading (sound familiar? :)
- "Few large files" focus
- Anti-freeloading mechanisms
135BitTorrent
- Periodically get list of peers from tracker
- More often
- Ask each peer for what chunks it has
- (Or have them update you)
- Request chunks from several peers at a time
- Peers will start downloading from you
- BT has some machinery to try to bias towards
helping those who help you
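One way to picture "request chunks from several peers at a time" is the sketch below. It is not BitTorrent's actual piece-selection or choking logic. It is a simple illustrative policy that sends each needed chunk to whichever peer holding it currently has the fewest requests. The function name, peer names and chunk sets are all hypothetical.

```python
def assign_requests(needed, peer_chunks):
    """Map each needed chunk to one peer that has it, spreading load evenly.

    needed: iterable of chunk indices still missing locally.
    peer_chunks: dict of peer name -> set of chunk indices that peer reports.
    """
    load = {peer: 0 for peer in peer_chunks}
    assignment = {}
    for chunk in sorted(needed):
        holders = sorted(p for p, have in peer_chunks.items() if chunk in have)
        if not holders:
            continue                      # nobody has this chunk yet; retry later
        chosen = min(holders, key=lambda p: load[p])
        assignment[chunk] = chosen
        load[chosen] += 1
    return assignment


peer_chunks = {"A": {0, 1, 2}, "B": {1, 2, 3}, "C": {3}}
assignment = assign_requests({0, 1, 2, 3, 4}, peer_chunks)
print(assignment)
```

Chunk 4 stays unassigned because no peer reports having it, which is the situation the periodic "ask each peer what chunks it has" step is meant to resolve.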
136BitTorrent Publish/Join
Tracker
137BitTorrent Fetch
138BitTorrent Summary
- Pros
- Works reasonably well in practice
- Gives peers incentive to share resources; avoids freeloaders
- Cons
- Central tracker server needed to bootstrap swarm
- (Tracker is a design choice, not a requirement, as you know from your projects. Modern BitTorrent can also use a DHT to locate peers. But the approach still needs a search mechanism)
139Writable, persistent p2p
- Do you trust your data to 100,000 monkeys?
- Node availability hurts
- Ex: Store 5 copies of data on different nodes
- When someone goes away, you must replicate the data they held
- Hard drives are huge, but cable modem upload bandwidth is tiny: perhaps 10 Gbytes/day
- Takes many days to upload contents of 200GB hard drive. Very expensive leave/replication situation!
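The cost on this slide is a one-line calculation, spelled out below with the slide's own figures (200 GB of data, roughly 10 GB/day of upload). The `uploaders` parameter is an added assumption covering the case where several surviving replica holders share the re-upload.

```python
def days_to_replicate(data_gb: float, upload_gb_per_day: float, uploaders: int = 1) -> float:
    """Days needed to re-upload data_gb when `uploaders` nodes share the work."""
    return data_gb / (upload_gb_per_day * uploaders)


single = days_to_replicate(200, 10)               # one node re-uploads everything
shared = days_to_replicate(200, 10, uploaders=4)  # assumed: four holders split the work
print(single, shared)
```

Even when the work is split, each departure ties up upload links for days, which is why high node churn makes replicated p2p storage expensive.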
140What's out there?
             Central     Flood     Super-node flood           Route
Whole File   Napster     Gnutella  -                          Freenet
Chunk Based  BitTorrent  -         KaZaA (bytes, not chunks)  DHTs, eDonkey2000
141P2P Summary
- Many different styles; remember pros and cons of each
- centralized, flooding, swarming, unstructured and structured routing
- Lessons learned
- Single points of failure are bad
- Flooding messages to everyone is bad
- Underlying network topology is important
- Not all nodes are equal
- Need incentives to discourage freeloading
- Privacy and security are important
- Structure can provide theoretical bounds and guarantees
142Some Questions
- Why do people get together?
- to share information
- to share and exchange resources they have
- books, class notes, experiences, videos, music CDs
- How can computers help people
- find information
- find resources
- exchange and share resources
143Cloud Computing Infrastructure: Take a seat, prepare to fly
- Anh M. Nguyen
- CS525, UIUC, Spring 2009
144What is cloud computing?
- "I don't understand what we would do differently in the light of Cloud Computing other than change the wordings of some of our ads"
- Larry Ellison, Oracle's CEO
- "I have not heard two people say the same thing about it [cloud]. There are multiple definitions out there of the cloud"
- Andy Isherwood, HP's Vice President of European Software Sales
- "It's stupidity. It's worse than stupidity: it's a marketing hype campaign."
- Richard Stallman, Free Software Foundation founder
145Next Lecture
- Communication among distributed systems.
- Remote Procedure Call (RPC)
- References
- Chapter 4 of the book