Title: Last class review
Today
- Grid Computing
- Peer to peer computing (P2P)
Software Trends
[Figure: application complexity vs. time (years), 1970 to 2000. Labels, roughly from earliest to latest: structured programming, monolithic; object-oriented programming, classes; client-server; component programming; multi-tier server-side; P2P computing, ubiquitous/pervasive; Grid computing.]
Grid Computing
- High-performance distributed applications in large-scale internetworks
- Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations
Network Exponentials
- Network vs. computer performance
- Computer speed doubles every 18 months (Moore's law)
- Network speed doubles every 9 months
- The gap grows by an order of magnitude every 5 years
- 1986 to 2000
- Computers x 500
- Networks x 340,000
- 2001 to 2010
- Computers x 60
- Networks x 4000
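The figures above follow from the doubling periods: a quantity that doubles every d months grows by a factor of 2^(t/d) over t months. A minimal sketch, taking the slide's 18- and 9-month doubling periods as given:

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Factor by which a quantity grows in `months` if it doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)

COMPUTER_DOUBLING = 18  # months, as stated on the slide
NETWORK_DOUBLING = 9    # months, as stated on the slide

# Gap opened over 5 years (60 months): network growth divided by computer growth.
gap = growth_factor(60, NETWORK_DOUBLING) / growth_factor(60, COMPUTER_DOUBLING)
print(f"5-year gap: x{gap:.1f}")

# 2001 to 2010, treated here as 9 years (108 months).
print("computers:", growth_factor(108, COMPUTER_DOUBLING))
print("networks:", growth_factor(108, NETWORK_DOUBLING))
```

The 5-year ratio comes out near 10, the order-of-magnitude gap on the slide. For 2001 to 2010 the formula gives x64 and x4096, matching the rounded x60 and x4000; for the 14 years from 1986 to 2000 it gives roughly x645 and x416,000, the same order of magnitude as the slide's figures.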
The 13.6 TF TeraGrid: Computing at 40 Gb/s
TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne
www.teragrid.org
International Virtual Data Grid Laboratory (iVDGL)
U.S. PIs: Avery, Foster, Gardner, Newman, Szalay
www.ivdgl.org
The Grid Problem
- Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources
- Enable communities (virtual organizations) to share geographically distributed resources as they pursue common goals, assuming the absence of:
- central location,
- central control,
- omniscience,
- existing trust relationships.
One View of Requirements
- Identity authentication
- Authorization policy
- Resource discovery
- Resource characterization
- Resource allocation
- (Co-)reservation, workflow
- Distributed algorithms
- Remote data access
- High-speed data transfer
- Performance guarantees
- Monitoring
- Adaptation
- Intrusion detection
- Resource management
- Accounting and payment
- Fault management
- System evolution
- Etc.
Elements of the Problem
- Resource sharing
- Computers, storage, sensors, networks, ...
- Sharing is always conditional: issues of trust, policy, negotiation, payment, ...
- Coordinated problem solving
- Beyond client-server: distributed data analysis, computation, collaboration, ...
- Dynamic, multi-institutional virtual orgs
- Community overlays on classic org structures
- Large or small, static or dynamic
Resource Sharing Requirements
- Members should be trustful and trustworthy.
- Sharing is conditional.
- Sharing should be secure.
- Sharing should be able to change dynamically over time.
- Need for resource discovery and registration.
- Sharing can be peer-to-peer or client/server.
- The same resource may be used in different ways.
- All of these call for a well-defined architecture and protocols.
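The conditional, dynamically changing sharing and the resource discovery listed above can be sketched as a small in-memory registry. This is a hypothetical illustration, not a Globus API: `Registry`, `Resource`, and the policy functions are invented names, and requesters are plain strings rather than authenticated identities.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A sharing policy decides, per requester, whether access is granted (hypothetical model).
Policy = Callable[[str], bool]

@dataclass
class Resource:
    name: str
    kind: str       # e.g. "cpu", "storage", "sensor"
    owner: str
    policy: Policy  # conditional sharing: an owner-defined rule

@dataclass
class Registry:
    resources: Dict[str, Resource] = field(default_factory=dict)

    def register(self, resource: Resource) -> None:
        self.resources[resource.name] = resource

    def update_policy(self, name: str, policy: Policy) -> None:
        # Sharing can change over time without re-registering the resource.
        self.resources[name].policy = policy

    def discover(self, requester: str, kind: str) -> List[str]:
        # Only resources of the requested kind whose policy admits this requester.
        return sorted(
            r.name for r in self.resources.values()
            if r.kind == kind and r.policy(requester)
        )

registry = Registry()
registry.register(Resource("cluster-a", "cpu", "uni-x", lambda who: who.endswith("@uni-x")))
registry.register(Resource("cluster-b", "cpu", "lab-y", lambda who: True))
print(registry.discover("alice@uni-x", "cpu"))  # ['cluster-a', 'cluster-b']
print(registry.discover("bob@lab-z", "cpu"))    # ['cluster-b']
registry.update_policy("cluster-b", lambda who: False)  # owner withdraws sharing
print(registry.discover("bob@lab-z", "cpu"))    # []
```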
The Globus Project: Making Grid Computing a Reality
- Close collaboration with real Grid projects in science and industry
- Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure
- Development and promotion of standard Grid software APIs and SDKs to enable portability and code sharing
- The Globus Toolkit: open source, reference software base for building Grid infrastructure and applications
- Global Grid Forum: development of standard protocols and APIs for Grid computing
Globus Toolkit
- A software toolkit addressing key technical problems in the development of Grid-enabled tools, services, and applications
- Offer a modular bag of technologies
- Enable incremental development of Grid-enabled tools and applications
- Implement standard Grid protocols and APIs
- Make available under a liberal open source license
Layered Grid Architecture
Key Protocols
- The Globus Toolkit centers around four key protocols
- Connectivity layer
- Security: Grid Security Infrastructure (GSI)
- Resource layer
- Resource Management
- Information Services
- Data Transfer
- Also key collective layer protocols
- Info Services, Replica Management, etc.
Peer-to-Peer Computing
What is Peer-to-Peer?
- A model of communication where every node in the network acts alike.
- As opposed to the client-server model, where one node provides services and other nodes use the services.
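The difference from client-server is that every peer runs the same code and plays both roles. A minimal in-process sketch (the `Peer` class and its methods are hypothetical names; no real networking is involved):

```python
from typing import Dict, List, Optional

class Peer:
    """Every peer runs the same code: it can both serve and make requests."""

    def __init__(self, name: str, files: Dict[str, str]):
        self.name = name
        self.files = dict(files)           # content this peer offers to others
        self.neighbors: List["Peer"] = []  # peers this node knows about

    def serve(self, filename: str) -> Optional[str]:
        """Server role: answer another peer's request."""
        return self.files.get(filename)

    def fetch(self, filename: str) -> bool:
        """Client role: ask known peers for a file and keep a local copy."""
        for other in self.neighbors:
            data = other.serve(filename)
            if data is not None:
                self.files[filename] = data  # from now on, this peer can serve it too
                return True
        return False

# Hypothetical three-peer example.
a = Peer("a", {"song.mp3": "<bytes>"})
b = Peer("b", {})
c = Peer("c", {})
b.neighbors = [a]
c.neighbors = [b]
print(b.fetch("song.mp3"))  # True: b acts as a client of a
print(c.fetch("song.mp3"))  # True: b now acts as a server for c
```

Peer b is first a client and then a server, which is exactly what a client-server design rules out.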
Advantages of P2P Computing
- No central point of failure
- E.g., the Internet and the Web do not have a central point of failure.
- Most Internet and Web services use the client-server model (e.g., HTTP), so a specific service does have a central point of failure.
- Scalability
- Since every peer is alike, it is possible to add more peers to the system and scale to larger networks.
Disadvantages of P2P Computing
- Decentralized coordination
- How to keep global state consistent?
- Need for distributed coherency protocols.
- Not all nodes are created equal.
- Differences in computing power and bandwidth affect overall performance.
- Programmability
- A corollary of decentralized coordination.
P2P Computing Applications
- File sharing
- Process sharing
- Collaborative environments
- Instant messaging
- New forms of content delivery, distribution
P2P File Sharing Applications
- Improves data availability
- Replication to compensate for failures.
- E.g., Napster, Gnutella, Freenet, KaZaA
(FastTrack)
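Why replication improves availability can be made concrete with a short calculation. Assuming, as a simplification, that each peer holding a copy is online independently with probability p, a file with k replicas is reachable with probability 1 - (1 - p)^k:

```python
def availability(p_online: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies is reachable,
    assuming each hosting peer is online independently with probability p_online."""
    return 1 - (1 - p_online) ** replicas

# Peers that are online only 30% of the time (an illustrative figure).
for k in (1, 3, 5, 10):
    print(k, round(availability(0.3, k), 3))
```

With a single copy the file is reachable 30% of the time; ten copies raise that above 97%. Real peers are not independent (outages correlate with time of day, for example), so this is an optimistic estimate.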
P2P Process Sharing Applications
- For large-scale computations
- Data analysis, data mining, scientific computing
- E.g., SETI@home, Folding@home, distributed.net, World-Wide Computer
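Process sharing works by splitting one large computation into independent work units, handing them to many machines, and combining the partial results. A minimal sketch, assuming a coordinator that farms out units; threads stand in for volunteer peers here, so this shows the structure, not a real speedup:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

WorkUnit = Tuple[int, int]  # half-open range [start, end)

def make_work_units(n: int, unit_size: int) -> List[WorkUnit]:
    """Split the range [0, n) into independent work units."""
    return [(s, min(s + unit_size, n)) for s in range(0, n, unit_size)]

def is_prime(k: int) -> bool:
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True

def process_unit(unit: WorkUnit) -> int:
    """The work one peer would do: count the primes in its unit."""
    start, end = unit
    return sum(1 for k in range(start, end) if is_prime(k))

units = make_work_units(100_000, 10_000)
# The coordinator hands units to "peers" and sums their partial results.
with ThreadPoolExecutor(max_workers=4) as peers:
    total = sum(peers.map(process_unit, units))
print(total)  # 9592 primes below 100,000
```

The key property is that units do not depend on each other, so a slow or vanished peer only delays its own unit, which can be reissued elsewhere.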
P2P Collaborative Applications
- For remote real-time human collaboration.
- Instant messaging, virtual meetings, shared whiteboards, teleconferencing, tele-presence.
- E.g., talk, IRC, ICQ, AOL Messenger, Yahoo! Messenger, Jabber, MS NetMeeting, NCSA Habanero, games
P2P Technical Challenges
- Peer identification
- Routing protocols
- Network topologies
- Peer discovery
- Communication/coordination protocols
- Quality of service
- Security
- Fine-grained resource management
P2P Topologies
- Centralized
- Ring
- Hierarchical
- Decentralized
- Hybrid
Centralized
- Client/server
- Web servers
- Databases
- Napster search
- Instant Messaging
- Popular Power
Ring
- Fail-over clusters
- Simple load balancing
- Assumption
- Single owner
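The ring's two uses, fail-over and simple load balancing, can be sketched together: requests are spread round-robin around the ring, and a request aimed at a failed host passes to the next live host. The `Ring` class below is a hypothetical illustration for a single owner, not any particular clustering product:

```python
from itertools import count
from typing import List, Set

class Ring:
    """Hosts arranged in a ring under a single owner.
    Requests are spread round-robin; a failed host's work fails over to the next host."""

    def __init__(self, hosts: List[str]):
        self.hosts = hosts
        self.down: Set[str] = set()
        self._next = count()  # round-robin counter for simple load balancing

    def route(self) -> str:
        start = next(self._next) % len(self.hosts)
        # Walk around the ring from the chosen host until a live one is found.
        for step in range(len(self.hosts)):
            host = self.hosts[(start + step) % len(self.hosts)]
            if host not in self.down:
                return host
        raise RuntimeError("all hosts in the ring are down")

ring = Ring(["h0", "h1", "h2"])
print([ring.route() for _ in range(3)])  # ['h0', 'h1', 'h2']
ring.down.add("h1")
print([ring.route() for _ in range(3)])  # ['h0', 'h2', 'h2']
```

Note that h2 absorbs h1's share after the failure; the simple rule is easy to reason about, but load is not rebalanced evenly.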
Hierarchical
Decentralized
- Gnutella
- Freenet
- Hive
- Internet routing
Centralized + Centralized
- N-tier apps
- Database heavy systems
- Web services gateways
- Grand Central
Centralized + Ring
- Serious web applications
- High availability servers
Centralized + Decentralized
- Clip2 Gnutella Reflector
- FastTrack / KaZaA
- Morpheus
- Email
What about other topologies?
- Centralized + Hierarchical?
- Back-end tree of information
- Caching architectures
- Decentralized + Ring?
- P2P network of fail-over clusters
- Decentralized + Hierarchical?
- Decentralized + Centralized?
Strengths and Weaknesses
- Plenty of topologies to choose from
- What is each kind good for?
- Need a set of properties to measure
Things to Measure
- Manageability
- How hard is it to keep working?
- Information coherence
- How authoritative is info? (Auditing, non-repudiation)
- Extensibility
- How easy is it to grow?
- Fault tolerance
- How well can it handle failures?
- Security
- How hard is it to subvert?
- Resistance to legal or political intervention
- How hard is it to shut down? (Can be good or bad)
- Scalability
- How big can it grow?
Centralized
- Manageable: System is all in one place
- Coherent: All information is in one place
- Extensible: No one can add on to system
- Fault tolerant: Single point of failure
- Secure: Simply secure one host
- Lawsuit-proof: Easy to shut down
- Scalable: One machine. But in practice?
Ring
- Manageable: Simple rules for relationships
- Coherent: Easy logic for state
- Extensible: Only ring owner can add
- Fault tolerant: Fail-over to next host
- Secure: As long as ring has one owner
- Lawsuit-proof: Shut down owner
- Scalable: Just add more hosts
Hierarchical
- Manageable: Chain of authority
- Coherent: Cache consistency
- Extensible: Add more leaves, rebalance
- Fault tolerant: Root is vulnerable
- Secure: Too easy to spoof links
- Lawsuit-proof: Just shut down the root
- Scalable: Hugely scalable (DNS)
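A hierarchical lookup in the style of DNS starts at the root and follows one label at a time down the tree, which is both why it scales (each level only knows its children) and why the root is a weak spot (every uncached lookup starts there). A simplified sketch with a hypothetical in-memory tree; real DNS spreads the levels over separate servers and adds caching, neither of which is modeled:

```python
from typing import Dict, Optional, Union

# Each node maps a label either to a child node or to a final address.
Node = Dict[str, Union["Node", str]]

# Hypothetical hierarchy: root -> "edu" -> "example" -> host records.
root: Node = {
    "edu": {
        "example": {
            "www": "192.0.2.10",
            "mail": "192.0.2.20",
        },
    },
}

def resolve(name: str, node: Node = root) -> Optional[str]:
    """Resolve e.g. 'www.example.edu' by walking down from the root,
    rightmost label first, one level of the hierarchy per label."""
    current: Union[Node, str] = node
    for label in reversed(name.split(".")):
        if not isinstance(current, dict) or label not in current:
            return None
        current = current[label]
    return current if isinstance(current, str) else None

print(resolve("www.example.edu"))  # 192.0.2.10
print(resolve("ftp.example.edu"))  # None
```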
Decentralized
- Manageable: Very difficult, many owners
- Coherent: Difficult, unreliable peers
- Extensible: Anyone can join in!
- Fault tolerant: Redundancy
- Secure: Difficult, open research
- Lawsuit-proof: No one to sue!
- Scalable: Theory yes, practice no
Centralized + Ring
- Manageable: Just manage the ring
- Coherent: As coherent as ring
- Extensible: No more than ring
- Fault tolerant: Ring is a huge win
- Secure: As secure as ring
- Lawsuit-proof: Still single place to shut down
- Scalable: Ring is a huge win
Common architecture for web applications
Centralized + Decentralized
- Manageable: Same as decentralized
- Coherent: Better than decentralized
- Extensible: Anyone can still join!
- Fault tolerant: Plenty of redundancy
- Secure: Same as decentralized
- Lawsuit-proof: Still no one to sue
- Scalable: Looking very hopeful
Best architecture for P2P networks?
Centralized vs. Decentralized
- Centralized is pretty good!
- Manageable
- Coherent
- Security
- Decentralized is exciting
- Extensible
- Massive fault tolerance
- Lawsuit-proof
- Scalability is the big question
Conclusions
- Centralized is easy to deal with
- Major architecture for distributed systems
- Combines well with rings
- Decentralized is good, needs research
- Coherence, Manageability, Security
- Scalability
- Hierarchical is overlooked
- Combining architectures is powerful
- P2P does not have to be decentralized when centralized is good
Napster
- The P2P concept has existed since the early 1990s
- Napster renewed interest in P2P systems
- Napster features
- Central indexing and searching service
- File downloading in a peer-to-peer, point-to-point manner.
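Napster's split between a central index and direct peer-to-peer transfers can be sketched in a few lines. The class and function names here are hypothetical, and the "transfer" is a dictionary lookup standing in for a direct connection to the chosen peer:

```python
from collections import defaultdict
from typing import Dict, List, Set

class CentralIndex:
    """Napster-style central directory: it knows who has what, but stores no files."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = defaultdict(set)

    def register(self, peer: str, filenames: List[str]) -> None:
        for f in filenames:
            self.index[f].add(peer)

    def unregister(self, peer: str) -> None:
        for holders in self.index.values():
            holders.discard(peer)

    def search(self, filename: str) -> List[str]:
        return sorted(self.index.get(filename, set()))

# Hypothetical peers and their local file stores.
peer_files = {"alice": {"a.mp3": b"AAA"}, "bob": {"a.mp3": b"AAA", "b.mp3": b"BBB"}}

server = CentralIndex()
for peer, files in peer_files.items():
    server.register(peer, list(files))

def download(filename: str) -> bytes:
    holders = server.search(filename)        # 1. search through the central server
    if not holders:
        raise FileNotFoundError(filename)
    return peer_files[holders[0]][filename]  # 2. transfer directly from a peer

print(server.search("a.mp3"))  # ['alice', 'bob']
print(download("b.mp3"))       # b'BBB'
```

If the index server goes away, peers still hold all the files but nobody can find them, which is the central point of failure of this hybrid design.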
Gnutella
- Peer-to-peer indexing and searching service.
- Peer-to-peer, point-to-point file downloading using HTTP.
- A Gnutella node needs a server (or a set of servers) to start up. gnutellahosts.com provides a service with reliable initial connection points. This introduces a single point of failure.
The Gnutella protocol (v0.4)
- PING: notify a peer of your existence
- PONG: reply to a PING request
- QUERY: find a file in the network
- RESPONSE: give the location of a file
- PUSHREQUEST: request a server behind a firewall to push a file out to a client.
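Every one of these messages travels with the same fixed header. In the v0.4 specification (which calls RESPONSE "QueryHit" and PUSHREQUEST "Push") the header is 23 bytes: a 16-byte descriptor ID, a 1-byte payload type, 1-byte TTL, 1-byte hop count, and a 4-byte little-endian payload length. A sketch of encoding and forwarding it; the helper names and the TTL of 7 are chosen for illustration, and payload bodies are not modeled:

```python
import os
import struct
from typing import NamedTuple

# Payload descriptor codes from the Gnutella v0.4 specification.
PING, PONG, PUSH, QUERY, QUERY_HIT = 0x00, 0x01, 0x40, 0x80, 0x81

# 16-byte ID, type, TTL, hops (1 byte each), payload length (4 bytes, little-endian).
HEADER = struct.Struct("<16sBBBI")

class Header(NamedTuple):
    descriptor_id: bytes
    payload_type: int
    ttl: int
    hops: int
    payload_length: int

def encode(h: Header) -> bytes:
    return HEADER.pack(*h)

def decode(raw: bytes) -> Header:
    return Header(*HEADER.unpack(raw[:HEADER.size]))

def forward(h: Header) -> Header:
    """Before relaying a descriptor, a servent decrements TTL and increments hops."""
    return h._replace(ttl=h.ttl - 1, hops=h.hops + 1)

ping = Header(os.urandom(16), PING, ttl=7, hops=0, payload_length=0)
wire = encode(ping)
print(len(wire))                              # 23
print(decode(wire) == ping)                   # True
print(forward(ping).ttl, forward(ping).hops)  # 6 1
```

The random descriptor ID is what lets peers recognize and drop a message they have already seen.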
Gnutella Decentralized Model
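In the decentralized model there is no index, so a QUERY is flooded to neighbors, which forward it to their neighbors until its TTL runs out. A simplified simulation on a hypothetical five-peer overlay; duplicate suppression is modeled with a shared `seen` set standing in for each peer remembering descriptor IDs:

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple

# Hypothetical overlay: each peer's neighbors, and the files each peer shares.
neighbors: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
shared: Dict[str, Set[str]] = {"C": {"notes.txt"}, "E": {"song.mp3"}}

def flood_query(origin: str, filename: str, ttl: int) -> Tuple[List[str], int]:
    """Flood a QUERY outward from `origin`, hop by hop, until its TTL runs out.
    Returns the peers that hold the file and the number of messages sent."""
    seen: Set[str] = {origin}  # peers that already saw this query (duplicates are dropped)
    frontier: Deque[Tuple[str, int, Optional[str]]] = deque([(origin, ttl, None)])
    hits: List[str] = []
    messages = 0
    while frontier:
        peer, remaining, came_from = frontier.popleft()
        if filename in shared.get(peer, set()):
            hits.append(peer)        # this peer would send a RESPONSE back
        if remaining == 0:
            continue                 # TTL exhausted: stop forwarding
        for nxt in neighbors[peer]:
            if nxt == came_from:
                continue             # do not send the query back where it came from
            messages += 1            # each forward costs a message, even if a duplicate
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, remaining - 1, peer))
    return hits, messages

print(flood_query("A", "song.mp3", ttl=2))  # ([], 4)
print(flood_query("A", "song.mp3", ttl=3))  # (['E'], 6)
```

Raising the TTL finds the file three hops away but costs more messages; on a large network that cost grows quickly, which is why scalability and TTL strategies show up among the research directions.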
Gnutella Research Directions
- Download failures
- Scalability
- Fragmented development
- Encouragement of content sharing
- Reducing browsing downtime
- Reducing unnecessary network traffic
- Creating and maintaining a healthy network structure: rebalancing, different TTL strategies, priorities
- Addressing security concerns.
JXTA
- Connecting devices and applications by providing common P2P services to heterogeneous devices, operating systems, programming languages, and applications
- Open source
- www.jxta.org
JXTA
- JXTA defines a set of protocols
- JXTA defines XML message formats and protocols for communication between peers
- Protocols are used to discover peers, advertise and discover resources, communicate and route messages, and provide monitoring
- Asynchronous, based on a query/response model. Can be implemented in any language and sent across different networks
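Because the messages are plain XML, any peer in any language can build and read them. A sketch using Python's standard `xml.etree.ElementTree`; the element names (`DiscoveryQuery`, `SourcePeer`, `Attr`, `Value`) and the peer ID format are hypothetical and do not reproduce the actual JXTA schema:

```python
import xml.etree.ElementTree as ET
from typing import Dict

def build_discovery_query(peer_id: str, attr: str, value: str) -> bytes:
    """Build a simplified XML discovery query (illustrative element names, not JXTA's)."""
    msg = ET.Element("DiscoveryQuery")
    ET.SubElement(msg, "SourcePeer").text = peer_id
    ET.SubElement(msg, "Attr").text = attr
    ET.SubElement(msg, "Value").text = value
    return ET.tostring(msg, encoding="utf-8")

def parse_discovery_query(data: bytes) -> Dict[str, str]:
    """Any peer can read the query back from the plain XML bytes."""
    root = ET.fromstring(data)
    return {child.tag: child.text for child in root}

wire = build_discovery_query("urn:example:peer-42", "Name", "printer*")
print(wire)
print(parse_discovery_query(wire))
```

The transport only has to carry bytes, so the same message could cross TCP, HTTP, or another network without changing its format.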
JXTA Architecture
- JXTA Applications
- JXTA Services: Search, Indexing, Discovery, Membership
- JXTA Core: Peer groups, Peer pipes, Peer monitoring, Peer advertisements, Peer IDs, Security
Freenet
- Peer-to-peer indexing and searching service.
- Peer-to-peer file downloading.
- Files served use the same route as searches (not point-to-point).
- Provides for anonymity.
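Returning files along the search route, rather than point-to-point, means each node only talks to its neighbors, so the requester never connects to the node that actually stores the data. A simplified sketch on a hypothetical overlay: the request is forwarded depth-first with a hops-to-live limit, and the data is passed back hop by hop, with each node on the path keeping a copy. Real Freenet chooses the next hop by key closeness; this sketch just tries neighbors in order:

```python
from typing import Dict, List, Optional, Set

# Hypothetical overlay and per-node data stores.
links: Dict[str, List[str]] = {"A": ["B"], "B": ["C", "D"], "C": [], "D": ["E"], "E": []}
store: Dict[str, Dict[str, bytes]] = {n: {} for n in links}
store["E"]["key-1"] = b"data"

def request(node: str, key: str, htl: int, visited: Set[str]) -> Optional[bytes]:
    """Depth-first request with a hops-to-live limit. Data travels back along
    the same path the request took, one neighbor at a time."""
    visited.add(node)
    if key in store[node]:
        return store[node][key]
    if htl == 0:
        return None
    for nxt in links[node]:
        if nxt in visited:
            continue
        data = request(nxt, key, htl - 1, visited)
        if data is not None:
            store[node][key] = data  # each node on the return path keeps a copy
            return data
    return None

print(request("A", "key-1", htl=5, visited=set()))        # b'data'
print(sorted(n for n in store if "key-1" in store[n]))   # ['A', 'B', 'D', 'E']
```

The dead end at C is backtracked, and the copies left at B and D mean a later request for the same key can be answered closer to the requester.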
KaZaA/Morpheus
- Hybrid indexing/searching model
- Not centralized like Napster, not decentralized like Gnutella.
- Peer-to-peer file downloading using HTTP.
- SmartStream for incomplete file downloads.
- FastStream for partial file downloads.
- SuperNodes are elected dynamically if they have sufficient bandwidth and processing power: a hybrid topology model.
- A central server keeps user registrations, logs usage, and helps bootstrap peer discovery.
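The hybrid model can be sketched as two steps: well-provisioned peers are promoted to supernodes, and each supernode indexes the files of the ordinary peers attached to it, so queries go only to supernodes. The thresholds, the round-robin attachment, and all names below are hypothetical choices for illustration, not the FastTrack protocol:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class PeerInfo:
    name: str
    bandwidth_kbps: int
    cpu_mhz: int
    files: Set[str] = field(default_factory=set)

# Hypothetical election thresholds, chosen only for illustration.
MIN_BANDWIDTH_KBPS = 1000
MIN_CPU_MHZ = 1000

def elect_supernodes(peers: List[PeerInfo]) -> List[str]:
    """Peers with enough bandwidth and processing power become supernodes."""
    return [p.name for p in peers
            if p.bandwidth_kbps >= MIN_BANDWIDTH_KBPS and p.cpu_mhz >= MIN_CPU_MHZ]

def build_indexes(peers: List[PeerInfo], supers: List[str]) -> Dict[str, Dict[str, Set[str]]]:
    """Each supernode indexes its own files plus those of the ordinary peers
    attached to it (assigned round-robin here): filename -> set of peer names."""
    indexes: Dict[str, Dict[str, Set[str]]] = {s: {} for s in supers}
    ordinary = 0
    for p in peers:
        if p.name in supers:
            sn = p.name
        else:
            sn = supers[ordinary % len(supers)]
            ordinary += 1
        for f in p.files:
            indexes[sn].setdefault(f, set()).add(p.name)
    return indexes

def search(indexes: Dict[str, Dict[str, Set[str]]], filename: str) -> List[str]:
    """Queries go to the supernodes only, not to every peer."""
    return sorted({peer for idx in indexes.values() for peer in idx.get(filename, set())})

peers = [
    PeerInfo("fast1", 5000, 2000),
    PeerInfo("slow1", 56, 500, {"a.mp3"}),
    PeerInfo("fast2", 3000, 1500, {"b.mp3"}),
    PeerInfo("slow2", 128, 800, {"a.mp3"}),
]
supers = elect_supernodes(peers)
indexes = build_indexes(peers, supers)
print(supers)                    # ['fast1', 'fast2']
print(search(indexes, "a.mp3"))  # ['slow1', 'slow2']
```

Slow peers never have to answer queries, which keeps search traffic off low-bandwidth links while still avoiding a single central index.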