Peer-to-Peer and GRID Computing, 2G1526 Lecture 01

Description:

Use the forum and not email as much as possible. Do not be afraid. 2004-12-22 ... lookup(key) data. Insert(key, data) Send(IP address, data) Receive (IP address) data ...


1
Peer-to-Peer and GRID Computing, 2G1526, Lecture 01

Seif Haridi
LECS, KTH
Seif_at_imit.kth.se

2
Overview

Organization
Course overview
Getting started (introduction to distributed
systems, Peer-to-Peer systems and distributed
algorithms)

3
Organization/Objectives
4
Objectives

Introduction (P2P, DHTs)
Understand some of the fundamental results of
distributed algorithms (3-4 lectures)
Study Peer-to-Peer Systems and Algorithms in more
detail (4-5 lectures)
Focus is on algorithmic aspects
Introduction to GRID systems
Hands-on experience with Peer-to-Peer middleware
and GRID services
Learn how to read/present research papers

5
Non-objectives

Learning in detail about all middleware for
constructing GRID applications
Learning how to program distributed applications:
Web services
Java and distributed computing
Mozart and distributed computing
For these topics, look at:
M.L. Liu, Distributed Computing
P. Van Roy and S. Haridi, Concepts, Techniques,
and Models of Computer Programming

6
Distributed Systems 2G1526

2G1526
Written final assignment and presentation: 60%
Midterm exam: 20%
Assignments through the course: 20%
Course homepage:
http://www.imit.kth.se/courses/2G1526
Teaching
Lectures
Consultation using groupware
Teaching assistant
Ali Ghodsi

7
Teachers

Course responsible, lectures:
Seif Haridi, seif_at_imit.kth.se
Teaching assistant, exercises/tutorials:
Ali Ghodsi, ali_at_sics.se

8
Lecture Structure

Reminder of last lecture
Overview
Content
Summary
Reading suggestions

9
Material

Lectures are based mainly on the following
material
(Distributed Algorithms) Hagit Attiya and
Jennifer Welch, Distributed Computing:
Fundamentals, Simulations, and Advanced Topics
(Peer-to-Peer systems) Research papers,
available on the course webpage
(GRID) The Grid: Blueprint for a New Computing
Infrastructure, 2nd Edition, Morgan Kaufmann,
2004. ISBN 1-55860-933-4
The handouts are in most cases self-explanatory
Available from the webpage the day before the
lecture

10
Reading Suggestions

Will be available on webpage (Lectures)

11
Assignments

There will be one final assignment
You will have to study and present a few (2-3)
research papers
Other assignments will be delivered at the
tutorial sessions

12
General information

Reading of papers
In groups of two or three
Each group will read one or two research papers
For each paper studied
Identify the problem
Explain the solution(s) presented in the paper
Identify positive and negative aspects of the
paper
Propose your own solution if any
Provide a report
Give a presentation to the class

13
Assignment Groups

Done at the first tutorial/exercise session

14
Feedback in General

Approach me directly (any time) or arrange an
appointment
Use the forum and not email as much as possible
Do not be afraid

15
Questions and Using Brakes!

Please do ask questions during the lectures. Ask me to:
repeat an explanation
give a better explanation
give an example
Please say when things go too fast!
Please say when things go too slow!

16
Background Knowledge

I assume some knowledge of the following
Programming languages: C/Java
Operating systems: basic concepts
Networking: basic concepts
Algorithms and data structures
Some knowledge of distributed systems
I will try to be as elementary as possible
Ask me if lacking some knowledge

17
Course Overview
18
Distributed system

A simplified view

(Figure labels: Processor, Communication Medium, Process, Thread, Communication channel, Node (processor/process))
19
Distributed System

Set of computing nodes that cooperate in order to
achieve a well-defined goal
Nodes cooperate through communication
Communication is by message passing at the
fundamental level

20
Parallel vs. Distributed System

Parallel systems
All processors are employed to perform one large
task
Distributed systems
Each processor has its own semi-independent
agenda, for various reasons
Sharing of resources
Availability and fault-tolerance
Processors need to coordinate their actions

21
What is a Distributed System?

Distributed hardware
N processing elements (processor + memory),
also called processor or PE
Interconnected by some network
Distributed software
No centralized OS, each PE has its own copy of OS
No physically centralized file system
Means for inter-process communication

22
Why Distributed Systems?

Information exchange (collaborative work)
Resource sharing (e.g. printer, backup storage,
disk units, etc.)
Resource sharing (applications, information,
media, services)
Cost reduction
Increase of availability (partial-failure)
Increase of performance through parallelism,...

23
Main characteristics

Asynchrony
Absolute and relative times at which events take
place are not known precisely
Limited local knowledge
Each node is aware only of information it
acquires through communication
Each node has a local view (no global view)
Partial Failures
Each component (node/network channel) can fail
independently. Other components continue to
operate

24
Basic Algorithms in Message-Passing Systems

Initially models for message passing systems with
no failures, later various failure models are
introduced
Two timing models
Synchronous
Asynchronous
Complexity measures
Number of messages
Time
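
The two complexity measures can be made concrete with a small sketch. Assuming the synchronous model with no failures, the hypothetical function `synchronous_flood` below floods a value from one node over an undirected graph and counts the messages sent and the number of rounds until every node has the value. The function name, the adjacency-dictionary format, and the four-node line topology are illustrative assumptions, not part of the course material.

```python
def synchronous_flood(adjacency, source):
    """Flood a value from `source` in lock-step rounds.

    Returns (messages, rounds, informed): the total number of messages sent,
    the round in which the last node first received the value, and the set
    of nodes that ended up holding the value.
    """
    informed = {source}
    frontier = [source]          # nodes that first got the value last round
    messages = 0
    round_no = 0
    rounds_to_inform_all = 0
    while frontier:
        round_no += 1
        newly_informed = []
        for node in frontier:
            # A node forwards the value to every neighbour exactly once.
            for neighbour in adjacency[node]:
                messages += 1
                if neighbour not in informed:
                    informed.add(neighbour)
                    newly_informed.append(neighbour)
        if newly_informed:
            rounds_to_inform_all = round_no
        frontier = newly_informed
    return messages, rounds_to_inform_all, informed


# Illustrative topology: a line a - b - c - d.
line = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
messages, rounds, informed = synchronous_flood(line, "a")
print(messages, rounds, sorted(informed))
```

On this line topology the sketch sends 6 messages (two per edge) and needs 3 rounds, which is the distance from a to d.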

25
Introduction to Peer-to-Peer Systems

26
P2P an exciting social development

Internet users cooperating to share, for example,
music files
Napster, Gnutella, Morpheus, KaZaA, etc.
Skype, free Internet IP telephony
Lots of attention from the popular press
The ultimate form of democracy on the Internet
The ultimate threat to copyright protection on
the Internet
Many vendors have launched P2P efforts

27
What is P2P?
(Figure: five nodes, each labeled Client, connected through the Internet)

A distributed system architecture
No centralized control
Nodes are symmetric in function
Typically many nodes, but unreliable and
heterogeneous

28
Traditional Distributed Computing: client/server
(Figure: one Server and four Clients connected through the Internet)

Successful architecture, and will continue to be
so
Tremendous engineering necessary to make server
farms scalable and robust

29
Application-level overlays
(Figure: overlay nodes N at Site 1 to Site 4, attached through ISP1, ISP2, and ISP3)

One per application
Nodes are decentralized
P2P systems are overlay networks without central
control
30
(Potential) P2P advantages

Allows for scalable incremental growth
Aggregate tremendous amount of computation and
storage resources
Tolerate faults or intentional attacks

31
Example P2P problem: lookup
(Figure: nodes N1 to N6 connected through the Internet. A Publisher holds Key = title, Value = file data. A Client issues Lookup(title): which node has the data?)

At the heart of all P2P systems

32
Centralized lookup (Napster)
(Figure: nodes N1, N2, N3, N4, N6, N7, N8, N9 and a central DB. Publisher@N4 holds Key = title, Value = file data and registers SetLoc(title, N4) with the DB. The Client asks the DB with Lookup(title).)
Simple, but O(N) state and a single point of
failure
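
A minimal sketch of the centralized scheme, assuming a single in-memory directory and peers held in a dictionary. The class and function names (`CentralDirectory`, `Peer`, `fetch`) are hypothetical; only `SetLoc` and `Lookup` come from the slide, and this is not Napster's actual protocol.

```python
class CentralDirectory:
    """The single index server (the DB in the figure)."""

    def __init__(self):
        self._locations = {}     # title -> set of node names holding it

    def set_loc(self, title, node_name):
        self._locations.setdefault(title, set()).add(node_name)

    def lookup(self, title):
        return sorted(self._locations.get(title, set()))

    def state_size(self):
        # One entry per (title, holder) pair, all kept on one machine.
        return sum(len(nodes) for nodes in self._locations.values())


class Peer:
    def __init__(self, name):
        self.name = name
        self.files = {}          # title -> file data

    def publish(self, directory, title, data):
        self.files[title] = data
        directory.set_loc(title, self.name)


def fetch(directory, peers, title):
    """Client side: ask the directory, then download from a listed peer."""
    holders = directory.lookup(title)
    if not holders:
        return None
    return peers[holders[0]].files[title]


directory = CentralDirectory()
peers = {name: Peer(name) for name in ("N1", "N4", "N6")}
peers["N4"].publish(directory, "title", b"file data")
print(directory.lookup("title"), fetch(directory, peers, "title"))
```

Every lookup goes through `directory`, so the sketch shows both properties named on the slide: the index grows with the number of published items, and nothing can be found if the directory is down.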
33
Flooded queries (Gnutella)
(Figure: nodes N1, N2, N3, N4, N6, N7, N8, N9. Publisher@N4 holds Key = title, Value = MP3 data. The Client's Lookup(title) is flooded from node to node.)
Robust, but worst case O(N) messages per lookup
No guarantee of finding the data item
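
The trade-off can be illustrated with a toy flooding search. This is a sketch under stated assumptions: queries carry a hop limit (`ttl`), nodes drop queries they have already seen, and the whole flood is simulated in one process. The function name `flood_lookup` and the four-node topology are hypothetical and do not reproduce the real Gnutella wire protocol.

```python
def flood_lookup(adjacency, stores, start, title, ttl):
    """Flood a query for `title` from `start` for at most `ttl` hops.

    Returns (holders, messages): the nodes found to hold the title and the
    number of query messages sent.
    """
    visited = {start}
    frontier = [start]
    holders = [start] if title in stores.get(start, set()) else []
    messages = 0
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for neighbour in adjacency[node]:
                messages += 1
                if neighbour in visited:
                    continue     # duplicate query: dropped, not forwarded
                visited.add(neighbour)
                if title in stores.get(neighbour, set()):
                    holders.append(neighbour)
                next_frontier.append(neighbour)
        frontier = next_frontier
    return holders, messages


# Illustrative overlay: N1 - N2 - N3 - N4, with the file stored at N4.
adjacency = {"N1": ["N2"], "N2": ["N1", "N3"], "N3": ["N2", "N4"], "N4": ["N3"]}
stores = {"N4": {"title"}}
print(flood_lookup(adjacency, stores, "N1", "title", ttl=2))
print(flood_lookup(adjacency, stores, "N1", "title", ttl=3))
```

With `ttl=2` the query stops one hop short of N4 and finds nothing after 3 messages, even though the item exists. With `ttl=3` it finds N4 after 5 messages.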
34
Distributed Hash Tables
(Figure: distributed applications call Insert(key, data) and Lookup(key) → data on the distributed hash table, which is spread over many nodes)

Nodes are the hash buckets
Key identifies data uniquely
DHT balances keys and data across nodes
DHT replicates, caches, routes lookups, etc.
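
The idea that nodes are the hash buckets can be sketched as follows. The slide does not say how keys are assigned to nodes, so this example assumes one common choice: node names and keys are hashed onto the same identifier ring, and a key belongs to the first node at or after its identifier. The names `ToyDHT`, `ident`, and `responsible_node` are hypothetical, and replication, caching, and routing are left out.

```python
import hashlib
from bisect import bisect_left


def ident(text, bits=32):
    """Map a string to an identifier on a ring of size 2**bits."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyDHT:
    """All nodes simulated in one process; each node is one bucket."""

    def __init__(self, node_names):
        if not node_names:
            raise ValueError("a DHT needs at least one node")
        self._ring = sorted((ident(name), name) for name in node_names)
        self._ids = [node_id for node_id, _ in self._ring]
        self._buckets = {name: {} for name in node_names}

    def responsible_node(self, key):
        # First node whose identifier is >= the key's, wrapping around.
        index = bisect_left(self._ids, ident(key)) % len(self._ring)
        return self._ring[index][1]

    def insert(self, key, data):
        self._buckets[self.responsible_node(key)][key] = data

    def lookup(self, key):
        return self._buckets[self.responsible_node(key)].get(key)

    def load(self):
        """Number of keys stored per node."""
        return {name: len(bucket) for name, bucket in self._buckets.items()}


dht = ToyDHT(["N1", "N2", "N3", "N4"])
for i in range(100):
    dht.insert(f"title-{i}", f"data-{i}")
print(dht.lookup("title-7"), dht.load())
```

Because placement depends only on the hash of the key, any participant can compute which node is responsible without a central index.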

35
Why DHTs now?

Demand pulls
Growing need for security and robustness
Large-scale distributed apps are difficult to
build
Many applications use location-independent data
Technology pushes
Bigger, faster, and better: every PC can be a
server
Scalable lookup algorithms are available
Trustworthy systems from untrusted components

36
DHT is a good interface
(Figure: two layers and their interfaces)
DHT: lookup(key) → data, Insert(key, data)
UDP/IP: Send(IP address, data), Receive(IP address) → data

Supports a wide range of applications, because it imposes
few restrictions
Keys have no semantic meaning
Value is application dependent
Minimal interface
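
To illustrate why a minimal interface with opaque keys is enough for different applications, the sketch below layers two hypothetical applications on the same two-call interface. The `DHT` protocol mirrors the slide's `lookup` and `Insert`; everything else (`InMemoryDHT`, `opaque_key`, `NameService`, `WebCache`, the example host name and address) is an assumption made for the example, and the in-memory class stands in for a real distributed implementation.

```python
import hashlib
from typing import Optional, Protocol


class DHT(Protocol):
    """The whole interface: two calls, keys and values are just bytes."""

    def insert(self, key: bytes, data: bytes) -> None: ...

    def lookup(self, key: bytes) -> Optional[bytes]: ...


class InMemoryDHT:
    """Stand-in for a real DHT: one local dictionary, no network."""

    def __init__(self) -> None:
        self._table = {}

    def insert(self, key: bytes, data: bytes) -> None:
        self._table[key] = data

    def lookup(self, key: bytes) -> Optional[bytes]:
        return self._table.get(key)


def opaque_key(namespace: str, name: str) -> bytes:
    # The DHT never interprets the key; only the application knows its meaning.
    return hashlib.sha1(f"{namespace}:{name}".encode("utf-8")).digest()


class NameService:
    def __init__(self, dht: DHT) -> None:
        self._dht = dht

    def register(self, hostname: str, address: str) -> None:
        self._dht.insert(opaque_key("names", hostname), address.encode("utf-8"))

    def resolve(self, hostname: str) -> Optional[str]:
        value = self._dht.lookup(opaque_key("names", hostname))
        return value.decode("utf-8") if value is not None else None


class WebCache:
    def __init__(self, dht: DHT) -> None:
        self._dht = dht

    def put(self, url: str, page: bytes) -> None:
        self._dht.insert(opaque_key("pages", url), page)

    def get(self, url: str) -> Optional[bytes]:
        return self._dht.lookup(opaque_key("pages", url))


shared = InMemoryDHT()
names, cache = NameService(shared), WebCache(shared)
names.register("www.example.org", "192.0.2.1")
cache.put("http://www.example.org/", b"<html></html>")
print(names.resolve("www.example.org"), cache.get("http://www.example.org/"))
```

Both applications store application-dependent values under keys the DHT treats as meaningless, so the same infrastructure serves both.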

37
DHT is a good shared infrastructure

Applications inherit some security and robustness
from DHT
DHT replicates data
Resistant to malicious participants
Low-cost deployment
Self-organizing across administrative domains
Can be shared among applications
Large scale: supports Internet-scale workloads

38
DHTs support many applications

Distributed file systems: CFS, OceanStore,
PAST, Arla/DKS
Web cache/archives: Squirrel, ...
Censor-resistant stores: Eternity, FreeNet, ...
Event notification: Scribe, DKS
Naming systems: ChordDNS, INS, ...
Query and indexing: Kademlia, ...
Communication primitives: I3, ...
Backup store: HiveNet
Distributed authorizations: Delegation

data is location-independent
39
Cooperative read-only file sharing
(Figure: the file system calls insert(key, block) and Lookup(key) → block on the distributed hash table, which is spread over many nodes)

DHT is a robust block store
Client of DHT implements the block storage of the
file system

40
File representation: self-authenticating data
(Figure: a chain of blocks for the file system with key = 995: root block, directory blocks, i-node block, data. Labels in the figure: 995: key=901, key=732, Signature; key=431, key=795; a.txt ID=144; blocks 901, 431, and 144 are each marked SHA-1.)

DHT key for block is SHA-1(content block)
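
A small sketch of what self-authenticating means in practice, assuming the rule stated on the slide: the key of a block is the SHA-1 hash of its content, so whoever fetches a block can recompute the hash and detect tampering. The `BlockStore` class, the JSON encoding of directory and i-node blocks, and the file contents are hypothetical choices for the example. The signature on the root block is not modeled.

```python
import hashlib
import json


def block_key(content: bytes) -> str:
    return hashlib.sha1(content).hexdigest()


class BlockStore:
    """Stand-in for the DHT: a local dictionary of content-addressed blocks."""

    def __init__(self):
        self._blocks = {}

    def insert(self, content: bytes) -> str:
        key = block_key(content)
        self._blocks[key] = content
        return key

    def lookup(self, key: str) -> bytes:
        content = self._blocks[key]
        # The reader verifies the block against the key it asked for.
        if block_key(content) != key:
            raise ValueError("block does not match its key")
        return content


def read_file(store, directory_key, name):
    """Follow directory block -> i-node block -> data blocks, verifying each."""
    directory = json.loads(store.lookup(directory_key))
    inode = json.loads(store.lookup(directory[name]))
    return b"".join(store.lookup(key) for key in inode["blocks"])


store = BlockStore()
data_key = store.insert(b"hello, world\n")
inode_key = store.insert(json.dumps({"blocks": [data_key]}).encode("utf-8"))
dir_key = store.insert(json.dumps({"a.txt": inode_key}).encode("utf-8"))
print(read_file(store, dir_key, "a.txt"))
```

Each parent block names its children by hash, so trusting one key at the top is enough to check every block reached from it. If a stored block is altered, `lookup` raises `ValueError` instead of returning bad data.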
