Peer-to-Peer and GRID Computing, 2G1526 Lecture 01

Description:

Use the forum and not email as much as possible. Do not be afraid. 2004-12-22 ... lookup(key) data. Insert(key, data) Send(IP address, data) Receive (IP address) data ...



1
Peer-to-Peer and GRID Computing, 2G1526 Lecture 01
  • Seif Haridi
  • LECS, KTH
  • Seif_at_imit.kth.se

2
Overview
  • Organization
  • Course overview
  • Getting started (introduction to distributed
    systems, Peer-to-Peer systems and distributed
    algorithms)

3
Organization/Objectives
4
Objectives
  • Introduction (P2P, DHTs)
  • Understand some of the fundamental results of
    distributed algorithms (3-4 lectures)
  • Study Peer-to-Peer Systems and Algorithms in more
    detail (4-5 lectures)
  • Focus is on algorithmic aspects
  • Introduction to GRID systems
  • Hands-on experience with Peer-to-Peer middleware
    and GRID services
  • Learn how to read/present research papers

5
Non-objectives
  • Learn in detail about all middleware for
    constructing GRID applications
  • Learn how to program distributed applications
  • Web services
  • Java and distributed computing
  • Mozart and distributed computing
  • For these topics, see
  • M.L. Liu, Distributed Computing
  • P. Van Roy and S. Haridi, Concepts, Techniques
    and Models of Computer Programming

6
Distributed Systems 2G1526
  • 2G1526
  • Written final assignment and presentation: 60%
  • Midterm exam: 20%
  • Assignments through the course: 20%
  • Course homepage
  • http://www.imit.kth.se/courses/2G1526
  • Teaching
  • Lectures
  • Consultation using groupware
  • Teaching assistant
  • Ali Ghodsi

7
Teachers
  • Course responsible: lectures
  • Seif Haridi seif_at_imit.kth.se
  • Teaching assistant: exercises/tutorials
  • Ali Ghodsi ali_at_sics.se

8
Lecture Structure
  • Reminder of last lecture
  • Overview
  • Content
  • Summary
  • Reading suggestions

9
Material
  • Lectures are based mainly on the following
    sources
  • (Distributed Algorithms) Hagit Attiya and
    Jennifer Welch, Distributed Computing:
    Fundamentals, Simulations, and Advanced Topics
  • (Peer-to-Peer systems) Research papers,
    available on the course webpage
  • (GRID) The Grid: Blueprint for a New Computing
    Infrastructure, 2nd Edition, Morgan Kaufmann,
    2004. ISBN 1-55860-933-4
  • The handouts are in most cases self-explanatory
  • Available from the webpage the day before the
    lecture

10
Reading Suggestions
  • Will be available on the course webpage (Lectures)

11
Assignments
  • There will be one final assignment
  • You will have to study and present a few (2-3)
    research papers
  • Other assignments will be delivered at the
    tutorial sessions

12
General information
  • Reading of papers
  • In groups of two or three
  • Each group will read one or two research papers
  • For each paper studied
  • Identify the problem
  • Explain the solution(s) presented in the paper
  • Identify positive and negative aspects of the
    paper
  • Propose your own solution if any
  • Provide a report
  • Give a presentation to the class

13
Assignment Groups
  • Done at the first tutorial/exercise session

14
Feedback in General
  • Approach me directly (any time) or arrange an
    appointment
  • Use the forum and not email as much as possible
  • Do not be afraid

15
Questions and Using Brakes!
  • Please do ask questions during the lectures, and
    ask me to
  • repeat an explanation
  • give a better explanation
  • give an example
  • Please say when things go too fast!
  • Please say when things go too slow!

16
Background Knowledge
  • I assume some knowledge of the following
  • Programming languages: C/Java
  • Operating systems: basic concepts
  • Networking: basic concepts
  • Algorithms and data structures
  • Some knowledge of distributed systems
  • I will try to be as elementary as possible
  • Ask me if you lack some of this background

17
Course Overview
18
Distributed system
  • A simplified view

[Figure: processors, processes and threads linked by communication channels over a communication medium; a node is a processor/process]
19
Distributed System
  • Set of computing nodes that cooperate in order to
    achieve a well-defined goal
  • Nodes cooperate through communication
  • Communication is by message passing at the
    fundamental level

20
Parallel vs. Distributed System
  • Parallel systems
  • All processors are employed to perform one large
    task
  • Distributed systems
  • Each processor has its own semi-independent
    agenda, for various reasons
  • Sharing of resources
  • Availability and fault-tolerance
  • Processors need to coordinate their actions

21
What is a Distributed System?
  • Distributed hardware
  • N processing elements (PEs), each a processor
    with its own memory
  • Interconnected by some network
  • Distributed software
  • No centralized OS, each PE has its own copy of OS
  • No physically centralized file system
  • Means for inter-process communication

22
Why Distributed Systems?
  • Information exchange (collaborative work)
  • Resource sharing (e.g. printer, backup storage,
    disk units, etc.)
  • Resource sharing (applications, information,
    media, services)
  • Cost reduction
  • Increased availability (partial failure)
  • Increased performance through parallelism, ...

23
Main characteristics
  • Asynchrony
  • Absolute and relative times at which events take
    place are not known precisely
  • Limited local knowledge
  • Each node is aware only of information it
    acquires through communication
  • Each node has a local view (no global view)
  • Partial Failures
  • Each component (node/network channel) can fail
    independently. Other components continue to
    operate

24
Basic Algorithms in Message-Passing Systems
  • First, models for message-passing systems with
    no failures; later, various failure models are
    introduced
  • Two timing models
  • Synchronous
  • Asynchronous
  • Complexity measures
  • Number of messages
  • Time
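To make the two complexity measures concrete, the sketch below simulates flooding (broadcasting) one message from a root node in the synchronous, failure-free model, where every message sent in one round is delivered in the next. It is an illustration rather than an algorithm taken from the lecture; the function name `synchronous_flood` and the adjacency-dictionary graph encoding are assumptions. Time is measured as the round in which the last node first receives the message, and the message count includes redundant copies.

```python
def synchronous_flood(adjacency, root):
    """Flood a message from `root` in a synchronous, failure-free network.

    adjacency maps each node to the list of its neighbours (undirected graph).
    Returns (time, messages): the round in which the last node first receives
    the message, and the total number of messages sent.
    """
    informed_at = {root: 0}  # node -> round of first receipt
    # Round 0: the root sends to all of its neighbours.
    in_transit = [(root, nbr) for nbr in adjacency[root]]
    messages = len(in_transit)
    round_no = 0
    while in_transit:
        round_no += 1
        next_round = []
        for sender, receiver in in_transit:
            if receiver in informed_at:
                continue  # already has the message: ignore the duplicate
            informed_at[receiver] = round_no
            # On first receipt, forward to every neighbour except the sender.
            next_round.extend(
                (receiver, nbr) for nbr in adjacency[receiver] if nbr != sender
            )
        messages += len(next_round)
        in_transit = next_round
    return max(informed_at.values()), messages


ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
time_ring, msgs_ring = synchronous_flood(ring, 0)
```

On a connected graph with n nodes and m edges every node forwards exactly once, so the count is 2m - (n - 1) messages; for the four-node ring that is 5 messages, with the last node informed in round 2.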

25
  • Introduction to Peer-to-Peer Systems

26
P2P: an exciting social development
  • Internet users cooperating to share, for example,
    music files
  • Napster, Gnutella, Morpheus, KaZaA, etc.
  • Skype: free IP telephony over the Internet
  • Lots of attention from the popular press
  • The ultimate form of democracy on the Internet
  • The ultimate threat to copyright protection on
    the Internet
  • Many vendors have launched P2P efforts

27
What is P2P?
[Figure: clients connected to one another through the Internet, with no server]
  • A distributed system architecture
  • No centralized control
  • Nodes are symmetric in function
  • Typically many nodes, but unreliable and
    heterogeneous

28
Traditional Distributed Computing: client/server
[Figure: clients connected through the Internet to a central server]
  • Successful architecture, and will continue to be
    so
  • Tremendous engineering necessary to make server
    farms scalable and robust

29
Application-level overlays
[Figure: overlay nodes (N) at Sites 1-4, connected across ISP1, ISP2 and ISP3]
  • One per application
  • Nodes are decentralized

P2P systems are overlay networks without central
control
30
(Potential) P2P advantages
  • Allows for scalable incremental growth
  • Aggregates a tremendous amount of computation and
    storage resources
  • Tolerates faults or intentional attacks

31
Example P2P problem: lookup
[Figure: nodes N1-N6 connected through the Internet; a publisher stores Key = title, Value = file data, and a client issues Lookup(title) without knowing which node holds it]
  • At the heart of all P2P systems

32
Centralized lookup (Napster)
[Figure: the publisher at N4 registers SetLoc(title, N4) with a central database (DB); the client sends Lookup(title) to the DB; Key = title, Value = file data; nodes N1-N9]
Simple, but O(N) state and a single point of
failure
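A minimal sketch of the centralized scheme, assuming a single in-memory index; the class and method names (`CentralIndex`, `set_loc`, `lookup`) are hypothetical, chosen to mirror SetLoc and Lookup in the figure. It shows both properties on the slide: every published title adds an entry to one dictionary on one server (the O(N) state), and if that process is unavailable no lookup can succeed.

```python
class CentralIndex:
    """Central directory, Napster style: title -> set of nodes holding the file.

    Only the index lives on the server; the file data stays at the publishers.
    """

    def __init__(self):
        self._locations = {}  # grows with the number of published titles

    def set_loc(self, title, node):
        """Called by a publisher to announce that `node` holds `title`."""
        self._locations.setdefault(title, set()).add(node)

    def lookup(self, title):
        """Return the nodes that hold `title` (empty set if unknown)."""
        return set(self._locations.get(title, ()))


index = CentralIndex()
index.set_loc("title", "N4")     # the publisher at N4 registers its file
holders = index.lookup("title")  # the client asks the server: {"N4"}
```

The client would then contact N4 directly for the data; only the small location record passes through the server.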
33
Flooded queries (Gnutella)
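[Figure: the client floods Lookup(title) to its neighbours, which forward it across nodes N1-N9 until it reaches the publisher at N4; Key = title, Value = MP3 data]
Robust, but O(N) messages per lookup in the worst
case, and no guarantee that the data item is found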
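A minimal sketch of flooding with a hop limit (TTL), the usual way Gnutella-style systems bound a flood. The function name, the TTL parameter and the graph encoding are assumptions of this illustration, and stopping at the first hit is a simplification: in Gnutella the query keeps spreading and answers are routed back along the query path.

```python
from collections import deque


def flooded_query(adjacency, stores, start, key, ttl):
    """Flood a query from `start`, limited to `ttl` hops.

    adjacency: node -> list of neighbours; stores: node -> set of keys held.
    Returns (node_holding_key or None, number_of_query_messages_sent).
    """
    if key in stores.get(start, set()):
        return start, 0
    seen = {start}
    frontier = deque([(start, None, ttl)])  # (node, node it heard from, hops left)
    messages = 0
    while frontier:
        node, parent, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # TTL exhausted: this copy is not forwarded
        for nbr in adjacency[node]:
            if nbr == parent:
                continue  # never send the query straight back
            messages += 1
            if nbr in seen:
                continue  # duplicate copy, dropped by the receiver
            seen.add(nbr)
            if key in stores.get(nbr, set()):
                return nbr, messages  # simplification: stop at the first hit
            frontier.append((nbr, node, hops_left - 1))
    return None, messages


line = {"N1": ["N2"], "N2": ["N1", "N3"], "N3": ["N2", "N4"], "N4": ["N3"]}
stores = {"N4": {"title"}}
far = flooded_query(line, stores, "N1", "title", ttl=3)   # ("N4", 3)
near = flooded_query(line, stores, "N1", "title", ttl=1)  # (None, 1)
```

The second call shows the trade-off on the slide: a TTL small enough to keep the message count down can miss data that does exist, while a TTL large enough to reach the whole network costs at least N - 1 messages.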
34
Distributed Hash Tables
[Figure: distributed applications call Lookup(key), which returns data, and Insert(key, data) on a distributed hash table that spreads the data over many nodes]
  • Nodes are the hash buckets
  • Key identifies data uniquely
  • DHT balances keys and data across nodes
  • DHT replicates, caches, routes lookups, etc.
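The slide does not fix how keys are assigned to buckets. A common choice, used by Chord and assumed here, is consistent hashing: node names and keys are hashed onto the same identifier ring, and each key is stored at the first node whose identifier follows it. The sketch keeps every node in one process, so it shows only the key-to-node mapping and the balancing effect; a real DHT also routes the lookup from node to node and replicates data. The names `ToyDHT`, `ident` and `M` are illustrative.

```python
import bisect
import hashlib

M = 16  # identifier bits; a small, assumed value for the sketch


def ident(name: str) -> int:
    """Hash a node name or a key onto the identifier ring [0, 2**M)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (2 ** M)


class ToyDHT:
    """All nodes in one process; each node is a hash bucket (a dict)."""

    def __init__(self, node_names):
        self.ring = sorted((ident(n), n) for n in node_names)
        self.ids = [node_id for node_id, _ in self.ring]
        self.buckets = {n: {} for n in node_names}

    def responsible(self, key: str) -> str:
        """Successor of the key's identifier: first node id >= it, wrapping around."""
        pos = bisect.bisect_left(self.ids, ident(key)) % len(self.ids)
        return self.ring[pos][1]

    def insert(self, key: str, data: bytes) -> None:
        self.buckets[self.responsible(key)][key] = data

    def lookup(self, key: str):
        return self.buckets[self.responsible(key)].get(key)


dht = ToyDHT(["N1", "N2", "N3", "N4", "N5", "N6"])
dht.insert("title", b"file data")
found = dht.lookup("title")
```

A design consequence of this mapping: when a node joins or leaves, only the keys in the ring interval it takes over from, or hands back to, its successor change owner, which is what makes incremental growth cheap.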

35
Why DHTs now?
  • Demand pulls
  • Growing need for security and robustness
  • Large-scale distributed apps are difficult to
    build
  • Many applications use location-independent data
  • Technology pushes
  • Bigger, faster, and better: every PC can be a
    server
  • Scalable lookup algorithms are available
  • Trustworthy systems from untrusted components

36
DHT is a good interface
[Figure: the DHT interface compared with UDP/IP: UDP/IP offers Send(IP address, data) and Receive(IP address) → data; the DHT offers lookup(key) → data and Insert(key, data)]
  • Supports a wide range of applications, because it
    imposes few restrictions
  • Keys have no semantic meaning
  • Value is application dependent
  • Minimal interface
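As a sketch of why such a small interface is easy to build on, the code below states the two calls as a `typing.Protocol` and writes a toy name service purely against it. The key is an opaque SHA-1 digest and the value's format is decided by the application, matching the two bullets above. `LocalDHT`, `register` and `resolve` are hypothetical names, and `LocalDHT` is a single-process stand-in, not a distributed implementation.

```python
import hashlib
from typing import Optional, Protocol


class DHT(Protocol):
    """The two-call interface from the slide: insert(key, data), lookup(key) -> data."""

    def insert(self, key: bytes, data: bytes) -> None: ...

    def lookup(self, key: bytes) -> Optional[bytes]: ...


class LocalDHT:
    """Stand-in that satisfies the DHT protocol with a local dict (hypothetical)."""

    def __init__(self) -> None:
        self._store: dict = {}

    def insert(self, key: bytes, data: bytes) -> None:
        self._store[key] = data

    def lookup(self, key: bytes) -> Optional[bytes]:
        return self._store.get(key)


# A toy name service written only against the interface.
def register(dht: DHT, name: str, address: str) -> None:
    key = hashlib.sha1(name.encode()).digest()  # opaque key: no meaning to the DHT
    dht.insert(key, address.encode())           # value format chosen by the application


def resolve(dht: DHT, name: str) -> Optional[str]:
    data = dht.lookup(hashlib.sha1(name.encode()).digest())
    return data.decode() if data is not None else None


service = LocalDHT()
register(service, "www.example.org", "192.0.2.7")
address = resolve(service, "www.example.org")
```

Swapping `LocalDHT` for a real networked DHT would not require changing `register` or `resolve`, which is the point of keeping the interface minimal.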

37
DHT is a good shared infrastructure
  • Applications inherit some security and robustness
    from DHT
  • DHT replicates data
  • Resistant to malicious participants
  • Low-cost deployment
  • Self-organizing across administrative domains
  • Can be shared among applications
  • Large scale: supports Internet-scale workloads

38
DHTs support many applications
  • Distributed file systems: CFS, OceanStore,
    PAST, Arla/DKS
  • Web cache/archives: Squirrel, ..
  • Censor-resistant stores: Eternity, FreeNet, ..
  • Event notification: Scribe, DKS
  • Naming systems: ChordDNS, INS, ..
  • Query and indexing: Kademlia,
  • Communication primitives: I3,
  • Backup store: HiveNet
  • Distributed authorizations: Delegation

Data is location-independent
39
Cooperative read-only file sharing
[Figure: a file system stores blocks in a distributed hash table through Lookup(key) and insert(key, block); the DHT spreads the blocks over many nodes]
  • DHT is a robust block store
  • Client of DHT implements the block storage of the
    file system
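The division of labour above can be sketched as follows: the file-system client cuts a file into fixed-size blocks and uses only insert and lookup on the DHT, keeping the ordered list of block keys it needs to read the file back. Keying each block by the SHA-1 of its content follows the next slide; the block size, the helper names `store_file` and `read_file`, and the use of a plain Python dict in place of a DHT are assumptions of this sketch.

```python
import hashlib

BLOCK_SIZE = 4096  # bytes per block; an assumed value


def store_file(dht: dict, content: bytes) -> list:
    """Split content into blocks, insert each under SHA-1(block), return the keys in order."""
    keys = []
    for offset in range(0, len(content), BLOCK_SIZE):
        block = content[offset:offset + BLOCK_SIZE]
        key = hashlib.sha1(block).hexdigest()
        dht[key] = block  # insert(key, block)
        keys.append(key)
    return keys


def read_file(dht: dict, keys: list) -> bytes:
    """Reassemble the file by looking up each block key in order."""
    return b"".join(dht[key] for key in keys)


blocks = {}  # a plain dict standing in for the DHT
content = b"".join(i.to_bytes(4, "big") for i in range(5000))  # 20,000 bytes
keys = store_file(blocks, content)
restored = read_file(blocks, keys)
```

The 20,000-byte file becomes five blocks (four full ones and a 3,616-byte tail); the list of keys plays the role that an i-node block plays in a real layout.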

40
File representation: self-authenticating data
[Figure: file-system blocks linked by SHA-1 keys: a signed root block (995) holds key901 and key732; a directory block holds the entry a.txt → ID144; the i-node block (144) holds key431 and key795, which point to the data blocks]
  • The DHT key for a block is SHA-1(content of the block)
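Because each key is the SHA-1 hash of its block, a reader can check any block it receives against the key it asked for, without trusting the node that served it. A minimal sketch, with a dict standing in for the DHT and hypothetical helper names `put_block`, `get_block` and `IntegrityError`. It covers content-hashed blocks only; the root block in the figure carries a signature instead, which this sketch does not model.

```python
import hashlib


class IntegrityError(Exception):
    """A fetched block does not hash to the key it was requested under."""


def put_block(store: dict, block: bytes) -> str:
    key = hashlib.sha1(block).hexdigest()  # key = SHA-1(content of block)
    store[key] = block
    return key


def get_block(store: dict, key: str) -> bytes:
    block = store[key]
    # The key doubles as a checksum, so the reader need not trust the serving node.
    if hashlib.sha1(block).hexdigest() != key:
        raise IntegrityError(f"block {key} failed verification")
    return block


store = {}  # dict in place of the DHT
key = put_block(store, b"contents of a.txt")
original = get_block(store, key)
store[key] = b"forged contents"  # a malicious node substitutes the data
try:
    get_block(store, key)
    tampering_detected = False
except IntegrityError:
    tampering_detected = True
```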

41
File representation: self-authenticating data
[Figure: the same block structure as on the previous slide]
  • A Merkle tree is a tree where the value
    associated with a node is a one-way function of
    the values of the node's children
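A small sketch of the definition, assuming a binary tree with SHA-1 as the one-way function; the file-system tree in the figure has arbitrary fan-out instead, and padding odd levels by repeating the last node is just one common convention. The names `h` and `merkle_root` are illustrative. Changing any leaf changes every value on its path to the root, so a trusted root value is enough to validate all the blocks below it.

```python
import hashlib


def h(data: bytes) -> bytes:
    """One-way function used at every node of the tree (SHA-1 here)."""
    return hashlib.sha1(data).digest()


def merkle_root(leaves: list) -> bytes:
    """Root value of a binary Merkle tree over the given leaf blocks."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by repeating the last node
        # Each parent is the hash of its two children's values.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


root = merkle_root([b"block0", b"block1", b"block2"])
changed = merkle_root([b"block0", b"blockX", b"block2"])
```

Here `root` and `changed` differ because one leaf was modified, even though the other two leaves are identical.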

42
Backup store
  • Goal: back up data on other users' machines
  • Observations
  • Many user machines are not backed up
  • Backup requires significant manual effort
  • Many machines have lots of spare disk space
  • Using a DHT
  • Merkle tree to validate the integrity of data
  • Administrative and financial costs are lower for
    all participants
  • Backups are robust (automatic off-site backups)
  • Blocks are stored only once, if key = SHA-1(data)
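The last bullet is what makes a shared backup store cheap: when two users back up the same content, the blocks hash to the same keys, so the second copy adds nothing. A minimal sketch of that deduplication, with a dict in place of the DHT; the class name `BackupStore` and the 4-byte default block size are assumptions chosen only to keep the example small. It does not model replication or the Merkle-tree integrity check.

```python
import hashlib


class BackupStore:
    """Content-addressed backup store: identical blocks are kept only once."""

    def __init__(self, block_size: int = 4):
        self.block_size = block_size  # tiny default so the example stays readable
        self.blocks = {}              # SHA-1 key -> block; stands in for the DHT
        self.insert_requests = 0

    def backup(self, data: bytes) -> list:
        """Store data block by block; return the keys needed to restore it."""
        keys = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            key = hashlib.sha1(block).hexdigest()
            self.insert_requests += 1
            self.blocks.setdefault(key, block)  # key already present: nothing new stored
            keys.append(key)
        return keys

    def restore(self, keys: list) -> bytes:
        return b"".join(self.blocks[k] for k in keys)


store = BackupStore()
alice_keys = store.backup(b"AAAABBBBCCCC")  # e.g. shared system files plus own data
bob_keys = store.backup(b"AAAABBBBDDDD")
```

Six blocks are submitted but only four distinct ones are stored, because AAAA and BBBB are shared between the two backups.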