1
P2PDisCo: Java Distributed Computing for
Workstations Using Chedar Peer-to-Peer Middleware
Presentation for the 7th International Workshop on
Java™ for Parallel and Distributed Computing
(IWJPDC 2005)
  • 4.4.2005
  • Mikko Vapa, research student
  • Department of Mathematical Information Technology
  • University of Jyväskylä, Finland
  • http://tisu.it.jyu.fi/cheesefactory
  • With co-authors Niko Kotilainen, Matthieu Weber,
    Joni Töyrylä and Jarkko Vuori

2
Overview
  • This paper introduces the Peer-to-Peer Distributed
    Computing (P2PDisCo) software, which provides an
    interface for distributing the computation of
    Java programs to multiple workstations
  • P2PDisCo has been built on top of the Chedar
    peer-to-peer middleware
  • Currently, P2PDisCo is being used for speeding up
    the training of neural networks with an
    evolutionary algorithm

3
Peer-to-Peer Networks
  • Peer-to-Peer (P2P) networks allow sharing of
    resources (e.g., computing power, storage space,
    network bandwidth, printers) over the Internet
  • In contrast to clusters, in P2P networks all the
    tasks and responsibilities for managing the
    network are shared between peers
  • This means that there exists no single control
    entity responsible for providing the services
  • Because P2P networks do not require dedicated
    hardware, distributing computation among
    workstations is usually a cost-effective solution

4
Related Work
  • There are many alternatives for distributed
    computing using the Java programming language
      • Programming-language-independent distributed
        computing tools, such as Globus Toolkit with
        the Java Commodity Grid Kit
      • Programming-language-dependent Java distributed
        computing software
          • Java extensions requiring changes to the
            Java compiler and/or the Java Virtual
            Machine (JVM), with JavaParty as an example
          • Java libraries providing special class
            libraries without a need for modifications
            to the Java compiler or JVM, with
            JavaSymphony and P2PDisCo as examples

5
Related Work
  • Many of these distribution tools have some
    centralized components in them
      • Globus uses centralized indexes for resource
        discovery, whereas in P2PDisCo resource
        discovery is decentralized and provided by
        the Chedar peer-to-peer network
      • In JavaSymphony all the computing resources are
        centrally configured under JS-Shell, whereas
        in P2PDisCo no central management exists
  • There are also some implementations of Java
    distributed computing that use a peer-to-peer
    network for locating resources
      • An example of such a system is GT-P2PRMI, which
        allows Remote Method Invocation (RMI) bindings
        and lookups to be executed via a modified
        RMIRegistry called P2PRMIRegistry

6
Chedar P2P Middleware
  • Chedar (CHEap Distributed Architecture) is a
    peer-to-peer middleware designed for the needs of
    peer-to-peer applications
  • Chedar constructs a pure peer-to-peer network
    using topology management algorithms and provides
    functionality for locating resources in the
    network
  • The implementation of Chedar is based on Java
    Standard Edition, which provides platform
    independence and easy adaptation to different
    hardware

7
Chedar P2P Middleware
  • Each Chedar node maintains a database of locally
    available resources, for example information about
    which applications are running on the device or
    what files are located on the node
  • Resources can contain meta-information about
    themselves, for example the version number for
    applications and the last modification date for
    files
  • The resource database is stored as an XML document
    using a specific Document Type Definition (DTD)
  • This organization of data allows rich and complex
    queries to be made to the database in the form of
    XPath expressions
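
As an illustration of querying an XML resource database with XPath, the following is a minimal sketch that uses the standard javax.xml.xpath API. The XML layout, the element and attribute names and the class ResourceQueryExample are assumptions made for this example; they do not reproduce Chedar's actual DTD.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ResourceQueryExample {
    // Hypothetical resource database; names are illustrative, not Chedar's DTD.
    static final String RESOURCES =
        "<resources>"
      + "<application name='NetSimulator' version='1.2'/>"
      + "<application name='Editor' version='0.9'/>"
      + "<file name='results.txt' modified='2005-04-04'/>"
      + "</resources>";

    // Counts the nodes in the database that match the given XPath expression.
    static int countMatches(String xpathExpression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                xpathExpression,
                new InputSource(new StringReader(RESOURCES)),
                XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + xpathExpression, e);
        }
    }

    public static void main(String[] args) {
        // Is an application called NetSimulator available on this node?
        System.out.println(countMatches("/resources/application[@name='NetSimulator']"));
        // Meta-information can be part of the query: applications at version 1.0 or later.
        System.out.println(countMatches("/resources/application[@version >= 1.0]"));
    }
}
```

Both queries in main match only the NetSimulator entry of the sample data; the second one shows how meta-information such as a version number can take part in the match.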

8
Chedar P2P Middleware
  • Each Chedar node keeps a list of the neighbors it
    is connected to through TCP sockets
  • TCP provides reliable data delivery, and the
    disappearance of a neighbor can be detected with
    a TCP timeout
  • To form an efficient topology for resource
    discovery, the neighbor list is updated based on
    heuristics such as the number of query replies
    relayed by a neighbor and the query replies
    provided by the neighbor itself
  • As a search mechanism we currently use the
    Breadth-First Search (BFS) algorithm, which
    scales only to small networks but is guaranteed
    to locate all resources in the network
  • In our experiments, the query traffic in a
    network of 200 workstations with 100 Mb/s
    Ethernet connections has not yet posed a
    significant problem, and therefore a more
    efficient version of the query algorithm has not
    been implemented
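
The breadth-first search described above can be sketched on an in-memory model of the overlay. This is not Chedar code: the class BfsSearchExample, the node names and the use of maps for neighbor lists and resource databases are assumptions for illustration, and a real network would relay query messages over TCP instead of walking a shared data structure.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class BfsSearchExample {
    // Returns, in visiting order, every node reachable from start that offers the resource.
    static List<String> search(String start, String resource,
                               Map<String, List<String>> neighbors,
                               Map<String, Set<String>> resources) {
        List<String> holders = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        visited.add(start);
        frontier.add(start);
        while (!frontier.isEmpty()) {
            String node = frontier.remove();
            if (resources.getOrDefault(node, Set.of()).contains(resource)) {
                holders.add(node);
            }
            for (String next : neighbors.getOrDefault(node, List.of())) {
                // Set.add returns false for nodes that were already visited.
                if (visited.add(next)) {
                    frontier.add(next);
                }
            }
        }
        return holders;
    }

    public static void main(String[] args) {
        // Hypothetical topology: master - a, b; a - c; b - c, d.
        Map<String, List<String>> neighbors = Map.of(
            "master", List.of("a", "b"),
            "a", List.of("master", "c"),
            "b", List.of("master", "c", "d"),
            "c", List.of("a", "b"),
            "d", List.of("b"));
        Map<String, Set<String>> resources = Map.of(
            "c", Set.of("NetSimulator"),
            "d", Set.of("NetSimulator"));
        System.out.println(search("master", "NetSimulator", neighbors, resources));
    }
}
```

Because every reachable node is visited exactly once, the search finds all holders of the resource, at the cost of contacting the whole network. That trade-off matches the observation that BFS suits small networks.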

9
Chedar P2P Middleware
  • Each query contains a Message-ID and an XPath
    expression describing the query
  • Whenever a query enters a Chedar node, the node
    checks its resource database for resources
    matching the XPath expression; if a resource is
    found, a reply message is sent back along the
    route by which the query arrived
  • To be relayed correctly back to the query
    originator, the reply message needs to contain
    the same Message-ID as the query
  • For communication between two peers, Chedar
    provides a point-to-point communication protocol
    that lets P2P applications execute basic
    message-passing primitives
  • The protocol uses the same path as the reply
    message to deliver messages between peers
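
The role of the Message-ID in relaying replies can be sketched as follows. This is a simplified in-memory model, not Chedar's implementation: the class ReplyRoutingExample, its Node type and all field and method names are hypothetical, a boolean stands in for the XPath match, and messages are delivered by direct method calls instead of TCP sockets.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReplyRoutingExample {
    static class Node {
        final String name;
        final boolean hasResource;
        final List<Node> neighbors = new ArrayList<>();
        // Message-ID -> neighbor the query arrived from (null at the originator).
        final Map<String, Node> cameFrom = new HashMap<>();
        final List<String> receivedReplies = new ArrayList<>();
        int relayedReplies = 0;

        Node(String name, boolean hasResource) {
            this.name = name;
            this.hasResource = hasResource;
        }

        void onQuery(String messageId, Node from) {
            if (cameFrom.containsKey(messageId)) {
                return; // this query was already seen; do not forward it again
            }
            cameFrom.put(messageId, from);
            if (hasResource && from != null) {
                from.onReply(messageId, name); // answer along the arrival route
            }
            for (Node n : neighbors) {
                if (n != from) {
                    n.onQuery(messageId, this);
                }
            }
        }

        void onReply(String messageId, String holder) {
            if (!cameFrom.containsKey(messageId)) {
                return; // unknown Message-ID: there is no route to follow
            }
            Node previousHop = cameFrom.get(messageId);
            if (previousHop == null) {
                receivedReplies.add(holder); // this node originated the query
            } else {
                relayedReplies++;
                previousHop.onReply(messageId, holder);
            }
        }
    }

    static void link(Node a, Node b) {
        a.neighbors.add(b);
        b.neighbors.add(a);
    }

    public static void main(String[] args) {
        Node master = new Node("master", false);
        Node a = new Node("a", false);
        Node b = new Node("b", false);
        Node c = new Node("c", true);
        link(master, a);
        link(master, b);
        link(a, c);
        master.onQuery("q1", null);
        System.out.println(master.receivedReplies + ", replies relayed by a: " + a.relayedReplies);
    }
}
```

Each node only remembers which neighbor a given Message-ID arrived from, so a reply carrying the same Message-ID can retrace the route hop by hop without any node knowing the full path. The relayedReplies counter is the kind of statistic a neighbor-selection heuristic could use.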

10
Peer-to-Peer Distributed Computing
  • Problem
      • Evolving neural networks in a simulator needs a
        lot of computing power
      • One computer is not enough for many research
        cases
  • Solution
      • Distribute computation across desktop computers
        all over the University of Jyväskylä
      • Distribution has to be as invisible as possible
        to the user of the network simulator
      • The simulator should not interfere with the
        desktop use of the computers taking part in
        the computation
      • As a solution, Peer-to-Peer Distributed
        Computing (P2PDisCo) was developed on top of
        Chedar

11
P2PDisCo - Architecture
  • The node that wants to distribute its computation
    (denoted as the master) needs to query resources,
    receive query replies and send data (parameters)
    for the computation
  • The node that offers computation time has to
    implement the Distributed interface to be able to
    receive the start, stop and "is application
    running" signals
  • Parameters are read from, and results are written
    to, the streams offered by P2PDisCo
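
A possible shape for such an interface is sketched below. Only the interface name Distributed and the three signals come from the slides; the method names, their signatures, the toy SumTask class and the runLocally helper are assumptions for illustration, and the real P2PDisCo API may differ.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical signatures; the actual P2PDisCo interface is not shown in the slides.
interface Distributed {
    void start(InputStream parameters, OutputStream results) throws IOException;
    void stop();
    boolean isRunning();
}

// Toy task: reads one integer per line from the parameters and writes their sum.
class SumTask implements Distributed {
    private volatile boolean running;
    private volatile boolean stopRequested;

    @Override
    public void start(InputStream parameters, OutputStream results) throws IOException {
        running = true;
        stopRequested = false;
        try {
            BufferedReader in = new BufferedReader(
                new InputStreamReader(parameters, StandardCharsets.UTF_8));
            long sum = 0;
            String line;
            while (!stopRequested && (line = in.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    sum += Long.parseLong(line.trim());
                }
            }
            PrintStream out = new PrintStream(results, true, "UTF-8");
            out.println(sum);
            out.flush();
        } finally {
            running = false;
        }
    }

    @Override
    public void stop() {
        stopRequested = true; // checked between input lines
    }

    @Override
    public boolean isRunning() {
        return running;
    }

    // Runs the task against in-memory streams, standing in for the middleware.
    static String runLocally(String parameters) {
        ByteArrayOutputStream results = new ByteArrayOutputStream();
        try {
            new SumTask().start(
                new ByteArrayInputStream(parameters.getBytes(StandardCharsets.UTF_8)), results);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(results.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(runLocally("1\n2\n3\n"));
    }
}
```

In this sketch the task never learns where its parameters come from or where its results go, which is what would let a middleware layer substitute network-backed streams for local ones.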

12
P2PDisCo - Architecture
  • [Figure: the master and five Chedar nodes]

13
P2PDisCo - Architecture
  • [Figure: the master and five Chedar nodes]

14
P2PDisCo - Architecture
  • [Figure: the master and five Chedar nodes, with
    Query messages on the links]
  • Query: who has the resource NetSimulator
    available?

15
P2PDisCo - Architecture
  • [Figure: the master and five Chedar nodes, with
    Reply messages on the links]
  • Reply: I do!

16
P2PDisCo - Architecture
  • [Figure: the master and five Chedar nodes, with
    Task messages on the links]

17
P2PDisCo - Architecture
  • [Figure: the master and five Chedar nodes during
    the computation]

18
P2PDisCo - Architecture
  • [Figure: the master and five Chedar nodes, with
    Result messages on the links]
  • Results are sent back to the master node.
    Calculation ends and everybody is happy.

19
What happens inside a Chedar node
  • The Chedar node starts the distributed
    application, hijacking its file operations
  • [Figure: a Task enters Chedar, which runs the
    distributed program and returns the Result]
  • Any Java program that uses files to read input
    and store output can be distributed
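
To make the last point concrete, here is a minimal sketch of the kind of program it refers to: plain Java that reads its input from one file and writes its output to another. The class FileBasedTask and the task it performs are made up for illustration, and the sketch does not show how Chedar redirects the file operations, which the slides do not describe.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

class FileBasedTask {
    // Reads one number per line from the input file and writes their mean to the output file.
    static void run(Path input, Path output) {
        try {
            List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
            double sum = 0;
            int count = 0;
            for (String line : lines) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                sum += Double.parseDouble(line.trim());
                count++;
            }
            double mean = count == 0 ? 0.0 : sum / count;
            Files.write(output,
                Collections.singletonList(Double.toString(mean)),
                StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage: java FileBasedTask <input file> <output file>
        run(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The program contains no networking code at all; its only contact with the outside world is the pair of files.
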
20
Security Concerns
  • Because of security concerns, the distributed
    application has been installed on the computers
    beforehand and is not delivered automatically
    during task distribution
  • During task distribution only the execution
    parameters, i.e. configuration files, are
    transferred
  • Also, the IP addresses of master nodes are
    currently restricted so that only certain IP
    addresses are allowed to start computations
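
The IP address restriction can be sketched as a simple allow-list check. The class MasterAllowList, its method names and the sample addresses (taken from the 192.0.2.0/24 documentation range) are assumptions for this example; the slides do not say how P2PDisCo stores or evaluates its list.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Set;

class MasterAllowList {
    private final Set<InetAddress> allowed = new HashSet<>();

    // Pass literal IP addresses; host names would trigger a DNS lookup.
    MasterAllowList(String... literalAddresses) {
        for (String address : literalAddresses) {
            allowed.add(parse(address));
        }
    }

    // True only if the remote address is on the list of permitted masters.
    boolean mayStartComputation(String remoteAddress) {
        return allowed.contains(parse(remoteAddress));
    }

    private static InetAddress parse(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a usable address: " + literal, e);
        }
    }

    public static void main(String[] args) {
        MasterAllowList masters = new MasterAllowList("192.0.2.10", "192.0.2.11");
        System.out.println(masters.mayStartComputation("192.0.2.10"));
        System.out.println(masters.mayStartComputation("192.0.2.99"));
    }
}
```

A check of this kind relies on the source address of the connection, so it limits who can start computations but does not authenticate the master.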

21
Future Work
  • At this time P2PDisCo is just a tool for our
    research project to speed up the computations of
    the NeuroSearch neural network resource discovery
    algorithm
  • Possible improvements
      • Checkpointing of the computation, so that if
        the connection is lost the computation can be
        resumed from the same point
      • The master could leave the network and gather
        the results afterwards
      • Extending the API of P2PDisCo to allow direct
        communication between computing nodes, which
        would make it possible to parallelize the
        evolutionary algorithm across multiple
        computers with architectures other than
        master-slave, such as the panmictic model
        commonly used for parallelization of
        evolutionary algorithms

22
Thank You!
  • Any questions?