hadoop training in hyderabad@kellytechnologies - PowerPoint PPT Presentation

About This Presentation
Title:

hadoop training in hyderabad@kellytechnologies

Description:

Kelly Technologies is a Hadoop training institute in Hyderabad, providing Hadoop training by real-time faculty. www.kellytechno.com/Hyderabad/Course/Hadoop-Training

Slides: 27
Provided by: kellytechnologies

Transcript and Presenter's Notes

Title: hadoop training in hyderabad@kellytechnologies


1
Distributed Computing Overview
Presented By
Kelly Technologies
www.kellytechno.com
2
Agenda
  • What is distributed computing
  • Why distributed computing
  • Common Architecture
  • Best Practice
  • Case study
  • Condor
  • Hadoop HDFS and MapReduce

www.kellytechno.com
3
What is Distributed Computing/System?
  • Distributed computing
  • A field of computer science that studies
    distributed systems.
  • The use of distributed systems to solve
    computational problems.
  • Distributed system
  • Wikipedia
  • There are several autonomous computational
    entities, each of which has its own local memory.
  • The entities communicate with each other by
    message passing (see the sketch after this list).
  • Operating System Concepts
  • The processors communicate with one another
    through various communication lines, such as
    high-speed buses or telephone lines.
  • Each processor has its own local memory.
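
A minimal sketch of message passing between two autonomous "nodes", written here as two threads in one Java program that communicate only over a TCP socket (port 9000 and the message text are arbitrary; this example is illustrative and not part of the original slides).

import java.io.*;
import java.net.ServerSocket;
import java.net.Socket;

// Two "nodes" with separate local memory that cooperate only by exchanging messages.
public class MessagePassingDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(9000);   // node A's mailbox

        // Node A: waits for a message from a peer.
        Thread nodeA = new Thread(() -> {
            try (Socket peer = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(peer.getInputStream()))) {
                System.out.println("Node A received: " + in.readLine());
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        nodeA.start();

        // Node B: connects to node A and sends a message; no shared memory is used.
        try (Socket socket = new Socket("localhost", 9000);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            out.println("hello from node B");
        }
        nodeA.join();
        server.close();
    }
}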

www.kellytechno.com
4
What is Distributed Computing/System?
  • Distributed program
  • A computer program that runs on a distributed
    system
  • Distributed programming
  • The process of writing such programs

www.kellytechno.com
5
What is Distributed Computing/System?
  • Common properties
  • Fault tolerance
  • When one or more nodes fail, the whole system
    can still work, though performance may degrade.
  • Need to check the status of each node
  • Each node plays only a partial role
  • Each computer has only a limited, incomplete view
    of the system. Each computer may know only one
    part of the input.
  • Resource sharing
  • Each user can share the computing power and
    storage resources in the system with other users
  • Load sharing
  • Dispatching tasks across the nodes helps spread
    the load over the whole system (see the sketch
    after this list).
  • Easy to expand
  • Adding nodes should take little effort, ideally
    none at all.
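
A single-process analogy of load sharing, sketched with a Java thread pool: a dispatcher hands tasks to a fixed set of workers so the load is spread evenly. In a real distributed system the workers would be separate machines; the worker and task counts here are arbitrary.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Dispatch many small tasks across a fixed pool of workers to share the load.
public class LoadSharingDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService workers = Executors.newFixedThreadPool(4); // 4 "nodes"
        for (int task = 0; task < 12; task++) {
            final int id = task;
            workers.submit(() ->
                    System.out.println("task " + id + " ran on "
                            + Thread.currentThread().getName()));
        }
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
    }
}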

www.kellytechno.com
6
Why Distributed Computing?
  • The nature of the application
  • Performance
  • Computing intensive
  • The task spends most of its time on computation,
    for example computing π.
  • Data intensive
  • The task deals with a large number of files or
    very large files, for example Facebook or the
    LHC (Large Hadron Collider).
  • Robustness
  • No SPOF (Single Point Of Failure)
  • Other nodes can re-execute a task that was
    running on a failed node.

www.kellytechno.com
7
Common Architectures
  • Communicate and coordinate work among concurrent
    processes
  • Processes communicate by sending/receiving
    messages
  • Synchronous/asynchronous (see the sketch after
    this list)
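
A small sketch of the synchronous/asynchronous distinction using Java's CompletableFuture: the synchronous call blocks the caller until the reply arrives, while the asynchronous call registers a callback and keeps working. The askRemoteNode helper is hypothetical, standing in for any request sent to another process.

import java.util.concurrent.CompletableFuture;

public class SyncAsyncDemo {
    // Hypothetical stand-in for a request sent to another node.
    static String askRemoteNode(String request) {
        try { Thread.sleep(500); } catch (InterruptedException ignored) { }
        return "reply to " + request;
    }

    public static void main(String[] args) {
        // Synchronous: the caller blocks until the reply is available.
        String reply = askRemoteNode("ping");
        System.out.println("sync:  " + reply);

        // Asynchronous: the caller registers a callback and keeps working.
        CompletableFuture<Void> done =
                CompletableFuture.supplyAsync(() -> askRemoteNode("ping"))
                                 .thenAccept(r -> System.out.println("async: " + r));
        System.out.println("caller continues while the async reply is pending");
        done.join(); // wait at the end so the demo prints everything
    }
}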

www.kellytechno.com
8
Common Architectures
  • Master/Slave architecture
  • Master/slave is a model of communication where
    one device or process has unidirectional control
    over one or more other devices
  • Database replication
  • The source database can be treated as the master
    and the destination database as a slave.
  • Client-server
  • e.g., web browsers and web servers

www.kellytechno.com
9
Common Architectures
  • Data-centric architecture
  • Using a standard, general-purpose relational
    database management system vs. customized
    in-memory or file-based data structures and
    access methods
  • Using dynamic, table-driven logic vs. logic
    embodied in previously compiled programs
  • Stored procedures vs. logic running in
    middle-tier application servers
  • Shared databases as the basis for communicating
    between parallel processes vs. direct
    inter-process communication via message-passing
    functions

www.kellytechno.com
10
Best Practice
  • Data intensive or computing intensive?
  • Consider the data size and the amount of data
  • Consider the attributes of the data you consume
  • Computing intensive
  • We can move the data to the nodes where we can
    execute jobs
  • Data intensive
  • We can partition/replicate the data to different
    nodes, then execute our tasks on those nodes
  • Reduce data replication (copying) while tasks are
    executing
  • Master nodes need to know the data location
  • No data loss when incidents happen
  • SAN (Storage Area Network)
  • Data replication on different nodes
  • Synchronization
  • When splitting tasks across different nodes, how
    can we make sure these tasks are synchronized?

www.kellytechno.com
11
Best Practice
  • Robustness
  • The system stays safe when one or several nodes
    fail
  • Failed nodes should recover when they come back
    online, with little or no further action needed
  • e.g., Condor only restarts its daemon
  • Failure detection
  • When any node fails, the master node can detect
    the situation.
  • e.g., heartbeat detection (see the sketch after
    this list)
  • Apps/users don't need to know that a partial
    failure happened.
  • Restart tasks on other nodes for users
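
A minimal heartbeat-detection sketch, assuming a master that records the last heartbeat time per node and treats a node as failed when no heartbeat has arrived within a timeout. The class, node names, and timeout value are illustrative; this is not Condor's or Hadoop's actual implementation.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Master-side view: nodes call heartbeat() periodically; failedNodes() lists
// the nodes whose last heartbeat is older than the timeout.
public class HeartbeatMonitor {
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    public HeartbeatMonitor(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    public void heartbeat(String nodeId) {               // called on every heartbeat message
        lastSeen.put(nodeId, System.currentTimeMillis());
    }

    public List<String> failedNodes() {                   // nodes silent for too long
        long now = System.currentTimeMillis();
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (now - e.getValue() > timeoutMillis) failed.add(e.getKey());
        }
        return failed;
    }

    public static void main(String[] args) throws InterruptedException {
        HeartbeatMonitor monitor = new HeartbeatMonitor(1000);   // 1 s timeout
        monitor.heartbeat("node-1");
        monitor.heartbeat("node-2");
        Thread.sleep(1500);
        monitor.heartbeat("node-2");                       // node-1 stays silent
        System.out.println("failed: " + monitor.failedNodes()); // [node-1]
    }
}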

www.kellytechno.com
12
Best Practice
  • Network issues
  • Bandwidth
  • Bandwidth must be considered when copying files
    to other nodes, if we want to execute a task on
    nodes that do not yet hold its data.
  • Scalability
  • Easy to expand
  • e.g., in Hadoop, modify the configuration and
    start the daemons on the new node
  • Optimization
  • What can we do if the performance of some nodes
    is not good?
  • Monitor the performance of each node
  • Based on any exchanged information, such as
    heartbeats or logs
  • Resume the same task on another node

www.kellytechno.com
13
Best Practice
  • App/User
  • shouldn't need to know how the nodes communicate
    with each other
  • User mobility: a user can access the system from
    a single entry point, or from anywhere
  • e.g., a Grid UI (user interface)
  • e.g., a Condor submit machine

www.kellytechno.com
14
Case study - Condor
  • Condor
  • Designed for computing-intensive jobs
  • Queuing policy
  • Matches tasks to computing nodes
  • Resource classification
  • Each resource advertises its attributes, and the
    master classifies resources accordingly (see the
    matchmaking sketch after this list)
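
A toy matchmaking sketch in the spirit of Condor's resource classification: each machine advertises its attributes, and the negotiator selects the machines whose attributes satisfy a job's requirements. The attribute names and matching rule are invented for illustration (Java 16+ records); they are not Condor's ClassAd language.

import java.util.List;

// Toy matchmaking: pick machines whose advertised resources cover the job's needs.
public class MatchmakerDemo {
    record Machine(String name, int cpus, int memoryMb) { }
    record Job(String name, int cpusNeeded, int memoryMbNeeded) { }

    static List<Machine> match(Job job, List<Machine> pool) {
        return pool.stream()
                .filter(m -> m.cpus() >= job.cpusNeeded()
                          && m.memoryMb() >= job.memoryMbNeeded())
                .toList();
    }

    public static void main(String[] args) {
        List<Machine> pool = List.of(
                new Machine("exec-1", 4, 8192),
                new Machine("exec-2", 2, 2048),
                new Machine("exec-3", 8, 16384));
        Job job = new Job("simulation", 4, 4096);
        System.out.println("candidates: " + match(job, pool)); // exec-1 and exec-3
    }
}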

www.kellytechno.com
15
Case study - Condor
www.kellytechno.com
16
Case study - Condor
  • Role
  • Central Manager
  • The collector of information, and the negotiator
    between resources and resource requests
  • Execution machine
  • Responsible for executing Condor tasks
  • Submit machine
  • Responsible for submitting Condor tasks
  • Checkpoint servers
  • Responsible for storing all checkpoint files for
    the tasks

www.kellytechno.com
17
Case study - Condor
  • Robustness
  • One execution machine fails
  • We can execute the same task on other nodes.
  • Recovery
  • Only the daemon needs to be restarted when a
    failed node comes back online

www.kellytechno.com
18
Case study - Condor
  • Resource sharing
  • Each Condor user can share computing power with
    other Condor users.
  • Synchronization
  • Users need to take care of it themselves
  • Users can execute MPI jobs in a Condor pool, but
    must think about synchronization and deadlock
    issues.
  • Failure detection
  • The central manager knows when a node fails
  • Based on the update notifications sent by nodes
  • Scalability
  • Only a few commands are needed when new nodes
    come online.

www.kellytechno.com
19
Case study - Hadoop
  • HDFS
  • NameNode
  • manages the file system namespace and regulates
    access to files by clients.
  • determines the mapping of blocks to DataNodes.
  • DataNode
  • manages storage attached to the node that it
    runs on
  • saves CRC checksums
  • sends heartbeats to the NameNode.
  • Each file is split into blocks, and each block is
    stored on several DataNodes (see the client
    sketch after this list).
  • Secondary NameNode
  • responsible for merging the fsimage and edit log
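
A short sketch of an HDFS client read using Hadoop's FileSystem API (the style follows Hadoop: The Definitive Guide): the NameNode resolves the path to block locations and the data is then streamed from DataNodes. The URI below is a placeholder.

import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Read a file from HDFS: the NameNode resolves the path to block locations;
// the bytes themselves are streamed from the DataNodes holding the blocks.
public class HdfsCat {
    public static void main(String[] args) throws Exception {
        String uri = "hdfs://namenode-host:8020/user/demo/input.txt"; // placeholder
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        InputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}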

www.kellytechno.com
20
Case study - Hadoop
www.kellytechno.com
21
Case study - Hadoop
  • MapReduce framework
  • JobTracker
  • Responsible for dispatching jobs to the
    TaskTrackers
  • Handles job management such as scheduling and
    removal.
  • TaskTracker
  • Responsible for executing tasks. A TaskTracker
    usually launches a separate JVM to run each task
    (see the word-count sketch after this list).
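
To make the JobTracker/TaskTracker split concrete, here is the classic word-count job written against Hadoop's MapReduce API: the mapper and reducer are the task code that TaskTrackers run in child JVMs, and the Job set up in main is what gets submitted for scheduling. Input and output paths are placeholders.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map: emit (word, 1) for every word in the input split.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: sum the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/demo/input"));     // placeholder
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));  // placeholder
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}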

www.kellytechno.com
22
Case study - Hadoop
www.kellytechno.com
From Hadoop - The Definitive Guide
23
Case study - Hadoop
  • Data replication
  • Data are replicated to different nodes
  • Reduces the possibility of data loss
  • Data locality: tasks are sent to the nodes where
    the data reside (see the replication sketch after
    this list).
  • Robustness
  • A DataNode fails
  • We can get the data from other nodes.
  • A TaskTracker fails
  • We can start the same task on a different node
  • Recovery
  • Only the daemon needs to be restarted when a
    failed node comes back online
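
Replication is controlled by the dfs.replication setting (default 3) and can also be changed per file; a minimal sketch of doing both through Hadoop's client API, with a placeholder path.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Control how many copies of each block HDFS keeps.
public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");          // default replication for new files
        FileSystem fs = FileSystem.get(conf);

        // Replication can also be changed for an existing file.
        fs.setReplication(new Path("/user/demo/input.txt"), (short) 2); // placeholder path
    }
}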

www.kellytechno.com
24
Case study - Hadoop
  • Resource sharing
  • Each Hadoop user can share computing power and
    storage space with other Hadoop users.
  • Synchronization
  • No explicit synchronization is needed; the
    framework handles it.
  • Failure detection
  • The NameNode/JobTracker knows when a
    DataNode/TaskTracker fails
  • Based on heartbeats

www.kellytechno.com
25
Case study - Hadoop
  • Scalability
  • Only a few commands are needed when new nodes
    come online.
  • Optimization
  • A speculative task is launched only when a task
    takes too much time on one node.
  • The slower copy is killed as soon as the other
    one finishes (see the configuration sketch after
    this list).
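
A short sketch of toggling speculative execution when configuring a job. mapreduce.map.speculative and mapreduce.reduce.speculative are the property names used by Hadoop 2.x and later (both default to true); older releases used the mapred.*.tasks.speculative.execution names.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Speculative execution: a slow ("straggler") task is duplicated on another node,
// the copy that finishes first wins, and the slower one is killed.
public class SpeculativeConfigDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setBoolean("mapreduce.map.speculative", true);     // duplicate slow map tasks
        conf.setBoolean("mapreduce.reduce.speculative", false); // but not reduce tasks

        Job job = Job.getInstance(conf, "job with speculative maps");
        // ... set mapper/reducer/input/output as usual, then submit:
        // job.waitForCompletion(true);
    }
}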

www.kellytechno.com
26
Thank You
www.kellytechno.com