Title: How to Build a Cluster Using GNU/Linux
1How to Build a Cluster Using GNU/Linux
2What is a Cluster?
GRID, Redundancy, STONITH, MPI, NUMA, RMI, High Availability, SMP, Single System Image, API, Reliability, N-Tier Architecture, Heartbeat, No Single Point of Failure, Distributed Objects, MOSIX, SOMITH, NFS, Parallel Processing, CFS, Linux, Multi Version Concurrency Control, Beowulf, Scalability
3What is a Cluster?
CLIENTS
NODES
DATA
4What is a Cluster?
- A cluster is a type of parallel or distributed system that
- consists of a collection of interconnected whole computers,
- and is used as a single unified computing resource.
- Gregory F. Pfister, In Search of Clusters
6Four Properties of a Cluster
1. Users do not know they are using a cluster
2. Nodes within a cluster do not know they are part of a cluster
3. Applications running in the cluster do not know that they are running inside a cluster
4. Other servers on the network do not know they are servicing a cluster node
7What is a Cluster?
- A cluster is a type of parallel or distributed system that
- consists of a collection of interconnected whole computers,
- and is used as a single unified computing resource.
- Gregory F. Pfister, In Search of Clusters
8Property 1: Users do not know they are using a cluster
- If users do know they are using a cluster,
they are using distinct, distributed servers and
not a single unified computing resource.
9What is a Cluster?
- A cluster is a type of parallel or distributed system that
- consists of a collection of interconnected whole computers,
- and is used as a single unified computing resource.
- Gregory F. Pfister, In Search of Clusters
10Property 2: Nodes within a cluster do not know they are part of a cluster
- The failure of one node has no effect on the other nodes (they are whole or complete).
- A node can be rebooted or removed without affecting the other nodes.
11Property 3: Applications running in the cluster do not know that they are running inside a cluster
- If an application must be modified to run inside the cluster, then the application is no longer using the cluster as a single unified computing resource.
- The cluster architecture does not force objects within the cluster to send messages to each other.
12Property 4: Other servers on the network do not know they are servicing a cluster node
- The servers that provide services to the cluster
nodes also do not know that they are talking to
nodes inside a cluster.
13Unified
14A Single Unified Computing Resource
16A Highly Available Single Unified Computing Resource
No Single Point of Failure
18Definition of Terms
- Process - A running program
- Daemon - A process running in the background on Linux
- Service - A daemon and the effects it produces
- Resource - A service and its operating environment (configuration files, network mechanism used to access the service)
- Failover - When a resource moves from one computer to another
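The relationship between these terms can be sketched in a few lines of Python. This is only an illustrative model: the `Resource` class, the `failover` function, and the example daemon name, configuration path, and IP address are all hypothetical and do not come from any clustering package.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """A service plus its operating environment (hypothetical model)."""
    service: str        # the daemon that provides the service
    config_files: list  # configuration the daemon needs
    ip_address: str     # network mechanism clients use to reach the service
    host: str           # computer currently running the resource


def failover(resource: Resource, new_host: str) -> Resource:
    # Failover: the whole resource (service and environment) moves to another
    # computer; clients keep using the same address.
    return Resource(resource.service, list(resource.config_files),
                    resource.ip_address, new_host)


# Example values are assumptions chosen for illustration only.
web = Resource("httpd", ["/etc/httpd/conf/httpd.conf"], "10.0.0.100", "primary")
moved = failover(web, "backup")
print(moved.host, moved.ip_address)
```

The point of the model is that the address travels with the resource, so clients do not need to know that a failover happened.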
19Load Balancing
CLIENTS
NODES
DATA
21Load Balancing
- Netfilter Hooks
- Routing
- Linux Virtual Server
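As a rough illustration of what the load balancer does, the sketch below hands each incoming request to the next node in turn (round-robin). It is a toy user-space model in Python, not how Linux Virtual Server is implemented, since LVS schedules connections inside the kernel; the class name `RoundRobinDirector` and the node names are hypothetical.

```python
from itertools import cycle


class RoundRobinDirector:
    """Toy load balancer: assigns each request to the next node in turn."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = cycle(list(nodes))

    def dispatch(self, request):
        # Pick the next node in rotation; the client never learns which one.
        node = next(self._nodes)
        return node, request


director = RoundRobinDirector(["node1", "node2", "node3"])
assignments = [director.dispatch(f"GET /page{i}")[0] for i in range(5)]
print(assignments)
```

Because clients only ever talk to the director, they see one computing resource regardless of how many nodes sit behind it.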
30High Availability Cluster
31High Availability
- Fail over the cluster load balancing resource to a backup Load Balancer (Backup Director)
- Remove cluster nodes from the cluster when they fail
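The first point can be pictured as a timeout on heartbeats: the backup takes over the load balancing resource once the primary has been silent for too long. The following Python sketch is a simplified model under that assumption; `HeartbeatMonitor`, its `deadtime` parameter, and the timestamps are hypothetical and do not reproduce the behaviour of any particular heartbeat daemon.

```python
class HeartbeatMonitor:
    """Backup director's view of the primary, based on heartbeat timestamps."""

    def __init__(self, deadtime):
        self.deadtime = deadtime    # seconds of silence tolerated before takeover
        self.last_heartbeat = 0.0
        self.active = "primary"

    def heartbeat(self, now):
        # Called whenever a heartbeat arrives from the primary.
        self.last_heartbeat = now

    def check(self, now):
        # Take over the load balancing resource if the primary has gone silent.
        if self.active == "primary" and now - self.last_heartbeat > self.deadtime:
            self.active = "backup"
        return self.active


monitor = HeartbeatMonitor(deadtime=10)
monitor.heartbeat(now=0)
print(monitor.check(now=5))    # last heartbeat was 5 s ago, within deadtime
monitor.heartbeat(now=8)
print(monitor.check(now=25))   # 17 s of silence exceeds deadtime
```

Timestamps are passed in explicitly so the logic can be followed without real clocks or network traffic.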
32Highly Available Load Balancer
Failover Cluster Load Balancing Resource
39Ability To Remove Failed Nodes
Ldirectord Removes Failed Nodes
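Conceptually, removing failed nodes is a periodic health-check pass over the pool of nodes. The Python sketch below shows one such pass under simplifying assumptions: the function `prune_failed_nodes` and the `node_status` table are hypothetical stand-ins, and a real monitor would probe each node over the network rather than look up a dictionary.

```python
def prune_failed_nodes(pool, is_healthy):
    """Return (remaining, removed) after one health-check pass over the pool."""
    remaining, removed = [], []
    for node in pool:
        (remaining if is_healthy(node) else removed).append(node)
    return remaining, removed


# Stand-in for a real probe (for example, fetching a known page from each node).
node_status = {"node1": True, "node2": False, "node3": True}

pool, failed = prune_failed_nodes(["node1", "node2", "node3"], node_status.get)
print(pool, failed)
```

After the pass, the load balancer would only schedule requests to the nodes left in `pool`.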
43Authoritative Data
44
- Every ten years programmers notice that the number of distributed applications is relatively small. They look at the programming interfaces and they think that the problem is that the programming model is not close enough to whatever programming model is currently in vogue (messages in the 1970s, procedure calls in the 1980s, and objects in the 1990s). A furious bout of language and protocol design takes place and a new distributed computing paradigm is announced that is compliant with the latest programming model.
- A Note on Distributed Computing, Waldo, Wyant, Wollrath, Kendall
45Objects in a Cluster
The cluster architecture does not force objects
within the cluster to send messages to each other.
46Authoritative Data
- Session data (for example, a web page shopping cart) is stored outside the cluster.
- All nodes share access to authoritative data.
- If one node fails, other nodes can resume the session using this session data.
- Objects within the cluster are not forced to communicate with each other.
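A small Python sketch may make the idea concrete. Here an in-memory `SessionStore` stands in for storage outside the cluster, and two `Node` objects keep no session state of their own; all class names, node names, and the session identifier are hypothetical.

```python
class SessionStore:
    """Stand-in for authoritative storage outside the cluster (e.g. a database)."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return list(self._data.get(session_id, []))

    def save(self, session_id, cart):
        self._data[session_id] = list(cart)


class Node:
    """A cluster node that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        # Every request reads and writes the shared store, never local memory.
        cart = self.store.load(session_id)
        cart.append(item)
        self.store.save(session_id, cart)
        return cart


store = SessionStore()
node_a, node_b = Node("node-a", store), Node("node-b", store)
node_a.add_to_cart("session-42", "book")
# Suppose node-a fails here; node-b resumes the same session from the store.
cart = node_b.add_to_cart("session-42", "pen")
print(cart)
```

Note that the two nodes never exchange messages with each other; they only talk to the store.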
47Authoritative Data Stored Outside the Cluster
HTTP POST
HTTP POST
Transaction
SQL Server
48Authoritative Data - Concurrency
- Shared Storage is implemented by a device outside the cluster
- The shared storage device (NAS, SQL Server) provides
  - Object Persistence Layer
  - Concurrency Control (Lock Arbitration)
- Eliminates the need for objects in the cluster to implement a concurrency control mechanism using messages
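To illustrate how the storage layer can arbitrate between competing nodes, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the SQL server. Two reservation attempts compete for the last unit of stock, and the database decides which one wins through a single conditional UPDATE. The table, the item name, and the `reserve` function are hypothetical, and both attempts share one connection for brevity, whereas in a real cluster each node would hold its own connection to the server.

```python
import sqlite3

# In-memory database standing in for the shared SQL server (an assumption).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, quantity INTEGER)")
db.execute("INSERT INTO stock VALUES ('widget', 1)")
db.commit()


def reserve(conn, item):
    """Try to reserve one unit; the database decides who wins."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE stock SET quantity = quantity - 1 "
            "WHERE item = ? AND quantity > 0",
            (item,),
        )
    # rowcount is 1 only if the conditional update actually changed a row.
    return cur.rowcount == 1


results = [reserve(db, "widget"), reserve(db, "widget")]
print(results)
```

The nodes never coordinate with each other; the outcome is settled entirely by the storage device.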
49The GNU/Linux Enterprise Cluster in the Data Center
57
www.linuxvirtualserver.org
www.linux-ha.org
www.systemimager.org