Automatic Clustering of Grid Nodes - PowerPoint PPT Presentation

Provided by: qian3
Learn more at: https://www2.cs.uh.edu

1
Automatic Clustering of Grid Nodes
  • Nov 14, 2005
  • Qiang Xu, Jaspal Subhlok
  • University of Houston

2
Grid Scheduler
Network links: latency, bandwidth
Computational resources: CPU, memory
I will decide which group of nodes is best for
an application!
Network Topology
3
Network Topology
  • Fine-grained physical network topology is hard to obtain, because of the
    heterogeneous, dynamic, and distributed nature of a grid system
  • We focus on the logical network topology
  • Logical network topology: the connectivity between nodes based on their
    observed behavior
  • 1) Easier to compute
  • 2) Sufficient to tackle the resource
    selection problem

4
Discover Clusters/Logical Topology
A set of nodes with IP addresses /
hostnames: what is their connectivity?
5
Discover Clusters/Logical Topology
[Figure: clusters A, B, and C, with inter-cluster distances Dist(A,B), Dist(A,C), and Dist(B,C)]
Nodes close to each other → same cluster
6
Outline
  • Introduction
  • Internet → Geometric Space
  • Automatic Clustering
  • Experiments and Results
  • Conclusion

7
Internet Topology Map 1
A macroscopic snapshot of the Internet 4 April
2005 - 17 April 2005.
8
Internet Topology Map 2
Internet map as of 1998, by Bill Cheswick (Bell
Labs) and Hal Burch (CMU)
9
Why Geometric Space?
Internet topology map: complex! → Geometric
space (N-dimensional Euclidean space)
GNP (Global Network Positioning): T. S. Eugene
Ng and Hui Zhang, INFOCOM'02
I can't tell the distance between nodes!
10
Magic Landmarks!
[Figure: landmarks and a node, labeled with measured latencies 12, 3, and 8]
Landmarks: a set of nodes distributed across the
Internet
11
Geometric Space
  1. One axis per landmark
  2. Coordinates of a node: its latency from each landmark.

[Figure: a node plotted in a three-axis space (X, Y, Z), one axis per landmark]
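A minimal Python sketch of this mapping, assuming each node's latency from every landmark has already been measured. The function name `to_coordinates`, the node and landmark names, and the latency values are hypothetical. It follows the slide literally (raw latencies used as coordinates); GNP itself fits coordinates by minimizing the error between measured and predicted distances, which this sketch does not attempt.

```python
from typing import Dict, List, Sequence

def to_coordinates(latencies: Dict[str, Dict[str, float]],
                   landmarks: Sequence[str]) -> Dict[str, List[float]]:
    """Place each node in geometric space with one axis per landmark.

    Coordinate i of a node is its measured latency (ms) from landmarks[i].
    """
    return {node: [per_landmark[lm] for lm in landmarks]
            for node, per_landmark in latencies.items()}

# Hypothetical measurements (ms): node -> {landmark -> latency}.
measured = {
    "node1": {"L1": 10.0, "L2": 5.0, "L3": 2.0},
    "node2": {"L1": 11.0, "L2": 4.0, "L3": 2.5},
}
coords = to_coordinates(measured, ["L1", "L2", "L3"])
print(coords["node1"])  # [10.0, 5.0, 2.0]
```

The landmark order fixes the axis order, so every node must be mapped with the same `landmarks` sequence for distances to be meaningful.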
12
Internet → Geometric Space
[Figure: complex Internet structure mapped to a simple geometric space]
13
Advantage of Geometric Space
  • Simple: distance in geometric space is well
    defined, e.g. the Euclidean distance
  • Scalable: for M nodes,
  • Pairwise distance among M nodes → M × M probes
  • Mapping to geometric space → M × N probes
  • N is the number of landmarks; 7 landmarks are
    known to be sufficient
  • Easy to manage: only the landmarks need to be
    controlled
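The scalability argument comes down to arithmetic. This small Python sketch computes both probe counts using the slide's M × M and M × N figures. The example sizes (36 compute nodes, 3 or 7 landmarks) are taken from elsewhere in the talk, and the function names are hypothetical.

```python
def pairwise_probes(m: int) -> int:
    """Probes to measure every pairwise distance among M nodes directly (M x M)."""
    return m * m

def landmark_probes(m: int, n: int) -> int:
    """Probes to map M nodes into a geometric space built from N landmarks (M x N)."""
    return m * n

# 36 compute nodes, as in the inter-domain experiment.
print(pairwise_probes(36))     # 1296
print(landmark_probes(36, 3))  # 108 with 3 landmarks
print(landmark_probes(36, 7))  # 252 with 7 landmarks
```

The gap widens as M grows. For example, with 1,000 nodes and 7 landmarks the counts are 1,000,000 versus 7,000.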

14
Outline
  • Introduction
  • Internet → Geometric Space
  • Automatic Clustering
  • Experiments and Results
  • Conclusion

15
Again the problem!
16
Place Nodes in Geometric Space!
How do I cluster?
[Figure: nodes placed in a simple geometric space]
17
Distance and Threshold
  • Network Distance: the Euclidean distance between node coordinates
  • Threshold
  • If Distance < Threshold, nodes belong to the same
    logical cluster
  • N is the number of landmarks
  • T is a parameter describing how close nodes have to be
    to be in the same cluster
  • For a typical domain to form one cluster, T is on the order of 1 ms
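A minimal Python sketch of the distance test, assuming Network Distance is the Euclidean distance between landmark coordinates, as in the inter-domain summary. The threshold is passed in as a plain number rather than derived from N and T, and the function names are hypothetical.

```python
import math
from typing import Sequence

def network_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two nodes' coordinates (one per landmark)."""
    if len(a) != len(b):
        raise ValueError("both nodes need one coordinate per landmark")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_cluster(a: Sequence[float], b: Sequence[float], threshold: float) -> bool:
    """Nodes share a logical cluster if their distance is below the threshold."""
    return network_distance(a, b) < threshold

print(network_distance([0.0, 0.0, 0.0], [3.0, 4.0, 0.0]))    # 5.0
print(same_cluster([10.0, 5.0, 2.0], [10.5, 5.5, 2.5], 1.0))  # True  (distance ~0.87)
print(same_cluster([10.0, 5.0, 2.0], [20.0, 5.0, 2.0], 1.0))  # False (distance 10.0)
```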


18
Build an Undirected Graph
  • All grid nodes are graph nodes
  • Add an edge between two nodes if Distance < Threshold
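A minimal Python sketch of the graph construction, assuming node coordinates are already computed and the threshold is given. It stores the graph as adjacency sets and uses the standard-library `math.dist` (Python 3.8+). The name `build_graph` and the example coordinates are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Set

def build_graph(coords: Dict[str, List[float]], threshold: float) -> Dict[str, Set[str]]:
    """Every grid node is a vertex; add an undirected edge between two nodes
    whose Euclidean distance is below the threshold."""
    graph: Dict[str, Set[str]] = {node: set() for node in coords}
    for u, v in combinations(coords, 2):        # each unordered pair once
        if math.dist(coords[u], coords[v]) < threshold:
            graph[u].add(v)                     # store the edge in both
            graph[v].add(u)                     # directions: undirected
    return graph

# Hypothetical 2-D coordinates: a and b are close, c is far away.
g = build_graph({"a": [0.0, 0.0], "b": [0.5, 0.0], "c": [5.0, 0.0]}, 1.0)
print(g)  # {'a': {'b'}, 'b': {'a'}, 'c': set()}
```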


19
Typical Case
  • An edge exists if Distance < Threshold

Clusters are obvious and easy to distinguish!
20
Pathological Case
  • Border nodes?
  • Where are the clusters?
  • General case: find the maximal cliques in the
    graph; each clique is a cluster
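The slides do not say which clique algorithm is used. One standard choice is the Bron-Kerbosch algorithm with pivoting. The Python sketch below uses that substitute and is not necessarily the authors' implementation. The example graph is hypothetical: node b is a border node adjacent to both a and c, which are not adjacent to each other.

```python
from typing import Dict, List, Set

def maximal_cliques(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting).

    graph maps each node to the set of its neighbours (undirected).
    Each returned clique would be reported as one logical cluster.
    """
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # r: current clique, p: candidates that extend it, x: already explored.
        if not p and not x:
            cliques.append(r)                   # nothing can extend r: maximal
            return
        # Pivot on the vertex covering most candidates to skip redundant branches.
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques

# Hypothetical pathological case: b is a border node adjacent to a and c,
# but a and c are not adjacent, so b ends up in two clusters.
g = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(sorted(sorted(c) for c in maximal_cliques(g)))  # [['a', 'b'], ['b', 'c']]
```

Enumerating maximal cliques takes exponential time in the worst case, which is usually acceptable for graphs of a few dozen nodes. NetworkX's `find_cliques` provides a library implementation of the same enumeration.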

21
Summary of Inter-domain Clustering
  1. Place Nodes in the geometric space.
  2. Calculate the Euclidean distance.
  3. Build a graph based on distance and Threshold.
  4. Find the maximal cliques.

Inter-domain clustering: good!
Intra-domain clustering: not good enough!
22
Intra-domain clustering
  • Nodes in the same domain but in different
    subnets
  • Short latency: less than 1 ms
  • Landmark-based approach: resolution is not
    sufficient!
  • Measurement error is comparable to the real latency
  • We need to change the approach for intra-domain
    clustering!

23
Intra-domain Clustering
  • Distance between nodes is the directly measured
    latency instead of the projected geometric
    distance
  • (M × M probes, but M is smaller and measurements
    are quick)
  • The basis for clustering is relative:
  • Distance between any two nodes inside a
    cluster is within β of the smallest distance in
    the cluster

24
Intra-domain Clustering Procedure
Initially, each node is a cluster, and each edge is
weighted by the measured latency.
REPEAT: Select the least-cost edge, say connecting
clusters A and B. If A and B are not the same
cluster, and if this edge cost is within β of the
least-cost edges inside A and B, then combine
them into one cluster.
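A minimal Python sketch of this procedure, assuming two points the slides leave open. First, "within β" is read as a multiplicative bound: the edge cost must be at most β × the least-cost internal edge of each cluster. Second, a single-node cluster imposes no bound. Visiting edges in ascending cost order with a union-find structure implements the REPEAT loop. The function name, node names, and latencies are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def intra_domain_clusters(nodes: List[str],
                          latency: Dict[Tuple[str, str], float],
                          beta: float) -> List[List[str]]:
    """Merge clusters along least-cost edges, subject to the beta condition.

    Assumption (hypothetical reading of "within beta"): an edge joining
    clusters A and B is accepted only if its cost <= beta * (least-cost
    edge inside A) and <= beta * (least-cost edge inside B). A one-node
    cluster has no internal edge and imposes no bound.
    """
    parent = {n: n for n in nodes}                      # union-find forest
    least: Dict[str, Optional[float]] = {n: None for n in nodes}  # per root

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]               # path halving
            n = parent[n]
        return n

    # REPEAT: take edges from least to greatest measured latency.
    for (u, v), cost in sorted(latency.items(), key=lambda item: item[1]):
        a, b = find(u), find(v)
        if a == b:
            continue                                    # same cluster already
        if all(m is None or cost <= beta * m for m in (least[a], least[b])):
            parent[b] = a                               # combine A and B
            known = [m for m in (least[a], least[b], cost) if m is not None]
            least[a] = min(known)                       # least internal edge
    clusters: Dict[str, List[str]] = {}
    for n in nodes:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())

# Hypothetical latencies (ms): a-b and c-d are tight pairs, cross edges ~0.3 ms.
lat = {("a", "b"): 0.09, ("c", "d"): 0.10, ("a", "c"): 0.30,
       ("a", "d"): 0.32, ("b", "c"): 0.31, ("b", "d"): 0.33}
print(intra_domain_clusters(["a", "b", "c", "d"], lat, beta=2.0))
# [['a', 'b'], ['c', 'd']]
print(intra_domain_clusters(["a", "b", "c", "d"], lat, beta=4.0))
# [['a', 'b', 'c', 'd']]
```

Because edges arrive in ascending order, a cluster's least internal edge never increases after a merge, so an edge rejected once would also be rejected if it were reconsidered later.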
25
Outline
  • Introduction
  • Internet → Geometric Space
  • Automatic Clustering
  • Experiments and Results
  • Conclusion

26
Experiments
  • Inter-Domain Clustering
  • 3 landmarks: UT (Austin), Rice, CMU
  • 36 compute nodes: Rice, UT-Dallas, TAMU-College
    Station, TAMU-Galveston
  • Intra-Domain Clustering
  • 4 clusters at the University of Houston
  • PGH201, Itanium, Opteron, Stokes
  • TCP ping (not ICMP ping) to measure latency
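The slides do not describe how their TCP ping works. A common approach, sketched here in Python under that assumption, is to time a TCP connection handshake with `socket.create_connection`, which takes roughly one round trip. A frequently cited reason to prefer this over ICMP is that firewalls often filter ICMP echo. `tcp_ping` is a hypothetical name. The demo targets a throwaway listener on the loopback interface, so it runs without network access.

```python
import socket
import time

def tcp_ping(host: str, port: int, timeout: float = 1.0) -> float:
    """Return the time in ms to open a TCP connection to host:port.

    create_connection returns once the handshake completes (SYN out,
    SYN/ACK back), so the elapsed time is roughly one round trip.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start   # measured before the close
    return elapsed * 1000.0

# Demo against a throwaway listener on the loopback interface (no network needed).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))                  # port 0: let the OS pick a free port
listener.listen()
loopback_ms = tcp_ping("127.0.0.1", listener.getsockname()[1])
listener.close()
print(f"loopback TCP ping: {loopback_ms:.3f} ms")
```

To measure real node-to-node latency, point `tcp_ping` at a port that is known to be listening on the target node and take the minimum or median of several samples.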

27
Inter-domain Cluster (2 landmarks)
  • Cannot distinguish between UT Dallas and TAMU Galveston
[Figure legend: UT Dallas, TAMU Galveston, TAMU College Station, Rice]

28
Inter-domain Cluster ( 3 landmarks)
  • 4 clusters are well distinguished
[Figure legend: UT Dallas, TAMU Galveston, TAMU College Station, Rice]

29
Inter-domain Cluster (2 landmarks)
[Figure legend: UT Dallas, TAMU Galveston, TAMU College Station, Rice]

30
Intra-domain Cluster Latency

Clusters   PGH201   Opteron   Itanium   Stokes
PGH201     0.09     0.32      0.32      0.30
Opteron    0.25     0.09      0.09      0.50
Itanium    0.30     0.10      0.10      0.35
Stokes     0.40     0.50      0.60      0.10

Latency between Nodes (ms)
31
Illustration of Intra-domain Clusters
[Figure legend: UT Dallas, TAMU Galveston, TAMU College Station, Rice]

32
Future Work
  • Integrate into a grid scheduling system
  • Use Bandwidth as a factor for clustering
  • Dynamically update logical clusters
  • Nodes behind a NAT (Network Address Translation),
    i.e. nodes with local IP addresses

33
Conclusions
  • Efficient and scalable procedure to
    hierarchically group distributed nodes into
    logical clusters
  • Validation with experiments on nodes distributed
    across Texas
  • An important step for scheduling in a grid
    environment.

34
Questions?
Thank you!