Grid performance, grid benchmarks, grid metrics presentation

1
Grid performance, grid benchmarks, grid metrics

Zsolt Németh
MTA SZTAKI Computer and Automation Research
Institute
zsnemeth_at_sztaki.hu
http://www.lpds.sztaki.hu/zsnemeth

2
Outline

What is the grid?
What is grid performance?
Are benchmarks useful?
How can grid metrics be defined?

3
What is the grid?
4
Distributed applications

A set of cooperative processes

5
Distributed applications

Processes require resources

Printer
Network
Memory
CPU
Database
Storage
Libraries
I/O devices
6
Distributed applications

Resources can be found on computational nodes

Network
Printer
CPU
Storage
Mapping
Memory
Database
I/O devices
Libraries
CPU
7
Distributed applications
Application: cooperative processes

Process control?
Security?
Naming?
Communication?
Input / output?
File access?

Physical layer: computational nodes
8
Distributed applications
Application: cooperative processes

Virtual machine
Process control?
Security?
Naming?
Communication?
Input / output?
File access?

Physical layer: computational nodes
9
Conventional distributed environments and grids

Distributed resources are virtually unified by a
software layer
A virtual machine is introduced between the
application and the physical layer
Provides a single system image to the application
Types
Conventional (PVM, some implementations of MPI)
Grid (Globus, Legion)

10
Conventional distributed environments and grids

What is the essential difference?

11
Conventional distributed environments and grids

Geographical extent?

12
Conventional distributed environments and grids

Performance?

13
Conventional distributed environments and grids

Tools and services?

14
Conventional distributed environments and grids

How is the virtual machine built up?
What does execution mean?
What is the semantics of execution?

15
Description of grid

flexible, secure, coordinated resource sharing
among dynamic collections of individuals,
institutions and resources (The Anatomy of the
Grid)
single, seamless, computational environment in
which cycles, communication and data are shared
(Legion: The Next Step Toward a Nationwide
Virtual Computer)
wide-area environment that transparently consists
of workstations, personal computers, graphic
rendering engines, supercomputers and
nontraditional devices (Legion - A View from
50,000 Feet)
collection of geographically separated resources
connected by a high speed network, a software
layer which transforms a collection of
independent resources into a single, coherent
virtual machine (Metacomputing - What's in it
for me)

16
Conventional environments

Processes
Have resource requests

Mapping
Processes are mapped onto nodes
Resource assignment is implicit

Physical level
17
Grid

Processes
Have resource requirements

Mapping
Assign nodes to resources?

Physical layer
18
Grid: the resource abstraction

Processes
Have resource needs

Physical layer
19
Grid: the user abstraction

Processes
Belong to a user

The user of the virtual machine is authorised to
use the constituent resources
but has no login access to the nodes the
resources belong to

Physical layer
Local, physical users (user accounts)

20
The grid abstraction

Semantically the grid is nothing but abstraction
Resource abstraction
Physical resources can be assigned to virtual
resource needs (matched by properties)
Grid provides a mapping between virtual and
physical resources
User abstraction
User of the physical machine may be different
from the user of the virtual machine
Grid provides a temporal mapping between virtual
and physical users
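
The resource abstraction on this slide (virtual resource needs matched to physical resources by their properties) could look roughly like the sketch below. It is only an illustration: the resource identifiers, the property names (`kind`, `mflops`, `gigabytes`) and the first-fit strategy are assumptions made for the example, not part of the presentation.

```python
from typing import Dict, List, Optional

Resource = Dict[str, object]


def matches(need: Dict[str, object], resource: Resource) -> bool:
    """A resource satisfies a need if it offers every required property.

    Numeric requirements are treated as minimum values, all others must be equal.
    """
    for key, required in need.items():
        offered = resource.get(key)
        if offered is None:
            return False
        if isinstance(required, (int, float)) and not isinstance(required, bool):
            if not isinstance(offered, (int, float)) or offered < required:
                return False
        elif offered != required:
            return False
    return True


def assign(needs: Dict[str, Dict[str, object]],
           resources: List[Resource]) -> Dict[str, Optional[object]]:
    """Map each virtual need to a distinct physical resource (first fit)."""
    free = list(resources)
    mapping: Dict[str, Optional[object]] = {}
    for name, need in needs.items():
        chosen = next((r for r in free if matches(need, r)), None)
        if chosen is not None:
            free.remove(chosen)
        mapping[name] = chosen["id"] if chosen else None
    return mapping


# Hypothetical physical resources and virtual resource needs.
resources = [
    {"id": "n1.edu/cpu0", "kind": "cpu", "mflops": 400},
    {"id": "n2.edu/cpu0", "kind": "cpu", "mflops": 900},
    {"id": "foo.com/disk0", "kind": "storage", "gigabytes": 200},
]
needs = {
    "fast_cpu": {"kind": "cpu", "mflops": 800},
    "scratch": {"kind": "storage", "gigabytes": 50},
    "any_cpu": {"kind": "cpu"},
}
mapping = assign(needs, resources)
print(mapping)
```

The process only states what it needs; which node ends up providing each resource is decided by the mapping, not by the application.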

21
Conventional distributed environments and grids
[Figure labels: Smith, 4 nodes; Smith, 4 CPU, memory, storage; Smith, 1 CPU; local accounts smith_at_n1.edu, smith_at_n1.edu, default_at_foo.com, griduser_at_mynode.hu, smith_at_n2.edu]
22
Grid performance
23
What is grid performance at all?

Performance of grid infrastructure or
performance of grid application?
Traditionally performance is
Speed
Throughput
Bandwidth, etc.
Using grids
Quantitative reasons
Qualitative reasons: QoS
Economic aspects

24
Grid performance analysis scenarios

Resource brokering: evaluate whether the
performance of a given resource is appropriate
for a certain job
At runtime: check whether a resource can maintain
an acceptable/required performance
At runtime: check whether a job evolves according
to its checkpoints
Find obvious idling/waiting spots
Find bad communication patterns
Find serious performance skew
Post mortem: see whether the brokering strategy
was correct
Etc.

25
What is grid performance at all?

supercomputer

cluster

26
What is grid performance at all?

supercomputer
task is done in 20 minutes

cluster
task is done in 12 hours

27
What is grid performance at all?

supercomputer
task is done in 20 minutes
available tomorrow night

cluster
task is done in 12 hours
available now

28
What is grid performance at all?

supercomputer
task is done in 20 minutes
available tomorrow night
costs 200/hour

cluster
task is done in 12 hours
available now
costs 15/hour
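
The trade-off on this slide can be made concrete with a small calculation. The sketch below is illustrative only: the 30-hour wait standing in for "available tomorrow night" is an assumed value, the currency is left unspecified as on the slide, and `Offer`, `turnaround_hours` and `total_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    name: str
    run_hours: float       # execution time of the task
    wait_hours: float      # delay until the resource becomes available
    price_per_hour: float  # price of one hour of use

    def turnaround_hours(self) -> float:
        """Time from submission to result: waiting plus execution."""
        return self.wait_hours + self.run_hours

    def total_cost(self) -> float:
        """Only the execution time is charged in this sketch."""
        return self.run_hours * self.price_per_hour


# Run times and prices are from the slide; wait_hours=30 is an assumption.
supercomputer = Offer("supercomputer", run_hours=20 / 60, wait_hours=30.0, price_per_hour=200.0)
cluster = Offer("cluster", run_hours=12.0, wait_hours=0.0, price_per_hour=15.0)

for offer in (supercomputer, cluster):
    print(f"{offer.name}: turnaround {offer.turnaround_hours():.1f} h, "
          f"cost {offer.total_cost():.2f}")
```

Under these assumptions the faster machine is the cheaper one but delivers the result later, so neither offer is better in every respect.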

29
What is grid performance at all?

Grid is about resource sharing
What is the benefit of sharing
acceptable for resource owners
acceptable for resource users
Speed, bandwidth, capacity, etc. are just one
aspect
Properness, fairness, effectiveness of assignment
of processes to resources

30
Grid performance
Performance?
31
Grid performance
Performance?
Virtual layer
Physical layer
Measurement
32
Grid performance
Performance?
Virtual layer
Physical layer
Measurement
33
Interaction of application and the infrastructure

Performance = application perf. × infrastructure
perf.
Signature model (Pablo group)
Application signature
e.g. instructions/FLOP
Scaling factor (capabilities of the resources)
e.g. FLOPs/second
Execution signature
= application signature × scaling factor
e.g. instructions/second = instructions/FLOP ×
FLOPs/second
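
The unit arithmetic of the signature model can be sketched as follows. The function name and the numbers are hypothetical and chosen only to show how the units cancel; this is not the Pablo group's implementation.

```python
def execution_signature(application_signature: float, scaling_factor: float) -> float:
    """Project an application signature onto a resource.

    application_signature: e.g. instructions per FLOP (a property of the application)
    scaling_factor: e.g. FLOPs per second (a capability of the resource)
    returns: e.g. instructions per second (what is observed during execution)
    """
    return application_signature * scaling_factor


# Hypothetical values for illustration.
instructions_per_flop = 2.5
flops_per_second = 4.0e8

instructions_per_second = execution_signature(instructions_per_flop, flops_per_second)
print(f"{instructions_per_second:.3e} instructions/second")
```

The same application signature gives a different execution signature on every resource, which is why the application and the infrastructure cannot be evaluated separately.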

34
Possible performance problems in grids

All that may occur in a distributed application
Plus
Effectiveness of resource brokering
Synchronous availability of resources
Resources may change during execution
Various local policies
Shared use of resources
Higher costs of some activities
The corresponding symptoms must be characterised

35
Grid performance metrics

Abstract representation of measurable quantities
M = R1 × R2 × ... × Rn
Usual metrics
Speedup, efficiency
Load, queue length, etc.
Such strict values are not characteristic of a grid
Cannot be interpreted
Cannot be compared
New metrics
Local metrics and grid metrics
Symbolic description / metrics

36
Processing monitoring information

Trace data reduction
Proportional to time t, processes P, metrics
dimension n
Statistical clustering (reducing P)
Similar temporal behaviours are classified
Questionable whether it works for grids
Representative processes are recorded for each
class
Statistical projection pursuit (reducing n)
reduces the dimension by identifying significant
metrics
Sampling frequency (reducing t)
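
Two of the reductions listed above (grouping similar processes to reduce P, and lowering the sampling frequency to reduce t) are sketched below. The sketch uses simple leader clustering with a Euclidean distance threshold as a stand-in for proper statistical clustering; the traces, the threshold and the function names are assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equally long metric time series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_processes(traces: Dict[str, Sequence[float]],
                      threshold: float) -> Dict[str, List[str]]:
    """Group processes whose traces lie within `threshold` of a class leader.

    The first process placed in a class acts as its representative.
    """
    classes: Dict[str, List[str]] = {}
    for pid, trace in traces.items():
        for leader in classes:
            if distance(trace, traces[leader]) <= threshold:
                classes[leader].append(pid)
                break
        else:
            classes[pid] = [pid]  # no similar class found, start a new one
    return classes


def downsample(trace: Sequence[float], step: int) -> List[float]:
    """Reduce t by keeping every `step`-th sample."""
    return list(trace[::step])


# Hypothetical CPU-utilisation traces of four processes.
traces = {
    "p0": [0.9, 0.9, 0.1, 0.1, 0.9, 0.9],
    "p1": [0.8, 0.9, 0.2, 0.1, 0.9, 0.8],
    "p2": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    "p3": [0.1, 0.2, 0.1, 0.1, 0.2, 0.1],
}
classes = cluster_processes(traces, threshold=0.3)
# Only the representative of each class is recorded, at a lower sampling rate.
reduced = {leader: downsample(traces[leader], step=2) for leader in classes}
print(classes)
print(reduced)
```

Here four traces of six samples shrink to two traces of three samples. Whether processes in a grid behave similarly enough for this to work is exactly the doubt raised on the slide.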

37
Performance tuning, optimisation

The execution cannot be reproduced
Post-mortem optimisation is not viable
On-line steering is necessary, though hard to
realise
Sensors and actuators
Application and implementation dependent
E.g. Autopilot, Falcon
Average behaviour of applications can be improved
Post-mortem tuning of the infrastructure (if
possible)
Brokering decisions
Supporting services

38
Grid benchmarking
39
Grid performance, resource performance

The traditional way: benchmarking
As suggested by GGF-GBRG

40
Running benchmarks

Benchmarks are executed on a virtual machine

41
Running benchmarks

Benchmarks are executed on a virtual machine
The virtual machine may change (composed of
different resources) from run to run

42
Running benchmarks

Benchmarks are executed on a virtual machine
The virtual machine may change (composed of
different resources) from run to run
A benchmark result is representative of one
particular virtual machine

43
Running benchmarks

Benchmarks are executed on a virtual machine
The virtual machine may change (composed of
different resources) from run to run
A benchmark result is representative of one
particular virtual machine
What can it show about the entire grid?
What can it show about a certain resource?

44
Grid benchmarking
Measurement
Performance?
Virtual layer
Physical layer
45
Grid metrics
46
Local metrics

Load averages, CPU user, system, idle
percentages, network bandwidth, cache hit ratio,
available memory, page faults, etc.
Performance is a trajectory in a
multi-dimensional space
Cannot be compared
Cannot be interpreted
processes 55.2, user 70, system 0, idle 30:
underloaded 64-CPU system
processes 55.2, user 70, system 30, idle 0:
64-CPU system, serious overheads
processes 72.8, user 99, system 1, idle 0:
slightly overloaded 64-CPU system
processes 4.1, user 99, system 1, idle 0:
seriously overloaded 1-CPU system
Fine details are even more complex to evaluate
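
The four examples on this slide show that raw local figures only become meaningful once the context (here the number of CPUs) is known. The sketch below turns them into symbolic descriptions; all thresholds and the function name `interpret` are illustrative assumptions, not values from the presentation.

```python
def interpret(processes: float, user: float, system: float, idle: float, cpus: int) -> str:
    """Turn raw local figures into a symbolic description.

    processes: average number of runnable processes
    user, system, idle: CPU time percentages
    cpus: number of CPUs in the system
    """
    load_per_cpu = processes / cpus
    busy = user + system
    # A large share of system time within the busy time suggests overheads.
    if busy > 0 and system / busy >= 0.25:
        return "serious overheads"
    if load_per_cpu > 2.0:
        return "seriously overloaded"
    if load_per_cpu > 1.0:
        return "slightly overloaded"
    if idle > 0:
        return "underloaded"
    return "fully loaded"


# The four records from the slide: (processes, user, system, idle, cpus).
samples = [
    (55.2, 70, 0, 30, 64),
    (55.2, 70, 30, 0, 64),
    (72.8, 99, 1, 0, 64),
    (4.1, 99, 1, 0, 1),
]
for sample in samples:
    print(sample, "->", interpret(*sample))
```

The same process count of 55.2 leads to two different verdicts, and a much smaller count of 4.1 to the worst one, which is why such figures cannot be compared across resources as they are.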

47
Local metrics, global (grid) metrics

Local metrics are transformed into some globally
understandable performance figures
What are the dimensions?
What is the transformation?

48
Global metrics

MIPS, MFLOPS, Gbit/s, etc.
Comparable, interpretable
Most users have no idea about the computing power
they really require
These are usually nominal and not actual values
Too general a characterisation: fine details are
hidden

49
Benchmark metrics

Benchmarks are for comparing computer systems
A well selected benchmark set
sensitive to different factors: CPU intensive,
communication intensive, I/O intensive jobs
able to show fine details: cache behaviour,
floating point capabilities, etc.
able to show behaviour at different levels:
instruction, loop, procedure, application
These figures can only be obtained actively: they
require time and resources

50
Benchmark metrics

Given a local database with local and benchmark
performance records:
get the local performance figures
(low cost, OS functionality)
look up the benchmark performance in the database
there may be no record for the actual local
performance
symbolic (fuzzy) interpolation
the actual benchmark figures can be estimated
actual execution of benchmarks is costly, if not
impossible
Estimated benchmark figures give a
characterisation of the system in a comparable
and interpretable way
Sounds reasonable, but it is not enough
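
The lookup-and-estimate procedure on this slide could be sketched as below. Note the substitution: the slide proposes symbolic (fuzzy) interpolation, whereas this example uses plain inverse-distance weighting as a simpler stand-in. The database contents, the choice of local metrics (load per CPU, idle fraction) and the function name are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Each record pairs a vector of local metrics with a measured benchmark figure.
Record = Tuple[Sequence[float], float]


def estimate_benchmark(local: Sequence[float], database: List[Record]) -> float:
    """Estimate the benchmark figure for `local` by inverse-distance weighting."""
    weighted = []
    for metrics, figure in database:
        d = math.dist(local, metrics)
        if d == 0.0:
            return figure  # an exact record exists, no interpolation needed
        weighted.append((1.0 / d, figure))
    total = sum(w for w, _ in weighted)
    return sum(w * figure for w, figure in weighted) / total


# Hypothetical records: (load per CPU, idle fraction) -> benchmark score.
database = [
    ((0.2, 0.8), 950.0),
    ((0.6, 0.4), 700.0),
    ((1.0, 0.0), 400.0),
]

exact = estimate_benchmark((0.6, 0.4), database)
estimated = estimate_benchmark((0.4, 0.6), database)
print(f"recorded state: {exact:.1f}, unrecorded state: {estimated:.1f}")
```

Reading the local metrics is cheap, and the estimate replaces an actual benchmark run, which is the point of keeping such a database.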

51
Benchmark metrics

Benchmarks may show actual execution performance,
but that is not enough
Real-life experiments: execution time may show no
correlation with the actual load
start every job and suffer resource starvation
wait until resources are available and start
specific jobs
Resource management policy must be taken into
consideration

52
Job startup times

corona.iif.hu, SUN Ultra Enterprise 10000, 64 CPUs
Sun Grid Engine
Time between submission and actual start
1-processor job: within 1 minute
2-processor job: mostly within 1 minute
4-processor job: 2-3 hours
8-processor job: 1-2 days
9-processor job: 1-2 days
16-processor job: 2-3 days
25-processor job: > 4-5 days
See online:
http://www.lpds.sztaki.hu/zsnemeth/apart/statistics/statistics.shtml

53
Resource performance characterisation

Execution phase: resource performance can be
characterised in the space of benchmark metrics
analyse the relationship between local metrics and
benchmark results
find the principal components
Waiting phase: a stochastic model
find the parameters of the distribution
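
For the waiting phase, finding the parameters of the distribution might look like the sketch below. The presentation does not name the distribution; an exponential model is assumed here purely for illustration, and the observed waiting times are made up.

```python
import math
from statistics import mean
from typing import Sequence


def fit_exponential(waits: Sequence[float]) -> float:
    """Maximum-likelihood estimate of the rate of an exponential distribution."""
    return 1.0 / mean(waits)


def prob_start_within(rate: float, t: float) -> float:
    """P(wait <= t) under the fitted exponential model."""
    return 1.0 - math.exp(-rate * t)


# Hypothetical waiting times (hours) observed for one class of jobs.
observed_waits = [2.0, 3.0, 2.5, 2.2, 2.8]

rate = fit_exponential(observed_waits)
p_three_hours = prob_start_within(rate, 3.0)
print(f"rate = {rate:.3f} per hour, P(start within 3 h) = {p_three_hours:.2f}")
```

A fitted parameter like `rate` is compact enough to be published alongside benchmark figures, so a broker can reason about the expected waiting time as well as the execution speed.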

54
Resource performance characterisation

These parameters (?i, ?i, t1, t2, ..., tn) can be
distributed in an information system
Interpretable: the stochastic model and the
benchmark set give an appropriate framework
Comparable: figures have the same meaning within
this framework

55
Ongoing work

Exploring the statistical properties of
benchmarks and system parameters
Intensive benchmark experiments
Getting the most out of figures
Principal component analysis: which figures are
really meaningful
Testing the stability of statistical data
http://www.lpds.sztaki.hu/zsnemeth/apart/statistics/statistics.shtml
Exploring how benchmark results can be
estimated from past measurements
Database management
Symbolic interpolation

56
Conclusion

A semantic definition for grids
the presence of user and resource abstraction
Grid performance has a more complex meaning
Resource abstraction requires abstraction in the
performance characterisation, too
separation of local (physical) and global
(virtual) metrics
benchmarking is not viable
but benchmarks can serve as metrics
Experiments with resource characterisation
