Transcript and Presenter's Notes

Title: Grid performance, grid benchmarks, grid metrics


1
Grid performance, grid benchmarks, grid metrics
  • Zsolt Németh
  • MTA SZTAKI Computer and Automation Research
    Institute
  • zsnemeth@sztaki.hu
  • http://www.lpds.sztaki.hu/zsnemeth

2
Outline
  • What is the grid?
  • What is grid performance?
  • Are benchmarks useful?
  • How can grid metrics be defined?

3
What is the grid?
4
Distributed applications
  • A set of cooperative processes

5
Distributed applications
  • Processes require resources

(Diagram: resources requested by processes include printer, network, memory, CPU, database, storage, libraries and I/O devices)
6
Distributed applications
  • Resources can be found on computational nodes

(Diagram: the same resources (network, printer, CPU, memory, storage, database, libraries, I/O devices) are found on computational nodes; a mapping assigns processes to them)
7
Distributed applications
Application layer: cooperative processes
  • Process control?
  • Security?
  • Naming?
  • Communication?
  • Input / output?
  • File access?

Physical layer: computational nodes
8
Distributed applications
Application layer: cooperative processes
  • Virtual machine
  • Process control?
  • Security?
  • Naming?
  • Communication?
  • Input / output?
  • File access?

Physical layer: computational nodes
9
Conventional distributed environments and grids
  • Distributed resources are virtually unified by a
    software layer
  • A virtual machine is introduced between the
    application and the physical layer
  • Provides a single system image to the application
  • Types
  • Conventional (PVM, some implementations of MPI)
  • Grid (Globus, Legion)

10
Conventional distributed environments and grids
  • What is the essential difference?

11
Conventional distributed environments and grids
  • Geographical extent?

12
Conventional distributed environments and grids
  • Performance?

13
Conventional distributed environments and grids
  • Tools and services?

14
Conventional distributed environments and grids
  • How is the virtual machine built up?
  • What does execution mean?
  • What is the semantics of execution?

15
Description of grid
  • flexible, secure, coordinated resource sharing
    among dynamic collections of individuals,
    institutions and resources (The Anatomy of the
    Grid)
  • single, seamless computational environment in
    which cycles, communication and data are shared
    (Legion: The Next Step Toward a Nationwide
    Virtual Computer)
  • wide-area environment that transparently consists
    of workstations, personal computers, graphic
    rendering engines, supercomputers and
    non-traditional devices (Legion: A View from
    50,000 Feet)
  • collection of geographically separated resources
    connected by a high-speed network, with a software
    layer which transforms the collection of
    independent resources into a single, coherent
    virtual machine (Metacomputing: What's in it
    for me?)

16
Conventional environments
  • Processes
  • Have resource requests
  • Mapping
  • Processes are mapped onto nodes
  • Resource assignment is implicit

Physical level
17
Grid
  • Processes
  • Have resource requirements
  • Mapping
  • Assign nodes to resources?

Physical layer
18
Grid: the resource abstraction
  • Processes
  • Have resource needs

Physical layer
19
Grid: the user abstraction
  • Processes
  • Belong to a user
  • The user of the virtual machine is authorised to
    use the constituting resources
  • but has no login access to the node the resource
    belongs to
  • Physical layer
  • Local, physical users (user accounts)

20
The grid abstraction
  • Semantically the grid is nothing but abstraction
  • Resource abstraction
  • Physical resources can be assigned to virtual
    resource needs (matched by properties)
  • Grid provides a mapping between virtual and
    physical resources
  • User abstraction
  • User of the physical machine may be different
    from the user of the virtual machine
  • Grid provides a temporal mapping between virtual
    and physical users (see the sketch below)

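As a rough illustration of the two abstractions above, here is a minimal Python sketch; all node names, property values and accounts are hypothetical.

    # Hypothetical sketch: the grid as two mappings, one from virtual
    # resource needs to physical resources (matched by properties), and
    # one from the grid user to temporary local accounts.

    resource_request = {"cpus": 4, "memory_gb": 8, "storage_gb": 100}

    physical_resources = [
        {"node": "n1.edu", "cpus": 8, "memory_gb": 16, "storage_gb": 500},
        {"node": "n2.edu", "cpus": 2, "memory_gb": 4, "storage_gb": 100},
    ]

    def matches(request, resource):
        # A physical resource satisfies a virtual need if every requested
        # property is available in sufficient quantity.
        return all(resource.get(key, 0) >= value for key, value in request.items())

    candidates = [r["node"] for r in physical_resources if matches(resource_request, r)]

    # User abstraction: the grid user is temporarily mapped to local accounts.
    user_mapping = {"mynode.hu": "griduser", "foo.com": "default"}

    print(candidates)                 # ['n1.edu']
    print(user_mapping["mynode.hu"])  # griduser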
21
Conventional distributed environments and grids
(Diagram: user Smith's resource requests (4 nodes; 4 CPUs, memory and storage; 1 CPU) map to his own accounts smith@n1.edu and smith@n2.edu in a conventional environment, but to local accounts such as default@foo.com and griduser@mynode.hu in a grid)
22
Grid performance
23
What is grid performance at all?
  • Performance of grid infrastructure or
    performance of grid application?
  • Traditionally performance is
  • Speed
  • Throughput
  • Bandwidth, etc.
  • Using grids
  • Quantitative reasons
  • Qualitative reasons: QoS
  • Economic aspects

24
Grid performance analysis scenarios
  1. Resource brokering: evaluate whether a given
    resource is appropriate for a certain job
  2. At runtime: check if a resource can maintain an
    acceptable/required performance
  3. At runtime: check if a job can evolve according
    to checkpoints
  4. Find obvious idling/waiting spots
  5. Find bad communication patterns
  6. Find serious performance skew
  7. Post mortem: see if the brokering strategy was
    correct
  8. Etc.

25
What is grid performance at all?
  • supercomputer
  • cluster

26
What is grid performance at all?
  • supercomputer
  • task is done in 20 minutes
  • cluster
  • task is done in 12 hours

27
What is grid performance at all?
  • supercomputer
  • task is done in 20 minutes
  • available tomorrow night
  • cluster
  • task is done in 12 hours
  • available now

28
What is grid performance at all?
  • supercomputer
  • task is done in 20 minutes
  • available tomorrow night
  • costs 200/hour
  • cluster
  • task is done in 12 hours
  • available now
  • costs 15/hour

29
What is grid performance at all?
  • Grid is about resource sharing
  • What is the benefit of sharing?
  • acceptable for resource owners
  • acceptable for resource users
  • Speed, bandwidth, capacity, etc. is just one
    aspect
  • Appropriateness, fairness and effectiveness of the
    assignment of processes to resources

30
Grid performance
(Diagram: where is performance defined?)
31
Grid performance
(Diagram: performance concerns the virtual layer,
while measurement happens at the physical layer)
32
Grid performance
(Diagram: performance concerns the virtual layer,
while measurement happens at the physical layer)
33
Interaction of application and the infrastructure
  • Performance: application perf. × infrastructure
    perf.
  • Signature model (Pablo group)
  • Application signature
  • e.g. instructions/FLOP
  • Scaling factor (capabilities of the resources)
  • e.g. FLOPs/second
  • Execution signature
  • application signature × scaling factor
  • e.g. instructions/second = instructions/FLOP ×
    FLOPs/second (see the sketch below)

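A tiny sketch of the signature arithmetic above; the numeric values are invented purely for illustration.

    # Signature model arithmetic (invented values): the execution signature
    # is the application signature scaled by the capability of the resource.
    app_signature = 12.0       # instructions per FLOP (property of the application)
    scaling_factor = 2.0e9     # FLOPs per second (property of the resource)

    execution_signature = app_signature * scaling_factor    # instructions per second
    print(f"{execution_signature:.2e} instructions/second")  # 2.40e+10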
34
Possible performance problems in grids
  • All that may occur in a distributed application
  • Plus
  • Effectiveness of resource brokering
  • Synchronous availability of resources
  • Resources may change during execution
  • Various local policies
  • Shared use of resources
  • Higher costs of some activities
  • The corresponding symptoms must be characterised

35
Grid performance metrics
  • Abstract representation of measurable quantities
  • M ⊆ R1 × R2 × ... × Rn (see the sketch below)
  • Usual metrics
  • Speedup, efficiency
  • Load, queue length, etc.
  • Such strict values are not characteristic in grids
  • Cannot be interpreted
  • Cannot be compared
  • New metrics
  • Local metrics and grid metrics
  • Symbolic description / metrics

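One way to picture the product-space definition above is as a point of measured quantities, and a performance trace as a trajectory of such points over time. A minimal sketch with hypothetical dimensions:

    # Hypothetical metric space M = R1 x R2 x R3: one sample is a point,
    # and a performance trace is a trajectory of such points over time.
    from typing import NamedTuple

    class Sample(NamedTuple):
        load: float          # R1: load average
        idle_pct: float      # R2: CPU idle percentage
        net_mbit_s: float    # R3: network bandwidth

    trajectory = [
        Sample(10.2, 55.0, 320.0),
        Sample(48.7, 4.0, 910.0),
        Sample(60.1, 0.0, 870.0),
    ]
    print(trajectory[-1])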
36
Processing monitoring information
  • Trace data reduction
  • Proportional to time t, number of processes P, and
    metric dimension n
  • Statistical clustering (reducing P; see the sketch
    below)
  • Similar temporal behaviours are classified
  • Questionable whether this works for grids
  • Representative processes are recorded for each
    class
  • Statistical projection pursuit (reducing n)
  • reduces the dimension by identifying significant
    metrics
  • Sampling frequency (reducing t)

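A rough sketch of the clustering idea (reducing P) referred to above, assuming NumPy; the traces, the cluster count and the initialisation are made up for illustration.

    # Made-up example: group processes with similar temporal behaviour and
    # keep one representative trace per class (plain k-means over whole traces).
    import numpy as np

    rng = np.random.default_rng(0)
    P, T = 8, 100                          # processes, time steps
    traces = rng.normal(size=(P, T))       # one metric per process over time
    traces[4:] += 3.0                      # inject two crude behaviour classes

    k = 2
    centres = traces[[0, P - 1]].copy()    # crude initialisation, one per class
    for _ in range(10):
        dist = ((traces[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        centres = np.stack([traces[labels == j].mean(axis=0) for j in range(k)])

    # Representative process per class: the member closest to its centre.
    reps = [int(np.where(labels == j)[0][dist[labels == j, j].argmin()])
            for j in range(k)]
    print(labels, reps)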
37
Performance tuning, optimisation
  • The execution cannot be reproduced
  • Post-mortem optimisation is not viable
  • On-line steering is necessary, though hard to
    realise
  • Sensors and actuators
  • Application and implementation dependent
  • E.g. Autopilot, Falcon
  • Average behaviour of applications can be improved
  • Post-mortem tuning of the infrastructure (if
    possible)
  • Brokering decisions
  • Supporting services

38
Grid benchmarking
39
Grid performance, resource performance
  • The traditional way: benchmarking
  • As suggested by GGF-GBRG

40
Running benchmarks
  • Benchmarks are executed on a virtual machine

41
Running benchmarks
  • Benchmarks are executed on a virtual machine
  • The virtual machine may change (composed of
    different resources) from run to run

42
Running benchmarks
  • Benchmarks are executed on a virtual machine
  • The virtual machine may change (composed of
    different resources) from run to run
  • The benchmark result is representative of one
    particular virtual machine

43
Running benchmarks
  • Benchmarks are executed on a virtual machine
  • The virtual machine may change (composed of
    different resources) from run to run
  • The benchmark result is representative of one
    particular virtual machine
  • What can it show about the entire grid?
  • What can it show about a certain resource?

44
Grid benchmarking
(Diagram: benchmark measurement happens at the virtual layer; what does the result say about the performance of the physical layer?)
45
Grid metrics
46
Local metrics
  • Load averages, CPU user, system, idle
    percentages, network bandwidth, cache hit ratio,
    available memory, page faults, etc.
  • Performance is a trajectory in a
    multi-dimensional space
  • Cannot be compared
  • Cannot be interpreted
  • processes 55.2, user 70%, system 0%, idle 30%:
    an underloaded 64-CPU system
  • processes 55.2, user 70%, system 30%, idle 0%:
    a 64-CPU system with serious overheads
  • processes 72.8, user 99%, system 1%, idle 0%:
    a slightly overloaded 64-CPU system
  • processes 4.1, user 99%, system 1%, idle 0%:
    a seriously overloaded 1-CPU system
  • Fine details are even more complex to evaluate
    (see the sketch below)

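The records below restate the slide's examples, with the interpretation made explicit as comments; normalising by CPU count is only one crude sketch of why the raw figures cannot be compared directly.

    # The slide's example records: the same raw load average means very
    # different things depending on the number of CPUs, so a first step is
    # to normalise by CPU count (a deliberately crude illustration).
    records = [
        {"cpus": 64, "load": 55.2, "user": 70, "system": 0,  "idle": 30},  # underloaded
        {"cpus": 64, "load": 55.2, "user": 70, "system": 30, "idle": 0},   # serious overheads
        {"cpus": 64, "load": 72.8, "user": 99, "system": 1,  "idle": 0},   # slightly overloaded
        {"cpus": 1,  "load": 4.1,  "user": 99, "system": 1,  "idle": 0},   # seriously overloaded
    ]

    for r in records:
        load_per_cpu = r["load"] / r["cpus"]
        print(f"load/CPU {load_per_cpu:.2f}, system time {r['system']}%")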
47
Local metrics, global (grid) metrics
  • Local metrics are transformed into some globally
    understandable performance figures
  • What are the dimensions?
  • What is the transformation?

48
Global metrics
  • MIPS, MFLOPS, Gbit/s, etc.
  • Comparable, interpretable
  • Most users have no idea about the computing power
    they really require
  • These are usually nominal and not actual values
  • Too general a characterisation: fine details are
    hidden

49
Benchmark metrics
  • Benchmarks are for comparing computer systems
  • A well-selected benchmark set
  • sensitive to different factors: CPU intensive,
    communication intensive, I/O intensive jobs
  • able to show fine details: cache behaviour,
    floating point capabilities, etc.
  • able to show behaviour at different levels:
    instruction, loop, procedure, application
  • These figures can be obtained actively, which
    requires time and resources

50
Benchmark metrics
  • Given a local database with local and benchmark
    performance records:
  • get the local performance figures
  • low cost, OS functionality
  • look up the database for benchmark performance
  • there may not be a record for the actual local
    performance
  • symbolic (fuzzy) interpolation
  • the actual benchmark figures can be estimated
    (see the sketch below)
  • actual execution of benchmarks is costly, if not
    impossible
  • Estimated benchmark figures give a
    characterisation of the system in a comparable
    and interpretable way
  • Sounds reasonable, but not enough

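A minimal sketch of the estimation step described above, using a hypothetical history of (load, benchmark score) records and simple inverse-distance weighting in place of a real symbolic/fuzzy interpolation.

    # Hypothetical records: (load average per CPU, benchmark score).
    # Instead of re-running the costly benchmark, estimate its result for
    # the current load by interpolating over past measurements.
    history = [
        (0.1, 950.0),
        (0.5, 620.0),
        (0.9, 310.0),
        (1.3, 180.0),
    ]

    def estimate(load, records, eps=1e-6):
        # Inverse-distance weighted interpolation over the recorded loads.
        weights = [1.0 / (abs(load - past_load) + eps) for past_load, _ in records]
        total = sum(w * score for w, (_, score) in zip(weights, records))
        return total / sum(weights)

    print(round(estimate(0.7, history), 1))   # falls between the 0.5 and 0.9 records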
51
Benchmark metrics
  • Benchmarks may show the actual execution
    performance, but that is not enough
  • Real-life experiments: execution time may show no
    correlation with the actual load
  • start every job and suffer resource starvation
  • wait until resources are available and start
    specific jobs
  • Resource management policy must be taken into
    consideration

52
Job startup times
  • corona.iif.hu, Sun Ultra Enterprise 10000, 64 CPUs
  • Sun Grid Engine
  • Time between submission and actual start
  • 1-processor job: within 1 minute
  • 2-processor job: mostly within 1 minute
  • 4-processor job: 2-3 hours
  • 8-processor job: 1-2 days
  • 9-processor job: 1-2 days
  • 16-processor job: 2-3 days
  • 25-processor job: > 4-5 days
  • See online:
  • http://www.lpds.sztaki.hu/zsnemeth/apart/statistics/statistics.shtml

53
Resource performance characterisation
  • Execution phase: resource performance can be
    characterised in the space of benchmark metrics
  • analyse the relationship between local metrics and
    benchmark results
  • find the principal components
  • Waiting phase: a stochastic model
  • find the parameters of the distribution (see the
    sketch below)

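A small sketch of the waiting-phase idea above: fit the parameters of a chosen waiting-time distribution from observed samples so the figures can be published. The samples and the exponential model are assumptions for illustration only.

    # Invented waiting-time samples (minutes); an exponential model is
    # assumed purely for illustration, with its rate fitted from the sample mean.
    import statistics

    waits_minutes = [0.5, 0.8, 1.0, 150.0, 170.0, 1800.0, 2600.0]

    mean_wait = statistics.fmean(waits_minutes)
    rate = 1.0 / mean_wait   # maximum-likelihood rate for an exponential model

    print(f"mean wait {mean_wait:.1f} min, fitted rate {rate:.5f} 1/min")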
54
Resource performance characterisation
  • These parameters (the fitted distribution
    parameters and t1, t2, ..., tn) can be
    distributed in an information system
  • Interpretable: the stochastic model and the
    benchmark set give an appropriate framework
  • Comparable: figures have the same meaning within
    this framework

55
Ongoing work
  • Exploring the statistical properties of
    benchmarks and system parameters
  • Intensive benchmark experiments
  • Getting the most out of figures
  • Principal component analysis: which figures are
    really meaningful
  • Testing the stability of statistical data
  • http://www.lpds.sztaki.hu/zsnemeth/apart/statistics/statistics.shtml
  • Exploring how benchmark results can be
    estimated from past measurements
  • Database management
  • Symbolic interpolation

56
Conclusion
  • A semantic definition for grids: the presence of
    user and resource abstraction
  • Grid performance has a more complex meaning
  • Resource abstraction requires abstraction in the
    performance characterisation, too
  • separation of local (physical) and global
    (virtual) metrics
  • benchmarking is not viable
  • but benchmarks can serve as metrics
  • Experiments with resource characterisation