Title: PerformanceDriven Processor Allocation
1Performance-Driven Processor Allocation
- Julita Corbalan, Xavier Martorell, Jesus Labarta
- juli,xavim,jesus_at_ac.upc.es
- DAC-UPC
2Objective
- Scheduling parallel applications in Shared Memory
Multiprogrammed systems - Allocate processors to applications that
can take advantage of them - Implemented in an SGI Origin2000 with 64
processors
3Outline
- Introduction Related Work
- NANOS Execution Environment
- Performance-Driven Processor AllocationPDPA
- Evaluation
- Conclusions Future Work
4Introduction
- Scheduling problem allocate processors to
applications - Space-Sharing / Time-Sharing
- Number of processes Number of Processors
- Process Control Tucker89
- Space-sharing approaches
- P fixed at submission time
- FCFS, SJF, SCDF Majumdar88,...
- P defined at execution time (Adaptive / Dynamic)
- Equal-allocation of the resources Equipartition
McCan93 - Processor allocation proportional to the
application performance
5Introduction (2)
- Processor allocation proportional to application
performance - Drawback Application performance is not known
before its execution - Solution Calculate it a priori
- Executing several times with different P and
input data - Extrapolate the values based on a few samples
- These approaches may not be valid
- Application performance depends on run-time
parameters Initial data placement, process
migrations, distance between processors and
memory, - It can be impracticable e.g. infinite input data
sets
6Related Work
- Dynamic performance analysis
- Self-Tuning Nguyen96, efficiency calculated at
run-time as a function of idleness, system and
communication overhead - Adaptive/Dynamic processor allocation policies
- Equal_efficiency Nguyen96, tries to achieve the
same efficiency on all processors - Dynamic Allocation, based on the idleness
McCann93 - Allocates the knee of the efficiency/execution
time curve Eager89
7Our proposal
- We propose
- Dynamic performance analysis
- Real speedup
- Calculated at run-time
- Allocate processors to applications that can
take advantage of them - Dynamic partitioning
- Cost conscious re-allocations (memory locality)
- Really efficient use of processors
- Dynamic multiprogramming level
- Coordination between the medium long term
schedulers
8Outline
- Introduction Related work
- NANOS Execution Environment
- Performance-Driven Processor AllocationPDPA
- Evaluation
- Conclusions Future Work
9NANOS Execution Environment
-Controls the application arrival -Coordinated
with the CPU Manager
FCFS
Queued applications
OpenMP Parallel Applications (malleable)
Queueing System
Start new application
-Implements the scheduling policy -Informs the
applications about its decisions -Enforces the
processor allocation
New application?
-Request processors -Informs about its performance
Proc. request, speedup
CPU Manager
Proc. allocated
Resume, bind, ...
SelfAnalyzer
Operating System
Shared Memory Multiprocessor
.
10Outline
- Introduction Related work
- NANOS Execution Environment
- Performance-Driven Processor Allocation PDPA
- Dynamic Performance Analysis SelfAnalyzer
- Performance-Driven Processor Allocation policy
- Dynamic Multiprogramming Level
- Evaluation
- Conclusions Future Work
11Dynamic Performance Analysis SelfAnalyzer
- Tool to estimate the application speedup and
execution time
- Based on iterative parallel applications
- Source code available
- SelfAnalyzer calls inserted by the user or the
compiler - Source code not available
- Dynamic Periodicity Detection
- SelfAnalyzer dynamically loaded
12Dynamic Performance Analysis SelfAnalyzer(2)
- Speedup calculated as the relationship between
T(1) and T(P)
Serialization!!
13Performance-Driven Processor Allocation
- Space-Sharing
- Allocation for acceptable efficiency (S(p)/p)
- In the range low_eff , high_eff 50-70
- Run-To-Completion
- Minimum allocation of one processor
- Dynamic partitioning, re-allocations when
- Applications inform about their speedups
- Application arrival/Application end
- Remembers the application state
- Allocation, performance
14Performance-Driven Processor Allocation(2)
- Policy parameters step, low_eff and high_eff
NewAppl Pmin(Free Proc., Proc. Requested)
NO_REF
Eff(p)lthigh_eff Eff(p)gtlow_eff
DEC
STABLE
INC
15Dynamic Multiprogramming Level
- Multiprogramming level (ML)
- Number of applications running concurrently
- Static/Dynamic ML
- Coordination between the medium long term
schedulers - If (new_appl_fits()?)
start_new_appl() - new_appl_fits() defined by the scheduling policy
- Free processors during several quanta
- start_new_appl() implemented by the queuing
system
16Outline
- Introduction Related work
- NANOS Execution Environment
- Performance-Driven Processor AllocationPDPA
- Evaluation
- Processor Allocation Policies
- Applications Workloads
- Execution Time Processor Allocation
- Conclusions Future Work
17Processor Allocation Policies
- Equip equal CPUs to each running application
- PDPA DML our proposal
- Equal_eff equal efficiency in all the processors
- SGI-MP native IRIX Scheduler
- MP_BLOCKTIME200000
- OMP_DYNAMICTRUE
18Applications Workloads
- Architecture System
- SGI Origin2000 with 64 processors IRIX 6.5.8
- Applications Open MP
- Swim(44.2), Bt(20.85), Hydro2d(6.3), apsi(1)
- Workloads
- Multiprogramming Level set to 4
- Request 32 processors each application
19Exec.Time Proc. Allocation
Limited processor allocation
Total execution time reduced
Appl. exc. time slightly increased
20Exec.Time Proc. Allocation
Performance affected by the multiprogrammed
execution
Total exec. Time improved
Allocation proportional to the performance
21SGI vs. PDPA
4476 vs. 4 processes migrations !!!!
Processor Affinity Process Control
22PDPA behavior (zoom)
Tuning algorithm
23Outline
- Introduction Related Work
- NANOS Execution Environment
- Performance-Driven Processor AllocationPDPA
- Evaluation
- Conclusions Future Work
24Conclusions
- It is important to provide an accurate
performance information - SelfAnalyzer dynamic, accurate, easy to use
- PDPA allocates processors to applications that
can take advantage of them - The Dynamic Multiprogramming Level improves the
system performance - Coordinating the medium long term schedulers
25Future Work
- Dynamic performance analysis
- Non-iterative applications
- PDPA
- Space SharingTime Sharing
- Evaluation in a open environment
- Step, low_eff and high_eff need further research
- Number of reallocations limited
- Coordination medium long term schedulers
- New policies
26More contact info...
- http//www.ac.upc.es/NANOS
- http//www.ac.upc.es/homes/juli
- juli_at_ac.upc.es
27Related Work
- Dynamic performance analysis
- Self-Tuning Nguyen96, efficiency calculated at
run-time as a function of idleness, system and
communication overhead - Dynamic processor allocation policies
- Equal_efficiency Nguyen96, tries to achieve the
same efficiency on all processors - Dynamic Allocation, based on the idleness
McCann93 - Allocates the knee of the efficiency/execution
time curve Eager89
It does not calculate the real speedup
It does not ensure an efficient use of processors
Excessive number of reallocations
Uses a priori information
28Performance-Driven Processor Allocation(3)
- Advantages
- PDPA works with run-time information
- Ensures that processors are always efficiently
used - Drawbacks
- The tuning algorithm can introduce overhead
inside the application - Dynamic step
- Some processors can remain unallocated
- Dynamic Multiprogramming Level