Designing Parallel Operating Systems via Parallel Programming (PowerPoint Presentation Transcript)
1
Designing Parallel Operating Systems via Parallel Programming
Eitan Frachtenberg (1), Kei Davis (1), Fabrizio Petrini (1), Juan Fernández (1,2) and José Carlos Sancho (1)
(1) Performance and Architecture Lab (PAL), CCS-3 Modeling, Algorithms and Informatics, Los Alamos National Laboratory, NM 87545, USA - URL: http://www.c3.lanl.gov
(2) Grupo de Arquitectura y Computación Paralelas (GACOP), Dpto. Ingeniería y Tecnología de Computadores, Universidad de Murcia, 30071 Murcia, SPAIN - URL: http://www.ditec.um.es
email: juanf@um.es
2
Motivation
  • Clusters have been the most successful player in
    high-performance computing in the last decade

[Figure: cluster organization - HARDWARE: independent nodes connected by a high-speed network; SOFTWARE: a commodity OS on every node, with parallel applications and system software on top]
3
Motivation
  • Ever-increasing demand for computing capability
    is driving the construction of ever-larger
    clusters

[Figure: Earth Simulator, 5120 processors; Thunder (LLNL), 4096 processors; ASCI Q (LANL), 8192 processors]
Systems are becoming more complex, less efficient
and less reliable
4
Motivation
  • Clusters are loosely-coupled systems used for
    solving inherently tightly-coupled problems
  • Parallel software keeps all the pieces together
  • Development of parallel software is a time- and
    resource-consuming task due to its complexity

PROBLEM: parallel software has neither evolved
nor scaled in step with cluster sizes
SOLUTION: a new approach to the design of parallel
software for large-scale clusters
5
Goals
  • Target
  • New methodology for the design of parallel
    software
  • Simplicity, performance, scalability, reliability
  • Backbone to integrate all nodes into a parallel
    OS
  • Vision
  • BSP-like system running MIMD applications
    (variable granularity on the order of hundreds
    of μs)
  • Approach
  • BSP-like global control and coordination of all
    system activities
  • Small set of collective communication primitives
    for global coordination

6
Outline
  • Motivation and Goals
  • Toward a Parallel Operating System
  • Core Primitives
  • Parallel Software Design
  • Case Studies
  • Concluding Remarks

7
Toward a Parallel OS
  • Designing a Parallel OS
  • Lack of global coordination (loose coupling)
  • Redundant/missing functionality (complexity)

[Figure: current software stack - Resource Management, Parallel Application, ..., Parallel File System, each built on its own communication protocol (Comm Protocol 1, Comm Protocol 2, ..., Comm Protocol N) directly atop the Hardware]
8
Toward a Parallel OS
  • Scientific applications are tightly coupled
  • Data dependencies between nodes
  • They exchange messages very often
  • but the processing nodes are bolted together
    in a loosely coupled fashion

Need for global control and coordination of all
the system activities, enforced by global
collective communication primitives
9
Toward a Parallel OS
  • Designing a Parallel OS
  • System-level, global control and coordination of
    all application and system software activities

10
Toward a Parallel OS
  • Parallel applications use point-to-point and
    collective communication
  • System software tasks are either collective
    operations or can be cast in terms of them

Parallel applications and system software can be
built atop the same communication primitives
11
Toward a Parallel OS
  • Designing a Parallel OS
  • Least common denominator of system and
    application software → Core Primitives

[Figure: proposed software stack - Resource Management, Parallel Application, ..., Parallel File System and the communication protocols (Comm Protocol 1 ... Comm Protocol N) all share a global control and coordination layer built on the Core Primitives, directly atop the Hardware]
12
Outline
  • Motivation and Goals
  • Toward a Parallel Operating System
  • Core Primitives
  • Parallel Software Design
  • Case Studies
  • Concluding Remarks

13
Core Primitives
  • Parallel software built atop three primitives
  • Xfer-And-Signal
  • Transfer block of data to a set of nodes
  • Optionally signal local/remote event upon
    completion
  • Test-Event
  • Poll local event
  • Compare-And-Write
  • Compare global variable on a set of nodes
  • Optionally write global variable on the same set
    of nodes
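
A minimal C sketch of what the three primitives'
interfaces could look like. The presentation does not
give a concrete API (the implementation sits on QsNet),
so every name and signature below is a hypothetical
illustration:

/* Hypothetical interfaces for the three core primitives
 * (illustrative only; not the QsNet/Elan API). */
#include <stddef.h>
#include <stdbool.h>

typedef int node_set_t;  /* opaque handle naming a set of nodes */
typedef int event_t;     /* opaque handle naming a local/remote event */
#define EVENT_NONE ((event_t)-1)   /* "do not signal" marker */

typedef enum { CMP_LT, CMP_LE, CMP_EQ, CMP_GE, CMP_GT } cmp_op_t;

/* Transfer a block of data to every node in 'dests';
 * optionally signal 'local_ev' at the source and
 * 'remote_ev' at each destination upon completion. */
void xfer_and_signal(const void *buf, size_t len, node_set_t dests,
                     event_t local_ev, event_t remote_ev);

/* Non-blocking poll of a local event: true once it has fired. */
bool test_event(event_t ev);

/* Compare the global variable 'var_id' against 'value' on
 * every node in 'nodes'; if the comparison holds globally,
 * optionally write 'new_value' to it on the same set of nodes. */
bool compare_and_write(int var_id, cmp_op_t op, int value,
                       node_set_t nodes, bool do_write, int new_value);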

14
Core Primitives
  • Parallel software built atop three primitives
  • Xfer-And-Signal (QsNet)
  • Node S transfers block of data to nodes D1, D2,
    D3 and D4

[Figure: node S multicasts a data block to destination nodes D1-D4]
15
Core Primitives
  • Parallel software built atop three primitives
  • Xfer-And-Signal (QsNet)
  • Node S transfers block of data to nodes D1, D2,
    D3 and D4
  • Events triggered at source and destinations

[Figure: the same multicast, with completion events triggered at S and at D1-D4]
16
Core Primitives
  • Parallel software built atop three primitives
  • Compare-And-Write (QsNet)
  • Node S compares variable V on nodes D1, D2, D3
    and D4

[Figure: node S issues a global comparison of variable V on nodes D1-D4]
  • Is V <, ≤, ≥, or > Value?

17
Core Primitives
  • Parallel software built atop three primitives
  • Compare-And-Write (QsNet)
  • Node S compares variable V on nodes D1, D2, D3
    and D4
  • Partial results are combined in the switches

[Figure: partial comparison results are combined in the network switches on their way back to S]
18
Outline
  • Motivation and Goals
  • Toward a Parallel Operating System
  • Core Primitives
  • Parallel Software Design
  • Case Studies
  • Concluding Remarks

19
Toward a Parallel OS
  • Global control/coordination of all system
    activities
[Figure: timeline of one time slice - Global Strobe (time slice starts) → Task 1 → Global Synchronization → Task 2 → Global Synchronization → Task 3 → Global Strobe (time slice ends); the time slice lasts hundreds of μs]

20
Parallel Software Design
  • Using the core primitives
  • Global control and coordination
  • Strobe sent at regular intervals (time slices)
  • Compare-And-Write + Xfer-And-Signal (Master)
  • Test-Event (Slaves)
  • All system activities are tightly coupled
  • Global information is required to schedule
    resources; global synchronization facilitates the
    task but is not enough on its own
  • Global resource scheduling
  • Exchange of requirements/restrictions
  • Xfer-And-Signal + Test-Event
  • Resource scheduling (see the sketch below)
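
A hedged C sketch of the strobe mechanism, reusing the
hypothetical interfaces from the Core Primitives
section; READY_VAR, TIME_SLICE_US and
wait_microseconds() are invented for the example:

/* Master: announce each time slice with a global strobe. */
#define READY_VAR     0     /* global variable id: nodes flag readiness */
#define TIME_SLICE_US 300   /* slice length: hundreds of microseconds */

extern void wait_microseconds(unsigned us);  /* assumed platform timer */

void master_strobe_loop(node_set_t all_nodes, event_t strobe_ev) {
    int slice = 0;
    for (;;) {
        /* Compare-And-Write: proceed once every node has raised
         * READY_VAR to the current slice (combined in the switches). */
        while (!compare_and_write(READY_VAR, CMP_GE, slice,
                                  all_nodes, false, 0))
            ;
        /* Xfer-And-Signal: multicast the strobe to all nodes. */
        xfer_and_signal(&slice, sizeof slice, all_nodes,
                        EVENT_NONE, strobe_ev);
        slice++;
        wait_microseconds(TIME_SLICE_US);
    }
}

/* Slave: Test-Event polls the local strobe event. */
void slave_wait_for_strobe(event_t strobe_ev) {
    while (!test_event(strobe_ev))
        ;
}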

21
Parallel Software Design
[Figure: SYSTEM SOFTWARE]
22
Parallel Software Design
  • Using the core primitives

23
Parallel Software Design
Can we really build system software using this
new approach?
24
Outline
  • Motivation and Goals
  • Toward a Parallel Operating System
  • Core Primitives
  • Parallel Software Design
  • Case Studies
  • Concluding Remarks

25
Case Studies
  • Experimental Setup

26
Case Studies
  • STORM (Scalable TOol for Resource Management)
  • Architecture
  • Set of dæmons running on the management/compute
    nodes
  • Built atop the three core primitives
  • BSP-like behavior: management activities are
    synchronized and scheduled every few hundred
    microseconds
  • Functionality
  • Job Launching
  • Job Scheduling (FCFS, gang scheduling and others)
  • New scheduling algorithms can be plugged in
  • Resource Accounting

27
Case Studies
  • Job Launching: send/execute/check for completion
    (sketched below)
  • 40 times faster than the best reported
    result!
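
The three launch steps map directly onto the
primitives. A hedged sketch (JOB_DONE_VAR, the daemon
behavior, and the reuse of TIME_SLICE_US and
wait_microseconds() from the strobe sketch are
assumptions, not STORM's actual code):

/* Hypothetical sketch of STORM-style job launching. */
#define JOB_DONE_VAR 1   /* global variable id: raised by each node on exit */

void launch_job(const void *image, size_t len, node_set_t nodes,
                event_t image_arrived, int job_id) {
    /* Send: multicast the executable image to all compute nodes. */
    xfer_and_signal(image, len, nodes, EVENT_NONE, image_arrived);

    /* Execute: each node's daemon sees 'image_arrived' fire, forks the
     * job, and on exit raises its copy of JOB_DONE_VAR to job_id. */

    /* Check for completion: one global query per time slice tells
     * the master whether every node has finished. */
    while (!compare_and_write(JOB_DONE_VAR, CMP_GE, job_id,
                              nodes, false, 0))
        wait_microseconds(TIME_SLICE_US);
}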

28
Case Studies
  • BCS-MPI (Buffered CoScheduled MPI)
  • Architecture
  • Set of cooperative threads running on the NIC
  • Built atop the three core primitives
  • BSP-like behavior: communications are
    synchronized and scheduled every few hundred
    microseconds, as sketched below
  • Functionality
  • Subset of the MPI standard
  • Paves the way to provide
  • Traffic segregation
  • Deterministic replay of user applications
  • System-level fault tolerance
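
A hedged sketch of the buffered send path: the blocking
call only enqueues a descriptor, and the NIC-resident
threads move the data in a later time slice. The
descriptor layout and helpers are invented, not
BCS-MPI's real internals:

/* Hypothetical sketch: deferred, slice-scheduled blocking send. */
typedef struct {
    const void *buf;   /* user buffer to transmit */
    size_t      len;
    int         dest;  /* destination rank */
    int         tag;
    event_t     done;  /* fires once the scheduled transfer completes */
} send_desc_t;

extern event_t alloc_event(void);                /* assumed helper */
extern void    post_descriptor(send_desc_t *d);  /* hand off to NIC threads */

int bcs_send(const void *buf, size_t len, int dest, int tag) {
    send_desc_t d = { buf, len, dest, tag, alloc_event() };
    post_descriptor(&d);
    /* MPI_Send semantics: block until a future slice moves the data. */
    while (!test_event(d.done))
        ;
    return 0;
}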

29
Case Studies
  • SWEEP3D and SAGE Performance (IA32)
  • Production-level MPI versus BCS-MPI

[Charts: SWEEP3D and SAGE runtimes, production-level MPI vs. BCS-MPI - labeled 0.5% speedup and 2% speedup respectively]
30
Outline
  • Motivation and Goals
  • Toward a Parallel Operating System
  • Core Primitives
  • Parallel Software Design
  • Case Studies
  • Concluding Remarks

31
Concluding Remarks
  • Methodology for designing parallel software
  • Coordination of all system and application
    software activities in a BSP-like fashion
  • Parallel applications and system software built
    atop a basic set of collective primitives for
    global coordination
  • Backbone to integrate all nodes into a parallel
    OS
  • Promising preliminary results demonstrate that
    this approach is indeed feasible

32
Future Work
  • Kernel-level implementation
  • User-level solution is already working
  • Deterministic replay of MPI programs
  • Ordered resource scheduling may enforce
    reproducibility
  • Transparent fault tolerance
  • Global coordination simplifies the state of the
    machine

33
Designing Parallel Operating Systems via Parallel Programming
(closing title slide: authors and affiliations as on slide 1)
34
Parallel Software Design
  • Using the core primitives

35
Case Studies
  • Job Scheduling: gang scheduling
  • Very small time slices → RESPONSIVENESS!

36
Toward a Parallel OS
  • BCS-MPI real-time communication scheduling
[Figure: timeline of one BCS-MPI time slice - Global Strobe (time slice starts) → exchange of communication requirements → Global Synchronization → communication scheduling → Global Synchronization → real transmission → Global Strobe (time slice ends); the time slice lasts hundreds of μs; a code sketch follows]
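
Reading the timeline as code, one slice of the
communication scheduler might be structured as below;
all helper names are hypothetical:

/* Hypothetical sketch of one BCS-MPI time slice, phase by phase. */
extern void exchange_comm_requirements(node_set_t n); /* Xfer-And-Signal */
extern void global_synchronization(node_set_t n);     /* strobe + Test-Event */
extern void schedule_communications(void);  /* choose transfers for slice */
extern void perform_transmissions(void);    /* real transmission phase */

void bcs_time_slice(node_set_t all_nodes) {
    /* Phase 1: publish pending send/receive descriptors to all nodes. */
    exchange_comm_requirements(all_nodes);
    global_synchronization(all_nodes);
    /* Phase 2: every node computes the same schedule from the same data. */
    schedule_communications();
    global_synchronization(all_nodes);
    /* Phase 3: the scheduled transfers actually move the data. */
    perform_transmissions();
}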

37
Toward a Parallel OS
  • BCS-MPI real-time communication scheduling