1
High Performance Computing and the FLAME Framework
  • Prof C Greenough, LS Chin and Dr DJ Worth
  • STFC Rutherford Appleton Laboratory
  • Prof M Holcombe and Dr S Coakley
  • Computer Science, Sheffield University

2
Why High Performance Computing?
  • The application cannot be run on a conventional
    computing system
  • Insufficient memory
  • Insufficient compute power
  • High Performance Computing (HPC) now generally
    means
  • Large multi-processor system
  • Complex communications hardware
  • Specialised attached processors
  • GRID/Cloud computing

3
Issues in High Performance Computing
  • Parallel systems are in constant development
  • Their hardware architectures are ever changing
  • simple distributed memory across multiple
    processors
  • shared memory between multiple processors
  • hybrid systems
  • clusters of shared-memory multiprocessors
  • clusters of multi-core systems
  • the processors often have a multi-level cache
    system

4
Issues in High Performance Computing
  • Most have high speed multi-level communication
    switches
  • GRID architectures are now being used for very
    large simulations
  • many large high-performance systems
  • loosely coupled together over the internet
  • Performance can be improved by optimising for a
    specific architecture
  • but the code can very easily become
    architecture-dependent

5
The FLAME Framework
6
Characteristics of FLAME
  • Based on X-Machines
  • Agents
  • Have memory
  • Have states
  • Communicate through messages
  • Structure of Application
  • Embedded in XML and C-code
  • Application generation driven by state graph
  • Agent communication managed by library
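
As a sketch of the X-Machine view, the fragment below
models an agent as a memory struct plus a C function that
performs one state transition; in FLAME the state graph
stringing such functions together comes from the XML model,
while the bodies are supplied as C code. All names here are
illustrative, not FLAME's own.

    /* An agent as an X-Machine: internal memory plus a
     * state-transition function. Names are hypothetical. */
    typedef enum { STATE_START, STATE_MOVED, STATE_END } agent_state;

    typedef struct {
        double x, y;         /* position held in agent memory    */
        double fx, fy;       /* accumulated repulsive force      */
        agent_state state;   /* current state in the state graph */
    } agent_memory;

    /* One transition: read memory, do the work, move to the
     * next state. */
    void agent_move(agent_memory *mem)
    {
        mem->x += mem->fx;
        mem->y += mem->fy;
        mem->state = STATE_MOVED;
    }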

7
Characteristics of FLAME
  • The Data Load
  • Size of the agents' internal memory
  • The number and size of message boards
  • The Computational Load
  • Work performed in any state change
  • Any I/O performed
  • FLAME Framework
  • Program generator (serial/parallel)
  • Provides control of states
  • Provides the communications network

8
Initial Parallel Implementation
  • Based on
  • the distribution of the agents' computational
    load
  • the distribution of the message boards' data
    load
  • Agents only communicate via MBs
  • Cross-node message information is made available
    to agents by message board synchronisation
  • Communication between nodes is minimised using
  • halo regions
  • message filtering

9
Geometric Partitioning
(Figure: the agent domain partitioned across processors Pi,
with halo regions the width of the interaction radius)
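
A hedged sketch of the halo test the figure implies: a
message at position (x, y) must be copied to a processor's
node if it falls within the interaction radius of that
processor's region. The geometry and names below are
illustrative, not FLAME's actual code.

    #include <stdbool.h>

    /* Rectangular partition owned by one processor. */
    typedef struct { double xmin, xmax, ymin, ymax; } partition;

    /* True if (x, y) lies inside the partition grown by `radius`,
     * i.e. inside the partition itself or its halo region. */
    bool in_halo(const partition *p, double x, double y, double radius)
    {
        return x >= p->xmin - radius && x <= p->xmax + radius &&
               y >= p->ymin - radius && y <= p->ymax + radius;
    }
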
10
Parallelism in FLAME
11
Issues with HPC and FLAME
  • Parallelism is hidden in the XML model and the
    C code, in terms of agent locality or groupings
  • Communications are captured in XML
  • In agent function descriptions
  • In message descriptions
  • The states are the computational load
  • weights not known until run time
  • could be fine or coarse grained
  • Initial distribution is based on a static
    analysis
  • The final distribution method must be based on
    dynamic behaviour

12
Parallelism in FLAME
Parallel agents are grouped on parallel nodes and
messages are synchronised between them. The message
board library allows both serial and parallel
versions to work.
Implementation details are hidden from modellers:
the system automatically manages the simulation.
13
Message Boards
  • Decoupled from the FLAME framework
  • Well defined Application Program Interface (API)
  • Includes functions for creating, deleting,
    managing and accessing information on the Message
    Boards
  • Details such as internal data representations,
    memory management and communication strategies
    are hidden
  • Uses multi-threading for work and communications

14
FLAME and the Message Boards
15
Message Board API
  • MB Management
  • create, delete, add message, clear board
  • Access to message information (iterators)
  • plain, filtered, sorted, randomised
  • MB Synchronisation
  • moving information between nodes
  • full data replication is very expensive
  • filtered information using tagging
  • overlapped with computation

16
The MB Environment
  • Message Board Management
  • MB_Env_Init - Initialises MB environment
  • MB_Env_Finalise - Finalises the MB environment
  • MB_Create - Creates a new Message Board object
  • MB_AddMessage - Adds a message to a Message Board
  • MB_Clear - Clears a Message Board
  • MB_Delete - Deletes a Message Board
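
A minimal lifecycle sketch built from these calls. The
message struct is hypothetical and the signatures follow the
libmboard documentation as understood here; check them
against the installed version.

    #include <assert.h>
    #include "mboard.h"   /* libmboard header, assumed available */

    typedef struct { double x, y; } location_msg;  /* hypothetical */

    void board_lifecycle(void)
    {
        MBt_Board board;
        location_msg msg = { 1.0, 2.0 };

        assert(MB_Env_Init() == MB_SUCCESS);   /* once per run */
        assert(MB_Create(&board, sizeof(location_msg)) == MB_SUCCESS);

        MB_AddMessage(board, &msg);   /* the board keeps a copy */

        MB_Clear(board);      /* empty the board between iterations */
        MB_Delete(&board);    /* release the board                  */
        MB_Env_Finalise();    /* once, at the end of the run        */
    }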

17
The Message Board API (2)
  • Message Selection and Reading - Iterators
  • MB_Iterator_Create - Creates an iterator
  • MB_Iterator_CreateSorted - Creates a sorted
    iterator
  • MB_Iterator_CreateFiltered - Creates a filtered
    iterator
  • MB_Iterator_Delete - Deletes an iterator
  • MB_Iterator_Rewind - Rewinds an iterator
  • MB_Iterator_Randomise - Randomises an iterator
  • MB_Iterator_GetMessage - Returns the next message
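
A sketch of draining a board through an iterator. It assumes
the libmboard conventions that MB_Iterator_GetMessage hands
back a copy the caller frees and that a NULL message marks
the end of the iteration; the message struct is hypothetical.

    #include <stdlib.h>
    #include "mboard.h"

    typedef struct { double x, y; } location_msg;  /* hypothetical */

    void read_all_messages(MBt_Board board)
    {
        MBt_Iterator iter;
        location_msg *msg;

        MB_Iterator_Create(board, &iter);   /* snapshot of the board */
        MB_Iterator_GetMessage(iter, (void **)&msg);
        while (msg != NULL) {               /* NULL marks the end */
            /* ... use msg->x and msg->y ... */
            free(msg);                      /* caller owns the copy */
            MB_Iterator_GetMessage(iter, (void **)&msg);
        }
        MB_Iterator_Delete(&iter);
    }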

18
The Message Board API (3)
  • Message Synchronisation: synchronisation of
    boards involves the propagation of message data
    out across the processing nodes as required by
    the agents on each node
  • MB_SyncStart - Starts synchronisation of a
    message board
  • MB_SyncTest - Tests for synchronisation
    completion
  • MB_SyncComplete - Completes the synchronisation
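
The intended pattern is to start synchronisation early and
overlap it with computation that needs no remote messages.
A hedged sketch, with a stub standing in for the local work:

    #include "mboard.h"

    /* Stub for computation that needs no cross-node messages. */
    static void do_local_work(void) { /* ... */ }

    void synchronise_with_overlap(MBt_Board board)
    {
        MB_SyncStart(board);   /* begin propagating message data */

        do_local_work();       /* overlap computation with communication */

        /* MB_SyncTest(board, &flag) can poll while working;
         * MB_SyncComplete blocks until the board is synchronised. */
        MB_SyncComplete(board);
    }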

19
The Message Board API (4)
  • MB Synchronisation
  • The simplest form is full replication of message
    data - very expensive in communication and memory
  • The MB uses message tagging to reduce the volume
    of data being transferred and stored
  • Tagging uses message FILTERs to select the
    message information to be transferred
  • FILTERs are specified in the model file (XMML)

20
The Message Board API (5)
  • Selection based on filters
  • Filters defined in XMML
  • Filters can be used
  • in creating iterators, to reduce the local
    message list
  • during synchronisation, to minimise cross-node
    communications
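
A sketch of a distance-based filter attached to an iterator.
The callback signature follows the libmboard filtered-iterator
interface as understood here and should be checked against the
installed version; the structs and range test are illustrative.

    #include <math.h>
    #include "mboard.h"

    typedef struct { double x, y; } location_msg;        /* hypothetical */
    typedef struct { double x, y, range; } filter_args;  /* hypothetical */

    /* Accept only messages within `range` of the querying agent. */
    static int within_range(const void *msg, const void *params)
    {
        const location_msg *m = (const location_msg *)msg;
        const filter_args  *a = (const filter_args *)params;
        double dx = m->x - a->x, dy = m->y - a->y;
        return sqrt(dx * dx + dy * dy) <= a->range;
    }

    void read_nearby(MBt_Board board, double x, double y, double range)
    {
        MBt_Iterator iter;
        filter_args args = { x, y, range };

        MB_Iterator_CreateFiltered(board, &iter, within_range, &args);
        /* ... drain with MB_Iterator_GetMessage as usual ... */
        MB_Iterator_Delete(&iter);
    }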

21
MB Iterators (1)
  • Iterators are objects used for traversing Message
    Board content. They provide users access to
    messages while isolating them from the internal
    data representation of the Boards.
  • Creating an Iterator generates a list of the
    available messages within the Board that match
    specific criteria. This is a snapshot of the
    content of the local Board.

22
MB Iterators (2)
23
Porting to Parallel Platforms
  • FLAME has been successfully ported to various
    HPC systems
  • SCARF: 360 x 2.2 GHz AMD Opteron cores, 1.3 TB
    total memory
  • HAPU: 128 x 2.4 GHz Opteron cores, 2 GB memory
    per core
  • NW-Grid: 384 x 2.4 GHz Opteron cores, 2 or 4 GB
    memory per core
  • HPCx: 2560 x 1.5 GHz Power5 cores, 2 GB memory
    per core
  • Legion (Blue Gene/P): 1026 x 850 MHz PowerPC,
    4096 cores
  • Leviathan (UNIBI): 3 x Intel Xeon E5355 (quad
    core), 24 cores

24
Test Models
  • Circles Model
  • Very simple agents
  • all have position data
  • x, y, fx, fy, radius in memory
  • Repulsion from neighbours
  • 1 message type
  • Domain decomposition
  • C@S Model
  • Mix of agents: Malls, Firms, People
  • A mixture of state complexities
  • All have position data
  • Agents have a range of influence
  • 9 message types
  • Domain decomposition
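
For the Circles model, the single message type plausibly
carries an agent's position and radius so that neighbours
can compute repulsion; the struct and force rule below are
hypothetical sketches, not the model's actual code.

    #include <math.h>

    /* Plausible layout of the Circles model's one message type. */
    typedef struct {
        int    id;
        double x, y;
        double radius;
    } location_msg;

    /* Illustrative repulsion: push overlapping circles apart along
     * the line joining their centres, accumulating into (fx, fy). */
    void add_repulsion(double x, double y, double r,
                       const location_msg *m, double *fx, double *fy)
    {
        double dx = x - m->x, dy = y - m->y;
        double dist = sqrt(dx * dx + dy * dy);
        double overlap = (r + m->radius) - dist;

        if (dist > 0.0 && overlap > 0.0) {
            *fx += overlap * dx / dist;
            *fy += overlap * dy / dist;
        }
    }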

25
Circles Model
26
C@S Model
27
Bielefeld Model
28
Dynamic Load Balancing
  • Work has only just started
  • Goal: move agents between compute nodes to
  • reduce the overall elapsed time
  • increase parallel efficiency
  • There is an interaction between computational
    efficiency and overall elapsed time
  • The requirements of communications and load may
    conflict!

29
Balance - Load vs. Communication
  • Distribution A
  • P1: 13 agents
  • P2: 3 agents
  • P2 <--> P1: 1 communication channel
  • Distribution B
  • P1: 9 agents
  • P2: 7 agents
  • P1 <--> P2: 6 communication channels

(Figure: the two distributions of agents across processors
P1 and P2, with frequent and occasional communication
channels marked)
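
One way to see the conflict is a toy cost model in which
elapsed time is driven by the most loaded node plus a fixed
cost per cross-node channel; the weights are invented purely
for illustration.

    /* Toy cost model, illustrative only. */
    double step_cost(int agents_p1, int agents_p2, int channels,
                     double t_compute, double t_comm)
    {
        int max_load = agents_p1 > agents_p2 ? agents_p1 : agents_p2;
        return max_load * t_compute + channels * t_comm;
    }

    /* With t_compute = t_comm = 1: distribution A costs
     * 13 + 1 = 14, distribution B costs 9 + 6 = 15, so the
     * better-balanced split can still be slower when channels
     * are expensive. */
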
30
Moving the Wrong Agents
  • Moving the wrong agents could increase the
    elapsed time

31
HPC Issues in CLIMACE
  • Size of the agent population
  • Granularity of the agents
  • Is there a large computational load?
  • How often do they communicate?
  • Inherent parallelism (locality) in the model
  • Are the agents in groups?
  • Do they have short-range communication?
  • Size of the initial data
  • Size of the outputs

32
HPC Challenges for ABM
  • Effective initial static distributions
  • Effective dynamic agent migration algorithms
  • Sophisticated communication strategies
  • To reduce the number of communications
  • To reduce synchronisations
  • To reduce communication volumes
  • Pre-tagging information to allow pre-fetching
  • Overlapping of computation with communications
  • Efficient use of multi-core nodes on large
    systems
  • Efficient use of attached processors