Title: A Scalable FPGA-based Multiprocessor for Molecular Dynamics Simulation
1. A Scalable FPGA-based Multiprocessor for Molecular Dynamics Simulation
Arun Patel¹, Christopher A. Madill²,³, Manuel Saldaña¹, Christopher Comis¹, Régis Pomès²,³, Paul Chow¹
¹ Department of Electrical and Computer Engineering, University of Toronto
² Department of Structural Biology and Biochemistry, The Hospital for Sick Children
³ Department of Biochemistry, University of Toronto
Presented by Arun Patel (apatel@eecg.toronto.edu)
Connections 2006: The University of Toronto ECE Graduate Symposium
Toronto, Ontario, Canada
June 9th, 2006
2. Introduction
- FPGAs can accelerate many computing tasks by up to 2 or 3 orders of magnitude
- Supercomputers and computing clusters have been designed to improve computing performance
- Our work focuses on developing a computing cluster based on a scalable network of FPGAs
- The initial design is tailored for performing Molecular Dynamics simulations
3. Molecular Dynamics
- Combines empirical force calculations with Newton's equations of motion
- Predicts the time trajectory of small atomic systems
- Computationally demanding:
  - Calculate interatomic forces
  - Calculate the net force on each atom
  - Integrate the Newtonian equations of motion
7. Molecular Dynamics
[Figure: potential energy function U]
8. Why Molecular Dynamics?
1. Inherently Parallelizable
2. Computationally Demanding
9. Motivation for Architecture
- The majority of hardware accelerators achieve a 10²–10³× improvement over software by:
  - Pipelining a serially-executed algorithm, or
  - Performing operations in parallel
- Such techniques alone do not address large-scale computing applications (such as MD):
  - Much greater speedups are required (10⁴–10⁵×)
  - Not likely with a single hardware accelerator
- The ideal solution for large-scale computing combines:
  - The scalability of modern HPC platforms
  - The performance of hardware acceleration
10. The TMD Machine
- An investigation of an FPGA-based architecture
- Designed for applications that exhibit a high compute-to-communication ratio
- Made possible by the integration of microprocessors and high-speed communication interfaces into modern FPGA packages
11. Inter-Task Communication
- Based on the Message Passing Interface (MPI)
  - Popular message-passing standard for distributed applications
  - Implementations available for virtually every HPC platform
- TMD-MPI
  - Subset of the MPI standard developed for the TMD architecture
  - Software library for tasks implemented on embedded microprocessors
  - Hardware Message Passing Engine (MPE) for hardware computing tasks
12. MD Software Implementation
- Design Flow
- Testing and validation
- Parallel design
- Software-to-hardware transition
[Figure: software Force Engine processes, compiled with mpiCC, communicating over an interconnection network]
13. Current Work
- Replace software processes with hardware computing engines
[Figure: Atom Store and Force Engine tasks (C → HDL synthesis with the TMD-MPE) alongside TMD-MPI tasks on PPC-405 processors, mapped onto two XC2VP100 FPGAs]
14. Acknowledgements
TMD Group: Dr. Paul Chow, Dr. Régis Pomès, Christopher Madill, Arun Patel, Andrew House, Daniel Nunes, Manuel Saldaña, Emanuel Ramalho
Past Members: David Chui, Christopher Comis, Sam Lee, Lesley Shannon
15–17. Large-Scale Computing Solutions
- Class 1 Machines
  - Supercomputers or clusters of workstations
  - 10–10⁵ interconnected CPUs
- Class 2 Machines
  - Hybrid network of CPU and FPGA hardware
  - FPGA acts as an external co-processor to the CPU
  - Programming model still evolving
- Class 3 Machines
  - Network of FPGA-based computing nodes connected by an interconnection network
  - Recent area of academic and industrial focus
18–20. TMD Communication Infrastructure
- Tier 1: Intra-FPGA Communication
  - Point-to-point FIFOs are used as communication channels
  - Asynchronous FIFOs isolate clock domains
  - Application-specific network topologies can be defined
- Tier 2: Inter-FPGA Communication
  - Multi-gigabit serial transceivers used for inter-FPGA communication
  - Fully-interconnected network topology using 2N(N−1) pairs of traces
- Tier 3: Inter-Cluster Communication
  - Commercially-available switches interconnect cluster PCBs
  - Built-in features for large-scale computing: fault tolerance, scalability
21. TMD Computing Tasks (1/2)
- Computing Tasks
  - Applications are defined as a collection of computing tasks
  - Tasks communicate by passing messages
- Task Implementation Flexibility
  - Software processes executing on embedded microprocessors
  - Dedicated hardware computing engines
[Figure: a task may map to a computing engine or an embedded microprocessor (Class 3), or to a processor on a CPU node (Class 1)]
22. TMD Computing Tasks (2/2)
- Computing Task Granularity
  - Tasks can vary in size and complexity
  - Not restricted to one task per FPGA
[Figure: tasks A through M of varying sizes mapped onto several FPGAs]
23–26. TMD-MPI Software Implementation
Layer 4 (MPI Interface): all MPI functions implemented in TMD-MPI that are available to the application.
Layer 3 (Collective Operations): barrier synchronization, data gathering, and message broadcasts.
Layer 2 (Communication Primitives): MPI_Send and MPI_Recv methods are used to transmit data between processes.
Layer 1 (Hardware Interface): low-level methods to communicate with FSLs for both on-chip and off-chip communication.
[Figure: layer stack from the Application down through the MPI Application Interface, Point-to-Point MPI Functions, Send/Receive Implementation, and FSL Hardware Interface to the Hardware]
27–30. TMD Application Design Flow
- Step 1: Application Prototyping
  - A software prototype of the application is developed
  - Profiling identifies compute-intensive routines
- Step 2: Application Refinement
  - Partitioning into tasks communicating using MPI
  - Each task emulates a computing engine
  - Communication patterns are analyzed to determine the network topology
- Step 3: TMD Prototyping
  - Tasks are ported to soft processors on the TMD
  - Software is refined to utilize the TMD-MPI library
  - The on-chip communication network is verified
- Step 4: TMD Optimization
  - Compute-intensive tasks are replaced with hardware engines
  - The MPE handles communication for hardware engines
[Figure: the application prototype is partitioned into processes A, B, and C, ported onto the TMD, and finally process B is replaced by a hardware engine]
31. Future Work: Phase 2
TMD Version 2 Prototype
32. Future Work: Phase 3
The final TMD architecture will contain a hierarchical network of FPGA chips.