GPU Cluster for High Performance Computing

About This Presentation

Slides: 31

Provided by: zhe3

Transcript and Presenter's Notes

Title: GPU Cluster for High Performance Computing

1
GPU Cluster for High Performance Computing

Zhe Fan, Feng Qiu, Arie Kaufman,
Suzanne Yoakum-Stover
Stony Brook University

2
Outline

Background
GPU: graphics processing unit
GPGPU: general-purpose computation using GPU
The Computational GPU Cluster
The Lattice Boltzmann Computation
Performance Evaluation
Application for Urban Dispersion Modeling
Conclusion and Future Work

3
Background: What's a GPU?

Graphics processing units
NVIDIA NV40, ATI R420
In COTS 3D graphics cards
GeForce 6800 Ultra, Radeon X800 XT
Modern GPU: 600M vertices, 6G pixels / second

4
Background: GPU Growth Rate

Driven by booming market for games
The rendering task is embarrassingly parallel, so it can
efficiently use a large number of computational
units

5
Background: Graphics Pipeline
Vertex Processing
Transform 3D coords to 2D coords
Rasterization
Texture Memory
Fragment Processing
Combine colors to final image
Composite
6
Background: GPU Compute Power

High compute parallelism
6 vertex + 16 fragment pipelines
4D vector fp32 instructions
More than 100 FLOPs/cycle!
Fast on-board memory
Bandwidth 35.2 GB/sec
Size 256 - 512 MB
Low price

6 Vertex Pipelines
Vertex Processing
Rasterization
256-Bit GDDR3
Texture Memory
16 Fragment Pipelines
Fragment Processing
Composite
7
Background: GPGPU

Recent development of GPUs
Programmability
High-level language, Cg
32-bit floating point
General-purpose computation using GPU
GPU accelerates certain computations 10 times
Data parallelism
Computationally intensive
Relatively simple data structures and control flow

8
Background: GPGPU Examples

Scientific computation
Physically-based simulations
Linear algebra, conjugate gradient solver
Level set
Fast Fourier transform
Image and volume processing
Computational geometry
Database
Flow visualization
Global illumination
GPGPU language
Many papers. See http://www.gpgpu.org

9
Motivations

Scale-up to GPU cluster for large-scale problems
Explore the use for high-performance computing
Outperform COTS CPU clusters for certain
computations
Motivated by
PlayStation2 computational cluster at NCSA
Graphics PC clusters
Humphrey et al. '02, Govindaraju et al. '03,
etc.

10
Stony Brook Visual Computing Cluster

Gigabit Ethernet
32 HP PCs
64 Pentium Xeon 2.4GHz
32 2GB DDR Memory
32 120GB Hard Disk
32 GeForce FX 5800 Ultra
32 VolumePro 1000
(volume rendering)
9 HP Sepia-2A cards (composite)
ServerNet (compositing network)

Computational GPU cluster
Visualization cluster

11
Computational GPU Cluster

Gigabit Ethernet
32 HP PCs
64 Pentium Xeon 2.4GHz
32 2GB DDR Memory
32 120GB Hard Disk
32 GeForce FX 5800 Ultra

$1,621 x 32   $349 x 32   $607 x 32   $150 x 32   $399 x 32
16 GFLOPS x 32 (fragment pipeline capability)
Adding 32 GPUs increases the theoretical peak by
512 GFLOPS at a price of only $12,768:
1 GFLOPS for about $25
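
The price/performance claim on this slide can be checked with a few lines of arithmetic. The sketch below assumes that the "399 x 32" figure is the per-GPU price in US dollars (currency symbols do not survive in the transcript) and that 16 GFLOPS is the per-GPU fragment-pipeline peak; the function name is hypothetical.

```python
def added_peak_and_cost(n_gpus, gflops_each, price_each):
    """Return (added peak GFLOPS, total price, price per GFLOPS)."""
    peak = n_gpus * gflops_each      # theoretical peak added by the GPUs
    cost = n_gpus * price_each       # total price of the GPUs alone
    return peak, cost, cost / peak


# Slide figures: 32 GPUs, 16 GFLOPS each, 399 each (assumed USD).
peak, cost, per_gflops = added_peak_and_cost(32, 16, 399)
print(peak, cost, per_gflops)
```

This reproduces the slide's 512 GFLOPS and 12,768 totals, and gives just under 25 per GFLOPS.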
12
GPU Cluster Architecture

Currently, reading data back from the GPU is slow

13
Lattice Boltzmann Model (LBM)

CFD method on the lattice
Numerical calculations
Stream
Collision
Recovers the Navier-Stokes equations for incompressible fluids
Highly flexible in specifying complex boundaries

A cell of the D3Q19 lattice
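
The stream and collision steps on a D3Q19 lattice can be sketched in plain Python. This is a minimal CPU illustration, not the GPU implementation from the talk: it assumes the common single-relaxation-time (BGK) collision operator, which the slides do not name, and periodic boundaries; all function names are hypothetical.

```python
from itertools import product

# D3Q19 velocity set: 1 rest vector, 6 axis vectors, 12 face diagonals.
C = [c for c in product((-1, 0, 1), repeat=3) if sum(abs(x) for x in c) <= 2]
W = [{0: 1 / 3, 1: 1 / 18, 2: 1 / 36}[sum(abs(x) for x in c)] for c in C]


def moments(f):
    """Density and velocity of one cell from its 19 distributions."""
    rho = sum(f)
    u = tuple(sum(fi * c[d] for fi, c in zip(f, C)) / rho for d in range(3))
    return rho, u


def equilibrium(rho, u):
    """Second-order equilibrium distributions (assumed BGK form)."""
    uu = sum(x * x for x in u)
    feq = []
    for c, w in zip(C, W):
        cu = sum(ci * ui for ci, ui in zip(c, u))
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * uu))
    return feq


def collide(f, tau):
    """Relax one cell toward equilibrium; conserves mass and momentum."""
    feq = equilibrium(*moments(f))
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]


def stream(grid, n):
    """Move each distribution to the neighbor its velocity points at.

    grid maps (x, y, z) to a list of 19 values on an n^3 periodic lattice.
    """
    new = {p: [0.0] * len(C) for p in grid}
    for (x, y, z), f in grid.items():
        for i, (cx, cy, cz) in enumerate(C):
            new[((x + cx) % n, (y + cy) % n, (z + cz) % n)][i] = f[i]
    return new
```

One time step applies `collide` to every cell and then calls `stream` once; the relaxation time `tau` sets the viscosity.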
14
LBM on a Single GPU

Li et al., Visual Computer '03
Program the fragment processing stage

15
Store LBM Data in Textures
16
Scale-up LBM

Domain decomposition
Communication
Read out from GPU
Network transfer
Write into GPUs
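
Domain decomposition can be sketched as splitting the lattice into equal blocks, one per GPU node. The example below is a hypothetical illustration: the function names are invented, and the block layout is an assumption, since the slides give only the lattice size (480 x 400 x 80 on 30 GPUs) and the 80^3 sub-domain size. It lists face neighbors only; D3Q19 links that cross block edges diagonally would need extra handling.

```python
from itertools import product


def decompose(domain, block):
    """Split a 3D lattice into equal axis-aligned blocks.

    Returns (counts, blocks), where blocks maps a block index (i, j, k)
    to the origin of that block in lattice coordinates.
    """
    counts = []
    for d, b in zip(domain, block):
        if d % b:
            raise ValueError("domain must be a multiple of the block size")
        counts.append(d // b)
    blocks = {idx: tuple(i * b for i, b in zip(idx, block))
              for idx in product(*(range(c) for c in counts))}
    return tuple(counts), blocks


def face_neighbors(idx, counts):
    """Indices of the blocks that share a face with block idx."""
    out = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(idx)
            n[axis] += step
            if 0 <= n[axis] < counts[axis]:
                out.append(tuple(n))
    return out


counts, blocks = decompose((480, 400, 80), (80, 80, 80))
print(counts, len(blocks))
```

With these sizes the lattice splits into a 6 x 5 x 1 arrangement of 30 blocks, which matches the GPU count quoted for the Times Square run.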

17
GPU <-> PC Data Transfer

Read data from GPU
Aggregate necessary boundary data together into a
texture
Read them out in a single operation
Write data into GPU
Reverse the above procedure
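
The idea of gathering only the necessary boundary data into one buffer can be illustrated on the CPU. This sketch is an analogue, not GPU texture code: it assumes a D3Q19 lattice stored as a dictionary of per-cell lists, and the function names are hypothetical. Only the 5 distributions whose velocity points out through a face need to cross it.

```python
from itertools import product

# Same D3Q19 velocity ordering as a 19-entry per-cell distribution list.
C = [c for c in product((-1, 0, 1), repeat=3) if sum(abs(x) for x in c) <= 2]


def outgoing(axis, sign):
    """Indices of the distributions that leave through the given face."""
    return [i for i, c in enumerate(C) if c[axis] == sign]


def pack_face(grid, n, axis, sign):
    """Gather the outgoing distributions of one face into a flat buffer.

    grid maps (x, y, z) to a list of 19 values on an n^3 sub-domain.
    """
    layer = n - 1 if sign > 0 else 0
    idxs = outgoing(axis, sign)
    buf = []
    for a, b in product(range(n), repeat=2):
        pos = [a, b]
        pos.insert(axis, layer)          # place the fixed face coordinate
        cell = grid[tuple(pos)]
        buf.extend(cell[i] for i in idxs)
    return buf
```

For an 80^3 sub-domain this gives 5 x 80 x 80 = 32,000 values per face, so a single read of the packed buffer replaces many small reads.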

18
Network Transfer

MPI
To minimize communication cost
Overlap network transfer time with computation
time
Simplify communication pattern
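
The benefit of overlapping network transfer with computation can be expressed as a simple cost model. This is a hypothetical sketch: the function, its parameters, and the sample timings are illustrative assumptions, not measurements from the talk. The `readback` term stands for the GPU read-out cost, which cannot be hidden behind computation in this model.

```python
def step_time(compute, network, readback=0.0, overlap=1.0):
    """Estimated time per step.

    compute, network and readback are times in the same unit;
    overlap is the fraction of the network transfer that runs
    concurrently with computation (1.0 = fully overlapped).
    """
    hidden = min(network * overlap, compute)   # transfer hidden by compute
    return compute + readback + (network - hidden)


# Illustrative numbers only.
print(step_time(100, 40))                              # fully overlapped
print(step_time(100, 40, overlap=0.0))                 # no overlap
print(step_time(100, 40, readback=10, overlap=0.5))    # partial overlap
```

With full overlap the step costs only the compute time, so the network becomes visible only when the transfer outlasts the computation.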

19
To Compare with CPU Cluster Code

Backgrounds
For our CPU cluster, each node uses 1 CPU to
compute
The CPU cluster code does not use SSE
(SSE is expected to be about 3 times faster)

Communication            CPU cluster code                    GPU cluster code
Read from GPU overhead   N / A                               17
Network transfer time    Fully overlapped with computation   Mostly overlapped with computation

20
Performance Comparison
Scale-up test (each node computes an 80^3 sub-domain)
21
Performance Comparison (cont.)
22
For Further Improvement

Use newer generation GPUs
Already 2.2 times faster
Use PCI-Express bus
Much faster GPU <-> PC communication
4 GB/sec in each direction
Multiple GPUs on each PC to be feasible soon
Code can be optimized
Faster network

23
Urban Application

Use LBM to simulate airborne contaminant
dispersion in complex urban environments
Simulation and visualization on a single GPU
Qiu et al., Visualization 2004

24
Simulation Area for GPU Cluster
Times Square Area of New York City

1.66 km x 1.13 km
91 blocks
850 buildings

25
Air Flow, Times Square Area, NYC

480 x 400 x 80
On 30 GPUs
0.31 sec/step
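
The figures on this slide imply a lattice-update rate that is easy to work out. The sketch below uses only the numbers quoted above (480 x 400 x 80 cells, 30 GPUs, 0.31 sec/step); the function name is hypothetical.

```python
def lattice_throughput(dims, sec_per_step, gpus):
    """Return (total cells, cells per GPU, cells/sec, cells/sec per GPU)."""
    cells = dims[0] * dims[1] * dims[2]
    total_rate = cells / sec_per_step
    return cells, cells // gpus, total_rate, total_rate / gpus


cells, per_gpu, rate, rate_per_gpu = lattice_throughput((480, 400, 80), 0.31, 30)
print(cells, per_gpu, round(rate), round(rate_per_gpu))
```

The lattice has 15,360,000 cells, or 512,000 per GPU, which equals one 80^3 sub-domain per node; the cluster updates roughly 49.5 million cells per second.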

26
Dispersion Plume, Volume Rendered
27
Future Work

Other computations
E.g., PDE, FEM
GPUs as co-processors of CPUs
Carr et al., 2002, etc.
Online visualization and computational steering
Potential advantage of GPU cluster

28
Conclusions

Cluster of commodity GPUs for high-performance
computing
LBM to simulate airborne dispersion in the urban
environment
Compared with the same model on CPU cluster, GPU
cluster is much faster, and better performance is
anticipated
GPU cluster is very promising for scientific
computation

29
Acknowledgements

NSF CCR0306438
Department of Homeland Security @ Environment
Measurement Lab
Hewlett Packard
TeraRecon
Reviewers
Bin Zhang, Wei Li, Ye Zhao, Xiaoming Wei, Klaus
Mueller, Brent Lindquist

30
Thank You!
