A Functional Network Simulator for IBM Cell based Multiprocessor Systems

1
A Functional Network Simulator for IBM Cell based
Multiprocessor Systems
  • Presented by: Vishakha Gupta
  • Advised by: Prof. Sudhakar Yalamanchili, School of
    Electrical and Computer Engineering
  • Website:
    http://www.cc.gatech.edu/vishakha/projects.php

2
Agenda
  • Cell Broadband Engine (CBE) architecture
  • Motivation
  • Design of the Multi-Cell Simulator (MCS)
  • Programming Model
  • Execution Model
  • API
  • Implementation
  • Benchmarks
  • Analysis of benchmark performance
  • Conclusion

3
CBE Architecture
4
CBE Architecture - Overview
  • 64-bit Power architecture forms the foundation
  • Dual-threaded Power Processor Element (PPE)
  • In-order, two-issue superscalar design
  • Support for simultaneous multithreading (up to 2
    threads)
  • Eight Synergistic Processor Elements (SPEs)
  • Based on a SIMD RISC instruction set
  • 128-entry, 128-bit unified register file for all
    data types

5
CBE Architecture - Overview 2
  • On-chip Rambus XDR controller with support for
    two banks of Rambus XDR memory
  • Cell processor production die has 235 million
    transistors and an area of 235 mm²
  • No networking peripherals or large memory
    arrays on chip
  • Reaches high performance through high clock speed
    and a high-performance XDR DRAM interface

6
CBE Architecture - Memory Model
  • Power core
  • 32 KB 2-way set-associative instruction cache and
    32 KB 4-way set-associative data cache
  • 256 KB local store on each SPE, 6-cycle load
    latency
  • Software must manage data movement in and out of
    the local store
  • Controlled by the memory flow controller
  • Does not participate in hardware cache coherency
  • Aliased in the memory map of the processor
  • PPE can load and store from a memory location
    mapped to the local store (slow)
  • SPE can use the DMA controller to move data to
    its own or another SPE's local store
  • Memory flow controller on an SPE can begin
    transferring the data set of the next task while
    the current one is running (double buffering)
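
The double-buffering idea in the last bullet can be sketched in plain C. This is only an illustration under stated assumptions: `dma_get`, `CHUNK` and `sum_double_buffered` are hypothetical names, and `memcpy` stands in for an asynchronous memory flow controller transfer, so the overlap of transfer and compute is shown structurally rather than actually achieved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* elements per transfer; hypothetical size */

/* Stand-in for an asynchronous DMA "get" into the local store.
 * On real hardware this would be issued to the memory flow controller. */
static void dma_get(int *local, const int *remote, size_t n) {
    memcpy(local, remote, n * sizeof *local);
}

/* Sum `total` ints from `main_mem`, CHUNK at a time, using two buffers:
 * while one buffer is being processed, the other is being filled. */
long sum_double_buffered(const int *main_mem, size_t total) {
    int buf[2][CHUNK];
    long sum = 0;
    int cur = 0;
    size_t cur_n = total < CHUNK ? total : CHUNK;
    size_t offset = cur_n;

    assert(main_mem != NULL || total == 0);
    if (total == 0) return 0;
    dma_get(buf[cur], main_mem, cur_n);          /* prefetch first chunk */

    while (cur_n > 0) {
        size_t next_n = total - offset;
        if (next_n > CHUNK) next_n = CHUNK;
        if (next_n > 0)                          /* start fetching next chunk */
            dma_get(buf[cur ^ 1], main_mem + offset, next_n);

        for (size_t i = 0; i < cur_n; i++)       /* compute on current chunk */
            sum += buf[cur][i];

        offset += next_n;
        cur ^= 1;                                /* swap buffer roles */
        cur_n = next_n;
    }
    return sum;
}
```

On an SPE the fetch of the next chunk would be asynchronous and the code would wait for its completion before swapping buffers; that wait is omitted here because the stand-in copy is synchronous.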

7
Multi-Cell Simulator - Motivation
  • Cell architecture is well suited to advanced
    visualization, streaming and scientific
    applications
  • An example of a heterogeneous multi-core
    architecture, the talk of the future
  • Feasibility of generating and running parallel
    code on multiple interconnected Cell processors
  • See Roadrunner (supercomputer being built at LANL
    with 64K AMD Opteron and 16K IBM Cell
    processors)
  • Of great value to research groups such as
    compiler developers
  • Simulate different programming techniques
  • Test their effectiveness on these heterogeneous
    architectures
  • Adapt the parallel computing world to work the
    heterogeneous multi-core way

8
Design Goals
  • Ease of use by programmers
  • Convenient APIs for faster and more efficient
    parallel programming
  • Performance
  • Less time should be spent in MCS library
    functions
  • Scalability
  • For massively parallel application simulations

9
Implementation Goals
  • Extensibility
  • Ease of plugging in different interconnects and
    programming models
  • Reliability
  • Applications are easier to debug if the
    middleware can be assumed stable
  • More than being just a functional simulator
  • Latency estimations for different interconnects

10
Programming Model
  • Create a platform consisting of n PPEs and
    m SPEs
  • Programmer can write code as if all elements were
    on one machine
  • Point to point communication between different
    elements (PPE/SPE) in the system
  • Group Communication
  • Form group of SPE/PPE/mixed for collective
    communication
  • Broadcast to all or multicast to an existing
    group
  • Communication units between elements (PPEs/SPEs)
  • Packet: send/receive data in one call
  • Stream: send/receive data at a specified rate or
    split into multiple buffers
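
To make the packet/stream distinction concrete, here is a minimal sketch of how a stream could be split into multiple buffers. The names `mcs_chunk`, `stream_chunk_count` and `stream_chunk` are hypothetical and not taken from the MCS library; rate control is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one piece of a stream. */
typedef struct {
    size_t offset;  /* byte offset of this piece within the stream */
    size_t len;     /* number of bytes in this piece */
} mcs_chunk;

/* Number of buffers needed to carry `total` bytes in pieces of `buf_size`. */
size_t stream_chunk_count(size_t total, size_t buf_size) {
    assert(buf_size > 0);
    return (total + buf_size - 1) / buf_size;
}

/* Describe the i-th piece of the stream; the last piece may be short. */
mcs_chunk stream_chunk(size_t total, size_t buf_size, size_t i) {
    mcs_chunk c;
    assert(i < stream_chunk_count(total, buf_size));
    c.offset = i * buf_size;
    c.len = (total - c.offset < buf_size) ? total - c.offset : buf_size;
    return c;
}
```

In this view a packet is simply the case where the whole payload fits into one buffer, so a single call moves all of the data.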

11
Execution Model
  • Communication possibilities
  • PPE to local SPE and vice versa:
    DMA/mailbox/channels/memory-mapped I/O
  • PPE to remote PPE: network API
  • PPE to remote SPE: PPE to the remote PPE
    responsible for the given SPE, then that PPE to
    its local SPE
  • SPE to remote SPE: the SPE is not expected to make
    MCS library calls directly, since that would bloat
    the code in the SPE local store (likely, yet to be
    tested); it copies the data over to its control
    PPE, and the rest is the same as PPE to remote SPE
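
The communication cases above can be summarized as a small route-planning function. This is a sketch only: the types and `plan_route` are hypothetical, a "host" here means one simulator instance, and every element on the same instance is treated as reachable in a single local hop.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ELEM_PPE, ELEM_SPE } elem_kind;
typedef enum { HOP_LOCAL, HOP_NETWORK } hop_kind;

/* Hypothetical element descriptor; host = simulator instance number. */
typedef struct {
    elem_kind kind;
    int host;
} element;

/* Fill `hops` with the hop sequence from src to dst; return its length
 * (at most 3: SPE -> control PPE -> remote PPE -> remote SPE). */
size_t plan_route(element src, element dst, hop_kind hops[3]) {
    size_t n = 0;
    assert(hops != NULL);
    if (src.host == dst.host) {        /* same instance: DMA, mailbox, ... */
        hops[n++] = HOP_LOCAL;
        return n;
    }
    if (src.kind == ELEM_SPE)          /* SPE first copies to its control PPE */
        hops[n++] = HOP_LOCAL;
    hops[n++] = HOP_NETWORK;           /* PPE to remote PPE over the network */
    if (dst.kind == ELEM_SPE)          /* remote PPE forwards to its local SPE */
        hops[n++] = HOP_LOCAL;
    return n;
}
```

The SPE to remote PPE case is not listed on the slide; the sketch handles it by analogy as a local copy followed by one network hop.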

12
Execution Model 2
  • Communication combinations
  • Element to Element or group send/receive which
    can be
  • Blocking or non-blocking
  • Reliable or Unreliable
  • Application can request more parallelism by
    specifying number of threads that should handle
    the send/receive
  • In-order delivery or out of order delivery of
    data
  • Programmer can use common APIs for local as well
    as remote communication
  • Location of a PPE or SPE transparent to
    application
  • But local send and receive optimized

13
Platform View
14
MPI-style Communication
15
Design of the Multi-Cell Simulator
Implementation Units
16
Software Stack
MCS/Pooled Accelerator Library
17
Software - Current
[Diagram labels: Multi-Cell Simulator; Config File (NumHosts 4, NumPPE 6, NumSPE 24, ...); More Hosts ...]
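
The diagram mentions a config file with the keys NumHosts, NumPPE and NumSPE. Its exact syntax is not shown, so the sketch below assumes one `Key=Value` pair per line; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of the simulator configuration. */
typedef struct {
    int num_hosts;
    int num_ppe;
    int num_spe;
} mcs_config;

/* Parse one "Key=Value" line (assumed syntax).
 * Returns 1 if a known key was set, 0 for anything else. */
int config_parse_line(mcs_config *cfg, const char *line) {
    char key[32];
    int value;
    assert(cfg != NULL && line != NULL);
    if (sscanf(line, " %31[A-Za-z] = %d", key, &value) != 2)
        return 0;
    if (strcmp(key, "NumHosts") == 0)
        cfg->num_hosts = value;
    else if (strcmp(key, "NumPPE") == 0)
        cfg->num_ppe = value;
    else if (strcmp(key, "NumSPE") == 0)
        cfg->num_spe = value;
    else
        return 0;
    return 1;
}
```

Feeding it the three values from the diagram would describe a platform of 4 hosts, 6 PPEs and 24 SPEs.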
18
API
  • Network: connection establishment; send, receive,
    query and wait for both point-to-point and group
    communication
  • Group: create, modify and delete groups of Cell
    elements
  • Startup and cleanup: create and remove data
    structures for the library
  • Timing: synchronize groups
  • Memory: allocate and de-allocate memory for
    buffers needed by the application code

19
Implementation - Setup
  • Use Mambo to simulate the Cell processor
  • Use TUN devices to simulate a general network of
    Cells
  • Configure all simulator instances to fall in the
    same subnet
  • Use bridging support with TUN devices for
    automatic message redirection across the network
    of Cell processors
  • Enable TUN-Ethernet forwarding
  • Configure routing tables on the simulators as
    well as on the host

20
Implementation - Basics
  • The API implementation is a library to be linked
    with the parallel application
  • Code written completely in C
  • Headers contain all the available function
    prototypes
  • Data exchange between local elements (PPE and SPE
    on same simulator instance) through a fast path
  • No need to make socket calls

21
Implementation 2
  • Thread implementation using pthread library
  • Carefully managed thread pools
  • Increase performance while taking care of
    scalability
  • Multiple queues for handling data send, receive,
    re-order
  • Necessary to avoid contention in heavily threaded
    programs
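
The point about multiple queues can be illustrated with a small pthread-based sketch in which the send and receive paths each have their own queue and lock, so that senders and receivers do not contend with each other. The queue layout, capacity and names are assumptions, not the MCS implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 16  /* hypothetical per-queue capacity */

/* Bounded FIFO protected by its own mutex. */
typedef struct {
    pthread_mutex_t lock;
    void  *items[QCAP];
    size_t head, count;
} mcs_queue;

/* Separate queues (and locks) for the send and receive paths. */
typedef struct {
    mcs_queue send_q;
    mcs_queue recv_q;
} mcs_queues;

void queue_init(mcs_queue *q) {
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->count = 0;
}

/* Non-blocking enqueue; returns false if the queue is full. */
bool queue_try_push(mcs_queue *q, void *item) {
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->count < QCAP) {
        q->items[(q->head + q->count) % QCAP] = item;
        q->count++;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Non-blocking dequeue; returns NULL if the queue is empty. */
void *queue_try_pop(mcs_queue *q) {
    void *item = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        item = q->items[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    return item;
}
```

Worker threads from a pool would pop from whichever queue they serve; a blocking variant would add a condition variable per queue.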

22
Implementation - Current
23
Benchmarks
  • To be filled in soon

24
Results
  • To be filled in soon

25
Analysis of Results
  • To be filled in soon

26
Conclusion
  • To be filled in soon

27
Future Work
  • Implement communication over interconnects other
    than Ethernet, using their APIs
  • Add latency calculation based on interconnect
  • Automate the complete startup of the simulator
    depending on user input
  • Add additional communication models like block
    and pipe for tightly coupled multi-cell systems
