Title: BigBen @ PSC

1
BigBen @ PSC
2
BigBen @ PSC
3
BigBen @ PSC
4
BigBen Features
  • Compute Nodes
    • 2068 nodes running the Catamount (QK) microkernel
    • SeaStar interconnect in a 3-D torus configuration
    • No external connectivity (no TCP)
    • All inter-node communication is over Portals
    • Applications use MPI, which is built on Portals
  • Service I/O (SIO) Nodes
    • 22 nodes running SuSE Linux
    • Also on the SeaStar interconnect
    • SIO nodes can have PCI-X hardware installed, defining unique
      roles for each
    • 2 SIO nodes are currently externally connected to the ETF with
      10GigE cards

5
Portals Direct I/O (PDIO) Details
  • Portals-to-TCP routing
    • PDIO daemons aggregate hundreds of Portals data streams into a
      configurable number of outgoing TCP streams (see the sketch below)
    • Heterogeneous Portals (both QK and Linux nodes)
  • Explicit parallelism
    • Configurable # of Portals receivers (on SIO nodes), distributed
      across multiple 10GigE-connected Service I/O (SIO) nodes
    • Corresponding # of TCP streams (to the WAN), one per PDIO daemon
    • A parallel TCP receiver in the Goodhue booth supports a
      variable/dynamic number of connections
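The Portals-to-TCP aggregation above is the core of the daemon. As a rough illustration, here is a minimal C sketch of that N-to-M pattern, assuming a stand-in portals_recv_next() in place of the real Portals receive path and an illustrative host/port; it is a sketch of the idea, not PDIO's actual code.

/* Hypothetical sketch of PDIO-style N-to-M aggregation: many incoming
 * data streams are multiplexed onto a smaller, configurable number of
 * outgoing TCP streams.  portals_recv_next() merely simulates the real
 * Portals receive path, which is not part of this sketch. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define N_TCP_STREAMS 4           /* configurable # of outgoing streams */

typedef struct {                  /* framing header before each payload */
    uint32_t src_rank;            /* originating compute-node rank      */
    uint32_t length;              /* payload bytes that follow          */
} msg_hdr_t;

/* Stand-in for a blocking Portals receive: fabricates a few messages
 * from rotating source ranks, then reports end-of-stream. */
static ssize_t portals_recv_next(char *buf, size_t max, uint32_t *src)
{
    static uint32_t calls = 0;
    if (calls == 16) return 0;
    *src = calls++ % 8;
    return snprintf(buf, max, "payload from rank %u\n", *src);
}

static int connect_stream(const char *host, int port)
{
    struct sockaddr_in sa = { .sin_family = AF_INET,
                              .sin_port   = htons((uint16_t)port) };
    inet_pton(AF_INET, host, &sa.sin_addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0) exit(1);
    return fd;
}

int main(void)
{
    int tcp[N_TCP_STREAMS];
    char buf[1 << 16];
    /* One WAN connection per outgoing stream; host/port illustrative. */
    for (int i = 0; i < N_TCP_STREAMS; i++)
        tcp[i] = connect_stream("127.0.0.1", 5000 + i);

    for (;;) {
        uint32_t src;
        ssize_t n = portals_recv_next(buf, sizeof buf, &src);
        if (n <= 0) break;
        /* Map the source rank onto an outgoing stream, collapsing
         * hundreds of Portals streams onto N_TCP_STREAMS connections. */
        int s = (int)(src % N_TCP_STREAMS);
        msg_hdr_t h = { htonl(src), htonl((uint32_t)n) };
        write(tcp[s], &h, sizeof h);       /* error handling elided */
        write(tcp[s], buf, (size_t)n);
    }
    return 0;
}

Mapping on the source rank keeps each node's data ordered within one TCP stream, which is one simple way to collapse many inputs onto a few connections; the real daemon also carries file metadata and buffers through a ring (next slide).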

6
Portals Direct I/O (PDIO) Details
  • Utilizing the ETF network
    • 10GigE end-to-end
    • Benchmarked >1 Gbps in testing
  • Inherent flow-control feedback to the application
    • The aggregation protocol allows TCP transmission, or even remote
      file system performance, to throttle the data streams coming out
      of the application (!)
    • Variable message sizes and file metadata supported
  • Multi-threaded ring buffer in the PDIO daemon
    • Allows the Portals receiver, TCP sender, and computation to
      proceed asynchronously (see the sketch below)
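Below is a minimal sketch, in C with pthreads, of the kind of bounded ring buffer the slide describes; slot counts, sizes, and thread structure are illustrative, not the daemon's actual internals. The blocking put is what yields the flow-control feedback from the bullets above: when the TCP sender (or the remote file system behind it) slows, the ring fills, the Portals receiver blocks, and the application's writers are throttled.

/* Illustrative bounded ring buffer shared by a Portals-receiver thread
 * (producer) and a TCP-sender thread (consumer), so receive, send, and
 * computation proceed asynchronously until the ring fills. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define RING_SLOTS 8
#define SLOT_BYTES 4096

typedef struct {
    char   data[RING_SLOTS][SLOT_BYTES];
    size_t len[RING_SLOTS];
    int    head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t  not_full, not_empty;
} ring_t;

static ring_t ring = { .lock      = PTHREAD_MUTEX_INITIALIZER,
                       .not_full  = PTHREAD_COND_INITIALIZER,
                       .not_empty = PTHREAD_COND_INITIALIZER };

/* Producer side: blocks while the ring is full -- this is the
 * back-pressure that ultimately throttles the application. */
static void ring_put(const char *buf, size_t n)
{
    if (n > SLOT_BYTES) n = SLOT_BYTES;   /* clamp for the sketch */
    pthread_mutex_lock(&ring.lock);
    while (ring.count == RING_SLOTS)
        pthread_cond_wait(&ring.not_full, &ring.lock);
    memcpy(ring.data[ring.head], buf, n);
    ring.len[ring.head] = n;
    ring.head = (ring.head + 1) % RING_SLOTS;
    ring.count++;
    pthread_cond_signal(&ring.not_empty);
    pthread_mutex_unlock(&ring.lock);
}

/* Consumer side: blocks while the ring is empty. */
static size_t ring_get(char *buf)
{
    pthread_mutex_lock(&ring.lock);
    while (ring.count == 0)
        pthread_cond_wait(&ring.not_empty, &ring.lock);
    size_t n = ring.len[ring.tail];
    memcpy(buf, ring.data[ring.tail], n);
    ring.tail = (ring.tail + 1) % RING_SLOTS;
    ring.count--;
    pthread_cond_signal(&ring.not_full);
    pthread_mutex_unlock(&ring.lock);
    return n;
}

static void *sender(void *arg)            /* stands in for the TCP sender */
{
    (void)arg;
    char buf[SLOT_BYTES];
    for (int i = 0; i < 4; i++) {
        size_t n = ring_get(buf);
        fwrite(buf, 1, n, stdout);
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, sender, NULL);
    for (int i = 0; i < 4; i++) {         /* stands in for Portals receives */
        char msg[64];
        int n = snprintf(msg, sizeof msg, "chunk %d\n", i);
        ring_put(msg, (size_t)n);
    }
    pthread_join(t, NULL);
    return 0;
}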

7
Portals Direct I/O (PDIO) Config
  • User-configurable/tunable parameters (collected in the sketch below)
    • Network targets: can be different for each job
    • Number of streams: can be tuned for optimal host/network utilization
    • TCP network buffer size: can be tuned for maximum throughput over
      the WAN
    • Ring buffer size/length: controls total memory utilization of the
      PDIO daemons
    • Number of Portals writers: can be any subset of the running
      application's processes
    • Remote filename(s): file metadata are propagated through the full
      chain, per write
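One way to picture these tunables together is as a single configuration record. The C struct below is purely hypothetical; the field names and example values are illustrative, not PDIO's actual configuration interface.

/* Hypothetical struct collecting the user-tunable PDIO parameters
 * listed on this slide; names and defaults are illustrative only. */
#include <stddef.h>

typedef struct {
    const char **targets;         /* network targets; may differ per job   */
    int          n_targets;
    int          n_streams;       /* TCP streams; tune for host/network    */
    size_t       tcp_buf_bytes;   /* socket buffer; tune for WAN throughput */
    size_t       ring_slots;      /* ring length: caps daemon memory use   */
    size_t       ring_slot_bytes;
    int          n_writers;       /* Portals writers: any subset of ranks  */
    const char  *remote_file;     /* metadata propagated on every write    */
} pdio_config_t;

/* Example: 4 streams with 2 MB socket buffers for a long, fat 10GigE
 * WAN path, and a 64 MB ring per daemon. */
static const char *targets[] = { "igrid-recv.example.net" };
static const pdio_config_t cfg = {
    .targets         = targets,
    .n_targets       = 1,
    .n_streams       = 4,
    .tcp_buf_bytes   = 2u << 20,
    .ring_slots      = 64,
    .ring_slot_bytes = 1u << 20,
    .n_writers       = 256,
    .remote_file     = "ppm_frame.raw",
};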

8
HPC resource and renderer waiting
[Diagram: Compute Nodes and I/O Nodes at PSC, connected over the ETF network to iGRID, with a steering path; nothing is running yet]
9
Launch PPM job, PDIO daemons, and iGRID recvers
[Diagram: the PPM job starts on the Compute Nodes, six pdiod daemons start on the I/O Nodes at PSC, and three recv processes start at iGRID]
10
Aggregate data via Portals
[Diagram: the Compute Nodes stream data over Portals to the pdiod daemons on the I/O Nodes]
11
Route traffic to ETF net
[Diagram: the pdiod daemons forward the aggregated streams from PSC onto the ETF network]
12
Recv data @ iGRID
[Diagram: the recv processes at iGRID take the TCP streams off the ETF network; a receiver sketch follows]
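As a companion to this step, here is a minimal C sketch of a TCP receiver that accepts a variable/dynamic number of connections, as the slides say the Goodhue-booth receiver does. The poll()-based single-process design and the port number are assumptions, not the actual receiver code.

/* Illustrative parallel TCP receiver: accepts a dynamic number of
 * incoming pdiod connections and drains whichever streams have data.
 * Design and port are hypothetical; not the actual iGRID receiver. */
#include <poll.h>
#include <stdio.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define MAX_CONNS 64

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa = { .sin_family = AF_INET,
                              .sin_port   = htons(5000),
                              .sin_addr.s_addr = htonl(INADDR_ANY) };
    int one = 1;
    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
    if (bind(lfd, (struct sockaddr *)&sa, sizeof sa) < 0 ||
        listen(lfd, 16) < 0) return 1;

    struct pollfd pfd[MAX_CONNS + 1] = { { .fd = lfd, .events = POLLIN } };
    int nfds = 1;
    char buf[1 << 16];

    for (;;) {
        poll(pfd, nfds, -1);
        /* New pdiod connections simply become new pollable entries,
         * so the connection count can grow and shrink mid-run. */
        if (pfd[0].revents & POLLIN && nfds <= MAX_CONNS)
            pfd[nfds++] = (struct pollfd){ .fd = accept(lfd, NULL, NULL),
                                           .events = POLLIN };
        for (int i = 1; i < nfds; i++) {
            if (!(pfd[i].revents & POLLIN)) continue;
            ssize_t n = read(pfd[i].fd, buf, sizeof buf);
            if (n <= 0) {                 /* stream closed: drop it */
                close(pfd[i].fd);
                pfd[i--] = pfd[--nfds];
            } else {
                /* Hand the bytes to the renderer / disk writer here. */
                fwrite(buf, 1, (size_t)n, stdout);
            }
        }
    }
}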
13
Render real-time data
[Diagram: the renderer at iGRID consumes the received data in real time]
14
Send steering data back to active job
[Diagram: steering input entered at iGRID travels back across the ETF network, through the I/O Nodes, to the running job on the Compute Nodes]
15
Dynamically update rendering
[Diagram: the job's output reflects the steering input and the rendering at iGRID updates dynamically]