Title: Quadrics Network Overview Yongyi Feng Yuan Fang
1Quadrics Network Overview
Yongyi Feng, Yuan Fang
2Introduction
- The interconnection network and its associated
software libraries and hardware have become
critical components in High Performance
Clustering Technology
- Key players in high-speed interconnects
- Gigabit Ethernet (GigE)
- GigaNet
- SCI
- Myrinet
- GSN (HiPPI-6400)
- QsNet
3Introduction 2
4Introduction 3
5Two novel innovations of QsNet
- The integration of the virtual-address spaces of
individual nodes into a single, global,
virtual-address space
- Network fault tolerance that can detect faults
and automatically re-transmit packets
6Quadrics Interconnect (QsNet)
- Two hardware building blocks make up the network
- Quadrics Network Adapter (Elan)
- 2nd Generation - 64-bit/66 MHz PCI-based
- QSW Multi-Stage Network (Elite)
- Modular Design, Fat Tree Topology
- Very Low Latency, High Bandwidth
- Combined to provide high scalability, flexibility
and fault tolerance
7Network components
8Elan Network Interface
- Primary processing engines
- The microcode processor supports four threads of
execution (input, DMA, processor-scheduling, and
command-processor)
- The thread processor is a 32-bit RISC processor
that aids in the implementation of higher-level
messaging libraries without explicit intervention
from the main CPU.
9Elite Switch
- Elite switches are interconnected in a
quaternary fat-tree topology. Each switch provides
- eight bidirectional links supporting two virtual
channels in each direction
- an internal 16x8 full crossbar switch
- a nominal transmission bandwidth of 400 MB/s in
each link direction and a flow-through latency of
35 ns
- packet error detection and recovery, with routing
and data transactions CRC-protected
- two priority levels combined with an aging
mechanism to ensure fair delivery of packets in
the same priority level
- hardware support for broadcasts
- adaptive routing
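A rough sizing sketch for the quaternary fat tree mentioned above. It assumes the standard k-ary n-tree result for k = 4 (a tree of dimension n connects 4^n nodes using n levels of 4^(n-1) switches); the slides themselves do not state this formula, and the function names are illustrative only.

```c
#include <stdint.h>

/* Number of processing nodes in a quaternary fat tree of dimension n: 4^n. */
uint64_t fat_tree_nodes(unsigned n)
{
    uint64_t nodes = 1;
    for (unsigned i = 0; i < n; i++)
        nodes *= 4;
    return nodes;
}

/* Number of switches: n levels, each holding 4^(n-1) switches. */
uint64_t fat_tree_switches(unsigned n)
{
    if (n == 0)
        return 0;
    return (uint64_t)n * fat_tree_nodes(n - 1);
}
```

For example, fat_tree_nodes(3) gives 64 nodes and fat_tree_switches(3) gives 48 switches, each switch using four links down and four links up.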
10Global Virtual Memory
- Virtual operation: the Elan can transfer
information directly between the address spaces
of groups of cooperating processes while
maintaining hardware protection between the
process groups
- Virtual operation is based on two concepts
- 1. The Elan virtual memory
- 2. The Elan Context
11Elan Virtual Memory
- The Elan MMU translates the virtual memory
addresses issued by the various on-chip
functional units into physical addresses
- Configuration tables of the Elan MMU are
synchronized with the main processor's MMU tables
so that the virtual address space can be accessed
by the Elan
- The MMU in the Elan can translate between virtual
addresses written in the format of the main
processor and virtual addresses written in the
Elan format
12Virtual Address Translation
- The 64-bit addresses of the main processor are
mapped to the Elan's 32-bit addresses.
- This means that virtual addresses can be accessed
directly by the main processor while the Elan can
access the same memory by using its own addresses.
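A minimal sketch of the idea of mapping 64-bit main-processor addresses onto 32-bit Elan addresses. This is not the Elan MMU's table format: the struct addr_map layout and the function main_to_elan are hypothetical, chosen only to show a range-based translation.

```c
#include <stddef.h>
#include <stdint.h>

/* One hypothetical mapping: a range of 64-bit main-processor virtual
 * addresses and the 32-bit Elan address where that range starts. */
struct addr_map {
    uint64_t main_base;
    uint32_t elan_base;
    uint32_t length;
};

/* Translate a main virtual address to an Elan address.
 * Returns 1 and stores the result in *elan on success, 0 if unmapped. */
int main_to_elan(const struct addr_map *maps, size_t count,
                 uint64_t main_addr, uint32_t *elan)
{
    for (size_t i = 0; i < count; i++) {
        if (main_addr >= maps[i].main_base &&
            main_addr - maps[i].main_base < maps[i].length) {
            *elan = maps[i].elan_base +
                    (uint32_t)(main_addr - maps[i].main_base);
            return 1;
        }
    }
    return 0;
}
```

Both processors then refer to the same memory: the main CPU with its 64-bit address, the Elan with the translated 32-bit one. An address outside every mapped range is rejected rather than translated.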
13Elan Context
- The Elan replaces the PID value with a context
value.
- The context value also determines which remote
processes can access the address space via the
Elan network and where those processes reside.
14Fault Detection and Fault Tolerance
- QsNet implements network fault detection and
tolerance in hardware.
- If an Elan detects an error during the
transmission of a packet over QsNet, it
immediately sends out an error message without
waiting for a PA token to be received.
- If an Elite detects an error, it automatically
transmits an error message back to the source and
the destination. During this process, the faulty
link and/or switch is isolated via per-hop fault
detection
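To illustrate how a CRC lets a receiver detect a damaged packet, here is a small software sketch. The slides do not say which CRC QsNet uses, so CRC-16/CCITT-FALSE is an assumption picked for illustration, and packet_ok is a hypothetical helper; in QsNet the check is done in hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
 * no bit reflection, no final XOR. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Returns 1 if the payload matches the CRC carried with it, 0 otherwise.
 * A 0 result is the point at which a sender would be asked to retransmit. */
int packet_ok(const uint8_t *payload, size_t len, uint16_t carried_crc)
{
    return crc16_ccitt(payload, len) == carried_crc;
}
```

A receiver that gets 0 from packet_ok would report the error back toward the source, which is the software analogue of the automatic error message and retransmission described above.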
15Programming
16Programmable Libraries Hierarchy
17
- The Elan and Elan3 libraries are the programming
interface to the Elan3 communications processor
(referred to as the Elan).
- The Elan3 programming library, libelan3, is the
lowest level programming library. It is intended
for use by systems and user library programmers
who want to make maximum use of the Elan
communications processor.
- The Elan programming library, libelan, is
designed for use by systems programmers who want
to implement higher level message passing
interfaces, such as MPI or shmem. This library
frees the programmer from the low level, revision
dependent details of the Elan while still giving
access to the high performance communications
functionality.
- The Elan library builds on the functions of the
Elan3 library and provides a foundation for
higher level message passing libraries, such as
MPI.
18
- Elanlib is a higher-level interface that frees
the programmer from the revision-dependent
details of the Elan and extends Elan3lib with
point-to-point, tagged message-passing primitives
(called Tagged Message Ports or Tports).
19Elanlib and TPort
- Elanlib is a machine-independent library that
integrates the main features of Elan3lib with
Tports.
- Tports provide basic mechanisms for
point-to-point message passing.
- Senders can label each message with a tag, sender
identity, and message size. This information is
known as the envelope. Receivers can receive
their messages selectively, filtering them
according to the sender's identity and/or a tag
on the envelope.
- The Tports layer handles communication via shared
memory for processes on the same node.
- The Tports programming interface is very similar
to that of MPI.
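The envelope filtering described above can be sketched as follows. This is not the Tports API: the struct envelope layout, the ANY_SENDER and ANY_TAG wildcards, and both function names are hypothetical and only show selection by sender and/or tag.

```c
#include <stddef.h>

#define ANY_SENDER (-1)   /* hypothetical wildcard: accept any sender */
#define ANY_TAG    (-1)   /* hypothetical wildcard: accept any tag */

struct envelope {
    int    sender;  /* sender identity chosen by the user */
    int    tag;     /* user-specified tag */
    size_t size;    /* message size in bytes */
};

/* Returns 1 if the envelope satisfies the receiver's selection. */
int envelope_matches(const struct envelope *env, int want_sender, int want_tag)
{
    int sender_ok = (want_sender == ANY_SENDER) || (env->sender == want_sender);
    int tag_ok    = (want_tag == ANY_TAG) || (env->tag == want_tag);
    return sender_ok && tag_ok;
}

/* Returns the index of the first matching envelope, or -1 if none match. */
int find_match(const struct envelope *envs, size_t count,
               int want_sender, int want_tag)
{
    for (size_t i = 0; i < count; i++)
        if (envelope_matches(&envs[i], want_sender, want_tag))
            return (int)i;
    return -1;
}
```

A receiver interested only in tag 0x69 from any sender would call find_match(envs, count, ANY_SENDER, 0x69), which mirrors the way an MPI receive selects on source and tag.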
20What does ELAN give us?
- The ELAN system provides an environment for
specifying and prototyping deduction systems in a
language based on rewrite rules controlled by
strategies.
- The distribution includes
- An interpreter of the language,
- A very efficient compiler (runs up to 15 million
rules per second),
- User's manual and library documentation,
- Examples from tiny to very large ones.
21How to use Elan
- The current version of ELAN includes an
interpreter and a compiler written in C and
Java.
- Both can interact via an exchange format (REF,
for Reduced ELAN Format) which is a term
representation of ELAN programs.
- This format is a convenient way to
transform ELAN programs using ELAN
itself, and is the key to the implementation of
a reflection mechanism in ELAN.
22Architecture of the Elan system
23How to use Elan
- 1. Preliminaries
- Assuming that everything is installed in your
home directory ($HOME),
- add $HOME/elan/bin and $HOME/elan/bin/`uname -m`
to the PATH environment variable
- set the environment variable ELANLIB to
$HOME/elan/elanlib
24How to use Elan
- 2. How to run the ELAN interpreter
- type "elan"
- elan [options] lgi_file spc_file
- Options
- --dump, -d  dump of signature, strategies and rules
- --trace, -t num  tracing of execution (default max)
- --statistic, -s/-S  statistics short/long
- --warningsoff, -w  suppress warning messages of the parser
- --quiet, -q  quiet regime of execution
- --batch, -b  batch regime (no messages at all)
- --elanlib elanlib
- --secondlib, -l lib  second elan library (by default ..)
- --command, -C  command language
- --export, --cexport  export to a .ref file
- --import  import from a .ref file
25How to use Elan
- 3. How to run the ELAN compiler
- type "elanc"
- elanc [elan options] [compiler options]
file.lgi file.spc
- Options
- -output
- -verbose
- -nocode
- -nosplit (default)
- -split
- -quiet
- -debug
- -noOptimiseChoicePoint (default)
- -optimiseChoicePoint
- -onlyC
- -noC
- -warning
- -O
26How to use Elan
- 4. How to compile a program written in C
- To use the functions in the Elan library,
programs must include the header file elan.h.
- The library functions reference header files which
are, by default, installed in the directory
/usr/opt/rms/include.
- Programs must be linked with libelan. An example
command line to compile a program prog.c is shown
here.
- cc -o prog -I/usr/opt/rms/include prog.c -lelan
27Questions ?
28Example tping
- The header files and variables used by the program
are shown here. The variables are declared in
main.
    #include <...>
    #include <...>
    #include <...>
    #include <...>
    #include <...>
    #include <...>
    #include <...>
    #include <...>

    int main(int argc, char *argv[])
    {
        double t;
        uint64_t tv[2];
        ELAN_BASE *base;
        ELAN_TPORT *p;
        ELAN_QUEUE *q;
        int tag = 0x69;
        char *rbuf;
        char *tbuf;
        int reps = 100000;
        ...
    }
29Example tping n e h
- Argument Checking
    int main(int argc, char *argv[])
    {
        ...
        /* strip the directory part of argv[0] to get the program name */
        for (progName = argv[0] + strlen(argv[0]);
             progName > argv[0] && *(progName - 1) != '/';
             progName--)
            ;

        while ((c = getopt(argc, argv, "n:eh")) != -1)
            switch (c)
            {
            case 'n':
                if ((reps = getSize(optarg)) <= 0)
                    usage(progName);
                break;
            case 'e': doprint++; break;
            case 'h': help(progName);
            default:  usage(progName);
            }

        if (optind == argc)
            minNob = 0;
        else if ((minNob = getSize(argv[optind++])) < 0)
            usage(progName);
        if (optind == argc)
            ...
    }
30Example tping
- This section of main is concerned with
initializing the process to use the Elan library
and setting up a tagged message port.
    int main(int argc, char *argv[])
    {
        ...
        if (!(base = elan_baseInit())) {
            perror("Failed elan_baseInit()");
            exit(1);
        }

        proc = base->state->vp;     /* this process's virtual process id */
        nproc = base->state->nvp;   /* number of processes in the program */
        if (nproc == 1)
            exit(1);

        if ((q = elan_gallocQueue(base->state, base->allGroup)) == NULL) {
            perror("elan_gallocQueue failed");
            exit(1);
        }

        if (!(p = elan_tportInit(base->state, q, base->tport_nslots,
                                 base->tport_smallmsg, base->tport_bigmsg,
                                 base->waitType, base->retryCount,
                                 &base->shm_key, base->shm_fifodepth,
                                 base->shm_fragsize))) {
            perror("Failed to initialise TPORT");
            exit(1);
        }
        ...
    }
31Questions ?
32Elan Library Function Categories
- The functions in the Elan library can be grouped
according to the operations they perform. These
groups are
- Initialization
- Tagged Message Passing
- Collective Operations
- Queue Operations
- Put/Get Operations
- Memory Allocation
- Global Memory Allocation
- Miscellaneous
33Elan Library Function Categories
- Initialization: The initialisation functions
prepare for the process to interact with the
network and set up the variables required for
this interaction.
34Elan library function categories
- Tagged Message Passing: The tagged message passing
functions make use of ports, known as tagged
message ports, for point-to-point communications.
Both buffered and non-buffered messaging are
supported. Senders and receivers can choose,
independently, whether or not to block waiting
for their messages to be delivered. Messages
carry the identity of the sender (as defined by the
user, not the VPID) plus a user-specified tag.
Receivers can select their messages by sender
and/or by tag.
35Elan library function categories
- Collective Operations
- The collective operations enable a group of
processes to implement a series of communication
and calculation actions more efficiently than the
equivalent use of explicit message passing and
calculation functions. The processes can be
synchronised so that all perform their
calculations together.
36Elan library function categories
- Queue Operations
- The Queue operations enable a group of processes
to post short messages to one another using a
queueing mechanism. Each process has one receive
queue which can be posted to by any process.
Processes can either poll or block on the local
queue waiting for messages to arrive. The data
delivered with each Queue request is user definable
and hence can be used to deliver protocol and
data.
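The post and poll behaviour of a receive queue can be sketched with a small ring buffer. This is a single-process illustration, not the Elan queue implementation: QUEUE_SLOTS, SLOT_BYTES, struct msg_queue, queue_post and queue_poll are all hypothetical names and sizes.

```c
#include <string.h>

#define QUEUE_SLOTS 8     /* hypothetical queue depth */
#define SLOT_BYTES  64    /* hypothetical fixed slot size for short messages */

struct msg_queue {
    unsigned char slots[QUEUE_SLOTS][SLOT_BYTES];
    unsigned head;   /* next slot to read */
    unsigned count;  /* number of occupied slots */
};

/* Post a short message; returns 1 on success, 0 if the queue is full
 * or the message is larger than a slot. */
int queue_post(struct msg_queue *q, const void *msg, size_t len)
{
    if (q->count == QUEUE_SLOTS || len > SLOT_BYTES)
        return 0;
    unsigned tail = (q->head + q->count) % QUEUE_SLOTS;
    memcpy(q->slots[tail], msg, len);
    q->count++;
    return 1;
}

/* Poll without blocking; copies one whole slot into out (which must hold
 * SLOT_BYTES bytes) and returns 1, or returns 0 if nothing is waiting. */
int queue_poll(struct msg_queue *q, void *out)
{
    if (q->count == 0)
        return 0;
    memcpy(out, q->slots[q->head], SLOT_BYTES);
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    return 1;
}
```

A blocking receive would simply wait until queue_poll succeeds. Because the slot contents are opaque bytes, the same mechanism can carry protocol headers or user data, as the slide notes.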
37Elan library function categories
- Put/Get Operations
- The Put/Get operations enable a process to read
from and write directly to the memory space of
its peers.
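One-sided put and get can be pictured with a single-process simulation in which each peer's exposed memory is an array. The names sim_put and sim_get, and the sizes NPEERS and HEAP_BYTES, are hypothetical; on real hardware the Elan performs the transfer without involving the remote CPU.

```c
#include <stddef.h>
#include <string.h>

#define NPEERS     4      /* hypothetical number of processes */
#define HEAP_BYTES 256    /* hypothetical memory each process exposes */

/* Stand-in for the memory each peer exposes to the others. */
static unsigned char peer_mem[NPEERS][HEAP_BYTES];

/* Returns 1 if [offset, offset + len) lies inside a valid peer's memory. */
static int range_ok(int peer, size_t offset, size_t len)
{
    return peer >= 0 && peer < NPEERS &&
           offset <= HEAP_BYTES && len <= HEAP_BYTES - offset;
}

/* Write len bytes from src into the peer's memory at offset.
 * Returns 1 on success, 0 for an invalid peer or address range. */
int sim_put(int peer, size_t offset, const void *src, size_t len)
{
    if (!range_ok(peer, offset, len))
        return 0;
    memcpy(&peer_mem[peer][offset], src, len);
    return 1;
}

/* Read len bytes from the peer's memory at offset into dst. */
int sim_get(int peer, size_t offset, void *dst, size_t len)
{
    if (!range_ok(peer, offset, len))
        return 0;
    memcpy(dst, &peer_mem[peer][offset], len);
    return 1;
}
```

The two failure cases checked here (a peer that does not exist, an address outside the exposed region) correspond to situations that the real hardware reports as exceptions rather than return codes.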
38Elan library function categories
- Memory Allocation
- The library provides a set of functions for
dynamic memory management. Memory can be
allocated from either Elan memory or from Main
memory as requested.
39Elan library function categories
- Global Memory Allocation
- The library provides a set of functions for
dynamic global memory management. Global memory
can be allocated from either Elan memory or from
Main memory as requested, over a group of
processes.
40Elan library function categories
- Miscellaneous: This group includes functions for
dynamic loading of libraries, for running
threads, for verifying the version of the Elan
library in use and for inspecting the state of
event variables.
41Elan3lib library function categories
- The functions in the Elan3 library can be
grouped according to the operations they perform.
These groups are
- Initialisation
- Communications
- DMA Operations
- Event Handling
- Thread Processes
- Statistics
- Memory Allocation
- Utility
42Elan3lib library function categories
- Initialisation: The initialisation functions
establish a context for the process so that it
can use the Elan.
43Elan3lib library function categories
- Communications
- The communications functions prepare the process
to interact with the network and set up the
capabilities required for processes to
communicate with each other.
44Elan3lib library function categories
- DMA Operations
- The DMA functions enable processes to read and
write each other's memory directly.
45Elan3lib library function categories
- Event Handling: The event handling functions
provide the mechanism for processes to
synchronize their actions, either by blocking,
busy waiting or polling.
- Thread Processes
- User programs can execute threads which run
independently on the Elan. The library includes
functions for starting these threads.
46Elan3lib library function categories
- Statistics: A wide variety of statistics on the
performance of the Elan are available to the
programmer. There is a group of functions for
collecting these statistics.
- Memory Allocation
- The library provides a set of functions for
memory allocation. Memory can be allocated from
either Elan memory or from Main memory as
requested.
47Elan3lib library function categories
- Utility: The utility functions give programmers
access to the nanosecond timer which can be used
to get accurate timings of program execution. In
addition, there are a number of functions which
perform useful conversions, such as converting
Elan virtual addresses to main virtual addresses.
Another group of functions simplifies copying data
between different platforms while using the PCI
bus efficiently.
48Questions ?
49Elan Exceptions
- If a process in a parallel program performs an
illegal inter-process comms operation, the Elan
hardware will generate an exception and the Elan
driver will send a SIGSEGV to the process. The
RMS core file analysis script runs a utility called
edb to extract information on Elan exceptions.
50Elan Exceptions
- Exceptions are generated under the following
circumstances
- Invalid process id
- Invalid address
- Invalid thread address
- Queue overflow
- Bad event
- Unimplemented instruction
- Too many thread instructions
- Uncorrected network error
51Description of Exceptions
- Invalid Process Id
- An exception is generated if a process attempts
to communicate using an invalid process id. In
this case the process was trying to send 8 bytes
of data to process 4 in a 2-process program.
- pestilence0 prun -N2 myprog
- ...
- prun dumping elan exception state for ./shmem
- Dma - Invalid Process Process4 Res22
- WakeupFnt4 Cntx9a SuspAddr83 TrapTypeMI_DmaLoop
- FaultAddr4 EventAddr100eead0 FSR2450
- 0 FaultAddr0 EventAddr0 FSR0
- 1 FaultAddr0 EventAddr0 FSR0
- 2 FaultAddr0 EventAddr0 FSR0
- 3 FaultAddr0 EventAddr0 FSR0
- Type 009a2003 size 00000008 source 0cb0d040 dest 1fffbff8
- Dest event 0cc1e700 cookie/proc 00000004
- Source event 0cb0f440 cookie/proc 00040000
52Description of Exceptions
- Invalid Address
- An exception is generated if a process attempts
to communicate using an invalid address. In this
case the process was trying to send 8 bytes of
data to an unmapped stack address.
- pestilence0 prun -N2 myprog
- ...
- prun dumping elan exception state for ./shmem
- Inputter - Invalid Address Res14
- Fault Area FaultAddr5fffbfe8 EventAddr100ee5d0 FSR66fb State2 Status0
- FaultAddr5fffbfe8 EventAddr100ee5d0 FSR66fb
- NumTransactions4 Overflow0 AckSent0 BadTransaction0 QueuePointer0
- WriteBlock TypeDouble Size 8 Addr5fffbfe8 Cntx0000008d TypeMI_TestForZeroLengthDma ...
- DMA Identify Addr00200001 Cntx0000008d TypeMI_InputDoTrap ...
- Setevent Addr0cc1e700 Cntx0000008d TypeMI_InputDoTrap ...
- EOP BADACK Cntx0000008d TypeMI_InputDoTrap ...
53Description of Exceptions
- Command Queue Overflow
- An exception is generated if the user application
issues too many concurrent DMAs, causing the Elan's
internal queues (1024 deep) to overflow. If this
error occurs, the user should throttle the number
of DMAs that their code issues concurrently. This
must be done by all processes on each node.
- edb bench_pe_256_t_31 has no elan exception symbol
- edb found exception page at 448328c8
- edb exceptions from bench_pe_256_t_31
- Command - Queue Overflow Trap Typeac
- Fault Area FaultAddr0 EventAddr0 FSR0
- Status40fb30ac MI_DmaQueueOverflow
- FaultAddr0 EventAddr0 FSR0
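One way to throttle concurrent DMAs is to count them and refuse to issue more than a fixed share of the queue. The sketch below assumes the 1024-entry depth quoted above is divided evenly among the processes on a node; that split, and every name in the code, is an illustrative assumption and not part of the Elan libraries.

```c
#define SKETCH_DMA_QUEUE_DEPTH 1024  /* queue depth quoted on the slide */

struct dma_throttle {
    unsigned limit;        /* this process's share of the queue */
    unsigned outstanding;  /* DMAs issued but not yet completed */
};

/* Split the queue depth evenly among the processes on one node. */
void throttle_init(struct dma_throttle *t, unsigned procs_per_node)
{
    if (procs_per_node == 0)
        procs_per_node = 1;
    t->limit = SKETCH_DMA_QUEUE_DEPTH / procs_per_node;
    t->outstanding = 0;
}

/* Returns 1 if a DMA may be issued now (and counts it), 0 if the
 * caller should first wait for an earlier DMA to complete. */
int throttle_try_issue(struct dma_throttle *t)
{
    if (t->outstanding >= t->limit)
        return 0;
    t->outstanding++;
    return 1;
}

/* Call once for every DMA that has completed. */
void throttle_complete(struct dma_throttle *t)
{
    if (t->outstanding > 0)
        t->outstanding--;
}
```

With four processes per node each process would stop at 256 outstanding DMAs, wait on a completion event, call throttle_complete, and only then issue the next transfer.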
54Description of Exceptions
- Bad Event
- An exception is generated if a process attempts
to synchronise using an invalid event. In this
case ... looks like the source event was the
cause ...
- prun dumping elan exception state for ./a.out
- Dma - Bad Event
- WakeupFnt4 Cntx185 SuspAddr83 TrapTypeMI_EventIntUpdateBPtr
- FaultAddrbd3c EventAddrc7fb588 FSRe481
- 0 FaultAddr0 EventAddr0 FSR0
- 1 FaultAddr0 EventAddr0 FSR0
- 2 FaultAddr0 EventAddr0 FSR0
- 3 FaultAddr0 EventAddr0 FSR0
- type 01852002 size 00000000 source 00000000 dest 00000000
- Dest event 0c6f3730 cookie/proc 00000001
- Source event 0c7fb588 cookie/proc 00400000
55Description of Exceptions
- Unimplemented Instruction
- An exception is generated if an Elan thread
executes an unimplemented instruction. This is
often from a forced Elan thread call to
elan_exception. The program counter can be used
to determine ... how ???
- prun dumping elan exception state for ./PMB-MPI1
- edb ./PMB-MPI1 has no elan exception symbol
- Thread - Unimplemented Instruction Instr0
- MI_UnimplementedError - Unimplemeted Instruction
- SP0c7045c0 PC0c6f8f09 NPC0c6f8f0c DIRTYffffffff
- g000000000 g100000004 g200000000 g30c7050c0
- g40c6f8f28 g50c704708 g600000000 g700009680
- o000000082 o100000000 o200000082 o300000000
- o40c7047c0 o50c7047c2 o60c7045c0 o70c6f8f00
- l000000005 l10c806500 l200009680 l300000000
- l400000000 l50c806500 l600000000 l700000000
- i000009680 i10c806650 i200100000 i300100100
- i454c5f860 i50000000f i60c6dedd0 i700006a3d
56Description of Exceptions
- Invalid Thread Address
- An exception is generated if an Elan thread
accesses an invalid address. In this case the
thread ...
- prun dumping elan exception state for ./PMB-MPI1
- edb ./PMB-MPI1 has no elan exception symbol
- Thread - Invalid Address Res14
- Fault Area FaultAddr382400c EventAddr2018 FSRe481
- MI_UnimplementedError - Data Access Exception
- SP0c7046a0 PC0c6fb1e8 NPC0c6fb1ed DIRTYffffffff
- g000000000 g100000047 g200000000 g30c704f80
- g40000803c g525051970 g600008088 g700000040
- o025051970 o104010000 o20181c000 o303824000
- o400000003 o500000000 o60c7046a0 o70c6fafe4
- l004010000 l100009f90 l203824000 l300008010
- l40000803c l525051970 l600008088 l700000040
- i000200200 i10c806ec0 i200000001 i300000000
- i400000000 i50000000f i60c6f3f80 i70002bf40
- DataFaultSaveFaultAddr382400c EventAddr2018
FSRe481
57Description of Exceptions
- Too Many Thread Instructions
- An exception is generated if an Elan thread
executes too many instructions without calling
break. The limit is ...
- prun dumping elan exception state for ./is.A.256
- Thread - Thread Killed
- MI_UnimplementedError - Too Many Instructions
- SP0c7048a0 PC0c6f7158 NPC0c6f715c DIRTYffffffff
- g000000000 g10c6c5a74 g2000001c0 g3000001e0
- g4000001ff g50c705840 g6000b4840 g700000020
- o000000009 o1ffffffff o20c6f0a40 o30c6f0a34
- o400009480 o50c705840 o60c7048a0 o700000000
- l0010000fb l1000000c0 l20c806300 l300000009
- l4000001e0 l50c8065e4 l600000000 l70c806d80
- i00c705680 i10c6c5a74 i200000000 i300000000
- i44007ca34 i5000000fb i60c6e5a50 i700000002
58Description of Exceptions
- Uncorrected Network Error
- An exception is generated if ...
- prun dumping elan exception state for ./mping
- edb ./mping has no elan exception symbol
- Inputter - Network Error 4006bfff
- State4 Status0
- FaultAddrc000 EventAddrc806b20 FSR5e0fb
- NumTransactions3 Overflow0 AckSent1 BadTransaction1
- QueuePointerc805fe0
- Lock Queue Addr0c805fe0 Cntx00000051
- DMA Identify Addr2df00019 Cntx00000051
- EOP ERROR RESET Cntx00000051 TypeMI_InputDoTrap
- 26ebfcf4
BadLength CRC Bad
59Questions ?