Evaluating the Tera MTA Allan Snavely, Wayne Pfeiffer et al

1
Evaluating the Tera MTA
Allan Snavely, Wayne Pfeiffer, et al.

2
Evaluating the Tera MTA: Executive Summary

A few kernels and applications have been found for which the MTA achieves higher performance than other SDSC machines. Such codes have these characteristics:
  They do not vectorize well.
  They are difficult to parallelize on conventional machines.
  They contain substantial parallelism.
Examples are codes that involve:
  Integer sorting.
  Dynamic, irregular meshes or dynamic, non-uniform workloads within a regular mesh.
  Parallel operations (such as a general gather/scatter) with poor data locality.
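To make the last example concrete: a general gather/scatter reads or writes memory through an index array, so successive accesses can land on unrelated addresses. The sketch below is only an illustration in plain Python; the function names `gather` and `scatter` and the sample data are assumptions, not code from the evaluation.

```python
import random


def gather(src, idx):
    """Indexed read: element k of the result is src[idx[k]]."""
    return [src[i] for i in idx]


def scatter(dst, idx, vals):
    """Indexed write: vals[k] is stored at dst[idx[k]]; returns dst."""
    for i, v in zip(idx, vals):
        dst[i] = v
    return dst


# Hypothetical data: a random permutation stands in for an index
# pattern with poor locality (neighbouring k map to distant addresses).
n = 16
src = [10 * k for k in range(n)]
idx = list(range(n))
random.Random(0).shuffle(idx)

gathered = gather(src, idx)
# Scattering through the same permutation restores the original order.
restored = scatter([0] * n, idx, gathered)
assert restored == src
```

Each iteration of either loop is independent of the others, which is the kind of parallelism the slide refers to; the difficulty on cache-based machines comes from the access pattern, not from dependencies.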
Single-processor performance of the multithreaded Tera MTA (with a 260-MHz clock) is typically lower than that of the vector Cray T90 (with a 440-MHz clock). The T90 is faster than the MTA processor for 4 out of 7 kernels and 2 out of 3 applications compared.
  The MTA processor is appreciably faster for one kernel, which does an integer sort.
Single-processor performance of the MTA is typically higher than that of cache-based workstation processors. An MTA processor is substantially faster than a workstation processor for 8 out of 9 applications compared. This indicates the effectiveness of multithreading compared with reliance on caches.
Scalability on the MTA is good up to 8 processors in many instances and better for kernels than for larger applications.
  Very good scalability (parallel efficiency between 0.80 and 1.00 on 8 processors) has been achieved for 6 out of 7 kernels and 5 out of 11 applications studied.
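The slide does not define parallel efficiency. A minimal sketch, assuming the usual definition (speedup divided by processor count, with speedup taken as one-processor time over p-processor time), is shown below. The function names, the `is_very_good` helper, and the timings in the usage lines are hypothetical and are not measurements from the study.

```python
def speedup(t1, tp):
    """Speedup of a p-processor run that takes tp over a 1-processor run that takes t1."""
    return t1 / tp


def parallel_efficiency(t1, tp, p):
    """Assumed definition: efficiency = speedup / p = t1 / (p * tp)."""
    return speedup(t1, tp) / p


def is_very_good(t1, tp, p=8, lo=0.80, hi=1.00):
    """True if efficiency falls in the 'very good' band quoted on the slide."""
    return lo <= parallel_efficiency(t1, tp, p) <= hi


# Hypothetical timings in seconds, for illustration only.
t_one_proc = 100.0
t_eight_proc = 14.0
eff = parallel_efficiency(t_one_proc, t_eight_proc, 8)
print(f"efficiency on 8 processors: {eff:.2f}")
print("very good scalability:", is_very_good(t_one_proc, t_eight_proc))
```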

3
MTA vs. IBM Blue Horizon
4
MTA vs. T90
5
Scalability
6
Symbiosis and Congestion Pricing on MTA

Allan Snavely's Ph.D. thesis (Fall 2000). Advisor: Larry Carter.
Symbiosis: A term from biology meaning the living together of distinct organisms in close proximity. We adapt that term to refer to the increase in throughput and the improvement in job turnaround that can occur when jobs are coscheduled on a multithreaded machine.
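One way to put a number on this is to compare each job's progress rate when coscheduled with its rate when it has the machine to itself. The sketch below is an assumption made for illustration, not the metric defined in the thesis: the name `symbiosis_factor` and the sample rates are hypothetical. A value above 1.0 means the coscheduled jobs together complete more work per unit time than they would by taking turns.

```python
def symbiosis_factor(solo_rates, coscheduled_rates):
    """Sum over jobs of (rate when coscheduled) / (rate when run alone).

    Assumed metric for illustration: 1.0 matches simple time-slicing,
    values above 1.0 indicate a throughput gain from coscheduling.
    """
    if len(solo_rates) != len(coscheduled_rates):
        raise ValueError("need one solo rate and one coscheduled rate per job")
    return sum(co / solo for solo, co in zip(solo_rates, coscheduled_rates))


# Hypothetical rates (work units per second) for two jobs.
solo = [1.0, 1.0]
together = [0.7, 0.6]  # each job slows down, but both make progress at once
factor = symbiosis_factor(solo, together)
print(f"symbiosis factor: {factor:.2f}")
```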
Congestion Pricing: An area of economics dealing with how to price a congestion externality so that users take account of the impact their usage has on others.
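A minimal sketch of the idea, under assumptions that are not from the thesis: a job's externality is taken to be the extra run time its presence adds to the other coscheduled jobs, and the job pays a flat base charge plus a price per second of delay imposed. The names `externality` and `congestion_charge`, the pricing rule, and all numbers are hypothetical.

```python
def externality(times_without, times_with):
    """Total extra run time (seconds) imposed on the other jobs.

    times_without[k] is job k's run time when the charged job is absent,
    times_with[k] its run time when the charged job is coscheduled.
    Speedups are clamped to zero here, so this sketch gives no credit
    for helping other jobs.
    """
    return sum(max(0.0, w - wo) for wo, w in zip(times_without, times_with))


def congestion_charge(base_charge, times_without, times_with, price_per_second):
    """Assumed pricing rule: base charge plus a fee proportional to the delay imposed."""
    return base_charge + price_per_second * externality(times_without, times_with)


# Hypothetical example: two other jobs are slowed by 2 s and 5 s.
charge = congestion_charge(5.0, [10.0, 20.0], [12.0, 25.0], 0.5)
print(f"charge: {charge:.2f}")
```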
Key Observation: Resource sharing among coscheduled jobs on a multithreaded machine such as the MTA or an SMT processor is very intimate.
Thesis: Job schedulers that take symbiosis into account, when combined with principles of congestion pricing, deliver significant throughput and turnaround gains and maximize global user utility when deployed on multithreaded machines.
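The scheduling half of this claim can be sketched as a search over ways to pair up jobs, keeping the pairing with the highest total score. This is an illustrative brute-force sketch, not the scheduler described in the thesis: `pairings`, `best_coschedule`, the scoring function, and the score table are all hypothetical, and pricing is left out.

```python
def pairings(jobs):
    """Yield every way to split an even-length list of jobs into pairs."""
    if not jobs:
        yield []
        return
    first, rest = jobs[0], jobs[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def best_coschedule(jobs, score):
    """Return the pairing that maximizes the summed score(a, b) over pairs.

    score(a, b) is assumed to estimate how well jobs a and b run together
    (higher is better). Exhaustive search, so only suitable for a few jobs.
    """
    jobs = list(jobs)
    if len(jobs) % 2 != 0:
        raise ValueError("this sketch pairs jobs two at a time; need an even count")
    return max(pairings(jobs), key=lambda ps: sum(score(a, b) for a, b in ps))


# Hypothetical pairwise scores for four jobs.
table = {
    frozenset("AB"): 1.1, frozenset("AC"): 1.6, frozenset("AD"): 1.2,
    frozenset("BC"): 1.0, frozenset("BD"): 1.5, frozenset("CD"): 1.1,
}
schedule = best_coschedule("ABCD", lambda a, b: table[frozenset((a, b))])
print(schedule)
```

Exhaustive search is used only to keep the sketch short; the number of pairings grows quickly with the number of jobs, so a real scheduler would need a cheaper selection strategy.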
See www.sdsc.edu/allans