Evaluating the Tera MTA: Allan Snavely, Wayne Pfeiffer et al.

Learn more at: https://cseweb.ucsd.edu
1
Evaluating the Tera MTA
Allan Snavely, Wayne Pfeiffer et al.
  • Architectural features
      • Massive hardware multithreading
      • Flat, randomized memory (no data cache)
      • Support for automatic parallelization
      • Single programming model for 1 or many processors
      • Designed to scale
  • Goals of Architecture
      • Cover memory and other operational latencies
      • Ease burden on programmer
      • Exploit multiple levels of parallelism
      • Scale
  • Goals of SDSC Evaluation
      • Funded by NSF to evaluate the MTA for the
        purposes of scientific computing
      • PIs: Wayne Pfeiffer, Larry Carter
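
The goal of covering memory latency with multithreading can be illustrated with a toy model. This is a minimal sketch, not a description of the MTA hardware: it assumes every thread alternates between `work_cycles` of useful work and `latency_cycles` of waiting on memory, and that the processor switches between ready threads at no cost. The function names and all cycle counts are hypothetical, chosen only for illustration.

```python
def utilization(num_threads: int, work_cycles: int, latency_cycles: int) -> float:
    """Fraction of cycles spent on useful work in the toy model.

    Assumption: each thread does `work_cycles` of work, then stalls for
    `latency_cycles`; the processor can always run another ready thread.
    """
    if num_threads < 0 or work_cycles <= 0 or latency_cycles < 0:
        raise ValueError("thread count and latency must be >= 0, work must be > 0")
    # One thread keeps the processor busy work/(work + latency) of the time;
    # n threads overlap their stalls until the processor saturates at 1.0.
    return min(1.0, num_threads * work_cycles / (work_cycles + latency_cycles))


def threads_to_cover(work_cycles: int, latency_cycles: int) -> int:
    """Smallest thread count that fully hides the latency in the toy model."""
    if work_cycles <= 0 or latency_cycles < 0:
        raise ValueError("work must be > 0 and latency must be >= 0")
    total = work_cycles + latency_cycles
    return -(-total // work_cycles)  # integer ceiling of total / work_cycles


if __name__ == "__main__":
    # Hypothetical numbers: 1 cycle of work per 99 cycles of memory latency.
    for n in (1, 10, 50, 100):
        print(n, "threads ->", utilization(n, 1, 99))
    print("threads needed:", threads_to_cover(1, 99))
```

In this model a single thread leaves the processor almost entirely idle, while enough concurrent threads keep it busy without any data cache, which is the intuition behind the first architectural goal.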

2
Evaluating the Tera MTA: Executive Summary
  • A few kernels and applications have been found
    for which the MTA achieves higher performance
    than other SDSC machines. Such codes have these
    characteristics:
      • They do not vectorize well.
      • They are difficult to parallelize on conventional
        machines.
      • They contain substantial parallelism.
  • Examples are codes that involve:
      • Integer sorting.
      • Dynamic, irregular meshes or dynamic, non-uniform
        workloads within a regular mesh.
      • Parallel operations (such as a general
        gather/scatter) with poor data locality.
  • Single-processor performance of the
    multithreaded Tera MTA (with a 260-MHz clock) is
    typically lower than that of the vector Cray T90
    (with a 440-MHz clock). The T90 is faster than
    the MTA processor for 4 out of 7 kernels and 2
    out of 3 applications compared.
      • The MTA processor is appreciably faster for one
        kernel, which does an integer sort.
  • Single-processor performance of the MTA is
    typically higher than that of cache-based
    workstation processors. An MTA processor is
    substantially faster than a workstation processor
    for 8 out of 9 applications compared. This
    indicates the effectiveness of multithreading as
    compared to cache utilization.
  • Scalability on the MTA is good up to 8 processors
    in many instances and better for kernels than for
    larger applications.
      • Very good scalability (parallel efficiency
        between 0.80 and 1.00 on 8 processors) has been
        achieved for 6 out of 7 kernels and 5 out of 11
        applications studied.
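
The "very good scalability" criterion above can be made concrete with the usual definition of parallel efficiency, E = T1 / (p × Tp), where T1 is the single-processor run time and Tp the run time on p processors. The slides do not state which definition was used, so this is a sketch under that assumption; the function names and timings are hypothetical.

```python
def parallel_efficiency(t1: float, tp: float, p: int) -> float:
    """Parallel efficiency E = T1 / (p * Tp), i.e. speedup divided by p."""
    if t1 <= 0 or tp <= 0 or p <= 0:
        raise ValueError("times and processor count must be positive")
    return t1 / (p * tp)


def is_very_good(efficiency: float) -> bool:
    """The slide's band for very good scalability: 0.80 to 1.00 inclusive."""
    return 0.80 <= efficiency <= 1.00


if __name__ == "__main__":
    # Hypothetical timings: 80 s on 1 processor, 12.5 s on 8 processors.
    e = parallel_efficiency(80.0, 12.5, 8)
    print(f"efficiency = {e:.2f}, very good: {is_very_good(e)}")
```

With these made-up timings the speedup is 6.4 on 8 processors, which lands exactly on the lower edge of the band quoted in the summary.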

3
MTA vs. IBM Blue Horizon
4
MTA vs. T90
5
Scalability
6
Symbiosis and Congestion Pricing on MTA
  • Allan Snavely's Ph.D. thesis (Fall 2000). Advisor:
    Larry Carter.
  • Symbiosis: A term from biology meaning the
    living together of distinct organisms in close
    proximity. We adapt that term to refer to an
    improvement in throughput and job turnaround that
    can occur when jobs are coscheduled on a
    multithreaded machine.
  • Congestion Pricing: An area of economics dealing
    with the right way of pricing a congestion
    externality so that users take account of the
    impact their usage has on others.
  • Key Observation: Resource sharing among
    coscheduled jobs on a multithreaded machine such
    as the MTA or SMT is very intimate.
  • Thesis: Job schedulers that take symbiosis into
    account, when combined with principles of
    congestion pricing, deliver significant
    throughput and turnaround gains and maximize
    global user utility when deployed on
    multithreaded machines.
  • See www.sdsc.edu/allans
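
One simple way to quantify the throughput side of symbiosis is to compare running jobs back to back with running them together. The sketch below is an illustrative metric only, not the measure used in the thesis: it assumes the solo run time of each job and the wall-clock time of the coscheduled group are known, and all names and numbers are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def coschedule_gain(solo_times: Sequence[float], coscheduled_time: float) -> float:
    """Ratio of back-to-back run time to coscheduled run time.

    A value above 1.0 means the jobs finish sooner together than one after
    another; below 1.0 means they interfere more than they help.
    """
    if coscheduled_time <= 0 or any(t <= 0 for t in solo_times):
        raise ValueError("all run times must be positive")
    return sum(solo_times) / coscheduled_time


def best_pairing(measured: Dict[Tuple[str, str], float],
                 solo: Dict[str, float]) -> Tuple[str, str]:
    """Pick the measured job pair with the highest coscheduling gain."""
    return max(
        measured,
        key=lambda pair: coschedule_gain([solo[pair[0]], solo[pair[1]]], measured[pair]),
    )


if __name__ == "__main__":
    # Hypothetical solo run times (seconds) and measured pair run times.
    solo = {"A": 10.0, "B": 10.0, "C": 20.0}
    measured = {("A", "B"): 16.0, ("A", "C"): 29.0, ("B", "C"): 22.0}
    for pair, t in measured.items():
        print(pair, round(coschedule_gain([solo[pair[0]], solo[pair[1]]], t), 3))
    print("most symbiotic pair:", best_pairing(measured, solo))
```

A symbiosis-aware scheduler in this spirit would prefer pairs with the highest gain; the congestion-pricing half of the thesis, which concerns how users are charged for the interference they cause, is not modeled here.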