1
Computing Environment
  • The computing environment is rapidly evolving - you
    need to know not only the methods, but also:
  • How and when to apply them,
  • Which computers to use,
  • What type of code to write,
  • What CPU time and memory requirements your jobs
    will have,
  • What tools (e.g., visualization software) to use
    to analyze the data.

2
Definitions: Clock Cycles
  • A computer chip operates at discrete intervals
    called clock cycles, often measured in
    nanoseconds (ns) or megahertz (MHz).
  • 500 MHz (Pentium III) -> 2 ns
  • 100 MHz (Cray J90) -> 10 ns
  • May take 4 clocks to do one multiplication
  • May take 30 clocks to start a procedure
  • May take 2 clocks to access memory
  • MHz is not the only measure of performance
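
A quick sanity check on the conversion: the clock period in
ns is 1000 divided by the clock rate in MHz. A minimal
Fortran sketch (the program and variable names are
illustrative):

      program clockrate
      real fmhz, tns
      fmhz = 500.0
c     period in ns = 1000 / clock rate in MHz,
c     so a 500 MHz Pentium III has a 2 ns cycle
      tns = 1000.0 / fmhz
      print *, 'clock period (ns):', tns
      end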

3
Definitions: FLOPS
  • FLoating-point Operations Per Second
  • Mflops = one million FLOPS
  • A good measure of code performance - typically
    one addition counts as one flop, and one
    multiplication also counts as one flop
  • Cray J90 peak: 200 Mflops; most codes achieve
    only about 1/3 of peak
  • Cray T90 peak: 3.2 Gflops
  • Earth Simulator (NEC SX-5): 8 Gflops
  • Fastest workstation processor (DEC Alpha):
    1 Gflops
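
Sustained flops can be estimated by counting the
floating-point operations a loop performs and dividing by
its CPU time. A minimal Fortran sketch, assuming two flops
(one multiply plus one add) per iteration; the array size
is an arbitrary choice:

      program mflops
      integer n, i
      parameter (n = 10000000)
      real a(n), s, t0, t1
      do i = 1, n
        a(i) = 1.0
      end do
      s = 0.0
      call cpu_time(t0)
c     each iteration does one multiply and one add
      do i = 1, n
        s = s + 2.0*a(i)
      end do
      call cpu_time(t1)
c     Mflops = total flops / elapsed seconds / 1.0e6
      print *, 'approx Mflops:', 2.0*n/(t1 - t0)/1.0e6, s
      end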

4
MIPS
  • Millions of Instructions Per Second - also a
    measure of computer speed, used mostly in the
    old days

5
Bandwidth
  • The speed at which data flows across a network
    or wire
  • 56K modem: 56 kbits/sec
  • T1 link: 1.544 Mbits/sec
  • T3 link: 45 Mbits/sec
  • FDDI: 100 Mbits/sec
  • Fibre Channel: 800 Mbits/sec
  • 100BaseT (Fast) Ethernet: 100 Mbits/sec
  • Human brain system: 3 Gbits/sec
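
Transfer time is just data size divided by bandwidth. A
minimal Fortran sketch comparing two of the links above,
assuming a 1 GB (8,000 Mbit) file:

      program xfer
      real mbits
c     assumed file size: 1 GB = 8000 Mbits
      mbits = 8000.0
      print *, 'T1 (1.544 Mbits/s): ', mbits/1.544, ' sec'
      print *, 'Fast Ethernet (100):', mbits/100.0, ' sec'
      end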

6
Hardware Evolution
  • Mainframe computers
  • Supercomputers
  • Workstations
  • Microcomputers / Personal Computers
  • Desktop Supercomputers
  • Workstation Super Clusters
  • Handhelds, palmtops, calculators, etc.

7
Types of Processors
  • Scalar (serial): one operation per clock cycle
  • Vector: multiple (tens to hundreds of) operations
    per clock cycle, typically achieved at the loop
    level, where the instructions are the same or
    similar for each loop index
  • Superscalar: several instructions per clock cycle

8
Types of Computer Systems
  • Single-processor scalar (e.g., ENIAC, IBM 704,
    IBM PC)
  • Single-processor vector (CDC 7600, Cray-1)
  • Multi-processor vector (e.g., Cray X-MP, Cray C90,
    Cray J90, NEC SX-5)
  • Single-processor superscalar (IBM RS/6000, such
    as Bluesky)
  • Multi-processor scalar (e.g., multi-processor
    Pentium PC)
  • Multi-processor superscalar (e.g., DEC Alpha-based
    Cray T3E, RS/6000-based IBM SP-2, SGI Origin 2000)
  • Clusters of the above (e.g., Linux clusters; the
    Earth Simulator is a cluster of multiple
    vector-processor nodes)

9
Memory Architectures
Shared Memory Systems
  • Memory can be accessed and addressed uniformly
    by all processors
  • Fast/expensive CPUs, memory, and networks
  • Easy to use
  • Difficult to scale to many (> 32) processors
Distributed Memory Systems
  • Each processor has its own memory; others can
    access its memory only via network communications
  • Often off-the-shelf components, therefore low
    cost
  • Hard to use; explicit user specification of
    communications is often needed
  • Single CPU slow; not suitable for inherently
    serial codes
  • Highly scalable - the largest current system
    has nearly 10K processors
10
Memory Architectures
  • Multi-level memory (cache and main memory)
    architectures
  • Cache: fast and expensive memory
  • Typical L1 cache size in current-day
    microprocessors: 32 KB
  • L2 cache size: 256 KB to 8 MB
  • Main memory: a few MB to many GB
  • Try to reuse the contents of cache as much as
    possible before they are replaced by new data or
    instructions
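
A common way to do this is loop blocking (tiling): operate
on sub-blocks small enough to stay in cache so each loaded
cache line is fully reused. A minimal Fortran sketch of a
blocked matrix transpose; the matrix size n and block size
nb are illustrative choices:

      program tile
      integer n, nb, i, j, ii, jj
      parameter (n = 1024, nb = 64)
      real a(n,n), b(n,n)
      do j = 1, n
        do i = 1, n
          a(i,j) = real(i + j)
        end do
      end do
c     transpose in nb x nb tiles; each tile of a and b
c     fits in cache, so its lines are reused before
c     being evicted
      do jj = 1, n, nb
        do ii = 1, n, nb
          do j = jj, jj + nb - 1
            do i = ii, ii + nb - 1
              b(j,i) = a(i,j)
            end do
          end do
        end do
      end do
      print *, b(1,2), b(2,1)
      end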

11
Issues with Parallel Computing
  • Load balance / synchronization
  • Try to give an equal amount of work to each
    processor
  • Try to give processors that finish first more
    work to do (load rebalancing)
  • The goal is to keep all processors as busy as
    possible
  • Communication / locality
  • Inter-processor communication is typically the
    biggest overhead on MPP platforms, because the
    network is slow relative to CPU speed
  • Try to keep data access local
  • E.g., a 2nd-order finite difference requires data
    at 3 points, while a 4th-order finite difference
    requires data at 5 points (see the sketch below)
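
For reference, the standard centered-difference stencils
behind those point counts, in a minimal Fortran sketch (the
test field f(x) = x**2 and spacing dx are illustrative):

      program stencil
      integer i
      real f(5), dx, dfdx2, dfdx4
      dx = 0.1
      do i = 1, 5
        f(i) = (real(i)*dx)**2
      end do
      i = 3
c     2nd-order centered difference: 3-point stencil
      dfdx2 = (f(i+1) - f(i-1)) / (2.0*dx)
c     4th-order centered difference: 5-point stencil
      dfdx4 = (-f(i+2) + 8.0*f(i+1) - 8.0*f(i-1)
     &         + f(i-2)) / (12.0*dx)
c     both approximate df/dx = 2*x = 0.6 at x = 0.3
      print *, dfdx2, dfdx4
      end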
12
A Few Simple Rules for Writing Efficient Code
  • Use multiplies instead of divides whenever
    possible
  • Make innermost loop the longest
  • Slower loop (short innermost loop):
        do 100 i = 1, 1000
          do 10 j = 1, 10
            a(i,j) = 0.0
 10       continue
100     continue
  • Faster loop (long innermost loop, which also gives
    stride-1, column-major access in Fortran):
        do 100 j = 1, 10
          do 10 i = 1, 1000
            a(i,j) = 0.0
 10       continue
100     continue
  • For a short loop like DO i = 1, 3, write out the
    associated expressions explicitly, since the loop
    startup cost may be very high (see the sketch
    after this list)
  • Avoid complicated logic (IFs) inside DO loops
  • Avoid subroutine and function calls inside DO
    loops
  • Vectorizable code typically also runs faster on
    RISC-based superscalar processors
  • Keep it simple.
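
A minimal sketch of writing a short loop out explicitly
(the 3-term dot product is an assumed example):

      program unroll
      integer i
      real a(3), b(3), s
      data a /1.0, 2.0, 3.0/, b /4.0, 5.0, 6.0/
c     the 3-iteration loop ...
      s = 0.0
      do 10 i = 1, 3
        s = s + a(i)*b(i)
 10   continue
c     ... written out explicitly, avoiding the loop
c     startup overhead
      s = a(1)*b(1) + a(2)*b(2) + a(3)*b(3)
      print *, s
      end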

13
Transition in Computing Architectures
This chart depicts major NCAR SCD computers from
the 1960s onward, along with the sustained
gigaflops (billions of floating-point
calculations per second) attained by the SCD
machines from 1986 to the end of fiscal year
1999. Arrows at right denote the machines that
will be operating at the start of FY00. The
division is aiming to bring its collective
computing power to 100 Gflops by the end of FY00,
200 Gflops in FY01, and 1 teraflop by FY03.
(Source: http://www.ucar.edu/staffnotes/9909/IBMSP.html)