Introduction to MMX, XMM, SSE and SSE2 Technology

Slides: 31
Provided by: JamesB225

1
Introduction to MMX, XMM, SSE and SSE2 Technology
  • Multimedia Extension, Streaming SIMD Extension
  • 11/23/98, 5/6/99, 2/5/03, 5/10/04, 5/4/05

2
SISD - Single Instruction, Single Data
  • Traditional computers
  • In general, one instruction processes one data
    item

(Diagram: Control Unit, Memory, Execution Unit)
3
SIMD - Single Instruction, Multiple Data
  • One instruction can process multiple data items
  • Useful when large amounts of regularly organized
    data are processed
  • Example: matrix and vector calculations
  • This is the basis of MMX and XMM

(Diagram: Control Unit, Memory, Execution Units)
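
The difference between the SISD and SIMD slides can be modeled in a few lines of Python. This is only an illustrative sketch: the function names and the lane count of 8 are assumptions, and Python still processes one lane at a time internally, so the code models the number of instructions issued rather than real parallel hardware.

```python
def sisd_add(a, b):
    """One 'instruction' per element: returns (result, instruction_count)."""
    result = []
    count = 0
    for x, y in zip(a, b):
        result.append(x + y)
        count += 1
    return result, count


def simd_add(a, b, lanes=8):
    """One 'instruction' per group of `lanes` elements (lane count is assumed)."""
    result = []
    count = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        # All elements of the chunk are handled by the same modeled instruction.
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        count += 1
    return result, count
```

Both functions return the same sums; only the modeled instruction count differs, which is the point of the SIMD design.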
4
MISD
  • MISD: Multiple instructions process one data
    item.

5
MIMD
  • MIMD: Multiple instructions process multiple
    data items.

6
Your Turn
  • How would you classify a traditional computer
    under this system?
  • How would you classify a Shemp which has multiple
    processors?
  • How would you classify a computer having an Intel
    Dual Core processor?

7
Potential Applications: MMX and SSE
  • graphics
  • MPEG video/image processing
  • music synthesis
  • speech compression/recognition
  • video conferencing
  • matrix and vector calculations
  • Advanced 3D graphics (SSE2)
  • Speech recognition (SSE2)
  • Scientific and engineering applications (SSE2)

8
MMX
  • 4 new data types
  • New instructions
  • Uses the low 64 bits of the 8 existing 80 bit floating point registers

9
The floating point registers
  • Floating point is processed by eight 80 bit
    registers, ST(0), ST(1), ..., ST(7), in the floating
    point unit.
  • When doing floating point arithmetic, these
    registers are organized in a stack.
  • Programming floating point is quite different
    from programming integer arithmetic.
  • Floating point calculations are done using 80
    bits even when the program specifies storing 32
    or 64 bit data values.

10
Advantages of using the floating point registers
in MMX.
  • The registers already exist. Only logic had to
    be added to the chip.
  • The operating system already knows about the
    floating point registers.
  • When a computer switches from one program to
    another, the state (registers) of the current
    program must be saved so that the state can be
    restored when the program becomes the active
    program once again.
  • The floating point registers are automatically
    saved as part of the state of a program.
  • MMX worked under existing operating systems!

11
New data types for MMX
  • 64 bits long. One data item can store 8 bytes, 4 words, 2 doublewords, or 1 quadword

12
SSE and SSE2
  • SSE: Streaming SIMD Extensions
  • SSE introduced eight 128 bit XMM registers
  • These registers are disjoint from the floating
    point/MMX registers
  • SSE (Pentium III) can handle 4 single precision
    floating point numbers per register
  • SSE2 (Pentium 4) can also handle 2 double precision
    floating point numbers per register

13
New data types for XMM
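  • 128 bits. Can be used as 16 bytes, 8 words, 4 doublewords, 2 quadwords, 4 single precision floats, or 2 double precision floats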

14
Your turn
  • Your program uses 3 arrays, each holding 160,000
    one-byte integers. We need to add the elements in
    the first two arrays to calculate the third array.
  • Using a standard Pentium, how many operations
    are needed? (One operation includes loading 2
    values into the CPU, adding, storing the result and
    the associated loop processing)
  • How many XMM operations would be needed?
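
One way to check an answer to this exercise is to count how many elements fit in a register. The sketch below assumes that every element is exactly one byte and that each packed operation fills the whole register; the function name is made up for illustration.

```python
def packed_operations(elements, element_bits, register_bits):
    """Number of load/add/store operations when each one fills a whole register."""
    lanes = register_bits // element_bits   # elements handled per operation
    return -(-elements // lanes)            # ceiling division


# One byte per operation, as on a standard Pentium without packed instructions.
scalar_ops = packed_operations(160_000, 8, 8)
# 64 bit MMX registers hold 8 bytes; 128 bit XMM registers hold 16 bytes.
mmx_ops = packed_operations(160_000, 8, 64)
xmm_ops = packed_operations(160_000, 8, 128)
```

Under these assumptions the count drops in direct proportion to the number of byte lanes in the register.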

15
New instructions
  • Process the new data types: 16, 8, 4, or 2 data
    items (64 bits or 128 bits) at a time.
  • Types of instructions:
  • Add / Subtract
  • Multiply / Multiply and add
  • Shift
  • Logical (AND, AND NOT, OR, XOR)
  • Pack and unpack
  • Move
  • Shuffle and unpack (SSE)

16
Saturation
  • Handling overflow when adding 16, 8, 4, or 2
    values at a time is a problem. Programmers can
    specify that when overflow occurs, the sum
    should be replaced by the maximum legal value.
  • Example: Unsigned byte addition
    80h + A0h = 120h => overflow
    Instead the machine stores FFh.
  • Likewise when subtracting.
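
The slide's example can be reproduced with a small Python model of one unsigned byte lane. This is a sketch of the arithmetic only, not of any particular instruction encoding, and the function names are illustrative.

```python
def add_wraparound_u8(a, b):
    """Ordinary unsigned byte addition: the carry out of bit 7 is lost."""
    return (a + b) & 0xFF


def add_saturate_u8(a, b):
    """Saturating addition: results above FFh are replaced by FFh."""
    return min(a + b, 0xFF)


def sub_saturate_u8(a, b):
    """Saturating subtraction: results below 00h are replaced by 00h."""
    return max(a - b, 0x00)


def packed_add_saturate_u8(xs, ys):
    """Apply the saturating add independently to every byte lane."""
    return [add_saturate_u8(x, y) for x, y in zip(xs, ys)]
```

With wraparound, 80h + A0h would silently become 20h; with saturation the lane holds FFh, which for pixel data is usually the less harmful result.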

17
Comparison operations
  • Consider the <, >, <=, >=, =, and <>
    operations.
  • Consider comparing two 64 bit quantities each
    holding 8, 4, or 2 values.
  • Comparing multiple values at a time is a problem:
    a single result flag cannot describe every value.
    So the MMX instructions store 0 for false and
    -1 (all bits set) for true in each of the
    individual data items.
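
A short Python model shows why storing 0 and -1 per item is convenient: the result can be used directly as a bit mask. The sketch assumes four 16 bit lanes holding non-negative values, and the function names are illustrative.

```python
def packed_compare_gt_w(xs, ys):
    """Per lane: FFFFh (-1 as a signed word) where x > y, otherwise 0000h."""
    return [0xFFFF if x > y else 0x0000 for x, y in zip(xs, ys)]


def select_by_mask(mask, xs, ys):
    """Pick x where the mask is all ones and y where it is zero, using only AND/OR."""
    return [(m & x) | (~m & 0xFFFF & y) for m, x, y in zip(mask, xs, ys)]


xs = [5, 1, 7, 3]
ys = [2, 4, 7, 9]
mask = packed_compare_gt_w(xs, ys)
larger = select_by_mask(mask, xs, ys)   # lane-wise maximum without any branch
```

Because every lane gets its own mask, no conditional jump is needed to act on the comparison.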

18
Example 1: Calculating Dot Products
  • Consider calculating $S = \sum_{i=0}^{7} A_i B_i$ using MMX
  • Assume Ai and Bi are stored as signed 16 bit
    integers.
  • Assume that the products and sums should be
    calculated using 32 bits.
  • Assume that all values have two binary places.

19
Example 1: Calculating Dot Products
  • Storing A and B (64 bit vectors); both A and B use this layout:
    byte offset: 0  2  4  6  8  10  12  14
    subscript:   0  1  2  3  4  5   6   7
  • We store each Ai and Bi item as 16 bit integers,
    4 per 64 bit data item. Assume each value has 2
    binary places

20
Example 1: Calculating Dot Products
  • Multiply and add instruction


21
Example 1: MMX Calculating Dot Products
  • Packed Multiply and add instruction

  • Packed Add
  • (Normal) Add

(Diagram values: operands 2, 20, 4, 30 and 3, 40, 5, 50; packed multiply and add results 806 and 1520; final sum 2326)
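
The numbers on this slide are consistent with the following reading, which is an assumption because the diagram itself is lost: one operand holds the words (2, 20, 4, 30), the other holds (3, 40, 5, 50), and the packed multiply and add instruction sums adjacent products. A minimal Python model (the function name is illustrative, and overflow of the 32 bit results is ignored):

```python
def packed_multiply_add(a_words, b_words):
    """Multiply 4 word pairs, then add adjacent products into 2 doublewords."""
    products = [a * b for a, b in zip(a_words, b_words)]
    return [products[0] + products[1], products[2] + products[3]]


pair_sums = packed_multiply_add([2, 20, 4, 30], [3, 40, 5, 50])
total = pair_sums[0] + pair_sums[1]   # the final (normal) add
```

Here `pair_sums` is `[806, 1520]` and `total` is `2326`, matching the values that survive from the slide.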
22
Example 1: Calculating Dot Products
  • Approximate algorithm:
  • Load left half of A into an FP register.
  • Multiply and add by left half of B.
  • Shift products right 2 bits. (Products should
    have only two binary places.)
  • Repeat with right halves of A and B using a
    different register.
  • Add the second sum to the first.
  • Store the result.

(Diagram annotations: 4 words at a time; two doublewords at a time)
23
Example 1: Calculating Dot Products
  • Approximate algorithm (Conclusion)
  • Add the two sums together in EAX to get the
    final sum.

(Diagram annotation: 1 doubleword at a time)
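
The approximate algorithm can be traced in Python. This sketch assumes "two binary places" means fixed point with 2 fraction bits (stored integer = real value × 4), so each product carries 4 fraction bits until it is shifted right by 2. The helper names are illustrative, and 32 bit overflow is not modeled.

```python
FRACTION_BITS = 2  # "two binary places": stored integer = real value * 4


def to_fixed(x):
    """Convert a real value to the assumed fixed point representation."""
    return int(round(x * (1 << FRACTION_BITS)))


def from_fixed(n):
    """Convert a fixed point integer back to a real value."""
    return n / (1 << FRACTION_BITS)


def multiply_add(a4, b4):
    """Model of the packed multiply and add: 4 words in, 2 doublewords out."""
    p = [x * y for x, y in zip(a4, b4)]
    return [p[0] + p[1], p[2] + p[3]]


def dot_product_fixed(a, b):
    """Dot product of two 8-element fixed point vectors, following the slides."""
    # Left halves: multiply and add, then shift back to 2 fraction bits.
    left = [v >> FRACTION_BITS for v in multiply_add(a[:4], b[:4])]
    # Right halves, using a different (modeled) register.
    right = [v >> FRACTION_BITS for v in multiply_add(a[4:], b[4:])]
    # Packed add of the two registers, two doublewords at a time.
    sums = [l + r for l, r in zip(left, right)]
    # Final scalar add of the two doublewords (done in EAX on the slides).
    return sums[0] + sums[1]


a = [to_fixed(x) for x in [1.0, 0.5, 2.0, 1.5, 0.25, 3.0, 1.0, 2.0]]
b = [to_fixed(x) for x in [2.0, 4.0, 1.0, 2.0, 4.0, 1.0, 0.5, 0.5]]
result = dot_product_fixed(a, b)
```

For these inputs `from_fixed(result)` is 14.5, the exact dot product. In general the right shift discards low bits, so results can be slightly smaller than the exact value.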
24
Example 1: Calculating Dot Products
  • Intel claims that standard Pentiums would require
    40 instructions to carry this out. Using MMX
    technology, only 13 instructions are needed.
    Speed improves by an even greater ratio.

25
Example 2: 24-bit color video blending
  • Suppose we are displaying 640 by 480 pixel
    video that uses 24 bit colors: 8 bits for red, 8
    for green, and 8 for blue.
  • Suppose we are currently showing one picture
    which we want to fade out and replace by fading
    in a second picture.
  • Suppose that we want to do the fade out/in in 255
    steps.

26
Example 2: 24-bit color video blending
  • For each step, for each of 3 colors and for each
    of the 640 by 480 pixels we must calculate
    $\mathrm{Result\_pixel} = \mathrm{NewPicture\_pixel} \cdot (i/255) + \mathrm{OldPicture\_pixel} \cdot (1 - (i/255))$
    where $i$ is the step counter.
  • This formula must be calculated
    $640 \times 480 \times 3 \times 255 = 235{,}008{,}000$
    times on 8 bit data!
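
The blending formula and the evaluation count can be checked with a short Python sketch. Using integer arithmetic with floor division is an assumption made here to keep results in the 8 bit range; the function names are illustrative.

```python
STEPS = 255


def blend_channel(new, old, i):
    """Integer form of new*(i/255) + old*(1 - i/255) for one 8 bit color value."""
    return (new * i + old * (STEPS - i)) // STEPS


def total_evaluations(width=640, height=480, channels=3, steps=STEPS):
    """How many times the formula is evaluated over the whole fade."""
    return width * height * channels * steps
```

For example, `blend_channel(200, 100, 51)` weights the new picture by 51/255 = 0.2 and gives 120, and `total_evaluations()` reproduces the 235,008,000 figure.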

27
Example 2: 24-bit color video blending
  • Intel calculates that this requires execution of
    1.4 billion instructions on a standard PC even
    ignoring the calculation of i/255 and (1-i/255)
    and loop control.
  • With MMX, we can calculate 4 values in parallel.
    The number of MMX instructions would be 525
    million. (Because the multiply instruction only
    applies to word data, the byte data must be
    unpacked into words and repacked after the
    calculation.)
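
The unpack, calculate, repack sequence can be modeled in Python for four color values at a time. This is a sketch under stated assumptions: the function names are made up, and the division by 255 stands in for whatever scaling real MMX code would use, since no packed divide is implied by the slides.

```python
def unpack_bytes_to_words(values):
    """Zero-extend 8 bit values so that each one sits in its own 16 bit lane."""
    return [v & 0xFF for v in values]


def pack_words_to_bytes(words):
    """Narrow 16 bit lanes back to bytes, clamping to 0..255 (unsigned saturation)."""
    return [min(max(w, 0), 0xFF) for w in words]


def blend_four(new_bytes, old_bytes, i):
    """Blend four 8 bit color values in one modeled pass (step counter i in 0..255)."""
    new_w = unpack_bytes_to_words(new_bytes)
    old_w = unpack_bytes_to_words(old_bytes)
    # Word arithmetic: the weighted sum is at most 255 * 255 = 65025,
    # which still fits in an unsigned 16 bit lane.
    sums = [n * i + o * (255 - i) for n, o in zip(new_w, old_w)]
    assert all(0 <= s <= 0xFFFF for s in sums)
    # Model only: integer division replaces the scaling step of real code.
    scaled = [s // 255 for s in sums]
    return pack_words_to_bytes(scaled)
```

The extra unpack and pack steps are why the MMX instruction count does not drop by the full factor of 4 relative to the scalar version.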

28
Also included in MMX
  • Intel increased cache size when MMX was
    introduced (necessary for SIMD machines)
  • Programs run faster on MMX machines even if the
    SIMD instructions are not used
  • Excellent marketing
  • Programs run faster on MMX machines
  • People want/buy MMX
  • Software publishers are encouraged to rewrite
    programs to take advantage of the new instructions

29
Information source
  • http://www.intel.com/drg/mmx/manuals/overview/index.htm#intro (no longer available)
  • http://developer.intel.com/drg/mmx/manuals/ (no longer available)
  • http://www.intel.com/design/Pentium4/manuals/24547012.pdf (IA-32 Intel Architecture Software Developer's Manual, vol. 1)
  • This slide show is MMX.PPT

30
Your Turn
  • 1. Characterize the kinds of problems where SIMD
    is helpful.
  • 2. Give examples of problems where SIMD is
    useful.