1
Programming the Velocity Engine
Academic Developers Conference 2001
  • Bing-Chang Lai
  • Phillip John McKerrow
  • University of Wollongong

2
Introduction
  • What is a Vector Processor?
  • The Velocity Engine
  • Programming the Velocity Engine
  • Discuss Examples 1 to 3 only
  • Q&A

3
What is a Vector Processor?
  • Supports Single Instruction Multiple Data (SIMD)
    instructions
  • Originally used in Supercomputers for crunching
    scientific programs
  • Now popular on the desktop as well, for crunching
    multimedia related applications

4
What is a Vector Processor?
  • On desktop, it is usually part of a larger
    processor
  • Examples of Vector Processor Technologies
  • MMX, SSE, 3DNow, AltiVec

5
The Velocity Engine
  • Apple's name for AltiVec technology
  • What is AltiVec technology then?
  • Refers to the technique Motorola used to add vector
    processing capabilities to the G4 (74xx) family
    of processors

6
The Velocity Engine
  • G4 Processor
  • Load/Store Unit
  • Integer Unit
  • Floating Point Unit
  • Vector Unit (AltiVec)

7
Programming the Velocity Engine
  • Specifications
  • AltiVec Technology Programming Interface Manual
  • Available from
  • http://e-www.motorola.com/brdata/PDFDB/MICROPROCESSORS/32_BIT/POWERPC/ALTIVEC/ALTIVECPIM.pdf
  • http://www.altivec.org/tech_specifications/altivec_pim.pdf

8
Programming the Velocity Engine
  • Compilers
  • Apple AltiVec-related patches to GCC 2.95.2
  • Metrowerks CodeWarrior
  • Vector types
  • All vectors are 128 bits long
  • Start with the keyword vector or __vector
  • Followed by the element type, e.g. unsigned char,
    unsigned int, signed int and so on
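
Because every vector is 128 bits wide, the element type alone fixes how many elements a vector holds. A minimal portable C++ sketch of that arithmetic follows; the helper name `lanes` is hypothetical and not part of the AltiVec interface, and it assumes 8-bit bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an AltiVec API): number of elements of type T
// that fit in one 128-bit (16-byte) vector.
template <class T>
constexpr std::size_t lanes()
{
    return 16 / sizeof(T);
}
```

On a platform where short is 2 bytes and float is 4 bytes, `lanes<unsigned char>()` is 16, `lanes<unsigned short>()` is 8 and `lanes<float>()` is 4.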

9
Programming the Velocity Engine
  • Vector types

10
Programming the Velocity Engine
  • Vector types

long has been deprecated
11
Programming the Velocity Engine
  • Vector types

12
Programming the Velocity Engine
  • Vector operations
  • Arithmetic Operations
  • vec_abs (absolute value), vec_add (addition),
    vec_sub (subtraction) ...
  • Boolean Operations
  • vec_and (Logical AND), vec_or (Logical OR) ...
  • vec_cmpeq (Equality), vec_cmple (Less Than or
    Equal To)
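
To make the lane-wise behaviour of these operations concrete, here is a scalar model in portable C++. It is only a sketch: `U8x16`, `model_add` and `model_cmpeq` are hypothetical names standing in for `__vector unsigned char`, `vec_add` and `vec_cmpeq`, and the loops do one lane at a time where the hardware does all 16 at once.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for "__vector unsigned char": 16 lanes of 8 bits each.
using U8x16 = std::array<std::uint8_t, 16>;

// Scalar model of vec_add on unsigned char: lane-wise, modulo 256.
U8x16 model_add(const U8x16 &a, const U8x16 &b)
{
    U8x16 r{};
    for (std::size_t i = 0; i < 16; ++i)
        r[i] = static_cast<std::uint8_t>(a[i] + b[i]);
    return r;
}

// Scalar model of vec_cmpeq: a lane is 0xff where the inputs are equal
// and 0x00 where they differ, so the result can be used as a mask.
U8x16 model_cmpeq(const U8x16 &a, const U8x16 &b)
{
    U8x16 r{};
    for (std::size_t i = 0; i < 16; ++i)
        r[i] = (a[i] == b[i]) ? 0xff : 0x00;
    return r;
}
```

The comparison result is a mask rather than a single boolean, which is why it combines naturally with vec_and, vec_or and vec_sel.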

13
Programming the Velocity Engine
  • Vector operations
  • Miscellaneous Operations
  • vec_perm (Permutation), vec_mergeh / vec_mergel
    (merge the high or low halves of two vectors
    into one) ...
  • Memory Operations
  • vec_st (Store), vec_ld (Load) ...
  • Data Stream Operations
  • vec_dst (Vector Data Stream Touch), vec_dss
    (Vector Data Stream Stop) ...
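
Since vec_perm is used heavily in the later examples, a scalar model of its byte selection may help. This is a sketch in portable C++, not AltiVec code: `U8x16` and `model_perm` are hypothetical names. Each byte of the control vector picks one byte from the 32-byte concatenation of the two inputs, and only its low 5 bits are used.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for "__vector unsigned char".
using U8x16 = std::array<std::uint8_t, 16>;

// Scalar model of vec_perm(a, b, c): result byte i is byte (c[i] & 0x1f)
// of the 32-byte concatenation a:b. Indices 0..15 select from a,
// indices 16..31 select from b.
U8x16 model_perm(const U8x16 &a, const U8x16 &b, const U8x16 &c)
{
    U8x16 r{};
    for (std::size_t i = 0; i < 16; ++i) {
        unsigned idx = c[i] & 0x1f;
        r[i] = (idx < 16) ? a[idx] : b[idx - 16];
    }
    return r;
}
```

With a control vector of 15, 14, ..., 0 this reverses the bytes of a; with a control vector of 16, 17, ..., 31 it returns b unchanged.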

14
Programming the Velocity Engine
  • Constraints
  • Vector operations all work on exactly 128 bits at
    a time: no more and no less.
  • vec_ld (load) and vec_st (store) both operate only
    on 16-byte (128-bit) boundaries; the 4 least
    significant bits of the address are ignored.
  • This leads to data alignment issues
  • Loading data from memory into the processor is
    one of the main bottlenecks.
  • Use the data stream (cache prefetch) operations,
    such as vec_dst, to mark data for loading before
    the operation takes place
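
The alignment constraint can be illustrated with plain address arithmetic. The sketch below is portable C++, not AltiVec code; `effective_address` is a hypothetical helper modelling the fact that vec_ld and vec_st add the offset to the pointer and then drop the 4 least significant bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the address used by vec_ld(offset, base) and
// vec_st(v, offset, base): the sum is rounded down to a 16-byte boundary.
std::uintptr_t effective_address(std::uintptr_t base, std::uintptr_t offset)
{
    return (base + offset) & ~static_cast<std::uintptr_t>(0xf);
}
```

For example, a load from 0x1007 actually reads the 16 bytes starting at 0x1000, so the wanted data straddles two aligned blocks. Examples 2 and 3 deal with that situation in two different ways.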

15
Programming the Velocity Engine
  • The following examples from the paper will be
    discussed
  • Example 1: Element-by-Element Access
  • Example 2: Alignment
  • Example 3: Unaligned Loads and Stores
  • The Image Addition program in the Appendix will
    not be discussed

16
Programming the Velocity Engine
  • Example 1: Element-by-Element Access

#include <iostream>

typedef union {
    __vector unsigned char AsVector;
    unsigned char AsUChar[16];
} vec_uchar;

int main()
{
    vec_uchar v1;
    v1.AsVector = (__vector unsigned char) (
        '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'A', 'B', 'C', 'D', 'E', 'F');
    for (int i = 0; i < 16; i++)
        std::cout << v1.AsUChar[i];
    std::cout << std::endl;
    return 0;
}
17
Programming the Velocity Engine
  • Example 1: Element-by-Element Access
  • Outputs
  • 0123456789ABCDEF
  • Instead of using the union, you can also access
    elements by address and casting

__vector unsigned char v1;
for (int i = 0; i < 16; i++)
    std::cout << ((unsigned char *)(&v1))[i];
18
Programming the Velocity Engine
  • Example 2: Alignment
  • 16-byte aligned locations have addresses with the
    4 least significant bits set to 0, e.g. 0xf0, 0x10
    and so on
  • The AltiVec specification specifies vec_malloc and
    vec_free for creating 16-byte aligned blocks for
    vectors.
  • The code finds the aligned address by setting the
    4 l.s.b. to 0 and then adding 16.
  • Please note that Apple GCC aligns everything to
    16-byte boundaries
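
The rounding step described above can be sketched as follows. The slides call a function named align16 without showing it, so the signature and body here are assumptions based only on the description "set the 4 l.s.b. to 0 and then add 16".

```cpp
#include <cassert>
#include <cstdint>

// Assumed shape of align16 (not shown on the slides): clear the 4 least
// significant bits, then add 16. The result is always 1 to 16 bytes past
// the input, so there is at least one spare byte just before it.
inline void *align16(void *p)
{
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    a = (a & ~static_cast<std::uintptr_t>(0xf)) + 16;
    return reinterpret_cast<void *>(a);
}
```

Always adding 16, even when the input is already aligned, is what guarantees room for the one-byte offset that the allocator stores at location (p_al - 1).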

19
Programming the Velocity Engine
  • Example 2: Alignment - Allocate

template <class Element>
Element *allocate(unsigned int n)
{
    // Allocate n * sizeof(Element) + 16 bytes
    Element *p_unal = (Element *)operator new(n * sizeof(Element) + 16);
    // Align the pointer (clear the 4 l.s.b., then add 16)
    Element *p_al = (Element *)align16(p_unal);
    // Store difference between aligned and unaligned in
    // byte at location (p_al - 1)
    unsigned char *p_offset = (unsigned char *)p_al - 1;
    *p_offset = (unsigned char *)p_al - (unsigned char *)p_unal;
    return p_al;
}
20
Programming the Velocity Engine
  • Example 2: Alignment - Deallocate

template <class Element>
void deallocate(Element *p_al)
{
    // Fetch difference between aligned and unaligned from
    // byte at location (p_al - 1)
    // and calculate p_unal
    unsigned char *p_offset = (unsigned char *)p_al - 1;
    Element *p_unal = (Element *)((unsigned char *)p_al - *p_offset);

    operator delete(p_unal);
}
21
Programming the Velocity Engine
  • Example 2: Alignment - Using

// Allocate aligned COUNT unsigned char
unsigned char *p_aligned = allocate<unsigned char>(COUNT);

// Now that it is aligned, we can load into a vector
// (vec_ld takes the byte offset first, then the pointer)
__vector unsigned char v = vec_ld(0, p_aligned);

// Use v for calculations
// ....

// Free Buffer
deallocate<unsigned char>(p_aligned);
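
For readers without a PowerPC compiler, the allocate/deallocate pair can be tried in portable C++. The version below is a sketch that follows the scheme on the slides (over-allocate by 16 bytes, round up, keep the offset in the byte before the aligned block); the body of align16 and the use of std::size_t are assumptions, and no AltiVec calls are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Assumed helper: advance p to the next 16-byte boundary strictly after
// (p rounded down), i.e. by 1 to 16 bytes.
inline unsigned char *align16(unsigned char *p)
{
    std::uintptr_t low4 = reinterpret_cast<std::uintptr_t>(p) & 0xf;
    return p + (16 - low4);
}

template <class Element>
Element *allocate(std::size_t n)
{
    // Over-allocate so that an aligned block of n elements always fits.
    unsigned char *p_unal =
        static_cast<unsigned char *>(::operator new(n * sizeof(Element) + 16));
    unsigned char *p_al = align16(p_unal);
    // Remember how far we moved (1..16) in the byte just before the block.
    p_al[-1] = static_cast<unsigned char>(p_al - p_unal);
    return reinterpret_cast<Element *>(p_al);
}

template <class Element>
void deallocate(Element *p)
{
    unsigned char *p_al = reinterpret_cast<unsigned char *>(p);
    // Step back by the stored offset to recover the original pointer.
    ::operator delete(p_al - p_al[-1]);
}
```

A pointer returned by `allocate<unsigned char>(COUNT)` has its 4 least significant address bits clear and must be released with `deallocate`, never with delete directly.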
22
Programming the Velocity Engine
  • Example 3: Unaligned Loads and Stores

// Load a vector from an unaligned location in memory
__vector unsigned char LoadUnaligned(__vector unsigned char *p_v)
{
    __vector unsigned char permuteVector = vec_lvsl(0, (int *)(p_v));
    __vector unsigned char low = vec_ld(0, p_v);
    __vector unsigned char high = vec_ld(16, p_v);
    return vec_perm(low, high, permuteVector);
}
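
The unaligned load can be followed step by step with a scalar model. This is a sketch in portable C++ rather than AltiVec code: memory is modelled as a 64-byte array indexed by "address", and `model_ld`, `model_lvsl`, `model_perm` and `load_unaligned` are hypothetical stand-ins for vec_ld, vec_lvsl, vec_perm and LoadUnaligned.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using U8x16 = std::array<std::uint8_t, 16>;
using Memory = std::array<std::uint8_t, 64>;

// Model of vec_ld: the address is rounded down to a 16-byte boundary.
U8x16 model_ld(const Memory &mem, std::size_t addr)
{
    std::size_t base = addr & ~static_cast<std::size_t>(15);
    assert(base + 16 <= mem.size());
    U8x16 r{};
    for (std::size_t i = 0; i < 16; ++i)
        r[i] = mem[base + i];
    return r;
}

// Model of vec_lvsl: bytes sh, sh+1, ..., sh+15 where sh = addr & 15.
U8x16 model_lvsl(std::size_t addr)
{
    U8x16 r{};
    for (std::size_t i = 0; i < 16; ++i)
        r[i] = static_cast<std::uint8_t>((addr & 15) + i);
    return r;
}

// Model of vec_perm: index 0..15 selects from a, 16..31 from b.
U8x16 model_perm(const U8x16 &a, const U8x16 &b, const U8x16 &c)
{
    U8x16 r{};
    for (std::size_t i = 0; i < 16; ++i) {
        unsigned idx = c[i] & 0x1f;
        r[i] = (idx < 16) ? a[idx] : b[idx - 16];
    }
    return r;
}

// Same three steps as LoadUnaligned: two aligned loads, then a permute
// that slides a 16-byte window across the 32 loaded bytes.
U8x16 load_unaligned(const Memory &mem, std::size_t addr)
{
    U8x16 permuteVector = model_lvsl(addr);
    U8x16 low = model_ld(mem, addr);
    U8x16 high = model_ld(mem, addr + 16);
    return model_perm(low, high, permuteVector);
}
```

If mem[i] holds the value i, `load_unaligned(mem, 7)` returns the bytes 7 through 22, which is exactly what an unaligned 16-byte read at address 7 should see. Note that the second load touches the block after the data even when the address is already aligned.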
23
Programming the Velocity Engine
  • Example 3: Unaligned Loads and Stores

void StoreUnaligned(__vector unsigned char v,
                    __vector unsigned char *p_v)
{
    __vector unsigned char low = vec_ld(0, p_v);
    __vector unsigned char high = vec_ld(16, p_v);
    __vector unsigned char permvec = vec_lvsr(0, (int *)p_v);
    // vec_splat_u8 takes a 5-bit signed literal; -1 gives 0xff in every byte
    __vector unsigned char oxFF = vec_splat_u8(-1);
    __vector unsigned char ox00 = vec_splat_u8(0);
    __vector unsigned char mask = vec_perm(ox00, oxFF, permvec);
    v = vec_perm(v, v, permvec);
    low = vec_sel(low, v, mask);
    high = vec_sel(v, high, mask);
    vec_st(low, 0, p_v);
    vec_st(high, 16, p_v);
}
24
Programming the Velocity Engine
  • Example 3: Unaligned Loads and Stores

4 l.s.b. of p_v = 7 (all values in hexadecimal)

v                        0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
low                      0  0  0 4f  0  0  0  8  0  0  0  6  0  0  0  d
high                     0  0  0  2  0  0  0  4 41 10 f7 8c bf ff fa 58
perm                     9  a  b  c  d  e  f 10 11 12 13 14 15 16 17 18
mask                     0  0  0  0  0  0  0 ff ff ff ff ff ff ff ff ff
v = vec_perm(v,v,perm)   9  a  b  c  d  e  f  0  1  2  3  4  5  6  7  8
vec_sel(low,v,mask)      0  0  0 4f  0  0  0  0  1  2  3  4  5  6  7  8
vec_sel(v,high,mask)     9  a  b  c  d  e  f  4 41 10 f7 8c bf ff fa 58
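
The worked values above can be reproduced with a scalar model of the store. As before, this is a portable C++ sketch and not AltiVec code; `model_ld`, `model_st`, `model_lvsr`, `model_perm`, `model_sel`, `splat` and `store_unaligned` are hypothetical stand-ins for vec_ld, vec_st, vec_lvsr, vec_perm, vec_sel, vec_splat_u8 and StoreUnaligned, with memory modelled as a 32-byte array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using U8x16 = std::array<std::uint8_t, 16>;
using Memory = std::array<std::uint8_t, 32>;

// Aligned load and store: the address is rounded down to a multiple of 16.
U8x16 model_ld(const Memory &mem, std::size_t addr)
{
    std::size_t base = addr & ~static_cast<std::size_t>(15);
    assert(base + 16 <= mem.size());
    U8x16 r{};
    for (std::size_t i = 0; i < 16; ++i) r[i] = mem[base + i];
    return r;
}

void model_st(Memory &mem, const U8x16 &v, std::size_t addr)
{
    std::size_t base = addr & ~static_cast<std::size_t>(15);
    assert(base + 16 <= mem.size());
    for (std::size_t i = 0; i < 16; ++i) mem[base + i] = v[i];
}

// Model of vec_lvsr: bytes 16-sh, 17-sh, ..., 31-sh where sh = addr & 15.
U8x16 model_lvsr(std::size_t addr)
{
    U8x16 r{};
    for (std::size_t i = 0; i < 16; ++i)
        r[i] = static_cast<std::uint8_t>(16 - (addr & 15) + i);
    return r;
}

U8x16 model_perm(const U8x16 &a, const U8x16 &b, const U8x16 &c)
{
    U8x16 r{};
    for (std::size_t i = 0; i < 16; ++i) {
        unsigned idx = c[i] & 0x1f;
        r[i] = (idx < 16) ? a[idx] : b[idx - 16];
    }
    return r;
}

// Model of vec_sel: take bits from b where mask is 1, from a where it is 0.
U8x16 model_sel(const U8x16 &a, const U8x16 &b, const U8x16 &mask)
{
    U8x16 r{};
    for (std::size_t i = 0; i < 16; ++i)
        r[i] = static_cast<std::uint8_t>((a[i] & ~mask[i]) | (b[i] & mask[i]));
    return r;
}

U8x16 splat(std::uint8_t x)
{
    U8x16 r{};
    r.fill(x);
    return r;
}

// Same sequence as StoreUnaligned: read both blocks, rotate v into place,
// merge it under a mask, and write both blocks back.
void store_unaligned(Memory &mem, const U8x16 &v, std::size_t addr)
{
    U8x16 low = model_ld(mem, addr);
    U8x16 high = model_ld(mem, addr + 16);
    U8x16 permvec = model_lvsr(addr);
    U8x16 mask = model_perm(splat(0x00), splat(0xff), permvec);
    U8x16 rotated = model_perm(v, v, permvec);
    model_st(mem, model_sel(low, rotated, mask), addr);
    model_st(mem, model_sel(rotated, high, mask), addr + 16);
}
```

Filling the two blocks with the low and high rows and storing v at address 7 leaves the memory equal to the last two rows: the 7 bytes before the target and the 9 bytes after it are preserved.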
25
Resources
  • The code for this paper will be available
  • At http://www.bclai.net (probably by the end of
    the week)
  • Email me at bl12_at_uow.edu.au
  • Other Important Resources
  • AltiVec Information Source
  • At http://www.altivec.org
  • Email group list
  • Apple's AltiVec Homepage
  • At http://developer.apple.com/hardware/ve/
  • Tutorials
  • Vector Libraries
  • AlienOrb AltiVec Page
  • At http://www.alienorb.com/AltiVec/
  • AltiVec Tutorial
  • AltiVec Code Examples on lookup table, streaming
    data fetch instructions ...

26
References
  • Bing-Chang Lai, Phillip John McKerrow. Programming
    the Velocity Engine, AUC, 2001.
  • Motorola, Inc. AltiVec Technology Programming
    Interface Manual, 1999. See
    http://e-www.motorola.com/brdata/PDFDB/MICROPROCESSORS/32_BIT/POWERPC/ALTIVEC/ALTIVECPIM.pdf
  • Ian Ollmann, Ph.D. AltiVec, 2001. See
    http://www.alienorb.com/AltiVec/Altivec.pdf

27
Q&A