Title: Apple HPC Tools
1Apple HPC Tools
- Chris Mueller
- July 6, 2004
2Altivec
- http://developer.apple.com/hardware/ve
- SIMD, 32 x 128-bit registers, 160 opcodes
- G5 has 2 units, vector permute and vector ALU, along with a streaming prefetch unit
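To make the register width concrete, here is a minimal sketch of how one 128-bit register can be viewed as 16, 8, or 4 lanes. The `Vec128` union is a hypothetical illustration in portable C, not an AltiVec type; real code would use the `vector` types described on the next slide.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 128-bit vector register.
   The same 16 bytes can be read as 16, 8, or 4 lanes. */
typedef union {
    uint8_t  u8[16];   /* 16 lanes of 8 bits  */
    uint16_t u16[8];   /*  8 lanes of 16 bits */
    uint32_t u32[4];   /*  4 lanes of 32 bits */
    float    f32[4];   /*  4 single-precision floats */
} Vec128;

/* Number of lanes when the register holds elements of elem_size bytes. */
static unsigned lanes_for(unsigned elem_size) {
    return (unsigned)sizeof(Vec128) / elem_size;
}
```

For example, `lanes_for(1)` gives the lane count for `unsigned char` data, which is the element type used in the example slide.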
3Altivec Programming
- -faltivec
- C programming model
  - Variables: vector <type> varname
  - Functions: vec_<op>(arg, ...)
- Caveats
  - Data must be 16-byte aligned
  - It's WYSIWYG: the compiler won't rearrange code
  - However, the compiler will add loads/stores as needed within vector function calls
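The 16-byte alignment caveat can be checked and enforced in plain C. The sketch below is one possible approach, not something taken from the slides: `is_aligned16` and `align_up16` are hypothetical helper names, and the idea is to over-allocate by 15 bytes and round the pointer up to the next boundary.

```c
#include <stdint.h>

/* Returns 1 if p sits on a 16-byte boundary, 0 otherwise. */
static int is_aligned16(const void *p) {
    return ((uintptr_t)p & 0xF) == 0;
}

/* Rounds p up to the next 16-byte boundary (returns p if already aligned).
   The caller must own at least 15 bytes past p for the result to be usable. */
static unsigned char *align_up16(unsigned char *p) {
    return (unsigned char *)(((uintptr_t)p + 15) & ~(uintptr_t)15);
}
```

A buffer declared as `unsigned char raw[N + 15]` and passed through `align_up16(raw)` yields N usable bytes that are safe for aligned vector loads and stores.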
4Example
    void VecAdd(unsigned char *data, long len, unsigned char *result)
    {
        // Diagonal sum of all the values in data
        long i = 0;
        vector unsigned char score, score1, score2, vperm, newsum;

        newsum = vec_splat_u8(0);  // create a constant

        for (i = 0; i < len - 16; i++) {
            // Load each vector. The test on i assumes data itself
            // is 16-byte aligned.
            if ((i & 0x0000000f) == 0) {
                // aligned case
                score = vec_ld(0, (data + i));
            } else {
                // unaligned case: load the two aligned vectors that
                // straddle data + i and shift them into place
                score1 = vec_ld(0, (data + i));
                score2 = vec_ld(16, (data + i));
                vperm = vec_lvsl(0, (data + i));
                score = vec_perm(score1, score2, vperm);
            }
            newsum = vec_add(score, newsum);
        }

        vec_st(newsum, 0, result);  // aligned store
        return;
    }
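A scalar reference is handy for checking the vector routine on a machine without AltiVec. The sketch below is an assumption-laden mirror of the listing, not part of the original slides: `ScalarLaneSum` is a hypothetical name, it assumes the loop index advances one byte per iteration, and it relies on `vec_add` for `unsigned char` wrapping modulo 256.

```c
#include <string.h>

/* Scalar reference for VecAdd (hypothetical helper).
   Adds every 16-byte window data[i .. i+15], for i = 0 .. len-17,
   lane by lane into result[0 .. 15], wrapping modulo 256. */
static void ScalarLaneSum(const unsigned char *data, long len,
                          unsigned char *result)
{
    long i;
    int j;

    memset(result, 0, 16);  /* same role as vec_splat_u8(0) */

    for (i = 0; i < len - 16; i++) {
        for (j = 0; j < 16; j++) {
            result[j] = (unsigned char)(result[j] + data[i + j]);
        }
    }
}
```

Running both routines on the same aligned buffer and comparing the 16 result bytes with `memcmp` gives a quick correctness check before timing anything.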
5Performance
- Standard sum adds up all values in an array
- Vector sum performs a diagonal sum of all vectors in an array
- Register sum loads one value into the register and adds it to itself repeatedly
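The benchmark sources are not shown, so the following is only a scalar sketch of what the standard sum and register sum kernels might look like. The names `standard_sum` and `register_sum` are hypothetical, and the literal reading of "adds it to itself" (doubling an 8-bit accumulator) is an assumption.

```c
#include <stddef.h>

/* Standard sum: one pass over memory, adding every element. */
static unsigned long standard_sum(const unsigned char *data, size_t len) {
    unsigned long total = 0;
    size_t i;
    for (i = 0; i < len; i++)
        total += data[i];
    return total;
}

/* Register sum: no memory traffic inside the loop. One value is loaded
   once, then added to itself reps times (assumed to wrap modulo 256). */
static unsigned char register_sum(unsigned char value, size_t reps) {
    unsigned char acc = value;
    size_t i;
    for (i = 0; i < reps; i++)
        acc = (unsigned char)(acc + acc);
    return acc;
}
```

The point of the comparison is that the first kernel is limited by how fast data can be loaded, while the second touches memory only once and so approximates the ALU's peak rate.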
6Shark
- /Developer/Applications/Performance Tools/CHUD/Shark
- Apple's main (high) performance measurement tool
- Samples all running applications, reports hotspots, and offers optimization suggestions
- Nifty source/asm viewer
- Compile with
  - -g -DCOPY_PHASE_STRIP=NO
7amber/simg5/scrollpv
- Cycle-accurate trace (amber), simulation (simg5), and visualization (scrollpv)
- Usage
  - Warning: amber turns off one CPU. Prematurely killing amber can leave you with one CPU!!!

    amber -I -x 5000 <program>
    simg5 thread_001.tt6e 5000 100 1 simg5 -p 1 -b 1 -e 5000
    scrollpv -pipe trace_001.pipe -config trace_001.config