Title: Intel Processor Strategy Update
1Hyper-Threading Intel Compilers
Andrey Naraikin, Senior Software Engineer
Software Products Division, Intel Nizhny Novgorod Lab
November 29, 2002
2Agenda
- Hyper-Threading Technology Overview
- Introduction to Intel SW Development Tools
- Motivation
- Challenges
- Intel SW Tools
- Intel Compilers Overview
- Technologies supported
- SPEC and other benchmarks
- Some features supported by Intel Compilers
3Hyper-Threading Overview: Today's Processors
- Single Processor Systems
- Instruction Level Parallelism (ILP)
- Performance improved with more CPU resources
- Multiprocessor Systems
- Thread Level Parallelism (TLP)
- Performance improved by adding more CPUs
4Hyper-Threading Overview: Today's Software
5Hyper-Threading Overview: Multi-Processing
- Run parallel tasks using multiple processors
- Multi-tasking workload + processor resources => improves MT performance
6Hyper-Threading Quick View
7Dual-Core Architecture
Hyper-Threading Technology
[Figure: Multiprocessor vs. Hyper-Threading, each drawn as architecture state (AS) plus processor execution resources]
AS = Architecture State (eax, ebx, control registers, etc.), xAPIC
Hyper-Threading Technology looks like two
processors to software
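Because each logical processor appears to software as a separate CPU, a program can size its worker pool from the processor count that the operating system reports. The sketch below is only an illustration: it assumes a Unix-like system where `sysconf(_SC_NPROCESSORS_ONLN)` is available, and the function names `logical_processor_count` and `threads_to_spawn` are hypothetical, not taken from the slides.

```c
#include <assert.h>
#include <unistd.h>

/* Number of processors the OS currently reports as online.
   On a Hyper-Threading system this counts logical processors.
   Falls back to 1 if the query fails. */
long logical_processor_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? n : 1;
}

/* Pick a worker-thread count: one per logical processor,
   capped at a caller-supplied maximum. */
long threads_to_spawn(long max_threads)
{
    assert(max_threads >= 1);
    long n = logical_processor_count();
    return (n < max_threads) ? n : max_threads;
}
```

The count alone does not say whether two logical processors share one physical package, so the cap is left to the caller.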
8Hyper-Threading Architecture Overview
Pentium, VTune and Xeon are trademarks or
registered trademarks of Intel Corporation or its
subsidiaries in the United States or other
countries.
9Hyper-Threading Architecture Details
10Hyper-Threading Overview: Resource Utilization
[Figure: execution-unit utilization over time (processor cycles)]
Note: Each box represents a processor execution unit
11Performance Benefit
Hyper-Threading Technology
Code Description
A1 Engineering
A2 Genetics
A3 Chemistry
A4 Engineering
A5 Weather
A6 Genetics
A7 CFD
A8 FEA
A9 FEA
Source: "Hyper-Threading Technology: Impact on
Compute-Intensive Workloads," Intel Technology
Journal, Vol. 6, 2002.
12Key Point
Hyper-Threading Technology
- Hyper-Threading Technology gives better utilization of processor resources
- Hyper-Threading Technology gives more computing power for multithreaded applications
13Collateral
- Web Sites
- http://developer.intel.com/technology/hyperthread/
- http://developer.intel.com/design/pentium4/applnots
- http://developer.intel.com/design/pentium4/manuals
- Documentation and application notes
- IA-32 Intel Architecture Software Developer's Manual
- Intel Pentium 4 and Intel Xeon Processor Optimization Manual
- Intel App Note AP-485: Intel Processor Identification and the CPUID Instruction
- Intel App Note AP-949: Using Spin-Loops on Intel Pentium 4 Processor and Intel Xeon Processor
- Intel App Note: Detecting Support for Jackson Technology Enabled Processors
14Collateral (Cont'd)
- Intel Technology Journal
- http://developer.intel.com/technology/itj/
- Intel Threading Tools
- http://www.intel.com/software/products/
- OpenMP
- http://www.openmp.org
- HT Overview
- http://www.ixbt.com/cpu/pentium4-3ghz-ht.shtml
15Performance Advantage: Optimization Path
Intel SW Development Tools
[Figure: optimization-path chart with speedup levels of 1x, 4x, 7x, 9x, 13x and 15x faster. Labels on the steps: Standard Compiler; Intel Compiler; Performance Libraries (IPP or MKL); Analysis with VTune; OpenMP Threading; Minor Code Change (1 Line); Little or No Code Change; Minor Code Change]
16Sunset Simulation Optimized Performance
Intel SW Development Tools
15x faster
17Intel Compilers
Intel SW Development Tools Compilers
- C, C++ and Fortran 95
- Available on Windows and Linux
- Available for 32-bit and 64-bit platforms
- Utilization of latest processor/platform features
- Optimizations for NetBurst architecture (Pentium 4 and Xeon processor)
- Optimizations for Itanium architecture
- Seamless integration into Windows (IDE) and Linux environment
- Source and binary compatible with Microsoft compiler; mostly source compatible with GNU (gcc)
18Benchmarks Intel Compilers 6.0 for Windows
Intel SW Development Tools Compilers
SPECfp_base2000 (geomean of Fortran benchmarks): 28% faster floating-point performance
- CVF 6.6: 686
- Intel Fortran Compiler 6.0: 881
SPECint_base2000: 17% faster integer performance
- Leading C++ Compiler: 703
- Intel C++ Compiler 6.0: 825
Configuration info: Intel Pentium 4 processor,
2.4 GHz, Intel Medford 850 motherboard (D850MD,
850 chipset), 256 MB memory, Windows
XP Professional Edition (build 2600), GeForce
3/nVidia graphics
Performance tests and ratings are measured using
specific computer systems and/or components and
reflect the approximate performance of Intel
products as measured by those tests. Any
difference in system hardware or software design
or configuration may affect actual performance.
Buyers should consult other sources of
information to evaluate the performance of
systems or components they are considering
purchasing. Users' results are dependent upon
the application characteristics (loopy vs. flat),
mix of C and C++, and other factors. For more
information on performance tests and on the
performance of Intel products, reference
www.intel.com or call (U.S.) 1-800-628-8686 or
1-916-356-3104.
19Intel C++ Compiler 6.0 for Linux
Intel SW Development Tools Compilers
[Figure: PovRay image rendering time improvement]
Configuration info: Intel Pentium 4 processor,
2.0 GHz, 256 MB memory, nVidia GeForce 2
graphics card, Linux 2.4.7, PovRay 3.1G
20Special Performance Features
Intel SW Development Tools Compilers
- Auto-Vectorization for NetBurst architecture
- Software-Pipelining for EPIC architecture
- Auto-Parallelization and OpenMP based parallelization
- for Hyper-Threading and multi-processor systems
- Data Pre-Fetching
- Profile-Guided Optimization (PGO)
- Inter-procedural Optimization (IPO)
- CPU Dispatch
- Establishes code path at runtime dependent on actual processor type
- Allows single binary with optimal performance across processor families
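The CPU Dispatch idea can be sketched by hand as a function pointer that is chosen once at run time. This is not how the Intel compiler implements the feature; it is a minimal illustration that assumes the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports` on x86, and every function name here (`add_generic`, `add_sse2_path`, `select_add`, `add_arrays`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*add_fn)(double *a, const double *b, size_t n);

/* Baseline path: plain scalar loop, valid on any processor. */
void add_generic(double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] += b[i];
}

/* Stand-in for a processor-specific path. A real build would compile
   this version with processor-specific options; here it is the same
   arithmetic unrolled by two, so both paths give identical results. */
void add_sse2_path(double *a, const double *b, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        a[i]     += b[i];
        a[i + 1] += b[i + 1];
    }
    for (; i < n; i++)
        a[i] += b[i];
}

/* Choose the code path from the processor the program is running on. */
add_fn select_add(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return add_sse2_path;
#endif
    return add_generic;
}

/* Public entry point: selects the path on first use, then reuses it. */
void add_arrays(double *a, const double *b, size_t n)
{
    static add_fn impl = NULL;
    assert(a != NULL && b != NULL);
    if (impl == NULL)
        impl = select_add();
    impl(a, b, n);
}
```

Callers only see `add_arrays`, so one binary can carry several code paths without the application knowing which one runs.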
21Techniques Overview
Features by Intel Compilers
- Exploit parallelism to speed up the application
- Vectorization
- Supported by programming languages and compilers
- Motivated by modern architectures
- Superscalarity, deeply pipelined core
- SIMD
- Software pipelining on Itanium architecture
- Parallelization
- OpenMP directives for shared memory multiprocessor systems
- MPI computations for clusters
22Intel processors and vectorization
Features by Intel Compilers - Vectorization
Type of processor: vectorization features supported
- Pentium 4 processor: Streaming SIMD Extensions 2 (SSE2), double-precision floating point, integer types, 128 bits
- Pentium III processor: Streaming SIMD Extensions (SSE), single-precision floating point
- Pentium with MMX technology, Pentium II processors: integer types, 64 bits
23Automatic Vectorization
Features by Intel Compilers - Vectorization
- Compiler automatically transforms sequential code
for SIMD execution
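Source loop:
    for (i = 0; i < n; i++) {
        a[i] = a[i] + b[i];
        a[i] = sin(a[i]);
    }
icl -QxMKW
Vectorized form (VL = vector length):
    for (i = 0; i < n; i = i + VL) {
        a(i : i+VL-1) = a(i : i+VL-1) + b(i : i+VL-1);   /* HW SIMD instruction */
        a(i : i+VL-1) = _vmlSin(a(i : i+VL-1));          /* Run-Time Library */
    }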
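The blocked loop on this slide can be written out in plain C to show what "VL elements per iteration" means. The sketch below covers only the addition step and assumes `VL` is 4; the real vector length depends on the element type and the SIMD width. It also adds a remainder loop for trip counts that are not a multiple of `VL`, which the slide does not show. The name `add_strip_mined` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define VL 4  /* assumed vector length in elements */

/* a[i] = a[i] + b[i], processed in blocks of VL elements. */
void add_strip_mined(float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    size_t i = 0;

    /* Main loop: each outer iteration handles one block of VL
       elements, the unit a vectorizing compiler can map onto a
       single SIMD operation. */
    for (; i + VL <= n; i += VL)
        for (size_t j = 0; j < VL; j++)
            a[i + j] = a[i + j] + b[i + j];

    /* Remainder loop: leftover elements when n is not a multiple of VL. */
    for (; i < n; i++)
        a[i] = a[i] + b[i];
}
```

The result is the same as the scalar loop; only the iteration structure changes.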
24Vectorization Example
Features by Intel Compilers - Vectorization
a:      0.0  1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0
b:      0.0  1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0
a + b:  0.0  2.0  4.0  6.0  8.0 10.0 12.0 14.0 16.0 18.0
[Figure: the addition executed in scalar and in vector form]
icl -QxW
    double a[N], b[N];
    int i;
    for (i = 0; i < N; i++)
        a[i] = a[i] + b[i];
25Reduction Example
Features by Intel Compilers - Vectorization
a:  0.0  1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0  10.0  11.0
    float a[N], x;
    int i;
    x = 0.0;
    for (i = 0; i < N; i++)
        x += a[i];
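A sum reduction carries a dependence through `x`, so a vectorized version typically keeps one partial sum per vector lane and combines the lanes at the end. The sketch below emulates that in scalar C under the assumption of four lanes (`VL` = 4); the function name `sum_partial` is hypothetical and the code is an illustration of the idea, not compiler output.

```c
#include <assert.h>
#include <stddef.h>

#define VL 4  /* assumed number of vector lanes */

/* Sum of a[0..n-1] computed with VL independent partial sums. */
float sum_partial(const float *a, size_t n)
{
    assert(a != NULL);
    float lane[VL] = {0.0f};   /* one accumulator per lane, all zero */
    size_t i = 0;

    /* Each block of VL elements is added lane by lane; the lanes do
       not depend on each other, so the block can run as one SIMD add. */
    for (; i + VL <= n; i += VL)
        for (size_t j = 0; j < VL; j++)
            lane[j] += a[i + j];

    /* Combine the lanes, then add any leftover elements. */
    float x = 0.0f;
    for (size_t j = 0; j < VL; j++)
        x += lane[j];
    for (; i < n; i++)
        x += a[i];
    return x;
}
```

Splitting the sum changes the order of the floating-point additions, so for general data the result can differ from the sequential loop in the last bits.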
26Parallel Program Development
Features by Intel Compilers - Parallelization
Ease of use / maintenance
27Autoparallelization
Features by Intel Compilers - Parallelization
    float a[N], b[N], c[N];
    int i;
    for (i = 0; i < N; i++)
        c[i] = a[i] * b[i];
icl -Qparallel foo.c   (-parallel on Linux)
    foo.c
    foo.c(7) : (col. 2) remark: LOOP WAS AUTO-PARALLELIZED.
    ...
./foo.exe   -- Executable detects and uses number of processors
-Qpar_report[n] - get helpful messages from the compiler
28OpenMP Directives
Features by Intel Compilers - Parallelization
- OpenMP standard (www.openmp.org)
- Set of directives to enable the writing of multithreaded programs
- Use of shared memory parallelism on programming language level
- Portability
- Performance
- Support by Intel Compilers
- Windows, Linux
- IA-32 and Itanium architectures
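The portability point can be illustrated with source that builds both with and without OpenMP support. The sketch relies on the `_OPENMP` macro and `omp_get_max_threads()`, both defined by the OpenMP standard; the function names `max_threads` and `scale` are hypothetical. A compiler without OpenMP enabled ignores the pragma and runs the loop sequentially.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Upper bound on threads a parallel region may use; 1 without OpenMP. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Multiply every element of x by s. Iterations are independent,
   so the loop can be shared among threads. */
void scale(float *x, int n, float s)
{
    int i;
    assert(n >= 0);
#pragma omp parallel for
    for (i = 0; i < n; i++)
        x[i] *= s;
}
```

The same file therefore serves as the sequential and the threaded version of the program.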
29Simple Directives
Features by Intel Compilers - Parallelization
    void foo(float *a, float *b, float *c)
    {
        int i;
    #pragma parallel
        for (i = 0; i < N; i++) {
            *c++ = (*a++) * bar(b++);
        }
    }
Pointers and procedure calls with escaped
pointers prevent analysis for autoparallelization.
Use simple directives instead.
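One way to make the independence of the iterations explicit is to index the arrays instead of advancing pointers, and to state the parallelism with a standard OpenMP directive. The sketch below is a hedged variant, not the slide's code: it assumes the loop body multiplies `a[i]` by the result of `bar`, that `bar` has no side effects, and that `c` does not overlap `a` or `b`. The body of `bar` is a made-up stand-in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical side-effect-free helper standing in for bar(). */
static float bar(const float *p)
{
    return *p * 2.0f;
}

/* Indexed form of the loop: iteration i reads a[i] and b[i] and
   writes only c[i], so iterations do not interfere as long as c
   does not alias a or b (an assumption the caller must guarantee). */
void foo_indexed(const float *a, const float *b, float *c, int n)
{
    int i;
    assert(n >= 0);
#pragma omp parallel for
    for (i = 0; i < n; i++)
        c[i] = a[i] * bar(&b[i]);
}
```

The directive is a promise from the programmer; the compiler does not verify that the pointers are free of aliasing.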
30OpenMP Directives
Features by Intel Compilers - Parallelization
    #include <omp.h>
    int bar(int, int, int, int);
    void foo()
    {
        int a[1000], b[1000], c[1000], x[1000], i, NUM;
        /* parallel region */
    #pragma omp parallel private(NUM) shared(x, a, b, c)
        {
            NUM = omp_get_num_threads();
    #pragma omp for private(i)   /* work-sharing for loop */
            for (i = 0; i < 1000; i++) {
                x[i] = bar(a[i], b[i], c[i], NUM);   /* assume bar has no side-effects */
            }
        }
    }
icl -Qopenmp -c foo.c   (-openmp on Linux)
    foo.c
    foo.c(10) : (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
    foo.c(7) : (col. 1) remark: OpenMP DEFINED REGION WAS PARALLELIZED.
31OpenMP + Vectorization
Features by Intel Compilers
- Combined speedup
- Order of use might be important
- Parallelization overhead
- Vectorize inner loops
- Parallelize outer loops
- Supported by Intel Compilers
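The "parallelize outer loops, vectorize inner loops" rule can be sketched on a row-major matrix addition. This is an illustration under assumptions: the function name `add_matrices` and the flat row-major layout are hypothetical, and whether the inner loop is actually vectorized depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>

/* a += b for two rows x cols matrices stored row by row. */
void add_matrices(float *a, const float *b, int rows, int cols)
{
    int i, j;
    assert(rows >= 0 && cols >= 0);

    /* Outer loop: rows are independent, so they are divided among
       threads. Threading overhead is paid once per row block, not
       once per element. */
#pragma omp parallel for private(j)
    for (i = 0; i < rows; i++) {
        /* Inner loop: unit-stride access over one row, the pattern
           an auto-vectorizer handles best. */
        for (j = 0; j < cols; j++)
            a[i * cols + j] += b[i * cols + j];
    }
}
```

Swapping the roles, threading the inner loop and leaving the outer one sequential, would start a parallel region per row and give each thread too little work to amortize the overhead.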
32Intel Compilers
Intel SW Development Tools
- Leading-edge compiler technologies
- Compatible with leading industry-standard compilers
- Processor-optimized code generation
- Support single source code across Intel processor families
Make performance a feature of your applications today; stay competitive
34To be continued