Title: RADEON
1. RADEON 9700: Architecture and 3D Performance
2. RADEON 9700
- What is the RADEON 9700?
- Programmability (SMARTSHADER 2.0)
- First Full Floating Point Graphics Pipeline
- Enables Compilation of High Level Shading Languages
- Performance
- High Bandwidth
- Parallelism
- Efficiency
- Image Quality (SMOOTHVISION 2.0)
- Multisample Antialiasing
- Anisotropic Texture Filtering
3. Image Generation with Image Mapping: 1st Generation Programmability
- Idea: Texture Mapping, Blinn and Newell, 1976
- Implementation: SGI VGXT, 1990
- Hardwired Vertex Processing
- Hardwired Fragment Processing with a Single Texture
- Result: Environment Mapping and other effects
Blinn, J. F. and Newell, M. E. Texture and reflection in computer generated images. Communications of the ACM, Vol. 19, No. 10 (October 1976), 542-547.
4. Image Generation with Texture Composition: 2nd Generation Programmability
- Idea: Shade Trees, R. Cook, 1984
- Implementation: RADEON 8500, 2001
- Limited Vertex Programmability
- Limited Fragment Processing
- Multiple Textures
- Fixed Point Data
- Short Programs
- Result: the current generation of effects
Cook, R. L. Shade trees. Computer Graphics, Vol. 18, No. 3 (July 1984), 223-231.
5. Image Generation with General Purpose Floating Point Math and Texturing: 3rd Generation Programmability
- Idea: RenderMan, Pixar, 1987
- Implementation: ATI RADEON 9700, 2002
- Advanced Vertex Programmability
- Advanced Fragment Programmability
- Floating Point Data
- Rich Instruction Set
- Large Instruction Store
- Result: Enabling Cinematic Rendering
- Compiling RenderMan, Maya, etc.
Reeves, W. T., Salesin, D. H., and Cook, R. L. Rendering antialiased shadows with depth maps. Computer Graphics, Vol. 21, No. 4 (July 1987), 283-291.
6. SMARTSHADER 2.0
- Next-generation programmable shader technology
- Enabling cinema-quality effects in real time
- First complete DirectX 9.0 feature support
- 2.0 Vertex and Pixel Shaders
- Floating Point Pixel Pipelines
- 128-bit Floating Point Texture and Frame Buffer Formats
- Two-Sided Stencil Shadow Acceleration
- High Precision 32-bpp (10:10:10:2) Display Mode
- Higher Order Surface Enhancements
- Full feature set also available for OpenGL
- OpenGL Shading Language Support
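The 32-bpp display mode listed above spends 10 bits on each color channel and 2 on alpha, giving 1024 levels per color instead of the 256 of an 8:8:8:8 mode. The sketch below packs and unpacks such a word. The bit layout (alpha in the top two bits, then red, green, blue) is an assumption modeled on the common A2R10G10B10 arrangement, and the function names are hypothetical.

```python
def pack_a2r10g10b10(r: float, g: float, b: float, a: float) -> int:
    """Pack normalized [0, 1] channels into one 32-bit word.

    Assumed layout: A in bits 30-31, R in 20-29, G in 10-19, B in 0-9.
    """
    def quantize(x: float, bits: int) -> int:
        # Clamp to [0, 1], then scale to the largest integer this width holds.
        x = min(max(x, 0.0), 1.0)
        return round(x * ((1 << bits) - 1))

    return ((quantize(a, 2) << 30) | (quantize(r, 10) << 20)
            | (quantize(g, 10) << 10) | quantize(b, 10))


def unpack_a2r10g10b10(word: int) -> tuple[float, float, float, float]:
    """Inverse of pack_a2r10g10b10: return (r, g, b, a) in [0, 1]."""
    r = ((word >> 20) & 0x3FF) / 1023
    g = ((word >> 10) & 0x3FF) / 1023
    b = (word & 0x3FF) / 1023
    a = ((word >> 30) & 0x3) / 3
    return r, g, b, a
```

Each color channel round-trips with an error of at most half a step (1/2046), four times finer than an 8-bit channel.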
7. Vertex Shaders (SMARTSHADER 2.0)
- Flow Control
- Loops, jumps and subroutines
- Allow re-use of certain parts of the shader code
- Avoid repetition and save instructions
- More Instructions, More Complex Effects
- Up to 65,280 instructions per pass
- Vertex shaders can be much more complex than they were in DX8
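To show why loops and subroutines save instruction slots, here is a minimal sketch in Python standing in for vertex shader code. The lighting term is written once as a "subroutine" and a single loop applies it to every bound light, rather than unrolling one copy per light. The function names and the Lambertian lighting model are illustrative assumptions, not taken from the slides; in a real vertex shader these would be the shader's own loop and subroutine-call instructions.

```python
Vec3 = tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def diffuse(normal: Vec3, light_dir: Vec3, intensity: float) -> float:
    """'Subroutine': a Lambertian term written once, called for every light."""
    return intensity * max(dot(normal, light_dir), 0.0)


def shade_vertex(normal: Vec3, lights: list[tuple[Vec3, float]]) -> float:
    """'Loop': one body iterates over the bound lights instead of one copy per light."""
    total = 0.0
    for light_dir, intensity in lights:
        total += diffuse(normal, light_dir, intensity)
    return total


lights = [((0.0, 0.0, 1.0), 0.5), ((0.0, 0.0, -1.0), 1.0), ((0.0, 0.6, 0.8), 1.0)]
print(shade_vertex((0.0, 0.0, 1.0), lights))  # about 1.3; the back-facing light adds 0
```

Adding a light changes only the data passed in, not the size of the shader program.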
8. Pixel Shaders (SMARTSHADER 2.0)
- More Complex Shaders by an Order of Magnitude
- Up to 160 instructions per pass
- 32 texture address ops, 64 color ops, 64 alpha ops
- Compared with 12 instructions total in DX8.0
- Multi-pass rendering support
- High precision 128-bit floating point data formats for storing intermediate results between passes
- Shaders can now effectively be thousands of instructions long; performance is the only limitation
- 24-bit per component floating point precision for all pixel shader operations
- Necessary for cinema-quality effects
- Allows shaders written in any present or future language to run on hardware with SMARTSHADER 2.0
- Even high level languages like RenderMan can now be compiled to run on RADEON 9700 in real time
- Pixel shaders can also implement complex image processing algorithms
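Floating point intermediate formats matter for multi-pass shading because a fixed-point render target clamps and quantizes every value written between passes. The sketch below runs the same hypothetical two-pass computation (brighten by 4, then apply an exposure of 0.25) through an assumed 8-bit fixed-point target and through one 32-bit float channel of a 128-bit target. The pass contents and function names are illustrative assumptions, not from the slides.

```python
import struct
from collections.abc import Callable


def store_unorm8(x: float) -> float:
    """Write to an 8-bit fixed-point target: clamp to [0, 1], quantize to 1/255."""
    x = min(max(x, 0.0), 1.0)
    return round(x * 255) / 255


def store_float32(x: float) -> float:
    """Write to one channel of a 128-bit (4 x 32-bit float) target."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_pass(albedo: float, store: Callable[[float], float]) -> float:
    # Pass 1 (hypothetical): a bright light scales the albedo by 4.
    intermediate = store(albedo * 4.0)
    # Pass 2 (hypothetical): read it back and apply an exposure of 0.25.
    return store(intermediate * 0.25)


print(two_pass(0.5, store_float32))  # 0.5: the 2.0 intermediate survives
print(two_pass(0.5, store_unorm8))   # about 0.251: the intermediate was clamped to 1.0
```

With the fixed-point target the result is roughly half the correct value, and the error compounds as passes are chained, which is what limits long multi-pass effects on earlier hardware.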
9. RADEON 9700 Performance
- Key design elements for best performance: High Bandwidth, Parallelism, Efficiency
- High Bandwidth
- AGP 8x provides 2 GB/sec transfers to or from the CPU or system memory
- 310 MHz 256-bit DDR Memory Interface provides 20 GB/sec access to the Frame Buffer
- Internal 256-bit data buses for Color, Texture and Z
- Parallelism
- 4 Vertex Engines running at 325 MHz provide 325 Mtriangles/sec (4 clocks per vertex per engine)
- 8 Pixels/Clock Rasterization Architecture running at 325 MHz provides a peak fill rate of 2.6 Gpix/sec
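The peak figures on this slide follow from clock rate times width. The sketch below reproduces them; the function names are hypothetical. The vertex figure counts vertices, so it matches the quoted triangle rate only under the assumption that each triangle costs about one new vertex, as in long triangle strips.

```python
def memory_bandwidth_gb_s(clock_mhz: float, bus_bits: int, transfers_per_clock: int = 2) -> float:
    """Peak GB/sec: DDR moves data on both clock edges (2 transfers per clock)."""
    return clock_mhz * 1e6 * transfers_per_clock * (bus_bits / 8) / 1e9


def vertex_rate_m_per_s(engines: int, clock_mhz: float, clocks_per_vertex: int) -> float:
    """Peak vertices/sec, in millions, across all vertex engines."""
    return engines * clock_mhz / clocks_per_vertex


def fill_rate_gpix_s(pixels_per_clock: int, clock_mhz: float) -> float:
    """Peak pixels/sec, in billions, for the rasterizer."""
    return pixels_per_clock * clock_mhz / 1000


print(memory_bandwidth_gb_s(310, 256))  # about 19.84, the slide's "20 GB/sec"
print(vertex_rate_m_per_s(4, 325, 4))   # 325.0 million vertices/sec
print(fill_rate_gpix_s(8, 325))         # 2.6 Gpix/sec
```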
10. RADEON 9700 Performance (cont.)
- Efficiency
- Graphics systems tend to be memory bandwidth limited, and the RADEON 9700 is no exception, so it is important to use the bandwidth efficiently.
- Hierarchical and Early Z checking allows pixels to be rejected before the pixel shader runs. This is very important when shader programs are long.
- Color, Texture and Z caches reduce memory bandwidth utilization by exploiting spatial and temporal locality.
- Lossless Color and Z data compression reduces memory bandwidth utilization.
- Compressed Textures can be used to reduce memory bandwidth utilization.
- Fast Color and Z clears eliminate the need to access memory for clears.
- HyperZ III
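A rough sketch of how hierarchical Z and early Z save pixel shader work. It assumes a less-than depth test (smaller z is closer), a hypothetical 8x8 tile size, and constant-depth rectangles inside the buffer in place of real triangles. Each tile keeps the farthest depth of its pixels, so an incoming primitive behind that value is rejected for the whole tile without any per-pixel work; surviving pixels are still depth-tested before the shader would run. The class and method names are illustrative, and this is not a description of the actual HyperZ III hardware.

```python
TILE = 8  # hypothetical tile size in pixels


class HierarchicalZ:
    """Per-pixel depth buffer plus the farthest depth stored in each tile."""

    def __init__(self, width: int, height: int) -> None:
        self.width, self.height = width, height
        self.depth = [[1.0] * width for _ in range(height)]  # 1.0 = far plane
        tiles_x = (width + TILE - 1) // TILE
        tiles_y = (height + TILE - 1) // TILE
        self.tile_max = [[1.0] * tiles_x for _ in range(tiles_y)]
        self.shaded = 0          # pixel shader invocations
        self.tiles_rejected = 0  # tiles culled by the coarse test

    def _tile_range(self, tx: int, ty: int) -> tuple[range, range]:
        ys = range(ty * TILE, min((ty + 1) * TILE, self.height))
        xs = range(tx * TILE, min((tx + 1) * TILE, self.width))
        return ys, xs

    def draw_rect(self, x0: int, y0: int, x1: int, y1: int, z: float) -> None:
        """Draw a constant-depth rectangle covering [x0, x1) x [y0, y1).

        Assumes the rectangle lies inside the buffer.
        """
        for ty in range(y0 // TILE, (y1 - 1) // TILE + 1):
            for tx in range(x0 // TILE, (x1 - 1) // TILE + 1):
                # Coarse test: behind everything already in the tile, so skip it whole.
                if z >= self.tile_max[ty][tx]:
                    self.tiles_rejected += 1
                    continue
                ys, xs = self._tile_range(tx, ty)
                for y in ys:
                    for x in xs:
                        inside = x0 <= x < x1 and y0 <= y < y1
                        # Early Z: depth test before the (possibly long) pixel shader.
                        if inside and z < self.depth[y][x]:
                            self.depth[y][x] = z
                            self.shaded += 1  # the pixel shader would run here
                # Refresh the tile's farthest depth for later coarse tests.
                self.tile_max[ty][tx] = max(self.depth[y][x] for y in ys for x in xs)


hz = HierarchicalZ(16, 16)
hz.draw_rect(0, 0, 16, 16, 0.2)  # near surface: all 256 pixels shaded
hz.draw_rect(0, 0, 16, 16, 0.5)  # hidden surface: all 4 tiles rejected, no shading
```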
11. RADEON 9700 Performance (cont.)
- One more interesting thing...
- Scalability
- The RADEON 9700 Architecture is capable of
scaling up to 256 simultaneous units
12. Image Quality (SMOOTHVISION 2.0)
- Performance matters too
- Pixel antialiasing and anisotropic texture filtering improve image quality only if they are enabled, which requires enough performance to leave them on.
- Just going to higher resolutions isn't the answer for improved image quality.
- Artifacts due to poor texture sampling remain.
- Dynamic aliasing artifacts are still very visible.
- Sufficient performance for high resolution display, high quality texture filtering, and antialiasing is needed.
- The RADEON 9700 was architected to do all three simultaneously.
13. Anti-Aliasing (SMOOTHVISION 2.0)
- Non-Grid Programmable Multi-Sampling
- 2, 4, or 6 samples per pixel
- Sample positions chosen to provide the maximum quality per sample
- Lossless Z and Color compression minimizes the bandwidth cost of higher sample counts
- Per Sample Gamma Correction
- Takes gamma into account when blending samples
- Creates smoother edge transitions
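Gamma-aware blending matters because stored sample values are gamma-encoded, so averaging them directly does not average the light they represent. The sketch below resolves one edge pixel both ways. It assumes a simple power-curve display gamma of 2.2 (not the exact sRGB transfer function) and a single color channel; the function names are hypothetical, and the hardware's actual resolve is not described on the slide.

```python
GAMMA = 2.2  # assumed display gamma (simple power curve, not the exact sRGB curve)


def resolve_naive(samples: list[float]) -> float:
    """Average gamma-encoded sample values directly."""
    return sum(samples) / len(samples)


def resolve_gamma_correct(samples: list[float], gamma: float = GAMMA) -> float:
    """Convert samples to linear light, average them, then re-encode."""
    linear = [s ** gamma for s in samples]
    return (sum(linear) / len(linear)) ** (1.0 / gamma)


# Edge pixel: 2 of 4 samples hit a white polygon, 2 hit the black background.
edge = [1.0, 1.0, 0.0, 0.0]
naive = resolve_naive(edge)            # 0.5 encoded, displays at about 0.22 of full intensity
correct = resolve_gamma_correct(edge)  # about 0.73 encoded, displays at 0.5 of full intensity
```

The naive average makes half-covered edge pixels look too dark, so edges appear to step in brightness; resolving in linear space gives the even ramp the slide calls smoother edge transitions.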
14. Anisotropic Filtering (SMOOTHVISION 2.0)
- Improved Adaptive Algorithm
- Up to 16 Trilinear Samples (128-tap)
- Calculates the optimal number of samples for each polygon
- Delivers the full image quality benefit while conserving memory bandwidth
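One common way to adapt the sample count is to measure how elongated a pixel's footprint is in texture space and take only as many trilinear samples as that elongation needs. The sketch below follows the general approach of the EXT_texture_filter_anisotropic specification, not ATI's undisclosed algorithm; derivatives are assumed to be in texels per pixel, and the function name is hypothetical. Each trilinear sample reads 4 texels from each of 2 mip levels, which is how 16 samples become 128 taps.

```python
import math

MAX_ANISO = 16  # from the slide: up to 16 trilinear samples


def aniso_samples(dudx: float, dvdx: float, dudy: float, dvdy: float,
                  max_aniso: int = MAX_ANISO) -> tuple[int, float, int]:
    """Return (trilinear sample count, mip LOD, texel taps) for one pixel."""
    px = math.hypot(dudx, dvdx)  # footprint length along screen x, in texels
    py = math.hypot(dudy, dvdy)  # footprint length along screen y, in texels
    p_max, p_min = max(px, py), min(px, py)
    # Samples grow with elongation, capped at the hardware maximum.
    n = max(1, min(math.ceil(p_max / max(p_min, 1e-8)), max_aniso))
    # Each sample covers p_max / n texels, so pick the mip level for that extent.
    lod = math.log2(max(p_max / n, 1.0))
    taps = n * 8  # trilinear: 2 mip levels x 4 texels per sample
    return n, lod, taps


print(aniso_samples(1, 0, 0, 1))   # (1, 0.0, 8): isotropic, one trilinear sample
print(aniso_samples(4, 0, 0, 1))   # (4, 0.0, 32): 4:1 footprint
print(aniso_samples(64, 0, 0, 1))  # (16, 2.0, 128): capped at 16 samples
```

Isotropic pixels cost a single trilinear sample, so the extra bandwidth is spent only on surfaces viewed at grazing angles.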
15. RADEON 9700 Demos
16. Conclusion