Workload Characterization of 3D Games

Description:

Unreal 2.5 Direct3D. YES. 16X. High/Aniso. 1' 39' Splinter Cell 3/first level. Mar 2006 ... Unreal 2.5. OpenGL. NO. 16X. High/Aniso. 1' 06' UT2004/Primeval ...

1
Workload Characterization of 3D Games
  • Jordi Roca, Victor Moya, Carlos González, Chema
    Solis, Agustín Fernandez and Roger Espasa (Intel
    DEG Barcelona)

Computer Architecture Department
2
Outline
  • Introduction
  • Game selection and stats gathering
  • Game analysis
  • System-GPU traffic
  • Primitive culling efficiency
  • Rasterization pipeline
  • Fragment shading and texturing
  • Memory usage
  • Conclusions

3
Introduction
  • Games and GPUs evolve fast
  • GPUs cater for game demands
  • Better effects (flexible programming models)
  • Higher fill-rate (more processing power)
  • Higher quality (HDR, MSAA, AF)
  • Games highly tuned to released GPUs
  • New characterization needed for every Game and
    GPU generation.

5
Game workload selection
  • Resolution 1024x768

6
Statistics environment (OpenGL)
[Diagram: OGL Application, GLInterceptor]
7
Statistics environment (Direct3D)
[Diagram: stages Collect, Verify, Simulate, Analyze. Labeled components: D3D Application, PIXRun Trace, Microsoft PIX, Direct3D API call stats, DXPlayer, Microsoft D3D Driver, ATI R520/NVidia G70, Framebuffer, CHECK!]
9
System-GPU traffic
T. Mitra and T. Chiueh, "Dynamic 3D Graphics Workload Characterization and the Architectural Implications," MICRO 1999
10
System-GPU traffic
Index BW
11
Post-TL vertex cache
System-GPU traffic
  • For triangle lists made of adjacent triangles
  • 2/3 of referenced vertices are already computed
  • 66% hit rate
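
The 2/3 figure can be reproduced with a small simulation. The sketch below is illustrative only: it assumes a FIFO post-T&L cache with 16 entries (the size and replacement policy are assumptions, not taken from the slides) and a triangle list in which every triangle shares two vertices with the previous one. The function names are hypothetical.

```python
from collections import deque

def post_tl_hit_rate(indices, cache_size=16):
    """Fraction of index references served by a FIFO post-T&L vertex cache."""
    cache = deque(maxlen=cache_size)  # oldest entry is evicted automatically
    hits = 0
    for idx in indices:
        if idx in cache:
            hits += 1          # vertex already transformed, reuse the result
        else:
            cache.append(idx)  # miss: run the vertex shader, then cache it
    return hits / len(indices)

def adjacent_triangle_list(num_triangles):
    """Triangle list where triangle t uses vertices (t, t+1, t+2)."""
    indices = []
    for t in range(num_triangles):
        indices.extend((t, t + 1, t + 2))
    return indices

rate = post_tl_hit_rate(adjacent_triangle_list(1000))
print(f"hit rate: {rate:.3f}")
```

Only the third index of each triangle after the first is new, so the hit rate approaches 2/3 as the list grows.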

12
Post-TL vertex cache experiments
System-GPU traffic
  • Results show the expected hit rate
  • Games prefer triangle lists
  • Low bus BW usage, related to the indices sent
  • Same vertex computation work as with strips or
    fans when using a Post-TL vertex cache
  • Triangle lists are easier for modeling tools to
    manage.

14
Primitive culling efficiency
  • Clipping/culling is used intensively by our games.
  • Quake4: half of the polygons lie outside the view
    volume.
  • Game rendering engines let the GPU do the important
    clipping/culling work
  • Easier and cheaper in GPU hardware.
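
As a rough illustration of the two tests involved, the sketch below shows a trivial view-volume reject and a back-face test. It assumes OpenGL-style clip coordinates (-w <= x, y, z <= w) and counter-clockwise front faces; both conventions and the function names are assumptions for the example, not details from the presentation.

```python
def outside_view_volume(tri):
    """Trivial reject: True if all three clip-space vertices (x, y, z, w)
    lie outside the same plane of the view volume."""
    planes = (
        lambda x, y, z, w: x < -w,
        lambda x, y, z, w: x > w,
        lambda x, y, z, w: y < -w,
        lambda x, y, z, w: y > w,
        lambda x, y, z, w: z < -w,
        lambda x, y, z, w: z > w,
    )
    return any(all(plane(*v) for v in tri) for plane in planes)

def is_back_facing(tri_2d):
    """True if the projected triangle is back-facing or degenerate,
    assuming counter-clockwise winding marks front faces."""
    (x0, y0), (x1, y1), (x2, y2) = tri_2d
    signed_area = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    return signed_area <= 0

off_screen = [(2, 0, 0, 1), (3, 0, 0, 1), (2, 1, 0, 1)]
print(outside_view_volume(off_screen))
```

Triangles that straddle a plane are not rejected here; real hardware clips them instead.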

16
Rasterization pipeline
The Basics
  • Triangles are broken into quads (2x2 fragments)
  • Quad fragments are tested individually in different
    stages
  • Z test (hidden surfaces), stencil test, alpha test
    (transparency), color mask.
  • Finally, surviving fragments update the framebuffer
  • Empty quads are not processed further
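
The per-fragment tests can be sketched as a chain of predicates applied to the four fragments of a quad. This is a simplified model under stated assumptions: the test order (alpha, stencil, Z), the comparison functions (GEQUAL for alpha, EQUAL for stencil, LESS for depth), and all names are illustrative choices, not the configuration measured in the presentation.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    z: float              # interpolated depth, smaller is closer
    alpha: float          # fragment alpha after shading
    covered: bool = True  # False when the pixel lies outside the triangle

def surviving_fragments(quad, stored_z, stored_stencil,
                        alpha_ref=0.5, stencil_ref=0, color_mask=True):
    """Return one alive flag per fragment of a 2x2 quad."""
    alive = []
    for frag, z_buf, s_buf in zip(quad, stored_z, stored_stencil):
        alive.append(
            frag.covered
            and frag.alpha >= alpha_ref  # alpha test
            and s_buf == stencil_ref     # stencil test
            and frag.z < z_buf           # Z test
            and color_mask               # color writes enabled
        )
    return alive

quad = [Fragment(z=0.5, alpha=1.0) for _ in range(4)]
alive = surviving_fragments(quad, [1.0, 1.0, 0.2, 1.0], [0, 0, 0, 0])
print(alive, "empty quad" if not any(alive) else "quad continues")
```

A quad whose flags are all False is the "empty quad" case and can be dropped before any further work.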

17
Rasterization pipeline
Experimentation
  • Quad generation efficiency
  • Higher efficiency than reported by Mitra and Chiueh
    (MICRO 99)
  • Results show efficiencies between 40% and 60%.
  • Interactive 3D games use less detailed 3D models
    (larger triangles).

18
Rasterization pipeline
  • Doom3 and Quake4
  • Polygon rasterization overhead due to stencil
    shadow volumes (SSV)

19
Rasterization pipeline
  • Fragment rejection breakdown
  • On-die HZ (hierarchical Z) greatly reduces GDDR BW
    by avoiding ZStencil buffer accesses.
  • In SSV games there is still room for further BW
    reduction if HZ also performs the stencil test
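
A minimal sketch of how a hierarchical Z buffer can reject work without touching the ZStencil buffer in memory. It assumes a LESS depth test and one conservative "farthest depth" value per screen tile kept on-die; the class and method names are hypothetical, and real implementations differ in tile size and update policy.

```python
class HierarchicalZ:
    """Per-tile farthest depth, used to reject hidden quads early."""

    def __init__(self, tiles_x, tiles_y, far=1.0):
        self.max_z = [[far] * tiles_x for _ in range(tiles_y)]

    def reject(self, tx, ty, quad_min_z):
        """True if even the nearest depth of the quad fails a LESS test
        against the farthest depth stored for the tile."""
        return quad_min_z >= self.max_z[ty][tx]

    def update(self, tx, ty, tile_max_z):
        """Record the new farthest depth after the tile has been written."""
        self.max_z[ty][tx] = tile_max_z

hz = HierarchicalZ(tiles_x=2, tiles_y=2)
hz.update(0, 0, 0.3)           # tile now fully covered by geometry at z <= 0.3
print(hz.reject(0, 0, 0.5))    # quad behind the tile contents
```

Every rejected quad saves a read of the off-chip ZStencil buffer, which is where the bandwidth reduction comes from.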

21
Fragment shading and texturing
  • Texture filtering cost measured in bilinears
  • Bilinear filtering: 1 bilinear (constant)
  • Trilinear filtering: 2 bilinears (constant)
  • Anisotropic filtering: from 2 up to 32 bilinears
    (variable)
  • Texture pipelines can usually execute 1
    bilinear/cycle
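
The cost model above can be written down directly. In this sketch the anisotropic case is modeled as 1 to 16 trilinear probes of 2 bilinears each, which matches the stated range of 2 to 32 but is an assumption about how that range arises; the function names and the sample request mix are hypothetical.

```python
def bilinear_cost(filter_mode, aniso_samples=1):
    """Cost of one texture request, in bilinear operations."""
    if filter_mode == "bilinear":
        return 1
    if filter_mode == "trilinear":
        return 2
    if filter_mode == "anisotropic":
        if not 1 <= aniso_samples <= 16:
            raise ValueError("aniso_samples must be between 1 and 16")
        return 2 * aniso_samples  # assumed: each probe is one trilinear
    raise ValueError(f"unknown filter mode: {filter_mode}")

def texture_cycles(requests, bilinears_per_cycle=1):
    """Cycles one texture pipeline needs for a list of (mode, samples)."""
    total = sum(bilinear_cost(mode, samples) for mode, samples in requests)
    return total / bilinears_per_cycle

requests = [("bilinear", 1), ("trilinear", 1), ("anisotropic", 16)]
print(texture_cycles(requests))
```

Because the anisotropic cost varies per request, texture time depends on the viewing angles in the scene and not only on the filter setting.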

22
Fragment shading and texturing
  • ALU to texture ratio
  • ATI Xenos, RV530, R580 peak performance
  • Up to 3 ALU instructions per bilinear
  • 80% of ALU power not used
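
One way to relate the peak ratio to unused ALU capacity is shown below. It assumes the texture units are the bottleneck, so ALU utilization is the workload's ALU-per-bilinear ratio divided by the peak ratio of 3. The shader counts in the example are hypothetical and were chosen only to illustrate the arithmetic; the slides may derive their figure differently.

```python
def alu_utilization(alu_instructions, bilinears, peak_alu_per_bilinear=3.0):
    """Fraction of peak ALU throughput used when texturing is the limit."""
    ratio = alu_instructions / bilinears
    return min(1.0, ratio / peak_alu_per_bilinear)

# Hypothetical shader: 6 ALU instructions and 10 bilinears per fragment.
used = alu_utilization(alu_instructions=6, bilinears=10)
print(f"ALU used: {used:.0%}, idle: {1 - used:.0%}")
```

A workload only saturates the ALUs once it issues at least 3 ALU instructions for every bilinear.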

24
Memory usage
  • Memory Hierarchy
  • Specialized features
  • Fast clears
  • Transparent compression
  • Hit rate and miss BW
  • In non-SSV games (UT2004)
  • Most demanding stages: Texture, Color.
  • In SSV games (Doom3, Quake4)
  • Most demanding stage: ZStencil (50%!)
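
Hit rate and miss bandwidth are linked by simple arithmetic, sketched below. The model counts only line fills on misses and ignores write-backs, fast clears, and compression; the cache line size, access count, and frame rate in the example are hypothetical values, not measurements from the presentation.

```python
def miss_bandwidth(accesses_per_frame, hit_rate, line_bytes, fps):
    """Memory bandwidth in bytes/s caused by cache misses in one stage."""
    misses_per_frame = accesses_per_frame * (1.0 - hit_rate)
    return misses_per_frame * line_bytes * fps

# Hypothetical stage: 1M cache accesses per frame, 90% hits, 64-byte lines.
bw = miss_bandwidth(1_000_000, hit_rate=0.90, line_bytes=64, fps=60)
print(f"{bw / 1e6:.0f} MB/s")
```

Comparing this value across the ZStencil, Texture, and Color stages shows which one dominates memory traffic for a given game.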

25
Conclusions
26
Conclusions
  • Do our 3D games use GPU resources efficiently?

27
Conclusions
  • Some inferred implications