Kenneth Hurley, Sr. Software Engineer - PowerPoint PPT Presentation

Slides: 23
Provided by: DanG76

1
Kenneth Hurley, Sr. Software Engineer
  • khurley_at_nvidia.com

2
What are the problems we are seeing when 3D
engines are written?
  • Misuse of Vertex Buffers
  • Concurrency Limitations
  • Frame Rate Limiters
  • Non-Optimized surface usage
  • Cache misses
  • Data Ordering

3
Misuse of Vertex Buffers
  • Bad things can happen unless you know the "right"
    way to use a vertex buffer
  • Dynamic vertex buffers vs. static vertex buffers
  • When creating the vertex buffer, use
    D3DVBCAPS_WRITEONLY
  • Use D3DLOCK_DISCARDCONTENTS
  • Use D3DLOCK_NOOVERWRITE
  • Vertex buffer ordering
  • Use ordered vertex buffers because of cache
    coherency

4
Using Vertex Buffers Correctly

5
Example vertex buffer flow
  • CreateVB(WRITEONLY, 1000-12000)
  • A: I = 0
  • B: Space in VB for M vertices?
  • Yes: Lock(NOOVERWRITE)
  • No: GOTO C
  • Fill in M vertices at index I
  • Unlock(); DIPVB(I); I += M; GOTO B
  • C: Lock(DISCARDCONTENTS); GOTO A
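
The flow on this slide can be sketched as a small simulation. This is only an illustration under stated assumptions: it does not call Direct3D, the names LockMode, DrawCall, and streamBatches are hypothetical, and the discard step is simplified so that the batch is filled directly after the DISCARDCONTENTS lock instead of looping back through a second lock.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the two lock flags named on the slide.
enum class LockMode { NoOverwrite, DiscardContents };

// One simulated draw: where the batch landed and how the buffer was locked.
struct DrawCall {
    std::size_t firstVertex;  // index I passed to the draw
    std::size_t vertexCount;  // batch size M
    LockMode lock;            // lock mode used to write this batch
};

// Simulates streaming batches through a write-only buffer of `capacity`
// vertices. Batches are appended at I with NoOverwrite; when a batch no
// longer fits, the buffer is locked with DiscardContents and I restarts at 0.
std::vector<DrawCall> streamBatches(std::size_t capacity,
                                    const std::vector<std::size_t>& batchSizes) {
    std::vector<DrawCall> calls;
    std::size_t i = 0;  // A: I = 0
    for (std::size_t m : batchSizes) {
        assert(m <= capacity && "a single batch must fit in the buffer");
        LockMode mode = LockMode::NoOverwrite;  // B, yes branch
        if (i + m > capacity) {                 // B, no branch
            mode = LockMode::DiscardContents;   // C: ask for a fresh buffer
            i = 0;                              // back to A
        }
        // Fill in M vertices at index I, unlock, then draw starting at I.
        calls.push_back({i, m, mode});
        i += m;  // I += M
    }
    return calls;
}
```

With a capacity of 10 vertices and four batches of 4, the first two batches append at 0 and 4, the third triggers the discard and restarts at 0, and the fourth appends at 4 again.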

6
Concurrency
  • Why do I need it?
  • Concurrency helps parallelism between the CPU and
    the GPU.
  • OK, how do I achieve it?
  • Use NVPAT to see if a spin lock is happening.
  • A spin lock occurs when the driver has to stall
    waiting for the hardware to finish with an object
  • These objects can be vertex buffers or texture
    surfaces

7
Concurrency (cont.)
  • Use the vertex buffer and texture surface flags
    so the driver can give you another buffer while
    the hardware is using the other one.

8
Frame Rate Limiters
  • Can cause concurrency issues
  • Better ways to achieve constant frame rates
  • Makes the effective triangle rate much lower, because
    the driver has to do some work with vertex data.

9
Frame Rate Limiter Problem
  • Serialization of code loop

Rescheduled for concurrency

10
Non-Optimized Surface Usage
  • Locking a texture before the GPU is finished with
    it causes concurrency problems by stalling the
    CPU inside the driver.
  • Typical examples include locking the backbuffer
    to do 2D operations on it
  • The best solution for this is to use 2 screen
    aligned triangles (quad) instead and put them
    directly in the 3D pipeline

11
Cache Misses
  • Big slowdowns can occur here
  • CPU cache misses can occur because of ordering of
    vertex data. Check these carefully with VTune.
  • The GPU also has a vertex cache. GeForce has a 16-entry
    cache, but optimal cache use is 10 entries, because
    6 triangles can be in flight at any given time.
  • GPU vertex cache statistics will be added to
    NVPAT.
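
To get a rough feel for how index order affects a vertex cache, the miss count can be estimated offline. The sketch below assumes a simple FIFO replacement policy, which the slide does not state, and the function name countCacheMisses is hypothetical. It is a toy model, not a description of the actual hardware.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Counts how many indices miss a FIFO cache that holds `cacheSize` vertices.
// Assumption: FIFO replacement; real hardware may behave differently.
std::size_t countCacheMisses(const std::vector<unsigned>& indices,
                             std::size_t cacheSize) {
    std::deque<unsigned> cache;  // oldest entry at the front
    std::size_t misses = 0;
    for (unsigned idx : indices) {
        if (std::find(cache.begin(), cache.end(), idx) != cache.end()) {
            continue;  // hit: the vertex is already cached
        }
        ++misses;  // miss: the vertex must be fetched and transformed again
        cache.push_back(idx);
        if (cache.size() > cacheSize) {
            cache.pop_front();  // evict the oldest entry
        }
    }
    return misses;
}
```

Passing 10 as cacheSize matches the effective size suggested on the slide. Comparing the result for two orderings of the same triangles shows which one reuses vertices while they are still cached.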

12
Vertex Ordering
  • For best performance, also order vertex data and
    vertex indices sequentially. This helps
    both the CPU and the GPU
  • Out-of-order vertices make the CPU miss the cache
    more often
  • They do the same thing to the GPU
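
One simple way to get sequential ordering is to renumber vertices in the order the index list first uses them. The sketch below is one possible approach, not necessarily what the presenter used. The names Reordered and reorderByFirstUse are hypothetical, and vertices that no index references are dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Reordered {
    std::vector<unsigned> indices;   // index list rewritten to the new numbering
    std::vector<unsigned> oldIndex;  // oldIndex[n] = original slot of new vertex n
};

// Renumbers vertices in first-use order so that walking the index list
// touches the vertex data front to back.
Reordered reorderByFirstUse(const std::vector<unsigned>& indices,
                            std::size_t vertexCount) {
    const unsigned unassigned = static_cast<unsigned>(-1);
    std::vector<unsigned> remap(vertexCount, unassigned);
    Reordered out;
    out.indices.reserve(indices.size());
    for (unsigned old : indices) {
        assert(old < vertexCount);
        if (remap[old] == unassigned) {
            // First time this vertex is used: give it the next free slot.
            remap[old] = static_cast<unsigned>(out.oldIndex.size());
            out.oldIndex.push_back(old);
        }
        out.indices.push_back(remap[old]);
    }
    return out;
}
```

The caller would then copy vertex oldIndex[n] into slot n of a new vertex array and draw with the rewritten indices.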

13
How do we solve these problems?
  • VTune
  • GPT
  • NVPAT

14
VTune 4.5
  • Helps you optimize your application for the CPU
  • Works well in conjunction with NVPAT
  • I personally use the Time-Based Sampling Wizard
  • VTune is excellent for application specific
    analysis
  • It doesn't show where in the driver time is
    spent, unless you have symbols for the driver.
    You almost certainly don't have driver symbols.

15
VTune 4.5
  • Flare Application

16
GPT 3.5
  • Excellent tool to help you achieve maximum
    performance.
  • Works on both D3D and OpenGL
  • Helps with application-to-API slowdowns
  • Works well in conjunction with VTune and NVPAT.
    GPT is excellent for application to
    Direct3D/OpenGL analysis.
  • It still can't tell you what is occurring inside
    the driver that may be slowing your application
    down

17
GPT 3.5 (cont)
  • Quad view for visual analysis modes

View of alien world in Half-Life

18
NVPAT 1.07
  • Analyze interaction with driver
  • Works on NVIDIA hardware only
  • Windows 98/Windows 2000 capable
  • Hotkey capable
  • Online help via F1 function key
  • Logging
  • Frame Rate Display
  • Natural Extension to VTune and GPT

19
NVPAT 1.07
  • Demo: Flare vs. NewFlare
  • NVPAT is available free at
    http://www.nvidia.com/Marketing/Developer/SwDevStaticPages.nsf/pages/StatsDriver
  • You must be a registered NVIDIA developer

20
VTune DLL SDK
  • Soon, all these performance tools should be
    integrated into VTune using the DLL SDK
  • NVPAT will be integrated into the VTune DLL SDK
  • VTune DLL SDK is available from Intel and gives
    you the ability to integrate performance tools
    into VTune.
  • http://developer.intel.com/vtune/analyzer/vtperfdll
  • Common User Interface/API means less to learn for
    developers

21
Action Items
  • Profile often and early in the process
  • Use the tools available to you
  • Some are free, the rest are reasonable
  • Architect engine with concurrency in mind
  • Ask for enhancements from your tool vendor

22
Questions?
  • Comments/Suggestions?
  • Enhancement requests for NVPAT can be sent to
    statdriver_at_nvidia.com