Common C++ Performance Mistakes in Games - PowerPoint PPT Presentation

About This Presentation
Title:

Common C++ Performance Mistakes in Games

Description:

Common C++ Performance Mistakes in Games Pete Isensee Xbox Advanced Technology Group About the Data ATG reviews code to find bottlenecks and make perf recommendations ... – PowerPoint PPT presentation

Number of Views:54
Avg rating:3.0/5.0
Slides: 40
Provided by: tantalonC
Category:

less

Transcript and Presenter's Notes

Title: Common C++ Performance Mistakes in Games


1
Common C Performance Mistakes in Games
  • Pete Isensee
  • Xbox Advanced Technology Group

2
About the Data
  • ATG reviews code to find bottlenecks and make
    perf recommendations
  • 50 titles per year
  • 96 use C
  • 1 in 3 use advanced features like templates or
    generics

3
Why This Talk Is Important
  • The majority of Xbox games are CPU bound
  • The CPU bottleneck is often a language or C
    library issue
  • These issues are not usually specific to the
    platform

4
Format
  • Definition of the problem
  • Examples
  • Recommendation
  • For reference
  • A frame is 17 or 33 ms (60fps / 30fps)
  • Bottlenecks given in ms per frame

5
Issue STL
  • Game using stdlist
  • Adding 20,000 objects every frame
  • Rebuilding the list every frame
  • Time spent 6.5 ms/frame!
  • 156K overhead (2 pointers per node)
  • Objects spread all over the heap

6
stdset and map
  • Many games use set/map as sorted lists
  • Inserts are slow (log(N))
  • Memory overhead 3 ptrs color
  • Worst case in game 3.8 ms/frame

7
stdvector
  • Hundreds of push_back()s per frame
  • VS7.1 expands vector by 50
  • Question How many reallocations for 100
    push_back()s?
  • Answer 13! (1,2,3,4,5,7,10,14,20,29,43,64,95)

8
Clearly, the STL is Evil
9
Use the Right Tool for the Job
  • The STL is powerful, but its not free
  • Filling any container is expensive
  • Be aware of container overhead
  • Be aware of heap fragmentation and cache
    coherency
  • Prefer vector, vectorreserve()

10
The STL is Evil, Sometimes
  • The STL doesnt solve every problem
  • The STL solves some problems poorly
  • Sometimes good old C-arrays are the perfect
    container
  • Mike Abrash puts it well
  • The best optimizer is between your ears

11
Issue NIH Syndrome
  • Example Custom binary tree
  • Sorted list of transparent objects
  • Badly unbalanced
  • 1 ms/frame to add only 400 items
  • Example Custom dynamic array class
  • Poorer performance than stdvector
  • Fewer features

12
Optimizations that Arent
  • void appMemcpy( void d, const void s, size_t b
    )
  • // lots of assembly code here ...
  • appMemcpy( pDest, pSrc, 100 ) // bottleneck
  • appMemcpy was slower than memcpy for anything
    under 64K

13
Invent Only What You Need
  • stdset/map more efficient than the custom tree
    by 10X
  • Tested and proven
  • Still high overhead
  • An even better solution
  • Unsorted vector or array
  • Sort once
  • 20X improvement

14
Profile
  • Run your profiler
  • Rinse. Repeat.
  • Prove the improvement.
  • Dont rewrite the C runtime or STL just because
    you can. There are more interesting places to
    spend your time.

15
Issue Tool Knowledge
  • If youre a programmer, you use C/C every day
  • C is complex
  • CRT and STL libraries are complex
  • The complexities matter
  • Sometimes they really matter

16
vectorclear
  • Game reused global vector in frame loop
  • clear() called every frame to empty the vector
  • C Standard
  • clear() erases all elements (size() goes to 0)
  • No mention of what happens to vector capacity
  • On VS7.1/Dinkumware, frees the memory
  • Every frame reallocated memory

17
Zero-Initialization
  • struct Array int x1000
  • struct Container
  • Array arr
  • Container() arr()
  • Container x // bottleneck
  • Costing 3.5 ms/frame
  • Removing arr() speeds this by 20X

18
Know Thine Holy Standard
  • Use resize(0) to reduce container size without
    affecting capacity
  • T() means zero-initialize PODs. Dont use T()
    unless you mean it.
  • Get a copy of the C Standard. Really.
  • www.techstreet.com search on 14882
  • Only 18 for the PDF

19
Issue C Runtime
  • void BuildScore( char s, int n )
  • if( n gt 0 )
  • sprintf( s, d, n )
  • else
  • sprintf( s, )
  • n was often zero
  • sprintf was a hotspot

20
qsort
  • Sorting is important in games
  • qsort is not an ideal sorting function
  • No type safety
  • Comparison function call overhead
  • No opportunity for compiler inlining
  • There are faster options

21
Clearly, the CRT is Evil
22
Understand Your Options
  • itoa() can replace sprintf( s, d, n )
  • s \0 can replace sprintf( s, )
  • stdsort can replace qsort
  • Type safe
  • Comparison can be inlined
  • Other sorting options can be even faster
    partial_sort, partition

23
Issue Function Calls
  • 50,000-100,000 calls/frame is normal
  • At 60Hz, Xbox has 12.2M cycles/frame
  • Function call/return averages 20 cycles
  • A game calling 61,000 functions/frame spends 10
    CPU (1.7 ms/frame) in function call overhead

24
Extreme Function-ality
  • 120,000 functions/frame
  • 140,000 functions/frame
  • 130,000 calls to a single function/frame
    (ColumnVeclt3,floatgtoperator)
  • And the winner
  • 340,000 calls per frame!
  • 9 ms/frame of call overhead

25
Beware Elegance
  • Elegance ? levels of indirection ? more functions
    ? perf impact
  • Use algorithmic solutions first
  • One pass through the world
  • Better object rejection
  • Do AI/physics/networking less often than
    once/frame

26
Inline Judiciously
  • Remember inline is a suggestion
  • Try inline any suitable compiler option
  • 15 to 20 fps
  • 68,000 calls down to 47,000
  • Try __forceinline or similar keyword
  • Adding to 5 funcs shaved 1.5 ms/frame
  • Dont over-inline

27
Issue for loops
  • // Example 1 Copy indices to push buffer
  • for( DWORD i 0 i lt dwIndexCnt i )
  • pPushBuffer arrIndices i
  • // Example 2 Initialize vector array
  • for( DWORD i 0 i lt dwMax i )
  • mVectorArri XGVECTOR4(0,0,0,0)
  • // Example 3 Process items in world
  • for( itr i c.begin() i lt c.end() i )
  • Process( i )

28
Watch Out For For
  • Never copy/clear a POD with a for loop
  • stdalgorithms are optimized use them
  • memcpy( pPushBuffer, arrIndices,
  • dwIndexCnt sizeof(DWORD) )
  • memset( mVectorArr, 0, dwMax sizeof(XGVECTOR4)
    )
  • for_each( c.begin(), c.end(), Process )

29
Issue Exception Handling
  • Most games never throw
  • Most games never catch
  • Yet, most games enable EH
  • EH adds code to do stack unwinding
  • A little bit of overhead to a lot of code
  • 10 size increase is common
  • 2 ms/frame in worst case

30
Disable Exception Handling
  • Dont throw or catch exceptions
  • Turn off the C EH compiler option
  • For Dinkumware STL
  • Define _HAS_EXCEPTIONS0
  • Write empty _Throw and _Raise_handler see
    stdthrow.cpp and raisehan.cpp in crt folder
  • Add pragma warning(disable 4530)

31
Issue Strings
  • Programmers love strings
  • Love hurts
  • 7000 calls to stricmp in frame loop
  • 1.5 ms/frame
  • Binary search of a string table
  • 2 ms/frame

32
Avoid strings
  • String comparisons dont belong in the frame loop
  • Put strings in an table and compare indices
  • At least optimize the comparison
  • Compare pointers only
  • Prefer strcmp to stricmp

33
Issue Memory Allocation
  • Memory overhead
  • Xbox granularity/overhead is 16/16 bytes
  • Overhead alone is often 1 MB
  • Too many allocations
  • Games commonly do thousands of allocations per
    frame
  • Cost 1-5 ms/frame

34
Hidden Allocations
  • push_back(), insert() and friends typically
    allocate memory
  • String constructors allocate
  • Init-style calls often allocate
  • Temporary objects, particularly string constants
    that convert to string objects

35
Minimize Per-Frame Allocations
  • Use memory-friendly data structures, e.g. arrays,
    vectors
  • Reserve memory in advance
  • Use custom allocators
  • Pool same-size allocations in a single block of
    memory to avoid overhead
  • Use the explicit keyword to avoid hidden
    temporaries
  • Avoid strings

36
Other Tidbits
  • Compiler settings experiment
  • dynamic_cast just say no
  • Constructors performance killers
  • Unused static array space track this
  • Loop unrolling huge wins, sometimes
  • Suspicious comments watch out
  • Immensely slow matrix multiplication

37
Wrap Up
  • Use the Right Tool for the Job
  • The STL is Evil, Sometimes
  • Invent Only What You Need
  • Profile
  • Know Thine Holy Standard
  • Understand Your Options
  • Beware Elegance
  • Inline Judiciously
  • Watch Out For For
  • Disable Exception Handling
  • Avoid Strings
  • Minimize Per-frame Allocations

38
Call to Action Evolve!
  • Pass the rubber chicken
  • Share your C performance mistakes with your
    team
  • Mentor junior programmers
  • So they only make new mistakes
  • Dont stop learning
  • You can never know enough C

39
Questions
  • Fill out your feedback forms
  • Email pkisensee_at_msn.com
  • This presentation www.tantalon.com/pete.htm
Write a Comment
User Comments (0)
About PowerShow.com