Title: KIPA Game Engine Seminars
1. KIPA Game Engine Seminars
Day 15
- Jonathan Blow
- Seoul, Korea
- December 12, 2002
2. Bit Tricks
- Generating Bit Masks
- Is some number a power of two?
- Avoiding if statements (branch prediction)
- Floating-point absolute value
- Floating-point compare
- Floating-point log2
3. Generating Bit Masks
- Suppose we want to mask the low n bits of a
machine word
- We can generate that with a loop
- Show summation equation for the loop
- Identity that lets us do something faster
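The slide does not reproduce the equation, but the loop presumably sums 2^0 + 2^1 + ... + 2^(n-1), and the identity is that this sum equals 2^n - 1. A minimal sketch in C, assuming a 32-bit word; the function names are made up for illustration:

```c
#include <stdint.h>

/* Loop version: OR in one bit at a time, i.e. the sum of 2^i for i in [0, n). */
uint32_t low_mask_loop(unsigned n) {
    uint32_t mask = 0;
    for (unsigned i = 0; i < n && i < 32; i++) {
        mask |= (uint32_t)1 << i;
    }
    return mask;
}

/* Closed form: 2^n - 1. Shifting a 32-bit value by 32 or more is
   undefined in C, so the full-word case is handled separately. */
uint32_t low_mask_fast(unsigned n) {
    if (n >= 32) return 0xFFFFFFFFu;
    return ((uint32_t)1 << n) - 1u;
}
```

For example, `low_mask_fast(5)` yields `0x1F`, the same value the loop produces after five iterations.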
4. Is some number a power of two?
- The power-of-two will be a single bit somewhere
in the middle of the word
- The power-of-two minus one will be a bit mask
like the ones we just looked at
- ANDing them together will produce 0
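The test described above can be sketched as follows. The extra check for zero is an addition not mentioned on the slide: 0 & (0 - 1) is also 0, but zero is not a power of two.

```c
#include <stdint.h>

/* x has exactly one bit set iff x is nonzero and x & (x - 1) is zero:
   subtracting 1 turns the single set bit into a mask of all lower bits. */
int is_power_of_two(uint32_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}
```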
5. Counting the number of set bits in a machine word
- Slow loop version
- Trick O(num set bits) version
- Discussion of tree version
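The slides only name the three versions. The sketch below shows one common form of each for a 32-bit word, which may differ in detail from what was presented; the function names are illustrative.

```c
#include <stdint.h>

/* Slow loop: examine every bit position. */
int popcount_loop(uint32_t x) {
    int count = 0;
    for (int i = 0; i < 32; i++) {
        count += (int)((x >> i) & 1u);
    }
    return count;
}

/* O(number of set bits): x & (x - 1) clears the lowest set bit. */
int popcount_sparse(uint32_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* Tree version: add adjacent 1-bit fields, then 2-bit, 4-bit, and so on. */
int popcount_tree(uint32_t x) {
    x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x & 0x0F0F0F0Fu) + ((x >> 4) & 0x0F0F0F0Fu);
    x = (x & 0x00FF00FFu) + ((x >> 8) & 0x00FF00FFu);
    x = (x & 0x0000FFFFu) + (x >> 16);
    return (int)x;
}
```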
6. Pentium 4 fireball
- A 16-bit integer unit at the core of the chip
that runs at very high clock speeds
- 32-bit integer operations are pipelined through
the fireball as multi-stage 16-bit operations
- Pipeline is organized for bits to flow from
bottom to top of the word (as with addition and
subtraction)
- Right-shifts require a dependency that goes in
the opposite direction (slower!)
7. How many bits does it take to store this range
of values?
- Application: network or file I/O
- Want ceil(log2(n_max)) assuming the values go
from 0 to n_max
- Slow floating-point versions
- Fast bit-extraction versions
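A minimal integer-only sketch of the bit-counting idea, not necessarily the version shown in the seminar. It assumes the range is inclusive, 0 through n_max, which takes ceil(log2(n_max + 1)) bits; if n_max is meant as an exclusive bound, the slide's ceil(log2(n_max)) applies instead. The function name is made up.

```c
#include <stdint.h>

/* Number of bits needed to store any value in [0, n_max], found by
   locating the highest set bit of n_max. Returns 0 for n_max == 0. */
unsigned bits_needed(uint32_t n_max) {
    unsigned bits = 0;
    while (n_max != 0) {
        bits++;
        n_max >>= 1;
    }
    return bits;
}
```

For instance, values up to 255 fit in 8 bits, while values up to 256 need 9.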
8. Floating-Point log2
- Show slow version
- Fast version utilizing the IEEE-754 format
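The slow version would typically divide log(x) by log(2) using the math library. One way the fast version might look, assuming `float` is an IEEE-754 single (1 sign bit, 8 exponent bits biased by 127, 23 mantissa bits) and that x is positive and normalized; this gives floor(log2(x)) only, and the function name is illustrative:

```c
#include <stdint.h>
#include <string.h>

/* floor(log2(x)) for positive, normalized x, read from the exponent field. */
int float_log2_floor(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* well-defined type pun */
    return (int)((bits >> 23) & 0xFFu) - 127; /* remove the exponent bias */
}
```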
9. Fast absolute value
- Utilizing IEEE-754 floating point format
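In the IEEE-754 format the sign is the top bit, so the absolute value can be taken by clearing it. A sketch assuming `float` is a 32-bit IEEE-754 single; the name `fast_fabs` is made up here:

```c
#include <stdint.h>
#include <string.h>

/* Absolute value by clearing the sign bit (bit 31). */
float fast_fabs(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```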
10. Fast floating-point compare
- Description of how x86 machines compare floating
point numbers
- Get at least one of them on the stack
- Perform fcomp instruction
- Load the floating-point status word
- Bit-mask it to see if the desired field is set
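The slides describe the slow x87 path but not the fast alternative. One well-known possibility, shown here purely as an assumption about what was meant, is to compare the raw bit patterns as integers. For IEEE-754 singles that are non-negative and not NaN, the bit patterns sort in the same order as the values.

```c
#include <stdint.h>
#include <string.h>

/* a < b for non-negative, non-NaN floats, using an integer compare.
   Negative inputs would need extra handling that is not shown here. */
int float_less_nonneg(float a, float b) {
    uint32_t ia, ib;
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);
    return ia < ib;
}
```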
11. Decision-making without branching
- (And without writing in assembly language, to use
instructions like CMOV)
- Build a mask based on whether some intermediate
result is negative or not
- Use that to mask values and add them, or whatever
you want
- Examples
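The examples themselves are not in the slides. Two typical uses of the sign mask are sketched below, with made-up function names; `branchless_min` assumes that a - b does not overflow.

```c
#include <stdint.h>

/* All ones (-1) if x is negative, zero otherwise. The shift is done on
   an unsigned value so the result is fully defined in C. */
int32_t negative_mask(int32_t x) {
    return -(int32_t)((uint32_t)x >> 31);
}

/* min(a, b): add (a - b) to b only when a - b is negative. */
int32_t branchless_min(int32_t a, int32_t b) {
    int32_t diff = a - b;
    return b + (diff & negative_mask(diff));
}

/* Clamp negative values to zero by masking them away. */
int32_t clamp_to_zero(int32_t x) {
    return x & ~negative_mask(x);
}
```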
12. Collision Detection
- Speedbox and Schnitzel as alternatives to the
raycast used to prevent tunneling
13. Collision Detection
- Don't forget to optimize mainly for the expected
case!
- To miss a lot, or to hit a lot?
- Example of Shock Force and the early hit test
- We expect to miss usually!
- So the early hit test was not so effective
14. Collision detection
- More Shock Force examples
- Hierarchy of tests: bounding sphere, OBB, simple
plane divide, BSP (hard case)
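As an illustration of the cheapest tier only, here is a sketch of a bounding-sphere rejection test. It is not taken from Shock Force, and the `Sphere` type is hypothetical. When misses are the expected case, most pairs exit here and the more expensive OBB, plane, and BSP tests never run.

```c
typedef struct {
    float x, y, z;
    float radius;
} Sphere;

/* Returns 0 on a certain miss, 1 if the spheres touch or overlap and
   the next, more expensive test in the hierarchy should run.
   Squared distances are compared to avoid a square root. */
int spheres_overlap(const Sphere *a, const Sphere *b) {
    float dx = a->x - b->x;
    float dy = a->y - b->y;
    float dz = a->z - b->z;
    float reach = a->radius + b->radius;
    return dx * dx + dy * dy + dz * dz <= reach * reach;
}
```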
15. Profiling
- Motivation
- You can't optimize unless you profile. For some
reason some people think they can; they're wrong.
- Demo of sample app
- Goals
- Know where the overall CPU time is being spent
- May depend on which kind of behavior is
happening!
- Know which routines are stable and which ones are
not
16. Profiling
- Example of getting the current time on Windows
- At different accuracy levels
- Description of how this is slow, and why
- Too slow to call very often in code!
17. Profiling (2)
- Using the rdtsc instruction
- Converting this to realtime units by calling
QueryPerformanceCounter once per frame
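The conversion can be sketched as plain arithmetic on two pairs of counter readings taken one frame apart. The `TimeSample` struct and function name are hypothetical; in a real Windows build the `tsc` field would come from rdtsc, and `qpc` and `qpc_frequency` from QueryPerformanceCounter and QueryPerformanceFrequency.

```c
#include <stdint.h>

typedef struct {
    uint64_t tsc; /* cycle counter reading, e.g. from rdtsc */
    uint64_t qpc; /* QueryPerformanceCounter reading at the same moment */
} TimeSample;

/* Seconds per cycle over the interval between two samples. Multiply any
   cycle count measured in that interval by this to get seconds. */
double seconds_per_cycle(TimeSample prev, TimeSample now,
                         uint64_t qpc_frequency) {
    uint64_t tsc_delta = now.tsc - prev.tsc;
    uint64_t qpc_delta = now.qpc - prev.qpc;
    if (tsc_delta == 0 || qpc_frequency == 0) {
        return 0.0; /* no usable measurement */
    }
    double seconds = (double)qpc_delta / (double)qpc_frequency;
    return seconds / (double)tsc_delta;
}
```

Sampling the slow timer only once per frame keeps its cost out of the per-function measurements.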
18. Profiling (3)
- Define macros that put rdtsc calls into preambles
and postambles for functions
- Measure and categorize CPU time this way
- Measure self time and hierarchical time
- Code review of macros / constructors
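The macros reviewed in the seminar are not in the slides. The sketch below shows one way self time and hierarchical time can be separated with an explicit zone stack. All names are hypothetical, and a settable counter stands in for rdtsc so that the example is portable and deterministic.

```c
#include <stdint.h>

/* Stand-in for rdtsc; a real build might read the cycle counter here. */
static uint64_t g_fake_cycles = 0;
uint64_t read_cycles(void) { return g_fake_cycles; }

typedef struct {
    const char *name;
    uint64_t self_cycles;  /* time in this zone, excluding child zones */
    uint64_t total_cycles; /* hierarchical time, including child zones */
} Zone;

typedef struct {
    Zone *zone;
    uint64_t start;
    uint64_t child_cycles;
} Frame;

#define MAX_DEPTH 64 /* nesting deeper than this is not checked */
static Frame g_stack[MAX_DEPTH];
static int g_depth = 0;

void zone_enter(Zone *z) {
    Frame *f = &g_stack[g_depth++];
    f->zone = z;
    f->start = read_cycles();
    f->child_cycles = 0;
}

void zone_leave(void) {
    Frame *f = &g_stack[--g_depth];
    uint64_t elapsed = read_cycles() - f->start;
    f->zone->total_cycles += elapsed;
    f->zone->self_cycles += elapsed - f->child_cycles;
    if (g_depth > 0) {
        g_stack[g_depth - 1].child_cycles += elapsed; /* charge the parent */
    }
}

/* Preamble and postamble to place at the top and bottom of a function. */
#define PROFILE_BEGIN(zone) zone_enter(&(zone))
#define PROFILE_END() zone_leave()
```

In C++ the same pair is often wrapped in a constructor and destructor so the postamble runs on every exit path.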
19. Problem with rdtsc
- There's this SpeedStep thing on Intel laptops
- Changes the CPU's clock speed based on performance
/ temperature demands
- Does not adjust rdtsc to compensate
- May spread beyond laptops in the future
- Power consumption of CPUs is becoming an
important concern for businesses
20. We can detect if rdtsc is screwing up profiling
data
- But we can't fix the profiling data
- Solution: just draw a big warning on the screen
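The slides do not say how the detection works. One plausible approach, given here as an assumption, is to compare the cycles-per-second rate measured this frame against a baseline and flag any frame where it drifts beyond a tolerance. The function name and parameters are made up.

```c
/* Returns 1 if the measured cycle rate differs from the baseline by more
   than the given fraction (e.g. 0.05 for 5%), in which case the frame's
   profiling data should be treated as unreliable. */
int tsc_rate_changed(double baseline_cycles_per_sec,
                     double current_cycles_per_sec,
                     double tolerance) {
    double low = baseline_cycles_per_sec * (1.0 - tolerance);
    double high = baseline_cycles_per_sec * (1.0 + tolerance);
    return current_cycles_per_sec < low || current_cycles_per_sec > high;
}
```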
21. Division of Profiler
- Low-Level Profiler
- High-Level Profiler
22. Walkthrough of first demo app
- How it uses the macros
- How it collects and draws the profiling data
23. Measuring variance of profiling data
- To figure out how stable each function is
- Draw which functions are hot in the realtime
display
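The slides do not specify how the variance is computed. A sketch using Welford's running update, which is a substitute technique chosen here because it needs no stored history; a realtime display might prefer a windowed or exponentially weighted variant. One `RunningStats` (a hypothetical type) would be kept per profiled function.

```c
typedef struct {
    unsigned long count;
    double mean;
    double m2; /* sum of squared deviations from the running mean */
} RunningStats;

/* Fold one per-frame timing sample into the running statistics. */
void stats_add(RunningStats *s, double x) {
    s->count++;
    double delta = x - s->mean;
    s->mean += delta / (double)s->count;
    s->m2 += delta * (x - s->mean);
}

/* Population variance of the samples seen so far. */
double stats_variance(const RunningStats *s) {
    return s->count ? s->m2 / (double)s->count : 0.0;
}
```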
24. Behaviors
- We would like some better analysis of what the
different behaviors are for our program
- Just eyeing the results is not very scientific
- Examples of different behaviors
- Fill rate limited, AI limited, etc
25. Batch Profiling vs Interactive Profiling
- Batch profiling averages a bunch of data together
over a session
- Maybe it provides a way to peek at individual
samples but the processing is never very
convenient
- Interactive profiling is about seeing results as
soon as they happen
- But interactive profilers are usually hacked
together
- What if we made a good one?
26. Want to detect and analyze specific behaviors
- But without preconceived ideas of what they might
be
- Treat incoming frames of profiling data as
vectors, and cluster them
- Description of k-means clustering
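A minimal sketch of one k-means iteration, treating each frame as a `dim`-element vector (for example, one timing per profiled zone). Arrays are stored flat, the names and size limits are illustrative, and initial center selection is left out.

```c
#include <stddef.h>

#define MAX_K 8
#define MAX_DIM 16

static double dist_sq(const double *a, const double *b, size_t dim) {
    double sum = 0.0;
    for (size_t d = 0; d < dim; d++) {
        sum += (a[d] - b[d]) * (a[d] - b[d]);
    }
    return sum;
}

/* Index of the center closest to point p. */
size_t nearest_center(const double *centers, size_t k, size_t dim,
                      const double *p) {
    size_t best = 0;
    double best_d = dist_sq(centers, p, dim);
    for (size_t c = 1; c < k; c++) {
        double d = dist_sq(centers + c * dim, p, dim);
        if (d < best_d) { best_d = d; best = c; }
    }
    return best;
}

/* One iteration: assign every point to its nearest center, then move
   each center to the mean of the points assigned to it. */
void kmeans_step(const double *points, size_t n,
                 double *centers, size_t k, size_t dim) {
    double sums[MAX_K * MAX_DIM] = {0};
    size_t counts[MAX_K] = {0};
    if (k > MAX_K || dim > MAX_DIM) return;
    for (size_t i = 0; i < n; i++) {
        const double *p = points + i * dim;
        size_t c = nearest_center(centers, k, dim, p);
        counts[c]++;
        for (size_t d = 0; d < dim; d++) sums[c * dim + d] += p[d];
    }
    for (size_t c = 0; c < k; c++) {
        if (counts[c] == 0) continue; /* empty cluster stays where it is */
        for (size_t d = 0; d < dim; d++) {
            centers[c * dim + d] = sums[c * dim + d] / (double)counts[c];
        }
    }
}
```

The step is repeated until the assignments stop changing. Note that every iteration walks the whole data set.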
27. Clustering algorithms tend to be pretty slow
- And they require batch data to process
- k-means needs random access to the input!
- Online k-means
- Faster, non-batch. But quality?
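One common formulation of online k-means, which may not be the exact variant presented: each incoming sample nudges only its nearest center, with a step size of 1/count so the center tracks the running mean of the samples it has absorbed. Names are illustrative and arrays are flat.

```c
#include <stddef.h>

/* Process one sample as it arrives; returns the index of the center
   that was updated. counts[] holds how many samples each center has
   absorbed so far and should start at zero. */
size_t online_kmeans_update(double *centers, unsigned long *counts,
                            size_t k, size_t dim, const double *sample) {
    size_t best = 0;
    double best_d = -1.0;
    for (size_t c = 0; c < k; c++) {
        double d = 0.0;
        for (size_t i = 0; i < dim; i++) {
            double diff = sample[i] - centers[c * dim + i];
            d += diff * diff;
        }
        if (best_d < 0.0 || d < best_d) { best_d = d; best = c; }
    }
    counts[best]++;
    double rate = 1.0 / (double)counts[best];
    for (size_t i = 0; i < dim; i++) {
        centers[best * dim + i] += rate * (sample[i] - centers[best * dim + i]);
    }
    return best;
}
```

No sample is ever revisited, which is why the result depends on arrival order and the quality question on the slide arises.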
28. Self-Organizing Map
- Kohonen Self-Organizing Map
- Description of the algorithm
- Much like online k-means
- But with coherence in a separate space
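A sketch of a single SOM training step on a one-dimensional map, under several simplifying assumptions: the map is a row of nodes rather than the usual 2D grid, the neighborhood falls off linearly rather than as a Gaussian, and the learning rate and radius are passed in rather than decayed over time. The "separate space" is the grid itself: neighbors are chosen by grid index, not by distance in data space.

```c
#include <stddef.h>

/* Move the best-matching node, and its grid neighbors within `radius`,
   toward the sample. Returns the index of the best-matching node. */
size_t som_update(double *weights, size_t nodes, size_t dim,
                  const double *sample, double rate, size_t radius) {
    /* Find the best-matching unit in data space, as in online k-means. */
    size_t bmu = 0;
    double best_d = -1.0;
    for (size_t j = 0; j < nodes; j++) {
        double d = 0.0;
        for (size_t i = 0; i < dim; i++) {
            double diff = sample[i] - weights[j * dim + i];
            d += diff * diff;
        }
        if (best_d < 0.0 || d < best_d) { best_d = d; bmu = j; }
    }
    /* Update nodes that are close to the winner on the grid. */
    for (size_t j = 0; j < nodes; j++) {
        size_t grid_dist = j > bmu ? j - bmu : bmu - j;
        if (grid_dist > radius) continue;
        double h = 1.0 - (double)grid_dist / (double)(radius + 1);
        for (size_t i = 0; i < dim; i++) {
            weights[j * dim + i] += rate * h * (sample[i] - weights[j * dim + i]);
        }
    }
    return bmu;
}
```

With a radius of zero this reduces to an online k-means step with a fixed learning rate.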
29. Demo of SOM-enabled Profiling Tool
- Visualizations are still early
- Hopefully they will mature into something truly
useful (people in other visualization fields like
SOMs, so hopes are high)
30. Discussions of changes made to SOM to support
online clustering