KIPA Game Engine Seminars

1
KIPA Game Engine Seminars
Day 15
  • Jonathan Blow
  • Seoul, Korea
  • December 12, 2002

2
Bit Tricks
  • Generating Bit Masks
  • Is some number a power of two?
  • Avoiding if statements (branch prediction)
  • Floating-point absolute value
  • Floating-point compare
  • Floating-point log2

3
Generating Bit Masks
  • Suppose we want to mask the low n bits of a
    machine word
  • We can generate that with a loop
  • Show summation equation for the loop
  • Identity that lets us do something faster
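
The loop adds up 2^0 + 2^1 + ... + 2^(n-1), and that sum equals 2^n - 1, which is the identity the last bullet refers to. A minimal sketch of both versions follows, assuming a 32-bit word; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Loop version: set bits 0..n-1 one at a time (the sum of 2^i for i in [0, n)).
std::uint32_t low_mask_loop(unsigned n) {
    assert(n <= 32);
    std::uint32_t mask = 0;
    for (unsigned i = 0; i < n; ++i) mask |= (std::uint32_t{1} << i);
    return mask;
}

// Identity version: 2^0 + 2^1 + ... + 2^(n-1) == 2^n - 1.
// Shifting a 32-bit value by 32 is undefined in C++, so n == 32 is handled separately.
std::uint32_t low_mask(unsigned n) {
    assert(n <= 32);
    return (n == 32) ? 0xFFFFFFFFu : ((std::uint32_t{1} << n) - 1u);
}
```

The special case for n == 32 is a detail of this sketch, not something stated on the slide.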

4
Is some number a power of two?
  • The power-of-two will be a single bit somewhere
    in the middle of the word
  • The power-of-two minus one will be a bit mask
    like the ones we just looked at
  • ANDing them together will produce 0
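
The test above can be sketched in a few lines. The function name is hypothetical, and the explicit zero check is an addition: zero also yields 0 from the AND but is not a power of two.

```cpp
#include <cstdint>

// True when x has exactly one bit set. x - 1 turns that single bit into 0 and
// every lower bit into 1, so x & (x - 1) is 0. Zero would also pass the AND
// test, so it is excluded explicitly.
bool is_power_of_two(std::uint32_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}
```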

5
Counting the number of set bits in a machine word
  • Slow loop version
  • Trick O(num set bits) version
  • Discussion of tree version
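
The three versions might look like the sketch below. These are the commonly known forms of each approach and are assumed here; the slide does not show the code that was presented, and the function names are illustrative.

```cpp
#include <cstdint>

// Slow loop: examine every bit position.
unsigned popcount_loop(std::uint32_t x) {
    unsigned count = 0;
    for (int i = 0; i < 32; ++i) count += (x >> i) & 1u;
    return count;
}

// One iteration per set bit: x & (x - 1) clears the lowest set bit.
unsigned popcount_sparse(std::uint32_t x) {
    unsigned count = 0;
    while (x) { x &= x - 1; ++count; }
    return count;
}

// Tree version: add adjacent 1-bit fields, then 2-bit, 4-bit, 8-bit and 16-bit fields.
unsigned popcount_tree(std::uint32_t x) {
    x = (x & 0x55555555u) + ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x & 0x0F0F0F0Fu) + ((x >> 4) & 0x0F0F0F0Fu);
    x = (x & 0x00FF00FFu) + ((x >> 8) & 0x00FF00FFu);
    x = (x & 0x0000FFFFu) + (x >> 16);
    return x;
}
```

The tree version always takes five steps regardless of the input, while the sparse version is cheapest when few bits are set.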

6
Pentium 4 fireball
  • A 16-bit integer unit at the core of the chip
    that runs at very high clock speeds
  • 32-bit integer operations are pipelined through
    the fireball as multi-stage 16-bit operations
  • Pipeline is organized for bits to flow from
    bottom to top of the word (as with addition and
    subtraction)
  • Right-shifts require a dependency that goes in
    the opposite direction (slower!)

7
How many bits does it take to store this range
of values?
  • Application: network or file I/O
  • Want ceil(log2(n_max + 1)) bits, assuming the values
    go from 0 to n_max inclusive
  • Slow floating-point versions
  • Fast bit-extraction versions
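
A sketch of both approaches follows, assuming the range is inclusive, so 0..n_max holds n_max + 1 distinct values. The shift loop is one simple way to find the highest set bit and may differ from the extraction trick shown in the talk; the names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Floating-point version: ceil(log2(n_max + 1)).
unsigned bits_needed_float(std::uint32_t n_max) {
    return static_cast<unsigned>(
        std::ceil(std::log2(static_cast<double>(n_max) + 1.0)));
}

// Integer version: position of the highest set bit of n_max, plus one.
// Returns 0 for n_max == 0, since a single possible value needs no bits.
unsigned bits_needed(std::uint32_t n_max) {
    unsigned bits = 0;
    while (n_max) { n_max >>= 1; ++bits; }
    return bits;
}
```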

8
Floating-Point log2
  • Show slow version
  • Fast version utilizing the IEEE-754 format
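
As a sketch of the idea: a 32-bit IEEE-754 float stores a biased exponent in bits 23..30, so the integer part of log2 can be read straight from the bit pattern. The code below assumes a positive, normal, finite input and returns only floor(log2(x)); any refinement using the mantissa is not covered. Function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Slow version: library call followed by floor.
int floor_log2_slow(float x) {
    return static_cast<int>(std::floor(std::log2(x)));
}

// Fast version: read the biased exponent field (bits 23..30) and remove the bias of 127.
// Assumes a 32-bit IEEE-754 float and a positive, normal, finite x.
int floor_log2_fast(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    assert(x > 0.0f && std::isnormal(x));
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // well-defined way to reinterpret the bits
    return static_cast<int>((bits >> 23) & 0xFFu) - 127;
}
```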

9
Fast absolute value
  • Utilizing IEEE-754 floating point format
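
In the IEEE-754 format the sign is the top bit, so clearing it gives the absolute value. A minimal sketch, assuming a 32-bit float (the function name is made up):

```cpp
#include <cstdint>
#include <cstring>

// Absolute value by clearing the sign bit (bit 31) of a 32-bit IEEE-754 float.
float fast_abs(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}
```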

10
Fast floating-point compare
  • Description of how x86 machines compare
    floating-point numbers
  • Get at least one of them on the FPU register stack
  • Perform the fcomp instruction
  • Read the floating-point status word, which holds
    the condition codes
  • Bit-mask it to see if the desired field is set
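
The slide lists the steps of the FPU compare but not the faster replacement. One widely used alternative, which may or may not be the one presented, compares the bit patterns as integers. The sketch below assumes 32-bit IEEE-754 floats and no NaN inputs; the names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Map a float's bit pattern to an unsigned key whose ordering matches the
// float ordering: flip all bits of negative values, set the top bit of the rest.
std::uint32_t float_order_key(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

// Assumes neither input is NaN. Unlike the FPU compare, -0.0f orders below +0.0f.
bool float_less(float a, float b) {
    return float_order_key(a) < float_order_key(b);
}
```

Whether this actually beats the hardware compare depends on the CPU and compiler, so it should be measured rather than assumed.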

11
Decision-making without branching
  • (And without writing in assembly language, to use
    instructions like CMOV)
  • Build a mask based on whether some intermediate
    result is negative or not
  • Use that to mask values and add them, or whatever
    you want
  • Examples
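
One way to build such a mask is sketched below: the sign bit of an intermediate result is turned into all ones or all zeros, and the mask then selects between two values. The helper names and the minimum example are illustrative and not taken from the talk.

```cpp
#include <cstdint>

// All ones (0xFFFFFFFF) if x is negative, all zeros otherwise.
// Built from the sign bit, so it does not rely on arithmetic right shift.
std::uint32_t negative_mask(std::int32_t x) {
    return 0u - (static_cast<std::uint32_t>(x) >> 31);
}

// Returns a if selector is negative, b otherwise, without an if statement.
std::int32_t select_if_negative(std::int32_t selector, std::int32_t a, std::int32_t b) {
    std::uint32_t m = negative_mask(selector);
    return static_cast<std::int32_t>((static_cast<std::uint32_t>(a) & m) |
                                     (static_cast<std::uint32_t>(b) & ~m));
}

// Branchless minimum. Assumes a - b does not overflow.
std::int32_t branchless_min(std::int32_t a, std::int32_t b) {
    return select_if_negative(a - b, a, b);
}
```

A compiler may already emit a conditional move for the plain `a < b ? a : b` form, so the benefit should be checked in a profiler.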

12
Collision Detection
  • Speedbox and Schnitzel as alternatives to the
    'prevent tunneling' raycast

13
Collision Detection
  • Don't forget to optimize mainly for the expected
    case!
  • To miss a lot, or to hit a lot?
  • Example of Shock Force and the early hit test
  • We expect to miss usually!
  • So the early hit test was not so effective

14
Collision detection
  • More Shock Force examples
  • Hierarchy of tests: bounding sphere, OBB, simple
    plane divide, BSP hard case
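
The point of the hierarchy is that the cheapest test runs first and rejects most pairs before the expensive ones are reached. As a sketch of that first stage only, a bounding-sphere overlap test might look like this; the `Sphere` type and function name are assumptions, and the later stages (OBB, plane divide, BSP) are not shown.

```cpp
// Hypothetical bounding sphere: center (x, y, z) and radius.
struct Sphere {
    float x, y, z, radius;
};

// Cheapest stage: compare the squared center distance with the squared sum of
// the radii, which avoids a square root. If this returns false, the more
// expensive tests further down the hierarchy can be skipped.
bool spheres_overlap(const Sphere& a, const Sphere& b) {
    float dx = a.x - b.x;
    float dy = a.y - b.y;
    float dz = a.z - b.z;
    float r = a.radius + b.radius;
    return dx * dx + dy * dy + dz * dz <= r * r;
}
```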

15
Profiling
  • Motivation
  • You can't optimize unless you profile. For some
    reason some people think they can; they're wrong.
  • Demo of sample app
  • Goals
  • Know where the overall CPU is being spent
  • May depend on which kind of behavior is
    happening!
  • Know which routines are stable and which ones are
    not

16
Profiling
  • Example of getting the current time on Windows
  • At different accuracy levels
  • Description of how this is slow, and why
  • Too slow to call very often in code!

17
Profiling (2)
  • Using the rdtsc instruction
  • Converting this to realtime units by calling
    QueryPerformanceCounter once per frame
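
The conversion can be sketched as a once-per-frame calibration: sample both the cycle counter and the wall-clock timer at the start and end of a frame, then derive seconds per cycle. To keep the sketch portable the readings are passed in as plain numbers; in a real build they would come from rdtsc, QueryPerformanceCounter and QueryPerformanceFrequency. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Result of one calibration pass.
struct CycleCalibration {
    double seconds_per_cycle;
};

// tsc0/tsc1: cycle counter readings at the start and end of the frame.
// qpc0/qpc1: wall-clock timer readings taken at the same two points.
// qpc_frequency: timer ticks per second.
CycleCalibration calibrate(std::uint64_t tsc0, std::uint64_t tsc1,
                           std::int64_t qpc0, std::int64_t qpc1,
                           std::int64_t qpc_frequency) {
    assert(tsc1 > tsc0 && qpc1 > qpc0 && qpc_frequency > 0);
    double frame_seconds =
        static_cast<double>(qpc1 - qpc0) / static_cast<double>(qpc_frequency);
    double frame_cycles = static_cast<double>(tsc1 - tsc0);
    return CycleCalibration{frame_seconds / frame_cycles};
}

// Convert a cycle count measured inside the frame to seconds.
double cycles_to_seconds(std::uint64_t cycles, const CycleCalibration& c) {
    return static_cast<double>(cycles) * c.seconds_per_cycle;
}
```

Only the two calibration samples use the slow timer; every other measurement in the frame uses the cheap cycle counter.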

18
Profiling (3)
  • Define macros that put rdtsc calls into preambles
    and postambles for functions
  • Measure and categorize CPU time this way
  • Measure self time and hierarchical time
  • Code review of macros / constructors
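
A minimal sketch of the macro and constructor approach follows. It is not the code reviewed in the talk: the `Zone`, `ScopedZone` and `PROFILE_SCOPE` names are invented, and `read_cycles` returns a plain counter variable where a real build would return the rdtsc value. Recursion and threads are not handled.

```cpp
#include <cstdint>

// Hypothetical cycle source; stands in for rdtsc so the sketch stays portable.
std::uint64_t g_fake_cycles = 0;
std::uint64_t read_cycles() { return g_fake_cycles; }

// Accumulated timings for one named region of code.
struct Zone {
    const char* name;
    std::uint64_t self_cycles;          // time in this zone, excluding child zones
    std::uint64_t hierarchical_cycles;  // time in this zone, including child zones
};

// Constructor acts as the preamble, destructor as the postamble.
struct ScopedZone {
    Zone& zone;
    ScopedZone* parent;
    std::uint64_t start;
    std::uint64_t child_cycles = 0;
    static ScopedZone* current;

    explicit ScopedZone(Zone& z) : zone(z), parent(current), start(read_cycles()) {
        current = this;
    }
    ~ScopedZone() {
        std::uint64_t total = read_cycles() - start;
        zone.hierarchical_cycles += total;
        zone.self_cycles += total - child_cycles;
        if (parent) parent->child_cycles += total;  // charge our time to the caller
        current = parent;
    }
};
ScopedZone* ScopedZone::current = nullptr;

// Declares a uniquely named ScopedZone for the enclosing block.
#define PROFILE_CONCAT2(a, b) a##b
#define PROFILE_CONCAT(a, b) PROFILE_CONCAT2(a, b)
#define PROFILE_SCOPE(zone) ScopedZone PROFILE_CONCAT(profile_scope_, __LINE__)(zone)
```

A function would declare a `Zone` once (for example `Zone update_zone = {"update", 0, 0};`) and place `PROFILE_SCOPE(update_zone);` at the top of its body.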

19
Problem with rdtsc
  • There's this SpeedStep thing on Intel laptops
  • Changes the CPU's clock speed based on performance
    / temperature demands
  • Does not adjust rdtsc to compensate
  • May spread beyond laptops in the future
  • Power consumption of CPUs is becoming an
    important concern for businesses

20
We can detect if rdtsc is screwing up profiling
data
  • But we can't fix the profiling data
  • Solution: just draw a big warning on the screen
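
The slide does not say how the detection works. One plausible approach, assumed here, is to compare each frame's measured cycles per second against a baseline and flag the frame when the ratio drifts too far. The function name and the tolerance parameter are hypothetical.

```cpp
#include <cmath>

// Returns true when this frame's measured cycles-per-second differs from the
// baseline by more than `tolerance`, given as a fraction (0.05 means 5%).
// The caller would respond by drawing the on-screen warning.
bool clock_rate_changed(double baseline_cycles_per_second,
                        double frame_cycles_per_second,
                        double tolerance) {
    double relative =
        std::fabs(frame_cycles_per_second - baseline_cycles_per_second) /
        baseline_cycles_per_second;
    return relative > tolerance;
}
```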

21
Division of Profiler
  • Low-Level Profiler
  • High-Level Profiler

22
Walkthrough of first demo app
  • How it uses the macros
  • How it collects and draws the profiling data

23
Measuring variance of profiling data
  • To figure out how stable each function is
  • Draw which functions are hot in the realtime
    display
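
Per-function variance can be tracked without storing past frames. The sketch below uses Welford's running-variance update, which is an assumption; the talk may have used a different estimator, such as an exponentially weighted one that favors recent frames. The struct name is made up.

```cpp
#include <cstddef>

// Running mean and variance of one function's per-frame time.
struct RunningStats {
    std::size_t count = 0;
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the running mean

    // Welford's update: numerically stable, one sample at a time.
    void add(double sample) {
        ++count;
        double delta = sample - mean;
        mean += delta / static_cast<double>(count);
        m2 += delta * (sample - mean);
    }

    // Population variance of the samples seen so far (0 if there are none).
    double variance() const {
        return count ? m2 / static_cast<double>(count) : 0.0;
    }
};
```

A function whose variance stays near zero is stable; a large variance marks a routine whose cost changes from frame to frame.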

24
Behaviors
  • We would like some better analysis of what the
    different behaviors are for our program
  • Just eyeing the results is not very scientific
  • Examples of different behaviors
  • Fill-rate limited, AI limited, etc.

25
Batch Profiling vs Interactive Profiling
  • Batch profiling averages a bunch of data together
    over a session
  • Maybe it provides a way to peek at individual
    samples but the processing is never very
    convenient
  • Interactive profiling is about seeing results as
    soon as they happen
  • But interactive profilers are usually hacked
    together
  • What if we made a good one?

26
Want to detect and analyze specific behaviors
  • But without preconceived ideas of what they might
    be
  • Treat incoming frames of profiling data as
    vectors, and cluster them
  • Description of k-means clustering
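
As a sketch of the standard algorithm: each frame of profiling data is a vector (one entry per profiled zone, say), and one k-means iteration assigns every frame to its nearest center and then moves each center to the mean of its frames. The type alias and function names below are illustrative; choosing the initial centers and deciding when to stop are left out.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

double squared_distance(const Vec& a, const Vec& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

std::size_t nearest_center(const Vec& point, const std::vector<Vec>& centers) {
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::max();
    for (std::size_t k = 0; k < centers.size(); ++k) {
        double d = squared_distance(point, centers[k]);
        if (d < best_dist) { best_dist = d; best = k; }
    }
    return best;
}

// One batch iteration: assign every frame to its nearest center, then move
// each center to the mean of the frames assigned to it.
void kmeans_iteration(const std::vector<Vec>& frames, std::vector<Vec>& centers) {
    if (centers.empty()) return;
    std::vector<Vec> sums(centers.size(), Vec(centers[0].size(), 0.0));
    std::vector<std::size_t> counts(centers.size(), 0);
    for (const Vec& f : frames) {
        std::size_t k = nearest_center(f, centers);
        for (std::size_t i = 0; i < f.size(); ++i) sums[k][i] += f[i];
        ++counts[k];
    }
    for (std::size_t k = 0; k < centers.size(); ++k) {
        if (counts[k] == 0) continue;  // leave empty clusters where they are
        for (std::size_t i = 0; i < centers[k].size(); ++i)
            centers[k][i] = sums[k][i] / static_cast<double>(counts[k]);
    }
}
```

The iteration is repeated until the centers stop moving, and every pass walks the whole set of frames again.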

27
Clustering algorithms tend to be pretty slow
  • And they require batch data to process
  • k-means needs random access to the input!
  • Online k-means
  • Faster, non-batch. But quality?
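
The online variant can be sketched as a single update per incoming frame: find the nearest center and move it a fraction of the way toward the frame. The fixed `rate` parameter is an assumption; implementations often shrink it over time. The names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

// Online k-means step: only the nearest center moves, by `rate` (0..1) of the
// distance toward the new frame. Returns the index of the center that moved.
std::size_t online_kmeans_update(const Vec& frame, std::vector<Vec>& centers, double rate) {
    assert(!centers.empty());
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::max();
    for (std::size_t k = 0; k < centers.size(); ++k) {
        double d = 0.0;
        for (std::size_t i = 0; i < frame.size(); ++i) {
            double diff = frame[i] - centers[k][i];
            d += diff * diff;
        }
        if (d < best_dist) { best_dist = d; best = k; }
    }
    for (std::size_t i = 0; i < frame.size(); ++i)
        centers[best][i] += rate * (frame[i] - centers[best][i]);
    return best;
}
```

Each frame is seen once and then discarded, which removes the need for stored batch data; the result depends on the order in which frames arrive.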

28
Self-Organizing Map
  • Kohonen Self-Organizing Map
  • Description of the algorithm
  • Much like online k-means
  • But with coherence in a separate space
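
A sketch of one SOM update step is shown below. The winning node is found in data space, as in online k-means, but the update also pulls the winner's neighbors toward the frame, weighted by their distance from the winner in the separate map space. To keep it short the map is a 1-D line of nodes with a Gaussian neighborhood; both choices are assumptions, as are the names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

// Nodes are laid out on a line; a node's index is its position in map space.
// Returns the index of the best-matching node.
std::size_t som_update(const Vec& frame, std::vector<Vec>& nodes,
                       double rate, double radius) {
    assert(!nodes.empty() && radius > 0.0);

    // 1. Find the winner: the node closest to the frame in data space.
    std::size_t winner = 0;
    double best = std::numeric_limits<double>::max();
    for (std::size_t k = 0; k < nodes.size(); ++k) {
        double d = 0.0;
        for (std::size_t i = 0; i < frame.size(); ++i) {
            double diff = frame[i] - nodes[k][i];
            d += diff * diff;
        }
        if (d < best) { best = d; winner = k; }
    }

    // 2. Move every node toward the frame, scaled by a Gaussian of its
    //    distance from the winner measured in map space, not data space.
    for (std::size_t k = 0; k < nodes.size(); ++k) {
        double map_dist = static_cast<double>(k) - static_cast<double>(winner);
        double influence = std::exp(-(map_dist * map_dist) / (2.0 * radius * radius));
        for (std::size_t i = 0; i < frame.size(); ++i)
            nodes[k][i] += rate * influence * (frame[i] - nodes[k][i]);
    }
    return winner;
}
```

Because neighbors in the map move together, nearby nodes end up representing similar frames, which is what makes the map useful for visualization.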

29
Demo of SOM-enabled Profiling Tool
  • Visualizations are still early
  • Hopefully they will mature into something truly
    useful (people in other visualization fields like
    SOMs, so hopes are high)

30
Discussions of changes made to SOM to support
online clustering