Title: Cross Platform Development Best Practices
2 Cross Platform Development Best Practices
- Matt Lee, Kev Gee
- Microsoft Game Technology Group
3 Agenda
- Code Considerations
- CPU Considerations
- GPU Considerations
- IO Considerations
- Content Considerations
- Data Build System
- Geometry Formats
- Texture Formats
- Shaders
- Audio Considerations
4 Compiler Comparison
- VS 2005 front end used for both platforms
- Preprocessor benefits both platforms
- Debugger experience is the same
- Full VS 2005 IDE support coming
- Xbox 360 optimizing back end added with XDK install
- Single solution / MSBuild file can target both platforms
5 PC CPUs
- Intel Pentium D / AMD Athlon64 X2
- Programming Model
- 2 cores running at around 3.20 GHz
- 12-KB Execution trace cache
- 16-KB L1 cache, 1 MB L2 cache
- Deep Branch Prediction
- Dynamic data flow analysis
- Speculative Execution
- Little-endian byte ordering
- SIMD instructions
- Quad Core announced for early 2007
6 Xbox 360 Custom CPU
- Custom IBM Processor
- 3 64-bit PowerPC cores running at 3.2 GHz
- Two hardware threads per core
- 32-KB L1 instruction cache and 32-KB L1 data cache, per core
- Shared 1-MB L2 cache
- 128-byte cache lines on all caches
- Big-endian byte ordering
- VMX 128 SIMD
- Lots of Registers
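The PC is little-endian while the Xbox 360 CPU is big-endian, so binary data shared between the two has to be byte-swapped at some point, ideally offline in the data build. A minimal portable C++ sketch of the swap itself follows. The function names are hypothetical, and a compiler intrinsic such as MSVC's `_byteswap_ulong` may generate tighter code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: reverse the byte order of 16- and 32-bit values.
inline std::uint16_t SwapU16(std::uint16_t v)
{
    return static_cast<std::uint16_t>((v >> 8) | (v << 8));
}

inline std::uint32_t SwapU32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Floats are swapped through their bit pattern; memcpy avoids the
// strict-aliasing problems a pointer cast would introduce.
inline float SwapF32(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits = SwapU32(bits);
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Swapping twice restores the original value, which makes round-trip tests in the build pipeline cheap.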
7 Performance Tools
- Profiling approaches are very similar between PC and Xbox 360
- PIX for Xbox 360 and PIX for Windows
- Being developed by the same team now
- Use instrumented tools on Xbox 360
- XbPerfView / Tracedump
- Xbox 360 does not have a sampling profiler yet
- Use PC profiling tools
- Intel VTune / AMD CodeAnalyst / VS Team System Profiler
- Attend the Performance Hands-On training!
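With instrumented profiling, the code itself records when named regions start and stop. The sketch below illustrates the idea with a hypothetical RAII `ScopedZone` built on `std::chrono`. It is not the XbPerfView/Tracedump or PIX API, which capture far more detail.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical store: accumulated microseconds per named zone.
inline std::unordered_map<std::string, std::int64_t>& ZoneTotals()
{
    static std::unordered_map<std::string, std::int64_t> totals;
    return totals;
}

// Records the time between construction and destruction of the object,
// i.e. the time spent in the enclosing scope.
class ScopedZone
{
public:
    explicit ScopedZone(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}

    ~ScopedZone()
    {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        ZoneTotals()[name_] +=
            std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
    }

    ScopedZone(const ScopedZone&) = delete;
    ScopedZone& operator=(const ScopedZone&) = delete;

private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};
```

Typical use is a `ScopedZone zone("Physics");` line at the top of a function. Because every zone costs a clock read, instrument coarse regions first and refine toward the hot spots.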
8 Focus Your Efforts
- Use performance tools to guide work
- Areas where we have seen platform-specific efforts reap rewards
- Single Data Pass engine design
- High Frequency Game API Layers
- Use your profiler tools to target the hot spots
- Math Library - Bespoke vs XGMath vs D3DXMath
9 Impact on Code Design
- Designing cross-platform APIs
- Use of virtual functions
- Parameter passing mechanisms
- Pass by reference vs. pass by value
- Typedef vector types and intrinsics
- Math Library Design Case Study
- Use of inlining
10 Use of Virtual Functions
- Be careful when using virtual functions to hide platform differences
- Virtual function performance on Xbox 360
- Adds a branch instruction which is always mispredicted!
- Compiler is limited in optimizing these
- Make a concrete implementation for Xbox 360
- Avoid virtual functions in inner loops
11 Cross Platform Render Example
12 Cross Platform Render Example (ctd.)

    class IRenderSystem
    {
    public:
    #if !defined(_XBOX)
        virtual void Draw() = 0;
    #else
        void Draw();
    #endif
    };

    #if defined(_XBOX)
    void IRenderSystem::Draw()
    {
        // 360 Implementation
    }
    #endif

- D3D9 / D3D10 implementations subclass for specialization
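Here is a self-contained variant of the render-system pattern, assuming `_XBOX` is defined only for Xbox 360 builds. The `D3D9RenderSystem` and `D3D10RenderSystem` classes and the `int` return value of `Draw()` are hypothetical. They are there only so the example can run and be checked on non-`_XBOX` (Windows-style) builds.

```cpp
#include <cassert>

class IRenderSystem
{
public:
#if !defined(_XBOX)
    virtual ~IRenderSystem() {}
    virtual int Draw() = 0;          // Windows: dispatch to D3D9 or D3D10
#else
    int Draw() { return 360; }       // Xbox 360: one concrete, inlinable path
#endif
};

#if !defined(_XBOX)
// Hypothetical Windows back ends, picked at run time (e.g. from device caps).
class D3D9RenderSystem : public IRenderSystem
{
public:
    int Draw() override { return 9; }
};

class D3D10RenderSystem : public IRenderSystem
{
public:
    int Draw() override { return 10; }
};
#endif

// Engine code is written once against IRenderSystem. On Xbox 360 this is a
// direct call the compiler can inline; on Windows it is a virtual call.
inline int RenderFrame(IRenderSystem& rs)
{
    return rs.Draw();
}
```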
13 Beware Big Constructors
- Ctors can dominate execution time
- Ctors are often hidden from the casual observer
- Copy ctors run when objects are added to containers
- Arrays of C++ objects run a ctor for every element
- Overloaded operators may construct temporaries
- Consider: should the ctor initialize data?
- Example: matrix class zeroing all data
- Prefer array initialization
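The following sketch shows one way to act on the "should the ctor initialize data?" question, using a hypothetical `Matrix44`. The type has no user-written constructor, so it stays trivially default-constructible. Values are set explicitly with aggregate (array) initialization only where they are actually needed.

```cpp
#include <cassert>
#include <type_traits>

// No user-written constructor: "Matrix44 pool[1024];" does no run-time work,
// so nothing is zeroed behind the caller's back.
struct Matrix44
{
    float m[4][4];
};

static_assert(std::is_trivially_default_constructible<Matrix44>::value,
              "Matrix44 should have a no-op default constructor");

// When values are needed, state them with array initialization.
inline Matrix44 MakeIdentity()
{
    Matrix44 r = {{ {1.0f, 0.0f, 0.0f, 0.0f},
                    {0.0f, 1.0f, 0.0f, 0.0f},
                    {0.0f, 0.0f, 1.0f, 0.0f},
                    {0.0f, 0.0f, 0.0f, 1.0f} }};
    return r;
}
```

The trade-off is that uninitialized matrices are now possible, so code reviews and debug-build fill patterns have to catch reads before writes.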
14 Inlining
- Careful inlining is, in general, a Good Thing
- Plan to spend time ensuring the compiler is inlining the right stuff
- Use perf tools such as VTune / Trace Recorder
- Try the "inline any suitable" option (/Ob2)
- Enable link-time code generation (/GL, /LTCG)
- Consider profile-guided optimization
- Use __forceinline only where necessary
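When a profiler shows that a tiny hot helper is not being inlined, you can force inlining through a portability macro. The `FORCE_INLINE` name below is hypothetical. `__forceinline` is the MSVC spelling, and `__attribute__((always_inline))` is the GCC/Clang one.

```cpp
#include <cassert>

#if defined(_MSC_VER)
    #define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
    #define FORCE_INLINE inline __attribute__((always_inline))
#else
    #define FORCE_INLINE inline
#endif

// Reserve forced inlining for small, hot helpers the compiler declined to
// inline on its own; overuse bloats code and can hurt the instruction cache.
FORCE_INLINE float Lerp(float a, float b, float t)
{
    return a + (b - a) * t;
}
```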
15 Consider Passing Native Types by Value
- Xbox 360 has large registers
- So does native 64-bit PC
- Pass and return these types by value
- int, __int64, float
- Consider these types if targeting SSE / VMX
- __m128 / __vector4, XMVECTOR, XMMATRIX
- Pass structs by pointer or reference
- Help the compiler using __restrict
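This sketch applies the parameter-passing guidance to a hypothetical `Particle` struct. Scalars go by value and structs by pointer. `__restrict` tells the compiler that the input and output arrays do not overlap.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 16-byte particle; passed by pointer rather than copied.
struct Particle
{
    float x, y, z, w;
};

// Native scalar types: pass and return by value so they stay in registers.
inline float Scale(float v, float s)
{
    return v * s;
}

// __restrict (accepted by MSVC, GCC and Clang) promises dst and src never
// alias, so the compiler need not reload src after each store to dst.
inline void ScaleParticles(Particle* __restrict dst,
                           const Particle* __restrict src,
                           std::size_t count, float s)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        dst[i].x = Scale(src[i].x, s);
        dst[i].y = Scale(src[i].y, s);
        dst[i].z = Scale(src[i].z, s);
        dst[i].w = src[i].w;   // w carried through unscaled
    }
}
```

Calling `ScaleParticles` with overlapping arrays would break the `__restrict` promise and is undefined behavior.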
16 Math Library Header (Xbox 360)

    #if defined( _XBOX )
    #include <...>
    #include <...>
    typedef __vector4 XVECTOR;
    typedef const XVECTOR XVECTOR_PARAM;
    typedef XVECTOR& XVECTOR_OUTPARAM;
    #define XMATHAPI inline
    #define VMX128_INTRINSICS
    #endif

- XVECTOR_PARAM: pass by value
17 Math Library Header (Windows)

    #if defined( _WIN32 )
    #include <...>
    typedef __m128 XVECTOR;
    typedef const XVECTOR& XVECTOR_PARAM;
    typedef XVECTOR& XVECTOR_OUTPARAM;
    #define XMATHAPI inline
    #define SSE_INTRINSICS
    #endif

- XVECTOR_PARAM: pass by reference
18 Math Library Function

    XVECTOR XMATHAPI XVectorAdd( XVECTOR_PARAM vA,
                                 XVECTOR_PARAM vB )
    {
    #if defined( VMX128_INTRINSICS )
        return __vaddfp( vA, vB );
    #elif defined( SSE_INTRINSICS )
        return _mm_add_ps( vA, vB );
    #endif
    }
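The headers above only compile with the platform SDKs. The following is a runnable adaptation of the same typedef pattern that uses SSE when the compiler advertises it and plain floats otherwise. The scalar fallback, `XVectorSet` and `XVectorStore` are hypothetical additions, not part of the original library.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
    #include <xmmintrin.h>
    typedef __m128 XVECTOR;
    typedef const XVECTOR& XVECTOR_PARAM;   // Windows-style: pass by reference
    #define SSE_INTRINSICS
#else
    struct XVECTOR { float v[4]; };         // hypothetical scalar fallback
    typedef const XVECTOR& XVECTOR_PARAM;
#endif

#define XMATHAPI inline

XVECTOR XMATHAPI XVectorSet(float x, float y, float z, float w)
{
#if defined(SSE_INTRINSICS)
    return _mm_set_ps(w, z, y, x);          // _mm_set_ps takes lanes high to low
#else
    XVECTOR r = {{x, y, z, w}};
    return r;
#endif
}

XVECTOR XMATHAPI XVectorAdd(XVECTOR_PARAM vA, XVECTOR_PARAM vB)
{
#if defined(SSE_INTRINSICS)
    return _mm_add_ps(vA, vB);
#else
    XVECTOR r;
    for (int i = 0; i < 4; ++i) r.v[i] = vA.v[i] + vB.v[i];
    return r;
#endif
}

inline void XVectorStore(float out[4], XVECTOR_PARAM v)
{
#if defined(SSE_INTRINSICS)
    _mm_storeu_ps(out, v);                  // unaligned store of all four lanes
#else
    for (int i = 0; i < 4; ++i) out[i] = v.v[i];
#endif
}
```

Because callers only ever see `XVECTOR`, `XVECTOR_PARAM` and the `XVector*` functions, a VMX128 branch like the one in the original header can be slotted in without touching game code.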
19 Threading
- Why Multithread?
- Necessary to take full advantage of modern CPUs
- Attend the Multi-threading talk later today
- Covers synchronization primitives and lockless sync methods
- See Also
- Talks from Intel and AMD (GDC 2005 / GDC-E)
- OpenMP C, not C, useful in limited circumstances
- Concur (C++), see http://microsoft.sitestream.com/PDC05/TLN/TLN309_files/Default.htm?nopreload=1&autostart=1
20 D3D Architectural Differences
- D3D9 draw call cost is higher on Windows than on Xbox 360
- 360 is optimized for a single GPU target
- D3D10 improves draw call cost by design on Windows
- Very important to carefully manage the number of batches submitted
- This can have an impact on content creation
- This work will help with 360 performance too
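Batch count matters on both platforms, so a common tactic is to sort draw requests by a render-state key so that compatible draws end up adjacent. The sketch below uses a hypothetical `DrawItem` with a packed `stateKey` and only counts how many state groups remain after sorting. Merging each group into fewer draw calls (instancing, shared buffers) is engine-specific and not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw request: packed render state (shader, textures, ...)
// plus the mesh to draw. Equal keys can share one batch.
struct DrawItem
{
    std::uint32_t stateKey;
    std::uint32_t meshId;
};

// Sorts by state key, then counts runs of equal keys: each run needs one
// state change and, ideally, one batched draw call.
inline std::size_t CountBatchesAfterSort(std::vector<DrawItem>& items)
{
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.stateKey < b.stateKey; });

    std::size_t batches = 0;
    for (std::size_t i = 0; i < items.size(); ++i)
    {
        if (i == 0 || items[i].stateKey != items[i - 1].stateKey)
            ++batches;
    }
    return batches;
}
```

Choosing which state bits go into the high end of the key (shader first, then textures, for example) decides which state changes the sort minimizes.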
21 Agenda
- Code Considerations
- CPU Considerations
- GPU Considerations
- IO Considerations
- Content Considerations
- Data Build System
- Geometry Formats
- Texture Formats
- Shaders
- Audio Considerations
22 PC GPUs
- Wide variety of available Direct3D 9 H/W
- CAPs and Shader Models abstract over feature differences
- GPUs that are approximately equivalent in performance to the Xbox 360 GPU
- ATI X1900 / NVIDIA 7800 GTX
- Shader Model 3.0 support
- Direct3D 10 standardizes the feature set
- H/W scales on performance instead
23 Xbox 360 Custom GPU
- Direct3D 9.0 compatible
- High-Level Shader Language (HLSL) 3.0 support
- 10 MB Embedded DRAM
- Frame buffer with 256 GB/sec bandwidth
- Hardware scaling for display resolution matching
- 48 shader ALUs shared between pixel and vertex shading (unified shaders)
- Up to 8 simultaneous contexts (threads) in flight at once
- Changing shaders or render state can be cheap, since a new context can be started up easily
- Hardware tessellator
- N-patches, triangular patches, and rectangular patches
- For non-continuous / adaptive cases, trade memory for this feature on PC systems
24 Explicit Resolve Control
- Copies surface data from EDRAM to a texture in system memory
- Required for render-to-texture and presentation to the screen
- Can perform MSAA sample averaging or resolve individual samples
- Can perform format conversions and biasing
- Cannot do rescaling or resampling of any kind
- This can impact your Xbox 360 engine design, as it adds an extra step to common operations
25 Agenda
- Code Considerations
- CPU Considerations
- GPU Considerations
- IO Considerations
- Content Considerations
- Geometry
- Textures
- Shaders
- Audio data
26 Use Native File I/O Routines
- Only native routines support key features
- Asynchronous I/O
- Completion routines
- Prefer CreateFile and ReadFile
- Guaranteed to be as fast as or faster than any alternative
- Avoid fopen, fread, C++ iostreams
27 Use Asynchronous File I/O
- File read/write operations block by default
- Async operations allow the game to do other interesting work
- CreateFile with FILE_FLAG_OVERLAPPED
- Use FILE_FLAG_NO_BUFFERING, too
- Guarantees no intermediate buffering
- Use the OVERLAPPED struct to determine when an operation is complete
- See the CreateFile docs for details
28 Memory Mapped File I/O
- Fastest way to load data on Windows
- However, the 32-bit address space is getting tight
- This is a great 64-bit feature add!
- Memory-mapped I/O is not supported on 360
- No HDD-backed virtual memory management system
29 Universal Gaming Controller
- XInput is the same API for Xbox 360 and Windows
- The Microsoft universal controller is a reference design which can be leveraged by other hardware manufacturers
- XP driver available from Windows Update
- Support is built into Xbox 360 and Windows Vista
30 Agenda
- Code Considerations
- CPU Considerations
- GPU Considerations
- IO Considerations
- Content Considerations
- Data Build System
- Geometry Formats
- Texture Formats
- Shaders
- Audio Considerations
31 Data Build System
- Add a data build / processing phase to your production system
- Compile, optimize and compress data according to multiple target platform requirements
- Easier and faster to handle endianness and other format conversions offline
- Data packing process can occur here too
- Invest time in making the build fast
- Artists need to rapidly iterate to make quality content
- Incremental builds can really help reduce the build time
- Try the XNA build tools
- Copies of the XNA Build CTP are available NOW!
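An incremental build comes down to skipping work whose inputs have not changed. Here is a minimal sketch, assuming each asset's inputs can be hashed as a byte string. It uses a hypothetical `BuildCache` keyed by asset and target platform, with the FNV-1a hash. A production system would also hash tool versions and build settings.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// 64-bit FNV-1a: a simple, well-known non-cryptographic hash.
inline std::uint64_t Fnv1a64(const std::string& data)
{
    std::uint64_t h = 14695981039346656037ull;
    for (unsigned char c : data)
    {
        h ^= c;
        h *= 1099511628211ull;
    }
    return h;
}

// Remembers the input hash that produced each (asset, platform) output.
// The platform is part of the key because one source asset compiles to
// different bytes for PC and Xbox 360 (endianness, formats, packing).
class BuildCache
{
public:
    bool NeedsRebuild(const std::string& asset, const std::string& platform,
                      const std::string& sourceBytes) const
    {
        auto it = hashes_.find(asset + "|" + platform);
        return it == hashes_.end() || it->second != Fnv1a64(sourceBytes);
    }

    void Record(const std::string& asset, const std::string& platform,
                const std::string& sourceBytes)
    {
        hashes_[asset + "|" + platform] = Fnv1a64(sourceBytes);
    }

private:
    std::unordered_map<std::string, std::uint64_t> hashes_;
};
```

Persisting the map between runs (for example as a small file next to the build output) is what makes the second build fast.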
32 Geometry Compression
- Offline compression of geometry
- Provides wins across all platforms
- Disk I/O wins as well as GPU wins
- The compression approach is likely to be target-specific
- PC is usually a superset of the consoles in this area
- D3D9 CAPs / limitations to consider
- 16-bit normals: D3DDECLTYPE_FLOAT16_2
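`D3DDECLTYPE_FLOAT16_2` stores two 16-bit floats with the s10e5 layout of IEEE 754 half precision, so the data build has to convert 32-bit normal components offline. The converter below is a simplified sketch with hypothetical names, adequate for values in the unit range. It rounds to nearest with ties away from zero, flushes half-precision denormals to zero, and maps overflow, infinity and NaN to infinity.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

inline std::uint16_t FloatToHalf(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);

    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::int32_t  exp  = static_cast<std::int32_t>((bits >> 23) & 0xFFu) - 127 + 15;
    const std::uint32_t mant = bits & 0x007FFFFFu;

    if (exp <= 0)  return static_cast<std::uint16_t>(sign);            // flush to signed zero
    if (exp >= 31) return static_cast<std::uint16_t>(sign | 0x7C00u);  // infinity

    std::uint32_t half = sign | (static_cast<std::uint32_t>(exp) << 10) | (mant >> 13);
    if (mant & 0x00001000u)   // dropped bits are at least half an ulp: round up;
        ++half;               // a carry into the exponent is still correct
    return static_cast<std::uint16_t>(half);
}

// Two halves, matching the layout of a FLOAT16_2 vertex element.
struct Half2
{
    std::uint16_t x, y;
};

inline Half2 PackHalf2(float x, float y)
{
    Half2 h = { FloatToHalf(x), FloatToHalf(y) };
    return h;
}
```

Converting 8 bytes of `float2` into 4 bytes of `Half2` halves that element's vertex footprint, which is where the disk I/O and GPU bandwidth wins come from.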
33 Compressing Textures
- Wide variety of texture compression tools
- ATI Compressonator
- DirectX SDK DDS tools
- NVIDIA Photoshop DDS Export
- Compression tools for 360 (xgraphics.lib)
- Supports endian swap of texture formats
- Build your own too!
- Make them fit your content.
34 Texture Formats
- DXT / DXGI_FORMAT_BC
- BC = Block Compressed
- Standard DXT formats across all platforms
- DXN / DXGI_FORMAT_BC5 / BC5u
- 2-component format with 8 bits of precision per component
- Great for normal maps
- DXT3A / DXT5A
- Single-component textures made from a DXT3/DXT5 alpha block
- 4 bits of precision
- Xbox 360 / D3D9 only
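Block-compressed formats store each 4x4 texel block in a fixed number of bytes. A block takes 8 bytes for DXT1 (BC1) and for single-channel blocks such as DXT3A/DXT5A, and 16 bytes for DXT3 (BC2), DXT5 (BC3) and DXN (BC5). The helpers below (hypothetical names) compute level and mip-chain sizes from those figures. They ignore any platform-specific alignment, padding or tiling.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kBytesBC1    = 8;   // DXT1
const std::size_t kBytesBC2    = 16;  // DXT3
const std::size_t kBytesBC3    = 16;  // DXT5
const std::size_t kBytesBC5    = 16;  // DXN: two 8-byte single-channel blocks
const std::size_t kBytesSingle = 8;   // DXT3A / DXT5A style single-channel block

// Size of one mip level: dimensions round up to whole 4x4 blocks.
inline std::size_t BlockCompressedSize(std::size_t width, std::size_t height,
                                       std::size_t bytesPerBlock)
{
    std::size_t blocksWide = (width + 3) / 4;
    std::size_t blocksHigh = (height + 3) / 4;
    if (blocksWide == 0) blocksWide = 1;
    if (blocksHigh == 0) blocksHigh = 1;
    return blocksWide * blocksHigh * bytesPerBlock;
}

// Total for a full mip chain down to 1x1.
inline std::size_t MipChainSize(std::size_t width, std::size_t height,
                                std::size_t bytesPerBlock)
{
    std::size_t total = 0;
    for (;;)
    {
        total += BlockCompressedSize(width, height, bytesPerBlock);
        if (width == 1 && height == 1) break;
        width  = width  > 1 ? width  / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
    return total;
}
```

For comparison, an uncompressed 256x256 RGBA8 level is 262,144 bytes. BC1 therefore gives an 8:1 saving and BC5 a 4:1 saving at that size.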
35 Texture Arrays
- Texture arrays
- Generalized version of cube maps
- D3D9: emulate using a texture atlas
- Xbox 360
- Up to 64 surfaces within a texture, optional mipmaps for each surface
- Surface is indexed with a 0..1 z coordinate in a 3D texture fetch
- D3D10 supports this as a standard feature
- Up to 512 surfaces within a texture
- Bindable as a render target, with per-primitive array index selection
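On D3D9, emulating a texture array with an atlas means remapping each slice's 0..1 coordinates into that slice's tile. The sketch below assumes equal-sized slices laid out row-major in a grid. The `AtlasUV` helper is hypothetical.

```cpp
#include <cassert>

struct UV
{
    float u, v;
};

// Maps a slice-local 0..1 coordinate into an atlas of 'columns' x 'rows'
// equally sized tiles, with slice 0 at the top-left and slices running
// left to right, then top to bottom.
inline UV AtlasUV(UV local, int slice, int columns, int rows)
{
    const int col = slice % columns;
    const int row = slice / columns;
    UV r = { (local.u + static_cast<float>(col)) / static_cast<float>(columns),
             (local.v + static_cast<float>(row)) / static_cast<float>(rows) };
    return r;
}
```

The usual costs of the atlas approach apply. Filtering and mip generation can bleed across tile borders unless tiles are padded, and wrap addressing no longer works per slice.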
36 Custom Vertex Fetch / Vertex Texture
- D3D9 vertex texture implementations use intrinsics
- tex2Dlod()
- 360 supports explicit instructions for this
- D3D10 supports this as a standard feature
- Load() from buffer (VB, IB, etc.) at any stage
- Sample() from texture at any stage
37 Effects
- D3DX FX and FX Lite co-exist easily
- #define around the texture sampler differences
- Preshaders are not supported on FX Lite
- We advise that these be optimized to native code for D3D9 Effects
38 HLSL Development
- Set up your engine and tools for rapid shader development and iteration
- Compile shaders offline for performance
- Maybe allow run-time recompilation during development
- Be careful with shader generation tools
- Perf needs to be considered
- Schedule / plan work for this
39 Cross-Platform HLSL Considerations
- Texture access instruction considerations
- Xbox 360 has native tfetch / getWeights features
- Constant texel offsets (-8.0 to 7.5 in 0.5 increments)
- Independent of texture size
- Direct3D 10 supports integer texture offsets when fetching
- Direct3D 10 supports GetDimensions() natively
- Equivalent to getWeights
- Direct3D 9 can emulate tfetch / getWeights behavior using a shader constant for texture dimensions
40 HLSL Example

    float2 g_invTexSize = float2( 1/512.0f, 1/512.0f );

    float2 getWeights2D( float2 texCoord )
    {
        return frac( texCoord / g_invTexSize );
    }

    float4 tex2DOffset( sampler t, float2 texCoord, float2 offset )
    {
        texCoord += offset * g_invTexSize;
        return tex2D( t, texCoord );
    }
41 Shader Management
- Find a balance between übershaders and specialized shader libraries
- Dynamic/static branching versus static compilation
- Small shader libraries can be built and stored inside a single Effect file
- One technique per shader configuration
- Larger shader libraries
- Hash table populated with configurations
- Streaming code can load shader groups on demand
- Profile-guided content generation
- Avoid compiling shaders at run time
- Compiled shaders compress very well
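Here is a minimal sketch of the "hash table populated with configurations" idea. Shaders are keyed by a feature bitmask and loaded on first use through a loader callback, which in a real engine would stream offline-compiled bytecode instead of compiling at run time. `ShaderFeature`, `CompiledShader` and `ShaderLibrary` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical feature bits describing one shader configuration.
enum ShaderFeature : std::uint32_t
{
    kSkinning  = 1u << 0,
    kNormalMap = 1u << 1,
    kShadows   = 1u << 2,
};

// Stand-in for a precompiled shader blob.
struct CompiledShader
{
    std::uint32_t features;
    std::string   bytecode;
};

class ShaderLibrary
{
public:
    using Loader = std::function<CompiledShader(std::uint32_t)>;

    explicit ShaderLibrary(Loader loader) : loader_(std::move(loader)) {}

    // Returns the cached configuration, loading it on the first request.
    // References stay valid: unordered_map rehashing does not move elements.
    const CompiledShader& Get(std::uint32_t features)
    {
        auto it = cache_.find(features);
        if (it == cache_.end())
            it = cache_.emplace(features, loader_(features)).first;
        return it->second;
    }

    std::size_t LoadedCount() const { return cache_.size(); }

private:
    Loader loader_;
    std::unordered_map<std::uint32_t, CompiledShader> cache_;
};
```

The loader could fetch a whole group of related configurations at once, which matches the streaming approach on the slide.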
42 Audio Considerations
- XACT
- (Microsoft Cross-Platform Audio Creation Tool)
- API and authoring tool parity
- Author once, deploy to both platforms
- Primary difference: wave compression
- ADPCM on Windows vs. Xbox 360 native XMA support
- XMA: controllable quality setting (varies, typically 6-14:1)
- ADPCM: static 3.5:1 compression
- Likely need to trade memory for bit rate
- On Windows, can use hard disk streaming to balance lower compression rates if needed
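The memory-versus-bit-rate trade-off is easy to estimate. The arithmetic below assumes an illustrative 60-second, 48 kHz, 16-bit stereo clip. It compares ADPCM's static 3.5:1 ratio with a 10:1 XMA setting picked from within its typical range. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for uncompressed PCM.
inline std::uint64_t PcmBytes(std::uint32_t sampleRate, std::uint32_t channels,
                              std::uint32_t bitsPerSample, std::uint32_t seconds)
{
    return static_cast<std::uint64_t>(sampleRate) * channels * (bitsPerSample / 8) * seconds;
}

// Approximate compressed size for an N:1 ratio (container overhead ignored).
inline std::uint64_t CompressedBytes(std::uint64_t pcmBytes, double ratio)
{
    return static_cast<std::uint64_t>(static_cast<double>(pcmBytes) / ratio);
}
```

For this clip that works out to about 11.5 MB of PCM, about 3.3 MB as ADPCM and about 1.2 MB as XMA at 10:1. Hard disk streaming on Windows is one way to absorb that 2 MB or so difference per minute of audio.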
43 Call To Action!
- Design your games, engines and production systems with cross-platform development in mind
- (PC / Xbox 360 / Other)
- Invest in making your data build system fast
- Take advantage of each platform's strengths
- Target a D3D10 content design point and fall back to D3D9, D3D9
- Provide feedback on how we can make production easier
- Attend the XACT, HLSL, SM4.0 and Performance Hands-On Labs
44 Questions?