Efficient High-Level Shader Development: PowerPoint Presentation

1
Efficient High-Level Shader Development
  • Natalya Tatarchuk
  • 3D Application Research Group
  • ATI Technologies, Inc.

2
Overview
  • Writing optimal HLSL code
  • Compiling issues
  • Optimization strategies
  • Code structure pointers
  • HLSL Shader Examples
  • Multi-layer car paint effect
  • Translucent Iridescent Shader
  • Überlight Shader

3
Why use HLSL?
  • Faster, easier effect development
  • Instant readability of your shader code
  • Better code re-use and maintainability
  • Optimization
  • Added benefit of HLSL compiler optimizations
  • Still helps to know what's under the hood
  • Industry standard which will run on cards from
    any vendor
  • Current and future industry direction
  • Increase your ability to iterate on a given
    shader design, resulting in better looking games
  • Conveniently manage shader permutations

4
Compile Targets
  • Legal HLSL is independent of the chosen compile
    target
  • But having an HLSL shader doesn't mean it will
    always run on any hardware!
  • Currently supported compile targets
  • vs_1_1, vs_2_0, vs_2_sw
  • ps_1_1, ps_1_2, ps_1_3, ps_1_4, ps_2_0, ps_2_sw
  • Compilation is vendor-independent and is done by
    a D3DX component that Microsoft can update
    independent of the runtime release schedule

5
Compilation Failure
  • Obvious program errors (bad syntax, etc.)
  • Target-specific reasons: your shader is too
    complex for the selected target
  • Not enough resources in the selected target
  • Uses too many registers (temporaries, for
    example)
  • Too many resulting asm instructions for the
    compile target
  • Lack of capability in the target
  • Such as trying to sample a texture in vs_1_1
  • Using dynamic branching when unsupported in the
    target
  • Sampling a texture too many times for the target
    (for example, more than 6 times in ps_1_4)
  • Compiler provides useful messages

6
Use Disassembly for Hints
  • Very helpful for understanding relationship
    between compile targets and code generation
  • Disassembly output provides valuable hints when
    compiling down to an older compile target
  • If a shader compiles for a more recent target
    (e.g., ps_2_0), look at the disassembly output for
    hints when it fails to compile to an older target
    (e.g., ps_1_4)
  • Check the instruction counts for ALU and texture ops
  • Figure out how HLSL instructions get mapped to
    assembly

7
Getting Disassembly Output for Your Shaders
  • Directly use FXC
  • Compile for any target desired
  • Compile both individual shader files and full
    effects
  • Various input arguments
  • Turn shader optimizations on / off
  • Specify different entry points
  • Enable / disable generating debug information

8
Easier Path to Disassembly
  • Use RenderMonkey while developing shaders
  • See your changes in real-time
  • Disassembly output is updated every time a
    shader is compiled
  • Displays counts for ALU and texture ops, as well
    as the limits for the selected target
  • Can save resulting assembly code into text file

9
Optimizing HLSL Shaders
  • Don't forget you are running on a vector
    processor
  • Do your computations at the most efficient
    frequency
  • Don't do something per-pixel that you can do
    per-vertex
  • Don't perform computation in a shader that you
    can precompute in the app
  • Use HLSL intrinsic functions
  • Helps hardware to optimize your shaders
  • Know your intrinsics and how they map to asm,
    especially asm modifiers

10
HLSL Syntax Not Limited
  • The HLSL code you write is not limited by the
    compile target you choose
  • You can always use loops, subroutines, if-else
    statements etc
  • If not natively supported in the selected compile
    target, the compiler will still try to generate
    code
  • Loops will be unrolled
  • Subroutines will be inlined
  • If-else statements will execute both branches,
    selecting the appropriate output as the result
  • Code generation is dependent upon compile target
  • Use appropriate data types to reduce instruction
    count
  • Store your data in a vector when needed
  • Using appropriate data types helps the compiler
    do a better job of optimizing your code

11
Using If Statement in HLSL
  • Can have large performance implications
  • Lack of branching support in most asm models
  • Both sides of an if statement will be executed
  • The output is chosen based on which side of the
    if would have been taken
  • Optimization is different than in the CPU
    programming world

12
Example of Using if in vs_1_1
if ( Threshold > 0.0 )
   Out.Position = Value1;
else
   Out.Position = Value2;

generates the following assembly output:

// calculate lerp value based on Value > 0
mov r1.w, c2.x
slt r0.w, c3.x, r1.w
// lerp between Value1 and Value2
mov r7, -c1
add r2, r7, c0
mad oPos, r0.w, r2, c1
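To see why both sides of the if cost instructions, the same select can be mirrored on the CPU. The C sketch below is illustrative only: C stands in for the vertex shader, and the register roles (c0 = Value1, c1 = Value2, c2.x = Threshold, c3.x = 0.0) are inferred from the listing rather than stated on the slide. Both values are always read; the comparison only sets the blend weight.

```c
/* Stand-in for a 4-component shader register. */
typedef struct { float x, y, z, w; } float4;

/* Emulates the vs_1_1 "slt dst, a, b" instruction: 1.0 if a < b, else 0.0. */
static float slt(float a, float b)
{
    return (a < b) ? 1.0f : 0.0f;
}

/* Branch-free version of:
 *     if (threshold > 0) pos = value1; else pos = value2;
 * Both values are read every time; the comparison result only selects
 * the blend weight, as in "mad oPos, r0.w, (c0 - c1), c1".
 */
static float4 select_position(float threshold, float4 value1, float4 value2)
{
    float w = slt(0.0f, threshold);   /* 1 when 0 < threshold */
    float4 r;
    r.x = w * (value1.x - value2.x) + value2.x;
    r.y = w * (value1.y - value2.y) + value2.y;
    r.z = w * (value1.z - value2.z) + value2.z;
    r.w = w * (value1.w - value2.w) + value2.w;
    return r;
}
```

With a threshold of exactly 0.0 the weight is 0 and the result is Value2, matching the else branch.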
13
Example of Function Inlining
// Bias and double a value to take it from 0..1 range to -1..1 range
float4 bx2( float4 x )
{
   return 2.0f * x - 1.0f;
}

float4 main( float4 tc0 : TEXCOORD0,
             float4 tc1 : TEXCOORD1,
             float4 tc2 : TEXCOORD2,
             float4 tc3 : TEXCOORD3 ) : COLOR
{
   // Sample noise map three times with different
   // texture coordinates
   float4 noise0 = tex2D( fire_distortion, tc1 );
   float4 noise1 = tex2D( fire_distortion, tc2 );
   float4 noise2 = tex2D( fire_distortion, tc3 );

   // Weighted sum of signed noise
   float4 noiseSum = bx2( noise0 ) * distortion_amount0 +
                     bx2( noise1 ) * distortion_amount1 +
                     bx2( noise2 ) * distortion_amount2;

   // Perturb base coordinates in direction of noiseSum as function of height (y)
   float4 perturbedBaseCoords = tc0 + noiseSum * ( tc0.y * height_attenuation.x +
                                                   height_attenuation.y );

   // Sample base and opacity maps with perturbed coordinates
   float4 base    = tex2D( fire_base, perturbedBaseCoords );
   float4 opacity = tex2D( fire_opacity, perturbedBaseCoords );

   return base * opacity;
}
14
Code Permutations Via Compilation
static const bool bAnimate = false;

VS_OUTPUT vs_main( float4 Pos : POSITION,
                   float2 Tex : TEXCOORD0 )
{
   VS_OUTPUT Out = (VS_OUTPUT) 0;
   Out.Pos = mul( view_proj_matrix, Pos );
   if ( bAnimate )
   {
      Out.Tex.x = Tex.x + time / 2;
      Out.Tex.y = Tex.y - time / 2;
   }
   else
      Out.Tex = Tex;
   return Out;
}

With static const bool bAnimate = false, the dead branch is removed at compile time:
vs_1_1
dcl_position v0
dcl_texcoord v1
mul r0, v0.y, c1
mad r0, c0, v0.x, r0
mad r0, c2, v0.z, r0
mad oPos, c3, v0.w, r0
mov oT0.xy, v1
5 instructions

bool bAnimate = false;
// vs_main is unchanged

Without static (bool bAnimate = false, or const bool bAnimate = false), bAnimate is a uniform the application can change, so both paths are compiled in; time is read from c4.x and bAnimate from c5.x:
vs_1_1
def c6, 0.5, 0, 0, 0
dcl_position v0
dcl_texcoord v1
mul r0, v0.y, c1
mad r0, c0, v0.x, r0
mov r1.w, c4.x
mul r1.x, r1.w, c6.x
mad r0, c2, v0.z, r0
mov r1.y, -r1.x
mad oPos, c3, v0.w, r0
mad oT0.xy, c5.x, r1, v1
8 instructions
15
Scalar and Vector Data Types
  • Scalar data types are not all natively supported
    in hardware
  • e.g., integers are emulated on float hardware
  • Not all targets have native half and none
    currently have double
  • Can apply swizzles to vector types
  • float2 vec = pos.xy;
  • But!
  • Not all targets have fully flexible swizzles
  • Acquaint yourself with the swizzles native to the
    relevant compile targets (particularly ps_2_0 and
    lower)

16
Integer Data Type
  • Added to make relative addressing more efficient
  • Using floats for addressing purposes without
    defined truncation rules can result in incorrect
    access to arrays.
  • All inputs used as ints should be defined as ints
    in your shader

17
Example of Integer Data Type Usage
  • Matrix palette indices for skinning
  • Declaring a variable as an int is a free
    operation: no truncation occurs
  • Using a float and casting it to an int, or using
    it directly, causes truncation

18
Real-World Shader Examples
  • Will present several case studies of developing
    shaders used in ATI's demos
  • Multi-tone car paint effect
  • Translucent iridescent effect
  • Classic überlight example
  • Examples are presented as RenderMonkey™
    workspaces
  • Distributed publicly with version 1.0 release

19
Multi-Tone Car Paint
20
Multi-Tone Car Paint Effect
  • Multi-tone base color layer
  • Microflake layer simulation
  • Clear gloss coat
  • Dynamically Blurred Reflections

21
Car Paint Layers Build-Up
(Figure: multi-tone base color, microflake layer, clear gloss coat, and final color composite)
22
Multi-Tone Base Paint Layer
  • View-dependent lerping between three
    paint colors
  • Normal from appearance-preserving
    simplification process, N
  • Uses subtractive tone to control overall color
    accumulation

23
Normal Decompression
  • Sample from two-channel 16-16 normal map
  • Derive z as $z = \sqrt{1 - x^2 - y^2}$
  • Gives higher precision than typically used
    8-8-8-8 normal map

24
Multi-Tone Base Coat Vertex Shader
VS_OUTPUT main( float4 Pos      : POSITION,
                float3 Normal   : NORMAL,
                float2 Tex      : TEXCOORD0,
                float3 Tangent  : TANGENT,
                float3 Binormal : BINORMAL )
{
   VS_OUTPUT Out = (VS_OUTPUT) 0;

   // Propagate transformed position out
   Out.Pos = mul( view_proj_matrix, Pos );

   // Compute view vector
   Out.View = normalize( mul( inv_view_matrix, float4( 0, 0, 0, 1 ) ) - Pos );

   // Propagate texture coordinates
   Out.Tex = Tex;

   // Propagate tangent, binormal, and normal vectors to pixel shader
   Out.Normal   = Normal;
   Out.Tangent  = Tangent;
   Out.Binormal = Binormal;

   return Out;
}
25
Multi-Tone Base Coat Pixel Shader
float4 main( float4 Diff     : COLOR0,
             float2 Tex      : TEXCOORD0,
             float3 Tangent  : TEXCOORD1,
             float3 Binormal : TEXCOORD2,
             float3 Normal   : TEXCOORD3,
             float3 View     : TEXCOORD4 ) : COLOR
{
   float3 vNormal = tex2D( normalMap, Tex );
   vNormal = 2 * vNormal - 1.0;

   float3 vView = normalize( View );

   float3x3 mTangentToWorld = transpose( float3x3( Tangent,
                                                    Binormal,
                                                    Normal ) );

   float3 vNormalWorld = normalize( mul( mTangentToWorld, vNormal ) );

   float fNdotV   = saturate( dot( vNormalWorld, vView ) );
   float fNdotVSq = fNdotV * fNdotV;

   float4 paintColor = fNdotV * paintColor0 +
                       fNdotVSq * paintColorMid +
                       fNdotVSq * fNdotVSq * paintColor2;

   return float4( paintColor.rgb, 1.0 );
}
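Stripped of texture fetches, the base coat is a polynomial in N·V. A minimal C sketch of that blend, with hypothetical names (rgb, multi_tone_paint) and colors as plain floats:

```c
/* Plain RGB triple; the shader sets alpha to 1.0 separately. */
typedef struct { float r, g, b; } rgb;

/* Clamp to [0,1], like HLSL saturate(). */
static float saturate(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* paintColor0*(N.V) + paintColorMid*(N.V)^2 + paintColor2*(N.V)^4 */
static rgb multi_tone_paint(float n_dot_v, rgb paint0, rgb paint_mid, rgb paint2)
{
    float v  = saturate(n_dot_v);
    float v2 = v * v;    /* fNdotVSq */
    float v4 = v2 * v2;
    rgb c;
    c.r = v * paint0.r + v2 * paint_mid.r + v4 * paint2.r;
    c.g = v * paint0.g + v2 * paint_mid.g + v4 * paint2.g;
    c.b = v * paint0.b + v2 * paint_mid.b + v4 * paint2.b;
    return c;
}
```

Because (N·V)⁴ falls off fastest as the surface turns away from the viewer, paintColor2 matters mainly where the surface faces the viewer, while paintColor0 carries the most relative weight toward the silhouette.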
26
Microflake Layer
27
Microflake Deposit Layer
  • Simulating light interaction resulting from
    metallic flakes suspended in the enamel coat of
    the paint
  • Uses a high-frequency normalized vector noise map
    (Nn) which is repeated across the surface of the
    car

28
Computing Microflake Layer Normals
  • Start out by using normal vector fetched from
    the normal map, N
  • Using the high-frequency noise map, compute
    perturbed normal Np
  • Simulate two layers of microflake deposits by
    computing perturbed normals Np1 and Np2

(Equations for Np1 and Np2 not recoverable from the transcript.)
29
Microflake Layer Vertex Shader
VS_OUTPUT main( float4 Pos      : POSITION,
                float3 Normal   : NORMAL,
                float2 Tex      : TEXCOORD0,
                float3 Tangent  : TANGENT,
                float3 Binormal : BINORMAL )
{
   VS_OUTPUT Out = (VS_OUTPUT) 0;

   // Propagate transformed position out
   Out.Pos = mul( view_proj_matrix, Pos );

   // Compute view vector
   Out.View = normalize( mul( inv_view_matrix, float4( 0, 0, 0, 1 ) ) - Pos );

   // Propagate texture coordinates
   Out.Tex = Tex;

   // Propagate tangent, binormal, and normal vectors to pixel
   // shader
   Out.Normal   = Normal;
   Out.Tangent  = Tangent;
   Out.Binormal = Binormal;

   // Compute microflake tiling factor
   ...

Compute texture coordinates for accessing the noise map using the input texture coordinates and a tiling factor.
30
Microflake Layer Pixel Shader
float4 main( float4 Diff       : COLOR0,
             float2 Tex        : TEXCOORD0,
             float3 Tangent    : TEXCOORD1,
             float3 Binormal   : TEXCOORD2,
             float3 Normal     : TEXCOORD3,
             float3 View       : TEXCOORD4,
             float3 SparkleTex : TEXCOORD5 ) : COLOR
{
   // fetch and signed scale the normal fetched from the normal map
   float3 vFlakesNormal = 2 * tex2D( microflakeNMap, SparkleTex ) - 1;

   float3 vNp1 = microflakePerturbationA * vFlakesNormal +
                 normalPerturbation * vNormal;
   float3 vNp2 = microflakePerturbation * ( vFlakesNormal + vNormal );

   float3 vView = normalize( View );

   float3x3 mTangentToWorld = transpose( float3x3( Tangent, Binormal,
                                                    Normal ) );

   float3 vNp1World = normalize( mul( mTangentToWorld, vNp1 ) );
   float  fFresnel1 = saturate( dot( vNp1World, vView ) );

   float3 vNp2World = normalize( mul( mTangentToWorld, vNp2 ) );
   float  fFresnel2 = saturate( dot( vNp2World, vView ) );

   float  fFresnel1Sq = fFresnel1 * fFresnel1;

   float4 paintColor = fFresnel1 * flakeColor +
                       fFresnel1Sq * flakeColor +
                       fFresnel1Sq * fFresnel1Sq * flakeColor +
                       pow( fFresnel2, 16 ) * flakeColor;
   ...

31
Clear Gloss Coat
32
RGBScale HDR Environment Map
  • Alpha channel contains 1/16 of the true HDR scale
    of the pixel value
  • RGB contains the normalized color of the pixel
  • Pixel shader reconstructs the HDR value from
    scale * 8 * color (scale being the stored alpha),
    which gives half of the true HDR value
  • Obvious quantization issues, but reasonable for
    some applications
  • Similar to Ward's RGBE "Real Pixels" but simpler
    to reconstruct in the pixel shader
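A C sketch of the RGBScale round trip. The slide says only that RGB holds the normalized color; dividing by the largest channel is an assumption here, as is limiting the largest channel to at most 16 so the scale fits in alpha. Values stay as floats, so the 8-bit quantization mentioned above is not modeled. Function names are hypothetical.

```c
/* RGBScale texel: normalized color in rgb, scale / 16 in a. */
typedef struct { float r, g, b, a; } rgba;

/* Encoder sketch. Assumes all channels >= 0 and the largest <= 16. */
static rgba rgbscale_encode(float r, float g, float b)
{
    rgba e = { 0.0f, 0.0f, 0.0f, 0.0f };
    float scale = r;
    if (g > scale) scale = g;
    if (b > scale) scale = b;
    if (scale <= 0.0f)
        return e;                 /* black: nothing to normalize */
    e.r = r / scale;
    e.g = g / scale;
    e.b = b / scale;
    e.a = scale / 16.0f;          /* 1/16 of the true scale */
    return e;
}

/* Shader-side reconstruction: color * (alpha * 8) = half the HDR value. */
static void rgbscale_decode_half(rgba e, float out[3])
{
    float s = e.a * 8.0f;
    out[0] = e.r * s;
    out[1] = e.g * s;
    out[2] = e.b * s;
}
```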

33
Environment Map
(Figure: ceiling of the car showroom; top cube map face RGB, and top face scale in the alpha channel)
34
Dynamically Blurred Reflections
(Figure: blurred reflections)
35
Dynamic Blurring of Environment Map Reflections
  • A gloss map can be supplied to specify the
    regions where reflections can be blurred
  • Use bias when sampling the environment map to
    vary blurriness of the resulting reflections
  • Use texCUBEbias to access the cubic
    environment map
  • For rough specular, the bias is high, causing a
    blurring effect
  • Can also convert color fetched from environment
    map to luminance in rough trim areas

36
Clear Gloss Coat Pixel Shader
float4 ps_main( ... /* same inputs as in the previous shader */ )
{
   // ... use normal in world space (see Multi-tone pixel shader)

   // Compute reflection vector
   float  fFresnel    = saturate( dot( vNormalWorld, vView ) );
   float3 vReflection = 2 * vNormalWorld * fFresnel - vView;

   float fEnvBias = glossLevel;

   // Sample environment map using this reflection vector and bias
   float4 envMap = texCUBEbias( showroomMap,
                                float4( vReflection, fEnvBias ) );

   // Premultiply by alpha
   envMap.rgb = envMap.rgb * envMap.a;

   // Brighten the environment map sampling result
   envMap.rgb *= brightnessFactor;
   ...
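The reflection step computes R = 2(N·V)N - V with N·V saturated. A C sketch, assuming unit-length N and V already in world space (names hypothetical); the texCUBEbias fetch itself has no CPU equivalent here and is left out.

```c
typedef struct { float x, y, z; } float3;

static float dot3(float3 a, float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

static float saturate(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }

/* vReflection = 2 * N * saturate(N.V) - V, for unit N and V. */
static float3 reflect_view(float3 n, float3 v)
{
    float f = saturate(dot3(n, v));   /* fFresnel in the shader */
    float3 r = { 2.0f * n.x * f - v.x,
                 2.0f * n.y * f - v.y,
                 2.0f * n.z * f - v.z };
    return r;
}
```

Because N·V is saturated, a view vector on the back side of the surface gives f = 0 and the result is simply -V.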

37
Compositing Multi-Tone Base Layer and Microflake
Layer
  • Base color and flake effect are derived from Np1
    and Np2 using the following polynomial
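  • $\text{color}_0 (N_{p1} \cdot V) + \text{color}_1 (N_{p1} \cdot V)^2 + \text{color}_2 (N_{p1} \cdot V)^4 + \text{color}_3 (N_{p2} \cdot V)^{16}$
  • The first three terms give the base color; the last term gives the flake contribution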
38
Compositing Final Look
...
// Compute final paint color: combines all layers of paint as well
// as two layers of microflakes
float  fFresnel1Sq = fFresnel1 * fFresnel1;

float4 paintColor = fFresnel1 * paintColor0 +
                    fFresnel1Sq * paintColorMid +
                    fFresnel1Sq * fFresnel1Sq * paintColor2 +
                    pow( fFresnel2, 16 ) * flakeLayerColor;

// Combine result of environment map reflection with the paint
// color
float fEnvContribution = 1.0 - 0.5 * fNdotV;

// Assemble the final look
float4 finalColor;
finalColor.a   = 1.0;
finalColor.rgb = envMap * fEnvContribution + paintColor;

return finalColor;
39
Original Hand-Tuned Assembly
40
Car Paint Shader HLSL Compiler Disassembly Output
41
Full Result of Multi-Layer Paint
42
Translucent Iridescent Shader: Butterfly Wings
43
Translucent Iridescent Shader: Butterfly Wings
  • Simulates translucency of delicate butterfly
    wings
  • Wings glow from scattered reflected light
  • Similar to the effect of softly backlit rice
    paper
  • Displays subtle iridescent lighting
  • Similar to rainbow pattern on the surface of soap
    bubbles
  • Caused by the interference of light waves
    resulting from multiple reflections of light off
    of surfaces of varying thickness
  • Combines gloss, opacity and normal maps for a
    multi-layered final look
  • Gloss map contributes to satiny highlights
  • Opacity map allows portions of wings to be
    transparent
  • Normal map is used to give wings a bump-mapped
    look

44
RenderMonkey Butterfly Wings Shader Example
  • Parameters that contribute to the translucency
    and iridescence look
  • Light position and scene ambient color
  • Translucency coefficient
  • Gloss scale and bias
  • Scale and bias for speed of iridescence change
  • Workspace: Iridescent Butterfly.rfx

45
Translucent Iridescent Shader Vertex Shader
...
// Propagate input texture coordinates
Out.Tex = Tex;

// Define tangent space matrix
float3x3 mTangentSpace;
mTangentSpace[0] = Tangent;
mTangentSpace[1] = Binormal;
mTangentSpace[2] = Normal;

// Compute the light vector (object space)
float3 vLight = normalize( mul( inv_view_matrix, lightPos ) - Pos );

// Output light vector in tangent space
Out.Light = mul( mTangentSpace, vLight );

// Compute the view vector (object space)
float3 vView = normalize( mul( inv_view_matrix, float4( 0, 0, 0, 1 ) ) - Pos );
...

46
Translucent Iridescent Shader: Loading Information
float3 vNormal, baseColor;
float  fGloss, fTranslucency;

// Load normal and gloss map
float4( vNormal, fGloss ) = tex2D( bump_glossMap, Tex );

// Load base and opacity map
float4( baseColor, fTranslucency ) = tex2D( base_opacityMap, Tex );
47
Diffuse Illumination For Translucency
float3 scatteredIllumination = saturate( dot( -vNormal, Light ) ) *
                               fTranslucency * translucencyCoeff;

float3 diffuseContribution = saturate( dot( vNormal, Light ) ) + ambient;

baseColor *= scatteredIllumination + diffuseContribution;
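Per color channel, the lighting factor is saturate(-N·L)·fTranslucency·translucencyCoeff + saturate(N·L) + ambient. A C sketch with a scalar ambient (an assumption; in the shader it may be a color), returning the factor that multiplies baseColor. The function name is hypothetical.

```c
typedef struct { float x, y, z; } float3;

static float dot3(float3 a, float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

static float saturate(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }

/* Light from behind (N.L < 0) scatters through the wing, scaled by the
   per-texel translucency and a global coefficient; light from the front
   gives ordinary diffuse. Ambient is added in both cases. */
static float wing_lighting(float3 n, float3 l, float translucency,
                           float translucency_coeff, float ambient)
{
    float3 neg_n = { -n.x, -n.y, -n.z };
    float scattered = saturate(dot3(neg_n, l)) * translucency * translucency_coeff;
    float diffuse = saturate(dot3(n, l)) + ambient;
    return scattered + diffuse;
}
```

Light from behind contributes only through the translucency term, so a texel with zero translucency shows just the ambient level when backlit.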
48
Adding Opacity to Butterfly Wings
  • The resulting color is modulated by the opacity
    value to add transparency to the wings

// Premultiply alpha blend to avoid clamping the highlights
baseColor *= fOpacity;


49
Making Butterfly Wings Iridescent
// Compute index into the iridescence gradient map, which
// consists of N·V coefficient
float fGradientIndex = dot( vNormal, View ) * iridescence_speed_scale +
                       iridescence_speed_bias;

// Load the iridescence value from the gradient map
float4 iridescence = tex1D( gradientMap, fGradientIndex );
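A C sketch of the lookup, with tex1D emulated on a small array. Linear filtering and clamp addressing are assumptions; the slide does not state the sampler states for gradientMap. Function names are hypothetical.

```c
/* fGradientIndex = (N.V) * speed_scale + speed_bias */
static float iridescence_index(float n_dot_v, float speed_scale, float speed_bias)
{
    return n_dot_v * speed_scale + speed_bias;
}

/* CPU stand-in for tex1D on an n-texel gradient with linear filtering
   and clamp addressing (texel i is centered at (i + 0.5) / n). */
static float sample_gradient_1d(const float *texels, int n, float u)
{
    float x = u * (float)n - 0.5f;   /* continuous texel coordinate */
    if (x <= 0.0f)
        return texels[0];
    if (x >= (float)(n - 1))
        return texels[n - 1];
    int i = (int)x;                  /* x > 0 here, so this is floor(x) */
    float t = x - (float)i;
    return texels[i] + t * (texels[i + 1] - texels[i]);
}
```

Changing the scale and bias shifts how quickly the sampled hue changes as N·V varies, which is how the speed of the iridescence change is controlled.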
50
Assembling Final Color
// Compute glossy highlights using values from gloss map
float fGlossValue = fGloss * ( saturate( dot( vNormal, Half ) ) *
                               gloss_scale + gloss_bias );

// Assemble the final color for the wings
baseColor += fGlossValue * iridescence;
51
HLSL Disassembly Comparison
12 ALU + 3 texture = 15 total instructions
15 ALU + 3 texture = 18 total instructions
52
Example of Translucent Iridescent Shader
53
Optimization Study Überlight
  • Flexible light described in the JGT article "Lighting
    Controls for Computer Cinematography" by Ronen
    Barzel of Pixar
  • Überlight is procedural and has many controls
  • light type, intensity, light color, cuton,
    cutoff, near edge, far edge, falloff, falloff
    distance, max intensity, parallel rays, shearx,
    sheary, width, height, width edge, height edge,
    roundness and beam distribution
  • Code here is based upon the public domain
    RenderMan implementation by Larry Gritz

54
Überlight Spotlight Mode
  • Spotlight mode defines a procedural volume with
    smooth boundaries
  • Shape of spotlight is made up of two nested
    superellipses which are swept along direction of
    light
  • Also has smooth cuton and cutoff planes
  • Can tune parameters to get all sorts of looks

55
Überlight Spotlight Volume
(Figure: spotlight volume with roundness ½)
56
Überlight Spotlight Volume
(Figure: inner swept superellipse with dimensions a and b nested inside the outer swept superellipse with dimensions A and B; roundness 1)
57
Original clipSuperellipse() routine
  • Computes attenuation as a function of a point's
    position in the swept superellipse
  • Directly ported from original RenderMan source
  • Compiles to 42 cycles in ps_2_0, 40 cycles on R3x0

float clipSuperellipse ( float3 Q,          // Test point on the x-y plane
                         float a,           // Inner superellipse
                         float b,
                         float A,           // Outer superellipse
                         float B,
                         float roundness )  // Same roundness for both ellipses
{
   float x = abs( Q.x ), y = abs( Q.y );
   float re = 2 / roundness;   // roundness exponent
   float q = a * b * pow( pow( b * x, re ) + pow( a * y, re ), -1 / re );
   float r = A * B * pow( pow( B * x, re ) + pow( A * y, re ), -1 / re );
   return smoothstep( q, r, 1 );
}
58
Vectorized Version
  • Precompute functions of roundness in app
  • Vectorize abs() and all of the multiplications
  • Compiles to 33 cycles in ps_2_0, 28 cycles on
    R3x0

float clipSuperellipse ( float2 Q,      // Test point on the x-y plane
                         float4 aABb,   // Dimensions of superellipses
                         float2 r )     // Two precomputed functions of roundness
{
   float2 qr, Qabs = abs( Q );
   float2 bx_Bx = Qabs.x * aABb.wzyx;   // Swizzle to unpack bB
   float2 ay_Ay = Qabs.y * aABb;
   qr.x = pow( pow( bx_Bx.x, r.x ) + pow( ay_Ay.x, r.x ), r.y );
   qr.y = pow( pow( bx_Bx.y, r.x ) + pow( ay_Ay.y, r.x ), r.y );
   qr *= aABb * aABb.wzyx;
   return smoothstep( qr.x, qr.y, 1 );
}

59
smoothstep() function
  • Standard function in procedural shading
  • Intrinsics built into RenderMan and DirectX HLSL

(Figure: smoothstep curve rising from 0 at edge0 to 1 at edge1)
60
C implementation
float smoothstep (float edge0, float edge1, float x)
{
   if (x < edge0)
      return 0;
   if (x >= edge1)
      return 1;
   // Scale/bias into 0..1 range
   x = (x - edge0) / (edge1 - edge0);
   return x * x * (3 - 2 * x);
}
61
HLSL implementation
  • The free saturate modifier handles x outside
    the edge0..edge1 range

float smoothstep (float edge0, float edge1, float x)
{
   // Scale, bias and saturate x to 0..1 range
   x = saturate( (x - edge0) / (edge1 - edge0) );
   // Evaluate polynomial
   return x * x * (3 - 2 * x);
}
62
Vectorized HLSL Implementation
  • Precompute 1/(edge1 - edge0)
  • Done in the app for edge widths at cuton and
    cutoff planes
  • Operation performed on float3s to compute three
    different smoothstep operations in parallel
  • With these optimizations, the entire spotlight
    volume computation of überlight compiles to 47
    cycles in ps_2_0, 41 cycles on R3x0

float3 smoothstep3 (float3 edge, float3 OneOverWidth, float3 x)
{
   // Scale, bias and saturate x to 0..1 range
   x = saturate( (x - edge) * OneOverWidth );
   // Evaluate polynomial
   return x * x * (3 - 2 * x);
}
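A C sketch of smoothstep3 with the reciprocal widths supplied by the caller, as the application would precompute them (for example 0.5 for an edge width of 2). Names are hypothetical; each lane is computed independently, mirroring the float3 version.

```c
typedef struct { float x, y, z; } float3;

static float saturate(float x) { return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x); }

/* Hermite polynomial t^2 (3 - 2t) on an already-saturated t. */
static float poly(float t) { return t * t * (3.0f - 2.0f * t); }

/* Three smoothsteps at once. one_over_width = 1 / (edge1 - edge0) is
   precomputed by the application, so the body needs no divide. */
static float3 smoothstep3(float3 edge, float3 one_over_width, float3 x)
{
    float3 r;
    r.x = poly(saturate((x.x - edge.x) * one_over_width.x));
    r.y = poly(saturate((x.y - edge.y) * one_over_width.y));
    r.z = poly(saturate((x.z - edge.z) * one_over_width.z));
    return r;
}
```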
63
Summary
  • Writing optimal HLSL code
  • Compiling issues
  • Optimization strategies
  • Code structure pointers
  • Shader Examples
  • Shipped with RenderMonkey version 1.0; see
    www.ati.com/developer

MultiTone Car Paint.rfx
Iridescent Butterfly.rfx