Title: DirectX High-Level Shading Language
1. DirectX High-Level Shading Language
- Chas. Boyd
- DirectX Graphics Architect
- Microsoft
2. Outline
- What drove the language design?
  - Background
- What does it look like?
  - Syntax definition
- How does it work?
  - API integration
- How do you use it efficiently?
  - Tips and tricks
3. DirectX 8 Assembly
tex t0            // base texture
tex t1            // environment map
add r0, t0, t1    // apply reflection
4. DirectX 9 HLL Syntax
outColor = tex2D( baseTexture, baseTextureCoord )
         + texCUBE( Environment, EnvironmentMapCoord );
5. Why an HLL?
- Scalability across hardware generations
- Programming complexity
- A higher-level language solves both
6. Design Goals
- High-level enough to hide hardware-specific details
- Simple enough for efficient code generation
- Familiar enough to reduce the learning curve
- With enough optimizing back ends for portability
7. Design Baseline
- C-like syntax
- A standard language
- like c or C or HTML
- in the VS.NET IDE
8. Graphics Architecture
- Application
- D3DX: assembler, compiler, effects, utilities
- Direct3D: semantic mapping
- Driver: code translation
- Hardware
9. Preprocessor
- #define
- #elif
- #else
- #endif
- #error
10. Types
- Basic types
  - float
  - int
  - bool
  - double
  - half
- Structs and arrays supported
11. Vectors and Matrices
- Shorthand typedefs for vector and matrix types
  - float1, float2, float3, float4
  - float1x1, float1x2, ..., float4x4
- Defined for all basic types
  - int1-int4, half1-half4, etc.
- Component access and swizzles are supported on vector and matrix types
  - FloatVector.xyz
  - FloatMatrix._11_12 or FloatMatrix[1][1]
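Swizzles are resolved by the compiler, but their meaning is easy to model. The following C++ sketch is a CPU emulation, not HLSL: `float4` is a plain `std::array`, and `swizzle` and `componentIndex` are hypothetical helpers. It reads components in the order named by the mask and accepts either the xyzw or the rgba letters.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// CPU stand-in for an HLSL float4; this is not the HLSL type itself.
using float4 = std::array<float, 4>;

// Maps one swizzle letter to a component index: x/r = 0, y/g = 1, z/b = 2, w/a = 3.
int componentIndex(char c) {
    switch (c) {
        case 'x': case 'r': return 0;
        case 'y': case 'g': return 1;
        case 'z': case 'b': return 2;
        case 'w': case 'a': return 3;
        default: throw std::invalid_argument("invalid swizzle letter");
    }
}

// Emulates v.<mask> for a mask of 1 to 4 letters, e.g. "xyz" or "wzyx".
// Letters may repeat ("xxy"); lanes past the mask length are left at 0.
// (HLSL also rejects masks that mix xyzw and rgba letters; this sketch does not.)
float4 swizzle(const float4& v, const std::string& mask) {
    if (mask.empty() || mask.size() > 4) throw std::invalid_argument("mask must have 1-4 letters");
    float4 out{0.0f, 0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < mask.size(); ++i) {
        out[i] = v[componentIndex(mask[i])];
    }
    return out;
}
```

For example, `swizzle({1, 2, 3, 4}, "wzyx")` reverses the vector, and `"xxy"` duplicates x into the first two lanes.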
12. Variables
- Local / global
- Static
  - Global variables that are not externally visible
- Const
  - Cannot be modified by the shader program
- Can be set from outside the program
- Can have initializers
- Can have semantics
  - For function parameters
13. Operators
- Pretty much all of the C operators
  - Including ?:, ++, --, +=, -=, etc.
- No new language semantics
  - Despite temptation
- Arithmetic operators are per component
  - Matrix multiply is an intrinsic function
- Logical operators are per component
- No bitwise operators
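To make "per component" concrete, this C++ sketch emulates on the CPU what the HLSL operators mean for 4-component vectors. The function names are hypothetical; in HLSL one simply writes `a * b`, `a < b`, and `p && q`.

```cpp
#include <array>
#include <cassert>

// CPU stand-ins for HLSL float4 / bool4 (not the HLSL types themselves).
using float4 = std::array<float, 4>;
using bool4 = std::array<bool, 4>;

// HLSL 'a * b' on two vectors: an independent multiply in each component,
// not a dot product (the dot() intrinsic) or a matrix product (mul()).
float4 componentMul(const float4& a, const float4& b) {
    float4 r{};
    for (int i = 0; i < 4; ++i) r[i] = a[i] * b[i];
    return r;
}

// HLSL 'a < b' on two vectors: one bool per component.
bool4 componentLess(const float4& a, const float4& b) {
    bool4 r{};
    for (int i = 0; i < 4; ++i) r[i] = a[i] < b[i];
    return r;
}

// HLSL 'p && q' on two bool vectors: also evaluated per component.
bool4 componentAnd(const bool4& p, const bool4& q) {
    bool4 r{};
    for (int i = 0; i < 4; ++i) r[i] = p[i] && q[i];
    return r;
}
```

A true matrix-vector product is deliberately absent here; as the slide says, that is the job of the `mul` intrinsic.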
14. Statement Syntax
- { [statements] }
- [expression] ;
- return [expression] ;
- if ( expression ) statement [ else statement ]
- for ( [expression | variable_declaration] ; [expression] ; [expression] ) statement
15. Some Intrinsic Functions
16. User Functions
- Standard C-like functions
  - Output type and input parameters
- Parameters can be passed by a copy-in/copy-out mechanism
  - in/out declaration
- Inlined internally, so no recursion
17. Functions (cont.)
- Can be static (not externally accessible)
- Parameters of non-static functions must have Direct3D declarator semantics
- Parameters can be marked const
- Parameters can have default initializers
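The difference between copy-in/copy-out and C++ references only shows up when arguments alias. The following C++ sketch (hypothetical function names) models an HLSL `inout`/`in` parameter pair with explicit copies; it illustrates the semantics described above and is not compiler output.

```cpp
#include <cassert>

// C++ reference semantics: if the caller passes the same variable twice,
// 'x' aliases 'total' and sees every update made through 'total'.
void addTwiceByReference(float& total, const float& x) {
    total += x;
    total += x;
}

// Emulation of HLSL copy-in/copy-out, roughly
//   void addTwice(inout float total, in float x)
// Both parameters are copied on entry, the body works on the copies,
// and 'total' is copied back to the caller only on return.
void addTwiceCopyInOut(float& callerTotal, float x) {
    float total = callerTotal;  // copy in
    total += x;
    total += x;
    callerTotal = total;        // copy out
}
```

Calling both with the same variable initialized to 1 leaves 4 in the by-reference case (the second add sees the updated value) but 3 in the copy-in/copy-out case.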
18. Differences from C
19. HLSL Summary
- Ease of use
  - Enable software developers
- Consistency of implementation
  - Enable multiple vendors
- Management of evolution
  - Enable multiple generations
- Result
  - Fundamental architecture of the DirectX Graphics (DXG) software stack and higher-level language
20. Geometry Mapping
- DirectX 8 vertex shaders assume a data layout
  - Decl and shader code are tied together
  - Forces the shader author to communicate with the geometry provider
  - Standard register conventions can help some
  - Complicates combining shaders
21. Semantics
- DirectX 9 decouples the decl from the VS
- Both the decl and the VS refer to semantics rather than register names
- The Direct3D runtime connects the appropriate vertex stream data to vertex shader registers
- Key feature of the DirectX 9 low-level API
  - Driven by HLSL and shader requirements
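Conceptually, the runtime matches each shader input to the declaration element that carries the same semantic, wherever that element lives. The C++ sketch below models that lookup; `Semantic`, `DeclElement`, `ShaderInput`, `Binding`, and `bindBySemantic` are hypothetical illustrations, not Direct3D structures.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of semantic binding; not the runtime's real data structures.
struct Semantic {
    std::string usage;  // e.g. "position", "normal", "texcoord"
    int index;          // the n in texcoordn, e.g. 1 for texcoord1
    bool operator==(const Semantic& other) const {
        return usage == other.usage && index == other.index;
    }
};

struct DeclElement { int stream; int offset; Semantic semantic; };  // supplied by the geometry provider
struct ShaderInput { int reg; Semantic semantic; };                  // a dcl_* input of the shader
struct Binding { int reg; int stream; int offset; };                 // where a register's data comes from

// For every shader input, find the declaration element carrying the same semantic.
// Returns std::nullopt if the shader needs data that the vertex layout does not supply.
std::optional<std::vector<Binding>> bindBySemantic(const std::vector<DeclElement>& decl,
                                                   const std::vector<ShaderInput>& inputs) {
    std::vector<Binding> bindings;
    for (const ShaderInput& in : inputs) {
        const DeclElement* match = nullptr;
        for (const DeclElement& e : decl) {
            if (e.semantic == in.semantic) { match = &e; break; }
        }
        if (match == nullptr) return std::nullopt;
        bindings.push_back({in.reg, match->stream, match->offset});
    }
    return bindings;
}
```

Because the match is by semantic rather than by register number, the same shader can be fed by differently organized streams, which is the decoupling this slide describes.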
22. DX8 Vertex Declaration
[Diagram: vertex layout in Strm0 and Strm1 mapped to registers (v0, skip, v1) by a Declaration that is bound together with the Shader program (vs 1.1: mov r0, v0) under one Shader handle]
23. New Vertex Declaration
[Diagram: vertex layouts (pos, norm, diff) in Strm0 and Strm1 are described by a separate Declaration; the Shader program (Shader handle) declares its inputs by semantic:]
vs 1.1
dcl_position v0
dcl_diffuse v1
mov r0, v0
24. Vertex Declaration
- struct D3DVERTEXELEMENT9 {
      WORD Stream;      // stream id from SetStreamSource()
      WORD Offset;      // offset of the element into the stream's vertex, in bytes
      BYTE Type;        // float vs byte, etc.
      BYTE Method;      // tessellator op
      BYTE Usage;       // default semantic (pos, etc.)
      BYTE UsageIndex;  // e.g. texcoord index
  };
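The field layout can be mirrored in a few lines of C++ to show how element offsets and the per-vertex stride follow from the element types. This is a self-contained sketch: `VertexElement`, `packStream`, and the enums are hypothetical stand-ins chosen so it compiles without the DirectX headers (real code would fill `D3DVERTEXELEMENT9` entries using the D3D9 `D3DDECLTYPE_*` and `D3DDECLUSAGE_*` constants).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local mirror of the D3DVERTEXELEMENT9 field widths (WORD, WORD, then four BYTEs).
// The enumerators are illustrative and do NOT match the numeric values in the D3D9 headers.
enum class DeclType : std::uint8_t { Float2, Float3, Float4, Color };  // Color = packed 4-byte ARGB
enum class DeclUsage : std::uint8_t { Position, Normal, Color, TexCoord };

struct VertexElement {
    std::uint16_t stream;     // stream index the vertex buffer is bound to
    std::uint16_t offset;     // byte offset of this element within one vertex
    DeclType type;            // data type (float vs byte, etc.)
    std::uint8_t method;      // tessellator operation; 0 = default
    DeclUsage usage;          // semantic (position, normal, ...)
    std::uint8_t usageIndex;  // semantic index, e.g. 1 for texcoord1
};

int byteSize(DeclType t) {
    switch (t) {
        case DeclType::Float2: return 8;
        case DeclType::Float3: return 12;
        case DeclType::Float4: return 16;
        case DeclType::Color:  return 4;
    }
    return 0;
}

struct Field { DeclType type; DeclUsage usage; std::uint8_t usageIndex; };

// Packs the fields of one stream back to back, filling in each element's offset.
// 'stride' receives the size in bytes of one vertex in that stream.
std::vector<VertexElement> packStream(std::uint16_t stream, const std::vector<Field>& fields, int& stride) {
    std::vector<VertexElement> elements;
    int offset = 0;
    for (const Field& f : fields) {
        elements.push_back({stream, static_cast<std::uint16_t>(offset), f.type, 0, f.usage, f.usageIndex});
        offset += byteSize(f.type);
    }
    stride = offset;
    return elements;
}
```

A position, normal, packed color, and texcoord1 in one stream land at offsets 0, 12, 24, and 28, for a 36-byte stride.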
25. VS Input Semantics
- position[n]
- blendweight[n]
- blendindices[n]
- normal[n]
- psize[n]
- diffuse[n]
- specular[n]
- texcoord[n]
- tangent[n]
- binormal[n]
26. VS Output / PS Input Semantics
- position
- psize
- fog
- color[n]
- texcoord[n]
27. Uses for Semantics
- A data binding protocol
  - Between vertex data and shaders
  - Between pixel and vertex shaders
  - Between pixel shaders and hardware
  - Between shader fragments
28. Integrating with Applications
- Extract the disassembly and use it as .asm shader code, as in DX8
- Use the compiled shader object
  - Enables constant table access
  - Via the ID3DXConstantTable interface
- Use in an effect object
  - Manage constants, fallbacks, etc.
  - Via the ID3DXEffect interface
29. Language API Standalone
- Compiler returns a VS or PS and a symbol table
  - Maps extern constants to registers
  - Any expression of constants (i.e. per-primitive expressions) is still performed per vertex
- Symbol table is a set of constants
  - ID3DXConstantTable interface
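The symbol table's job is essentially a name-to-register map. As a rough CPU model (the class `ConstantTableSketch` and its methods are hypothetical; the real interface is ID3DXConstantTable), setting a constant by name resolves to writes into the registers the compiler assigned to it:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using float4 = std::array<float, 4>;  // one constant register

// Hypothetical model of a compiler symbol table: each extern constant name maps
// to a first register and a register count. Not the ID3DXConstantTable interface.
class ConstantTableSketch {
public:
    explicit ConstantTableSketch(std::size_t numRegisters)
        : registers_(numRegisters, float4{0.0f, 0.0f, 0.0f, 0.0f}) {}

    // Recorded by the "compiler" when it assigns registers to a constant.
    void declare(const std::string& name, std::size_t firstRegister, std::size_t count) {
        symbols_[name] = Entry{firstRegister, count};
    }

    // Application side: set a constant by name, one float4 per register.
    void set(const std::string& name, const std::vector<float4>& values) {
        const Entry& e = symbols_.at(name);  // throws std::out_of_range for unknown names
        if (values.size() > e.count) throw std::length_error("too many registers for " + name);
        for (std::size_t i = 0; i < values.size(); ++i) registers_.at(e.first + i) = values[i];
    }

    const float4& reg(std::size_t i) const { return registers_.at(i); }

private:
    struct Entry { std::size_t first; std::size_t count; };
    std::map<std::string, Entry> symbols_;
    std::vector<float4> registers_;
};
```

The application never needs to know which registers a constant occupies; only the name (or a handle to it) crosses the interface.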
30. ID3DXConstantTable
- Exposes constant parameter metadata
  - For convenient specification of shader input data
- SetMatrix( "curv", matrix )
  - String or handle
  - D3DXHANDLE hHandle
- SetVector()
- SetValue()
- Use effect parameters
  - Per-primitive expressions of parameters computed outside the vertex shader
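The last bullet can be illustrated with the usual world/view/projection chain. In this C++ sketch (the m[row][column] layout with column vectors and all function names are assumptions for illustration), the product proj * view * world depends only on per-primitive constants, so computing it once outside the shader gives the same transformed position (up to floating-point rounding) as evaluating the chain per vertex, with one matrix-vector product per vertex instead of three.

```cpp
#include <array>
#include <cassert>

using float4 = std::array<float, 4>;
using float4x4 = std::array<float4, 4>;  // m[row][column]

// Matrix * column vector: out[r] = dot(row r, v).
float4 mul(const float4x4& m, const float4& v) {
    float4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) out[r] += m[r][c] * v[c];
    return out;
}

// Matrix * matrix.
float4x4 mul(const float4x4& a, const float4x4& b) {
    float4x4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k) out[r][c] += a[r][k] * b[k][c];
    return out;
}

// Per-vertex cost if the shader evaluates the per-primitive expression itself:
// three matrix-vector products for every vertex.
float4 transformNaive(const float4x4& proj, const float4x4& view, const float4x4& world, const float4& v) {
    return mul(proj, mul(view, mul(world, v)));
}

// Cheaper alternative: the application computes proj * view * world once per draw
// and uploads the single result, so each vertex needs only one product.
float4x4 precomputeWorldViewProj(const float4x4& proj, const float4x4& view, const float4x4& world) {
    return mul(proj, mul(view, world));
}
```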
31. Performance
- Compiler updates will be frequent
- Microsoft has good compiler people
32. Current Back Ends
- Vertex Shader 1.1, 2.0
- Pixel Shader 1.1, 1.4, 2.0
33. Tips and Tricks
- Using the int datatype
- Using matrix datatypes
- How if statements work
- Using constant specialization
- Pixel shader 1.x optimizations
34. Int Datatype
- Declare indexing variables as ints
  - Avoids unnecessary frc instructions used to truncate
  - Allows for other int optimizations
- What are the frcs required for?
  - The float-to-int truncation must happen before multiplying by the size of the datatype to get correct results
35. Int Index Example
OutPos = mul(WorldArray[Index], Pos);

// float Index
frc r0.w, r1.w
add r2.w, -r0.w, r1.w
mul r9.w, r2.w, c61.x
mov a0.x, r9.w
m4x4 oPos, v0, c0[a0.x]

// int Index
mul r0.w, c60.x, r1.w
mov a0.x, r0.w
m4x4 oPos, v0, c0[a0.x]
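The arithmetic behind the two sequences can be replayed on the CPU. This C++ sketch assumes each matrix in the array occupies four constant registers (the role played by the constant in c60.x/c61.x, whose value the slide does not show); the function names are hypothetical. It mirrors the listed instructions and shows why truncation has to come before the multiply.

```cpp
#include <cassert>
#include <cmath>

// Assumption: a float4x4 occupies four consecutive constant registers.
constexpr int kRegistersPerMatrix = 4;

// Mirrors the float-index sequence: frc, subtract, then scale.
int baseRegisterFromFloat(float index) {
    float frac = index - std::floor(index);  // frc r0.w, r1.w
    float whole = index - frac;              // add r2.w, -r0.w, r1.w
    // mul r9.w, r2.w, c61.x ; mov a0.x, r9.w (the value is integral here,
    // so the float-to-int conversion mode does not matter)
    return static_cast<int>(whole * kRegistersPerMatrix);
}

// Mirrors the int-index sequence: the value is already integral, so only the scale remains.
int baseRegisterFromInt(int index) {
    return index * kRegistersPerMatrix;  // mul r0.w, c60.x, r1.w ; mov a0.x, r0.w
}

// The wrong order for a non-integral index: scaling before truncating can
// land in the middle of a matrix instead of on its first register.
int baseRegisterScaleFirst(float index) {
    return static_cast<int>(std::floor(index * kRegistersPerMatrix));
}
```

For an index of 1.5, truncating first gives register 4 (the start of matrix 1), while scaling first gives register 6, halfway through that matrix.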
36. Matrix Datatypes
- Advantages over an array of vectors
  - Will be stored in the optimal format
    - Column-major or row-major depending on usage
  - Easy to cast down to 4x3, 3x3, etc.
  - Allows for better performance and correct behavior
- Matrix is supported by a set of intrinsics
- Column-major is the preferred storage
  - Recommend mul(matrix, vector) order
  - Allows the compiler to use dp4/dp3 instructions
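Why operand order matters can be seen from what mul computes. In this C++ sketch (the m[row][column] layout and the function names are assumptions for illustration), each component of mul(matrix, vector) is a single 4-component dot product, which is exactly what one dp4 instruction performs. How rows or columns are actually placed in registers is up to the compiler, as the slide notes.

```cpp
#include <array>
#include <cassert>

using float4 = std::array<float, 4>;
using float4x4 = std::array<float4, 4>;  // m[row][column]

// What one dp4 instruction computes.
float dot4(const float4& a, const float4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// mul(matrix, vector): each output component is the dot product of one matrix row
// with the vector, so with one row per register this is four dp4s.
float4 mulMatrixVector(const float4x4& m, const float4& v) {
    return {dot4(m[0], v), dot4(m[1], v), dot4(m[2], v), dot4(m[3], v)};
}

float4x4 transpose(const float4x4& m) {
    float4x4 t{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) t[c][r] = m[r][c];
    return t;
}

// mul(vector, matrix) treats v as a row vector; it equals mul(transpose(m), v).
float4 mulVectorMatrix(const float4& v, const float4x4& m) {
    float4 out{};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r) out[c] += v[r] * m[r][c];
    return out;
}
```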
37. If Statements
- All back ends support if statements
- If branching is not supported (e.g. vs.1.1)
  - Both sides of the if are executed and the final result is chosen
  - Depending on the conditions, this can be expensive
- If constant branching is supported (e.g. vs.2.0)
  - If the condition is constant, constant-branch instructions are used
  - Otherwise, it falls back to the vs.1.1 solution
38. If Statement Example
if (Value > 0)
    Position = Value1;
else
    Position = Value2;

// calculate lerp value based on Value > 0
mov r1.w, c2.x
slt r0.w, c3.x, r1.w
// lerp between Value1 and Value2
mov r7, -c1
add r2, r7, c0
mad oPos, r0.w, r2, c1
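The branch-free sequence above can be replayed on the CPU. This C++ sketch assumes, as the listing implies, that c3.x holds 0, c0 holds Value1, and c1 holds Value2; `slt` and `selectWithoutBranch` are hypothetical helpers that mirror the instructions.

```cpp
#include <array>
#include <cassert>

using float4 = std::array<float, 4>;

// slt dst, a, b: 1.0 where a < b, else 0.0 (one component shown here).
float slt(float a, float b) { return a < b ? 1.0f : 0.0f; }

// Branch-free form of
//   if (Value > 0) Position = Value1; else Position = Value2;
// Both candidate results are available up front; the comparison only
// produces a 0/1 weight that selects between them:
//   mask = slt(0, Value); Position = mask * (Value1 - Value2) + Value2
float4 selectWithoutBranch(float value, const float4& value1, const float4& value2) {
    float mask = slt(0.0f, value);              // slt r0.w, c3.x, r1.w
    float4 position{};
    for (int i = 0; i < 4; ++i) {
        float diff = value1[i] - value2[i];     // mov r7, -c1 ; add r2, r7, c0
        position[i] = mask * diff + value2[i];  // mad oPos, r0.w, r2, c1
    }
    return position;
}
```

Because both values must be computed before the selection, the cost is the sum of both branches, which is why the previous slide warns that this can be expensive.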
39. Constant Specialization
- Specify constants that are to be treated as literals
  - Via the ID3DXEffectCompiler interface
- Call the CompileShader() method
  - Returns a pre-optimized shader or effect
- Easily generate multiple shaders optimized for specific cases
- Can help shader management by generating them on the fly
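Constant specialization happens inside the D3DX compiler, but its effect resembles compile-time specialization in C++. The sketch below is only an analogy: it does not use ID3DXEffectCompiler, and `shadeSpecialized` and `pickVariant` are hypothetical. Each template instantiation fixes some inputs as literals, so the optimizer can drop dead branches and fix loop counts, and the application picks a variant at run time.

```cpp
#include <cassert>

// Analogy only: template parameters stand in for constants marked as literals.
// Each instantiation is a separate specialized "shader" in which the loop count is
// fixed and the fog branch is either always applied or removed entirely.
template <bool kUseFog, int kNumLights>
float shadeSpecialized(float base, float perLight, [[maybe_unused]] float fogFactor) {
    float result = base;
    for (int i = 0; i < kNumLights; ++i) result += perLight;  // trip count known at compile time
    if constexpr (kUseFog) result *= fogFactor;             // discarded when kUseFog is false
    return result;
}

using ShadeFn = float (*)(float, float, float);

// Choosing among pre-specialized variants at run time, the way an application
// might pick from shaders generated for specific cases.
ShadeFn pickVariant(bool useFog) {
    if (useFog) return &shadeSpecialized<true, 2>;
    return &shadeSpecialized<false, 2>;
}
```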
40. PS 1.x Optimizations
- Modifiers automatically used
  - Complement, negate, x2, sat, etc.
- Optimizes for co-issue
  - Instructions are reordered to take advantage of co-issue
- Still keep 1.x shaders simple
  - No arbitrary swizzles
    - If an unsupported swizzle is requested, compilation will fail
  - Limited instruction count
- Complex shaders are possible
  - Modifiers allow for a lot of computation in a small number of instructions
  - Effective co-issue use helps as well
41. PS 1.x Sample Shader
sampler samplerA, samplerB;
float4 ColorScale;

float4 PShader(float4 Diffuse  : COLOR0,
               float4 Specular : COLOR1,
               float2 Tex1     : TEXCOORD0,
               float2 Tex2     : TEXCOORD1) : COLOR0
{
    float4 Sample1 = tex2D(samplerA, Tex1);
    float4 Sample2 = tex2D(samplerB, Tex2);
    float4 Color = (1 - Diffuse.a) * Sample1 + Diffuse.a * Sample2;
    Color = max(Color, 0);
    Color = min(Color, 1);
    Color = Color - .5f;
    Color = Color * ColorScale;
    return Color;
}

ps_1_4
texld r0, t0
texld r1, t1
lrp_sat r0, v0.w, r0, r1
mul r0, c0, r0_bias
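To see what the two compiled ALU instructions have to compute, the HLSL body can be replayed on the CPU. In this C++ sketch (the function name is hypothetical, and the two texture samples are passed in rather than fetched), the max/min pair is the saturation that becomes the _sat modifier and the subtraction of 0.5 is the _bias source modifier.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

using float4 = std::array<float, 4>;

// CPU replay of the arithmetic in PShader; the texture lookups are outside this sketch,
// so Sample1 and Sample2 are supplied by the caller.
float4 pshadeEmulated(const float4& diffuse, const float4& sample1, const float4& sample2,
                      const float4& colorScale) {
    const float a = diffuse[3];  // Diffuse.a
    float4 color{};
    for (int i = 0; i < 4; ++i) {
        float c = (1.0f - a) * sample1[i] + a * sample2[i];  // blend by Diffuse.a
        c = std::min(std::max(c, 0.0f), 1.0f);               // max(.., 0) then min(.., 1): saturate
        c = c - 0.5f;                                        // bias
        color[i] = c * colorScale[i];                        // scale by ColorScale
    }
    return color;
}
```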
42. Input Datatype Declarations
- Important to provide good type information for program inputs
- All integer inputs should be declared as int
  - Matrix indices, lookup values, etc.
  - If the data is not integer, odd results can happen!
- Take advantage of expansion to float4
  - e.g. declare Position as float4
  - If the vertex data has x, y, and z, then w will be filled in with 1.0
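The expansion rule can be written down directly. This C++ sketch (`expandToFloat4` is a hypothetical helper) models how a float4 input is filled from shorter vertex data, assuming the usual Direct3D defaults of 0 for missing y and z and 1 for a missing w. That default is why declaring Position as float4 gives w = 1.0 for xyz data, which a 4x4 transform needs in order to apply translation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using float4 = std::array<float, 4>;

// Fills a float4 shader input from vertex data with fewer components:
// supplied components are copied, missing y and z default to 0, and a
// missing w defaults to 1.
float4 expandToFloat4(const std::vector<float>& supplied) {
    if (supplied.empty() || supplied.size() > 4) throw std::invalid_argument("expected 1-4 components");
    float4 out{0.0f, 0.0f, 0.0f, 1.0f};
    for (std::size_t i = 0; i < supplied.size(); ++i) out[i] = supplied[i];
    return out;
}
```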
43. HLSL Shader Sample
- Wood sample shader
  - Thanks to Jason Mitchell (ATI)
- Procedural wood
  - Complex: rings, wobble, noise
44. hlsl_wood()
float4 hlsl_wood (float3 Pshade0  : TEXCOORD0,
                  float3 Pshade1  : TEXCOORD1,
                  float3 Pshade2  : TEXCOORD2,
                  float3 zWobble0 : TEXCOORD3,
                  float3 zWobble1 : TEXCOORD4,
                  float3 Peye     : TEXCOORD6,
                  float3 Neye     : TEXCOORD7) : COLOR
{
    float3 coloredNoise;
    float3 wobble;

    // Construct colored noise from three samples
    coloredNoise.x = tex3D (NoiseSampler, Pshade0);
    coloredNoise.y = tex3D (NoiseSampler, Pshade1);
    coloredNoise.z = tex3D (NoiseSampler, Pshade2);

    wobble.x = tex3D (NoiseSampler, zWobble0);
    wobble.y = tex3D (NoiseSampler, zWobble1);
    wobble.z = 0.5f;

    // Make signed
    coloredNoise = coloredNoise * 2.0f - 1.0f;
    wobble = wobble * 2.0f - 1.0f;

    // Scale noise and add to Pshade
    float3 noisyWobblyPshade = Pshade0 + coloredNoise * psConst3.w + wobble * psConst4.w;

    float scaledDistFromZAxis = sqrt(dot(noisyWobblyPshade.xy, noisyWobblyPshade.xy)) * psConst2.w;

    // Lookup blend factor from pulse train
    float4 blendFactor = tex2D (PulseTrainSampler, float2 (0.0f, scaledDistFromZAxis));

    // Blend wood colors together
    float3 albedo = psConst2 * blendFactor.x + psConst3 * (1 - blendFactor.x);

    // Compute normalized vector from vertex to light in eye space (Leye)
    float3 Leye = (psConst4 - Peye) / length(psConst4 - Peye);

    // Normalize interpolated normal
    Neye = Neye / length(Neye);

    // Compute Veye
    float3 Veye = -(Peye / length(Peye));

    // Compute half-angle
    float3 Heye = (Leye + Veye) / length(Leye + Veye);

    // Compute N.H
    float NdotH = clamp(dot(Neye, Heye), 0.0f, 1.0f);

    // Scale and bias exponent from pulse train
    float k = blendFactor.z;

    // Evaluate (N.H)^k via dependent read
    float specular = tex2D (VariablSpecularSampler, float2 (NdotH, k));

    // N.L
    float NdotL = dot(Neye, Leye);

    // "Half-Lambert" technique for more pleasing diffuse
    float diffuse = NdotL * 0.5f + 0.5f;

    // Gloss the specular term
    float gloss = blendFactor.y;

    return diffuse * float4 (albedo.r, albedo.g, albedo.b, 0.0f) + specular * gloss;
}
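The lighting half of the wood shader is ordinary vector math and can be checked on the CPU. This C++ sketch (helper names hypothetical; `lightPos` plays the role of psConst4; texture lookups omitted) reproduces Leye, Veye, Heye, the clamped N.H used as the specular table coordinate, and the "half-Lambert" term. Note that a surface lit edge-on still gets 0.5 rather than 0, which is what gives the softer diffuse falloff.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using float3 = std::array<float, 3>;

float3 sub(const float3& a, const float3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
float3 add(const float3& a, const float3& b) { return {a[0] + b[0], a[1] + b[1], a[2] + b[2]}; }
float dot(const float3& a, const float3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
float3 normalize(const float3& v) {
    float len = std::sqrt(dot(v, v));
    return {v[0] / len, v[1] / len, v[2] / len};
}

struct LightingTerms { float ndotH; float halfLambert; };

// CPU version of the lighting part of hlsl_wood, in eye space (eye at the origin).
LightingTerms woodLighting(const float3& pEye, const float3& nEye, const float3& lightPos) {
    float3 L = normalize(sub(lightPos, pEye));              // Leye: vertex to light
    float3 N = normalize(nEye);                             // renormalize interpolated normal
    float3 V = normalize({-pEye[0], -pEye[1], -pEye[2]});   // Veye: vertex to eye
    float3 H = normalize(add(L, V));                        // half-angle vector
    float ndotH = std::fmin(std::fmax(dot(N, H), 0.0f), 1.0f);  // clamp(dot(N, H), 0, 1)
    float ndotL = dot(N, L);
    float halfLambert = ndotL * 0.5f + 0.5f;                // maps [-1, 1] to [0, 1]
    return {ndotH, halfLambert};
}
```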
45. hlsl_wood() asm
Compiler output (excerpt):
...
texld r0, t0, s0
add r7.x, r0.x, r0.x
texld r2, t1, s0
add r7.y, r2.x, r2.x
add r9.xy, r7, c3.x
mad r11.xy, r9, c1.w, t0
texld r6, t3, s0
add r11.z, r6.x, r6.x
texld r1, t4, s0
add r11.w, r1.x, r1.x
add r11.zw, r11, c3.x
mad r8.x, r11.z, c2.w, r11.x
mad r8.y, r11.w, c2.w, r11.y
dp2add r3.w, r8, r8, c4.x
rsq r2.w, r3.w
mul r10.w, r2.w, r3.w
mul r5.y, c0.w, r10.w
mov r5.x, c3.w
...

Handwritten asm:
texld r3, t0, s0
texld r4, t1, s0
texld r5, t2, s0
texld r6, t3, s0
texld r7, t4, s0
mov r3.y, r4.x
mov r3.z, r5.x
mov r6.y, r7.x
mad r6, r6, c0.x, c0.y
mad r3, r3, c0.x, c0.y
mad r7, c3.w, r3, t0
mad r7, c4.w, r6, r7
dp2add r0, r7, r7, c1.w
rsq r0, r0.x
rcp r0, r0.x
mul r0, r0, c2.w
texld r0, r0, s1
mov r1, c3
lrp r2, r0.x, c2, r1
sub r4, c4, t6
dp3 r5.w, r4, r4
rsq r5.w, r5.w
mul r4, r4, r5.w
dp3 r6.w, t7, t7
rsq r6.w, r6.w
mul r5, t7, r6.w
dp3 r3.w, t6, t6
rsq r3.w, r3.w
mul r3, -t6, r3.w
add r6, r3, r4
dp3 r6.w, r6, r6
rsq r6.w, r6.w
mul r6, r6, r6.w
dp3_sat r6, r5, r6
mov r6.y, r0.z
texld r6, r6, s2
dp3 r5, r4, r5
mad_sat r5, r5, c0.z, c0.z
mul r6, r6, r0.y
mad r2, r5, r2, r6
mov oC0, r2

HLSL generates 37 ALU instructions
Handwritten asm is 35 instructions
46. Summary
- HLSL abstraction solves
  - Continuing hardware evolution
  - Shader programming complexity
- API semantics solve
  - Shader interoperability
47. Summary
- HLSL is the next step in graphics API/hardware evolution
- DirectX implementation provides
  - Close API integration
  - Semantic binding to low-level API
  - Shader management via D3DX effects
  - Full IDE support including debugging
  - Performant cross-vendor support
48. Action Items
- Check it out!
- Use it in your research and development
- Let us know what you think
  - directx_at_microsoft.com
  - http://msdn.microsoft.com/directx
49. Questions
50. Backup