60 4.ScreenSpaceClassificationforEfficientDeferredShading
if (rgb.b != 0.0)
    bits += RAW_SHADOW_FADE / 255.0;
if (rgb.g != 0.0)
    bits += RAW_LIGHT_SCATTERING / 255.0;
// Write results to red channel.
output.color = float4(bits, 0.0, 0.0, 0.0);
Listing 4.2. Packing classification results together in the second-pass shadow mask shader. Note
that this code could be simplified under shader model 4 or higher because those shader models
natively support integers and bitwise operators.
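As a rough sketch of the simplification the caption alludes to, the same packing can be written with integers and a bitwise OR. It is shown here in C++ so it can be run on the CPU. The bit positions chosen for RAW_SHADOW_FADE and RAW_LIGHT_SCATTERING are assumptions (their definitions are not shown in the listing), and both helper functions are hypothetical.

```cpp
#include <cstdint>

// Assumed bit positions: the real definitions are not shown in Listing 4.2.
constexpr std::uint8_t RAW_SHADOW_FADE      = 1u << 2;
constexpr std::uint8_t RAW_LIGHT_SCATTERING = 1u << 3;

// Hypothetical integer version: OR in one flag per nonzero channel.
std::uint8_t packShadowMaskBits(float green, float blue)
{
    std::uint8_t bits = 0;
    if (blue != 0.0f)  bits |= RAW_SHADOW_FADE;
    if (green != 0.0f) bits |= RAW_LIGHT_SCATTERING;
    return bits;
}

// The float path used by the listing: add flag / 255 for each set flag, then
// convert to a byte with round-to-nearest, as an 8-bit channel would store it.
std::uint8_t packShadowMaskFloat(float green, float blue)
{
    float bits = 0.0f;
    if (blue != 0.0f)  bits += RAW_SHADOW_FADE / 255.0f;
    if (green != 0.0f) bits += RAW_LIGHT_SCATTERING / 255.0f;
    return static_cast<std::uint8_t>(bits * 255.0f + 0.5f);
}
```

Because every flag occupies a distinct bit, adding the scaled flags and ORing the integer flags yield the same byte.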
4.4 Pixel Classification
It helps to explain how this pass works by describing the Split/Second G-buffer
format (see Table 4.1). Moore and Jefferies [2009] explain how we calculate a
per-pixel MSAA edge bit by comparing the results of centroid sampling against
linear sampling. We pack this into the high bit of our motion ID byte in the
G-buffer. For classifying MSAA edges, we extract this MSAA edge bit from
both MSAA fragments and also compare the normals of each of the fragments to
catch situations in which there are no polygon edges (e.g., polygon intersections).
The motion ID is used for per-pixel motion blur in a later pass, and each object
type has its own ID. For the sky, this ID is always zero, and we use this value
to classify sky pixels.
For sunlight classification, we test normals against the sun direction (unless
it’s a sky pixel). Listing 4.3 shows how we classify MSAA edge, sky, and
sunlight from both G-buffer 1 fragments.
Buffer    Red         Green         Blue         Alpha
Buffer 0  Albedo red  Albedo green  Albedo blue  Specular amount
Buffer 1  Normal x    Normal y      Normal z     Motion ID + MSAA edge
Buffer 2  Prelit red  Prelit green  Prelit blue  Specular power
Table 4.1. The Split/Second G-buffer format. Note that each component has an entry for
both 2X MSAA fragments.
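The layout of the alpha byte of Buffer 1 can be illustrated with a few bit operations. This is a minimal C++ sketch, assuming the motion ID occupies the low seven bits; the helper names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint8_t MSAA_EDGE_BIT  = 0x80; // high bit of the motion ID byte
constexpr std::uint8_t MOTION_ID_MASK = 0x7F; // assumption: ID uses the low 7 bits

// Hypothetical helper: combine a motion ID and an MSAA edge flag into one byte.
std::uint8_t packMotionByte(std::uint8_t id, bool msaaEdge)
{
    return static_cast<std::uint8_t>((id & MOTION_ID_MASK) |
                                     (msaaEdge ? MSAA_EDGE_BIT : 0));
}

// Hypothetical helpers: recover the two fields from the packed byte.
bool isMsaaEdge(std::uint8_t packed)          { return (packed & MSAA_EDGE_BIT) != 0; }
std::uint8_t getMotionID(std::uint8_t packed) { return packed & MOTION_ID_MASK; }

// Sky pixels carry a motion ID of zero with the edge bit clear, so the
// whole byte is zero.
bool isSky(std::uint8_t packed) { return packed == 0; }
```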
// Separate motion IDs and MSAA edge fragments from normals.
float2 edgeAndID_frags = float2(gbuffer1_frag0.w, gbuffer1_frag1.w);
// Classify MSAA edge (marked in high bit).
float2 msaaEdge_frags = (edgeAndID_frags > (128.0 / 255.0));
float msaaEdge = any(msaaEdge_frags);
float3 normalDiff = gbuffer1_frag0.xyz - gbuffer1_frag1.xyz;
msaaEdge += any(normalDiff);
// Classify sky (marked with motion ID of 0 – MSAA edge bit
// will also be 0).
float2 sky_frags = (edgeAndID_frags == 0.0);
float sky = any(sky_frags);
// Classify sunlight (except in sky).
float2 sunlight_frags;
sunlight_frags.x = sky_frags.x ? 0.0 : -dot(normal_frag0, sunDir);
sunlight_frags.y = sky_frags.y ? 0.0 : -dot(normal_frag1, sunDir);
float sunlight = any(sunlight_frags);
// Pack classification bits together.
#define RAW_MSAA_EDGE (1 << 4)
#define RAW_SKY (1 << 5)
#define RAW_SUN_LIGHT (1 << 6)
float bits = msaaEdge ? (RAW_MSAA_EDGE / 255.0) : 0.0;
bits += sky ? (RAW_SKY / 255.0) : 0.0;
bits += sunlight ? (RAW_SUN_LIGHT / 255.0) : 0.0;
Listing 4.3. Classifying MSAA edge, sky, and sun light. This code could also be simplified in
shader model 4 or higher.
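For comparison, the same classification might look roughly as follows when integers and bitwise operators are available. This is a CPU-side C++ sketch rather than shader code. The Fragment struct and both functions are hypothetical, the normals are assumed to be already decoded, and the sunlight test assumes a fragment counts as lit when its normal faces against the sun direction.

```cpp
#include <cstdint>

// Hypothetical decoded G-buffer 1 fragment.
struct Fragment
{
    float nx, ny, nz;        // decoded normal
    std::uint8_t edgeAndID;  // motion ID with the MSAA edge flag in the high bit
};

constexpr std::uint8_t RAW_MSAA_EDGE = 1u << 4;
constexpr std::uint8_t RAW_SKY       = 1u << 5;
constexpr std::uint8_t RAW_SUN_LIGHT = 1u << 6;

// Assumption: a non-sky fragment is sunlit when -dot(normal, sunDir) > 0.
static bool facesSun(const Fragment& f, const float sunDir[3])
{
    const float d = f.nx * sunDir[0] + f.ny * sunDir[1] + f.nz * sunDir[2];
    return f.edgeAndID != 0 && -d > 0.0f;
}

// Classify one pixel from its two MSAA fragments.
std::uint8_t classifyPixel(const Fragment& f0, const Fragment& f1,
                           const float sunDir[3])
{
    std::uint8_t bits = 0;

    // MSAA edge: the edge flag is set on either fragment, or the normals differ.
    const bool edgeFlag = ((f0.edgeAndID | f1.edgeAndID) & 0x80) != 0;
    const bool normalsDiffer =
        f0.nx != f1.nx || f0.ny != f1.ny || f0.nz != f1.nz;
    if (edgeFlag || normalsDiffer) bits |= RAW_MSAA_EDGE;

    // Sky: motion ID byte of zero on either fragment.
    if (f0.edgeAndID == 0 || f1.edgeAndID == 0) bits |= RAW_SKY;

    // Sunlight: either fragment faces the sun.
    if (facesSun(f0, sunDir) || facesSun(f1, sunDir)) bits |= RAW_SUN_LIGHT;

    return bits;
}
```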
4.5 Combining Classification Results
We now have per-pixel classification results for MSAA edge, sky, and sunlight,
but we need to downsample each 4 × 4 pixel area to get a per-tile classification
ID. This is as simple as ORing each 4 × 4 pixel area of the pixel classification
results together. We also need to combine these results with the depth-related
classification results to get a final classification ID per tile. Both these jobs are
done in a very different way on each platform in order to make the most of their
particular strengths and weaknesses, as explained in Section 4.9.
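Independently of how each platform implements it, the downsampling step can be described by a simple CPU reference. The following C++ sketch is illustrative only: the function name is hypothetical, and it assumes the image dimensions are multiples of the tile size.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int TILE_SIZE = 4; // pixels per tile edge, as in the text

// Hypothetical CPU reference: OR together the classification bits of every
// pixel in each TILE_SIZE x TILE_SIZE area.
// Assumes width and height are multiples of TILE_SIZE.
std::vector<std::uint8_t> downsampleClassification(
    const std::vector<std::uint8_t>& pixelBits, int width, int height)
{
    const int tilesX = width / TILE_SIZE;
    const int tilesY = height / TILE_SIZE;
    std::vector<std::uint8_t> tileBits(
        static_cast<std::size_t>(tilesX) * tilesY, 0);

    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            const int tile = (y / TILE_SIZE) * tilesX + (x / TILE_SIZE);
            tileBits[tile] |= pixelBits[y * width + x];
        }
    }
    return tileBits;
}
```

Combining with the depth-related results is then one more OR per tile, provided both sets of results use disjoint bits.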
4.6 Index Buffer Generation
Once both sets of classification results are ready, a GPU callback triggers index
buffer generation for each classification ID. There is one preallocated index
buffer containing exactly enough indices for all tiles. On the Xbox 360, we use the
RECT primitive type, which requires three indices per tile, and on the
PlayStation 3, we use the QUAD primitive type, which requires four indices per tile.
The index buffer references a prebuilt vertex buffer containing a vertex for each
tile corner. At a tile resolution of 4 × 4 pixels, this equates to 321 × 181
vertices at a screen resolution of 1280 × 720.
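The sizes quoted above follow directly from the tile grid. The short C++ sketch below works through the arithmetic; the function names are hypothetical, and the resolution is assumed to be a multiple of the tile size.

```cpp
// Number of tiles along one axis (assumes pixels is a multiple of tileSize).
constexpr int tilesAlong(int pixels, int tileSize)
{
    return pixels / tileSize;
}

// One vertex per tile corner: (tilesX + 1) * (tilesY + 1).
constexpr int tileCornerCount(int width, int height, int tileSize)
{
    return (tilesAlong(width, tileSize) + 1) *
           (tilesAlong(height, tileSize) + 1);
}

// Total indices when every tile emits indicesPerTile indices
// (3 for RECT, 4 for QUAD).
constexpr int tileIndexCount(int width, int height, int tileSize,
                             int indicesPerTile)
{
    return tilesAlong(width, tileSize) * tilesAlong(height, tileSize) *
           indicesPerTile;
}

static_assert(tileCornerCount(1280, 720, 4) == 321 * 181,
              "321 x 181 tile corners at 1280 x 720");
```

At 1280 × 720 this gives 320 × 180 = 57,600 tiles and 58,101 vertices, with 172,800 indices for RECT primitives or 230,400 for QUAD primitives.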
Index buffer generation is performed in three passes. The first pass iterates
over every tile and builds a table containing the number of tiles using each
classification ID, as shown in Listing 4.4. The second pass iterates over this table and
builds a table of offsets into the index buffer for each classification ID, as shown
in Listing 4.5. The third pass fills in the index buffer by iterating over every tile,
getting the current index buffer offset for the tile’s classification ID, writing new
indices for that tile to the index buffer, and advancing the index buffer pointer.
An example using the QUAD primitive is shown in Listing 4.6. We now have a
final index buffer containing indices for all tiles and a table of starting indices for
each classification ID. We’re ready to render!
#define SHADER_COUNT 128
#define TILE_COUNT (320 * 180)

unsigned int shaderTileCounts[SHADER_COUNT];

// Clear the per-shader tile counts.
for (int shader = 0; shader < SHADER_COUNT; shader++)
{
    shaderTileCounts[shader] = 0;
}

// Count the tiles that use each classification ID.
for (int tile = 0; tile < TILE_COUNT; tile++)
{
    unsigned char id = classificationData[tile];
    shaderTileCounts[id]++;
}
Listing 4.4. This code counts the number of tiles using each classification ID.
unsigned int *indexBufferPtrs[SHADER_COUNT];
int indexBufferOffsets[SHADER_COUNT];
int currentIndexBufferOffset = 0;

for (int shader = 0; shader < SHADER_COUNT; shader++)
{
    // Store shader index buffer ptr.
    indexBufferPtrs[shader] = indexBufferStart +
        currentIndexBufferOffset;

    // Store shader index buffer offset.
    indexBufferOffsets[shader] = currentIndexBufferOffset;

    // Update current offset.
    currentIndexBufferOffset += shaderTileCounts[shader] * INDICES_PER_PRIM;
}
Listing 4.5. This code builds the index buffer offsets. We store a pointer per shader for index
buffer generation and an index per shader for tile rendering.
#define TILE_WIDTH 320
#define TILE_HEIGHT 180

for (int y = 0; y < TILE_HEIGHT; y++)
{
    for (int x = 0; x < TILE_WIDTH; x++)
    {
        int tileIndex = y * TILE_WIDTH + x;
        unsigned char id = classificationData[tileIndex];
        unsigned int index0 = y * (TILE_WIDTH + 1) + x;

        *indexBufferPtrs[id]++ = index0;
        *indexBufferPtrs[id]++ = index0 + 1;
        *indexBufferPtrs[id]++ = index0 + TILE_WIDTH + 2;
        *indexBufferPtrs[id]++ = index0 + TILE_WIDTH + 1;
    }
}
Listing 4.6. This code builds the index buffer using the QUAD primitive.
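Taken together, the three passes amount to a counting sort of tiles by classification ID. The following self-contained C++ sketch combines them so the result can be checked on a small grid. The struct, the function name, and the use of std::vector with write positions instead of raw pointers are illustrative choices, not the shipped code. It assumes every ID is smaller than shaderCount and that classificationData holds tilesX * tilesY entries.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical result container for the sketch.
struct TileIndexBuffers
{
    std::vector<std::uint32_t> indices;    // QUAD indices grouped by ID
    std::vector<std::uint32_t> startIndex; // first index for each ID
    std::vector<std::uint32_t> tileCount;  // number of tiles for each ID
};

TileIndexBuffers buildTileIndexBuffer(
    const std::vector<std::uint8_t>& classificationData,
    int tilesX, int tilesY, int shaderCount)
{
    const std::uint32_t indicesPerPrim = 4; // QUAD primitive

    TileIndexBuffers out;
    out.tileCount.assign(shaderCount, 0);
    out.startIndex.assign(shaderCount, 0);
    out.indices.resize(classificationData.size() * indicesPerPrim);

    // Pass 1: count the tiles using each classification ID.
    for (std::uint8_t id : classificationData)
        out.tileCount[id]++;

    // Pass 2: turn the counts into starting offsets.
    std::vector<std::uint32_t> writePos(shaderCount, 0);
    std::uint32_t offset = 0;
    for (int shader = 0; shader < shaderCount; ++shader)
    {
        out.startIndex[shader] = offset;
        writePos[shader] = offset;
        offset += out.tileCount[shader] * indicesPerPrim;
    }

    // Pass 3: write four corner indices per tile into its ID's range.
    for (int y = 0; y < tilesY; ++y)
    {
        for (int x = 0; x < tilesX; ++x)
        {
            const std::uint8_t id = classificationData[y * tilesX + x];
            const std::uint32_t index0 = y * (tilesX + 1) + x;
            std::uint32_t& w = writePos[id];
            out.indices[w++] = index0;
            out.indices[w++] = index0 + 1;
            out.indices[w++] = index0 + tilesX + 2;
            out.indices[w++] = index0 + tilesX + 1;
        }
    }
    return out;
}
```

For a 2 × 2 grid with IDs {1, 0, 1, 1}, the single tile with ID 0 occupies indices 0 to 3, and the three tiles with ID 1 follow from index 4 onward.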
4.7 Tile Rendering
To render the tiles, we’d like to simply loop over each classification ID, activate
the shaders, then issue a draw call to render the part of the index buffer we
calculated earlier. However, it’s not that simple because we want to submit the index
buffer draw calls before we’ve received the classification results and built the
index buffers. This is because draw calls are submitted on the CPU during the
render submit phase, but the classification is done later on the GPU. We solve
this by submitting each shader activate, then inserting the draw calls between
each shader activate later on when we’ve built the index buffers and know their
starting indices and counts. This is done in a very platform-specific way and is
explained in Section 4.9.
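The platform-specific insertion of draw calls is not reproduced here, but the data those deferred draw calls need can be sketched in C++. The DrawRange struct and buildDrawRanges function are hypothetical, and skipping classification IDs that have no tiles is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one deferred draw call.
struct DrawRange
{
    int shader;               // classification ID / shader to activate
    std::uint32_t startIndex; // first index in the shared index buffer
    std::uint32_t indexCount; // number of indices to draw
};

// Build one range per classification ID that has at least one tile.
// Assumption: IDs with no tiles need no draw call and are skipped.
std::vector<DrawRange> buildDrawRanges(
    const std::vector<std::uint32_t>& tileCounts,
    std::uint32_t indicesPerPrim)
{
    std::vector<DrawRange> ranges;
    std::uint32_t start = 0;
    for (std::size_t shader = 0; shader < tileCounts.size(); ++shader)
    {
        const std::uint32_t count = tileCounts[shader] * indicesPerPrim;
        if (count != 0)
            ranges.push_back({static_cast<int>(shader), start, count});
        start += count;
    }
    return ranges;
}
```

Each range corresponds to one slot between two consecutive shader activations, to be filled in once the index buffer has been built.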
4.8 Shader Management
Rather than trying to manage 128 separate shaders, we opted for a single
uber-shader with all lighting properties included, and we used conditional compilation
to remove the code we didn’t need in each case. This is achieved by prefixing the
uber-shader with a fragment defining just the properties needed for each shader.
Listing 4.7 shows an example for a shader only requiring sunlight and soft
shadow. The code itself is not important and just illustrates how we conditionally
compile out the code we don’t need.
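One way the prefixing step could be automated is sketched below in C++. The mapping from classification bits to property names is an assumption: only the SUN_LIGHT bit position (1 << 6) is taken from Listing 4.3, while the shadow bit positions, the Property struct, and buildShaderSource are hypothetical.

```cpp
#include <string>

// Hypothetical pairing of a classification bit with an uber-shader property.
struct Property
{
    unsigned bit;
    const char* define;
};

// Assumption: the shadow bit positions are illustrative only;
// the sunlight bit matches RAW_SUN_LIGHT in Listing 4.3.
static const Property kProperties[] = {
    { 1u << 0, "SOLID_SHADOW" },
    { 1u << 1, "SOFT_SHADOW"  },
    { 1u << 6, "SUN_LIGHT"    },
};

// Prefix the uber-shader source with one #define per property whose bit
// is set in the classification ID.
std::string buildShaderSource(unsigned classificationID,
                              const std::string& uberShader)
{
    std::string source = "// Fragment defining light properties.\n";
    for (const Property& p : kProperties)
    {
        if (classificationID & p.bit)
            source += std::string("#define ") + p.define + "\n";
    }
    return source + uberShader;
}
```

Compiling the returned string once for each classification ID would then yield one specialized shader per ID from a single source file.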
// Fragment defining light properties.
#define SUN_LIGHT
#define SOFT_SHADOW
// Uber-shader starts here.
...
// Output color starts with prelit.
float3 oColor = preLit;
// Get sun shadow contribution.
#if defined(SOFT_SHADOW) && defined(SUN_LIGHT)
float sunShadow = CascadeShadowMap_8Taps(worldPos, depth);
#elif defined(SUN_LIGHT) && !defined(SOLID_SHADOW)