Much is made of the advantages of deferred shading, but to judge from most articles on the subject, its only disadvantages are translucency and multisample antialiasing. This does a disservice to forward rendering, as there are many popular techniques in real-time computer graphics that are impractical in a deferred shading engine. One of the best examples is light include / exclude groups.
Although it may not have much basis in reality, being able to specify exactly which lights affect which objects is a powerful tool in lighting design. It can be used stylistically to make certain objects stand out from their environments, for example by making player avatars artificially brighter than their surroundings. It can also be used to compensate for limitations of the real-time lighting model; for example, skin can be lit with additional lights to simulate light transmission and subsurface scattering. Light groups are also extremely useful in optimizing shadow computation. A common problem that must be dealt with when lighting indoor environments is light bleeding through walls. Although this can be solved by any number of real-time shadowing techniques, it simply isn’t affordable on current hardware to cast real-time shadows from every light. A cheap and flexible alternative is to use light groups to manually specify that lights in a room only affect objects in the same room.
Implementing light groups is cheap and easy with forward rendering. Each light can store a list of meshes that it either explicitly includes or explicitly excludes. Since meshes are already being matched with lights on the CPU, it is an easy thing to query the include / exclude list for each light and discard lights that don’t affect the current mesh. The implementation of light groups in a deferred environment, however, is much less obvious.
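To make the forward case concrete, here is a minimal sketch of the CPU-side filtering described above. The type and function names (`Light`, `GroupMode`, `selectLightsForMesh`) are hypothetical and not taken from any particular engine; the only assumption is that each light carries either an include list or an exclude list of mesh IDs.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical types for illustration only.
using MeshId = std::uint32_t;

enum class GroupMode { AffectAll, IncludeOnly, ExcludeListed };

struct Light {
    GroupMode mode = GroupMode::AffectAll;
    std::unordered_set<MeshId> meshes;  // include or exclude list, depending on mode
};

// Returns true if the light's group rules admit the given mesh.
inline bool lightAffectsMesh(const Light& light, MeshId mesh) {
    const bool listed = light.meshes.count(mesh) != 0;
    switch (light.mode) {
        case GroupMode::IncludeOnly:   return listed;
        case GroupMode::ExcludeListed: return !listed;
        case GroupMode::AffectAll:     return true;
    }
    return true;
}

// CPU-side light binding: keep only the lights that affect this mesh.
// Rejected lights are never submitted with the mesh, so they cost nothing on the GPU.
inline std::vector<const Light*> selectLightsForMesh(const std::vector<Light>& lights,
                                                     MeshId mesh) {
    std::vector<const Light*> selected;
    for (const Light& light : lights) {
        if (lightAffectsMesh(light, mesh)) selected.push_back(&light);
    }
    return selected;
}
```

In a real engine this check would probably sit alongside the existing bounding-volume test that matches lights to meshes, which is why it is nearly free.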
One option for supporting light groups in a deferred renderer is to add an additional attribute to the G-Buffer specifying a unique mesh ID. The light shaders, during the deferred pass, can read back the mesh ID per pixel and compare that to an include / exclude list passed to the shader in a texture or in constant registers. Mesh IDs can be allocated on demand and reused cleverly across the frame, so it is possible that as few as 8 bits are needed for them in the G-Buffer. Likewise, with some careful engineering the light shader can perform the light group classification in as little as one indirect texture fetch per pixel. Unfortunately, no amount of optimization in this technique can address its largest drawback. The light group classification is performed in the fragment shader, which means that a light’s fragment shader must run even for pixels that aren’t in the light’s group. That is a huge step back from the forward implementation, which incurs absolutely zero GPU cost for lighting meshes that aren’t in a light’s group. If one of the principal advantages of light groups is that they allow for optimized, statically defined shadows, this deferred implementation isn’t going to cut it.
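The cost problem is easy to see in a small CPU model of the light pass. This is only a sketch: the names are hypothetical, the 256-entry bitset stands in for the lookup texture or constant registers, and a real version would live in the light’s fragment shader.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-light table: one bit per 8-bit mesh ID,
// standing in for the include / exclude list passed to the shader.
struct LightGroupTable {
    std::bitset<256> affects;
};

struct PassStats {
    std::size_t shaderInvocations = 0;  // pixels for which the light shader ran
    std::size_t pixelsLit = 0;          // pixels that passed the group test
};

// Models one light's deferred pass over the mesh ID channel of the G-Buffer.
inline PassStats runLightPass(const std::vector<std::uint8_t>& meshIdChannel,
                              const LightGroupTable& table) {
    PassStats stats;
    for (std::uint8_t meshId : meshIdChannel) {
        // The classification happens inside the shader, so the shader
        // is invoked for every covered pixel, lit or not.
        ++stats.shaderInvocations;
        if (!table.affects.test(meshId)) continue;  // rejected by the group test
        ++stats.pixelsLit;
    }
    return stats;
}
```

However cheap the lookup is, `shaderInvocations` always equals the number of covered pixels, which is the drawback described above.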
A second, and far more common, option for supporting light groups with deferred shading is to define only a fixed number of light groups. With this approach meshes and lights can be associated with multiple light groups, and lights only affect meshes with which they share membership in a group. The number of light groups is usually selected to be less than or equal to 8, and the stencil buffer is written during the G-Buffer pass as a per-pixel mask of the groups each mesh belongs to. During the deferred pass lights use the stencil test hardware to discard any pixels that aren’t in any of the light’s groups. This technique for supporting light groups has several advantages over the previous technique. It requires no additional storage in the G-Buffer, it adds no additional cost to the light shaders, and it has the potential at least to discard pixels that aren’t in a particular light’s groups prior to executing the light’s fragment shader.
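The per-pixel test in this scheme boils down to a single masked comparison. The sketch below models it on the CPU; the helper names are hypothetical. It assumes the stencil value holds the mesh’s group bits, the light’s groups are used as the stencil read mask, and the comparison is "not equal" against a reference of zero, which is roughly what `glStencilFunc(GL_NOTEQUAL, 0, lightGroupMask)` would express in OpenGL.

```cpp
#include <cstdint>

// Bit for a single light group (assumes group is in the range 0..7).
constexpr std::uint8_t groupBit(unsigned group) {
    return static_cast<std::uint8_t>(1u << group);
}

// Models a NOTEQUAL stencil comparison with reference 0 and the light's
// group mask as the read mask: passes when the pixel's mesh shares at
// least one group with the light.
inline bool stencilTestPasses(std::uint8_t stencilValue, std::uint8_t lightGroupMask) {
    const std::uint8_t reference = 0;
    return (reference & lightGroupMask) != (stencilValue & lightGroupMask);
}
```

For example, a mesh written with groups 0 and 2 passes for a light in group 2, fails for a light in group 1 only, and passes again for a light in groups 0 and 1.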
There are some downsides, however. The stencil buffer can be a precious commodity in a deferred renderer. Engines using stencil shadows will need to reserve a piece of it, and most deferred renderers will also use a few bits of the stencil buffer to mask and quickly cull pixels that are unlit or outside a light’s depth bounds. One consequence of this is that the number of bits available in the stencil buffer for a light group mask is usually less than 8 (4 to 6 in my experience). Six unique light groups are better than none, but the limit does make it much more difficult to use light groups for statically defined shadows. Assigning light groups to rooms in a level to prevent light bleeding through walls becomes a form of the computationally challenging graph coloring problem.
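As a rough illustration of the room assignment problem, the sketch below runs a simple greedy coloring over a room adjacency graph. Everything here is an assumption for illustration: the function name, the adjacency input (rooms whose lights could bleed into each other), and the choice of a greedy heuristic, which is not optimal and can run out of groups even when a valid assignment exists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// adjacency[r] lists the rooms whose lights could bleed into room r
// (hypothetical input, assumed symmetric, indices assumed valid).
// Returns one group index per room, or -1 where the greedy pass
// found no group that differs from all already-assigned neighbors.
inline std::vector<int> assignRoomGroups(
        const std::vector<std::vector<std::size_t>>& adjacency, int maxGroups) {
    assert(maxGroups > 0);
    std::vector<int> group(adjacency.size(), -1);
    for (std::size_t room = 0; room < adjacency.size(); ++room) {
        // Mark the groups already used by neighboring rooms.
        std::vector<bool> taken(static_cast<std::size_t>(maxGroups), false);
        for (std::size_t neighbor : adjacency[room]) {
            if (group[neighbor] >= 0) {
                taken[static_cast<std::size_t>(group[neighbor])] = true;
            }
        }
        // Pick the lowest free group, if any.
        for (int g = 0; g < maxGroups; ++g) {
            if (!taken[static_cast<std::size_t>(g)]) {
                group[room] = g;
                break;
            }
        }
    }
    return group;
}
```

With three mutually adjacent rooms, two groups are not enough and one room is left unassigned, which is the kind of failure a level designer runs into when only a handful of stencil bits are free.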
Another consequence of this technique’s reliance on the stencil buffer is that since its stencil test logic is complicated and changes with every light, it may disable or dramatically reduce the effectiveness of the GPU’s pre-fragment shader stencil cull. On most hardware early stencil cull is accomplished by storing a low resolution, low bit depth cache of the stencil buffer. Blocks of between 16 and 64 pixels are reduced to a single bit that represents a conservative average of the stencil function across all pixels in the block. This cache is then consulted prior to running the fragment shader to quickly discard pixels which are guaranteed to fail the stencil test. Given this implementation of early stencil in the hardware, there are two options for utilizing it with light groups.
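The coarse cache described above can be modeled in a few lines. This is a simplified one-dimensional sketch with hypothetical names, not a description of any specific GPU: each tile is reduced to one bit meaning "at least one pixel in this tile passes", and the stencil test is assumed to be the "any masked bit set" test used for light groups.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stencil test assumed for this sketch: pass when any bit selected by the mask is set.
inline bool stencilPasses(std::uint8_t value, std::uint8_t mask) {
    return (value & mask) != 0;
}

// Reduce each tile of tileSize pixels to one conservative bit:
// true means at least one pixel in the tile passes the test.
inline std::vector<bool> buildEarlyStencilCache(const std::vector<std::uint8_t>& stencil,
                                                std::size_t tileSize, std::uint8_t mask) {
    assert(tileSize > 0);
    std::vector<bool> tileMayPass((stencil.size() + tileSize - 1) / tileSize, false);
    for (std::size_t i = 0; i < stencil.size(); ++i) {
        if (stencilPasses(stencil[i], mask)) tileMayPass[i / tileSize] = true;
    }
    return tileMayPass;
}

struct CullStats {
    std::size_t rejectedByCache = 0;   // discarded by the coarse cache
    std::size_t rejectedPerPixel = 0;  // survived the cache, failed the full test
    std::size_t passed = 0;            // passed the full-resolution test
};

// Classify every pixel using the cache first, then the full-resolution test.
inline CullStats classifyPixels(const std::vector<std::uint8_t>& stencil,
                                std::size_t tileSize, std::uint8_t mask,
                                const std::vector<bool>& tileMayPass) {
    assert(tileSize > 0);
    CullStats stats;
    for (std::size_t i = 0; i < stencil.size(); ++i) {
        if (!tileMayPass[i / tileSize]) ++stats.rejectedByCache;
        else if (!stencilPasses(stencil[i], mask)) ++stats.rejectedPerPixel;
        else ++stats.passed;
    }
    return stats;
}
```

Note that the cache is built for one specific mask. If the next light uses a different mask, the cached bits no longer describe the test being run, which is exactly why a changing per-light stencil configuration is a problem.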
The first option is to sort all lights by their light group mask and refresh the early stencil cache between lights when the mask changes. This provides pre-fragment shader culling for light groups (and anything else you might be using the stencil buffer for), but it requires a potentially expensive refresh of the early stencil cache. It also adds a new criterion on which to sort lights, and there are plenty of other deferred shading techniques that have their own competing sort order requirements.
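A small sketch shows why the sort matters. It assumes, as a simplification, that the early stencil cache must be refreshed once for the first light and again every time the group mask differs from the previous light’s mask. The `DeferredLight` struct and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical light record: only the fields needed for this sketch.
struct DeferredLight {
    std::uint8_t groupMask;
    int id;
};

// Counts cache refreshes for a given draw order, assuming one refresh
// for the first light and one whenever the mask changes between lights.
inline std::size_t countCacheRefreshes(const std::vector<DeferredLight>& lights) {
    std::size_t refreshes = 0;
    for (std::size_t i = 0; i < lights.size(); ++i) {
        if (i == 0 || lights[i].groupMask != lights[i - 1].groupMask) ++refreshes;
    }
    return refreshes;
}

// Groups lights with identical masks together; stable so that any
// existing order is preserved within each mask.
inline void sortByGroupMask(std::vector<DeferredLight>& lights) {
    std::stable_sort(lights.begin(), lights.end(),
                     [](const DeferredLight& a, const DeferredLight& b) {
                         return a.groupMask < b.groupMask;
                     });
}
```

Five lights alternating between two masks need five refreshes unsorted and two after sorting. Using a stable sort keeps whatever secondary order the renderer already had within each mask, though it cannot reconcile sort keys that genuinely conflict.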
The second option is to give up on pre-fragment shader culling for light groups and instead use the early stencil cache solely for stencil shadows and light depth bound culling. To do this one would simply mask out and ignore the stencil bits representing light groups when populating the early stencil cache. With this option light groups no longer have any impact on performance: no cost, no benefit. This option may sound like a defeat, but even it isn’t available on all platforms. The Xbox 360, for example, can’t apply a mask during the construction of the early stencil cache, so there is no way to exclude light groups from early stencil cull. The best option on that platform without refreshing early stencil frequently is to make the early stencil test much more conservative. That means that the presence of light groups will actually allow some pixels that would have been culled due to stencil shadows or light depth bounds to be processed by the fragment shader, and light groups become an unpredictable performance cost.
On the Xbox 360 and PlayStation 3, at least, the early stencil hardware is well documented and fully exposed to developers. On the PC, however, the concept of early stencil cull is entirely hidden from developers and implemented transparently within the driver. This makes sense since early stencil cull is an optimization very specific to individual hardware implementations and has no impact on the correctness of the output, but it makes relying on early stencil cull for performance extremely dangerous. Even when you know a lot about your rendering pipeline and your game’s data, it isn’t clear what the optimal use of early stencil cull is for light groups. Drivers don’t have enough information to make the best choice on your behalf, and when the stencil configuration gets too complicated they are likely to just disable early stencil cull entirely! So a scary third option for implementing light groups with the stencil buffer is that you lose all early stencil cull during your deferred pass, and stencil shadows and light depth bound culling get a lot more expensive.
I’m a huge fan of deferred techniques, but it is important to remember that they are not a one-for-one replacement for forward rendering. Forward rendering creates a tight coupling between the geometry and the frame buffer, which is perhaps its greatest weakness but also its greatest strength. When evaluating a move from forward to deferred rendering, don’t overlook the fact that light groups and a host of similar mesh-centric features aren’t going to be available anymore.