The Dark Side of Deferred Shading: Light Groups

Much is made of the advantages of deferred shading, but to read most articles on the subject, one would think the only disadvantages are translucency and multisample antialiasing.  This does a disservice to forward rendering, however, as there are many popular techniques in real-time computer graphics that are impractical in a deferred shading engine.  One of the best examples of these techniques is light include / exclude groups.

Although it may not have much basis in reality, being able to specify exactly which lights affect which objects is a powerful tool in lighting design.  It can be used stylistically to make certain objects stand out from their environments, for example making player avatars artificially brighter than their surroundings.  It can also be used to compensate for limitations of the real-time lighting model, for example skin can be lit with additional lights to simulate light transmission and subsurface scattering.  Light groups are also extremely useful in optimizing shadow computation.  A common problem that must be dealt with when lighting indoor environments is light bleeding through walls.  Although this can be solved by any number of real-time shadowing techniques, it simply isn’t affordable on current hardware to cast real-time shadows from every light.  A cheap and flexible alternative is to use light groups to manually specify that lights in a room only affect objects in the same room.

Implementing light groups is cheap and easy with forward rendering.  Each light can store a list of meshes that it either explicitly includes or explicitly excludes.  Since meshes are already being matched with lights on the CPU, it is an easy thing to query the include / exclude list for each light and discard lights that don’t affect the current mesh.  The implementation of light groups in a deferred environment, however, is much less obvious.
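The forward-path bookkeeping described above can be sketched in a few lines.  The names here are illustrative, not taken from any particular engine:

```cpp
#include <cstdint>
#include <initializer_list>
#include <unordered_set>
#include <vector>

// Each light carries an explicit include or exclude list of mesh IDs.
enum class GroupMode { Include, Exclude };

struct Light {
    GroupMode mode = GroupMode::Exclude;   // empty exclude list = affects everything
    std::unordered_set<uint32_t> meshIds;  // the include / exclude list

    bool Affects(uint32_t meshId) const {
        bool listed = meshIds.count(meshId) != 0;
        return mode == GroupMode::Include ? listed : !listed;
    }
};

// Since meshes are already matched with lights on the CPU, discarding lights
// that don't affect the current mesh is one extra query per light.
std::vector<const Light*> LightsForMesh(const std::vector<Light>& lights,
                                        uint32_t meshId) {
    std::vector<const Light*> result;
    for (const Light& l : lights)
        if (l.Affects(meshId)) result.push_back(&l);
    return result;
}
```

Lights that fail the query are simply never bound when the mesh is drawn, which is why the forward implementation is free on the GPU.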

One option for supporting light groups in a deferred renderer is to add an additional attribute to the G-Buffer specifying a unique mesh ID.  The light shaders, during the deferred pass, can read back the mesh ID per pixel and compare it to an include / exclude list passed to the shader in a texture or in constant registers.  Mesh IDs can be allocated on demand and reused cleverly across the frame, so as few as 8 bits may suffice for them in the G-Buffer.  Likewise, with some careful engineering the light shader can perform the light group classification in as little as one indirect texture fetch per pixel.  Unfortunately, no amount of optimization in this technique can address its largest drawback.  The light group classification is performed in the fragment shader, which means that a light’s fragment shader must run even for pixels that aren’t in the light’s group.  That is a huge step backward from the forward implementation, which incurs absolutely zero GPU cost for lighting meshes that aren’t in a light’s group.  If one of the principal advantages of light groups is that they allow for optimized, statically-defined shadows, this deferred implementation isn’t going to cut it.
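A CPU-side sketch of that classification step, assuming 8-bit mesh IDs and a hypothetical per-light 256-entry lookup table standing in for the "one indirect texture fetch":

```cpp
#include <array>
#include <cstdint>
#include <initializer_list>

// With 8-bit mesh IDs, a light's include / exclude list can be baked into a
// 256-entry table (in practice a small texture or constant buffer).
// groupTable[id] != 0 means the light affects that mesh.
using LightGroupTable = std::array<uint8_t, 256>;

LightGroupTable BakeTable(std::initializer_list<uint8_t> includedIds) {
    LightGroupTable t{};                   // zero-initialized: nothing included
    for (uint8_t id : includedIds) t[id] = 1;
    return t;
}

// What the light's fragment shader would do per pixel: read the mesh ID from
// the G-Buffer, then perform one dependent lookup into the light's table.
// Note that this runs even for pixels the light ultimately doesn't affect.
bool PixelInLightGroup(const LightGroupTable& table, uint8_t gbufferMeshId) {
    return table[gbufferMeshId] != 0;
}
```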

A second, and far more common, option for supporting light groups with deferred shading is to define only a fixed number of light groups.  With this approach meshes and lights can be associated with multiple light groups, and lights only affect meshes with which they share membership in a group.  The number of light groups is usually selected to be less than or equal to 8, and the stencil buffer is written during the G-Buffer pass as a per-pixel mask of the groups each mesh belongs to.  During the deferred pass lights use the stencil test hardware to discard any pixels that aren’t in any of the light’s groups.  This technique for supporting light groups has several advantages over the previous technique.  It requires no additional storage in the G-Buffer, it adds no additional cost to the light shaders, and it has the potential at least to discard pixels that aren’t in a particular light’s groups prior to executing the light’s fragment shader.
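The per-pixel group test in this scheme reduces to a bitwise AND, which is exactly what the stencil hardware can evaluate with func = NOTEQUAL, ref = 0, and the light's group mask as the read mask.  A minimal sketch of the logic:

```cpp
#include <cstdint>

// Each mesh writes its group membership into the stencil buffer as a bitmask
// during the G-Buffer pass; a light passes the stencil test wherever it shares
// at least one group with the pixel.
constexpr int kMaxGroups = 8;   // bounded by the stencil bits available

// Equivalent of the stencil test the hardware performs per pixel.
bool StencilPasses(uint8_t pixelGroupMask, uint8_t lightGroupMask) {
    return (pixelGroupMask & lightGroupMask) != 0;
}
```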

There are some downsides, however.  The stencil buffer can be a precious commodity in a deferred renderer.  Engines using stencil shadows will need to reserve a piece of it, and most deferred renderers will also use a few bits of the stencil buffer to mask and quickly cull pixels that are unlit or outside a light’s depth bounds.  One consequence of this is that the number of bits available in the stencil buffer for a light group mask is usually less than 8 (4 to 6 in my experience).  Six unique light groups are better than none, but the limit does make it much more difficult to use light groups for statically defined shadows.  Assigning light groups to rooms in a level to prevent light bleeding through walls becomes a form of the computationally challenging graph coloring problem.
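To make the graph coloring connection concrete, here is a hypothetical greedy assignment of groups to rooms, where adjacent rooms (those that could bleed light into each other) must receive different groups.  Greedy coloring is only a heuristic; with 4 to 6 groups it can fail on densely connected levels:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// adjacency[i] lists the rooms that share a wall with room i.  Returns the
// group assigned to each room, or -1 if all groups were taken by neighbors.
std::vector<int> AssignGroups(const std::vector<std::vector<int>>& adjacency,
                              int groupCount) {
    std::vector<int> group(adjacency.size(), -1);
    for (size_t room = 0; room < adjacency.size(); ++room) {
        uint32_t used = 0;  // bitmask of groups already taken by neighbors
        for (int neighbor : adjacency[room])
            if (group[neighbor] >= 0) used |= 1u << group[neighbor];
        for (int g = 0; g < groupCount; ++g)
            if (!(used & (1u << g))) { group[room] = g; break; }
    }
    return group;
}
```

A corridor of three rooms needs only two groups, since the two end rooms can safely share one.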

Another consequence of this technique’s reliance on the stencil buffer is that since its stencil test logic is complicated and changes with every light, it may disable or dramatically reduce the effectiveness of the GPU’s pre-fragment shader stencil cull.  On most hardware early stencil cull is accomplished by storing a low resolution, low bit depth cache of the stencil buffer.  Blocks of between 16 and 64 pixels are reduced to a single bit that represents a conservative average of the stencil function across all pixels in the block.  This cache is then consulted prior to running the fragment shader to quickly discard pixels which are guaranteed to fail the stencil test.  Given this implementation of early stencil in the hardware, there are two options for utilizing it with light groups.
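The conservative reduction can be modeled as follows: a block may be culled only when every pixel in it is guaranteed to fail the stencil test.  This sketch uses the light-group test from earlier (stencil AND light mask, NOTEQUAL zero); the real hardware caches the reduced bit rather than re-reading pixels, but the pass/cull decision is the same:

```cpp
#include <cstdint>
#include <vector>

// One entry of the early stencil cache: a block of 16-64 stencil values
// reduced to a single conservative bit.  Returns true ("may pass") if any
// pixel in the block could pass the light's stencil test, so the fragment
// shader must still run for the whole block.
bool BlockMayPass(const std::vector<uint8_t>& blockStencil,
                  uint8_t lightGroupMask) {
    for (uint8_t s : blockStencil)
        if (s & lightGroupMask) return true;   // at least one pixel could pass
    return false;                              // whole block guaranteed to fail
}
```

The conservatism is the crux of the problem described below: a block with even one pixel in the light's groups cannot be culled, and on some hardware the cached bit cannot be rebuilt per light without an expensive refresh.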

The first option is to sort all lights by their light group mask and refresh the early stencil cache between lights when the mask changes.  This provides pre-fragment shader culling for light groups (and anything else you might be using the stencil buffer for), but it requires a potentially expensive refresh of the early stencil cache.  It also adds a new criterion on which to sort lights, and there are plenty of other deferred shading techniques that have their own competing sort order requirements.

The second option is to give up on pre-fragment shader culling for light groups and instead use the early stencil cache solely for stencil shadows and light depth bound culling.  To do this one would simply mask out and ignore the stencil bits representing light groups when populating the early stencil cache.  With this option light groups no longer have any impact on performance: no cost, no benefit.  This option may sound like a defeat, but even it isn’t available on all platforms.  The Xbox 360, for example, can’t apply a mask during the construction of the early stencil cache, so there is no way to exclude light groups from early stencil cull.  The best option on that platform, short of refreshing early stencil frequently, is to make the early stencil test much more conservative.  That means that the presence of light groups will actually allow some pixels that would have been culled due to stencil shadows or light depth bounds to be processed by the fragment shader, and light groups become an unpredictable performance cost.

On the Xbox 360 and PlayStation 3, at least, the early stencil hardware is well documented and fully exposed to developers.  On the PC, however, the concept of early stencil cull is entirely hidden from developers and implemented transparently within the driver.  This makes sense, since early stencil cull is an optimization very specific to individual hardware implementations and has no impact on the correctness of the output, but it makes relying on early stencil cull for performance extremely dangerous.  Even when you know a lot about your rendering pipeline and your game’s data, it isn’t clear what the optimal use of early stencil cull is for light groups.  Drivers don’t have enough information to make the best choice on your behalf, and when the stencil configuration gets too complicated they are likely to just disable early stencil cull entirely!  So a scary third option for implementing light groups with the stencil buffer is that you lose all early stencil cull during your deferred pass, and stencil shadows and light depth bound culling get a lot more expensive.

I’m a huge fan of deferred techniques, but it is important to remember that they are not a one-for-one replacement for forward rendering.  Forward rendering creates a tight coupling between the geometry and the frame buffer; its greatest weakness perhaps, but also its greatest strength.  When evaluating a move from forward to deferred rendering, don’t overlook the fact that light groups and a host of similar mesh-centric features aren’t going to be available anymore.

9 Comments

  1. Matt says:

    By rendering some objects into the lighting buffer before the lights in a deferred shading pipeline, they can use their own lighting model, and their result is added to the contributions from the other lights. Emission, light groups, subsurface scattering, refraction, and reflection are all good candidates for this pre-light, post G-buffer pass. This is quite similar to the blended object pass after lighting, but the opaque self-lit surfaces can also receive light from the hundreds of deferred lights if desired.

    1. Adrian Stone says:

      Sure, you can always combine forward and deferred approaches in the same scene, but that’s not always a good solution. There is an upfront cost to deferred shading in constructing the G-Buffer. If you then render a lot of your objects with forward shading you squander that investment.

      Arguments that deferred unfriendly features like translucency and shadow groups should be handled with a forward pass only hold up if the number of objects using those features is very small.

  2. Pal Engstad says:

    Notice that the deferred lighting technique is not affected by this problem. In deferred lighting, geometry is rendered as a traditional forward pass. All that is required is that the lighting from light groups is done in this pass, not in the deferred pass.

    1. Adrian Stone says:

      See my reply to Matt. You can choose to render some lights in a forward pass with either deferred shading or deferred lighting, but I don’t accept that as a solution. My article is not about whether you can mix forward and deferred techniques in the same renderer, it is about whether you can implement common forward-rendering features in a deferred fashion.

    2. Pal Engstad says:

      I don’t see how “an upfront cost to deferred shading” relates to a supposed upfront cost to deferred lighting. The answer to your question (can one implement [all] common forward-rendering features in a deferred fashion?) is trivially no. One cannot easily handle translucency, and light groups are also tricky. One cannot easily support multiple different light models, and things like bleeding lights (i.e., letting lights still contribute even if N dot L is less than 0) are really tricky to do deferred.

      As I mentioned in a previous talk, deferred lighting is nothing but an optimization technique. If you have a forward pass where you calculate a common operation Y = F(X), where X and Y are the input and output vectors, then it is possible to decompose this into a pass where you first generate and save the input parameters X in a buffer, then run a screen (full-screen or partially full-screen) pass to calculate F(X) into an output buffer storing the results, and use the Y-results directly in a second pass through texture lookup. The feasibility of the approach depends on how possible it is to store the X’s and Y’s, as well as on how well you can deal with partial screen-space areas.

      PKE

    3. Adrian Stone says:

      I think we’re on the same page. My purpose in this article is merely to point out that the features which are difficult to implement in a deferred environment (as opposed to a forward one) go far beyond translucency and MSAA.

      The upfront cost I’m referring to, btw, regarding deferred lighting, is the cost of generating the G-Buffer and applying the lighting buffer. If you consider a scene with zero lights or zero objects, for example, that is all wasted cost. This is obviously an extreme case, but it is relevant when you start allowing object and light types that will be excluded from the deferred renderer and handled by a forward pass. The value of a deferred renderer is proportional to the percentage of elements in a scene that can take advantage of it.

  3. Hmm it looks like your website ate my first comment (it was extremely long) so I guess I’ll just sum up what I had written and say, I’m thoroughly enjoying your blog. I as well am an aspiring blog writer but I’m still new to the whole thing. Do you have any points for rookie blog writers? I’d really appreciate it.

    1. Adrian Stone says:

      There are three qualities I admire in other blogs.

      The first is a consistent theme. I subscribe to blogs in hopes of getting more content similar to what drew me to the blog to begin with. It is rare that the content of a site will be so good, however, that I’m willing to wade through an equal proportion of articles on unrelated topics to keep abreast of the posts that are pertinent to my interests.

      The second is unique content. Unless you’re a cutting-edge researcher in a particular field, your best bet for unique content is probably to draw from personal experiences. There is a real dearth of testimonial data in the game industry, particularly compared to the avalanche of theoretical papers coming from researchers.

      Finally, and this is the one that I’ve completely failed at, is frequent updates. I really enjoy reading very thorough examinations of a topic, but short, regular posts do a much better job of keeping readers engaged in a blog than long, infrequent ones. If you want your blog to be read, you should set a schedule for posting (say 100 words a week) and really force yourself to adhere to it.
