As I indicate in the subtitle of this blog, there is no single way to develop games. The techniques used in game development are as many and as varied as the games themselves; what’s best for one game is not necessarily best for another. The phrase YMMV (your mileage may vary) is pretty much a staple of game technology discussions. On the other hand, few teams have the money or stamina to try every technology on every game, so I hope people won’t hold it against me when I choose to take sides.
I’ve noticed an increase recently in game developers promoting a technique called deferred lighting. Unfortunately this technique is old enough that not everyone remembers it by that name. Wolfgang Engel reintroduced it in ShaderX7 under the name light pre-pass rendering, and for many that name seems to be sticking. The most recent advocate of deferred lighting is Crytek. Martin Mittring divulged in a presentation at the Triangle Games Conference that Crytek will be utilizing deferred lighting in version 3 of CryENGINE.
Now I get to tell you why that’s a bad idea.
Deferred lighting is similar to a better-known technique called deferred shading. In deferred shading all the attributes necessary to completely shade a 3-D scene are rendered into off-screen textures called a G-Buffer. The G-Buffer for a scene contains, per pixel, things like the surface normal, material albedos, and Phong specular exponent. Shading can then be done in screen-space per light by reading back the necessary data from the G-Buffer. This has the distinct advantage of decoupling the geometry processing in the scene from the lighting and shading calculations. It is generally assumed that one can construct the G-Buffer in a single pass over the scene’s geometry and that one can constrain the light rendering in such a way that no more pixels are processed for a given light than are actually affected by the light. From an algorithmic complexity standpoint this sounds great. Meshes are rendered only once and no extraneous lighting or shadowing calculations are performed. There is a drawback, however. The G-Buffer can be quite heavyweight, containing all those shading attributes, and consequently deferred shading consumes a lot of memory and bandwidth in constructing and reading back the G-Buffer. Deferred lighting attempts to address that problem.
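To make the data flow concrete, here is a minimal CPU-side sketch of deferred shading in plain Python. It assumes directional lights, a single fixed view direction, and the Phong model; the names (`GBufferTexel`, `Light`, `phong_terms`, `deferred_pass`) are hypothetical, and the G-Buffer is simply a list of per-pixel records standing in for what the attributes pass would rasterize. A real renderer runs these stages as shaders and restricts each light to the pixels it actually touches.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
VIEW_DIR: Vec3 = (0.0, 0.0, 1.0)  # assumption: one fixed view direction for all pixels


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def phong_terms(n: Vec3, l: Vec3, exponent: float) -> Tuple[float, float]:
    """Diffuse (N.L) and specular ((R.V)^n) Phong factors for unit vectors."""
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:
        return 0.0, 0.0  # surface faces away from the light
    r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))  # reflect l about n
    return n_dot_l, max(dot(r, VIEW_DIR), 0.0) ** exponent


@dataclass
class GBufferTexel:
    """Everything needed to shade one pixel, written once by the attributes stage."""
    normal: Vec3
    spec_exponent: float
    diffuse_albedo: Vec3
    specular_albedo: Vec3


@dataclass
class Light:
    direction: Vec3  # unit vector from the surface toward the light
    color: Vec3


def deferred_pass(gbuffer: List[GBufferTexel], lights: List[Light]) -> List[Vec3]:
    """Screen-space stage: accumulate exit radiance per light from the G-Buffer alone."""
    radiance: List[Vec3] = [(0.0, 0.0, 0.0)] * len(gbuffer)
    for light in lights:
        for i, t in enumerate(gbuffer):  # a real renderer visits only lit pixels
            d, s = phong_terms(t.normal, light.direction, t.spec_exponent)
            radiance[i] = tuple(
                acc + c * (kd * d + ks * s)
                for acc, c, kd, ks in zip(radiance[i], light.color,
                                          t.diffuse_albedo, t.specular_albedo)
            )
    return radiance
```

Note that `deferred_pass` never touches mesh data: once the G-Buffer exists, the lighting cost depends only on the lights and the pixels they cover.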
In deferred lighting only the lighting, not the shading, computations are deferred. In the initial pass over the scene geometry only the attributes necessary to compute per-pixel lighting (irradiance) are written to the G-Buffer. The screen-space, “deferred” pass then outputs only diffuse and specular lighting data, so a second pass must be made over the scene to read back the lighting data and output the final per-pixel shading (radiant exitance). The apparent advantage of deferred lighting is a dramatic reduction in the size of the G-Buffer. The obvious cost, of course, is the need to render the scene meshes twice instead of once. An additional cost is that the deferred pass in deferred lighting must output diffuse and specular irradiance separately, whereas the deferred pass in deferred shading need only output a single combined radiance value.
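The same Phong example restructured as deferred lighting shows where the extra stage comes from. This is again a hypothetical Python sketch under the same assumptions (directional lights, fixed view direction, illustrative names such as `ThinTexel`, `deferred_pass`, and `shading_pass`). The thin G-Buffer holds only the normal and specular exponent, the deferred pass accumulates diffuse and specular irradiance into two separate buffers, and a second pass over the meshes fetches the albedos and combines everything.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
VIEW_DIR: Vec3 = (0.0, 0.0, 1.0)  # assumption: one fixed view direction


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def phong_terms(n: Vec3, l: Vec3, exponent: float) -> Tuple[float, float]:
    """Diffuse (N.L) and specular ((R.V)^n) Phong factors for unit vectors."""
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:
        return 0.0, 0.0
    r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))  # reflect l about n
    return n_dot_l, max(dot(r, VIEW_DIR), 0.0) ** exponent


@dataclass
class ThinTexel:
    """Attributes stage output: only what the lighting math needs."""
    normal: Vec3
    spec_exponent: float


@dataclass
class Material:
    """Fetched from material textures during the second geometry pass."""
    diffuse_albedo: Vec3
    specular_albedo: Vec3


@dataclass
class Light:
    direction: Vec3  # unit vector from the surface toward the light
    color: Vec3


def deferred_pass(gbuffer: List[ThinTexel], lights: List[Light]):
    """Accumulate diffuse and specular irradiance into two separate buffers."""
    diffuse: List[Vec3] = [(0.0, 0.0, 0.0)] * len(gbuffer)
    specular: List[Vec3] = [(0.0, 0.0, 0.0)] * len(gbuffer)
    for light in lights:
        for i, t in enumerate(gbuffer):
            d, s = phong_terms(t.normal, light.direction, t.spec_exponent)
            diffuse[i] = tuple(e + c * d for e, c in zip(diffuse[i], light.color))
            specular[i] = tuple(e + c * s for e, c in zip(specular[i], light.color))
    return diffuse, specular


def shading_pass(materials: List[Material], diffuse: List[Vec3],
                 specular: List[Vec3]) -> List[Vec3]:
    """Second pass over the meshes: combine irradiance with material albedos."""
    return [
        tuple(kd * ed + ks * es for kd, ed, ks, es in
              zip(m.diffuse_albedo, e_d, m.specular_albedo, e_s))
        for m, e_d, e_s in zip(materials, diffuse, specular)
    ]
```

For plain Phong the final radiance is the same as a deferred shading evaluation of the same scene; what changes is that `shading_pass` needs the meshes a second time and two irradiance buffers replace one radiance buffer.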
Five years ago, when I was designing the renderer for the Despair Engine, I thought deferred lighting was the ideal choice. Details on the Playstation 3 were sketchy at that time, but we already knew that render target memory on the Xbox 360 would be severely limited. The G-Buffer for a deferred shading system wouldn’t fit in EDRAM and consequently it would have to be rendered in two tiles. With deferred shading on the Xbox 360 requiring two passes over the scene meshes, the primary disadvantage of deferred lighting appeared nullified.
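A back-of-the-envelope check of the EDRAM constraint, as a minimal Python sketch. The 1280x720 resolution, the absence of MSAA, and the reuse of the G-Buffer layout listed later in this post are all assumptions (the layout used five years ago may have differed), and real predicated tiling has alignment rules that this ignores.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # the Xbox 360's 10 MB of EDRAM
WIDTH, HEIGHT = 1280, 720        # assumed 720p target, no MSAA

# Bytes per pixel bound simultaneously during the geometry pass, using the
# target layouts listed later in this post.
DEFERRED_SHADING_GBUFFER = 4 + 4 * 4   # D24S8 depth + four 32-bit color targets
DEFERRED_LIGHTING_GBUFFER = 4 + 4      # D24S8 depth + one 32-bit color target


def tiles_needed(bytes_per_pixel: int) -> int:
    """Fewest screen tiles such that one tile's targets fit in EDRAM.

    Ignores the hardware's tile-alignment rules, so this is a lower bound.
    """
    return math.ceil(WIDTH * HEIGHT * bytes_per_pixel / EDRAM_BYTES)
```

Under these assumptions the deferred shading G-Buffer needs two tiles while the deferred lighting G-Buffer fits in one. Each additional tile means resubmitting the geometry that overlaps it, which is how tiling turned deferred shading into a two-pass technique on the Xbox 360.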
Despair Engine utilized deferred lighting for over two years, and we were generally very happy with the results. It was implemented initially on the Xbox 360 and PC, but when the Playstation 3 was released it was extended to that platform as well. Unfortunately our initial implementation on the Playstation 3 yielded significantly worse performance than we were seeing on the Xbox 360. We had multiple projects well into development at that point, however, so scaling back our expectations on the content side wasn’t a viable option. Instead the performance deficit on the Playstation 3 motivated our very talented PS3 programmer, Chris McCue, to look for alternate solutions. From extensive profiling he identified two bottlenecks unique to the Playstation 3. First, the PS3 struggled far more with vertex processing costs and consequently both the attributes and shading stages of deferred lighting were more frequently vertex bound on the PS3 than on the other platforms. Second, the PS3 was sometimes ROP bound during the deferred lighting pass itself, a problem that is all but impossible on the Xbox 360 due to the massive bandwidth to EDRAM.
Based on this data, Chris proposed to switch to classical deferred shading on the Playstation 3. Deferred shading would reduce the number of geometry passes from two to one and reduce the output bandwidth during the deferred pass. I agreed, and sure enough the move to deferred shading was a success. It helped narrow the gap between the Playstation 3 and the Xbox 360 to the point where we could ship the same content on both platforms and provide nearly identical play experiences on each.
The move to deferred shading on the PS3 prompted me to take a closer look at my decision to use deferred lighting on the other platforms. If deferred shading was a win on the PS3, it seemed likely to have some advantages on the PC and maybe even the Xbox 360. Although I’ve never been a proponent of settling for the least-common-denominator in cross-platform development, if we could move all platforms to the same deferred process without sacrificing performance, I knew it would save us some headaches in maintaining platform compatibility later on.
I implemented deferred shading on the Xbox 360 and PC a few months later and profiled the results. On the Xbox 360, much to my surprise, deferred shading performed within a few percent of deferred lighting. I could literally toggle back and forth between the two techniques and barely notice the difference in GPU utilization. Deferred lighting was a few percent faster in that initial implementation, but considering that we’d been optimizing the deferred lighting pipeline for years, I wasn’t about to quibble over less than a millisecond of GPU time. Doing head-to-head comparisons on the PC is a little more difficult because of the wide range of PC graphics hardware, but on the high-end DX9 cards and the low-end DX10 cards that I had access to at the time, the difference in rendering performance between the two techniques was similarly small. More importantly, on the PC we suffered far more from CPU-side batch overhead, and deferred shading handily cut that cost in half.
Having lived with deferred shading for a couple years now, I’ve come to appreciate the many ways in which it is superior to deferred lighting. Although deferred lighting sounds great in theory, it can’t quite deliver in practice. It does, in my experience, offer marginal GPU performance advantages on some hardware, but it does so at the expense of a lot of CPU performance and some noteworthy feature flexibility. To understand this, consider the implementation of a traditional Phong lighting pipeline under deferred shading and deferred lighting.
Deferred shading consists of two stages, the “attributes stage” and the “deferred stage.”
- The attributes stage:
  - Reads material color textures
  - Reads material normal maps
  - Writes depth to a D24S8 target
  - Writes surface normal and specular exponent to an A8R8G8B8 target
  - Writes diffuse albedo to an X8R8G8B8 target
  - Writes specular albedo to an X8R8G8B8 target
  - Writes emissive to an X8R8G8B8 target
- The deferred stage:
  - Reads depth, surface normal, specular exponent, diffuse albedo, and specular albedo
  - Blends exit radiance additively into an X16R16G16B16 target
Deferred lighting, on the other hand, consists of three stages: the “attributes stage”, the “deferred stage,” and the “shading stage.”
- The attributes stage:
  - Reads material normal maps
  - Writes depth to a D24S8 target
  - Writes surface normal and specular exponent to an A8R8G8B8 target
- The deferred stage:
  - Reads depth, surface normal, and specular exponent
  - Blends specular irradiance additively into an X16R16G16B16 target
  - Blends diffuse irradiance additively into an X16R16G16B16 target
- The shading stage:
  - Reads material color textures
  - Reads diffuse and specular irradiance
  - Writes exit radiance into an X16R16G16B16 target
First let’s consider the memory requirements of the two techniques. Deferred shading uses a G-Buffer that is 20 bytes per pixel and a radiance target that is 8 bytes per pixel for a total of 28 bytes per pixel. Deferred lighting requires only 8 bytes per pixel for the G-Buffer and 8 bytes per pixel for the radiance target, but it also requires 16 bytes per pixel for two irradiance targets. So in this configuration deferred lighting actually requires 8 bytes more memory per pixel. I am assuming that both approaches are using appropriate bit-depth targets for high dynamic range rendering with tone reproduction handled as a post-processing step. If you assume LDR rendering instead, I would argue that deferred lighting still requires deeper than 8-bit targets for irradiance, because the range of values for irradiance in a scene is typically far greater than the range of values for exit radiance. In any case, there are a few variations on the layout described above and a number of options for overlapping or reusing targets on the more flexible console architectures that reduce the per-pixel costs of each technique to an equivalent 20-24 bytes per pixel.
Now let’s take a look at bandwidth usage. The bandwidth required for “material color textures” and “material normal maps” is content dependent, but it is also exactly the same between the two techniques so I can conveniently factor it out of my calculations. Looking at the layout described above, bandwidth consumed during the attributes and shading stages is measured per pixel and bandwidth consumed during the deferred stages is measured per lit pixel. Adding everything up except the material color textures and normal maps, we see deferred shading writes 20 bytes per pixel plus an additional 8 bytes per lit pixel and reads 24 bytes per lit pixel. Deferred lighting, however, writes 16 bytes per pixel plus an additional 16 bytes per lit pixel and reads 16 bytes per pixel plus an additional 24 bytes per lit pixel. What this means is that if the average number of lights affecting a pixel is greater than 0.5, deferred lighting consumes more write bandwidth than deferred shading. Furthermore, no matter how many lights affect each pixel, deferred shading consumes 16 fewer bytes of read bandwidth per pixel.
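Because the accounting is linear in the average number of lights per pixel, each technique's traffic can be modeled as a line and the crossover solved for. The Python sketch that follows encodes the per-stage numbers from the target lists; `Traffic` and `break_even` are hypothetical names, and additive blending is counted as both a read and a write of the destination, as in the totals above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Traffic:
    """Render-target traffic as a linear function of lights per pixel."""
    per_pixel: float      # bytes paid once for every pixel on screen
    per_lit_pixel: float  # bytes paid again for each light covering the pixel

    def at(self, lights_per_pixel: float) -> float:
        return self.per_pixel + self.per_lit_pixel * lights_per_pixel


def break_even(a: Traffic, b: Traffic) -> Optional[float]:
    """Lights per pixel where a and b cost the same, or None if they never cross."""
    slope = b.per_lit_pixel - a.per_lit_pixel
    if slope == 0:
        return None  # parallel lines: the gap is constant
    return (a.per_pixel - b.per_pixel) / slope


# Material textures and normal maps are excluded (identical for both techniques).
DS_WRITE = Traffic(per_pixel=20, per_lit_pixel=8)   # G-Buffer; blended radiance
DS_READ = Traffic(per_pixel=0, per_lit_pixel=24)    # 16 B of G-Buffer + 8 B blend read
DL_WRITE = Traffic(per_pixel=16, per_lit_pixel=16)  # G-Buffer + radiance; two irradiance targets
DL_READ = Traffic(per_pixel=16, per_lit_pixel=24)   # irradiance reads; 8 B G-Buffer + 16 B blend reads
```

The write lines cross at 0.5 lights per pixel, while the read lines are parallel and always 16 bytes apart, matching the figures above.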
The last thing to consider when comparing the two techniques is feature flexibility. So far I’ve looked at how traditional Phong lighting might be implemented using the rival deferred techniques. Proponents of deferred lighting will sometimes argue that handling only the irradiance calculation in screen-space affords more flexibility in the choice of lighting models. Once the diffuse and specular irradiance buffers have been constructed, each material is free to use them however it sees fit. Unfortunately there isn’t as much freedom in that as one would like. Most of the interesting variations in lighting occur in the irradiance calculation, not in the exit radiance calculation. Anisotropic lighting, light transmission, and subsurface scattering all require additional attributes in the G-Buffer. They can’t simply be achieved by custom processing in the shading stage. When you consider the cost of adding additional attributes to each technique, the advantages of deferred shading really come to light. The 8 byte G-Buffer layout for deferred lighting is completely full. There is no room for an ambient occlusion or transmissive term without adding an additional render target at the cost of at least 4 bytes per pixel. The deferred shading layout I’m using for this comparison, however, has unused channels in both the diffuse and specular albedo targets that can be read and written without adding anything to the space and bandwidth calculations above.
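The headroom argument can be checked mechanically: a spare channel is only free if its target is already fetched during the deferred stage. This small Python sketch uses the formats and read sets from the layout lists; `ColorTarget` and `free_attribute_slots` are hypothetical names, and only color targets are modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ColorTarget:
    name: str
    fmt: str                      # format name as given in the layouts above
    read_in_deferred_stage: bool  # already fetched for every lit pixel?


DEFERRED_SHADING_GBUFFER = [
    ColorTarget("normal + specular exponent", "A8R8G8B8", True),
    ColorTarget("diffuse albedo", "X8R8G8B8", True),
    ColorTarget("specular albedo", "X8R8G8B8", True),
    ColorTarget("emissive", "X8R8G8B8", False),
]
DEFERRED_LIGHTING_GBUFFER = [
    ColorTarget("normal + specular exponent", "A8R8G8B8", True),
]


def free_attribute_slots(gbuffer: List[ColorTarget]) -> List[str]:
    """Targets with an unused 8-bit X channel that the deferred stage already reads.

    A new attribute stored there costs no extra memory and no extra bandwidth.
    """
    return [t.name for t in gbuffer
            if t.fmt.startswith("X") and t.read_in_deferred_stage]
```

The emissive target also has a spare X channel, but the deferred stage does not read emissive, so using that channel would add read bandwidth; that is why only the two albedo targets count as free.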
To be fair, there is one important detail I should mention. Most proponents of deferred lighting recognize the excessive cost in generating separate diffuse and specular irradiance buffers and consequently adopt a compromise to the Phong lighting model. They assume that specular irradiance is either monochromatic or a scalar factor of diffuse irradiance, and consequently it can be stored in the alpha channel of the diffuse irradiance target instead of requiring a full target of its own. This configuration dramatically improves the results calculated above. Again in the interests of fairness, when evaluating this form of deferred lighting, a similar compromise should be made for deferred shading. The specular albedo can be considered monochromatic or a scalar factor of diffuse albedo (or both with sufficient packing). With these modifications to both techniques deferred lighting does, indeed, have an advantage. Deferred lighting will now require as little as 16 bytes of memory per pixel on some platforms whereas deferred shading will require 20. Deferred lighting also ends up having equal write bandwidth requirements to deferred shading and lower read bandwidth requirements as long as the average number of lights per pixel is greater than 2.
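The packed variants can be checked the same way. This Python sketch assumes that monochrome specular irradiance lives in the alpha of a single diffuse irradiance target and that monochrome specular albedo lives in the spare channel of the diffuse albedo target; the dictionary layout and function names are illustrative only.

```python
from typing import Dict, Optional, Tuple

# (bytes per pixel, bytes per lit pixel), material textures excluded.
Cost = Tuple[int, int]

PACKED: Dict[str, Dict[str, Cost]] = {
    "deferred shading": {
        "write": (16, 8),  # depth, normal/exponent, albedo, emissive; radiance blend
        "read": (0, 20),   # depth, normal/exponent, albedo (12) + blend read (8)
    },
    "deferred lighting": {
        "write": (16, 8),  # G-Buffer (8) + shading output (8); one irradiance blend
        "read": (8, 16),   # irradiance in shading; depth, normal (8) + blend read (8)
    },
}


def traffic(cost: Cost, lights_per_pixel: float) -> float:
    per_pixel, per_lit_pixel = cost
    return per_pixel + per_lit_pixel * lights_per_pixel


def break_even(a: Cost, b: Cost) -> Optional[float]:
    """Lights per pixel where costs a and b are equal; None if they never cross."""
    if a[1] == b[1]:
        return None
    return (a[0] - b[0]) / (b[1] - a[1])
```

Writes are identical at every light count. Below two lights per pixel the packed deferred shading layout reads less; above two, the packed deferred lighting layout does.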
Nevertheless, the differences are never huge, and ultimately there are a number of subtleties regarding how the bandwidth is distributed across the various stages and whether the stages are typically bandwidth bound that further muddy the waters. The most damning evidence against deferred lighting remains that in a direct comparison across the content of two games and three platforms it only provided at best a few percent GPU performance advantage over deferred shading at the cost of nearly doubling the CPU-side batch count. If further evidence is needed, consider that Killzone 2 experimented with deferred lighting early on in its development and also ultimately settled on a classical deferred shading architecture.
So as I said at the start, YMMV, but I for one don’t expect to be returning to deferred lighting anytime soon.