About Me

Hi. I'm Josh Ols. Lead Graphics Developer for RUST LTD.




Entries in Deferred Shading (10)


Deferred shading (again?)


Recently, I started thinking that maybe I jumped the gun a little when I said I wanted to ditch deferred rendering. I came to this realization shortly after I started playing around with forward rendering and static lighting techniques. Ultimately, my findings with these techniques were not quite what I expected they would be, and forced me to reconsider my plans.




MSAA:

The difference isn't as apparent as I had imagined it would be; I only really see a dramatic difference on distant objects. Even then, at 4x MSAA I am effectively paying the storage cost of a 4-MRT g-buffer. For that same storage cost, I could be storing information for dramatic post-processing effects and complex lighting. Bottom line, aliasing is harder to notice in HD, and I expect it to become less and less of an issue as resolutions go up.
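To put rough numbers on that storage claim, here is a quick Python sketch. The resolution and formats are my example assumptions, not figures from my renderer:

```python
# Back-of-the-envelope VRAM comparison (my assumptions: 1280x720 and
# RGBA8 targets at 4 bytes per pixel). 4x MSAA stores 4 samples per pixel
# in one target, which matches the footprint of a 4-MRT RGBA8 g-buffer at 1x.
width, height = 1280, 720
bytes_per_pixel = 4                                   # RGBA8

msaa_4x_color = width * height * 4 * bytes_per_pixel  # 1 target, 4 samples
gbuffer_4_mrt = width * height * 4 * bytes_per_pixel  # 4 targets, 1 sample

assert msaa_4x_color == gbuffer_4_mrt                 # same storage cost
print(msaa_4x_color / (1024 * 1024), "MiB")           # ~14.1 MiB each
```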

Custom lighting:

There are a variety of materials (SSS, anisotropic, etc) that really do need some kind of custom lighting in order to look their best. Sadly, I concluded that I really couldn't have many custom lighting models, since I would then have the headaches of a shader combinatorial explosion. So I have decided to settle for a good general purpose lighting approach that covers most materials.

Baked lighting:

Lightmaps and irradiance volumes have the potential to make for some stunning quality lighting. However, this potential is heavily dependent on the capabilities of the tools that generate and bake the lighting, as well as the processing power you have to perform the baking. The way I see it, if the maps don't have fancy effects (area lights, radiosity, etc) baked in, then I'd be better off with deferred shading since all the lighting would be dynamic.


New goals:

Having had time to think it over, deferred rendering seems like the best choice for me. With that in mind, I need some new goals to help move my project forward, and decide where it ultimately needs to go.

Limited baking:

In order to facilitate rapid-iteration, I will need to limit baking in my asset processing pipeline. Ideally, it will be limited to when the asset is imported into the dev tools, and when it is exported to the target platform format.

Sandbox-style editing:

WYSIWYP is a must for modern rapid-iteration development, and this style seems to work well in the other sandbox editors I have sampled.

Dynamic Environments:

This will be a combination of different systems interacting: ambient conditions, post-processing, and more, all working together to create a mood and a world that feels like it is changing.

Standardized Shading:

In order to fit with my deferred renderer, I will need a sufficiently capable shading model that can handle the vast majority of cases. It needs to feel like it covers a wide variety of materials, while not eating too much processing power doing it.


Final Thoughts:

In conclusion, my little foray into forward rendering was a valuable learning experience, but ultimately didn't demonstrate enough merit for me to stick with it. So I will be transitioning back to deferred rendering, and don't expect to be moving away from it again any time soon.


Deferred-lighting renderer

Okay, I've officially decided to switch over to Deferred Lighting for my renderer. During my tests, I found its benefits over traditional deferred shading more than made up for the extra drawing pass for all my geometry. The reduction in VRAM, the added material/shading options, and the possibility of MSAA on SM 4.0 hardware are all far too tasty for me to pass up.

After much experimentation, my new renderer is at a point where it uses RGBA8 buffers, the native depth information, and no MRT. On SM 4.0 hardware, I will be able to make it sample the hardware depth buffer, rather than having to copy its contents. Ideally, I could also do something like what Insomniac did with Resistance 2, where they store their normals and gloss in the back-buffer, to save some extra memory.

Needless to say, I have been doing a lot of tinkering with this new approach, and will be doing far more in the coming weeks. ;)


Lighting pipeline:

I'm planning to use a lighting pipeline similar to the one employed by CryEngine3. The order will be ambient lighting multiplied by SSAO, with direct lighting added afterwards. This ensures that SSAO only affects ambient lighting, and direct illumination will correctly show up in the occluded regions. Shadows will mask a light's contribution, so that ambient lighting will dominate the shadowed region.
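As a sketch, that composition order amounts to the following (all per-pixel values here are hypothetical scalars for a single channel):

```python
# Hypothetical per-pixel scalars sketching the composition order above:
# ambient is attenuated by SSAO, each direct light is masked by its
# shadow term, and the two are summed afterwards, so SSAO never darkens
# direct illumination and ambient dominates in shadowed regions.
ambient = 0.3
ssao = 0.5                                  # 1.0 = fully unoccluded
direct = [(0.8, 1.0), (0.6, 0.0)]           # (light intensity, shadow mask)

lighting = ambient * ssao
lighting += sum(i * s for i, s in direct)   # 0.15 + 0.8 + 0.0

print(lighting)
```

Note how the fully shadowed light contributes nothing, leaving only the occluded ambient term in that region.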

Image-based lighting will be combined in the lighting buffer during the ambient phase. It will be used for ambient lighting and glossy/metallic reflections. Handling reflections here will nicely combine them with specular lighting. This also keeps with the mentality of keeping lighting in the lighting phase, and avoiding pushing that burden on the material phase.


Material Benefits:

This pipeline will keep the material shaders independent/ignorant of the lighting phase implementation. All they have to care about is that the lighting comes to them through two standard channels (diffuse, specular). So the material shaders can decide how they treat the lighting information.

Letting materials, rather than lights, decide how to combine the illumination and material properties opens up a whole host of shading possibilities. For example, they can approximate lighting models from the diffuse illumination, such as Minnaert shading and a sub-surface scattering approximation. The same goes for the specular contribution, allowing things like metallic or glassy reflections.

Ideally, I could nicely separate lighting and material shading from each other. However, I will have to make some exceptions for things like rim-lighting for "fuzzy" materials, or anisotropic highlights for things like hair. Since these are material-specific, they can't conveniently be combined with the light buffer. However, they are both quite necessary for a diverse range of common materials.

Time will tell if I have to make any other exceptions. However, the pipeline I have outlined covers the vast majority of materials I might want, and nicely separates most of the stages so that they can be optimized.



One thing I felt I should mention for anyone else who tries to implement a similar renderer. Be careful about how you store and recover your normals. I lost two weeks trying to fix a nasty artifact that resulted when I made the switch to low-precision buffers. An artifact that turned out to be a trivial fix.

In my case, I was storing view-space normals in an RGBA8 buffer, packing them to the range [0, 1] for storage, then unpacking to [-1, 1] for shading. This works fine for diffuse lighting, but you will get nasty artifacts in specular lighting, because quantization leaves the vectors slightly non-unit length. To deal with this, do not forget to normalize() the unpacked normal to correct these errors.
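The artifact's cause can be reproduced numerically. This Python sketch (with a hypothetical normal, my own illustration rather than the actual shader) runs the pack/quantize/unpack round trip:

```python
import math

# Pack a view-space normal from [-1, 1] to [0, 1], quantize to 8 bits per
# channel as an RGBA8 buffer would, then unpack. The recovered vector is
# no longer unit length until it is normalized.
n = (0.267, 0.534, 0.802)  # approximately unit length

packed = tuple(round((c * 0.5 + 0.5) * 255) for c in n)   # store in RGBA8
unpacked = tuple(p / 255.0 * 2.0 - 1.0 for p in packed)   # recover for shading

length = math.sqrt(sum(c * c for c in unpacked))
assert abs(length - 1.0) > 1e-6      # quantization error: not unit length

normal = tuple(c / length for c in unpacked)              # normalize()
assert abs(sum(c * c for c in normal) - 1.0) < 1e-9       # unit length again
```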

Keep things like this in mind when transitioning from high-precision buffers to low precision buffers.



Much to my annoyance, I have learned that my test implementation would not work in D3D9. Currently, I am using an FBO trick to copy the contents of the depth_stencil renderbuffer to another FBO with a depth texture. D3D9 doesn't allow copying from a depth_stencil surface, without using hardware-specific hacks. So that means I have no choice but to store the depth in an RT during the prepass stage to maintain compatibility between the OGL & D3D versions of my renderer.


Prelighting & Log-Lights (Ver. 2)


Prelighting, deferred lighting, light prepass rendering... they all refer to what is essentially the same concept. It is a form of deferred shading that defers the lighting phase, but keeps the material phase as forward rendering, all for the purpose of avoiding the fat g-buffers that are typical of deferred shading, and allowing more varied materials. Who came up with it first is a bit unclear, but to my knowledge the first public definition was developed by Wolfgang Engel (link).

When compared to the approach I am currently considering, this approach offers many appealing benefits. First, it allows deferred shading without explicitly requiring MRT support. Second, there is the possibility of MSAA on platforms that support explicit control of how samples are resolved. Finally, it enables more material variety than a standard deferred renderer, since it doesn't force all objects to use only the provided material channels of a g-buffer.

Sadly, it has its downsides as well. Because it sits halfway between a deferred renderer and a z-prepass renderer, it inherits many problems of both approaches. On the z-prepass side, it forces you to divide material properties between two passes, possibly sampling the same texture multiple times. Not to mention, it requires drawing all objects at least twice, potentially limiting the maximum number of objects that you can draw. On the deferred shading side, it forces one lighting model for all objects. There is also the age-old issue of translucency, but that is an issue for any renderer that isn't single pass.

So to set up for the second part of this post, I will outline the renderer used to generate my example images. As far as I know, the most common approach for the lighting buffer in this kind of system is to store the diffuse illumination in the RGB channels, and the specular component in the A channel. To compensate for the monochrome specular, it gets multiplied with the chromaticity of the diffuse light buffer to approximate colored specular lighting. Finally, during the material phase the light contributions are combined with the appropriate textures, and added together with other material properties before the final output.
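The monochrome-specular approximation can be sketched numerically. The values and the Rec. 601 luminance weights below are my assumptions, not taken from the renderer:

```python
# Colored specular is rebuilt by scaling the alpha-channel specular with
# the diffuse chromaticity, i.e. the diffuse color divided by its own
# luminance (Rec. 601 weights assumed here).
diffuse = (0.9, 0.5, 0.2)   # accumulated diffuse lighting (RGB channels)
spec_mono = 0.6             # accumulated specular (A channel)

luminance = 0.299 * diffuse[0] + 0.587 * diffuse[1] + 0.114 * diffuse[2]
chromaticity = tuple(c / luminance for c in diffuse)

spec_colored = tuple(spec_mono * c for c in chromaticity)
print(spec_colored)         # specular tinted toward the dominant diffuse hue
```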

Further reading:

Insomniac's Prelighting (GDC 2009)

Engel's Prepass Renderer (SIGGRAPH 2009)

Deferred Lighting Approaches

ShaderX7, Chapter 8.5



Figure 1. 1. Diffuse accumulation, 2. Diffuse chromaticity, 3. Specular accumulation [monochrome], 4. Specular accumulation [diffuse chromaticity], 5. Material pass results


Log Lights:

Here's the real meat of this post, directly following from my consideration of Prelighting. For the best results, we want to perform linear HDR accumulation of lights. This usually mandates that we use at least an FP16 RT, since integer formats don't have the necessary range or precision. However, fp16 RTs eat memory, are slower to render, and can have restricted support on some platforms. So if we are limited to RGBA8 RTs, how much can we get out of them?

When used with a prelighting renderer, I have found that RGBA8 can do a decent job. Storing the raw lighting data, it handles linear light accumulation fairly well as long as lights don't overlap much. However, it does suffer from some visual artifacts on dim lights. More than anything, it is limited to the range [0, 1], so when lights pile up on a surface they quickly saturate at 1.0 and begin to lose normal-mapping detail. These limitations are simply unacceptable, but how can we do better?

The concept of Log Lights was first introduced to me in the gamedev post "Light Prepass HDR" by a fellow named drillian (link). The idea is to use the properties of power functions to make best use of an integer RT's precision. It allows a form of linear light accumulation with extended range, while still taking advantage of hardware blending. Plus, it does all this in the confines of individual RGBA8 components, so it doesn't have to steal another one to add precision.

This trick works by exploiting the fact that when power functions are multiplied, their exponents are added together. This allows us to add light values by storing them in the exponent, and using multiplicative blending on the output of the light shader. After that, it is a simple matter to extract the value from the exponent via a logarithm. To make sure that the power function stays in the range [0, 1] for storage in an RGBA8 RT, the light values are negated before being stored in the exponent. This requires that the recovered values be negated again to be correct. Easy enough, right?

Now, it does come with some deficiencies. For starters, dim lights show slightly worse artifacts than they do with straight RGBA8 storage. Then there is the issue that this doesn't provide a true HDR range of values, only really a medium dynamic range (MDR). You are also restricted in the kinds of operations you can perform on the light buffer, since it is no longer storing raw RGB data. Finally, this technique only looks decent for accumulation of raw light values: I have found that multiplying each contribution by a texture produces nasty visual artifacts, so it wouldn't be good for accumulating lights in a standard deferred renderer.

After much testing, I have concluded that this technique can nicely fill the niche of the light accumulation phase of a prelighting renderer. The added range may not be the best, but is substantially better than using raw RGBA values. Yet it uses the exact same storage media, and only requires a small amount of overhead to (un)pack the values. Overall, it is a usable solution to the problems I have outlined, depending on what you are trying to do.


// Needed for multiplicative blending, since values need to start at one
// (pseudocode: clear the light buffer to 1.0 = exp2(-0) before accumulation)
clearColor(1.0, 1.0, 1.0, 1.0);

// Light shader output, accumulated with multiplicative blending (DST_COLOR, ZERO)
outputLight = exp2(-lightContribution);

// Material shader input
recoveredLight = -log2(lightAccumulation);
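A quick numeric check of the exp2/-log2 trick, with plain Python standing in for the shader outputs and the blend hardware (8-bit quantization is ignored here; the light values are hypothetical):

```python
import math

# The buffer starts at 1.0, each light multiplies in exp2(-x), and
# multiplying power functions adds their exponents. Every intermediate
# value stays in [0, 1], so it is representable in an RGBA8 channel.
lights = [0.4, 0.9, 1.7]        # hypothetical HDR light values for one pixel

accum = 1.0                     # buffer cleared to one, i.e. exp2(-0)
for x in lights:
    accum *= math.pow(2.0, -x)  # light shader output + multiplicative blend
    assert 0.0 <= accum <= 1.0  # always fits an integer target

recovered = -math.log2(accum)   # material shader unpack
print(recovered)                # ~3.0, the sum of the light values
```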



Figure 2. 1. Diffuse accumulation [exp2], 2. Specular accumulation [monochrome, exp2] 



Figure 3. Alternating brightness values, RGBA first, then Log-Light. 0.1, 1.0, 2.0, 3.0



Figure 4. 1. Standard deferred artifacts


Demo Update:

I almost have the detail mapping demo code to where I want it, so that part shouldn't take too much longer. Right now, I am waiting on a request I made to an artist friend to make some cool geometry/textures so it looks good, and doesn't use textures I don't own. I promise I will release it as soon as I feel everything is ready, so please bear with me.


Stippled Alpha

There is something I didn't consider until Digital Foundry pointed it out for their FF13 PS3 vs 360 article. Basically, if you use this effect at a lower resolution, and then upscale the image for final display, the effect can look pretty bad. Be forewarned!

I later found out that FF13, Resonance of Fate, & GoW3 were actually using Alpha-To-Coverage antialiasing. I wanted to clear that up to avoid confusion.

Stippled alpha has actually been around for a while, but has only recently resurfaced as a viable technique for adding translucency. It exploits the idea that a densely perforated surface, viewed at a distance, will appear to be translucent. This technique has already been used to great effect in Final Fantasy 13 & Resonance of Fate (hair, beards, eyelashes), the Nvidia Luna demo (veil), Zack and Wiki & Dead Space Extraction (particle effects), and God of War 3 (fur).

Translucency is a must in almost all modern games. However, the typical implementation using alpha-blending is difficult to integrate into the rendering pipeline due to results being depth order-dependent. You can't perform the usual optimizations of ordering by shader, materials, textures, etc. The problem is further exacerbated by how this requirement restricts translucency to single-pass forward lighting.

So for multipass and deferred shading, it necessitates exploring other options. Most times, they will make translucent objects statically lit, or only lit by global light sources. Other than that, they will impose restrictions on their usage, usually limiting them to particles, windows, and some foliage. Everywhere else, they try to make alpha-tested cutouts pick up the slack. Sadly, while cutouts may work for a variety of things, they are not a true replacement for translucency.

So let's see how stippled alpha tries to solve this.



For starters, the effect needs to be screen-aligned. If we applied the effect to each surface like a texture, we would get shimmering as the camera moves around the object. Also, to get the best quality we need to make sure that the discarded components are only one pixel in size, and padded by one pixel on all sides. So we must make sure that the stipple pattern is 2x2 pixels, with three opaque, and one transparent.

Thus far, I have found two approaches to accomplishing this. The first uses a 2x2 texture of three opaque values, and one transparent value. Then it is just a matter of mapping every 2x2 square of pixels to this texture, and using discard on the transparent pixel of each. The second just uses information about pixel positions to guess which one should be eliminated based on where it lies in a 2x2 square.

For my example, I will use a technique that doesn't require a texture, and just makes use of information obtained inside the shader. Basically, pixelPosition is the position of each pixel in the range [0, screenWidth], and [0, screenHeight]. Using this, we just cut the dimensions in half, and look at the fractional components for each pixel position. Wherever the components for both width and height exceed 0.5, that means we are on the pixel that needs to be discarded. It's also trivial to do a variation where it discards the other three pixels, and preserves the one that is past 0.5 for both width and height.


halfPosition = fract(pixelPosition * 0.5);

// Discard the pixel where both fractional components exceed 0.5
if (halfPosition.x > 0.5 && halfPosition.y > 0.5)
    discard;
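Evaluated on the CPU instead of in a fragment shader, the pattern looks like this. This is my own illustration: for pixel coordinates, fract(pixelPosition * 0.5) reduces to checking whether each coordinate lands in the second half of its 2x2 cell, i.e. whether it is odd:

```python
# One pixel per 2x2 block is punched out, three are kept opaque.
def discarded(x, y):
    return (x % 2 == 1) and (y % 2 == 1)

rows = []
for y in range(4):
    rows.append("".join("." if discarded(x, y) else "#" for x in range(4)))

for row in rows:
    print(row)
# ####
# #.#.
# ####
# #.#.
```

Flipping the condition gives the variation mentioned above, discarding three pixels and keeping one.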




Figure 1. 1, High-density; 2, Low-density



Figure 2. 1, Light-buffer; 2, Normal-buffer; 3, Depth-buffer



Probably the biggest advantage of this technique is that it works well with multipass and deferred shading. It is also extremely nice from a performance standpoint, since it is basically just a modification of standard alpha-testing: front-to-back sorting, state-sorting, and depth-buffer updates all work with this approach. As a result, it also works with post-processing effects that take advantage of the depth buffer. It has the further advantage that translucent objects that overlap themselves won't have popping issues (see the Luna demo). Finally, as screen resolutions go up, so will the quality of this technique, since the perforations are the size of a pixel.

Of course, many of the things that make this approach great also make it not so great. For starters, it only allows one layer of translucency for everything using the same technique, since the perforations are based on absolute pixel position. So the object nearest the camera will obscure those farther away. This also makes it hard to do things like drawing a box where you can see the inside wall behind the translucent front wall. There is also the issue of the constant size of the perforations: as objects get farther from the camera, their translucency quality gradually gets worse. Ultimately, this technique will not look so great on lower-resolution monitors, since there will be fewer pixels to utilize.

There are some specific cases where it will cause conflicts with other tech that is seeing more use in games. Screen-Space Ambient Occlusion will be problematic, since the perforations make for dramatic per-pixel differences in normal and depth across an object's surface. So it will manifest as noisy dark and bright spots all over said surface.

Then there is the problem with downsampled buffers. Because they reduce the normal and depth buffers based on an average of several pixels, the wildly different values can produce unexpected results. Sometimes it will make the object disappear, and sometimes it will make it completely opaque. The worst case is when it causes a weird grid to form, which produces really nasty visual artifacts. Probably the most annoying aspect is that the specific behavior may differ between the depth and normal buffers. This is affected by screen resolution, and will be better or worse depending on the size you pick. Sadly, the example images were obtained at the "standard" 1280x720. :(



Figure 3. 1, SSAO; 2, Normal-buffer [downsampled]; 3, Depth-buffer [downsampled]


Final Thoughts:

This is not a drop-in replacement for alpha-blending. However, it is another approach that can be useful in places where neither alpha-blending nor alpha-testing is really appropriate. It will look best for gossamer materials that overlap themselves (veil), or cutouts that need to gradually lose opacity (hair). More than anything, if you have a multipass/deferred rendering pipeline that needs translucent objects to be affected by dynamic lighting, this seems like the only really viable option available.

So that is why I am considering this tech for my project. Here's hoping you find the information useful for your own. ;)



For a good example of this type of renderer in motion, check out my Cubism Demo video and/or download the demo source+executable from the "Projects" section of the website. ;)

In my project, I have been grappling with all the headaches that accompany trying to develop a lighting pipeline. Which is more important, many and varied materials, or being able to have any combination and number of lights? Is a more general lighting system worth the complexity of managing shader variations?

Ultimately, forward rendering was a scary prospect, due to the risk of shader combinatorial explosions and limited numbers of lights on screen. However, deferred rendering was unappealing due to material limitations and its tendency to eat VRAM. Then there were crazier systems like projecting lights onto an SH basis per object to provide multiple lights in one pass, but these had limited support for point lights and basically no support for spot lights.

In the end, I found myself leaning towards a hybrid system of forward and deferred shading to avoid grappling with the problems of either in isolation. Now the question remained, what kind of deferred shading should I choose?

Recently, I came across the concept of Afterlights, a form of Deferred Shading developed by Realtime Worlds, and first introduced in the game Crackdown (Shader X7, Section 2.6). Essentially, they tackle the issue of the fat G-Buffer by looking at what buffers are needed by multiple systems, and adding only the smallest amount of data necessary to get deferred shading.

Firstly, they make the assumption that most modern graphics pipelines will have per-pixel depth/normals for various deferred/post-processing effects (fog, soft-particles, SSAO, outlines, etc). So why not just share that data between the different stages of the pipeline? This way, we can justify eating the cost, and have most of what is needed to perform the actual lighting.

Now the problem becomes how to handle material data. This one is more difficult, as this data is only really used by the deferred shading system. To make matters worse, it tends to take up the most memory out of the G-Buffer. So how much material data is really necessary? What's the smallest amount we need to perform deferred shading? Where will it be stored? How will it be accessed?

To answer these questions, we need to look at the other buffers that are available to be repurposed. Many times, RTs in a G-Buffer aren't fully utilized, with more than a few channels going to waste. This is a problem for the light-buffer, since its channels can't be accessed in the shader when the application is rendering to it. In this situation, the only way to access them is via hardware blending. This is where Afterlights come into the picture.



So the way they work is that during the G-Buffer phase, we obtain the luminance of each surface's albedo texture, and store it in the free alpha-channel of the light buffer. Then during the light accumulation phase, we use alpha-blending set to blend colors as (DST_ALPHA, ONE) and alpha as (ZERO, ONE). This configuration effectively modulates each light's contribution by the per-pixel albedo luminance before adding them to the light buffer. Thus, they are darkened in a way that roughly corresponds to the actual albedo.
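Here is a small Python sketch of that blend state, with hypothetical values; it just mimics what color (DST_ALPHA, ONE) and alpha (ZERO, ONE) do per pixel:

```python
# Each light's RGB output is modulated by the albedo luminance already
# stored in destination alpha, then added to the light buffer. The stored
# luminance itself survives because the alpha blend is (ZERO, ONE).
light_buffer = [0.10, 0.10, 0.10]   # accumulated lighting so far (RGB)
dst_alpha = 0.6                     # albedo luminance from the g-buffer pass

def blend_afterlight(src_rgb):
    # color: SRC * DST_ALPHA + DST * ONE
    for i in range(3):
        light_buffer[i] = src_rgb[i] * dst_alpha + light_buffer[i]
    # alpha: SRC * ZERO + DST * ONE (dst_alpha is left untouched)

blend_afterlight([0.5, 0.4, 0.3])
print([round(v, 2) for v in light_buffer])  # [0.4, 0.34, 0.28]
```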



Figure 1. 1, Albedo; 2, Luminance




Figure 2. 1, View-space normals; 2, Linear View-space depth [enhanced]; 3, Light Buffer [Forward + Emissive]; 4, Luminance




Figure 3. 1, Light Buffer [Hemisphere light]; 2, [Afterlights]; 3, [Hemisphere light + Afterlights]; 4, [Standard Deferred]; 5, [Hemisphere light + Standard Deferred]; 6, Difference



The biggest advantage of this technique is how it utilizes the depth/normal buffers that are shared between multiple systems. Plus, the only data specific to this system makes use of a shared RT channel that normally goes to waste. Finally, it integrates nicely with hardware blending to not only avoid adding extra complexity to the lighting shaders, but to even simplify them in the process.

Sadly, it has many disadvantages as well. Since it only stores the luminance of the albedo, it can only darken each light. This has the consequence of making it look like you are rendering with black-and-white textures. Then there is the issue that lights that would normally be absorbed by the surface will still appear, and end up looking a bit out of place. Finally, it can't provide specular lighting, since it only encodes information about each surface's albedo.

Probably the biggest issue is one of implementation: the light buffer has to have an alpha channel for this approach to work. This can be a problem if you want HDR, as it makes an FP16 RT the only real option.



To deal with the desaturated look, you need to have some kind of pre-existing scheme for providing global illumination during the G-Buffer filling phase. For my examples, I used a hemisphere light, but you could use more sophisticated approaches (SH, Ambient Cube, Rim lighting, etc). Even then, if several lights pile up on a surface, they will make the material start to desaturate. So it is best to spread them out, and avoid having too many lights influencing one area.
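For reference, a minimal hemisphere-light sketch; this is my assumption of the usual formulation, not the exact shader from my examples:

```python
# Blend ground and sky colors by how far the surface normal points
# toward the sky (normal_y in [-1, 1]).
def hemisphere(normal_y, sky, ground):
    t = normal_y * 0.5 + 0.5                       # map [-1, 1] -> [0, 1]
    return tuple(g + (s - g) * t for s, g in zip(sky, ground))

sky = (0.4, 0.5, 0.7)
ground = (0.2, 0.15, 0.1)

up = hemisphere(1.0, sky, ground)      # upward-facing surface -> sky color
down = hemisphere(-1.0, sky, ground)   # downward-facing -> ground color
print(up, down)
```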

For material variety, you could employ directional lights which are forward-rendered during the G-Buffer filling phase. These would allow specular effects and illumination modulated by the actual albedo. Generally speaking, it is better to render them this way since they affect all objects, and would be expensive to render in the deferred phase (texture samples, fill rate). Besides, they wouldn't introduce too many shader combinations since they would all be the same light type, and they would only need the most basic parameters (direction, color, intensity).

As for the alpha-channel necessity, you might be able to get around this if you use an RGBA8 target for your normal buffer and have an unused channel. Then you could store the luminance in said free channel, sample it in the light shader, and perform the multiplication yourself.


Final Thoughts:

Bottom line, Afterlights work best when used for decorations and effects. If you want something that will illuminate scenes and provide material variety, then you will be sorely disappointed.

After having used them in my own projects, I can safely say they are a viable option for including deferred shading in any renderer. Depending on what you plan to do, they may be a valid option for your projects as well.

On a final note, the demo project I mentioned in the PDN article will make use of this approach for lighting the different normal map variations. I promise to release said project with source code in the not too distant future. Here's hoping you will find it useful.
