About Me

Hi. I'm Josh Ols. Lead Graphics Developer for RUST LTD.






Linear sRGB Blending & Deferred Lighting (ver. 2)

Since my new computer is equipped with an SM 4.0 graphics card, I decided to play around with some features that were standardized for this class of GPU. In particular, I was interested in how these GPUs now correctly perform linear blending when using sRGB render targets. This means I get the correct color results of linear blending, along with the concentrated precision of automatic sRGB conversion.


This is relevant to my project because I have read about other deferred renderers using this feature on consoles to avoid needing high-precision lighting buffers. So it seemed like a good opportunity to dust off my old Deferred Lighting code and see how it could benefit from this approach.


If you are unfamiliar with sRGB and Gamma-space, the following links will provide a far better explanation than I could.



[1] RenderWonk: Adventures with Gamma-Correct Rendering

[2] blog.illuminate.labs: Are you Gamma Correct?

[3] Gamefest Unplugged (Europe) 2007: HDR The Bungie Way



For my experiment, I used a scene with six point lights of varying intensities. The data is stored in the range [0,2] in RGBA8 buffers so I can get medium-dynamic-range lighting. I'm using standard Lambertian diffuse lighting, and Normalized Phong for specular. The defining difference between the two tests is whether the accumulation buffer is a standard RGBA8 buffer storing linear values, or an RGBA8_sRGB buffer storing gamma-space values.
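To see why sRGB storage helps dim values, here is a CPU-side sketch (Python) using the standard IEC 61966-2-1 sRGB transfer functions, comparing 8-bit quantization error for a dim linear value; the sample value is arbitrary, and the hardware of course does all of this for you (decode on read, blend in linear space, re-encode on write).

```python
def srgb_encode(c):
    # Linear -> sRGB transfer function (IEC 61966-2-1)
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1.0 / 2.4) - 0.055

def srgb_decode(s):
    # sRGB -> linear transfer function
    return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4

def quantize8(v):
    # Simulate storage in an 8-bit channel
    return round(v * 255.0) / 255.0

dim = 0.002  # an arbitrary dim linear light value

# Store the value directly in an 8-bit linear channel, vs. encoding to sRGB first
linear_err = abs(quantize8(dim) - dim)
srgb_err = abs(srgb_decode(quantize8(srgb_encode(dim))) - dim)
# sRGB concentrates its 8-bit codes near black, so srgb_err comes out far smaller
```

This is exactly where the banding difference in the dim light falloffs comes from.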


For the sake of completeness, I have decided to test both of the common Deferred Lighting light-accumulation variations. The first uses two render targets to accumulate diffuse and specular lighting independently. The second uses one render target that stores diffuse lighting and specular intensity, approximating colored specular lighting via the diffuse lighting chromaticity.



For the two-RT approach, the difference between the raw linear RGB and linear sRGB blending versions is pretty dramatic. The diffuse lighting shows significantly less banding in the center of dim light contributions, and has a smooth falloff out to their edges. The specular light contributions only see improvements around their edges, yielding a subtle but noticeable difference. In both cases, the overall lighting quality is significantly better.


The one-RT approach had the same diffuse lighting quality as the two-RT approach, but only marginally improved specular lighting. I had concerns about using the sRGB trick with this approach because automatic sRGB conversion doesn't affect the alpha channel. Not surprisingly, the stored specular intensity produces nasty banding artifacts in the colored specular lighting approximation regardless of whether or not linear sRGB blending is being used.


2RT, sRGB:

Figure 1. 1: Diffuse; 2: Gloss 1; 3: Gloss 47; 4: Gloss 256



2RT, RGB:

Figure 2. 1: Diffuse; 2: Gloss 1; 3: Gloss 47; 4: Gloss 256



1RT, sRGB:

Figure 3. 1: Gloss 1; 2: Gloss 47; 3: Gloss 256



1RT, RGB:

Figure 4. 1: Gloss 1; 2: Gloss 47; 3: Gloss 256



All in all, I'd say this approach produces some really nice results when compared against using high-precision buffers. This way, I get the storage cost & read/write bandwidth of an RGBA8 buffer, but with the blending & precision benefits that would normally necessitate an RGBA16F buffer. For this reason, I am strongly considering this approach over floating point buffers on SM 4.0 hardware.


Just have to wait and see I guess.


AA + HDR encoding

Okay, first bit of progress with my new rendering approach. I started by testing out NAO32/RGBM for encoding HDR values in an MSAA/CSAA RGBA8 buffer. Since these are two of the major features that made me decide to make the switch, it seemed appropriate to test them first. So without further ado, I present the gruesome results!
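For readers unfamiliar with RGBM, the basic encode/decode math can be sketched like this (Python; the range multiplier of 6 and the sample color are illustrative assumptions, and only the shared multiplier is quantized here for brevity):

```python
import math

MAX_RANGE = 6.0  # assumed shared-multiplier range; tune to your scene

def rgbm_encode(rgb):
    # Store max(rgb) / MAX_RANGE in alpha, and rgb rescaled by it in RGB
    m = max(max(rgb), 1e-6) / MAX_RANGE
    m = min(1.0, math.ceil(m * 255.0) / 255.0)  # quantize the multiplier upward
    return tuple(c / (m * MAX_RANGE) for c in rgb) + (m,)

def rgbm_decode(rgbm):
    r, g, b, m = rgbm
    scale = m * MAX_RANGE
    return (r * scale, g * scale, b * scale)

hdr = (2.5, 0.5, 1.0)  # arbitrary HDR color with a component above 1.0
restored = rgbm_decode(rgbm_encode(hdr))
```

Note that the encoding is non-linear, so an MSAA resolve that averages encoded samples is not averaging the underlying HDR values, which is presumably a contributor to the artifacts below.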

First I will show the results of each sample setting by itself, then its difference from no AA. Please note, I found no appreciable visual difference between 8x MSAA and 8x/8xQ CSAA, so I will not be posting them.




Figure 1. NAO32: 1: No MSAA; 2: 2x MSAA; 3: 4x MSAA; 4: 8x MSAA


Figure 2. NAO32, difference from no AA: 1: 2x MSAA; 2: 4x MSAA; 3: 8x MSAA




Figure 3. RGBM: 1: No MSAA; 2: 2x MSAA; 3: 4x MSAA; 4: 8x MSAA


Figure 4. RGBM, difference from no AA: 1: 2x MSAA; 2: 4x MSAA; 3: 8x MSAA



As you can see from the comparison shots, both schemes produce acceptable results for 2x/4x MSAA, but start to fall apart for 8x MSAA and equivalent-quality CSAA modes. The edge quality improves, but color artifacts start appearing around the scene. So, there goes CSAA, and any MSAA modes above 4x. Still quite usable, but a bit of a bummer nonetheless.


Prelighting & Log-Lights (Ver. 2)


Prelighting, deferred lighting, light prepass rendering... they all refer to what is essentially the same concept. It is a form of deferred shading that defers the lighting phase but keeps the material phase as forward rendering, all for the purpose of avoiding the fat g-buffers that are typical of deferred shading and allowing more varied materials. Who came up with it first is a bit unclear, but to my knowledge the first public definition was developed by Wolfgang Engel (link).

When compared to the approach I am currently considering, this one offers many appealing benefits. First, it allows deferred shading without explicitly requiring MRT support. Second, it opens the possibility of MSAA on platforms that allow explicit control over how samples are resolved. Finally, it enables more material variety than a standard deferred renderer, since it doesn't force all objects to use only the material channels provided by the g-buffer.

Sadly, it has its downsides as well. Because it is halfway in-between a deferred renderer and a z-prepass renderer, it inherits many problems of both approaches. On the z-prepass side, it forces you to divide material properties between two passes, possibly sampling the same texture multiple times. Not to mention having to draw all objects at least twice, potentially limiting the maximum number of objects you can draw. On the deferred shading side, it forces one lighting model on all objects. There is also the age-old issue of translucency, but that is a problem for any renderer that isn't single-pass.

So to set up for the second part of this post, I will outline the renderer used to generate my example images. As far as I know, the most common approach for the lighting buffer in this kind of system is to store the diffuse illumination in the RGB channels and the specular component in the A channel. To compensate for the monochrome specular, it gets multiplied with the chromaticity of the diffuse light buffer to approximate colored specular lighting. Finally, during the material phase the light contributions are combined with the appropriate textures and added together with other material properties before the final output.
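The chromaticity trick can be sketched like this (Python; the Rec. 709 luminance weights and the sample values are just illustrative assumptions):

```python
def luminance(rgb):
    # Rec. 709 luminance weights (an assumption; any consistent weights work)
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def colored_specular(diffuse_light, spec_intensity):
    # Divide the diffuse luminance out to get its chromaticity, then tint the
    # monochrome specular term by it
    lum = max(luminance(diffuse_light), 1e-6)
    return tuple(c / lum * spec_intensity for c in diffuse_light)

diffuse = (0.5, 0.25, 0.25)  # illustrative accumulated diffuse light
spec = colored_specular(diffuse, 0.8)
# spec keeps the diffuse hue, with luminance equal to the stored intensity
```

The approximation obviously breaks down wherever the specular light color differs from the diffuse light color, but it buys back three channels.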

Further reading:

Insomniac's Prelighting (GDC 2009)

Engel's Prepass Renderer (SIGGRAPH 2009)

Deferred Lighting Approaches

ShaderX7, Chapter 8.5


Figure 1. 1: Diffuse accumulation; 2: Diffuse chromaticity; 3: Specular accumulation (monochrome); 4: Specular accumulation (diffuse chromaticity); 5: Material pass results


Log Lights:

Here's the real meat of this post, directly following from my consideration of Prelighting. For the best results, we want to perform linear HDR accumulation of lights. This usually mandates at least an FP16 RT, since integer formats don't have the necessary range or precision. However, FP16 RTs eat memory, are slower to render to, and can have restricted support on some platforms. So if we are limited to RGBA8 RTs, how much can we get out of them?

When used with a prelighting renderer, I have found that RGBA8 can do a decent job. Storing raw lighting data, it handles linear light accumulation fairly well when lights don't overlap much, though it does suffer from some visual artifacts on dim lights. More than anything, it is limited to the range [0,1], so when lights pile up on a surface they soon saturate at 1.0 and begin to lose normal-mapping detail. These limitations are simply unacceptable, but how can we do better?

The concept of log lights was first introduced to me in the gamedev post "Light Prepass HDR" by a fellow named drillian (link). The idea is to use the properties of power functions to make the best use of an integer RT's precision. It allows a form of linear light accumulation with extended range, while still taking advantage of hardware blending. Plus, it does all this within the confines of individual RGBA8 components, so it doesn't have to steal a channel to add precision.

This trick works by exploiting the fact that when power functions with the same base are multiplied, their exponents are added together. This allows us to add light values by storing them in the exponent and using multiplicative blending on the output of the light shader. After that, it is a simple matter to extract the value from the exponent via a logarithm. To make sure the power function stays in the range [0,1] for storage in an RGBA8 RT, the light values are negated before being stored in the exponent, which requires that the recovered values be negated again to be correct. Easy enough, right?

Now, it does come with some deficiencies. For starters, darker light values show slightly worse artifacts than straight RGBA for dim lights. Then there is the issue that this doesn't provide a true HDR range of values, only really MDR. Furthermore, you are restricted in the kinds of operations you can perform on the light buffer, since it no longer stores raw RGB data. Finally, the technique only looks decent when accumulating raw light values; I have found that multiplying each contribution by a texture produces nasty visual artifacts, so it wouldn't be good for accumulating lights in a standard deferred renderer.

After much testing, I have concluded that this technique can nicely fill the niche of the light accumulation phase of a prelighting renderer. The added range may not be the best, but it is substantially better than using raw RGBA values. Yet it uses the exact same storage, and only requires a small amount of overhead to (un)pack the values. Overall, it is a usable solution to the problems I have outlined, depending on what you are trying to do.


// Needed for multiplicative blending, since values need to start at one
clearColor = float4(1, 1, 1, 1);

// Light shader output (blend mode: Src * Dst)
outputLight = exp2(-lightContribution);

// Material shader input
recoveredLight = -log2(lightAccumulation);
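The exp2/log2 round trip is easy to sanity-check on the CPU; here is a quick Python sketch with made-up light contributions:

```python
import math

def light_output(contribution):
    # What the light shader writes; the blend mode multiplies it into the RT
    return 2.0 ** (-contribution)

def recover_light(accumulated):
    # What the material shader reads back out of the light buffer
    return -math.log2(accumulated)

buffer = 1.0                   # buffer cleared to one (multiplicative identity)
buffer *= light_output(0.75)   # first light, made-up contribution
buffer *= light_output(1.5)    # second, overlapping light
total = recover_light(buffer)  # recovers 0.75 + 1.5, past the [0,1] limit
```

Since both contributions are positive, the blended product stays in [0,1] and survives RGBA8 storage, while the recovered sum exceeds 1.0.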


Figure 2. 1: Diffuse accumulation [exp2]; 2: Specular accumulation [monochrome, exp2]


Figure 3. Alternating brightness values, RGBA first, then log-light: 0.1, 1.0, 2.0, 3.0


Figure 4. Standard deferred accumulation artifacts


Demo Update:

I almost have the detail mapping demo code to where I want it, so that part shouldn't take too much longer. Right now, I am waiting on a request I made to an artist friend to make some cool geometry/textures so it looks good, and doesn't use textures I don't own. I promise I will release it as soon as I feel everything is ready, so please bear with me.