Partial Derivative Normal Maps (Article Ver. 2)

Friday, September 4, 2009 at 7:37PM

n00body in Normal Maps

**UPDATE:**

In the comments section, *the_best_flash* wrote a really in-depth explanation of the math that makes this technique work. Be sure to check it out!

**UPDATE2:**

I recently discovered a little trick to shave another instruction off PDN recovery and combination in the shader. A single PDN map now costs the same as an uncompressed map, and combining PDN maps is now cheaper than combining uncompressed ones!

Basically, all you do is use the scale bias MADD and a swizzle to assign a 1 to the z component, so that you don't need a MOV instruction later.

Example:

pdn.xy = h4tex2D(normalMap, texcoord.st).ag;

tangentN.xyz = normalize(pdn.xyy * half3(-2.0h, -2.0h, 0.0h) + half3(1.0h, 1.0h, 1.0h));
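To see why the swizzle works, here is a minimal Python sketch of the arithmetic, leaving out the normalize(). The function names are made up for illustration, and plain floats stand in for the shader's half values.

```python
def scale_bias_swizzled(a, g):
    """Mimic pdn.xyy * (-2, -2, 0) + (1, 1, 1) for sampled channels a and g."""
    xyy = (a, g, g)  # the .xyy swizzle copies y into the third slot
    scale = (-2.0, -2.0, 0.0)
    bias = (1.0, 1.0, 1.0)
    # One multiply-add per component. The zero scale wipes out the copied y,
    # so the bias alone leaves exactly 1.0 in z.
    return tuple(v * s + b for v, s, b in zip(xyy, scale, bias))


def scale_bias_then_mov(a, g):
    """The earlier form: scale-bias x and y, then write 1.0 into z separately."""
    return (a * -2.0 + 1.0, g * -2.0 + 1.0, 1.0)


print(scale_bias_swizzled(0.25, 0.75))  # (0.5, -0.5, 1.0)
print(scale_bias_then_mov(0.25, 0.75))  # (0.5, -0.5, 1.0)
```

Both forms produce the same vector. The swizzled one simply gets the 1.0 out of the same multiply-add that does the scale-bias.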

**Original:**

Okay, I'm going to try this again, and hopefully not have to make any further revisions to the article afterward.

For my project, I wanted to have sufficiently crisp normal maps so that when the camera neared a surface, it would not lose too much detail. Problem is, the kinds of texture resolutions needed to accomplish this goal would quickly become a problem for storage, both offline and at runtime. Then I came across the idea of detail mapping which, in the form I am researching, involves combining a tiled detail normal map with a surface's base normal map.

This approach, while not ideal, can work surprisingly well if you break down your surfaces by material type, and share the corresponding detail texture between multiple surfaces. Depending on the resolution of the detail map and the tiling frequency, you can produce rich surface details that will look sharp even when the camera comes in for a very close inspection. Even a 2048x2048 texture would eventually start to look blurry, while a detail-mapped surface can potentially go much further.

Now the problem inherent with this tactic is that it makes the shader more complicated. You have more textures to sample, you have to use more code to recover the normals from their compressed state, and you then have to actually combine them. This can become surprisingly expensive in no time at all, since there are usually a bunch of sqrt(), normalize(), and misc simple ops when all is said and done.

So, I looked into the options on how I could reduce the complexity of recovering and combining compressed normal maps. To my surprise, I stumbled across an approach that greatly reduces the instruction counts, while not requiring drastic changes to the textures. Partial Derivative Normal maps, as they are called, were first introduced by Insomniac Games for Ratchet and Clank Future: Tools of Destruction (link, pg 27).

According to their description, PDNs use essentially the same code whether they are a single normal map, or a base map and a detail map. They also work well with the standard DXT5 compression trick, not showing any worse compression artifacts than tangent-space normal maps using the same scheme.

So let's take a look at some comparisons of this technique versus some more standard approaches, and see how it holds up.

**Authoring:**

Fortunately, PDN maps seem to suffer no real quality loss when converted from a standard tangent-space normal map. However, it is best to convert from the floating-point tangent-space data, before it gets stored in an RGB8 texture. Otherwise, you lose quality by using low-precision integers for both input and output.

Formula

// Note for C implementations: the stored value needs to be clamped to [0, 1]

pdn.xy = -tangentNormal.xy / tangentNormal.z;
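The conversion can be sketched offline in Python. The formula is the one above. The storage mapping (scale by 0.5, add 0.5, clamp to [0, 1]) is an assumption inferred from the `* -2.0 + 1.0` decode used later in the article, and the names `encode_pdn` and `decode_pdn` are hypothetical.

```python
import math


def encode_pdn(n):
    """Unit tangent-space normal (x, y, z) with z > 0 -> two channels in [0, 1]."""
    x, y, z = n
    dx, dy = -x / z, -y / z  # the article's formula

    def clamp01(v):
        return min(max(v, 0.0), 1.0)

    # Assumed storage mapping: [-1, 1] -> [0, 1], clamped
    return (clamp01(dx * 0.5 + 0.5), clamp01(dy * 0.5 + 0.5))


def decode_pdn(stored):
    """Mirror of the shader recovery: scale-bias with -2/+1, append 1, normalize."""
    px = stored[0] * -2.0 + 1.0
    py = stored[1] * -2.0 + 1.0
    length = math.sqrt(px * px + py * py + 1.0)
    return (px / length, py / length, 1.0 / length)


normal = (0.6, 0.0, 0.8)
print(encode_pdn(normal))
print(decode_pdn(encode_pdn(normal)))
```

Under these assumptions, a normal such as (0.6, 0.0, 0.8) comes back unchanged up to floating-point rounding, because its slopes stay inside the clamped range.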

**Figure 1.** *1, base bump map; 2, base tangent-space normal map; 3, base PDN map; 4, detail bump map; 5, detail tangent-space normal map; 6, detail PDN map*

**Recovery:**

As I mentioned earlier, PDNs come into play when you want to use compressed normal maps. For the sake of comparison, I will also examine the "standard" approaches using tangent-space normal maps.

Please note, I will be using 'half' types/functions extensively in my code examples, as this has proven to reduce instruction counts on Nvidia hardware. This is particularly important, since it reduces all normalize() calls from 3 instructions to 1 instruction. For comparison's sake, I have included the instruction counts of code using float types/functions as well.

As far as I know, the compressed tangent-space normal map code doesn't need a normalize(). This is a consequence of the Pythagorean theorem formula, which should produce a Z value that makes the normal a unit vector.
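A quick numeric check of that claim, as a sketch in Python. The `max()` guard is an addition not present in the article's shader code, there only to keep `sqrt` away from negative inputs when compression error pushes x and y slightly outside the unit circle.

```python
import math


def recover_normal(x, y):
    """Rebuild z from x and y so that x^2 + y^2 + z^2 = 1."""
    z = math.sqrt(max(0.0, 1.0 - (x * x + y * y)))
    return (x, y, z)


def length(v):
    return math.sqrt(sum(c * c for c in v))


n = recover_normal(0.3, -0.5)
print(n, length(n))
```

As long as x and y lie inside the unit circle, the recovered vector already has length 1 up to rounding, so a further normalize() would change nothing.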

Those who read the Insomniac article may notice the negation op missing from my PDN code. I actually found a way to remove that instruction by handling it in the scale-bias operation. So instead of the standard *\* 2.0 - 1.0*, I use *\* -2.0 + 1.0* to get the sample into the correct range and perform the negation at the same time. This trick only shaves off one instruction, but every little bit helps.
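The equivalence is easy to check numerically. The sketch below compares the two forms in Python. `recover_with_negation` is my reading of the variant with an explicit negation step, not code taken from the Insomniac slides, and both function names are hypothetical.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def recover_with_negation(sample):
    """Scale-bias with * 2 - 1, then negate x and y as a separate step."""
    dx = sample[0] * 2.0 - 1.0
    dy = sample[1] * 2.0 - 1.0
    return normalize((-dx, -dy, 1.0))


def recover_folded(sample):
    """Negation folded into the scale-bias: * -2 + 1."""
    px = sample[0] * -2.0 + 1.0
    py = sample[1] * -2.0 + 1.0
    return normalize((px, py, 1.0))


print(recover_with_negation((0.125, 0.5)))
print(recover_folded((0.125, 0.5)))
```

Since -(s * 2 - 1) and s * -2 + 1 are the same expression, both functions return the same normal. The folded form just does it in a single multiply-add.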

So let's take a look at some pseudo-code:

Tangent-space (uncompressed)

tangentN.xyz = h3tex2D(normalMap, texcoord.st).xyz * 2.0 - 1.0;

tangentN.xyz = normalize(tangentN.xyz);

~5 instructions (float)

~3 instructions (half)

Tangent-space (compressed)

tangentN.xy = h4tex2D(normalMap, texcoord.st).ag * 2.0 - 1.0;

tangentN.z = sqrt(1.0 - dot(tangentN.xy, tangentN.xy));

~7 instructions (float)

~7 instructions (half)

Partial Derivative

pdn.xy = h4tex2D(normalMap, texcoord.st).ag * -2.0 + 1.0;

tangentN.xyz = normalize(half3(pdn.xy, 1.0));

~6 instructions (float)

~4 instructions (half)

**Figure 2.** *1, recovered from XY (tangent-space); 2, recovered from PDN (tangent-space); 3, difference between the two*

If you squint, you can see that they aren't perfect, and show some distortion for normals beyond 45 degrees from the Z axis. However, this artifact has proven to be negligible during my observations.
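One plausible source of that 45-degree limit is the clamp at authoring time: a slope of exactly 1 corresponds to a 45-degree tilt, so anything steeper saturates. The Python sketch below illustrates this for a normal tilted toward +X only. It assumes the slope is scale-biased and clamped to [0, 1] for storage, which the article implies but does not spell out, and `roundtrip_angle` is a hypothetical helper.

```python
import math


def roundtrip_angle(angle_deg):
    """Encode a normal tilted angle_deg from +Z toward +X, decode it again,
    and return the tilt of the decoded normal in degrees."""
    a = math.radians(angle_deg)
    x, z = math.sin(a), math.cos(a)
    slope = -x / z
    stored = min(max(slope * 0.5 + 0.5, 0.0), 1.0)  # assumed storage mapping
    px = stored * -2.0 + 1.0
    # Tilt of normalize((px, 0, 1)) measured from the Z axis
    return math.degrees(math.atan2(px, 1.0))


for angle in (30.0, 45.0, 60.0):
    print(angle, round(roundtrip_angle(angle), 3))
```

Under these assumptions, the 30-degree normal survives the round trip while the 60-degree one comes back at 45 degrees. Note that the clamp applies per axis, so a normal tilted diagonally can get somewhat further before saturating.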

**Detail Maps:**

Okay, here's the area that mattered most, and the primary reason I considered this technique in the first place. Now, for comparison, I will be looking at two approaches that can be used for tangent-space normal maps, as well as the one approach for partial derivative normal maps.

The more "correct" approach averages the normals, but produces slightly flattened normals that seem to lose depth when shaded. I've seen some implementations use lerp(), and this would offer more control. However, since I use a value of 0.5, I can save an instruction by just averaging them together.

The other approach, which I believe is used by Unreal Engine 3 games, seems to remedy the flattened look (link, link). This one works by preserving the XY components of both the base and detail maps, while adjusting each one's contribution to the Z. In the simplest case, it seems that they just throw away the detail map's Z entirely.
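To make the difference between the two tangent-space blends concrete, here is a small Python sketch. `blend_keep_xy` models only the simplest case described above (the detail map's Z is thrown away), and it is an illustration rather than actual Unreal Engine 3 code. The function names are hypothetical.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def blend_average(n1, n2):
    """Average the two normals, then renormalize."""
    return normalize(tuple((a + b) * 0.5 for a, b in zip(n1, n2)))


def blend_keep_xy(base, detail):
    """Add the detail XY onto the base, keep only the base Z, renormalize."""
    return normalize((base[0] + detail[0], base[1] + detail[1], base[2]))


base = (0.0, 0.0, 1.0)    # flat base surface
detail = (0.6, 0.0, 0.8)  # detail normal tilted toward +X

print(blend_average(base, detail))
print(blend_keep_xy(base, detail))
```

With a flat base, the averaged result has an X component of about 0.316, while the keep-XY blend gives about 0.514. That is the "flattened" look of averaging showing up in the numbers.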

Originally, when I read the article from Insomniac, I thought that when recovering and combining PDNs, you had to set each one's Z component to 1.0. So when you added them together, it would be 2.0. However, this seemed to be producing some flattened normals as well. After some experimenting, I came to the conclusion that the 1.0 is part of the final step, where you add together all the XY components of the PDNs. So it should be *float3(pdn1.xy + pdn2.xy, 1.0)*.
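The flattening is easy to reproduce. The Python sketch below combines one tilted PDN with a flat one, first with the single 1.0 appended at the end and then with a Z of 1.0 per map (so 2.0 in total). The values are already decoded, and the function names are made up for illustration.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def combine_pdn(p1, p2):
    """Sum the XY of both PDNs, append a single 1.0, normalize."""
    return normalize((p1[0] + p2[0], p1[1] + p2[1], 1.0))


def combine_pdn_z_per_map(p1, p2):
    """The mistaken variant: each map contributes its own Z of 1.0."""
    return normalize((p1[0] + p2[0], p1[1] + p2[1], 2.0))


tilted = (0.75, 0.0)  # decoded PDN of the unit normal (0.6, 0.0, 0.8)
flat = (0.0, 0.0)     # decoded PDN of a flat surface

print(combine_pdn(tilted, flat))
print(combine_pdn_z_per_map(tilted, flat))
```

Combining with a flat map should leave the tilted normal untouched, and the first variant does return (0.6, 0.0, 0.8) up to rounding. The second one drops X to roughly 0.351. This also fits the usual reading of PDNs as surface slopes: when two height fields are added, their slopes add, while the up component stays a single 1.0.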

Pseudo-code:

Tangent-space (uncompressed)[standard]

tangentN1.xyz = h3tex2D(normalMap1, texcoord.st).xyz * 2.0 - 1.0;

tangentN2.xyz = h3tex2D(normalMap2, texcoord.st).xyz * 2.0 - 1.0;

tangentN.xyz = normalize((tangentN1.xyz + tangentN2.xyz) * 0.5);

~8 instructions (float)

~6 instructions (half)

Tangent-space (uncompressed)[UE3]

tangentN.xyz = h3tex2D(normalMap1, texcoord.st).xyz * 2.0 - 1.0;

tangentN.xy += h2tex2D(normalMap2, texcoord.st).xy * 2.0 - 1.0;

tangentN.xyz = normalize(tangentN.xyz);

~9 instructions (float)

~7 instructions (half)

Tangent-space (compressed)[standard]

tangentN1.xy = h4tex2D(normalMap1, texcoord.st).ag * 2.0 - 1.0;

tangentN1.z = sqrt(1.0 - dot(tangentN1.xy, tangentN1.xy));

tangentN2.xy = h4tex2D(normalMap2, texcoord.st).ag * 2.0 - 1.0;

tangentN2.z = sqrt(1.0 - dot(tangentN2.xy, tangentN2.xy));

tangentN.xyz = normalize((tangentN1.xyz + tangentN2.xyz) * 0.5);

~17 instructions (float)

~15 instructions (half)

Tangent-space (compressed)[UE3]

tangentN.xy = h4tex2D(normalMap1, texcoord.st).ag * 2.0 - 1.0;

tangentN.z = sqrt(1.0 - dot(tangentN.xy, tangentN.xy));

tangentN.xy += h4tex2D(normalMap2, texcoord.st).ag * 2.0 - 1.0;

tangentN.xyz = normalize(tangentN.xyz);

~12 instructions (float)

~10 instructions (half)

Partial Derivative

pdn1.xy = h4tex2D(normalMap1, texcoord.st).ag * -2.0 + 1.0;

pdn2.xy = h4tex2D(normalMap2, texcoord.st).ag * -2.0 + 1.0;

tangentN.xyz = normalize(half3(pdn1.xy + pdn2.xy, 1.0));

~9 instructions (float)

~7 instructions (half)

**Figure 3.** *1, tangent-space (uncompressed) [standard]; 2, tangent-space (uncompressed) [UE3]; 3, tangent-space (compressed) [standard]; 4, tangent-space (compressed) [UE3]; 5, PDN*

**Figure 4.** *difference between tangent-space (compressed) and PDN*

**Conclusions:**

So I think that just about covers everything. Hopefully you can see how partial derivative normal maps can be used efficiently for both single-mapped and detail-mapped surfaces, with a negligible difference in quality but a considerable difference in performance versus standard compressed tangent-space normal maps.

Considering their benefits, I'm surprised they haven't seen more widespread use. Suffice it to say, they have been working well for my project thus far. I'm curious to see if anyone else will give them a try after reading this post. ;)

My current plan is to release a demo application comparing all these techniques in a simple lighting environment. I'm not sure when this will be available, but I will keep everyone posted. Please try to forgive my almost OCD attempts to revise this article; I wanted it to contain the most complete set of information available.

If you have any comments, critiques, questions, please feel free to speak up! ;)

Article originally appeared on Crunchy Bytes (http://n00body.squarespace.com/).

See website for complete article licensing information.