Deformable Snow Shader System - Part 2: The Command Buffer

Article / 25 June 2019

Part 2: The Command Buffer

Now that we have our Depth Processing shader, it's time to build the structure to see it in action! Luckily for us, Unity has taken a very convenient path towards render customization, starting with command buffers back in Unity 5.0. The next, long-awaited step was the introduction of customizable rendering pipelines in Unity 2019 (can't wait to dive into it!), but for the sake of backward compatibility we'll stick with command buffers. The general idea is very simple: add a bunch of low-level graphics commands to be executed every frame, before or after a certain stage of the rendering pipeline.

In our case, the depth processing should occur before we actually render the snow, and of course after the depth texture itself is rendered, so attaching the command buffer to our depth camera with the CameraEvent.BeforeForwardOpaque value should be fine.

What do we need to put in our command buffer then?

We will need a texture to act as our depth buffer, into which we output our calculations, and another texture as a support buffer, into which we copy the current state of our calculation just before updating it (that is because, of course, you cannot read from and write to the same texture within a single blit operation).

The most basic version of the SetupCommandBuffer() method could look like this:

void SetupCommandBuffer()
{
    cb = new CommandBuffer();
    cb.name = "Depth Processor Buffer";
    //get the RenderTargetIdentifiers for our render textures
    RenderTargetIdentifier dst = new RenderTargetIdentifier(depthBuffer);
    RenderTargetIdentifier dstSupport = new RenderTargetIdentifier(depthBufferSupport);
    //copy the dst buffer into dstSupport, which will be read in the next step
    cb.Blit(dst, dstSupport);
    //process the previous state (dstSupport) with the depth processor material
    //(pass 0) and write the result into dst
    cb.Blit(dstSupport, dst, depthProcessorMaterial, 0);
    //set the depth buffer as the height texture
    cb.SetGlobalTexture(heightTexID, dst);
    //add the command buffer before forward opaque rendering, in order to have
    //the height texture available at render time
    snowCamera.AddCommandBuffer(CameraEvent.BeforeForwardOpaque, cb);
}
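To make the role of the support buffer more tangible, here is a minimal CPU-side sketch in Python (not Unity code). Textures are modelled as plain lists of floats, and `process` is a hypothetical stand-in for the depth processor material that simply multiplies the previous state by the current depth, with no regrowth.

```python
# Toy model of the two-buffer update. Nothing here is Unity API:
# lists of floats stand in for render textures.

def process(previous, current_depth):
    """Stand-in for the depth processor pass: previous state times current depth."""
    return [min(1.0, max(0.0, p * d)) for p, d in zip(previous, current_depth)]

def run_frame(dst, dst_support, current_depth):
    # Equivalent of cb.Blit(dst, dstSupport): snapshot last frame's result
    dst_support[:] = dst
    # Equivalent of cb.Blit(dstSupport, dst, material, 0):
    # read only from the snapshot, write only to dst
    dst[:] = process(dst_support, current_depth)

dst = [1.0, 1.0, 1.0, 1.0]        # untouched snow
dst_support = [0.0] * 4

run_frame(dst, dst_support, [1.0, 0.5, 1.0, 1.0])  # something presses on texel 1
run_frame(dst, dst_support, [1.0, 1.0, 1.0, 1.0])  # it has moved away
print(dst)  # [1.0, 0.5, 1.0, 1.0]
```

The trail at texel 1 survives the second frame because each frame starts from the snapshot of the previous result rather than from a cleared texture.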

Of course, as always, there's room for improvement. For instance, the CommandBuffer class provides a method specifically intended to copy one texture to another, namely CommandBuffer.CopyTexture(), which is actually faster than Blit() because it directly copies values from one texture to another, without drawing a full screen quad with the default blit shader. The downside is that this operation is not supported on every platform and has some specific restrictions, so it is a good idea to perform a support check with the SystemInfo.copyTextureSupport property (which returns CopyTextureSupport flags), and choose between CopyTexture() and Blit() accordingly.

Since we mentioned the smoothing phase before, we could add another texture (called dstBlurred from now on) and blit the dst buffer into it, using the second pass of our DepthProcessing material.

NOTE: remember to set dstBlurred as the global texture instead of dst, and DO NOT BLIT dstBlurred back to either of the other two buffers. I mean, you can try and see for yourself why it's not a very good idea, at least for snow trails. I could use this trick in another experiment with water/mud trails maybe, since it looks promising in that kind of situation.

Last thing to consider is that I exposed the choice between blur and no blur as a class parameter to improve flexibility on older systems. Note that the commands are recorded only once, so changing the parameter has no effect on a command buffer that is already attached to a camera: you will need to remove it from the camera, dispose of it, and regenerate it by calling SetupCommandBuffer() again.
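The resulting recording order can be summarised with a small Python sketch. This is only a model of the sequence described above, not Unity code: `build_commands` and its two flags are hypothetical names, and the commands are plain tuples rather than real CommandBuffer calls.

```python
def build_commands(copy_texture_supported, use_blur):
    """Return the commands in recording order (names mirror the article's)."""
    commands = []
    # Step 1: snapshot dst into dstSupport, using the cheaper copy when available
    if copy_texture_supported:
        commands.append(("CopyTexture", "dst", "dstSupport"))
    else:
        commands.append(("Blit", "dst", "dstSupport"))
    # Step 2: depth processing, first pass of the material
    commands.append(("Blit", "dstSupport", "dst", "pass 0"))
    # Step 3 (optional): blur into a third texture, never back into dst
    height_source = "dst"
    if use_blur:
        commands.append(("Blit", "dst", "dstBlurred", "pass 1"))
        height_source = "dstBlurred"
    # Step 4: expose the final texture to the snow shader
    commands.append(("SetGlobalTexture", "heightTexID", height_source))
    return commands

for command in build_commands(copy_texture_supported=False, use_blur=True):
    print(command)
```

Because the list is built once from the flags, toggling the blur means building a new list, which mirrors the need to regenerate the command buffer.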

You can take a look at the system in action here:

You can find the complete code here on GitHub

This is the end of Part 2. You can find Part 1 here and Part 3 here


Deformable Snow Shader System - Part 1: The Depth Processor Shader

Article / 25 June 2019

Since we're finally in summer, I thought everybody could use some cold, and that's why I came up with a Deformable Snow shader system. It's actually a project that I finished some three years ago, but working on Gridd: Retroenhanced first and as a freelancer afterwards has been quite a commitment, and I was able to finalize it only recently.

Now, be advised that this is not an easy challenge, because it requires knowledge of the rendering pipeline, of data precision and storage, and of a bit of scripting. If you feel confident go ahead, I'll try my best to be as clear as possible.

The idea came from this fantastic presentation by the amazing Colin Barré-Brisebois, held at GDC in 2014, about how they implemented the deformable snow system in Batman: Arkham Origins.

You can watch the technique in action here:

The technique is essentially composed of:

1. An orthographic camera with a very short frustum, pointing upwards, which renders a depth texture;
2. A shader which processes the depth texture rendered by that camera;
3. A snow shader which blends from soft snow to trampled snow based on the texture processed by the shader from point 2.

I'm gonna break down the challenge into 3 different articles to improve readability:

The Depth Processor Shader
The Command Buffer
The Snow Shader

Part 1: The Depth Processor Shader

So, first things first. As you can see from slide 20, the depth processing in a generic frame should work more or less like this:

Get the previous depth buffer, in order to maintain persistent changes in the snow surface;
Get the current depth texture;
Combine them, optionally with some sort of "accumulation" effect to make the trampled snow grow back over time in cases of snow falling.

Knowing this, our fragment function could be something like this:

fixed4 frag(v2f i) : SV_Target
{
    //sample previous accumulated depth
    float accumulatedDepth = tex2D(_MainTex, i.uv).r;
    //invert x component of uv because camera is flipped
    i.uv.x = 1.0 - i.uv.x;
    //sample current depth value from depth texture
    float depthValue = Linear01Depth(tex2D(_CameraDepthTexture, i.uv).r);
    //compute time factor as 1.0 / persistence if persistence > 0.0, or 0.0 otherwise
    float timeF = lerp(0.0, clamp((1.0 / _PersistenceF), 0.0, 1.0), sign(_PersistenceF));
    //add the time factor to the accumulated depth, then multiply by the current depth value to keep trampled zones flat
    float actualDepth = clamp((accumulatedDepth + timeF) * depthValue, 0.0, 1.0);
    //return the actual depth for the frame
    return fixed4(actualDepth, actualDepth, actualDepth, 1.0);
}
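The arithmetic of this function can be checked outside the shader with a few lines of Python. This is a minimal sketch of the same per-texel formula, assuming the "1.0 / persistence if persistence > 0.0, or 0.0 otherwise" behaviour stated in the comment; `time_factor` and `update_depth` are illustrative names, not part of the shader.

```python
def clamp01(x):
    return min(1.0, max(0.0, x))

def time_factor(persistence):
    """1 / persistence clamped to [0, 1], or 0 when persistence is not positive."""
    if persistence <= 0.0:
        return 0.0
    return clamp01(1.0 / persistence)

def update_depth(accumulated, depth, persistence):
    """One frame of the accumulation: (accumulated + timeF) * depth, clamped."""
    return clamp01((accumulated + time_factor(persistence)) * depth)

# A trampled texel (0.25) with nothing above it (depth 1.0) grows back
# by 1 / persistence per frame until it reaches 1.0.
value = 0.25
for frame in range(3):
    value = update_depth(value, 1.0, 4.0)
print(value)  # 1.0
```

With a persistence of 0.0 the time factor is 0.0, so the trail never grows back.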

It's all fun and games until someone loses their precision. In fact, if you raise the value of _PersistenceF over 256.0, the snow will stop growing back. Moreover, you only have a very limited range of values to choose from when adjusting the regrowth speed. If we take a look at this page in the Unity documentation, we'll find out why. The fixed4 data type is described as such:

Lowest precision fixed point value. Generally 11 bits, with a range of –2.0 to +2.0 and 1/256th precision.
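The stall can be reproduced with a toy model in Python. This is a sketch under a loud assumption: stored values are truncated down to the nearest multiple of 1/256 after every frame (the real rounding behaviour depends on the hardware and texture format). `quantize` and `regrow` are illustrative names.

```python
import math

STEPS = 256  # assumed number of representable steps per unit (1/256 precision)

def quantize(value):
    """Truncate to the nearest lower multiple of 1/256 (assumed rounding mode)."""
    return math.floor(value * STEPS) / STEPS

def regrow(start, persistence, frames):
    """Add 1 / persistence per frame, storing the result at 1/256 precision."""
    depth = quantize(start)
    for _ in range(frames):
        depth = quantize(min(1.0, depth + 1.0 / persistence))
    return depth

print(regrow(0.5, 200.0, 1000))  # 1.0: each increment is larger than one step
print(regrow(0.5, 300.0, 1000))  # 0.5: each increment is truncated away
```

As soon as 1 / persistence is smaller than one step, every increment is lost before it can accumulate, no matter how many frames go by.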

And that's the reason why we're having such issues. There is only one solution: use a full float precision value to store the depth in our texture. The approach is different depending on the hardware we're working on: on modern hardware we can declare a texture with an RFloat format, meaning that it will have the red channel only, with 32-bit full float precision. Nice and easy. The values are retrieved with tex2D(_Texture, uv).r and the frag function output should be float4 (of which only the first channel will actually be written to the texture).

On older hardware we can't rely on RFloat textures, so the idea is to encode our float into the 4 channels of a standard ARGB32 texture and decode it back for calculations. Looking here, we find out that Unity actually has a couple of nice functions that do just what we need: float4 EncodeFloatRGBA (float v) and float DecodeFloatRGBA (float4 enc). If you want to dig deeper into how they work, this blog post from Aras explains it.

NOTE: the encoding function works with input in the [0..1) range, meaning that 1.0 WILL NOT BE ENCODED PROPERLY. This happens because the encoding function relies on the frac(n) function, which returns the fractional part of a floating point number. frac(1.0) will indeed return 0.0, which is mathematically correct, but that will screw up the encoding process. This means that we need to clamp the depth to a [0.0, 0.999999] range to make the encoding function work with an acceptable loss of precision.
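The frac() pitfall is easy to verify with a Python port of the two helpers. Treat this as a sketch: the constants follow the versions of EncodeFloatRGBA and DecodeFloatRGBA in UnityCG.cginc as far as I know them, and the port ignores the 8-bit quantization that happens when the channels are actually written to an ARGB32 texture.

```python
import math

# Multipliers assumed to match UnityCG.cginc: 1, 255, 255^2, 255^3
ENCODE_MUL = (1.0, 255.0, 65025.0, 16581375.0)

def frac(x):
    """Fractional part, like the HLSL frac() intrinsic."""
    return x - math.floor(x)

def encode_float_rgba(v):
    e = [frac(m * v) for m in ENCODE_MUL]
    # Remove from each channel the part carried by the next one
    # (enc -= enc.yzww / 255 in the shader version)
    return (e[0] - e[1] / 255.0,
            e[1] - e[2] / 255.0,
            e[2] - e[3] / 255.0,
            e[3] - e[3] / 255.0)

def decode_float_rgba(enc):
    # Dot product with (1, 1/255, 1/255^2, 1/255^3)
    return sum(c / m for c, m in zip(enc, ENCODE_MUL))

print(decode_float_rgba(encode_float_rgba(0.999999)))  # very close to 0.999999
print(decode_float_rgba(encode_float_rgba(1.0)))       # 0.0: the value is lost
```

Every frac() call returns 0.0 for an input of exactly 1.0, so all four channels end up empty and the value decodes as 0.0, which is why the clamp is needed.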

My approach to support the ARGB fallback without too much code rewriting was encapsulating the encoding-decoding portions of the frag function inside a conditional-compile block, and enabling shader multi compile with a keyword, like so: #pragma multi_compile ______ FALLBACK_ARGB. The FALLBACK_ARGB global shader keyword is then enabled via script if RFloat texture format is not supported (but we'll get back to the scripting part later).

Blur is also mentioned in the presentation: especially useful with low res depth buffer, it gives a smoother look to the traces left by our objects. Note that blur tends to be a resource-consuming thing to do, and as such should be treated carefully.

I tried different approaches, the first being separable Gaussian filtering. Although it has many proven advantages in blurring, it comes with a couple of tradeoffs. The first is the (relatively small) time cost of setting up another render pass (which is obviously less than the time for a single-pass Gaussian blur with large kernels, but remember: we're talking about a 3x3 kernel here). The second, bigger tradeoff is that you'll need another texture, the same size as the one you're blurring, to separate the horizontal and vertical passes. Now, normally for a screen blur effect this is an expected cost, especially because it's just one texture, but consider the use case of our technique: this would require one additional texture for each snow roof, and even if we keep the resolution as low as 512x512 pixels, things can quickly add up and consume a lot of memory.

We have two possible solutions left: a simple box filter, or a Poisson filter. Either way is fine in my opinion, mostly because we won't see the blurred texture directly but rather the resulting height, but I went with a 4-tap bilinear Poisson filter, as suggested in the slides. It's quite difficult to find information on Poisson filtering because Gaussian filtering keeps popping up everywhere, so I'll try to give a brief explanation here and provide some links to extend your research.

The Poisson Distribution

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event (yup, quoting Wikipedia here). The important fact for us is that under certain circumstances, the Poisson distribution can be approximated by a Gaussian distribution, and here's when things start to get interesting (if you want to dig deeper into this, you can go to this thread on stackexchange/math). The Poisson distribution has many applications in the game industry, for example in algorithms that populate natural worlds with objects in a pseudo-random way, but within the computer graphics domain it is used to generate random points within the unit circle (look mom: a kernel!) that can be rotated and scaled arbitrarily. I found a beautiful tool from coderhaus which lets you generate a given number of points with a fixed distance between them, and you get the source code too, so it's really worth opening it up and taking a look at how it works. EDIT: sadly coderhaus is no longer reachable, so the tool is not available anymore. :c

Anyway, back to our Depth Processor Shader. This pass is quite simple: define an array of float2 which holds our Poisson kernel, and sum the samples, offsetting the uv by the Poisson kernel value times the texel size (to keep them within the unit circle around our current pixel), much like so:

[...]
static const float2 poisson_kernel_4[4] =
{
    float2( 0.4247072, -0.4262313),
    float2(-0.3010053,  0.3568736),
    float2( 0.8125032,  0.3971981),
    float2(-0.4083271, -0.8709177)
};
[...]
//initialize col with the value at the current pixel (red channel only)
float col = tex2D(_MainTex, i.uv).r;
//loop through a small poisson kernel, sample and add
for (uint j = 0u; j < 4u; j++)
{
    //the kernel is modulated by the texel size to sample around the current pixel
    col += tex2D(_MainTex, i.uv + (poisson_kernel_4[j] * _MainTex_TexelSize.xy)).r;
}
//get the mean of the sampled values (center sample + 4 taps)
col /= 5.0;
//return color value
return float4(col, 0.0, 0.0, 0.0);
[...]
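To see what this pass does to the data, here is a CPU-side Python sketch of the same 5-sample average. It assumes bilinear filtering with clamped borders, which is implemented by hand in `sample_bilinear`; both function names are illustrative and the texture is a plain list of rows.

```python
import math

POISSON_KERNEL_4 = [
    ( 0.4247072, -0.4262313),
    (-0.3010053,  0.3568736),
    ( 0.8125032,  0.3971981),
    (-0.4083271, -0.8709177),
]

def sample_bilinear(tex, u, v):
    """Bilinear fetch at uv in [0, 1], clamping at the borders."""
    h, w = len(tex), len(tex[0])
    px, py = u * w - 0.5, v * h - 0.5          # texel centers sit at +0.5
    x0, y0 = math.floor(px), math.floor(py)
    fx, fy = px - x0, py - y0

    def texel(x, y):
        return tex[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy

def poisson_blur(tex):
    """Average the center sample with 4 Poisson taps, one texel in radius."""
    h, w = len(tex), len(tex[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            u, v = (x + 0.5) / w, (y + 0.5) / h
            col = sample_bilinear(tex, u, v)
            for kx, ky in POISSON_KERNEL_4:
                # kernel offset scaled by the texel size (1/w, 1/h)
                col += sample_bilinear(tex, u + kx / w, v + ky / h)
            row.append(col / 5.0)
        out.append(row)
    return out

spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 1.0
blurred = poisson_blur(spike)
print(blurred[2][2])  # lower than 1.0: the spike has been spread to its neighbours
```

A flat texture comes out unchanged, while a single-texel spike is flattened and spread over the surrounding texels, which is exactly the softening we want on low resolution trails.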

As before, I added a #pragma multi_compile directive to support the ARGB fallback discussed earlier: the code is almost identical, just remember to decode the float from the texture fetches and encode it back before returning.

You can find the complete shader code here on GitHub

This is the end of Part 1. You can find Part 2 here and Part 3 here
