Since we're finally in summer, I thought everybody could use some cold, and that's why I came up with a Deformable Snow shader system. It's actually a project that I finished some three years ago, but working on Gridd: Retroenhanced first and as a freelancer afterwards has been quite a commitment, and I was able to finalize it only recently.
Now, be advised that this is not an easy challenge, because it requires knowledge of the rendering pipeline, of data precision and storage, and a bit of scripting. If you feel confident, go ahead; I'll try my best to be as clear as possible.
The idea came from this fantastic presentation by the amazing Colin Barré-Brisebois, held at GDC in 2014, about how they implemented the deformable snow system in Batman: Arkham Origins.
You can watch the technique in action here:
The technique is essentially composed of:
- An orthographic camera with a very short frustum, pointing upwards, which renders a depth texture;
- A shader which processes the depth texture rendered by that camera;
- A snow shader which blends from soft snow to trampled snow based on the texture processed by the shader from point 2.
I'm gonna break down the challenge into 3 different articles to improve readability:
- The Depth Processor Shader
- The Command Buffer
- The Snow Shader
Part 1: The Depth Processor Shader
So, first things first. As you can see from slide 20, the depth processing in a generic frame should work more or less like this:
- Get the previous depth buffer, in order to maintain persistent changes in the snow surface;
- Get the current depth texture;
- Combine them, optionally with some sort of "accumulation" effect to make the trampled snow grow back over time in cases of snow falling.
Knowing this, our fragment function could be something like
fixed4 frag(v2f i) : SV_Target
{
    //sample previous accumulated depth
    float accumulatedDepth = tex2D(_MainTex, i.uv).r;
    //invert x component of uv because camera is flipped
    i.uv.x = 1.0 - i.uv.x;
    //sample current depth value from depth texture
    float depthValue = Linear01Depth(tex2D(_CameraDepthTexture, i.uv).r);
    //compute time factor as 1.0 / persistence if persistence > 0.0, or 0.0 otherwise
    float timeF = lerp(0.0, clamp((1.0 / _PersistenceF), 0.0, 1.0), sign(_PersistenceF));
    //add the time factor to the accumulated depth, multiply by the current depth value to keep trampled zones flat
    float actualDepth = clamp((accumulatedDepth + timeF) * depthValue, 0.0, 1.0);
    //return the actual depth for the frame
    return fixed4(actualDepth, actualDepth, actualDepth, 1.0);
}
It's all fun and games, 'till someone loses his precision. In fact, if you raise the value of _PersistenceF over 256.0, the snow will stop growing back. Moreover, you have a very limited range of time spans to choose from when adjusting the regrowth speed. If we take a look at this page in the Unity documentation, we'll find out why. The fixed4 data type is described as such:
Lowest precision fixed point value. Generally 11 bits, with a range of –2.0 to +2.0 and 1/256th precision.
And that's the reason why we're having such issues. There is only one solution: use a full float precision value to store the depth in our texture. The approach differs depending on the hardware we're working on: on modern hardware we can declare a texture with an RFloat format, meaning that it will have only the red channel, with 32-bit full-float precision. Nice and easy. The values are retrieved with tex2D(_Texture, uv).r and the frag function output should be float4 (of which only the first channel will actually be written to the texture).
On older hardware we can't rely on RFloat textures, so the idea is to encode our float into the 4 channels of a standard ARGB32 texture and decode it back for calculations. Looking here, we find out that Unity actually has a couple of nice functions that do just what we need: float4 EncodeFloatRGBA (float v) and float DecodeFloatRGBA (float4 enc). If you want to dig deeper into how they function, this blog post from Aras explains how they actually work.
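The stall can be reproduced on the CPU with a few lines of Python. This is only a sketch of the accumulation step from the frag function, not the shader itself: the `quantize` helper is hypothetical, and it assumes stored values are truncated to multiples of 1/256 (the precision quoted above), which is a simplification of what real hardware does.

```python
import math

STEP = 1.0 / 256.0  # precision quoted for the fixed type


def quantize(value, step=STEP):
    """Truncate to a multiple of `step` (truncation is an assumption of this sketch)."""
    return math.floor(value / step) * step


def regrow(start, persistence, frames, quantized=True):
    """Repeat the accumulation step with depthValue == 1.0 (nothing touches the snow)."""
    time_f = min(max(1.0 / persistence, 0.0), 1.0) if persistence > 0.0 else 0.0
    acc = start
    for _ in range(frames):
        acc = min(max(acc + time_f, 0.0), 1.0)
        if quantized:
            acc = quantize(acc)  # low-precision storage between frames
    return acc


if __name__ == "__main__":
    print(regrow(0.5, 128.0, 10))                     # grows back
    print(regrow(0.5, 300.0, 1000))                   # stuck at the start value
    print(regrow(0.5, 300.0, 1000, quantized=False))  # full precision reaches 1.0
```

With a persistence of 300 the per-frame increment is smaller than one storage step, so under this model every frame truncates back to the value it started from, while the unquantized run keeps growing.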
NOTE: the encoding function works with input in [0..1) range, meaning that 1.0 WILL NOT BE ENCODED PROPERLY. This happens because the encoding function relies on the frac(n) function, which returns the fractional part of a floating point number. frac(1.0) will indeed return 0.0, which is mathematically correct, but that will screw up the encoding process. This means that we need to clamp the depth in a [0.0, 0.999999] range to make the encoding function work with an acceptable loss of precision.
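To see the frac problem in isolation, here is a Python port of a frac-based RGBA encoding. It is written from memory of the scheme described in Aras's post, so treat the constants (powers of 255) and the helper names as assumptions and check them against the UnityCG.cginc of your Unity version. `quantize_8bit` is a hypothetical stand-in for storing each channel in an 8-bit texture.

```python
import math

ENCODE_MUL = (1.0, 255.0, 65025.0, 16581375.0)  # assumed constants: powers of 255


def frac(x):
    """Fractional part, like the shader intrinsic."""
    return x - math.floor(x)


def encode_float_rgba(v):
    """Spread a value in [0, 1) over four channels."""
    enc = [frac(m * v) for m in ENCODE_MUL]
    # remove from each channel the part that the next channel already carries
    nxt = (enc[1], enc[2], enc[3], enc[3])
    return [e - n / 255.0 for e, n in zip(enc, nxt)]


def quantize_8bit(rgba):
    """Simulate writing the channels to an 8-bit-per-channel texture."""
    return [round(c * 255.0) / 255.0 for c in rgba]


def decode_float_rgba(rgba):
    """Recombine the four channels into a single value."""
    return sum(c / m for c, m in zip(rgba, ENCODE_MUL))


if __name__ == "__main__":
    for value in (0.5, 0.999999, 1.0):
        stored = quantize_8bit(encode_float_rgba(value))
        print(value, decode_float_rgba(stored))
```

In this sketch an input of 1.0 decodes to 0.0, because every channel's frac is zero, while 0.999999 survives the round trip with an error far below what the snow height needs.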
My approach to support the ARGB fallback without too much code rewriting was encapsulating the encoding-decoding portions of the frag function inside a conditional-compile block, and enabling shader multi compile with a keyword, like so: #pragma multi_compile ______ FALLBACK_ARGB. The FALLBACK_ARGB global shader keyword is then enabled via script if RFloat texture format is not supported (but we'll get back to the scripting part later).
Blur is also mentioned in the presentation: especially useful with low res depth buffer, it gives a smoother look to the traces left by our objects. Note that blur tends to be a resource-consuming thing to do, and as such should be treated carefully.
I tried different approaches, the first being separable Gaussian filtering. Although it has many proven advantages in blurring, it has a couple of tradeoffs: the first is the (relatively small) time cost of setting up another render pass (which is obviously less than the time for a single-pass Gaussian blur with large kernels, but remember: we're talking about a 3x3 kernel here), and the second, bigger tradeoff is that you'll need another texture, the size of the one you're blurring, to separate the horizontal and vertical passes. Now, normally for a screen blur effect this is an expected cost, especially because it's just one texture, but consider the use case of our technique: this would require one additional texture for each snow-covered roof, and even if we keep the resolution as low as 512x512 pixels (1 MB per texture in RFloat format), things can quickly add up and consume a lot of memory.
We have two possible solutions left: a simple box filter, or a Poisson filter. Either way is fine in my opinion, mostly because we won't directly see the blurred texture but rather the resulting height, but I went with a 4-tap bilinear Poisson filter, as suggested in the slides. It's quite difficult to find information on Poisson filtering because information on Gaussian filtering keeps popping up everywhere, so I'll try to give a brief explanation here and provide some links to extend your research.
The Poisson Distribution
The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event (yup, quoting Wikipedia here). The important fact for us is that, under certain circumstances, the Poisson distribution can be approximated by a Gaussian distribution, and here's when things start to get interesting (if you want to dig deeper into this, you can go to this thread on stackexchange/math). The Poisson distribution has many applications in the game industry, for example algorithms that populate natural worlds with objects in a pseudo-random way. Within the computer graphics domain, the related technique usually called Poisson disk sampling generates random points within the unit circle (look mom: a kernel!) that keep a minimum distance from each other and can be rotated and scaled arbitrarily. I found a beautiful tool from coderhaus which lets you generate a given number of points with a fixed distance between them, and you have the source code too, so it's really worth opening it up and taking a look at how it works. EDIT: sadly coderhaus is no longer reachable, so the tool is not available anymore. :c
Anyway, back to our Depth Processor Shader. This pass is quite simple: define an array of float2 which holds our Poisson kernel, and sum the samples, offsetting the uv by the Poisson kernel value times the texel size (to keep them within the unit circle around our current pixel), much like so:
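Since the original tool is gone, a kernel of this kind can be approximated with a naive "dart throwing" loop. The Python sketch below is not the coderhaus algorithm (I don't know how that one worked internally); the function name, the seed and the attempt limit are all assumptions made for illustration.

```python
import math
import random


def poisson_disk_kernel(count, min_dist, seed=0, max_attempts=10000):
    """Pick up to `count` points in the unit circle, at least `min_dist` apart."""
    rng = random.Random(seed)  # seeded so the kernel is reproducible
    points = []
    attempts = 0
    while len(points) < count and attempts < max_attempts:
        attempts += 1
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y > 1.0:
            continue  # candidate falls outside the unit circle
        if all(math.hypot(x - px, y - py) >= min_dist for px, py in points):
            points.append((x, y))  # far enough from every accepted point
    return points


if __name__ == "__main__":
    for px, py in poisson_disk_kernel(4, 0.5):
        print("float2(%.7f, %.7f)," % (px, py))
```

The function may return fewer points than requested if the minimum distance is too large for the circle, so it's worth checking the length of the result before pasting it into a shader.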
[...]
static const float2 poisson_kernel_4[4] =
{
float2( 0.4247072, -0.4262313),
float2(-0.3010053, 0.3568736),
float2( 0.8125032, 0.3971981),
float2(-0.4083271, -0.8709177)
};
[...]
//initialize col value with the center sample
float col = tex2D(_MainTex, i.uv).r;
//loop through a small poisson kernel, sample and add
for (uint j = 0u; j < 4u; j++) {
    //the kernel is modulated by the texel size to sample around the current pixel
    col += tex2D(_MainTex, i.uv + (poisson_kernel_4[j] * _MainTex_TexelSize.xy)).r;
}
//get the mean of the five sampled values (center plus 4 taps)
col /= 5.0;
//return color value
return float4(col, 0.0, 0.0, 0.0);
[...]
As before, I added a #pragma multi_compile directive to support the ARGB fallback we discussed earlier: the code is almost identical, just remember to decode the float from the texture fetches and encode it back before returning.
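If you want to check what the filter does to a height map without opening Unity, the same averaging can be sketched on the CPU. The Python version below is an approximation under assumptions: `sample_bilinear` is a hypothetical stand-in for tex2D with bilinear filtering and clamp addressing, and the texture is a plain list of rows holding one float each.

```python
import math

# Kernel values copied from the shader listing above.
POISSON_KERNEL_4 = [
    (0.4247072, -0.4262313),
    (-0.3010053, 0.3568736),
    (0.8125032, 0.3971981),
    (-0.4083271, -0.8709177),
]


def sample_bilinear(tex, u, v):
    """Bilinear fetch with clamp addressing; texel centers sit at (i + 0.5) / size."""
    height, width = len(tex), len(tex[0])
    x = min(max(u * width - 0.5, 0.0), width - 1.0)
    y = min(max(v * height - 0.5, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = tex[y0][x0] * (1.0 - fx) + tex[y0][x1] * fx
    bottom = tex[y1][x0] * (1.0 - fx) + tex[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy


def poisson_blur(tex):
    """Average the center sample with 4 Poisson taps scaled by the texel size."""
    height, width = len(tex), len(tex[0])
    texel_x, texel_y = 1.0 / width, 1.0 / height
    out = []
    for row in range(height):
        out_row = []
        for col in range(width):
            u, v = (col + 0.5) / width, (row + 0.5) / height
            total = sample_bilinear(tex, u, v)
            for kx, ky in POISSON_KERNEL_4:
                total += sample_bilinear(tex, u + kx * texel_x, v + ky * texel_y)
            out_row.append(total / 5.0)
        out.append(out_row)
    return out


if __name__ == "__main__":
    spike = [[0.0] * 5 for _ in range(5)]
    spike[2][2] = 1.0
    for line in poisson_blur(spike):
        print(" ".join("%.3f" % value for value in line))
```

Running it on a single spike shows the value being spread over the neighbouring texels, which is the softening of hard footprint edges that the blur pass is there for.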
You can find the complete shader code here on GitHub
This is the end of Part 1. You can find Part 2 here and Part 3 here