After creating the base test with convolution matrices, the project moved on to the main artifact: complex lighting optimised through parallel processing techniques in the compute shader. The first phase was to generate a volume of data large enough to simulate the performance cliffs found in game situations. The data is stored in multiple structured buffers and textures (2D, 2DMS, and potentially 3D), and the scene has to create enough lighting situations to require non-trivial ALU computations.
With the render pipeline in mind, deferred shading is the only approach that would utilise the geometry and pixel functions of the hardware while allowing the lighting computations to run at a separate stage. The unoptimised approach renders all of the scene geometry into several full-screen buffers. The outputs are the depth buffer, a buffer for the albedo values, a buffer with the 3D positions (in a given space, usually world or view), and a buffer for the normals.
[Figure: Albedo Buffer]

[Figure: Normal Buffer]
The first optimisation was to not store a position buffer, as the position can be reconstructed from the depth buffer and the screen (x, y) coordinates (or the thread dispatch ID if in the compute shader). The trade-off for removing this buffer is a transformation from screen space (x, y) to homogeneous clip space and then to view or world space. The vector math is trivial for the hardware and is much cheaper than storing the position in memory and paying for an extra texture fetch.

Further optimisations can be made by reducing the amount of data that must be stored and packing unused elements with lighting data. For example, the normals can be represented as two floats rather than three through spherical or stereographic mapping: the length of each normal is always 1, so the z component can be reconstructed in a similar fashion to map projections. If the normal only uses two floats, the rest of the pixel can hold the specular amounts and powers, which removes the need for an extra buffer.
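The two ideas above can be sketched on the CPU as follows. This is only an illustration, not the project's shader code: it assumes a Direct3D-style perspective projection with depth in [0, 1] and view-space +z pointing into the screen, and every function and parameter name here (`reconstruct_view_position`, `encode_normal`, `fov_y`, and so on) is hypothetical. The stereographic mapping is written for one particular pole; it is singular for the normal (0, 0, -1), so a real implementation would need to pick the pole that visible normals never reach.

```python
import math

def project_to_screen(p_view, fov_y, aspect, near, far, width, height):
    """Forward path: view-space point -> (pixel x, pixel y, depth in [0, 1])."""
    x, y, z = p_view
    sy = 1.0 / math.tan(fov_y / 2.0)   # vertical projection scale
    sx = sy / aspect                   # horizontal projection scale
    ndc_x = sx * x / z
    ndc_y = sy * y / z
    depth = far / (far - near) - (far * near / (far - near)) / z
    px = (ndc_x * 0.5 + 0.5) * width
    py = (0.5 - ndc_y * 0.5) * height  # screen y grows downwards
    return px, py, depth

def reconstruct_view_position(px, py, depth, fov_y, aspect, near, far, width, height):
    """Inverse path: pixel coordinates plus stored depth -> view-space position.

    With an integer thread ID, pass (id + 0.5) to sample the pixel centre.
    """
    sy = 1.0 / math.tan(fov_y / 2.0)
    sx = sy / aspect
    ndc_x = px / width * 2.0 - 1.0
    ndc_y = 1.0 - py / height * 2.0
    z = near * far / (far - depth * (far - near))  # undo the depth mapping
    return (ndc_x * z / sx, ndc_y * z / sy, z)

def encode_normal(n):
    """Stereographic projection of a unit normal onto two floats."""
    x, y, z = n
    return (x / (1.0 + z), y / (1.0 + z))

def decode_normal(e):
    """Rebuild the full unit normal from its two stored floats."""
    ex, ey = e
    d = ex * ex + ey * ey
    s = 1.0 / (1.0 + d)
    return (2.0 * ex * s, 2.0 * ey * s, (1.0 - d) * s)
```

Round-tripping a point through `project_to_screen` and `reconstruct_view_position` recovers it to within floating-point error, and the same holds for `decode_normal(encode_normal(n))` on any unit normal away from the singular pole.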
Another step that is still in progress is to store the gradient of the position z component. Along with the normal information, the change in z can help identify transitions from the surface of one object to another directly from a screen space buffer. This will be useful for edge detection algorithms, as extra sampling is usually required at the edges of objects.
[Figure: Delta z Buffer]
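As a rough CPU-side illustration of how a delta z buffer could drive edge detection, the sketch below computes a forward-difference gradient over a small depth grid and thresholds it. The function names and the threshold are assumptions made for the example; on the GPU the gradient would more likely come from pixel shader derivative intrinsics such as HLSL's `ddx` and `ddy`.

```python
def depth_gradient(depth):
    """Per-pixel |dz/dx| + |dz/dy| using forward differences, clamped at the border."""
    h = len(depth)
    w = len(depth[0])
    grad = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = depth[y][min(x + 1, w - 1)] - depth[y][x]
            dy = depth[min(y + 1, h - 1)][x] - depth[y][x]
            grad[y][x] = abs(dx) + abs(dy)
    return grad

def edge_mask(depth, threshold):
    """Flag pixels whose change in z exceeds a (hypothetical) threshold."""
    return [[g > threshold for g in row] for row in depth_gradient(depth)]
```

A flat surface gives a gradient of zero everywhere, while a jump in depth between two objects shows up as a line of flagged pixels that could be given the extra samples.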
This buffer might also be optimised further: the rate of change does not appear to need the full 32 bits, so the value could be packed into fewer bits and the remainder used to hold more data.
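One way this packing might look is sketched below: the magnitude of delta z is quantised to 16 bits and shares a 32-bit word with a second 16-bit value. The bit split, the `max_value` normalisation range, and the helper names are all assumptions for illustration; the post does not say which layout will be used, and a signed gradient would need a different mapping.

```python
def quantise_unorm16(value, max_value):
    """Map value in [0, max_value] to an integer in [0, 65535], clamping outliers."""
    t = min(max(value / max_value, 0.0), 1.0)
    return int(round(t * 65535.0))

def dequantise_unorm16(q, max_value):
    """Inverse of quantise_unorm16, up to quantisation error."""
    return q / 65535.0 * max_value

def pack_two_unorm16(hi, lo):
    """Store two 16-bit values in one 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def unpack_two_unorm16(packed):
    """Split a 32-bit word back into its two 16-bit halves."""
    return (packed >> 16) & 0xFFFF, packed & 0xFFFF
```

The round trip loses at most one quantisation step, `max_value / 65535`, which is the precision that would have to be acceptable for the gradient.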
With all of the data created by the shaders dictated for each model, but written to multiple render targets (MRT) rather than the back buffer, all further calculations can be done independently of the individual objects and their materials, and work purely in terms of screen space.


