Here is my SSAO implementation. It takes some inspiration from HBAO (DICE and NVIDIA), but it is a more optimized method.
I’ve also implemented the Crysis 1 SSAO for comparison.
The times shown in the image are SSAO render times only, in milliseconds. The screen resolution is 1280×720, and the SSAO is rendered at full resolution on a GeForce GTS 450 card. (FPS = 1000 / time)
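As a quick illustration of the FPS = 1000 / time conversion, here is a minimal sketch. The function name and the timings in the loop are hypothetical, not measurements from the image. Since only the SSAO pass is timed, the result is the frame rate the SSAO pass alone would allow, not the frame rate of a whole frame.

```python
def ssao_fps_ceiling(ssao_time_ms):
    """Convert an SSAO render time in milliseconds to frames per second.

    This is the frame rate the SSAO pass alone would allow; the rest of
    the frame is not included.
    """
    if ssao_time_ms <= 0:
        raise ValueError("render time must be positive")
    return 1000.0 / ssao_time_ms


# Hypothetical timings, for illustration only.
for time_ms in (2.0, 4.0, 8.0):
    print(f"{time_ms:.1f} ms -> {ssao_fps_ceiling(time_ms):.0f} FPS")
```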
I’ve also implemented a new method for removing false occlusion:
It combines a method Crytek used in Crysis 1 with my own method. It still needs further optimization.
Another improvement I’ve made is using a half-resolution depth buffer for the SSAO rendering. This technique was used in Uncharted 2.
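To make the idea concrete, here is a minimal CPU-side sketch of building a half-resolution depth buffer. The post does not say which filter is used for the downsample, so taking the maximum of each 2×2 block is an assumption, and the function name and sample values are hypothetical. In a real renderer this would be a small shader pass rather than Python.

```python
def downsample_depth_half(depth, reduce=max):
    """Return a half-resolution copy of a 2D depth buffer (a list of rows).

    Each output texel is reduce() applied to the 2x2 block of input texels
    it covers. The choice of max is an assumption; min or a single point
    sample can be passed instead.
    """
    height = len(depth)
    width = len(depth[0])
    if height % 2 or width % 2:
        raise ValueError("width and height must be even")

    half = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            block = (depth[y][x], depth[y][x + 1],
                     depth[y + 1][x], depth[y + 1][x + 1])
            row.append(reduce(block))
        half.append(row)
    return half


# Hypothetical 4x4 depth buffer with a near object in the top-left corner.
full = [
    [0.10, 0.12, 0.90, 0.91],
    [0.11, 0.13, 0.92, 0.93],
    [0.50, 0.50, 0.50, 0.50],
    [0.50, 0.50, 0.50, 0.50],
]
half = downsample_depth_half(full)
print(half)
```

Picking one of the existing depths (max, min, or a point sample) rather than averaging keeps every half-resolution value equal to a depth that actually exists in the scene, which matters at object edges.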
As you can see in the picture, this method almost doubles the speed of SSAO, with virtually no visual artifacts:
From a technical point of view, the main bottleneck of SSAO algorithms is their texture fetches. More specifically, because the sampling is random, GPU cache thrashing is frequent, and cache misses decrease shader performance dramatically.
So any method that reduces cache thrashing speeds up SSAO.
One of the main benefits of a half-resolution depth buffer is that the texture sampling points end up closer together in texel space, so there is less cache thrashing.
Also, as the results show, there are no noticeable differences or artifacts.
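The cache argument can be illustrated with a toy model. The sketch below assumes the texture cache works on 4×4 texel tiles, which is a simplification (real GPU cache layouts vary and are not described in this post), and all names and constants are hypothetical. It counts how many distinct tiles one pixel's random samples touch when the same screen-space pattern reads a full-resolution versus a half-resolution depth buffer.

```python
import random

TILE = 4  # assumed cache tile: a 4x4 block of texels (toy model only)


def tiles_touched(center, offsets, scale):
    """Count distinct TILE x TILE texel blocks hit by a sample pattern.

    center and offsets are in full-resolution pixels. scale is 1 for a
    full-resolution depth buffer and 2 for a half-resolution one; pixel
    coordinates are divided by scale to get texel coordinates.
    """
    cx, cy = center
    tiles = set()
    for dx, dy in offsets:
        tx = (cx + dx) // scale
        ty = (cy + dy) // scale
        tiles.add((tx // TILE, ty // TILE))
    return len(tiles)


# Hypothetical pattern: 16 random samples within 16 pixels of the center.
rng = random.Random(0)
radius = 16
offsets = [(rng.randint(-radius, radius), rng.randint(-radius, radius))
           for _ in range(16)]
center = (640, 360)

full_res = tiles_touched(center, offsets, scale=1)
half_res = tiles_touched(center, offsets, scale=2)
print(f"tiles touched: full-res {full_res}, half-res {half_res}")
```

In this model the half-resolution count can never exceed the full-resolution count, because each half-resolution tile covers a 2×2 group of full-resolution tiles. Fewer distinct tiles per pixel is a rough stand-in for fewer cache misses.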