This sample demonstrates the GL_INTEL_fragment_shader_ordering extension, which is written against the OpenGL* 4.4 core profile and GLES 3.1 specifications. The extension requires at least OpenGL 4.2 or ARB_shader_image_load_store. It introduces a new GLSL built-in function, beginFragmentShaderOrderingINTEL(), which blocks execution of a fragment shader invocation until invocations from previous primitives that map to the same xy window coordinates have completed. The sample uses this behavior to provide real-time order-independent transparency (OIT) in a typical 3D scene.
Transparency is a fundamental challenge in real-time rendering due to the difficulty of compositing an arbitrary number of transparent layers in the right order. This sample builds on the work originally detailed in the articles on adaptive-transparency and multi-layer-alpha-blending by Marco Salvi, Jefferson Montgomery, Karthik Vaidyanathan, and Aaron Lefohn. These articles show how transparency can closely approximate the ground-truth results obtained from A-buffer compositing, but can be between 5x and 40x faster by using various lossy compression techniques to compress the transparency data. This sample demonstrates an algorithm based on these compression algorithms, suitable for inclusion in a real-time application such as a game.
The Transparency Challenge
Figure 1 shows the sample rendering the test scene with conventional alpha blending:
Figure 1: OIT Sample
The geometry is rendered in a fixed order: the ground, followed by the objects inside the dome, then the dome, and finally the outside plants. Solid objects are drawn first and update the depth buffer, and then transparent objects are drawn in the same order without updating the depth buffer. The zoomed-in area highlights one of the resulting visual artifacts: the foliage is inside the dome but in front of several planes of glass. Unfortunately, the rendering order means that all of the glass planes, even the ones behind the foliage, are drawn on top. Having the transparent objects update the depth buffer creates a different set of problems. Traditionally, this could only be solved by breaking the objects into smaller pieces and sorting them back to front based on the camera viewpoint. Even then, the result is not perfect because objects can intersect, and the cost of rendering increases as more objects are sorted and drawn.
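To make the ordering problem concrete, here is a minimal CPU-side sketch in Python. It is an illustration only, not code from the sample. It applies the standard alpha blend to two hypothetical glass layers over a black background; the helper name blend_over and the layer colors are assumptions chosen for the example.

```python
def blend_over(dst, src_rgb, src_alpha):
    """Standard alpha blend: result = src * alpha + dst * (1 - alpha)."""
    return tuple(s * src_alpha + d * (1.0 - src_alpha)
                 for s, d in zip(src_rgb, dst))

background = (0.0, 0.0, 0.0)
red_far = ((1.0, 0.0, 0.0), 0.5)    # layer farther from the camera
blue_near = ((0.0, 0.0, 1.0), 0.5)  # layer closer to the camera

# Correct: the far layer is blended first, the near layer last.
correct = blend_over(blend_over(background, *red_far), *blue_near)

# Wrong: submission order happens to put the far layer on top.
wrong = blend_over(blend_over(background, *blue_near), *red_far)
```

The two results differ, (0.25, 0.0, 0.5) against (0.5, 0.0, 0.25), which is the same kind of error as the glass planes drawn over the foliage.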
Figure 2 and Figure 3 show the visual artifact zoomed in, with all the glass planes drawn before the foliage in Figure 2 and correctly sorted in Figure 3.
Figure 2: Unsorted
Figure 3: Sorted
Real-time Order-Independent Transparency
There have been several attempts to solve the compositing of arbitrarily ordered geometric primitives without the need to sort on the CPU or to break the geometry down into non-intersecting elements. These include depth peeling, which requires the geometry to be submitted multiple times, and A-buffer techniques, where all fragments that contribute to a given pixel are stored in a linked list, sorted, and then blended in the correct order. Despite the A-buffer's success in offline renderers, it has not been adopted by the real-time rendering community due to its unbounded memory requirements and generally low performance.
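The A-buffer idea described above can be sketched on the CPU as follows. This is a simplified Python model under stated assumptions, not the sample's code: the function name abuffer_resolve, the fragment tuple layout, and the convention that a larger depth is farther from the camera are all hypothetical.

```python
from collections import defaultdict

def abuffer_resolve(fragments, background):
    """fragments: iterable of (x, y, depth, rgb, alpha); larger depth = farther away."""
    per_pixel = defaultdict(list)   # unbounded storage: one entry per fragment
    for x, y, depth, rgb, alpha in fragments:
        per_pixel[(x, y)].append((depth, rgb, alpha))

    image = {}
    for pixel, frags in per_pixel.items():
        color = background
        # Sort far-to-near, then blend each fragment over the running color.
        for _depth, rgb, alpha in sorted(frags, key=lambda f: f[0], reverse=True):
            color = tuple(c * alpha + d * (1.0 - alpha)
                          for c, d in zip(rgb, color))
        image[pixel] = color
    return image
```

The per-pixel lists grow with every fragment submitted, which is the unbounded memory cost mentioned above, and every pixel pays for a sort before it can be blended.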
A New Approach
Rather than following the A-buffer approach of storing all color and depth data in per-pixel lists and then sorting and compositing them, this sample builds on the work of Marco Salvi and refactors the alpha-blending equation to avoid recursion and sorting, producing a "visibility function" (Figure 4):
Figure 4: Visibility Function
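As a rough illustration of the idea behind Figure 4, the Python sketch below evaluates an exact (uncompressed) visibility function and uses it to composite fragments without sorting. It assumes that the visibility at a depth is the product of (1 - alpha) over all nearer fragments and that a larger depth is farther away. The names vis and composite, and the explicit background term, are assumptions made for this example rather than the sample's shader code.

```python
def vis(z, fragments):
    """Fraction of light reaching depth z: product of (1 - alpha) over nearer fragments."""
    t = 1.0
    for depth, _rgb, alpha in fragments:
        if depth < z:
            t *= 1.0 - alpha
    return t

def composite(fragments, background):
    """Sum each fragment's color weighted by alpha and visibility; no sorting needed."""
    # The background is attenuated by the total transmittance of all fragments.
    color = [b * vis(float("inf"), fragments) for b in background]
    for depth, rgb, alpha in fragments:
        weight = alpha * vis(depth, fragments)
        for k in range(3):
            color[k] += rgb[k] * weight
    return tuple(color)
```

Because vis() only compares depths, shuffling the fragment list leaves the result unchanged, and for non-intersecting layers the sum matches conventional back-to-front blending.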
The number of steps in the visibility function corresponds to the number of nodes used to store visibility information per pixel during the render stage. As fragments are added, they are stored in the node structure until it is full. When a further fragment is inserted, the algorithm calculates which existing nodes can be merged to cause the smallest change in the visibility function while keeping the data set size fixed. The final stage is to evaluate the visibility function vis() and composite the fragments using the formula $final\_color = \sum_{i} c_i \, \alpha_i \, vis(z_i)$, where $c_i$, $\alpha_i$, and $z_i$ are the color, alpha, and depth of fragment $i$.
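The fixed-size node structure and the merge step can be sketched as follows. This is a hedged Python model, not the sample's shader. The node layout (a depth plus the transmittance behind it), the MAX_NODES budget, and the particular merge criterion (drop the node whose removal changes the area under the step function least) are assumptions modeled on the description above.

```python
MAX_NODES = 4  # hypothetical per-pixel budget

def insert_fragment(nodes, depth, transmittance, max_nodes=MAX_NODES):
    """nodes: list of [depth, trans] sorted by depth, where trans is the visibility
    just behind that depth. Returns a new list with at most max_nodes entries."""
    out = []
    inserted = False
    prev_trans = 1.0
    for d, t in nodes:
        if not inserted and depth < d:
            out.append([depth, prev_trans * transmittance])
            inserted = True
        # Nodes behind the new fragment are attenuated by it.
        out.append([d, t * transmittance if inserted else t])
        prev_trans = t
    if not inserted:
        out.append([depth, prev_trans * transmittance])

    if len(out) > max_nodes:
        # Removing node i by copying its transmittance into node i-1 lowers the
        # curve between depth[i-1] and depth[i]; pick the smallest such area.
        def area_change(i):
            return (out[i][0] - out[i - 1][0]) * (out[i - 1][1] - out[i][1])
        i = min(range(1, len(out)), key=area_change)
        out[i - 1][1] = out[i][1]
        del out[i]
    return out
```

In this sketch the transmittance of the last node is never altered by a merge, so the total amount of light reaching the background is preserved however many fragments are compressed into the fixed budget.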
The sample renders the scene in the following stages:
Because reading the Shader Storage Buffer Object in the resolve stage can be very expensive in terms of bandwidth, the sample uses the stencil buffer to mask the areas where transparent pixels would be blended into the frame buffer.
The gain from using the stencil buffer can be seen in the new cost of the resolve phase, which drops by 80%, although this is very dependent on the percentage of the screen that is covered by transparent geometry. The greater the percentage of the screen covered by transparent objects, the smaller the performance win.
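The effect of the stencil mask can be modeled on the CPU with the short Python sketch below. It is a simplified illustration and does not use the actual OpenGL stencil API; the function name resolve_with_stencil and its parameters are hypothetical. The transparent pass marks covered pixels, and the resolve pass touches the per-pixel buffer only where the mark is set.

```python
def resolve_with_stencil(width, height, transparent_pixels, resolve_pixel):
    """Run resolve_pixel only where the stencil marks transparent coverage.
    Returns the number of per-pixel buffer reads that were performed."""
    stencil = [[0] * width for _ in range(height)]
    for x, y in transparent_pixels:     # transparent pass: mark covered pixels
        stencil[y][x] = 1

    reads = 0
    for y in range(height):             # resolve pass: skip unmarked pixels
        for x in range(width):
            if stencil[y][x]:
                resolve_pixel(x, y)
                reads += 1
    return reads
```

The number of reads scales with the marked area rather than with the full screen, which is why the saving shrinks as transparent geometry covers more of the screen.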
void PSOIT_InsertFragment_NoSync( float surfaceDepth, vec4 surfaceColor )
{
    // Load AOIT data
    PSOIT_LoadDataUAV( ... );
    // Update AOIT data
    ... 1.0f - surfaceColor.w, // transmittance = 1 - alpha
    ...
    // Store AOIT data
    PSOIT_StoreDataUAV( ... );
}
Figure 7: GLSL Shader Storage Buffer Code
The algorithm above can be implemented on any device that supports Shader Storage Buffer Objects, but there is one very important flaw as currently described: it's possible to have multiple fragments in flight that map to the same window xy coordinates.
If multiple fragments run on the same xy coordinates at the same time, they will all read the same starting data in PSOIT_LoadDataUAV but will end up with different values that they try to store in PSOIT_StoreDataUAV, with the last to complete overwriting the results of the others. As a result, the outcome of the compression routine can vary between frames, which can be seen in the sample by disabling Pixel Sync. The user should see a subtle shimmer in areas where there are overlapping transparencies. The zoom feature was implemented to make this easier to see. The more fragments the GPU can execute in parallel, the greater the likelihood that the shimmer is visible.
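The lost update described above can be reproduced deterministically with a small Python model. This is only a sketch: the insert helper is a stand-in for the real per-pixel update, and the function names and the dictionary used as the buffer are assumptions for illustration.

```python
def insert(nodes, fragment):
    """Stand-in for the per-pixel update: returns the node list with the fragment added."""
    return sorted(nodes + [fragment])

def run_unsynchronized(buffer, pixel, frag_a, frag_b):
    # Both invocations load before either stores (they overlap in time).
    loaded_a = list(buffer[pixel])
    loaded_b = list(buffer[pixel])
    buffer[pixel] = insert(loaded_a, frag_a)   # A stores
    buffer[pixel] = insert(loaded_b, frag_b)   # B stores last and overwrites A's result

def run_serialized(buffer, pixel, frag_a, frag_b):
    # Each invocation completes its load/update/store before the next one starts.
    buffer[pixel] = insert(list(buffer[pixel]), frag_a)
    buffer[pixel] = insert(list(buffer[pixel]), frag_b)
```

In the unsynchronized case one of the two fragments is silently dropped, and which one survives depends on timing, which is why the artifact shows up as frame-to-frame shimmer.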
By default, the sample avoids this problem by using the new GLSL built-in function, beginFragmentShaderOrderingINTEL(), which can be used when the extension string GL_INTEL_fragment_shader_ordering is reported by the hardware. The beginFragmentShaderOrderingINTEL() function blocks fragment shader execution until completion of all shader invocations from previous primitives that map to the same window xy coordinates. All memory transactions from previous fragment shader invocations mapped to the same xy window coordinates are made visible to the current fragment shader invocation when this function returns. This allows the merging of the previous fragments to create the visibility function in a deterministic manner. The beginFragmentShaderOrderingINTEL() function has no effect on shader execution for fragments with non-overlapping window xy coordinates.
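The blocking behavior can be approximated on the CPU with threads. The Python sketch below is a model under stated assumptions, not how the hardware works: the class PixelOrdering, its begin and end methods, and the explicit per-pixel sequence numbers are hypothetical. Each thread plays one fragment shader invocation and waits until every earlier invocation on the same pixel has finished.

```python
import threading

class PixelOrdering:
    """Per-pixel turnstile: an invocation proceeds only after every earlier
    primitive that touched the same pixel has finished."""
    def __init__(self):
        self._cond = threading.Condition()
        self._done = {}   # pixel -> number of completed invocations

    def begin(self, pixel, sequence):
        with self._cond:
            self._cond.wait_for(lambda: self._done.get(pixel, 0) == sequence)

    def end(self, pixel):
        with self._cond:
            self._done[pixel] = self._done.get(pixel, 0) + 1
            self._cond.notify_all()

def shade(ordering, buffer, pixel, sequence, fragment):
    ordering.begin(pixel, sequence)              # blocks until earlier overlaps complete
    buffer[pixel] = buffer[pixel] + [fragment]   # load / update / store
    ordering.end(pixel)                          # the invocation itself ends here

def render(fragments):
    """fragments: list of (pixel, fragment) in primitive submission order."""
    ordering, buffer, counts, threads = PixelOrdering(), {}, {}, []
    for pixel, fragment in fragments:
        buffer.setdefault(pixel, [])
        seq = counts.get(pixel, 0)
        counts[pixel] = seq + 1
        threads.append(threading.Thread(
            target=shade, args=(ordering, buffer, pixel, seq, fragment)))
    for t in reversed(threads):   # start in the "wrong" order on purpose
        t.start()
    for t in threads:
        t.join()
    return buffer
```

Even though the threads are started in reverse, each pixel's list ends up in submission order, while invocations on different pixels never wait for one another.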
An example of how to call beginFragmentShaderOrderingINTEL is shown in Figure 8.
GLSL code example
layout(binding = 0, rgba8) uniform image2D image;
void main()
{
    ... compute output color
    if (color.w > 0) // potential non-uniform control flow
    {
        beginFragmentShaderOrderingINTEL();
        ... read/modify/write image // ordered access guaranteed
    }
    ... no ordering guarantees (as varying branch might not be taken)
    beginFragmentShaderOrderingINTEL();
    ... update image again // ordered access guaranteed
}
Figure 8: beginFragmentShaderOrderingINTEL
Note there is no explicit built-in function to signal the end of the region that should be ordered. Instead, the region that will be ordered logically extends to the end of fragment shader execution.
In the case of the OIT sample, it's simply added as shown in Figure 9:
void PSOIT_InsertFragment( float surfaceDepth, vec4 surfaceColor )
{
    // from now on serialize all UAV accesses (with respect to other fragments shaded in flight which map to the same pixel)
    beginFragmentShaderOrderingINTEL();
    PSOIT_InsertFragment_NoSync( surfaceDepth, surfaceColor );
}
Figure 9: Adding Fragment ordering to Shader Storage Buffer access
This is called from any fragment shader that potentially writes transparent fragments as shown in Figure 10.
out vec4 fragColor;
// -------------------------------------
void main( )
{
    vec4 result = vec4(0,0,0,1);
    // Alpha-related computation
    float alpha = ALPHA().x;
    result.a = alpha;
    vec3 normal = normalize(outNormal);
    // Specular-related computation
    vec3 eyeDirection = normalize(outWorldPosition - EyePosition.xyz);
    vec3 Reflection = reflect( eyeDirection, normal );
    float shadowAmount = 1.0;
    // Ambient-related computation
    vec3 ambient = AmbientColor.rgb * AMBIENT().rgb;
    result.xyz += ambient;
    vec3 lightDirection = -LightDirection.xyz;
    // Diffuse-related computation
    float nDotL = max( 0.0 ,dot( normal.xyz, lightDirection.xyz ) );
    vec3 diffuse = LightColor.rgb * nDotL * shadowAmount * DIFFUSE().rgb;
    result.xyz += diffuse;
    float rDotL = max(0.0,dot( Reflection.xyz, lightDirection.xyz ));
    vec3 specular = pow(rDotL, 8.0 ) * SPECULAR().rgb * LightColor.rgb;
    result.xyz += specular;
    fragColor = result;
    if(fragColor.a > 0.01)
    {
        PSOIT_InsertFragment( outPositionView.z, fragColor );
        fragColor = vec4(1.0,1.0,0.0,0.0);
    }
}
Figure 10: Typical material Fragment Shader
Only fragments that have an alpha value above a threshold are added to the Shader Storage Buffer Object, effectively culling any fragments that wouldn't end up contributing any meaningful data to the scene.
Building the Sample
Install the latest Android SDK and NDK:
Add the NDK and SDK to your path:
1. cd to OIT_2014\OIT_Android folder
2. First time only, you may need to initialize your project:
android update project --path . --target android-19
3. Build the NDK component:
4. Build the APK:
5. Install the APK:
adb install -r bin\NativeActivity-debug.apk or ant installd
6. Run it
The sample demonstrates how the research into adaptive order-independent transparency by Marco Salvi, Jefferson Montgomery, Karthik Vaidyanathan, and Aaron Lefohn, originally done on high-end discrete video cards using DirectX 11*, can be implemented in real time on an Android tablet using GLES 3.1 and fragment shader ordering. The algorithm runs inside a fixed memory footprint that can be varied based on the required visual fidelity. Optimizations such as the stencil buffer allow the technique to run on a wide range of hardware at acceptable performance, providing a practical solution to one of the most challenging problems in real-time rendering. The principles shown in the order-independent transparency sample can be applied to a variety of other algorithms that would normally create per-pixel linked lists, including volumetric shadowing techniques and post-processing anti-aliasing.