Chapter 13. Occlusion Queries: Why Do More Work Than You Need To?

by Benjamin Lipchak


Complex scenes contain hundreds of objects and thousands upon thousands of polygons. Consider the room you’re in now, reading this book. Look at all the furniture, objects, and other people or pets, and think of the rendering power needed to accurately represent their complexity. Some readers will find themselves happily sitting on a crate near a computer in an empty studio apartment, but the rest will envision a significant rendering workload around them.

Now think of all the things you can’t see: objects hidden behind other objects, in drawers, or even in the next room. From most viewpoints, these objects are invisible to the viewer. If you rendered the scene, the objects would be drawn, but eventually something would draw over the top of them. Why bother doing all that work for nothing?

Enter occlusion queries. In this chapter, we describe a powerful new feature included in OpenGL 1.5 that can save a tremendous amount of vertex and pixel processing at the expense of a bit of extra nontextured fill rate. Often this trade-off is a very favorable one. We explore the use of occlusion detection and witness the dramatic increase in frame rates this technique affords.

The World Before Occlusion Queries

To show off the improved performance possible through the use of occlusion queries, we need an experimental control group. We’ll draw a scene without any fancy occlusion detection. The scene is contrived so that there are plenty of objects both visible and hidden at any given time.

First, we’ll draw the “main occluder.” An occluder is a large object in a scene that tends to occlude, or hide, other objects in the scene. An occluder is often low in detail, whereas the objects it occludes may be much higher in detail. Good examples are walls, floors, and ceilings. The main occluder in this scene is a grid made out of six walls, as illustrated in Figure 13.1. Listing 13.1 shows how the walls are actually just scaled cubes.

Figure 13.1. Our main occluder is a grid constructed out of six walls.


Listing 13.1. Main Occluder with Six Scaled and Translated Solid Cubes

Image

In each grid compartment, we’re going to put a highly tessellated textured sphere. These spheres are our “occludees,” objects possibly hidden by the occluder. We need the high vertex count and texturing to accentuate the rendering burden so that we can subsequently relieve that burden courtesy of occlusion queries.

Figure 13.2 shows the picture resulting from Listing 13.2. If you find this workload too heavy, feel free to reduce the tessellation in glutSolidSphere from the 100s to smaller numbers. Or if your OpenGL implementation is still hungry for more, go ahead and increase the tessellation, introduce more detailed textures, or consider using shaders as described in subsequent chapters.

Figure 13.2. Twenty-seven high-detail spheres will act as our occludees.


Listing 13.2. Drawing 27 Highly Tessellated Spheres in a Color Cube

Image

Listing 13.2 marks the completion of our picture. If we were happy with the rendering performance, we could end the chapter right here. But if the sphere tessellation or texture detail is cranked up high enough, frame rates will likely become unacceptable. So read on!

Bounding Boxes

The theory behind occlusion detection is that if an object’s bounding volume is not visible, neither is the object. A bounding volume is any volume that completely contains the object. The whole point of occlusion detection is to cheaply draw a simple bounding volume to find out whether you can avoid drawing the actual complex object. So the more complex our bounding volume is, the more it negates the purpose of the optimization we’re trying to create.

The simplest bounding volume is a box, also called a bounding box. Eight vertices, six faces. You can easily create a bounding box for any object just by scanning for its minimum and maximum coordinates on each of the x-, y-, and z-axes. For our spheres with a 50-unit radius, a cube-shaped bounding box with sides of length 100 units will fit perfectly.
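The minimum/maximum scan described above can be sketched in a few lines of C. This is only an illustration, not code from the chapter's sample program: the BoundingBox type, the ComputeBoundingBox name, and the packed x, y, z vertex layout are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: an axis-aligned box stored as two opposite corners. */
typedef struct {
    float min[3];
    float max[3];
} BoundingBox;

/* Scan `count` vertices (packed as x, y, z triples in `verts`) and return
   the smallest axis-aligned box that contains all of them. */
BoundingBox ComputeBoundingBox(const float *verts, size_t count)
{
    BoundingBox box;
    size_t i;
    int axis;

    assert(verts != NULL && count > 0);

    /* Start with a degenerate box located at the first vertex. */
    for (axis = 0; axis < 3; axis++)
        box.min[axis] = box.max[axis] = verts[axis];

    /* Grow the box until it includes every remaining vertex. */
    for (i = 1; i < count; i++) {
        for (axis = 0; axis < 3; axis++) {
            float v = verts[3 * i + axis];
            if (v < box.min[axis]) box.min[axis] = v;
            if (v > box.max[axis]) box.max[axis] = v;
        }
    }
    return box;
}
```

Fed the vertices of a sphere of radius 50 centered at the origin, a scan like this yields corners at (-50, -50, -50) and (50, 50, 50), which is the 100-unit cube mentioned above.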

Be aware of the trade-off when using such a simple and arbitrary bounding volume. The bounding volume may have very few vertices, but it will touch many more pixels than the original object would have touched. With a few additional strategically placed vertices, you can turn your bounding box into a more useful shape and significantly reduce the fill-rate overhead. Fortunately, the bounding box is drawn without any fancy texturing or shading, so its overall fill-rate cost will most often be less than the original object anyway. Figure 13.3 shows an example of how different bounding volume shapes affect pixel count and vertex count using a sphere as the occludee. The choice of bounding volume depends entirely on your object shape, since you are no doubt drawing objects more interesting than spheres.

Figure 13.3. Various bounding volumes around a sphere with their pros and cons.


When we draw our bounding volumes, we’re going to enable an occlusion query that will count the number of fragments that pass the depth test. Therefore, we don’t care how the bounding volumes look. In fact, we don’t even need to draw them to the screen at all. So we’ll shut off all the bells and whistles before rendering the bounding volume, including writes to the depth and color buffers:

glShadeModel(GL_FLAT);
// Texturing is already disabled
...
glDisable(GL_LIGHTING);
glDisable(GL_COLOR_MATERIAL);
glDisable(GL_NORMALIZE);
glDepthMask(GL_FALSE);
glColorMask(0, 0, 0, 0);

After all this talk about occlusion queries, we’re finally going to create some. But first, we need to come up with names for them. Here, we request 27 names, one for each sphere’s query, and we provide a pointer to the array of GLuint data where the new names should be stored:

// Generate occlusion query names
glGenQueries(27, queryIDs);

When we’re done with them, we delete the query objects, indicating that there are 27 names to be deleted in the provided GLuint array:

glDeleteQueries(27, queryIDs);

Occlusion query objects are not bound like other OpenGL objects, such as texture objects and buffer objects. Instead, they’re created by a call to glBeginQuery. This marks the beginning of our query. The query object has an internal counter that keeps track of the number of fragments that would make it to the framebuffer—if we hadn’t shut off the color buffer’s write mask. Beginning the query resets this counter to zero to start a fresh query.

Then we draw our bounding volume. The query object’s internal counter is incremented every time a fragment passes the depth test, and thus is not hidden by our main occluder, the grid which we’ve already drawn. For some algorithms, it’s useful to know exactly how many fragments were drawn, but for our purposes here, all we care about is whether the counter is zero or nonzero. This value corresponds to whether any part of the bounding volume is visible or whether all fragments were discarded by the depth test.

When we’re finished drawing the bounding volume, we mark the end of the query by calling glEndQuery. This tells OpenGL we’re done with this query and lets us continue with another query or ask for the result back. Because we’re drawing 27 spheres, we can improve the performance by using 27 different query objects. This way, we can queue up the drawing of all 27 bounding volumes without disrupting the pipeline by waiting to read back the query results in between.

Listing 13.3 illustrates the rendering of the bounding volumes, bracketed by the beginning and ending of the query. Then we proceed to possibly draw the actual spheres. Notice the code for visualizing the bounding volume whereby we leave the color buffer’s write mask enabled. This way, we can see and compare the different bounding volume shapes.

Listing 13.3. Beginning the Query, Drawing the Bounding Volume, Ending the Query, Then Moving on to Redraw the Actual Scene

Image

DrawSphere contains the magic where we decide whether to actually draw the sphere. Our query results are waiting for us inside the 27 query objects. Let’s find out which are hidden and which we have to draw.

Querying the Query Object

The moment of truth is here. The jury is back with its verdict. We want to draw as little as possible, so we’re hoping each and every one of our queries resulted in no fragments being touched. But if you think about this grid of spheres, you know that’s not going to happen.

No matter from what angle we’re looking at our grid, unless we zoom way in, there will always be at least 9 spheres in view. Worst case is you’ll see all the spheres on three faces of our grid: 19 spheres. Still, in that worst case, we save ourselves from drawing 8 spheres. That’s almost a 30% savings in rendering costs. Best case, we save 66%, skipping 18 spheres. If we zoom in on a single sphere, we could conceivably avoid drawing 26 spheres!
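The sphere counts above can be checked with a small brute-force count. The sketch below is an idealization, not part of the sample program: it assumes that only the outermost layer of spheres on each viewer-facing side of the grid is visible and that everything behind it is fully hidden. The function name and parameters are hypothetical.

```c
#include <assert.h>

/* Count the cells of an n x n x n grid that lie on at least one visible
   outer face. faceX, faceY, and faceZ are 1 if the grid face perpendicular
   to that axis is turned toward the viewer, and 0 otherwise. */
int CountVisibleCells(int n, int faceX, int faceY, int faceZ)
{
    int x, y, z;
    int visible = 0;

    assert(n > 0);

    for (x = 0; x < n; x++)
        for (y = 0; y < n; y++)
            for (z = 0; z < n; z++)
                if ((faceX && x == 0) ||
                    (faceY && y == 0) ||
                    (faceZ && z == 0))
                    visible++;

    return visible;
}
```

For the 3 x 3 x 3 grid, one visible face gives 9 spheres and three visible faces give 19, leaving 18 and 8 spheres skipped, respectively, out of 27.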

So how do you determine your luck? You simply query the query object. That sounds confusing, but this is a regular old query for OpenGL state. It just happens to be from something called a query object. In Listing 13.4, we call glGetQueryObjectiv to see whether the pass counter is zero, in which case we won’t draw the sphere.

Listing 13.4. Checking the Query Results and Drawing the Sphere Only If We Have To

// Called to draw sphere
void DrawSphere(GLint sphereNum)
{
    GLboolean occluded = GL_FALSE;

    if (occlusionDetection)
    {
        GLint passingSamples;

        // Check if this sphere would be occluded
        glGetQueryObjectiv(queryIDs[sphereNum], GL_QUERY_RESULT,
                           &passingSamples);
        if (passingSamples == 0)
            occluded = GL_TRUE;
    }

    if (!occluded)
    {
        glutSolidSphere(50.0f, 100, 100);
    }
}

That’s all there is to it. Each sphere’s query is checked in turn, and we decide whether to draw the sphere. We’ve included a mode in which we can disable the occlusion detection to see how badly our performance suffers. Depending on how many spheres are visible, you may see a boost of two times or more thanks to occlusion detection.

In addition to the query result, you can also query to find out whether the result is immediately available. If we hadn’t rendered the 27 bounding volumes back to back, and had instead asked for each result immediately, the bounding box rendering might still have been in the pipeline and the result might not have been ready yet. You can query GL_QUERY_RESULT_AVAILABLE to find out whether the result is ready. If it’s not, querying GL_QUERY_RESULT will stall until the result is available. So instead of stalling, you could find something useful for your application to do while you wait for the results to be ready. In our case, we planned ahead to do a bunch of work in between, making it very likely that our first query result would be ready by the time we finished our 27th bounding volume query.

Other state queries include the currently active query name (which query is in the middle of a glBeginQuery/glEndQuery, if any) and the number of bits in the implementation’s pass counter. An implementation is allowed to advertise a 0-bit counter, in which case occlusion queries are useless and shouldn’t be used. In Listing 13.5, we check for that case during an application’s initialization right after checking for extension availability courtesy of GLee.

Listing 13.5. Ensuring That Occlusion Queries Are Truly Supported

GLint queryCounterBits;

// Make sure required functionality is available!
if (!GLEE_VERSION_1_5 && !GLEE_ARB_occlusion_query)
{
    fprintf(stderr, "Neither OpenGL 1.5 nor GL_ARB_occlusion_query"
                    " extension is available!\n");

    Sleep(2000);  // Windows-only: give the user time to read the message
    exit(1);      // Nonzero exit status signals failure
}

// Make sure query counter bits are nonzero
glGetQueryiv(GL_SAMPLES_PASSED, GL_QUERY_COUNTER_BITS, &queryCounterBits);
if (queryCounterBits == 0)
{
    fprintf(stderr, "Occlusion queries not really supported!\n");
    fprintf(stderr, "Available query counter bits: 0\n");
    Sleep(2000);
    exit(1);
}

The only other query to be aware of is glIsQuery. This command just checks whether the specified name is the name of an existing query object, in which case it returns GL_TRUE. Otherwise, it returns GL_FALSE.

Best Practices

To maximize this optimization and avoid the most rendering, you should draw the occluders first. This includes any objects inexpensive enough to render that you will always draw them unconditionally. Then conditionally draw the remaining objects in the scene, the occludees, sorted from front to back if your application is designed in a way that permits it. This will increase the chance that objects further from the eye will be occluded, if not by an occluder, then perhaps by a previously drawn fellow occludee.
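One way to get the front-to-back ordering is to sort the occludees by their squared distance from the eye before issuing any draws. The following is a minimal sketch under stated assumptions: the Occludee struct, its fields, and the SortFrontToBack function are hypothetical names invented for illustration, and each object is represented only by a center point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-object record used only for this illustration. */
typedef struct {
    int   id;         /* which object this is */
    float center[3];  /* object center in world space */
    float distSq;     /* squared distance to the eye, set by the sort */
} Occludee;

/* qsort comparator: nearer objects (smaller distSq) come first. */
static int CompareByDistance(const void *a, const void *b)
{
    float da = ((const Occludee *)a)->distSq;
    float db = ((const Occludee *)b)->distSq;
    return (da > db) - (da < db);
}

/* Sort `objs` so the object nearest to `eye` is first. */
void SortFrontToBack(Occludee *objs, size_t count, const float eye[3])
{
    size_t i;

    assert(objs != NULL && eye != NULL);

    for (i = 0; i < count; i++) {
        float dx = objs[i].center[0] - eye[0];
        float dy = objs[i].center[1] - eye[1];
        float dz = objs[i].center[2] - eye[2];
        objs[i].distSq = dx * dx + dy * dy + dz * dz;
    }
    qsort(objs, count, sizeof(Occludee), CompareByDistance);
}
```

Squared distance is enough for ordering, so the square root can be skipped. Sorting by center point is only an approximation for large or elongated objects, but for occlusion purposes an approximate order is usually good enough.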

Requesting the result of a query will stall the pipeline if the corresponding bounding volume hasn’t finished rendering yet. We avoided this situation in our example by filling the pipeline with 27 of these bounding volumes, virtually guaranteeing that the result from the first one would be ready by the time we finished issuing the last one. On some implementations, however, the very act of reading back the result may cause the rendering pipeline to drain. This could effectively negate the performance boost you were hoping to achieve. For this reason, applications often wait until a frame of rendering is complete before querying the occlusion results. The cost of reading the result can be hidden in the time spent waiting for a vertical retrace in order to swap buffers, for example. But isn’t it too late at that point to make any rendering decisions? This is where you can start making trade-offs between image fidelity and performance: You can use the last frame’s occlusion results to inform the next frame’s rendering decisions. In the worst-case scenario, you fail to render an object that should have just barely become visible in this frame. But that will probably go unnoticed at 60 fps, and the very next frame will remedy the situation.

In the same spirit of aggressive optimization, you may choose to skip rendering an object not only when the query result comes back with a zero, but when it is arbitrarily close to zero. You decide how close. Again, you may miss out on a sliver of an object that really should have been rendered, peeking out from behind an occluder. But will anyone notice, or will they just be appreciative of the higher frame rates? It depends on how aggressive your threshold is. Beware. If you go too high, then the occludee will visibly pop in and out of the scene as you cross that threshold.
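As a rough sketch, the thresholded decision amounts to replacing the comparison against zero with a comparison against a tunable limit. The function name and the threshold parameter here are hypothetical, and any particular threshold value is something you would tune by eye for your own scene.

```c
#include <assert.h>

/* Treat an object as occluded when no more than `threshold` samples of
   its bounding volume passed the depth test. A threshold of 0 reproduces
   the exact test: occluded only when nothing passed. */
int IsEffectivelyOccluded(int passingSamples, int threshold)
{
    assert(passingSamples >= 0 && threshold >= 0);
    return passingSamples <= threshold;
}
```

With a threshold of 25, for example, an object whose bounding volume produced 12 passing samples would be skipped, while one that produced 26 would still be drawn.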

Summary

When rendering complex scenes, sometimes we waste hardware resources by rendering objects that will never be seen. We can try to avoid the extra work by testing whether an object will show up in the final image. By drawing a bounding box, or some other simple bounding volume, around the object, we can cheaply approximate the object in the scene. If occluders in the scene hide the bounding box, they would also hide the actual object. By wrapping the bounding box rendering with a query, we can count the number of pixels that would be hit. If the bounding box hits no pixels, we can guarantee that the original object would also not be drawn, so we can skip rendering it. Performance improvements can be dramatic, depending on the complexity of the objects in the scene and how often they are occluded.
