Optimisation and OpenGL Games Programming

Sections
Batch Transfer
Strips and Fans
Indexed Primitives
Vertex Data Formats
State Changes
Flush, Finish and Swap
Getting Data from OpenGL
Matrices

Here are some general tips on getting good performance from an OpenGL program.
Batch Transfer
In general, games will be faster if data is transferred to the driver in batches rather than through the immediate-mode model of one function call per vertex. There are a number of ways of doing this.
Data can be placed in a vertex array, which is a batch of vertices, UV coordinates, colours etc., all sharing the same settings for texture, material and type of primitive (e.g. triangle or line). On some hardware you may gain caching advantages from interleaving the vertex arrays, either by using glInterleavedArrays or by manually setting the stride in each enabled array (vertex, colour etc.). If you are using vertex arrays, you can use the compiled vertex array extension (EXT_compiled_vertex_array) to "lock" data. If, for example, you use the same spatial geometry twice with different sets of texture coordinates to achieve a multipass texture effect, the driver then only needs to transform the vertex data once.
Compiled vertex arrays are the main method used by Quake3 to transfer data to the driver. They may represent the fastest way to send information to a card which does not have onboard transformation and lighting hardware.
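As a rough illustration of the interleaving mentioned above, the following C sketch defines one possible interleaved vertex layout and computes the stride and offsets that would be passed to the array pointer functions. The Vertex struct and the helper function names are assumptions made for this example, and the OpenGL calls appear only in a comment because they need a current rendering context.

```c
#include <stddef.h>

/* Hypothetical interleaved vertex: position, texture coordinate, colour. */
typedef struct {
    float         pos[3];   /* x, y, z */
    float         uv[2];    /* texture coordinates */
    unsigned char rgba[4];  /* colour as unsigned bytes */
} Vertex;

/* Byte stride between consecutive vertices, as passed to the gl*Pointer calls. */
size_t vertex_stride(void) { return sizeof(Vertex); }

/* Byte offset of each attribute within one vertex. */
size_t pos_offset(void)    { return offsetof(Vertex, pos); }
size_t uv_offset(void)     { return offsetof(Vertex, uv); }
size_t colour_offset(void) { return offsetof(Vertex, rgba); }

/* With a current GL context and an array "verts" of Vertex, the arrays
 * could be set up roughly like this:
 *
 *   const unsigned char *base = (const unsigned char *)verts;
 *   glEnableClientState(GL_VERTEX_ARRAY);
 *   glEnableClientState(GL_TEXTURE_COORD_ARRAY);
 *   glEnableClientState(GL_COLOR_ARRAY);
 *   glVertexPointer(3, GL_FLOAT, sizeof(Vertex), base + offsetof(Vertex, pos));
 *   glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), base + offsetof(Vertex, uv));
 *   glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(Vertex), base + offsetof(Vertex, rgba));
 *   glDrawArrays(GL_TRIANGLES, 0, vertex_count);
 */
```

On common platforms with 4-byte floats this layout gives a 24-byte stride, with the UV data at byte 12 and the colour at byte 20. Whether interleaving actually helps depends on the hardware and driver.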
If you are sending data to a card with onboard transformation hardware, display lists may be superior to vertex arrays. Display lists enable you to specify that state other than the basic geometry is frozen, which can enable additional optimisations in a driver for a hardware transform and lighting card. It is probably advisable to consult the manufacturer of the hardware if you are considering sending data in this way, since setting some states inside a display list may make the list state too complex for the driver to optimise it effectively. Extensions for handling state and geometry objects explicitly, to make this sort of issue easier to deal with, are under discussion as of December 1999.
In general, the use of display lists enables considerable optimisations due to the extra assumptions the driver can make about the contents of the list. Not all drivers will necessarily take advantage of this, however.
One final point is that the use of display lists will cut the per frame function call overhead for transferring data to the driver to an absolute minimum, which can improve rendering speed.
Strips and Fans
Strips and fans reduce the amount of data that has to be transmitted to the driver and the hardware. However, since texture and material state cannot be changed within a strip or fan, the technique may be less valuable for real game data than it first seems.
Links to source code which can stripify an arbitrary mesh in a way appropriate for OpenGL rendering can be found in the External Links section.
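To make the data saving concrete, here is a small C sketch that compares vertex counts and expands a triangle strip back into independent triangles, swapping two vertices of every second triangle in the way GL_TRIANGLE_STRIP orders them. The function names are invented for this example, and the code does not stripify a mesh, which is a much harder problem.

```c
/* Vertices sent for n triangles as independent triangles. */
unsigned list_vertex_count(unsigned triangles)
{
    return 3 * triangles;
}

/* Vertices sent for n triangles as a single strip. */
unsigned strip_vertex_count(unsigned triangles)
{
    return triangles ? triangles + 2 : 0;
}

/* Expand a strip of n indices into (n - 2) independent triangles.
 * Every odd-numbered triangle has its first two vertices swapped so that
 * all triangles keep the same winding. "out" must hold 3 * (n - 2) indices.
 * Returns the number of triangles written. */
unsigned strip_to_triangles(const unsigned *strip, unsigned n, unsigned *out)
{
    unsigned i, t = 0;
    if (n < 3)
        return 0;
    for (i = 0; i + 2 < n; ++i) {
        if (i % 2 == 0) {
            out[t++] = strip[i];
            out[t++] = strip[i + 1];
        } else {
            out[t++] = strip[i + 1];
            out[t++] = strip[i];
        }
        out[t++] = strip[i + 2];
    }
    return t / 3;
}
```

For example, three triangles need nine vertices as a list but only five as a strip.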
Indexed Primitives
If a vertex array is used, the vertex data can be passed either directly (in which case glDrawArrays can be used to render), or indirectly via a table of indices to the actual data, for which glDrawElements is the rendering function. On a machine without geometry acceleration hardware, this second approach minimises unnecessary vertex transformations on the CPU. It is the path used by many current OpenGL games, including Quake3.
If you do use glDrawElements, it is important to recognise that there is only one index for all currently enabled arrays, on the grounds that separate sets of indices would involve so much indirection that the benefits are questionable. This means that the data must be preprocessed to identify "corners" which correspond to a unique set of all the types of data you want to send, e.g. spatial vertex plus UV coordinate plus colour. This may involve repeating some spatial vertices.
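The preprocessing step described above might look something like the following C sketch. It assumes the source mesh stores separate index lists for positions and UV coordinates (as many modelling formats do) and merges them into unique corners with a single index list. The Corner and CornerTable types, the fixed capacity and the function names are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_CORNERS 1024   /* illustrative fixed capacity */

/* One "corner": every attribute addressed by a single index. */
typedef struct {
    float pos[3];
    float uv[2];
} Corner;

typedef struct {
    Corner   corners[MAX_CORNERS];  /* unique corners, ready for the vertex arrays */
    unsigned count;
} CornerTable;

/* Return the index of an identical corner, appending it if it is new.
 * A linear search with bitwise comparison keeps the sketch short;
 * a hash table would scale better for large meshes. */
unsigned corner_index(CornerTable *t, const float *pos, const float *uv)
{
    unsigned i;
    for (i = 0; i < t->count; ++i) {
        if (memcmp(t->corners[i].pos, pos, sizeof t->corners[i].pos) == 0 &&
            memcmp(t->corners[i].uv, uv, sizeof t->corners[i].uv) == 0)
            return i;
    }
    assert(t->count < MAX_CORNERS);
    memcpy(t->corners[t->count].pos, pos, sizeof t->corners[0].pos);
    memcpy(t->corners[t->count].uv, uv, sizeof t->corners[0].uv);
    return t->count++;
}

/* Merge separately indexed positions (3 floats each) and UVs (2 floats each)
 * into one index list. pos_idx[i] and uv_idx[i] together describe the
 * i-th face corner; out_idx receives one unified index per face corner. */
void build_unified_indices(CornerTable *t,
                           const float *positions, const unsigned *pos_idx,
                           const float *uvs, const unsigned *uv_idx,
                           unsigned corner_count, unsigned *out_idx)
{
    unsigned i;
    for (i = 0; i < corner_count; ++i)
        out_idx[i] = corner_index(t, positions + 3 * pos_idx[i],
                                  uvs + 2 * uv_idx[i]);
}
```

A spatial vertex that is used with two different UV coordinates ends up as two corners, which is the repetition mentioned above. The resulting corner array and index list are what would be handed to the array pointer functions and glDrawElements.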
Finally, if you use glDrawElements it is worth sending the vertex data in strip order, but without explicitly stripifying it. Many current drivers perform stripification checks on data sent via glDrawElements before passing it to the actual rendering hardware, i.e. they look for strips using a fast algorithm which checks only the passed vertex indices. This is liable to work better and run faster if the data is already in strip order.
Vertex Data Formats
On most current drivers, the optimal format for data appears to be floats for vertices, normals and texture coordinates, and unsigned bytes for colour.
State Changes
An important issue for the optimisation of OpenGL programs is the minimisation of state changes. This means both sorting and partitioning data before it is sent to the driver (e.g. sorting so that polygons with the same texture are adjacent) and caching the current state in the application, so that redundant state change instructions are not dispatched. In general, drivers will not check for redundant changes themselves, to avoid imposing extra overhead on applications which already do the checks, so every significant state change may cause a flush of some part of the hardware pipeline.
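Both halves of this advice can be sketched in a few lines of C. The example below sorts a set of draw batches by texture and routes binds through a small application-side cache that drops redundant requests. The Batch and StateCache types and the function names are hypothetical, and the places where glBindTexture and glDrawElements would be called are marked with comments so that the sketch runs without a rendering context.

```c
#include <stdlib.h>

/* Hypothetical draw batch: a texture plus a range of indices to draw. */
typedef struct {
    unsigned texture;      /* texture object name */
    unsigned first_index;  /* where this batch's indices start */
    unsigned index_count;
} Batch;

/* qsort comparator: order batches by texture name. */
int compare_by_texture(const void *a, const void *b)
{
    unsigned ta = ((const Batch *)a)->texture;
    unsigned tb = ((const Batch *)b)->texture;
    return (ta > tb) - (ta < tb);
}

/* Application-side record of the currently bound texture. */
typedef struct {
    unsigned bound_texture;
    int      valid;         /* 0 until the first bind */
    unsigned binds_issued;  /* counts real binds, for illustration only */
} StateCache;

/* Bind only when the texture really changes. */
void cached_bind(StateCache *c, unsigned texture)
{
    if (c->valid && c->bound_texture == texture)
        return;                 /* redundant change: nothing is dispatched */
    c->bound_texture = texture;
    c->valid = 1;
    c->binds_issued++;          /* glBindTexture(GL_TEXTURE_2D, texture) would go here */
}

/* Sort the batches by texture, then submit them through the cache.
 * Returns the total number of binds issued so far. */
unsigned submit_batches(StateCache *c, Batch *batches, size_t n)
{
    size_t i;
    qsort(batches, n, sizeof *batches, compare_by_texture);
    for (i = 0; i < n; ++i) {
        cached_bind(c, batches[i].texture);
        /* glDrawElements(...) for batches[i] would go here */
    }
    return c->binds_issued;
}
```

Six batches that use three textures in mixed order cost six binds if submitted as they come, but only three once sorted. The same caching pattern can be applied to other state, such as blend modes or enabled arrays.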
Another useful technique for minimising state changes is to place textures that are likely to be used on the same object within a larger texture, a "texture page", and use UV coordinates to access the correct part of the texture for rendering. Note that this can cause problems with mipmaps, especially if GLU's gluBuild2DMipmaps function is used.
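The UV adjustment needed for a texture page is a simple scale and offset, as in the C sketch below. The PageRegion type and the page_uv function are assumptions for this example, and all sizes are measured in texels.

```c
/* Hypothetical description of one sub-image inside a texture page. */
typedef struct {
    int x, y;           /* origin of the sub-image within the page */
    int width, height;  /* size of the sub-image */
} PageRegion;

/* Map a UV in [0,1] relative to the sub-image to a UV relative to the page. */
void page_uv(const PageRegion *r, int page_w, int page_h,
             float u, float v, float *out_u, float *out_v)
{
    *out_u = ((float)r->x + u * (float)r->width)  / (float)page_w;
    *out_v = ((float)r->y + v * (float)r->height) / (float)page_h;
}
```

For a 64 x 64 sub-image at (64, 128) in a 256 x 256 page, the centre of the sub-image maps to (0.375, 0.625). One likely source of the mipmap problems noted above is that smaller mipmap levels average texels across sub-image boundaries, so neighbouring images can bleed into each other.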
Flush, Finish and Swap
glFinish tells the driver to complete all rendering operations that are currently in the pipeline before the call returns. On modern hardware, which may run highly in parallel with the CPU, this can degrade performance heavily. It should not be used unless you must have a finished back buffer before continuing, e.g. if you want to read back data using glReadPixels. Note that using glReadPixels is not especially compatible with maintaining interactive frame rates in any case.
glFlush tells the driver to perform a pipeline flush. In general this needs to be done before swapping the frame buffers. However, on Win32, SwapBuffers (or wglSwapBuffers on the 3dfxvgl.dll standalone driver) will perform a flush in any case. Most Win32 games therefore do not call glFlush explicitly, and may gain some performance as a result.
Getting Data from OpenGL
In general, reading back data from an OpenGL driver on a frame to frame basis can harm performance significantly, depending on the implementation. It is generally advisable to keep copies of all the data you are sending to the driver, and not attempt to use GL_FEEDBACK or GL_SELECT modes at interactive frame rates.
One particular issue is reading back modelview and projection matrices. This may be fast on some implementations, where all the geometry is done on the CPU, but potentially slow on systems with hardware transformation support, where it is conceivable that it could force a hardware pipeline flush.
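One way to avoid reading matrices back is to mirror every matrix call in an application-side copy. The C sketch below does this for glLoadIdentity and glTranslatef only. The ShadowMatrix type and function names are hypothetical, and a real program would issue the matching OpenGL call next to each mirror call and would need similar mirrors for rotation and scaling.

```c
/* Application-side copy of the modelview matrix, in OpenGL's column-major
 * layout: element (row, col) is stored at m[col * 4 + row]. */
typedef struct {
    float m[16];
} ShadowMatrix;

/* Mirror of glLoadIdentity. */
void shadow_load_identity(ShadowMatrix *s)
{
    int i;
    for (i = 0; i < 16; ++i)
        s->m[i] = (i % 5 == 0) ? 1.0f : 0.0f;  /* diagonal entries are 0, 5, 10, 15 */
}

/* Mirror of glTranslatef: post-multiply the current matrix by a translation.
 * Only the last column changes. */
void shadow_translate(ShadowMatrix *s, float x, float y, float z)
{
    int row;
    for (row = 0; row < 4; ++row)
        s->m[12 + row] += x * s->m[row] + y * s->m[4 + row] + z * s->m[8 + row];
}
```

The application can then consult the shadow copy whenever it needs the current transform, instead of calling glGetFloatv with GL_MODELVIEW_MATRIX.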
Matrices
Quality OpenGL implementations are likely to concatenate the modelview and projection matrices before applying them to geometry, to minimise the number of matrix operations required. However, if the application has requested a path that requires calculations in eye (view) space, e.g. because OpenGL lighting, fog or user clip planes are enabled, the matrices cannot be concatenated in this way unless the driver applies optimisations such as object space lighting.
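The saving from concatenation can be seen in a short C sketch: multiplying the projection and modelview matrices once allows each vertex to be transformed with one matrix operation instead of two. This only illustrates the arithmetic and is not a description of any particular driver. The function names are invented for the example.

```c
/* 4x4 matrices in OpenGL's column-major layout:
 * element (row, col) is stored at m[col * 4 + row]. */

/* out = a * b. "out" must not alias a or b. */
void mat4_mul(const float *a, const float *b, float *out)
{
    int row, col, k;
    for (col = 0; col < 4; ++col) {
        for (row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (k = 0; k < 4; ++k)
                sum += a[k * 4 + row] * b[col * 4 + k];
            out[col * 4 + row] = sum;
        }
    }
}

/* out = m * v for a 4-component column vector. "out" must not alias v. */
void mat4_transform(const float *m, const float *v, float *out)
{
    int row, k;
    for (row = 0; row < 4; ++row) {
        float sum = 0.0f;
        for (k = 0; k < 4; ++k)
            sum += m[k * 4 + row] * v[k];
        out[row] = sum;
    }
}
```

Computing projection * modelview once per object and applying the result to every vertex gives the same clip-space position as applying the two matrices in turn, but the intermediate eye-space position is never produced. That intermediate value is what eye-space lighting, fog and user clip planes need.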
Matrix operations in drivers are often optimised for the special case of identity, for which no calculations need to be done, as used when e.g. the game supplies its own coordinates in screen space. If you want to trigger this sort of optimisation, you should in general use glLoadIdentity rather than loading your own identity matrix using glLoadMatrix.
Other optimisations are also available, e.g. for matrices without scaling or for translation-only matrices. In general, to trigger these optimisations you should construct your matrix with the glRotate, glTranslate etc. functions rather than loading it via glLoadMatrix. For at least some applications, the difficulty of building matrices in this way may outweigh the advantages of possible driver-level optimisations.
Note also that if you try to optimise your code by putting your modelview transformations directly into the projection matrix, you are liable to experience serious problems if you try to use fog, lighting etc., for essentially the same reasons as those given above. Steve Baker has written a good summary of projection matrix issues.
