A New DirectXion

By Team Digit | Updated on 01-Oct-2006


The tech industry is in a constant state of déjà vu. Things come and go in cycles and yet constantly manage to surprise. Not too long ago, the central processor that powers much of the industry went through a pivotal change: it began life as a piece of silicon designed to work on a limited set of data, and expanded its role to a more encompassing one. At first it could handle only one application at a time, it could address only a limited amount of memory at once, and you had to talk to it in its own “assembly” silicon-language. With time, the CPU’s silicon budget increased, and with it, the number of tasks it could handle. Then came time-sharing systems, in which you could run multiple applications at the same time; moreover, each application was fooled into thinking that the entire system’s memory was its own playground, thanks to virtualisation. Today, we take these changes for granted and their benefits seem mundane.

Interestingly, the graphics card of yore, that dedicated pixel-pushing slave of our entertainment age, is undergoing a very similar evolution. What was once a slab of silicon meant for very specific tasks is fast evolving into a vastly parallel silicon factory, populated by a transistor task force half-a-billion strong, and powering a multi-billion dollar gaming industry. Tellingly, the graphics chip is increasingly being referred to as the graphics processing unit (GPU) because, much like the CPU from which it derives the name, its work is becoming less specialised and more general.

Before we take a look at how this is happening, it is important to identify the three major forces that make up the industry. These are the independent hardware vendors (IHVs) who design and produce the graphics chipsets and the graphics cards, the independent software vendors (ISVs, or game developers) who design and produce the games, and last, the people whose job it is to talk to both the ISVs and the IHVs to create the software rules and directions that enable the graphics cards to talk to the games. This final piece of the puzzle forms the application programming interface (API). The three players are constantly involved in discussions on how to take the industry forward; these discussions are translated into the API, which then serves as the template upon which a graphics chipset is designed, and on the basis of which a game can offer its eye-candy.

Here, we take a look at one such API: Microsoft’s upcoming DirectX 10 (DX10), and more specifically the Direct3D element of DX10. Why was this step taken, and where will it lead us?

Why DirectX 10?
With the introduction of NVIDIA’s very first GeForce chipset, the graphics industry took its first tentative step towards the GPU. The path taken then leads today to a place where the graphics chip does more than push pixels. This resemblance to a CPU stems from the identified need for the graphics processor to gain independence from the CPU. With DX10, engineers hope to break the shackles that tie graphics processing to the CPU, not only speeding up rendering but also granting the GPU more power to push floating-point and integer data: power that will enable it to accelerate sundry tasks, from 3D graphics to physics to audio to DVD playback.

DirectX 10 is thus a very important step in taking the GPU forward into unknown regions. The API is primarily designed to interface with a more generic graphics processor. With the power of DX10 and the hardware supporting it, games of tomorrow will be richer, faster, more interactive, and more detailed. DX10 will take the gaming industry another step closer to the holy grail of truly realistic visuals.

Déjà Vu
One of the most unsatisfying aspects of a video game today is that every other foe you conquer bears a boring resemblance to the one you just dispatched, that every path you walk down seems somehow familiar, and that the ecosystem of the world you are eager to save is anaemically limited to the same flora and fauna. It almost makes you want to stop the butchery. Almost!

This sameness is an unfortunate side-effect of the current API structure. Today’s game needs to run to the API, which in turn runs to the driver, which then talks to the hardware, which finally complies and renders a tree. At each step, the API adds overhead: as much as 40 per cent of the entire cycle is taken up by it. Add more than one unique tree to a scene and the overhead adds up as well. Pretty soon, you would be staring at a slideshow of trees instead of a smoothly flowing game world. This is why game developers adopt the copy-and-paste method of rendering one or two different types of tree, and then use the same models throughout the land, more or less. This method is also adopted for enemies, blades of grass, and so on.

DX10 will reduce the API overhead by half. This would give the game more time to talk with the GPU and the rest of the system. This means that a game can now pack more content, because the system has more time for the game. So the first thing you should expect from a DX10 game is a more detailed world. Will we see an army comprising unique soldiers in a future chapter of the Total War series? Perhaps not, but a jungle will certainly sport more than one species of tree.
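The arithmetic behind that claim can be sketched in a few lines of Python. This is only a back-of-the-envelope model: the 40 per cent figure and the "reduce by half" figure come from the text above, while the function name and the assumption that everything outside API overhead is available to the game are illustrative simplifications, not anything Microsoft has published.

```python
def game_time_share(api_overhead_share: float) -> float:
    """Fraction of each frame left for the game once API overhead is paid.

    Hypothetical helper: assumes the frame splits cleanly into
    'API overhead' and 'everything else'.
    """
    if not 0.0 <= api_overhead_share < 1.0:
        raise ValueError("overhead share must be in [0, 1)")
    return 1.0 - api_overhead_share


# Figures taken from the article: up to 40 per cent overhead today,
# halved under DX10.
current_overhead = 0.40
dx10_overhead = current_overhead / 2

share_before = game_time_share(current_overhead)
share_after = game_time_share(dx10_overhead)

# Relative increase in time available to the game itself.
relative_gain = share_after / share_before

print(f"Time for the game today:      {share_before:.0%}")
print(f"Time for the game under DX10: {share_after:.0%}")
print(f"Relative gain:                {relative_gain:.2f}x")
```

Under these assumptions, halving the overhead gives the game roughly a third more time per frame, which is the headroom a developer could spend on extra unique models.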

Reduced API overhead will allow a game developer to add more detail to a title. Seen here, a screenshot from the game Crysis.

Doing A 360

Microsoft and ATI jointly worked on designing the Xbox 360, and are also two prominent members of the design team for DirectX 10. It is thus no surprise that DX10 borrows some design philosophies from the Xbox 360. The most important aspect of the next-generation API is the concept of the unified shader architecture.


A quick primer: a vertex shader is a small program that manipulates the vertices of the triangles that make up any 3D object. Similarly, pixel shaders are software routines that manipulate individual pixels in a 3D scene. So while vertex shaders might work together to render a car, pixel shaders would colour the car, add smoke effects, rainfall, and so forth.

Traditionally, a graphics chipset carries banks of both pixel and vertex shaders. Based on a scene, the API puts each of these banks to work. Herein lies the problem. Imagine a scene that has a character standing against the sky: the pixel and vertex shaders would be equally needed to render it. Now the character steps inside a car, increasing the workload on the vertex shaders while the pixel shaders enjoy some time off. Let’s say the car blows up in the next scene, with lots of explosive effects: smoke, fire, debris, and so on. The pixel shaders are now needed more than the vertex shaders, which can now sit idle.

What if a card has only four pixel shaders and 12 vertex shaders? What if a scene then requires processing beyond the 12 vertex shaders’ capacity? What if a scene is instead unusually high on the number of pixels pushed? All these scenarios would of course lead to game slowdown: almost all of us have seen an otherwise smoothly running game chug along at below 10 frames per second at the trigger of an explosion. Here’s why: graphics cards dedicate separate resources to pixel and vertex tasks; if the game exceeds the allocated capacity of either, everything comes down to a terrible frame rate. More damning is the wasted silicon: why are the pixel shaders twiddling their thumbs when the vertex shaders could use some help?

A unified approach to shaders hopes to solve these problems. In a graphics card with a unified shader bank, each unit can act as either a pixel shader or a vertex shader, so shader resources are allocated to the game according to its requirements. In theory, this should make games much faster.
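A toy model makes the difference concrete. The sketch below is not how any real GPU schedules work; it assumes work can be measured in abstract "units", that each shader processes one unit per tick, and that a unified bank balances load perfectly. The function names and workload numbers are made up for illustration; only the 4-pixel/12-vertex split comes from the hypothetical card discussed above.

```python
def frame_time_dedicated(vertex_work: float, pixel_work: float,
                         vertex_units: int, pixel_units: int) -> float:
    """Ticks per frame when each bank can only do its own kind of work.

    The frame is finished only when the slower bank finishes,
    so the busier bank sets the pace while the other idles.
    """
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def frame_time_unified(vertex_work: float, pixel_work: float,
                       total_units: int) -> float:
    """Ticks per frame when any unit can take any work (ideal balancing)."""
    return (vertex_work + pixel_work) / total_units


# The article's hypothetical card: 12 vertex shaders, 4 pixel shaders.
VERTEX_UNITS, PIXEL_UNITS = 12, 4
TOTAL_UNITS = VERTEX_UNITS + PIXEL_UNITS

# Illustrative workloads (abstract units, invented for this sketch).
scenes = {
    "geometry-heavy (car)": (120, 40),   # (vertex_work, pixel_work)
    "pixel-heavy (explosion)": (40, 120),
}

results = {}
for name, (v_work, p_work) in scenes.items():
    dedicated = frame_time_dedicated(v_work, p_work, VERTEX_UNITS, PIXEL_UNITS)
    unified = frame_time_unified(v_work, p_work, TOTAL_UNITS)
    results[name] = (dedicated, unified)
    print(f"{name}: dedicated={dedicated:.1f} ticks, unified={unified:.1f} ticks")
```

In this model the geometry-heavy scene happens to suit the fixed split, so both designs take the same time. The explosion, however, takes three times as long on the dedicated card, because four pixel shaders carry the whole load while twelve vertex shaders sit mostly idle.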

In practice, programmers of today will have to unlearn their habits of programming for contemporary architectures before they can truly make use of a unified architecture, a process that will certainly take time. Note that although DX10 has a unified shader architecture, not all DX10 cards need have unified shaders. In fact, at least for the foreseeable future, NVIDIA will stick to a contemporary architecture, whereas ATI will go the unified route. It will be interesting to see which of these two philosophies has greater merit.

Apart from a unified architecture, DX10 shares another similarity with the Xbox 360: like the 360, a GPU under DX10 can stream out data to the card’s memory and then call it back in for further use within the GPU, without having to write to an external memory or disk resource. This greatly speeds things up and will allow for some interesting effects, such as realistic shadows.

Free The Processor
DX10 adds the geometry shader to the API lexicon. Simply put, a geometry shader deals with, well, geometry: every 3D object is made up of triangles, and a geometry shader can act on a bunch of vertices simultaneously. Thus it deals with shapes. To better understand its function, let’s consider a practical example: within a GPU, a smooth curve is represented as a series of joined straight lines, and the more constituent lines there are, the smoother the curve. Breaking a curve down this way is known as tessellation, and it is a very CPU-intensive task (how many times have you seen a car in a game with a polygonal wheel instead of a smooth circular one?). The geometry shader can perform tessellation within the GPU itself, and since the GPU is much more powerful than a CPU when it comes to floating-point calculations, it can do so faster. Plus, the GPU need not send data outside. The geometry shader can similarly create new triangles around a point, or extrude the sides of a triangle to turn it into a volume, and so on.
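The polygonal-wheel effect is easy to quantify. The following Python sketch approximates a circle with straight segments and measures how far the flat edges stray from the true curve. It runs on the CPU and is purely illustrative: it is not geometry shader code, and the function names are invented for this example. For a regular polygon with n sides inscribed in a circle of radius r, the largest gap between edge and curve is r(1 - cos(π/n)).

```python
import math


def tessellate_circle(radius: float, segments: int) -> list[tuple[float, float]]:
    """Return the vertices of a regular polygon approximating a circle."""
    if segments < 3:
        raise ValueError("need at least 3 segments to enclose an area")
    step = 2 * math.pi / segments
    return [(radius * math.cos(i * step), radius * math.sin(i * step))
            for i in range(segments)]


def max_deviation(radius: float, segments: int) -> float:
    """Largest gap between the polygon's edges and the true circle.

    The gap is widest at the midpoint of each edge, which lies
    radius * cos(pi / segments) away from the centre.
    """
    return radius * (1 - math.cos(math.pi / segments))


# A wheel of radius 1: watch the error shrink as segments are added.
for n in (6, 12, 24, 48):
    vertices = tessellate_circle(1.0, n)
    print(f"{n:3d} segments -> {len(vertices)} vertices, "
          f"max gap {max_deviation(1.0, n):.4f}")
```

Each doubling of the segment count cuts the visible gap to roughly a quarter, but also doubles the number of vertices to generate, which is why moving this work onto the GPU is attractive.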

Under a traditional architecture, some scenes might favour vertex shaders while others favour pixel shaders, leading to inefficiency in load-sharing.

The important thing to take from this is that a geometry shader further allows a GPU to offload work from the CPU. By reducing the amount of data that a GPU needs to outsource, two things happen: work is done quicker, and the CPU is free to do more. The graphics card can have a bigger hand in some of the tasks that were done by the CPU, such as rigid-body collision detection, calculating obscured geometry, audio acceleration, detailed shadows, and so on. The CPU, meanwhile, can work on other elements that add up to make a game more believable, such as advanced AI calculations and advanced physics simulations.

Sundry Bits
All this discussion was technically limited to the Direct3D element of DirectX 10. What about the other bits that constitute DirectX? Can we expect major changes there? In a word, no!

One interesting feature does deserve mention though. With the next DirectInput, Microsoft has decided to blur the lines between its console offerings and the PC: you can take any Xbox 360 input device, be it a gamepad or a racing wheel, and use it under Windows. This is currently possible under Windows XP as well, and is thus not a feature exclusive to DX10. It is, however, an evolution of the DirectInput component. DirectSound does not see any major changes.

The thread arbiter under a unified architecture ensures that the unified shader is well-used by all shader operations.

Take It Outside
Traditionally, a graphics API has played in the kernel space of the OS. The result is the all-too-familiar system crash following an errant driver install. With Windows Vista, Microsoft is taking security to heart. Major parts of the API will now be part of the user space of the OS. This means that the bits and bytes that make up graphics will not play in the same space that is meant for the essential parts of the OS. Not only does this increase security, it also makes sharing of 3D resources easier.

As you might know, under Vista, each window is a 3D surface. Thus a graphics card under Vista is always called for, whether or not you’re playing a game. Moreover, with the advent of high-definition video content, more and more 2D acceleration tasks will be offloaded to the GPU. The per-pixel shading capabilities of a GPU will be utilised to present everything from fancy transparent windows to smooth 720p video playback. Similarly, animations such as windows flipping to the foreground or cascading behind each other will be accelerated by the GPU. DX10 will thus expose the hardware to more and more applications, much like how the CPU evolved.

Having said that, the user interface of Vista does not require DX10: the Aero Glass interface, as it’s called, runs under a variant of DirectX 9. This was done to ensure that the development of Vista was not slowed down by the development of DX10 (the two were developed concurrently). Elements of DX9 used in Vista are also seen under DX10, and in the future the entire interface might move to a DX10 environment, once DX10 reaches maturity and is deemed stable enough.

Windows Vista will thus ship with both DirectX 9.0 and 10. DirectX 10 is currently slated to be a feature exclusive to Windows Vista.

The Ever-Distant Horizon
It’s time for déjà vu yet again. We have been promised graphics nirvana before: virtually every piece of graphics hardware ever sold has been accompanied by shouts of “we have arrived.” Reality, however, walks a generation or two behind the latest and the greatest. With DX10, we should not expect things to change overnight. DX10 promises games that are more visually complex, offering a more compelling interaction and narrative. That promise will not be fulfilled with the release of Vista, but will see fruition a couple of years thence. And by then we will have forgotten all about DX10 and be talking about the next iteration.

By all indications, DX10 video cards will hit the half-a-billion transistor count, and will be so power-hungry that they will require dedicated power supplies. On the positive side, DX10 video cards will be virtually identical in the features that they offer. A DX10 card will have little to differentiate itself from a similar offering: this has been done largely at the behest of game developers who are tired of programming for esoteric feature sets (remember ATI’s TruForm?). This is a huge plus for gamers as well. For example, with DX9-class cards, one had to worry whether the card supported Shader Model 2.0 or 3.0; concerns of that nature need no longer send us scurrying to the nearest search engine, or to our friendly-neighbourhood 3D guru.

DirectX 10 might not bring about a real-time Toy Story to our monitors; it will certainly not bring about graphics comparable to the real world, but it is a vital step towards those horizons.
