Double Or Nothing!

Published Date: 01 - Nov - 2005 | Last Updated: 01 - Nov - 2005
There was a time when graphics was a simple deal: get a card, push it into a free PCI (Peripheral Component Interconnect) slot, and merrily start installing games such as Half-Life! Then they complicated things by introducing the AGP (Accelerated Graphics Port) slot on motherboards. This meant we had to upgrade our motherboards. Before we knew it, there were differences between 1x, 2x, 4x and 8x AGP motherboards; and since the newer 4x and 8x AGP cards were not backward-compatible with the older 1x and 2x AGP slots, the few who insisted on the best upgraded again and again!

Next, they upgraded the old PCI slots with higher bandwidth capacities, and the new PCI Express (PCIe) port was born; we all upgraded again! Someone at nVidia reinvented the SLI technology, and SLI graphics was born; some of us upgraded, yet again.

Sounds like a history lesson? Well, all this happened in less than five years! So as far as the graphics market goes, you can be sure you have to spend in excess of Rs 50,000 every few months if you want to have current-generation hardware.

The latest line of graphics cards is the multi-GPU card. These cards have two GPUs on a single card, and essentially offer double the speed of a single-GPU card, in theory at least. These cards are a boon to those of us with motherboards less than six months old, and only one PCIe slot. Why? Because right now, if you want the best graphics hardware available, it has to be either nVidia-based SLI cards, ATi's CrossFire cards, or a multi-GPU card. There's also the possibility of having two multi-GPU cards in SLI or CrossFire mode, but let's not complicate things!

nVidia's SLI (Scalable Link Interface) technology needs a motherboard with two PCIe slots. This technology seems relatively simple: plug two nVidia SLI cards into the motherboard (the cards should be identical, so you need two 7800 GTs or two 6800 GTXs, and so on). Connect them (like master and slave) via a small daughter card that fits on top of the cards. Then just connect your monitor to the master, and you're set to go.

Unfortunately, it's not all that simple! The nVidia SLI solution uses technology that was actually created by its erstwhile rival, 3dfx. Back then, SLI stood for Scan Line Interlace. This old technology was used by two graphics cards working in tandem to render alternate lines of a display: one card rendered lines 1, 3, 5, etc., while the other rendered lines 2, 4, 6, etc.

Ever since nVidia announced its acquisition of 3dfx in 2000, the graphics race has been down to only nVidia and ATi. 3dfx's SLI technology resurfaced in 2004 as nVidia's Scalable Link Interface. The major difference was that instead of rendering lines, the new SLI interface would split the 'work' into two, giving each graphics card an equal amount.

This meant that both cards take the same time to finish rendering their allotted work, and this increases performance. Although nVidia had improved the technology earlier, they still had to wait for PCIe to enter the end-user segment: traditional PCI was way too slow for modern GPUs, and AGP didn't allow for dual graphics cards.

Split-frame rendering, as seen in FarCry. The horizontal green line distinguishes between the rendering done by each card.

How It Works
SLI works in two major ways. The first is Alternating Frame Rendering. Here, each card is given an alternate frame to render by the graphics driver. After the slave card renders its frames, it passes them to the master card, which then adds them alternately to its own output.

Most games benefit from this form of rendering, as it is easier for the driver to send alternate frames to different cards. Because each frame is rendered on a different card, there is less geometry data that needs to be passed on to each card by the driver. That is, since each card is rendering one frame of a whole scene, each card calculates geometry individually, thereby doubling the geometry output. Most benchmarking software, such as 3DMark, shows higher geometry scores because of this.
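To make the alternating idea concrete, here is a minimal Python sketch. It is only an illustration and not nVidia's driver logic: the function name render_afr and the idea of modelling frames as a simple list are our own assumptions.

```python
def render_afr(frames, num_cards=2):
    """Hand out frames to cards in turn, then interleave them again for display.

    Hypothetical model: card 0 plays the master, card 1 the slave.
    """
    # Each card gets every num_cards-th frame, starting at its own index.
    queues = [frames[card::num_cards] for card in range(num_cards)]

    # The master puts the finished frames back in their original order.
    merged = []
    for i in range(len(frames)):
        merged.append(queues[i % num_cards][i // num_cards])
    return queues, merged


queues, merged = render_afr(["A", "B", "C", "D", "E", "F"])
print(queues)  # frames each card rendered
print(merged)  # frames in display order
```

Each card only ever sees half the frames, which is why each one can crunch its geometry independently of the other.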

The second way SLI works is via Split Frame Rendering. This type of rendering divides each frame into work data, and evenly distributes this 'work' between the two graphics cards. This is the technique used to render games such as FarCry in SLI mode. Let's take an example of a single frame to better explain how split frame rendering works.

A common scene in FarCry is a jungle scene, where, close up, you have lots of plants and foliage, and there's the beach visible on the horizon. As usual, there's the sky above, and there's beautiful rendering of the water in the distance.

Here, the lighter element of the scene is the sky, which needs little or no geometry drawing, and is mainly a simple background. The water (FarCry's speciality), the trees and bushes, however, take a lot of detailing, and thus more geometry and pixel shading. What happens with split frame rendering is that the driver splits the frame into two parts that need roughly equal processing, and then passes them on to the cards. Thus, one card gets much more of the frame's area to render than the other: say, all of the sky and only a part of the trees. The other card gets much less of the frame's area: the (glorious looking) water and some of the foliage.
The driver calculates the approximate time it will take to render the scene, splits the data into two equal workloads, and then distributes it to the cards. This is always an approximate calculation, because no matter how good the driver is, it cannot predict the exact rendering times. The aim, however, is to get both cards to finish rendering their parts at the same time. The slave card's rendered data is sent to the master, which adds its own data and outputs the complete frame. As expected, this mode earns lower scores in 3DMark for geometry.
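The balancing act can be sketched in a few lines of Python. This is a rough illustration under our own assumptions, not the actual driver algorithm: we pretend the driver has an estimated cost for every row of the frame (the name row_costs and the numbers are made up), and we look for the horizontal cut that makes the two workloads as equal as possible.

```python
from itertools import accumulate


def balanced_split(row_costs):
    """Return the row index at which to cut the frame.

    Rows [0, split) go to one card and rows [split, end) to the other,
    chosen so that the two estimated workloads are as close as possible.
    """
    total = sum(row_costs)
    best_split, best_diff = 0, total
    # Running total of the cost of everything above the candidate cut.
    for i, top in enumerate(accumulate(row_costs)):
        diff = abs(top - (total - top))
        if diff < best_diff:
            best_split, best_diff = i + 1, diff
    return best_split


# Hypothetical frame: six cheap rows of sky, then four costly rows of foliage.
costs = [1] * 6 + [5] * 4
split = balanced_split(costs)
print(split, sum(costs[:split]), sum(costs[split:]))
```

With these made-up costs, the card that gets the sky ends up with seven of the ten rows, yet does slightly less work than the card rendering the remaining three rows of foliage.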

The only drawback of nVidia's SLI technology is that both cards have to be identical models to work. This means that if you have an older card, you cannot just buy a 7800 GTX and connect them both in SLI mode. Instead, you have to buy two 7800s and junk the existing card!

Let's take a look at ATi's multi-GPU offering.
After a long wait, ATi finally came up with a reply to nVidia's SLI technology. Christened CrossFire, ATi's multi-GPU solution won a lot of accolades from end users because of the flexibility of being able to add a brand new, current-generation ATi CrossFire card to a system with an older PCIe card, and have them work in tandem.

Remember, in the graphics card business, six months makes your card a relic, and the latest cards always cost in excess of Rs 30,000. Considering this, a user who spent Rs 30,000 on an ATi card less than six months ago does not have to junk that card and buy two new PCIe cards, unlike his or her nVidia counterparts.

Like SLI, the CrossFire technology also has different ways in which the two cards work together.

Supertiling: Here's where the major difference between SLI and CrossFire lies. The supertiling rendering method is supported only in Direct3D rendering (Direct3D is Microsoft's proprietary Application Programming Interface (API), which is used to program many current-generation games).

Basically, what happens in supertiling is that each frame is broken up into 32 x 32 pixel squares. Each graphics card renders every alternate square, one by one, until the frame is completely rendered. This sharing of the rendering of square 'tiles' that make up a screen is what gave the technique its name. This method is good for equally fast graphics cards, but an older generation card such as the X800 could slow down a newer and faster CrossFire card. So, the bottom line is that both cards will run at the speed of the slower card.
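Here is a small Python sketch of how tiles might be dealt out. We are assuming a checkerboard pattern, which is one natural reading of "every alternate square"; the function name card_for_pixel is ours, and the real hardware may assign tiles differently.

```python
TILE = 32  # tile edge in pixels, as described above


def card_for_pixel(x, y, tile=TILE):
    """Return 0 or 1: which card renders the tile containing pixel (x, y).

    Assumes a checkerboard layout, so neighbouring tiles go to different cards.
    """
    return ((x // tile) + (y // tile)) % 2


# Print the owner of each tile in the top-left 4 x 4 tile corner of a frame.
for ty in range(4):
    print(" ".join(str(card_for_pixel(tx * TILE, ty * TILE)) for tx in range(4)))
```

Since both cards get the same number of tiles, the frame is only finished when the slower card has worked through its share.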

Scissoring: Here the frame is divided exactly in half, and one card renders the top half while the other renders the bottom half. Unfortunately, this isn't as efficient, theoretically, as SLI's Split Frame Rendering. This is because a frame generally contains some parts with more geometry and shading, and others with much less. You might therefore end up with the card running as master finishing up its render process and then doing nothing while waiting for the other card to finish its computing!
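A quick Python sketch shows why a fixed half-and-half split can leave one card twiddling its thumbs. The per-row costs are invented for illustration, and the function name scissor_times is our own.

```python
def scissor_times(row_costs):
    """Split the frame exactly in half by rows and report each card's workload.

    Returns (top_cost, bottom_cost, idle), where idle is how long the
    faster-finishing card waits for the other one.
    """
    half = len(row_costs) // 2
    top = sum(row_costs[:half])
    bottom = sum(row_costs[half:])
    return top, bottom, abs(top - bottom)


# Hypothetical frame: six cheap rows of sky on top, four costly rows below.
top, bottom, idle = scissor_times([1] * 6 + [5] * 4)
print(top, bottom, idle)
```

In this made-up frame, the card with the top half finishes in a fraction of the time the other card needs, and spends the rest of the frame waiting.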

Alternate Frame Rendering: This is exactly the same as described for nVidia's SLI technology. The only thing to remember is that since alternate frames are rendered, the total graphics sub-system speed will depend solely on the slower card.
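The arithmetic behind this is easy to check with a tiny Python sketch. We assume strict alternation, that is, the cards take turns and frames must come out in order; the function name afr_fps and the timings are hypothetical.

```python
def afr_fps(ms_per_frame_a, ms_per_frame_b):
    """Frames per second for two cards rendering strictly alternate frames.

    Every pair of frames takes as long as the slower card needs for one frame,
    because the faster card has to wait its turn.
    """
    pair_time_ms = max(ms_per_frame_a, ms_per_frame_b)
    return 2 * 1000 / pair_time_ms


print(afr_fps(10, 10))  # two equally fast cards
print(afr_fps(10, 20))  # a fast card paired with one half as fast
```

Under these assumptions, pairing a 10 ms card with a 20 ms card gives the same frame rate the fast card would manage on its own.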

CrossFire cards are connected via an external dongle. This connector itself introduces some drawbacks into the system. For starters, the maximum resolution for 60 Hz monitor display is 1600 x 1200 pixels. This is ridiculous considering that hardcore gamers (the people who would spend the money to buy these cards) generally use high-end monitors capable of displaying 2048 x 1536 at 70 Hz or better.

One advantage that makes up for the lower resolutions in the latest ATi models is the enhanced anti-aliasing output that CrossFire offers. Basically, the two cards render frames with different anti-aliasing levels, and then the compositing chip puts them together using a new technique called Adaptive AA (Anti-Aliasing).

Adaptive AA is ATi's new method of improving graphic quality. Basically, it uses previous methods of multiple AA sampling, but intelligently.

What this means is that, so far, chips took multiple samples of almost everything with a texture to get the best quality output. This sampling is done by shifting the centre of a pixel around and then rendering it. When done multiple times, the outputs are blended together to give you a higher resolution texture.
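The "shift the centre and blend" idea can be sketched in Python as follows. This shows generic multi-sample blending only, not ATi's Adaptive AA; the sample offsets, the scene function and the name sample_pixel are all assumptions made for illustration.

```python
# Four sub-pixel positions inside one pixel (a hypothetical 2 x 2 pattern).
OFFSETS = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]


def scene(x, y):
    """Hypothetical scene: white (1.0) left of a vertical edge at x = 0.5, black elsewhere."""
    return 1.0 if x < 0.5 else 0.0


def sample_pixel(px, py, offsets=OFFSETS):
    """Render the pixel once per offset and blend the results by averaging."""
    samples = [scene(px + dx, py + dy) for dx, dy in offsets]
    return sum(samples) / len(samples)


print(sample_pixel(0, 0))  # pixel straddling the edge
print(sample_pixel(1, 0))  # pixel entirely in the black region
```

The pixel that straddles the edge comes out grey instead of pure white or pure black, which is what smooths out the jaggies. It also costs four renders instead of one, which is why doing this for every pixel is wasteful.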

This, however, is not very efficient, as not all textures in the display need to be high resolution to output great graphics. This is where Adaptive AA comes in. It intelligently decides which textures will improve the visual output and which won't. The textures that need multiple rendering passes (to make the output look better) are given priority, thus increasing the visual detail.

Extreme Hardware 
For gaming freaks, there's also the option of nVidia's Quadro and ATi's FireGL range of workstation cards, which, in all honesty, we have ignored through all our articles thus far. Why? For starters, they are way too expensive!
These cards (Quadro, FireGL, and even Creative's Oxygen line) are used for really high-end video rendering, in systems that cost lakhs of rupees. They support ridiculous levels of full screen anti-aliasing (32x) and offer multiple screen outputs. nVidia's cards can also be run in SLI mode, thus doubling their already superlative performance.
Overall, these are not mass consumer gaming cards. Not that the Rs 30,000 nVidia and ATi cards can be bought by everyone anyway, but they're cheaper than workstation solutions.
For gaming addicts with money to burn, there are many more performance solutions out there: the Gigabyte GA-8N SLi Quad Royal motherboard, for example. This board offers four, yes, four, PCIe graphics card slots, though you should know that nVidia's drivers do not currently support four GPUs.
You could also overkill the idea of GPUs and plug in four 7800 GTs, which makes a total of eight GPUs running on your system. It's kind of pointless, since all four PCIe graphics ports running together only offer 8x PCIe speeds per port. Thus your 7800 GTs would be running considerably under power, but with a glass cabinet, a little in-cabinet lighting and one monstrous power supply, you could be the talk of the gamer clan across the world!
Single Card Multi-GPU
Both nVidia and ATi offer dual GPUs on a single card. These multi-GPU cards offer better performance, albeit only theoretically. Our various graphics card tests have shown us that a dual-card PCIe setup performs better than a single dual-GPU card: two 6800 GTXs in SLI mode outperformed a 6800 Ultra dual-GPU card.

Card makers have started putting two GPUs on cards to get the best performance possible. It gets better when you use two of these dual-GPU cards in SLI or CrossFire mode.

There's a limitation in the motherboard's PCIe chipset architecture that reads two PCIe x16 slots as two PCIe x8 slots, because the chipsets only support about 20 PCIe lanes. Thus, even two graphics cards installed in two PCIe x16 slots, running in CrossFire or SLI mode, run at x8 PCIe speeds each.
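The sums are simple enough to sketch in Python. This is a simplified model built on our own assumptions: the chipset's roughly 20 lanes are shared equally between the graphics slots, and each slot then runs at the widest common link width that fits. The function name link_width is hypothetical, and real chipsets may divide their lanes differently.

```python
STANDARD_WIDTHS = (16, 8, 4, 2, 1)  # common PCIe link widths, widest first


def link_width(total_lanes, num_slots):
    """Return the widest standard link width each slot can get.

    Assumes the lanes are divided equally between the graphics slots.
    """
    share = total_lanes // num_slots
    for width in STANDARD_WIDTHS:
        if width <= share:
            return width
    return 0  # not even one lane per slot


print(link_width(20, 1))  # one graphics card
print(link_width(20, 2))  # two graphics cards
```

Two full x16 links would need 32 lanes, which is more than the chipset has to give, so each card has to settle for x8.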

A single dual-GPU card should be detected as a full-fledged x16 PCIe card, so it seems obvious that it should outperform the dual-card setup. In the real world, this just isn't true, and dual-card setups are faster, as of now! This is because motherboard chipset makers are still trying to catch up with the graphics card market in terms of speed and technology.

Currently, nVidia's flagship GPU is the 7800 GTX, while ATi launched the X1800 very recently. Both these chips will be used to create multi-GPU cards, which offer the ultimate in graphics technologies. We haven't had a chance to compare them in a direct shootout yet here at Digit, but we will, soon.

Here's a better picture of dual-GPUs on a single card, with the cooling fans removed

Basically, dual-GPU cards work exactly the same way as two cards in SLI or CrossFire mode, except that the card's memory is shared, and transfer between the chips is much faster, due to the fact that they're on the same PCB.

Dual-Core GPUs?
One would think that, logically, the advent of dual-core CPUs should now prompt graphics card manufacturers to shift towards dual-core GPUs as well. This doesn't seem to be the case, though.

nVidia, for one, has made it clear on several occasions that they currently have no interest in dual-core GPUs, simply because they are not needed. According to tests and benchmarks, two nVidia cards in SLI mode perform more floating point operations than the best Intel and AMD dual-core CPUs! So as far as current technologies go, it would be an understatement to say that GPU technology is way ahead of the CPU market!

Unfortunately, neither Intel nor AMD can match dual-GPU speeds, and essentially, the graphics sub-system is bottlenecked by the system architecture. This happens because it is the system CPU that sends the requests for graphics processing to the GPUs, and thus far, even the best CPUs cannot do this fast enough. Perhaps dual-core CPUs will minimise the loss of speed we have faced.

Of course, by the time dual-core CPUs become the norm, the champions of graphics (nVidia and ATi) will have fought each other into a new era of GPUs all over again, if history is anything to go by. Overall, it's a great fight that's going on between these giants and it's us, the end-users, who will benefit. So far, a multi-GPU system, whether two cards or one, is the way to go. If you're the type who needs the best visual feel possible, you need to get yourself one!

It looks like parallel processing for graphics is here to stay, whether in the form of two cards running in parallel, or two chips on a single card parallel-processing textures and frames. We're not naïve enough to make this a long-term prediction, simply because the graphics market is one of the most competitive, and technologies come and go almost on a monthly basis.

So we'll leave you with a blanket statement: "If there's a change in this trend, we'll be the first to let you know."

Team Digit
