This Means War.

A look at Intel’s upcoming GPU rival, the Larrabee processor: we reveal the why, the what, and the so-what.

“We’re going to open a can of whoop ass.” Not a line you would expect from the CEO of a multi-billion-dollar company, but NVIDIA chief executive Jen-Hsun Huang had those fighting words for his company’s new rival, the king of processors: Intel.

Intel and NVIDIA have been exchanging words lately. Intel has the graphics card industry in its sights with its future processor plans. NVIDIA, now the only independent graphics house after the AMD-ATI merger, is not willing to take the threat lying down.
The stakes are high, battle lines have been drawn, and information warfare by presentation slide is being fought on an almost daily basis. But what is the battle all about? What brought these seemingly disparate entities, a CPU maker and a GPU maker, to a head?

To understand why, we must take a step back in time to 1965, when Intel co-founder Gordon E. Moore set down an observation with far-reaching consequences.

Double The Fun

In the April 1965 edition of Electronics magazine, Gordon Moore put forth the germ of what would eventually become Moore’s Law. Moore’s original statement read that “The complexity for minimum component costs has increased at a rate of roughly a factor of two per year…” He later elaborated this kernel of an observation into the statement we now know as the Law: the number of transistors on a chip will double about every two years.
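The arithmetic behind the Law is simple compounding: double every two years, and in a decade you have roughly 32 times as many transistors. A quick sketch (the starting figures below are illustrative placeholders, not data from Moore’s paper) shows how fast that adds up:

// Illustrative sketch of Moore's Law as simple compounding:
// transistor counts double roughly every two years.
#include <cmath>
#include <cstdio>

int main() {
    const double start_count = 2300.0;  // placeholder starting transistor count
    const int start_year     = 1971;    // placeholder starting year

    for (int year = start_year; year <= start_year + 20; year += 4) {
        double doublings = (year - start_year) / 2.0;           // one doubling per two years
        double count = start_count * std::pow(2.0, doublings);  // projected transistor count
        std::printf("%d: ~%.0f transistors\n", year, count);
    }
    return 0;
}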

Moore’s Law can be exploited in two ways, both of which stem from a common root: the cost of a transistor falls with time. The options are to (1) increase the clock speed of a processor, or (2) cram more components onto the surface of a chip.

Until the Pentium 4 processor, increasing processor speed was the option of choice. Moore’s Law turned into something of a self-fulfilling prophecy for the semiconductor industry: the Law formed the basis of the major silicon companies’ roadmaps, which in turn lent further credence to the Law, an electronic ouroboros eating its own tail. Faster and faster the processors ran, right up to the point where they hit the thermal wall and brought the entire race to a standstill. With the Pentium 4, it was pretty clear that the thermal output of a fast processor took away any positives the chip might bring to the table.

It was time for a rethink.

Picking Up The Red Pill

Option number two is where things are currently headed. Since processor speeds have reached something of a plateau, progress must now be made via component integration. Sometimes a processor becomes so fast that a component lying outside it becomes a hindrance, a bottleneck. Integration, in this case, is about solving a problem: is access to memory too slow? Bring the pathways inside the processor. This is why AMD chose to integrate the memory controller into the processor itself.

Integration can also be the option when there is no other path to take. As things stand, processors have hit the thermal wall, the clock-speed party is over, and the industry is staring at a very fuzzy road ahead.

The fuzziness is reduced a bit when one follows the money.

In 2007, a PowerPoint presentation made the rounds of the Internet. It had been shared with university students by Intel’s Douglas Carmean, chief architect of the company’s Visual Computing Group (VCG), which is in charge of GPU development at Intel. One of the slides in the presentation read, quite plainly:
• CPU profit margins are decreasing
• GPU profit margins are increasing

And this heralded “another significant transformation.”

The Road Ahead

The “significant transformation” was already in the making. It began with the introduction of the GPU, or more precisely, with NVIDIA’s NV20 chip in 2001, which brought in the concept of GPU programmability. Today, the floating-point power of a consumer GPU is such that it has tremendous potential to do much more than just churn out pixels.

This thinking was the basis for the so-called GPGPU, or General Purpose GPU. NVIDIA further underlined the “GP” part by introducing the ability to program its graphics cards in the C programming language, via CUDA (Compute Unified Device Architecture), its parallel architecture and programming model.

CUDA gives developers access to the GPU’s hardware at a low level, allowing them to use its floating-point might for heavy calculations: simulating smoke and fluids, or accelerating applications in computational biology.
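To give a flavour of the model, consider the plain C++ sketch below (not actual CUDA code; the function and figures are made up for illustration). CUDA essentially takes a per-element loop body like this one and runs it as a “kernel”, with each iteration handled by its own GPU thread:

// Plain C++ sketch of the data-parallel pattern CUDA is built for.
// Under CUDA, the loop body becomes a GPU kernel and each element
// is processed by its own thread.
#include <cstdio>
#include <vector>

// Hypothetical per-element update, e.g. one step of a toy smoke/fluid field.
void advect(std::vector<float>& density, const std::vector<float>& velocity, float dt) {
    for (std::size_t i = 0; i < density.size(); ++i) {
        // Each element is independent of the others: exactly the kind of
        // floating-point work that maps well onto thousands of GPU threads.
        density[i] += velocity[i] * dt;
    }
}

int main() {
    std::vector<float> density(1 << 20, 1.0f);
    std::vector<float> velocity(1 << 20, 0.5f);
    advect(density, velocity, 0.016f);
    std::printf("density[0] = %f\n", density[0]);  // 1.0 + 0.5 * 0.016 = 1.008
    return 0;
}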

Essentially, what CUDA did was bring the GPU a bit closer to the CPU. And Intel noticed this breach of territory.

If the GPU could become more like a CPU, what stops the CPU from taking on GPU workloads? This is Intel’s point of thrust into the lucrative GPU market. By creating a processor that can handle both floating-point and scalar tasks, Intel hopes to capture the entirety of the processor market, from the low-end chips used in cellphones to the supercomputers churning through physics simulations.

Intel is thus looking at a brave new world run on Intel platforms, or more exactly, run on Intel’s “systems-on-a-chip”.

A while back, news broke of an initiative inside Intel’s labs called the Tera-scale project. The goal of the project is to explore the possibility of creating ‘many-core’ processors, using current technologies as the system interconnects, to realize future ideas such as a database server on a chip. One possibility, for example, is that of a datacenter located entirely on a single 80-core microprocessor, each core connected to the others via optical fiber links and talking to them using TCP/IP.

This is an extreme example; more practical implementations, though, are already on the roadmap, and one of them is the root cause of the GPU-versus-CPU battle.

The Larrabee Solution

Larrabee is the code-name for a many-core architecture due out as engineering samples by the end of 2008, and as a consumer product in 2009. A many-core processor is one with more than eight cores; multi-core refers to processors with two to eight cores. Little concrete information is known about Larrabee, and what little Intel revealed through a presentation was later retracted once the information went public. The details are already out there, however, and even if they are subject to change, they paint quite an interesting picture of the Larrabee architecture.

The first thing to understand is that ‘Larrabee’ is the codename for an umbrella of products. Much like Core 2 Duo and Core 2 Extreme fall under the same multi-core banner in Intel’s marketing, Larrabee will consist of a series of implementations sharing a common architecture. Differences in the details will make one Larrabee product suitable for notebooks, another for a desktop graphics card, and yet another for use in a transaction server.

The ‘many-core’ part of the new architecture will be realized with 16 to 24 cores, each running between 1.7 and 2.5 GHz. The number of cores on a chip, as well as its clock speed, will depend on the product the architecture ends up in: integrated motherboard graphics, a high-end GPU, a transaction co-processor, and so on. Whatever the implementation, its heat signature will run to at least 150 W.

Each core will be capable of processing four simultaneous threads of execution. Each will also come with an L1 cache and a floating-point unit (FPU), and will implement a subset of the x86 instruction set. That x86 heritage is an important point of differentiation between Larrabee and current GPUs: a graphics programmer today relies on special GPU instruction code, hidden behind the more user-friendly APIs of the OpenGL or DirectX layers. While Larrabee will retain OpenGL and DirectX compatibility, it will also introduce a specific instruction set called Advanced Vector Extensions (AVX), which will help leverage the FPU potential of the architecture.

The AVX set is to be seen in a similar light to SSE in current x86 processors—a “256-bit vector extension to SSE, for floating-point intensive applications”.
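To get a sense of what a 256-bit vector extension buys a programmer, here is a minimal sketch using AVX intrinsics (an illustration of operating on eight single-precision floats at once; the values are arbitrary, and this is not presented as Larrabee’s own instruction stream):

// Minimal AVX sketch: one vector instruction adds eight floats at once.
// Build with an AVX-capable compiler, e.g. g++ -mavx avx_demo.cpp
#include <immintrin.h>
#include <cstdio>

int main() {
    alignas(32) float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    alignas(32) float b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    alignas(32) float c[8];

    __m256 va = _mm256_load_ps(a);      // load eight floats into a 256-bit register
    __m256 vb = _mm256_load_ps(b);
    __m256 vc = _mm256_add_ps(va, vb);  // eight scalar additions in one instruction
    _mm256_store_ps(c, vc);

    for (int i = 0; i < 8; ++i) std::printf("%.0f ", c[i]);  // 11 22 33 44 55 66 77 88
    std::printf("\n");
    return 0;
}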

This software soup of x86 instructions and extensions is what Intel is betting on.

They hope that a more familiar instruction set will encourage programmers to make better use of the floating-point capabilities of the Larrabee architecture. This line of thinking could be construed as a bit naïve, considering that very few games have made use of the Hyper-Threading capabilities of Intel processors, and even then only recently.

Back to the Larrabee architecture: the many-core processor will include a large L2 cache, shared by all the cores, and each core will also be able to lock a part of the L2 for its own use. Larrabee will also get an on-die memory controller (the memory controller is moving on-die across a variety of Intel offerings, similar to what AMD already has). The final piece of the Larrabee die will determine the nature of the final product. This final piece is a fixed-function unit dedicated to a single task: a Larrabee GPU will have a texture sampler there (texture sampling is essentially one stage of the DirectX/OpenGL pipeline), a Larrabee server co-processor will have a hardware encryption unit, and so on.
All these components will be connected via a ring bus, similar to the one used by IBM’s Cell processor.

The End Of An Era?

“Graphics that we have all come to know and love today…It’s coming to an end”

That is Intel’s senior vice president Pat Gelsinger, who would like to herald a new era: “Our multi-decade old 3D graphics rendering architecture that’s based on a rasterization approach is no longer scalable and suitable for the demands of the future”.

That ‘fixed-function unit’ nestled within a Larrabee processor will eventually take the form of an execution unit dedicated to the task of raytracing. This small detail is the source of the brewing war between Intel and the graphics vendors, as well as a point of disbelief within the game developer community.

Raytracing is not just another way of doing things; it is a completely different way of doing things, a 180-degree turn for an industry that has long accepted rasterization as the approach that makes the most sense.

So What Is Raytracing?
Imagine this scene: you are holding and reading this magazine. Now think of your eye as the “camera” in a 3D game; the camera is the object doing the viewing, while the magazine is the scene being viewed. Raytracing renders an object by simulating the physics of how rays of light propagate. The algorithm first shoots an “eye ray”, or “primary ray”, from the viewpoint of the camera and determines which object it hits first along its path. In this example, that object is the magazine. At this point, the magazine’s material determines the behaviour of the primary ray. If the material is transparent, the ray passes through. If it is a mirrored surface, the ray is reflected (the angle of reflection and so on calculated by the raytracing algorithm). If the material is slightly glossy magazine paper, it reflects some light and absorbs some, and so on. The raytracing algorithm also determines whether the object it hits is in shadow. To do this, it shoots a ray from the magazine towards the source of light: if this ray reaches the source, the magazine can “see” the light source and is therefore lit; if the ray is obstructed, the magazine is in shadow.
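A bare-bones version of that primary-ray and shadow-ray logic might look like the C++ sketch below (a toy scene with one sphere standing in for the magazine, another as a potential blocker, and a single point light; the values are made up, and a real raytracer does vastly more):

// Toy raytracing sketch: shoot a primary ray at a sphere, then a shadow
// ray from the hit point towards the light, as described in the text.
#include <cmath>
#include <cstdio>

struct Vec { double x, y, z; };
Vec operator-(Vec a, Vec b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec operator+(Vec a, Vec b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec operator*(Vec a, double s) { return {a.x * s, a.y * s, a.z * s}; }
double dot(Vec a, Vec b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec normalize(Vec v) { double len = std::sqrt(dot(v, v)); return v * (1.0 / len); }

struct Sphere { Vec centre; double radius; };

// Distance along the ray to the nearest hit, or -1 if the ray misses.
double intersect(Vec origin, Vec dir, Sphere s) {
    Vec oc = origin - s.centre;
    double b = 2.0 * dot(oc, dir);
    double c = dot(oc, oc) - s.radius * s.radius;
    double disc = b * b - 4.0 * c;
    if (disc < 0.0) return -1.0;
    double t = (-b - std::sqrt(disc)) / 2.0;
    return (t > 1e-6) ? t : -1.0;
}

int main() {
    Sphere magazine{{0, 0, 5}, 1.0};   // the object being viewed
    Sphere blocker{{0, 2, 5}, 0.5};    // something that might cast a shadow
    Vec eye{0, 0, 0};                  // the "camera"
    Vec light{0, 5, 5};                // a point light source

    // 1. Shoot the primary (eye) ray into the scene.
    Vec primary = normalize(Vec{0, 0, 1});
    double t = intersect(eye, primary, magazine);
    if (t < 0.0) { std::printf("Primary ray missed the scene\n"); return 0; }

    // 2. Shoot a shadow ray from the hit point towards the light.
    Vec hit = eye + primary * t;
    Vec toLight = normalize(light - hit);
    bool shadowed = intersect(hit, toLight, blocker) > 0.0;

    std::printf("Hit at t = %.2f, point is %s\n", t, shadowed ? "in shadow" : "lit");
    return 0;
}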

This is, of course, an oversimplification of the process. The take-away is that raytracing takes a holistic look at the entire scene to be rendered, whereas a rasterization approach breaks the scene down into its component triangles and processes them pixel by pixel. Because of how it works, the cost of raster graphics processing grows linearly with the number of pixels to be processed; the cost of raytracing grows linearly with the number of rays shot.

Intel’s research into raytracing showed some interesting behaviour. If the scene being rendered is kept static, there is a point at which a software-assisted raytracing technique (software-assisted here meaning CPU-driven) is almost as fast as a hardware-accelerated raster technique. Furthermore, the research found that when more processors are thrown at raytracing, performance scales almost in proportion to their number. Multiple processors became multi-core processors, and will eventually become many-core processors.

Intel thus envisions rendering graphics with raytracing instead of a raster approach. Consider the architecture of the Larrabee processor: 16 to 24 cores running at 1.7 GHz or more, a dedicated unit to accelerate some of the raytracing calculations, and an extended vector instruction set to help things along. It is easy to see why Intel believes raytracing is the future.

A Muddy Path

While Intel would like the entire industry to embrace raytracing as the future, the ground realities are different. The largest hurdle is that game developers do not see anything wrong with the contemporary rasterization approach to graphics. Neither do today’s GPU vendors. The current ecosystem has embraced raster graphics and has evolved to accommodate its idiosyncrasies. The raytrace-only path proposed by Intel has few backers outside the company.

Cevat Yerli, CEO of Crytek (Far Cry and Crysis), doesn’t think there is a “compelling example for using pure classical ray tracing”. He opines that “…there are a variety of graphics problems which would suit a hybrid solution of rasterization and raytracing, and [that] most likely is the way to go.” His view is reflected by Intel itself: the initial builds of Larrabee will be derived from Intel’s G45 integrated graphics. This indicates that the early Larrabee variants will not be aimed at high-end graphics processing, but will be more of an integrated-graphics replacement, perhaps a notebook part.

The next step would then be a gentle raytracing nudge: eye candy that is either impossible or prohibitively costly using traditional rasterization methods, such as detailed reflections, refractions, and shadows. Intel’s goal could then be to slowly get the industry hooked on a diet of raytracing extensions, and eventually to replace raster hardware accelerators with its own breed of processors.

John Carmack of id Software echoes this sentiment: “No matter who does what, the next generation is going to be really good at rasterization—that is a foregone conclusion. Intel is spending lots of effort to make sure Larrabee is a competitive rasterizer.”
Carmack also brings up an interesting point about ‘show, don’t tell’ when he says that “while everybody thinks [raytracing] is going to be great, I have to reiterate that nobody has actually shown exactly how it’s going to be great. The best way to evangelize your technology is to show somebody something… to kind of eat your own dog food, in terms of working with everything.” Intel recently made two important purchases: the Havok physics engine and the Project Offset game engine. Perhaps these purchases are meant to do exactly what Carmack proposes: to show the development community why raytracing is the way ahead.

Meanwhile, Intel faces off against an entire industry. Who does the actual ass-whooping remains to be seen.

Ahmed Shaikh
Digit.in