The Race For Speed

By Team Digit | Published: 01 Sep 2005 | Last updated: 01 Sep 2005

The only constant is change. In the computer world, the only constant is the increase in speeds (read: gigahertz). Ever since the public got sold on the idea of faster processors, Intel, AMD and the smaller processor manufacturers have been making their millions off just one idea: that people want faster computers. Of course, many things go into the making of a faster computer, but a faster processor is perhaps the most important. So how do AMD and Intel make processors faster?

In what follows, we dwell on the why, the what else and the what if of dual-core processors more than on the processors themselves. This is because the actual concept is straightforward enough; it's the peripheral issues that are more pressing.

So, how does one make a faster processor? The first and most obvious answer is to change the design of the processor so that it is intrinsically faster. To illustrate, pipeline implementations (see box Instructions And Pipelines) have a huge impact on processor performance, and breakthroughs that improve processor design do come about once in a while. But some fundamental barriers are being reached, and the need is for a quick-fix solution to the 'gigahertz problem': faster processors, right now!

The second thing one could do is increase clock speeds. But that's another fundamental barrier being reached: at the rate at which chips have been shrinking, manufacturers can't seem to achieve higher clock speeds without overheating the chip. Mandatory water cooling? That's a possibility… But even then, the heat we're talking about comes from the power supplied to the chip, and it takes a lot of power to get a processor working at high clock speeds. A processor with that much current flowing through the die is also prone to electrical noise, which causes interference (crosstalk) between adjacent 'wires', or pathways, on the processor. That interference could, in turn, corrupt data.

Another solution lies in keeping the processor busy for more of the time. This is achieved via HyperThreading, for example (see box HyperThreading).

One more possible solution is increasing the L1 and L2 caches (see box Processor Caches for more). This is a definite possibility, and is being done, but cache memory is expensive and increases the total cost of the chip. Both Intel and AMD have been increasing the size of their L2 caches. Intel has even added an L3 cache to the 'Extreme' Pentium versions, something previously unheard of in a desktop processor.

However, it must be borne in mind that increasing the cache is a diminishing-returns situation: beyond a certain point, increasing the cache just doesn't help.

As for increasing transistor count, processors are already cramped. While it may be possible to cram more in the future, it doesn't seem feasible right now.

The most elegant solution comes about when you think of the way computers are used these days: one person running several applications at the same time. A quick-fix solution would seem to be: why not assign two processors to the same set of jobs? In other words, if there are four tasks being performed, how about two processors for the job, each of them doing two tasks, both working at the same time?

In principle, that's what dual-core processors are about: they're essentially two processor cores on one die. They're not two physically separate processors; the cores sit on the same die. In the first desktop dual-cores, such as the Athlon 64 X2 and the Pentium D, each core still has its own L1 and L2 caches (see box Processor Caches). The two cores also have separate pipelines (see box Instructions And Pipelines), so two streams of instructions can be processed at the same time. (A die is an integrated circuit chip as diced or cut from a finished wafer; a wafer itself is a thin slice of semi-conducting material, such as a silicon crystal, upon which the microcircuits are constructed.)

The two cores do share some hardware, such as the bus to the rest of the system and, on AMD's chips, the memory controller, whereas in a dual-processor system each processor is a separate chip in its own socket.

Remember that in general, a dual-core processor won't be as fast as a dual-processor configuration. There could be exceptions to this, depending on the software that runs on these newer systems, so we'll have to wait and watch to see if dual-core beats dual-processor.

So then, why two cores on one die, when you could have a dual-processor rig? The answer is cost. A dual-core processor works out cheaper in terms of production cost than two processors taken together. This is not to mention that the motherboards required for a dual-processor configuration are more expensive too, because of the extra circuitry, and for other reasons as well.

Processor Caches 

Processor cache memory is expensive, very fast memory, much more expensive than, say, DDR-RAM. It is located on the same die as the processor. The processor uses the cache to reduce the average time it takes to access memory.
To understand how cache memory works, think about two instructions: "add A and B to make C" and "add D to C". Now, after the processor has added A and B to make C, C is stored in main memory (the RAM) as a value. But it's also stored in the cache. This is because it's very likely that there will be some instructions in the near future that will need C. Sure enough, our next instruction goes "Add D to C", and the processor, when it looks for the value of C, looks in the cache first. In our example, it will definitely find it in the cache, because just the previous instruction mentioned C. So overall, the CPU has to 'go' to main memory less often, with the help of caches. (Remember that main memory is much slower to respond than cache memory. Since the cache needs to be fast, it has to be placed physically very close to the processor.)
Now where 'L1' and 'L2' come in is when you consider the fact that L1 cache (the memory that is physically closest to the processor, after the actual CPU registers) is prohibitively expensive. Processors can therefore only afford a small L1 cache. If the cache size is to be increased, one needs to resort to a slower, larger L2 cache, which is further away from the processor. This is the multi-level cache concept.
The L1 cache, typically 8 to 64 KB in size, is where the processor looks first. If the item is not found, the processor looks in the larger (typically 256 KB to 2 MB), slower L2 cache, and so on, all the way to main memory. 
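
The lookup order described in this box can be sketched in a few lines of Python. This is only a toy model, not how any real processor is built: the class name `Cache`, the function `access`, the tiny capacities, the cycle counts and the least-recently-used eviction rule are all assumptions made for illustration.

```python
from collections import OrderedDict


class Cache:
    """A toy cache level that evicts its least recently used entry when full."""

    def __init__(self, name, capacity, latency):
        self.name = name
        self.capacity = capacity      # number of values it can hold (hypothetical)
        self.latency = latency        # cost of one lookup in cycles (hypothetical)
        self.lines = OrderedDict()

    def lookup(self, address):
        """Return True on a hit and mark the entry as recently used."""
        if address in self.lines:
            self.lines.move_to_end(address)
            return True
        return False

    def store(self, address):
        """Keep a copy of the value, evicting the oldest entry if needed."""
        self.lines[address] = True
        self.lines.move_to_end(address)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)


def access(address, levels, memory_latency):
    """Look in L1, then L2, then main memory; return (where found, cycles spent)."""
    cycles = 0
    found = "RAM"
    for cache in levels:
        cycles += cache.latency
        if cache.lookup(address):
            found = cache.name
            break
    else:
        # Missed in every cache level, so pay for the trip to main memory.
        cycles += memory_latency
    # Whatever was just used is kept in every level for next time.
    for cache in levels:
        cache.store(address)
    return found, cycles


l1 = Cache("L1", capacity=2, latency=3)
l2 = Cache("L2", capacity=8, latency=15)
results = [access(name, [l1, l2], memory_latency=200)
           for name in ["C", "C", "A", "B", "C"]]
for found, cycles in results:
    print(found, cycles)
```

The first request for C goes all the way to RAM, the second is served from L1, and after A and B have pushed C out of the tiny L1, the last request is served from L2.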

Think now about how two cores would help: the simplest scenario is when your computer is doing two entirely different things. One could be the encoding of a video file, and the other could be the manipulation of an image in Photoshop. These two tasks could, in principle, be handled by the two cores independently. There's another scenario here: the second core will be able to keep the system running smoothly if the first core is being put to 100 per cent use. Think about how your system slows down when you're encoding a video file… with the help of the second processor core, you wouldn't even notice that there's some encoding going on in the background!
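
A back-of-the-envelope model makes the first scenario concrete. The sketch below assumes the two tasks are completely independent and never wait on each other for memory or disk; the function name `finish_time` and the task durations are made up for illustration.

```python
def finish_time(task_minutes, cores):
    """Hand each task, in order, to whichever core is free soonest.

    Returns the time at which the last core finishes.
    """
    load = [0] * cores
    for minutes in task_minutes:
        idlest = load.index(min(load))
        load[idlest] += minutes
    return max(load)


# Hypothetical durations, in minutes.
tasks = {"video encode": 30, "Photoshop filter": 10}

one_core = finish_time(tasks.values(), cores=1)
two_cores = finish_time(tasks.values(), cores=2)
print("one core: ", one_core, "minutes")
print("two cores:", two_cores, "minutes")
```

On one core the jobs queue up behind each other; on two, the total time is just that of the longer job, and the shorter one never has to wait.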

The Software Aspect
There's a difference between multi-tasking faster, and a single program running faster. You could expect a performance boost if, like we said, you're running two very different programs. But the story is different if you're running just one program. In this case, you may not even notice the difference between a regular processor and a dual-core.

So overall, what kind of an improvement can you expect with dual-core? The operating system and the software have to be aware that there are two cores out there doing the work, and create threads accordingly. Remember that two cores make sense only if there are separate threads to exploit, so those threads have to be created somehow, and that is one of the jobs of the OS and the software. On top of that, they obviously cannot split the work with 100 per cent efficiency.
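
One common way to put a number on this is Amdahl's law, which the discussion above does not name but which captures the same idea: only the part of a program that can be split into threads gets faster. The sketch below is a simplified model that ignores the overhead of creating and coordinating threads; the function name `speedup` and the sample fractions are chosen purely for illustration.

```python
def speedup(parallel_fraction, cores):
    """Amdahl's law: overall speedup when only part of the work can be split.

    parallel_fraction is the share of run time (0.0 to 1.0) that can be
    spread across cores; the rest still runs on a single core.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for fraction in (0.0, 0.5, 0.9, 1.0):
    print(f"{fraction:.0%} parallel -> {speedup(fraction, cores=2):.2f}x on two cores")
```

A program with nothing to split gains nothing, and only a perfectly split program runs twice as fast; everything in between lands somewhere between 1x and 2x.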

So can current OSes and software do this? Unfortunately, not all. Older consumer OSes with no multiprocessor support, such as Windows 98 and Me, simply cannot use a second core. And most games and applications aren't written with two cores in mind. (Adobe Photoshop, amongst other programs, happens to be, so if you use it extensively, go right out and buy a dual-core!)

Meta Group vice president Steve Kleynhans says, "It's a lot more complex to write software that's multithreaded. It's also harder to check for errors. We could find ourselves with buggier software. Certainly software vendors have known this and been working on multithreaded applications for some time, but it is more complex."
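
To give a feel for the extra bookkeeping involved, here is a minimal sketch of a job split across two threads. The task (counting words), the function name `count_words` and the sample lines are hypothetical. The lock is the important part: without it, two threads updating the same total at once is exactly the kind of bug that is hard to spot. Note also that in standard CPython the global interpreter lock limits how far such threads really run in parallel, so treat this as an illustration of structure, not as a benchmark.

```python
import threading


def count_words(lines, totals, lock):
    """Count the words in one share of the lines and add them to the shared total."""
    local = sum(len(line.split()) for line in lines)
    # Only one thread at a time may touch the shared total.
    with lock:
        totals["words"] += local


lines = [
    "the race for speed",
    "two cores on one die",
    "share the bus",
    "run in parallel",
]
half = len(lines) // 2
totals = {"words": 0}
lock = threading.Lock()

# One worker thread per half of the input.
workers = [
    threading.Thread(target=count_words, args=(chunk, totals, lock))
    for chunk in (lines[:half], lines[half:])
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(totals["words"])
```

Even in this tiny case, the programmer has to decide how to split the input, how to combine the results, and what must be protected from simultaneous access.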

That makes it sound like a pretty dismal situation. But keep in mind two things. First, as dual-core becomes mainstream, operating systems and software will definitely evolve towards being multithreaded, complex though it may be. The market will demand it. Second, as we mentioned before, even if your applications aren't currently aware of two cores, when one app maxes out one core, the other core will keep your system from lagging.

But then again, another question raises its head. If there aren't many dual-core-aware applications, why are the chipmakers pushing dual-core chips out right now, in their desktop avatars? The answer probably lies in competition: early adopters of the technology apart, neither AMD nor Intel can afford to lag too far behind, lest its market share be gone by the time it gets in!

AMD's And Intel's Approaches
AMD's and Intel's approaches to dual-core are very different. In AMD dual-core processors, the two cores communicate via a Northbridge-like provision (see box The Northbridge) within the chip itself. AMD had to add this provision to the Athlon 64 die so as to support the onboard memory controller and HyperTransport link (see box HyperTransport), and this now comes in handy: a second core can easily be accommodated, with all inter-core communication happening within the chip.

By contrast, in Intel dual-core processors, the communication has to happen via the front-side bus and the Northbridge on the motherboard, meaning external access.

So are AMD's dual-core chips "better and faster" than Intel's? It seems so. To quote, "Both Anandtech and Tom's Hardware, two hardware benchmarking sites, have published reports stating that, in their own tests, the dual-core Athlon 64 X2 chips generally edge out the Intel Pentium D and the dual-core Pentium 4 Extreme Edition, although the results vary by the tests. Anandtech found that the fastest dual-core Athlon, the 4800+, and often other dual-core Athlons, typically outperformed the Intel chips on tests for single applications, such as running Adobe Photoshop or DivX."

This is not surprising, since it seems that Intel's Pentium D, a dual-core processor, is essentially two identical P4 Prescott cores tacked together.

Instructions And Pipelines 

A computer instruction is not as simple as "add A and B". There's more to it than that. For example, if you're talking about A and B, the instruction needs to be read first; then, the data needs to be brought from wherever it is: the CPU registers, the CPU cache, or main memory. After that, the addition operation must be done, and the result placed somewhere so it can be accessed.
There are independent units in the processor that do each of these things. For example, one unit fetches the instruction; another checks where in the system the values are (assuming it's an 'add' operation we're talking about); another brings the values in; another does the actual calculation; and finally, one unit places the result in a suitable location.
What happens in pipelining is that sequential instructions overlap in time, based on the principle that each of the units mentioned above can be working on a different instruction at the same moment.
To illustrate, suppose "add A and B" is followed by "add C and D". What happens? First, the "add A and B" instruction is fetched by the instruction fetcher. Then, while the appropriate unit is figuring out where A and B are, the "add C and D" instruction is fetched by the instruction fetcher. After that, while A and B are being added by the adding unit, the 'location figuring' unit is figuring where C and D are.
The basic idea of pipelining is as simple as that: each unit of the CPU is always doing something, instead of each instruction finishing before the next one is brought in.
Modern advances in computer architecture consist largely in figuring out how to make the pipeline architecture on the chip work better. 
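
The overlap described in this box can be laid out cycle by cycle with a short Python sketch. It assumes the five units listed above, that each unit takes exactly one clock cycle, and that nothing ever stalls; real pipelines are far messier. The stage names and the function names are labels chosen for illustration.

```python
# The five units described in the box, in order (names are illustrative).
STAGES = ["fetch", "locate", "load", "execute", "store"]


def pipeline_schedule(instructions, stages=STAGES):
    """Return one dict per clock cycle, mapping each busy unit to its instruction."""
    total_cycles = len(instructions) + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        busy = {}
        for position, stage in enumerate(stages):
            # Instruction number i reaches the unit at `position` in cycle i + position.
            index = cycle - position
            if 0 <= index < len(instructions):
                busy[stage] = instructions[index]
        schedule.append(busy)
    return schedule


def unpipelined_cycles(instruction_count, stages=STAGES):
    """Cycles needed if each instruction finishes before the next one starts."""
    return instruction_count * len(stages)


program = ["add A and B", "add C and D"]
schedule = pipeline_schedule(program)
for cycle, busy in enumerate(schedule, start=1):
    print(cycle, busy)
print("pipelined cycles:  ", len(schedule))
print("unpipelined cycles:", unpipelined_cycles(len(program)))
```

In the second cycle the fetch unit is already working on "add C and D" while the second unit is still locating A and B, and the pair of instructions finishes in six cycles instead of ten.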

Heat is the enemy of processors, to the extent that one of the reasons clock speeds can't just be bumped up is heat! How much heat do dual-core processors produce? Fortunately, not too much. AMD says its dual-cores will produce about the same amount of heat as its single-cores. Intel's dual-core processors, like the Pentium D, produce more heat than AMD's, but even this is not alarmingly high.

We've all heard about heating issues with processors, and now, when there are two cores on a single die, shouldn't there be double the amount of heat?

There are a couple of reasons why this is not the case, apart from the obvious fact that both cores will not always be maxed out at the same time. Firstly, clock speeds: dual-core processors are, in general, clocked somewhat slower than their single-core brethren. And secondly, dual-core processors are being manufactured using the 90 nm fabrication process, just like the latest single-cores. The "90 nm" refers to the smallest 'feature' size on the die, usually the gate length of the transistors. A smaller fabrication size leads to less power dissipation, and therefore less heat.
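
The clock-speed argument can be made roughly quantitative with the standard rule of thumb that a chip's switching power grows with the square of its voltage and in direct proportion to its clock frequency. The sketch below uses only that rule; it ignores leakage and everything else, and the scale factors are made-up examples, not figures from AMD or Intel.

```python
def relative_dynamic_power(cores, voltage_scale, frequency_scale):
    """Switching power relative to one core at full voltage and full clock.

    Uses the rule of thumb P ~ C * V^2 * f per core; leakage is ignored.
    """
    return cores * voltage_scale ** 2 * frequency_scale


# Hypothetical example: two cores, each at 90% of the voltage and 80% of the clock.
single = relative_dynamic_power(cores=1, voltage_scale=1.0, frequency_scale=1.0)
dual = relative_dynamic_power(cores=2, voltage_scale=0.9, frequency_scale=0.8)
print(f"single core: {single:.2f}")
print(f"dual core:   {dual:.2f}")
```

Under these assumed figures, two cores draw about 1.3 times the power of one, not double, which is why a modest drop in clock speed and voltage goes a long way.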

The Motherboard Issue
The first run of dual-core Athlon 64 X2 chips is compatible with any Socket 939/940 motherboard. All that's required is a BIOS update! But in the case of Intel, you need one of a new pair of supporting chipsets, the Intel 945P and 955X, meaning you'll have to buy a new motherboard if you want an Intel dual-core. This really makes it seem like AMD is already ahead in the race when it comes to dual-core.

Licensing Issues
With the advent of dual-core, licensing issues have cropped up. If a piece of software is licensed for one 'computer', on the assumption that the computer has only one processor, what happens when the computer has more than one processor? Does a dual-core processor count as two processors or one?

How multi-core chips will affect licensing is unclear, but industry analysts say a simple, uniform system is unlikely anytime soon. AMD recently released recommendations for how software companies should charge: it said companies should count processors based on the number of physical sockets used. This would mean a dual-core chip would count as a single processor. Microsoft, too, has said that it will consider a dual-core processor a single processor, and will charge per processor, not per core.

Oracle, on the other hand, considers a dual-core server a two-processor server when charging for its server software. Similarly, IBM considers two cores as two processors for software licensing purposes. But other software sellers are taking the opposite stance: Sun not only prefers the socket definition for a processor, it argues for a radical pricing structure change. When selling its server and desktop software, it charges a fixed annual fee based on the customer's total number of employees.
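
The difference between the two counting methods is simple arithmetic, sketched below. The `Server` class, the policy names and the example machine are hypothetical; real licence agreements contain many more conditions than a single multiplication.

```python
from dataclasses import dataclass


@dataclass
class Server:
    sockets: int            # physical processor packages on the motherboard
    cores_per_socket: int   # cores inside each package


def licenses_needed(server, policy):
    """Count per-processor licenses under a given definition of 'processor'."""
    if policy == "per_socket":
        # The socket definition: a dual-core chip counts once.
        return server.sockets
    if policy == "per_core":
        # The core definition: every core counts as a processor.
        return server.sockets * server.cores_per_socket
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical machine: two sockets, each holding a dual-core chip.
machine = Server(sockets=2, cores_per_socket=2)
print("per socket:", licenses_needed(machine, "per_socket"))
print("per core:  ", licenses_needed(machine, "per_core"))
```

For the same two-socket machine, the bill is for two processors under the socket definition and four under the core definition.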

Kuldip Hillyer, a manager in the strategic marketing group at BEA, gives a compelling argument: "Dual-core does not equal two CPUs in terms of performance. Hardware vendors don't charge you double; they charge a 30 to 40 per cent premium."

Dwight Davis, vice president and Practice Director at Summit Strategies, sums it up: "Some licensing models are showing signs of strain, none more than the per-CPU model common to databases and application servers. The static per-CPU model seems destined for extinction."

Here To Stay
Call it a game. Or a race. The chipmakers simply had to get faster CPUs out there, and dual-core seems the best approach, regardless of how much software support currently exists.

The Northbridge 

The Northbridge is one of the two chips in the core logic chipset on a motherboard, the other being the Southbridge. Sometimes, but not too often, the two have been combined onto one die, when design complexity and fabrication processes permitted it. But in general, core logic chipsets are divided into these two main parts. The Northbridge typically handles communications between the CPU, RAM, AGP port or PCI Express, and the Southbridge. Some Northbridges also have integrated video controllers.
In our context, the point is that AMD's dual-cores have a Northbridge-like provision within the chip itself, so there's no need to go to the motherboard for inter-processor communication. 

However, regardless of whether the software supports it, the fact is that more and more applications are being run at the same time these days. We routinely run anti-virus software, firewalls, CD-burning, Web browsing, and much more, all at once. If the OS can properly schedule these programs to run on two cores, you will see the difference. In addition, software will evolve, and individual pieces of software will soon become amenable to being run on two cores.

Whatever the case may be, you're going to get a dual-core processor soon. They will be mainstream by 2006; there's no escaping that!
