Memoirs Of Memory

Published Date: 01 - Aug - 2006 | Last Updated: 01 - Aug - 2006
 
With AMD's jump on the DDR2 bandwagon, DDR SDRAM will ultimately find itself on the road to oblivion. Intel made the transition to DDR2 a few years ago with their 915/925 chipsets. With the end of the road for DDR memory clearly in sight, it's time to take a look at it in greater detail. While doing so, let's take a look at what's replacing it!

Was The Switchover Warranted?
DDR, or Double Data Rate SDRAM, is so called because data is transferred twice per clock cycle, once on the rising edge and once on the falling edge of the clock signal. In comparison, earlier SDRAM could only manage one transfer per clock. The data transfer rate is therefore twice the clock frequency of the memory; that is, DDR memory clocked at 200 MHz is called DDR 400.

Before we get into the details of one architecture and the pitfalls of another, it makes sense to analyse why exactly the change was necessary. Let's look at the major problem all processors face today: the data transmission rate, or bandwidth. The performance of any PC depends upon just five major devices: the CPU, the chipset/motherboard, the RAM, the graphics solution and the hard disk(s). It's well known that hard drives are by far the most serious bottleneck as far as data transmission goes. However, the processor and graphics card on a PC are fed directly by the system memory, and due to the speed limitations and latencies that plague RAM (even today), another bottleneck emerges.

With today's gigahertz CPUs and dual and quad cores, number crunching is never going to be an issue, and the main challenge remains supplying these processors with sufficient data to satiate their enormous appetites. The RAM is that vital link entrusted with supplying this data, and the performance of the memory used in a PC will directly impact the overall system performance. In fact, it's not uncommon to find enthusiast users with 2 and even 4 GB of memory configured in dual-channel mode. Why? Quite simply, to ensure speedy performance! When a user says "PC performance upgrade," eight times out of 10, he means a memory upgrade, making this component one of the most frequently upgraded.

To represent bandwidth in a simple formula: Bandwidth = Frequency x Bus Width.

"Frequency" here means the rate of data transfer (mostly measured in MHz), while bus width is measured in bits and represents the width of the path that the data flows through. Try to visualise RAM as a highway with multiple lanes, with the traffic as data. The more lanes there are, that is, the greater the bus width, the more data can pass through in a given amount of time. Likewise, greater data speed (the speed of the traffic) gives the same result: more throughput. The first option, increasing the bus width of the memory, isn't practical for all purposes. Memory width remains at 64 bits (remember it's DDR, so that's actually 32 x 2).

For example, DDR 400 MHz memory runs at 200 MHz internally (200 x 2 because it's DDR), while the bit width is 64, which gives us a bandwidth figure of 3200 MBps. In megabits, we'd get 400 x 64 = 25600 megabits per second. To arrive at the figure in megabytes, 25600/8 = 3200 (MBps or megabytes per second). DDR 400 MHz memory is therefore also called PC 3200 RAM.
If we look at the bandwidth figure of a 32-bit, 3.2 GHz processor: 3200 x 32 = 102400 Mbps, or 12800 MBps.
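The arithmetic above is easy to check for yourself. Here is a minimal Python sketch of the Bandwidth = Frequency x Bus Width formula; the function name and its parameters are our own choices for illustration, not part of any standard tool.

```python
def bandwidth_mbytes_per_s(data_rate_mhz: float, bus_width_bits: int) -> float:
    """Peak bandwidth in megabytes per second.

    data_rate_mhz  -- effective transfer rate (for DDR 400 this is 400, not 200)
    bus_width_bits -- width of the data path in bits
    """
    megabits_per_s = data_rate_mhz * bus_width_bits
    return megabits_per_s / 8  # 8 bits to the byte


print(bandwidth_mbytes_per_s(400, 64))    # DDR 400 on a 64-bit bus -> 3200.0
print(bandwidth_mbytes_per_s(3200, 32))   # the 32-bit, 3.2 GHz processor -> 12800.0
```

Note that this is a peak figure only; it takes no account of latency.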

Add to this the fact that memory has a latency (a count of wasted clock cycles that is astronomical compared to that of a processor), and it's no wonder that even the fastest memory available bottlenecks today's CPUs.

DDR peaks out at 500 MHz, with some manufacturers going into the 550 MHz realm. The problem was that the DDR architecture was left with no headroom for future speed-ups. Although almost no increase in bandwidth will bring memory anywhere close to satisfying the processor's bandwidth requirements, DDR2 seems like a step in the logical direction, offering steep increments in raw clock speeds at least. With DDR2 promising that 1066 MHz will become mainstream by late 2007, and 800 MHz already available in goodly doses, it seems right on track to greet the latest monsters from Intel and AMD: the Conroe and AM2 processors respectively.

DDR data flow diagram: note the two-level multiplexing at work, represented by the two red data lines flowing into the I/O buffer

Power consumption-wise, DDR2 is the hands-down winner, with a 1.8 V operating voltage as compared to the 2.5 V that DDR requires. Power consumption will only increase as the density (in gigabytes) of memory modules increases. For example, 4 GB of DDR memory would consume close to 40 watts of power for every read operation. Keeping in mind the universal drive to save power, DDR2 keeps environmentalists happy as well!

Another important advantage goes to DDR2: due to its lower power consumption, much higher-density modules will be possible, and DDR2 memory can be available in 2 and 4 GB densities on a single DIMM! This is also possible because DDR2 chips use FBGA (Fine Ball Grid Array) packaging, while DDR utilises the older, albeit cheaper, TSOP (Thin Small-Outline Package) packaging, which was a serious hindrance to speed increments due to the inherently high resistance and inductance of TSOP.

Architectures: DDR And DDR2 Compared
Let's look at a simple illustration showing data flow in DDR memory:

As we can see, there's a multiplexing of sorts taking place, since each bit from both pipelines is being fed onto the same output line. Here, "multiplexing" quite simply means sending multiple streams of data (let's look at each as a single bit of data for simplicity) onto the same line, where they are collated in the Input/Output buffer. We therefore have two bits at the end of one clock cycle for DDR memory. What happens is that both bits of data are captured at separate stages, fed into the pipeline, and then moved to the data bus, one on the rising edge and one on the falling edge of the clock signal. Each of the pipelines is 32 bits wide (there are two of them); the memory bus width is therefore 64 bits.
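To make the idea of multiplexing concrete, here is a toy Python sketch that interleaves bits from several pipelines onto a single output line. It is only an illustration of the principle described above, not a model of real DRAM circuitry; the function name and the sample bit patterns are made up for the example.

```python
from typing import List, Sequence


def multiplex(pipelines: Sequence[Sequence[int]]) -> List[int]:
    """Interleave one bit from each pipeline in turn onto a single output line."""
    lengths = {len(p) for p in pipelines}
    if len(lengths) > 1:
        raise ValueError("all pipelines must carry the same number of bits")
    output: List[int] = []
    # Each tuple from zip() holds the bits that leave the pipelines
    # during the same core clock cycle.
    for bits_this_cycle in zip(*pipelines):
        output.extend(bits_this_cycle)
    return output


# DDR-style: two pipelines, so two bits reach the I/O buffer per clock cycle.
pipeline_a = [1, 0, 1]
pipeline_b = [0, 0, 1]
print(multiplex([pipeline_a, pipeline_b]))  # [1, 0, 0, 0, 1, 1]
```

Passing four pipelines instead of two gives the four-way scheme that DDR2 uses.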

DDR2 Memory Multiplexing

DDR2 data flow diagram: four tiers of multiplexing at work. The I/O buffer speed is also twice the DRAM frequency, unlike DDR RAM, where the two are synchronous

DDR2 RAM follows the double-up principle, doubling the number of bits fetched from the data pipeline. The pipeline has been streamlined further to a 16-bit architecture, and the multiplexing principle is taken to the next level, with four bits being supplied per core clock cycle instead of two. The 64 bits on a module are therefore made up of simultaneous transmissions from each of the four banks.
Let's look at this in more depth. There are three different frequency values we need to understand. First we look at the core frequency/clock, which represents the actual speed of the DRAM chips. In the case of DDR 200 MHz memory, the core speed is 100 MHz; in the case of DDR 400 memory, it is 200 MHz.

Then we have the clock frequency, also called the frequency of the I/O buffers. For DDR memory, the I/O buffer speed is synchronous with the DRAM frequency. However, for DDR2 memory, the I/O buffer speed is twice that of the core. Therefore the I/O buffers need to be supplied with twice the amount of data that the DDR memory buffers require, to avoid serious latency issues. This is exactly why the multiplexing level in the case of DDR2 memory has increased to four, twice that of DDR. Finally, there is the data frequency, the effective transfer rate, which is twice the I/O buffer frequency because data moves on both clock edges. In the case of DDR memory at a core speed of 200 MHz, your I/O buffer speed is also 200 MHz, and the data speed becomes 400 MHz. In the case of DDR2 400 MHz memory, the data frequency and the I/O buffer frequency remain identical, at 400 MHz and 200 MHz respectively. What changes is the core speed. It's down to 100 MHz! Remember what we said about the relationship between core speeds and I/O buffer speeds? Well, for a core speed of 200 MHz, the I/O buffer frequency becomes 400 MHz, and the data frequency becomes 800 MHz.

Looking at these figures, the difference is glaring: at the same core/DRAM speeds, the bandwidth effectively doubles!
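The relationship between the three frequencies can be sketched in a few lines of Python. The multipliers below are taken straight from the figures in this article (I/O buffer equal to the core for DDR, twice the core for DDR2, and data rate twice the I/O buffer for both); the class and function names are our own and purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryClocks:
    core_mhz: float       # speed of the DRAM cells
    io_buffer_mhz: float  # speed of the I/O buffers
    data_mhz: float       # effective data transfer rate


# I/O buffer clock as a multiple of the core clock, per the article.
IO_MULTIPLIER = {"DDR": 1, "DDR2": 2}


def clocks_from_core(kind: str, core_mhz: float) -> MemoryClocks:
    """Derive the I/O buffer and data frequencies from the core frequency."""
    if kind not in IO_MULTIPLIER:
        raise ValueError(f"unknown memory type: {kind!r}")
    io_buffer_mhz = core_mhz * IO_MULTIPLIER[kind]
    data_mhz = io_buffer_mhz * 2  # data moves on both clock edges
    return MemoryClocks(core_mhz, io_buffer_mhz, data_mhz)


print(clocks_from_core("DDR", 200))   # DDR 400: core 200, I/O 200, data 400
print(clocks_from_core("DDR2", 100))  # DDR2 400: core 100, I/O 200, data 400
print(clocks_from_core("DDR2", 200))  # DDR2 800: core 200, I/O 400, data 800
```

Feeding in the same 200 MHz core speed for both types shows the doubling directly: 400 MHz data rate for DDR against 800 MHz for DDR2.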

The Role Of Latency
One of the most important characteristics of memory (and a significant performance-affecting factor) is latency. Latency can be quite simply defined as wasted clock cycles. It occurs partly because DRAM cells have to continually refresh themselves. Latency, therefore, basically represents delay; it's the time taken for the memory to get ready for a fetch or deliver data transaction. There is also a certain amount of unavoidable latency that occurs between the activation of a column or row, due to the time required to set up the addresses of the same. Note that latency is an omnipresent factor; it can only be minimised, and never completely done away with.

Now latency, counted in clock cycles, rises with clock speeds, so when a MHz bump occurs, the latency figures also rise. Let's compare a DDR 400 and a DDR2 400 MHz module. Typical latencies would be 2.5-3-3-6 for a DDR 400 MHz module, while a DDR2 400 MHz module would only manage 3-4-4-8, making it slower as far as data accessibility goes. So DDR2 scores bandwidth-wise, but loses out latency-wise. A paradoxical situation, with only one solution: bump up the clock speeds further. And DDR2 has done exactly that, delivering 667 MHz and 800 MHz, with 1066 MHz promised in the near future. With AMD 64's efficient memory controller and the bumped-up FSB speeds (1066 and 1333 MHz) of Intel's Conroe, this extra bandwidth will go down real easy!
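To see why a faster clock helps, it's useful to convert latency from clock cycles into nanoseconds. The Python sketch below does this for the CAS figure only. It assumes the timings are counted against the I/O buffer clock (200 MHz for both DDR 400 and DDR2 400), and the CL 5 figure for DDR2 800 is a hypothetical value chosen for illustration, not one quoted in this article.

```python
def cycles_to_ns(cycles: float, clock_mhz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds at the given clock."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    # One cycle at f MHz lasts 1000 / f nanoseconds.
    return cycles * 1000 / clock_mhz


# (label, CAS latency in cycles, assumed I/O buffer clock in MHz)
modules = [
    ("DDR 400, CL 2.5", 2.5, 200),
    ("DDR2 400, CL 3", 3, 200),
    ("DDR2 800, CL 5 (hypothetical timing)", 5, 400),
]

for label, cas_cycles, clock_mhz in modules:
    print(f"{label}: {cycles_to_ns(cas_cycles, clock_mhz):.1f} ns")
# DDR 400, CL 2.5: 12.5 ns
# DDR2 400, CL 3: 15.0 ns
# DDR2 800, CL 5 (hypothetical timing): 12.5 ns
```

Under these assumptions, a higher cycle count at a higher clock can work out to the same real-world delay, which is why bumping up the clock speed offsets looser timings.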

  • RAM Latencies: What All Those Figures Mean
So we know that latency-wise, DDR2 is slower than DDR memory. Latency figures are measured as 3-3-3-8, or something similar, and most users are lost when it comes to the meaning of such figures.
The figure 3-3-3-8 represents delay. The smaller these values can get, the faster the memory is (clock speeds remaining constant, of course). It's worth noting that these figures represent the minimum latency the module is rated for, meaning the timings cannot get any smaller, that is, tighter. DDR 400 MHz memory at timings of 2-3-2-6 would, for example, be much faster than similar memory at 3-3-3-8 timings, simply because the latency is reduced. Reduced latency figures are what enthusiasts call "tighter timings".
The first figure in the above example is the most significant speed-wise. It represents the CAS or Column Address Strobe latency figure.
As we know, RAM first has to read a command sent to it, and then output some data based on that command. The CAS latency represents the delay between a registered read command and data output. It's measured in clock cycles. Therefore a CAS latency value of 3 means three clock cycles will complete before data is ready to be sent forward. CAS latency is often abbreviated as CL.
The second figure (the second 3 in the example) represents the RCD, or Row-to-Column Delay. It is defined as the number of clock cycles required between RAS and CAS. As a latency, it's the delay between defining a row and a column in a particular memory block, and the read/write operation to that particular location.
RP is the Row Pre-charge time. It's denoted by the third 3 in our example. In memory, each row in the bank needs to be closed, that is, terminated, before the next row can be accessed. The RP represents, in clock cycles, the time needed to close the currently open row so that the next row can be accessed.
RAS stands for Row Address Strobe. This is the last number in our example, i.e. 8. There is a delay between the request for data and the actual issue of a pre-charge command. This difference is basically the number of clock cycles spent in order to access a certain row of data in memory. This delay between data request and pre-charge is called RAS, or active to pre-charge delay.
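The four figures can be pulled apart with a short Python sketch like the one below, which splits a timing string into its named parts in the order described above (CAS, RCD, RP, RAS). The names `Timings`, `parse_timings` and `is_tighter` are our own inventions for this example, and "tighter" is given a deliberately simple meaning here: no figure is larger, and at least one is smaller.

```python
from typing import NamedTuple


class Timings(NamedTuple):
    cas: float  # CL: delay between a read command and data output
    rcd: float  # row-to-column delay
    rp: float   # row pre-charge time
    ras: float  # active to pre-charge delay


def parse_timings(text: str) -> Timings:
    """Parse a string such as '3-3-3-8' or '2.5-3-3-6' into named timings."""
    parts = text.strip().split("-")
    if len(parts) != 4:
        raise ValueError(f"expected four figures, got {text!r}")
    return Timings(*(float(p) for p in parts))


def is_tighter(a: Timings, b: Timings) -> bool:
    """True if a is never slower than b and is faster on at least one figure."""
    return all(x <= y for x, y in zip(a, b)) and a != b


fast = parse_timings("2-3-2-6")
slow = parse_timings("3-3-3-8")
print(fast)                    # Timings(cas=2.0, rcd=3.0, rp=2.0, ras=6.0)
print(is_tighter(fast, slow))  # True
```

All values are in clock cycles, so a comparison like this only makes sense between modules running at the same clock speed.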

What's New... About DDR2?

Besides the extra bandwidth DDR2 provides, its superior packaging (TSOP is inferior to FBGA), savings in power consumption, and reduced thermal specs, there are also a number of new features inherent in DDR2. These are basically refinements that make DDR2 better, sort of an evolution over DDR1. Let's take a look at some of these.

1. ODT (On Die Termination): Signal reflection is perhaps one of the major obstacles to extracting more speed (MHz) out of memory. Any signal moving along a bus reflects to a certain degree when it hits its intended target. These reflections can travel either way along the bus, and the reflected signal (or mirror signal) interferes with the original signal, or could even cancel it out, depending on the original signal strength. This is where ODT scores, big time! DDR2 simply introduces a termination point for the signal on the memory chip itself (DDR doesn't have ODT; refer to the features table below) once it has reached its target, by adding a resistor to ground it. This eliminates any reflection voltage, as the resistor simply swallows up any signal voltage coming its way.

2. Posted CAS: Another necessary feature that DDR2 implements, whose basic aim is to eliminate data collisions along the bus. This is especially significant for DDR2 because of the high clock speeds involved, which increase the chances of collision. The command bus is responsible for issuing commands. A command buffer is placed on the DRAM chip that "bundles" these commands, that is, holds a command and schedules it for a later release. Therefore the command is pre-issued and stored, but the read operation is postponed. The command bus is freed from the burden of addressing exactly when to release that particular command. It can now activate the next bank. This delay is called Posted CAS. The delay that is specified during initialisation of the DRAM chip is called Additive Latency or AL. This, however, does play a part in increasing latency a bit; the read latency now becomes the sum total of the CAS latency and the Additive Latency. This isn't too detrimental to the bandwidth, however, because the command is issued early.

3. OCD (Off-Chip Driver) Calibration: DDR memory introduced the principle of clock forwarding. A single data strobe signal was used. The function of this strobe is to minimise signal skewness ("skewness" means deviations from the ideal curve that a signal is supposed to follow). Note that some amount of skewness is normal and cannot be avoided, but the goal here is to minimise it, not eliminate it. This single strobe signal is compared to a reference point signal, which is predetermined. The problem of comparison becomes worse because the degree of skew is also never the same, and may change from clock to clock.

DDR2 introduces a bidirectional, differential strobe. This ensures higher signal integrity because it allows the two strobes to be calibrated against each other, as opposed to DDR, which simply calibrates on the basis of a reference point signal. With two reference points, skewness is minimised, and therefore even at higher frequencies, signal integrity is maintained.
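Going back to Posted CAS for a moment, the effect of Additive Latency on read latency is simple addition, as this small Python sketch shows. The function name is ours, and the CL and AL values used are hypothetical numbers picked for illustration only.

```python
def read_latency(cas_latency: int, additive_latency: int = 0) -> int:
    """Read latency in clock cycles: CAS latency plus Additive Latency (AL).

    With AL set to 0 (Posted CAS not in use), this is just the CAS latency.
    """
    if cas_latency < 0 or additive_latency < 0:
        raise ValueError("latencies cannot be negative")
    return cas_latency + additive_latency


# Hypothetical values: CL 4 with and without an Additive Latency of 3.
print(read_latency(4))     # 4
print(read_latency(4, 3))  # 7
```

The extra cycles are not all lost time, since the read command was issued that much earlier.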

  • DDR And DDR2 At A Glance
Specifications               DDR                    DDR2
Clock speeds (MHz)           200/266/333/400/500    400/533/667/800/1066
I/O buffer frequency (MHz)   100/133/166/200/250    200/266/333/400/533
DRAM core frequency (MHz)    100/133/166/200/250    100/133/166/200/266
Prefetch size (bits)         2                      4
Voltage (volts)              2.5                    1.8
Data strobe                  Single DQS             Differential (DQS and /DQS)
Packaging                    TSOP/TSO               FBGA
Support for ODT              No                     Yes
OCD calibration support      No                     Yes
 

Intel And AMD: Who Benefits More?
Looking at the comparative architectures of the Intel and AMD processors, it's quite easy to draw the conclusion that Intel CPUs are more sensitive to memory bandwidth than to latency. In other words, an Intel processor performs better at higher DRAM frequencies, despite the higher latencies. The Pentiums and their ilk are MHz demons, both clock-wise and memory requirement-wise!

The AMD 64 architecture is slightly different. AMD's 64-bit range has an integrated memory controller, which effectively negates the latencies that occur between the CPU and the Northbridge (where the MCH, a.k.a. Memory Controller Hub, typically resides). Due to this, AMD 64s become extremely latency sensitive, and such systems give blistering memory bandwidth scores. The controller can be as much as 90 per cent efficient when memory with tight timings is used. Compare this with Intel processors: their MCH isn't anywhere near as efficient. To make up for their less efficient controller, Intel processors need faster memory. They are perfect candidates to make use of the 800 MHz and 1066 MHz speeds that DDR2 dishes up.

This does not mean that AM2 processors are lacking in any way. In fact, the memory controller being efficient means that AM2s will be able to do wonders with the increased bandwidth. Although the latency will hit AMD harder, the silver lining is that DDR2 memory is continuously being refined, and memory with faster timings is being introduced every few months.

The drive towards DDR2 is obviously warranted. The memory industry as a whole is still in a state of flux, with new technologies emerging and improvements being made, all to feed the immense bandwidth requirements of today's users. Yes, that's correct, it's we users and our applications that drive the hardware industry as a whole, and not the other way round! The fact that DDR2 might well be a stop-gap for three to four years should also not escape you, with new technologies like DDR3 promising an emergence into the mainstream memory market. Remember when people said nothing could beat Dual DDR 400? Well, it just got beat, and all the better for us!




Team Digit
