AMD “not threatened by the Fermi”; says it is “six months late” and still doesn’t deliver on Nvidia’s promises

By Abhinav Lal | Updated on 31-Mar-2010

31 Mar 2010 16:39

These views were shared with Digit when we interviewed Vamsi Krishna (full interview below), Senior Technical Manager of AMD India, at the India launch of the Opteron 6000 Series platform in New Delhi. The launch also introduced AMD’s new Opteron 6100 series Magny Cours x86 processors, which come in 8-core and 12-core versions.

Other speakers included Ravi Swaminathan, Managing Director and Regional Vice President of Sales and Marketing, AMD India; and Arvind Chandrashekar, General Manager Business Development, AMD India.

AMD’s launch of its new Opteron 6000 Series platform in New Delhi

At the launch, apart from detailing the new products, the speakers talked about how AMD will radically change the server game again in 2010, the way it did in 2003 with the launch of the original Opteron 64 platform. The new-generation Opteron 6000 Series platform offers exponential performance improvement over the previous generation, and at lower cost!

Ravi Swaminathan said the server and processor markets have historically been driven by clock speeds, but that is a “20th Century” way of looking at things. Now, it is all about performance per watt, or price per performance per watt. Mr. Swaminathan said that while “Innovation is our [AMD’s] DNA”, they use innovation to “create value” for their customers. Their motto is to provide “exceptional value, low total cost of ownership, and generational consistency”. Mr. Swaminathan also introduced AMD’s major OEM partners, such as Dell, HP, Cray, VMware, Acer, and SGI, who have adopted the new platform and will start shipping servers based on it soon.

Refer to our previous article for more information about the new Magny Cours processors and Opteron 6000 series platform. In a nutshell, the Magny Cours processors eliminate the exorbitant ‘4P tax’: the same processors work in 1P, 2P, and 4P server designs, so changing server design does not require a more expensive chipset or processor. The server lifecycle has also been kept as long as possible by maintaining generational consistency, allowing users to upgrade to the 2011 Bulldozer (8, 12, 16 core) processors on the same socket and chipset.

The role of AMD India was also discussed, including how the Indian contingent contributes significantly to AMD’s global engineering efforts. In fact, one of AMD’s proprietary power-saving technologies, C1E Power State, was developed completely in India.

We met Vamsi Krishna at the Oberoi Hotel in New Delhi, where Digit talked to him about cores versus threads, AMD India, the Magny Cours and Lisbon processors, the new 16-core Bulldozer processors, Intel’s competition and Xeon counterparts, Nvidia’s Fermi graphics cards, his killer gaming rig, and much more!

Vamsi Krishna, Senior Technical Manager, AMD India, at our interview

Q1: What part does AMD India play in AMD’s global affairs?

In India we have two functions, engineering and sales & marketing. Engineering is a more globally aligned organization in terms of designing our next generation processors and next generation graphics. In fact we have two centres in Bangalore which are focused on cutting-edge microprocessors, and we have two centres in Hyderabad focused on cutting-edge graphics.

They play a very critical role in the road-map of AMD from a corporate standpoint. As you have seen, a lot of the products that have come out since 2007 – the most significant examples being the “Shanghai” launch, the quad-core 45-nm launch, the Istanbul launch, and now the Magny Cours launch – have been developed with significant contributions from the Indian teams. And those are just a few examples; there are so many others. The next-generation Fusion…what better place than India, where you have two separate development centres that have a talent for developing volumetric solutions for processors and graphics in the same country. The Fusion is going to be a product that is developed largely in India.

Q2: Any reactions to Nvidia’s Fermi launch and benchmark results?

The reactions came from the media. Benchmarking is just one part of the equation. The other part is the cost of adopting it, the power cost of running the product. The media say it came six months late, and it came very hungry. The question is, is it really a solution that customers want to adopt?

Our “Evergreen” products were the industry’s first DX11 products. In fact, today we still hold the ‘Fastest Graphics Card on the Planet’ title, with the HD 5970, and that title still has not been taken by the competition even after the six-month delay.

We are actually in a strong leadership position. We are not really threatened by the Fermi as a product, because of the strategy that we incorporated with the Evergreen products, where the mid-range solution is designed first before being scaled up to performance range and scaled down to the value range, because we have the advantage of manufacturing, and no great complexities involved. What the competition has adopted is that they build a huge massive die, which is really tough to manufacture, and really power hungry.

The specifications announced when the Fermi was introduced are not the same as what they [Nvidia] are releasing today. So, that was a strategy that really kept us going in the graphics market, and today we have the best graphics products in the world.

Fermi GPUs: GeForce GTX 470 (left) & GTX 480 (right)

Q3. Will there be any foundries coming to India?

Well, I can’t really make any statements, especially as GlobalFoundries is not a company fully owned by AMD; it is owned by our partners as well.

Q4. How much of an advantage does Intel’s Advanced RAS (reliability, availability, and serviceability) MCA (Machine Check Architecture) Recovery give them? Does AMD have something comparable?

Apples and oranges…the Nehalem-EX, extended edition as it is called, comes with MCA as a new feature. The reason why we haven’t incorporated MCA in our products at this point of time is because of what we are trying to do, to bring down the volumes of 2-ways and 4-ways to the masses.

When we are talking about 2-way and 4-way products, the real problem is not hardware failure. When you are talking about 8-way, 16-way, and 32-way, the hardware becomes more and more complex, more and more unmanageable. It is there that you need these “fancy” features of RAS incorporated at the product level. So, our Magny Cours processors are not targeted towards that very high-end and very niche server market, and that is the reason why we haven’t incorporated MCA.

Q5. How similar is Intel’s new QuickPath Interconnect technology to AMD’s industry winning HyperTransport 3.0?

They are actually neck and neck. They are both very similar in concept. Both are serial communication buses, and while the protocols might vary a little bit here and there, the net result is the same. And the question is, how many gigatransfers per second can each bus run at?

The Magny Cours run at 6.4 GT/s, more or less the same as the Nehalem-EX. But we were the first ones to incorporate this kind of serial communication strategy and Direct Connect Architecture, and this gives us a good advantage in terms of optimizing the design and application path when we are designing a 2P or 4P server.

Q6. When approximately will the Opteron 4100 series processors be launched? Will they also work on Socket G34?

They will be launched in the next quarter, and no, they will not fit on the Socket G34. The reason why we are launching the Opteron 4100 series is to bring the volume servers to market. The socket they are designed to fit on is C32 socket, which will be common to the 1P and 2P market. The G34 socket will be common to the 2P and 4P market.

The Opteron 4100 is targeted at entry-level stacks or rack density deployments. Someone who is really, really bothered about power, and wants very low power-consuming products, would go for the 4100 series, especially in applications like cloud, where one would want a very rapid, dense deployment. When one is talking about cloud, one is really not bothered about how many memory channels there are or how much memory each processor has; instead, one is bothered about what computing resources are available. Therefore, it is specifically targeted at those kinds of rack density deployments.

Q7: How come you did not use the 32nm process, like Intel’s Westmere-EP Xeon 5600 series, to make the chips more energy efficient?

While the 4100 has not yet been released, we can actually position our Opteron 6100 solutions against the Westmere-EP processors in the meanwhile, and the prices will still be very competitive.

Coming back to your question, when you are moving from one manufacturing generation to a newer one, there are big advantages for the manufacturer on a 32-nm technology, where more dies can be cut per wafer, and the cost savings can be realized per processor.
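The “more dies per wafer” point can be illustrated with a rough calculation. The sketch below assumes an idealized shrink in which die area scales with the square of the process node; real designs rarely shrink this cleanly, and the calculation ignores yield, wafer edge loss, and the higher cost of a newer process. The function name is hypothetical and only the 45 nm and 32 nm figures come from the interview.

```python
def ideal_area_scale(old_node_nm, new_node_nm):
    """Idealized die-area ratio after a shrink (assumption: area ~ node squared)."""
    return (new_node_nm / old_node_nm) ** 2


# Process nodes mentioned in the interview.
area_scale = ideal_area_scale(45, 32)

# If each die is smaller by this factor, roughly this many more dies fit
# on the same wafer (ignoring edge loss and yield, which matter in practice).
dies_per_wafer_ratio = 1 / area_scale

print(f"Idealized die area at 32 nm: {area_scale:.2f}x the 45 nm area")
print(f"Idealized dies per wafer: {dies_per_wafer_ratio:.2f}x")
```

Under these assumptions the same design would take about half the area and yield close to twice as many dies per wafer, which is the manufacturer-side advantage being described. It says nothing about the cost of the wafer itself.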

I’ll give you an eye-opening example: compare the Westmere-EP against the older Nehalem product. What is the cost to produce each? It is definitely higher to produce a Westmere-EP than the existing Nehalem product.

While moving from 45nm to 32nm, we obviously expect lower power consumption. Compare the existing Nehalem with the Westmere-EP: the Westmere H series consumes 130 watts, while the old Nehalem consumes 95 watts. So there is actually a power jump.

The end-customers are not buying nanometers per se. Imagine if the end-customer is getting lower power consumption and better performance at 45 nm, at a better price, how do nanometers matter to him? So, for the end-customer, nanometers are something that looks fancy on the specifications sheet, and it has nothing to do with the power consumption or the performance.

Q8: The upcoming Xeon 7500 Nehalem EX top-end processor, the X7560, will have almost twice the shared L3 cache, 4 more threads per processor, and 4 fewer cores per processor than the top-end 6176 SE Magny Cours. How do AMD’s Opteron 6100 processors stack up against this and still provide competitive performance?

Both are targeted at different customers with different requirements. The Nehalem EX comes with all the fancy specifications that you just talked about, but it was meant for a different class of customers who are looking for risk-free placements. The cost becomes the biggest apples to oranges comparison factor there. The Magny Cours are meant for customers who want to go for value 4P servers, who want to keep their data centres at a cost-conscious level. Both are different products to be compared. You can almost buy 2 4P Magny Cours parts at the cost of approximately 1 Nehalem-EX processor. That’s the biggest advantage.

There are also a whole lot of other challenges that Intel will have with the Nehalem-EX, such as the introduction of very expensive memory buffers on the motherboard, which will make the overall cost of the server unaffordable to IT managers, especially at this time, when the market is recovering from a beating, and when IT managers are very cost-conscious. A 4P AMD server will meet their requirements.

Q9: The 16, 12, and 8 core Bulldozer processors of 2011 are going to be made generationally compatible with the Opteron 6000 Series Platform and Socket G34. Does this not enforce limits to the Bulldozer processors’ potential?

You always have advantages and disadvantages with this [generational consistency] approach. The bigger picture is the advantage, and the smaller picture is the small trade-off when giving that upgrade path.

A simple example, among the top supercomputers, the #1 right now is the Jaguar. It was originally powered by the quad-core Shanghai-based processors, and was the #3 supercomputer last year. The platform was upgradeable to Istanbul processors. So when the folks at Jaguar decided to upgrade their quad-cores to six-cores, they became the #1 in the world. That is the bigger picture I am talking about, that is something that money cannot buy. Imagine if they had to rebuild their entire high performance computer (HPC), board to board, memory to memory, everything will have to be repurchased. It will cost them 20x or 30x the cost of simply upgrading.

That is the bigger picture I am talking about – the customers who buy our servers have the surety that their servers are not going to go obsolete in the next couple of years. That is the promise we are making to the customers, that is our commitment to our customers.

Additionally, the [newly introduced] Direct Connect Architecture 2.0 is designed in such a way that it can scale to Bulldozer requirements, the same way DCA 1.0 was designed to scale to six-cores. So that is why when we multiplied the paths of HyperTransport, doubled the memory controllers on DCA 2.0, we kept the Bulldozer in mind the entire time. So, there is no real trade-off when looking at this generational compatibility.

Q10: Is the Bulldozer based on the 32nm fabrication process?

Yes.

Q11: AMD has taken its server products on a path towards “exceptional value, low total cost of ownership, and generational consistency” instead of just performance. Will that alienate AMD’s servers from those corporations who do not care about the expense but only the performance?

Actually, today there are almost no companies who are blind to costs. Less than 1% of the market might be like that. But there is no one who will say “I have loads of cash in my pocket, I can spend whatever I can, just give me the best in the world”. Nobody asks for that. And in any case, the top performing 2P server in the world right now runs on the AMD Opteron platform.

Q12: There is a general consensus all over the web that the Opteron 12-cores will offer greater parallelism for servers and are ideal for compute-heavy tasks such as cloud computing, but Intel’s 8-cores will be better at memory and processor scaling. What is your opinion?

They are right. But as I have said before, we are targeting different customers and different requirements.

Q13: Many recent advances have been made in chip fabrication, such as electron-beam lithography/nano assemblers or microlaser technology. Is AMD looking out for such advances? And, is there a portion of the company’s R&D dedicated to completely new processor technologies?

See, manufacturing is a very tricky business. Not everyone can come to the market and say, tomorrow I’m going to set up a fabrication plant. It’s a very complicated process.

It is not just the R&D; it is the tools, the processes, and the capital investment required to set up even a simple fabrication unit. So there are just a handful of people in the world who are able to do end-to-end manufacturing, and AMD is one of them, a company that understands manufacturing better than most.

Well, I really cannot talk about the forecast of how we are moving in fabrication, but I can say that we are definitely moving in the right direction that will benefit our customers. What our customers require is what we will invest in, and what we will offer.

Q14. How exactly does the multi-core architecture of AMD differ from Intel’s multi-core architecture in terms of performance?

There are multiple benchmarks. The server side is a very tricky area, compared to the client side. Different applications depend on different parts of the CPU. An example is SPEC, a benchmark that focuses on two different aspects, integer performance and floating point performance. Generic applications like databases etc., use the integer performance. High-performance computing, technical super-computing and PIX [graphics] super computing, they focus on the floating point performance. So it really varies from application to application; you really cannot pinpoint your processor for one application. Typically if you look at the performance, floating point has always been AMD’s domain, and we have always ruled floating point performance, and that’s why we were the de facto standard for high performance computing solutions, and in fact, 4 out of the top 5 HPCs are based on AMD.

Integer may be a very generic way of defining the performance, and it really depends on how well the applications are optimized, and how they are determined to scale on the number of threads. We always believed more in real cores than real threads. Having more computing cores is always better, because if you can run more threads but you don’t have the cores to complete the threaded tasks, the tasks will be left waiting, and you are limiting the performance of the core.

Q15. Is there any significant performance gain other than scalability in maintaining the same chipset architecture across all platforms unlike Intel?

There is a huge advantage in having a common chipset architecture between 1P, 2P and 4P. The first advantage is drivers. Develop drivers once, and use them all across. So, the cost of development comes down for our OEMs. The second advantage is that 1P and 2P are volume servers, and the low cost of chipset manufacturing will be suitable for them, and those economics can be carried over to 4P. 4P has a higher manageability and reliability factor, so when you develop a driver for that particular solution, you can use it for 4P. So, one segment has an advantage using another segment’s solutions. The same thing applies for BIOS code: develop it for one solution, and use it across designs.

Q16: We see that Direct Connect Architecture 2.0 is the new architecture being implemented across multi-core AMD processors starting with the Opteron 6000 series. HyperTransport is an old concept; but what is new with HyperTransport 3.0?

Direct Connect Architecture 2.0 is based on the HyperTransport 3.0 protocol, but at a faster frequency. In fact, HT3.0 has been available in our products since the quad-core Shanghai processors that were released in 2008. Those links were running at 4.8 gigatransfers per second, and the new DCA 2.0 allows them to run at 6.4 gigatransfers per second.
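Link rates of this kind are normally quoted in gigatransfers per second (GT/s), and a transfer rate turns into bandwidth once a link width is fixed. The sketch below is only an illustration: it assumes a 16-bit-wide link, which the interview does not state, and the function name is hypothetical. Only the 4.8 and 6.4 figures come from the answer above.

```python
# Assumption: a 16-bit-wide link. The interview does not state a link width.
LINK_WIDTH_BITS = 16


def link_bandwidth_gb_per_s(gigatransfers_per_s, width_bits=LINK_WIDTH_BITS):
    """Peak bandwidth in one direction: transfers per second times bytes per transfer."""
    return gigatransfers_per_s * width_bits / 8


shanghai_rate = 4.8  # GT/s, figure quoted in the interview
dca2_rate = 6.4      # GT/s, figure quoted in the interview

old_bw = link_bandwidth_gb_per_s(shanghai_rate)
new_bw = link_bandwidth_gb_per_s(dca2_rate)
speedup = dca2_rate / shanghai_rate

print(f"{shanghai_rate} GT/s -> {old_bw:.1f} GB/s per direction")
print(f"{dca2_rate} GT/s -> {new_bw:.1f} GB/s per direction")
print(f"Relative increase: {speedup:.2f}x")
```

Whatever the actual link width, the ratio between the two rates is the same: the faster link moves about a third more data per second.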

Q17: We have seen that 8-core and 12-core AMD architecture is targeted on server platforms. However, is it optimized for home applications like multimedia and gaming?

Typically, servers receive new technologies first, before they are brought into home use. Well, if someone wants to use a 12-core Magny Cours processor on their desktop for gaming, they can. There’s no problem, but the laws of economics do not work there; the cost and power consumption will not be feasible for gaming. The Opteron Series is therefore for enterprise markets, based on the demands and needs of data centres and servers.

Q18: What is your dream gaming PC configuration?

A gaming PC, well that’s our strength – graphics are our strength. What I use for gaming in my lab is the AMD Phenom II X4 965 Black Edition which I have overclocked to 4.2GHz from 3.4GHz, and I am really happy with its performance. What else is on the machine is our flagship product, the Evergreen HD 5970, which I have applied in a Crossfire mode and Eyefinity, and the power supply is close to 1500 watts. So, I do have the fastest gaming machine in India to be precise, and I really enjoy it.

Two of these monstrous HD 5970 cards run in Crossfire on Vamsi Krishna’s killer rig

(with inputs from Vinod Yalburgi)
