Qualcomm’s Alex and Kedar Discuss AI Innovations and Snapdragon X Elite in Microsoft Partnership

Updated on 25-Apr-2024

In this interview, we speak with Alex Katouzian, Group General Manager, Mobile, Compute & XR (MCX) at Qualcomm, and Kedar Kondap, SVP and GM, Compute and Gaming at Qualcomm, about their collaboration with Microsoft for the upcoming Windows release. They share insights into how the partnership is driving advancements in AI and the intelligent capabilities of the Snapdragon X Elite.

Digit – Regarding your collaboration with Microsoft on the next release of Windows, what excites you the most? Because we know there’s going to be a lot of AI play, and this new release is also coming out around the same time. Is there something that you’re really excited about?

Kedar – So you know, for us, with what we’ve announced with the Snapdragon X Elite, we focused on a couple of different vectors, right? First, we want to make sure that it is the best performing part in the market by far. Second, we make sure that it’s the most efficient platform that you can find in this space. And third, the most important vector (and the one that seems to be getting more and more attention) is that it’s the most intelligent processor. So a lot of the work that we’re doing is on making sure we can drive that intelligence factor into something that consumers can understand. And all the work that we’re doing shows up in what we’ve announced, even in the specs; for example, we announced 45 TOPS. And the reason is that we’ve invested in AI for the last 10 years and we want to bring all the work that we’ve done and drive that into the Snapdragon X Elite. So the goal for us, in working with Microsoft, is to drive a lot of use cases that will drive that differentiation. I think you’ve seen a lot of what they’ve announced, starting almost two years back when they announced Windows Studio Effects, where they started to take advantage of a lot of the capabilities that we offer. So it’s not just on the Snapdragon X Elite, but even on the previous generations of the platforms that Qualcomm announced. It started with simple things, right? There’s background blur, bokeh, eye gaze correction, etc. A lot of these features seem natural to a user, especially during the last couple of years when we were all on video calls. Then there’s the feature which automatically recognises voices around you and suppresses noise. But the point is, with a Qualcomm Snapdragon based platform, we were able to run it on an NPU. What that means is that it translates into substantially lower power, because when you’re on a video call we run in a heterogeneous system, so a lot of the use cases that Microsoft has, for example the Windows Studio Effects, we run in a much lower power state. So there’s a lot of work happening and we’re super excited about the partnership with Microsoft.

Qualcomm Snapdragon X Elite

Digit – Now, quite a lot of features within Windows Studio Effects tend to be ‘good to have’ rather than being game changers. For example, let’s say you have a big office with a lot of folks working along a particular process line, and you have the opportunity to apply automation. Right now, all those automation tools are on the cloud or on a local server, but is there some way or some use case where you see potential for the NPU to accelerate workloads there?

Kedar – I think there isn’t a single use case where the NPU will not shine or show its benefits, right? Because AI is going to be pervasive, there is no doubt about it. Whether you look at things like manufacturing, operations, health care, engineering, etc., you name the industry, and it’s going to start to become more prevalent. The point is, with Qualcomm, we’ve always had a very heterogeneous architecture with our platforms. What that means is we have dedicated blocks for a lot of the cores like it’s not just a CPU and GPU type platform. You have a CPU, a GPU, an NPU, you have an audio core, a video core, a camera core and so on. For example, if you’re running a video-related task, we have a dedicated video call where it’s going to do a lot of the offloading. Then we have for example, our audio block has a small little NPU inside that block, and so when you talk about noise suppression and a lot of these use cases, they run on these dedicated NPUs. So, it just depends on the use cases.  

Even at Snapdragon Summit, where we announced the Snapdragon X Elite, we had several ISVs showcasing the benefits of what happens when you port use cases to Snapdragon. One of them, for example, was super resolution. Another was photo editing, shown with Luminar Neo in partnership with us. If you have a picture that is low in resolution and you want to optimise that picture, it does so in a matter of seconds and at significantly lower power relative to anything that you’d see from the competition.

And so, it’s just the beginning. I think you’ll see almost every single use case getting optimised on the NPU, and obviously what that means is better performance and lower power, which translates into better battery life.

Digit – And based on the kind of use cases that you see growing, do you feel like you might have to go back to the design table for your next iteration and change the amount of compute given to, let’s say, the part of the SoC that’s handling audio, or maybe the part that handles floating point or integer instructions, because that’s usually a balance that keeps changing? Or have you had requirements coming from one of your clients who believes they have a certain number of laptops that sell for a particular use case, and that they’d like a bit more acceleration in that direction?

Alex – Well, I think what Kedar covered is the basis of the user interface to the device actually changing. As you can see, all of the functions that we just talked about are running in the background. Now with LLMs and LVMs, visual models and Gen AI, the user interface to the device is going to change, and so the expectation from Microsoft is that they’re going to try to get the user from launching an app or typing or doing a search to actually talking to the machine, and having it do things for you so it can save time. And that whole concept, when the new OS comes out, is going to be compiled and optimised on top of Snapdragon to begin with. And it’s going to utilise the entire TOPS performance that we have on the NPU. And not only that, heterogeneously you can run stuff on the GPU or the CPU, and you can share among all of these cores and then you get the best performance out of it. Of course, with the next iteration of our devices, we’re going to increase those types of performance, because the more hardware acceleration you make available, the more the software and the OS and the user experience are going to start to use it. And so now models are just on the device, actually running in the background, and they’re contextually aware of your screen. You can recall things, you can draw things and it’ll generate an image for you, and the reason why the NPU is important is because it goes through those accelerations relatively quickly at a lot lower power, and those functions can now be used over and over without draining your battery. So, forget the thin-and-light. All that stuff is table stakes. If you don’t have that, you don’t have a modern PC. But then if you don’t have the AI functions we do, you don’t have a modern AI PC. And so that’s what everyone’s excited about: how you interface with the machine is going to change, how it saves time for you is going to change, how it takes notes for you or completes your sentences or summarises a meeting or starts a presentation and maybe builds a Word file. All those things can get done relatively easily with what we have to offer.

Digit – Now back in October, when you announced the Snapdragon X Elite, the performance stated was 45 TOPS for the NPU itself. Since there’s going to be this eight-month gap from the time it was announced to the time it’s being deployed, and considering optimization along the power curve and all those things, are you looking at a number higher than 45 by the time the silicon goes into production?

Alex – This generation is 45. 

Kedar – This generation obviously is 45 TOPS, but one of the points you hinted at earlier is that the work happening on AI is changing drastically. There’s a new model that comes out every day. And the scale that Qualcomm provides is the ability to offer, think of it as a horizontal layer, where you have all of these different AI models that are rapidly coming out, whether it’s Llama or any of the others that you know. A lot of the work that we do is being able to interface those models with any of the use cases that you have. So there’s the translation work that happens, whether it’s quantization or taking a model and translating it into the particular use case that we want. So the way we work with these ISVs is making that translation. What essentially happens right now is utilising the heterogeneous architecture more effectively: take the model, run it effectively, decide what’s the right place to run it, be it the CPU, GPU or NPU, and then modify it. So that’s the significant work that will happen between now and launch.

Alex – And memory optimisation. So you want models to actually reside on the device, and the beauty of it is that if you don’t have any connection to the cloud, you can still run all your Gen AI stuff on the device. And then if you’re connected, you use a hybrid approach: you run some things on the device with the processing power here, and some things on the cloud. Also look at it the other way: the cloud will offload to the device so that cost doesn’t run up in your data center to a point where it’s not manageable. Because if the user interface changes and people start using these added functions, just like everyone’s expecting them to, then the queries going back to the cloud would be so high that you run out of bandwidth and you run out of performance capability; it will take a longer time to come back. You want some privacy, you want personalisation, you want immediacy, so you run things on the device. We started with a 1 billion parameter Stable Diffusion Gen AI model. We ran it on the phone. Now you can draw what you want to see and it generates it; we ran it in 20 seconds before, and now it’s running in less than one second, and then you super-res (upscale) it to the resolution that you want. Now we have 7 billion parameter models running on the handset and PCs running 30 billion parameter models. So we can optimise these things down to the memory that’s contained within the device and then start running better and better AI functions.
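
(To make the quantization step Kedar mentions a little more concrete, here is a minimal sketch using ONNX Runtime’s dynamic quantization utilities, which shrink a full-precision model’s weights to 8-bit so it fits more comfortably in on-device memory. The file names are hypothetical placeholders and this is illustrative only, not Qualcomm’s actual porting flow.)

    # Illustrative only: convert an FP32 ONNX model to one with INT8 weights
    # so it takes up less memory on the device. File names are hypothetical.
    from onnxruntime.quantization import quantize_dynamic, QuantType

    quantize_dynamic(
        model_input="assistant_fp32.onnx",   # original full-precision model
        model_output="assistant_int8.onnx",  # quantized model for on-device use
        weight_type=QuantType.QInt8,         # store weights as 8-bit integers
    )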

Digit – So are you collaborating with platforms like Hugging Face on which a lot of people submit their own models and then you’ll have your official variants as well?  

Alex – We’re collecting models for use cases on our solutions that people can pick and choose from. They can also go to Hugging Face and get those same models from there, already optimised for our solution. Hugging Face, Llama, Baichuan in China, we work with every one of them; we’re kind of agnostic towards that. Plus runtimes such as ONNX, PyTorch, Gemini, etc., we work with all of them. And what we do is route them to our API interfaces that go directly down to our NPU for the best acceleration. So we’ve tested so many models, and we’re going to continue doing that and make them available for our customers.
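
(As a rough sketch of what “routing a model down to the NPU” can look like from a developer’s side, the snippet below uses ONNX Runtime’s QNN execution provider, which targets Qualcomm NPUs on Windows on Snapdragon, with a CPU fallback. The model path is hypothetical and provider option names can vary between ONNX Runtime versions.)

    # Illustrative only: ask ONNX Runtime to schedule the model on the Qualcomm
    # NPU via the QNN execution provider, falling back to the CPU for any
    # operators the NPU backend cannot handle.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(
        "assistant_int8.onnx",  # hypothetical model file from the earlier step
        providers=["QNNExecutionProvider", "CPUExecutionProvider"],
        provider_options=[{"backend_path": "QnnHtp.dll"}, {}],
    )

    # Run a dummy input shaped like whatever the model's first input expects.
    inp = session.get_inputs()[0]
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    outputs = session.run(None, {inp.name: np.zeros(shape, dtype=np.float32)})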

Digit – So are there some Git repositories where developers can go and pick these optimized models, or do they have to reach out to Qualcomm separately?

Alex – Both. 

Digit – Regarding developers, there needs to be a significant effort involved in getting more and more developers to build for ARM, because that’s a challenge. And if you don’t have native apps then you have to rely on compatibility layers, and those sometimes work and sometimes don’t. So, what kind of efforts are you taking to help developers out?

Kedar – Almost two years back, we, with Microsoft and Satya, announced this development platform called Volterra, a Snapdragon based platform that we’ve been handing out to developers for a couple of years now. Think of it as a three-step approach. One, you want the app to run on the device. That means, even if it’s emulated, we make sure with Microsoft that the emulator performance is excellent, so at least it works flawlessly. The second step is to port the app to native, and we’ve already got a large set of applications that we’ve publicly talked about. We can get you the list of all these applications that have been ported. Things like Amazon Prime, the top 100-300 apps that you see, a lot of them are already native. And then the third step is where a lot of these apps take advantage of the different cores, because that’s where you’ll start to see the benefit of the architecture. Folks like Adobe, who we’ve talked about, were on stage with us. You have people like Amazon. There are many, many ISVs, hundreds of ISVs, people like Camo, people like Beethoven, etc. So lots of these ISVs have actually gone to step three and optimised. So we’ve got hundreds of ISVs now running on our NPU. We have several other initiatives like this. We have our own device cloud setup where we have all of these devices, and ISVs or even individual developers can remotely log in. We give them full access to SDKs, APIs, all of the documentation, etc. So we feel pretty good about where we are. And from an ecosystem perspective, we’ve been doing a lot of partnership events to make sure that the ecosystem is ready at launch.

Alex – Microsoft’s been super supportive of us as well, making sure backwards compatibility is there. The emulation platform is actually running better than emulation on other devices. So I think we’re in a pretty good place. Of course, the work will continue with bringing on more and more ISVs, making sure that they can take advantage of what we have, making those available to our OEMs and partners to make that happen, and just continuing to work that way.

Mithun Mohandas

Mithun Mohandas is an Indian technology journalist with 10 years of experience covering consumer technology. He is currently employed at Digit in the capacity of a Managing Editor. Mithun has a background in Computer Engineering and was an active member of the IEEE during his college days. He has a penchant for digging deep into unravelling what makes a device tick. If there's a transistor in it, Mithun's probably going to rip it apart till he finds it. At Digit, he covers processors, graphics cards, storage media, displays and networking devices aside from anything developer related. As an avid PC gamer, he prefers RTS and FPS titles, and can be quite competitive in a race to the finish line. He only gets consoles for the exclusives. He can be seen playing Valorant, World of Tanks, HITMAN and the occasional Age of Empires or being the voice behind hundreds of Digit videos.
