NVIDIA DGX Spark
The NVIDIA DGX Spark might look like a normal Ubuntu-based mini PC, but it is far from one. It has the footprint of a compact desktop, sits quietly on a desk, plugs into a monitor and can technically be used like a small workstation. But treating it as just another mini PC would miss the point. DGX Spark is closer to a local AI lab in a small metal box, built for developers and researchers who want to run large models locally without immediately turning to the cloud.
That makes it an unusual product to review. A typical desktop review would begin with CPU performance, GPU performance, thermals, noise and pricing. DGX Spark needs all of that, but it also needs a different question: what can a capable user actually build with it?
The unit tested here was the NVIDIA Founder’s Edition, loaned to us by NVIDIA. In India, it retails for Rs. 5,04,990, which immediately places it outside the realm of casual experimentation for most consumers. Yet, in AI hardware terms, the price is not as absurd as it first looks. The reason is memory. There are plenty of faster GPUs, and there are cheaper desktops, but cheap consumer GPUs do not offer 128 GB of unified memory for local AI workloads. Trying to build a multi-GPU desktop with that kind of effective capacity would quickly become more expensive, more power-hungry and more cumbersome.
That is the interesting contradiction at the centre of DGX Spark. It is expensive for a desktop PC, but not outrageous for a local AI development box. It feels finished as hardware, but much of the software journey still has a developer-kit flavour. It is not something most people should buy, but it is one of the clearer previews yet of where local AI computing is headed.
DGX Spark is built around NVIDIA’s GB10 Grace Blackwell superchip. It combines a 20-core Arm CPU with a Blackwell GPU architecture, fifth-generation Tensor Cores and 128 GB of coherent unified memory. NVIDIA positions it as a personal AI supercomputer capable of up to 1 PFLOP of FP4 AI performance, with support for AI models up to 200 billion parameters on a single unit.
The headline specification here is not just the compute number, though. It is the memory architecture. DGX Spark’s 128 GB coherent unified system memory gives the CPU and GPU access to a common memory pool. This is one of the key metrics for hosting large language models because the limiting factor on a consumer desktop is often not raw shader performance, but whether the model and context can fit cleanly into memory without awkward compromises.
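As a rough, back-of-the-envelope illustration of why capacity matters, the sketch below estimates the memory needed for model weights alone. The helper name `weight_footprint_gb` and the bits-per-weight values are our own assumptions for illustration; real quantised files mix precisions and carry metadata, and the KV cache and runtime need room on top.

```python
def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


UNIFIED_MEMORY_GB = 128  # DGX Spark's unified memory pool, from the spec sheet

# Illustrative cases only; the bit widths are assumptions, not measured files.
for label, params, bits in [
    ("200B at 4-bit", 200e9, 4),
    ("70B at 16-bit", 70e9, 16),
    ("70B at 4-bit", 70e9, 4),
]:
    size = weight_footprint_gb(params, bits)
    verdict = "fits" if size < UNIFIED_MEMORY_GB else "does not fit"
    print(f"{label}: ~{size:.0f} GB of weights, {verdict} in {UNIFIED_MEMORY_GB} GB")
```

By this estimate, a 200-billion-parameter model at 4 bits per weight needs about 100 GB for weights, which is consistent with NVIDIA’s 200B claim for a 128 GB machine, while a 70B model at 16-bit precision would not fit at all.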
| Specification | NVIDIA DGX Spark Founder’s Edition |
| --- | --- |
| Processor | NVIDIA GB10 Grace Blackwell superchip |
| CPU | 20-core Arm CPU |
| AI performance | Up to 1 PFLOP FP4 |
| Memory | 128 GB coherent unified system memory |
| Memory bandwidth | 273 GB/s |
| Storage | Up to 4 TB |
| Networking | ConnectX-7, 10 GbE, Wi-Fi 7 |
| Expansion and I/O | Four USB Type-C ports, HDMI, Ethernet, ConnectX-7 port |
| Dimensions | 150 x 150 x 50.5 mm |
| India price | Rs. 5,04,990 |
There are several similarly capable GB10 units available in the market. The ASUS Ascent GX10, GIGABYTE AI TOP ATOM, Lenovo ThinkStation PGX and Dell Pro Max (GB10) are some of the units selling in India, although some of these are priced much higher for no compute advantage over the vanilla DGX Spark.
The I/O on the back can seem a bit minimalist. There is a power button, four USB Type-C ports, HDMI, Ethernet and the ConnectX-7 port, which allows DGX Spark units to be connected together. The first Type-C port is used for power input, so the usable port count will depend on how the system is set up.
For regular desktop use, DGX Spark has enough connectivity. For its actual role, the ConnectX-7 port is the more interesting bit. DGX Spark is not only meant to be used as a single compact AI box, but also as part of a small local AI setup where four DGX units can be linked for larger workloads. Each ConnectX-7 port is capable of up to 400 Gb/s bandwidth.
DGX Spark has the kind of physical finish one expects from NVIDIA’s Founder’s Edition products. The chassis feels dense, premium and tightly engineered. It does not have the parts-bin feel of a typical small-form-factor desktop. It looks and feels like a hefty purpose-built device.
We took the unit apart partially. The bottom rubberised plate is held in place by strong magnets. Removing it exposes the Wi-Fi antennae. Beneath that, screws hold the bottom metal plate in place. Once removed, the internal structure reveals a large metal shield with a cutout for SSD access.
This is also where the limits of user serviceability become clear. The SSD is accessible, so storage is the one practical upgrade or service point. Beyond that, the system becomes much less inviting. The metal shield, PCB and heatsink are tightly integrated, and going further would mean disturbing thermal pads and risking the carefully assembled cooling interface. So we did not go for a full teardown and stopped at the heatsink-visible stage. A deeper teardown would have offered better photos, but also increased the chance of affecting thermals.
Suffice to say, this is not a tinker-friendly desktop built around modular consumer parts. If that’s what you want, buy a couple of GPUs and throw them into a server. The DGX Spark is a tightly packed AI appliance with one user-accessible component. Pulling out the whole assembly after removing the bottom plate felt risky, and that is usually the point at which most users should stop.
Thermal behaviour was impressive. Under load, the top surface reached around 41.4 degrees Celsius, while the rear exhaust area climbed to around 52.6 degrees Celsius. Peak wall power draw was around 220 W. More importantly, noise was negligible. DGX Spark remained very quiet even when running heavy workloads, which is a significant advantage over many high-end desktop AI setups. We’ve set up desktop PCs with four graphics cards running at full load and we’ve been on the business side of a GPU render farm inside a data centre, and the noise in either case is ridiculous. So for a device intended to live on a desk and run long local experiments, the DGX Spark is very well designed.
If you start reading up about the DGX Spark, you’ll often come across comparisons against the Apple Mac mini, which offers a significant advantage over typical PC desktops thanks to its unified memory. For our testing, we decided to borrow one of the Mac Studio machines that our designers use. The DGX Spark was therefore compared against a Mac Studio with the M2 Ultra and 64 GB of unified memory, which was the highest-capacity Mac Studio with the most recent Apple silicon that we could find. We’re well aware of the M3 Ultra-based machines available in the market, but Apple rarely sends those out to us. That being said, the 64 GB configuration also introduces a hard practical ceiling that becomes very relevant once 70B-class models enter the picture.
At a glance, the M2 Ultra has a significant memory bandwidth advantage. Its unified memory bandwidth is rated at 800 GB/s, while the DGX Spark’s 128 GB coherent unified memory runs at 273 GB/s. For LLM decode performance, memory bandwidth matters a lot because token generation is largely bound by how quickly model weights can be streamed through memory. A 70B model quantised to Q4_K_M occupies roughly 42.5 GB, which means the M2 Ultra’s theoretical ceiling works out to around 18.8 tokens per second.
In real-world usage, local inference runtimes rarely hit the full theoretical limit. Framework overhead, scheduling, memory access patterns and runtime behaviour usually bring sustained performance down to around 70 to 80 percent of that ceiling. That places the M2 Ultra 64 GB configuration in the region of 13 to 15 tokens per second for a 70B Q4 model, which is about what we experienced.
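The bandwidth arithmetic above can be written out as a short sketch. It uses only the figures quoted in this review (800 GB/s, a 42.5 GB model, 70 to 80 percent sustained efficiency); the function names are ours, and the estimate assumes a dense model in which every weight is read once per generated token.

```python
from typing import Tuple


def decode_ceiling_tok_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed if each token streams all weights once."""
    return bandwidth_gb_s / model_size_gb


def sustained_range(ceiling: float, low_eff: float = 0.70,
                    high_eff: float = 0.80) -> Tuple[float, float]:
    """Apply the rough 70-80 percent real-world efficiency band to a ceiling."""
    return ceiling * low_eff, ceiling * high_eff


M2_ULTRA_BANDWIDTH_GB_S = 800.0  # rated unified memory bandwidth
MODEL_70B_Q4_GB = 42.5           # approximate size of a 70B Q4_K_M model

ceiling = decode_ceiling_tok_s(M2_ULTRA_BANDWIDTH_GB_S, MODEL_70B_Q4_GB)
low, high = sustained_range(ceiling)
print(f"Theoretical ceiling: {ceiling:.1f} tok/s")
print(f"Expected sustained:  {low:.1f} to {high:.1f} tok/s")
```

The same two functions can be pointed at any other bandwidth and model size, with the caveat that the result is only a ceiling and says nothing about prompt processing.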
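The DGX Spark, in contrast, delivered 37.76 tokens per second in the GPT-OSS 120B decode benchmark. Remember, this comes from a machine with lower raw memory bandwidth than the M2 Ultra, but with a platform built specifically around NVIDIA’s AI software stack and 128 GB of coherent unified memory. It is also worth noting that GPT-OSS 120B is a mixture-of-experts model that activates only a fraction of its parameters for each token, so far less data is streamed per token than with a dense 70B model, and the two figures are not a like-for-like comparison. It is not simply a case of comparing bandwidth figures in isolation. DGX Spark has more breathing room for larger models, larger contexts and NVIDIA-accelerated workflows.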
| System | Unified memory | Memory bandwidth | Decode performance |
| --- | --- | --- | --- |
| NVIDIA DGX Spark | 128 GB | 273 GB/s | 37.76 tok/s in GPT-OSS 120B |
| Apple Mac Studio M2 Ultra | 64 GB | 800 GB/s | Around 13 to 15 tok/s for 70B Q4-class models |
On a 64 GB machine, macOS does not make the full memory pool available to a single local inference process. In practice, the usable allocation for a large model often sits around the 48 GB to 51 GB range, leaving a 70B Q4 model very close to the system’s comfortable operating limit. That affects how much context can be used, how much room remains for the runtime, and how reliably larger models can be pushed without compromise.
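To make the headroom point concrete, here is a minimal sketch that estimates how many tokens of context fit in whatever is left after the weights are loaded. The attention layout (80 layers, 8 key/value heads, head dimension 128) is an assumption modelled on a Llama-3-70B-style architecture, the cache is assumed to be 16-bit, and runtime overhead is ignored, so treat the output as an optimistic upper bound rather than a measurement.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """Bytes of key and value cache stored per token across all layers."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def max_context_tokens(usable_gb: float, model_gb: float,
                       per_token_bytes: int) -> int:
    """Tokens of KV cache that fit in the memory left over after the weights."""
    headroom_bytes = (usable_gb - model_gb) * 1e9
    return max(0, int(headroom_bytes // per_token_bytes))


# Assumed Llama-3-70B-style layout; not taken from the model tested here.
PER_TOKEN = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128)
MODEL_GB = 42.5  # 70B Q4_K_M size quoted in this review

for usable_gb in (48, 51):  # usable allocation range quoted above
    tokens = max_context_tokens(usable_gb, MODEL_GB, PER_TOKEN)
    print(f"{usable_gb} GB usable: room for roughly {tokens} tokens of context")
```

Under these assumptions the cache costs about 320 KiB per token, so a few gigabytes of headroom translates to a context in the tens of thousands of tokens at best, before the runtime takes its share.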
Turning the tables, prompt processing behaves differently. Unlike decode, which is heavily memory-bandwidth bound, prefill is more compute-sensitive. The M2 Ultra can process prompts at a much higher rate than it generates tokens, with 70B-class workloads typically sitting around the 220 tokens per second region and smaller 32B-class models moving closer to 410 tokens per second in favourable conditions. That means the Mac Studio can still feel responsive when ingesting a prompt, even though sustained generation on large models is much slower than the DGX Spark result seen here.
The Mac Studio M2 Ultra remains a strong local AI machine for developers already inside the Apple ecosystem, especially for models that fit comfortably within the available memory. It is quiet, efficient and supported by a growing local inference community. But the 64 GB SKU is also operating near its practical ceiling with 70B Q4 models.
DGX Spark feels more purpose-built for this class of work. Its 128 GB memory pool gives it more room for larger models and heavier local AI experiments, while the NVIDIA stack makes it better suited to workflows involving Riva, NIMs, TensorRT, local LLMs and multi-stage AI pipelines. At the end of the day, your experience is not dependent only on the tokens per second metric. It is also about how much model headroom the system gives developers before they run into the wall.
Neither system showed signs of throttling during repeated runs. DGX Spark’s sustained behaviour remained stable, and the system stayed surprisingly quiet under load. That matters because long local AI sessions are not like short synthetic benchmarks. Developers may leave models running, iterate through workflows, test agents, or repeatedly restart containers and services. In that kind of usage, stability and thermals are just as important as a peak number.
The Mac Studio M2 Ultra still makes sense as a general-purpose creative and local AI machine. But DGX Spark is more focused. It is not trying to be the better all-round desktop. It is trying to be a compact NVIDIA-native AI development system with enough memory to make serious local experimentation practical. On that front, it makes a much stronger case than the raw bandwidth numbers might suggest.
Up to this point, the DGX Spark’s hardware has felt polished, but the software journey is what matters more. This is NVIDIA’s home ground, so we expected a frictionless experience. However, having worked with Ubuntu since its early days, we know all too well that a developer’s journey is anything but straightforward. NVIDIA knows that as well. So to make things easier, NVIDIA has been publishing playbooks and recipes through its Build site, and those are genuinely useful. They make the system feel less like a bare metal science project and more like a guided development platform. But like we said, this is still Linux. Anyone who has spent time getting AI toolchains running locally will recognise the rhythm: install, configure, wait, troubleshoot, fix a dependency, retry a download, point one service to another, then discover that something has changed upstream.
That does not mean the experience is bad. It means the user needs the right temperament. DGX Spark is not a one-click AI appliance for people who simply want a polished assistant. It rewards patience, and it assumes at least some comfort with Linux, containers, models, endpoints and local service orchestration.
In our experience, the system itself was stable. There were no random crashes or unexplained hardware issues. The friction came from the process of setting up tools, downloading models and making different components talk to each other.
One of the more interesting experiments we ran involved setting up NemoClaw as a local assistant. NVIDIA’s Build recipes made the basic setup relatively approachable, especially for a developer or enthusiast willing to follow instructions. The Playbook that we used, called “Run NemoClaw with a local LLM”, was helpful in getting started. If you’re reading this after 2026, there’s a good possibility that the Playbook has been updated or renamed to keep up with ongoing developments.
While running the command was simpler than buttering bread, the initial setup was not the problem. The problem was everything around model acquisition and reliability.
The model download was routed through official sources, and the servers were painfully inconsistent. A Qwen 35B-class model would begin downloading at excellent speeds, then slow to 2-7 KB/s halfway through. Since the final file was around 23 GB, getting stuck after 11 GB meant a ridiculous time sink. Download failures made it worse, as the process often had to be reinitialised. Ollama does allow you to continue paused or failed downloads, but that didn’t work here.
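To put that slowdown in perspective, the sketch below works out how long the remaining half of the download would take if the crawl never recovered. It uses the figures quoted above (a 23 GB file, 11 GB done, 2 to 7 KB/s) and assumes decimal units and a constant speed; the function name is ours.

```python
def eta_days(total_gb: float, done_gb: float, speed_kb_s: float) -> float:
    """Days needed to finish a download at a constant speed (decimal units)."""
    remaining_bytes = (total_gb - done_gb) * 1e9
    seconds = remaining_bytes / (speed_kb_s * 1e3)
    return seconds / 86_400  # seconds in a day


for speed in (7, 2):  # best and worst of the observed crawl, in KB/s
    print(f"At {speed} KB/s: about {eta_days(23, 11, speed):.0f} days to finish")
```

At those speeds the remaining 12 GB would take weeks rather than hours, which is why restarting or switching to another download path was the only practical option.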
The eventual workaround was more manual: use Ollama to download the model directly, keep an eye on the process, then point NemoClaw to the locally installed model. Once that was done, things moved along.
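Before pointing anything at a locally installed model, it helps to confirm that Ollama actually has it. The sketch below is one hedged way to do that using Ollama’s local HTTP API, which lists installed models at `/api/tags` on its default port 11434. The model tag in the usage note is a hypothetical placeholder, and how NemoClaw itself is pointed at the model is not shown here because it depends on the Playbook in use.

```python
import json
import urllib.request
from typing import Any, Dict, List

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # Ollama's default local endpoint


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> Dict[str, Any]:
    """Ask the local Ollama server for its list of installed models."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def installed_models(tags_response: Dict[str, Any]) -> List[str]:
    """Extract model names such as 'family:tag' from an /api/tags response."""
    return [m["name"] for m in tags_response.get("models", [])]


def is_installed(tags_response: Dict[str, Any], wanted: str) -> bool:
    """True if `wanted` matches a full name or the family part before the colon."""
    return any(
        name == wanted or name.split(":")[0] == wanted
        for name in installed_models(tags_response)
    )
```

With the Ollama service running, `is_installed(fetch_tags(), "example-model:35b")` returns `True` only once the pull has completed, where `example-model:35b` stands in for whichever tag was actually downloaded.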
As an assistant, NemoClaw could go through the material that was shared with it, but the sandboxing created limits. When asked to write a DGX Spark review as a fun experiment, the result was not good enough. The agent could not research freely, and the output felt closer to the early days of ChatGPT, appearing slowly, almost word by word. Some of that was down to token generation speed and the model being used, but the larger issue was that a sandboxed local agent is only as useful as its access, tooling and workflow design allow it to be.
This made the experiment more valuable, not less. It showed what DGX Spark can enable, but it also showed that “local AI agent” is not magic. The hardware can run the workload. The developer still has to design the workflow, solve the plumbing and decide what level of access the agent should have.
That is probably the right way to understand DGX Spark. It’s not a magical gizmo that removes the hard parts of building AI tools. Rather, it gives developers a local machine powerful enough to confront those hard parts with enough compute power.
Obviously, we wanted to see if the download issue was persistent. So we reset the DGX Spark back to factory conditions and ran the playbook again. This time around, the download was again fast for the first half and slowed to a crawl during the second half, but we didn’t see a download failure, and the playbook ran perfectly. If this behaviour feels a little mercurial, then this is what working with a constantly evolving development tech stack feels like. This is completely normal and very unlike running typical Windows software, which functions the same way for years on end.
The second experiment was more ambitious: it involved setting up a real-time, low-latency audio intelligence pipeline. The intended version was a high-fidelity digital human or interactive spatial avatar on screen, something that could be interrupted mid-sentence and respond like a natural phone call. The planned stack combined NVIDIA Riva for local accelerated speech recognition and text-to-speech, a quantised large language model running locally on Grace Blackwell, and NVIDIA Omniverse Audio2Face for animating a 3D mesh using the generated speech.
This Iron-Man-esque Jarvis-like implementation is exactly the kind of workload that makes DGX Spark interesting. It combines multiple neural pipelines into a single loop: audio to text, text to reasoning, reasoning to speech, and speech to facial animation. This is not a simple chatbot. It is a multi-stage local AI system where latency, memory bandwidth and software orchestration all matter.
The full avatar system did not come together because the Audio2Face endpoint was deprecated from the NIM being used. As a result, the final polished version with a spatial avatar was not completed. What did work was the core audio loop: speech was captured using a USB headset microphone, converted to text, passed to a local LLM, then converted back into speech.
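The shape of that core loop can be sketched in a few lines. This is not the code we ran and it does not call Riva or any real model; the three stages are passed in as plain functions, and the stand-ins at the bottom are hypothetical placeholders so the structure can be exercised without any hardware. The per-stage timers show one way the latency we did not measure could be captured.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class VoiceLoop:
    """One speech turn: audio in -> text -> reply text -> audio out."""
    transcribe: Callable[[bytes], str]   # speech recognition stage
    generate: Callable[[str], str]       # local LLM stage
    synthesise: Callable[[str], bytes]   # text-to-speech stage
    timings: Dict[str, float] = field(default_factory=dict)

    def _timed(self, name: str, stage: Callable[[Any], Any], arg: Any) -> Any:
        """Run one stage and record how long it took, in seconds."""
        start = time.perf_counter()
        result = stage(arg)
        self.timings[name] = time.perf_counter() - start
        return result

    def run_turn(self, audio_in: bytes) -> bytes:
        text = self._timed("asr", self.transcribe, audio_in)
        reply = self._timed("llm", self.generate, text)
        return self._timed("tts", self.synthesise, reply)


# Hypothetical stand-ins so the loop can run without any real services.
loop = VoiceLoop(
    transcribe=lambda audio: audio.decode("utf-8"),
    generate=lambda prompt: f"You said: {prompt}",
    synthesise=lambda reply: reply.encode("utf-8"),
)
audio_out = loop.run_turn(b"hello")
print(audio_out, loop.timings)
```

Keeping each stage behind a plain callable means a broken or deprecated endpoint can be swapped out without touching the rest of the loop, which is roughly the situation we found ourselves in.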
Was this a slightly convoluted way to reproduce things that ready-made solutions already do? Yes. But that was the point. What makes the DGX Spark interesting is that it lets a developer build the pieces of a voice assistant locally, see how the parts fit together, and understand where the friction lives. The same holds for any other application built around an AI pipeline.
Qwen 35B worked for the pipeline, though it came with a noticeable memory footprint. We did not measure end-to-end latency, since we were engrossed in getting the entire pipeline to work properly. Barge-in was not implemented during testing either, so we had to wait for the app to stop speaking before we could give it another audio prompt. This experiment should therefore not be presented as a finished real-time avatar demo. Nevertheless, it was great fun working towards a working prototype. NVIDIA has released Audio2Face as open source, so with a few more days, we could have got the entire app working as initially planned. It was very satisfying to put together a local AI pipeline, watch it fail, fix it, then watch it finally respond. Even in its unfinished state, the experiment made DGX Spark’s purpose clearer than a benchmark chart ever could.
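For readers wondering what barge-in would involve, one common approach is to play the reply in small chunks and check an interrupt flag between them. The sketch below illustrates only that idea; it is not something we implemented, the names are hypothetical, and a real system would set the flag from a microphone listener running on another thread.

```python
import threading
from typing import Callable, Iterable


def speak_with_barge_in(chunks: Iterable[bytes],
                        play_chunk: Callable[[bytes], None],
                        interrupted: threading.Event) -> int:
    """Play audio chunk by chunk, stopping as soon as the flag is set.

    Returns the number of chunks that were actually played.
    """
    played = 0
    for chunk in chunks:
        if interrupted.is_set():  # the user started talking; stop speaking
            break
        play_chunk(chunk)
        played += 1
    return played
```

The smaller the chunks, the faster the assistant falls silent when interrupted, at the cost of more frequent checks and more careful audio buffering.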
The biggest strength of DGX Spark is that it puts a serious local AI environment into a compact, quiet and power-efficient form factor. It does not need a noisy tower, multiple large GPUs or a dedicated lab corner. It sits on a desk and behaves like a polished piece of hardware. The 128 GB unified memory pool is the real attraction. For developers experimenting with larger models, memory capacity often becomes the wall. DGX Spark gives them enough room to work with models that would be impractical on ordinary consumer desktops.
The NVIDIA ecosystem is another advantage. Riva, NIMs, CUDA, TensorRT, NeMo-related workflows and Omniverse components all make more sense on a machine like this than on a random collection of parts. The system’s silence also deserves credit. Running local AI workloads on high-end desktop hardware often means accepting fan noise, heat and power draw as part of the deal. DGX Spark remained quiet, with manageable surface and exhaust temperatures. For long experimentation sessions, that makes a difference.
The biggest limitation is the typical friction involved in developing applications. If you’ve never developed apps before and believe that the DGX Spark is going to turn you into a world-class software developer, then there’s something to be said about having delusions of grandeur. You should spend some time learning to code and building a few apps to find your way around APIs, frameworks and the typical tools. That matters all the more because the DGX Spark is expensive at Rs. 5,04,990, which immediately narrows its audience. For simple experimentation, many developers may prefer a cheaper machine, cloud credits or a Mac mini-class system, accepting slower performance in exchange for a much lower entry cost. At the same time, DGX Spark’s value proposition becomes stronger when one compares it with a multi-GPU desktop built for large local models. A setup with several RTX 4090 or RTX 5090-class cards would cost far more, consume more power and be harder to manage.
The second limitation is Linux comfort. This is not a system for users who expect consumer app polish. NVIDIA’s playbooks are helpful, but users still need patience. Downloads can fail. Model setup can be messy. Sandboxed agents have practical limits. Deprecated endpoints can break a planned workflow. None of this is unusual in local AI development, but it does make DGX Spark unsuitable for anyone expecting a turnkey assistant box.
The third limitation is audience clarity. DGX Spark is not a gaming PC, not a mainstream creator desktop and not a workstation replacement for everyone. It is a developer platform that happens to be extremely polished as hardware. That distinction must be clear before anyone looks at the price.
That brings us to the most important question: who is the DGX Spark actually for? It makes the most sense for AI developers, researchers, data scientists, robotics teams, university labs, startup engineering teams and serious enthusiasts who want to experiment with local AI models without constantly pushing workloads to the cloud. It also makes sense for teams that care about privacy, repeatability and cost control. Cloud experimentation is convenient until the bills begin adding up. A local machine has a high upfront cost, but for certain workflows, especially repeated testing and prototyping, it can be more predictable.
It is also useful for developers who specifically want to build on NVIDIA’s local AI stack. If the goal is to test Riva, NIMs, local LLMs, multimodal workflows or AI agents in a compact environment, DGX Spark is far more relevant than a generic desktop. It is not for people who do not know Linux and do not want to learn. It is not for someone who wants ChatGPT in a box. It is not for a creator who only needs video editing performance. It is not for a gamer. And it is not for anyone who expects local AI development to be a one-step process. DGX Spark is best described as a very capable developer platform and a preview of the next AI PC era.
Recently, NVIDIA unveiled the RTX Spark at COMPUTEX 2026. RTX Spark is aimed at slim Windows laptops and compact desktop PCs, and NVIDIA is positioning it as a platform for personal AI agents. It brings together a lot of NVIDIA technologies in a more mainstream PC context. That changes how DGX Spark should be viewed. It is no longer just an expensive local AI box for a niche audience. It now looks like the early, specialist form of an idea NVIDIA wants to push into the broader PC market.
DGX Spark is the dev kit version of the future. RTX Spark could be the consumer and creator version. But we could be wrong, and we’d prefer to wait until the RTX Spark launches in India before working out how the two offerings differ.
None of this makes DGX Spark obsolete. In fact, it makes the system more important as a reference point. DGX Spark shows what local AI development looks like when the hardware is already capable, but the tools still require patience. RTX Spark may eventually take some of that capability into Windows PCs with better app integration, better user-facing workflows and broader OEM support. The RTX Spark also allows for customisation of the hardware, at least in terms of memory, so you might end up with 32 or 64 GB of memory in RTX Spark machines. That would limit how far you can go with larger models.
Between the lines lies the larger conversation. A chunk of your AI tooling is moving from cloud-dependent demos to real machines that can run meaningful workloads on the desk. So you can think of the DGX Spark as a very clear signpost.
NVIDIA DGX Spark is an impressive machine, but it is not a simple one. As hardware, it is compact, quiet, premium and surprisingly practical. The 128 GB unified memory pool gives it a meaningful role in local AI development, and the Grace Blackwell platform makes it much more than a small desktop with a fast chip. From a user experience perspective, it is still very much a developer’s machine. The system itself is stable, but the journey around it involves Linux troubleshooting, large model downloads, sandbox limitations, shifting endpoints and the usual rough edges of fast-moving AI tooling. NVIDIA’s playbooks help, but they do not turn DGX Spark into a consumer appliance.
At Rs. 5,04,990, DGX Spark is not meant for everyone. It is for developers, labs, researchers and enthusiasts who understand what they are buying and why local AI matters to their workflow. In a way, while the DGX Spark shows the future, we feel that the RTX Spark may eventually democratise it.