By Z. Aw | Published 3 May 2026

Strix Halo, DGX Spark, M3 Ultra: choosing a local-AI workstation in 2026

Three classes of desktop machine now dominate the local-AI conversation: AMD's Ryzen AI Max+ 395 (codename Strix Halo), NVIDIA's DGX Spark, and Apple's M3 Ultra Mac Studio. Each holds at least 96GB of unified memory, each runs production-class models on a single box, and each comes with a different idea about who you are and what you are trying to do.

We run Strix Halo daily as the inference backbone for our news pipeline, our sector advisor agents, and Lyra. We have not put a DGX Spark or M3 Ultra into production for ourselves, but we have evaluated both for client work and read every credible benchmark we could find. What follows is the practitioner's view: where each box wins, where each box loses, and what is coming behind them in 2026 and 2027.

The three platforms, side by side

By the numbers (May 2026)

Strix Halo: 128GB at 256 GB/s, ~$2,300. DGX Spark: 128GB at 273 GB/s, $4,699. M3 Ultra Mac Studio: 96-512GB at 819 GB/s, $4,000-9,500.

AMD Ryzen AI Max+ 395 (Strix Halo). A Zen 5 CPU bonded to a Radeon 8060S iGPU and a 50 TOPS XDNA NPU, sharing 128GB of LPDDR5X-8000 unified memory. The Framework Desktop, the GMKtec EVO-X2, the Beelink GTR9 Pro, and the HP ZBook Ultra G1a all carry the same chip in different chassis. Linux is the strongest software story; Windows works but ROCm support there is still catching up. List prices in Singapore start around SGD 2,200-2,500 for a 128GB mini-PC and roughly SGD 3,500 for the Framework Desktop.

Framework Desktop: illustrative product render (real press images at frame.work/desktop).

NVIDIA DGX Spark. A GB10 Grace Blackwell Superchip with 128GB of LPDDR5X at 273 GB/s, 1 PFLOPS sparse-FP4 throughput, and the full CUDA-X stack pre-baked into DGX OS. NVIDIA raised the Founders Edition MSRP from $3,999 to $4,699 on 27 February 2026, citing memory supply constraints. ASUS, Lenovo, and HPE all sell badged variants of the same reference design. The selling proposition is simple: this is what you put on a desk if your team's muscle memory is CUDA, NCCL, and the nvidia/ container catalog.

NVIDIA DGX Spark — illustrative product render (real press images at nvidia.com).

Apple M3 Ultra Mac Studio. Two M3 Max dies fused over UltraFusion, with a 32-core Neural Engine and unified memory configurations from 96GB up to a single 512GB top configuration. The 819 GB/s memory bandwidth is the headline number; nothing in the desktop class touches it. MLX is the native runtime; llama.cpp and PyTorch (via MPS) work but with a quality-of-life tax. The base 96GB Studio starts around USD 4,000 and a fully-specced 512GB build crosses USD 9,500.

Mac Studio on display in an Apple Store. Photo: Apple Store visitor, CC BY-SA via Wikimedia Commons.

How they actually differ in production

The spec sheets disagree on how to score these machines because the scoring depends on workload shape. Three lenses make the differences readable.

Memory bandwidth and the model size you can run

Past a certain model size, token generation on transformer models is bound by memory bandwidth, not compute. Memory bandwidth is therefore the single most important number to internalise.

The M3 Ultra at 819 GB/s gives you roughly three times the steady-state token throughput of either Strix Halo or DGX Spark on the same model. On a 70B-class dense model in BF16, that translates to the difference between a usable interactive coding assistant and an experience that feels like waiting for a fax. If your workflow is "I want to chat with a 70B model and feel responsive", the M3 Ultra is the answer that arrives without an asterisk.

Strix Halo and DGX Spark sit in the same bandwidth band (within 7% of each other). Both will run Qwen3.6-35B-A3B at roughly 40-45 tokens per second of steady text generation. Neither can hold a dense 70B model at BF16, whose weights alone come to about 140GB against a 128GB pool; you would push down to 4-bit quantisation and accept a different speed-quality curve.
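A rough way to sanity-check these figures is to treat each generated token as one full read of the active weights, so the ceiling on tokens per second is bandwidth divided by bytes read per token. The sketch below is a back-of-envelope model under that assumption, not a benchmark. The function names are ours, and it ignores KV-cache traffic, compute limits, and runtime overhead, so measured throughput lands well below the ceiling it prints.

```python
# Back-of-envelope model: decoding reads every active weight once per token,
# so tokens/s cannot exceed bandwidth / bytes-per-token.
# Bandwidth figures come from the article; the model itself is an assumption.

BANDWIDTH_GBPS = {"Strix Halo": 256, "DGX Spark": 273, "M3 Ultra": 819}


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (params * bits / 8 bytes)."""
    return params_billion * bits_per_weight / 8


def ceiling_tokens_per_s(bandwidth_gbps: float, active_params_billion: float,
                         bits_per_weight: float) -> float:
    """Upper bound on decode speed; real throughput is lower."""
    return bandwidth_gbps / weights_gb(active_params_billion, bits_per_weight)


if __name__ == "__main__":
    print(f"70B dense at BF16: about {weights_gb(70, 16):.0f} GB of weights")
    for name, bw in BANDWIDTH_GBPS.items():
        bf16 = ceiling_tokens_per_s(bw, 70, 16)
        q4 = ceiling_tokens_per_s(bw, 70, 4)
        print(f"{name}: ceiling {bf16:.1f} tok/s at BF16, {q4:.1f} tok/s at 4-bit")
```

Whatever the model, the ratio between machines in this model is just the ratio of their bandwidths: 819 / 256 is about 3.2, which is where the "roughly three times" figure comes from.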

Compute throughput and image-class workloads

For everything that is not text token generation — image and video generation, vision-language inference, dense matrix workloads — compute throughput matters again, and the picture inverts.

The DGX Spark pushes around 120 TFLOPS at BF16 on tested image-generation workloads, generating FLUX.1 Dev images roughly 2.5 times faster than the Strix Halo, which sits closer to 46 TFLOPS in the same test. NVIDIA's GB10 silicon is engineered for this; its sparsity-aware Tensor cores do real work on workloads that match the data shape. Strix Halo can do these workloads, just not in the same wall-clock time.

The M3 Ultra falls between the two on raw compute, but its Neural Engine path delivers very competitive image-generation throughput once the workload is ported to MLX or Core ML. The catch is the porting effort. If your image pipeline is built around ComfyUI on a CUDA stack, the Mac is a rewrite.

An interesting outlier: in double-precision HPC workloads (the kind a research lab actually cares about), Strix Halo posts roughly 1.6 TFLOPS FP64 against the Spark's 0.7 TFLOPS. AMD's chip retains broader IEEE precision support; NVIDIA's GB10 is heavily slanted to AI-shape arithmetic.

The software story is not optional

This is where most teams underweight the decision.

The DGX Spark gives you CUDA, cuDNN, NCCL, TensorRT-LLM, the nvidia/ NIM containers, and DGX OS pre-tuned for those layers. Every recent paper's reference implementation has CUDA paths that build out-of-the-box. If your team has trained on NVIDIA hardware for the last decade, the Spark is the lowest-friction port of that experience to a desk.

Strix Halo gives you ROCm 7.x, Vulkan, llama.cpp Vulkan/HIP, PyTorch ROCm, and an active community (kyuz0, hec-ovi, IgnatBeresnev, and others) maintaining toolboxes for the gfx1151 GPU target. The real workloads we have run reached production reliability after a weekend of configuration; the path to that reliability is better documented today than it was six months ago, but it is still a journey. Some workloads are not yet stable on gfx1151 (vLLM with bleeding-edge model classes was our example from the past month); they will be, but on a timeline you cannot dictate.

The M3 Ultra gives you Apple Silicon, MLX, Metal Performance Shaders, and the warmest single-vendor support story of the three. It also requires you to live entirely inside Apple's view of the world: no NVIDIA containers, no ROCm, and a constant low-grade compatibility tax on any open-source project that is not first-class on macOS.
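One practical consequence of the three stacks is that the same PyTorch script sees a different backend on each box. The sketch below shows one way to report which one is active. It relies on ROCm builds of PyTorch exposing the GPU through the torch.cuda API and setting torch.version.hip; the function names and labels are our own, and the script assumes nothing about which runtime you actually deploy with.

```python
# Sketch: report which accelerator backend a PyTorch install would use.
# ROCm builds expose the GPU through torch.cuda and set torch.version.hip.
# The labels returned here are our own naming, not a PyTorch convention.

def classify_backend(cuda_available: bool, hip_version, mps_available: bool) -> str:
    """Map PyTorch's availability flags to a human-readable label."""
    if cuda_available and hip_version:
        return "rocm"   # e.g. Strix Halo with a ROCm build of PyTorch
    if cuda_available:
        return "cuda"   # e.g. DGX Spark
    if mps_available:
        return "mps"    # e.g. Apple Silicon via Metal Performance Shaders
    return "cpu"


def detect_backend() -> str:
    """Query the local PyTorch install, if there is one."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    return classify_backend(
        torch.cuda.is_available(),
        torch.version.hip,
        torch.backends.mps.is_available(),
    )


if __name__ == "__main__":
    print(detect_backend())
```

Keeping the classification separate from the detection makes the logic easy to check on a machine that has none of the three accelerators.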

The family — what you are actually buying into

None of these are a single SKU. Each is the leading product in a family with surrounding boxes that solve adjacent problems.

The AMD Strix Halo family

Strix Halo itself is the Ryzen AI Max+ 395 (16 cores, 128GB memory limit, full Radeon 8060S graphics) and the Ryzen AI Max 390 / 385 (cut-down core counts and memory ceilings). Around it sits a wider Ryzen AI line:

Ryzen AI Max 400 series, codename Gorgon Halo, expected to refresh the high-end mini-PC tier later in 2026 with Zen 5 + RDNA 3.5 stepping improvements.
Strix Point and Krackan Point on the laptop side, for thinner machines that want NPU-class AI but not the full 128GB memory budget.
The Ryzen AI 9 HX 370 series, one tier down, which we see in many SG-import mini-PCs at the SGD 1,500-2,000 mark.

The boxes that ship Strix Halo today: Framework Desktop, GMKtec EVO-X2, Beelink GTR9 Pro, HP ZBook Ultra G1a and ASUS ROG Flow Z13 (the portable options), and a growing list of Chinese mini-PC variants. We have used the Framework and the EVO-X2 in client deployments and would order either again. Framework's repairability story is genuine; the EVO-X2 is cheaper for the same silicon if you do not need Framework's modularity, and the memory is soldered and cannot be upgraded on either.

The NVIDIA DGX family

The Spark sits at the bottom of the DGX desktop line. Above it, in ascending order:

DGX Station (announced; positioned as the mid-tier desk box with greater memory and thermal headroom — shipping later in 2026).
DGX H200 / DGX B200 — the rack systems, $300K+ class, with 8 H200 or B200 cards.
DGX SuperPOD — the multi-rack reference architectures.

The Spark is also available rebadged as the ASUS Ascent GX10, the Lenovo ThinkStation PGX, and HPE's developer-class equivalent. The hardware reference is identical. The reason to pick a partner SKU over the Founders Edition is usually local enterprise support contracts, not silicon.

If you outgrow the Spark, NVIDIA's promised on-ramp is the DGX Station, then the rack DGX boxes — but the practical reality for most teams is that the cloud cluster equivalents (H100/H200 instances on AWS, GCP, Lambda) are cheaper and faster to scale into than buying the next box up. The Spark is best understood as a developer endpoint, not the bottom rung of a hardware ladder.

The Apple Silicon family

The M3 Ultra Mac Studio leads the desktop side. Around it:

Mac Studio M3 Max (96GB max, half the bandwidth of the Ultra, two-thirds the price).
Mac Pro M3 Ultra (the same silicon as the Studio in a tower with PCIe slots — niche, mostly relevant for media-production buyers).
The MacBook Pro M4 Max (up to 128GB unified memory; meaningfully slower memory bandwidth than the desktop Ultras, but mobile).

M4 Max already ships in MacBook Pros. M4 Ultra has been widely expected to land in a refreshed Mac Studio at WWDC 2026 or shortly after; if it follows the M3 Ultra's playbook, the bandwidth ceiling moves from 819 GB/s closer to a terabyte per second. We would not buy the M3 Ultra Mac Studio at full price right now if we could wait two months for the M4 Ultra disclosure. We would buy a refurbished M3 Ultra at a meaningful discount today.

What is coming next, and what to wait for

If you have a 6-12 month buying horizon, three roadmaps are worth tracking.

AMD: Gorgon Halo (2026), Medusa Halo (2027)

Gorgon Halo (Ryzen AI Max 400 series) is positioned as a 2026 refresh of the same silicon family, expected on Zen 5 + RDNA 3.5 with bandwidth and clock improvements. It will not change the architectural picture meaningfully. If you are buying Strix Halo today, Gorgon Halo will not embarrass that decision in 6 months.

Medusa Halo (Ryzen AI Max 500 series) is the real next-generation step, expected 2027-2028. It is rumoured to land Zen 6 cores, an RDNA 5 GPU, and, most importantly, LPDDR6 memory, which would lift bandwidth from today's roughly 256 GB/s to something approaching 460 GB/s. That is roughly 80% more memory bandwidth, which on bandwidth-bound token generation translates almost directly into 80% more tokens per second. Medusa Halo would reach a little over half of today's M3 Ultra bandwidth, narrowing the gap at a far lower price point.
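The arithmetic behind those bandwidth comparisons is short enough to check directly. The sketch below uses only the figures quoted in this article; the 460 GB/s value is the rumoured number, not a confirmed specification, and the helper names are ours.

```python
# Bandwidth figures quoted in the article, in GB/s.
STRIX_HALO_GBPS = 256
MEDUSA_HALO_GBPS = 460   # rumoured LPDDR6 figure, not confirmed
M3_ULTRA_GBPS = 819


def uplift_percent(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


def fraction_of(reference: float, value: float) -> float:
    """Share of a reference bandwidth that value reaches (0.0 to 1.0)."""
    return value / reference


if __name__ == "__main__":
    gain = uplift_percent(STRIX_HALO_GBPS, MEDUSA_HALO_GBPS)
    print(f"Medusa Halo vs Strix Halo: +{gain:.0f}% bandwidth")
    for name, bw in [("Strix Halo", STRIX_HALO_GBPS),
                     ("Medusa Halo", MEDUSA_HALO_GBPS)]:
        share = fraction_of(M3_ULTRA_GBPS, bw)
        print(f"{name} reaches {share:.0%} of M3 Ultra bandwidth")
```

On these figures the uplift is just under 80%, and the rumoured part would move from about 31% to about 56% of the M3 Ultra's bandwidth.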

NVIDIA: DGX Spark refresh, Vera Rubin desktop

The Spark's silicon is GB10, derived from the Grace Blackwell line. The successor architecture is Vera Rubin, with first products expected to ship into the data centre across 2026 and into 2027. NVIDIA has not committed to a desktop Vera Rubin product, but a refresh of the Spark to Rubin silicon in late 2027 is the path the roadmaps point to.

Closer in: DGX Station, due later in 2026, sits one tier above the Spark with more memory headroom and a higher TDP envelope. If you are evaluating a Spark and the workload is genuinely 96GB-class today, the Station is the box to keep an eye on.

Apple: M4 Ultra and M5 generation

M4 Ultra is the immediate next step, plausibly arriving in a Mac Studio refresh later in 2026. M5 generation has been previewed in the laptop line and will arrive in desktops a cycle later. Apple's bandwidth advantage is structurally sticky (unified memory on a very wide on-package memory bus is what they do best), and we would expect their lead on token-throughput-per-dollar at the high end to persist through 2027.

How to choose, in three sentences

If your work is NVIDIA-shaped (training small models, running CUDA-native research code, deploying NIM containers), buy the DGX Spark, accept the price, and weigh the premium against the year of CUDA-versus-ROCm friction it saves you.

If your work is bandwidth-bound and you live happily inside Apple's stack: buy a refurbished M3 Ultra now or wait two months for the M4 Ultra. Bandwidth will continue to be where Apple wins.

If your work is local LLM inference at the 30-50B class with predictable throughput, image generation that does not need the absolute fastest box, and a budget that wants to deploy something this quarter: Strix Halo is the answer. We have shipped real client workloads on it. The community is healthy. The price/performance is unbeatable for the workload class.

The decision SMEs in Singapore actually face

The cost picture for SG businesses is sharper than the global one. Cloud GPU spend in Singapore carries roughly the same per-hour rate as the US or EU, but local revenue per AI workload sits lower in many sectors. The gap between cloud-cost-curve and SME-revenue-curve is what kills sustained AI deployment in this market — PwC Singapore's commentary on Budget 2026 names it directly: SMEs explored AI tools, then "recurring operating costs such as the cost of tokens or licenses to use cloud and AI services soon outweighed perceived benefits."

An SGD 3,500 Strix Halo Framework Desktop sitting on a desk in Tuas or Tai Seng turns the recurring-cost shape into a one-time capital cost plus a small electricity line. For the workload classes we routinely see in our SME advisory (document classification, supplier extraction, internal search, content generation, sector triage), the local box is the right shape of answer.
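The capital-versus-recurring trade can be put into a simple break-even calculation. In the sketch below only the SGD 3,500 box price comes from this article; the monthly cloud and electricity figures are placeholders to replace with your own bills, and the function name is ours.

```python
# Break-even sketch: one-off hardware cost versus a recurring cloud bill.
# Only the SGD 3,500 box price comes from the article; the monthly figures
# are placeholder assumptions, not measured costs.

def breakeven_months(capex_sgd: float, cloud_monthly_sgd: float,
                     power_monthly_sgd: float) -> float:
    """Months until avoided cloud spend covers the box, net of electricity."""
    monthly_saving = cloud_monthly_sgd - power_monthly_sgd
    if monthly_saving <= 0:
        return float("inf")  # the local box never pays for itself
    return capex_sgd / monthly_saving


if __name__ == "__main__":
    months = breakeven_months(capex_sgd=3500,
                              cloud_monthly_sgd=600,   # hypothetical cloud bill
                              power_monthly_sgd=40)    # hypothetical power cost
    print(f"Break-even after about {months:.1f} months")
```

The calculation leaves out staff time for setup and maintenance, which belongs on the local side of the ledger when you run your own numbers.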

The DGX Spark and M3 Ultra are excellent machines for teams that already have the infrastructure and software muscle to extract their advantages. For an SME making its first serious AI hardware decision, the Strix Halo class buys you time and certainty. The other two buy you peak performance you may not be set up to exploit.

What a single Strix Halo box is actually capable of

To give a sense of the workload density a single 128GB Ryzen AI Max+ 395 supports — based on what we've tested and put into production over the last six months — here's the practical capability envelope:

Text inference at the 30-50B class: large mixture-of-experts models running responsively for chat, drafting, classification, structured extraction, and agentic tool use.
Vision-language inference: 30B-class VLMs reading photos, screenshots, and document scans for layout, OCR, and structured answer extraction.
Layout-aware OCR for messy real-world documents: phone-photo invoices, delivery orders, mill certificates, multi-page scans, and mixed-language letterheads. This is a class of input where cloud OCR APIs are pricey per page, so doing the heavy lifting locally pays off.
Image generation: Flux, SDXL, and similar diffusion models for marketing, mock-ups, and avatar work, running through a ComfyUI-style graph environment.
Smaller embeddings, classifiers, and re-rankers: as utility layers for search, deduplication, and routing — the unglamorous workloads that tend to be the highest-volume in a real deployment.
Multiple of the above, simultaneously: a 128GB unified memory pool comfortably fits a 35B language model, a 32B vision model, an OCR layer, and headroom for diffusion work — all live at the same time, served behind ordinary HTTP endpoints to whichever tools your team uses.

That last point is the one most teams underweight. The interesting workloads in an SME aren't single-model; they're compositions. A single supplier-extraction flow might touch a layout OCR model, an embedding model for matching, a 35B LLM for reasoning, and a classifier for routing — all in the same request. On a cloud-API stack each of those is a separate bill and a separate latency tail. On one Strix Halo box they're four function calls inside the same 128GB pool with no network between them.
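Whether a given composition fits in the pool is a matter of adding up footprints. The sketch below is a rough budget for the example mix described above. The model sizes follow the article; the quantisation levels, the sizes of the OCR and utility models, and the 20% overhead factor are our assumptions, so treat the output as an estimate to check against your own runtime.

```python
# Memory-budget sketch for co-resident models in a 128GB unified pool.
# Model sizes follow the article's example; quantisation levels, the small
# utility-model sizes, and the overhead factor are assumptions.

POOL_GB = 128
OVERHEAD = 1.2  # assumed 20% extra for KV cache, activations, buffers

# (name, parameters in billions, bits per weight)
MODELS = [
    ("35B language model", 35, 8),
    ("32B vision model", 32, 8),
    ("OCR layer", 1, 16),
    ("embedding + classifier", 1, 16),
]


def footprint_gb(params_billion: float, bits: float,
                 overhead: float = OVERHEAD) -> float:
    """Estimated resident size of one model, in GB."""
    return params_billion * bits / 8 * overhead


def plan(models, pool_gb: float = POOL_GB):
    """Return (GB used, GB of headroom) for a list of models."""
    used = sum(footprint_gb(p, b) for _, p, b in models)
    return used, pool_gb - used


if __name__ == "__main__":
    used, headroom = plan(MODELS)
    print(f"Estimated use: {used:.1f} GB, headroom: {headroom:.1f} GB")
```

Under these assumptions the mix uses roughly 85GB and leaves about 43GB for diffusion work and the operating system; dropping the two large models to 4-bit would free considerably more.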

We are not arguing this is the right answer for every business. We are arguing that "buy the cheapest cloud GPU" and "buy the most expensive local box" are both common mistakes that the framing above can save you from. For the workload classes most SG SMEs we advise are actually trying to ship — document understanding, internal search, content generation, sector triage, customer-conversation analysis — a single Strix Halo box is closer to overkill than undersized, and it's the closest thing in the desktop class to a real "set it once and forget" cost line.

Where to buy in Singapore

AMD Strix Halo machines

Framework Desktop (Ryzen AI Max+ 395, 128GB): direct from frame.work/desktop. Ships globally; Singapore pricing typically lands around SGD 3,500-3,800 with import duty. Modular and repairable.
GMKtec EVO-X2: direct from gmktec.com EVO-X2 page, or via Shopee SG / Lazada SG. The cheapest 128GB Strix Halo on the market — typically SGD 2,300-2,500 from local SG marketplaces.
Beelink GTR9 Pro: direct from bee-link.com GTR9 Pro, or on Shopee SG / Amazon SG.
BOSGAME M5 (Ryzen AI Max+ 395, 96GB or 128GB + 2TB): direct from bosgame.com BOSGAME M5, or on Shopee SG. Often the price-leader for the spec.
Marketplace search (any Strix Halo): Shopee SG: Ryzen AI Max+ 395 · Lazada SG: Ryzen AI Max 395 · Amazon SG: Ryzen AI Max+ 395

NVIDIA DGX Spark

NVIDIA Founders Edition (USD 4,699 since Feb 2026): nvidia.com/en-sg/dgx-spark
ASUS Ascent GX10 (badged DGX Spark, available on SG marketplaces): Shopee SG · Lazada SG · Amazon SG.
Lenovo ThinkStation PGX (badged DGX Spark with Lenovo support contracts): lenovo.com SG ThinkStation PGX
HPE developer-class equivalents are sold via business channels in SG; ask your enterprise reseller. Same silicon as the Founders Edition.

Apple M3 Ultra Mac Studio (and refurbished alternatives)

Apple Singapore Store: apple.com/sg Mac Studio
Marketplace listings (often discounted): Shopee SG: Mac Studio M3 Ultra · Lazada SG: Mac Studio M3 Ultra · Amazon SG: Mac Studio M3 Ultra
Best Denki and Challenger often carry the Mac Studio in SG retail; the M4 Ultra disclosure expected mid-2026 may push retailers to discount the M3 Ultra noticeably.

Editorial transparency

None of the links in this post are affiliate links. We have no commercial relationship with AMD, NVIDIA, Apple, Framework, GMKtec, Beelink, BOSGAME, Lenovo, or any of the marketplace vendors listed above. The Strix Halo recommendation is the one we'd give whether or not we earned anything from the click — we run the machine ourselves, and the analysis is grounded in months of operating it as a production endpoint for our own clients.

Frequently asked

Strix Halo, DGX Spark, or M3 Ultra for a Singapore SME local-AI workstation?

Strix Halo for value: SG retail runs roughly S$2,300–3,500 for 128GB (GMKtec EVO-X2 at the lower end, Framework Desktop at the higher), and it runs production agents. DGX Spark (USD 4,699 since Feb 2026) if your team's muscle memory is CUDA. M3 Ultra Mac Studio if you need raw memory bandwidth (819 GB/s) for very-long-context agentic work and are comfortable in the Apple ecosystem. For most SG SMEs in 2026, Strix Halo wins on cost-per-deployed-agent.

Can a single 128GB workstation run a real production AI agent?

Yes. We run Qwen3.6-35B-A3B, Qwen3-VL-32B, and a news-pipeline agent off a single Strix Halo box at Altronis. It serves multiple chat sessions, generates blog drafts, and runs the sector advisors at altronis.sg. That is production-class for a sub-100-user SaaS or an internal-tool deployment.

Is buying a local AI workstation in Singapore tax-deductible?

Yes, as a capital expense for AI/automation infrastructure. Pair with PIC-equivalent productivity grants where applicable. The hardware also typically qualifies under EDG-funded AI projects when the box is part of a deployment.