273 GB/s. 819 GB/s. 30.7 TB/s. Only one crosses the bar.
Decode at 120 tok/s with three concurrent users at 500k context is bounded by single-node memory bandwidth, not by bandwidth summed across nodes and not by the cluster network, which cannot route around per-node latency. The three named stacks differ by one to two orders of magnitude.
Single-node memory bandwidth, log scale · sustained tok/s/stream · 120 tok/s × 3-stream bar at ≈ 12 TB/s
Meets the bar · AUD 620k hw · Supermicro / Dell / AU integrator
·Why bandwidth, not compute
Decode is memory-bandwidth-bound: per-token time ≈ active-parameter bytes ÷ aggregate HBM bandwidth
first principles
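The bound can be turned into a quick back-of-envelope check. The sketch below is a minimal illustration, not the briefing's own model: it takes the chart's ≈ 12 TB/s bar for 120 tok/s × 3 streams at face value, backs out the bytes that would have to be read per generated token, and compares each headline bandwidth figure against the bar. The function names and the implied bytes-per-token figure are assumptions introduced here; the 273 GB/s, 819 GB/s and 30.7 TB/s inputs are the headline numbers.

```python
# Back-of-envelope decode roofline. All names are illustrative assumptions.
TARGET_TOK_S = 120        # per-stream decode target (from the text)
STREAMS = 3               # concurrent users (from the text)
BAR_BYTES_PER_S = 12e12   # chart bar, ~12 TB/s (from the text)

# Headline single-node bandwidth figures, in bytes per second.
HEADLINE_BANDWIDTHS = [273e9, 819e9, 30.7e12]


def implied_bytes_per_token(bar=BAR_BYTES_PER_S, tok_s=TARGET_TOK_S, streams=STREAMS):
    """Bytes that must be read per generated token for the bar to hold."""
    return bar / (tok_s * streams)


def max_tok_s_per_stream(bandwidth, bytes_per_token, streams=STREAMS):
    """Upper bound on per-stream decode rate if bandwidth is the only limit."""
    return bandwidth / bytes_per_token / streams


def meets_bar(bandwidth, bar=BAR_BYTES_PER_S):
    """True when a node's bandwidth is at or above the chart bar."""
    return bandwidth >= bar


if __name__ == "__main__":
    bpt = implied_bytes_per_token()
    for bw in HEADLINE_BANDWIDTHS:
        rate = max_tok_s_per_stream(bw, bpt)
        print(f"{bw / 1e12:7.3f} TB/s -> at most {rate:6.1f} tok/s/stream, "
              f"meets bar: {meets_bar(bw)}")
```

This treats the bytes read per token as identical for every stream and ignores any weight reuse from batching, so it gives a ceiling rather than a prediction; measured throughput on real runtimes will differ.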
·Why the cluster network does not help
200 GbE (Spark) and Thunderbolt 5 (Mac) fabrics let a model fit across nodes but do not speed up single-stream decode: inter-node hops add latency to each token rather than dividing it
topology bound
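One way to see why extra nodes do not speed up a single stream: assume a layer-wise split, where each of N nodes holds 1/N of the weights and a token passes through the nodes in sequence. The sketch below is a simplified model under that assumption, not a description of any particular fabric or runtime; `hop_latency_s` and every input value passed to it are hypothetical.

```python
def per_token_time_single(bytes_per_token, node_bandwidth):
    """Seconds to stream one token's worth of bytes on a single node."""
    return bytes_per_token / node_bandwidth


def per_token_time_layer_split(bytes_per_token, node_bandwidth, nodes, hop_latency_s):
    """Seconds per token for one stream under a layer-wise split (assumed model).

    Each node reads 1/nodes of the bytes, but for a given token the nodes
    run one after another, so the read times add back up to the
    single-node figure. Every boundary between nodes adds one network hop.
    """
    per_node_read = (bytes_per_token / nodes) / node_bandwidth
    return nodes * per_node_read + (nodes - 1) * hop_latency_s


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the shape of the result.
    single = per_token_time_single(1e9, 1e9)
    split = per_token_time_layer_split(1e9, 1e9, nodes=4, hop_latency_s=0.001)
    print(f"single node: {single:.4f} s/token, 4-node split: {split:.4f} s/token")
```

In this model the split never beats the single-node time; it only adds hop latency. A tensor-parallel split would divide the per-node read, but it adds a collective exchange per layer, which this sketch does not model.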
·The H200 headroom
120 tok/s bar met with 2–3× headroom; the binding constraint becomes prefill TTFT (50–100 s cold at 500k), not steady-state decode
meets
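The headroom figure can be reproduced from the numbers in the source note: 4.8 TB/s HBM per GPU × 8 GPUs × ~80% achievable, set against the ≈ 12 TB/s bar. A small sketch, with illustrative names:

```python
HBM_PER_GPU_TB_S = 4.8       # H200 HBM bandwidth per GPU (from the source note)
GPUS_PER_NODE = 8            # GPUs in the node (from the source note)
ACHIEVABLE_FRACTION = 0.80   # "~80% achievable" (from the source note)
BAR_TB_S = 12.0              # chart bar for 120 tok/s x 3 streams


def aggregate_bandwidth_tb_s(per_gpu=HBM_PER_GPU_TB_S, gpus=GPUS_PER_NODE,
                             fraction=ACHIEVABLE_FRACTION):
    """Achievable aggregate HBM bandwidth of one node, in TB/s."""
    return per_gpu * gpus * fraction


def headroom(aggregate_tb_s, bar_tb_s=BAR_TB_S):
    """How many times over the bar the node's bandwidth is."""
    return aggregate_tb_s / bar_tb_s


if __name__ == "__main__":
    agg = aggregate_bandwidth_tb_s()
    print(f"aggregate: {agg:.2f} TB/s, headroom: {headroom(agg):.2f}x")
```

This gives roughly 30.7 TB/s and about 2.6× headroom, inside the stated 2–3× range. It covers steady-state decode only and says nothing about prefill TTFT.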
·The H100 alternative
8× H100 SXM (640 GB) is workable but caps at ~430k context for Llama 4 / GQA models; 8× H200 (1.13 TB) is the right node size at 500k
AUD 480–580k hw
·The Blackwell upgrade path
8× B200 HGX delivers ~2.5× H100 inference at similar power; supply constrained through mid-2026
AUD 420–525k GPUs
Source · On-prem frontier LLM briefing, §2 Hardware sizing and §3 The three stacks costed. Aggregate H200 bandwidth derived from 4.8 TB/s HBM × 8 GPUs × ~80% achievable. Mac Studio MLX-lm and DGX Spark llama.cpp throughput numbers cited verbatim from the published benchmarks referenced in the briefing's source appendix.
Correction · briefing · The Mac Studio M3 Ultra 512GB unified-memory configuration was withdrawn from the Apple Store on or around 5 March 2026 amid the global DRAM squeeze. As of 11 May 2026 the SKU is available only via secondary-market channels at approximately 10–25% over launch (AUD 16,500–18,500 against AUD 14,999). The configuration is retained in the costing because it was specified in the brief, but a 4-unit fresh procurement from Apple is no longer a viable path.