AlphaFold3 Performance Benchmark: DGX Spark vs x86 RTX PRO 6000

We put the DGX Spark, NVIDIA’s new ARM64 desktop workstation with the GB10 GPU, through a real-world test: running AlphaFold3 protein structure prediction against an established x86 RTX PRO 6000 workstation. The Spark kept pace on the CPU-bound data pipeline (about 1.2x slower) but trailed by 8.3x on model inference, while producing the same prediction quality.

AlphaFold3 is open-source protein structure prediction software by Google DeepMind. However, the official distribution only supports x86 Linux — there is no native ARM64 build available.

This project adapts the entire AlphaFold3 toolchain for ARM64, enabling it to run on the NVIDIA DGX Spark (GB10, aarch64). All dependencies were verified, rebuilt, or reconfigured for aarch64, and unified memory settings were tuned for the GB10 GPU.

Figure: AlphaFold3 protein structure prediction result on the DGX Spark.

Test protein: ABL1 kinase + imatinib (797 aa + drug ligand)

Databases: uniref90 (67 GB), BFD (17 GB), mgy_clusters (~130 GB), uniprot (102 GB)

Note: Both platforms ran with warm Triton autotuning caches. A cold first run adds ~20-30% overhead.

Hardware

Spec | DGX Spark | x86 Workstation
GPU | NVIDIA GB10 (CC 12.1) | NVIDIA RTX PRO 6000 (CC 12.0)
GPU Memory | 121 GB LPDDR5X (unified) | 96 GB GDDR7
CPU | 20-core ARM (Grace) | AMD Ryzen 9 9950X3D 16-Core
System RAM | 121 GB (shared with GPU) | 186 GB
GPU power (inference) | ~40W | ~125W

Results

MSA Search Timings

Database | Size | DGX Spark | RTX PRO 6000 | Ratio
BFD | 17 GB | 145s | 101s | 1.4x
uniref90 | 67 GB | 566s | 480s | 1.2x
mgy_clusters | ~130 GB | 586s | 503s | 1.2x
uniprot | 102 GB | 738s | 650s | 1.1x
MSA total (wall clock) | - | 767s (12.8 min) | 651s (10.9 min) | 1.2x
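The MSA total is far below the sum of the four rows and only slightly above the slowest single search. That is what one would expect if the four searches overlap in time rather than run back to back. A minimal sketch of that check, using only the DGX Spark figures from the table (the helper names `sum_seconds` and `max_seconds` are illustrative and not part of AlphaFold3):

```shell
# Sum of all search times: the expected total if the searches ran sequentially.
sum_seconds() { printf '%s\n' "$@" | awk '{ s += $1 } END { print s }'; }

# Slowest search time: the lower bound on the total if all searches ran concurrently.
max_seconds() { printf '%s\n' "$@" | awk 'NR == 1 || $1 > m { m = $1 } END { print m }'; }

# Per-database Jackhmmer times on the DGX Spark, in seconds (BFD, uniref90, mgy_clusters, uniprot).
echo "sequential estimate:    $(sum_seconds 145 566 586 738)s"
echo "concurrent lower bound: $(max_seconds 145 566 586 738)s"
echo "reported total:         767s"
```

The reported 767s sits about 29s above the slowest search (738s) and far below the 2035s sequential estimate.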

Full Pipeline

Stage | DGX Spark | RTX PRO 6000 | Ratio
C++ compile | ~120s | ~90s | 1.3x
MSA search | 767s | 651s | 1.2x
Template search | 10s | 9s | 1.1x
Data pipeline total | 780s (13.0 min) | 665s (11.1 min) | 1.2x
Featurisation | 10s | 11s | 0.9x
Model inference | 544s (9.1 min) | 66s (1.1 min) | 8.3x
Total | ~24.5 min | ~13.5 min | 1.8x
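As a sanity check, the stage times can be added up and compared with the Total row. The sketch below assumes the total covers compile, data pipeline, featurisation, and inference, and it works from the rounded values in the table, so it lands within a few percent of the approximate totals rather than matching them exactly. The helper names are illustrative.

```shell
# ratio SLOW FAST: how many times slower SLOW is than FAST, to one decimal place.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'; }

# to_minutes SECONDS: convert seconds to minutes, to one decimal place.
to_minutes() { awk -v s="$1" 'BEGIN { printf "%.1f\n", s / 60 }'; }

# Stage times in seconds: compile + data pipeline total + featurisation + inference.
spark_total=$((120 + 780 + 10 + 544))
x86_total=$((90 + 665 + 11 + 66))

echo "DGX Spark:     ${spark_total}s ($(to_minutes "$spark_total") min)"
echo "RTX PRO 6000:  ${x86_total}s ($(to_minutes "$x86_total") min)"
echo "end-to-end ratio: $(ratio "$spark_total" "$x86_total")x"
echo "MSA search ratio: $(ratio 767 651)x"
```

Summed this way, the runs come to 1454s and 832s, close to the ~24.5 min and ~13.5 min reported.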

Prediction Quality

Both platforms produce structurally identical results (ipTM = 0.98 for ABL1).

Analysis

MSA Search: CPU-Bound, Not I/O-Bound

The SSD reads at 7.7 GB/s, but jackhmmer consumes data at only 140-260 MB/s, roughly 2-3% of what the drive can deliver. The bottleneck is HMMER’s profile-HMM comparison algorithm, which is CPU-intensive and stops scaling at about 8 threads per database search. The DGX Spark’s 20-core ARM CPU handles this efficiently, which is why the gap is only 1.2x.
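The utilization figure follows directly from the two rates quoted above. A small sketch of the arithmetic, assuming 1 GB = 1000 MB (the function name `utilization_pct` is illustrative):

```shell
# utilization_pct RATE_MB_S PEAK_GB_S: share of peak SSD read bandwidth in use, in percent.
utilization_pct() {
  awk -v rate="$1" -v peak="$2" 'BEGIN { printf "%.1f\n", rate / (peak * 1000) * 100 }'
}

# jackhmmer data rates (MB/s) against the 7.7 GB/s SSD read speed.
echo "low end:  $(utilization_pct 140 7.7)%"
echo "high end: $(utilization_pct 260 7.7)%"
```

With the drive this far from saturation, a faster SSD would not be expected to shorten the MSA stage.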

Model Inference: GPU Compute Gap

The 8.3x difference reflects the GB10’s unified LPDDR5X memory (~273 GB/s bandwidth) versus the RTX PRO 6000’s dedicated GDDR7 (~1.8 TB/s), together with the RTX PRO 6000’s higher CUDA core count. This gap widens with larger proteins (>1000 tokens).

Scaling with Protein Size

Protein Size | DGX Spark | RTX PRO 6000 | Gap
< 100 aa | ~3-5 min | ~2-3 min | ~2x
100-400 aa | ~8-15 min | ~3-8 min | ~2x
400-800 aa | ~20-30 min | ~10-15 min | ~2x
> 800 aa* | ~40-60 min | ~15-25 min | ~2.5x

* Estimated; may require memory tuning

Power Efficiency

Metric | DGX Spark | RTX PRO 6000
GPU Power (inference) | 35-40W | 120-125W
System Power (estimated) | ~60W | ~200W
Energy per prediction (797 aa) | ~0.025 kWh | ~0.045 kWh

DGX Spark uses about 45% less energy per prediction, or roughly 55% of what the RTX PRO 6000 system consumes.
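The energy figures can be reproduced from the estimated system power and the total run times. This sketch assumes a constant system draw over the whole run (~60W for ~24.5 min and ~200W for ~13.5 min), which is a simplification. The function name `energy_kwh` is illustrative.

```shell
# energy_kwh WATTS MINUTES: energy in kWh for a constant power draw over a run.
energy_kwh() { awk -v w="$1" -v m="$2" 'BEGIN { printf "%.4f\n", w * m / 60 / 1000 }'; }

spark=$(energy_kwh 60 24.5)   # DGX Spark: estimated system power, total run time
x86=$(energy_kwh 200 13.5)    # RTX PRO 6000 workstation: same inputs

# Relative saving of the DGX Spark versus the x86 workstation, in percent.
saving=$(awk -v a="$spark" -v b="$x86" 'BEGIN { printf "%.0f\n", (1 - a / b) * 100 }')

echo "DGX Spark:    ${spark} kWh"
echo "RTX PRO 6000: ${x86} kWh"
echo "saving:       ~${saving}%"
```

Under these assumptions the DGX Spark's energy per prediction comes out at a little over half that of the workstation, a saving in the region of 45%.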

Key Takeaways

  1. MSA search is practical: the DGX Spark is only 1.2x slower, well within an acceptable range for research use
  2. Model inference is the bottleneck: 8.3x slower due to GPU compute and memory bandwidth limits, but still usable outside high-throughput screening (HTS) workflows
  3. Prediction quality is identical: the ARM64 and x86_64 platforms produced structurally identical results with the same ipTM (0.98)
  4. Small proteins are fast: proteins under 200 aa complete in 5-10 minutes
  5. Power-efficient: a GPU draw of about 40W makes the DGX Spark suitable for office/lab environments without special cooling

Reproducibility

To reproduce these benchmarks:

# Prepare a single-protein input (the sequence is truncated here; supply the full sequence)
mkdir -p input output
echo '{"name":"test","sequences":[{"protein":{"id":["A"],"sequence":"MLEICLKLVGCKSKKG..."}}],"modelSeeds":[1],"dialect":"alphafold3","version":1}' > input/test.json

# Run the full pipeline and time it end to end.
# Bind mounts use absolute paths because some Docker versions do not accept relative host paths.
time docker run --rm --gpus all \
  -v "$PWD/weights:/weights:ro" -v "$PWD/db:/db:ro" \
  -v "$PWD/input:/input:ro" -v "$PWD/output:/output" \
  alphafold3:arm64 \
  uv run python3 run_alphafold.py \
    --model_dir=/weights --db_dir=/db \
    --input_dir=/input --output_dir=/output

Timing logs:
MSA search: Finished Jackhmmer (<db>) in X seconds
Model inference: Running model inference with seed 1 took X seconds
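To collect the timings without reading the log by hand, the two message patterns above can be pulled out with `sed`. This is a sketch only: it assumes the log lines contain exactly the phrases quoted above with a numeric value before "seconds". The function name `extract_timings` and the sample log lines, including their prefixes, are hypothetical.

```shell
# extract_timings LOGFILE: print "<label> <seconds>" for every timing line found.
extract_timings() {
  sed -nE \
    -e 's/.*Finished Jackhmmer \(([^)]*)\) in ([0-9.]+) seconds.*/jackhmmer:\1 \2/p' \
    -e 's/.*Running model inference with seed ([0-9]+) took ([0-9.]+) seconds.*/inference:seed\1 \2/p' \
    "$1"
}

# Hypothetical sample log, shaped like the patterns quoted in this section.
log=$(mktemp)
printf '%s\n' \
  'pipeline: Finished Jackhmmer (uniref90) in 566.0 seconds' \
  'unrelated log line' \
  'run: Running model inference with seed 1 took 544.0 seconds' \
  > "$log"

extract_timings "$log"
rm -f "$log"
```

In practice the container's output would be saved to a file (for example by appending `2>&1 | tee run.log` to the `docker run` command) and that file passed to the function.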

Conclusion

The DGX Spark proves that accessible desktop hardware can deliver production-quality results. While inference runs 8x slower than a dedicated RTX PRO 6000, the predictions are structurally identical — the same accuracy, the same confidence scores — just with longer wait times.

For research labs, teaching labs, and academic institutions, this matters more than raw speed. A $3,500 desktop system that fits on a desk and whose GPU draws about 40W can run the same AlphaFold3 workload as a workstation that costs 3-4x as much, needs a dedicated rack, and pulls 3x the power.

Real-world protein structure prediction rarely demands instant turnaround. If your workflow involves queueing a few samples overnight, a turnaround of roughly 25 minutes per prediction makes the DGX Spark not just viable but genuinely attractive, especially when deploying at scale, where the power and space savings compound.

Source: https://www.mutek.com/alphafold3-performance-benchmark-dgx-spark-vs-x86-rtx-pro-6000

This article may be cited in other works. Please link to this article as the original source.