AlphaFold3 Performance Benchmark: DGX Spark vs x86 RTX PRO 6000

We put the DGX Spark, NVIDIA’s new ARM64 desktop workstation with the GB10 GPU, through a real-world test: running AlphaFold3 protein structure prediction and comparing it against an established x86 RTX PRO 6000 workstation. The short version: the MSA search is about 1.2x slower, model inference is about 8.3x slower, and prediction quality is the same.

AlphaFold3 is open-source protein structure prediction software by Google DeepMind. However, the official distribution only supports x86 Linux — there is no native ARM64 build available.

This project adapts the entire AlphaFold3 toolchain for ARM64, enabling it to run on the NVIDIA DGX Spark (GB10, aarch64). All dependencies were verified, rebuilt, or reconfigured for aarch64, and unified memory settings were tuned for the GB10 GPU.

Figure: Protein structure prediction result with AlphaFold3 on the DGX Spark.

Test protein: ABL1 kinase + imatinib (797 aa + drug ligand)
Databases: uniref90 (67 GB), BFD (17 GB), mgy_clusters (~130 GB), uniprot (102 GB)

Note: Both platforms ran with warm Triton autotuning caches. A first run with cold caches adds ~20-30% overhead.

Hardware

	DGX Spark	x86 Workstation
GPU	NVIDIA GB10 (CC 12.1)	NVIDIA RTX PRO 6000 (CC 12.0)
GPU Memory	121 GB LPDDR5X (unified)	96 GB GDDR7
CPU	20-core ARM (Grace)	AMD Ryzen 9 9950X3D 16-Core
System RAM	121 GB	186 GB
GPU power (inference)	~40W	~125W

Results

MSA Search Timings

Database	Size	DGX Spark	RTX PRO 6000	Ratio
BFD	17 GB	145s	101s	1.4x
uniref90	67 GB	566s	480s	1.2x
mgy_clusters	~130 GB	586s	503s	1.2x
uniprot	102 GB	738s	650s	1.1x
MSA Total		767s (12.8 min)	651s (10.9 min)	1.2x
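
The Ratio column is the DGX Spark time divided by the RTX PRO 6000 time. The MSA total is wall-clock time: it sits close to the slowest single search (uniprot) rather than the sum of the rows, so the four searches must have overlapped. A minimal Python sketch, using only the figures in the table (the variable names are illustrative), makes both points checkable:

```python
# Wall-clock seconds from the MSA Search Timings table: (DGX Spark, RTX PRO 6000).
SEARCH_S = {
    "BFD": (145, 101),
    "uniref90": (566, 480),
    "mgy_clusters": (586, 503),
    "uniprot": (738, 650),
}
MSA_TOTAL_S = (767, 651)  # reported totals for the whole MSA stage

# Ratio column: DGX Spark time divided by RTX PRO 6000 time, to one decimal.
ratios = {db: round(spark / rtx, 1) for db, (spark, rtx) in SEARCH_S.items()}
print(ratios)

# If the searches ran back to back, the total would equal the sum of the rows.
serial_s = tuple(sum(times[i] for times in SEARCH_S.values()) for i in (0, 1))
print("sum of rows:", serial_s, "reported total:", MSA_TOTAL_S)
```

The row sums (2035s and 1734s) are well above the reported totals (767s and 651s), which is what overlapping searches would produce.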

Full Pipeline

Stage	DGX Spark	RTX PRO 6000	Ratio
C++ compile	~120s	~90s	1.3x
MSA search	767s	651s	1.2x
Template search	10s	9s	1.1x
Data pipeline total	780s (13.0 min)	665s (11.1 min)	1.2x
Featurisation	10s	11s	0.9x
Model inference	544s (9.1 min)	66s (1.1 min)	8.3x
Total	~24.5 min	~13.5 min	1.8x
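
To see how the stage mix differs between the two machines, the short Python sketch below computes the share of the reported end-to-end time spent in model inference. It uses only the table's figures. The totals are approximate, so the shares are too, and the function name is just illustrative.

```python
# Figures from the Full Pipeline table.
INFERENCE_S = {"DGX Spark": 544, "RTX PRO 6000": 66}     # seconds
TOTAL_MIN = {"DGX Spark": 24.5, "RTX PRO 6000": 13.5}    # reported, approximate

def inference_share(platform):
    """Fraction of the reported end-to-end time spent in model inference."""
    return INFERENCE_S[platform] / (TOTAL_MIN[platform] * 60)

for name in INFERENCE_S:
    print(f"{name}: inference is {inference_share(name):.0%} of the total")
```

On these figures inference is a bit over a third of the DGX Spark run but under a tenth of the RTX PRO 6000 run, where the CPU-bound data pipeline dominates.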

Prediction Quality

Both platforms produce structurally identical results (ipTM = 0.98 for ABL1).

Analysis

MSA Search: CPU-Bound, Not I/O-Bound

SSD read speed: 7.7 GB/s. Actual jackhmmer data rate: 140-260 MB/s (only 2-3% utilization). The bottleneck is HMMER’s profile-HMM comparison algorithm, which is CPU-intensive and saturates at ~8 threads per database search. The DGX Spark’s 20-core ARM CPU handles this efficiently, resulting in only a 1.2x gap.
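
The utilization figure follows directly from the two rates quoted above. As a quick check in Python (the constants are the article's numbers; the helper name is only illustrative):

```python
# Figures quoted in the analysis above.
SSD_READ_MB_S = 7700           # 7.7 GB/s SSD read speed
JACKHMMER_MB_S = (140, 260)    # observed jackhmmer data rate range

def utilization_pct(rate_mb_s, ceiling_mb_s=SSD_READ_MB_S):
    """Share of the SSD's read bandwidth that a given data rate consumes."""
    return 100.0 * rate_mb_s / ceiling_mb_s

low, high = (utilization_pct(rate) for rate in JACKHMMER_MB_S)
print(f"SSD utilization: {low:.1f}% to {high:.1f}%")
```

Even at the top of the observed range the disk is idle most of the time, which is why a faster SSD would not be expected to shorten the MSA stage.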

Model Inference: GPU Compute Gap

The 8.3x difference reflects memory bandwidth and raw compute: the GB10’s unified LPDDR5X memory delivers ~273 GB/s, against ~1.8 TB/s for the RTX PRO 6000’s dedicated GDDR7, and the RTX PRO 6000 also has a higher CUDA core count. This gap widens with larger proteins (>1000 tokens).

Scaling with Protein Size

Protein Size	DGX Spark	RTX PRO 6000	Gap
< 100 aa	~3-5 min	~2-3 min	~2x
100-400 aa	~8-15 min	~3-8 min	~2x
400-800 aa	~20-30 min	~10-15 min	~2x
> 800 aa*	~40-60 min	~15-25 min	~2.5x

* Estimated; may require memory tuning

Power Efficiency

Metric	DGX Spark	RTX PRO 6000
GPU Power (inference)	35-40W	120-125W
System Power (estimated)	~60W	~200W
Energy per prediction (797 aa)	~0.025 kWh	~0.045 kWh

DGX Spark uses ~45% less energy per prediction (about 55% of the x86 workstation’s energy).
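
The energy figures come from multiplying estimated system power by total runtime. A small Python sketch of that arithmetic, using the table values and the ~24.5 min and ~13.5 min pipeline totals (all of them estimates), shows that the DGX Spark uses about 55% of the workstation's energy, a saving of roughly 45%:

```python
# System power (estimated) and total runtime, taken from the tables above.
SYSTEM_W = {"DGX Spark": 60, "RTX PRO 6000": 200}
RUNTIME_MIN = {"DGX Spark": 24.5, "RTX PRO 6000": 13.5}

def energy_kwh(platform):
    """Energy for one prediction: power (W) x time (h), converted to kWh."""
    return SYSTEM_W[platform] * (RUNTIME_MIN[platform] / 60) / 1000

spark = energy_kwh("DGX Spark")
x86 = energy_kwh("RTX PRO 6000")
saving = 1 - spark / x86   # fraction of energy saved relative to the x86 system
print(f"{spark:.4f} kWh vs {x86:.4f} kWh, saving {saving:.1%}")
```

Because both inputs are rounded estimates, the saving is best read as "a little under half" rather than a precise figure.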

Key Takeaways

MSA search is practical: DGX Spark is only 1.2x slower, well within an acceptable range for research use
Model inference is the bottleneck: 8.3x slower due to GPU compute limits, but still usable outside high-throughput screening (HTS) workflows
Prediction quality is identical: both platforms report the same confidence score (ipTM = 0.98 for ABL1)
Small proteins are fast: sequences under 200 aa complete in 5-10 minutes
Power-efficient: a ~40W GPU draw makes DGX Spark suitable for office/lab environments without special cooling

Reproducibility

To reproduce these benchmarks:

# Prepare single-protein input (the sequence is truncated here; supply the full sequence)
mkdir -p input output
echo '{"name":"test","sequences":[{"protein":{"id":["A"],"sequence":"MLEICLKLVGCKSKKG..."}}],"modelSeeds":[1],"dialect":"alphafold3","version":1}' > input/test.json

# Run with timing
time docker run --rm --gpus all \
  -v ./weights:/weights:ro -v ./db:/db:ro \
  -v ./input:/input:ro -v ./output:/output \
  alphafold3:arm64 \
  uv run python3 run_alphafold.py \
    --model_dir=/weights --db_dir=/db \
    --input_dir=/input --output_dir=/output

Timing logs:
MSA search: Finished Jackhmmer (<db>) in X seconds
Model inference: Running model inference with seed 1 took X seconds
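
To collect these timings from a saved run log, a short parser along the following lines may help. This is a sketch, not part of AlphaFold3: the two regular expressions simply mirror the log messages quoted above (the exact wording may differ between versions), and `parse_timings` and the sample log are hypothetical.

```python
import re

# Patterns mirror the two log messages quoted above; treat the exact wording
# as an assumption that may need adjusting for a given AlphaFold3 version.
JACKHMMER_RE = re.compile(
    r"Finished Jackhmmer \((?P<db>[^)]+)\) in (?P<secs>\d+(?:\.\d+)?) seconds"
)
INFERENCE_RE = re.compile(
    r"Running model inference with seed (?P<seed>\d+) took (?P<secs>\d+(?:\.\d+)?) seconds"
)

def parse_timings(log_text):
    """Return ({database: seconds}, {seed: seconds}) found in a run log."""
    msa = {m["db"]: float(m["secs"]) for m in JACKHMMER_RE.finditer(log_text)}
    inference = {
        int(m["seed"]): float(m["secs"]) for m in INFERENCE_RE.finditer(log_text)
    }
    return msa, inference

# Hypothetical log excerpt built from figures in this article.
sample = """\
Finished Jackhmmer (uniref90) in 566 seconds
Finished Jackhmmer (uniprot) in 738 seconds
Running model inference with seed 1 took 544 seconds
"""
msa, inference = parse_timings(sample)
print(msa, inference)
```

One way to get a log file to feed it is to append `2>&1 | tee run.log` to the `docker run` command and pass the file's contents to `parse_timings`.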

Conclusion

The DGX Spark shows that accessible desktop hardware can deliver production-quality results. While inference runs about 8.3x slower than on a dedicated RTX PRO 6000, the predictions are structurally identical, with the same accuracy and the same confidence scores, just with longer wait times.

For research labs, teaching labs, and academic institutions, this matters more than raw speed. A $3,500 desktop system that fits on a desk and draws about 40W on the GPU (roughly 60W for the whole system) can run the same AlphaFold3 workload as a workstation that costs 3-4x as much, needs a dedicated rack, and pulls roughly 3x the power.

Real-world protein structure prediction rarely demands instant turnaround. If your workflow involves queueing a few samples overnight, a runtime of about 25 minutes per prediction makes DGX Spark not just viable but genuinely attractive, especially when deploying at scale, where the power and space savings compound.

Source:
https://www.mutek.com/alphafold3-performance-benchmark-dgx-spark-vs-x86-rtx-pro-6000
This article may be cited in other works. Please link to this article as the original source.
