Apple
Mac Mini M4 with 16GB unified memory — the most affordable entry point for local AI inference on Apple Silicon.
Mac Mini M4 with 24GB unified memory — run 14B parameter models locally at Q4 quantization.
Mac Mini M4 with 32GB unified memory — the sweet spot for running 20B+ parameter models on the base M4 chip.
Mac Mini M4 Pro with 24GB unified memory — 16-core GPU with 273 GB/s bandwidth for faster local inference.
Mac Mini M4 Pro with 48GB unified memory — a compact local inference powerhouse that can run Llama 3.1 70B at Q4 quantization.
Mac Mini M4 Pro with 64GB unified memory — run 45B+ parameter models locally with 273 GB/s bandwidth.
Mac Studio M3 Ultra with 256GB unified memory — a high-capacity Apple Silicon machine for running 180B+ parameter models locally.
Mac Studio M3 Ultra with 96GB unified memory — 60-core GPU with 819 GB/s bandwidth for high-throughput local inference.
Mac Studio M4 Max with 128GB unified memory and 40-core GPU — run 90B+ parameter models at 546 GB/s bandwidth.
Mac Studio M4 Max with 36GB unified memory — 32-core GPU with 410 GB/s bandwidth for high-speed local inference.
Mac Studio M4 Max with 48GB unified memory — run 33B parameter models at high speed with 546 GB/s bandwidth.
Mac Studio M4 Max with 64GB unified memory — run 45B+ parameter models locally with 546 GB/s bandwidth.
MacBook Pro M4 Max with 128GB unified memory — run 70B+ models at 8-bit quantization on a laptop.
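The memory and bandwidth figures above can be sanity-checked with a short sketch. The constants here (10% runtime overhead, a 75% default GPU share of unified memory) are illustrative assumptions, not Apple-published figures; actual requirements vary with runtime, context length, and KV cache size.

```python
# Back-of-envelope sizing for the listings above. The overhead and
# GPU-fraction constants are assumptions for illustration only.

def model_size_gb(params_b: float, bits: int) -> float:
    """Approximate weight footprint: params (billions) x bits per weight / 8."""
    return params_b * bits / 8

def fits(params_b: float, bits: int, ram_gb: float,
         gpu_fraction: float = 0.75, overhead: float = 1.1) -> bool:
    """Crude fit check: weights plus ~10% working overhead must fit in the
    share of unified memory the GPU is allowed to use (raisable on macOS)."""
    return model_size_gb(params_b, bits) * overhead <= ram_gb * gpu_fraction

def decode_tok_s_upper_bound(bandwidth_gb_s: float,
                             params_b: float, bits: int) -> float:
    """Decoding is roughly memory-bandwidth bound: each generated token
    streams all weights once, so tok/s <= bandwidth / model size."""
    return bandwidth_gb_s / model_size_gb(params_b, bits)

# 14B at Q4 (~7 GB) fits easily in a 24GB Mac Mini M4:
print(fits(14, 4, 24))                      # True
# 70B at Q4 (~35 GB) is tight on 48GB at the default GPU fraction...
print(fits(70, 4, 48))                      # False
# ...but fits once the GPU memory limit is raised:
print(fits(70, 4, 48, gpu_fraction=0.9))    # True
# Bandwidth-bound decode ceiling for 70B Q4 at 273 GB/s:
print(round(decode_tok_s_upper_bound(273, 70, 4), 1))  # 7.8
```

The last line illustrates why bandwidth matters as much as capacity: the same 70B Q4 model tops out near 2x faster on an 546 GB/s M4 Max than on a 273 GB/s M4 Pro.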