System

Apple Mac Mini M4 (16GB)

Mac Mini M4 with 16GB unified memory — the most affordable entry point for local AI inference on Apple Silicon.

Apple Mac Mini M4 (24GB)

Mac Mini M4 with 24GB unified memory — run 14B parameter models locally at Q4 quantization.

Apple Mac Mini M4 (32GB)

Mac Mini M4 with 32GB unified memory — the sweet spot for running 20B+ parameter models on the base M4 chip.

Apple Mac Mini M4 Pro (24GB)

Mac Mini M4 Pro with 24GB unified memory — 16-core GPU with 273 GB/s bandwidth for faster local inference.

Apple Mac Mini M4 Pro (48GB)

Mac Mini M4 Pro with 48GB unified memory — a compact local inference powerhouse. Run Llama 3.1 70B Q4 locally.

Apple Mac Mini M4 Pro (64GB)

Mac Mini M4 Pro with 64GB unified memory — run 45B+ parameter models locally with 273 GB/s bandwidth.

Apple Mac Studio M3 Ultra (256GB)

Mac Studio M3 Ultra with 256GB unified memory — a high-capacity Apple Silicon machine for running 180B+ parameter models locally.

Apple Mac Studio M3 Ultra (96GB)

Mac Studio M3 Ultra with 96GB unified memory — 60-core GPU with 819 GB/s bandwidth for high-throughput local inference.

Apple Mac Studio M4 Max (128GB)

Mac Studio M4 Max with 128GB unified memory and 40-core GPU — run 90B+ parameter models at 546 GB/s bandwidth.

Apple Mac Studio M4 Max (36GB)

Mac Studio M4 Max with 36GB unified memory — 32-core GPU with 410 GB/s bandwidth for high-speed local inference.

Apple Mac Studio M4 Max (48GB)

Mac Studio M4 Max with 48GB unified memory — run 33B parameter models at high speed with 546 GB/s bandwidth.

Apple Mac Studio M4 Max (64GB)

Mac Studio M4 Max with 64GB unified memory — run 45B+ parameter models locally with 546 GB/s bandwidth.

Apple MacBook Pro M4 Max (128GB)

MacBook Pro M4 Max with 128GB unified memory — run 70B+ parameter models at 8-bit quantization on a laptop.
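The model-size claims above all follow from the same back-of-envelope arithmetic: a model's weight footprint is roughly parameter count × bytes per weight, plus runtime overhead, and only part of unified memory is available to the GPU. A minimal sketch of that check — the 20% overhead factor, the ~75% usable-memory default, and 4.5 effective bits/weight for Q4 are assumptions, not measured values:

```python
def model_footprint_gb(params_b: float, bits_per_weight: float,
                       overhead: float = 1.2) -> float:
    """Approximate memory need in GB: params (billions) x bytes per weight,
    plus ~20% for KV cache and runtime buffers (assumed overhead)."""
    return params_b * (bits_per_weight / 8) * overhead

def fits(params_b: float, bits_per_weight: float, ram_gb: float,
         usable_fraction: float = 0.75) -> bool:
    """macOS limits GPU-addressable unified memory to roughly 75% of total
    by default (assumption; the limit can be raised via sysctl)."""
    return model_footprint_gb(params_b, bits_per_weight) <= ram_gb * usable_fraction

# 14B model at Q4 (~4.5 bits/weight) on a 24GB Mac Mini M4: ~9.5 GB needed
print(fits(14, 4.5, 24))   # True

# 70B model at FP16 (16 bits/weight) on a 128GB machine: ~168 GB needed
print(fits(70, 16, 128))   # False
```

This is why the 128GB configurations are described with 8-bit rather than full-precision 70B models: at FP16 the weights alone exceed total memory, while Q8 (~70 GB) leaves headroom for context.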