Apple Silicon LLM Speed

Select a chip to see how fast it runs local LLMs at different model sizes. Token generation is memory-bandwidth-bound: higher bandwidth means faster output.
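Because generating each token streams the full set of model weights through memory, a rough upper bound on speed is bandwidth divided by model size. This is a back-of-envelope sketch, not one of the measured numbers below; the 400 GB/s figure is Apple's published M2 Max bandwidth, and ~4 GB is an approximate weight size for a 7B model at Q4.

```python
def est_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper-bound estimate of token generation speed.

    Each generated token reads all model weights from memory once,
    so throughput is roughly bandwidth / model size. Ignores KV cache
    traffic, compute overhead, and real-world efficiency losses.
    """
    return bandwidth_gb_s / model_gb

# Example: M2 Max (400 GB/s) with a 7B model quantized to Q4 (~4 GB)
print(est_tokens_per_sec(400, 4))  # → 100.0 tok/s upper bound
```

Measured numbers land below this ceiling, but the linear relationship between bandwidth and speed holds across the chip lineup.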

Data compiled from: llama.cpp #4167 (Gerganov) • GPU-Benchmarks-on-LLM-Inference (Dai) • LLM Speed Comparison (Singh) • Apple MLX M5 Research • like2byte.com M4 Benchmarks • Awesome Agents M4 Max • Gemma 3 on M3 Ultra
All speeds are token generation (not prompt processing), at Q4 quantization unless noted. Only exact measured numbers are shown — no ranges or estimates.