In July 2022, I bought an Apple M1 Max Mac Studio - the base model.
(Brief specs: 10-core CPU, 24-core GPU, 16-core Neural Engine, 32GB RAM.)
Today, 2.5 years later, I've only just discovered how well the M1 Max handles LLMs - particularly the Qwen 32B Q4_K_M model!
A demonstration of its performance, via keyword extraction and glossing:
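In code terms, the kind of call I'm throwing at it looks roughly like this - a minimal Python sketch assuming the model is served locally through Ollama. The model tag and endpoint here are my assumptions for illustration, not details from my actual setup:

```python
# A minimal sketch of a keyword-extraction + glossing call against a local LLM.
# Assumes the model is served via Ollama on its default port; the model tag
# "qwen:32b" is an assumption - substitute whatever your runtime calls it.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default generate endpoint

def extract_keywords(text: str, model: str = "qwen:32b") -> str:
    prompt = (
        "Extract the key terms from the text below, then gloss each one "
        "in a single sentence.\n\n" + text
    )
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # Non-streaming responses return the full completion in "response"
        return json.loads(resp.read())["response"]

print(extract_keywords("The lessor shall indemnify the lessee against ..."))
```

Any local server would do much the same job; only the endpoint and payload shape would change.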
Not bad! For comparison, I have a separate 64GB M4 Max MacBook Pro and a roaring RTX 4090 - but this is definitely a respectable speed.
The only problem: context size. With all the layers offloaded to the GPU, my Studio can no longer fit Qwen-32B with a 12K context. (Surprisingly, the model does load, but it outputs garbage along with 'out of memory' errors. That's not really the Studio's fault - I chose the base model, after all.)
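For a sense of why 12K tips it over, here's a back-of-the-envelope KV-cache calculation. The architecture numbers are my assumptions for a Qwen-32B-class model (64 layers, 8 grouped-query KV heads of dimension 128, fp16 cache) - check the GGUF metadata for the exact values:

```python
# Rough KV-cache memory math for a 32B-class model. All constants below are
# assumptions for a Qwen-32B-style architecture, not read off this exact GGUF.

N_LAYERS   = 64    # transformer layers (assumed)
N_KV_HEADS = 8     # grouped-query attention KV heads (assumed)
HEAD_DIM   = 128   # dimension per head (assumed)
BYTES      = 2     # fp16 cache entries

def kv_cache_gib(context_tokens: int) -> float:
    """K and V caches: one entry per layer, per KV head, per token."""
    total = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES * context_tokens
    return total / 2**30

MODEL_GIB = 20.0   # ballpark Q4_K_M weight size for a 32B model (assumed)

for ctx in (8192, 12288):
    print(f"{ctx:>6} tokens: {kv_cache_gib(ctx):.2f} GiB KV cache "
          f"+ ~{MODEL_GIB:.0f} GiB weights")
# -> 8K context: ~2 GiB cache; 12K context: ~3 GiB cache
```

Under those assumptions, ~20GB of weights plus ~3GB of cache plus compute buffers brushes up against macOS's default GPU working-set limit on a 32GB machine (roughly two-thirds to three-quarters of unified memory, reportedly adjustable via the iogpu.wired_limit_mb sysctl) - which would explain the garbage output above.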
For now, I've set the context size to 8K - which is good enough for general legal work, drafting, writing, glossaries, ideas, prompts, and medium-sized texts.
I use my Mac Studio mainly for processing work, and it's a fantastic Swiss Army knife to throw things at when my main machines are busy with heavier loads (image generation, audio processing, etc.).
So I'm happy to have discovered even more daily uses for my 2.5-year-old machine - this time as an LLM host!