Running DeepSeek-R1 Distilled Locally: A Game Changer
The DeepSeek-R1 distilled Qwen 32B quantized to FP4 (try saying that name three times fast!) is a remarkable achievement in local AI deployment. What's truly incredible is that I can run this model on my computer with a single RTX 4090, and it outperforms GPT-3.5 and GPT-4, models that were state of the art just a couple of years ago.
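To see why a single 24 GB card is enough, here's some back-of-the-envelope arithmetic. The ~16 GB weight figure follows directly from 32B parameters at 4 bits each; the KV-cache allowance is my own rough assumption and varies with context length and runtime:

```python
# Rough VRAM estimate for a 32B-parameter model quantized to 4 bits per weight.
# The KV-cache figure is an illustrative assumption; real usage depends on
# context length, batch size, and the inference runtime's own overhead.

params = 32e9                   # parameter count
bits_per_weight = 4             # 4-bit quantization
weight_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

kv_cache_gb = 4                 # ballpark allowance for KV cache + overhead

print(f"weights: ~{weight_gb:.0f} GB")  # ~16 GB
print(f"total:   ~{weight_gb + kv_cache_gb:.0f} GB vs 24 GB on an RTX 4090")
```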
Performance Context
While my vibe check doesn't have it beating GPT-4o or o1 yet, and Claude 3.5 remains the superior coding model, that doesn't diminish how remarkable this is. Having a model of this caliber running entirely offline on my local machine, with no data center needed, is a significant milestone.
It's worth noting that it's slower and doesn't perform quite as well as the distilled Llama 70B version running on Groq. However, that's not exactly a fair comparison: Groq has data centers filled with custom LPU chips designed specifically for this purpose.
Standout Use Case
I've found that these "thinking models" excel particularly at tool/function calling. Even the smaller distilled variants perform impressively well at this task, making them especially useful for programmatic applications; see the sketch below.
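For the curious, here's a minimal sketch of what tool calling against a local model can look like. It assumes the model is served through an OpenAI-compatible endpoint (Ollama's default port, in this case), and the `deepseek-r1:32b` tag and `get_weather` function are stand-ins for whatever your own setup uses, not a prescribed configuration:

```python
# Minimal tool-calling sketch against a locally served model.
# Assumptions: Ollama (or another OpenAI-compatible server) is running on
# localhost:11434, and "deepseek-r1:32b" is the local model tag you pulled.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

# Describe a hypothetical get_weather function the model may choose to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-r1:32b",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools,
)

# If the model decided to call the tool, its structured arguments land here.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```

What makes these models handy here is that the arguments come back as structured JSON rather than free-form text, so they can be wired straight into existing code paths.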
This kind of capability running locally represents a significant step forward in democratizing AI technology, allowing developers to work with powerful models without relying on cloud services or external APIs.
As always, feel free to connect with me on LinkedIn or follow me on X @groffdev.