Reality Check: Distilled Models ≠ Full Model

I saw a post today advertising a deployment of DeepSeek R1, but what they had actually deployed was the 1.5B Qwen distillation, a model that is nowhere near the full DeepSeek R1's performance. I went through DeepSeek's own benchmarks from their R1 paper and pulled three common ones to show the differences between the full R1, the distilled versions, OpenAI's models, QwQ-32B-Preview, and Claude 3.5 Sonnet. Take a look for yourself:

| Model | AIME 2024 (pass@1) | MATH-500 (pass@1) | GPQA Diamond (pass@1) |
|---|---|---|---|
| DeepSeek-R1 (full) | 79.8% | 97.3% | 71.5% |
| OpenAI-o1-1217 | 79.2% | 96.4% | 75.7% |
| DeepSeek-R1-Distill-Qwen-32B | 72.6% | 94.3% | 62.1% |
| DeepSeek-R1-Distill-Llama-70B | 70.0% | 94.5% | 65.2% |
| DeepSeek-R1-Distill-Qwen-14B | 69.7% | 93.9% | 59.1% |
| OpenAI-o1-mini | 63.6% | 90.0% | 60.0% |
| DeepSeek-R1-Distill-Qwen-7B | 55.5% | 92.8% | 49.1% |
| DeepSeek-R1-Distill-Llama-8B | 50.4% | 89.1% | 49.0% |
| QwQ-32B-Preview | 50.0% | 90.6% | 54.5% |
| DeepSeek-R1-Distill-Qwen-1.5B | 28.9% | 83.9% | 33.8% |
| Claude-3.5-Sonnet-1022 | 16.0% | 78.3% | 65.0% |
| GPT-4o-0513 | 9.3% | 74.6% | 49.9% |

Source: DeepSeek-R1 Distilled Model Evaluation

Yes, distilled models are cheap to run, but cheap compute ≠ good performance. The 1.5B model is not DeepSeek R1. It scores 28.9% on AIME 2024 against the full model's 79.8%, and deploying it under the R1 name without disclosing that is misleading.
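To put a number on "significantly weaker", here is a minimal Python sketch that computes how many percentage points each distilled model trails the full R1 on AIME 2024. The scores are copied from the table above; the dictionary and the function name `gap_to_full` are my own illustrative choices, not anything from DeepSeek.

```python
# AIME 2024 pass@1 scores (percent), copied from the table in this post.
AIME_2024 = {
    "DeepSeek-R1 (full)": 79.8,
    "DeepSeek-R1-Distill-Qwen-32B": 72.6,
    "DeepSeek-R1-Distill-Llama-70B": 70.0,
    "DeepSeek-R1-Distill-Qwen-14B": 69.7,
    "DeepSeek-R1-Distill-Qwen-7B": 55.5,
    "DeepSeek-R1-Distill-Llama-8B": 50.4,
    "DeepSeek-R1-Distill-Qwen-1.5B": 28.9,
}


def gap_to_full(model: str, baseline: str = "DeepSeek-R1 (full)") -> float:
    """Return the percentage-point gap between the baseline and `model`."""
    return round(AIME_2024[baseline] - AIME_2024[model], 1)


if __name__ == "__main__":
    for name in AIME_2024:
        print(f"{name}: {gap_to_full(name):.1f} points behind full R1")
```

On these numbers the 1.5B distillation trails the full model by 50.9 points, while the 32B and 70B distillations are within roughly 7 to 10 points.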

Groq & The "Right" Distilled Model

Groq is offering DeepSeek-R1-Distill-Llama-70B on their LPUs, claiming it's the best balance of performance and cost at scale. Based on the benchmarks, it leads the distilled variants on MATH-500 and GPQA Diamond (Distill-Qwen-32B edges it out on AIME 2024), but it's still not the full DeepSeek R1. Pricing from Groq: $0.75 per million input tokens and $0.99 per million output tokens.

Full DeepSeek R1 Availability & Pricing

Right now, the only provider I've seen offering the full DeepSeek R1 is Together AI, and they charge $7 per million tokens, compared to $15 per million for OpenAI o1. Meanwhile, DeepSeek AI's own API is overloaded and stores data in China.
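If you want to see what these prices mean for your own workload, a rough estimate is easy to script. The sketch below uses only the prices quoted in this post. Groq's price is split into input and output; for Together AI and OpenAI o1 I only have a single per-million figure, so the code applies it to both input and output tokens. That flat rate is an assumption (providers often charge more for output), so check the current price sheets before relying on it. The `Price` class and `estimate_cost` function are illustrative names of my own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Prices as quoted in this post. Only Groq's is split by direction; the flat
# rate used for the other two is an ASSUMPTION, not a published price sheet.
PRICES = {
    "Groq: DeepSeek-R1-Distill-Llama-70B": Price(0.75, 0.99),
    "Together AI: DeepSeek R1 (full)": Price(7.00, 7.00),
    "OpenAI o1": Price(15.00, 15.00),
}


def estimate_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload at the given per-million prices."""
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 1M output tokens.
    for name, price in PRICES.items():
        print(f"{name}: ${estimate_cost(price, 2_000_000, 1_000_000):.2f}")
```

For that hypothetical 2M-in, 1M-out workload, the Groq distillation comes to about $2.49 versus about $21 for the full R1 on Together AI under the flat-rate assumption. Whether that saving is worth the benchmark gap depends on your use case.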

Amazon and Microsoft employees have hinted on LinkedIn that official DeepSeek R1 support is coming. The question is: at what price? Hopefully, it stays competitive.

Takeaway

Cheaper, smarter LLMs are great for competition, but do your research before jumping in. The full DeepSeek R1 is a strong contender, but the distilled versions are not the same thing, so don't let anyone convince you otherwise. Find the price and performance trade-off that works for your use case, and a privacy policy that meets your requirements.

As always, feel free to connect with me on LinkedIn or follow me on X @groffdev.