
DeepSeek’s 685B Models Rival GPT‑5: China’s Sparse Attention
DeepSeek’s new 685B‑parameter models, powered by sparse attention and released open‑source, rival GPT‑5 on reasoning and coding, potentially democratizing frontier AI access globally.
DeepSeek released its new 685‑billion‑parameter models - DeepSeek‑V3.2 and DeepSeek‑V3.2‑Speciale - claiming performance comparable to top‑tier models like GPT‑5, while introducing a major efficiency upgrade through a novel “sparse attention” mechanism. Rather than attending to every token in a long context equally, the model selectively focuses on the most relevant parts, dramatically cutting compute and inference cost when handling large documents or extended codebases.
The key technical lift is called DeepSeek Sparse Attention (DSA). Standard attention cost grows quadratically with context length because every token attends to every other token; DSA sidesteps that by scoring tokens cheaply and running full attention only over a selected subset, letting V3.2 handle contexts up to 128,000 tokens (think hundreds of pages or a massive codebase) with far less computational overhead. According to published numbers, this roughly halves inference cost compared to previous versions, making heavy‑duty long‑context workflows more feasible and cheaper.
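To make the mechanism concrete, here is a minimal NumPy sketch of top‑k sparse attention for a single query. Everything in it - the function name, the cheap scoring pass, the choice of k - is an illustrative assumption; DeepSeek’s real DSA uses its own learned indexer and optimized kernels, which this toy does not reproduce.

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=2048):
    """Toy single-query sparse attention: score all keys cheaply, keep only
    the k highest-scoring positions, and run softmax attention over that
    subset. A generic top-k sketch, not DeepSeek's actual DSA kernel."""
    scores = K @ q / np.sqrt(q.shape[-1])    # one score per key, shape (seq_len,)
    idx = np.argpartition(scores, -k)[-k:]   # indices of the k best keys, O(seq_len)
    sel = scores[idx]
    weights = np.exp(sel - sel.max())        # numerically stable softmax...
    weights /= weights.sum()                 # ...over the selected subset only
    return weights @ V[idx]                  # mix k values instead of seq_len

# Toy usage: a 128,000-token context where the query only mixes 2,048 values.
rng = np.random.default_rng(0)
seq_len, d = 128_000, 64
K = rng.standard_normal((seq_len, d)).astype(np.float32)
V = rng.standard_normal((seq_len, d)).astype(np.float32)
q = rng.standard_normal(d).astype(np.float32)
print(topk_sparse_attention(q, K, V).shape)  # (64,)
```

The payoff shows up per query: the expensive softmax‑and‑value‑mix step shrinks from 128,000 keys to 2,048, roughly a 62× reduction, while the cheap scoring pass stays linear in context length. How much of that translates into the reported cost savings depends on how lightweight DSA’s real indexer is, which the sketch above doesn’t model.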
The fact that DeepSeek released the models under an open-source license (MIT) makes this even more interesting. Instead of locking frontier models behind paywalls or usage quotas, this move lowers the barrier for researchers, developers, and educational institutions worldwide, enabling anyone to experiment, build, or deploy without paying premium API fees.
Real‑world implications could be significant: from analyzing huge technical documents, to building AI-assisted tools for coding, research, legal review or long‑form content generation - now at lower cost and with global access. For developers, educators, or institutions in places with fewer resources, this could shift frontier‑scale AI from being a big‑budget luxury to a practical utility. China’s AI scene just dropped a serious wildcard. Maybe more than one.
The Takeaway
DeepSeek‑V3.2 shows that high‑parameter, high‑performance AI doesn’t always need blockbuster compute or sky‑high fees. With smart design (sparse attention, open licensing, and long‑context scaling), frontier capabilities might finally become more global, more accessible, and available to a more diverse user base. It’s not hype. It’s a potentially meaningful shift in how AI spreads across economies and geographies.
Published December 5, 2025 • Updated December 6, 2025