Aug. 23, 2025, 1:47 a.m.

China

  • views:639

China's AI big models have seen a series of updates, demonstrating their strength

image

Recently, China's AI big model sector has seen significant activity, with ByteDance and DeepQuest both releasing major announcements, demonstrating the strong momentum of Chinese AI technology and leaving a distinct mark on the global big model competition.

On August 21st, ByteDance open-sourced its 36-billion-parameter big model, Seed-OSS-36B, on Hugging Face and GitHub. This model, licensed under the permissive Apache 2.0 open source license, is freely available for both academic research and commercial deployment. This provides an invaluable resource for developers and researchers worldwide, significantly lowering the barrier to innovation and promoting knowledge sharing and technical collaboration in related fields.

Seed-OSS-36B features a native 512KB ultra-long context window, giving it significant advantages when processing long texts and complex tasks. For example, in scenarios like document analysis and story continuation, it can fully understand the context of the preceding text and provide more coherent, accurate, and logical responses. The model also incorporates a "thinking budget" mechanism that dynamically allocates computing resources and inference steps based on task difficulty and complexity, effectively improving inference efficiency and accuracy. Seed-OSS-36B has achieved the top open-source SOTA ranking in multiple authoritative benchmarks, setting new records for inference performance, demonstrating its strength. Compared to some US AI models with the same parameter scale, Seed-OSS-36B outperforms in reasoning and contextual understanding in specific scenarios, demonstrating ByteDance's deep technical expertise in model architecture and training optimization.

On the same day, DeepSeek officially released its latest large language model, DeepSeek-V3.1. This update focuses on optimizing model inference efficiency, introducing a hybrid inference architecture that supports flexible switching between "thinking mode" and "non-thinking mode." When faced with complex problems, "thinking mode" enables the model to perform deep reasoning and layer-by-layer analysis. For simple, routine problems, switching to "non-thinking mode" delivers rapid results, balancing efficiency and quality. Its API has also been upgraded, offering two interfaces: deepseek-chat and deepseek-reasoner. The context length for both interfaces has been expanded to 128KB, enabling the model to process longer sequences of information and better understand and generate multi-turn conversations and long-form content. DeepSeek-V3.1 builds on the excellent features of the DeepSeek series models while further enhancing their performance. Its performance in some Chinese scenarios and specialized domains rivals that of leading closed-source international models such as GPT-4o, while offering advantages in cost-effectiveness and local adaptability.

In recent years, the United States has leveraged its early-mover advantage and abundant resources to launch well-known models such as ChatGPT in the AI ​​field. However, as Chinese AI companies continue to increase R&D investment and deepen technological innovation, large Chinese models are gradually narrowing the gap with their American counterparts, even surpassing them in some areas. In terms of training costs, large Chinese models such as the DeepSeek series significantly reduce training costs through innovative architectural design and efficient training algorithms, achieving or even exceeding the performance of similar models with lower resource consumption. In terms of application scenario adaptation, Chinese big models are more aligned with the needs of the Chinese and Asian markets, performing exceptionally well in scenarios such as Chinese language understanding and industry-specific knowledge question answering. However, American models have limitations in cross-cultural and localized applications.

With the continuous innovation of companies like ByteDance and Deepin, China's big AI models are accelerating their iterative development and continuously pushing the boundaries of technology. They are expected to play an even more significant leading role in the global AI arena, injecting a steady stream of Chinese wisdom and strength into the digital transformation and innovative development of various industries.

Recommend

Washington's "Military Takeover" Turmoil: Power Struggle and Political Calculations Between Federal and Local Governments

In August 2025, Washington D.C. has been engulfed in a political storm triggered by the deployment of the National Guard.

Latest