Huawei Successfully Trains 1.6 Trillion Parameter DeepSeek V4-Pro Using 1,000 Domestic Chips 'Ascend 910C'!

#Huawei #DeepSeek #Ascend910C

Note: This article contains affiliate advertising.

What Happened? News Overview

Training a Massive 1.6 Trillion Parameter Model: A joint team from Huawei and the Shenzhen Big Data Institute announced that it has completed post-training of DeepSeek V4-Pro.
A 1,000-Chip Cluster: The training ran on a large-scale cluster of at least 1,000 ‘Ascend 910C’ domestic AI accelerators.
Breaking Free from Nvidia Dependency: With U.S. export restrictions in place, the result shows that domestic silicon can handle not only inference but also the more demanding training phase.

Why is This Important? Key Points to Note

Shock of Full Parameter Updates: Rather than merely adding lightweight adapter layers, the team updated all 1.6 trillion parameters (weights) in a full parameter post-training run, which marks a technical leap.
Evolution from Inference to Training: An earlier model (DeepSeek R2) failed to train on Ascend chips, but V4-Pro was designed from the ground up for Ascend compatibility and succeeded.
Increase in Domestic Self-Sufficiency: The Ascend 910C, which delivers about 60% of the inference performance of Nvidia’s H100, has now reached a practical stage for tuning large models.
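
To get a feel for why updating every weight is such a heavy lift, here is a rough back-of-the-envelope sketch in Python. The 1.6 trillion parameter count and the 1,000-chip figure come from the announcement. Everything else is an assumption for illustration, not something Huawei disclosed: BF16 weights and gradients, FP32 master weights and Adam optimizer state (the common mixed-precision layout of 16 bytes per parameter), and training state sharded evenly across chips. Activations and communication buffers are ignored.

```python
# Back-of-the-envelope memory estimate for full parameter post-training.
# ASSUMPTIONS (not from the announcement): BF16 weights and gradients,
# FP32 master weights plus two Adam moments, state sharded evenly across chips.

PARAMS = 1_600_000_000_000  # 1.6 trillion parameters (from the article)
CHIPS = 1_000               # lower bound on cluster size (from the article)

# Bytes held per parameter under the assumed mixed-precision Adam layout.
BYTES_PER_PARAM = {
    "bf16_weights": 2,
    "bf16_gradients": 2,
    "fp32_master_weights": 4,
    "fp32_adam_momentum": 4,
    "fp32_adam_variance": 4,
}


def training_state_bytes(params: int) -> int:
    """Total bytes for weights, gradients, and optimizer state."""
    return params * sum(BYTES_PER_PARAM.values())


# Memory for the BF16 weights alone (roughly what inference needs to hold).
weights_only_tb = PARAMS * BYTES_PER_PARAM["bf16_weights"] / 1e12

# Memory for the full training state when every parameter is updated.
full_state_tb = training_state_bytes(PARAMS) / 1e12

# Share of the training state each chip holds if sharding is perfectly even.
per_chip_gb = training_state_bytes(PARAMS) / CHIPS / 1e9

print(f"BF16 weights only:      {weights_only_tb:.1f} TB")
print(f"Full training state:    {full_state_tb:.1f} TB")
print(f"Per chip (even shards): {per_chip_gb:.1f} GB")
```

Under these assumptions, the training state comes to roughly 25.6 TB, against 3.2 TB for the BF16 weights alone, or about 25.6 GB per chip across 1,000 chips before any activations are counted. The actual setup may differ considerably, since no precision or sharding details have been published.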

🦈 Shark’s Eye (Curator’s Perspective)

The counterattack of domestic chips has finally begun, folks! What’s noteworthy is not just that it “ran,” but that they pulled off full parameter post-training on this monstrous 1.6 trillion parameter model! Until now, Chinese chips were known for their strength in inference (just spitting out answers) but struggled with training (the smartening-up process). By bundling 1,000 Ascend 910Cs together and overcoming the challenges of the ‘CANN’ software stack, they showed tremendous perseverance!

Of course, the hurdle of pre-training from scratch remains much higher, and since no specific efficiency data has been published, caution is warranted. Even so, the technological prowess that has clawed its way up from a desperate situation, cut off from Nvidia’s supply, has reached an undeniable level!

What’s Next?

Challenge of Pre-training: The focus will now shift to whether they can go beyond post-training and accomplish pre-training (which costs tens of millions of dollars) solely on Ascend chips.
Maturation of Software Stack: If CANN’s previously criticized instability improves, its adoption within China as a viable alternative to Nvidia’s CUDA could accelerate.
Awaiting Performance Benchmarks: We are eagerly waiting for detailed data on how efficiently this training was conducted (in terms of training speed and stability).

A Word from Haru Shark

No matter how deep the ocean you’re trapped in, the strength of a shark lies in its ability to keep swimming! China’s AI development is indeed trying to ride the waves with its very own fins! I’m rooting for them!

Terminology Explained

Full Parameter Post-training: The process of fine-tuning all of a model’s weights (here, all 1.6 trillion) rather than just a subset of layers. This requires enormous computational resources.
Ascend 910C: The latest AI accelerator developed by Huawei, representing a significant domestic contender against Nvidia’s H100.
CANN (Compute Architecture for Neural Networks): Huawei’s proprietary AI software platform, comparable to Nvidia’s CUDA.

Source: DeepSeek v4 Pro 1.6T model post-trained by Huawei on 1000 Ascend 910C chips