Huawei’s Ascend 910C: A Challenger in the AI Inference Arena
This blog post was automatically generated (and translated). It is based on the following original, which I selected for publication on this blog:
DeepSeek research suggests Huawei’s Ascend 910C delivers 60% of Nvidia H100 inference performance | Tom’s Hardware.
While Huawei's HiSilicon Ascend 910C processor may not lead in AI training performance, it presents a compelling option for AI inference, reportedly delivering about 60% of the inference performance of Nvidia's H100. This raises the question: Can the Ascend 910C effectively reduce China's dependence on Nvidia GPUs?
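As a back-of-envelope reading of the 60% figure, the sketch below estimates how many Ascend 910C units would be needed to match a given number of H100s on inference throughput. It assumes throughput scales perfectly linearly with the number of accelerators, which real deployments rarely achieve, and the function name and example fleet sizes are illustrative rather than taken from the original report.

```python
def ascend_units_to_match(h100_count: int, relative_perf_percent: int = 60) -> int:
    """Estimate how many 910C units match `h100_count` H100s on inference.

    `relative_perf_percent` is the reported per-chip inference performance
    of the 910C relative to the H100 (60 by default). Linear scaling is an
    idealizing assumption, not a measured result.
    """
    if h100_count < 0:
        raise ValueError("h100_count must be non-negative")
    if relative_perf_percent <= 0:
        raise ValueError("relative_perf_percent must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-h100_count * 100 // relative_perf_percent)


if __name__ == "__main__":
    for fleet in (1, 3, 100):
        print(fleet, "H100 ->", ascend_units_to_match(fleet), "Ascend 910C")
```

Under these assumptions, matching one H100 takes two 910C units, and larger fleets approach a ratio of roughly 1.67 to 1. Interconnect, memory capacity, and software overhead would change the real number.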
Inference Prowess and Optimization
DeepSeek researchers indicate that the Ascend 910C exceeded expectations in inference performance. Further efficiency gains are possible through manual optimization of CANN kernels, CANN being Huawei's counterpart to Nvidia's CUDA. DeepSeek's native support and PyTorch repository streamline CUDA-to-CANN conversion, simplifying integration with Huawei's hardware. This suggests that Huawei's AI processor capabilities are evolving rapidly despite U.S. sanctions.
Training Limitations
Despite this progress, AI training remains a challenge. According to Yuchen Jin, long-term training reliability is a critical weakness of Chinese processors. Nvidia's hardware and software ecosystem, built up over two decades, is a significant hurdle for any challenger. While inference performance can be improved through optimization, sustained training workloads require continued improvements in Huawei's hardware and software infrastructure.
Chiplet Design and Manufacturing
Like the original Ascend 910, the Ascend 910C uses chiplet packaging. The compute chiplet, with approximately 53 billion transistors, is manufactured by SMIC on its second-generation 7nm-class process technology (N+2). Huawei and SMIC have narrowed the gap to where TSMC's capabilities stood in 2019-2020 and produced a chip that can compete with Nvidia's A100 and H100, but the Ascend 910C may still not be the best choice for AI training.
Future Trajectory
Experts speculate that the importance of Nvidia's software ecosystem may diminish as AI models converge towards Transformer architectures. DeepSeek's expertise in hardware and software optimization could further lessen reliance on Nvidia, offering a more cost-effective alternative, particularly for inference. However, overcoming training stability challenges and refining AI computing infrastructure remain crucial for China to compete globally.
Strategic Implications
The emergence of Huawei's Ascend 910C raises several questions about the future of AI hardware. Will Nvidia adapt its product line to differentiate between inference and HPC applications? As Chinese companies continue to develop and optimize their AI hardware, can they overcome the challenges of training stability and establish a competitive ecosystem? The answers to these questions will shape the landscape of the AI industry in the years to come.