Recently, China’s National Artificial Intelligence Standardization Technical Subcommittee (SAC/TC28/SC42) has been engaged in the development of a national standard entitled GB/T XXXXX—XXXX Artificial intelligence — Specification for performance benchmarking for server systems (hereinafter referred to as the Specification). The Specification has already been published as an association standard (T/CESA 1169) in 2019 and as an IEEE standard (IEEE P2937) in 2022.

The Specification under development specifies the performance test methods for artificial intelligence server systems (including AI servers, AI server clusters, and AI-HPC computing facilities). A total of 16 mainstream domestic and foreign manufacturers of AI server systems and components, as well as AI application providers, are actively participating in the drafting process. These include Nvidia, Intel, Huawei, and Intide.

The development of the Specification responds to the need to address a series of problems in the performance testing of AI server systems. These problems, which are illustrated in the drafting notes of the Specification, cannot be addressed by the current representative general-purpose AI benchmarks, HPC performance benchmarks, and server specifications, such as MLPerf, AI Benchmark, BenchCouncil, AI-HPL, Linpack, DAWNBench, T/CESA 1043-2019 (Server for deep learning specification), GB/T 9813.3 (General specification for computer – Part 3: Server), T/CESA 1119-2020 (AI chips – Test metrics and test method of deep learning chips for cloud side), and the AIIA DNN benchmark. Specifically:

  • The general-purpose server technical specifications are not tailored to AI server systems. For instance, they usually specify only test metrics such as end-to-end runtime and energy consumption, which cannot accurately reflect the performance of AI server systems (see the sketch after this list).
  • The general-purpose AI server performance tests rely on publicly available models and datasets, which limits their applicability to AI servers employed in specific industries such as finance. The Specification, however, provides methodological guidance for testing the performance of both general-purpose AI servers and industry-specific AI servers.
  • The existing test benchmarks are limited to steady-state runtime, without considering the real operating environment or the actual state of the system under test.
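
To make the first point concrete, the following minimal Python sketch (purely illustrative, not drawn from the Specification or its testing tools) contrasts the end-to-end runtime that general-purpose specifications report with the throughput and tail-latency metrics commonly used to characterize AI serving workloads. The names `run_inference_benchmark` and `dummy_infer` are hypothetical placeholders, not artifacts of the standard.

```python
import time
import statistics


def run_inference_benchmark(infer_fn, requests, warmup=10):
    """Measure per-request latency for a hypothetical inference function.

    `infer_fn` and `requests` stand in for a real model endpoint and its
    inputs; both are assumptions made for illustration only.
    """
    for req in requests[:warmup]:        # warm-up: exclude cold-start effects
        infer_fn(req)

    latencies = []
    start = time.perf_counter()
    for req in requests[warmup:]:
        t0 = time.perf_counter()
        infer_fn(req)
        latencies.append(time.perf_counter() - t0)
    wall_time = time.perf_counter() - start

    latencies.sort()
    return {
        "end_to_end_runtime_s": wall_time,                     # general-purpose metric
        "throughput_rps": len(latencies) / wall_time,          # AI-oriented metric
        "p50_latency_ms": statistics.median(latencies) * 1e3,  # typical request
        "p99_latency_ms": latencies[int(0.99 * len(latencies))] * 1e3,  # tail behaviour
    }


if __name__ == "__main__":
    # Stand-in for a real model: a dummy function with variable latency.
    import random
    dummy_infer = lambda _: time.sleep(random.uniform(0.001, 0.005))
    print(run_inference_benchmark(dummy_infer, list(range(210))))
```

Under the general-purpose view, only the end-to-end runtime would be reported; the per-request throughput and latency percentiles expose behaviour that a single aggregate number hides.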

To address the problems mentioned above, the drafting team incorporated established testing technologies, standardized the test methods, and drew on use cases from both general and industry-specific applications. The goal is to generate more comprehensive and accurate test results under the guidance of the Specification. In addition, SAC/TC28/SC42 simultaneously developed supporting testing tools, which can help obtain performance data for AI servers.

On June 28, 2024, the working group of AISBench (established under SAC/TC28/SC42) organized an exchange workshop on AI server benchmarking. A total of 24 enterprises agreed on a joint initiative committed to the development of the Specification, the optimization of the testing tools, the establishment of an evaluation system for industrial applications, and the exchange of front-line information. According to the National Public Service Platform for Standards Information, the draft of the Specification was released for public comment in April and is now in the review stage. This standardization project can be regarded as a trial of a parallel approach to standards development for China, that is, initiating the development of a standard simultaneously at home and through international platforms (IEEE in this case). In addition, the broad engagement of multiple stakeholders, especially leading domestic and foreign AI server manufacturers, is expected to lead to the successful completion of the project.