In a recent white paper, the French multinational outlined the challenges AI workloads pose for data centres and offered guidance for optimising future facilities. Some of its recommendations could render existing facilities redundant, because AI workloads often require efficient, low-latency, high-bandwidth networking. This drives the densification of racks, ultimately putting pressure on existing data centres’ power delivery and thermal management.
Generally, a single GPU can consume up to 700W, and a GPU server can chew through 10kW. Hundreds of these systems may be needed to train a large language model in a reasonable timescale. According to Schneider Electric’s report, this is already at odds with the 10-20kW per rack that most data centres can manage.
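To make the mismatch concrete, here is a back-of-the-envelope sketch in Python using the figures above. The 8-GPU server configuration and the non-GPU overhead are our assumptions for illustration, not figures from Schneider’s report:

```python
# Back-of-the-envelope rack power maths using the cited figures.
# Assumption (not from the report): a typical 8-GPU training node,
# with CPUs, memory, NICs, and fans making up the rest of the ~10kW.

GPU_WATTS = 700          # per-accelerator draw cited above
GPUS_PER_SERVER = 8      # assumption: a common 8-GPU server layout
SERVER_WATTS = 10_000    # per-server draw cited above

for rack_cap_kw in (10, 20):  # the 10-20kW rack budgets Schneider cites
    servers = rack_cap_kw * 1000 // SERVER_WATTS
    print(f"{rack_cap_kw}kW rack -> {servers} server(s), "
          f"{servers * GPUS_PER_SERVER} GPUs")

# Output:
# 10kW rack -> 1 server(s), 8 GPUs
# 20kW rack -> 2 server(s), 16 GPUs
```

In other words, a typical facility can host only one or two such servers per rack before hitting its power budget, which is why a training cluster of hundreds of servers strains existing infrastructure.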
Training workloads benefit heavily from maximising the number of systems per rack, as this reduces network latency and the costs associated with optics. Conversely, spreading the systems out reduces the load on each rack, but if doing so requires slower optics, it can introduce bottlenecks that hurt cluster performance.
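As a hypothetical illustration of that trade-off, consider how many racks a fixed fleet of ~10kW servers occupies at different per-rack power caps. The fleet size and rack budgets here are assumptions chosen for easy arithmetic, not figures from the white paper:

```python
# Illustrative only: a lower per-rack power cap spreads a training
# cluster across more racks, so more links must leave each rack,
# meaning longer runs and pricier optics between racks.

import math

SERVER_WATTS = 10_000
FLEET = 200  # assumption: a 200-server training cluster

for rack_cap_kw in (20, 40, 100):  # assumed budgets, low to high density
    per_rack = rack_cap_kw * 1000 // SERVER_WATTS
    racks = math.ceil(FLEET / per_rack)
    print(f"{rack_cap_kw:>3}kW racks -> {per_rack:>2} servers/rack, {racks} racks")

# Output:
#  20kW racks ->  2 servers/rack, 100 racks
#  40kW racks ->  4 servers/rack, 50 racks
# 100kW racks -> 10 servers/rack, 20 racks
```

Denser racks shrink the cluster’s physical footprint, keeping more traffic within a rack where short, cheap links suffice; spreading out multiplies the inter-rack connections that need optics.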
The situation eases once training is done and the models are busy generating text and images, analysing mountains of unstructured data, or overthrowing humanity, since inference requires fewer AI accelerators per task than training does.
The white paper highlights several changes to data centre power, cooling, rack configuration, and software management that operators can implement to mitigate the demands of widespread AI adoption.