Basically all of (Machine Learning) Models nowadays are behemoths of engineering challenges.
NVidia’s GPU’s have NVLink → which allows extremely high throughput for GPU-GPU communication at training time over the network.
There’s different ways to parallelise DNN training: