So, with up to 2 MPI processes the program scales perfectly, but as soon as more than 2 MPI processes are used, the CPU time increases and the scalability seems to be gone?
The current tutorial combines CPU and GPU training, which causes issues when running the tutorials on Colab: the runtime needs to be reset and a number of cells need to be re-run.